mirror of
https://github.com/FluidInference/FluidAudio.git
synced 2026-05-12 20:20:36 +00:00
7d074e1ee6
## Summary - Move `Benchmarks/nemo` to `Scripts/nemo_ami_benchmark` - Move `Tools/voice_cloning` to `Scripts/voice_cloning` - Remove now-empty `Benchmarks/` and `Tools/` top-level directories Consolidates standalone Python utilities into a single `Scripts/` directory to reduce top-level clutter. ## Test plan - [x] Verify files moved correctly (no content changes) <!-- devin-review-badge-begin --> --- <a href="https://app.devin.ai/review/fluidinference/fluidaudio/pull/344" target="_blank"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1"> <img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open with Devin"> </picture> </a> <!-- devin-review-badge-end -->
1.9 KiB
1.9 KiB
Voice Cloning Evaluation Scripts
Tools for evaluating PocketTTS voice cloning quality using spectral similarity.
evaluate_voice.py
Compares a reference voice sample with synthesized TTS output using mel-spectrogram and MFCC similarity metrics. No neural network required.
Install
pip install librosa numpy
# Or minimal (scipy fallback):
pip install scipy numpy
# Optional for plotting:
pip install matplotlib
Usage
# Basic comparison
python evaluate_voice.py reference.wav synthesized.wav
# With visualization
python evaluate_voice.py reference.wav synthesized.wav --plot
# JSON output
python evaluate_voice.py reference.wav synthesized.wav --json
Metrics
| Metric | Description |
|---|---|
| Mel Similarity | Cosine similarity of mean mel spectrum (voice timbre) |
| MFCC Similarity | Cosine similarity of mean MFCCs (voice characteristics) |
| MFCC Std Similarity | Similarity of MFCC dynamics |
| Combined Score | Weighted average (0.4 mel + 0.4 mfcc + 0.2 mfcc_std) |
Quality Thresholds
| Score | Quality | Meaning |
|---|---|---|
| 0.90+ | Excellent | Very close spectral match |
| 0.80+ | Good | Similar voice characteristics |
| 0.70+ | Fair | Some similarity |
| <0.70 | Poor | Different spectral characteristics |
Example Workflow
# 1. Clone a voice using FluidAudio CLI
fluidaudio tts "Hello, this is a test." --backend pocket --clone-voice speaker.wav -o output.wav
# 2. Evaluate the result
python Tools/voice_cloning/evaluate_voice.py speaker.wav output.wav --plot
Output Example
Reference: speaker.wav
Synthesized: output.wav
Reference duration: 5.23s
Synthesized duration: 2.15s
Computing spectral similarity...
Mel Similarity: 0.9234
MFCC Similarity: 0.8876
MFCC Std Similarity: 0.8543
Combined Score: 0.8951
Quality: Good