mirror of https://github.com/FluidInference/FluidAudio.git synced 2026-05-12 20:20:36 +00:00

Files

T

Alex e98d96fd7f docs: fix CLI name references to fluidaudiocli (#372 )

## Summary
- Replace all `swift run fluidaudio` references with `swift run
fluidaudiocli` across docs and source to match the actual executable
name in Package.swift
- Add GitHub comments policy to CLAUDE.md development guidelines

## Files changed
- **CLAUDE.md** — CLI commands updated + GitHub comments rule added
- **README.md** — All CLI examples updated
- **Documentation/** — CLI.md, GettingStarted guides, Kokoro.md,
Benchmarks.md, CustomPronunciation.md
- **Sources/FluidAudioCLI/README.md** — CLI examples updated
- **Sources/FluidAudioCLI/Commands/VadBenchmark.swift** — Error messages
updated
<!-- devin-review-badge-begin -->

---

<a href="https://app.devin.ai/review/fluidinference/fluidaudio/pull/372"
target="_blank">
  <picture>
<source media="(prefers-color-scheme: dark)"
srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1">
<img
src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1"
alt="Open with Devin">
  </picture>
</a>
<!-- devin-review-badge-end -->

2026-03-14 17:49:36 -04:00

4.0 KiB

Raw Permalink Blame History

Command Line Interface (CLI)

This guide collects commonly used fluidaudio CLI commands for ASR, diarization, VAD, and datasets.

TTS

TTS is built into the CLI. Run it directly:

swift run fluidaudiocli tts "Hello from FluidAudio" --output out.wav

# Multilingual G2P benchmark
swift run fluidaudiocli g2p-benchmark

ASR

# Transcribe an audio file (batch)
swift run fluidaudiocli transcribe audio.wav

# English-only run with higher accuracy
swift run fluidaudiocli transcribe audio.wav --model-version v2

# Transcribe with Qwen3 ASR
swift run fluidaudiocli qwen3-transcribe audio.wav

# Streaming ASR with Parakeet EOU
swift run fluidaudiocli parakeet-eou --input audio.wav

# Transcribe multiple files in parallel
swift run fluidaudiocli multi-stream audio1.wav audio2.wav

# Benchmark ASR on LibriSpeech
swift run fluidaudiocli asr-benchmark --subset test-clean --max-files 50

# English benchmark preset (LibriSpeech)
swift run fluidaudiocli asr-benchmark --subset test-clean --max-files 50 --model-version v2

# Multilingual ASR (FLEURS) benchmark
swift run fluidaudiocli fleurs-benchmark --languages en_us,fr_fr --samples 10

# Qwen3 ASR benchmark
swift run fluidaudiocli qwen3-benchmark

# CTC keyword spotting benchmark on Earnings22
swift run fluidaudiocli ctc-earnings-benchmark

Diarization

# Run AMI benchmark (auto-download dataset)
swift run fluidaudiocli diarization-benchmark --auto-download

# Tune threshold and save results
swift run fluidaudiocli diarization-benchmark --threshold 0.7 --output results.json

# Quick test on a single AMI file
swift run fluidaudiocli diarization-benchmark --single-file ES2004a --threshold 0.8

# Real-time-ish streaming benchmark (~3s chunks with 2s overlap)
swift run fluidaudiocli diarization-benchmark --single-file ES2004a \
  --chunk-seconds 3 --overlap-seconds 2

# Balanced throughput/quality (~10s chunks with 5s overlap)
swift run fluidaudiocli diarization-benchmark --dataset ami-sdm \
  --chunk-seconds 10 --overlap-seconds 5

# Run the full VBx offline pipeline
swift run fluidaudiocli diarization-benchmark --mode offline --dataset ami-sdm --threshold 0.6

# Process a single file with streaming vs. offline inference
swift run fluidaudiocli process meeting.wav --mode streaming --threshold 0.7
swift run fluidaudiocli process meeting.wav --mode offline --threshold 0.6 --debug

# Sortformer streaming diarization
swift run fluidaudiocli sortformer audio.wav

# Sortformer benchmark on AMI dataset
swift run fluidaudiocli sortformer-benchmark

--mode offline switches the CLI to OfflineDiarizerManager, running the full VBx pipeline with PLDA refinement. Expect DER ≈ 18–20 % on AMI-SDM with threshold 0.6.
Add --rttm /path/to/ground_truth.rttm to process to compute DER/JER in-place, or --export-embeddings embeddings.json for debugging speaker vectors.
GitHub Actions workflow offline-pipeline.yml replays fluidaudio diarization-benchmark --mode offline --single-file ES2004a on every PR so failures in model downloads or clustering logic are caught early.

VAD

# Offline segmentation with seconds output (default mode)
swift run fluidaudiocli vad-analyze path/to/audio.wav

# Streaming only with 128 ms chunks and a custom threshold (timestamps emitted in seconds)
swift run fluidaudiocli vad-analyze path/to/audio.wav --streaming --threshold 0.65 --min-silence-ms 400

# Run VAD benchmark (mini50 dataset by default)
swift run fluidaudiocli vad-benchmark --num-files 50 --threshold 0.3

# Save benchmark results and enable debug output
swift run fluidaudiocli vad-benchmark --all-files --output vad_results.json --debug

swift run fluidaudiocli vad-analyze --help lists every tuning option (padding, negative threshold overrides, max-duration splitting, etc.).

Datasets

# Download test sets
swift run fluidaudiocli download --dataset librispeech-test-clean
swift run fluidaudiocli download --dataset librispeech-test-other
swift run fluidaudiocli download --dataset ami-sdm
swift run fluidaudiocli download --dataset vad

4.0 KiB Raw Permalink Blame History Unescape Escape

Command Line Interface (CLI)

TTS

ASR

Diarization

VAD

Datasets

4.0 KiB

Raw Permalink Blame History