FluidAudio

mirror of https://github.com/FluidInference/FluidAudio.git synced 2026-05-12 20:20:36 +00:00

Author	SHA1	Message	Date
Alex	2593f55415	Add Japanese ASR support with JSUT and Common Voice datasets (#478 ) ## Summary Adds comprehensive Japanese ASR support to FluidAudio with benchmark datasets and CLI commands. ## Changes ### Core Japanese ASR Support - CtcJaManager.swift - Japanese CTC transcription manager (actor-based) - CtcJaModels.swift - Japanese model loading and management - ModelNames.swift - Added Japanese model registry (`parakeetCtcJa`, `CTCJa` enum) - AsrModels.swift - Added `.ctcJa` model version (3,072 vocab, 1,024 hidden, blank_id=3072) - AsrManager.swift - Added `.ctcJa` case with error directing to `CtcJaManager` ### CLI Commands - JapaneseAsrBenchmark.swift (459 lines) - New `ja-benchmark` command - JSUT basic5000 dataset support - Mozilla Common Voice (MCV) test set support - Auto-download capability - CER (Character Error Rate) evaluation - DownloadCommand.swift - Added JSUT and MCV Japanese dataset downloads - TranscribeCommand.swift - Added `.ctcJa` model version support - AsrBenchmark.swift - Added `.ctcJa` switch case ### Dataset Support - JapaneseDatasetDownloader.swift (387 lines) - Dataset download and parsing - JSUT basic5000 (5,000 sentences, clean studio recordings) - Mozilla Common Voice Japanese test split - Efficient streaming downloads - Metadata extraction and validation ## Usage ### CLI Commands ```bash # Benchmark on JSUT basic5000 (100 samples) swift run fluidaudiocli ja-benchmark --dataset jsut --samples 100 # Benchmark on Common Voice test (500 samples, auto-download) swift run fluidaudiocli ja-benchmark --dataset cv-test --samples 500 --auto-download # Download datasets swift run fluidaudiocli download --dataset jsut swift run fluidaudiocli download --dataset cv-ja-test ``` ### Swift API ```swift // Load and use Japanese CTC transcription let manager = try await CtcJaManager.load() let text = try manager.transcribe(audioURL: japaneseAudioFile) ``` ## Model Info - Repo: `FluidInference/parakeet-ctc-0.6b-ja-coreml` - Architecture: 600M parameter CTC-only - Vocabulary: 3,072 Japanese SentencePiece tokens + 1 blank (id: 3072) - Encoder: 1,024 hidden size - Expected CER: 6.5% on JSUT basic5000, 13.3% on MCV 16.1 test ## Testing - ✅ Builds successfully (`swift build`) - ✅ Model loading integration tested - ✅ CLI commands compile and link correctly - ⏳ Runtime benchmark testing pending (requires model download) ## Related - Mobius PR #39: Japanese CTC CoreML conversion (https://github.com/FluidInference/mobius/pull/39) 🤖 Generated with Claude Code <!-- devin-review-badge-begin --> --- <a href="https://app.devin.ai/review/fluidinference/fluidaudio/pull/478" target="_blank"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1"> <img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open with Devin"> </picture> </a> <!-- devin-review-badge-end --> ---------	2026-04-04 12:57:32 -04:00
Alex	d9eef864d2	ASR tech debt cleanup: remove dead code, fix bugs, add benchmark script 28/03/2026 (#460 ) ## Summary Systematic cleanup of the ASR module addressing tech debt items from #457. Net reduction of ~430 lines while fixing real bugs and improving maintainability. ### Bug fixes - `enableFP16` silently ignored — `optimizedConfiguration(enableFP16:)` delegated to a shared factory that hardcoded `allowLowPrecisionAccumulationOnGPU = true`, ignoring the caller's parameter - `MLArrayCache.returnArray` only reset float32 data — cached arrays of other types (float16, int32) retained stale data from previous use - CTC model auto-detection broken — `Repo.parakeetCtc110m.folderName` returned `"parakeet-ctc-110m"` instead of `"parakeet-ctc-110m-coreml"` because the `folderName` switch fell through to a `default` case that stripped the `-coreml` suffix. Same for `parakeetCtc06b`. - Duplicate tokens at chunk merge boundary — `mergeByMidpoint` used `<=`/`>=` so tokens exactly at the cutoff appeared in both left and right chunks ### Dead code removal - Deleted `ANEOptimizer` indirection layer (166 lines) — was a pass-through wrapping `MLModel` with no optimization - Deleted `PerformanceMonitor` actor and `AggregatedMetrics` — never instantiated, component times hardcoded to 0 - Deleted `getFloat16Array` from MLArrayCache — never called - Deleted `sliceEncoderOutput` from AsrTranscription — never called (30 lines) - Deleted `loadWithANEOptimization` from AsrModels — never called - Removed unused `tokenTimings` parameter chain through `processTranscriptionResult` - Removed unused `import OSLog` / `import CoreML` across 5 files - Removed `nonisolated(unsafe)` from SlidingWindowAsrManager (types already Sendable) ### Duplication elimination - Extracted `clearCachedCtcData()` helper (replaced 3× triple-nil assignments) - Extracted `decoderState(for:)` / `setDecoderState(_:for:)` (replaced 4× switch blocks) - Extracted `frameAlignedAudio()` (replaced 2× duplicated frame-alignment blocks) - Added `ASRConstants.secondsPerEncoderFrame` (replaced 5× magic `0.08`) - Replaced hardcoded `16_000` with `config.sampleRate` / `ASRConstants.sampleRate` - Extracted `MLModelConfigurationUtils.defaultConfiguration()` (replaced 5× copy-pasted config methods) - Extracted `MLModelConfigurationUtils.defaultModelsDirectory()` (replaced 3× copy-pasted directory methods) - Consolidated duplicate `vocabularyFile` / `vocabularyFileArray` constants ### File organization - Moved `PerformanceMetrics.swift`, `ProgressEmitter.swift`, `MLArrayCache.swift` from `ASR/Parakeet/` to `Shared/` (used by multiple modules) - Renamed `StreamingAudioSourceFactory` → `AudioSourceFactory`, `StreamingAudioSampleSource` → `AudioSampleSource` (types used by both ASR and Diarizer) - Renamed files to match type names: `SortformerDiarizerPipeline.swift` → `SortformerDiarizer.swift`, `LSEENDDiarizerAPI.swift` → `LSEENDDiarizer.swift`, `NemotronPipeline.swift` → `NemotronStreamingAsrManager+Pipeline.swift` - Replaced force unwraps in `RnntDecoder.swift` with `guard let` + descriptive errors - Removed stale TODO about decoder state in AsrManager ### Benchmark script - Added `Scripts/run_parakeet_benchmarks.sh` — runs all 6 benchmarks (v3, v2, TDT-CTC-110M, CTC earnings, EOU 320ms, Nemotron 1120ms) with WER comparison against `benchmarks100.md` baselines and regression detection - Referenced from `Documentation/ASR/benchmarks100.md` ## Verified — no regressions ``` Model Baseline Current Delta Parakeet TDT v3 (0.6B) 2.6% 2.64% +0.04% Parakeet TDT v2 (0.6B) 3.8% 3.79% -0.01% CTC-TDT 110M 3.6% 3.56% -0.04% CTC Earnings 16.54% 16.51% -0.03% EOU 320ms (120M) 7.11% 7.11% +0.00% Nemotron 1120ms (0.6B) 1.99% 1.99% +0.00% ``` ## Test plan - [x] `swift build` passes - [x] `swift test` passes (all existing tests, updated for removed dead code) - [x] All 6 ASR benchmarks match baselines (100 files each) - [ ] `swift format lint` passes	2026-03-28 23:44:10 -04:00
Alex	8aa0dfcdac	fix: clean up diarization test infrastructure (#395 ) ## Summary - Extract shared fixture helpers into `DiarizationTestFixtures` enum, removing ~200 lines of duplicate code across `LSEENDIntegrationTests` and `SpeakerEnrollmentTests` - Replace fragile `Mirror`-based private state inspection with `internal` `hasActiveSession` property on `LSEENDDiarizerAPI` - Fix non-deterministic `srand48` seed in `SortformerTests` (use constant `42` instead of time-based seed) - Fix asymmetric skip guards in Sortformer enrollment tests (`XCTSkipIf` instead of `XCTAssertNotNil` for host-dependent segments) ## Test plan - [x] `swift build --build-tests` passes - [ ] `swift test --filter SortformerTests` passes - [ ] `swift test --filter LSEENDIntegrationTests` passes - [ ] `swift test --filter SpeakerEnrollmentTests` passes <!-- devin-review-badge-begin --> --- <a href="https://app.devin.ai/review/fluidinference/fluidaudio/pull/395" target="_blank"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1"> <img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open with Devin"> </picture> </a> <!-- devin-review-badge-end -->	2026-03-18 12:51:34 -04:00
Alex	7d074e1ee6	chore: consolidate Python scripts into Scripts/ (#344 ) ## Summary - Move `Benchmarks/nemo` to `Scripts/nemo_ami_benchmark` - Move `Tools/voice_cloning` to `Scripts/voice_cloning` - Remove now-empty `Benchmarks/` and `Tools/` top-level directories Consolidates standalone Python utilities into a single `Scripts/` directory to reduce top-level clutter. ## Test plan - [x] Verify files moved correctly (no content changes) <!-- devin-review-badge-begin --> --- <a href="https://app.devin.ai/review/fluidinference/fluidaudio/pull/344" target="_blank"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1"> <img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open with Devin"> </picture> </a> <!-- devin-review-badge-end -->	2026-03-04 12:46:03 -05:00

4 Commits