mirror of
https://github.com/FluidInference/FluidAudio.git
synced 2026-05-12 20:20:36 +00:00
2593f55415
## Summary Adds comprehensive Japanese ASR support to FluidAudio with benchmark datasets and CLI commands. ## Changes ### Core Japanese ASR Support - **CtcJaManager.swift** - Japanese CTC transcription manager (actor-based) - **CtcJaModels.swift** - Japanese model loading and management - **ModelNames.swift** - Added Japanese model registry (`parakeetCtcJa`, `CTCJa` enum) - **AsrModels.swift** - Added `.ctcJa` model version (3,072 vocab, 1,024 hidden, blank_id=3072) - **AsrManager.swift** - Added `.ctcJa` case with error directing to `CtcJaManager` ### CLI Commands - **JapaneseAsrBenchmark.swift** (459 lines) - New `ja-benchmark` command - JSUT basic5000 dataset support - Mozilla Common Voice (MCV) test set support - Auto-download capability - CER (Character Error Rate) evaluation - **DownloadCommand.swift** - Added JSUT and MCV Japanese dataset downloads - **TranscribeCommand.swift** - Added `.ctcJa` model version support - **AsrBenchmark.swift** - Added `.ctcJa` switch case ### Dataset Support - **JapaneseDatasetDownloader.swift** (387 lines) - Dataset download and parsing - JSUT basic5000 (5,000 sentences, clean studio recordings) - Mozilla Common Voice Japanese test split - Efficient streaming downloads - Metadata extraction and validation ## Usage ### CLI Commands ```bash # Benchmark on JSUT basic5000 (100 samples) swift run fluidaudiocli ja-benchmark --dataset jsut --samples 100 # Benchmark on Common Voice test (500 samples, auto-download) swift run fluidaudiocli ja-benchmark --dataset cv-test --samples 500 --auto-download # Download datasets swift run fluidaudiocli download --dataset jsut swift run fluidaudiocli download --dataset cv-ja-test ``` ### Swift API ```swift // Load and use Japanese CTC transcription let manager = try await CtcJaManager.load() let text = try manager.transcribe(audioURL: japaneseAudioFile) ``` ## Model Info - **Repo**: `FluidInference/parakeet-ctc-0.6b-ja-coreml` - **Architecture**: 600M parameter CTC-only - **Vocabulary**: 3,072 Japanese SentencePiece tokens + 1 blank (id: 3072) - **Encoder**: 1,024 hidden size - **Expected CER**: 6.5% on JSUT basic5000, 13.3% on MCV 16.1 test ## Testing - ✅ Builds successfully (`swift build`) - ✅ Model loading integration tested - ✅ CLI commands compile and link correctly - ⏳ Runtime benchmark testing pending (requires model download) ## Related - Mobius PR #39: Japanese CTC CoreML conversion (https://github.com/FluidInference/mobius/pull/39) 🤖 Generated with Claude Code <!-- devin-review-badge-begin --> --- <a href="https://app.devin.ai/review/fluidinference/fluidaudio/pull/478" target="_blank"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1"> <img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open with Devin"> </picture> </a> <!-- devin-review-badge-end --> ---------