## Summary

Addresses #457: ASR architecture inconsistencies, tech debt, and misplaced code.

### Naming consistency

- Standardized `Manager` suffix: `StreamingAsrEngine` → `StreamingAsrManager` (protocol)
- Streaming-first prefix: `EouStreamingAsrManager` → `StreamingEouAsrManager`, `NemotronStreamingAsrManager` → `StreamingNemotronAsrManager`
- `AsrManager.initialize(models:)` → `loadModels(_:)` (matches streaming managers)
- `AsrManager.resetState()` → `reset()`

### Dead code removal

- Removed CTC logit caching from `AsrManager` (~60 lines): `SlidingWindowAsrManager` never read the cache; it runs its own CTC inference via `CtcKeywordSpotter`
- Removed `StreamingAsrManagerFactory`: moved `createManager()` onto the `StreamingModelVariant` enum

### Lifecycle consistency

- Added `cleanup()` to the `StreamingAsrManager` protocol and all implementations
- Every ASR manager now has both `reset()` and `cleanup()`

### File organization

- Split `AsrManager+Transcription.swift` (441 lines) into:
  - `+Transcription.swift` (129 lines): high-level API
  - `+Pipeline.swift` (152 lines): CoreML inference
  - `+TokenProcessing.swift` (170 lines): confidence, timings, dedup
- Moved `MLMultiArray.reset(to:)` to `Shared/MLMultiArray+Extensions.swift`
- Made `transcribeChunk()` internal

## Verification

6 benchmarks × 100 files, zero WER regressions:

| Model | Baseline | Current | Delta |
|-------|----------|---------|-------|
| Parakeet TDT v3 | 2.6% | 2.64% | +0.04% |
| Parakeet TDT v2 | 3.8% | 3.79% | -0.01% |
| CTC-TDT 110M | 3.6% | 3.56% | -0.04% |
| CTC Earnings | 16.54% | 16.51% | -0.03% |
| EOU 320ms | 7.11% | 7.11% | +0.00% |
| Nemotron 1120ms | 1.99% | 1.99% | +0.00% |

## Test plan

- [x] `swift build` passes
- [x] All 6 subset benchmarks pass with zero WER regressions
- [ ] `swift test` CI passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Manual Model Loading for ASR
FluidAudio usually downloads ASR CoreML bundles from HuggingFace with AsrModels.downloadAndLoad. When you need to operate in an offline or pre-provisioned environment, you can skip the download helper and point the pipeline at models you staged yourself. This guide shows how to prepare the assets and wire them into the AsrManager manually.
Required assets
Each ASR release ships four CoreML bundles plus the shared vocabulary file:
- Preprocessor.mlmodelc
- Encoder.mlmodelc
- Decoder.mlmodelc
- JointDecision.mlmodelc
- parakeet_vocab.json
Pick the folder that matches the version you want to serve:
- Multilingual (AsrModelVersion.v3): FluidInference/parakeet-tdt-0.6b-v3-coreml
- English only (AsrModelVersion.v2): FluidInference/parakeet-tdt-0.6b-v2-coreml
Stage the directory layout
AsrModels.load(from:) expects the directory you pass to be the staged HuggingFace repo folder itself (the one that contains the .mlmodelc bundles and parakeet_vocab.json). Place the assets in the structure below (replace /opt/models with your storage path):

    /opt/models
    └── parakeet-tdt-0.6b-v3-coreml
        ├── Preprocessor.mlmodelc
        │   ├── coremldata.bin
        │   └── ...
        ├── Encoder.mlmodelc
        ├── Decoder.mlmodelc
        ├── JointDecision.mlmodelc
        └── parakeet_vocab.json

If you are deploying the English-only variant, swap the folder name for parakeet-tdt-0.6b-v2-coreml and ensure the four .mlmodelc bundles plus parakeet_vocab.json are present.
Manual download options
- Clone from HuggingFace with Git LFS (recommended when you can connect once):

      git lfs install
      git clone https://huggingface.co/FluidInference/parakeet-tdt-0.6b-v3-coreml
      mv parakeet-tdt-0.6b-v3-coreml /opt/models/

- Use the HuggingFace web UI to download the .tar archives for each .mlmodelc bundle and extract them into the layout above.
- Copy the prepared directory from another machine that already ran downloadAndLoad (the cache lives at ~/Library/Application Support/FluidAudio/Models/<repo> on macOS).
After staging the files, call AsrModels.modelsExist(at:) in your app or a small Swift script to double-check that the four bundles and parakeet_vocab.json are readable.
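For a preflight that does not depend on FluidAudio at all (for example, in a provisioning script), the same presence check can be sketched with Foundation alone. The helper name `missingAssets(in:)` is hypothetical and is not part of the library; it only confirms that the five entries exist, not that the bundles are valid.

```swift
import Foundation

/// Asset names this guide lists as required in a staged repo folder.
let requiredAssets = [
    "Preprocessor.mlmodelc",
    "Encoder.mlmodelc",
    "Decoder.mlmodelc",
    "JointDecision.mlmodelc",
    "parakeet_vocab.json",
]

/// Hypothetical helper (not a FluidAudio API): returns the required
/// assets that are absent from `repoDirectory`, in the order listed above.
func missingAssets(in repoDirectory: URL) -> [String] {
    requiredAssets.filter { name in
        let path = repoDirectory.appendingPathComponent(name).path
        return !FileManager.default.fileExists(atPath: path)
    }
}

// Example path taken from this guide; replace it with your storage path.
let repo = URL(fileURLWithPath: "/opt/models/parakeet-tdt-0.6b-v3-coreml", isDirectory: true)
let missing = missingAssets(in: repo)
if missing.isEmpty {
    print("All required assets are staged.")
} else {
    print("Missing: \(missing.joined(separator: ", "))")
}
```

Reporting the missing names, rather than a single Boolean, makes it easier to spot a partially copied directory.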
Loading models without auto-download
The key is to call AsrModels.load(from:configuration:version:) with the repo folder URL. This loads the CoreML bundles you staged and never attempts a network download when everything is in place.

    import FluidAudio
    import CoreML

    @main
    struct ManualLoader {
        static func main() async {
            do {
                // Point to the staged repo directory that contains the Core ML bundles
                let repoDirectory = URL(fileURLWithPath: "/opt/models/parakeet-tdt-0.6b-v3-coreml", isDirectory: true)

                // Optional: customize compute units (declare this as `var` to change them);
                // the default uses CPU+ANE
                let configuration = AsrModels.defaultConfiguration()

                // Load the staged bundles and vocabulary from disk
                let models = try await AsrModels.load(
                    from: repoDirectory,
                    configuration: configuration,
                    version: .v3
                )

                // Hand the loaded models to the manager
                let asrManager = AsrManager()
                try await asrManager.loadModels(models)

                // ... proceed with transcription workflow
            } catch {
                print("Failed to load ASR models: \(error)")
            }
        }
    }

Switching versions at runtime
Pass .v2 for the English-only repo:

    let englishRepo = URL(fileURLWithPath: "/opt/models/parakeet-tdt-0.6b-v2-coreml", isDirectory: true)
    let englishModels = try await AsrModels.load(from: englishRepo, version: .v2)

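If the version is chosen at runtime, it can help to derive the folder from the version in one place so the two never drift apart. The sketch below uses a local stand-in enum, `StagedVersion`, and a helper, `stagedRepoURL(for:under:)`; both are hypothetical names for illustration. In an app you would more likely switch on FluidAudio's own AsrModelVersion.

```swift
import Foundation

/// Hypothetical stand-in for the two versions this guide names.
enum StagedVersion {
    case v2  // English only
    case v3  // Multilingual

    /// Repo folder names as listed in this guide.
    var folderName: String {
        switch self {
        case .v2: return "parakeet-tdt-0.6b-v2-coreml"
        case .v3: return "parakeet-tdt-0.6b-v3-coreml"
        }
    }
}

/// Builds the staged repo URL under a models root such as /opt/models.
func stagedRepoURL(for version: StagedVersion, under root: URL) -> URL {
    root.appendingPathComponent(version.folderName, isDirectory: true)
}

let modelsRoot = URL(fileURLWithPath: "/opt/models", isDirectory: true)
let englishURL = stagedRepoURL(for: .v2, under: modelsRoot)
print(englishURL.path)
```

The resulting URL is what you would pass as the `from:` argument, together with the matching version.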
Troubleshooting tips
- Use AsrModels.modelsExist(at:) before calling load to confirm the vocabulary file and all four .mlmodelc bundles are present.
- AsrModels.load reads the vocabulary from the same repo folder. Make sure parakeet_vocab.json sits beside the model bundles.
- If you see AsrModelsError.modelNotFound, double-check for typos in the folder names or missing coremldata.bin files inside each .mlmodelc directory.
- load still reports helpful diagnostics through OSLog. Run your build with the OS_ACTIVITY_MODE environment variable cleared so you can see the log lines during bring-up.
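The coremldata.bin check from the tips above can be automated with a small Foundation-only sketch. `bundlesMissingCoreMLData(in:)` is a hypothetical helper, not a FluidAudio API. It inspects only the .mlmodelc directories that are actually present, so it is best paired with a check that all four bundles exist.

```swift
import Foundation

/// Hypothetical helper (not a FluidAudio API): lists the .mlmodelc bundles
/// in `repoDirectory` that do not contain a coremldata.bin file.
func bundlesMissingCoreMLData(in repoDirectory: URL) -> [String] {
    let fm = FileManager.default
    // An unreadable or missing directory is treated as having no bundles.
    let entries = (try? fm.contentsOfDirectory(atPath: repoDirectory.path)) ?? []
    return entries
        .filter { $0.hasSuffix(".mlmodelc") }
        .filter { bundle in
            let dataFile = repoDirectory
                .appendingPathComponent(bundle)
                .appendingPathComponent("coremldata.bin")
            return !fm.fileExists(atPath: dataFile.path)
        }
        .sorted()
}

// Example path taken from this guide; replace it with your storage path.
let stagedRepo = URL(fileURLWithPath: "/opt/models/parakeet-tdt-0.6b-v3-coreml", isDirectory: true)
for bundle in bundlesMissingCoreMLData(in: stagedRepo) {
    print("\(bundle) has no coremldata.bin; re-copy or re-extract it")
}
```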
With this setup the ASR pipeline stays entirely offline while still using the exact CoreML bundles FluidAudio ships on HuggingFace.