Alex ea50062181 ASR architecture cleanup: naming, dead code, file organization 29/03/2026 (#457) (#468)


2026-03-29 20:29:50 -04:00


# Manual Model Loading for ASR

FluidAudio usually downloads ASR CoreML bundles from HuggingFace with AsrModels.downloadAndLoad. When you need to operate in an offline or pre-provisioned environment, you can skip the download helper and point the pipeline at models you staged yourself. This guide shows how to prepare the assets and wire them into the AsrManager manually.

## Required assets

Each ASR release ships four CoreML bundles plus the shared vocabulary file:

- Preprocessor.mlmodelc
- Encoder.mlmodelc
- Decoder.mlmodelc
- JointDecision.mlmodelc
- parakeet_vocab.json
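The checklist above can also be verified with Foundation alone, for example in a deployment script that should not link FluidAudio. This is a minimal sketch, not FluidAudio's own check: the helper name missingAsrAssets(in:) is hypothetical, and the file list is copied from this guide, so update it if a release ships different files.

```swift
import Foundation

// Hypothetical lists copied from this guide (not read from FluidAudio).
// Compiled Core ML models (.mlmodelc) are directories; the vocabulary is a file.
let requiredBundles = [
    "Preprocessor.mlmodelc",
    "Encoder.mlmodelc",
    "Decoder.mlmodelc",
    "JointDecision.mlmodelc",
]
let requiredFiles = ["parakeet_vocab.json"]

/// Returns the names of required ASR assets that are missing, or present with
/// the wrong kind (file vs. directory), inside `repoDirectory`.
/// An empty array means the staged layout looks complete.
func missingAsrAssets(in repoDirectory: URL) -> [String] {
    let fileManager = FileManager.default
    var missing: [String] = []

    for name in requiredBundles {
        var isDirectory: ObjCBool = false
        let path = repoDirectory.appendingPathComponent(name).path
        // A bundle must exist and be a directory.
        if !fileManager.fileExists(atPath: path, isDirectory: &isDirectory) || !isDirectory.boolValue {
            missing.append(name)
        }
    }

    for name in requiredFiles {
        var isDirectory: ObjCBool = false
        let path = repoDirectory.appendingPathComponent(name).path
        // The vocabulary must exist and be a regular file, not a folder.
        if !fileManager.fileExists(atPath: path, isDirectory: &isDirectory) || isDirectory.boolValue {
            missing.append(name)
        }
    }
    return missing
}
```

Call it with the staged repo folder (for example /opt/models/parakeet-tdt-0.6b-v3-coreml) and stop the deployment if the result is non-empty; AsrModels.modelsExist(at:) remains the check to use from inside your app.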

Pick the HuggingFace repository that matches the model version you want to serve:

- Multilingual (AsrModelVersion.v3): FluidInference/parakeet-tdt-0.6b-v3-coreml
- English only (AsrModelVersion.v2): FluidInference/parakeet-tdt-0.6b-v2-coreml

## Stage the directory layout

AsrModels.load(from:) expects the directory you pass to be the staged HuggingFace repo folder itself (the one that contains the .mlmodelc bundles and parakeet_vocab.json). Place the assets in the structure below (replace /opt/models with your storage path):

```text
/opt/models
└── parakeet-tdt-0.6b-v3-coreml
    ├── Preprocessor.mlmodelc
    │   ├── coremldata.bin
    │   └── ...
    ├── Encoder.mlmodelc
    ├── Decoder.mlmodelc
    ├── JointDecision.mlmodelc
    └── parakeet_vocab.json
```

If you are deploying the English-only variant, swap the folder name for parakeet-tdt-0.6b-v2-coreml and ensure the four .mlmodelc bundles plus parakeet_vocab.json are present.
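Because the folder name and the version passed to AsrModels.load must agree, it can help to keep the two together in one place. The enum below is a hypothetical illustration, not a FluidAudio type: it only maps each staged repo to the folder name listed in this guide and builds the repo URL under your storage root.

```swift
import Foundation

// Hypothetical enum for illustration; it is not part of FluidAudio.
// Folder names match the HuggingFace repos listed in this guide.
enum StagedParakeetRepo {
    case multilingualV3  // load with version: .v3
    case englishV2       // load with version: .v2

    var folderName: String {
        switch self {
        case .multilingualV3: return "parakeet-tdt-0.6b-v3-coreml"
        case .englishV2: return "parakeet-tdt-0.6b-v2-coreml"
        }
    }

    /// The repo folder itself (the directory to pass to AsrModels.load(from:)),
    /// given a storage root such as /opt/models.
    func repoDirectory(under storageRoot: URL) -> URL {
        storageRoot.appendingPathComponent(folderName, isDirectory: true)
    }
}
```

Returning the repo folder rather than the storage root mirrors the requirement above that load(from:) receives the folder containing the .mlmodelc bundles.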

## Manual download options

1. Clone from HuggingFace with Git LFS (recommended when you can connect once):

   ```bash
   git lfs install
   git clone https://huggingface.co/FluidInference/parakeet-tdt-0.6b-v3-coreml
   # Create the storage root first if it does not exist yet
   mkdir -p /opt/models
   mv parakeet-tdt-0.6b-v3-coreml /opt/models/
   ```

2. Use the HuggingFace web UI to download the .tar archives for each .mlmodelc bundle and extract them into the layout above.
3. Copy the prepared directory from another machine that already ran downloadAndLoad (the cache lives at ~/Library/Application Support/FluidAudio/Models/<repo> on macOS).
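The last option amounts to copying one repo folder out of the cache. A minimal sketch of that step with Foundation, assuming the cache layout described above; stageCachedRepo(named:from:to:) is a hypothetical helper, not a FluidAudio API, and moving the result to the offline machine (rsync, removable media, image build) is left to your tooling.

```swift
import Foundation

/// Hypothetical helper (not part of FluidAudio): copies a repo folder that
/// downloadAndLoad already cached on this machine into a staging root such as
/// /opt/models, and returns the staged repo folder.
func stageCachedRepo(named repoName: String, from cacheRoot: URL, to stagingRoot: URL) throws -> URL {
    let fileManager = FileManager.default
    let source = cacheRoot.appendingPathComponent(repoName, isDirectory: true)
    let destination = stagingRoot.appendingPathComponent(repoName, isDirectory: true)

    // Create the staging root (and any missing parents) if needed.
    try fileManager.createDirectory(at: stagingRoot, withIntermediateDirectories: true)

    // copyItem refuses to overwrite, so remove a previously staged copy first.
    if fileManager.fileExists(atPath: destination.path) {
        try fileManager.removeItem(at: destination)
    }

    // Directories are copied recursively, so every .mlmodelc bundle and
    // parakeet_vocab.json come along.
    try fileManager.copyItem(at: source, to: destination)
    return destination
}
```

On macOS, pass ~/Library/Application Support/FluidAudio/Models as cacheRoot and the repo folder name (for example parakeet-tdt-0.6b-v3-coreml) as repoName.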

After staging the files, call AsrModels.modelsExist(at:) in your app or a small Swift script to double-check that the four bundles and parakeet_vocab.json are present.

## Loading models without auto-download

The key is to call AsrModels.load(from:configuration:version:) with the repo folder URL. This loads the CoreML bundles you staged and never attempts a network download when everything is in place.

```swift
import Foundation
import FluidAudio
import CoreML

@main
struct ManualLoader {
    static func main() async {
        do {
            // Point to the staged repo directory that contains the Core ML bundles
            let repoDirectory = URL(fileURLWithPath: "/opt/models/parakeet-tdt-0.6b-v3-coreml", isDirectory: true)

            // Optional: customize compute units; the default uses CPU+ANE.
            // To change it, declare this with `var` and set, for example,
            // configuration.computeUnits = .cpuAndNeuralEngine
            let configuration = AsrModels.defaultConfiguration()

            let models = try await AsrModels.load(
                from: repoDirectory,
                configuration: configuration,
                version: .v3
            )

            let asrManager = AsrManager()
            try await asrManager.loadModels(models)

            // ... proceed with transcription workflow
        } catch {
            print("Failed to load ASR models: \(error)")
        }
    }
}
```

## Switching versions at runtime

Pass .v2 for the English-only repo:

```swift
let englishRepo = URL(fileURLWithPath: "/opt/models/parakeet-tdt-0.6b-v2-coreml", isDirectory: true)
let englishModels = try await AsrModels.load(from: englishRepo, version: .v2)
```

## Troubleshooting tips

- Use AsrModels.modelsExist(at:) before calling load to confirm that the vocabulary file and all four .mlmodelc bundles are present.
- AsrModels.load reads the vocabulary from the same repo folder, so make sure parakeet_vocab.json sits beside the model bundles.
- If you see AsrModelsError.modelNotFound, check the folder names for typos and confirm that each .mlmodelc directory contains its coremldata.bin file.
- AsrModels.load also reports diagnostics through OSLog. Run your build with the OS_ACTIVITY_MODE environment variable unset (setting it to disable, a common way to quiet the Xcode console, also hides these messages) so you can see the log lines during bring-up.
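The coremldata.bin tip can be checked mechanically when chasing AsrModelsError.modelNotFound. A minimal Foundation sketch; bundlesMissingCoreMLData(in:) is a hypothetical helper, and it only looks for that one file, so it does not validate the rest of a compiled model or catch a bundle that is missing entirely.

```swift
import Foundation

/// Hypothetical helper: returns the names of .mlmodelc bundles inside
/// `repoDirectory` that have no coremldata.bin at their top level, sorted.
func bundlesMissingCoreMLData(in repoDirectory: URL) throws -> [String] {
    let fileManager = FileManager.default
    // Throws if the repo folder itself is missing or unreadable.
    let entries = try fileManager.contentsOfDirectory(atPath: repoDirectory.path)
    return entries
        .filter { $0.hasSuffix(".mlmodelc") }
        .filter { bundle in
            let dataFile = repoDirectory
                .appendingPathComponent(bundle, isDirectory: true)
                .appendingPathComponent("coremldata.bin")
            return !fileManager.fileExists(atPath: dataFile.path)
        }
        .sorted()
}
```

An empty result combined with a modelNotFound error points back at the folder names (or the repo path passed to load) rather than at incomplete bundles.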

With this setup the ASR pipeline stays entirely offline while still using the exact CoreML bundles FluidAudio ships on HuggingFace.
