Files
FluidAudio/Scripts
Alex 7c9be31c05 fix(benchmark): repair 3 pre-existing script/download bugs (#534)
## Summary

Three unrelated pre-existing bugs surfaced while validating PR #515. All
of them block `Scripts/parakeet_subset_benchmark.sh --download` from
succeeding, but none are related to the v3 script-filtering work.
Consolidating into one PR since each fix is ~1–3 lines.

### 1. Japanese TDT folder-name mismatch

`Scripts/parakeet_subset_benchmark.sh` verifies the Japanese TDT model
at `$MODELS_DIR/parakeet-tdt-ja/`, but the folder was renamed to
`parakeet-ja` in 4ef33f0b6 (`Repo.parakeetJa.folderName =
"parakeet-ja"`). Result: `verify_assets()` always reported missing
assets even on a fully provisioned machine. One-line rename to match.

### 2. EOU streaming CLI writes to wrong path

`ParakeetEouCommand` had a default / `--use-cache` split where the
default branch produced `$CWD/Models/<chunk>/<chunk>/` (double-nested,
relative to CWD) as the load path, while `downloadModels()` called
`deletingLastPathComponent().deletingLastPathComponent()` then
`DownloadUtils.downloadRepo(repo, to:)` which appended `folderName =
"parakeet-eou-streaming/<chunk>"`. Net effect: files landed at
`$CWD/Models/parakeet-eou-streaming/<chunk>/` while `loadModels()`
looked at `$CWD/Models/<chunk>/<chunk>/` — model load failed silently.

Unified on Application Support (matches every other CoreML model in
FluidAudio). `--use-cache` retained as a no-op flag for backward
compatibility.

### 3. earnings22-kws dataset 404

HuggingFace consolidated `argmaxinc/earnings22-kws-golden` into
`argmaxinc/contextual-earnings22`. The old id now returns 404 from the
Datasets-Server REST API (no redirect follow). The new dataset has the
same feature schema (`audio`, `file_id`, `text`, `dictionary`, ...), so
swapping the id is sufficient — no downstream consumer changes needed.

## Test plan

Ran `Scripts/parakeet_subset_benchmark.sh --download` end-to-end:

- [x] `verify_assets` correctly resolves `parakeet-ja/` (all 5 expected
files present)
- [x] EOU warmup: `Models downloaded to ~/Library/Application
Support/FluidAudio/Models/parakeet-eou-streaming/320ms`, 0.00% WER on
warmup file
- [x] earnings22-kws: 1140+ files downloaded (was 0 before), no 404
- [x] `swift build` passes

Out of scope but observed (pre-existing, unrelated):
- `ctc-earnings-benchmark --auto-download` does not actually
auto-download CTC-110m model
- THCHS-30 dataset hit HF IP rate limit (429) — transient
<!-- devin-review-badge-begin -->

---

<a href="https://app.devin.ai/review/fluidinference/fluidaudio/pull/534"
target="_blank">
  <picture>
<source media="(prefers-color-scheme: dark)"
srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1">
<img
src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1"
alt="Open in Devin Review">
  </picture>
</a>
<!-- devin-review-badge-end -->
2026-04-21 04:22:18 -04:00
..