Alex 974841c9b0 refactor: custom vocabulary restructure, dead code removal, pure Swift dataset download (#276)
Closes #268

## Summary

Restructures the custom vocabulary (context biasing) module into a clean
subdirectory layout, removes dead code, and replaces the Python-based
dataset downloader with pure Swift.

### From issue #268 checklist
- [x] Unit tests for custom vocab
- [x] Custom vocab structural reorg
- [x] Clean up 110m HF repo and Swift pathing logic
- [x] Break up large files
- [x] Verify Parakeet TDT v3 and v2 via benchmarking
- [x] Verify Parakeet EOU works too
- [ ] Pure Swift dataset download for custom vocab
- [ ] Updated custom vocab doc
- [ ] Extended CLI test for the new unit tests

### Changes

**Directory restructure**: `ContextBiasing/` → `CustomVocabulary/` with
subdirectories:
- `WordSpotting/` — CTC keyword spotting (DP algorithm, inference,
tokenizer, models)
- `Rescorer/` — vocabulary rescoring (token rescoring, evaluation,
utilities)
  - `BKTree/` — experimental BK-tree approximate string matching
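
The `BKTree/` subdirectory is described only as experimental approximate string matching. The following is a minimal, generic Swift sketch of a BK-tree keyed by Levenshtein distance. The names `BKTree`, `BKNode` and `levenshtein` are hypothetical, and this is not the code in `Rescorer/BKTree/`. It only illustrates the technique. Each child edge is labeled with its edit distance to the parent, and search uses the triangle inequality to skip subtrees that cannot contain a match.

```swift
// Hypothetical sketch of a BK-tree; not the FluidAudio Rescorer/BKTree implementation.

/// Levenshtein (edit) distance between two strings, computed over Characters
/// with a two-row dynamic-programming table.
func levenshtein(_ a: String, _ b: String) -> Int {
    let s = Array(a), t = Array(b)
    var prev = Array(0...t.count)  // distances from the empty prefix of `s`
    for (i, x) in s.enumerated() {
        var curr = [i + 1] + [Int](repeating: 0, count: t.count)
        for (j, y) in t.enumerated() {
            curr[j + 1] = min(prev[j + 1] + 1,             // delete x
                              curr[j] + 1,                 // insert y
                              prev[j] + (x == y ? 0 : 1))  // substitute or match
        }
        prev = curr
    }
    return prev[t.count]
}

/// A node holds one term; each child is keyed by its edit distance to this term.
final class BKNode {
    let term: String
    var children: [Int: BKNode] = [:]
    init(_ term: String) { self.term = term }
}

struct BKTree {
    private var root: BKNode?

    /// Walks down edges labeled with the distance to each visited node
    /// until it finds a free slot for the new term.
    mutating func insert(_ term: String) {
        guard var node = root else {
            root = BKNode(term)
            return
        }
        while true {
            let d = levenshtein(term, node.term)
            if d == 0 { return }  // term is already stored
            if let child = node.children[d] {
                node = child
            } else {
                node.children[d] = BKNode(term)
                return
            }
        }
    }

    /// Returns every stored term within `maxDistance` edits of `query`,
    /// ordered by distance, then alphabetically.
    func search(_ query: String, maxDistance: Int) -> [(term: String, distance: Int)] {
        guard let root = root else { return [] }
        var results: [(term: String, distance: Int)] = []
        var stack = [root]
        while let node = stack.popLast() {
            let d = levenshtein(query, node.term)
            if d <= maxDistance { results.append((term: node.term, distance: d)) }
            // Triangle inequality: a match below the child at edge distance k
            // requires |k - d| <= maxDistance, so all other subtrees are skipped.
            for (k, child) in node.children where abs(k - d) <= maxDistance {
                stack.append(child)
            }
        }
        return results.sorted { ($0.distance, $0.term) < ($1.distance, $1.term) }
    }
}

var tree = BKTree()
for word in ["nvidia", "invidia", "amazon", "amazing", "apple"] {
    tree.insert(word)
}
print(tree.search("nvidea", maxDistance: 2).map { $0.term })  // prints ["nvidia", "invidia"]
```

A BK-tree only needs a true metric, so the same structure would work with other distances, such as a phonetic or token-level edit distance. Whether the experimental module uses plain character Levenshtein is not stated here.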

## Benchmark verification

All benchmarks were verified against the reference values in
`Documentation/Benchmarks.md`. Minor differences may be due to
differences in Mac hardware:

| Model | Metric | This PR | Reference | Status |
|-------|--------|---------|-----------|--------|
| TDT v3 | WER | 2.6% | 2.5% | within noise |
| TDT v3 | CER | 1.0% | 1.0% | match |
| TDT v2 | WER | 2.2% | 2.1% | within noise |
| TDT v2 | CER | 0.7% | 0.7% | match |
| CTC Earnings22 | WER | 14.68% | 14.68% | match |
| CTC Earnings22 | Vocab F-score | 91.6% | 91.7% | within noise |
| EOU 160ms | WER | 8.29% | 8.29% | match |
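
WER in the table is a word-level edit distance normalized by the reference length, WER = (S + D + I) / N. CER applies the same computation to characters. The following is a minimal Swift sketch of that arithmetic, plus the standard F1 definition for the vocabulary F-score. The helper names are hypothetical. The only text normalization here is lowercasing and whitespace splitting, whereas real benchmark harnesses usually normalize more aggressively. How the benchmark counts true and false positives for vocabulary terms is not stated in this PR and is assumed to follow the usual precision/recall convention.

```swift
/// Generic Levenshtein distance over any Equatable sequence (words or characters).
func editDistance<T: Equatable>(_ a: [T], _ b: [T]) -> Int {
    var prev = Array(0...b.count)
    for (i, x) in a.enumerated() {
        var curr = [i + 1] + [Int](repeating: 0, count: b.count)
        for (j, y) in b.enumerated() {
            curr[j + 1] = min(prev[j + 1] + 1,             // deletion
                              curr[j] + 1,                 // insertion
                              prev[j] + (x == y ? 0 : 1))  // substitution or match
        }
        prev = curr
    }
    return prev[b.count]
}

/// WER = (substitutions + deletions + insertions) / reference word count.
/// Assumption: normalization is just lowercasing and splitting on whitespace.
func wordErrorRate(reference: String, hypothesis: String) -> Double {
    let tokenize = { (text: String) -> [String] in
        text.lowercased().split(whereSeparator: { $0.isWhitespace }).map(String.init)
    }
    let ref = tokenize(reference)
    let hyp = tokenize(hypothesis)
    precondition(!ref.isEmpty, "WER is undefined for an empty reference")
    return Double(editDistance(ref, hyp)) / Double(ref.count)
}

/// Standard F1 score: harmonic mean of precision and recall.
func fScore(truePositives tp: Int, falsePositives fp: Int, falseNegatives fn: Int) -> Double {
    let precision = tp + fp == 0 ? 0 : Double(tp) / Double(tp + fp)
    let recall = tp + fn == 0 ? 0 : Double(tp) / Double(tp + fn)
    return precision + recall == 0 ? 0 : 2 * precision * recall / (precision + recall)
}

// One deletion ("the") out of six reference words.
let wer = wordErrorRate(reference: "the cat sat on the mat", hypothesis: "the cat sat on mat")
print(wer)
```

Because WER is averaged over a whole test set, a difference of 0.1 percentage points, as between 2.6% and 2.5%, corresponds to only a handful of word edits.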

2026-01-30 12:17:09 -05:00


name: Build and Test

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

# Cancel an in-progress run when a newer commit arrives on the same ref.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  build-and-test-macos:
    name: Build and Test Swift Package (macOS)
    runs-on: macos-15
    steps:
      - name: Checkout code
        uses: actions/checkout@v5

      - name: Check versions
        run: |
          swift --version
          xcodebuild -version

      - name: Build package
        run: swift build

      # Run tests in parallel, one worker per CPU reported by `sysctl hw.ncpu`.
      - name: Run tests
        run: swift test --parallel --num-workers $(sysctl -n hw.ncpu)
        timeout-minutes: 20

  build-ios:
    name: Build (iOS)
    runs-on: macos-15
    steps:
      - name: Checkout code
        uses: actions/checkout@v5

      - name: List available simulators
        run: xcrun simctl list devices available

      # Select Xcode 16 and download the iOS platform for the generic iOS destination.
      - name: Install iOS platform
        run: |
          sudo xcode-select -s /Applications/Xcode_16.app
          xcodebuild -downloadPlatform iOS

      - name: Build for iOS
        run: |
          xcodebuild -scheme FluidAudio \
            -destination 'generic/platform=iOS' \
            -derivedDataPath .build \
            build