Files
Alex 06fc2ab3f0 Fix EOU frame count calculation for center-padded mel spectrograms (#444)
## Summary

Fixes #441 - StreamingEouAsrManager with 320ms chunks was producing
incorrect frame counts, causing shape mismatches.

- Updated `AudioMelSpectrogram.computeFlat()` to use correct frame count
formula
- Updated `AudioMelSpectrogram.computeFlatTransposed()` with `.center`
padding mode
- Changed from `numFrames = audioCount / hopLength` to `numFrames = 1 +
(paddedCount - winLength) / hopLength`
- This accounts for nFFT/2 center padding applied before STFT
processing, matching NeMo's computation

## Root Cause

The original formula didn't account for the center padding (nFFT/2 on
each side) that's applied to audio before windowing. This caused the
frame count to be off by 1, producing 63 frames instead of 64 for 630ms
audio chunks.

## Test Results

### Frame Count Validation Tests
Added `EouChunkSizeFrameCountTests` - all passing:
-  160ms: 17 frames (was 16)
-  320ms: 64 frames (was 63) ← **Issue #441 error case**
-  1280ms: 129 frames (was 128)
-  Tested with 10 different audio lengths per chunk size

### Integration Tests (10 files per chunk size)
**30 transcriptions total - 100% success rate:**

| Chunk Size | Files | Success | Avg WER | Overall WER |
|------------|-------|---------|---------|-------------|
| 160ms | 10/10 | 100% | 8.40% | 9.64% |
| 320ms | 10/10 | 100% | 4.92% | 5.72% |
| 1280ms | 10/10 | 100% | 7.19% | 7.83% |

** No shape mismatch errors detected across all 30 transcriptions**

The 320ms chunk size (the problematic one from issue #441) now works
perfectly and actually achieves the lowest WER!

## Test Plan

- [x] All `AudioMelSpectrogramTests` pass
- [x] Added `EouChunkSizeFrameCountTests` - all passing
- [x] Integration test: 10 files × 3 chunk sizes = 30 successful
transcriptions
- [x] WER calculation confirms transcription quality maintained (5-10%
WER)
- [x] Verified no shape mismatch errors

All tests pass successfully.
2026-03-27 18:41:36 -04:00

2.5 KiB

FluidAudio - Agent Development Guide

Build & Test Commands

swift build                                    # Build project
swift build -c release                        # Release build
swift test                                     # Run all tests
swift test --filter CITests                   # Run single test class
swift test --filter CITests.testPackageImports # Run single test method
swift format --in-place --recursive --configuration .swift-format Sources/ Tests/

Architecture

  • FluidAudio/: Main library (ASR/, Diarizer/, VAD/, Shared/ modules)
  • FluidAudioCLI/: CLI tool with benchmarking and processing commands
  • Tests/FluidAudioTests/: Comprehensive test suite
  • Models: Auto-downloaded from HuggingFace with CoreML compilation
  • Processing Pipeline: Audio → VAD → Diarization → ASR → Timestamped transcripts

Critical Rules

  • NEVER use @unchecked Sendable - implement proper thread safety with actors/MainActor
  • NEVER create dummy/mock models or synthetic audio data - use real models only
  • NEVER create simplified versions - implement full solutions or consult first
  • NEVER run git push unless explicitly requested by user
  • Add unit tests when writing new code

Code Style (swift-format config)

  • Line length: 120 chars, 4-space indentation
  • Import order: Alphabetical preferred (import CoreML, import Foundation, import OSLog), but OrderedImports rule is disabled due to Swift 6.1 (GitHub Actions CI) vs 6.3 (local) formatter incompatibility
  • Naming: lowerCamelCase for variables/functions, UpperCamelCase for types
  • Error handling: Use proper Swift error handling, no force unwrapping in production
  • Documentation: Triple-slash comments (///) for public APIs
  • Thread safety: Use actors, @MainActor, or proper locking - never @unchecked Sendable
  • Control flow: Prefer flattened if statements with early returns/continues over nested if statements. Use guard statements and inverted conditions to exit early. Nested if statements should be absolutely avoided.

Clean code

  • When adding new interfaces, make sure that the API is consistent with the other model managers
  • Files should be isolated and the code should contain a single responsibility for each

Mobius Plan

When users ask you to perform tasks that might be more compilcated, make sure you look at PLANS.md and follow the instructions there to plan the change out first and follow the instructions there. The plans should be in a .mobius/ folder and never committed directly to Github