mirror of
https://github.com/FluidInference/FluidAudio.git
synced 2026-05-12 20:20:36 +00:00
7e51dc6903
This PR addresses three high-priority consistency improvements in the Parakeet ASR folder from issue #457. ## Summary - ✅ **Task 1:** Standardized lifecycle method names across all managers (13 files) - ✅ **Task 2:** Consolidated ~230 lines of duplicate token deduplication logic - ✅ **Task 3:** Extracted shared streaming code into reusable utilities ## Changes ### 1. Lifecycle Method Standardization Unified naming conventions to eliminate confusion: | Manager | Old Method | New Method | |---------|-----------|------------| | `AsrManager` | `loadModels(_:)` | `configure(models:)` | | `SlidingWindowAsrSession` | `initialize()` | `loadModels()` | | `SlidingWindowAsrManager` | `start()` | `startStreaming()` | | `StreamingEouAsrManager` | `loadModelsFromHuggingFace()` | `loadModels()` | **Files updated:** 5 managers + 8 CLI commands ### 2. Token Deduplication Consolidation Extracted duplicate matching algorithms into generic, type-safe utilities: **New Files:** - `SequenceMatch.swift` - Data structure for sequence matches - `SequenceMatcher.swift` - 5 reusable matching algorithms: - `findSuffixPrefixMatch()` - O(n) greedy boundary detection - `findBoundedSubstringMatch()` - Windowed search - `findLongestCommonSubsequence()` - O(n²) LCS via DP - `findContiguousMatches()` - Longest consecutive run - `consolidateMatches()` - Merge adjacent matches - `TokenDeduplicationRegressionTests.swift` - 12 comprehensive tests **Refactored:** - `AsrManager+TokenProcessing.swift` - Reduced from ~65 to ~40 lines (-38%) - `ChunkProcessor.swift` - Removed ~77 lines of duplicate code ### 3. Streaming Code Extraction Created utilities for common patterns in both `StreamingEouAsrManager` and `StreamingNemotronAsrManager`: **New Utilities:** - `EncoderCacheManager` - Cache initialization and extraction - `StreamingAsrUtils` - Audio buffering, state reset, token decoding ## Impact | Metric | Result | |--------|--------| | **Duplicate code eliminated** | ~230 lines | | **New reusable utilities** | 430 lines | | **Test coverage** | +12 regression tests | | **API consistency** | Unified lifecycle naming | | **Performance** | No regression ✅ | | **WER** | 0.4% (verified) ✅ | | **RTFx** | 43.3x (verified) ✅ | | **Tests** | 25/25 passing ✅ | ## Testing ```bash # Token deduplication regression tests swift test --filter TokenDeduplicationRegressionTests # ✅ 12/12 tests passing # Nemotron streaming tests swift test --filter StreamingNemotronAsrManagerTests # ✅ 16/16 tests passing # ASR benchmark (no WER regression) swift run -c release fluidaudiocli asr-benchmark --max-files 10 # ✅ WER: 0.4%, RTFx: 43.3x ``` ## Breaking Changes ⚠️ This PR contains breaking API changes: - Renamed lifecycle methods (no deprecation wrappers) - All call sites updated in this PR Closes #457 <!-- devin-review-badge-begin --> --- <a href="https://app.devin.ai/review/fluidinference/fluidaudio/pull/494" target="_blank"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1"> <img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open with Devin"> </picture> </a> <!-- devin-review-badge-end --> ---------