## Summary
- Update text-processing-rs description in README and PostProcessing docs to reflect current capabilities
- Now mentions ITN + TN support across 7 languages (EN, DE, ES, FR, HI, JA, ZH)
- Added 100% NeMo test compatibility note (3,011 tests passing)

## Test plan
- [x] Docs-only change, no code affected
# Text Processing

## Overview
text-processing-rs is a Rust port of NVIDIA NeMo Text Processing with a Swift wrapper. It provides both Inverse Text Normalization (ITN) and Text Normalization (TN) across 7 languages (EN, DE, ES, FR, HI, JA, ZH) and passes 100% of the NeMo compatibility tests (3,011 tests).
## Inverse Text Normalization (ITN)
ITN converts spoken-form ASR output to written form — useful for post-processing ASR transcriptions:
| Input (spoken) | Output (written) |
|---|---|
| "two hundred" | "200" |
| "five dollars and fifty cents" | "$5.50" |
| "january fifth twenty twenty five" | "January 5, 2025" |
| "two thirty pm" | "2:30 p.m." |
| "test at gmail dot com" | "test@gmail.com" |
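
To make the spoken-to-written direction concrete, here is a toy sketch that handles only English cardinals such as "two hundred". It is not how text-processing-rs is implemented; `smallNumbers` and `writtenCardinal` are hypothetical names introduced purely for illustration, and the parser is deliberately permissive (it does not reject odd sequences such as "twenty twenty").

```swift
// Toy lookup table; a real ITN grammar covers far more than cardinals.
let smallNumbers: [String: Int] = [
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19, "twenty": 20, "thirty": 30, "forty": 40,
    "fifty": 50, "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
]

/// Hypothetical helper: converts a spoken English cardinal to digits,
/// or returns nil if any word is not a recognised number word.
func writtenCardinal(_ spoken: String) -> String? {
    var total = 0          // completed "thousand" groups
    var current = 0        // group currently being accumulated
    var sawNumber = false
    for word in spoken.lowercased().split(separator: " ") {
        let w = String(word)
        if let value = smallNumbers[w] {
            current += value
        } else if w == "hundred" {
            current = max(current, 1) * 100
        } else if w == "thousand" {
            total += max(current, 1) * 1000
            current = 0
        } else {
            return nil     // unknown word: leave the text to a real normalizer
        }
        sawNumber = true
    }
    return sawNumber ? String(total + current) : nil
}

print(writtenCardinal("two hundred") ?? "unparsed")
```

Currency, dates, times and e-mail addresses each need their own rules, which is what the full grammar-based library provides.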
## Text Normalization (TN)
TN converts written-form text to spoken form — useful for TTS preprocessing:
| Input (written) | Output (spoken) |
|---|---|
| "123" | "one hundred twenty three" |
| "$5.50" | "five dollars and fifty cents" |
| "January 5, 2025" | "january fifth twenty twenty five" |
| "2:30 PM" | "two thirty p m" |
| "1st" | "first" |
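
The written-to-spoken direction can be illustrated the same way. The sketch below verbalizes integers from 0 to 999 in the style of the first table row; `spokenCardinal` is a hypothetical helper, not part of text-processing-rs or FluidAudio, and it ignores everything a real TN system handles beyond small cardinals.

```swift
/// Hypothetical helper: spells out an integer in 0..<1000 as English words.
func spokenCardinal(_ n: Int) -> String {
    precondition((0..<1000).contains(n), "toy sketch only covers 0...999")
    let ones = ["zero", "one", "two", "three", "four", "five", "six", "seven",
                "eight", "nine", "ten", "eleven", "twelve", "thirteen",
                "fourteen", "fifteen", "sixteen", "seventeen", "eighteen",
                "nineteen"]
    let tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty",
                "seventy", "eighty", "ninety"]

    if n < 20 { return ones[n] }
    if n < 100 {
        let head = tens[n / 10]
        return n % 10 == 0 ? head : "\(head) \(ones[n % 10])"
    }
    // Hundreds: spell the leading digit, then recurse on the remainder.
    let head = "\(ones[n / 100]) hundred"
    return n % 100 == 0 ? head : "\(head) \(spokenCardinal(n % 100))"
}

print(spokenCardinal(123))
```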
## Using with FluidAudio
FluidAudio includes optional support for text-processing-rs through the `TextNormalizer` class. The native library is loaded dynamically, so it is entirely optional: if it is not linked, `normalize()` returns the input unchanged.
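
The optional-loading behavior described above can be sketched with `dlopen` and `dlsym`. This is a minimal illustration under stated assumptions, not FluidAudio's actual implementation: the type `OptionalNormalizer`, the symbol names `nemo_normalize` and `nemo_free_string`, their C signatures, and the library file name are all hypothetical.

```swift
import Foundation

// Hypothetical C signatures; the symbols actually exported by
// libnemo_text_processing may have different names and shapes.
typealias NormalizeFn = @convention(c) (UnsafePointer<CChar>) -> UnsafeMutablePointer<CChar>?
typealias FreeStringFn = @convention(c) (UnsafeMutablePointer<CChar>?) -> Void

struct OptionalNormalizer {
    private let normalizeFn: NormalizeFn?
    private let freeFn: FreeStringFn?

    init(libraryPath: String = "libnemo_text_processing.dylib") {
        // dlopen/dlsym return nil when the library or a symbol is missing.
        guard let handle = dlopen(libraryPath, RTLD_NOW),
              let normalizeSym = dlsym(handle, "nemo_normalize"),
              let freeSym = dlsym(handle, "nemo_free_string") else {
            normalizeFn = nil
            freeFn = nil
            return
        }
        normalizeFn = unsafeBitCast(normalizeSym, to: NormalizeFn.self)
        freeFn = unsafeBitCast(freeSym, to: FreeStringFn.self)
    }

    var isNativeAvailable: Bool { normalizeFn != nil }

    /// Falls back to returning the input when the native library is absent.
    func normalize(_ text: String) -> String {
        guard let normalizeFn = normalizeFn, let freeFn = freeFn else { return text }
        guard let raw = text.withCString({ normalizeFn($0) }) else { return text }
        defer { freeFn(raw) }   // the library that allocated the string frees it
        return String(cString: raw)
    }
}

let fallback = OptionalNormalizer(libraryPath: "/nonexistent/libnemo_text_processing.dylib")
print(fallback.isNativeAvailable, fallback.normalize("two hundred"))
```

Resolving symbols at runtime rather than at link time is what lets an app build and run without the native library present; callers only see the pass-through result.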
### ITN (Spoken to Written)

```swift
import FluidAudio

let normalizer = TextNormalizer.shared

// Check if native library is available
if normalizer.isNativeAvailable {
    print("ITN version: \(normalizer.version ?? "unknown")")
}

// Normalize spoken-form text
let result = normalizer.normalize("two hundred dollars")
// Returns "$200" (with native library) or "two hundred dollars" (without)
```
### TN (Written to Spoken)

```swift
// Convert written text to spoken form for TTS
let spokenPrice = normalizer.tnNormalize("$5.50")
// Returns "five dollars and fifty cents"

// A distinct name avoids redeclaring the previous constant
let spokenDate = normalizer.tnNormalize("January 5, 2025")
// Returns "january fifth twenty twenty five"
```
### With ASR Results

```swift
// Transcribe audio
let asrResult = try await asrManager.transcribe(samples, source: .system)

// Normalize the result (ITN: spoken → written)
let normalizedResult = normalizer.normalize(result: asrResult)
print(normalizedResult.text) // Written form
```
## Linking the Native Library
To enable text processing support, link your app against `libnemo_text_processing`:
- Build text-processing-rs for your target platform
- Add the library to your Xcode project's linker settings
- `TextNormalizer.isNativeAvailable` will return `true`
See the text-processing-rs README for build instructions.