mirror of https://github.com/FluidInference/FluidAudio.git synced 2026-05-12 20:20:36 +00:00



Alex d83c9e2587 docs: update text-processing-rs description with multilingual support (#365)

## Summary

- Update text-processing-rs description in README and PostProcessing docs to reflect current capabilities
- Now mentions ITN + TN support across 7 languages (EN, DE, ES, FR, HI, JA, ZH)
- Added 100% NeMo test compatibility note (3,011 tests passing)

## Test plan

- [x] Docs-only change, no code affected

2026-03-12 22:44:22 -04:00

2.8 KiB


Text Processing

Overview

text-processing-rs provides both Inverse Text Normalization (ITN) and Text Normalization (TN) across 7 languages (EN, DE, ES, FR, HI, JA, ZH). It is a Rust port of NVIDIA NeMo Text Processing with a Swift wrapper, and it passes 100% of the NeMo test suite (3,011 tests).

Inverse Text Normalization (ITN)

ITN converts spoken-form ASR output into written form, which is useful for post-processing ASR transcriptions:

| Input (spoken) | Output (written) |
| --- | --- |
| "two hundred" | "200" |
| "five dollars and fifty cents" | "$5.50" |
| "january fifth twenty twenty five" | "January 5, 2025" |
| "two thirty pm" | "2:30 p.m." |
| "test at gmail dot com" | "test@gmail.com" |
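To make the spoken-to-written direction concrete, here is a deliberately tiny sketch that handles only English cardinal numbers below one million. It is not how text-processing-rs works internally (the real library covers many semiotic classes and languages); the function name spokenCardinalToDigits is hypothetical and exists only for illustration.

```swift
/// Toy ITN: converts a spoken English cardinal ("two hundred",
/// "one hundred twenty three") into digits. Returns nil as soon as it sees
/// a word it does not recognise, so other rules could handle that text.
func spokenCardinalToDigits(_ spoken: String) -> String? {
    let values: [String: Int] = [
        "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
        "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
        "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
        "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
        "nineteen": 19, "twenty": 20, "thirty": 30, "forty": 40,
        "fifty": 50, "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
    ]
    let words = spoken.lowercased().split(separator: " ")
    guard !words.isEmpty else { return nil }

    var total = 0    // value of completed "thousand" groups
    var current = 0  // value of the group currently being built
    for word in words {
        if let value = values[String(word)] {
            current += value
        } else if word == "hundred" {
            current *= 100
        } else if word == "thousand" {
            total += current * 1000
            current = 0
        } else {
            return nil  // unknown word: not a plain cardinal
        }
    }
    return String(total + current)
}
```

For example, spokenCardinalToDigits("two hundred") yields "200", matching the first row of the table, while "test at gmail dot com" is rejected because it is not a number.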

Text Normalization (TN)

TN converts written-form text into spoken form, which is useful for TTS preprocessing:

| Input (written) | Output (spoken) |
| --- | --- |
| "123" | "one hundred twenty three" |
| "$5.50" | "five dollars and fifty cents" |
| "January 5, 2025" | "january fifth twenty twenty five" |
| "2:30 PM" | "two thirty p m" |
| "1st" | "first" |
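The reverse direction can be sketched the same way. The toy function below spells out a non-negative integer below one million, without "and", in the style of the first row of the table. It is an illustration only, not the library's implementation, and the name integerToSpoken is hypothetical.

```swift
/// Toy TN: spells out a non-negative integer below one million,
/// e.g. 123 -> "one hundred twenty three". Returns nil when out of range.
func integerToSpoken(_ n: Int) -> String? {
    guard (0..<1_000_000).contains(n) else { return nil }
    let small = ["zero", "one", "two", "three", "four", "five", "six", "seven",
                 "eight", "nine", "ten", "eleven", "twelve", "thirteen",
                 "fourteen", "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
    let tens = ["", "", "twenty", "thirty", "forty", "fifty",
                "sixty", "seventy", "eighty", "ninety"]

    // Spells out 1...999; returns an empty list for 0.
    func belowThousand(_ x: Int) -> [String] {
        var words: [String] = []
        if x >= 100 {
            words += [small[x / 100], "hundred"]
        }
        let rest = x % 100
        if rest >= 20 {
            words.append(tens[rest / 10])
            if rest % 10 != 0 { words.append(small[rest % 10]) }
        } else if rest > 0 {
            words.append(small[rest])
        }
        return words
    }

    if n == 0 { return "zero" }
    var words: [String] = []
    if n >= 1000 { words += belowThousand(n / 1000) + ["thousand"] }
    words += belowThousand(n % 1000)
    return words.joined(separator: " ")
}
```

Since TN input is text, a caller would parse it first, for example Int("123").flatMap(integerToSpoken), which gives "one hundred twenty three".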

Using with FluidAudio

FluidAudio includes optional support for text-processing-rs through the TextNormalizer class. The native library is loaded dynamically, so it is entirely optional: if it is not linked, normalize() returns the input unchanged.
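This page does not describe how FluidAudio performs the dynamic loading. As a minimal sketch of how an optional native dependency can fall back to a passthrough, the code below looks up a C symbol in the running process with dlopen/dlsym. The symbol name nemo_normalize, its C signature, and the OptionalNormalizer type are all hypothetical assumptions, not the real text-processing-rs ABI or FluidAudio's API.

```swift
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

/// Hypothetical C signature: char *nemo_normalize(const char *input)
typealias NormalizeFn = @convention(c) (UnsafePointer<CChar>) -> UnsafeMutablePointer<CChar>?

/// Sketch of a normalizer whose native backend is optional.
struct OptionalNormalizer {
    private let nativeNormalize: NormalizeFn?

    init(symbol: String = "nemo_normalize") {  // hypothetical symbol name
        // dlopen(nil, ...) returns a handle to the running process, which
        // includes any libraries the app was linked against.
        if let handle = dlopen(nil, RTLD_NOW), let sym = dlsym(handle, symbol) {
            nativeNormalize = unsafeBitCast(sym, to: NormalizeFn.self)
        } else {
            nativeNormalize = nil
        }
    }

    /// True only when the symbol was found at runtime.
    var isNativeAvailable: Bool { nativeNormalize != nil }

    func normalize(_ text: String) -> String {
        // No native library: return the input unchanged.
        guard let nativeNormalize else { return text }
        guard let out = text.withCString({ nativeNormalize($0) }) else { return text }
        // A real binding would also release `out` with the library's own
        // deallocator; that step is omitted in this sketch.
        return String(cString: out)
    }
}
```

The design choice here is that a missing library is not an error: the lookup simply leaves the function pointer nil, and every call degrades to returning its input.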

ITN (Spoken to Written)

```swift
import FluidAudio

let normalizer = TextNormalizer.shared

// Check if native library is available
if normalizer.isNativeAvailable {
    print("ITN version: \(normalizer.version ?? "unknown")")
}

// Normalize spoken-form text
let result = normalizer.normalize("two hundred dollars")
// Returns "$200" (with native library) or "two hundred dollars" (without)
```

TN (Written to Spoken)

```swift
// Convert written text to spoken form for TTS
// (distinct constant names: redeclaring `spoken` in one scope does not compile)
let spokenAmount = normalizer.tnNormalize("$5.50")
// Returns "five dollars and fifty cents"

let spokenDate = normalizer.tnNormalize("January 5, 2025")
// Returns "january fifth twenty twenty five"
```

With ASR Results

```swift
// Transcribe audio
let asrResult = try await asrManager.transcribe(samples, source: .system)

// Normalize the result (ITN: spoken → written)
let normalizedResult = normalizer.normalize(result: asrResult)
print(normalizedResult.text)  // Written form
```
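The page does not spell out which fields normalize(result:) touches. Assuming it rewrites only the transcript text and copies everything else through, the pattern looks like the sketch below. TranscriptResult and normalized(_:using:) are hypothetical stand-ins, not FluidAudio's actual result type or API.

```swift
/// Hypothetical stand-in for an ASR result; the real FluidAudio type differs.
struct TranscriptResult {
    var text: String
    var confidence: Float
    var duration: Double
}

/// Returns a copy of `result` whose text has been passed through `normalize`,
/// leaving every other field untouched.
func normalized(_ result: TranscriptResult,
                using normalize: (String) -> String) -> TranscriptResult {
    var copy = result  // value type: this is an independent copy
    copy.text = normalize(result.text)
    return copy
}
```

Taking the normalizer as a closure keeps the sketch independent of whether a native backend is present; passing an identity closure reproduces the "library not linked" behavior.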

Linking the Native Library

To enable text processing support, link your app against libnemo_text_processing:

1. Build text-processing-rs for your target platform.
2. Add the library to your Xcode project's linker settings.
3. Once the library is linked, TextNormalizer.shared.isNativeAvailable returns true.

See the text-processing-rs README for build instructions.
