Mirror of https://github.com/FluidInference/FluidAudio.git (synced 2026-05-12 20:20:36 +00:00)
docs: update text-processing-rs description with multilingual support (#365)
## Summary
- Update text-processing-rs description in README and PostProcessing docs to reflect current capabilities
- Now mentions ITN + TN support across 7 languages (EN, DE, ES, FR, HI, JA, ZH)
- Added 100% NeMo test compatibility note (3,011 tests passing)

## Test plan
- [x] Docs-only change, no code affected
@@ -1,8 +1,12 @@
-# Post-Processing ASR Output
+# Text Processing

+## Overview
+
+**[text-processing-rs](https://github.com/FluidInference/text-processing-rs)** provides both Inverse Text Normalization (ITN) and Text Normalization (TN) across 7 languages (EN, DE, ES, FR, HI, JA, ZH). 100% NeMo test compatibility (3,011 tests). Rust port of [NVIDIA NeMo Text Processing](https://github.com/NVIDIA/NeMo-text-processing) with Swift wrapper.
+
 ## Inverse Text Normalization (ITN)

-Inverse Text Normalization converts spoken-form ASR output to written form:
+ITN converts spoken-form ASR output to written form — useful for post-processing ASR transcriptions:

 | Input (spoken) | Output (written) |
 |----------------|------------------|
@@ -12,17 +16,23 @@ Inverse Text Normalization converts spoken-form ASR output to written form:
 | "two thirty pm" | "2:30 p.m." |
 | "test at gmail dot com" | "test@gmail.com" |

-## Post-Processing Tools
+## Text Normalization (TN)

-| Tool | Description | Language |
-|------|-------------|----------|
-| **[text-processing-rs](https://github.com/FluidInference/text-processing-rs)** | Inverse Text Normalization (ITN) - converts spoken-form ASR output to written form. Rust port of [NVIDIA NeMo Text Processing](https://github.com/NVIDIA/NeMo-text-processing) with Swift wrapper. | Rust, Swift |
+TN converts written-form text to spoken form — useful for TTS preprocessing:

-## Using ITN with FluidAudio
+| Input (written) | Output (spoken) |
+|-----------------|-----------------|
+| "123" | "one hundred twenty three" |
+| "$5.50" | "five dollars and fifty cents" |
+| "January 5, 2025" | "january fifth twenty twenty five" |
+| "2:30 PM" | "two thirty p m" |
+| "1st" | "first" |

-FluidAudio includes optional support for text-processing-rs through the `TextNormalizer` class. The library uses dynamic loading, so it's completely optional - if not linked, `normalize()` returns the input unchanged.
+## Using with FluidAudio

-### Basic Usage
+FluidAudio includes optional support for text-processing-rs through the `TextNormalizer` class. The library uses dynamic loading, so it's completely optional — if not linked, `normalize()` returns the input unchanged.
+
+### ITN (Spoken to Written)

 ```swift
 import FluidAudio
@@ -39,20 +49,31 @@ let result = normalizer.normalize("two hundred dollars")
 // Returns "$200" (with native library) or "two hundred dollars" (without)
 ```

+### TN (Written to Spoken)
+
+```swift
+// Convert written text to spoken form for TTS
+let spoken = normalizer.tnNormalize("$5.50")
+// Returns "five dollars and fifty cents"
+
+let spoken = normalizer.tnNormalize("January 5, 2025")
+// Returns "january fifth twenty twenty five"
+```
+
 ### With ASR Results

 ```swift
 // Transcribe audio
 let asrResult = try await asrManager.transcribe(samples, source: .system)

-// Normalize the result
+// Normalize the result (ITN: spoken → written)
 let normalizedResult = normalizer.normalize(result: asrResult)
 print(normalizedResult.text) // Written form
 ```

 ### Linking the Native Library

-To enable ITN support, link your app against `libnemo_text_processing`:
+To enable text processing support, link your app against `libnemo_text_processing`:

 1. Build text-processing-rs for your target platform
 2. Add the library to your Xcode project's linker settings
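Note on the `TextNormalizer` paragraph in the hunk above: the docs say the native library is resolved through dynamic loading and that `normalize()` falls back to returning its input. The sketch below shows one common way to get that behavior with `dlopen`/`dlsym`. It is not FluidAudio's implementation. The type `OptionalNormalizer` and the symbol names `hypothetical_normalize` and `hypothetical_free_string` are assumptions made for illustration, since the diff does not show the symbols exported by `libnemo_text_processing`.

```swift
import Foundation

// Hypothetical C signatures. The symbols actually exported by
// libnemo_text_processing are not shown in this diff.
typealias NormalizeFn = @convention(c) (UnsafePointer<CChar>) -> UnsafeMutablePointer<CChar>?
typealias FreeFn = @convention(c) (UnsafeMutablePointer<CChar>?) -> Void

struct OptionalNormalizer {
    private let normalizeFn: NormalizeFn?
    private let freeFn: FreeFn?

    init(normalizeSymbol: String = "hypothetical_normalize",
         freeSymbol: String = "hypothetical_free_string") {
        // dlopen(nil, ...) returns a handle to the running program, so dlsym
        // finds a symbol only if a library exporting it was linked in.
        let handle = dlopen(nil, RTLD_NOW)
        if let n = dlsym(handle, normalizeSymbol),
           let f = dlsym(handle, freeSymbol) {
            normalizeFn = unsafeBitCast(n, to: NormalizeFn.self)
            freeFn = unsafeBitCast(f, to: FreeFn.self)
        } else {
            normalizeFn = nil
            freeFn = nil
        }
    }

    /// True when both symbols were resolved at runtime.
    var isAvailable: Bool { normalizeFn != nil && freeFn != nil }

    /// Returns the normalized text, or the input unchanged when the
    /// native library is not linked.
    func normalize(_ text: String) -> String {
        guard let normalize = normalizeFn, let release = freeFn,
              let output = text.withCString({ normalize($0) }) else {
            return text
        }
        defer { release(output) }  // hand the buffer back to the library
        return String(cString: output)
    }
}

let fallbackNormalizer = OptionalNormalizer()
print(fallbackNormalizer.normalize("two hundred dollars"))
```

With no library exporting those symbols, `isAvailable` is false and the call echoes its input, which matches the "without" case described in the docs. Resolving symbols at runtime keeps the dependency optional at link time, at the cost of losing compile-time checks on the C signatures.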
@@ -123,7 +123,7 @@ Enhance ASR output with post-processing:

 | Tool | Description | Language |
 |------|-------------|----------|
-| **[text-processing-rs](https://github.com/FluidInference/text-processing-rs)** | Inverse Text Normalization (ITN) - converts spoken-form ASR output to written form ("two hundred" → "200", "five dollars" → "$5"). Rust port of [NVIDIA NeMo Text Processing](https://github.com/NVIDIA/NeMo-text-processing) with Swift wrapper. | Rust, Swift |
+| **[text-processing-rs](https://github.com/FluidInference/text-processing-rs)** | Inverse Text Normalization (ITN) and Text Normalization (TN) across 7 languages (EN, DE, ES, FR, HI, JA, ZH). 100% NeMo test compatibility (3,011 tests). Converts spoken-form ASR output to written form ("two hundred" → "200", "five dollars" → "$5"). Rust port of [NVIDIA NeMo Text Processing](https://github.com/NVIDIA/NeMo-text-processing) with Swift wrapper. | Rust, Swift |

 ## Configuration

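For readers unfamiliar with ITN, the `"two hundred" → "200"` example in the README row can be illustrated with a toy cardinal parser. This is a deliberately small sketch for English number words below one million. It is not the algorithm used by text-processing-rs or NeMo, and the function name `parseCardinal` is invented here.

```swift
// Toy word-to-number tables; real ITN grammars cover far more cases
// (currency, dates, times, email addresses, other languages).
let units: [String: Int] = [
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
]
let tens: [String: Int] = [
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
]

/// Parses a spoken English cardinal such as "one hundred twenty three".
/// Returns nil if the input is empty or contains an unknown word.
func parseCardinal(_ spoken: String) -> Int? {
    let words = spoken.lowercased().split(separator: " ").map(String.init)
    guard !words.isEmpty else { return nil }
    var total = 0    // completed thousands
    var current = 0  // group being built (below one thousand)
    for word in words {
        if let value = units[word] ?? tens[word] {
            current += value
        } else if word == "hundred" {
            current = max(current, 1) * 100
        } else if word == "thousand" {
            total += max(current, 1) * 1000
            current = 0
        } else {
            return nil
        }
    }
    return total + current
}

if let value = parseCardinal("two hundred") {
    print(value)  // prints 200
}
```

The parser accumulates a running group and scales it on "hundred" or "thousand", which is enough for the simple cardinals in the examples but does nothing for the other categories the tables in the docs list.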