Commit Graph

  • 5b005b2a55 fix(tts/pockettts): skip short-text padding for mid-sentence chunks (#584) fix/pockettts-584-french-chunker Alex-Wengg 2026-05-12 10:39:58 -04:00
  • e609be9a24 fix(tts/pockettts): normalize French text and preserve mid-sentence chunks (#584) Alex-Wengg 2026-05-12 10:13:03 -04:00
  • 847a985ae4 fix(tts/pocket-tts): repair v1 voice cloning for pocket-tts 2.0.0 (#592) (#601) main Alex 2026-05-12 08:55:44 -04:00
  • a0092cf163 Fixed LS-EEND Memory Leak + Updated Docs (#605) Benjamin Lee 2026-05-12 05:53:59 -07:00
  • bfa14a1773 fix(asr): add melChunkContext opt-out flag for Issue #594 fix/asr-594-french-chunk-boundary Alex-Wengg 2026-05-12 01:34:33 -04:00
  • 927af7a829 test(asr): widen PunctuationCommitLayer debounce-cancel timing margin deprecate-cosyvoice3-mono-kokoro Alex-Wengg 2026-05-11 19:43:37 -04:00
  • 28e74d021d ci: remove obsolete Kokoro mono smoke test workflow Alex-Wengg 2026-05-11 19:14:38 -04:00
  • 49ecf8eab3 deprecate: remove CosyVoice3 and mono Kokoro (#571) Alex-Wengg 2026-05-11 19:08:12 -04:00
  • d9d06c731a ci: use CLAUDE_CODE_OAUTH_TOKEN for Claude Code Action (#600) Alex 2026-05-11 11:15:55 -04:00
  • ae1ef30240 ci: add Claude Code Action workflow (#599) Alex 2026-05-11 11:05:37 -04:00
  • 6d4e09fe37 Add Resonant to showcase (#598) TMS 2026-05-11 16:05:36 +02:00
  • fb8b779380 feat(tts/magpie): warmup API for cold-start mitigation (#60 Track 2) (#595) Alex 2026-05-10 16:51:09 -04:00
  • 2c45df3035 docs(tts): refresh Benchmarks.md per #590; wire styletts2 + --variant into tts-benchmark (#593) Alex 2026-05-09 21:47:45 -04:00
  • a400080380 Make SpeakerManager a struct and de-async DiarizerManager (#591) panv-kw 2026-05-09 23:14:07 +02:00
  • 3ff5ae2d0c refactor(tts): async StyleTTS2 predict + drop non-native Magpie synthesizeStream (#589) Alex 2026-05-09 12:54:07 -04:00
  • ce59fb14b8 feat(tts): StyleTTS2 LibriTTS (iteration_3) CoreML backend (#588) v0.14.5 Alex 2026-05-09 00:25:54 -04:00
  • b3a725db3e Fix: Prevent Metal crash when targetTokens is 0 in Kokoro TTS (#586) Greg Young 2026-05-08 17:56:13 -04:00
  • 1a27c9de31 Add Utter app to showcase in README.md (#585) Joe Petrakovich 2026-05-08 23:58:18 +09:00
  • 024bd8e454 chore(tts): remove StyleTTS2 backend, models, and references local 2026-05-07 13:32:01 -04:00
  • a53aff438b fix(tts): guard direct Float16 reads with #if arch(arm64) (CosyVoice3, StyleTTS2) (#582) Prakash Joshi Pax 2026-05-05 19:12:27 +05:45
  • 284ce520f9 feat(tts/magpie): nanocodec v4 (fp32 + int8 palettize) precision (#581) Alex 2026-05-04 23:22:34 -04:00
  • 8389c1b714 feat(tts/magpie): nanocodec v1/v2/v3 + decoder_step ANE pin + dual-precision API (#580) Alex 2026-05-04 22:35:22 -04:00
  • 6690b3a92a Remove reports.md and KokoroAne.md changes — not needed for this PR fix/kokoro-ane-zh-noise Alex-Wengg 2026-05-04 21:32:14 -04:00
  • 44f352c984 docs: tighten reports + KokoroAne noise note Alex-Wengg 2026-05-04 21:29:01 -04:00
  • a38770726f Merge remote-tracking branch 'origin/main' into fix/kokoro-ane-zh-noise Alex-Wengg 2026-05-04 21:23:51 -04:00
  • 1fc8e6ab4f docs(report): correct nanocodec upload — only .mlmodelc on HF Alex-Wengg 2026-05-04 21:20:29 -04:00
  • ccfff49e06 docs(report): add section 13 — Magpie nanocodec v1/v2/v3 versioning Alex-Wengg 2026-05-04 21:18:57 -04:00
  • 4bd31469fb refactor(tts/magpie): nanocodec v1/v2/v3 versioning (drop t24 prefix) Alex-Wengg 2026-05-04 20:19:14 -04:00
  • 2f0aab7a70 feat(tts/magpie): dual fp16/fp32 nanocodec t24 builds via MagpieNanocodecPrecision Alex-Wengg 2026-05-04 19:58:28 -04:00
  • ec70515049 feat(tts/magpie): chunked T=24 fp32 nanocodec + edge-pad (Phase C v2) Alex-Wengg 2026-05-04 18:28:22 -04:00
  • 5879a32b3d fix(tts/magpie): pin decoder_step to ANE for ~2x speedup + correct EOS Alex-Wengg 2026-05-04 16:27:56 -04:00
  • bdbff4d88a feat(tts/kokoro-ane/zh): consolidated Mandarin G2P (erhua + jieba HMM + g2pW) (#572 items 1, 3, 4) (#579) v0.14.4 Alex 2026-05-04 01:01:39 -04:00
  • 684ceaf42b feat(tts/kokoro-ane/zh): POS-aware tone sandhi (#572 item 5) (#577) Alex 2026-05-04 00:39:50 -04:00
  • f202200d1f feat(tts/kokoro-ane): user-supplied Mandarin custom lexicon (#572 item 6) (#578) Alex 2026-05-04 00:39:37 -04:00
  • 0ea7c900b0 feat(tts/kokoro-ane/zh): number/date/currency verbalization (#572 item 2) (#573) Alex 2026-05-04 00:36:31 -04:00
  • 770af29a65 feat(tts/kokoro-ane/zh): include g2pw.mlmodelc in requiredModelsZh feat/mandarin-g2pw Alex-Wengg 2026-05-04 00:31:06 -04:00
  • e4ce919762 Finalized DiarizerTimeline segment updates no longer commit tentative segments (#568) Benjamin Lee 2026-05-03 21:18:18 -07:00
  • f909bfca4b feat(tts/kokoro-ane/zh): g2pW polyphone disambiguation (#572 item 1) Alex-Wengg 2026-05-04 00:15:43 -04:00
  • 73fb51f64c feat(tts/kokoro-ane/zh): jieba HMM tail for OOV segmentation (#572 item 4) feat/mandarin-jieba-hmm Alex-Wengg 2026-05-04 00:00:09 -04:00
  • 2a64b92363 feat(tts/kokoro-ane/zh): erhua merging (#572 item 3) feat/mandarin-erhua Alex-Wengg 2026-05-03 23:44:10 -04:00
  • 98acce358a feat(tts/kokoro-ane): add Mandarin (v1.1-zh) variant (#570) Alex 2026-05-03 22:03:27 -04:00
  • 2cdee77adf docs: session report — Mandarin ASR + Kokoro v1.1-zh noise fix Alex-Wengg 2026-05-03 19:31:06 -04:00
  • 75450e4584 docs(tts/kokoro-ane): note atan2 phase correction in KokoroNoise Alex-Wengg 2026-05-03 19:11:40 -04:00
  • 821e0f97bc Fixed an LS-EEND constructor (#567) Benjamin Lee 2026-05-02 16:24:54 -07:00
  • 0a9aace382 Fixed short segment filter for trailing tentative segments in DiarizerTimeline (#566) Benjamin Lee 2026-05-02 10:11:32 -07:00
  • 5bb84bc0b0 Fix DiarizerTimeline Short Segment Filter (#565) Benjamin Lee 2026-05-01 20:25:58 -07:00
  • 15768cc250 feat(tts/pocket): chunked cond_step prefill (3-12x speedup) feat/pocket-tts-cond-step-chunked Alex-Wengg 2026-05-01 18:58:20 -04:00
  • cad8a2b563 feat(asr/cohere): long-form transcribeLong + cold/warm docs (#564) Alex 2026-05-01 10:26:27 -04:00
  • 7603ac6733 feat(tts/benchmark): tts-benchmark CLI covering all TTS backends (#557) Alex 2026-05-01 09:09:42 -04:00
  • b5d8017d1f feat(asr/parakeet-v3): default to int4-per-channel encoder (#560) Alex 2026-04-30 23:00:43 -04:00
  • 35f6ba697f Added Back the Old LS-EEND Constructors (#563) Benjamin Lee 2026-04-30 17:24:18 -07:00
  • 4065a9917e Optimized LS-EEND API (#526) Benjamin Lee 2026-04-30 14:49:32 -07:00
  • 4db4af1390 Add Dictato to showcase (#561) Alessandro 2026-04-30 14:10:27 +02:00
  • 3e3ee69084 docs: add top-level architecture overview (#559) Alex 2026-04-29 15:33:09 -04:00
  • c4d56a5cb5 Feat/pocket tts int8 precision swap (#558) Zhongpai Gao 2026-04-29 13:57:15 -04:00
  • fdf330c0f0 docs(tts): correct PocketTTS multilingual coverage in status outline docs/tts-status-outline Alex-Wengg 2026-04-29 10:05:12 -04:00
  • e0608ff12a docs(tts): add TTS-only status outline and StyleTTS2 registry rows Alex-Wengg 2026-04-29 09:50:50 -04:00
  • 00ea906c20 fix: remove module_map from MachTaskSelfWrapper subspec (#546) v0.14.3 dianshu 2026-04-29 21:25:25 +08:00
  • 248b76b8b6 feat(tts/styletts2): scaffold StyleTTS2 4-stage pipeline integration (#554) Alex 2026-04-29 09:24:44 -04:00
  • e332c18b49 docs(models): fix Cohere Transcribe Model Sources link target (#553) Alex 2026-04-28 17:57:09 -04:00
  • 5c16ee120e docs(models): add Cohere Transcribe + Qwen3-ASR rows (#551) Alex 2026-04-28 17:50:48 -04:00
  • e435319a2f docs(models): drop Parakeet CTC Japanese + ASR/TTS row cleanups (#552) Alex 2026-04-28 17:50:30 -04:00
  • d89cf01ba6 docs(models): list CosyVoice3 under Not Production Ready (#550) Alex 2026-04-28 17:28:48 -04:00
  • 3d9d422202 feat(tts/magpie): add NVIDIA Magpie TTS Multilingual 357M Swift port (#541) Alex 2026-04-28 10:54:00 -04:00
  • b82d4f2fc8 feat(tts): CosyVoice3 Mandarin zero-shot TTS port (#536) v0.14.2 Alex 2026-04-28 09:57:13 -04:00
  • eff1752ebf feat(tts/pocket): multi-language support (EN + 9 new packs) (#549) Alex 2026-04-27 22:21:43 -04:00
  • f2b5081bef refactor(tts/pocket): unify buffered synthesize on streaming pipeline feat/pocket-tts-languages Alex-Wengg 2026-04-27 18:35:53 -04:00
  • a741da0ab4 docs(tts/pocket): scrub stale legacy-English references in synthesizer Alex-Wengg 2026-04-27 18:33:31 -04:00
  • afca5c4370 refactor(tts/pocket): extract shared MLMultiArray deepCopy helper Alex-Wengg 2026-04-27 18:30:15 -04:00
  • 168d7b9af4 refactor(tts/pocket): drop dead KV-state copy dance + scrub legacy comments Alex-Wengg 2026-04-27 18:21:05 -04:00
  • f377071188 docs(tts/pocket): scrub stale legacy-English references in PocketTtsModelStore Alex-Wengg 2026-04-27 18:18:36 -04:00
  • 49e37ed095 refactor(tts/pocket): collapse Mimi key discovery to single pass Alex-Wengg 2026-04-27 18:15:47 -04:00
  • aec1ea6fac refactor(tts/pocket): make PocketTtsLayerKeys.discover expectedLayers required Alex-Wengg 2026-04-27 18:02:06 -04:00
  • bb2ae7b1bf refactor(tts/pocket): drop dead defaults + redundant mkdir in ResourceDownloader Alex-Wengg 2026-04-27 18:00:35 -04:00
  • dfd9462b08 refactor(tts/pocket): drop legacy English support — v2 uniform layout Alex-Wengg 2026-04-27 10:23:55 -04:00
  • e30105e60f refactor(tts/pocket): drop dead code across pocket-tts + cli Alex-Wengg 2026-04-27 09:42:30 -04:00
  • 38f8dfd495 refactor(tts/pocket): drop dead code in mimi/layer key discovery Alex-Wengg 2026-04-27 09:30:34 -04:00
  • cdcb6713d9 refactor(tts/pocket): drop dead code in loader + downloader Alex-Wengg 2026-04-27 09:19:15 -04:00
  • 9b1b5eda0b fix(tts/pocket): flatten nested if let in voice loader + downloader Alex-Wengg 2026-04-27 03:15:07 -04:00
  • a03af53961 fix(cli/tts): apply de-essing in --seed path; flatten nested if Alex-Wengg 2026-04-27 02:59:53 -04:00
  • adebac9489 fix(tts/pocket): bounds-check safetensors header length before adding Alex-Wengg 2026-04-27 02:47:30 -04:00
  • 2f4ba4dcd9 fix(tts/pocket): address Devin review on PR #540 Alex-Wengg 2026-04-27 02:36:19 -04:00
  • caaa506794 feat(cli/tts): add --seed for PocketTTS deterministic mode + strict --language validation Alex-Wengg 2026-04-27 02:32:14 -04:00
  • 94f4bdef07 Revert "refactor(tts/pocket): drop v1 (legacy English root) HF layout" Alex-Wengg 2026-04-25 19:05:39 -04:00
  • 6cfe4a1b1f refactor(tts/pocket): drop v1 (legacy English root) HF layout Alex-Wengg 2026-04-25 18:31:58 -04:00
  • 7f74c41815 fix(tts/pocket): address Devin review on PR #540 Alex-Wengg 2026-04-25 18:09:59 -04:00
  • 53ac418105 chore(tts/pocket): drop dead aliases + tautological tests Alex-Wengg 2026-04-25 17:31:38 -04:00
  • b6ec8f91e9 feat(tts/pocket): native v2 voice prebakes + dynamic Mimi decoder schema Alex-Wengg 2026-04-25 01:04:30 -04:00
  • 0e8f923172 feat(tts/pocket): add multi-language support (EN + 9 new packs) Alex-Wengg 2026-04-24 22:40:41 -04:00
  • 982f117eb4 fix: avoid misleading confidence warning in SlidingWindowAsrManager.finish() (#548) Alexandre Mendonça Alvaro 2026-04-27 22:50:42 -03:00
  • 7c115f6b4e feat(tts/kokoro-ane): add laishere 7-stage CoreML chain (ANE-optimized) (#547) Alex 2026-04-27 20:08:49 -04:00
  • 5dfd39798d feat(cli): add speaker-similarity command feat/speaker-similarity-cli Alex-Wengg 2026-04-26 13:12:47 -04:00
  • ca14cfe3d7 feat(tts/pocket): add multi-language pack support experiment/pocket-tts-int8 Alex-Wengg 2026-04-26 12:34:06 -04:00
  • d302273d49 fix(diarizer): convert SpeakerManager to actor, Speaker to struct (#528) (#539) v0.14.1 Alex 2026-04-23 22:13:47 -04:00
  • 2ea0727541 ASR: fix Parakeet TDT v3 emitting Cyrillic for short Latin-script utterances (#512) (#515) Alex 2026-04-23 17:43:09 -04:00
  • cc4e712643 feat(asr/cohere): ANE-friendly static-shape decoder (v2) (#537) Alex 2026-04-23 17:42:34 -04:00
  • bd5ba7e1b7 fix abbreviation handling for kokoro (#538) Sachin Desai 2026-04-23 14:40:26 -07:00
  • b10bdcb51d feat(asr): add Cohere Transcribe (INT8 encoder + FP16 cache-external decoder) (#487) v0.14.0 Alex 2026-04-23 10:59:07 -04:00
  • fa2112915d perf(tts/cosyvoice3): adopt stateful LLM-Decode via MLState Alex-Wengg 2026-04-22 01:29:32 -04:00
  • b8b8c9e32b perf(cosyvoice3): swap Flow fp32/cpuOnly → fp16/cpuAndGPU (3× faster) Alex-Wengg 2026-04-21 21:39:31 -04:00