180 Commits

Author SHA1 Message Date
Michael Hansen d310989555 Fix dictionary with multiple pronunciations 2020-01-07 19:40:59 -05:00
Michael Hansen ed581ecf9d Fixing fuzzywuzzy and others with converters 2020-01-05 20:43:15 -05:00
Michael Hansen f8aedd4ef5 Update Docker update docs 2020-01-05 16:55:37 -05:00
Michael Hansen 14c1386496 Possible fix for threading issues 2020-01-05 16:46:14 -05:00
Michael Hansen 153b642057 Add Rpi Zero to docs 2020-01-05 15:14:45 -05:00
Michael Hansen dec32102dd Merge pull request #146 from esdeboer/freebsd
sed -i is not POSIX compliant, instead make a temp copy and rename to…
2020-01-05 14:50:38 -05:00
Michael Hansen f365c69265 Merge pull request #142 from maxbachmann/cleanup
code cleanup
2020-01-05 14:50:02 -05:00
Michael Hansen 1724c328b7 Trying to fix Docker image 2020-01-05 11:14:56 -05:00
Eric de Boer 6db4a8d341 sed -i is not POSIX compliant, instead make a temp copy and rename to original. 2020-01-05 16:53:38 +01:00
Michael Hansen b70e8a8569 Copying profiles in Docker 2020-01-04 22:41:04 -05:00
Michael Hansen 2e4828da06 Fix dockerignore 2020-01-04 22:34:15 -05:00
Michael Hansen 96cfe69753 Re-generated Dockerfile 2020-01-04 22:02:31 -05:00
maxbachmann 3e8e246c1c swap vars without temp var 2020-01-05 01:02:35 +01:00
Michael Hansen 80a5008b93 Copy built-in slots to Docker 2020-01-04 16:55:38 -05:00
Michael Hansen e26ecf82f1 Bump rhasspy-nly to 0.1.4.1 2020-01-04 16:41:10 -05:00
Michael Hansen e4db52f845 Merge pull request #138 from esdeboer/master
Install dependencies before running yarn build.
2020-01-04 16:38:41 -05:00
Michael Hansen 846313e236 Documented slot programs, number ranges, converters 2020-01-04 16:28:03 -05:00
Michael Hansen b68a3fac4a Add rhasspy/days and rhasspy/months slots 2020-01-04 16:15:34 -05:00
Michael Hansen 7459f0d9d9 Add rhasspy/number 2020-01-04 15:53:30 -05:00
Michael Hansen 617b789d89 Add locales to profiles 2020-01-04 15:53:13 -05:00
Michael Hansen bb20cd280b Transforming number ranges to rhasspy/number 2020-01-04 15:04:52 -05:00
Michael Hansen ce780feb74 Add en rhasspy/number and rhasspy/days slots 2020-01-04 12:31:46 -05:00
Michael Hansen 2225262a53 Add system slots/slot programs 2020-01-04 12:28:29 -05:00
Michael Hansen 5b5529339b Minor clean up in tutorial 2020-01-04 10:46:02 -05:00
Michael Hansen 90f5c5aef7 Touch up tutorials 2020-01-04 10:42:55 -05:00
Eric de Boer 78f263582d Install dependencies before running yarn build. 2020-01-04 09:44:02 +01:00
Michael Hansen 6c8608f1a1 Merge pull request #136 from esdeboer/master
Only download Kaldi when it is requested to be installed.
2020-01-03 17:21:57 -05:00
Michael Hansen 3e5437856b Merge pull request #137 from xLAva/Feature_Fuzzywuzzy_SpeedUp
Fuzzywuzzy: major speed improvement by disabling the debug log spam
2020-01-03 17:21:17 -05:00
Michael Hansen 15aaea2810 Support siteId in /api/text-to-speech 2020-01-03 17:19:27 -05:00
xLAva 61d8930e38 Fuzzywuzzy: major speed improvement by disabling the debug log spam 2020-01-03 22:43:26 +01:00
Eric de Boer 5748b2dc3a Only download Kaldi when it is requested to be installed. 2020-01-03 20:57:21 +01:00
Michael Hansen 8f7158f7cc Only allow a single hotword to be detected by snowboy (single_detection) 2020-01-03 14:11:01 -05:00
Michael Hansen 97226286e3 Fix phonetisaurus download link in build-from-source.sh 2020-01-03 11:46:41 -05:00
Michael Hansen 896b3ddfba Run isort 2020-01-03 11:18:45 -05:00
Michael Hansen 1772f6e740 Add slot programs 2020-01-03 11:17:42 -05:00
Michael Hansen b5dfd6518b Add converter args 2020-01-03 10:36:06 -05:00
Michael Hansen 2730c131d0 Merge pull request #133 from maxbachmann/cleanup
do some code cleanup
2020-01-03 09:37:41 -05:00
maxbachmann 05ded030c8 do some code cleanup 2020-01-03 08:52:49 +01:00
Michael Hansen 3b90383145 Trying to fix Jekyll build errors 2020-01-02 23:22:02 -05:00
Michael Hansen 1bb5462150 Merge pull request #130 from kroka/patch-1
add missing ASR fields for Hermes MQTT publishing
2020-01-02 23:13:26 -05:00
Michael Hansen 95a354e2a3 Merge pull request #132 from maxbachmann/master
correct spelling mistake
2020-01-02 23:13:08 -05:00
Michael Hansen d203a3ed75 Add custom converters (programs) 2020-01-02 17:16:34 -05:00
Michael Hansen 59d473b931 Add number ranges 2020-01-02 16:37:16 -05:00
Michael Hansen 17737f7fed Bump version 2020-01-02 16:22:29 -05:00
Michael Hansen 76cf173849 Doing int conversion with built-in number conversion 2020-01-02 16:20:05 -05:00
maxbachmann 55d1cfacdd correct spelling mistake 2020-01-02 18:52:14 +01:00
Michael Hansen 4f6d02169c Force casing on slot inputs 2020-01-02 11:40:04 -05:00
Michael Hansen 74761b942f Fix overwrite_all in slot params 2020-01-02 10:57:10 -05:00
Michael Hansen b88acb3a34 Parse JSON from requests with json5 2020-01-02 10:47:58 -05:00
kroka 7b323a08bb add missing fields for Hermes publishing
prevents a null pointer access in hermes-python
2020-01-02 16:42:28 +01:00
Michael Hansen ec55dbfa5b Remove --yes from apt-get install commands 2020-01-01 22:37:23 -05:00
Michael Hansen 15af0ae3c1 Merge pull request #121 from jthomasdewald/master
Home Assistant Template Example
2020-01-01 22:35:06 -05:00
Michael Hansen f8542f7ac1 Merge pull request #126 from Romkabouter/remove-x-hassio-key
Change X-HASSIO-KEY to Authorization
2020-01-01 09:06:14 -05:00
Paul Romkes de67b3318c Change X-HASSIO-KEY to Authorization 2020-01-01 11:32:09 +01:00
jthomasdewald b47dca03aa Update command-listener doc
Changed vad_mode description to match webrtcvad docs
2019-12-31 15:03:47 -08:00
Michael Hansen 89a1921c3e Merge 2019-12-31 13:00:48 -05:00
Michael Hansen e5fe2a31b3 Add mypy to check, code cleanup 2019-12-31 12:54:10 -05:00
Michael Hansen afdd241c57 Add awake webhook 2019-12-31 12:40:56 -05:00
Michael Hansen bea38cc64f Fix wakeword issues on TTS pause 2019-12-30 21:49:20 -05:00
Michael Hansen c2562aa674 Don't disable wake system by default with TTS 2019-12-30 17:09:52 -05:00
Michael Hansen 7dec472ec4 Add update instructions to docs 2019-12-28 20:51:07 -05:00
Michael Hansen 007ea4266e Bump version 2019-12-27 22:12:38 -05:00
Michael Hansen a627f8746c Code cleanup 2019-12-27 21:19:46 -05:00
Michael Hansen 13f183afd4 Reset PyAudio on error 2019-12-27 21:17:03 -05:00
Michael Hansen 130cbeb7a8 Consolidate actor events. Stop wake on TTS speak. 2019-12-27 21:07:00 -05:00
jthomasdewald 358e7b087e Home Assistant Template Example
Example files to get Rhasspy to control any light with only one automation and one script.
2019-12-27 15:25:36 -08:00
Michael Hansen 8e2d2f2352 Play error sound when intent not recognized 2019-12-27 11:16:38 -05:00
Michael Hansen ac3c92e24a Fix pop sound in pico2wave 2019-12-27 10:59:52 -05:00
Michael Hansen a501c52954 Add /api/speech-to-text endpoint to docs 2019-12-26 22:29:22 -05:00
Michael Hansen f8f0b48140 Setting text and raw_text when intent is not recognized 2019-12-26 22:11:58 -05:00
Michael Hansen 2a8972fb99 Merge branch 'master' of https://github.com/synesthesiam/rhasspy 2019-12-26 11:10:37 -05:00
Michael Hansen ef0211505a Fix about page in docs 2019-12-26 10:56:21 -05:00
Michael Hansen c961fc8814 Updated docs to link to community site 2019-12-26 10:36:18 -05:00
Michael Hansen cbbfc23395 Merge pull request #118 from jthomasdewald/master
Clarify intent handling for sever / client setup
2019-12-25 18:04:10 -05:00
Michael Hansen 0c2a1931f6 Merge pull request #115 from frkos/patch-1
Porcupine optimizer tool is deprecated
2019-12-25 18:02:52 -05:00
Michael Hansen cfa90ea5d5 Add _text and _raw_text slots to hass events 2019-12-25 18:00:39 -05:00
jthomasdewald 414457f150 Clarify intent handling for sever / client setup 2019-12-25 10:08:18 -08:00
Michael Hansen 0dbb84b355 Logging transcription in Kaldi stt 2019-12-24 21:50:07 -05:00
Michael Hansen b1ff836c4e Using Witten-Bell method for ngrammake 2019-12-24 10:58:39 -05:00
frkos 640be7b0ac Update wake-word.md 2019-12-24 13:02:26 +03:00
frkos 421f59518a Update Porcupine wake-word docs
Porcupine optimizer tool is retired, so the link doesn't work
Accroding to the link https://github.com/Picovoice/porcupine#picovoice-console :
The console succeeds the (now retired) optimizer tool, as it can be used to train custom wake-words (Porcupine .ppn files).
2019-12-24 12:58:52 +03:00
Michael Hansen 3eb4368b37 Add missing pieces of web ui profile 2019-12-23 19:02:57 -05:00
Michael Hansen a2df6149bb Document systemd service and build from source 2019-12-23 18:42:10 -05:00
Michael Hansen 5b60b17dd3 Add example systemd service 2019-12-23 18:21:10 -05:00
Michael Hansen 62626cc6a1 Add option to send intents directly to Home Assistant 2019-12-23 14:54:47 -05:00
Michael Hansen 2e09b75f52 Minor fixes for mypy 2019-12-23 12:29:33 -05:00
Michael Hansen aa658fb29b Fix all linting errors 2019-12-23 11:58:47 -05:00
Michael Hansen ad2e208fc2 Not closing audio device by default 2019-12-23 10:27:18 -05:00
Michael Hansen f32cd5c93a Using json5 to parse profile 2019-12-23 10:26:56 -05:00
Michael Hansen 927123d491 Add extra instructions to development docs 2019-12-22 09:30:35 -05:00
Michael Hansen 9cc9e2efe6 Merge branch 'master' of https://github.com/synesthesiam/rhasspy 2019-12-22 09:08:09 -05:00
Michael Hansen b9ef100721 Merge pull request #109 from koenvervloesem/docs-development
Docs: Add development page
2019-12-22 09:07:42 -05:00
Michael Hansen da5b9b2fb5 Merge pull request #112 from koenvervloesem/docs-fixes
Docs: Fix nested unordered lists for mkdocs
2019-12-22 09:06:59 -05:00
Koen Vervloesem 27c681f758 Docs: Fix nested unordered lists for mkdocs 2019-12-22 11:42:56 +01:00
Michael Hansen 2af6130d22 Working on Rpi Zero support (armv6l) 2019-12-21 19:13:41 -05:00
Michael Hansen 268c2c5295 Bump rhasspynlu version. Remove kaldi online.conf check 2019-12-21 16:09:06 -05:00
Koen Vervloesem 797615acf7 Docs: Add development page 2019-12-21 17:12:14 +01:00
Michael Hansen ec786fd5db Merge pull request #108 from jasonhildebrand/support-marytts-effects
Support marytts effects
2019-12-20 20:55:20 -05:00
Jason Hildebrand bcda393d7b Merge branch 'master' into support-marytts-effects 2019-12-20 19:11:54 -06:00
Jason Hildebrand e7a67ad2be Add config option for marytts voice effects, and update documentation. 2019-12-20 18:59:28 -06:00
Michael Hansen 306a7e62bd Fix tutorial link to sentences 2019-12-20 16:46:13 -05:00
Michael Hansen 99b35270c8 Merge pull request #107 from koenvervloesem/docs-various-fixes
Various documentation fixes
2019-12-20 16:45:19 -05:00
Koen Vervloesem a69b445d51 Docs: Clean language 2019-12-20 22:05:45 +01:00
Koen Vervloesem 59ee156c1d Replace redirected URL for Hermes reference on snips.ai 2019-12-20 21:49:31 +01:00
Koen Vervloesem afcc2c59d4 Docs: Fix broken link 2019-12-20 21:36:50 +01:00
Koen Vervloesem 25afcc3559 Docs: Fix installation link 2019-12-20 21:08:12 +01:00
Koen Vervloesem 8cd3a10299 Docs: Too much indentation in tutorial example 2019-12-20 21:03:58 +01:00
Michael Hansen 67a841c080 Merge pull request #101 from koenvervloesem/docs-markdownlint
Fix documentation style issues with Markdownlint
2019-12-20 10:06:53 -05:00
Michael Hansen 28e098330a Add /api/intents endpoint 2019-12-20 10:02:36 -05:00
Koen Vervloesem 91f6571662 Fix documentation style issues with Markdownlint 2019-12-20 09:47:40 +01:00
Michael Hansen d9ef37c005 Fix acoustic_model training error 2019-12-19 08:25:44 -05:00
Michael Hansen f93dfe8a4e Add --noweb --nosudo to build-from-source.sh 2019-12-18 19:23:35 -05:00
Michael Hansen 1322ba3a3b Add remote intent handler 2019-12-18 19:23:28 -05:00
Michael Hansen 163cd9670c Merge branch 'master' of https://github.com/synesthesiam/rhasspy 2019-12-18 18:50:19 -05:00
Michael Hansen 580ea54b42 Merge pull request #94 from koenvervloesem/docs-reference-mqtt
Docs: Beginning of MQTT reference
2019-12-18 09:39:34 -05:00
Michael Hansen c4a9b60990 Merge pull request #95 from koenvervloesem/bump-version-to-2.4.14
Bump Rhasspy version
2019-12-18 09:19:29 -05:00
Michael Hansen 89875f1644 Bump version number 2019-12-18 09:25:40 -05:00
Michael Hansen 6f6924cee4 Fix sentences path fallback 2019-12-18 09:25:27 -05:00
Koen Vervloesem 19830d8d39 Docs: Clarify audio format specifications 2019-12-18 14:23:43 +01:00
Koen Vervloesem a9ccba37ee Docs: Fix bullet point 2019-12-18 14:19:04 +01:00
Koen Vervloesem 21ab4d4be2 Bump Rhasspy version 2019-12-18 10:56:41 +01:00
Koen Vervloesem 72383facae Add more of the supported Hermes MQTT topics 2019-12-18 09:08:30 +01:00
Koen Vervloesem d663c880f7 Docs: Beginning of MQTT reference 2019-12-17 22:22:03 +01:00
Michael Hansen cad172b450 Bump version 2019-12-17 11:12:20 -05:00
Michael Hansen 713b199669 Merge pull request #91 from underscorephil/patch-1
Update rgb tutorial to use ChangeLightState
2019-12-16 22:37:46 -05:00
Phil Jackson 8fdbddb2b8 Update rgb tutorial to use ChangeLightState
The tutorial was using both ChangeLightState and ChangeLightColor. Unifying to one should make the tutorial easier to follow.
2019-12-16 21:32:08 -06:00
Michael Hansen df974c4faf Add Home Assistant TTS integration 2019-12-16 16:32:14 -05:00
Michael Hansen 72fd9ced65 Added HA STT platform integration 2019-12-16 15:57:02 -05:00
Michael Hansen a82ecb8b52 Set wakeId and siteId in core recognize intent 2019-12-16 13:54:08 -05:00
Michael Hansen d052be5290 Using writeframes instead of writeframesraw 2019-12-16 13:28:15 -05:00
Michael Hansen 481bd3883f Avoiding snowboy segfault when trying to load missing model 2019-12-16 12:48:44 -05:00
Michael Hansen 97c68d1d0d Add wakeId and siteId to intent JSON 2019-12-16 12:43:22 -05:00
Michael Hansen b0500afa3f Add --log-level to app.py 2019-12-16 12:18:40 -05:00
Michael Hansen b356d2218f Catching ini parse errors in training 2019-12-16 11:38:32 -05:00
Michael Hansen a5ce7e6ef3 Code cleanup 2019-12-16 11:19:34 -05:00
Michael Hansen b5109850ae Large update to documentation 2019-12-16 11:12:32 -05:00
Michael Hansen 974784bb4f Add voice/locale params to MaryTTS emulation endpoint 2019-12-14 11:53:28 -05:00
Michael Hansen 81680f00d2 Add Discourse logos 2019-12-14 11:46:52 -05:00
Michael Hansen 4718800ec0 Add voice/language override for text to speech 2019-12-14 11:46:33 -05:00
Michael Hansen a9f4122875 maybe_download fix in build_from_source script 2019-12-13 22:58:42 -05:00
Michael Hansen 11d311f7ed Merge branch 'master' of https://github.com/synesthesiam/rhasspy 2019-12-13 17:11:07 -05:00
Michael Hansen 7c50ea5790 Editing documentation 2019-12-13 17:12:14 -05:00
Michael Hansen 18dca38e9a Add build-from-source.sh script 2019-12-13 15:17:35 -05:00
Michael Hansen 94b59c16bc Add MaryTTS /process emulation 2019-12-13 14:19:07 -05:00
Michael Hansen 7921287040 Allow multiple ini files for sentences in intents directory 2019-12-13 11:17:31 -05:00
Michael Hansen 38211e06ba Fix porcupine read path 2019-12-12 23:13:57 -05:00
Michael Hansen 2269bebf33 Working on splitting sentences.ini 2019-12-12 15:50:22 -05:00
Michael Hansen 565506b1df Code cleanup 2019-12-12 12:59:43 -05:00
Michael Hansen cc7e1b9a25 Add swagger API page using swagger-ui-py for quart 2019-12-12 11:56:45 -05:00
Michael Hansen c76c78674c Fix snowboy sensitivity in web ui. Add JSON profile editor. 2019-12-12 11:50:52 -05:00
Michael Hansen 3a925952e2 Fix webrtcvad timeout issue 2019-12-12 10:39:16 -05:00
Michael Hansen 9f6420e4cc Merge pull request #69 from koenvervloesem/download-message
Show URL when download fails
2019-12-12 06:36:06 -05:00
Michael Hansen 2e720ffa67 Merge pull request #66 from koenvervloesem/fix-venv
Upgrade pip in create-venv.sh script
2019-12-12 06:35:28 -05:00
Michael Hansen 55a4788cc6 Using fixed versions in requirements.txt 2019-12-12 06:31:29 -05:00
Koen Vervloesem 6cf18735c5 Show URL when download fails 2019-12-12 11:24:00 +01:00
Koen Vervloesem f27e333ac8 Use $PYTHON for pip upgrade 2019-12-12 08:48:03 +01:00
Koen Vervloesem 547a63ab59 Upgrade pip in create-venv.sh script 2019-12-12 08:43:15 +01:00
Michael Hansen 45ea5996ce Bump Rhasspy version 2019-12-11 17:16:53 -05:00
Michael Hansen 37cf6c85da Replacing numbers in sentences/slots. Using rhasspy-nlu 0.1.1 2019-12-11 17:13:51 -05:00
Michael Hansen 4383145401 Bump rhasspy-nlu version 2019-12-11 13:10:28 -05:00
Michael Hansen f7ed88de8b Support multiple wake words in snowboy 2019-12-11 11:03:10 -05:00
Michael Hansen b1d7695a4c Add VERSION file and /api/version endpoint 2019-12-11 09:41:40 -05:00
Michael Hansen f58a2451cf Merge branch 'rhasspy-nlu' 2019-12-10 16:35:14 -05:00
Michael Hansen 049a173b14 Code cleanup 2019-12-10 16:30:25 -05:00
Michael Hansen c02ff73be8 Using rhasspy-nlu for language model training and intent recognition 2019-12-10 16:17:28 -05:00
Michael Hansen cda3a02775 Merge pull request #59 from koenvervloesem/linting-fixes
Linting fixes
2019-12-09 17:29:26 -05:00
Koen Vervloesem 292a2fdf10 Style consistency fixes 2019-12-09 22:08:53 +01:00
Koen Vervloesem 07dcbebf79 Use -n instead of ! -z 2019-12-09 22:00:47 +01:00
Koen Vervloesem 59ba6e5dda Change non-standard which to command -v 2019-12-09 21:53:25 +01:00
Koen Vervloesem d08b62148d Fix import order 2019-12-09 21:39:57 +01:00
Koen Vervloesem 91bce4cb8b Remove unused import 2019-12-09 21:37:51 +01:00
Koen Vervloesem d8d6486508 Import typing.List 2019-12-09 21:36:33 +01:00
Koen Vervloesem 3f1d0946be Don't reimport requests 2019-12-09 21:34:42 +01:00
Koen Vervloesem e744330761 Fix yamllint errors 2019-12-09 21:22:22 +01:00
Michael Hansen 5e6030818d Add fr kaldi files to Docker build 2019-12-08 22:03:08 -05:00
Michael Hansen 8d0f6f37a2 Changing profile JSON clears training cache 2019-12-08 21:42:10 -05:00
Michael Hansen 396531f6ec Add French kaldi profile 2019-12-08 21:42:01 -05:00
Michael Hansen 51d9bc0c8f Minor fix 2019-12-08 14:44:40 -05:00
Michael Hansen bea3f30789 User-provided Python to create-venv 2019-12-08 14:43:32 -05:00
Michael Hansen 9f8babff34 Fix download-dependencies.sh loops 2019-12-07 23:02:11 -05:00
181 changed files with 9865 additions and 3852 deletions
+171 -22
@@ -1,26 +1,175 @@
.git/
.venv/
node_modules/
__pycache__/
test/
tools/
etc/test/
download/precise-engine/
download/kaldi/
opt/
*
!etc/qemu-*
etc/homeassistant/config/.storage
examples/typical/home-assistant/config/.storage
examples/typical-intent/home-assistant/config/.storage
examples/client-server/home-assistant/config/.storage
examples/mqtt-hermes/home-assistant/config/.storage
!download/rhasspy-tools*
!download/pocketsphinx-python.tar.gz
!download/snowboy*
!download/kaldi*
profiles/*/base_dictionary.txt
profiles/*/base_language_model.txt
profiles/*/acoustic_model/
profiles/*/g2p.fst
!requirements.txt
!dist/
!etc/wav
profiles/en-kaldi/
profiles/en-zamia/
!docker/run.sh
!docker/rhasspy
profiles/*/download/
!profiles/defaults.json
!profiles/zh/profile.json
!profiles/zh/custom_words.txt
!profiles/zh/espeak_phonemes.txt
!profiles/zh/phoneme_examples.txt
!profiles/zh/frequent_words.txt
!profiles/zh/sentences.ini
!profiles/zh/stop_words.txt
!profiles/zh/slots
!profiles/zh/slot_programs
!profiles/hi/profile.json
!profiles/hi/custom_words.txt
!profiles/hi/espeak_phonemes.txt
!profiles/hi/phoneme_examples.txt
!profiles/hi/frequent_words.txt
!profiles/hi/sentences.ini
!profiles/hi/stop_words.txt
!profiles/hi/slots
!profiles/hi/slot_programs
!profiles/el/profile.json
!profiles/el/custom_words.txt
!profiles/el/espeak_phonemes.txt
!profiles/el/phoneme_examples.txt
!profiles/el/frequent_words.txt
!profiles/el/sentences.ini
!profiles/el/stop_words.txt
!profiles/el/slots
!profiles/el/slot_programs
!profiles/es/profile.json
!profiles/es/custom_words.txt
!profiles/es/espeak_phonemes.txt
!profiles/es/phoneme_examples.txt
!profiles/es/frequent_words.txt
!profiles/es/sentences.ini
!profiles/es/stop_words.txt
!profiles/es/slots
!profiles/es/slot_programs
!profiles/it/profile.json
!profiles/it/custom_words.txt
!profiles/it/espeak_phonemes.txt
!profiles/it/phoneme_examples.txt
!profiles/it/frequent_words.txt
!profiles/it/sentences.ini
!profiles/it/stop_words.txt
!profiles/it/slots
!profiles/it/slot_programs
!profiles/ru/profile.json
!profiles/ru/custom_words.txt
!profiles/ru/espeak_phonemes.txt
!profiles/ru/phoneme_examples.txt
!profiles/ru/frequent_words.txt
!profiles/ru/sentences.ini
!profiles/ru/stop_words.txt
!profiles/ru/slots
!profiles/ru/slot_programs
!profiles/pt/profile.json
!profiles/pt/custom_words.txt
!profiles/pt/espeak_phonemes.txt
!profiles/pt/phoneme_examples.txt
!profiles/pt/frequent_words.txt
!profiles/pt/sentences.ini
!profiles/pt/stop_words.txt
!profiles/pt/slots
!profiles/pt/slot_programs
!profiles/sv/profile.json
!profiles/sv/custom_words.txt
!profiles/sv/espeak_phonemes.txt
!profiles/sv/phoneme_examples.txt
!profiles/sv/frequent_words.txt
!profiles/sv/sentences.ini
!profiles/sv/stop_words.txt
!profiles/sv/slots
!profiles/sv/slot_programs
!profiles/vi/profile.json
!profiles/vi/custom_words.txt
!profiles/vi/espeak_phonemes.txt
!profiles/vi/phoneme_examples.txt
!profiles/vi/frequent_words.txt
!profiles/vi/sentences.ini
!profiles/vi/stop_words.txt
!profiles/vi/slots
!profiles/vi/slot_programs
!profiles/ca/profile.json
!profiles/ca/custom_words.txt
!profiles/ca/espeak_phonemes.txt
!profiles/ca/phoneme_examples.txt
!profiles/ca/frequent_words.txt
!profiles/ca/sentences.ini
!profiles/ca/stop_words.txt
!profiles/ca/slots
!profiles/ca/slot_programs
!profiles/nl/profile.json
!profiles/nl/custom_words.txt
!profiles/nl/espeak_phonemes.txt
!profiles/nl/phoneme_examples.txt
!profiles/nl/frequent_words.txt
!profiles/nl/sentences.ini
!profiles/nl/stop_words.txt
!profiles/nl/slots
!profiles/nl/slot_programs
!profiles/nl/kaldi/custom_words.txt
!profiles/nl/kaldi/espeak_phonemes.txt
!profiles/nl/kaldi/phoneme_examples.txt
!profiles/de/profile.json
!profiles/de/custom_words.txt
!profiles/de/espeak_phonemes.txt
!profiles/de/phoneme_examples.txt
!profiles/de/frequent_words.txt
!profiles/de/sentences.ini
!profiles/de/stop_words.txt
!profiles/de/slots
!profiles/de/slot_programs
!profiles/de/kaldi/custom_words.txt
!profiles/de/kaldi/espeak_phonemes.txt
!profiles/de/kaldi/phoneme_examples.txt
!profiles/fr/profile.json
!profiles/fr/custom_words.txt
!profiles/fr/espeak_phonemes.txt
!profiles/fr/phoneme_examples.txt
!profiles/fr/frequent_words.txt
!profiles/fr/sentences.ini
!profiles/fr/stop_words.txt
!profiles/fr/slots
!profiles/fr/slot_programs
!profiles/fr/kaldi/custom_words.txt
!profiles/fr/kaldi/espeak_phonemes.txt
!profiles/fr/kaldi/phoneme_examples.txt
!profiles/en/profile.json
!profiles/en/custom_words.txt
!profiles/en/espeak_phonemes.txt
!profiles/en/phoneme_examples.txt
!profiles/en/frequent_words.txt
!profiles/en/sentences.ini
!profiles/en/stop_words.txt
!profiles/en/slots
!profiles/en/slot_programs
!profiles/en/kaldi/custom_words.txt
!profiles/en/kaldi/espeak_phonemes.txt
!profiles/en/kaldi/phoneme_examples.txt
!rhasspy/profile_schema.json
!rhasspy/*.py
!rhasspy/train/*.py
!rhasspy/train/jsgf2fst/*.py
!*.py
!VERSION
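The rewritten ignore file above switches to an allow-list: a leading `*` excludes everything, and each `!` line re-includes one path. A minimal Python sketch of that last-match-wins evaluation follows. It is a simplified model for illustration only, not Docker's own matcher: `is_ignored` and `PATTERNS` are hypothetical names, and `fnmatch` lets `*` cross `/`, which Docker's patterns do not.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Iterable


def is_ignored(path: str, patterns: Iterable[str]) -> bool:
    """Return True if the last pattern matching `path` is an exclusion."""
    parts = PurePosixPath(path).parts
    # A pattern that matches a parent directory also applies to its contents.
    prefixes = ["/".join(parts[: i + 1]) for i in range(len(parts))]

    ignored = False
    for pattern in patterns:
        negate = pattern.startswith("!")
        body = pattern.lstrip("!").rstrip("/")
        if any(fnmatchcase(prefix, body) for prefix in prefixes):
            # Later patterns override earlier ones.
            ignored = not negate
    return ignored


# A small subset of the patterns from the file above (illustrative only).
PATTERNS = ["*", "!VERSION", "!profiles/en/profile.json", "profiles/*/download/"]

for candidate in ("README.md", "VERSION", "profiles/en/profile.json"):
    print(candidate, "ignored" if is_ignored(candidate, PATTERNS) else "kept")
```

Because the last matching line decides, the order of the `!` exceptions relative to broader exclusions such as `profiles/*/download/` matters.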
+8 -4
@@ -1,11 +1,13 @@
.PHONY: web-dist docker manifest docs-uml g2p
.PHONY: web-dist docker manifest docs-uml g2p check
SHELL := bash
# -----------------------------------------------------------------------------
# Docker
# -----------------------------------------------------------------------------
docker: web-dist docker-amd64 docker-armhf docker-aarch64 docker-push manifest
docker: web-dist docker-amd64 docker-armhf docker-aarch64
docker-deploy: docker-push manifest
docker-amd64:
docker build . -f docker/templates/dockerfiles/Dockerfile.prebuilt.alsa.all \
@@ -81,5 +83,7 @@ g2p: $(G2P_MODELS)
# Testing
# -----------------------------------------------------------------------------
mypy:
mypy app.py rhasspy
check:
flake8 --exclude=lexconvert.py app.py test.py rhasspy/*.py
pylint --ignore=lexconvert.py app.py test.py rhasspy/*.py
mypy app.py test.py rhasspy/*.py
+2 -2
@@ -3,11 +3,11 @@
Rhasspy (pronounced RAH-SPEE) is an offline, [multilingual](#supported-languages) voice assistant toolkit inspired by [Jasper](https://jasperproject.github.io/) that works well with [Home Assistant](https://www.home-assistant.io/), [Hass.io](https://www.home-assistant.io/hassio/), and [Node-RED](https://nodered.org).
* [Documentation](https://rhasspy.readthedocs.io/)
* [Discussion](https://community.rhasspy.org)
* [Video Introduction](https://www.youtube.com/watch?v=ijKTR_GqWwA)
* [Hass.IO Add-On Repository](https://github.com/synesthesiam/hassio-addons)
* [Discussion](https://community.home-assistant.io/t/rhasspy-offline-voice-assistant-toolkit/60862)
Rhasspy transca voice commands into [JSON](https://json.org) events that can trigger actions in home automation software, like [Home Assistant automations](https://www.home-assistant.io/docs/automation/trigger/#event-trigger) or [Node-RED flows](https://rhasspy.readthedocs.io/en/latest/usage/#node-red). You define custom voice commands in a [profile](https://rhasspy.readthedocs.io/en/latest/profiles/) using a [specialized template syntax](https://rhasspy.readthedocs.io/en/latest/training/#sentencesini), and Rhasspy takes care of the rest.
Rhasspy transcribes voice commands into [JSON](https://json.org) events that can trigger actions in home automation software, like [Home Assistant automations](https://www.home-assistant.io/docs/automation/trigger/#event-trigger) or [Node-RED flows](https://rhasspy.readthedocs.io/en/latest/usage/#node-red). You define custom voice commands in a [profile](https://rhasspy.readthedocs.io/en/latest/profiles/) using a [specialized template syntax](https://rhasspy.readthedocs.io/en/latest/training/#sentencesini), and Rhasspy takes care of the rest.
To run Rhasspy with the English (en) profile using Docker:
+1
@@ -0,0 +1 @@
2.4.16
+6
@@ -0,0 +1,6 @@
defaults:
-
scope:
path: ""
values:
render_with_liquid: false
+255 -53
@@ -2,15 +2,18 @@
import argparse
import asyncio
import atexit
import concurrent.futures
import json
import logging
import os
import re
import time
from pathlib import Path
from typing import Any, List, Tuple, Union
from typing import Any, Dict, List, Tuple, Union
from uuid import uuid4
import attr
import json5
from quart import (
Quart,
Response,
@@ -22,26 +25,27 @@ from quart import (
websocket,
)
from quart_cors import cors
from swagger_ui import quart_api_doc
from rhasspy.actor import ActorSystem, ConfigureEvent, RhasspyActor
from rhasspy.core import RhasspyCore
from rhasspy.dialogue import ProfileTrainingFailed
from rhasspy.intent import IntentRecognized
from rhasspy.events import IntentRecognized, ProfileTrainingFailed
from rhasspy.utils import (
FunctionLoggingHandler,
buffer_to_wav,
load_phoneme_examples,
recursive_remove,
get_all_intents,
get_ini_paths,
get_wav_duration,
load_phoneme_examples,
read_dict,
recursive_remove,
)
# -----------------------------------------------------------------------------
# Flask Web App Setup
# Quart Web App Setup
# -----------------------------------------------------------------------------
logger = logging.getLogger(__name__)
logging.root.setLevel(logging.DEBUG)
loop = asyncio.get_event_loop()
@@ -82,8 +86,15 @@ parser.add_argument(
parser.add_argument(
"--ssl", nargs=2, help="Use SSL with <CERT_FILE> <KEY_FILE>", default=None
)
parser.add_argument("--log-level", default="DEBUG", help="Set logging level")
args = parser.parse_args()
# Set log level
log_level = getattr(logging, args.log_level.upper())
logging.basicConfig(level=log_level)
logger.debug(args)
system_profiles_dir = os.path.abspath(args.system_profiles)
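The hunk above maps the `--log-level` string to a `logging` constant with `getattr`, which raises `AttributeError` for an unrecognized name. A minimal sketch of the same lookup with an explicit check is shown here; `resolve_log_level` is a hypothetical helper, not part of the change.

```python
import argparse
import logging


def resolve_log_level(name: str) -> int:
    """Map a level name such as "debug" to its numeric logging level."""
    level = getattr(logging, name.upper(), None)
    if not isinstance(level, int):
        raise ValueError(f"Unknown log level: {name!r}")
    return level


parser = argparse.ArgumentParser()
parser.add_argument("--log-level", default="DEBUG", help="Set logging level")

# Parse an explicit argument list so the sketch runs without a command line.
args = parser.parse_args(["--log-level", "warning"])
level = resolve_log_level(args.log_level)
logging.basicConfig(level=level)
```

The `isinstance` check also rejects names that exist in the `logging` module but are not levels.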
@@ -147,6 +158,15 @@ async def start_rhasspy() -> None:
# -----------------------------------------------------------------------------
@app.route("/api/version")
async def api_version() -> Response:
"""Get Rhasspy version."""
return await send_file(Path("VERSION"))
# -----------------------------------------------------------------------------
@app.route("/api/profiles")
async def api_profiles() -> Response:
"""Get list of available profiles and verify necessary files."""
@@ -166,7 +186,7 @@ async def api_profiles() -> Response:
return jsonify(
{
"default_profile": core.profile.name,
"profiles": sorted(list(profile_names)),
"profiles": sorted(profile_names),
"downloaded": downloaded,
"missing_files": missing_files,
}
@@ -276,14 +296,14 @@ async def api_profile() -> Union[str, Response]:
if request.method == "POST":
# Ensure that JSON is valid
profile_json = await request.json
profile_json = json5.loads(await request.data)
recursive_remove(core.profile.system_json, profile_json)
profile_path = Path(core.profile.write_path("profile.json"))
with open(profile_path, "w") as profile_file:
json.dump(profile_json, profile_file, indent=4)
msg = "Wrote profile to %s" % profile_path
msg = f"Wrote profile to {profile_path}"
logger.debug(msg)
return msg
@@ -294,7 +314,7 @@ async def api_profile() -> Union[str, Response]:
if layers == "profile":
# Local settings only
profile_path = Path(core.profile.read_path("profile.json"))
return send_file(profile_path) # , mimetype="application/json")
return await send_file(profile_path)
return jsonify(core.profile.json)
@@ -415,7 +435,35 @@ async def api_sentences():
    assert core is not None
    if request.method == "POST":
        # Update sentences
        # POST
        if request.mimetype == "application/json":
            # Update multiple ini files at once. Paths as keys (relative to
            # profile directory), sentences as values.
            num_chars = 0
            paths_written = []
            sentences_dict = json5.loads(await request.data)
            for sentences_path, sentences_text in sentences_dict.items():
                # Path is relative to profile directory
                sentences_path = Path(core.profile.write_path(sentences_path))
                if sentences_text.strip():
                    # Overwrite file
                    logger.debug("Writing %s", sentences_path)
                    sentences_path.parent.mkdir(parents=True, exist_ok=True)
                    sentences_path.write_text(sentences_text)
                    num_chars += len(sentences_text)
                    paths_written.append(sentences_path)
                elif sentences_path.is_file():
                    # Remove file
                    logger.debug("Removing %s", sentences_path)
                    sentences_path.unlink()
            return f"Wrote {num_chars} char(s) to {[str(p) for p in paths_written]}"
        # Update sentences.ini only
        sentences_path = Path(
            core.profile.write_path(core.profile.get("speech_to_text.sentences_ini"))
        )
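The JSON branch in the hunk above accepts a dictionary whose keys are paths relative to the profile directory and whose values are the file contents; a blank value removes the file. A client-side sketch of building such a request is given below. The `/api/sentences` path, the base URL, the file names, and the helper `build_sentences_request` are assumptions made for illustration; the request is only constructed, not sent.

```python
import json
import urllib.request
from typing import Dict


def build_sentences_request(
    base_url: str, sentences: Dict[str, str]
) -> urllib.request.Request:
    """Build a POST request that updates several .ini files at once."""
    body = json.dumps(sentences).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/sentences",  # assumed route for api_sentences()
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sentences_request(
    "http://localhost:12101",  # assumed host and port
    {
        # Non-blank text: the file is written (parent directories created).
        "sentences.ini": "[GetTime]\nwhat time is it\n",
        # Blank text: the file is removed if it exists.
        "intents/old.ini": "",
    },
)

# urllib.request.urlopen(request) would send it to a running server.
```

Sending a non-JSON body instead takes the older code path, which overwrites only the single `sentences.ini` file.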
@@ -423,18 +471,48 @@ async def api_sentences():
data = await request.data
with open(sentences_path, "wb") as sentences_file:
sentences_file.write(data)
return "Wrote %s byte(s) to %s" % (len(data), sentences_path)
return f"Wrote {len(data)} byte(s) to {sentences_path}"
# Return sentences
sentences_path = Path(
core.profile.read_path(core.profile.get("speech_to_text.sentences_ini"))
# GET
sentences_path_rel = core.profile.read_path(
core.profile.get("speech_to_text.sentences_ini")
)
sentences_path = Path(sentences_path_rel)
if prefers_json():
# Return multiple .ini files, keyed by path relative to profile
# directory.
sentences_dict = {}
if sentences_path.is_file():
try:
# Try user profile dir first
profile_dir = Path(core.profile.user_profiles_dir) / core.profile.name
key = str(sentences_path.relative_to(profile_dir))
except Exception:
# Fall back to system profile dir
profile_dir = Path(core.profile.system_profiles_dir) / core.profile.name
key = str(sentences_path.relative_to(profile_dir))
sentences_dict[key] = sentences_path.read_text()
ini_dir = Path(
core.profile.read_path(core.profile.get("speech_to_text.sentences_dir"))
)
# Add all .ini files from sentences_dir
if ini_dir.is_dir():
for ini_path in ini_dir.glob("*.ini"):
key = str(ini_path.relative_to(core.profile.read_path()))
sentences_dict[key] = ini_path.read_text()
return jsonify(sentences_dict)
# Return sentences.ini contents only
if not sentences_path.is_file():
return "" # no sentences yet
# Return file contents
return await send_file(sentences_path) # , mimetype="text/plain")
return await send_file(sentences_path)
# -----------------------------------------------------------------------------
@@ -449,7 +527,9 @@ async def api_custom_words():
if request.method == "POST":
custom_words_path = Path(
core.profile.write_path(
core.profile.get(f"speech_to_text.{speech_system}.custom_words")
core.profile.get(
f"speech_to_text.{speech_system}.custom_words", "custom_words.txt"
)
)
)
@@ -466,11 +546,13 @@ async def api_custom_words():
print(line, file=custom_words_file)
lines_written += 1
return "Wrote %s line(s) to %s" % (lines_written, custom_words_path)
return f"Wrote {lines_written} line(s) to {custom_words_path}"
custom_words_path = Path(
core.profile.read_path(
core.profile.get(f"speech_to_text.{speech_system}.custom_words")
core.profile.get(
f"speech_to_text.{speech_system}.custom_words", "custom_words.txt"
)
)
)
@@ -682,7 +764,9 @@ async def api_unknown_words() -> Response:
unknown_words = {}
unknown_path = Path(
core.profile.read_path(
core.profile.get(f"speech_to_text.{speech_system}.unknown_words")
core.profile.get(
f"speech_to_text.{speech_system}.unknown_words", "unknown_words.txt"
)
)
)
@@ -702,18 +786,28 @@ last_sentence = ""
@app.route("/api/text-to-speech", methods=["POST"])
async def api_text_to_speech() -> str:
async def api_text_to_speech() -> Union[bytes, str]:
"""Speak a sentence with text to speech system."""
global last_sentence
repeat = request.args.get("repeat", "false").strip().lower() == "true"
play = request.args.get("play", "true").strip().lower() == "true"
language = request.args.get("language")
voice = request.args.get("voice")
siteId = request.args.get("siteId")
data = await request.data
sentence = last_sentence if repeat else data.decode().strip()
assert core is not None
await core.speak_sentence(sentence)
result = await core.speak_sentence(
sentence, play=play, language=language, voice=voice, siteId=siteId
)
last_sentence = sentence
if not play:
# Return WAV data instead of speaking
return result.wav_data
return sentence
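The handler above reads `repeat` and `play` from the query string and treats only the literal text "true" (any case, surrounding whitespace ignored) as true. A minimal sketch of that rule, assuming a plain mapping in place of `request.args`; the helper name `flag` is hypothetical and not part of the codebase:

```python
from typing import Mapping


def flag(args: Mapping[str, str], name: str, default: bool) -> bool:
    """Interpret a query-string value as a boolean.

    Only "true" (case-insensitive, whitespace stripped) counts as True;
    a missing key falls back to the default.
    """
    raw = args.get(name)
    if raw is None:
        return default
    return raw.strip().lower() == "true"


# Hypothetical query strings, already parsed into dictionaries
print(flag({"play": " TRUE "}, "play", True))
print(flag({}, "play", True))
print(flag({"play": "0"}, "play", True))
```

Note that any value other than "true", including "1" or "yes", is read as false under this rule.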
@@ -725,17 +819,29 @@ async def api_slots() -> Union[str, Response]:
"""Get the values of all slots."""
assert core is not None
slots_dir = Path(
core.profile.read_path(core.profile.get("speech_to_text.slots_dir"))
)
if request.method == "POST":
overwrite_all = request.args.get("overwrite_all", "false").lower() == "true"
new_slot_values = await request.json
new_slot_values = json5.loads(await request.data)
word_casing = core.profile.get(
"speech_to_text.dictionary_casing", "ignore"
).lower()
word_transform = lambda s: s
if word_casing == "lower":
word_transform = str.lower
elif word_casing == "upper":
word_transform = str.upper
slots_dir = Path(
core.profile.write_path(
core.profile.get("speech_to_text.slots_dir", "slots")
)
)
if overwrite_all:
# Remove existing values first
for name in new_slot_values.keys():
for name in new_slot_values:
slots_path = safe_join(slots_dir, f"{name}")
if slots_path.is_file():
try:
@@ -747,32 +853,42 @@ async def api_slots() -> Union[str, Response]:
if isinstance(values, str):
values = [values]
slots_path = Path(
core.profile.write_path(
core.profile.get("speech_to_text.slots_dir", "slots"), f"{name}"
)
)
slots_path = slots_dir / name
# Create directories
slots_path.parent.mkdir(parents=True, exist_ok=True)
# Write data
with open(slots_path, "w") as slots_file:
for value in values:
value = value.strip()
if value:
print(value, file=slots_file)
# Merge with existing values
values = {word_transform(v.strip()) for v in values}
if slots_path.is_file():
values.update(
word_transform(line.strip())
for line in slots_path.read_text().splitlines()
)
# Write merged values
if values:
with open(slots_path, "w") as slots_file:
for value in values:
if value:
print(value, file=slots_file)
return "OK"
# Read slots into dictionary
slots_dir = Path(
core.profile.read_path(core.profile.get("speech_to_text.slots_dir", "slots"))
)
slots_dict = {}
for slot_file_path in slots_dir.glob("*"):
if slot_file_path.is_file():
slot_name = slot_file_path.name
slots_dict[slot_name] = [
line.strip() for line in slot_file_path.read_text().splitlines()
]
if slots_dir.is_dir():
for slot_file_path in slots_dir.glob("*"):
if slot_file_path.is_file():
slot_name = slot_file_path.name
slots_dict[slot_name] = [
line.strip() for line in slot_file_path.read_text().splitlines()
]
return jsonify(slots_dict)
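The POST branch above now merges incoming slot values with whatever is already on disk, after applying the casing transform to both sides. A self-contained sketch of that merge, assuming one value per line in the slot file; `merge_slot_values` is a hypothetical name, and unlike the diff it sorts the result so the output is deterministic:

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def merge_slot_values(
    slot_path: Path,
    new_values: Iterable[str],
    transform: Callable[[str], str] = lambda s: s,
) -> List[str]:
    """Union new values with those already in slot_path and write them back."""
    values = {transform(v.strip()) for v in new_values}

    if slot_path.is_file():
        # Existing lines go through the same transform so casing stays uniform
        values.update(
            transform(line.strip()) for line in slot_path.read_text().splitlines()
        )

    values.discard("")  # blank lines carry no slot value
    merged = sorted(values)  # sorting is an addition for reproducible output

    if merged:
        slot_path.parent.mkdir(parents=True, exist_ok=True)
        slot_path.write_text("\n".join(merged) + "\n")

    return merged


with tempfile.TemporaryDirectory() as temp_dir:
    colors = Path(temp_dir) / "slots" / "colors"
    print(merge_slot_values(colors, ["Red", " blue "], str.lower))
    print(merge_slot_values(colors, ["GREEN", ""], str.lower))
```

Because the values pass through a set, duplicates that differ only in case collapse to a single entry once a `lower` or `upper` transform is active.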
@@ -824,6 +940,73 @@ def api_slots_by_name(name: str) -> Union[str, Response]:
# -----------------------------------------------------------------------------
@app.route("/api/intents")
def api_intents():
"""Return JSON with information about intents."""
assert core is not None
sentences_ini = Path(
core.profile.read_path(core.profile.get("speech_to_text.sentences_ini"))
)
sentences_dir = Path(
core.profile.read_path(core.profile.get("speech_to_text.sentences_dir"))
)
# Load all .ini files and parse
ini_paths: List[Path] = get_ini_paths(sentences_ini, sentences_dir)
intents: Dict[str, Any] = get_all_intents(ini_paths)
def add_type(item, item_dict: Dict[str, Any]):
"""Add item_type to expression dictionary."""
item_dict["item_type"] = type(item).__name__
if hasattr(item, "items"):
# Group, alternative, etc.
for sub_item, sub_item_dict in zip(item.items, item_dict["items"]):
add_type(sub_item, sub_item_dict)
elif hasattr(item, "rule_body"):
# Rule
add_type(item.rule_body, item_dict["rule_body"])
# Convert to dictionary
intents_dict = {}
for intent_name, intent_sentences in intents.items():
sentence_dicts = []
for sentence in intent_sentences:
sentence_dict = attr.asdict(sentence)
# Add item_type field
add_type(sentence, sentence_dict)
sentence_dicts.append(sentence_dict)
intents_dict[intent_name] = sentence_dicts
# Convert to JSON
return jsonify(intents_dict)
# -----------------------------------------------------------------------------
@app.route("/process", methods=["GET"])
async def marytts_process():
"""Emulate MaryTTS /process API"""
global last_sentence
assert core is not None
sentence = request.args.get("INPUT_TEXT", "")
voice = request.args.get("VOICE")
locale = request.args.get("LOCALE")
spoken = await core.speak_sentence(
sentence, play=False, voice=voice, language=locale
)
return spoken.wav_data
# -----------------------------------------------------------------------------
@app.errorhandler(Exception)
async def handle_error(err) -> Tuple[str, int]:
"""Return error as text."""
@@ -835,31 +1018,38 @@ async def handle_error(err) -> Tuple[str, int]:
# Static Routes
# ---------------------------------------------------------------------
web_dir = os.path.join(os.getcwd(), "dist")
web_dir = Path("dist")
assert web_dir.is_dir(), f"Missing web directory {web_dir}"
css_dir = web_dir / "css"
js_dir = web_dir / "js"
img_dir = web_dir / "img"
webfonts_dir = web_dir / "webfonts"
@app.route("/css/<path:filename>", methods=["GET"])
async def css(filename) -> Response:
"""CSS static endpoint."""
return await send_from_directory(os.path.join(web_dir, "css"), filename)
return await send_from_directory(css_dir, filename)
@app.route("/js/<path:filename>", methods=["GET"])
async def js(filename) -> Response:
"""JavaScript static endpoint."""
return await send_from_directory(os.path.join(web_dir, "js"), filename)
return await send_from_directory(js_dir, filename)
@app.route("/img/<path:filename>", methods=["GET"])
async def img(filename) -> Response:
"""Image static endpoint."""
return await send_from_directory(os.path.join(web_dir, "img"), filename)
return await send_from_directory(img_dir, filename)
@app.route("/webfonts/<path:filename>", methods=["GET"])
async def webfonts(filename) -> Response:
"""Web font static endpoint."""
return await send_from_directory(os.path.join(web_dir, "webfonts"), filename)
return await send_from_directory(webfonts_dir, filename)
# ----------------------------------------------------------------------------
@@ -870,13 +1060,13 @@ async def webfonts(filename) -> Response:
@app.route("/", methods=["GET"])
async def index() -> Response:
"""Render main web page."""
return await send_file(os.path.join(web_dir, "index.html"))
return await send_file(web_dir / "index.html")
@app.route("/swagger.yaml", methods=["GET"])
async def swagger_yaml() -> Response:
"""OpenAPI static endpoint."""
return await send_file(os.path.join(web_dir, "swagger.yaml"))
return await send_file(web_dir / "swagger.yaml")
# -----------------------------------------------------------------------------
@@ -960,6 +1150,8 @@ async def api_events_log() -> None:
while True:
text = await q.get()
await websocket.send(text)
except concurrent.futures.CancelledError:
pass
except Exception:
logger.exception("api_events_log")
@@ -970,15 +1162,25 @@ async def api_events_log() -> None:
# -----------------------------------------------------------------------------
# Swagger UI
quart_api_doc(
app, config_path=(web_dir / "swagger.yaml"), url_prefix="/api", title="Rhasspy API"
)
# -----------------------------------------------------------------------------
def prefers_json() -> bool:
"""True if client prefers JSON over plain text."""
return quality(request.accept_mimetypes, "application/json") > quality(
request.accept_mimetypes, "text/plain"
)
def quality(accept, key: str) -> float:
"""Return Accept quality for media type."""
for option in accept.options:
# pylint: disable=W0212
if accept._values_match(key, option.value):
return option.quality
return 0.0
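`prefers_json()` above compares the Accept quality of `application/json` with that of `text/plain`. The sketch below illustrates the same comparison on a raw header string. It is a simplified stand-in written for this note, not the Werkzeug implementation the diff relies on: `parse_accept` is a hypothetical helper, and wildcard handling is reduced to exact match, then `type/*`, then `*/*`.

```python
from typing import Dict


def parse_accept(header: str) -> Dict[str, float]:
    """Map each media type in an Accept header to its q-value (default 1.0)."""
    qualities: Dict[str, float] = {}
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue

        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"

        qualities[media_type] = q

    return qualities


def quality(qualities: Dict[str, float], key: str) -> float:
    """Return the q-value for key: exact match, then type/*, then */*."""
    main_type = key.split("/")[0]
    for candidate in (key, f"{main_type}/*", "*/*"):
        if candidate in qualities:
            return qualities[candidate]
    return 0.0


def prefers_json(header: str) -> bool:
    """True only if JSON is rated strictly higher than plain text."""
    qualities = parse_accept(header)
    return quality(qualities, "application/json") > quality(qualities, "text/plain")


print(prefers_json("application/json"))
print(prefers_json("text/plain, application/json;q=0.5"))
print(prefers_json("*/*"))
```

The strict `>` means a client sending `*/*`, or no Accept header at all, gets the plain-text response, which keeps the previous behaviour as the default.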
+7 -7
@@ -36,7 +36,7 @@ def main():
# Load dictionary
word_dict = {}
logging.info("Loading dictionary from %s" % args.dictionary)
logging.info("Loading dictionary from %s", args.dictionary)
with open(args.dictionary, "r") as dict_file:
read_dict(dict_file, word_dict)
@@ -53,7 +53,7 @@ def main():
all_words.append(word)
assert len(phonemes) == len(phoneme_words), "Not enough words to cover phonemes"
logging.debug("Phonemes: %s" % ", ".join(phoneme_words.keys()))
logging.debug("Phonemes: %s", ", ".join(phoneme_words))
phoneme_hyps = defaultdict(lambda: defaultdict(float))
@@ -66,7 +66,7 @@ def main():
phoneme_hyps[phoneme][hyp] = count
# Sample words from the dictionary
logging.info("Starting %s sample(s)" % args.samples)
logging.info("Starting %s sample(s)", args.samples)
phoneme_futures = {}
with ProcessPoolExecutor() as executor:
# Schedule eSpeak word samples
@@ -80,7 +80,7 @@ def main():
for i, future in enumerate(as_completed(phoneme_futures)):
if i % len(phonemes) == 0:
logging.info(
"Sample %s of %s" % ((i // len(phonemes) + 1), args.samples)
"Sample %s of %s", (i // len(phonemes) + 1), args.samples
)
phoneme = phoneme_futures[future]
@@ -113,14 +113,14 @@ def main():
best = {}
todo = set(phonemes)
used = set()
while len(todo) > 0:
while todo:
for phoneme in list(todo):
best_to_worst = sorted(
phoneme_hyps[phoneme].items(), key=lambda kv: kv[1], reverse=True
)
for hyp, count in best_to_worst:
if not hyp in used:
if hyp not in used:
best[phoneme] = hyp
used.add(hyp)
todo.remove(phoneme)
@@ -165,7 +165,7 @@ def read_dict(dict_file, word_dict):
"""
for line in dict_file:
line = line.strip()
if len(line) == 0:
if not line:
continue
word, pronounce = re.split("[ ]+", line, maxsplit=1)
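Several hunks in this file replace `logging.info("... %s" % value)` with `logging.info("... %s", value)`. Passing the arguments separately lets the logging module skip string formatting when the level is disabled. A small demonstration, assuming a throwaway logger named `lazy-demo` and a hypothetical `Expensive` class that counts how often it is converted to a string:

```python
import logging


class Expensive:
    """Counts how many times it is converted to a string."""

    def __init__(self) -> None:
        self.calls = 0

    def __str__(self) -> str:
        self.calls += 1
        return "expensive"


logger = logging.getLogger("lazy-demo")
logger.setLevel(logging.INFO)  # DEBUG messages are disabled
logger.addHandler(logging.NullHandler())
logger.propagate = False

value = Expensive()

# Lazy form: the message is dropped before any formatting happens
logger.debug("Phonemes: %s", value)
calls_after_lazy = value.calls

# Eager form: the % operator formats the string before debug() is even called
logger.debug("Phonemes: %s" % value)
calls_after_eager = value.calls

print(calls_after_lazy, calls_after_eager)
```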
+36
@@ -0,0 +1,36 @@
#!/usr/bin/env python3
import argparse
import calendar
import json
import locale
from pathlib import Path
def main():
parser = argparse.ArgumentParser("generate-slots")
parser.add_argument("profiles_dir")
args = parser.parse_args()
for profile_dir in Path(args.profiles_dir).glob("*"):
if not profile_dir.is_dir():
continue
with open(profile_dir / "profile.json", "r") as profile_file:
profile = json.load(profile_file)
locale_name = profile["locale"] + ".UTF-8"
locale.setlocale(locale.LC_ALL, locale_name)
print(locale_name)
slots_dir = profile_dir / "slots" / "rhasspy"
slots_dir.mkdir(parents=True, exist_ok=True)
# Day names
(slots_dir / "days").write_text('\n'.join(calendar.day_name))
# Month names
(slots_dir / "months").write_text('\n'.join(filter(None, calendar.month_name)))
# -----------------------------------------------------------------------------
if __name__ == "__main__":
main()
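The script above writes `calendar.day_name` and `calendar.month_name` to slot files after switching locale. `calendar.month_name` has an empty string at index 0 so that months can be indexed from 1, which is why the script filters it with `filter(None, ...)`. A short sketch of that detail without touching the locale; the function name is hypothetical:

```python
import calendar
from typing import List, Tuple


def day_and_month_names() -> Tuple[List[str], List[str]]:
    """Return day and month names for the currently active locale."""
    days = list(calendar.day_name)
    # month_name[0] is "" (months are 1-indexed), so drop empty entries
    months = [name for name in calendar.month_name if name]
    return days, months


days, months = day_and_month_names()
print(len(days), len(months))
print(repr(calendar.month_name[0]))
```

The names themselves depend on the active `LC_TIME` setting, so the script's `locale.setlocale()` call has to succeed for each profile locale, which in turn requires that locale to be installed on the machine running it.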
+1 -3
@@ -38,9 +38,7 @@ def main():
if not os.path.exists(html_path):
# Download
url = "https://www.ezglot.com/most-frequently-used-words.php?l={0}&submit=Select".format(
language
)
url = f"https://www.ezglot.com/most-frequently-used-words.php?l={language}&submit=Select"
print(f"Downloading from {url}")
with open(html_path, "w") as html_file:
+3 -3
@@ -26,7 +26,7 @@ def main():
with open(args.dictionary, "r") as dict_file:
for line in dict_file:
line = line.strip()
if len(line) == 0:
if not line:
continue
parts = re.split(r"[\t ]+", line)
@@ -44,11 +44,11 @@ def main():
# Pick unique example words for every phoneme
used_words = set()
for phoneme in sorted(examples.keys()):
for phoneme in sorted(examples):
# Choose the shortest, unused example word for this phoneme.
# Exclude words with 3 or fewer letters.
for word, pron in sorted(examples[phoneme], key=lambda kv: len(kv[0])):
if len(word) > 3 and (not word in used_words):
if len(word) > 3 and (word not in used_words):
# Output format is:
# phoneme word pronunciation
print(phoneme, word, " ".join(pron))
+5 -5
@@ -31,7 +31,7 @@ def main():
with open(args.dictionary, "r") as dict_file:
for line in dict_file:
line = line.strip()
if len(line) == 0:
if not line:
continue
parts = re.split(r"[\t ]+", line)
@@ -70,7 +70,7 @@ def main():
with open(args.frequent_phones, "r") as freq_phones_file:
for line in freq_phones_file:
line = line.strip()
if len(line) == 0:
if not line:
continue
parts = re.split(r"[ ]+", line, maxsplit=1)
@@ -82,7 +82,7 @@ def main():
mappings = []
bad_espeak = (":", ";", "-", "#")
for word, espeak in freq_espeak.items():
if not word in freq_phonemes:
if word not in freq_phonemes:
# No pronunciation
continue
@@ -134,7 +134,7 @@ def main():
m = 4
for p in all_phonemes:
candidate_counts = [
(e, phoneme_counts[(cp, e)]) for (cp, e) in phoneme_counts.keys() if cp == p
(e, phoneme_counts[(cp, e)]) for (cp, e) in phoneme_counts if cp == p
]
candidate_counts = [ec for ec in candidate_counts if ec[1] > n]
candidate_counts = sorted(candidate_counts, key=lambda x: x[1], reverse=True)
@@ -213,7 +213,7 @@ assign(P, E) :- maybe_assign(P, E).
predicates = []
for line in proc.stdout.splitlines():
line = line.decode().strip()
if len(line) == 0:
if not line:
continue
elif line.startswith("OPTIMUM FOUND"):
break
+1 -1
@@ -20,7 +20,7 @@ def main():
with open(dict_path, "r") as dict_file:
for line in dict_file:
line = line.strip()
if len(line) == 0:
if not line:
continue
parts = re.split(r"[ ]+", line)
+1 -1
@@ -12,7 +12,7 @@ def main():
with open(sys.argv[1], "r") as dict_file:
for line in dict_file:
line = line.strip()
if len(line) == 0:
if not line:
continue
parts = re.split(r"[ ]+", line)
+411
@@ -0,0 +1,411 @@
#!/usr/bin/env bash
this_dir="$( cd "$( dirname "$0" )" && pwd )"
CPU_ARCH="$(uname -m)"
# -----------------------------------------------------------------------------
# Command-line Arguments
# -----------------------------------------------------------------------------
. "${this_dir}/etc/shflags"
DEFINE_string 'venv' "${this_dir}/.venv" 'Path to create virtual environment'
DEFINE_string 'download-dir' "${this_dir}/download" 'Directory to cache downloaded files'
DEFINE_string 'build-dir' "${this_dir}/build_${CPU_ARCH}" 'Directory to build dependencies in'
DEFINE_boolean 'system' true 'Install system dependencies'
DEFINE_boolean 'flair' false 'Install flair'
DEFINE_boolean 'precise' false 'Install Mycroft Precise'
DEFINE_boolean 'adapt' false 'Install Mycroft Adapt'
DEFINE_boolean 'google' false 'Install Google Text to Speech'
DEFINE_boolean 'kaldi' false 'Install Kaldi'
DEFINE_boolean 'offline' false "Don't download anything"
DEFINE_boolean 'web' true "Build Vue web interface with yarn"
DEFINE_boolean 'sudo' true "Use sudo for apt"
DEFINE_integer 'make-threads' 4 'Number of threads to use with make' 'j'
DEFINE_string 'python' 'python3' 'Path to Python executable'
FLAGS "$@" || exit $?
eval set -- "${FLAGS_ARGV}"
# -----------------------------------------------------------------------------
# Default Settings
# -----------------------------------------------------------------------------
set -e
python="${FLAGS_python}"
venv="${FLAGS_venv}"
download_dir="${FLAGS_download_dir}"
mkdir -p "${download_dir}"
echo "Download directory: ${download_dir}"
build_dir="${FLAGS_build_dir}"
mkdir -p "${build_dir}"
echo "Build directory: ${build_dir}"
if [[ "${FLAGS_system}" -eq "${FLAGS_FALSE}" ]]; then
no_system='true'
fi
if [[ "${FLAGS_flair}" -eq "${FLAGS_FALSE}" ]]; then
no_flair='true'
fi
if [[ "${FLAGS_precise}" -eq "${FLAGS_FALSE}" ]]; then
no_precise='true'
fi
if [[ "${FLAGS_adapt}" -eq "${FLAGS_FALSE}" ]]; then
no_adapt='true'
fi
if [[ "${FLAGS_kaldi}" -eq "${FLAGS_FALSE}" ]]; then
no_kaldi='true'
fi
if [[ "${FLAGS_google}" -eq "${FLAGS_FALSE}" ]]; then
no_google='true'
fi
if [[ "${FLAGS_offline}" -eq "${FLAGS_TRUE}" ]]; then
offline='true'
fi
if [[ "${FLAGS_web}" -eq "${FLAGS_FALSE}" ]]; then
no_web='true'
fi
if [[ "${FLAGS_sudo}" -eq "${FLAGS_TRUE}" ]]; then
function run_sudo {
sudo "$@"
}
else
function run_sudo {
"$@"
}
fi
make_threads="${FLAGS_make_threads}"
# -----------------------------------------------------------------------------
# Create a temporary directory for building stuff
temp_dir="$(mktemp -d)"
function cleanup {
rm -rf "${temp_dir}"
}
trap cleanup EXIT
function maybe_download {
if [[ ! -s "$2" ]]; then
if [[ -n "${offline}" ]]; then
echo "Need to download $1 but offline."
exit 1
fi
mkdir -p "$(dirname "$2")"
curl -sSfL -o "$2" "$1" || { echo "Can't download $1"; exit 1; }
echo "$1 => $2"
fi
}
# -----------------------------------------------------------------------------
echo "Checking required programs"
if [[ -z "${no_web}" ]]; then
if [[ -z "$(command -v yarn)" ]]; then
echo "Please install yarn to continue (https://yarnpkg.com)"
exit 1
fi
fi
# -----------------------------------------------------------------------------
if [[ -z "${no_system}" ]]; then
echo "Installing system dependencies"
run_sudo apt-get update
run_sudo apt-get install --no-install-recommends \
python3 python3-pip python3-venv python3-dev \
python \
build-essential autoconf autoconf-archive libtool automake bison \
sox espeak flite swig portaudio19-dev \
libatlas-base-dev \
gfortran \
sphinxbase-utils sphinxtrain pocketsphinx \
jq checkinstall unzip xz-utils \
curl \
lame
fi
# -----------------------------------------------------------------------------
echo "Downloading dependencies"
# Python-Pocketsphinx
pocketsphinx_file="${download_dir}/pocketsphinx-python.tar.gz"
if [[ ! -s "${pocketsphinx_file}" ]]; then
pocketsphinx_url='https://github.com/synesthesiam/pocketsphinx-python/releases/download/v1.0/pocketsphinx-python.tar.gz'
echo "Downloading pocketsphinx (${pocketsphinx_url})"
maybe_download "${pocketsphinx_url}" "${pocketsphinx_file}"
fi
# OpenFST
openfst_dir="${build_dir}/openfst-1.6.9"
if [[ ! -d "${openfst_dir}/build" ]]; then
openfst_file="${download_dir}/openfst-1.6.9.tar.gz"
if [[ ! -s "${openfst_file}" ]]; then
openfst_url='http://openfst.org/twiki/pub/FST/FstDownload/openfst-1.6.9.tar.gz'
echo "Downloading openfst (${openfst_url})"
maybe_download "${openfst_url}" "${openfst_file}"
fi
fi
# Opengrm
opengrm_dir="${build_dir}/opengrm-ngram-1.3.4"
if [[ ! -d "${opengrm_dir}/build" ]]; then
opengrm_file="${download_dir}/opengrm-ngram-1.3.4.tar.gz"
if [[ ! -s "${opengrm_file}" ]]; then
opengrm_url='http://www.opengrm.org/twiki/pub/GRM/NGramDownload/opengrm-ngram-1.3.4.tar.gz'
echo "Downloading opengrm (${opengrm_url})"
maybe_download "${opengrm_url}" "${opengrm_file}"
fi
fi
# Phonetisaurus
phonetisaurus_dir="${build_dir}/phonetisaurus"
if [[ ! -d "${phonetisaurus_dir}/build" ]]; then
phonetisaurus_file="${download_dir}/phonetisaurus-2019.tar.gz"
if [[ ! -s "${phonetisaurus_file}" ]]; then
phonetisaurus_url='https://github.com/synesthesiam/docker-phonetisaurus/raw/master/download/phonetisaurus-2019.tar.gz'
echo "Downloading phonetisaurus (${phonetisaurus_url})"
maybe_download "${phonetisaurus_url}" "${phonetisaurus_file}"
fi
fi
# Kaldi
kaldi_dir="${this_dir}/opt/kaldi"
if [[ -z "${no_kaldi}" && ! -d "${kaldi_dir}" ]]; then
run_sudo apt-get install --no-install-recommends libatlas-base-dev libatlas3-base gfortran
run_sudo ldconfig
kaldi_file="${download_dir}/kaldi-2019.tar.gz"
if [[ ! -s "${kaldi_file}" ]]; then
kaldi_url='https://github.com/kaldi-asr/kaldi/archive/master.tar.gz'
echo "Downloading kaldi (${kaldi_url})"
maybe_download "${kaldi_url}" "${kaldi_file}"
fi
fi
# -----------------------------------------------------------------------------
# Re-create virtual environment
echo "Creating virtual environment"
rm -rf "${venv}"
"${python}" -m venv "${venv}"
source "${venv}/bin/activate"
pip3 install wheel setuptools
# -----------------------------------------------------------------------------
# openfst
# http://www.openfst.org
#
# Required to build language models and do intent recognition.
# -----------------------------------------------------------------------------
if [[ ! -d "${openfst_dir}/build" ]]; then
echo "Building openfst (${openfst_file})"
tar -C "${build_dir}" -xf "${openfst_file}" && \
cd "${openfst_dir}" && \
./configure "--prefix=${openfst_dir}/build" \
--enable-far \
--disable-static \
--enable-shared \
--enable-ngram-fsts && \
make -j "${make_threads}" && \
make install
fi
# Copy build artifacts into virtual environment
cp -R "${openfst_dir}"/build/include/* "${venv}/include/"
cp -R "${openfst_dir}"/build/lib/*.so* "${venv}/lib/"
cp -R "${openfst_dir}"/build/bin/* "${venv}/bin/"
# -----------------------------------------------------------------------------
# opengrm
# http://www.opengrm.org/twiki/bin/view/GRM/NGramLibrary
#
# Required to build language models.
# -----------------------------------------------------------------------------
# opengrm
if [[ ! -d "${opengrm_dir}/build" ]]; then
echo "Building opengrm (${opengrm_file})"
export CXXFLAGS="-I${venv}/include"
export LDFLAGS="-L${venv}/lib"
tar -C "${build_dir}" -xf "${opengrm_file}" && \
cd "${opengrm_dir}" && \
./configure "--prefix=${opengrm_dir}/build" && \
make -j "${make_threads}" && \
make install
fi
# Copy build artifacts into virtual environment
cp -R "${opengrm_dir}"/build/bin/* "${venv}/bin/"
cp -R "${opengrm_dir}"/build/include/* "${venv}/include/"
cp -R "${opengrm_dir}"/build/lib/*.so* "${venv}/lib/"
# -----------------------------------------------------------------------------
# phonetisaurus
# https://github.com/AdolfVonKleist/Phonetisaurus
#
# Required to guess word pronunciations.
# -----------------------------------------------------------------------------
if [[ ! -d "${phonetisaurus_dir}/build" ]]; then
echo "Installing phonetisaurus (${phonetisaurus_file})"
tar -C "${build_dir}" -xf "${phonetisaurus_file}" && \
cd "${phonetisaurus_dir}" && \
./configure "--prefix=${phonetisaurus_dir}/build" \
--with-openfst-includes="${venv}/include" \
--with-openfst-libs="${venv}/lib" && \
make -j "${make_threads}" && \
make install
fi
# Copy build artifacts into virtual environment
cp -R "${phonetisaurus_dir}"/build/bin/* "${venv}/bin/"
# -----------------------------------------------------------------------------
# kaldi
# https://kaldi-asr.org
#
# Required for speech recognition with Kaldi-based profiles.
# -----------------------------------------------------------------------------
if [[ -z "${no_kaldi}" && ! -f "${kaldi_dir}/src/online2bin/online2-wav-nnet3-latgen-faster" ]]; then
echo "Installing kaldi (${kaldi_file})"
# armhf
if [[ -f '/usr/lib/arm-linux-gnueabihf/libatlas.so' ]]; then
# Kaldi install doesn't check here, despite it being in ldconfig
export ATLASLIBDIR='/usr/lib/arm-linux-gnueabihf'
fi
# aarch64
if [[ -f '/usr/lib/aarch64-linux-gnu/libatlas.so' ]]; then
# Kaldi install doesn't check here, despite it being in ldconfig
export ATLASLIBDIR='/usr/lib/aarch64-linux-gnu'
fi
tar -C "${build_dir}" -xf "${kaldi_file}" && \
cp "${this_dir}/etc/linux_atlas_aarch64.mk" "${kaldi_dir}/src/makefiles/" && \
cd "${kaldi_dir}/tools" && \
make -j "${make_threads}" && \
cd "${kaldi_dir}/src" && \
./configure --shared --mathlib=ATLAS --use-cuda=no && \
make depend -j "${make_threads}" && \
make -j "${make_threads}"
fi
# -----------------------------------------------------------------------------
# Python requirements
# -----------------------------------------------------------------------------
echo "Installing Python requirements"
"${python}" -m pip install requests
# pytorch is not available on ARM
case "${CPU_ARCH}" in
armv7l|arm64v8)
no_flair="true" ;;
esac
requirements_file="${temp_dir}/requirements.txt"
temp_requirements_file="${temp_dir}/temp_requirements.txt"
cp "${this_dir}/requirements.txt" "${requirements_file}"
# Exclude requirements
if [[ -n "${no_flair}" ]]; then
echo "Excluding flair from virtual environment"
sed '/^flair/d' "${requirements_file}" > "${temp_requirements_file}" &&
mv "${temp_requirements_file}" "${requirements_file}"
fi
if [[ -n "${no_precise}" ]]; then
echo "Excluding Mycroft Precise from virtual environment"
sed '/^precise-runner/d' "${requirements_file}" > "${temp_requirements_file}" &&
mv "${temp_requirements_file}" "${requirements_file}"
fi
if [[ -n "${no_adapt}" ]]; then
echo "Excluding Mycroft Adapt from virtual environment"
sed '/^adapt-parser/d' "${requirements_file}" > "${temp_requirements_file}" &&
mv "${temp_requirements_file}" "${requirements_file}"
fi
if [[ -n "${no_google}" ]]; then
echo "Excluding Google Text to Speech from virtual environment"
sed '/^google-cloud-texttospeech/d' "${requirements_file}" > "${temp_requirements_file}" &&
mv "${temp_requirements_file}" "${requirements_file}"
fi
# Install everything except openfst first
sed '/^openfst/d' "${requirements_file}" > "${temp_requirements_file}" &&
mv "${temp_requirements_file}" "${requirements_file}"
"${python}" -m pip install -r "${requirements_file}"
echo "Installing Python openfst wrapper"
"${python}" -m pip install \
--global-option=build_ext \
--global-option="-I${venv}/include" \
--global-option="-L${venv}/lib" \
-r <(grep '^openfst' "${this_dir}/requirements.txt")
# -----------------------------------------------------------------------------
# Pocketsphinx for Python
# https://github.com/cmusphinx/pocketsphinx
#
# Speech to text for most profiles.
# -----------------------------------------------------------------------------
pocketsphinx_file="${download_dir}/pocketsphinx-python.tar.gz"
echo "Installing Python pocketsphinx (${pocketsphinx_file})"
"${python}" -m pip install "${pocketsphinx_file}"
# -----------------------------------------------------------------------------
# Snowboy
# https://snowboy.kitt.ai
#
# Wake word system
# -----------------------------------------------------------------------------
case "${CPU_ARCH}" in
x86_64|armv7l)
snowboy_file="${download_dir}/snowboy-1.3.0.tar.gz"
echo "Installing snowboy (${snowboy_file})"
"${python}" -m pip install "${snowboy_file}"
;;
*)
echo "Not installing snowboy (${CPU_ARCH} not supported)"
esac
# -----------------------------------------------------------------------------
if [[ -z "${no_web}" ]]; then
echo "Building web interface"
cd "${this_dir}" && yarn install && yarn build
fi
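The script above excludes optional packages by filtering `requirements.txt` into a temporary file and renaming it over the original, since `sed -i` is not portable. The same pattern sketched in Python for comparison; `drop_requirements` is a hypothetical helper, and the package names in the demo are only sample input:

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def drop_requirements(requirements_path: Path, prefixes: Iterable[str]) -> None:
    """Remove lines starting with any prefix, like sed '/^prefix/d'."""
    excluded = tuple(prefixes)
    lines = requirements_path.read_text().splitlines(keepends=True)
    kept = [line for line in lines if not line.startswith(excluded)]

    # Write to a temporary file in the same directory, then rename it over
    # the original so the target is never left half-written
    fd, temp_name = tempfile.mkstemp(dir=str(requirements_path.parent))
    with os.fdopen(fd, "w") as temp_file:
        temp_file.writelines(kept)
    os.replace(temp_name, requirements_path)


with tempfile.TemporaryDirectory() as temp_dir:
    requirements = Path(temp_dir) / "requirements.txt"
    requirements.write_text("flair\nrequests\nopenfst\n")
    drop_requirements(requirements, ["flair", "openfst"])
    print(requirements.read_text())
```

Creating the temporary file next to the target matters because a rename is only atomic within a single filesystem.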
+59 -39
@@ -17,6 +17,7 @@ DEFINE_boolean 'google' false 'Install Google Text to Speech'
DEFINE_boolean 'kaldi' true 'Install Kaldi'
DEFINE_boolean 'offline' false "Don't download anything"
DEFINE_integer 'make-threads' 4 'Number of threads to use with make' 'j'
DEFINE_string 'python' '' 'Path to Python executable'
FLAGS "$@" || exit $?
eval set -- "${FLAGS_ARGV}"
@@ -75,14 +76,14 @@ trap cleanup EXIT
# -----------------------------------------------------------------------------
function maybe_download {
if [[ ! -f "$2" ]]; then
if [[ ! -z "${offline}" ]]; then
if [[ ! -s "$2" ]]; then
if [[ -n "${offline}" ]]; then
echo "Need to download $1 but offline."
exit 1
fi
mkdir -p "$(dirname "$2")"
curl -sSfL -o "$2" "$1"
curl -sSfL -o "$2" "$1" || { echo "Can't download $1"; exit 1; }
echo "$1 => $2"
fi
}
@@ -94,7 +95,7 @@ function maybe_download {
if [[ -z "${no_system}" ]]; then
echo "Installing system dependencies"
sudo apt-get update
sudo apt-get install --no-install-recommends --yes \
sudo apt-get install --no-install-recommends \
python3 python3-pip python3-venv python3-dev \
python \
build-essential autoconf autoconf-archive libtool automake bison \
@@ -103,38 +104,45 @@ if [[ -z "${no_system}" ]]; then
gfortran \
sphinxbase-utils sphinxtrain pocketsphinx \
jq checkinstall unzip xz-utils \
curl
curl \
lame
fi
# -----------------------------------------------------------------------------
# Python >= 3.6
# -----------------------------------------------------------------------------
if [[ ! -z "$(which python3.8)" ]]; then
PYTHON='python3.8'
elif [[ ! -z "$(which python3.7)" ]]; then
PYTHON='python3.7'
elif [[ ! -z "$(which python3.6)" ]]; then
PYTHON='python3.6'
if [[ -z "${FLAGS_python}" ]]; then
# Auto-detect Python
if [[ -n "$(command -v python3.8)" ]]; then
PYTHON='python3.8'
elif [[ -n "$(command -v python3.7)" ]]; then
PYTHON='python3.7'
elif [[ -n "$(command -v python3.6)" ]]; then
PYTHON='python3.6'
else
echo "Installing Python 3.6 from source. This is going to take a LONG time."
sudo apt-get install --no-install-recommends \
tk-dev libncurses5-dev libncursesw5-dev \
libreadline6-dev libdb5.3-dev libgdbm-dev \
libsqlite3-dev libssl-dev libbz2-dev \
libexpat1-dev liblzma-dev zlib1g-dev
python_file="${download_dir}/Python-3.6.8.tar.xz"
python_url='https://www.python.org/ftp/python/3.6.8/Python-3.6.8.tar.xz'
maybe_download "${python_url}" "${python_file}"
tar -C "${temp_dir}" -xf "${python_file}"
cd "${temp_dir}/Python-3.6.8" && \
./configure && \
make -j "${make_threads}" && \
sudo make altinstall
PYTHON='python3.6'
fi
else
echo "Installing Python 3.6 from source. This is going to take a LONG time."
sudo apt-get install --no-install-recommends --yes \
tk-dev libncurses5-dev libncursesw5-dev \
libreadline6-dev libdb5.3-dev libgdbm-dev \
libsqlite3-dev libssl-dev libbz2-dev \
libexpat1-dev liblzma-dev zlib1g-dev
python_file="${download_dir}/Python-3.6.8.tar.xz"
python_url='https://www.python.org/ftp/python/3.6.8/Python-3.6.8.tar.xz'
maybe_download "${python_url}" "${python_file}"
tar -C "${temp_dir}" -xf "${python_file}"
cd "${temp_dir}/Python-3.6.8" && \
./configure && \
make -j "${make_threads}" && \
sudo make altinstall
PYTHON='python3.6'
# User-provided Python
PYTHON="${FLAGS_python}"
fi
# -----------------------------------------------------------------------------
@@ -155,19 +163,23 @@ case "${CPU_ARCH}" in
arm64v8)
FRIENDLY_ARCH=aarch64
;;
*)
FRIENDLY_ARCH="${CPU_ARCH}"
;;
esac
echo "Downloading dependencies"
download_args=()
if [[ ! -z "${offline}" ]]; then
if [[ -n "${offline}" ]]; then
download_args+=('--offline')
fi
if [[ ! -z "${no_precise}" ]]; then
if [[ -n "${no_precise}" ]]; then
download_args+=('--noprecise')
fi
if [[ ! -z "${no_kaldi}" ]]; then
if [[ -n "${no_kaldi}" ]]; then
download_args+=('--nokaldi')
fi
@@ -201,6 +213,9 @@ export LD_LIBRARY_PATH="${venv}/lib:${LD_LIBRARY_PATH}"
# shellcheck source=/dev/null
source "${venv}/bin/activate"
echo "Upgrading pip"
"${PYTHON}" -m pip install --upgrade pip
echo "Installing Python requirements"
"${PYTHON}" -m pip install wheel setuptools
"${PYTHON}" -m pip install requests
@@ -208,38 +223,43 @@ echo "Installing Python requirements"
# pytorch is not available on ARM
case "${CPU_ARCH}" in
armv7l|arm64v8)
no_flair="true" ;;
no_flair="true" ;;
esac
requirements_file="${temp_dir}/requirements.txt"
cp "${this_dir}/requirements.txt" "${requirements_file}"
# Exclude requirements
if [[ ! -z "${no_flair}" ]]; then
if [[ -n "${no_flair}" ]]; then
echo "Excluding flair from virtual environment"
sed -i '/^flair/d' "${requirements_file}"
fi
if [[ ! -z "${no_precise}" ]]; then
if [[ -n "${no_precise}" ]]; then
echo "Excluding Mycroft Precise from virtual environment"
sed -i '/^precise-runner/d' "${requirements_file}"
fi
if [[ ! -z "${no_adapt}" ]]; then
if [[ -n "${no_adapt}" ]]; then
echo "Excluding Mycroft Adapt from virtual environment"
sed -i '/^adapt-parser/d' "${requirements_file}"
fi
if [[ ! -z "${no_google}" ]]; then
if [[ -n "${no_google}" ]]; then
echo "Excluding Google Text to Speech from virtual environment"
sed -i '/^google-cloud-texttospeech/d' "${requirements_file}"
fi
# Install everything except openfst first
sed -i '/^openfst/d' "${requirements_file}"
python3 -m pip install -r "${requirements_file}"
# Install Python openfst wrapper
"${PYTHON}" -m pip install \
--global-option=build_ext \
--global-option="-I${venv}/include" \
--global-option="-L${venv}/lib" \
-r "${requirements_file}"
-r <(grep '^openfst' "${this_dir}/requirements.txt")
# -----------------------------------------------------------------------------
# Pocketsphinx for Python
@@ -266,7 +286,7 @@ esac
# Mycroft Precise
# -----------------------------------------------------------------------------
if [[ -z "${no_precise}" && -z "$(which precise-engine)" ]]; then
if [[ -z "${no_precise}" && -z "$(command -v precise-engine)" ]]; then
case "${CPU_ARCH}" in
x86_64|armv7l)
echo "Installing Mycroft Precise"
+1 -1
@@ -1,5 +1,5 @@
Package: rhasspy-server
Version: 2.4.8
Version: 2.4.10
Section: utils
Priority: optional
Depends: sox,alsa-utils,espeak,libstdc++6,jq,xz-utils,unzip,curl,sphinxbase-utils,sphinxtrain,flite,libatlas-base-dev,gfortran
+4 -4
@@ -1,5 +1,5 @@
#!/usr/bin/env bash
rhasspy_version="2.4.8"
rhasspy_version="2.4.10"
this_dir="$( cd "$( dirname "$0" )" && pwd )"
@@ -46,7 +46,7 @@ fi
cd "${this_dir}"
source "${venv}/bin/activate"
if [[ -z "$(which pyinstaller)" ]]; then
if [[ -z "$(command -v pyinstaller)" ]]; then
echo "Missing PyInstaller"
exit 1
fi
@@ -131,7 +131,7 @@ cp "${this_dir}/app.py" "${share_dir}/src/"
# -----------------------------------------------------------------------------
echo "Copying Kaldi"
kaldi_src="${venv}/kaldi"
kaldi_src="${this_dir}/opt/kaldi"
if [[ ! -d "${kaldi_src}" ]]; then
echo "Missing Kaldi at ${kaldi_src}"
exit 1
@@ -145,7 +145,7 @@ rsync -av --delete "${kaldi_src}/" "${kaldi_dest}/"
rm -f "${kaldi_dest}/egs/wsj/s5/utils/utils"
# Turn duplicate .so files into symbolic links
function fix_library_links() {
function fix_library_links {
lib_dir="$1"
for lib in "${lib_dir}"/*.so; do
+1 -127
@@ -1,127 +1 @@
COPY profiles/zh/profile.json \
profiles/zh/custom_words.txt \
profiles/zh/espeak_phonemes.txt \
profiles/zh/phoneme_examples.txt \
profiles/zh/frequent_words.txt \
profiles/zh/sentences.ini \
profiles/zh/stop_words.txt ${RHASSPY_APP}/profiles/zh/
COPY profiles/hi/ \
profiles/hi/profile.json \
profiles/hi/custom_words.txt \
profiles/hi/espeak_phonemes.txt \
profiles/hi/phoneme_examples.txt \
profiles/hi/frequent_words.txt \
profiles/hi/sentences.ini \
profiles/hi/stop_words.txt ${RHASSPY_APP}/profiles/hi/
COPY profiles/el/profile.json \
profiles/el/custom_words.txt \
profiles/el/espeak_phonemes.txt \
profiles/el/phoneme_examples.txt \
profiles/el/frequent_words.txt \
profiles/el/sentences.ini \
profiles/el/stop_words.txt ${RHASSPY_APP}/profiles/el/
COPY profiles/de/profile.json \
profiles/de/custom_words.txt \
profiles/de/espeak_phonemes.txt \
profiles/de/phoneme_examples.txt \
profiles/de/frequent_words.txt \
profiles/de/sentences.ini \
profiles/de/stop_words.txt ${RHASSPY_APP}/profiles/de/
COPY profiles/de/kaldi/custom_words.txt \
profiles/de/kaldi/espeak_phonemes.txt \
profiles/de/kaldi/phoneme_examples.txt \
${RHASSPY_APP}/profiles/de/kaldi/
COPY profiles/it/profile.json \
profiles/it/custom_words.txt \
profiles/it/espeak_phonemes.txt \
profiles/it/phoneme_examples.txt \
profiles/it/frequent_words.txt \
profiles/it/sentences.ini \
profiles/it/stop_words.txt ${RHASSPY_APP}/profiles/it/
COPY profiles/es/profile.json \
profiles/es/custom_words.txt \
profiles/es/espeak_phonemes.txt \
profiles/es/phoneme_examples.txt \
profiles/es/frequent_words.txt \
profiles/es/sentences.ini \
profiles/es/stop_words.txt ${RHASSPY_APP}/profiles/es/
COPY profiles/fr/profile.json \
profiles/fr/custom_words.txt \
profiles/fr/espeak_phonemes.txt \
profiles/fr/phoneme_examples.txt \
profiles/fr/frequent_words.txt \
profiles/fr/sentences.ini \
profiles/fr/stop_words.txt ${RHASSPY_APP}/profiles/fr/
COPY profiles/ru/profile.json \
profiles/ru/custom_words.txt \
profiles/ru/espeak_phonemes.txt \
profiles/ru/phoneme_examples.txt \
profiles/ru/frequent_words.txt \
profiles/ru/sentences.ini \
profiles/ru/stop_words.txt ${RHASSPY_APP}/profiles/ru/
COPY profiles/nl/profile.json \
profiles/nl/custom_words.txt \
profiles/nl/espeak_phonemes.txt \
profiles/nl/phoneme_examples.txt \
profiles/nl/frequent_words.txt \
profiles/nl/sentences.ini \
profiles/nl/stop_words.txt ${RHASSPY_APP}/profiles/nl/
COPY profiles/nl/kaldi/custom_words.txt \
profiles/nl/kaldi/espeak_phonemes.txt \
profiles/nl/kaldi/phoneme_examples.txt \
${RHASSPY_APP}/profiles/nl/kaldi/
COPY profiles/vi/profile.json \
profiles/vi/custom_words.txt \
profiles/vi/espeak_phonemes.txt \
profiles/vi/phoneme_examples.txt \
profiles/vi/frequent_words.txt \
profiles/vi/sentences.ini \
profiles/vi/stop_words.txt ${RHASSPY_APP}/profiles/vi/
COPY profiles/pt/profile.json \
profiles/pt/custom_words.txt \
profiles/pt/espeak_phonemes.txt \
profiles/pt/phoneme_examples.txt \
profiles/pt/frequent_words.txt \
profiles/pt/sentences.ini \
profiles/pt/stop_words.txt ${RHASSPY_APP}/profiles/pt/
COPY profiles/sv/profile.json \
profiles/sv/custom_words.txt \
profiles/sv/espeak_phonemes.txt \
profiles/sv/phoneme_examples.txt \
profiles/sv/frequent_words.txt \
profiles/sv/sentences.ini \
profiles/sv/stop_words.txt ${RHASSPY_APP}/profiles/sv/
COPY profiles/ca/profile.json \
profiles/ca/custom_words.txt \
profiles/ca/espeak_phonemes.txt \
profiles/ca/phoneme_examples.txt \
profiles/ca/frequent_words.txt \
profiles/ca/sentences.ini \
profiles/ca/stop_words.txt ${RHASSPY_APP}/profiles/ca/
COPY profiles/en/profile.json \
profiles/en/custom_words.txt \
profiles/en/espeak_phonemes.txt \
profiles/en/phoneme_examples.txt \
profiles/en/frequent_words.txt \
profiles/en/sentences.ini \
profiles/en/stop_words.txt ${RHASSPY_APP}/profiles/en/
COPY profiles/en/kaldi/custom_words.txt \
profiles/en/kaldi/espeak_phonemes.txt \
profiles/en/kaldi/phoneme_examples.txt \
${RHASSPY_APP}/profiles/en/kaldi/
COPY profiles/ ${RHASSPY_APP}/profiles/
@@ -48,7 +48,7 @@ RUN python3 -m pip install --no-cache-dir /pocketsphinx-python.tar.gz && \
COPY download/snowboy-1.3.0.tar.gz /
RUN if [ "$BUILD_ARCH" != "aarch64" ]; then pip3 install --no-cache-dir /snowboy-1.3.0.tar.gz; fi
RUN apt-get install --no-install-recommends --yes flite libttspico-utils
RUN apt-get install --no-install-recommends --yes flite libttspico-utils lame
COPY download/kaldi_${BUILD_ARCH}.tar.gz /kaldi.tar.gz
RUN mkdir -p /opt && \
@@ -72,133 +72,7 @@ RUN chmod +x /run.sh
COPY profiles/zh/profile.json \
profiles/zh/custom_words.txt \
profiles/zh/espeak_phonemes.txt \
profiles/zh/phoneme_examples.txt \
profiles/zh/frequent_words.txt \
profiles/zh/sentences.ini \
profiles/zh/stop_words.txt ${RHASSPY_APP}/profiles/zh/
COPY profiles/hi/ \
profiles/hi/profile.json \
profiles/hi/custom_words.txt \
profiles/hi/espeak_phonemes.txt \
profiles/hi/phoneme_examples.txt \
profiles/hi/frequent_words.txt \
profiles/hi/sentences.ini \
profiles/hi/stop_words.txt ${RHASSPY_APP}/profiles/hi/
COPY profiles/el/profile.json \
profiles/el/custom_words.txt \
profiles/el/espeak_phonemes.txt \
profiles/el/phoneme_examples.txt \
profiles/el/frequent_words.txt \
profiles/el/sentences.ini \
profiles/el/stop_words.txt ${RHASSPY_APP}/profiles/el/
COPY profiles/de/profile.json \
profiles/de/custom_words.txt \
profiles/de/espeak_phonemes.txt \
profiles/de/phoneme_examples.txt \
profiles/de/frequent_words.txt \
profiles/de/sentences.ini \
profiles/de/stop_words.txt ${RHASSPY_APP}/profiles/de/
COPY profiles/de/kaldi/custom_words.txt \
profiles/de/kaldi/espeak_phonemes.txt \
profiles/de/kaldi/phoneme_examples.txt \
${RHASSPY_APP}/profiles/de/kaldi/
COPY profiles/it/profile.json \
profiles/it/custom_words.txt \
profiles/it/espeak_phonemes.txt \
profiles/it/phoneme_examples.txt \
profiles/it/frequent_words.txt \
profiles/it/sentences.ini \
profiles/it/stop_words.txt ${RHASSPY_APP}/profiles/it/
COPY profiles/es/profile.json \
profiles/es/custom_words.txt \
profiles/es/espeak_phonemes.txt \
profiles/es/phoneme_examples.txt \
profiles/es/frequent_words.txt \
profiles/es/sentences.ini \
profiles/es/stop_words.txt ${RHASSPY_APP}/profiles/es/
COPY profiles/fr/profile.json \
profiles/fr/custom_words.txt \
profiles/fr/espeak_phonemes.txt \
profiles/fr/phoneme_examples.txt \
profiles/fr/frequent_words.txt \
profiles/fr/sentences.ini \
profiles/fr/stop_words.txt ${RHASSPY_APP}/profiles/fr/
COPY profiles/ru/profile.json \
profiles/ru/custom_words.txt \
profiles/ru/espeak_phonemes.txt \
profiles/ru/phoneme_examples.txt \
profiles/ru/frequent_words.txt \
profiles/ru/sentences.ini \
profiles/ru/stop_words.txt ${RHASSPY_APP}/profiles/ru/
COPY profiles/nl/profile.json \
profiles/nl/custom_words.txt \
profiles/nl/espeak_phonemes.txt \
profiles/nl/phoneme_examples.txt \
profiles/nl/frequent_words.txt \
profiles/nl/sentences.ini \
profiles/nl/stop_words.txt ${RHASSPY_APP}/profiles/nl/
COPY profiles/nl/kaldi/custom_words.txt \
profiles/nl/kaldi/espeak_phonemes.txt \
profiles/nl/kaldi/phoneme_examples.txt \
${RHASSPY_APP}/profiles/nl/kaldi/
COPY profiles/vi/profile.json \
profiles/vi/custom_words.txt \
profiles/vi/espeak_phonemes.txt \
profiles/vi/phoneme_examples.txt \
profiles/vi/frequent_words.txt \
profiles/vi/sentences.ini \
profiles/vi/stop_words.txt ${RHASSPY_APP}/profiles/vi/
COPY profiles/pt/profile.json \
profiles/pt/custom_words.txt \
profiles/pt/espeak_phonemes.txt \
profiles/pt/phoneme_examples.txt \
profiles/pt/frequent_words.txt \
profiles/pt/sentences.ini \
profiles/pt/stop_words.txt ${RHASSPY_APP}/profiles/pt/
COPY profiles/sv/profile.json \
profiles/sv/custom_words.txt \
profiles/sv/espeak_phonemes.txt \
profiles/sv/phoneme_examples.txt \
profiles/sv/frequent_words.txt \
profiles/sv/sentences.ini \
profiles/sv/stop_words.txt ${RHASSPY_APP}/profiles/sv/
COPY profiles/ca/profile.json \
profiles/ca/custom_words.txt \
profiles/ca/espeak_phonemes.txt \
profiles/ca/phoneme_examples.txt \
profiles/ca/frequent_words.txt \
profiles/ca/sentences.ini \
profiles/ca/stop_words.txt ${RHASSPY_APP}/profiles/ca/
COPY profiles/en/profile.json \
profiles/en/custom_words.txt \
profiles/en/espeak_phonemes.txt \
profiles/en/phoneme_examples.txt \
profiles/en/frequent_words.txt \
profiles/en/sentences.ini \
profiles/en/stop_words.txt ${RHASSPY_APP}/profiles/en/
COPY profiles/en/kaldi/custom_words.txt \
profiles/en/kaldi/espeak_phonemes.txt \
profiles/en/kaldi/phoneme_examples.txt \
${RHASSPY_APP}/profiles/en/kaldi/
COPY profiles/ ${RHASSPY_APP}/profiles/
COPY profiles/defaults.json ${RHASSPY_APP}/profiles/
COPY docker/rhasspy ${RHASSPY_APP}/bin/
@@ -209,6 +83,7 @@ COPY rhasspy/train/jsgf2fst/*.py ${RHASSPY_APP}/rhasspy/train/jsgf2fst/
COPY rhasspy/train/*.py ${RHASSPY_APP}/rhasspy/train/
COPY *.py ${RHASSPY_APP}/
COPY rhasspy/*.py ${RHASSPY_APP}/rhasspy/
COPY VERSION ${RHASSPY_APP}/
ENV CONFIG_PATH /data/options.json
ENV KALDI_PREFIX /opt
+1
@@ -7,3 +7,4 @@ COPY rhasspy/train/jsgf2fst/*.py ${RHASSPY_APP}/rhasspy/train/jsgf2fst/
COPY rhasspy/train/*.py ${RHASSPY_APP}/rhasspy/train/
COPY *.py ${RHASSPY_APP}/
COPY rhasspy/*.py ${RHASSPY_APP}/rhasspy/
COPY VERSION ${RHASSPY_APP}/
+1 -1
@@ -1 +1 @@
RUN apt-get install --no-install-recommends --yes flite libttspico-utils
RUN apt-get install --no-install-recommends --yes flite libttspico-utils lame
+1 -1
@@ -1 +1 @@
theme: jekyll-theme-cayman
theme: jekyll-theme-cayman
+35 -1
@@ -2,15 +2,49 @@
Rhasspy was created and is currently maintained by [Michael Hansen](https://synesthesiam.com/).
![Mike head](img/mike-head.png)
<img src="../img/mike-head.png" style="max-height: 100px;" title="Mike head">
Special thanks to:
* [Romkabouter](https://github.com/Romkabouter)
* [koenvervloesem](https://github.com/koenvervloesem)
* [FunkyBoT](https://community.home-assistant.io/u/FunkyBoT)
* [fastjack](https://community.rhasspy.org/u/fastjack)
* [S_n_Nguy_n](https://community.home-assistant.io/u/S_n_Nguy_n)
## Motivation
A typical voice assistant (Alexa, Google Home, etc.) solves a number of important problems:
1. Deciding when to record audio ([wake word](wake-word.md))
2. Listening for voice commands ([command listener](command-listener.md))
3. Transcribing command/question ([speech to text](speech-to-text.md))
4. Interpreting the speaker's **intent** from the text ([intent recognition](intent-recognition.md))
5. Fulfilling the speaker's intent ([intent handling](intent-handling.md))
Rhasspy provides **offline, private solutions** to problems 1-4 using off-the-shelf tools. These tools are:
* **Wake word**
* [Pocketsphinx keyphrase](https://cmusphinx.github.io/wiki/tutoriallm/#using-keyword-lists-with-pocketsphinx)
* [Mycroft Precise](https://github.com/MycroftAI/mycroft-precise)
* [snowboy](https://snowboy.kitt.ai)
* [porcupine](https://github.com/Picovoice/Porcupine)
* **Command listener**
* [webrtcvad](https://github.com/wiseman/py-webrtcvad)
* **Speech to text**
* [Pocketsphinx](https://github.com/cmusphinx/pocketsphinx)
* [Kaldi](https://kaldi-asr.org)
* **Intent recognition**
* [OpenFST](https://www.openfst.org)
* [fuzzywuzzy](https://github.com/seatgeek/fuzzywuzzy)
* [Mycroft Adapt](https://github.com/MycroftAI/adapt)
* [flair](http://github.com/zalandoresearch/flair)
* [Rasa NLU](https://rasa.com/)
For problem 5 (fulfilling the speaker's intent), Rhasspy works with external home automation software, such as Home Assistant's built-in [automation capability](https://www.home-assistant.io/docs/automation/) or a [Node-RED flow](https://nodered.org).
For each intent you define, Rhasspy emits a JSON event that can do anything Home Assistant can do (toggle switches, call REST services, etc.). This means that Rhasspy will do very little out of the box compared to other voice assistants, but there are also *no limits* to what can be done.
## Supporting Tools
The following tools/libraries help to support Rhasspy:
+8 -9
@@ -22,11 +22,11 @@ Add to your [profile](profiles.md):
```
Set `microphone.pyaudio.device` to a PyAudio device number or leave blank for the default device.
Streams 30ms chunks of 16-bit, 16 Khz mono audio by default (480 frames).
Streams 30ms chunks of 16-bit, 16 kHz mono audio by default (480 frames).
See `rhasspy.audio_recorder.PyAudioRecorder` for details.
## ALSA
## ALSA
Starts an `arecord` process locally and reads audio data from its standard out.
Works best with [ALSA](https://www.alsa-project.org/main/index.php/Main_Page).
@@ -42,7 +42,7 @@ Add to your [profile](profiles.md):
}
}
```
Set `microphone.arecord.device` to the name of the ALSA device to use (`-D` flag
to `arecord`) or leave blank for the default device.
By default, calls `arecord -t raw -r 16000 -f S16_LE -c 1` and reads 30ms (960
@@ -52,7 +52,7 @@ See `rhasspy.audio_recorder.ARecordAudioRecorder` for details.
## MQTT/Hermes
Listens to the `hermes/audioServer/<SITE_ID>/audioFrame` topic for WAV data ([Hermes protocol](https://docs.snips.ai/ressources/hermes-protocol)).
Listens to the `hermes/audioServer/<SITE_ID>/audioFrame` topic for WAV data ([Hermes protocol](https://docs.snips.ai/reference/hermes)).
This allows Rhasspy to receive audio from [Snips.AI](https://snips.ai/).
Audio data is automatically converted to 16-bit, 16 kHz mono with [sox](http://sox.sourceforge.net).
@@ -72,7 +72,7 @@ Add to your [profile](profiles.md):
"site_id": "default"
}
```
Adjust the `mqtt` configuration to connect to your MQTT broker.
Set `mqtt.site_id` to match your Snips.AI siteId.
@@ -80,7 +80,7 @@ See `rhasspy.audio_recorder.HermesAudioRecorder` for details.
## HTTP Stream
Accepts chunks of 16-bit 16Khz mono audio via an HTTP POST stream (assumes [chunked transfer encoding](https://en.wikipedia.org/wiki/Chunked_transfer_encoding)).
Accepts chunks of 16-bit 16 kHz mono audio via an HTTP POST stream (assumes [chunked transfer encoding](https://en.wikipedia.org/wiki/Chunked_transfer_encoding)).
Add to your [profile](profiles.md):
@@ -95,7 +95,7 @@ Add to your [profile](profiles.md):
}
```
Set `microphone.http.stop_after` to one of "never", "text", or "intent". When set to "never", you can continously stream (chunked) audio into Rhasspy across multiple voice commands. When set to "text" or "intent", the stream will be closed when the first voice command has been transcribed ("text") or recognized ("intent"). Once closed, you can perform an HTTP GET request to the stream URL to retrieve the result (text for transcriptions or JSON for intent).
Set `microphone.http.stop_after` to one of "never", "text", or "intent". When set to "never", you can continuously stream (chunked) audio into Rhasspy across multiple voice commands. When set to "text" or "intent", the stream will be closed when the first voice command has been transcribed ("text") or recognized ("intent"). Once closed, you can perform an HTTP GET request to the stream URL to retrieve the result (text for transcriptions or JSON for intent).
Note that `microphone.http.port` must be different than Rhasspy's webserver port (usually 12101).
@@ -122,7 +122,7 @@ Set `microphone.gstreamer.pipeline` to your GStreamer pipeline **without a sink*
udpsrc port=12333 ! rawaudioparse use-sink-caps=false format=pcm pcm-format=s16le sample-rate=16000 num-channels=1 ! queue ! audioconvert ! audioresample
```
which "simply" receives raw 16-bit 16khz audio chunks via UDP port 12333. You could stream microphone audio to Rhasspy from another machine by running the following terminal command:
which "simply" receives raw 16-bit 16 kHz audio chunks via UDP port 12333. You could stream microphone audio to Rhasspy from another machine by running the following terminal command:
```bash
gst-launch-1.0 \
@@ -152,4 +152,3 @@ Add to your [profile](profiles.md):
```
See `rhasspy.audio_recorder.DummyAudioRecorder` for details.
+25 -22
@@ -9,41 +9,44 @@ Plays WAV files on the local device by calling the `aplay` command. Should work
Add to your [profile](profiles.md):
"sounds": {
"system": "aplay",
"aplay": {
"device": ""
}
}
```json
"sounds": {
"system": "aplay",
"aplay": {
"device": ""
}
}
```
If provided, `sounds.aplay.device` is passed to `aplay` with the `-D` argument.
Leave it blank to use the default device.
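As a rough illustration of the `-D` handling described above, the sketch below builds the `aplay` command line from a WAV path and an optional device name. The function name `aplay_command` and the example device `plughw:1,0` are hypothetical; Rhasspy's actual player is the Python class named below, so treat this as an assumption-laden approximation rather than its implementation.

```bash
# Hypothetical helper (not part of Rhasspy): prints the aplay command line
# that would be used for a WAV file and an optional ALSA device.
aplay_command() {
    local wav_path device
    wav_path="$1"
    device="${2:-}"

    if [ -n "${device}" ]; then
        # A device was configured: pass it with -D
        printf '%s\n' "aplay -D ${device} ${wav_path}"
    else
        # Blank device: let aplay pick the default device
        printf '%s\n' "aplay ${wav_path}"
    fi
}

aplay_command /tmp/beep.wav
aplay_command /tmp/beep.wav plughw:1,0
```

The function only prints the command so that the effect of a blank versus non-blank `sounds.aplay.device` is easy to compare.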
See `rhasspy.audio_player.APlayAudioPlayer` for details.
## MQTT/Hermes
Publishes WAV data to the `hermes/audioServer/<SITE_ID>/playBytes/<REQUEST_ID>` topic ([Hermes protocol](https://docs.snips.ai/ressources/hermes-protocol)).
Publishes WAV data to the `hermes/audioServer/<SITE_ID>/playBytes/<REQUEST_ID>` topic ([Hermes protocol](https://docs.snips.ai/reference/hermes)).
This allows Rhasspy to send audio to [Snips.AI](https://snips.ai/).
Rhasspy will always try to send 16 kHz, 16-bit mono audio.
By default, Rhasspy sends 16 kHz, 16-bit mono audio unless configured otherwise.
The request id is generated each time a sound is played using `uuid.uuid4`.
Add to your [profile](profiles.md):
"sounds": {
"system": "hermes"
},
"mqtt": {
"enabled": true,
"host": "localhost",
"username": "",
"port": 1883,
"password": "",
"site_id": "default"
}
```json
"sounds": {
"system": "hermes"
},
"mqtt": {
"enabled": true,
"host": "localhost",
"username": "",
"port": 1883,
"password": "",
"site_id": "default"
}
```
Adjust the `mqtt` configuration to connect to your MQTT broker.
Set `mqtt.site_id` to match your Snips.AI siteId.
+13 -10
@@ -11,7 +11,6 @@ You can also make Rhasspy record a voice command using the [HTTP API](usage.md#h
2. Speaking your voice command
3. POST-ing to `/api/stop-recording`. Rhasspy will stop recording and process the voice command.
## WebRTCVAD
Listens for voice commands using [webrtcvad](https://github.com/wiseman/py-webrtcvad) to detect speech and silence.
@@ -33,11 +32,11 @@ Add to your [profile](profiles.md):
}
}
```
This system listens for up to `timeout_sec` for a voice command. The first few frames of audio data are discarded (`throwaway_buffers`) to avoid clicks from the microphone being engaged. When speech is detected for some number of successive frames (`speech_buffers`), the voice command is considered to have *started*. After `min_sec`, Rhasspy will start listening for silence. If at least `silence_sec` goes by without any speech detected, the command is considered *finished*, and the recorded WAV data is sent to the [speech recognition system](speech-to-text.md).
You may want to adjust `min_sec`, `silence_sec`, and `vad_mode` for your environment.
These control how short a voice command can be (`min_sec`), how much silence is required before Rhasspy stops listening (`silence_sec`), and how sensitive the voice activity detector is (`vad_mode`, higher is more sensitive).
These control how short a voice command can be (`min_sec`), how much silence is required before Rhasspy stops listening (`silence_sec`), and how aggressively non-speech is filtered out (`vad_mode`, an integer from 0 to 3, where 0 is the least aggressive and 3 is the most aggressive).
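The listening logic described above can be sketched as a small state machine. The example below is a simplification under stated assumptions: it replays a string of per-frame decisions (`1` for speech, `0` for silence) instead of real audio, and it expresses `min_sec`, `silence_sec`, and `timeout_sec` as frame counts. The function `segment_command` and its argument order are hypothetical and are not taken from Rhasspy's code.

```bash
# Hypothetical sketch (not Rhasspy's implementation).
# Usage: segment_command FRAMES THROWAWAY SPEECH_BUFFERS MIN_FRAMES SILENCE_FRAMES TIMEOUT_FRAMES
# FRAMES is a string such as "0011100000" (1 = speech frame, 0 = silence frame).
segment_command() {
    local frames throwaway speech_buffers min_frames silence_frames timeout_frames
    local total rest flag i started start_index speech_run silence_run
    frames="$1"; throwaway="$2"; speech_buffers="$3"
    min_frames="$4"; silence_frames="$5"; timeout_frames="$6"

    total=${#frames}
    rest="${frames}"
    i=0; started=0; start_index=0; speech_run=0; silence_run=0

    while [ "${i}" -lt "${total}" ]; do
        # Give up once the overall timeout is reached
        if [ "${i}" -ge "${timeout_frames}" ]; then
            echo "timeout ${i}"
            return 0
        fi

        # Take the next frame decision off the front of the string
        flag="${rest%"${rest#?}"}"
        rest="${rest#?}"

        # The first few frames are discarded (throwaway_buffers)
        if [ "${i}" -ge "${throwaway}" ]; then
            if [ "${started}" -eq 0 ]; then
                # Wait for enough successive speech frames (speech_buffers)
                if [ "${flag}" = "1" ]; then
                    speech_run=$((speech_run + 1))
                else
                    speech_run=0
                fi
                if [ "${speech_run}" -ge "${speech_buffers}" ]; then
                    started=1
                    start_index="${i}"
                fi
            elif [ $((i - start_index)) -ge "${min_frames}" ]; then
                # After the minimum length, count consecutive silence frames
                if [ "${flag}" = "0" ]; then
                    silence_run=$((silence_run + 1))
                else
                    silence_run=0
                fi
                if [ "${silence_run}" -ge "${silence_frames}" ]; then
                    echo "finished ${i}"
                    return 0
                fi
            fi
        fi
        i=$((i + 1))
    done

    echo "incomplete ${total}"
}

# 1 throwaway frame, 2 speech frames to start, 2 frames minimum, 3 silent frames to finish
segment_command 0011100000 1 2 2 3 100
```

Raising the silence threshold in this sketch makes the listener wait longer before finishing, which mirrors the effect of increasing `silence_sec`.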
**NOTE**: you must set `chunk_size` such that (relative to sample rate) it produces 10, 20, or 30 millisecond buffers. This is required by `webrtcvad`.
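To check a candidate `chunk_size` against this requirement, the arithmetic can be sketched as follows. This assumes that `chunk_size` is counted in bytes of 16-bit (2-byte) mono samples, which matches the 30 ms / 480 frame figure for 16 kHz audio elsewhere in these docs but is still an assumption here. The helper names are hypothetical.

```bash
# Hypothetical helpers (not part of Rhasspy). Assumes 16-bit mono samples,
# i.e. 2 bytes per frame, unless a third argument says otherwise.

# Number of audio frames in a chunk of the given duration
frames_per_chunk() {
    local rate ms
    rate="$1"; ms="$2"
    echo $((rate * ms / 1000))
}

# Number of bytes in a chunk of the given duration
bytes_per_chunk() {
    local rate ms bytes_per_frame
    rate="$1"; ms="$2"; bytes_per_frame="${3:-2}"
    echo $((rate * ms / 1000 * bytes_per_frame))
}

# Succeeds only if the chunk lasts exactly 10, 20, or 30 milliseconds
is_valid_chunk_size() {
    local chunk_bytes rate bytes_per_frame frames
    chunk_bytes="$1"; rate="$2"; bytes_per_frame="${3:-2}"

    # The chunk must hold a whole number of frames
    [ $((chunk_bytes % bytes_per_frame)) -eq 0 ] || return 1
    frames=$((chunk_bytes / bytes_per_frame))

    # The duration must be a whole number of milliseconds
    [ $((frames * 1000 % rate)) -eq 0 ] || return 1

    case $((frames * 1000 / rate)) in
        10|20|30) return 0 ;;
        *) return 1 ;;
    esac
}

frames_per_chunk 16000 30   # 480 frames
bytes_per_chunk 16000 30    # 960 bytes
is_valid_chunk_size 1024 16000 || echo "1024 bytes at 16 kHz is 32 ms, which webrtcvad rejects"
```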
@@ -60,15 +59,15 @@ Add to your [profile](profiles.md):
}
}
```
See `rhasspy.command_listener.OneShotCommandListener` for details.
## MQTT/Hermes
Subscribes to the `hermes/asr/startListening` and `hermes/asr/stopListening` topics ([Hermes protocol](https://docs.snips.ai/ressources/hermes-protocol)).
Subscribes to the `hermes/asr/startListening` and `hermes/asr/stopListening` topics ([Hermes protocol](https://docs.snips.ai/reference/hermes)).
This allows Rhasspy to be controlled by [Snips.AI](https://snips.ai/).
Wakes up Rhasspy when `startListening` is received and starts recording. Stops recording when `stopListening` is received and processes the voice command.
Wakes up Rhasspy when `startListening` is received and starts recording. Stops recording when `stopListening` is received and processes the voice command.
Add to your [profile](profiles.md):
@@ -96,12 +95,16 @@ Set `mqtt.site_id` to match your Snips.AI siteId.
Using [mosquitto_pub](https://mosquitto.org/man/mosquitto_pub-1.html), wake up Rhasspy with:
mosquitto_pub -t 'hermes/asr/startListening' -m '{ "siteId": "default" }'
```bash
mosquitto_pub -t 'hermes/asr/startListening' -m '{ "siteId": "default" }'
```
Say your voice command, then stop recording with:
mosquitto_pub -t 'hermes/asr/stopListening' -m '{ "siteId": "default" }'
```bash
mosquitto_pub -t 'hermes/asr/stopListening' -m '{ "siteId": "default" }'
```
Rhasspy should process your voice command.
See `rhasspy.command.HermesCommandListener` for details.
+84
@@ -0,0 +1,84 @@
# Development
Rhasspy's code can be found [on GitHub](https://github.com/synesthesiam/rhasspy).
## Set up your development environment
If you want to start developing on Rhasspy, [fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the repository, and clone your fork:
```bash
git clone https://github.com/<your_username>/rhasspy.git
cd rhasspy
```
Add the original repository as an [upstream remote](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/configuring-a-remote-for-a-fork):
```bash
git remote add upstream https://github.com/synesthesiam/rhasspy.git
```
Then follow the installation steps for a [virtual environment](installation.md#virtual-environment). If the `create-venv.sh` script fails, please [report an issue](https://github.com/synesthesiam/rhasspy/issues) before proceeding.
If you pull changes, make sure to re-download and extract `rhasspy-web-dist.tar.gz` from [the releases page](https://github.com/synesthesiam/rhasspy/releases/tag/v2.0). This contains the pre-compiled web artifacts. Alternatively, you can install [yarn](https://yarnpkg.com) and run `yarn build` in the `rhasspy` directory after a `git pull`.
## Run the unit tests
A good way to check whether your development environment is set up correctly (or to find some bugs) is to run the unit tests:
```bash
./run-tests.sh
```
This will run tests against pre-recorded WAV files in `rhasspy/etc/test` for specific languages. You can run tests only for a specific language (profile) like this:
```bash
./run-tests.sh -p en
```
It's good practice to run the unit tests before and after you work on something, to be sure your changes don't accidentally break something.
## Keeping your fork synchronized
When the upstream repository has new commits, you should [synchronize your fork](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/syncing-a-fork):
```bash
git fetch upstream
git checkout master
git merge upstream/master
```
Then [update your fork on GitHub](https://help.github.com/en/github/using-git/pushing-commits-to-a-remote-repository):
```bash
git push
```
Your fork is now synchronized to the original repository.
## Development practices
* Before starting significant work, please propose it and discuss it first on the [issue tracker](https://github.com/synesthesiam/rhasspy/issues) on GitHub. Other people may have suggestions, will want to collaborate and will wish to review your code.
* Please work on one piece of conceptual work at a time. Keep each narrative of work in a different branch.
* As much as possible, have each commit solve one problem.
* A commit must not leave the project in a non-functional state.
* Run the unit tests before you create a commit.
* Treat code, tests and documentation as one.
* Create a [pull request](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request-from-a-fork) from your fork.
## Development workflow
If you want to start working on a specific feature or bug fix, this is an example workflow:
* Synchronize your fork with the upstream repository.
* Create a new branch: `git checkout -b <nameofbranch>`
* Create your changes.
* Add the changed files with `git add <files>`.
* Commit your changes with `git commit`.
* Push your changes to your fork on GitHub.
* Create a pull request from your fork.
## License of contributions
By submitting patches to this project, you agree to allow them to be redistributed under the project's [license](license.md) according to the normal forms and usages of the open source community.
It is your responsibility to make sure you have all the necessary rights to contribute to the project.
+4 -1
@@ -4,6 +4,9 @@ Rhasspy is designed to be run on different kinds of hardware, such as:
* Raspberry Pi 2-3 B/B+ (`armhf`/`aarch64`)
* Desktop/laptop/server (`amd64`)
* Raspberry Pi Zero (`armv6l`)
* You must use a [virtual environment](installation.md#virtual-environment)
* The [Kaldi speech recognizer](speech-to-text.md#kaldi) is **not** supported
The table below summarizes architecture compatibility with Rhasspy's components:
@@ -30,7 +33,7 @@ The table below summarizes architecture compatibility with Rhasspy's components:
To run Rhasspy on a Raspberry Pi, you'll need at least a 4 GB SD card and a good power supply. I highly recommend the [CanaKit Starter Kit](https://www.amazon.com/CanaKit-Raspberry-Starter-Premium-Black/dp/B07BCC8PK7), which includes a 32 GB SD card, a 2.5 A power supply, and a case.
Some components of Rhasspy will not work on the Raspberry Pi 3 B+ model (`aarch64`). As of the time of this writing, these are:
Some components of Rhasspy will not work on the Raspberry Pi 3 B+ model with a 64-bit operating system (`aarch64`). As of the time of this writing, these are:
* [snowboy](wake-word.md#snowboy) (wake word)
* [Mycroft Precise](wake-word.md#mycroft-precise) (wake word)
Binary file not shown. Size: 20 KiB

+140
@@ -0,0 +1,140 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:cc="http://creativecommons.org/ns#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns="http://www.w3.org/2000/svg"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
width="149.42726mm"
height="36.848656mm"
viewBox="0 0 149.42726 36.848656"
version="1.1"
id="svg860"
inkscape:version="0.92.3 (2405546, 2018-03-11)"
sodipodi:docname="rhasspy-discourse-logo.svg"
inkscape:export-filename="./rhasspy-discourse-logo.png"
inkscape:export-xdpi="82.716721"
inkscape:export-ydpi="82.716721">
<defs
id="defs854" />
<sodipodi:namedview
id="base"
pagecolor="#ffffff"
bordercolor="#666666"
borderopacity="1.0"
inkscape:pageopacity="0.0"
inkscape:pageshadow="2"
inkscape:zoom="0.9899495"
inkscape:cx="268.11251"
inkscape:cy="139.11788"
inkscape:document-units="mm"
inkscape:current-layer="layer1"
showgrid="false"
fit-margin-top="0"
fit-margin-left="0"
fit-margin-right="0"
fit-margin-bottom="0"
inkscape:window-width="1440"
inkscape:window-height="755"
inkscape:window-x="0"
inkscape:window-y="0"
inkscape:window-maximized="1" />
<metadata
id="metadata857">
<rdf:RDF>
<cc:Work
rdf:about="">
<dc:format>image/svg+xml</dc:format>
<dc:type
rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
<dc:title></dc:title>
</cc:Work>
</rdf:RDF>
</metadata>
<g
inkscape:label="Layer 1"
inkscape:groupmode="layer"
id="layer1"
transform="translate(47.552776,-100.1735)">
<circle
style="opacity:1;fill:#ffffff;fill-opacity:1;stroke:#000000;stroke-width:0.5;stroke-linecap:round;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
id="path1476"
cx="-29.128448"
cy="118.59783"
r="18.174328" />
<g
transform="matrix(0.80207931,0,0,0.80207931,-74.139422,96.215375)"
id="g2275">
<g
id="text817"
style="font-style:normal;font-weight:normal;font-size:41.37965775px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1.03449142"
transform="rotate(-45)"
aria-label="R">
<path
sodipodi:nodetypes="ccccccccccccccccssccccccccc"
inkscape:connector-curvature="0"
id="path819"
style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:41.38083267px;font-family:'CC Adamantium';-inkscape-font-specification:'CC Adamantium, Normal';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:start;writing-mode:lr-tb;text-anchor:start;stroke-width:1.03449142"
d="M 31.509252,62.491941 31.16794,75.667305 H 30.83455 L 28.604505,65.027061 15.316738,59.73528 13.781121,75.667305 H 13.286086 L 11.528138,71.268365 10.110857,66.929543 9.526899,61.048002 6.9616279,56.651114 8.7144255,54.299084 6.356732,51.899807 8.2521138,50.246675 6.1006224,45.789404 9.891565,42.358563 c 2.435726,-1.492588 4.806268,-0.545105 7.30443,-1.317335 4.174203,-1.290327 7.29492,-1.792422 11.275957,5.059621 0.756691,1.302392 3.239334,1.578749 4.130578,3.198298 -0.882306,1.555823 -2.064327,2.923061 -3.546063,4.101714 -1.481735,1.171918 -3.152055,2.175457 -5.01096,3.010617 -1.852169,0.828425 -3.852512,1.535617 -6.001029,2.121576 z M 25.51612,48.388298 c -6.142518,4.42909 -6.341445,-0.106922 -8.663766,-3.716207 l -1.283048,13.860963 c 5.545523,-1.913183 8.340713,-6.051669 9.946814,-10.144756 z" />
</g>
<ellipse
ry="0.93544334"
rx="0.33408689"
cy="21.859995"
cx="52.059788"
id="path2115"
style="opacity:1;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1;stroke-opacity:1" />
<ellipse
transform="rotate(-45)"
style="opacity:1;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1;stroke-opacity:1"
id="ellipse2117"
cx="18.873178"
cy="50.914211"
rx="0.33408689"
ry="0.93544334" />
<path
inkscape:connector-curvature="0"
id="path2119"
d="m 64.331743,23.950737 -0.788701,-2.167883 0.785662,-0.376444 0.715334,2.441702 z"
style="fill:#000000;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
<path
style="fill:#000000;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 69.630908,29.701094 1.48309,-1.766977 0.718843,0.492181 -1.75691,1.840348 z"
id="path2121"
inkscape:connector-curvature="0" />
<path
sodipodi:nodetypes="cscc"
inkscape:connector-curvature="0"
id="path2123"
d="m 47.978861,19.145376 c -0.0362,0.284741 -0.632118,0.443544 -1.331028,0.354698 -0.698909,-0.08885 -1.236142,-0.391701 -1.199944,-0.676442 z"
style="opacity:1;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1;stroke-opacity:1" />
<path
style="opacity:1;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1;stroke-opacity:1"
d="m 49.344113,18.846496 c 0.224679,0.178626 0.762248,-0.123625 1.200693,-0.675116 0.438451,-0.55148 0.611752,-1.143345 0.387075,-1.321972 z"
id="path2126"
inkscape:connector-curvature="0"
sodipodi:nodetypes="cscc" />
<path
sodipodi:nodetypes="ccc"
inkscape:connector-curvature="0"
id="path2128"
d="m 43.707615,19.788656 8.68626,10.557147 c 2.944473,-4.699489 1.792375,-9.398979 -0.200452,-14.098468"
style="opacity:1;fill:none;stroke:#808080;stroke-width:0.5;stroke-linecap:round;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
</g>
<text
xml:space="preserve"
style="font-style:normal;font-weight:normal;font-size:30.12816238px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#ffffff;fill-opacity:1;stroke:#000000;stroke-width:0.5"
x="-5.9640822"
y="128.49496"
id="text824"><tspan
sodipodi:role="line"
id="tspan822"
x="-5.9640822"
y="128.49496"
style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:30.12828064px;font-family:'Sansus Webissimo';-inkscape-font-specification:'Sansus Webissimo, Normal';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:start;writing-mode:lr-tb;text-anchor:start;fill:#ffffff;stroke:#000000;stroke-width:0.5">RHASSPY</tspan></text>
</g>
</svg>

+123
@@ -0,0 +1,123 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:cc="http://creativecommons.org/ns#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns="http://www.w3.org/2000/svg"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
width="36.848656mm"
height="36.848656mm"
viewBox="0 0 36.848656 36.848656"
version="1.1"
id="svg860"
inkscape:version="0.92.3 (2405546, 2018-03-11)"
sodipodi:docname="rhasspy-raven-square.svg"
inkscape:export-filename="./rhasspy-discourse-square-logo-nocircle.png"
inkscape:export-xdpi="352.92468"
inkscape:export-ydpi="352.92468">
<defs
id="defs854" />
<sodipodi:namedview
id="base"
pagecolor="#ffffff"
bordercolor="#666666"
borderopacity="1.0"
inkscape:pageopacity="0.0"
inkscape:pageshadow="2"
inkscape:zoom="1.979899"
inkscape:cx="-98.08577"
inkscape:cy="43.808495"
inkscape:document-units="mm"
inkscape:current-layer="layer1"
showgrid="false"
fit-margin-top="0"
fit-margin-left="0"
fit-margin-right="0"
fit-margin-bottom="0"
inkscape:window-width="1440"
inkscape:window-height="755"
inkscape:window-x="0"
inkscape:window-y="0"
inkscape:window-maximized="1" />
<metadata
id="metadata857">
<rdf:RDF>
<cc:Work
rdf:about="">
<dc:format>image/svg+xml</dc:format>
<dc:type
rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
<dc:title></dc:title>
</cc:Work>
</rdf:RDF>
</metadata>
<g
inkscape:label="Layer 1"
inkscape:groupmode="layer"
id="layer1"
transform="translate(47.552776,-100.1735)">
<g
transform="matrix(0.80207931,0,0,0.80207931,-74.139422,96.215375)"
id="g2275">
<g
id="text817"
style="font-style:normal;font-weight:normal;font-size:41.37965775px;line-height:1.25;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1.03449142"
transform="rotate(-45)"
aria-label="R">
<path
sodipodi:nodetypes="ccccccccccccccccssccccccccc"
inkscape:connector-curvature="0"
id="path819"
style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:41.38083267px;font-family:'CC Adamantium';-inkscape-font-specification:'CC Adamantium, Normal';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;text-align:start;writing-mode:lr-tb;text-anchor:start;stroke-width:1.03449142"
d="M 31.509252,62.491941 31.16794,75.667305 H 30.83455 L 28.604505,65.027061 15.316738,59.73528 13.781121,75.667305 H 13.286086 L 11.528138,71.268365 10.110857,66.929543 9.526899,61.048002 6.9616279,56.651114 8.7144255,54.299084 6.356732,51.899807 8.2521138,50.246675 6.1006224,45.789404 9.891565,42.358563 c 2.435726,-1.492588 4.806268,-0.545105 7.30443,-1.317335 4.174203,-1.290327 7.29492,-1.792422 11.275957,5.059621 0.756691,1.302392 3.239334,1.578749 4.130578,3.198298 -0.882306,1.555823 -2.064327,2.923061 -3.546063,4.101714 -1.481735,1.171918 -3.152055,2.175457 -5.01096,3.010617 -1.852169,0.828425 -3.852512,1.535617 -6.001029,2.121576 z M 25.51612,48.388298 c -6.142518,4.42909 -6.341445,-0.106922 -8.663766,-3.716207 l -1.283048,13.860963 c 5.545523,-1.913183 8.340713,-6.051669 9.946814,-10.144756 z" />
</g>
<ellipse
ry="0.93544334"
rx="0.33408689"
cy="21.859995"
cx="52.059788"
id="path2115"
style="opacity:1;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1;stroke-opacity:1" />
<ellipse
transform="rotate(-45)"
style="opacity:1;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1;stroke-opacity:1"
id="ellipse2117"
cx="18.873178"
cy="50.914211"
rx="0.33408689"
ry="0.93544334" />
<path
inkscape:connector-curvature="0"
id="path2119"
d="m 64.331743,23.950737 -0.788701,-2.167883 0.785662,-0.376444 0.715334,2.441702 z"
style="fill:#000000;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
<path
style="fill:#000000;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 69.630908,29.701094 1.48309,-1.766977 0.718843,0.492181 -1.75691,1.840348 z"
id="path2121"
inkscape:connector-curvature="0" />
<path
sodipodi:nodetypes="cscc"
inkscape:connector-curvature="0"
id="path2123"
d="m 47.978861,19.145376 c -0.0362,0.284741 -0.632118,0.443544 -1.331028,0.354698 -0.698909,-0.08885 -1.236142,-0.391701 -1.199944,-0.676442 z"
style="opacity:1;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1;stroke-opacity:1" />
<path
style="opacity:1;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1;stroke-opacity:1"
d="m 49.344113,18.846496 c 0.224679,0.178626 0.762248,-0.123625 1.200693,-0.675116 0.438451,-0.55148 0.611752,-1.143345 0.387075,-1.321972 z"
id="path2126"
inkscape:connector-curvature="0"
sodipodi:nodetypes="cscc" />
<path
sodipodi:nodetypes="ccc"
inkscape:connector-curvature="0"
id="path2128"
d="m 43.707615,19.788656 8.68626,10.557147 c 2.944473,-4.699489 1.792375,-9.398979 -0.200452,-14.098468"
style="opacity:1;fill:none;stroke:#808080;stroke-width:0.5;stroke-linecap:round;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
</g>
</g>
</svg>

+39 -146
@@ -1,41 +1,51 @@
![Rhasspy logo](img/rhasspy.svg)
<img src="img/rhasspy.svg" style="max-height: 200px;" title="Rhasspy logo">
Rhasspy (pronounced RAH-SPEE) is an [open source](https://github.com/synesthesiam/rhasspy), fully offline voice assistant toolkit for [many languages](#supported-languages) that works well with [Home Assistant](https://www.home-assistant.io/), [Hass.io](https://www.home-assistant.io/hassio/), and [Node-RED](https://nodered.org).
Rhasspy transforms voice commands into [JSON](https://json.org) events that can trigger actions in home automation software, like [Home Assistant automations](https://www.home-assistant.io/docs/automation/trigger/#event-trigger) or [Node-RED flows](usage.md#node-red). You define custom voice commands in a [profile](profiles.md) using a [specialized template syntax](training.md), and Rhasspy takes care of the rest.
You specify voice commands in a [template language](training.md):
## Motivation
```
[LightState]
states = (on | off)
turn (<states>){state} [the] light
```
A typical voice assistant (Alexa, Google Home, etc.) solves a number of important problems:
and Rhasspy will produce [JSON](https://json.org) events that can trigger actions in [home automation software](https://www.home-assistant.io/docs/automation/trigger/#event-trigger) or [Node-RED flows](usage.md#node-red):
1. Deciding when to record audio ([wake word](wake-word.md))
2. Listening for voice commands ([command listener](command-listener.md))
3. Transcribing command/question ([speech to text](speech-to-text.md))
4. Interpreting the speaker's **intent** from the text ([intent recognition](intent-recognition.md))
5. Fulfilling the speaker's intent ([intent handling](intent-handling.md))
```json
{
"text": "turn on the light",
"intent": {
"name": "LightState"
},
"slots": {
"state": "on"
}
}
```
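As a rough illustration of how such an event might be consumed, here is a minimal Python sketch. The event text is copied from the example above; the `describe_event` and `handle_event` helpers are hypothetical names used only for illustration and are not part of Rhasspy.

```python
import json

# Example event copied from the JSON shown above.
EVENT_JSON = """
{
  "text": "turn on the light",
  "intent": {"name": "LightState"},
  "slots": {"state": "on"}
}
"""


def describe_event(event):
    """Return (intent name, slots) from an event shaped like the example."""
    intent_name = event.get("intent", {}).get("name", "")
    slots = event.get("slots", {})
    return intent_name, slots


def handle_event(event):
    """Hypothetical dispatch; a real setup would call Home Assistant or Node-RED."""
    intent_name, slots = describe_event(event)
    if intent_name == "LightState":
        return "light -> " + slots.get("state", "unknown")
    return "unhandled intent: " + intent_name


if __name__ == "__main__":
    print(handle_event(json.loads(EVENT_JSON)))
```

Branching on the intent name and reading values from `slots` mirrors what a Home Assistant automation or Node-RED flow would do with the same event.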
Rhasspy provides **offline, private solutions** to problems 1-4 using off-the-shelf tools. These tools are:
Rhasspy is <strong>optimized for</strong>:
* **Wake word**
* [Pocketsphinx keyphrase](https://cmusphinx.github.io/wiki/tutoriallm/#using-keyword-lists-with-pocketsphinx)
* [Mycroft Precise](https://github.com/MycroftAI/mycroft-precise)
* [snowboy](https://snowboy.kitt.ai)
* [porcupine](https://github.com/Picovoice/Porcupine)
* **Command listener**
* [webrtcvad](https://github.com/wiseman/py-webrtcvad)
* **Speech to text**
* [Pocketsphinx](https://github.com/cmusphinx/pocketsphinx)
* [Kaldi](https://kaldi-asr.org)
* **Intent recognition**
* [OpenFST](https://www.openfst.org)
* [fuzzywuzzy](https://github.com/seatgeek/fuzzywuzzy)
* [Mycroft Adapt](https://github.com/MycroftAI/adapt)
* [flair](http://github.com/zalandoresearch/flair)
* [Rasa NLU](https://rasa.com/)
* Working with external services via [MQTT](usage.md#mqtt), [HTTP](usage.md#http-api), and [Websockets](usage.md#websocket-events)
* Home Assistant and Hass.IO have [built-in support](usage.md#home-assistant)
* Pre-specified voice commands that are described well [by a grammar](training.md#sentencesini)
* You can also do [open-ended speech recognition](speech-to-text.md#open-transcription)
* Voice commands with [uncommon words or pronunciations](usage.md#words-tab)
* New words are added phonetically with [automated assistance](https://github.com/AdolfVonKleist/Phonetisaurus)
For problem 5 (fulfilling the speaker's intent), Rhasspy works with external home automation software, such as Home Assistant's built-in [automation capability](https://www.home-assistant.io/docs/automation/) or a [Node-RED flow](https://nodered.org).
## Getting Started
For each intent you define, Rhasspy emits a JSON event that can do anything Home Assistant can do (toggle switches, call REST services, etc.). This means that Rhasspy will do very little out of the box compared to other voice assistants, but there are also *no limits* to what can be done.
Ready to try Rhasspy? Follow the steps below and check out the [tutorials](tutorials.md).
1. Make sure you have the [necessary hardware](hardware.md)
2. Choose an [installation method](installation.md)
3. Access the [web interface](usage.md#web-interface) to download a profile
4. Author your [custom voice commands](training.md) and train Rhasspy
5. Connect Rhasspy to [Home Assistant](usage.md#home-assistant) or a [Node-RED](usage.md#node-red) flow
## Getting Help
If you have problems, please stop by the [Rhasspy community site](https://community.rhasspy.org) or [open a GitHub issue](https://github.com/synesthesiam/rhasspy/issues).
## Supported Languages
@@ -56,124 +66,7 @@ Rhasspy supports the following languages:
* Swedish (`sv`)
* Catalan (`ca`)
Support for these languages comes directly from existing [CMU Sphinx](https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/) and [Kaldi](https://montreal-forced-aligner.readthedocs.io/en/latest/pretrained_models.html) acoustic models.
It is possible to extend Rhasspy to new languages with only:
* A [phonetic dictionary](https://cmusphinx.github.io/wiki/tutorialdict/#using-g2p-seq2seq-to-extend-the-dictionary)
* A trained [acoustic model](https://cmusphinx.github.io/wiki/tutorialam/)
* A [grapheme to phoneme model](https://github.com/AdolfVonKleist/Phonetisaurus)
The table below summarizes language support across the various supporting technologies that Rhasspy uses:
| Category | Name | Offline? | en | de | es | fr | it | nl | ru | el | hi | zh | vi | pt | sv | ca |
| -------- | ------ | -------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- |
| **Wake Word** | [pocketsphinx](wake-word.md#pocketsphinx) | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | | | | |
| | [porcupine](wake-word.md#porcupine) | &#x2713; | &#x2713; | | | | | | | | | | | | | |
| | [snowboy](wake-word.md#snowboy) | *requires account* | &#x2713; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; |
| | [precise](wake-word.md#mycroft-precise) | &#x2713; | &#x2713; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; |
| **Speech to Text** | [pocketsphinx](speech-to-text.md#pocketsphinx) | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | | &#x2713; | | &#x2713; |
| | [kaldi](speech-to-text.md#kaldi) | &#x2713; | | | | | | | | | | | &#x2713; | | &#x2713; | |
| **Intent Recognition** | [fsticuffs](intent-recognition.md#fsticuffs) | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; |
| | [fuzzywuzzy](intent-recognition.md#fuzzywuzzy) | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; |
| | [adapt](intent-recognition.md#mycroft-adapt) | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; |
| | [flair](intent-recognition.md#flair) | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | | &#x2713; | | | | | | &#x2713; | | &#x2713; |
| | [rasaNLU](intent-recognition.md#rasanlu) | *needs extra software* | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; |
| **Text to Speech** | [espeak](text-to-speech.md#espeak) | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; |
| | [flite](text-to-speech.md#flite) | &#x2713; | &#x2713; | | | | | | | | &#x2713; | | | | | |
| | [picotts](text-to-speech.md#picotts) | &#x2713; | &#x2713; | | | | | | | | | | | | | |
| | [marytts](text-to-speech.md#marytts) | &#x2713; | &#x2713; | &#x2713; | | &#x2713; | &#x2713; | | &#x2713; | | | | | | | |
| | [wavenet](text-to-speech.md#google-wavenet) | | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | | &#x2713; | &#x2713; | | &#x2713; | &#x2713; | |
&bull; - yes, but requires training/customization
## How It Works
Rhasspy starts off asleep, listening for a [wake word](wake-word.md). Once awoken, it listens for a [voice command](command-listener.md). After the command is recorded, it is transcribed into text by the [speech to text](speech-to-text.md) system, and the text is then run through an [intent recognizer](intent-recognition.md). Finally, the recognized intent is used to generate an event that can be [handled by Home Assistant or Node-RED](intent-handling.md).
![Rhasspy overview](img/rhasspy-overview.png)
## Customization
Every step of Rhasspy's processing pipeline can be customized, including using a remote Rhasspy server via its [HTTP API](usage.md#http-api) for [speech to text](speech-to-text.md#remote-http-server) and [intent recognition](intent-recognition.md#remote-http-server). Some useful Rhasspy API endpoints are:
* `/api/listen-for-command`
* POST to wake Rhasspy up and start listening for a voice command
* `/api/train`
* POST to re-train your profile
* `/api/speech-to-intent`
* POST a WAV file and have Rhasspy process it as a voice command
* `/api/text-to-intent`
* POST text and have Rhasspy process it as command
* `/api/text-to-speech`
* POST text and have Rhasspy speak it
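A minimal Python sketch of calling one of these endpoints follows. It assumes a Rhasspy server on `localhost:12101` (the port used in the installation docs), that `/api/text-to-intent` accepts the command as a plain-text POST body, and that the reply is JSON; those details and the helper names are assumptions, not a confirmed description of the API.

```python
import json
import urllib.request

# Assumption: Rhasspy's web server is reachable on the default port.
BASE_URL = "http://localhost:12101"


def build_text_to_intent_request(text, base_url=BASE_URL):
    """Build (but do not send) a POST request for /api/text-to-intent."""
    return urllib.request.Request(
        url=base_url + "/api/text-to-intent",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def text_to_intent(text, base_url=BASE_URL):
    """Send the request and parse the reply, assuming the reply is JSON."""
    request = build_text_to_intent_request(text, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `text_to_intent("set the bedroom light to red")` would return the parsed reply. Keeping request construction separate from sending makes the URL and body easy to inspect without a live server.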
Additionally, you can call out to a custom external program for [wake word detection](wake-word.md#command), [voice command listening](command-listener.md#command), [speech recognition](speech-to-text.md#command), [intent recognition](intent-recognition.md#command), and event [intent handling](intent-handling.md#command)! This means that you can use Rhasspy as a general voice command toolkit, with or without Home Assistant.
## RGB Light Example
Let's say you have an RGB light of some kind in your bedroom that's [hooked up already to Home Assistant](https://www.home-assistant.io/components/light.mqtt). You'd like to be able to say things like "*set the bedroom light to red*" to change its color. To start, let's write a [Home Assistant automation](https://www.home-assistant.io/docs/automation/action/) to help you out:
automation:
# Change the light in the bedroom to red.
trigger:
...
action:
service: light.turn_on
data:
rgb_color: [255, 0, 0]
entity_id: light.bedroom
Now you just need the trigger! Rhasspy will send events that can be caught with the [event trigger platform](https://www.home-assistant.io/docs/automation/trigger/#event-trigger). A different event will be sent for each *intent* that you define, with slot values corresponding to important parts of the command (like light name and color). Let's start by defining an intent in Rhasspy called `ChangeLightColor` that can be said a few different ways:
[ChangeLightColor]
colors = (red | green | blue) {color}
set [the] (bedroom){name} [to] <colors>
This is a [simplified JSGF grammar](doc/sentences/md) that will generate the following sentences:
* set the bedroom to red
* set the bedroom to green
* set the bedroom to blue
* set the bedroom red
* set the bedroom green
* set the bedroom blue
* set bedroom to red
* set bedroom to green
* set bedroom to blue
* set bedroom red
* set bedroom green
* set bedroom blue
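The expansion above can be reproduced with a short Python sketch. This is not Rhasspy's own grammar parser. It hand-encodes the `ChangeLightColor` template as a list of alternatives (an empty string stands for an optional word being omitted) and takes the Cartesian product.

```python
from itertools import product

# Each position lists its alternatives; "" means the optional word is left out.
TEMPLATE = [
    ["set"],
    ["the", ""],               # [the]
    ["bedroom"],               # (bedroom){name}
    ["to", ""],                # [to]
    ["red", "green", "blue"],  # <colors>
]


def expand(template):
    """Return every sentence the template can produce."""
    sentences = []
    for words in product(*template):
        sentences.append(" ".join(word for word in words if word))
    return sentences


if __name__ == "__main__":
    for sentence in expand(TEMPLATE):
        print(sentence)
```

Two optional words and three colors give 2 × 2 × 3 = 12 sentences, matching the list.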
Rhasspy uses these sentences to create an [ARPA language model](https://cmusphinx.github.io/wiki/arpaformat/) for speech recognition, and also train an intent recognizer that can extract relevant parts of the command. The `{color}` tag in the `colors` rule will make Rhasspy put a `color` property in each event with the name of the recognized color (red, green, or blue). Likewise, the `{name}` tag on `bedroom` will add a `name` property to the event.
If trained on these sentences, Rhasspy will now recognize commands like "*set the bedroom light to red*" and send a `rhasspy_ChangeLightColor` event to Home Assistant with the following data:
{
"name": "bedroom",
"color": "red"
}
You can now fill in the rest of the Home Assistant automation:
automation:
# Change the light in the bedroom to red.
trigger:
platform: event
event_type: rhasspy_ChangeLightColor
event_data:
name: bedroom
color: red
action:
service: light.turn_on
data:
rgb_color: [255, 0, 0]
entity_id: light.bedroom
This will handle the specific case of setting the bedroom light to red, but not any other color. You can either add additional automations to handle these, or make use of [automation templating](https://www.home-assistant.io/docs/automation/templating/) to do it all at once.
Intended Audience
---------------------
## Intended Audience
Rhasspy is intended for advanced users that want to have a voice interface to Home Assistant, but value **privacy** and **freedom** above all else. There are many other voice assistants, but none (to my knowledge) that:
+138 -29
@@ -2,20 +2,22 @@
Rhasspy should run in a variety of software environments, including:
* Within a [Docker](https://www.docker.com/) container
* As a [Hass.io add-on](https://www.home-assistant.io/addons/)
* Inside a [Python virtual environment](https://docs.python-guide.org/dev/virtualenvs/)
* Within a [Docker](#docker) container
* As a [Hass.io add-on](#hassio)
* Inside a [Python virtual environment](#virtual-environment)
* Running as a [service](#running-as-a-service)
* Build [from source](#build-from-source)
### Docker
## Docker
The easiest way to try Rhasspy is with Docker. To get started, make sure you have [Docker installed](https://docs.docker.com/install/):
curl -sSL https://get.docker.com | sh
and that your user is part of the `docker` group:
sudo usermod -a -G docker $USER
**Be sure to reboot** after adding yourself to the `docker` group!
Next, start the [Rhasspy Docker image](https://hub.docker.com/r/synesthesiam/rhasspy-server) in the background:
@@ -27,9 +29,9 @@ Next, start the [Rhasspy Docker image](https://hub.docker.com/r/synesthesiam/rha
synesthesiam/rhasspy-server:latest \
--user-profiles /profiles \
--profile en
This will start Rhasspy with the English profile (`en`) in the background (`-d`) on port 12101 (`-p`) and give Rhasspy access to your microphone (`--device`). Any changes you make to [your profile](profiles.md) will be saved to `~/.config/rhasspy`.
Once it starts, Rhasspy's web interface should be accessible at [http://localhost:12101](http://localhost:12101). If something went wrong, try running docker with `-it` instead of `-d` to see the output.
If you're using [docker compose](https://docs.docker.com/compose/), add the following to your `docker-compose.yml` file:
@@ -44,10 +46,25 @@ If you're using [docker compose](https://docs.docker.com/compose/), add the foll
devices:
- "/dev/snd:/dev/snd"
command: --user-profiles /profiles --profile en
### Updating Docker Image
### Hass.io
To update your Rhasspy Docker image, just run:
The second easiest was to install Rhasspy is as a [Hass.io add-on](https://www.home-assistant.io/addons/). Following the [installation instructions for Hass.io](https://www.home-assistant.io/hassio/installation/) before proceeding.
```bash
docker pull synesthesiam/rhasspy-server:latest
```
on your Rhasspy server and restart the Docker container. This may require running something like:
```bash
docker rm <container-name>
```
before doing a `docker run...`
## Hass.io
The second easiest way to install Rhasspy is as a [Hass.io add-on](https://www.home-assistant.io/addons/). Follow the [installation instructions for Hass.io](https://www.home-assistant.io/hassio/installation/) before proceeding.
To install the add-on, add my [Hass.IO Add-On Repository](https://github.com/synesthesiam/hassio-addons) in the Add-On Store, refresh, then install the "Rhasspy Assistant" under “Synesthesiam Hass.IO Add-Ons” (all the way at the bottom of the Add-On Store screen).
@@ -61,35 +78,127 @@ Before starting the add-on, make sure to give it access to your microphone and s
![Audio settings for Hass.io](img/hass-io-audio.png)
### Updating Hass.IO Add-On
### Virtual Environment
You should receive notifications when a new version of Rhasspy is available for Hass.IO. Follow the instructions from Hass.IO on how to update the add-on.
## Virtual Environment
Rhasspy can be installed into a Python virtual environment, though there are a number of requirements. This may be desirable, however, if you have trouble getting Rhasspy to access your microphone from within a Docker container. To start, clone the repo somewhere:
git clone https://github.com/synesthesiam/rhasspy.git
```bash
git clone https://github.com/synesthesiam/rhasspy.git
```
Then run the `download-dependencies.sh` and `create-venv.sh` scripts (assumes a Debian distribution):
cd rhasspy/
./download-dependencies.sh
./create-venv.sh
```bash
cd rhasspy/
./download-dependencies.sh
./create-venv.sh
```
Once the installation finishes (5-10 minutes on a Raspberry Pi 3), you can use the `run-venv.sh` script to start Rhasspy:
./run-venv.sh --profile en
```bash
./run-venv.sh --profile en
```
If all is well, the web interface will be available at [http://localhost:12101](http://localhost:12101)
### Software Requirements
### Updating Virtual Environment
At its core, Rhasspy requires:
To update your Rhasspy virtual environment to the latest version, run:
```bash
git pull origin master
```
in your `rhasspy` directory, and then update your Python dependencies:
```bash
source .venv/bin/activate
pip3 install -r requirements.txt
```
You should also re-build the web interface:
1. Install [yarn](https://yarnpkg.com) on your system
2. Run `yarn build` in the `rhasspy` directory
3. Restart any running instances of Rhasspy
### Running as a Service
Once installed, Rhasspy can be run as a [systemd service](https://systemd.io/). An [example unit file](https://github.com/synesthesiam/rhasspy/blob/master/etc/rhasspy.service) is available (thanks [UnderpantsGnome](https://github.com/UnderpantsGnome)):
```
[Unit]
Description=Rhasspy
After=syslog.target network.target
[Service]
Type=simple
WorkingDirectory=/home/<USER>/path/to/rhasspy
ExecStart=/bin/bash -lc './run-venv.sh --profile <LANGUAGE>'
RestartSec=1
Restart=on-failure
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=rhasspy
[Install]
WantedBy=multi-user.target
```
* Replace `/home/<USER>/path/to/rhasspy` with the full path to your Rhasspy installation (where `run-venv.sh` is).
* Replace `<LANGUAGE>` with your profile language (e.g., `en`)
Create a file named `rhasspy.service` in the `/home/<USER>/.config/systemd/user` directory (you may need to create the directory itself). Once the file has been saved, run:
```bash
systemctl --user daemon-reload
```
Then, you can start Rhasspy with:
```bash
systemctl --user start rhasspy
```
If you'd like Rhasspy to start on boot, run:
```bash
systemctl --user enable --now rhasspy
```
## Build From Source
The `create-venv.sh` script uses [pre-compiled binaries](https://github.com/synesthesiam/rhasspy/releases/tag/v2.0) for Rhasspy's required tools:
* [OpenFST](https://www.openfst.org)
* [Opengrm](http://www.opengrm.org/twiki/bin/view/GRM/NGramLibrary)
* [Phonetisaurus](https://github.com/AdolfVonKleist/Phonetisaurus)
* [Kaldi](https://kaldi-asr.org)
The [build-from-source.sh](https://github.com/synesthesiam/rhasspy/blob/master/build-from-source.sh) script attempts to build all of these tools from source. The binary artifacts (command-line tools, shared libraries) are installed into the `bin` and `lib` directories of a Python virtual environment. The `run-venv.sh` script automatically adds these directories to `PATH` and `LD_LIBRARY_PATH` before starting Rhasspy.
### Swap Size
On low memory devices like the Raspberry Pi, building the tools above can quickly consume the entire RAM. Before building, it's highly recommended that you increase the available swap space by several gigabytes:
1. Edit `/etc/dphys-swapfile`
2. Change `CONF_SWAPSIZE` to something large, like 2048 (2GB)
3. Reboot
### Kaldi
You can skip building Kaldi if you plan to just [use Pocketsphinx](speech-to-text.md#pocketsphinx) for speech recognition.
### Updating Source Install
Follow the same instructions as [updating a virtual environment](#updating-virtual-environment).
* Linux
* Python 3.6
* [Flask](https://pypi.org/project/Flask/) web server, including
* [flask-swagger-ui](https://pypi.org/project/flask-swagger-ui/) for HTTP API documentation
* [Flask-Cors](https://pypi.org/project/Flask-Cors/) for [CORS](https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS) stuff
* [Flask-Sockets](https://pypi.org/project/Flask-Sockets/) for websocket support
* [pydash](https://pypi.org/project/pydash/) utility library
To actually use any components, however, requires a lot of [extra software](about.md#supporting-tools).
+72 -3
@@ -1,6 +1,10 @@
# Intent Handling
After a voice command has been transcribed and your intent has been successfully recognized, Rhasspy is ready to send a JSON event to Home Assistant or Node-RED.
After a voice command has been transcribed and your intent has been successfully recognized, Rhasspy is ready to send a JSON event to another system like Home Assistant or Node-RED.
* [Home Assistant](#home-assistant)
* [Remote Server](#remote-server)
* [Command](#command)
Regardless of which intent handling system you choose, Rhasspy emits JSON events [over a websocket connection](usage.md#websocket-events).
@@ -112,10 +116,60 @@ Set `home_assistant.pem_file` to the full path to your <a href="http://docs.pyth
Use the environment variable `RHASSPY_PROFILE_DIR` to reference your current profile's directory. For example, `$RHASSPY_PROFILE_DIR/my.pem` will tell Rhasspy to use a file named `my.pem` in your profile directory when verifying your self-signed certificate.
## Remote Server
Rhasspy can POST the intent JSON to a remote URL.
Add to your [profile](profiles.md):
```json
"handle": {
"system": "remote",
"remote": {
"url": "http://<address>:<port>/path/to/endpoint"
}
}
```
When an intent is recognized, Rhasspy will POST to `handle.remote.url` with the intent JSON. You should **return JSON** back, optionally with additional information. If `handle.forward_to_hass` is `true`, Rhasspy will look for a `hass_event` property of the returned JSON with the following structure:
```json
{
// rest of input JSON
// ...
"hass_event": {
"event_type": "...",
"event_data": {
"key": "value",
// ...
}
}
}
```
Rhasspy will create the Home Assistant event based on this information. If it is **not** present, the remaining intent information will be used to construct the event as normal (i.e., `intent` and `entities`). If `handle.forward_to_hass` is `false`, the response from the remote server is not used.
### Speech
If the returned JSON contains a "speech" key like this:
```json
{
...
"speech": {
"text": "Some text to speak."
}
}
```
then Rhasspy will forward `speech.text` to the configured [text to speech](text-to-speech.md) system.
See `rhasspy.intent_handler.RemoteIntentHandler` for details.
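As a rough sketch of the remote side, the Python below accepts the POSTed intent JSON and returns it with `hass_event` and `speech` keys added. The `rhasspy_` event-type prefix, the `slots` key in the input, the port, and the names `build_response`, `IntentHandler`, and `serve` are all assumptions for illustration; the real intent JSON may carry different fields.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_response(intent):
    """Return the input intent JSON plus hass_event and speech keys."""
    response = dict(intent)
    intent_name = intent.get("intent", {}).get("name", "")
    response["hass_event"] = {
        # Assumption: event types follow a "rhasspy_<IntentName>" pattern.
        "event_type": "rhasspy_" + intent_name,
        "event_data": dict(intent.get("slots", {})),
    }
    response["speech"] = {"text": "Handled " + intent_name}
    return response


class IntentHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body sent to the endpoint.
        length = int(self.headers.get("Content-Length", 0))
        intent = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(build_response(intent)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the hypothetical handler until interrupted."""
    HTTPServer(("", port), IntentHandler).serve_forever()
```

Calling `serve()` and pointing `handle.remote.url` at that address and port would exercise the round trip.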
## Command
Once an intent is successfully recognized, Rhasspy will send an event to Home Assistant with the details. You can call a custom program instead of, *or in addition to*, this behavior.
Add to your [profile](profiles.md):
```json
@@ -144,7 +198,7 @@ When an intent is recognized, Rhasspy will call your custom program with the int
}
}
```
Rhasspy will create the Home Assistant event based on this information. If it is **not** present, the remaining intent information will be used to construct the event as normal (i.e., `intent` and `entities`). If `handle.forward_to_hass` is `false`, the output of your program is not used.
The following environment variables are available to your program:
@@ -155,6 +209,21 @@ The following environment variables are available to your program:
See [handle.sh](https://github.com/synesthesiam/rhasspy/blob/master/bin/mock-commands/handle.sh) for an example program.
### Speech
If the returned JSON contains a "speech" key like this:
```json
{
...
"speech": {
"text": "Some text to speak."
}
}
```
then Rhasspy will forward `speech.text` to the configured [text to speech](text-to-speech.md) system.
See `rhasspy.intent_handler.CommandIntentHandler` for details.
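A rough sketch of a handler program that adds the `speech` key is shown below. It assumes the intent JSON arrives on standard input and the result is written to standard output, which the truncated text above does not spell out; the function names and the spoken sentence are likewise made up for illustration.

```python
import json
import sys


def add_speech(intent: dict) -> dict:
    """Return the intent JSON with a `speech.text` entry added."""
    name = intent.get("intent", {}).get("name") or "nothing"
    result = dict(intent)
    result["speech"] = {"text": f"Recognized {name}"}
    return result


def main() -> None:
    # Assumed contract: intent JSON in on stdin, JSON out on stdout.
    intent = json.loads(sys.stdin.read())
    json.dump(add_speech(intent), sys.stdout)
```

Saved as a script that calls `main()` at the end, this could be referenced from `handle.command.program`.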
## Dummy
+10 -4
@@ -1,6 +1,6 @@
# Intent Recognition
After your voice command has been transcribed by the [speech to text](speech-to-text.md) system, the next step is to recognize your intent.
After your voice command has been transcribed by the [speech to text](speech-to-text.md) system, the next step is to recognize your intent.
The end result is a JSON event with information about the intent.
The following table summarizes the trade-offs of using each intent recognizer:
@@ -61,7 +61,7 @@ Add to your [profile](profiles.md):
```json
"intent": {
"system": "adapt",
"system": "adapt",
"adapt": {
"stop_words": "stop_words.txt"
}
@@ -80,7 +80,7 @@ Add to your [profile](profiles.md):
```json
"intent": {
"system": "flair",
"system": "flair",
"flair": {
"data_dir": "flair_data",
"max_epochs": 25,
@@ -155,6 +155,12 @@ Because Home Assistant will already handle your intent (probably using an [inten
See `rhasspy.intent.HomeAssistantConversationRecognizer` for details.
## MQTT/Hermes
Publishes intent recognitions/failures to `hermes/intent/<INTENT_NAME>` or `hermes/nlu/intentNotRecognized` ([Hermes protocol](https://docs.snips.ai/reference/hermes)).
This is enabled by default and controlled by the `mqtt.publish_intents` setting in your [profile](profiles.md).
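The topic choice can be sketched as a small function. Treating an empty or missing intent name as a recognition failure is an assumption here, and `intent_topic` is a hypothetical helper, not part of Rhasspy.

```python
from typing import Optional


def intent_topic(intent_name: Optional[str]) -> str:
    """Return the Hermes topic for a recognized intent, or the failure topic."""
    if intent_name:
        return f"hermes/intent/{intent_name}"
    return "hermes/nlu/intentNotRecognized"
```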
## Command
Recognizes intents from text using a custom external program.
@@ -190,7 +196,7 @@ When a voice command is successfully transcribed, your program will be called wi
"text": "set the bedroom light to red"
}
```
The following environment variables are available to your program:
* `$RHASSPY_BASE_DIR` - path to the directory where Rhasspy is running from
+2 -220
@@ -40,226 +40,8 @@ If you need to install Rhasspy onto a machine that is not connected to the inter
2. `fr-g2p.tar.gz`
3. `fr-small.lm.gz`
If your user profile directory is `$HOME/.config/rhasspy/profiles`, then you should download/copy all three artifacts to `$HOME/.config/rhasspy/profiles/fr/download` on the offline machine. Now, when Rhasspy loads the `fr` profile and you click "Download", it will extract the files in the `download` directory without going out to the internet.
If you want to know precisely which files Rhasspy needs for a given profile, visit the `profiles` directory in [the source code](https://github.com/synesthesiam/rhasspy/tree/master/profiles) and examine these scripts in that profile's directory:
* `download-profile.sh`
* Downloads and extracts all required binary artifacts. Uses cache in `download` directory unless `--delete` option is given.
* `check-profile.sh`
* Verifies that required binary artifacts are present. Returns non-zero exit code if download is required.
If your user profile directory is `$HOME/.config/rhasspy/profiles`, then you should download/copy all three artifacts to `$HOME/.config/rhasspy/profiles/fr/download` on the offline machine. Now, when Rhasspy loads the `fr` profile and you click "Download", it will extract the files in the `download` directory without going out to the internet.
## Available Settings
All available profile sections and settings are listed below:
* `rhasspy` - configuration for Rhasspy assistant
* `preload_profile` - true if speech/intent recognizers should be loaded immediately for default profile (default: `true`)
* `listen_on_start` - true if Rhasspy should listen for wake word at startup (default: `true`)
* `load_timeout_sec` - number of seconds to wait for internal actors before proceeding with start up
* `home_assistant` - how to communicate with Home Assistant/Hass.io
* `url` - Base URL of Home Assistant server (no `/api`)
* `access_token` - long-lived access token for Home Assistant (Hass.io token is used automatically)
* `api_password` - Password, if you have that enabled (deprecated)
* `pem_file` - Full path to your <a href="http://docs.python-requests.org/en/latest/user/advanced/#ssl-cert-verification">CA_BUNDLE file or a directory with certificates of trusted CAs</a>
* `event_type_format` - Python format string used to create event type from intent type (`{0}`)
* `speech_to_text` - transcribing [voice commands to text](speech-to-text.md)
* `system` - name of speech to text system (`pocketsphinx`, `remote`, `command`, or `dummy`)
* `pocketsphinx` - configuration for [Pocketsphinx](speech-to-text.md#pocketsphinx)
* `compatible` - true if profile can use pocketsphinx for speech recognition
* `acoustic_model` - directory with CMU 16 kHz acoustic model
* `base_dictionary` - large text file with word pronunciations (read only)
* `custom_words` - small text file with words/pronunciations added by user
* `dictionary` - text file with all words/pronunciations needed for example sentences
* `unknown_words` - small text file with guessed word pronunciations (from phonetisaurus)
* `language_model` - text file with trigram [ARPA language model](https://cmusphinx.github.io/wiki/arpaformat/) built from example sentences
* `open_transcription` - true if general language model should be used (custom voice commands ignored)
* `base_language_model` - large general language model (read only)
* `mllr_matrix` - MLLR matrix from [acoustic model tuning](https://cmusphinx.github.io/wiki/tutorialtuning/)
* `mix_weight` - how much of the base language model to [mix in during training](training.md#language-model-mixing) (0-1)
* `mix_fst` - path to save mixed ngram FST model
* `kaldi` - configuration for [Kaldi](speech-to-text.md#kaldi)
* `compatible` - true if profile can use Kaldi for speech recognition
* `kaldi_dir` - absolute path to Kaldi root directory
* `model_dir` - directory where Kaldi model is stored (relative to profile directory)
* `graph` - directory where HCLG.fst is located (relative to `model_dir`)
* `base_graph` - directory where large general HCLG.fst is located (relative to `model_dir`)
* `base_dictionary` - large text file with word pronunciations (read only)
* `custom_words` - small text file with words/pronunciations added by user
* `dictionary` - text file with all words/pronunciations needed for example sentences
* `open_transcription` - true if general language model should be used (custom voice commands ignored)
* `unknown_words` - small text file with guessed word pronunciations (from phonetisaurus)
* `mix_weight` - how much of the base language model to [mix in during training](training.md#language-model-mixing) (0-1)
* `mix_fst` - path to save mixed ngram FST model
* `remote` - configuration for [remote Rhasspy server](speech-to-text.md#remote-http-server)
* `url` - URL to POST WAV data for transcription (e.g., `http://your-rhasspy-server:12101/api/speech-to-text`)
* `command` - configuration for [external speech-to-text program](speech-to-text.md#command)
* `program` - path to executable
* `arguments` - list of arguments to pass to program
* `sentences_ini` - Ini file with example [sentences/JSGF templates](training.md#sentencesini) grouped by intent
* `g2p_model` - finite-state transducer for phonetisaurus to guess word pronunciations
* `g2p_casing` - casing to force for g2p model (`upper`, `lower`, or blank)
* `dictionary_casing` - casing to force for dictionary words (`upper`, `lower`, or blank)
* `grammars_dir` - directory to write generated JSGF grammars from sentences ini file
* `fsts_dir` - directory to write generated finite state transducers from JSGF grammars
* `intent` - transforming text commands to intents
* `system` - intent recognition system (`fsticuffs`, `fuzzywuzzy`, `rasa`, `remote`, `adapt`, `command`, or `dummy`)
* `fsticuffs` - configuration for [OpenFST-based](https://www.openfst.org) intent recognizer
* `intent_fst` - path to generated finite state transducer with all intents combined
* `ignore_unknown_words` - true if words not in the FST symbol table should be ignored
* `fuzzy` - true if text is matched in a fuzzy manner, skipping words in `stop_words.txt`
* `fuzzywuzzy` - configuration for simplistic [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) based intent recognizer
* `examples_json` - JSON file with intents/example sentences
* `min_confidence` - minimum confidence required for intent to be converted to a JSON event (0-1)
* `remote` - configuration for remote Rhasspy server
* `url` - URL to POST text to for intent recognition (e.g., `http://your-rhasspy-server:12101/api/text-to-intent`)
* `rasa` - configuration for [Rasa NLU](https://rasa.com/) based intent recognizer
* `url` - URL of remote Rasa NLU server (e.g., `http://localhost:5005/`)
* `examples_markdown` - Markdown file to generate with intents/example sentences
* `project_name` - name of project to generate during training
* `adapt` - configuration for [Mycroft Adapt](https://github.com/MycroftAI/adapt) based intent recognizer
* `stop_words` - text file with words to ignore in training sentences
* `command` - configuration for external intent recognition program
* `program` - path to executable
* `arguments` - list of arguments to pass to program
* `text_to_speech` - pronouncing words
* `system` - text to speech system (`espeak`, `flite`, `picotts`, `marytts`, `command`, or `dummy`)
* `espeak` - configuration for [eSpeak](http://espeak.sourceforge.net)
* `phoneme_map` - text file mapping CMU phonemes to eSpeak phonemes
* `flite` - configuration for [flite](http://www.festvox.org/flite)
* `voice` - name of voice to use (e.g., `kal16`, `rms`, `awb`)
* `picotts` - configuration for [PicoTTS](https://en.wikipedia.org/wiki/SVOX)
* `language` - language to use (default if not present)
* `marytts` - configuration for [MaryTTS](http://mary.dfki.de)
* `url` - address:port of MaryTTS server (port is usually 59125)
* `voice` - name of voice to use (e.g., `cmu-slt`). Default if not present.
* `locale` - name of locale to use (e.g., `en-US`). Default if not present.
* `wavenet` - configuration for Google's [WaveNet](https://cloud.google.com/text-to-speech/docs/wavenet)
* `cache_dir` - path to directory in your profile where WAV files are cached
* `credentials_json` - path to the JSON credentials file (generated online)
* `gender` - gender of speaker (`MALE` or `FEMALE`)
* `language_code` - language/locale (e.g., `en-US`)
* `sample_rate` - WAV sample rate (default: 22050)
* `url` - URL of WaveNet endpoint
* `voice` - voice to use (e.g., `Wavenet-C`)
* `fallback_tts` - text to speech system to use when offline or error occurs (e.g., `espeak`)
* `phoneme_examples` - text file with examples for each CMU phoneme
* `training` - training speech/intent recognizers
* `dictionary_number_duplicates` - true if duplicate words in dictionary should be suffixed by `(2)`, `(3)`, etc.
* `tokenizer` - system used to break sentences into words (`regex` only for now)
* `regex` - configuration for regex tokenizer
* `replace` - list of dictionaries with patterns/replacements used on each example sentence
* `split` - pattern used to break sentences into words
* `unknown_words` - configuration for dealing with words not in base/custom dictionaries
* `fail_when_present` - true if Rhasspy should halt training when unknown words are found
* `guess_pronunciations` - true if [Phonetisaurus](https://github.com/AdolfVonKleist/Phonetisaurus) should be used to guess how an unknown word is pronounced
* `speech_to_text` - training for speech decoder
* `system` - speech to text training system (`auto`, `pocketsphinx`, `kaldi`, `command`, or `dummy`)
* `command` - configuration for external speech-to-text training program
* `program` - path to executable
* `arguments` - list of arguments to pass to program
* `intent` - training for intent recognizer
* `system` - intent recognizer training system (`auto`, `fsticuffs`, `fuzzywuzzy`, `rasa`, `adapt`, `command`, or `dummy`)
* `command` - configuration for external intent recognizer training program
* `program` - path to executable
* `arguments` - list of arguments to pass to program
* `wake` - waking Rhasspy up for speech input
* `system` - wake word recognition system (`pocketsphinx`, `snowboy`, `precise`, `porcupine`, `command`, or `dummy`)
* `pocketsphinx` - configuration for Pocketsphinx wake word recognizer
* `keyphrase` - phrase to wake up on (3-4 syllables recommended)
* `threshold` - sensitivity of detection (recommended range 1e-50 to 1e-5)
* `chunk_size` - number of bytes per chunk to feed to Pocketsphinx (default 960)
* `snowboy` - configuration for [snowboy](https://snowboy.kitt.ai)
* `model` - path to model file (in profile directory)
* `sensitivity` - model sensitivity (0-1, default 0.5)
* `audio_gain` - audio gain (default 1)
* `chunk_size` - number of bytes per chunk to feed to snowboy (default 960)
* `precise` - configuration for [Mycroft Precise](https://github.com/MycroftAI/mycroft-precise)
* `engine_path` - path to the precise-engine binary
* `model` - path to model file (in profile directory)
* `sensitivity` - model sensitivity (0-1, default 0.5)
* `trigger_level` - number of events to trigger activation (default 3)
* `chunk_size` - number of bytes per chunk to feed to Precise (default 2048)
* `porcupine` - configuration for [PicoVoice's Porcupine](https://github.com/Picovoice/Porcupine)
* `library_path` - path to `libpv_porcupine.so` for your platform/architecture
* `model_path` - path to the `porcupine_params.pv` (lib/common)
* `keyword_path` - path to the `.ppn` keyword file
* `sensitivity` - model sensitivity (0-1, default 0.5)
* `command` - configuration for external wake word program
* `program` - path to executable
* `arguments` - list of arguments to pass to program
* `microphone` - configuration for audio recording
* `system` - audio recording system (`pyaudio`, `arecord`, `hermes`, `http`, or `dummy`)
* `pyaudio` - configuration for [PyAudio](https://people.csail.mit.edu/hubert/pyaudio/) microphone
* `device` - index of device to use or empty for default device
* `frames_per_buffer` - number of frames to read at a time (default 480)
* `arecord` - configuration for ALSA microphone
* `device` - name of ALSA device (see `arecord -L`) to use or empty for default device
* `chunk_size` - number of bytes to read at a time (default 960)
* `http` - configuration for HTTP audio stream
* `host` - hostname or IP address of HTTP audio server (default 127.0.0.1)
* `port` - port to receive audio stream on (default 12333)
* `stop_after` - one of "never", "text", or "intent" ([see documentation](audio-input.md#http-stream))
* `gstreamer` - configuration for GStreamer audio recorder
* `pipeline` - GStreamer pipeline (e.g., `FILTER ! FILTER ! ...`) without sink
* `hermes` - configuration for MQTT "microphone" ([Hermes protocol](https://docs.snips.ai/ressources/hermes-protocol))
* Subscribes to WAV data from `hermes/audioServer/<SITE_ID>/audioFrame`
* Requires MQTT to be enabled
* `sounds` - configuration for feedback sounds from Rhasspy
* `system` - which sound output system to use (`aplay`, `hermes`, or `dummy`)
* `wake` - path to WAV file to play when Rhasspy wakes up
* `recorded` - path to WAV file to play when a command finishes recording
* `aplay` - configuration for ALSA speakers
* `device` - name of ALSA device (see `aplay -L`) to use or empty for default device
* `hermes` - configuration for MQTT "speakers" ([Hermes protocol](https://docs.snips.ai/ressources/hermes-protocol))
* WAV data published to `hermes/audioServer/<SITE_ID>/playBytes/<REQUEST_ID>`
* Requires MQTT to be enabled
* `command`
* `system` - which voice command listener system to use (`webrtcvad`, `oneshot`, `hermes`, or `dummy`)
* `webrtcvad` - configuration for [webrtcvad](https://github.com/wiseman/py-webrtcvad) system
* `sample_rate` - sample rate of input audio
* `chunk_size` - bytes per buffer (must correspond to 10, 20, or 30 ms of audio)
* `vad_mode` - sensitivity of `webrtcvad` (0-3)
* `min_sec` - minimum number of seconds in a command
* `silence_sec` - number of seconds of silence after voice command before stopping
* `timeout_sec` - maximum number of seconds before stopping
* `throwaway_buffers` - number of buffers to drop when recording starts
* `speech_buffers` - number of buffers with speech before command starts
* `oneshot` - configuration for voice command system that takes first audio frame as entire command
* `timeout_sec` - maximum number of seconds before stopping
* `command` - configuration for external voice command program
* `program` - path to executable
* `arguments` - list of arguments to pass to program
* `hermes` - configuration for MQTT-based voice command system that listens between `startListening` and `stopListening` commands ([Hermes protocol](https://docs.snips.ai/ressources/hermes-protocol))
* `timeout_sec` - maximum number of seconds before stopping
* `handle`
* `system` - which intent handling system to use (`hass`, `command`, or `dummy`)
* `forward_to_hass` - true if intents are always forwarded to Home Assistant (even if `system` is `command`)
* `command` - configuration for external intent handling program
* `program` - path to executable
* `arguments` - list of arguments to pass to program
* `mqtt` - configuration for MQTT ([Hermes protocol](https://docs.snips.ai/ressources/hermes-protocol))
* `enabled` - true if MQTT client should be started
* `host` - MQTT host
* `port` - MQTT port
* `username` - MQTT username (blank for anonymous)
* `password` - MQTT password
* `reconnect_sec` - number of seconds before client will reconnect
* `site_id` - ID of site ([Hermes protocol](https://docs.snips.ai/ressources/hermes-protocol))
* `publish_intents` - true if intents are published to MQTT
* `tuning` - configuration for acoustic model tuning
* `system` - system for tuning (currently only `sphinxtrain`)
* `sphinxtrain` - configuration for [sphinxtrain](https://github.com/cmusphinx/sphinxtrain) based acoustic model tuning
* `mllr_matrix` - name of generated MLLR matrix (should match `speech_to_text.pocketsphinx.mllr_matrix`)
* `download` - configuration for profile file downloading
* `cache_dir` - directory in your profile where downloaded files are cached
* `conditions` - profile settings that will trigger file downloads
* keys are profile setting paths (e.g., `wake.system`)
* values are dictionaries whose keys are profile settings values (e.g., `snowboy`)
* settings may have the form `<=N` or `!X` to mean "less than or equal to N" or "not X"
* leaf nodes are dictionaries whose keys are destination file paths and whose values reference the `files` dictionary
* `files` - locations, etc. of files to download
* keys are names of files
* values are dictionaries with:
* `url` - URL of file to download
* `cache` - `false` if file should be downloaded directly into profile (skipping cache)
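The `<=N` and `!X` forms used in `download.conditions` can be illustrated with a short matcher. This is only a sketch of the described semantics, not Rhasspy's implementation; `condition_matches` is a hypothetical name, and comparing plain values as strings is an assumption.

```python
def condition_matches(pattern: str, actual) -> bool:
    """Check one profile setting value against a `download.conditions` key."""
    if pattern.startswith("<="):
        # "<=N": the setting must be numerically less than or equal to N.
        return float(actual) <= float(pattern[2:])
    if pattern.startswith("!"):
        # "!X": the setting must differ from X.
        return str(actual) != pattern[1:]
    # Plain value: assumed to require an exact (string) match.
    return str(actual) == pattern
```

For example, a condition keyed by `wake.system` with the value `snowboy` would match only when the profile's wake word system is `snowboy`.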
See [the reference](reference.md#profile-settings) for all available profile settings.
+601
@@ -0,0 +1,601 @@
# Reference
* [Supported Languages](#supported-languages)
* [HTTP API](#http-api)
* [Websocket API](#websocket-api)
* [MQTT API](#mqtt-api)
* [Command Line](#command-line)
* [Profile Settings](#profile-settings)
## Supported Languages
The table below lists which components are compatible with Rhasspy's supported languages.
| Category | Name | Offline? | en | de | es | fr | it | nl | ru | el | hi | zh | vi | pt | sv | ca |
| -------- | ------ | -------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- |
| **Wake Word** | [pocketsphinx](wake-word.md#pocketsphinx) | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | | | | |
| | [porcupine](wake-word.md#porcupine) | &#x2713; | &#x2713; | | | | | | | | | | | | | |
| | [snowboy](wake-word.md#snowboy) | *requires account* | &#x2713; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; |
| | [precise](wake-word.md#mycroft-precise) | &#x2713; | &#x2713; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; | &bull; |
| **Speech to Text** | [pocketsphinx](speech-to-text.md#pocketsphinx) | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | | &#x2713; | | &#x2713; |
| | [kaldi](speech-to-text.md#kaldi) | &#x2713; | &#x2713; | &#x2713; | | &#x2713; | | &#x2713; | | | | | &#x2713; | | &#x2713; | |
| **Intent Recognition** | [fsticuffs](intent-recognition.md#fsticuffs) | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; |
| | [fuzzywuzzy](intent-recognition.md#fuzzywuzzy) | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; |
| | [adapt](intent-recognition.md#mycroft-adapt) | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; |
| | [flair](intent-recognition.md#flair) | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | | &#x2713; | | | | | | &#x2713; | | &#x2713; |
| | [rasaNLU](intent-recognition.md#rasanlu) | *needs extra software* | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; |
| **Text to Speech** | [espeak](text-to-speech.md#espeak) | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; |
| | [flite](text-to-speech.md#flite) | &#x2713; | &#x2713; | | | | | | | | &#x2713; | | | | | |
| | [picotts](text-to-speech.md#picotts) | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | | | | | | | | | |
| | [marytts](text-to-speech.md#marytts) | &#x2713; | &#x2713; | &#x2713; | | &#x2713; | &#x2713; | | &#x2713; | | | | | | | |
| | [wavenet](text-to-speech.md#google-wavenet) | | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | | &#x2713; | &#x2713; | | &#x2713; | &#x2713; | |
&bull; - yes, but requires training/customization
## HTTP API
Rhasspy's HTTP endpoints are documented below. You can also visit `/api/` in your Rhasspy server (note the final slash) to try out each endpoint.
Application authors may want to use the [rhasspy-client](https://pypi.org/project/rhasspy-client/), which provides a high-level interface to a remote Rhasspy server.
### Endpoints
* `/api/custom-words`
* GET custom word dictionary as plain text, or POST to overwrite it
* See `custom_words.txt` in your profile directory
* `/api/download-profile`
* Force Rhasspy to re-download profile
* `?delete=true` - clear download cache
* `/api/listen-for-command`
* POST to wake Rhasspy up and start listening for a voice command
* Returns intent JSON when command is finished
* `?nohass=true` - stop Rhasspy from handling the intent
* `?timeout=<seconds>` - override default command timeout
* `?entity=<entity>&value=<value>` - set custom entity/value in recognized intent
* `/api/listen-for-wake-word`
* POST to wake Rhasspy up and return immediately
* `/api/lookup`
* POST word as plain text to look up or guess pronunciation
* `?n=<number>` - return at most `n` guessed pronunciations
* `/api/microphones`
* GET list of available microphones
* `/api/phonemes`
* GET example phonemes from speech recognizer for your profile
* See `phoneme_examples.txt` in your profile directory
* `/api/play-wav`
* POST to play WAV data
* `/api/profile`
* GET the JSON for your profile, or POST to overwrite it
* `?layers=profile` to only see settings different from `defaults.json`
* See `profile.json` in your profile directory
* `/api/restart`
* Restart Rhasspy server
* `/api/sentences`
* GET voice command templates or POST to overwrite
* Set `Accept: application/json` to GET JSON with all sentence files
* Set `Content-Type: application/json` to POST JSON with sentences for multiple files
* See `sentences.ini` and `intents` directory in your profile
* `/api/slots`
* GET slot values as JSON or POST to add to/overwrite them
* `?overwrite_all=true` to clear slots in JSON before writing
* `/api/speakers`
* GET list of available audio output devices
* `/api/speech-to-intent`
* POST a WAV file and have Rhasspy process it as a voice command
* Returns intent JSON when command is finished
* `?nohass=true` - stop Rhasspy from handling the intent
* `/api/speech-to-text`
* POST a WAV file and have Rhasspy return the text transcription
* Set `Accept: application/json` to receive JSON with more details
* `?noheader=true` - send raw 16-bit 16 kHz mono audio without a WAV header
* `/api/start-recording`
* POST to have Rhasspy start recording a voice command
* `/api/stop-recording`
* POST to have Rhasspy stop recording and process recorded data as a voice command
* Returns intent JSON when command has been processed
* `?nohass=true` - stop Rhasspy from handling the intent
* `/api/test-microphones`
* GET list of available microphones and if they're working
* `/api/text-to-intent`
* POST text and have Rhasspy process it as command
* Returns intent JSON when command has been processed
* `?nohass=true` - stop Rhasspy from handling the intent
* `/api/text-to-speech`
* POST text and have Rhasspy speak it
* `?play=false` - get WAV data instead of having Rhasspy speak
* `?voice=<voice>` - override default TTS voice
* `?language=<language>` - override default TTS language or locale
* `?repeat=true` - have Rhasspy repeat the last sentence it spoke
* `/api/train`
* POST to re-train your profile
* `?nocache=true` - re-train profile from scratch
* `/api/unknown-words`
* GET words that Rhasspy doesn't know in your sentences
* See `unknown_words.txt` in your profile directory
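For callers that prefer not to depend on `rhasspy-client`, a request to one of these endpoints can be assembled with the standard library. The sketch below targets `/api/text-to-intent`; the base URL `http://localhost:12101`, the `text/plain` content type, and the function names are assumptions, and the response is assumed to be the intent JSON described above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def text_to_intent_request(base_url: str, text: str, nohass: bool = False) -> Request:
    """Build a POST request for `/api/text-to-intent` without sending it."""
    url = base_url.rstrip("/") + "/api/text-to-intent"
    if nohass:
        url += "?" + urlencode({"nohass": "true"})
    return Request(
        url,
        data=text.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/plain"},
    )


def send(request: Request) -> dict:
    """Send the request and parse the JSON response."""
    with urlopen(request) as response:
        return json.loads(response.read())
```

With a running server, `send(text_to_intent_request("http://localhost:12101", "turn on the living room lamp", nohass=True))` would return the recognized intent without Rhasspy handling it.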
## Websocket API
* `/api/events/intent`
* Listen for recognized intents published as JSON
* `/api/events/log`
* Listen for log messages published as plain text
## MQTT API
Rhasspy implements part of the [Hermes](https://docs.snips.ai/reference/hermes) protocol. Various services of Rhasspy can be configured to pass along MQTT messages or to react to MQTT messages following the Hermes protocol.
* `hermes/audioServer/<SITE_ID>/playBytes/<REQUEST_ID>`
* Rhasspy publishes audio in WAV format to this topic. By default it is 16 kHz, 16-bit mono for compatibility reasons, but other formats are possible too.
* `SITE_ID` is set in Rhasspy's `mqtt` configuration.
* `REQUEST_ID` is generated using `uuid.uuid4` each time a sound is played.
* `hermes/audioServer/<SITE_ID>/audioFrame`
* Rhasspy listens to this topic for WAV data. Audio is automatically converted to 16 kHz, 16-bit mono audio and played.
* `SITE_ID` is set in Rhasspy's `mqtt` configuration.
* `hermes/asr/startListening`
* Rhasspy wakes up and starts recording on receiving this topic.
* The payload is a JSON object with a `siteId` key that holds Rhasspy's site ID.
* `hermes/asr/stopListening`
* Rhasspy stops recording and processes the voice command on receiving this topic.
* The payload is a JSON object with a `siteId` key that holds Rhasspy's site ID.
* `hermes/intent/<INTENT_NAME>`
* Rhasspy publishes a message to this topic on recognition of an intent.
* The payload is a JSON object with the recognized intent, entities and text.
* `hermes/nlu/intentNotRecognized`
* Rhasspy publishes a message to this topic when it doesn't recognize an intent.
* `hermes/asr/textCaptured`
* Rhasspy publishes a transcription to this topic each time a voice command is recognized.
* `hermes/hotword/<WAKEWORD_ID>/detected`
* Rhasspy wakes up when a message is received on this topic.
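The topic and payload shapes above can be sketched without an MQTT client. The helper names are hypothetical and the site ID `default` in the usage note is only an example value; publishing the result is left to whichever MQTT library you use.

```python
import json
import uuid


def play_bytes_topic(site_id: str) -> str:
    """Build a playBytes topic with a fresh request ID from `uuid.uuid4`."""
    request_id = str(uuid.uuid4())
    return f"hermes/audioServer/{site_id}/playBytes/{request_id}"


def listening_message(site_id: str, start: bool = True):
    """Return (topic, payload) asking Rhasspy to start or stop listening."""
    topic = "hermes/asr/startListening" if start else "hermes/asr/stopListening"
    payload = json.dumps({"siteId": site_id})
    return topic, payload
```

For instance, `listening_message("default")` yields the `hermes/asr/startListening` topic together with a JSON payload whose `siteId` is `default`.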
## Command Line
Rhasspy provides a powerful [command-line interface](usage.md#command-line) called `rhasspy-cli`.
For `rhasspy-cli --profile <PROFILE_NAME> <COMMAND> <ARGUMENTS>`, `<COMMAND>` can be:
* `info`
* Print profile JSON to standard out
* Add `--defaults` to only print settings from `defaults.json`
* `wav2text`
* Convert WAV file(s) to text
* `wav2intent`
* Convert WAV file(s) to intent JSON
* Add `--handle` to have Rhasspy send events to Home Assistant
* `text2intent`
* Convert text command(s) to intent JSON
* Add `--handle` to have Rhasspy send events to Home Assistant
* `train`
* Re-train your profile
* `mic2wav`
* Listen for a voice command and output WAV data
* Add `--timeout <SECONDS>` to stop recording after some number of seconds
* `mic2text`
* Listen for a voice command and convert it to text
* Add `--timeout <SECONDS>` to stop recording after some number of seconds
* `mic2intent`
* Listen for a voice command and output intent JSON
* Add `--handle` to have Rhasspy send events to Home Assistant
* Add `--timeout <SECONDS>` to stop recording after some number of seconds
* `word2phonemes`
* Print the CMU phonemes for a word (possibly unknown)
* Add `-n <COUNT>` to control the maximum number of guessed pronunciations
* `word2wav`
* Pronounce a word (possibly unknown) and output WAV data
* `text2speech`
* Speaks one or more sentences using Rhasspy's text to speech system
* `text2wav`
* Converts a single sentence to WAV using Rhasspy's text to speech system
* `sleep`
* Run Rhasspy and wait until wake word is spoken
* `download`
* Download necessary profile files from the internet
### Profile Operations
Print the complete JSON for the English profile with:
rhasspy-cli --profile en info
You can combine this with other commands, such as `jq` to get at specific pieces:
rhasspy-cli --profile en info | jq .wake.pocketsphinx.keyphrase
Output (JSON):
"okay rhasspy"
### Training
Retrain the English profile with:
rhasspy-cli --profile en train
Add `--debug` before `train` for more information.
### Speech to Text/Intent
Convert a WAV file to text from stdin:
rhasspy-cli --profile en wav2text < what-time-is-it.wav
Output (text):
what time is it
Convert multiple WAV files:
rhasspy-cli --profile en wav2text what-time-is-it.wav turn-on-the-living-room-lamp.wav
Output (JSON):
```json
{
"what-time-is-it.wav": "what time is it",
"turn-on-the-living-room-lamp.wav": "turn on the living room lamp"
}
```
Convert multiple WAV files to intents **and** handle them:
rhasspy-cli --profile en wav2intent --handle what-time-is-it.wav turn-on-the-living-room-lamp.wav
Output (JSON):
```json
{
"what-time-is-it.wav": {
"text": "what time is it",
"intent": {
"name": "GetTime",
"confidence": 1.0
},
"entities": []
},
"turn-on-the-living-room-lamp.wav": {
"text": "turn on the living room lamp",
"intent": {
"name": "ChangeLightState",
"confidence": 1.0
},
"entities": [
{
"entity": "state",
"value": "on"
},
{
"entity": "name",
"value": "living room lamp"
}
]
}
}
```
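Output of this shape is straightforward to post-process. As a small sketch, the hypothetical helper below keeps only results whose confidence reaches a threshold; the default of `0.5` is an arbitrary assumption.

```python
def confident_intents(results: dict, min_confidence: float = 0.5) -> dict:
    """Map each input (file name or sentence) to its intent name if confident enough."""
    return {
        key: item["intent"]["name"]
        for key, item in results.items()
        if item["intent"]["confidence"] >= min_confidence
    }
```

The `results` argument would be the parsed JSON printed by `wav2intent` or `text2intent`, for example via `json.load(sys.stdin)`.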
### Text to Intent
Handle a command as if it were spoken:
rhasspy-cli --profile en text2intent --handle "turn off the living room lamp"
Output (JSON):
```json
{
"turn off the living room lamp": {
"text": "turn off the living room lamp",
"intent": {
"name": "ChangeLightState",
"confidence": 1.0
},
"entities": [
{
"entity": "state",
"value": "off"
},
{
"entity": "name",
"value": "living room lamp"
}
]
}
}
```
### Record Your Voice
Save a voice command to a WAV:
rhasspy-cli --profile en mic2wav > my-voice-command.wav
You can listen to it with:
aplay my-voice-command.wav
### Test Your Wake Word
Start Rhasspy and wait for wake word:
rhasspy-cli --profile en sleep
Rhasspy should exit and print the wake word when it's spoken.
### Text to Speech
Have Rhasspy speak one or more sentences:
rhasspy-cli --profile en text2speech "We ride at dawn!"
Use a different text to speech system and voice:
rhasspy-cli --profile en \
--set 'text_to_speech.system' 'flite' \
--set 'text_to_speech.flite.voice' 'slt' \
text2speech "We ride at dawn!"
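The names passed to `--set` read as dot-separated paths into the nested profile settings. A rough sketch of that mapping, assuming this interpretation (`set_by_path` is a hypothetical helper, not a Rhasspy function):

```python
def set_by_path(profile, dotted_path, value):
    """Set a nested dictionary value addressed by a dot-separated path."""
    node = profile
    *parents, leaf = dotted_path.split(".")
    for key in parents:
        # Create intermediate sections as needed
        node = node.setdefault(key, {})
    node[leaf] = value
    return profile


profile = {"text_to_speech": {"system": "espeak"}}
set_by_path(profile, "text_to_speech.system", "flite")
set_by_path(profile, "text_to_speech.flite.voice", "slt")
print(profile)
```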
### Pronounce Words
Speak words Rhasspy doesn't know!
rhasspy-cli --profile en word2wav raxacoricofallapatorius | aplay
### Text to Speech to Text to Intent
Use the miracle of Unix pipes to have Rhasspy interpret voice commands from itself:
rhasspy-cli --profile en \
--set 'text_to_speech.system' 'picotts' \
text2wav "turn on the living room lamp" | \
rhasspy-cli --profile en wav2text | \
rhasspy-cli --profile en text2intent
Output (JSON):
```json
{
"turn on the living room lamp": {
"text": "turn on the living room lamp",
"intent": {
"name": "ChangeLightState",
"confidence": 1.0
},
"entities": [
{
"entity": "state",
"value": "on"
},
{
"entity": "name",
"value": "living room lamp"
}
],
"speech_confidence": 1,
"slots": {
"state": "on",
"name": "living room lamp"
}
}
}
```
## Profile Settings
All available profile sections and settings are listed below:
* `rhasspy` - configuration for Rhasspy assistant
* `preload_profile` - true if speech/intent recognizers should be loaded immediately for default profile (default: `true`)
* `listen_on_start` - true if Rhasspy should listen for wake word at startup (default: `true`)
* `load_timeout_sec` - number of seconds to wait for internal actors before proceeding with start up
* `home_assistant` - how to communicate with Home Assistant/Hass.io
* `url` - Base URL of Home Assistant server (no `/api`)
* `access_token` - long-lived access token for Home Assistant (Hass.io token is used automatically)
* `api_password` - Password, if you have that enabled (deprecated)
* `pem_file` - Full path to your <a href="http://docs.python-requests.org/en/latest/user/advanced/#ssl-cert-verification">CA_BUNDLE file or a directory with certificates of trusted CAs</a>
* `event_type_format` - Python format string used to create event type from intent type (`{0}`)
* `speech_to_text` - transcribing [voice commands to text](speech-to-text.md)
* `system` - name of speech to text system (`pocketsphinx`, `kaldi`, `remote`, `command`, or `dummy`)
* `pocketsphinx` - configuration for [Pocketsphinx](speech-to-text.md#pocketsphinx)
* `compatible` - true if profile can use pocketsphinx for speech recognition
* `acoustic_model` - directory with CMU 16 kHz acoustic model
* `base_dictionary` - large text file with word pronunciations (read only)
* `custom_words` - small text file with words/pronunciations added by user
* `dictionary` - text file with all words/pronunciations needed for example sentences
* `unknown_words` - small text file with guessed word pronunciations (from phonetisaurus)
* `language_model` - text file with trigram [ARPA language model](https://cmusphinx.github.io/wiki/arpaformat/) built from example sentences
* `open_transcription` - true if general language model should be used (custom voice commands ignored)
* `base_language_model` - large general language model (read only)
* `mllr_matrix` - MLLR matrix from [acoustic model tuning](https://cmusphinx.github.io/wiki/tutorialtuning/)
* `mix_weight` - how much of the base language model to [mix in during training](training.md#language-model-mixing) (0-1)
* `mix_fst` - path to save mixed ngram FST model
* `kaldi` - configuration for [Kaldi](speech-to-text.md#kaldi)
* `compatible` - true if profile can use Kaldi for speech recognition
* `kaldi_dir` - absolute path to Kaldi root directory
* `model_dir` - directory where Kaldi model is stored (relative to profile directory)
* `graph` - directory where HCLG.fst is located (relative to `model_dir`)
* `base_graph` - directory where large general HCLG.fst is located (relative to `model_dir`)
* `base_dictionary` - large text file with word pronunciations (read only)
* `custom_words` - small text file with words/pronunciations added by user
* `dictionary` - text file with all words/pronunciations needed for example sentences
* `open_transcription` - true if general language model should be used (custom voice commands ignored)
* `unknown_words` - small text file with guessed word pronunciations (from phonetisaurus)
* `mix_weight` - how much of the base language model to [mix in during training](training.md#language-model-mixing) (0-1)
* `mix_fst` - path to save mixed ngram FST model
* `remote` - configuration for [remote Rhasspy server](speech-to-text.md#remote-http-server)
* `url` - URL to POST WAV data for transcription (e.g., `http://your-rhasspy-server:12101/api/speech-to-text`)
* `command` - configuration for [external speech-to-text program](speech-to-text.md#command)
* `program` - path to executable
* `arguments` - list of arguments to pass to program
* `sentences_ini` - Ini file with example [sentences/JSGF templates](training.md#sentencesini) grouped by intent
* `sentences_dir` - Directory with additional sentence templates (default: `intents`)
* `g2p_model` - finite-state transducer for phonetisaurus to guess word pronunciations
* `g2p_casing` - casing to force for g2p model (`upper`, `lower`, or blank)
* `dictionary_casing` - casing to force for dictionary words (`upper`, `lower`, or blank)
* `slots_dir` - directory to look for [slots lists](training.md#slots-lists) (default: `slots`)
* `slot_programs` - directory to look for [slot programs](training.md#slot-programs) (default `slot_programs`)
* `fsts_dir` - directory to write generated finite state transducers from JSGF grammars
* `intent` - transforming text commands to intents
* `system` - intent recognition system (`fsticuffs`, `fuzzywuzzy`, `rasa`, `remote`, `adapt`, `command`, or `dummy`)
* `fsticuffs` - configuration for [OpenFST-based](https://www.openfst.org) intent recognizer
* `intent_fst` - path to generated finite state transducer with all intents combined
* `converters_dir` - directory to look for [converter](training.md#converters) programs (default: `converters`)
* `ignore_unknown_words` - true if words not in the FST symbol table should be ignored
* `fuzzy` - true if text should be matched in a fuzzy manner, skipping words in `stop_words.txt`
* `fuzzywuzzy` - configuration for simplistic [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) based intent recognizer
* `examples_json` - JSON file with intents/example sentences
* `min_confidence` - minimum confidence required for intent to be converted to a JSON event (0-1)
* `remote` - configuration for remote Rhasspy server
* `url` - URL to POST text to for intent recognition (e.g., `http://your-rhasspy-server:12101/api/text-to-intent`)
* `rasa` - configuration for [Rasa NLU](https://rasa.com/) based intent recognizer
* `url` - URL of remote Rasa NLU server (e.g., `http://localhost:5005/`)
* `examples_markdown` - Markdown file to generate with intents/example sentences
* `project_name` - name of project to generate during training
* `adapt` - configuration for [Mycroft Adapt](https://github.com/MycroftAI/adapt) based intent recognizer
* `stop_words` - text file with words to ignore in training sentences
* `command` - configuration for external intent recognition program
* `program` - path to executable
* `arguments` - list of arguments to pass to program
* `replace_numbers` - if true, automatically replace number ranges (`N..M`) or numbers (`N`) with words
* `text_to_speech` - pronouncing words
* `system` - text to speech system (`espeak`, `flite`, `picotts`, `marytts`, `command`, or `dummy`)
* `espeak` - configuration for [eSpeak](http://espeak.sourceforge.net)
* `phoneme_map` - text file mapping CMU phonemes to eSpeak phonemes
* `flite` - configuration for [flite](http://www.festvox.org/flite)
* `voice` - name of voice to use (e.g., `kal16`, `rms`, `awb`)
* `picotts` - configuration for [PicoTTS](https://en.wikipedia.org/wiki/SVOX)
* `language` - language to use (default if not present)
* `marytts` - configuration for [MaryTTS](http://mary.dfki.de)
* `url` - address:port of MaryTTS server (port is usually 59125)
* `voice` - name of voice to use (e.g., `cmu-slt`). Default if not present.
* `locale` - name of locale to use (e.g., `en-US`). Default if not present.
* `wavenet` - configuration for Google's [WaveNet](https://cloud.google.com/text-to-speech/docs/wavenet)
* `cache_dir` - path to directory in your profile where WAV files are cached
* `credentials_json` - path to the JSON credentials file (generated online)
* `gender` - gender of speaker (`MALE` or `FEMALE`)
* `language_code` - language/locale (e.g., `en-US`)
* `sample_rate` - WAV sample rate (default: 22050)
* `url` - URL of WaveNet endpoint
* `voice` - voice to use (e.g., `Wavenet-C`)
* `fallback_tts` - text to speech system to use when offline or error occurs (e.g., `espeak`)
* `phoneme_examples` - text file with examples for each CMU phoneme
* `training` - training speech/intent recognizers
* `dictionary_number_duplicates` - true if duplicate words in dictionary should be suffixed by `(2)`, `(3)`, etc.
* `tokenizer` - system used to break sentences into words (`regex` only for now)
* `regex` - configuration for regex tokenizer
* `replace` - list of dictionaries with patterns/replacements used on each example sentence
* `split` - pattern used to break sentences into words
* `unknown_words` - configuration for dealing with words not in base/custom dictionaries
* `fail_when_present` - true if Rhasspy should halt training when unknown words are found
* `guess_pronunciations` - true if [Phonetisaurus](https://github.com/AdolfVonKleist/Phonetisaurus) should be used to guess how an unknown word is pronounced
* `speech_to_text` - training for speech decoder
* `system` - speech to text training system (`auto`, `pocketsphinx`, `kaldi`, `command`, or `dummy`)
* `command` - configuration for external speech-to-text training program
* `program` - path to executable
* `arguments` - list of arguments to pass to program
* `intent` - training for intent recognizer
* `system` - intent recognizer training system (`auto`, `fsticuffs`, `fuzzywuzzy`, `rasa`, `adapt`, `command`, or `dummy`)
* `command` - configuration for external intent recognizer training program
* `program` - path to executable
* `arguments` - list of arguments to pass to program
* `wake` - waking Rhasspy up for speech input
* `system` - wake word recognition system (`pocketsphinx`, `snowboy`, `precise`, `porcupine`, `command`, or `dummy`)
* `pocketsphinx` - configuration for Pocketsphinx wake word recognizer
* `keyphrase` - phrase to wake up on (3-4 syllables recommended)
* `threshold` - sensitivity of detection (recommended range 1e-50 to 1e-5)
* `chunk_size` - number of bytes per chunk to feed to Pocketsphinx (default 960)
* `snowboy` - configuration for [snowboy](https://snowboy.kitt.ai)
* `model` - path to model file(s), separated by commas (in profile directory)
* `sensitivity` - model sensitivity (0-1, default 0.5)
* `audio_gain` - audio gain (default 1)
* `apply_frontend` - true if ApplyFrontend should be set
* `chunk_size` - number of bytes per chunk to feed to snowboy (default 960)
* `model_settings` - settings for each snowboy model path (e.g., `snowboy/snowboy.umdl`)
* `<MODEL_PATH>`
* `sensitivity` - model sensitivity
* `audio_gain` - audio gain
* `apply_frontend` - true if ApplyFrontend should be set
* `precise` - configuration for [Mycroft Precise](https://github.com/MycroftAI/mycroft-precise)
* `engine_path` - path to the precise-engine binary
* `model` - path to model file (in profile directory)
* `sensitivity` - model sensitivity (0-1, default 0.5)
* `trigger_level` - number of events to trigger activation (default 3)
* `chunk_size` - number of bytes per chunk to feed to Precise (default 2048)
* `porcupine` - configuration for [PicoVoice's Porcupine](https://github.com/Picovoice/Porcupine)
* `library_path` - path to `libpv_porcupine.so` for your platform/architecture
* `model_path` - path to the `porcupine_params.pv` (lib/common)
* `keyword_path` - path to the `.ppn` keyword file
* `sensitivity` - model sensitivity (0-1, default 0.5)
* `command` - configuration for external wake word program
* `program` - path to executable
* `arguments` - list of arguments to pass to program
* `microphone` - configuration for audio recording
* `system` - audio recording system (`pyaudio`, `arecord`, `hermes`, `gstreamer`, `http`, or `dummy`)
* `pyaudio` - configuration for [PyAudio](https://people.csail.mit.edu/hubert/pyaudio/) microphone
* `device` - index of device to use or empty for default device
* `frames_per_buffer` - number of frames to read at a time (default 480)
* `arecord` - configuration for ALSA microphone
* `device` - name of ALSA device (see `arecord -L`) to use or empty for default device
* `chunk_size` - number of bytes to read at a time (default 960)
* `http` - configuration for HTTP audio stream
* `host` - hostname or IP address of HTTP audio server (default 127.0.0.1)
* `port` - port to receive audio stream on (default 12333)
* `stop_after` - one of "never", "text", or "intent" ([see documentation](audio-input.md#http-stream))
* `gstreamer` - configuration for GStreamer audio recorder
* `pipeline` - GStreamer pipeline (e.g., `FILTER ! FILTER ! ...`) without sink
* `hermes` - configuration for MQTT "microphone" ([Hermes protocol](https://docs.snips.ai/reference/hermes))
* Subscribes to WAV data from `hermes/audioServer/<SITE_ID>/audioFrame`
* Requires MQTT to be enabled
* `sounds` - configuration for feedback sounds from Rhasspy
* `system` - which sound output system to use (`aplay`, `hermes`, or `dummy`)
* `wake` - path to WAV file to play when Rhasspy wakes up
* `recorded` - path to WAV file to play when a command finishes recording
* `aplay` - configuration for ALSA speakers
* `device` - name of ALSA device (see `aplay -L`) to use or empty for default device
* `hermes` - configuration for MQTT "speakers" ([Hermes protocol](https://docs.snips.ai/reference/hermes))
* WAV data published to `hermes/audioServer/<SITE_ID>/playBytes/<REQUEST_ID>`
* Requires MQTT to be enabled
* `command`
* `system` - which voice command listener system to use (`webrtcvad`, `oneshot`, `hermes`, or `dummy`)
* `webrtcvad` - configuration for [webrtcvad](https://github.com/wiseman/py-webrtcvad) system
* `sample_rate` - sample rate of input audio
* `chunk_size` - bytes per buffer (must correspond to 10, 20, or 30 ms of audio)
* `vad_mode` - sensitivity of `webrtcvad` (0-3)
* `min_sec` - minimum number of seconds in a command
* `silence_sec` - number of seconds of silence after voice command before stopping
* `timeout_sec` - maximum number of seconds before stopping
* `throwaway_buffers` - number of buffers to drop when recording starts
* `speech_buffers` - number of buffers with speech before command starts
* `oneshot` - configuration for voice command system that takes first audio frame as entire command
* `timeout_sec` - maximum number of seconds before stopping
* `command` - configuration for external voice command program
* `program` - path to executable
* `arguments` - list of arguments to pass to program
* `hermes` - configuration for MQTT-based voice command system that listens between `startListening` and `stopListening` commands ([Hermes protocol](https://docs.snips.ai/reference/hermes))
* `timeout_sec` - maximum number of seconds before stopping
* `handle`
* `system` - which intent handling system to use (`hass`, `command`, or `dummy`)
* `forward_to_hass` - true if intents are always forwarded to Home Assistant (even if `system` is `command` or `remote`)
* `command` - configuration for external intent handling program
* `program` - path to executable
* `arguments` - list of arguments to pass to program
* `remote` - configuration for remote HTTP intent handler
* `url` - URL to POST intent JSON to and receive response JSON from
* `mqtt` - configuration for MQTT ([Hermes protocol](https://docs.snips.ai/reference/hermes))
* `enabled` - true if MQTT client should be started
* `host` - MQTT host
* `port` - MQTT port
* `username` - MQTT username (blank for anonymous)
* `password` - MQTT password
* `reconnect_sec` - number of seconds before client will reconnect
* `site_id` - ID of site ([Hermes protocol](https://docs.snips.ai/reference/hermes))
* `publish_intents` - true if intents are published to MQTT
* `download` - configuration for profile file downloading
* `cache_dir` - directory in your profile where downloaded files are cached
* `conditions` - profile settings that will trigger file downloads
* keys are profile setting paths (e.g., `wake.system`)
* values are dictionaries whose keys are profile settings values (e.g., `snowboy`)
* settings may have the form `<=N` or `!X` to mean "less than or equal to N" or "not X"
* leaf nodes are dictionaries whose keys are destination file paths and whose values reference the `files` dictionary
* `files` - locations, etc. of files to download
* keys are names of files
* values are dictionaries with:
* `url` - URL of file to download
* `cache` - `false` if file should be downloaded directly into profile (skipping cache)
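The `<=N` and `!X` forms used in `conditions` can be read as small predicates on a setting's value. A sketch of one possible interpretation (this is an assumption about the matching rules, and `condition_matches` is a hypothetical name):

```python
def condition_matches(condition, actual):
    """Check one profile setting value against a condition key.

    Assumed forms: "<=N" (numeric upper bound), "!X" (anything but X),
    otherwise an exact string match.
    """
    if condition.startswith("<="):
        try:
            return float(actual) <= float(condition[2:])
        except (TypeError, ValueError):
            # Non-numeric values never satisfy a numeric bound
            return False
    if condition.startswith("!"):
        return str(actual) != condition[1:]
    return str(actual) == condition


print(condition_matches("snowboy", "snowboy"))
print(condition_matches("!snowboy", "porcupine"))
print(condition_matches("<=0.5", 0.3))
```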
+34 -1
@@ -7,7 +7,7 @@ The following table summarizes language support for the various speech to text s
| System | en | de | es | fr | it | nl | ru | el | hi | zh | vi | pt | ca |
| ------ | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- |
| [pocketsphinx](speech-to-text.md#pocketsphinx) | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | | &#x2713; | &#x2713; |
| [kaldi](speech-to-text.md#kaldi) | &#x2713; | &#x2713; | | | | &#x2713; | | | | | &#x2713; | | |
| [kaldi](speech-to-text.md#kaldi) | &#x2713; | &#x2713; | | &#x2713; | | &#x2713; | | | | | &#x2713; | | |
## Pocketsphinx
@@ -98,6 +98,39 @@ During speech recognition, 16-bit 16 kHz mono WAV data will be POST-ed to the en
See `rhasspy.stt.RemoteDecoder` for details.
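To try a remote server by hand, you can build the same kind of request yourself. A sketch using only the standard library; the `/api/speech-to-text` URL is taken from the profile settings example, and the `audio/wav` content type is an assumption:

```python
import io
import urllib.request
import wave


def make_silent_wav(seconds=1.0, rate=16000):
    """Return 16-bit, 16 kHz, mono WAV bytes containing silence."""
    with io.BytesIO() as buffer:
        with wave.open(buffer, "wb") as wav_file:
            wav_file.setnchannels(1)     # mono
            wav_file.setsampwidth(2)     # 16-bit samples
            wav_file.setframerate(rate)  # 16 kHz
            wav_file.writeframes(bytes(2 * int(seconds * rate)))
        return buffer.getvalue()


def build_request(url, wav_bytes):
    """Prepare (but do not send) a POST request carrying WAV data."""
    return urllib.request.Request(
        url,
        data=wav_bytes,
        headers={"Content-Type": "audio/wav"},  # assumed content type
        method="POST",
    )


request = build_request(
    "http://your-rhasspy-server:12101/api/speech-to-text", make_silent_wav()
)
# To actually send it:
#   text = urllib.request.urlopen(request).read().decode()
```

Replace the silent audio with a real recording (for example, one made with `mic2wav`) to get a meaningful transcription.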
## MQTT/Hermes
Publishes transcriptions to `hermes/asr/textCaptured` ([Hermes protocol](https://docs.snips.ai/reference/hermes)) each time a voice command is spoken.
This is enabled by default.
## Home Assistant STT Platform
Use an [STT platform](https://www.home-assistant.io/integrations/stt) on your Home Assistant server.
This is the same way [Ada](https://github.com/home-assistant/ada) sends speech to Home Assistant.
Add to your [profile](profiles.md):
```json
"speech_to_text": {
"system": "hass_stt",
"hass_stt": {
"platform": "...",
"sample_rate": 16000,
"bit_size": 16,
"channels": 1,
"language": "en-US"
}
}
```
The settings from your profile's `home_assistant` section are automatically used (URL, access token, etc.).
Rhasspy will convert audio to the configured format before streaming it to Home Assistant.
In the future, this will be auto-detected from the STT platform API.
See `rhasspy.stt.HomeAssistantSTTIntegration` for details.
## Command
Calls a custom external program to do speech recognition.
+68 -2
@@ -89,8 +89,24 @@ To run the Docker image, simply execute:
```bash
docker run -it -p 59125:59125 synesthesiam/marytts:5.2
```
and visit [http://localhost:59125](http://localhost:59125) after it starts. For more English voices, run the following commands in a Bash shell:
and visit [http://localhost:59125](http://localhost:59125) after it starts.
If you're using [docker compose](https://docs.docker.com/compose/), add the following to your docker-compose.yml file:
marytts:
image: synesthesiam/marytts:5.2
restart: unless-stopped
ports:
- "59125:59125"
When using docker-compose, set `marytts.url` in your profile to `http://marytts:59125`. This allows Rhasspy, from within
its Docker container, to resolve and connect to marytts (its sibling container).
### Adding Voices
For more English voices, run the following commands in a Bash shell:
```bash
mkdir -p marytts-5.2/download
@@ -111,6 +127,37 @@ Change the first line to select the voice you'd like to add. It's not recommende
See `rhasspy.tts.MaryTTSSentenceSpeaker` for details.
### Audio Effects
MaryTTS is capable of applying several audio effects when producing speech. See the web interface at [http://localhost:59125](http://localhost:59125)
to experiment with this.
To use these effects within Rhasspy, set `text_to_speech.marytts.effects` within your profile, for example:
```json
"text_to_speech": {
"system": "marytts",
"marytts": {
"url": "http://localhost:59125",
"effects": {
"effect_Volume_selected": "on",
"effect_Volume_parameters": "amount=0.9;",
"effect_TractScaler_selected": "on",
"effect_TractScaler_parameters": "amount:1.2;",
"effect_F0Add_selected": "on",
"effect_F0Add_parameters": "f0Add:-50.0;",
"effect_Robot_selected": "on",
"effect_Robot_parameters": "amount=50.0;"
}
}
}
```
You can determine the names of the parameters by examining the web interface [http://localhost:59125](http://localhost:59125)
using your browser's Developer Tools.
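To experiment outside of Rhasspy, the effect settings can be sent as extra query parameters. A sketch that only builds the URL; the `/process` endpoint and the `INPUT_TEXT`, `INPUT_TYPE`, `OUTPUT_TYPE`, `AUDIO`, and `LOCALE` parameter names are assumptions based on MaryTTS's HTTP interface, and this is not necessarily how Rhasspy itself sends them:

```python
from urllib.parse import urlencode


def marytts_url(base_url, text, effects, locale="en_US"):
    """Build a MaryTTS synthesis URL with audio effect parameters appended."""
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",  # assumed audio format name
        "LOCALE": locale,
    }
    # Effects use the same keys as the profile's "effects" dictionary
    params.update(effects)
    return base_url.rstrip("/") + "/process?" + urlencode(params)


url = marytts_url(
    "http://localhost:59125",
    "We ride at dawn!",
    {
        "effect_Volume_selected": "on",
        "effect_Volume_parameters": "amount=0.9;",
    },
)
print(url)
```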
## Google WaveNet
Uses Google's [WaveNet](https://cloud.google.com/text-to-speech/docs/wavenet) text to speech system. This **requires a Google account and an internet connection to function**. Rhasspy will cache WAV files for previously spoken sentences, but you will be sending Google information for every new sentence that Rhasspy speaks.
@@ -143,6 +190,25 @@ Contributed by [Romkabouter](https://github.com/Romkabouter).
See `rhasspy.tts.GoogleWaveNetSentenceSpeaker` for details.
## Home Assistant TTS Platform
Use a [TTS platform](https://www.home-assistant.io/integrations/tts) on your Home Assistant server.
Add to your [profile](profiles.md):
```json
"text_to_speech": {
"system": "hass_tts",
"hass_tts": {
"platform": "..."
}
}
```
The settings from your profile's `home_assistant` section are automatically used (URL, access token, etc.).
See `rhasspy.tts.HomeAssistantSentenceSpeaker` for details.
## Command
You can extend Rhasspy easily with your own external text to speech system. When a sentence needs to be spoken, Rhasspy will call your custom program with the text given on standard in. Your program should return the corresponding WAV data on standard out.
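As a starting point, here is a sketch of such a program in Python. It does no real speech synthesis: `text_to_wav` (a hypothetical helper) just emits one short beep per word so you can check the plumbing.

```python
import io
import math
import struct
import wave


def text_to_wav(text, rate=16000, beep_sec=0.1):
    """Return WAV bytes with one short beep per word (a stand-in for speech)."""
    samples = []
    for _ in text.split():
        count = int(rate * beep_sec)
        # 440 Hz tone followed by an equal stretch of silence
        samples += [
            int(12000 * math.sin(2 * math.pi * 440 * i / rate))
            for i in range(count)
        ]
        samples += [0] * count
    with io.BytesIO() as buffer:
        with wave.open(buffer, "wb") as wav_file:
            wav_file.setnchannels(1)
            wav_file.setsampwidth(2)
            wav_file.setframerate(rate)
            wav_file.writeframes(struct.pack("<%dh" % len(samples), *samples))
        return buffer.getvalue()


wav_bytes = text_to_wav("We ride at dawn!")
# As an external TTS command, the script would instead end with:
#   import sys
#   sys.stdout.buffer.write(text_to_wav(sys.stdin.read()))
```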
+313 -369
@@ -1,48 +1,319 @@
# Training
Rhasspy is designed to recognize voice commands that [you provide](#sentencesini). These commands are categorized by **intent**, and may contain variable **slots** or **entities**, such as the color and name of a light.
Rhasspy is designed to recognize voice commands [in a template language](#sentencesini). These commands are categorized by **intent**, and may contain [slots](#slots-lists) or [named entities](#tags), such as the color and name of a light.
During the training process, Rhasspy simultaneously trains *both* a speech and intent recognizer. The speech recognizer converts voice commands to text, and the intent recognizer converts text to JSON events. Combined, they enable a low power, offline system like a Raspberry Pi to understand and respond to your voice commands.
## How It Works
Recognizing voice commands typically involves two main steps:
1. Speech to text (transcription)
2. Text to intent (recognition)
For step (1), Rhasspy uses [pocketsphinx](https://github.com/cmusphinx/pocketsphinx) or [Kaldi](https://kaldi-asr.org), and generates a custom [ARPA language model](https://cmusphinx.github.io/wiki/arpaformat/) during the training process. Specifically, the steps are:
1. Convert the grammar from your [sentences.ini](#sentencesini) file to a [finite state transducer](https://www.openfst.org)
2. (Optionally) generate all possible sentences that can be spoken with entities tagged (e.g., `name` is `bedroom light`, `color` is `red`)
3. Use the [opengrm](http://www.opengrm.org/twiki/bin/view/GRM/NGramLibrary) toolkit to create a custom language model
4. Train an intent recognizer with the tagged sentences
Additionally, a custom [CMU phonetic dictionary](https://cmusphinx.github.io/wiki/tutorialdict/) is generated with *only* the words in your voice commands (and wake word, if you're using a [pocketsphinx keyphrase](wake-word.md#pocketsphinx)). If the pronunciation of a word is not known, Rhasspy calls out to [phonetisaurus](https://github.com/AdolfVonKleist/Phonetisaurus) to get a guess, and then halts training. Once you've confirmed the pronunciations by adding them to your [custom words](#custom-words), training can continue.
For step (4), Rhasspy can use a [variety of intent recognition systems](intent-recognition.md). However, most are trained from the **tagged sentences** generated from [sentences.ini](#sentencesini), e.g., `turn [on](state) the [living room lamp](name)`. These sentences are transformed into JSON, like:
{
"ChangeLightState": [
{
"text": "turn on the living room lamp",
"entities": [
{ "entity": "state", "value": "on" },
{ "entity": "name", "value": "living room lamp" }
]
},
...
],
...
}
and provided as training material to the intent recognition system. The [fuzzywuzzy](intent-recognition.md#fuzzywuzzy) system, for example, simply saves the JSON file and, during recognition, finds the closest matching sentence according to the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance). The [default intent recognizer](intent-recognition.md#fsticuffs) interacts directly with the finite state transducer(s) generated in step (1) and, while less tolerant of errors than `fuzzywuzzy`, is significantly faster for large sets of voice commands (i.e., millions).
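The closest-match idea can be sketched in a few lines. This uses plain edit distance over the saved examples and is only an illustration; the actual `fuzzywuzzy` recognizer's scoring may differ:

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute cost 1)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


def closest_intent(text, examples):
    """Return (intent name, example sentence) with the smallest distance."""
    candidates = [
        (levenshtein(text, example["text"]), intent, example["text"])
        for intent, intent_examples in examples.items()
        for example in intent_examples
    ]
    _, intent, sentence = min(candidates)
    return intent, sentence


examples = {
    "ChangeLightState": [{"text": "turn on the living room lamp"}],
    "GetTime": [{"text": "what time is it"}],
}
print(closest_intent("turn on living room lamp", examples))
```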
More sophisticated systems like [Rasa NLU](intent-recognition.md#rasanlu) use machine learning techniques to classify sentences by intent and assign slot (entity) values. These systems are much better at recognizing sentences not seen during training, but can take minutes to hours to train.
* Intent Recognition
* [Basic Syntax](#basic-syntax)
* [Named Entities](#tags)
* [Number Ranges](#number-ranges)
* [Slots](#slots-lists)
* [Slot Synonyms](#slot-synonyms)
* [Slot Programs](#slot-programs)
* [Converters](#converters)
* Speech Recognition
* [Custom Words](#custom-words)
* [Language Model Mixing](#language-model-mixing)
## sentences.ini
Voice commands are recognized by Rhasspy from a set of sentences that you define in your [profile](profiles.md). These are stored in an [ini file](https://docs.python.org/3/library/configparser.html) whose "values" are simplified [JSGF grammars](https://www.w3.org/TR/jsgf/). The set of all sentences *generated* from these grammars is used to train an [ARPA language model](https://cmusphinx.github.io/wiki/arpaformat/) and an intent recognizer.
Voice commands are stored in an [ini file](https://docs.python.org/3/library/configparser.html) whose "sections" are intents and "values" are sentence templates.
### Basic Syntax
To get started, simply list your intents (surrounded by brackets) and the possible ways of invoking them below:
```
[TestIntent1]
this is a sentence
this is another sentence for the same intent
[TestIntent2]
this is a sentence for a different intent
```
If you say "this is a sentence" after hitting the `Train` button, Rhasspy will generate a `TestIntent1` intent.
### Groups
You can group multiple words together using `(parentheses)` like:
```
turn on the (living room lamp)
```
Groups (sometimes called sequences) can be [tagged](#tags) and [substituted](#substitutions) like single words. They may also contain [alternatives](#alternatives).
### Optional Words
Within a sentence template, you can specify optional word(s) by surrounding them `[with brackets]`. For example:
```
[an] example sentence [with] some optional words
```
will match:
* `an example sentence with some optional words`
* `example sentence with some optional words`
* `an example sentence some optional words`
* `example sentence some optional words`
### Alternatives
A set of items where only one is matched at a time is `(specified | like | this)`. For N items, there will be N matched sentences (unless you nest optional words, etc.). The template:
```
set the light to (red | green | blue)
```
will match:
* `set the light to red`
* `set the light to green`
* `set the light to blue`
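To see how optional words and alternatives multiply out, here is a small expander sketch. It handles only `[optional]` and `(a | b)` syntax (no tags, substitutions, or rules) and is an illustration, not Rhasspy's own parser:

```python
import re
from itertools import product

TOKEN = re.compile(r"[()\[\]|]|[^\s()\[\]|]+")


def expand(template):
    """Return every sentence matched by a template."""
    tokens = TOKEN.findall(template)
    sentences, pos = _alternatives(tokens, 0)
    if pos != len(tokens):
        raise ValueError("unbalanced brackets in template")
    return [" ".join(words) for words in sentences]


def _alternatives(tokens, pos):
    """Parse sequences separated by '|'; return (word lists, next position)."""
    results, pos = _sequence(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "|":
        more, pos = _sequence(tokens, pos + 1)
        results += more
    return results, pos


def _sequence(tokens, pos):
    """Parse items until '|', a closing bracket, or the end of input."""
    results = [[]]
    while pos < len(tokens) and tokens[pos] not in ("|", ")", "]"):
        token = tokens[pos]
        if token in ("(", "["):
            closer = ")" if token == "(" else "]"
            choices, pos = _alternatives(tokens, pos + 1)
            if pos >= len(tokens) or tokens[pos] != closer:
                raise ValueError("missing " + closer)
            pos += 1
            if token == "[":
                choices = choices + [[]]  # optional: also allow skipping it
        else:
            choices, pos = [[token]], pos + 1
        results = [left + right for left, right in product(results, choices)]
    return results, pos


for sentence in expand("set the light to (red | green | blue)"):
    print(sentence)
```

Nesting works as you would expect: `expand("[an] example (sentence | phrase)")` yields four sentences.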
### Tags
Named entities are marked in your sentence templates with `{tags}`. The name of the `{entity}` is between the curly braces, while the `(value of the){entity}` comes immediately before:
```
[SetLightColor]
set the light to (red | green | blue){color}
```
With the `{color}` tag attached to `(red | green | blue)`, Rhasspy will match:
* `set the light to [red](color)`
* `set the light to [green](color)`
* `set the light to [blue](color)`
When the `SetLightColor` intent is recognized, the JSON event will contain a `color` property whose value is either "red", "green" or "blue".
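The `[value](entity)` notation used in the list above can be unpacked mechanically. A sketch with a regular expression (`parse_tagged` is a hypothetical helper, and the output shape mirrors the `text`/`entities` JSON shown elsewhere in these docs):

```python
import re

TAGGED = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def parse_tagged(sentence):
    """Split `[value](entity)` markup into plain text and an entity list."""
    entities = [
        {"entity": match.group(2), "value": match.group(1)}
        for match in TAGGED.finditer(sentence)
    ]
    # Keep only the spoken value in the plain text
    text = TAGGED.sub(r"\1", sentence)
    return {"text": text, "entities": entities}


event = parse_tagged("set the light to [red](color)")
print(event)
```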
#### Tag Synonyms
Tag/named entity values can be [substituted](#substitutions) using the colon (`:`) inside the `{curly:braces}` like:
```
turn on the (living room lamp){name:light_1}
```
Now the `name` property of the intent JSON event will contain "light_1" instead of "living room lamp".
### Substitutions
The colon (`:`) is used to put something different than what's spoken into the recognized intent JSON. The left-hand side of the `:` is what Rhasspy expects to hear, while the right-hand side is what gets put into the intent:
```
turn on the (living room lamp):light_1
```
In this example, the spoken phrase "living room lamp" will be replaced by "light_1" in the recognized intent. Substitutions work for single words, [groups](#groups), [alternatives](#alternatives), and [tags](#tags):
```
turn on the living room lamp:light
(turn | switch):switch on the living room lamp
turn (on){action:activate} the living room lamp
```
See [tag synonyms](#tag-synonyms) for more details on tag substitution.
You can leave the left-hand or right-hand side (or both!) of the `:` empty:
```
these: words: will: be: dropped:
:these :will :be :added
```
When the right-hand side is empty (`dropped:`), the spoken word will not appear in the intent. An empty left-hand side (`:added`) means the word is *not* spoken, but will appear in the intent.
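The two sides of the colon can be pictured as two parallel word streams. A sketch for single-word substitutions only (groups and tags are not handled; `apply_substitutions` is a hypothetical helper):

```python
def apply_substitutions(template):
    """Return (spoken words, intent words) for `spoken:output` tokens."""
    spoken, output = [], []
    for token in template.split():
        if ":" in token:
            left, right = token.split(":", 1)
        else:
            left = right = token
        if left:
            spoken.append(left)   # what Rhasspy expects to hear
        if right:
            output.append(right)  # what ends up in the intent
    return " ".join(spoken), " ".join(output)


print(apply_substitutions("these: words: will: be: dropped:"))
print(apply_substitutions(":these :will :be :added"))
print(apply_substitutions("turn on the living room lamp:light"))
```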
Leaving **both** sides empty does nothing unless you attach a [tag](#tags) to it. This allows you to embed a named entity in a voice command without matching specific words:
```
turn on the living room lamp (:){domain:light}
```
An intent from the example above will contain a `domain` entity whose value is `light`.
### Rules
Rules allow you to reuse parts of your sentence templates. They're defined by `rule_name = ...` alongside other sentences and referenced by `<rule_name>`. For example:
```
colors = (red | green | blue)
set the light to <colors>
```
which is equivalent to:
```
set the light to (red | green | blue)
```
You can **share rules** across intents by referencing them as `<IntentName.rule_name>` like:
[SetLightColor]
colors = (red | green | blue)
set the light to <colors>
[GetLightColor]
is the light <SetLightColor.colors>
The second intent (`GetLightColor`) references the `colors` rule from `SetLightColor`. Rule references without a dot must exist in the current intent.
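Rule references behave like textual substitution. A sketch of resolving both local and shared references, assuming rules are held in a dictionary keyed by intent name (there is no cycle detection, and `resolve_rules` is a hypothetical helper):

```python
import re

RULE_REF = re.compile(r"<([^<>]+)>")


def resolve_rules(template, intent_name, rules):
    """Replace <rule> and <Intent.rule> references with the rule bodies.

    `rules` maps intent name -> {rule name -> rule body}.
    """
    def replace(match):
        # A dot means the rule lives in another intent
        owner, _, rule = match.group(1).rpartition(".")
        owner = owner or intent_name
        body = rules[owner][rule]
        # Rule bodies may themselves reference other rules
        return resolve_rules(body, owner, rules)

    return RULE_REF.sub(replace, template)


rules = {"SetLightColor": {"colors": "(red | green | blue)"}}
print(resolve_rules("set the light to <colors>", "SetLightColor", rules))
print(resolve_rules("is the light <SetLightColor.colors>", "GetLightColor", rules))
```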
### Number Ranges
Rhasspy supports using number literals (`75`) and number ranges (`1..10`) directly in your sentence templates. During training, the [num2words](https://pypi.org/project/num2words) package is used to generate words that the speech recognizer can handle ("seventy five"). For example:
```
[SetBrightness]
set brightness to (0..100){brightness}
```
The `brightness` property of the recognized `SetBrightness` intent will automatically be [converted](#converters) to an integer for you. You can optionally add a step to the integer range:
```
evens = 0..100,2
odds = 1..100,2
```
Under the hood, number ranges are actually references to the `rhasspy/number` [slot program](#slot-programs). You can override this behavior by creating your `slot_programs/rhasspy/number` program or disable it entirely by setting `intent.replace_numbers` to `false` in [your profile](profiles.md).
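The range syntax itself is simple to expand. A sketch that turns `N..M` or `N..M,STEP` into integers, assuming the upper bound is inclusive (this is not the `rhasspy/number` program itself, and converting the integers to words with num2words is left out):

```python
import re

RANGE = re.compile(r"^(\d+)\.\.(\d+)(?:,(\d+))?$")


def expand_number_range(text):
    """Expand "N..M" or "N..M,STEP" into a list of integers (M inclusive)."""
    match = RANGE.match(text.strip())
    if not match:
        raise ValueError("not a number range: " + text)
    start, stop = int(match.group(1)), int(match.group(2))
    step = int(match.group(3) or 1)
    if step < 1:
        raise ValueError("step must be at least 1")
    return list(range(start, stop + 1, step))


evens = expand_number_range("0..100,2")
odds = expand_number_range("1..100,2")
print(evens[:5], odds[:5])
```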
### Slots Lists
Large [alternatives](#alternatives) can become unwieldy quickly. For example, say you have a list of movie names:
```
movies = ("Primer" | "Moon" | "Chronicle" | "Timecrimes" | "Mulholland Drive" | ... )
```
Rather than keep this list in `sentences.ini`, you may put each movie name on a separate line in a file named `slots/movies` (no file extension) and reference it as `$movies`. Rhasspy automatically loads all files in the `slots` directory of your [profile](profiles.md) and makes them available as slots lists.
For the example above, the file `slots/movies` should contain:
```
Primer
Moon
Chronicle
Timecrimes
Mulholland Drive
```
Now you can simply use the placeholder `$movies` in your sentence templates:
```
[PlayMovie]
play ($movies){movie_name}
```
When matched, the `PlayMovie` intent JSON will contain a `movie_name` property with either "Primer", "Moon", etc.
Make sure to **re-train** Rhasspy whenever you update your slot values!
#### Slot Directories
Slot files can be put in **sub-directories** under `slots`. A list in `slots/foo/bar` should be referenced in `sentences.ini` as `$foo/bar`.
#### Slot Synonyms
Slot values are themselves sentence templates! So you can use all of the familiar syntax from above. Slot "synonyms" can be created simply using [substitutions](#substitutions). So a file named `slots/rooms` may contain:
```
[the:] (den | playroom | downstairs):den
```
which is referenced by `$rooms` and will match:
* the den
* den
* the playroom
* playroom
* the downstairs
* downstairs
This will always output just "den" because `[the:]` optionally matches "the" and then drops the word.
#### Slot Programs
Slot lists are great if your slot values always stay the same and are easily written out by hand. If you have slot values that you need to be generated *each time Rhasspy is trained*, you can use slot programs.
Create a directory named `slot_programs` in your profile (e.g., `$HOME/.config/rhasspy/profiles/en/slot_programs`):
```bash
slot_programs="${HOME}/.config/rhasspy/profiles/en/slot_programs"
mkdir -p "${slot_programs}"
```
Add a file in `slot_programs` with the name of your slot, e.g. `colors`. Write a program in this file, such as a bash script. Make sure to include the [shebang](https://en.wikipedia.org/wiki/Shebang_(Unix)) and mark the file as executable:
```bash
cat <<EOF > "${slot_programs}/colors"
#!/usr/bin/env bash
echo 'red'
echo 'green'
echo 'blue'
EOF
chmod +x "${slot_programs}/colors"
```
Now, when you reference `$colors` in your `sentences.ini`, Rhasspy will run the program you wrote and collect the slot values from each line. Note that you can output all the same things as regular [slots lists](#slots-lists), including optional words, alternatives, etc.
You can pass **arguments** to your program using the syntax `$name,arg1,arg2,...` in `sentences.ini` (no spaces). Arguments will be passed on the command-line, so `arg1` and `arg2` will be `$1` and `$2` in a bash script.
Like regular slots lists, slot programs can also be put in sub-directories under `slot_programs`. A program in `slot_programs/foo/bar` should be referenced in `sentences.ini` as `$foo/bar`.
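Slot programs do not have to be bash scripts. Below is a minimal sketch in Python, assuming a hypothetical program saved as `slot_programs/rooms` and referenced as `$rooms,upstairs` in `sentences.ini`. The room names and the floor argument are made up for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical slot program: prints one slot value per line."""
import sys

# Made-up data for illustration; a real program might query a service instead.
ROOMS = {
    "upstairs": ["bedroom", "office"],
    "downstairs": ["kitchen", "living room"],
}


def slot_values(args):
    """Return slot values for the floors named in args (all floors if empty)."""
    floors = args if args else sorted(ROOMS)
    return [room for floor in floors for room in ROOMS.get(floor, [])]


if __name__ == "__main__":
    # Arguments after the program name come from `$rooms,arg1,arg2,...`
    for value in slot_values(sys.argv[1:]):
        print(value)
```

As with the bash example, the file needs a shebang and must be marked executable before training.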
#### Built-in Slots
Rhasspy includes a few built-in slots for each language:
* `$rhasspy/days` - day names of the week
* `$rhasspy/months` - month names of the year
### Converters
By default, all named entity values in a recognized intent's JSON are strings. If you need a different data type, such as an integer or float, or want to do some kind of complex *conversion*, use a converter:
```
[SetBrightness]
set brightness to (low:0 | medium:0.5 | high:1){brightness!float}
```
The `!name` syntax calls a converter by name. Rhasspy includes several built-in converters:
* int - convert to integer
* float - convert to real
* bool - convert to boolean
* lower - lower-case
* upper - upper-case
You can define your own converters by placing a file in the `converters` directory of your profile. Like [slot programs](#slot-programs), this file should contain a [shebang](https://en.wikipedia.org/wiki/Shebang_(Unix)) and be marked as executable (`chmod +x`). A file named `converters/foo/bar` should be referenced as `!foo/bar` in `sentences.ini`.
Your custom converter will receive the value to convert on standard in (`stdin`) encoded as JSON. You should print a converted JSON value to standard out (`stdout`). The example below demonstrates converting a string value into an integer:
```python
#!/usr/bin/env python3
import sys
import json
value = json.load(sys.stdin)
print(int(value))
```
Converters can be *chained*, so `!foo!bar` will call the `foo` converter and then pass the result to `bar`.
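Chaining behaves like function composition. The sketch below models that idea with plain Python functions standing in for converters. It is not how Rhasspy invokes converters internally, and the `CONVERTERS` table and `apply_chain` helper are hypothetical names.

```python
# Stand-ins for a few of the built-in converters listed above.
CONVERTERS = {
    "int": int,
    "float": float,
    "lower": str.lower,
    "upper": str.upper,
}


def apply_chain(value, chain):
    """Apply converters left to right for a chain like '!lower!upper'."""
    for name in chain.split("!"):
        if name:  # skip the empty piece before the first '!'
            value = CONVERTERS[name](value)
    return value


print(apply_chain("0.5", "!float"))
print(apply_chain("HIGH", "!lower!upper"))
```

The order matters: each converter sees the output of the one before it.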
### Special Cases
If one of your sentences happens to start with an optional word (e.g., `[the]`), this can lead to a problem:
[SomeIntent]
[the] problem sentence
Python's [configparser](https://docs.python.org/3/library/configparser.html) will interpret `[the]` as a new section header, which will produce a new intent, grammar, etc. Rhasspy handles this special case by using a backslash escape sequence (`\[`):
[SomeIntent]
\[the] problem sentence
Now `[the]` will be properly interpreted as a sentence under `[SomeIntent]`. You only need to escape a `[` if it's the **very first** character in your sentence.
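The section-header behavior can be reproduced with `configparser` directly. This is a small sketch; the parser options (`allow_no_value`, `delimiters`) are assumptions chosen so that bare sentence lines parse, and may differ from what Rhasspy uses. The `intent_names` helper is hypothetical.

```python
import configparser


def intent_names(ini_text):
    """Return the section (intent) names configparser finds in ini_text."""
    # allow_no_value lets sentence lines without '=' parse as bare keys
    parser = configparser.ConfigParser(allow_no_value=True, delimiters=("=",))
    parser.read_string(ini_text)
    return parser.sections()


unescaped = "[SomeIntent]\n[the] problem sentence\n"
escaped = "[SomeIntent]\n\\[the] problem sentence\n"

print(intent_names(unescaped))  # the leading [the] becomes its own section
print(intent_names(escaped))    # only SomeIntent remains
```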
### Motivation
@@ -67,162 +338,6 @@ Compared to JSON, YAML, etc., there is minimal syntactic overhead for the purpos
Each of these shortcomings is addressed by considering the space between intent headings (`[Intent 1]`, etc.) as a **grammar** that represents many possible voice commands. The possible sentences, stripped of their tags, are used as input to [opengrm](https://www.opengrm.org) to produce a standard ARPA language model for [pocketsphinx](https://github.com/cmusphinx/pocketsphinx) or [Kaldi](https://kaldi-asr.org). The tagged sentences are then used to train an intent recognizer.
### Optional Words
Within a sentence, you can specify optional word(s) by surrounding them `[with brackets]`. These will generate at least two sentences: one with the optional word(s), and one without. So the following sentence template:
[an] example sentence [with] some optional words
will generate 4 concrete sentences:
1. `an example sentence with some optional words`
2. `example sentence with some optional words`
3. `an example sentence some optional words`
4. `example sentence some optional words`
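The expansion above can be sketched in a few lines of Python. This is only an illustration of the idea, not Rhasspy's implementation: `expand_optionals` is a hypothetical helper and it handles only flat (non-nested) bracket groups.

```python
import itertools
import re


def expand_optionals(template):
    """Expand flat [optional] groups into every concrete sentence."""
    # Split into alternating literal text and bracketed groups
    parts = re.split(r"(\[[^\]]*\])", template)
    choices = []
    for part in parts:
        if part.startswith("[") and part.endswith("]"):
            choices.append([part[1:-1], ""])  # with and without the words
        else:
            choices.append([part])
    # Join each combination and normalize whitespace left by dropped words
    return [" ".join("".join(combo).split()) for combo in itertools.product(*choices)]


for sentence in expand_optionals("[an] example sentence [with] some optional words"):
    print(sentence)
```

Two optional groups give 2 x 2 = 4 sentences; each additional group doubles the count.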
### Alternatives
A set of items, where only one is present at a time, is `(specified | like | this)`. For N items, there will be N sentences generated (unless you nest optional words, etc.). The template:
set the light to (red | green | blue)
will generate:
1. `set the light to red`
2. `set the light to green`
3. `set the light to blue`
### Rules
Rules allow you to reuse common phrases, alternatives, etc. Rules are defined by `rule_name = ...` alongside your sentences and referenced by `<rule_name>`. The template above with colors could be rewritten as:
colors = (red | green | blue)
set the light to <colors>
which will generate the same 3 sentences as above. Importantly, you can **share rules** across intents by prefixing the rule's name with the intent name followed by a dot:
[SetLightColor]
colors = (red | green | blue)
set the light to <colors>
[GetLightColor]
is the light <SetLightColor.colors>
The second intent (`GetLightColor`) references the `colors` rule from `SetLightColor`.
### Tags
The example templates above will generate sentences for training the speech recognizer, but using them to train the intent recognizer will not be satisfactory. The `SetLightColor` intent, when recognized, will result in a Home Assistant event called `rhasspy_SetLightColor`. But the actual *color* will not be provided because the intent recognizer is not aware that a `color` slot should exist (and has the values `red`, `green`, and `blue`).
Luckily, JSGF has a [tag feature](https://www.w3.org/TR/jsgf/#15057) that lets you annotate portions of sentences/rules. Rhasspy assumes that the tags themselves are *slot/entity names* and the tagged portions of the sentence are *slot/entity values*. The `SetLightColor` example can be extended with tags like this:
[SetLightColor]
colors = (red | green | blue){color}
set the light to <colors>
With the `{color}` tag attached to the `(red | green | blue)` alternative set, each color name will carry the tag. This is the same as typing `((red){color} | (green){color} | (blue){color})`, but less verbose. Rhasspy will now generate the following **tagged sentences**:
1. `set the light to [red](color)`
2. `set the light to [green](color)`
3. `set the light to [blue](color)`
When the `SetLightColor` intent is recognized now, the corresponding JSON event (`rhasspy_SetLightColor` in Home Assistant) will have the following properties:
{
"color": "red"
}
A Home Assistant [automation](https://www.home-assistant.io/docs/automation) can use the slot values to take an appropriate action, such as [setting an RGB light's color](https://www.home-assistant.io/docs/automation/action/) to `[255,0,0]` (red).
#### Tag Synonyms
There are times where you want to match a particular part of your sentence with a tag, but want the actual *value* of the tag to be something different than the matched text. This is needed if you want to talk about entities in Home Assistant, for example, with phrases like "the living room lamp", but want to pass the appropriate entity id (say `lamp_1`) to Home Assistant instead.
Normally, you would tag part of a sentence like this:
[ChangeLightState]
turn on the (living room lamp){name}
When this intent is activated, Rhasspy will send a JSON event (named `rhasspy_ChangeLightState` in Home Assistant) with:
{
"name": "living room lamp"
}
You can catch this event in a Home Assistant automation, match the `name` "living room lamp", and do something with the `lamp_1` entity. That's fine for one instance, but would require a separate rule for every `name`! Instead, let's add a tag **synonym**:
[ChangeLightState]
turn on the (living room lamp){name:lamp_1}
The tag label and synonym are separated by a ":". When this sentence is spoken and the intent is activated, the same `rhasspy_ChangeLightState` event will be sent to Home Assistant, but with the following data:
{
"name": "lamp_1"
}
Now in your Home Assistant automation, you could use [templating](https://www.home-assistant.io/docs/automation/templating/) to plug the `name` directly into the `entity_id` field of an action. One rule to rule them all.
This same technique could be used to replace number words with digits, like:
[SetTimer]
set a timer for (ten){number:10} seconds
which would generate an event like this when recognized:
{
"number": "10"
}
### Slots Lists
In the `SetLightColor` example above, the color names are stored in `sentences.ini` as a rule:
colors = (red | green | blue)
This is convenient when the list of colors is small, changes infrequently, and does not depend on an external service.
But what if this was a list of movie names that were stored on your [Kodi Home Theater](https://kodi.tv)?
movies = ("Primer" | "Moon" | "Chronicle" | "Timecrimes" | "Mulholland Drive" | ... )
It would be much easier if this list was stored externally, but could be *referenced* in the appropriate places in the grammar.
This is possible in Rhasspy by placing text files in the `speech_to_text.slots_dir` directory specified in your [profile](profiles.md) ("slots" by default).
If you're using the English (`en`) profile, for example, create the file `profiles/en/slots/movies` and add the following content:
Primer
Moon
Chronicle
Timecrimes
Mulholland Drive
This list of movies can now be referenced as `$movies` in your `sentences.ini` file! Something like:
[PlayMovie]
play ($movies){movie_name}
will generate `rhasspy_PlayMovie` events like:
{
"movie_name": "Primer"
}
If you update the `movies` file, make sure to re-train Rhasspy in order to pick up the new movie names.
### Special Cases
If one of your sentences happens to start with an optional word (e.g., `[the]`), this can lead to a problem:
[SomeIntent]
[the] problem sentence
Python's [configparser](https://docs.python.org/3/library/configparser.html) will interpret `[the]` as a new section header, which will produce a new intent, grammar, etc. Rhasspy handles this special case by using a backslash escape sequence (`\[`):
[SomeIntent]
\[the] problem sentence
Now `[the]` will be properly interpreted as a sentence under `[SomeIntent]`. You only need to escape a `[` if it's the **very first** character in your sentence.
## Custom Words
Rhasspy looks for words you've defined outside of your profile's base dictionary (typically `base_dictionary.txt`) in a custom words file (typically `custom_words.txt`). This is just a [CMU phonetic dictionary](https://cmusphinx.github.io/wiki/tutorialdict/) with words/pronunciations separated by newlines:
@@ -232,170 +347,11 @@ Rhasspy looks for words you've defined outside of your profile's base dictionary
You can use the [Words tab](usage.md#words-tab) in Rhasspy's web interface to generate this dictionary. During training, Rhasspy will merge `custom_words.txt` into your `dictionary.txt` file so the [speech to text](speech-to-text.md) system knows how the words in your voice commands are pronounced.
## Speech to Text
By default, Rhasspy generates training sentences from your [sentences.ini](#sentencesini) file, and then trains a custom language model using [opengrm](https://www.opengrm.org). You can call a **custom program** instead if you want to use a different language modeling toolkit or your custom speech to text system needs special training.
Add to your [profile](profiles.md):
```json
"training": {
"speech_to_text": {
"system": "command",
"command": {
"program": "/path/to/program",
"arguments": []
}
}
}
```
When training, your program will be called with all of the training sentences grouped by intent in JSON to standard in. No output is expected from your program besides a successful exit code. **NOTE**: Rhasspy will not generate `dictionary.txt` or `language_model.txt` if you use a custom program.
The input JSON is an object where each key is the name of an intent and the values are lists of training sentence objects. Each sentence object has the text of the sentence, all tagged entities, and the tokens of the sentence.
Example input:
{
"GetTime": [
{
"sentence": "what time is it",
"entities": [],
"tokens": [
"what",
"time",
"is",
"it"
]
},
{
"sentence": "tell me the time",
"entities": [],
"tokens": [
"tell",
"me",
"the",
"time"
]
}
],
"ChangeLightColor": [
{
"sentence": "set the bedroom light to red",
"entities": [
{
"entity": "name",
"value": "bedroom light"
},
{
"entity": "color",
"value": "red"
}
],
"tokens": [
"set",
"the",
"bedroom",
"light",
"to",
"red"
]
}
]
}
See [train-stt.sh](https://github.com/synesthesiam/rhasspy/blob/master/bin/mock-commands/train-stt.sh) for an example program.
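For comparison, here is a minimal Python sketch of such a program, assuming it only needs the sentence text (for example, to feed a different language modeling toolkit). The `write_corpus` helper is hypothetical, and the sample is a trimmed copy of the example input above.

```python
import io
import json


def write_corpus(in_stream, out_stream):
    """Read training JSON from in_stream and write one sentence per line."""
    training = json.load(in_stream)
    count = 0
    for intent_sentences in training.values():
        for sentence in intent_sentences:
            out_stream.write(sentence["sentence"] + "\n")
            count += 1
    return count


# Demo input; a real program would call write_corpus(sys.stdin, some_file)
# and finish with a successful exit code.
sample = io.StringIO(json.dumps({
    "GetTime": [
        {"sentence": "what time is it", "entities": [], "tokens": ["what", "time", "is", "it"]},
        {"sentence": "tell me the time", "entities": [], "tokens": ["tell", "me", "the", "time"]},
    ]
}))
corpus = io.StringIO()
written = write_corpus(sample, corpus)
print(corpus.getvalue(), end="")
```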
## Intent Recognition
During training, Rhasspy uses the sentences generated from [sentences.ini](#sentencesini) as training material for the selected intent recognition system. If your intent recognition system requires some special training, you can call a **custom program** here.
Add to your [profile](profiles.md):
```json
"training": {
"intent": {
"system": "command",
"command": {
"program": "/path/to/program",
"arguments": []
}
}
}
```
During training, Rhasspy will call your program with the training sentences grouped by intent in JSON printed to standard in. No output is expected, besides a successful exit code.
The input JSON is an object where each key is the name of an intent and the values are lists of training sentence objects. Each sentence object has the text of the sentence, all tagged entities, and the tokens of the sentence.
Example input:
```json
{
"GetTime": [
{
"sentence": "what time is it",
"entities": [],
"tokens": [
"what",
"time",
"is",
"it"
]
},
{
"sentence": "tell me the time",
"entities": [],
"tokens": [
"tell",
"me",
"the",
"time"
]
}
],
"ChangeLightColor": [
{
"sentence": "set the bedroom light to red",
"entities": [
{
"entity": "name",
"value": "bedroom light"
},
{
"entity": "color",
"value": "red"
}
],
"tokens": [
"set",
"the",
"bedroom",
"light",
"to",
"red"
]
}
]
}
```
The following environment variables are available to your program:
* `$RHASSPY_BASE_DIR` - path to the directory where Rhasspy is running from
* `$RHASSPY_PROFILE` - name of the current profile (e.g., "en")
* `$RHASSPY_PROFILE_DIR` - directory of the current profile (where `profile.json` is)
See [train-intent.sh](https://github.com/synesthesiam/rhasspy/blob/master/bin/mock-commands/train-intent.sh) for an example program.
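A Python program could consume the same input along these lines. This is a sketch under assumptions: `summarize` is a hypothetical helper, the sample is trimmed from the example input (token lists left empty), and the fallback profile name exists only so the demo runs outside Rhasspy.

```python
import json
import os


def summarize(training):
    """Map each intent name to the sorted entity names seen in its sentences."""
    summary = {}
    for intent_name, sentences in training.items():
        entity_names = {e["entity"] for s in sentences for e in s["entities"]}
        summary[intent_name] = sorted(entity_names)
    return summary


# RHASSPY_PROFILE is set by Rhasspy; the fallback is only for running this demo.
profile = os.environ.get("RHASSPY_PROFILE", "unknown")

sample = {
    "GetTime": [{"sentence": "what time is it", "entities": [], "tokens": []}],
    "ChangeLightColor": [
        {
            "sentence": "set the bedroom light to red",
            "entities": [
                {"entity": "name", "value": "bedroom light"},
                {"entity": "color", "value": "red"},
            ],
            "tokens": [],
        }
    ],
}
print(profile, json.dumps(summarize(sample)))
```

In a real program the training data would come from `json.load(sys.stdin)` instead of the inline sample.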
## Language Model Mixing
Rhasspy is designed to only respond to the voice commands you specify in [sentences.ini](training.md#sentencesini), but both the Pocketsphinx and Kaldi speech to text systems are capable of transcribing open ended speech. While this will never be as good as a cloud-based system, Rhasspy offers it as an option.
Rhasspy is designed to only respond to the voice commands you specify in [sentences.ini](training.md#sentencesini), but both the Pocketsphinx and Kaldi speech to text systems are capable of transcribing open ended speech. While this will never be as good as a cloud-based system, Rhasspy [offers it as an option](speech-to-text.md#open-transcription).
Open ended speech is achieved in Rhasspy by the inclusion of `base_dictionary.txt` and `base_language_model.txt` files in every profile. The former is a dictionary containing the pronunciations of all possible words. The latter is a large language model trained on a very large corpus of text in the profile's language (usually books and web pages).
During training, Rhasspy can **mix** this large, open ended language model with the one generated specifically for your voice commands. You specify a **mixture weight**, which controls how much of an influence the large language model has; a mixture weight of 0 makes Rhasspy sensitive *only* to your voice commands, which is the default.
A middle ground between open transcription and custom voice commands is **language model mixing**. During training, Rhasspy can mix a (large) pre-built language model with the custom-generated one. You specify a **mixture weight** (0-1), which controls how much of an influence the large language model has; a mixture weight of 0 makes Rhasspy sensitive *only* to your voice commands, which is the default.
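To make the role of the mixture weight concrete, the sketch below assumes the two models are combined by simple linear interpolation of word probabilities. That is an assumption for illustration only (the text above does not say how the mixing is computed), and `mix_probability` is a hypothetical helper.

```python
def mix_probability(p_custom, p_base, mixture_weight):
    """Linearly interpolate two word probabilities.

    Assumption for illustration: mixture_weight is the share given to the
    base (large) model, so 0 keeps only the custom voice-command model.
    """
    if not 0.0 <= mixture_weight <= 1.0:
        raise ValueError("mixture_weight must be between 0 and 1")
    return (1.0 - mixture_weight) * p_custom + mixture_weight * p_base


# Weight 0 ignores the base model entirely; weight 1 ignores the custom one.
print(mix_probability(0.2, 0.01, 0))
print(mix_probability(0.2, 0.01, 1))
```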
![Diagram of Rhasspy's training process](img/training.svg)
@@ -468,15 +424,6 @@ $ echo 'would you please turn on the living room lamp' | \
"value": "on"
}
],
"tokens": [
"turn",
"on",
"the",
"living",
"room",
"lamp"
],
"speech_confidence": 1,
"slots": {
"state": "on"
}
@@ -486,7 +433,6 @@ $ echo 'would you please turn on the living room lamp' | \
But this works only because the default intent recognizer ([fsticuffs](intent-recognition.md#fsticuffs)) ignores unknown words by default, so "would you please" is not interpreted. Changing "lamp" to "light" in the input sentence will reveal the problem:
```
$ echo 'would you please turn on the living room light' | \
rhasspy-cli --profile en text2intent
@@ -499,7 +445,6 @@ $ echo 'would you please turn on the living room light | \
"confidence": 0
},
"entities": [],
"speech_confidence": 1,
"slots": {}
}
}
@@ -535,7 +480,6 @@ $ echo 'would you please turn on the living room light' | \
"value": "on"
}
],
"speech_confidence": 1,
"slots": {
"state": "on"
}
@@ -545,4 +489,4 @@ $ echo 'would you please turn on the living room light' | \
This works well for our toy example, but will not scale well when there are thousands of voice commands represented in `sentences.ini` or if the words used are significantly different than in the training set ("light" and "lamp" are close enough for `fuzzywuzzy`).
A machine learning-based intent recognizer, like [flair](intent-recognition.md#flair), would be a better choice for open ended speech.
A machine learning-based intent recognizer, like [flair](intent-recognition.md#flair) or [Rasa](intent-recognition.md#rasanlu), would be a better choice for open ended speech.
@@ -0,0 +1,309 @@
# Tutorials
* [RGB Light Example](#rgb-light-example)
* [Client/Server Setup](#clientserver-setup)
## RGB Light Example
Let's say you have an RGB light of some kind in your bedroom that's [hooked up already to Home Assistant](https://www.home-assistant.io/components/light.mqtt). You'd like to be able to say things like "*set the bedroom light to red*" to change its color. To start, let's write a [Home Assistant automation](https://www.home-assistant.io/docs/automation/action/) to help you out:
automation:
# Change the light in the bedroom to red.
trigger:
...
action:
service: light.turn_on
data:
rgb_color: [255, 0, 0]
entity_id: light.bedroom
Now you just need the trigger! Rhasspy will send events that can be caught with the [event trigger platform](https://www.home-assistant.io/docs/automation/trigger/#event-trigger). A different event will be sent for each *intent* that you define, with slot values corresponding to important parts of the command (like light name and color). Let's start by defining an intent in Rhasspy called `ChangeLightState` that can be said a few different ways:
[ChangeLightState]
colors = (red | green | blue) {color}
set [the] (bedroom){name} [to] <colors>
This is a [simplified JSGF grammar](training.md#sentencesini) that will generate the following sentences:
* set the bedroom to red
* set the bedroom to green
* set the bedroom to blue
* set the bedroom red
* set the bedroom green
* set the bedroom blue
* set bedroom to red
* set bedroom to green
* set bedroom to blue
* set bedroom red
* set bedroom green
* set bedroom blue
Rhasspy uses these sentences to create an [ARPA language model](https://cmusphinx.github.io/wiki/arpaformat/) for speech recognition, and also train an intent recognizer that can extract relevant parts of the command. The `{color}` tag in the `colors` rule will make Rhasspy put a `color` property in each event with the name of the recognized color (red, green, or blue). Likewise, the `{name}` tag on `bedroom` will add a `name` property to the event.
If trained on these sentences, Rhasspy will now recognize commands like "*set the bedroom light to red*" and send a `rhasspy_ChangeLightState` event to Home Assistant with the following data:
{
"name": "bedroom",
"color": "red"
}
You can now fill in the rest of the Home Assistant automation:
automation:
# Change the light in the bedroom to red.
trigger:
platform: event
event_type: rhasspy_ChangeLightState
event_data:
name: bedroom
color: red
action:
service: light.turn_on
data:
rgb_color: [255, 0, 0]
entity_id: light.bedroom
This will handle the specific case of setting the bedroom light to red, but not any other color. You can either add additional automations to handle these, or make use of [automation templating](https://www.home-assistant.io/docs/automation/templating/) to do it all at once, as in the [Home Assistant Template Example](#home-assistant-template-example).
### Home Assistant Template Example
Using the following additions, you can get Home Assistant to respond to turning on / off *ANY* light in your setup.
#### Slots
Add the following JSON to the Slots tab in your Rhasspy web interface:
```json
{
"lights": [
"(living room wall):light.bulb_3",
"(living room desk):switch.m4",
"(living room floor):switch.sonoff",
"(bar lights):switch.maxcio1",
"(entry wall):light.bulb_4",
"(guest wall):light.bulb_6",
"(guest floor):switch.m5",
"(bedroom wall):light.bulb_5",
"(bedroom desk):light.bulb_1",
"(bedroom floor):light.bulb_2"
]
}
```
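Each slot value above pairs the spoken words with the entity id that should be output instead. A small Python sketch of how such a value splits apart (the regular expression and the `parse_slot_value` helper are hypothetical and cover only this simple `(spoken):output` form):

```python
import re

# Matches "(spoken words):substitution", the form used in the slots JSON above.
SLOT_VALUE = re.compile(r"^\((?P<spoken>[^)]+)\):(?P<output>\S+)$")


def parse_slot_value(value):
    """Split a '(spoken):output' slot value into its two parts."""
    match = SLOT_VALUE.match(value)
    if match is None:
        # Plain values are spoken and output as-is
        return value, value
    return match.group("spoken"), match.group("output")


print(parse_slot_value("(living room wall):light.bulb_3"))
print(parse_slot_value("(bar lights):switch.maxcio1"))
```

So saying "living room wall" yields `light.bulb_3` as the slot value that reaches Home Assistant.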
#### Sentences
A simple sentence to turn any of the lights in the slots file on or off.
Note the use of the `<state>` rule and the slot `$lights`
```
[ChangeLightState]
state = (on | off) {light_state}
turn [the] ($lights) {light_name} <state>
```
#### Home Assistant
In your Home Assistant `automations.yaml` file, use a `data_template` to get the Rhasspy event data with `trigger.event.data.<your property name>` and then pass those along to a script:
```yaml
- id: '1577164768008'
  alias: Rhasspy Light States
  description: Voice Control on/off states for all lights
  trigger:
  - event_data: {}
    event_type: rhasspy_ChangeLightState
    platform: event
  condition: []
  action:
  - alias: ''
    data_template:
      light_name: "{{ trigger.event.data.light_name }}"
      light_state: "{{ trigger.event.data.light_state }}"
    service: script.rhasspy_light_state
```
In `scripts.yaml`, the `service_template` casts the `light_state` into a string and checks to see if you said 'on' or 'off'. The homeassistant-service can toggle both lights and switches, which is helpful if you have a combination of "light" types:
```yaml
rhasspy_light_state:
  alias: change_light_state
  fields:
    light_name:
      description: "Light Entity"
      example: light.bulb_1
    light_state:
      description: "State to change the light to"
      example: on
  sequence:
    - service_template: >
        {% set this_state = light_state | string %}
        {% if this_state == 'on' %}
          homeassistant.turn_on
        {%else %}
          homeassistant.turn_off
        {% endif %}
      data_template:
        entity_id: "{{ light_name }}"
```
## Client/Server Setup
Contributed by [jaburges](https://community.home-assistant.io/u/jaburges)
* Hardware used:
* Raspberry Pi 3B w/ 8GB SD card
* [Seeed 4 Mic Array](https://www.amazon.com/seeed-Studio-ReSpeaker-4-Mic-Raspberry/dp/B076SSR1W1)
* Software used:
* [Raspbian Buster Lite](https://downloads.raspberrypi.org/raspbian_lite_latest)
* [Etcher](https://www.balena.io/etcher/)
* Docker ([install Docker](installation.md#docker))
### Server Steps
1. Assuming you already have Docker running, create a directory for Rhasspy and a subdirectory called `profiles`.
2. Pull and Run docker image:
docker run -p 12101:12101 \
--restart unless-stopped \
--name rhasspy \
-v "/<PATH_TO>/rhasspy/profiles:/profiles" \
synesthesiam/rhasspy-server:latest \
--user-profiles /profiles \
--profile en
3. Go to server URL `http://<Server_IP>:12101` (you may be asked to download files)
4. Go to settings and check configuration (and save along the way):
[Rhasspy]
Listen for wake word on Startup = UNchecked
[Intent Handling]
Do not handle intent on this device
#There is no harm in having the Server handle Intents, but the Client must handle Intents
[Wake Word]
No Wake word on this device
[Voice Detection]
No voice communication on this device
[Speech Recognition]
Do Speech recognition with pocketsphinx
[Intent Recognition]
Do intent recognition with fuzzywuzzy
[Text to Speech]
No Text to speech on this device
[Audio Recording]
No recording on this device
[Audio Playing]
No Playback on this device
5. Check Slots, and Sentences tabs and make sure to hit `Train` and then `Restart`
### Client Steps
1. Flash an 8GB MicroSD card with [Buster](https://downloads.raspberrypi.org/raspbian_lite_latest) using [Etcher](https://www.balena.io/etcher/).
2. Remove and re-insert MicroSD card and add files to the root directory (for headless setup - meaning no screen needed). You only need `wpa_supplicant` if you plan to use WiFi.
* a file simply called `ssh`
* `wpa_supplicant.conf` ([example here](https://pastebin.com/cDhyhQLs))
3. Insert the MicroSD card in the Pi, use a proper Power Supply and check your router for the IP address it gets.
4. SSH into the Pi using that IP address (I use [Putty](https://the.earth.li/~sgtatham/putty/latest/w64/putty-64bit-0.73-installer.msi)) using pi default user/pass = pi/raspberry.
You are going to want to change that in the future!
5. Install git:
sudo apt install git
6. Install Seeed mic array based on info [here](https://github.com/respeaker/seeed-voicecard)
git clone https://github.com/respeaker/seeed-voicecard
cd seeed-voicecard
sudo ./install.sh
sudo reboot
7. Plug in the Seeed mic array and check that the install was successful against the expected result:
arecord -L
8. Install docker:
curl -sSL https://get.docker.com | sh
9. Modify user permissions to access docker without using `sudo` all the time ;)
sudo usermod -a -G docker pi
10. Close SSH, and relaunch SSH connection to use new permissions.
11. Create directories for Rhasspy Docker image to use:
cd /home/pi
mkdir rhasspy
cd rhasspy
mkdir profiles
12. Pull and run docker image:
docker run -p 12101:12101 \
--restart unless-stopped \
--name rhasspy \
-v "/home/pi/rhasspy/profiles:/profiles" \
--device /dev/snd:/dev/snd \
synesthesiam/rhasspy-server:latest \
--user-profiles /profiles \
--profile en
13. Go to Client URL `http://<Pi_IP_address>:12101` (you will be asked to download some files)
(At time of writing I put Wakeword, voice detection and recognition on the client)
14. Under settings ensure the following is selected, Save along the way. You will need to Train once also.
[Rhasspy]
Listen for wake word on Startup = checked
[Home Assistant]
Enable Intent Handling on this device
#Do not use Home Assistant if using Node-Red
[Wake Word]
Use snowboy (this should trigger a download of more files)
[Voice Detection]
Use webrtcvad and listen for silence
[Speech Recognition]
Use Remote Rhasspy server for speech recognition:
URL = http://<SERVER_IP>:12101/api/speech-to-text
[Intent Recognition]
Use Remote Rhasspy server for intent recognition:
URL = http://<SERVER_IP>:12101/api/text-to-intent
[Text to Speech]
No Text to speech on this device
[Audio Recording]
Use PyAudio (default)
Input Device = seeed-4mic-voicecard (you can test this if you want)
[Audio Playing]
No Playback on this device
### Node-Red Config
1. Import [this flow](https://github.com/synesthesiam/rhasspy/blob/cda3a02775865d49b52d32a3af7264b7cbd69472/examples/nodered/time-light-flow.js) from the Rhasspy examples
2. Attach a debug node to the websocket in and configure it to show full msg object.
3. I edited light text node to take this:
{
"domain": "light",
"service": "turn_{{slots.state}}",
"entity_id": "{{slots.name}}"
}
4. Add a call service node after the light text and leave it blank. Deploy and Enjoy offline voice assistant.
Pick a light (that is a light domain not a switch), and say "Snowboy, turn bedroom light off" :)
@@ -1,11 +1,31 @@
# Usage
You can interact with Rhasspy in different ways besides just your voice. Rhasspy includes a [web interface](#web-interface), typically hosted on port 12101. There is also an [HTTP API](#http-api) that lets you programmatically manipulate Rhasspy from external programs or services. A [command-line interface](#command-line) is available as well to allow for Rhasspy to be easily included in shell scripts. Lastly, Rhasspy subscribes and publishes to specific [MQTT topics](#mqtt) in accordance with (a portion of) the [Hermes protocol](https://docs.snips.ai/ressources/hermes-protocol).
You can interact with Rhasspy in more ways than your voice:
* [Web Interface](#web-interface)
* [Home Assistant](#home-assistant)
* [Node-RED with Websockets](#node-red)
* [MQTT and Snips](#mqtt-and-snips)
* [HTTP API](#http-api)
* [Command Line](#command-line)
## Web Interface
A browser-based interface for Rhasspy is available on port 12101 by default ([http://localhost:12101](http://localhost:12101) if running locally). From this interface, you can test voice commands, add new voice commands, re-train, and edit your profile.
### Top Bar
The top bar of the web interface lets you perform some global actions on Rhasspy, regardless of which tab you have selected.
![Web interface top bar](img/web-top.png)
* Click the Rhasspy logo to reload the page
* Click the version number to test the [HTTP API](#http-api)
* The green `Train` button will re-train your profile
* Use the `Clear Cache` drop down to train from scratch
* The yellow `Wake` button will wake Rhasspy up and start listening for a voice command
* The red `Restart` button forces Rhasspy to restart
### Speech Tab
Test voice and text commands.
@@ -14,17 +34,28 @@ Test voice and text commands.
* Record a voice command with `Hold to Record` or `Tap to Record`
* Upload a WAV file with a voice command
* Enter a text command and execute it
* Enter a text command and either execute it (`Get Intent`) or `Speak` the sentence
* Uncheck `Send to Home Assistant` if you **don't** want Rhasspy to send events to Home Assistant
### Sentences Tab
Add new voice commands to Rhasspy.
Add new voice commands to Rhasspy using the [template syntax](training.md#sentencesini).
![Web interface sentences tab](img/web-sentences.png)
See documentation on [sentences.ini](training.md#sentencesini) for more information.
Make sure to re-train after saving!
* Edits `sentences.ini` by default
* Use the `Add File` button to create additional sentence template files
* These should be prefixed by the `sentences_dir` in your [profile](profiles.md). For example, `intents/more-commands.ini`
* The drop down can be used to switch editing between different template files
### Slots Tab
Edit your [slots lists](training.md#slots-lists) as JSON (keys = slot names, values = lists of slot values).
![Web interface slots tab](img/web-slots.png)
* New slot values will overwrite previous ones
* Delete a slot by providing an empty list for its JSON key
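The two rules above can be modeled in a few lines of Python. This is only a sketch of the described behavior, not Rhasspy's own code; the helper name `merge_slots` and the sample slot names are hypothetical.

```python
import json


def merge_slots(existing, update):
    """Return a new slots dict after applying an update.

    Models the behavior described above: values given for a slot name
    replace any previous values, and an empty list removes the slot.
    """
    merged = dict(existing)
    for name, values in update.items():
        if values:
            merged[name] = list(values)
        else:
            merged.pop(name, None)
    return merged


current = {"color": ["red", "green"], "room": ["bedroom"]}
update = {"color": ["blue"], "room": []}
result = merge_slots(current, update)
print(json.dumps(result))
```

Here `color` is overwritten with the new list and `room` is deleted because its list is empty; `current` itself is left unchanged.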
### Words Tab
@@ -57,83 +88,11 @@ Direct interface for editing your [profile](profiles.md).
![Web interface advanced tab](img/web-advanced.png)
## HTTP API
### Log Tab
Rhasspy features a comprehensive HTTP API available at `/api`, documented with [OpenAPI 3](https://github.com/OAI/OpenAPI-Specification) (Swagger). Some notable endpoints are:
Streams Rhasspy's log output over a websocket.
* `/api/profile`
* GET the JSON for your profile, or POST to overwrite it
* `/api/listen-for-command`
* POST to wake Rhasspy up and start listening for a voice command
* `/api/start-recording`
* POST to have Rhasspy start recording a voice command
* `/api/stop-recording`
* POST to have Rhasspy stop recording and process recorded data as a voice command
* `/api/train`
* POST to re-train your profile
* `/api/speech-to-intent`
* POST a WAV file and have Rhasspy process it as a voice command
* `/api/text-to-intent`
* POST text and have Rhasspy process it as command
* `/api/text-to-speech`
* POST text and have Rhasspy speak it
* `/api/slots`
* POST JSON to update [slot values](training.md#slots-lists)
See `public/swagger.yaml` in Rhasspy's repository for all available endpoints, or visit `/api` on your Rhasspy web server (e.g., [http://localhost:12101/api](http://localhost:12101/api)).
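As a rough illustration of calling one of these endpoints from Python, the sketch below builds a POST request for `/api/text-to-intent` using only the standard library. The `localhost:12101` address, the `text/plain` content type, and the helper name `build_text_to_intent_request` are assumptions made for the example; check `/api` on your own server for the exact request format.

```python
import urllib.request


def build_text_to_intent_request(text, base_url="http://localhost:12101"):
    """Build (but do not send) a POST request for /api/text-to-intent."""
    url = base_url.rstrip("/") + "/api/text-to-intent"
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        # Assumed content type for a plain text command
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


request = build_text_to_intent_request("turn on the living room lamp")
print(request.get_method(), request.full_url)

# To send it to a running Rhasspy server (assuming a JSON response):
#   import json
#   with urllib.request.urlopen(request) as response:
#       intent = json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the URL and body before anything goes over the network.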
## Secure Hosting with HTTPS
If you need to access Rhasspy's web interface/API through HTTPS (formerly SSL), you can provide a certificate and key file via command-line parameters or the Hass.io configuration.
If you're running Rhasspy via Docker or in a virtual environment, add `--ssl <CERT_FILE> <KEY_FILE>` to the command-line arguments where `<CERT_FILE>` is your SSL certificate and `<KEY_FILE>` is your SSL key file.
You can generate a self-signed certificate with the following command:
openssl req -x509 -newkey rsa:4096 -nodes -out cert.pem -keyout key.pem -days 365
After answering the series of questions, you should have `cert.pem` and `key.pem` in your current directory. Then run Rhasspy with:
<RHASSPY COMMAND> --ssl cert.pem key.pem
The web interface will now be available at [https://localhost:12101](https://localhost:12101) and the web socket events at `wss://localhost:12101/api/events/intent`
In Hass.io, you will need to set the following options via the web interface or in your JSON configuration:
* `ssl`: `true`
* `certfile`: `cert.pem`
* `keyfile`: `key.pem`
## WebSocket Events
Whenever a voice command is recognized, Rhasspy emits JSON events over a websocket connection available at `ws://rhasspy:12101/api/events/intent` (replace `ws://` with `wss://` if you're using [secure hosting](usage.md#secure-hosting-with-https)).
You can listen to these events in a [Node-RED](https://nodered.org) flow, and easily add offline, private voice commands to your home automation setup!
For the `ChangeLightColor` intent from the [RGB Light Example](index.md#rgb-light-example), Rhasspy will emit a JSON event like this over the websocket:
```json
{
"text": "set the bedroom light to red",
"intent": {
"name": "ChangeLightColor",
"confidence": 1
},
"entities": [
{
"entity": "name",
"value": "bedroom"
},
{
"entity": "color",
"value": "red"
}
],
"slots": {
"name": "bedroom",
"color": "red"
}
}
```
![Web interface log tab](img/web-log.png)
## Home Assistant
@@ -164,6 +123,13 @@ automation:
You've now added offline, private voice commands to your Home Assistant. Happy automating!
### Getting the Spoken Text
The Home Assistant event will contain two extra slots besides the ones you specify:
* `_text` - spoken voice command text with [substitutions](training.md#substitutions)
* `_raw_text` - literal transcription of voice command
## Node-RED
Rhasspy can interact directly with [Node-RED](https://nodered.org) through [websockets](usage.md#websocket-events).
@@ -174,23 +140,90 @@ Make sure to also set send/receive to "entire message".
More example flows are available [on Github](https://github.com/synesthesiam/rhasspy/tree/master/examples/nodered).
### WebSocket Events
Whenever a voice command is recognized, Rhasspy emits JSON events over a websocket connection available at `ws://rhasspy:12101/api/events/intent` (replace `ws://` with `wss://` if you're using [secure hosting](usage.md#secure-hosting-with-https)).
You can listen to these events in a [Node-RED](https://nodered.org) flow, and easily add offline, private voice commands to your home automation setup!
For the `ChangeLightColor` intent from the [RGB Light Example](index.md#rgb-light-example), Rhasspy will emit a JSON event like this over the websocket:
```json
{
"text": "set the bedroom light to red",
"intent": {
"name": "ChangeLightColor",
"confidence": 1
},
"entities": [
{
"entity": "name",
"value": "bedroom"
},
{
"entity": "color",
"value": "red"
}
],
"slots": {
"name": "bedroom",
"color": "red"
}
}
```
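A consumer of these events mostly needs the intent name and the slot values. The sketch below parses the sample event shown above with Python's standard `json` module; the function name `summarize_event` is hypothetical, and receiving the message over an actual websocket connection is left out.

```python
import json

# Same event as in the example above, as it would arrive in one message
EVENT_JSON = """
{
  "text": "set the bedroom light to red",
  "intent": {"name": "ChangeLightColor", "confidence": 1},
  "entities": [
    {"entity": "name", "value": "bedroom"},
    {"entity": "color", "value": "red"}
  ],
  "slots": {"name": "bedroom", "color": "red"}
}
"""


def summarize_event(message):
    """Extract the intent name and slot values from one event message."""
    event = json.loads(message)
    intent_name = event["intent"]["name"]
    slots = event.get("slots", {})
    return intent_name, slots


intent_name, slots = summarize_event(EVENT_JSON)
print(intent_name, slots["name"], slots["color"])
```

The `slots` object is a flattened view of `entities`, so it is usually the most convenient field to branch on.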
## MQTT and Snips
Rhasspy is able to interoperate with Snips.AI services using the [Hermes protocol](https://docs.snips.ai/reference/hermes) over [MQTT](http://mqtt.org). The following components are Snips/Hermes compatible:
* [Microphone input](audio-input.md#mqtthermes)
* [Wake word](wake-word.md#mqtthermes)
* [Speech to text](speech-to-text.md#mqtthermes)
* [Intent recognition](intent-recognition.md#mqtthermes)
* [Audio output](audio-output.md#mqtthermes)
## HTTP API
Rhasspy features a comprehensive HTTP API available at `/api/`, documented with [OpenAPI 3](https://github.com/OAI/OpenAPI-Specification) (Swagger). See the [HTTP API reference](reference.md#http-api) for more details.
### Secure Hosting with HTTPS
If you need to access Rhasspy's web interface/API through HTTPS (formerly SSL), you can provide a certificate and key file via command-line parameters or the Hass.io configuration.
If you're running Rhasspy via Docker or in a virtual environment, add `--ssl <CERT_FILE> <KEY_FILE>` to the command-line arguments where `<CERT_FILE>` is your SSL certificate and `<KEY_FILE>` is your SSL key file.
You can generate a self-signed certificate with the following command:
openssl req -x509 -newkey rsa:4096 -nodes -out cert.pem -keyout key.pem -days 365
After answering the series of questions, you should have `cert.pem` and `key.pem` in your current directory. Then run Rhasspy with:
<RHASSPY COMMAND> --ssl cert.pem key.pem
The web interface will now be available at [https://localhost:12101](https://localhost:12101) and the web socket events at `wss://localhost:12101/api/events/intent`
In Hass.io, you will need to set the following options via the web interface or in your JSON configuration:
* `ssl`: `true`
* `certfile`: `cert.pem`
* `keyfile`: `key.pem`
## Command Line
You can access portions of Rhasspy's functionality without running a web server through the command-line interface.
The `rhasspy` Python module runs this interface in its `__main__`, so it's accessible from Rhasspy's source code directory by running:
python3 -m rhasspy <COMMAND> <ARGUMENTS>
This will only work inside a properly set up [virtual environment](installation.md#virtual-environment), however.
If you run Rhasspy through [Docker](installation.md#docker), the [rhasspy-cli](https://github.com/synesthesiam/rhasspy/blob/master/bin/rhasspy-cli) script should be used instead:
wget https://raw.githubusercontent.com/synesthesiam/rhasspy/master/bin/rhasspy-cli
chmod +x rhasspy-cli
./rhasspy-cli --help
Put this script in your `~/bin` directory so that you can refer to it as `rhasspy-cli` from any directory.
By default, it will look for profiles in `$XDG_CONFIG_HOME/rhasspy/profiles`, which is probably `~/.config/rhasspy/profiles` (see [XDG specification](https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html) for more information).
**Beware**: the `rhasspy-cli` script runs under your user account and grants Rhasspy **write access to your home directory**.
This is needed to save files during the training process, and to avoid those files being owned by `root`.
The [rhasspy-cli-ro](https://github.com/synesthesiam/rhasspy/blob/master/bin/rhasspy-cli-ro) script can be used for read only operations, such as speech to text or intent handling, but cannot make any changes to your file system.
@@ -200,240 +233,13 @@ The [rhasspy-cli-ro](https://github.com/synesthesiam/rhasspy/blob/master/bin/rha
The `rhasspy-cli` script takes a command and a set of arguments:
rhasspy-cli --profile <PROFILE_NAME> <COMMAND> <ARGUMENTS>
Adding `--debug` before the command will print additional information to the console:
rhasspy-cli --debug --profile <PROFILE_NAME> <COMMAND> <ARGUMENTS>
You can override profile settings with `--set` like this:
rhasspy-cli --profile <PROFILE_NAME> --set <SETTING_NAME> <SETTING_VALUE> ... <COMMAND> <ARGUMENTS>
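When calling the script from another program, it can help to assemble the arguments in the same order as the usage lines above. The following is a minimal sketch; the helper `build_cli_command` is hypothetical, and it assumes `rhasspy-cli` is on your `PATH`.

```python
import shlex


def build_cli_command(profile, command, arguments=(), debug=False, settings=None):
    """Assemble an argument list following the usage lines above."""
    argv = ["rhasspy-cli"]
    if debug:
        # --debug goes before the command
        argv.append("--debug")
    argv += ["--profile", profile]
    # Each override becomes: --set <SETTING_NAME> <SETTING_VALUE>
    for name, value in (settings or {}).items():
        argv += ["--set", name, value]
    argv.append(command)
    argv += list(arguments)
    return argv


argv = build_cli_command(
    "en",
    "text2speech",
    ["We ride at dawn!"],
    settings={"text_to_speech.system": "flite"},
)
print(shlex.join(argv))
```

The resulting list can be passed directly to `subprocess.run`, which avoids shell quoting problems with sentences that contain spaces or punctuation.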
### Available Commands
For `rhasspy-cli --profile <PROFILE_NAME> <COMMAND> <ARGUMENTS>`, `<COMMAND>` can be:
* `info`
* Print profile JSON to standard out
* Add `--defaults` to only print settings from `defaults.json`
* `wav2text`
* Convert WAV file(s) to text
* `wav2intent`
* Convert WAV file(s) to intent JSON
* Add `--handle` to have Rhasspy send events to Home Assistant
* `text2intent`
* Convert text command(s) to intent JSON
* Add `--handle` to have Rhasspy send events to Home Assistant
* `train`
* Re-train your profile
* `mic2wav`
* Listen for a voice command and output WAV data
* Add `--timeout <SECONDS>` to stop recording after some number of seconds
* `mic2text`
* Listen for a voice command and convert it to text
* Add `--timeout <SECONDS>` to stop recording after some number of seconds
* `mic2intent`
* Listen for a voice command and output intent JSON
* Add `--handle` to have Rhasspy send events to Home Assistant
* Add `--timeout <SECONDS>` to stop recording after some number of seconds
* `word2phonemes`
* Print the CMU phonemes for a word (possibly unknown)
* Add `-n <COUNT>` to control the maximum number of guessed pronunciations
* `word2wav`
* Pronounce a word (possibly unknown) and output WAV data
* `text2speech`
* Speaks one or more sentences using Rhasspy's text to speech system
* `text2wav`
* Converts a single sentence to WAV using Rhasspy's text to speech system
* `sleep`
* Run Rhasspy and wait until wake word is spoken
* `download`
* Download necessary profile files from the internet
### Profile Operations
Print the complete JSON for the English profile with:
rhasspy-cli --profile en info
You can combine this with other commands, such as `jq` to get at specific pieces:
rhasspy-cli --profile en info | jq .wake.pocketsphinx.keyphrase
Output (JSON):
"okay rhasspy"
### Training
Retrain the English profile with:
rhasspy-cli --profile en train
Add `--debug` before `train` for more information.
### Speech to Text/Intent
Convert a WAV file to text from stdin:
rhasspy-cli --profile en wav2text < what-time-is-it.wav
Output (text):
what time is it
Convert multiple WAV files:
rhasspy-cli --profile en wav2text what-time-is-it.wav turn-on-the-living-room-lamp.wav
Output (JSON):
```json
{
"what-time-is-it.wav": "what time is it",
"turn-on-the-living-room-lamp.wav": "turn on the living room lamp"
}
```
Convert multiple WAV files to intents **and** handle them:
rhasspy-cli --profile en wav2intent --handle what-time-is-it.wav turn-on-the-living-room-lamp.wav
Output (JSON):
```json
{
"what-time-is-it.wav": {
"text": "what time is it",
"intent": {
"name": "GetTime",
"confidence": 1.0
},
"entities": []
},
"turn-on-the-living-room-lamp.wav": {
"text": "turn on the living room lamp",
"intent": {
"name": "ChangeLightState",
"confidence": 1.0
},
"entities": [
{
"entity": "state",
"value": "on"
},
{
"entity": "name",
"value": "living room lamp"
}
]
}
}
```
### Text to Intent
Handle a command as if it was spoken:
rhasspy-cli --profile en text2intent --handle "turn off the living room lamp"
Output (JSON):
```json
{
"turn off the living room lamp": {
"text": "turn off the living room lamp",
"intent": {
"name": "ChangeLightState",
"confidence": 1.0
},
"entities": [
{
"entity": "state",
"value": "off"
},
{
"entity": "name",
"value": "living room lamp"
}
]
}
}
```
### Record Your Voice
Save a voice command to a WAV:
rhasspy-cli --profile en mic2wav > my-voice-command.wav
You can listen to it with:
aplay my-voice-command.wav
### Test Your Wake Word
Start Rhasspy and wait for wake word:
rhasspy-cli --profile en sleep
Rhasspy should exit and print the wake word when it's spoken.
### Text to Speech
Have Rhasspy speak one or more sentences:
rhasspy-cli --profile en text2speech "We ride at dawn!"
Use a different text to speech system and voice:
rhasspy-cli --profile en \
--set 'text_to_speech.system' 'flite' \
--set 'text_to_speech.flite.voice' 'slt' \
text2speech "We ride at dawn!"
### Pronounce Words
Speak words Rhasspy doesn't know!
rhasspy-cli --profile en word2wav raxacoricofallapatorius | aplay
### Text to Speech to Text to Intent
Use the miracle of Unix pipes to have Rhasspy interpret voice commands from itself:
rhasspy-cli --profile en \
--set 'text_to_speech.system' 'picotts' \
text2wav "turn on the living room lamp" | \
rhasspy-cli --profile en wav2text | \
rhasspy-cli --profile en text2intent
Output (JSON):
```json
{
"turn on the living room lamp": {
"text": "turn on the living room lamp",
"intent": {
"name": "ChangeLightState",
"confidence": 1.0
},
"entities": [
{
"entity": "state",
"value": "on"
},
{
"entity": "name",
"value": "living room lamp"
}
],
"speech_confidence": 1,
"slots": {
"state": "on",
"name": "living room lamp"
}
}
}
```
See the [command-line reference](reference.md#command-line) for available commands.
+45 -14
@@ -34,16 +34,16 @@ Add to your [profile](profiles.md):
"listen_on_start": true
}
```
There are a lot of [keyword files](https://github.com/Picovoice/Porcupine/tree/master/resources/keyword_files) available for download. Use the `linux` platform if you're on desktop/laptop (`amd64`) and the `raspberrypi` platform if you're using a Raspberry Pi (`armhf`/`aarch64`). The `.ppn` files should go in the `porcupine` directory inside your profile (referenced by `keyword_path`).
If you want to create a custom wake word, you will need to run the [Porcupine Optimizer](https://github.com/Picovoice/Porcupine/tree/master/tools/optimizer). **NOTE**: the generated keyword file is only valid for 30 days, though you can always just re-run the optimizer.
If you want to create a custom wake word, you will need to use the [Picovoice Console](https://github.com/Picovoice/porcupine#picovoice-console). **NOTE**: the generated keyword file is only valid for 30 days, though you can always just generate a new one.
See `rhasspy.wake.PorcupineWakeListener` for details.
## Snowboy
Listens for a wake word with [snowboy](https://snowboy.kitt.ai). This system has good performance out of the box, but requires an online service to train.
Listens for one or more wake words with [snowboy](https://snowboy.kitt.ai). This system has good performance out of the box, but requires an online service to train.
Add to your [profile](profiles.md):
@@ -54,10 +54,10 @@ Add to your [profile](profiles.md):
"wakeword_id": "default"
},
"snowboy": {
"model": "model-name-in-profile.(u|p)mdl",
"model": "snowboy/snowboy.umdl",
"audio_gain": 1,
"sensitivity": 0.5,
"chunk_size": 960
"sensitivity": "0.5",
"apply_frontend": false
}
},
@@ -65,10 +65,41 @@ Add to your [profile](profiles.md):
"listen_on_start": true
}
```
If your hotword model has multiple embedded hotwords (such as `jarvis.umdl`), the "sensitivity" parameter should contain sensitivities for each embedded hotword separated by commas (e.g., "0.5,0.5").
Visit [the snowboy website](https://snowboy.kitt.ai) to train your own wake word model (requires linking to a GitHub/Google/Facebook account). This *personal* model will end with `.pmdl`, and should go in your profile directory. Then, set `wake.snowboy.model` to the name of that file.
You also have the option of using a pre-trained *universal* model (`.umdl`) from [Kitt.AI](https://github.com/Kitt-AI/snowboy/tree/master/resources/models). I've received errors using anything but `snowboy.umdl`, but YMMV.
You also have the option of using a pre-trained *universal* model (`.umdl`) from [Kitt.AI](https://github.com/Kitt-AI/snowboy/tree/master/resources/models).
### Multiple Wake Words
You can have `snowboy` listen for multiple wake words with different models, each with their own settings. You will need to download each model file to the `snowboy` directory in your profile.
For example, to use both the `snowboy.umdl` and `jarvis.umdl` models, add this to your profile:
```json
"wake": {
"system": "snowboy",
"snowboy": {
"model": "snowboy/snowboy.umdl,snowboy/jarvis.umdl",
"model_settings": {
"snowboy/snowboy.umdl": {
"sensitivity": "0.5",
"audio_gain": 1,
"apply_frontend": false
},
"snowboy/jarvis.umdl": {
"sensitivity": "0.5,0.5",
"audio_gain": 1,
"apply_frontend": false
}
}
}
}
```
Make sure to include all models you want in the `model` setting (separated by commas). Each model may have different settings in `model_settings`. If a setting is not present, the default values under `snowboy` will be used.
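The fallback rule can be pictured as a dictionary merge per model. The sketch below is an illustration of that rule only, not the code in `rhasspy.wake.SnowboyWakeListener`; the function name `resolve_model_settings` and the choice of which keys count as defaults are assumptions.

```python
# Settings that may appear directly under "snowboy" as defaults (assumed set)
DEFAULT_KEYS = ("sensitivity", "audio_gain", "apply_frontend")


def resolve_model_settings(snowboy):
    """Map each model path to its effective settings."""
    resolved = {}
    per_model = snowboy.get("model_settings", {})
    for model in snowboy["model"].split(","):
        model = model.strip()
        defaults = {key: snowboy[key] for key in DEFAULT_KEYS if key in snowboy}
        # Per-model values win over the defaults under "snowboy"
        resolved[model] = {**defaults, **per_model.get(model, {})}
    return resolved


snowboy = {
    "model": "snowboy/snowboy.umdl,snowboy/jarvis.umdl",
    "sensitivity": "0.5",
    "audio_gain": 1,
    "apply_frontend": False,
    "model_settings": {
        "snowboy/jarvis.umdl": {"sensitivity": "0.5,0.5"},
    },
}

for model, settings in resolve_model_settings(snowboy).items():
    print(model, settings)
```

In this example only `jarvis.umdl` overrides `sensitivity`; both models inherit `audio_gain` and `apply_frontend` from the top-level `snowboy` section.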
See `rhasspy.wake.SnowboyWakeListener` for details.
@@ -92,7 +123,7 @@ Add to your [profile](profiles.md):
"listen_on_start": true
}
```
Set `wake.pocketsphinx.keyphrase` to whatever you like, though 3-4 syllables is recommended. Make sure to [train](training.md) and restart Rhasspy whenever you change the keyphrase.
The `wake.pocketsphinx.threshold` should be in the range 1e-50 to 1e-5. The smaller the number, the less likely the keyphrase is to be observed. At least one person has written a script to [automatically tune the threshold](https://medium.com/@PankajB96/automatic-tuning-of-keyword-spotting-thresholds-a27256869d31).
@@ -120,14 +151,14 @@ Add to your [profile](profiles.md):
"listen_on_start": true
}
```
Follow [the instructions from Mycroft AI](https://github.com/MycroftAI/mycroft-precise/wiki/Training-your-own-wake-word#how-to-train-your-own-wake-word) to train your own wake word model. When you're finished, place **both** the `.pb` and `.pb.params` files in your profile directory, and set `wake.precise.model` to the name of the `.pb` file.
See `rhasspy.wake.PreciseWakeListener` for details.
## MQTT/Hermes
Subscribes to the `hermes/hotword/<WAKEWORD_ID>/detected` topic, and wakes Rhasspy up when a message is received ([Hermes protocol](https://docs.snips.ai/ressources/hermes-protocol)). This allows Rhasspy to use the wake word functionality in [Snips.AI](https://snips.ai/).
Subscribes to the `hermes/hotword/<WAKEWORD_ID>/detected` topic, and wakes Rhasspy up when a message is received ([Hermes protocol](https://docs.snips.ai/reference/hermes)). This allows Rhasspy to use the wake word functionality in [Snips.AI](https://snips.ai/).
Add to your [profile](profiles.md):
@@ -153,7 +184,7 @@ Add to your [profile](profiles.md):
"site_id": "default"
}
```
Adjust the `mqtt` configuration to connect to your MQTT broker.
Set `mqtt.site_id` to match your Snips.AI siteId and `wake.hermes.wakeword_id` to match your Snips.AI wakewordId.
@@ -178,7 +209,7 @@ Add to your [profile](profiles.md):
"listen_on_start": true
}
```
When Rhasspy starts, your program will be called with the given arguments. Once your program detects the wake word, it should print it to standard out and exit. Rhasspy will call your program again when it goes back to sleep. If the empty string is printed, Rhasspy will **not** wake up and your program will be called again.
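The output contract described above (print the wake word and exit, or print an empty string to stay asleep) can be sketched in Python as follows. Matching against text lines is only a stand-in for real audio-based detection, and `wait_for_wake_word` is a hypothetical helper, not part of Rhasspy.

```python
def wait_for_wake_word(lines, wake_word):
    """Return the wake word once a line matches it, or "" if input ends.

    Text lines are a placeholder for whatever detection your program does.
    """
    for line in lines:
        if line.strip().lower() == wake_word:
            return wake_word
    # An empty string tells Rhasspy not to wake up
    return ""


detected = wait_for_wake_word(["what time is it", "Okay Rhasspy"], "okay rhasspy")
# Print the result to standard out; a real program would exit afterwards
print(detected)
```

A real program would replace the list of lines with its own detection loop and exit right after printing.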
The following environment variables are available to your program:
+10 -12
@@ -13,7 +13,7 @@ DEFINE_boolean 'precise' true 'Install Mycroft Precise'
DEFINE_boolean 'kaldi' true 'Install Kaldi'
DEFINE_boolean 'offline' false "Don't download anything"
DEFINE_boolean 'all-cpu' false 'Download dependencies for all CPU architectures'
DEFINE_string 'cpu-arch' "${cpu_arch}" 'CPU architecture (x86_64, armv7l, arm64v8)'
DEFINE_string 'cpu-arch' "${cpu_arch}" 'CPU architecture (x86_64, armv7l, arm64v8, armv6l)'
FLAGS "$@" || exit $?
eval set -- "${FLAGS_ARGV}"
@@ -47,14 +47,14 @@ fi
# -----------------------------------------------------------------------------
function maybe_download {
if [[ ! -f "$2" ]]; then
if [[ ! -z "${offline}" ]]; then
if [[ ! -s "$2" ]]; then
if [[ -n "${offline}" ]]; then
echo "Need to download $1 but offline."
exit 1
fi
mkdir -p "$(dirname "$2")"
curl -sSfL -o "$2" "$1"
curl -sSfL -o "$2" "$1" || { echo "Can't download $1"; exit 1; }
echo "$1 => $2"
fi
}
@@ -65,9 +65,10 @@ declare -A CPU_TO_FRIENDLY
CPU_TO_FRIENDLY["x86_64"]="amd64"
CPU_TO_FRIENDLY["armv7l"]="armhf"
CPU_TO_FRIENDLY["arm64v8"]="aarch64"
CPU_TO_FRIENDLY["armv6l"]="armv6l"
# CPU architecture
if [[ ! -z "${all_cpu}" ]]; then
if [[ -n "${all_cpu}" ]]; then
CPU_ARCHS=("x86_64" "armv7l" "arm64v8")
FRIENDLY_ARCHS=("amd64" "armhf" "aarch64")
else
@@ -79,10 +80,9 @@ fi
# Rhasspy
# -----------------------------------------------------------------------------
for FRIENDLY_ARCH in "${FRIENDLY_ARCHS[@]}";
do
for FRIENDLY_ARCH in "${FRIENDLY_ARCHS[@]}"; do
rhasspy_files=("rhasspy-tools_${FRIENDLY_ARCH}.tar.gz" "rhasspy-web-dist.tar.gz")
for rhasspy_file_name in "${rhasspy_files}"; do
for rhasspy_file_name in "${rhasspy_files[@]}"; do
rhasspy_file="${download_dir}/${rhasspy_file_name}"
rhasspy_file_url="https://github.com/synesthesiam/rhasspy/releases/download/v2.0/${rhasspy_file_name}"
maybe_download "${rhasspy_file_url}" "${rhasspy_file}"
@@ -110,8 +110,7 @@ maybe_download "${snowboy_url}" "${snowboy_file}"
# -----------------------------------------------------------------------------
if [[ -z "${no_precise}" ]]; then
for CPU_ARCH in "${CPU_ARCHS}";
do
for CPU_ARCH in "${CPU_ARCHS[@]}"; do
case $CPU_ARCH in
x86_64|armv7l)
precise_file="${download_dir}/precise-engine_0.3.0_${CPU_ARCH}.tar.gz"
@@ -126,8 +125,7 @@ fi
# -----------------------------------------------------------------------------
if [[ -z "${no_kaldi}" ]]; then
for FRIENDLY_ARCH in "${FRIENDLY_ARCHS}"
do
for FRIENDLY_ARCH in "${FRIENDLY_ARCHS[@]}"; do
# Install pre-built package
kaldi_file="${download_dir}/kaldi_${FRIENDLY_ARCH}.tar.gz"
kaldi_url="https://github.com/synesthesiam/kaldi-docker/releases/download/v1.0/kaldi_${FRIENDLY_ARCH}.tar.gz"
+2 -2
@@ -43,7 +43,7 @@ switch:
command_on: "echo 'Living room lamp ON'"
command_off: "echo 'Living room lamp OFF'"
garage_light:
command_on: "echo 'Garage light ON'"
command_on: "echo 'Garage light ON'"
command_off: "echo 'Garage light OFF'"
# Doors
@@ -53,7 +53,7 @@ binary_sensor:
command: "bash -c 'sec=$(date +%s); [[ $(($sec % 2)) -eq 0 ]] && echo open || echo closed'"
payload_on: "closed"
payload_off: "open"
# Temperature
sensor:
- platform: command_line
+1 -1
@@ -1,5 +1,5 @@
default_view:
view: yes
view: true
entities:
- group.inside
- group.garage
+39
@@ -0,0 +1,39 @@
# ATLAS specific Linux ARM configuration
ifndef DOUBLE_PRECISION
$(error DOUBLE_PRECISION not defined.)
endif
ifndef OPENFSTINC
$(error OPENFSTINC not defined.)
endif
ifndef OPENFSTLIBS
$(error OPENFSTLIBS not defined.)
endif
ifndef ATLASINC
$(error ATLASINC not defined.)
endif
ifndef ATLASLIBS
$(error ATLASLIBS not defined.)
endif
CXXFLAGS = -std=c++11 -I.. -isystem $(OPENFSTINC) -O1 $(EXTRA_CXXFLAGS) \
-Wall -Wno-sign-compare -Wno-unused-local-typedefs \
-Wno-deprecated-declarations -Winit-self \
-DKALDI_DOUBLEPRECISION=$(DOUBLE_PRECISION) \
-DHAVE_EXECINFO_H=1 -DHAVE_CXXABI_H -DHAVE_ATLAS -I$(ATLASINC) \
-ftree-vectorize -pthread \
-g # -O0 -DKALDI_PARANOID
ifeq ($(KALDI_FLAVOR), dynamic)
CXXFLAGS += -fPIC
endif
# Compiler specific flags
COMPILER = $(shell $(CXX) -v 2>&1)
ifeq ($(findstring clang,$(COMPILER)),clang)
# Suppress annoying clang warnings that are perfectly valid per spec.
CXXFLAGS += -Wno-mismatched-tags
endif
LDFLAGS = $(EXTRA_LDFLAGS) $(OPENFSTLDFLAGS) -rdynamic
LDLIBS = $(EXTRA_LDLIBS) $(OPENFSTLIBS) $(ATLASLIBS) -lm -lpthread -ldl
+19
@@ -0,0 +1,19 @@
[Unit]
Description=Rhasspy
After=syslog.target network.target
[Service]
Type=simple
WorkingDirectory=/home/<USER>/path/to/rhasspy
ExecStart=/bin/bash -lc './run-venv.sh --profile <LANGUAGE>'
RestartSec=1
Restart=on-failure
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=rhasspy
[Install]
WantedBy=multi-user.target
@@ -40,7 +40,7 @@ switch:
command_on: "echo 'Living room lamp ON'"
command_off: "echo 'Living room lamp OFF'"
garage_light:
command_on: "echo 'Garage light ON'"
command_on: "echo 'Garage light ON'"
command_off: "echo 'Garage light OFF'"
# Doors
@@ -50,7 +50,7 @@ binary_sensor:
command: "bash -c 'sec=$(date +%s); [[ $(($sec % 2)) -eq 0 ]] && echo open || echo closed'"
payload_on: "closed"
payload_off: "open"
# Temperature
sensor:
- platform: command_line
@@ -75,7 +75,7 @@ switch:
command_on: "echo 'Living room lamp ON'"
command_off: "echo 'Living room lamp OFF'"
garage_light:
command_on: "echo 'Garage light ON'"
command_on: "echo 'Garage light ON'"
command_off: "echo 'Garage light OFF'"
# Doors
@@ -85,7 +85,7 @@ binary_sensor:
command: "bash -c 'sec=$(date +%s); [[ $(($sec % 2)) -eq 0 ]] && echo open || echo closed'"
payload_on: "closed"
payload_off: "open"
# Temperature
sensor:
- platform: command_line
@@ -42,7 +42,7 @@ switch:
command_on: "echo 'Living room lamp ON'"
command_off: "echo 'Living room lamp OFF'"
garage_light:
command_on: "echo 'Garage light ON'"
command_on: "echo 'Garage light ON'"
command_off: "echo 'Garage light OFF'"
# Doors
@@ -52,7 +52,7 @@ binary_sensor:
command: "bash -c 'sec=$(date +%s); [[ $(($sec % 2)) -eq 0 ]] && echo open || echo closed'"
payload_on: "closed"
payload_off: "open"
# Temperature
sensor:
- platform: command_line
+1 -1
@@ -1,7 +1,7 @@
#!/usr/bin/env bash
set -e
if [[ -z "$(which phonetisaurus-train)" ]]; then
if [[ -z "$(command -v phonetisaurus-train)" ]]; then
echo "Phonetisaurus not installed!"
exit 1
fi
+3
@@ -4,6 +4,7 @@ nav:
- Home: index.md
- Hardware: hardware.md
- Installation: installation.md
- Tutorials: tutorials.md
- Usage: usage.md
- Profiles: profiles.md
- Training: training.md
@@ -15,5 +16,7 @@ nav:
- Intent Recognition: intent-recognition.md
- Intent Handling: intent-handling.md
- Text to Speech: text-to-speech.md
- Reference: reference.md
- Development: development.md
- License: license.md
- About: about.md
+21
@@ -58,4 +58,25 @@ ignore_missing_imports = True
ignore_missing_imports = True
[mypy-google.*]
ignore_missing_imports = True
[mypy-networkx.*]
ignore_missing_imports = True
[mypy-num2words.*]
ignore_missing_imports = True
[mypy-doit.*]
ignore_missing_imports = True
[mypy-json5.*]
ignore_missing_imports = True
[mypy-quart.*]
ignore_missing_imports = True
[mypy-quart_cors.*]
ignore_missing_imports = True
[mypy-swagger_ui.*]
ignore_missing_imports = True
+4 -4
@@ -64,16 +64,16 @@ class Porcupine(object):
"""
if not os.path.exists(library_path):
raise IOError("Could not find Porcupine's library at '%s'" % library_path)
raise IOError(f"Could not find Porcupine's library at '{library_path}'")
library = cdll.LoadLibrary(library_path)
if not os.path.exists(model_file_path):
raise IOError("Could not find model file at '%s'" % model_file_path)
raise IOError(f"Could not find model file at '{model_file_path}'")
if sensitivity is not None and keyword_file_path is not None:
if not os.path.exists(keyword_file_path):
raise IOError("Could not find keyword file at '%s'" % keyword_file_path)
raise IOError(f"Could not find keyword file at '{keyword_file_path}'")
keyword_file_paths = [keyword_file_path]
if not (0 <= sensitivity <= 1):
@@ -85,7 +85,7 @@ class Porcupine(object):
for x in keyword_file_paths:
if not os.path.exists(os.path.expanduser(x)):
raise IOError("Could not find keyword file at '%s'" % x)
raise IOError(f"Could not find keyword file at '{x}'")
for x in sensitivities:
if not (0 <= x <= 1):
+1
@@ -1,6 +1,7 @@
{
"language": "ca",
"name": "ca",
"locale": "ca_ES",
"speech_to_text": {
"system": "pocketsphinx",
"dictionary_casing": "lower"
+29
@@ -0,0 +1,29 @@
#!/usr/bin/env python3
import argparse
import sys


def main():
    parser = argparse.ArgumentParser("number")
    parser.add_argument("lower", type=int, help="Lower bound")
    parser.add_argument("upper", type=int, help="Upper bound (inclusive)")
    args, rest_args = parser.parse_known_args()

    lower = args.lower
    upper = args.upper
    step = 1

    if rest_args:
        step = int(rest_args[0])

    if upper < lower:
        lower, upper = upper, lower

    for n in range(lower, upper + 1, step):
        print(n)


# -----------------------------------------------------------------------------

if __name__ == "__main__":
    main()
+7
@@ -0,0 +1,7 @@
dilluns
dimarts
dimecres
dijous
divendres
dissabte
diumenge
+12
@@ -0,0 +1,12 @@
de gener
de febrer
de març
dabril
de maig
de juny
de juliol
dagost
de setembre
doctubre
de novembre
de desembre
+2
View File
@@ -1,12 +1,14 @@
{
"language": "de",
"name": "de",
"locale": "de_DE",
"speech_to_text": {
"system": "pocketsphinx",
"dictionary_casing": "lower",
"kaldi": {
"base_dictionary": "kaldi/base_dictionary.txt",
"base_language_model": "kaldi/base_language_model.txt",
"base_language_model_fst": "kaldi/base_language_model.fst",
"compatible": true,
"custom_words": "kaldi/custom_words.txt",
"dictionary": "kaldi/dictionary.txt",
+29
@@ -0,0 +1,29 @@
#!/usr/bin/env python3
import argparse
import sys


def main():
    parser = argparse.ArgumentParser("number")
    parser.add_argument("lower", type=int, help="Lower bound")
    parser.add_argument("upper", type=int, help="Upper bound (inclusive)")
    args, rest_args = parser.parse_known_args()

    lower = args.lower
    upper = args.upper
    step = 1

    if rest_args:
        step = int(rest_args[0])

    if upper < lower:
        lower, upper = upper, lower

    for n in range(lower, upper + 1, step):
        print(n)


# -----------------------------------------------------------------------------

if __name__ == "__main__":
    main()
+7
@@ -0,0 +1,7 @@
Montag
Dienstag
Mittwoch
Donnerstag
Freitag
Samstag
Sonntag
+12
@@ -0,0 +1,12 @@
Januar
Februar
März
April
Mai
Juni
Juli
August
September
Oktober
November
Dezember
+32 -7
@@ -28,14 +28,18 @@
"program": ""
},
"forward_to_hass": false,
"system": "dummy"
"system": "dummy",
"remote": {
"url": "http://my-server:port/endpoint"
}
},
"home_assistant": {
"access_token": "",
"api_password": "",
"event_type_format": "rhasspy_{0}",
"pem_file": "",
"url": "http://hassio/homeassistant/"
"url": "http://hassio/homeassistant/",
"handle_type": "event"
},
"intent": {
"adapt": {
@@ -48,14 +52,17 @@
"conversation": {
"handle_speech": true
},
"error_sound": true,
"fuzzywuzzy": {
"examples_json": "intent_examples.json",
"min_confidence": 0
},
"fsticuffs": {
"intent_fst": "intent.fst",
"intent_graph": "intent.json",
"ignore_unknown_words": true,
"fuzzy": true
"fuzzy": true,
"converters_dir": "converters"
},
"flair": {
"cache_dir": "flair/cache",
@@ -81,11 +88,13 @@
"microphone": {
"arecord": {
"chunk_size": 960,
"device": ""
"device": "",
"keep_device_open": true
},
"pyaudio": {
"device": "",
"frames_per_buffer": 480
"frames_per_buffer": 480,
"keep_device_open": true
},
"stdin": {
"auto_start": true,
@@ -119,7 +128,8 @@
"sounds": {
"recorded": "${RHASSPY_BASE_DIR}/etc/wav/beep_lo.wav",
"system": "aplay",
"wake": "${RHASSPY_BASE_DIR}/etc/wav/beep_hi.wav"
"wake": "${RHASSPY_BASE_DIR}/etc/wav/beep_hi.wav",
"error": "${RHASSPY_BASE_DIR}/etc/wav/beep_error.wav"
},
"speech_to_text": {
"command": {
@@ -170,8 +180,17 @@
"remote": {
"url": "http://my-server:12101/api/speech-to-text"
},
"hass_stt": {
"platform": "",
"sample_rate": 16000,
"bit_size": 16,
"channels": 1,
"language": "en-US"
},
"sentences_ini": "sentences.ini",
"sentences_dir": "intents",
"slots_dir": "slots",
"slot_programs_dir": "slot_programs",
"system": "dummy"
},
"text_to_speech": {
@@ -179,6 +198,7 @@
"arguments": [],
"program": ""
},
"disable_wake": false,
"espeak": {},
"flite": {
"voice": "kal16"
@@ -197,6 +217,9 @@
"url": "https://texttospeech.googleapis.com/v1/text:synthesize",
"voice": "Wavenet-C",
"fallback_tts": "espeak"
},
"hass_tts": {
"platform": ""
}
},
"training": {
@@ -256,7 +279,8 @@
"audio_gain": 1,
"chunk_size": 960,
"model": "snowboy/snowboy.umdl",
"sensitivity": 0.5
"sensitivity": 0.5,
"model_settings": {}
},
"porcupine": {
"library_path": "porcupine/libpv_porcupine.so",
@@ -266,6 +290,7 @@
},
"system": "dummy"
},
"webhooks": {},
"download": {
"cache_dir": "download",
"conditions": {
+1
View File
@@ -1,6 +1,7 @@
{
"language": "el",
"name": "el",
"locale": "el_GR",
"speech_to_text": {
"g2p_casing": "lower",
"system": "pocketsphinx",
+29
View File
@@ -0,0 +1,29 @@
#!/usr/bin/env python3
import argparse
import sys


def main():
    parser = argparse.ArgumentParser("number")
    parser.add_argument("lower", type=int, help="Lower bound")
    parser.add_argument("upper", type=int, help="Upper bound (inclusive)")
    args, rest_args = parser.parse_known_args()

    lower = args.lower
    upper = args.upper
    step = 1

    if rest_args:
        step = int(rest_args[0])

    if upper < lower:
        lower, upper = upper, lower

    for n in range(lower, upper + 1, step):
        print(n)


# -----------------------------------------------------------------------------

if __name__ == "__main__":
    main()
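The range logic in the `number` slot program above can be checked without invoking the script. The following is a minimal sketch, assuming the same behaviour as the diff (inclusive upper bound, bounds swapped when reversed, optional step defaulting to 1). The helper name `number_range` is hypothetical and is not part of the commit.

```python
def number_range(lower, upper, step=1):
    """Hypothetical helper mirroring the slot program: inclusive bounds,
    swapped if given in reverse order."""
    if upper < lower:
        lower, upper = upper, lower

    # range() excludes its stop value, so add 1 to make the upper bound inclusive
    return list(range(lower, upper + 1, step))


if __name__ == "__main__":
    # Same output the script would print for the arguments: 10 0 5
    for n in number_range(10, 0, 5):
        print(n)
```

As in the original script, a step of 0 raises `ValueError` from `range()`, and a negative step yields no values once the bounds have been ordered.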
+7
View File
@@ -0,0 +1,7 @@
Δευτέρα
Τρίτη
Τετάρτη
Πέμπτη
Παρασκευή
Σάββατο
Κυριακή
+12
View File
@@ -0,0 +1,12 @@
Ιανουαρίου
Φεβρουαρίου
Μαρτίου
Απριλίου
Μαΐου
Ιουνίου
Ιουλίου
Αυγούστου
Σεπτεμβρίου
Οκτωβρίου
Νοεμβρίου
Δεκεμβρίου
+2
View File
@@ -1,12 +1,14 @@
{
"language": "en",
"name": "en",
"locale": "en_US",
"speech_to_text": {
"system": "pocketsphinx",
"dictionary_casing": "lower",
"kaldi": {
"base_dictionary": "kaldi/base_dictionary.txt",
"base_language_model": "kaldi/base_language_model.txt",
"base_language_model_fst": "kaldi/base_language_model.fst",
"compatible": true,
"custom_words": "kaldi/custom_words.txt",
"dictionary": "kaldi/dictionary.txt",
+29
View File
@@ -0,0 +1,29 @@
#!/usr/bin/env python3
import argparse
import sys


def main():
    parser = argparse.ArgumentParser("number")
    parser.add_argument("lower", type=int, help="Lower bound")
    parser.add_argument("upper", type=int, help="Upper bound (inclusive)")
    args, rest_args = parser.parse_known_args()

    lower = args.lower
    upper = args.upper
    step = 1

    if rest_args:
        step = int(rest_args[0])

    if upper < lower:
        lower, upper = upper, lower

    for n in range(lower, upper + 1, step):
        print(n)


# -----------------------------------------------------------------------------

if __name__ == "__main__":
    main()
+7
View File
@@ -0,0 +1,7 @@
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
Sunday
+12
View File
@@ -0,0 +1,12 @@
January
February
March
April
May
June
July
August
September
October
November
December
+1
View File
@@ -1,6 +1,7 @@
{
"language": "es",
"name": "es",
"locale": "es_ES",
"speech_to_text": {
"system": "pocketsphinx",
"dictionary_casing": "lower"
+29
View File
@@ -0,0 +1,29 @@
#!/usr/bin/env python3
import argparse
import sys


def main():
    parser = argparse.ArgumentParser("number")
    parser.add_argument("lower", type=int, help="Lower bound")
    parser.add_argument("upper", type=int, help="Upper bound (inclusive)")
    args, rest_args = parser.parse_known_args()

    lower = args.lower
    upper = args.upper
    step = 1

    if rest_args:
        step = int(rest_args[0])

    if upper < lower:
        lower, upper = upper, lower

    for n in range(lower, upper + 1, step):
        print(n)


# -----------------------------------------------------------------------------

if __name__ == "__main__":
    main()
+7
View File
@@ -0,0 +1,7 @@
lunes
martes
miércoles
jueves
viernes
sábado
domingo
+12
View File
@@ -0,0 +1,12 @@
enero
febrero
marzo
abril
mayo
junio
julio
agosto
septiembre
octubre
noviembre
diciembre
+60
View File
@@ -0,0 +1,60 @@
2 j
9 j
9: ?
9~ V
@ E
@: ?
A a
A: ?
A~ n
E E
E: s
E~ I
H j
I ?
J n
N N
O 0
O~ n
R r
R: ?
S tS
SIL ?
U ?
Z dZ
a a
a: ?
aU ?
b b
d d
dZ dZ
e eI
e: ?
f f
g g
h ?
i I
i: ?
j @
k k
l l
m m
n n
o oU
o: ?
p p
pf ?
r r
s s
t t
tS tS
ts t
u u:
u: ?
v v
w OI
x ?
y j
y: ?
z s
{ ?
+44
View File
@@ -0,0 +1,44 @@
2 bleu b l 2
9 club k l 9 b
9~ aucun o k 9~
@ ceci s @ s i
A base b A z
A~ andy A~ n d i
E aies E
E: têtes t E: t
E~ bien b j E~
H fuir f H i R
I avril A v R I l
J gagne g a J
N king k i N
O bord b O R
O~ bons b O~
R agir a Z i R
S chef S E f
Z ange A~ Z
a abri a b R i
a: marc m a: k
b aube o b
d aide E d
dZ jack dZ a k
e aidé E d e
f afin a f E~
g goût g u
i agit a Z i
j ayez E j e
k acte a k t
l allo a l o
m aime E m
n anna a n a
o allô a l o
p pain p E~
r prison p r i z O~
s alex a l E k s
t bite b i t
tS match m a tS
ts cents s E n ts
u chou S u
v avec a v E k
w coin k w E~
y buts b y t
z aise E z
+96 -60
View File
@@ -1,63 +1,99 @@
{
"language": "fr",
"name": "fr",
"speech_to_text": {
"system": "pocketsphinx",
"dictionary_casing": "lower"
},
"intent": {
"wavenet": {
"language_code": "fr-FR"
},
"flair": {
"embeddings": [
"lm-fr-charlm-backward.pt", "lm-fr-charlm-forward.pt"
]
}
},
"download": {
"conditions": {
"speech_to_text.system": {
"pocketsphinx": {
"acoustic_model": "cmusphinx-fr-5.2.tar.gz:cmusphinx-fr-5.2",
"base_dictionary.txt": "fr-g2p.tar.gz:base_dictionary.txt",
"g2p.fst": "fr-g2p.tar.gz:g2p.fst"
}
},
"speech_to_text.pocketsphinx.mix_weight": {
">0": {
"base_language_model.txt": "fr-small.lm.gz:fr-small.lm"
}
},
"intent.system": {
"flair": {
"flair/cache/embeddings/lm-fr-charlm-backward.pt": "lm-fr-charlm-backward.pt",
"flair/cache/embeddings/lm-fr-charlm-forward.pt": "lm-fr-charlm-forward.pt"
}
}
},
"files": {
"cmusphinx-fr-5.2.tar.gz": {
"url": "https://github.com/synesthesiam/rhasspy-profiles/releases/download/v1.0-fr/cmusphinx-fr-5.2.tar.gz"
},
"fr-small.lm.gz": {
"url": "https://github.com/synesthesiam/rhasspy-profiles/releases/download/v1.0-fr/fr-small.lm.gz"
},
"fr-g2p.tar.gz": {
"url": "https://github.com/synesthesiam/rhasspy-profiles/releases/download/v1.0-fr/fr-g2p.tar.gz"
},
"lm-fr-charlm-backward.pt": {
"url": "https://github.com/synesthesiam/rhasspy-profiles/releases/download/v1.0-fr/lm-fr-charlm-backward.pt",
"cache": false
},
"lm-fr-charlm-forward.pt": {
"url": "https://github.com/synesthesiam/rhasspy-profiles/releases/download/v1.0-fr/lm-fr-charlm-forward.pt",
"cache": false
}
}
"language": "fr",
"name": "fr",
"locale": "fr_FR",
"speech_to_text": {
"system": "pocketsphinx",
"dictionary_casing": "lower",
"kaldi": {
"base_dictionary": "kaldi/base_dictionary.txt",
"base_language_model": "kaldi/base_language_model.txt",
"base_language_model_fst": "kaldi/base_language_model.fst",
"compatible": true,
"custom_words": "kaldi/custom_words.txt",
"dictionary": "kaldi/dictionary.txt",
"graph": "graph",
"language_model": "kaldi/language_model.txt",
"model_dir": "kaldi/model",
"unknown_words": "kaldi/unknown_words.txt",
"mix_fst": "kaldi/mixed.fst",
"g2p_model": "kaldi/g2p.fst",
"phoneme_examples": "kaldi/phoneme_examples.txt",
"phoneme_map": "kaldi/espeak_phonemes.txt"
}
},
"intent": {
"wavenet": {
"language_code": "fr-FR"
},
"flair": {
"embeddings": [
"lm-fr-charlm-backward.pt",
"lm-fr-charlm-forward.pt"
]
}
},
"download": {
"conditions": {
"speech_to_text.system": {
"pocketsphinx": {
"acoustic_model": "cmusphinx-fr-5.2.tar.gz:cmusphinx-fr-5.2",
"base_dictionary.txt": "fr-g2p.tar.gz:base_dictionary.txt",
"g2p.fst": "fr-g2p.tar.gz:g2p.fst"
},
"kaldi": {
"kaldi": "fr_kaldi-zamia.tar.gz:kaldi"
}
},
"speech_to_text.pocketsphinx.mix_weight": {
">0": {
"base_language_model.txt": "fr-small.lm.gz:fr-small.lm"
}
},
"speech_to_text.kaldi.mix_weight": {
">0": {
"kaldi/base_language_model.txt": "generic_fr_lang_model_small-r20191016.arpa.tar.gz:generic_fr_lang_model_small-r20191016.arpa"
}
},
"speech_to_text.kaldi.open_transcription": {
"True": {
"kaldi/model/base_graph": "fr_kaldi-zamia-base_graph.tar.gz:base_graph"
}
},
"intent.system": {
"flair": {
"flair/cache/embeddings/lm-fr-charlm-backward.pt": "lm-fr-charlm-backward.pt",
"flair/cache/embeddings/lm-fr-charlm-forward.pt": "lm-fr-charlm-forward.pt"
}
}
},
"files": {
"cmusphinx-fr-5.2.tar.gz": {
"url": "https://github.com/synesthesiam/rhasspy-profiles/releases/download/v1.0-fr/cmusphinx-fr-5.2.tar.gz"
},
"fr-small.lm.gz": {
"url": "https://github.com/synesthesiam/rhasspy-profiles/releases/download/v1.0-fr/fr-small.lm.gz"
},
"fr-g2p.tar.gz": {
"url": "https://github.com/synesthesiam/rhasspy-profiles/releases/download/v1.0-fr/fr-g2p.tar.gz"
},
"lm-fr-charlm-backward.pt": {
"url": "https://github.com/synesthesiam/rhasspy-profiles/releases/download/v1.0-fr/lm-fr-charlm-backward.pt",
"cache": false
},
"lm-fr-charlm-forward.pt": {
"url": "https://github.com/synesthesiam/rhasspy-profiles/releases/download/v1.0-fr/lm-fr-charlm-forward.pt",
"cache": false
},
"generic_fr_lang_model_small-r20191016.arpa.xz": {
"url": "https://github.com/synesthesiam/rhasspy-profiles/releases/download/v1.0-fr/generic_fr_lang_model_small-r20191016.arpa.xz"
},
"fr_kaldi-zamia.tar.gz": {
"url": "https://github.com/synesthesiam/rhasspy-profiles/releases/download/v1.0-fr/fr_kaldi-zamia.tar.gz"
},
"fr_kaldi-zamia-base_graph.tar.gz": {
"url": "https://github.com/synesthesiam/rhasspy-profiles/releases/download/v1.0-fr/fr_kaldi-zamia-base_graph.tar.gz"
}
}
}
}
+29
View File
@@ -0,0 +1,29 @@
#!/usr/bin/env python3
import argparse
import sys


def main():
    parser = argparse.ArgumentParser("number")
    parser.add_argument("lower", type=int, help="Lower bound")
    parser.add_argument("upper", type=int, help="Upper bound (inclusive)")
    args, rest_args = parser.parse_known_args()

    lower = args.lower
    upper = args.upper
    step = 1

    if rest_args:
        step = int(rest_args[0])

    if upper < lower:
        lower, upper = upper, lower

    for n in range(lower, upper + 1, step):
        print(n)


# -----------------------------------------------------------------------------

if __name__ == "__main__":
    main()
+7
View File
@@ -0,0 +1,7 @@
lundi
mardi
mercredi
jeudi
vendredi
samedi
dimanche
+12
View File
@@ -0,0 +1,12 @@
janvier
février
mars
avril
mai
juin
juillet
août
septembre
octobre
novembre
décembre
+1
View File
@@ -1,6 +1,7 @@
{
"language": "hi",
"name": "hi",
"locale": "hi_IN",
"speech_to_text": {
"system": "pocketsphinx",
"dictionary_casing": "lower"
+29
View File
@@ -0,0 +1,29 @@
#!/usr/bin/env python3
import argparse
import sys


def main():
    parser = argparse.ArgumentParser("number")
    parser.add_argument("lower", type=int, help="Lower bound")
    parser.add_argument("upper", type=int, help="Upper bound (inclusive)")
    args, rest_args = parser.parse_known_args()

    lower = args.lower
    upper = args.upper
    step = 1

    if rest_args:
        step = int(rest_args[0])

    if upper < lower:
        lower, upper = upper, lower

    for n in range(lower, upper + 1, step):
        print(n)


# -----------------------------------------------------------------------------

if __name__ == "__main__":
    main()
+7
View File
@@ -0,0 +1,7 @@
सोमवार
मंगलवार
बुधवार
गुरुवार
शुक्रवार
शनिवार
रविवार
+12
View File
@@ -0,0 +1,12 @@
जनवरी
फ़रवरी
मार्च
अप्रैल
मई
जून
जुलाई
अगस्त
सितंबर
अक्तूबर
नवंबर
दिसंबर
+1
View File
@@ -1,6 +1,7 @@
{
"name": "it",
"language": "it",
"locale": "it_IT",
"speech_to_text": {
"system": "pocketsphinx",
"dictionary_casing": "lower"

Some files were not shown because too many files have changed in this diff.