Reference
Supported Languages
The table below lists which components are compatible with Rhasspy's supported languages.
| Category | Name | Offline? | en | de | es | fr | it | nl | ru | el | hi | zh | vi | pt | sv | ca |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Wake Word | pocketsphinx | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |  |  |  |  |
|  | porcupine | ✓ | ✓ |  |  |  |  |  |  |  |  |  |  |  |  |  |
|  | snowboy | requires account | ✓ | • | • | • | • | • | • | • | • | • | • | • | • | • |
|  | precise | ✓ | ✓ | • | • | • | • | • | • | • | • | • | • | • | • | • |
| Speech to Text | pocketsphinx | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |  |  |
|  | kaldi | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |  |  |  |  |  |  |  |  |
| Intent Recognition | fsticuffs | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
|  | fuzzywuzzy | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
|  | adapt | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
|  | flair | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |  |  |  |  |  |  |  |
|  | rasaNLU | needs extra software | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Text to Speech | espeak | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
|  | flite | ✓ | ✓ | ✓ |  |  |  |  |  |  |  |  |  |  |  |  |
|  | picotts | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |  |  |  |  |  |  |  |  |  |
|  | marytts | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |  |  |  |  |  |  |  |  |  |
|  | wavenet | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |  |  |  |  |
• - yes, but requires training/customization
HTTP API
Rhasspy's HTTP endpoints are documented below. You can also visit /api/ in your Rhasspy server (note the final slash) to try out each endpoint.
Application authors may want to use the rhasspy-client, which provides a high-level interface to a remote Rhasspy server.
Endpoints
- /api/custom-words - GET custom word dictionary as plain text, or POST to overwrite it
  - See custom_words.txt in your profile directory
- /api/download-profile - Force Rhasspy to re-download profile
  - ?delete=true - clear download cache
- /api/listen-for-command - POST to wake Rhasspy up and start listening for a voice command
  - Returns intent JSON when command is finished
  - ?nohass=true - stop Rhasspy from handling the intent
  - ?timeout=<seconds> - override default command timeout
  - ?entity=<entity>&value=<value> - set custom entity/value in recognized intent
- /api/listen-for-wake-word - POST to wake Rhasspy up and return immediately
- /api/lookup - POST word as plain text to look up or guess pronunciation
  - ?n=<number> - return at most n guessed pronunciations
- /api/microphones - GET list of available microphones
- /api/phonemes - GET example phonemes from speech recognizer for your profile
  - See phoneme_examples.txt in your profile directory
- /api/play-wav - POST to play WAV data
- /api/profile - GET the JSON for your profile, or POST to overwrite it
  - ?layers=profile - only see settings different from defaults.json
  - See profile.json in your profile directory
- /api/restart - Restart Rhasspy server
- /api/sentences - GET voice command templates, or POST to overwrite them
  - Set Accept: application/json to GET JSON with all sentence files
  - Set Content-Type: application/json to POST JSON with sentences for multiple files
  - See sentences.ini and the intents directory in your profile
- /api/slots - GET slot values as JSON, or POST to add to/overwrite them
  - ?overwrite_all=true - clear slots in JSON before writing
- /api/speakers - GET list of available audio output devices
- /api/speech-to-intent - POST a WAV file and have Rhasspy process it as a voice command
  - Returns intent JSON when command is finished
  - ?nohass=true - stop Rhasspy from handling the intent
- /api/start-recording - POST to have Rhasspy start recording a voice command
- /api/stop-recording - POST to have Rhasspy stop recording and process recorded data as a voice command
  - Returns intent JSON when command has been processed
  - ?nohass=true - stop Rhasspy from handling the intent
- /api/test-microphones - GET list of available microphones and whether they're working
- /api/text-to-intent - POST text and have Rhasspy process it as a command
  - Returns intent JSON when command has been processed
  - ?nohass=true - stop Rhasspy from handling the intent
- /api/text-to-speech - POST text and have Rhasspy speak it
  - ?play=false - get WAV data instead of having Rhasspy speak
  - ?voice=<voice> - override default TTS voice
  - ?language=<language> - override default TTS language or locale
  - ?repeat=true - have Rhasspy repeat the last sentence it spoke
- /api/train - POST to re-train your profile
  - ?nocache=true - re-train profile from scratch
- /api/unknown-words - GET words that Rhasspy doesn't know in your sentences
  - See unknown_words.txt in your profile directory
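As a rough, standard-library-only sketch of calling these endpoints from Python, build_request below assembles a POST request for an endpoint plus optional query parameters, and text_to_intent sends text to /api/text-to-intent. The base URL http://localhost:12101 (the port used in the remote examples under Profile Settings) and the text/plain content type are assumptions; adjust both for your server. The helper names are illustrative, not part of Rhasspy.

```python
import json
import urllib.parse
import urllib.request

# Assumed server address; 12101 is the port used in this page's remote examples.
BASE_URL = "http://localhost:12101"


def build_request(endpoint, body="", params=None, content_type="text/plain"):
    """Build (but do not send) a POST request to a Rhasspy HTTP endpoint."""
    url = BASE_URL + endpoint
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": content_type},
        method="POST",
    )


def text_to_intent(text, handle=True, timeout=30):
    """POST text to /api/text-to-intent and return the decoded intent JSON.

    handle=False adds ?nohass=true so Rhasspy does not handle the intent.
    """
    params = None if handle else {"nohass": "true"}
    request = build_request("/api/text-to-intent", text, params)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, text_to_intent("what time is it", handle=False) should return the intent JSON without Rhasspy acting on it. The same build_request call with params={"play": "false"} against /api/text-to-speech should get WAV data back instead of having Rhasspy speak.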
Websocket API
- /api/events/intent - Listen for recognized intents published as JSON
- /api/events/log - Listen for log messages published as plain text
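Receiving these events requires a websocket client library, which is not shown here. The sketch below covers only the library-independent parts: deriving the ws:// URL for a stream from the server's HTTP base URL, and routing a message from /api/events/intent to a handler by intent name. It assumes each intent message has the same shape as the intent JSON returned by the HTTP API (an "intent" object with a "name"); the handlers mapping is a hypothetical convention.

```python
import json
from urllib.parse import urlsplit, urlunsplit


def events_url(base_url, stream="intent"):
    """Turn an HTTP(S) base URL into the websocket URL for /api/events/<stream>."""
    parts = urlsplit(base_url)
    scheme = "wss" if parts.scheme == "https" else "ws"
    return urlunsplit((scheme, parts.netloc, "/api/events/" + stream, "", ""))


def dispatch(message, handlers):
    """Route one /api/events/intent message to the handler for its intent name.

    Assumes the message is intent JSON with an "intent": {"name": ...} object.
    Returns the handler's result, or None when no handler is registered.
    """
    event = json.loads(message)
    name = (event.get("intent") or {}).get("name", "")
    handler = handlers.get(name)
    return handler(event) if handler else None
```

With a websocket client of your choice connected to events_url("http://localhost:12101"), each received text message would be passed to something like dispatch(message, {"GetTime": say_time}), where say_time is your own function.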
Command Line
Rhasspy provides a powerful command-line interface called rhasspy-cli.
For rhasspy-cli --profile <PROFILE_NAME> <COMMAND> <ARGUMENTS>, <COMMAND> can be:
- info - Print profile JSON to standard out
  - Add --defaults to only print settings from defaults.json
- wav2text - Convert WAV file(s) to text
- wav2intent - Convert WAV file(s) to intent JSON
  - Add --handle to have Rhasspy send events to Home Assistant
- text2intent - Convert text command(s) to intent JSON
  - Add --handle to have Rhasspy send events to Home Assistant
- train - Re-train your profile
- mic2wav - Listen for a voice command and output WAV data
  - Add --timeout <SECONDS> to stop recording after some number of seconds
- mic2text - Listen for a voice command and convert it to text
  - Add --timeout <SECONDS> to stop recording after some number of seconds
- mic2intent - Listen for a voice command and output intent JSON
  - Add --handle to have Rhasspy send events to Home Assistant
  - Add --timeout <SECONDS> to stop recording after some number of seconds
- word2phonemes - Print the CMU phonemes for a word (possibly unknown)
  - Add -n <COUNT> to control the maximum number of guessed pronunciations
- word2wav - Pronounce a word (possibly unknown) and output WAV data
- text2speech - Speak one or more sentences using Rhasspy's text to speech system
- text2wav - Convert a single sentence to WAV using Rhasspy's text to speech system
- sleep - Run Rhasspy and wait until the wake word is spoken
- download - Download necessary profile files from the internet
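When scripting rhasspy-cli, global options such as --profile (and --set, used in the Text to Speech example below) go before the command name, as in the syntax above. A minimal Python sketch that assembles such an argument list and runs it; it assumes rhasspy-cli is on your PATH, and the helper names are illustrative.

```python
import subprocess


def rhasspy_cli_args(profile, command, *args, settings=None):
    """Build an argv list: global options first, then the command and its arguments."""
    argv = ["rhasspy-cli", "--profile", profile]
    # Each --set takes a dotted setting path and a value, e.g. text_to_speech.system flite
    for path, value in (settings or {}).items():
        argv += ["--set", path, str(value)]
    argv.append(command)
    argv.extend(args)
    return argv


def run_rhasspy_cli(profile, command, *args, settings=None, stdin=None):
    """Run rhasspy-cli and return its standard output as bytes (raises on failure)."""
    argv = rhasspy_cli_args(profile, command, *args, settings=settings)
    return subprocess.run(argv, input=stdin, capture_output=True, check=True).stdout
```

For example, run_rhasspy_cli("en", "text2intent", "turn off the living room lamp") should return the same JSON text as the shell command shown under Text to Intent, and stdin=open("what-time-is-it.wav", "rb").read() with the wav2text command mirrors reading a WAV file from standard input.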
Profile Operations
Print the complete JSON for the English profile with:
rhasspy-cli --profile en info
You can combine this with other commands, such as jq, to extract specific settings:
rhasspy-cli --profile en info | jq .wake.pocketsphinx.keyphrase
Output (JSON):
"okay rhasspy"
Training
Retrain the English profile with:
rhasspy-cli --profile en train
Add --debug before train for more information.
Speech to Text/Intent
Convert a WAV file to text, reading it from standard input:
rhasspy-cli --profile en wav2text < what-time-is-it.wav
Output (text):
what time is it
Convert multiple WAV files:
rhasspy-cli --profile en wav2text what-time-is-it.wav turn-on-the-living-room-lamp.wav
Output (JSON):
{
"what-time-is-it.wav": "what time is it",
"turn-on-the-living-room-lamp.wav": "turn on the living room lamp"
}
Convert multiple WAV files to intents and handle them:
rhasspy-cli --profile en wav2intent --handle what-time-is-it.wav turn-on-the-living-room-lamp.wav
Output (JSON):
{
"what-time-is-it.wav": {
"text": "what time is it",
"intent": {
"name": "GetTime",
"confidence": 1.0
},
"entities": []
},
"turn-on-the-living-room-lamp.wav": {
"text": "turn on the living room lamp",
"intent": {
"name": "ChangeLightState",
"confidence": 1.0
},
"entities": [
{
"entity": "state",
"value": "on"
},
{
"entity": "name",
"value": "living room lamp"
}
]
}
}
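The JSON printed by wav2intent (and by text2intent) maps each input to its recognized intent, so a script can flatten it into one row per input. A minimal sketch; folding the entities list into an entity-to-value dictionary is a choice made here, and it keeps only the last value if the same entity appears twice.

```python
import json


def summarize(output_json):
    """Map each input (file name or text) to (intent name, confidence, {entity: value})."""
    summary = {}
    for key, result in json.loads(output_json).items():
        intent = result.get("intent") or {}
        # Later entities with the same name overwrite earlier ones
        slots = {e["entity"]: e["value"] for e in result.get("entities", [])}
        summary[key] = (intent.get("name", ""), intent.get("confidence", 0.0), slots)
    return summary
```

Saving the output first (for example, rhasspy-cli --profile en wav2intent *.wav > intents.json) and passing the file's contents to summarize gives one (intent, confidence, entities) tuple per WAV file.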
Text to Intent
Handle a command as if it was spoken:
rhasspy-cli --profile en text2intent --handle "turn off the living room lamp"
Output (JSON):
{
"turn off the living room lamp": {
"text": "turn off the living room lamp",
"intent": {
"name": "ChangeLightState",
"confidence": 1.0
},
"entities": [
{
"entity": "state",
"value": "off"
},
{
"entity": "name",
"value": "living room lamp"
}
]
}
}
Record Your Voice
Save a voice command to a WAV file:
rhasspy-cli --profile en mic2wav > my-voice-command.wav
You can listen to it with:
aplay my-voice-command.wav
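Before feeding a recording to wav2text, it can help to check its format. The sketch below uses Python's standard wave module. Treating 16 kHz, 16-bit mono as the target is an assumption based on the 16 kHz Pocketsphinx acoustic model described under Profile Settings; check what your own profile's recognizer expects.

```python
import io
import sys
import wave


def describe_wav(data):
    """Return (channels, sample width in bytes, sample rate in Hz, duration in seconds)."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()
        return wav.getnchannels(), wav.getsampwidth(), rate, wav.getnframes() / rate


def looks_like_16k_mono(data):
    """Assumed target format: 16 kHz sample rate, 16-bit (2-byte) samples, one channel."""
    channels, width, rate, _ = describe_wav(data)
    return (channels, width, rate) == (1, 2, 16000)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 check_wav.py my-voice-command.wav  (script name is hypothetical)
    with open(sys.argv[1], "rb") as f:
        print(describe_wav(f.read()))
```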
Test Your Wake Word
Start Rhasspy and wait for the wake word:
rhasspy-cli --profile en sleep
Rhasspy should exit and print the wake word when it's spoken.
Text to Speech
Have Rhasspy speak one or more sentences:
rhasspy-cli --profile en text2speech "We ride at dawn!"
Use a different text to speech system and voice:
rhasspy-cli --profile en \
--set 'text_to_speech.system' 'flite' \
--set 'text_to_speech.flite.voice' 'slt' \
text2speech "We ride at dawn!"
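Each --set names a setting by its dotted path in the profile JSON (the same path jq uses in the info example above, minus the leading dot). To preview what an override changes without starting Rhasspy, the sketch below reads and writes such dotted paths on a plain dictionary. It stores values exactly as given; how rhasspy-cli itself converts values such as numbers or booleans is not covered, and the function names are illustrative.

```python
import copy


def get_setting(profile, path, default=None):
    """Follow a dotted path such as 'wake.pocketsphinx.keyphrase' through nested dicts."""
    node = profile
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def apply_set(profile, path, value):
    """Return a copy of profile with the dotted path set to value (like one --set)."""
    updated = copy.deepcopy(profile)
    *parents, leaf = path.split(".")
    node = updated
    for key in parents:
        # Create missing sections along the way, e.g. text_to_speech.flite
        node = node.setdefault(key, {})
    node[leaf] = value
    return updated
```

Parsing the output of rhasspy-cli --profile en info with json.loads and applying the two overrides from the example above should leave text_to_speech.system set to flite and text_to_speech.flite.voice set to slt.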
Pronounce Words
Speak words Rhasspy doesn't know!
rhasspy-cli --profile en word2wav raxacoricofallapatorius | aplay
Text to Speech to Text to Intent
Use the miracle of Unix pipes to have Rhasspy interpret voice commands from itself:
rhasspy-cli --profile en \
--set 'text_to_speech.system' 'picotts' \
text2wav "turn on the living room lamp" | \
rhasspy-cli --profile en wav2text | \
rhasspy-cli --profile en text2intent
Output (JSON):
{
"turn on the living room lamp": {
"text": "turn on the living room lamp",
"intent": {
"name": "ChangeLightState",
"confidence": 1.0
},
"entities": [
{
"entity": "state",
"value": "on"
},
{
"entity": "name",
"value": "living room lamp"
}
],
"speech_confidence": 1,
"slots": {
"state": "on",
"name": "living room lamp"
}
}
}
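The same chain can be driven from Python by connecting each process's standard output to the next one's standard input, as the shell does with |. A sketch using subprocess.Popen; run_pipeline is a generic, illustrative helper, and the rhasspy-cli argument lists in its main block mirror the shell example above (running them assumes rhasspy-cli is installed).

```python
import subprocess
import sys


def run_pipeline(commands):
    """Run argv lists connected stdout -> stdin (like `a | b | c`); return the last stdout."""
    procs = []
    upstream = None
    for argv in commands:
        proc = subprocess.Popen(argv, stdin=upstream, stdout=subprocess.PIPE)
        if upstream is not None:
            # Close the parent's copy so the upstream process sees SIGPIPE if this one exits early
            upstream.close()
        upstream = proc.stdout
        procs.append(proc)
    output, _ = procs[-1].communicate()
    for proc in procs[:-1]:
        proc.wait()
    failed = [proc.args for proc in procs if proc.returncode != 0]
    if failed:
        raise RuntimeError(f"pipeline step(s) failed: {failed}")
    return output


if __name__ == "__main__" and len(sys.argv) > 1:
    # Mirrors the shell pipeline above; requires rhasspy-cli on PATH
    text = sys.argv[1]
    result = run_pipeline([
        ["rhasspy-cli", "--profile", "en", "--set", "text_to_speech.system", "picotts",
         "text2wav", text],
        ["rhasspy-cli", "--profile", "en", "wav2text"],
        ["rhasspy-cli", "--profile", "en", "text2intent"],
    ])
    print(result.decode("utf-8"))
```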
Profile Settings
All available profile sections and settings are listed below:
- rhasspy - configuration for Rhasspy assistant
  - preload_profile - true if speech/intent recognizers should be loaded immediately for default profile (default: true)
  - listen_on_start - true if Rhasspy should listen for wake word at startup (default: true)
  - load_timeout_sec - number of seconds to wait for internal actors before proceeding with start up
- home_assistant - how to communicate with Home Assistant/Hass.io
  - url - Base URL of Home Assistant server (no /api)
  - access_token - long-lived access token for Home Assistant (Hass.io token is used automatically)
  - api_password - Password, if you have that enabled (deprecated)
  - pem_file - Full path to your CA_BUNDLE file or a directory with certificates of trusted CAs
  - event_type_format - Python format string used to create event type from intent type ({0})
- speech_to_text - transcribing voice commands to text
  - system - name of speech to text system (pocketsphinx, kaldi, remote, command, or dummy)
  - pocketsphinx - configuration for Pocketsphinx
    - compatible - true if profile can use pocketsphinx for speech recognition
    - acoustic_model - directory with CMU 16 kHz acoustic model
    - base_dictionary - large text file with word pronunciations (read only)
    - custom_words - small text file with words/pronunciations added by user
    - dictionary - text file with all words/pronunciations needed for example sentences
    - unknown_words - small text file with guessed word pronunciations (from phonetisaurus)
    - language_model - text file with trigram ARPA language model built from example sentences
    - open_transcription - true if general language model should be used (custom voice commands ignored)
    - base_language_model - large general language model (read only)
    - mllr_matrix - MLLR matrix from acoustic model tuning
    - mix_weight - how much of the base language model to mix in during training (0-1)
    - mix_fst - path to save mixed ngram FST model
  - kaldi - configuration for Kaldi
    - compatible - true if profile can use Kaldi for speech recognition
    - kaldi_dir - absolute path to Kaldi root directory
    - model_dir - directory where Kaldi model is stored (relative to profile directory)
    - graph - directory where HCLG.fst is located (relative to model_dir)
    - base_graph - directory where large general HCLG.fst is located (relative to model_dir)
    - base_dictionary - large text file with word pronunciations (read only)
    - custom_words - small text file with words/pronunciations added by user
    - dictionary - text file with all words/pronunciations needed for example sentences
    - open_transcription - true if general language model should be used (custom voice commands ignored)
    - unknown_words - small text file with guessed word pronunciations (from phonetisaurus)
    - mix_weight - how much of the base language model to mix in during training (0-1)
    - mix_fst - path to save mixed ngram FST model
  - remote - configuration for remote Rhasspy server
    - url - URL to POST WAV data for transcription (e.g., http://your-rhasspy-server:12101/api/speech-to-text)
  - command - configuration for external speech-to-text program
    - program - path to executable
    - arguments - list of arguments to pass to program
  - sentences_ini - Ini file with example sentences/JSGF templates grouped by intent
  - sentences_dir - Directory with additional sentence templates (default: intents)
  - g2p_model - finite-state transducer for phonetisaurus to guess word pronunciations
  - g2p_casing - casing to force for g2p model (upper, lower, or blank)
  - dictionary_casing - casing to force for dictionary words (upper, lower, or blank)
  - grammars_dir - directory to write generated JSGF grammars from sentences ini file
  - fsts_dir - directory to write generated finite state transducers from JSGF grammars
- intent - transforming text commands to intents
  - system - intent recognition system (fsticuffs, fuzzywuzzy, rasa, remote, adapt, command, or dummy)
  - fsticuffs - configuration for OpenFST-based intent recognizer
    - intent_fst - path to generated finite state transducer with all intents combined
    - ignore_unknown_words - true if words not in the FST symbol table should be ignored
    - fuzzy - true if text should be matched in a fuzzy manner, skipping words in stop_words.txt
  - fuzzywuzzy - configuration for simplistic Levenshtein distance based intent recognizer
    - examples_json - JSON file with intents/example sentences
    - min_confidence - minimum confidence required for intent to be converted to a JSON event (0-1)
  - remote - configuration for remote Rhasspy server
    - url - URL to POST text to for intent recognition (e.g., http://your-rhasspy-server:12101/api/text-to-intent)
  - rasa - configuration for Rasa NLU based intent recognizer
    - url - URL of remote Rasa NLU server (e.g., http://localhost:5005/)
    - examples_markdown - Markdown file to generate with intents/example sentences
    - project_name - name of project to generate during training
  - adapt - configuration for Mycroft Adapt based intent recognizer
    - stop_words - text file with words to ignore in training sentences
  - command - configuration for external intent recognition program
    - program - path to executable
    - arguments - list of arguments to pass to program
- text_to_speech - pronouncing words
  - system - text to speech system (espeak, flite, picotts, marytts, wavenet, command, or dummy)
  - espeak - configuration for eSpeak
    - phoneme_map - text file mapping CMU phonemes to eSpeak phonemes
  - flite - configuration for flite
    - voice - name of voice to use (e.g., kal16, rms, awb)
  - picotts - configuration for PicoTTS
    - language - language to use (default if not present)
  - marytts - configuration for MaryTTS
    - url - address:port of MaryTTS server (port is usually 59125)
    - voice - name of voice to use (e.g., cmu-slt). Default if not present.
    - locale - name of locale to use (e.g., en-US). Default if not present.
  - wavenet - configuration for Google's WaveNet
    - cache_dir - path to directory in your profile where WAV files are cached
    - credentials_json - path to the JSON credentials file (generated online)
    - gender - gender of speaker (MALE or FEMALE)
    - language_code - language/locale (e.g., en-US)
    - sample_rate - WAV sample rate (default: 22050)
    - url - URL of WaveNet endpoint
    - voice - voice to use (e.g., Wavenet-C)
    - fallback_tts - text to speech system to use when offline or error occurs (e.g., espeak)
  - phoneme_examples - text file with examples for each CMU phoneme
- training - training speech/intent recognizers
  - dictionary_number_duplicates - true if duplicate words in dictionary should be suffixed by (2), (3), etc.
  - tokenizer - system used to break sentences into words (regex only for now)
  - regex - configuration for regex tokenizer
    - replace - list of dictionaries with patterns/replacements used on each example sentence
    - split - pattern used to break sentences into words
  - unknown_words - configuration for dealing with words not in base/custom dictionaries
    - fail_when_present - true if Rhasspy should halt training when unknown words are found
    - guess_pronunciations - true if Phonetisaurus should be used to guess how an unknown word is pronounced
  - speech_to_text - training for speech decoder
    - system - speech to text training system (auto, pocketsphinx, kaldi, command, or dummy)
    - command - configuration for external speech-to-text training program
      - program - path to executable
      - arguments - list of arguments to pass to program
  - intent - training for intent recognizer
    - system - intent recognizer training system (auto, fsticuffs, fuzzywuzzy, rasa, adapt, command, or dummy)
    - command - configuration for external intent recognizer training program
      - program - path to executable
      - arguments - list of arguments to pass to program
- wake - waking Rhasspy up for speech input
  - system - wake word recognition system (pocketsphinx, snowboy, precise, porcupine, command, or dummy)
  - pocketsphinx - configuration for Pocketsphinx wake word recognizer
    - keyphrase - phrase to wake up on (3-4 syllables recommended)
    - threshold - sensitivity of detection (recommended range 1e-50 to 1e-5)
    - chunk_size - number of bytes per chunk to feed to Pocketsphinx (default 960)
  - snowboy - configuration for snowboy
    - model - path to model file(s), separated by commas (in profile directory)
    - sensitivity - model sensitivity (0-1, default 0.5)
    - audio_gain - audio gain (default 1)
    - apply_frontend - true if ApplyFrontend should be set
    - chunk_size - number of bytes per chunk to feed to snowboy (default 960)
    - model_settings - settings for each snowboy model path (e.g., snowboy/snowboy.umdl)
      - <MODEL_PATH>
        - sensitivity - model sensitivity
        - audio_gain - audio gain
        - apply_frontend - true if ApplyFrontend should be set
  - precise - configuration for Mycroft Precise
    - engine_path - path to the precise-engine binary
    - model - path to model file (in profile directory)
    - sensitivity - model sensitivity (0-1, default 0.5)
    - trigger_level - number of events to trigger activation (default 3)
    - chunk_size - number of bytes per chunk to feed to Precise (default 2048)
  - porcupine - configuration for PicoVoice's Porcupine
    - library_path - path to libpv_porcupine.so for your platform/architecture
    - model_path - path to the porcupine_params.pv (lib/common)
    - keyword_path - path to the .ppn keyword file
    - sensitivity - model sensitivity (0-1, default 0.5)
  - command - configuration for external wake word program
    - program - path to executable
    - arguments - list of arguments to pass to program
- microphone - configuration for audio recording
  - system - audio recording system (pyaudio, arecord, gstreamer, hermes, http, or dummy)
  - pyaudio - configuration for PyAudio microphone
    - device - index of device to use or empty for default device
    - frames_per_buffer - number of frames to read at a time (default 480)
  - arecord - configuration for ALSA microphone
    - device - name of ALSA device (see arecord -L) to use or empty for default device
    - chunk_size - number of bytes to read at a time (default 960)
  - http - configuration for HTTP audio stream
    - host - hostname or IP address of HTTP audio server (default 127.0.0.1)
    - port - port to receive audio stream on (default 12333)
    - stop_after - one of "never", "text", or "intent" (see documentation)
  - gstreamer - configuration for GStreamer audio recorder
    - pipeline - GStreamer pipeline (e.g., FILTER ! FILTER ! ...) without sink
  - hermes - configuration for MQTT "microphone" (Hermes protocol)
    - Subscribes to WAV data from hermes/audioServer/<SITE_ID>/audioFrame
    - Requires MQTT to be enabled
- sounds - configuration for feedback sounds from Rhasspy
  - system - which sound output system to use (aplay, hermes, or dummy)
  - wake - path to WAV file to play when Rhasspy wakes up
  - recorded - path to WAV file to play when a command finishes recording
  - aplay - configuration for ALSA speakers
    - device - name of ALSA device (see aplay -L) to use or empty for default device
  - hermes - configuration for MQTT "speakers" (Hermes protocol)
    - WAV data published to hermes/audioServer/<SITE_ID>/playBytes/<REQUEST_ID>
    - Requires MQTT to be enabled
- command - listening for voice commands
  - system - which voice command listener system to use (webrtcvad, oneshot, hermes, command, or dummy)
  - webrtcvad - configuration for webrtcvad system
    - sample_rate - sample rate of input audio
    - chunk_size - bytes per buffer (must be 10, 20, or 30 ms of audio)
    - vad_mode - sensitivity of webrtcvad (0-3)
    - min_sec - minimum number of seconds in a command
    - silence_sec - number of seconds of silence after voice command before stopping
    - timeout_sec - maximum number of seconds before stopping
    - throwaway_buffers - number of buffers to drop when recording starts
    - speech_buffers - number of buffers with speech before command starts
  - oneshot - configuration for voice command system that takes first audio frame as entire command
    - timeout_sec - maximum number of seconds before stopping
  - command - configuration for external voice command program
    - program - path to executable
    - arguments - list of arguments to pass to program
  - hermes - configuration for MQTT-based voice command system that listens between startListening and stopListening commands (Hermes protocol)
    - timeout_sec - maximum number of seconds before stopping
- handle - handling recognized intents
  - system - which intent handling system to use (hass, command, or dummy)
  - forward_to_hass - true if intents are always forwarded to Home Assistant (even if system is command)
  - command - configuration for external intent handling program
    - program - path to executable
    - arguments - list of arguments to pass to program
- mqtt - configuration for MQTT (Hermes protocol)
  - enabled - true if MQTT client should be started
  - host - MQTT host
  - port - MQTT port
  - username - MQTT username (blank for anonymous)
  - password - MQTT password
  - reconnect_sec - number of seconds before client will reconnect
  - site_id - ID of site (Hermes protocol)
  - publish_intents - true if intents are published to MQTT
- tuning - configuration for acoustic model tuning
  - system - system for tuning (currently only sphinxtrain)
  - sphinxtrain - configuration for sphinxtrain based acoustic model tuning
    - mllr_matrix - name of generated MLLR matrix (should match speech_to_text.pocketsphinx.mllr_matrix)
- download - configuration for profile file downloading
  - cache_dir - directory in your profile where downloaded files are cached
  - conditions - profile settings that will trigger file downloads
    - keys are profile setting paths (e.g., wake.system)
    - values are dictionaries whose keys are profile settings values (e.g., snowboy)
      - settings may have the form <=N or !X to mean "less than or equal to N" or "not X"
    - leaf nodes are dictionaries whose keys are destination file paths and whose values reference the files dictionary
  - files - locations, etc. of files to download
    - keys are names of files
    - values are dictionaries with:
      - url - URL of file to download
      - cache - false if file should be downloaded directly into profile (skipping cache)
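The <=N and !X forms in download.conditions read as small predicates on a setting's current value. The sketch below is one hypothetical way to evaluate them; it is not Rhasspy's own implementation. It assumes a single level of nesting (setting path, then condition, then destination path mapped to a files entry), numeric comparison for <=N, and string comparison otherwise.

```python
def condition_matches(condition, value):
    """Check a setting's current value against one download.conditions key.

    "<=N" -> value is a number less than or equal to N (assumed numeric comparison)
    "!X"  -> value is anything except X
    else  -> value equals the key (compared as strings)
    """
    if condition.startswith("<="):
        try:
            return float(value) <= float(condition[2:])
        except (TypeError, ValueError):
            return False
    if condition.startswith("!"):
        return str(value) != condition[1:]
    return str(value) == condition


def files_to_download(conditions, profile_values):
    """Collect {destination path: files entry} for every condition that matches.

    conditions:     {setting_path: {condition: {destination_path: file_name}}}
    profile_values: {setting_path: current value}
    """
    wanted = {}
    for setting_path, by_condition in conditions.items():
        value = profile_values.get(setting_path)
        for condition, targets in by_condition.items():
            if condition_matches(condition, value):
                wanted.update(targets)
    return wanted
```

For example, with conditions {"wake.system": {"snowboy": {"snowboy/snowboy.umdl": "snowboy_model"}}} (snowboy_model being a hypothetical key in files), files_to_download returns that destination only when wake.system is snowboy.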