72 Commits

Author SHA1 Message Date
Michael Hansen 3f0545ed0f Bump version 2020-03-04 11:52:18 -05:00
Michael Hansen 331138f300 Updated CHANGELOG 2020-03-04 11:52:03 -05:00
Michael Hansen 33b847b828 Merge branch 'master' of https://github.com/synesthesiam/rhasspy 2020-03-04 11:48:53 -05:00
Michael Hansen d770679373 Fix first entity bug in Rasa NLU training 2020-03-04 11:48:02 -05:00
Michael Hansen 86e695a7a4 Merge pull request #184 from daniele-athome/rasa-min-confidence
Rasa: min confidence parameter
2020-03-03 20:55:00 -05:00
Daniele Ricci 07d1cc4e43 Rasa: min confidence parameter
Signed-off-by: Daniele Ricci <daniele@casaricci.it>
2020-02-25 20:03:23 +01:00
Michael Hansen b68e5caf01 Merge pull request #182 from Tooa/follow_up_178
Prevent porcupine incompatibilities
2020-02-24 16:59:28 -05:00
Uli da4d994e75 Use tagged porcupine models
* This prevents incompatibilities with the Python wrapper
  in future.
2020-02-22 11:45:05 +01:00
Michael Hansen 627e6e8b3d Merge pull request #179 from daniele-athome/google-stt
Support for Google Cloud STT
2020-02-20 22:13:33 -05:00
Daniele Ricci f0ec0486f7 Google Cloud TTS documentation
Signed-off-by: Daniele Ricci <daniele@casaricci.it>
2020-02-18 20:04:03 +01:00
Daniele Ricci 1febc3d1d8 Support for Google Cloud STT
Signed-off-by: Daniele Ricci <daniele@casaricci.it>
2020-02-18 19:48:26 +01:00
Michael Hansen 2879802f2f Merge pull request #180 from Tooa/fix_178
Fixes Porcupine wrapper incompatibilities with models
2020-02-18 09:27:23 -05:00
Uli 21a2a8f9b4 Bump Porcupine Python wrapper 2020-02-16 18:51:01 +01:00
Michael Hansen a96f80237e Fix remote intent handler 2020-02-15 17:04:33 -05:00
Michael Hansen fc68d04f29 Fix siteId is null 2020-02-09 13:58:38 -05:00
Michael Hansen e00c1448cb Fix CHANGELOG date 2020-02-07 20:39:03 -05:00
Michael Hansen f04ad3bfeb Add more tutorials to docs 2020-02-07 17:14:01 -05:00
Michael Hansen eb11f90cab Add espeak arguments for text to speech 2020-02-07 17:00:33 -05:00
Michael Hansen 2c612ee669 Pocketsphinx wake keyphrase words added to dictionary 2020-02-07 16:39:06 -05:00
Michael Hansen dfe92f9d0e Fix STT casing outside of HTTP calls 2020-02-07 16:25:40 -05:00
Michael Hansen c59e7b42ab Update docs 2020-02-07 15:55:51 -05:00
Michael Hansen 948705a87b Add wake/text websocket endpoints 2020-02-07 15:45:37 -05:00
Michael Hansen 6b0b5c1799 Merge branch 'master' of https://github.com/synesthesiam/rhasspy 2020-02-06 16:50:51 -05:00
Michael Hansen 9553691e88 Working on wake websocket 2020-02-06 16:49:33 -05:00
Michael Hansen 997456631e Update /api/listen-for-wake to enable/disable wake word 2020-02-05 22:00:06 -05:00
Michael Hansen 104165198b Bump to rhasspy-nlu 0.1.5 2020-01-28 21:38:05 -05:00
Michael Hansen f405b827f4 Add Hass.IO change to CHANGELOG 2020-01-22 21:20:35 -05:00
Michael Hansen 089568cf9f Fix version in CHANGELOG 2020-01-22 21:14:49 -05:00
Michael Hansen a92d88ff8f Rename Add File button in web UI on sentences page 2020-01-22 17:03:21 -05:00
Michael Hansen c60030b48f Add download feedback to web UI 2020-01-22 17:02:26 -05:00
Michael Hansen b400d651f6 Add RHASSPY_LOG_LEVEL environment variable 2020-01-22 16:40:57 -05:00
Michael Hansen 2dfa9aa782 Fix _raw_text in Hass event being same as _text 2020-01-22 16:36:07 -05:00
Michael Hansen bd2c065415 Force slot programs to run each training cycle 2020-01-22 16:24:58 -05:00
Michael Hansen 1b95144b05 Add CHANGELOG and bump version 2020-01-21 15:56:09 -05:00
Michael Hansen 3f60936471 Add web button to play last recorded voice command 2020-01-21 15:55:01 -05:00
Michael Hansen 9a3c2f8a3f Move kaldi/custom_words.txt to kaldi_custom_words.txt 2020-01-21 15:39:22 -05:00
Michael Hansen 1fb75f24d7 Delete partial downloads of profile files 2020-01-21 15:39:08 -05:00
Michael Hansen 6c0187e606 Hide web notifications after 10 seconds 2020-01-21 15:38:52 -05:00
Michael Hansen 16262ec896 Keep slot substitution casing during training/recognition 2020-01-21 14:57:00 -05:00
Michael Hansen 9d1303ed21 Merge pull request #164 from alexkn/fix-device-preselection
fix microphone/sound device preselection
2020-01-20 08:38:57 -05:00
Michael Hansen a12e537110 Merge pull request #165 from alexkn/docs-picotts
update pico-tts languages
2020-01-20 08:37:39 -05:00
Alexander Knöbel 63fb3cf046 update pico-tts languages 2020-01-19 13:06:20 +01:00
Alexander Knöbel f5e6666931 fix microphone/sound device preselection 2020-01-19 01:04:02 +01:00
Michael Hansen 44a9c84bc7 Merge pull request #158 from drhirn/master
Added exclamation mark to shebang
2020-01-14 22:23:56 -05:00
Michael Hansen 9a076936c5 Merge pull request #160 from alexkn/docs-yarn-install
Add yarn install before build
2020-01-14 22:23:01 -05:00
Alexander Knöbel 102b29ecf6 Add yarn install before build 2020-01-14 19:02:08 +01:00
drhirn 51455bfd97 Added exclamation mark to shebang
The shebang in the code for the slot_program was missing an exclamation mark
2020-01-14 15:38:36 +01:00
Michael Hansen 5bf6086164 Merge pull request #156 from mzoeller/patch-3
Add missing space
2020-01-12 18:59:51 -05:00
mzoeller 08ebaf0914 Add missing space 2020-01-12 23:50:40 +01:00
Michael Hansen 4b3f26c12f Merge pull request #153 from daniele-athome/patch-1
Give a hint to lame about mp3 files
2020-01-12 13:29:34 -05:00
Michael Hansen 707c31e4d3 Merge pull request #155 from mzoeller/patch-2
Update intent-handling.md
2020-01-12 13:29:01 -05:00
Michael Hansen 509d47ea0f Merge pull request #154 from mzoeller/patch-1
Example command handler in python
2020-01-12 13:28:42 -05:00
mzoeller 1d2b08df6e Update intent-handling.md 2020-01-12 18:05:20 +01:00
mzoeller 0608443482 Example command handler in python 2020-01-12 17:56:00 +01:00
Daniele Ricci 9a1a41385c Give a hint to lame about mp3 files
Apparently, when given audio through stdin, lame can't detect the file type correctly some times. A quick fix is to add --mp3input to the command line (we have checked for file extension anyway, so...)
2020-01-12 17:24:43 +01:00
Michael Hansen 2d8095f0e1 Merge /media/hansenm/BAC6B44DC6B40C1F/rhasspy 2020-01-08 16:11:36 -05:00
Michael Hansen 8f3c1c5d61 Fix dictionary issue with multiple pronunciations 2020-01-07 21:08:17 -05:00
Michael Hansen deb742d768 Return WAV mimetype 2020-01-07 16:37:47 -05:00
Michael Hansen fa24588ea4 Removed flair intent recognition. Fixes for adapt/rasa. 2020-01-06 11:23:32 -05:00
Michael Hansen ed581ecf9d Fixing fuzzywuzzy and others with converters 2020-01-05 20:43:15 -05:00
Michael Hansen f8aedd4ef5 Update Docker update docs 2020-01-05 16:55:37 -05:00
Michael Hansen 14c1386496 Possible fix for threading issues 2020-01-05 16:46:14 -05:00
Michael Hansen 153b642057 Add Rpi Zero to docs 2020-01-05 15:14:45 -05:00
Michael Hansen dec32102dd Merge pull request #146 from esdeboer/freebsd
sed -i is not POSIX compliant, instead make a temp copy and rename to…
2020-01-05 14:50:38 -05:00
Michael Hansen f365c69265 Merge pull request #142 from maxbachmann/cleanup
code cleanup
2020-01-05 14:50:02 -05:00
Michael Hansen 1724c328b7 Trying to fix Docker image 2020-01-05 11:14:56 -05:00
Eric de Boer 6db4a8d341 sed -i is not POSIX compliant, instead make a temp copy and rename to original. 2020-01-05 16:53:38 +01:00
Michael Hansen b70e8a8569 Copying profiles in Docker 2020-01-04 22:41:04 -05:00
Michael Hansen 2e4828da06 Fix dockerignore 2020-01-04 22:34:15 -05:00
Michael Hansen 96cfe69753 Re-generated Dockerfile 2020-01-04 22:02:31 -05:00
maxbachmann 3e8e246c1c swap vars without temp var 2020-01-05 01:02:35 +01:00
Michael Hansen 80a5008b93 Copy built-in slots to Docker 2020-01-04 16:55:38 -05:00
65 changed files with 1308 additions and 1148 deletions
+173 -22
@@ -1,26 +1,177 @@
.git/
.venv/
node_modules/
__pycache__/
test/
tools/
etc/test/
download/precise-engine/
download/kaldi/
opt/
*
!etc/qemu-*
etc/homeassistant/config/.storage
examples/typical/home-assistant/config/.storage
examples/typical-intent/home-assistant/config/.storage
examples/client-server/home-assistant/config/.storage
examples/mqtt-hermes/home-assistant/config/.storage
!download/rhasspy-tools*
!download/pocketsphinx-python.tar.gz
!download/snowboy*
!download/kaldi*
profiles/*/base_dictionary.txt
profiles/*/base_language_model.txt
profiles/*/acoustic_model/
profiles/*/g2p.fst
!requirements.txt
!dist/
!etc/wav
profiles/en-kaldi/
profiles/en-zamia/
!docker/run.sh
!docker/rhasspy
profiles/*/download/
!profiles/defaults.json
!profiles/zh/profile.json
!profiles/zh/custom_words.txt
!profiles/zh/espeak_phonemes.txt
!profiles/zh/phoneme_examples.txt
!profiles/zh/frequent_words.txt
!profiles/zh/sentences.ini
!profiles/zh/stop_words.txt
!profiles/zh/slots
!profiles/zh/slot_programs
!profiles/hi/profile.json
!profiles/hi/custom_words.txt
!profiles/hi/espeak_phonemes.txt
!profiles/hi/phoneme_examples.txt
!profiles/hi/frequent_words.txt
!profiles/hi/sentences.ini
!profiles/hi/stop_words.txt
!profiles/hi/slots
!profiles/hi/slot_programs
!profiles/el/profile.json
!profiles/el/custom_words.txt
!profiles/el/espeak_phonemes.txt
!profiles/el/phoneme_examples.txt
!profiles/el/frequent_words.txt
!profiles/el/sentences.ini
!profiles/el/stop_words.txt
!profiles/el/slots
!profiles/el/slot_programs
!profiles/es/profile.json
!profiles/es/custom_words.txt
!profiles/es/espeak_phonemes.txt
!profiles/es/phoneme_examples.txt
!profiles/es/frequent_words.txt
!profiles/es/sentences.ini
!profiles/es/stop_words.txt
!profiles/es/slots
!profiles/es/slot_programs
!profiles/it/profile.json
!profiles/it/custom_words.txt
!profiles/it/espeak_phonemes.txt
!profiles/it/phoneme_examples.txt
!profiles/it/frequent_words.txt
!profiles/it/sentences.ini
!profiles/it/stop_words.txt
!profiles/it/slots
!profiles/it/slot_programs
!profiles/ru/profile.json
!profiles/ru/custom_words.txt
!profiles/ru/espeak_phonemes.txt
!profiles/ru/phoneme_examples.txt
!profiles/ru/frequent_words.txt
!profiles/ru/sentences.ini
!profiles/ru/stop_words.txt
!profiles/ru/slots
!profiles/ru/slot_programs
!profiles/pt/profile.json
!profiles/pt/custom_words.txt
!profiles/pt/espeak_phonemes.txt
!profiles/pt/phoneme_examples.txt
!profiles/pt/frequent_words.txt
!profiles/pt/sentences.ini
!profiles/pt/stop_words.txt
!profiles/pt/slots
!profiles/pt/slot_programs
!profiles/sv/profile.json
!profiles/sv/custom_words.txt
!profiles/sv/espeak_phonemes.txt
!profiles/sv/phoneme_examples.txt
!profiles/sv/frequent_words.txt
!profiles/sv/sentences.ini
!profiles/sv/stop_words.txt
!profiles/sv/slots
!profiles/sv/slot_programs
!profiles/vi/profile.json
!profiles/vi/custom_words.txt
!profiles/vi/espeak_phonemes.txt
!profiles/vi/phoneme_examples.txt
!profiles/vi/frequent_words.txt
!profiles/vi/sentences.ini
!profiles/vi/stop_words.txt
!profiles/vi/slots
!profiles/vi/slot_programs
!profiles/ca/profile.json
!profiles/ca/custom_words.txt
!profiles/ca/espeak_phonemes.txt
!profiles/ca/phoneme_examples.txt
!profiles/ca/frequent_words.txt
!profiles/ca/sentences.ini
!profiles/ca/stop_words.txt
!profiles/ca/slots
!profiles/ca/slot_programs
!profiles/nl/profile.json
!profiles/nl/custom_words.txt
!profiles/nl/espeak_phonemes.txt
!profiles/nl/phoneme_examples.txt
!profiles/nl/frequent_words.txt
!profiles/nl/sentences.ini
!profiles/nl/stop_words.txt
!profiles/nl/slots
!profiles/nl/slot_programs
!profiles/nl/kaldi/custom_words.txt
!profiles/nl/kaldi/espeak_phonemes.txt
!profiles/nl/kaldi/phoneme_examples.txt
!profiles/de/profile.json
!profiles/de/custom_words.txt
!profiles/de/espeak_phonemes.txt
!profiles/de/phoneme_examples.txt
!profiles/de/frequent_words.txt
!profiles/de/sentences.ini
!profiles/de/stop_words.txt
!profiles/de/slots
!profiles/de/slot_programs
!profiles/de/kaldi/custom_words.txt
!profiles/de/kaldi/espeak_phonemes.txt
!profiles/de/kaldi/phoneme_examples.txt
!profiles/fr/profile.json
!profiles/fr/custom_words.txt
!profiles/fr/espeak_phonemes.txt
!profiles/fr/phoneme_examples.txt
!profiles/fr/frequent_words.txt
!profiles/fr/sentences.ini
!profiles/fr/stop_words.txt
!profiles/fr/slots
!profiles/fr/slot_programs
!profiles/fr/kaldi/custom_words.txt
!profiles/fr/kaldi/espeak_phonemes.txt
!profiles/fr/kaldi/phoneme_examples.txt
!profiles/en/profile.json
!profiles/en/custom_words.txt
!profiles/en/espeak_phonemes.txt
!profiles/en/phoneme_examples.txt
!profiles/en/frequent_words.txt
!profiles/en/sentences.ini
!profiles/en/stop_words.txt
!profiles/en/slots
!profiles/en/slot_programs
!profiles/en/kaldi/custom_words.txt
!profiles/en/kaldi/espeak_phonemes.txt
!profiles/en/kaldi/phoneme_examples.txt
!rhasspy/profile_schema.json
!rhasspy/*.py
!rhasspy/train/*.py
!rhasspy/train/jsgf2fst/*.py
!*.py
!VERSION
!pip
+90
@@ -0,0 +1,90 @@
## [2.4.19] - 2020 Mar 04
### Added
- Support for Google Cloud speech to text
- Rasa NLU minimum confidence parameter
### Changed
- Using tagged version of porcupine wake models to avoid incompatibilities
- Fix Rasa NLU first entity only bug
- Fix siteId null bug
## [2.4.18] - 2020 Feb 07
### Added
- /api/listen-for-wake accepts "on" and "off" as POST data to enable/disable wake word
- /api/events/wake websocket endpoint reports wake up events
- /api/events/text websocket endpoint reports transcription events
- Rhasspy logo changes in web UI when wake word is detected
- espeak arguments list for text to speech
### Changed
- STT output casing is fixed outside of HTTP API calls
- All voice commands show up in web UI test page
- Play last voice command button in web UI works for any command
- Fixed commas in numbers with thousand separators
- Words from Pocketsphinx wake keyphrase are added to dictionary
- Pocketsphinx wake word keyphrase casing is fixed
## [2.4.17] - 2020 Jan 21
### Added
- Button to web UI to play last recorded voice command
- RHASSPY_LOG_LEVEL environment variable
- Web UI feedback during download
- Add "asoundrc" config option to Hass.IO add-on
### Changed
- Moved $profile/kaldi/custom_words.txt to $profile/kaldi_custom_words.txt
- Slot substitution casing is kept during training/recognition
- Fixed fuzzywuzzy and other intent recognizer training after addition of converters
- Fix thread max count issue
- Hide web UI alerts after 10 seconds
- Delete partially downloaded profile files
- Force slot programs to run each training cycle
- Fix _raw_text in Hass event being same as _text
### Removed
- Flair intent recognizer
## [2.4.16] - 2020 Jan 5
### Added
- Number ranges (0..100)
- Converters for transforming JSON values in intents (!int)
- Slot programs for generating slot values
- $rhasspy/days and $rhasspy/months built-in slots
## [2.4.15] - 2019 Dec 27
### Added
- Preliminary support for Raspberry Pi Zero (no Kaldi)
- Play error sound when intent not recognized
- _text and _raw_text to Home Assistant events
### Changed
- Disable wake word when TTS is speaking
- Use json5 library to parse profile
- Remove picotts pop sound
- Don't open/close microphone after wake-up
## [2.4.14] - 2019 Dec 19
### Added
- Ability to split sentences across multiple .ini files in intents directory
- Support (future) /api/intent for Home Assistant
- Support for Home Assistant TTS system
- Emulate MaryTTS /process API in web API
- Include wakeId/siteId in JSON intent (MQTT/Websocket)
- ?voice and ?language query parameters to /api/text-to-speech
+3 -1
@@ -5,7 +5,9 @@ SHELL := bash
# Docker
# -----------------------------------------------------------------------------
docker: web-dist docker-amd64 docker-armhf docker-aarch64 docker-push manifest
docker: web-dist docker-amd64 docker-armhf docker-aarch64
docker-deploy: docker-push manifest
docker-amd64:
docker build . -f docker/templates/dockerfiles/Dockerfile.prebuilt.alsa.all \
+2 -2
@@ -1,6 +1,6 @@
![Rhasspy logo](docs/img/rhasspy.svg)
Rhasspy (pronounced RAH-SPEE) is an offline, [multilingual](#supported-languages) voice assistant toolkit inspired by [Jasper](https://jasperproject.github.io/) that works well with [Home Assistant](https://www.home-assistant.io/), [Hass.io](https://www.home-assistant.io/hassio/), and [Node-RED](https://nodered.org).
Rhasspy (pronounced RAH-SPEE) is an offline voice assistant toolkit inspired by [Jasper](https://jasperproject.github.io/) that [supports many languages](#supported-languages). It works well with [Home Assistant](https://www.home-assistant.io/), [Hass.io](https://www.home-assistant.io/hassio/), and [Node-RED](https://nodered.org).
* [Documentation](https://rhasspy.readthedocs.io/)
* [Discussion](https://community.rhasspy.org)
@@ -58,7 +58,7 @@ The table below summarizes language support across the various supporting techno
| | [rasaNLU](https://rhasspy.readthedocs.io/en/latest/intent-recognition/#rasanlu) | *needs extra software* | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; |
| **Text to Speech** | [espeak](https://rhasspy.readthedocs.io/en/latest/text-to-speech/#espeak) | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; |
| | [flite](https://rhasspy.readthedocs.io/en/latest/text-to-speech/#flite) | &#x2713; | &#x2713; | | | | | | | | &#x2713; | | | | | |
| | [picotts](https://rhasspy.readthedocs.io/en/latest/text-to-speech/#picotts) | &#x2713; | &#x2713; | | | | | | | | | | | | | |
| | [picotts](https://rhasspy.readthedocs.io/en/latest/text-to-speech/#picotts) | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | | | | | | | | | |
| | [marytts](https://rhasspy.readthedocs.io/en/latest/text-to-speech/#marytts) | &#x2713; | &#x2713; | &#x2713; | | &#x2713; | &#x2713; | | &#x2713; | | | | | | | |
| | [wavenet](https://rhasspy.readthedocs.io/en/latest/text-to-speech/#google-wavenet) | | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | | &#x2713; | &#x2713; | | &#x2713; | &#x2713; | |
+1 -1
@@ -1 +1 @@
2.4.16
2.4.19
+190 -60
@@ -7,9 +7,11 @@ import json
import logging
import os
import re
import shutil
import time
from functools import wraps
from pathlib import Path
from typing import Any, Dict, List, Tuple, Union
from typing import Any, Dict, List, Optional, Set, Tuple, Union
from uuid import uuid4
import attr
@@ -29,7 +31,13 @@ from swagger_ui import quart_api_doc
from rhasspy.actor import ActorSystem, ConfigureEvent, RhasspyActor
from rhasspy.core import RhasspyCore
from rhasspy.events import IntentRecognized, ProfileTrainingFailed
from rhasspy.events import (
IntentRecognized,
ProfileTrainingFailed,
VoiceCommand,
WakeWordDetected,
WavTranscription,
)
from rhasspy.utils import (
FunctionLoggingHandler,
buffer_to_wav,
@@ -53,6 +61,10 @@ app = Quart("rhasspy")
app.secret_key = str(uuid4())
app = cors(app)
# WAV data from last voice command
last_voice_wav: Optional[bytes] = None
# -----------------------------------------------------------------------------
# Parse Arguments
# -----------------------------------------------------------------------------
@@ -91,8 +103,12 @@ parser.add_argument("--log-level", default="DEBUG", help="Set logging level")
args = parser.parse_args()
# Set log level
log_level = getattr(logging, args.log_level.upper())
logging.basicConfig(level=log_level)
if "RHASSPY_LOG_LEVEL" in os.environ:
log_level = os.environ["RHASSPY_LOG_LEVEL"]
else:
log_level = args.log_level
logging.basicConfig(level=getattr(logging, log_level.upper()))
logger.debug(args)
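The hunk above lets the `RHASSPY_LOG_LEVEL` environment variable take precedence over `--log-level`. A minimal sketch of that precedence follows; the helper name `resolve_log_level` is hypothetical and not part of Rhasspy, and the environment is passed in as a plain mapping so the logic can be tried without touching `os.environ`.

```python
import logging
from typing import Mapping


def resolve_log_level(environ: Mapping[str, str], cli_level: str = "DEBUG") -> int:
    """Return the numeric logging level, preferring the environment variable.

    Hypothetical helper for illustration only.
    """
    # The environment variable wins when it is set; otherwise use the CLI value
    level_name = environ.get("RHASSPY_LOG_LEVEL", cli_level)

    # Level names are attributes of the logging module (logging.INFO, ...)
    return getattr(logging, level_name.upper())
```

As in the diff, the level name is upper-cased before the lookup, so `info` and `INFO` are treated the same, and an unknown name raises `AttributeError`.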
@@ -206,6 +222,14 @@ async def api_download_profile() -> str:
return "OK"
@app.route("/api/download-status", methods=["GET"])
async def api_download_status() -> str:
"""Get status of profile download"""
assert core is not None
return "\n".join(core.download_status)
# -----------------------------------------------------------------------------
@@ -256,8 +280,11 @@ async def api_speakers() -> Response:
async def api_listen_for_wake() -> str:
"""Make Rhasspy listen for a wake word"""
assert core is not None
core.listen_for_wake()
return "OK"
enabled_str = (await request.data).decode().strip().lower()
enabled = enabled_str not in ["false", "off"]
core.listen_for_wake(enabled)
return str(enabled)
# -----------------------------------------------------------------------------
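The new `/api/listen-for-wake` body handling can be read as a small parsing rule: anything other than `false` or `off` enables the wake word. The sketch below isolates that rule in a hypothetical function, `parse_wake_enabled`, which does not exist in Rhasspy under that name.

```python
def parse_wake_enabled(body: bytes) -> bool:
    """Interpret the POST body of /api/listen-for-wake (illustrative helper)."""
    # Same normalization as the handler in the diff: decode, trim, lower-case
    text = body.decode().strip().lower()

    # Only an explicit "false" or "off" disables the wake word
    return text not in ["false", "off"]
```

One consequence of this rule is that an empty body still counts as enabled, so a client that POSTs nothing keeps the earlier behavior of turning wake word listening on.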
@@ -278,6 +305,10 @@ async def api_listen_for_command() -> Response:
entity = request.args.get("entity")
value = request.args.get("value")
# Emulate wake
wake_json = json.dumps({"wakewordId": "default", "siteId": core.siteId})
await add_ws_event("wake", wake_json)
return jsonify(
await core.listen_for_command(
handle=(not no_hass), timeout=timeout, entity=entity, value=value
@@ -369,7 +400,7 @@ async def api_pronounce() -> Union[Response, str]:
if download:
# Return WAV
return Response(wav_data) # , mimetype="audio/wav")
return Response(wav_data, mimetype="audio/wav")
# Play through speakers
core.play_wav_data(wav_data)
@@ -524,6 +555,26 @@ async def api_custom_words():
assert core is not None
speech_system = core.profile.get("speech_to_text.system", "pocketsphinx")
# Temporary fix for kaldi/custom_words -> kaldi_custom_words.txt
old_kaldi_words_path = Path(core.profile.read_path("kaldi/custom_words.txt"))
if old_kaldi_words_path.is_file():
new_kaldi_words_path = Path(
core.profile.write_path(
core.profile.get(
"speech_to_text.kaldi.custom_words", "custom_words.txt"
)
)
)
if (
new_kaldi_words_path != old_kaldi_words_path
and not new_kaldi_words_path.is_file()
):
logger.warning(
"Moving %s to %s", str(old_kaldi_words_path), str(new_kaldi_words_path)
)
shutil.move(old_kaldi_words_path, new_kaldi_words_path)
if request.method == "POST":
custom_words_path = Path(
core.profile.write_path(
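The temporary fix in this hunk moves a legacy `kaldi/custom_words.txt` to the new location only when the old file exists and the new one does not. A self-contained sketch of that guard, using a hypothetical helper `migrate_file` and throwaway paths in a temporary directory (both are assumptions for illustration, as is the `mkdir` call for the target's parent directory):

```python
import shutil
import tempfile
from pathlib import Path


def migrate_file(old_path: Path, new_path: Path) -> bool:
    """Move old_path to new_path unless the new file already exists.

    Returns True when a move happened. Hypothetical helper, not Rhasspy API.
    """
    if not old_path.is_file():
        return False  # nothing to migrate

    if new_path == old_path or new_path.is_file():
        return False  # never overwrite an existing new-style file

    # Assumption: create the target directory if it is missing
    new_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(old_path), str(new_path))
    return True


with tempfile.TemporaryDirectory() as temp_dir:
    root = Path(temp_dir)
    old = root / "kaldi" / "custom_words.txt"
    old.parent.mkdir()
    old.write_text("rhasspy R AE S P IY\n")

    new = root / "kaldi_custom_words.txt"
    moved = migrate_file(old, new)        # first call performs the move
    moved_again = migrate_file(old, new)  # second call is a no-op
    migrated_text = new.read_text()
```

Because the check is repeated on every request in the diff, making the operation a no-op after the first move keeps it safe to run many times.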
@@ -618,6 +669,7 @@ async def api_restart() -> str:
@app.route("/api/speech-to-text", methods=["POST"])
async def api_speech_to_text() -> str:
"""Transcribe speech from WAV file."""
global last_voice_wav
no_header = request.args.get("noheader", "false").lower() == "true"
assert core is not None
@@ -627,10 +679,20 @@ async def api_speech_to_text() -> str:
# Wrap in WAV
wav_data = buffer_to_wav(wav_data)
last_voice_wav = wav_data
start_time = time.perf_counter()
result = await core.transcribe_wav(wav_data)
end_time = time.perf_counter()
# Send to websocket
await add_ws_event(
"transcription",
json.dumps(
{"text": result.text, "wakewordId": "default", "siteId": core.siteId}
),
)
if prefers_json():
return jsonify(
{
@@ -665,7 +727,7 @@ async def api_text_to_intent():
intent_json = json.dumps(intent)
logger.debug(intent_json)
await add_ws_event(WS_EVENT_INTENT, intent_json)
await add_ws_event("intent", intent_json)
if not no_hass:
# Send intent to Home Assistant
@@ -680,11 +742,13 @@ async def api_text_to_intent():
@app.route("/api/speech-to-intent", methods=["POST"])
async def api_speech_to_intent() -> Response:
"""Transcribe speech, recognize intent, and optionally handle."""
global last_voice_wav
assert core is not None
no_hass = request.args.get("nohass", "false").lower() == "true"
# Prefer 16-bit 16Khz mono, but will convert with sox if needed
wav_data = await request.data
last_voice_wav = wav_data
# speech -> text
start_time = time.time()
@@ -692,6 +756,12 @@ async def api_speech_to_intent() -> Response:
text = transcription.text
logger.debug(text)
# Send to websocket
await add_ws_event(
"transcription",
json.dumps({"text": text, "wakewordId": "default", "siteId": core.siteId}),
)
# text -> intent
intent = (await core.recognize_intent(text)).intent
intent["speech_confidence"] = transcription.confidence
@@ -701,7 +771,7 @@ async def api_speech_to_intent() -> Response:
intent_json = json.dumps(intent)
logger.debug(intent_json)
await add_ws_event(WS_EVENT_INTENT, intent_json)
await add_ws_event("intent", intent_json)
if not no_hass:
# Send intent to Home Assistant
@@ -726,6 +796,7 @@ async def api_start_recording() -> str:
@app.route("/api/stop-recording", methods=["POST"])
async def api_stop_recording() -> Response:
"""End recording voice command. Transcribe and handle."""
global last_voice_wav
assert core is not None
no_hass = request.args.get("nohass", "false").lower() == "true"
@@ -739,20 +810,43 @@ async def api_stop_recording() -> Response:
text = transcription.text
logger.debug(text)
# Send to websocket
await add_ws_event(
"transcription",
json.dumps({"text": text, "wakewordId": "default", "siteId": core.siteId}),
)
intent = (await core.recognize_intent(text)).intent
intent["speech_confidence"] = transcription.confidence
intent_json = json.dumps(intent)
logger.debug(intent_json)
await add_ws_event(WS_EVENT_INTENT, intent_json)
await add_ws_event("intent", intent_json)
if not no_hass:
# Send intent to Home Assistant
intent = (await core.handle_intent(intent)).intent
# Save last voice command WAV data
last_voice_wav = wav_data
return jsonify(intent)
@app.route("/api/play-recording", methods=["POST"])
async def api_play_recording() -> str:
"""Play last recorded voice command through the configured audio output system"""
global last_voice_wav
assert core is not None
if last_voice_wav:
# Play through speakers
logger.debug("Playing %s byte(s)", len(last_voice_wav))
core.play_wav_data(last_voice_wav)
return "OK"
# -----------------------------------------------------------------------------
@@ -806,7 +900,7 @@ async def api_text_to_speech() -> Union[bytes, str]:
if not play:
# Return WAV data instead of speaking
return result.wav_data
return Response(result.wav_data, mimetype="audio/wav")
return sentence
@@ -823,16 +917,6 @@ async def api_slots() -> Union[str, Response]:
overwrite_all = request.args.get("overwrite_all", "false").lower() == "true"
new_slot_values = json5.loads(await request.data)
word_casing = core.profile.get(
"speech_to_text.dictionary_casing", "ignore"
).lower()
word_transform = lambda s: s
if word_casing == "lower":
word_transform = str.lower
elif word_casing == "upper":
word_transform = str.upper
slots_dir = Path(
core.profile.write_path(
core.profile.get("speech_to_text.slots_dir", "slots")
@@ -859,11 +943,10 @@ async def api_slots() -> Union[str, Response]:
slots_path.parent.mkdir(parents=True, exist_ok=True)
# Merge with existing values
values = {word_transform(v.strip()) for v in values}
values = {v.strip() for v in values}
if slots_path.is_file():
values.update(
word_transform(line.strip())
for line in slots_path.read_text().splitlines()
line.strip() for line in slots_path.read_text().splitlines()
)
# Write merged values
@@ -989,7 +1072,7 @@ def api_intents():
@app.route("/process", methods=["GET"])
async def marytts_process():
async def marytts_process() -> Response:
"""Emulate MaryTTS /process API"""
global last_sentence
@@ -1001,7 +1084,7 @@ async def marytts_process():
sentence, play=False, voice=voice, language=locale
)
return spoken.wav_data
return Response(spoken.wav_data, mimetype="audio/wav")
# -----------------------------------------------------------------------------
@@ -1073,26 +1156,26 @@ async def swagger_yaml() -> Response:
# WebSocket API
# -----------------------------------------------------------------------------
WS_EVENT_INTENT = 0
WS_EVENT_LOG = 1
ws_queues: List[List[asyncio.Queue]] = [[], []]
ws_locks: List[asyncio.Lock] = [asyncio.Lock(), asyncio.Lock()]
user_queues: Set[asyncio.Queue] = set()
logging_queues: Set[asyncio.Queue] = set()
async def add_ws_event(event_type: int, text: str):
"""Send text out to all websockets for a specific event."""
async with ws_locks[event_type]:
for q in ws_queues[event_type]:
await q.put(text)
async def add_ws_event(message_type: str, text: str):
"""Send text out to all user websockets for a specific event."""
for q in user_queues:
await q.put((message_type, text))
async def log_ws_event(text: str):
"""Send logging message out to websockets."""
for q in logging_queues:
await q.put(text)
# Send logging messages out to websocket
logging.root.addHandler(
FunctionLoggingHandler(
lambda msg: asyncio.run_coroutine_threadsafe(
add_ws_event(WS_EVENT_LOG, msg), loop
)
lambda msg: asyncio.run_coroutine_threadsafe(log_ws_event(msg), loop)
)
)
@@ -1102,6 +1185,8 @@ class WebSocketObserver(RhasspyActor):
def in_started(self, message: Any, sender: RhasspyActor) -> None:
"""Handle messages in started state."""
global last_voice_wav
if isinstance(message, IntentRecognized):
# Add slots
intent_slots = {}
@@ -1113,29 +1198,75 @@ class WebSocketObserver(RhasspyActor):
# Convert to JSON
intent_json = json.dumps(message.intent)
self._logger.debug(intent_json)
asyncio.run_coroutine_threadsafe(
add_ws_event(WS_EVENT_INTENT, intent_json), loop
asyncio.run_coroutine_threadsafe(add_ws_event("intent", intent_json), loop)
elif isinstance(message, WakeWordDetected):
assert core is not None
wake_json = json.dumps({"wakewordId": message.name, "siteId": core.siteId})
asyncio.run_coroutine_threadsafe(add_ws_event("wake", wake_json), loop)
elif isinstance(message, WavTranscription):
assert core is not None
transcription_json = json.dumps(
{
"text": message.text,
"wakewordId": message.wakewordId,
"siteId": core.siteId,
}
)
asyncio.run_coroutine_threadsafe(
add_ws_event("transcription", transcription_json), loop
)
elif isinstance(message, VoiceCommand):
# Save last voice command
last_voice_wav = buffer_to_wav(message.data)
def api_websocket(func):
"""Wraps a websocket route to use a user websocket queue"""
@wraps(func)
async def wrapper(*_args, **kwargs):
global user_queues
queue = asyncio.Queue()
user_queues.add(queue)
try:
return await func(queue, *_args, **kwargs)
except Exception:
logger.exception("api_websocket")
finally:
user_queues.discard(queue)
return wrapper
@app.websocket("/api/events/intent")
async def api_events_intent() -> None:
@api_websocket
async def api_events_intent(queue) -> None:
"""Websocket endpoint to receive intents as JSON."""
# Add new queue for websocket
q: asyncio.Queue = asyncio.Queue()
async with ws_locks[WS_EVENT_INTENT]:
ws_queues[WS_EVENT_INTENT].append(q)
try:
while True:
text = await q.get()
while True:
message_type, text = await queue.get()
if message_type == "intent":
await websocket.send(text)
except Exception:
logger.exception("api_events_intent")
# Remove queue
async with ws_locks[WS_EVENT_INTENT]:
ws_queues[WS_EVENT_INTENT].remove(q)
@app.websocket("/api/events/text")
@api_websocket
async def api_events_text(queue) -> None:
"""Websocket endpoint for transcriptions."""
while True:
message_type, text = await queue.get()
if message_type == "transcription":
await websocket.send(text)
@app.websocket("/api/events/wake")
@api_websocket
async def api_events_wake(queue) -> None:
"""Websocket endpoint to report wake up."""
while True:
message_type, text = await queue.get()
if message_type == "wake":
await websocket.send(text)
@app.websocket("/api/events/log")
@@ -1143,8 +1274,7 @@ async def api_events_log() -> None:
"""Websocket endpoint to receive logging messages as text."""
# Add new queue for websocket
q: asyncio.Queue = asyncio.Queue()
async with ws_locks[WS_EVENT_LOG]:
ws_queues[WS_EVENT_LOG].append(q)
logging_queues.add(q)
try:
while True:
@@ -1152,12 +1282,9 @@ async def api_events_log() -> None:
await websocket.send(text)
except concurrent.futures.CancelledError:
pass
except Exception:
logger.exception("api_events_log")
# Remove queue
async with ws_locks[WS_EVENT_LOG]:
ws_queues[WS_EVENT_LOG].remove(q)
logging_queues.discard(q)
# -----------------------------------------------------------------------------
@@ -1193,6 +1320,9 @@ loop.run_until_complete(start_rhasspy())
# -----------------------------------------------------------------------------
# Disable useless logging messages
logging.getLogger("wsproto").setLevel(logging.CRITICAL)
# Start web server
if args.ssl is not None:
logger.debug("Using SSL with certfile, keyfile = %s", args.ssl)
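The websocket changes in this file replace per-event queue lists with a single set of user queues, where every message is a `(message_type, text)` pair and each endpoint forwards only the type it cares about. The sketch below reproduces that fan-out and filter pattern with `asyncio` alone. The `collect` and `demo` functions and the sample payloads are assumptions made for illustration; no Quart websocket is involved.

```python
import asyncio
from typing import List, Set

# Stand-in for the module-level set of per-connection queues in the diff
user_queues: Set[asyncio.Queue] = set()


async def add_ws_event(message_type: str, text: str) -> None:
    """Put a (type, text) pair on every registered queue."""
    for q in user_queues:
        await q.put((message_type, text))


async def collect(queue: asyncio.Queue, wanted: str, count: int) -> List[str]:
    """Read from the queue until `count` messages of type `wanted` are seen."""
    seen: List[str] = []
    while len(seen) < count:
        message_type, text = await queue.get()
        if message_type == wanted:
            seen.append(text)  # an endpoint would call websocket.send(text) here
    return seen


async def demo() -> List[str]:
    queue: asyncio.Queue = asyncio.Queue()
    user_queues.add(queue)
    try:
        # Both events reach the queue; only the "intent" one is kept
        await add_ws_event("wake", '{"wakewordId": "default"}')
        await add_ws_event("intent", '{"intent": {"name": "Hello"}}')
        return await collect(queue, "intent", 1)
    finally:
        # Mirrors the cleanup done by the api_websocket decorator
        user_queues.discard(queue)


result = asyncio.run(demo())
```

Since every queue receives every event, the string passed to `add_ws_event` must match the string an endpoint compares against, otherwise the message is silently dropped by the filter.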
+10 -14
@@ -18,22 +18,18 @@ def main():
profile = json.load(profile_file)
locale_name = profile["locale"] + ".UTF-8"
locale.setlocale(locale.LC_ALL, locale_name)
slots_dir = profile_dir / "slots" / "rhasspy"
slots_dir.mkdir(parents=True, exist_ok=True)
# Day names
with open(slots_dir / "days", "w") as days_file:
for day_num in range(7):
print(calendar.day_name[day_num], file=days_file)
# Month names
with open(slots_dir / "months", "w") as month_file:
for month_num in range(1, 13):
print(calendar.month_name[month_num], file=month_file)
print(locale_name)
slots_dir = profile_dir / "slots" / "rhasspy"
slots_dir.mkdir(parents=True, exist_ok=True)
# Day names
(slots_dir / "days").write_text('\n'.join(calendar.day_name))
# Month names
(slots_dir / "months").write_text('\n'.join(filter(None, calendar.month_name)))
# -----------------------------------------------------------------------------
if __name__ == "__main__":
+28
@@ -0,0 +1,28 @@
#!/usr/bin/env python
import sys
import json
import random
import datetime
def speech(text):
    global o
    o["speech"] = {"text": text}
# get json from stdin and load into python dict
o = json.loads(sys.stdin.read())
intent = o["intent"]["name"]
if intent == "GetTime":
    now = datetime.datetime.now()
    speech("It's %s %d %s." % (now.strftime('%H'), now.minute, now.strftime('%p')))
elif intent == "Hello":
    replies = ['Hi!', 'Hello!', 'Hey there!', 'Greetings.']
    speech(random.choice(replies))
# convert dict to json and print to stdout
print(json.dumps(o))
+14 -5
@@ -329,31 +329,40 @@ case "${CPU_ARCH}" in
esac
requirements_file="${temp_dir}/requirements.txt"
temp_requirements_file="${temp_dir}/temp_requirements.txt"
cp "${this_dir}/requirements.txt" "${requirements_file}"
# Exclude requirements
if [[ -n "${no_flair}" ]]; then
echo "Excluding flair from virtual environment"
sed -i '/^flair/d' "${requirements_file}"
sed '/^flair/d' "${requirements_file}" > "${temp_requirements_file}" &&
mv "${temp_requirements_file}" "${requirements_file}"
fi
if [[ -n "${no_precise}" ]]; then
echo "Excluding Mycroft Precise from virtual environment"
sed -i '/^precise-runner/d' "${requirements_file}"
sed '/^precise-runner/d' "${requirements_file}" > "${temp_requirements_file}" &&
mv "${temp_requirements_file}" "${requirements_file}"
fi
if [[ -n "${no_adapt}" ]]; then
echo "Excluding Mycroft Adapt from virtual environment"
sed -i '/^adapt-parser/d' "${requirements_file}"
sed '/^adapt-parser/d' "${requirements_file}" > "${temp_requirements_file}" &&
mv "${temp_requirements_file}" "${requirements_file}"
fi
if [[ -n "${no_google}" ]]; then
echo "Excluding Google Text to Speech from virtual environment"
sed -i '/^google-cloud-texttospeech/d' "${requirements_file}"
sed '/^google-cloud-texttospeech/d' "${requirements_file}" > "${temp_requirements_file}" &&
mv "${temp_requirements_file}" "${requirements_file}"
fi
# Install everything except openfst first
sed -i '/^openfst/d' "${requirements_file}"
sed '/^openfst/d' "${requirements_file}" > "${temp_requirements_file}" &&
mv "${temp_requirements_file}" "${requirements_file}"
"${python}" -m pip install -r "${requirements_file}"
+1 -132
@@ -1,132 +1 @@
COPY profiles/zh/profile.json \
profiles/zh/custom_words.txt \
profiles/zh/espeak_phonemes.txt \
profiles/zh/phoneme_examples.txt \
profiles/zh/frequent_words.txt \
profiles/zh/sentences.ini \
profiles/zh/stop_words.txt ${RHASSPY_APP}/profiles/zh/
COPY profiles/hi/ \
profiles/hi/profile.json \
profiles/hi/custom_words.txt \
profiles/hi/espeak_phonemes.txt \
profiles/hi/phoneme_examples.txt \
profiles/hi/frequent_words.txt \
profiles/hi/sentences.ini \
profiles/hi/stop_words.txt ${RHASSPY_APP}/profiles/hi/
COPY profiles/el/profile.json \
profiles/el/custom_words.txt \
profiles/el/espeak_phonemes.txt \
profiles/el/phoneme_examples.txt \
profiles/el/frequent_words.txt \
profiles/el/sentences.ini \
profiles/el/stop_words.txt ${RHASSPY_APP}/profiles/el/
COPY profiles/de/profile.json \
profiles/de/custom_words.txt \
profiles/de/espeak_phonemes.txt \
profiles/de/phoneme_examples.txt \
profiles/de/frequent_words.txt \
profiles/de/sentences.ini \
profiles/de/stop_words.txt ${RHASSPY_APP}/profiles/de/
COPY profiles/de/kaldi/custom_words.txt \
profiles/de/kaldi/espeak_phonemes.txt \
profiles/de/kaldi/phoneme_examples.txt \
${RHASSPY_APP}/profiles/de/kaldi/
COPY profiles/it/profile.json \
profiles/it/custom_words.txt \
profiles/it/espeak_phonemes.txt \
profiles/it/phoneme_examples.txt \
profiles/it/frequent_words.txt \
profiles/it/sentences.ini \
profiles/it/stop_words.txt ${RHASSPY_APP}/profiles/it/
COPY profiles/es/profile.json \
profiles/es/custom_words.txt \
profiles/es/espeak_phonemes.txt \
profiles/es/phoneme_examples.txt \
profiles/es/frequent_words.txt \
profiles/es/sentences.ini \
profiles/es/stop_words.txt ${RHASSPY_APP}/profiles/es/
COPY profiles/fr/profile.json \
profiles/fr/custom_words.txt \
profiles/fr/espeak_phonemes.txt \
profiles/fr/phoneme_examples.txt \
profiles/fr/frequent_words.txt \
profiles/fr/sentences.ini \
profiles/fr/stop_words.txt ${RHASSPY_APP}/profiles/fr/
COPY profiles/fr/kaldi/custom_words.txt \
profiles/fr/kaldi/espeak_phonemes.txt \
profiles/fr/kaldi/phoneme_examples.txt \
${RHASSPY_APP}/profiles/fr/kaldi/
COPY profiles/ru/profile.json \
profiles/ru/custom_words.txt \
profiles/ru/espeak_phonemes.txt \
profiles/ru/phoneme_examples.txt \
profiles/ru/frequent_words.txt \
profiles/ru/sentences.ini \
profiles/ru/stop_words.txt ${RHASSPY_APP}/profiles/ru/
COPY profiles/nl/profile.json \
profiles/nl/custom_words.txt \
profiles/nl/espeak_phonemes.txt \
profiles/nl/phoneme_examples.txt \
profiles/nl/frequent_words.txt \
profiles/nl/sentences.ini \
profiles/nl/stop_words.txt ${RHASSPY_APP}/profiles/nl/
COPY profiles/nl/kaldi/custom_words.txt \
profiles/nl/kaldi/espeak_phonemes.txt \
profiles/nl/kaldi/phoneme_examples.txt \
${RHASSPY_APP}/profiles/nl/kaldi/
COPY profiles/vi/profile.json \
profiles/vi/custom_words.txt \
profiles/vi/espeak_phonemes.txt \
profiles/vi/phoneme_examples.txt \
profiles/vi/frequent_words.txt \
profiles/vi/sentences.ini \
profiles/vi/stop_words.txt ${RHASSPY_APP}/profiles/vi/
COPY profiles/pt/profile.json \
profiles/pt/custom_words.txt \
profiles/pt/espeak_phonemes.txt \
profiles/pt/phoneme_examples.txt \
profiles/pt/frequent_words.txt \
profiles/pt/sentences.ini \
profiles/pt/stop_words.txt ${RHASSPY_APP}/profiles/pt/
COPY profiles/sv/profile.json \
profiles/sv/custom_words.txt \
profiles/sv/espeak_phonemes.txt \
profiles/sv/phoneme_examples.txt \
profiles/sv/frequent_words.txt \
profiles/sv/sentences.ini \
profiles/sv/stop_words.txt ${RHASSPY_APP}/profiles/sv/
COPY profiles/ca/profile.json \
profiles/ca/custom_words.txt \
profiles/ca/espeak_phonemes.txt \
profiles/ca/phoneme_examples.txt \
profiles/ca/frequent_words.txt \
profiles/ca/sentences.ini \
profiles/ca/stop_words.txt ${RHASSPY_APP}/profiles/ca/
COPY profiles/en/profile.json \
profiles/en/custom_words.txt \
profiles/en/espeak_phonemes.txt \
profiles/en/phoneme_examples.txt \
profiles/en/frequent_words.txt \
profiles/en/sentences.ini \
profiles/en/stop_words.txt ${RHASSPY_APP}/profiles/en/
COPY profiles/en/kaldi/custom_words.txt \
profiles/en/kaldi/espeak_phonemes.txt \
profiles/en/kaldi/phoneme_examples.txt \
${RHASSPY_APP}/profiles/en/kaldi/
COPY profiles/ ${RHASSPY_APP}/profiles/
@@ -72,138 +72,7 @@ RUN chmod +x /run.sh
COPY profiles/zh/profile.json \
profiles/zh/custom_words.txt \
profiles/zh/espeak_phonemes.txt \
profiles/zh/phoneme_examples.txt \
profiles/zh/frequent_words.txt \
profiles/zh/sentences.ini \
profiles/zh/stop_words.txt ${RHASSPY_APP}/profiles/zh/
COPY profiles/hi/ \
profiles/hi/profile.json \
profiles/hi/custom_words.txt \
profiles/hi/espeak_phonemes.txt \
profiles/hi/phoneme_examples.txt \
profiles/hi/frequent_words.txt \
profiles/hi/sentences.ini \
profiles/hi/stop_words.txt ${RHASSPY_APP}/profiles/hi/
COPY profiles/el/profile.json \
profiles/el/custom_words.txt \
profiles/el/espeak_phonemes.txt \
profiles/el/phoneme_examples.txt \
profiles/el/frequent_words.txt \
profiles/el/sentences.ini \
profiles/el/stop_words.txt ${RHASSPY_APP}/profiles/el/
COPY profiles/de/profile.json \
profiles/de/custom_words.txt \
profiles/de/espeak_phonemes.txt \
profiles/de/phoneme_examples.txt \
profiles/de/frequent_words.txt \
profiles/de/sentences.ini \
profiles/de/stop_words.txt ${RHASSPY_APP}/profiles/de/
COPY profiles/de/kaldi/custom_words.txt \
profiles/de/kaldi/espeak_phonemes.txt \
profiles/de/kaldi/phoneme_examples.txt \
${RHASSPY_APP}/profiles/de/kaldi/
COPY profiles/it/profile.json \
profiles/it/custom_words.txt \
profiles/it/espeak_phonemes.txt \
profiles/it/phoneme_examples.txt \
profiles/it/frequent_words.txt \
profiles/it/sentences.ini \
profiles/it/stop_words.txt ${RHASSPY_APP}/profiles/it/
COPY profiles/es/profile.json \
profiles/es/custom_words.txt \
profiles/es/espeak_phonemes.txt \
profiles/es/phoneme_examples.txt \
profiles/es/frequent_words.txt \
profiles/es/sentences.ini \
profiles/es/stop_words.txt ${RHASSPY_APP}/profiles/es/
COPY profiles/fr/profile.json \
profiles/fr/custom_words.txt \
profiles/fr/espeak_phonemes.txt \
profiles/fr/phoneme_examples.txt \
profiles/fr/frequent_words.txt \
profiles/fr/sentences.ini \
profiles/fr/stop_words.txt ${RHASSPY_APP}/profiles/fr/
COPY profiles/fr/kaldi/custom_words.txt \
profiles/fr/kaldi/espeak_phonemes.txt \
profiles/fr/kaldi/phoneme_examples.txt \
${RHASSPY_APP}/profiles/fr/kaldi/
COPY profiles/ru/profile.json \
profiles/ru/custom_words.txt \
profiles/ru/espeak_phonemes.txt \
profiles/ru/phoneme_examples.txt \
profiles/ru/frequent_words.txt \
profiles/ru/sentences.ini \
profiles/ru/stop_words.txt ${RHASSPY_APP}/profiles/ru/
COPY profiles/nl/profile.json \
profiles/nl/custom_words.txt \
profiles/nl/espeak_phonemes.txt \
profiles/nl/phoneme_examples.txt \
profiles/nl/frequent_words.txt \
profiles/nl/sentences.ini \
profiles/nl/stop_words.txt ${RHASSPY_APP}/profiles/nl/
COPY profiles/nl/kaldi/custom_words.txt \
profiles/nl/kaldi/espeak_phonemes.txt \
profiles/nl/kaldi/phoneme_examples.txt \
${RHASSPY_APP}/profiles/nl/kaldi/
COPY profiles/vi/profile.json \
profiles/vi/custom_words.txt \
profiles/vi/espeak_phonemes.txt \
profiles/vi/phoneme_examples.txt \
profiles/vi/frequent_words.txt \
profiles/vi/sentences.ini \
profiles/vi/stop_words.txt ${RHASSPY_APP}/profiles/vi/
COPY profiles/pt/profile.json \
profiles/pt/custom_words.txt \
profiles/pt/espeak_phonemes.txt \
profiles/pt/phoneme_examples.txt \
profiles/pt/frequent_words.txt \
profiles/pt/sentences.ini \
profiles/pt/stop_words.txt ${RHASSPY_APP}/profiles/pt/
COPY profiles/sv/profile.json \
profiles/sv/custom_words.txt \
profiles/sv/espeak_phonemes.txt \
profiles/sv/phoneme_examples.txt \
profiles/sv/frequent_words.txt \
profiles/sv/sentences.ini \
profiles/sv/stop_words.txt ${RHASSPY_APP}/profiles/sv/
COPY profiles/ca/profile.json \
profiles/ca/custom_words.txt \
profiles/ca/espeak_phonemes.txt \
profiles/ca/phoneme_examples.txt \
profiles/ca/frequent_words.txt \
profiles/ca/sentences.ini \
profiles/ca/stop_words.txt ${RHASSPY_APP}/profiles/ca/
COPY profiles/en/profile.json \
profiles/en/custom_words.txt \
profiles/en/espeak_phonemes.txt \
profiles/en/phoneme_examples.txt \
profiles/en/frequent_words.txt \
profiles/en/sentences.ini \
profiles/en/stop_words.txt ${RHASSPY_APP}/profiles/en/
COPY profiles/en/kaldi/custom_words.txt \
profiles/en/kaldi/espeak_phonemes.txt \
profiles/en/kaldi/phoneme_examples.txt \
${RHASSPY_APP}/profiles/en/kaldi/
COPY profiles/ ${RHASSPY_APP}/profiles/
COPY profiles/defaults.json ${RHASSPY_APP}/profiles/
COPY docker/rhasspy ${RHASSPY_APP}/bin/
+4 -1
@@ -4,6 +4,9 @@ Rhasspy is designed to be run on different kinds of hardware, such as:
* Raspberry Pi 2-3 B/B+ (`armhf`/`aarch64`)
* Desktop/laptop/server (`amd64`)
* Raspberry Pi Zero (`armv6l`)
* You must use a [virtual environment](installation.md#virtual-environment)
* The [Kaldi speech recognizer](speech-to-text.md#kaldi) is **not** supported
The table below summarizes architecture compatibility with Rhasspy's components:
@@ -30,7 +33,7 @@ The table below summarizes architecture compatibility with Rhasspy's components:
To run Rhasspy on a Raspberry Pi, you'll need at least a 4 GB SD card and a good power supply. I highly recommend the [CanaKit Starter Kit](https://www.amazon.com/CanaKit-Raspberry-Starter-Premium-Black/dp/B07BCC8PK7), which includes a 32 GB SD card, a 2.5 A power supply, and a case.
Some components of Rhasspy will not work on the Raspberry Pi 3 B+ model (`aarch64`). As of the time of this writing, these are:
Some components of Rhasspy will not work on the Raspberry Pi 3 B+ model with a 64-bit operating system (`aarch64`). As of the time of this writing, these are:
* [snowboy](wake-word.md#snowboy) (wake word)
* [Mycroft Precise](wake-word.md#mycroft-precise) (wake word)
+16 -3
@@ -54,7 +54,13 @@ To update your Rhasspy Docker image, just run:
```bash
docker pull synesthesiam/rhasspy-server:latest
```
on your Rhasspy server and restart the Docker container.
on your Rhasspy server and restart the Docker container. This may require running something like:
```bash
docker rm <container-name>
```
before doing a `docker run...`
## Hass.io
@@ -108,10 +114,17 @@ To update your Rhasspy virtual environment to the latest version, run:
git pull origin master
```
in your `rhasspy` directory. You should also re-build the web interface:
in your `rhasspy` directory, and then update your Python dependencies:
```bash
source .venv/bin/activate
pip3 install -r requirements.txt
```
You should also re-build the web interface:
1. Install [yarn](https://yarnpkg.com) on your system
2. Run `yarn build` in the `rhasspy` directory
2. Run `yarn install && yarn build` in the `rhasspy` directory
3. Restart any running instances of Rhasspy
### Running as a Service
+1 -1
@@ -207,7 +207,7 @@ The following environment variables are available to your program:
* `$RHASSPY_PROFILE` - name of the current profile (e.g., "en")
* `$RHASSPY_PROFILE_DIR` - directory of the current profile (where `profile.json` is)
See [handle.sh](https://github.com/synesthesiam/rhasspy/blob/master/bin/mock-commands/handle.sh) for an example program.
See [handle.sh](https://github.com/synesthesiam/rhasspy/blob/master/bin/mock-commands/handle.sh) or [handle.py](https://github.com/synesthesiam/rhasspy/blob/master/bin/mock-commands/handle.py) for example programs.
### Speech
+1 -1
@@ -10,8 +10,8 @@ The following table summarizes the trade-offs of using each intent recognizer:
| [fsticuffs](intent-recognition.md#fsticuffs) | 1M+ | very fast | very fast | ignores unknown words |
| [fuzzywuzzy](intent-recognition.md#fuzzywuzzy) | 12-100 | fast | fast | fuzzy string matching |
| [adapt](intent-recognition.md#mycroft-adapt) | 100-1K | moderate | fast | ignores unknown words |
| [flair](intent-recognition.md#flair) | 1K-100K | very slow | moderate | handles unseen words |
| [rasaNLU](intent-recognition.md#rasanlu) | 1K-100K | very slow | moderate | handles unseen words |
| [flair](intent-recognition.md#flair) | 1K-100K | very slow | moderate | handles unseen words |
## Fsticuffs
+2 -1
@@ -53,7 +53,8 @@ Application authors may want to use the [rhasspy-client](https://pypi.org/projec
* `?timeout=<seconds>` - override default command timeout
* `?entity=<entity>&value=<value>` - set custom entity/value in recognized intent
* `/api/listen-for-wake-word`
* POST to wake Rhasspy up and return immediately
* POST "on" to have Rhasspy listen for a wake word
* POST "off" to disable wake word
* `/api/lookup`
* POST word as plain text to look up or guess pronunciation
* `?n=<number>` - return at most `n` guessed pronunciations
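
As a minimal sketch of the `/api/listen-for-wake-word` endpoint listed above, the Python snippet below builds the POST request using only the standard library. The base URL and the `text/plain` content type are assumptions, and `wake_word_request` and `set_wake_word` are hypothetical helper names, not part of Rhasspy or `rhasspy-client`.

```python
import urllib.request

# Assumption: Rhasspy's web server is reachable here (adjust host and port for your setup).
BASE_URL = "http://localhost:12101"


def wake_word_request(enabled: bool, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request whose body is "on" or "off" (hypothetical helper)."""
    body = b"on" if enabled else b"off"
    return urllib.request.Request(
        f"{base_url}/api/listen-for-wake-word",
        data=body,
        method="POST",
        # Assumption: the server reads the body as plain text.
        headers={"Content-Type": "text/plain"},
    )


def set_wake_word(enabled: bool, base_url: str = BASE_URL) -> str:
    """Send the request and return the response body; needs a running server."""
    with urllib.request.urlopen(wake_word_request(enabled, base_url)) as response:
        return response.read().decode()
```

Calling `set_wake_word(True)` asks Rhasspy to listen for the wake word, and `set_wake_word(False)` disables it again.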
+24
@@ -8,6 +8,7 @@ The following table summarizes language support for the various speech to text s
| ------ | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- | ------- |
| [pocketsphinx](speech-to-text.md#pocketsphinx) | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | | &#x2713; | &#x2713; |
| [kaldi](speech-to-text.md#kaldi) | &#x2713; | &#x2713; | | &#x2713; | | &#x2713; | | | | | &#x2713; | | |
| [google](speech-to-text.md#google-cloud) | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; | &#x2713; |
## Pocketsphinx
@@ -77,6 +78,29 @@ Rhasspy expects a Kaldi-compatible profile to contain a `model` directory with a
If you just want to use Rhasspy for general speech to text, you can set `speech_to_text.kaldi.open_transcription` to `true` in your profile. This will use the included general language model (much slower) and ignore any custom voice commands you've specified.
## Google Cloud
Performs speech recognition using the [Google Cloud Speech-to-Text](https://cloud.google.com/speech-to-text) service.
You will need an active Google Cloud subscription and a JSON private key connected to a service account enabled to use
the speech-to-text API. The locale configured in your profile will be used for speech recognition.
```json
{
"locale": "en_US",
"speech_to_text": {
"system": "google",
"google": {
"credentials": "api-project-xxxxxxxx-abcdef.json",
"min_confidence": 0.7
}
}
}
```
Note that this module sends the recorded audio only after recording has finished, so streaming recognition is not supported.
See `rhasspy.stt.GoogleCloudDecoder` for details.
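
Before restarting Rhasspy with these settings, it can help to sanity-check the profile. The sketch below is only an illustration: `check_google_stt` is a hypothetical helper (not part of Rhasspy), and it assumes `min_confidence` is meant to be a value between 0 and 1, matching the confidence range reported by Google's service.

```python
import json

# Profile fragment copied from the example above.
PROFILE_JSON = """
{
  "locale": "en_US",
  "speech_to_text": {
    "system": "google",
    "google": {
      "credentials": "api-project-xxxxxxxx-abcdef.json",
      "min_confidence": 0.7
    }
  }
}
"""


def check_google_stt(profile: dict) -> list:
    """Return a list of problems found in the Google Cloud STT settings."""
    problems = []
    if not profile.get("locale"):
        problems.append("locale is missing")
    stt = profile.get("speech_to_text", {})
    if stt.get("system") != "google":
        problems.append("speech_to_text.system is not 'google'")
    google = stt.get("google", {})
    if not google.get("credentials"):
        problems.append("speech_to_text.google.credentials is missing")
    min_confidence = google.get("min_confidence")
    # Assumption: the threshold is compared against a confidence in [0, 1].
    if min_confidence is not None and not 0 <= min_confidence <= 1:
        problems.append("speech_to_text.google.min_confidence should be within [0, 1]")
    return problems


problems = check_google_stt(json.loads(PROFILE_JSON))
print(problems or "profile looks consistent")
```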
## Remote HTTP Server
Uses a remote HTTP server to transform speech (WAV) to text.
+16 -1
@@ -29,6 +29,19 @@ Add to your [profile](profiles.md):
Remove the `voice` option to have `espeak` use your profile's language automatically.
You may also pass additional arguments to the `espeak` command. For example,
```json
"text_to_speech": {
"system": "espeak",
"espeak": {
"arguments": ["-s", "80"]
}
}
```
will speak the sentence more slowly.
See `rhasspy.tts.EspeakSentenceSpeaker` for more details.
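
To make the effect of `arguments` concrete, here is a rough sketch of how a command line could be assembled from the profile. This is an assumption about the general shape, not the actual `EspeakSentenceSpeaker` code; `espeak_command` is a hypothetical helper. The `-v` (voice) and `-s` (speed in words per minute) flags are standard `espeak` options.

```python
import shlex


def espeak_command(sentence: str, profile: dict) -> list:
    """Build an espeak argument list from a profile dictionary (illustrative only)."""
    espeak = profile.get("text_to_speech", {}).get("espeak", {})
    command = ["espeak"]
    voice = espeak.get("voice")
    if voice:
        # Only pass a voice when the profile sets one explicitly.
        command.extend(["-v", voice])
    # Extra arguments are passed through unchanged, e.g. ["-s", "80"].
    command.extend(str(arg) for arg in espeak.get("arguments", []))
    command.append(sentence)
    return command


profile = {
    "text_to_speech": {
        "system": "espeak",
        "espeak": {"arguments": ["-s", "80"]},
    }
}
command = espeak_command("hello world", profile)
print(" ".join(shlex.quote(part) for part in command))
```

The resulting list can be handed to `subprocess.run` on a machine where `espeak` is installed.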
## Flite
@@ -52,7 +65,9 @@ See `rhasspy.tts.FliteSentenceSpeaker` for details.
## PicoTTS
Uses SVOX's [picotts](https://en.wikipedia.org/wiki/SVOX) for text to speech. Sounds a bit better (to me) than `flite` or `espeak`, but only has a single English voice.
Uses SVOX's [picotts](https://en.wikipedia.org/wiki/SVOX) for text to speech. Sounds a bit better (to me) than `flite` or `espeak`.
Included languages are `en-US`, `en-GB`, `de-DE`, `es-ES`, `fr-FR` and `it-IT`.
Add to your [profile](profiles.md):
+8 -1
@@ -247,7 +247,7 @@ Add a file in `slot_programs` with the name of your slot, e.g. `colors`. Write a
```bash
cat <<EOF > "${slot_programs}/colors"
#/usr/bin/env bash
#!/usr/bin/env bash
echo 'red'
echo 'green'
echo 'blue'
@@ -262,6 +262,13 @@ You can pass **arguments** to your program using the syntax `$name,arg1,arg2,...
Like regular slots lists, slot programs can also be put in sub-directories under `slot_programs`. A program in `slot_programs/foo/bar` should be referenced in `sentences.ini` as `$foo/bar`.
#### Built-in Slots
Rhasspy includes a few built-in slots for each language:
* `$rhasspy/days` - day names of the week
* `$rhasspy/months` - month names of the year
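
The values behind these slots come from Python's `calendar` module, using the profile's locale. A minimal sketch of the same idea (the function name `builtin_slot_values` is hypothetical, and no locale is set here, so the names follow whatever locale is active in the process):

```python
import calendar


def builtin_slot_values() -> dict:
    """Collect day and month names the way the built-in slot files are generated."""
    days = list(calendar.day_name)
    # calendar.month_name has an empty placeholder at index 0, so drop empty entries.
    months = [name for name in calendar.month_name if name]
    return {"rhasspy/days": days, "rhasspy/months": months}


slots = builtin_slot_values()
for slot_name, values in slots.items():
    print(slot_name, "->", ", ".join(values))
```

Call `locale.setlocale(locale.LC_ALL, ...)` with your profile's locale first to get localized names.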
### Converters
By default, all named entity values in a recognized intent's JSON are strings. If you need a different data type, such as an integer or float, or want to do some kind of complex *conversion*, use a converter:
+5
@@ -2,6 +2,11 @@
* [RGB Light Example](#rgb-light-example)
* [Client/Server Setup](#clientserver-setup)
* MATRIX Labs
* [Rhasspy Voice Assistant on MATRIX Voice and MATRIX Creator](https://www.hackster.io/matrix-labs/rhasspy-voice-assistant-on-matrix-voice-and-matrix-creator-97f92e)
* [Adding Intents for Rhasspy Offline Voice Assistant](https://www.hackster.io/matrix-labs/adding-intents-for-rhasspy-offline-voice-assistant-faa221)
* Rendered Obsolete
* [Home Assistant Voice Recognition with Rhasspy](https://rendered-obsolete.github.io/2020/01/02/rhasspy.html)
## RGB Light Example
+39 -1
@@ -142,7 +142,18 @@ More example flows are available [on Github](https://github.com/synesthesiam/rha
### WebSocket Events
Whenever a voice command is recognized, Rhasspy emits JSON events over a websocket connection available at `ws://rhasspy:12101/api/events/intent` (replace `ws://` with `wss://` if you're using [secure hosting](usage.md#secure-hosting-with-https)).
Rhasspy supports multiple websocket event endpoints:
* `/api/events/intent`
* Intent recognized or not
* `/api/events/wake`
* Wake word detected
* `/api/events/text`
* Speech transcription
#### WebSocket Intents
Whenever a voice command is recognized, Rhasspy emits JSON events over a websocket connection available at `ws://YOUR_SERVER:12101/api/events/intent` (replace `ws://` with `wss://` if you're using [secure hosting](usage.md#secure-hosting-with-https)).
You can listen to these events in a [Node-RED](https://nodered.org) flow, and easily add offline, private voice commands to your home automation setup!
For the `ChangeLightState` intent from the [RGB Light Example](index.md#rgb-light-example), Rhasspy will emit a JSON event like this over the websocket:
@@ -171,6 +182,33 @@ For the `ChangLightState` intent from the [RGB Light Example](index.md#rgb-light
}
```
#### WebSocket Wake
When the wake word is detected, or Rhasspy is woken up via the `/api/listen-for-command` HTTP endpoint, a JSON event is emitted at `ws://YOUR_SERVER:12101/api/events/wake` (`wss://` if using HTTPS) like:
```json
{
"wakewordId": "default",
"siteId": "default"
}
```
The `wakewordId` is set using the model or file name of your wakeword model (e.g., `porcupine` for `porcupine.ppn`). The `siteId` comes from your `mqtt.siteId` profile setting.
#### WebSocket Transcriptions
Each time a voice command is transcribed, Rhasspy emits a JSON event at `ws://YOUR_SERVER:12101/api/events/text` (`wss://` if using HTTPS) like:
```json
{
"text": "text from voice command",
"wakewordId": "default",
"siteId": "default"
}
```
The transcription is contained in the `text` property. `wakewordId` is the id of the wakeword that initiated the voice command (or `default`). The `siteId` comes from your `mqtt.siteId` profile setting.
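
Outside of Node-RED, the wake and transcription events can also be consumed from a small Python client. The sketch below is one possible approach, not an official client: `event_url`, `describe_event`, and `listen` are hypothetical helpers, and `listen` assumes the third-party `websockets` package is installed.

```python
import asyncio
import json


def event_url(host: str, event: str, secure: bool = False, port: int = 12101) -> str:
    """Build the websocket URL for an event endpoint ("wake" or "text")."""
    scheme = "wss" if secure else "ws"
    return f"{scheme}://{host}:{port}/api/events/{event}"


def describe_event(event: str, raw: str) -> str:
    """Turn a raw JSON event into a short human-readable line."""
    payload = json.loads(raw)
    site_id = payload.get("siteId", "default")
    if event == "text":
        return f"[{site_id}] heard: {payload['text']}"
    if event == "wake":
        return f"[{site_id}] woke up: {payload.get('wakewordId', 'default')}"
    raise ValueError(f"unsupported event type: {event}")


async def listen(host: str, event: str) -> None:
    """Print events as they arrive; needs a running Rhasspy server."""
    # Imported lazily so the helpers above work without the dependency.
    import websockets

    async with websockets.connect(event_url(host, event)) as socket:
        async for raw in socket:
            print(describe_event(event, raw))


def main() -> None:
    # Replace YOUR_SERVER with the host name of your Rhasspy server.
    asyncio.run(listen("YOUR_SERVER", "text"))
```

Pass `secure=True` to `event_url` when Rhasspy is hosted over HTTPS.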
## MQTT and Snips
Rhasspy is able to interoperate with Snips.AI services using the [Hermes protocol](https://docs.snips.ai/reference/hermes) over [MQTT](http://mqtt.org). The following components are Snips/Hermes compatible:
+61 -48
@@ -1,17 +1,12 @@
#
# Copyright 2018 Picovoice Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# You may not use this file except in compliance with the license. A copy of the license is located in the "LICENSE"
# file accompanying this source.
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
# an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
# specific language governing permissions and limitations under the License.
#
import os
@@ -20,7 +15,7 @@ from enum import Enum
class Porcupine(object):
"""Python binding for Picovoice's wake word detection (aka Porcupine) library."""
"""Python binding for Picovoice's wake word detection (Porcupine) engine."""
class PicovoiceStatuses(Enum):
"""Status codes corresponding to 'pv_status_t' defined in 'include/picovoice.h'"""
@@ -29,11 +24,17 @@ class Porcupine(object):
OUT_OF_MEMORY = 1
IO_ERROR = 2
INVALID_ARGUMENT = 3
STOP_ITERATION = 4
KEY_ERROR = 5
INVALID_STATE = 6
_PICOVOICE_STATUS_TO_EXCEPTION = {
PicovoiceStatuses.OUT_OF_MEMORY: MemoryError,
PicovoiceStatuses.IO_ERROR: IOError,
PicovoiceStatuses.INVALID_ARGUMENT: ValueError
PicovoiceStatuses.INVALID_ARGUMENT: ValueError,
PicovoiceStatuses.STOP_ITERATION: StopIteration,
PicovoiceStatuses.KEY_ERROR: KeyError,
PicovoiceStatuses.INVALID_STATE: ValueError,
}
class CPorcupine(Structure):
@@ -48,9 +49,9 @@ class Porcupine(object):
keyword_file_paths=None,
sensitivities=None):
"""
Loads Porcupine's shared library and creates an instance of wake word detection object.
Constructor.
:param library_path: Absolute path to Porcupine's shared library.
:param library_path: Absolute path to Porcupine's dynamic library.
:param model_file_path: Absolute path to file containing model parameters.
:param keyword_file_path: Absolute path to keyword file containing hyper-parameters. If not present then
'keyword_file_paths' will be used.
@@ -64,38 +65,38 @@ class Porcupine(object):
"""
if not os.path.exists(library_path):
raise IOError(f"Could not find Porcupine's library at '{library_path}'")
raise IOError("couldn't find Porcupine's library at '%s'" % library_path)
library = cdll.LoadLibrary(library_path)
if not os.path.exists(model_file_path):
raise IOError(f"Could not find model file at '{model_file_path}'")
raise IOError("couldn't find model file at '%s'" % model_file_path)
if sensitivity is not None and keyword_file_path is not None:
if not os.path.exists(keyword_file_path):
raise IOError(f"Could not find keyword file at '{keyword_file_path}'")
raise IOError("couldn't find keyword file at '%s'" % keyword_file_path)
keyword_file_paths = [keyword_file_path]
if not (0 <= sensitivity <= 1):
raise ValueError('Sensitivity should be within [0, 1]')
raise ValueError('sensitivity should be within [0, 1]')
sensitivities = [sensitivity]
elif sensitivities is not None and keyword_file_paths is not None:
if len(keyword_file_paths) != len(sensitivities):
raise ValueError("Different number of sensitivity and keyword file path parameters are provided.")
raise ValueError("different number of sensitivity and keyword file path parameters are provided.")
for x in keyword_file_paths:
if not os.path.exists(os.path.expanduser(x)):
raise IOError(f"Could not find keyword file at '{x}'")
raise IOError("could not find keyword file at '%s'" % x)
for x in sensitivities:
if not (0 <= x <= 1):
raise ValueError('Sensitivity should be within [0, 1]')
raise ValueError('sensitivity should be within [0, 1]')
else:
raise ValueError("Sensitivity and/or keyword file path is missing")
raise ValueError("sensitivity and/or keyword file path is missing")
self._num_keywords = len(keyword_file_paths)
init_func = library.pv_porcupine_multiple_keywords_init
init_func = library.pv_porcupine_init
init_func.argtypes = [
c_char_p,
c_int,
@@ -107,44 +108,43 @@ class Porcupine(object):
self._handle = POINTER(self.CPorcupine)()
status = init_func(
model_file_path.encode(),
model_file_path.encode('utf-8'),
self._num_keywords,
(c_char_p * self._num_keywords)(*[os.path.expanduser(x).encode() for x in keyword_file_paths]),
(c_char_p * self._num_keywords)(*[os.path.expanduser(x).encode('utf-8') for x in keyword_file_paths]),
(c_float * self._num_keywords)(*sensitivities),
byref(self._handle))
if status is not self.PicovoiceStatuses.SUCCESS:
raise self._PICOVOICE_STATUS_TO_EXCEPTION[status]('Initialization failed')
self.process_func = library.pv_porcupine_multiple_keywords_process
self.process_func.argtypes = [POINTER(self.CPorcupine), POINTER(c_short), POINTER(c_int)]
self.process_func.restype = self.PicovoiceStatuses
raise self._PICOVOICE_STATUS_TO_EXCEPTION[status]('initialization failed')
self._delete_func = library.pv_porcupine_delete
self._delete_func.argtypes = [POINTER(self.CPorcupine)]
self._delete_func.restype = None
self._sample_rate = library.pv_sample_rate()
self.process_func = library.pv_porcupine_process
self.process_func.argtypes = [POINTER(self.CPorcupine), POINTER(c_short), POINTER(c_int)]
self.process_func.restype = self.PicovoiceStatuses
version_func = library.pv_porcupine_version
version_func.argtypes = []
version_func.restype = c_char_p
self._version = version_func().decode('utf-8')
self._frame_length = library.pv_porcupine_frame_length()
@property
def sample_rate(self):
"""Audio sample rate accepted by Porcupine library."""
self._sample_rate = library.pv_sample_rate()
return self._sample_rate
def delete(self):
"""Releases resources acquired by Porcupine's library."""
@property
def frame_length(self):
"""Number of audio samples per frame expected by C library."""
return self._frame_length
self._delete_func(self._handle)
def process(self, pcm):
"""
Monitors incoming audio stream for given wake word(s).
Processes a frame of the incoming audio stream and emits the detection result.
:param pcm: An array (or array-like) of consecutive audio samples. For more information regarding required audio
properties (i.e. sample rate, number of channels encoding, and number of samples per frame) please refer to
'include/pv_porcupine.h'.
:param pcm: A frame of audio samples. The number of samples per frame can be attained by calling
'.frame_length'. The incoming audio needs to have a sample rate equal to '.sample_rate' and be 16-bit
linearly-encoded. Porcupine operates on single-channel audio.
:return: For a single wake-word use case, True if wake word is detected. For multiple wake-word use case it
returns the index of detected wake-word. Indexing is 0-based and according to ordering of input keyword file
paths. It returns -1 when no keyword is detected.
@@ -153,7 +153,7 @@ class Porcupine(object):
result = c_int()
status = self.process_func(self._handle, (c_short * len(pcm))(*pcm), byref(result))
if status is not self.PicovoiceStatuses.SUCCESS:
raise self._PICOVOICE_STATUS_TO_EXCEPTION[status]('Processing failed')
raise self._PICOVOICE_STATUS_TO_EXCEPTION[status]()
keyword_index = result.value
@@ -162,7 +162,20 @@ class Porcupine(object):
else:
return keyword_index
def delete(self):
"""Releases resources acquired by Porcupine's library."""
@property
def version(self):
"""Getter for version"""
self._delete_func(self._handle)
return self._version
@property
def frame_length(self):
"""Getter for number of audio samples per frame."""
return self._frame_length
@property
def sample_rate(self):
"""Audio sample rate accepted by Picovoice."""
return self._sample_rate
+1 -3
@@ -17,9 +17,7 @@ def main():
step = int(rest_args[0])
if upper < lower:
temp_lower = lower
lower = upper
upper = temp_lower
lower, upper = upper, lower
for n in range(lower, upper + 1, step):
print(n)
+1 -1
@@ -10,7 +10,7 @@
"base_language_model": "kaldi/base_language_model.txt",
"base_language_model_fst": "kaldi/base_language_model.fst",
"compatible": true,
"custom_words": "kaldi/custom_words.txt",
"custom_words": "kaldi_custom_words.txt",
"dictionary": "kaldi/dictionary.txt",
"graph": "graph",
"language_model": "kaldi/language_model.txt",
+1 -3
@@ -17,9 +17,7 @@ def main():
step = int(rest_args[0])
if upper < lower:
temp_lower = lower
lower = upper
upper = temp_lower
lower, upper = upper, lower
for n in range(lower, upper + 1, step):
print(n)
+9 -8
@@ -76,7 +76,8 @@
"rasa": {
"examples_markdown": "intent_examples.md",
"project_name": "rhasspy",
"url": "http://localhost:5005/"
"url": "http://localhost:5005/",
"model_dir": "/app/models"
},
"remote": {
"url": "http://my-server:12101/api/text-to-intent"
@@ -323,31 +324,31 @@
"cache": false
},
"porcupine_params.pv": {
"url": "https://github.com/Picovoice/Porcupine/raw/master/lib/common/porcupine_params.pv",
"url": "https://github.com/Picovoice/porcupine/raw/v1.7/lib/common/porcupine_params.pv",
"cache": false
},
"porcupine.ppn": {
"cache": false,
"x86_64": {
"url": "https://github.com/Picovoice/Porcupine/raw/master/resources/keyword_files/linux/porcupine_linux.ppn"
"url": "https://github.com/Picovoice/Porcupine/raw/v1.7/resources/keyword_files/linux/porcupine_linux.ppn"
},
"armv7l": {
"url": "https://github.com/Picovoice/Porcupine/raw/master/resources/keyword_files/raspberrypi/porcupine_raspberrypi.ppn"
"url": "https://github.com/Picovoice/porcupine/raw/v1.7/resources/keyword_files/raspberry-pi/porcupine_raspberry-pi.ppn"
},
"aarch64": {
"url": "https://github.com/Picovoice/Porcupine/raw/master/resources/keyword_files/raspberrypi/porcupine_raspberrypi.ppn"
"url": "https://github.com/Picovoice/porcupine/raw/v1.7/resources/keyword_files/raspberry-pi/porcupine_raspberry-pi.ppn"
}
},
"libpv_porcupine.so": {
"cache": false,
"x86_64": {
"url": "https://github.com/Picovoice/Porcupine/raw/master/lib/linux/x86_64/libpv_porcupine.so"
"url": "https://github.com/Picovoice/porcupine/raw/v1.7/lib/linux/x86_64/libpv_porcupine.so"
},
"armv7l": {
"url": "https://github.com/Picovoice/Porcupine/raw/master/lib/raspberry-pi/cortex-a53/libpv_porcupine.so"
"url": "https://github.com/Picovoice/porcupine/raw/v1.7/lib/raspberry-pi/cortex-a53/libpv_porcupine.so"
},
"aarch64": {
"url": "https://github.com/Picovoice/Porcupine/raw/master/lib/raspberry-pi/cortex-a53/libpv_porcupine.so"
"url": "https://github.com/Picovoice/porcupine/raw/v1.7/lib/raspberry-pi/cortex-a53/libpv_porcupine.so"
}
}
}
+1 -3
@@ -17,9 +17,7 @@ def main():
step = int(rest_args[0])
if upper < lower:
temp_lower = lower
lower = upper
upper = temp_lower
lower, upper = upper, lower
for n in range(lower, upper + 1, step):
print(n)
+1 -1
@@ -10,7 +10,7 @@
"base_language_model": "kaldi/base_language_model.txt",
"base_language_model_fst": "kaldi/base_language_model.fst",
"compatible": true,
"custom_words": "kaldi/custom_words.txt",
"custom_words": "kaldi_custom_words.txt",
"dictionary": "kaldi/dictionary.txt",
"graph": "graph",
"language_model": "kaldi/language_model.txt",
+1 -3
@@ -17,9 +17,7 @@ def main():
step = int(rest_args[0])
if upper < lower:
temp_lower = lower
lower = upper
upper = temp_lower
lower, upper = upper, lower
for n in range(lower, upper + 1, step):
print(n)
+1 -3
@@ -17,9 +17,7 @@ def main():
step = int(rest_args[0])
if upper < lower:
temp_lower = lower
lower = upper
upper = temp_lower
lower, upper = upper, lower
for n in range(lower, upper + 1, step):
print(n)
+1 -1
@@ -10,7 +10,7 @@
"base_language_model": "kaldi/base_language_model.txt",
"base_language_model_fst": "kaldi/base_language_model.fst",
"compatible": true,
"custom_words": "kaldi/custom_words.txt",
"custom_words": "kaldi_custom_words.txt",
"dictionary": "kaldi/dictionary.txt",
"graph": "graph",
"language_model": "kaldi/language_model.txt",
+1 -3
@@ -17,9 +17,7 @@ def main():
step = int(rest_args[0])
if upper < lower:
temp_lower = lower
lower = upper
upper = temp_lower
lower, upper = upper, lower
for n in range(lower, upper + 1, step):
print(n)
+1 -3
@@ -17,9 +17,7 @@ def main():
step = int(rest_args[0])
if upper < lower:
temp_lower = lower
lower = upper
upper = temp_lower
lower, upper = upper, lower
for n in range(lower, upper + 1, step):
print(n)
+1 -3
@@ -17,9 +17,7 @@ def main():
step = int(rest_args[0])
if upper < lower:
temp_lower = lower
lower = upper
upper = temp_lower
lower, upper = upper, lower
for n in range(lower, upper + 1, step):
print(n)
+1 -1
@@ -9,7 +9,7 @@
"base_dictionary": "kaldi/base_dictionary.txt",
"base_language_model": "kaldi/base_language_model.txt",
"compatible": true,
"custom_words": "kaldi/custom_words.txt",
"custom_words": "kaldi_custom_words.txt",
"dictionary": "kaldi/dictionary.txt",
"graph": "graph",
"language_model": "kaldi/language_model.txt",
+1 -3
@@ -17,9 +17,7 @@ def main():
step = int(rest_args[0])
if upper < lower:
temp_lower = lower
lower = upper
upper = temp_lower
lower, upper = upper, lower
for n in range(lower, upper + 1, step):
print(n)
+1 -3
@@ -17,9 +17,7 @@ def main():
step = int(rest_args[0])
if upper < lower:
temp_lower = lower
lower = upper
upper = temp_lower
lower, upper = upper, lower
for n in range(lower, upper + 1, step):
print(n)
+1 -3
@@ -17,9 +17,7 @@ def main():
step = int(rest_args[0])
if upper < lower:
temp_lower = lower
lower = upper
upper = temp_lower
lower, upper = upper, lower
for n in range(lower, upper + 1, step):
print(n)
+1 -3
@@ -17,9 +17,7 @@ def main():
step = int(rest_args[0])
if upper < lower:
temp_lower = lower
lower = upper
upper = temp_lower
lower, upper = upper, lower
for n in range(lower, upper + 1, step):
print(n)
+1 -3
@@ -17,9 +17,7 @@ def main():
step = int(rest_args[0])
if upper < lower:
temp_lower = lower
lower = upper
upper = temp_lower
lower, upper = upper, lower
for n in range(lower, upper + 1, step):
print(n)
+1 -3
@@ -17,9 +17,7 @@ def main():
step = int(rest_args[0])
if upper < lower:
temp_lower = lower
lower = upper
upper = temp_lower
lower, upper = upper, lower
for n in range(lower, upper + 1, step):
print(n)
+5
@@ -13,6 +13,11 @@ body {
z-index: 9999;
}
#logo {
border-color: red;
border-width: 0;
}
.response {
text-align: center;
}
+12
@@ -538,3 +538,15 @@ paths:
description: intents
schema:
type: object
/api/play-recording:
post:
summary: 'Play the last recorded voice command from web API'
produces:
- text/plain
responses:
'200':
description: OK
content:
text/plain:
schema:
type: string
+3 -2
@@ -3,8 +3,9 @@ aiohttp==3.6.2
doit==0.31.1
fuzzywuzzy[speedup]==0.17.0
google-cloud-texttospeech==0.5.0
google-cloud-speech==1.3.1
html5lib==1.0.1
json5==0.8.5
json5==0.7.0
multidict==4.6.1
networkx>=2.0
num2words==0.5.10
@@ -15,6 +16,6 @@ pydash==4.7.6
quart==0.6.15
quart-cors==0.1.3
requests==2.22.0
rhasspy-nlu==0.1.4.1
rhasspy-nlu==0.1.6
swagger-ui-py==0.1.7
webrtcvad==2.0.10
+1 -1
@@ -618,7 +618,7 @@ async def wav2mqtt(core: RhasspyCore, profile: Profile, args: Any) -> None:
async def text2wav(core: RhasspyCore, profile: Profile, args: Any) -> None:
"""Speak a sentence and output WAV data"""
result = await core.speak_sentence(args)
result = await core.speak_sentence(args.sentence)
sys.stdout.buffer.write(result.wav_data)
+10 -4
@@ -116,9 +116,6 @@ class RhasspyActor:
def stop(self, block=True):
"""Stop this actor and its children."""
for child_actor in self._actors:
child_actor.stop(block=block)
self.send(self, ActorExitRequest())
if block:
self._thread.join()
@@ -127,6 +124,15 @@ class RhasspyActor:
"""Main loop for this actor."""
while self._running:
message_dict = self._queue.get()
message = message_dict.get("message")
if isinstance(message, ActorExitRequest):
for child in self._actors:
self.send(child, ActorExitRequest())
self._running = False
self.transition("stopped")
self.send(self._parent, ChildActorExited(self))
self.on_receive(message_dict)
@property
@@ -296,7 +302,7 @@ class InboxActor(RhasspyActor):
return self
def __exit__(self, *args):
self.stop(block=False)
self.stop(block=True)
class ActorSystem:
+44 -9
@@ -39,6 +39,7 @@ from rhasspy.events import (
SentenceSpoken,
SpeakSentence,
SpeakWord,
StopListeningForWakeWord,
StartRecordingToBuffer,
StopRecordingToBuffer,
TestMicrophones,
@@ -88,6 +89,8 @@ class RhasspyCore:
self._session: Optional[aiohttp.ClientSession] = aiohttp.ClientSession()
self.dialogue_manager: Optional[RhasspyActor] = None
self.download_status: List[str] = []
# -------------------------------------------------------------------------
@property
@@ -96,6 +99,14 @@ class RhasspyCore:
assert self._session is not None
return self._session
@property
def siteId(self) -> str:
"""Get default MQTT siteId"""
try:
return self.profile.get("mqtt.siteId", "default").split(",")[0]
except Exception:
return "default"
# -------------------------------------------------------------------------
async def start(
@@ -160,10 +171,14 @@ class RhasspyCore:
# -------------------------------------------------------------------------
def listen_for_wake(self) -> None:
def listen_for_wake(self, enabled: bool = True) -> None:
"""Tell Rhasspy to start listening for a wake word."""
assert self.actor_system is not None
self.actor_system.tell(self.dialogue_manager, ListenForWakeWord())
if enabled:
self.actor_system.tell(self.dialogue_manager, ListenForWakeWord())
else:
self.actor_system.tell(self.dialogue_manager, StopListeningForWakeWord())
async def listen_for_command(
self,
@@ -344,9 +359,10 @@ class RhasspyCore:
"""Generate speech/intent artifacts for profile."""
if no_cache:
# Delete doit database
db_path = Path(self.profile.write_path(".doit.db"))
if db_path.is_file():
db_path.unlink()
profile_dir = Path(self.profile.write_path())
for db_path in profile_dir.glob(".doit.db*"):
if db_path.is_file():
db_path.unlink()
assert self.actor_system is not None
with self.actor_system.private() as sys:
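The hunk above changes the cache reset from deleting one `.doit.db` file to deleting every file that matches `.doit.db*`, presumably because the database may be stored under several suffixed files. A minimal sketch of that glob-based cleanup, assuming only the standard library; the helper name `delete_doit_databases` and the returned list of removed names are illustrative additions, not part of Rhasspy.

```python
import tempfile
from pathlib import Path
from typing import List


def delete_doit_databases(profile_dir: Path) -> List[str]:
    """Delete every regular file named .doit.db* and return the removed names."""
    removed = []
    # Path.glob matches dotfiles here because the pattern starts with a literal dot
    for db_path in sorted(profile_dir.glob(".doit.db*")):
        if db_path.is_file():
            db_path.unlink()
            removed.append(db_path.name)

    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as temp_dir:
        profile_dir = Path(temp_dir)
        for name in (".doit.db", ".doit.db.dat", "profile.json"):
            (profile_dir / name).write_text("")

        print(delete_doit_databases(profile_dir))
```

Unrelated files in the profile directory are left alone, since only names beginning with `.doit.db` match the pattern.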
@@ -480,6 +496,8 @@ class RhasspyCore:
async def download_profile(self, delete=False, chunk_size=4096) -> None:
"""Download all necessary profile files from the internet and extract them."""
self.download_status = []
output_dir = Path(self.profile.write_path())
download_dir = Path(
self.profile.write_path(self.profile.get("download.cache_dir", "download"))
@@ -500,7 +518,9 @@ class RhasspyCore:
async def download_file(url, filename):
try:
self._logger.debug("Downloading %s to %s", url, filename)
status = f"Downloading {url} to {filename}"
self.download_status.append(status)
self._logger.debug(status)
os.makedirs(os.path.dirname(filename), exist_ok=True)
async with self.session.get(url) as response:
@@ -508,10 +528,21 @@ class RhasspyCore:
async for chunk in response.content.iter_chunked(chunk_size):
out_file.write(chunk)
self._logger.debug("Downloaded %s", filename)
status = f"Downloaded {filename}"
self.download_status.append(status)
self._logger.debug(status)
except Exception:
self._logger.exception(url)
# Try to delete partially downloaded file
try:
status = f"Failed to download {filename}"
self.download_status.append(status)
self._logger.debug(status)
os.unlink(filename)
except Exception:
pass
# Check conditions
machine_type = platform.machine()
download_tasks = []
@@ -595,7 +626,9 @@ class RhasspyCore:
os.makedirs(os.path.dirname(dest_path), exist_ok=True)
# Copy file/directory as is
self._logger.debug("Copying %s to %s", src_path, dest_path)
status = f"Copying {src_path} to {dest_path}"
self.download_status.append(status)
self._logger.debug(status)
if os.path.isdir(src_path):
shutil.copytree(src_path, dest_path)
else:
@@ -668,7 +701,9 @@ class RhasspyCore:
extract_path = os.path.join(temp_dir, src_extract)
# Copy specific file/directory
self._logger.debug("Copying %s to %s", extract_path, dest_path)
status = f"Copying {extract_path} to {dest_path}"
self.download_status.append(status)
self._logger.debug(status)
if os.path.isdir(extract_path):
if src_exclude:
# Ignore some files
+33 -6
@@ -7,8 +7,8 @@ from pathlib import Path
from typing import Any, Dict, List, Optional, Type
import pydash
import pywrapfst as fst
import requests
import rhasspynlu
from rhasspy.actor import (
ActorExitRequest,
@@ -386,6 +386,10 @@ class DialogueManager(RhasspyActor):
for hook_url in awake_hooks:
self._logger.debug("POST-ing to %s", hook_url)
requests.post(hook_url, json=hook_json)
# Forward to observer
if self.observer:
self.send(self.observer, message)
elif isinstance(message, WakeWordNotDetected):
self._logger.debug("Wake word NOT detected. Staying asleep.")
self.transition("ready")
@@ -423,6 +427,10 @@ class DialogueManager(RhasspyActor):
wav_data = buffer_to_wav(message.data)
self.send(self.decoder, TranscribeWav(wav_data, handle=message.handle))
self.transition("decoding")
# Forward to observer
if self.observer:
self.send(self.observer, message)
else:
self.handle_any(message, sender)
@@ -433,6 +441,15 @@ class DialogueManager(RhasspyActor):
def in_decoding(self, message: Any, sender: RhasspyActor) -> None:
"""Handle messages in decoding state."""
if isinstance(message, WavTranscription):
message.wakewordId = self.wake_detected_name or "default"
# Fix casing
dict_casing = self.profile.get("speech_to_text.dictionary_casing", "")
if dict_casing == "lower":
message.text = message.text.lower()
elif dict_casing == "upper":
message.text = message.text.upper()
# text -> intent
self._logger.debug("%s (confidence=%s)", message.text, message.confidence)
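The hunk above normalizes the transcription text according to the `speech_to_text.dictionary_casing` profile setting before intent recognition. A minimal sketch of that step in isolation; the function name `apply_dictionary_casing` is hypothetical, and any value other than `"lower"` or `"upper"` is assumed to leave the text unchanged, as in the diff.

```python
def apply_dictionary_casing(text: str, dict_casing: str = "") -> str:
    """Force text to the casing used by the speech dictionary, if one is configured."""
    if dict_casing == "lower":
        return text.lower()

    if dict_casing == "upper":
        return text.upper()

    # No casing configured (or an unknown value): keep the text as transcribed
    return text


if __name__ == "__main__":
    print(apply_dictionary_casing("Turn ON the Light", "lower"))
```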
@@ -447,7 +464,8 @@ class DialogueManager(RhasspyActor):
"text": message.text,
"likelihood": 1,
"seconds": 0,
"wakeId": self.wake_detected_name or "",
"wakeId": message.wakewordId,
"wakewordId": message.wakewordId,
}
).encode()
@@ -460,6 +478,10 @@ class DialogueManager(RhasspyActor):
)
self.send(self.mqtt, MqttPublish("hermes/asr/textCaptured", payload))
# Forward to observer
if self.observer:
self.send(self.observer, message)
# Pass to intent recognizer
self.send(
self.recognizer,
@@ -555,12 +577,14 @@ class DialogueManager(RhasspyActor):
self.transition("training_intent")
intent_fst_path = self.profile.read_path(
self.profile.get("intent.fsticuffs.intent_fst", "intent.fst")
intent_graph_path = self.profile.read_path(
self.profile.get("intent.fsticuffs.intent_graph", "intent.json")
)
intent_fst = fst.Fst.read(str(intent_fst_path))
self.send(self.intent_trainer, TrainIntent(intent_fst))
with open(intent_graph_path, "r") as graph_file:
json_graph = json.load(graph_file)
intent_graph = rhasspynlu.json_to_graph(json_graph)
self.send(self.intent_trainer, TrainIntent(intent_graph))
except Exception as e:
self.transition("ready")
self.send(self.training_receiver, ProfileTrainingFailed(str(e)))
@@ -730,6 +754,9 @@ class DialogueManager(RhasspyActor):
elif isinstance(message, GetProblems):
# Report problems from child actors
self.send(sender, Problems(self.problems))
elif isinstance(message, (ListenForWakeWord, StopListeningForWakeWord)):
# Forward to wake actor
self.send(self.wake, message)
else:
self.handle_forward(message, sender)
+10 -3
@@ -246,8 +246,8 @@ class IntentForwarded:
class TrainIntent:
"""Request to train intent recognizer."""
def __init__(self, intent_fst, receiver: Optional[RhasspyActor] = None) -> None:
self.intent_fst = intent_fst
def __init__(self, intent_graph, receiver: Optional[RhasspyActor] = None) -> None:
self.intent_graph = intent_graph
self.receiver = receiver
@@ -390,10 +390,17 @@ class TranscribeWav:
class WavTranscription:
"""Response to TranscribeWav."""
def __init__(self, text: str, handle: bool = True, confidence: float = 1) -> None:
def __init__(
self,
text: str,
handle: bool = True,
confidence: float = 1,
wakewordId: str = "default",
) -> None:
self.text = text
self.confidence = confidence
self.handle = handle
self.wakewordId = wakewordId
# -----------------------------------------------------------------------------
+18 -209
@@ -1,13 +1,11 @@
"""Support for intent recognition."""
import concurrent.futures
import io
import json
import logging
import os
import re
import shutil
import subprocess
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional, Set, Tuple, Type
from urllib.parse import urljoin
@@ -18,7 +16,7 @@ from rhasspynlu import json_to_graph, recognize
from rhasspy.actor import RhasspyActor
from rhasspy.events import IntentRecognized, RecognizeIntent, SpeakSentence
from rhasspy.utils import empty_intent, hass_request_kwargs
from rhasspy.utils import empty_intent, hass_request_kwargs, load_converters
# -----------------------------------------------------------------------------
@@ -32,7 +30,6 @@ def get_recognizer_class(system: str) -> Type[RhasspyActor]:
"adapt",
"rasa",
"remote",
"flair",
"conversation",
"command",
], f"Invalid intent system: {system}"
@@ -56,10 +53,6 @@ def get_recognizer_class(system: str) -> Type[RhasspyActor]:
# Use remote rhasspy server
return RemoteRecognizer
if system == "flair":
# Use flair locally
return FlairRecognizer
if system == "conversation":
# Use HA conversation
return HomeAssistantConversationRecognizer
@@ -137,32 +130,6 @@ class RemoteRecognizer(RhasspyActor):
# -----------------------------------------------------------------------------
class CliConverter:
"""Command-line converter for intent recognition"""
def __init__(self, name: str, command_path: Path):
self.name = name
self.command_path = command_path
def __call__(self, *args, converter_args=None):
"""Runs external program to convert JSON values"""
converter_args = converter_args or []
proc = subprocess.Popen(
[str(self.command_path)] + converter_args,
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
universal_newlines=True,
)
with io.StringIO() as input_file:
for arg in args:
json.dump(arg, input_file)
stdout, _ = proc.communicate(input=input_file.getvalue())
return [json.loads(line) for line in stdout.splitlines() if line.strip()]
class FsticuffsRecognizer(RhasspyActor):
"""Recognize intents using OpenFST."""
@@ -188,33 +155,7 @@ class FsticuffsRecognizer(RhasspyActor):
self.fuzzy = self.profile.get("intent.fsticuffs.fuzzy", True)
# Load user-defined converters
converters_dir = Path(
self.profile.read_path(
self.profile.get("intent.fsticuffs.converters_dir", "converters")
)
)
if converters_dir.is_dir():
self._logger.debug("Loading converters from %s", converters_dir)
for converter_path in converters_dir.glob("**/*"):
if not converter_path.is_file():
continue
# Retain directory structure in name
converter_name = str(
converter_path.relative_to(converters_dir).with_suffix("")
)
# Run converter as external program.
# Input arguments are encoded as JSON on individual lines.
# Output values should be encoded as JSON on individual lines.
converter = CliConverter(converter_name, converter_path)
# Key off name without file extension
self.converters[converter_name] = converter
self._logger.debug(
"Loaded converter %s from %s", converter_name, converter_path
)
self.converters = load_converters(self.profile)
self.transition("loaded")
@@ -347,8 +288,8 @@ class FuzzyWuzzyRecognizer(RhasspyActor):
self._logger.exception("in_loaded")
intent = empty_intent()
intent["text"] = message.text
intent["raw_text"] = message.text
intent["raw_text"] = message.text
intent["speech_confidence"] = message.confidence
self.send(
message.receiver or sender,
@@ -448,6 +389,7 @@ class RasaIntentRecognizer(RhasspyActor):
RhasspyActor.__init__(self)
self.project_name = ""
self.parse_url = ""
self.min_confidence: float = 0
def to_started(self, from_state: str) -> None:
"""Transition to started state."""
@@ -456,6 +398,7 @@ class RasaIntentRecognizer(RhasspyActor):
self.project_name = rasa_config.get(
"project_name", f"rhasspy_{self.profile.name}"
)
self.min_confidence = rasa_config.get("min_confidence", 0)
self.parse_url = urljoin(url, "model/parse")
def in_started(self, message: Any, sender: RhasspyActor) -> None:
@@ -463,13 +406,23 @@ class RasaIntentRecognizer(RhasspyActor):
if isinstance(message, RecognizeIntent):
try:
intent = self.recognize(message.text)
intent["intent"]["name"] = intent["intent"]["name"] or ""
logging.debug(repr(intent))
confidence = intent["intent"]["confidence"]
if confidence < self.min_confidence:
intent["intent"]["name"] = ""
self._logger.warning(
"Intent did not meet confidence threshold: %s < %s",
confidence,
self.min_confidence,
)
except Exception:
self._logger.exception("in_started")
intent = empty_intent()
intent["text"] = message.text
intent["raw_text"] = message.text
intent["raw_text"] = message.text
self.send(
message.receiver or sender,
IntentRecognized(intent, handle=message.handle),
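The hunk above adds a `min_confidence` setting to the Rasa recognizer: when the reported confidence falls below the threshold, the intent name is blanked so the result is treated as unrecognized. A minimal sketch of that check on a plain dictionary shaped like the intent in the diff (`intent["intent"]["name"]` and `intent["intent"]["confidence"]`); the function name `apply_min_confidence` is hypothetical.

```python
from typing import Any, Dict


def apply_min_confidence(
    intent: Dict[str, Any], min_confidence: float = 0.0
) -> Dict[str, Any]:
    """Blank the intent name when its confidence is below the threshold."""
    # A missing or null name is normalized to the empty string
    intent["intent"]["name"] = intent["intent"].get("name") or ""

    confidence = intent["intent"].get("confidence", 0.0)
    if confidence < min_confidence:
        # Below threshold: report "no intent" rather than a weak guess
        intent["intent"]["name"] = ""

    return intent


if __name__ == "__main__":
    result = apply_min_confidence(
        {"intent": {"name": "ChangeLightState", "confidence": 0.4}},
        min_confidence=0.7,
    )
    print(result["intent"]["name"] == "")
```

With the default threshold of 0, as in the diff, no intent is ever rejected on confidence alone.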
@@ -530,8 +483,8 @@ class AdaptIntentRecognizer(RhasspyActor):
self._logger.exception("in_loaded")
intent = empty_intent()
intent["text"] = message.text
intent["raw_text"] = message.text
intent["raw_text"] = message.text
intent["speech_confidence"] = message.confidence
self.send(
message.receiver or sender,
@@ -612,150 +565,6 @@ class AdaptIntentRecognizer(RhasspyActor):
self._logger.debug("Loaded engine from config file %s", config_path)
# -----------------------------------------------------------------------------
# Flair Intent Recognizer
# https://github.com/zalandoresearch/flair
# -----------------------------------------------------------------------------
class FlairRecognizer(RhasspyActor):
"""Flair based recognizer"""
def __init__(self) -> None:
RhasspyActor.__init__(self)
try:
# pylint: disable=E0401
from flair.models import TextClassifier, SequenceTagger
except Exception:
pass
self.class_model: Optional[TextClassifier] = None
self.ner_models: Optional[Dict[str, SequenceTagger]] = None
self.intent_map: Optional[Dict[str, str]] = None
self.preload = False
def to_started(self, from_state: str) -> None:
"""Transition to started state."""
self.preload = self.config.get("preload", False)
if self.preload:
try:
# Pre-load models
self.load_models()
except Exception as e:
self._logger.warning("preload: %s", e)
def in_started(self, message: Any, sender: RhasspyActor) -> None:
"""Handle messages in started state."""
if isinstance(message, RecognizeIntent):
try:
self.load_models()
intent = self.recognize(message.text)
except Exception:
self._logger.exception("in_started")
intent = empty_intent()
intent["text"] = message.text
intent["raw_text"] = message.text
intent["speech_confidence"] = message.confidence
self.send(
message.receiver or sender,
IntentRecognized(intent, handle=message.handle),
)
def recognize(self, text: str) -> Dict[str, Any]:
"""Run intent classifier and then named-entity recognizer."""
# pylint: disable=E0401
from flair.data import Sentence
intent = empty_intent()
sentence = Sentence(text)
assert self.intent_map is not None
if self.class_model is not None:
self.class_model.predict(sentence)
assert sentence.labels, "No intent predicted"
label = sentence.labels[0]
intent_id = label.value
intent["intent"]["confidence"] = label.score
else:
# Assume first intent
intent_id = next(iter(self.intent_map))
intent["intent"]["confidence"] = 1
intent["intent"]["name"] = self.intent_map[intent_id]
assert self.ner_models is not None
if intent_id in self.ner_models:
# Predict entities
self.ner_models[intent_id].predict(sentence)
ner_dict = sentence.to_dict(tag_type="ner")
for named_entity in ner_dict["entities"]:
intent["entities"].append(
{
"entity": named_entity["type"],
"value": named_entity["text"],
"start": named_entity["start_pos"],
"end": named_entity["end_pos"],
"confidence": named_entity["confidence"],
}
)
return intent
# -------------------------------------------------------------------------
def load_models(self) -> None:
"""Load intent classifier and named entity recognizers."""
# pylint: disable=E0401
from flair.models import TextClassifier, SequenceTagger
# Load mapping from intent id to user intent name
if self.intent_map is None:
intent_map_path = self.profile.read_path(
self.profile.get("training.intent.intent_map", "intent_map.json")
)
with open(intent_map_path, "r") as intent_map_file:
self.intent_map = json.load(intent_map_file)
data_dir = self.profile.read_path(
self.profile.get("intent.flair.data_dir", "flair_data")
)
# Only load intent classifier if there is more than one intent
if (self.class_model is None) and (len(self.intent_map) > 1):
class_model_path = os.path.join(
data_dir, "classification", "final-model.pt"
)
self._logger.debug("Loading classification model from %s", class_model_path)
self.class_model = TextClassifier.load_from_file(class_model_path)
self._logger.debug("Loaded classification model")
if self.ner_models is None:
ner_models = {}
ner_data_dir = os.path.join(data_dir, "ner")
for file_name in os.listdir(ner_data_dir):
ner_model_dir = os.path.join(ner_data_dir, file_name)
if os.path.isdir(ner_model_dir):
# Assume directory is intent name
intent_name = file_name
if intent_name not in self.intent_map:
self._logger.warning(
"%s was not found in intent map", intent_name
)
ner_model_path = os.path.join(ner_model_dir, "final-model.pt")
self._logger.debug("Loading NER model from %s", ner_model_path)
ner_models[intent_name] = SequenceTagger.load_from_file(
ner_model_path
)
self._logger.debug("Loaded NER model(s)")
self.ner_models = ner_models
# -----------------------------------------------------------------------------
# Home Assistant Conversation
# https://www.home-assistant.io/integrations/conversation
@@ -861,8 +670,8 @@ class CommandRecognizer(RhasspyActor):
self._logger.exception("in_started")
intent = empty_intent()
intent["text"] = message.text
intent["raw_text"] = message.text
intent["raw_text"] = message.text
intent["speech_confidence"] = message.confidence
self.send(
message.receiver or sender,
+112 -329
@@ -1,22 +1,16 @@
"""Training for intent recognizers."""
import json
import os
import random
import re
import shutil
import subprocess
import tempfile
import time
from collections import Counter, defaultdict
from io import StringIO
from typing import Any, Dict, List, Set, Type
from typing import Any, Callable, Dict, List, Set, Type
from urllib.parse import urljoin
from rhasspy.actor import RhasspyActor
from rhasspy.events import (IntentTrainingComplete, IntentTrainingFailed,
TrainIntent)
from rhasspy.utils import (lcm, make_sentences_by_intent,
sample_sentences_by_intent)
from rhasspy.events import IntentTrainingComplete, IntentTrainingFailed, TrainIntent
from rhasspy.utils import make_sentences_by_intent, load_converters
# -----------------------------------------------------------------------------
@@ -32,7 +26,6 @@ def get_intent_trainer_class(
"fuzzywuzzy",
"adapt",
"rasa",
"flair",
"auto",
"command",
], f"Invalid intent training system: {trainer_system}"
@@ -48,9 +41,6 @@ def get_intent_trainer_class(
if recognizer_system == "adapt":
# Use Mycroft Adapt locally
return AdaptIntentTrainer
if recognizer_system == "flair":
# Use flair locally
return FlairIntentTrainer
if recognizer_system == "rasa":
# Use Rasa NLU remotely
return RasaIntentTrainer
@@ -69,9 +59,6 @@ def get_intent_trainer_class(
if trainer_system == "rasa":
# Use Rasa NLU remotely
return RasaIntentTrainer
if trainer_system == "flair":
# Use flair RNN locally
return FlairIntentTrainer
if trainer_system == "command":
# Use command-line intent trainer
return CommandIntentTrainer
@@ -98,7 +85,7 @@ class DummyIntentTrainer(RhasspyActor):
class FsticuffsIntentTrainer(DummyIntentTrainer):
"""No training needed. Intent FST will be used directly during recognition."""
"""No training needed. Intent graph will be used directly during recognition."""
pass
@@ -112,23 +99,33 @@ class FsticuffsIntentTrainer(DummyIntentTrainer):
class FuzzyWuzzyIntentTrainer(RhasspyActor):
"""Save examples to JSON for fuzzy string matching later."""
def __init__(self):
RhasspyActor.__init__(self)
self.converters: Dict[str, Callable[..., Any]] = {}
def to_started(self, from_state: str) -> None:
# Load user-defined converters
self.converters = load_converters(self.profile)
def in_started(self, message: Any, sender: RhasspyActor) -> None:
"""Handle messages in started state."""
if isinstance(message, TrainIntent):
try:
self.train(message.intent_fst)
self.train(message.intent_graph)
self.send(message.receiver or sender, IntentTrainingComplete())
except Exception as e:
self._logger.exception("train")
self.send(message.receiver or sender, IntentTrainingFailed(repr(e)))
def train(self, intent_fst) -> None:
def train(self, intent_graph) -> None:
"""Save examples to JSON file."""
examples_path = self.profile.write_path(
self.profile.get("intent.fuzzywuzzy.examples_json")
)
sentences_by_intent: Dict[str, Any] = make_sentences_by_intent(intent_fst)
sentences_by_intent = make_sentences_by_intent(
intent_graph, extra_converters=self.converters
)
with open(examples_path, "w") as examples_file:
json.dump(sentences_by_intent, examples_file, indent=4)
@@ -144,11 +141,19 @@ class FuzzyWuzzyIntentTrainer(RhasspyActor):
class RasaIntentTrainer(RhasspyActor):
"""Uses Rasa NLU HTTP API to train a recognizer."""
def __init__(self):
RhasspyActor.__init__(self)
self.converters: Dict[str, Callable[..., Any]] = {}
def to_started(self, from_state: str) -> None:
# Load user-defined converters
self.converters = load_converters(self.profile)
def in_started(self, message: Any, sender: RhasspyActor) -> None:
"""Handle messages in started state."""
if isinstance(message, TrainIntent):
try:
self.train(message.intent_fst)
self.train(message.intent_graph)
self.send(message.receiver or sender, IntentTrainingComplete())
except Exception as e:
self._logger.exception("train")
@@ -156,9 +161,8 @@ class RasaIntentTrainer(RhasspyActor):
# -------------------------------------------------------------------------
def train(self, intent_fst) -> None:
def train(self, intent_graph) -> None:
"""Convert examples to Markdown and POST to RasaNLU server."""
from rhasspy.train.jsgf2fst import fstprintall
import requests
# Load settings
@@ -174,39 +178,59 @@ class RasaIntentTrainer(RhasspyActor):
)
# Build Markdown sentences
sentences_by_intent: Dict[str, Any] = defaultdict(list)
for symbols in fstprintall(intent_fst, exclude_meta=False):
intent_name = ""
strings = []
for sym in symbols:
if sym.startswith("<"):
continue # <eps>
sentences_by_intent = make_sentences_by_intent(
intent_graph, extra_converters=self.converters
)
if sym.startswith("__label__"):
intent_name = sym[9:]
elif sym.startswith("__begin__"):
strings.append("[")
elif sym.startswith("__end__"):
strings[-1] = strings[-1].strip()
tag = sym[7:]
strings.append(f"]({tag})")
strings.append(" ")
else:
strings.append(sym)
strings.append(" ")
sentence = "".join(strings).strip()
sentences_by_intent[intent_name].append(sentence)
# Write to YAML file
# Write to YAML/Markdown file
with open(examples_md_path, "w") as examples_md_file:
for intent_name, intent_sents in sentences_by_intent.items():
# Rasa Markdown training format
print(f"## intent:{intent_name}", file=examples_md_file)
for intent_sent in intent_sents:
print("-", intent_sent, file=examples_md_file)
raw_index = 0
index_entity = {e["raw_start"]: e for e in intent_sent["entities"]}
entity = None
sentence_tokens = []
entity_tokens = []
for raw_token in intent_sent["raw_tokens"]:
token = raw_token
if entity and (raw_index >= entity["raw_end"]):
# Finish current entity
last_token = entity_tokens[-1]
entity_tokens[-1] = f"{last_token}]({entity['entity']})"
sentence_tokens.extend(entity_tokens)
entity = None
entity_tokens = []
print("", file=examples_md_file)
new_entity = index_entity.get(raw_index)
if new_entity:
# Begin new entity
assert entity is None, "Unclosed entity"
entity = new_entity
entity_tokens = []
token = f"[{token}"
if entity:
# Add to current entity
entity_tokens.append(token)
else:
# Add directly to sentence
sentence_tokens.append(token)
raw_index += len(raw_token) + 1
if entity:
# Finish final entity
last_token = entity_tokens[-1]
entity_tokens[-1] = f"{last_token}]({entity['entity']})"
sentence_tokens.extend(entity_tokens)
# Print single example
print("-", " ".join(sentence_tokens), file=examples_md_file)
# Newline between intents
print("", file=examples_md_file)
# Create training YAML file
with tempfile.NamedTemporaryFile(
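The hunk above rebuilds each Rasa Markdown training example from `raw_tokens` and entity character offsets (`raw_start`, `raw_end`) instead of FST symbols. A minimal sketch of that token loop as a standalone function, assuming tokens are separated by single spaces so offsets advance by `len(token) + 1`; the function name `to_rasa_markdown` is hypothetical, and the field names are taken from the diff.

```python
from typing import Any, Dict, List, Optional


def to_rasa_markdown(raw_tokens: List[str], entities: List[Dict[str, Any]]) -> str:
    """Wrap entity spans as [value](entity) using character offsets into the raw text."""
    index_entity = {e["raw_start"]: e for e in entities}
    raw_index = 0
    entity: Optional[Dict[str, Any]] = None
    sentence_tokens: List[str] = []
    entity_tokens: List[str] = []

    def finish_entity() -> None:
        # Close the bracket on the last token of the current entity
        entity_tokens[-1] = f"{entity_tokens[-1]}]({entity['entity']})"
        sentence_tokens.extend(entity_tokens)

    for raw_token in raw_tokens:
        token = raw_token
        if entity and raw_index >= entity["raw_end"]:
            finish_entity()
            entity, entity_tokens = None, []

        new_entity = index_entity.get(raw_index)
        if new_entity:
            assert entity is None, "Unclosed entity"
            entity, entity_tokens = new_entity, []
            token = f"[{token}"

        # Tokens inside an entity are buffered until the entity is closed
        (entity_tokens if entity else sentence_tokens).append(token)
        raw_index += len(raw_token) + 1

    if entity:
        finish_entity()

    return " ".join(sentence_tokens)


if __name__ == "__main__":
    tokens = ["turn", "on", "the", "living", "room", "lamp"]
    entities = [{"entity": "name", "raw_start": 12, "raw_end": 23}]
    print("-", to_rasa_markdown(tokens, entities))
```

For the sample input the entity covers characters 12 to 23 of the raw sentence, so the two tokens `living room` end up inside one bracket pair.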
@@ -254,9 +278,19 @@ class RasaIntentTrainer(RhasspyActor):
try:
response.raise_for_status()
model_dir = rasa_config.get("model_dir", "")
model_file = os.path.join(model_dir, response.headers["filename"])
self._logger.debug("Received model %s", model_file)
# Replace model
model_url = urljoin(url, "model")
requests.put(model_url, json={"model_file": model_file})
except Exception:
# Rasa gives quite helpful error messages, so extract them from the response.
raise Exception(f'{response.reason}: {json.loads(response.content)["message"]}')
raise Exception(
f'{response.reason}: {json.loads(response.content)["message"]}'
)
# -----------------------------------------------------------------------------
@@ -268,11 +302,19 @@ class RasaIntentTrainer(RhasspyActor):
class AdaptIntentTrainer(RhasspyActor):
"""Configure a Mycroft Adapt engine."""
def __init__(self):
RhasspyActor.__init__(self)
self.converters: Dict[str, Callable[..., Any]] = {}
def to_started(self, from_state: str) -> None:
# Load user-defined converters
self.converters = load_converters(self.profile)
def in_started(self, message: Any, sender: RhasspyActor) -> None:
"""Handle messages in started state."""
if isinstance(message, TrainIntent):
try:
self.train(message.intent_fst)
self.train(message.intent_graph)
self.send(message.receiver or sender, IntentTrainingComplete())
except Exception as e:
self._logger.exception("train")
@@ -280,19 +322,19 @@ class AdaptIntentTrainer(RhasspyActor):
# -------------------------------------------------------------------------
def train(self, intent_fst) -> None:
def train(self, intent_graph) -> None:
"""Create intents, entities, and keywords."""
# Load "stop" words (common words that are excluded from training)
stop_words: Set[str] = set()
stop_words_path = self.profile.read_path("stop_words.txt")
if os.path.exists(stop_words_path):
with open(stop_words_path, "r") as stop_words_file:
stop_words = {
line.strip() for line in stop_words_file if line.strip()
}
stop_words = {line.strip() for line in stop_words_file if line.strip()}
# { intent: [ { 'text': ..., 'entities': { ... } }, ... ] }
sentences_by_intent: Dict[str, Any] = make_sentences_by_intent(intent_fst)
sentences_by_intent = make_sentences_by_intent(
intent_graph, extra_converters=self.converters
)
# Generate intent configuration
entities: Dict[str, Set[str]] = {}
@@ -311,17 +353,12 @@ class AdaptIntentTrainer(RhasspyActor):
# Process sentences for this intent
for intent_sent in intent_sents:
_, slots, word_tokens = (
intent_sent.get("raw_text", intent_sent["text"]),
intent_sent["entities"],
intent_sent["tokens"],
)
entity_tokens: Set[str] = set()
# Group slot values by entity
slot_entities: Dict[str, List[str]] = defaultdict(list)
for sent_ent in slots:
slot_entities[sent_ent["entity"]].append(sent_ent["value"])
for sent_ent in intent_sent["entities"]:
slot_entities[sent_ent["entity"]].append(sent_ent["raw_value"])
# Add entities
for entity_name, entity_values in slot_entities.items():
@@ -335,10 +372,10 @@ class AdaptIntentTrainer(RhasspyActor):
# Split entity values by whitespace
for value in entity_values:
entity_tokens.update(re.split(r"\s", value))
entity_tokens.update(value.split())
# Get all non-stop words that are not part of entity values
words = set(word_tokens) - entity_tokens - stop_words
words = set(intent_sent["raw_tokens"]) - entity_tokens - stop_words
# Increment count for words
for word in words:
@@ -398,268 +435,6 @@ class AdaptIntentTrainer(RhasspyActor):
self._logger.debug("Wrote adapt configuration to %s", config_path)
# -----------------------------------------------------------------------------
# Flair Intent Trainer
# https://github.com/zalandoresearch/flair
# -----------------------------------------------------------------------------
class FlairIntentTrainer(RhasspyActor):
"""Trains a classification and NER model using flair"""
def __init__(self):
RhasspyActor.__init__(self)
self.embeddings = []
def in_started(self, message: Any, sender: RhasspyActor) -> None:
"""Handle messages in started state."""
if isinstance(message, TrainIntent):
try:
self.train(message.intent_fst)
self.send(message.receiver or sender, IntentTrainingComplete())
except Exception as e:
self._logger.exception("train")
self.send(message.receiver or sender, IntentTrainingFailed(repr(e)))
def train(self, intent_fst) -> None:
"""Train intent classifier and named entity recognizers."""
# pylint: disable=E0401
from flair.data import Sentence, Token
# pylint: disable=E0401
from flair.models import SequenceTagger, TextClassifier
# pylint: disable=E0401
from flair.embeddings import (
FlairEmbeddings,
StackedEmbeddings,
DocumentRNNEmbeddings,
)
# pylint: disable=E0401
from flair.data import TaggedCorpus
# pylint: disable=E0401
from flair.trainers import ModelTrainer
# Directory to look for downloaded embeddings
cache_dir = self.profile.read_path(
self.profile.get("intent.flair.cache_dir", "flair/cache")
)
os.makedirs(cache_dir, exist_ok=True)
# Directory to store generated models
data_dir = self.profile.write_path(
self.profile.get("intent.flair.data_dir", "flair/data")
)
if os.path.exists(data_dir):
shutil.rmtree(data_dir)
self.embeddings = self.profile.get("intent.flair.embeddings", [])
assert self.embeddings, "No word embeddings"
# Create directories to write training data to
class_data_dir = os.path.join(data_dir, "classification")
ner_data_dir = os.path.join(data_dir, "ner")
os.makedirs(class_data_dir, exist_ok=True)
os.makedirs(ner_data_dir, exist_ok=True)
# Convert FST to training data
# ----------------------------
# { intent: [ { 'text': ..., 'entities': { ... } }, ... ] }
sentences_by_intent: Dict[str, Any] = {}
# Get sentences for training
do_sampling = self.profile.get("intent.flair.do_sampling", True)
start_time = time.time()
if do_sampling:
# Sample from each intent FST
num_samples = int(self.profile.get("intent.flair.num_samples", 10000))
intent_map_path = self.profile.read_path(
self.profile.get("training.intent.intent_map", "intent_map.json")
)
with open(intent_map_path, "r") as intent_map_file:
intent_map = json.load(intent_map_file)
# Gather FSTs for all known intents
fsts_dir = self.profile.write_dir(
self.profile.get("speech_to_text.fsts_dir")
)
intent_fst_paths = {
intent_id: os.path.join(fsts_dir, f"{intent_id}.fst")
for intent_id in intent_map
}
# Generate samples
self._logger.debug(
"Generating %s sample(s) from %s intent(s)",
num_samples,
len(intent_fst_paths),
)
sentences_by_intent = sample_sentences_by_intent(
intent_fst_paths, num_samples
)
else:
# Exhaustively generate all sentences
self._logger.debug(
"Generating all possible sentences (may take a long time)"
)
sentences_by_intent = make_sentences_by_intent(intent_fst)
sentence_time = time.time() - start_time
self._logger.debug("Generated sentences in %s second(s)", sentence_time)
# Get least common multiple in order to balance sentences by intent
lcm_sentences = lcm(*(len(sents) for sents in sentences_by_intent.values()))
# Generate examples
class_sentences = []
ner_sentences: Dict[str, List[Sentence]] = defaultdict(list)
for intent_name, intent_sents in sentences_by_intent.items():
num_repeats = max(1, lcm_sentences // len(intent_sents))
for intent_sent in intent_sents:
# Only train an intent classifier if there's more than one intent
if len(sentences_by_intent) > 1:
# Add balanced copies
for _ in range(num_repeats):
class_sent = Sentence(labels=[intent_name])
for word in intent_sent["tokens"]:
class_sent.add_token(Token(word))
class_sentences.append(class_sent)
if not intent_sent["entities"]:
continue # no entities, no sequence tagger
# Named entity recognition (NER) example
token_idx = 0
entity_start = {ev["start"]: ev for ev in intent_sent["entities"]}
entity_end = {ev["end"]: ev for ev in intent_sent["entities"]}
entity = None
word_tags = []
for word in intent_sent["tokens"]:
# Determine tag label
tag = "O" if not entity else f"I-{entity}"
if token_idx in entity_start:
entity = entity_start[token_idx]["entity"]
tag = f"B-{entity}"
word_tags.append((word, tag))
# word ner
token_idx += len(word) + 1
if (token_idx - 1) in entity_end:
entity = None
# Add balanced copies
for _ in range(num_repeats):
ner_sent = Sentence()
for word, tag in word_tags:
token = Token(word)
token.add_tag("ner", tag)
ner_sent.add_token(token)
ner_sentences[intent_name].append(ner_sent)
# Start training
max_epochs = int(self.profile.get("intent.flair.max_epochs", 100))
# Load word embeddings
self._logger.debug("Loading word embeddings from %s", cache_dir)
word_embeddings = [
FlairEmbeddings(os.path.join(cache_dir, "embeddings", e))
for e in self.embeddings
]
if class_sentences:
self._logger.debug("Training intent classifier")
# Random 80/10/10 split
class_train, class_dev, class_test = self._split_data(class_sentences)
class_corpus = TaggedCorpus(class_train, class_dev, class_test)
# Intent classification
doc_embeddings = DocumentRNNEmbeddings(
word_embeddings,
hidden_size=512,
reproject_words=True,
reproject_words_dimension=256,
)
classifier = TextClassifier(
doc_embeddings,
label_dictionary=class_corpus.make_label_dictionary(),
multi_label=False,
)
self._logger.debug(
"Intent classifier has %s example(s)", len(class_sentences)
)
trainer = ModelTrainer(classifier, class_corpus)
trainer.train(class_data_dir, max_epochs=max_epochs)
else:
self._logger.info("Skipping intent classifier training")
if ner_sentences:
self._logger.debug("Training %s NER sequence tagger(s)", len(ner_sentences))
# Named entity recognition
stacked_embeddings = StackedEmbeddings(word_embeddings)
for intent_name, intent_ner_sents in ner_sentences.items():
ner_train, ner_dev, ner_test = self._split_data(intent_ner_sents)
ner_corpus = TaggedCorpus(ner_train, ner_dev, ner_test)
tagger = SequenceTagger(
hidden_size=256,
embeddings=stacked_embeddings,
tag_dictionary=ner_corpus.make_tag_dictionary(tag_type="ner"),
tag_type="ner",
use_crf=True,
)
ner_intent_dir = os.path.join(ner_data_dir, intent_name)
os.makedirs(ner_intent_dir, exist_ok=True)
self._logger.debug(
"NER tagger for %s has %s example(s)",
intent_name,
len(intent_ner_sents),
)
trainer = ModelTrainer(tagger, ner_corpus)
trainer.train(ner_intent_dir, max_epochs=max_epochs)
else:
self._logger.info("Skipping NER sequence tagger training")
# -------------------------------------------------------------------------
def _split_data(self, data, split=0.1):
"""Randomly splits a data set into train, dev, and test sets"""
random.shuffle(data)
split_index = int(len(data) * split)
# 1 - (2*split)
train = data[(split_index * 2) :]
# split
dev = data[:split_index]
# split
test = data[split_index : (split_index * 2)]
return train, dev, test
# -----------------------------------------------------------------------------
# Command-line Based Intent Trainer
# -----------------------------------------------------------------------------
@@ -671,6 +446,7 @@ class CommandIntentTrainer(RhasspyActor):
def __init__(self):
RhasspyActor.__init__(self)
self.command: List[str] = []
self.converters: Dict[str, Callable[..., Any]] = {}
def to_started(self, from_state: str) -> None:
"""Transition to started state."""
@@ -682,6 +458,9 @@ class CommandIntentTrainer(RhasspyActor):
for a in self.profile.get("training.intent.command.arguments", [])
]
# Load user-defined converters
self.converters = load_converters(self.profile)
self.command = [program] + arguments
def in_started(self, message: Any, sender: RhasspyActor) -> None:
@@ -700,10 +479,14 @@ class CommandIntentTrainer(RhasspyActor):
self._logger.debug(self.command)
# { intent: [ { 'text': ..., 'entities': { ... } }, ... ] }
sentences_by_intent: Dict[str, Any] = make_sentences_by_intent(intent_fst)
sentences_by_intent = make_sentences_by_intent(intent_fst)
json_sentences = {
intent: [r.asdict() for r in sentences_by_intent[intent]]
for intent in sentences_by_intent
}
# JSON -> STDIN
json_input = json.dumps({sentences_by_intent}).encode()
json_input = json.dumps(json_sentences).encode()
subprocess.run(self.command, input=json_input, check=True)
except Exception:
+85
@@ -10,6 +10,9 @@ from typing import Any, Dict, Iterable, List, Optional, Tuple, Type
from urllib.parse import urljoin
import requests
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types
from rhasspy.actor import RhasspyActor
from rhasspy.events import TranscribeWav, WavTranscription
@@ -25,6 +28,7 @@ def get_decoder_class(system: str) -> Type[RhasspyActor]:
"pocketsphinx",
"kaldi",
"remote",
"google",
"hass_stt",
"command",
], f"Invalid speech to text system: {system}"
@@ -38,6 +42,9 @@ def get_decoder_class(system: str) -> Type[RhasspyActor]:
if system == "remote":
# Use remote Rhasspy server
return RemoteDecoder
if system == "google":
# Use Google Cloud STT service
return GoogleCloudDecoder
if system == "hass_stt":
# Use Home Assistant STT platform
return HomeAssistantSTTIntegration
@@ -320,6 +327,84 @@ class RemoteDecoder(RhasspyActor):
return response.text
# -----------------------------------------------------------------------------
# Google Cloud Speech-to-text decoder
# -----------------------------------------------------------------------------
class GoogleCloudDecoder(RhasspyActor):
"""Forwards speech to text request to Google Cloud STT service"""
def __init__(self) -> None:
RhasspyActor.__init__(self)
self.client = None
self.language_code = None
self.min_confidence: float = 0
def to_started(self, from_state: str) -> None:
"""Transition to started state."""
credentials_file = self.profile.get("speech_to_text.google.credentials")
self.min_confidence = self.profile.get("speech_to_text.google.min_confidence")
self.language_code = self.profile.get("locale").replace('_', '-')
from google.auth import environment_vars
os.environ[environment_vars.CREDENTIALS] = credentials_file
self.client = speech.SpeechClient()
def in_started(self, message: Any, sender: RhasspyActor) -> None:
"""Handle messages in started state."""
if isinstance(message, TranscribeWav):
try:
text, confidence = self.transcribe_wav(message.wav_data)
self._logger.debug(text)
self.send(
message.receiver or sender,
WavTranscription(
text, confidence=confidence, handle=message.handle
),
)
except Exception:
self._logger.exception("transcribing wav")
# Send empty transcription back
self.send(
message.receiver or sender,
WavTranscription("", confidence=0, handle=message.handle),
)
def transcribe_wav(self, wav_data: bytes) -> Tuple[str, float]:
"""Send WAV data to Google Cloud STT and return (transcript, confidence)."""
headers = {"Content-Type": "audio/wav"}
self._logger.debug(
"POSTing %d byte(s) of WAV data to Google Cloud STT", len(wav_data)
)
audio = types.RecognitionAudio(content=wav_data)
config = types.RecognitionConfig(
encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
sample_rate_hertz=16000,
model='command_and_search',
language_code=self.language_code)
response = self.client.recognize(config, audio)
if len(response.results) == 0:
self._logger.debug("No results returned.")
return "", 0
result = response.results[0].alternatives[0]
self._logger.debug("Transcription confidence: %s", result.confidence)
if result.confidence >= self.min_confidence:
return result.transcript, result.confidence
self._logger.warning(
"Transcription did not meet confidence threshold: %s < %s",
result.confidence,
self.min_confidence,
)
return "", 0
# -----------------------------------------------------------------------------
# Kaldi Decoder
# http://kaldi-asr.org
+32 -22
@@ -165,9 +165,7 @@ def train_profile(profile_dir: Path, profile: Profile) -> Tuple[int, List[str]]:
# Check for arguments.
# Slot name retains argument(s).
if "," in slot_name:
parts = slot_name.split(",")
slot_name = parts[0]
slot_args = parts[1:]
slot_name, *slot_args = slot_name.split(",")
else:
slot_args = None
@@ -228,7 +226,7 @@ def train_profile(profile_dir: Path, profile: Profile) -> Tuple[int, List[str]]:
upper_bound = int(match.group(2))
step = 1
if len(match.groups()) > 2:
if len(match.groups()) > 3:
# Exclude ,
step = int(match.group(3)[1:])
@@ -259,7 +257,7 @@ def train_profile(profile_dir: Path, profile: Profile) -> Tuple[int, List[str]]:
n = int(match.group(1))
# 75 -> (seventy five):75!int
number_text = num2words(n, lang=language).replace("-", " ").strip()
number_text = re.sub(r"[-,]\s*", " ", num2words(n, lang=language)).strip()
assert number_text, f"Empty num2words result for {n}"
number_words = number_text.split()
@@ -270,7 +268,7 @@ def train_profile(profile_dir: Path, profile: Profile) -> Tuple[int, List[str]]:
word.converters = ["int"]
return word
# Hard case, split into mutliple Words
# Hard case, split into multiple Words
return jsgf.Sequence(
text=number_text,
type=jsgf.SequenceType.GROUP,
@@ -323,6 +321,19 @@ def train_profile(profile_dir: Path, profile: Profile) -> Tuple[int, List[str]]:
def __setitem__(self, key, value):
self.values[key] = value
# Determine whether word casing has to be fixed
word_transform = None
if word_casing == "upper":
word_transform = str.upper
elif word_casing == "lower":
word_transform = str.lower
def fix_word_case(word):
if isinstance(word, jsgf.Word):
word.text = word_transform(word.text)
return word
# -------------------------------------------------------------------------
def do_intents_to_graph(sentences, slot_names, replacements, targets):
@@ -333,25 +344,11 @@ def train_profile(profile_dir: Path, profile: Profile) -> Tuple[int, List[str]]:
for sentence in intent_sentences:
jsgf.walk_expression(sentence, number_transform, replacements)
# Determine whether word casing has to be fixed
transform = None
if word_casing == "upper":
transform = str.upper
elif word_casing == "lower":
transform = str.lower
if transform:
def fix_case(word):
if isinstance(word, jsgf.Word):
word.text = transform(word.text)
return word
if word_transform:
# Fix casing
for intent_sentences in sentences.values():
for sentence in intent_sentences:
jsgf.walk_expression(sentence, fix_case, replacements)
jsgf.walk_expression(sentence, fix_word_case, replacements)
# Convert to directed graph
graph = intents_to_graph(sentences, replacements)
@@ -379,6 +376,7 @@ def train_profile(profile_dir: Path, profile: Profile) -> Tuple[int, List[str]]:
slot_names.add(slot_name)
# Load slot values
has_slot_program = False
for slot_key in slot_names:
slot_info = find_slot(slot_key)
@@ -390,9 +388,13 @@ def train_profile(profile_dir: Path, profile: Profile) -> Tuple[int, List[str]]:
line = line.strip()
if line:
sentence = jsgf.Sentence.parse(line)
if word_transform:
jsgf.walk_expression(sentence, fix_word_case)
slot_values.append(sentence)
elif isinstance(slot_info, SlotProgramInfo):
# Program that will generate values
has_slot_program = True
slot_values = SlotProgram(slot_info.path, command_args=slot_info.args)
# Replace $slot with sentences
@@ -410,6 +412,7 @@ def train_profile(profile_dir: Path, profile: Profile) -> Tuple[int, List[str]]:
"file_dep": ini_paths + deps,
"targets": [intent_graph],
"actions": [(do_intents_to_graph, [sentences, slot_names, replacements])],
"uptodate": [False if has_slot_program else None],
}
# -----------------------------------------------------------------------------
@@ -523,6 +526,13 @@ def train_profile(profile_dir: Path, profile: Profile) -> Tuple[int, List[str]]:
for word in read_dict(dict_file):
print(word, file=vocab_file)
if profile.get("wake.system", "dummy") == "pocketsphinx":
# Add words from Pocketsphinx wake keyphrase
keyphrase = profile.get("wake.pocketsphinx.keyphrase", "")
if keyphrase:
for word in re.split(r"\s+", keyphrase):
print(word, file=vocab_file)
@create_after(executed="language_model")
def task_vocab():
"""Writes all vocabulary words to a file from intent.fst."""
+1 -1
@@ -91,7 +91,7 @@ def make_dict(
if (i < 1) or no_number:
print(word, pronounce, file=dictionary_file)
else:
print(f"{word, i + 1}({pronounce})", file=dictionary_file)
print(f"{word}({i + 1})", pronounce, file=dictionary_file)
words_in_dict.add(word)
+4 -2
@@ -94,6 +94,7 @@ class EspeakSentenceSpeaker(RhasspyActor):
self.disable_wake = True
self.enable_wake = False
self.wake: Optional[RhasspyActor] = None
self.espeak_args: List[str] = []
def to_started(self, from_state: str) -> None:
"""Transition to started state."""
@@ -104,6 +105,7 @@ class EspeakSentenceSpeaker(RhasspyActor):
self.wake = self.config.get("wake")
self.wake_on_start = self.profile.get("rhasspy.listen_on_start", False)
self.disable_wake = self.profile.get("text_to_speech.disable_wake", True)
self.espeak_args = list(self.profile.get("text_to_speech.espeak.arguments", []))
self.transition("ready")
def in_ready(self, message: Any, sender: RhasspyActor) -> None:
@@ -143,7 +145,7 @@ class EspeakSentenceSpeaker(RhasspyActor):
def speak(self, sentence: str, voice: Optional[str] = None) -> bytes:
"""Get WAV buffer for sentence."""
try:
espeak_cmd = ["espeak"]
espeak_cmd = ["espeak"] + self.espeak_args
if voice:
espeak_cmd.extend(["-v", str(voice)])
@@ -896,7 +898,7 @@ class HomeAssistantSentenceSpeaker(RhasspyActor):
# Convert to WAV
if audio_url.endswith(".mp3"):
lame_command = ["lame", "--decode", "-", "-"]
lame_command = ["lame", "--decode", "--mp3input", "-", "-"]
self._logger.debug(lame_command)
return subprocess.run(
+97 -45
@@ -1,23 +1,24 @@
"""Rhasspy utility functions."""
import collections
import concurrent.futures
import gzip
import io
import itertools
import json
import logging
import math
import os
import random
import re
import subprocess
import threading
import wave
from collections import defaultdict
from pathlib import Path
from typing import (Any, Callable, Dict, Iterable, List, Mapping, Optional,
Set, Tuple)
from typing import Any, Callable, Dict, Iterable, List, Mapping, Optional, Set, Tuple
import pywrapfst as fst
import networkx as nx
import rhasspynlu
from num2words import num2words
WHITESPACE_PATTERN = re.compile(r"\s+")
@@ -329,55 +330,45 @@ def grouper(iterable, n, fillvalue=None):
# -----------------------------------------------------------------------------
def make_sentences_by_intent(intent_fst: fst.Fst) -> Dict[str, Any]:
"""Get all sentences from an FST."""
from rhasspy.train.jsgf2fst import fstprintall, symbols2intent
def make_sentences_by_intent(
intent_graph: nx.DiGraph, num_samples: Optional[int] = None, extra_converters=None
) -> Dict[str, List[Dict[str, Any]]]:
"""Get all sentences from a graph."""
# { intent: [ { 'text': ..., 'entities': { ... } }, ... ] }
sentences_by_intent: Dict[str, Any] = defaultdict(list)
for symbols in fstprintall(intent_fst, exclude_meta=False):
intent = symbols2intent(symbols)
intent_name = intent["intent"]["name"]
sentences_by_intent[intent_name].append(intent)
start_node = None
end_node = None
for node, node_data in intent_graph.nodes(data=True):
if node_data.get("start", False):
start_node = node
elif node_data.get("final", False):
end_node = node
return sentences_by_intent
if start_node and end_node:
break
assert (start_node is not None) and (
end_node is not None
), "Missing start/end node(s)"
# -----------------------------------------------------------------------------
def sample_sentences_by_intent(
intent_fst_paths: Dict[str, str], num_samples: int
) -> Dict[str, Any]:
"""Generate random intents"""
from rhasspy.train.jsgf2fst import fstprintall, symbols2intent
def sample_sentences(intent_name: str, intent_fst_path: str):
rand_fst = fst.Fst.read_from_string(
subprocess.check_output(
["fstrandgen", f"--npath={num_samples}", intent_fst_path]
)
if num_samples is not None:
# Randomly sample
paths = random.sample(
list(nx.all_simple_paths(intent_graph, start_node, end_node)), num_samples
)
else:
# Use generator
paths = nx.all_simple_paths(intent_graph, start_node, end_node)
sentences: List[Dict[str, Any]] = []
for symbols in fstprintall(rand_fst, exclude_meta=False):
intent = symbols2intent(symbols)
sentences.append(intent)
return sentences
# Generate samples in parallel
future_to_intent = {}
with concurrent.futures.ThreadPoolExecutor() as executor:
for intent_name, intent_fst_path in intent_fst_paths.items():
future = executor.submit(sample_sentences, intent_name, intent_fst_path)
future_to_intent[future] = intent_name
# { intent: [ { 'text': ..., 'entities': { ... } }, ... ] }
sentences_by_intent: Dict[str, Any] = {}
for future, intent_name in future_to_intent.items():
sentences_by_intent[intent_name] = future.result()
# TODO: Add converters
for path in paths:
_, recognition = rhasspynlu.fsticuffs.path_to_recognition(
path, intent_graph, extra_converters=extra_converters
)
assert recognition, "Path failed"
sentences_by_intent[recognition.intent.name].append(recognition.asdict())
return sentences_by_intent
@@ -416,7 +407,7 @@ def numbers_to_words(sentence: str, language: Optional[str] = None) -> str:
number = float(word)
# 75 -> seventy-five -> seventy five
words[i] = num2words(number, lang=language).replace("-", " ")
words[i] = re.sub(r"[-,]\s*", " ", num2words(number, lang=language))
changed = True
except ValueError:
pass # not a number
@@ -507,3 +498,64 @@ def get_all_intents(ini_paths: List[Path]) -> Dict[str, Any]:
_LOGGER.exception("Failed to parse %s", ini_paths)
return {}
# -----------------------------------------------------------------------------
class CliConverter:
"""Command-line converter for intent recognition"""
def __init__(self, name: str, command_path: Path):
self.name = name
self.command_path = command_path
def __call__(self, *args, converter_args=None):
"""Runs external program to convert JSON values"""
converter_args = converter_args or []
proc = subprocess.Popen(
[str(self.command_path)] + converter_args,
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
universal_newlines=True,
)
with io.StringIO() as input_file:
for arg in args:
json.dump(arg, input_file)
stdout, _ = proc.communicate(input=input_file.getvalue())
return [json.loads(line) for line in stdout.splitlines() if line.strip()]
def load_converters(profile) -> Dict[str, Any]:
# Load user-defined converters
converters = {}
converters_dir = Path(
profile.read_path(profile.get("intent.fsticuffs.converters_dir", "converters"))
)
if converters_dir.is_dir():
_LOGGER.debug("Loading converters from %s", converters_dir)
for converter_path in converters_dir.glob("**/*"):
if not converter_path.is_file():
continue
# Retain directory structure in name
converter_name = str(
converter_path.relative_to(converters_dir).with_suffix("")
)
# Run converter as external program.
# Input arguments are encoded as JSON on individual lines.
# Output values should be encoded as JSON on individual lines.
converter = CliConverter(converter_name, converter_path)
# Key off name without file extension
converters[converter_name] = converter
_LOGGER.debug("Loaded converter %s from %s", converter_name, converter_path)
return converters
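For illustration only (not part of this commit): the comments in `load_converters` describe a line-based JSON protocol for converter programs, with input values arriving as JSON on individual lines of standard input and output values written as JSON on individual lines of standard output. A minimal sketch of a script that would satisfy that description might look like the following. The function name `run_converter` and the "double the value" behavior are hypothetical, and the sketch assumes each input line holds exactly one JSON value.

```python
import io
import json


def run_converter(stdin, stdout):
    """Read one JSON value per line, write one converted JSON value per line.

    Hypothetical converter body: doubles each numeric input value.
    """
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        value = json.loads(line)
        print(json.dumps(value * 2), file=stdout)


# Demonstration with in-memory streams. A real converter script placed in the
# profile's converters directory would pass sys.stdin and sys.stdout instead.
fake_stdin = io.StringIO("2\n3.5\n")
fake_stdout = io.StringIO()
run_converter(fake_stdin, fake_stdout)
print(fake_stdout.getvalue(), end="")
```

Keeping the conversion in a function that takes explicit streams makes the script easy to exercise without spawning a subprocess.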
+10 -7
@@ -227,18 +227,19 @@ class PocketsphinxWakeListener(RhasspyActor):
self.keyphrase = self.profile.get("wake.pocketsphinx.keyphrase", "")
assert self.keyphrase, "No wake keyphrase"
# Fix casing
dict_casing = self.profile.get("speech_to_text.dictionary_casing", "")
if dict_casing == "lower":
self.keyphrase = self.keyphrase.lower()
elif dict_casing == "upper":
self.keyphrase = self.keyphrase.upper()
# Verify that keyphrase words are in dictionary
keyphrase_words = re.split(r"\s+", self.keyphrase)
with open(dict_path, "r") as dict_file:
word_dict = read_dict(dict_file)
dict_upper = self.profile.get("speech_to_text.dictionary_upper", False)
for word in keyphrase_words:
if dict_upper:
word = word.upper()
else:
word = word.lower()
if word not in word_dict:
self._logger.warning("%s not in dictionary", word)
@@ -570,7 +571,9 @@ class PreciseWakeListener(RhasspyActor):
self.prediction_sem = threading.Semaphore()
for _ in range(num_chunks):
chunk = self.audio_buffer[: self.chunk_size]
self.stream.write(chunk)
if chunk:
self.stream.write(chunk)
self.audio_buffer = self.audio_buffer[self.chunk_size :]
if self.send_not_detected:
+47 -2
@@ -3,7 +3,7 @@
<!-- Top Bar -->
<nav class="navbar navbar-expand-sm navbar-dark bg-dark fixed-top">
<a href="/">
<img class="navbar-brand" v-bind:class="spinnerClass" src="/img/logo.png">
<img id="logo" class="navbar-brand" v-bind:class="spinnerClass" src="/img/logo.png">
</a>
<button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarSupportedContent" aria-controls="navbarSupportedContent" aria-expanded="false" aria-label="Toggle navigation">
@@ -119,6 +119,9 @@
Rhasspy will not work correctly until these files are downloaded.
</p>
<tree-view :data="missingFiles" :options="{ rootObjectKey: 'missing'}"></tree-view>
<br>
<label for="downloadStatus">Status:</label>
<textarea id="downloadStatus" v-model="this.downloadStatus" style="width: 100%;" rows="3"></textarea>
</div>
<div class="modal-footer">
<button type="button" class="btn btn-secondary" data-dismiss="modal">Cancel</button>
@@ -186,7 +189,11 @@
missingFiles: {},
version: ''
version: '',
downloadStatus: '',
wakeSocket: null
}
},
@@ -209,6 +216,9 @@
this.hasAlert = true
this.alertText = text
this.alertClass = 'alert-' + level
// Hide alert after 20 seconds
setTimeout(this.clearAlert, 20000)
},
beginAsync: function() {
@@ -334,6 +344,8 @@
downloadProfile: function() {
this.beginAsync()
this.downloading = true
this.downloadStatus = ''
setTimeout(this.updateDownloadStatus, 1000)
ProfileService.downloadProfile()
.then(() => {
alert("Download is complete. Rhasspy will now restart. Make sure to train before using your profile!")
@@ -344,6 +356,38 @@
this.downloading = false
this.endAsync()
})
},
updateDownloadStatus: function() {
ProfileService.downloadStatus()
.then((request) => {
this.downloadStatus = request.data
})
if (this.downloading) {
setTimeout(this.updateDownloadStatus, 1000)
}
},
connectWakeSocket: function() {
// Connect to /api/events/wake websocket
var wsProtocol = 'ws://'
if (window.location.protocol == 'https:') {
wsProtocol = 'wss://'
}
var wsURL = wsProtocol + window.location.host + '/api/events/wake'
this.wakeSocket = new WebSocket(wsURL)
this.wakeSocket.onmessage = (evt) => {
$('#logo').css('filter', 'invert()')
setTimeout(() => {
$('#logo').css('filter', 'initial')
}, 2000)
}
this.wakeSocket.onclose = () => {
// Try to reconnect
setTimeout(this.connectWakeSocket, 1000)
}
}
},
@@ -355,6 +399,7 @@
this.getCustomWords()
this.getUnknownWords()
this.getProblems()
this.connectWakeSocket()
this.$options.sockets.onmessage = function(event) {
this.rhasspyLog = event.data + '\n' + this.rhasspyLog
}
+1 -1
@@ -12,7 +12,7 @@
<div class="col-auto">
<button type="submit" class="btn btn-success"
v-if="sentences"
:disabled="sentences[newKey] || newKey.length == 0">Add File</button>
:disabled="sentences[newKey] || newKey.length == 0">New File</button>
</div>
</div>
</div>
+35 -1
@@ -20,6 +20,10 @@
title="Record a voice command while held, interpret when released"
:disabled="interpreting || (holdRecording && !tapRecording)">{{ tapRecording ? 'Tap to Stop' : 'Tap to Record' }}</button>
</div>
<div class="col-auto">
<button type="button" class="btn btn-success" @click="this.playLastVoiceCommand"
title="Play last voice command"><i class="fas fa-play"></i></button>
</div>
</div>
</div>
<div class="form-group">
@@ -132,7 +136,9 @@
audioContext: null,
recorder: null,
sendHass: true
sendHass: true,
intentSocket: null
}
},
@@ -267,7 +273,35 @@
event.preventDefault()
PronounceService.saySentence(this.sentence)
.catch(err => this.$parent.error(err))
},
playLastVoiceCommand: function(event) {
TranscribeService.playRecording()
.catch(err => this.$parent.error(err))
},
connectIntentSocket: function() {
// Connect to /api/events/intent websocket
var wsProtocol = 'ws://'
if (window.location.protocol == 'https:') {
wsProtocol = 'wss://'
}
var wsURL = wsProtocol + window.location.host + '/api/events/intent'
this.intentSocket = new WebSocket(wsURL)
this.intentSocket.onmessage = (evt) => {
this.jsonSource = JSON.parse(evt.data)
this.sentence = this.jsonSource.raw_text
}
this.intentSocket.onclose = () => {
// Try to reconnect
setTimeout(this.connectIntentSocket, 1000)
}
}
},
mounted: function() {
this.connectIntentSocket()
}
}
</script>
+11 -11
@@ -108,7 +108,6 @@
},
data: function () {
return {
device: '',
speakers: {}
}
},
@@ -124,20 +123,21 @@
},
computed: {
devicePath: function() {
return 'sounds.' + this.profile.sounds.system + '.device'
device: {
get: function() {
if(this.profile.sounds[this.profile.sounds.system]) {
return this.profile.sounds[this.profile.sounds.system].device;
}
return "";
},
set: function(newValue) {
this.profile.sounds[this.profile.sounds.system].device = newValue;
}
}
},
mounted: function() {
this.getSpeakers()
this.device = this._.get(this.profile, this.devicePath, '')
},
watch: {
device: function() {
this._.set(this.profile, this.devicePath, this.device)
}
this.getSpeakers();
}
}
</script>
+11 -11
@@ -173,7 +173,6 @@
},
data: function () {
return {
device: '',
microphones: {},
testing: false
}
@@ -217,20 +216,21 @@
},
computed: {
devicePath: function() {
return 'microphone.' + this.profile.microphone.system + '.device'
device: {
get: function() {
if(this.profile.microphone[this.profile.microphone.system]) {
return this.profile.microphone[this.profile.microphone.system].device;
}
return "";
},
set: function(newValue) {
this.profile.microphone[this.profile.microphone.system].device = newValue;
}
}
},
mounted: function() {
this.getMicrophones()
this.device = this._.get(this.profile, this.devicePath, '')
},
watch: {
device: function() {
this._.set(this.profile, this.devicePath, this.device)
}
this.getMicrophones();
}
}
</script>
+1 -1
@@ -137,7 +137,7 @@
<div class="form-row">
<label for="remote-handle-url" class="col-form-label">Remote URL</label>
<div class="col">
<input id="remote-handle-url" type="text" class="form-control" v-model="profile.handle.remote.url" :disabled="profile.intent.system != 'remote'">
<input id="remote-handle-url" type="text" class="form-control" v-model="profile.handle.remote.url" :disabled="profile.handle.system != 'remote'">
</div>
</div>
</div>
+4
@@ -70,5 +70,9 @@ export default {
return Api().post('/api/download-profile', '',
{ 'params': params })
},
downloadStatus() {
return Api().get('/api/download-status')
}
}
+4
@@ -37,6 +37,10 @@ export default {
{ params: params })
},
playRecording() {
return Api().post('/api/play-recording', '')
},
wakeup() {
return Api().post('/api/listen-for-command')
}