Giving a Voice to Rasa with Botium Speech

Florian Treml
Published in Nerd For Tech
4 min read, Feb 2, 2021

Voice platforms like Alexa and Google Assistant make it easy to build your own voice experience without going deep into audio processing, because everything is part of the platform. But what if you would rather host the solution yourself and run an assistant on your own website, in your own infrastructure?

The goal of this article is to show how you can build your own voice platform using the Open Source tools Rasa and Botium Speech Processing.

An awesome isometric visualization

Rasa is a developer-friendly and extensible chatbot building tool for self-hosting. Botium Speech Processing is a unified, developer-friendly API to the best available free and Open-Source Speech-To-Text and Text-To-Speech services. Let’s combine the two, but first let’s take a quick look at the architecture.

Architecture

  1. The user speaks into a microphone
  2. A Speech-To-Text service transcribes the audio into text (Botium Speech Processing)
  3. An NLU engine extracts information from the text (Rasa)
  4. A dialogue engine builds a text response (Rasa)
  5. A Text-To-Speech service converts the text response into speech (Botium Speech Processing)
  6. The user listens to the audio file
Picture is from this Rasa blog post
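
To make the data flow concrete, the six steps can be sketched as a single function that handles one conversational turn. This is only an illustration: speech_to_text, rasa_reply and text_to_speech are hypothetical callables standing in for the Botium Speech Processing and Rasa calls, they are not real APIs of either project.

```python
from typing import Callable


def voice_turn(
    audio_in: bytes,
    speech_to_text: Callable[[bytes], str],
    rasa_reply: Callable[[str], str],
    text_to_speech: Callable[[str], bytes],
) -> bytes:
    """Run one conversational turn: audio in, audio out."""
    user_text = speech_to_text(audio_in)  # step 2: Speech-To-Text
    bot_text = rasa_reply(user_text)      # steps 3 and 4: NLU and dialogue
    return text_to_speech(bot_text)       # step 5: Text-To-Speech


# Stand-in implementations (hypothetical), so the sketch runs without any service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_rasa(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


reply_audio = voice_turn(b"hello", fake_stt, fake_rasa, fake_tts)
```

In the real setup, the first and last callables would talk to Botium Speech Processing and the middle one to Rasa; the browser takes care of steps 1 and 6.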

Installation Steps

So let’s get to the fun part.

Prerequisites

Here is what you need to have available on your workstation:

  • Git client
  • Docker and Docker-Compose

Launch Botium Speech Processing Service

Botium Speech Processing comes with a reasonable default configuration.

The Speech-To-Text and Text-To-Speech engines it uses by default are both free and Open Source, which makes them a good match for getting started with voice technologies. They are also among the best free voice tools available.

Launching it can be done with a few command line calls.

$ git clone https://github.com/codeforequity-at/botium-speech-processing.git
$ cd botium-speech-processing
$ docker-compose up -d

Depending on network speed and hardware this step can take a while.

Pointing your browser to http://localhost will show the API explorer for Botium Speech Processing.
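
Since the first start can take a while, it may be handy to poll the service from a script instead of reloading the browser. The following is a minimal sketch using only the Python standard library. The helper names wait_until_up and http_status are my own, and treating an HTTP 200 on http://localhost as "ready" is an assumption, not something the project documents.

```python
import time
import urllib.request


def http_status(url: str) -> int:
    """Fetch `url` and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.status


def wait_until_up(url: str, attempts: int = 30, delay: float = 2.0, fetch=http_status) -> bool:
    """Poll `url` until it answers with HTTP 200; give up after `attempts` tries."""
    for attempt in range(attempts):
        try:
            if fetch(url) == 200:
                return True
        except OSError:
            # URLError and connection errors are OSError subclasses:
            # the service is simply not reachable yet.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Usage (needs the containers to be starting or running):
#   wait_until_up("http://localhost")
```

The fetch parameter exists only to make the loop testable without a running service.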

Setup Rasa

We will use Sara, the Rasa Demo Bot, as an example.

You can find first-hand information in the GitHub repository.

I prefer to use Docker instead of installing everything locally. So you can use these command line calls to download the Rasa demo bot and run a first training:

$ git clone https://github.com/RasaHQ/rasa-demo.git
$ cd rasa-demo
$ docker run --rm -v "$(pwd)":/app rasa/rasa:latest-full train --domain domain.yml --data data/core data/nlu --out models/dialogue --augmentation 0

Depending on network speed and hardware this step can take a while.

Place this docker-compose.yml file into the Rasa folder:

version: '3.0'
services:
  rasa:
    image: rasa/rasa:latest-full
    ports:
      - 5005:5005
    volumes:
      - ./:/app
    environment:
      RASA_DUCKLING_HTTP_URL: http://rasa-duckling:8000
    command: run --model models/dialogue --endpoints endpoints.yml
  rasa-actions:
    build:
      context: .
    ports:
      - 5055:5055
  rasa-duckling:
    image: rasa/duckling
    ports:
      - 8000:8000

In the file endpoints.yml, change the action endpoint URL from http://localhost:5055/webhook to http://rasa-actions:5055/webhook, so that Rasa reaches the action server through its Docker Compose service name. Now launch the Rasa service:

$ docker-compose up -d

The Rasa service is now waiting for connections.
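
If you repeat this setup often, the endpoints.yml change mentioned above can be scripted. The following is a minimal sketch that does a plain text replacement of the two URLs quoted in this article. The function names are made up for illustration, and the script assumes the old URL appears literally in the file.

```python
from pathlib import Path

OLD_URL = "http://localhost:5055/webhook"
NEW_URL = "http://rasa-actions:5055/webhook"


def point_actions_to_container(endpoints_text: str) -> str:
    """Return the endpoints.yml content with the action server URL swapped."""
    return endpoints_text.replace(OLD_URL, NEW_URL)


def patch_file(path: str = "endpoints.yml") -> None:
    """Rewrite the file in place; running it twice changes nothing further."""
    file = Path(path)
    file.write_text(point_actions_to_container(file.read_text()))


# Usage, from inside the rasa-demo folder:
#   patch_file()
```

A text replacement keeps comments and formatting of the file intact, which a YAML load and dump round trip would not.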

Add Voice Capabilities to Rasa

This GitHub repository includes a custom connector, based on Rasa’s built-in Socket.io connector, which adds Speech-To-Text and Text-To-Speech capabilities to Rasa.

First, clone the repository and copy the connectors folder to the Rasa folder:

$ git clone https://github.com/codeforequity-at/botium-speech-processing.git
$ cd botium-speech-processing
$ cp -R connectors <rasa-dir>

In the file connectors/rasa/credentials.yml, there is a sample configuration for the Rasa custom connector.

You can either use this file directly or copy the configuration of the botium.SocketIOVoiceInput connector to your existing Rasa credentials.yml.

Change the file to point to your local workstation for speech processing (it also starts a REST connector for convenience and other tests):

botium.SocketIOVoiceInput:
  socketio_path: /socket.io
  user_message_evt: user_uttered
  bot_message_evt: bot_uttered
  session_persistence: false
  botium_speech_url: http://localhost
  botium_speech_apikey:
  botium_speech_language: en
  botium_speech_voice: dfki-poppy-hsmm
rest:
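
The botium_speech_* settings tell the connector where to send audio and text. The sketch below shows how such values could be combined into request URLs. The /api/stt/<language> and /api/tts/<language> paths and the text and voice query parameters are assumptions on my side; check the API explorer at http://localhost for the actual routes before relying on them.

```python
from urllib.parse import quote, urlencode

# Values copied from the sample credentials in this article.
config = {
    "botium_speech_url": "http://localhost",
    "botium_speech_language": "en",
    "botium_speech_voice": "dfki-poppy-hsmm",
}


def stt_url(cfg: dict) -> str:
    """URL to POST audio to for transcription (route is an assumption)."""
    base = cfg["botium_speech_url"].rstrip("/")
    return f"{base}/api/stt/{quote(cfg['botium_speech_language'])}"


def tts_url(cfg: dict, text: str) -> str:
    """URL to GET synthesized speech for `text` (route is an assumption)."""
    base = cfg["botium_speech_url"].rstrip("/")
    query = urlencode({"text": text, "voice": cfg["botium_speech_voice"]})
    return f"{base}/api/tts/{quote(cfg['botium_speech_language'])}?{query}"


example = tts_url(config, "Hello Rasa")
```

Keeping the URL construction in small functions makes it easy to point the same code at another host or language later.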

Then, change the docker-compose.yml file for Rasa to use this connector.

version: '3.0'
services:
  rasa:
    image: rasa/rasa:latest-full
    ports:
      - 5005:5005
    volumes:
      - ./:/app
    environment:
      PYTHONPATH: "/app/connectors/rasa:/app"
      RASA_DUCKLING_HTTP_URL: http://rasa-duckling:8000
    command: run --cors "*" --credentials /app/connectors/rasa/credentials.yml --enable-api --model models/dialogue --endpoints endpoints.yml
  rasa-actions:
    build:
      context: .
    ports:
      - 5055:5055
  rasa-duckling:
    image: rasa/duckling
    ports:
      - 8000:8000

Restart Rasa to apply the changes to your Docker containers.

$ docker-compose up -d
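
Because the sample credentials also enable the REST connector, the text side of the bot can be checked before any audio is involved. Here is a minimal sketch, assuming Rasa's standard REST channel route /webhooks/rest/webhook on the published port 5005. The sender id "tester" and the function names are arbitrary choices of mine.

```python
import json
import urllib.request


def build_rest_request(message: str, sender: str = "tester",
                       base_url: str = "http://localhost:5005") -> urllib.request.Request:
    """Prepare a POST request for Rasa's REST channel."""
    payload = json.dumps({"sender": sender, "message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/webhooks/rest/webhook",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_text(message: str) -> list:
    """Send `message` to Rasa and return the decoded JSON list of bot replies."""
    with urllib.request.urlopen(build_rest_request(message), timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (needs the Rasa container to be running):
#   for reply in send_text("hello"):
#       print(reply.get("text"))
```

If this returns sensible text replies, any remaining problems are most likely on the speech side rather than in the bot itself.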

Testing

There is a simple test client based on the Rasa Voice Interface available in the Botium Speech Processing project.

In the connectors/rasa/client directory, change the Rasa endpoint in the docker-compose.yml file:

version: '3'
services:
  frontend:
    build:
      context: .
      args:
        RASA_ENDPOINT: http://localhost:5005
        RASA_PATH: /socket.io
        PUBLIC_PATH: /
    image: botium/botium-speech-rasa-voice
    restart: always
    ports:
      - 4700:8080

Then launch the website with “docker-compose up -d” and open the web interface at http://localhost:4700 to chat with your Rasa chatbot.

The voice interface in action

Now it is time to turn on your microphone and speakers and have a chat with Rasa!

See this article in Spanish here! 🇪🇸
