Giving a Voice to Rasa with Botium Speech

Published in Nerd For Tech, Feb 2, 2021

Voice platforms like Alexa and Google Assistant make it easy to build your own voice experience without going deep into audio processing, because everything is part of the platform. But what if you would rather host the solution yourself and run an assistant on your own website, in your own infrastructure?

The goal of this article is to show how you can build your own voice platform using the Open Source tools Rasa and Botium Speech Processing.

Rasa is a developer-friendly and extensible chatbot building tool for self-hosting. Botium Speech Processing is a unified, developer-friendly API to the best available free and Open Source Speech-To-Text and Text-To-Speech services. Let’s combine the two, but first let’s take a quick look at the architecture.

Architecture

User speaks into a microphone
A Speech-To-Text service transcribes the audio into text (Botium Speech Processing)
An NLU engine extracts information from the text (Rasa)
A dialogue engine builds a text response (Rasa)
A Text-To-Speech service converts the response into speech (Botium Speech Processing)
User listens to the audio file
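
The steps above can be read as one function per stage. The following is a minimal Python sketch of a single voice turn, not the actual connector code: the names voice_turn, speech_to_text, ask_bot and text_to_speech are hypothetical, and the stand-in stages at the bottom only imitate what Botium Speech Processing and Rasa would do.

```python
from typing import Callable, List


def voice_turn(
    audio_in: bytes,
    speech_to_text: Callable[[bytes], str],
    ask_bot: Callable[[str], List[str]],
    text_to_speech: Callable[[str], bytes],
) -> List[bytes]:
    """Run one user utterance through the pipeline and return audio replies."""
    user_text = speech_to_text(audio_in)  # Speech-To-Text stage
    replies = ask_bot(user_text)          # NLU and dialogue stage
    # Text-To-Speech stage: one audio clip per text reply
    return [text_to_speech(reply) for reply in replies]


# Hypothetical stand-ins so the sketch runs without any service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio bytes are the transcript


def fake_bot(text: str) -> List[str]:
    return [f"You said: {text}"]


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")  # pretend the text bytes are the audio


audio_out = voice_turn(b"hello", fake_stt, fake_bot, fake_tts)
```

Passing the stages in as functions keeps the flow independent of the concrete services, so the stand-ins could later be swapped for real HTTP calls.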

Installation Steps

Now for the fun part.

Prerequisites

Here is what you need to have available on your workstation:

Git client
Docker and Docker-Compose

Launch Botium Speech Processing Service

Botium Speech Processing comes with a reasonable default configuration:

MaryTTS for Text-To-Speech
Kaldi for Speech-To-Text

Both of them are free and Open Source, which makes them a good match for getting started with voice technologies. They are also, without a doubt, among the best free voice tools available.

Launching it can be done with a few command line calls.

$ git clone https://github.com/codeforequity-at/botium-speech-processing.git
$ cd botium-speech-processing
$ docker-compose up -d

Depending on network speed and hardware this step can take a while.

Pointing your browser to http://localhost will show the API explorer for Botium Speech Processing.
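
Once the API explorer is reachable, the service can also be called from code. The following Python sketch requests synthesized speech over HTTP. It rests on assumptions that should be checked against the API explorer: a GET endpoint at /api/tts/{language} with text and voice query parameters, and a WAV response. The helper names tts_url and synthesize are hypothetical.

```python
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost"  # where the article's docker-compose setup listens


def tts_url(text: str, language: str = "en", voice: Optional[str] = None,
            base_url: str = BASE_URL) -> str:
    """Build the request URL (endpoint path and parameter names are assumed)."""
    params = {"text": text}
    if voice:
        params["voice"] = voice
    return f"{base_url}/api/tts/{quote(language)}?{urlencode(params)}"


def synthesize(text: str, out_path: str = "tts.wav", **kwargs) -> str:
    """Fetch the audio for `text` and write it to `out_path`."""
    request = Request(tts_url(text, **kwargs), headers={"Accept": "audio/wav"})
    with urlopen(request, timeout=60) as response, open(out_path, "wb") as out:
        out.write(response.read())
    return out_path
```

With the containers running, synthesize("Hello from Botium") should leave a tts.wav file in the current directory, provided the assumed endpoint matches your version of the service.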

Set Up Rasa

We will use Sara, the Rasa Demo Bot, as an example.

You can find first-hand information in its GitHub repository.

I prefer to use Docker instead of installing everything locally. So you can use these command line calls to download the Rasa demo bot and run a first training:

$ git clone https://github.com/RasaHQ/rasa-demo.git
$ cd rasa-demo
$ docker run --rm -v "$(pwd)":/app rasa/rasa:latest-full train --domain domain.yml --data data/core data/nlu --out models/dialogue --augmentation 0

Depending on network speed and hardware this step can take a while.

Place this docker-compose.yml file into the Rasa folder:

version: '3.0'
services:
  rasa:
    image: rasa/rasa:latest-full
    ports:
      - 5005:5005
    volumes:
      - ./:/app
    environment:
      RASA_DUCKLING_HTTP_URL: http://rasa-duckling:8000
    command: run --model models/dialogue --endpoints endpoints.yml
  rasa-actions:
    build:
      context: .
    ports:
      - 5055:5055
  rasa-duckling:
    image: rasa/duckling
    ports:
      - 8000:8000

In the file endpoints.yml, change the actions endpoint URL from http://localhost:5055/webhook to http://rasa-actions:5055/webhook. Now launch the Rasa service:

$ docker-compose up -d

The Rasa service is now waiting for connections.
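
The containers can take a moment to start, so it helps to poll before moving on. Here is a small Python sketch that retries an HTTP GET until the server answers. It assumes the Rasa root URL http://localhost:5005/ responds with a 2xx status once the server is up. The helper names http_ok and wait_until_up are hypothetical.

```python
import http.client
import time
from typing import Callable, Optional
from urllib.request import urlopen


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET on `url` answers with a 2xx status."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts and HTTP errors all mean "not ready".
        return False


def wait_until_up(url: str, attempts: int = 30, delay: float = 2.0,
                  probe: Callable[[str], bool] = http_ok,
                  sleep: Callable[[float], None] = time.sleep) -> Optional[int]:
    """Poll `url`; return the attempt number that succeeded, or None."""
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        if attempt < attempts:
            sleep(delay)  # wait before the next try
    return None
```

Calling wait_until_up("http://localhost:5005/") blocks for up to about a minute with the defaults. The same helper should work for the Botium Speech Processing service at http://localhost.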

Add Voice Capabilities to Rasa

The Botium Speech Processing GitHub repository includes a custom connector, based on the built-in Rasa Socket.IO connector, which adds Speech-To-Text and Text-To-Speech capabilities to Rasa.

First, clone the repository and copy the connectors folder to the Rasa folder:

$ git clone https://github.com/codeforequity-at/botium-speech-processing.git
$ cd botium-speech-processing
$ cp -R connectors <rasa-dir>

In the file connectors/rasa/credentials.yml, there is a sample configuration for the Rasa custom connector.

You can either use this file directly or copy the configuration of the botium.SocketIOVoiceInput connector to your existing Rasa credentials.yml.

Change the file to point to your local workstation for speech processing (it also starts a REST connector for convenience and other tests):

botium.SocketIOVoiceInput:
  socketio_path: /socket.io
  user_message_evt: user_uttered
  bot_message_evt: bot_uttered
  session_persistence: false
  botium_speech_url: http://localhost
  botium_speech_apikey:
  botium_speech_language: en
  botium_speech_voice: dfki-poppy-hsmm
rest:

Then, change the docker-compose.yml file for Rasa to use this connector.

version: '3.0'
services:
  rasa:
    image: rasa/rasa:latest-full
    ports:
      - 5005:5005
    volumes:
      - ./:/app
    environment:
      PYTHONPATH: "/app/connectors/rasa:/app"
      RASA_DUCKLING_HTTP_URL: http://rasa-duckling:8000
    command: run --cors "*" --credentials /app/connectors/rasa/credentials.yml --enable-api --model models/dialogue --endpoints endpoints.yml
  rasa-actions:
    build:
      context: .
    ports:
      - 5055:5055
  rasa-duckling:
    image: rasa/duckling
    ports:
      - 8000:8000

Restart Rasa to apply the changes to your Docker containers.

$ docker-compose up -d
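
Because the credentials file also enables the REST connector, a quick text-only smoke test is possible before any audio is involved. The Python sketch below posts a message to the standard Rasa REST channel path /webhooks/rest/webhook. The base URL http://localhost:5005 follows the port mapping above, while the sender id "tester" and the helper names are hypothetical.

```python
import json
from typing import Dict, List
from urllib.request import Request, urlopen


def build_rest_request(message: str, sender: str = "tester",
                       base_url: str = "http://localhost:5005") -> Request:
    """Create the POST request for the Rasa REST channel."""
    payload = json.dumps({"sender": sender, "message": message}).encode("utf-8")
    return Request(
        f"{base_url}/webhooks/rest/webhook",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reply_texts(responses: List[Dict]) -> List[str]:
    """Keep only the text replies; the bot may also send images or buttons."""
    return [item["text"] for item in responses if "text" in item]


def ask_rasa(message: str, **kwargs) -> List[str]:
    """Send one message and return the bot's text replies."""
    with urlopen(build_rest_request(message, **kwargs), timeout=30) as response:
        return reply_texts(json.load(response))
```

If ask_rasa("hello") returns a list of replies, the model and the action server are wired up correctly, and any remaining problems are likely on the speech side.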

Testing

There is a simple test client based on the Rasa Voice Interface available in the Botium Speech Processing project.

In the connectors/rasa/client directory, change the Rasa endpoint in the docker-compose.yml file:

version: '3'
services:
  frontend:
    build:
      context: .
      args:
        RASA_ENDPOINT: http://localhost:5005
        RASA_PATH: /socket.io
        PUBLIC_PATH: /
    image: botium/botium-speech-rasa-voice
    restart: always
    ports:
      - 4700:8080

Then launch the website with “docker-compose up -d” and access the web interface at http://localhost:4700 to chat with your Rasa chatbot.

Now it is time to turn on your microphone and speakers and have a chat with Rasa!
