Upskilling Test Engineers for Chatbot Projects

Mar 26, 2021

If you have a hammer, every problem looks like a nail.

With Botium, we are currently defining the industry standard for testing chatbots. In our support and developer channels, we regularly receive questions like:

I have to test a WhatsApp chatbot, can you help me set up Appium for it?
For our client I have to test a chatbot embedded in their app, can I test it with Botium?
I'm having trouble testing the customer support chatbot on our website, Selenium says <some random Selenium error code>
… and so on

The conclusion to draw from these questions: test engineers have learned how to test websites with Selenium and smartphone apps with Appium, and now they try to apply this valuable knowledge again, overlooking the fact that chatbots are a new kind of app that requires a new kind of tool (like Botium).

You can read about the most important differences in one of my previous blog posts.

With Selenium and Appium, we are talking about End-2-End testing (E2E): simulating the full user experience on a graphical user interface. Those tests

are extremely slow to execute, as they basically run in real time; even a medium-sized chatbot project typically needs a five-figure number of test cases for satisfying test coverage, so running them in an E2E scenario will take hours in the best case
require a large amount of computing resources or access to expensive browser/device cloud services
are flaky, as the required infrastructure is itself error-prone
cannot provide a holistic view of the quality of the test object, as some important assertions, such as pure NLP performance, are technically impossible with E2E testing.
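To put "hours in the best case" into numbers, here is a back-of-the-envelope estimate in Python. The test count, the duration per test conversation and the numbers of parallel sessions are assumptions chosen for illustration, not measurements from a real project, and the function name is hypothetical.

```python
import math


def e2e_wall_clock_hours(test_cases: int, seconds_per_test: float, parallel_sessions: int) -> float:
    """Rough wall-clock estimate for an E2E run (assumes even distribution, no retries, no setup time)."""
    if parallel_sessions < 1:
        raise ValueError("parallel_sessions must be at least 1")
    # Each parallel browser/device session works through its share of tests one after another.
    tests_per_session = math.ceil(test_cases / parallel_sessions)
    return tests_per_session * seconds_per_test / 3600


# Assumed figures, for illustration only: 10,000 test conversations,
# about 10 seconds each when typed and answered through a real user interface.
for sessions in (1, 4, 10):
    hours = e2e_wall_clock_hours(10_000, 10.0, sessions)
    print(f"{sessions} parallel session(s): {hours:.1f} h")
```

Under these assumptions, even ten parallel sessions (each of which costs computing resources or cloud device fees) need close to three hours, and reruns of flaky tests come on top of that.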

So here are my recommendations for test engineers on how to get started when asked to test a chatbot.

API First

The most important metric for a chatbot is: is it able to hold a meaningful conversation with a client? In every chatbot project team there are conversation designers who, well, design the conversations that will make up the final user experience. The chatbot engine is trained (or coded) to provide the logic for these conversations.

Conversation flow as visualized by Botium

And this is the place to start testing: make sure that the conversations are working as designed, from a content perspective. You can read more about conversation flow testing in the Botium docs.

One important skill to have is knowledge of BotiumScript, the scripting language for defining conversation flow test cases.
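The idea behind a conversation flow test case can be sketched in a few lines of Python. The script text is loosely modeled on the `#me` / `#bot` sections of a BotiumScript convo file, but the parser, the `toy_bot` function and the exact-match comparison are hypothetical stand-ins for illustration. This is not Botium's API, and BotiumScript itself offers far more than this sketch.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical test case, loosely modeled on the #me / #bot sections of a BotiumScript convo file.
CONVO = """
greeting and opening hours

#me
hello

#bot
Hi! How can I help you?

#me
when are you open?

#bot
We are open Monday to Friday, 9am to 5pm.
"""


def parse_convo(text: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Return the test case name (first line) and a list of (sender, message) steps."""
    lines = [line.strip() for line in text.strip().splitlines()]
    name = lines[0]
    steps: List[Tuple[str, str]] = []
    sender: Optional[str] = None
    for line in lines[1:]:
        if line in ("#me", "#bot"):
            sender = line[1:]  # "me" or "bot"
        elif line and sender is not None:
            steps.append((sender, line))
    return name, steps


def toy_bot(message: str) -> str:
    """Hypothetical rule-based stand-in for the chatbot under test."""
    words = set(re.findall(r"[a-z']+", message.lower()))
    if words & {"hello", "hi"}:
        return "Hi! How can I help you?"
    if "open" in words:
        return "We are open Monday to Friday, 9am to 5pm."
    return "Sorry, I did not understand that."


def run_convo(bot: Callable[[str], str], steps: List[Tuple[str, str]]) -> List[str]:
    """Send each #me message to the bot and compare its reply with the following #bot step."""
    failures: List[str] = []
    reply: Optional[str] = None
    for index, (sender, message) in enumerate(steps):
        if sender == "me":
            reply = bot(message)
        else:
            # Exact string match only; a real tool offers more flexible matching.
            if reply != message:
                failures.append(f"step {index}: expected {message!r}, got {reply!r}")
            reply = None  # each bot reply may satisfy only one #bot step
    return failures


name, steps = parse_convo(CONVO)
failures = run_convo(toy_bot, steps)
print(name, "->", "passed" if not failures else failures)
```

The point is that the test talks to the bot's logic directly and checks content, with no browser or device in the loop, which is what keeps such tests fast enough to run thousands of them.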

Testing the NLP engine

Most chatbots have some kind of natural language processing (NLP) component as part of the processing pipeline. It enables users to communicate with the chatbot in natural language, and that's what actually makes up a chatbot. As a test engineer, it is your job to explore the limits of the NLP engine, and this requires basic knowledge of machine learning concepts, such as

intents, entities and prediction confidence
accuracy, sensitivity, specificity, precision, recall, F1-score
confusion matrix
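To make these metrics concrete, here is a small pure-Python sketch that builds a confusion matrix from expected vs. predicted intents and derives one-vs-rest precision, recall (which is the same as sensitivity), specificity and F1 per intent, plus overall accuracy. The intent names and predictions are made-up sample data; in a real project the predictions would come from the NLP engine under test. Libraries such as scikit-learn offer the same calculations (`sklearn.metrics.confusion_matrix`, `classification_report`).

```python
from collections import Counter
from typing import Dict, List


def confusion_matrix(expected: List[str], predicted: List[str]) -> Counter:
    """Count (expected intent, predicted intent) pairs; off-diagonal pairs are misclassifications."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    return Counter(zip(expected, predicted))


def per_intent_metrics(expected: List[str], predicted: List[str]) -> Dict[str, Dict[str, float]]:
    """One-vs-rest precision, recall (sensitivity), specificity and F1 for every intent."""
    matrix = confusion_matrix(expected, predicted)
    labels = sorted(set(expected) | set(predicted))
    total = len(expected)
    metrics: Dict[str, Dict[str, float]] = {}
    for label in labels:
        tp = matrix[(label, label)]
        fp = sum(matrix[(other, label)] for other in labels if other != label)  # wrongly predicted as label
        fn = sum(matrix[(label, other)] for other in labels if other != label)  # label missed
        tn = total - tp - fp - fn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = {"precision": precision, "recall": recall,
                          "specificity": specificity, "f1": f1}
    return metrics


def accuracy(expected: List[str], predicted: List[str]) -> float:
    """Share of utterances whose predicted intent matches the expected one."""
    return sum(e == p for e, p in zip(expected, predicted)) / len(expected)


# Made-up sample: expected intents from the test set vs. intents predicted by the NLP engine.
expected = ["greeting", "greeting", "opening_hours", "opening_hours", "opening_hours",
            "cancel_order", "cancel_order", "cancel_order"]
predicted = ["greeting", "opening_hours", "opening_hours", "opening_hours", "cancel_order",
             "cancel_order", "cancel_order", "greeting"]

labels = sorted(set(expected) | set(predicted))
matrix = confusion_matrix(expected, predicted)
print("rows = expected, columns =", labels)
for row in labels:
    print(f"{row:<14}", [matrix[(row, col)] for col in labels])
for label, values in per_intent_metrics(expected, predicted).items():
    print(f"{label:<14}", ", ".join(f"{key}={value:.2f}" for key, value in values.items()))
print(f"accuracy={accuracy(expected, predicted):.3f}")
```

In this sample every intent loses one utterance to another intent, so accuracy is 5/8 = 0.625, while the per-intent numbers show which intents get confused with each other. That is the information a test engineer needs to improve the training data.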

You can read more about these concepts in my blog series Quality Metrics for NLU/Chatbot Training Data:

Quality Metrics for NLU/Chatbot Training Data, Part 1: Confusion Matrix

E2E Smoketest

Testing the end-user experience at the user interface level is an important part of a testing strategy. If you have done the previous steps right, you can now be confident that the conversation flow and the NLP component are doing their work, so it is time to add some user interface testing to the mix. The recommendation is to

run a small number of test cases that cover all of the possible user interaction elements
run those tests on a mix of representative browser versions / operating systems / smartphone devices, both virtual and physical
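Picking a small number of test cases that together cover all interaction elements is essentially a set-cover problem. The Python sketch below uses a simple greedy heuristic (repeatedly take the test case that covers the most still-uncovered elements) and then crosses the selection with the target environments. The test case names, UI element names and environments are hypothetical examples, and the greedy result is not guaranteed to be the smallest possible selection.

```python
from itertools import product
from typing import Dict, List, Set

# Hypothetical E2E test cases and the user interaction elements each one exercises.
TEST_CASES: Dict[str, Set[str]] = {
    "greeting_text_only": {"text_input", "text_reply"},
    "order_status_buttons": {"text_input", "buttons", "text_reply"},
    "product_carousel": {"quick_replies", "carousel", "image"},
    "file_upload_support": {"file_upload", "text_reply"},
    "full_checkout": {"text_input", "buttons", "carousel", "quick_replies"},
}

# Hypothetical mix of virtual and physical target environments.
ENVIRONMENTS = ["Chrome on Windows", "Safari on macOS", "Android emulator", "physical iPhone"]


def select_smoke_tests(test_cases: Dict[str, Set[str]]) -> List[str]:
    """Greedy set cover: add test cases until every interaction element is covered at least once."""
    uncovered: Set[str] = set().union(*test_cases.values())
    selected: List[str] = []
    while uncovered:
        # Pick the test case covering the most still-uncovered elements;
        # iterating in sorted order makes ties resolve alphabetically.
        best = max(sorted(test_cases), key=lambda name: len(test_cases[name] & uncovered))
        selected.append(best)
        uncovered -= test_cases[best]
    return selected


smoke_tests = select_smoke_tests(TEST_CASES)
runs = list(product(smoke_tests, ENVIRONMENTS))
print(f"{len(smoke_tests)} smoke tests x {len(ENVIRONMENTS)} environments = {len(runs)} runs")
for test_name, environment in runs:
    print(f"{test_name} on {environment}")
```

With this sample data, three of the five test cases already cover every element, giving 12 runs instead of 20. The actual runs would still be driven by Selenium or Appium; the sketch only shows the shape of the plan: few test cases, broad UI coverage, many environments.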

The good news: E2E smoke testing is where test engineers can shine with their existing knowledge of Selenium and Appium!

Read the Botium Wiki to learn how to set this up with Botium!

Non-Functional Testing

Finally, there are also non-functional tests, like performance tests and security tests, to add to the test mix. In contrast to the other test types, these are typically run at specific project milestones.

Security Threats and Security Testing for Chatbots

Summary

A new generation of apps such as chatbots requires a new generation of testing tools, like Botium. Test engineers have to develop additional skills for testing conversational interfaces like chatbots.

Get your free Botium Box Mini instance here

Conversation Flow Testing: Identify flaws in the conversation flow before going to production
NLP Testing: Improve your chatbot's understanding
E2E Testing: Verify the end-user experience
Voice Testing: Understand your users on voice channels
Performance Testing: Ensure your chatbot is responsive under high load
Security Testing: Make your chatbot secure
Monitoring: Get notified when problems arise