Black-Box Testing Chatterbots

Black-Box Testing

When using a software application, a user rarely reads, or even has access to, the source code. The only observable parts of the system are what goes in (i.e., input, such as clicking, mouse movement, or data entry) and what comes out (i.e., output, such as video, printouts, gameplay, or text on the screen). To the user, the software itself is a featureless “black box.” She may have no idea how the software works, but she can certainly discern what it does.

So, given a black box, how might one check whether it works properly? The user typically knows not only what it does but also what she expects it to do (what it should do, or what it is acceptable for it to do). Feeding data to a system and comparing the output with those expectations is called black-box testing.

Black-box testing may seem like an unnecessary compromise for software whose source code is visible. Consider the millions of Scratch programs on the Scratch website. Why go to the trouble of black-box testing when you can just examine the code blocks?

Well, errors are often caught when a system is used in a way not originally intended, or when specific boundary cases occur. In order to find unknown errors, it is often useful to abstract away the inner functionality of a system (i.e., how it works) and concentrate on its input and output (i.e., what it does). Comparing expected and actual results can spotlight which component of the system is at fault.
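The feed-inputs, compare-to-expectations loop described above can be sketched in a few lines of Python. This is a minimal sketch, not a standard tool: `black_box_test`, `toy_bot`, `non_empty_reply`, and `CASES` are hypothetical names. Each expectation is written as a check function rather than one exact answer, because a chatterbot usually has many acceptable replies. The empty and whitespace-only inputs are the kind of boundary cases mentioned above.

```python
from typing import Callable

# A check decides whether an observed output is acceptable.
Check = Callable[[str], bool]

def black_box_test(system: Callable[[str], str],
                   cases: list[tuple[str, Check]]) -> list[tuple[str, str]]:
    """Feed each input to `system` and return the (input, output) pairs
    whose output fails its check. Only inputs and outputs are used."""
    failures = []
    for given, is_acceptable in cases:
        try:
            actual = system(given)
        except Exception as err:  # a crash is an observable output too
            actual = f"<error: {type(err).__name__}>"
        if not is_acceptable(actual):
            failures.append((given, actual))
    return failures

def toy_bot(message: str) -> str:
    # Hypothetical system under test; a black-box tester never reads this.
    first_word = message.split()[0]  # raises IndexError when there are no words
    return f"You said {first_word}?"

def non_empty_reply(out: str) -> bool:
    return out != "" and not out.startswith("<error")

CASES = [
    ("Hello there", lambda out: "Hello" in out),  # ordinary input
    ("", non_empty_reply),                        # boundary: empty input
    ("   ", non_empty_reply),                     # boundary: whitespace only
]

if __name__ == "__main__":
    for given, actual in black_box_test(toy_bot, CASES):
        print(f"input {given!r} -> unexpected output {actual!r}")
```

Running it reports the empty and whitespace-only inputs, both of which make the toy bot crash; the ordinary greeting passes. The tester finds the fault without ever reading `toy_bot`.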


In this jigsaw activity, you will experiment with different chatterbots in order to better understand their behavior patterns.

Specifically, you will:

  1. Work with students from other groups to perform a black-box test on one of the chatterbots below. To black-box test the chatterbot, ask it a list of questions, record its responses, and compare them with reasonable answers. Using these responses, form a theory of how the chatterbot works.
  2. Share what you learned about that particular chatterbot with your project group.
  3. Outline a general method for discovering patterns in a chatterbot’s automated reasoning.

Jigsaw instructions

Your teacher will assign each member of your group a number. You will sit with students from other groups who were assigned the same number (e.g., all students assigned #1 sit together). Each numbered group will analyze the chatterbot that corresponds to its number:

  1. Eliza
  2. Cleverbot
  3. Program-O

Part I. In your numbered groups:

  • Black-box test your assigned chatterbot in order to figure out how it functions (10–15 min.).
  • Create a list of questions to ask the chatterbot.
  • Ask your assigned chatterbot the questions and record its responses.
  • Identify anomalies (“mistakes”) that the chatterbot produces. Note: An anomaly is any response from the bot that you did not expect.
  • Formulate and test a hypothesis for the reasoning behind each problematic response. Revise and retest your hypothesis as needed.
  • Prepare to share your theory, analysis, and evidence with your project group.

For example:


I hypothesize that Dr. Romulon cannot easily parse the negation when it is phrased as “do YOU not.” It seems to pick up the pattern “don’t,” which is a single word. Perhaps it would understand “do not” if the two words were collocated (i.e., placed side by side) where “don’t” would normally appear, even though that sounds awkward to a human speaker.


Human: What do you do?

’Bot: I talk to people on the web. What do you do?

Human: What do you not do?

’Bot: I like to meet new people online.

Human: What don’t you do?

’Bot: There are many things still mysterious to me. I am just beginning.

Human: What do not you do?

’Bot: There are many things still mysterious to me. I am just beginning.


This pattern supports the hypothesis. Of course, this is just one anecdote, and it may be that other cases of “do YOU not” are hand-coded to work properly.
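The hypothesis can be made concrete with a toy keyword matcher. The rules below are invented for illustration (Dr. Romulon's actual code is not visible to us); they assume the bot checks an ordered list of patterns and recognizes negation only when “do not” or “don’t” appears as adjacent words. Under that assumption, this hypothetical `reply` function reproduces all four responses in the transcript, which is exactly what a good black-box hypothesis should be able to do.

```python
import re

# Hypothetical rules, checked in order; the first pattern that matches wins.
RULES = [
    # Negation is only recognized when "do" and "not" are adjacent (or "don't").
    (re.compile(r"\bdon'?t\b|\bdo not\b"),
     "There are many things still mysterious to me. I am just beginning."),
    (re.compile(r"^what do you do\b"),
     "I talk to people on the web. What do you do?"),
    # Catch-all for anything else containing "do you".
    (re.compile(r"\bdo you\b"),
     "I like to meet new people online."),
]
DEFAULT = "Tell me more."

def reply(message: str) -> str:
    # Normalize case and curly apostrophes before matching.
    text = message.lower().replace("\u2019", "'").strip()
    for pattern, response in RULES:
        if pattern.search(text):
            return response
    return DEFAULT

if __name__ == "__main__":
    for question in ["What do you do?", "What do you not do?",
                     "What don\u2019t you do?", "What do not you do?"]:
        print(f"Human: {question}\n'Bot: {reply(question)}\n")
```

Because “you” sits between “do” and “not” in “What do you not do?”, the negation rule never fires and the input falls through to the generic “do you” rule.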

Part II. In your project groups:

Reconvene in your project groups.

  1. Share what you learned about each chatterbot with your team members.
  2. Outline a method for discovering patterns in any chatterbot’s automated reasoning.
    • Include specific steps.
    • Test your method. Does it reveal patterns in all of the chatterbots’ responses? If not, in which ones, and why?
    • If necessary, revise your method and your theories of how the chatterbots function to accommodate any new discoveries.
    • Be prepared to discuss your work with the class.
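One possible starting point for such a method is a minimal-pair probe: change exactly one word of a question at a time and group the inputs by the response they produce. Inputs that share a response suggest the changed word was ignored; a change that alters the response points to a trigger word. This sketch assumes the bot can be called as a Python function (for a web-based bot you would type the generated questions by hand), and `one_word_variants`, `group_by_response`, and `toy_bot` are hypothetical names.

```python
from collections import defaultdict
from typing import Callable

def one_word_variants(base: str, position: int, substitutes: list[str]) -> list[str]:
    """Return copies of `base` with the word at `position` replaced by each substitute."""
    words = base.split()
    variants = []
    for sub in substitutes:
        changed = words.copy()
        changed[position] = sub
        variants.append(" ".join(changed))
    return variants

def group_by_response(bot: Callable[[str], str], inputs: list[str]) -> dict[str, list[str]]:
    """Map each distinct response to the list of inputs that produced it."""
    groups = defaultdict(list)
    for message in inputs:
        groups[bot(message)].append(message)
    return dict(groups)

def toy_bot(message: str) -> str:
    # Hypothetical bot under test: it reacts only to the word "cat".
    return "I love cats!" if "cat" in message.lower().split() else "Interesting."

if __name__ == "__main__":
    probes = one_word_variants("I have a cat", 3, ["cat", "dog", "fish", "CAT"])
    for response, inputs in group_by_response(toy_bot, probes).items():
        print(f"{response!r} <- {inputs}")
```

For this toy bot, the grouping shows that “cat” triggers a special reply regardless of capitalization, while “dog” and “fish” are treated identically.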

Part III. As a class:

Discuss your findings. Address the following questions:

  1. “What were some question/answer pairs that gave you clues about how the chatterbots worked?”
  2. “Which chatterbot was the most effective? Why?”
  3. “How might you expose the chatterbot as non-human? Would this strategy always work?”
  4. “What could you do differently in creating a chatterbot, so that it might seem more human?”

Unspecified Input

Common Misconception

Bots are preprogrammed with specific question/response pairs.

This is usually false. An easy way to test it is to substitute one word for another and compare the responses. To illustrate, consider the following trivial example:

H: “What is your name?”
’Bot: “Hello, Dave. I’m HAL.”

H: “What is your name?”
’Bot: “Hello, Frank. I’m HAL.”

A hard-coded question/answer response system would require a separate routine to handle each possible name—or even each possible wording/phrasing of a question (e.g., “What is your name?” vs. “What’s your name?”).

How might a program make use of input that its developers did not specify in advance? Think programmatically (e.g., algorithmically, with Scratch blocks or Processing scripts in mind).
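One possible answer, sketched in Python: match the input against a pattern containing a wildcard, capture whatever text fills the wildcard, store it, and substitute it into later responses. This is similar in spirit to the wildcard patterns used by Eliza-style and AIML-based bots, but `TinyBot`, `NAME_IS`, and `ASK_NAME` are hypothetical. The sketch also assumes the user stated their name earlier in the conversation, which the HAL example above does not show.

```python
import re

class TinyBot:
    """Hypothetical bot that remembers a captured name between turns."""

    # "(\w+)" is the wildcard: it captures whatever word the user supplies.
    NAME_IS = re.compile(r"\bmy name is (\w+)", re.IGNORECASE)
    # One pattern covers several wordings: "What is", "What's", "What’s".
    ASK_NAME = re.compile(r"\bwhat(?: is|'s|\u2019s) your name\b", re.IGNORECASE)

    def __init__(self) -> None:
        self.user_name = "stranger"  # fallback until the user gives a name

    def reply(self, message: str) -> str:
        match = self.NAME_IS.search(message)
        if match:
            # The captured word is input the developer never listed in advance.
            self.user_name = match.group(1)
            return f"Nice to meet you, {self.user_name}."
        if self.ASK_NAME.search(message):
            return f"Hello, {self.user_name}. I\u2019m HAL."
        return "I'm sorry, I can't answer that."

if __name__ == "__main__":
    for name in ["Dave", "Frank"]:
        bot = TinyBot()
        print(bot.reply(f"My name is {name}."))
        print(bot.reply("What is your name?"))
```

A single capture rule handles every possible name, and a single alternation handles several phrasings of the question, so no separate routine is needed for each case.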