The bAbI Dataset

The bAbI (pronounced “baby”) dataset is a collection of tasks intended for researchers who work in natural language processing, in particular on “QA”, which here means question answering (not “quality assurance”).

Here’s an example of a bAbI “single supporting fact” task:

1 Mary moved to the bathroom.
2 John went to the hallway.
3 Where is Mary? 	bathroom	1
4 Daniel went back to the hallway.
5 Sandra moved to the garden.
6 Where is Daniel? 	hallway	4
7 John moved to the office.
8 Sandra journeyed to the bathroom.
9 Where is Daniel? 	hallway	4
10 Mary moved to the hallway.
11 Daniel travelled to the office.
12 Where is Daniel? 	office	11
13 John went back to the garden.
14 John moved to the bedroom.
15 Where is Sandra? 	bathroom	8

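The number (or numbers) after the answer to a question identify the statement(s) needed to answer it, usually called the supporting facts. On each question line, the question, the answer, and the supporting-fact numbers are separated by tabs.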
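To make the line format concrete, here is a minimal Python sketch of a parser for it. It assumes the tab-separated layout shown above and assumes that a new story begins whenever the line number resets to 1. The class and function names (Statement, Question, parse_babi_line, parse_babi) are my own, not part of any official bAbI tooling.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Statement:
    line_id: int
    text: str


@dataclass
class Question:
    line_id: int
    text: str
    answer: str
    supporting: list[int]  # ids of the supporting statements


def parse_babi_line(line: str) -> Statement | Question:
    """Parse '<id> <text>' or a tab-separated '<id> <question>', answer, ids line."""
    id_str, rest = line.strip().split(" ", 1)
    line_id = int(id_str)
    if "\t" in rest:  # only question lines contain tabs
        question, answer, support = rest.split("\t")
        return Question(line_id, question.strip(), answer.strip(),
                        [int(s) for s in support.split()])
    return Statement(line_id, rest)


def parse_babi(text: str) -> list[list[Statement | Question]]:
    """Group lines into stories; assumes a story restarts when the id is 1."""
    stories: list[list[Statement | Question]] = []
    for raw in text.splitlines():
        if not raw.strip():
            continue
        item = parse_babi_line(raw)
        if item.line_id == 1 or not stories:
            stories.append([])
        stories[-1].append(item)
    return stories


sample = "1 Mary moved to the bathroom.\n2 John went to the hallway.\n3 Where is Mary? \tbathroom\t1\n"
q = parse_babi(sample)[0][2]
print(q.text, q.answer, q.supporting)  # Where is Mary? bathroom [1]
```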

Here’s an example of a “counting” task:

1 Mary moved to the bathroom.
2 Sandra journeyed to the bedroom.
3 John went to the kitchen.
4 Mary took the football there.
5 How many objects is Mary carrying? 	one	4
6 Sandra went back to the office.
7 Daniel went back to the office.
8 How many objects is Mary carrying? 	one	4
9 John moved to the bedroom.
10 Sandra moved to the garden.
11 How many objects is Mary carrying? 	one	4
12 Mary travelled to the garden.
13 Mary went to the hallway.
14 Sandra journeyed to the bedroom.
15 Mary dropped the football.
16 How many objects is Mary carrying? 	none	4 15
17 Mary got the football there.
18 Daniel travelled to the garden.
19 How many objects is Mary carrying? 	one	4 15 17
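The counting questions can be answered by keeping track of what each person is holding. Here is a minimal rule-based sketch in Python (hand-written rules, not a learned model and not code from the bAbI authors). The verb lists and the function name answer_counting are my own assumptions, chosen to cover the example above. It reports the count as a word along with the IDs of that person's pick-up and drop statements, which matches the supporting-fact numbers in the example.

```python
import re

# Assumed verb lists: they cover the example above, but the full bAbI
# vocabulary may use other verbs, so treat these as placeholders.
PICK_UP = ("took", "got", "grabbed", "picked up")
DROP = ("dropped", "discarded", "put down")
NUMBER_WORDS = ["none", "one", "two", "three", "four", "five"]

ACTION = re.compile(r"^(\w+) (" + "|".join(PICK_UP + DROP) + r") the (\w+)")
QUESTION = re.compile(r"^How many objects is (\w+) carrying\?")


def answer_counting(lines):
    """Answer counting questions, returning (answer, supporting ids) pairs."""
    carrying = {}  # person -> set of objects currently held
    events = {}    # person -> ids of that person's pick-up/drop statements
    results = []
    for line in lines:
        if not line.strip():
            continue
        id_str, rest = line.split(" ", 1)
        line_id = int(id_str)
        if line_id == 1:  # a new story starts; forget the previous one
            carrying.clear()
            events.clear()
        if "\t" in rest:  # question line; the gold answer is ignored
            m = QUESTION.match(rest)
            if m is None:
                raise ValueError(f"not a counting question: {rest!r}")
            person = m.group(1)
            count = len(carrying.get(person, set()))
            results.append((NUMBER_WORDS[count], list(events.get(person, []))))
            continue
        m = ACTION.match(rest)
        if m is None:  # movement statements don't change what anyone carries
            continue
        person, verb, obj = m.groups()
        held = carrying.setdefault(person, set())
        if verb in PICK_UP:
            held.add(obj)
        else:
            held.discard(obj)
        events.setdefault(person, []).append(line_id)
    return results


story = """1 Mary moved to the bathroom.
2 Mary took the football there.
3 How many objects is Mary carrying? \tone\t2
4 Mary dropped the football.
5 How many objects is Mary carrying? \tnone\t2 4"""
print(answer_counting(story.splitlines()))
# [('one', [2]), ('none', [2, 4])]
```

Recording every pick-up and drop as a supporting fact (rather than only the latest one) is what reproduces the "4 15 17" explanation for the last question.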

So the idea is to create models that can answer the questions and also give an explanation, by pointing to the supporting statements.
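As an illustration of "answer plus explanation", here is a hand-written rule-based baseline for the single supporting fact task, not a learned model and not anything from the bAbI authors. It assumes the movement verbs seen in the example story and the tab-separated question lines shown above; the function name answer_where is my own. For each "Where is X?" question it returns the person's last known location together with the ID of the statement that put them there.

```python
import re

# Assumed movement verbs, taken from the example story above.
MOVE = re.compile(r"^(\w+) (?:moved|went back|went|journeyed|travelled) to the (\w+)")
WHERE = re.compile(r"^Where is (\w+)\?")


def answer_where(lines):
    """Answer 'Where is X?' questions, returning (answer, [supporting id]) pairs."""
    last_seen = {}  # person -> (location, id of the statement that put them there)
    results = []
    for line in lines:
        if not line.strip():
            continue
        id_str, rest = line.split(" ", 1)
        line_id = int(id_str)
        if line_id == 1:  # a new story starts
            last_seen.clear()
        if "\t" in rest:  # question line; the gold answer is ignored
            m = WHERE.match(rest)
            if m is None:
                raise ValueError(f"not a 'Where is' question: {rest!r}")
            # Raises KeyError if the person was never mentioned.
            location, fact_id = last_seen[m.group(1)]
            results.append((location, [fact_id]))
            continue
        m = MOVE.match(rest)
        if m:
            last_seen[m.group(1)] = (m.group(2), line_id)
    return results


story = """1 Mary moved to the bathroom.
2 John went to the hallway.
3 Where is Mary? \tbathroom\t1"""
print(answer_where(story.splitlines()))
# [('bathroom', [1])]
```

Rules like these break as soon as the wording changes, which is exactly why the tasks are meant to be solved by models that learn the patterns from data.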

As the name “bAbI” suggests, these are intended to be simple, not entirely realistic problems. The idea is that if researchers use a common set of tasks like bAbI, they’ll be able to compare results more easily.

By the way, I emailed the authors of bAbI to ask about the origin of the name “bAbI”. Antoine Bordes replied quickly and courteously: “bAbI” is not an acronym.

You can read more about bAbI at https://research.fb.com/downloads/babi/.



Baby, “Sucker Punch” (2011). Baby, “Baby Driver” (2017). Baby, “Dirty Dancing” (1987). Of these three movies, I liked “Baby Driver” the best.

This entry was posted in Machine Learning.