Building Richer, Real-World Data Sets to Push Conversational Research Forward

One of the big challenges in artificial intelligence is open-domain dialog between machines and humans. We want to be able to talk to machines the way humans talk to each other, since communication is one of the hallmarks of intelligence.

Humans are better than machines at staying consistent across their statements in a conversation, and they tend to give more specific answers. They're also better at responding to their conversation partner's feelings and at drawing on knowledge about the world and their surroundings. Taken together, these gaps make machines less interesting to talk to than humans.

Dialog agents are typically trained on fully supervised conversational data, and this data can differ significantly in distribution from the environment in which a chatbot might be deployed. Beat the Bot is a game in which each user is paired with both a bot and a human. They chat over Messenger and are asked to have a short, fun conversation. For every message a user sends, they receive two responses, one from the bot and one from the human, and they're asked to choose which response is better.

Since we'll be open-sourcing the data set, we've taken a lot of measures to protect user privacy. For one, the game is entirely opt-in, and we want to make it really clear to users how we're using the messages in the game. We can then use the data we collect in the game to help train our dialog models.

Some research has shown that gamifying data collection helps improve data quality because users are more engaged. This game also provides supervision in a couple of senses. First, we have the human-human dialog turns, which we can use to train our models directly. Second, we have a human's assessment of when the bot fails to match human performance, so we can also use this data to help improve the bots.
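The two kinds of supervision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Turn` record, its field names, and both helper functions are hypothetical and do not reflect the actual schema of the data set, and the sample messages are made up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Turn:
    """One game turn. All field names are hypothetical, not the released schema."""
    context: List[str]   # messages exchanged so far, oldest first
    human_response: str  # reply written by the human partner
    bot_response: str    # reply generated by the bot
    chosen: str          # "human" or "bot": which reply the player preferred


def supervised_examples(turns: List[Turn]) -> List[Tuple[List[str], str]]:
    """Human-written replies as (context, target) pairs for direct training."""
    return [(t.context, t.human_response) for t in turns]


def bot_failures(turns: List[Turn]) -> List[Tuple[List[str], str, str]]:
    """Turns where the player preferred the human reply over the bot's,
    returned as (context, preferred, rejected) triples."""
    return [
        (t.context, t.human_response, t.bot_response)
        for t in turns
        if t.chosen == "human"
    ]


# Made-up sample turns, for illustration only.
turns = [
    Turn(["Hi! Any plans today?"], "Going hiking, you?", "I like plans.", "human"),
    Turn(["Do you have pets?"], "Two cats!", "Yes, a dog named Rex.", "bot"),
]

direct = supervised_examples(turns)
failures = bot_failures(turns)
```

In this sketch every turn yields a direct training pair, while only the turns where the player picked the human reply are kept as evidence of the bot falling short.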

We hope to collect data that’s super high-signal and that helps push the entire dialog research community forward.

 

