I was sitting in on a meeting of some of my colleagues recently. They were discussing how you might go about creating an automated question-answering system that is based on a document of some sort. For example, the base document might be the rule book for a complicated game like American football, or perhaps a Wikipedia page. Users could then query the base document in natural language, such as “What is a 1-point safety?”
My first thought was that such a system absolutely had to be 100% automated; any system that has a manual component is obviously doomed.
So, later in the day I gave the problem a little thought and came up with a possible architecture for an automated, document-based answering system. First, you’d programmatically analyze the base document and, for each sentence, generate possible questions. For example, if a sentence in the base document reads “the penalty for defensive holding is loss of five yards and automatic first down”, then one question might be “what is the penalty for defensive holding?”
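To make the question-generation step a bit more concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the useful sentences follow the shape “the X is Y”. The regular expression and the function names `generate_questions` and `analyze_document` are hypothetical; a real system would need a proper parser or a trained question-generation model to cope with arbitrary rule-book prose.

```python
import re

# Hypothetical pattern: a sentence that starts with "the <subject>" followed by " is ".
# Only this one sentence shape is handled; every other sentence is skipped.
DEFINITION = re.compile(r"^(the .+?) is .+$", re.IGNORECASE)

def generate_questions(sentence):
    """Return (question, answer_sentence) pairs generated from one sentence."""
    sentence = sentence.strip()
    match = DEFINITION.match(sentence)
    if not match:
        return []
    subject = match.group(1).lower()
    # The whole original sentence is kept as the answer to the generated question.
    return [(f"what is {subject}?", sentence)]

def analyze_document(text):
    """Split the base document into sentences and collect every generated question."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pairs = []
    for s in sentences:
        pairs.extend(generate_questions(s))
    return pairs

doc = ("The penalty for defensive holding is loss of five yards and automatic first down. "
       "Play begins with a kickoff.")

for question, answer_sentence in analyze_document(doc):
    print(question)  # what is the penalty for defensive holding?
```

The second sentence doesn't fit the assumed pattern, so it produces no question. That is exactly the weakness a real generator would have to overcome.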
After the document has been analyzed and the questions created, the system would work by accepting a user’s question and then doing a sentence-similarity search against the auto-generated questions. The closest question maps to an answer, which is returned to the user. For example, a user might ask “Is the penalty for holding five yards?” That would map to the system question “what is the penalty for defensive holding?”, which in turn maps to the answer “the penalty for defensive holding is loss of five yards and automatic first down”.
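The lookup step can be sketched the same way. The idea doesn't depend on any particular sentence-similarity technique, so this sketch swaps in the simplest common choice, cosine similarity over bag-of-words token counts. The `qa_pairs` list, its placeholder answer text, and the function names are hypothetical. A production system would more likely compare sentence embeddings, but the structure (score every generated question, return the answer attached to the best one) stays the same.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"\w+", text.lower())

def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def answer(user_question, qa_pairs):
    """Return (closest_question, answer, score) for a user's question."""
    query = Counter(tokenize(user_question))
    scored = [(cosine_similarity(query, Counter(tokenize(q))), q, a) for q, a in qa_pairs]
    score, best_q, best_a = max(scored, key=lambda t: t[0])
    return best_q, best_a, score

# Hypothetical auto-generated question/answer pairs.
qa_pairs = [
    ("what is the penalty for defensive holding?",
     "the penalty for defensive holding is loss of five yards and automatic first down"),
    ("what is a 1-point safety?",
     "(answer sentence from the rule book)"),
]

q, a, s = answer("Is the penalty for holding five yards?", qa_pairs)
print(q)  # what is the penalty for defensive holding?
```

With this data the holding question scores 5/7 (about 0.71) against the user's query, because the two share five of their seven distinct words, while the safety question shares only “is”.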
This would be a difficult system to create. But it seems possible.