Comparing Statistical and Content-Based Techniques for Answer Validation on the Web / Magnini, B.; Negri, M.; Prevete, Roberto; Tanev, H. - Print. - (2002). (Paper presented at the conference Apprendimento Automatico e Data mining, AI*IA 2002, held in Siena, Italy, on 11 September 2002).
Comparing Statistical and Content-Based Techniques for Answer Validation on the Web
Prevete, Roberto
2002
Abstract
Answer Validation is an emerging topic in Question Answering, where open-domain systems are often required to rank large numbers of candidate answers. We present a novel approach to answer validation based on the intuition that the amount of implicit knowledge connecting an answer to a question can be estimated by exploiting the redundancy of Web information. Two techniques are considered in this paper: a statistical approach, which uses the Web to obtain a large number of pages, and a content-based approach, which analyses the text snippets retrieved by the search engine. Neither approach requires downloading the documents. Experiments carried out on the TREC-2001 judged-answer collection show that a combination of the two approaches achieves a high level of performance (a success rate of about 88%). The simplicity and efficiency of these Web-based techniques make them suitable for use as a module in Question Answering systems.
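The abstract does not give the scoring functions, so the following is only a minimal sketch of how the two kinds of evidence could be computed, not the authors' method. It assumes that hit counts and snippets have already been obtained from some search engine (no real search API is called). The statistical side is illustrated with pointwise mutual information (PMI) over hit counts. The content-based side is the fraction of snippets in which the answer appears within a window of question keywords. The two are mixed with a weight. All function names, the window size, and the weight are hypothetical.

```python
import math
import re
from typing import Iterable, List, Sequence


def pmi_score(hits_qa: int, hits_q: int, hits_a: int, total_pages: int) -> float:
    """Statistical score (illustrative): PMI between question keywords Q and answer A.

    PMI(Q, A) = log(P(Q, A) / (P(Q) * P(A))), with P(x) = hits(x) / total_pages,
    which simplifies to log(hits_qa * total_pages / (hits_q * hits_a)).
    Returns -inf when Q and A never co-occur.
    """
    if hits_qa == 0 or hits_q == 0 or hits_a == 0:
        return float("-inf")
    return math.log(hits_qa * total_pages / (hits_q * hits_a))


def tokenize(text: str) -> List[str]:
    # Lower-cased word tokens; punctuation is discarded.
    return re.findall(r"\w+", text.lower())


def snippet_score(snippets: Iterable[str], keywords: Sequence[str],
                  answer: str, window: int = 10) -> float:
    """Content-based score (illustrative): fraction of snippets in which the
    answer occurs within `window` tokens of at least one single-word keyword."""
    answer_tokens = tokenize(answer)
    kw = {k.lower() for k in keywords}
    n = len(answer_tokens)
    if n == 0:
        return 0.0
    total = supported = 0
    for snippet in snippets:
        total += 1
        tokens = tokenize(snippet)
        # Start positions of every occurrence of the answer in the snippet.
        starts = [i for i in range(len(tokens) - n + 1)
                  if tokens[i:i + n] == answer_tokens]
        kw_pos = [i for i, t in enumerate(tokens) if t in kw]
        # Token gap between a keyword and the answer span (0 if inside it).
        if any(max(s - k, k - (s + n - 1), 0) <= window
               for s in starts for k in kw_pos):
            supported += 1
    return supported / total if total else 0.0


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function; maps -inf to 0.0 and +inf to 1.0.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def combined_score(pmi: float, content: float, weight: float = 0.5) -> float:
    """Hypothetical combination: squash PMI into [0, 1] and mix linearly."""
    return weight * _sigmoid(pmi) + (1.0 - weight) * content
```

For example, with hypothetical counts hits_qa=50, hits_q=1000 and hits_a=500 over 1,000,000 pages, pmi_score returns log(100), about 4.61. A linear mix is used here only because it is simple. A threshold on either score, or a learned combiner, would be equally plausible choices.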