The QA task to be performed by the groups participating in INEX 2011 is contextualizing tweets, i.e. answering questions of the form "what is this tweet about?" using a recent cleaned dump of Wikipedia. The general process involves:
We regard as relevant passages those segments that both
For evaluation purposes, answers must use ONLY elements or passages previously extracted from the document collection. Participants establish the correctness of answers exclusively from the supporting passages and documents.
Participants are required to submit at least one fully automatic run; manual runs are nevertheless strongly encouraged. A run is considered manual if it involves human intervention at any stage of the process. Such interventions must be clearly stated and documented.
The underlying scenario is a user receiving a tweet with a URL on a small device such as a phone: the system must provide synthetic contextual information drawn from a local XML dump of Wikipedia. The answer has to be built by aggregating relevant XML elements or passages.
The aggregated answers will be evaluated according to how they overlap with relevant passages (how many passages, and which vocabulary and bi-grams are included or missing) and according to the "last point of interest" marked by evaluators. By combining these measures, we expect to account for both the informative content and the readability of the aggregated answers.
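As an illustration, overlap in vocabulary or bi-grams between an aggregated answer and a set of reference passages could be computed along the following lines (a minimal sketch, not the official evaluation measure; the whitespace tokenization and the coverage-ratio scoring are our assumptions):

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the multiset of n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def overlap_score(answer, references, n=2):
    """Fraction of reference n-grams covered by the answer (between 0 and 1)."""
    ref_tokens = " ".join(references).lower().split()
    ans_tokens = answer.lower().split()
    ref_ngrams = ngrams(ref_tokens, n)
    ans_ngrams = ngrams(ans_tokens, n)
    covered = sum(min(count, ans_ngrams[g]) for g, count in ref_ngrams.items())
    total = sum(ref_ngrams.values())
    return covered / total if total else 0.0
```

With n=1 this measures vocabulary overlap, with n=2 bi-gram overlap; the official measures combine such overlaps with readability judgments.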
Extractive automatic summarization systems are strongly encouraged to participate.
Each assessor will evaluate a pool of answers of at most 500 words each. These answers will be agglomerations of Wikipedia passages.
Evaluators will have to mark:
Systems will be ranked according to:
The document collection has been rebuilt from a recent dump of the English Wikipedia, from April 2011 (a copy of this dump is available here). Since we target a plain XML corpus allowing easy extraction of plain-text answers, we removed all notes and bibliographic references, which are difficult to handle, and kept only the 3,217,015 non-empty Wikipedia pages (pages having at least one section).
Resulting documents consist of a title (title), an abstract (a) and sections (s). Each section has a sub-title (h). The abstract and the sections are made of paragraphs (p), and each paragraph can contain entities (t) that refer to Wikipedia pages. The resulting corpus therefore has a simple DTD:
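Based on the element names listed above, the DTD can be sketched as follows (the content models are our reconstruction from the description; the root element name and exact cardinalities are assumptions):

```dtd
<!ELEMENT page  (title, a, s*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT a     (p*)>
<!ELEMENT s     (h, p*)>
<!ELEMENT h     (#PCDATA)>
<!ELEMENT p     (#PCDATA | t)*>
<!ELEMENT t     (#PCDATA)>
```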
A complementary list of non-Wikipedia entities, extracted from the pages using LIMSI tools, is also available here.
A baseline XML-element retrieval system powered by Indri is available online through a standard CGI interface. The index covers all words (no stop list, no stemming) and all XML tags. Participants who do not wish to build their own index can use this one, either by downloading it or by querying it online (more information here, or contact email@example.com).
You can also query this baseline system in batch mode using this Perl program. It takes input files like this one; see its synopsis for more details.
The 132 topics selected for 2011 are available here. Each topic consists of the title and first sentence of a New York Times article that was tweeted at least two months after the Wikipedia dump we use. For each topic we manually checked that related information exists in the document collection. We can provide the content of the articles to participants on an individual basis, but the objective of the task remains to contextualize only the tweeted information.
The 2009-2010 topics are also available here, and anonymized best runs from the 2010 participants are available here. These runs can be used to smooth new systems.
Each line has the following format: <qid> Q0 <file> <rank> <rsv> <run_id> <column_7> <column_8> <column_9>
Raw text is given without XML tags and without formatting characters (avoid "\n", "\r", "\l"). The resulting word sequence has to appear in the file indicated in the third field. This is an example of such output:
1 Q0 3005204 1 0.9999 I10UniXRun1 The Alfred Noble Prize is an award presented by the combined engineering societies of the United States, given each year to a person not over thirty-five for a paper published in one of the journals of the participating societies.
1 Q0 3005204 2 0.9998 I10UniXRun1 The prize was established in 1929 in honor of Alfred Noble, Past President of the American Society of Civil Engineers.
1 Q0 3005204 3 0.9997 I10UniXRun1 It has no connection to the Nobel Prize, although the two are often confused due to their similar spellings.
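Both submission formats share the first six fields; a line can therefore be split into the fixed fields plus a free-form remainder (raw text in one format, offset and length in the other). A minimal sketch, with a function name of our choosing:

```python
def parse_run_line(line):
    """Split a submission line into its six fixed fields plus the remainder.

    Returns (qid, file, rank, rsv, run_id, rest), where rest is either the
    raw-text passage or the "offset length" pair, depending on the format.
    """
    qid, q0, file_id, rank, rsv, run_id, rest = line.split(None, 6)
    return qid, file_id, int(rank), float(rsv), run_id, rest
```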
In this format, passages are given as an offset and a length, counted in characters with respect to the textual content (ignoring all tags) of the XML file. File offsets start at 0 (zero). The previous example would look as follows in FOL format:
1 Q0 3005204 1 0.9999 I10UniXRun1 256 230
1 Q0 3005204 2 0.9998 I10UniXRun1 488 118
1 Q0 3005204 3 0.9997 I10UniXRun1 609 109
The results are from article 3005204. The first passage starts at offset 256, i.e. at the 257th character of the text content (since offsets start at zero), and has a length of 230 characters.
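The offset and length of a passage in the tag-free text content can be computed roughly as follows (a sketch under the assumption that the passage occurs verbatim in the text content; a naive regex tag stripper may not match the official character counting exactly):

```python
import re

def text_content(xml_string):
    """Strip all XML tags, keeping only the textual content."""
    return re.sub(r"<[^>]+>", "", xml_string)

def passage_to_fol(xml_string, passage):
    """Return (offset, length) of a passage in the tag-free text.

    Offsets count characters from 0, as required by the FOL format.
    """
    text = text_content(xml_string)
    offset = text.find(passage)
    if offset < 0:
        raise ValueError("passage not found in text content")
    return offset, len(passage)
```

For example, stripping the tags from `<p>The prize was established in 1929.</p>` leaves `The prize was established in 1929.`, in which the passage `established in 1929` starts at offset 14 and spans 19 characters.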
|18/Oct/2011||Release of the final set of questions, available here.|
|||Submission deadline for results.|
|||Release of QA semi-automatic evaluation results by organizers.|
|10/Dec/2011||Release of manual evaluation by participants.|
LSIS - Aix-Marseille University
IRIT, University of Toulouse
LIMSI-CNRS, University Paris-Sud 11
LIA, University of Avignon
LIMSI-CNRS, University Paris-Sud 11