INEX 2011 Relevance Feedback Track


Results available.


Unlike traditional IR evaluation tracks, the Focused Relevance Feedback (FRF) track submission will be a computer program implementation. Participating organisations will create and submit their relevance feedback and ranking algorithms in the form of dynamically linkable modules, which will be used to evaluate the effectiveness of the implemented algorithms against other submissions and against traditional baseline ranking algorithms (such as BM25 or Rocchio). It is our hope that this evaluation track will provide definitive answers, through comparable and reproducible experiments, about the merits of various RF approaches.

Use Case

The use case of this track is a single user searching with a particular query in an information retrieval system that supports relevance feedback. The user highlights relevant passages of text in returned documents (if any exist) and provides this feedback to the information retrieval system. The IR system re-ranks the unseen remainder of the results list to provide more relevant results to the user. The exact manner in which this is implemented is not of concern in this evaluation; here we test the ability of the system to use focused relevance feedback to improve the ranking of previously unseen results. We also seek to compare the improvement, if any, which focused relevance feedback (FRF) offers over whole-document feedback (RF).

Test Collection

The relevance feedback track will use the INEX Wikipedia collection. Evaluation will be based on the focused relevance assessments, which are gathered by the INEX Ad-Hoc track through the GPXrai assessment tool. There is no additional manual assessment in the FRF track; it re-uses the Ad-Hoc qrels. The INEX Wikipedia test collection is semantically marked up. This will facilitate the evaluation of FRF algorithm implementations that take advantage not only of the (often) passage-sized feedback, but also of the semantic mark-up of the relevant text.



Evaluation Platform Java software

Useful Stuff

Focused Relevance Feedback presentation by Timothy Chappell


Participating organisations will create one or more Relevance Feedback Modules (RFMs) intended to rank a collection of documents with a query while incrementally responding to explicit user feedback on the relevance of the results presented to the user. These RFMs will be implemented as dynamically linkable modules (at the moment, Java JAR files and Windows DLLs) that implement a standard, defined interface. The Evaluation Platform (EP) will interact with the RFMs directly, simulating a user search session. The EP will instantiate an RFM object and provide it with a set of XML documents and a query. The RFM will respond by ranking the documents (without feedback) and returning the ranking to the EP; the difference in quality between the rankings before and after feedback can then be compared to determine the extent of the effect the relevance feedback has on the results. The EP will then request the next most relevant document in the collection (that has not yet been presented to the user). On subsequent calls, the EP will pass relevance feedback (in the form of passage offsets and lengths) about the last document presented by the RFM. This feedback is taken from the qrels of the respective topic, as provided by the Ad-Hoc track assessors. The simulated user feedback may then be used by the RFM to re-rank the remaining unseen documents and return the next most relevant document. There is no need for a real user in the loop, since the document set and the corresponding feedback are taken directly from the relevance assessments generated during the evaluation of the topic pools in the INEX Ad-Hoc track.
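The session described above can be sketched as a simple loop. The sketch below is illustrative only: the class and method names of the EP itself (RFCore, StubRFM, EPLoopSketch, runTopic) are assumptions, the interface is a trimmed-down version of the track's RFInterface (getFOL/getXPath omitted), and the loop is simplified to visit every document rather than stopping once all relevant documents have been returned.

```java
import java.util.*;

// Trimmed-down version of the track's interface (illustrative; getFOL/getXPath omitted).
interface RFCore {
    Integer[] first(String[] documentList, String query);
    Integer next();
    void relevant(Integer offset, Integer length, String xpath, String relevantText);
}

// Trivial RFM that ignores feedback and simply replays its initial ranking.
class StubRFM implements RFCore {
    private final Deque<Integer> queue = new ArrayDeque<>();
    public Integer[] first(String[] docs, String query) {
        queue.clear();
        for (int i = 0; i < docs.length; i++) queue.add(i);
        return queue.toArray(new Integer[0]);
    }
    public Integer next() { return queue.poll(); }
    public void relevant(Integer off, Integer len, String xp, String txt) { /* ignored */ }
}

public class EPLoopSketch {
    // Runs one topic and returns the presentation order the EP would evaluate.
    // qrels maps a document index to a single {offset, length} relevant passage.
    public static List<Integer> runTopic(RFCore rfm, String[] docs, String query,
                                         Map<Integer, int[]> qrels) {
        List<Integer> order = new ArrayList<>();
        rfm.first(docs, query);                  // baseline, pre-feedback ranking
        for (int i = 0; i < docs.length; i++) { // simplified: visits every document
            Integer doc = rfm.next();            // next most relevant unseen document
            order.add(doc);
            int[] p = qrels.get(doc);            // feedback taken from the Ad-Hoc qrels
            if (p != null) rfm.relevant(p[0], p[1], null, docs[doc]);
        }
        return order;
    }
}
```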

The EP will continue to make repeated calls to the RFM until all relevant documents in the collection have been returned.

The EP will retain the presentation order of documents as generated by the RFM. This order will then be evaluated as a submission to the ad-hoc track in the usual manner and with the standard document-based retrieval metrics. It is expected that an effective dynamic relevance feedback method will produce a higher score than a static ranking method (i.e. the initial baseline rank ordering). Evaluation will be performed over all topics and systems will be ranked by the averaged performance over the entire set of topics, using standard INEX and TREC metrics.

In order to compare focused relevance feedback with whole document feedback, the experiment will be executed twice with each submission. In the first, the feedback will be focused, and in the second, the feedback will be an entire document; i.e. the returned relevant text (in a relevant document) will be the entire document.

Each topic consists of a set of documents (the topic pool) and a complete and exhaustive set of manual assessments against a query. Hence, we effectively have a "classical" Cranfield experiment over each topic pool as a small collection with complete assessments for a single query. The small collection size allows participants without an efficient implementation of a search engine to handle the task without the complexities of scale that the full collection presents.


The submission to the Focused Relevance Feedback track is an implementation of the RFInterface interface. The implementing class should be compiled and packaged into a .jar file. The Java class, named RelevanceFeedback, is part of the rf package and implements the following interface:
package rf;

public interface RFInterface {
	public Integer[] first(String[] documentList, String query);
	public Integer next();
	public String getFOL();
	public String getXPath();
	public void relevant(Integer offset, Integer length, String Xpath,
			String relevantText);
}

public class RelevanceFeedback implements RFInterface {
	// TODO: declare class variables and implement methods
}

The EP will instantiate a RelevanceFeedback object. The method first(.) will then be invoked by the EP with the set of documents as an array of strings in documentList, each string holding the entire content of a document in XML format. The query parameter will contain the text of the query that is the subject of the search. The method will return an integer array which contains the initial ranking of the document set by the RFM's ranking algorithm. Each integer corresponds to the index of a document given in documentList, starting from 0. For example, if a set of ten documents were passed to the RFM in the first() call, the RFM may return [3 9 0 4 8 2 6 1 7 5].
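As a minimal sketch of what first() might do, the fragment below ranks documents by how many query terms they contain. The class name InitialRanker and the term-overlap scoring are assumptions for illustration, not part of the track specification.

```java
import java.util.*;

public class InitialRanker {
    // Returns document indices sorted by descending count of query terms present.
    // Ties keep their original order (Arrays.sort with a comparator is stable).
    public static Integer[] first(String[] documentList, String query) {
        String[] terms = query.toLowerCase().split("\\s+");
        Integer[] rank = new Integer[documentList.length];
        final int[] scores = new int[documentList.length];
        for (int i = 0; i < documentList.length; i++) {
            rank[i] = i;
            String doc = documentList[i].toLowerCase();
            for (String t : terms)
                if (doc.contains(t)) scores[i]++;  // crude presence-based score
        }
        Arrays.sort(rank, (a, b) -> scores[b] - scores[a]); // best document first
        return rank;
    }
}
```

A real submission would of course substitute a proper retrieval model (e.g. BM25) for this presence count; the point here is only the index-array return convention.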

The next() method will be called for the first and for all subsequent documents in the collection. This method returns an integer that is an index into the original document array passed to the RFM. For example, if we assume the RFM from the previous example does not make use of relevance feedback, it would return 3 on the first invocation of next(), then 9, then 0, and so on. After the first invocation of next(), which will always return the first-ranked document, subsequent invocations may be preceded by zero or more calls to the relevant() method described below.

The relevant(.) method will be invoked zero or more times to pass relevant passages in the last returned document back to the RelevanceFeedback object, each in the form of an offset and length pair. Relevant passages will be non-overlapping. The offset/length pair is based on the document text content alone, ignoring the XML mark-up (this is a constraint imposed by the form of the Ad-Hoc track qrels). Additionally, the XPath pointing to the selected element and the selected text itself (not stripped of XML) will be provided in the XPath and relevantText arguments. These can be used or ignored as desired.
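One simple way to use this feedback is to fold the terms of each highlighted passage into the query model and re-score the unseen documents on every next() call. The sketch below is a hedged, loosely Rocchio-style illustration: the class name FeedbackRanker, the set-based term model, and the presence-count scoring are all assumptions, not the track's prescribed method.

```java
import java.util.*;

public class FeedbackRanker {
    private String[] docs;
    private final Set<String> model = new HashSet<>();   // query + feedback terms
    private final List<Integer> unseen = new ArrayList<>();

    public Integer[] first(String[] documentList, String query) {
        docs = documentList;
        model.clear();
        model.addAll(Arrays.asList(query.toLowerCase().split("\\s+")));
        unseen.clear();
        for (int i = 0; i < docs.length; i++) unseen.add(i);
        rescore();
        return unseen.toArray(new Integer[0]);
    }

    public Integer next() {
        rescore();                      // feedback may have arrived since the last call
        return unseen.isEmpty() ? null : unseen.remove(0);
    }

    public void relevant(Integer offset, Integer length, String xpath, String relevantText) {
        // Fold the highlighted passage's terms into the query model.
        for (String t : relevantText.toLowerCase().split("\\W+"))
            if (!t.isEmpty()) model.add(t);
    }

    private int score(int i) {
        String d = docs[i].toLowerCase();
        int s = 0;
        for (String t : model) if (d.contains(t)) s++;
        return s;
    }

    private void rescore() {
        unseen.sort((a, b) -> score(b) - score(a));  // stable sort: ties keep order
    }
}
```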

This relevance information will be referring to the document returned by the most recent invocation of next().

Two optional methods exist for providing more information about the most recent document returned by next(): getFOL() and getXPath(). getFOL(), when invoked, will return a Field Offset:Length string, in the form of "offset:length" (e.g. "100:25"), to indicate a segment of text that the RFM considers relevant to the user, allowing the RFM to perform focused retrieval. Multiple calls to getFOL() before the next call to next() may return additional segments of text. When there are no more relevant segments of text, the method should return null. If the RFM chooses not to implement focused feedback, getFOL() should return null on every call. Similarly, getXPath() should return the XPath of one or more relevant segments in the same fashion. As before, the method is optional and should return null if it is not implemented.
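A straightforward way to satisfy the getFOL() contract is to queue up "offset:length" strings as the RFM identifies promising passages and drain them one per call, returning null when exhausted. The sketch below is an assumed design (class name FolQueue and the addSegment helper are hypothetical):

```java
import java.util.*;

public class FolQueue {
    private final Deque<String> segments = new ArrayDeque<>();

    // Called by the RFM whenever it identifies a passage it considers relevant.
    public void addSegment(int offset, int length) {
        segments.add(offset + ":" + length);
    }

    // One segment per call; null when exhausted, as the interface requires.
    public String getFOL() {
        return segments.poll();
    }
}
```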

The RFM will be used as a persistent object over the entire evaluation. This means that only one instance of the RFM object will be created and used over the entire set of topics. The method first(.) will be called with each new topic to re-initialise it with a new query and document set; however, the object is persistent and therefore it is possible for the RFM object to learn over the course of the evaluation in order to improve the effectiveness of the ranking. Parameters can be stored in private class variables and tuned over the course of evaluation and over many topics. This facilitates a dynamic learning to rank implementation and evaluation. Effective learning should be demonstrated through RF system performance improvement over the course of interaction with the user over many topics.
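Because the single RFM instance survives across topics, learned state can simply live in private class variables, as the sketch below shows. The class name PersistentParam, the feedbackWeight parameter, and its update rule are purely illustrative assumptions; a real submission would tune whatever parameters its model actually has.

```java
public class PersistentParam {
    private double feedbackWeight = 1.0;   // survives across first() calls
    private int topicsSeen = 0;

    // Called once per topic; the instance (and its learned state) is reused.
    public void first(String[] documentList, String query) {
        topicsSeen++;
        // Hypothetical update: nudge the weight as evidence accumulates.
        feedbackWeight = 1.0 + 0.1 * topicsSeen;
    }

    public double getFeedbackWeight() { return feedbackWeight; }
    public int getTopicsSeen() { return topicsSeen; }
}
```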


The presentation order of documents will form the RF pseudo-submission. This submission will be evaluated with inex_eval and/or trec_eval. It is expected that effective RF methods will outperform the initial rank order. The approach provides a level playing field to all RF implementations in a standard setting and with a standard pool of documents for each topic. It provides a reusable set of resources and a simple platform that significantly reduces the initial effort that is required to implement and evaluate a relevance feedback system in a uniform and methodologically sound manner.


Submission deadline for the track is November 4.


Shlomo Geva

Timothy Chappell