INEX 2013 Books and Social Search Track

New: 2013 evaluation results updated (2013-06-13)!

Overview

For centuries books were the dominant source of information, but how we acquire, share, and publish information is changing in fundamental ways due to the Web. The goal of the Books and Social Search Track is to investigate techniques to support users in searching and navigating the full texts of digitized books and complementary social media, and to provide a forum for the exchange of research ideas and contributions. Towards this goal, the track is building appropriate evaluation benchmarks, complete with test collections, for focused, social and semantic search tasks. The track touches on a range of fields, including information retrieval (IR), information science (IS), human-computer interaction (HCI), digital libraries (DL), and eBooks.

The Social Book Search Track runs two search tasks:

  1. Social Book Search
  2. Prove It!


Social Book Search

Social media have made book search more complex: there is much more information about books online than what's in library catalogues, which brings out book search information needs that go beyond topical relevance and touch upon issues such as engagement, writing style, comprehensiveness and popularity. Most book search services allow users to search only on traditional metadata, which may not be suitable for scenarios where users have information needs with many relevance aspects. The Social Book Search (SBS) task aims to evaluate book search in such scenarios and to investigate the relative value of traditional book metadata and user-generated content for book search. Using book requests from the LibraryThing discussion forums and a collection of 2.8 million book descriptions from Amazon and LibraryThing, the task is to return a ranked list of book suggestions to the user.

One of the challenges is dealing with a mixture of professional and social metadata, which differ both in quantity and in kind. Professional metadata is often based on controlled vocabularies to describe topical information, with a minimal set of subject headings or classification information. Social metadata comes in the form of reviews that vary widely in length, opinion, clarity and seriousness, and in the aspects of the book they discuss, such as writing style, comprehensiveness, engagement, accuracy, recency, topical coverage, diversity and genre.

The task attempts to address questions such as:

Book search is highly complex. Searchers may want to read reviews and ratings from others to inform their decisions. When searching for themselves, their relevance criteria may be very different from when they are searching for someone else (as a birthday present or merely to help someone in their search). They may be searching for books in genres or about topics they are familiar with, in which case a profile of their reading habits may be helpful, but they may also be searching for new genres or topics, for which little preference information is available.

Submissions

Participants are allowed to submit up to 6 runs in standard TREC format. Any field in the topic statement may be used as well as any information in the user profiles. The topics and user profiles for 2013 are available on the Document Collection page. The submission deadline is 19 May 2013.
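As an illustration of the standard TREC run format mentioned above, the following minimal sketch writes one line per retrieved book in the usual six-column layout: topic ID, the literal Q0, document ID, rank, score and run tag. The topic IDs, book IDs, run tag and the cap of 1000 results per topic are assumptions made for the example, not values taken from the track guidelines.

```python
from typing import Dict, List, Tuple


def format_trec_run(
    results: Dict[str, List[Tuple[str, float]]],
    run_tag: str,
    max_per_topic: int = 1000,  # assumed cap; check the task guidelines
) -> List[str]:
    """Return run lines in the layout: topic_id Q0 doc_id rank score run_tag."""
    lines = []
    for topic_id in sorted(results):
        # Highest score first; ties are broken by document ID for determinism.
        ranked = sorted(results[topic_id], key=lambda pair: (-pair[1], pair[0]))
        for rank, (doc_id, score) in enumerate(ranked[:max_per_topic], start=1):
            lines.append(f"{topic_id} Q0 {doc_id} {rank} {score:.4f} {run_tag}")
    return lines


# Hypothetical topic and book identifiers, for illustration only.
example_results = {"t1": [("bookB", 0.5), ("bookA", 0.9)]}
run_lines = format_trec_run(example_results, "myrun")
print("\n".join(run_lines))
```

Each of the up to 6 submitted runs would use its own run tag so that results can be told apart.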

User Profiles

Each topic statement contains the username of the LibraryThing member who created the topic. In addition to these statements, there is a user profile for each of these members, which contains information about the books the member has catalogued, including tags and ratings, and connections to friends on LibraryThing.



Prove It!

We have always trusted books! Deserved or not, the nicely rolled, later bound, physically tangible, uniquely calligraphed or (later) mass-pressed or printed texts have had an aura of trustworthiness, authority and rigor to them. Mass digitization has made many of them (soon all of them?) digitally available to people, reading devices and retrieval systems.

This mass digitization calls for incorporating the books in tasks that exploit (and in a way also examine more closely) this legacy of the book. The Prove It! (PI) task, building on a corpus of over 50,000 digitized books, tries to do exactly that: it tests the application of both focused and semantic retrieval approaches to digitized books. Systems are presented with a factual claim and need to return a ranked list of book pages containing information that can confirm or refute that claim.

Finding support for or against a claim in digitised book collections has many applications in humanities research, ranging from identifying support for or dismissal of claimed facts and tracing changes of perspective on a claim over time, to identifying reliable or original sources and analysing cultural differences.

The corpus contains close to 17 million pages. The participant is challenged to devise a retrieval system that, in response to a claim, returns a list of pages ranked in descending order of the likelihood that they confirm or refute the claim and, at the same time, have the authority to do so (and can therefore be trusted).

Thus, the returned book pages will be judged on two aspects:

  1. Confirmation/refutation: Whether the page confirms or refutes the factual claim. A page about the general topic of the claim is not enough. A page is only relevant when it says something about the truth value of the factual claim (i.e., whether it is true or not).
  2. Authority: Whether the book the page is taken from is of an appropriate genre and type to confirm or refute the claim. Works of fiction are probably inappropriate for most claims. For some claims a history book may be more appropriate than other genres and types of books. For other claims a rigorous scientific textbook with references to research literature may be needed for confirmation or refutation.
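One way a system might take both aspects into account is sketched below. It assumes the system has already produced, for each candidate page, a confirmation/refutation score and an authority score for the source book, both in the range 0 to 1, and ranks pages by a weighted sum of the two. The scores, the page IDs, the weight and the linear combination itself are all hypothetical choices made for illustration; the track does not prescribe any particular method.

```python
from typing import List, NamedTuple


class PageScore(NamedTuple):
    page_id: str
    confirmation: float  # assumed score in [0, 1]: does the page confirm/refute the claim?
    authority: float     # assumed score in [0, 1]: is the source book an appropriate type?


def rank_pages(pages: List[PageScore], authority_weight: float = 0.3) -> List[str]:
    """Rank page IDs by a weighted sum of confirmation and authority scores."""
    if not 0.0 <= authority_weight <= 1.0:
        raise ValueError("authority_weight must be between 0 and 1")

    def combined(page: PageScore) -> float:
        return (1.0 - authority_weight) * page.confirmation + authority_weight * page.authority

    # Highest combined score first; ties are broken by page ID for determinism.
    ranked = sorted(pages, key=lambda page: (-combined(page), page.page_id))
    return [page.page_id for page in ranked]


# Hypothetical candidates: a novel, a history book and an atlas.
candidates = [
    PageScore("novel_p12", confirmation=0.9, authority=0.1),
    PageScore("history_p40", confirmation=0.8, authority=0.9),
    PageScore("atlas_p3", confirmation=0.2, authority=0.8),
]
ranking = rank_pages(candidates)
print(ranking)
```

With the assumed weight of 0.3, the history book page moves ahead of the novel page even though the novel page has the higher confirmation score; setting the weight to 0 ranks on confirmation alone.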

For registered participants, training material from 2010-2012 is available in the form of graded judgements of the extent to which pages confirm or refute claims. The task guidelines for 2013 are now available.



Related projects

In addition to these search tasks, there are two related projects.

The Structure Extraction (SE) task runs as part of the ICDAR 2013 competition. It builds on a set of 1,000 digitized books to evaluate automatic techniques for constructing hyperlinked tables of contents from OCR text and layout information. Please contact Antoine Doucet for more information, or check the Structure Extraction 2013 competition website.

The Active Reading Task (ART) investigates how hardware and software for eBooks can support readers engaged in a variety of reading-related activities such as fact finding, memory tasks and learning. The goal of the investigation is to derive user requirements and, from these, design recommendations for more usable tools to support active reading practices for eBooks. For questions and more information, please contact Monica Landoni.

Data sets and training material

Document collections

Training material

Participants in the SBS and PI tasks can use topic sets and Qrels from previous years to debug and train their systems. Other material needed for the SBS evaluation, such as scripts and ISBN mappings, is available from the same page.
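As a rough sketch of how Qrels from previous years could be used to check a system locally, the code below parses Qrels lines in the common four-column TREC layout (topic ID, iteration, document ID, relevance) and computes nDCG at a cutoff for one ranked list. The four-column layout, the linear gain, the cutoff of 10 and the example identifiers are assumptions; the evaluation scripts provided by the track remain the reference for official scores.

```python
import math
from typing import Dict, Iterable, List


def parse_qrels(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Parse 'topic_id iteration doc_id relevance' lines into nested dicts."""
    qrels: Dict[str, Dict[str, int]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic_id, _iteration, doc_id, relevance = parts
        qrels.setdefault(topic_id, {})[doc_id] = int(relevance)
    return qrels


def ndcg_at_k(ranked_doc_ids: List[str], judgements: Dict[str, int], k: int = 10) -> float:
    """nDCG@k with linear gain and a log2(rank + 1) discount."""

    def dcg(gains: List[int]) -> float:
        # enumerate starts at 0, so rank = i + 1 and the discount is log2(i + 2).
        return sum(gain / math.log2(i + 2) for i, gain in enumerate(gains))

    gains = [judgements.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    ideal_gains = sorted(judgements.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal_gains)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical judgements and rankings for a single topic.
qrels = parse_qrels(["t1 0 bookA 2", "t1 0 bookB 1"])
perfect = ndcg_at_k(["bookA", "bookB"], qrels["t1"])
swapped = ndcg_at_k(["bookB", "bookA"], qrels["t1"])
print(perfect, swapped)
```

Unjudged documents are treated as non-relevant here, which is a common convention but still an assumption of this sketch.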

Schedule

SBS and PI tasks:

Structure Extraction:

Further details are available at the Structure Extraction 2013 competition website.

Organizers

Search tasks

Marijn Koolen
University of Amsterdam
marijn.koolen@uva.nl

Gabriella Kazai
Microsoft Research Cambridge
gabkaz@microsoft.com

Michael Preminger
Oslo and Akershus University College of Applied Sciences
michaelp@hioa.no

Structure Extraction

Antoine Doucet
University of Caen
doucet@info.unicaen.fr

Active Reading Task

Monica Landoni
University of Lugano
monica.landoni@unisi.ch
