Information Retrieval for Question Answering - IR4QA
 
Finding answers requires processing texts at a level of detail that
cannot be carried out at retrieval time for very large text
collections. This limitation has led many researchers to propose,
broadly, a two-stage approach to the QA task. In stage one a subset of
query-relevant texts is selected from the whole collection. In stage
two this subset is subjected to detailed processing for answer
extraction. To date stage one has received limited explicit attention,
despite its obvious importance -- performance at stage two is bounded
by performance at stage one. The goal of this workshop is to correct
this situation, and, hopefully, to draw the attention of IR researchers to
the specific challenges raised by QA.

A straightforward approach to stage one is to employ a conventional IR
engine, using the NL question as the query and with the collection
indexed in the standard manner, to retrieve the initial set of
candidate answer-bearing documents for stage two. However, a number of
possibilities arise to optimise this set-up for QA, including:

* preprocessing the question in creating the IR query;
* preprocessing the collection to identify significant information that can be included in the index used for retrieval;
* adapting the similarity metric used in selecting documents;
* modifying the form of retrieval return, e.g. to deliver passages rather than whole documents.
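
As a rough illustration of two of the options above (question
preprocessing and passage-level return), the following is a minimal
sketch only, not a description of any particular system. The stopword
list, the function names, the fixed two-sentence passage window and
the term-overlap score are all assumptions made for the example; a
real stage one would normally sit on top of a full IR engine.

```python
import re

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "is", "was", "of", "in", "to", "did", "do",
             "does", "who", "what", "when", "where", "which", "why", "how"}


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def question_to_query(question):
    """Question preprocessing: keep only the content terms of the question."""
    return {t for t in tokenize(question) if t not in STOPWORDS}


def split_passages(document, size=2):
    """Split a document into passages of `size` consecutive sentences."""
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", document)
                 if s.strip()]
    return [" ".join(sentences[i:i + size])
            for i in range(0, len(sentences), size)]


def retrieve_passages(question, documents, k=3, size=2):
    """Return up to k passages, ranked by distinct query terms matched."""
    query = question_to_query(question)
    scored = []
    for doc in documents:
        for passage in split_passages(doc, size):
            score = len(query & set(tokenize(passage)))
            if score > 0:
                scored.append((score, passage))
    # Stable sort: ties keep their original collection order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:k]]


if __name__ == "__main__":
    docs = [
        "Sheffield is a city in England. It has two universities. "
        "The weather is often wet. Steel was made there.",
        "Paris is the capital of France. It lies on the Seine.",
    ]
    for p in retrieve_passages("What is the capital of France?", docs, k=1):
        print(p)
```

Returning passages rather than whole documents keeps the input to
stage two small; the overlap score here is deliberately crude and
stands in for whatever similarity metric a given system adopts.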

For this workshop, we solicit papers that address any aspect of how
this first, retrieval stage of QA can be adapted to improve overall
system performance. Possible topics include, but are not limited to:

* parametrizations/optimizations of specific IR systems for QA
* studies of query formation strategies suited to QA
* different uses of IR for factoid vs. non-factoid questions
* utility of term matching constraints, e.g. term proximity, for QA
* analyses of passage retrieval vs. full document retrieval for QA
* analyses of Boolean vs. ranked retrieval for QA
* impact of IR performance on overall QA performance
* named entity preprocessing of questions or collections
* corpus preprocessing to create corpus-specific thesauri for question expansion
* evaluation measures for assessing IR for QA

The workshop will include paper presentations and discussion. All
those wishing to make a presentation should submit a 5-8 page position
paper; other attendees may submit a short abstract on why this topic
is of interest to them. The papers should describe recent work and may
be preliminary in nature. The programme committee will arrange the
presentations and discussion based on the quality of submissions and
the expressed interests of the attendees, and may invite other
presentations as well. See http://www.sigir.org/sigir2004 for further
details.

Important Dates

Position paper submission: June 7
Acceptance notification: June 23
Final papers due: July 6
Workshop: July 29

Submission Instructions

Position papers should be no more than 4000 words (5-8 pages). The
standard ACM conference style is recommended (see:
http://www.acm.org/sigs/pubs/proceed/template.html). Submissions must
be sent electronically in PDF or PostScript format to:

Rob Gaizauskas
R.Gaizauskas@sheffield.ac.uk