Information Retrieval for Question Answering - IR4QA

Finding answers requires processing texts at a level of detail that cannot be carried out at retrieval time for very large text collections. This limitation has led many researchers to propose, broadly, a two-stage approach to the QA task. In stage one, a subset of query-relevant texts is selected from the whole collection. In stage two, this subset is subjected to detailed processing for answer extraction. To date, stage one has received limited explicit attention, despite its obvious importance: performance at stage two is bounded by performance at stage one. The goal of this workshop is to correct this situation and, hopefully, to draw the attention of IR researchers to the specific challenges raised by QA.

A straightforward approach to stage one is to employ a conventional IR engine, using the NL question as the query and with the collection indexed in the standard manner, to retrieve the initial set of candidate answer-bearing documents for stage two. However, a number of possibilities arise to optimise this set-up for QA, including:

* preprocessing the question in creating the IR query;
* preprocessing the collection to identify significant information that can be included in the index for retrieval;
* adapting the similarity metric used in selecting documents;
* modifying the form of retrieval return, e.g. to deliver passages rather than whole documents.

For this workshop, we solicit papers that address any aspect of how this first, retrieval stage of QA can be adapted to improve overall system performance. Possible topics include, but are not limited to:

* parametrizations/optimizations of specific IR systems for QA
* studies of query formation strategies suited to QA
* different uses of IR for factoid vs. non-factoid questions
* utility of term matching constraints, e.g. term proximity, for QA
* analyses of passage retrieval vs. full document retrieval for QA
* analyses of Boolean vs. ranked retrieval for QA
* impact of IR performance on overall QA performance
* named entity preprocessing of questions or collections
* corpus preprocessing to create corpus-specific thesauri for question expansion
* evaluation measures for assessing IR for QA

The workshop will include paper presentations and discussion. All those wishing to make a presentation should submit a 5-8 page position paper; other attendees may submit a short abstract on why this topic is of interest to them. The papers should describe recent work and may be preliminary in nature. The programme committee will arrange the presentations and discussion based on the quality of submissions and the expressed interests of the attendees, and may invite other presentations as well.

See http://www.sigir.org/sigir2004 for further details.

Important Dates

Position paper submission: June 7
Acceptance notification: June 23
Final papers due: July 6
Workshop: July 29

Submission Instructions

Position papers should be no more than 4000 words (5-8 pages). The standard ACM conference style is recommended (see: http://www.acm.org/sigs/pubs/proceed/template.html). Submissions must be sent electronically in PDF or PostScript format to:

Rob Gaizauskas
R.Gaizauskas@sheffield.ac.uk