CALL FOR PAPERS

ACL-2004 Workshop on Multiword Expressions: Integrating Processing
26th July 2004, Barcelona, Spain
_______________________________________________________

WEBSITES

Workshop website:
ACL website:

WORKSHOP DESCRIPTION

In recent years, there has been a growing awareness in the NLP community of the problems that Multiword Expressions (MWEs) pose and of the need for their robust handling. MWEs include a large range of linguistic phenomena, such as phrasal verbs (e.g. "add up"), nominal compounds (e.g. "telephone box"), and institutionalized phrases (e.g. "salt and pepper"). These expressions, which can be syntactically and/or semantically idiosyncratic, are used frequently in everyday language, usually to express with precision ideas and concepts that cannot be compressed into a single word.

Most real-world applications tend to ignore MWEs or handle them simply by listing them. However, successful applications will clearly need to identify MWEs and treat them appropriately. This applies particularly to the many applications that require some degree of semantic interpretation (e.g. machine translation, question answering, summarisation, generation) and that depend on tasks such as parsing and word sense disambiguation.

A considerable amount of research has recently been conducted in this area, some of it within large research projects dedicated to MWEs. In this context, a successful workshop on MWEs was held at ACL 2003 (), with papers presenting a cross-section of research on MWEs. Some research addresses MWEs in general: some of it is highly computational, examining detection and extraction using a variety of methods, while some is more linguistic, focusing on the classification of the various types. There is also a great deal of research on particular subtypes of MWEs, especially English phrasal verbs. This workshop focuses on papers that integrate the analysis, acquisition and treatment of various kinds of MWEs in NLP.
For example:

(1) research that combines a linguistic analysis with a method for automatically acquiring the classes described;
(2) work that combines the computational treatment of a class of MWEs with a solid linguistic analysis;
(3) research that extracts MWEs and either classifies them or uses them in some task.

Such combined research will help to bridge the gap between the needs of NLP and the descriptive tradition of linguistics.

TARGET AUDIENCE

The workshop will be of interest to anyone working on MWEs, e.g. in the areas of computational grammars, computational lexicography, automatic lexical acquisition, machine translation, information retrieval, text mining, and computer-assisted language teaching and learning. The objectives are to summarise what has been achieved in the area, to establish common themes across different approaches, and to discuss future trends.

AREAS OF INTEREST

Papers are invited on, but not limited to, the following topics:

* Theoretical research on MWEs, including corpus-based analysis
* MWE taxonomies, classifications and databases
* Cross-lingual analysis of MWE types, use, and behaviour
* Methods for identification and extraction of MWEs (machine learning, statistical, example- or rule-based, or hybrid)
* Evaluation of MWE extraction methods
* Methods for determining the compositionality of MWEs
* Integration of MWE data into grammars and NLP applications (e.g. machine translation and generation)

Papers can cover one or more of these areas, but research that combines different topics is especially encouraged.

SUBMISSION INFORMATION

Papers should be submitted electronically in PostScript or PDF format to: mwe-acl04@cl.cam.ac.uk. Submissions should conform to the two-column format of ACL proceedings and should not exceed eight (8) pages, including references. We strongly recommend the use of the ACL-2004 style files, available from the ACL-2004 website.
The subject line of the submission email should be "ACL2004 WORKSHOP PAPER SUBMISSION". As reviewing will be blind, the body of the paper should not include the names or affiliations of the authors. The following identification information should be sent in a separate email with the subject line "ACL2004 WORKSHOP ID PAGE":

Title: title of paper
Authors: list of all authors
Keywords: up to five topic keywords
Contact author: email address of the author of record (for correspondence)
Abstract: abstract of paper (no more than 10 lines)

Notification of receipt will be emailed to the contact author.

IMPORTANT DATES

Submission deadline: 1 April 2004
Acceptance notification: 1 May 2004
Final version deadline: 15 May 2004
Workshop date: 26 July 2004

WORKSHOP CHAIRS

Takaaki Tanaka (NTT Communication Science Laboratories, Japan)
Aline Villavicencio (University of Cambridge, UK)
Francis Bond (NTT Communication Science Laboratories, Japan)
Anna Korhonen (University of Cambridge, UK)

PROGRAM COMMITTEE

Timothy Baldwin (Stanford University, USA)
Colin Bannard (University of Edinburgh, UK)
Ann Copestake (University of Cambridge, UK)
Gael Dias (Beira Interior University, Portugal)
James Dowdall (University of Zurich, Switzerland)
Dan Flickinger (Stanford University, USA)
Matthew Hurst (Intelliseek, USA)
Stephan Oepen (Stanford University, USA; University of Oslo, Norway)
Kyonghee Paik (ATR Spoken Language Translation Research Laboratories, Japan)
Scott Piao (University of Lancaster, UK)
Beata Trawinski (University of Tuebingen, Germany)
Kiyoko Uchiyama (Keio University, Japan)

REGISTRATION

Workshop registration information will be posted at a later date. The registration fee will include attendance at the workshop and a copy of the workshop proceedings.