COLING 2000 Workshop on Semantic Annotation and Intelligent Content

Centre Universitaire, Luxembourg, 5/6 August, 2000


Topic and Motivation

SEMANTIC ANNOTATION is the augmentation of data to facilitate automatic
recognition of the underlying semantic structure. A common practice in
this respect is the labeling of documents with thesaurus classes for the
sake of document classification and management. In the medical domain,
for instance, there is a long-standing tradition in terminology
maintenance and annotation/classification of documents using standard
coding systems such as ICD, MeSH and the UMLS Metathesaurus. Semantic
annotation in a broader sense also addresses document structure (title,
section, paragraph, etc.), linguistic structure (dependency,
coordination, thematic role, coreference, etc.), and so forth. In NLP,
semantic annotation has been used in connection with machine-learning
software trainable on annotated corpora for parsing, word-sense
disambiguation, coreference resolution, summarization, information
extraction, and other tasks. An important but still unexplored potential
of semantic annotation is that it can provide a common I/O format
through which to integrate various component technologies in NLP and AI
such as speech recognition, parsing, generation, inference, and so on.

INTELLIGENT CONTENT is semantically structured data used for a
wide range of content-oriented applications such as classification,
retrieval, extraction, translation, presentation, and
question answering; the organization of such data provides machines
with accurate semantic input to those technologies. Semantically
annotated resources as described above are typical examples of
intelligent content, while another major class comprises electronic
dictionaries and interlingual or knowledge-representation data. Some
ongoing projects along these lines are GDA (Global Document Annotation),
UNL (Universal Networking Language) and SHOE (Simple HTML Ontology
Extension), all of which aim to encourage people to semantically
organize electronic documents in machine-understandable formats, and to
develop and spread content-oriented application technologies aware
of such formats. Along similar lines, MPEG-7 is a framework for
semantically annotating audiovisual data for the sake of content-based
retrieval and browsing, among others. Incorporation of linguistic
annotation into MPEG-7 is on the agenda, because linguistic descriptions
already constitute a major part of existing metadata.

In short, semantic annotation is a central, basic technology for
intelligent content, which in turn is a key notion in systematically
coordinating various applications of semantic annotation. In the hope of
fueling some of the developments mentioned above and thus promoting the
links between basic research and practical applications, the
workshop invites researchers and practitioners from such fields as
computational linguistics, document processing, terminology, information
science, and multimedia content, among others, to discuss various
aspects of semantic annotation and intelligent content in an
interdisciplinary way. Potential topics include but are not limited to:

            authoring/annotation tools
            integrated software architecture based on semantic annotation
            language-based multimedia annotation
            standardization and interoperability

            semantic annotation, intelligent content and:

                  document classification
                  information extraction
                  information retrieval (interactive, pinpoint, content-based, etc.)
                  intelligent/interactive manual
                  knowledge circulation and management
                  knowledge mining
                  machine translation
                  presentation (interactive, multimodal/multimedia, etc.)
                  question answering
                  summarization (multimedia, multidocument, itemized, graphical, etc.)

Please note: Papers on syntactic annotation (tools, methods,
standards, etc.) should not be submitted to this workshop, but rather to
the COLING Workshop on Linguistically Interpreted Corpora.


Program Committee

       Amit Bagga (GE Corporate R&D, USA)
       Paul Buitelaar (DFKI-LT, Germany -- Co-Chair)
       Gregor Erbach (FTW, Austria)
       Christiane Fellbaum (Princeton University, USA)
       Wolfgang Giere (ZINFO, University of Frankfurt, Germany)
       Nicola Guarino (Ladseb-CNR Padova, Italy)
       Koiti Hasida (ETL, Japan -- Co-Chair)
       Boris Katz (AI Laboratory, MIT, USA)
       Adam Kilgarriff (University of Brighton, UK)
       Elizabeth Liddy (Syracuse University, USA)
       Katashi Nagao (IBM TRL, Japan)
       Hiroshi Nakagawa (University of Tokyo, Japan)
       Hwee Tou Ng (DSO, Singapore)
       Martha Palmer (University of Pennsylvania, USA)
       Virach Sornlertlamvanich (NECTEC, Thailand)
       Steffen Staab (University of Karlsruhe, Germany)
       Henry Thompson (Edinburgh University, UK)
       Hiroshi Uchida (United Nations University, Japan)
       Remi Zajac (CRL, New Mexico State University, USA)


Schedule

Two-day workshop with an equal number of invited and refereed
presentations on day one, plus a number of smaller working groups with
group presentations on day two.

       Paper submission deadline                 April 30
       Notification of acceptance/rejection      May 30
       Publication of workshop program           June 15
       Workshop                                  August 5/6


Submission

Submissions, in English, of at most 5000 words (in PS or PDF format)
should be sent (preferably by email) to the following two organizers:

       Paul Buitelaar ( paulb@dfki.de )
       DFKI
       Language Technology
       Stuhlsatzenhausweg 3
       D-66123 Saarbruecken, Germany

       Koiti Hasida ( hasida@etl.go.jp )
       Information Science Division
       Electrotechnical Laboratory
       1-1-4, Umezono, Tukuba, Ibaraki 305-8568, Japan


Workshop webpage: http://www.dfki.de/~paulb/workshop/cfp.html