Software tools

On this page you will find a sample of the Natural Language Processing tools built by the COLE Group.

COMPAS (COMpiler for PArsing Schemata)

COMPAS (COMpiler for PArsing Schemata) is a system that can be used to automatically compile formal specifications of parsing algorithms (in the form of parsing schemata) to efficient Java implementations of the corresponding parsers.
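
To give a rough idea of what a parsing schema is, the sketch below hand-codes the two deduction steps of the CYK schema in Python and runs them with a simple agenda. It is only an illustration under stated assumptions: COMPAS reads schemata in its own notation and generates Java, whereas the function name, the grammar encoding and the toy grammar used here are invented for this example.

```python
def cyk_deduce(words, unary, binary, start="S"):
    """Run the CYK deduction steps over `words` (hypothetical helper).

    unary  : set of (A, word) pairs, one per rule A -> word
    binary : set of (A, B, C) triples, one per rule A -> B C
    An item (A, i, j) states that A derives words[i:j].
    """
    n = len(words)
    # Initial step:  A -> w_i  |-  [A, i, i+1]
    agenda = [(A, i, i + 1)
              for i, w in enumerate(words)
              for (A, x) in unary if x == w]
    chart = set()
    while agenda:
        item = agenda.pop()
        if item in chart:
            continue
        chart.add(item)
        X, i, j = item
        # Binary step:  [B, i, k], [C, k, j], A -> B C  |-  [A, i, j]
        # The new item may play the role of B or of C.
        for (Y, k, l) in list(chart):
            for (A, B, C) in binary:
                if B == X and C == Y and j == k:
                    agenda.append((A, i, l))
                if B == Y and C == X and l == i:
                    agenda.append((A, k, j))
    # The sentence is recognised if the goal item was deduced.
    return (start, 0, n) in chart


# Toy grammar (an assumption made for this example).
UNARY = {("Det", "the"), ("N", "dog"), ("VP", "barks")}
BINARY = {("S", "NP", "VP"), ("NP", "Det", "N")}

recognised = cyk_deduce("the dog barks".split(), UNARY, BINARY)
rejected = cyk_deduce("dog the barks".split(), UNARY, BINARY)
```

Here the schema is interpreted by a generic loop; a compiler such as COMPAS instead turns each deduction step into dedicated code.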

You can find more information, and download the source code and binaries of the system at the COMPAS home page.

LIBNADFA, A library for efficient management of very large dictionaries

A library written in C that manages very large dictionaries efficiently and with minimal memory requirements, thanks to the use of Numbered Acyclic Deterministic Finite-state Automata. By dictionary we mean any data structure that associates entries (typically words) with any kind of information.

This library integrates the techniques for constructing minimal automata proposed by Jan Daciuk et al. in the article "Incremental Construction of Minimal Acyclic Finite-State Automata" with the techniques for managing the information associated with entries (words) proposed by Jorge Graña et al. in "Compilation Methods of Minimal Acyclic Finite-State Automata for Large Dictionaries".
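
The idea behind a numbered automaton can be sketched in a few lines of Python. Each state stores how many words can be accepted from it, which is enough to map every word to its position in alphabetical order and back; that position then indexes a separate table holding the associated information. This is a simplified illustration, not the LIBNADFA API: all names are hypothetical, and the automaton is built as a plain trie rather than minimized incrementally as in the cited article.

```python
class State:
    def __init__(self):
        self.edges = {}     # character -> next State
        self.final = False  # True if a word ends here
        self.count = 0      # number of words accepted from this state


def number(state):
    """Fill in `count` bottom-up (plain recursion is enough for a trie)."""
    state.count = int(state.final) + sum(number(t) for t in state.edges.values())
    return state.count


def build(words):
    """Build an unminimized acyclic automaton (a trie) and number it."""
    root = State()
    for word in words:
        s = root
        for ch in word:
            s = s.edges.setdefault(ch, State())
        s.final = True
    number(root)
    return root


def word_to_index(root, word):
    """Return the 0-based alphabetical position of `word`, or None."""
    index, s = 0, root
    for ch in word:
        if ch not in s.edges:
            return None
        if s.final:
            index += 1                      # the word ending here sorts first
        for label in sorted(s.edges):
            if label == ch:
                break
            index += s.edges[label].count   # skip words under smaller labels
        s = s.edges[ch]
    return index if s.final else None


def index_to_word(root, index):
    """Inverse mapping: return the word at position `index`, or None."""
    if not 0 <= index < root.count:
        return None
    s, out = root, []
    while True:
        if s.final:
            if index == 0:
                return "".join(out)
            index -= 1
        for label in sorted(s.edges):
            t = s.edges[label]
            if index < t.count:
                out.append(label)
                s = t
                break
            index -= t.count


# Toy dictionary: the information table is indexed by word number.
root = build(["casa", "casas", "cosa", "sol"])
info = ["noun, singular", "noun, plural", "noun, singular", "noun, singular"]
entry = info[word_to_index(root, "casas")]
```

Because the automaton only yields an integer, the information table can use whatever layout suits the data. In a minimized automaton states are shared, so the counting pass would need memoization.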

This library is distributed under the terms of the GNU General Public License version 3.

Copyright 2009 Nieves Fernández, Fco. Mario Barcala, Jorge Graña.

Preprocessor

A linguistically-motivated preprocessor module for Spanish that performs tasks such as format conversion, tokenization, sentence segmentation, morphological pretagging, contraction splitting, separation of enclitic pronouns from verbal stems, expression identification, numeral identification and proper noun recognition. It is described in detail in the following publications:

To download this software, please contact Jesús Vilares (jvilares@udc.es).

PoS Tagger and Lemmatizer (MrTagoo)

MrTagoo is a high-performance part-of-speech tagger and lemmatizer based on a second-order Hidden Markov Model. It also offers a very efficient storage and search structure based on finite-state automata, handling of unknown words, and the possibility of integrating external dictionaries into the probabilistic framework defined by the Hidden Markov Model. Support for ambiguous segmentations is currently being implemented.
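
In a second-order Hidden Markov Model, the probability of a tag depends on the two preceding tags, so decoding keeps track of pairs of tags instead of single tags. The following Python sketch shows Viterbi decoding for such a model. It is not MrTagoo's code: the function name, the probability tables and the crude floor used for unseen events are assumptions made for illustration, and it leaves out unknown-word handling and dictionary integration.

```python
import math

START = "<s>"  # padding tag for the two positions before the sentence


def viterbi_trigram(words, tags, trans, emit, floor=1e-12):
    """Return the most probable tag sequence under a second-order HMM.

    trans[(t2, t1, t)] = P(t | t2, t1)   (hypothetical table layout)
    emit[(t, w)]       = P(w | t)
    Unseen events receive the probability `floor`.
    """
    def logp(table, key):
        return math.log(table.get(key, floor))

    # best[(u, v)]: best log-probability of a tag sequence ending in u, v
    best = {(START, START): 0.0}
    backpointers = []
    for w in words:
        new_best, bp = {}, {}
        for (u, v), score in best.items():
            for t in tags:
                s = score + logp(trans, (u, v, t)) + logp(emit, (t, w))
                if (v, t) not in new_best or s > new_best[(v, t)]:
                    new_best[(v, t)] = s
                    bp[(v, t)] = u   # remember the tag two positions back
        best = new_best
        backpointers.append(bp)

    # Follow the back-pointers from the best final pair of tags.
    state = max(best, key=best.get)
    result = []
    for bp in reversed(backpointers):
        result.append(state[1])
        state = (bp[state], state[0])
    result.reverse()
    return result


# Toy model with made-up probabilities.
TAGS = ["DET", "NOUN", "VERB"]
TRANS = {
    (START, START, "DET"): 0.8,
    (START, "DET", "NOUN"): 0.9,
    ("DET", "NOUN", "VERB"): 0.7,
    ("DET", "NOUN", "NOUN"): 0.2,
}
EMIT = {
    ("DET", "the"): 0.9,
    ("NOUN", "dog"): 0.8,
    ("VERB", "barks"): 0.7,
    ("NOUN", "barks"): 0.1,
}

tagged = viterbi_trigram(["the", "dog", "barks"], TAGS, TRANS, EMIT)
# tagged == ["DET", "NOUN", "VERB"]
```

Working in log space avoids numerical underflow on long sentences. A real tagger would replace the fixed floor with proper smoothing.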

MrTagoo is described in detail in the following publications:

To download this software, please contact Jorge Graña (grana@udc.es).

Definite Clause Grammar Parser

An efficient parser of Definite Clause Grammars guided by the LALR(1) automata constructed from the underlying context-free backbone. As a special feature, some kinds of non-terminating computations are detected and the resulting cycles on the logical arguments are represented in a compact manner.

This parser is described in detail in the following publications:

To download this software, please contact David Cabrero (cabrero@udc.es).