Parallel Treebanks

SMULTRON - Stockholm MULtilingual TReebank

The Department of Computational Linguistics continues its work on SMULTRON, a treebank of parallel texts. To date, 500 sentences each from the novel “Sophie’s World”, from several business texts and from a user manual have been syntactically annotated and aligned with each other. The parallel treebank is available in German, English and Swedish. Download: smultron-4.0.zip

We plan to extend the treebank with new text types, new texts and more languages. In the next release, the Spanish version of the user manual (500 sentences) will be added.

We also experiment with typologically unrelated languages. For this purpose, we have created a small parallel treebank of Quechua and Spanish.

Another dimension of annotation is semantic annotation of existing trees.

The annotation is distributed under a Creative Commons Licence.

TreeAligner

The Department of Computational Linguistics develops a tool for aligning and searching parallel treebanks. Syntax trees and their alignments are displayed in a graphical user interface. The TreeAligner supports alignments between words and between phrases in parallel trees. An alignment can be exact or fuzzy, depending on how closely the two sides correspond as translations (translational equivalence).
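
To make the idea of word and phrase alignments concrete, here is a minimal Python sketch of how such links could be modelled. It is only an illustration under stated assumptions: the class names, field names and node identifiers are hypothetical and do not reflect the TreeAligner's actual file format or internal data model, and the example nodes are invented rather than taken from SMULTRON.

```python
from dataclasses import dataclass
from enum import Enum


class AlignmentType(Enum):
    """The two alignment qualities described in the text."""
    EXACT = "exact"
    FUZZY = "fuzzy"


@dataclass(frozen=True)
class Node:
    node_id: str       # hypothetical identifier, unique within one treebank
    label: str         # phrase category (e.g. "NP") or part-of-speech tag
    is_terminal: bool  # True for words, False for phrases


@dataclass(frozen=True)
class Alignment:
    source: Node       # node in the tree of the first language
    target: Node       # node in the tree of the second language
    kind: AlignmentType


def by_kind(alignments, kind):
    """Return only the alignments of the requested type."""
    return [a for a in alignments if a.kind is kind]


# Invented example nodes: one phrase pair and one word pair.
en_np = Node("en_s1_500", "NP", False)
de_np = Node("de_s1_500", "NP", False)
en_word = Node("en_s1_3", "NN", True)
de_word = Node("de_s1_4", "NN", True)

alignments = [
    Alignment(en_np, de_np, AlignmentType.EXACT),
    Alignment(en_word, de_word, AlignmentType.FUZZY),
]

fuzzy_links = by_kind(alignments, AlignmentType.FUZZY)
```

Keeping the alignment type on the link rather than on the nodes means the same node can take part in several links of different quality.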

The search engine is based on the TIGER query language and is extended for searching parallel trees and their alignments.
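
The kind of question a search over parallel trees answers can be sketched in plain Python. The example below is not TIGER query syntax and not the TreeAligner's search engine. It is a hypothetical, simplified illustration in which each node has only a category label and every alignment link is a tuple. All data and the function name `find_aligned` are assumptions made for this sketch.

```python
# Hypothetical miniature data: node id -> category label, one table per language.
source_labels = {"s1": "NP", "s2": "VP", "s3": "PP"}
target_labels = {"t1": "NP", "t2": "VP", "t3": "NP"}

# Alignment links as (source node id, target node id, alignment type).
links = [
    ("s1", "t1", "exact"),
    ("s2", "t2", "fuzzy"),
    ("s3", "t3", "fuzzy"),
]


def find_aligned(source_cat, target_cat, kind=None):
    """Return links whose endpoints carry the given category labels.

    If kind is given ("exact" or "fuzzy"), only links of that type match.
    """
    return [
        (s, t, k)
        for s, t, k in links
        if source_labels[s] == source_cat
        and target_labels[t] == target_cat
        and (kind is None or k == kind)
    ]


# Find prepositional phrases that are aligned with noun phrases.
print(find_aligned("PP", "NP"))  # prints [('s3', 't3', 'fuzzy')]
```

A query like this combines a monolingual condition on each side (the category label) with a condition on the alignment itself, which is the extra dimension a parallel treebank search adds to a single-language tree query.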