Institute of Computational Linguistics

SMULTRON - Stockholm MULtilingual TReebank

Version 1.0

SMULTRON (Stockholm MULtilingual TReebank) is a parallel treebank first developed by the Computational Linguistics Group at the Department of Linguistics at Stockholm University. Version 1.0 of the parallel treebank contains around 1000 sentences in English, German and Swedish. The sentences have been PoS-tagged and annotated with phrase structure trees. The trees have been aligned at the sentence, phrase and word levels. Additionally, the German and Swedish monolingual treebanks contain lemma information.

Version 2.0

The Institute of Computational Linguistics continues the work on the SMULTRON project. Version 2.0 is an extension of the original treebank with a new text type: 500 sentences from a user manual in English, German and Swedish.

Version 3.0

Yet another text genre and two new languages have been added to our parallel treebank: mountaineering reports in French and German as well as the Spanish version of the user manual.

We currently distribute the SMULTRON treebanks (version 3.0, around 2500 sentences) in TIGER-XML format, as 12 treebank files combined in 9 alignment files.
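For readers new to TIGER-XML, the following is a minimal sketch of how a treebank file in that format might be read with Python's standard library. The sample sentence is invented for illustration and is not taken from SMULTRON. The element and attribute names (s, t, nt, edge, word, pos, cat, idref) follow the general TIGER-XML layout; the actual SMULTRON files may carry additional attributes, such as lemma information. The helper name read_sentences is hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented two-token sample in the general TIGER-XML layout (not SMULTRON data).
SAMPLE = """
<corpus id="demo">
  <body>
    <s id="s1">
      <graph root="s1_500">
        <terminals>
          <t id="s1_1" word="The" pos="DT"/>
          <t id="s1_2" word="garden" pos="NN"/>
        </terminals>
        <nonterminals>
          <nt id="s1_500" cat="NP">
            <edge idref="s1_1" label="--"/>
            <edge idref="s1_2" label="--"/>
          </nt>
        </nonterminals>
      </graph>
    </s>
  </body>
</corpus>
"""


def read_sentences(xml_text):
    """Yield (sentence id, tokens, phrases) for each <s> element.

    tokens maps a terminal id to (word, pos);
    phrases maps a nonterminal id to (category, list of child ids).
    """
    root = ET.fromstring(xml_text)
    for s in root.iter("s"):
        tokens = {t.get("id"): (t.get("word"), t.get("pos")) for t in s.iter("t")}
        phrases = {
            nt.get("id"): (nt.get("cat"), [e.get("idref") for e in nt.findall("edge")])
            for nt in s.iter("nt")
        }
        yield s.get("id"), tokens, phrases


sentences = list(read_sentences(SAMPLE))
for sid, tokens, phrases in sentences:
    print(sid, [word for word, _ in tokens.values()])
    for nid, (cat, children) in phrases.items():
        print(" ", nid, cat, children)
```

To read a file from disk rather than a string, ET.parse(path).getroot() can replace ET.fromstring. The alignment files are not covered by this sketch.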

We plan to extend the treebank with further texts in other languages and complete the alignments for all language pairs.


Please register here.



To cite SMULTRON, please refer to:

  author = {Martin Volk and Anne Göhring and Torsten Marek and Yvonne Samuelsson},
  year = 2010,
  title = {{SMULTRON (version 3.0) — The Stockholm MULtilingual parallel TReebank}},
  note = {An English-French-German-Spanish-Swedish parallel treebank
          with sub-sentential alignments},
  howpublished = {},
  institution = {Institute of Computational Linguistics, University of Zurich}


This is a collection of the publications regarding the SMULTRON parallel treebank and its creation.