SMULTRON annotation guidelines
The treebanks were annotated following different annotation schemata, depending on the language. Using different schemata for different languages can be problematic when the monolingual treebanks are combined into one parallel treebank. However, we want each monolingual treebank to be usable on its own as well as within the parallel treebank, and therefore to be compatible with existing treebanks.
Guidelines for part-of-speech tagging
- English - Part-of-speech tagging for the Penn Treebank Project
- German - Part-of-speech tagging with the Stuttgart-Tübingen Tagset (STTS)
- Swedish - Part-of-speech tagging with the Stockholm-Umeå Corpus (SUC) tagset
Guidelines for parsing
- English - Treebank (II) bracketing for the Penn Treebank Project
- German - Syntactic annotation with the TIGER schema
- Swedish - Syntactic annotation with our adapted version of the TIGER schema