NLP Resources by CL@UZH
Further resources available for download or upon request
- Annotated News Commentary 11 corpus in Spanish (automatically tagged, parsed, NER, co-references resolved)
- Co-reference resolution pipeline for Spanish (CoNLL format; Spanish adaptation of https://github.com/dtuggener/CorZu)
- Quechua repository of project SQUOIA, contains text corpora, hybrid MT system, morphology tools (xfst, foma), treebanks
- bulletin4corpus: Parallel corpus created from the Credit Suisse Bulletins
- A manually corrected, part-of-speech-tagged corpus of approx. 62,000 tokens (Language: German; Domain: Reports about the University of Zurich; PoS-Tagset: STTS)
- 4561 German test cases for PP attachment from a computer magazine, as used in the habilitation thesis: Martin Volk: The automatic resolution of prepositional phrase attachment ambiguities in German. University of Zurich. 2001.
- 3000 sentences annotated in the NEGRA format (computer magazine). Please contact Martin Volk.
- The German-language thesaurus UniNet, which comprises approx. 20,000 nouns in the WordNet format from the domain of (Swiss) university terminology. For information, please contact Simon Clematide.
- SMULTRON: Stockholm MULtilingual TReebank
- A Python script for computing full lemmas for elliptical German compounds as developed by Noëmi Aepli in 2013
- German-Romansh parallel corpus created by Manuela Weibel in her Master's thesis
- NOAH's Corpus of Swiss German Dialects manually annotated with POS Tags
- Manually parsed dependency trees for Swiss German, created by Noëmi Aepli in her Master's thesis