NLP Resources by CL@UZH
Further resources available for download or upon request
- Annotated News Commentary 11 corpus in Spanish (automatically tagged and parsed, with named entity recognition and co-reference resolution)
- Co-reference resolution pipeline for Spanish (CoNLL, Spanish adaptation of https://github.com/dtuggener/CorZu)
- Quechua repository of the SQUOIA project, containing text corpora, a hybrid MT system, morphology tools (xfst, foma), and treebanks
- bulletin4corpus: Parallel corpus created from the Credit Suisse Bulletins
- A manually corrected part-of-speech-tagged corpus of approx. 62,000 tokens (Language: German; Domain: Reports about the University of Zurich; PoS-Tagset: STTS)
- 4561 German test cases for PP attachment from the computer magazine, used in the habilitation thesis: Martin Volk: The automatic resolution of prepositional phrase attachment ambiguities in German. University of Zurich. 2001.
- 3000 sentences annotated in the NEGRA format (computer magazine). Please contact Martin Volk.
- The German-language thesaurus UniNet, which comprises approx. 20,000 nouns in the WordNet format belonging to the domain of (Swiss) university terminology. For information, please contact Simon Clematide.
- SMULTRON: Stockholm MULtilingual TReebank
- A Python script (ZIP, 10 KB) for computing full lemmas for elliptical German compounds, developed by Noëmi Aepli in 2013
- Parallel German-Romansh corpus created by Manuela Weibel in her Master's thesis
- NOAH's Corpus of Swiss German Dialects, manually annotated with POS tags
- Manually parsed dependency trees for Swiss German, created by Noëmi Aepli in her Master's thesis
- A gold standard corpus of temporal annotations comprising approx. 34,000 tokens. The corpus contains 50 historical legal texts in Early New High German from the Collection of the Swiss Law Sources Foundation.