Portrait and Research Interests
I am Titular Professor and postdoc staff member in Computational Linguistics (CL) at the Department of Computational Linguistics, head of the NLP group in Linguistics Research Infrastructure (LiRI Tech NLP) and of the Text Crunching Center (TCC), which offers computational linguistics services to the University and other partners.
I am a senior researcher in the URPP Digital Religion(s), in Project 8, where we advance hate speech detection tools, detect intolerance and apply content analysis methods to important social and religious issues.
I have been a senior lecturer and computing scientist (wissenschaftlicher Informatiker) at the English Department of the University of Zurich (see Gerold Schneider's homepage at the English Department).
My research interests include corpus linguistics, semantic mining, automated media content analysis, cognitive linguistics, digital humanities, robust parsing, syntax, and formal grammar.
I am involved in research on automated media content analysis and on text mining in the biomedical and many other domains. I also do research on digital humanities, learner language, variationist linguistics (genre, regions, contrastive, typology), and statistical methods.
I have published over 130 peer-reviewed articles and a coursebook on Statistics.
In the winter term 2017/18 I worked as Substituting Professor for German Linguistics at TU Dortmund University.
I worked at the linguistics department of the University of Konstanz, substituting for Prof. Dr. Miriam Butt from 2015 to 2017 as Professor of Computational and General Linguistics.
Selected articles in bibliographical databases can be downloaded from ZORA or from my Google Scholar profile.
I co-supervise the following doctoral theses: Michi Amsler, Peter Makarov, Janis Goldzycher, Maud Reveilhac.
I have written my cumulative habilitation on using computational linguistics methods for descriptive linguistics, text mining and psycholinguistics.
I have written a low-complexity, broad-coverage probabilistic dependency parser for English, as a part of
I have also ported it to German, together with Rico Sennrich.
My Recent Publications related to the Department of Computational Linguistics (ZORA)
-
Investigating Linguistic Abilities of LLMs for Native Language Identification In: Proceedings of the 14th Workshop on NLP for Computer Assisted Language Learning, Tallinn, 2025. https://spraakbanken.gu.se/en/research/themes/icall/nlp4call-workshop-series/nlp4call2025
-
Digital Dickens: An automated content analysis of Charles Dickens’ novels In: Buschfeld, Sarah; Ronan, Patricia; Neumaier, Theresa; Wellinghoff, Andreas; Westermayer, Lisa . Crossing Boundaries through Corpora: Innovative corpus approaches within and beyond linguistics. Amsterdam: John Benjamins Publishing, 62-98.
-
Automatically detecting directives with SPICE Ireland In: Schweinberger, Martin; Ronan, Patricia . Socio-Pragmatic Variation in Ireland: Using Pragmatic Variation to Construct Social Identities. Berlin: De Gruyter, 205-234.
-
Evaluating Transformers on the Ethical Question of Euthanasia In: SwissText 2024, Chur, Switzerland, 10 June 2024 - 11 June 2024, 241-246.
-
Text Analytics for Corpus Linguistics and Digital Humanities: Simple R Scripts and Tools London: Bloomsbury Academic.
-
The Visualisation and Evaluation of Semantic and Conceptual Maps In: Laitinen, Mikko; Tyrkkö, Jukka . Linguistics across Disciplinary Borders: The March of Data. London: Bloomsbury Publishing, 67-94.
-
Native Language Identification Improves Authorship Attribution In: Proceedings of the 7th International Conference on Natural Language and Speech Processing (ICNLSP 2024), Trento, Italy, 2024. Association for Computational Linguistics, 289-296.
-
Improving Adversarial Data Collection by Supporting Annotators: Lessons from GAHD, a German Hate Speech Dataset In: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Mexico City, Mexico, 1 June 2024. Association for Computational Linguistics, 4405-4424.
-
Investigating child language acquisition from a joint perspective: A comparison of traditional and new L1 speakers of English In: Schmalz, Mirjam; Vida-Mannl, Manuela; Buschfeld, Sarah . Acquisition and Variation in World Englishes: Bridging Paradigms and Rethinking Approaches. Berlin: De Gruyter, 133-157.
-
Turkish Native Language Identification In: 6th International Conference on Natural Language and Speech Processing (ICNLSP-2023), virtual, 16 December 2023 - 17 December 2023, 303-307.
-
Exploring Hybrid Linguistic Features for Turkish Text Readability In: 6th International Conference on Natural Language and Speech Processing (ICNLSP-2023), virtual, 16 December 2023 - 17 December 2023, 223-232.
-
The LiRI Corpus Platform In: CLARIN Annual Conference 2023, Leuven, Belgium, 16 October 2023 - 18 October 2023. CLARIN ERIC, 145-149.
-
Exploring the role of AI in classifying, analyzing, and generating case reports on assisted suicide cases: feasibility and ethical implications Frontiers in Artificial Intelligence, 6:1328865.
-
Colloquialisation, compression and democratisation in British parliamentary debates In: Korhonen, Minna; Kotze, Haidee; Tyrkkö, Jukka . Exploring Language and Society with Big Data: Parliamentary discourse across time and space. Amsterdam: John Benjamins Publishing, 336-372.
-
Swissdox@LiRI – a large database of media articles made accessible to researchers In: CLARIN Annual Conference 2023, Leuven, 16 October 2023 - 18 October 2023. CLARIN ERIC, 111-115.
-
Differences in syntactic annotation affect retrieval International Journal of Corpus Linguistics, 28(3):378-406.
-
Evaluating the Effectiveness of Natural Language Inference for Hate Speech Detection in Languages with Limited Labeled Data In: The 7th Workshop on Online Abuse and Harms (WOAH), Toronto, Canada, 13 July 2023. Association for Computational Linguistics, 187-201.
-
Detecting and Analysing Learner Difficulties Using a Learner Corpus Without Error Tagging In: Harrington, Kieran; Ronan, Patricia . Demystifying Corpus Linguistics for English Language Teaching. Cham: Palgrave Macmillan, 229-257.
-
Replicable semi-supervised approaches to state-of-the-art stance detection of tweets Information Processing & Management, 60(2):103199.
-
Do Non-native Speakers Read Differently? Predicting Reading Times with Surprisal and Language Models of Native and Non-native Eye Tracking Data In: Busse, Beatrix; Dumrukcic, Nina; Kleiber, Ingo . Language and Linguistics in a Complex World. Berlin: De Gruyter, 153-188.
-
Scaling Native Language Identification with Transformer Adapters In: 5th International Conference on Natural Language and Speech Processing (ICNLSP), Trento, 16 December 2022 - 17 December 2022, Cornell University.
-
Assessing How Attitudes to Migration in Social Media Complement Public Attitudes Found in Opinion Surveys SPELL: Swiss Papers in English Language and Literature, 41:119-153.
-
Complementing Kernel Density Estimation and Topic Modelling to Visualise Political Discourse In: Digital Research Data and Human Sciences DRDHum Conference 2022, Jyväskylä, Finland, 1 December 2022 - 3 December 2022. University of Jyväskylä, 12-27.
-
Systematically Detecting Patterns of Social, Historical and Linguistic Change: The Framing of Poverty in Times of Poverty Transactions of the Philological Society, 120(3):447-473.
-
Hypothesis Engineering for Zero-Shot Hate Speech Detection In: Proceedings of the Third Workshop on Threat, Aggression and Cyberbullying (TRAC 2022), Gyeongju, Republic of Korea, 12 October 2022 - 17 October 2022. ACL, 75-90.
-
Comparing the coverage of the “marriage for all” vote on Twitter and in the newspapers In: 2nd Workshop on Computational Linguistics for Political Text Analysis (CPSS-2022), Potsdam, Germany, 12 September 2022. CPSS, 55-62.
-
Recent changes in spoken British English according to spoken BNC2014 In: Flach, Susanne; Hilpert, Martin . Broadening the spectrum of corpus linguistics: New approaches to variability and change. Amsterdam: John Benjamins Publishing, 173-195.
-
Syntactic changes in verbal clauses and noun phrases from 1500 onwards In: Los, Bettelou; Cowie, Claire; Honeybone, Patrick . English Historical Linguistics: Change in Structure and Meaning. Amsterdam: John Benjamins Publishing, 163-200.
-
Measuring Attitudes to Migration in the Media automatically with Complementary Data Sources and Methods In: Ronan, Patricia; Ziegler, Evelyn . Approaches to Migration and Language Identity. Oxford, Bern, Berlin, Bruxelles, New York, Wien: Peter Lang, 207-252.
-
Challenges and best practices for digital unstructured data enrichment in health research: a systematic narrative review medRxiv 22278137, Cold Spring Harbor Laboratory.
-
Medical topics and style from 1500 to 2018 In: Hiltunen, Turo; Taavitsainen, Irma . Corpus pragmatic studies on the history of medical discourse. Amsterdam: Benjamins, 49-78.
-
Correlations and predictions of reading times using language models and surprisal In: Krug, Manfred; Schützler, Ole; Vetter, Fabian; Werner, Valentin . Perspectives on Contemporary English : Structure, Variation, Cognition. Berlin, Bern, Bruxelles, New York, Oxford, Warszawa, Wien: Peter Lang, 209-243.
-
Comparing data-driven to corpus-based approaches for diachronic variation: document-classification and overuse metrics In: Schlüter, Julia; Schützler, Ole . Data and Methods in Corpus Linguistics: Comparative Approaches. Cambridge: Cambridge University Press, 291-322.
-
With a little help from familiar interlocutors: real-world language use in young and older adults Aging & Mental Health, 25(12):2310-2319.
-
Linear and Non-Linear Age Trajectories of Language Use: A Laboratory Observation Study of Couples' Conflict Conversations Journals of Gerontology, Series B: Psychological Sciences and Social Sciences, 75(9):e206-e214.
-
Changes in society and language: charting poverty In: Rautinaho, Paula; Nurmi, Arja; Klemola, Juhani . Corpora and the changing society: studies in the evolution of English. Amsterdam: John Benjamins Publishing, 29-56.
-
Using Multilingual Resources to Evaluate CEFRLex for Learner Applications In: 12th Conference on Language Resources and Evaluation (LREC 2020), Marseille, 11 May 2020 - 16 May 2020. European Language Resources Association, 346-355.
-
Spelling normalisation of Late Modern English: comparison and combination of VARD and character-based statistical machine translation In: Kytö, Merja; Smitterberg, Eric . Late Modern English: novel encounters. Amsterdam: John Benjamins Publishing, 243-268.
-
A Man who Was Just an Incredible Man, an Incredible Man: Age Factors and Coherence in Donald Trump’s Spontaneous Speech In: Schneider, Ulrike; Eitelmann, Matthias . Linguistic Inquiries into Donald Trump’s Language : From ‘Fake News’ to ‘Tremendous Success’. London: Bloomsbury, 62-84.
-
Statistics for Linguists: A patient, slow-paced introduction to statistics and to the programming language R Zurich: Digitale Lehre und Forschung UZH.
-
Cognitive Aging Effects on Language Use in Real-Life Contexts: A Naturalistic Observation Study In: The 41st Annual Meeting of the Cognitive Science Society, Montreal, QC, 24 July 2019 - 27 July 2019, CogSci.
-
Topics of eighteenth-century medical writing with triangulation of methods: LMEMT and the underlying reality In: Taavitsainen, Irma; Hiltunen, Turo . Late Modern English medical texts: writing medicine in the eighteenth century (Including the LMEMT Corpus). Amsterdam: John Benjamins Publishing, 31-74.
-
Scholastic argumentation in Early English medical writing and its afterlife: new corpus evidence In: Suhr, Carla; Nevalainen, Terttu; Taavitsainen, Irma . From data to evidence in English language research. Leiden: Brill, 191-221.
-
Statistical MWE-aware parsing In: Parmentier, Yannick; Waszczuk, Jakub . Representation and parsing of multiword expressions: current trends. Berlin: Language Science Press, 147-182.
-
NLP Corpus Observatory – Looking for Constellations in Parallel Corpora to Improve Learners’ Collocational Skills In: 7th Workshop on NLP for Computer Assisted Language Learning at SLTC 2018 (NLP4CALL 2018), Stockholm, 7 November 2018 - 7 November 2018, 69-78.
-
Detecting innovations in a parsed corpus of learner English In: Deshors, Sandra C.; Götz, Sandra; Laporte, Samantha . Rethinking linguistic creativity in non-native Englishes. Amsterdam: John Benjamins Publishing, 47-74.
-
Differences between Swiss High German and German German via data-driven methods In: SwissText 2018: 3rd Swiss Text Analytics Conference, Winterthur, 12 June 2018 - 13 June 2018.
-
Differences between Swiss High German and German High German via data-driven methods In: 3rd Swiss Text Analytics Conference (SwissText 2018), Winterthur, Switzerland, 12 June 2018 - 13 June 2018. CEUR-WS, 17-25.
-
From Lexical Bundles to Surprisal and Language Models: measuring the idiom principle on native and learner language In: Kopaczyk, Joanna; Tyrkkö, Jukka . Applications of Pattern-driven Methods in Corpus Linguistics. Amsterdam: Benjamins, 15-56.
-
Tools and Methods for Processing and Visualizing Large Corpora Studies in Variation, Contacts and Change in English, 19:online.
-
Measuring Encoding Efficiency in Swedish and English Language Learner Speech Production In: Interspeech 2017, Stockholm, 19 August 2017 - 24 August 2017. ISCA, 1779-1783.
-
Saying Whatever It Takes: Creating and Analyzing Corpora from US Presidential Debate Transcripts In: Corpus Linguistics Conference 2017, Birmingham, 25 July 2017 - 28 July 2017, 537-544.
-
Comparing Rule-based and SMT-based Spelling Normalisation for English Historical Texts In: Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language, Gothenburg, 22 May 2017 - 22 May 2017, 40-46.
-
Crossing the Border Twice: Reimporting Prepositions to Alleviate L1-Specific Transfer Errors In: Joint 6th Workshop on NLP for Computer Assisted Language Learning and 2nd Workshop on NLP for Research on Language Acquisition, Gothenburg, 22 May 2017. Linköping University Electronic Press, 18-26.
-
Part-Of-Speech in Historical Corpora: Tagger Evaluation and Ensemble Systems on ARCHER In: KONVENS 2016, Bochum, 19 September 2016 - 21 September 2016, RUB.
-
Detecting innovations in a parsed corpus of learner English International Journal of Learner Corpus Research, 2(2):177-204.
-
Review of Automatic Treatment of Learner Corpus Data, Ana Diaz Negrillo, Nicolas Ballier and Paul Thompson, eds. (2013) International Journal of Learner Corpus Research, (1):172-177.
-
Parsing early and late modern English corpora Literary and Linguistic Computing, 30(3):423-439.
-
Determining light verb constructions in contemporary British and Irish English International Journal of Corpus Linguistics, 20(3):326-354.
-
Automated Media Content Analysis from the Perspective of Computational Linguistics In: Sommer, Katharina; Wettstein, Martin; Wirth, Werner; Matthes, Jörg . Automatisierung in der Inhaltsanalyse. Köln: Herbert von Halem Verlag, 40-54.
-
Measuring the public accountability of new modes of governance In: ACL Workshop on Language Technologies and Computational Social Science, Baltimore, MD, USA, 26 June 2014 - 26 June 2014, 38-43.
-
Applying Computational Linguistics and Language Models: From Descriptive Linguistics to Text Mining and Psycholinguistics 2014, University of Zurich, Philosophische Fakultät.
-
ODIN: a customizable literature curation tool In: Fourth BioCreative Challenge Evaluation Workshop, Bethesda, MD, US, 7 October 2013 - 9 October 2013, 219-223.
-
Of-genitive versus s-genitive: A corpus-based analysis of possessive constructions in 20th-century English In: Bennett, Paul; Durrell, Martin; Scheible, Silke; Whitt, Richard J . New Methods in Historical Corpora. Tübingen: Narr Verlag, 163-180.
-
Exploiting Synergies Between Open Resources for German Dependency Parsing, POS-tagging, and Morphological Analysis In: Recent Advances in Natural Language Processing (RANLP 2013), Hissar, Bulgaria, 7 September 2013 - 13 September 2013, 601-609.
-
UZH in BioNLP 2013 In: Proceedings of the BioNLP Shared Task 2013 Workshop, Sofia, Bulgaria, 9 August 2013 - 9 August 2013, 116-120.
-
Investigating Irish English With ICE-Ireland Cahiers de l'institut de linguistique et des sciences du langage, 38(2013):137-162.
-
Using the OntoGene pipeline for the triage task of BioCreative 2012 Database, 2013:bas053.
-
Notes about the OntoGene pipeline In: AAAI-2012 Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text, Arlington, Virginia, USA, 2 November 2012 - 4 November 2012.
-
Using syntax features and document discourse for relation extraction on PharmGKB and CTD In: SMBM 2012, Zurich, Switzerland, 3 September 2012 - 4 September 2012, 52-57.
-
Dependency parsing for interaction detection in pharmacogenomics In: LREC 2012: The eighth international conference on Language Resources and Evaluation, Istanbul, 21 May 2012 - 25 May 2012.
-
Using semantic resources to improve a syntactic dependency parser In: LREC 2012 Conference Workshop "Semantic Relations II", Istanbul, Turkey, 22 May 2012 - 22 May 2012, 67-76.
-
Dependency bank In: LREC 2012 Conference Workshop "Challenges in the Management of Large Corpora", Istanbul, Turkey, 22 May 2012 - 22 May 2012, 23-28.
-
Relation Mining Experiments in the Pharmacogenomics Domain Journal of Biomedical Informatics, 45(5):851-861.
-
Adapting a parser to historical English Helsinki: University of Helsinki.
-
Using automatically parsed corpora to discover lexico-grammatical features of English varieties In: 30th International Conference on Lexis and Grammar, Nicosia, Cyprus, 5 October 2011 - 8 October 2011, 251-258.
-
Detection of interaction articles and experimental methods in biomedical literature BMC Bioinformatics, 12(Suppl 8):S13.
-
Text-Mining-Methoden im Semantic Web Wirtschaftsinformatik und Management, 3:28-35.
-
A large-scale investigation of verb-attached prepositional phrases Helsinki: University of Helsinki.
-
A data-driven approach to alternations based on protein-protein interactions In: III Congreso Internacional de Lingüística de Corpus, Valencia, Spain, 7 April 2011 - 9 April 2011, 597-607.
-
OntoGene (Team 65): preliminary analysis of participation in BioCreative III In: BioCreative III workshop, Bethesda, Maryland, 13 September 2010 - 15 September 2010.
-
OntoGene in BioCreative II.5 IEEE/ACM Transactions on Computational Biology and Bioinformatics, 7(3):472-480.
-
Text Mining Methoden im Semantic Web HMD Praxis der Wirtschaftsinformatik, (271):35-46.
-
Using a parser as a heuristic tool for the description of New Englishes In: The Fifth Corpus Linguistics Conference, Liverpool, UK, 20 July 2009 - 23 July 2009, online.
-
UZurich in the BioNLP 2009 Shared Task In: BioNLP 2009 Companion Volume: Shared Task on Event Extraction, NAACL/HLT, Boulder, Colorado, 4 June 2009 - 5 June 2009, 28-36.
-
Detecting protein-protein interactions in biomedical texts using a parser and linguistic resources In: Gelbukh, Alexander . Computational Linguistics and Intelligent Text Processing. Berlin: Springer, 406-417.
-
A New Hybrid Dependency Parser for German In: Chiarcos, Christian; de Castilho, Richard Eckart; Stede, Manfred . Von der Form zur Bedeutung: Texte automatisch verarbeiten / From Form to Meaning: Processing Texts Automatically. Proceedings of the Biennial GSCL Conference 2009. Tübingen: Narr, 115-124.
-
Parser-based analysis of syntax-lexis interactions In: Jucker, Andreas H; Schreier, Daniel; Hundt, Marianne . Corpora: Pragmatics and Discourse. Amsterdam, The Netherlands: Rodopi, 477-502.
-
Detecting Protein-Protein Interactions in Biomedical Literature Using a Parser In: Clematide, Simon; Klenner, Manfred; Volk, Martin . Searching Answers. Münster: MV Verlag, 109-118.
-
Hybrid long-distance functional dependency parsing 2008, University of Zurich, Faculty of Arts.
-
A Broad-Coverage, Representationally Minimalist LFG Parser: Chunks and F-Structures Are Enough In: LFG05, Bergen, Norway, 18 July 2005 - 20 July 2005.
My Recent Publications related to the English Department (ZORA)
-
Digital Dickens: An automated content analysis of Charles Dickens’ novels In: Buschfeld, Sarah; Ronan, Patricia; Neumaier, Theresa; Wellinghoff, Andreas; Westermayer, Lisa . Crossing Boundaries through Corpora: Innovative corpus approaches within and beyond linguistics. Amsterdam: John Benjamins Publishing, 62-98.
-
Automatically detecting directives with SPICE Ireland In: Schweinberger, Martin; Ronan, Patricia . Socio-Pragmatic Variation in Ireland: Using Pragmatic Variation to Construct Social Identities. Berlin: De Gruyter, 205-234.
-
Text Analytics for Corpus Linguistics and Digital Humanities: Simple R Scripts and Tools London: Bloomsbury Academic.
-
The Visualisation and Evaluation of Semantic and Conceptual Maps In: Laitinen, Mikko; Tyrkkö, Jukka . Linguistics across Disciplinary Borders: The March of Data. London: Bloomsbury Publishing, 67-94.
-
Investigating child language acquisition from a joint perspective: A comparison of traditional and new L1 speakers of English In: Schmalz, Mirjam; Vida-Mannl, Manuela; Buschfeld, Sarah . Acquisition and Variation in World Englishes: Bridging Paradigms and Rethinking Approaches. Berlin: De Gruyter, 133-157.
-
Colloquialisation, compression and democratisation in British parliamentary debates In: Korhonen, Minna; Kotze, Haidee; Tyrkkö, Jukka . Exploring Language and Society with Big Data: Parliamentary discourse across time and space. Amsterdam: John Benjamins Publishing, 336-372.
-
Differences in syntactic annotation affect retrieval International Journal of Corpus Linguistics, 28(3):378-406.
-
Detecting and Analysing Learner Difficulties Using a Learner Corpus Without Error Tagging In: Harrington, Kieran; Ronan, Patricia . Demystifying Corpus Linguistics for English Language Teaching. Cham: Palgrave Macmillan, 229-257.
-
Replicable semi-supervised approaches to state-of-the-art stance detection of tweets Information Processing & Management, 60(2):103199.
-
Assessing How Attitudes to Migration in Social Media Complement Public Attitudes Found in Opinion Surveys SPELL: Swiss Papers in English Language and Literature, 41:119-153.
-
Systematically Detecting Patterns of Social, Historical and Linguistic Change: The Framing of Poverty in Times of Poverty Transactions of the Philological Society, 120(3):447-473.
-
Recent changes in spoken British English according to spoken BNC2014 In: Flach, Susanne; Hilpert, Martin . Broadening the spectrum of corpus linguistics: New approaches to variability and change. Amsterdam: John Benjamins Publishing, 173-195.
-
Syntactic changes in verbal clauses and noun phrases from 1500 onwards In: Los, Bettelou; Cowie, Claire; Honeybone, Patrick . English Historical Linguistics: Change in Structure and Meaning. Amsterdam: John Benjamins Publishing, 163-200.
-
Measuring Attitudes to Migration in the Media automatically with Complementary Data Sources and Methods In: Ronan, Patricia; Ziegler, Evelyn . Approaches to Migration and Language Identity. Oxford, Bern, Berlin, Bruxelles, New York, Wien: Peter Lang, 207-252.
-
Medical topics and style from 1500 to 2018 In: Hiltunen, Turo; Taavitsainen, Irma . Corpus pragmatic studies on the history of medical discourse. Amsterdam: Benjamins, 49-78.
-
Comparing data-driven to corpus-based approaches for diachronic variation: document-classification and overuse metrics In: Schlüter, Julia; Schützler, Ole . Data and Methods in Corpus Linguistics: Comparative Approaches. Cambridge: Cambridge University Press, 291-322.
-
With a little help from familiar interlocutors: real-world language use in young and older adults Aging & Mental Health, 25(12):2310-2319.
-
Pluralized non-count nouns across Englishes: a corpus-linguistic approach to dialect typology Corpus Linguistics and Linguistic Theory, 16(3):515-546.
-
Linear and Non-Linear Age Trajectories of Language Use: A Laboratory Observation Study of Couples' Conflict Conversations Journals of Gerontology, Series B: Psychological Sciences and Social Sciences, 75(9):e206-e214.
-
Changes in society and language: charting poverty In: Rautinaho, Paula; Nurmi, Arja; Klemola, Juhani . Corpora and the changing society: studies in the evolution of English. Amsterdam: John Benjamins Publishing, 29-56.
-
Using Multilingual Resources to Evaluate CEFRLex for Learner Applications In: 12th Conference on Language Resources and Evaluation (LREC 2020), Marseille, 11 May 2020 - 16 May 2020. European Language Resources Association, 346-355.
-
Spelling normalisation of Late Modern English: comparison and combination of VARD and character-based statistical machine translation In: Kytö, Merja; Smitterberg, Eric . Late Modern English: novel encounters. Amsterdam: John Benjamins Publishing, 243-268.
-
A Man who Was Just an Incredible Man, an Incredible Man: Age Factors and Coherence in Donald Trump’s Spontaneous Speech In: Schneider, Ulrike; Eitelmann, Matthias . Linguistic Inquiries into Donald Trump’s Language : From ‘Fake News’ to ‘Tremendous Success’. London: Bloomsbury, 62-84.
-
Statistics for Linguists: A patient, slow-paced introduction to statistics and to the programming language R Zurich: Digitale Lehre und Forschung UZH.
-
Enhancing the linguistic discovery potential of historical corpora: a twin-track approach using ARCHER In: CL 2019 International Corpus Linguistics Conference, Cardiff, Wales, UK, 22 July 2019 - 26 July 2019, Gossip Theme.
-
Topics of eighteenth-century medical writing with triangulation of methods: LMEMT and the underlying reality In: Taavitsainen, Irma; Hiltunen, Turo . Late Modern English medical texts: writing medicine in the eighteenth century (Including the LMEMT Corpus). Amsterdam: John Benjamins Publishing, 31-74.
-
Scholastic argumentation in Early English medical writing and its afterlife: new corpus evidence In: Suhr, Carla; Nevalainen, Terttu; Taavitsainen, Irma . From data to evidence in English language research. Leiden: Brill, 191-221.
-
Statistical MWE-aware parsing In: Parmentier, Yannick; Waszczuk, Jakub . Representation and parsing of multiword expressions: current trends. Berlin: Language Science Press, 147-182.
-
NLP Corpus Observatory – Looking for Constellations in Parallel Corpora to Improve Learners’ Collocational Skills In: 7th Workshop on NLP for Computer Assisted Language Learning at SLTC 2018 (NLP4CALL 2018), Stockholm, 7 November 2018 - 7 November 2018, 69-78.
-
Detecting innovations in a parsed corpus of learner English In: Deshors, Sandra C.; Götz, Sandra; Laporte, Samantha . Rethinking linguistic creativity in non-native Englishes. Amsterdam: John Benjamins Publishing, 47-74.
-
Differences between Swiss High German and German German via data-driven methods In: SwissText 2018: 3rd Swiss Text Analytics Conference, Winterthur, 12 June 2018 - 13 June 2018.
-
Differences between Swiss High German and German High German via data-driven methods In: 3rd Swiss Text Analytics Conference (SwissText 2018), Winterthur, Switzerland, 12 June 2018 - 13 June 2018. CEUR-WS, 17-25.
-
From Lexical Bundles to Surprisal and Language Models: measuring the idiom principle on native and learner language In: Kopaczyk, Joanna; Tyrkkö, Jukka . Applications of Pattern-driven Methods in Corpus Linguistics. Amsterdam: Benjamins, 15-56.
-
Tools and Methods for Processing and Visualizing Large Corpora Studies in Variation, Contacts and Change in English, 19:online.
-
Measuring Encoding Efficiency in Swedish and English Language Learner Speech Production In: Interspeech 2017, Stockholm, 19 August 2017 - 24 August 2017. ISCA, 1779-1783.
-
Saying Whatever It Takes: Creating and Analyzing Corpora from US Presidential Debate Transcripts In: Corpus Linguistics Conference 2017, Birmingham, 25 July 2017 - 28 July 2017, 537-544.
-
Comparing Rule-based and SMT-based Spelling Normalisation for English Historical Texts In: Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language, Gothenburg, 22 May 2017 - 22 May 2017, 40-46.
-
Introduction - The New Energy Crisis : Climate, Economics and Geopolitics In: Timofeeva, Olga; Gardner, Anne-Christine; Honkapohja, Alpo; Chevalier, Sarah . New Approaches in English Linguistics : Building Bridges. Amsterdam: Springer, 1-12.
-
Part-Of-Speech in Historical Corpora: Tagger Evaluation and Ensemble Systems on ARCHER In: KONVENS 2016, Bochum, 19 September 2016 - 21 September 2016, RUB.
-
Introduction - New Approaches to English Linguistics : Building bridges In: Timofeeva, Olga; Gardner, Anne-Christine; Honkapohja, Alpo; Chevalier, Sarah . New Approaches to English Linguistics : Building bridges. Amsterdam: John Benjamins Publishing, 1-12.
-
Detecting innovations in a parsed corpus of learner English International Journal of Learner Corpus Research, 2(2):177-204.
-
Review of Automatic Treatment of Learner Corpus Data, Ana Diaz Negrillo, Nicolas Ballier and Paul Thompson, eds. (2013) International Journal of Learner Corpus Research, (1):172-177.
-
Parsing early and late modern English corpora Literary and Linguistic Computing, 30(3):423-439.
-
Determining light verb constructions in contemporary British and Irish English International Journal of Corpus Linguistics, 20(3):326-354.
-
Of-genitive versus s-genitive: A corpus-based analysis of possessive constructions in 20th-century English In: Bennett, Paul; Durrell, Martin; Scheible, Silke; Whitt, Richard J . New Methods in Historical Corpora. Tübingen: Narr Verlag, 163-180.
-
Investigating Irish English With ICE-Ireland Cahiers de l'institut de linguistique et des sciences du langage, 38(2013):137-162.
-
Discovering new verb-preposition combinations in New Englishes Studies in Variation, Contacts and Change in English, 13:online.
-
Using semantic resources to improve a syntactic dependency parser In: LREC 2012 Conference Workshop "Semantic Relations II", Istanbul, Turkey, 22 May 2012 - 22 May 2012, 67-76.
-
Dependency bank In: LREC 2012 Conference Workshop "Challenges in the Management of Large Corpora", Istanbul, Turkey, 22 May 2012 - 22 May 2012, 23-28.
-
"Off with their heads". Profiling TAM in ICE corpora In: Hundt, Marianne; Gut, Ulrike . Mapping Unity and Diversity World-Wide. Corpus-Based Studies of New Englishes. Amsterdam: John Benjamins, 1-34.
-
Retrieving relatives from historical data Literary and Linguistic Computing, 27(1):3-16.
-
Semantic corpus trawling: Expressions of “courtesy” and “politeness” in the Helsinki Corpus In: Suhr, Carla; Taavitsainen, Irma . Developing Corpus Methodology for Historical Pragmatics. Helsinki: Research Unit for Variation, Contacts and Change in English, 1.
-
Adapting a parser to historical English Helsinki: University of Helsinki.
-
BNC Dependency Bank 1.0 In: Oksefjell, Signe; Ebeling, Jarle; Hasselgard, Hilde . Aspects of corpus linguistics: compilation, annotation, analysis. Helsinki: Research Unit for Variation, Contacts, and Change in English, online.
-
Relative complexity in scientific discourse English Language and Linguistics, 16(2):209-240.
-
Using automatically parsed corpora to discover lexico-grammatical features of English varieties In: 30th International Conference on Lexis and Grammar, Nicosia, Cyprus, 5 October 2011 - 8 October 2011, 251-258.
-
Detection of interaction articles and experimental methods in biomedical literature BMC Bioinformatics, 12(Suppl 8):S13.
-
Text-Mining-Methoden im Semantic Web Wirtschaftsinformatik und Management, 3:28-35.
-
A large-scale investigation of verb-attached prepositional phrases Helsinki: University of Helsinki.
-
A data-driven approach to alternations based on protein-protein interactions In: III Congreso Internacional de Lingüística de Corpus, Valencia, Spain, 7 April 2011 - 9 April 2011, 597-607.
-
OntoGene (Team 65): preliminary analysis of participation in BioCreative III In: BioCreative III workshop, Bethesda, Maryland, 13 September 2010 - 15 September 2010.
-
OntoGene in BioCreative II.5 IEEE/ACM Transactions on Computational Biology and Bioinformatics, 7(3):472-480.
-
Text Mining Methoden im Semantic Web HMD Praxis der Wirtschaftsinformatik, (271):35-46.
-
Using a parser as a heuristic tool for the description of New Englishes In: The Fifth Corpus Linguistics Conference, Liverpool, UK, 20 July 2009 - 23 July 2009, online.
-
Multi-verbal expressions of ‘giving’ in Old English and Old Irish In: Corpus Linguistics Conference, Liverpool, UK, 20 July 2009 - 23 July 2009, 116.
-
UZurich in the BioNLP 2009 Shared Task In: BioNLP 2009 Companion Volume: Shared Task on Event Extraction, NAACL/HLT, Boulder, Colorado, 4 June 2009 - 5 June 2009, 28-36.
-
Detecting protein-protein interactions in biomedical texts using a parser and linguistic resources In: Gelbukh, Alexander . Computational Linguistics and Intelligent Text Processing. Berlin: Springer, 406-417.
-
A New Hybrid Dependency Parser for German In: Chiarcos, Christian; de Castilho, Richard Eckart; Stede, Manfred . Von der Form zur Bedeutung: Texte automatisch verarbeiten / From Form to Meaning: Processing Texts Automatically. Proceedings of the Biennial GSCL Conference 2009. Tübingen: Narr, 115-124.
-
Parser-based analysis of syntax-lexis interactions In: Jucker, Andreas H; Schreier, Daniel; Hundt, Marianne . Corpora: Pragmatics and Discourse. Amsterdam, The Netherlands: Rodopi, 477-502.
-
Detecting Protein-Protein Interactions in Biomedical Literature Using a Parser In: Clematide, Simon; Klenner, Manfred; Volk, Martin . Searching Answers. Münster: MV Verlag, 109-118.
-
Fishing for compliments: precision and recall in corpus-linguistic compliment research In: Jucker, Andreas H; Taavitsainen, Irma . Speech acts in the history of English. Amsterdam: John Benjamins, 273-294.
-
A Broad-Coverage, Representationally Minimalist LFG Parser: Chunks and F-Structures Are Enough In: LFG05, Bergen, Norway, 18 July 2005 - 20 July 2005.
Research Interests
My research interests include
- Natural Language Processing (NLP)
- Corpus Linguistics
- Robust Fast Broad-Coverage Parsing
- Dependency Grammar
- Text Mining, Information Extraction
- Semantic Web
- Information Retrieval
- BioMedical Parsing Applications
- Automated Media Content Analysis
- Formal Grammar
My interests also include UNIX and Mac OS X system administration, Prolog and Perl programming, desktop publishing, travelling, literature, jogging and cycling. I have taught Prolog, theoretical computer science, and Semantic Web at Fernfachhochschule Schweiz (Swiss distance learning UAS). I have taught Prolog and Perl at the CL department of the University of Geneva.
Dependency Grammar and Robust Parsing
I have written a low-complexity, broad-coverage probabilistic Dependency Parser for English, Pro3Gres, as part of my doctoral thesis.
I wrote my Master's Paper on Dependency Grammar and the partly dependency-based Link Grammar. I am currently developing Pro3Gres: a robust, probabilistic parser for a Dependency Grammar. I taught Dependency Grammar Parsing in winter 2003/2004 and winter 2005/2006, and Parsing Technology in winter 2006/2007/2014.
Corpus Linguistics
Both the English Seminar and the Department of Computational Linguistics have a long tradition in Corpus Linguistics research. I am a member of the ARCHER consortium. At the English Department, I am involved in the compilation of several corpora and in providing web interface access to them. In summer 2003, I taught a seminar on Corpus Linguistics. In summer 2006, I taught a colloquium on Corpus Linguistics. In spring 2008, I taught a lecture on Corpus Linguistics, together with Fabio Rinaldi. In spring 2008, I taught the workshop at the ICAME conference, together with Hans Martin Lehmann and Nelleke Oostdijk. In autumn 2012, I taught a BA seminar on Corpus Linguistics.
BioMedical Parsing and Relation Finding
Our research on an important application of my high-precision robust parser started in 2005 and ran as an NFS project from 2008 to 2013: OntoGene, relation finding in the biomedical domain.
Automated Media Content Analysis
We use parsing and Opinion Mining in Automated Media Content Analysis projects. I am the leader of subproject I.6 in the Swiss NCCR Democracy project and part of the scientific network of the European ERC project POLCON.
Information Retrieval
From 2000 to 2004, I worked on an unsupervised text classification project at the CL department of the University of Geneva.
Question Answering
From 1999 to 2000, I worked on the ExtrAns Project in Zurich.
Formal Grammars
Since the winter term 1999/2000, I have occasionally taught the syntax course of the Zurich CL curriculum. We focus on GB, LFG and HPSG.