Data-driven Features of Linguistic Variation

Student: N.N.

Supervisor: Gerold Schneider

Introduction

We use corpora of different English varieties, for example the ICE corpora, to describe differences between regional varieties such as Indian English and Hong Kong English.
Specialised corpora also allow us to detect changes in English over the past few decades or centuries.
We have used data-driven approaches to measure how widespread selected phenomena are and which lexical items they involve, for example passive constructions (Hundt et al. 2018) and mass nouns (Schneider et al. accepted). We have also detected high-level patterns semi-automatically (Schneider submitted). In this project, you will either zoom in on a specific phenomenon or aim to detect patterns automatically. A keen interest in data science and in diachronic or variationist linguistics is required.
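
To give a flavour of what a data-driven comparison can look like, here is a minimal sketch in Python. It is not the method used in the studies cited above. It assumes that feature counts per variety have already been extracted, and it ranks features by the log-likelihood (G2) statistic, one measure commonly used in corpus linguistics for frequency comparison. The variety names, feature names and counts are invented placeholders, and the function names are hypothetical.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """G2 statistic comparing a feature's frequency in corpus A and corpus B.

    freq_* are raw counts of the feature, size_* are corpus sizes in tokens.
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected counts if the feature were equally frequent in both corpora.
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def per_million(freq, size):
    """Relative frequency per million tokens."""
    return freq / size * 1_000_000


def rank_features(counts, variety_a, variety_b):
    """Return (feature, G2) pairs, most strongly differing feature first."""
    a, b = counts[variety_a], counts[variety_b]
    scored = []
    for feature in sorted(set(a["features"]) | set(b["features"])):
        g2 = log_likelihood(
            a["features"].get(feature, 0), a["tokens"],
            b["features"].get(feature, 0), b["tokens"],
        )
        scored.append((feature, g2))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented toy counts for illustration only, not real corpus data.
toy_counts = {
    "variety_A": {"tokens": 1_000_000,
                  "features": {"be_passive": 9000, "get_passive": 300}},
    "variety_B": {"tokens": 1_000_000,
                  "features": {"be_passive": 8000, "get_passive": 600}},
}

ranked = rank_features(toy_counts, "variety_A", "variety_B")
for feature, g2 in ranked:
    print(f"{feature}: G2 = {g2:.2f}")
```

Ranking by G2 rather than by raw frequency difference keeps a frequent feature with a small relative shift from automatically outranking a rarer feature with a large one. A real study would also need to consider dispersion across texts and effect size.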