Open Master thesis projects

We are happy to see that you are interested in writing your thesis with us! Please note that the list of open thesis projects on this page is not exhaustive. We also encourage you to approach us with your own research idea. You can find out more about our current interests in the publications or research section.

Please get in touch with Lena Jäger in case you are interested in doing your thesis with us.

Mouse-Tracking for Reading (co-supervised by Ethan Wilcox, ETH)

Much of our current knowledge about how people process language is built on data from incremental processing experiments. In these experiments, participants read a sentence one word at a time. The amount of time they spend processing each word can reveal a wealth of information, including what predictions they are making about upcoming text and what syntactic structure they have assigned to the previous words.
Currently, the best way to collect incremental processing data is with eye-tracking experiments. While eye-tracking offers great temporal resolution, it requires participants to come into the lab, which can be expensive and time-consuming. Web-deployable paradigms exist, but they come with their own problems: they often constrain participant input and produce noisier data. This project seeks to build a new experimental paradigm for sentence processing that runs in the browser. The idea is to use mouse cursor tracking as a proxy for gaze, working towards a paradigm that has the same temporal resolution as eye-tracking but is deployable over the internet.

The project is broken down into two distinct stages:
* Stage one will involve developing code to run the novel experimental paradigm using the magpie framework for psychological experiments (https://magpie-manual.netlify.app/), as well as a pipeline for data analysis. Additionally, this stage will involve a first deployment of the new software over the internet to collect initial validation data. (Timeline: ~3-4 months)
* Stage two will involve more careful comparison of the new paradigm to established incremental processing techniques, including eye-tracking, self-paced reading and the Maze task. Additionally, this stage will involve comparison of experimental data to the outputs of neural language models, to ascertain how well data collected from this paradigm matches the statistical predictions of these models. (Timeline: ~4-5 months)
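
As a rough illustration of what the stage-one analysis pipeline might compute, the following Python sketch turns a stream of cursor samples into per-word dwell times. The sample format, the `WordBox` layout and the `dwell_times` helper are hypothetical; they are not part of magpie or of the planned implementation.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    """Horizontal extent of one word on screen, in pixels (hypothetical layout)."""
    word: str
    x_min: float
    x_max: float


def dwell_times(samples, boxes):
    """Return, per word position, the total time (ms) the cursor spent on that word.

    samples: list of (timestamp_ms, x) tuples, sorted by timestamp.
    Each interval between two consecutive samples is credited to the word
    under the cursor at the start of the interval; samples that fall
    outside every box are ignored.
    """
    totals = [0.0] * len(boxes)
    for (t0, x0), (t1, _) in zip(samples, samples[1:]):
        for i, box in enumerate(boxes):
            if box.x_min <= x0 < box.x_max:
                totals[i] += t1 - t0
                break
    return totals


# Toy single-line sentence and cursor trace (made-up values).
boxes = [WordBox("The", 0, 40), WordBox("dog", 40, 80), WordBox("barked", 80, 150)]
samples = [(0, 10), (100, 30), (200, 50), (350, 90), (600, 120), (700, 120)]
print(dwell_times(samples, boxes))  # [200.0, 150.0, 350.0]
```

Indexing the result by word position rather than by word form keeps repeated words in a sentence apart. A real pipeline would also need to handle multi-line text and regressions to earlier words.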

The ideal research assistant would have a background in linguistics or psycholinguistics, as well as previous coding experience. This project will involve working in JavaScript, Python and R. The project will be co-supervised by Ethan Wilcox (https://wilcoxeg.github.io/), a postdoc at the Institute for Machine Learning at ETH.

Comparing human and machine surprisal

Research question: How well do surprisal values extracted from large language models predict different measures of human surprisal in language comprehension (or production)?
Data: Eye-tracking-while-reading data, self-paced reading data
Methods to be used: state-of-the-art language models, hierarchical linear mixed models
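
For orientation, the surprisal of a word is its negative log probability given the preceding context, -log2 p(w_t | w_1 ... w_{t-1}). The sketch below computes it with a toy add-one smoothed bigram model. The bigram model, the toy corpus and the function name are assumptions made for illustration; in the project itself the probabilities would come from a large language model.

```python
import math
from collections import Counter


def bigram_surprisals(corpus, sentence):
    """Return (word, surprisal in bits) pairs for every word after the first.

    Probabilities come from an add-one smoothed bigram model estimated on
    `corpus`; this stands in for a large language model.
    """
    tokens = corpus.split()
    words = sentence.split()
    vocab = set(tokens) | set(words)
    contexts = Counter(tokens[:-1])             # how often each word occurs as a context
    bigrams = Counter(zip(tokens, tokens[1:]))  # how often each word pair occurs
    result = []
    for prev, word in zip(words, words[1:]):
        p = (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))
        result.append((word, -math.log2(p)))
    return result


corpus = "the dog barked the dog slept the cat slept"
for word, bits in bigram_surprisals(corpus, "the cat barked"):
    print(f"{word}: {bits:.2f} bits")
```

In this toy example, "barked" after "cat" never occurs in the corpus and therefore receives a higher surprisal than "cat" after "the". Such per-word values could then serve as predictors of reading times in a mixed model.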

Predicting text comprehension from eye movements in reading

Research question: Predicting text comprehension from eye movements in reading.
Data: Eye-tracking-while-reading data with reading comprehension scores will be provided.
Methods to be used: Machine Learning (neural sequence models or other methods); challenge: combining language modeling of the stimulus text with eye movement data

The impact of cognitive states on sentence processing

Research question: How are the cognitive mechanisms that underlie sentence processing affected by temporary cognitive conditions such as sleepiness or alcohol intoxication?
Data: Eye-tracking-while-reading data will be provided.
Methods to be used: Psycholinguistic statistical analysis of eye-tracking data

Detection of mental and physical fatigue from eye movements in reading

Research question: Detection of mental and physical fatigue from eye movements in reading

Data: Eye-tracking-while-reading data with a range of labels that quantify sleepiness/vigilance will be provided.

Methods to be used: Machine Learning (neural sequence models or other methods); challenge: combining language modeling of the stimulus text with eye movement data

Detection of alcohol intoxication from eye movements in reading

Research question: Detection of alcohol intoxication from eye movements in reading

Data: Eye-tracking-while-reading data labelled with the reader's breath alcohol level will be provided.

Methods to be used: Machine Learning (neural sequence models or other methods); challenge: combining language modeling of the stimulus text with eye movement data

Personality classification from eye movements in reading

Research question: Personality (Big Five) classification from eye movements in reading

Data: Eye-tracking-while-reading data labelled with psychometric personality scores

Methods to be used: Machine Learning (neural sequence models or other methods); challenge: combining language modeling of the stimulus text with eye movement data

Methodological comparison between eye-tracking and self-paced reading for psycholinguistic research

Research question: Methodological comparison between eye-tracking and self-paced reading for psycholinguistic research in sentence processing

Data: Self-paced-reading and eye-tracking data will be provided

Methods to be used: Psycholinguistic statistical data analysis

Investigating the implications of the animacy hierarchy in German

Research question: Investigating the implications of the animacy hierarchy in German

Data: Self-paced reading data to be collected

Methods to be used: Self-paced reading data collection and psycholinguistic statistical data analysis

Predicting programming skills from eye movements in code reading

Research Question: Inference of coding skills / code comprehension level from eye movements in code reading
Data: Eye-tracking data to be collected
Methods to be used: Machine Learning (e.g. neural sequence models)

Deep neural feature representation of eye-tracking data

Research Question: Developing deep neural feature representations of eye-tracking-while-reading data
Data: Various eye-tracking-while-reading data sets will be provided
Methods to be used: Machine Learning (e.g. CNN, sequence models)

Predicting foreign language proficiency from eye movements in reading

Research Question: Inference of foreign language proficiency from eye movements in reading
Data: Eye-tracking-while-reading data sets will be provided
Methods to be used: Machine Learning (e.g. neural sequence models)

Inferring subjective vs objective text difficulty from eye movements in reading

Research Question: Can we estimate objective text difficulty from eye movements in reading using subjective judgements as labels?
Data: https://github.com/ahnchive/SB-SAT/
Methods to be used: Psycholinguistic statistical analysis and/or Machine Learning Methods (e.g. neural sequence models)