Resources

A mostly unordered collection of teaching material, writing and code. Feel free to poke around!

If you use any of the below material, please make sure to cite this source. Thanks!

Recommendations for how to get started with Deep Learning

I wrote an extensive list of papers, books, tutorials, websites, code, etc. that I recommend for getting started with neural networks, NLP and MT.

Educational Material

 

Introduction to Neural Networks

I recently gave a general introduction to feed-forward neural networks. Mostly technical, bare-bones explanations and code, no high-level libraries.

Slide sets | Colab notebooks
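
To give a flavor of the bare-bones approach, here is a minimal sketch of a one-hidden-layer network in plain NumPy, with a forward pass and one hand-derived gradient step. The toy data, shapes and names are my own illustration, not code from the slides or notebooks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 4 examples, 2 input features, 1 target value each
X = rng.normal(size=(4, 2))
y = rng.normal(size=(4, 1))

# Parameters of a 2 -> 3 -> 1 network
W1 = rng.normal(size=(2, 3))
b1 = np.zeros(3)
W2 = rng.normal(size=(3, 1))
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass
h = sigmoid(X @ W1 + b1)            # hidden activations, shape (4, 3)
y_hat = h @ W2 + b2                 # predictions, shape (4, 1)
loss = np.mean((y_hat - y) ** 2)    # mean squared error

# Backward pass, derived by hand
d_yhat = 2 * (y_hat - y) / len(X)     # dLoss / dy_hat
dW2 = h.T @ d_yhat
db2 = d_yhat.sum(axis=0)
d_z1 = (d_yhat @ W2.T) * h * (1 - h)  # back through the sigmoid
dW1 = X.T @ d_z1
db1 = d_z1.sum(axis=0)

# One step of gradient descent
lr = 0.1
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2
print(f"loss before update: {loss:.4f}")
```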

 

Youtube Playlist Gradient-based Learning

I recently started a Youtube channel! There are three videos so far, originally made for my students in a 2019 course. The videos are meant to give an intuition for what gradient-based learning is.

Watch Playlist
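
The core idea in a few lines (my own toy example, not code from the videos): to minimize f(x) = (x - 3)^2, repeatedly step against the gradient f'(x) = 2(x - 3).

```python
def f_prime(x):
    # Gradient of f(x) = (x - 3)**2
    return 2 * (x - 3)

x = 0.0                # arbitrary starting point
learning_rate = 0.1
for step in range(25):
    x -= learning_rate * f_prime(x)   # move downhill

print(round(x, 3))     # close to 3.0, the minimum of f
```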

 

Youtube Playlist Decoding Algorithms

Another playlist, in which I explain decoding algorithms for sequence generation, such as beam search and constrained beam search.

Watch Playlist
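
For a rough idea of plain beam search, here is a toy sketch (my own illustration, not code from the videos): at each step, every unfinished hypothesis is extended with every possible symbol, and only the beam_size highest-scoring hypotheses survive.

```python
import math

def next_probs(prefix):
    # Hypothetical next-symbol distribution; a real model would condition
    # on the prefix, this toy one always returns the same probabilities.
    return {"a": 0.5, "b": 0.3, "</s>": 0.2}

def beam_search(beam_size=2, max_len=4):
    beams = [(0.0, [])]   # each hypothesis: (log probability, symbols)
    for _ in range(max_len):
        candidates = []
        for logp, seq in beams:
            if seq and seq[-1] == "</s>":          # finished: keep as-is
                candidates.append((logp, seq))
                continue
            for sym, p in next_probs(seq).items():
                candidates.append((logp + math.log(p), seq + [sym]))
        # Prune to the beam_size best hypotheses
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_size]
    return beams

for logp, seq in beam_search():
    print(" ".join(seq), round(logp, 2))
```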

 

Introduction to Machine Learning

Selected slide sets and exercises from my introductory course on machine learning. Requirements: high-school math, statistics and basics of Python programming. This specific course was taught together with Phillip Ströbel - thanks, Phillip! A small scikit-learn sketch follows the table below.

  Topic                                               Slide set           Exercises notebook
1 Basic concepts of machine learning                  1.pdf (PDF, 8 MB)   1.ipynb (IPYNB, 239 KB)
2 First classification algorithm: KNN                 2.pdf (PDF, 4 MB)   2.ipynb (IPYNB, 9 KB)
3 First regression algorithm: linear regression       3.pdf (PDF, 1 MB)   3.ipynb (IPYNB, 51 KB)
4 Cross-validation, Hyperparameter Search             4.pdf (PDF, 9 MB)   4.ipynb (IPYNB, 19 KB)
5 Feature Extraction                                  5.pdf (PDF, 2 MB)   5.ipynb (IPYNB, 21 KB)
6 Overview of important classification algorithms     6.pdf (PDF, 2 MB)   6.ipynb (IPYNB, 11 KB)
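
To make two of the topics above concrete, here is a minimal sketch combining KNN (topic 2) with cross-validated hyperparameter search (topic 4) in scikit-learn. The dataset and the parameter grid are my own illustration, not taken from the course notebooks.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 5-fold cross-validated search over the number of neighbours
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```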

 

Introduction to Neural Machine Translation

Selected slide sets and exercises from my introductory course on neural machine translation. Requirements: fundamentals of machine learning, high-school math, statistics and basics of Python programming.

Some of the materials were developed together with Samuel Läubli. A small attention sketch follows the table below.

   Topic                                               Slide set
 1 Introduction                                        1.pdf (PDF, 13 MB)
 2 Evaluation (this slide set by Samuel Läubli)        2.pdf (PDF, 3 MB)
 3 Preprocessing                                       3.pdf (PDF, 3 MB)
 4 Statistical Machine Translation                     4.pdf (PDF, 5 MB)
 5 Linear Algebra, Differential Calculus               5.pdf (PDF, 4 MB)
 6 Linear Models                                       6.pdf (PDF, 3 MB)
 7 Feed-forward Neural Networks                        7.pdf (PDF, 7 MB)
 8 Recurrent Neural Networks                           8.pdf (PDF, 4 MB)
 9 Tensorflow                                          9.pdf (PDF, 4 MB)
10 Encoder-Decoder Models                              10.pdf (PDF, 8 MB)
11 Attention Networks                                  11.pdf (PDF, 5 MB)
12 Decoding (this slide set by Samuel Läubli)          12.pdf (PDF, 663 KB)
13 Current Research / Recent Improvements              13.pdf (PDF, 4 MB)
14 Summary                                             14.pdf (PDF, 4 MB)
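
As a taste of topic 11, here is a minimal sketch of dot-product attention in NumPy; the dimensions and data are my own illustration, not taken from the course materials.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))   # 5 encoder states, hidden size 8
q = rng.normal(size=(8,))     # one decoder query vector

scores = H @ q                            # dot-product scores, shape (5,)
weights = np.exp(scores - scores.max())   # numerically stable softmax
weights /= weights.sum()
context = weights @ H                     # weighted sum of encoder states

print(weights.round(3), context.shape)
```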

Educational NMT Tool "daikon"

Try our educational (= slow, unstable, but insightful) NMT tool, daikon. It's written in Tensorflow, and you will need a GPU to train models. The main authors are Samuel Läubli and me.

Github Repositories

 

Whatsapp Author Identification

Take text messages from your favorite Whatsapp group and train a system that predicts which of your friends wrote a message! One possible setup is sketched below.
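
One plausible way to set this up (my own sketch, not the repository's code): treat each message as a document and its sender as the label, here with character n-gram features.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical exported chat data: (message, sender) pairs
messages = ["see you at 8", "lol no way", "ok sounds good", "haha classic"]
senders = ["anna", "ben", "anna", "ben"]

clf = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
    LogisticRegression(),
)
clf.fit(messages, senders)
print(clf.predict(["lol ok"]))
```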

 

Recipes for Sentence Classification with DyNet

Code that exemplifies neural network solutions for classification tasks with DyNet. On top of that, the code demonstrates how to implement a custom classifier that is compatible with scikit-learn's API, as sketched below.
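
The scikit-learn side of that idea in a nutshell: subclassing BaseEstimator and ClassifierMixin and implementing fit/predict makes a classifier usable in pipelines, grid search and cross-validation. The sketch below uses a trivial stand-in model rather than a DyNet network.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin

class MajorityClassifier(BaseEstimator, ClassifierMixin):
    """Always predicts the most frequent training label."""

    def fit(self, X, y):
        labels, counts = np.unique(y, return_counts=True)
        self.classes_ = labels                   # sklearn convention
        self.majority_ = labels[counts.argmax()]
        return self                              # fit must return self

    def predict(self, X):
        return np.full(len(X), self.majority_)
```

Because it follows the estimator API, things like cross_val_score(MajorityClassifier(), X, y) work out of the box.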

 

RNN Recipes

Forward passes of several flavours of recurrent neural networks in NumPy and Tensorflow.
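
The shape of one such recipe (my own sketch, not the repository's code): the forward pass of a vanilla (Elman) RNN in NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
T, input_size, hidden_size = 6, 4, 8

x = rng.normal(size=(T, input_size))       # input sequence
W_xh = rng.normal(size=(input_size, hidden_size))
W_hh = rng.normal(size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

h = np.zeros(hidden_size)                  # initial hidden state
states = []
for t in range(T):
    # The new state mixes the current input with the previous state
    h = np.tanh(x[t] @ W_xh + h @ W_hh + b_h)
    states.append(h)

print(np.stack(states).shape)              # (6, 8)
```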

 

Moses Scripts Only

A stripped-down version of the Moses repository, with only the scripts for preprocessing that most people still use.

 

Feed-forward neural networks with Numpy

An implementation of feed-forward networks using only NumPy. Thanks Joel for this idea!

 

Daikon Toy Models

Scripts that show how to train and use models with daikon, our educational NMT system (https://github.com/zurichnlp/daikon).

 

Sockeye Toy Models

Scripts that show how to train and use models with Sockeye.

Writing, Talks, Thoughts

 

Guide to Scientific Writing (PDF, 265 KB)

A guide to writing a scientific thesis. Caution and disclaimer: this guide is unfinished, and I will probably never get to work on it again (or write a thesis the way I suggest in it! :-).

 

Crowd-sourcing and English-centric research in NLP (PDF, 1 MB)

Thoughts on the question of whether crowd-sourcing facilitates research in non-English NLP. Presented at the Jožef Stefan Institute, Ljubljana.

 

Report on Feasibility of Stand-off Markup in TEI Documents (PDF, 307 KB)

Technical report on how exactly to organize Text+Berg annotation layers into several XML files.

 

Cost-effectiveness of Games with a Purpose for Collecting NLP Annotations (PDF, 5 MB)

Are games with a purpose a cost-effective method of collecting annotations for NLP research?

 

Treatment of Aphasia with Melodic Intonation Therapy in Tone Languages (PDF, 199 KB)

A seminar paper describing the setup and premise for an experiment that would investigate the merit of melodic intonation therapy to treat aphasia in speakers of tone languages.

 

Acquisition of Negation in English (link coming soon)

Discussing hypotheses about the acquisition of negation by English speakers.

 

Schreien im Labor ("Screaming in the Lab") (PDF, 6 MB)

A summary of our research at the Phonetics Lab, given at a conference for acousticians. In German.

 

Classifying Audience Reactions from Text (PDF, 270 KB)

Using the awesome CORPS corpus of speeches to classify text by audience reaction: for instance, look at a piece of text and try to determine whether the audience laughed after hearing it. The methods in the paper are questionable, but I think the idea itself is valid and still uncharted territory.

 

Typology of Nominal Plural Marking (link coming soon)

Looking at a sample of typologically diverse languages to analyze if and how they mark plural on nominal constructions.