Probing Tasks for Noised Back-Translation

Student: Nicolas Spring


In NMT, a very common way to use monolingual data in the target language is back-translation: a target-to-source model is trained and then used to generate a pseudo source sentence for each sentence in the target language. Back-translation has a profound impact on translation quality.
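The back-translation step can be sketched as a simple loop over monolingual target data. The `translate_target_to_source` function below is a hypothetical stand-in for a trained reverse NMT model (not part of the project description), included only to make the data flow concrete.

```python
def translate_target_to_source(target_sentence: str) -> str:
    # Stand-in for a real target-to-source NMT model
    # (in practice, a model trained with a toolkit such as Sockeye).
    return " ".join(reversed(target_sentence.split()))

def back_translate(monolingual_target: list[str]) -> list[tuple[str, str]]:
    """Pair each monolingual target sentence with a pseudo source sentence."""
    return [(translate_target_to_source(t), t) for t in monolingual_target]

# The resulting (pseudo source, genuine target) pairs are added to the
# parallel training data of the source-to-target model.
pairs = back_translate(["das ist ein Test", "noch ein Satz"])
```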

Recently, it was found that adding noise to back-translated source sentences improves performance even further. One line of work argues that noised back-translation achieves this by increasing source-side diversity. Others offer a different explanation: they argue that the gains come from indicating to the model that back-translated text is a different kind of data.
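Typical noise operations on the pseudo source side include deleting words, replacing words with a filler token, and locally shuffling word order. The particular noise types and parameters below are illustrative choices, not a recipe fixed by the project description:

```python
import random

def noise(tokens, p_drop=0.1, p_blank=0.1, swap_window=3, seed=0):
    """Apply simple noise to a token sequence.

    Illustrative noise types: word dropout, replacement with a filler
    token, and a local reordering of words.
    """
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        r = rng.random()
        if r < p_drop:
            continue                      # word dropout
        if r < p_drop + p_blank:
            out.append("<BLANK>")         # replace with filler token
        else:
            out.append(tok)
    # local swap: jitter each position by at most swap_window, then sort
    keyed = [(i + rng.uniform(0, swap_window), t) for i, t in enumerate(out)]
    return [t for _, t in sorted(keyed)]

noised = noise("this is a clean source sentence".split())
```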

In this project, you will put the latter view to the test by probing model states: you will train classifiers that take NMT encoder states as input and label them as one of two classes, genuine source sentences or back-translations. The main idea is to test to what extent encoder states contain information that discriminates between the two kinds of input.
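Such a probe can be as simple as a logistic regression classifier over encoder states. The sketch below uses random vectors as a stand-in for real encoder states (its class labels and dimensionality are assumptions for illustration); in the project, the features would come from a trained NMT encoder:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative stand-in: random vectors in place of real NMT encoder states.
rng = np.random.default_rng(0)
genuine = rng.normal(0.0, 1.0, size=(200, 512))   # states for genuine sources
backtr = rng.normal(0.3, 1.0, size=(200, 512))    # states for back-translations

X = np.vstack([genuine, backtr])
y = np.array([0] * 200 + [1] * 200)               # 0 = genuine, 1 = back-translated
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Accuracy well above chance would suggest the encoder states carry
# information that separates the two kinds of input.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = probe.score(X_test, y_test)
```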

See the literature on probing tasks for more information.


Requirements:

  • Python
  • Theoretical knowledge of NMT models
  • Scikit-learn
  • Experience with training machine translation models with a toolkit such as Sockeye