Institut für Computerlinguistik Phonetik

Forensic Phonetic Speaker Identification based on Temporal Evidence

Project lead: Prof. Dr. Volker Dellwo


Everyday experience tells us that it is typically possible to identify a speaker solely on the basis of his or her voice (e.g. when someone starts a phone call with a simple 'hi' or when people talk in a different room). Such observations reveal that speakers carry individual features in their voices by which they can be identified to a considerable degree. The present project studies the role of temporal characteristics of the speech signal in speaker identification. It pays particular attention to possible applications of the results in forensic phonetics, the field in which phonetic knowledge is applied in legal cases where the identity of the speaker in a recording is disputed.

We start from the observation that the acoustic speech signal is made up of dynamic processes resulting from the movements of the articulators. It has been demonstrated in other scientific domains that humans can be identified on the basis of their movements alone, e.g. by the way they walk. Our working hypothesis is that the movements of the organs of speech (e.g. jaw, lips or tongue) can be as idiosyncratic as human gait and that idiosyncratic ways of moving the organs of speech leave individual temporal characteristics in the acoustic speech signal. We will therefore study numerous durational parameters in speech, ranging from segment durations (e.g. the durations of consonants and vowels) through syllable and word durations to prosodic durations (e.g. durational characteristics of intonation).

In the first year of the project we aim to identify the temporal measures of speech that are most speaker-idiosyncratic. In the second year we will test these measures against within-speaker variability (e.g. different types of voice disguise). In year three we will use behavioral experimental methods to test whether the measures identified as most speaker-idiosyncratic are perceptually salient (i.e. whether listeners can identify a speaker solely on the basis of certain temporal voice characteristics). It is quite possible that some temporal speaker-idiosyncratic features will prove perceptually salient and others will not. We argue that the salient temporal features will help us explain how human listeners identify speakers on the basis of their voice. Non-salient features, however, may be less prone to within-speaker variability such as voice disguise, as they should be difficult for speakers to control. Such features may thus be of high value for acoustic voice identification of non-cooperative speakers (i.e. speakers not wishing to be identified), who are typically found under forensic circumstances.

Publications:

  • Dellwo, V., Fourcin, A., and Abberton, E. (2007) Rhythmical classification based on voice parameters. International Congress of Phonetic Sciences (ICPhS) [Online]
  • Dellwo, V., Huckvale, M., and Ashby, M. (2007) How is individuality expressed in voice? An introduction to speech production & description for speaker classification. In Mueller, C.
  • Dellwo, V. and Koreman, J. (2008) How speaker idiosyncratic is acoustically measurable speech rhythm? Abstract presented at the annual IAFPA meeting 2008, Lausanne/Switzerland.
  • Dellwo, V., Ramyead, S., and Dankovicova, J. (2009) The influence of voice disguise on temporal characteristics of speech. Abstract presented at the annual IAFPA meeting 2009, Cambridge/UK.
  • Fourcin, A. and Dellwo, V. (2009) Rhythmic classification of languages based on voice timing. UCL Eprints, London, UK (

Keywords:

forensic phonetics, time-domain, temporal parameters, speech prosody, speech rhythm, speech timing

Funding source(s):

Universität Zürich (position pursuing an academic career), SNF (funding of individuals and projects)

Duration of Project:

Aug 2010 to Aug 2014