TQE: Transcription Quality Evaluation

Project team

Helmer Strik            CLST, RU, Nijmegen
Eric Sanders            CLST, RU, Nijmegen
Robin Rutten            CLST, RU, Nijmegen
Joost van Doremalen     CLST, RU, Nijmegen
Robin Oostrum           CLST, RU, Nijmegen
Daan Broeder            MPI
Remco van Veenendaal    TST-C
Laura van Eerten        TST-C


Background

TQE (Transcription Quality Evaluation) makes it possible to evaluate the quality of transcriptions automatically. Users upload pairs of files, each pair consisting of an audio file and its phone transcription (PT). Each pair is then processed as follows:

  • the audio signal and the phonetic transcription are aligned,
  • segment boundaries are derived for each phone, and
  • for each segment-phone combination, a TQE measure (a confidence measure) is determined: a number from 0 to 100% indicating how well the phone fits the segment, i.e. what the quality of the phone transcription is (see, e.g., Figures 1 and 2).

The higher the number, the better the fit is. The output of the TQE tool consists of a TQE measure and the segment boundaries for each phone in the corpus.
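The output described above can be pictured as one record per phone. The following is a minimal sketch under stated assumptions, not the tool's actual output format: the `PhoneSegment` type, its field names, the example values, and the threshold of 50 are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhoneSegment:
    """One phone with its segment boundaries and TQE measure (hypothetical layout)."""
    phone: str    # phone symbol taken from the transcription
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds
    score: float  # TQE measure, 0-100; higher means a better fit


def low_scoring(segments, threshold=50.0):
    """Return the segments whose TQE measure falls below the threshold."""
    return [seg for seg in segments if seg.score < threshold]


# Invented values, for illustration only.
segments = [
    PhoneSegment("k", 0.00, 0.08, 92.0),
    PhoneSegment("r", 0.08, 0.15, 39.0),
    PhoneSegment("a", 0.15, 0.31, 88.0),
]

for seg in low_scoring(segments):
    print(f"{seg.phone}: {seg.start:.2f}-{seg.end:.2f} s, score {seg.score:.0f}")
```

A suitable threshold would depend on the corpus and on the purpose of the check, so it is left as a parameter here.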

Goal

The TQE tool thus makes it possible to find segments, or sequences of segments, for which the phone symbols match the audio signal poorly. In other words, the TQE tool can be used to check the quality of phonetic transcriptions. This is useful for validating (manual) phonetic transcriptions, and also for comparing and selecting (‘competing’) transcriptions, e.g. to study pronunciation variation.
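Both uses can be sketched in a few lines once per-phone scores are available. This is an illustrative sketch only: the function names, the threshold of 50, and the choice of the mean score as the selection criterion are assumptions, not part of the TQE tool.

```python
from itertools import groupby
from statistics import mean


def low_score_runs(scores, threshold=50.0):
    """Return (first_index, last_index) for each run of consecutive low scores."""
    runs = []
    index = 0
    for is_low, group in groupby(scores, key=lambda s: s < threshold):
        length = len(list(group))
        if is_low:
            runs.append((index, index + length - 1))
        index += length
    return runs


def best_transcription(candidates):
    """Pick the candidate whose per-phone scores have the highest mean.

    `candidates` maps a label to the list of per-phone scores obtained
    for that transcription of the same audio (assumed criterion: mean).
    """
    return max(candidates, key=lambda label: mean(candidates[label]))


# Invented scores, for illustration only.
print(low_score_runs([90, 85, 40, 30, 88, 20]))
print(best_transcription({"variant_a": [90, 85, 88], "variant_b": [60, 39, 70]}))
```

Other criteria, such as the minimum score or the number of phones below a threshold, could replace the mean when a single badly matching phone should count more heavily.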

Examples

To give a better idea of how TQE works, two examples are provided in Figures 1 and 2. The audio signal is the same in both figures. However, Figure 1 uses the correct transcription, while in Figure 2 we deliberately replaced it with an incorrect one. The TQE scores in Figure 2 are much lower, because the transcription symbols do not match the audio well. Note also that the score for the second transcription symbol drops from 90 to 39.

The reason is that the different phone transcription sequence in Figure 2 also yields a different segmentation: part of the /r/ segment in Figure 2 contains part of the vowel, so the match is worse and the TQE score is lower.
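The effect of a shifted boundary can be made concrete by measuring how much of one segment falls inside each phone of another segmentation of the same audio. The sketch below is a hypothetical illustration: the helper names and the boundary times are invented and are not read from Figures 1 and 2.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Duration in seconds shared by two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def coverage(segment, reference):
    """Fraction of `segment` covered by each phone of a reference segmentation.

    `segment` is a (phone, start, end) tuple; `reference` is a list of such tuples.
    """
    _, start, end = segment
    duration = end - start
    shares = {}
    for phone, ref_start, ref_end in reference:
        shared = overlap(start, end, ref_start, ref_end)
        if shared > 0:
            shares[phone] = shares.get(phone, 0.0) + shared / duration
    return shares


# Invented boundaries: a reference segmentation and an /r/ segment that has
# shifted so that half of it now lies inside the following vowel.
reference = [("r", 0.0, 0.25), ("a", 0.25, 0.75)]
shifted_r = ("r", 0.125, 0.375)

print(coverage(shifted_r, reference))
```

In this invented case half of the shifted /r/ segment lies over vowel audio, which is the kind of mismatch that lowers the score of an individual phone.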

Figure 1. TQE scores for a correct transcription
Figure 2. TQE scores for an incorrect transcription

Conclusion

TQE is useful for validating, obtaining, and selecting phone transcriptions, and for detecting phone strings (e.g. words) with deviating pronunciation. More generally, it can be applied in any research involving audio and PTs, in various (sub-)fields of the humanities and of language and speech technology (L&ST).

Link         Description
PID          The TQE PID site
Website      The project's website
Manual       Manual for using the TQE web services
Information  Additional information about TQE

CLARIN Centre

MPI

Project leader

Helmer Strik