“Hey Siri, what time is it in Leuven?” “Alexa, what is the weather today in Belgium?”

For some of us, those phrases sound like everyday sentences. Siri calls granddad, Alexa predicts rain during football training, and so on. Our children are not really surprised that a device understands what they are saying, and Siri is used to generate numbers to choose playgrounds in Fortnite. On YouTube, we know where to find the closed caption button if a video is not in a familiar language. We might not realise it, but we are all surrounded by innovation based on speech recognition.

Unfortunately, speech recognition is not without its failures:

You: “Siri, shut up my music”
Siri: “That is not very nice”
You: “Siri, make my music quiet”
Siri: “I can’t help you with that”
You: “Turn my music off, goddammit”
Siri: “Playing all songs”

With all that in mind, we started our mission. Based on the Equal Treatment Act and the conviction that good, accessible education is a requirement, we were asked to investigate the subtitling of audiovisual material for the University of Amsterdam and the Amsterdam University of Applied Sciences. Our mission was to find out what is needed to make audiovisual material accessible through transcription, now and in the future. Being used to services like Siri and Alexa, we were hopeful that simply connecting media files to a transcription service would be enough to make all video and audio material accessible.

Unfortunately, we soon found out that recognizing the meaning of a sentence, for instance “Siri, can you play classical music for me”, is absolutely not the same as producing a perfect transcription. During our tests, for example, one transcription service recognized the spoken Dutch word “Obesitas” (obesity) as “op deze tas” (translated: “on this bag”). For us this might be a funny mistake, but a student with a hearing impairment will be thoroughly confused, because the sentence no longer makes sense and the educational value of the video is lost. If that same student has to take an exam based on that video, then mistakes in transcriptions become a more serious problem.

After an intensive research period, and with the help of several transcription services in the field, we came to some interesting conclusions, which I would like to share with you during the Media and Learning conference in June 2019.


Arnout Probst, University of Amsterdam, The Netherlands