Should automatic speech-to-text transcriptions be trusted?

Transcriptions, subtitles and machine translations

For some years now, artificial intelligence has made it possible to transcribe speech into text fairly reliably. This is called ASR (Automatic Speech Recognition), or simply "speech-to-text".

Automatic recognition saves a considerable amount of time compared to manual input: processing one hour of media by hand, such as a conference recording, takes about 8 hours of very tiring work.

The quality of the transcript is better if the spoken language is specified, if there is no background noise, and if the speech is clearly intelligible, without strong accents or overlapping voices. Even then, an automatic transcription almost always needs manual correction for spelling, proper names, unusual brand names or acronyms, and especially punctuation.

With the "parrot" or "respeaking" technique, a writer repeats everything he or she hears in the recording, articulating clearly and dictating the punctuation, which provides "clean" audio to the recognition software. The software will then make fewer mistakes… but it will still make mistakes.

Differences between transcription and subtitles

Automatic recognition can produce both a transcript (plain text) and subtitles (structured text with time codes).

A transcript is metadata attached to a media file. It meets accessibility needs by rendering the audio track as text for hearing-impaired users, and it also serves full-text search and SEO. Being plain text, it can be read without playing the media (audio or video).

Subtitles, by contrast, appear only when the media is played, and they are synchronized with it. They can be turned on to display over the player; the viewer can choose a language among those available or turn subtitles off entirely.
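For illustration, here is what a subtitle cue looks like in the common SubRip (.srt) format — the content is invented, but the structure is standard. Each cue carries a number, a start/end time code, and the text to display:

```
1
00:00:01,000 --> 00:00:03,500
Welcome to this conference.

2
00:00:03,600 --> 00:00:06,200
Today we will talk about accessibility.
```

The corresponding transcript would contain only the text, with no numbering or timing: "Welcome to this conference. Today we will talk about accessibility."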

Subtitle translation

Machine translation of an automatic transcript adds a layer of error to an unreliable source.

One can easily submit a subtitle file to a translation AI (Google, Azure, DeepL, or even ChatGPT), either via an online interface (copy and paste) or via its API. In general, the formatting and time codes are ignored, the text is translated, and the formatting is then put back in place. The result keeps exactly the same timing and synchronization in all languages, even though the translated texts may have very different lengths. Above all, the meaning can be strongly distorted if the original punctuation was incorrect, or if each fragment of a sentence is translated without considering the sentence as a whole.
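This extract-translate-reinsert mechanism can be sketched in a few lines. The snippet below is a minimal illustration, assuming SubRip (.srt) input and a placeholder `translate` function standing in for any translation backend; it is not the code any particular service actually runs. Note that it translates each cue in isolation, which is precisely why sentence-level meaning can be lost:

```python
def translate_srt(srt_text, translate):
    """Translate an SRT file cue by cue, keeping the original time codes.

    `translate` is a placeholder for any translation backend (an API
    call, for instance). Each cue's text is translated on its own, so
    a sentence split across two cues loses its context.
    """
    out = []
    for block in srt_text.strip().split("\n\n"):
        lines = block.split("\n")
        # lines[0] = cue number, lines[1] = time codes, rest = text
        index, timing, text = lines[0], lines[1], "\n".join(lines[2:])
        out.append("\n".join([index, timing, translate(text)]))
    return "\n\n".join(out)
```

Because only the text field is replaced, the output file has exactly the same cue numbers and time codes as the input, whatever the length of the translated text.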

To translate subtitles as efficiently as possible, it is therefore advisable to proceed as follows:

  • automatically produce subtitles in the media language
  • correct the subtitles
  • generate a transcript of the corrected subtitles
  • translate the transcript with an AI
  • synchronize the transcript with the media to generate the new subtitle file
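The third step — generating a transcript from corrected subtitles — amounts to stripping the cue numbers and time codes and joining the text. A minimal sketch, again assuming SubRip (.srt) input:

```python
def srt_to_transcript(srt_text):
    """Strip cue numbers and time codes from an SRT file,
    keeping only the spoken text as one plain-text string."""
    parts = []
    for block in srt_text.strip().split("\n\n"):
        lines = block.split("\n")
        # lines[0] = cue number, lines[1] = time codes, rest = text
        parts.append(" ".join(lines[2:]))
    return " ".join(parts)
```

The resulting plain text can then be translated as whole sentences, which is what lets the translator respect sentence boundaries instead of cue boundaries.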

There are many free subtitle synchronization tools. We recommend Media Subtitler, which has a very convenient interface and allows manual subtitling in real time (one minute of subtitled video generated in one minute).

On Streamlike

You can automatically generate a transcript and/or subtitles from any media by indicating the spoken language. For media longer than 5 minutes, transcription requires subscribing to a specific option.

With the subtitle editor, you can correct the subtitles manually and generate a corrected transcript, free of any formatting.

The corrected transcript can then be machine-translated and manually synchronized to produce translated subtitles.

Alternatively, the corrected subtitles can be submitted directly to a translator, which saves a lot of time, but errors in meaning and timing are to be expected.

Example of automatic transcription

We have chosen a scene from "Les Tontons flingueurs" to show what the AI understands from the dialogue in French, and what it should have understood. Select the "IA" language (French for "AI", shown by default) to see what was understood, "FR" to see the corrected version, and "EN" or "DE" to see the English and German translations of the corrected version. We only corrected the texts, sometimes adding what was not caught, but we did not adjust the subtitles' appearance times or display durations. So it could be done (much) better.

Our recommendations

In conclusion, automatic transcription saves time, but manual corrections are almost always necessary.

Except in ideal cases where errors will be few, we do not recommend publishing automatic subtitles without correction.

Streamlike’s subtitle editor is sufficient for simple corrections, but for extensive corrections we recommend correcting the automatic transcript (which is very fast) and recreating the subtitles by manual synchronization.

Once the subtitles are correct, translating them directly can give good results. Translating the transcript and re-synchronizing it will give a perfect result.

Do not hesitate to contact us for a precise study of your needs.
