Kollaborate: Getting best results from AI transcription

Our speech models are currently optimized for specific use cases and may not suit every situation our customers use Kollaborate for. We plan to train the models to handle a wider range of situations over time, using both our own data and data provided by customers.

You will get the best results from audio with the following characteristics:

  • Minimal background noise
  • No music
  • The speaker uses everyday English words (no complex technical or scientific terms)
  • The speaker's mouth is close to the microphone
  • The speaker is a native English speaker
  • Dialogue is spoken conversationally (not shouted, whispered or overly dramatic)

This means that our system currently works best when transcribing content like podcasts and voiceovers.

If your audio doesn't meet all of these characteristics, that doesn't mean it can't be transcribed; it just means our system may make more mistakes when transcribing it.

The best way to improve the transcription engine's results is to correct any mistakes and then click the Learn button. You are given the option of submitting the entire file (correct transcriptions are still useful for the model) or just the sentences you corrected. Kollaborate then splits the file into a small audio clip for each sentence, pairs each clip with the transcript you provided, and uses those pairs to teach the model what the correct transcription should be for similar audio.
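For illustration only, the clip-splitting step works conceptually like the sketch below. The actual processing happens server-side inside Kollaborate; the pydub library, the function name and the sentence timing inputs are assumptions used purely to show the idea of pairing per-sentence audio clips with their corrected transcripts.

    # Illustrative sketch only -- not Kollaborate's actual implementation.
    # Assumes pydub is installed and that sentence start/end times are known.
    from pydub import AudioSegment
    import uuid

    def split_into_training_clips(audio_path, sentences):
        """sentences: list of (start_ms, end_ms, corrected_text) tuples."""
        audio = AudioSegment.from_file(audio_path)
        clips = []
        for start_ms, end_ms, text in sentences:
            clip = audio[start_ms:end_ms]            # one clip per corrected sentence
            clip_name = f"{uuid.uuid4().hex}.wav"    # random filename, original order not preserved
            clip.export(clip_name, format="wav")
            clips.append({"audio": clip_name, "transcript": text})
        return clips

Each resulting pair of a short audio clip and its corrected transcript is what the model learns from, which is why accurate corrections make such a difference.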

Your privacy is respected during this process. No video is stored, only audio, and each sentence is given a random filename, so it would be difficult for anyone to piece the sentences back together in their original order. We don't share your original audio data with anyone, and once it's inside the model it can't be extracted again as recognizable audio.