SpeedScriber provides automatic speech-to-text transcripts for audio files using the power of machine learning and the Internet.
I’ve been skeptical of automatic transcription – and still am, for that matter – when creating finished transcripts of programs. But I’ve found myself using SpeedScriber more and more recently and want to share what I’ve learned.
NOTE: I’ve used SpeedScriber for two recent articles: Reasons for Hope and The Biggest News from CES 2019.
SpeedScriber is a stand-alone, easy-to-use program that quickly takes audio files (either stand-alone or synced with video) and creates text transcripts from the content.
The transcriptions are not perfect; punctuation and proper names most often need correction. But the speed and flexibility of this system provide a valuable service when you need text now and it doesn’t need to be perfectly accurate.
NOTE: SpeedScriber cautions that it “is designed for professional content creators who are transcribing audio or video files with well-recorded audio and destined for editing or distribution. It is not designed for transcription of meetings, lectures or interviews recorded with phones or voice recorders.”
The program itself is free; charges are based on the number of minutes of transcribed audio. Prices range from 37 to 50 US cents per minute, depending on the amount of time purchased.
Developer: Digital Heaven, LTD
Mac App Store: id1101502006
Price: Free, charges apply for transcription time
One of the reasons I like SpeedScriber is that most of my editing these days is audio-only, for my weekly podcast Digital Production Buzz. Unlike programs such as SimonSays, Transcriptive, or Lumberjack System, which are tightly integrated into video editing software, SpeedScriber works really well with individual audio files, as well as video.
Before transcribing, I edit the audio in Adobe Audition (or any other digital audio editor) to clean it up: I make sure levels are loud enough, background noise is reduced and, in general, the clip sounds as good as I can make it.
NOTE: Automatic transcription software gets really confused by low levels, multiple people talking at once, background noise or anything else that makes it hard to hear the principal speaker. You can use lower-grade audio, but the quality of the transcription will suffer.
Once the audio is prepped, I export just the section I want to transcribe as a high-quality WAV or AIF file. If you are working with video, simply drag the entire video clip into SpeedScriber. This will allow you to watch and listen to the video clip during review and editing; there’s no reason to separate the audio from video.
SECRET TRICK: Behind the scenes, SpeedScriber converts whatever audio you import into a compressed mono audio file (separating it from the video, if necessary). This allows fast uploading with sufficient quality for transcription. You never upload the original file. Cool.
The interface is divided into two sections:
To add a file – audio or video – simply drag it into the left panel. Here, you see the file I used for the Mark Harrison interview. Click a file name to select it and, at the bottom of the panel, you will see a waveform of the file itself. This allows you to play it to make sure you added the correct version. (You can add as many files as you want – the most I’ve batched at one time was ten.)
NOTE: As you import a file, the software also asks how many speakers the clip contains. This helps the Cloud service recognize different voices. The default setting is two.
When you’ve added all the files you want to transcribe, click the Transcribe button at the top.
SpeedScriber bills transcription time in whole minutes and always rounds up for each file. A confirmation message shows the total so you know how much time you are paying for.
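Because rounding happens per file, batching many short clips costs more than their combined running time suggests. Here is a rough Python sketch of that arithmetic. It assumes only what is described above (each file rounded up to a whole minute, a per-minute rate somewhere between 37 and 50 US cents); the function names and clip lengths are hypothetical, and this is not SpeedScriber’s actual billing code.

```python
import math

def billable_minutes(durations_seconds):
    """Round each file's length up to the next whole minute (rounding is per file)."""
    return [math.ceil(seconds / 60) for seconds in durations_seconds]

def estimate_cost_cents(durations_seconds, rate_cents_per_minute):
    """Return (total billable minutes, total cost in US cents).

    rate_cents_per_minute is a hypothetical input, e.g. 37 to 50 cents.
    Working in whole cents avoids floating-point rounding surprises.
    """
    minutes = sum(billable_minutes(durations_seconds))
    return minutes, minutes * rate_cents_per_minute

# Three hypothetical clips: 4:10, 1:30 and exactly 1:00
clips = [250, 90, 60]
print(billable_minutes(clips))          # [5, 2, 1]
print(estimate_cost_cents(clips, 37))   # (8, 296), i.e. $2.96
print(math.ceil(sum(clips) / 60))       # 7, what one combined file would bill
```

In this example, the three clips total just under seven minutes of audio, but per-file rounding bills eight.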
Click Transcribe and the process starts.
The actual transcription is done by one of several web services. Amazon, IBM, Microsoft and Google are leaders in the industry, but there are others. SpeedScriber preps the file, then sends it to one of these Cloud services. When the transcript is complete, SpeedScriber displays the text and allows you to edit it.
Because developers consider their specific selection of Cloud services a competitive advantage, they don’t disclose which service they use. Some, I’ve been told, use more than one. I don’t know which service SpeedScriber uses.
The actual transcription happens faster than real time. My four-minute clip took about 20 seconds to transcribe, roughly 12 times faster than real time.
When the transcript is complete, the file appears in the right panel with its Status pop-up menu set to Transcribed.
You can use the Search box at the top to search for a file, or the pop-up menu to filter files based on their status in the production process.
Double-click a file to display it for editing.
As long as the local version of the audio file is not deleted or moved, you can press the spacebar to play the clip and watch as the transcript highlights each word in turn. This allows you to correct errors, add punctuation, remove “ums” and “you knows,” and, in general, clean things up.
NOTE: With automatic transcription, you will ALWAYS need to clean things up. The biggest weakness I’ve seen so far is punctuation; the system does not reliably recognize commas or question marks.
You can display a transcript without accessing the audio file, but you can only read it, not hear it. If an audio file gets lost, a dialog lets you relink it; however, opening a transcript without its audio file takes a lot longer because the software spends time trying to find it. A lot longer.
To edit text, double-click it so it turns blue.
The editing panel at the bottom is especially useful. This allows you to use either the keyboard or icons to:
A trick I discovered was to highlight a word in red, then type a comma. This automatically adds a comma at the end of a word. Or type a period to add a period at the end of the selected word and capitalize the next word. These typing shortcuts changed a laborious editing process into something much quicker. You can even add punctuation while the audio is playing – if your fingers are fast enough.
NOTE: You can also change the playback speed from real-time to 1.5x or 2.0x.
SpeedScriber lets you select multiple words, say to delete them (keyboard shortcut: Cmd+Delete), but you need to be in “red text” mode. When selected text is displayed in red, you can drag across to select multiple words. However, you cannot drag to select words across different speakers. You can also use multiple selected words to change case, hyphenate, or join them together.
Footage courtesy of John Putch, “Route 30, Too!” (www.route30trilogy.com)
This is what the editing window looks like when you transcribe a video clip with synced audio.
Once a transcript is open, you can search for a word or phrase using the panel on the right. Click a search result and SpeedScriber jumps to that text and immediately starts playing the transcript.
NOTE: Personally, I wish there were a preference to jump to a word WITHOUT starting playback. This would make editing text a lot easier.
You can also use the right-hand panel to see and play the waveform, as well as build a custom list of words from the transcript. All these panels are a bit rudimentary, but they leave plenty of room for feature expansion in future versions of the software.
The real power of SpeedScriber shows once the transcript is complete. With the file open for editing, choose File > Export to output the file as:
You can also send the selected file directly to Final Cut Pro X and have the transcript included with the clip. Here, for instance, are iTT captions exported from SpeedScriber and imported into Final Cut Pro X.
Even better, the system retains past transcripts, so if you ever need to re-export a file, it is easy to find and output.
NOTE: However, I found the current version of SpeedScriber (2.2.1) takes a very long time to open an older transcript if the audio file no longer exists locally.
I’ve been spoiled by the high quality of human transcription and, truthfully, automatic transcription can’t beat the work of an experienced transcriptionist.
But for many projects, speed is more important than transcription accuracy. I find SpeedScriber fast and easy to use, and its finished results easy to edit. It doesn’t get in my way and allows me to extract text from audio with a minimum of hassle. When you are working with source files and trying to figure out a rough cut, or simply need to extract an interview for a news story, SpeedScriber is a very useful tool.
I keep finding new ways to use it each week.
6 Responses to Product Review: SpeedScriber from Digital Heaven
Yeah, but when will Premiere offer a product that can do what Avid’s ScriptSync can do? Link the transcript to the timecode and allow editors to do a paper cut with the text file that creates a rough assembly of the clips?
I can upload an audio file to YouTube and Google will transcribe it for free. Why do I need this program?
SpeedScriber is a great stand-alone program. For something tightly integrated into Premiere that delivers functions similar to ScriptSync, look into Transcriptive. https://transcriptive.com
Larry, Have you tried TRANSCRIPTIVE? Any chance you plan to review it soon? I am working on a documentary with lots of interviews and need something that will work easily with Premiere for doing rough edits of many hours of footage.
Transcriptive is a lovely program with excellent integration into Premiere. I have used it and recommend it.
Thank you, Larry. The other service I am looking at, which recently created a Premiere plug-in, is Trint. The plug-in is free, but the transcription per hour is about 3X the cost of Transcriptive (whose plug-in is not free). Trint may be cost effective for those with fewer hours of footage. My main question is how each company’s Premiere plug-in functions, and whether you or anyone has compared the two. I have done a lengthy search and have not found any comparisons online, but they are both relatively new. Do you have experience with Trint as well?
Prior to your note, I had not heard of Trint, so, no, I don’t have a comparison.
Keep in mind that virtually all automated text transcriptions use one of three Cloud-based engines. (Transcriptive just added the option to use their own engine as well.) Which means what you should REALLY look at is the interface.