Written by: Misha Tenenbaum, CEO, EditStock
Foreword from Larry Jordan
I’ve known Misha for years and when he asked to write a product review I was immediately interested. I first learned about SpeedScriber late last year and am delighted to publish Misha’s review.
– – –
Misha’s Product Review Disclaimer
I learned about SpeedScriber’s beta test through a friend at the Los Angeles Creative Pro User Group. I have not been paid to write this review, though my first 30 minutes of SpeedScriber transcription was free, as it is with all beta testers. After that, my company EditStock paid the regular retail price to continue transcribing. I was not asked to write this review by SpeedScriber and I have no financial interest in the success of SpeedScriber.
This review discusses a product currently in beta test. The final version may have features not covered in this review.
– – –
One of the most common feature requests we receive at EditStock is for transcripts to go along with the interviews in our documentary projects. In the past our customers complained that while all our short films included professionally lined scripts, our documentaries had zero paperwork.
Historically, making transcripts meant having a person sit in front of a TV monitor and laboriously type each spoken word of a video into software that ties those words to the video clip’s timecode. Transcription took time and cost money. Transcription services are generally expensive because companies like mine commission outside vendors to do the work for us. We then hire our own operators to check that work.
This all changed for us when a friend recommended I beta test SpeedScriber, a simple, automatic transcription tool that can be mastered in about an hour. In this article I’ll describe how it works and what its costs are compared to traditional transcription methods. My knowledge is based on transcribing several projects, beginning with “Built By Life,” a documentary about the Canadian farmer Vince McIntyre.
SpeedScriber consists of two parts: a downloadable app where the transcription work is done, and an online accounts page where you can buy transcription by the minute. You pay by the minute because the transcript is created for you automatically. The transcript is timecode stamped to match your video. Currently the app transcribes English only and is Mac only.
Beta testers must request access to the service. Release is scheduled for March.
LARRY ADDS: My understanding is that the automated transcription is provided by the IBM Watson supercomputer. I know of two other automated transcription applications in development. Digital Heaven appears to have the one that is closest to release.
The application supports US, UK and Australian accents and can also export transcripts as SRT files for closed captions.
Developer: Digital Heaven
Cost: The beta application is currently free; transcription costs $0.37 to $0.50 per minute, depending on volume
The software component is a standalone app. When you open the app you’ll see two main windows. In the left window you’ll add your video or audio-only clips. Adding a clip doesn’t start the transcription, and you won’t be charged yet, though it might feel a bit scary.
The left window is a queue where you can select the language being spoken (currently only English) and assign an estimated number of speakers in the clip to help the software with its speech-to-text detection.
When your queue is ready, click the Transcribe button. In my tests the app transcribed, on average, at 4x real-time speed (a 4-minute interview takes about 1 minute) with better than 95% accuracy. I went from zero experience with the app to full competency in less than an hour, though admittedly I needed to read an article or two to find specific features.
What SpeedScriber is good at:
What SpeedScriber is not good at:
In the two dozen examples we submitted for transcription, correcting the transcript took about one-and-a-half times real time. In other words, an hour long clip can be checked and corrected in 90 minutes by a human.
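For planning purposes, the two ratios from my tests can be turned into a rough turnaround estimate. This is only a sketch: the 4x transcription speed and the 1.5x correction time are what we observed on our own clips, not figures promised by SpeedScriber, and the function name is made up for illustration.

```python
# Rough turnaround estimate using the ratios observed in this review.
# Both constants are our own test results, not guarantees from SpeedScriber.

TRANSCRIBE_SPEEDUP = 4.0  # the app ran at about 4x real time
REVIEW_FACTOR = 1.5       # human correction took about 1.5x real time


def turnaround_minutes(clip_minutes: float) -> dict:
    """Return estimated machine, human, and total minutes for one clip."""
    machine = clip_minutes / TRANSCRIBE_SPEEDUP
    human = clip_minutes * REVIEW_FACTOR
    return {"machine": machine, "human": human, "total": machine + human}


if __name__ == "__main__":
    # A one-hour interview: about 15 minutes of automatic transcription
    # plus about 90 minutes of human checking.
    print(turnaround_minutes(60))
```

Your own numbers will vary with audio quality and how many speakers are off mic.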
THE FIRST TRANSCRIPTION
Double click the transcription you want to edit in the right window.
The interface will change to show your transcript on the left and your video on the right. You’ll see lots of punctuation and speaker mistakes right away. We learned that these can be changed very easily.
Change the name of the speaker by clicking on their name, which might be “speaker 1” as in our example. Enter a new name.
Change who is speaking in the transcript by left-clicking on the speaker’s tag. I right-clicked about 100 times before figuring this out.
Use the space bar to play the video. You’ll see the words in the transcript light up. They change from light grey (meaning not reviewed) to red (meaning the active word) to black (meaning reviewed). Press the space bar again to pause. Use the control bar at the bottom of your transcript to fix common punctuation and capitalization errors. All these buttons have keyboard shortcuts which are easy to pick up. For example, to add a period the keyboard shortcut is, well, the period key. It’s the same for adding or removing a comma.
To change a word (and this tripped me up) you need to press the Return key on your keyboard. There is no button on the control bar for this. The word you want to change will turn blue. In the example below I changed “patch. Kids” to “hatchets.”
Most of the mistakes it made were in punctuation and in confusing words like “know” and “no” or “eyes” and “knives”. It was poor at determining who was speaking, especially if they were off mic, but this can all be adjusted as I’ll address in a moment. What I learned from this experience is that humans basically suck at speaking. We speak in fragments and run-on sentences, not in well-formulated thoughts.
Once you’re done, export from the file menu. We exported to both PDF and Text file to include in our products. We could also have gone directly into our NLE with the “send to FCPx” feature (works with FCPx 10.3 or newer), FCPx XML export, or Avid Script export. We could also export an SRT file for closed-captioning.
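If you have never opened an SRT file, it is just plain text: a cue number, a start and end time, and the caption text, separated by blank lines. The sketch below builds one from a list of timed lines. It is not how SpeedScriber produces its export, which I have not seen; the function names and the sample captions are invented for illustration.

```python
# Minimal sketch of the SRT caption layout.
# NOT SpeedScriber's exporter; names and sample text are hypothetical.


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT time code HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues) -> str:
    """Turn (start_seconds, end_seconds, text) tuples into SRT text."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    sample = [
        (0.0, 2.5, "This is a made-up caption."),
        (2.5, 5.0, "So is this one."),
    ]
    print(to_srt(sample))
```

Knowing the layout makes it easier to spot-check an exported caption file in a text editor before sending it on.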
Later, if you want to come back and make another change to the transcript you must have your media connected. The software will not allow you to work if the media is “offline.” However, you do not need to be “online” in terms of the internet to see your transcripts. If your media is offline when you double click on the transcript a relink dialog box will appear.
IMPORTANT NOTE: Sometimes SpeedScriber will skip dialog that is off mic, even though I’m sure the software can hear it. It never skipped Vince, though it would skip some of his “um’s” and “ah’s”, which is actually quite nice. Generally the transcription is much worse if the character is off mic. This is important because it means that SpeedScriber should not be used as a tool to figure out what someone says, but rather only to save time transcribing what we already know they said. The software struggles with mumbling.
We paid $30 per hour of video transcribed, or $0.50 per minute. We also needed to put some man hours into checking and adjusting the transcription. Our real cost was about $90 per hour of finished video ($30 SpeedScriber + $60 human to check). If you’re transcribing a lot, the price per minute gets down to $0.37.
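The arithmetic behind that $90 figure can be sketched as follows. One assumption to flag: I am backing out a reviewer rate of $40 per hour from the $60 checking cost and the 90 minutes of correction per hour of video; plug in your own wage. The per-minute rates are the two ends of the quoted price range, and the function name is made up for illustration.

```python
# Sketch of the cost per hour of finished video.
# ASSUMPTION: the $40/hour reviewer wage is inferred from $60 for
# 1.5 hours of checking; it is not a figure quoted by SpeedScriber.


def cost_per_finished_hour(
    rate_per_minute: float = 0.50,       # price paid per transcribed minute
    review_factor: float = 1.5,          # human checking takes 1.5x real time
    reviewer_hourly_wage: float = 40.0,  # assumed in-house rate
) -> float:
    """Return total dollars for one hour of transcribed and checked video."""
    transcription = rate_per_minute * 60
    review = review_factor * reviewer_hourly_wage
    return transcription + review


if __name__ == "__main__":
    # At the rate we paid ($0.50 per minute).
    print(round(cost_per_finished_hour(), 2))
    # At the lowest quoted volume rate ($0.37 per minute).
    print(round(cost_per_finished_hour(rate_per_minute=0.37), 2))
```

Note that the human checking, not the transcription itself, is the larger share of the cost at either rate.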
Since the software is easy to use, we can assign transcriptions to employees who already work in-house.
As a business, EditStock is always balancing customer feature requests with added development cost per project. We were on the fence about adding transcription services before, but now we’ve decided to go for it. Based on what we learned here, we will be adding transcriptions to all our projects in the near future.