Crowdsourcing the pain of transcribing audio

The trouble with recording interviews is that you have to transcribe them. So after one of my forays to New Haven last week, where I interviewed people in connection with a book I’m working on about community news sites, I had a ton of audio and the unpleasant task of translating it all to text.

I decided to crowdsource the task through an Amazon.com service called Mechanical Turk. More about that in a moment. But first I want to explain my reluctance to try it.

I think the results are better when I do it myself. I have to listen carefully, which helps me seal the best stuff inside my leaky brain. I know what we were talking about, which means that I’m not flummoxed by names and unusual phrases, as any transcriber would be. And because I have an idea of how I’ll use the material, I can decide on the spot what to transcribe verbatim, what to paraphrase and what to leave out altogether. So I knew I could potentially be giving up a lot by turning the task over to others.

Some years ago I used a transcription service near Harvard Square when time was of the essence and when, most important, someone else was paying the bill. This time, faced with many hours of work, I decided to take advice given me last fall by Zach Seward and try MTurk. Seward, then with the Nieman Journalism Lab, told me that lab director Joshua Benton had used it to transcribe this talk by New York University’s Clay Shirky. I was impressed.

I posted a query on Twitter, and several people responded by sending me a link to an online guide by Andy Baio. I decided to try it with two interviews — a 65-minute recording with New Haven Independent founder and editor Paul Bass, made on his reasonably quiet back deck, and a 35-minute conversation with New Haven alderman Michael Jones, at an outdoor café on a busy street.

My first step was to go through the cumbersome process of converting my Olympus recorder’s WMA files to MP3s, and then dividing those MP3s into five-minute chunks so that a number of different people could apply themselves to the task. By the time I got around to doing the second interview, I had stumbled upon EasyWMA, a $10 utility that took the pain out of conversion, and had finally taught myself enough about Audacity, a free audio editor, so that I could painlessly produce five-minute bits.
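The chunking step can also be worked out in a few lines of Python. This is only a sketch of the arithmetic, not what was done here (the splitting was done by hand in Audacity). The function name `chunk_boundaries` is hypothetical, and it assumes you already know each recording's length in seconds.

```python
def chunk_boundaries(total_seconds, chunk_seconds=300):
    """Return (start, end) offsets in seconds that cover the whole recording.

    chunk_seconds defaults to 300, i.e. the five-minute pieces described above.
    """
    if total_seconds <= 0 or chunk_seconds <= 0:
        raise ValueError("durations must be positive")
    boundaries = []
    start = 0
    while start < total_seconds:
        # The final chunk is cut short if the recording does not
        # end exactly on a chunk boundary.
        end = min(start + chunk_seconds, total_seconds)
        boundaries.append((start, end))
        start = end
    return boundaries


# Nominal lengths of the two interviews, in seconds (assumed to be exact).
bass = chunk_boundaries(65 * 60)
jones = chunk_boundaries(35 * 60)
print(len(bass), len(jones))
```

Each pair of offsets marks where one piece would be cut in an audio editor. Real recordings rarely end on a five-minute mark, so the last piece is usually shorter than the rest.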

I was surprised by how quickly the crowd swarmed over my files — in less than a day, I had everything I needed. Unfortunately, the quality was extremely uneven. Some of the mistakes were bizarre or unintentionally hilarious. How “state of Connecticut” became “state of Kentuckian” is one I’ll never figure out. And here’s a choice excerpt from my conversation with Bass. First, the MTurk version:

They had a Sunocompass call with WBR few weeks ago to get the advice, how the membership strives. The taste and ever didn’t undership strives because I felt that if the widely suceessful they might get thirty to fourty thousand dollars.

Now, what he really said:

They had us on a conference call with WBUR few weeks ago to get advice on how to do membership drives. In the past I hadn’t done membership drives, because I felt that if they’re wildly successful they might get you to $30,000 or $40,000.

Following Baio’s advice, I’d set a price of $2 per five-minute excerpt. You have the option of rejecting unusually bad work, refusing to pay and letting someone else take a crack at it. I decided to accept everyone’s work, including the person who produced what you see above. But I blocked two people (including the one I just cited), so that if I use the service again, they won’t have a chance to work on my stuff.

Overall, I paid $41.80*, $3.80 of which went to Amazon, the remainder to the folks who actually did the work.

Between file conversion and preparation, downloading transcribed interviews, listening to everything again and cleaning up the transcripts, I don’t know how much time I saved. Not much, probably. Yesterday I transcribed two interviews myself, and I thought the results were much better.

On the other hand, I purposely chose my Bass interview for MTurk because it was long and he talks very quickly. It was also an unusually substantive conversation, and I knew there wasn’t much I wanted to leave out. Most of the transcribers did an OK job.

My bottom line is that, in the future, I would probably reserve MTurk for situations in which I have good audio quality and need a full verbatim transcript. Even knowing that I’ll have to do a fair amount of retyping, it’s still better than starting from scratch.

But if I’m producing normal interview notes, I’ll handle it myself.

*Addendum: Jack Shafer of Slate told me the price I cited doesn’t mean much without comparing it to the price of a professional transcription service. So I contacted a good one and was told it would cost about $140 an hour, or about $230 for my 100 minutes of audio, nearly six times as much as what I paid. That’s a huge mark-up. On the other hand, the results would have been more usable.
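The comparison is easy to check. The sketch below uses only the figures quoted in this post and assumes the two interviews ran exactly 65 and 35 minutes; the variable names are made up for illustration. Dollar amounts are kept in cents to avoid floating-point rounding noise.

```python
# Figures quoted in the post, in cents.
paid_cents = 4180          # total paid through MTurk
amazon_fee_cents = 380     # portion that went to Amazon
pro_rate_cents_per_hour = 14000  # quoted professional rate, about $140 an hour

# Interview lengths in minutes (assumed to be exact).
audio_minutes = 65 + 35

# What reached the people who did the transcribing.
worker_payout_cents = paid_cents - amazon_fee_cents

# What the professional service would have charged for the same audio.
pro_cost_cents = pro_rate_cents_per_hour * audio_minutes / 60

# How many times more expensive the professional service is.
ratio = pro_cost_cents / paid_cents

print(f"Workers received: ${worker_payout_cents / 100:.2f}")
print(f"Professional estimate: ${pro_cost_cents / 100:.2f}")
print(f"Professional / MTurk: {ratio:.1f}x")
```

The professional estimate comes out a little over $230, which is where the "nearly six times" figure comes from.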

Illustration via Wikimedia Commons.



7 thoughts on “Crowdsourcing the pain of transcribing audio”

  1. Dan:
    While I’m all for the DIY approach to crowd sourcing transcription, you might do a simple test by giving a couple of the more problematic 5 minute clips to Casting Words (castingwords.com) for an entirely different quality experience. That will only be a test, however, since Casting Words takes the file whole, saving most of your prep time right there.

    Really worth a close look.

    Thanks for being such a discerning presence amidst an industry in disruption.

    Chris

    1. @Chris: I’ll be heading back to New Haven soon enough, so perhaps I will. Thanks.

  2. I have the suspicion that the particularly bad example you showed was produced by machine translation software — in other words, someone cheating. No human writes “The taste and ever didn’t undership.”

    – Jonathan

    1. @Jonathan: I hadn’t thought of that, but I’m sure you’re right. Makes perfect sense.

  3. I work as an editor, and among the pieces we work on are videos of an hour or more. In my line of work, it’s essential to check the transcript, word for word, against the video as a separate step no matter who transcribes the piece. I’ve worked as a transcriptionist in the deep, dark past, so I’m sometimes tempted to just do it myself, but over the years I’ve realized how much better the final product is when I’m not doing the production and the verification back to back, especially with high volume. Plus finding their errors is always good for a laugh or a good old flush of righteous indignation.

  4. Back in the day, right after the earth had cooled, I had to transcribe several lengthy interviews that were the primary research for my dissertation. No Mac back then. I typed the transcripts. Muchos tedious, but I certainly knew exactly what was in the interviews. As I wrote my dissertation I almost always could quickly find the quote I was looking for. Bottom line: transcribing it yourself takes more time upfront, but it's worth it in the long run.

  5. I’ve tried to transcribe my audio files myself as well…it turned out to be a long waste of time. If the length of your audio files is short, there are many transcription companies that will transcribe up to 5 minutes for free just to test out their services before you submit your full-length audio files. Transcription-Express.net was pretty reliable for me.
