Crowdsourcing the pain of transcribing audio

The trouble with recording interviews is that you have to transcribe them. So after one of my forays to New Haven last week, where I interviewed people in connection with a book I’m working on about community news sites, I had a ton of audio and the unpleasant task of translating it all to text.

I decided to crowdsource the task through an Amazon.com service called Mechanical Turk. More about that in a moment. But first I want to explain my reluctance to try it.

I think the results are better when I do it myself. I have to listen carefully, which helps me seal the best stuff inside my leaky brain. I know what we were talking about, which means that I’m not flummoxed by names and unusual phrases, as any transcriber would be. And because I have an idea of how I’ll use the material, I can decide on the spot what to transcribe verbatim, what to paraphrase and what to leave out altogether. So I knew I could potentially be giving up a lot by turning the task over to others.

Some years ago I used a transcription service near Harvard Square when time was of the essence and when, most important, someone else was paying the bill. This time, faced with many hours of work, I decided to take advice given me last fall by Zach Seward and try MTurk. Seward, then with the Nieman Journalism Lab, told me that lab director Joshua Benton had used it to transcribe this talk by New York University’s Clay Shirky. I was impressed.

I posted a query on Twitter, and several people responded by sending me a link to an online guide by Andy Baio. I decided to try it with two interviews — a 65-minute recording with New Haven Independent founder and editor Paul Bass, made on his reasonably quiet back deck, and a 35-minute conversation with New Haven alderman Michael Jones, at an outdoor café on a busy street.

My first step was to go through the cumbersome process of converting my Olympus recorder’s WMA files to MP3s, and then dividing those MP3s into five-minute chunks so that a number of different people could apply themselves to the task. By the time I got around to doing the second interview, I had stumbled upon EasyWMA, a $10 utility that took the pain out of conversion, and had finally taught myself enough about Audacity, a free audio editor, so that I could painlessly produce five-minute bits.
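For anyone who would rather script the chunking step, here is a minimal sketch of the arithmetic involved. It is not what I did (I used EasyWMA and Audacity). It assumes the free command-line tool ffmpeg as a stand-in, and the file name `bass.wma` and the output naming scheme are made up for illustration. The code only builds the commands and does not run them.

```python
from pathlib import Path


def chunk_bounds(total_seconds, chunk_seconds=300):
    """Return (start, duration) pairs, in seconds, that cover a recording.

    The last chunk is shorter when the length is not a multiple of chunk_seconds.
    """
    if total_seconds <= 0 or chunk_seconds <= 0:
        raise ValueError("lengths must be positive")
    bounds = []
    start = 0
    while start < total_seconds:
        duration = min(chunk_seconds, total_seconds - start)
        bounds.append((start, duration))
        start += chunk_seconds
    return bounds


def ffmpeg_commands(source, total_seconds, chunk_seconds=300):
    """Build one ffmpeg argument list per chunk (assumed tool, hypothetical file names)."""
    stem = Path(source).stem
    commands = []
    for index, (start, duration) in enumerate(
        chunk_bounds(total_seconds, chunk_seconds), start=1
    ):
        output = f"{stem}_{index:02d}.mp3"
        commands.append([
            "ffmpeg", "-i", source,        # input recording (WMA in my case)
            "-ss", str(start),             # where this chunk starts
            "-t", str(duration),           # how long this chunk runs
            "-codec:a", "libmp3lame",      # encode the audio as MP3
            output,
        ])
    return commands


if __name__ == "__main__":
    # A 65-minute interview cut into five-minute pieces.
    for command in ffmpeg_commands("bass.wma", 65 * 60):
        print(" ".join(command))
```

Each printed line could be pasted into a terminal, or passed to `subprocess.run`, on a machine that has ffmpeg installed with MP3 support.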

I was surprised by how quickly the crowd swarmed over my files — in less than a day, I had everything I needed. Unfortunately, the quality was extremely uneven. Some of the mistakes were bizarre or unintentionally hilarious. How “state of Connecticut” became “state of Kentuckian” is one I’ll never figure out. And here’s a choice excerpt from my conversation with Bass. First, the MTurk version:

They had a Sunocompass call with WBR few weeks ago to get the advice, how the membership strives. The taste and ever didn’t undership strives because I felt that if the widely suceessful they might get thirty to fourty thousand dollars.

Now, what he really said:

They had us on a conference call with WBUR few weeks ago to get advice on how to do membership drives. In the past I hadn’t done membership drives, because I felt that if they’re wildly successful they might get you to $30,000 or $40,000.

Following Baio’s advice, I’d set a price of $2 per five-minute excerpt. You have the option of rejecting unusually bad work, refusing to pay and letting someone else take a crack at it. I decided to accept everyone’s work, including the person who produced what you see above. But I blocked two people (including the one I just cited), so that if I use the service again, they won’t have a chance to work on my stuff.

Overall, I paid $41.80*, $3.80 of which went to Amazon, the remainder to the folks who actually did the work.

Between file conversion and preparation, downloading transcribed interviews, listening to everything again and cleaning up the transcripts, I don’t know how much time I saved. Not much, probably. Yesterday I transcribed two interviews myself, and I thought the results were much better.

On the other hand, I purposely chose my Bass interview for MTurk because it was long and he talks very quickly. It was also an unusually substantive conversation, and I knew there wasn’t much I wanted to leave out. Most of the transcribers did an OK job.

My bottom line is that, in the future, I would probably reserve MTurk for situations in which I have good audio quality and need a full verbatim transcript. Even knowing that I’ll have to do a fair amount of retyping, it’s still better than starting from scratch.

But if I’m producing normal interview notes, I’ll handle it myself.

*Addendum: Jack Shafer of Slate told me the price I cited doesn’t mean much without comparing it to the price of a professional transcription service. So I contacted a good one and was told it would cost about $140 per hour of audio, or about $230 for my 100 minutes of recordings, roughly five and a half times as much as what I paid. That’s a huge mark-up. On the other hand, the results would have been more usable.
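The comparison can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted in this post, and it assumes the service's hourly rate applies to the length of the audio (65 minutes plus 35 minutes). The function and variable names are mine.

```python
def compare_costs(audio_minutes, pro_rate_per_hour, mturk_total, amazon_fee):
    """Compare a quoted professional rate with what an MTurk job cost."""
    pro_cost = pro_rate_per_hour * audio_minutes / 60
    return {
        "pro_cost": pro_cost,                     # estimated professional bill
        "worker_pay": mturk_total - amazon_fee,   # what reached the transcribers
        "ratio": pro_cost / mturk_total,          # professional cost vs. MTurk cost
    }


if __name__ == "__main__":
    result = compare_costs(
        audio_minutes=65 + 35,    # the two interviews
        pro_rate_per_hour=140.0,  # quoted rate, assumed to be per hour of audio
        mturk_total=41.80,
        amazon_fee=3.80,
    )
    print(f"Professional estimate: ${result['pro_cost']:.2f}")
    print(f"Paid to workers: ${result['worker_pay']:.2f}")
    print(f"Professional / MTurk: {result['ratio']:.1f}x")
```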

Illustration via Wikimedia Commons.