Transcription Businesses Have Thrived Despite Arrival of Automated Services and Advancing A.I. Technology
Ellie Leonard, the owner and sole employee of Red Pencil Transcripts in Lolo, Mont., is a case study in occupational resilience, staying ahead of the relentless march of artificial intelligence for now.
Transcription, put simply, is converting human speech to text. And speech recognition is one of the tasks where A.I. technology has made the most rapid progress. Seeing the door that technology has opened, automated transcription start-ups like Trint and Otter.ai have jumped in and are becoming popular.
Human transcribers would seem an endangered species. Yet Ms. Leonard, 37, has thrived, despite the arrival of automated services and advancing A.I. technology.
The work is project-based, so it varies, but she has had growing demand from university researchers, documentary filmmakers, authors, lawyers and journalists. (The New York Times occasionally uses Red Pencil Transcripts’ services.) Last year, Ms. Leonard made more than $40,000, a decent income that, along with her husband’s pay as a railway switchman, helps support their household with four young children.
Her experience points to the continuing limitations in the field of natural language processing: using computers not only to recognize language but also to try to capture its meaning.
Ms. Leonard’s clients — some of whom have tried automated transcription services — say she brings context, background knowledge and a genuine interest in the subjects to her work. She looks up people’s names, place names and acronyms to get them right. Accents, cross talk and background noise do not result in gibberish in her transcripts, as they often do for software transcribers.
There is a lot more involved in producing the kind of flawless transcripts she does than generating words from sounds. “People think that words and language are the same thing, but they’re not,” said Kristian Hammond, a professor of computer science and an artificial intelligence researcher at Northwestern University.
To explain, Mr. Hammond offers an example he uses in class. “Mary Smith is Stanford’s premier roboticist.” Computer classification tools can determine that Mary Smith is a woman and that Stanford is a university. But to a human listener, the sentence also says that she is smart and ambitious, meanings a computer would be unlikely to discern.
Continuing his example, Mr. Hammond tells his students that Mary has put in an 80-hour workweek and drops by a bar in town with friends for a cocktail. Will they serve her? Of course, his students reply. They know, in a way a computer cannot, that she would have to have a Ph.D. and have done postgraduate work, so she must be at least in her mid-30s.
“Humans bring to language all this background knowledge about the world, and it doesn’t even occur to us that we know it,” Mr. Hammond said. “In terms of real computer understanding of language, we’re just scratching the surface.”
But despite the technology’s current limitations, its recent progress and trajectory of improvement suggest that natural language processing will have a major long-term impact on work and jobs. By 2030, up to one-third of Americans might have to switch to new occupations, estimates the McKinsey Global Institute, the research arm of the consulting firm McKinsey & Company.
Helping drive that labor force disruption, experts say, will be the application of artificial intelligence to more and more language-dependent work tasks. It has already begun with things like question-answering chatbots in customer service call centers and applications to automate back-office clerical work.
In the future, think of an Alexa-style assistant that reads documents, picking out key points and summarizing arguments. It scans email, web pages and videos for topics of interest you have told it to look for. It is a digital helper, learning as it goes and steadily taking on more challenging tasks. The technology promises to change many jobs and eliminate some, though how many and how soon remains uncertain.
“Natural language processing is both a gating factor and an enabler, a pivotal capability in so much of work,” said Michael Chui, a partner in the McKinsey research institute.
Perfect accuracy or understanding is not a prerequisite for effectiveness in many tasks. Transcription is a good example. Once the A.I. engines began delivering accuracy of 90 percent or more, automated transcription became a market-ready technology. Today, word accuracy can exceed 95 percent in good recordings of a single speaker talking clearly. The results typically contain gaps and still need to be corrected and broken into paragraphs. But the leading automated systems are fast, inexpensive and getting better.
In Nebraska, the state legislature began last year using the automated service from Trint, a London-based start-up, to help in transcribing its legislative sessions and hearings. The Nebraska legislators meet for a 90-day floor session in odd-numbered years and a 60-day floor session in even-numbered years.
The transcription accuracy of the morning prayer — one person speaking into a microphone — is nearly 100 percent, said Daren Gillespie, a network administrator at the legislature who oversees the adoption of the technology. During hearings, with several different speakers and cross talk, the accuracy “nose-dives,” he said.
The transcription work force has been organized into two groups. The first breaks the A.I.-generated text into logical paragraphs and identifies the speakers. The second group corrects mistakes. Their work, he said, has fundamentally changed: they have gone from being transcribers to being editors.
There are also fewer of them — eight full-time and part-time workers, down from 13 a year ago. “I don’t have a goal to get rid of people’s jobs,” Mr. Gillespie said. “But I’m not going to hold onto government jobs if things can be done efficiently with technology.”
Ms. Leonard in Montana has managed to offer a service the A.I. software cannot match. She knows her clients and often talks to them about what they have recorded and what sections are most important.
For a 10-part documentary on the 100-year history of the Green Bay Packers, released earlier this year, Ms. Leonard transcribed more than 100 interviews, ranging in length from 30 minutes to six hours each, and produced transcripts that were flawlessly clean and comprehensive. “Ellie doesn’t leave blanks in the scripts,” said Bobbie Fredericks, operations manager of the video production team that co-produced the documentary.
Another Red Pencil client is Ted Catton, an associate research professor at the University of Montana, who writes book-length history projects under contract for the National Park Service. His research includes oral histories often filled with acronyms for government programs, American Indian place names and people speaking in regional accents.
“She digs in and looks things up to figure out the unintelligible word or phrase,” Mr. Catton said of Ms. Leonard. “That can be invaluable.”
Ms. Leonard has shunned automated tools, preferring to listen to every word herself so, she said, “I know more about what I’m typing and it’s etched into my brain.”
Her immersive, handcrafted method also enriches the work experience. “I very much enjoy it,” she said. “You learn so much. Some of these projects are like a free college course.”