Blending Speech Analytics with Traditional Quality Management
“Grooming” a Speech Analytics Solution
Speech Analytics solutions use Artificial Intelligence techniques to analyze a recording. They don’t “listen” to a recording the way humans do; they “parse” it after breaking it down into tiny bits of speech (hence the term Speech Analytics).
Traditional Quality Management is based on manual evaluations. It relies on evaluators, usually supervisors, to listen to calls and score them according to fixed criteria. It provides accurate, measurable analysis of agent performance over time, and can be used to direct coaching efforts. Speech Analytics, which is rapidly becoming a mainstream technology, gives contact centers a way to process 100% of calls and search for sentiment and specific speech patterns.
The Challenge With Speech Analytics
The first major challenge is providing applicable training samples. It can be overcome in various ways; for example, custom models can be developed using previously recorded interactions as training samples. Many newcomers to Speech Analytics don’t realize that an out-of-the-box solution isn’t tuned to their enterprise, and that tuning it will require much work (and expense).
The second major challenge is to make sure that the system is free of noise, so that nothing interferes with the “signal” (phoneme, word, or phrase) that the system is trying to match. Expecting a noise-free telephone recording environment is like expecting an infant to eat baby food without making a mess: it’s not going to happen. Telephones are themselves noisy – narrow-band audio requires high compression, handset and headset microphones are made for robustness and cost more than fidelity, and chances are extremely high that at least one party in a two-party or multi-party call is using a cell phone. And that doesn’t include issues like background noise, stammering, coughing/sneezing, guttural sounds, speakers talking at the same time, or other noise sources.
Accuracy is a function of the engine, to be sure, but more to the point, it’s a function of how well the engine can match a particular audio sample to its library of “learned” samples, given the noise in the sample. You can’t assign an accuracy value to an engine any more than you could assign a speed value to a mountain climber. The answer is always another question – “which mountain?”
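To make the “which mountain?” point concrete, here is a minimal sketch of scoring the same engine separately for each recording condition instead of quoting one overall number. It assumes word error rate (WER) as the accuracy measure, which this article does not itself name; the function names, condition labels, and transcripts are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def wer_by_condition(samples):
    """Average WER per recording condition.

    `samples` is an iterable of (condition, reference, hypothesis) tuples.
    """
    scores = defaultdict(list)
    for condition, reference, hypothesis in samples:
        scores[condition].append(word_error_rate(reference, hypothesis))
    return {c: sum(v) / len(v) for c, v in scores.items()}


# Hypothetical data: one engine, two recording conditions (assumed, not measured).
SAMPLES = [
    ("headset, quiet room",
     "i want to cancel my account", "i want to cancel my account"),
    ("cell phone, street noise",
     "i want to cancel my account", "i want to counsel my count"),
]

if __name__ == "__main__":
    for condition, wer in wer_by_condition(SAMPLES).items():
        print(f"{condition}: WER = {wer:.2f}")
```

With this made-up data the quiet headset call scores a WER of 0.00 and the noisy cell phone call scores 0.33, even though the engine is the same. A single accuracy figure would hide that spread, which is why the result is only meaningful when reported alongside the condition it was measured under.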