Over the last few years, we’ve started to interact with computers like Captain Picard on the bridge of the Enterprise. Machines capable of reliably discerning and making sense of speech, and then responding, were once sci-fi. Thanks to voice interfaces such as Siri, Alexa and Google, talking to computers is becoming commonplace.
What today seems unremarkable belies the technical challenge of analyzing spoken communication. The tech industry’s collective investment in research and development amounts to billions of dollars and millions of human work hours. Despite this, voice analytics software still makes plenty of mistakes. Background noise, the nearby voices of others, and even accented speech frequently hamper accurate recognition of what was said. Such errors are mildly irritating on a smartphone, but when audio data is analyzed to mitigate risk in regulated industries, mistakes can be very costly indeed.
Digital Reasoning owes its reputation to the accuracy of our text analytics and the quality of the behavioral insights and process improvements that our solutions deliver. This combination of meaningful alerts and efficient workflows has led to us becoming the leading provider of communications analytics for conduct risk among global banks. It was therefore critical that our new audio analytics capability could deliver comparable performance to our text analytics.
Banking’s noisy voice data
The quickest way to add voice surveillance would be to adapt an existing technology. The standard of open-source audio analytics software is high, but our evaluation revealed that even the best available options were insufficiently accurate for the noisy, domain-specific voice data that is common among our banking customers. The only way forward was to develop proprietary software.
Without getting into the technical details (which you can read about here), our approach leveraged our experience in ecomms surveillance and the knowledge of our banking partners. We built acoustic analytics models trained using relevant financial domain audio data, with additional model customization carried out for each new deployment.
Over years of work with banks, and through regulatory audits, we have proven that analyzing communications in context is vital for reducing false positives and building an accurate picture of risk. Applying this contextual methodology consistently across voice and text data has enabled us to package an integrated communications surveillance solution. This gives confidence that voice analytics can hit the regulatory targets set for text, and it gives compliance analysts a single workflow that avoids information silos and allows surveillance to be expanded to enterprise-scale populations.
Where voice surveillance is heading
Early feedback from our initial customer deployments suggests that our investment is producing the desired results. Banks tell us that our transcription accuracy outperforms rival solutions. Perhaps more importantly for analysts tasked with reviewing alerts, we’re also scoring highly for the quality of our workflows and helpful investigation tools.
A review of regulations – from Dodd-Frank through to MAR and MiFID II – shows that regulators do not regard voice communication as any less important than text communication. Historically, lower levels of enforcement have reflected regulators being realistic about what can be achieved through manual sampling or basic phonetic technology. Our prediction is that the regulatory demands of voice surveillance will follow a pattern similar to the one we have seen with text. Improvements in audio analytics technology will raise regulators’ expectations of what surveillance organizations must achieve. Fortunately for banks using Digital Reasoning, this adaptation will be relatively quick and easy.