SPEECH TO TEXT
Ready for Primetime?
How Useful Is Speech to Text?
Having your voicemail messages turned into text automatically, so that you can read a message as well as hear it, is a huge convenience. It is also no small technological feat.
Even though computers are incredibly fast and capable of astounding calculations, they are still no match for the human brain. When you hear a message, your brain can sort through background noise and differences in inflection, gender, volume, accent and emotion to understand the message. Your brain will even “understand” words that were left out or unintelligible, simply from the context.
Programming a computer to produce a useful interpretation of a recording is very challenging, and the result is not 100% accurate by any means. Several key factors affect accuracy, including the speaker's volume and enunciation, background noise and the quality of the recording. However, the current technology can score very high on what I call the “Usefulness Index”.
The Usefulness Index
The Usefulness Index addresses 3 questions:
- Identity - Who is calling?
- Purpose - What do they want, or why are they calling?
- Contact – How do I get back in touch with them?
To evaluate the effectiveness of the Speech to Text (STT) process, I submitted for transcription a dozen voicemail messages that had been left for me. They came from a dozen different speakers, with a mix of accents, genders and recording quality.
Each transcribed message earned 1 point for each of the 3 Usefulness Index requirements it met, so each message had a potential score of 3 points.
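The scoring rule can be sketched in a few lines of Python. This is only an illustration of the rule described above: the `TranscriptCheck` structure and its field names are hypothetical, and the example message is made up rather than taken from the test calls.

```python
from dataclasses import dataclass


@dataclass
class TranscriptCheck:
    """A reader's judgment of one transcribed message (hypothetical structure)."""
    identity: bool  # can I tell who is calling?
    purpose: bool   # can I tell what they want or why they are calling?
    contact: bool   # can I tell how to get back in touch?


def usefulness_score(check: TranscriptCheck) -> int:
    """Return 0 to 3: one point for each requirement the transcription meets."""
    return int(check.identity) + int(check.purpose) + int(check.contact)


# Made-up example: the caller and the purpose are clear,
# but the callback number did not come through in the text.
example = TranscriptCheck(identity=True, purpose=True, contact=False)
print(usefulness_score(example))  # 2
```

The judgment itself (did the transcription answer the question or not) still comes from a person reading the text; the code only adds up the points.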
Here are the results:
| Caller | Points |
| Christi | 3 |
| Cindy | 3 |
| Betty | 2 |
| Denise | 3 |
| Jennifer | 2 |
| Mitzi | 2 |
| Alex | 1 |
| Andy | 2 |
| Bruce | 3 |
| Joe | 2 |
| Kevin | 3 |
| Toby | 3 |
| Total Score | 29 of a possible 36 |
Average message score: 2.4 points
Half of the test messages scored 100% on the Usefulness Index. Five scored 2 out of 3, and only one, with a single point, came close to failing.
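For anyone who wants to check the arithmetic, here is a short Python sketch that recomputes the total, the average and the breakdown from the scores in the table. The variable names are my own; the numbers are the ones listed above.

```python
from collections import Counter

# Points per caller, copied from the results table.
scores = {
    "Christi": 3, "Cindy": 3, "Betty": 2, "Denise": 3,
    "Jennifer": 2, "Mitzi": 2, "Alex": 1, "Andy": 2,
    "Bruce": 3, "Joe": 2, "Kevin": 3, "Toby": 3,
}

MAX_PER_MESSAGE = 3  # one point for each Usefulness Index question

total = sum(scores.values())
possible = MAX_PER_MESSAGE * len(scores)
average = total / len(scores)
distribution = Counter(scores.values())  # how many messages earned each score

print(f"Total: {total} of {possible}")       # Total: 29 of 36
print(f"Average: {average:.1f} points")      # Average: 2.4 points
for points in sorted(distribution, reverse=True):
    print(f"{distribution[points]} message(s) scored {points} of {MAX_PER_MESSAGE}")
```

The last loop prints 6 messages at 3 points, 5 at 2 points and 1 at 1 point, which matches the summary above.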
Keep in mind that Caller ID (ANI) is captured on every call and included with the transcription. That means Usefulness Index requirement number 3 is always met, and a couple of the scores improve.
And the Winner is...
The transcriptions earned a perfect score only 50% of the time. But that 50% cuts in half the time I have to spend actually listening to messages, and for most of the other 50% I have enough information to decide whether I need to listen to the entire message. The benefits of an audio message, such as inflection and tone of voice, are still there if needed, but now I don't have to listen to every message to know what it's about. And the cost has now fallen to an affordable rate.
If I need a true word-for-word transcription, then a hybrid service that combines computer processing with human oversight is required, at a higher cost.
A business executive spends an average of 45 minutes per week checking voicemail; that's three hours per month, per person, spent on non-revenue-producing activity.
One day the natural human interface of speech will replace buttons, the mouse and the keyboard, just like on Star Trek. We aren't there yet, but it might be closer than you think. In the meantime, Speech to Text in its current form can make your day more productive.





