In a recent post, we touched on the possibilities of Automatic Speech Recognition (ASR) technology for the modern business world. In particular, we highlighted that this technology is likely to improve efficiency for businesses competing in the global market as it is integrated into internal processes. We also pointed to new developments in cross-cultural communication, which enable multilingual dialogue in situations where it was previously impractical to engage a human interpreter.
Well, Microsoft is certainly not waiting around as far as ASR is concerned: at TechFest 2012, it launched Monolingual TTS, the latest innovation in voice recognition technology.
This software can currently translate users’ speech into 26 languages; what’s more, it preserves the user’s own voice.
I, a native of New Zealand, never imagined myself speaking Mandarin with a Kiwi accent, but upon reflection, this would certainly be more… natural, if that word still belongs in this field.
How far does Microsoft’s new software go towards turning a monolingual speaker into a multilingual one?
Decide for yourself with Microsoft’s interactive demonstration of how the Monolingual TTS software takes the voice of Rick, a native English speaker, and converts it into Mandarin.
Shrikanth Narayanan, a professor at the University of Southern California who leads a research group focused on the use of ASR in the real world, recognizes that “the word is just one part of what a person is saying.”
Narayanan suggests that an effective translation system needs to capture the essence of what a person is trying to convey through speech. This means preserving intonation, expression, and other important indicators.
“We’re asking if you can build systems that can mediate between people as well as just replacing the words.”
Narayanan, among others, sees Microsoft’s research as part of making this happen.