Modern platforms consisting of a voice browser and a speech recognition engine expose not only the recognition result but also the recognition confidence level at the voice application level. By working with this absolute value instead of simply distinguishing between match and no-match cases, a much more efficient dialogue flow can be realized:
1. No Match: Plays a generic "No Match" prompt such as "I did not understand," followed by a context-specific call for re-entry prompt.
2. Low Confidence Match: Plays back the recognized input and asks for confirmation, such as "I have understood the following customer number: 1234537. Is that correct?"
3. High Confidence Match: Gives a short acknowledgement of the successful recognition and continues the dialogue with the next step, such as "Alright. And now please enter your date of birth...".
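The three-tier routing above can be sketched in a few lines. This is a minimal illustration, assuming hypothetical threshold values (0.45 and 0.80) and invented prompt texts; real platforms expose the confidence value through their own APIs.

```python
# Hypothetical thresholds dividing the confidence range into three tiers.
LOW_THRESHOLD = 0.45   # below this: No Match
HIGH_THRESHOLD = 0.80  # at or above this: High Confidence Match


def route_recognition(confidence, utterance):
    """Map a recognition result to one of the three dialogue branches."""
    if confidence < LOW_THRESHOLD:
        # No Match: generic prompt plus context-specific re-entry request.
        return "I did not understand. Please say your customer number again."
    if confidence < HIGH_THRESHOLD:
        # Low Confidence Match: play back the input and ask for confirmation.
        return (f"I have understood the following customer number: "
                f"{utterance}. Is that correct?")
    # High Confidence Match: acknowledge briefly and move on.
    return "Alright. And now please enter your date of birth."


print(route_recognition(0.30, "1234537"))  # No Match branch
print(route_recognition(0.60, "1234537"))  # Low Confidence branch
print(route_recognition(0.95, "1234537"))  # High Confidence branch
```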
Up to this point, experienced VUI designers and speech experts will confirm that this is nothing new and more or less describes the status quo of common VUI design practice. But what do you think about the idea of setting the confidence level thresholds dynamically? Take a look at the following two examples:
1. Auto Tuning: An application logs the "confidencelevel" values and, in the "Low Confidence Match" case, additionally records whether the caller confirmed the recognized input or not. This allows the application to tune itself automatically by adjusting the confidence level threshold between low and high at runtime.
2. Caller-Specific Thresholds: If an application requires multiple entries from the user and the first input, classified as a "Low Confidence Match", was subsequently confirmed by the user, the confidence level threshold between low and high could be lowered in the following recognition states.
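Both ideas can be combined in one small threshold object. The following is a sketch only, with invented class and method names and arbitrary step sizes; the floor and ceiling guards are an assumption to keep the threshold from drifting out of a sane range.

```python
class DynamicThreshold:
    """Adjustable boundary between Low and High Confidence Match."""

    def __init__(self, high=0.80, step=0.02, floor=0.50, ceiling=0.95):
        self.high = high        # current low/high boundary
        self.step = step        # adjustment per logged observation
        self.floor = floor      # never trust blindly below this
        self.ceiling = ceiling  # never demand more than this

    def log_confirmation(self, confidence, confirmed):
        """Auto tuning: call after every Low Confidence confirmation prompt."""
        if confirmed and confidence < self.high:
            # Caller confirmed despite low confidence: lower the boundary.
            self.high = max(self.floor, self.high - self.step)
        elif not confirmed and confidence >= self.high - self.step:
            # Caller rejected a near-boundary result: raise it again.
            self.high = min(self.ceiling, self.high + self.step)

    def relax_for_caller(self, factor=0.9):
        """Caller-specific: after a confirmed first input, trust this
        caller a bit more in the following recognition states."""
        self.high = max(self.floor, self.high * factor)
```

A per-caller instance would be created at call start, fed by `log_confirmation` after each confirmation dialogue, and relaxed via `relax_for_caller` once the first low-confidence input has been confirmed.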
What do you think about the two use cases?
a) worth thinking about in detail
b) not necessary
c) already implemented
Please follow the discussion in the Speech Community Group.