Dynamic Confidence Level manipulation - A way to enhance usability and acceptance of speech applications?
1. No Match: Plays a generic "No Match" prompt such as "I did not understand," followed by a context-specific call for re-entry prompt.
2. Low Confidence Match: Plays the recognized input and calls for confirmation, such as "I have understood the following customer number: 1234537. Is that correct?"
3. High Confidence Match: Gives a short response to the successful recognition and continues the dialogue with the next step, such as "Alright. And now please enter your date of birth...".
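For readers who prefer code, the three cases can be sketched in a few lines of Python. This is only an illustration: the function name, the 0.0 to 1.0 score range and the two threshold values (0.3 and 0.7) are assumptions, not values taken from any particular recognizer.

```python
from enum import Enum


class Match(Enum):
    NO_MATCH = "no match"
    LOW_CONFIDENCE = "low confidence match"
    HIGH_CONFIDENCE = "high confidence match"


def classify(confidence, reject_threshold=0.3, high_threshold=0.7):
    """Map a recognizer confidence score (assumed 0.0..1.0) to a dialogue case."""
    if confidence < reject_threshold:
        return Match.NO_MATCH          # play "I did not understand", ask again
    if confidence < high_threshold:
        return Match.LOW_CONFIDENCE    # read back the input, ask for confirmation
    return Match.HIGH_CONFIDENCE       # short acknowledgement, next dialogue step
```

The threshold between low and high (here `high_threshold`) is the value the ideas in this post would adjust at runtime.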
Up to this point, experienced VUI designers and speech experts will confirm that this is nothing new and more or less describes the status quo of common VUI design practice. But what do you think about setting the confidence level thresholds dynamically? Take a look at the following two examples:
1. Auto tuning: The application logs the "confidencelevel" value of each recognition and, for every "Low Confidence Match", also whether the caller confirmed the recognized input. This allows the application to tune itself automatically by adjusting the confidencelevel threshold between low and high at runtime.
2. Caller-Specific Thresholds: If an application requires multiple entries from the user and the first input was classified as a "Low Confidence Match" that the user then confirmed, the confidencelevel threshold between low and high could be reduced for the following recognition states.
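To make the two ideas more concrete, here is a rough Python sketch of how they might look. Everything in it is an assumption for the sake of discussion: the function and class names, the 95% target confirmation rate, the minimum sample count, and the step and floor values. Note one limitation of the auto-tuning idea as described: only "Low Confidence Match" cases ever get confirmed or rejected, so the logged data can justify lowering the threshold but not raising it.

```python
def tune_high_threshold(log, current, target=0.95, min_samples=20):
    """Auto tuning sketch.

    log: list of (confidence, confirmed) pairs collected from
    Low Confidence Match cases. Returns the lowest logged confidence
    value at or above which callers confirmed at least `target` of
    the results, or `current` if the data does not support a change.
    """
    for candidate in sorted({confidence for confidence, _ in log}):
        outcomes = [ok for confidence, ok in log if confidence >= candidate]
        if len(outcomes) < min_samples:
            break  # higher candidates would have even fewer samples
        if sum(outcomes) / len(outcomes) >= target:
            return min(candidate, current)
    return current


class CallerSession:
    """Caller-specific threshold sketch for a single call."""

    def __init__(self, high_threshold=0.7, step=0.1, floor=0.5):
        self.high_threshold = high_threshold
        self.step = step      # how far to lower the threshold each time
        self.floor = floor    # never go below this value

    def record(self, confidence, confirmed):
        # Only a confirmed Low Confidence Match relaxes the threshold.
        if confirmed and confidence < self.high_threshold:
            self.high_threshold = max(self.floor, self.high_threshold - self.step)
```

In a real application the log would of course live in a database and the tuning would run per recognition state, but the principle stays the same.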
What do you think about the two use cases?
a) worth thinking about in detail
b) not necessary
c) already implemented
Please follow the discussion at Speech Community Group.
Silent Monitoring Framework: New approach or already available?
Think about a framework that allows you to plug in various engines such as voice biometrics, emotion detection, and age and gender classification. The framework itself is not runtime-critical, works in near real time, and is of course PBX and IVR independent.
How it works: The framework implements a silent monitor that listens to the incoming audio stream, so it runs completely in the background. Depending on the installed and configured engines, it pushes the results to either an application or an agent.
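As a very rough sketch of what such a plug-in architecture could look like, consider the following Python outline. All names here (`Engine`, `SilentMonitor`, `ByteCountEngine`) are hypothetical, and the stand-in engine does no real analysis; it only reports once it has received a given amount of audio, to show how results would flow.

```python
from typing import Callable, List, Optional, Protocol


class Engine(Protocol):
    """Interface every pluggable engine would have to implement."""
    name: str

    def feed(self, chunk: bytes) -> Optional[dict]:
        """Consume audio; return a result once enough data is available."""
        ...


class ByteCountEngine:
    """Stand-in engine: reports once after seeing enough audio bytes."""

    def __init__(self, name: str, needed_bytes: int):
        self.name = name
        self.needed_bytes = needed_bytes
        self.seen = 0
        self.done = False

    def feed(self, chunk: bytes) -> Optional[dict]:
        if self.done:
            return None
        self.seen += len(chunk)
        if self.seen >= self.needed_bytes:
            self.done = True
            return {"engine": self.name, "bytes_analyzed": self.seen}
        return None


class SilentMonitor:
    """Passes a copy of the incoming audio to all registered engines."""

    def __init__(self, on_result: Callable[[dict], None]):
        self.engines: List[Engine] = []
        self.on_result = on_result  # e.g. push to an application or agent desktop

    def register(self, engine: Engine) -> None:
        self.engines.append(engine)

    def on_audio(self, chunk: bytes) -> None:
        for engine in self.engines:
            result = engine.feed(chunk)
            if result is not None:
                self.on_result(result)
```

The call itself never waits for `on_audio`; in a real deployment the audio would be forked off the media stream and processed on a separate thread or machine.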
Use case: A bank wants to achieve additional transaction security by analyzing the biometric characteristics of the voice. When a customer call comes in, an agent answers the call, but in the background the silent monitoring framework invokes the biometric engine and speech data is collected. As soon as the engine has gathered enough data, the result is presented on the agent's desktop during the call. This could be done via a traffic light indication or as absolute scores.
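The traffic light indication could be as simple as the following mapping. The score range (0.0 to 1.0) and both cut-off values are assumptions for illustration; a real biometric engine would define its own scale and a bank would set the limits according to its risk policy.

```python
def traffic_light(score, red_below=0.4, green_from=0.8):
    """Map a biometric match score (assumed 0.0..1.0) to an agent indicator."""
    if score >= green_from:
        return "green"    # voice matches the stored voiceprint
    if score < red_below:
        return "red"      # likely a different speaker
    return "yellow"       # inconclusive, agent should verify by other means
```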
Do you know if such a product is already available? If not, what do you think about its market relevance and technical feasibility?