12th Annual Conference of the International Speech Communication Association

Interspeech 2011 Florence

Technical Programme

This is the final programme for this session. For oral sessions, the timing on the left reflects the current presentation order, but this may still change, so please check again at the conference itself.

Wed-Ses1-O4:
Spoken Dialogue Systems II

Time: Wednesday 10:00   Place: Michelangelo - Pala Affari - 2nd Floor   Type: Oral
Chair: Steve Young

10:00 Optimizing Situated Dialogue Management in Unknown Environments

Heriberto Cuayahuitl (German Research Center for Artificial Intelligence (DFKI))
Nina Dethlefs (University of Bremen)

We present a conversational learning agent that helps users navigate through complex and challenging spatial environments. The agent exhibits adaptive behaviour by learning spatially-aware dialogue actions while the user carries out the navigation task. To this end, we use Hierarchical Reinforcement Learning with relational representations to efficiently optimize dialogue actions tightly coupled with spatial ones, and Bayesian networks to model the user's beliefs about the navigation environment. Since these beliefs are continuously changing, we induce the agent's behaviour in real time. Experimental results in simulation are encouraging, showing efficient adaptation to the user's navigation knowledge, specifically to the generated route and the intermediate locations to negotiate with the user.
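The abstract describes the approach only at a high level; as a rough illustration of one ingredient, the sketch below shows tabular Q-learning over relational state descriptions for choosing situated dialogue actions. It is not the authors' system: the toy environment, the action set, and the state predicates are all hypothetical.

```python
# Minimal sketch (not the authors' system): tabular Q-learning over
# relational state descriptions, one ingredient of hierarchical RL for
# situated dialogue. Environment, actions and predicates are hypothetical.
import random
from collections import defaultdict

ACTIONS = ["give_route_instruction", "confirm_location", "ask_for_landmark"]

def simulate_step(state, action):
    """Toy stand-in for the navigation environment coupled with a user model."""
    if action == "confirm_location":
        # Confirming grounds the user's belief about where they are.
        return (state[0], state[1], "belief", "certain"), -0.1, False
    if action == "give_route_instruction" and state[3] == "certain":
        return ("user_at", "goal", "belief", "certain"), 1.0, True
    return state, -0.1, False

def q_learning(episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(float)  # keyed by (relational_state, action)
    for _ in range(episodes):
        state = ("user_at", "corridor", "belief", "uncertain")
        for _ in range(20):  # bounded episode length
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            next_state, reward, done = simulate_step(state, action)
            best_next = max(Q[(next_state, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
            if done:
                break
    return Q

policy = q_learning()
```

Representing states as tuples of predicates keeps the value table sparse and lets one entry cover many concrete situations, which is roughly the motivation for relational representations.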

10:20 Acoustic-similarity based technique to improve concept recognition

Om D Deshmukh (IBM Research India)
Shajith Ikbal (IBM Research India)
Ashish Verma (IBM Research India)
Etienne Marcheret (IBM Watson Research Center)

In this work we propose an acoustic-similarity based technique to improve the recognition of in-grammar utterances in typical directed-dialog applications, where the Automatic Speech Recognition (ASR) system consists of one or more class-grammars embedded in the Language Model (LM). The proposed technique increases the transition cost of LM paths by a value proportional to the average acoustic similarity between that LM path and all the in-grammar utterances. The proposed modifications improve the in-grammar concept recognition rate by 0.5% absolute at lower grammar fanouts and by about 2% at higher fanouts, as compared to a technique which reduces the probability of entering all the LM paths by a uniform value. The improvements are more pronounced as the fanout size of the grammar increases, especially at operating points corresponding to lower False Accept (FA) values.
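As a rough, hedged sketch of the scoring idea (not IBM's implementation), the function below raises each LM path's transition cost in proportion to its average acoustic similarity to the in-grammar utterances; the similarity function and scaling constant are hypothetical stand-ins for, e.g., a phone-confusion-based measure.

```python
# Illustrative sketch (not IBM's implementation) of the cost-adjustment idea:
# each LM path's transition cost is increased in proportion to its average
# acoustic similarity to the in-grammar utterances.
def adjusted_transition_costs(lm_paths, in_grammar_utterances,
                              acoustic_similarity, scale=1.0):
    """Return each LM path mapped to its increased transition cost."""
    costs = {}
    for path, base_cost in lm_paths.items():
        sims = [acoustic_similarity(path, utt) for utt in in_grammar_utterances]
        avg_sim = sum(sims) / len(sims) if sims else 0.0
        # Higher similarity to in-grammar speech -> more costly to enter this
        # LM path, so in-grammar hypotheses win more of the acoustic ties.
        costs[path] = base_cost + scale * avg_sim
    return costs

# Toy usage with a trivial word-overlap "similarity" as a placeholder.
overlap = lambda a, b: len(set(a.split()) & set(b.split())) / max(len(set(b.split())), 1)
print(adjusted_transition_costs({"call customer care": 2.0}, ["customer care please"], overlap))
```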

10:40 Dialog Methods for Improved Alphanumeric String Capture

Doug Peters (Nuance Communications)
Peter Stubley (Nuance Communications)

In this paper, we consider advances in automated over-the-phone alphanumeric string capture. For this task, acoustic confusions typically result in significant error rates. Of course, confusions also exist in human-to-human communication. However, humans employ dialog-level strategies with which to disambiguate confusions and correct errors – allowing high-fidelity transmission of alphanumeric strings across all but the noisiest of channels. These human strategies are examined and a subset amenable to automation is identified. The resulting automated error-correction dialog achieves 30% dialog error rate reduction compared to a conventional application in a high-volume commercial deployment. Further, the fact that there are many recognition errors in the context of a structurally simple dialog recommends this task for dialog optimization. We present an example of offline optimization and discuss the potential for online learning.
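As a toy illustration of the kind of human-inspired strategy alluded to above (not the deployed Nuance dialog), the sketch below asks a targeted clarification question only for low-confidence characters of a captured alphanumeric string; the confusion sets and confidence threshold are invented for the example.

```python
# Toy sketch (not the deployed system) of one human-inspired error-correction
# strategy: instead of re-prompting for the whole string, ask a targeted
# clarification only for low-confidence characters.
CONFUSABLE = {"b": "b d p", "d": "b d p", "p": "b d p",
              "m": "m n", "n": "m n", "s": "s f", "f": "s f"}

def correction_prompts(hypothesis, confidences, threshold=0.6):
    """Yield (position, prompt) pairs for characters that need disambiguation."""
    for i, (ch, conf) in enumerate(zip(hypothesis, confidences)):
        if conf < threshold:
            options = CONFUSABLE.get(ch.lower(), ch)
            yield i, f"For position {i + 1}, did you say '{ch}'? Easily confused with: {options}"

# Example: one low-confidence character in a 5-character code.
for pos, prompt in correction_prompts("A7B3K", [0.9, 0.95, 0.4, 0.9, 0.88]):
    print(pos, prompt)
```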

11:00 Detecting the Status of a Predictive Incremental Speech Understanding Model for Real-Time Decision-Making in a Spoken Dialogue System

David DeVault (USC Institute for Creative Technologies)
Kenji Sagae (USC Institute for Creative Technologies)
David Traum (USC Institute for Creative Technologies)

We explore the potential for a responsive spoken dialogue system to use the real-time status of an incremental speech understanding model to guide its incremental decision-making about how to respond to a user utterance that is still in progress. Spoken dialogue systems have a range of potentially useful real-time response options as a user is speaking, such as providing acknowledgments or backchannels, interrupting the user to ask a clarification question or to initiate the system's response, or even completing the user's utterance at appropriate moments. However, implementing such incremental response capabilities seems to require that a system be able to assess its own level of understanding incrementally, so that an appropriate response can be selected at each moment. In this paper, we use a data-driven classification approach to explore the trade-offs that a virtual human dialogue system faces in reliably identifying how its understanding is progressing during a user utterance.
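A minimal sketch of the general approach, assuming hypothetical features such as partial-hypothesis length, NLU confidence, and hypothesis stability: a classifier predicts whether the current partial interpretation will already match the final one, which the system could use to decide whether to respond mid-utterance. The data and thresholds below are toy values, not the authors'.

```python
# Hedged sketch of the general approach (not the authors' classifier): from
# each partial ASR/NLU result, extract a few features and predict whether the
# current interpretation already matches what the final one will be.
from sklearn.linear_model import LogisticRegression

# Each row: [words heard so far, NLU confidence, hypothesis stability]
X = [[2, 0.31, 0.2], [3, 0.40, 0.3], [4, 0.35, 0.4],
     [5, 0.55, 0.6], [8, 0.82, 0.9], [9, 0.90, 0.95]]
y = [0, 0, 0, 0, 1, 1]  # 1 = partial interpretation will match the final one

clf = LogisticRegression().fit(X, y)

def ready_to_respond(features, threshold=0.8):
    """Decide whether understanding is reliable enough to act mid-utterance."""
    return clf.predict_proba([features])[0][1] >= threshold

print(ready_to_respond([7, 0.85, 0.9]))
```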

11:20 User Simulation in Dialogue Systems using Inverse Reinforcement Learning

Senthilkumar Chandramohan (Supelec / LIA - UAPV)
Matthieu Geist (Supelec)
Fabrice Lefevre (LIA - UAPV)
Olivier Pietquin (Supelec / UMI 2958 (CNRS - GeorgiaTech))

Spoken Dialogue Systems (SDS) are man-machine interfaces which use natural language as the medium of interaction. Dialogue corpora generation for the purpose of training and evaluating dialogue systems is an expensive process. User simulators focus on simulating human users in order to generate synthetic data. Existing methods for user simulation mainly focus on generating data with the same statistical consistency as in the dialogue corpus. This paper outlines a novel approach for user simulation based on Inverse Reinforcement Learning (IRL). The task of building the user simulator is perceived as a task of imitation learning.
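The sketch below illustrates the feature-expectation view of IRL in the spirit of apprenticeship learning (not necessarily the paper's exact algorithm): discounted feature expectations are estimated from human dialogues, and a linear reward is derived that favours the human behaviour over an initial simulator. The dialogue features, trajectories, and toy "simulator" are hypothetical.

```python
# Sketch of feature-expectation-based IRL for user simulation (apprenticeship-
# learning style, not necessarily the paper's algorithm). All data are toy values.
import numpy as np

def feature_fn(state, action):
    # Hypothetical user-behaviour features: [informed a slot, confirmed, hung up]
    return np.array([action == "inform", action == "confirm", action == "hangup"], float)

def feature_expectations(trajectories, gamma=0.95):
    """Average discounted feature counts over a set of user dialogues."""
    mu = np.zeros(3)
    for traj in trajectories:
        for t, (state, action) in enumerate(traj):
            mu += (gamma ** t) * feature_fn(state, action)
    return mu / len(trajectories)

# Expert behaviour observed in the corpus vs. an initial, poor simulator.
expert_dialogues = [[("greet", "inform"), ("ask_slot", "inform"), ("confirm", "confirm")]]
simulator_dialogues = [[("greet", "hangup")]]

mu_expert = feature_expectations(expert_dialogues)
mu_sim = feature_expectations(simulator_dialogues)

# Projection-style estimate: reward weights point from the simulator's feature
# expectations toward the expert's, so the recovered reward favours human-like turns.
w = mu_expert - mu_sim
print("estimated reward weights:", w)
```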

11:40 Lossless Value Directed Compression of Complex User Goal States for Statistical Spoken Dialogue Systems

Paul A. Crook (Interaction Lab, Heriot-Watt University, Edinburgh, UK)
Oliver Lemon (Interaction Lab, Heriot-Watt University, Edinburgh, UK)

This paper presents initial results in the application of Value Directed Compression (VDC) to spoken dialogue management states for reasoning about complex user goals. On a small but realistic SDS problem, VDC generates a lossless compression which achieves a 6-fold reduction in the number of dialogue states required by a Partially Observable Markov Decision Process (POMDP) dialogue manager (DM). Reducing the number of dialogue states reduces the computational power, memory and storage requirements of the hardware used to deploy such POMDP SDSs, thus increasing the complexity of the POMDP SDSs which could theoretically be deployed. In addition, when on-line reinforcement learning is used to learn the DM policy, it should in this case also yield a 6-fold reduction in policy learning time. These are the first automatic compression results that have been presented for POMDP SDS states which represent user goals as sets over the possible domain objects.
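As a toy illustration of lossless state-space reduction (not the VDC algorithm itself), the snippet below enumerates user-goal sets over a tiny hypothetical domain and merges those that share an assumed value-preserving signature; in VDC the equivalence is derived from the reward and observation functions rather than hand-picked.

```python
# Toy illustration (not VDC itself) of lossless state reduction: user-goal
# states that are equivalent under the reward and observation functions can be
# merged without changing attainable value. Here the signature is simply the
# number of remaining candidate objects, which is lossless only under that
# hypothetical assumption.
from itertools import combinations

domain_objects = ["red_mug", "blue_mug", "red_plate"]

def all_goal_sets(objects):
    """Every non-empty set of domain objects the user's goal could denote."""
    return [frozenset(c) for r in range(1, len(objects) + 1)
            for c in combinations(objects, r)]

def signature(goal_set):
    return len(goal_set)  # stand-in for a value-directed equivalence test

states = all_goal_sets(domain_objects)
merged = {}
for s in states:
    merged.setdefault(signature(s), []).append(s)

print(f"{len(states)} user-goal states -> {len(merged)} merged states")
```

On this toy domain the grouping collapses 7 goal states into 3 merged states; the paper's 6-fold figure refers to its own, larger problem.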