Session Summary:
The goal of this special session is to bring together different communities working towards speech-based assistive technologies and health applications. In this special session, we hope to provide a venue not only to share technical approaches towards building speech-based health informatics and assistive technologies but also to attract the speech and language processing community to this exciting emerging field.
Motivation and Rationale:
With the advances in speech and spoken language processing over the last decade, many speech-based assistive and enabling applications have emerged. They aim to ease daily life and work; to continuously monitor, assess, and understand activities using speech; to help people in need, either through speech-based solutions or by using features extracted from their speech; or simply to keep people company when needed.
These applications span diverse research efforts:
- Speech technology can be utilized for interactive speech-based personal assistant technologies in smart environments or on mobile devices. Some applications can be command/control systems (e.g., entertainment center control in a car), question answering systems (either factoid, e.g., "where is the nearest gas station?", or personal, e.g., "do I have a meeting tomorrow?"), or information access or distillation systems using a document collection, such as news or the web (e.g., "anything new about Prop 8 in California?"). Designing tailored spoken language interfaces that target the needs of a range of populations, from the elderly to young children, and especially those with specific mental or physical health issues, is a great need and an open technical challenge. Speech and language technology can monitor the well-being of people such as the elderly using cues from their speech, and can assist people in everyday life, for example by reading web pages or books aloud for blind people or for children who cannot yet read. The latter may include educational applications, such as tutoring systems for children or adults, tools that help users learn the pronunciation or grammar of a new language, or even tools for learning new socio-communicative behavior (for example, for children with autism).
- Speech technology opens up a plethora of powerful health applications. Different developmental, mental, or anatomic defects or diseases can lead to voice, speech, language, and reading disorders. Examples are cognitive impairment, depression and PTSD, dyslexia and Alzheimer’s, autism, oral and laryngeal cancer, dysarthria, cleft lip and palate, sigmatism, and stuttering. Monitoring technologies using acoustic, prosodic, and lexical cues can be utilized to detect disorders or specific articulation difficulties, or to assess and quantify voice, speech, and language (how “good” is a person's voice, speech, or language?). A reliable assessment makes it possible to monitor the progress of diseases or therapies and to compare different therapies, hospitals, and doctors. It also provides the basis for interactive training tools with direct feedback for home use. Examples are training tools with which patients can train their speech and voice after total or partial laryngectomy in order to improve their intelligibility, practice tools for phoneme-specific problems such as sigmatism, or training tools to avoid stuttering. Voice conversion technologies allow the transformation of impaired voices into more natural-sounding voices and would improve communication for affected persons.
It is clear that enabling such assistive technologies requires significant technical improvements in diverse areas of research, such as machine learning, speech and language processing, and multi-modal human/machine interaction, and in most cases also collaboration with social science fields, such as gerontology, pediatrics, preventive medicine, or education. However, most underlying technologies are now at a level to make these applications possible in everyday use.
Organizers:
Tobias Bocklet (FAU) - tobias.bocklet@informatik.uni-erlangen.de
He received his diploma degree in computer science in 2007 at the University of Erlangen-Nuremberg. In 2008 he was with the speech group at SRI International, working on automatic speaker identification. He is now a member of the research staff of the Lehrstuhl für Informatik 5 (Mustererkennung) and the Department of Phoniatrics and Pedaudiology of the University Clinics Erlangen, and is working towards his doctoral degree on medical applications of automatic speech technologies, focusing on children's voices and the assessment of speech and language development and pathologies.
Shrikanth Narayanan (USC) - shri@sipi.usc.edu
He received his M.S., Engineer, and Ph.D., all in electrical engineering, from UCLA in 1990, 1992, and 1995, respectively. From 1995 to 2000 he was with AT&T Labs-Research, Florham Park and AT&T Bell Labs, Murray Hill, first as a Senior Member and later as a Principal Member of its Technical Staff. Currently, he is a Professor at the Signal and Image Processing Institute of USC's Electrical Engineering department and holds joint appointments as Professor in Computer Science, Linguistics and Psychology. He is also the inaugural director of the Ming Hsieh Institute at USC. He was a Research Area Director of the Integrated Media Systems Center, an NSF Engineering Research Center at USC, and was the Research Principal for the USC Pratt and Whitney Institute for Collaborative Engineering, a unique partnership between academia and industry (2003-2007).
Shri Narayanan is currently an Editor for the Computer Speech and Language journal and an Associate Editor for the IEEE Transactions on Multimedia, the IEEE Transactions on Affective Computing and the Journal of the Acoustical Society of America, having previously served as an Associate Editor for the IEEE Transactions on Speech and Audio Processing (2000-2004) and the IEEE Signal Processing Magazine (2005-2008). He holds positions on the Speech Communication and Acoustic Standards committees of the Acoustical Society of America and the Advisory Council of the International Speech Communication Association, having previously served on the Speech Processing Technical Committee (2003-2007) and on the Multimedia Signal Processing Technical Committee (2005-2008) of the IEEE Signal Processing Society. At USC, he was Chair of the Joint Provost-Senate University Research Committee (2006-09) and is a Past President of the Phi Kappa Phi Academic Honor Society (2007-08).
Shri Narayanan is a Fellow of the Acoustical Society of America (ASA), the Institute of Electrical and Electronics Engineers (IEEE) and the American Association for the Advancement of Science (AAAS). He is a member of Tau Beta Pi, Phi Kappa Phi and Eta Kappa Nu. He holds the first Viterbi Professorship in Engineering at USC. He is a recipient of an NSF CAREER award, a USC Engineering Junior Research Award, a USC Electrical Engineering Northrop-Grumman Research award, a Mellon award for mentoring excellence, a USC Distinguished Faculty Service Award, an Okawa Research Award, an IBM Faculty Award and a faculty fellowship from the USC Center for Interdisciplinary Research. He is a recipient of a 2005 Best Paper award (with Alex Potamianos) and a 2009 Best Paper Award (with Chul Min Lee) from the IEEE Signal Processing Society and was selected as Signal Processing Society Distinguished Lecturer for 2010-2011. Papers co-authored with his students have won awards at the InterSpeech 2009 Emotion Challenge, IEEE DCOSS 2009, IEEE MMSP 2007, IEEE MMSP 2006, ICASSP 2005 and ICSLP 2002. His research interests are in signals and systems modeling with an interdisciplinary emphasis on speech, audio, language, multimodal and biomedical problems and applications with direct societal relevance. His laboratory is supported by federal (NSF, NIH, DARPA, ONR, Army and DHS) and industry grants. He has published over 400 papers and has 8 granted U.S. patents.
Elmar Nöth (FAU) - noeth@informatik.uni-erlangen.de
He obtained his diploma degree in computer science and his doctoral degree at the University of Erlangen-Nuremberg in 1985 and 1990, respectively. From 1985 to 1990 he was a member of the research staff of the Lehrstuhl für Informatik 5 (Mustererkennung), working on the use of prosodic information in automatic speech understanding. In 1990, he became an assistant professor at the same institute and head of the speech group. In 2008, he became a tenured full professor at the Lehrstuhl für Informatik 5, focusing on medical applications of speech technologies and strengthening the cooperation with the medical department of the University of Erlangen-Nuremberg. He is one of the founders of the Sympalog Company, which markets conversational dialog systems.
Gokhan Tur (Microsoft) - gokhan.tur@ieee.org
He was born in Ankara, Turkey in 1972. He received his B.S., M.S., and Ph.D. degrees from the Department of Computer Science, Bilkent University, Turkey in 1994, 1996, and 2000, respectively. Between 1997 and 1999, he visited the Center for Machine Translation of CMU, then the Department of Computer Science of Johns Hopkins University, and then the Speech Technology and Research Lab of SRI International. He worked at AT&T Labs - Research from 2001 to 2006 and at the Speech Technology and Research Lab of SRI International from 2006 to June 2010. He is currently with Microsoft, working as a principal scientist. His research interests include spoken language understanding (SLU), speech and language processing, machine learning, and information retrieval and extraction. He has co-authored more than 100 papers published in refereed journals and presented at international conferences. Dr. Tur is also the recipient of the Speech Communication Journal Best Paper awards by ISCA for 2004-2006 and by EURASIP for 2005-2006. Dr. Tur was the organizer of the HLT-NAACL 2007 Workshop on Spoken Dialog Technologies and the HLT-NAACL 2004 and AAAI 2005 Workshops on SLU, and the editor of the Speech Communication Special Issue on SLU in 2006. He has also served as spoken language processing area chair for the IEEE ICASSP 2007, 2008, and 2009 conferences, spoken dialog area chair for the HLT-NAACL 2007 conference, finance chair for the IEEE/ACL SLT 2006 and SLT 2010 workshops, and SLU area chair for the IEEE ASRU 2005 workshop. Dr. Tur is a senior member of IEEE, ACL, and ISCA, is an associate editor for the IEEE Transactions on Audio, Speech, and Language Processing journal, and was a member of the IEEE Signal Processing Society (SPS) Speech and Language Technical Committee (SLTC) for 2006-2008.