12th Annual Conference of the
International Speech Communication Association
Interspeech 2011 Florence |
Technical Programme
This is the final programme for this session. For oral sessions, the timing on the left is the current presentation order, but this may still change, so please check at the conference itself.
Tue-Ses2-O2: First Language Acquisition
Time: | Tuesday 13:30 |
Place: | Leonardo - Pala Affari - Ground Floor |
Type: | Oral |
Chair: | Cinzia Avesani |
13:30 | The Multi Timescale Phoneme Acquisition Model of the Self-Organizing Based on the Dynamic Features
Kouki MIYAZAWA (Graduate School of Human Sciences, Waseda University) Hideaki MIURA (Graduate School of Human Sciences, Waseda University) Hideaki KIKUCHI (Graduate School of Human Sciences, Waseda University) Reiko MAZUKA (RIKEN Brain Science Institute)
It is unclear how infants learn the acoustic expression of each phoneme of their native language. Recent studies have examined phoneme acquisition using computational models. However, these studies used a limited vocabulary as input and did not handle continuous speech comparable to a natural environment. We therefore use natural continuous speech to build a self-organizing model that simulates human cognitive ability, and we analyze the quality and quantity of speech information necessary for acquiring the native phoneme system. Our model is designed to learn the values of acoustic features from continuous speech and to estimate the number and boundaries of phoneme categories without explicit instruction. In an earlier study, our model acquired the detailed vowels of the input language. In this study, we examine the mechanism necessary for an infant to acquire all the phonemes of a language, including consonants. In natural speech, vowels have stationary features, so our earlier model is suitable for learning them. However, learning consonants with that model is difficult because most consonants have more dynamic features than vowels. To solve this problem, we designed a method that separates “stable” and “dynamic” speech patterns using a feature-extraction method based on the auditory expressions used by humans. Using this method, we showed that an unstable phoneme could be acquired without instruction.
|
13:50 | The time-course of talker-specificity effects for newly-learned pseudowords: Evidence for a hybrid model of lexical representation
Helen Brown (Department of Psychology, University of York) M. Gareth Gaskell (Department of Psychology, University of York)
Whilst research shows that talker information affects recognition of recently studied words, it remains unclear whether this information is stored in long-term memory. Three experiments explored whether talker-specificity effects (TSEs) for pseudowords changed over time and were affected by within- and between-talker variability during study. Results showed TSEs immediately after study in all experiments, consistent with episodic models, but TSEs remained a week later only for pseudowords studied in a single voice. Furthermore, source memory data suggested that talker information becomes less accessible over time, supporting hybrid models that incorporate aspects of both episodic and abstract lexical representation.
|
14:10 | A parametric approach to intonation acquisition research: Validation on child-directed speech data
Britta Lintfert (Institute of Natural Language Processing, University of Stuttgart, Germany) Antje Schweitzer (Institute of Natural Language Processing, University of Stuttgart, Germany) Bernd Möbius (Department of Computational Linguistics and Phonetics, Saarland University, Germany)
This paper validates a parametric approach to intonation acquisition research using child-directed speech data. An advantage of this approach is that it can be used for studying child speech as well as adult speech. Within the field of prosody acquisition it reconciles independent approaches to child prosody with ToBI-based approaches. In this paper we substantiate this claim by showing that clusters of parameterized contours obtained from German child-directed speech correlate with GToBI(S) categories, and by elaborating how, alternatively, the parameters can be mapped to properties that are relevant in independent approaches.
|
14:30 | Modelling Novelty Preference in Word Learning
Maarten Versteegh (International Max Planck Research School for Language Sciences / Radboud University, Nijmegen, The Netherlands) Louis ten Bosch (Radboud University Nijmegen) Lou Boves (Radboud University Nijmegen)
This paper investigates the effects of novel words on a cognitively plausible computational model of word learning. The model is first familiarized with a set of words, achieving high recognition scores, and is subsequently offered novel words for training. We show that the model is able to recognize the novel words as different from the previously seen words, based on a measure of novelty that we introduce. We then propose a procedure analogous to novelty preference in infants. Results from simulations of word learning show that adding this procedure to our model speeds up training and helps the model attain higher recognition rates.
|
14:50 | Using Imitation to learn Infant-Adult Acoustic Mappings
G Ananthakrishnan (Center for Speech Technology, KTH (Royal Institute of Technology)) Giampiero Salvi (Centre for Speech Technology, Royal Institute of Technology (KTH), Stockholm, Sweden)
This paper discusses a model which conceptually demonstrates how infants could learn the normalization between infant and adult acoustics. The model proposes that the mapping can be inferred from the topological correspondences between the adult and infant acoustic spaces, which are clustered separately in an unsupervised manner. The model requires feedback from the adult in order to select the right topology for clustering, which is a crucial aspect of the model. The feedback takes the form of an overall rating of the infant's imitation effort, rather than a frame-by-frame correspondence. Using synthetic but continuous speech data, we demonstrate that clusters that have a good topological correspondence are perceived to be similar by a phonetically trained listener.
|
15:10 | Thresholding word activations for response scoring - Modelling psycholinguistic data
Christina Bergmann (Centre for Language and Speech Technology/International Max Planck Research School for Language Sciences, Radboud University Nijmegen, The Netherlands) Louis ten Bosch (Centre for Language and Speech Technology, Radboud University Nijmegen, The Netherlands) Lou Boves (Centre for Language and Speech Technology, Radboud University Nijmegen, The Netherlands)
In the present paper we replicate simulations of infant word learning and the effect of variation in the input. We then investigate to what extent the results are influenced by the way in which the continuous response functions are treated and what effects the use of thresholds can have on the data.
Our results show that the underlying response pattern, as uncovered by different thresholds, varies greatly. Nonetheless, the overall output of the model is often correct and able to generalise to unseen data. Thus, we show that the model can give correct responses even in uncertain circumstances. Links of this finding to language acquisition research are discussed.
|