Speech acoustics and phonetics
Louis C.W. Pols
Institute of Phonetic Sciences (IFA)
Amsterdam Center for Language and Communication (ACLC)
NATO-ASI “Dynamics of Speech Production and Perception”, Il Ciocco, Tuscany, Italy, July 1, 2002

July 1st, 2002, Speech acoustics and phonetics, Il Ciocco

Overview
- Dynamics in speech acoustics
- Contour modeling (mainly formants)
- Aspects of spectral undershoot
- Modeling vowel (V) and consonant (C) reduction
- Phonetic knowledge from speech corpora: IFA, CGN, TIMIT, found speech
- Conclusions

Dynamics in speech acoustics
- Dynamics is the norm, not stationarity
  - articulatory efficiency
- Dynamics is everywhere
  - generally no word boundaries in speech
  - deletion of words, syllables, phonemes; insertion
  - within/between-word coarticulation and assimilation
  - vowel and consonant reduction
- Acoustic manifestations
  - segment duration, F0, loudness, spectral quality

Dynamics is the norm
- "The speaker speaks as sloppily as the listeners allow him to": communicative efficiency
- Articulatory vs. perceptual efficiency
  - do spectral transitions facilitate or hamper perception? (see other presentation)
- Speaker flexibility: speaking style (clear vs. sloppy); speaking rate

Dynamics is everywhere
- Deletion
  - bread and butter /brEmbY3/
  - Amsterdam (Dutch) /AmstrdAm/ → /AmsdAm/
  - koninklijke (Dutch) /konIklk/ → /kolk/
- Insertion
  - homorganic glide insertion: die een (Dutch) /dijn/
- Degemination
  - is zichtbaar (Dutch) /Is zIxtbar/ → /IsIxbar/
- Reduction, coarticulation, assimilation

Acoustic manifestations
- pitch, loudness, formant, and component contours
- contour stylization (e.g., pitch in Praat)
- contour modeling
  - n-th degree curve fitting (D. van Bergem)
  - Legendre polynomials; 16 points per segment (R. van Son)
- (phoneme) segmentation
  - by hand (time consuming; inconsistent)
  - automatically (via forced phoneme recognition and a pronunciation lexicon with alternatives; systematic errors)
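
The "16 points per segment" representation time-normalizes each segment, so that formant tracks from short and long vowels can be compared point by point. A minimal Python sketch, assuming linear interpolation between analysis frames and 16 points placed from the first to the last frame of the segment (the exact placement in the original work may differ); `resample_track` is a hypothetical helper:

```python
from typing import List, Sequence


def resample_track(values: Sequence[float], n_points: int = 16) -> List[float]:
    """Linearly interpolate a formant track (one value per analysis frame)
    onto n_points equidistant positions, from the first frame to the last
    frame inclusive (an assumed placement, not necessarily the original one)."""
    if len(values) < 2:
        raise ValueError("need at least two frames to interpolate")
    if n_points < 2:
        raise ValueError("need at least two output points")
    last = len(values) - 1
    out = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)   # fractional frame index
        i = min(int(pos), last - 1)       # left neighbour, kept inside the track
        frac = pos - i                    # weight of the right neighbour
        out.append(values[i] * (1 - frac) + values[i + 1] * frac)
    return out
```

For example, a 3-frame and a 5-frame track that both rise linearly from 500 to 700 Hz yield the same 16-point profile, which is what allows track shape to be compared across vowel durations and speaking rates.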

Contour modeling
- allows modeling of specific phenomena
  - pitch accentuation (vs. vowel onset)
  - reduction, centralization, undershoot
- allows generation of stimuli for perception experiments
  - phoneme identification in extending context
  - 2-alternative forced-choice identification of continua
  - discrimination, reaction time (RT)
- allows statistics on large speech corpora
  - TIMIT, CGN, IFA-corpus, Switchboard

Static vs. dynamic vowel recognition
- see Weenink (2001), “Vowel normalizations with the TIMIT acoustic phonetic speech corpus”, IFA Proc. 24, 117-123
- 438 males, both train and test sentences of TIMIT
- 35,385 vowel segments, hand segmented
- 13 monophthongal vowel categories
- 1-Bark band-filter analysis (18), intensity normalization
- 3 frames per segment: central, and 25 ms to the left and right

Some results
(table: vowel classification (%) with discriminant functions)

Formant tracks / speaking rate
- Ph.D. thesis Rob van Son (1993), “Spectro-temporal features of vowel segments”
- see also Speech Communication 13, 135-148 (Pols & van Son)
- 850-word text, read at normal and fast rate
- hand segmentation of the 7 most frequent vowels + schwa
- formant tracks via 16 points per segment or 5 Legendre polynomials
- influence of rate, vowel duration, context, sentence accent
- evidence for duration-controlled undershoot?

Some results
- no differences in F1/F2 at the vowel center between normal- and fast-rate speech; only some overall rise in F1 for fast rate (irrespective of vowel)
- same formant track shape (normalized to 16 points) for normal- and fast-rate speech
- same results when using the more elaborate Legendre polynomials
- Conclusion: changes in vowel duration do not change the amount of undershoot → active control of articulation speed

Formant representations
(figure, two panels: zeroth-order Legendre polynomial coefficients, i.e. mean Fi in the vowel segment; second-order polynomials, axes reversed)
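
The Legendre representation above can be illustrated with a short decomposition. A minimal Python sketch, assuming the formant track has already been resampled to N equal-width cells over the vowel (e.g. the 16 points per segment) and using a midpoint-rule projection onto P_0 ... P_4 (five coefficients, matching the "5 Legendre polynomials"); this is not necessarily the fitting procedure used by van Son, and `legendre` / `legendre_coefficients` are hypothetical names:

```python
from typing import List, Sequence


def legendre(n: int, x: float) -> float:
    """Evaluate the Legendre polynomial P_n(x) with Bonnet's recurrence:
    (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p


def legendre_coefficients(track: Sequence[float], order: int = 4) -> List[float]:
    """Project a track sampled at the centres of N equal-width cells on [-1, 1]
    onto P_0..P_order with the midpoint rule:
        c_n ~= (2n + 1) / N * sum_j f_j * P_n(x_j),  x_j = -1 + (2j + 1) / N
    c_0 is then exactly the mean of the samples."""
    n_pts = len(track)
    xs = [-1.0 + (2 * j + 1) / n_pts for j in range(n_pts)]
    return [
        (2 * n + 1) / n_pts * sum(f * legendre(n, x) for f, x in zip(track, xs))
        for n in range(order + 1)
    ]
```

With this projection the zeroth-order coefficient equals the sample mean of the track (the "mean Fi in vowel segment" of the figure), while c_1 and c_2 roughly summarize the overall slope and curvature of the contour.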

Modeling vowel reduction
- Ph.D. thesis Dick van Bergem (1995), “Acoustic and lexical vowel reduction”
- see also Speech Communication 16, 329-358
- lexical vowel reduction: French /bet/ vs. Dutch /btOn/
- acoustic vowel reduction: /banan, bAnan, bnan/ as a function of sentence accent, word stress, and word class (can-candy-canteen)
- coarticulatory effects on the schwa: C1C2V- and VC1C2-type nonsense words
- perceptual effects (full vowel or schwa, e.g. ananas)

Some results
- The schwa is not just a centralized vowel but is completely assimilated to its phonemic context
(figure panels: t-n, w-l)

Modeling consonant reduction
- Speech Communication 28 (1999), 125-140 (van Son)
(figure: V-C sound energy differences)

Some results
- vowels markedly reduced in spontaneous speech
- lower F2-slope difference in spontaneous speech → decrease in articulation speed
- no systematic effect on the F2 locus equation: vowel onsets and targets change in concert
- any vowel reduction is mirrored by a comparable change in the consonant
- spontaneous speech: vowels and consonants shorter; lower spectral center of gravity (COG) → decrease in vocal and articulatory effort

Access to large corpora
- more, and more realistic, data
- phonetic knowledge via statistical analyses
- e.g. the highly accessible IFA-corpus (free, SQL)
- see “Structure and access of the open source IFA-corpus”, IFA Proc. 24, 15-26 (van Son & Pols)
- on-line: http://www.fon.hum.uva.nl/IFAcorpus/
- 4 male / 4 female speakers, 5.5 hrs of speech
- styles from informal to read, plus sentences, words, syllables
- 50K words segmented and labeled at phoneme level

Some results
- speech + annotation + metadata: relational DB
- realization of final n, e.g. Dutch geven /xev(n)/
(figure; panel label: Read)

Spoken Dutch Corpus (CGN)
- 10 M words, 1,000 hrs of speech
- variety of styles, incl. telephone speech
- adult Dutch and Flemish speakers
- for linguistic and technological research
- see various LREC and ICSLP papers (2002)
- see also http://lands.let.kun.nl/cgn/home.htm
- fully transcribed: orthographic, POS, lemmas
- partly transcribed: phonemic, prosodic, syntactic

TIMIT
- popular DB in acoustic phonetics and ASR
- also a telephone version (NTIMIT)
- hand segmented and labeled at phoneme level
- 438 males, 192 females (8 dialect regions)
- 10 sentences per speaker (2 fixed sa, 5 phonetically compact sx, 3 phonetically diverse si)
- sa1: “She had her dark suit in greasy wash water all year”
- includes separate test data (112 M, 56 F)
- e.g. Ph.D. thesis X. Wang (1997), “Incorporating knowledge on segmental duration in HMM-based continuous speech recognition”

Useful info: durational variability
(figure, adapted from Wang (1998): normalized phone duration vs. speaking rate; all 3,696 training sentences (sx + si) of the TIMIT training set)
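
The figure relates normalized phone duration to speaking rate. The normalization used by Wang is not specified here, so this Python sketch assumes one simple choice: each phone's duration divided by the corpus mean for its label, with speaking rate taken as phones per second of non-pause speech. The helper names and the set of pause labels are illustrative assumptions; TIMIT's own .phn files give start and end sample offsets at 16 kHz, which would first be divided by 16000 to get seconds.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# One labelled phone: (label, start time in s, end time in s).
Phone = Tuple[str, float, float]

# Assumption: TIMIT-style silence/pause labels are excluded from all counts.
NON_SPEECH = {"h#", "pau", "epi"}


def phone_means(sentences: List[List[Phone]]) -> Dict[str, float]:
    """Corpus-wide mean duration (s) for each speech phone label."""
    durations: Dict[str, List[float]] = defaultdict(list)
    for sentence in sentences:
        for label, start, end in sentence:
            if label not in NON_SPEECH:
                durations[label].append(end - start)
    return {label: mean(d) for label, d in durations.items()}


def rate_and_normalized_duration(
    sentence: List[Phone], means: Dict[str, float]
) -> Tuple[float, float]:
    """Return (speaking rate in phones/s, mean normalized phone duration) for
    one sentence; a phone's normalized duration is its duration divided by the
    corpus mean for its label, so 1.0 means 'average length'."""
    speech = [(label, start, end) for label, start, end in sentence
              if label not in NON_SPEECH]
    total = sum(end - start for _, start, end in speech)
    rate = len(speech) / total
    normalized = mean((end - start) / means[label] for label, start, end in speech)
    return rate, normalized
```

Each training sentence then contributes one (rate, normalized duration) point to a scatter plot of the kind summarized in the figure.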

Found speech
- DARPA LVSR community: rather ambitious
- Broadcast News (BN), Speech Communication 37 (2002)
- for Proc. DARPA Workshops, see http://www.nist.gov/speech/proc/darpa99/index.htm

Articulatory-acoustic features in ASR
- M. Wester et al. (2001), “A Dutch treatment of an elitist approach to articulatory-acoustic feature classification”, Proc. Eurospeech-2001, 1729-1732
- K. Kirchhoff (2000), “Integrating articulatory features into acoustic models for speech recognition”, Phonus 5, 73-86
- J. Sun & L. Deng (2002), “An overlapping-feature-based phonological model incorporating linguistic constraints: Applications to speech recognition”, JASA 111(2), 1086-1101

Conclusions
- examples of dynamics in speech acoustics
- going from formal to informal speech: less dynamics, more reduction (articulation-guided)
- undershoot vs. speaking style: sloppiness or articulatory limits?
- functionality of dynamics? (other paper)
- systematicity of dynamics? easing ASR, rules for TTS, acquiring knowledge?