[1] W. O’Grady, J. Archibald, M. Aronoff, and J. Rees-Miller, Contemporary Linguistics:
An Introduction, 4th ed. Bedford/St. Martin’s, 2001.
[2] E. Fosler-Lussier and N. Morgan, “Effects of speaking rate and word frequency
on pronunciations in conversational speech,” Speech Communication, vol. 29
(special issue on modeling pronunciation variation for automatic speech recognition),
pp. 137–158, 1999.
[3] S. Greenberg, “Speaking in shorthand - a syllable-centric perspective for understanding
pronunciation variation,” Speech Communication, vol. 29, pp. 159–176,
1999.
[4] H. Strik and C. Cucchiarini, “Modeling pronunciation variation for ASR: A survey
of the literature,” Speech Communication, vol. 29, pp. 225–246, 1999.
[5] M. Weintraub, E. Fosler, C. Galles, Y.-H. Kao, S. Khudanpur, M. Saraclar, and
S. Wegmann, “WS96 project report: Automatic learning of word pronunciation
from data,” in JHU Workshop 96 Pronunciation Group, 1996.
[6] T. Sloboda and A. Waibel, “Dictionary learning for spontaneous speech recognition,”
in International Conference on Spoken Language Processing, vol. 4, 1996,
pp. 2328–2331.
[7] D. Torre, L. Villarrubia, L. Hernandez, and L. Elvira, “Automatic alternative
transcription generation and vocabulary selection for flexible word recognizers,”
in IEEE International Conference On Acoustics, Speech, And Signal Processing,
1997, pp. 1463–1466.
[8] M. Finke and A. Waibel, “Flexible transcription alignment,” in IEEE Automatic
Speech Recognition and Understanding Workshop, 1997, pp. 34–40.
[9] ——, “Speaking mode dependent pronunciation modeling in large vocabulary
conversational speech recognition,” in European Conference on Speech Communication
and Technology, 1997, pp. 2379–2382.
[10] W. Byrne, M. Finke, S. Khudanpur, J. McDonough, H. Nock, M. Riley, M. Saraclar,
C. Wooters, and G. Zavaliagkos, “Pronunciation modelling for conversational
speech recognition: A status report from WS97,” in IEEE Workshop on
Speech Recognition and Understanding, 1997.
[11] W. Byrne, V. Venkataramani, T. Kamm, T. F. Zheng, Z. Song, P. Fung, Y. Liu,
and U. Ruhi, “Automatic generation of pronunciation lexicons for Mandarin
spontaneous speech,” in IEEE International Conference On Acoustics, Speech,
And Signal Processing, 2001, pp. 569–572.
[12] M. Riley, W. Byrne, M. Finke, S. Khudanpur, A. Ljolje, J. McDonough, H. Nock,
M. Saraclar, C. Wooters, and G. Zavaliagkos, “Stochastic pronunciation modelling
from hand-labelled phonetic corpora,” Speech Communication, vol. 29, pp.
209–224, 1999.
[13] E. Fosler-Lussier and G. Williams, “Not just what, but also when: Guided automatic
pronunciation modeling for broadcast news,” in DARPA Broadcast News
Workshop, 1999, pp. 171–174.
[14] E. Fosler-Lussier, “Multi-level decision trees for static and dynamic pronunciation
models,” in European Conference on Speech Communication and Technology,
1999, pp. 463–466.
[15] T. Holter and T. Svendsen, “Maximum likelihood modelling of pronunciation
variation,” Speech Communication, vol. 29, pp. 177–191, 1999.
[16] N. Cremelie and J.-P. Martens, “In search of better pronunciation models for
speech recognition,” Speech Communication, vol. 29, pp. 115–136, 1999.
[17] M.-Y. Tsai, F.-C. Chou, and L.-S. Lee, “Improved pronunciation modelling by
inverse word frequency and pronunciation entropy,” in IEEE Automatic Speech
Recognition and Understanding Workshop, 2001, pp. 53–56.
[18] ——, “Improved pronunciation modeling by properly integrating better approaches
for baseform generation, ranking and pruning,” in ISCA Workshop:
Pronunciation Modeling and Lexicon Adaptation for Spoken Language, 2002, pp.
77–82.
[19] M. Wester, “Pronunciation modeling for ASR - knowledge-based and data-derived
methods,” Computer Speech and Language, pp. 69–85, 2003.
[20] G. Tajchman, E. Fosler, and D. Jurafsky, “Building multiple pronunciation models
for novel words using exploratory computational phonology,” in European
Conference on Speech Communication and Technology, 1995, pp. 2247–2250.
[21] J. M. Kessens, M. Wester, and H. Strik, “Improving the performance of a Dutch
CSR by modeling within-word and cross-word pronunciation variation,” Speech
Communication, vol. 29, pp. 193–207, 1999.
[22] T. J. Hazen, I. L. Hetherington, H. Shu, and K. Livescu, “Pronunciation modeling
using a finite-state transducer representation,” in ISCA Workshop: Pronunciation
Modeling and Lexicon Adaptation for Spoken Language, 2002, pp. 99–104.
[23] T. Fukada, T. Yoshimura, and Y. Sagisaka, “Automatic generation of multiple
pronunciations based on neural networks,” Speech Communication, vol. 27, pp.
63–73, 1999.
[24] M.-Y. Tsai, F.-C. Chou, and L.-S. Lee, “Pronunciation variation analysis with
respect to various linguistic levels and contextual conditions for Mandarin Chinese,”
in European Conference on Speech Communication and Technology, 2001,
pp. 1445–1448.
[25] I. Amdal, F. Korkmazskiy, and A. C. Surendran, “Joint pronunciation modelling
of non-native speakers using data-driven methods,” in International Conference
on Spoken Language Processing, vol. 3, 2000, pp. 622–625.
[26] Q. Yang and J.-P. Martens, “Data-driven lexical modeling of pronunciation variations
for ASR,” in International Conference on Spoken Language Processing,
vol. 1, 2000, pp. 417–420.
[27] Q. Yang, J.-P. Martens, P.-J. Ghesquiere, and D. V. Compernolle, “Pronunciation
variation modeling for ASR: Large improvements are possible but small
ones are likely to achieve,” in ISCA Workshop: Pronunciation Modeling and
Lexicon Adaptation for Spoken Language, 2002, pp. 123–128.
[28] E. Fosler-Lussier, I. Amdal, and H.-K. J. Kuo, “On the road to improved lexical
confusability metrics,” in ISCA Workshop: Pronunciation Modeling and Lexicon
Adaptation for Spoken Language, 2002, pp. 53–58.
[29] M. Wester, “Pronunciation variation modeling for Dutch automatic speech recognition,”
Ph.D. dissertation, University of Nijmegen, The Netherlands, 2002.
[30] S. Greenberg and S. Chang, “Linguistic dissection of Switchboard-corpus automatic
speech recognition systems,” in ISCA Workshop on Automatic Speech
Recognition: Challenges for the New Millennium, 2000.
[31] L. R. Rabiner, B.-H. Juang, and C.-H. Lee, “An overview of automatic speech
recognition,” in Automatic speech and speaker recognition: Advanced topics, C.-
H. Lee, F. K. Soong, and K. K. Paliwal, Eds. Kluwer Academic, 1996, ch. 1.
[32] A. Akmajian, R. A. Demers, A. K. Farmer, and R. M. Harnish, Linguistics: An
Introduction to Language and Communication. The MIT Press, 2001.
[33] D. Jurafsky and J. H. Martin, Speech and Language Processing: An Introduction
to Natural Language Processing, Computational Linguistics and Speech Recognition.
Prentice Hall, 2000.
[34] C.-Y. Tseng and F.-C. Chou, “Machine readable phonetic transcription system for
Chinese dialects spoken in Taiwan,” J. Acoust. Soc. Jpn, vol. 20, pp. 215–223,
1999.
[35] F. Seide and J. C. Wang, “Phonetic modeling in the Philips Chinese continuous
speech recognition system,” in International Symposium on Chinese Spoken
Language Processing, 1998, pp. 54–59.
[36] S. Young, D. Kershaw, J. Odell, D. Ollason, V. Valtchev, and P. Woodland,
The HTK Book (Version 3.0), 2000.
[37] B. H. Juang and L. R. Rabiner, “A probabilistic distance measure for hidden
Markov models,” AT&T Technical Journal, vol. 64, no. 2, pp. 391–408, 1985.
[38] M. Vihola, M. Harju, P. Salmela, J. Suontausta, and J. Savela, “Two dissimilarity
measures for HMMs and their application in phoneme model clustering,” in IEEE
International Conference On Acoustics, Speech, And Signal Processing, vol. 1,
2002, pp. 933–936.
[39] R. Singh, B. Raj, and R. M. Stern, “Structured redefinition of sound units by
merging and splitting for improved speech recognition,” in International Conference
on Spoken Language Processing, 2000, pp. 151–154.
[40] J. Kohler, “Multi-lingual phoneme recognition exploiting acoustic-phonetic similarities
of sounds,” in International Conference on Spoken Language Processing, 1996.
[41] A. Kienappel, D. Geller, and R. Bippus, “Cross-language transfer of multilingual
phoneme models,” in ISCA ITRW ASR, 2000, pp. 155–159.
[42] Z. Zhang and S. Furui, “An online incremental speaker adaptation method using
speaker-clustered initial models,” in International Conference on Spoken
Language Processing, vol. 3, 2000, pp. 694–697.
[43] P. Geutner, M. Finke, and A. Waibel, “Selection criteria for hypothesis driven
lexical adaptation,” in IEEE International Conference On Acoustics, Speech,
And Signal Processing, vol. 2, 1999, pp. 617–620.
[44] B. T. Tan, Y. Gu, and T. Thomas, “Word confusability measures for vocabulary
selection in speech recognition,” in IEEE Automatic Speech Recognition
and Understanding Workshop, 1999, pp. 185–188.
[45] J. Yi and J. Glass, “Information-theoretic criteria for unit selection synthesis,”
in International Conference on Spoken Language Processing,
2002, pp. 2617–2620.
[46] Y. Singer and M. K. Warmuth, “Batch and on-line parameter estimation of
Gaussian mixtures based on the joint entropy,” in Advances in Neural Information
Processing Systems, 1998, pp. 578–584.
[47] M.-Y. Tsai, F.-C. Chou, and L.-S. Lee, “Pronunciation modeling with reduced
confusion for Mandarin Chinese using a three-stage framework,” to be published
in IEEE Transactions on Speech and Audio Processing, 2005.
[48] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. ACM Press, 1999.
[49] F. Wilcoxon, “Individual comparisons by ranking methods,” Biometrics Bulletin,
vol. 1, pp. 80–83, 1945.
[50] M. Saraclar, H. Nock, and S. Khudanpur, “Pronunciation modeling by sharing
Gaussian densities across phonetic models,” Computer Speech and Language,
vol. 14, pp. 136–160, 2000.
[51] B.-H. Juang and S. Katagiri, “Discriminative learning for minimum error classification,”
IEEE Transactions on Signal Processing, vol. 40, no. 12, pp. 3043–3054, 1992.
[52] W. Chou, C.-H. Lee, and B.-H. Juang, “Minimum error rate training based on
n-best string models,” in IEEE International Conference On Acoustics, Speech,
And Signal Processing, vol. 2, 1993, pp. 652–655.
[53] B.-H. Juang, W. Chou, and C.-H. Lee, “Minimum classification error rate methods
for speech recognition,” IEEE Transactions on Speech and Audio Processing,
vol. 5, no. 3, pp. 257–265, 1997.
[54] L. Bahl, P. Brown, P. de Souza, and R. Mercer, “Maximum mutual information
estimation of hidden Markov model parameters for speech recognition,” in IEEE
International Conference On Acoustics, Speech, And Signal Processing, 1986, pp.
49–52.
[55] D. Povey and P. Woodland, “Improved discriminative training techniques for
large vocabulary continuous speech recognition,” in IEEE International Conference
On Acoustics, Speech, And Signal Processing, vol. 1, 2001, pp. 45–48.
[56] R. Schluter, W. Macherey, B. Muller, and H. Ney, “Comparison of discriminative
training criteria and optimization methods for speech recognition,” Speech
Communication, vol. 34, pp. 287–310, 2001.
[57] H.-K. J. Kuo, E. Fosler-Lussier, H. Jiang, and C.-H. Lee, “Discriminative training
of language models for speech recognition,” in IEEE International Conference
On Acoustics, Speech, And Signal Processing, vol. 1, 2002, pp. 325–328.
[58] P. Woodland and D. Povey, “Large scale discriminative training of hidden Markov
models for speech recognition,” Computer Speech and Language, vol. 16, no. 1,
pp. 25–47, 2002.
[59] L. R. Bahl, F. Jelinek, and R. L. Mercer, “A maximum likelihood approach
to continuous speech recognition,” IEEE Transactions on Pattern Analysis and
Machine Intelligence, pp. 179–190, 1983.
[60] H. Printz and P. Olsen, “Theory and practice of acoustic confusability,” Computer
Speech and Language, vol. 16, no. 1, pp. 131–164, 2002.
[61] S. Chen, D. Beeferman, and R. Rosenfeld, “Evaluation metrics for language
models,” in DARPA Broadcast News Transcription and Understanding Workshop,
1998, pp. 275–280.
[62] C.-S. Huang, C.-H. Lee, and H.-C. Wang, “New model-based HMM distances
with applications to run-time ASR error estimation and model tuning,” in European
Conference on Speech Communication and Technology, 2003, pp. 457–460.
[63] M.-Y. Tsai and L.-S. Lee, “Pronunciation modeling for spontaneous speech by
maximizing word correct rate in a production-recognition model,” in The ISCA
and IEEE Workshop on Spontaneous Speech Processing and Recognition, 2003.
[64] Y. Deng, M. Mahajan, and A. Acero, “Estimating speech recognition error rate
without acoustic test data,” in European Conference on Speech Communication
and Technology, 2003, pp. 929–932.
[65] F. E. Korkmazskiy and B.-H. Juang, “Discriminative training of the pronunciation
networks,” in IEEE Automatic Speech Recognition and Understanding
Workshop, 1997, pp. 137–144.
[66] F. Korkmazskiy and B.-H. Juang, “Statistical modeling of pronunciation and
production variations for speech recognition,” in International Conference on
Spoken Language Processing, vol. 2, 1998, pp. 149–152.
[67] H. Schramm and P. Beyerlein, “Towards discriminative lexicon optimization,”
in European Conference on Speech Communication and Technology, 2001, pp.
1457–1460.
[68] ——, “Discriminative optimization of the lexical model,” in ISCA Workshop:
Pronunciation Modeling and Lexicon Adaptation for Spoken Language, 2002, pp.
105–110.
[69] W. J. M. Levelt, “Spoken word production: A theory of lexical access,” in Proceedings
of the National Academy of Sciences, vol. 98, 2001, pp. 13464–13471.
[70] M.-Y. Tsai and L.-S. Lee, “Pronunciation variation analysis based on acoustic
and phonemic distance measures with application examples on Mandarin Chinese,”
in IEEE Workshop on Automatic Speech Recognition and Understanding,
2003, pp. 117–122.
[71] D. McAllaster, L. Gillick, F. Scattone, and M. Newman, “Fabricating conversational
speech data with acoustic models: A program to examine model-data
mismatch,” in International Conference on Spoken Language Processing, 1998,
pp. 1847–1850.
[72] J.-X. Yu, “Large vocabulary continuous Mandarin speech recognition using finite-state
machines,” Master’s thesis, National Taiwan University, 2004.