杨允言IunnUn-gian14.PPT
Slide 1
楊允言 Iunn Un-gian, 2008.7.14
台語文特性分析及其處理技術
Written Taiwanese: Its Characteristic Analysis and Processing Techniques

Slide 2: Vita
- 1984-1988: NTU CSIE, undergraduate
- 1990/8-1994/1: Sinica IIS, assistant
- 1991-1993: NTHU IS, graduate student
- 1994/2-1996/11: NTU CC, programmer
- 1996: moved to Hualien

Slide 3: Vita-2
- 1999: Dahan I.T. CSIE, lecturer
- 2003/8 - : assistant prof.
- 2004 - : NTU CSIE PhD program
- Journal: IJCLCLP 12(4)
- Project: NSC 3, NMTL 1, Academia Historica 1

Slide 4: Outline
- Introduction
- Resources and Survey of Written Taiwanese Processing
- Coding and I/O of POJ
- Tone Sandhi Problem and Algorithm

Slide 5: Outline-2
- Word Segmentation and Tagging Methods
- Corpora Collection and Annotation
- Some Applications of Written Taiwanese Corpora
- Conclusion and Future Work

Slide 6: 1. Introduction
1.1 Background
- Population: 46M (2005)
- Distribution: Taiwan, Singapore, Malaysia, Brunei, China, Thailand, Philippines, Indonesia
- Rank: 21
- Confusing names: Southern-Min? Amoy? Taiwanese?

Slide 7: 1. Introduction-2
1.2 Different Scripts
- Han Characters Script
- Romanization Script (POJ)
- Han-Romanization Mixed Script
- Others: Kana, Phonetic Symbols, Proverb

Slide 8: 1. Introduction-3
1.3 Phonemes of Taiwanese
- Initials (18)
- Vowels (86)
- Tones (7)
- Compared with Mandarin: 2726 vs 1200 legal syllables

Slide 9: 1. Introduction-4
1.4 Some Key Points
- Not yet standardized
- The POJ characters are separated into different zones of the Unicode set
- Need to annotate phonetic markers in corpora
- Interact with the Taiwanese group

Slide 10: 1. Introduction-5
1.5 Motivation
- My mother tongue
1.6 Definition and Glossary
1.7 Goal of This Dissertation
1.8 Organization

Slide 11: 2. Resources and Survey
2.1 Resources
- Input method
- Dictionary
- Corpus
- Word segmentation
- Scripts conversion
- Text-to-speech
2.2 Survey

Slide 12: 3. Coding and I/O of POJ
3.1 POJ Character Code
- Unicode encoding
3.2 Two Kinds of POJ Representation
- POJ and numbered POJ

Slide 13: 3. Coding and I/O of POJ-2
3.3 Retrieval of POJ
- Issue: both case-sensitive and case-insensitive
- 2-stage retrieval: execute the SQL command, then filter
- Fuzzy retrieval: toneless, glottal stop, checked syllable, vowel
- Examples

Slide 14: 3. Coding and I/O of POJ-3
3.4 Display of POJ
- Strategy: Unicode (with specific fonts) or graphics
- POJ to numbered POJ: lâng → la5ng → lang5
- Numbered POJ to POJ: lang5 → la5ng → lâng
- Priority: o a e u i n m (e.g. ou5 → o5u)

Slide 15: 3. Coding and I/O of POJ-4
3.5 Word Processing Utilities for POJ
- Phoneme segmentation: backward direction
- Spelling checker
- Syllable / word / sentence count

Slide 16: 4. Tone Sandhi
4.1 Tone Sandhi Problem
- Types of tone sandhi
  - Normal sandhi
  - Following sandhi
  - Neutral sandhi
  - Double sandhi
  - Pre- sandhi
  - Triplicate sandhi
  - Rising sandhi

Slide 17: 4. Tone Sandhi-2
4.1 Tone Sandhi Problem
- Most complicated among the Sinitic language family
- Need to find the boun
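The conversion between numbered POJ and POJ listed under 3.4 can be sketched roughly as follows. This is a minimal illustration and not the author's implementation: the function names are hypothetical, the tone-to-diacritic table follows the usual POJ convention (tones 1 and 4 unmarked), and the mark is placed on the first letter found in the slide's priority order "o a e u i n m". It assumes ASCII numbered input with the tone digit at the end of each syllable, and it goes directly between the two forms rather than through the intermediate la5ng step.

```python
import re
import unicodedata

# Combining marks conventionally used for POJ tones (tones 1 and 4 carry no mark).
TONE_MARKS = {
    "2": "\u0301",  # acute
    "3": "\u0300",  # grave
    "5": "\u0302",  # circumflex
    "7": "\u0304",  # macron
    "8": "\u030d",  # vertical line above
}
MARK_TO_TONE = {mark: tone for tone, mark in TONE_MARKS.items()}

# Priority order taken from the slide: o a e u i n m.
PRIORITY = "oaeuinm"

NUMBERED_SYLLABLE = re.compile(r"([A-Za-z]+)([1-8])")
MARKED_SYLLABLE = re.compile(r"[A-Za-z\u0300-\u036f]+")


def mark_syllable(letters, tone):
    """Attach the tone mark to the highest-priority letter of one syllable."""
    mark = TONE_MARKS.get(tone)
    if mark is None:
        return letters  # tones 1, 4 (and unused 6) are written without a mark
    lowered = letters.lower()
    for ch in PRIORITY:
        pos = lowered.find(ch)
        if pos != -1:
            marked = letters[:pos + 1] + mark + letters[pos + 1:]
            return unicodedata.normalize("NFC", marked)
    return letters + tone  # no letter can carry the mark; leave input as is


def numbered_to_poj(text):
    """Convert every numbered syllable in text, e.g. lang5, to marked POJ."""
    return NUMBERED_SYLLABLE.sub(
        lambda m: mark_syllable(m.group(1), m.group(2)), text)


def syllable_to_numbered(syllable):
    """Strip the tone mark from one decomposed syllable and append the digit."""
    tone = None
    letters = []
    for ch in syllable:
        if ch in MARK_TO_TONE:
            tone = MARK_TO_TONE[ch]
        else:
            letters.append(ch)
    base = "".join(letters)
    if tone is None:
        # Unmarked syllables: checked finals (p, t, k, h) are tone 4, others tone 1.
        tone = "4" if base[-1:].lower() in ("p", "t", "k", "h") else "1"
    return base + tone


def poj_to_numbered(text):
    """Convert marked POJ text to numbered POJ, syllable by syllable."""
    decomposed = unicodedata.normalize("NFD", text)
    return MARKED_SYLLABLE.sub(
        lambda m: syllable_to_numbered(m.group(0)), decomposed)
```

With these definitions, `numbered_to_poj("lang5")` yields `lâng` and `poj_to_numbered("lâng")` yields `lang5`, matching the slide's example. The sketch does not map the digraph `ou` to the dotted o or handle the nasal marker, and letters such as n and m with a tone mark stay as base letter plus combining mark because Unicode has no precomposed form for them.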