2010-10-22
2024-05-17
https://scholars.lib.ntu.edu.tw/handle/123456789/679605

Summary: In recent years, corpus linguistics has moved from traditional hand-built corpora to semi-automatically constructed large-scale corpora, and on to the use of the Web itself as a corpus. The availability of large-scale web data gives linguistic research a firmer empirical foundation, and the rich, heterogeneous information in this data has had a major impact on the research methods of linguistic theory. This project continues the previous year's work on web corpus construction. It collects data specifically from social media (social networking sites and microblogs), proposes a dedicated annotation scheme for that data, and combines social networks with lexical networks to detect the distribution of neologisms and study them quantitatively.

Abstract: In recent years, research on Web as Corpus (WaC) has emerged rapidly. Corpus linguistics has progressed from traditional manually built corpora to (semi-)automatically constructed corpora that extract web data for various linguistic and NLP purposes. This project extends our previous work and aims to fill the research gap in Chinese WaC. In particular, it focuses on the social domain (social networks, microblogs, etc.). In addition to corpus construction, the project aims to apply techniques from social network mining and mash-up programming to linguistic studies. Finally, an innovative segmentation and tagging scheme for social data, tailored to WaC, will be explored and implemented.

Chinese Balanced Web Corpus (II): Social Web Corpus (中文平衡網路語料庫 (II): 網路社群語料庫)