https://scholars.lib.ntu.edu.tw/handle/123456789/638818
Title: Mitigating Impacts of Word Segmentation Errors on Collocation Extraction in Chinese
Authors: Liao, Yongfu; Hsieh, Shu-Kai
Keywords: Chinese Word Segmentation; Collocation Extraction; Word Vector
Issue Date: 1-Jan-2020
Pages: 8-20
Source Publication: ROCLING 2020 - 32nd Conference on Computational Linguistics and Speech Processing
Abstract: The prevalence of the web has led to the construction of many large-scale, automatically segmented and tagged corpora. Automatic processing inevitably introduces errors into such corpora, and these errors are likely to have negative impacts on downstream tasks. Collocation extraction from Chinese corpora is one such task, and it is profoundly influenced by the quality of word segmentation. This paper explores methods to mitigate the negative impacts of word segmentation errors on collocation extraction in Chinese. In particular, we experimented with a simple model that linearly combines several association measures, aiming to avoid retrieving false collocations that result from word segmentation errors. The experimental results show that this simple model could not differentiate between true collocations and false collocations resulting from word segmentation errors. An ad hoc case study incorporating information from FastText word vectors was also conducted. Its results show that collocates resulting from correct and erroneous word segmentation have different profiles in terms of the semantic similarity between the collocates. Incorporating word vector information to differentiate between true and false collocations is suggested for future work.
URI: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85181112172&partnerID=40&md5=7c62835bd19e8a107e994b1fe7616b16
URI: https://scholars.lib.ntu.edu.tw/handle/123456789/638818
ISBN: 9789869576932
Appears in Collections: Graduate Institute of Musicology
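The abstract describes a model that linearly combines several association measures to score candidate collocations, plus a case study using the semantic similarity between collocates' word vectors. This record does not say which measures, weights, or normalization the paper used, so the following is only a minimal sketch under stated assumptions: PMI, t-score, and Dice are assumed as the measures, each is standardized to z-scores across candidates, and the weights are hypothetical hand-picked values. The `cosine` helper stands in for the vector-similarity feature; in practice the vectors might come from a FastText model, but here they are assumed to be plain lists of floats.

```python
import math
from statistics import mean, pstdev


def pmi(f_xy, f_x, f_y, n):
    """Pointwise mutual information: log2(P(x,y) / (P(x) * P(y)))."""
    return math.log2(f_xy * n / (f_x * f_y))


def t_score(f_xy, f_x, f_y, n):
    """t-score: (observed - expected) / sqrt(observed)."""
    expected = f_x * f_y / n
    return (f_xy - expected) / math.sqrt(f_xy)


def dice(f_xy, f_x, f_y, n):
    """Dice coefficient; n is unused but kept for a uniform signature."""
    return 2 * f_xy / (f_x + f_y)


# Assumed measure set; the paper's actual choice is not given in this record.
MEASURES = {"pmi": pmi, "t_score": t_score, "dice": dice}


def zscores(values):
    """Standardize values so measures on different scales can be summed."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def combined_scores(candidates, weights, n):
    """Linearly combine standardized association measures.

    candidates: {(word1, word2): (f_xy, f_x, f_y)} co-occurrence counts
    weights:    {measure_name: weight} (hypothetical values)
    n:          corpus size used to estimate probabilities
    """
    pairs = list(candidates)
    total = [0.0] * len(pairs)
    for name, weight in weights.items():
        raw = [MEASURES[name](*candidates[p], n) for p in pairs]
        for i, z in enumerate(zscores(raw)):
            total[i] += weight * z
    return dict(zip(pairs, total))


def cosine(u, v):
    """Cosine similarity between two word vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Hypothetical counts for two placeholder word pairs in a 1000-token corpus.
    counts = {("a", "b"): (10, 20, 50), ("c", "d"): (5, 500, 500)}
    weights = {"pmi": 0.5, "t_score": 0.3, "dice": 0.2}
    scores = combined_scores(counts, weights, n=1000)
    for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 3))
```

Standardizing before summing keeps one measure with a large numeric range (such as the t-score) from dominating the combination. Per the abstract, however, a linear combination of association measures alone did not separate true collocations from segmentation-induced ones, which is why a vector-similarity feature like `cosine` is the direction suggested for future work.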