Authors: Liao, Yongfu; SHU-KAI HSIEH
Dates: 2024-01-23; 2020-01-01
ISBN: 9789869576932
Scopus record: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85181112172&partnerID=40&md5=7c62835bd19e8a107e994b1fe7616b16
Repository handle: https://scholars.lib.ntu.edu.tw/handle/123456789/638818
Abstract: The prevalence of the web has brought about the construction of many large-scale, automatically segmented and tagged corpora. Automatic processing inevitably introduces errors, which are likely to have negative impacts on downstream tasks. Collocation extraction from Chinese corpora is one such task, and it is profoundly influenced by the quality of word segmentation. This paper explores methods to mitigate the negative impacts of word segmentation errors on collocation extraction in Chinese. In particular, we experimented with a simple model that combines several association measures linearly, with the aim of avoiding the retrieval of false collocations that result from word segmentation errors. The results of the experiment show that this simple model could not differentiate between true collocations and false collocations resulting from word segmentation errors. An ad hoc case study incorporating information from FastText word vectors was also conducted. The results show that collocates resulting from correct and erroneous word segmentation have different profiles in terms of the semantic similarities between the collocates. The incorporation of word vector information to differentiate between true and false collocations is suggested for future work.
Keywords: Chinese Word Segmentation | Collocation Extraction | Word Vector
Title: Mitigating Impacts of Word Segmentation Errors on Collocation Extraction in Chinese
Type: conference paper
Scopus EID: 2-s2.0-85181112172