Advisor: 林守德
National Taiwan University: Graduate Institute of Computer Science and Information Engineering
Author: 梁安群 (Liang, An-Chun)
2014-11-26  2018-07-05  2014-11-26  2018-07-05  2014
http://ntur.lib.ntu.edu.tw//handle/246246/261487

Abstract (translated from the Chinese):
With the rapid growth of online social communities, personalized language models have earned a place in related applications. Neural network based language models have recently been used more widely in research and applications, and they outperform traditional statistical language models. To address the problems that data sparsity causes when learning a language model, we propose a novel domain adaptation method that adds constraints so that the continuous word representations of the target model keep a certain degree of similarity to the continuous word representations in the source models of other related users in the social network. The method does not need the text data of the source models, only their continuous word representation parameters. It takes up little disk space and memory, which makes it convenient for smartphones and other portable devices. Experiments show that our method performs more stably than other methods, and that it can be combined with other methods for further gains.

Abstract:
Personalized language models play an important role in many real-world applications as online social network services blossom. Neural network based language models have become increasingly popular and have recently outperformed traditional n-gram language models. To deal with the data sparseness problem in training personalized language models, we propose a novel domain adaptation method that regularizes the distributed word representations of a neural network based language model toward those of other models in the social network. Our method does not require the text data of the source domain, only the parameters of the source model. It therefore requires less memory and disk space, both of which are limited on smartphone devices. We show that our method is more robust and is able to transfer knowledge from dissimilar domains during cross-individual adaptation. Our method can also be combined with linear interpolation adaptation methods to further improve cross-domain adaptation.

Table of contents:
Acknowledgements
Abstract (Chinese)
Abstract
1 Introduction
2 Related works
  2.1 Cross-domain Adaptation
  2.2 Adaptation on Personalized language model
  2.3 Adaptation on neural network based language model
3 Method
  3.1 Problem Definition
  3.2 Preliminary
    3.2.1 Language model
    3.2.2 Neural network based language model
    3.2.3 Log-bilinear Language Model
    3.2.4 Domain adaptation
  3.3 Domain adaptation by parameter regularization
    3.3.1 Continuous space word representation
    3.3.2 Constraints on the word representations
4 Experiments and Results
  4.1 Evaluation metrics
    4.1.1 Author Attribution
    4.1.2 Perplexity
  4.2 Methods to compare
    4.2.1 Baseline
    4.2.2 FNNLM adaptation with a cascaded network
  4.3 Cross-individual adaptation
    4.3.1 Twitter dataset
    4.3.2 Result
    4.3.3 Multiple sources
  4.4 Cross-domain adaptation
    4.4.1 Amazon reviews
    4.4.2 Result
  4.5 Discussion
    4.5.1 Dissimilarity between domains
5 Conclusion
A Appendix
    A.0.2 Word representation length selection
Bibliography

390365 bytes
application/pdf
Thesis release date: 2015/08/21
Thesis usage permission: paid licensing authorized (royalties returned to the school)
Keywords: domain adaptation; language model; neural network language model
Title: 領域適應於個人化類神經網路語言模型
Domain Adaptation on Personalized Neural Network Based Language Models
thesis
http://ntur.lib.ntu.edu.tw/bitstream/246246/261487/1/ntu-103-R01922038-1.pdf
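
The abstract describes constraining the target model's word representations to stay close to those of source models, using only the source parameters. The record does not give the exact form of the constraint, so the following is only a minimal sketch under stated assumptions: a squared Euclidean penalty weighted by a hypothetical coefficient `lam`, plain gradient descent with a hypothetical learning rate `lr`, and word vectors stored as Python lists keyed by word. The function names are illustrative and are not taken from the thesis.

```python
def similarity_penalty(target, sources, lam=0.1):
    """Assumed penalty: lam * sum over sources and shared words of ||t_w - s_w||^2."""
    total = 0.0
    for src in sources:
        for word, t_vec in target.items():
            if word in src:
                total += sum((t - s) ** 2 for t, s in zip(t_vec, src[word]))
    return lam * total


def regularized_step(target, sources, task_grads, lam=0.1, lr=0.05):
    """One gradient-descent step on (task loss + assumed similarity penalty).

    task_grads maps a word to the gradient of the language-model loss with
    respect to that word's vector; words missing from it get a zero gradient.
    Only source *parameters* are read here, never source text.
    """
    updated = {}
    for word, t_vec in target.items():
        grad = list(task_grads.get(word, [0.0] * len(t_vec)))
        for src in sources:
            if word in src:
                # Gradient of lam * ||t - s||^2 with respect to t is 2 * lam * (t - s).
                for i, (t, s) in enumerate(zip(t_vec, src[word])):
                    grad[i] += 2.0 * lam * (t - s)
        updated[word] = [t - lr * g for t, g in zip(t_vec, grad)]
    return updated


# Toy example: one target user, one related source user, no task gradient.
target_vectors = {"hello": [1.0, 0.0]}
source_vectors = [{"hello": [0.0, 0.0]}]
stepped = regularized_step(target_vectors, source_vectors, task_grads={})
```

With no task gradient, the step simply pulls the target vector toward the source vector, so the penalty shrinks. In a real model the task gradient would come from the language-model loss on the target user's own text.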