Use Context Information to Improve the Performance of Latent Dirichlet Allocation
Date Issued
2014
Author(s)
Lin, Che-Yi
Abstract
Latent Dirichlet Allocation (LDA) is a widely used topic model for discovering the topics in documents; however, it suffers from problems such as the lack of dependency between words and data sparsity. A main cause of these problems is word-sense ambiguity in natural language. Previous works drop the "bag of words" assumption and add dependencies between words; we take a different approach. To solve these problems, we propose a topic model called the context LDA (CLDA) model.
The CLDA model first builds concept vectors from the context information at each position and uses these vectors to identify equivalence relationships between words; it then models the words into latent topics with a topic model that takes these relationships as input. The CLDA model not only overcomes the word-sense ambiguity problem but is also easily parallelized and extended. With some extra knowledge and a slight modification, we show that our model can also ease the sparse-data problem. We conduct several experiments on the 20 Newsgroups dataset; the results show that our model improves the performance of the original LDA and fixes the imbalanced-topic problem by using the vectors and equivalence relationships. Finally, we show examples of the latent topics produced by the LDA model and by our model.
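The abstract's first step (building context vectors per word and grouping words whose vectors are nearly identical) can be sketched as follows. This is a minimal illustration, not the thesis's actual algorithm: the window size, the cosine-similarity threshold, and the pairwise merge rule are all assumptions for demonstration purposes.

```python
import numpy as np
from itertools import combinations

def context_vectors(docs, vocab, window=2):
    """Build a co-occurrence (context) vector for each vocabulary word,
    counting neighbors within `window` positions. Window size is an
    illustrative assumption."""
    idx = {w: i for i, w in enumerate(vocab)}
    vecs = np.zeros((len(vocab), len(vocab)))
    for doc in docs:
        for i, w in enumerate(doc):
            for j in range(max(0, i - window), min(len(doc), i + window + 1)):
                if j != i:
                    vecs[idx[w], idx[doc[j]]] += 1
    return vecs

def equivalent_pairs(vecs, vocab, threshold=0.9):
    """Treat two words as equivalent when their context vectors have
    cosine similarity above `threshold` (a hypothetical criterion)."""
    norms = np.linalg.norm(vecs, axis=1)
    pairs = []
    for a, b in combinations(range(len(vocab)), 2):
        if norms[a] and norms[b]:
            cos = vecs[a] @ vecs[b] / (norms[a] * norms[b])
            if cos >= threshold:
                pairs.append((vocab[a], vocab[b]))
    return pairs

docs = [["bank", "river", "water"], ["shore", "river", "water"]]
vocab = ["bank", "river", "water", "shore"]
pairs = equivalent_pairs(context_vectors(docs, vocab), vocab)
print(pairs)  # "bank" and "shore" share identical contexts here
```

Equivalence classes found this way could then replace individual word tokens before topic inference, which is the general idea the abstract describes.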
Subjects
Topic model
Latent Dirichlet Allocation
Context
Sense vector
Machine learning
Latent topic
Type
thesis
File(s)
Name
ntu-103-R01922027-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum (MD5)
cf10dd75f59504cbea235660c5da8a8c
