Improved Summarization of Chinese Spoken Documents by Probabilistic Latent Semantic Analysis (PLSA) with Further Analysis and Integrated Scoring

Authors: Sheng-yi Kong; Lin-Shan Lee
Year: 2006
Record dates: 2010-11-13; 2018-07-05
Place: Aruba
Type: conference paper
Language: en-US
DOI: 10.1109/SLT.2006.326808
Scopus ID: 2-s2.0-48749112591
Handle: http://ntur.lib.ntu.edu.tw//handle/246246/220185
Full text: http://ntur.lib.ntu.edu.tw/bitstream/246246/220185/-1/25.pdf
Scopus record: https://www.scopus.com/inward/record.uri?eid=2-s2.0-48749112591&doi=10.1109%2fSLT.2006.326808&partnerID=40&md5=5d4890a59d56eaa24fddfbfb310f6643

Abstract: In a previous paper [1], two new scoring measures, Topic Significance (TS) and Topic Entropy (TE), obtained from Probabilistic Latent Semantic Analysis (PLSA), were shown to outperform the very successful baseline Significance Score (SS) in selecting the important sentences for summarization of spoken documents. In this paper, extensive experiments using ROUGE scores with respect to different parameters at different summarization ratios are analyzed in detail. Integrating the two scoring measures was found to offer further improvements, and special consideration of the structure of the Chinese language was also helpful when summarizing Chinese spoken documents. ©2006 IEEE.

Author keywords: Probabilistic latent semantic analysis; Spoken document; Summarization
SDGs: SDG4
Index keywords: Image retrieval; Information theory; Learning systems; Probability; Semantics; Chinese language; Probabilistic latent semantic analysis (PLSA); Scoring measures; Spoken documents; Spoken languages; Linguistics