A Study on Automatic Headline Generation for News Documents (新聞文件標題自動生成之研究)

2003-08-01 / 2024-05-17
https://scholars.lib.ntu.edu.tw/handle/123456789/672035

Abstract: This project investigates four knowledge sources for automatic headline generation: 1) knowledge of unseen text (KUT), drawn from the text of unseen documents; 2) the language model for text (LMT), built from the text of training documents; 3) the language model for headline (LMH), built from the headlines of training documents; and 4) the joint language model for document (JLM), built jointly from the headlines and text of training documents. These four sources are then applied systematically to generate headlines for unseen (test) documents. Beyond producing "very short summaries", an automatic headline generation system for Chinese documents can also create headlines for untitled web pages and assign a single unified headline to a group of similar documents. The work items of this project are:

1. Collect news articles
2. Annotate training documents
3. Build the training model
4. Build the headline generation model
5. Test the headline generation model

Keywords: Automatic news headline generation; Headline; Automatic Generation