https://scholars.lib.ntu.edu.tw/handle/123456789/364753
Title: | Cosdes: A collaborative spam detection system with a novel e-mail abstraction scheme | Authors: | Tseng, C.-Y.; Sung, P.-C.; MING-SYAN CHEN |
Keywords: | e-mail abstraction; near-duplicate matching; Spam detection | Publication date: | 2011 | Volume: | 23 | Issue: | 5 | Pages: | 669-682 | Source publication: | IEEE Transactions on Knowledge and Data Engineering | Abstract: | E-mail communication is indispensable nowadays, but the e-mail spam problem continues to grow drastically. In recent years, collaborative spam filtering with a near-duplicate similarity matching scheme has been widely discussed. The primary idea of similarity matching for spam detection is to maintain a database of known spam, built from user feedback, and use it to block subsequent near-duplicate spams. To achieve efficient similarity matching and reduce storage utilization, prior works mainly represent each e-mail by a succinct abstraction derived from the e-mail's content text. However, these abstractions cannot fully capture the evolving nature of spam and are thus not effective enough for near-duplicate detection. In this paper, we propose a novel e-mail abstraction scheme that represents e-mails by their layout structure. We present a procedure to generate the e-mail abstraction from the HTML content of an e-mail, and this newly devised abstraction captures the near-duplicate phenomenon of spams more effectively. Moreover, we design a complete spam detection system, Cosdes (standing for COllaborative Spam DEtection System), which includes an efficient near-duplicate matching scheme and a progressive update scheme. The progressive update scheme enables Cosdes to keep the most up-to-date information for near-duplicate detection. We evaluate Cosdes on a live data set collected from a real e-mail server and show that our system outperforms prior approaches in detection results and is applicable to the real world. © 2006 IEEE. |
URI: | http://www.scopus.com/inward/record.url?eid=2-s2.0-79953225105&partnerID=MN8TOARS http://scholars.lib.ntu.edu.tw/handle/123456789/364753 |
ISSN: | 1041-4347 | DOI: | 10.1109/TKDE.2010.147 | SDG/Keywords: | Collaborative spam detection; Data sets; e-mail abstraction; E-mail servers; E-mail spam; Email communication; Layout structure; Matching scheme; Near-duplicate detection; near-duplicate matching; Similarity-matching; Spam database; Spam detection; Spam filtering; Storage utilization; User feedback; Abstracting; Internet; Electronic mail |
Appears in collections: | Department of Electrical Engineering |
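The abstract describes collaborative spam detection by similarity matching. Incoming mail is compared against a database of user-reported spam, with each message represented by an abstraction of its HTML layout. The following Python sketch is hypothetical and is not the Cosdes procedure from the paper. It makes these assumptions:

- The layout abstraction is the set of 3-tag shingles of the HTML start-tag sequence.
- Near-duplicate matching is Jaccard similarity against an assumed threshold of 0.8.
- A bounded deque that evicts the oldest entries stands in for the progressive update scheme.

All names (`layout_abstraction`, `jaccard`, `SpamDatabase`) are illustrative.

```python
from __future__ import annotations

from collections import deque
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Records start-tag names in document order (assumed layout signal)."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Text and attribute values are ignored: only the tag structure is kept.
        self.tags.append(tag)


def layout_abstraction(html: str, k: int = 3) -> frozenset:
    """Hypothetical abstraction: the set of k-length shingles of the tag sequence."""
    parser = _TagCollector()
    parser.feed(html)
    parser.close()
    tags = parser.tags
    if len(tags) < k:
        # Too few tags to shingle: fall back to the whole (short) sequence.
        return frozenset([tuple(tags)]) if tags else frozenset()
    return frozenset(tuple(tags[i:i + k]) for i in range(len(tags) - k + 1))


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity; two empty abstractions are treated as no evidence."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class SpamDatabase:
    """Known-spam store fed by user reports (illustrative, not the paper's design)."""

    def __init__(self, threshold: float = 0.8, capacity: int = 10_000) -> None:
        self.threshold = threshold
        # Bounded store: the oldest reported abstractions are evicted first.
        self.entries: deque = deque(maxlen=capacity)

    def report_spam(self, html: str) -> None:
        self.entries.append(layout_abstraction(html))

    def is_near_duplicate(self, html: str) -> bool:
        sig = layout_abstraction(html)
        return any(jaccard(sig, known) >= self.threshold for known in self.entries)


if __name__ == "__main__":
    db = SpamDatabase(threshold=0.8)
    db.report_spam("<html><body><table><tr><td><a href='x'>Buy now</a></td></tr></table></body></html>")
    # Same layout with different text and links: matched as a near-duplicate.
    print(db.is_near_duplicate("<html><body><table><tr><td><a href='y'>Cheap pills</a></td></tr></table></body></html>"))
    # Different layout: not matched.
    print(db.is_near_duplicate("<html><body><p>Hi team, meeting at 3.</p></body></html>"))
```

Because the abstraction ignores text and link targets, the same template with reworded content still matches. This mirrors the abstract's motivation for using layout structure rather than content text.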
Items in the IR system are protected by copyright, with all rights reserved, unless otherwise indicated.