Mining browsing behaviors for objectionable content filtering

Lee L.-H.; Juan Y.-C.; Tseng W.-L.; Chen H.-H.; Tseng Y.-H.

doi:10.1002/asi.23217

Journal

Journal of the Association for Information Science and Technology

Journal Volume

66

Journal Issue

5

Pages

930-942

Date Issued

2015

Author(s)

Lee L.-H.

Juan Y.-C.

Tseng W.-L.

Chen H.-H.

Tseng Y.-H.

DOI

10.1002/asi.23217

URI

https://scholars.lib.ntu.edu.tw/handle/123456789/413107

URL

https://www.scopus.com/inward/record.uri?eid=2-s2.0-84944315865&doi=10.1002%2fasi.23217&partnerID=40&md5=375e179119152d0caa6ed5abd8cbc953

Abstract

This article explores users' browsing intents to predict the category of a user's next access during web surfing and applies the results to filter objectionable content, such as pornography, gambling, violence, and drugs. Users' access trails in terms of category sequences in click-through data are employed to mine users' web browsing behaviors. Contextual relationships of URL categories are learned by the hidden Markov model. The top-level domains (TLDs) extracted from URLs themselves and the corresponding categories are caught by the TLD model. Given a URL to be predicted, its TLD and current context are empirically combined in an aggregation model. In addition to the uses of the current context, the predictions of the URL accessed previously in different contexts by various users are also considered by majority rule to improve the aggregation model. Large-scale experiments show that the advanced aggregation approach achieves promising performance while maintaining an acceptably low false positive rate. Different strategies are introduced to integrate the model with the blacklist it generates for filtering objectionable web pages without analyzing their content. In practice, this is complementary to the existing content analysis from users' behavioral perspectives. © 2014 ASIS&T.
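
The pipeline outlined in the abstract (a context model, a TLD model, their aggregation, and a majority rule over earlier predictions) can be illustrated with a rough sketch. This is not the authors' implementation: the record gives no model details, so the sketch substitutes a first-order Markov chain over observed categories for the hidden Markov model, assumes a linear interpolation with a hypothetical `weight` parameter for the "empirical" combination, and uses invented toy data. All function and variable names are hypothetical.

```python
from collections import Counter, defaultdict


def _normalise(counts):
    """Turn nested counts into conditional probability tables."""
    return {
        key: {cat: n / sum(ctr.values()) for cat, n in ctr.items()}
        for key, ctr in counts.items()
    }


def train_transitions(trails):
    """Estimate P(next category | previous category) from category trails.

    Assumption: a plain first-order Markov chain stands in for the HMM.
    """
    counts = defaultdict(Counter)
    for trail in trails:
        for prev, nxt in zip(trail, trail[1:]):
            counts[prev][nxt] += 1
    return _normalise(counts)


def train_tld(pairs):
    """Estimate P(category | TLD) from (tld, category) observations."""
    counts = defaultdict(Counter)
    for tld, cat in pairs:
        counts[tld][cat] += 1
    return _normalise(counts)


def aggregate(context_probs, tld_probs, weight=0.5):
    """Assumed combination rule: linear interpolation of the two models."""
    cats = set(context_probs) | set(tld_probs)
    return {
        c: weight * context_probs.get(c, 0.0)
        + (1 - weight) * tld_probs.get(c, 0.0)
        for c in cats
    }


def predict(prev_category, tld, transitions, tld_model, weight=0.5):
    """Return the highest-scoring category, or None if nothing is known."""
    scores = aggregate(
        transitions.get(prev_category, {}), tld_model.get(tld, {}), weight
    )
    return max(scores, key=scores.get) if scores else None


def majority_vote(predictions):
    """Pick the category predicted most often for the same URL."""
    return Counter(predictions).most_common(1)[0][0] if predictions else None


# Invented toy data, for illustration only.
trails = [
    ["news", "sports", "news"],
    ["adult", "adult", "gambling"],
    ["news", "news", "sports"],
]
tld_pairs = [("com", "news"), ("com", "adult"), ("xxx", "adult"), ("org", "news")]

transitions = train_transitions(trails)
tld_model = train_tld(tld_pairs)
```

Under these assumptions, `predict("adult", "xxx", transitions, tld_model)` scores each candidate category from both the preceding category and the TLD, and `majority_vote` then reconciles the predictions made for one URL across different users' contexts. A URL whose voted category is objectionable could be added to a blacklist without inspecting page content, which is the use the abstract describes.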

Subjects

collaborative filtering

SDGs

[SDGs]SDG16

Other Subjects

Electronic document exchange; Filtration; Hidden Markov models; Markov processes; Websites; Aggregation model; Browsing behavior; Clickthrough data; Content filtering; Contextual relationships; False positive rates; Large scale experiments; Top level domains; Collaborative filtering

Type

journal article
