Title: Objectionable content filtering by click-through data
Authors: Lee L.-H.; Juan Y.-C.; Chen H.-H.; HSIN-HSI CHEN
Type: conference paper
Year: 2013
Record dates: 2019-07-10; 2019-07-10
ISBN: 9781450322638
Handle: https://scholars.lib.ntu.edu.tw/handle/123456789/413130
DOI: 10.1145/2505515.2507849
Scopus ID: 2-s2.0-84889607153
Scopus URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84889607153&doi=10.1145%2f2505515.2507849&partnerID=40&md5=211ce9d9fa54c7b3d0729cbdeabfb30b
Keywords: Click-through mining; Collaborative filtering; Internet censorship; [SDGs]SDG12

Abstract: This paper explores users' browsing intents to predict the category of a user's next access during web surfing, and applies the results to objectionable content filtering. A user's access trail, represented as a sequence of URLs, reveals the contextual information of web browsing behavior. We extract behavioral features of each clicked URL, i.e., hostname, bag-of-words, gTLD, IP, and port, to develop a linear-chain CRF model for context-aware category prediction. Large-scale experiments show that our method achieves a promising accuracy of 0.9396 for objectionable access identification without requesting the corresponding page content. Error analysis indicates that the proposed model yields a low false positive rate of 0.0571. In real-life filtering simulations, the proposed model achieves a macro-averaged blocking rate of 0.9271 while maintaining a favorably low macro-averaged over-blocking rate of 0.0575 when collaboratively filtering objectionable content as the dynamic web changes over time. Copyright is held by the owner/author(s).
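
The abstract lists five per-URL features (hostname, bag-of-words, gTLD, IP, and port) but the record does not define them precisely. The following is a minimal sketch of how such features might be derived from a URL string alone, without fetching the page. The function names `url_features` and `trail_features`, the tokenization rule, the default-port table, and the treatment of "gTLD" as the last hostname label are all assumptions made for illustration, not the authors' definitions.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Assumption: fall back to the scheme's conventional port when none is given.
DEFAULT_PORTS = {"http": 80, "https": 443}


def is_ip(host):
    """Return True if the host is a literal IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url):
    """Derive illustrative per-URL features from the URL string only."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    host_is_ip = is_ip(host)
    # Assumption: approximate the gTLD by the last hostname label.
    tld = "" if host_is_ip or "." not in host else host.rsplit(".", 1)[1]
    # Assumption: bag-of-words = alphanumeric tokens of host, path, and query.
    text = (host + parts.path + "?" + parts.query).lower()
    words = [w for w in re.split(r"[^a-z0-9]+", text) if w]
    port = parts.port or DEFAULT_PORTS.get(parts.scheme, 0)
    return {
        "hostname": host,
        "words": words,
        "gtld": tld,
        "is_ip": host_is_ip,
        "port": port,
    }


def trail_features(urls):
    """Map an access trail (sequence of URLs) to a sequence of feature dicts."""
    return [url_features(u) for u in urls]
```

A sequence of feature dictionaries like the one returned by `trail_features` is the usual input shape for linear-chain CRF toolkits, with one category label per clicked URL as the output sequence; which toolkit and label set the paper used is not stated in this record.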