SenseInput: An Image-Based Sensitive Input Detection Scheme for Phishing Website Detection
Journal
IEEE International Conference on Communications
Journal Volume
2022-May
ISBN
9781538683477
Date Issued
2022-01-01
Author(s)
Abstract
Phishing has persistently threatened the World Wide Web as phishing websites have evolved over the years. Many previous works were devoted to extracting useful features and focused on the essential components of phishing websites. One such component is the sensitive input, a field that requests sensitive information. Yet, due to the large variety of web designs, detecting the presence of sensitive inputs is not trivial. Some previous works have provided rule-based approaches that use HTML code to detect login forms, which contain sensitive inputs. However, newer phishing websites modify their HTML code to evade these detection rules, which reduces detection accuracy. To overcome this limitation, we propose SenseInput, which uses hybrid deep learning models to detect the presence of sensitive inputs and sensitive information, because phishing websites ultimately present sensitive inputs in their visual content. SenseInput achieves a 96.94% F1-score for sensitive input detection on our dataset and a 96.73% F1-score on a public dataset, Phishpedia Phish30K. Next, we use 22 features, including seven proposed statistical features and two sensitive input features, for phishing detection. The experiments show that our approach achieves F1-scores of 98.48% and 95.87% on our validation and Phishpedia datasets, respectively, outperforming previous approaches. Finally, we investigate the influence of the sensitive input features. The results show that our sensitive input features are more effective than rule-based login form detection. Moreover, the experiments indicate that the proposed sensitive input features can reduce the impact of bias between different datasets.
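All results above are reported as F1-scores. For reference, the sketch below shows the standard F1 definition (the harmonic mean of precision and recall) computed from true-positive, false-positive, and false-negative counts. It is an illustration of the metric only, not code from the paper, and the counts in the usage line are hypothetical.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1-score: harmonic mean of precision and recall.

    tp: true positives, fp: false positives, fn: false negatives.
    Returns 0.0 when there are no true positives (precision and recall are 0).
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not taken from the paper).
print(round(f1_score(tp=90, fp=5, fn=10), 4))  # prints 0.9231
```

Algebraically this equals 2·TP / (2·TP + FP + FN), so a detector scores well only when it both finds most sensitive inputs (high recall) and rarely flags benign fields (high precision).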
Subjects
computer vision | machine learning | object detection | phishing detection
Type
conference paper
