Discovering and explaining abnormal nodes in semantic graphs
Journal
IEEE Transactions on Knowledge and Data Engineering
Journal Volume
20
Journal Issue
8
Pages
1039-1052
Date Issued
2008
Author(s)
Abstract
An important problem in the area of homeland security is to identify abnormal or suspicious entities in large data sets. Although there are methods from data mining and social network analysis focusing on finding patterns or central nodes from networks or numerical data sets, there has been little work aimed at discovering abnormal instances in large complex semantic graphs, whose nodes are richly connected with many different types of links. In this paper, we describe a novel unsupervised framework to identify such instances. Besides discovering abnormal instances, we believe that to complete the process, a system has to also provide users with understandable explanations for its findings. Therefore, in the second part of the paper, we describe an explanation mechanism to automatically generate human-understandable explanations for the discovered results. To evaluate our discovery and explanation systems, we perform experiments on several different semantic graphs. The results show that our discovery system outperforms state-of-the-art unsupervised network algorithms used to analyze the 9/11 terrorist network and other graph-based outlier detection algorithms by a significant margin. Additionally, the human study we conducted demonstrates that our explanation system, which provides natural language explanations for the system's findings, allowed human subjects to perform complex data analysis in a much more efficient and accurate manner. © 2008 IEEE.
Other Subjects
(e,3e) process; Central nodes; Complex data; Different types; Explanation systems; Graph-based; Homeland security (HLS); Human subjects; Large data sets; Natural language explanations; Numerical data; Outlier Detection; Semantic graphs; Social network analysis (SNA); Unsupervised network; Administrative data processing; Arts computing; Decision support systems; Electric network analysis; Graph theory; Information management; Information theory; Knowledge management; Online searching; Search engines; Semantics; Set theory; Statistical methods; Security of data
Type
journal article
