Sampling Heterogeneous Social Network
Date Issued
2012
Date
2012
Author(s)
Yang, Cheng-Lun
Abstract
Social network analysis has been a hot topic in recent years. With the rise of social network sites such as Twitter, Plurk, and Facebook, a large amount of data has become available to researchers. However, not all social network sites make their full customer database available to the general public. Often, engineers must write crawlers just to obtain parts of the social network. Data sampling has been widely used to extract a subset of a social network that represents the larger network. Various network properties have been proposed to measure the similarity between a sampled sub-network and the original network. However, some of these properties only work for homogeneous networks, in which all nodes and edges are treated the same. In this thesis, we propose a novel network property, the Relational Profile, to model the transition probability between node and link types in a heterogeneous social network, that is, a network in which nodes and edges have different types.
We also propose a novel sampling-by-exploration method whose goal is to sample a sub-network whose Relational Profile is as close as possible to that of the original network. Experimental results show that our sampling method produces a more representative sub-network with fewer sampled nodes and edges. We then address a real-world problem, node type prediction, using a machine learning method with the sampled sub-network as training data. Experiments show that using the Relational Profile as features works better than other features, such as in-degree and out-degree, because the Relational Profile is more robust when neighbors of the testing nodes are missing. In addition, with the same number of nodes sampled, sub-networks created by our sampling method predict node types with higher accuracy than those created by baseline methods.
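To make the idea of a type-level transition profile concrete, here is a minimal Python sketch. The abstract does not give the thesis's exact definition of the Relational Profile, so this is only one plausible reading: for each source node type, the empirical probability of following a given link type to a given target node type. The function names `relational_profile` and `profile_distance`, the toy network, and the use of a sum of absolute differences as the closeness measure are all assumptions made for illustration, not the author's method.

```python
from collections import Counter, defaultdict


def relational_profile(edges, node_type):
    """Assumed reading of a Relational Profile (hypothetical helper).

    edges: iterable of (src, link_type, dst) triples.
    node_type: dict mapping each node to its type.
    Returns {src_type: {(link_type, dst_type): probability}}.
    """
    counts = defaultdict(Counter)
    for src, link_type, dst in edges:
        counts[node_type[src]][(link_type, node_type[dst])] += 1

    profile = {}
    for src_type, counter in counts.items():
        total = sum(counter.values())
        profile[src_type] = {key: n / total for key, n in counter.items()}
    return profile


def profile_distance(p, q):
    """Sum of absolute differences between two profiles (assumed measure)."""
    dist = 0.0
    for src_type in set(p) | set(q):
        row_p, row_q = p.get(src_type, {}), q.get(src_type, {})
        for key in set(row_p) | set(row_q):
            dist += abs(row_p.get(key, 0.0) - row_q.get(key, 0.0))
    return dist


# Toy heterogeneous network: two node types, three link types.
node_type = {"alice": "user", "bob": "user", "p1": "post", "p2": "post"}
edges = [
    ("alice", "follows", "bob"),
    ("alice", "writes", "p1"),
    ("bob", "writes", "p2"),
    ("bob", "likes", "p1"),
]

full = relational_profile(edges, node_type)
sample = relational_profile(edges[:2], node_type)  # a two-edge "sample"
gap = profile_distance(full, sample)
```

Under this reading, a sampling procedure would prefer the candidate sub-network with the smallest `gap` to the full network's profile. How the thesis actually explores the network and scores candidates is not stated in the abstract.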
Subjects
Social Network Analysis
Heterogeneous Network
Graph Sampling
Node Type Prediction
Type
thesis
File(s)
Name
ntu-101-R99944042-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):60cfafe7c9ce10c06fb6b244a3812fd6
