A Framework for Privacy-preserving Record Linkage Using Bloom Filter
Date Issued
2015
Date
2015
Author(s)
Chang, Li-Chung
Abstract
Record linkage is the task of identifying records from multiple datasets that refer to the same individual. It is difficult in practice because unique identifiers are usually unavailable, so a set of attributes such as Forename, Gender, and Street must serve as quasi-identifiers instead. Moreover, to protect privacy, various regulations and policies prohibit the disclosure of personal identities, especially in the medical domain. Many privacy-preserving record linkage (PPRL) methods have therefore been developed to integrate datasets without revealing the identities associated with the records. A recent evaluation has shown that encodings based on Bloom filters outperform other approaches, but such encodings may be compromised by frequency-based cryptanalysis. Two methods, record-level Bloom filters (RBF) and cryptographic long-term keys (CLK), have been proposed to address this problem, but each has its own strengths and weaknesses. In this dissertation, we combine the two methods into an improved one, which we call WCLK. In addition, we use entropy to determine field weights; assigning a different weight to each field improves the accuracy of the linkage results. Finally, because linkage quality and completeness cannot be assessed in practice, choosing the linkage threshold is a major challenge. We therefore propose a clustering-based method that finds a suitable threshold and also leads to accurate record linkage. Experiments on datasets generated with Febrl show that our approach performs better than previous methods.
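The abstract does not spell out how WCLK or the threshold method work. As a rough illustration only, the sketch below assumes: bigram tokenization, HMAC-based double hashing of every field into one shared Bloom filter (the CLK idea), a per-field number of hash functions proportional to a Shannon-entropy weight, the Dice coefficient as the similarity measure, and 1-D 2-means clustering of similarity scores to pick a threshold. All names (`encode`, `entropy_weights`, `two_means_threshold`, `BF_LEN`, `TOTAL_HASHES`, the keys) and parameter values are hypothetical; this is not the thesis's WCLK implementation, and its clustering method may differ.

```python
import hashlib
import hmac
import math
from collections import Counter

BF_LEN = 1000      # Bloom filter length in bits (assumed value)
TOTAL_HASHES = 30  # hash-function budget shared across fields (assumed value)


def bigrams(value: str) -> set[str]:
    """Lower-case and pad a value, then split it into character bigrams."""
    s = f"_{value.strip().lower()}_"
    return {s[i:i + 2] for i in range(len(s) - 1)}


def field_entropy(values: list[str]) -> float:
    """Shannon entropy (bits) of the value distribution of one field."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def entropy_weights(records: list[dict[str, str]], fields: list[str]) -> dict[str, float]:
    """Normalise per-field entropies to weights summing to 1; more diverse
    (more identifying) fields receive larger weights."""
    ent = {f: field_entropy([r[f] for r in records]) for f in fields}
    total = sum(ent.values())
    if total == 0:  # every field is constant: fall back to equal weights
        return {f: 1 / len(fields) for f in fields}
    return {f: e / total for f, e in ent.items()}


def encode(record: dict[str, str], weights: dict[str, float],
           key1: bytes, key2: bytes) -> set[int]:
    """CLK-style encoding: all fields share ONE Bloom filter, stored here as the
    set of positions whose bit is 1. Each field gets a number of hash functions
    proportional to its weight (hypothetical weighting, not necessarily WCLK)."""
    bits: set[int] = set()
    for field, w in weights.items():
        k = max(1, round(w * TOTAL_HASHES))
        for gram in bigrams(record[field]):
            # Mix in the field name so equal bigrams from different fields differ.
            msg = f"{field}:{gram}".encode()
            h1 = int(hmac.new(key1, msg, hashlib.sha256).hexdigest(), 16)
            h2 = int(hmac.new(key2, msg, hashlib.sha256).hexdigest(), 16)
            for i in range(k):  # double hashing: g_i = (h1 + i * h2) mod m
                bits.add((h1 + i * h2) % BF_LEN)
    return bits


def dice(a: set[int], b: set[int]) -> float:
    """Dice coefficient of two Bloom filters: 2|A & B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def two_means_threshold(scores: list[float], iters: int = 100) -> float:
    """Cluster similarity scores into a low (non-match) and a high (match) group
    with 1-D k-means, k = 2, and return the midpoint of the two centroids."""
    lo, hi = min(scores), max(scores)
    for _ in range(iters):
        mid = (lo + hi) / 2
        low = [s for s in scores if s < mid]
        high = [s for s in scores if s >= mid]
        if not low or not high:
            break
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2
```

In use, each data custodian would compute the same weights and encode its records with shared secret keys; a linkage unit would then compute `dice` for candidate pairs and classify pairs scoring at or above `two_means_threshold(scores)` as matches. Representing each filter as a set of 1-bit positions keeps the Dice computation to plain set operations.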
Subjects
record linkage
privacy-preserving
bloom filter
field weighting
Type
thesis
File(s)
Name
ntu-104-R02921035-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):551f4ce851bf94c57283a965f8185671
