Machine Learning of k-Anonymity Data by using Feature Importance and Margin Preservation
Journal
2023 IEEE Global Conference on Artificial Intelligence and Internet of Things, GCAIoT 2023
ISBN
9798350382020
Date Issued
2023-01-01
Author(s)
Abstract
Data quality is key to machine learning. For sensitive data, de-identification must be applied to guarantee privacy, and supervisory organizations have enacted regulations, e.g., the EU GDPR, to govern this practice. Because de-identification inevitably causes information loss, keeping machine learning reliable becomes a critical challenge. Previous algorithms (such as k-anonymity) focus only on optimizing anonymity metrics. They do not consider the requirements of machine learning and may degrade the performance obtained on the resulting data. This paper adjusts k-anonymity algorithms using feature importance (FI) and margin preservation (MP) of data classes to address this drawback. FI was found to be highly correlated with machine learning performance. MP anonymizes data while also trying to preserve the original class margins. The FI and MP schemes are tested on four different k-anonymity algorithms with different values of k, and four datasets are used to evaluate performance across three machine learning models. These comprehensive experiments demonstrate that our schemes can improve machine learning on k-anonymized data. Compared with the original k-anonymity algorithms, the best improvements achieved by our k-anonymity algorithms with FI and MP exceed 10% and 17% in F1 score, respectively.
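As a rough illustration of the idea behind the FI scheme, the following minimal Python sketch checks k-anonymity over a set of quasi-identifiers and, when the check fails, suppresses the least important quasi-identifier first. It is not the paper's algorithm: the function names, the use of full suppression ("*") in place of generalization, the toy records, and the importance scores are all assumptions made for illustration.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs in at least k records."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return all(count >= k for count in sizes.values())


def suppress_by_importance(records, quasi_identifiers, importance, k):
    """Suppress quasi-identifiers, least important first, until k-anonymity holds.

    `importance` maps each quasi-identifier to a score (hypothetical values here;
    in practice it could come from a trained model). Suppression replaces the
    whole column with "*", a crude stand-in for generalization.
    The result may still fail k-anonymity if there are fewer than k records.
    """
    result = [dict(r) for r in records]  # work on copies, leave the input intact
    suppressed = []
    for q in sorted(quasi_identifiers, key=lambda name: importance[name]):
        if is_k_anonymous(result, quasi_identifiers, k):
            break
        for r in result:
            r[q] = "*"
        suppressed.append(q)
    return result, suppressed


# Toy data (assumed for illustration): "age" is taken to matter more than "zip".
records = [
    {"age": "30-39", "zip": "10001", "label": 1},
    {"age": "30-39", "zip": "10002", "label": 0},
    {"age": "40-49", "zip": "10001", "label": 1},
    {"age": "40-49", "zip": "10003", "label": 0},
]
importance = {"age": 0.7, "zip": 0.3}

anonymized, suppressed = suppress_by_importance(records, ["age", "zip"], importance, k=2)
print(suppressed)
print(is_k_anonymous(anonymized, ["age", "zip"], 2))
```

In this toy case only the lower-scored attribute is sacrificed, so the attribute assumed to carry more predictive signal survives anonymization intact. Margin preservation is not modeled here.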
Subjects
Data Privacy | De-identification | k-Anonymity | Machine Learning | Reliable Artificial Intelligence
Type
conference paper
