Machine Learning of k-Anonymity Data by using Feature Importance and Margin Preservation

RAY-I CHANG; Lee, Cheng Yen; Chen, Po Wei; Wang, Chia Hui

doi:10.1109/GCAIoT61060.2023.10385127

Journal

2023 IEEE Global Conference on Artificial Intelligence and Internet of Things, GCAIoT 2023

ISBN

9798350382020

Date Issued

2023-01-01

Author(s)

RAY-I CHANG

Lee, Cheng Yen

Chen, Po Wei

Wang, Chia Hui

DOI

10.1109/GCAIoT61060.2023.10385127

URI

https://scholars.lib.ntu.edu.tw/handle/123456789/640115

URL

https://api.elsevier.com/content/abstract/scopus_id/85184664087

Abstract

Data quality is key to machine learning. For sensitive data, de-identification must be adopted to guarantee data privacy, and supervisory organizations have enacted regulations such as the EU GDPR to govern this practice. Because de-identification inevitably causes information loss, the reliability of machine learning on de-identified data becomes a critical challenge. Previous algorithms (such as k-anonymity) focus only on optimizing anonymity metrics; they do not consider the requirements of machine learning and may degrade the performance of models trained on the resulting data. This paper adjusts k-anonymity algorithms using feature importance (FI) and margin preservation (MP) of data classes to resolve this drawback. FI is found to correlate highly with machine learning performance. MP not only anonymizes the data but also tries to preserve the original class margins. The FI and MP schemes are tested on four different k-anonymity algorithms with different values of k, and four datasets are used to evaluate performance across three machine learning models. These comprehensive experiments demonstrate that our schemes can improve machine learning on k-anonymized data. Compared with the original k-anonymity algorithms, the best improvements achieved by our FI and MP variants exceed 10% and 17% in F1 score, respectively.
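
The abstract does not spell out how FI enters the anonymization step, so the following is only a minimal illustrative sketch under one assumed reading: quasi-identifiers are generalized one at a time, least important first, until the table is k-anonymous. The helper names (`is_k_anonymous`, `generalize_age`, `anonymize_by_importance`), the toy records, the generalization rules, and the importance scores are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(c >= k for c in counts.values())


def generalize_age(age, width):
    """Coarsen an integer age into a bucket label such as '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymize_by_importance(rows, generalizers, importance, k):
    """Generalize quasi-identifiers, least important first, until k-anonymous.

    Assumption: this ordering is one plausible use of feature importance,
    not the scheme described in the paper.
    """
    rows = [dict(r) for r in rows]  # work on copies, leave the input intact
    qis = list(generalizers)
    for q in sorted(qis, key=lambda name: importance[name]):
        if is_k_anonymous(rows, qis, k):
            break
        for r in rows:
            r[q] = generalizers[q](r[q])
    return rows, is_k_anonymous(rows, qis, k)


# Hypothetical toy records; 'label' is the class attribute and is not generalized.
rows = [
    {"age": 23, "zip": "10617", "label": 1},
    {"age": 27, "zip": "10617", "label": 0},
    {"age": 31, "zip": "10699", "label": 1},
    {"age": 38, "zip": "10699", "label": 0},
]
generalizers = {
    "age": lambda a: generalize_age(a, 10),
    "zip": lambda z: z[:3] + "**",
}
# Made-up importance scores; in practice they might come from a trained model.
importance = {"age": 0.2, "zip": 0.7}

anonymized, ok = anonymize_by_importance(rows, generalizers, importance, k=2)
```

In this toy case, generalizing only the less important `age` column is enough to reach 2-anonymity, so the more important `zip` column keeps its original values. Margin preservation is not sketched here because the abstract gives no detail on how it is computed.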

Subjects

Data Privacy | De-identification | k-Anonymity | Machine Learning | Reliable Artificial Intelligence

SDGs

SDG8

Type

conference paper
