https://scholars.lib.ntu.edu.tw/handle/123456789/640115
Title: | Machine Learning of k-Anonymity Data by using Feature Importance and Margin Preservation | Authors: | RAY-I CHANG; Lee, Cheng Yen; Chen, Po Wei; Wang, Chia Hui |
Keywords: | Data Privacy | De-identification | k-Anonymity | Machine Learning | Reliable Artificial Intelligence | Issue Date: | 1-Jan-2023 | Source Publication: | 2023 IEEE Global Conference on Artificial Intelligence and Internet of Things, GCAIoT 2023 | Abstract: | Data quality is key to machine learning. When data are sensitive, de-identification must be applied to guarantee privacy. Supervisory organizations have enacted regulations, such as the EU GDPR, to govern this practice. Because de-identification inevitably causes information loss, the reliability of machine learning becomes a critical challenge. Previous algorithms (such as k-anonymity) focused only on optimizing anonymity metrics. They ignore the requirements of machine learning and may degrade the performance of models trained on the anonymized results. To resolve this drawback, this paper adjusts k-anonymity algorithms using the feature importance (FI) and margin preservation (MP) of data classes. FI was found to be highly correlated with machine learning performance. MP not only anonymizes the data but also tries to preserve the original class margins. The FI and MP schemes are tested on four k-anonymity algorithms with different values of k, and four datasets are used to evaluate the performance of three machine learning models. These comprehensive experiments demonstrate that our schemes can improve machine learning on k-anonymity data. Compared with the original k-anonymity algorithms, the best improvements achieved by the algorithms enhanced with FI and MP exceed 10% and 17% in F1-score, respectively. |
URI: | https://scholars.lib.ntu.edu.tw/handle/123456789/640115 | ISBN: | 9798350382020 | DOI: | 10.1109/GCAIoT61060.2023.10385127 |
Appears in Collections: | Department of Engineering Science and Ocean Engineering |
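The abstract describes adjusting k-anonymity algorithms by feature importance (FI) but gives no algorithmic detail. The sketch below shows one plausible reading, not the authors' method: a greedy full-domain generalization that, whenever the table is not yet k-anonymous, generalizes the quasi-identifier with the lowest importance score, so that the attributes a model relies on most keep their detail longest. The hierarchies (`generalize_age`, `generalize_zip`, `HIERARCHIES`), the function `fi_guided_anonymize`, the toy records, and the importance scores are all hypothetical; the scores are assumed to be precomputed elsewhere (for example, from a tree ensemble's impurity-based importances). Margin preservation (MP) is not modeled here.

```python
from collections import Counter

# Hypothetical generalization hierarchies: each maps (value, level) -> generalized value.
def generalize_age(age: int, level: int) -> str:
    widths = [1, 5, 10, 20]  # levels 0..3 are ranges of these widths; level 4 suppresses
    if level >= len(widths):
        return "*"
    w = widths[level]
    lo = (age // w) * w
    return str(age) if w == 1 else f"{lo}-{lo + w - 1}"

def generalize_zip(zipcode: str, level: int) -> str:
    # Mask `level` trailing digits; masking every digit suppresses the value.
    keep = max(len(zipcode) - level, 0)
    return zipcode[:keep] + "*" * (len(zipcode) - keep)

# Quasi-identifier -> (generalization function, maximum level). Hypothetical.
HIERARCHIES = {
    "age": (generalize_age, 4),
    "zip": (generalize_zip, 5),
}

def apply_levels(rows, levels):
    """Return copies of rows with each quasi-identifier generalized to its level."""
    out = []
    for row in rows:
        new = dict(row)
        for attr, lvl in levels.items():
            fn, _ = HIERARCHIES[attr]
            new[attr] = fn(row[attr], lvl)
        out.append(new)
    return out

def is_k_anonymous(rows, qids, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    counts = Counter(tuple(r[q] for q in qids) for r in rows)
    return all(c >= k for c in counts.values())

def fi_guided_anonymize(rows, importance, k):
    """Greedy sketch: generalize the least important quasi-identifier first."""
    qids = list(HIERARCHIES)
    levels = {q: 0 for q in qids}
    while True:
        anon = apply_levels(rows, levels)
        if is_k_anonymous(anon, qids, k):
            return anon, levels
        candidates = [q for q in qids if levels[q] < HIERARCHIES[q][1]]
        if not candidates:
            raise ValueError("k-anonymity cannot be reached by generalization alone")
        target = min(candidates, key=lambda q: importance[q])
        levels[target] += 1

# Toy records and hypothetical FI scores.
rows = [
    {"age": 23, "zip": "10001", "label": 1},
    {"age": 24, "zip": "10002", "label": 0},
    {"age": 37, "zip": "10011", "label": 1},
    {"age": 38, "zip": "10012", "label": 0},
]
anon, levels = fi_guided_anonymize(rows, {"age": 0.7, "zip": 0.3}, k=2)
print(levels)  # {'age': 1, 'zip': 5}
```

With age ranked more important, the sketch suppresses the ZIP code entirely and only coarsens age into 5-year ranges; swapping the scores to `{"age": 0.2, "zip": 0.8}` instead suppresses age and keeps four ZIP digits (`levels == {"age": 4, "zip": 1}`). A plain anonymity-driven heuristic would pick the attribute by anonymity metrics alone, which is the behavior the abstract argues can hurt downstream models.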
Items in the IR system are protected by copyright, with all rights reserved, unless otherwise indicated.