The Impact of Feature Normalization on Different Feature Types of Medical Datasets

Authors: Hu, Ya Han; KANG ERNEST LIU; Tsai, Chih Fong
Dates: 2023-12-25; 2023-12-25; 2023-05-12
ISBN: 9798400700712
Type: conference paper
DOI: 10.1145/3608298.3608304
Scopus ID: 2-s2.0-85178030324
Scopus record: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85178030324&doi=10.1145%2f3608298.3608304&partnerID=40&md5=4e32bcb1e94e7163cf586d999e8d3e7b
Scopus API: https://api.elsevier.com/content/abstract/scopus_id/85178030324
Repository handle: https://scholars.lib.ntu.edu.tw/handle/123456789/638106
Keywords: data preprocessing | feature normalization | medical datasets | pattern classification

Abstract: To obtain quality data mining results, data pre-processing is usually performed in the knowledge discovery in databases (KDD) process. In particular, feature normalization or scaling is an important step in data pre-processing. This is because many datasets contain features that have broad ranges of values, and feature normalization is applied to normalize or rescale each feature value to a fixed range, usually between 0 and 1. Medical domain datasets usually contain three different kinds of data: categorical, numerical, and mixed. This paper examines the effect of performing feature normalization on these three types of medical datasets. Our experimental results indicate that for the categorical and some mixed types of datasets, performing feature normalization does not necessarily make the k-NN and SVM classifiers perform better than the ones without feature normalization. On the other hand, for the numerical type of datasets, k-NN and SVM with feature normalization perform better than the baseline classifiers.
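
The abstract describes rescaling each feature value to a fixed range, usually between 0 and 1. One common way to do this is min-max scaling. The record does not say which normalization method the paper used, so the sketch below is only an illustration under that assumption. The function name `min_max_scale` and the sample patient values are hypothetical and do not come from the paper.

```python
from typing import List


def min_max_scale(rows: List[List[float]]) -> List[List[float]]:
    """Rescale every column of `rows` to the range [0, 1].

    Assumption: min-max scaling, x' = (x - min) / (max - min), applied
    per feature (column). A constant column is mapped to 0.0 to avoid
    dividing by zero.
    """
    if not rows:
        return []
    columns = list(zip(*rows))          # transpose: one tuple per feature
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]

    scaled = []
    for row in rows:
        scaled_row = []
        for value, low, high in zip(row, lows, highs):
            span = high - low
            scaled_row.append(0.0 if span == 0 else (value - low) / span)
        scaled.append(scaled_row)
    return scaled


# Hypothetical numerical records: [age in years, cholesterol in mg/dL].
patients = [[30.0, 180.0], [50.0, 240.0], [70.0, 300.0]]
scaled = min_max_scale(patients)
print(scaled)
```

After scaling, both features span the same range, so a distance-based classifier such as k-NN no longer weights the feature with the larger raw values more heavily. This is one plausible reason normalization matters most for numerical features.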