https://scholars.lib.ntu.edu.tw/handle/123456789/638060
Title: | On the Thresholding Strategy for Infrequent Labels in Multi-label Classification | Authors: | YU-JEN LIN, CHIH-JEN LIN |
Keywords: | F-measure | infrequent labels | multi-label classification | threshold adjustment |
Publication date: | 21-Oct-2023 |
Source publication: | International Conference on Information and Knowledge Management, Proceedings |
Abstract: | In multi-label classification, imbalance between labels is often a concern. For a label that seldom occurs, the default threshold used to binarize that label's predictions is usually sub-optimal. However, directly tuning the threshold to optimize the F-measure has been observed to overfit easily. In this work, we explain why this overfitting occurs. We then analyze the FBR heuristic, a previously proposed technique for addressing the overfitting issue; we explain its success but also point out some previously unobserved problems. We then make two proposals. First, we propose a variant of the FBR heuristic that not only fixes these problems but is also more justifiable. Second, we propose a new technique based on smoothing the F-measure when tuning the threshold, and we theoretically prove that, with proper parameters, smoothing gives the tuned threshold desirable properties. Building on the idea of smoothing, we further propose jointly optimizing micro-F and macro-F as a lightweight alternative that needs no extra hyperparameters. Our methods are empirically evaluated on text and node classification datasets, and the results show that they consistently outperform the FBR heuristic. |
URI: | https://scholars.lib.ntu.edu.tw/handle/123456789/638060 | ISBN: | 9798400701245 | DOI: | 10.1145/3583780.3614996 |
Appears in: | Department of Computer Science and Information Engineering |
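The abstract refers to three mechanisms: tuning a per-label decision threshold to maximize the F-measure, the FBR fallback for labels whose tuned F-measure is low, and smoothing the F-measure during tuning. The Python sketch below illustrates them on one label's validation scores. It is not the authors' implementation. The function names, the default `fbr=0.1`, the FBR fallback rule (replace the threshold with the highest validation score, one commonly described variant), and the smoothing form (pseudo-counts `a` and `b` added to the numerator and denominator of F1) are all assumptions made for illustration. The paper's own smoothed F-measure and its FBR variant may differ.

```python
from typing import List, Sequence, Tuple


def f_measure(scores: Sequence[float], labels: Sequence[int], t: float,
              a: float = 0.0, b: float = 0.0) -> float:
    """F1 when predicting positive iff score >= t.

    a and b are hypothetical smoothing pseudo-counts (an assumption, not the
    paper's formula); a = b = 0 gives plain F1 = 2TP / (2TP + FP + FN).
    """
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < t and y)
    denom = 2 * tp + fp + fn + b
    if denom == 0:
        # No positives and no predictions: treat F1 as 0 (a convention choice).
        return 0.0
    return (2 * tp + a) / denom


def tune_threshold(scores: Sequence[float], labels: Sequence[int],
                   a: float = 0.0, b: float = 0.0) -> Tuple[float, float]:
    """Try every distinct score (plus +inf, i.e. predict nothing); keep the best F."""
    candidates: List[float] = sorted(set(scores)) + [float("inf")]
    best_t, best_f = candidates[-1], -1.0
    for t in candidates:
        f = f_measure(scores, labels, t, a, b)
        if f > best_f:  # on ties, the lower threshold seen first is kept
            best_t, best_f = t, f
    return best_t, best_f


def fbr_threshold(scores: Sequence[float], labels: Sequence[int],
                  fbr: float = 0.1) -> float:
    """FBR-style fallback (assumed variant): if the tuned F is below fbr,
    use the highest validation score as the threshold instead."""
    t, f = tune_threshold(scores, labels)
    return max(scores) if f < fbr else t


if __name__ == "__main__":
    # Toy validation data for one infrequent label: a single positive at 0.7.
    scores = [0.9, 0.8, 0.7, 0.2, 0.1]
    labels = [0, 0, 1, 0, 0]
    print(tune_threshold(scores, labels))            # (0.7, 0.5)
    print(fbr_threshold(scores, labels, fbr=0.6))    # 0.9
    print(tune_threshold(scores, labels, a=2.0, b=2.0))
```

On this toy data, plain tuning places the threshold exactly on the score of the lone positive example. A threshold that depends on a single validation instance in this way is the kind of brittle choice that can fail to generalize for infrequent labels.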
Items in the IR system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise specified.