Contrast-Enhanced Semi-supervised Text Classification with Few Labels
Journal
Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022
Journal Volume
36
ISBN
1577358767
Date Issued
2022-06-30
Author(s)
Abstract
Traditional text classification requires thousands of annotated examples or an additional Neural Machine Translation (NMT) system, both of which are expensive to obtain in real applications. This paper presents a Contrast-Enhanced Semi-supervised Text Classification (CEST) framework for label-limited settings that does not incorporate any NMT system. We propose a certainty-driven sample selection method and a contrast-enhanced similarity graph to use data more efficiently in self-training, alleviating the annotation-starving problem. The graph imposes a smoothness constraint on the unlabeled data to improve the coherence and accuracy of pseudo-labels. Moreover, CEST formulates training as a "learning from noisy labels" problem and performs the optimization accordingly. A salient feature of this formulation is the explicit suppression of the severe error-propagation problem in conventional semi-supervised learning. With only 30 labeled examples per class for both the training and validation sets, CEST outperforms the previous state-of-the-art algorithms by 2.11% accuracy and falls only 3.04% accuracy short of fully supervised fine-tuning of a pre-trained language model on thousands of labeled examples.
Type
conference paper
