Pseudo Triplet Networks for Classification Tasks with Cross-Source Feature Incompleteness
Journal
International Conference on Information and Knowledge Management, Proceedings
ISBN
9798400701245
Date Issued
2023-10-21
Author(s)
Abstract
Cross-source feature incompleteness, a scenario where certain features are available in one data source but missing in another, is a common and significant challenge in machine learning. It typically arises when the training data and testing data are collected from different sources with distinct feature sets. Addressing this challenge has the potential to greatly improve the utility of valuable datasets that might otherwise be considered incomplete and to enhance model performance. This paper introduces the novel Pseudo Triplet Network (PTN) to address cross-source feature incompleteness. PTN fuses two Siamese network architectures: Triplet Networks and Pseudo Networks. By segregating data into instance, positive, and negative subsets, PTN facilitates effective contrastive learning through a hybrid loss function design. The model was rigorously evaluated on six benchmark datasets from the UCI Repository, in comparison with five other methods for managing missing data, under a range of feature overlap and missing data scenarios. PTN consistently exhibited superior performance, displaying resilience in high missing ratio situations and maintaining robust stability across various data scenarios.
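The abstract does not spell out PTN's hybrid loss, so the following is only a minimal sketch of the generic triplet margin loss that Triplet Networks are commonly trained with, not the paper's actual objective. The function names, the margin value, and the toy embeddings are all illustrative assumptions; `instance` plays the role usually called the anchor.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(instance, positive, negative, margin=1.0):
    """Generic triplet loss: max(0, d(instance, positive) - d(instance, negative) + margin).

    This is the textbook formulation, assumed here for illustration;
    it is not the hybrid loss described in the paper.
    """
    d_pos = euclidean(instance, positive)
    d_neg = euclidean(instance, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings (made up for this sketch).
instance = [0.0, 0.0]
positive = [0.0, 1.0]  # same class as the instance, distance 1
negative = [3.0, 4.0]  # different class, distance 5

# Positive is already much closer than negative, so the loss is zero.
loss_easy = triplet_margin_loss(instance, positive, negative)

# Swapping the roles violates the margin and produces a positive loss.
loss_hard = triplet_margin_loss(instance, negative, positive)
```

Minimizing a loss of this shape pulls same-class embeddings toward the instance and pushes different-class embeddings away until they are separated by at least the margin.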
Subjects
Classification Tasks | Cross Data Source | Feature Incompleteness | Pseudo Triplet Networks | Tabular Data
Type
conference paper
