Title: Generalization-Aware Zero-Shot Neural Architecture Search for Self-Supervised Transformers
Authors: Ko, Jun-Hua; Chiueh, Tzi-Dar
Type: conference paper
Date issued: 2025-11-14
Date available: 2026-04-16
Identifier: 21614393
DOI: 10.1109/ijcnn64981.2025.11229357
Scopus ID: 2-s2.0-105029356117
Scopus URL: https://www.scopus.com/record/display.uri?eid=2-s2.0-105029356117&origin=resultslist
Handle: https://scholars.lib.ntu.edu.tw/handle/123456789/737230
Keywords: Computer Vision; Natural Language Processing; Neural Architecture Search; Self-Supervised Learning

Abstract: Neural Architecture Search (NAS) aims to automate the design of neural networks, enabling the discovery of highly effective architectures. Recent advancements in NAS have shown significant success in identifying high-performing Transformer architectures for computer vision and natural language processing tasks. However, most NAS research has focused on supervised learning frameworks, which rely heavily on labeled data. This dependence makes deploying these methods in real-world applications challenging due to the high cost of data annotation. Additionally, previous studies often prioritize model performance while neglecting generalization ability, particularly in scenarios with limited labeled data. To address these challenges, this study introduces a generalization-aware zero-shot proxy based on self-supervised learning. By combining this proxy with a complementary zero-shot proxy, we identify architectures that balance generalization ability and expressivity. Experimental results demonstrate that the architectures discovered using the proposed approach achieve competitive performance on the ImageNet and Wikitext-2 datasets while reducing the required labeled data by up to 75% and 99%, respectively.