Title: XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems to Improve Language Understanding
Authors: Hsu, Chan-Jan; Lee, Hung-Yi; Tsao, Yu
Date Issued: 1-Jan-2022
Volume: 2
Source Publication: Proceedings of the Annual Meeting of the Association for Computational Linguistics
Abstract: Transformer-based models are widely used in natural language understanding (NLU) tasks, and multimodal transformers have been effective in visual-language tasks. This study explores distilling visual information from pretrained multimodal transformers into pretrained language encoders. Our framework is inspired by the success of cross-modal encoders in visual-language tasks, while we alter the learning objective to cater to the language-heavy characteristics of NLU. After training with a small number of extra adapting steps and fine-tuning, the proposed XDBERT (cross-modal distilled BERT) outperforms pretrained BERT on the General Language Understanding Evaluation (GLUE) benchmark, the Situations With Adversarial Generations (SWAG) benchmark, and readability benchmarks. We analyze the performance of XDBERT on GLUE to show that the improvement is likely visually grounded.
URI: https://scholars.lib.ntu.edu.tw/handle/123456789/629449
ISBN: 9781955917223
ISSN: 0736-587X
Appears in Collections: Department of Electrical Engineering
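The abstract only gives the high-level idea of distilling a cross-modal teacher into a language-only student. As a rough, hedged sketch of what such an adapting step could look like, the code below pulls a student encoder's hidden states toward a frozen teacher encoder's hidden states with a mean-squared-error loss. All module names, dimensions, and the choice of MSE here are illustrative assumptions and are not taken from the XDBERT paper's actual objective or implementation.

```python
# Hypothetical sketch of distilling a frozen (cross-modal) teacher into a
# language-only student encoder. Modules, sizes, and the MSE objective are
# assumptions for illustration, not the XDBERT paper's method.
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Stand-in for a transformer text encoder; a real setup would load
    pretrained checkpoints for both teacher and student."""
    def __init__(self, vocab_size=30522, hidden=256, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        block = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, num_layers=layers)

    def forward(self, input_ids):
        # Returns hidden states of shape (batch, seq_len, hidden).
        return self.encoder(self.embed(input_ids))

teacher = TinyEncoder()  # frozen teacher (text-side states of a cross-modal model)
student = TinyEncoder()  # language-only student to be adapted
for p in teacher.parameters():
    p.requires_grad_(False)

proj = nn.Linear(256, 256)  # maps the student space into the teacher space
optimizer = torch.optim.AdamW(
    list(student.parameters()) + list(proj.parameters()), lr=1e-4
)
mse = nn.MSELoss()

# One "adapting step": align student hidden states with the teacher's.
input_ids = torch.randint(0, 30522, (8, 32))  # dummy token-id batch
with torch.no_grad():
    teacher_states = teacher(input_ids)
student_states = proj(student(input_ids))
loss = mse(student_states, teacher_states)
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"distillation loss: {loss.item():.4f}")
```

After a small number of such adapting steps, the student would then be fine-tuned on downstream NLU tasks (e.g., GLUE or SWAG) in the usual way.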
Items in this institutional repository are protected by copyright, with all rights reserved, unless their copyright terms are otherwise specified.