Improving End-to-end Taiwanese-Speech-to-Chinese-Text Translation by Semi-supervised Learning

Authors: Lin, Yu Chun; Wang, Chung Che; JYH-SHING JANG
Dates: 2024-03-07, 2024-03-07, 2023-01-01
ISBN: 9789869576963
URL: https://scholars.lib.ntu.edu.tw/handle/123456789/640560

Abstract: The main challenges in Taiwanese speech recognition are the scarcity of publicly available Taiwanese speech corpora and the inconsistency of the Taiwanese writing system. The former leaves too little data for speech recognition tasks, while the latter leads to inconsistent output formats that are difficult to interpret. This study therefore takes translation from Taiwanese speech to Chinese text as its task and builds a speech translation model by combining a pre-trained speech model with an end-to-end deep learning architecture. The method starts from a small amount of Taiwanese speech paired with Chinese text, collects a large amount of unpaired Taiwanese speech data, and designs several algorithms that use the unpaired corpus to improve the translation system. The research is organized around four directions of improvement: the end-to-end speech translation model, pre-trained speech model features, an iterative training method, and corpus cleaning. Experimental results show that these methods effectively improve the performance of Taiwanese-speech-to-Chinese-text translation.

Keywords: Corpus cleaning | End-to-end speech translation | Semi-supervised learning
Type: conference paper
Scopus ID: 2-s2.0-85184840400
Scopus URL: https://api.elsevier.com/content/abstract/scopus_id/85184840400