Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation

Chang, Chih Chiang; HUNG-YI LEE

doi:10.21437/Interspeech.2022-10627

Journal

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Journal Volume

2022-September

Date Issued

2022-01-01

URI

https://scholars.lib.ntu.edu.tw/handle/123456789/633656

URL

https://api.elsevier.com/content/abstract/scopus_id/85140053983

Abstract

Simultaneous speech translation (SimulST) is a challenging task that aims to translate streaming speech before the complete input has been observed. A SimulST system generally includes two components: the pre-decision, which aggregates the speech information, and the policy, which decides whether to read or write. While recent works have proposed various strategies to improve the pre-decision, they mainly adopt the fixed wait-k policy, leaving adaptive policies largely unexplored. This paper proposes to model the adaptive policy by adapting Continuous Integrate-and-Fire (CIF). Compared with monotonic multihead attention (MMA), our method has the advantages of simpler computation, superior quality at low latency, and better generalization to long utterances. We conduct experiments on the MuST-C V2 dataset and show the effectiveness of our approach.
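The abstract describes CIF only at a high level. As a rough illustration of how an integrate-and-fire mechanism can serve as a read/write policy, the sketch below follows the generic CIF formulation originally proposed for speech recognition: each incoming frame is read and its weight is accumulated, and whenever the accumulated weight reaches a threshold the integrator fires, emitting one weighted sum of frames (one write) and carrying the leftover weight forward. This is a minimal sketch under stated assumptions, not the authors' implementation: the per-frame weights are supplied by hand (in a real system a network would predict them from encoder states), the threshold `beta = 1.0` is assumed, the function name `cif_stream` and the `READ`/`WRITE` labels are hypothetical, and the paper's adapted variant may differ in its details.

```python
from typing import List, Tuple

Vector = List[float]


def cif_stream(
    frames: List[Vector], alphas: List[float], beta: float = 1.0
) -> Tuple[List[Tuple[str, int]], List[Vector]]:
    """Generic CIF integrate-and-fire used as a streaming READ/WRITE policy.

    Hypothetical sketch: `alphas` are assumed non-negative per-frame weights
    (normally predicted by a model); `beta` is the firing threshold.
    Returns the action sequence and the fired (integrated) vectors.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    dim = len(frames[0])
    acc_w = 0.0             # accumulated weight since the last firing
    acc_h = [0.0] * dim     # weighted sum of frames since the last firing
    actions: List[Tuple[str, int]] = []
    outputs: List[Vector] = []

    for t, (x, alpha) in enumerate(zip(frames, alphas)):
        actions.append(("READ", t))  # consume one more speech frame
        remaining = alpha
        # Fire as long as this frame's weight completes a unit; a single
        # large weight (alpha > beta) can trigger more than one firing.
        while acc_w + remaining >= beta:
            used = beta - acc_w  # part of this frame's weight that closes the unit
            outputs.append([h + used * xi for h, xi in zip(acc_h, x)])
            actions.append(("WRITE", t))  # the integrator fired: emit a token
            remaining -= used
            acc_w, acc_h = 0.0, [0.0] * dim
        # Carry the leftover weight of this frame into the next unit.
        acc_w += remaining
        acc_h = [h + remaining * xi for h, xi in zip(acc_h, x)]

    # Any weight still below beta at the end is left unfired in this sketch.
    return actions, outputs


if __name__ == "__main__":
    acts, outs = cif_stream(
        [[1.0], [2.0], [3.0], [4.0], [5.0]], [0.25, 0.5, 0.5, 0.25, 0.75]
    )
    print(acts)
    print(outs)
```

With the example weights, the policy reads three frames before its first write and two more before its second, so writes are triggered by the accumulated weights rather than by a fixed schedule. This is the contrast with a wait-k policy, which always waits a preset number of source steps regardless of the input.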

Subjects

continuous integrate-and-fire | end-to-end model | online sequence-to-sequence model | simultaneous speech translation | streaming

SDGs

SDG16

Type

conference paper
