Title: Identifying and Classifying Whale Sounds in Underwater Soundscapes Based on Faster Region-based Convolutional Neural Networks
Authors: Ling, Yu-Chi; Fang, Yin-Ying; Chen, Chi-Fang; Tsai, Meng-Fan; Weng, Shih-Hsien; Kuo, Ting-Jung
Date Issued: 2025-03-02
Date Accessioned/Available: 2025-06-17
Type: conference paper
DOI: 10.1109/UT61067.2025.10947365
Scopus: https://www.scopus.com/record/display.uri?eid=2-s2.0-105003138569&origin=resultslist
Handle: https://scholars.lib.ntu.edu.tw/handle/123456789/730151
Keywords: Faster R-CNN; passive acoustic monitoring; underwater soundscape; whale voiceprint
SDGs: SDG14

Abstract: Passive acoustic monitoring is a well-established tool for studying underwater soundscapes, including ship noise and the activity and ecology of marine animal species. As attention to marine ecology grows and the volume of passive acoustic data collected increases exponentially, efficient detection models are needed to assist in the analysis of underwater acoustic recordings. This study applies Faster R-CNN (Faster Region-based Convolutional Neural Network) to identify the sounds of Chinese white dolphins in 36 hours of acoustic data collected at a monitoring point off the coast of Yunlin, Taiwan. The model achieved an average accuracy of 0.87 and an average area under the receiver operating characteristic curve (AUC-ROC) of 0.802. The model output was used to analyze the spatial and temporal patterns of Chinese white dolphin calls, confirming the behavioral patterns of Chinese white dolphins living near Taiwan. This study demonstrates that a Faster R-CNN trained on a small dataset generalizes well to highly variable signal types under various recording and noise conditions, and illustrates the utility of transfer learning. These results validate the feasibility of applying deep learning models to identify highly variable signals across a wide range of spatial and temporal scales, enabling new discoveries by combining large datasets with cutting-edge tools.
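The AUC-ROC metric reported in the abstract (0.802) can be illustrated with a short, self-contained sketch. The `roc_auc` function and the labels/scores below are generic illustrative placeholders, not the authors' evaluation code or data: AUC-ROC is the probability that a randomly chosen positive example (here, a window containing a dolphin call) is scored higher than a randomly chosen negative one (background noise), with ties counting as one half.

```python
def roc_auc(labels, scores):
    """Compute AUC-ROC via the Mann-Whitney U formulation.

    labels: 0/1 ground-truth class per example.
    scores: detector confidence per example.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count positive-negative pairs ranked correctly; ties count as 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic example: two background windows and two dolphin-call windows.
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
print(roc_auc(labels, scores))  # → 0.75
```

A perfect detector ranks every call above every noise window (AUC-ROC = 1.0), while a random detector averages 0.5, so the reported 0.802 sits well above chance.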