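2021-01-01
2024-05-13
https://scholars.lib.ntu.edu.tw/handle/123456789/649558

"With the evolution and development of technology and analytical techniques, interest in and demand for statistics and big-data processing have grown steadily across many areas of scientific research. In view of this, this subproject focuses on applications of, and research in, linguistic data science. The project gives priority to the analysis of non-text data. This includes the analysis and application of data from behavioral experiments in the linguistic subfields of acoustic phonetics, articulatory phonetics, and neurolinguistics. It will also offer a series of related training courses that complement the Institute's existing core linguistics curriculum, with the aim of improving overall teaching quality and research capacity.

The subproject plans to strengthen teaching in three areas: (1) image processing, (2) data clustering analysis, and (3) machine learning.
"Image processing" covers dimensionality-reduction analysis of articulatory images, statistical visualization, and the use of interface-based image analysis packages.
"Data clustering analysis" offers more diverse teaching and technical exchange on data mining, cluster analysis, and pattern recognition.
"Machine learning" starts from linguistic questions. It guides researchers, faculty, and students with linguistics backgrounds in using machine learning to explore different forms of language data, and compares human and machine pattern recognition to understand how humans and machines differ when decomposing language data.
Beyond these three themes, the project will also promote advanced and applied statistics; by learning and applying different statistical analyses, the Institute's faculty and students will improve both the quality and the quantity of their research.

In addition to strengthening teaching, the project will build two databases. The first is an image database centered on ultrasound articulatory data. The second is an audiovisual database containing speech acoustic data, EEG signals, and image data. With these databases, data collected in behavioral experiments can be labeled and classified more systematically, making future retrieval and analysis considerably more efficient. The project aims not only to expand the data-analysis capabilities of the Institute's faculty and students. It will also introduce machine learning and build corresponding large-scale databases for training and testing, in the hope of bringing linguistic analysis and research to a new stage."

"With the development of technology and analytical approaches, including both statistical methods and big-data techniques (machine learning and deep learning), research interest in, and demand for, cutting-edge analyses of linguistic data continue to grow. In light of this, a new stream of Linguistic Data Science is proposed. The stream is designed specifically for non-text linguistic data. Its courses guide students in handling large amounts of data of different types collected from behavioral experiments, and in applying state-of-the-art machine learning techniques to the analyzed data. The stream covers three major themes: image processing, data clustering, and machine learning. The image processing theme includes (1) training in dimensionality reduction of articulatory images such as ultrasound, MRI, CT, and X-ray; (2) statistical analysis and visualization; and (3) customized toolkits for speech articulatory data.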
The data clustering theme covers (1) data mining for different types of linguistic data, ranging from discourse (text-based) materials and acoustic data to articulatory images; and (2) data clustering and pattern recognition. The third theme, machine learning, focuses on (1) an introduction to machine learning and deep learning from a linguist's point of view; (2) applications of machine learning and deep learning to non-text linguistic data; and (3) a comparison of pattern recognition by humans and machines. Beyond these three themes, the stream also emphasizes the ability to apply advanced statistical analyses. Students are encouraged to take more advanced statistics courses, such as multivariate statistics and Bayesian inference, to complement the analytical skills acquired in this stream. The stream also aims to create an ultrasound repository, in which large amounts of ultrasound image data can be labelled, categorized, and analyzed with state-of-the-art techniques. We intend to use automatic, data-driven approaches to decompose the data and to map the resulting image information onto linguistic structures. A similar approach will be applied to other data, such as acoustic and neuro-behavioral data (e.g., EEG and brain heatmaps). Through this stream, our linguistic research aims to provide more substantial empirical, data-driven evidence."

Keywords: Linguistic data science; image processing; clustering analysis; machine learning; pattern recognition

Benchmark Program for Humanities and Social Sciences at Universities and Colleges (Linguistics), Subproject 1
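The image processing theme includes training in dimensionality reduction of articulatory images. The abstract does not name a technique, so the sketch below assumes principal component analysis (PCA). It is a minimal example, not the project's method.

The sketch makes two further assumptions. First, the frames are already exported as a NumPy array of equally sized grayscale images. Second, PCA is computed through a thin SVD of the centered pixel matrix. The function name `pca_reduce` and the synthetic data are hypothetical.

```python
import numpy as np


def pca_reduce(frames: np.ndarray, n_components: int):
    """Project a stack of images onto their top principal components.

    frames: array of shape (n_frames, height, width); a hypothetical stand-in
    for exported ultrasound (or MRI/X-ray) frames of identical size.
    Returns (scores, components, mean, explained_ratio).
    """
    n_frames = frames.shape[0]
    # Flatten each frame into one pixel vector: (n_frames, height * width).
    X = frames.reshape(n_frames, -1).astype(float)
    mean = X.mean(axis=0)
    Xc = X - mean  # center every pixel across frames
    # Thin SVD of the centered data; rows of Vt are the principal axes,
    # ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T  # low-dimensional coordinates of each frame
    variance = S ** 2
    explained_ratio = variance[:n_components] / variance.sum()
    return scores, components, mean, explained_ratio


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: 200 random "frames" of 64 x 48 pixels.
    frames = rng.normal(size=(200, 64, 48))
    scores, components, mean, ratio = pca_reduce(frames, n_components=10)
    print(scores.shape, components.shape)  # (200, 10) (10, 3072)
    # Approximate reconstruction of the first frame from 10 components.
    approx = (mean + scores[0] @ components).reshape(64, 48)
```

Each row of `components` can be reshaped back to `(height, width)` and viewed as an image. This is one way to inspect what each retained dimension captures in the articulatory data.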
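The data clustering theme and the comparison of human and machine pattern recognition can be illustrated with a hedged sketch built from two pieces:

- **Clustering.** Lloyd's k-means with farthest-point initialization groups feature vectors. The features could be PCA scores of articulatory frames or acoustic measurements; that choice is an assumption.
- **Comparison.** The adjusted Rand index (ARI) serves here as one possible measure of agreement between machine clusters and human category labels. The abstract does not specify a measure.

`kmeans`, `adjusted_rand_index`, and `human_labels` are hypothetical names, and the data are synthetic.

```python
from math import comb

import numpy as np


def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's k-means with farthest-point (maximin) initialization."""
    rng = np.random.default_rng(seed)
    # Start from a random point, then repeatedly add the point farthest
    # from all centroids chosen so far.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[dist.argmax()])
    centroids = np.array(centroids, dtype=float)

    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new[j] = members.mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


def adjusted_rand_index(a, b) -> float:
    """Chance-corrected agreement between two labelings (1.0 = identical)."""
    _, ai = np.unique(np.asarray(a), return_inverse=True)
    _, bi = np.unique(np.asarray(b), return_inverse=True)
    ai, bi = ai.ravel(), bi.ravel()
    # Contingency table: rows = categories in a, columns = clusters in b.
    table = np.zeros((ai.max() + 1, bi.max() + 1), dtype=int)
    np.add.at(table, (ai, bi), 1)

    def pairs(counts) -> int:
        # Number of unordered point pairs within each cell or margin.
        return sum(comb(int(n), 2) for n in counts)

    index = pairs(table.ravel())
    sum_a, sum_b = pairs(table.sum(axis=1)), pairs(table.sum(axis=0))
    expected = sum_a * sum_b / comb(len(ai), 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in both
        return 1.0
    return (index - expected) / (max_index - expected)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    human_labels = np.repeat([0, 1, 2], 30)  # hypothetical annotator categories
    X = centers[human_labels] + rng.normal(scale=0.5, size=(90, 2))
    machine_labels, _ = kmeans(X, k=3)
    print(adjusted_rand_index(human_labels, machine_labels))
```

An ARI of 1 means the two labelings partition the data identically; cluster numbering does not matter. Values near 0 indicate agreement no better than chance. Real linguistic data would rarely be as cleanly separated as this synthetic example.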