Title: Speech-Driven 3D Facial Animation (語音驅動之3維人臉動畫)
Author: Huang, Jun-Ze (黃鈞澤)
Advisor: 陳炳宇
Institution: Graduate Institute of Information Management, National Taiwan University
Year: 2006
Keywords: speech; facial animation; tracking (語音; 臉部動畫; 追蹤)
Language: en-US
Type: other
Handle: http://ntur.lib.ntu.edu.tw//handle/246246/54277
Full text: http://ntur.lib.ntu.edu.tw/bitstream/246246/54277/1/ntu-95-R93725012-1.pdf
Format: application/pdf, 2,544,573 bytes

Abstract:
Creating a 3D facial animation that speaks a given utterance is difficult; even for a professional animator, it takes a great deal of time. Our work provides a speech-driven 3D facial animation system that lets the user generate facial animations easily: given a speech recording as input, the system outputs a 3D facial animation speaking that input. The system consists of three parts. The first is the MMM (multidimensional morphable model), which is built from pre-recorded training video using machine-learning techniques; we use the MMM to generate a realistic speech video corresponding to the input speech. The second part is facial tracking, which locates the facial feature points of the human subject in the synthetic speech video. The third part is Mesh-IK (mesh-based inverse kinematics), which takes the motion of the feature points as a guideline for deforming the 3D face model, making the resulting model match the appearance of the corresponding frame of the speech video; the deformed models form the output 3D facial animation. Facial tracking and Mesh-IK can also take a real speech video, or even a real expression video, as input and produce the corresponding facial animation.

Table of contents:
1. Introduction 9
2. Related Work 11
3. System Overview 17
4. MMM 19
4.1 Corpus Recording 19
4.2 Pre-Processing 19
4.3 Building a MMM 21
4.3.1 PCA 22
4.3.2 K-means Clustering 22
4.3.3 Dijkstra 24
4.4 MMM Synthesis 25
4.5 Analysis 26
4.6 Trajectory Synthesis 29
4.7 Post-Processing 31
5. Facial Tracking 35
6. MeshIK 41
6.1 Feature Vectors 41
6.2 Linear Feature Space 45
6.3 Nonlinear Feature Space 45
7. Result 49
7.2 Synthetic speech video driven facial animation 51
7.3 Real speech video driven facial animation 52
7.4 Real expression video driven facial animation 54
8. Conclusion & Future Work 57
8.1 Conclusion 57
8.2 Future Work 57
9. Reference 58
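
Illustrative sketch (not from the thesis):
The abstract describes the MMM as a model learned from training video, and the table of contents shows PCA and k-means clustering as the core steps of building it (Sections 4.3.1 and 4.3.2). The minimal Python sketch below illustrates that general idea: project the training frames into a PCA subspace and pick a small set of prototype frames via k-means. The function name, parameter defaults, and use of scikit-learn are assumptions for illustration only, not the thesis's actual implementation.

```python
# Minimal sketch of an MMM-style prototype-building step (illustrative only):
# PCA over flattened training frames, then k-means to pick prototype images.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def build_mmm_prototypes(frames, n_components=20, n_prototypes=30):
    """frames: (n_frames, height*width) array of flattened training images.

    Returns the fitted PCA model and one prototype image per cluster.
    All parameter values here are illustrative assumptions.
    """
    # Project every frame into a compact PCA subspace.
    pca = PCA(n_components=n_components)
    coords = pca.fit_transform(frames)

    # Cluster the projected frames; each cluster summarizes one mouth/face
    # configuration seen in the training video.
    km = KMeans(n_clusters=n_prototypes, n_init=10, random_state=0)
    km.fit(coords)

    # Use the real frame nearest to each cluster center as the prototype,
    # so prototypes stay photorealistic rather than blurred averages.
    nearest = [
        int(np.argmin(np.linalg.norm(coords - center, axis=1)))
        for center in km.cluster_centers_
    ]
    return pca, frames[nearest]
```

In the spirit of the MMM described in the abstract, new speech frames would then be synthesized by blending these prototypes, with the trajectory-synthesis step (Section 4.6) choosing a smooth path through the prototype space that matches the timing of the input speech.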