Transfer Learning for Video Recognition with Scarce Training Data for Deep Convolutional Neural Network

Yu-Chuan Su; Tzu-Hsuan Chiu; Chun-Yen Yeh; Hsin-Fu Huang; WINSTON HSU

doi:http://arxiv.org/abs/1409.4127v2

Date Issued

2014-09-15

Author(s)

Yu-Chuan Su

Tzu-Hsuan Chiu

Chun-Yen Yeh

Hsin-Fu Huang

WINSTON HSU

DOI

http://arxiv.org/abs/1409.4127v2

URI

https://scholars.lib.ntu.edu.tw/handle/123456789/413054

URL

http://arxiv.org/abs/1409.4127v2

Abstract

Unconstrained video recognition and the Deep Convolutional Network (DCN) are two active topics in computer vision. In this work, we apply DCNs as frame-based recognizers for video recognition. Our preliminary studies, however, show that video corpora with complete ground truth are usually not large or diverse enough to learn a robust model. Networks trained directly on the video data set overfit significantly and achieve a poor recognition rate on the test set. The same lack of training samples limits the use of deep models in a wide range of computer vision problems where obtaining training data is difficult. To overcome this problem, we perform transfer learning from images to videos, using the knowledge in a weakly labeled image corpus for video recognition. The image corpus helps the networks learn important visual patterns of natural images, patterns that models trained only on the video corpus ignore. The resulting networks therefore generalize better and achieve a better recognition rate. We show that, by means of transfer learning from images to videos, we can learn a frame-based recognizer with only 4k videos. Because the image corpus is weakly labeled, the entire learning process requires only 4k annotated instances, far fewer than the million-scale image data sets required by previous work. The same approach may be applied to other visual recognition tasks where training data is scarce, and it improves the applicability of DCNs to various computer vision problems. Our experiments also reveal the correlation between meta-parameters and the performance of DCNs, given the properties of the target problem and data. These results lead to a heuristic for meta-parameter selection in future research that does not rely on a time-consuming meta-parameter search.
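
The abstract describes DCNs used as frame-based recognizers, but it does not say how per-frame outputs are combined into a video-level decision. The sketch below illustrates one common choice, averaging per-frame class scores, purely as an assumption. The names `average_scores`, `classify_video`, and `frame_recognizer` are hypothetical and do not come from the paper; `frame_recognizer` stands in for any trained network that maps a frame to a list of class scores.

```python
from typing import Callable, List, Sequence


def average_scores(frame_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame class scores into one video-level score vector.

    Assumption: mean pooling over frames. The abstract does not state
    which aggregation rule the authors use.
    """
    if not frame_scores:
        raise ValueError("a video needs at least one frame")
    num_classes = len(frame_scores[0])
    totals = [0.0] * num_classes
    for scores in frame_scores:
        if len(scores) != num_classes:
            raise ValueError("all frames must have the same number of classes")
        for i, score in enumerate(scores):
            totals[i] += score
    return [total / len(frame_scores) for total in totals]


def classify_video(
    frames: Sequence[object],
    frame_recognizer: Callable[[object], Sequence[float]],
) -> int:
    """Return the index of the class with the highest mean frame score."""
    scores = average_scores([frame_recognizer(frame) for frame in frames])
    return max(range(len(scores)), key=scores.__getitem__)
```

In a transfer-learning setting, `frame_recognizer` would be a network first trained on the image corpus and then adapted to video frames; the aggregation step itself is independent of how that network was trained.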

Subjects

Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning

Type

conference paper

File(s)

Name

1409.4127.pdf

Size

7.75 MB

Format

Adobe PDF

Checksum

(MD5):0839c716021c968733665bb259924318
