Learning Key Evidence for Detecting Complex Events in Videos
Date Issued
2015
Author(s)
Lai, Kuan-Ting
Abstract
Video event detection is one of the most important, yet most challenging, research topics in computer science. Recognizing complex events, e.g. “birthday party”, “wedding ceremony”, or “attempting a bike trick”, is even more difficult, since a complex event consists of various human interactions with different objects, in diverse environments, over variable time intervals. The most common current approach is to extract features from frames or video clips, then quantize and pool these features into a single vector representation for the entire video. While this method is simple and efficient, the final pooling step may discard temporally local information and include many irrelevant features from noisy backgrounds. To approach the problem differently, we observe that humans need only a small amount of evidence to recognize an event in a video: a “birthday party”, for example, can be identified by spotting a “birthday cake” and “blowing candles”. Inspired by this observation, we propose a novel way to detect complex events: first identify the key evidence that proves the existence of an event, then use that evidence to recognize videos. Under our framework, each video is represented as multiple “instances”, defined as video segments of different temporal intervals. We then apply learning methods to identify evidence (positive instances) and use it to recognize complex video events. This thesis proposes two learning methods. The first, called maximal evidence learning (MEL), is based on a large-margin formulation that treats instance labels as hidden latent variables and infers the instance labels and the instance-level classification model simultaneously. MEL infers optimal solutions by learning as many positive instances as possible from positive videos and as many negative instances as possible from negative videos.
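The thesis's actual MEL formulation is not reproduced in this record, but the alternating "infer instance labels, then refit the instance-level model" loop it describes can be sketched in a few lines. Everything below is illustrative: the toy bag generator, the median-based labeling rule, and the least-squares refit (a stand-in for the large-margin SVM step) are assumptions for the sketch, not the thesis's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: each "video" is a bag of segment feature vectors. Positive bags
# contain a few "evidence" segments drawn from a shifted distribution.
def make_bag(positive):
    X = rng.normal(0.0, 1.0, size=(rng.integers(4, 8), 5))
    if positive:
        X[: rng.integers(1, 3)] += np.array([2.0, 2.0, 0.0, 0.0, 0.0])
    return X

bags = [make_bag(i % 2 == 0) for i in range(40)]
labels = np.array([1 if i % 2 == 0 else -1 for i in range(40)])

def feats(X):  # append a bias feature
    return np.hstack([X, np.ones((len(X), 1))])

# Alternating optimization in the latent-variable / multiple-instance spirit:
# (1) score segments with the current model and infer instance labels,
# (2) refit the instance-level model on those inferred labels.
w = rng.normal(size=6)
for _ in range(10):
    Xs, ys = [], []
    for X, y in zip(bags, labels):
        F = feats(X)
        if y == -1:
            inst = -np.ones(len(F))       # negative video: all instances negative
        else:
            s = F @ w
            inst = np.where(s >= np.median(s), 1.0, -1.0)
            inst[np.argmax(s)] = 1.0      # keep at least one evidence instance
        Xs.append(F)
        ys.append(inst)
    # Least-squares refit stands in for the large-margin (SVM) step here.
    w = np.linalg.lstsq(np.vstack(Xs), np.concatenate(ys), rcond=None)[0]

# A video is scored by its best instance (its strongest evidence); the
# decision threshold is the midpoint of the two classes' mean evidence scores.
best = np.array([(feats(X) @ w).max() for X in bags])
thr = 0.5 * (best[labels == 1].mean() + best[labels == -1].mean())
acc = np.mean((best > thr) * 2 - 1 == labels)
```

The max over instance scores is what makes the evidence idea concrete in this sketch: a video is judged by its single strongest segment rather than by a pooled average over the whole clip.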
The second proposed method, called evidence selective ranking (ESR), is based on a static-dynamic instance embedding and employs infinite push ranking to select the most distinctive evidence. Extensive experiments on large-scale video event datasets show significant performance gains from both methods. We also demonstrate that the selected key evidence is meaningful to humans and can be used to locate the video segments that signify an event.
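ESR's static-dynamic embedding is not described here in enough detail to reproduce, but the infinite-push idea, concentrating the ranking loss on negatives at the very top of the ranked list, can be sketched with a simple hinge subgradient. The toy features, the learning rate, and the "push all positives above the single top-ranked negative" surrogate are illustrative assumptions, not the thesis's formulation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy instance pool: "evidence" segments (shifted) vs background segments.
pos = rng.normal(0.0, 1.0, (30, 4)) + np.array([1.5, 1.5, 0.0, 0.0])
neg = rng.normal(0.0, 1.0, (120, 4))

# Infinite-push-style training: at each step, find the single top-ranked
# negative and take a hinge subgradient step pushing positives above it.
w = np.zeros(4)
lr = 0.1
for _ in range(200):
    sn = neg @ w
    i = np.argmax(sn)                    # worst offender at the top of the list
    viol = pos @ w < sn[i] + 1.0         # margin violations against it
    if viol.any():
        w -= lr * (neg[i] - pos[viol].mean(axis=0))

# Evidence selection: take the instances at the very top of the ranked list.
scores = np.concatenate([pos @ w, neg @ w])
top10 = np.argsort(-scores)[:10]
precision_at_10 = np.mean(top10 < 30)    # indices < 30 are evidence instances
```

Unlike an average pairwise ranking loss, this objective spends all of its effort on the head of the ranking, which is exactly where evidence is selected from.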
Subjects
video event detection
large-margin framework
proportional SVM
infinite push ranking
multiple instance learning
Type
thesis
File(s)
Name
ntu-104-D98921025-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5): df9dc1878ef737e5813e8a16d94d9a6d