Egocentric Activity Recognition by Leveraging Multiple Mid-level Representations
Date Issued
2015
Date
2015
Author(s)
Hsieh, Peng-Ju
Abstract
Existing approaches for egocentric activity recognition mainly rely on a single modality (e.g., detecting interacting objects) to infer the activity category. However, due to the mismatch between the camera angle and the subject's visual field, important objects may be partially occluded or missing in the video frames. Moreover, prior work usually ignores where the objects are and how the subject interacts with them. To resolve these difficulties, we propose to leverage multiple mid-level representations to improve egocentric activity classification accuracy. Specifically, we aim to use multimodal representations (e.g., background context, objects manipulated by a user, and motion patterns of hands) to compensate for the insufficiency of a single modality, and to jointly consider what a subject is interacting with, where, and how. To evaluate the method, we introduce a new and challenging egocentric activity dataset (ADL+) that contains video and wrist-worn accelerometer data of people performing daily-life activities. Our approach significantly outperforms the state-of-the-art method in classification accuracy on the ADL dataset (from 36.8% to 46.7%) and on our ADL+ dataset (from 32.5% to 60.0%). In addition, we conduct a series of analyses to explore the relative merits of each modality for egocentric activity recognition.
Subjects
Activity Recognition
Egocentric Video
Feature Fusion
Type
thesis
File(s)
Name
ntu-104-R02944011-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):168884e5faef779f16486c4e7d80c59a
