Learning-Based Fusion of Spatiotemporal Visual Attention Cues for Video
Date Issued
2010
Author(s)
Lee, Wen-Fu
Abstract
Visual attention is an important characteristic of the human visual system and is useful for image processing and compression. This thesis proposes a computational scheme that adopts both low-level and high-level features to predict visual attention from video signals. The low-level and high-level features are fused using machine learning. The adoption of low-level features (color, orientation, and motion) is motivated by studies of visual cells, whereas the adoption of the human face as a high-level feature is motivated by studies of media communications. We show that such a scheme is more robust than schemes using purely low-level or purely high-level features. Unlike conventional techniques, our scheme learns the relationship between features and visual attention, avoiding perceptual mismatch between the estimated saliency and actual human fixations. We also show that selecting representative training samples according to the fixation distribution improves the efficacy of regression training. Experimental results demonstrate the advantages of the proposed scheme.
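The fusion step described in the abstract — learning weights that map low-level (color, orientation, motion) and high-level (face) feature maps to observed fixation density — can be sketched as follows. This is a minimal illustration, assuming a plain linear regressor fitted by least squares; the function name `fuse_saliency_cues` and the synthetic cue maps are hypothetical, not taken from the thesis.

```python
import numpy as np

def fuse_saliency_cues(feature_maps, fixation_map):
    """Learn per-cue fusion weights by least-squares regression.

    feature_maps: list of HxW conspicuity maps (e.g. color,
    orientation, motion, face mask); fixation_map: HxW ground-truth
    fixation density from eye tracking. Returns the learned weights
    (one per cue plus a bias) and the fused saliency map.
    """
    # Stack each map as one column: rows are pixels, columns are cues.
    X = np.stack([f.ravel() for f in feature_maps], axis=1)
    X = np.hstack([X, np.ones((X.shape[0], 1))])  # append bias column
    y = fixation_map.ravel()
    w, *_ = np.linalg.lstsq(X, y, rcond=None)     # fit fusion weights
    fused = (X @ w).reshape(fixation_map.shape)   # predicted saliency
    return w, fused

# Toy example with synthetic 8x8 cue maps (hypothetical data):
# the "fixation" map is an exact linear mix of cues 0 and 2 plus a bias,
# so the regression should recover those mixing weights.
rng = np.random.default_rng(0)
cues = [rng.random((8, 8)) for _ in range(3)]
truth = 0.5 * cues[0] + 0.3 * cues[2] + 0.1
w, fused = fuse_saliency_cues(cues, truth)
```

In the thesis's setting the regression targets come from eye-tracking data, and training samples are selected according to the fixation distribution before fitting; the sketch above only shows the fusion itself.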
Subjects
Visual attention
saliency map
human visual system
eye tracking experiment
fixation distribution
regression
Type
thesis
File(s)
Name
ntu-99-R97942039-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):1c90e163067c43c9c4c637ed2cda8bb2
