Algorithm, VLSI Hardware Architecture and System Design for Smart Surveillance
Date Issued
2012
Date
2012
Author(s)
Chan, Wei-Kai
Abstract
In the next-generation visual surveillance systems, content analysis tools will be integrated. New design issues will arise related to system cost, deployment space, network loading, and system scalability. In this thesis, after a discussion in terms of surveillance pipelines, it is proposed to utilize a content abstraction hierarchy to relieve network loading and increase system scalability, and integrate a hardware content analysis engine into a smart camera System-on-a-Chip (SoC) to reduce system cost and deployment space. As a result, the surveillance IP camera will become a smart camera with the embedded capabilities for automatic content analysis and the network of surveillance IP cameras will become smart surveillance networks.
For the functions of content analysis, the video object segmentation and tracking are two important building blocks for smart surveillance. However, there are several issues needed to be solved. First, the threshold decision is a hard problem for background subtraction video object segmentation. Second, for video object tracking, there are some issues or conditions that make video object tracking hard to be robust, such as non-rigid object motion, target appearance changes due to illumination condition changes, background clutter, ..., etc. In this thesis, by proposing an improve threshold decision algorithm, the threshold for background-subtraction-based video object segmentation can be decided automatically and robustly under sever dynamic backgrounds. Besides, the proposed threshold decision is based on a mechanism different from that in background-subtraction-based video object segmentation, which can prevent possible error propagations. For video object tracking, by using diffusion distance for color histogram matching, the tracker can track non-rigid moving object under sever illumination condition changes, and, by using motion clue from video object segmentation, the tracker can be robust to background clutter. In the experiments results, we show that the presented algorithms are robust under several challenging sequences and our proposed methods are truly effective approaches for the mentioned issues.
Beside of video object segmentation and tracking, two more functions of content analysis are also improved in this thesis. They are video object description, and face detection and scoring. For the video object description, a new descriptor for human objects, Human Color Structure Descriptor (HCSD), is proposed. Experimental results show that the proposed descriptor, HCSD, can achieve better performance than Scalable Color Descriptor and Color Structure Descriptor of MPEG-7 for human objects.
For face detection and scoring, facial images with low resolution in surveillance sequences are hard to detect with traditional approaches. An efficient face detection
and face scoring technique in surveillance systems is proposed. It combines spirits of image-based face detection and essences of video object segmentation to filter out high-quality faces. The proposed face scoring technique, which is useful for surveillance video summary and indexing, includes four scoring functions based on feature extraction and is integrated by a neural network training system to select high-quality face. Experiments show that the proposed algorithm effectively extracts low-resolution human faces, which the traditional face detection algorithms cannot handle well. It can also rank face candidates according to face scores, which determine face quality.
For the hardware content analysis engine, a 5.877 TOPS/W and 111.329 GOPS/mm^2 Reconfigurable Smart-camera Stream Processor (ReSSP) is implemented in 90nm CMOS technology. A coarse-grained reconfigurable image stream processing architecture (CRISPA) along with design techniques of heterogeneous stream processing (HSP) and subword-level parallelism (SLP) is implemented to accelerate the processing algorithms for smart-camera vision applications. With the processor architecture of CRISPA and the design techniques of HSP and SLP, ReSSP can outperform existing vision chips in many aspects of hardware performances. Moreover, the programmability of ReSSP makes it capable of supporting many high-level vision algorithms in high spec, such as the real-time capability for full-HD video analysis. The implementation results show that the on-chip memory can be reduced by 94% with SLP memory sharing. The on-chip memory size, power efficiency and area efficiency are 18.2x to 182x, 4.5x to 33.0x, and 3.8x to 74.2x better than the state-of-the-art chips.
Beside of the algorithms and hardware that are proposed for the single smart camera, this thesis also presents a cooperative surveillance system. It proposes a cooperation scheme between fixed cameras and a mobile robot. The fixed cameras detect the objects with background subtraction and locate the objects on a map with homography transform. At the same time, the information of the target to track, including the position and the appearance, is transmitted to the mobile robot. After
Breadth First Search in a map of Boolean array, the mobile robot finds the target in its view by use of a stochastic scheme with the information given, then the mobile robot will track the target and keep it in the robot''s view wherever he or she goes. By proposing this system, the dead spot problem in typical surveillance systems
with only fixed cameras is considered and resolved.
Besides, the track initialization problem in typical tracking systems, i.e. how to decide the target of interests to be tracked, is also resolved with the proposed cooperation scheme in system level.
Subjects
smart surveillance
smart camera
content analysis
video object segmentation
video object tracking
vision chip
cooperative smart surveillance
Type
thesis
File(s)![Thumbnail Image]()
Loading...
Name
ntu-101-F93943041-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):61434bee4f7caa929cf45fc37e2b9318
