A 40-nm 91-mW, 90-fps Learning-Based Full HD Super-Resolution Accelerator
Journal
IEEE Journal of Solid-State Circuits
Journal Volume
58
Journal Issue
2
Date Issued
2023-02-01
Author(s)
Abstract
Super-resolution has been utilized in a wide range of applications to provide a better visual experience. To meet high-throughput and low-power needs, several dedicated accelerators for super-resolution have been proposed. Neural-network (NN)-based super-resolution accelerators achieve impressive restoration performance, but their high computational complexity does not allow a high throughput for video streaming. This work presents a super-resolution accelerator that implements the rapid and accurate image super-resolution (RAISR) algorithm for reconstructing super-resolution images. The proposed memory scheduling scheme increases the utilization of the low-resolution (LR) upscaler by 50%. Kernel compression is utilized to reduce the overall on-chip memory by 72%. A patch reuse scheme achieves a 91% reduction in external memory access times compared to the direct-mapped design. The architecture is flexible enough to reconstruct full HD images with a variety of upscaling factors (2×, 3×, 4×). Fabricated in a 40-nm CMOS technology, the proposed super-resolution accelerator integrates 3.11 M gates in a core area of 3.33 mm². The chip is able to deliver a throughput of 90 frames/s (fps) for all supported upscaling factors and dissipates 91 mW at 200 MHz. Compared with state-of-the-art designs, this work achieves a 5.4-to-28.4× higher normalized throughput with 5.1-to-36× lower normalized energy dissipation.
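The headline figures in the abstract can be cross-checked with a back-of-the-envelope calculation. The sketch below is an illustration only and is not taken from the paper: it assumes that "full HD" means a 1920×1080 output frame, and the function name `derived_metrics` and its parameters are hypothetical. It does not reproduce the technology-normalized throughput and energy comparisons quoted in the abstract, whose normalization method is not given here.

```python
def derived_metrics(power_w=0.091, fps=90, clock_hz=200e6,
                    width=1920, height=1080):
    """Derive rough per-frame and per-pixel figures from the reported numbers.

    Assumption: the output frame is 1920x1080 (full HD). The power, frame
    rate, and clock defaults are the values stated in the abstract.
    """
    pixels_per_frame = width * height
    pixel_rate = pixels_per_frame * fps          # output pixels per second
    return {
        "pixel_rate_mpix_s": pixel_rate / 1e6,   # megapixels per second
        "pixels_per_cycle": pixel_rate / clock_hz,
        "energy_per_frame_mj": power_w / fps * 1e3,
        "energy_per_pixel_nj": power_w / pixel_rate * 1e9,
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions, the chip produces roughly 187 Mpixel/s, which is a little under one output pixel per clock cycle at 200 MHz, at about 1 mJ per frame.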
Subjects
CMOS integrated circuits | energy-efficient architecture | hardware accelerator | machine learning | super-resolution
Type
journal article