Chen, Po Shao; Chen, Yen Lung; Lee, Yu Chi; Fu, Zih Sing; Yang, Chia-Hsiang
2024-01-31 | 2024-01-31 | 2023-01-01
ISSN 0018-9200
https://scholars.lib.ntu.edu.tw/handle/123456789/639431

This work presents an accelerator that performs blind deblurring based on the dark channel prior. The alternating minimization algorithm is leveraged for latent image and blur kernel estimation. A 2-D Laplace equation solver is embedded to reduce the boundary-wrapping latency by 56%. For latent image estimation, gradient data locality is exploited to reduce the latency by 57%. A sorting engine is designed to reduce the data-access latency of the dark channel calculation by 96%. A pipelined mixed-radix 1-D fast Fourier transform (FFT) engine is used for efficient latent image estimation and blur kernel estimation. By employing image size approximation, 85% of the additions and 97% of the multiplications for the FFT can further be saved. In the blur kernel estimator, a 2-D convolution engine with a parallel architecture is implemented, reducing the latency by 79%. The accelerator supports blur kernels of $25 \times 25$ and $49 \times 49$ pixels for blurred images of $129 \times 129$ and $257 \times 257$ pixels, respectively. Fabricated in 40-nm CMOS, the accelerator’s core area is 3.98 mm$^{2}$. The chip dissipates 28.8 mW at 65 MHz from a 0.65-V supply.
It can estimate a blur kernel of $25 \times 25$ pixels from an image patch of $129 \times 129$ pixels for deblurring a full-HD image in 1.7 s, achieving a $2562\times$ shorter latency than a high-end CPU. Compared with the state-of-the-art design, the chip achieves a four times higher normalized area efficiency and a $7.5\times$ higher normalized energy efficiency.

Alternating minimization | blind image deblurring | Cameras | Channel estimation | CMOS integrated circuits | Energy efficiency | energy-efficient architecture | Estimation | hardware accelerator | Image restoration | Kernel | Wrapping
[SDGs]SDG7
A 28.8-mW Accelerator IC for Dark Channel Prior-Based Blind Image Deblurring
journal article
DOI 10.1109/JSSC.2023.3344539
Scopus ID 2-s2.0-85181569911
https://api.elsevier.com/content/abstract/scopus_id/85181569911
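
The dark channel that the abstract refers to is commonly defined as the per-pixel minimum over the color channels followed by a minimum over a local patch. The following is a minimal software sketch of that definition in Python with NumPy, not a model of the chip's sorting engine. The function name `dark_channel`, the default patch size of 3, and the edge-replicating border handling are assumptions made for illustration; the record does not state which patch size or border treatment the accelerator uses.

```python
import numpy as np

def dark_channel(image, patch=3):
    """Minimum over color channels, then over a patch x patch neighborhood.

    `patch` (odd) is a hypothetical parameter chosen for this sketch.
    """
    if patch % 2 == 0:
        raise ValueError("patch must be odd")
    img = np.asarray(image, dtype=float)
    if img.ndim == 2:
        # Treat a grayscale image as a single-channel color image.
        img = img[..., None]
    # Step 1: minimum across the color channels at every pixel.
    per_pixel_min = img.min(axis=2)
    # Step 2: sliding-window minimum. Borders are edge-replicated, which
    # gives the same result as clipping the window to the image.
    r = patch // 2
    padded = np.pad(per_pixel_min, r, mode="edge")
    h, w = per_pixel_min.shape
    out = np.full((h, w), np.inf)
    for dy in range(patch):
        for dx in range(patch):
            out = np.minimum(out, padded[dy:dy + h, dx:dx + w])
    return out
```

Calling `dark_channel(img, patch=3)` on an `H x W x 3` array returns an `H x W` array. The shifted-minimum loop touches every pixel `patch * patch` times, which hints at why data access dominates this step and why a dedicated hardware engine for it can pay off.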