A 2.25 TOPS/W fully-integrated deep CNN learning processor with on-chip training

Lu C.-H; Wu Y.-C; CHIA-HSIANG YANG

doi:10.1109/A-SSCC47793.2019.9056967

Journal

Proceedings - 2019 IEEE Asian Solid-State Circuits Conference, A-SSCC 2019

Journal Volume

2019-November

Pages

65-68

Date Issued

2019

Author(s)

Lu C.-H

Wu Y.-C

CHIA-HSIANG YANG

DOI

10.1109/A-SSCC47793.2019.9056967

URI

https://www.scopus.com/inward/record.uri?eid=2-s2.0-85090188108&doi=10.1109%2fA-SSCC47793.2019.9056967&partnerID=40&md5=0c43eabc11db4615de4af9092a2f7df6

https://scholars.lib.ntu.edu.tw/handle/123456789/632191

Abstract

This paper presents a deep learning processor that supports both inference and training for entire convolutional neural networks (CNNs) of any size. The proposed design enables on-chip training for applications that require high security and privacy. Techniques across design abstraction levels are applied to improve energy efficiency. Rearrangement of the weights in filters is leveraged to reduce the processing latency by 88%. Integration of fixed-point and floating-point arithmetic reduces the area of the multiplier by 56.8%, resulting in a unified processing element (PE) with 33% less area. In the low-precision mode, clock gating and data gating are employed to reduce the power of the PE cluster by 62%. Maxpooling and ReLU modules are co-designed to reduce the memory usage by 75%. A modified softmax function is utilized to reduce the area by 78%. Fabricated in 40 nm CMOS, the chip consumes 18.7 mW for inference and 64.5 mW for training at 82 MHz from a 0.6 V supply. It achieves an energy efficiency of 2.25 TOPS/W, which is 2.67 times higher than state-of-the-art learning processors. The chip also achieves a 2×10^5 times higher energy efficiency in training than a high-end CPU. © 2019 Institute of Electrical and Electronics Engineers Inc. All rights reserved.
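
The headline figures in the abstract can be related to each other with simple arithmetic. The following is a minimal sketch, not taken from the paper: the function and constant names are hypothetical, and it only derives quantities that follow directly from the quoted numbers (energy per clock cycle from power and frequency, energy per operation from TOPS/W, and the prior-art efficiency implied by the 2.67 times ratio). It does not assume which operating mode the 2.25 TOPS/W figure refers to.

```python
# Figures quoted in the abstract (names are illustrative, not from the paper).
F_CLK_HZ = 82e6           # clock frequency: 82 MHz
P_INFER_W = 18.7e-3       # inference power: 18.7 mW
P_TRAIN_W = 64.5e-3       # training power: 64.5 mW
EFF_TOPS_PER_W = 2.25     # reported energy efficiency
GAIN_OVER_PRIOR = 2.67    # reported ratio over prior learning processors


def energy_per_cycle_pj(power_w: float, f_hz: float) -> float:
    """Average energy drawn per clock cycle, in picojoules (P / f)."""
    return power_w / f_hz * 1e12


def energy_per_op_pj(tops_per_w: float) -> float:
    """Energy per operation in picojoules.

    1 TOPS/W equals 1e12 operations per joule, so one operation
    costs 1 / tops_per_w picojoules.
    """
    return 1.0 / tops_per_w


def implied_prior_tops_per_w(eff: float, gain: float) -> float:
    """Efficiency of the comparison design implied by the quoted ratio."""
    return eff / gain


if __name__ == "__main__":
    print(f"inference: {energy_per_cycle_pj(P_INFER_W, F_CLK_HZ):.1f} pJ/cycle")
    print(f"training:  {energy_per_cycle_pj(P_TRAIN_W, F_CLK_HZ):.1f} pJ/cycle")
    print(f"per op:    {energy_per_op_pj(EFF_TOPS_PER_W):.3f} pJ/op")
    prior = implied_prior_tops_per_w(EFF_TOPS_PER_W, GAIN_OVER_PRIOR)
    print(f"implied prior art: {prior:.3f} TOPS/W")
```

These are back-of-the-envelope values only; the paper itself should be consulted for the measurement conditions behind each figure.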

Subjects

CMOS digital integrated circuits; Convolutional neural network; Deep learning; Specialized processor

SDGs

[SDGs]SDG7

Other Subjects

Convolutional neural networks; Energy efficiency; Fixed point arithmetic; Integrated circuit design; Privacy by design; Clock gating; Design abstractions; Fixed points; Fully integrated; High securities; Memory usage; Processing elements; State of the art; Deep learning

Type

conference paper
