Authors: Lu C.-H; Wu Y.-C; CHIA-HSIANG YANG
Record dates: 2023-06-09; 2023-06-09
Year: 2019
Scopus URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85090188108&doi=10.1109%2fA-SSCC47793.2019.9056967&partnerID=40&md5=0c43eabc11db4615de4af9092a2f7df6
Repository URL: https://scholars.lib.ntu.edu.tw/handle/123456789/632191

Abstract: This paper presents a deep learning processor that supports both inference and training for an entire convolutional neural network (CNN) of any size. The proposed design enables on-chip training for applications that demand high security and privacy. Techniques across design abstraction levels are applied to improve energy efficiency. Rearrangement of the weights in filters is leveraged to reduce the processing latency by 88%. Integration of fixed-point and floating-point arithmetic reduces the area of the multiplier by 56.8%, resulting in a unified processing element (PE) with 33% less area. In the low-precision mode, clock gating and data gating are employed to reduce the power of the PE cluster by 62%. Maxpooling and ReLU modules are co-designed to reduce the memory usage by 75%. A modified softmax function is utilized to reduce the area by 78%. Fabricated in 40nm CMOS, the chip consumes 18.7 mW and 64.5 mW for inference and training, respectively, at 82 MHz from a 0.6V supply. It achieves an energy efficiency of 2.25 TOPS/W, which is 2.67 times higher than the state-of-the-art learning processors. The chip also achieves a 2×10^5 times higher energy efficiency in training than a high-end CPU. © 2019 Institute of Electrical and Electronics Engineers Inc. All rights reserved.

Author keywords: CMOS digital integrated circuits; Convolutional neural network; Deep learning; Specialized processor
SDGs: SDG7
Index keywords: Convolutional neural networks; Energy efficiency; Fixed point arithmetic; Integrated circuit design; Privacy by design; Clock gating; Design abstractions; Fixed points; Fully integrated; High securities; Memory usage; Processing elements; State of the art; Deep learning
Title: A 2.25 TOPS/W fully-integrated deep CNN learning processor with on-chip training
Type: conference paper
DOI: 10.1109/A-SSCC47793.2019.9056967
Scopus ID: 2-s2.0-85090188108