https://scholars.lib.ntu.edu.tw/handle/123456789/632191
Title: | A 2.25 TOPS/W fully-integrated deep CNN learning processor with on-chip training | Authors: | Lu C.-H.; Wu Y.-C.; CHIA-HSIANG YANG |
Keywords: | CMOS digital integrated circuits; Convolutional neural network; Deep learning; Specialized processor | Issue Date: | 2019 | Volume: | 2019-November | Pages: | 65-68 | Source: | Proceedings - 2019 IEEE Asian Solid-State Circuits Conference, A-SSCC 2019 | Abstract: | This paper presents a deep learning processor that supports both inference and training for an entire convolutional neural network (CNN) of any size. The proposed design enables on-chip training for applications that require high security and privacy. Techniques across levels of design abstraction are applied to improve the energy efficiency. Rearrangement of the weights in filters is leveraged to reduce the processing latency by 88%. Integration of fixed-point and floating-point arithmetic reduces the area of the multiplier by 56.8%, resulting in a unified processing element (PE) with 33% less area. In the low-precision mode, clock gating and data gating are employed to reduce the power of the PE cluster by 62%. Maxpooling and ReLU modules are co-designed to reduce the memory usage by 75%. A modified softmax function is utilized to reduce the area by 78%. Fabricated in 40nm CMOS, the chip consumes 18.7 mW and 64.5 mW for inference and training, respectively, at 82 MHz from a 0.6V supply. It achieves an energy efficiency of 2.25 TOPS/W, which is 2.67 times higher than the state-of-the-art learning processors. The chip also achieves a 2×10^5 times higher energy efficiency in training than a high-end CPU. © 2019 Institute of Electrical and Electronics Engineers Inc. All rights reserved. |
URI: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85090188108&doi=10.1109%2fA-SSCC47793.2019.9056967&partnerID=40&md5=0c43eabc11db4615de4af9092a2f7df6 https://scholars.lib.ntu.edu.tw/handle/123456789/632191 |
DOI: | 10.1109/A-SSCC47793.2019.9056967 | SDG/Keywords: | Convolutional neural networks; Energy efficiency; Fixed point arithmetic; Integrated circuit design; Privacy by design; Clock gating; Design abstractions; Fixed points; Fully integrated; High securities; Memory usage; Processing elements; State of the art; Deep learning |
Appears in Collections: | Department of Electrical Engineering |