Title: A 1.625 TOPS/W SOC for Deep CNN Training and Inference in 28nm CMOS
Authors: Liu Y.-T.; Kung C.; Hsieh M.-H.; Wang H.-W.; Lin C.-P.; Yu C.-Y.; Chen C.-S.; Tzi-Dar Chiueh
Type: conference paper
Date issued: 2021
Date available: 2022-04-25
Scopus: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85118437077&doi=10.1109%2fESSCIRC53450.2021.9567756&partnerID=40&md5=3a05b7ca5dcfde6c42f6f796192626f0
Handle: https://scholars.lib.ntu.edu.tw/handle/123456789/607334
DOI: 10.1109/ESSCIRC53450.2021.9567756
Scopus ID: 2-s2.0-85118437077
Abstract: This work presents a FloatSD8-based system on chip (SoC) for both the inference and the training of convolutional neural networks (CNNs). A novel number format (FloatSD8) is employed to reduce the computational complexity of the convolution circuit. By co-designing the data representation and the circuit, we demonstrate that the AISOC achieves high convolution performance and high energy efficiency without sacrificing training quality. At its normal operating condition (200 MHz), the AISOC prototype is capable of 0.69 TFLOPS peak performance and 1.625 TOPS/W in 28nm CMOS. © 2021 IEEE.
Author keywords: AI accelerator; low-precision neural network; machine learning; SOC
Indexed keywords: CMOS integrated circuits; energy efficiency; machine learning; neural networks; programmable logic controllers; system-on-chip; 28nm CMOS; co-design; convolutional neural network; data representations; low-precision neural network; lower precision; network inference; neural network training; neural networks; convolution
SDGs: SDG 7
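Note: for scale, the abstract's headline numbers imply the following per-cycle throughput and power draw. This is a derived back-of-the-envelope illustration, not a figure stated in the record, and it assumes the 0.69 TFLOPS peak and the 1.625 TOPS/W efficiency refer to the same 200 MHz operating point and count operations comparably:

\[
\frac{0.69 \times 10^{12}\ \text{FLOP/s}}{2 \times 10^{8}\ \text{cycles/s}} = 3450\ \text{FLOP/cycle},
\qquad
\frac{0.69\ \text{TOPS}}{1.625\ \text{TOPS/W}} \approx 0.42\ \text{W}.
\]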