Title: T-EAP: Trainable Energy-Aware Pruning for NVM-based Computing-in-Memory Architecture
Authors: Cheng-Yang Chang; Yu-Chuan Chuang; Kuang-Chao Chou; An-Yeu (Andy) Wu
Type: conference paper
Date Issued: 2022-01-01
Date Deposited: 2023-06-26
ISBN: 9781665409964
DOI: 10.1109/AICAS54282.2022.9869849
Scopus ID: 2-s2.0-85139042602
Handle: https://scholars.lib.ntu.edu.tw/handle/123456789/633065
Scopus API: https://api.elsevier.com/content/abstract/scopus_id/85139042602
Keywords: Computing-in-memory | deep learning accelerator | embedded nonvolatile memory | energy consumption | pruning

Abstract: While convolutional neural networks (CNNs) achieve outstanding performance in many applications, their inference energy consumption is enormous. Computing-in-memory architectures based on embedded nonvolatile memory (NVM-CIM) have emerged to improve CNNs' energy efficiency. Recently, NVM crossbar-aware pruning has been extensively studied; however, directly incorporating energy estimation into sparse learning has not been well explored. In this paper, we propose T-EAP, the first trainable energy-aware pruning method to close the gap between pruning policy and energy optimization for NVM-CIM. Specifically, T-EAP improves the energy-accuracy trade-off by removing redundant weight groups that consume significant energy. Moreover, its trainable thresholds enable end-to-end sparse learning without a laborious train-prune-retrain process. Experimental results based on NeuroSim, a circuit-level simulator for CIM systems, show that compared with prior work, T-EAP maintains accuracy while reducing energy consumption by up to 26.5% and 22.7% for VGG-8 and ResNet-20, respectively. We also provide a layer-wise analysis of energy savings to validate the effectiveness of T-EAP.
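The core idea in the abstract — pruning weight groups whose importance does not justify their energy cost, using a trainable threshold instead of a fixed train-prune-retrain schedule — can be illustrated with a minimal sketch. This is not the authors' exact formulation: the energy-discounted group score, the sigmoid "soft gate", and all function names below are assumptions made for this example.

```python
import numpy as np

def group_scores(weights, energy_cost=1.0):
    """Importance of each weight group: L2 norm discounted by estimated energy cost.

    weights: (num_groups, group_size) array, one row per crossbar weight group.
    energy_cost: scalar or (num_groups,) array of per-group energy estimates
                 (an assumption; the paper derives these from a CIM energy model).
    """
    return np.linalg.norm(weights, axis=1) / energy_cost

def soft_gate(scores, threshold, temperature=10.0):
    """Differentiable gate: close to 1 when score > threshold, close to 0 otherwise.

    Because the gate is a sigmoid, the threshold can receive gradients and be
    learned end-to-end, avoiding a separate prune-retrain stage.
    """
    return 1.0 / (1.0 + np.exp(-temperature * (scores - threshold)))

def prune_forward(weights, threshold, energy_cost=1.0):
    """Forward pass with soft group pruning: low-score groups fade toward zero."""
    gates = soft_gate(group_scores(weights, energy_cost), threshold)
    return weights * gates[:, None]

# Toy usage: 8 weight groups of 16 weights each, threshold at the mean group norm.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16))
pruned = prune_forward(w, threshold=np.linalg.norm(w, axis=1).mean())
```

Groups with above-average (energy-discounted) norms pass through nearly unchanged, while the rest are attenuated toward zero; in a real training loop the threshold would be a learnable parameter updated jointly with the weights.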