Exploiting Fine-Grained Structured Pruning for Efficient Inference on CNN Model

Authors: Cheng-Hung Wu; Ding Yong Hong; Pangfeng Liu; Jan Jan Wu
Type: conference paper
Dates: 2023-01-01; 2024-04-30
ISBN: 9798350330717
ISSN: 15219097
DOI: 10.1109/ICPADS60453.2023.00398
Scopus ID: 2-s2.0-85190245177
Handle: https://scholars.lib.ntu.edu.tw/handle/123456789/642189
Scopus record: https://api.elsevier.com/content/abstract/scopus_id/85190245177
Keywords: deep neural network | weight pruning
SDGs: SDG2

Abstract: Weight pruning removes redundant or unimportant weights from a neural network, reducing its size and computational cost while preserving accuracy. In this paper, we aim to design efficient CNN models with N:M pruning on the CPU. We propose a dynamic programming algorithm that finds a good sparsity ratio for every layer under a total time budget, based on the execution times and L1 norms of the layers. After deciding the sparsity ratio of each layer, we use the auto-tuner of the TVM compiler to search for an optimization schedule for the pruned convolutions, which accelerates the fine-grained pruned models. Experimental results show that, compared with the dense model, our scheme achieves a 0.35% accuracy improvement and a 1.55× speedup on VGG-16 with ImageNet.
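
The N:M pruning named in the abstract can be illustrated with a small sketch. It assumes the common convention that, in every group of M consecutive weights, the N largest-magnitude weights are kept and the rest are set to zero; the record does not spell out the paper's exact convention. The function name `prune_n_m` and the sample weights are hypothetical and are not taken from the paper.

```python
def prune_n_m(weights, n, m):
    """Keep the n largest-magnitude values in each consecutive group of m.

    Illustrative only: assumes a flat list of weights whose length is a
    multiple of m. Ties are broken in favour of the earlier position.
    """
    if not 0 <= n <= m:
        raise ValueError("n must satisfy 0 <= n <= m")
    if len(weights) % m != 0:
        raise ValueError("number of weights must be a multiple of m")

    pruned = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Positions of the n largest |w| within this group.
        keep = set(sorted(range(m), key=lambda i: -abs(group[i]))[:n])
        # Zero out every weight that was not selected.
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


# Hypothetical weights, pruned with a 2:4 pattern.
row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
print(prune_n_m(row, 2, 4))
# [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```

Because every group retains the same number of nonzeros, the sparsity pattern is regular enough for a compiler to generate efficient kernels, which is the property the abstract relies on when it tunes the pruned convolutions with TVM.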