Exploiting Fine-Grained Structured Pruning for Efficient Inference on CNN Model
Journal
Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS
ISBN
9798350330717
Date Issued
2023-01-01
Author(s)
Abstract
Weight pruning removes redundant or unimportant weights from a neural network, reducing its size and computational cost while preserving accuracy. In this paper, we aim to design efficient CNN models with N:M pruning on the CPU. We propose a dynamic programming algorithm that uses the execution times and L1 norms of the layers to find a good sparsity ratio for every layer under a total time budget. After deciding the sparsity ratio of each layer, we leverage the auto-tuner of the TVM compiler to search for an optimization schedule of the pruned convolution to accelerate fine-grained pruned models. Experimental results show that, compared with the dense model, our scheme achieves a 0.35% accuracy improvement and a 1.55× speedup on VGG-16 with ImageNet.
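As a rough illustration of the N:M pruning pattern the abstract refers to, the sketch below applies magnitude-based N:M pruning to a weight array with NumPy. It is not the paper's implementation. It assumes the common convention in which N weights are kept in every group of M consecutive weights, and it assumes groups are formed along the flattened array; the function name `nm_prune` is hypothetical.

```python
import numpy as np


def nm_prune(weights: np.ndarray, n: int, m: int) -> np.ndarray:
    """Keep the n largest-magnitude weights in each group of m consecutive weights.

    Assumption: groups are taken over the flattened array in row-major order.
    """
    if not 0 <= n <= m:
        raise ValueError("n must satisfy 0 <= n <= m")
    if weights.size % m != 0:
        raise ValueError("number of weights must be divisible by m")

    groups = weights.reshape(-1, m)
    # Column indices of the (m - n) smallest-magnitude entries in each group.
    drop = np.argsort(np.abs(groups), axis=1)[:, : m - n]
    mask = np.ones(groups.shape, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    # Zero out the dropped entries and restore the original shape.
    return (groups * mask).reshape(weights.shape)


w = np.array([0.1, -0.9, 0.5, 0.05, 2.0, -0.3, 0.2, 1.0])
pruned = nm_prune(w, n=2, m=4)  # 2:4 pattern: two survivors per group of four
```

With this convention, a 2:4 pattern corresponds to 50% sparsity and a 1:4 pattern to 75%, so choosing N per layer is one way to express a per-layer sparsity ratio.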
Subjects
deep neural network | weight pruning
Type
conference paper
