Exploiting Fine-Grained Structured Pruning for Efficient Inference on CNN Model

CHENG-HUNG WU; Hong, Ding Yong; PANGFENG LIU; Wu, Jan Jan

doi:10.1109/ICPADS60453.2023.00398

Journal

Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS

ISBN

9798350330717

Date Issued

2023-01-01

Author(s)

CHENG-HUNG WU

Hong, Ding Yong

PANGFENG LIU

Wu, Jan Jan

DOI

10.1109/ICPADS60453.2023.00398

URI

https://scholars.lib.ntu.edu.tw/handle/123456789/642189

URL

https://api.elsevier.com/content/abstract/scopus_id/85190245177

Abstract

Weight pruning removes redundant or unimportant weights from a neural network, reducing its size and computational cost while preserving its accuracy. In this paper, we aim to design efficient CNN models with N:M pruning on CPUs. We propose a dynamic programming algorithm that finds a good sparsity ratio for every layer under a total time budget, based on the execution time and L1 norm of each layer. After deciding the sparsity ratio of each layer, we leverage the auto-tuner of the TVM compiler to search for optimized schedules for the pruned convolutions, accelerating the fine-grained pruned models. Experimental results show that, on VGG-16 with ImageNet, our scheme achieves a 0.35% accuracy improvement and a 1.55× speedup over the dense model.
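For readers unfamiliar with N:M pruning, the sketch below illustrates the general idea; it is not the authors' implementation. It assumes the common convention that N of every M consecutive weights are kept, chosen by absolute magnitude, along each flattened output-channel row of a convolution weight. The grouping axis and selection criterion used in the paper are not stated in the abstract, and the function names `nm_prune_row` and `nm_prune` are hypothetical. Under this convention the layer's sparsity ratio is 1 - N/M (for example, 2:4 gives 50%).

```python
from typing import List


def nm_prune_row(row: List[float], n: int, m: int) -> List[float]:
    """Keep the n largest-magnitude weights in every group of m consecutive
    weights of a flattened row and zero the rest (hypothetical helper)."""
    if not 0 < n <= m:
        raise ValueError("require 0 < n <= m")
    if len(row) % m != 0:
        raise ValueError("row length must be a multiple of m")
    pruned = [0.0] * len(row)
    for start in range(0, len(row), m):
        group = range(start, start + m)
        # Indices of the n entries with the largest absolute value in this group.
        keep = sorted(group, key=lambda i: abs(row[i]), reverse=True)[:n]
        for i in keep:
            pruned[i] = row[i]
    return pruned


def nm_prune(weights: List[List[float]], n: int, m: int) -> List[List[float]]:
    """Apply N:M pruning to each row, e.g. each output channel of a conv
    weight flattened over in_channels * kernel_h * kernel_w (an assumption)."""
    return [nm_prune_row(r, n, m) for r in weights]


if __name__ == "__main__":
    w = [[0.1, -0.9, 0.3, 0.05, 0.7, -0.2, 0.0, 0.4]]
    print(nm_prune(w, 2, 4))  # 2:4 pattern, 50% sparsity
```

With a 2:4 pattern, the example row becomes `[0.0, -0.9, 0.3, 0.0, 0.7, 0.0, 0.0, 0.4]`: each group of four retains exactly its two largest-magnitude weights. This regular pattern is what makes fine-grained pruning amenable to compiler-generated kernels.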

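The per-layer sparsity selection can be framed as a multiple-choice knapsack problem. Each layer must receive exactly one candidate sparsity ratio, the measured execution times must fit within the total time budget, and a quality score is maximized. The minimal sketch below makes several assumptions that the abstract does not confirm. Execution times are assumed to be discretized to non-negative integer units. The score is a hypothetical proxy such as the L1 norm of weights retained at that sparsity, and per-layer scores are assumed to be additive across layers. The name `choose_sparsity` and all numbers are illustrative only.

```python
from typing import List, Optional, Tuple

# One candidate per (layer, sparsity option): (sparsity_ratio, exec_time, score).
# exec_time: measured latency discretized to integer units (assumption).
# score: e.g. L1 norm of the weights retained at that sparsity (assumed proxy).
Option = Tuple[float, int, float]


def choose_sparsity(layers: List[List[Option]], budget: int) -> Optional[List[float]]:
    """Pick exactly one option per layer so total time <= budget and total
    score is maximal. Returns per-layer sparsity ratios, or None if infeasible."""
    neg = float("-inf")
    # best[t] = max total score of the layers processed so far using exactly time t
    best = [neg] * (budget + 1)
    best[0] = 0.0
    choice: List[List[int]] = []  # choice[l][t] = option index chosen for layer l at time t
    for opts in layers:
        new_best = [neg] * (budget + 1)
        pick = [-1] * (budget + 1)
        for t in range(budget + 1):
            if best[t] == neg:
                continue
            for k, (_, cost, score) in enumerate(opts):
                nt = t + cost
                if nt <= budget and best[t] + score > new_best[nt]:
                    new_best[nt] = best[t] + score
                    pick[nt] = k
        best = new_best
        choice.append(pick)
    # Total time at which the best score is reached.
    end = max(range(budget + 1), key=lambda t: best[t])
    if best[end] == neg:
        return None
    # Backtrack from the last layer to recover the chosen ratios.
    result: List[float] = []
    t = end
    for l in range(len(layers) - 1, -1, -1):
        ratio, cost, _ = layers[l][choice[l][t]]
        result.append(ratio)
        t -= cost
    result.reverse()
    return result


if __name__ == "__main__":
    layers = [
        [(0.0, 5, 10.0), (0.5, 3, 8.0), (0.75, 2, 5.0)],  # hypothetical layer A
        [(0.0, 4, 6.0), (0.5, 2, 5.0), (0.75, 1, 3.0)],   # hypothetical layer B
    ]
    print(choose_sparsity(layers, 7))
```

With a budget of 7 units, the sketch keeps layer A dense and prunes layer B to 50% (`[0.0, 0.5]`, total score 15). Tightening the budget to 4 shifts both layers to sparser options (`[0.5, 0.75]`). The table has size (number of layers) × (budget + 1), so running time grows with the time resolution chosen for discretization.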
Subjects

deep neural network | weight pruning

SDGs

[SDGs]SDG2

Type

conference paper
