https://scholars.lib.ntu.edu.tw/handle/123456789/607445
Title: | Accelerate CNN Models via Filter Pruning and Sparse Tensor Core | Authors: | Chen A.-T.; Pangfeng Liu; Hong D.-Y.; Wu J.-J. |
Keywords: | CNN;Filter Pruning;Machine Learning;Model Compression;Sparse Tensor Core;Computer hardware;Convolutional neural networks;Machine learning;Matrix algebra;Tensors;Convolutional neural network;Filter pruning;High-accuracy;L1 norm;Machine-learning;Model compression;Neural network model;Sparse tensor core;Sparse tensors;State-of-the-art techniques;Convolution | Date: | 2021 | Pages: | 1-9 | Source: | Proceedings - 2021 9th International Symposium on Computing and Networking, CANDAR 2021 | Abstract: | The convolutional neural network (CNN) is a state-of-the-art technique in machine learning and has achieved high accuracy in many computer vision tasks. However, the number of model parameters is increasing rapidly in pursuit of accuracy, which demands more computation time and memory for training and inference. Thus, compressing model size and improving inference speed have become important issues. This paper focuses on filter pruning and the NVIDIA sparse tensor core. Filter pruning is a model compression method that evaluates the importance of the filters in a CNN model and removes the less important ones. The NVIDIA sparse tensor core is hardware support provided by the NVIDIA Ampere GPU architecture. The sparse tensor core can speed up matrix multiplication if the matrix follows a 2:4 sparsity pattern, i.e., at most two nonzero values in every group of four consecutive elements. In this paper, we propose a hybrid pruning metric to prune the CNN model. Hybrid pruning combines filter pruning and 2:4 pruning. We apply filter pruning to remove redundant filters in convolutional layers to make the model smaller. Next, we apply 2:4 pruning so the model conforms to the 2:4 pattern and can exploit the sparse tensor core hardware for speedup. For this hybrid pruning setting, we also propose a hybrid ranking metric that decides filter importance during filter pruning.
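The 2:4 pruning step described in the abstract can be illustrated with a minimal NumPy sketch (this is an assumption-based illustration, not the paper's implementation): for every group of four consecutive weights, the two smallest-magnitude entries are zeroed so the matrix satisfies the 2:4 pattern the sparse tensor core requires.

```python
import numpy as np

def prune_2_4(weights):
    """Enforce a 2:4 sparsity pattern: in every group of four
    consecutive weights, zero out the two smallest-magnitude
    entries so sparse tensor cores can skip them.

    Assumes the total number of elements is a multiple of 4."""
    w = weights.reshape(-1, 4).copy()
    # Column indices of the two smallest-magnitude entries per group.
    drop = np.argsort(np.abs(w), axis=1)[:, :2]
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(weights.shape)

w = np.array([[0.9, -0.1, 0.05, -0.8],
              [0.2,  0.3, -0.4,  0.1]])
pruned = prune_2_4(w)
# Each group of four keeps only its two largest-magnitude weights.
```

In practice, NVIDIA's tooling (e.g., the ASP library in its APEX package) performs this pattern enforcement and the subsequent fine-tuning; the sketch above only shows the selection rule.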
With the hybrid ranking metric, we preserve the filters that are important for both pruning steps. By considering both metrics, we achieve higher accuracy than traditional filter pruning. We test our hybrid pruning algorithm on the MNIST, SVHN, and CIFAR-10 datasets using AlexNet. From our experiments, we conclude that our hybrid ranking method achieves better accuracy than the classic L1-norm metric and the output L1-norm metric. When we prune away 40 percent of the filters in the model, our method achieves 2.8%, 2.9%, and 2.7% higher accuracy than the classic L1-norm metric and the output L1-norm metric on these three datasets, respectively. Next, we evaluate inference speed, comparing the hybrid pruning model with models produced by filter pruning alone or 2:4 pruning alone. We find that a hybrid pruning model can be 1.3x faster than the filter pruning model with similar accuracy. © 2021 IEEE. |
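The abstract does not detail the hybrid ranking metric itself, but the classic L1-norm baseline it is compared against is standard: rank each output filter by the sum of absolute weights and drop the lowest-ranked fraction. A minimal sketch, assuming a weight tensor of shape (out_channels, in_channels, kH, kW):

```python
import numpy as np

def filter_l1_scores(conv_weights):
    """Classic L1-norm filter importance: the sum of absolute
    weights of each output filter."""
    return np.abs(conv_weights).sum(axis=(1, 2, 3))

def prune_filters(conv_weights, fraction=0.4):
    """Remove the given fraction of filters with the smallest
    L1 norms, keeping the survivors in their original order."""
    scores = filter_l1_scores(conv_weights)
    n_keep = conv_weights.shape[0] - int(fraction * conv_weights.shape[0])
    keep = np.sort(np.argsort(scores)[::-1][:n_keep])
    return conv_weights[keep]
```

The paper's hybrid metric would additionally weigh how well each filter survives the subsequent 2:4 pruning step; that combination is not specified in the abstract, so only the baseline is sketched here.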
URI: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85124168182&doi=10.1109%2fCANDAR53791.2021.00009&partnerID=40&md5=c6bb708f6b5ebe70660ec1f2c2dee451 https://scholars.lib.ntu.edu.tw/handle/123456789/607445 |
DOI: | 10.1109/CANDAR53791.2021.00009 |
Appears in Collections: | Department of Computer Science and Information Engineering |
Items in this IR system are protected by copyright, with all rights reserved, unless otherwise indicated in the item's copyright terms.