https://scholars.lib.ntu.edu.tw/handle/123456789/607445
Title: | Accelerate CNN Models via Filter Pruning and Sparse Tensor Core | Authors: | Chen, A.-T.; Liu, Pangfeng; Hong, D.-Y.; Wu, J.-J. |
Keywords: | CNN; Filter Pruning; Machine Learning; Model Compression; Sparse Tensor Core; Computer Hardware; Convolutional Neural Networks; Matrix Algebra; Tensors; L1 Norm; Sparse Tensors; Convolution | Issue Date: | 2021 | Start page/Pages: | 1-9 | Source: | Proceedings - 2021 9th International Symposium on Computing and Networking, CANDAR 2021 | Abstract: | The convolutional neural network (CNN) is a state-of-the-art technique in machine learning and has achieved high accuracy in many computer vision tasks. However, the number of model parameters keeps growing rapidly to improve accuracy, which demands more computation time and memory for training and inference. Compressing the model and improving inference speed has therefore become an important issue. This paper focuses on filter pruning and the NVIDIA sparse tensor core. Filter pruning is a model compression method that evaluates the importance of the filters in a CNN model and removes the less important ones. The sparse tensor core is hardware support provided by the NVIDIA Ampere GPU architecture; it can speed up matrix multiplication when the matrix follows a 2:4 sparsity pattern (at most two nonzero values in every group of four consecutive values). In this paper, we propose a hybrid pruning method that combines filter pruning and 2:4 pruning. We first apply filter pruning to remove redundant filters from the convolutional layers and shrink the model. We then apply 2:4 pruning so that the model matches the 2:4 pattern and can exploit the sparse tensor core for speedup. For this hybrid setting, we also propose a hybrid ranking metric to decide each filter's importance during filter pruning.
The hybrid ranking metric preserves the filters that are important for both pruning steps. By considering both metrics, we achieve higher accuracy than traditional filter pruning. We test our hybrid pruning algorithm on the MNIST, SVHN, and CIFAR-10 datasets using AlexNet. Our experiments show that the hybrid ranking method achieves better accuracy than the classic L1-norm metric and the output L1-norm metric: when we prune away 40 percent of the filters in the model, our method is 2.8%, 2.9%, and 2.7% more accurate than these baselines on the three datasets, respectively. We also evaluate inference speed by comparing the hybrid pruning model with models produced by filter pruning alone or 2:4 pruning alone. The hybrid pruning model can be 1.3x faster than the filter pruning model with similar accuracy. © 2021 IEEE. |
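The two pruning steps in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: `l1_filter_ranking` ranks convolutional filters by the classic L1-norm metric mentioned above, and `prune_2_4` applies magnitude-based 2:4 pruning (keep the two largest of every four consecutive values) so a weight matrix matches the pattern the Ampere sparse tensor core accelerates. The function names and the choice of magnitude as the 2:4 selection criterion are assumptions for illustration.

```python
import numpy as np

def l1_filter_ranking(weights):
    """Rank filters by their L1 norm (sum of absolute weights).

    weights: array of shape (num_filters, in_channels, kH, kW).
    Returns filter indices ordered from least to most important,
    so the front of the list is pruned first.
    """
    norms = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    return np.argsort(norms)

def prune_2_4(matrix):
    """Zero the two smallest-magnitude entries in every group of four
    along the last axis, producing a 2:4 sparse matrix.

    Assumes the last dimension is divisible by 4, as required by the
    2:4 structured-sparsity format.
    """
    groups = matrix.reshape(-1, 4)
    # Indices of the two smallest |values| within each group of four.
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    pruned = groups.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=1)
    return pruned.reshape(matrix.shape)
```

A hybrid scheme in the spirit of the paper would first drop the filters at the front of `l1_filter_ranking`'s output, then run `prune_2_4` on the remaining weights before fine-tuning.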
URI: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85124168182&doi=10.1109%2fCANDAR53791.2021.00009&partnerID=40&md5=c6bb708f6b5ebe70660ec1f2c2dee451 https://scholars.lib.ntu.edu.tw/handle/123456789/607445 |
DOI: | 10.1109/CANDAR53791.2021.00009 |
Appears in Collections: | Department of Computer Science and Information Engineering |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.