https://scholars.lib.ntu.edu.tw/handle/123456789/607445
Title: | Accelerate CNN Models via Filter Pruning and Sparse Tensor Core | Authors: | Chen, A.-T.; Liu, Pangfeng; Hong, D.-Y.; Wu, J.-J. |
Keywords: | CNN; Filter Pruning; Machine Learning; Model Compression; Sparse Tensor Core; Computer Hardware; Convolutional Neural Networks; Matrix Algebra; Tensors; L1 Norm; Sparse Tensors; Convolution | Issue Date: | 2021 | Start page/Pages: | 1-9 | Source: | Proceedings - 2021 9th International Symposium on Computing and Networking, CANDAR 2021 | Abstract: | The convolutional neural network (CNN) is a state-of-the-art technique in machine learning and has achieved high accuracy in many computer vision tasks. However, the number of model parameters keeps growing rapidly to improve accuracy, which demands more computation time and memory for training and inference. Compressing the model and improving inference speed has therefore become an important issue. This paper focuses on filter pruning and the NVIDIA sparse tensor core. Filter pruning is a model compression method that evaluates the importance of the filters in a CNN model and removes the less important ones. The sparse tensor core is hardware support provided by the NVIDIA Ampere GPU architecture; it can speed up matrix multiplication when the matrix follows a 2:4 sparsity pattern (at most two nonzero values in every group of four consecutive values). In this paper, we propose a hybrid pruning method that combines filter pruning and 2:4 pruning. We first apply filter pruning to remove redundant filters from the convolutional layers and shrink the model. We then apply 2:4 pruning so that the model matches the 2:4 pattern and can exploit the sparse tensor core for speedup. For this hybrid setting, we also propose a hybrid ranking metric to decide each filter's importance during filter pruning.
The hybrid ranking metric preserves the filters that are important for both pruning steps. By considering both metrics, we achieve higher accuracy than traditional filter pruning. We test our hybrid pruning algorithm on the MNIST, SVHN, and CIFAR-10 datasets using AlexNet. Our experiments show that the hybrid ranking method achieves better accuracy than the classic L1-norm metric and the output L1-norm metric: when we prune away 40 percent of the filters in the model, our method is 2.8%, 2.9%, and 2.7% more accurate than these baselines on the three datasets, respectively. We also evaluate inference speed by comparing the hybrid pruning model with models produced by filter pruning alone or 2:4 pruning alone. The hybrid pruning model can be 1.3x faster than the filter pruning model with similar accuracy. © 2021 IEEE. |
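The two pruning steps in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: `l1_filter_ranking` ranks convolutional filters by the classic L1-norm metric mentioned above, and `prune_2_4` applies magnitude-based 2:4 pruning (keep the two largest of every four consecutive values) so a weight matrix matches the pattern the Ampere sparse tensor core accelerates. The function names and the choice of magnitude as the 2:4 selection criterion are assumptions for illustration.

```python
import numpy as np

def l1_filter_ranking(weights):
    """Rank filters by their L1 norm (sum of absolute weights).

    weights: array of shape (num_filters, in_channels, kH, kW).
    Returns filter indices ordered from least to most important,
    so the front of the list is pruned first.
    """
    norms = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    return np.argsort(norms)

def prune_2_4(matrix):
    """Zero the two smallest-magnitude entries in every group of four
    along the last axis, producing a 2:4 sparse matrix.

    Assumes the last dimension is divisible by 4, as required by the
    2:4 structured-sparsity format.
    """
    groups = matrix.reshape(-1, 4)
    # Indices of the two smallest |values| within each group of four.
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    pruned = groups.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=1)
    return pruned.reshape(matrix.shape)
```

A hybrid scheme in the spirit of the paper would first drop the filters at the front of `l1_filter_ranking`'s output, then run `prune_2_4` on the remaining weights before fine-tuning.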
URI: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85124168182&doi=10.1109%2fCANDAR53791.2021.00009&partnerID=40&md5=c6bb708f6b5ebe70660ec1f2c2dee451 https://scholars.lib.ntu.edu.tw/handle/123456789/607445 |
DOI: | 10.1109/CANDAR53791.2021.00009 |
Appears in Collections: | Department of Computer Science and Information Engineering |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.