https://scholars.lib.ntu.edu.tw/handle/123456789/607445
Title: | Accelerate CNN Models via Filter Pruning and Sparse Tensor Core | Authors: | Chen A.-T.; Pangfeng Liu; Hong D.-Y.; Wu J.-J. |
Keywords: | CNN;Filter Pruning;Machine Learning;Model Compression;Sparse Tensor Core;Computer hardware;Convolutional neural networks;Machine learning;Matrix algebra;Tensors;Convolutional neural network;Filter pruning;High-accuracy;L1 norm;Machine-learning;Model compression;Neural network model;Sparse tensor core;Sparse tensors;State-of-the-art techniques;Convolution | Date: | 2021 | Pages: | 1-9 | Source: | Proceedings - 2021 9th International Symposium on Computing and Networking, CANDAR 2021 | Abstract: | The convolutional neural network (CNN) is a state-of-the-art technique in machine learning and has achieved high accuracy in many computer vision tasks. However, the number of model parameters is increasing rapidly in pursuit of accuracy, which demands more computation time and memory for training and inference. Thus, compressing model size and improving inference speed have become important issues. This paper focuses on filter pruning and the NVIDIA sparse tensor core. Filter pruning is a model compression method that evaluates the importance of the filters in a CNN model and removes the less important ones. The NVIDIA sparse tensor core is hardware support provided by the NVIDIA Ampere GPU architecture. The sparse tensor core can speed up matrix multiplication if the matrix follows a 2:4 sparsity pattern, i.e., at most two nonzero values in every group of four consecutive elements. In this paper, we propose a hybrid pruning metric to prune the CNN model. Hybrid pruning combines filter pruning and 2:4 pruning. We apply filter pruning to remove redundant filters in convolutional layers to make the model smaller. Next, we apply 2:4 pruning so the model conforms to the 2:4 pattern and can exploit the sparse tensor core hardware for speedup. For this hybrid pruning setting, we also propose a hybrid ranking metric that decides filter importance during filter pruning.
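The 2:4 pruning step described in the abstract can be illustrated with a minimal NumPy sketch (this is an assumption-based illustration, not the paper's implementation): for every group of four consecutive weights, the two smallest-magnitude entries are zeroed so the matrix satisfies the 2:4 pattern the sparse tensor core requires.

```python
import numpy as np

def prune_2_4(weights):
    """Enforce a 2:4 sparsity pattern: in every group of four
    consecutive weights, zero out the two smallest-magnitude
    entries so sparse tensor cores can skip them.

    Assumes the total number of elements is a multiple of 4."""
    w = weights.reshape(-1, 4).copy()
    # Column indices of the two smallest-magnitude entries per group.
    drop = np.argsort(np.abs(w), axis=1)[:, :2]
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(weights.shape)

w = np.array([[0.9, -0.1, 0.05, -0.8],
              [0.2,  0.3, -0.4,  0.1]])
pruned = prune_2_4(w)
# Each group of four keeps only its two largest-magnitude weights.
```

In practice, NVIDIA's tooling (e.g., the ASP library in its APEX package) performs this pattern enforcement and the subsequent fine-tuning; the sketch above only shows the selection rule.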
With the hybrid ranking metric, we preserve the filters that are important for both pruning steps. By considering both metrics, we achieve higher accuracy than traditional filter pruning. We test our hybrid pruning algorithm on the MNIST, SVHN, and CIFAR-10 datasets using AlexNet. From our experiments, we conclude that our hybrid ranking method achieves better accuracy than the classic L1-norm metric and the output L1-norm metric. When we prune away 40 percent of the filters in the model, our method achieves 2.8%, 2.9%, and 2.7% higher accuracy than the classic L1-norm metric and the output L1-norm metric on these three datasets, respectively. Next, we evaluate inference speed, comparing the hybrid pruning model with models produced by filter pruning alone or 2:4 pruning alone. We find that a hybrid pruning model can be 1.3x faster than the filter pruning model with similar accuracy. © 2021 IEEE. |
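The abstract does not detail the hybrid ranking metric itself, but the classic L1-norm baseline it is compared against is standard: rank each output filter by the sum of absolute weights and drop the lowest-ranked fraction. A minimal sketch, assuming a weight tensor of shape (out_channels, in_channels, kH, kW):

```python
import numpy as np

def filter_l1_scores(conv_weights):
    """Classic L1-norm filter importance: the sum of absolute
    weights of each output filter."""
    return np.abs(conv_weights).sum(axis=(1, 2, 3))

def prune_filters(conv_weights, fraction=0.4):
    """Remove the given fraction of filters with the smallest
    L1 norms, keeping the survivors in their original order."""
    scores = filter_l1_scores(conv_weights)
    n_keep = conv_weights.shape[0] - int(fraction * conv_weights.shape[0])
    keep = np.sort(np.argsort(scores)[::-1][:n_keep])
    return conv_weights[keep]
```

The paper's hybrid metric would additionally weigh how well each filter survives the subsequent 2:4 pruning step; that combination is not specified in the abstract, so only the baseline is sketched here.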
URI: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85124168182&doi=10.1109%2fCANDAR53791.2021.00009&partnerID=40&md5=c6bb708f6b5ebe70660ec1f2c2dee451 https://scholars.lib.ntu.edu.tw/handle/123456789/607445 |
DOI: | 10.1109/CANDAR53791.2021.00009 |
Appears in Collections: | Department of Computer Science and Information Engineering |
Items in this IR system are protected by copyright, with all rights reserved, unless otherwise indicated in the item's copyright terms.