Accelerate Inference of CNN Models on CPU via Column Combining Based on Simulated Annealing
Journal
Proceedings - 2023 11th International Symposium on Computing and Networking, CANDAR 2023
ISBN
9798350306705
Date Issued
2023-01-01
Author(s)
Abstract
Convolutional Neural Networks (CNNs) have been successful in various computer vision tasks. However, state-of-the-art CNN models tend to be tremendous in size, which results in very long inference times and high memory usage. Model compression techniques such as unstructured pruning can remove a significant proportion of parameters without affecting accuracy, but efficiently exploiting the resulting sparsity remains a challenge. Column combining compresses unstructured-pruned CNN models by merging multiple sparse columns of a convolutional filter matrix into a single dense column; pruning all but the largest-magnitude weight in each row of the combined column further compresses the matrix. However, previous work did not address how to partition the sparse columns so as to minimize the negative impact of this additional pruning on model performance. In this work, we first prove that the column partition problem is NP-complete. We then propose a column combining scheme based on simulated annealing and global unstructured pruning that minimizes the adverse effects of the additional pruning on model performance. We implement acceleration of column-combined CNN models using the TVM AI compiler, without special hardware support. The proposed scheme achieves more efficient model compression, yielding a 0.65% accuracy improvement and a 1.24× faster inference time on VGG19 under 88% sparsity with the TinyImageNet dataset.
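As a rough illustration of the idea described in the abstract, the following Python snippet is a minimal, hypothetical sketch, not the authors' implementation: the function names (combine_cost, anneal_partition), the group count n_groups, and the annealing schedule are all assumptions. It searches for a column partition by simulated annealing, scoring each candidate partition by the total magnitude of the weights that column combining would additionally prune (everything except the largest-magnitude entry per row of each combined column), which serves as a proxy for the accuracy impact.

```python
import numpy as np

def combine_cost(W, assign, n_groups):
    """Total |weight| pruned when each column group keeps only the
    largest-magnitude entry per row (proxy for accuracy loss)."""
    cost = 0.0
    for g in range(n_groups):
        cols = W[:, assign == g]            # columns assigned to this group
        if cols.shape[1] == 0:
            continue
        mags = np.abs(cols)
        # per row, everything except the single largest entry is pruned
        cost += mags.sum() - mags.max(axis=1).sum()
    return cost

def anneal_partition(W, n_groups, steps=20000, t0=1.0, alpha=0.9995, seed=0):
    """Simulated annealing over column-to-group assignments."""
    rng = np.random.default_rng(seed)
    n_cols = W.shape[1]
    assign = rng.integers(0, n_groups, size=n_cols)   # random initial partition
    cost = combine_cost(W, assign, n_groups)
    t = t0
    for _ in range(steps):
        c = rng.integers(n_cols)                      # move one column ...
        g_new = rng.integers(n_groups)                # ... to a random group
        g_old = assign[c]
        if g_new == g_old:
            continue
        assign[c] = g_new
        new_cost = combine_cost(W, assign, n_groups)
        # accept improving moves always, worsening moves with Boltzmann prob.
        if new_cost <= cost or rng.random() < np.exp((cost - new_cost) / t):
            cost = new_cost
        else:
            assign[c] = g_old                         # revert the move
        t *= alpha                                    # geometric cooling
    return assign, cost

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    W = rng.normal(size=(64, 32))
    W[rng.random(W.shape) < 0.88] = 0.0               # ~88% unstructured sparsity
    assign, pruned = anneal_partition(W, n_groups=8)
    print(f"magnitude pruned by combining: {pruned:.4f}")
```

In a realistic pipeline, the cost of a move would be recomputed only for the two groups it touches rather than over the whole matrix, and the surviving per-row maxima of each group would then be packed into dense combined columns before compiling the model with TVM.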
Subjects
column combining | deep learning | model compression | pruning | simulated annealing | TVM
Type
conference paper