Accelerate Inference of CNN Models on CPU via Column Combining Based on Simulated Annealing
Journal
Proceedings - 2023 11th International Symposium on Computing and Networking, CANDAR 2023
ISBN
9798350306705
Date Issued
2023-01-01
Author(s)
Abstract
Convolutional Neural Networks (CNNs) have been successful in various computer vision tasks. However, state-of-the-art CNN models tend to be tremendous in size, which results in very long inference times and high memory usage. Model compression techniques such as unstructured pruning can remove a significant proportion of parameters without affecting accuracy, but efficiently exploiting the resulting sparsity remains a challenge. Column combining compresses unstructured-pruned CNN models by merging multiple sparse columns of a convolutional filter matrix into a single dense column; pruning all but the largest-magnitude weight in each row of the combined column further compresses the matrix. However, previous work did not address how to partition the sparse columns so as to minimize the negative impact of this additional pruning on model performance. In this work, we first prove that the column partition problem is NP-complete. We then propose a column combining scheme based on simulated annealing and global unstructured pruning that minimizes the adverse effects of the additional pruning on model performance. We implement acceleration of column-combined CNN models using the TVM AI compiler, without special hardware support. The proposed scheme achieves more efficient model compression, yielding a 0.65% accuracy improvement and a 1.24× faster inference time on VGG19 under 88% sparsity with the TinyImageNet dataset.
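As a rough illustration of the idea described in the abstract, the following Python snippet is a minimal, hypothetical sketch, not the authors' implementation: the function names (combine_cost, anneal_partition), the group count n_groups, and the annealing schedule are all assumptions. It searches for a column partition by simulated annealing, scoring each candidate partition by the total magnitude of the weights that column combining would additionally prune (everything except the largest-magnitude entry per row of each combined column), which serves as a proxy for the accuracy impact.

```python
import numpy as np

def combine_cost(W, assign, n_groups):
    """Total |weight| pruned when each column group keeps only the
    largest-magnitude entry per row (proxy for accuracy loss)."""
    cost = 0.0
    for g in range(n_groups):
        cols = W[:, assign == g]            # columns assigned to this group
        if cols.shape[1] == 0:
            continue
        mags = np.abs(cols)
        # per row, everything except the single largest entry is pruned
        cost += mags.sum() - mags.max(axis=1).sum()
    return cost

def anneal_partition(W, n_groups, steps=20000, t0=1.0, alpha=0.9995, seed=0):
    """Simulated annealing over column-to-group assignments."""
    rng = np.random.default_rng(seed)
    n_cols = W.shape[1]
    assign = rng.integers(0, n_groups, size=n_cols)   # random initial partition
    cost = combine_cost(W, assign, n_groups)
    t = t0
    for _ in range(steps):
        c = rng.integers(n_cols)                      # move one column ...
        g_new = rng.integers(n_groups)                # ... to a random group
        g_old = assign[c]
        if g_new == g_old:
            continue
        assign[c] = g_new
        new_cost = combine_cost(W, assign, n_groups)
        # accept improving moves always, worsening moves with Boltzmann prob.
        if new_cost <= cost or rng.random() < np.exp((cost - new_cost) / t):
            cost = new_cost
        else:
            assign[c] = g_old                         # revert the move
        t *= alpha                                    # geometric cooling
    return assign, cost

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    W = rng.normal(size=(64, 32))
    W[rng.random(W.shape) < 0.88] = 0.0               # ~88% unstructured sparsity
    assign, pruned = anneal_partition(W, n_groups=8)
    print(f"magnitude pruned by combining: {pruned:.4f}")
```

In a realistic pipeline, the cost of a move would be recomputed only for the two groups it touches rather than over the whole matrix, and the surviving per-row maxima of each group would then be packed into dense combined columns before compiling the model with TVM.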
Subjects
column combining | deep learning | model compression | pruning | simulated annealing | TVM
Type
conference paper