Fully nested neural network for adaptive compression and quantization

Cui Y; Liu Z; Yao W; Li Q; Chan A.B; Xue C.J.; TEI-WEI KUO

Journal

IJCAI International Joint Conference on Artificial Intelligence

Journal Volume

2021-January

Pages

2080-2087

Date Issued

2020

Author(s)

Cui Y

Liu Z

Yao W

Li Q

Chan A.B

Xue C.J.

TEI-WEI KUO

URI

https://www.scopus.com/inward/record.uri?eid=2-s2.0-85097350250&partnerID=40&md5=ed52e7350f663feda9a06540f4af24f5

https://scholars.lib.ntu.edu.tw/handle/123456789/581463

Abstract

Neural network compression and quantization are important tasks for fitting state-of-the-art models into the computational, memory and power constraints of mobile devices and embedded hardware. Recent approaches to model compression/quantization use reinforcement learning or search methods to compress/quantize the neural network for a specific hardware platform. However, these methods require multiple runs to compress/quantize the same base neural network for different hardware setups. In this work, we propose a fully nested neural network (FN3) that runs only once to build a nested set of compressed/quantized models, which are optimal for different resource constraints. Specifically, we exploit the additive characteristic of different levels of building blocks in a neural network and propose an ordered dropout (ODO) operation that ranks the building blocks. Given a trained FN3, a fast heuristic search algorithm is run offline to find the optimal removal of components to maximize the accuracy under different constraints. Compared with related work on adaptive neural networks designed only for channels or bits, the proposed approach is unified across different levels of building blocks (bits, neurons, channels, residual paths and layers). Empirical results validate the strong practical performance of the proposed approach. © 2020 Inst. Sci. inf., Univ. Defence in Belgrade. All rights reserved.
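The two ideas named in the abstract, an ordered dropout that ranks building blocks and an offline search for what to remove under a budget, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no algorithmic detail, so the prefix-style mask, the greedy rule, and every name below (`ordered_dropout_mask`, `greedy_truncate`, `toy_cost`, `toy_accuracy`) are assumptions made purely for illustration.

```python
import random

def ordered_dropout_mask(num_units, rng):
    """Keep only the first k units, with k sampled uniformly from 1..num_units.

    Assumption: keeping a prefix (instead of a random subset) means earlier
    units survive more often, which induces an importance ordering.
    """
    keep = rng.randint(1, num_units)  # randint is inclusive on both ends
    return [1.0 if i < keep else 0.0 for i in range(num_units)]

def toy_cost(kept):
    # Hypothetical resource model: cost grows with channels times bits.
    return kept["channels"] * kept["bits"]

def toy_accuracy(kept):
    # Hypothetical accuracy proxy with diminishing returns per level.
    return 1.0 - 0.5 ** kept["channels"] - 0.5 ** kept["bits"]

def greedy_truncate(kept, cost_fn, acc_fn, budget):
    """Drop the last unit of one level at a time until the budget is met.

    Assumed heuristic: at each step pick the level whose truncation loses
    the least accuracy per unit of cost saved.
    """
    kept = dict(kept)
    while cost_fn(kept) > budget:
        best = None
        for name, k in kept.items():
            if k <= 1:
                continue  # always keep at least one unit per level
            trial = dict(kept)
            trial[name] = k - 1
            saved = cost_fn(kept) - cost_fn(trial)
            if saved <= 0:
                continue
            score = (acc_fn(kept) - acc_fn(trial)) / saved
            if best is None or score < best[0]:
                best = (score, trial)
        if best is None:
            raise ValueError("budget cannot be met")
        kept = best[1]
    return kept

if __name__ == "__main__":
    rng = random.Random(0)
    print(ordered_dropout_mask(6, rng))
    print(greedy_truncate({"channels": 8, "bits": 8}, toy_cost, toy_accuracy, 16))
```

Because each level is truncated from the end, every smaller configuration is a prefix of the larger one, which is one way to read the "nested set" of models the abstract describes.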

Other Subjects

Heuristic algorithms; Optimization; Reinforcement learning; Adaptive compression; Adaptive neural networks; Heuristic search algorithms; Model compression; Network compression; Power constraints; Resource Constraint; Specific hardware; Multilayer neural networks

Type

conference paper
