This is SPATEM! A Spatial-Temporal Optimization Framework for Efficient Inference on ReRAM-based CNN Accelerator

Tsou Y.-T; Chent K.-H; Yang C.-L; Cheng H.-Y; Chen J.-J; Tsai D.-Y.

doi:10.1109/ASP-DAC52403.2022.9712536

Journal

Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC

Journal Volume

2022-January

Pages

702-707

Date Issued

2022

Author(s)

Tsou Y.-T

Chent K.-H

CHIA-LIN YANG

Cheng H.-Y

Chen J.-J

Tsai D.-Y.

DOI

10.1109/ASP-DAC52403.2022.9712536

URI

https://www.scopus.com/inward/record.uri?eid=2-s2.0-85126122494&doi=10.1109%2fASP-DAC52403.2022.9712536&partnerID=40&md5=479ff55821c4e72f52afe6d78e976704

https://scholars.lib.ntu.edu.tw/handle/123456789/632186

Abstract

Resistive memory-based computing-in-memory (CIM) is considered a promising solution for accelerating convolutional neural network (CNN) inference: it stores the weights in crossbar memory arrays and performs in-situ matrix-vector multiplications (MVMs) in an analog manner. Several techniques assume that a whole crossbar can operate concurrently and discuss how to map the weights onto crossbar arrays efficiently. In practice, however, the accumulated effect of per-cell current deviation and analog-to-digital converter (ADC) overhead may greatly degrade inference accuracy. This motivates the concept of the Operation Unit (OU), under which each per-cycle operation in a crossbar involves only a limited number of wordlines and bitlines so that satisfactory inference accuracy is preserved. With OU-based operations, the weight mapping and the scheduling strategy for parallelizing CNN convolution operations should take communication overhead and resource utilization into account to optimize inference acceleration. In this work, we propose SPATEM, the first optimization framework that efficiently executes MVMs with OU-based operations on ReRAM-based CIM accelerators. It decouples the design space into tractable steps, models the expected inference latency, and derives an optimized spatial-temporal-aware scheduling strategy. Compared with state-of-the-art approaches, experimental results show that the scheduling strategy derived by SPATEM reduces inference latency by 29.24% on average, with 31.28% less communication overhead, by exploiting more of the originally unused crossbar cells. © 2022 IEEE.
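
The OU constraint described in the abstract can be illustrated with a small sketch. This is not the SPATEM framework or its scheduling algorithm; it only shows, under simplifying assumptions, how a matrix-vector multiplication might be evaluated one Operation Unit at a time when each cycle may activate only a limited group of wordlines and bitlines. The function name `ou_mvm`, the parameters `ou_rows` and `ou_cols`, the one-OU-per-cycle cost model, and the 4x4 example values are all hypothetical and chosen for illustration.

```python
from math import ceil


def ou_mvm(weights, x, ou_rows, ou_cols):
    """Compute y = x . weights one OU tile at a time.

    weights: list of rows (wordlines), each a list of column (bitline) values.
    x:       input vector, one value per wordline.
    Returns (y, cycles), assuming (hypothetically) that each OU tile
    costs exactly one cycle and tiles are processed sequentially.
    """
    n_rows, n_cols = len(weights), len(weights[0])
    y = [0] * n_cols
    cycles = 0
    for r0 in range(0, n_rows, ou_rows):        # group of active wordlines
        r1 = min(r0 + ou_rows, n_rows)
        for c0 in range(0, n_cols, ou_cols):    # group of active bitlines
            c1 = min(c0 + ou_cols, n_cols)
            for c in range(c0, c1):
                # Partial sum contributed by this OU to output column c.
                y[c] += sum(x[r] * weights[r][c] for r in range(r0, r1))
            cycles += 1
    return y, cycles


# Illustrative 4x4 "crossbar" and input vector (made-up values).
weights = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
x = [1, 0, 2, 1]

y, cycles = ou_mvm(weights, x, ou_rows=2, ou_cols=2)
print(y, cycles)
```

In this toy model the cycle count is `ceil(rows / ou_rows) * ceil(cols / ou_cols)`, so a smaller OU means more cycles per crossbar. Real accelerators can run OUs in different crossbars in parallel and may skip tiles that hold no mapped weights, which this sketch does not model.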

SDGs

SDG8

Other Subjects

Analog to digital conversion; Convolutional neural networks; Rhenium compounds; RRAM; Scheduling; Communication overheads; Convolutional neural network; Matrix vector multiplication; Memory based computing; Operations units; Optimization framework; Resistive memory; Scheduling strategies; Spatial temporals; Unit-based; Convolution

Type

conference paper
