Intra- and Inter-Layer Transformation to Reduce Memory Traffic for CNN Computation
Journal
ACM International Conference Proceeding Series
Date Issued
2021
Author(s)
Abstract
Edge inference has gained much popularity in recent years, and many AI accelerators have been proposed and extensively studied. Such devices are often packed with a large number of PEs (Processing Elements) and a substantial amount of on-chip SRAM. The key to successful AI acceleration is to effectively reuse the data transferred from off-chip DRAM to on-chip SRAM. Most prior studies optimize the use of on-chip SRAM for a single convolution layer and tend to ignore the opportunity for inter-layer data reuse. We propose an algorithm to schedule the operations of two adjacent CNN layers together. Our goal is to reduce the traffic between DRAM and local memory below what is achievable when the buffer is allocated to only a single layer. Our cross-layer scheduling effectively reduces this memory traffic. We have also verified the validity of our memory-traffic reduction model on the Gemmini simulator from UC Berkeley. © 2021 ACM.
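For intuition, the sketch below is a minimal first-order traffic model, not the paper's actual scheduling algorithm: the layer shapes, the conv_traffic and fused_traffic helpers, and the assumption that each tensor crosses the DRAM boundary exactly once are all hypothetical. It only illustrates the core idea the abstract describes: when two adjacent convolution layers are scheduled together and their intermediate feature map stays in on-chip SRAM, one DRAM write and one DRAM read of that tensor disappear.

```python
# Hypothetical back-of-envelope model of DRAM traffic for two adjacent
# conv layers (stride 1, 'same' padding, 1-byte elements). It is NOT the
# paper's algorithm; it assumes every tensor moves across the DRAM
# boundary exactly once and ignores tiling halos and SRAM capacity.

def conv_traffic(h, w, c_in, c_out, k, elem_bytes=1):
    """DRAM traffic (bytes) for one conv layer processed on its own."""
    ifmap   = h * w * c_in * elem_bytes            # read input feature map
    weights = k * k * c_in * c_out * elem_bytes    # read filter weights
    ofmap   = h * w * c_out * elem_bytes           # write output feature map
    return ifmap + weights + ofmap

def fused_traffic(h, w, c0, c1, c2, k, elem_bytes=1):
    """DRAM traffic when two adjacent layers are scheduled together and
    the intermediate (h * w * c1) feature map never leaves on-chip SRAM."""
    ifmap   = h * w * c0 * elem_bytes
    weights = k * k * (c0 * c1 + c1 * c2) * elem_bytes
    ofmap   = h * w * c2 * elem_bytes
    return ifmap + weights + ofmap

if __name__ == "__main__":
    h = w = 56
    c0, c1, c2, k = 64, 64, 128, 3   # hypothetical ResNet-like shapes
    single = conv_traffic(h, w, c0, c1, k) + conv_traffic(h, w, c1, c2, k)
    fused  = fused_traffic(h, w, c0, c1, c2, k)
    print(f"per-layer scheduling: {single} bytes")
    print(f"fused scheduling:     {fused} bytes "
          f"({100 * (single - fused) / single:.1f}% less DRAM traffic)")
```

Under these assumptions the saving is exactly 2 * h * w * c1 bytes, the write plus re-read of the intermediate feature map; in practice the benefit depends on whether that tensor (or a tile of it) actually fits in the on-chip buffer, which is the trade-off the paper's cross-layer scheduler addresses.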
Subjects
Computer aided design
Scheduling algorithms
Static random access storage
Adjacent layers
Data reuse
Inter-layers
Intra-layer
Layer data
Layer transformation
Local memory
Off-chip
On chips
Processing elements
Dynamic random access storage
Type
conference paper