|Title:||Intra- and Inter-Layer Transformation to Reduce Memory Traffic for CNN Computation||Authors:||Liao P.-W
|Keywords:||Computer aided design;Scheduling algorithms;Static random access storage;Adjacent layers;Data reuse;Inter-layers;Intra-layer;Layer data;Layer transformation;Local memory;Off-chip;On chips;Processing elements;Dynamic random access storage||Issue Date:||2021||Source:||ACM International Conference Proceeding Series||Abstract:||
Edge inference has gained much popularity in recent years, and many AI accelerators have been proposed and extensively studied. Such devices are often packed with a large number of PEs (Processing Elements) and a substantial amount of on-chip SRAM. The key to successful AI acceleration is to effectively reuse the data transferred from off-chip DRAM to the on-chip SRAM. Most prior studies optimize the use of on-chip SRAM for a single convolution layer and tend to ignore the opportunity for inter-layer data reuse. We have proposed an algorithm to schedule two adjacent layers of CNN operations. Our goal is to reduce the traffic between DRAM and local memory beyond what can be achieved by allocating the buffer to only a single layer. Our cross-layer scheduling effectively reduces the memory traffic. We have also verified the validity of our memory traffic reduction model on the Gemmini simulator from UC Berkeley. © 2021 ACM.
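The traffic saving the abstract describes can be illustrated with a minimal, hypothetical model (not the paper's actual formulation): in a layer-by-layer schedule, the intermediate feature map between two adjacent convolution layers is written to DRAM and read back, while a cross-layer schedule keeps it in on-chip SRAM. All sizes below are assumed example values, counted in elements.

```python
# Hypothetical DRAM-traffic model for two back-to-back CNN layers.
# Assumption: weights and feature maps are each transferred once per use;
# the fused schedule keeps the intermediate map entirely in on-chip SRAM.

def traffic_single_layer(inp, weights, out):
    # One layer processed alone: read input and weights, write output to DRAM.
    return inp + weights + out

def traffic_layer_by_layer(in0, w0, mid, w1, out):
    # Intermediate map 'mid' is written to DRAM by layer 0, read back by layer 1.
    return (traffic_single_layer(in0, w0, mid)
            + traffic_single_layer(mid, w1, out))

def traffic_fused(in0, w0, w1, out):
    # Cross-layer schedule: 'mid' never touches DRAM.
    return in0 + w0 + w1 + out

# Example: two 3x3 conv layers on 56x56x64 feature maps (hypothetical sizes).
in0 = mid = out = 56 * 56 * 64          # feature-map elements
w0 = w1 = 3 * 3 * 64 * 64               # weight elements per layer

baseline = traffic_layer_by_layer(in0, w0, mid, w1, out)
fused = traffic_fused(in0, w0, w1, out)
print(baseline - fused)  # saving = 2 * mid: one DRAM write plus one read avoided
```

Under this simplified model the saving is exactly two transfers of the intermediate feature map; the paper's scheduling algorithm additionally has to account for the SRAM capacity needed to hold the tiles of both layers at once.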
|Appears in Collections:||Department of Computer Science and Information Engineering|