Title: Intra- and Inter-Layer Transformation to Reduce Memory Traffic for CNN Computation
Authors: Liao, P.-W.; Hsu, Wei-Chung; Liao, Shih-Wei
Type: Conference paper
Year: 2021
Date added: 2022-04-25
DOI: 10.1145/3458744.3473353
Scopus ID: 2-s2.0-85115975157
Scopus URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85115975157&doi=10.1145%2f3458744.3473353&partnerID=40&md5=b68ae2dfad5c8216bbb297f07a612b45
Repository URL: https://scholars.lib.ntu.edu.tw/handle/123456789/607462

Abstract:
Edge inference has gained much popularity in recent years, and many AI accelerators have been proposed and extensively studied. Such devices are often packed with a large number of processing elements (PEs) and ample on-chip SRAM. The key to successful AI acceleration is to effectively use the data transferred from off-chip DRAM to the on-chip SRAM. Most prior studies optimize the use of on-chip SRAM for a single convolution layer and tend to ignore the opportunity for inter-layer data reuse. We propose an algorithm that schedules two adjacent layers of CNN operations together, with the goal of reducing traffic between DRAM and local memory beyond what is achievable by allocating the buffer to a single layer. Our cross-layer scheduling effectively reduces memory traffic. We have also verified the validity of our memory traffic reduction model on the Gemmini simulator from UC Berkeley. © 2021 ACM.

Keywords: Computer aided design; Scheduling algorithms; Static random access storage; Dynamic random access storage; Adjacent layers; Data reuse; Inter-layers; Intra-layer; Layer data; Layer transformation; Local memory; Off-chip; On chips; Processing elements
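The abstract's core idea, scheduling two adjacent layers together so the intermediate feature map never round-trips through DRAM, can be illustrated with a first-order traffic estimate. The Python sketch below is our own illustration, not the paper's actual model: the names (ConvLayer, traffic_per_layer, traffic_fused) are hypothetical, it assumes stride-1 convolutions with same-size padding, and it assumes the fused intermediate fits entirely in on-chip SRAM, ignoring tiling halos and weight re-fetches.

```python
# Illustrative first-order DRAM-traffic model for two adjacent CNN layers.
# Compares per-layer scheduling (intermediate feature map spilled to DRAM
# and read back) against cross-layer scheduling (intermediate kept in SRAM).
# All names and the cost model are assumptions, not the paper's formulation.

from dataclasses import dataclass

@dataclass
class ConvLayer:
    h: int      # output feature-map height
    w: int      # output feature-map width
    c_in: int   # input channels
    c_out: int  # output channels
    k: int      # kernel size (k x k)

    def weight_bytes(self, elem: int = 1) -> int:
        return self.k * self.k * self.c_in * self.c_out * elem

    def input_bytes(self, elem: int = 1) -> int:
        # Assumes stride 1 and 'same' padding, so the input plane
        # has roughly the same spatial size as the output plane.
        return self.h * self.w * self.c_in * elem

    def output_bytes(self, elem: int = 1) -> int:
        return self.h * self.w * self.c_out * elem

def traffic_per_layer(l1: ConvLayer, l2: ConvLayer) -> int:
    """Each layer runs alone: layer 1's output is written to DRAM
    and read back as layer 2's input."""
    t1 = l1.input_bytes() + l1.weight_bytes() + l1.output_bytes()
    t2 = l2.input_bytes() + l2.weight_bytes() + l2.output_bytes()
    return t1 + t2

def traffic_fused(l1: ConvLayer, l2: ConvLayer) -> int:
    """Both layers scheduled together: the intermediate feature map
    stays in on-chip SRAM, so its DRAM write and read-back vanish."""
    return (l1.input_bytes() + l1.weight_bytes()
            + l2.weight_bytes() + l2.output_bytes())

if __name__ == "__main__":
    # Hypothetical adjacent 3x3 conv layers; l1.c_out must equal l2.c_in.
    l1 = ConvLayer(h=56, w=56, c_in=64, c_out=64, k=3)
    l2 = ConvLayer(h=56, w=56, c_in=64, c_out=128, k=3)
    sep, fused = traffic_per_layer(l1, l2), traffic_fused(l1, l2)
    print(f"per-layer: {sep} B, fused: {fused} B, saved: {1 - fused/sep:.1%}")
```

For this hypothetical pair, fusion removes both DRAM transfers of the 56x56x64 intermediate map; in practice the saving depends on how the SRAM buffer is partitioned between the two layers' working sets, which is the allocation trade-off the abstract refers to.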