Title: Batch normalization processor design for convolution neural network training and inference
Authors: Ting, Y.-S.; Teng, Y.-F.; Chiueh, Tzi-Dar
Type: conference paper
Date Issued: 2021
Date Available: 2022-04-25
ISSN: 02714310
DOI: 10.1109/ISCAS51556.2021.9401434
Scopus ID: 2-s2.0-85109015226
Scopus URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85109015226&doi=10.1109%2fISCAS51556.2021.9401434&partnerID=40&md5=c3f74c4ce115b8e9ff9cdf56149befb7
Handle: https://scholars.lib.ntu.edu.tw/handle/123456789/607338

Abstract: In the training process of convolutional neural networks (CNN), a batch normalization (BN) layer is often inserted after a convolution layer to accelerate the convergence of CNN training. In this work, we propose a BN processor that supports both the training and inference processes. To speed up the training of CNN, the proposed work develops an efficient dataflow integrating a novel BN processor design and the processing elements for convolution acceleration. We exploited the similarities in the calculations required for the BN forward and backward passes by sharing hardware elements between both passes, thereby reducing the area overhead. In addition to functional verification of the BN processor, we also completed Automatic Placement & Routing (APR) and conducted post-APR simulation on neural network training. Finally, the proposed solution not only significantly speeds up the CNN training process, but also achieves hardware savings. © 2021 IEEE

Author Keywords: Accelerator; Batch normalization; Convolutional neural network; Hardware; Training
Indexed Keywords: Convolution; Convolutional neural networks; Integrated circuit design; Automatic placement; Convolution neural network; Forward-and-backward; Functional verification; Hardware elements; Inference process; Neural network training; Processing elements; Multilayer neural networks
SDGs: SDG3
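The abstract's key observation is that the BN forward and backward passes share intermediate quantities, which is what lets the processor reuse hardware between them. The sketch below (not from the paper; a minimal NumPy illustration of standard batch normalization math) marks the two terms, the inverse standard deviation and the normalized activations, that appear in both passes and are therefore candidates for shared datapath elements:

```python
import numpy as np

def bn_forward(x, gamma, beta, eps=1e-5):
    """BN forward pass over the batch axis; returns output and shared terms."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    inv_std = 1.0 / np.sqrt(var + eps)   # shared with the backward pass
    x_hat = (x - mu) * inv_std           # shared with the backward pass
    y = gamma * x_hat + beta
    return y, (x_hat, inv_std)

def bn_backward(dy, gamma, cache):
    """BN backward pass reusing inv_std and x_hat from the forward pass."""
    x_hat, inv_std = cache
    n = dy.shape[0]
    dgamma = (dy * x_hat).sum(axis=0)
    dbeta = dy.sum(axis=0)
    # dx is built entirely from dy plus the two cached forward-pass terms
    dx = (gamma * inv_std / n) * (n * dy - dbeta - x_hat * dgamma)
    return dx, dgamma, dbeta
```

In a hardware mapping, the multipliers and accumulators that produce `inv_std` and `x_hat` in the forward pass can serve the backward pass as well, which is the kind of sharing the abstract credits for the area reduction.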