Hardware Acceleration in Large-Scale Tensor Decomposition for Neural Network Compression
Journal
Midwest Symposium on Circuits and Systems
Journal Volume
2022-August
ISBN
9781665402798
Date Issued
2022-01-01
Author(s)
Abstract
A tensor is a multi-dimensional array that is commonly embedded in neural networks. The multiply-accumulate (MAC) operations involved in a large-scale tensor introduce high computational complexity. Since such a tensor usually has low rank, the computational complexity can be greatly reduced through canonical polyadic decomposition (CPD). This work presents an energy-efficient hardware accelerator that implements randomized CPD of large-scale tensors for neural network compression. A mixing method that combines the Walsh-Hadamard transform and the discrete cosine transform is proposed to replace the fast Fourier transform, offering faster convergence. It reduces the computations for the transformation by 83% and the computations for solving the required least-squares problem by 75%. The proposed accelerator is flexible, supporting tensor decomposition for sizes of up to 512 × 512 × 9 × 9. Compared with the prior dedicated processor for tensor computation, this work supports larger tensors and achieves 112× lower latency under the same conditions.
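As an illustration of the computation the abstract describes, below is a minimal NumPy sketch of canonical polyadic decomposition via alternating least squares (ALS) for a 3-way tensor. It shows the repeated least-squares solves that dominate CPD cost; it is not the paper's randomized, WHT/DCT-mixed, or hardware-mapped algorithm, and the helper functions (unfold, khatri_rao, cp_als), the rank, and the tensor sizes are illustrative assumptions.

```python
# Minimal CP-ALS sketch in plain NumPy (illustrative; not the paper's
# randomized or hardware-accelerated algorithm).
import numpy as np

def unfold(T, mode):
    """Mode-n matricization: move the chosen axis to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(A, B):
    """Column-wise Kronecker (Khatri-Rao) product of A (I x R) and B (J x R) -> (IJ x R)."""
    R = A.shape[1]
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, R)

def cp_als(T, rank, iters=50, seed=0):
    """Return factors (A, B, C) such that T is approximated by sum_r a_r o b_r o c_r."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    for _ in range(iters):
        # Each update is a linear least-squares solve for one factor with the
        # other two fixed; the Khatri-Rao ordering matches the C-order unfolding.
        A = unfold(T, 0) @ np.linalg.pinv(khatri_rao(B, C)).T
        B = unfold(T, 1) @ np.linalg.pinv(khatri_rao(A, C)).T
        C = unfold(T, 2) @ np.linalg.pinv(khatri_rao(A, B)).T
    return A, B, C

# Usage: decompose a synthetic rank-3 tensor and report the reconstruction error.
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal((n, 3)) for n in (20, 16, 12))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cp_als(T, rank=3, iters=100)
err = np.linalg.norm(T - np.einsum('ir,jr,kr->ijk', A, B, C)) / np.linalg.norm(T)
print(f"relative reconstruction error: {err:.2e}")
```

The pseudoinverse of the Khatri-Rao product is used here for clarity; practical implementations, including randomized variants like the one the abstract mentions, reduce this least-squares cost by sketching or by forming the much smaller Gram-matrix system instead.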
Subjects
canonical polyadic decomposition | hardware acceleration | neural network compression | tensor decomposition
Type
conference paper
