Title: Tensor Movement Orchestration in Multi-GPU Training Systems
Authors: Lin, Shao-Fu; Chen, Yi-Jung; Cheng, Hsiang-Yun; Yang, Chia-Lin
Issue Date: 1-Jan-2023
Volume: 2023-February
Source Publication: Proceedings - International Symposium on High-Performance Computer Architecture
Abstract: As deep neural network (DNN) models grow deeper and wider, one of the main challenges for training large-scale neural networks is overcoming limited GPU memory capacity. One common solution is to utilize the host memory as external memory for swapping tensors in and out of GPU memory. However, the effectiveness of such tensor swapping can be impaired in data-parallel training systems due to contention on the shared PCIe channel to the host. In this paper, we propose the first large-model support framework that coordinates tensor movements among GPUs to alleviate PCIe channel contention. We design two types of coordination mechanisms. In the first mechanism, PCIe channel accesses from different GPUs are interleaved by selecting disjoint swapped-out tensors for each GPU. In the second mechanism, swap commands are orchestrated to avoid contention. The effectiveness of these two mechanisms depends on the model size and how often the GPUs synchronize on gradients. Experimental results show that, compared to large-model support that is oblivious to channel contention, the proposed solution achieves average speedups of 38.3% to 31.8% when the memory footprint is 1.33 to 2 times the GPU memory size.
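The first coordination mechanism in the abstract assigns each GPU a disjoint subset of swap-out candidates so their PCIe transfers interleave rather than collide. A minimal sketch of one way such an assignment could work is shown below; the `Tensor` type, `assign_disjoint_swap_sets` function, and the greedy load-balancing policy are illustrative assumptions, not the paper's actual algorithm.

```python
# Hypothetical sketch: partition swap-out candidate tensors into disjoint
# per-GPU sets, balancing total transfer size so no single GPU dominates
# the shared PCIe channel. Names and policy are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class Tensor:
    name: str
    size_mb: int  # transfer size over the PCIe channel

def assign_disjoint_swap_sets(tensors, num_gpus):
    """Greedily assign each swap candidate to exactly one GPU,
    largest tensors first, always picking the least-loaded GPU."""
    sets = [[] for _ in range(num_gpus)]
    loads = [0] * num_gpus
    for t in sorted(tensors, key=lambda t: t.size_mb, reverse=True):
        g = loads.index(min(loads))  # least-loaded GPU takes the tensor
        sets[g].append(t.name)
        loads[g] += t.size_mb
    return sets

# Example: six activation tensors split across two GPUs.
tensors = [Tensor(f"act{i}", s) for i, s in enumerate([512, 256, 256, 128, 128, 64])]
print(assign_disjoint_swap_sets(tensors, 2))
# → [['act0', 'act3', 'act5'], ['act1', 'act2', 'act4']]
```

Because the sets are disjoint, each tensor crosses the PCIe channel from exactly one GPU, and balancing the per-GPU byte totals keeps the two GPUs' swap traffic roughly staggered in time.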
URI: https://scholars.lib.ntu.edu.tw/handle/123456789/630476
ISBN: 9781665476522
ISSN: 1530-0897
DOI: 10.1109/HPCA56546.2023.10071043
Appears in Collections: Department of Computer Science and Information Engineering
Items in this institutional repository are protected by copyright, with all rights reserved, unless otherwise indicated in their copyright terms.