Efficient Dual Batch Size Deep Learning for Distributed Parameter Server Systems

Journal
Proceedings - 2022 IEEE 46th Annual Computers, Software, and Applications Conference, COMPSAC 2022
ISBN
9781665488105
Date Issued
2022-01-01
Author(s)
Lu, Kuan Wei
Liu, Pangfeng
Hong, Ding Yong
Wu, Jan Jan
DOI
10.1109/COMPSAC54236.2022.00110
URI
https://scholars.lib.ntu.edu.tw/handle/123456789/632602
URL
https://api.elsevier.com/content/abstract/scopus_id/85136932825
Abstract
Distributed machine learning is essential for training deep learning models with large amounts of data and many parameters. Current research on distributed machine learning focuses on using more hardware devices with powerful computing units for fast training. Consequently, model training tends to use a larger batch size to accelerate training. However, large-batch training often suffers from poor accuracy due to poor generalization ability. Researchers have proposed many sophisticated methods to address this accuracy issue, but these methods usually involve complex mechanisms that make training more difficult. In addition, powerful training hardware for large batch sizes is expensive, and not all researchers can afford it. We propose a dual batch size learning scheme to address the batch size issue. We use the maximum batch size our hardware allows for the best training efficiency we can afford, and we introduce a smaller batch size during training to improve the model's generalization ability. Using two different batch sizes simultaneously in the same training reduces the test loss and achieves good generalization ability, with only a slight increase in training time. We implement our dual batch size learning scheme and conduct experiments. With a 5% increase in training time, we can reduce the loss from 1.429 to 1.246 in some cases; by appropriately adjusting the proportion of large and small batches, we can increase the accuracy by 2.8% in some cases. With a 10% increase in training time, we can reduce the loss from 1.429 to 1.193, and after moderately adjusting the numbers of large and small batches used by the GPUs, the accuracy can increase by 2.9%. Using two different batch sizes in the same training introduces two complications. First, the data processing speeds for the two batch sizes differ, so we must assign data to them in proportion to their speeds to maximize the overall processing speed. Second, since the smaller batches see less data because of this speed-based assignment, we proportionally adjust their contribution to the global weight update in the parameter server, using the ratio of data between the small and large batches. Experimental results indicate that this contribution adjustment increases the final accuracy by another 0.9%.
Subjects
batch size | deep neural networks | distributed learning | parameter server
Type
conference paper
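
The abstract describes two mechanisms: assigning data to the large-batch and small-batch workers in proportion to their processing speeds, and weighting the small-batch gradient's contribution to the parameter-server update by the data ratio. The Python sketch below illustrates one plausible reading of those two steps, assuming synchronous SGD on a single parameter server; assign_data, server_update, and all constants are hypothetical stand-ins, not the authors' implementation.

    import numpy as np

    def assign_data(num_samples, speed_large, speed_small):
        """Split an epoch's samples between the large-batch and small-batch
        workers in proportion to measured throughput (samples/sec), so both
        streams finish at roughly the same time."""
        n_large = round(num_samples * speed_large / (speed_large + speed_small))
        return n_large, num_samples - n_large

    def server_update(weights, grad_large, grad_small, n_large, n_small, lr=0.01):
        """Parameter-server step: scale the small-batch gradient by the ratio
        of data it processed relative to the large batch (the abstract's
        contribution adjustment), then apply a plain SGD update."""
        ratio = n_small / n_large
        return weights - lr * (grad_large + ratio * grad_small)

    # Toy usage: 10,000 samples; the large-batch worker is ~3x faster.
    n_large, n_small = assign_data(10_000, speed_large=3000.0, speed_small=1000.0)
    w = np.zeros(4)
    g_large, g_small = np.ones(4), 2 * np.ones(4)  # stand-in gradients
    w = server_update(w, g_large, g_small, n_large, n_small)
    print(n_large, n_small, w)  # 7500 2500, each weight moves by -0.0167

With a 3:1 speed ratio the large-batch worker receives 7,500 of the 10,000 samples, and the small-batch gradient enters the update scaled by 2500/7500 = 1/3, matching the share of data it actually saw.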
