Speaker: Li Shen
Research Scientist
JD Explore Academy
Host: Prof. Zhouchen Lin
School of Intelligence Science and Technology, Peking University
Time: 2023/10/19, 10:00 - 11:00
Venue: Room 115, Teaching Building, Changping Campus, Peking University / Room 1801, Science Building No. 1, Yanyuan Campus, Peking University
Tencent Meeting: 504-137-495
Title: On Efficient Training for Large-Scale Deep Learning Models
Abstract:
The field of deep learning has witnessed remarkable progress in recent years. In particular, large-scale models trained on vast amounts of data hold immense promise for practical applications and for enhancing industrial productivity. However, training such models suffers from unstable training dynamics, stringent computational resource requirements, and underexplored convergence analysis; for example, Adam, one of the most influential adaptive stochastic algorithms for training deep neural networks, has been shown to diverge even in the simple convex setting via a few simple counterexamples. In this talk, we systematically investigate the convergence theory and application of efficient training algorithms for pretraining large-scale deep learning models from the perspective of optimization. Specifically, (i) we derive the first easy-to-check sufficient condition, depending only on the base learning rate and the combination of historical second-order moments, that guarantees the global convergence of the Adam optimizer in the non-convex stochastic setting; this sufficient condition also yields a much deeper interpretation of the divergence of Adam. (ii) We theoretically show that distributed Adam can be linearly accelerated by using a larger number of nodes. (iii) We propose a communication-efficient variant of distributed Adam, dubbed Efficient-Adam, which adopts bi-directional compression to reduce communication cost and error-compensation techniques to reduce compression bias. (iv) We develop FedLADA, a novel momentum-based federated optimizer that combines global gradient descent with a locally amended adaptive optimizer to tackle the client drift exacerbated by local over-fitting when local adaptive optimizers are used in federated learning.
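To make the objects in the abstract concrete, below is a minimal NumPy sketch of (a) the vanilla Adam update whose convergence the talk analyzes and (b) a generic top-k compressor with error feedback, the kind of error-compensation mechanism that communication-efficient variants such as Efficient-Adam build on. The function names, hyperparameters, and top-k choice are illustrative assumptions, not the speaker's implementation.

    import numpy as np

    def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        """One vanilla Adam update (illustrative; not the speaker's variant)."""
        m = beta1 * m + (1 - beta1) * grad        # exponential moving average of gradients
        v = beta2 * v + (1 - beta2) * grad ** 2   # exponential moving average of squared gradients
        m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)              # bias-corrected second moment
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
        return theta, m, v

    def compress_with_error_feedback(x, residual, k):
        """Generic top-k compression with error feedback: the compression error is
        stored and added back in the next round so the bias does not accumulate."""
        corrected = x + residual                          # re-inject last round's error
        idx = np.argsort(np.abs(corrected))[-k:]          # keep the k largest-magnitude entries
        compressed = np.zeros_like(corrected)
        compressed[idx] = corrected[idx]
        new_residual = corrected - compressed             # error carried to the next step
        return compressed, new_residual

In a distributed setting, a compressor of this kind would be applied in both directions (workers to server and server to workers), which is the role of the bi-directional compression and error compensation described in the abstract.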
Speaker Bio:
Li Shen is currently a research scientist at JD Explore Academy, Beijing, China. Previously, he was a senior researcher at Tencent AI Lab. He received his bachelor's degree and Ph.D. from the School of Mathematics, South China University of Technology. His research interests include theory and algorithms for nonsmooth convex and nonconvex optimization and their applications in statistical machine learning, deep learning, and reinforcement learning. He has published more than 60 papers in peer-reviewed top-tier journals (JMLR, IEEE TPAMI, IJCV, IEEE TSP, IEEE TIP, etc.) and conferences (ICML, NeurIPS, ICLR, CVPR, ICCV, etc.). He has also served as a senior program committee member for AAAI 2022 and AAAI 2024, and as an area chair for ICPR 2022, ICPR 2024, and ICLR 2024.