Abstract
Kubernetes, a foundational technology for building deep learning cloud platforms, uses a load-balancing strategy during scheduling. This strategy tends to produce resource fragments and increase task waiting time. Kubernetes also does not score external extended resources such as GPUs, so it does not fit the business scenarios of deep learning cloud platforms well. To address these problems, a load saturation scheduling strategy was proposed that improves the Kubernetes scheduling process, reducing resource fragmentation and improving resource utilization. The strategy also supports scoring user-specified external extended resources, which makes it better suited to deep learning cloud platform scenarios. Experimental results showed that the load saturation scheduling strategy reduced task waiting time by 23.40% and increased GPU utilization and GPU memory utilization by 14.15% and 6.85%, respectively.
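The abstract does not give the paper's scoring formula, so the following is only a minimal sketch of why spreading causes fragmentation and packing avoids it. It assumes per-resource scores modeled on the formulas of the Kubernetes LeastAllocated (load balancing) and MostAllocated (packing) scoring strategies, combined as a weighted average in which a user-chosen extended resource such as `nvidia.com/gpu` takes part. The node names, capacities, the GPU weight of 3, and the helpers `Node`, `score`, `schedule`, and `make_nodes` are all hypothetical; this is not the authors' implementation.

```python
from dataclasses import dataclass, field

MAX_SCORE = 100  # node scores are normalized to 0..100, as in Kubernetes


@dataclass
class Node:
    name: str
    capacity: dict  # resource name -> allocatable amount
    allocated: dict = field(default_factory=dict)  # resource name -> amount in use

    def free(self, res):
        return self.capacity.get(res, 0) - self.allocated.get(res, 0)

    def fits(self, request):
        # Filter step: every requested resource must be available on the node.
        return all(self.free(r) >= q for r, q in request.items())


def score(node, request, weights, saturate):
    """Weighted 0..100 score for placing `request` on `node`.

    saturate=False: load balancing, prefers emptier nodes (LeastAllocated-like).
    saturate=True:  load saturation, prefers fuller nodes (MostAllocated-like).
    Every resource listed in `weights`, including an extended resource such
    as "nvidia.com/gpu", contributes to the score.
    """
    total = weight_sum = 0
    for res, w in weights.items():
        cap = node.capacity.get(res, 0)
        if cap == 0:
            continue  # node does not expose this resource
        used = node.allocated.get(res, 0) + request.get(res, 0)
        frac = used * MAX_SCORE // cap
        total += w * (frac if saturate else MAX_SCORE - frac)
        weight_sum += w
    return total // weight_sum if weight_sum else 0


def schedule(nodes, pods, weights, saturate):
    """Place pods in arrival order; None means the pod stays Pending (waits)."""
    placement = []
    for req in pods:
        candidates = [n for n in nodes if n.fits(req)]
        if not candidates:
            placement.append(None)
            continue
        # max() keeps the first node on ties, so ties go to list order.
        best = max(candidates, key=lambda n: score(n, req, weights, saturate))
        for r, q in req.items():
            best.allocated[r] = best.allocated.get(r, 0) + q
        placement.append(best.name)
    return placement


# Hypothetical cluster: two identical 4-GPU nodes, GPU weighted 3x CPU.
WEIGHTS = {"cpu": 1, "nvidia.com/gpu": 3}
PODS = [
    {"cpu": 4, "nvidia.com/gpu": 2},
    {"cpu": 4, "nvidia.com/gpu": 2},
    {"cpu": 8, "nvidia.com/gpu": 4},
]


def make_nodes():
    return [Node(n, {"cpu": 32, "nvidia.com/gpu": 4}) for n in ("node-a", "node-b")]


if __name__ == "__main__":
    print(schedule(make_nodes(), PODS, WEIGHTS, saturate=False))  # ['node-a', 'node-b', None]
    print(schedule(make_nodes(), PODS, WEIGHTS, saturate=True))   # ['node-a', 'node-a', 'node-b']
```

With load balancing, the two 2-GPU pods land on different nodes, leaving 2 free GPUs on each, so the 4-GPU pod fits nowhere and waits. With load saturation, the first node is filled and the second stays whole for the 4-GPU job. This is the fragmentation effect the abstract describes, in its simplest form.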
Authors
WANG Bincheng (王彬丞)
WANG Pinghui (王平辉)
WU Wenbo (武文博)
WANG Zhuang (王壮)
WANG Bin (王斌)
CONG Pengyu (丛鹏宇)
Department of Automation Science and Engineering, Xi'an Jiaotong University, Xi'an 710049, China; China Mobile Communication Corporation Research Institute, Beijing 100053, China
Source
Journal of Zhengzhou University: Natural Science Edition
Indexed in: CAS; Peking University core journal
2024, No. 4, pp. 21-27 (7 pages)
Funding
Ministry of Education-China Mobile Artificial Intelligence Construction Project (MCM20190701).
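About the authors
First author: WANG Bincheng (1998-), male, master's student, whose research focuses on Kubernetes scheduling and GPU scheduling. E-mail: wbc6080@163.com. Corresponding author: WANG Pinghui (1984-), male, professor, whose research focuses on mobile Internet security. E-mail: phwang@mail.xjtu.edu.cn.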