OASIS: An interference-aware online scheduling algorithm for deep learning jobs

Abstract: Because GPUs accelerate deep learning jobs, many researchers try to reduce job completion time by improving GPU utilization. Unlike the traditional approach in which a job occupies a GPU exclusively, this paper considers job colocation (executing multiple jobs simultaneously on the same GPU, which can effectively improve GPU utilization and reduce job completion time) and proposes an interference-aware online scheduling algorithm for deep learning jobs (OASIS). First, the algorithm uses an improved machine learning method to build a prediction model of the resources that jobs require under colocation. Second, to compute the interference between jobs, a job combination model is designed; the interference values it produces are used to proactively adjust the scheduling strategy and avoid ineffective scheduling, thereby reducing job completion time. Finally, experiments deployed in a real environment show that, compared with the classical FCFS, MBP, and SJF algorithms, OASIS reduces the average overall job completion time by 5.7% and the average energy consumption by 4.0%, which demonstrates the effectiveness and superiority of the algorithm.
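The abstract gives the scheduling idea only at a high level, so the following is a minimal illustrative sketch of interference-aware placement under colocation, not the OASIS algorithm itself. Every name here (`Job`, `GPU`, `interference`, `place`, `max_interference`) is hypothetical, the per-job `mem_gb` and `sm_util` values stand in for the output of a resource prediction model, and the interference value is a toy placeholder (compute oversubscription) rather than the paper's job combination model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    name: str
    mem_gb: float   # predicted GPU memory demand (assumed predictor output)
    sm_util: float  # predicted compute utilization in [0, 1] (assumed)


@dataclass
class GPU:
    gpu_id: int
    mem_capacity_gb: float
    jobs: List[Job] = field(default_factory=list)

    def mem_used(self) -> float:
        return sum(j.mem_gb for j in self.jobs)


def interference(gpu: GPU, job: Job) -> float:
    """Toy interference value: compute oversubscription if `job` joins `gpu`.

    Placeholder only; the paper's job combination model is not reproduced here.
    """
    total_util = sum(j.sm_util for j in gpu.jobs) + job.sm_util
    return max(0.0, total_util - 1.0)


def place(job: Job, gpus: List[GPU], max_interference: float = 0.2) -> Optional[GPU]:
    """Put `job` on the memory-feasible GPU with the lowest interference.

    Returns None (defer the job) when nothing fits or when even the best
    colocation would exceed `max_interference`.
    """
    feasible = [g for g in gpus if g.mem_used() + job.mem_gb <= g.mem_capacity_gb]
    if not feasible:
        return None
    # Ties are broken by GPU id so the choice is deterministic.
    best = min(feasible, key=lambda g: (interference(g, job), g.gpu_id))
    if interference(best, job) > max_interference:
        return None
    best.jobs.append(job)
    return best
```

In this sketch a job is deferred rather than colocated when the predicted interference is too high, which is one simple way to read "avoiding ineffective scheduling"; the threshold value is an arbitrary assumption.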

Authors: JING Chao; BI Yu-shen (College of Computer Science and Engineering, Guilin University of Technology, Guilin 541006; Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, Guilin 541006, China)

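Affiliations: College of Computer Science and Engineering, Guilin University of Technology; Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology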

Source: Computer Engineering & Science (CSCD; Peking University Core Journal), 2024, No. 12, pp. 2138-2148 (11 pages)

Funding: National Natural Science Foundation of China (62362018).

Keywords: deep learning; interference-aware; resource prediction model; online scheduling

Classification: TP302 [Automation and Computer Technology: Computer System Architecture]

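About the authors: JING Chao (born 1983), male, from Changge, Henan; PhD, associate professor, CCF member (41171M). Research interests: high-performance computing, energy-efficient scheduling, and power consumption prediction on heterogeneous computing systems. E-mail: jingchao@glut.edu.cn. BI Yu-shen (born 1997), male, from Nanning, Guangxi; master's student. Research interests: artificial neural networks, power consumption prediction, and scheduling on heterogeneous computing systems. E-mail: biyushen@glut.edu.cn.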
