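Abstract: As high-performance multicore processors are adopted in airliner avionics systems and directed acyclic graphs (DAGs) are used to model functional dependencies, this paper studies a single periodic non-preemptive DAG running on a homogeneous multiprocessor platform. The aim is to reduce the completion time of the DAG and to provide a tight and safe bound by fully exploiting two key factors of the DAG topology: parallelism and dependency. First, a Concurrent Parent and Children Model (CPCM) is introduced; it captures both factors precisely and can be applied recursively while parsing the DAG. Building on the CPCM, a new scheduling method is proposed to reduce the makespan, in which nodes are ordered by: 1) the critical path; 2) the earlier predecessor paths of the critical path; 3) longer paths. Second, a new response time analysis is proposed that provides a general bound for any execution order of the non-critical nodes and a specific bound for a fixed execution order. Experiments show that the proposed schedulability analysis outperforms other methods.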
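The ordering above starts from the critical path. As a minimal sketch of that first ingredient only, the following computes the longest node-weighted path of a DAG. The function name, the dictionary-based graph encoding, and the example execution times are assumptions made for illustration; this is not the CPCM or the scheduling method of the paper.

```python
from graphlib import TopologicalSorter


def critical_path(wcet, preds):
    """Return (length, path) of the longest node-weighted path in a DAG.

    wcet:  dict mapping each node to its execution time (hypothetical values)
    preds: dict mapping each node to an iterable of its predecessors;
           every node is expected to appear as a key
    """
    finish = {}  # earliest finish time of each node with unlimited cores
    parent = {}  # predecessor on the longest path into each node
    for node in TopologicalSorter(preds).static_order():
        # Pick the predecessor that finishes last (None for source nodes).
        best = max(preds.get(node, ()), key=lambda p: finish[p], default=None)
        parent[node] = best
        finish[node] = wcet[node] + (finish[best] if best is not None else 0)

    # Walk back from the node with the latest finish time.
    end = max(finish, key=finish.get)
    length = finish[end]
    path = []
    while end is not None:
        path.append(end)
        end = parent[end]
    return length, path[::-1]


# Illustrative DAG: a -> {b, c}, {b, c} -> d, c -> e
example_wcet = {"a": 2, "b": 3, "c": 1, "d": 4, "e": 2}
example_preds = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"], "e": ["c"]}
print(critical_path(example_wcet, example_preds))
```

The critical path length is a lower bound on the completion time on any number of cores, which is why a scheduler would plausibly prioritize its nodes first.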
Abstract: The $P_k|\text{fix}|C_{\max}$ problem is a new scheduling problem based on multiprocessor parallel jobs, and it has been proved NP-hard for $k \geq 3$. This paper focuses on the case $k = 3$ and offers new observations and techniques for the $P_3|\text{fix}|C_{\max}$ problem. The concept of semi-normal schedulings is introduced, and a very simple linear-time algorithm, the Semi-normal Algorithm, is developed for constructing them. Using the method of classical Graham list scheduling, a thorough analysis of the optimal scheduling on a special instance is provided, which shows that the algorithm is an approximation algorithm with ratio 9/8 for any instance of the $P_3|\text{fix}|C_{\max}$ problem, improving the previous best ratio of 7/6 by M. X. Goemans.
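To make the problem setting concrete, here is a minimal sketch of greedy list scheduling for jobs that each require a fixed set of processors. It is written in the spirit of Graham list scheduling only; it is not the Semi-normal Algorithm and carries no 9/8 guarantee. The function name, the job encoding as `(duration, processors)` pairs, and the example instance are assumptions.

```python
def list_schedule_fixed(jobs, k=3):
    """Greedy list scheduling for jobs with fixed processor sets.

    jobs: list of (duration, processors), where processors is a non-empty
          set of indices in range(k) that the job occupies simultaneously
    Returns (makespan, start_times), with start_times in list order.
    """
    free_at = [0] * k  # time at which each processor becomes idle
    starts = []
    for duration, procs in jobs:
        # A job can start only when all of its processors are idle.
        start = max(free_at[p] for p in procs)
        for p in procs:
            free_at[p] = start + duration
        starts.append(start)
    return max(free_at), starts


# Illustrative instance on three processors (values are made up).
example_jobs = [(2, {0, 1}), (3, {2}), (1, {1, 2}), (2, {0})]
print(list_schedule_fixed(example_jobs))
```

Jobs are placed strictly in list order without backfilling, so the resulting makespan depends on the order of the list.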
Funding: Project (2020JJ4032) supported by the Hunan Provincial Natural Science Foundation of China.
Abstract: Maintaining temporal consistency of real-time data is important for cyber-physical systems. Most previous studies focus on uniprocessor systems. In this paper, the problem of temporal consistency maintenance on multiprocessor platforms with instance skipping is formulated based on the (m,k)-constrained model. A partitioned scheduling method, SC-AD, is proposed to solve the problem. SC-AD uses a derived sufficient schedulability condition to calculate the initial value of m for each sensor transaction. It then partitions the transactions among the processors in a balanced way. To further reduce the average relative invalid time of real-time data, SC-AD judiciously increases the values of m for the transactions assigned to each processor. Experimental results show that SC-AD outperforms the baseline methods in terms of the average relative invalid time and the average valid ratio under different system workloads.
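As a rough illustration of instance skipping, the sketch below checks an execution pattern against one common reading of an (m,k) constraint: at least m of every k consecutive instances are executed. The abstract does not spell out the definition, so this reading, the function name, and the example patterns are all assumptions; the code is not part of SC-AD.

```python
def satisfies_mk(executed, m, k):
    """Check that every window of k consecutive instances has >= m executed.

    executed: sequence of booleans (True = instance executed, False = skipped)
    A sequence shorter than k contains no full window and passes vacuously.
    """
    if len(executed) < k:
        return True
    window = sum(executed[:k])  # executed count in the current window
    if window < m:
        return False
    for i in range(k, len(executed)):
        # Slide the window one instance to the right.
        window += executed[i] - executed[i - k]
        if window < m:
            return False
    return True


print(satisfies_mk([True, False, True, True, False, True], m=2, k=3))
print(satisfies_mk([True, False, False, True], m=2, k=3))
```

Under this reading, a larger m means fewer skipped instances, which fits the abstract's description of raising m to reduce the invalid time of the data.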
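Abstract: Ever-deeper models and inconsistent input sequence lengths pose great challenges to optimizing the performance of recurrent neural networks (RNNs) on different processors. For the independently developed long vector processor FT-M7032, an efficient RNN acceleration engine is implemented. The engine uses a row-major matrix-vector multiplication algorithm and a data-aware multicore parallelization scheme to improve the efficiency of matrix-vector multiplication; it applies a two-level kernel fusion optimization to reduce the overhead of temporary data transfers; and it uses hand-written assembly to optimize multiple operators, further exploiting the performance potential of the long vector processor. Experiments show that the RNN inference engine on the long vector processor achieves high performance: compared with a multicore ARM CPU and an Intel Golden CPU, the long short-term memory (LSTM) network, an RNN-like model, achieves speedups of up to 62.68 times and 3.12 times, respectively.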
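For readers unfamiliar with the term, the following is a plain reference sketch of matrix-vector multiplication over a matrix stored in row-major order, where each row is contiguous in memory. It only illustrates the access pattern; the function name and the flat-list layout are assumptions, and it does not reproduce the vectorized FT-M7032 kernel described in the abstract.

```python
def matvec_row_major(a, rows, cols, x):
    """Compute y = A x for A stored as a flat row-major list.

    a:    flat list of length rows * cols; element (r, c) is a[r * cols + c]
    x:    input vector of length cols
    """
    if len(a) != rows * cols or len(x) != cols:
        raise ValueError("shape mismatch")
    y = [0.0] * rows
    for r in range(rows):
        base = r * cols  # offset of the first element of row r
        acc = 0.0
        for c in range(cols):
            # Consecutive c values touch consecutive memory locations.
            acc += a[base + c] * x[c]
        y[r] = acc
    return y


# 2 x 3 example matrix [[1, 2, 3], [4, 5, 6]] times [1, 0, -1]
print(matvec_row_major([1, 2, 3, 4, 5, 6], 2, 3, [1, 0, -1]))
```

The inner loop walks each row contiguously, which is the property a long vector unit can exploit with wide loads.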