This paper links parallel factor (PARAFAC) analysis to the problem of nominal direction-of-arrival (DOA) estimation for coherently distributed (CD) sources and proposes a fast PARAFAC-based algorithm built on a trilinear PARAFAC model. Relying on the uniqueness of the low-rank three-way array decomposition and on trilinear alternating least squares regression, the proposed algorithm achieves nominal DOA estimation and outperforms the conventional ESPRIT-CD (estimation of signal parameters via rotational invariance techniques for CD sources) and PM-CD (propagator method for CD sources) methods in estimation accuracy. Furthermore, by initializing the iteration with the propagator method, the paper accelerates the convergence of the proposed algorithm without degrading estimation performance. In addition, the proposed algorithm can be applied directly to the multiple-source scenario in which sources have different angular distribution shapes. Numerical simulation results corroborate the effectiveness and superiority of the proposed fast PARAFAC-based algorithm.
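As a rough illustration of the trilinear alternating least squares step mentioned in the abstract, the following NumPy sketch fits a rank-R PARAFAC model X[i, j, k] = sum_r A[i, r] B[j, r] C[k, r] to a three-way array. It is a generic textbook-style ALS under stated assumptions, not the authors' algorithm: the function names `khatri_rao` and `parafac_als` are hypothetical, the data are real-valued for simplicity, and the factors are initialized randomly rather than with the propagator method described in the paper.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product; row (u, v) is stored at index u * V.shape[0] + v."""
    R = U.shape[1]
    return np.einsum("ir,jr->ijr", U, V).reshape(-1, R)

def parafac_als(X, rank, n_iter=200, seed=0):
    """Fit X[i, j, k] ~ sum_r A[i, r] * B[j, r] * C[k, r] by alternating least squares."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    # Assumption: random initialization (the paper instead initializes via the propagator method).
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    # Three matrix unfoldings of the same array, one per factor.
    X1 = X.reshape(I, J * K)                      # rows i, columns (j, k)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)   # rows j, columns (i, k)
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)   # rows k, columns (i, j)
    for _ in range(n_iter):
        # Each update is a linear least-squares solve with the other two factors held fixed.
        A = X1 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = X2 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = X3 @ np.linalg.pinv(khatri_rao(A, B)).T
    return A, B, C
```

The recovered factor matrices are determined only up to column permutation and scaling, so a DOA estimator built on top of such a fit would have to extract the angles from quantities that are invariant to those ambiguities.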
The hybrid flow shop scheduling problem with unrelated parallel machines is a typical NP-hard combinatorial optimization problem that arises widely in the chemical, manufacturing, and pharmaceutical industries. In this work, a novel mathematical model for the hybrid flow shop scheduling problem with unrelated parallel machines (HFSPUPM) is proposed. Additionally, an effective hybrid estimation of distribution algorithm is proposed to solve the HFSPUPM, taking advantage of the features of the mathematical model. The optimization algorithm adopts a new individual representation method. The estimation of distribution algorithm (EDA) structure is used for global search, while the teaching-learning-based optimization (TLBO) strategy is used for local search. Based on the structure of the HFSPUPM, this work presents a series of discrete operations. Simulation results show the effectiveness of the proposed hybrid algorithm compared with other algorithms.
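To make the EDA global-search idea concrete, here is a minimal sketch of an estimation of distribution algorithm for a hybrid flow shop with unrelated parallel machines. Everything in it is an illustrative assumption rather than the paper's design: the job-permutation encoding, the earliest-finish decoding rule, the position-by-job probability matrix, and the names `makespan` and `eda` are hypothetical, and the TLBO local search is omitted entirely.

```python
import numpy as np

def makespan(perm, proc):
    """proc[s][m, j]: processing time of job j on machine m of stage s (unrelated machines)."""
    ready = np.zeros(proc[0].shape[1])      # time at which each job leaves the previous stage
    order = list(perm)
    for stage in proc:
        free = np.zeros(stage.shape[0])     # time at which each machine becomes free
        for j in order:
            finish = np.maximum(free, ready[j]) + stage[:, j]
            m = int(np.argmin(finish))      # assign the job to the earliest-finishing machine
            free[m] = ready[j] = finish[m]
        order.sort(key=lambda j: ready[j])  # next stage serves jobs in order of arrival
    return float(ready.max())

def eda(proc, pop=30, elite=10, gens=50, lr=0.3, seed=0):
    """Search over first-stage job permutations with a position-by-job probability model."""
    rng = np.random.default_rng(seed)
    n = proc[0].shape[1]
    P = np.full((n, n), 1.0 / n)            # P[pos, job]: probability of job at position pos
    best_perm, best_val = None, np.inf
    for _ in range(gens):
        samples = []
        for _ in range(pop):
            perm, avail = [], np.ones(n, dtype=bool)
            for pos in range(n):
                w = P[pos] * avail          # mask out jobs that are already placed
                j = int(rng.choice(n, p=w / w.sum()))
                perm.append(j)
                avail[j] = False
            samples.append((makespan(perm, proc), perm))
        samples.sort(key=lambda t: t[0])
        if samples[0][0] < best_val:
            best_val, best_perm = samples[0]
        # Re-estimate the distribution from the elite samples and blend it into P.
        freq = np.zeros((n, n))
        for _, perm in samples[:elite]:
            freq[np.arange(n), perm] += 1.0 / elite
        P = (1 - lr) * P + lr * freq
    return best_perm, best_val
```

In a hybrid scheme of the kind the abstract describes, a local search step would typically be applied to the sampled individuals before the probability model is updated.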
Current applications, consisting of multiple replicas, are packaged into lightweight containers together with their execution dependencies. Because the distribution efficiency of very large images dominates container startup time (e.g., for distributed deep learning applications), the image "warm-up" technique, which prefetches the images of these replicas to destination nodes in the cluster, has been proposed. However, the current image "warm-up" technique focuses solely on distributing identical images and is ineffective when different images must be distributed to destination nodes. To address this problem, this paper proposes Hound, a simple but efficient cluster image distribution system based on Docker. To support the diverse image distribution requests of cluster nodes, Hound additionally adopts node-level parallelism (i.e., downloading images to destination nodes in parallel) to further improve the efficiency of image distribution. The experimental results demonstrate that Hound outperforms Docker, CRI-O (a Kubernetes container runtime interface implementation), and Docker Compose in image distribution performance when cluster nodes request different images. Moreover, the high scalability of Hound is evaluated in a ten-node scenario.
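The node-level parallelism described above (downloading images to destination nodes in parallel) can be sketched with a thread pool. This is only an illustration under assumptions and says nothing about how Hound is actually implemented: the names `docker_pull` and `distribute` are hypothetical, and the default pull function assumes each node runs a Docker daemon reachable over SSH through the Docker CLI's `-H ssh://…` host option.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def docker_pull(node, image):
    """Assumption: the node's Docker daemon is reachable via `docker -H ssh://<node>`."""
    cmd = ["docker", "-H", f"ssh://{node}", "pull", image]
    return subprocess.run(cmd, capture_output=True, text=True).returncode == 0

def distribute(requests, pull=docker_pull, max_workers=8):
    """requests maps node -> list of images; nodes are served in parallel,
    while the images requested by one node are pulled one after another."""
    def serve(node):
        return {image: pull(node, image) for image in requests[node]}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(serve, requests)   # one task per destination node
        return dict(zip(requests, results))
```

Passing the pull function as a parameter keeps the scheduling logic separate from the transport, so the same sketch can be exercised with a stub that never touches a real cluster.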
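Reinforcement learning with human feedback (RLHF) is currently the mainstream method for aligning large language models (LLMs), yet its core optimization algorithm, proximal policy optimization (PPO), suffers from significant efficiency problems. PPO consists of three interrelated stages (generation, inference, and training), each with different computational characteristics. However, existing RLHF parallel frameworks execute all PPO stages sequentially under the same parallel strategy, which causes two problems: first, the generation stage cannot fully utilize computing resources, which hurts overall efficiency; second, the stages execute strictly serially and fail to exploit potential parallelism. To address these problems, this paper proposes Pipe-RLHF, a new RLHF parallel framework. The framework adaptively determines the optimal parallel strategy according to the computational characteristics of each stage, breaks the existing stage-serial paradigm, and uses an asynchronous PPO algorithm to exploit inter-stage parallelism. Specifically, it first introduces a delayed inter-batch pipeline parallelism method for the PPO generation stage, which significantly improves that stage's utilization of computing resources; second, it uses asynchronous PPO to relax the dependencies between stages and applies inter-stage parallelism to accelerate PPO; finally, for the overall optimization of PPO, it constructs a hierarchical parallel strategy space and proposes an optimization algorithm that searches for the optimal solution in that space. Performance evaluation on multiple large language models shows that, compared with existing methods, Pipe-RLHF achieves a speedup of up to 3.7×, which verifies the effectiveness and superiority of the framework.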
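The general idea of overlapping the generation, inference, and training stages across batches can be illustrated with a toy three-stage pipeline. This sketch is not Pipe-RLHF: the function `run_pipeline`, its callables, and the `max_lag` bound on how far an earlier stage may run ahead are all hypothetical, and real asynchronous PPO must additionally handle the policy staleness that such overlap introduces.

```python
import queue
import threading

def run_pipeline(batches, generate, infer, train, max_lag=1):
    """Run generate -> infer -> train so that different batches occupy different stages
    at the same time; bounded queues limit how far a stage can run ahead."""
    q_gen = queue.Queue(maxsize=max_lag)
    q_inf = queue.Queue(maxsize=max_lag)
    done = object()                      # sentinel marking the end of the stream
    results = []

    def gen_worker():
        for batch in batches:
            q_gen.put(generate(batch))   # blocks once it is max_lag batches ahead
        q_gen.put(done)

    def inf_worker():
        while (item := q_gen.get()) is not done:
            q_inf.put(infer(item))
        q_inf.put(done)

    threads = [threading.Thread(target=gen_worker), threading.Thread(target=inf_worker)]
    for t in threads:
        t.start()
    # The training stage runs in the calling thread and consumes batches in order.
    while (item := q_inf.get()) is not done:
        results.append(train(item))
    for t in threads:
        t.join()
    return results
```

With `max_lag=1`, generation of the next batch can start while the current batch is still being scored or trained on, which is the kind of inter-stage overlap a strictly serial schedule rules out.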
Funding (PARAFAC-based DOA estimation): supported by the National Natural Science Foundation of China (61371169, 61601167), the Jiangsu Natural Science Foundation (BK20161489), the open research fund of the State Key Laboratory of Millimeter Waves, Southeast University (K201826), and the Fundamental Research Funds for the Central Universities (NE2017103).
Funding (hybrid flow shop scheduling): Projects (61573144, 61773165, 61673175, 61174040) supported by the National Natural Science Foundation of China; Project (222201717006) supported by the Fundamental Research Funds for the Central Universities, China.
Funding (Hound image distribution): supported by the National Natural Science Foundation of China (61872423), the Industry Prospective Primary Research & Development Plan of Jiangsu Province (BE2017111), the Scientific Research Foundation of the Higher Education Institutions of Jiangsu Province (19KJA180006), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX20_0764).