Funding: This project was supported by the National Natural Science Foundation of China (No. 19871080).
Abstract: This paper constructs a class of real-time parallel modified Rosenbrock methods for the numerical simulation of stiff dynamic systems on a multiprocessor system, and discusses the convergence and numerical stability of these methods. In particular, an A-stable two-stage third-order real-time parallel formula and an A(α)-stable three-stage fourth-order real-time parallel formula with α ≈ 89.96° are given. Numerical simulation experiments in a parallel environment show that this class of algorithms is efficient and applicable, with greater speedup.
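To make the stability property concrete, the sketch below applies the simplest one-stage member of the Rosenbrock family (the linearly implicit Euler method) to a stiff scalar test equation and compares it with explicit Euler. It is only an illustration of why linearly implicit formulas suit stiff systems; it is not the parallel two-stage or three-stage formula of the paper, and the function names are hypothetical.

```python
def rosenbrock_euler(f, dfdy, y, h, steps):
    """One-stage Rosenbrock (linearly implicit Euler) for a scalar autonomous ODE y' = f(y)."""
    for _ in range(steps):
        # Solve the linear stage equation (1 - h*J) k = f(y), with J = df/dy at y.
        k = f(y) / (1.0 - h * dfdy(y))
        y = y + h * k
    return y


def explicit_euler(f, y, h, steps):
    """Explicit Euler, shown only for comparison on the same stiff problem."""
    for _ in range(steps):
        y = y + h * f(y)
    return y


# Stiff test equation y' = -1000 y with a step size far above the explicit stability limit.
f = lambda y: -1000.0 * y
dfdy = lambda y: -1000.0
y_rosenbrock = rosenbrock_euler(f, dfdy, 1.0, 0.1, 10)  # decays towards 0
y_explicit = explicit_euler(f, 1.0, 0.1, 10)            # grows without bound
```

With step size 0.1 each linearly implicit step multiplies the solution by 1/101, while each explicit step multiplies it by -99.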
Funding: This paper was supported by the Ph.D. Foundation of the State Education Commission of China.
Abstract: Suppose that the branch-and-bound (B&B) algorithm finds the first optimal solution after h nodes have been expanded and m active nodes have been created in the state-space tree. Under this assumption, the paper proposes a lower bound of Ω(m + h log h) on the running time of the general sequential B&B algorithm and a lower bound of Ω(m/p + h log p) for the general parallel best-first B&B algorithm on the PRAM-CREW model, where p is the number of available processors. A lower bound of Ω(M/p + H + (H/p) log(H/p)) is also presented for parallel algorithms on distributed-memory systems, where M and H are the total numbers of active nodes and expanded nodes processed by the p processors, respectively. In addition, a nearly fastest general parallel best-first B&B algorithm is put forward. This parallel algorithm is the fastest when p = max{h^ε, r}, where ε = 1/√(log h) and r is the largest branching number of the nodes in the state-space tree.
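A minimal sequential best-first B&B sketch may help relate the quantities h and m to an implementation. It keeps active nodes in a binary heap and counts expanded and created nodes; every expansion pops the most promising node from the heap, which is one way to see where a logarithmic factor per expanded node can come from. The 0/1 knapsack problem, the fractional bound, and all names are illustrative assumptions, not taken from the paper.

```python
import heapq


def knapsack_best_first(values, weights, capacity):
    """Best-first B&B for 0/1 knapsack (weights assumed positive).

    Returns (best_value, expanded_nodes, created_nodes).
    """
    # Sort by value density so the greedy fractional fill is a valid upper bound.
    items = sorted(zip(values, weights), key=lambda vw: vw[0] / vw[1], reverse=True)

    def bound(level, value, room):
        # Optimistic estimate: add remaining items greedily, allowing one fractional item.
        for v, w in items[level:]:
            if w <= room:
                room -= w
                value += v
            else:
                return value + v * room / w
        return value

    best, expanded, created = 0, 0, 1
    # Heap entries: (-upper_bound, level, value_so_far, remaining_capacity).
    heap = [(-bound(0, 0, capacity), 0, 0, capacity)]
    while heap:
        neg_bound, level, value, room = heapq.heappop(heap)
        if -neg_bound <= best:
            break  # best-first order: no remaining node can beat the incumbent
        expanded += 1
        v, w = items[level]
        for take in (True, False):
            if take and w > room:
                continue  # infeasible child
            new_value = value + v if take else value
            new_room = room - w if take else room
            best = max(best, new_value)  # every partial selection is feasible
            child_bound = bound(level + 1, new_value, new_room)
            if child_bound > best:
                heapq.heappush(heap, (-child_bound, level + 1, new_value, new_room))
                created += 1
    return best, expanded, created


result = knapsack_best_first([60, 100, 120], [10, 20, 30], 50)
```

Here `result[0]` is the optimal value, and `result[1]` and `result[2]` play the roles of h and m for this small instance.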
Abstract: For a large-scale adaptive array, the heavy computational load and the high-rate data transmission are two challenges in implementing an adaptive digital beamforming system. An efficient parallel digital beamforming (DBF) algorithm based on the least mean square (LMS) algorithm, called PLMS, is proposed. A suitable method is found to partition the LMS algorithm into a number of operational modules that can easily be executed in a distributed parallel processing fashion. As a result, the proposed PLMS algorithm alleviates the bottleneck of high-rate data transmission and reduces the computational cost. PLMS requires less computation than the conventional parallel algorithms based on the recursive least squares (RLS) algorithm, and it is easier to implement for real-time adaptive array processing. Moreover, low sidelobes of the beam pattern are obtained by constraining the static steering vector with Tschebyscheff coefficients. Finally, a scheme for running the PLMS algorithm on a distributed parallel processing system is also proposed. The simulation results demonstrate that the PLMS algorithm has the same interference cancellation performance as the conventional LMS algorithm. Moreover, the PLMS algorithm achieves the same beamforming performance regardless of how the algorithm is partitioned. The proposed algorithm is expected to be used in large-scale adaptive array systems for real-time adaptive digital beamforming.
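The idea of partitioning LMS into modules can be sketched as follows. Each block of taps computes a partial inner product, the partial sums are combined into one error value, and each block then updates only its own weights. This is a simplified real-valued illustration under assumed names (`lms_partitioned`, `n_blocks`); the paper's actual partition, its complex-valued array data, and its steering-vector constraint are not reproduced here.

```python
import random


def lms_partitioned(samples, desired, n_taps, mu, n_blocks):
    """Standard LMS update with the taps split into contiguous blocks."""
    bounds = [(b * n_taps // n_blocks, (b + 1) * n_taps // n_blocks)
              for b in range(n_blocks)]
    w = [0.0] * n_taps
    for x, d in zip(samples, desired):
        # Each block could run on its own processor: local partial output.
        partial = [sum(w[i] * x[i] for i in range(lo, hi)) for lo, hi in bounds]
        # Only the scalar partial sums need to be exchanged to form the error.
        e = d - sum(partial)
        # Each block updates its own weights with the shared error.
        for lo, hi in bounds:
            for i in range(lo, hi):
                w[i] += mu * e * x[i]
    return w


# Noise-free identification of a hypothetical 4-tap weight vector.
rng = random.Random(0)
true_w = [0.5, -0.3, 0.2, 0.1]
samples = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(3000)]
desired = [sum(t * xi for t, xi in zip(true_w, x)) for x in samples]

w_one_block = lms_partitioned(samples, desired, 4, 0.1, 1)
w_two_blocks = lms_partitioned(samples, desired, 4, 0.1, 2)
```

Up to floating-point rounding, the one-block and two-block runs produce the same weights, which mirrors the claim that performance does not depend on the partition.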
Abstract: To address premature convergence in the search process of the genetic algorithm, a chaotic migration-based pseudo-parallel genetic algorithm (CMPPGA) is proposed. It applies the ideas of isolated evolution and information exchange from the distributed parallel genetic algorithm within a serial program structure, in order to solve optimization problems with low real-time demands. In this algorithm, the asynchronous migration of individuals during parallel evolution is guided by a chaotic migration sequence. Because the sequence is ergodic and stochastic, information exchange among sub-populations is efficient and sufficient. A simulation study of CMPPGA shows its strong global search ability, its superiority to the standard genetic algorithm, and its high immunity to premature convergence. Based on the practice of raw material supply, an inventory programming model is set up and solved by CMPPGA, with satisfactory results.
Abstract: A general and efficient parallel approach is proposed for the first time to parallelize the hybrid finite element-boundary integral-multilevel fast multipole algorithm (FE-BI-MLFMA). Among the many FE-BI-MLFMA algorithms, the decomposition algorithm (DA) is chosen as the basis for parallelization because its numerical characteristics are well suited to it. On the basis of the DA, FE-BI-MLFMA is parallelized by employing the parallelized multifrontal method for the matrix from the finite element method and the parallelized MLFMA for the matrix from the boundary integral method. The proposed parallel approach is programmed and tested on the high-performance computing platform CEMS-Liuhui. Numerical experiments demonstrate that FE-BI-MLFMA is efficiently parallelized and that its computational capacity is greatly improved without losing accuracy, efficiency, or generality.
Abstract: The Pk|fix|Cmax problem is a new scheduling problem based on multiprocessor parallel jobs, and it has been proved to be NP-hard when k ≥ 3. This paper focuses on the case k = 3 and offers some new observations and new techniques for the P3|fix|Cmax problem. The concept of semi-normal schedulings is introduced, and a very simple linear-time algorithm, the Semi-normal Algorithm, is developed for constructing them. Using the method of classical Graham list scheduling, a thorough analysis of the optimal scheduling on a special instance is provided, which shows that the algorithm is an approximation algorithm with ratio 9/8 for any instance of the P3|fix|Cmax problem, improving the previous best ratio of 7/6 by M. X. Goemans.
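For readers unfamiliar with the problem, the following sketch shows plain list scheduling adapted to jobs that each require a fixed set of processors: whenever processors become idle, the list is scanned in order and every job whose processors are all free is started. It illustrates the problem setting only; it is not the Semi-normal Algorithm, it carries no 9/8 guarantee, and the instance and names are made up for the example.

```python
def list_schedule(jobs, n_procs=3):
    """Greedy list scheduling for jobs with fixed processor sets.

    jobs: list of (processor_indices, duration) with positive durations.
    Returns (makespan, start_times).
    """
    free_at = [0] * n_procs          # time at which each processor becomes idle
    start = [None] * len(jobs)
    remaining = len(jobs)
    t = 0
    while remaining:
        # Scan the list in order and start every job that fits at time t.
        for j, (procs, dur) in enumerate(jobs):
            if start[j] is None and all(free_at[p] <= t for p in procs):
                start[j] = t
                for p in procs:
                    free_at[p] = t + dur
                remaining -= 1
        if remaining:
            # Advance to the next moment at which some processor is released.
            t = min(f for f in free_at if f > t)
    return max(free_at), start


jobs = [({0, 1}, 2), ({2}, 3), ({0, 1, 2}, 1), ({1, 2}, 2), ({0}, 2)]
makespan, starts = list_schedule(jobs)
```

On this instance the job needing all three processors is delayed until the others finish, which shows how fixed processor requirements create idle time that a good algorithm must limit.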
Funding: This work was supported by the National Basic Research Program of China (973 Program) (61320).
Abstract: The method of establishing data structures plays an important role in the efficiency of the parallel multilevel fast multipole algorithm (PMLFMA). Considering the main components of the memory used by the multilevel fast multipole algorithm (MLFMA), a new parallelization strategy and a modified octree data construction scheme are proposed to further reduce communication and thereby improve parallel efficiency. For far interactions, a new scheme called dynamic memory allocation is developed. To analyze the workload balancing performance of a parallel implementation, the original concept of a workload balancing factor is introduced and verified by numerical examples. Numerical results show that these measures improve the parallel efficiency and are suitable for the analysis of electrically large scattering objects.
Funding: This work was supported by the National Natural Science Foundation of China (61771293) and the Key Project of Shandong Province (2019JZZY010111).
Abstract: As a typical NP-complete problem, the traveling salesman problem (TSP) is widely used in computer networks, logistics distribution, and other fields. In this paper, a discrete lion swarm optimization (DLSO) algorithm is proposed to solve the TSP. First, discrete coding and order crossover operators are introduced in DLSO. Second, the complete 2-opt (C2-opt) algorithm is used to enhance the local search ability. Then, to improve the efficiency of the algorithm, a parallel discrete lion swarm optimization (PDLSO) algorithm is proposed. PDLSO has multiple populations, and each sub-population runs the DLSO algorithm independently in parallel. A ring topology is used to transfer information between sub-populations. Experiments on several TSP benchmark problems show that the DLSO algorithm is more accurate than other algorithms and that the PDLSO algorithm effectively shortens the running time.
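The local search step can be illustrated with the textbook 2-opt move, which removes two edges of a tour and reconnects it by reversing the segment between them whenever that shortens the tour. The sketch below is the standard 2-opt procedure on Euclidean points, given as an assumption about the general idea; the paper's C2-opt variant and the lion swarm operators are not reproduced.

```python
import math


def tour_length(tour, pts):
    """Length of the closed tour visiting pts in the given order."""
    n = len(tour)
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))


def two_opt(tour, pts):
    """Apply improving 2-opt moves until no move shortens the tour."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share a city
                a, b = pts[tour[i]], pts[tour[i + 1]]
                c, d = pts[tour[j]], pts[tour[(j + 1) % n]]
                # Change in length if edges (a,b) and (c,d) become (a,c) and (b,d).
                delta = (math.dist(a, c) + math.dist(b, d)
                         - math.dist(a, b) - math.dist(c, d))
                if delta < -1e-12:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour


pts = [(0, 0), (1, 1), (1, 0), (0, 1)]     # corners of the unit square
crossed = [0, 1, 2, 3]                     # tour that uses both diagonals
untangled = two_opt(crossed, pts)
```

On the unit square, the crossing tour of length 2 + 2√2 is untangled into the perimeter tour of length 4.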
Abstract: The classical Hungarian algorithm can only solve problems in which the total cost is the sum of the costs of the individual jobs. To overcome this shortcoming, an improved Hungarian algorithm is proposed and used to solve the assignment problem of serial-parallel systems. First, by replacing parallel jobs with virtual jobs, the proposed algorithm converts the serial-parallel system into a pure serial system, in which the classical Hungarian algorithm can be used to generate a temporary assignment plan via optimization. Next, the assignment plan is validated by checking, through local search, whether the virtual jobs can be realized by real jobs. If the assignment plan is not valid, the converted system is adapted by adjusting the parameters of the virtual jobs and is then optimized again. Through this iterative search, a valid optimal assignment plan can eventually be obtained. To evaluate the proposed algorithm, the valid optimal assignment plan is applied to labor allocation in a manufacturing system, which is a typical serial-parallel system.
Funding: This work was supported by the National Natural Science Foundation of China (70631003) and the Hefei University of Technology Foundation (071102F).
Abstract: A class of nonidentical parallel machine scheduling problems is considered in which the goal is to minimize the total weighted completion time. Models and relaxations are collected. Most of these problems are either NP-hard in the strong sense or open, so approximation algorithms are studied. The review reveals several areas that are worthy of further research.
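The objective can be made concrete with a small evaluator. Given a fixed assignment of jobs to machines, sequencing each machine in nondecreasing order of processing time divided by weight (Smith's rule) minimizes that machine's weighted completion time, so the hard part of the problem lies in choosing the assignment. The sketch below assumes an unrelated-machines model with a hypothetical `proc_time[i][j]` matrix and positive weights; it evaluates one assignment and is not an approximation algorithm from the review.

```python
def weighted_completion(assignment, proc_time, weight):
    """Total weighted completion time of a job-to-machine assignment.

    assignment[j] is the machine of job j; proc_time[i][j] is the processing
    time of job j on machine i; weight[j] > 0 is the weight of job j.
    """
    per_machine = {}
    for j, i in enumerate(assignment):
        per_machine.setdefault(i, []).append(j)

    total = 0
    for i, jobs in per_machine.items():
        # Smith's rule: smallest processing-time-to-weight ratio first.
        jobs.sort(key=lambda j: proc_time[i][j] / weight[j])
        t = 0
        for j in jobs:
            t += proc_time[i][j]      # completion time of job j on machine i
            total += weight[j] * t
    return total


proc_time = [[2, 4, 3],   # machine 0
             [3, 2, 6]]   # machine 1
weight = [1, 2, 3]
cost = weighted_completion([0, 1, 0], proc_time, weight)
```

In this example machine 0 runs job 2 before job 0, contributing 3·3 + 1·5, and machine 1 runs job 1, contributing 2·2.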
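Abstract: Existing direction-of-arrival (DOA) estimation methods have low estimation accuracy under conditions of low signal-to-noise ratio, few snapshots, and multiple sources. To address this problem, a DOA estimation method based on the parallel coordinate descent algorithm is proposed. First, the spatial domain is divided uniformly into equal angular intervals to construct an overcomplete redundant dictionary. Second, the sparse signal is reconstructed following the idea of the parallel coordinate descent algorithm, which yields the sparse coefficient matrix of the signal in the spatial domain. Finally, the l2-norms of the row vectors of the sparse matrix are mapped onto the spatial grid to obtain accurate DOA estimates. Simulation results show that, under low signal-to-noise ratio, few snapshots, and multiple sources, the method outperforms subspace-based, greedy, and convex-optimization-based algorithms, with a lower root mean square error (RMSE), higher DOA estimation accuracy, and higher running efficiency.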
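The final step, mapping row norms of the sparse coefficient matrix onto the angular grid, can be sketched as follows. Each row of the matrix corresponds to one grid angle and each column to one snapshot; the l2-norm of a row acts as a spatial spectrum value, and the largest peaks give the DOA estimates. The peak-picking rule, the function name, and the toy matrix are assumptions for illustration, and the parallel coordinate descent reconstruction itself is not shown.

```python
import math


def doa_from_sparse_rows(coeffs, grid_deg, n_sources):
    """Estimate DOAs from a recovered sparse coefficient matrix.

    coeffs[i] holds the (possibly complex) coefficients of grid angle
    grid_deg[i] across snapshots.
    """
    # Row-wise l2-norm gives one spectrum value per grid angle.
    power = [math.sqrt(sum(abs(c) ** 2 for c in row)) for row in coeffs]
    last = len(power) - 1
    # Keep nonzero local maxima of the spectrum.
    peaks = [i for i in range(len(power))
             if power[i] > 0
             and (i == 0 or power[i] >= power[i - 1])
             and (i == last or power[i] > power[i + 1])]
    # Return the angles of the n_sources strongest peaks, in ascending order.
    peaks.sort(key=lambda i: power[i], reverse=True)
    return sorted(grid_deg[i] for i in peaks[:n_sources])


grid_deg = [-20, -10, 0, 10, 20]
coeffs = [[0, 0], [1 + 1j, 2], [0.1, 0], [0, 0], [3, 4j]]
estimates = doa_from_sparse_rows(coeffs, grid_deg, 2)
```

In the toy matrix only the rows for -10° and 20° carry significant energy, so those two grid angles are returned.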