Funding: Project (50371026) supported by the National Natural Science Foundation of China.
Abstract: An OpenMP approach is proposed to parallelize a sequential molecular dynamics (MD) code on shared-memory machines. Data dependence is the main obstacle when a code is converted from sequential to parallel form. A traditional sequential MD code is analyzed to locate its data-dependent segments, and two methods, the recover method and the backward mapping method, are used to eliminate those dependencies so that the code can be parallelized. The performance of the parallelized MD code was analyzed with performance analysis tools. Test results show that the problem size the code can handle increases sharply, from 1 million atoms before parallelization to 20 million atoms after parallelization, and that the wall-clock time is greatly reduced. Several hot spots in the code were identified and optimized with improved algorithms. The parallel efficiency is 30% higher than before, which saves computation time and allows larger-scale problems to be solved.
Funding: Supported by the National Natural Science Foundation (12371381) and the Natural Science Foundation of Shanxi (202403021222270).
Abstract: In this paper, we establish a class of parallel algorithms for solving the low-rank tensor completion problem. The main idea is that N singular value decompositions, one for each slice matrix produced by the unfold operator, are carried out on N different processors, and the fold operator is then used to form the tensor for the next iteration, so that computing time is reduced. In theory, we analyze the global convergence of the algorithm. In numerical experiments, tests on simulated data and on real image inpainting are carried out. The experimental results show that the parallel algorithm outperforms the original algorithm in CPU time at the same precision.
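The unfold, per-mode SVD, and fold cycle described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the fixed-rank truncation, the plain averaging of the N folded results, the thread pool standing in for N processors, and all function names are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def unfold(tensor, mode):
    """Mode-`mode` unfolding: move that axis to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def fold(matrix, mode, shape):
    """Inverse of `unfold` for a tensor of the given shape."""
    rest = [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(matrix.reshape([shape[mode]] + rest), 0, mode)


def low_rank_project(tensor, mode, rank):
    """One SVD per mode: truncate the unfolding to `rank` and fold it back."""
    u, s, vt = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return fold(approx, mode, tensor.shape)


def complete_tensor(observed, mask, rank, iterations=100):
    """Fill entries where `mask` is False (assumed scheme: average of mode-wise projections)."""
    x = np.where(mask, observed, 0.0)
    modes = range(observed.ndim)
    # One worker per mode, standing in for the N processors in the abstract.
    with ThreadPoolExecutor(max_workers=observed.ndim) as pool:
        for _ in range(iterations):
            parts = list(pool.map(lambda m: low_rank_project(x, m, rank), modes))
            x = sum(parts) / len(parts)
            x = np.where(mask, observed, x)  # keep the known entries fixed
    return x
```

Each iteration performs the N decompositions independently, which is the step that can be distributed; the averaging and the re-imposition of observed entries are cheap by comparison.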
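Abstract: A student team from the College of Computer Science at Sichuan University has made important progress in research on parameter-efficient fine-tuning systems for large language models. Their work, "mLoRA: Fine-Tuning LoRA Adapters via Highly-Efficient Pipeline Parallelism in Multiple GPUs", was formally published in the Research Track of the international database conference VLDB 2025. VLDB (International Conference on Very Large Data Bases) is one of the major international academic conferences in the database field, covering database management systems, data-intensive systems, and large-scale data processing. The work has been deployed in the production environments of several internet companies in China and abroad, and one Chinese invention patent application and one US invention patent application have been accepted for examination.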
Abstract: Because of the complex high-temperature characteristics of hydrocarbon fuel, the long-term working process of parallel channel structures under variable working conditions, especially at high heat-mass ratio, has not been studied systematically. In this paper, the heat transfer and flow characteristics of high-temperature fuels are studied using a typical engine parallel channel structure. Through numerical simulation and systematic experimental verification, the flow and heat transfer characteristics of parallel channels under typical working conditions are obtained, and the effectiveness of a high-precision calculation method is preliminarily established. The stabilization time required for a hot start of the regenerative cooling engine is found to be about 50 s. The flow resistance of the parallel channel structure first increases and then decreases as the equivalence ratio (denoted Φ below) increases, with a flow resistance peak in the range Φ = 0.5 to 0.8. This is mainly caused by the coupling of the high-temperature physical properties, flow rate, and pressure of the fuel in the parallel channels. The cooling and heat transfer characteristics of parallel channels under some high heat-mass ratio conditions are also obtained, and the main factors affecting heat transfer in parallel channels, such as surface roughness improvement and heat transfer enhancement, are identified. In the experiments, when Φ is less than 0.9, local heat transfer enhancement and deterioration can be clearly observed, and the temperature rise of local structures exceeds 200 °C, which poses a risk of structural damage. Therefore, the long-term reliability of parallel channel structures at high heat-mass ratio should be fully considered in structural design.
Abstract: A spacecraft attitude estimation method based on an electromagnetic vector sensor (EMVS) array is proposed. The method employs the orthogonally constrained parallel factor (PARAFAC) algorithm and uses measurements of the two-dimensional direction of arrival (2D-DOA) and polarization angles. It addresses the problems of incomplete, asynchronous, and inaccurate third-party references for attitude estimation in spacecraft docking missions by using the three-dimensional (3D) wave structure of the electromagnetic wave as a complete third-party reference. Comparative analysis with state-of-the-art algorithms shows significant improvements in estimation accuracy and computational efficiency. Numerical simulations verify the effectiveness and superiority of the method. The paper thus provides a high-precision, reliable, and cost-effective method for rapid spacecraft attitude estimation.
Funding: Projects (51925808, 52078504, 51822803) supported by the National Natural Science Foundation of China; Project (2022JJ10082) supported by the Natural Science Foundation of Hunan Province, China; Project (N2022Z004) supported by the Research on Technology Development Trend and Key Common Problems in Railway, China; Project (Xplorer Prize 2021) supported by the Tencent Foundation, China.
Abstract: Installing splitter plates is a passive aerodynamic measure for eliminating vortex-induced vibration (VIV). However, the influence of splitter plates on VIV and aerostatic performance is complicated by aerodynamic interference between highway and railway decks. To study the effects of splitter plates, wind tunnel experiments measuring the VIV and aerostatic forces of twin decks under two opposite flow directions were conducted, and the surrounding flow and wind pressure of static twin decks with and without splitter plates were numerically simulated. The results show that the incoming flow direction affects the VIV response and the aerostatic coefficients. The highway deck has poor vertical and torsional VIV performance, and its VIV region and amplitude differ between flow directions, whereas the railway deck has only vertical VIV when located upstream. The splitter plates impede vortex generation, shedding, and impingement at the gap between the twin decks and significantly reduce the surface fluctuating pressure coefficient, thus effectively suppressing the VIV of the twin decks. However, the splitter plates impair the static wind stability of the upstream deck and have little effect on the downstream deck. Splitter plates of appropriate width are recommended to improve VIV performance in twin parallel bridges.
Abstract: As commercial drone delivery becomes increasingly popular, extensions of the vehicle routing problem with drones (VRPD) are emerging as optimization problems of interest. This paper studies a multi-trip, multi-drop variant of the VRPD (VRP-mmD). The problem is to schedule the trucks and drones so that the total travel time is minimized. The paper formulates the problem as a mixed integer programming model and proposes a two-phase algorithm: a parallel route construction heuristic (PRCH) for the first phase and an adaptive neighbor searching heuristic (ANSH) for the second phase. In the first phase, the PRCH generates an initial solution by concurrently assigning as many nodes as possible to the truck-drone pair, progressively reducing the waiting time at the rendezvous node. In the second phase, the ANSH improves the initial solution by adaptively exploring the neighborhoods. Numerical tests on benchmark data are conducted to verify the performance of the algorithm. The results show that the proposed algorithm finds better solutions than some state-of-the-art methods on all instances. Moreover, an extensive analysis highlights the stability of the proposed algorithm.
Funding: Large research project (RGP2/159/45) supported by the Deanship of Research and Graduate Studies at King Khalid University, Saudi Arabia.
Abstract: Heat transfer between two corresponding plates, disks, and concentric pipes has many applications, including water cleansing and lubrication. TiO₂-water-based nanofluids are also widely used because they are useful for operating and controlling temperature, especially in photovoltaic technology and solar panels. Motivated by these applications, the current study examines the effect of nanoparticle aggregation on magnetohydrodynamic (MHD) flow between rotating parallel plates with a chemical reaction. To achieve maximum heat transport, the Bruggeman model is used to adapt the Maxwell model. Melting and thermal radiation effects are also included in the model of heat transport. The Runge-Kutta-Fehlberg fourth-fifth order method is used to obtain numerical solutions. The main focus of the study is the thermodynamic behavior under several aspects of nanoparticle aggregation. The heat transfer rate between the parallel plates is enhanced by increasing the thermophoresis, radiation, and Brownian motion parameters. A rise in the Schmidt number and in the chemical reaction rate parameter decreases the concentration distribution. This study will be helpful in enhancing the thermal efficiency of photovoltaic technology in solar plates, water purification, thermal management of electronic devices, the design of effective cooling systems, and other sustainable technologies.
Funding: This work was supported by the National Natural Science Foundation of China (62073155, 62002137, 62106088, 62206113), the High-End Foreign Expert Recruitment Plan (G2023144007L), and the Fundamental Research Funds for the Central Universities (JUSRP221028).
Abstract: Evolutionary algorithms (EAs) have been used in high utility itemset mining (HUIM) to address the problem of discovering high utility itemsets (HUIs) in an exponential search space. EAs have good running and mining performance, but they still require huge computational resources and may miss many HUIs. Because EAs combine well with the graphics processing unit (GPU), we propose a GPU-based parallel genetic algorithm (GA) for HUIM (PHUI-GA). The evolution steps, with improvements, are performed on the central processing unit (CPU), and the CPU-intensive steps are sent to the GPU to be evaluated by its multi-threaded processors. Experiments show that the mining performance of PHUI-GA exceeds that of the existing EAs. When mining 90% of the HUIs, PHUI-GA is up to 188 times better than the existing EAs and up to 36 times better than the CPU parallel approach.
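Abstract: Approximate string matching is widely used in web information search, digital libraries, pattern recognition, text mining, IP route lookup, network intrusion detection, bioinformatics, computational music research, and other fields. Based on the CREW-PRAM (parallel random access machine with concurrent read and exclusive write) model, the edit distance matrix D is computed directly by a wavefront-style parallel sweep, giving a parallel dynamic programming algorithm for approximate string matching with k differences. The algorithm uses (m+1) processors and runs in O(n) time, which in theory achieves linear speedup. By computing D in parallel along both the horizontal and the diagonal directions, a scalable parallel dynamic programming algorithm for approximate string matching with k differences is designed that uses a(m+1) processors and O(n/a+m) time, where a satisfies +<11mna [sic]. Using a divide-and-conquer strategy, the optical bus system is dynamically reconfigured by flexibly splitting the bus and merging sub-buses. By fully exploiting message broadcasting on the optical bus and parallel prefix-sum computation, the Hamming distance is computed in parallel, and two communication-efficient, scalable parallel algorithms for approximate string matching with k mismatches are designed on the LARPBS (linear arrays with reconfigurable pipelined bus system) model. One of them uses n processors and runs in O(m) time; the other is a constant-time algorithm that uses mn processors.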
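The wavefront idea behind the k-differences algorithm can be illustrated with a short sequential sketch. It assumes the standard dynamic programming recurrence for approximate matching (first row of D set to zero so a match may start anywhere in the text); the function name and the choice to return 1-based end positions are assumptions for the example, and the inner loop over one anti-diagonal is the part that a PRAM implementation would spread across processors.

```python
def k_difference_matches(pattern, text, k):
    """Return 1-based end positions in `text` where `pattern` matches with at most k edits."""
    m, n = len(pattern), len(text)
    # D[i][j]: minimum edits between pattern[:i] and some substring of text ending at j.
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i  # matching pattern[:i] against the empty string costs i deletions

    # Sweep anti-diagonals d = i + j. Every cell on a diagonal depends only on
    # diagonals d - 1 and d - 2, so the cells of one diagonal are independent.
    for d in range(2, m + n + 1):
        for i in range(max(1, d - n), min(m, d - 1) + 1):
            j = d - i
            cost = 0 if pattern[i - 1] == text[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,         # delete a pattern character
                          D[i][j - 1] + 1,         # insert a text character
                          D[i - 1][j - 1] + cost)  # match or substitute

    return [j for j in range(1, n + 1) if D[m][j] <= k]
```

For example, `k_difference_matches("abc", "xabcx", 1)` reports the positions at which some substring of the text ends within one edit of the pattern.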
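Abstract: This paper presents the design of a direct digital frequency synthesizer (DDFS) in which a Parallel_CORDIC (COordinate Rotation DIgital Computer) module replaces the traditional lookup table, giving a one-to-one mapping between phase and amplitude and producing sine and cosine waveforms that are exactly in quadrature. Rotation angle prediction and 4:2 carry-save adder (CSA) techniques are also applied, increasing speed by 41.7% over the traditional CORDIC algorithm and improving precision to 10^-4. Finally, the whole design is implemented in hardware on a Xilinx FPGA.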
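For orientation, the conventional iterative CORDIC that the design is compared against can be sketched as below. This is the textbook rotation-mode algorithm in floating point, not the Parallel_CORDIC module of the paper: it contains no angle prediction or carry-save adders, and the function name and iteration count are assumptions for the example.

```python
import math


def cordic_sin_cos(theta, iterations=24):
    """Rotation-mode CORDIC; returns (sin, cos) for |theta| <= pi/2."""
    # Elementary rotation angles atan(2^-i); a hardware design would store these in a small ROM.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation stretches the vector by sqrt(1 + 2^-2i); pre-divide by the total gain.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotate toward zero residual angle
        # In fixed-point hardware the multiplications by 2^-i are bit shifts.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x
```

A DDFS would feed the phase accumulator output in as `theta` (after quadrant folding) and take the two results as the quadrature sine and cosine samples.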
Funding: Project (KC18071) supported by the Application Foundation Research Program of Xuzhou, China; Projects (2017YFC0804401, 2017YFC0804409) supported by the National Key R&D Program of China.
Abstract: The sharp increase in the amount of Chinese text data on the Internet has significantly prolonged the time needed to classify these data. To solve this problem, this paper proposes and implements a parallel naive Bayes algorithm (PNBA) for Chinese text classification based on Spark, a parallel in-memory computing platform for big data. The algorithm parallelizes the entire training and prediction process of the naive Bayes classifier, mainly by adopting the resilient distributed dataset (RDD) programming model. For comparison, a PNBA based on Hadoop is also implemented. The test results show that, in the same computing environment and on the same text sets, the Spark PNBA is clearly superior to the Hadoop PNBA on key indicators such as speedup ratio and scalability. Spark-based parallel algorithms can therefore better meet the requirements of large-scale Chinese text data mining.
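The reason naive Bayes training parallelizes well is that it reduces to counting, and partial counts from separate data partitions can be merged. The sketch below shows that map-and-merge pattern in plain Python rather than on Spark; it assumes documents are already segmented into word lists, uses Laplace smoothing, and all names are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter
from functools import reduce


def count_partition(docs):
    """Map step: class counts and per-class word counts for one partition."""
    class_counts, word_counts = Counter(), {}
    for label, words in docs:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
    return class_counts, word_counts


def merge(a, b):
    """Reduce step: combine the partial count tables of two partitions."""
    labels = set(a[1]) | set(b[1])
    word_counts = {lab: a[1].get(lab, Counter()) + b[1].get(lab, Counter()) for lab in labels}
    return a[0] + b[0], word_counts


def train(partitions):
    """Count each partition independently, then merge the results pairwise."""
    return reduce(merge, map(count_partition, partitions))


def predict(model, words):
    """Pick the class with the highest Laplace-smoothed log posterior."""
    class_counts, word_counts = model
    vocab = set().union(*word_counts.values())
    total_docs = sum(class_counts.values())

    def log_posterior(label):
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        prior = math.log(class_counts[label] / total_docs)
        return prior + sum(math.log((counts[w] + 1) / denom) for w in words)

    return max(class_counts, key=log_posterior)
```

On a cluster, `count_partition` would run on each partition in parallel and `merge` would be the associative combine function; here the built-in `map` and `reduce` only mimic that data flow on one machine.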
Funding: Supported by the National Basic Research Program of China (973 Program) (61320).
Abstract: The way data structures are established plays an important role in the efficiency of the parallel multilevel fast multipole algorithm (PMLFMA). Considering the main components of the memory used by the multilevel fast multipole algorithm (MLFMA), a new parallelization strategy and a modified octree construction scheme are proposed to further reduce communication and thereby improve parallel efficiency. For far interactions, a new scheme called dynamic memory allocation is developed. To analyze the workload balance of a parallel implementation, the original concept of a workload balancing factor is introduced and verified by numerical examples. Numerical results show that these measures improve parallel efficiency and are suitable for the analysis of electrically large scattering objects.