In this paper, we investigate a vehicular fog computing system and develop an effective parallel offloading scheme. The service time, which accounts for task offloading delay, task decomposition, and handover cost, is adopted as the metric of offloading performance. We propose an available-resource-aware parallel offloading scheme, in which the RSU selects target fog nodes for computation offloading while jointly considering the effects of vehicle mobility and time-varying computation capability. Based on hidden Markov model and Markov chain theories, the proposed scheme effectively handles imperfect system state information during fog node selection by jointly achieving mobility awareness and computation perception. Simulation results are presented to corroborate the theoretical analysis and validate the effectiveness of the proposed algorithm.
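As a rough, self-contained illustration of how a hidden Markov model can support fog-node selection under imperfect state information (a generic sketch, not the paper's scheme; all states, matrices, and load reports below are assumed values), the snippet forward-filters noisy load reports and has the RSU pick the node with the highest expected available capability.

```python
# Minimal sketch (not the paper's scheme): forward filtering of a hidden Markov
# model to estimate each fog node's hidden available-capability state from noisy
# load reports, followed by a greedy pick of the node with the highest expected
# capability. The states, matrices, and reports below are all assumed values.
import numpy as np

def hmm_filter(prior, A, B, obs):
    """Posterior over hidden states after observations obs.
    prior: (S,) initial distribution; A: (S,S) transitions; B: (S,O) emissions."""
    belief = prior.copy()
    for o in obs:
        belief = (belief @ A) * B[:, o]     # predict with A, then correct with the observation
        belief /= belief.sum()
    return belief

capability = np.array([1e9, 3e9, 6e9])      # hidden states: low / medium / high (cycles per second)
A = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
B = np.array([[0.80, 0.15, 0.05],           # P(reported load level | true capability state)
              [0.20, 0.60, 0.20],
              [0.05, 0.25, 0.70]])
prior = np.full(3, 1.0 / 3.0)

reports = {"node_a": [2, 2, 1], "node_b": [0, 1, 2]}     # noisy load levels seen by the RSU
expected = {n: float(hmm_filter(prior, A, B, obs) @ capability) for n, obs in reports.items()}
target = max(expected, key=expected.get)
print(f"expected available capability: {expected}; offload to {target}")
```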
As well as shock wave and bubble pulse loading, cavitation also has a very significant influence on the dynamic response of surface ships and other near-surface marine structures to underwater explosive loading. In this paper, the acoustic-structure coupling method embedded in ABAQUS is adopted for numerical analysis of underwater explosions considering cavitation. The shapes of both the bulk cavitation region and the local cavitation region are obtained, and they are in good agreement with analytical results. The duration of the reloading is several times longer than that of the shock wave. Finally, both serial and parallel computations of the cavitation effect on the dynamic response of a full-scale ship are presented, which show that the reloading caused by cavitation cannot be ignored. These results are helpful for understanding cavitation effects in underwater explosions.
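To make the bulk-cavitation idea concrete, the toy model below (not the ABAQUS acoustic-structure computation; the charge mass, similitude constants, and decay time are assumptions) superposes a direct, exponentially decaying shock wave with its free-surface reflection and flags sampled points whose total absolute pressure ever drops below the vapor pressure of water.

```python
# Illustrative sketch only (assumed similitude constants, not the ABAQUS model):
# superpose a direct exponentially decaying shock wave and its free-surface
# reflection (tension wave from an image source) and flag grid points where the
# total absolute pressure falls below the vapor pressure, i.e. bulk cavitation.
import numpy as np

RHO, G, P_ATM, P_VAPOR = 1025.0, 9.81, 101.3e3, 2.3e3   # SI units
W, DEPTH, C, THETA = 100.0, 30.0, 1500.0, 1e-3          # charge mass (kg), depth (m), sound speed (m/s), decay time (s)

def peak_pressure(r):
    """Assumed similitude form for shock-wave peak pressure at standoff r (Pa)."""
    return 52.4e6 * (W ** (1 / 3) / r) ** 1.13

def wave(r, t, sign):
    """Exponentially decaying pressure pulse that arrives at time r/C (zero before arrival)."""
    tau = t - r / C
    return 0.0 if tau < 0 else sign * peak_pressure(r) * np.exp(-tau / THETA)

def total_pressure(x, z, t):
    """Absolute pressure at range x, depth z: static head + direct wave + reflected wave."""
    direct = wave(np.hypot(x, z - DEPTH), t, +1.0)
    reflected = wave(np.hypot(x, z + DEPTH), t, -1.0)   # image charge mirrored above the surface
    return P_ATM + RHO * G * z + direct + reflected

times = np.linspace(0.0, 0.15, 151)                      # scan 0-150 ms in 1 ms steps
cavitated = [(x, z) for x in np.linspace(0.0, 200.0, 21) for z in np.linspace(0.5, 20.0, 40)
             if min(total_pressure(x, z, t) for t in times) < P_VAPOR]
print(f"{len(cavitated)} of 840 sampled points cavitate at some time (bulk cavitation region)")
```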
Accurate 3-dimensional (3-D) reconstruction technology for nondestructive testing based on digital radiography (DR) is of great importance for alleviating the drawbacks of the existing computed tomography (CT)-based method. The commonly used Monte Carlo simulation method ensures well-performing imaging results for DR; however, for 3-D reconstruction it is limited by its high time consumption. To solve this problem, this study proposes a parallel computing method to accelerate the Monte Carlo simulation of projection images with a parallel interface and a specific DR application. The images are used for 3-D reconstruction of the test model. We verify the accuracy of the parallel computation for DR and evaluate the performance of two parallel computing modes, multithreading (G4-MT) and the message-passing interface (G4-MPI), by assessing parallel speedup and efficiency. This study also explores the scalability of the hybrid G4-MPI/G4-MT mode. The results show that the two parallel computing modes significantly reduce the Monte Carlo simulation time, because the parallel speedup grows approximately linearly while the parallel efficiency remains high. The hybrid mode has strong scalability: the overall run time of the 180 simulations using 320 threads is 15.35 h with 10 billion particles emitted, and the parallel speedup reaches 151.36. The 3-D reconstruction of the model is achieved with the filtered back projection (FBP) algorithm using 180 projection images obtained with the hybrid G4-MPI/G4-MT mode. The quality of the reconstructed slice images is satisfactory, as they reflect the internal structure of the test model. The method is also applied to a complex model, and the quality of the reconstructed images is evaluated.
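The speedup and efficiency metrics used above can be reproduced on a toy problem; the sketch below (plain Python multiprocessing and a simple photon-attenuation Monte Carlo, not Geant4) times a serial run against a pooled run and reports speedup T1/Tp and efficiency (T1/Tp)/p. The attenuation coefficient, slab thickness, and photon counts are illustrative.

```python
# Toy parallel Monte Carlo (photon attenuation through a slab, not Geant4):
# the same workload is run serially and on a process pool, and parallel
# speedup T1/Tp and efficiency (T1/Tp)/p are reported as in the evaluation above.
import math, random, time
from multiprocessing import Pool

MU, THICKNESS = 0.5, 3.0            # attenuation coefficient (1/cm), slab thickness (cm)

def transmitted(n_photons, seed=0):
    """Count photons whose sampled free path exceeds the slab thickness."""
    rng = random.Random(seed)
    return sum(rng.expovariate(MU) > THICKNESS for _ in range(n_photons))

if __name__ == "__main__":
    n, workers = 2_000_000, 4
    t0 = time.perf_counter()
    serial = transmitted(n)
    t1 = time.perf_counter()
    with Pool(workers) as pool:      # same number of photons split across workers
        parts = pool.starmap(transmitted, [(n // workers, s) for s in range(workers)])
    t2 = time.perf_counter()
    speedup = (t1 - t0) / (t2 - t1)
    print(f"transmission: serial {serial / n:.4f}, parallel {sum(parts) / n:.4f}, "
          f"analytic {math.exp(-MU * THICKNESS):.4f}")
    print(f"speedup = {speedup:.2f}, efficiency = {speedup / workers:.2f}")
```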
An efficient wavelet-based finite-difference time-domain (FDTD) method is implemented for analyzing nanoscale optical devices, especially optical resonators. Because of its highly linear numerical dispersion properties, the high-spatial-order FDTD achieves a significant reduction in the number of cells, and hence in memory use, when analyzing a high-index dielectric ring resonator working as an add/drop multiplexer. The main novelty is that the wavelet-based FDTD model is extended to a parallel computing environment to solve physical problems with large dimensions. To demonstrate the efficiency of the parallelized FDTD model, a mirrored cavity is analyzed. The analysis shows that the proposed model reduces computation time and memory cost, and the parallel computation results match the theoretical model.
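For readers unfamiliar with FDTD, the minimal 1-D example below uses standard second-order Yee updates (not the wavelet-based high-spatial-order scheme of the paper) to propagate a pulse into a high-index dielectric section; the grid size, index, and source are illustrative.

```python
# Minimal 1-D FDTD sketch with standard second-order Yee updates (not the
# wavelet-based high-spatial-order scheme described above): a Gaussian pulse
# is launched toward a high-index dielectric section; the grid ends act as
# simple reflecting (PEC-like) boundaries. All sizes are illustrative.
import numpy as np

nz, steps = 400, 800
eps_r = np.ones(nz)
eps_r[200:260] = 3.5 ** 2                      # high-index dielectric section
ez, hy = np.zeros(nz), np.zeros(nz)            # normalized E and H fields
courant = 0.5                                  # Courant number (<= 1 for 1-D stability)

for t in range(steps):
    hy[:-1] += courant * (ez[1:] - ez[:-1])                 # H update from curl of E
    ez[1:] += courant / eps_r[1:] * (hy[1:] - hy[:-1])      # E update from curl of H
    ez[50] += np.exp(-((t - 60) / 20.0) ** 2)               # soft Gaussian source

print(f"peak |Ez| inside the dielectric section: {np.abs(ez[200:260]).max():.3f}")
```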
Simulating turbulent liquids with breaking waves and splashes is among the most desired features in fluid animation. Lagrangian methods such as the Smoothed Particle Hydrodynamics (SPH) method are a promising way to capture such properties. However, particle-based liquid surface simulation has seen limited use because its computational cost is very high. This paper derives the governing equations of the SPH approach and parallelizes the dynamics-based surface simulation with the MapReduce programming model, applying the SPH approach in cloud computing. Compared with the serial method, this approach achieves a 3.11-fold speedup on the experimental platform.
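A density evaluation step of SPH maps naturally onto map/reduce stages; the sketch below (plain-Python stand-ins for a MapReduce framework, with an assumed poly6 kernel and synthetic particles, not the paper's cloud pipeline) bins particles into cells in the map phase and sums kernel-weighted neighbor contributions in the reduce phase.

```python
# Conceptual sketch of one SPH density step expressed as map/reduce stages
# (plain Python stand-ins for a MapReduce framework, with an assumed poly6
# kernel and illustrative particle data, not the paper's cloud pipeline).
import numpy as np
from collections import defaultdict

H, MASS = 0.1, 0.02                       # smoothing length, particle mass
POLY6 = 315.0 / (64.0 * np.pi * H ** 9)

def w_poly6(r2):
    """Poly6 smoothing kernel evaluated on squared distance r2 (0 outside the support)."""
    return POLY6 * (H * H - r2) ** 3 if r2 < H * H else 0.0

def map_phase(particles):
    """Map: emit (cell key, particle) so that neighbors land in adjacent buckets."""
    buckets = defaultdict(list)
    for i, p in enumerate(particles):
        buckets[tuple((p // H).astype(int))].append((i, p))
    return buckets

def reduce_phase(buckets):
    """Reduce: for each particle, sum kernel-weighted mass over the 27 nearby cells."""
    density = {}
    for cell, items in buckets.items():
        for i, p in items:
            rho = 0.0
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    for dz in (-1, 0, 1):
                        for _, q in buckets.get((cell[0] + dx, cell[1] + dy, cell[2] + dz), []):
                            rho += MASS * w_poly6(float(np.sum((p - q) ** 2)))
            density[i] = rho
    return density

particles = np.random.default_rng(0).uniform(0.0, 0.5, size=(500, 3))
rho = reduce_phase(map_phase(particles))
print(f"mean SPH density over {len(rho)} particles: {np.mean(list(rho.values())):.1f}")
```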
In a centralized cellular network architecture, the concept of the virtualized base station (VBS) has become attractive because it enables all base stations (BSs) to share computing resources dynamically. This can significantly improve the utilization efficiency of computing resources. In this paper, we study the computing resource allocation strategy for one VBS by considering the non-negligible delay introduced by switches. Specifically, we formulate the maximization of the VBS's sum computing rate as a set optimization problem. To address this problem, we first propose a computing resource scheduling algorithm, weight before one-step-greedy (WBOSG), which has linear computational complexity and good performance. Then, the OSG retreat (OSG-R) algorithm is developed to further improve system performance at the expense of computational complexity. Simulation results under practical settings are provided to validate the two proposed algorithms.
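A weight-then-greedy allocation can be sketched as follows (a generic heuristic in the spirit of WBOSG, not the algorithm itself; the tasks, capacities, and switch delay are made-up numbers): tasks are weighted by urgency, then each is placed on the compute unit that currently offers the best effective rate after the switch-delay penalty.

```python
# Hedged sketch of a weight-then-greedy schedule (not the WBOSG algorithm itself):
# tasks are weighted by urgency, then each is placed on the compute unit that
# currently offers the best effective rate after a fixed switch-delay penalty.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    cycles: float        # required computation (Gcycles)
    deadline: float      # ms

units = {"core0": 4.0, "core1": 2.5, "core2": 2.5}   # remaining capacity (Gcycles/ms)
SWITCH_DELAY = 0.3                                   # ms added by the switch per placement

tasks = [Task("bs1", 3.0, 2.0), Task("bs2", 1.0, 1.0), Task("bs3", 2.0, 1.5)]
tasks.sort(key=lambda t: t.cycles / t.deadline, reverse=True)   # weight step: most urgent first

schedule, sum_rate = {}, 0.0
for t in tasks:
    best = max(units, key=units.get)                 # greedy step: fastest remaining unit
    finish = SWITCH_DELAY + t.cycles / units[best]
    if finish <= t.deadline:                         # admit only if the deadline still holds
        schedule[t.name] = best
        sum_rate += t.cycles / finish                # contribution to the sum computing rate
        units[best] -= t.cycles / t.deadline         # rough bookkeeping of consumed capacity
print(schedule, f"sum computing rate ~ {sum_rate:.2f} Gcycles/ms")
```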
To address the high time consumption of intra prediction in the third-generation audio video coding standard (AVS3), an intra-prediction parallel algorithm based on minimum coding unit (CU) costs is proposed. First, the image is partitioned into minimum CUs. Then, using the original pixels as reference, the intra-mode costs of all minimum CUs are computed in parallel. Finally, the intra-mode priorities of the other CUs are quickly derived by combining these costs, and the best 15 modes are selected for the rough mode decision (RMD) stage. In addition, three optimization strategies are proposed to reduce the error introduced by the method: the original pixels are preprocessed before prediction so that they better match the reconstructed pixels; the cost function of intra prediction is modified to estimate the priority of each mode more accurately; and large CUs use the top-level CU costs as references to reduce the error accumulated when combining CU costs. Experimental results show that, with the bit rate decreasing by only 0.35%, the overall encoding time is reduced by 27%, effectively cutting the time spent on intra prediction while maintaining coding quality.
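The cost-combination step can be illustrated as below (a hedged sketch: the block sizes, the number of intra modes, and the random stand-in cost function are assumptions, not the AVS3 encoder implementation): per-mode costs of the minimum CUs are evaluated in parallel, summed into the parent CU's cost, and the best 15 modes are forwarded to RMD.

```python
# Hedged sketch of the cost-combination idea: per-mode costs of the minimum CUs
# are evaluated in parallel, summed into the parent CU's cost, and the best 15
# modes go to rough mode decision. The block sizes, mode count, and the random
# stand-in cost function are assumptions, not the AVS3 encoder implementation.
import numpy as np
from multiprocessing import Pool

N_MODES, MIN_CU, BIG_CU = 66, 4, 16      # assumed number of intra modes and CU sizes

def min_cu_costs(block):
    """Stand-in for the per-mode cost of one minimum CU (e.g. SATD vs. original pixels)."""
    rng = np.random.default_rng(int(block.sum()) % 2**32)
    return rng.random(N_MODES)

if __name__ == "__main__":
    frame = np.random.default_rng(1).integers(0, 256, size=(BIG_CU, BIG_CU))
    blocks = [frame[y:y + MIN_CU, x:x + MIN_CU]
              for y in range(0, BIG_CU, MIN_CU) for x in range(0, BIG_CU, MIN_CU)]
    with Pool(4) as pool:                             # evaluate all minimum CUs in parallel
        child_costs = np.array(pool.map(min_cu_costs, blocks))
    parent_cost = child_costs.sum(axis=0)             # combine child costs into the parent CU
    rmd_candidates = np.argsort(parent_cost)[:15]     # best 15 modes enter the RMD stage
    print("modes forwarded to RMD:", rmd_candidates.tolist())
```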
To achieve real-time control of tokamak plasmas, the equilibrium reconstruction has to be completed sufficiently quickly. For an EAST tokamak experiment, real-time equilibrium reconstruction is generally required to provide results within 1 ms. A graphics processing unit (GPU) parallel Grad–Shafranov (G-S) solver is developed in the P-EFIT code, which is built on the CUDA architecture to take advantage of massively parallel GPU cores and significantly accelerate the computation. The optimization and implementation of numerical algorithms for a block tri-diagonal linear system are presented. The solver completes a calculation within 16 μs for a 65×65 grid and within 27 μs for a 129×129 grid, so P-EFIT can satisfy the timing requirements of real-time plasma control at both grid sizes.
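For reference, the scalar Thomas algorithm below shows the kind of tridiagonal solve that the GPU block tri-diagonal kernel accelerates (a CPU sketch, not the P-EFIT implementation), verified against a dense solve on a 65-point test problem.

```python
# Reference scalar Thomas algorithm for a tridiagonal system, shown to make the
# "block tri-diagonal linear system" step concrete; the GPU solver in P-EFIT is
# a parallel block variant and is not reproduced here.
import numpy as np

def thomas(a, b, c, d):
    """Solve Ax = d where A has sub-diagonal a (a[0] unused), diagonal b, super-diagonal c."""
    n = len(b)
    cp, dp, x = np.empty(n), np.empty(n), np.empty(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                               # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):                      # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# 65-point 1-D Poisson-like test problem (echoing the 65x65 grid mentioned above).
n = 65
a, b, c = -np.ones(n), 2.0 * np.ones(n), -np.ones(n)
d = np.ones(n) / (n - 1) ** 2
x = thomas(a, b, c, d)
A = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
print("max error vs dense solve:", float(np.max(np.abs(x - np.linalg.solve(A, d)))))
```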
Large-scale parallelization of molecular dynamics simulations faces challenges that seriously affect simulation efficiency, among which the load imbalance problem is the most critical. In this paper, we propose a new molecular dynamics static load balancing method (MDSLB). By analyzing the characteristics of the short-range forces of molecular dynamics programs running in parallel, we divide the short-range forces into three force models and then package the computations of each force model into many tiny computational units called "cell loads", which provide the basic data structures for our load balancing method. In MDSLB, the spatial region is separated into sub-regions called "local domains", and the cell loads of each local domain are allocated to the processors in turn. Compared with dynamic load balancing methods, MDSLB can guarantee load balance by executing the algorithm only once at program startup, without migrating loads dynamically. We implemented MDSLB in the OpenFOAM software and tested it on the TianHe-1A supercomputer with 16 to 512 processors. Experimental results show that MDSLB saves 34%–64% of the run time for load-imbalanced cases.
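The effect of static balancing on heterogeneous cell loads can be seen with synthetic numbers; in the sketch below a greedy longest-processing-time heuristic stands in for MDSLB's in-turn allocation of cell loads (the loads, domain count, and processor count are made up), and the resulting imbalance is compared with a naive contiguous split.

```python
# Synthetic sketch of static load balancing for heterogeneous per-cell costs
# (a greedy longest-processing-time heuristic stands in for MDSLB's round-robin
# allocation of cell loads; the numbers and domain decomposition are made up).
import numpy as np

rng = np.random.default_rng(42)
cell_loads = rng.lognormal(mean=0.0, sigma=1.0, size=4096)   # uneven short-range-force costs
n_procs = 16

def imbalance(per_proc):
    """Imbalance factor: heaviest processor load divided by the mean load."""
    return float(per_proc.max() / per_proc.mean())

# Baseline: naive contiguous split of the spatial region across processors.
contiguous = np.array([chunk.sum() for chunk in np.array_split(cell_loads, n_procs)])

# Static balancing: aggregate cells into 256 "local domains", then assign the
# heaviest remaining domain to the currently lightest processor, once, at startup.
domains = np.array([d.sum() for d in np.array_split(cell_loads, 256)])
per_proc = np.zeros(n_procs)
for load in np.sort(domains)[::-1]:
    per_proc[np.argmin(per_proc)] += load
print(f"imbalance: contiguous {imbalance(contiguous):.3f} -> static balanced {imbalance(per_proc):.3f}")
```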
This article investigates channel allocation for cognitive networks, for which the optimal allocation distribution is difficult to obtain. We first study the interference between nodes in cognitive networks and establish a channel allocation model with interference constraints. We then focus on using evolutionary algorithms to find the optimal allocation distribution. We further note that the search time can be reduced by means of parallel computing, and a parallel algorithm based on APO is proposed. In contrast with existing algorithms, we decompose the allocation vector into a number of sub-vectors and search for the optimal allocation distribution of each sub-vector in parallel. To speed up the convergence rate and improve the converged value, some typical operations of evolutionary algorithms are modified with two novel operators. Finally, simulation results show that the proposed algorithm substantially outperforms other solutions in terms of network utilization.
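A minimal evolutionary search for a low-conflict allocation, with fitness evaluation spread over a process pool, is sketched below (a generic (mu + lambda) strategy, not the paper's APO-based sub-vector decomposition; the interference graph, penalty, and sizes are toy values).

```python
# Hedged sketch of evolutionary channel allocation under interference constraints:
# a plain (mu + lambda) evolution strategy with fitness evaluation farmed out to a
# process pool. This is a generic stand-in, not the APO-based sub-vector
# decomposition of the paper; topology, penalty, and sizes are toy values.
import random
from multiprocessing import Pool

N_NODES, N_CHANNELS, POP, GENERATIONS = 12, 4, 40, 60
EDGES = [(i, (i + 1) % N_NODES) for i in range(N_NODES)] + [(0, 6), (3, 9)]  # interference pairs

def fitness(alloc):
    """Network-utilization proxy: served nodes minus a penalty per interfering conflict."""
    conflicts = sum(alloc[u] == alloc[v] for u, v in EDGES)
    return N_NODES - 3 * conflicts

def mutate(alloc):
    """Reassign one randomly chosen node to a random channel."""
    child = list(alloc)
    child[random.randrange(N_NODES)] = random.randrange(N_CHANNELS)
    return child

if __name__ == "__main__":
    random.seed(7)
    pop = [[random.randrange(N_CHANNELS) for _ in range(N_NODES)] for _ in range(POP)]
    with Pool(4) as pool:
        for _ in range(GENERATIONS):
            children = [mutate(p) for p in pop]
            scores = pool.map(fitness, pop + children)        # parallel fitness evaluation
            ranked = sorted(zip(scores, pop + children), key=lambda sc: -sc[0])
            pop = [alloc for _, alloc in ranked[:POP]]
    best = max(pop, key=fitness)
    print("best allocation:", best, "fitness:", fitness(best))
```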
Funding (vehicular fog computing offloading): Supported in part by the National Natural Science Foundation of China under Grants 61971077 and 61901066, in part by the Chongqing Science and Technology Commission under Grant cstc2019jcyj-msxmX0575, and in part by the Program for Innovation Team Building at Colleges and Universities in Chongqing, China, under Grant CXTDX201601006.
Funding (underwater explosion cavitation): Supported by the National Natural Science Foundation of China (Grant No. 50921001), the National Key Basic Research Special Foundation of China (Grant No. 2010CB832704), the Scientific Project for High-tech Ships: Key Technical Research on the Semi-planning Hybrid Fore-body Trimaran, and the Doctoral Research Foundation of Liaoning Province (Grant No. 20091012).
Funding (DR-based 3-D reconstruction): Supported by the China Natural Science Fund (No. 52171253) and the Natural Science Foundation of Sichuan (No. 2022NSFSCO949).
Funding: Supported by the National Science and Technology Major Project of the Ministry of Science and Technology of China (No. 2013ZX06002001-007), the National Key Scientific Instrument and Equipment Development Projects, China (No. 2012YQ180118), and the National Natural Science Foundation of China (Nos. 11275110, 11075091 and 11105081).
Funding (wavelet-based FDTD): Supported by the Scientific Research Foundation of Nanjing University of Posts and Telecommunications (NY212008, NY213116) and the National Science Foundation of Jiangsu Province (BK20131383).
Funding (SPH on MapReduce): Supported by the National High Technology Research and Development Program of China (863 Program) under Grant No. 2009AA062801, the National Natural Science Foundation of China under Grant No. 60973063, the Beijing Natural Science Foundation of China under Grant No. 4092028, the China Fundamental Research Funds for the Central Universities under Grant No. FRF-TP-09-016B, and the New Century Personnel Plan of the Ministry of Education of China under Grant No. NCET-10-0221.
Funding (VBS computing resource allocation): Funded by the Key Project of the National Natural Science Foundation of China (No. 61431001), the National High-Tech R&D Program (863 Program, 2015AA01A705), and the New Technology Star Plan of Beijing (No. xx2013052).
Funding (GPU Grad–Shafranov solver): Supported by the National Magnetic Confinement Fusion Research Program of China (Grant No. 2014GB103000), the National Natural Science Foundation of China (Grant No. 11575245), and the National Natural Science Foundation of China for Youth (Grant No. 11205191).
Funding (MDSLB load balancing): Project supported by the National Natural Science Foundation of China (Grant Nos. 61303071 and 61120106005) and the Natural Science Fund of the Guangzhou Science and Information Technology Bureau (Grant No. 134200026).
Funding (cognitive-network channel allocation): Supported in part by the National Natural Science Foundation under Grant No. 61072069 and the National Science and Technology Major Project of the Ministry of Science and Technology of China under Grant No. 2012ZX03003012.