Funding: Support from the National Natural Science Foundation of China (Grant No. T2293771), the STI 2030-Major Projects (Grant No. 2022ZD0211400), and the Sichuan Province Outstanding Young Scientists Foundation (Grant No. 2023NSFSC1919).
Abstract: Independent cascade (IC) models, which simulate how one node can activate another, are important tools for studying the dynamics of information spreading in complex networks. However, traditional implementations of the IC model face significant efficiency bottlenecks when dealing with large-scale networks and multi-round simulations. To address this problem, this study introduces a GPU-based parallel independent cascade (GPIC) algorithm, featuring an optimized representation of the network data structure and parallel task scheduling strategies. Specifically, we propose a network data structure tailored for GPU processing, which enhances the computational efficiency and scalability of the IC model. In addition, we design a parallel framework that exploits the GPU's parallel processing capabilities to further improve computational efficiency. Our simulation experiments demonstrate that GPIC not only preserves accuracy but also significantly boosts efficiency, achieving a speedup factor of 129 compared to the baseline IC method. Our experiments also reveal that, when using GPIC for independent cascade simulation, 100-200 simulation rounds are sufficient for higher-cost studies, while high-precision studies benefit from 500 rounds to ensure reliable results, providing empirical guidance for applying this new algorithm in practical research.
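For readers unfamiliar with the IC model, the following is a minimal serial sketch of the multi-round simulation that the abstract describes accelerating. It is not the GPIC algorithm: the adjacency-dict graph, the uniform activation probability `p`, and the function names `simulate_ic` and `mean_spread` are all assumptions made for illustration.

```python
import random


def simulate_ic(graph, seeds, p, rng):
    """Run one independent-cascade round; return the set of activated nodes.

    graph: dict mapping node -> list of out-neighbours (hypothetical layout).
    p: activation probability, assumed identical for every edge.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                # A newly active node gets one chance to activate each
                # still-inactive neighbour.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def mean_spread(graph, seeds, p, rounds, seed=0):
    """Average number of activated nodes over several independent rounds."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(graph, seeds, p, rng)) for _ in range(rounds))
    return total / rounds


if __name__ == "__main__":
    toy_graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
    print(mean_spread(toy_graph, seeds=[0], p=0.5, rounds=200))
```

Each round is independent of the others, which is why running many rounds is a natural candidate for parallel execution; the 100-200 and 500 round counts quoted in the abstract would correspond to the `rounds` argument here.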
Funding: Supported in part by the National Natural Science Foundation of China under Grant 61971077 and Grant 61901066, in part by the Chongqing Science and Technology Commission under Grant cstc2019jcyj-msxmX0575, and in part by the Program for Innovation Team Building at Colleges and Universities in Chongqing, China under Grant CXTDX201601006.
Abstract: In this paper, we investigate a vehicular fog computing system and develop an effective parallel offloading scheme. The service time, which accounts for task offloading delay, task decomposition, and handover cost, is adopted as the metric of offloading performance. We propose an available-resource-aware parallel offloading scheme in which the RSU selects the target fog nodes for computation offloading, jointly considering the effect of vehicle mobility and time-varying computation capability. Based on hidden Markov model and Markov chain theory, the proposed scheme effectively handles imperfect system state information in fog node selection by jointly achieving mobility awareness and computation perception. Simulation results are presented to corroborate the theoretical analysis and validate the effectiveness of the proposed algorithm.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 50921001), the National Key Basic Research Special Foundation of China (Grant No. 2010CB832704), the Scientific Project for High-tech Ships: Key Technical Research on the Semi-planning Hybrid Fore-body Trimaran, and the Doctoral Research Foundation of Liaoning Province (Grant No. 20091012).
Abstract: In addition to shock wave and bubble pulse loading, cavitation has a very significant influence on the dynamic response of surface ships and other near-surface marine structures to underwater explosive loading. In this paper, the acoustic-structure coupling method embedded in ABAQUS is used for numerical analysis of underwater explosions with cavitation taken into account. The shapes of both the bulk cavitation region and the local cavitation region are obtained, and they are in good agreement with analytical results. The duration of reloading is several times longer than that of the shock wave. Finally, both single-process and parallel computations of the cavitation effect on the dynamic response of a full-scale ship are presented, which show that the reloading caused by cavitation cannot be ignored. These results are helpful for understanding cavitation effects in underwater explosions.
Funding: Financial support from the Fundamental Research Funds for the Central Universities (No. buctrc201727) and the Natural Science Foundation of China (Nos. 21536001, 21722602, and 21322603).
Abstract: Cost-effective separation of acetylene (C₂H₂) and ethylene (C₂H₄) is of key importance for obtaining essential chemical raw materials for the polymer industry. Because of the low compression limit of C₂H₂, there is an urgent demand for materials that can efficiently separate the two gases under ambient conditions. In this paper, we provide a high-throughput screening strategy to study porous metal-organic frameworks (MOFs) containing open metal sites (OMS) for C₂H₂/C₂H₄ separation, followed by a rational in-silico design of novel MOFs. A set of accurate force fields was established from ab initio calculations to describe the critical role of OMS towards guest molecules. From a large-scale computational screening of 916 experimental Cu-paddlewheel-based MOFs, three materials were identified with excellent separation performance. The structure-performance relationships revealed that the optimal materials should have a largest cavity diameter of around 5-10 Å and a pore volume of 0.3-1.0 cm³ g⁻¹. Based on the results of the systematic screening, three novel MOFs were further designed by incorporating a fluorine functional group. The results showed that Cu-OMS and the -F group on the aromatic rings close to the Cu sites could generate a synergistic effect on the preferential adsorption of C₂H₂ over C₂H₄, leading to a remarkable improvement in the C₂H₂ separation performance of the materials. The findings could provide insight for the future experimental design and synthesis of high-performance nanostructured materials for C₂H₂/C₂H₄ separation.
Funding: Supported by the China Natural Science Fund (No. 52171253) and the Natural Science Foundation of Sichuan (No. 2022NSFSCO949).
Abstract: Accurate 3-dimensional (3-D) reconstruction technology for nondestructive testing based on digital radiography (DR) is of great importance for alleviating the drawbacks of the existing computed tomography (CT)-based method. The commonly used Monte Carlo simulation method ensures well-performing imaging results for DR. However, for 3-D reconstruction, it is limited by its high time consumption. To solve this problem, this study proposes a parallel computing method to accelerate Monte Carlo simulation of projection images with a parallel interface and a specific DR application. The images are used for 3-D reconstruction of the test model. We verify the accuracy of parallel computing for DR and evaluate the performance of two parallel computing modes, multithreaded applications (G4-MT) and message-passing interfaces (G4-MPI), by assessing parallel speedup and efficiency. This study also explores the scalability of the hybrid G4-MPI and G4-MT mode. The results show that the two parallel computing modes can significantly reduce the Monte Carlo simulation time: the parallel speedup grows approximately linearly, and the parallel efficiency is maintained at a high level. The hybrid mode has strong scalability, as the overall run time of the 180 simulations using 320 threads is 15.35 h with 10 billion particles emitted, and the parallel speedup can reach 151.36. The 3-D reconstruction of the model is achieved with the filtered back projection (FBP) algorithm using 180 projection images obtained with the hybrid G4-MPI and G4-MT mode. The quality of the reconstructed slice images is satisfactory because the images reflect the internal structure of the test model. The method is also applied to a complex model, and the quality of the reconstructed images is evaluated.
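The two metrics named in the abstract can be made concrete with a short calculation. The sketch below assumes the conventional definitions (speedup is serial time over parallel time, efficiency is speedup over the number of workers); the abstract does not state which definitions the authors used, and the function names are hypothetical.

```python
def parallel_speedup(t_serial, t_parallel):
    """Conventional speedup: serial run time divided by parallel run time."""
    return t_serial / t_parallel


def parallel_efficiency(speedup, workers):
    """Conventional efficiency: speedup divided by the number of workers."""
    return speedup / workers


if __name__ == "__main__":
    # Figures quoted in the abstract for the hybrid mode: a speedup of up to
    # 151.36 on 320 threads. Treating each thread as one worker is an
    # assumption of this sketch.
    efficiency = parallel_efficiency(151.36, 320)
    print(f"efficiency = {efficiency:.3f}")
```

Under these assumed definitions, the quoted hybrid-mode figures correspond to an efficiency of about 0.47 per thread.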
Funding: Supported by the National Science and Technology Major Project of the Ministry of Science and Technology of China (No. 2013ZX06002001-007), the National Key Scientific Instrument and Equipment Development Projects, China (No. 2012YQ180118), and the National Natural Science Foundation of China (Nos. 11275110, 11075091, and 11105081).
Funding: Supported by the Scientific Research Foundation of Nanjing University of Posts and Telecommunications (NY212008, NY213116) and the National Science Foundation of Jiangsu Province (BK20131383).
Abstract: An efficient wavelet-based finite-difference time-domain (FDTD) method is implemented for analyzing nanoscale optical devices, especially optical resonators. Because of its highly linear numerical dispersion properties, the high-spatial-order FDTD achieves a significant reduction in the number of cells, i.e., in memory use, when analyzing a high-index dielectric ring resonator working as an add/drop multiplexer. The main novelty is that the wavelet-based FDTD model is extended to a parallel computation environment to solve physical problems with large dimensions. To demonstrate the efficiency of the parallelized FDTD model, a mirrored cavity is analyzed. The analysis shows that the proposed model reduces computation time and memory cost, and that the parallel computation result matches the theoretical model.
Funding: Supported by the National Key R&D Program of China (2020YFB0905900).
Abstract: The construction of new power systems places higher requirements on Power Internet of Things (PIoT) technology. The "source-grid-load-storage" architecture of a new power system requires PIoT to have a stronger ability to fuse multi-source heterogeneous data. Native graph databases have great advantages in dealing with multi-source heterogeneous data, which makes them suitable for an increasing number of analytical computing tasks. However, only a few existing graph database products have native support for matrix operation-related interfaces or functions, resulting in low efficiency when handling the matrix calculations that are commonly encountered in power grids. In this paper, the matrix computation process is expressed by a strategy called graph description, which relies on the natural connection between a matrix and the structure of a graph. On this basis, we implement matrix operations on a graph database, including matrix multiplication and matrix decomposition. Specifically, only the nodes relevant to the computation and their neighbors are involved in the process, which prunes the influence of zero elements in the matrix and avoids useless iterations compared to conventional matrix computation. Based on the graph description, a series of power grid computations can be implemented on a graph database, which reduces redundant data import and export operations while leveraging the parallel computing capability of the graph database. This improves the efficiency of PIoT when handling multi-source heterogeneous data. A comprehensive experimental study over two power system datasets of different scales compares the proposed method with Python and MATLAB baselines. The results reveal the superior performance of the proposed method in both power flow and N-1 contingency computations.
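The link between a matrix and a graph that the abstract relies on can be illustrated with a matrix-vector product. This is only a sketch of the general idea, not the paper's graph-database implementation: a plain Python dict stands in for graph storage, and the names `graph_from_matrix` and `graph_matvec` are hypothetical.

```python
def graph_from_matrix(matrix):
    """Describe a matrix as a graph: node i has an edge to node j with
    weight a_ij for every non-zero entry a_ij. Zero entries create no edge."""
    return {
        i: {j: a for j, a in enumerate(row) if a != 0}
        for i, row in enumerate(matrix)
    }


def graph_matvec(edges, x):
    """Compute y = A x by letting each node sum over its neighbours only,
    so zero elements of A are never visited."""
    return [
        sum(weight * x[j] for j, weight in neighbours.items())
        for _, neighbours in sorted(edges.items())
    ]


if __name__ == "__main__":
    a = [[2, 0, 1],
         [0, 3, 0],
         [1, 0, 4]]
    edges = graph_from_matrix(a)
    print(graph_matvec(edges, [1, 2, 3]))  # [5, 6, 13]
```

Each node's sum is independent of the others, so the per-node work could in principle be distributed across workers, which is the kind of parallelism the abstract attributes to the graph database.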
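Abstract: To address the time consumption of intra prediction in the third-generation audio video coding standard (AVS3), a parallel intra-prediction algorithm based on the cost of the minimum coding unit (CU) is proposed. First, the picture is partitioned into minimum CUs. Then, using the original pixels as reference, the intra-mode costs of all minimum CUs are computed in parallel. Finally, the intra-mode priorities of the other CUs are quickly computed by combining these costs, and the best 15 modes are selected to enter the rough mode decision (RMD) stage. In addition, three optimization strategies are proposed to reduce the error introduced by the method: the original pixels are preprocessed before prediction so that they more closely match the reconstructed pixels; the cost function of intra prediction is modified to estimate the priority of each mode more accurately; and large CUs use the top-level CU cost as reference, which reduces the error accumulated by combining CUs. Experimental results show that, with the bit rate decreasing by only 0.35%, the overall encoding time is reduced by 27%, effectively reducing the time spent on intra prediction while maintaining coding quality.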
Funding: Supported by the National Magnetic Confinement Fusion Research Program of China (Grant No. 2014GB103000), the National Natural Science Foundation of China (Grant No. 11575245), and the National Natural Science Foundation of China for Youth (Grant No. 11205191).
Abstract: To achieve real-time control of tokamak plasmas, the equilibrium reconstruction has to be completed sufficiently quickly. For an EAST tokamak experiment, real-time equilibrium reconstruction is generally required to provide results within 1 ms. A graphics processing unit (GPU) parallel Grad-Shafranov (G-S) solver is developed in the P-EFIT code, which is built on the CUDA™ architecture to take advantage of massively parallel GPU cores and significantly accelerate the computation. The optimization and implementation of numerical algorithms for a block tri-diagonal linear system are presented. The solver can complete a calculation within 16 μs on a 65×65 grid and within 27 μs on a 129×129 grid, which allows P-EFIT to meet the timing requirement for real-time plasma control with both grid sizes.
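As background for the linear-system structure mentioned in the abstract, the sketch below solves a scalar tri-diagonal system with the serial Thomas algorithm. This is a deliberate simplification and an assumption of this note: the P-EFIT solver handles a block tri-diagonal system in parallel on a GPU, and its actual algorithm is not described here. The function name and argument layout are hypothetical.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a scalar tri-diagonal system with the Thomas algorithm.

    Row i reads: lower[i] * x[i-1] + diag[i] * x[i] + upper[i] * x[i+1] = rhs[i].
    lower[0] and upper[-1] are ignored. No pivoting is performed, so the
    matrix is assumed to be well conditioned (e.g. diagonally dominant).
    """
    n = len(diag)
    c = [0.0] * n  # modified upper-diagonal coefficients
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward sweep: eliminate the lower diagonal.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


if __name__ == "__main__":
    # 3x3 example whose exact solution is [1, 2, 3].
    print(thomas_solve([0, 1, 1], [2, 2, 2], [1, 1, 0], [4, 8, 8]))
```

The forward sweep is inherently sequential, which is one reason GPU solvers for such systems typically need a different formulation than this serial baseline.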