Accurate 3-dimensional (3-D) reconstruction technology for nondestructive testing based on digital radiography (DR) is of great importance for alleviating the drawbacks of the existing computed tomography (CT)-based method. The commonly used Monte Carlo simulation method ensures well-performing imaging results for DR. However, for 3-D reconstruction, it is limited by its high time consumption. To solve this problem, this study proposes a parallel computing method to accelerate Monte Carlo simulation of projection images with a parallel interface and a specific DR application. The images are used for 3-D reconstruction of the test model. We verify the accuracy of parallel computing for DR and evaluate the performance of two parallel computing modes, multithreaded applications (G4-MT) and message-passing interfaces (G4-MPI), by assessing parallel speedup and efficiency. This study also explores the scalability of the hybrid G4-MPI and G4-MT mode. The results show that the two parallel computing modes significantly reduce the Monte Carlo simulation time: the parallel speedup grows approximately linearly, and the parallel efficiency is maintained at a high level. The hybrid mode has strong scalability; the overall run time of the 180 simulations using 320 threads is 15.35 h with 10 billion particles emitted, and the parallel speedup reaches up to 151.36. The 3-D reconstruction of the model is achieved with the filtered back projection (FBP) algorithm using 180 projection images obtained with the hybrid G4-MPI and G4-MT mode. The quality of the reconstructed sliced images is satisfactory, as the images reflect the internal structure of the test model. The method is also applied to a complex model, and the quality of the reconstructed images is evaluated.
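For illustration, the speedup and efficiency figures quoted above follow the usual definitions, speedup S = T_serial/T_parallel and efficiency E = S/p for p workers; a minimal sketch, where the serial baseline is a hypothetical value chosen only so the speedup matches the reported figure:

```python
# Sketch of the standard parallel-performance metrics used above:
# speedup S = T_serial / T_parallel, efficiency E = S / p.

def speedup(t_serial: float, t_parallel: float) -> float:
    return t_serial / t_parallel

def efficiency(t_serial: float, t_parallel: float, p: int) -> float:
    return speedup(t_serial, t_parallel) / p

if __name__ == "__main__":
    # Hypothetical serial baseline chosen only so that the speedup matches
    # the reported 151.36 at 15.35 h on 320 threads; not a figure from the paper.
    t_parallel, p = 15.35, 320
    t_serial = 151.36 * t_parallel
    print(f"speedup    = {speedup(t_serial, t_parallel):.2f}")        # 151.36
    print(f"efficiency = {efficiency(t_serial, t_parallel, p):.2f}")  # ~0.47
```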
As well as shock wave and bubble pulse loading, cavitation also has a very significant influence on the dynamic response of surface ships and other near-surface marine structures to underwater explosive loadings. In this paper, the acoustic-structure coupling method embedded in ABAQUS is adopted for numerical analysis of underwater explosion considering cavitation. The shapes of both the bulk cavitation region and the local cavitation region are obtained, and they are in good agreement with analytical results. The duration of reloading is several times longer than that of the shock wave. Finally, both single and parallel computations of the cavitation effect on the dynamic response of a full-scale ship are presented, which show that reloading caused by cavitation is non-negligible. These results are helpful in understanding underwater explosion cavitation effects.
An efficient wavelet-based finite-difference time-domain (FDTD) method is implemented for analyzing nanoscale optical devices, especially optical resonators. Because of its highly linear numerical dispersion properties, the high-spatial-order FDTD achieves a significant reduction in the number of cells, i.e. the memory used, when analyzing a high-index dielectric ring resonator working as an add/drop multiplexer. The main novelty is that the wavelet-based FDTD model is extended to a parallel computation environment to solve physical problems with large dimensions. To demonstrate the efficiency of the parallelized FDTD model, a mirrored cavity is analyzed. The analysis shows that the proposed model reduces computation time and memory cost, and the parallel computation results match the theoretical model.
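As a point of reference for the update structure that gets parallelized, here is a minimal one-dimensional Yee-FDTD loop in normalized units; this is the plain second-order scheme, not the wavelet-based high-spatial-order operators of the paper:

```python
# Minimal 1-D Yee-FDTD sketch (standard second-order scheme) showing the
# leap-frog E/H update that is typically domain-decomposed for parallel runs.
import numpy as np

nz, nt = 400, 1000
S = 0.5                        # Courant number c*dt/dz in normalized units
ez = np.zeros(nz)              # electric field on integer grid points
hy = np.zeros(nz - 1)          # magnetic field on half grid points

for n in range(nt):
    hy += S * (ez[1:] - ez[:-1])                      # update H from curl E
    ez[1:-1] += S * (hy[1:] - hy[:-1])                # update E from curl H
    ez[nz // 4] += np.exp(-((n - 30) / 10.0) ** 2)    # soft Gaussian source

print(float(ez.max()))
```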
This article investigates channel allocation for cognitive networks, for which it is difficult to obtain the optimal allocation distribution. We first study interference between nodes in cognitive networks and establish a channel allocation model with interference constraints. We then focus on the use of evolutionary algorithms to find the optimal allocation distribution. We further observe that the search time can be reduced by means of parallel computing, and a parallel algorithm based on APO is proposed. In contrast with existing algorithms, we decompose the allocation vector into a number of sub-vectors and search for the optimal allocation distribution of each sub-vector in parallel. To speed up the convergence rate and improve the converged value, some typical operations of evolutionary algorithms are modified by two novel operators. Finally, simulation results show that the proposed algorithm drastically outperforms other optimal solutions in terms of network utilization.
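A hedged sketch of the sub-vector decomposition idea: each sub-vector of the allocation is evolved by an independent worker process. The fitness function and mutation step below are toy placeholders, not the paper's interference-constrained utility or its two novel operators:

```python
# Hedged sketch of evolving sub-vectors of a channel-allocation vector in
# parallel.  Sizes, fitness, and mutation are hypothetical stand-ins.
import random
from multiprocessing import Pool

N_NODES, N_CHANNELS, SUBVEC = 32, 8, 8      # hypothetical problem sizes
GENERATIONS = 200

def fitness(subvec):
    # placeholder objective: prefer diverse channel use within the sub-vector
    return len(set(subvec))

def evolve_subvector(seed):
    rng = random.Random(seed)
    best = [rng.randrange(N_CHANNELS) for _ in range(SUBVEC)]
    for _ in range(GENERATIONS):
        cand = best[:]
        cand[rng.randrange(SUBVEC)] = rng.randrange(N_CHANNELS)  # mutate one gene
        if fitness(cand) >= fitness(best):
            best = cand
    return best

if __name__ == "__main__":
    n_sub = N_NODES // SUBVEC
    with Pool(n_sub) as pool:               # one worker per sub-vector
        parts = pool.map(evolve_subvector, range(n_sub))
    allocation = [ch for part in parts for ch in part]
    print(allocation)
```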
Large-scale parallelization of molecular dynamics simulations faces challenges that seriously affect the simulation efficiency, among which the load imbalance problem is the most critical. In this paper, we propose a new molecular dynamics static load balancing method (MDSLB). By analyzing the characteristics of the short-range force in molecular dynamics programs running in parallel, we divide the short-range force into three kinds of force models and then package the computations of each force model into many tiny computational units called "cell loads", which provide the basic data structures for our load balancing method. In MDSLB, the spatial region is separated into sub-regions called "local domains", and the cell loads of each local domain are allocated to every processor in turn. Compared with dynamic load balancing methods, MDSLB can guarantee load balance by executing the algorithm only once at program startup without migrating loads dynamically. We implemented MDSLB in the OpenFOAM software and tested it on the TianHe-1A supercomputer with 16 to 512 processors. Experimental results show that MDSLB can save 34%-64% of the time for load-imbalanced cases.
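A minimal sketch of the "allocate cell loads to every processor in turn" step; the per-cell weights here are made up, whereas MDSLB derives them from the three short-range force models:

```python
# Round-robin assignment of cell loads to processors, as described above.
# The cell-load weights are hypothetical toy values.

def round_robin_assign(cell_loads, n_procs):
    """Deal the cell loads of one local domain out to processors in turn."""
    buckets = [[] for _ in range(n_procs)]
    for i, load in enumerate(cell_loads):
        buckets[i % n_procs].append(load)
    return buckets

if __name__ == "__main__":
    loads = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]   # toy per-cell costs
    for rank, bucket in enumerate(round_robin_assign(loads, 3)):
        print(f"rank {rank}: loads {bucket}, total {sum(bucket):.1f}")
```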
This paper first applies finite impulse response (FIR) filter theory combined with the fast Fourier transform (FFT) method to generate a two-dimensional Gaussian rough surface. Using the electric field integral equation (EFIE), it introduces the method of moments (MoM) with RWG vector basis functions and Galerkin's method to investigate electromagnetic beam scattering by a two-dimensional PEC Gaussian rough surface on personal computer (PC) clusters. The details of the parallel conjugate gradient method (CGM) for solving the matrix equation are also presented, and the numerical simulations are obtained through the message passing interface (MPI) platform on the PC clusters. It is found that the parallel MoM supplies a novel technique for solving the two-dimensional rough-surface electromagnetic-scattering problem. The influences of the root-mean-square height, the correlation length, and the polarization on the beam scattering characteristics of two-dimensional PEC Gaussian rough surfaces are finally discussed.
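For reference, a serial sketch of the conjugate-gradient iteration that the paper parallelizes over MPI; note that plain CG assumes a symmetric positive-definite system, whereas the complex EFIE matrix is typically handled through the normal equations or a CG variant:

```python
# Serial conjugate-gradient sketch.  The MPI-distributed matrix-vector
# products used in the paper are not shown; an SPD test matrix stands in
# for the dense MoM impedance matrix.
import numpy as np

def cg(A, b, tol=1e-8, max_iter=1000):
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

if __name__ == "__main__":
    n = 200
    M = np.random.rand(n, n)
    A = M @ M.T + n * np.eye(n)      # SPD test matrix
    b = np.random.rand(n)
    x = cg(A, b)
    print(np.linalg.norm(A @ x - b))  # small residual
```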
We employ parallel computing technology to study numerically the three-dimensional structure of quantized vortices in Bose-Einstein condensates. For anisotropic cases, the bending process of vortices is described in detail through the decrease of the Gross-Pitaevskii energy. A completely straight vortex and steady, symmetrical multiple-vortex configurations are obtained. We analyse the effect of initial conditions and angular velocity on the number and shape of vortices.
In a centralized cellular network architecture, the concept of the virtualized base station (VBS) becomes attractive since it enables all base stations (BSs) to share computing resources in a dynamic manner. This can significantly improve the utilization efficiency of computing resources. In this paper, we study the computing resource allocation strategy for one VBS by considering the non-negligible effect of the delay introduced by switches. Specifically, we formulate the VBS's sum computing rate maximization as a set optimization problem. To address this problem, we first propose a computing resource scheduling algorithm, namely weight before one-step-greedy (WBOSG), which has linear computational complexity and considerable performance. Then, the OSG retreat (OSG-R) algorithm is developed to further improve the system performance at the expense of computational complexity. Simulation results under practical settings are provided to validate the two proposed algorithms.
To achieve real-time control of tokamak plasmas, the equilibrium reconstruction has to be completed sufficiently quickly. For an EAST tokamak experiment, real-time equilibrium reconstruction is generally required to provide results within 1 ms. A graphics processing unit (GPU) parallel Grad–Shafranov (G-S) solver is developed in the P-EFIT code, which is built on the CUDA architecture to take advantage of massively parallel GPU cores and significantly accelerate the computation. Optimization and implementation of numerical algorithms for a block tri-diagonal linear system are presented. The solver can complete a calculation within 16 μs for a 65×65 grid and 27 μs for a 129×129 grid, so P-EFIT fulfills the timing requirement for real-time plasma control at both grid sizes.
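For orientation, a scalar Thomas-algorithm sweep for a tridiagonal system is sketched below; the P-EFIT solver works on a block tridiagonal system and replaces this inherently sequential sweep with GPU-parallel algorithms:

```python
# Scalar Thomas-algorithm sketch for a tridiagonal system T x = d.
# The grid-line length of 65 mirrors the 65x65 mesh mentioned above;
# the coefficient values are hypothetical.
import numpy as np

def thomas(a, b, c, d):
    """a, b, c: sub-, main-, super-diagonals; returns x solving T x = d."""
    n = len(b)
    cp, dp = np.empty(n), np.empty(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

if __name__ == "__main__":
    n = 65
    a = np.full(n, -1.0); a[0] = 0.0        # sub-diagonal (a[0] unused)
    c = np.full(n, -1.0); c[-1] = 0.0       # super-diagonal (c[-1] unused)
    b = np.full(n, 4.0)                     # main diagonal
    d = np.random.rand(n)
    x = thomas(a, b, c, d)
    T = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
    print(np.allclose(T @ x, d))            # True
```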
Independent cascade (IC) models, by simulating how one node can activate another, are important tools for studying the dynamics of information spreading in complex networks. However, traditional algorithms for implementing the IC model face significant efficiency bottlenecks when dealing with large-scale networks and multi-round simulations. To address this problem, this study introduces a GPU-based parallel independent cascade (GPIC) algorithm, featuring an optimized representation of the network data structure and parallel task scheduling strategies. Specifically, for the GPIC algorithm we propose a network data structure tailored for GPU processing, thereby enhancing the computational efficiency and scalability of the IC model. In addition, we design a parallel framework that utilizes the full potential of the GPU's parallel processing capabilities, further augmenting the computational efficiency. The results of our simulation experiments demonstrate that GPIC not only preserves accuracy but also significantly boosts efficiency, achieving a speedup factor of 129 compared with the baseline IC method. Our experiments also reveal that when using GPIC for independent cascade simulation, 100-200 simulation rounds are sufficient for higher-cost studies, while high-precision studies benefit from 500 rounds to ensure reliable results, providing empirical guidance for applying the new algorithm in practical research.
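A plain CPU sketch of one independent-cascade round (each newly activated node gets a single chance to activate each inactive neighbor with probability p); GPIC runs many such rounds concurrently on the GPU with a bespoke graph layout, which is not reproduced here:

```python
# One independent-cascade simulation round on a toy directed graph.
import random

def ic_round(adj, seeds, p, rng=random):
    """adj: {node: [neighbors]}; returns the set of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj.get(u, []):
                # each directed edge is tried exactly once, when u activates
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

if __name__ == "__main__":
    adj = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: []}
    rounds = 500   # the high-precision setting mentioned above
    spread = sum(len(ic_round(adj, {0}, p=0.3)) for _ in range(rounds)) / rounds
    print(f"estimated spread from node 0: {spread:.2f}")
```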
JMCT is a large-scale, high-fidelity, three-dimensional general neutron–photon–electron–proton transport Monte Carlo software system. It was developed on the combinatorial geometry parallel infrastructure JCOGIN and the adaptive structured mesh infrastructure JASMIN. JMCT is equipped with CAD modeling and visualized image output. It supports body geometry and structured/unstructured meshes. JMCT has most of the functions, variance reduction techniques, and tallies of traditional Monte Carlo particle transport codes. Two energy models, multi-group and continuous, are provided. In recent years, some new functions and algorithms have been developed, such as on-the-fly Doppler broadening (OTF), uniform tally density (UTD), consistent adjoint driven importance sampling (CADIS), fast criticality search of boron concentration (FCSBC), domain decomposition (DD), adaptive control rod moving (ACRM), and random geometry (RG). JMCT is also coupled with the discrete ordinates SN code JSNT to generate source-biasing factors and weight-window parameters. At present, the numbers of geometric bodies, materials, tallies, depletion zones, and parallel processors supported are sufficiently large to simulate extremely complicated device problems. JMCT can be used to simulate reactor physics, criticality safety analysis, radiation shielding, detector response, nuclear well logging, and dosimetry calculations. In particular, JMCT can be coupled with depletion and thermal-hydraulics codes for the simulation of reactor nuclear-thermal feedback effects. This paper describes the progress in advanced modeling, high-performance numerical simulation of particle transport, multiphysics coupled calculations, and large-scale parallel computing.
This work presents a computational matrix framework in terms of tensor signal algebra for the formulation of discrete chirp Fourier transform algorithms. These algorithms are used in this work to estimate the point target functions (impulse response functions) of multiple-input multiple-output (MIMO) synthetic aperture radar (SAR) systems. This estimation technique is being studied as an alternative to the estimation of point target functions using the discrete cross-ambiguity function for certain types of environmental surveillance applications. The tensor signal algebra is presented as a mathematics environment composed of signal spaces, finite-dimensional linear operators, and special matrices, where algebraic methods are used to generate these signal transforms as computational estimators. The tensor signal algebra also contributes to the analysis, design, and implementation of parallel algorithms. An instantiation of the framework was performed using the MATLAB Parallel Computing Toolbox, in which all the algorithms presented in this paper were implemented.
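For concreteness, a direct O(N²) evaluation of one commonly used discrete chirp-Fourier transform definition is sketched below as a dense matrix product; the definition in the comment is an assumption, and the paper's tensor-signal-algebra factorizations are not reproduced here:

```python
# Hedged sketch: direct evaluation of one commonly used discrete
# chirp-Fourier transform definition (an assumption, possibly differing
# from the paper's formulation),
#     X[k, l] = (1/sqrt(N)) * sum_n x[n] * exp(-2j*pi*(k*n**2 + l*n)/N).
import numpy as np

def dcft_matrix(N, k):
    """Dense matrix mapping x -> X[k, :] for a fixed chirp rate k."""
    n = np.arange(N)
    l = n.reshape(-1, 1)
    return np.exp(-2j * np.pi * (k * n**2 + l * n) / N) / np.sqrt(N)

if __name__ == "__main__":
    N, k = 16, 3
    # a chirp matched to rate k with linear frequency 5
    x = np.exp(2j * np.pi * (k * np.arange(N)**2 + 5 * np.arange(N)) / N)
    X_k = dcft_matrix(N, k) @ x
    print(np.argmax(np.abs(X_k)))   # energy concentrates at l = 5
```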
The construction of new power systems places higher requirements on Power Internet of Things (PIoT) technology. The "source-grid-load-storage" architecture of a new power system requires PIoT to have a stronger ability to fuse multi-source heterogeneous data. Native graph databases have great advantages in dealing with multi-source heterogeneous data, which makes them suitable for an increasing number of analytical computing tasks. However, few existing graph database products natively support matrix operation-related interfaces or functions, resulting in low efficiency when handling the matrix calculations that are commonly encountered in power grids. In this paper, the matrix computation process is expressed by a strategy called graph description, which relies on the natural connection between a matrix and the structure of a graph. Based on that, we implement matrix operations on a graph database, including matrix multiplication, matrix decomposition, etc. Specifically, only the nodes relevant to the computation and their neighbors are involved in the process, which prunes the influence of zero elements in the matrix and avoids useless iterations compared with conventional matrix computation. Based on the graph description, a series of power grid computations can be implemented on a graph database, which reduces redundant data import and export operations while leveraging the parallel computing capability of the graph database. This promotes the efficiency of PIoT when handling multi-source heterogeneous data. A comprehensive experimental study over two power system datasets of different scales compares the proposed method with Python and MATLAB baselines. The results reveal the superior performance of our proposed method in both power flow and N-1 contingency computations.
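A hedged sketch of the graph-description idea: a matrix is stored as a graph whose edges are its non-zero entries, and a product is computed by walking only existing neighbors, so zero elements never enter the loop. Plain Python dictionaries stand in for the graph database here:

```python
# C = A @ B over adjacency-style sparse storage: only non-zero entries
# (graph edges) are visited, mimicking a two-hop traversal query.
from collections import defaultdict

def graph_matmul(A, B):
    """A, B: {row: {col: value}} maps of non-zero entries."""
    C = defaultdict(dict)
    for i, row in A.items():                 # node i
        for k, a_ik in row.items():          # neighbors of i in graph A
            for j, b_kj in B.get(k, {}).items():   # neighbors of k in graph B
                C[i][j] = C[i].get(j, 0.0) + a_ik * b_kj
    return dict(C)

if __name__ == "__main__":
    # a tiny admittance-like sparse matrix and an injection matrix (toy values)
    A = {0: {0: 2.0, 1: -1.0}, 1: {0: -1.0, 1: 2.0}}
    B = {0: {0: 1.0}, 1: {0: 3.0}}
    print(graph_matmul(A, B))                # {0: {0: -1.0}, 1: {0: 5.0}}
```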
A personal desktop platform with a teraflops peak performance from thousands of cores is realized at the price of a conventional workstation using programmable graphics processing units (GPUs). A GPU-based parallel Euler/Navier-Stokes solver is developed for 2-D compressible flows using NVIDIA's Compute Unified Device Architecture (CUDA) programming model in the CUDA Fortran programming language. The techniques of implementing CUDA kernels, a double-layered thread hierarchy, and the varied memory hierarchy are presented to form the GPU-based algorithm for the Euler/Navier-Stokes equations. The resulting parallel solver is validated on a set of typical test flow cases. The numerical results show that a speedup of dozens of times relative to a serial CPU implementation can be achieved on a single-GPU desktop platform, which demonstrates that a GPU desktop can serve as a cost-effective parallel computing platform to substantially accelerate computational fluid dynamics (CFD) simulations.
Mobile search is beset with problems because of mobile terminal constraints and because its characteristics differ from the traditional Internet search model. This paper analyzes cloud computing technologies, especially mass data storage, parallel computing, and virtualization, in an attempt to solve technical problems in mobile search. The broad prospects of cloud computing are also discussed.
Parallel multi-thread processing in advanced intelligent processors is the core of realizing high-speed and high-capacity signal processing systems. Optical neural networks (ONNs) have the native advantages of high parallelization, large bandwidth, and low power consumption to meet the demands of big data. Here, we demonstrate a dual-layer ONN with a Mach-Zehnder interferometer (MZI) network and a nonlinear layer, in which the nonlinear activation function is achieved by optical-electronic signal conversion. Two frequency components from a microcomb source carrying digit datasets are simultaneously imposed and intelligently recognized through the ONN. We successfully achieve digit classification of the different frequency components by demultiplexing the output signal and testing the power distribution. Efficient parallelization with wavelength division multiplexing is demonstrated in our high-dimensional ONN. This work provides a high-performance architecture for future parallel high-capacity optical analog computing.
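For illustration, the 2×2 building block of an MZI mesh can be modeled as two 50:50 couplers around an internal phase shifter, preceded by an external phase shifter (one common convention; the paper's exact layout may differ):

```python
# Hedged sketch of a single MZI unit cell of a programmable photonic mesh.
# Cascading such 2x2 unitaries realizes the linear layer of an ONN.
import numpy as np

BS = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)        # 50:50 coupler

def mzi(theta, phi):
    """2x2 transfer matrix: coupler - internal phase - coupler - external phase."""
    internal = np.diag([np.exp(1j * theta), 1.0])
    external = np.diag([np.exp(1j * phi), 1.0])
    return BS @ internal @ BS @ external

if __name__ == "__main__":
    U = mzi(0.7, 1.3)
    print(np.allclose(U.conj().T @ U, np.eye(2)))      # unitary: True
    # the splitting ratio between output ports is set by theta
    print(abs(U[0, 0])**2, abs(U[1, 0])**2)            # sums to 1
```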
In order to simulate and analyze the dynamic characteristics of the parachute from the advanced tactical parachute system (ATPS), a nonlinear finite element algorithm and a preconditioning finite volume method are employed and developed to construct a three-dimensional parachute fluid-structure interaction (FSI) model. The parachute fabric material is represented by membrane-cable elements, and a geometrically nonlinear algorithm with an embedded wrinkling technique is employed to simulate the large deformations of the parachute structure using the Newton-Raphson iteration method. On the other hand, the time-dependent flow surrounding the parachute canopy is simulated using the preconditioned lower-upper symmetric Gauss-Seidel (LU-SGS) method. A pseudo-solid dynamic mesh algorithm is employed to update the flow-field mesh based on the complex and arbitrary motion of the parachute canopy. Because of the large amount of computation during the FSI simulation, message passing interface (MPI) parallel computation is used in all three modules to improve the performance of the FSI code. The FSI method is applied to one kind of ATPS parachute to predict the parachute configuration and anticipate the parachute descent speed. The comparison of results between the proposed method and those in the literature demonstrates that the method is a useful tool for parachute designers.