Abstract: Fluid-structure interaction (FSI) problems in microchannels play a prominent role in many engineering applications. The present study is an effort toward simulating flow in a microchannel with FSI taken into account. The bottom boundary of the microchannel is modeled with size-dependent beam elements for the finite element method (FEM), based on a modified couple stress theory. The lattice Boltzmann method (LBM) with the D2Q13 LB model is coupled to the FEM to solve the fluid part of the FSI problem. Because the LBM generally needs only nearest-neighbor information, the algorithm is an ideal candidate for parallel computing. The simulations are carried out on graphics processing units (GPUs) using the Compute Unified Device Architecture (CUDA). The governing equations are non-dimensionalized, and the resulting set of dimensionless groups is presented to show their effects on micro-beam displacement. The numerical results show that the micro-beam displacements predicted by the size-dependent beam element are smaller than those predicted by the classical beam element.
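The locality that makes the LBM attractive for parallel hardware can be seen in a minimal sketch. The code below is an illustration only, not the authors' solver: it uses the common D2Q9 lattice instead of the D2Q13 model named in the abstract, runs serially on the CPU, uses periodic boundaries, and omits the beam coupling entirely. All function names and parameters (`collide`, `stream`, `run`, `tau`) are hypothetical. The collision step touches a single node, and the streaming step only moves values to adjacent nodes.

```python
# D2Q9 lattice (assumed here for brevity; the abstract uses D2Q13):
# rest velocity, 4 axis neighbours, 4 diagonal neighbours.
VELOCITIES = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
              (1, 1), (-1, 1), (-1, -1), (1, -1)]
WEIGHTS = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order equilibrium distribution for one node."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(VELOCITIES, WEIGHTS):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def collide(f, tau):
    """BGK collision: purely local, each node uses only its own data."""
    for row in f:
        for x, node in enumerate(row):
            rho = sum(node)
            ux = sum(c[0] * fi for c, fi in zip(VELOCITIES, node)) / rho
            uy = sum(c[1] * fi for c, fi in zip(VELOCITIES, node)) / rho
            feq = equilibrium(rho, ux, uy)
            row[x] = [fi - (fi - fe) / tau for fi, fe in zip(node, feq)]
    return f


def stream(f):
    """Streaming: each population moves to an adjacent node (periodic)."""
    ny, nx = len(f), len(f[0])
    out = [[[0.0] * 9 for _ in range(nx)] for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            for i, (cx, cy) in enumerate(VELOCITIES):
                out[(y + cy) % ny][(x + cx) % nx][i] = f[y][x][i]
    return out


def total_mass(f):
    return sum(sum(node) for row in f for node in row)


def run(nx=8, ny=8, steps=20, tau=0.8):
    """Relax a small density bump; return mass before and after."""
    f = [[equilibrium(1.0, 0.0, 0.0) for _ in range(nx)] for _ in range(ny)]
    f[ny // 2][nx // 2] = equilibrium(1.1, 0.0, 0.0)
    m0 = total_mass(f)
    for _ in range(steps):
        f = stream(collide(f, tau))
    return m0, total_mass(f)
```

Because every node update depends only on itself and its immediate neighbours, the two loops over `x` and `y` map naturally onto one GPU thread per node; the total mass returned by `run` stays constant up to rounding, which is a quick sanity check for such a kernel.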
Funding: Supported by the National Natural Science Foundation of China (No. 11172134) and the Funding of Jiangsu Innovation Program for Graduate Education (No. CXLX13_132).
Abstract: A personal desktop platform with teraflops peak performance and thousands of cores can be realized at the price of a conventional workstation by using programmable graphics processing units (GPUs). A GPU-based parallel Euler/Navier-Stokes solver is developed for 2-D compressible flows using NVIDIA's Compute Unified Device Architecture (CUDA) programming model in the CUDA Fortran programming language. The implementation of CUDA kernels, the two-level thread hierarchy, and the various levels of the memory hierarchy are presented to form the GPU-based algorithm for the Euler/Navier-Stokes equations. The resulting parallel solver is validated with a set of typical test flow cases. The numerical results show that a speedup of dozens of times relative to a serial CPU implementation can be achieved on a single-GPU desktop platform, which demonstrates that a GPU desktop can serve as a cost-effective parallel computing platform that substantially accelerates computational fluid dynamics (CFD) simulations.
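The two-level thread hierarchy mentioned in the abstract (a grid of blocks, each block a group of threads) can be illustrated with a small serial emulation. This is a sketch under stated assumptions, not the solver's code: the helper names `launch_config`, `emulate_launch`, and `saxpy_kernel` are hypothetical, indices are zero-based as in CUDA C (CUDA Fortran uses one-based indices), and the loops run sequentially where a GPU would run the threads concurrently.

```python
def launch_config(n, threads_per_block):
    """Number of blocks needed so that blocks * threads_per_block >= n."""
    blocks = (n + threads_per_block - 1) // threads_per_block
    return blocks, threads_per_block


def emulate_launch(n, threads_per_block, kernel, *args):
    """Serially visit every (block, thread) pair of a 1-D launch."""
    blocks, tpb = launch_config(n, threads_per_block)
    for block_idx in range(blocks):        # outer level: grid of blocks
        for thread_idx in range(tpb):      # inner level: threads in a block
            kernel(block_idx, tpb, thread_idx, n, *args)


def saxpy_kernel(block_idx, block_dim, thread_idx, n, a, x, y):
    """One 'thread' updates one element: y[i] = a * x[i] + y[i]."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < n:                               # guard: last block may overhang
        y[i] = a * x[i] + y[i]
```

For example, `emulate_launch(5, 2, saxpy_kernel, 2.0, x, y)` uses three blocks of two threads; the sixth thread falls outside the array and is masked by the bounds check.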
Funding: Supported by the Marine S&T Fund of Shandong Province for Pilot National Laboratory for Marine Science and Technology (No. 2021QNLM020001), the Major Scientific and Technological Projects of Shandong Energy Group (No. SNKJ2022A06-R23), the Funds of Creative Research Groups of China (No. 41821002), the National Natural Science Foundation of China Outstanding Youth Science Fund Project (Overseas) (No. ZX20230152), and the Major Scientific and Technological Projects of CNPC (No. ZD2019-183-003).
Abstract: Compared with the transversely isotropic (TI) medium, the orthorhombic anisotropic medium has both horizontal and vertical symmetry axes, and it can be approximated as a set of vertical fissures developed in a group of horizontal strata. Although the full-elastic orthorhombic anisotropic wave equation can accurately simulate seismic wave propagation in underground media, it incurs a huge computational cost in seismic modeling, migration, and inversion. The conventional coupled pseudo-acoustic wave equations based on the acoustic approximation can significantly reduce this cost. However, these equations usually suffer from unwanted shear wave artifacts during wave propagation, and these artifacts can significantly degrade the imaging quality. To solve these problems, we derived a new pure P-wave equation for orthorhombic media that eliminates shear wave artifacts while balancing computational efficiency and accuracy. The derived equation involves pseudo-differential operators and must be solved with 3D FFT algorithms. To reduce the number of 3D FFTs, we combined the finite difference and pseudo-spectral methods to conduct 3D forward modeling. Furthermore, we simplified the equation by using an elliptic approximation and implemented 3D reverse-time migration (RTM). Forward modeling tests on several homogeneous and heterogeneous models confirm that the new equation is more accurate than conventional methods. 3D RTM imaging tests on three-layer and SEG/EAGE 3D salt models confirm that RTM for orthorhombic (ORT) media yields better imaging quality.
Funding: Supported by the National Key R&D Program of China (Grant No. 2022YFA1403603), the Strategic Priority Research Program of Chinese Academy of Sciences (Grant No. XDB33030100), the National Natural Science Fund for Distinguished Young Scholar (Grant No. 52325105), the National Natural Science Foundation of China (Grant Nos. 12374098, 11974021, and 12241406), and the CAS Project for Young Scientists in Basic Research (Grant No. YSBR-084).
Abstract: MicroMagnetic.jl is an open-source Julia package for micromagnetic and atomistic simulations. Using the features of the Julia programming language, MicroMagnetic.jl supports CPU and various GPU platforms, including NVIDIA, AMD, Intel, and Apple GPUs. Moreover, MicroMagnetic.jl supports Monte Carlo simulations for atomistic models and implements the nudged-elastic-band method for energy barrier computations. With built-in support for double and single precision modes and a design that makes it easy to add new features, MicroMagnetic.jl provides a versatile toolset for researchers in micromagnetics and atomistic simulations.
Funding: Supported by the Opening Foundation of the Agile and Intelligence Computing Key Laboratory of Sichuan Province under Grant No. H23004 and the Chengdu Municipal Science and Technology Bureau Technological Innovation R&D Project (Key Project) under Grant No. 2024-YF08-00106-GX.
Abstract: To address the bottleneck in electromagnetic scattering simulation for scenes with extremely large-scale seas and ships, a high-frequency method accelerated by graphics processing unit (GPU) parallel computing is proposed. To implement the different electromagnetic methods, namely physical optics (PO), shooting and bouncing ray (SBR), and the physical theory of diffraction (PTD), a CPU-GPU parallel computing scheme is realized to balance the computing tasks. A multi-GPU framework is further proposed to handle the computational difficulty caused by the massive number of ray tubes in the ray tracing process. Using the established simulation platform, signals of ships on different seas are simulated and their images are obtained as well. It is shown that higher sea states degrade the averaged peak signal-to-noise ratio (PSNR) of the radar image.
Abstract: Numerical treatment of engineering application problems often eventually reduces to solving systems of linear or nonlinear equations. Solving them on digital computers usually takes a tremendous amount of time because of the extremely large system sizes encountered in most real-world engineering applications. Practical solvers for systems of linear and nonlinear equations based on multiple graphics processing units (GPUs) are therefore proposed to accelerate the solution process. In the linear and nonlinear solvers, the preconditioned bi-conjugate gradient stabilized (PBi-CGstab) method and the inexact Newton method are used to achieve fast and stable convergence. Multiple GPUs are used to provide the additional data storage that large problems need.
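As a reference point for the linear solver named in the abstract, the following is a minimal single-threaded sketch of preconditioned Bi-CGstab. It is not the paper's multi-GPU implementation: the abstract does not say which preconditioner is used, so a simple Jacobi (diagonal) preconditioner is assumed here, the matrix is stored densely as nested lists, and the function name `bicgstab_jacobi` and its parameters are hypothetical.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bicgstab_jacobi(A, b, tol=1e-10, max_iter=200):
    """Solve A x = b with Bi-CGstab and a Jacobi preconditioner (assumed)."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]

    def precond(v):
        # Apply M^-1, where M is the diagonal of A
        return [d * vi for d, vi in zip(inv_diag, v)]

    x = [0.0] * n
    r = list(b)            # residual b - A x for the zero initial guess
    r_hat = list(r)        # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    b_norm = math.sqrt(dot(b, b)) or 1.0

    for it in range(1, max_iter + 1):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        y = precond(p)
        v = matvec(A, y)
        alpha = rho_new / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if math.sqrt(dot(s, s)) / b_norm < tol:
            # Converged after the first half-step
            return [xi + alpha * yi for xi, yi in zip(x, y)], it
        z = precond(s)
        t = matvec(A, z)
        omega = dot(t, s) / dot(t, t)   # stabilizing minimal-residual step
        x = [xi + alpha * yi + omega * zi for xi, yi, zi in zip(x, y, z)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if math.sqrt(dot(r, r)) / b_norm < tol:
            return x, it
        rho = rho_new
    return x, max_iter
```

Each iteration costs two matrix-vector products, two preconditioner applications, and a handful of dot products and vector updates; these are the operations a GPU version would parallelize, with the dot products requiring reductions across devices in a multi-GPU setting.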
Funding: Supported by the National Natural Science Foundation of China under Grant No. 61003032/F020207.
Abstract: Graphics processing units (GPUs) have been widely recognized as cost-efficient co-processors with acceptable size, weight, and power consumption. However, adopting GPUs in real-time systems is still challenging because of the lack of a framework for real-time analysis. To guarantee real-time requirements while maintaining system utilization in modern heterogeneous systems, such as multicore multi-GPU systems, a novel suspension-based k-exclusion real-time locking protocol and the associated suspension-aware schedulability analysis are proposed. The proposed protocol provides a synchronization framework that enables multiple GPUs to be efficiently integrated into multicore real-time systems. Comparative evaluations show that the proposed methods improve upon existing work in terms of schedulability.
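The basic idea of k-exclusion, that at most k tasks hold a resource at once while the others are suspended rather than spinning, can be sketched with standard-library primitives. This is a generic illustration and not the protocol proposed in the paper: it has no priority ordering, no bound on blocking time, and no schedulability analysis. The class `KExclusionLock` and the function `simulate` are hypothetical names, and the sleep call stands in for a GPU kernel.

```python
import queue
import threading
import time


class KExclusionLock:
    """At most k holders at a time; each holder gets a distinct GPU id."""

    def __init__(self, k):
        self._free = queue.Queue()
        for gpu_id in range(k):
            self._free.put(gpu_id)

    def acquire(self):
        # Blocks (the thread is suspended, not spinning) until an id is free.
        return self._free.get()

    def release(self, gpu_id):
        self._free.put(gpu_id)


def simulate(num_tasks=8, k=2, work_seconds=0.01):
    """Run num_tasks threads against k GPUs; return peak concurrency and ids."""
    lock = KExclusionLock(k)
    state = {"active": 0, "peak": 0}
    guard = threading.Lock()   # protects the bookkeeping below
    used = []

    def task():
        gpu_id = lock.acquire()
        with guard:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
            used.append(gpu_id)
        time.sleep(work_seconds)   # stand-in for GPU work
        with guard:
            state["active"] -= 1
        lock.release(gpu_id)

    threads = [threading.Thread(target=task) for _ in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"], used
```

Handing out distinct ids, rather than only counting holders as a plain semaphore would, lets each task know which of the k GPUs it owns. A real-time protocol additionally has to fix the order in which suspended tasks are woken, which this sketch leaves to the queue implementation.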
Abstract: A combination of the lattice Boltzmann method and the recently developed dynamic mode decomposition is proposed for stability analysis. The simulations are performed on a graphics processing unit. The stability of the flow past a cylinder in a supercritical state, Re = 50, is studied with the combined method for both the exponential growth and the limit cycle regimes. The Ritz values, energy spectrum, and modes for both regimes are presented and compared with the Koopman eigenvalues. For harmonic-like periodic flow in the limit cycle, global analysis with the combined method gives the same results as the Koopman analysis. For transient flow, as in the exponential growth regime, the combined method can provide more reasonable results. It is demonstrated that the combination of the lattice Boltzmann method and the dynamic mode decomposition is powerful and can be used for stability analysis of more complex flows.
Abstract: For electromagnetic scattering from complex, electrically large 3-D conducting targets, a new hybrid algorithm, the MoM-PO/SBR algorithm, is presented to exchange information between the method of moments (MoM) and physical optics (PO)/shooting and bouncing ray (SBR). In the algorithm, the COC file, which is based on the Huygens equivalence principle, is introduced, and the conversion interface between the equivalent surface and the target is established. The multi-task flow model presented in this paper is then adopted to conduct CPU/graphics processing unit (GPU) tests of the algorithm under three modes, i.e., MPI/OpenMP, MPI/Compute Unified Device Architecture (CUDA), and the multi-task programming model (MTPM). Numerical results are presented and compared with reference solutions to illustrate the accuracy and efficiency of the proposed algorithm.