Particle-in-cell (PIC) method has got much benefits from GPU-accelerated heterogeneous systems.However,the performance of PIC is constrained by the interpolation operations in the weighting process on GPU (graphic pro...Particle-in-cell (PIC) method has got much benefits from GPU-accelerated heterogeneous systems.However,the performance of PIC is constrained by the interpolation operations in the weighting process on GPU (graphic processing unit).Aiming at this problem,a fast weighting method for PIC simulation on GPU-accelerated systems was proposed to avoid the atomic memory operations during the weighting process.The method was implemented by taking advantage of GPU's thread synchronization mechanism and dividing the problem space properly.Moreover,software managed shared memory on the GPU was employed to buffer the intermediate data.The experimental results show that the method achieves speedups up to 3.5 times compared to previous works,and runs 20.08 times faster on one NVIDIA Tesla M2090 GPU compared to a single core of Intel Xeon X5670 CPU.展开更多
The design, analysis and parallel implementation of particle filter(PF) were investigated. Firstly, to tackle the particle degeneracy problem in the PF, an iterated importance density function(IIDF) was proposed, wher...The design, analysis and parallel implementation of particle filter(PF) were investigated. Firstly, to tackle the particle degeneracy problem in the PF, an iterated importance density function(IIDF) was proposed, where a new term associating with the current measurement information(CMI) was introduced into the expression of the sampled particles. Through the repeated use of the least squares estimate, the CMI can be integrated into the sampling stage in an iterative manner, conducing to the greatly improved sampling quality. By running the IIDF, an iterated PF(IPF) can be obtained. Subsequently, a parallel resampling(PR) was proposed for the purpose of parallel implementation of IPF, whose main idea was the same as systematic resampling(SR) but performed differently. The PR directly used the integral part of the product of the particle weight and particle number as the number of times that a particle was replicated, and it simultaneously eliminated the particles with the smallest weights, which are the two key differences from the SR. The detailed implementation procedures on the graphics processing unit of IPF based on the PR were presented at last. The performance of the IPF, PR and their parallel implementations are illustrated via one-dimensional numerical simulation and practical application of passive radar target tracking.展开更多
基金Projects(61170049,60903044)supported by National Natural Science Foundation of ChinaProject(2012AA010903)supported by National High Technology Research and Development Program of China
文摘Particle-in-cell (PIC) method has got much benefits from GPU-accelerated heterogeneous systems.However,the performance of PIC is constrained by the interpolation operations in the weighting process on GPU (graphic processing unit).Aiming at this problem,a fast weighting method for PIC simulation on GPU-accelerated systems was proposed to avoid the atomic memory operations during the weighting process.The method was implemented by taking advantage of GPU's thread synchronization mechanism and dividing the problem space properly.Moreover,software managed shared memory on the GPU was employed to buffer the intermediate data.The experimental results show that the method achieves speedups up to 3.5 times compared to previous works,and runs 20.08 times faster on one NVIDIA Tesla M2090 GPU compared to a single core of Intel Xeon X5670 CPU.
基金Project(61372136) supported by the National Natural Science Foundation of China
文摘The design, analysis and parallel implementation of particle filter(PF) were investigated. Firstly, to tackle the particle degeneracy problem in the PF, an iterated importance density function(IIDF) was proposed, where a new term associating with the current measurement information(CMI) was introduced into the expression of the sampled particles. Through the repeated use of the least squares estimate, the CMI can be integrated into the sampling stage in an iterative manner, conducing to the greatly improved sampling quality. By running the IIDF, an iterated PF(IPF) can be obtained. Subsequently, a parallel resampling(PR) was proposed for the purpose of parallel implementation of IPF, whose main idea was the same as systematic resampling(SR) but performed differently. The PR directly used the integral part of the product of the particle weight and particle number as the number of times that a particle was replicated, and it simultaneously eliminated the particles with the smallest weights, which are the two key differences from the SR. The detailed implementation procedures on the graphics processing unit of IPF based on the PR were presented at last. The performance of the IPF, PR and their parallel implementations are illustrated via one-dimensional numerical simulation and practical application of passive radar target tracking.