Abstract: The grid equations on a decomposed domain are solved by parallel computation, and a local orthogonalization method for large-scale numerical computation is presented. The method constructs a preconditioned iteration matrix by combining a simplified LU decomposition with local orthogonalization, and the convergence of the solution is proved. The example indicates that the algorithm increases the computation speed effectively and is quite stable.
Abstract: In this paper, a 3rd-order combination method with three processes and a 4th-order combination method with five processes for solving ODEs are discussed. These methods combine the Runge-Kutta method with a linear multistep method, which overcomes the defect of the 3rd-order parallel Runge-Kutta method discussed in [1].
Abstract: Based on efficient hybrid methods for solving initial value problems of stiff ODEs, this paper derives a parallel scheme that can be used on parallel computers with N processors and discusses the iterative B-convergence of the Newton iteration process. Finally, the paper provides some numerical results which show that the parallel scheme is highly efficient when N is not too large.
Abstract: Gamma is a kernel programming language with an elegant chemical reaction metaphor in which programs are described in terms of multiset rewriting. The Gamma formalism allows one to describe an algorithm without introducing artificial sequentiality and leads naturally to the derivation of a parallel solution to a given problem. However, the difficulty of incorporating control strategies makes it hard to define any sophisticated approach in Gamma and impossible to reach a decent level of efficiency in any direct implementation. Recently, a higher-order multiset programming paradigm, named higher-order Gamma, was introduced by Metayer to alleviate these problems. In this paper, we investigate the possibility of implementing higher-order Gamma on MasPar, a massively data-parallel computer. The results show that a program written in higher-order Gamma can be transformed naturally into an efficient implementation on a real parallel machine.
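To make the multiset-rewriting idea concrete, the following is a minimal sequential sketch of first-order Gamma-style reactions in Python. It is not the higher-order Gamma of the paper and not the MasPar implementation; the names `gamma`, `maximum`, and `primes_up_to` are hypothetical, and the interpreter simply fires one applicable reaction at a time, whereas the chemical model leaves the order (and any parallelism) unspecified.

```python
from itertools import permutations


def gamma(multiset, condition, action):
    """Apply the reaction (condition, action) until no pair of elements reacts.

    Hypothetical sequential interpreter: a pair (x, y) that satisfies
    `condition` is removed from the pool and replaced by `action(x, y)`.
    """
    pool = list(multiset)
    while True:
        for i, j in permutations(range(len(pool)), 2):
            if condition(pool[i], pool[j]):
                products = action(pool[i], pool[j])
                # Remove the two reactants, then add the reaction products.
                pool = [v for k, v in enumerate(pool) if k not in (i, j)]
                pool.extend(products)
                break
        else:
            # No pair can react any more: the multiset is stable.
            return pool


def maximum(values):
    # Reaction: x, y -> y whenever x <= y.
    return gamma(values, lambda x, y: x <= y, lambda x, y: [y])


def primes_up_to(n):
    # Reaction: x, y -> y whenever y divides x (a sieve without loop order).
    return gamma(range(2, n + 1), lambda x, y: x % y == 0, lambda x, y: [y])


if __name__ == "__main__":
    print(maximum([3, 1, 4, 1, 5, 9, 2, 6]))
    print(sorted(primes_up_to(20)))
```

Neither program states an iteration order over the data; any two elements that satisfy the condition may react, which is what makes a data-parallel implementation natural.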
Funding: This project was supported by the National Natural Science Foundation of China (60135020).
Abstract: The flexibility of traditional image processing systems is limited because those systems are designed for specific applications. In this paper, a new TMS320C64x-based multi-DSP parallel computing architecture is presented. It has many promising characteristics, such as powerful computing capability, broad I/O bandwidth, topology flexibility, and expansibility. The performance of the parallel system is evaluated by practical experiments.
Funding: Project (50371026) supported by the National Natural Science Foundation of China.
Abstract: An OpenMP approach is proposed to parallelize a sequential molecular dynamics (MD) code on shared-memory machines. When a code is converted from sequential to parallel form, data dependence is a main problem. A traditional sequential MD code is analyzed to find the segments with data dependencies, and two different methods, the recover method and the backward mapping method, are used to eliminate those dependencies so that the sequential MD code can be parallelized. The performance of the parallelized MD code is analyzed using performance analysis tools. The test results show that the problem size the code can handle increases sharply, from 1 million atoms before parallelization to 20 million atoms after parallelization, and that the wall-clock time is greatly reduced. Some hot spots in the code are found and optimized with an improved algorithm. The efficiency of parallel computing is 30% higher than before, so calculation time is saved and larger-scale problems can be solved.
Funding: Projects (11372055, 11302033) supported by the National Natural Science Foundation of China; Project supported by the Huxiang Scholar Foundation from Changsha University of Science and Technology, China; Project (2012KFJJ02) supported by the Key Laboratory of Lightweight and Reliability Technology for Engineering Vehicle, Education Department of Hunan Province, China.
Abstract: A methodology for topology optimization based on element independent nodal density (EIND) is developed. Nodal densities are used as the design variables and interpolated onto the element space with the Shepard interpolation function to determine the density at any point. The influence of the interpolation diameter is discussed, and the method shows good robustness with respect to it. The new approach is demonstrated on the minimum volume problem subject to a displacement constraint. The rational approximation for material properties (RAMP) method and a dual programming optimization algorithm are used to penalize intermediate density points to achieve nearly 0-1 solutions. Solutions are shown to be stable and free of mesh dependence and checkerboard patterns without additional constraints. Finally, the computational efficiency is greatly improved by multithreaded parallel computing with OpenMP.
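The two ingredients named above can be sketched as follows, under stated assumptions. The abstract does not give the weight function, the support size, or the penalty parameter, so this sketch assumes plain inverse-distance (Shepard) weights restricted to nodes within a hypothetical `radius`, and the commonly cited RAMP form E(ρ) = E_min + ρ / (1 + q(1 − ρ)) · (E_0 − E_min). The function names and default values are illustrative, not the authors'.

```python
import math


def shepard_density(point, nodes, nodal_density, radius, power=2.0):
    """Inverse-distance-weighted density at `point` from nearby nodes.

    Assumption: only nodes within `radius` of the point contribute, with
    weight d ** -power. The paper's actual weight function is not given.
    """
    weights, values = [], []
    for node, rho in zip(nodes, nodal_density):
        d = math.dist(point, node)
        if d > radius:
            continue
        if d < 1e-12:
            # The point coincides with a node: return that nodal density.
            return rho
        weights.append(d ** -power)
        values.append(rho)
    if not weights:
        raise ValueError("no node inside the interpolation radius")
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def ramp_modulus(rho, e0=1.0, e_min=1e-9, q=8.0):
    """RAMP-interpolated stiffness; q > 0 penalizes intermediate densities."""
    return e_min + rho / (1.0 + q * (1.0 - rho)) * (e0 - e_min)


if __name__ == "__main__":
    corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    densities = [0.0, 1.0, 0.0, 1.0]
    rho = shepard_density((0.5, 0.5), corners, densities, radius=1.0)
    print(rho, ramp_modulus(rho))
```

With q > 0, an intermediate density contributes much less stiffness than its volume share, which is what pushes the optimizer toward nearly 0-1 designs.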
Funding: The National Science Fund for Overseas Distinguished Young Scholars (No. 69928201), the Foundation for University Key Teacher by the Ministry of Education, and the Changjiang Scholar Reward Project.
Abstract: An important theoretical interest is to study the relations between different interconnection networks and to compare the capability and performance of the network structures. The most popular way to carry out such an investigation is network emulation. Based on classical voltage graph theory, the authors develop a new representation scheme for interconnection network structures. The new approach is a combination of algebraic and combinatorial methods. The results demonstrate that voltage graph theory is a powerful tool for representing well-known interconnection networks and for implementing optimal network emulation algorithms and, in particular, show that all popular interconnection networks have very simple and intuitive representations under the new scheme. The new representation scheme also offers powerful tools for the study of network routings and emulations. For example, we present very simple constructions for optimal network emulations from the cube-connected cycles networks to the butterfly networks, and from the butterfly networks to the hypercube networks. Compared with the most popular way of network emulation, this new scheme is intuitive, easy to realize, and easy to apply to other network structures.
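As a rough illustration of how a voltage graph can encode an interconnection network, the sketch below builds the n-dimensional hypercube as the derived graph of a one-vertex base graph with n loops, where loop i carries the voltage e_i in the group Z_2^n. This is the textbook construction, not necessarily the representation scheme of the paper. The function name `hypercube_from_voltages` is hypothetical, group elements are encoded as n-bit integers with XOR as the group operation, and the pair of parallel edges that each involutory loop lifts to is merged into a single undirected edge.

```python
def hypercube_from_voltages(n):
    """Edges of the derived graph of an n-loop bouquet with voltages in Z_2^n.

    Vertices of the derived graph are the group elements 0 .. 2**n - 1.
    The loop with voltage v lifts to an edge g -- g + v for every element g;
    in Z_2^n the group addition is bitwise XOR.
    """
    voltages = [1 << i for i in range(n)]  # unit vectors e_0 .. e_{n-1}
    edges = set()
    for g in range(1 << n):
        for v in voltages:
            h = g ^ v
            # Store each undirected edge once, merging the doubled lift.
            edges.add((min(g, h), max(g, h)))
    return edges


if __name__ == "__main__":
    for a, b in sorted(hypercube_from_voltages(3)):
        print(f"{a:03b} -- {b:03b}")
```

The whole 2^n-vertex network is described by n group elements attached to a single base vertex, which hints at why such representations can be compact.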
Funding: Project (61273187) supported by the National Natural Science Foundation of China; Project (61321003) supported by the Foundation for Innovative Research Groups of the National Natural Science Foundation of China.
Abstract: Region partition (RP) is the key technique in finite element parallel computing (FEPC), and its performance has a decisive influence on the entire process of analysis and computation. A performance evaluation index of RP methods for the three-dimensional finite element model (FEM) is given. Taking the electric field of an aluminum reduction cell (ARC) as the research object, the performance of two classical RP methods, the Al-NASRA and NGUYEN partition (ANP) algorithm and the multi-level partition (MLP) method, is analyzed and compared. The comparison indicates that the ANP algorithm performs well, but for large-scale models its computing time increases notably. This is because, in each iteration, the ANP algorithm determines only one node based on the minimum weight and adds only the elements connected to that node to the sub-region. To obtain satisfactory speed and precision, an improved dynamic self-adaptive ANP (DSA-ANP) algorithm is proposed. Taking into account the model scale, complexity, and sub-RP stage, the improved algorithm adaptively determines the number of nodes, selects the nodes with sufficiently small weight, and then dynamically adds the elements connected to them. The proposed algorithm is applied to the finite element analysis (FEA) of the electric field simulation of the ARC. Compared with the traditional ANP algorithm, the computing time of the proposed algorithm is reduced from approximately 260 s to 13 s, which demonstrates the superiority of the improved algorithm in computing time.