3,872 articles found
New multi-DSP parallel computing architecture for real-time image processing (cited by 4)
1
Authors: Hu Junhong, Zhang Tianxu, Jiang Haoyang. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2006, No. 4, pp. 883-889, 7 pages.
The flexibility of traditional image processing systems is limited because those systems are designed for specific applications. In this paper, a new TMS320C64x-based multi-DSP parallel computing architecture is presented. It has many promising characteristics, such as powerful computing capability, broad I/O bandwidth, topology flexibility, and expansibility. The performance of the parallel system is evaluated by practical experiments.
Keywords: parallel computing; image processing; real-time; computer architecture
Programming for scientific computing on peta-scale heterogeneous parallel systems (cited by 1)
2
Authors: 杨灿群, 吴强, 唐滔, 王锋, 薛京灵. Journal of Central South University (SCIE, EI, CAS), 2013, No. 5, pp. 1189-1203, 15 pages.
Peta-scale high-performance computing systems are increasingly built with heterogeneous CPU and GPU nodes to achieve higher power efficiency and computation throughput. While providing unprecedented capabilities to conduct computational experiments of historic significance, these systems are presently difficult to program. Users who are domain experts rather than computer experts prefer programming models closer to their domains (e.g., physics and biology) over MPI and OpenMP. This has led to the development of domain-specific programming, which provides domain-specific programming interfaces but abstracts away some performance-critical architecture details. Based on experience in designing large-scale computing systems, this work proposes a hybrid programming framework for scientific computing on heterogeneous architectures. Its design philosophy is to provide a collaborative mechanism for domain experts and computer experts so that both domain-specific knowledge and performance-critical architecture details can be adequately exploited. Two real-world scientific applications have been evaluated on TH-1A, a peta-scale CPU-GPU heterogeneous system that is currently the 5th fastest supercomputer in the world. The experimental results show that the proposed framework is well suited for developing large-scale scientific computing applications on peta-scale heterogeneous CPU/GPU systems.
Keywords: heterogeneous parallel system; programming framework; scientific computing; GPU computing; molecular dynamics
Research on a defect detection method for FPC flat cables based on the PDM-GWO algorithm
3
Authors: 欧幸福, 张淼, 唐戎. Packaging Engineering (《包装工程》, Peking University core journal), 2025, No. 19, pp. 226-238, 13 pages.
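Objective: To improve the accuracy and efficiency of segmenting and detecting defect images of flexible printed circuit (FPC) flat cables, and to overcome the blurred segmentation and detection errors of traditional methods on low-contrast, heavily interfered, and subtle-defect images, a highly robust and efficient method for packaging defect processing is proposed. Methods: An image segmentation and defect detection method based on a parallel dynamic-role memory grey wolf optimizer (PDM-GWO) was constructed. Dynamic role assignment and historical position memory enhance the optimization capability, and a master-slave parallel architecture improves computational efficiency. In the segmentation stage, PDM-GWO optimizes a multi-threshold strategy to extract clear edges; in the detection stage, cable coordinates are obtained from edge detection, geometric features are extracted by RANSAC fitting, and Z-score statistical analysis is applied to recognize multiple classes of defects. Results: Experiments on multiple groups of images show that the method achieves average PSNR, SSIM, and IoU of 22.42 dB, 0.964, and 0.933, respectively, all better than the standard GWO and typical improved variants. For defect detection, the average detection precision reaches 0.9906 at 9.63 frames/s, outperforming mainstream methods such as YOLOv9 and Faster-RCNN. Conclusion: The proposed method shows clear advantages in segmentation quality, detection accuracy, and runtime efficiency. It is suited to detecting tiny defects under the complex operating conditions of automated packaging lines and has good engineering practicality and potential for wider adoption.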
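Keywords: flexible printed circuit; packaging defect detection; image segmentation; grey wolf optimizer; dynamic role; historical memory; parallel computing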
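For reference, the standard grey wolf optimizer (GWO) that PDM-GWO builds on can be sketched as below. This is plain GWO, not the authors' PDM-GWO: dynamic roles, historical memory, and the master-slave parallel architecture are not modelled, and the leaders are simply the three best wolves of the current pack. In the paper the objective would be a multi-threshold segmentation criterion; a sphere function stands in here, and the function name and defaults are assumptions.

```python
import numpy as np
from typing import Callable, Tuple

def gwo(f: Callable[[np.ndarray], float], dim: int, lb: float, ub: float,
        n_wolves: int = 20, n_iter: int = 100, seed: int = 0) -> Tuple[np.ndarray, float]:
    """Standard grey wolf optimizer (minimization) inside the box [lb, ub]^dim."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(n_wolves, dim))
    best_x, best_f = X[0].copy(), np.inf
    for t in range(n_iter):
        fitness = np.array([f(w) for w in X])
        order = np.argsort(fitness)
        if fitness[order[0]] < best_f:          # keep the best solution seen so far
            best_f, best_x = float(fitness[order[0]]), X[order[0]].copy()
        leaders = X[order[:3]].copy()           # alpha, beta, delta wolves
        a = 2.0 - 2.0 * t / n_iter              # decreases linearly from 2 towards 0
        new_X = np.zeros_like(X)
        for leader in leaders:
            A = 2.0 * a * rng.random((n_wolves, dim)) - a
            C = 2.0 * rng.random((n_wolves, dim))
            D = np.abs(C * leader - X)          # distance of each wolf to this leader
            new_X += leader - A * D
        X = np.clip(new_X / 3.0, lb, ub)        # average of the three leader-guided moves
    fitness = np.array([f(w) for w in X])
    i = int(np.argmin(fitness))
    if fitness[i] < best_f:
        best_f, best_x = float(fitness[i]), X[i].copy()
    return best_x, best_f
```

As `a` shrinks, `|A|` drops below 1 and the pack switches from exploring to encircling the leaders.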
Heuristic file sorted assignment algorithm of parallel I/O on cluster computing system
4
Authors: 陈志刚, 曾碧卿, 熊策, 邓晓衡, 曾志文, 刘安丰. Journal of Central South University of Technology (EI), 2005, No. 5, pp. 572-577, 6 pages.
A new file assignment strategy for parallel I/O on cluster computing systems, named the heuristic file sorted assignment algorithm, is proposed. Based on load balancing, it assigns files with similar service times to the same disk. First, the files are sorted in descending order of service time and stored in set I; then, when files are to be assigned, a disk of a cluster node is selected at random; finally, consecutive files are taken in order from set I and placed on that disk until the disk reaches its maximum load. Experimental results show that the new strategy improves performance by 20.2% under light system load and by 31.6% under heavy load. The higher the data access rate, the more pronounced the performance improvement obtained by the heuristic file sorted assignment algorithm.
Keywords: cluster computing; parallel I/O; file sorted assignment; variance of service time
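The three steps of the heuristic can be sketched in Python as follows. This assumes that a disk's "load" is the sum of the service times of its files and that every disk has the same `max_load`; the function name, the seeded random disk choice, and the error handling are assumptions rather than details from the paper.

```python
import random
from typing import Dict, List

def sorted_assign(service_times: Dict[str, float], num_disks: int,
                  max_load: float, seed: int = 0) -> Dict[int, List[str]]:
    """Sketch of heuristic file sorted assignment: fill randomly chosen disks
    with consecutive runs of files from the list sorted by service time."""
    rng = random.Random(seed)
    # Step 1: set I, i.e. the files sorted by service time in descending order
    set_i = sorted(service_times, key=service_times.__getitem__, reverse=True)
    free_disks = list(range(num_disks))
    assignment: Dict[int, List[str]] = {d: [] for d in range(num_disks)}
    pos = 0
    while pos < len(set_i):
        if not free_disks:
            raise ValueError("not enough disk capacity for all files")
        # Step 2: pick one still-unused disk at random
        disk = free_disks.pop(rng.randrange(len(free_disks)))
        start, load = pos, 0.0
        # Step 3: take consecutive files from set I until the disk is full
        while pos < len(set_i) and load + service_times[set_i[pos]] <= max_load:
            load += service_times[set_i[pos]]
            assignment[disk].append(set_i[pos])
            pos += 1
        if pos == start:
            raise ValueError(f"file {set_i[pos]!r} exceeds max_load on its own")
    return assignment
```

Because each disk receives a contiguous run of the sorted list, files with similar service times end up on the same disk, which is the intent of the heuristic.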
Effects of horizontal splitter plates on the vortex-induced vibration and aerostatic characteristics of twin separated parallel decks for a rail-cum-road bridge
5
Authors: HE Xu-hui, YANG Jia-feng, LIU Lu-lu, ZOU Yun-feng, HE Jing. Journal of Central South University, 2025, No. 3, pp. 1024-1043, 20 pages.
Installing splitter plates is a passive aerodynamic measure for suppressing vortex-induced vibration (VIV). However, for twin decks the influence of splitter plates on VIV and aerostatic performance is more complicated because of aerodynamic interference between the highway and railway decks. To study the effects of splitter plates, wind tunnel tests measuring the VIV and aerostatic forces of the twin decks under two opposite flow directions were conducted, and the surrounding flow and wind pressure of the static twin decks with and without splitter plates were simulated numerically. The results show that the incoming flow direction affects the VIV response and the aerostatic coefficients. The highway deck exhibits pronounced vertical and torsional VIV, with wind-speed ranges and amplitudes that differ between the two flow directions, whereas the railway deck shows only vertical VIV, and only when it is located upstream. The splitter plates impede vortex generation, shedding, and impingement in the gap between the twin decks and significantly reduce the surface fluctuating pressure coefficients, thereby effectively suppressing the VIV of the twin decks. However, the splitter plates degrade the static wind stability of the upstream deck and have little effect on the downstream deck. Splitter plates of appropriate width are recommended for improving the VIV performance of twin parallel bridges.
Keywords: splitter plates; vortex-induced vibration (VIV); aerostatic characteristics; wind tunnel test; twin parallel decks; rail-cum-road bridge; computational fluid dynamics
Design of a real-time PCAL signal phase extraction system based on parallel computing
6
Authors: 李雪健, 陈永强, 马宏, 刘杨, 王育欣, 焦义文. Systems Engineering and Electronics (《系统工程与电子技术》, Peking University core journal), 2025, No. 2, pp. 376-389, 14 pages.
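To meet the need for efficient extraction of the true phase of phase calibration (PCAL) signals in the equipment chain of antenna arrays, a PCAL true-phase extraction method that optimizes the resolution of the fast Fourier transform (FFT) is first proposed. To further improve computational efficiency, this method is combined with deep computing unit (DCU) parallel computing to obtain a parallel PCAL true-phase extraction method, and a real-time PCAL phase extraction system based on parallel computing is designed and implemented. Extensive experiments on the improved method and the real-time system show that the FFT-resolution-optimized method achieves a speedup of about three times over the traditional FFT method; with parallel computing, the speedup increases by nearly another order of magnitude. The system can extract, in real time, the phases of PCAL signals with an effective bandwidth of up to 2.2 GHz, a tone spacing of 1 MHz, and 8-bit quantization. The system is also applicable to chain calibration of other frequency-conversion equipment.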
Keywords: phase extraction; phase calibration signal; antenna array; parallel computing; real-time system design
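The abstract does not spell out how the FFT resolution is optimized. One common idea, shown here only as a sketch under stated assumptions, is to choose the FFT length so that the bin spacing equals the PCAL tone spacing; each tone then falls exactly on a bin and its phase can be read directly from that bin. Summing the spectra of consecutive blocks integrates coherently because every block holds a whole number of periods of every tone. The function name is hypothetical, tones are assumed to sit at exact multiples of the spacing (no frequency offset), and the test uses scaled-down numbers rather than the 2.2 GHz system.

```python
import numpy as np

def pcal_phases(x: np.ndarray, fs: float, tone_spacing: float, n_tones: int) -> np.ndarray:
    """Phases (rad) of tones at k * tone_spacing, k = 1..n_tones (assumed layout)."""
    n = fs / tone_spacing
    if abs(n - round(n)) > 1e-9:
        raise ValueError("fs must be an integer multiple of tone_spacing")
    n = int(round(n))  # FFT length whose bin width fs / n equals the tone spacing
    blocks = len(x) // n
    if blocks == 0:
        raise ValueError("need at least one full FFT block")
    # Coherent integration: sum the spectra of consecutive n-sample blocks
    spec = np.fft.rfft(x[: blocks * n].reshape(blocks, n), axis=1).sum(axis=0)
    return np.angle(spec[1 : n_tones + 1])  # tone k lands exactly on bin k
```

A real PCAL comb usually has a frequency offset and spans many FFT blocks, so the bin mapping would need adjusting; the sketch only shows why bin-aligned resolution avoids interpolation.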
Design of a PC/104 bus data transmission link for a flight control computer based on ZYNQ-7000 (cited by 1)
7
Authors: 晏鹏鹏, 张玉民, 盛蔚. Modern Electronics Technique (《现代电子技术》, Peking University core journal), 2024, No. 14, pp. 15-19, 5 pages.
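As flight control computers evolve toward multiple modules and diverse interface resources, existing designs suffer from insufficient inherent performance and poor extensibility. To address this, a flight control computer framework is proposed that uses ARM+FPGA hardware/software co-design, with PC/104 as the inter-module communication bus. To resolve the bandwidth mismatch between the PC/104 bus and system main memory, a dual-channel data buffering path is designed: a custom FPGA IP core controls the PC/104 bus timing, and DMA provides high-speed data buffering between the bus and the DDR main memory. Experimental results show that the link supports inter-module data transfer over the PC/104 bus at 40 Mb/s and high-speed data exchange between peripherals and main memory with microsecond-level latency, ensuring data transmission efficiency when multiple modules operate together.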
Keywords: flight control computer; PC/104 bus; ZYNQ-7000; FPGA; DMA data link; data exchange
Preconditioned method in parallel computation
8
Authors: Wu Ruichan, Wei Jianing. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2006, No. 1, pp. 220-222, 3 pages.
Grid equations on a decomposed domain are solved by parallel computation, and a local orthogonalization method for large-scale numerical computation is presented. A preconditioned iteration matrix is constructed by combining a simplified LU decomposition with local orthogonalization, and the convergence of the solution is proved. Numerical examples indicate that the algorithm efficiently increases the computation rate and is quite stable.
Keywords: grid equations; parallel computation; preconditioning; LU decomposition; local orthogonalization
Study on High-Performance Computing for Simulation of End Milling Force
9
Authors: ZHANG Zhi-hai, ZHENG Li, LI Zhi-zhong, LIU Da-cheng, ZHANG Bo-peng (Department of Industrial Engineering, Tsinghua University, Beijing 100084, China). Journal of Xiamen University (Natural Science) (《厦门大学学报(自然科学版)》; CAS, CSCD, Peking University core journal), 2002, No. S1, pp. 183-184, 2 pages.
Milling process simulation is an important research area in manufacturing science. To improve simulation precision and extend usability, numerical algorithms are increasingly used in milling modeling, but simulation efficiency decreases as model complexity grows, which limits the application of these methods. To address this problem, a high-efficiency algorithm for milling process simulation is studied, which is important for the practical application of milling simulation. Parallel computing is widely used to solve large-scale computation problems; its advantages include system flexibility, robustness, high computing capability, and a high performance-to-price ratio. With the development of computer networks, a virtual computing environment with powerful computing capability can be built from microcomputers on the Internet, which reduces the difficulty of building a hardware environment for parallel computing. This paper investigates how to use network technology and parallel algorithms to improve the efficiency of milling force simulation. To predict milling forces, a simplified local milling force model is used: the end milling cutter is assumed to be divided into r differential elements along its axial direction, and at a given time the total cutting force is obtained by summing the resultant cutting forces produced by each differential cutter disc.
The whole simulation time is divided into segments; the program segments are sent to microcomputers on the Internet, their results are collected, and the partial results are combined into the final result. To implement the algorithm, a distributed parallel computing framework is designed in which a web server acts as the controller. Using Java RMI (Remote Method Invocation), the web server invokes the computing processes on the computing servers, and many control processes on the web server manage those servers. The code of the simulation algorithm can be sent dynamically to the computing servers, which compute the milling forces at different times using local computing resources. The results calculated by every computing server are sent back to the web server and combined into the final result. The framework can be used with different simulation algorithms, and the proposed algorithm is more efficient than running the simulation on a single machine.
Keywords: end-milling force model; simulation; high-performance computing; parallel algorithm; Java RMI
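The force summation over differential discs and the time-segment decomposition can be sketched as follows. The per-disc force law (a linear tangential model K_t·dz·h with chip thickness h = f_z·sin φ and full-slot engagement) and every numeric constant are hypothetical placeholders, since the abstract does not give the paper's simplified local model, and a thread pool stands in for the Java RMI computing servers. Because of Python's global interpreter lock, threads here only illustrate the decomposition; real speedup would need processes or remote machines.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List

# Hypothetical cutting parameters (placeholders, not from the paper)
K_T = 2000.0                      # tangential cutting coefficient, N/mm^2
FEED_PER_TOOTH = 0.1              # mm/tooth
DEPTH = 5.0                       # axial depth of cut, mm
RADIUS = 5.0                      # cutter radius, mm
TEETH = 2
HELIX = math.radians(30.0)        # helix angle
OMEGA = 2 * math.pi * 3000 / 60   # spindle speed 3000 rpm in rad/s

def disc_force(t: float, j: int, r: int) -> float:
    """Tangential force on the j-th of r axial differential discs at time t."""
    dz = DEPTH / r
    z = (j + 0.5) * dz
    lag = z * math.tan(HELIX) / RADIUS  # helix lag angle at height z
    force = 0.0
    for k in range(TEETH):
        phi = (OMEGA * t + 2 * math.pi * k / TEETH - lag) % (2 * math.pi)
        if phi < math.pi:  # assumed slotting: a tooth cuts while 0 <= phi < pi
            force += K_T * dz * FEED_PER_TOOTH * math.sin(phi)
    return force

def total_force(t: float, r: int) -> float:
    # Total force at time t is the sum over all r differential discs
    return sum(disc_force(t, j, r) for j in range(r))

def simulate_segment(times: List[float], r: int) -> List[float]:
    return [total_force(t, r) for t in times]

def simulate_parallel(times: List[float], r: int, n_workers: int) -> List[float]:
    # Split the simulated time span into segments, compute each one on a
    # worker, then concatenate the partial results in time order
    size = math.ceil(len(times) / n_workers)
    segments = [times[i:i + size] for i in range(0, len(times), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda seg: simulate_segment(seg, r), segments))
    return [f for part in parts for f in part]
```

Time steps are independent under this model, so the segments need no communication until the final merge, which is what makes the web-server/worker split in the paper straightforward.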
Efficient Partially Asynchronous Parallel Simulation on Multicomputer Systems: Research and Practice
10
Authors: Chen Delai, Hong Bo, Xie Zhiwu, Weng Shilie. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 1998, No. 2, pp. 40-47, 8 pages.
This paper presents partially asynchronous parallel simulation of continuous systems (PAPSoCS) and some approaches to implementing it on a multicomputer system. To guarantee correct simulation results and speed up the simulation, a scheme for efficient PAPSoCS is proposed, and a star virtual topology is constructed to match the message-passing paths, addressing the algorithm-architecture adequation problem. Because the messages frequently passed between processors are short, typically only a few 4-byte words, an asynchronous communication mode is employed to reduce the communication ratio. Experimental results show that asynchronous parallel simulation is much more efficient than its synchronous counterpart.
Keywords: parallel processing; asynchronous computation; virtual topology; multicomputer system; simulation
Combination Method for Parallel Computation in ODEs
11
Author: Song Xiaoqiu (Beijing Institute of Computer Application and Simulation Technology). Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 1996, No. 4, pp. 19-26, 8 pages.
This paper discusses a 3rd-order combination method with three processes and a 4th-order combination method with five processes for solving ODEs. These methods combine the Runge-Kutta method with a linear multistep method, which overcomes the defect of the 3rd-order parallel Runge-Kutta method discussed in [1].
Keywords: software reliability; numerical analysis; combination method; parallel computation; ODEs
A Parallel Computational Scheme on Hybrid Methods
12
Authors: Zhao Shuangsuo (Department of Mathematics, Lanzhou University, Lanzhou 730000), Wang Changyin (Inst. of Mech. & Elec. Eng., Gansu University of Technology, Lanzhou 730050). Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 1994, No. 4, pp. 8-18, 11 pages.
Based on efficient hybrid methods for solving initial value problems of stiff ODEs, this paper derives a parallel scheme that can solve such problems on parallel computers with N processors and discusses the iterative B-convergence of the Newton iteration process. Finally, numerical results are provided which show that the parallel scheme is highly efficient when N is not too large.
Keywords: hybrid methods; parallel computation; iterative B-convergence
Implementing Higher-Order Gamma on a Massively Parallel Computer: A Case Study
13
Authors: Linpeng Huang, Kam Wing Ng, Yongqiang Sun (Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200030, P. R. China; Department of Computer Science, The Chinese University of Hong Kong, Hong Kong). Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 1995, No. 3, pp. 56-62, 7 pages.
Gamma is a kernel programming language with an elegant chemical reaction metaphor in which programs are described in terms of multiset rewriting. The Gamma formalism allows one to describe an algorithm without introducing artificial sequentiality and naturally leads to the derivation of a parallel solution to a given problem. However, the difficulty of incorporating control strategies makes it not only hard to define sophisticated approaches in Gamma but also impossible to reach a decent level of efficiency in any direct implementation. Recently, a higher-order multiset programming paradigm, named higher-order Gamma, was introduced by Le Métayer to alleviate these problems. In this paper, we investigate the possibility of implementing higher-order Gamma on MasPar, a massively data-parallel computer. The results show that a program written in higher-order Gamma can be transformed naturally into an efficient implementation on a real parallel machine.
Keywords: massively parallel computation; Gamma; programming paradigm
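Gamma's chemical-reaction model can be illustrated with a tiny interpreter. The sketch below is a sequential Python illustration of first-order Gamma, not the paper's MasPar implementation and not higher-order Gamma: a reaction is a condition/action pair applied to any two elements of the multiset until no pair can react. The helper name `gamma` is hypothetical.

```python
from itertools import permutations
from typing import Callable, Iterable, List, Tuple

def gamma(multiset: Iterable[int],
          condition: Callable[[int, int], bool],
          action: Callable[[int, int], Tuple[int, ...]]) -> List[int]:
    """Rewrite the multiset until no ordered pair (x, y) satisfies condition."""
    bag = list(multiset)
    changed = True
    while changed:
        changed = False
        for i, j in permutations(range(len(bag)), 2):
            x, y = bag[i], bag[j]
            if condition(x, y):
                # Consume the reacting pair and add the reaction's products
                rest = [v for k, v in enumerate(bag) if k not in (i, j)]
                bag = rest + list(action(x, y))
                changed = True
                break
    return bag
```

For example, the reaction "x, y → x if x ≥ y" leaves only the maximum, and "x, y → x if x divides y" turns 2..19 into the primes. Reactions on disjoint pairs are independent, which is where Gamma's implicit parallelism comes from; this interpreter simply picks one pair at a time.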
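Design and implementation of a data-computation integrated spatial analysis database for meteorological grid data (cited by 2)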
14
Authors: 王舒, 徐拥军, 何文春, 吴焕萍, 高峰, 刘媛媛, 刘北, 吕冠儒, 倪学磊. Journal of Applied Meteorological Science (《应用气象学报》, Peking University core journal), 2025, No. 1, pp. 121-128, 8 pages.
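Meteorological grid data are usually stored as files in a distributed file store, and operational systems must download the files locally and parse them before analysis and computation. This makes data retrieval difficult and response times long, and cannot meet the needs of operational online computing and interactive applications. To address this, at the end of 2022 the National Meteorological Information Center completed Post Grid, a data-computation integrated database for meteorological grid data in a distributed environment, built on the Tianqing (天擎) spatial analysis database. Post Grid comprises a data layer and an operator layer. The data layer splits grid data along the dimensions of element, initialization time, forecast lead time, space, vertical level, and sample, and stores them in a unified, normalized form, which improves read and analysis efficiency. The operator layer is implemented as SQL functions in the database, supports various operations on grid data inside the database, and supports distributed parallel computing. Performance tests and operational use show that Post Grid reduces the latency of traditional aggregation computing services from minutes to milliseconds, greatly improving the performance, flexibility, and data-computation integration of meteorological grid data services, and it has broad application value.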
Keywords: data-computation integration; meteorological grid data; Post Grid; parallel computing; distributed
Design of an FPGA-based MobileNetV1 object detection accelerator (cited by 3)
15
Authors: 严飞, 郑绪文, 孟川, 李楚, 刘银萍. Modern Electronics Technique (《现代电子技术》, Peking University core journal), 2025, No. 1, pp. 151-156, 6 pages.
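Convolutional neural networks (CNNs) are widely used for object detection, but their large parameter counts and computational loads lead to slow detection and high power consumption and make them difficult to deploy on hardware platforms. This paper therefore proposes an approach that accelerates MobileNetV1 object detection with a combined CPU-FPGA architecture. First, the width and resolution hyperparameters are set and the network parameters are converted to fixed point to reduce the number of parameters and the amount of computation. Second, the convolutional and batch normalization layers are fused to reduce network complexity and speed up computation. Third, an eight-channel inter-kernel parallel convolution engine is designed, in which each channel performs convolution using line-buffer multiplication and an adder tree. Finally, using FPGA parallelism and pipelining, the eight-channel engine is reused to perform three different types of convolution, reducing hardware resource usage and power consumption. Experimental results show that the design accelerates MobileNetV1 object detection in hardware, reaching 56.7 frames/s at a power consumption of only 0.603 W.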
Keywords: convolutional neural network; object detection; FPGA; MobileNetV1; parallel computing; hardware acceleration
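Folding a batch normalization (BN) layer into the preceding convolution is a standard inference-time transformation. The NumPy sketch below shows the arithmetic, assuming BN uses its running mean and variance at inference: each output channel's weights are scaled by γ/√(σ²+ε) and the bias is shifted accordingly. The function names are hypothetical, and the accelerator itself works in fixed point on the FPGA, which this floating-point sketch does not model.

```python
import numpy as np

def fuse_conv_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold inference-mode BN into conv weights (out_ch, in_ch, kh, kw) and bias."""
    scale = gamma / np.sqrt(var + eps)           # per-output-channel BN scale
    fused_w = weight * scale[:, None, None, None]
    fused_b = (bias - mean) * scale + beta
    return fused_w, fused_b

def conv2d(x, w, b):
    """Valid, stride-1 cross-correlation; x has shape (in_ch, H, W)."""
    out_ch, _, kh, kw = w.shape
    H, W = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    y = np.empty((out_ch, H, W))
    for i in range(H):
        for j in range(W):
            patch = x[:, i:i + kh, j:j + kw]
            y[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return y

def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BN applied per channel of y (shape (ch, H, W))."""
    s = (gamma / np.sqrt(var + eps))[:, None, None]
    return s * (y - mean[:, None, None]) + beta[:, None, None]
```

After fusion the BN layer disappears from the inference graph, so the hardware runs one convolution with adjusted weights instead of a convolution followed by a per-channel affine step.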
A GPU parallel computing method for 3D train-track-ground random vibration based on the Seed-PCG method
16
Authors: 朱志辉, 冯杨, 杨啸, 李昊, 邹有. Journal of Central South University (SCIE, EI, CAS, CSCD), 2024, No. 1, pp. 302-316, 15 pages.
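To address the low efficiency of computing many random samples with a 3D train-track-ground finite element model, this paper proposes an efficient parallel computing method based on the Seed-PCG method. A 3D train-track-ground coupled random vibration model under track irregularity excitation is established with the finite element method and the pseudo-excitation method. The linear systems with multiple right-hand sides that arise in the train-induced ground random vibration analysis are solved with the Seed-PCG method: the Krylov subspace obtained by solving the seed system with PCG is used to project the remaining systems, improving their initial solutions and the corresponding initial residuals, which effectively accelerates PCG convergence. Finally, a parallel computing program was developed on a hybrid MATLAB-CUDA platform. Numerical examples show that, on the same computing platform, the method achieves a 104.2-fold speedup over the multi-point synchronous algorithm and, compared with solving the systems one by one with PCG, reduces the number of iterations by 18% and achieves a 1.21-fold speedup.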
Keywords: Seed-PCG method; linear systems with multiple right-hand sides; random vibration; GPU parallel computing; train-track-ground coupled model
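The seed idea can be sketched as follows, assuming a symmetric positive definite matrix and a Jacobi preconditioner; the paper's actual preconditioner and its MATLAB-CUDA implementation are not described in the abstract. While PCG solves the seed system, each of its A-conjugate search directions is used to Galerkin-project the other right-hand sides, which improves their starting guesses; the remaining systems are then solved by PCG from those guesses. All names are hypothetical.

```python
import numpy as np

def pcg(A, b, x0, m_inv, tol=1e-10, maxiter=1000, on_direction=None):
    """Jacobi-preconditioned CG; on_direction(p, Ap, pAp) sees every search direction."""
    x = x0.astype(float).copy()
    r = b - A @ x
    z = m_inv * r
    p = z.copy()
    rz = r @ z
    bnorm = np.linalg.norm(b)
    for it in range(maxiter):
        if np.linalg.norm(r) <= tol * bnorm:
            return x, it
        Ap = A @ p
        pAp = p @ Ap
        if on_direction is not None:
            on_direction(p, Ap, pAp)
        alpha = rz / pAp
        x += alpha * p
        r -= alpha * Ap
        z = m_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p   # new array, so callbacks never see p mutate
        rz = rz_new
    return x, maxiter

def seed_pcg(A, B, tol=1e-10):
    """Solve A X = B column by column, using column 0 as the seed system."""
    m_inv = 1.0 / np.diag(A)        # Jacobi preconditioner (assumption)
    n, m = B.shape
    X = np.zeros((n, m))
    R = B.astype(float).copy()      # residuals of the non-seed systems (x = 0 initially)

    def project(p, Ap, pAp):
        # Galerkin projection of every non-seed system onto seed direction p;
        # A-conjugacy of the directions makes these sequential updates exact
        for j in range(1, m):
            c = (p @ R[:, j]) / pAp
            X[:, j] += c * p
            R[:, j] -= c * Ap

    X[:, 0], it0 = pcg(A, B[:, 0], X[:, 0], m_inv, tol, on_direction=project)
    X_init = X.copy()               # improved starting guesses after projection
    iters = [it0]
    for j in range(1, m):
        X[:, j], it = pcg(A, B[:, j], X[:, j], m_inv, tol)
        iters.append(it)
    return X, iters, X_init
```

The benefit is largest when the right-hand sides are closely related, as with the many random samples in the paper, because the seed's Krylov space then already captures most of each other solution.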
Parallel computation of the disturbance response instability theory of rockburst (cited by 1)
17
Authors: 潘一山, 王学滨, 郑一方, 陈双印. Journal of China Coal Society (《煤炭学报》, Peking University core journal), 2025, No. 1, pp. 81-91, 11 pages.
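Rockburst theory has moved from qualitative to quantitative analysis. The critical stress of the rock surrounding a roadway is an important basis for evaluating roadway safety. Given the extreme complexity of rockburst, further theoretical breakthroughs are very difficult, and critical stress calculations based on theoretical formulas cannot account for more complex practical conditions such as non-circular roadways, non-hydrostatic pressure, and complex strata structures. Combining rockburst theory with numerical computation has broader application prospects and can bring the theory closer to practical use, which is a highly valuable direction; success in this area depends on rapid progress in numerical computing technology. This study combines StrataKing, an advanced parallel computing system for strata movement (an in-house GPU parallel method for nonlinear fracture mechanics based on coupled Lagrangian element and discrete element methods), with the disturbance response instability theory of rockburst, and proposes for the first time a numerical simulation method for the disturbance response instability theory of circular roadways. The idea is to take the mode II fracture energy of the nonlinear fracture mechanics analysis as an intermediate variable, thereby establishing the relationship between the critical stress of the rock surrounding a circular roadway under hydrostatic pressure and the bursting energy index. To obtain numerical values of the bursting energy index, numerical uniaxial compression tests were performed on idealized rock specimens in which only a single shear plane develops, to exclude the influence of other factors on the slope of the nearly linear post-peak part of the stress-strain curve. For high-angle shear fracture, a conversion method is proposed to translate results computed for non-standard specimens into those for standard specimens. After conversion, the bursting energy index ranges from 0.17 to 13.52, within the range of survey data from 131 rockburst mines across China. The computed critical stress of the surrounding rock is 0.4 to 2.5 times the theoretical value, which agrees qualitatively with survey data from 20 rockburst mines in China (where the correction factor for critical stress is generally greater than 1 and even close to 8); this is explained by the higher bearing capacity of surrounding rock that fails in a localized manner compared with rock that fails uniformly. The relationship between rockburst and localization has been discussed before, but the disturbance response instability theory had not previously been connected with localization. Through localization, the theory and rockburst become closely linked in terms of failure mechanism. StrataKing can provide strong computing support for the safety evaluation of roadways in rockburst mines.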
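Keywords: rockburst; quantitative analysis; disturbance response instability theory; bursting energy index; localization; parallel computing; critical stress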
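The bursting energy index is conventionally defined (an assumption here, since the abstract does not restate it) as the ratio of the area under the complete stress-strain curve before the peak, F_s, to the area after the peak, F_x. For an idealized bilinear curve with pre-peak modulus E, peak strength σ_c, and linear post-peak softening modulus M, the index reduces to a ratio of slopes, which shows why the slope of the post-peak linear part matters so much in the numerical tests:

```latex
K_E = \frac{F_s}{F_x}
    = \frac{\tfrac{1}{2}\,\sigma_c\,\varepsilon_p}{\tfrac{1}{2}\,\sigma_c\,\Delta\varepsilon}
    = \frac{\sigma_c^2/(2E)}{\sigma_c^2/(2M)}
    = \frac{M}{E},
\qquad
\varepsilon_p = \frac{\sigma_c}{E},\quad
\Delta\varepsilon = \frac{\sigma_c}{M}
```

A steeper post-peak drop (larger M) releases the stored energy over a smaller strain increment and gives a larger K_E, i.e. a higher bursting tendency.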
Global optimization of multistage membrane separation systems for gas mixtures based on a decomposition algorithm
18
Authors: 王杰, 林渠成, 张先明. CIESC Journal (《化工学报》, Peking University core journal), 2025, No. 9, pp. 4670-4682, 13 pages.
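Multistage membrane separation of gas mixtures is an efficient technique in which the cooperation of several membrane stages separates the components of the mixture. Its optimal design, however, requires a complex mixed-integer nonlinear programming (MINLP) model that is difficult to solve. This paper proposes a global optimization method for multistage membrane separation systems that uses a decomposition algorithm to split the MINLP into a mixed-integer programming (MIP) problem and nonlinear programming (NLP) problems. The MIP model enumerates separation sequences; with multithreaded parallel computing, the sequence in each thread is optimized sequentially with two NLP models of different accuracy, and the global optimum is determined from all of the optimization results. In a natural gas desulfurization case, the total annual cost (TAC) of the resulting membrane system is 7.35% lower than the best result in the literature, and a case of CO₂ capture from blast furnace gas shows that the method extends to the optimization of membrane systems with variable pressure.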
Keywords: separation; optimal design; superstructure; parallel computing
A full-process accelerated topology optimization design method based on GPU parallel computing
19
Authors: 张长东, 吴奕凡, 周铉华, 李旭东, 肖息, 张自来. Aeronautical Manufacturing Technology (《航空制造技术》, Peking University core journal), 2025, No. 12, pp. 34-41 and 67, 9 pages.
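Driven by the development of large aerospace equipment, efficient and accurate large-scale topology optimization has become a focus of the field. To address the huge computational load and low efficiency of existing large-scale topology optimization, a full-process accelerated topology optimization design method based on GPU parallel computing is studied. Mesh generation, stiffness matrix computation and assembly, and finite element solution are parallelized, enabling efficient, accurate voxel meshing and efficient finite element solution. In addition, to meet the acceleration needs of the optimization loop, the sensitivity filtering step is also parallelized. With an attitude thruster model of 3 million voxel elements as the design case, the proposed method improves the speedup by 1259% compared with parallel topology optimization in Abaqus 2022, and the results of the two methods are highly similar, verifying the effectiveness and practicality of the proposed method.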
Keywords: topology optimization; parallel computing; GPU acceleration; signed distance field; sparse matrix; mesh generation
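The abstract does not say which sensitivity filter is used. A common choice, assumed here, is the mesh-independency filter from Sigmund's classic 99-line topology optimization code: each element's filtered sensitivity is a distance-weighted average of its neighbours' density-weighted sensitivities within radius r_min. Every output element is computed independently, which is what makes this step easy to map to one GPU thread per element; the sketch below is serial NumPy and its names are hypothetical.

```python
import numpy as np

def sensitivity_filter(x: np.ndarray, dc: np.ndarray, rmin: float) -> np.ndarray:
    """Mesh-independency sensitivity filter on a 2D grid of element densities x."""
    nely, nelx = x.shape
    r = int(np.floor(rmin))
    filtered = np.empty_like(dc, dtype=float)
    for i in range(nely):
        for j in range(nelx):  # each (i, j) is independent: one GPU thread per element
            num, den = 0.0, 0.0
            for k in range(max(i - r, 0), min(i + r + 1, nely)):
                for l in range(max(j - r, 0), min(j + r + 1, nelx)):
                    fac = rmin - np.hypot(i - k, j - l)  # linearly decaying weight
                    if fac > 0.0:
                        num += fac * x[k, l] * dc[k, l]
                        den += fac
            # Guard against division by near-zero densities
            filtered[i, j] = num / (max(x[i, j], 1e-3) * den)
    return filtered
```

With r_min ≤ 1 only the element itself has positive weight, so the filter returns the raw sensitivities unchanged.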
An efficient parallel computing framework for Earth system models
20
Authors: 王冬, 刘壮, 黄小猛. Computer Engineering & Science (《计算机工程与科学》, Peking University core journal), 2025, No. 10, pp. 1711-1725, 15 pages.
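Earth system models are key tools for understanding the mechanisms of past climate and environmental change and for projecting future global change scenarios. However, rapid developments in computer technology pose major programming, porting, and optimization challenges for model development. OpenArray 2.0, an automatic parallel computing framework for Earth system models, decouples model development from the underlying parallel architecture by providing a custom operator interface combined with implicit parallelism, computation graph optimization, automatic code generation, just-in-time compilation, and dynamically scheduled I/O. OpenArray 2.0 lets users write models in a serial, Matlab-like syntax while the framework executes them in parallel on heterogeneous platforms such as x86, Sunway, and GPUs. Models developed with OpenArray 2.0 reach 75% parallel efficiency on 19,200 x86 cores, running at close to the speed of hand-optimized code; achieve 70% scalability on a million cores of the Sunway platform; and also run efficiently on GPUs. OpenArray 2.0 offers a promising alternative tool for Earth system model development and is expected to significantly improve development efficiency and computational performance.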
Keywords: operator; implicit parallelism; high-performance computing; Earth system model