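In recent years, large language model (LLM) technology, represented by ChatGPT, has developed rapidly. As model parameter counts continue to grow, building and applying large models places higher demands on data storage capacity and storage access efficiency, which poses severe challenges for traditional storage systems. This paper first analyzes the storage access characteristics of large models in the data preparation, model training, and inference stages, and examines in depth the main problems and bottlenecks that traditional storage systems face in large-model scenarios. To address these challenges, it proposes and implements ScaleFS, a high-performance, scalable distributed metadata design. Through an architecture that decouples directory-tree metadata from attribute metadata, combined with a hierarchical partitioning strategy for the directory tree that balances depth and breadth, ScaleFS achieves efficient path resolution, load balancing, and system scalability, and can efficiently manage hundreds of billions of files. In addition, ScaleFS designs fine-grained metadata structures, optimizes metadata access patterns, and builds a metadata key-value store optimized for file semantics, which significantly improves metadata access efficiency and reduces disk I/O operations. Experimental results show that ScaleFS delivers 1.04 to 7.12 times the operations per second (OPS) of HDFS, with latency only 12.67% to 99.55% of that of HDFS. At a scale of hundreds of billions of files, most ScaleFS operations outperform HDFS at a scale of billions of files, demonstrating higher scalability and access efficiency and better meeting the needs of large-model scenarios for storing and efficiently accessing hundreds of billions of files.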
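The decoupling of directory-tree metadata from attribute metadata described in this abstract can be illustrated with a minimal in-memory sketch. This is not the ScaleFS implementation: the class `MetaStore`, the record `Attr`, the constant `ROOT_ID`, and the key layout `(parent inode id, child name)` are all hypothetical names chosen for illustration, and the two Python dictionaries merely stand in for whatever key-value store a real system would use. The sketch shows only that path resolution can walk the tree table without reading any attribute records.

```python
from dataclasses import dataclass, field

ROOT_ID = 0  # hypothetical inode id for "/"


@dataclass
class Attr:
    """Attribute metadata for one inode (illustrative fields only)."""
    size: int = 0
    mode: int = 0o644
    is_dir: bool = False


@dataclass
class MetaStore:
    # Directory-tree metadata: (parent inode id, child name) -> child inode id
    tree: dict = field(default_factory=dict)
    # Attribute metadata: inode id -> Attr, kept in a separate table
    attrs: dict = field(
        default_factory=lambda: {ROOT_ID: Attr(mode=0o755, is_dir=True)}
    )
    next_id: int = ROOT_ID + 1

    def resolve(self, path: str) -> int:
        """Walk the tree table only; no attribute record is read here."""
        inode = ROOT_ID
        for name in (part for part in path.split("/") if part):
            inode = self.tree[(inode, name)]  # KeyError if a component is missing
        return inode

    def create(self, path: str, is_dir: bool = False, size: int = 0) -> int:
        """Add one entry to the tree table and one record to the attribute table."""
        parent_path, _, name = path.rstrip("/").rpartition("/")
        parent = self.resolve(parent_path)
        if not self.attrs[parent].is_dir:
            raise NotADirectoryError(parent_path)
        if (parent, name) in self.tree:
            raise FileExistsError(path)
        inode = self.next_id
        self.next_id += 1
        self.tree[(parent, name)] = inode
        self.attrs[inode] = Attr(
            size=size, mode=0o755 if is_dir else 0o644, is_dir=is_dir
        )
        return inode

    def stat(self, path: str) -> Attr:
        """Resolve the path first, then do a single attribute lookup."""
        return self.attrs[self.resolve(path)]


fs = MetaStore()
fs.create("/data", is_dir=True)
fs.create("/data/a.bin", size=10)
print(fs.stat("/data/a.bin").size)
```

Because the two tables are keyed independently, a design along these lines could partition the tree table and the attribute table separately, which is one plausible reading of how the decoupling helps load balancing; the abstract itself does not give that level of detail.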
As the informatization of resources and environment remote sensing geological surveys deepens, several problems and deficiencies have emerged: (1) the lack of a uniformly planned running environment; (2) inconsistent methods of data integration; and (3) drawbacks arising from the different ways in which data integration is carried out. This paper addresses these problems through overall planning and design, constructing a unified running environment, consistent data integration methods, and a system structure in order to advance informatization.
Enterprise application integration (EAI) focuses on the collaboration and interconnection of various information systems, so the basic problem to be solved is how EAI can guarantee that applications present data, messages, and transactions consistently. A metadata methodology offers a useful approach. First, based on an operational analysis of manufacturing-oriented EAI, a metadata description method is put forward for manufacturing information resources, transaction processes, and message delivery. Then a tree-structured XML schema is built for each corresponding object, and a framework for applying metadata in discrete manufacturing-oriented EAI is established. Finally, a practical enterprise information integration system at Shanghai Tobacco Machine Co., Ltd. is presented as an example to show how the framework functions.