Abstract
Artificial intelligence generated content (AIGC) provides a supplementary means for accurate computing over multimodal data. However, artificial intelligence (AI) models face problems of convergence, training stability, stoppability, and cost in engineering applications. This study addresses the key challenges that vertical industry domains face in achieving accurate computing. These are choosing the scale of a lightweight base model, reducing the "hallucination" rate of the base model, and effectively combining existing multimodal knowledge and diverse tools with AIGC. To this end, we propose a method for implementing accurate computing in vertical industry domains based on multimodal knowledge. The method rests on three core design ideas: ① dimensional tailoring of the base model according to the domain data dictionary and root table, which keeps the base model lightweight; ② construction of a long-chain knowledge use-case library and a standard set from the domain's basic facts, so that the model's inductive inference converges to the semantic space of the standard answer or its vicinity; ③ on-demand integration of existing multimodal knowledge and diverse tools, forming an accurate-computing support pattern in which deduction is primary and inductive inference is auxiliary. The method covers base model selection, knowledge preparation and injection, continued training and fine-tuning, algorithm integration, and cross-validation. Its technical breakthroughs include knowledge segmentation and vectorization based on root tables and domain data dictionaries, a Bayesian link technique based on blueprint data structures, and a core-data training method based on the law of large numbers and the central limit theorem. Finally, the method was validated in an insurance underwriting scenario and achieved good results.
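The abstract names root-table-based knowledge segmentation and vectorization, and convergence toward the semantic space of a standard answer, but gives no details. The Python sketch that follows is a minimal illustration under stated assumptions and is not the authors' method. The root table, the standard answers, the prefix-matching rule, the bag-of-roots vectors and the 0.8 threshold are all hypothetical. Each token is mapped to the longest root it starts with. Each text then becomes a count vector with one dimension per root-table entry. A model answer is accepted only if its cosine similarity to the nearest standard answer reaches the threshold. Otherwise it is rejected, which is where a deduction-first pipeline could hand off to rule-based logic.

```python
import math
from collections import Counter

# Hypothetical root table for insurance underwriting (illustrative, not from the paper).
ROOT_TABLE = ["accept", "declin", "smok", "hypertens", "premium", "loading", "age"]

# Hypothetical standard set: label -> canonical standard answer text.
STANDARD_SET = {
    "decline": "Decline: smoker with uncontrolled hypertension.",
    "accept_loading": "Accept with premium loading for age.",
    "accept_standard": "Accept at standard premium.",
}


def to_roots(text: str, roots: list[str]) -> list[str]:
    """Segment text: map each token to the longest root it starts with; drop the rest."""
    result = []
    for token in text.lower().split():
        token = token.strip(".,;:!?")
        matches = [r for r in roots if token.startswith(r)]
        if matches:
            result.append(max(matches, key=len))
    return result


def vectorize(text: str, roots: list[str]) -> list[float]:
    """Bag-of-roots count vector with one dimension per root-table entry."""
    counts = Counter(to_roots(text, roots))
    return [float(counts[r]) for r in roots]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_standard(answer, standard_set, roots, threshold=0.8):
    """Return (label, score) of the closest standard answer, or (None, score) below threshold."""
    vec = vectorize(answer, roots)
    best_label, best_score = None, 0.0
    for label, text in standard_set.items():
        score = cosine(vec, vectorize(text, roots))
        if score > best_score:
            best_label, best_score = label, score
    if best_score >= threshold:
        return best_label, best_score
    # Below threshold: reject the inductive answer (a deduction-first pipeline
    # could fall back to rule-based logic here).
    return None, best_score


if __name__ == "__main__":
    answer = "Applicant is a heavy smoker; hypertension noted, decline."
    print(nearest_standard(answer, STANDARD_SET, ROOT_TABLE)[0])  # decline
```

A production system would more likely use learned dense embeddings than root counts. The count vectors here only make the idea of measuring an answer's distance to a standard set concrete.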
Authors
ZUO Chun (左春), WANG Yang (王洋)
Sinosoft Co., Ltd., Beijing 100190, China
Source
Journal of The Hebei Academy of Sciences (《河北省科学院学报》), 2025, No. 1, pp. 7-12, 33 (7 pages)
Keywords
Accurate computation
Long-chain knowledge
Vector embedding
Multimodal
Domain semantic space
Similarity computation
About the Author
ZUO Chun (1959-), male, born in Beijing, M.S., research fellow. His main research interest is accurate computing. E-mail: zuochun@sinosoft.com.cn.