Abstract
Data in tobacco logistics operations come from an extremely wide and diverse range of sources. The data differ in format, are structurally complex, and are often scattered across separate information systems, which results in low data throughput when logistics data are integrated. To address this problem, a multi-source tobacco logistics data integration method for heterogeneous environments based on K-medoids clustering is proposed. Undersampling is used to balance the class distribution, and data-correlation analysis with threshold-based cleaning removes redundant information, improving the quality of the multi-source tobacco logistics data. A tobacco logistics data integration framework based on K-medoids clustering is then designed: transfer learning dynamically adjusts the source-domain weights to optimize clustering performance in the target domain, and new data points subject to similarity constraints are introduced as the initial cluster centers, achieving effective integration of multi-source tobacco logistics data in heterogeneous environments. Experimental results show that, through clustering, the proposed method effectively groups and integrates data from different sources, reduces the complexity of data processing, and improves the throughput of data integration.
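The preprocessing and clustering steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Manhattan distance, the 0.95 correlation threshold, and the random medoid initialization are all assumptions made for illustration, and the transfer-learning source-domain weighting and the similarity-constrained initialization are not reproduced.

```python
import random
from collections import defaultdict

def undersample(records, labels, seed=0):
    """Randomly undersample every class down to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for rec, lab in zip(records, labels):
        by_class[lab].append(rec)
    n_min = min(len(group) for group in by_class.values())
    balanced = []
    for lab in sorted(by_class):
        balanced.extend((rec, lab) for rec in rng.sample(by_class[lab], n_min))
    return balanced

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def drop_redundant_columns(rows, threshold=0.95):
    """Keep a column only if its |correlation| with every kept column is below threshold."""
    cols = list(zip(*rows))
    kept = []
    for j, col in enumerate(cols):
        if all(abs(pearson(col, cols[k])) < threshold for k in kept):
            kept.append(j)
    return [[row[j] for j in kept] for row in rows], kept

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def assign(points, medoids):
    """Label each point with the position (in `medoids`) of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: manhattan(p, points[medoids[c]]))
            for p in points]

def k_medoids(points, k, max_iter=100, seed=0):
    """Alternating K-medoids: assign points, then move each medoid to the
    cluster member with the smallest total distance to the other members."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)  # assumption: random initialization
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for c, m in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # empty cluster: keep the old medoid
                new_medoids.append(m)
                continue
            new_medoids.append(min(
                members,
                key=lambda cand: sum(manhattan(points[cand], points[i]) for i in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)
```

In this sketch, records from different source systems would first be mapped to a common numeric feature layout, balanced with `undersample`, pruned with `drop_redundant_columns`, and then grouped with `k_medoids`. Medoids are actual records rather than averaged centroids, so each cluster is represented by a real logistics record.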
Authors
GUO Guanggen (郭光根)
HE Rui (何蕊)
ZHANG Yujun (张玉军)
Source
Technology Innovation and Application (《科技创新与应用》)
2024, Issue 35, pp. 39-43 (5 pages)
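About the authors
First author: GUO Guanggen (b. 1982), male, master's degree, assistant engineer. Research interests: intelligent logistics, logistics system software development, and equipment management. Corresponding author: ZHANG Yujun (b. 1987), male, assistant engineer. Research interests: intelligent logistics, equipment management, and quality control.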