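To address the low efficiency, low detection accuracy, and loss of local trajectory features in ship trajectory clustering algorithms, the density-based spatial clustering of applications with noise (DBSCAN) algorithm is generalized from conventional point clustering to line clustering, and a density-based trajectory clustering of applications with noise (DBTCAN) algorithm is proposed that clusters complete ship trajectories directly. The algorithm uses the Hausdorff distance as the similarity measure between ship trajectories, so it can cluster trajectories of different lengths. Because DBTCAN would otherwise require manually chosen input parameters, an adaptive parameter-determination method is also proposed. Experiments on automatic identification system (AIS) data from the Bohai Sea show that the algorithm can find similar trajectories among a large number of complex ship trajectories and cluster them, and that the clustering results are consistent with the actual traffic flow. The results can provide a basis for route planning and maritime traffic supervision by the relevant authorities.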
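The idea of clustering whole trajectories with a Hausdorff distance can be illustrated with a minimal sketch. It is not the DBTCAN implementation from the abstract: it is plain DBSCAN run on a pairwise Hausdorff distance matrix, and the names `hausdorff`, `dbscan_trajectories`, `eps`, and `min_trajs` are hypothetical. The parameters are fixed by hand here, whereas the abstract describes an adaptive method that is not reproduced.

```python
import math

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sequences."""
    def directed(p, q):
        # Largest distance from a point of p to its nearest point in q.
        return max(min(math.dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))

def dbscan_trajectories(trajs, eps, min_trajs):
    """Plain DBSCAN where each item is a whole trajectory (assumed sketch).

    Returns one label per trajectory; -1 marks noise.
    """
    n = len(trajs)
    dist = [[hausdorff(trajs[i], trajs[j]) for j in range(n)] for i in range(n)]
    # Each neighborhood includes the trajectory itself (distance 0).
    neighbors = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_trajs:
            labels[i] = -1  # provisional noise; may become a border item later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core item is a border item
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_trajs:
                queue.extend(neighbors[j])  # expand only through core items
    return labels

# Three nearby tracks of different lengths and one distant track.
tracks = [
    [(0, 0), (1, 0), (2, 0)],
    [(0, 0.1), (1, 0.1), (2, 0.1), (3, 0.1)],
    [(0, 0.2), (2, 0.2)],
    [(0, 10), (1, 10), (2, 10)],
]
labels = dbscan_trajectories(tracks, eps=1.5, min_trajs=2)
print(labels)
```

Because the Hausdorff distance compares point sets rather than aligned samples, the three nearby tracks can be grouped even though they contain 3, 4, and 2 points respectively.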
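To remove noise points from point cloud data, a multi-scale point cloud denoising method based on an improved DBSCAN (density-based spatial clustering of applications with noise) algorithm is proposed. Statistical filtering first pre-screens isolated outliers, removing large-scale noise from the point cloud. The DBSCAN algorithm is then optimized to reduce its time complexity and to adjust its parameters adaptively; it divides the point cloud into normal clusters, suspected clusters, and abnormal clusters, and the abnormal clusters are removed immediately. Finally, a distance consensus evaluation method makes a fine-grained decision on the suspected clusters: the distance between each suspected point and the surface fitted to its nearest normal points is computed to decide whether the point is abnormal, which effectively preserves the key features of the data and the model sensitivity. The method was applied to denoise the point clouds of two hull blocks and compared with other denoising algorithms. The results show that it has advantages in denoising efficiency and feature preservation and accurately retains the geometric characteristics of the point cloud data.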
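The pre-screening step can be sketched as follows. The abstract does not specify which statistical filter is used, so this assumes the common form that thresholds each point's mean distance to its k nearest neighbors; `statistical_filter`, `k`, and `std_ratio` are hypothetical names, and the brute-force neighbor search is for illustration only.

```python
import math
import statistics

def statistical_filter(points, k=3, std_ratio=1.0):
    """Drop points whose mean k-nearest-neighbor distance is unusually large.

    Assumed form of the filter: a point is kept if its mean distance to its
    k nearest neighbors is at most (global mean + std_ratio * global std).
    Requires len(points) > k.
    """
    mean_knn = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_knn.append(sum(dists[:k]) / k)
    threshold = statistics.mean(mean_knn) + std_ratio * statistics.pstdev(mean_knn)
    return [p for p, m in zip(points, mean_knn) if m <= threshold]

# A small planar patch plus one isolated point far from the surface.
patch = [(x, y, 0) for x in range(3) for y in range(3)]
outlier = (50, 50, 50)
kept = statistical_filter(patch + [outlier], k=3, std_ratio=1.0)
print(len(kept))
```

A filter of this kind only removes isolated, large-scale outliers; noise that lies close to the surface is what the later clustering and distance-based stages in the abstract are meant to handle.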
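Because the complexity of armored-vehicle motion states, the uncertainty of the battlefield situation, and tactical deception make the trajectories of armored-vehicle clusters difficult to predict accurately, a cluster trajectory prediction method is proposed that combines density-based spatial clustering of applications with noise (DBSCAN) with a long short-term memory (LSTM) neural network. Kinematic models are built for three driving states: driving on a slope, steering, and vehicle-to-vehicle interaction. Trajectory feature information, including maneuver features, environmental features, and vehicle-to-vehicle interaction features, is selected, and a two-layer LSTM network predicts the trajectory of each individual vehicle. The DBSCAN algorithm is then used to segment the predicted single-vehicle trajectories, compute their similarity, and cluster them, yielding a representative cluster trajectory that serves as the predicted trajectory of the armored-vehicle cluster. Simulation results show that the proposed method can effectively predict armored-vehicle cluster trajectories, making it possible to anticipate an adversary's movements and plan ahead of them.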
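To reduce the reliance on manual division by archaeologists when identifying settlement groups, the density-based DBSCAN (density-based spatial clustering of applications with noise) algorithm is used for spatial clustering of settlement sites, taking the Neolithic settlement sites of the Zhengluo region as an example. Analysis of the distribution of settlement sites across four cultural periods yields four findings. First, the main settlement group moved from the area south of Mount Song in the east of the study area to the Yiluo River basin in the central part of the region, where it settled permanently and continued to develop and expand. Second, large settlement sites are mainly located within the main settlement groups, except for some relatively isolated large settlements in the Peiligang culture period. Third, from the late Yangshao culture to the Longshan culture period, settlement sites show a ring-shaped distribution pattern of primary and subordinate sites. Fourth, most settlement groups are oriented along the rivers. The study shows that clustering settlement sites with DBSCAN is feasible, and that the clustering yields the distribution characteristics of settlement sites in the four Neolithic cultural periods of the Zhengluo region.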
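The terminal-area airspace is complex and densely trafficked, and accurate trajectory prediction can greatly improve air traffic services and safeguard flight safety. For high-accuracy 4D trajectory prediction of multiple flights in the terminal area, this paper proposes a method that combines density-based spatial clustering of applications with noise (DBSCAN) with gated recurrent units (GRU). DBSCAN groups flights with similar trajectories in the terminal area into clusters, and a GRU-based trajectory prediction model is built for each cluster. To predict a terminal-area flight, the method first determines which cluster the flight belongs to and then applies the prediction model for that cluster to perform 4D trajectory prediction. Compared with conventional prediction methods that study only a single flight, the method makes effective use of terminal-area trajectory data; the resulting models can predict trajectories for multiple flights, which broadens their applicability and improves prediction accuracy.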
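To cluster complex ship trajectories accurately and identify hidden trajectory clusters, a hierarchical ship trajectory clustering algorithm that considers multi-dimensional features is proposed. A core firefly algorithm (CFA) is used to address the redundant neighborhood queries and parameter sensitivity of the density-based spatial clustering of applications with noise (DBSCAN) algorithm. On top of the conventional ship trajectory clustering features, water-area environment, trajectory shape, and time-slot features are introduced to build trajectory similarity measures layer by layer, so that trajectories are clustered progressively, one layer at a time. AIS data from Xiamen Port and its adjacent waters are used to validate the algorithm. The results show that the algorithm groups the ship trajectories into 9 clusters; the mean intra-cluster dynamic time warping (DTW) distance is 5.199 and the mean inter-cluster DTW distance is 18.032; the clustering results agree with the actual ship traffic flow, with a clustering accuracy of 91.50%. Compared with other commonly used trajectory clustering algorithms, the proposed algorithm distinguishes similarities and differences in the geographic distribution of trajectories and in ship motion characteristics more effectively and finds hidden trajectory clusters more easily. Trajectories assigned to the same cluster have more similar ship motion characteristics, and the clustering results can provide typical trajectory samples for analyzing ship traffic flow characteristics and recognizing ship behavior patterns.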
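The intra-cluster and inter-cluster DTW figures quoted above can be understood through a small sketch. It assumes the classic dynamic-programming DTW with Euclidean point cost and assumes that the means are taken over all trajectory pairs; the abstract does not state either detail, and `dtw` and `mean_intra_inter` are hypothetical names.

```python
import math
from itertools import combinations

def dtw(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def mean_intra_inter(trajs, labels):
    """Mean pairwise DTW within clusters and between clusters (noise skipped)."""
    intra, inter = [], []
    for i, j in combinations(range(len(trajs)), 2):
        if labels[i] < 0 or labels[j] < 0:
            continue
        bucket = intra if labels[i] == labels[j] else inter
        bucket.append(dtw(trajs[i], trajs[j]))
    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")
    return mean(intra), mean(inter)

# Two tight pairs of tracks, ten units apart; the labels are given by hand.
tracks = [
    [(0, 0), (1, 0)],
    [(0, 1), (1, 1)],
    [(0, 10), (1, 10)],
    [(0, 11), (1, 11)],
]
intra, inter = mean_intra_inter(tracks, [0, 0, 1, 1])
print(intra, inter)
```

A small intra-cluster mean together with a large inter-cluster mean indicates compact, well-separated clusters, which is how the 5.199 and 18.032 values in the abstract are meant to be read.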
For imbalanced datasets, the focus of classification is to identify samples of the minority class. Current data mining algorithms do not perform well enough on imbalanced datasets. The synthetic minority over-sampling technique (SMOTE) is specifically designed for learning from imbalanced datasets; it generates synthetic minority-class examples by interpolating between nearby minority-class examples. However, SMOTE suffers from the overgeneralization problem. In addition, density-based spatial clustering of applications with noise (DBSCAN) is not rigorous when dealing with samples near the borderline, so we optimize the DBSCAN algorithm to make the clustering more reasonable. This paper integrates the optimized DBSCAN with SMOTE and proposes a density-based synthetic minority over-sampling technique (DSMOTE). First, the optimized DBSCAN divides the minority-class samples into three groups: core samples, borderline samples, and noise samples. The noise samples of the minority class are then removed so that more effective samples are synthesized. To make full use of the information in core and borderline samples, different over-sampling strategies are applied to each group. Experiments show that DSMOTE achieves better results than SMOTE and Borderline-SMOTE in terms of precision, recall, and F-value.
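The two building blocks named in this abstract can be sketched in a few lines. This is not DSMOTE itself: the partition uses the standard DBSCAN definitions of core, border, and noise rather than the paper's optimized variant, and one plain interpolation rule is applied to every retained sample because the abstract does not describe the separate core and borderline strategies. All function and parameter names are hypothetical, and the sketch interpolates toward the single nearest retained neighbor instead of a random one of k neighbors.

```python
import math
import random

def classify_minority(points, eps, min_pts):
    """Label each minority sample 'core', 'border', or 'noise' (standard DBSCAN)."""
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= min_pts}
    kinds = []
    for i, nb in enumerate(neighbors):
        if i in core:
            kinds.append("core")
        elif any(j in core for j in nb):
            kinds.append("border")  # not dense itself, but within eps of a core sample
        else:
            kinds.append("noise")
    return kinds

def smote_sample(x, neighbor, rng):
    """SMOTE interpolation: a random point on the segment from x to neighbor."""
    gap = rng.random()
    return tuple(xi + gap * (ni - xi) for xi, ni in zip(x, neighbor))

def oversample(points, eps, min_pts, n_new, seed=0):
    """Drop noise samples, then synthesize n_new points from the rest.

    Assumes at least two distinct non-noise samples remain.
    """
    rng = random.Random(seed)
    kinds = classify_minority(points, eps, min_pts)
    kept = [p for p, kind in zip(points, kinds) if kind != "noise"]
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(kept)
        nearest = min((q for q in kept if q != x), key=lambda q: math.dist(x, q))
        synthetic.append(smote_sample(x, nearest, rng))
    return synthetic

minority = [(0, 0), (0, 1), (1, 0), (1, 1), (2.2, 1), (9, 9)]
kinds = classify_minority(minority, eps=1.5, min_pts=4)
synthetic = oversample(minority, eps=1.5, min_pts=4, n_new=5)
print(kinds)
```

Removing the noise sample before interpolation keeps synthetic points inside the dense region of the minority class, which is the overgeneralization concern the abstract raises about plain SMOTE.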
Clustering is an unsupervised learning problem: a procedure that partitions data objects into groups. Many algorithms cannot simultaneously overcome the problems of morphology, overlapping, and a large number of clusters. Many scientific communities have approached clustering from the perspective of density, which is one of the best methods in clustering. This study proposes a density-based spatial clustering of applications with noise (DBSCAN) algorithm based on selected high-density areas, called automatic fuzzy-DBSCAN (AFD), which works with the initialization of two parameters. AFD uses fuzzy and DBSCAN features, is modeled by the selection of high-density areas, and automatically generates two parameters for merging and separating. The two generated parameters provide a state of sub-cluster rules in the Cartesian coordinate system for the dataset. The model simultaneously overcomes clustering problems such as morphology, overlapping, and the number of clusters in a dataset. In the experiments, all algorithms are run 30 times on eight datasets. Three of them are overlapping real datasets, and the rest are morphologic and synthetic datasets. The results demonstrate that the AFD algorithm outperforms other recently developed clustering algorithms.
As the demand for bike-sharing has increased, an oversupply problem has emerged, which wastes resources and disturbs the urban environment. To regulate the supply volume of bike-sharing reasonably, an estimation model is proposed that quantifies the urban carrying capacity (UCC) for bike-sharing from demand data. In this way, the maximum supply volume of bike-sharing that a city can accommodate can be obtained. The UCC for bike-sharing is reflected in the road network carrying capacity (RNCC) and the parking facilities' carrying capacity (PFCC). The space-time consumption method and the density-based spatial clustering of applications with noise (DBSCAN) algorithm were used to explore the RNCC and PFCC for bike-sharing. Combined with users' demand, the urban load ratio for bike-sharing can be evaluated to judge whether the UCC can meet that demand, so that the supply volume of bike-sharing and the distribution of the related facilities can be adjusted accordingly. The model was applied by estimating the UCC and load ratio of each traffic analysis zone in Nanjing, China. The effect of the proposed algorithm was verified by comparison with field survey data.
Funding (DSMOTE paper): supported by the National Key Research and Development Program of China (2018YFB1003700); the Scientific and Technological Support Project (Society) of Jiangsu Province (BE2016776); the "333" Project of Jiangsu Province (BRA2017228, BRA2017401); and the Talent Project in Six Fields of Jiangsu Province (2015-JNHB-012).
Funding (bike-sharing paper): Project (2018YFE0120100) supported by the National Key R&D Program of China; Project (YBPY2040) supported by the Scientific Research Foundation of Graduate School of Southeast University, China.