Abstract: Reliable cluster head (CH) selection-based routing protocols are necessary for increasing packet transmission efficiency through optimal path discovery that does not degrade transmission reliability. In this paper, a Hybrid Golden Jackal and Improved Whale Optimization Algorithm (HGJIWOA) is proposed as an effective and optimal routing protocol that guarantees efficient routing of data packets along the paths established between the CHs and the movable sink. HGJIWOA includes the phases of a Dynamic Lens-Imaging Learning Strategy and Novel Update Rules for determining the reliable route needed for broadcasting data packets, which is attained through CH selection based on fitness measure estimation. CH selection, achieved using the Golden Jackal Optimization Algorithm (GJOA), depends entirely on the factors of maintainability, consistency, trust, delay, and energy. The adopted GJOA plays a dominant role in determining the optimal routing path based on reduced delay and minimal distance. The protocol further uses the Improved Whale Optimization Algorithm (IWOA) to forward data from the chosen CHs to the base station (BS) over an optimized route that depends on energy and distance. It also includes a reliable route maintenance process that helps decide the route through which data should be transmitted or re-routed. Simulation results for the proposed HGJIWOA mechanism with different sensor nodes confirmed a mean throughput improved by 18.21%, residual energy sustained by 19.64%, and end-to-end delay reduced by 21.82% relative to competing CH selection approaches.
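The fitness-based CH selection described in this abstract can be illustrated with a minimal sketch. The abstract names the five factors but not how they are combined, so the weighted-sum form, the normalization of every factor to [0, 1], the equal weights, and the names `ch_fitness` and `select_cluster_head` are all assumptions made for illustration. The GJOA search itself is not reproduced here.

```python
# Hypothetical weighted-sum fitness for cluster head (CH) selection.
# Assumption: every factor is already normalized to the range [0, 1].

BENEFIT_FACTORS = ("maintainability", "consistency", "trust", "energy")


def ch_fitness(node, weights):
    """Return a score where higher is better.

    Delay is a cost, so it contributes as (1 - delay).
    """
    benefit = sum(weights[f] * node[f] for f in BENEFIT_FACTORS)
    return benefit + weights["delay"] * (1.0 - node["delay"])


def select_cluster_head(nodes, weights):
    """Pick the id of the candidate node with the highest fitness."""
    return max(nodes, key=lambda n: ch_fitness(n, weights))["id"]


# Illustrative data only; these values do not come from the paper.
weights = {"maintainability": 0.2, "consistency": 0.2, "trust": 0.2,
           "energy": 0.2, "delay": 0.2}
nodes = [
    {"id": "n1", "maintainability": 0.8, "consistency": 0.7,
     "trust": 0.9, "energy": 0.6, "delay": 0.2},
    {"id": "n2", "maintainability": 0.5, "consistency": 0.5,
     "trust": 0.5, "energy": 0.9, "delay": 0.1},
]
print(select_cluster_head(nodes, weights))
```

In a metaheuristic such as GJOA, a function of this kind would typically serve as the objective that candidate solutions are scored against; the weights would need to be tuned for the network at hand.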
Abstract: The K-means algorithm is widely known for its simplicity and speed in text clustering. However, the traditional K-means algorithm selects the initial clustering centers somewhat randomly, so the clustering results fluctuate and become unstable depending on the initial centers. This paper proposes an algorithm for selecting the initial clustering centers that eliminates the uncertainty of central point selection. The experimental results show that the improved K-means clustering algorithm is superior to the traditional algorithm.
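To make the initialization issue concrete, the sketch below replaces random seeding with a deterministic rule, so that repeated runs on the same data give the same clusters. The abstract does not state which selection rule the paper uses; the farthest-first rule shown here is a swapped-in, well-known alternative, and the function names are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first_centers(points, k):
    """Pick k initial centers deterministically (assumed rule, not the paper's)."""
    n = len(points)
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    # Start from the data point closest to the overall mean.
    centers = [min(points, key=lambda p: dist(p, mean))]
    while len(centers) < k:
        # Add the point whose nearest already-chosen center is farthest away.
        centers.append(
            max(points, key=lambda p: min(dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, iters=100):
    """Standard Lloyd iterations started from the deterministic centers."""
    centers = farthest_first_centers(points, k)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(c) / len(members)
                                    for c in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    return centers, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, 2)
print(labels)
```

Because no random numbers are involved, the fluctuation between runs that the abstract attributes to random initial centers cannot occur in this sketch; whether the resulting clusters are better than a random start still depends on the data.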
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62101579).
Abstract: Offboard active decoys (OADs) can effectively jam monopulse radars. However, for missiles approaching from a particular direction and distance, the OAD must be placed at a specific location, which places high demands on timing and deployment. To improve the response speed and jamming effect, a cluster of OADs based on an unmanned surface vehicle (USV) is proposed. The formation of the cluster determines the effectiveness of jamming. First, based on the mechanism of OAD jamming, critical conditions are identified, and a method for assessing the jamming effect is proposed. Then, to optimize the cluster formation, a mathematical model is built, and a multi-tribe adaptive particle swarm optimization algorithm based on a mutation strategy and the Metropolis criterion (3M-APSO) is designed. Finally, the formation optimization problem is solved and analyzed using the 3M-APSO algorithm under specific scenarios. The results show that the improved algorithm converges faster and performs better than the standard Adaptive-PSO algorithm. Compared with a single OAD, the optimal formation of the USV-OAD cluster effectively fills the blind area and maximizes the use of jamming resources.
Funding: This study was supported by the National Key Research and Development Program of China (No. 2018YFE0122200), the National Natural Science Foundation of China (No. 52077078), and the Fundamental Research Funds for the Central Universities (No. 2020MS090).
Abstract: To address the significant lifecycle degradation and inadequate state of charge (SOC) balance of electric vehicles (EVs) when mitigating wind power fluctuations, a dynamic grouping control strategy for EVs based on an improved k-means algorithm is proposed. First, a swing door trending (SDT) algorithm based on compression result feedback was designed to extract the feature data points of wind power. The gating coefficient of the SDT was adjusted based on the compression ratio and deviation, enabling grid-connected wind power signals to be obtained through linear interpolation. Second, a novel algorithm called IDOA-KM is proposed, which uses the Improved Dingo Optimization Algorithm (IDOA) to optimize the clustering centers of the k-means algorithm and thereby address its dependence on and sensitivity to the initial centers. The EVs were categorized into priority charging, standby, and priority discharging groups using IDOA-KM. Finally, a two-layer power distribution scheme for EVs was devised. The upper layer determines the charging/discharging sequences of the three EV groups and their corresponding power signals. The lower layer allocates power signals to each EV based on the maximum charging/discharging power or SOC equalization principles. The simulation results demonstrate the effectiveness of the proposed control strategy in accurately tracking grid power signals, smoothing wind power fluctuations, mitigating EV degradation, and enhancing the SOC balance.
Funding: Supported by the Beijing Natural Science Foundation (Grant No. 3222037), the PetroChina Innovation Foundation (Grant No. 2020D-5007-0203), and the Science Foundation of China University of Petroleum, Beijing (Nos. 2462021YXZZ010, 2462018QZDX13, and 2462020YXZZ028).
Abstract: Statistical prediction is often required in reservoir simulation to quantify production uncertainty or assess potential risks. Most existing uncertainty quantification procedures aim to decompose the input random field into independent random variables, and may suffer from the curse of dimensionality if the correlation scale is small compared to the domain size. In this work, we develop and test a new approach, K-means clustering assisted empirical modeling, for efficiently estimating waterflooding performance for multiple geological realizations. This method performs single-phase flow simulations in a large number of realizations and uses K-means clustering to select only a few representatives, on which the two-phase flow simulations are run. Empirical models are then fitted on these representatives to describe the relation between the single-phase solutions and the two-phase solutions. Finally, the two-phase solutions in all realizations can be readily predicted using the empirical models. The method is applied to both 2D and 3D synthetic models and is shown to perform well for the P10, P50, and P90 of production rates, as well as for the probability distributions illustrated by cumulative density functions. It captures the ensemble statistics of Monte Carlo simulation results with a large number of realizations, and the computational cost is significantly reduced.
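The representative-selection step in this abstract can be sketched as follows: cluster the realizations on features taken from the cheap single-phase runs, then keep the one realization nearest each cluster centroid for the expensive two-phase runs. The abstract does not say which features are clustered or how representatives are chosen within a cluster, so the single scalar feature, the nearest-to-centroid rule, the deterministic initialization, and all function names below are assumptions.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(features, k, iters=50):
    """Plain Lloyd iterations with a deterministic start.

    The start (evenly spaced picks from the sorted features) is an
    illustrative choice, not something stated in the abstract.
    """
    ordered = sorted(features)
    step = len(ordered) / k
    centers = [list(ordered[int((j + 0.5) * step)]) for j in range(k)]
    labels = [0] * len(features)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(f, centers[j]))
                  for f in features]
        for j in range(k):
            members = [f for f, lab in zip(features, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers, labels


def pick_representatives(features, k):
    """Return one realization index per cluster: the member nearest its centroid."""
    centers, labels = kmeans(features, k)
    reps = []
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            reps.append(min(members,
                            key=lambda i: dist(features[i], centers[j])))
    return reps, labels


# Hypothetical single-phase features, one tuple per geological realization.
features = [(1.0,), (1.2,), (0.9,), (5.0,), (5.3,), (4.8,),
            (9.0,), (9.1,), (8.7,)]
reps, labels = pick_representatives(features, 3)
print(reps)
```

Only the realizations listed in `reps` would then go through two-phase simulation, and the empirical models fitted on them would be applied to the rest.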
Funding: Supported by the National Magnetic Confinement Fusion Science Program of China (Nos. 2018YFE0301104 and 2018YFE0301100) and the National Natural Science Foundation of China (Nos. 12075096 and 51821005).
Abstract: Various types of plasma events emerge in specific parameter ranges and exhibit similar characteristics in diagnostic signals, which can be used to identify these events. A semi-supervised machine learning algorithm, the k-means clustering algorithm, is used to investigate and identify plasma events in the J-TEXT plasma. This method can cluster diverse plasma events with homogeneous features, and these events can then be identified given a few manually labeled examples based on physical understanding. A survey of clustered events reveals that the k-means algorithm can make plasma events (rotating tearing mode, sawtooth oscillations, and locked mode) gather in a Euclidean space composed of multi-dimensional diagnostic data, such as soft x-ray emission intensity, edge toroidal rotation velocity, and the Mirnov signal amplitude. Based on the cluster analysis results, an approximate analytical model is proposed to rapidly identify plasma events in the J-TEXT plasma. The cluster analysis method is conducive to labeling massive diagnostic data.
Funding: This paper is supported in part by the National Natural Science Foundation of China (61701322), the Young and Middle-aged Science and Technology Innovation Talent Support Plan of Shenyang (RC190026), the Natural Science Foundation of Liaoning Province (2020-MS-237), and the Liaoning Provincial Department of Education Science Foundation (JYT19052).
Abstract: The Flying Ad hoc Network (FANET) has drawn significant attention due to its rapid advancement and extensive use in civil applications. However, the characteristics of FANET, including high mobility, limited resources, and its distributed nature, pose a new challenge for developing a secure and efficient routing scheme. To overcome these challenges, this paper proposes a novel cluster-based secure routing scheme that aims to solve the routing and data security problems of FANET. In this scheme, the optimal cluster head is selected based on residual energy, online time, reputation, blockchain transactions, mobility, and connectivity using Improved Artificial Bee Colony Optimization (IABC). The proposed IABC uses two different search equations for the employed bees and the onlooker bees to enhance the convergence rate and exploitation abilities. Further, a lightweight blockchain consensus algorithm, the AI-Proof of Witness Consensus Algorithm (AI-PoWCA), is proposed, which uses the optimal cluster head for mining. AI-PoWCA also introduces the concept of a witness for block verification to make the proposed scheme resource efficient and highly resilient against 51% attacks. Simulation results demonstrate that the proposed scheme outperforms its counterparts, achieving up to a 90% packet delivery ratio, the lowest end-to-end delay, the highest throughput, resilience against security attacks, and superior block processing time.
Abstract: Classification systems such as Slope Mass Rating (SMR) are currently used for slope stability analysis. In the SMR classification system, data are allocated to classes based on linguistic and experience-based criteria. To eliminate linguistic criteria resulting from experience-based judgments and to account for uncertainties in the class boundaries defined by the SMR system, the classification results were corrected using two clustering algorithms, K-means and fuzzy c-means (FCM), for the ratings obtained via continuous and discrete functions. When the clustering algorithms were applied to the SMR classification system, no experience-based judgment was made in advance on the number of extracted classes. Only after all steps of the clustering algorithms were completed was a new classification scheme proposed for the SMR system under different failure modes, based on the ratings obtained via continuous and discrete functions. The results of this study show that engineers can achieve more reliable and objective evaluations of slope stability by using the SMR system based on the ratings calculated via continuous and discrete functions.
Abstract: The K-means algorithm is one of the most widely used algorithms in clustering analysis. To deal with the problem caused by the random selection of initial center points in the traditional algorithm, this paper proposes an improved K-means algorithm based on the similarity matrix. The improved algorithm avoids the random selection of initial center points and therefore provides effective initial points for the clustering process. This reduces the fluctuation of clustering results caused by the choice of initial points, so better clustering quality can be obtained. The experimental results also show that the F-measure of the improved K-means algorithm is greatly improved and the clustering results are more stable.
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 12293032, 120101002, 12173097, and U1931123), the National Key Basic Research and Development Program of China (Grant Nos. 2020YFC2201703 and 2018YFA0404701), and the Chinese Academy of Sciences (Grant No. GJJSTD20210002).
Abstract: We develop an x-ray Ti/Au transition-edge sensor (TES) with an Au absorber deposited on the center of the TES and improve its energy resolution using the K-means clustering algorithm in combination with a Wiener filter. We first extract the main parameters of each recorded pulse trace, which are used by the K-means clustering algorithm to classify the traces into several clusters. Real traces are then selected for energy resolution analysis. Following baseline correction, the Wiener filter is used to improve the signal-to-noise ratio. Although the silicon underneath the TES has not been etched to reduce the thermal conductance, the energy resolution of the developed x-ray TES is improved from 94 eV to 44 eV at 5.9 keV.
Funding: Supported by the National Natural Science Foundation of China under Grants No. 61100205 and No. 60873001, the HiTech Research and Development Program of China under Grant No. 2011AA010705, and the Fundamental Research Funds for the Central Universities under Grant No. 2009RC0212.
Abstract: Webpage classification differs from traditional text classification in its irregular words and phrases and its massive, unlabeled features, which make it harder to obtain effective features. To cope with this problem, we propose two scenarios for extracting meaningful strings, based on document clustering and term clustering with multiple strategies, to optimize a Vector Space Model (VSM) and thereby improve webpage classification. The results show that document clustering works better than term clustering in coping with document content. However, the best overall performance is obtained by spectral clustering combined with document clustering. Moreover, because images appear in the same webpage as the document content, the proposed method is also applied to extract meaningful terms for images, and the experimental results show that it is likewise effective in improving webpage classification.
Funding: Supported in part by the National Natural Science Foundation of China (Nos. 71971114, 61573181) and the Open Grant of the State Key Laboratory of Air Traffic Management System and Technique (No. SKLATM201801).
Abstract: To quantify unmanned aerial vehicle (UAV) flight risks in low-altitude airspace, we analyze the factors of UAV flight risk from three aspects: flight conflict, flight environment, and traffic characteristics. The aerial risk index and ground risk index of the UAV are constructed, the index screening model is established, and a UAV flight risk assessment model based on K-means clustering is proposed. Numerical simulations show that the proposed method can not only evaluate UAV flight risks effectively but also provide technical support for UAV risk management and control.
Abstract: Clustering has recently become a challenging task because of the ever-growing amount of data in scientific research and commercial applications. High-quality and fast document clustering algorithms are in great demand for dealing with large volumes of data. The computational requirements for bringing such a growing amount of data to a central site for clustering are complex. The proposed algorithm uses Particle Swarm Optimization (PSO) to find optimal centroids for K-Means clustering. PSO is used for its global search ability to provide optimal centroids, which helps generate more compact clusters with improved accuracy. The proposed methodology uses the Hadoop and MapReduce framework, which provides distributed storage and analysis to support data-intensive distributed applications. Experiments performed on the Reuters and RCV1 document datasets show an improvement in accuracy with reduced execution time.
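The idea of letting PSO search for K-Means centroids can be sketched in a few lines. This is a single-machine illustration only: the Hadoop/MapReduce distribution described in the abstract is not reproduced, and the objective (within-cluster sum of squared errors), the inertia and acceleration coefficients, the swarm size, and the function names are assumptions rather than the paper's settings.

```python
import random


def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        total += min(sum((x - c) ** 2 for x, c in zip(p, cen))
                     for cen in centroids)
    return total


def pso_centroids(points, k, n_particles=15, iters=40,
                  w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for k centroids with a basic global-best PSO (assumed settings)."""
    rng = random.Random(seed)
    dim = len(points[0])

    def unflatten(pos):
        # A particle is a flat list of k * dim coordinates.
        return [pos[j * dim:(j + 1) * dim] for j in range(k)]

    swarm = []
    for _ in range(n_particles):
        # Seed each particle from k distinct data points.
        pos = [float(coord) for p in rng.sample(points, k) for coord in p]
        swarm.append({"pos": pos, "vel": [0.0] * (k * dim),
                      "best": pos[:], "best_f": sse(points, unflatten(pos))})

    leader = min(swarm, key=lambda s: s["best_f"])
    g_pos, g_f = leader["best"][:], leader["best_f"]
    history = [g_f]  # global-best objective after each iteration

    for _ in range(iters):
        for s in swarm:
            for d in range(k * dim):
                r1, r2 = rng.random(), rng.random()
                s["vel"][d] = (w * s["vel"][d]
                               + c1 * r1 * (s["best"][d] - s["pos"][d])
                               + c2 * r2 * (g_pos[d] - s["pos"][d]))
                s["pos"][d] += s["vel"][d]
            f = sse(points, unflatten(s["pos"]))
            if f < s["best_f"]:
                s["best"], s["best_f"] = s["pos"][:], f
            if f < g_f:
                g_pos, g_f = s["pos"][:], f
        history.append(g_f)
    return unflatten(g_pos), g_f, history


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, best, history = pso_centroids(points, 2)
print(len(centroids), best <= history[0])
```

The centroids returned by such a search would normally be handed to K-Means as its starting centers, so that the local refinement begins from a globally searched position instead of a random one.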