Abstract
Data clustering identifies the natural groupings in data. Many algorithms exist for this task, but their results often differ, and they can become trapped in poor local minima. In particular, sensitivity to initialization frequently leads clustering methods to low-quality local optima. This report studies the behavior of the standard K-Means clustering algorithm.
The paper also seeks to identify which aspects of K-Means-style clustering algorithms help find good clusterings in spatial data. Each algorithm is analyzed separately to reveal its behavior, and the algorithms are compared by how well each minimizes a common mathematical criterion on the same data; the algorithm that achieves the lowest value of this criterion is judged superior.
Introduction
Data clustering enables one to understand the natural groupings of data. Wrapper methods, such as random restarts, are often placed around clustering algorithms to identify good-quality clusters.
A limitation of such wrapper methods is that they attack the initialization problem from outside the clustering algorithm itself, which leaves the results inconsistent. A better approach is to improve the clustering algorithm directly so that these inconsistencies are avoided.
The K-Harmonic Means algorithm is superior to these alternatives because it produces good clustering solutions. It is also faster than other clustering methods, such as K-Means and Gaussian expectation-maximization.
This paper studies these clustering algorithms and identifies the best clustering method among the K-Means and K-Harmonic Means families.
Advantages
K-Means algorithms are superior to many other data mining methods. Although they do not guarantee accuracy, their speed and simplicity make them attractive compared with other data clustering algorithms. Their speed enables them to run on large datasets, they tend to generate tighter clusters, and they are memory efficient.
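The simplicity mentioned above is easiest to see in code. The following is a minimal sketch of standard K-Means (Lloyd's algorithm); the function name, the use of NumPy, and the random initialization strategy are illustrative assumptions, not details taken from the report.

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    """Plain K-Means: alternate assignment and update steps until centers stop moving."""
    rng = np.random.default_rng(seed)
    # Initialize centers by sampling k distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break  # converged, possibly only to a local minimum
        centers = new_centers
    return centers, labels
```

Note that the loop can stop at a local minimum of the objective, which is exactly the initialization sensitivity the report discusses.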
Disadvantages
First, K-Means algorithms generate clusters that are difficult to compare. Second, the number of clusters K must be fixed in advance, and it is hard to determine a good value for K. K-Means also handles non-globular clusters poorly. Moreover, different initial partitions can produce different final clusters. These limitations can be mitigated by running the algorithm several times and comparing the results for consistency.
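The mitigation just described, running the algorithm from several random initializations and keeping the best result, is a common wrapper around K-Means. A minimal sketch is below; the function names, the use of NumPy, and the choice of the within-cluster sum of squares as the comparison criterion are assumptions for illustration.

```python
import numpy as np

def kmeans_cost(X, centers):
    """Within-cluster sum of squared distances: the K-Means objective."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return (dists.min(axis=1) ** 2).sum()

def best_of_restarts(X, k, n_restarts=25, n_iter=50, seed=0):
    """Run K-Means from several random initializations; keep the lowest-cost run."""
    rng = np.random.default_rng(seed)
    best_centers, best_cost = None, np.inf
    for _ in range(n_restarts):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            labels = np.linalg.norm(
                X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
            centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        cost = kmeans_cost(X, centers)
        if cost < best_cost:
            best_centers, best_cost = centers, cost
    return best_centers, best_cost
```

This reduces, but does not eliminate, the sensitivity to initialization: every restart can still land in a poor local minimum.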
Analysis
In image segmentation, the K-Harmonic Means (KHM) method produced higher-quality segmentations than the K-Means (KM) method. KHM also produced consistent results across different initializations, unlike KM, which generated varying results (Hamerly and Elkan 600).
The unified model for this family of clustering algorithms revealed that the K-Means and Gaussian expectation-maximization methods give all data points equal importance, whereas K-Harmonic Means weights points dynamically. The K-Means method was also the easiest to understand and implement (Hamerly and Elkan 600).
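The difference between the two objectives can be shown directly: K-Means charges each point only for its nearest center, while K-Harmonic Means uses a harmonic average over all centers, so every center retains some influence on every point. The sketch below compares the two performance functions; the exponent p and the small epsilon guard against division by zero are illustrative assumptions.

```python
import numpy as np

def km_objective(X, centers):
    """K-Means objective: each point counts only its nearest center."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return (d.min(axis=1) ** 2).sum()

def khm_objective(X, centers, p=3.5, eps=1e-12):
    """K-Harmonic Means objective: harmonic average of the distances from
    each point to all k centers, giving every center soft influence."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    k = centers.shape[0]
    return (k / (1.0 / np.maximum(d, eps) ** p).sum(axis=1)).sum()
```

Because the harmonic average is dominated by the smallest distance yet never ignores the others, points far from every center contribute heavily, which is what drives KHM's dynamic weighting.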
The BIRCH dataset showed good clustering for the fuzzy K-Means, K-Harmonic Means, and Hybrid 2 algorithms across different initializations, whereas the K-Means and Hybrid 1 algorithms gave varying results. On the Pelleg and Moore data, the Gaussian expectation-maximization and K-Means methods performed poorly (Hamerly and Elkan 605).
Future Research
More research is needed to reveal the statistical interpretation of the performance function of the K-Means algorithms. It is also necessary to determine whether these algorithms can scale up to very large datasets. Finally, combining K-Harmonic Means with K-Means could take advantage of the strengths of both.
Conclusion
The K-Harmonic Means algorithm is therefore the best method for finding high-quality clusterings of low-dimensional data (Hamerly and Elkan 605). The performance of the Hybrid 2 algorithm showed that soft membership helps find good clusters. Thus, the best clustering methods comprise the fuzzy K-Means, Hybrid 2, and K-Harmonic Means algorithms.
Works Cited
Hamerly, Greg, and Charles Elkan. "Alternatives to the K-Means Algorithm That Find Better Clusterings." Proceedings of CIKM 2002, pp. 600-607. Print.