Abstract:
Clustering methods separate a set of data points into groups, or clusters, such that the data points in each cluster have similar properties and are dissimilar from those in other clusters. The k-means and k-medoids methods are commonly used for data clustering. These methods are heuristic and may get stuck in a local optimum. To avoid this problem, we propose a hybrid Genetic Algorithm (HGA) for data clustering. For this purpose, we propose a genetic encoding of the clustering problem in which the data points are separated into k clusters. The centers of the generated clusters are determined using techniques from both the k-means and k-medoids methods. The fitness of a clustering is calculated as the sum of the Euclidean distances of each data point from its cluster center. We experiment with the Iris, Seeds, and Ionosphere datasets. Experimental results show that the proposed HGA achieves clustering accuracies 2.67% to 28.68% higher than those previously reported in the literature.
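The fitness measure described above can be illustrated with a minimal sketch. It is not the thesis implementation: the label-vector encoding (one cluster index per data point), the function names, and the handling of empty clusters are assumptions made for illustration, and the abstract does not state how the k-means and k-medoids centers are combined, so the two are shown as interchangeable options.

```python
import math

def mean_center(points):
    """k-means style center: the coordinate-wise mean of the cluster."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))

def medoid_center(points):
    """k-medoids style center: the member with the smallest total
    Euclidean distance to all members of the cluster."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))

def fitness(data, labels, k, center_fn=mean_center):
    """Sum of Euclidean distances from each point to its cluster center.
    Lower values indicate a tighter clustering.
    `labels` is a hypothetical chromosome: labels[i] is the cluster of data[i]."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(data, labels) if lab == c]
        if not members:
            continue  # assumption: an empty cluster contributes nothing
        center = center_fn(members)
        total += sum(math.dist(p, center) for p in members)
    return total

# Toy data: two well-separated pairs of points.
data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)]
good = [0, 0, 1, 1]  # groups nearby points together
bad = [0, 1, 0, 1]   # mixes the two groups

print(fitness(data, good, k=2))
print(fitness(data, bad, k=2))
print(fitness(data, good, k=2, center_fn=medoid_center))
```

Under these assumptions, a genetic algorithm would favor the chromosome `good` over `bad` because its summed distance is smaller.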
Description:
This thesis was submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering of East West University, Dhaka, Bangladesh.