Abstract:
Many of the applications we build day to day are driven by datasets. With a high-dimensional dataset, building a data mining model becomes harder: when a dataset has too many attributes, some attributes may depend on one another and others may be irrelevant, and the influence of these dependent and irrelevant attributes lowers the accuracy of the resulting model. To address this problem, the dimensionality of the dataset must be reduced. In this work, we experimentally tested the influence of dimensionality reduction on classification problems. For this purpose, we used four different datasets. We applied the backward elimination method to reduce each dataset down to seven dimensions, and we experimented with the Multi-Layer Perceptron, Naïve Bayes, Decision Tree, K-Nearest Neighbor, and Support Vector Machine classification methods, using 10-fold cross-validation to train and test on each dataset. Experimental results show that when the dimensionality is reduced, accuracy improves for some classification algorithms, such as the Multi-Layer Perceptron, Naïve Bayes, and Random Forest. We conclude that excluding the less significant attributes yields a classification model with better accuracy than one trained without dimensionality reduction.
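The following is a minimal sketch of the pipeline the abstract describes, written with scikit-learn (an assumption; the thesis does not name its tooling) and a synthetic dataset standing in for the four real ones: backward elimination keeps seven attributes, and each classifier is then scored with 10-fold cross-validation before and after reduction.

```python
# Sketch only: scikit-learn and the synthetic data are assumptions,
# not the thesis's actual tooling or datasets.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic high-dimensional dataset standing in for the real ones.
X, y = make_classification(n_samples=500, n_features=30,
                           n_informative=7, random_state=0)

# Backward elimination down to seven dimensions.
selector = SequentialFeatureSelector(
    GaussianNB(), n_features_to_select=7, direction="backward")
X_reduced = selector.fit_transform(X, y)

classifiers = {
    "MLP": MLPClassifier(max_iter=1000, random_state=0),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
}

# Compare 10-fold cross-validation accuracy with and without reduction.
for name, clf in classifiers.items():
    full = cross_val_score(clf, X, y, cv=10).mean()
    reduced = cross_val_score(clf, X_reduced, y, cv=10).mean()
    print(f"{name}: full={full:.3f}, reduced={reduced:.3f}")
```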
Description:
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering at East West University, Dhaka, Bangladesh.