Sentiment Analysis on Twitter Data

Rahman, Anika; Ali, Ahamad

URI: http://dspace.ewubd.edu/handle/2525/2077

Date: 12/1/2016

Abstract:

Every day, millions of users share opinions on various topics through microblogging. Twitter is a very popular microblogging site where users are allowed a limit of 140 characters; this restriction forces users to be concise and expressive at the same time. For that reason, it has become a rich source for sentiment analysis and belief mining, and for the same reason we became interested in working with Twitter data. The aim of this project is to develop a functional classifier that can accurately and automatically classify the sentiment of an unknown tweet. In this thesis, we propose techniques to classify the sentiment label accurately. We introduce two techniques: one is a sentiment classification algorithm (SCA) and the other is a machine learning algorithm, the support vector machine (SVM). For both SCA and SVM, we calculate weights based on different features. In SCA, we build feature vectors and form pairs of tweets using different features. From those pairs, we measure the Euclidean distance between every tweet and its pairs. From those distances, we consider only the labels of the 8 nearest tweets to classify that tweet. In SVM, on the other hand, we build a matrix from the calculated weights based on different features and, by applying PCA (principal component analysis), we try to find the k eigenvectors with the largest eigenvalues. From this transformed sample dataset, we try to find the best C and best gamma for the SVM by using the grid search technique. Finally, we apply SVM to assign the sentiment label of each tweet in the test dataset. For both algorithms, we use a confusion matrix to calculate the accuracy. In our results, we found that SCA always performs better than SVM. We also evaluate the performance of these two techniques for different dataset sizes.
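
The nearest-neighbour step described for SCA can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the two-dimensional feature weights, and the toy labels below are all hypothetical, and the abstract does not say how ties among the 8 nearest labels are broken (here the label seen first among the nearest tweets wins). The sketch only shows the general idea of measuring Euclidean distance, voting over the 8 nearest labelled tweets, and computing accuracy from a confusion matrix.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_nearest(train_vectors, train_labels, query, k=8):
    """Label `query` by majority vote over its k nearest training vectors.

    Assumption: ties are resolved in favour of the label that appears
    first among the nearest neighbours (Counter keeps insertion order).
    """
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of correctly classified items (the diagonal of the matrix)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Hypothetical toy data: each tweet is reduced to two feature weights.
train_vectors = [
    (1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.7),
    (-1.0, -1.0), (-1.2, -0.9), (-0.8, -1.1), (-1.1, -1.3), (-0.9, -0.7),
]
train_labels = ["positive"] * 5 + ["negative"] * 5

test_vectors = [(0.9, 1.1), (-1.0, -0.8)]
test_labels = ["positive", "negative"]

predicted = [classify_nearest(train_vectors, train_labels, v) for v in test_vectors]
matrix = confusion_matrix(test_labels, predicted, ["positive", "negative"])
print(predicted, matrix, accuracy(matrix))
```

With real data, the feature vectors would come from the weights calculated for each tweet, and the same confusion-matrix accuracy could be used to compare this approach against an SVM trained on the PCA-transformed weights.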

Description:

This thesis was submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering of East West University, Dhaka, Bangladesh.
