EWU Institutional Repository

A Time Efficient Multi-Node Clustering Approach in Recommendation System

dc.contributor.author Hossain, Sabbir
dc.contributor.author Ahmed, Faisal
dc.contributor.author Miah, Asifuzzaman
dc.date.accessioned 2022-09-05T04:55:39Z
dc.date.available 2022-09-05T04:55:39Z
dc.date.issued 2019-12-24
dc.identifier.uri http://dspace.ewubd.edu:8080/handle/123456789/3701
dc.description This thesis was submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering of East West University, Dhaka, Bangladesh. en_US
dc.description.abstract In the era of information and communications technology, data is fueling the growth of organizations, and companies ingest raw data in massive volumes from countless sources. The question is how they can examine that data in a way that is insightful, useful, and meaningful. This is where Big Data comes into play. Apache Spark is an open-source framework used to process Big Data and is a leading platform for large-scale SQL, batch processing, stream processing, and machine learning. One major drawback, however, is that traditional algorithms take much longer to run and struggle to process large volumes of data. Apache Spark multi-node clustering addresses this problem: a cluster is a collection of independent machines connected through a dedicated network that work together as a single centralized data processing resource. Collaborative filtering is becoming increasingly popular, but traditional recommender systems often face challenges when handling huge data sets. To overcome some of these difficulties and restrictions, we implemented a distributed approach based on parallel computing so that we can deal with big datasets, using Apache Spark multi-node clustering. We used several clustering algorithms to find the similarity between users. Finally, we compare the overall performance of a single machine against multi-node clustering machines. Apache Spark performs strongly in terms of scalability. We consider improvements to scalability, system robustness, and evaluation parameters. We also implemented a clustering algorithm using PySpark, which is used to build an efficient parallel implementation of our recommendation system. en_US
dc.language.iso en_US en_US
dc.publisher East West University en_US
dc.relation.ispartofseries ;CSE00204
dc.subject Information and communications technology en_US
dc.title A Time Efficient Multi-Node Clustering Approach in Recommendation System en_US
dc.type Thesis en_US

