Hadoop Cluster Implementation

Sayed, Aysha Binta

dc.contributor.author	Sayed, Aysha Binta
dc.date.accessioned	2017-10-04T06:30:06Z
dc.date.available	2017-10-04T06:30:06Z
dc.date.issued	2017-04-22
dc.identifier.uri	http://dspace.ewubd.edu/handle/2525/2346
dc.description	This thesis was submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering at East West University, Dhaka, Bangladesh.	en_US
dc.description.abstract	Data-driven science is an interdisciplinary field that gathers, processes, manages, and analyzes unstructured data to extract its inherent meaning and present it as structured information. That information can then be used in many practical applications to solve real-life problems. Hadoop is an open-source data science tool that can process large data sets in a distributed manner across a cluster of computers (a single server and several worker machines). Hadoop allows several tasks to run in parallel and processes huge amounts of complex data efficiently with respect to time, performance, and cost. Learning Hadoop and its sub-modules is therefore important. This project covers the implementation of a Hadoop cluster with SSH public key authentication for processing large volumes of data, using cheap, easily available personal computer hardware (Intel/AMD-based PCs) and freely available open-source software (Ubuntu Linux, Apache Hadoop, etc.). In addition, MapReduce- and YARN-based distributed applications are ported to the cluster and used to test that it works.	en_US
dc.language.iso	en_US	en_US
dc.publisher	East West University	en_US
dc.relation.ispartofseries	;00128 CSE
dc.subject	Hadoop Cluster Implementation	en_US
dc.title	Hadoop Cluster Implementation	en_US
dc.type	Thesis	en_US