Efficient Big Data Analysis with Apache Spark in HDFS
Amol Bansod
Amol Bansod, Department of Electronics and Telecommunication, V.E.S Institute of Technology, Mumbai (Maharashtra), India.
Manuscript received on 15 August 2015 | Revised Manuscript received on 25 August 2015 | Manuscript Published on 30 August 2015 | PP: 313-316 | Volume-4 Issue-6, August 2015 | Retrieval Number: F4251084615/15©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: As data volumes grow each day, traditional data processing methods have become inefficient and time-consuming. Today, companies such as Facebook, Google, and Twitter generate petabytes of data each day. Data at this scale is termed ‘Big Data’. To overcome this inefficiency, the data can be processed with Apache Spark, a fast, in-memory engine for processing large amounts of data. In this paper, the author discusses an efficient way of analyzing Big Data stored in the Hadoop Distributed File System (HDFS) using the Apache Spark framework, and its advantages over the Hadoop MapReduce framework.
Keywords: Big Data, Hadoop MapReduce, Spark
Scope of the Article: Big Data Analysis