Big Data Performance Evaluation of Map-Reduce, Pig and Hive
Santosh Kumar J1, Raghavendra S2, Raghavendra B.K.3, Meenakshi4
1Santosh Kumar J, Associate Professor in the Department of Computer Science and Engineering at K.S. School of Engineering and Management, Bangalore.
2Dr. Raghavendra S., Associate Professor in the Department of Computer Science and Engineering at CHRIST DEEMED TO BE UNIVERSITY, Bangalore.
3Dr. Raghavendra B.K., Department of Research Institute, an institution deemed to be university located in Chennai; Masters from Bengaluru.
4Meenakshi, Assistant Professor at Jain University, Bengaluru, India.
Manuscript received on August 03, 2019. | Revised Manuscript received on August 28, 2019. | Manuscript published on August 30, 2019. | PP: 2982-2985 | Volume-8 Issue-6, August 2019. | Retrieval Number: F9002088619/2019©BEIESP | DOI: 10.35940/ijeat.F9002.088619
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Big data is structured and unstructured data that traditional systems cannot process; it is characterized not only by the volume of data but also by its velocity and variety. Processing here means storing and analyzing data to extract knowledge for decision making. Every living and non-living thing, and every device, generates a tremendous amount of data every fraction of a second. Hadoop is a software framework for processing big data, extracting knowledge from stored data in order to enhance business and solve societal problems. Hadoop has two main components: HDFS for storage and Map-Reduce for processing. HDFS consists of a name node and data nodes, and Map-Reduce consists of the Job Tracker and Task Tracker frameworks. When a client asks Hadoop to store data, the name node responds with the data nodes that have free space available, and the client writes the data to those data nodes. Hadoop then copies the data blocks to other data nodes according to the replication factor to provide fault tolerance, while the name node stores the metadata of the data nodes. Replication serves as a backup because HDFS uses commodity hardware for storage; the name node, which is the single point of failure in Hadoop, is in turn backed by a secondary name node. When a client wants to process data, it sends a request to the Job Tracker on the name node, which then communicates with the Task Trackers to carry out the tasks. All of these Hadoop components are frameworks that run on top of the operating system to use and manage system resources efficiently for big data processing. Big data processing performance is measured with benchmark programs. In our research work we compared the execution time of the word count benchmark program implemented as Hadoop Map-Reduce Python JAR code, a Pig script, and a Hive query, all using the same input file, big.txt. Hive was the fastest of the three: the Map-Reduce execution time was 1 min 29 s, the Pig execution time was 57 s, and the Hive execution time was 31 s.
Keywords: HDFS; Hadoop JAR; Pig; Hive; CloudxLab.
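The word count benchmark named in the abstract can be illustrated with a minimal sketch of the map, shuffle and reduce steps in plain Python. This is not the authors' code: the function names (map_words, shuffle, reduce_counts, word_count) and the tokenization rule are assumptions made for illustration, and the sketch runs in a single process rather than on a Hadoop cluster.

```python
from collections import defaultdict
import re


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    # Assumed tokenization: lowercase runs of letters, digits and apostrophes.
    for word in re.findall(r"[a-z0-9']+", line.lower()):
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group the emitted counts by word, as the framework would."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_counts(word, counts):
    """Reduce step: sum all counts emitted for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce in-process over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input; the study itself used the file big.txt.
    sample = ["big data big", "Data node"]
    for word, total in sorted(word_count(sample).items()):
        print(word, total)
```

On a cluster, the map and reduce functions would run as separate tasks on different nodes and the framework would perform the grouping between them; the sketch only shows the data flow that all three compared tools (Map-Reduce, Pig, Hive) ultimately express for this benchmark.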