Spark SQL Overview

What are Spark SQL and DataFrames? Spark SQL is a Spark module that lets you process and query structured data. Spark SQL uses a special type of interface called Dataset, which has all the features of an RDD and can also store extra information used for optimization. When operations are performed over structured data in Spark SQL […]

Spark RDD Overview and Hands-on

Introduction: RDD, or Resilient Distributed Dataset, is the fundamental data structure of Spark. It can be considered Spark’s main programming abstraction and resides in the Spark Core component. An RDD is a collection of items distributed across many cluster nodes that can be manipulated in parallel. Also note that Spark’s RDDs are by default recomputed each time […]

Apache Spark Overview

Apache Spark is an open-source cluster computing framework with a fast in-memory data processing engine. It is multiple times faster than MapReduce and provides libraries for development in R, Python, Scala and Java. It provides streaming, SQL, Machine Learning and graph processing capabilities. It can run on Hadoop, Mesos, standalone or in the cloud and […]

Data Access And Analysis in Hadoop

This is Part III of the Big Data Overview blogs for developers: 1. Part I: What is Big Data, What is Hadoop and the Hadoop Ecosystem, Managing a Hadoop Cluster. 2. Part II: Data Ingestion in Hadoop. 3. Part III: Data Access and Analysis in Hadoop. In Part I, I covered the basics of big […]

Data Ingestion in Hadoop

This is Part II of the Big Data Overview blogs: 1. Part I: What is Big Data, What is Hadoop and the Hadoop Ecosystem, Managing a Hadoop Cluster. 2. Part II: Data Ingestion in Hadoop. 3. Part III: Data Access and Analysis in Hadoop. Data ingestion can be understood well by first understanding the […]

Big Data Overview

This is Part I of the Big Data Overview blogs for developers: 1. Part I: What is Big Data, What is Hadoop and the Hadoop Ecosystem, Managing a Hadoop Cluster. 2. Part II: Data Ingestion in Hadoop. 3. Part III: Data Access and Analysis in Hadoop. What is big data? Technology usage has grown exponentially in the last […]