Spark SQL Overview

What are Spark SQL and DataFrames? Spark SQL is a Spark module that lets you process and query structured data. Spark SQL uses a special type of interface called Dataset, which has all the features of an RDD and can also store extra information used for optimizations. When operations are performed over structured data in […]

Spark RDD Overview and Hands-on

Introduction: RDD, or Resilient Distributed Dataset, is the fundamental data structure of Spark. It can be considered Spark’s main programming abstraction and resides in the Spark Core component. An RDD is a collection of items distributed across many cluster nodes that can be manipulated in parallel. Also note that Spark’s RDDs are by default recomputed […]

Apache Spark Overview

Apache Spark is an open-source cluster computing framework with a fast in-memory data processing engine. It can be several times faster than MapReduce and provides libraries for development in R, Python, Scala and Java. It provides streaming, SQL, machine learning and graph processing capabilities. It can run on Hadoop, Mesos, standalone or in the […]