Spark SQL Overview


What are Spark SQL and DataFrames? Spark SQL is a Spark module that lets you process and query structured data. It uses a special interface called a Dataset, which has all the features of an RDD and can also store extra information about the data's structure that Spark uses to optimize processing. When operations are performed over structured data in […]
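Here is a plain-Python toy that makes the "extra information" idea concrete. It is not Spark code, and `ToyTable`, `query`, and `columns_read` are hypothetical names. It assumes the extra information is a schema (column names and types) and shows one optimization a schema makes possible, column pruning: a query that names two columns reads only those two. Spark SQL's optimizer performs this kind of pruning among many other rewrites. The sketch only illustrates the principle.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical columnar table (not Spark's API). The schema is the
    'extra information' that an RDD of opaque records does not expose."""

    schema: dict                  # column name -> Python type
    data: dict                    # column name -> list of values
    columns_read: set = field(default_factory=set)

    def _read(self, name):
        # Record every column actually touched, so pruning is observable.
        self.columns_read.add(name)
        return self.data[name]

    def query(self, select, where_col, where_pred):
        # The schema lets us validate the query up front...
        for name in [*select, where_col]:
            if name not in self.schema:
                raise KeyError(f"unknown column: {name}")
        # ...and read only the columns it references (column pruning).
        keep = [i for i, v in enumerate(self._read(where_col)) if where_pred(v)]
        selected = [self._read(c) for c in select]
        return [tuple(col[i] for col in selected) for i in keep]


people = ToyTable(
    schema={"name": str, "age": int, "city": str, "bio": str},
    data={
        "name": ["Ana", "Bo", "Cy"],
        "age": [34, 19, 42],
        "city": ["Oslo", "Rome", "Lima"],
        "bio": ["likes chess", "plays bass", "runs marathons"],
    },
)
print(people.query(select=["name"], where_col="age", where_pred=lambda a: a > 30))
# [('Ana',), ('Cy',)]
print(sorted(people.columns_read))
# ['age', 'name']: "city" and "bio" were never read
```

An RDD of arbitrary objects gives Spark no such column map, so every record would have to be materialized in full before the filter could run.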

Spark RDD Overview and Hands-on


An RDD, or Resilient Distributed Dataset, is the fundamental data structure of Spark. It can be considered Spark’s main programming abstraction and resides in the Spark Core component. An RDD is a collection of items distributed across many cluster nodes that can be manipulated in parallel. Also note that Spark’s RDDs are by default recomputed […]
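A minimal plain-Python sketch of what "recomputed by default" means, under a simplified model: `ToyRDD` is a hypothetical class, not Spark's API. Transformations (`map`, `filter`) only record a lineage; each action (`collect`, `count`) replays that lineage over every partition, unless the RDD was marked with `cache()`, loosely mirroring Spark's own `cache()`. Partitions are processed sequentially here, whereas Spark runs them as parallel tasks across the cluster.

```python
class ToyRDD:
    """Hypothetical plain-Python stand-in for an RDD (not Spark's API).

    It keeps a list of partitions plus a lineage of per-partition
    transformations, and replays that lineage on every action unless cached.
    """

    def __init__(self, partitions, ops=None):
        self.partitions = partitions  # one inner list per simulated node
        self.ops = ops or []          # lineage: functions applied to each partition
        self.should_cache = False
        self.cached = None
        self.compute_count = 0        # how many times the lineage was replayed

    def map(self, f):
        # Transformations are lazy: they only extend the lineage.
        return ToyRDD(self.partitions, self.ops + [lambda part: [f(x) for x in part]])

    def filter(self, pred):
        return ToyRDD(self.partitions, self.ops + [lambda part: [x for x in part if pred(x)]])

    def cache(self):
        # Mark for caching; the result is stored the first time an action runs.
        self.should_cache = True
        return self

    def _compute(self):
        if self.cached is not None:
            return self.cached
        self.compute_count += 1
        # Each partition is processed independently (sequentially here;
        # Spark would run these as parallel tasks on executors).
        result = []
        for part in self.partitions:
            for op in self.ops:
                part = op(part)
            result.append(part)
        if self.should_cache:
            self.cached = result
        return result

    # Actions trigger computation.
    def collect(self):
        return [x for part in self._compute() for x in part]

    def count(self):
        return sum(len(part) for part in self._compute())


rdd = ToyRDD([[1, 2, 3], [4, 5, 6]]).map(lambda x: x * 10).filter(lambda x: x > 20)
print(rdd.collect())       # [30, 40, 50, 60]
print(rdd.count())         # 4
print(rdd.compute_count)   # 2: the lineage was replayed for each action

cached = ToyRDD([[1, 2, 3], [4, 5, 6]]).map(lambda x: x * 10).cache()
cached.collect()
cached.count()
print(cached.compute_count)  # 1: the second action reused the cached partitions
```

Keeping the lineage instead of the data is also what makes the structure "resilient": a lost partition can be rebuilt by replaying its transformations.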