Talk on Apache Spark Internals/RDDs
I gave a talk at ScalaBay on how the RDD abstraction in Apache Spark enables diverse, distributed, and big data use cases in a cohesive package. The talk starts with RDD usage in the Spark API and shows how the same RDD interface is leveraged in Spark SQL, GraphX, MLlib and Spark Streaming to create a unified ecosystem.
It was my first time giving this talk with the specifics thought up in real time so I apologize for the hesitation in my tone.
If you are interested in learning more about Spark internals check out talks by two Spark committers: Reynold Xin, and Aaron Davidson. Some of the content in my talk is lifted from Reynold’s.
Continue reading →