Introduction to Apache Spark

Public courses


- Anyone can join the training
- Course outline as presented on the website
- Small groups, 3-10 people

Private courses

Price set individually

- Training workshop just for your team
- You choose date and location of the training
- Course outline tailored to your needs

About the training

Apache Spark is a platform for processing data, especially big data, on a cluster of machines. In contrast to Hadoop MapReduce, Spark can keep intermediate data in memory rather than writing it to disk between processing steps, which makes many workloads more efficient and makes Spark a good fit for very large datasets.

The training presents the basics needed to start working with big data. You will learn the most important elements of the Spark project, including its API, basic tools such as Spark SQL, and how to use Spark for streaming and machine learning.

Who is this training for?

  • Data Scientist
  • Data Engineer
  • Java Programmer

What will I learn?

  • Identify Spark's potential and capabilities
  • Learn concepts and technologies related to Spark
  • Process large datasets with Spark SQL and DataFrames
  • Transform and modify ETL tasks using the Spark API, DataFrames and Resilient Distributed Datasets (RDDs)
  • Know where to find help and reference material for Spark

Course outline

  1. Spark Overview
    • What Spark is and its components
    • Why Spark?
    • Use and benefits of Spark
    • Spark vs Hadoop
  2. Spark Basics
    • Spark environment
    • Using the Spark shell
    • Resilient Distributed Datasets (RDDs)
    • Functional programming with Spark
  3. RDD Basics
    • RDD structure and creating RDDs from files
    • Data operations and transformations
    • Key-value RDDs
    • Interactive queries with RDDs
  4. DataFrames and Spark SQL
    • Creating DataFrames
    • Querying DataFrames with Spark SQL
    • Caching
    • Generating reports
  5. Spark Job Execution
    • Directed acyclic graph (DAG)
    • Partitions and shuffles
    • Efficiency and memory usage
  6. Streaming
    • Sources and jobs
    • Creating DStreams from sources, the API
    • Operations on DStreams
  7. Machine Learning Basics with Spark ML
    • Machine learning basics with Spark
    • Example of an ML solution with Spark MLlib
