Machine Learning with Apache Spark

Public courses


- Anyone can join the training
- Course outline as presented on the website
- Small groups, 3-10 people

Private courses

Price set individually

- Training workshop just for your team
- You choose date and location of the training
- Course outline tailored to your needs

About the training

Machine Learning with Apache Spark was created for experienced data scientists who want to extend their programming skills with Apache Spark and use it for machine learning on very large data sets. The training covers a short introduction to Spark, ETL, data preparation, initial analysis, building machine learning models, evaluation, and cross-validation.

Who is this training for?

The training is aimed at experienced data scientists who want to extend their programming skills with Apache Spark and are considering it for machine learning on very large data sets.

What will I learn?

  • Learn how to manipulate and preprocess data with DataFrames in Spark
  • Learn how to improve efficiency with caching
  • Know how to solve problems with Spark
  • Learn how to perform ETL and combine data from different sources
  • Learn how to handle common data problems such as outliers and missing observations
  • Learn how to perform data recoding and transformations before modelling: standardization, normalization, bucketing, one-hot encoding
  • Learn how to use the ML library to solve supervised and unsupervised machine learning problems
  • Know how to train models for forecasting, classification, segmentation, and anomaly detection problems
  • Train and tune linear and logistic regression, decision trees, k-NN, Naive Bayes, boosted decision trees, neural networks, k-means, CLARA
  • Evaluate estimated models and tune with cross-validation

Course outline

  1. Introduction to Spark
    • Big data technologies overview
    • Scala programming basics
    • Let’s start programming with Spark
    • Programming model with Spark
    • Running Spark programs
    • API and notebooks
    • Caching
  2. Introduction to Machine Learning
    • Machine learning problems
    • Supervised and unsupervised learning
    • ML step by step
    • Bias-variance trade-off
    • Algorithm evaluation
    • Cross-validation
    • Overview of measures of fit
  3. Machine Learning workflow
    • Examples of ML applications
    • Types of ML models
    • Components of an ML model
      • Data acquisition and storage
      • Data cleansing and transformations
      • Model training
      • Deployment and integration
      • Monitoring
  4. ETL in Spark
    • Connecting to a data source
    • Exploration and data visualisation
    • Manipulations and transformations
    • Selecting variables for analysis
  5. Training models for prediction problems
    • Types of regression models
    • Variable selection
    • Fitting a model on the training data
    • Model evaluation with MSE, RMSE
    • Model tuning with cross-validation
  6. Training models for classification problems
    • Overview of algorithms for classification problems
    • Training a classification model
    • Model evaluation
    • Model tuning
    • Evaluation measures: accuracy, ROC, AUC
  7. Algorithms for clustering problems
    • Overview of clustering methods
    • Feature engineering and variable selection
    • Training an unsupervised model
    • Evaluating the algorithm
    • Tuning
  8. Recommendation systems
    • Types of recommendation algorithms
    • Variable selection
    • Training a recommendation model
    • Model evaluation
  9. Algorithms for text mining and NLP problems
    • Natural language processing
    • Text mining
    • Data preparation: from raw text to a numeric matrix
    • Model training
    • Evaluation
  10. Dimensionality reduction with Spark
    • Dimensionality reduction methods: PCA, SVD, factorization, clustering
    • Applications of dimensionality reduction in Spark
  11. ML in real time with Spark Streaming
    • Learning in real time
    • Streaming
    • Building applications with Spark Streaming
    • Online training
    • Model evaluation
