Machine Learning and Data Analysis with Scala

Public courses


- Anyone can join the training
- Course outline as presented on the website
- Small groups, 3-10 people

Private courses

Price set individually

- Training workshop just for your team
- You choose date and location of the training
- Course outline tailored to your needs

About the training

Machine Learning and Data Analysis with Scala is the second course in the Scala for Data Scientists series. It covers the core topics needed to start working as a data scientist, including building and applying machine learning algorithms for segmentation, classification and prediction.

The training was designed by experienced data scientists. Participants will learn to build machine learning solutions with Scala and related technologies such as Akka and Apache Spark.

Who is this training for?

Data scientists and data engineers.

What will I learn?

  • Load, transform and preprocess data with the Breeze library and Apache Spark
  • Visualize data and results with Apache Zeppelin and Bokeh-Scala
  • Apply machine learning algorithms and deploy them on EC2 and YARN
  • Choose suitable algorithms for the common types of machine learning problems

Course outline

Part I – Introduction to data analysis

  1. Basic data operations with Breeze
    • Installing Breeze, a linear algebra library
    • Operations on vectors and matrices
    • Importing/Exporting flat files
  2. Data manipulation with Spark DataFrames
    • Creating DataFrames from CSV and text files
    • Basic DataFrame manipulations
    • Importing JSON as DataFrame
    • Importing data from RDBMS
  3. Scalable operations
    • Using a Spark cluster
    • Running a cluster on EC2
    • Running tasks on Mesos
    • Running tasks on YARN
  4. Data Visualization
    • Data Visualization with Zeppelin
    • Visualizing data with Bokeh-Scala
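The Part I topics on vector and matrix operations and on importing/exporting flat files can be illustrated with a small sketch. To stay runnable without extra dependencies it uses only the Scala standard library, with `Vector[Vector[Double]]` standing in for Breeze's `DenseMatrix`; the names `FlatFileOps`, `parseCsv`, `toCsv`, `columnMeans` and `multiply` are hypothetical and not part of Breeze or Spark. It assumes a comma-separated file with one header line, purely numeric columns and no quoted fields.

```scala
// Hypothetical helper, not a Breeze or Spark API: plain Scala collections
// stand in for Breeze DenseVector/DenseMatrix so the sketch has no dependencies.
object FlatFileOps {
  type Matrix = Vector[Vector[Double]]

  /** Parses CSV text (header line + numeric rows, no quoting) into (header, rows). */
  def parseCsv(text: String): (Vector[String], Matrix) = {
    val lines = text.trim.split("\\r?\\n").toVector
    val header = lines.head.split(",").map(_.trim).toVector
    val rows = lines.tail.map(_.split(",").map(_.trim.toDouble).toVector)
    (header, rows)
  }

  /** Writes a header and numeric rows back out as CSV text. */
  def toCsv(header: Vector[String], rows: Matrix): String =
    (header.mkString(",") +: rows.map(_.mkString(","))).mkString("\n")

  /** Arithmetic mean of each column. */
  def columnMeans(m: Matrix): Vector[Double] =
    m.transpose.map(col => col.sum / col.size)

  /** Matrix-vector product m * v; every row must have v's length. */
  def multiply(m: Matrix, v: Vector[Double]): Vector[Double] = {
    require(m.forall(_.size == v.size), "dimension mismatch")
    m.map(row => row.zip(v).map { case (a, b) => a * b }.sum)
  }
}
```

For example, parsing `"x,y\n1,2\n3,4"` yields the header `Vector("x", "y")` and column means `Vector(2.0, 3.0)`, and multiplying the parsed rows by `Vector(1.0, 1.0)` gives `Vector(3.0, 7.0)`. At scale, the same steps would be done with Breeze in memory or with Spark DataFrames on a cluster.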


Part II – Machine Learning

  1. Introduction to Machine Learning
    • Common problems solved with Machine Learning algorithms
    • Taxonomy of ML problems and an overview of algorithms
    • Role of Scala as a machine learning tool
    • Other tools and ML technologies
  2. Development and evaluation of ML algorithms
    • ML workflow
    • Modelling as a process
    • Model validation strategies
    • Overfitting
    • Bias-variance trade-off
  3. Data Preprocessing
    • Outliers
    • Missing observations
    • Standardization and normalization
    • Categorical data preparation
    • Binning and Bucketing
  4. Regression and Regularization
    • Linear regression
    • L1 and L2 Regularization
    • Optimization
    • Logistic regression
  5. Naive Bayes
    • Conditional probability and Bayes theorem
    • Naive Bayes algorithm
    • Text classification with Naive Bayes
  6. Support Vector Machines
    • Kernels
    • Soft-margin classifier
    • SVM
    • Support vector classifier
    • Support vector regressor
    • Anomaly detection with SVM
  7. Neural networks
    • Feed-forward neural networks
    • Multilayer Perceptron
    • Activation and loss functions
    • Network optimization
    • Network architecture
    • Evaluation
  8. Genetic Algorithms
    • Evolution
    • Components of genetic algorithms
    • Implementation
    • Application
  9. Reinforcement learning
    • Introduction
    • Q-Learning
    • Implementation
    • Learning classifier systems
  10. Unsupervised learning algorithms
    • Clustering
    • Dimensionality reduction
    • Expectation maximization
  11. Scalable platforms
    • Scala
    • Akka
    • Apache Spark
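Several Part II topics (standardization, linear regression, L2 regularization and optimization) fit together in one short example. The sketch below, again using only the Scala standard library, standardizes a feature with z-scores and fits a ridge (L2-regularized) linear regression by batch gradient descent. The object name `RidgeRegression`, the default learning rate of 0.1 and the 2000 epochs are illustrative assumptions, not values from the course; L1 (lasso) regularization is not shown.

```scala
// Hypothetical illustration, not the course's implementation.
object RidgeRegression {
  /** z-score standardization using the population standard deviation.
    * Returns (standardized values, mean, std). */
  def standardize(xs: Vector[Double]): (Vector[Double], Double, Double) = {
    val mean = xs.sum / xs.size
    val std = math.sqrt(xs.map(x => (x - mean) * (x - mean)).sum / xs.size)
    require(std > 0, "constant feature cannot be standardized")
    (xs.map(x => (x - mean) / std), mean, std)
  }

  /** Fits y ≈ w·x + b by batch gradient descent on the ridge objective
    * (1/n) Σ (w·x_i + b - y_i)² + lambda * ‖w‖²; the bias b is not penalized. */
  def fit(x: Vector[Vector[Double]], y: Vector[Double], lambda: Double,
          lr: Double = 0.1, epochs: Int = 2000): (Vector[Double], Double) = {
    val n = x.size
    val d = x.head.size
    var w = Vector.fill(d)(0.0)
    var b = 0.0
    for (_ <- 1 to epochs) {
      // Residuals r_i = prediction - target for the current parameters.
      val residuals = x.zip(y).map { case (xi, yi) =>
        xi.zip(w).map { case (a, c) => a * c }.sum + b - yi
      }
      // dLoss/dw_j = (2/n) Σ r_i x_ij + 2 * lambda * w_j
      val gradW = (0 until d).toVector.map { j =>
        2.0 / n * x.zip(residuals).map { case (xi, r) => r * xi(j) }.sum +
          2.0 * lambda * w(j)
      }
      // dLoss/db = (2/n) Σ r_i
      val gradB = 2.0 / n * residuals.sum
      w = w.zip(gradW).map { case (wj, g) => wj - lr * g }
      b -= lr * gradB
    }
    (w, b)
  }
}
```

On the points x = 0, 1, 2, 3 with y = 2x + 1, `lambda = 0.0` recovers w ≈ 2 and b ≈ 1, while `lambda = 1.0` shrinks the slope to about 1.11 (the exact minimizer of this objective is w = 10/9, b = 7/3). The learning rate has to stay below the stability limit set by the data's scale, which is one practical reason to standardize features before gradient-based optimization.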
