Introduction to Data Science

Public courses


- Anyone can join the training
- Course outline as presented on the website
- Small groups, 3-10 people

Private courses

Price set individually

- Training workshop just for your team
- You choose the date and location of the training
- Course outline tailored to your needs

About the training

This training is a thorough introduction to statistical methods and provides a solid foundation for further development in Data Science. The ability to apply elements of probability theory and statistical methods is essential for every Data Scientist. The training is conducted by highly experienced Data Science experts.

The training is divided into two parts. The first revises the most important topics in statistics, which are crucial for anyone interested in data analysis. The second presents the most frequently used methods of statistical learning from data. Each topic is followed by a number of exercises that make the learning process easier and quicker. The training is conducted in R and Python, the most popular data science languages.

Who is this training for?

This training is an introduction to Data Science. It covers all the elements of statistics that are crucial in data analysis. If you want to refresh your knowledge of statistics and gain practical skills in statistical data analysis, this training is what you need.

What will I learn?

After completing the training, you will:

  • Understand the basics of probability theory and statistics
  • Use basic statistical measures to describe data
  • Construct confidence intervals for estimated parameters
  • Verify statistical hypotheses and understand the idea of testing
  • Know how to use charts properly in data analysis
  • Understand the concept of the Bayesian approach in statistics
  • Know the methods of statistical learning from data
  • Know how to clean data properly for analysis
  • Know how to use linear regression and its non-linear extensions
  • Know how to use statistical classification methods
  • Know how to use clustering methods and association rules

Course outline

Part I: Statistics refresher

  1. Intro to statistical data analysis
    • Statistical approach to data analysis and learning
    • Methods for statistical learning
    • Tools
  2. Probability and distributions
    • Random variable
    • Probability distribution
    • Conditional probability
    • Population vs Sample
  3. Descriptive statistics
    • Types of data
    • Properties of distribution
    • Descriptive statistics
    • Data Transformations
    • Outliers
  4. Statistical Inference
    • Sampling distribution
    • Estimation
    • Confidence intervals
    • Hypothesis testing
    • Inferential statistics for qualitative data
    • Nonparametric tests
  5. Data Visualization
    • Boxplots
    • Histograms, density plots
    • Scatterplots
    • Heatmaps
  6. Introduction to Bayesian approach
    • Bayesian approach to probability
    • Conditional probability and Bayes' theorem
    • Computations
    • MCMC

Part II: Statistical data analysis

  1. Statistical learning
    • Applications and problems for statistical learning
    • Parametric vs nonparametric methods
    • Accuracy vs interpretability
    • Supervised vs unsupervised learning
    • Regression vs classification
    • Measuring model accuracy
    • Bias-variance trade-off
  2. Linear regression
    • Correlation coefficient
    • Simple and multiple linear regression
    • Estimation methods and model validation
    • Information criteria: AIC, BIC
    • Variable selection methods
    • Regularization: L1, L2
    • Dimensionality reduction methods
  3. Modelling nonlinear relationships between variables
    • Generalized linear models
    • Multinomial regression
    • Splines
    • Generalized additive models
    • Logistic regression
  4. Classification
    • Measures of accuracy
    • Discriminant analysis
    • Bayesian classifiers
    • Naïve Bayes
    • K-Nearest Neighbours
    • Decision trees
  5. Dimensionality reduction
    • Principal component analysis
    • Factor analysis
    • Singular value decomposition
  6. Segmentation and Associations
    • Clustering algorithms
    • Association rules
    • Measures of rule quality
    • Apriori algorithm

