
Python for Data Analysis
Public courses
£1600
Private courses
Price set individually
About the training
Well-prepared data is the key to any data science process. How the data is processed determines both the quality of the results and the time the process takes, so working with data efficiently plays a major role in a project’s success. This training teaches you how to load and prepare data using Python, one of the leading programming languages for data science.
During the training you will learn the most useful Python libraries and functions, which turn data preparation into an easy and pleasant process. The training method relies on practical examples to explain even the most complex concepts. You will consolidate the acquired knowledge and skills through a series of exercises that provide solutions to the most common problems.
Who is this training for?
The training is aimed at people who process and analyze data in their work.
What will I learn?
- Efficiently manipulate data with Python
- Use the IPython interactive environment to test Python code
- Use the NumPy library for scientific computing
- Manipulate data using DataFrame objects with the pandas library
- Access data stored in different formats
- Build pipelines for data cleaning, joining and transformation
- Apply different approaches to missing data, outliers and discretization
- Use functions to operate on groups and aggregate data
- Efficiently analyze different types of data with Python
- Visualize data with the matplotlib and seaborn libraries
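Several of the skills listed above (reading data, handling missing values, discretization and group aggregation) can be sketched in a few lines of pandas. This is only an illustrative example, not course material: the sample data, the column names and the `clean_and_summarize` helper are hypothetical, and median imputation is just one of several possible choices.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
RAW_CSV = """city,temperature
Leeds,12.0
Leeds,
York,15.0
York,17.0
"""


def clean_and_summarize(csv_text):
    """Load CSV text, impute and discretize temperatures, aggregate per city."""
    df = pd.read_csv(io.StringIO(csv_text))
    # Impute missing values with the column median (one option among many).
    df["temperature"] = df["temperature"].fillna(df["temperature"].median())
    # Discretize temperatures into two labelled bins: (-inf, 15] and (15, inf).
    df["band"] = pd.cut(
        df["temperature"],
        bins=[-np.inf, 15.0, np.inf],
        labels=["cool", "warm"],
    )
    # Group by city and compute the mean temperature of each group.
    summary = df.groupby("city", as_index=False)["temperature"].mean()
    return df, summary


cleaned, summary = clean_and_summarize(RAW_CSV)
print(summary)
```

Keeping the steps in one function makes the same cleaning logic easy to reapply to new files.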
Course outline
- Introduction to Python for Data Analysis
- Python as a Data Science language
- Data analysis process – extraction, processing, exploration, visualization, modelling, validation, implementation
- Data types and data-related problems
- Python libraries for data analysis – NumPy, pandas, matplotlib, IPython, SciPy
- Python distributions – Anaconda, Enthought Canopy, Python(x,y)
- Python IDEs – Shell, Spyder, Eclipse (PyDev), Sublime
- References
- IPython – interactive environment for Python
- IPython Shell and Jupyter Notebook
- Getting help, documentation and exploring Python modules
- Exploring the IPython environment – keyboard shortcuts, %run, %paste, %timeit, %magic
- Using history
- HTML Notebook
- Code debugging and exception handling with %xmode
- Best coding practices
- NumPy basics – arrays and vectorized computing
- Important features of NumPy library
- Python basic data types
- Multidimensional arrays (ndarray) – creating, basic operations, indexing, manipulating
- Scientific computing with NumPy – avoiding loops with ufuncs, broadcasting, aggregating
- Array operations – indexing, sorting, iterating, joining arrays
- Structured arrays
- Data import/export – binary and text files
- Linear algebra and random numbers
- pandas – library for data manipulation
- Basic data types – Series, DataFrame, Index
- Data operations – indexing, selecting, filtering, function mapping, sorting, ranking, NaN, reindexing
- Exploratory analysis – descriptive statistics, correlations, covariance, unique values
- Missing values – filtering, imputation, working with NULL values
- Hierarchical indexes – MultiIndex, levels, creating, indexing, operations on groups
- Manipulating large datasets – eval() and query()
- Importing data
- I/O tools
- Importing data from flat files – txt, csv
- Importing data from JSON, HTML, XML, HDF5
- Importing data in xls format
- Accessing data from SQL databases
- Data processing
- Merge, join – combining data from different sources
- Reshaping data – hierarchical index
- Pivot tables – long to wide format
- Removing duplicates, mapping values, recoding
- Discretization and binning, detecting and removing outliers
- Sampling, recoding, one-hot encoding
- Regular expressions – string types, vectorization
- Data aggregating
- GroupBy – iterating on groups, columns selection
- Grouping by dict and Series
- Split-apply-combine – transformations on groups
- Pivot tables
- Crosstab
- Time series
- Date and time objects – datetime, date, time, Timestamp
- DatetimeIndex – creating time series with different frequencies
- Generating dates, frequency, lags, leads
- Time zones – localization and conversion
- Time series calculation and frequency conversion
- Data visualization with matplotlib and seaborn
- Creating plots with matplotlib, style, saving plots
- Line plot
- Scatter plot
- Density plot, histogram and contour plot
- Legend, colours, annotations, text
- Subplots
- Visualizing data with seaborn
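Two topics from the outline, broadcasting in NumPy and frequency conversion of time series in pandas, can be illustrated with a minimal sketch. The arrays, dates and variable names below are made up for illustration and are not taken from the course materials.

```python
import numpy as np
import pandas as pd

# Vectorized computing: center each column without an explicit Python loop.
data = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
col_means = data.mean(axis=0)   # aggregate along rows, shape (2,)
centered = data - col_means     # broadcasting: (3, 2) minus (2,)

# Time series: convert daily values to a two-day frequency.
# The start date is arbitrary and purely illustrative.
index = pd.date_range("2024-01-01", periods=4, freq="D")
daily = pd.Series([1.0, 2.0, 3.0, 4.0], index=index)
two_day_sum = daily.resample("2D").sum()

print(centered)
print(two_day_sum)
```

In both cases the library handles the iteration internally, which is usually faster and shorter than an equivalent hand-written loop.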