Data Analytics Training

Our Data Analytics courses can be taught at your facility, or you can attend one of our public seminars. If you do not have an established training office or facility, ask about our turnkey training solution, which provides a local venue, course registration, course evaluation, and more.


In addition to our popular course Introduction to Data Science for Six Sigma Professionals, we also offer a seven-class R training sequence. Please use the links below to explore these training offerings.

Introduction to R
Data Munging – Getting and Preparing Data
Exploratory Data Analysis – Graphing and Summarizing
Basic Statistical Analysis in R
Regression Modeling in R
Machine Learning in R
Creating Data Products

Introduction to R

The programming languages R and Python dominate the world of Data Science. This class is aimed at non-programmers who have a statistics background. You will learn how to install and configure R, read data into R, use R packages, and write, debug, and profile R code. Statistical topics serve as the working examples as you get to know R. This class can be taught in a two-day format for classes without a programming background, or one day for classes with a programming background.

Required Software: R and RStudio Open Source Edition (free version). We recommend the Microsoft R Open enhanced distribution, which is multithreaded and uses the Intel Math Kernel Library (MKL) on Windows and Linux and the Accelerate framework on macOS.

Course Outline (1 or 2 Days)

Introduction to R
Control Structures (If, For, While)
R Functions
Scoping Rules
Dates and Times
Loop Functions
Debugging
Profiling
Programming Exercise

Data Munging – Getting and Preparing Data

While most people associate Data Science with model building, much of the time is actually spent getting the data and preparing it for analysis. This class covers the basics of getting data from the internet, from various file formats, and from databases. It also covers the basics of cleaning up the data in preparation for statistical analysis. This class can be taught in a two-day format for classes without a programming background, or one day for classes with a programming background.

Course Outline (1 or 2 Days)

Reading Data From Files (CSV, Tab, Excel, XML, JSON)
Obtaining Data from Databases (MySQL, SQL Server, AWS, Azure)
Organizing Data using dplyr
Date Manipulation
Exercise

Exploratory Data Analysis – Graphing and Summarizing

Exploratory data analysis falls between data munging and model building. In this class we cover the different plotting systems in R along with summarization techniques. This class can be taught in a two-day format for classes without a programming background, or one day for classes with a programming background.

Course Outline (1 or 2 Days)

Plotting Systems in R
Base Plotting System
Graphic Devices
Lattice Plotting System
ggplot2
Summarizing Data
Hierarchical Clustering
K-Means Clustering

Basic Statistical Analysis in R

This class focuses on inferential statistics in R. Once the data has been munged, it is ready for basic statistical analysis such as hypothesis testing. If the class has a background in both statistics and programming, this class can be taught in one day. Allow an additional half day for those without a programming background and another half day for those without a statistics background.

Course Outline (1 or 2 Days)

Distributions in R
Confidence Intervals
Hypothesis Testing
Power and Sample Size
Introduction to Bootstrapping

Regression Modeling in R

Regression models are typically the first step into what statistics calls “models” and data science calls “classifiers”. Despite the media attention given to more complex methods such as Deep Learning, regression models are more parsimonious (think Occam’s razor for models) and often provide excellent predictive capability. In our R sequence, this course marks the shift from a programming focus to a statistics/analytics focus. Allow an additional day for those without a statistics background.

Course Outline (1 or 2 Days)

Univariate Least Squares Regression
Coding in R
Residual Analysis
Prediction
Multivariate Regression
Multivariate Residuals and Diagnostics
Logistic Regression
Introduction to Poisson Regression

Machine Learning in R

Machine learning uses statistical techniques to give computer software the ability to improve its performance on a task (or “learn”) from data. In this course, we will cover the basics of machine learning, including training and test datasets, overfitting, underfitting, and error rates. The models (classifiers) used include regression, classification trees, and random forests.

Course Outline (1 Day)

Prediction, Cross Validation, and ROC Curves
Using R’s caret Package
Predicting with Trees
Introduction to Random Forest

Creating Data Products

Data products bridge the gap between the person who created the analysis and the person who needs to consume the information. The ability to present your findings in a way that is easily understood by the receiver is key to being a good data scientist. This course adds Shiny, googleVis, Plotly, R Markdown, and Leaflet to your toolbox for presenting your results.

A good example of a data product in Shiny is this Hypergeometric Sample Size Calculator.

Course Outline (1 Day)

Introduction to Data Products
Shiny
googleVis
Plotly
R Markdown
Leaflet