In addition to our popular course Introduction to Data Science for Six Sigma Professionals, we also offer a seven-class R training sequence. Please use the links below to explore these training offerings.
Introduction to R
The programming languages R and Python dominate in the world of Data Science. This class is targeted to the non-programmer who has a statistics background. You will learn how to install and configure R, read data into R, use R packages, write and debug R code, and profile R code. Statistical topics serve as the working examples as you get to know R. This class can be taught in a two-day format for classes without a programming background, or one day for classes with a programming background.
Required Software: R and RStudio Open Source Edition (free version). We recommend the Microsoft R Open enhanced distribution, which is multithreaded and uses the Intel MKL on Windows and Linux and the Accelerate framework on macOS.
Data Munging - Getting and Preparing Data
While most people associate model building with Data Science, much of the time is spent getting the data and preparing it for analysis. This class covers the basics of getting data from the internet, various file formats, and databases. It will also cover the basics of how to clean up the data in preparation for statistical analysis. This class can be taught in a two-day format for classes without a programming background, or one day for classes with a programming background.
Exploratory Data Analysis - Graphing and Summarizing
Exploratory data analysis falls between data munging and model building. In this class we cover the different plotting systems in R along with summarization techniques. This class can be taught in a two-day format for classes without a programming background, or one day for classes with a programming background.
Basic Statistical Analysis in R
This class focuses on inferential statistics in R. Once the data has been munged, it is ready for basic statistical analysis such as hypothesis testing. If the class has a background in both statistics and programming, this class can be taught in one day. Allow an additional half day for those without a programming background and another half day for those without a statistics background.
Distributions in R
Confidence Intervals
Hypothesis Testing
Power and Sample Size
Introduction to Bootstrapping
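To give a flavor of the bootstrapping topic, here is a minimal sketch of a percentile bootstrap confidence interval for a mean. It is written in Python with only the standard library so it runs anywhere; the class itself works in R. The function name `bootstrap_mean_ci`, the sample values, and the defaults (2000 resamples, 95% confidence) are illustrative assumptions, not course material.

```python
import random
import statistics


def bootstrap_mean_ci(data, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(data)
    # Resample with replacement, record each resample's mean, then sort.
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    alpha = 1.0 - confidence
    lower_index = round((alpha / 2) * n_resamples)
    upper_index = round((1 - alpha / 2) * n_resamples) - 1
    return means[lower_index], means[upper_index]


# Made-up measurements, used only to exercise the function.
sample = [4.1, 5.3, 4.8, 6.0, 5.5, 4.9, 5.1, 5.7, 4.6, 5.2]
low, high = bootstrap_mean_ci(sample)
print(f"95% bootstrap CI for the mean: ({low:.2f}, {high:.2f})")
```

The percentile method is only one of several bootstrap interval constructions; it is used here because it is the shortest to write down.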
Regression Modelling in R
Regression models are typically the first step in what statistics calls “models” and data science calls “classifiers”. Despite the media attention given to more complex methods such as Deep Learning, regression models are more parsimonious (think Occam’s razor for models) and often provide excellent predictive capability. In our R sequence, this course marks the shift from a programming focus to a statistics/analytics focus. Allow an additional day for those without a statistics background.
Machine Learning in R
Machine learning is a statistical technique that gives computer software the ability to improve its performance on a task (or “learn”) from data. In this course, we will cover the basics of machine learning, including training and test datasets, overfitting, underfitting, and error rates. The models (classifiers) used include regression, classification trees, and random forests.
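The ideas of training and test datasets, overfitting, and error rates can be sketched in a few lines. The example below is written in Python with only the standard library; the class itself works in R. It is an illustration under stated assumptions: the data is synthetic, the 70/30 split is arbitrary, and the two models (a depth-one classification tree, or "stump", and a one-nearest-neighbor classifier that simply memorizes the training set) are stand-ins chosen for brevity rather than the models taught in the course.

```python
import random


def error_rate(predict, rows):
    """Fraction of (x, label) rows that `predict` gets wrong."""
    wrong = sum(1 for x, y in rows if predict(x) != y)
    return wrong / len(rows)


def fit_stump(rows):
    """Depth-one classification tree: predict 1 when x > threshold."""
    # Try every training x as a threshold; keep the one with least training error.
    best = min(
        (x for x, _ in rows),
        key=lambda t: error_rate(lambda x: int(x > t), rows),
    )
    return lambda x: int(x > best)


def fit_nearest_neighbor(rows):
    """1-nearest-neighbor: return the label of the closest training point."""
    return lambda x: min(rows, key=lambda row: abs(row[0] - x))[1]


# Synthetic data (an assumption): the true rule is x > 0.5, with 20% label noise.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.random()
    label = int(x > 0.5)
    if rng.random() < 0.2:
        label = 1 - label
    data.append((x, label))

# Hold out 30% of the rows as a test set the models never see during fitting.
rng.shuffle(data)
train, test = data[:140], data[140:]

stump = fit_stump(train)
memorizer = fit_nearest_neighbor(train)

for name, model in [("stump", stump), ("1-NN", memorizer)]:
    print(name,
          "train error:", round(error_rate(model, train), 3),
          "test error:", round(error_rate(model, test), 3))
```

The nearest-neighbor model reproduces every training label, noise included, so its training error is zero by construction. Comparing that figure with its test error is the usual way to spot overfitting.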
Creating Data Products
Data products span the gap between the person who created the analysis and the person who needs to consume the information. The ability to present your findings in a way that is easily understood by the receiver is key to being a good data scientist. This course adds Shiny, googleVis, Plotly, R Markdown, and Leaflet to your toolbox for presenting your results.
A good example of a data product in Shiny is this Hypergeometric Sample Size Calculator.