R – Jooho Lee

Practical Statistics for Data Scientists

Posted on March 5, 2021 (updated March 9, 2021) by Jooho L

The content of this book is an extremely useful resource for learning and understanding statistical concepts and techniques. It is great to see how each concept is implemented in both Python and R, but for many examples the book provides only code snippets. Fortunately, the publisher provides the complete code for each chapter. The widely used Python packages (pandas, numpy, scipy, statsmodels, sklearn, matplotlib, seaborn, and more) and R libraries are easy to locate in each chapter and in the index.

R libraries used:

library(boot) #Bootstrap Functions
library(ca) #Simple, Multiple and Joint Correspondence Analysis
library(cluster) #"Finding Groups in Data": Cluster Analysis
library(corrplot) #Visualization of a Correlation Matrix
library(dplyr) #A Grammar of Data Manipulation
library(ellipse) #Functions for Drawing Ellipses and Ellipse-Like Confidence Regions
library(FNN) #Fast Nearest Neighbor Search Algorithms and Applications
library(ggplot2) #Create Elegant Data Visualisations Using the Grammar of Graphics
library(gmodels) #Various R Programming Tools for Model Fitting
library(klaR) #Classification and Visualization
library(lmPerm) #Permutation Tests for Linear Models
library(lubridate) #Make Dealing with Dates a Little Easier
library(MASS) #Support Functions and Datasets for Venables and Ripley’s MASS
library(matrixStats) #Functions that Apply to Rows and Columns of Matrices (and to Vectors)
library(mclust) #Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation
library(mgcv) #Mixed GAM Computation Vehicle with Automatic Smoothness Estimation
library(pwr) #Basic Functions for Power Analysis
library(randomForest) #Breiman and Cutler’s Random Forests for Classification and Regression
library(rpart) #Recursive Partitioning and Regression Trees
library(tidyr) #Tidy Messy Data
library(vioplot) #Violin Plot
library(xgboost) #Extreme Gradient Boosting
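
Loading all of these at once fails on the first package that is not installed, so a small helper can make the setup easier to repeat. The following is only a sketch and is not from the book or the publisher's code: the names `book_pkgs` and `load_book_packages` are made up for illustration, and it assumes every package in the list can be installed from your configured repository (usually CRAN).

```r
# Hypothetical helper (not from the book): check for, optionally install,
# and load the R packages listed above.
book_pkgs <- c(
  "boot", "ca", "cluster", "corrplot", "dplyr", "ellipse", "FNN",
  "ggplot2", "gmodels", "klaR", "lmPerm", "lubridate", "MASS",
  "matrixStats", "mclust", "mgcv", "pwr", "randomForest", "rpart",
  "tidyr", "vioplot", "xgboost"
)

load_book_packages <- function(install_missing = FALSE) {
  # Packages from the list that are not in the local library yet
  missing_pkgs <- setdiff(book_pkgs, rownames(installed.packages()))

  if (length(missing_pkgs) > 0) {
    if (!install_missing) {
      stop("Missing packages: ", paste(missing_pkgs, collapse = ", "))
    }
    # Assumption: all of these are available from the configured repository
    install.packages(missing_pkgs)
  }

  # Attach every package; character.only = TRUE lets library() take strings
  invisible(lapply(book_pkgs, library, character.only = TRUE))
}
```

Calling `load_book_packages()` stops with a list of whatever is missing, while `load_book_packages(install_missing = TRUE)` tries to install those packages first and then loads everything.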