Must-Have R Packages (libraries) for Data Analytics

TL;DR: If you’re a data scientist/analyst, these are the R packages you must have: tidyverse (ggplot2, dplyr, readr, etc.), sqldf, tseries, zoo, forecast, randomForest, tree, gam, e1071, xml2, ggmap, caret, pls, plotly.



I love R and truly enjoy using RStudio, an integrated development environment (IDE) for R.

Why do I love R so much?

I love it mostly because it’s open-source, which means that there is a powerful community of developers improving existing features and releasing new ones constantly. It also means that for almost everything you might need as a data scientist/analyst, there is a package (aka library) created by the members of this awesome community.

Presumably, if you’re a data scientist or becoming one, you already know all this. But that doesn’t mean you know every package at your disposal.

So here is a list of my favorite R packages along with a few words on what they do. Please leave a comment to share your favorite packages. I’m always happy to learn and try new R libraries.

Must-Have R Packages for Data Analytics

  • tidyverse: a collection of R packages designed for data science, including:
    • ggplot2: for data visualization
    • dplyr: to manipulate (e.g. select, filter, subset) data frames
    • readr: to import rectangular data such as CSV or TSV files
    • stringr: to manipulate character strings
    • You can run install.packages("tidyverse") to install all these packages along with a few others that aren’t listed here.
  • sqldf: to run SQL statements on R data frames – YES!
  • tseries: to handle time series (ts) objects, also very handy for financial data analysis
  • zoo: to handle irregular (as well as regular) time series objects
  • forecast: for time series forecasting using, for instance, ARIMA
  • randomForest: classification and regression with random forest
  • tree: to fit classification and regression trees
  • gam: to build generalized additive models including non-linear smooth functions to explain/fit the data better
  • e1071: contains various statistical and machine learning methods, including support vector machines and naive Bayes classifiers
  • xml2: to parse and process XML files
  • ggmap: to plot beautiful maps using the ggplot2 framework
  • caret: to train regression and classification models for better predictions (incl. feature selection)
  • pls: to perform multivariate regression methods including principal component regression
  • plotly: to make interactive graphs
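
To give a feel for how the tidyverse pieces fit together, here is a minimal sketch that summarises a data frame with dplyr and plots the result with ggplot2. It assumes the tidyverse is already installed and uses the built-in mtcars dataset; the `hp > 100` cut-off and the `mpg_by_cyl` name are arbitrary choices for illustration only.

```r
library(dplyr)
library(ggplot2)

# mtcars ships with R (datasets package), so no import step is needed here.
# Keep cars with more than 100 horsepower, then average mpg per cylinder count.
mpg_by_cyl <- mtcars %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n())

# Bar chart of the summary: one bar per cylinder count.
p <- ggplot(mpg_by_cyl, aes(x = factor(cyl), y = mean_mpg)) +
  geom_col() +
  labs(x = "Cylinders", y = "Mean miles per gallon")

print(p)
```

If you want an interactive version of the same chart, plotly::ggplotly(p) should convert the ggplot2 object, assuming plotly is installed.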

Some packages may have overlapping functions, but I think each of them has a unique value to offer.
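
As one example of that overlap, the sketch below computes the same grouped average twice, once with sqldf and once with dplyr. It assumes both packages are installed and again uses the built-in mtcars dataset; the variable names `sql_result` and `dplyr_result` are made up for this illustration.

```r
library(sqldf)
library(dplyr)

# SQL version: sqldf looks up the data frame by the table name used in the query.
sql_result <- sqldf("
  SELECT cyl, AVG(mpg) AS mean_mpg
  FROM mtcars
  GROUP BY cyl
  ORDER BY cyl
")

# dplyr version of the same summary.
dplyr_result <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg)) %>%
  arrange(cyl)

print(sql_result)
print(dplyr_result)
```

Which one to reach for is mostly a matter of taste: sqldf is handy if you already think in SQL, while dplyr chains nicely with the rest of the tidyverse.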

Install all the packages

You can run the code below to install all these packages.

install.packages("tidyverse")
install.packages("sqldf")
install.packages("tseries")
install.packages("zoo")
install.packages("forecast")
install.packages("randomForest")
install.packages("tree")
install.packages("gam")
install.packages("e1071")
install.packages("xml2")
install.packages("ggmap")
install.packages("caret")
install.packages("pls")
install.packages("plotly")
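
If you already have some of these packages, you may prefer to install only the ones that are missing. The following is a rough sketch of that idea using base R only; `missing_packages` is a hypothetical helper name I made up for this post, not a function from any package, and the actual install call is left commented out so that nothing is downloaded until you choose to run it.

```r
# Packages named in this post.
pkgs <- c("tidyverse", "sqldf", "tseries", "zoo", "forecast",
          "randomForest", "tree", "gam", "e1071", "xml2",
          "ggmap", "caret", "pls", "plotly")

# Hypothetical helper: return the packages in `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- missing_packages(pkgs)
print(to_install)

# Uncomment to install whatever is missing in one call:
# install.packages(to_install)
```

install.packages() accepts a character vector, so a single call can replace the one-line-per-package version above.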



Gulsah Semiz
Gulsah is a data science consultant with over 6 years of cross-functional business experience. She holds a Master of Science degree in Marketing Analytics from Bentley University and enjoys playing with data to tell powerful and engaging stories.
