TL;DR: If you’re a data scientist/analyst, these are the R packages you must have: tidyverse (ggplot2, dplyr, readr, etc.), sqldf, tseries, zoo, forecast, randomForest, tree, gam, e1071, xml2, ggmap, caret, pls, plotly.

Must-Have R Packages (libraries) for Data Analytics

I love R and truly enjoy using RStudio, an integrated development environment for R.

Why do I love R so much?

I love it mostly because it’s open-source, which means that there is a powerful community of developers improving existing features and releasing new ones constantly. It also means that for almost everything you might need as a data scientist/analyst, there is a package (aka library) created by the members of this awesome community.

Presumably, if you’re a data scientist or becoming one, you already know all this. BUT, it doesn’t mean that you know all the packages available to you.

So here is a list of my favorite R packages along with a few words on what they do. Please leave a comment to share your favorite packages. Always happy to learn and try new R libraries.

  • tidyverse: it’s a collection of R packages designed for data science containing:
    • ggplot2: for data visualization
    • dplyr: to manipulate (e.g. select, filter, subset etc.) data frames
    • readr: to import rectangular data like csv or tsv
    • stringr: to manipulate character strings
    • You can run install.packages("tidyverse") to install all these packages along with a few others that aren’t listed here.
  • sqldf: to run SQL statements on R data frames – YES!
  • tseries: to handle time series (ts) objects, also very handy for financial data analysis
  • zoo: to handle irregular (as well as regular) time series objects
  • forecast: for time series forecasting using, for instance, ARIMA
  • randomForest: classification and regression with random forest
  • tree: to fit classification and regression trees
  • gam: to build generalized additive models including non-linear smooth functions to explain/fit the data better
  • e1071: contains various statistical and machine learning methods, such as support vector machines and naive Bayes classifiers
  • xml2: to parse and process xml files
  • ggmap: to plot beautiful maps using the ggplot2 framework
  • caret: to train regression and classification models for better predictions (incl. feature selection)
  • pls: to fit multivariate regression models, including partial least squares and principal component regression
  • plotly: to make interactive graphs

Some packages may have overlapping functions, but I think each of them has a unique value to offer.

Install all the packages

You can run the code below to install all these packages.
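
A minimal sketch of such an install script follows. It assumes a CRAN mirror is already configured (RStudio sets one by default). The variable names pkgs and missing_pkgs are illustrative, and the check against installed.packages() is an addition that skips anything you already have.

```r
# Packages named in this post; "tidyverse" pulls in ggplot2, dplyr,
# readr, stringr and several others.
pkgs <- c("tidyverse", "sqldf", "tseries", "zoo", "forecast",
          "randomForest", "tree", "gam", "e1071", "xml2",
          "ggmap", "caret", "pls", "plotly")

# Keep only the packages that are not installed yet, so that
# re-running the script does not reinstall everything.
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

# Install whatever is still missing (needs an internet connection).
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```

Once installed, load a package in each session with library(), for example library(tidyverse).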


