If you're a data scientist or analyst, these are the R packages you must have: tidyverse (ggplot2, dplyr, readr, etc.), xml2, caret, sqldf, tseries, zoo, forecast, randomForest, tree, gam, e1071, ggmap, pls and plotly.
I absolutely love R, the programming language, and truly enjoy using RStudio, an integrated development environment (IDE) for R.
Why do I love R so much?
I love it because it’s open source, which means there is a powerful community of developers constantly improving existing features and releasing new ones.
But seriously, why? Why is being open source so important? Because it means that for almost anything you might need as a data scientist or analyst, there is a package (a.k.a. library) created by members of this awesome community.
If you’re a data scientist, or on your way to becoming one, you probably know all this already. BUT that doesn’t mean you know every package at your disposal.
So here is a list of my favorite R packages along with a few words on what they do. Please leave a comment to share your favorite packages. Always happy to learn and try new R libraries.
Must-Have R Packages for Data Analytics
- tidyverse: a collection of R packages designed for data science, including:
  - ggplot2: for data visualization
  - dplyr: to manipulate data frames (e.g. select columns, filter and subset rows)
  - readr: to import rectangular data such as CSV or TSV files
  - stringr: to manipulate character strings
  Running install.packages("tidyverse") installs all of these along with a few other packages that aren’t listed here.
- sqldf: to run SQL statements on R data frames – YES!
- tseries: for time series analysis and computational finance (e.g. unit-root tests and GARCH models), very handy for financial data analysis
- zoo: to handle irregular (as well as regular) time series objects
- forecast: for time series forecasting using, for instance, ARIMA
- randomForest: for classification and regression with random forests
- tree: to fit classification and regression trees
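- gam: to fit generalized additive models, which use non-linear smooth functions of the predictors to fit the data more flexibly
- e1071: miscellaneous statistical and machine learning functions, including support vector machines, naive Bayes classifiers and fuzzy clustering
- xml2: to parse and process XML (and HTML) files
- ggmap: to plot beautiful maps using the ggplot2 framework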
- caret: to train regression and classification models for better predictions (incl. feature selection)
- pls: to fit multivariate regression models such as partial least squares (PLS) and principal component regression (PCR)
- plotly: to make interactive graphs
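To give a feel for the time series tools in the list, here is a minimal sketch using forecast. It assumes forecast is installed and uses R's built-in AirPassengers dataset purely as an illustration: auto.arima() searches for suitable ARIMA orders, and forecast() extends the fitted model 12 months ahead (the 12-month horizon is an arbitrary choice).

```r
library(forecast)

# AirPassengers ships with base R: monthly totals from Jan 1949 to Dec 1960
y <- AirPassengers

# Let auto.arima() search for suitable (seasonal) ARIMA orders
fit <- auto.arima(y)

# Extend the fitted model 12 months ahead; the result holds point
# forecasts ($mean) plus 80% and 95% prediction intervals by default
fc <- forecast(fit, h = 12)

print(fc$mean)
```

Calling plot(fc) afterwards draws the history together with the forecast and its intervals.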
Some packages may have overlapping functions, but I think each of them has a unique value to offer.
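dplyr and sqldf are a good example of that overlap: both can filter, group and summarise a data frame. The sketch below assumes both packages are installed and uses the built-in mtcars dataset for illustration. It computes the average fuel economy per cylinder count for cars with more than 100 horsepower (an arbitrary threshold), once with dplyr verbs and once as a SQL query through sqldf.

```r
library(dplyr)
library(sqldf)

# dplyr version: filter rows, group, summarise, then sort
mpg_dplyr <- mtcars %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg), n = n()) %>%
  arrange(cyl)

# sqldf version: the same query written in SQL against the mtcars data frame
mpg_sql <- sqldf("
  SELECT cyl, AVG(mpg) AS avg_mpg, COUNT(*) AS n
  FROM mtcars
  WHERE hp > 100
  GROUP BY cyl
  ORDER BY cyl
")

print(mpg_dplyr)
print(mpg_sql)
```

Both give the same groups and averages; the main practical difference is that sqldf returns a plain data frame while dplyr returns a tibble.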
Install all the packages
You can run the code below to install all these packages.
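A minimal sketch that installs every package from the list, skipping any that are already installed. The CRAN mirror URL is an assumption; any CRAN mirror works.

```r
# The 14 packages from the list; "tidyverse" also brings in
# ggplot2, dplyr, readr, stringr and friends
pkgs <- c("tidyverse", "sqldf", "tseries", "zoo", "forecast",
          "randomForest", "tree", "gam", "e1071", "xml2",
          "ggmap", "caret", "pls", "plotly")

# Keep only the packages that are not installed yet
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

# Install the missing ones from CRAN (mirror URL is an assumption)
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}
```

To attach them all in the current session afterwards, lapply(pkgs, library, character.only = TRUE) loads each package in turn.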