R, RStudio, and the tidyverse for Geocomputing
As a geoscientist who switched from mostly typing code in Python to mostly typing code in R three years ago, it seems to me that R, particularly when combined with RStudio and the tidyverse, is an underutilized resource in the geological community. I switched from Python to R because I needed to make a plot for my M.Sc. thesis that somebody had figured out in R already, but three years later I type more R code than anything else. Here are a few reasons why:
RStudio is the best development environment I have ever used. I have tried many IDEs for Python, R, and other languages, but none of them seems to be as simple and elegant as RStudio. Jupyter, Rodeo, PyCharm, and Spyder all do some of the things that I need to do (data analysis, generating documents and figures, and building software tools), but RStudio does all of them, provided that I'm writing mostly R code. Development environments are certainly a matter of preference, and RStudio is definitely centered around R, but if RStudio had existed for Python three years ago, there is a good chance I never would have switched.
R was built for data. Independent of RStudio and the tidyverse, R was built to natively handle categorical variables and missing values for all data types. While it is possible to use pandas to do some of this in Python, I am constantly annoyed when I have to work in Python with data that contains incomplete observations or categorical variables that have a natural order.
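As a rough illustration of what native handling looks like, here is a minimal base R sketch. The grain-size classes and porosity values are made up for the example and are not from any real dataset; the point is only that ordered categories and missing values work without any extra packages.

```r
# Hypothetical grain-size classes with a natural order (illustrative data only)
grain_size <- factor(
  c("sand", "clay", NA, "silt", "sand"),
  levels = c("clay", "silt", "sand"),
  ordered = TRUE
)

# Ordered factors support comparisons; the missing value stays missing
coarser_than_clay <- grain_size > "clay"
print(coarser_than_clay)

# Hypothetical measurements with one incomplete observation
porosity <- c(0.31, 0.45, 0.38, NA, 0.29)

# Missing values propagate through calculations unless explicitly removed
mean_with_na <- mean(porosity)
mean_without_na <- mean(porosity, na.rm = TRUE)

# table() drops NA by default; useNA = "ifany" counts it as its own category
class_counts <- table(grain_size, useNA = "ifany")
print(class_counts)
```

Here `mean(porosity)` returns `NA` rather than silently ignoring the gap, which forces the decision about incomplete observations to be made explicitly with `na.rm = TRUE`.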
R has the tidyverse. The tidyverse is a collection of packages for R that ameliorate the sometimes haphazard and inconsistent nature of many base R functions. The tidyverse provides tools for data wrangling (dplyr, tidyr, and readr, which are like pandas in Python), tools for data communication (ggplot2 and RMarkdown, which are like matplotlib and the export feature of Jupyter Notebooks, respectively), and tools for building other R packages (devtools). The tidyverse is maintained by the same people who maintain RStudio, and combined, they make R an end-to-end solution for data analysis. I now write articles and reports using RMarkdown, which embeds R code within documents, create figures using ggplot2, and do data wrangling with dplyr and tidyr. Parts of this I can do using Jupyter Notebooks, pandas, and matplotlib, but the completeness of the tidyverse is impressive: I can add a few line breaks to the word processor output of RMarkdown, and the article is ready for submission.
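To give a flavour of the data wrangling step, the following is a small sketch using dplyr and tidyr. It assumes both packages are installed (tidyr 1.0.0 or later for `pivot_longer()`, dplyr 1.0.0 or later for the `.groups` argument), and the `samples` table, its column names, and its values are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical geochemical measurements in wide form (illustrative values only)
samples <- tibble(
  core = c("A", "A", "B", "B"),
  depth_cm = c(10, 20, 10, 20),
  Ca = c(1.2, 1.5, NA, 2.1),
  Fe = c(0.8, 0.7, 0.9, 1.1)
)

# Reshape to one row per measurement, then average by core and element,
# ignoring the missing Ca value in core B
summary_by_core <- samples %>%
  pivot_longer(c(Ca, Fe), names_to = "element", values_to = "value") %>%
  group_by(core, element) %>%
  summarise(mean_value = mean(value, na.rm = TRUE), .groups = "drop")

print(summary_by_core)
```

The long-form result has one row per core and element, which is also the shape that ggplot2 generally expects, so a table like this could be passed on to a plotting step or embedded in an RMarkdown document.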
R, RStudio, and the tidyverse have excellent teaching resources. There are without a doubt excellent teaching resources for many programming languages, but I have found that the creators of RStudio and the tidyverse have embraced the philosophy of do-first, learn-the-details afterward, and the teaching resources they provide are mostly free. There are debates within the R community as to whether or not one should teach the details first, but I have found that teaching using R for Data Science (Grolemund and Wickham 2017) is particularly effective for newcomers and experienced programmers alike.
After three years of writing code in R, I still prefer writing code in Python. However, the ability of R, RStudio, and the tidyverse to support the vast majority of my academic workflow means that I will likely continue to use R as my language of choice for a long time to come. For geoscientists who are curious, I would argue that, at the very least, learning R, RStudio, and the tidyverse is a worthwhile investment of time.
References
Grolemund, G, and H Wickham (2017). R for Data Science: Visualize, Model, Transform, Tidy, and Import Data. O’Reilly. http://r4ds.had.co.nz/