Video tutorial on the essentials of R for ecology cheatsheet

Hey everyone! I just finished putting together a video tutorial that goes over my Essential Functions of R (for ecology) Cheatsheet. I decided to create a separate post here because some of you were asking for an easy walk-through of the functions on the cheatsheet, and I think that merits its own post. For those who are ready to just download the cheatsheet and run with it, here is the link to my original post on the subject.

👇 Download the Cheatsheet here 👇
Click here to download the Essential Functions of R Cheatsheet.

The cheatsheet is still a work in progress, but for now the video goes over my first version. I thought this would also be a good opportunity to go over some of the questions and suggestions I’ve received since publishing that first version. More on this towards the bottom of this post.

But first, here is a link to the video:

[Video thumbnail for the tutorial on the Essential Functions of R Cheatsheet. Text reads: “80% of R in one hour?!”]

And here is the starting code that I use (that you can copy and paste) for following along in the tutorial:

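# Starting Code (contains most of the data used for this tutorial):
num_vec <- c(3, 6, 3, 8)                        # numeric vector
spp_vec <- c("spp1", "spp3", "spp2", "spp3")    # character vector of species labels
dataframe <- data.frame(num_vec, spp_vec)       # two-column data frame built from the vectors
data(trees)                                     # load R's built-in 'trees' dataset (31 rows)
tree_data <- trees                              # working copy of the dataset
# Add a light column: 15 "shade" rows, then 15 "sun" rows, plus 1 more "sun" (31 in total)
tree_data$light <- c(rep(c("shade", "sun"), each = 15), "sun")
tree_data$light <- as.factor(tree_data$light)   # convert the new column to a factor
my_matrix <- as.matrix(dataframe)               # character matrix, because the columns have mixed types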

If helpful, you can also download the entire script that I wrote out over the course of the tutorial: Click here to download the entire script from the video tutorial/walkthrough on the essential functions of R cheatsheet.

If this is what you came for, then you can ignore the rest of this post (though more advanced R users might want to keep reading).

Some notes on two of the more common suggestions I’ve received:

Watch out with setwd() or Jenny Bryan will burn your computer down 😜 https://www.tidyverse.org/blog/2017/12/workflow-vs-script/ — Eric Scott

I’ve already gotten some version of this comment several times. The idea is that setwd() is a function that should rarely (if ever) be used. setwd() sets the working directory, which is the base folder R uses when you load your data (or save your results) with a relative file name. The problem is that setwd() is almost always called with a full, absolute path, which makes the script work only on your own computer (and only at that moment in time!). How many of you have opened an R script with the following code:

setwd("/Users/lukanegoita/Documents/my_special_folder/another_folder/final_folder")
read.csv("my_data.csv")

Only to get the message: In file(file, "rt") : cannot open file 'my_data.csv': No such file or directory. The only reason you get this error is that at some point you moved the R script to a different folder or changed some folder names, and now you have no idea where that CSV is (or, best case scenario, it takes you a while to find it again)… Another common reason this happens is sharing scripts: someone else’s computer will have a totally different file path than yours. To prevent this error, and for good coding and sharing practices, it’s very important to use RStudio Projects for managing all of your scripts. It’s beyond the scope of this post to explain how that works, but you can check out my older post where I explain this in more detail (along with some links to other good articles on the subject).
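Without going into the full setup, here is a minimal sketch of why project-relative paths travel better than absolute ones. The folder names ("my_project", "data") are made up for illustration, and the project is simulated in a temporary directory only so that the code runs anywhere; in a real RStudio Project you would not need that part.

```r
# Simulate a project folder in a temporary directory so this sketch runs anywhere.
# "my_project" and "data" are hypothetical names chosen for illustration.
project_root <- file.path(tempdir(), "my_project")
dir.create(file.path(project_root, "data"), recursive = TRUE, showWarnings = FALSE)

# Write a small CSV into the project's data folder
example_data <- data.frame(num_vec = c(3, 6, 3, 8),
                           spp_vec = c("spp1", "spp3", "spp2", "spp3"))
write.csv(example_data,
          file.path(project_root, "data", "my_data.csv"),
          row.names = FALSE)

# The path relative to the project root is the only part a shared script needs
relative_path <- file.path("data", "my_data.csv")

# When a project is opened in RStudio, the working directory is already the
# project root, so a script would simply call read.csv(relative_path).
# Here the root is prepended explicitly because this sketch is not running
# inside a project.
my_data <- read.csv(file.path(project_root, relative_path))
```

Because the script only ever mentions "data/my_data.csv", the whole project folder can be moved, renamed, or sent to someone else without editing a single path.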

This is all to say that the only reason I included setwd() in this cheatsheet is because many beginners will still find this function in their code, usually from people sharing their scripts without adhering to the best practice of using Projects instead. I think this is my new thing: Don’t share scripts, share projects. Stop the spread of STWDs (Scriptually Transmitted Working Directories).

Ok, enough on that.

Second, I’ve gotten comments on why I didn’t include any of the “apply” family of functions (such as lapply(), tapply(), vapply(), sapply(), and plain apply()). It’s true that those functions may crop up every once in a while, and they are no doubt a powerful set of tools for working with data. However, I have always been thoroughly confused by the multitude of different “apply” functions and never knew which one to use where. Discovering the dplyr group_by() and summarize() functions made it so that I (almost) never have to use the “apply” functions now. To spare others the frustration I went through, I decided to omit that family of functions and stick to the few key dplyr functions I did include. The point is that I’ve been able to do most of my work without needing the “apply” functions, so I think others can too.
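To make the comparison concrete, here is a small sketch using the tree_data object from the starting code: the same per-group mean computed once with tapply() and once with group_by() and summarize(). It assumes the dplyr package is installed, and the column name mean_height is just a name picked for this example.

```r
library(dplyr)  # assumes dplyr is installed

# Rebuild tree_data exactly as in the starting code
data(trees)
tree_data <- trees
tree_data$light <- c(rep(c("shade", "sun"), each = 15), "sun")
tree_data$light <- as.factor(tree_data$light)

# Base R: mean tree height for each light level with tapply()
mean_height_base <- tapply(tree_data$Height, tree_data$light, mean)

# dplyr: the same summary with group_by() and summarize()
mean_height_dplyr <- tree_data %>%
  group_by(light) %>%
  summarize(mean_height = mean(Height))

mean_height_base
mean_height_dplyr
```

Both give the same two means. tapply() returns a named array, while the dplyr version returns a data frame with one row per group, which is often easier to plot or join back onto other data.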

Convince me otherwise and I’ll include them in a future version of the cheatsheet 😉

That’s it for now, but comment down below to keep the conversation going! I hope this cheatsheet evolves into the most helpful resource that it can be.



If you liked this post and want to learn more, then check out my online course on the complete basics of R for ecology.

Also be sure to check out R-bloggers for other great tutorials on learning R

Luka Negoita, PhD
Lead Instructor
