How to organize your analyses with RStudio Projects
Or how to stay sane when working on big projects

Here is a post that I am sharing from my old blog to get this one started. Enjoy!
In this post I’ll go over a basic method for organizing your ecological data analysis projects in R. Why do this? Reproducing analyses is critical for good science. There is nothing worse than trying to re-run a script when you finally get comments back from your reviewers, only to find that your results are a bit different than before. What?! Speaking from personal experience, it’s taken days of blood, sweat, and tears to figure out what was different in the data, what code I was running in the wrong order, or that I was running the wrong code altogether! Start now and get in the habit of sticking to a system for organizing your R projects.
While there are many methods and variations on how to do this (see links at the end of the post), the scope of this post is to offer a short and simple overview of my own method so that you can get started ASAP. Those who follow me know that I am a big fan of getting right into the code and data, because that is the best way to learn. So let’s get to it.
1) Use RStudio for all your analyses. Some of you 1% hardcore coders might prefer the minimalist terminal-type interface included in the basic R download, but for everyone else, use RStudio. It’s a no-brainer. See my video tutorial here on how to install it.
2) Create a new project (File > New Project). The directory you set here will be the folder where you store your data, scripts, and other files related to your analysis.
3) Create the folder structure inside your project folder so that it looks like this:
- “data” is where you keep your data, split into two folders, “raw” and “processed”. This is self-explanatory: “raw” is where you save your data as you entered or downloaded it (usually an Excel spreadsheet file), and “processed” is where you save the CSV file ready for loading into R.
- “output” is where you save all the figures and tables that you generate with your R scripts.
- “scripts” is where you keep all the R code files.
- Finally, “temp” is not necessary, but I’ve found it very useful. It is a folder where I can save any temporary outputs or scripts that I want to test out or explore, but that I know should not get confused with the final output of my analyses.
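If you would rather build the folders from the R console than by hand, something like the following should work. This is only a sketch: the helper name create_project_dirs() is my own invention for illustration, and it assumes you run it from the project folder (which is where RStudio starts you when a project is open).

```r
# Hypothetical helper: create the folder structure described above.
# `root` is the project folder; "." means the current working directory.
create_project_dirs <- function(root = ".") {
  dirs <- file.path(
    root,
    c("data/raw", "data/processed", "output", "scripts", "temp")
  )
  for (d in dirs) {
    # recursive = TRUE also creates the parent "data" folder when needed;
    # folders that already exist are left untouched
    if (!dir.exists(d)) dir.create(d, recursive = TRUE)
  }
  invisible(dirs)
}

# Create the folders inside the current project
create_project_dirs()
```

Running it a second time is harmless, since existing folders are skipped.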
4) Create your R scripts. Unless your analysis is very simple and direct, you should be using multiple scripts (pretty much always the case when your project is large enough for an entire publication). Ideally, each script should be a set of code that you can run in one go. This is not always possible, but strive for that and use a separate script for each component of the analysis. I recommend you create the following scripts right away:
- Script for loading packages and custom functions
- Script for cleaning up and preparing the data for analysis
- Script for each analysis in the project. For example, in one study you might need both a figure that presents two histograms for visualization purposes and a linear mixed effects regression to test your primary hypothesis. Each of those should have its own script.
- Name each script using this format: “##_name_v#”, where ## indicates the order in which the scripts should be run, “name” is a descriptor, and “v#” indicates the version number. Sometimes you want to change a script but should keep the older version in case you mess something up. That’s when saving a new file with an updated version number makes sense. So, altogether, your first set of scripts might look like this: 00_packages_v1.r, 01_dataclean_v1.r, 02_HistogramFigure_v1.r, 03_LMER_v1.r
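One payoff of this naming format is that R itself can work out the run order. Here is a rough sketch of that idea, not part of my original workflow: the function latest_scripts() is hypothetical, and it assumes every script follows the “##_name_v#” pattern exactly and lives in the “scripts” folder.

```r
# Hypothetical helper: list the newest version of each numbered script,
# in the order the scripts should be run.
latest_scripts <- function(dir = "scripts") {
  # Keep only files that match ##_name_v#.r (or .R)
  files <- list.files(dir, pattern = "^[0-9]{2}_.+_v[0-9]+\\.[rR]$")
  if (length(files) == 0) return(character(0))

  # "01_dataclean_v2.r" -> prefix "01_dataclean", version 2
  prefix  <- sub("_v[0-9]+\\.[rR]$", "", files)
  version <- as.integer(sub("^.*_v([0-9]+)\\.[rR]$", "\\1", files))

  # Sort by prefix (run order), newest version first within each prefix
  ord    <- order(prefix, -version)
  files  <- files[ord]
  prefix <- prefix[ord]

  # The first file seen for each prefix is its newest version
  files[!duplicated(prefix)]
}

# Run the newest version of every script, in order
for (f in latest_scripts("scripts")) {
  source(file.path("scripts", f))
}
```

Older versions stay in the folder as a backup but are simply skipped.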
5) Start off each R script with a good description of the entire project and particular scope of the script. The more comments the better, but more on script commenting in another post. Here’s an example:
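A header along these lines would do the job. Treat it as an illustrative sketch only: everything in angle brackets is a placeholder for you to fill in, and the two path variables at the bottom are just one optional convention, not a requirement.

```r
# ------------------------------------------------------------------
# Project:   <project title>                      (placeholder)
# Script:    01_dataclean_v1.r
# Purpose:   Clean the raw data and save a CSV ready for analysis
# Author:    <your name>                          (placeholder)
# Date:      <date created or last updated>       (placeholder)
#
# Run after: 00_packages_v1.r
# Input:     data/raw/<raw file>                  (placeholder)
# Output:    data/processed/<processed file>.csv  (placeholder)
# ------------------------------------------------------------------

# Folders used by this script, relative to the project folder
raw_dir       <- file.path("data", "raw")
processed_dir <- file.path("data", "processed")
```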
That’s pretty much it! Each time you open the project in RStudio, the scripts you had open last time will reopen. Just make sure to run the packages and dataclean scripts before the others. Because an RStudio Project sets the working directory to the project folder, there is no need to include a setwd() line; just add “data/processed/” before your filename whenever loading data, or add “output/” or “temp/” whenever exporting something.
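Here is a small sketch of what those relative paths look like in practice. The file names and the tiny data set are made up purely so the example runs on its own; in a real project the processed CSV would already be sitting in “data/processed”.

```r
# Only needed so this sketch is self-contained: make sure the folders
# exist and write a tiny made-up CSV to read back in.
dir.create(file.path("data", "processed"), recursive = TRUE, showWarnings = FALSE)
dir.create("output", showWarnings = FALSE)
write.csv(
  data.frame(site = c("A", "B", "C"), count = c(4, 7, 2)),
  "data/processed/example_counts.csv",
  row.names = FALSE
)

# Load data: the path is relative to the project folder, so no setwd()
counts <- read.csv("data/processed/example_counts.csv")

# Do something with it (here, total count per site)
site_totals <- aggregate(count ~ site, data = counts, FUN = sum)

# Export the result: just prefix the file name with "output/"
write.csv(site_totals, "output/example_site_totals.csv", row.names = FALSE)
```

The same prefixes work for figures: open a device such as pdf() with a file name that starts with “output/” or “temp/”.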
If you want some longer in-depth explanations on code management in R, check out these other excellent blog posts:
- https://chrisvoncsefalvay.com/structuring-r-projects/
- https://kkulma.github.io/2018-03-18-Prime-Hints-for-Running-a-data-project-in-R/
- https://ntguardian.wordpress.com/2018/08/02/how-should-i-organize-my-r-research-projects/
Also be sure to check out R-bloggers for other great tutorials on learning R.
Thanks for a nice article. I do, however, believe there are better ways than the numbering concept in the naming principle. I don't have much experience with R so I might be overlooking something, but wouldn't it be nicer to wrap each of the script files in a function? So LoadAndCleanData.r would contain a function, perhaps taking the path of the file to load as an argument and returning e.g. a tibble. The script to create a histogram would then be wrapped in a function taking a tibble as argument and producing the plots. All of this would then be orchestrated from a "main" script that would also handle the package imports, e.g.
File main.r:
# .. install packages (using e.g. pacman)
source("LoadAndCleanData.r") # This gives us the function LoadAndCleanData(filepath)
source("PlotHistogram.r") # This gives us a function to create our histogram plot
cleanedData <- LoadAndCleanData("./data/raw/input.csv")
histPlot <- CreateHistogramPlot(cleanedData)
print(histPlot)
etc etc
Hey there,
My name is Chad, one of the virtual assistants for the course.
Those are great points, and if you're trying to keep your main script as tidy as possible, that would absolutely work! It really just depends on what background someone is coming from and what type of IDEs (integrated development environments) and UIs (user interfaces) they are used to. There's definitely an art to balancing the number of scripts you have and the length of your scripts when accomplishing a task.
Appreciate your sharing those ideas with the community! Have a great day!
Thank you for the help! I didn't think about organizing my files like this before.
I’m glad you enjoyed the blog! We are always working to post new blogs here as well as other free content on our YouTube channel (https://www.youtube.com/channel/UCysnKC1GQycHyOSZ_7Ob9vw)! Also, if you haven’t already, you can sign up for our email list and get the first four lessons of our in-depth “Basics of R (for ecologist)” course here: https://www.rforecology.com/waitlist-page/