Learning about data structures in R
How to use and manipulate data structures such as matrices, data frames, and vectors!
Last week, we posted a tutorial on the different types of data in R (check it out here). In this tutorial, we’re going to talk about the different structures that R provides to help you organize your data.
Data structures go hand-in-hand with data types, as both of these form the foundation for the work we do in R. You may have already worked with many of the structures that we describe in this blog post, but I wanted to take the time to describe them in depth and show you how they relate to or are different from one another.
Let’s jump in!
The different data structures
R provides several data structures that we commonly use as ecologists:
1. Vectors
2. Lists
3. Matrices
4. Data frames
4a. Tibbles
Vectors
Vectors are one of the most common data structures. You can create a vector using the function c(). c() combines all of its arguments into a vector like so:
# Create a vector
vec <- c("this", "is", "a", "vector")
# View the vector
vec
## [1] "this" "is" "a" "vector"
You can create a vector using any data type (numeric, character, logical, etc). However, if you combine data types in a vector, R will force all elements to be the same type. The type that R chooses for the vector will be the most “flexible” data type. Data types in order from least to greatest flexibility are: logical, integer, numeric, and character. For example, in the vector below, I combined numbers and characters into one vector.
# Create a vector
ex <- c(1, "species", 10)
# Check the data type (class) of the vector
class(ex)
## [1] "character"
When we check the data type of the vector, it says character because we can change 1 and 10 to be “1” and “10”, but we can’t change “species” into a number. What number would “species” represent?? So here, R has chosen the more flexible data type — characters.
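To make the flexibility ordering concrete, here is a small sketch that combines each pair of neighbouring types and checks the result. The variable names are made up for illustration.

```r
# Combining types coerces every element to the most flexible type present
lgl_int <- c(TRUE, 2L)         # logical + integer
int_num <- c(2L, 3.5)          # integer + numeric
num_chr <- c(3.5, "species")   # numeric + character

class(lgl_int)   # "integer"
class(int_num)   # "numeric"
class(num_chr)   # "character"

# Going the other way is lossy: text that is not a number becomes NA.
# suppressWarnings() hides the "NAs introduced by coercion" warning.
back <- suppressWarnings(as.numeric(c("1", "species", "10")))
back
```

The last line shows why R picks character in the first place: converting "species" to a number would throw information away.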
You can also examine certain attributes of the vector such as length() (i.e., number of elements) or, if you have a character vector, number of characters in each element (nchar()).
# View vector
ex
## [1] "1" "species" "10"
# Length of vector
length(ex)
## [1] 3
# Number of characters
nchar(ex)
## [1] 1 7 2
Vector elements can also be given names. You do this by assigning a character vector to names(my.vector).
# Create a vector
crabs <- c(10, 15, 26)
# Give the vector names
names(crabs) <- c("Blue crab", "Mud crab", "Ghost crab")
# View named vector
crabs
## Blue crab Mud crab Ghost crab
## 10 15 26
You can subset a vector by specifying the element number in square brackets. You could also subset a vector using the element name.
# Choose element number 3
crabs[3]
## Ghost crab
## 26
# Choose element named "Ghost crab"
crabs["Ghost crab"]
## Ghost crab
## 26
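Square brackets also accept more than one position or name at a time. The sketch below rebuilds the crabs vector so that it runs on its own, then shows a few common variations. Treat it as an illustration rather than a complete list of subsetting options.

```r
# Rebuild the named vector from above so this example is self-contained
crabs <- c("Blue crab" = 10, "Mud crab" = 15, "Ghost crab" = 26)

# Several positions at once
first_and_third <- crabs[c(1, 3)]

# Several names at once, returned in the order requested
by_name <- crabs[c("Mud crab", "Blue crab")]

# A negative index drops that element
without_second <- crabs[-2]

# A logical condition keeps the elements where it is TRUE
more_than_12 <- crabs[crabs > 12]

first_and_third
by_name
without_second
more_than_12
```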
Lastly, you can view the structure of a vector using the str() function. This will tell us that the vector is a numeric vector with 3 elements: 10, 15, and 26. Below the vector, it also says that the attribute names for the vector is a character vector with the elements “Blue crab”, “Mud crab”, and “Ghost crab”.
str(crabs)
## Named num [1:3] 10 15 26
## - attr(*, "names")= chr [1:3] "Blue crab" "Mud crab" "Ghost crab"
Lists
Lists are similar to vectors, but their elements do not all have to be the same type, and an element can itself be a list. In other words, a list lets you nest vectors (and other lists) inside one another.
To create a list, you use list() instead of c().
# Create a list
animals <- list(c("Eastern elliptio", "Diamondback terrapin", "Spring peeper", "American eel"),
c(25, 3, 0, 10),
"Maryland",
c(T, T, F, T))
# View the structure of the list
str(animals)
## List of 4
## $ : chr [1:4] "Eastern elliptio" "Diamondback terrapin" "Spring peeper" "American eel"
## $ : num [1:4] 25 3 0 10
## $ : chr "Maryland"
## $ : logi [1:4] TRUE TRUE FALSE TRUE
Here, my list contains a vector of animal names (character), a vector of numbers (numeric), the U.S. state that these animals can be found in (character), and a logical vector. The vectors don’t all need to be the same length: the third element has only one value, “Maryland”, while all the other elements have a length of 4.
If we view the list, you’ll notice that each element is identified within double square brackets [[these]].
# View list
animals
## [[1]]
## [1] "Eastern elliptio" "Diamondback terrapin" "Spring peeper"
## [4] "American eel"
##
## [[2]]
## [1] 25 3 0 10
##
## [[3]]
## [1] "Maryland"
##
## [[4]]
## [1] TRUE TRUE FALSE TRUE
You can subset elements of a list using double square brackets, and further subset that list element using single square brackets.
# View animal names (element 1 in the list)
animals[[1]]
## [1] "Eastern elliptio" "Diamondback terrapin" "Spring peeper"
## [4] "American eel"
# View the second animal name (element 2 of element 1 in the list)
animals[[1]][2]
## [1] "Diamondback terrapin"
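One detail that often trips people up is that single square brackets also work on a list, but they return a smaller list rather than the element itself. A minimal sketch, using a shortened version of the animals list:

```r
# A shortened, unnamed version of the list from above
animals <- list(c("Eastern elliptio", "Diamondback terrapin"),
                c(25, 3))

# Single brackets keep the list wrapper: the result is a list of length 1
first_as_list <- animals[1]
class(first_as_list)     # "list"

# Double brackets pull out the element itself: here, a character vector
first_as_vector <- animals[[1]]
class(first_as_vector)   # "character"
```

This is why animals[[1]][2] works for picking one name: the double brackets return the vector, and the single brackets then index into it.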
As with vectors, you can give list elements names. Let’s create the same list that we did above, but give it some more descriptive names by writing name.of.element = element within the list() function. In the code below, I named the list elements “common.name”, “abundance”, “state”, and “presence”.
# Create a list
animals <- list(common.name = c("Eastern elliptio", "Diamondback terrapin",
"Spring peeper", "American eel"),
abundance = c(25, 3, 0, 10),
state = "Maryland",
presence = c(T, T, F, T))
# View list
animals
## $common.name
## [1] "Eastern elliptio" "Diamondback terrapin" "Spring peeper"
## [4] "American eel"
##
## $abundance
## [1] 25 3 0 10
##
## $state
## [1] "Maryland"
##
## $presence
## [1] TRUE TRUE FALSE TRUE
Now, instead of numbers inside of double square brackets, each element is identified by $name. You can still subset the list using the element number in square brackets, like this: [[1]], but you can also subset the list using this dollar sign notation:
# View whether the animals were present in our survey
animals$presence
## [1] TRUE TRUE FALSE TRUE
Lists are really useful for storing lots of data, but it can get confusing if you have several lists nested in other lists. Naming your elements can help you keep things straight when subsetting your data.
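As a rough illustration of how names help with nested lists, the sketch below builds a small two-level list. The site label and the counts are invented for this example.

```r
# A hypothetical survey stored as a list of lists
survey <- list(
  site  = "Site A",
  crabs = list(species   = c("Blue crab", "Mud crab"),
               abundance = c(10, 15)),
  fish  = list(species   = "American eel",
               abundance = 10)
)

# Chain $ to walk down through the levels by name
crab_counts <- survey$crabs$abundance

# The same idea with double square brackets and names
second_crab <- survey[["crabs"]][["species"]][2]

crab_counts
second_crab

# str() gives a compact overview of the whole nested structure
str(survey)
```

Compare survey$crabs$abundance with the unnamed equivalent survey[[2]][[2]]: both return the same values, but only the first tells you what you are asking for.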
Matrices
The next data structure I want to introduce is the matrix. Matrices are two-dimensional, rectangular objects that, like vectors, must contain elements of the same type. They are most useful for mathematical operations, but are also common for species abundance data collected across sites, where the columns are the species and the rows are the sites (or vice versa). Each cell value is the abundance of one species at one site, a layout that is useful for multivariate analyses.
You can create matrices using matrix(data = your.data, nrow = num.rows, ncol = num.cols, byrow = T/F, dimnames = your.names).
data accepts a vector of the data you want to use. nrow is the number of rows you want in your matrix, while ncol is the number of columns you want. The byrow argument can be set to TRUE or FALSE depending on whether you want the matrix to fill your table by rows or by columns, though the default is FALSE. dimnames accepts a list of 2 elements that specifies names for the rows and columns of your matrix.
The byrow argument is best understood through demonstration:
# Create a matrix that is filled by rows
m1 <- matrix(data = 1:12, nrow = 4, ncol = 3, byrow = T)
m1
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
## [4,] 10 11 12
# Create a matrix that is filled by columns
m2 <- matrix(data = 1:12, nrow = 4, ncol = 3, byrow = F)
m2
## [,1] [,2] [,3]
## [1,] 1 5 9
## [2,] 2 6 10
## [3,] 3 7 11
## [4,] 4 8 12
You can see that the first bit of code fills in the table row by row — it fills it in from left to right, then moves down. The second chunk of code fills in the table by columns — it fills it in from top to bottom, then moves to the right.
You can access matrix elements using single square brackets where the first number represents the row, while the second represents the column. So m1[2,3] would access the element in the 2nd row and 3rd column. You could also type m1[2, ], leaving the column space blank. This will return the entire 2nd row of the matrix. Inversely, you could type m1[ , 3], which leaves the row space blank and returns the entire 3rd column of the matrix. Let’s see these in action.
# Return element in 2nd row, 3rd column
m1[2,3]
## [1] 6
# Return 2nd row
m1[2, ]
## [1] 4 5 6
# Return 3rd column
m1[ , 3]
## [1] 3 6 9 12
We can also look at the number of rows and columns of a matrix by using nrow() and ncol(); these functions are analogous to the length() function that we used for vectors. Alternatively, we can use dim(), which will tell us both the number of rows and columns.
# Number of rows
nrow(m1)
## [1] 4
# Number of columns
ncol(m1)
## [1] 3
# View matrix dimensions
dim(m1)
## [1] 4 3
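The dimnames argument described earlier was not used in the examples above, so here is a minimal sketch of a site-by-species abundance matrix that uses it. The site names, species names, and counts are all made up for illustration.

```r
# Rows are sites, columns are species; dimnames takes list(row names, column names)
abund <- matrix(data = c(2, 0, 10,
                         5, 1,  0),
                nrow = 2, ncol = 3, byrow = TRUE,
                dimnames = list(c("site_A", "site_B"),
                                c("blue_crab", "mud_crab", "ghost_crab")))
abund

# With dimnames set, you can subset by name instead of by position
abund["site_B", "blue_crab"]

# A whole column comes back as a vector named by site
abund[ , "ghost_crab"]

# Total abundance of each species across sites
species_totals <- colSums(abund)
species_totals
```

Subsetting by name tends to be safer than subsetting by position, because the code keeps working if the rows or columns are later reordered.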
Data frames
Data frames are the most common way to store and display tabular data in R and are the standard format for applying any analyses to your data. Like matrices, these are two-dimensional objects with rows and columns. But data frames are also like lists, in that you can have elements of several types within them. In fact, a data frame is a type of list where each list element has the same length (this is what makes them rectangular / tabular).
You have likely encountered data frames before, for example when importing data into R using functions such as read.csv().
You can create a data frame using the function data.frame(col1 = vector1, col2 = vector2, etc.), where each vector should be the same length. You could also have a vector of length 1 or a length that is a divisor of the other vector lengths; this shorter vector will then get recycled until it reaches the length of the other columns.
In the code below, I created a data frame of species, whether or not they were present, and their abundance. Each column consists of different data types. The 1st column is a character vector, the 2nd is logical, and the 3rd is numeric. This is really useful and allows us to store much more information than in a matrix.
# Create a data frame
species_dat <- data.frame(species = c("Callinectes sapidus",
"Sciaenops ocellatus",
"Anchoa mitchilli",
"Micropognias undulatus",
"Menidia menidia"),
presence = c(T, F, T, F, T),
abundance = c(2, 0, 10, 0, 9))
# View data frame
species_dat
## species presence abundance
## 1 Callinectes sapidus TRUE 2
## 2 Sciaenops ocellatus FALSE 0
## 3 Anchoa mitchilli TRUE 10
## 4 Micropognias undulatus FALSE 0
## 5 Menidia menidia TRUE 9
You also have the option to add an argument row.names = c("vector", "of", "names", "for", "rows"), though adding row.names is less common for data frames.
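The recycling rule and the row.names argument can both be seen in one small sketch. The site labels, seasons, and row names here are invented for illustration.

```r
# Columns of length 4, 2, and 1: the shorter ones are recycled to 4 rows
sites <- data.frame(site   = c("A", "B", "C", "D"),
                    season = c("spring", "fall"),  # length 2, repeated twice
                    state  = "Maryland",           # length 1, repeated four times
                    row.names = c("s1", "s2", "s3", "s4"))
sites

# With row names set, rows can be selected by name as well as by number
sites["s3", ]
```

A length that does not divide evenly (for example, a vector of length 3 alongside one of length 4) makes data.frame() stop with an error instead of recycling.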
As with matrices, you can view number of rows and columns using nrow(my.dataframe) or ncol(my.dataframe), or use dim(my.dataframe) to view the full dimensions.
And like matrices, you can subset your data frame into its rows or columns using single square brackets: my.dataframe[row.num, col.num].
# View the third item in the first column
species_dat[3, 1]
## [1] "Anchoa mitchilli"
# View the first column
species_dat[ , 1]
## [1] "Callinectes sapidus" "Sciaenops ocellatus" "Anchoa mitchilli"
## [4] "Micropognias undulatus" "Menidia menidia"
# View the third row
species_dat[3, ]
## species presence abundance
## 3 Anchoa mitchilli TRUE 10
Alternatively, you can subset your data frame in the same way as lists, by using the dollar sign symbol or double square brackets. Each column is essentially a list element, so you can easily choose a data frame column using my.dataframe$col.name.
# View the abundance column in three different ways
species_dat$abundance
## [1] 2 0 10 0 9
species_dat[[3]]
## [1] 2 0 10 0 9
species_dat[["abundance"]]
## [1] 2 0 10 0 9
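If you want to convince yourself that a data frame really is a list of equal-length columns, a quick check along these lines may help. The small data frame is a cut-down stand-in for species_dat.

```r
# A cut-down stand-in for the species data frame
df <- data.frame(species   = c("Callinectes sapidus", "Anchoa mitchilli", "Menidia menidia"),
                 presence  = c(TRUE, TRUE, FALSE),
                 abundance = c(2, 10, 0))

# A data frame passes the list test
is.list(df)

# length() counts list elements, which for a data frame means columns
length(df)

# Because it is a list, list tools work column by column
col_lengths <- lengths(df)         # every column has the same length
col_classes <- sapply(df, class)   # each column keeps its own type
col_lengths
col_classes
```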
The function str() is also useful. It shows you the structure of your data frame. This will tell you the number of rows and columns in your data frame and will tell you the data types of each column.
# View structure
str(species_dat)
## 'data.frame': 5 obs. of 3 variables:
## $ species : chr "Callinectes sapidus" "Sciaenops ocellatus" "Anchoa mitchilli" "Micropognias undulatus" ...
## $ presence : logi TRUE FALSE TRUE FALSE TRUE
## $ abundance: num 2 0 10 0 9
These are a few functions that are very useful for getting to know your data frames.
head() or tail() to view the first 6 or last 6 rows of your data frame
dim(), nrow(), or ncol() to view the number of rows or columns (or both!) of your data frame
rownames() or colnames() to view or set the row or column names of your data frame. Note that just names() will also give you the column names of a data frame.
str() to view the structure of your data frame
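Here is a short sketch of these helper functions in action on a made-up data frame of ten sites. The column names and values are placeholders.

```r
# A made-up data frame with 10 rows
counts <- data.frame(site      = paste0("site_", 1:10),
                     abundance = seq(5, 50, by = 5))

# head() and tail() default to 6 rows; n changes how many you get
top3    <- head(counts, n = 3)
bottom2 <- tail(counts, n = 2)
top3
bottom2

# Dimensions: rows first, then columns
dim(counts)

# names() and colnames() give the same answer for a data frame
names(counts)

# colnames() can also be used to set a name
colnames(counts)[2] <- "n_individuals"
names(counts)

# Structure of the data frame after renaming
str(counts)
```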
As you can see, data frames are very useful for organizing complex, multi-attribute data sets that contain data of different types. No wonder we use them so often!
Tibbles
I added in tibbles as a side data structure. Even though a tibble isn’t an official data structure in R, it’s something that comes up often if you use the tidyverse set of packages. Tibbles come with the tibble package (which comes with the tidyverse) and are basically data frames with a few added benefits!
Functionally, tibbles are the same as data frames when you manipulate them. They can do everything that data frames can do, but they have slightly different properties that make them more convenient. In fact, ‘tibble’ stands for ‘tidy table’ :) Let’s find out what makes tibbles different.
First, let’s load up the tidyverse set of packages.
library(tidyverse)
To create a tibble, all you have to do is use the function tibble(), which works the same way as the function data.frame(). When you’re creating a tibble, you can only use vectors that are either all the same length, or have length of 1. The vector with a length of 1 will just be recycled until it fills all of the rows in its column. Tibbles also don’t use row.names(), which keeps things simpler.
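The stricter recycling rule is easy to see side by side. This sketch assumes the tibble package is installed, and wraps the failing call in tryCatch() so the script keeps running. The column names are placeholders.

```r
library(tibble)

# A length-1 vector is recycled to fill the column, just like in data.frame()
ok <- tibble(species = c("a", "b", "c", "d"),
             state   = "Maryland")
ok

# A length-2 vector next to a length-4 vector is an error in tibble() ...
strict <- tryCatch(tibble(x = 1:4, y = 1:2),
                   error = function(e) "error")
strict

# ... whereas data.frame() quietly recycles it
lenient <- data.frame(x = 1:4, y = 1:2)
lenient
```

The strict behavior is arguably a feature: a column that is accidentally too short gets flagged right away instead of being silently repeated.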
Let’s create the same species table that we did earlier, but this time as a tibble.
# Create a tibble
species_dat <- tibble(species = c("Callinectes sapidus",
"Sciaenops ocellatus",
"Anchoa mitchilli",
"Micropognias undulatus",
"Menidia menidia"),
presence = c(T, F, T, F, T),
abundance = c(2, 0, 10, 0, 9))
# View the tibble
species_dat
## # A tibble: 5 × 3
## species presence abundance
## <chr> <lgl> <dbl>
## 1 Callinectes sapidus TRUE 2
## 2 Sciaenops ocellatus FALSE 0
## 3 Anchoa mitchilli TRUE 10
## 4 Micropognias undulatus FALSE 0
## 5 Menidia menidia TRUE 9
When we print the tibble, it clearly tells us that it’s a tibble. It also tells us the table dimensions and the column names and data types.
You might be thinking: okay…and? The tibble doesn’t look that different from the data frame we originally created.
Let’s try another example.
This time, let’s load up an example data set that comes with the ggplot2 package. This data set is called msleep, and describes the sleep times and brain weights of several different types of mammals. This data set already comes as a tibble, so let’s turn it into a data frame for the purposes of demonstration, using the as.data.frame() function.
# Load data
data("msleep")
# Turn data into class data frame
msleep <- as.data.frame(msleep)
# View data
msleep
## name genus vore
## 1 Cheetah Acinonyx carni
## 2 Owl monkey Aotus omni
## 3 Mountain beaver Aplodontia herbi
## 4 Greater short-tailed shrew Blarina omni
## 5 Cow Bos herbi
## 6 Three-toed sloth Bradypus herbi
## 7 Northern fur seal Callorhinus carni
## 8 Vesper mouse Calomys <NA>
## 9 Dog Canis carni
## 10 Roe deer Capreolus herbi
## 11 Goat Capri herbi
## 12 Guinea pig Cavis herbi
## 13 Grivet Cercopithecus omni
## 14 Chinchilla Chinchilla herbi
## 15 Star-nosed mole Condylura omni
## 16 African giant pouched rat Cricetomys omni
## 17 Lesser short-tailed shrew Cryptotis omni
## 18 Long-nosed armadillo Dasypus carni
## 19 Tree hyrax Dendrohyrax herbi
## 20 North American Opossum Didelphis omni
## 21 Asian elephant Elephas herbi
## 22 Big brown bat Eptesicus insecti
## 23 Horse Equus herbi
## 24 Donkey Equus herbi
## 25 European hedgehog Erinaceus omni
## 26 Patas monkey Erythrocebus omni
## 27 Western american chipmunk Eutamias herbi
## 28 Domestic cat Felis carni
## 29 Galago Galago omni
## 30 Giraffe Giraffa herbi
## 31 Pilot whale Globicephalus carni
## 32 Gray seal Haliochoerus carni
## 33 Gray hyrax Heterohyrax herbi
## 34 Human Homo omni
## 35 Mongoose lemur Lemur herbi
## 36 African elephant Loxodonta herbi
## 37 Thick-tailed opposum Lutreolina carni
## 38 Macaque Macaca omni
## 39 Mongolian gerbil Meriones herbi
## 40 Golden hamster Mesocricetus herbi
## 41 Vole Microtus herbi
## 42 House mouse Mus herbi
## 43 Little brown bat Myotis insecti
## 44 Round-tailed muskrat Neofiber herbi
## 45 Slow loris Nyctibeus carni
## 46 Degu Octodon herbi
## 47 Northern grasshopper mouse Onychomys carni
## 48 Rabbit Oryctolagus herbi
## 49 Sheep Ovis herbi
## 50 Chimpanzee Pan omni
## 51 Tiger Panthera carni
## 52 Jaguar Panthera carni
## 53 Lion Panthera carni
## 54 Baboon Papio omni
## 55 Desert hedgehog Paraechinus <NA>
## 56 Potto Perodicticus omni
## 57 Deer mouse Peromyscus <NA>
## 58 Phalanger Phalanger <NA>
## 59 Caspian seal Phoca carni
## 60 Common porpoise Phocoena carni
## 61 Potoroo Potorous herbi
## 62 Giant armadillo Priodontes insecti
## 63 Rock hyrax Procavia <NA>
## 64 Laboratory rat Rattus herbi
## 65 African striped mouse Rhabdomys omni
## 66 Squirrel monkey Saimiri omni
## 67 Eastern american mole Scalopus insecti
## 68 Cotton rat Sigmodon herbi
## 69 Mole rat Spalax <NA>
## 70 Arctic ground squirrel Spermophilus herbi
## 71 Thirteen-lined ground squirrel Spermophilus herbi
## 72 Golden-mantled ground squirrel Spermophilus herbi
## 73 Musk shrew Suncus <NA>
## 74 Pig Sus omni
## 75 Short-nosed echidna Tachyglossus insecti
## 76 Eastern american chipmunk Tamias herbi
## 77 Brazilian tapir Tapirus herbi
## 78 Tenrec Tenrec omni
## 79 Tree shrew Tupaia omni
## 80 Bottle-nosed dolphin Tursiops carni
## 81 Genet Genetta carni
## 82 Arctic fox Vulpes carni
## 83 Red fox Vulpes carni
## order conservation sleep_total sleep_rem
## 1 Carnivora lc 12.1 NA
## 2 Primates <NA> 17.0 1.8
## 3 Rodentia nt 14.4 2.4
## 4 Soricomorpha lc 14.9 2.3
## 5 Artiodactyla domesticated 4.0 0.7
## 6 Pilosa <NA> 14.4 2.2
## 7 Carnivora vu 8.7 1.4
## 8 Rodentia <NA> 7.0 NA
## 9 Carnivora domesticated 10.1 2.9
## 10 Artiodactyla lc 3.0 NA
## 11 Artiodactyla lc 5.3 0.6
## 12 Rodentia domesticated 9.4 0.8
## 13 Primates lc 10.0 0.7
## 14 Rodentia domesticated 12.5 1.5
## 15 Soricomorpha lc 10.3 2.2
## 16 Rodentia <NA> 8.3 2.0
## 17 Soricomorpha lc 9.1 1.4
## 18 Cingulata lc 17.4 3.1
## 19 Hyracoidea lc 5.3 0.5
## 20 Didelphimorphia lc 18.0 4.9
## 21 Proboscidea en 3.9 NA
## 22 Chiroptera lc 19.7 3.9
## 23 Perissodactyla domesticated 2.9 0.6
## 24 Perissodactyla domesticated 3.1 0.4
## 25 Erinaceomorpha lc 10.1 3.5
## 26 Primates lc 10.9 1.1
## 27 Rodentia <NA> 14.9 NA
## 28 Carnivora domesticated 12.5 3.2
## 29 Primates <NA> 9.8 1.1
## 30 Artiodactyla cd 1.9 0.4
## 31 Cetacea cd 2.7 0.1
## 32 Carnivora lc 6.2 1.5
## 33 Hyracoidea lc 6.3 0.6
## 34 Primates <NA> 8.0 1.9
## 35 Primates vu 9.5 0.9
## 36 Proboscidea vu 3.3 NA
## 37 Didelphimorphia lc 19.4 6.6
## 38 Primates <NA> 10.1 1.2
## 39 Rodentia lc 14.2 1.9
## 40 Rodentia en 14.3 3.1
## 41 Rodentia <NA> 12.8 NA
## 42 Rodentia nt 12.5 1.4
## 43 Chiroptera <NA> 19.9 2.0
## 44 Rodentia nt 14.6 NA
## 45 Primates <NA> 11.0 NA
## 46 Rodentia lc 7.7 0.9
## 47 Rodentia lc 14.5 NA
## 48 Lagomorpha domesticated 8.4 0.9
## 49 Artiodactyla domesticated 3.8 0.6
## 50 Primates <NA> 9.7 1.4
## 51 Carnivora en 15.8 NA
## 52 Carnivora nt 10.4 NA
## 53 Carnivora vu 13.5 NA
## 54 Primates <NA> 9.4 1.0
## 55 Erinaceomorpha lc 10.3 2.7
## 56 Primates lc 11.0 NA
## 57 Rodentia <NA> 11.5 NA
## 58 Diprotodontia <NA> 13.7 1.8
## 59 Carnivora vu 3.5 0.4
## 60 Cetacea vu 5.6 NA
## 61 Diprotodontia <NA> 11.1 1.5
## 62 Cingulata en 18.1 6.1
## 63 Hyracoidea lc 5.4 0.5
## 64 Rodentia lc 13.0 2.4
## 65 Rodentia <NA> 8.7 NA
## 66 Primates <NA> 9.6 1.4
## 67 Soricomorpha lc 8.4 2.1
## 68 Rodentia <NA> 11.3 1.1
## 69 Rodentia <NA> 10.6 2.4
## 70 Rodentia lc 16.6 NA
## 71 Rodentia lc 13.8 3.4
## 72 Rodentia lc 15.9 3.0
## 73 Soricomorpha <NA> 12.8 2.0
## 74 Artiodactyla domesticated 9.1 2.4
## 75 Monotremata <NA> 8.6 NA
## 76 Rodentia <NA> 15.8 NA
## 77 Perissodactyla vu 4.4 1.0
## 78 Afrosoricida <NA> 15.6 2.3
## 79 Scandentia <NA> 8.9 2.6
## 80 Cetacea <NA> 5.2 NA
## 81 Carnivora <NA> 6.3 1.3
## 82 Carnivora <NA> 12.5 NA
## 83 Carnivora <NA> 9.8 2.4
## sleep_cycle awake brainwt bodywt
## 1 NA 11.90 NA 50.000
## 2 NA 7.00 0.01550 0.480
## 3 NA 9.60 NA 1.350
## 4 0.1333333 9.10 0.00029 0.019
## 5 0.6666667 20.00 0.42300 600.000
## 6 0.7666667 9.60 NA 3.850
## 7 0.3833333 15.30 NA 20.490
## 8 NA 17.00 NA 0.045
## 9 0.3333333 13.90 0.07000 14.000
## 10 NA 21.00 0.09820 14.800
## 11 NA 18.70 0.11500 33.500
## 12 0.2166667 14.60 0.00550 0.728
## 13 NA 14.00 NA 4.750
## 14 0.1166667 11.50 0.00640 0.420
## 15 NA 13.70 0.00100 0.060
## 16 NA 15.70 0.00660 1.000
## 17 0.1500000 14.90 0.00014 0.005
## 18 0.3833333 6.60 0.01080 3.500
## 19 NA 18.70 0.01230 2.950
## 20 0.3333333 6.00 0.00630 1.700
## 21 NA 20.10 4.60300 2547.000
## 22 0.1166667 4.30 0.00030 0.023
## 23 1.0000000 21.10 0.65500 521.000
## 24 NA 20.90 0.41900 187.000
## 25 0.2833333 13.90 0.00350 0.770
## 26 NA 13.10 0.11500 10.000
## 27 NA 9.10 NA 0.071
## 28 0.4166667 11.50 0.02560 3.300
## 29 0.5500000 14.20 0.00500 0.200
## 30 NA 22.10 NA 899.995
## 31 NA 21.35 NA 800.000
## 32 NA 17.80 0.32500 85.000
## 33 NA 17.70 0.01227 2.625
## 34 1.5000000 16.00 1.32000 62.000
## 35 NA 14.50 NA 1.670
## 36 NA 20.70 5.71200 6654.000
## 37 NA 4.60 NA 0.370
## 38 0.7500000 13.90 0.17900 6.800
## 39 NA 9.80 NA 0.053
## 40 0.2000000 9.70 0.00100 0.120
## 41 NA 11.20 NA 0.035
## 42 0.1833333 11.50 0.00040 0.022
## 43 0.2000000 4.10 0.00025 0.010
## 44 NA 9.40 NA 0.266
## 45 NA 13.00 0.01250 1.400
## 46 NA 16.30 NA 0.210
## 47 NA 9.50 NA 0.028
## 48 0.4166667 15.60 0.01210 2.500
## 49 NA 20.20 0.17500 55.500
## 50 1.4166667 14.30 0.44000 52.200
## 51 NA 8.20 NA 162.564
## 52 NA 13.60 0.15700 100.000
## 53 NA 10.50 NA 161.499
## 54 0.6666667 14.60 0.18000 25.235
## 55 NA 13.70 0.00240 0.550
## 56 NA 13.00 NA 1.100
## 57 NA 12.50 NA 0.021
## 58 NA 10.30 0.01140 1.620
## 59 NA 20.50 NA 86.000
## 60 NA 18.45 NA 53.180
## 61 NA 12.90 NA 1.100
## 62 NA 5.90 0.08100 60.000
## 63 NA 18.60 0.02100 3.600
## 64 0.1833333 11.00 0.00190 0.320
## 65 NA 15.30 NA 0.044
## 66 NA 14.40 0.02000 0.743
## 67 0.1666667 15.60 0.00120 0.075
## 68 0.1500000 12.70 0.00118 0.148
## 69 NA 13.40 0.00300 0.122
## 70 NA 7.40 0.00570 0.920
## 71 0.2166667 10.20 0.00400 0.101
## 72 NA 8.10 NA 0.205
## 73 0.1833333 11.20 0.00033 0.048
## 74 0.5000000 14.90 0.18000 86.250
## 75 NA 15.40 0.02500 4.500
## 76 NA 8.20 NA 0.112
## 77 0.9000000 19.60 0.16900 207.501
## 78 NA 8.40 0.00260 0.900
## 79 0.2333333 15.10 0.00250 0.104
## 80 NA 18.80 NA 173.330
## 81 NA 17.70 0.01750 2.000
## 82 NA 11.50 0.04450 3.380
## 83 0.3500000 14.20 0.05040 4.230
Okay, wow. When we print the data frame it’s pretty overwhelming. Printing the data frame shows us all of our rows and columns. And because our columns don’t all fit on one row, they have to be carried over and added as extra rows, making the printed output even longer. This is a very messy and confusing way to view our data.
Let’s turn the data back into a tibble using the as_tibble() function, and let’s see what that looks like.
# Turn data into a tibble
msleep <- as_tibble(msleep)
# View data
msleep
## # A tibble: 83 × 11
## name genus vore order conservation sleep_total
## <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 Cheetah Acinon… carni Carni… lc 12.1
## 2 Owl monkey Aotus omni Prima… <NA> 17
## 3 Mountain b… Aplodo… herbi Roden… nt 14.4
## 4 Greater sh… Blarina omni Soric… lc 14.9
## 5 Cow Bos herbi Artio… domesticated 4
## 6 Three-toed… Bradyp… herbi Pilosa <NA> 14.4
## 7 Northern f… Callor… carni Carni… vu 8.7
## 8 Vesper mou… Calomys <NA> Roden… <NA> 7
## 9 Dog Canis carni Carni… domesticated 10.1
## 10 Roe deer Capreo… herbi Artio… lc 3
## # … with 73 more rows, and 5 more variables:
## # sleep_rem <dbl>, sleep_cycle <dbl>, awake <dbl>,
## # brainwt <dbl>, bodywt <dbl>
The printed tibble is much neater than the printed data frame! Although there are ways to print data frames more neatly, tibbles are automatically formatted so that the columns are abbreviated to fit on one row (or are not printed), and you only see the first ten rows of data instead of every single row. This makes it way more convenient to view your data sets.
Tibbles also reduce errors when subsetting your data. For example, when subsetting with single square brackets [ ], tibbles always return another tibble. In contrast, subsetting data frames will sometimes return a vector instead of another data frame.
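A small sketch of that difference, assuming the tibble package is installed. The two-row table is made up for illustration.

```r
library(tibble)

df <- data.frame(species = c("a", "b"), abundance = c(2, 0))
tb <- as_tibble(df)

# Selecting one column with [ , ] drops a data frame down to a vector
vec_out <- df[ , "abundance"]
class(vec_out)   # "numeric"

# The same call on a tibble returns a one-column tibble
tbl_out <- tb[ , "abundance"]
class(tbl_out)   # "tbl_df" "tbl" "data.frame"

# drop = FALSE makes a data frame behave consistently too
df_out <- df[ , "abundance", drop = FALSE]
class(df_out)    # "data.frame"

# When you do want the vector from a tibble, use [[ ]] or $
tb[["abundance"]]
```

This matters most inside functions and loops, where code that expects a table can break when a single-column selection unexpectedly turns into a vector.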
And if you try to subset a tibble using a column that does not exist, you’ll receive a warning that the column does not exist. In contrast, subsetting a data frame using a column that doesn’t exist will only return NULL, and you don’t receive an explanation of why.
# See if msleep (the tibble) has a column called "abc"
msleep$abc
## Warning: Unknown or uninitialised column: `abc`.
## NULL
# Turn msleep into a data frame
msleep <- as.data.frame(msleep)
# See if msleep (the data frame) has a column called "abc"
msleep$abc
## NULL
One other advantage to tibbles is that they allow your column names to have spaces. Normally you wouldn’t go out of your way to add spaces to your column names, since it’s much better practice to use underscores “_” in place of spaces to begin with. However, sometimes the data you import into R will contain spaces in the column names. While read.csv() and data.frame() replace spaces with periods “.” by default, tibbles keep the original column names, which you then refer to in code by surrounding them with backticks (the backtick, also known as the grave accent, is the apostrophe-like character that usually shares a key with the tilde ‘~’, above the Tab key). When importing data into R, you can read it in directly as a tibble and ensure all column names are kept as they were in the original CSV by using read_csv() (note the underscore between ‘read’ and ‘csv’, as opposed to the function read.csv(), which reads in your data as a data frame).
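The column-name behavior can be checked without reading a file. The sketch below builds the tables directly, assuming the tibble package is installed; the column name "body weight" is just a placeholder.

```r
library(tibble)

# data.frame() repairs the name by default: the space becomes a period
df_default <- data.frame(`body weight` = c(1.2, 3.4))
names(df_default)   # "body.weight"

# check.names = FALSE tells data.frame() to leave the name alone
df_keep <- data.frame(`body weight` = c(1.2, 3.4), check.names = FALSE)
names(df_keep)      # "body weight"

# tibble() keeps the name as written
tb <- tibble(`body weight` = c(1.2, 3.4))
names(tb)           # "body weight"

# Backticks are needed whenever you refer to such a column in code
tb$`body weight`
```

read.csv() has the same check.names argument, so base R can keep the original names too; tibbles simply make that the default.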
In short, tibbles make a number of changes to normal data frames that can help reduce errors in your data analysis. These improvements in printing and subsetting are small, but useful!
And that’s it for our blog post on data structures in R! I hope this post taught you a few useful tips and tricks for working with your data. Happy coding!