You should be using dplyr and the pipe!

Dplyr (https://github.com/hadley/dplyr) is a great R package for manipulating, exploring, and summarizing data frames in R. As far as I know it doesn’t do anything that can’t be done in base R, but it makes all the common data analysis tasks easier, more readable, and faster. If you work with data frames in R, I highly recommend learning dplyr and how to use it with the ‘pipe’ operator. To get started, check out the vignette, or the great course on DataCamp (https://www.datacamp.com/courses/dplyr-data-manipulation-r-tutorial).

Here’s an example using data from the nycflights13 package, which contains data on flights leaving New York City’s three largest airports in 2013. Suppose you want to answer the question “Which airlines had the largest average delays in June?”. Without using the pipe operator, your code might look something like this:

library(nycflights13)
library(dplyr)
flights1 <- filter(flights, month==6)   # keep only rows for June
flights2 <- group_by(flights1, carrier) # group by carrier
flights3 <- summarise(flights2, avg_delay=mean(dep_delay+arr_delay,na.rm=TRUE)) 
flights4 <- arrange(flights3, desc(avg_delay))

With the pipe, you can reduce typing and avoid having to make all the intermediate variables. You could also nest all the dplyr function calls inside parentheses, but as you can imagine this becomes very hard to read. Using the pipe along with dplyr makes the code much easier to read and understand:

library(nycflights13)
library(dplyr)
flights %>% 
        filter(month==6) %>%   # keep only rows for June
        group_by(carrier) %>%  # group by carrier
        summarise(avg_delay=mean(dep_delay+arr_delay,na.rm=TRUE)) %>% 
        arrange( desc(avg_delay))
Written on November 14, 2016