9. Arrange and Write Data

Author

Patrick Spauster

How do we get data out of R?

Often, we will want to take data that we have cleaned, mutated, summarized, filtered, or selected and output it for use in other software. Think about how you might process a million-row dataset to get some summary statistics, then create a nice table in Excel. Or you might take the data behind a chart or graphic and export it so that you can read it into DataWrapper or some other visualization tool. Maybe you need to send your boss a list of items that are buried in a big R dataset.

Writing data lets you take data out of R and use it elsewhere. But first, we might want to use some other functions to get it looking nice and orderly.

Arrange

arrange() takes data and sorts it based on certain criteria. Like many of our basic functions, it takes a list ... of inputs to sort on. Let’s take a look at an example of something we summarized.

Let’s grab our code to read in the clean dataframe again. This time I’m just going to use a big pipe to go right to the summary.

library(tidyverse)
library(janitor)

fhv_summary <- read_csv(file = "For_Hire_Vehicles__FHV__-_Active.csv") %>% 
  clean_names() %>% 
  rename(hybrid = veh) %>% 
  mutate(
    ride_type = case_when(
      base_name == "UBER USA, LLC" & base_type == "BLACK-CAR" ~ "BLACK CAR RIDESHARE",
      base_name != "UBER USA, LLC" & base_type == "BLACK-CAR" ~ "BLACK CAR NON-RIDESHARE",
      TRUE ~ base_type #if it doesn't meet either condition, return the base_type
    )) %>% 
  group_by(ride_type) %>% #group by the variable we just created!
  summarize(no_cars = n(),
            average_year = mean(vehicle_year, na.rm = TRUE))

fhv_summary
# A tibble: 4 × 3
  ride_type               no_cars average_year
  <chr>                     <int>        <dbl>
1 BLACK CAR NON-RIDESHARE   16225        2018.
2 BLACK CAR RIDESHARE       76710        2018.
3 LIVERY                     3652        2015.
4 LUXURY                     1731        2020.

Now let’s say we wanted to sort this table by average vehicle year, from oldest to newest.

fhv_summary %>% 
  arrange(average_year)
# A tibble: 4 × 3
  ride_type               no_cars average_year
  <chr>                     <int>        <dbl>
1 LIVERY                     3652        2015.
2 BLACK CAR NON-RIDESHARE   16225        2018.
3 BLACK CAR RIDESHARE       76710        2018.
4 LUXURY                     1731        2020.

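By default, arrange() sorts in ascending order, so this puts the ride type with the oldest cars on top and the one with the newest cars on the bottom.

desc() is a helper function that marks a vector to be sorted in descending order. It is most useful nested inside arrange().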

fhv_arranged <- fhv_summary %>% 
  arrange(desc(average_year))

fhv_arranged
# A tibble: 4 × 3
  ride_type               no_cars average_year
  <chr>                     <int>        <dbl>
1 LUXURY                     1731        2020.
2 BLACK CAR RIDESHARE       76710        2018.
3 BLACK CAR NON-RIDESHARE   16225        2018.
4 LIVERY                     3652        2015.

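arrange() also works with multiple variables: each additional variable breaks ties in the ones listed before it. On grouped data, arrange() ignores the groups created by group_by() unless you set .by_group = TRUE, which sorts by the grouping variables first.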
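Here is a minimal sketch of both behaviors. The toy_cars table and its seats column are made up for illustration and are not part of the FHV data.

```r
library(dplyr)

# Hypothetical toy data, not taken from the FHV file
toy_cars <- tibble(
  ride_type    = c("LIVERY", "LUXURY", "LIVERY", "LUXURY"),
  vehicle_year = c(2015, 2020, 2015, 2018),
  seats        = c(4, 6, 7, 4)
)

# Sort by year, newest first; ties on year are broken by seats, fewest first
by_year_then_seats <- toy_cars %>%
  arrange(desc(vehicle_year), seats)

# arrange() ignores groups unless .by_group = TRUE,
# which sorts by the grouping variable (ride_type) before vehicle_year
within_groups <- toy_cars %>%
  group_by(ride_type) %>%
  arrange(desc(vehicle_year), .by_group = TRUE)

by_year_then_seats
within_groups
```

In the first result, the two 2015 rows are ordered by seats. In the second, all the LIVERY rows come before the LUXURY rows, and the years are sorted within each ride type.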

Write out data

Now that we have a nice table arranged the way we want, we can output it for use in other software.

write_csv() is a twin function to read_csv(). It takes the name of an object and then a filepath to write to.

fhv_arranged %>% 
  write_csv(file = "output/ride_type_by_average_year.csv")

Since we used a relative path, the file shows up in the output folder inside our project directory. We will mostly write out .csv files, but there are companion functions for other types of data, like Excel spreadsheets.
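One thing to watch for: write_csv() does not create missing folders, so the output folder has to exist before you write to it. Below is a rough sketch of a safer version of the step above, plus one companion function. The stand-in table copies the rounded values printed earlier, so it is not the real summary. The .tsv file name is made up for illustration, and the Excel step assumes the separate writexl package, which is not part of the tidyverse.

```r
library(dplyr)
library(readr)

# Stand-in for fhv_arranged, using the rounded values printed above (assumption)
fhv_arranged <- tibble(
  ride_type    = c("LUXURY", "BLACK CAR RIDESHARE", "BLACK CAR NON-RIDESHARE", "LIVERY"),
  no_cars      = c(1731L, 76710L, 16225L, 3652L),
  average_year = c(2020, 2018, 2018, 2015)
)

# Create the output folder if it is not there yet (no warning if it already exists)
dir.create("output", showWarnings = FALSE)

csv_path <- "output/ride_type_by_average_year.csv"
write_csv(fhv_arranged, file = csv_path)

# Read the file back to confirm the rows and their order survived the trip
check <- read_csv(csv_path, show_col_types = FALSE)

# A companion function in readr: tab-separated output (hypothetical file name)
write_tsv(fhv_arranged, file = "output/ride_type_by_average_year.tsv")

# Excel output needs another package; this only runs if writexl is installed
if (requireNamespace("writexl", quietly = TRUE)) {
  writexl::write_xlsx(fhv_arranged, path = "output/ride_type_by_average_year.xlsx")
}
```

Reading the file back is optional, but it is a quick way to catch a wrong path or an unexpected column type before sending the file to someone else.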