Often, we will want to take data that we have cleaned, mutated, summarized, filtered, or selected, and output it for use in another program. Think about how you might want to process a million-row data set to get some summary statistics, then create a nice table in Excel. Or take some data that you need to make a chart or graphic, and export it so that you can read it into DataWrapper or some other visualization tool. Maybe you need to send your boss a list of items that are buried in a big R dataset.
Writing data lets you take data out of R and use it in other places. But first we might want to use some other functions to get it looking nice and orderly.
Arrange
arrange() takes a data frame and sorts its rows by one or more columns. Like many of our basic functions, it takes any number of inputs (the ... argument) to sort on. Let’s take a look at an example using something we summarized.
Let’s grab our code to read in the clean dataframe again. This time I’m just going to use a big pipe to go right to the summary.
library(tidyverse)
library(janitor)

fhv_summary <- read_csv(file = "For_Hire_Vehicles__FHV__-_Active.csv") %>%
  clean_names() %>%
  rename(hybrid = veh) %>%
  mutate(ride_type = case_when(
    base_name == "UBER USA, LLC" & base_type == "BLACK-CAR" ~ "BLACK CAR RIDESHARE",
    base_name != "UBER USA, LLC" & base_type == "BLACK-CAR" ~ "BLACK CAR NON-RIDESHARE",
    TRUE ~ base_type # if it doesn't meet either condition, return the base_type
  )) %>%
  group_by(ride_type) %>% # group by the variable we just created!
  summarize(
    no_cars = n(),
    average_year = mean(vehicle_year, na.rm = TRUE)
  )

fhv_summary
# A tibble: 4 × 3
  ride_type               no_cars average_year
  <chr>                     <int>        <dbl>
1 BLACK CAR NON-RIDESHARE   16225        2018.
2 BLACK CAR RIDESHARE       76710        2018.
3 LIVERY                     3652        2015.
4 LUXURY                     1731        2020.
Now let’s say we wanted to sort this table by average vehicle year, from oldest to newest.
fhv_summary %>% arrange(average_year)
# A tibble: 4 × 3
  ride_type               no_cars average_year
  <chr>                     <int>        <dbl>
1 LIVERY                     3652        2015.
2 BLACK CAR NON-RIDESHARE   16225        2018.
3 BLACK CAR RIDESHARE       76710        2018.
4 LUXURY                     1731        2020.
That puts the oldest cars on top and the newest cars on the bottom.
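Because arrange() accepts several columns, later columns can act as tie-breakers when earlier ones are equal. The sketch below illustrates this on a small made-up table; toy_summary and its values are invented for the example and are not part of the FHV data.

```r
library(dplyr)

# Toy data, invented for illustration: rows A and B tie on average_year
toy_summary <- tibble(
  ride_type    = c("A", "B", "C", "D"),
  no_cars      = c(200, 50, 10, 75),
  average_year = c(2018, 2018, 2015, 2020)
)

# Sort by average_year first; where years tie, sort by no_cars
toy_sorted <- toy_summary %>%
  arrange(average_year, no_cars)

toy_sorted
```

Here C comes first (oldest year), then B and A (same year, ordered by number of cars), then D.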
desc() is a function that transforms a vector so that it sorts in descending order. It is most helpful nested inside arrange(), for example to put the newest cars on top instead of the oldest.
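As a minimal sketch of nesting desc() inside arrange(), the example below rebuilds the summary table by hand so it runs without the CSV file. The name fhv_toy is made up for this example, and the average_year values are the rounded figures from the printed output above, not the exact means.

```r
library(dplyr)

# Hand-typed copy of the summary output (average_year rounded as printed)
fhv_toy <- tibble(
  ride_type    = c("BLACK CAR NON-RIDESHARE", "BLACK CAR RIDESHARE", "LIVERY", "LUXURY"),
  no_cars      = c(16225, 76710, 3652, 1731),
  average_year = c(2018, 2018, 2015, 2020)
)

# Newest average vehicle year on top
newest_first <- fhv_toy %>%
  arrange(desc(average_year))

# Largest fleet on top
most_cars_first <- fhv_toy %>%
  arrange(desc(no_cars))

newest_first
most_cars_first
```

With the real fhv_summary object, the same calls would be fhv_summary %>% arrange(desc(average_year)) and fhv_summary %>% arrange(desc(no_cars)).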
Since we used a local (relative) path, the file shows up right in our project directory. We will mostly be writing out .csv files, but there are companion functions to write out other types of data, like Excel spreadsheets.
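The write step described here might look something like the sketch below, which uses write_csv() from readr (loaded as part of the tidyverse). The file name fhv_summary_sorted.csv and the hand-typed fhv_toy table are assumptions made for this example; with the real data you would pipe fhv_summary instead.

```r
library(dplyr)
library(readr)

# Hand-typed stand-in for fhv_summary (average_year rounded as printed)
fhv_toy <- tibble(
  ride_type    = c("BLACK CAR NON-RIDESHARE", "BLACK CAR RIDESHARE", "LIVERY", "LUXURY"),
  no_cars      = c(16225, 76710, 3652, 1731),
  average_year = c(2018, 2018, 2015, 2020)
)

# Hypothetical file name: a bare file name is a relative path,
# so the file is written to the current working directory
out_file <- "fhv_summary_sorted.csv"

# Sort first, then write the sorted table out
fhv_toy %>%
  arrange(desc(no_cars)) %>%
  write_csv(out_file)

# Read the file back in to confirm what was written
check <- read_csv(out_file)
check
```

For Excel output, one option is write_xlsx() from the writexl package, which would need to be installed separately.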