library(tidyverse)
library(janitor)
fhv_summary <- read_csv(file = "For_Hire_Vehicles__FHV__-_Active.csv") %>%
  clean_names() %>%
  rename(hybrid = veh) %>%
  mutate(
    ride_type = case_when(
      base_name == "UBER USA, LLC" & base_type == "BLACK-CAR" ~ "BLACK CAR RIDESHARE",
      base_name != "UBER USA, LLC" & base_type == "BLACK-CAR" ~ "BLACK CAR NON-RIDESHARE",
      TRUE ~ base_type #if it doesn't meet either condition, return the base_type
    )) %>%
  group_by(ride_type) %>% #group by the variable we just created!
  summarize(no_cars = n(),
            average_year = mean(vehicle_year, na.rm = TRUE))
fhv_summary
9. Arrange and Write Data
How do we get data out of R?
Often, we will want to take data that we have cleaned, mutated, summarized, filtered, or selected, and output it for use in another program. Think about how you might want to process a million-row data set to get some summary statistics, then create a nice table in Excel. Or take some data that you need to make a chart or graphic, and export it so that you can read it into DataWrapper or some other visualization tool. Maybe you need to send your boss a list of items that are buried in a big R dataset.
Writing data lets you take data out of R and use it elsewhere. But first we might want to use some other functions to get it looking nice and orderly.
Arrange
arrange() takes data and sorts it based on certain criteria. Like many of our basic functions, it takes a list ... of inputs to sort on. Let’s take a look at an example of something we summarized.
Let’s grab our code to read in the clean dataframe again. This time I’m just going to use a big pipe to go right to the summary.
Now let’s say we wanted to sort this list by average vehicle year, from oldest to newest.
fhv_summary %>%
  arrange(average_year)
That puts the oldest cars on top and the newest cars on the bottom, since arrange() sorts in ascending order by default.
desc() is a function that flips the sort order of a vector to descending. It is most helpful nested inside arrange().
fhv_arranged <- fhv_summary %>%
  arrange(desc(average_year))
fhv_arranged
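arrange() also works with multiple variables, where the variable listed second breaks ties in the first. It can also sort within groups created by group_by(), but only if you set .by_group = TRUE. Otherwise arrange() ignores the grouping.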
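Here is a small sketch of both ideas. The cars table and its values are made up for illustration (it is not the FHV data), and the base column is a hypothetical label just so we can see where each row ends up.

```r
library(dplyr)

# Hypothetical toy data standing in for a summary table (not the FHV dataset)
cars <- tibble(
  ride_type    = c("LIVERY", "BLACK-CAR", "LIVERY", "BLACK-CAR"),
  base         = c("A", "B", "C", "D"),
  average_year = c(2015, 2018, 2015, 2012),
  no_cars      = c(10, 40, 25, 5)
)

# Sort by average_year first; rows that tie on year are then
# ordered by no_cars, largest first
by_year_then_count <- cars %>%
  arrange(average_year, desc(no_cars))

# Sort within each ride_type: .by_group = TRUE tells arrange()
# to sort by the grouping variable before average_year
within_groups <- cars %>%
  group_by(ride_type) %>%
  arrange(average_year, .by_group = TRUE) %>%
  ungroup()

by_year_then_count
within_groups
```

In the first result the two 2015 rows are ordered by no_cars. In the second, all the BLACK-CAR rows come before the LIVERY rows, and each group is sorted by year on its own.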
Write out data
Now that we have a nice table arranged the way we want, we can output it for use in another program.
write_csv() is a twin function to read_csv(). It takes the name of an object and then a filepath to write to.
fhv_arranged %>%
  write_csv(file = "output/ride_type_by_average_year.csv")
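Since we used a relative path, the file shows up right in the output folder of our project directory. That folder has to exist already, because write_csv() will not create it for you. We will mostly be writing out .csv files, but there are companion functions to write out other types of data, like Excel spreadsheets.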
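As a rough sketch of what those companion functions look like, here is a made-up table written out a few different ways. The ride_types data and the file names are hypothetical, and everything goes into a temporary folder so it does not touch your project. write_tsv() comes from readr. For Excel you need an extra package, and writexl is one option, assumed here to be installed (the code skips that step if it is not).

```r
library(readr)
library(tibble)

# Hypothetical stand-in for a small arranged summary table
ride_types <- tibble(
  ride_type    = c("LIVERY", "BLACK-CAR"),
  no_cars      = c(35, 45),
  average_year = c(2015, 2016.5)
)

# Write into a temporary "output" folder; the folder must exist first
out_dir <- file.path(tempdir(), "output")
dir.create(out_dir, showWarnings = FALSE)

# Comma-separated file, same call shape as in the tutorial
csv_path <- file.path(out_dir, "ride_types.csv")
write_csv(ride_types, file = csv_path)

# Tab-separated file, using readr's write_tsv()
tsv_path <- file.path(out_dir, "ride_types.tsv")
write_tsv(ride_types, file = tsv_path)

# Excel spreadsheet, only if the writexl package is available (an assumption)
if (requireNamespace("writexl", quietly = TRUE)) {
  writexl::write_xlsx(ride_types, path = file.path(out_dir, "ride_types.xlsx"))
}

# Read the csv back in to check that it survived the round trip
round_trip <- read_csv(csv_path, show_col_types = FALSE)
round_trip
```

Reading the file back in with read_csv() is a quick way to confirm that what you wrote out is what you expected.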