The Spatial features package allows us to read in spatial data into R and transform it. Let’s get a dataset on [affordable housing production](https://data.cityofnewyork.us/resource/hg8x-zxpr) from the NYC open data portal.
But that doesn’t look very good, let’s add some other layers. I can read in (CDs)[https://data.cityofnewyork.us/resource/jp9i-3b7y] directly as a shapefile using the geojson version from the Open data portal. This map is a long way from done, but now it has a semblance of a basemap.
cd_sf <-read_sf("https://data.cityofnewyork.us/resource/jp9i-3b7y.geojson") %>%st_set_crs(st_crs(aff_hsg_sf)) # here I am setting the crs of the new data to the crs of the point data I already have
Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
that
ggplot()+geom_sf(cd_sf,mapping =aes())+#this layer will be on bottom!geom_sf(aff_hsg_sf,mapping =aes(size = all_counted_units,color = all_counted_units),alpha =0.25) #this layer will be on top!
#alpha changes the opacity of the dots
I can also summarize data like I would in a tabular dataset
And then I could create a choropleth with the summarized data, now that I have the nta shapes (just a simple join!)
cd_aff_sum <- cd_sf %>%mutate(community_board =case_when(str_sub(boro_cd, 1, 1) =="1"~paste0("MN-",str_sub(boro_cd, 2,3)),str_sub(boro_cd, 1, 1) =="2"~paste0("BX-",str_sub(boro_cd, 2,3)),str_sub(boro_cd, 1, 1) =="3"~paste0("BK-",str_sub(boro_cd, 2,3)),str_sub(boro_cd, 1, 1) =="4"~paste0("QN-",str_sub(boro_cd, 2,3)),str_sub(boro_cd, 1, 1) =="5"~paste0("SI-",str_sub(boro_cd, 2,3)), )) %>%#take the first character of boro_cd, replace it with the boro abbreviation, add the community board numberleft_join(aff_hsg_sum, by ="community_board")
Warning in abbreviate_shapefile_names(obj): Field names abbreviated for ESRI
Shapefile driver
Deleting layer `cd_aff_hsg' using driver `ESRI Shapefile'
Writing layer `cd_aff_hsg' to data source
`output/cd_aff_hsg.shp' using driver `ESRI Shapefile'
Writing 71 features with 10 fields and geometry type Multi Polygon.
I can map affordable units per person (normalized!)
Warning: Removed 12 rows containing missing values or values outside the scale range
(`geom_sf()`).
I could write out this ggplot to edit in vector graphics
ggsave("output/aff_hsg_inc_nyc.svg")
Saving 7 x 5 in image
Warning: Removed 12 rows containing missing values or values outside the scale range
(`geom_sf()`).
Spatial Joins
What if I didn’t have a crosswalk and needed to match points to the community district? I can do a spatial join to find what cd all the points fall into.
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
I can then use summarize() to count points in polygons - or do more advanced summaries
points_polygons_join %>%as.data.frame() %>%#summarize works faster without the spatial features attachedgroup_by(boro_cd) %>%summarize(points_in_polygon =n(),units_per_cd =sum(all_counted_units, na.rm = T))
What if I wanted to find the proportion of affordable housing in the flood zone? I can do this with an intersects and mutate - I don’t even need to join the two datasets!
# floodplain_2020s_100y <- read_sf("https://data.cityofnewyork.us/resource/inra-wqx3.geojson?$limit=10000") %>% # st_make_valid() %>% #this magic function repairs any invalid geometries# st_union() %>% #here I make the entire floodplain into one big shape# st_set_crs(st_crs(points_polygons_join)) #and set it to the same crs as my points# # points_polygons_join %>% # mutate(fplain_2020s_100y = lengths(st_intersects(.,floodplain_2020s_100y))) %>% #st_intersects returns a list of the shapes it intersects with - if its 0 it didn't intersect, if its 1 it did!# as.data.frame() %>% # group_by(fplain_2020s_100y) %>% # summarize(count = n(),# units = sum(all_counted_units, na.rm = T))
Cheat Sheet
Many more spatial operations available on the cheat sheets