R-lecture-MAP01 | Bay Area Demographic

Dataframe used on this page: San Francisco 2021.RData

Displaying univariate information with maps

Maps are an excellent way to display information. They are one of the few tools that work well both for communicating with general audiences and for giving experienced social analysts new insights.

For example, above is a map of the five-county San Francisco Bay Area with tracts color coded by median household income. A map like this, with sub-areas (tracts) colored according to some variable (median household income), is called a choropleth map. Areas with higher incomes are redder; those with lower incomes are more yellow. It's easy for anyone to look for dark red or pale yellow areas and ask why they are particularly rich or poor. For someone who knows the area, the map is rich because it layers onto information they already have in their heads, like the differences between the hills and the flats in the East Bay. A knowledgeable analyst can use maps to discern how things like income vary with race, migration recency, neighborhood turnover, etc., and then explore those connections more rigorously.

One major downside of choropleth maps that use Census tracts is that they tend to mislead people about the distribution of things like income. The big tracts on the perimeter of the map lead us to think that median incomes around $70,000 or so are most common. But remember, Census tracts are drawn to hold roughly similar numbers of people, so those tiny tracts in the middle of San Francisco contain about the same number of people as the massive ones. There are actually quite a few tiny tracts that are deep red, but this wealth doesn't pop out because it is located in densely populated areas where tracts are small.
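To make this concrete, here is a toy calculation with made-up numbers (these six "tracts" are hypothetical, not values taken from the map). It compares the share of tracts that are high income with the share of map area those tracts cover.

```r
# Hypothetical tracts: areas and incomes are invented for illustration only
tracts <- data.frame(
  area_sq_km    = c(400, 350, 2, 1.5, 1, 0.5),
  median_income = c(70000, 72000, 150000, 160000, 155000, 40000)
)

# Flag tracts with a median household income above $100,000
tracts$high_income <- tracts$median_income > 100000

# Share of tracts that are high income (what a count of tracts would show)
share_of_tracts <- mean(tracts$high_income)

# Share of map area covered by high-income tracts (what the eye sees)
share_of_area <- sum(tracts$area_sq_km[tracts$high_income]) /
  sum(tracts$area_sq_km)

print(share_of_tracts)
print(share_of_area)
```

In this invented example half of the tracts are high income, yet they cover well under 1 percent of the area, which is the kind of distortion a tract-level choropleth can produce.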

Making this map

Making a map in R using data from the Census Bureau is easy largely thanks to people who have written R packages automating a lot of tedious work. In order to get the scripts below to work, you will need to install these packages:

install.packages("tidycensus")

install.packages("tidyverse")

install.packages("tigris")

install.packages("ggplot2")

install.packages("RColorBrewer")

Once you have these packages installed, you can use this script, or copy and paste it from the text below, to produce a simple choropleth map of median household income by tract, followed by a second map showing the percent of tract residents who identify as Latinx.

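##necessary packages
library(tidycensus)
options(tigris_use_cache = TRUE)  # cache downloaded tract shapefiles
library(tidyverse)                # includes ggplot2
library(tigris)
library(ggplot2)
library(RColorBrewer)

##load the data for this page; this assumes "San Francisco 2021.RData" is in
##your working directory and contains the med_inc and ethnoracial dataframes
load("San Francisco 2021.RData")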

##map of median household income in tract
ggplot(data = med_inc, aes(fill = median_income)) +
  geom_sf() +
  scale_fill_distiller(palette = "YlOrRd",
                       direction = 1,
                       name = "Dollars") +
  labs(title = "Median household income, 2021",
       caption = "Data source: 2021 5-year ACS, US Census Bureau") +
  theme_void()
ggsave("map_median_income_2021.png", dpi = 800)

##map of percent of tract residents who identify as Latinx
ggplot(data = ethnoracial, aes(fill = pct_latinx)) +
  geom_sf() +
  scale_fill_distiller(palette = "PuBuGn",
                       direction = 1,
                       name = "Pct.") +
  labs(title = "Percent of tract residents Latinx, 2021",
       caption = "Data source: 2021 5-year ACS, US Census Bureau") +
  theme_void()
ggsave("map_pct_latinx_2021.png", dpi = 800)

The first few lines above load the necessary packages. The command ggplot is a general plotting command from the ggplot2 package, which is included as part of the tidyverse (ggplot2 was created by Hadley Wickham, who also leads the tidyverse project and is one of the great contributors to the collective project we call R).
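The script takes the med_inc dataframe as given. If you wanted to build a similar table yourself, the tidycensus package can download ACS estimates together with tract shapes. The sketch below is only one way this might be done, not necessarily how the data for this page were prepared: the function name fetch_bay_area_income, the list of five counties, and the column rename are all assumptions made for illustration. Running it requires an internet connection and a Census API key.

```r
# A sketch only: the function name, county list, and rename are assumptions.
fetch_bay_area_income <- function(
    counties = c("Alameda", "Contra Costa", "Marin",
                 "San Francisco", "San Mateo")) {
  # get_acs() needs a Census API key (see tidycensus::census_api_key())
  tracts <- tidycensus::get_acs(
    geography = "tract",
    variables = c(median_income = "B19013_001"),  # ACS median household income
    state     = "CA",
    county    = counties,
    year      = 2021,
    survey    = "acs5",
    geometry  = TRUE   # attach tract shapes so geom_sf() can draw them
  )
  # The "estimate" column holds the value; rename it to match the map script
  names(tracts)[names(tracts) == "estimate"] <- "median_income"
  tracts
}

# med_inc <- fetch_bay_area_income()   # uncomment to download
```

Setting geometry = TRUE is the design choice that matters here: it returns a spatial (sf) dataframe, which is the kind of object geom_sf() expects.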

The first part of ggplot just specifies the dataframe to be used ("med_inc") and the variable ("median_income") that will be used to fill in each Census tract with some color. The + at the end of the ggplot line tells R that there are subcommands coming. The geom_sf() part of the code is a subcommand that tells R we want to draw map shapes (had we used geom_bar, it would mean we want to draw bars in a bar chart; if we had used geom_histogram, it would mean we want to draw bars in a histogram; etc.).
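To see how swapping the geom changes the picture while the rest of the pattern stays the same, here is a small sketch that draws a histogram instead of a map. The toy dataframe and its income values are invented for illustration; they are not the tract data used on this page.

```r
library(ggplot2)

# Hypothetical tract incomes, invented for illustration
toy <- data.frame(
  median_income = c(42000, 58000, 61000, 75000, 90000, 120000, 155000)
)

# Same pattern as the map: data + aesthetic mapping, then "+" and a geom
income_hist <- ggplot(data = toy, aes(x = median_income)) +
  geom_histogram(bins = 5) +
  labs(title = "Toy histogram of median household income")

print(income_hist)   # printing the object draws the plot
```

The only structural differences from the map are the aesthetic (x instead of fill) and the geom (geom_histogram instead of geom_sf).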

The scale_fill_distiller subcommand gives R information about how we want our fill colors to act. The bit that says palette = "YlOrRd" specifies that we want to use the yellow-orange-red set of colors predefined by ColorBrewer. ggplot2 reaches these palettes through its own dependencies, so loading RColorBrewer is not strictly required for this script, but loading it lets you inspect the palettes yourself. The direction = 1 argument tells R to use the yellow end of the spectrum for low median incomes and the red end for high median incomes. If we wanted the reverse order, we would type direction = -1, which is also what scale_fill_distiller does if we leave direction out. Last, the name = "Dollars" argument just puts that label on our legend.
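If you want to look at the colors behind a palette name, RColorBrewer can list them directly. This short sketch pulls the nine YlOrRd colors and reverses them, which mirrors what switching direction does to the ordering; the variable names are just illustrative.

```r
library(RColorBrewer)

# The nine predefined yellow-orange-red colors, lightest (yellow) first
ylorrd <- brewer.pal(n = 9, name = "YlOrRd")

# direction = 1 keeps this order (yellow = low values);
# direction = -1 corresponds to reversing it (red = low values)
ylorrd_reversed <- rev(ylorrd)

print(ylorrd)
print(ylorrd_reversed)
```

Running display.brewer.all() draws every available palette with its name, which is handy when choosing a value for palette.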

The labs subcommand puts a title on our map and a caption citing the source of our data. Even when you are just churning out maps to look at quickly for analysis, you should label them, since a map is virtually meaningless unless you know what you are looking at.

The final subcommand, theme_void(), gets rid of X and Y axes and gridlines that would be there otherwise. You can see what I'm talking about by changing this to theme_classic() or theme_test() or any of the other predefined themes.
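A quick way to compare themes is to build one plot and add different themes to it. The sketch below uses a tiny made-up scatterplot rather than the map so it runs without any Census data; the object names are illustrative.

```r
library(ggplot2)

# Tiny invented dataset, just to have something to draw
toy <- data.frame(x = c(1, 2, 3), y = c(2, 4, 3))

# Build the plot once, without choosing a theme
base_plot <- ggplot(data = toy, aes(x = x, y = y)) +
  geom_point()

# Add different themes to the same plot
plot_void    <- base_plot + theme_void()     # no axes, no gridlines
plot_classic <- base_plot + theme_classic()  # axis lines, no gridlines

print(plot_void)
print(plot_classic)
```

Because the theme is just one more piece added with +, you can try several on the same map without retyping the rest of the code.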