Saving scripts in R
Generate lecture code examples
Maps in ggplot2
Exercises (maps & data manipulation)
September, 2015
Use the knitr package
# install.packages("knitr")
library("knitr")
purl("https://raw.githubusercontent.com/sebastianbarfort/sds/gh-pages/_slides/lecture3.Rmd")
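To see what purl produces without downloading the lecture file, here is a minimal sketch that runs it on a tiny throwaway R Markdown file. The file contents and the temporary paths are made up for illustration; documentation = 0 asks purl to keep only the code.

```r
library("knitr")

# Build a tiny R Markdown file in a temporary location (hypothetical content)
fence <- strrep("`", 3)
rmd <- tempfile(fileext = ".Rmd")
writeLines(c(
  "Some prose that should not end up in the script.",
  "",
  paste0(fence, "{r}"),
  "x <- 1 + 1",
  fence
), rmd)

# Extract only the code chunks into an .R script
out <- tempfile(fileext = ".R")
purl(rmd, output = out, documentation = 0, quiet = TRUE)

script <- readLines(out)
print(script)
```

The resulting script can then be opened in an editor or run with source().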
There are many ways to make maps in R
Today's focus is on 1.
There are many useful packages for making maps in R
maps: all kinds of maps
ggcounty: generate United States county maps
ggmap: extends ggplot2 for maps
mapDK: maps of Denmark

Let's return to our marijuana price data
library("readr")
df = read_csv("https://raw.githubusercontent.com/sebastianbarfort/sds/master/data/marijuana-street-price-clean.csv")
Generate yearly state level means
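library("lubridate")
library("dplyr")
# extract the year from the date column
df$year = year(df$date)
# mean high-quality price per state and year, plus a lower-case key for merging
df = df %>%
  group_by(State, year) %>%
  summarise(
    m.price = mean(HighQ, na.rm = TRUE)
  ) %>%
  mutate(
    region = tolower(State)
  )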
maps
The maps package has geographic information on all U.S. states
library("maps")
library("ggplot2")
us.states = map_data("state")
head(us.states)
##        long      lat group order  region subregion
## 1 -87.46201 30.38968     1     1 alabama      <NA>
## 2 -87.48493 30.37249     1     2 alabama      <NA>
## 3 -87.52503 30.37249     1     3 alabama      <NA>
## 4 -87.53076 30.33239     1     4 alabama      <NA>
## 5 -87.57087 30.32665     1     5 alabama      <NA>
## 6 -87.58806 30.32665     1     6 alabama      <NA>
Merge the data
# region is the only column the two data frames share
df.merge = left_join(df, us.states, by = "region")
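A left join keeps every row of the price data, so states without a matching key in the map data end up with missing coordinates. A minimal sketch of how to spot such rows before plotting, using small made-up stand-ins (prices and map_keys are hypothetical names and values, not the lecture data):

```r
library("dplyr")

# Toy stand-ins for the price data and the map data (values are made up)
prices <- data.frame(
  region  = c("alabama", "alaska", "district of columbia"),
  m.price = c(340, 290, 350),
  stringsAsFactors = FALSE
)
map_keys <- data.frame(
  region = c("alabama", "district of columbia"),
  stringsAsFactors = FALSE
)

# Rows of prices whose region has no match in the map data
unmatched <- anti_join(prices, map_keys, by = "region")
print(unmatched$region)
```

Running the same check on the real data frames would show which states, if any, drop out of the map.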
ggplot2
Plotting the dataframe is easy in ggplot2
p = ggplot(df.merge, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = m.price)) +
  facet_wrap( ~ year, ncol = 1) +
  expand_limits() +
  theme_minimal()
ggcounty
The ggcounty package provides data at the U.S. county level
# devtools::install_github("hrbrmstr/ggcounty")
library("ggcounty")
# built-in US population by FIPS code data set
data(population)
# define appropriate (& nicely labeled) population breaks
population$brk <- cut(population$count,
                      breaks=c(0, 100, 1000, 10000, 100000, 1000000, 10000000),
                      labels=c("0-99", "100-1K", "1K-10K", "10K-100K", "100K-1M", "1M-10M"),
                      include.lowest=TRUE)
us <- ggcounty.us()
# start the plot with our base map
gg <- us$g
# add a new geom with our population (choropleth)
gg <- gg + geom_map(data=population, map=us$map,
                    aes(map_id=FIPS, fill=brk),
                    color="white", size=0.125)
gg <- gg + scale_fill_manual(values=c("#ffffcc", "#c7e9b4", "#7fcdbb", "#41b6c4", "#2c7fb8", "#253494"),
                             name="Population")
ggmap
ggmap is a package that uses the ggplot2 syntax as a template to create maps with image tiles taken from map servers such as Google and OpenStreetMap
Let's use some data on benches in Copenhagen
df = read_csv("http://wfs-kbhkort.kk.dk/k101/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=k101:baenk&outputFormat=csv&SRSNAME=EPSG:4326")
names(df)
##  [1] "FID"                "wkb_geometry"       "id"
##  [4] "vejkode"            "vejnavn"            "park_id"
##  [7] "bydel"              "distrikt"           "baenk_type"
## [10] "baenk_tilstand"     "baenk_placering"    "baenk_foto"
## [13] "baenk_driftsopgave" "baenk_fjernet"      "bemaerkning"
## [16] "reg_metode"         "reg_dato"           "rettet_dato"
We need to do quite some data cleaning
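library("dplyr")
library("stringr")
# keep only the geometry and the bench condition
df = df %>% select(wkb_geometry, baenk_tilstand)
# cleaning: drop the parentheses, then keep everything from the first digit onwards
df$wkb_geometry = gsub("\\(|\\)", "", df$wkb_geometry)
df$wkb_geometry = str_extract(df$wkb_geometry, "[0-9].+")
# split the two coordinates on the space and bind them on as new columns
x = str_split(df$wkb_geometry, pattern = " ")
x = do.call(rbind.data.frame, x)
df = bind_cols(df, x)
names(df) = c("wkb_geometry", "baenk_tilstand", "lat", "lon")
# the split columns are not numeric yet, so convert them
df$lon = as.numeric(as.character(df$lon))
df$lat = as.numeric(as.character(df$lat))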
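The cleaning step above boils down to pulling two numbers out of a text geometry. A minimal base R sketch of the same idea on made-up strings; the "POINT (x y)" shape, the sample coordinates and the column names x and y are assumptions for illustration, and the live data may look different.

```r
# Toy strings in the assumed "POINT (x y)" shape of the wkb_geometry column
wkt <- c("POINT (12.5683 55.6761)", "POINT (12.59 55.68)")

# Pull out every number in each string; each list element becomes c(x, y)
parts <- regmatches(wkt, gregexpr("-?[0-9]+\\.?[0-9]*", wkt))

# Stack the pairs into a two-column numeric data frame
coords <- as.data.frame(do.call(rbind, lapply(parts, as.numeric)))
names(coords) <- c("x", "y")
print(coords)
```

Extracting the numbers directly avoids the intermediate factor columns, so no as.character conversion is needed afterwards.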
library("ggmap")
qmplot(lat, lon, zoom = 15, data = df, maptype = "toner-background", color = I("red"))
qmplot(lat, lon, zoom = 15, data = df, maptype = "toner-lite", geom = "density2d", color = I("red"))
mapDK
A package for making maps of Denmark at different levels of aggregation
The package currently only has two functions:
mapDK - makes the map
getID - prints keys in case you run into merge problems

getID only accepts one argument: detail
library(mapDK)
args(getID)
## function (detail = "municipal")
## NULL
getID(detail = "municipal")[1:10]
##  [1] "aabenraa"    "aalborg"     "aeroe"       "albertslund" "alleroed"
##  [6] "aarhus"      "assens"      "ballerup"    "billund"     "bornholm"
getID(detail = "region")
## [1] "hovedstaden" "midtjylland" "nordjylland" "sjaelland" "syddanmark"
mapDK takes the following arguments
args(mapDK)
## function (values = NULL, id = NULL, data, detail = "municipal",
##     show_missing = TRUE, sub = NULL, guide.label = NULL, map.title = NULL)
## NULL
For basic maps you really only need detail, sub and map.title
If you want to do choropleth maps you need to specify
data: A data frame of values and ids
values, id: String variables specifying names of value and id columns in the dataset

mapDK returns a ggplot2 object you can modify if you like
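A minimal sketch of how these arguments fit together, assuming mapDK is installed. The data frame toy and its value column are made up; the ids come from getID so that they match the map keys.

```r
library("mapDK")

# Hypothetical data: one made-up value per region key
ids <- getID(detail = "region")
toy <- data.frame(id = ids, value = seq_along(ids), stringsAsFactors = FALSE)

# values and id name the columns of toy that hold the fill values and the keys
p <- mapDK(values = "value", id = "id", data = toy,
           detail = "region", guide.label = "Toy value")

# the result is a ggplot2 object, so further layers can be added
p <- p + ggplot2::ggtitle("Made-up values by region")
```

Printing p draws the map; if some areas come out empty, comparing the id column with the output of getID is a reasonable first check.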
detail argument
municipal - plots Denmark's 98 municipalities
region - plots Denmark's 5 regions
rural - plots Denmark's 11 rural areas
zip - plots Denmark's 598 zip code areas
polling - plots Denmark's 1385 polling places (as of 2015)
parish - plots Denmark's 1931 parishes

The sub argument takes a vector of strings specifying subregions to be plotted
mapDK()
mapDK(detail = "parish")
mapDK(values = "stemmer", id = "id",
      data = subset(votes, navn == "socialdemokratiet"),
      detail = "polling", show_missing = FALSE,
      guide.label = "Stemmer \nSocialdemokratiet (pct)")
Putting it all together…
library("dplyr")
library("mapproj")
library("ggmap")
df = mapDK::polling
df.votes = mapDK::votes
# keep only the polling areas in Copenhagen and attach the vote shares
df = df %>% filter(KommuneNav == "koebenhavn")
df.t = left_join(df, df.votes)
# background map of Copenhagen
cph.map = ggmap(get_map(location = c(12.57, 55.68), source = "stamen",
                        maptype = "toner", crop = TRUE, zoom = 13))
# overlay the polling areas, filled by the Social Democrats' vote share
p = cph.map +
  geom_polygon(data = subset(df.t, navn == "socialdemokratiet"),
               aes(x = long, y = lat, group = group, fill = stemmer),
               alpha = .75)
For this exercise we will work with data on GDP per capita at the country level. You can download the data using the WDI package as shown below
# install.packages("WDI")
library("WDI")
library("dplyr")
df = WDI(indicator = "NY.GDP.PCAP.KN", start = 2010, end = 2010, extra = FALSE)
# drop countries without a GDP figure
df = df %>% filter(!is.na(NY.GDP.PCAP.KN))
Question 1: use the maps package and the GDP data to make a world map of GDP per capita.
Question 2: install the package countrycode and use the countrycode function to add a region indicator to the dataset. Create a world map faceted by your region indicator.
In this exercise you will work with data on votes for the Danish general election from 2011. You can read the data using the following piece of code
df = mapDK::votes
Question 1: use the mapDK package to make a map of votes (in pct) for the Conservative Party ("detkonservativefolkeparti") at the polling place level.
Question 2: read up on the documentation for the dplyr package to aggregate the data into votes (in pct) for the Conservatives at the municipal level. Plot the data using mapDK.
Question 3: Repeat question 2 but only for the municipalities "Aarhus" and "Koebenhavn".
For this exercise we will work with Facebook data from the Danish parliamentary election 2015 kindly provided by 56 north.
Load the data by running
library(readr)
df = read_csv("https://raw.githubusercontent.com/sebastianbarfort/sds/master/data/FV15_data.csv")
Question 1: Use the dplyr package to aggregate the number of likes by party and "storkreds".
Question 2: Plot the data (do you need to facet?) on a map using the mapDK package.
Question 3: Use the dplyr package to sort the dataset according to the number of likes. Which candidate in the data is most popular? Create a dataset with only the most popular candidate by "storkreds".