Readings
Course readings are provided below. We do not expect you to read entries marked as “inspiration”. They are provided in case you have spare time and want to dig deeper into the topics. Kosuke Imai’s book is forthcoming at Princeton University Press. Professor Imai has kindly given us permission to use the textbook free of charge in advance of its official release. A PDF of the book is available through Absalon. Please do not circulate it.
August 8
Keywords: Introduction to SDS, Introduction to R.
Required
- Preparing for Social Data Science
- The Scientist: Get With the Program
- Imai, Kosuke. 2016. A First Course in Quantitative Social Science. Read chapter 1.
Inspiration
If you’re interested and want to delve deeper into coding and programming (you certainly don’t have to; it is not required for this course), I highly recommend the following posts.
August 9 & 10
Keywords: Visualization, Data Manipulation, Data Import, Functions.
Required
- Grolemund, Garrett and Hadley Wickham. 2016. “R for Data Science”. Read chapters 3, 4 and 9. Browse chapter 15.
- Imai, Kosuke. 2016. A First Course in Quantitative Social Science. Read chapter 4 section 1-3.1
Browse the following
- Schwabish, Jonathan A. 2014. “An Economist’s Guide to Visualizing Data”. Journal of Economic Perspectives, 28(1): 209-34.
- Healy, Kieran and James Moody. 2014. “Data Visualization in Sociology”. Annual Review of Sociology, 40:105–128.
Inspiration
Below are links to some interesting videos describing how companies such as the New York Times or FiveThirtyEight think about visualizing data as well as some posts and videos on the underlying theory behind the “tidyverse” and an introduction to working with spatial data in R.
- Cox, Amanda. “Data Visualizations at the New York Times”.
- FiveThirtyEight: How We Charted Trump’s Fall From Grace In Hip-Hop
- Wickham, Hadley. 2016. “Making Data Analysis Easier”. Workshop presentation organised by the Monash Business Analytics Team.
- Wickham, Hadley. 2010. “A Layered Grammar of Graphics”. Journal of Computational and Graphical Statistics, Volume 19, Number 1, Pages 3–28.
- Wickham, Hadley. 2011. “The Split-Apply-Combine Strategy for Data Analysis”. Journal of Statistical Software 40(1).
- Lovelace, Robin and James Cheshire. 2013. “Introduction to Spatial Data and ggplot2”.
August 11
Keywords: Web Scraping, API.
Required
- Wickham, Hadley. 2014. “rvest: easy web scraping with R”. RStudio Blog.
- Wickham, Hadley. 2010. “stringr: modern, consistent string processing”. The R Journal. 2(2): 38-40.
- Shiab, Nael. 2015. “On the Ethics of Web Scraping and Data Journalism”. Global Investigative Journalism Network.
- The Economist. 2016. “What APIs are.”
Inspiration
Below are some interesting academic papers using data scraped from online sources that might provide inspiration for your exam project.
- Stephens-Davidowitz, Seth. 2014. “The cost of racial animus on a black candidate: Evidence using Google search data.” Journal of Public Economics, 118: 26-40.
- Stephens-Davidowitz, Seth, Hal Varian, and Michael D. Smith. 2016. “Super Returns to Super Bowl Ads?”. R & R, Journal of Political Economy.
- Stephens-Davidowitz, Seth, and Hal Varian. 2015. “A Hands-on Guide to Google Data.” Google working paper.
- Barberá, Pablo. 2015. “Birds of the same feather tweet together: Bayesian ideal point estimation using Twitter data.” Political Analysis, 23.1: 76-91.
- Cavallo, Alberto. “Scraped data and sticky prices”. No. w21490. National Bureau of Economic Research, 2015.
- Bond, Robert M., et al. 2012. “A 61-million-person experiment in social influence and political mobilization.” Nature, 489.7415: 295-298.
August 12
Keywords: Big Data, Reproducible Research.
Required
- John Gerring. 2012. Measurements. Chapter 7 in Social Science Methodology, 2nd ed., Cambridge University Press. ((bad) copies will be provided)
- Christine L. Borgman. Provocations, What Are Data and Data Scholarship in the Social Science. Chapters 1, 2 and 6 in Big Data, Little Data, No Data. MIT Press 2015. (copies will be provided).
- Einav and Levin: Economics in the Age of Big Data. Science. 2013.
- Edelman, Benjamin. 2012. “Using internet data for economic research.” The Journal of Economic Perspectives, 26.2: 189-206.
- Anderson, Chris. 2008. “The end of theory: The data deluge makes the scientific method obsolete.” Wired, 16-07.
Read one of the following
- Jones, Zachary. 2015. “Git & Github tutorial”.
- Rainey, Carlisle. 2015. “Git for Political Science”.
- Wickham, Hadley. 2015. “Git and GitHub”.
Background
- Lazer, David, et al. 2014. “The parable of Google Flu: traps in big data analysis.” Science, 343.14.
August 15
Keywords: Observational data, Causation.
- Imai, Kosuke. 2016. A First Course in Quantitative Social Science. Read chapter 2 and section 4.3.
- Samii, Cyrus. 2016. “Causal Empiricism in Quantitative Research”. Journal of Politics 78(3):941–955.
- Deaton, Angus, and Nancy Cartwright. 2016. Understanding and Misunderstanding Randomized Controlled Trials. No. w22595. National Bureau of Economic Research.
Keywords: Prediction, Statistical Learning.
- Kleinberg, Jon, et al. “Prediction policy problems.” American Economic Review, 105.5 (2015): 491-495.
- Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2001. “Introduction to statistical learning”. Vol. 1. Springer, Berlin: Springer series in statistics. (pages: 15-42, 175-184, 214-227)
- Varian, Hal. 2014. “Big Data: New Tricks for Econometrics”. Journal of Economic Perspectives, 28.2: 3-27.
Inspiration
- Choi, Hyunyoung, and Hal Varian. “Predicting initial claims for unemployment benefits.” Google working paper.
- Jones, Zachary and Fridolin Linder. 2016. “Exploratory Data Analysis using Random Forests”.
- Anderson, Chris. 2008. “The end of theory: The data deluge makes the scientific method obsolete.” Wired, 16-07.
- Ginsberg, Jeremy, et al. 2009. “Detecting influenza epidemics using search engine query data.” Nature, 457.7232: 1012-1014.
- Broniatowski, David Andre, Michael J. Paul, and Mark Dredze. 2014. “Twitter: big data opportunities.” Inform, 49: 255.
August 16
Keywords: Unsupervised Learning.
- Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2001. “Introduction to statistical learning”. Vol. 1. Springer, Berlin: Springer series in statistics. (pages: 373-399).
August 17
Keywords: Privacy.
Required
- Alessandro Acquisti et al. 2015. “Privacy and human behavior in the age of information.” Science 347, 509.
- Heffetz, Ori, and Katrina Ligett. 2014. “Privacy and Data-Based Research.” The Journal of Economic Perspectives, 28.2: 75-98.
- Jesse Singal. 2015. “The Case of the Amazing Gay-Marriage Data: How a Graduate Student Reluctantly Uncovered a Huge Scientific Fraud.” New York Magazine.
- Shiab, Nael. 2015. “Web Scraping: A Journalist’s Guide”. Global Investigative Journalism Network.
Background
- Alessandro Acquisti. 2015. The Economics and Behavioral Economics of Privacy. Chapter 3 in Privacy, Big Data, and the Public Good: Frameworks for Engagement (eds. Julia Lane, Victoria Stodden, Stefan Bender, Helen Nissenbaum). Cambridge University Press.
- Fabian Neuhaus & Timothy Webmoor. 2012. “Agile Ethics for Massified Research and Visualization.” Information, Communication & Society 15:1, 43-65.
August 18
Keywords: Text Data.
- Imai, Kosuke. 2016. A First Course in Quantitative Social Science. Read chapter 5 section 1-2.
- Grimmer, Justin, and Brandon M. Stewart. 2013. “Text as data: The promise and pitfalls of automatic content analysis methods for political texts.” Political Analysis, 21.3: 267-297.
Inspiration
- King, G., Pan, J., & Roberts, M. E. 2013. How censorship in China allows government criticism but silences collective expression. American Political Science Review, 107(02), 326-343.
Footnotes
- You will notice that Kosuke Imai uses the base R package much more frequently than we do. We will instead write R code that follows the “tidyverse” approach to data analysis. We do not expect you to repeat every line of code in the chapter, but you should have a rough idea of what each line of code does.