Social Data Science

Fall 2015
Department of Economics
University of Copenhagen
Instructors: David Dreyer Lassen, Sebastian Barfort
TA: Kristoffer Glavind
Time: Mondays, 8am-10am (all weeks)
Thursdays, 8am-10am (odd weeks)
Location: CSS 1-1-18
Sebastian Office Hours: Wednesdays, 12pm-2pm in CSS 26-1-3

The objective of this course is to learn how to gather and work with modern quantitative social science data. Increasingly, social data (data that capture how people behave and interact with each other) are available online in new, challenging forms and formats. This opens up the possibility of gathering large amounts of interesting data to investigate existing theories and new phenomena, provided that the analyst has sufficient computer literacy and is aware of the promises and pitfalls of working with various types of data. Consequently, being an effective economist means spending a large fraction of our time writing and debugging code. We write code to scrape, clean, transform, and merge the data that we want to analyze. This course will focus on the challenges that arise during this process, and thereby improve our chances of posing new and challenging questions.

We will present the data science methods needed for collecting and analyzing real-world data. In addition to core computational concepts, the course will focus on generating new data (collecting, scraping, working with APIs), data manipulation tools (transforming, cleaning), visualization tools (visualizing raw data and model results), and reproducibility tools (Git, GitHub). It will also provide an introduction to statistical techniques for prediction and classification, known as statistical learning.

The course will consist of two hours of lectures and one hour of exercises and problem solving per week. The lectures will focus on broad introductions to the topics covered in the course. One hour of exercises a week is not a large amount of time for learning how to code. We will use some of this time like development meetings: going over assignments, having detailed code reviews of various forms, and discussing blocking issues and potential solutions.

Academia places increasing emphasis on the skills needed to gather, handle, and analyze data effectively and to present results to a range of audiences, so this course will provide you with important tools for future academic study. The skills taught in this course are also widely used in business. R programming skills in particular are highly valued in fields such as finance and information technology. Because this course focuses on general skills for working with social science data, such as gathering and visualization, it is equally relevant for students seeking careers outside academia, where the ability to communicate the results of an analysis effectively is in high demand.

This course assumes no knowledge of any particular software or computer program. We will try to demystify the technological side of things so that students feel comfortable getting started and thinking programmatically. Still, this will be a technical course, and students should expect to spend a significant amount of time learning these tools.

Because the course builds on a wide range of techniques, we do not have any hard requirements, but students are expected to have an interest in some subset of: statistics, econometrics, linear algebra, and a scripting language (we will use R in this course).

Course work will include writing R code and shell scripts.

Code will be distributed and collected via Git, hosted on GitHub.