Social Data Science


What is social data science?

Like the more general term data science, social data science has no clear definition. In the context of this website, it is mainly the title of a course on applications of data science methods in quantitative social research.


Teaching data science skills

Students of the social sciences acquire a lot of substantive expertise in subjects such as sociology, psychology, politics and economics. They also learn mathematics and statistics as part of their research methods training. What the social science curriculum is missing in most cases are the hacking skills needed to become a full-fledged social data scientist.

The Data Science Venn Diagram

"The Data Science Venn Diagram" by Drew Conway is licensed under CC BY-NC 3.0

One of our course goals is therefore to provide students with the technical skills needed to apply data science techniques to social research questions. Since we cannot expect a strong computer science background or affinity, we try to introduce concepts like coding and version control as gently as possible. At the same time, we want to provide at least a minimal skill set for handling state-of-the-art data science tools.

Installing the necessary programs and getting them to run properly on each computer can be a time-consuming and frustrating process, especially for the less tech-savvy. We therefore provide a virtual machine running a fully prepared, easy-to-use Ubuntu Linux desktop with all tools covered in the course pre-installed. All programs used are free software, and most are under an open source licence.


Data science applications in social research

Apart from teaching the technical skills, the main focus of our course is the application of data science techniques to social research questions. To that end, students get an introduction to different machine learning concepts. We first cover basic classification algorithms like naive Bayes and decision trees. Together we then explore how these can be applied to problems of prediction and explanation of social phenomena.
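To give a flavour of what a basic classification algorithm looks like, here is a minimal sketch of a naive Bayes classifier for categorical data, written in plain Python. It is not taken from the course material: the function names (train_naive_bayes, predict), the two features (education level, place of residence) and the six toy records are all invented for illustration, and the add-one (Laplace) smoothing is just one common choice.

```python
from collections import Counter, defaultdict
import math


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, label)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    # Distinct values seen per feature, needed for add-one smoothing
    feature_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict(model, row):
    """Return the label with the highest (log) posterior score for one row."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        # Log prior of the class
        score = math.log(n / total)
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            # Add-one smoothing avoids zero probabilities for unseen values
            score += math.log((count + 1) / (n + len(feature_values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (education level, place of residence) -> status
rows = [
    ("tertiary", "urban"),
    ("tertiary", "rural"),
    ("secondary", "urban"),
    ("secondary", "rural"),
    ("primary", "rural"),
    ("primary", "urban"),
]
labels = ["employed", "employed", "employed",
          "unemployed", "unemployed", "unemployed"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("tertiary", "urban")))
print(predict(model, ("primary", "rural")))
```

The classifier treats the features as independent given the class, which is the "naive" assumption. With real survey data one would normally rely on an established library rather than a hand-written version like this one.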

The substantive part of the course starts with students developing a research question in one of the areas of family, education or labour market research. After that, we will discuss, and finally decide on, the datasets to be used to address the specific research questions. The main task for the rest of the course is then to discuss and develop the application of the data science methods the students have just learned to their own research questions.