Room: 7.3280
Mary-Somerville-Straße 7
28359 Bremen
10:00 a.m. - 4:30 p.m.
Johannes Nostadt
Summer Semester 2018

The vast amount of data available on the web is fundamentally changing research practices in the social sciences. By mastering the tools for automated web data collection, a single researcher can construct a data set that would have required tremendous effort and expense not too long ago. The course is intended to provide an applied overview of the skills required for automatically collecting data from the web, giving a cursory introduction to some of the most important techniques. In particular, the workshop introduces the basic structure of HTML to enable an understanding of the underlying architecture and mechanics of websites. XPath is introduced as a syntax for addressing specific elements of websites and as a tool for extracting them as needed. Regular expressions are covered as a means of further processing textual data gathered from the web. Finally, client-server interactions in HTTP and the structure of URLs are discussed to understand web interactions in practice. The applied elements of the workshop will make use of the programming language R.
*Therefore, a basic familiarity with R is a prerequisite for attending the course.*