
Collecting data

In this class, we will learn how to retrieve textual data from web pages and social networks. We will cover the following points:

  • How web servers and browsers interact
  • What’s in a web page (HTML, CSS)
  • How to retrieve web pages (Python’s Requests library)
  • How to process web page content (BeautifulSoup)
  • What’s an API (Application Programming Interface)
  • How to extract text from Wikipedia and messages from the Mastodon social network (wptools, Mastodon)
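As a preview of how the retrieval and processing steps fit together, here is a minimal sketch. It assumes the `requests` and `beautifulsoup4` packages are installed. The function names `fetch_html` and `extract_text` are hypothetical, chosen only for illustration, and the parser only looks at the `<title>` and `<p>` elements of a page.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Send an HTTP GET request to a web server, as a browser would, and return the HTML body."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx status codes
    return response.text


def extract_text(html: str) -> dict:
    """Parse an HTML document and collect its title and paragraph text (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")  # use Python's built-in HTML parser
    title = soup.title.get_text(strip=True) if soup.title else ""
    paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
    return {"title": title, "paragraphs": paragraphs}


if __name__ == "__main__":
    # An inline HTML string stands in for a downloaded page, so no network access is needed.
    sample = (
        "<html><head><title>Demo</title></head>"
        "<body><p>Hello</p><p>World</p></body></html>"
    )
    print(extract_text(sample))  # {'title': 'Demo', 'paragraphs': ['Hello', 'World']}
```

Keeping the download and the parsing in separate functions means the parsing step can be tried on a saved or hand-written HTML string before pointing `fetch_html` at a real URL.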