
Introduction to Twitter Scraping and Analysis with Python

A collection of web scraping case studies, exercises, tutorials, and resources for the 2021 FSI summer course Humanistic Approaches to Media and Data.

Tools

We will be programming exclusively in Google Colab. To interact with the Twitter API, we will use tweepy, a Python wrapper for the API.

Prerequisites

The seminars assume that you already have an approved Twitter Developer Account; if you do not, apply for access to the Twitter API. A basic understanding of Python is also expected.
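An approved developer account gives you credentials such as a bearer token, and it is good practice to keep them out of notebooks you might share. Below is a minimal sketch, assuming you store the token in an environment variable; the name `TWITTER_BEARER_TOKEN` and the helper `get_bearer_token` are hypothetical conventions, not part of Twitter's or tweepy's API.

```python
import os
from getpass import getpass


def get_bearer_token(env_var: str = "TWITTER_BEARER_TOKEN") -> str:
    """Return the API bearer token without hard-coding it in the notebook.

    Checks an environment variable first (hypothetical name) and falls back
    to a hidden prompt, so the token never appears in the saved notebook.
    """
    token = os.environ.get(env_var)
    if token:
        return token
    # getpass hides what you type; Colab renders this as an input box.
    return getpass("Paste your Twitter API bearer token: ")
```

In Colab you would call `token = get_bearer_token()` once at the top of the notebook and paste the token when prompted.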

Coding Resources

Vocabulary

  • Web scraping: Collecting data from the internet in an automated way
  • Python: The language most often used for writing quick scripts for web scraping and data analysis. If you are working with an API, there’s a good chance someone has written a Python wrapper for it. Python has also been embraced by the data science community and has many libraries to support data cleaning and visualization.
  • API: Application Programming Interface. In the context of web scraping, it is the set of endpoints a website owner provides so that programs can request data, and it lets the owner monitor and control how data leaves their platform.
  • HTTP Methods: (e.g. GET, POST, PUT, DELETE) For web scraping we’re only interested in what are called “GET requests”, requests made to the website’s server for information. Along with that request, you send the type of information you need and, usually, an authorization token.
  • Rate limiting: A cap on how many requests a programmer can make within a given time window, which keeps them from overworking a site’s servers. The limits vary from site to site.
  • JSON: JavaScript Object Notation. A lightweight data format used throughout the web. If you receive a response through an API, it will almost certainly be in this format.
  • Wrapper: In web scraping, a wrapper is a library of code that translates the API into a language that you’re comfortable programming in.
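To see how these terms fit together, here is a minimal sketch using only Python's standard library: it builds (but does not send) a GET request carrying an authorization token, parses a JSON response, and spaces out calls to stay under a rate limit. The endpoint is Twitter's v2 recent-search URL as documented around the time of this course, so check the current docs before relying on it; the helper names and the sample JSON in the usage test are made up for illustration.

```python
import json
import time
import urllib.parse
import urllib.request

# Twitter API v2 recent-search endpoint (assumed current; verify in the docs).
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_search_request(query: str, bearer_token: str,
                         max_results: int = 10) -> urllib.request.Request:
    """Build, but do not send, a GET request with an authorization token."""
    # The v2 endpoint accepts max_results between 10 and 100.
    params = urllib.parse.urlencode({"query": query, "max_results": max_results})
    return urllib.request.Request(
        f"{SEARCH_URL}?{params}",
        headers={"Authorization": f"Bearer {bearer_token}"},
        method="GET",
    )


def extract_texts(raw_json: str) -> list[str]:
    """Parse a JSON response body and return the text of each tweet."""
    payload = json.loads(raw_json)
    # v2 responses put results under "data"; it is absent when nothing matched.
    return [tweet["text"] for tweet in payload.get("data", [])]


class RateLimiter:
    """Client-side throttle: at most `max_calls` calls per `period` seconds,
    spaced evenly. A simplification of a server's windowed rate limit."""

    def __init__(self, max_calls: int, period: float) -> None:
        self.interval = period / max_calls
        self.last_call = None

    def wait(self) -> None:
        """Sleep just long enough to keep calls `interval` seconds apart."""
        now = time.monotonic()
        if self.last_call is not None:
            remaining = self.interval - (now - self.last_call)
            if remaining > 0:
                time.sleep(remaining)
        self.last_call = time.monotonic()
```

Sending the request would be `urllib.request.urlopen(req).read()`, which requires a valid token. In practice the wrapper handles all of this for you: with tweepy, `tweepy.Client(bearer_token=..., wait_on_rate_limit=True)` followed by `client.search_recent_tweets(query=...)` makes a comparable request and pauses automatically when the rate limit is reached.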

Thank you

Thank you to Kavita Kulkarni for her guidance in constructing these seminars. And thank you to Melanie Walsh for basically creating this course first 😄

Contact

My contact information is available on the Center for Digital Humanities website.