Link Search Menu Expand Document

Introduction to Tweepy

  1. Introduction to Tweepy
    1. Twitter Advanced Search
    2. First API Login
  2. Scrape a profile
    1. Get profile information
    2. Using the search API

Before using the Twitter API, consider Twitter Advanced Search. You can often get a lot of answers before writing a line of code! “Site-specific syntax” is not unique to Twitter. Check out useful ways to query Reddit or Google. Because working with APIs can be difficult and time-consuming, especially when first starting out, it’s best to informally test your hypotheses with these search tools as you form your research question. And as we’ll discuss soon, this search API is the only way you can get certain data without paying.

First API Login

Let’s log into the Twitter API and execute our first query! (Don’t worry, I’ll explain everything about how APIs work next. First let’s just get data onto our computer.)

First, we need your API keys. Log in to the developer portal. Create a new project under “Projects & Apps”. Name your project fsi-seminar and answer that your purpose is to learn the Twitter API. After filling out the questionnaire, you’ll be met with the alphanumeric keys generated for you to use the Twitter API. Copy both the “API Key” and the “API Secret Key” into a new Colab notebook. (You won’t need to your Authentication Token. This is used when you need special privileges, like posting tweets from your account.)

import tweepy # https://github.com/tweepy/tweepy

consumer_key = "" # put your information here!
consumer_secret = "" # put your information here!

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
api = tweepy.API(auth)

# get tweet ID
tweet = api.get_status(1412424266763603968)
tweet.text

And there you go! You should see the text from a tweet from NASA. What do you notice about how the text is formatted?

Scrape a profile

If you need to scrape a profile, you are limited to the latest 3200 tweets. If you’ve set up th API (and placed it into the variable api) you can easily run the following code and get the latest 3200 tweets exported to a CSV.

import csv

def get_all_tweets(screen_name):
    alltweets = []  
    new_tweets = api.user_timeline(screen_name = screen_name, count=200)
    alltweets.extend(new_tweets)

    oldest = alltweets[-1].id - 1
    
    # keep grabbing tweets until there are no tweets left to grab
    while len(new_tweets) > 0:
        new_tweets = api.user_timeline(screen_name = screen_name, count=200, max_id=oldest)
        alltweets.extend(new_tweets)
        oldest = alltweets[-1].id - 1
        print(f"...{len(alltweets)} tweets downloaded so far")
    
    # transform the tweepy tweets into a 2D array that will populate the csv 
    outtweets = [[tweet.id_str, tweet.created_at, tweet.text] for tweet in alltweets]
    
    # write the csv  
    with open(f'new_{screen_name}_tweets.csv', 'w') as f:
        writer = csv.writer(f)
        writer.writerow(["id","created_at","text"])
        writer.writerows(outtweets)

get_all_tweets('USER_NAME')  # Change me

What do you notice about how this CSV was created differently than other code we’ve looked at?

View some takeaways
  • Because we're using tweepy, we can get to the text and timestamp information straight from what tweepy calls a Tweet object. You can't do this with JSON, but tweepy already parsed the most important features and making them easily accessible.
  • We're using python's csv library instead of Pandas to write the CSV. If you're not analyzing the data and just want to make a CSV, using this library can be useful.

Get profile information

user = api.get_user('USER_NAME')

🪐 Exercise: Get the location of NASA through their Twitter profile.

View Solution

Using the search API

Twitter’s Search API is similar to their advanced search feature, in that they use the same syntax for both. In fact, it’s probably best to construct your query with their UI, test it there, and then paste it into your code once your sure that’s what you want.

# Paris's lat: 48.8566, long:2.3522, radius: 6mi
search_query = '#covid geocode:48.8566,2.3522,6mi min_faves:10'

# Get 10 items. If items() is left blank, it will get as many as possible,
#  but that command may take a while to execute.
tweets_cursor = tweepy.Cursor(api.search, q=search_query).items(10)
tweets = [tweet for tweet in tweets_cursor]

See this query in the Twitter UI.

Here I’m plugging in Paris’s geocode and a radius and minimum number of favorites to 10. Similar to the Advanced Search UI, you can look up by “Latest” or “Popular”, and/or filter by language.

If I wanted to collect all tweets to the President that mention climate change, the query would look something like:

search_query = '"climate change" OR "global warming" to:POTUS'

See this query in the Twitter UI.

Here I’ve used the OR operator to look for either the phrase climate change or global warming directed to the account POTUS.

There are endless examples. The main limitation is the time window. The search API is only valid for the past 7 days on the free tier. Twitter is currently updating its API, however, so this may change. Further, if you pursue Twitter analysis with a professor or in a more, you can apply for academic access, which provides some search functionality for free.

🦠 Exercises:

  • What could you learn by comparing English tweets in Paris with French tweets in Paris concerning Covid-19?
  • What would those queries look like?
  • Given the limitations we’ve discussed what wouldn’t you be able to determine?