Social Media Research

Important information to consider when conducting social media research, helpful tools to assist in data collection and analysis, and links to resources on methods, ethics, examples, and more.

X (formerly Twitter) TOS, API, and rules for research

Alongside its terms of service (TOS), X also has extensive guidance for non-commercial use of the API and specific guidance on what not to do with it.

Things that are restricted or forbidden include:

  • Deriving or storing health, sexuality, union membership, or other personal data
    • Some of this information can be worked with in aggregate, but not on an individual level
  • Matching X accounts to real-world people
  • Sharing large numbers of "hydrated" posts (more on this later)

Many tools for collecting X datasets require you to sign up for a developer account and obtain an API key to access the API services. You can do this on the X Developer site.

The normal API is also known as the "streaming" API because it provides access to new posts as they are made, but not to old posts. It generally returns posts that are less than a week old, so many of the collection tools are intended to be set up and run while a conversation is ongoing, collecting posts as they happen over a period of time. It is also unclear how comprehensive the X search system is: it is a black-box algorithm that is expected to prioritize popular posts over comprehensive recall.

There is a specific elevated API access level for academic research that advanced social media researchers may want to look into, as it allows more Tweets to be retrieved, including older archived Tweets. You can find out more about X features for academic research on the corresponding page.

Finding existing Twitter or X datasets

Twitter has been extensively studied for over a decade, and there are several existing historical datasets of Tweets for use in research. These can be found in repositories across the web.

It's important to note that Twitter's TOS restricts the sharing of full Tweets in datasets. Most of these datasets consist of Tweet IDs only, so the associated text and metadata must be retrieved before analysis. This process is called "hydration," and there are a few apps that will do it for you, like the DocNow hydrator.
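Before hydrating, it can help to tidy a downloaded list of Tweet IDs and split it into batches. The following is a minimal Python sketch of that preparation step only; it does not contact X or any hydration tool. The function names and the batch size of 100 are illustrative assumptions, not part of any specific tool, so adjust the batch size to whatever your hydration app or API access level accepts.

```python
from typing import Iterable, Iterator, List

# Assumption: 100 IDs per batch. Check the limit of the tool or API tier you use.
BATCH_SIZE = 100


def clean_ids(lines: Iterable[str]) -> List[str]:
    """Strip whitespace, drop blank or non-numeric lines, and remove
    duplicates while keeping the first-seen order."""
    seen = set()
    ids = []
    for line in lines:
        tweet_id = line.strip()
        # Keep IDs as strings so that very long IDs are never altered.
        if tweet_id.isdigit() and tweet_id not in seen:
            seen.add(tweet_id)
            ids.append(tweet_id)
    return ids


def batches(ids: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


if __name__ == "__main__":
    # Hypothetical sample input; a real dataset would be read from a text file
    # with one Tweet ID per line.
    sample = ["1234567890\n", "1234567891\n", "\n", "1234567890\n", "id\n"]
    for batch in batches(clean_ids(sample), size=2):
        print(",".join(batch))
```

Each batch can then be handed to a hydration tool or saved as its own file, which makes it easier to resume if a long hydration run is interrupted.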

Existing Twitter datasets:

  • George Washington University's TweetSets
    • Contains several different datasets for each of the 2016, 2018, and 2020 federal elections, as well as tweets about major hurricanes, Olympic games, and more.
  • DocNow Tweet Catalog
    • Contains 100+ (at time of writing) datasets on subjects from across political science, environmental science, international studies, and more.
  • The American Presidency Project and the Trump Twitter Archive archive full-text tweets from Donald Trump's now-suspended account
  • Twitter's Transparency Archive, which collects deleted Tweets believed to have been deliberate misinformation created by foreign state actors.
    • Twitter is working on creating additional no-code/low-code Tweet archives on topics of interest to academic researchers, but as of 05/2022 has only released this one.

Tools for Twitter research

  • Please see the "Collected Tools" page of this guide for more information about NVivo.

Example publications in Twitter research