Social Media Research

Important information to consider when conducting social media research, helpful tools to assist in data collection and analysis, and links to resources on methods, ethics, examples, and more.

Social media ownership & Terms of Service

It's important to remember that all social media platforms are controlled by the corporations that own them. This has major implications for how researchers can interact with a platform: many platforms restrict the automated gathering of posts, for example.

These restrictions can usually be found in a platform's "Terms of Service" (TOS). A given platform's TOS is a pseudo-legal agreement that may dictate how web scrapers or API clients can interact with the platform, how posts can be repurposed, how datasets collected on the platform can be shared, and more.

Legal precedents for TOS enforcement, as well as for expectations of privacy related to any content posted to social media platforms, are still developing. Broadly, judges in the US have ruled that information published to publicly viewable social media can no longer reasonably be expected to be private and that breaking a platform's TOS is not a crime, but laws around things like social media, privacy, and freedom of speech will vary by jurisdiction, especially internationally.

However, while not a crime, breaking a site's TOS may result in the account you used to access that site being terminated. This is a very common provision in TOS agreements, and several platforms have a history of cracking down on research uses. Violating a platform's TOS in the course of doing research is not advised.

Ethical considerations when working with social media data

The ethics of social media research, as a new and ever-changing field of study, are still being determined. In the early days of social media research, interactions with participants were normally smaller in scale and more hands-on. With the rise of big data, computational network analysis, text mining, and more, the way that researchers interact with social media has changed. Social media itself has also changed, as platforms shift towards multimedia rather than text. Almost any study of social media will involve content produced by humans, content that may include identifying or sensitive information, and content that, if republished, can easily be traced back to its author through search engines. For these reasons, social media research should always be approached carefully. Specific disciplines may have guidelines, but in many cases the ethics are left up to the researchers.

You will want to look into the ethics of your particular field or methodology more deeply. Some starting points may be:

Classroom projects and student work like SIS SRPs do not need IRB approval. Professional researchers will want to contact the IRB for approval or exemption.

Type of collection method

Different ways of collecting data from social media platforms

There are three main ways to collect data from social media platforms. These methods interact differently with platform TOSes, produce different formats of data, and may be used together or on their own as needed for your research.

Application Programming Interfaces (APIs)

An API helps facilitate the transfer of data between two programs, which can include social media sites and your data collection tools. IBM explains: "An API is a set of defined rules that explain how computers or applications communicate with one another. APIs sit between an application and the web server, acting as an intermediary layer that processes data transfer between systems." IBM's page on APIs may be useful reading for those unfamiliar with the concept.

APIs return data in the way that the platform itself handles the data. This information is often provided in JSON (.json) format and may include more metadata than would normally be presented on the site. This is a preferred way of gathering data from social media platforms because it's standardized and complete. However, many platforms place limits on their APIs: how much you can collect at once, what you can use as queries, or how far back in time your API calls can reach.
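As a rough illustration of the points above, the sketch below parses JSON responses and pages through results under a per-request limit. Nothing here calls a real platform: the `fake_api_page` function, the field names (`id`, `text`, `created_at`, `lang`, `next_cursor`), and the limit of two posts per call are all assumptions made up for this example. A real API will have its own endpoints, authentication, fields, and limits, so check the platform's documentation.

```python
import json

# Hypothetical stand-in for a platform API. A real API would be called over
# HTTP with authentication; here the "server" is just a list in memory.
_ALL_POSTS = [
    {"id": 1, "text": "first post", "created_at": "2023-01-01T10:00:00Z", "lang": "en"},
    {"id": 2, "text": "second post", "created_at": "2023-01-02T11:30:00Z", "lang": "en"},
    {"id": 3, "text": "third post", "created_at": "2023-01-03T09:15:00Z", "lang": "fr"},
]
PAGE_LIMIT = 2  # assumed cap on how many posts one request may return


def fake_api_page(cursor=0):
    """Return one page of results as a JSON string, like an API response body."""
    page = _ALL_POSTS[cursor:cursor + PAGE_LIMIT]
    next_cursor = cursor + PAGE_LIMIT
    if next_cursor >= len(_ALL_POSTS):
        next_cursor = None  # signals that there are no more pages
    return json.dumps({"data": page, "next_cursor": next_cursor})


def collect_all_posts():
    """Request page after page until the API reports no further results."""
    posts = []
    cursor = 0
    while cursor is not None:
        response = json.loads(fake_api_page(cursor))  # JSON text -> Python dict
        posts.extend(response["data"])
        cursor = response["next_cursor"]
    return posts


posts = collect_all_posts()
for post in posts:
    # Metadata such as timestamp and language arrives alongside the post text.
    print(post["id"], post["created_at"], post["lang"], post["text"])
```

Because of the per-request cap, collecting three posts takes two requests here. With a real API, each request would also count against the platform's rate limits.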

Some platforms additionally have "academic research" API access that goes beyond the normal limits of the API calls. For example, academic research access to the Twitter API removes the restriction on searching older tweets and massively expands the number of tweets you can retrieve every month. Platform-specific API information is included in the platform-specific pages of this guide.

Web Scraping

Unlike APIs, web scraping does not involve the platform voluntarily providing data in the same way as it handles the data internally. Instead, web scraping uses bots, web crawlers (like Google's), or scripts to extract data from the web pages that the platform presents to users.

There are many different web scraping techniques, but most commonly information is extracted from the HTML code that makes up web pages. The extracted data is often saved as a .csv spreadsheet file, but it can also be saved in other text formats or as .json files.
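To make the HTML-to-spreadsheet step concrete, here is a minimal sketch that uses only Python's standard library. The markup in `SAMPLE_HTML` and its class names (`post`, `author`, `text`) are invented for illustration. Real platform pages are far more complex and change often, and many scraping projects use dedicated third-party libraries instead. The sketch parses a local string and does not contact any website.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page markup; real platforms use different, more complex HTML.
SAMPLE_HTML = """
<html><body>
  <div class="post"><span class="author">user_a</span><p class="text">Hello, world</p></div>
  <div class="post"><span class="author">user_b</span><p class="text">A second post</p></div>
</body></html>
"""


class PostParser(HTMLParser):
    """Collect the author and text of each <div class="post"> block."""

    def __init__(self):
        super().__init__()
        self.rows = []       # one dict per post found
        self._field = None   # which field the current text belongs to

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "div" and css_class == "post":
            self.rows.append({"author": "", "text": ""})
        elif css_class in ("author", "text") and self.rows:
            self._field = css_class

    def handle_endtag(self, tag):
        if tag in ("span", "p"):
            self._field = None

    def handle_data(self, data):
        # Only keep text that sits inside an author or text element.
        if self._field and self.rows:
            self.rows[-1][self._field] += data


def html_to_csv(html):
    """Parse the HTML and return (rows, CSV text)."""
    parser = PostParser()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["author", "text"])
    writer.writeheader()
    writer.writerows(parser.rows)
    return parser.rows, buffer.getvalue()


rows, csv_text = html_to_csv(SAMPLE_HTML)
print(csv_text)
```

The CSV text is built in memory to keep the example self-contained. To save a file instead, the same `csv.DictWriter` can write to a file opened with `open("posts.csv", "w", newline="")`.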

Web scraping is often used when API access doesn't allow for the collection of desired data, or doesn't exist at all. It enables you to collect any information presented on the platform's web pages. However, this is often frowned upon by the platforms and may be grounds for account termination under TOS.

Manual collection

In situations where platforms restrict API access and/or web scraping, or if your planned dataset is small enough to be practical without automated collection methods, you may want to collect the data the "old fashioned way", by hand. This may include downloading videos or images, screenshotting or making PDFs out of websites, or any other method of capturing content that is applicable for your project. A few tools for this sort of data collection can be found elsewhere in this subject guide.

You can observe, take notes, and record information manually about content on social media platforms without running afoul of the TOS. The same sort of rules apply to observation of people in the real world - if you're observing something private, like posts on a private account or direct messages and emails, you will need consent of the users and potentially Human Subjects research clearance by AU's Institutional Review Board. Classroom projects and student work like SIS SRPs do not need IRB approval.

Voluntary participation and debriefing

In many cases, social media study participants are never aware that their data has been collected and used for academic research at all. While the EU GDPR requires commercial data harvesters to notify users and to have systems through which users can request a copy of their personal data and have it deleted by the collector, there are no regulations like this for research.

Participant debriefing is a common part of in-person human subjects research. After the data collection interaction, the researcher explains to the participant the purpose of the study, the researchers' hypotheses, and how the participant is contributing to it. This is, of course, difficult to do when you are collecting information from potentially tens of thousands of social media users!

Recently, there has been an attempt to tackle this challenge through the creation of software called Bartleby, which provides an automated debriefing and opt-out process for Twitter and Reddit. Researchers interested in adding automatic debriefing and/or giving their participants the option to opt out of the study may want to read the accompanying blog post from Citizens and Tech or the open access journal article to learn more about large-scale social media study debriefing and how they can make use of Bartleby to accomplish it.