Subject Guides: Social Media Research: Collected tools

General (non-platform-specific) tools

Social Media Archive @ ICPSR (SOMAR) is a centralized repository for social media research data. SOMAR contains a wide range of data collected from large-scale social media platforms such as X (formerly Twitter), Facebook, Instagram, and Reddit, as well as smaller, more specialized data sets focused on specific research topics. These data sets have been collected and curated by researchers from around the world, and they cover various topics such as political communication, online behavior, and social networks. The data in our archive takes two forms. They can be public, i.e., available for immediate download, and/or restricted, i.e., available within our secure data enclave after receiving approval following a submitted restricted data application.

Tools for qualitative or quantitative analysis, text mining, content analysis, or other methods that aren't specific to individual platforms.

NVivo is the most popular qualitative research software. It can handle an astonishingly wide array of file formats, including text, audio, images, and video. Like many of its competitors, it requires a paid license, but you can get access as an AU student, staff, or faculty member. Please see: Where can I download NVivo and get help using it?
- Competitors include MAXQDA, QDA Miner, and Quirkos. All are paid, and AU does not provide licenses.
- NVivo's accompanying data collection tool, the browser add-on NCapture, can be used with several social media sites to manually collect data. The NVivo Help site has a page on how NVivo handles social media data.
Taguette is free and open source software for qualitative coding in text sources. It runs in your browser or can be run locally. It's lightweight and easy, but it has fewer features than NVivo and supports fewer file types.
Voyant is a browser-based open source environment for text reading and analysis. It can perform quantitative tasks like word frequency counts and can also visualize many different types of relationships within the text corpus.
- There are several similar open source text analysis tools, like Libro, AntConc, and KHCoder, but Voyant is browser-based and simple to use.
oTranscribe is a free and open source audio transcription platform that helps you type transcriptions of audio and video files while they play, either in your browser or in the app if you choose to download it.
Tropy is a free and open source image description desktop app. It allows you to organize, describe, and add metadata to images. It was designed to be used with scanned images from archival document collections, but can be used for other types of images.
OpenRefine is a popular and powerful tool for cleaning and reformatting data. Desktop app, small learning curve, great documentation and tutorials.
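
For readers curious about what a word frequency count (one of the quantitative tasks mentioned for Voyant above) involves, here is a minimal Python sketch. It is not how Voyant or any other tool on this list is implemented. The tokenizing rule and the tiny stop-word list are illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count how often each word appears, ignoring case and stop words."""
    # Lowercase the text and keep runs of letters, digits, and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(word for word in words if word not in stop_words)


sample = "The archive is open. The archive is free, and the data is public."
for word, count in word_frequencies(sample).most_common(3):
    print(word, count)
```

Dedicated tools add much more on top of this (stemming, larger stop-word lists, visualizations), so treat the sketch only as a way to see what the basic count means.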

Tools for Twitter research

Netlytic
Netlytic is a browser-based social media research tool that has text mining and network visualization features. Works with Twitter, YouTube, RSS feeds, and Reddit. Free accounts are sufficient for most student purposes. Netlytic has a YouTube channel with demonstrations for a variety of types of project.
Chorus
Self-contained collection-to-visualization Twitter research app. Requires downloads and a fair bit of configuration. It was created specifically for Twitter analysis in social science research and is a fairly comprehensive search and analysis tool.
Mozdeh
Mozdeh is free and open source software for quantitative social media analysis that can also collect tweets, like Netlytic or Chorus. It works with the same sources as Netlytic: tweets, YouTube comments, Reddit comments, and manually imported data. Unlike Netlytic, it is a desktop app. It also has a YouTube channel where you can find guides to collecting and analyzing data.
TAGS (Twitter Archiving Google Sheets)
A fairly straightforward way to get Tweets into Google Sheets to be loaded into whatever qualitative or quantitative analysis software you desire.
Tweet Archiver
A cousin to TAGS, it allows you to create queries and capture corresponding Tweets as they happen.
DocNow App
Twitter dataset collection tool created by Documenting the Now. There is a browser-based version for trying it out. Will additionally need the DocNow Hydrator installed. You can find out more at docnow.io
NCapture
NCapture is a Chrome extension that allows users to manually create text datasets from Facebook, Twitter, and YouTube, as well as capture YouTube videos for analysis in NVivo. *Only able to be used with NVivo - not an option otherwise*.

Please see the "Collected Tools" page of this guide for more information about NVivo.

twarc
twarc is a command-line utility and Python package for the collection and rehydration of Twitter datasets. It was created by Documenting the Now (DocNow), who are also responsible for the DocNow App, Hydrator, and other tools. You can find out more at docnow.io
Twitter's searchtweets Python package
A Twitter-provided Python library for working with the Twitter API in command line or scripts.
Reaper
Reaper, built on the socialreaper Python library, is a desktop app with no coding required. While it calls what it does "scraping", it makes use of site APIs and the user will need to register for an API key for any site they want to use Reaper on. This includes Facebook, Twitter, Reddit, YouTube, Tumblr, and Pinterest. It outputs all data as .csv tabular files.
4CAT
4CAT is a relatively advanced tool for the collection and analysis of social media data - it's best run on a UNIX server and has dependencies that it does not automatically install itself - but with the upside that it has modules built to work with important but niche platforms like 4chan, 8kun, Parler, and more, as well as Twitter and Reddit.

Tools for Facebook research

Reaper
Reaper, built on the socialreaper Python library, is a desktop app with no coding required. While it calls what it does "scraping", it makes use of site APIs and the user will need to register for an API key for any site they want to use Reaper on. This includes Facebook, Twitter, Reddit, YouTube, Tumblr, and Pinterest. It outputs all data as .csv tabular files.
NCapture
NCapture is a Chrome extension that allows users to manually create text datasets from Facebook, Twitter, and YouTube, as well as capture YouTube videos for analysis in NVivo. *Only able to be used with NVivo - not an option otherwise*.

Voxgov
Voxgov provides real-time access to social media feeds, press releases, publications, and documents from the federal government. It has built-in features for working with government social media, including creating .csv datasets from search results, graphing how frequently your search terms were posted by government accounts, finding the most commonly associated terms or people, and more! *AU community only*

Tools for Reddit research

Netlytic
Netlytic is a browser-based social media research tool that has text mining and network visualization features. Works with Twitter, YouTube, RSS feeds, and Reddit. Free accounts are sufficient for most student purposes. Netlytic has a YouTube channel with demonstrations for a variety of types of project.
Mozdeh
Mozdeh is free and open source software for quantitative social media analysis that can also collect tweets, like Netlytic or Chorus. It works with the same sources as Netlytic: tweets, YouTube comments, Reddit comments, and manually imported data. Unlike Netlytic, it is a desktop app. It also has a YouTube channel where you can find guides to collecting and analyzing data.
Reaper
Reaper, built on the socialreaper Python library, is a desktop app with no coding required. While it calls what it does "scraping", it makes use of site APIs and the user will need to register for an API key for any site they want to use Reaper on. This includes Facebook, Twitter, Reddit, YouTube, Tumblr, and Pinterest. It outputs all data as .csv tabular files.
PRAW: the Python Reddit API Wrapper
PRAW is a Python library for working with the Reddit API.
4CAT
4CAT is a relatively advanced tool for the collection and analysis of social media data - it's best run on a UNIX server and has dependencies that it does not automatically install itself - but with the upside that it has modules built to work with important but niche platforms like 4chan, 8kun, Parler, and more, as well as Twitter and Reddit.
pushshift.io
Pushshift is a popular wrapper for the Reddit API, typically used with the requests package in Python. Documentation is available on pushshift.io, but tutorials must be found elsewhere.

Here's one tutorial on how to use the pushshift.io wrapper in Python.

Tools for YouTube research

Netlytic
Netlytic is a browser-based social media research tool that has text mining and network visualization features. Works with Twitter, YouTube, RSS feeds, and Reddit. Free accounts are sufficient for most student purposes. Netlytic has a YouTube channel with demonstrations for a variety of types of project.
Mozdeh
Mozdeh is free and open source software for quantitative social media analysis that can also collect tweets, like Netlytic or Chorus. It works with the same sources as Netlytic: tweets, YouTube comments, Reddit comments, and manually imported data. Unlike Netlytic, it is a desktop app. It also has a YouTube channel where you can find guides to collecting and analyzing data.
Reaper
Reaper, built on the socialreaper Python library, is a desktop app with no coding required. While it calls what it does "scraping", it makes use of site APIs and the user will need to register for an API key for any site they want to use Reaper on. This includes Facebook, Twitter, Reddit, YouTube, Tumblr, and Pinterest. It outputs all data as .csv tabular files.
NCapture
NCapture is a Chrome extension that allows users to manually create text datasets from Facebook, Twitter, and YouTube, as well as capture YouTube videos for analysis in NVivo. *Only able to be used with NVivo - not an option otherwise*.

Voxgov
Voxgov provides real-time access to social media feeds, press releases, publications, and documents from the federal government. It has built-in features for working with government social media, including creating .csv datasets from search results, graphing how frequently your search terms were used by government accounts, finding the most commonly associated terms or people, and more! *AU community only*

Tools for TikTok research

TikTok, as the newest platform on this list, has not had the benefit of a decade or more of academic study and the development of research tools. Additionally, with virtually no useful public API, automated data collection necessitates violating the TOS with web scraping or complex browser-simulating reverse-engineered calls to TikTok's private API.

Like Instagram, at this time (05/2022) any tools for automated extraction of videos, or text accompanying videos, are likely to violate TOS. Additionally, the ones that do exist require at least some coding or command line knowledge. The automated tools on this list - drawrowfly's TikTok scraper and the Bellingcat Hashtag Analysis Toolset built on top of it - are provided as an example of this sort of tool, but they are complex, risk a ban from TikTok, and require coding or command line knowledge. Researchers are encouraged to explore manual collection methods.

Bellingcat: Investigate TikTok Like A Pro!
A guide from open source intelligence journalism outlet Bellingcat on manual data collection and search techniques on TikTok.
Zeeschuimer
A Firefox browser extension that helps with manual data collection by recording information from posts on Instagram and TikTok as you scroll through them. Exports as a .json file or directly to 4CAT. Captures a large amount of metadata and will need cleaning or loading into 4CAT to be easily readable. Does not capture images or videos. Does not violate the TOS.
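
If you would rather clean an exported file yourself than load it into 4CAT, the following Python sketch shows one possible approach: flattening nested JSON records into a table. It rests on assumptions that are not documented in this guide. It assumes the export holds one JSON object per line, and the field names in the usage example are hypothetical rather than Zeeschuimer's actual field names. Check your own export before relying on it.

```python
import csv
import io
import json


def flatten(record, prefix=""):
    """Turn nested dictionaries into one flat dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, e.g. author -> author.name
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            # Keep lists as JSON text so that no information is lost.
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def ndjson_to_csv(ndjson_text):
    """Convert line-delimited JSON text (an ASSUMED format) to CSV text."""
    rows = [flatten(json.loads(line))
            for line in ndjson_text.splitlines() if line.strip()]
    # Use every key seen in any record as a column, in sorted order.
    fieldnames = sorted({key for row in rows for key in row})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)  # keys missing from a record become empty cells
    return out.getvalue()


# Hypothetical records; real exports contain far more fields.
example = '{"id": "1", "author": {"name": "a"}}\n{"id": "2"}\n'
print(ndjson_to_csv(example))
```

The resulting .csv text can be saved to a file and opened in a spreadsheet, OpenRefine, or most analysis software.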
4CAT
4CAT is a relatively advanced tool for the collection and analysis of social media data - it's best run on a UNIX server and has dependencies that it does not automatically install itself - but with the upside that it has modules built to work with important but niche platforms like 4chan, 8kun, Parler, and more, as well as Twitter and Reddit. Works with TikTok metadata collected via Zeeschuimer.
Bellingcat's TikTok Timestamp date extractor
Very simple site that decodes TikTok video URLs to get the actual creation date, which is not otherwise shown. Does not violate the TOS.
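
The kind of decoding described above can be sketched in a few lines of Python. This is not the Bellingcat tool's code. It assumes, based on publicly described behavior rather than any official TikTok documentation, that the bits of the numeric video ID above the lowest 32 hold a Unix timestamp in seconds. The URL in the usage example is hypothetical and built from a synthetic ID.

```python
import re
from datetime import datetime, timezone


def tiktok_created_at(url_or_id):
    """Estimate a video's creation time (UTC) from its numeric ID.

    ASSUMPTION: shifting the ID right by 32 bits yields a Unix
    timestamp in seconds. This layout is not officially documented.
    """
    text = str(url_or_id)
    # Accept either a full URL containing /video/<digits> or a bare ID.
    match = re.search(r"/video/(\d+)", text) or re.fullmatch(r"(\d+)", text)
    if match is None:
        raise ValueError("no numeric video ID found")
    video_id = int(match.group(1))
    return datetime.fromtimestamp(video_id >> 32, tz=timezone.utc)


# Synthetic ID built from a known timestamp, purely for illustration.
synthetic_id = (1_600_000_000 << 32) | 12345
url = f"https://www.tiktok.com/@example/video/{synthetic_id}"
print(tiktok_created_at(url).isoformat())
```

Because the ID layout is an assumption, spot-check results against a video whose posting date you already know.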
drawrowfly's TikTok Scraper
A powerful node.js application that uses unofficial API calls as well as web scraping to download videos, post text, and user/video/hashtag metadata. Violates the TOS!
Bellingcat's TikTok Hashtag Analysis Toolset
Builds on drawrowfly's node.js application to help identify co-occurrences of hashtags. The link points to a post on the Bellingcat site giving examples of the toolset's use.
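
To illustrate what hashtag co-occurrence means, here is a minimal Python sketch that counts how often pairs of hashtags appear together in post text you have already collected, for example by manual methods. It is not the Bellingcat toolset's code and does no collecting of its own. The function name and sample captions are made up for this example.

```python
import re
from collections import Counter
from itertools import combinations


def hashtag_pairs(captions):
    """Count how many captions contain each pair of hashtags."""
    counts = Counter()
    for caption in captions:
        # Unique, lowercased hashtags in this caption, sorted so that
        # each pair is always stored in the same order.
        tags = sorted({tag.lower() for tag in re.findall(r"#(\w+)", caption)})
        counts.update(combinations(tags, 2))
    return counts


# Made-up captions for illustration only.
captions = ["#cats #dogs", "Look! #Dogs #cats #birds", "#cats"]
for pair, count in hashtag_pairs(captions).most_common():
    print(pair, count)
```

Pairs with high counts suggest hashtags that tend to be used together, which can point to related communities or campaigns.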