Social Media Archive @ ICPSR (SOMAR) is a centralized repository for social media research data. SOMAR contains a wide range of data collected from large-scale social media platforms such as X (formerly Twitter), Facebook, Instagram, and Reddit, as well as smaller, more specialized data sets focused on specific research topics. These data sets have been collected and curated by researchers from around the world, and they cover topics such as political communication, online behavior, and social networks. Data in our archive take two forms, and a data set may include either or both: public data, which are available for immediate download, and restricted data, which are available within our secure data enclave once a submitted restricted data application has been approved.
Tools for qualitative or quantitative analysis, text mining, content analysis, or other methods that aren't specific to individual platforms.
TikTok, as the newest platform on this list, has not had the benefit of a decade or more of academic study and the development of research tools. Additionally, because TikTok offers virtually no useful public API, automated data collection requires either web scraping or complex, reverse-engineered, browser-simulating calls to TikTok's private API, both of which violate the TOS.
As with Instagram, at this time (05/2022) any tools for automated extraction of videos, or of the text accompanying videos, are likely to violate the TOS. The automated tools on this list - drawrowfly's TikTok scraper and the Bellingcat Hashtag Analysis Toolset built on top of it - are provided as examples of this sort of tool, but they are complex, risk a ban from TikTok, and require at least some coding or command line knowledge. Researchers are encouraged to explore manual collection methods.