Social media research uses user-generated social media content and interactions as the subject of research. The tutorials below cover the methods and important considerations of social media research, as well as a tool that supports this kind of research.
Text mining uses software to process and analyze large sets of unstructured texts to identify patterns and connections. These resources outline the basics of what text mining is, common approaches, and resources that you can use to conduct this type of analysis.
Note: Not all online resources allow text mining, and projects of this type may have legal or ethical considerations to take into account. In addition, not all library-licensed materials allow use of Artificial Intelligence (AI) tools for text analysis.
Google Ngram Viewer is a free, beginner-friendly text-searching tool that lets you graph how often words and phrases occur in texts digitized in Google Books. It's easy to use and can be a great place to start refining your research questions and previewing the possibilities of large-scale text analysis.
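To make the idea of graphing word occurrences over time concrete, here is a minimal Python sketch that computes how often a term appears per year, relative to all words from that year. It is not how Google Ngram Viewer is implemented; the `sample` texts, the `relative_frequency_by_year` function, and the simple tokenizer are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequency_by_year(texts_by_year, term):
    """Return {year: share of that year's tokens equal to `term`}."""
    term = term.lower()
    result = {}
    for year, text in sorted(texts_by_year.items()):
        tokens = tokenize(text)
        count = Counter(tokens)[term]
        # Normalize by the year's total so large and small years are comparable.
        result[year] = count / len(tokens) if tokens else 0.0
    return result


# Hypothetical miniature corpus: one short text per year.
sample = {
    1900: "the telegraph carried the news and the telegraph was fast",
    1950: "the telephone replaced the telegraph in most homes",
    2000: "the internet carried the news",
}

freqs = relative_frequency_by_year(sample, "telegraph")
for year, share in freqs.items():
    print(year, round(share, 3))
```

Dividing by the total number of words per year, rather than reporting raw counts, keeps years with more text from dominating the trend.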
Voyant Tools is a web-based reading and analysis tool for digital texts. Voyant takes your texts, creates a corpus, and calculates word frequencies, correlations between sets of words, commonly repeated phrases, topic clusters, and other analyses of interest to researchers. You can type in multiple URLs, paste in full text, or upload your own files for analysis.
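For readers curious about what word frequencies and repeated phrases look like under the hood, the following is a rough Python sketch using only the standard library. It does not reproduce Voyant's own methods; the two-sentence `corpus`, the function names, and the choice of two-word phrases are assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(documents):
    """Count every word across all documents in the corpus."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts


def repeated_phrases(documents, n=2, min_count=2):
    """Return n-word phrases that occur at least `min_count` times."""
    phrases = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        # Slide a window of n tokens across each document separately,
        # so phrases never span a document boundary.
        for i in range(len(tokens) - n + 1):
            phrases[" ".join(tokens[i:i + n])] += 1
    return {p: c for p, c in phrases.items() if c >= min_count}


# Hypothetical two-document corpus.
corpus = [
    "Text mining finds patterns in text.",
    "Text mining needs clean text and patient readers.",
]

word_counts = word_frequencies(corpus)
phrases = repeated_phrases(corpus)

print(word_counts.most_common(3))
print(phrases)
```

A real project would usually also remove very common words such as "the" and "in" before counting, since they tend to crowd out more meaningful terms.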
HathiTrust Research Center Analytics is a tool that can perform large-scale text analysis on materials in the HathiTrust Digital Library, which is home to millions of digitized books and publications that you can use to gather a set of texts for your text mining research. The resources below provide information about how to use the HathiTrust Research Center, which you have access to through AU. Note: HathiTrust Research Center will no longer be funded as of the end of 2026, but the team hopes to continue to offer services to the research community.
AntConc is a freely available concordancing program designed by Laurence Anthony. It can take corpora, analyze clusters of words and/or phrases, and highlight structures and contexts. It works with .txt files and can be downloaded for Windows, Mac, and Linux.
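A concordancer's core output is a keyword-in-context (KWIC) listing: each occurrence of a search term shown with a few words on either side. The Python sketch below illustrates that idea only; it is not AntConc's code, and the `kwic` function, its `window` parameter, and the sample sentence are hypothetical.

```python
import re


def kwic(text, keyword, window=3):
    """Return (left context, match, right context) for each hit of `keyword`."""
    tokens = re.findall(r"\w+", text)
    keyword = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword:
            # Take up to `window` words on each side of the match.
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines


# Hypothetical sample text; in practice this could be read from a .txt file.
sample_text = (
    "The whale surfaced at dawn. Sailors feared the whale, "
    "yet the whale ignored them."
)

hits = kwic(sample_text, "whale", window=2)
for left, match, right in hits:
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>12} | {match} | {right}")
```

Lining the keyword up in a single column makes it easier to scan for recurring structures around the term.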
NVivo is a research tool for coding data from a variety of sources, including text-based documents, interviews, surveys, maps, audio/video files, and social media data, and then automating a variety of qualitative and quantitative research analyses for those sources. NVivo can be accessed in the Anderson Computing Complex on campus.
R is an open-source software program for statistical analysis that has the ability to do some text-mining analysis as well. The resource below showcases how to use R for this purpose.
MALLET is a well-respected toolkit for topic modeling, which explores relationships between words and topics within large corpora. A Python wrapper is available if you prefer to work in Python.
These resources may be text and data mined for academic scholarship or educational purposes. Each has its own license agreement and text and data mining terms, so review them before you begin collecting data. Many also prohibit automated mining with Artificial Intelligence (AI) tools.