In addition to the TOS, X publishes extensive guidance on non-commercial use of the API, including specific guidance on what not to do with it.
Things that are restricted or forbidden include:
Many tools for collecting X datasets require you to sign up for a developer account and obtain an API key before you can access the API services. You can do this on the X Developer site.
The standard API is also known as the "streaming" API because it provides access to new posts as they are made, but not to old posts. It generally returns posts that are less than a week old, so many collection tools are meant to be set up and left running while a conversation is ongoing, collecting posts as they appear over a period of time. There is also the question of how comprehensive the X search system is: it is a black-box algorithm that is expected to favor popular posts over complete recall.
There is a specific elevated API access level for academic research that advanced social media researchers may want to look into, as it allows retrieval of more Tweets, including older archived Tweets. You can find out more about X features for academic research on the corresponding page.
Twitter has been extensively studied for over a decade, and there are several existing historical datasets of Tweets for use in research. These can be found in repositories across the web.
It's important to note that the Twitter TOS restricts the sharing of full Tweets in datasets. Most of these datasets therefore consist only of Tweet IDs, and the associated text and metadata must be retrieved separately. This process is called "hydration", and there are a few apps that will do it for you, such as the DocNow Hydrator.
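To make the idea of hydration more concrete, here is a minimal Python sketch of the first step a hydration tool might perform: splitting a list of Tweet IDs into batches and building one lookup request per batch. It rests on assumptions that are not stated in this guide: that the v2 Tweet lookup endpoint at https://api.twitter.com/2/tweets is used, that it accepts up to 100 IDs per request, and that the field names shown are the ones you want. The helper names batch_ids and build_lookup_url are hypothetical. The sketch only builds URLs; it does not contact the API or show how a given tool such as the Hydrator works internally.

```python
from urllib.parse import urlencode

# Assumption: v2 Tweet lookup endpoint; check the current developer docs.
LOOKUP_URL = "https://api.twitter.com/2/tweets"
# Assumption: the endpoint accepts at most 100 IDs per request.
BATCH_SIZE = 100


def batch_ids(tweet_ids, size=BATCH_SIZE):
    """Yield successive lists of at most `size` Tweet IDs."""
    for start in range(0, len(tweet_ids), size):
        yield tweet_ids[start:start + size]


def build_lookup_url(batch, fields=("created_at", "author_id")):
    """Build the lookup URL for one batch of Tweet IDs (given as strings)."""
    query = urlencode({
        "ids": ",".join(batch),
        "tweet.fields": ",".join(fields),  # assumed metadata fields
    })
    return f"{LOOKUP_URL}?{query}"


if __name__ == "__main__":
    # Placeholder IDs for illustration only; a real dataset supplies these.
    ids = [str(n) for n in range(1, 251)]
    for batch in batch_ids(ids):
        print(len(batch), build_lookup_url(batch)[:60])
```

Actually sending these requests would also require the API key from your developer account, typically passed as a bearer token in the Authorization header, and Tweets that have since been deleted or made private would likely be missing from the responses.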
Existing Twitter datasets: