Research data across the lifecycle

An overview of considerations, resources and tools for working with data in your research projects

Subject Guide Author

Stefan Kramer
Associate Director for Research Data Services

ORCID ID: 0000-0001-5795-7629

Welcome to this guide, which is meant to help you work with data in your research!

Nowadays, many things are referred to as "data" - people may use the word when they mean "statistics," or "some factual/numeric information," or even just "something in digital format" - so it will be helpful to draw some boundaries and clarify some definitions first.

  • This subject guide is about data that can be the input to, or the output of, academic research - we'll call that "research data" for short.
     
  • On the input side, data can come from a variety of sources - other academic research, of course, but also agencies or companies that collect data for purposes other than academic research.
     
  • The output of academic research also includes publications or other documents, such as journal articles, conference papers, books, dissertations, theses, reports, etc. (which, in most empirical sciences, are at least in part based on research data).  This subject guide is not about that kind of output - we have many other subject guides for that.  
 

The purpose of this guide, then, is:

  • primarily, to help you make the data that you generate in the course of your research discoverable and reusable by others, publish or share it, protect it as may be necessary or required, and preserve it for posterity;
     
  • secondarily, to help you locate and utilize data that you may need for your empirical research.  We already have a number of other guides for finding resources that include data suitable as research input, including on statistical information, on country rankings, on Geographic Information Systems & Cartography, and on polls and public opinion.  Also, on our list of databases, those that contain downloadable, numeric data that may lend itself to analysis are identified with the "downloadable numeric data" icon.

 

What do we mean by "research data"?

What we mean by "research data" warrants some deliberation, to delineate it from other types of information, as mentioned above.  As the Australian National Data Service puts it: "Providing an authoritative definition of research data is challenging, as any definition is likely to depend on the context in which the question is asked." That is true!  

  • In the USA, one "official" definition comes from Circular A-110 by the Office of Management and Budget: "Research data is defined as the recorded factual material commonly accepted in the scientific community as necessary to validate research findings, but not any of the following: preliminary analyses, drafts of scientific papers, plans for future research, peer reviews, or communications with colleagues."

  • Research data can take many formats – for example, sound recordings of interviews from a population sample can be research data; text documents containing poems can be research data for an analysis of word patterns; images of the universe can be research data for an analysis of the movements of celestial bodies; a spreadsheet or database containing the recorded results of a telephone survey can be research data.  Based on its author's background, an emphasis in this guide will be on quantitative data in the social and allied sciences.

So, what is not research data?  Well, we would typically not call a journal article that presents scientific findings, including results from data analyses, "research data."  Nor would we refer to a statistical table that summarizes and presents data analyses as "research data."  Usually, we think of research data as a body of information - numerical datasets, images, sounds, etc. - that is so large and/or complex that making sense of it requires computer-aided analysis.  A poster, article, presentation, or podcast, by contrast, is intended to be comprehended by viewing or hearing it.

And yet, the water gets muddy again ... a whole lot of journal articles, or a whole lot of (or large enough) statistical tables, can themselves become input for (meta-)analysis.  So the output of research, in different formats and at different levels of information aggregation, can become the input of other data analysis ... we can therefore think of a lifecycle of research data.