Before we understand what integrity might mean in this context, we must first ask: What Is Research Data? Research data is the information, records, and files that are collected or used during the research process. Data may be numerical, descriptive, visual, raw, analysed, experimental, or observational.
Examples of research data include survey responses, spreadsheets, interview transcripts and recordings, images, lab notebooks, and software code.
Grant and funding bodies require research data to be managed throughout its lifecycle. You may also need to provide the data, or information about it, in other contexts: some journals require it as a condition of publication, for example, and it may be needed to support a patent application.
Creating a research data management plan at the start of the research project is the simplest way to save time in the collection, description, analysis, and reuse of the data. Effective management and documentation of research data means you can verify your research results, replicate the research, and provide access to data.
Consider the case of Gregg Semenza, winner of the 2019 Nobel Prize in Physiology or Medicine.
"Semenza shared his 2019 Nobel Prize with two other researchers, but only his papers keep popping up on PubPeer bearing telltale signs of data fakery. There are some recurrent author names suggesting naughty mentees or collaborators, but still, in many cases, Semenza is the last and corresponding author, so the final responsibility is his. After all, Nobel Prize recognition comes from that same last authorship."
For more examples of data fakery, contested research, and the necessity of data reproducibility, see the article quoted above, "Gregg Semenza: real Nobel Prize and unreal research data". As the piece details, misrepresentation of research data, together with the benefits and rewards that flow from the resulting publications, goes to the heart of integrity and trustworthy research. An inability to substantiate your findings is disastrous for the publication and verifiability of research, as discussed in the Reproducibility section below.
Not only is a clear and consistent methodology important in research, but easily available, reliable, and honestly presented data is crucial. Research without an integrity-driven research data practice is a house built on sand.
The FAIR Data Principles (Findable, Accessible, Interoperable, Reusable) were drafted at a Lorentz Center workshop in Leiden, the Netherlands, in 2015, and have since been recognised by organisations worldwide, including FORCE11, the National Institutes of Health (NIH), and the European Commission, as a useful framework for sharing data in a way that enables maximum use and reuse. They are a way of getting the most out of your research data, and of thinking about its place in the wider research community.
Can your data be found if someone is looking for it? Does it have a DOI or a Handle? Does it have rich metadata? Is it discoverable through a research portal or a repository?
Does your data use a standardised access protocol? Your data does not necessarily have to be "open" - there are sometimes good reasons why data cannot be made open, e.g. privacy concerns, national security, or commercial interests - but if it is not, there should be clarity and transparency around the conditions governing access and reuse.
To be interoperable, the data will need to use community-agreed formats, languages, and vocabularies. Will someone who finds your data be able to meaningfully reuse it, and build on or reproduce your work? The metadata you use will also need to follow community-agreed standards and vocabularies, and contain links to related information using identifiers.
Reusable data should maintain its initial richness. For example, it should not be diminished for the purpose of explaining the findings in one particular publication. It needs a clear, machine-readable licence and provenance information on how the data was formed. It should also follow discipline-specific data and metadata standards, giving it the rich contextual information that allows for reuse.
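The four principles can be made concrete in a single metadata record attached to a dataset. The sketch below is illustrative only: field names are loosely modeled on the DataCite metadata schema, and every value (DOI, names, provenance note) is hypothetical.

```python
import json

# A minimal, FAIR-style metadata record. Field names loosely follow the
# DataCite schema; all values are hypothetical, for illustration only.
record = {
    # Findable: a persistent identifier and a descriptive title
    "identifier": {"identifierType": "DOI", "identifier": "10.0000/example.123"},
    "titles": [{"title": "Example survey dataset"}],
    # Linked to related information via identifiers (here, an ORCID)
    "creators": [{"name": "Researcher, Example",
                  "nameIdentifier": "https://orcid.org/0000-0000-0000-0000"}],
    # Accessible: clear, transparent conditions governing access
    "accessConditions": "Open access",
    # Interoperable: a community-agreed format
    "formats": ["text/csv"],
    # Reusable: a machine-readable licence and provenance information
    "rights": {"rights": "CC BY 4.0",
               "rightsURI": "https://creativecommons.org/licenses/by/4.0/"},
    "provenance": "Collected via questionnaire, 2020; cleaned and anonymised",
}

# Serialised as JSON, the record is both human- and machine-readable.
print(json.dumps(record, indent=2))
```

A record like this is what a repository indexes to make the dataset discoverable, and what a future reuser reads to decide whether the data fits their purpose.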
Reproducibility is a crucial consideration in research, and it depends on data managed in exactly this way: in the words of the Australian National Data Service, data should maintain its initial richness and should not be diminished for the purpose of explaining the findings in one particular publication.
In his journal article "No raw data, no science: another possible source of the reproducibility crisis", Molecular Brain editor-in-chief Tsuyoshi Miyakawa argues that "inappropriate practices of science, such as HARKing, p-hacking, and selective reporting of positive results, have been suggested as causes of irreproducibility", but "a lack of raw data or data fabrication is another possible cause of irreproducibility".
Miyakawa analyses the often parlous state of data availability and reproducibility in his field, where nearly a quarter of submissions since 2017 were marked as "revise before review". Of these 41 manuscripts, 21 were withdrawn owing to an inability to provide raw data, while another 19 were rejected because the raw data supplied was insufficient. Thus, Miyakawa concludes:
"more than 97% of the 41 manuscripts did not present the raw data supporting their results when requested by an editor, suggesting a possibility that the raw data did not exist from the beginning, at least in some portions of these cases. Considering that any scientific study should be based on raw data, and that data storage space should no longer be a challenge, journals, in principle, should try to have their authors publicize raw data in a public database or journal site upon the publication of the paper to increase reproducibility of the published results and to increase public trust in science."
The reproducibility crisis is not limited to the sciences, though they are perhaps the best lens through which to view it. In Nature Human Behaviour's collective "A manifesto for reproducible science", the dangers of falsity in data collection, retention, and reuse are bluntly stated:
"What proportion of published research is likely to be false? Low sample size, small effect sizes, data dredging (also known as P-hacking), conflicts of interest, large numbers of scientists working competitively in silos without combining their efforts, and so on, may conspire to dramatically increase the probability that a published finding is incorrect."
As the manifesto concludes:
"These cautions are not a rationale for inaction. Reproducible research practices are at the heart of sound research and integral to the scientific method. How best to achieve rigorous and efficient knowledge accumulation is a scientific question; the most effective solutions will be identified by a combination of brilliant hypothesizing and blind luck, by iterative examination of the effectiveness of each change, and by a winnowing of many possibilities to the broadly enacted few. True understanding of how best to structure and incentivize science will emerge slowly and will never be finished. That is how science works. The key to fostering a robust metascience that evaluates and improves practices is that the stakeholders of science must not embrace the status quo, but instead pursue self-examination continuously for improvement and self-correction of the scientific process itself.
As Richard Feynman said, “The first principle is that you must not fool yourself – and you are the easiest person to fool.”
Figshare is a best-in-class data publishing platform for RMIT researchers and Higher Degree Research students to store, manage, share, and discover research.
Figshare offers a range of benefits for managing and sharing your research data, some of which are outlined below.
For an overview of the Open Data landscape, see Figshare's annual State of Open Data 2021 report, produced in collaboration with Digital Science and Springer Nature.
Figshare offers not only visibility for your research data but also security. When it comes to verification, provability, and the FAIR data principles, data repositories like Figshare are crucial in supporting your publications and securing the integrity of your research long into the future. Figshare provides version control, allowing you to track changes to your data post-publication, and all work is stored on Amazon AWS S3 servers located in Australia that perform regular, systematic data integrity checks.
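Integrity checks of this kind are typically based on comparing a cryptographic checksum of the stored file against the value recorded at deposit time. A minimal sketch of the idea (the choice of SHA-256 and the sample data are illustrative, not a description of Figshare's actual implementation):

```python
import hashlib

def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()

# At deposit time: record a checksum alongside the dataset.
deposited = b"participant_id,score\n001,42\n002,37\n"
recorded = checksum(deposited)

# Later, a systematic integrity check re-reads the file, recomputes the
# checksum, and compares it with the recorded value; any mismatch
# indicates corruption or alteration of the data.
retrieved = deposited  # in practice, re-read from storage
assert checksum(retrieved) == recorded
print("integrity check passed")
```

The same comparison supports verification by third parties: anyone holding a copy of the data can recompute the checksum and confirm it matches the published record.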