Research integrity

A guide outlining Library support available to Researchers and Higher Degree by Research students on aspects of research integrity.

Integrity in research data

What is research data?

Before we understand what integrity might mean in this context, we must first ask: What is research data? Research data is the information, records, and files that are collected or used during the research process. Data may be numerical, descriptive, visual, raw, analysed, experimental, or observational.

Some examples of research data include:

  • laboratory notebooks
  • field notebooks
  • primary research data from your experiments, field observations, questionnaires, focus groups and surveys
  • sound and video recordings
  • photographs
  • models
  • artefacts from an archaeological dig
  • computer code

Grant and funding bodies require research data to be managed throughout its lifecycle. You may also need to provide information about the data, or the data itself: for example, some journals require it, or you may want to patent an invention.

Creating a research data management plan at the start of the research project is the simplest way to save time in the collection, description, analysis, and reuse of the data. Effective management and documentation of research data means you can verify your research results, replicate the research, and provide access to data.

Why is integrity so crucial with research data?

Consider the case of Gregg Semenza, winner of the 2019 Nobel Prize in Physiology or Medicine.

Semenza shared his 2019 Nobel Prize with two other researchers, but only his papers keep popping up on PubPeer bearing telltale signs of data fakery. There are some recurrent author names suggesting naughty mentees or collaborators, but still, in many cases, Semenza is the last and corresponding author, so the final responsibility is his. After all, Nobel Prize recognition comes from that same last authorship.

For more examples of data fakery, contested research, and the necessity of data reproducibility, see the article quoted above, Gregg Semenza: real Nobel Prize and unreal research data. As detailed in the piece, misrepresentation of research data, and the benefits and rewards that come with the resulting published findings, goes to the heart of integrity and trustworthy research. An inability to substantiate your findings is disastrous for the publication and verifiability of research, as detailed in the Reproducibility section below.

A clear and consistent methodology is important in research, but easily available, reliable, and honestly presented data is just as crucial. Research without an integrity-driven research data practice is a house built on sand.

Reference

Schneider, L. (2020). Gregg Semenza: real Nobel Prize and unreal research data. https://forbetterscience.com/2020/10/07/gregg-semenza-real-nobel-prize-and-unreal-research-data/

FAIR data principles

The FAIR Data Principles (Findable, Accessible, Interoperable, Reusable) were drafted at a Lorentz Center workshop in Leiden in the Netherlands in 2015. They have since been recognised worldwide by organisations including FORCE11, the National Institutes of Health (NIH), and the European Commission as a useful framework for sharing data in a way that enables maximum use and reuse. They are a way of thinking about getting the most out of your research data and its place in the wider research community.

Findable

Can your data be found if someone is looking for it? Does it have a DOI or a Handle? Does it have rich metadata? Is it discoverable through a research portal or a repository? 

Accessible

Can your data be retrieved using a standardised protocol? Your data does not necessarily have to be "open". There are sometimes good reasons why data cannot be made open, e.g. privacy concerns, national security, or commercial interests. If it is not open, there should be clarity and transparency around the conditions governing access and reuse.

Interoperable

To be interoperable, the data will need to use community-agreed formats, language, and vocabularies. Will someone who finds your data be able to meaningfully reuse it, and build on or reproduce your work? The metadata will also need to use community-agreed standards and vocabularies, and contain links to related information using identifiers.

Reusable

Reusable data should maintain its initial richness. For example, it should not be diminished for the purpose of explaining the findings in one particular publication. It needs a clear machine-readable licence and provenance information on how the data was formed. It should also follow discipline-specific data and metadata standards, which give it the rich contextual information that allows for reuse.

Reproducibility

Reproducibility is a crucial consideration for research data. To support it, data should maintain its initial richness. If possible, data should have a clear machine-readable licence and provenance information on how the data was formed. It should also follow discipline-specific data and metadata standards, which give it the rich contextual information that allows for reuse.

The Australian Research Data Commons (ARDC) outlines the importance of data provenance metadata, explaining that "data provenance . . . is the documentation of why and how the data was produced, where, when and by whom the data is collected" (ARDC, 2022, para 1).

It is essential to capture data provenance metadata, as it provides details of how the primary data was collected, the methodologies and processes used to extract the data, and how the data was analysed. Data provenance metadata ensures that published data is credible, and it establishes trust in the data for those reusing it.
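
As an illustration only, the provenance details described above (why, how, where, when, and by whom the data was produced) could be recorded in a small machine-readable file stored alongside the data. The sketch below is in Python. The class name, field names, and example values are all hypothetical and do not follow any particular metadata standard; where a discipline-specific schema exists, it should be preferred.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class ProvenanceRecord:
    """Hypothetical provenance fields; not an official ARDC or FAIR schema."""
    why: str       # purpose of the data collection
    how: str       # method or instrument used
    where: str     # collection location
    when: str      # collection date in ISO 8601 form (YYYY-MM-DD)
    by_whom: str   # person or team who collected the data
    processing_steps: List[str] = field(default_factory=list)


def to_json(record: ProvenanceRecord) -> str:
    """Serialise the record so it can be saved next to the data files."""
    return json.dumps(asdict(record), indent=2)


# Example values are invented for illustration.
example = ProvenanceRecord(
    why="Survey of student study habits",
    how="Online questionnaire",
    where="Melbourne, Australia",
    when="2023-03-01",
    by_whom="Example Research Team",
    processing_steps=[
        "Removed incomplete responses",
        "Anonymised free-text answers",
    ],
)

print(to_json(example))
```

Saving a file like this with each dataset gives later users a starting point for judging how the data was produced, even before fuller metadata is added in a repository.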

For more information on the definitions and concepts of reproducibility, consider the chapter 'Understanding Reproducibility and Replicability' from the National Academies of Sciences, Engineering, and Medicine's consensus study report, Reproducibility and replicability in science.

Reference

ARDC. (2022). Data provenance metadata: Builds trust, credibility and reproducibility. https://ardc.edu.au/article/data-provenance-metadata-builds-trust-credibility-and-reproducibility/

Book cover attribution

National Academies of Sciences, Engineering, and Medicine. (2019). Reproducibility and replicability in science. National Academies Press. https://doi.org/10.17226/25303

Figshare and data integrity

What is Figshare?

Figshare is a best-in-class data publishing platform for RMIT researchers and Higher Degree by Research students to store, manage, share, and discover research.

Figshare provides:

  • a free and safe environment to store, share and promote research data with the global community
  • the ability to mint DOIs (Digital Object Identifiers) to help you track and promote your data

Benefits of Figshare:

  • researchers can create rich metadata to aid in discoverability, provide funding details, and choose their own license for use and reuse of their work
  • researchers can edit and update records as their research evolves, and load data in numerous forms (e.g. spreadsheets, raw code, large audio and video files)
  • publications accompanied by published data can attract up to 25% more citations, increasing your impact and engagement
  • meets funder and journal requirements to publish your research data

State of open data

For an overview of the Open Data landscape please consider Figshare's annual State of Open Data report, made in collaboration with Digital Science and Springer Nature.


How can it help with research integrity?

Figshare offers not only research data visibility but also data security. When it comes to verification, provability, and the FAIR data principles, data repositories like Figshare are crucial in supporting your publications and securing the integrity of your research long into the future. Figshare offers version control that allows you to track changes to your data post-publication, and all work is stored on Amazon Web Services (AWS) S3 servers located in Australia that perform regular, systematic data integrity checks.
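
How Figshare or AWS carry out their integrity checks is not described in this guide. As a general illustration of the idea, the Python sketch below shows one common approach, a checksum comparison, that a researcher could run locally: record a SHA-256 digest when a data file is finalised, then recompute it later to confirm the file has not changed. The function names are hypothetical and are not part of any Figshare tool.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: Path, expected_digest: str) -> bool:
    """Compare a file's current digest with one recorded earlier."""
    return sha256_of_file(path) == expected_digest.lower()
```

A typical use would be to store the digest in your data documentation when the dataset is finalised, then call is_unchanged after downloading or transferring the file to confirm it is identical to the original.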