Data Papers – an ode to data


It is well known that data is an important output of research. Many research funders require researchers to publish their data in a repository, thereby making it available to others, unless restrictions apply (e.g. concerning personal data). Most research institutions also have policies asking their researchers to publish their data. Researchers can use these policies to their advantage: publishing a data paper is one way to gain more attention for a published dataset and thereby facilitate its reuse. Sharing datasets and using data from other research projects for one’s own work is becoming increasingly common practice.

The concept of data papers originates from a time when it was still difficult to cite datasets directly: in the form of a descriptive article (the data paper) published in a journal, the data received scientific recognition via the traditional article citation route. Today, publication in data repositories makes it easier to share, describe, and reference datasets, e.g. by using a DOI. In addition, guidelines for citing data have been established. Yet data journals continue to be popular: there are quite a few of them, mostly organised around subjects. Publishing a journal article on data, it seems, still carries more prestige than "just" publishing the dataset with a read-me file in a repository. Another reason for the higher reputation of datasets described in a data paper might be the external quality control, as data papers in journals undergo peer review. Studies on the citation of datasets indicate that datasets for which a data paper has also been published are cited more frequently than datasets without a corresponding data paper. The increased visibility can be explained in part by the fact that data journals are sometimes indexed in bibliographic databases. This provides another search option for finding datasets and applies classic, albeit criticized, metrics (originally designed for journals) to datasets. For example, the geoscience open access data journal "Earth System Science Data" currently has a journal impact factor of 11.333, and the more technical data journal "Scientific Data", which belongs to "Nature" and is also published open access, has one of 6.444.

Scientific journals usually do not publish the research data underlying an article; increasingly, however, a "Data Availability Statement" indicates how the data can be accessed. Such availability statements vary widely: some state that the authors will send the dataset upon request, others provide a link to the dataset. This is not a flaw of scientific journals; they simply serve another purpose. Articles in scientific journals usually report on the findings, i.e., the interpretation derived from the data. Hence, scientific journals are not designed to be repositories for data: the journal’s underlying IT systems are optimized for texts, and supplements to articles are often not included in the long-term preservation of the journal’s content. Therefore, datasets are (in almost all cases) better off in a data repository. Data journals, on the other hand, focus on a detailed description of the data, the methods of collection and analysis, and the way in which the dataset is published. Data journals require that the data described in a data paper be deposited in a suitable data repository. It is mandatory that both objects (dataset and data paper) are mutually linked via persistent identifiers (ideally DOIs). Good to know: the review process considers not only the data paper but also the published dataset. Many data journals have a list of criteria that data repositories must meet. In general, this means that the published data should meet the FAIR principles. To allow for broad reuse, data should be published under a liberal distribution license (e.g., CC BY) whenever possible.
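The mutual linking between dataset and data paper can be illustrated with a minimal sketch. It assumes that both metadata records carry a list of related identifiers modeled on the `relatedIdentifiers` property of the DataCite Metadata Schema, with the relation types `IsDescribedBy` (on the dataset) and `Describes` (on the data paper). The DOIs and the helper names `links_to` and `mutually_linked` are hypothetical, and a given repository or journal may record the link differently.

```python
def links_to(record, target_doi, relation_type):
    """Return True if `record` references `target_doi` with the given relation type."""
    for rel in record.get("relatedIdentifiers", []):
        if (
            rel.get("relatedIdentifierType") == "DOI"
            # DOIs are case-insensitive, so compare in lower case.
            and rel.get("relatedIdentifier", "").lower() == target_doi.lower()
            and rel.get("relationType") == relation_type
        ):
            return True
    return False


def mutually_linked(dataset, paper):
    """Check that the dataset points to the paper and the paper points back."""
    return (
        links_to(dataset, paper["doi"], "IsDescribedBy")
        and links_to(paper, dataset["doi"], "Describes")
    )


# Hypothetical example records; the DOIs are placeholders, not real identifiers.
dataset = {
    "doi": "10.1234/example-dataset",
    "relatedIdentifiers": [
        {
            "relatedIdentifier": "10.1234/example-paper",
            "relatedIdentifierType": "DOI",
            "relationType": "IsDescribedBy",
        }
    ],
}
paper = {
    "doi": "10.1234/example-paper",
    "relatedIdentifiers": [
        {
            "relatedIdentifier": "10.1234/example-dataset",
            "relatedIdentifierType": "DOI",
            "relationType": "Describes",
        }
    ],
}

print(mutually_linked(dataset, paper))
```

A check like this only confirms that the two records reference each other; it says nothing about whether the DOIs resolve or whether the repository meets a journal's criteria.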

Describing data in a data paper is not a romantic love letter, but the time and attention spent describing the dataset and reviewing the paper are at least a sign of appreciation for a scientific output (i.e. the dataset) that has unfortunately been neglected. Writing a data paper is thus a perfect activity for "Love Data Week"!

Do you have questions about which data repository (an institutional one like the LUH Research Data Repository, a subject-specific one, e.g. Pangaea, or a general one, e.g. Zenodo) and which data journal would be suitable for publishing your data or data paper? Just reach out. And if no suitable data journal can be found, we will gladly help you start a data journal for your scientific community.

The Research Data Service Team will be happy to advise on any questions about publishing data. If you are interested in starting a data journal, please contact

... works in the Publication Services department and is responsible for the open access publishing platform TIB Open Publishing.

... is deputy head of the Publication Services department.

… is a historian (M.A.) and academic librarian. She works in the Publication Services department at TIB as part of the Lower Saxony state initiative for research data (Niedersächsische Landesinitiative Forschungsdaten).

... is a staff member in the PID and Metadata Services unit at TIB, where she is mainly responsible for the PID Network project.