Precisely and Persistently Identifying and Citing Arbitrary Subsets of Dynamic Data

Andreas Rauber; Bernhard Gößwein; Carlo Maria Zwölf; Chris Schubert; Florian Wörister; James Duncan; Katharina Flicker; Koji Zettsu; Kristof Meixner; Leslie Mcintosh; Reyna Jenkyns; Stefan Pröll; Tomasz Miksa; Mark Parsons

doi:10.1162/99608f92.be565013

Journal article, Harvard Data Science Review, Year: 2021


Author affiliation

Carlo Maria Zwölf (1)

(1) Laboratoire d'Etude du Rayonnement et de la Matière en Astrophysique et Atmosphères = Laboratory for Studies of Radiation and Matter in Astrophysics and Atmospheres

No affiliation is listed for the other authors.

Abstract

Precisely identifying arbitrary subsets of data so that these can be reproduced is a daunting challenge in data-driven science, the more so if the underlying data source is dynamically evolving. Yet an increasing number of settings exhibit exactly those characteristics. Larger amounts of data are being continuously ingested from a range of sources (be it sensor values, online questionnaires, documents, etc.), with error correction and quality improvement processes adding to the dynamics. Yet, for studies to be reproducible, for decision-making to be transparent, and for meta studies to be performed conveniently, having a precise identification mechanism to reference, retrieve, and work with such data is essential. The Research Data Alliance (RDA) Working Group on Dynamic Data Citation has published 14 recommendations that are centered around time-stamping and versioning evolving data sources and identifying subsets dynamically via persistent identifiers that are assigned to the queries selecting the respective subsets. These principles are generic and work for virtually any kind of data. In the past few years numerous repositories around the globe have implemented these recommendations and deployed solutions. We provide an overview of the recommendations, reference implementations, and pilot systems deployed and then analyze lessons learned from these implementations. This article provides a basis for institutions and data stewards considering adding this functionality to their data systems.
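The approach summarized in the abstract (versioned, time-stamped data plus a persistent identifier assigned to the query that selects a subset) can be illustrated with a minimal sketch. This is not the authors' or the RDA's reference implementation. The class names `VersionedStore` and `QueryStore`, the logical counter used in place of wall-clock timestamps, the threshold-style query, the `pid:` identifier scheme, and the checksum used for verification are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecordVersion:
    key: str
    value: float
    valid_from: int                 # logical timestamp of the write
    valid_to: Optional[int] = None  # None means this version is still current


class VersionedStore:
    """Hypothetical store: updates close the old version, never overwrite it."""

    def __init__(self):
        self._rows = []
        self._clock = 0  # assumption: a logical clock stands in for real timestamps

    def upsert(self, key, value):
        self._clock += 1
        for i, row in enumerate(self._rows):
            if row.key == key and row.valid_to is None:
                # Mark the previous version as superseded at this timestamp.
                self._rows[i] = RecordVersion(row.key, row.value,
                                              row.valid_from, self._clock)
        self._rows.append(RecordVersion(key, value, self._clock))
        return self._clock

    def snapshot(self, at):
        # Rows valid at time `at`, sorted so the result order is stable.
        return sorted(
            (r.key, r.value) for r in self._rows
            if r.valid_from <= at and (r.valid_to is None or at < r.valid_to)
        )


class QueryStore:
    """Hypothetical query store: the identifier points at a query, not a copy."""

    def __init__(self, store):
        self._store = store
        self._queries = {}

    def _run(self, min_value, at):
        return [row for row in self._store.snapshot(at) if row[1] >= min_value]

    @staticmethod
    def _checksum(rows):
        return hashlib.sha256(repr(rows).encode()).hexdigest()

    def cite(self, min_value, at):
        rows = self._run(min_value, at)
        # Assumed identifier scheme: a short hash of query parameters and time.
        pid = "pid:" + hashlib.sha256(f"{min_value}|{at}".encode()).hexdigest()[:12]
        self._queries[pid] = (min_value, at, self._checksum(rows))
        return pid

    def resolve(self, pid):
        min_value, at, checksum = self._queries[pid]
        rows = self._run(min_value, at)  # re-execute against the data as of `at`
        if self._checksum(rows) != checksum:
            raise ValueError("re-executed result differs from the cited subset")
        return rows


store = VersionedStore()
store.upsert("s1", 10.0)
cited_at = store.upsert("s2", 20.0)
queries = QueryStore(store)
pid = queries.cite(min_value=15.0, at=cited_at)

store.upsert("s2", 5.0)   # later correction of an existing value
store.upsert("s3", 30.0)  # later ingestion of new data

cited_rows = queries.resolve(pid)
current_rows = [r for r in store.snapshot(store._clock) if r[1] >= 15.0]
```

In this sketch, resolving the identifier after the correction and the new ingestion still yields the subset as it existed at citation time, while the same query against the current state yields a different result. Only the query, its timestamp, and a checksum are stored, not a copy of the data.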

Subject areas

Information theory and coding [math.IT]

Contributor: Carlo Maria Zwölf

https://hal-obspm.ccsd.cnrs.fr/obspm-03956295

Submitted on: Wednesday, January 25, 2023, 15:46:40

Last modified on: Wednesday, January 15, 2025, 11:46:03

Dates and versions

obspm-03956295, version 1 (25-01-2023)

Identifiers

HAL Id: obspm-03956295, version 1
DOI: 10.1162/99608f92.be565013

Cite

Andreas Rauber, Bernhard Gößwein, Carlo Maria Zwölf, Chris Schubert, Florian Wörister, et al. Precisely and Persistently Identifying and Citing Arbitrary Subsets of Dynamic Data. Harvard Data Science Review, 2021, 3 (4), ⟨10.1162/99608f92.be565013⟩. ⟨obspm-03956295⟩


Collections

OBSPM INSU ENS-PARIS CNRS UNIV-CERGY LERMA PSL SORBONNE-UNIVERSITE SU-SCIENCES
