All Interview Questions
Part: [questionnaire.dg]pOpen
Section: Openness
Does the repository provide access to the data with minimal or no restrictions?
Ideally, access to the data should be unfettered, without even requiring an account. Requiring registration before using the data would be considered a minimal restriction. Similarly, minimal restrictions may also include requiring special access, e.g., an API key, for accessing large amounts of data.
Are you free to download the data and reuse it with no or minimal restrictions?
Many repositories that claim to be open are only open for humans to read, not for machine-based access or for reuse. So it is important to check, before depositing data, that the data are free to reuse according to the definition of the commons.
Data should be stored in a non-proprietary format, i.e., a format that is published and free for reuse by anyone, e.g., CSV. In contrast, a proprietary format is not published and can only be read by certain commercial software. As the goal of publishing data in a repository is openness and reuse, data that are reliant on proprietary software are by definition not commons compliant. (Adapted from Wikipedia.)
Are the data covered by a commons-compliant license?
Data should be covered by the least restrictive license possible consistent with ethical and legal constraints on some types of data, e.g., human subjects data. For a list of open licenses for data, see the Open Definition.
Part: [questionnaire.dg]pFAIR
Section: Testing whether a repository implements FAIR
Does your repository assign globally unique and persistent identifiers (PIDs)?
A globally unique and persistent identifier is one of the key pillars of FAIR data. If your data can't be found reliably, i.e., they do not have a stable address that is machine-readable, then they can't be accessible, interoperable or reusable by anyone else. A repository should assign a globally unique and resolvable identifier, e.g., a DOI, to a data set. Many data repositories assign locally unique identifiers, e.g., accession numbers like 5639. These can be turned into globally unique identifiers, e.g., by adding a URL prefix. The repository should also ensure that the identifier is persistent, that is, it is never re-assigned to another entity, even if the underlying data are removed. The repository must also stand behind its resolution, ensuring that the identifier reliably resolves to the data, even if the data move location. For a more detailed description of identifiers, see The FAIR Principles Explained by the Dutch Techcenter for Life Sciences.
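The step from a locally unique accession number to a globally unique identifier can be sketched in a few lines of Python. This is only an illustration: the prefix `https://repository.example.org/dataset/` and the function name `to_global_identifier` are hypothetical, and building the URL does not by itself make the identifier persistent. The repository still has to guarantee that the address keeps resolving.

```python
from urllib.parse import quote

# Hypothetical URL prefix owned by the repository (an assumption for illustration).
REPOSITORY_PREFIX = "https://repository.example.org/dataset/"


def to_global_identifier(accession: str, prefix: str = REPOSITORY_PREFIX) -> str:
    """Turn a locally unique accession number into a globally unique URL."""
    accession = str(accession).strip()
    if not accession:
        raise ValueError("accession number must not be empty")
    # Percent-encode the accession so that it is safe inside a URL path.
    return prefix + quote(accession, safe="")


print(to_global_identifier("5639"))
```

Two repositories may both use the accession number 5639, but the prefixed URLs differ because each repository controls its own prefix.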
Does the PID resolve to a landing page that describes the data?
Both the FAIR principles and the Data Citation Principles require that metadata persist, even if the data they describe are no longer available. FAIR also requires that the access rights to the data be both machine-readable and human understandable. Having the persistent identifier resolve to a landing page rather than to the data themselves ensures that a stable reference is provided even if the data are removed. The descriptive metadata should also include the necessary information for citing the data set (see Fenner M, Crosas M, Grethe J, Kennedy D, Hermjakob H, Rocca-Serra P, Berjon R, Karcher S, Martone M, Clark T (2016) A Data Citation Roadmap for Scholarly Data Repositories. bioRxiv Dec. 28, 2016. https://doi.org/10.1101/097196).
Does the repository support the addition of rich metadata to promote search and reuse of data?
Does the repository have a policy that ensures the metadata (landing page) will persist even if the data are no longer available?
Does the repository provide web-based search and access to the data and metadata?
Does the repository provide a clear license for reuse of the data?
Does the repository enforce or allow the use of community standards for data format or metadata?
Part: [questionnaire.dg]pCitable
Section: Citable
Does the repository allow you to associate your ORCID iD with a dataset?
Does the repository provide the required metadata for supporting data citation?
The Data Citation Implementation and Pilot Project groups at FORCE11 created a set of recommendations for how to cite data. The repository should contain the relevant metadata: authors (data creators), year, title of data, repository, unique identifier and version number. For examples and more information, see A Data Citation Primer.
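As a rough illustration of the metadata elements listed above, the following Python sketch checks that a record contains all of them and assembles a citation string. The function names, the sample record and the citation layout are assumptions made for this example; they are not the format prescribed by FORCE11 or by any particular repository.

```python
from typing import Dict, List

# The elements named in the text above.
REQUIRED_FIELDS = ("authors", "year", "title", "repository", "identifier", "version")


def missing_citation_fields(metadata: Dict[str, object]) -> List[str]:
    """Return the required citation elements that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


def format_citation(metadata: Dict[str, object]) -> str:
    """Build a citation string; the layout is illustrative only."""
    missing = missing_citation_fields(metadata)
    if missing:
        raise ValueError("missing citation metadata: " + ", ".join(missing))
    authors = "; ".join(metadata["authors"])
    return (
        f"{authors} ({metadata['year']}). {metadata['title']}. "
        f"Version {metadata['version']}. {metadata['repository']}. "
        f"{metadata['identifier']}"
    )


# Hypothetical record, invented for illustration.
example = {
    "authors": ["Doe J", "Roe R"],
    "year": 2017,
    "title": "Example data set",
    "repository": "Example Repository",
    "identifier": "https://repository.example.org/dataset/5639",
    "version": "1.0",
}
print(format_citation(example))
```

A repository whose records regularly fail a check like `missing_citation_fields` is unlikely to support data citation well.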
Does the repository provide a machine-readable landing page?
This question may be a bit harder to answer, but if the repository is implementing the FAIR principles, it will provide a programmatic interface to its metadata. It will also ensure that the landing page containing the metadata for citation will have tags embedded so that a machine can read the different elements required for the citation, e.g., authors. Why is this important for citation? In order for data citation to be able to use reference managers and other tools for inserting and formatting citations, the repositories have to embed these metadata tags. If the repository is FAIR, these tags will come from a recognized community standard, e.g., the Dublin Core or schema.org.
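One way to check for embedded tags is to look at the landing page's HTML for `<meta>` elements whose names follow a community standard. The sketch below uses only Python's standard library and assumes that the page exposes Dublin Core elements as `<meta name="DC.…" content="…">` tags. The sample page, the class name `CitationMetaParser` and the function name `extract_citation_metadata` are hypothetical; a real repository may instead embed schema.org markup or use different tag names.

```python
from html.parser import HTMLParser
from typing import Dict, List


class CitationMetaParser(HTMLParser):
    """Collect <meta name="DC.*" content="..."> tags from a landing page."""

    def __init__(self) -> None:
        super().__init__()
        self.metadata: Dict[str, List[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        # Keep only Dublin Core style names; a tag may repeat (e.g. several creators).
        if name.lower().startswith("dc.") and content is not None:
            self.metadata.setdefault(name, []).append(content)


def extract_citation_metadata(html: str) -> Dict[str, List[str]]:
    """Parse an HTML string and return the Dublin Core meta tags it contains."""
    parser = CitationMetaParser()
    parser.feed(html)
    parser.close()
    return parser.metadata


# Hypothetical landing page, invented for illustration.
SAMPLE_PAGE = """
<html><head>
<meta name="DC.title" content="Example data set">
<meta name="DC.creator" content="Doe, Jane">
<meta name="DC.creator" content="Roe, Richard">
<meta name="DC.identifier" content="https://repository.example.org/dataset/5639">
<meta name="viewport" content="width=device-width">
</head><body></body></html>
"""

for name, values in extract_citation_metadata(SAMPLE_PAGE).items():
    print(name, values)
```

If a page yields no such tags, a reference manager has nothing to read, and the citation details would have to be typed in by hand.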
Part: [questionnaire.dg]pTrustworthy
Section: Trustworthiness
Does the repository adhere to the requirements for the Data Repository Gold Seal of Approval?
Terms
- Data Repository Gold Seal of Approval: see https://www.datasealofapproval.org/en/information/requirements/
Is the repository stakeholder governed?
Repositories play a critical role in the commons, as they are the publishers of scholarly objects. But how do we determine whether a repository is a trustworthy publisher, that is, that it will adhere to the principles of the commons? We adapt here some of the principles for open infrastructures laid out by Bilder G, Lin J, Neylon C (2015) Principles for Open Scholarly Infrastructure. One of the most important is that the repository is stakeholder governed. In this way, the repository can be responsive to the commons community.
Is the governance of the repository transparent?
In general, the operations of a repository, including the selection of an advisory board, should be transparent.
Is the code that runs the data infrastructure covered under an open source license?
Trust requires that the community feels in control and that the mission of the repository cannot be co-opted by a few stakeholders or controlling interests. Therefore, Bilder et al. recommend that the code and content be "forkable", that is, that all crucial parts, including data, code and legal agreements, can be replicated.