Question number, id | Question text | Answers |
---|---|---|
1 [sc-drc.dg]acc |
Does the repository provide access to the data with minimal or no restrictions? How easy is it for users to gain access to the data? Are any impediments in place reasonable given the nature of the data, e.g., authorization for sensitive data? Repositories that make metadata available during the embargo period should not be penalized on this question. Options
|
|
2 [sc-drc.dg]reuse |
Are you free to reuse the data with no or minimal restrictions? Many repositories that claim to be open are only open for humans to read, not for machine-based access or for re-use. So it is important to check before depositing the data that it is free to re-use according to the definition of the Commons. Data should be stored in a non-proprietary format, that is, a format that is published and free for re-use by anyone, such as CSV. In contrast, proprietary formats can only be read by certain commercial software. As the goal of publishing data in a repository is openness and re-use, data reliant on proprietary software is by definition not commons-compliant. (Adapted from Wikipedia.) Consider the type of data and whether the access mechanism places undue restraints on the ability to re-use the data. Also, consider the license that specifies third-party rights: does it allow the data to be re-used and re-shared as part of a new product? Ideally, the repository should have a clear statement about acceptable file format characteristics. In the absence of such a policy, a check can be made to ascertain which formats are available. If there are both open and closed examples, they are coded as “Yes with proprietary formats”. If individual data sets are offered under multiple licenses, this can complicate the re-use process further. Options
|
|
3 [sc-drc.dg]lic-clr |
Does the repository provide a clear license for reuse of the data? Ideally, a metadata field Options
|
|
4 [sc-drc.dg]lic-cc |
Are the data covered by a commons-compliant license? FAIR requires a clear license but is silent about the level of openness; the Commons requires that the data be as open as possible, as closed as necessary. Is the license used consistent with that? In this question, we use the definition of "Open" from [the Open Definition](https://opendefinition.org/licenses/). These licenses conform to the Open Definition but not to Re-Use Options
|
|
5 [sc-drc.dg]plat |
Does the repository platform make it easy to work with (e.g., download/re-use) the data? Most repositories provide data for download, but with very large data sets, download can be a significant impediment to reuse. In such a case, a cloud platform may make it easier for researchers to actually reuse the data. |
|
6 [sc-drc.dg]ru-doc |
Does the repository require or support documentation that aids in proper (re-)use of the data? Vignettes or help that are designed not just for use of the repository, but that help users understand the types of questions that can be answered by using the data and tools. May be at the repository level for homogeneous data or at the data set level for heterogeneous data. Repositories are expected to have basic help materials and tutorials. We are asking for a level above that to fully achieve FAIR: not just how to perform certain functions, but why you can use the resource to answer certain types of questions. Options
|
|
7 [sc-drc.dg]sch-ui |
Does the repository provide a search facility for the data and metadata? Human focused: Online repositories should provide a means to search available data through either keyword or structured search. |
|
8 [sc-drc.dg]pid-g |
Does the repository assign globally unique and persistent identifiers (PIDs)? A globally unique and persistent identifier is one of the key pillars of FAIR data. If a data set can't be found reliably, i.e., it does not have a stable address that is machine-readable, then it can't be accessible, interoperable or reusable to anyone else. A repository should assign a globally unique and resolvable identifier, e.g., a DOI, to a data set. Many data repositories assign locally unique identifiers, e.g., accession numbers like 5639. These can be turned into globally unique identifiers, e.g., by adding a URL prefix. The repository should also ensure that the identifier is persistent, that is, it is never re-assigned to another entity, even if the underlying data are removed. The repository must also stand behind its resolution, ensuring that the identifier reliably resolves to the data, even if the data move location. For a more detailed description of identifiers, see The FAIR Principles Explained by the Dutch Techcenter for Life Sciences. The answer is "Yes" if this is the default option, i.e., an externally linked and registered PID, e.g., a DOI, Handle, or ARK. Sometimes accession numbers are offered as standard. These can be easily upgraded to PIDs through the Compact Identifiers functionality (e.g., by registering them at identifiers.org), but unless this is specified on the website, the response is "No". |
|
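The upgrade from a local accession number to a globally unique identifier described in question 8 can be sketched as follows. This is a minimal illustration, not a repository's actual implementation: it assumes the identifiers.org convention of resolving `prefix:accession` compact identifiers under `https://identifiers.org/`, and the prefix `examplerepo` is hypothetical (a real prefix has to be registered first).

```python
from urllib.parse import quote

# Assumption: compact identifiers resolve as https://identifiers.org/<prefix>:<accession>.
RESOLVER = "https://identifiers.org"


def to_compact_identifier(prefix: str, accession: str) -> str:
    """Combine a namespace prefix and a local accession into 'prefix:accession'."""
    prefix = prefix.strip().lower()
    accession = accession.strip()
    if not prefix or not accession:
        raise ValueError("both prefix and accession are required")
    return f"{prefix}:{accession}"


def to_resolvable_url(prefix: str, accession: str) -> str:
    """Turn a local accession into a globally unique, resolvable URL."""
    compact = to_compact_identifier(prefix, accession)
    # Keep the colon separator readable; percent-encode anything unsafe.
    return f"{RESOLVER}/{quote(compact, safe=':')}"


# 'examplerepo' is a hypothetical, unregistered prefix used only for illustration.
print(to_resolvable_url("examplerepo", "5639"))
```

Note that building such a URL only makes the identifier globally unique; persistence still depends on the repository's commitment to keep it resolving.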
9 [sc-drc.dg]orcid |
Does the repository allow you to associate your ORCID iD with a dataset? Data sets are scholarly works and should be credited as such. The use of ORCID streamlines this process. Options:
|
|
10 [sc-drc.dg]md-level |
Does the repository support the addition of rich metadata to promote search and reuse of data? We are interpreting rich metadata to include the basic descriptive information about the data set, i.e., those fields recommended by the DCIP, with the addition of critical biomedical metadata, e.g., organism studied, disease condition, technique. dkNET has a recommended set of rich metadata. These data provide an overall context for understanding what the data set is about, but don’t necessarily delve into particulars. Options
|
|
11 [sc-drc.dg]md-prv |
Are the (meta)data associated with detailed provenance? Is the appropriate provenance provided, e.g., if they use a Gene Ontology term, do we know it’s from the Gene Ontology? In biomedicine, making sure that the relationship between subjects, specimens, and data is explicit is extremely important. RRIDs should be used to make sure that all data sets that use the same strain or sample can be found and combined.
Options
|
|
12 [sc-drc.dg]md-daci |
Does the repository provide the required metadata for supporting data citation? The repository should provide the necessary metadata for a full data citation according to the Joint Declaration of Data Citation Principles: authors, title of data set, version, repository, date published, and PID. It should also be set up to enable exporting the citation to a reference manager (e.g., as JSON, XML, or BibTeX). Options
|
|
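To make the citation fields in question 12 concrete, the sketch below assembles them into a human-readable citation and a BibTeX entry. It is an illustration only: the class name, the citation layout, and the mapping of repository to `publisher` and PID to `url` are assumptions, not something prescribed by the questionnaire or by the Data Citation Principles.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """The six citation fields listed in question 12."""
    authors: list[str]
    title: str
    version: str
    repository: str
    date_published: str  # ISO date, e.g. "2020-01-15"
    pid: str             # e.g. a DOI expressed as a URL

    def to_text(self) -> str:
        """Render a plain-text citation (layout is an illustrative choice)."""
        names = "; ".join(self.authors)
        return (f"{names} ({self.date_published}). {self.title} "
                f"(Version {self.version}). {self.repository}. {self.pid}")

    def to_bibtex(self, key: str) -> str:
        """Render a BibTeX @misc entry; the field mapping is an assumption."""
        fields = {
            "author": " and ".join(self.authors),
            "title": self.title,
            "version": self.version,
            "publisher": self.repository,
            "year": self.date_published[:4],
            "url": self.pid,
        }
        body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
        return f"@misc{{{key},\n{body}\n}}"


# All values below are made up for illustration.
citation = DataCitation(
    authors=["Doe, J.", "Roe, R."],
    title="Example data set",
    version="1.0",
    repository="Example Repository",
    date_published="2020-01-15",
    pid="https://doi.org/10.1234/example",
)
print(citation.to_text())
print(citation.to_bibtex("doe2020"))
```

If any of the six fields cannot be filled from a repository's metadata, a full citation cannot be generated, which is what this question is checking for.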
13 [sc-drc.dg]md-ref |
Do the metadata include qualified references to other (meta)data? How well specified are the relationships included in the metadata? For example, in the context of publications, does the resource use DataCite or some other schema/standard that specifies the relationship of an identifier to the data set, e.g., a PubMed ID for a publication that first reported the data set? References should be machine friendly, e.g., IDs for publications rather than free text. Options
|
|
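One way to picture a "qualified" reference from question 13 is a record that carries an identifier, the identifier's type, and the type of relationship, in the style of DataCite's related-identifier properties. The sketch below is an assumption-laden illustration: the helper `is_qualified` is hypothetical, and the PubMed ID is a made-up value.

```python
# Property names follow the DataCite related-identifier style.
REQUIRED_KEYS = ("relatedIdentifier", "relatedIdentifierType", "relationType")


def is_qualified(reference) -> bool:
    """Return True if the reference states what it points to and how it relates.

    A free-text string such as "Described in Doe et al., 2020" is not qualified.
    """
    if not isinstance(reference, dict):
        return False
    return all(
        isinstance(reference.get(key), str) and reference[key].strip()
        for key in REQUIRED_KEYS
    )


qualified = {
    "relatedIdentifier": "12345678",   # made-up PubMed ID
    "relatedIdentifierType": "PMID",
    "relationType": "IsSupplementTo",  # the data set supplements the paper
}
free_text = "Described in Doe et al., 2020"

print(is_qualified(qualified))  # qualified: identifier, type and relation present
print(is_qualified(free_text))  # not qualified: free text only
```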
14 [sc-drc.dg]md-lnk |
Does the repository support bidirectional linkages between related objects such that a user accessing one object would know that there is a relationship to another object? E.g., does the repository provide a linkage between the publication that first described the data and the data set? Does the repository maintain bidirectional linkages between versions? If a dataset has multiple parts, each deposited in a different specialist repository, are the linkages clearly specified across all repositories? Options
|
|
15 [sc-drc.dg]fmt-com |
Does the repository enforce or allow the use of community standards for data format or metadata? Look for a statement by the repository on the standards they follow and their enforcement policy, including curation and/or software validation. The standards should be recognized as community standards, e.g., in FAIRsharing or through associated publications. If no such statement can be found on the site, then the answer is “No”. |
|
16 [sc-drc.dg]md-dkn |
Does the repository accept metadata that is applicable to the dkNET community disciplines? Biomedical repositories, in addition to the basic Dublin Core or Schema.org metadata, require certain fields to maximize utility as specified in dkNET’s rich metadata specification. In addition, since dkNET is fostering an information network among the centers and databases funded by NIDDK, we are expecting that they will include relevant connections to other dkNET-listed resources. Options
|
|
17 [sc-drc.dg]md-psst |
Does the repository have a policy that ensures the metadata (landing page) will persist even if the data are no longer available? Is there evidence that metadata persist even when the data are no longer available? Ideally, repositories clearly state their accessioning and de-accessioning policies as per the data citation principles. Options
|
|
18 [sc-drc.dg]md-FAIR |
Do the metadata use vocabularies that follow FAIR principles? Use of a community ontology, e.g., OBO, or a controlled vocabulary that follows FAIR principles in order to facilitate combining data from one repository with another. Options
|
|
19 [sc-drc.dg]land-ctsp |
Does the machine-readable landing page support data citation? Ideally, the above metadata (both descriptive and data citation relevant) should be able to be harvested automatically, e.g., by a citation manager. We check this by:
|
|
20 [sc-drc.dg]md-cs |
Does the repository use a recognized community standard for representing basic metadata? There are good schemas now available for general-purpose data set metadata, e.g., DataCite schema, Dublin Core, schema.org. When a recognized schema is used, it promotes interoperability among data repositories and helps with data set search. Does the repository have supporting software and tools to enforce and take advantage of this standard, e.g., a validator? Options
|
|
21 [sc-drc.dg]acc-api |
Can the (meta)data be accessed via a standards-compliant API? The repository should provide documentation on how to access its content programmatically, and this method should use a well-recognized and widely used approach, e.g., RESTful services. |
|
22 [sc-drc.dg]md-vcb |
Do the metadata use a formal, accessible, shared, and broadly applicable language for knowledge representation? The key concept here is “shared”. That is, two resources that use the same tags to mean the same thing can be combined more easily than if they assign custom labels (see https://www.go-fair.org/fair-principles/i1-metadata-use-formal-accessible-shared-broadly-applicable-language-knowledge-representation/). In assessing the repository, consider the hurdles that have to be cleared in order to use the data and metadata; in other words, what does the user have to struggle with before using the data? Check the formats and services provided and evaluate whether they conform with the principle. Resources include GOFAIR:
The RDF extensible knowledge representation model is a way to describe and structure datasets. The Dublin Core Schema is an example. Also includes: OWL, JSON-LD, OPM (Open Provenance Model), OntoDM (Ontology for Data Mining), and the EBI RDF Platform ontologies. Options
|
|
23 [sc-drc.dg]sch-api |
Does the repository provide an API-based search of the data and metadata? Application focused: A remote system can send a query according to a structured API, and the repository will return a list of datasets or research artifacts that match the query criteria. |
|
24 [sc-drc.dg]gov-tsp |
Is the governance of the repository transparent? In general, the operations of a repository, including the selection of an advisory board, should be transparent. Look for evidence of how decisions are made that affect the repository’s scope or direction, e.g., is there an Advisory Board? How are advisory members chosen, and what are their terms? How are decisions made on behalf of the repository? Is it one person? Is there a voting system? Do we know who runs the repository? Options:
|
|
25 [sc-drc.dg]oss |
Is the code that runs the data infrastructure covered under an open source license? From the principles of open infrastructures. If the repository violates the community principles, could the repository be recreated by the community? Some of them are and say so. Some things to look for:
Options:
|
|
26 [sc-drc.dg]tr-seal |
Has the repository been certified by Data Seal of Approval or the Core Trust Seal or equivalent? These two review processes have merged, but either is acceptable and indicates that the repository has undergone an external review for trustworthiness. Links |
|
27 [sc-drc.dg]gov-stk |
Is the repository stakeholder governed? Does the repository make it clear how the community participates in the decision-making process for the repository? We adapt here some of the principles for open infrastructures laid out by Bilder G, Lin J, Neylon C (2015) Principles for Open Scholarly Infrastructure. One of the most important is that the repository is stakeholder governed. Options:
|
|
28 [sc-drc.dg]land-api |
Does the repository provide a machine-readable landing page? Ideally, the citation metadata (both descriptive and data citation relevant) should be able to be harvested automatically, e.g., by a citation manager. We check this by:
For more on implementations of machine-readable metadata on dataset landing pages, see M. Fenner et al. A data citation roadmap for scholarly data repositories, Scientific Data, 2019. https://doi.org/10.1038/s41597-019-0031-8. |
|
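As a rough illustration of what "harvested automatically" in question 28 can mean in practice, the sketch below looks for schema.org `Dataset` metadata embedded as JSON-LD in a landing page. This is one possible check under stated assumptions, not the questionnaire's own procedure: it assumes the metadata sit in a `<script type="application/ld+json">` element, and the sample HTML and DOI are invented.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is treated as absent


def find_dataset_metadata(html: str) -> list:
    """Return all JSON-LD objects on the page whose @type is 'Dataset'."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [b for b in parser.blocks
            if isinstance(b, dict) and b.get("@type") == "Dataset"]


# Invented landing page used only to exercise the sketch.
sample_page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset",
 "name": "Example data set", "identifier": "https://doi.org/10.1234/example"}
</script></head><body><h1>Example data set</h1></body></html>"""

for record in find_dataset_metadata(sample_page):
    print(record["name"], record["identifier"])
```

A page that returns no `Dataset` records here may still expose metadata in other ways (for example, HTML meta tags or content negotiation), so an empty result is a prompt for further checking rather than a definitive "No".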
29 [sc-drc.dg]land-pg |
Does the PID or other dataset identifier resolve to a landing page that describes the data? Both the FAIR principles and the Data Citation Principles require that metadata persist, even if the data they describe are no longer available. FAIR also requires that the access rights to the data be both machine-readable and human understandable. Having the persistent identifier resolve to this page rather than to the data themselves ensures that a stable reference is provided even if the data are removed. The descriptive metadata should also include the necessary information for citing the data set (see Fenner M, Crosas M, Grethe J, Kennedy D, Hermjakob H, Rocca-Serra P, Berjon R, Karcher S, Martone M, Clark T (2016) A Data Citation Roadmap for Scholarly Data Repositories. bioRxiv Dec. 28, 2016. https://doi.org/10.1101/097196). We are interpreting this as a stable landing page that contains metadata about the data set and that uses the identifier for the data set in the URL. Cool URIs don’t change. |
|
30 [sc-drc.dg]md-pid |
Does the metadata clearly and explicitly include identifiers of the data it describes? The metadata should have a data set identifier field, or equivalent, that points to the PID, or to another identifier if there is no PID. Sometimes it is useful to check the API services, if documented, to see what they provide.
|
|
31 [sc-drc.dg]pid-l |
Does the repository assign, or the contributor provide, a locally unique identifier to the data set or the data contribution? Examples include an accession number, a UUID, or some other convention. Note: The use of a title or free text as the unique string is not considered compliant. |
|
End of dkNET-Repository Compliance questionnaire |