Establishing CFDE Metadata Schemas and Serializations

Goal: Support description of heterogeneous DCC data assets

Method: Document the CFDE consensus regarding metadata models and representations for exchange

We produced a straw-man “core” metadata model to help the discussions converge on a shared set of near-term goals. This first model focuses on tracking digital assets (files) that are grouped into datasets and contextualized by data-generating events (such as assays and analyses), biospecimens, subjects, and subject groups (cohorts).

Core ER diagram

To support per-DCC extensions and heterogeneity, we also identified the need for each metadata exchange to include a model declaration describing the actual structure of the exchanged data. We chose to adopt the Frictionless Data Table Schema format as an existing, neutral format intended for this purpose. The proposed core model is both illustrated in the entity-relationship (ER) diagram above and formalized using the Table Schema format.
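As a rough illustration of what such a model declaration can look like, the sketch below writes a two-table fragment as a Frictionless tabular data package descriptor in Python and checks that every foreign key points at a declared table and column. The table names, column names, and file paths are hypothetical and are not taken from the actual core model; only the descriptor keys (`resources`, `schema`, `fields`, `primaryKey`, `foreignKeys`, `reference`) follow the Table Schema format. The helper `unresolved_foreign_keys` is likewise an assumption made for this example.

```python
import json

# Hypothetical two-table fragment; names and paths are illustrative only.
package = {
    "profile": "tabular-data-package",
    "name": "example-core-model",
    "resources": [
        {
            "name": "dataset",
            "path": "dataset.tsv",
            "schema": {
                "fields": [
                    {"name": "id", "type": "string",
                     "constraints": {"required": True}},
                    {"name": "title", "type": "string"},
                ],
                "primaryKey": "id",
            },
        },
        {
            "name": "file",
            "path": "file.tsv",
            "schema": {
                "fields": [
                    {"name": "id", "type": "string",
                     "constraints": {"required": True}},
                    {"name": "dataset_id", "type": "string"},
                    {"name": "size_in_bytes", "type": "integer"},
                ],
                "primaryKey": "id",
                "foreignKeys": [
                    {"fields": "dataset_id",
                     "reference": {"resource": "dataset", "fields": "id"}},
                ],
            },
        },
    ],
}


def as_list(value):
    """Table Schema allows a key to be one column name or a list of names."""
    return [value] if isinstance(value, str) else list(value)


def unresolved_foreign_keys(pkg):
    """Return (table, local_columns, target_table) for each broken foreign key."""
    columns = {
        res["name"]: {field["name"] for field in res["schema"]["fields"]}
        for res in pkg["resources"]
    }
    problems = []
    for res in pkg["resources"]:
        for fk in res["schema"].get("foreignKeys", []):
            ref = fk["reference"]
            # An empty resource name denotes a reference to the same table.
            target = ref["resource"] or res["name"]
            local = as_list(fk["fields"])
            remote = as_list(ref["fields"])
            local_ok = all(c in columns[res["name"]] for c in local)
            remote_ok = target in columns and all(
                c in columns[target] for c in remote
            )
            if not (local_ok and remote_ok):
                problems.append((res["name"], local, target))
    return problems


if __name__ == "__main__":
    print(json.dumps(package, indent=2))
    print(unresolved_foreign_keys(package))
```

Because the declaration travels with the data, a receiver can run a check of this kind before ingest, and a DCC extension amounts to adding resources or fields to the descriptor.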

We also prototyped simple scripts that take the core model defined in this Table Schema format, convert it, and deploy it in a Deriva test catalog. These scripts can produce an empty catalog suitable for developing and testing metadata ingest processes.
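The general shape of such a conversion can be sketched without Deriva. The example below is not the prototype scripts and does not use any Deriva API; as a stand-in, it translates Table Schema resources into SQL `CREATE TABLE` statements and deploys them to an empty in-memory SQLite database. The two resources, the `TYPE_MAP` type mapping, and the function names are assumptions made for this illustration.

```python
import sqlite3

# Assumed mapping from Table Schema field types to SQLite column types.
TYPE_MAP = {
    "string": "TEXT",
    "integer": "INTEGER",
    "number": "REAL",
    "boolean": "INTEGER",
}

# Hypothetical resources; not the actual core model tables.
RESOURCES = [
    {
        "name": "dataset",
        "schema": {
            "fields": [
                {"name": "id", "type": "string",
                 "constraints": {"required": True}},
                {"name": "title", "type": "string"},
            ],
            "primaryKey": "id",
        },
    },
    {
        "name": "file",
        "schema": {
            "fields": [
                {"name": "id", "type": "string",
                 "constraints": {"required": True}},
                {"name": "dataset_id", "type": "string"},
            ],
            "primaryKey": "id",
            "foreignKeys": [
                {"fields": "dataset_id",
                 "reference": {"resource": "dataset", "fields": "id"}},
            ],
        },
    },
]


def _as_list(value):
    """Accept either a single column name or a list of column names."""
    return [value] if isinstance(value, str) else list(value)


def _quoted(names):
    """Render column names as a comma-separated list of quoted identifiers."""
    return ", ".join(f'"{name}"' for name in names)


def table_ddl(name, schema):
    """Build one CREATE TABLE statement from a Table Schema descriptor."""
    parts = []
    for field in schema["fields"]:
        sql_type = TYPE_MAP.get(field.get("type", "string"), "TEXT")
        column = f'"{field["name"]}" {sql_type}'
        if field.get("constraints", {}).get("required"):
            column += " NOT NULL"
        parts.append(column)
    primary_key = _as_list(schema.get("primaryKey", []))
    if primary_key:
        parts.append(f"PRIMARY KEY ({_quoted(primary_key)})")
    for fk in schema.get("foreignKeys", []):
        ref = fk["reference"]
        target = ref["resource"] or name  # empty string means self-reference
        parts.append(
            f"FOREIGN KEY ({_quoted(_as_list(fk['fields']))}) "
            f'REFERENCES "{target}" ({_quoted(_as_list(ref["fields"]))})'
        )
    return f'CREATE TABLE "{name}" (' + ", ".join(parts) + ")"


def deploy_empty(resources):
    """Create every declared table in a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    for res in resources:
        conn.execute(table_ddl(res["name"], res["schema"]))
    return conn


if __name__ == "__main__":
    for res in RESOURCES:
        print(table_ddl(res["name"], res["schema"]))
```

The result, like the empty catalog described above, has the declared tables and keys but no rows, which is enough to exercise an ingest process end to end.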