Ontologies and Standards
Latest revision as of 16:59, 14 March 2025
GlyGen follows the Common Fund Data Ecosystem (CFDE) Ontology & Standards Best Practices guidelines as defined by the CFDE Ontology Working Group for its data.
The Common Fund Data Ecosystem (CFDE) Ontology Working Group (WG), of which GlyGen is also a member, plays a critical role in standardizing metadata across Common Fund-funded data coordination centers. To support the CFDE search portal, the Cross Cut Metadata Model (C2M2) was developed, enabling researchers to query and discover relevant data assets efficiently. The Ontology WG ensures metadata consistency by selecting appropriate ontologies and controlled vocabularies for key data fields, balancing granularity with usability. Through a structured Request for Comment (RFC) process, the WG formalizes ontology choices and maintains documentation for transparency. Additionally, ontology “slims” provide high-level groupings of metadata terms, improving data visualization and usability. The CFDE Portal integrates these standards, offering streamlined metadata submission and querying based on ontology terms.
Brief Introduction
A goal of the Common Fund Data Ecosystem (CFDE) is to provide a centralized resource to house metadata associated with data assets from all Common Fund-funded data coordination centers. To accomplish this goal, a metadata database called the Cross Cut Metadata Model (C2M2) was created to support the CFDE search portal, where researchers can run queries to identify data assets of interest. The CFDE Ontology Working Group consists of representatives from participating data coordination centers who worked together to establish standards for the capture of metadata in the C2M2. This involved choosing which types of metadata to include in the C2M2 as well as which ontologies or controlled vocabularies would be used to capture the information. It was recognized early in the process that capturing all metadata from the data coordination centers is not desirable; the C2M2 needs just enough for the portal to be an effective tool that guides researchers to the locations of relevant data. Therefore, pragmatic decisions were made about the level of metadata complexity to collect. Similarly, creating a structure to unify all vocabularies and standards was outside the scope of this project. Ontologies and controlled vocabularies were therefore chosen not because they were judged to be the “best” but because they had the granularity, scope, community buy-in, and associated resources (e.g. mappings to other ontologies) that allowed the data coordination centers to use them effectively when submitting data to the C2M2. For more information on the Ontology Working Group, please read our charter and the paper “Data Harmonization through use of community standards in the Common Fund Data Ecosystem”, presented at the workshop on “FAIR ontology harmonization and TRUST data interoperability” held at the 2022 International Conference on Biomedical Ontology (ICBO).
Ontologies/controlled vocabularies chosen for use in C2M2 with associated metadata fields
- EDAM - file format, file data type
- Uberon - anatomy
- Ontology for Biomedical Investigations (OBI) - assay type, analysis type, sample preparation method
- Human Disease Ontology (DOID) - disease
- PubChem - chemical entity
- NCBI taxon - species
- Standards for Maintaining, Collecting, and Presenting Federal Data on Race and Ethnicity - race, ethnicity
- SNOMED CT subset of terms - sex
- Human Phenotype Ontology (HPO) - subject/donor phenotype
- Mammalian Phenotype Ontology (MP) - subject/donor phenotype
There are numerous ontology browsers that can be used to visualize and explore the above ontologies. Some popular ones are the EBI Ontology Lookup Service (OLS), OntoBee, and the NCBO portal.
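The pairing of metadata fields with vocabularies listed above can be held in a simple lookup table. The following is a minimal sketch, not part of the C2M2 specification: the field labels are copied from the list on this page and are not actual C2M2 column names, and the names FIELD_VOCABULARIES, vocabularies_for, and fields_for are hypothetical.

```python
# Field label -> vocabularies used for it, as listed on this page.
# These labels are descriptive; they are NOT actual C2M2 column names.
RACE_ETHNICITY_STANDARD = (
    "Standards for Maintaining, Collecting, and Presenting "
    "Federal Data on Race and Ethnicity"
)

FIELD_VOCABULARIES = {
    "file format": ("EDAM",),
    "file data type": ("EDAM",),
    "anatomy": ("Uberon",),
    "assay type": ("OBI",),
    "analysis type": ("OBI",),
    "sample preparation method": ("OBI",),
    "disease": ("DOID",),
    "chemical entity": ("PubChem",),
    "species": ("NCBI taxon",),
    "race": (RACE_ETHNICITY_STANDARD,),
    "ethnicity": (RACE_ETHNICITY_STANDARD,),
    "sex": ("SNOMED CT subset of terms",),
    "subject/donor phenotype": ("HPO", "MP"),
}


def vocabularies_for(field):
    """Return the vocabularies listed for a metadata field (empty if unknown)."""
    return FIELD_VOCABULARIES.get(field, ())


def fields_for(vocabulary):
    """Return the sorted field labels that use the given vocabulary."""
    return sorted(
        field
        for field, vocabularies in FIELD_VOCABULARIES.items()
        if vocabulary in vocabularies
    )


if __name__ == "__main__":
    print(vocabularies_for("subject/donor phenotype"))
    print(fields_for("OBI"))
```

A table like this makes it easy to check which vocabulary a given field is expected to draw from before preparing a submission.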
Documentation of vocabulary choices
The choice of ontologies and controlled vocabularies is documented through the “Request for Comment” (RFC) process. An RFC was drafted for each standard that was chosen and sent to the entire CFDE consortium for comment. Once its comment period ends, an RFC becomes official policy. Pending and final RFCs are stored in the RFC repository. Ontology Working Group RFCs are available for:
- EDAM RFC
- Uberon RFC
- Ontology for Biomedical Investigations (OBI) RFC
- Human Disease Ontology (DOID) RFC
- PubChem RFC
Ontology/Controlled vocabulary “Slims”
Data coordination centers (DCCs) are always encouraged to use the most granular terms that are applicable to their metadata, as this provides the most accurate and specific information. However, visualizing the hundreds or thousands of terms used within a dataset can present challenges. Ontology “slims” address this difficulty by providing a higher-level view of a set of annotations. Generally, a slim is built from the more general, less specific terms of an ontology, which represent broad classes within it. Each granular term can then be mapped to the slim term or terms among its ancestors, and specific metadata term associations can be binned into slim-term-based categories via those mappings. The process used to build slims for the C2M2 standards is described in the Slim RFC (this RFC is still in the comment phase). The slims for the following vocabularies are available in the linked folders; the latest slim is the one with the highest version number.
- EDAM slims
- Uberon slim
- Ontology for Biomedical Investigations (OBI) assay slim
- Human Disease Ontology slim
- NCBI taxon slim
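The mapping and binning steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure documented in the Slim RFC: the toy hierarchy, the slim term set, and the function names (ancestors_and_self, map_to_slim, bin_annotations) are all hypothetical, and the labels are not taken from any published slim.

```python
from collections import Counter

# Hypothetical toy hierarchy: term -> list of direct parent terms.
# Labels are illustrative only and do not reproduce a real ontology.
PARENTS = {
    "left ventricle": ["heart"],
    "heart": ["cardiovascular system"],
    "aorta": ["cardiovascular system"],
    "cerebellum": ["brain"],
    "brain": ["nervous system"],
    "cardiovascular system": [],
    "nervous system": [],
}

# Hypothetical slim: the broad classes used for the high-level view.
SLIM_TERMS = {"cardiovascular system", "nervous system"}


def ancestors_and_self(term, parents):
    """Collect a term and all of its ancestors by walking parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return seen


def map_to_slim(term, parents, slim_terms):
    """Return the slim terms under which a granular term has parentage."""
    return ancestors_and_self(term, parents) & slim_terms


def bin_annotations(annotations, parents, slim_terms):
    """Count annotations per slim term using the granular-to-slim mapping."""
    counts = Counter()
    for term in annotations:
        for slim in map_to_slim(term, parents, slim_terms):
            counts[slim] += 1
    return counts


if __name__ == "__main__":
    used_terms = ["left ventricle", "aorta", "cerebellum", "heart"]
    print(bin_annotations(used_terms, PARENTS, SLIM_TERMS))
```

Because a term can have more than one parent in a real ontology, map_to_slim returns a set: a single granular term may fall into several slim categories, and a term with no slim ancestor falls into none.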
Submission of metadata using the above vocabularies
Extensive submission documentation exists on the process for submitting data to the C2M2 and checking it in the portal.
CFDE Portal
The CFDE Portal offers ways to query and view metadata according to ontology terms or slim terms. See portal documentation for more information.