Ontologies and Standards: Difference between revisions

Latest revision as of 16:59, 14 March 2025

GlyGen follows the Common Fund Data Ecosystem (CFDE) Ontology & Standards Best Practices guidelines as defined by the CFDE Ontology Working Group for its data.

The Common Fund Data Ecosystem (CFDE) Ontology Working Group (WG), of which GlyGen is also a member, plays a critical role in standardizing metadata across Common Fund-funded data coordination centers. To support the CFDE search portal, the Cross Cut Metadata Model (C2M2) was developed, enabling researchers to query and discover relevant data assets efficiently. The Ontology WG ensures metadata consistency by selecting appropriate ontologies and controlled vocabularies for key data fields, balancing granularity with usability. Through a structured Request for Comment (RFC) process, the WG formalizes ontology choices and maintains documentation for transparency. Additionally, ontology “slims” provide high-level groupings of metadata terms, improving data visualization and usability. The CFDE Portal integrates these standards, offering streamlined metadata submission and querying based on ontology terms.

Brief Introduction

A goal of the Common Fund Data Ecosystem (CFDE) is to provide a centralized resource to house metadata associated with data assets from all Common Fund-funded data coordination centers. To accomplish this goal, a metadata database called the Cross Cut Metadata Model (C2M2) was created to support the CFDE search portal, which researchers can query to identify data assets of interest. The CFDE Ontology Working Group consists of representatives from participating data coordination centers who worked together to establish standards for the capture of metadata in the C2M2. This involved choosing which types of metadata to include in the C2M2 as well as which ontologies or controlled vocabularies would be used to capture the information.

It was recognized early in the process that it is not desirable to capture all metadata from the data coordination centers, but rather just enough for the portal to be an effective tool for guiding researchers to the locations of relevant data. Therefore, pragmatic decisions were made about the level of metadata complexity to collect. Similarly, it was outside the scope of this project to create a structure to unify all vocabularies and standards. Ontologies and controlled vocabularies were therefore chosen not because they were judged to be the “best” but because they had the granularity, scope, community buy-in, and associated resources (e.g., mappings to other ontologies) that allowed the data coordination centers to use them effectively when submitting data to the C2M2.

For more information on the Ontology Working Group, please read our charter and the paper “Data Harmonization through use of community standards in the Common Fund Data Ecosystem”, presented at the workshop on “FAIR ontology harmonization and TRUST data interoperability” held at the 2022 International Conference on Biomedical Ontology (ICBO).

Ontologies/controlled vocabularies chosen for use in C2M2 with associated metadata fields

EDAM - file format, file data type
Uberon - anatomy
Ontology for Biomedical Investigations (OBI) - assay type, analysis type, sample preparation method
Human Disease Ontology (DOID) - disease
PubChem - chemical entity
NCBI taxon - species
Standards for Maintaining, Collecting, and Presenting Federal Data on Race and Ethnicity - race, ethnicity
SNOMED CT subset of terms - sex
Human Phenotype Ontology (HPO) - subject/donor phenotype
Mammalian Phenotype Ontology (MP) - subject/donor phenotype
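
The list above can be read as a lookup from metadata field to vocabulary. The following is a minimal Python sketch of that lookup, not part of the C2M2 specification: the dictionary keys are informal labels taken from the list, not actual C2M2 column names, and `vocabularies_for` is a hypothetical helper introduced only for illustration.

```python
# Illustrative lookup built from the list above.
# Keys are informal field labels, NOT actual C2M2 column names (assumption).
FIELD_VOCABULARIES = {
    "file format": ("EDAM",),
    "file data type": ("EDAM",),
    "anatomy": ("Uberon",),
    "assay type": ("OBI",),
    "analysis type": ("OBI",),
    "sample preparation method": ("OBI",),
    "disease": ("DOID",),
    "chemical entity": ("PubChem",),
    "species": ("NCBI taxon",),
    "race": ("Federal race and ethnicity standards",),
    "ethnicity": ("Federal race and ethnicity standards",),
    "sex": ("SNOMED CT subset",),
    "subject/donor phenotype": ("HPO", "MP"),
}


def vocabularies_for(field):
    """Return the vocabularies listed for a metadata field (hypothetical helper)."""
    try:
        return FIELD_VOCABULARIES[field]
    except KeyError:
        raise ValueError(f"No vocabulary listed for field: {field!r}") from None


if __name__ == "__main__":
    for name in ("anatomy", "subject/donor phenotype"):
        print(name, "->", ", ".join(vocabularies_for(name)))
```

Storing the values as tuples keeps the one field that draws on two vocabularies (subject/donor phenotype) consistent with the single-vocabulary fields.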

Numerous ontology browsers can be used to visualize and explore the above ontologies. Popular ones include the EBI Ontology Lookup Service (OLS), Ontobee, and the NCBO BioPortal.

Documentation of vocabulary choices

The choice of ontologies and controlled vocabularies was documented through the “Request for comment” (RFC) process. An RFC was drafted for each standard that was chosen, and each draft was sent to the entire CFDE consortium for comment. Once the comment period ends, an RFC becomes official policy. Pending and final RFCs are stored in the RFC repository. Ontology Working Group RFCs are available for:

Ontology/Controlled vocabulary “Slims”

Data coordination centers (DCCs) are always encouraged to use the most granular terms that are applicable to their metadata, as this provides the most accurate and specific information. However, visualizing the hundreds or thousands of terms used within a dataset can be challenging. Ontology “slims” address this difficulty by providing a higher-level view of a set of annotations. Generally, a slim is built from the more general, less specific terms of an ontology, which represent broad classes within it. Each granular term can then be mapped to the slim term that is its ancestor, and specific metadata term associations can be binned into slim-term-based categories via those mappings. The process used to build slims for the C2M2 standards is described in the Slim RFC (this RFC is still in the comment phase). The slims for the following vocabularies are available in the linked folders - the latest slim will be the one with the highest version number.
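
The mapping and binning idea described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure defined in the Slim RFC: the hierarchy, the term labels, and the function names `slim_ancestors` and `bin_by_slim` are all invented for this example, and real slims use ontology identifiers rather than plain labels.

```python
from collections import Counter

# Hypothetical is-a hierarchy (child -> list of parents). Labels are invented
# for illustration and are not taken from any real ontology release.
PARENTS = {
    "left ventricle": ["heart"],
    "heart": ["cardiovascular system"],
    "aorta": ["blood vessel"],
    "blood vessel": ["cardiovascular system"],
    "cerebellum": ["brain"],
    "brain": ["nervous system"],
    "cardiovascular system": [],
    "nervous system": [],
}

# Hypothetical slim: a small set of broad terms from the hierarchy above.
SLIM_TERMS = {"cardiovascular system", "nervous system"}


def slim_ancestors(term, parents, slim_terms):
    """Return every slim term that is the given term or one of its ancestors."""
    found, seen, stack = set(), set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in slim_terms:
            found.add(current)
        stack.extend(parents.get(current, []))
    return found


def bin_by_slim(annotations, parents, slim_terms):
    """Count how many granular annotations fall under each slim term."""
    counts = Counter()
    for term in annotations:
        for slim in slim_ancestors(term, parents, slim_terms):
            counts[slim] += 1
    return counts


if __name__ == "__main__":
    used_terms = ["left ventricle", "aorta", "cerebellum", "heart"]
    for slim, n in sorted(bin_by_slim(used_terms, PARENTS, SLIM_TERMS).items()):
        print(f"{slim}: {n}")
```

Because the traversal follows every parent, a term with more than one slim ancestor is counted under each of them; whether that is desirable depends on how a given slim is meant to be used.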

Submission of metadata using the above vocabularies

Extensive submission documentation exists on the process for submitting data to the C2M2 and checking it in the portal.

CFDE Portal

The CFDE Portal offers ways to query and view metadata according to ontology terms or slim terms. See portal documentation for more information.

Revision history: page created 18:29, 4 February 2025 by Jeetvora; latest revision 16:59, 14 March 2025 by Mazumder (minor edit, no edit summary).