ML-Ready Datasets

From GlyGen Wiki

Latest revision as of 14:43, 6 August 2024

Machine learning (ML)-ready datasets are structured datasets that have been pre-processed and organized so that they can be used to train ML models with little or no further modification. This lets users who have domain knowledge but little or no scripting experience streamline model development. GlyGen provides glycomics and glycoproteomics ML-ready datasets that glycobiologists and bioinformaticians can use for disease risk assessment in conditions such as type 2 diabetes and clear cell renal cell carcinoma (ccRCC). In collaboration with the University of Delaware, GlyGen is also developing an ML-ready dataset that maps features such as disease, cell line, tissue, and species to their respective ontology IDs; this dataset will be made publicly available on data.glygen.org after publication.
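This page does not describe the file layout of the GlyGen ML-ready datasets. As a minimal sketch, assuming a tidy CSV with one row per sample, one numeric column per feature (for example, a glycan or glycopeptide abundance), and a label column holding the condition, the Python below shows why such a table needs so little preparation: it splits directly into a feature matrix and a label list that a classifier can consume. The column names (`sample_id`, `glycan_A`, `label`, ...), the toy values, and the nearest-centroid classifier are hypothetical illustrations, not GlyGen data, a GlyGen schema, or a GlyGen method.

```python
import csv
import io
from statistics import mean

# Hypothetical ML-ready layout (an assumption, not GlyGen's documented schema):
# one row per sample, one numeric column per feature, and a "label" column.
# The values are toy numbers for illustration only.
TOY_CSV = """sample_id,glycan_A,glycan_B,glycan_C,label
s1,0.40,0.10,0.50,case
s2,0.35,0.15,0.50,case
s3,0.10,0.60,0.30,control
s4,0.05,0.70,0.25,control
"""


def load_ml_ready(text, id_col="sample_id", label_col="label"):
    """Split a tidy table into feature names, a feature matrix X, and labels y."""
    reader = csv.DictReader(io.StringIO(text))
    # Every column except the sample ID and the label is treated as a feature.
    feature_names = [c for c in reader.fieldnames if c not in (id_col, label_col)]
    X, y = [], []
    for row in reader:
        X.append([float(row[name]) for name in feature_names])
        y.append(row[label_col])
    return feature_names, X, y


def fit_nearest_centroid(X, y):
    """Compute the per-class mean of every feature column."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


features, X, y = load_ml_ready(TOY_CSV)
model = fit_nearest_centroid(X, y)
print(features)                            # ['glycan_A', 'glycan_B', 'glycan_C']
print(predict(model, [0.38, 0.12, 0.50]))  # case
```

In practice a library such as pandas or scikit-learn would usually handle loading and modeling; the standard library is used here only so the sketch runs without extra dependencies.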

Bioinformatics has greatly benefited from ML-ready datasets in areas such as protein structure prediction, biomarker discovery, clinical data analysis, and systems biology. GlyGen aims to extend these benefits by providing user-friendly tools and resources that make advanced data analysis more accessible to researchers, regardless of their technical background. Our goal is to enable more researchers to leverage machine learning in their work to facilitate discoveries and advancements in the field.

Available ML-Ready Datasets

All ML-ready datasets are available at data.glygen.org.

ML-Ready Datasets
Dataset | Data | Condition | Number of Samples
Human Diabetes Glycomics (ML Ready) | N-glycome Abundance | Diabetes | 74
Human ccRCC Glycoproteomics (ML Ready) | Glycopeptide Abundance | Clear Cell Renal Cell Carcinoma | 183