ML-Ready Datasets

From GlyGen Wiki

Latest revision as of 15:02, 28 June 2024

Machine Learning (ML)-ready datasets are structured datasets that have been pre-processed and organized so that they are suitable for training ML models. They typically require minimal modification, which lets users with some domain knowledge but little to no scripting experience streamline model development. GlyGen provides glycomics and glycoproteomics ML-ready datasets that glycobiology and bioinformatics scientists can use for disease risk assessment in conditions such as Type II Diabetes and Clear Cell Renal Carcinoma. GlyGen is also developing, in collaboration with the University of Delaware, an ML-ready dataset that maps various features (such as disease, cell line, tissue, and species) to their respective ontology IDs; it will be publicly available on data.glygen.org after publication.

Bioinformatics has greatly benefited from ML-ready datasets in areas such as protein structure prediction, biomarker discovery, clinical data analysis, and systems biology. GlyGen aims to extend these benefits by providing user-friendly tools and resources that make advanced data analysis more accessible to researchers, regardless of their technical background. Our goal is to enable more researchers to leverage machine learning in their work to facilitate discoveries and advancements in the field.
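
To make "ML-ready" concrete, the sketch below shows the general shape such a table often takes: one row per sample, one column per feature, and a label column, so that it splits directly into a feature matrix and a label vector. The column names (sample_id, glycan_a, glycan_b, label), the values, and the helper load_ml_ready are all hypothetical and chosen only for illustration; they are not taken from any GlyGen file, and the actual layout of the datasets on data.glygen.org may differ.

```python
import csv
import io

# Hypothetical excerpt: the column names and values are invented for
# illustration and do not come from a GlyGen dataset.
SAMPLE_CSV = """sample_id,glycan_a,glycan_b,label
S1,0.12,0.88,case
S2,0.40,0.60,control
S3,0.15,0.85,case
S4,0.45,0.55,control
"""


def load_ml_ready(text, id_col="sample_id", label_col="label"):
    """Split a samples-by-features table into ids, a numeric matrix X, and labels y."""
    reader = csv.DictReader(io.StringIO(text))
    # Every column that is neither the sample id nor the label is a feature.
    feature_names = [c for c in reader.fieldnames if c not in (id_col, label_col)]
    ids, X, y = [], [], []
    for row in reader:
        ids.append(row[id_col])
        X.append([float(row[name]) for name in feature_names])
        y.append(row[label_col])
    return ids, feature_names, X, y


ids, feature_names, X, y = load_ml_ready(SAMPLE_CSV)
print(feature_names)
print(X[0], y[0])
```

Because the table is already clean and numeric, X and y could be handed to a modeling library without further preparation, which is the property the term "ML-ready" refers to.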

Available ML-Ready Datasets

All ML-ready datasets are available at data.glygen.org.

ML-Ready Datasets

Dataset                                | Data                   | Condition                  | n
Human Diabetes Glycomics (ML Ready)    | N-glycome Abundance    | Diabetes                   | 74
Human ccRCC Glycoproteomics (ML Ready) | Glycopeptide Abundance | Clear Cell Renal Carcinoma |