Human ccRCC Glycoproteomics (ML Ready)

From GlyGen Wiki

Latest revision as of 15:03, 6 August 2024

The Human Glycosylation Sites (PDC) dataset contains intact glycopeptide abundances together with biospecimen and clinical metadata from Homo sapiens (taxid:9606) clear cell renal cell carcinoma (ccRCC) tumor and normal adjacent tissues collected by the CPTAC program. It can be used to stratify renal cancer patients into risk groups. The data were retrieved from the Proteomic Data Commons (PDC), where the source study is available at https://proteomic.datacommons.cancer.gov/pdc/study/PDC000471. The GlyGen team combined and cleaned the abundances and clinical features to make them ready for machine learning algorithms. The dataset can be accessed through GlyGen's data portal (https://data.glygen.org/GLY_001046).

=== Dataset Meta-Data ===

This dataset consists of 183 instances (rows) and 11103 columns. Columns 1-10815 contain glycopeptide abundances. For intact glycopeptides quantified in >50% of the samples, missing values were imputed using DreamAI (https://github.com/WangLab-MSSM/DreamAI).
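
The column layout described above suggests a simple way to separate the abundance matrix from the remaining columns before modelling. The following is a minimal sketch, not the GlyGen team's procedure. It assumes the file is delimiter-separated text with a header row, that the first 10815 columns are numeric abundances, and that all later columns are non-abundance fields. The helper names (`read_table`, `split_columns`) and the demo column names (`gp_1`, `gp_2`) are hypothetical.

```python
import csv
import io

# From the description above: columns 1-10815 hold glycopeptide abundances.
N_ABUNDANCE_COLUMNS = 10815


def read_table(text, delimiter=","):
    """Parse delimiter-separated text into (header, rows). The format is an assumption."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    return header, list(reader)


def split_columns(header, rows, n_abundance=N_ABUNDANCE_COLUMNS):
    """Split each row into float abundances and a dict of the remaining columns."""
    if len(header) < n_abundance:
        raise ValueError("table has fewer columns than the requested abundance count")
    abundance_names = header[:n_abundance]
    other_names = header[n_abundance:]
    abundances = [[float(value) for value in row[:n_abundance]] for row in rows]
    others = [dict(zip(other_names, row[n_abundance:])) for row in rows]
    return abundance_names, abundances, others


# Tiny synthetic stand-in for the real file (hypothetical column names and values).
demo = "gp_1,gp_2,tissue type,vital status\n1.5,2.0,Tumor,Alive\n0.5,1.0,Normal,Dead\n"
header, rows = read_table(demo)
names, X, meta = split_columns(header, rows, n_abundance=2)
```

For the real dataset, one would expect `len(rows)` to be 183 and `len(header)` to be 11103 if the file matches the description; checking both before training is a cheap sanity test.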

{| class="wikitable"
|+ Description of Headers and Corresponding Values
! Header !! Value(s)
|-
| tissue type || Normal or Tumor
|-
| cause of death || Cancer related, Infection, Unknown
|-
| days to death || Days to death after resection
|-
| vital status || Alive or Dead
|-
| last known disease status || With tumor, Tumor free or Unknown tumor status
|-
| progression or recurrence || Yes, No or Not Reported
|-
| disease response || PD-Progressive Disease or CR-Complete Response
|}
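
The categorical headers listed above usually need to be turned into numeric labels before they can serve as machine learning targets. The sketch below shows one possible encoding for `tissue type` and `vital status`. It assumes the header names and value spellings appear in the file exactly as listed. The 0/1 coding and the function name `encode_record` are illustrative choices, not part of the dataset.

```python
# Illustrative 0/1 codes; the spellings follow the header description above.
TISSUE_CODES = {"Normal": 0, "Tumor": 1}
VITAL_CODES = {"Alive": 0, "Dead": 1}


def encode_record(record):
    """Return (tissue_code, vital_code) for one metadata record.

    A code is None when the value is missing or not one of the listed
    spellings, so unexpected entries surface instead of being miscoded.
    """
    tissue = TISSUE_CODES.get(record.get("tissue type", "").strip())
    vital = VITAL_CODES.get(record.get("vital status", "").strip())
    return tissue, vital


# Hypothetical records standing in for rows of the real metadata.
examples = [
    {"tissue type": "Tumor", "vital status": "Dead"},
    {"tissue type": "Normal", "vital status": "Alive"},
    {"tissue type": "Tumor", "vital status": ""},
]
labels = [encode_record(record) for record in examples]
```

Returning None for unrecognised values keeps the decision about dropping or imputing such rows explicit in the downstream code.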