Human ccRCC Glycoproteomics (ML Ready)
Latest revision as of 15:03, 6 August 2024
The Human Glycosylation Sites (PDC) dataset contains intact glycopeptide abundances together with biospecimen and clinical metadata for clear cell renal cell carcinoma (ccRCC) tumor and normal adjacent tissues from Homo sapiens (taxid:9606), collected by the CPTAC program. It can be used to stratify renal cancer patients into risk groups. The data was retrieved from Proteomic Data Commons (PDC) study PDC000471 (https://proteomic.datacommons.cancer.gov/pdc/study/PDC000471). The GlyGen team combined and cleaned the abundances and clinical features to make them ready for machine learning. The resulting dataset is available through GlyGen's data portal (https://data.glygen.org/GLY_001046).
Dataset Meta-Data
This dataset consists of 183 instances (samples) and 11103 columns. Columns 1-10815 contain glycopeptide abundances; for intact glycopeptides quantified in more than 50% of the samples, missing values were imputed using DreamAI (https://github.com/WangLab-MSSM/DreamAI). The remaining 288 columns hold biospecimen and clinical metadata, including the fields below.
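The column layout described above can be sketched in Python. This is a minimal sketch, assuming the file is a delimited table with one sample per row, no leading sample-ID column, and blank cells for missing values, so that the 0-based slice `row[:10815]` covers columns 1-10815. The helper names `split_row` and `quantified_in_majority` are hypothetical. The second helper only illustrates the >50% quantification threshold; it does not reproduce DreamAI, which performs the actual imputation.

```python
import math

# Layout stated on this page: 11103 columns, of which columns 1-10815
# (1-based) hold glycopeptide abundances.
# Assumption: no sample-ID column precedes the abundances, and the
# remaining 288 columns are biospecimen/clinical metadata.
N_COLUMNS = 11103
N_ABUNDANCE = 10815


def split_row(row):
    """Split one sample row into (abundances, metadata).

    Assumption: `row` is a list of strings (e.g. from csv.reader) and
    blank cells mark missing values; they are converted to NaN.
    """
    if len(row) != N_COLUMNS:
        raise ValueError(f"expected {N_COLUMNS} columns, got {len(row)}")
    abundances = [float(v) if v != "" else math.nan for v in row[:N_ABUNDANCE]]
    metadata = row[N_ABUNDANCE:]
    return abundances, metadata


def quantified_in_majority(column_values):
    """Return True if a glycopeptide is quantified (non-NaN) in more than
    50% of samples, the threshold this page gives for imputation."""
    observed = sum(1 for v in column_values if not math.isnan(v))
    return observed > len(column_values) / 2
```

Rows can be supplied by `csv.reader` after skipping any header row; `split_row` then yields a numeric feature vector for a model and the string metadata for labeling or stratification.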
Column | Value(s) |
---|---|
tissue type | Normal or Tumor |
cause of death | Cancer related, Infection or Unknown |
days to death | Number of days from resection to death |
vital status | Alive or Dead |
last known disease status | With tumor, Tumor free or Unknown tumor status |
progression or recurrence | Yes, No or Not Reported |
disease response | PD-Progressive Disease or CR-Complete Response |
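Because the dataset is described as suitable for stratifying risk groups, a minimal sketch of turning one clinical field into a supervised label may help. It assumes the clinical columns are named exactly as in the table above and hold the literal values listed there. The function name `progression_labels` and the choice of "progression or recurrence" as the target are illustrative only, not GlyGen's procedure.

```python
# Hypothetical sketch: binary labels from "progression or recurrence".
# Assumption: column names and values match the metadata table exactly.
TISSUE_COL = "tissue type"
PROGRESSION_COL = "progression or recurrence"

# "Not Reported" is deliberately absent, so those samples are skipped.
LABELS = {"Yes": 1, "No": 0}


def progression_labels(records):
    """Return (row_index, label) pairs for tumor samples with a reported outcome.

    `records` is a list of dicts keyed by column name, for example the
    rows produced by csv.DictReader.
    """
    pairs = []
    for i, rec in enumerate(records):
        # Keep tumor samples only; normal adjacent tissue is excluded.
        if rec.get(TISSUE_COL) != "Tumor":
            continue
        label = LABELS.get(rec.get(PROGRESSION_COL))
        if label is not None:
            pairs.append((i, label))
    return pairs
```

Returning row indices keeps the labels aligned with the abundance rows. Samples with "Not Reported" are dropped rather than imputed, so the label only reflects recorded outcomes.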