Data release notes
Version 2.5.1
Release Highlights:
- Added two ML-ready datasets for diabetes and ccRCC
- New datasets for Chicken (Gallus gallus) added
- New proteoform datasets from EMBL, University of Zagreb, and PDC
(see Data release notes/2.5.1 for full list of updates)
Version 2.4.1
- New datasets for Pig (Sus scrofa) added
- Glycan, Protein and Proteoform datasets updated for the current organisms
- Added glycan cross references to EMBL and Glycan Array Data Repository
- PubMed IDs added to glycan species annotations
Version 2.3.1
- New datasets for Slime Mold (Dictyostelium discoideum) added (GLY_000823, ...)
- Glycan, Protein and Proteoform datasets updated for the current organisms in GlyGen
- Added new records from literature mining of 30k publications (GLY_000481, GLY_000492)
- New supporting JSON files for glymagesvg semantic highlighting (GLY_000821)
- New datasets on disease information for Yeast and Fruit fly from Alliance of Genome Resources added
- Linkout dataset for MGI created
- Categories for Protein and Glycan Xrefs created
- Redirection of old data.glygen.org links to new data.glygen.org links created
Version 2.2.1
- Updated Glycan, Protein and Proteoform datasets
- Updated Biomarker datasets
- New datasets for Yeast (Saccharomyces cerevisiae) added
- Xrefs for ViralGlycome added for SARS-CoV2 to the glycan details page
- Datasets for CFDE (submitted) and EuropePMC linkouts created and added to data.glygen.org
- Added new disease, tissue and cell line expression records to glycan details page
- Glycan abundance data added to publication details page
- New GlycoCT{xml} dataset added to data.glygen.org (GLY_000817)
- Addition of Yeast, Dictyostelium and Pig glycan species annotations
- Changed organism names from EBI Taxonomy names to match NCBI Taxonomy names on GlyGen and the GlyGen Data portal:
  - Sars coronavirus (sars-cov-1) to Severe acute respiratory syndrome-related coronavirus
  - Sars coronavirus (sars-cov-2 or 2019-ncov) to Severe acute respiratory syndrome coronavirus 2
  - Hepatitis c virus (genotype 1a, isolate h) to Hepatitis c virus (isolate h)
  - Hepatitis c virus (genotype 1b, isolate japanese) to Hepatitis c virus (isolate japanese)
Version 2.1.1
Main article: Data release notes/2.1.1
- Updated Glycan, Protein and Proteoform datasets
- Removal of SNV/mutation entries from COSMIC resource from the human_protein_mutation_cancer.csv dataset [GLY_000024]
- Updated SPARQL endpoint with newer data
- New interface and search functionality on data.glygen.org
Version 2.0.3
Main article: Data release notes/2.0.3
- Protein, Proteoform and Glycan datasets updated with new release data
- Addition of GlyGen Xref from GeneCards to GlyGen (example - https://www.genecards.org/cgi-bin/carddisp.pl?gene=HGF)
- O-GlcNAc data added from O-GlcNAc atlas (GLY_000708, GLY_000709, GLY_000710, GLY_000711).
- O-Glucosylation data added from a dataset shared by Daniel Williamson (author) from the lab of Robert S Haltiwanger (GLY_000716).
- GlyConnect cross-reference datasets created and added to the GlyGen Mapper tool.
- Glycan Structure Dictionary dataset updated (GLY_000557).
- SNV/cancer mutation data from OncoMX updated. The new data follows the GRCh38 genome assembly reference.
- Names of a few citation files changed. Missing citation files were created.
- Addition of new monosaccharides.
- Addition of biomarker data to data.glygen.org (Glycan - GLY_000737 | Protein - GLY_000625)
Version 1.12.3
Main article: Data release notes/1.12.3
This version was released on April 28th, 2022
- Protein and Glycan datasets updated with new release data
- Glycogenes dataset for mouse and human created
- Glycan Structure Dictionary dataset created
- Cancer Biomarker dataset created
- Addition of new data entries from UniCarbKB
- Addition of expression score to the Gene Expression (Normal) dataset (GLY_000028) and the Expression Tissue section of the Protein details page
- Homolog clusters dataset updated with new clusters
- Migration of BCOs to IEEE specification
- Changes in BCOs IO Domain and Usability domain
- Crosslinks from NCBI PubMed to GlyGen publication details page were created and added by PubMed
- Added Drosophila melanogaster species annotation to glycans
- Added glycan classifications GPI anchor and O-GlcNAc
- Updated glycan images from GlyTouCan
- Added glycosylation site ranges (start position and end position) to UniCarbKB proteoform glycosylation datasets
- Error fixes
Version 1.8.22
Main article: Data release notes/1.8.22
This version was released on April 1st, 2021
- F protein (P0C045-1) was removed from Hepatitis C virus (genotype 1a, isolate H) proteome
- Hepatitis C Virus (genotype 1a, isolate H) Tax ID changed from 11108 to 63746.
- NCBI GeneID and Refseq datasets for SARS-CoV2 created
- Accession history files created for tracking protein and glycan accessions
- Mouse and rat disease datasets created to add disease data in the disease section
- Data entries for O-GlcNAc data now point to The O-GlcNAc resource instead of the dataset.
- human_proteoform_glycosylation_sites_o_glcnac_mcw updated with new O-GlcNAc data
- n-sequon and n-sequon type fields added to the glycosylation datasets for data validation
- Removed predicted glycosylation data from the UniProtKB data when reported glycosylation data is present for the same data entry.
- Update to the Glycomotif alignment logic, motif keyword, publications
- Update to the GlyConnect data, Automatic Literature Data, Disease data from Genomics England
- Update to the GlyTouCan to ChEBI mapping method.
- Addition of new glycan type and subtypes
- Integration of GlycoTree alignment infrastructure
- Addition of semantic names from GlycoMotif
- Addition of ~1800 new GlyTouCan accessions
- Addition of new sections in glycan detail pages:
  - Subsumption (related glycans through GNOme)
  - Expression (glycans expressed in cell line/tissue)
  - History (release no. where the glycan was introduced in GlyGen)
- Addition of glycan keywords (Motif Group), Synonyms, Reducing End information to the Motif detail page.
- Addition of citations from NCFG data for Asparagine-linked glycans
- Addition of GNOme and SandBox references
- Addition of Disease data from Glycosmos
- Addition of evidence to the "Biosynthetic Enzyme" section on the glycan detail page.
- Addition of Rhea and Reactome cross-references to glycan detail pages
- Addition of GlyGen links in ChEBI and GlyTouCan databases.
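
The n-sequon fields added to the glycosylation datasets in version 1.8.22 support validation of reported N-linked sites. As a minimal sketch only, assuming the canonical N-X-S/T motif (Asn, any residue except Pro, then Ser or Thr), a check of this kind could look as follows. The function names are hypothetical, and the actual values and logic behind the GlyGen n-sequon and n-sequon type fields are not described in these notes.

```python
import re

# Canonical N-glycosylation sequon (assumption): Asn, any residue except Pro,
# then Ser or Thr. A lookahead is used so overlapping sequons are all reported.
SEQUON_RE = re.compile(r"(?=(N[^P][ST]))")


def find_n_sequons(sequence):
    """Return (1-based Asn position, sequon tripeptide) for each sequon found."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON_RE.finditer(sequence.upper())]


def site_has_sequon(sequence, position):
    """Check whether a reported N-linked site (1-based) falls on a sequon Asn."""
    return any(pos == position for pos, _ in find_n_sequons(sequence))


if __name__ == "__main__":
    # Made-up peptide for illustration, not taken from any GlyGen dataset.
    peptide = "MNGTANPSNNSS"
    print(find_n_sequons(peptide))
    print(site_has_sequon(peptide, 6))  # Asn at position 6 is followed by Pro
```

A site that fails such a check is not necessarily wrong, since non-canonical motifs are also reported in the literature; the sketch only flags entries for review.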
Version 1.7.13
Main article: Data release notes/1.7.13
Related API version: 1.7.12
This version was released on July 20th, 2020
- Updated the motif list: GlyTouCan motif accessions replaced with the GlyGen motif list, with GlyTouCan motifs kept as xrefs.
- Addition of Motif synonyms.
- Including glycosylation annotation (Protein+PMID+Glycan) on unreported sites from GlyConnect.
- Refined/corrected BioSynthetic enzyme data.
- Addition of new glycan subtypes (O-fucose, O-mannose).
- Updated glycan images from GlyTouCan.
- New data: Protein+GAGs data from MatrixDB.
- Addition of GlyGen x-ref on GlyConnect protein entry pages.
- Addition of GlyGen x-ref on PubChem compound and PubChem protein pages.
- Addition of new glycans into ChEBI.
- Addition of Mutagenesis data from UniProt
- Addition of PMIDs for SNV data
- GlyGen dataset BCO prefix changed from DSBCO to GLY
Version 1.5.36
Main article: Data release notes/1.5.36
Related API version: 1.5.43
This version was released on July 20th, 2020
- Added O-GlcNAc data extracted from the literature by Stephanie Olivier’s group (GLYDS000518)
- Added germline and somatic variation data that affect glycosites (loss of glycosylation site and gain of glycosylation sequon) to the mutation section
- Added the literature-extracted glycosite for SARS-CoV1 M protein
- Added glycosylation subtypes to the *_proteoform_glycosylation_sites_uniprotkb.csv
- Added species annotation via subsumption (for human, mouse). Rat and HCV to follow in the next release.
- Added updated GlyConnect data (additional O-GlcNAc sites).
- Added glycosylation sites through text mining (first iteration).
Version 1.5.18
Main article: Data release notes/1.5.18
Related API version: 1.5.26
This version was released on April 15th, 2020
- Added UniProtKB Gene synonyms (search and details)
- Added RefSeq Gene names and synonyms
- Added RefSeq Protein synonyms
- Added UniProtKB Protein synonyms
- Updated the FASTA headers of the protein sequences so that they now resemble the FASTA headers of UniProtKB sequences
- Updated BioMuta data with the addition of comments that show which filters were passed
- Created new datasets: mutation literature mining dataset, dbSNP somatic and germline mutation datasets.
- Added GlyGen to Pharos Xref in the protein detail cross-reference section
- Added HCV 1a and 1b, SARS-CoV1 and 2 proteomes
- Added the MIM disease name where DO names were not available
- Updated glycan species annotations.
- Added Human, Mouse, Rat glycosylation data from GlyConnect.
- Added HCV1a glycosylation data from 1 publication.
- Added human glycosylation data from 2 publications.
- Added glycan x-refs to MatrixDB, GlycoEpitope
- Added MatrixDB Protein-GAGs interaction data. (at GlyGen Data)
- Included GlyTouCan-composition accessions.
- Added protein x-refs to GlycoProtDB
- Retired GlycO and GlycomeDB xrefs.
- Added SNFG glycans (at GlyGen Data)
- Added animated GIF and .mp4 video of 3D model of SARS-CoV-2 spike glycoprotein. (at GlyGen Data)
- Updated the synthesized glycan list from Dr. Boons' group.
- Included 2 additional FAQs: "How do I find a GlyTouCan Boons accession for my glycan composition?" and "How can I convert my glycan sequence to different formats (e.g. IUPAC, WURCS, GlycoCT, LinearCode, etc.)?"
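
Since version 1.5.18 the FASTA headers of GlyGen protein sequences resemble UniProtKB headers, so a parser written for the public UniProtKB header layout (`>db|accession|entry_name protein_name OS=... OX=... [GN=...] PE=... SV=...`) may be a reasonable starting point. The sketch below assumes that layout exactly; the function name is hypothetical, and GlyGen headers may differ in details not described in these notes.

```python
import re

# UniProtKB-style header layout (assumption that GlyGen headers follow it exactly):
# >db|accession|entry_name protein_name OS=organism OX=tax_id [GN=gene] PE=n SV=n
HEADER_RE = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry_name>\S+)\s+"
    r"(?P<protein_name>.*?)\s+OS=(?P<organism>.*?)\s+OX=(?P<tax_id>\d+)"
    r"(?:\s+GN=(?P<gene>\S+))?\s+PE=(?P<pe>\d)\s+SV=(?P<sv>\d+)\s*$"
)


def parse_uniprot_header(line):
    """Split a UniProtKB-style FASTA header into a dict of named fields."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError("not a UniProtKB-style FASTA header: %r" % line)
    return match.groupdict()  # 'gene' is None when the GN= field is absent


if __name__ == "__main__":
    # Header written out for illustration; field values are not taken from GlyGen.
    header = (">sp|P14210|HGF_HUMAN Hepatocyte growth factor "
              "OS=Homo sapiens OX=9606 GN=HGF PE=1 SV=2")
    fields = parse_uniprot_header(header)
    print(fields["accession"], fields["gene"], fields["tax_id"])
```

Raising on a non-matching header, rather than returning partial fields, makes any divergence from the assumed layout visible early.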
Version 1.0
Main article: Data release notes/1.0
This version was released on Nov 22, 2019
- Isoform Alignment.
- Homolog Alignment.
- New use case added to the quick search.
- Composition Search.
- GO ID search.
- PMID search.
- Batch search on advanced protein and glycan search page.
- Multi-select option for amino acids on advanced glycoprotein search.
- Multi-select option for organisms on advanced glycan search.
- Integrated subsumption browser.