Data release notes

From GlyGen Wiki


Version 2.5.1

Release Highlights:

  • Added two ML-ready datasets for diabetes and ccRCC
  • New datasets for Chicken (Gallus gallus) added
  • New proteoform datasets from EMBL, University of Zagreb, and PDC

(see Data release notes/2.5.1 for full list of updates)

Version 2.4.1

  • New datasets for Pig (Sus scrofa) added
  • Glycan, Protein and Proteoform datasets updated for the current organisms
  • Added glycan cross references to EMBL and Glycan Array Data Repository
  • PubMed IDs added to glycan species annotations

Version 2.3.1

  • New datasets for Slime Mold (Dictyostelium discoideum) added (GLY_000823, ...)
  • Glycan, Protein and Proteoform datasets updated for the current organisms in GlyGen
  • Added new records from literature mining of 30k publications (GLY_000481, GLY_000492)
  • New supporting JSON files for glymagesvg semantic highlighting (GLY_000821)
  • New datasets on disease information for Yeast and Fruitfly from Alliance of Genome Resources added
  • Linkout dataset for MGI created
  • Categories for Protein and Glycan Xrefs created
  • Redirection of data.glygen.org links to new data.glygen.org links created

Version 2.2.1

  • Updated Glycan, Protein and Proteoform datasets
  • Updated Biomarker datasets
  • New datasets for Yeast (Saccharomyces cerevisiae) added
  • Xrefs for ViralGlycome added for SARS-CoV2 to the glycan details page
  • Datasets for CFDE (submitted) and EuropePMC linkouts created and added to data.glygen.org
  • Added new disease, tissue and cell line expression records to glycan details page
  • Glycan abundance data added to publication details page
  • New GlycoCT{xml} dataset added to data.glygen.org (GLY_000817)
  • Addition of Yeast, Dictyostelium and Pig glycan species annotations
  • Changed organism names from EBI Taxonomy names to match NCBI Taxonomy names on GlyGen and the GlyGen Data portal:
    • Sars coronavirus (sars-cov-1) to Severe acute respiratory syndrome-related coronavirus
    • Sars coronavirus (sars-cov-2 or 2019-ncov) to Severe acute respiratory syndrome coronavirus 2
    • Hepatitis c virus (genotype 1a, isolate h) to Hepatitis c virus (isolate h)
    • Hepatitis c virus (genotype 1b, isolate japanese) to Hepatitis c virus (isolate japanese)

Version 2.1.1

Main article: Data release notes/2.1.1

  • Updated Glycan, Protein and Proteoform datasets
  • Removed COSMIC-sourced SNV/mutation entries from the human_protein_mutation_cancer.csv dataset [GLY_000024]
  • Updated SPARQL endpoint with newer data
  • New interface and search functionality on data.glygen.org

Version 2.0.3

Main article: Data release notes/2.0.3

  • Protein, Proteoform and Glycan datasets updated with new release data
  • Addition of GlyGen Xref from GeneCards to GlyGen (example - https://www.genecards.org/cgi-bin/carddisp.pl?gene=HGF)
  • O-GlcNAc data added from O-GlcNAc atlas (GLY_000708, GLY_000709, GLY_000710, GLY_000711).
  • O-Glucosylation data added from the dataset shared from the lab of Robert S Haltiwanger by Daniel Williamson (author) (GLY_000716).
  • Glyconnect cross-references datasets created and added to the GlyGen Mapper tool.
  • Glycan Structure Dictionary dataset updated (GLY_000557).
  • SNV/cancer mutation data from OncoMX updated; the new data uses the GRCh38 genome assembly reference.
  • Names of a few citation files changed; missing citation files created.
  • Addition of new monosaccharides.
  • Addition of biomarker data to data.glygen.org (Glycan - GLY_000737 | Protein - GLY_000625)

Version 1.12.3

Main article: Data release notes/1.12.3

This version was released on April 28, 2022

  • Protein and Glycan datasets updated with new release data
  • Glycogenes dataset for mouse and human created
  • Glycan Structure Dictionary dataset created
  • Cancer Biomarker dataset created
  • Addition of new data entries from UniCarbKB
  • Addition of expression score to the Gene Expression (Normal) dataset (GLY_000028) and the Expression Tissue section on the Protein details page
  • Homolog clusters dataset updated with new clusters
  • Migration of BCOs to IEEE specification
  • Changes in BCOs IO Domain and Usability domain
  • Crosslinks from NCBI PubMed to GlyGen publication details page were created and added by PubMed
  • Added Drosophila melanogaster species annotation to glycans
  • Added glycan classifications GPI anchor and O-GlcNAc
  • Updated glycan images from GlyTouCan
  • Added glycosylation site ranges (start position and end position) to UniCarbKB proteoform glycosylation datasets
  • Error fixes

Version 1.8.22

Main article: Data release notes/1.8.22

This version was released on April 1, 2021

  • F protein (P0C045-1) was removed from Hepatitis C virus (genotype 1a, isolate H) proteome
  • Hepatitis C Virus (genotype 1a, isolate H) Tax ID changed from 11108 to 63746.
  • NCBI GeneID and Refseq datasets for SARS-CoV2 created
  • Accession history files created for tracking protein and glycan accessions
  • Mouse and rat disease datasets created to add disease data in the disease section
  • Data entries for O-GlcNAc data now point to The O-GlcNAc resource instead of the dataset.
  • human_proteoform_glycosylation_sites_o_glcnac_mcw updated with new O-GlcNAc data
  • n-sequon and n-sequon type fields added to the glycosylation datasets for data validation
  • Removed predicted glycosylation data from the UniProtKB data when reported glycosylation data is present for the same data entry.
  • Update to the Glycomotif alignment logic, motif keyword, publications
  • Update to the GlyConnect data, Automatic Literature Data, Disease data from Genomics England
  • Update to the GlyTouCan to ChEBI mapping method.
  • Addition of new glycan type and subtypes
  • Integration of GlycoTree alignment infrastructure
  • Addition of semantic names from GlycoMotif
  • Addition of ~1800 new GlyTouCan accessions
  • Addition of new sections in glycan detail pages:
    • Subsumption (related glycans through GNOme)
    • Expression (glycans expressed in cell line/tissue)
    • History (release no. where the glycan was introduced in GlyGen)
  • Addition of glycan keywords (Motif Group), Synonyms, Reducing End information to the Motif detail page.
  • Addition of citations from NCFG data for Asparagine-linked glycans
  • Addition of GNOme and SandBox references
  • Addition of Disease data from Glycosmos
  • Addition of evidence to the "Biosynthetic Enzyme" section on the glycan detail page.
  • Addition of Rhea and Reactome cross-references to glycan detail pages
  • Addition of GlyGen links in ChEBI and GlyTouCan databases.  
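
As an illustration of the n-sequon fields mentioned in the list above, here is a minimal Python sketch of a sequon check. It assumes the canonical N-glycosylation motif N-X-S/T, where X is any residue except proline. The function name and the example sequences are hypothetical; the actual validation logic and the values used in the GlyGen n-sequon type field are not described in these notes.

```python
import re
from typing import Optional

# Canonical N-glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# Assumption: this is an illustrative pattern, not GlyGen's validation code.
SEQUON = re.compile(r"N[^P][ST]")


def n_sequon_at(sequence: str, position: int) -> Optional[str]:
    """Return the 3-residue sequon starting at 1-based `position`, or None."""
    if position < 1:
        return None
    tripeptide = sequence[position - 1 : position + 2]
    if len(tripeptide) == 3 and SEQUON.fullmatch(tripeptide):
        return tripeptide
    return None
```

A reported N-linked site whose position yields None from a check like this could then be flagged for review rather than silently accepted.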

Version 1.7.13

Main article: Data release notes/1.7.13

Related API version: 1.7.12

This version was released on July 20, 2020

  • Update to the new motif list: GlyTouCan motif accessions replaced with the GlyGen motif list, with GlyTouCan motifs as x-refs.
  • Addition of Motif synonyms.
  • Including glycosylation annotation (Protein+PMID+Glycan) on unreported sites from GlyConnect.
  • Refined/corrected BioSynthetic enzyme data.
  • Addition of new Glycan subtypes (o-fucose, o-mannose).
  • Updated glycan images from GlyTouCan.
  • New data: Protein+GAGs data from MatrixDB.
  • Addition of GlyGen x-ref on GlyConnect protein entry pages.
  • Addition of GlyGen x-ref on PubChem compound and PubChem protein pages.
  • Addition of new glycans into ChEBI.
  • Addition of Mutagenesis data from UniProt
  • Addition of PMIDs for SNV data
  • GlyGen dataset BCO prefix changed from DSBCO to GLY

Version 1.5.36

Main article: Data release notes/1.5.36

Related API version: 1.5.43

This version was released on July 20, 2020

  1. Added O-GlcNAc data extracted from the literature by Stephanie Olivier’s group (GLYDS000518)
  2. Added germline and somatic variation data that affects glycosites (loss of glycosylation site and gain of glycosylation sequon) to the mutation section
  3. Added the literature-extracted glycosite for SARS-CoV1 M protein
  4. Added glycosylation subtypes to the *_proteoform_glycosylation_sites_uniprotkb.csv
  5. Added species annotation via subsumption (for human, mouse). Rat and HCV to follow in the next release.
  6. Added updated GlyConnect data (additional O-GlcNAc sites).
  7. Added glycosylation sites through text mining (first iteration).

Version 1.5.18

Main article: Data release notes/1.5.18

Related API version: 1.5.26

This version was released on April 15, 2020

  1. Added UniProtKB Gene synonyms (search and details)
  2. Added RefSeq Gene names and synonyms
  3. Added RefSeq Protein synonyms
  4. Added UniProtKB Protein synonyms
  5. Updated the FASTA headers of the protein sequences, which now resemble the FASTA headers of UniProtKB sequences
  6. Updated BioMuta data with the addition of comments that show which filters were passed
  7. Created new datasets: mutation literature mining dataset, dbSNP somatic and germline mutations datasets.
  8. Added GlyGen to Pharos Xref in the protein detail cross-reference section
  9. Added HCV 1a and 1b, SARS-CoV1 and 2 proteomes
  10. Added the MIM disease name where DO names were not available
  11. Updated glycan species annotations.
  12. Added Human, Mouse, Rat glycosylation data from GlyConnect.
  13. Added HCV1a glycosylation data from 1 publication.
  14. Added human glycosylation data from 2 publications.
  15. Added glycan x-refs to MatrixDB, GlycoEpitope
  16. Added MatrixDB Protein-GAGs interaction data (at GlyGen Data).
  17. Included GlyTouCan-composition accessions.
  18. Added protein x-refs to GlycoProtDB
  19. Retired GlycO and GlycomeDB xrefs.
  20. Added SNFG glycans (at GlyGen Data)
  21. Added animated GIF and .mp4 video of 3D model of SARS-CoV-2 spike glycoprotein (at GlyGen Data).
  22. Updated the synthesized glycan list from Dr. Boons' group.
  23. Included 2 additional FAQs: "How do I find a GlyTouCan Boons accession for my glycan composition?" and "How can I convert my glycan sequence to different formats (e.g. IUPAC, WURCS, GlycoCT, LinearCode, etc.)?"

Version 1.0

Main article: Data release notes/1.0

This version was released on Nov 22, 2019

  • Isoform Alignment.
  • Homolog Alignment.
  • New use case added to the quick search.
  • Composition Search.
  • GO ID search.
  • PMID search.
  • Batch search on advanced protein and glycan search page.
  • Multi-select option for amino acids on advanced glycoprotein search.
  • Multi-select option for organisms on advanced glycan search.
  • Integrate subsumption browser.

External links

https://data.glygen.org/history