GlyGen Wikipedia: Difference between revisions

GlyGen
Content
Description	GlyGen is the Computational and Informatics Resources for Glycoscience.
Data types captured	Glycans, Proteins, and Glycoproteins.
Organisms	Homo sapiens, Mus musculus, and Rattus norvegicus.
Contact
Primary citation	GlyGen announcement.
Access
Data format	FASTA, JSON.
Website	www.glygen.org
Web service URL	Yes – Python API
Miscellaneous
License	Creative Commons Attribution 4.0 International (data); GNU General Public License v3 (source code)
Versioning	Yes
Data release frequency	Portal: 12 weeks; Data: 12 weeks
Version	1.4 (16 Sep 2019)
Curation policy	Yes – manual and automatic. Rules for automatic annotation generated by database curators and computational algorithms.
Bookmarkable entities	Yes – individual protein and glycan entries and search results.


Latest revision as of 14:03, 10 December 2019

GlyGen is a database for glycans, glycoconjugates and related gene, protein and other molecular biology information. GlyGen retrieves information from multiple international data sources such as PDB, RefSeq, and UniProt, and integrates and harmonizes content to allow unique searches that cannot be executed in any of the integrated databases alone.

Organization

The GlyGen project is an international multi-institutional effort led by the University of Georgia (UGA) and the George Washington University (GW). The two institutions collaborate on the development of the GlyGen portal: UGA is responsible for the front-end web development and GW for the back-end database. GW is also responsible for data retrieval and data integration. To this end, GW works with the international GlyGen collaborators, including the European Bioinformatics Institute (EMBL-EBI), the National Center for Biotechnology Information (NCBI), Georgetown University, Soka University, and Griffith University (Institute for Glycomics).

Integrated databases

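Currently GlyGen integrates data from the following publicly available databases:

BioXpress
BioMuta
Disease Ontology
GlyTouCan
Mouse Genome Database (MGI)
NCBI PubChem
NCBI PubMed
NCBI RefSeq
NCBI Taxonomy
Orthologous MAtrix (OMA)
The Protein Ontology (PRO)
RCSB The Protein Data Bank (PDB)
The Monarch Initiative
UniCarbKB
UniProtKB The UniProt Knowledgebase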

Content and features

GlyGen is a data integration and dissemination project for carbohydrate and glycoconjugate related data. GlyGen retrieves information from multiple international data sources and integrates and harmonizes this data. The GlyGen web portal allows exploration of this data and execution of unique searches that cannot be performed using any of the integrated databases in isolation.

Data Integration - Data from the different resources are accessed and downloaded in resource-specific formats (e.g. RDF, FASTA, CSV).
Data Collection - Data integration with intensive data quality control. Metadata is captured using the BioCompute Object schema.
Quick Search - Complex multi-domain search queries can be performed using the quick searches which are based on user requests.
Explore Searches - GlyGen provides users with Glycan, Protein, Glycoprotein searches via simple or advanced search options.
Data Visualization - Ability to visualize GlyGen data statistics via charts, bars, and diagrams. GlyGen integrates human, mouse, and rat proteins, glycans, and glycoproteins.
Resources - A library of glycobiology resources, including databases, tools, learning material and tutorials, is provided.
SPARQL Endpoint - All datasets are also RDFized using standard ontologies (e.g. UniProt RDF schema, GlycoCoO, FALDO) and made available via a public endpoint.
Feedback - An integrated feedback system allows users to submit comments and suggestions on every web page.
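The Data Integration and Data Collection items above can be pictured with a small sketch: records arrive in resource-specific formats (here FASTA and CSV), are merged on a shared accession, and pass a simple quality check. This is only an illustration under stated assumptions; the accessions, column names (`accession`, `position`), record layout and the range check are hypothetical and do not describe GlyGen's actual pipeline or data.

```python
import csv
import io


def parse_fasta(text):
    """Return {accession: sequence} from FASTA text.

    Assumes the accession is the first whitespace-delimited token of the header.
    """
    records = {}
    accession = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            accession = line[1:].split()[0]
            records[accession] = ""
        elif accession is not None:
            records[accession] += line
    return records


def parse_site_csv(text):
    """Return {accession: [positions]} from CSV with hypothetical columns accession,position."""
    sites = {}
    for row in csv.DictReader(io.StringIO(text)):
        sites.setdefault(row["accession"], []).append(int(row["position"]))
    return sites


def integrate(fasta_text, csv_text):
    """Merge both sources on accession and drop sites outside the sequence."""
    sequences = parse_fasta(fasta_text)
    sites = parse_site_csv(csv_text)
    merged = {}
    for accession, sequence in sequences.items():
        positions = sorted(sites.get(accession, []))
        # Simple quality control: keep only 1-based positions inside the sequence.
        valid = [p for p in positions if 1 <= p <= len(sequence)]
        merged[accession] = {"sequence": sequence, "length": len(sequence), "sites": valid}
    return merged


# Hypothetical toy inputs, not real GlyGen records.
FASTA = ">PROT1 example protein\nMKNAST\nLLNGTA\n>PROT2 another example\nMAAA\n"
CSV = "accession,position\nPROT1,3\nPROT1,9\nPROT1,40\nPROT2,2\n"

result = integrate(FASTA, CSV)
for accession, record in result.items():
    print(accession, record["length"], record["sites"])
```

Keying every source on one accession is what allows a single query to span data that originally lived in separate resources.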

Availability

GlyGen applies the Creative Commons Attribution 4.0 International (CC BY 4.0) license to all its datasets. This allows users to copy, distribute, display and make commercial use of the data in all jurisdictions, provided they give credit to GlyGen. The source code of the project is released under the GNU General Public License v3 and is available in the GlyGen GitHub repository. GlyGen data is available free of charge and is accessible via the GlyGen GitHub repository, Portal, Data, API, and SPARQL.
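As a rough illustration of programmatic access through SPARQL, the sketch below builds a query request with the Python standard library. The `query` parameter and the `application/sparql-results+json` media type come from the SPARQL 1.1 Protocol; that the endpoint accepts GET requests at the root of https://sparql.glygen.org/ is an assumption, and the generic triple pattern is a placeholder, not a query against GlyGen's actual schema.

```python
import json
import urllib.parse
import urllib.request

# URL taken from the External links; the exact query path is an assumption.
SPARQL_ENDPOINT = "https://sparql.glygen.org/"

# Placeholder query: any five triples, independent of the endpoint's schema.
EXAMPLE_QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 5
"""


def build_sparql_request(query, endpoint=SPARQL_ENDPOINT):
    """Build a SPARQL 1.1 Protocol GET request without sending it."""
    params = urllib.parse.urlencode({"query": query.strip()})
    return urllib.request.Request(
        endpoint + "?" + params,
        headers={"Accept": "application/sparql-results+json"},
    )


def run_query(query, endpoint=SPARQL_ENDPOINT, timeout=30):
    """Send the request and decode the JSON results (requires network access)."""
    request = build_sparql_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Only the request is built here; call run_query(EXAMPLE_QUERY) to execute it.
request = build_sparql_request(EXAMPLE_QUERY)
print(request.full_url)
```

Separating request construction from execution keeps the example inspectable offline; whether the call succeeds depends on the endpoint's current configuration.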

Funding

GlyGen is an international project funded by the National Institutes of Health (NIH) to facilitate glycoscience research by integrating diverse kinds of information, including glycomics, genomics, proteomics (and glycoproteomics), cell biology, developmental biology and biochemistry. GlyGen is supported by the NIH Glycoscience Common Fund Program, managed by the Office of Strategic Coordination at the NIH, under grant 1U01GM125267-01.

References

↑ GlyGen, Article. (October 2019). "GlyGen: Computational and Informatics Resources for Glycoscience". Glycobiology. 1 (Resources for Glycoscience). doi:10.1093/glycob/cwz080. PMID 31616925.

External links

Official

Official website

Funding

Availability


@@ Line 1: / Line 1: @@
 {{infobox biodatabase
 |title = GlyGen
-|logo = [[File:Logo-glygen-blue-top.svg|220px]]
+|logo = [[File:Logo-glygen-blue-top.svg]]
 |description = '''GlyGen''' is the '''Computational and Informatics Resources for Glycoscience.'''
-|scope = Glycans, Proteins, and Glycoproteins
+|scope = [[Glycans]], [[Proteins]], and [[Glycoproteins]].
-|organism = Homo sapiens, Mus musculus, and Rattus norvegicus.
+|organism = [[Homo sapiens]], [[Mus musculus]], and [[Rattus norvegicus]].
-|center = [[European Bioinformatics Institute|EMBL-EBI]], UK; [[Swiss Institute of Bioinformatics|SIB]], Switzerland; [[Protein Information Resource|PIR]], US.
+|center =
 |laboratory =
 |author =
-|citation = UniProt Consortium<ref>{{cite journal|last1=UniProt|first1=Consortium.|title=UniProt: a hub for protein information.|journal=Nucleic Acids Research|date=January 2015|volume=43|issue=Database issue|pages=D204–12|pmid=25348405|doi=10.1093/nar/gku989|pmc=4384041}}</ref>
+|citation = GlyGen announcement.<ref>{{cite journal|last1=GlyGen|first1=Article.|title=GlyGen: Computational and Informatics Resources for Glycoscience.|journal=Glycobiology|date=October 2019|volume=1|issue=Resources for Glycoscience|pmid=31616925|doi=10.1093/glycob/cwz080}}</ref>
 |released =
 |standard =
-|format = Custom flat file, [[FASTA]], [[General feature format|GFF]], [[Resource Description Framework|RDF]], [[XML]].
+|format = [[FASTA]], [[JSON]].
-|url = {{URL|www.uniprot.org}}<br />{{URL|www.uniprot.org/news/}}
+|url = {{URL|www.glygen.org}}
-|download = {{URL|www.uniprot.org/downloads}} & for downloading complete data sets {{URL|ftp.uniprot.org}}
+|download =
-|webservice = Yes – [[Python (programming language)|PYTHON]] [[API]] see all frameworks {{URL|https://www.glygen.org/frameworks.html/|here}}
+|webservice = Yes – [[Python (programming language)|Python]] [https://api.glygen.org/ API]
 |sql =
 |sparql =
 |webapp =
 |standalone =
-|license =
+|license = [[Creative Commons]] Attribution 4.0 International (data); GNU [[General Public License]] v3 (source code)
 |versioning = Yes
-|frequency = 12 weeks
+|frequency = '''Portal:''' 12 weeks '''Data:''' 12 weeks
 |curation = Yes – manual and automatic. Rules for automatic annotation generated by database curators and computational algorithms.
-|bookmark =
+|bookmark = Yes – individual protein and glycan entries and search results.
-|version =
+|version = 1.4 (16 Sep 2019)
 }}
+'''GlyGen''' is a database for [[glycans]], [[glycoconjugates]] and related [[gene]], [[protein]] and other [[molecular biology]] information. GlyGen retrieves information from multiple international data sources such as [[PDB]], [[RefSeq]], and [[UniProt]], and integrates and harmonizes content to allow unique searches that cannot be executed in any of the integrated databases alone.
-'''GlyGen''' is a data integration and dissemination project for carbohydrate and glycoconjugate related data. GlyGen retrieves information from multiple international data sources and integrates and harmonizes this data. This web portal allows exploring this data and performing unique searches that cannot be executed in any of the integrated databases alone.
+==Organization==
+The GlyGen project is an international multi-institutional effort led by the [[University of Georgia]] (UGA) and the [[George Washington University]] (GW). The two institutions collaborate on the development of the GlyGen portal: UGA is responsible for the [[front-end web development]] and GW for the [[back-end database]]. GW is also responsible for data retrieval and data integration. To this end, GW works with the international GlyGen collaborators, including the [[European Bioinformatics Institute]] (EMBL-EBI), the [[National Center for Biotechnology Information]] (NCBI), [[Georgetown University]], [[Soka University]], and [[Griffith University]] (Institute for Glycomics).
-==About==
+==Integrated databases==
-'''GlyGen''' is a Computational and Informatics Resources for Glycoscience. GlyGen retrieves information from multiple international data sources and integrates and harmonizes this data. GlyGen allows for exploring these data by performing unique searches that cannot be executed in any of the existing databases alone.
+Currently GlyGen integrates data from the following publicly available databases:
+* [[BioXpress]]
+* [[BioMuta]]
+* [[Disease Ontology]]
+* [[GlyTouCan]]
+* [[Mouse Genome Database]] ([[MGI]])
+* [[PubChem|NCBI PubChem]]
+* [[PubMed|NCBI PubMed]]
+* [[RefSeq|NCBI RefSeq]]
+* [[Taxonomy_(biology)|NCBI Taxonomy]]
+* [[Orthologous MAtrix]] (OMA)
+* The [[Protein Ontology]] ([[PRO]])
+* RCSB The [[Protein Data Bank]] (PDB)
+* [[The Monarch Initiative]]
+* [[UniCarbKB]]
+* [[UniProtKB]] The [[UniProt Knowledgebase]]
-'''The GlyGen mission''' is to provide computational and informatics resources and tools for glycosciences research.
+== Content and features ==
-Integrate data and knowledge from diverse disciplines relevant to glycobiology. Address needs inside and outside the glycoscience community.
+GlyGen is a data integration and dissemination project for carbohydrate and glycoconjugate related data. GlyGen retrieves information from multiple international data sources and integrates and harmonizes this data. The GlyGen web portal allows exploration of this data and execution of unique searches that cannot be performed using any of the integrated databases in isolation.
+* ''Data Integration'' - Data from the different resources are accessed and downloaded in resource-specific formats (e.g. RDF, FASTA, CSV).
+* ''Data Collection'' - Data integration with intensive data quality control. Metadata is captured using the BioCompute Object schema.
+* ''Quick Search'' - Complex multi-domain search queries can be performed using the quick searches which are based on user requests.
+* ''Explore Searches'' - GlyGen provides users with Glycan, Protein, Glycoprotein searches via simple or advanced search options.
+* ''Data Visualization'' - Ability to visualize GlyGen data statistics via charts, bars, and diagrams. GlyGen integrates human, mouse, and rat proteins, glycans, and glycoproteins.
+* ''Resources'' - A library of glycobiology resources, including databases, tools, learning material and tutorials, is provided.
+* ''SPARQL Endpoint'' - All datasets are also RDFized using standard ontologies (e.g. UniProt RDF schema, GlycoCoO, FALDO) and made available via our public endpoint.
+* ''Feedback'' - Our integrated feedback system allows users to submit comments and suggestions on every web page.
-'''The major goal of GlyGen''' is to develop an integrated, extendable, and cross-disciplinary resource providing tools and data to address specific questions in glycoscience. Currently, these questions can be answered only by extensive literature-based research and/or manual collection of data from disparate databases and websites. The GlyGen project is built using insight gained during workshops that evaluated existing resources and identified pressing community needs.
+== Availability ==
+GlyGen applies the [[Creative Commons]] Attribution 4.0 International ([[CC BY 4.0]]) license to all its datasets. This allows users to copy, distribute, display and make commercial use of the data in all jurisdictions, provided they give credit to GlyGen.
-'''The GlyGen Effort.''' GlyGen is a cooperative, global, community-driven project. An open, standardized environment for independent development and integration of additional research tools by other investigators. More than 15 investigators in four countries play key roles in the project. Two years of organized discussion and planning involving nearly 100 investigators.
+The source code of the project is released under the GNU [[General Public License]] v3 and is available in the [[GlyGen Wikipedia#External links|GlyGen GitHub repository]].
+GlyGen data is available free of charge and is accessible via the [[GlyGen Wikipedia#External links|GlyGen GitHub repository]], [[GlyGen Wikipedia#External links|Portal]], [[GlyGen Wikipedia#External links|Data]], [[GlyGen Wikipedia#External links|API]], and [[GlyGen Wikipedia#External links|SPARQL]].
-'''GlyGen as the Resource.''' Ongoing technical advances are accelerating the pace and sophistication of glycoscience data acquisition, the transformation of data to glycobiology knowledge, and insight.
-Understanding is compromised by the lack of glycoinformatics databases and tools to combine information from related disciplines.
-The functional and biomedical interpretation of glycobiology data is slowed by our limited ability to integrate it with biological knowledge from diverse disciplines.
-GlyGen addresses these needs as a broadly relevant and sustainable glycoinformatics resource that provides a roadmap to explore data from diverse domains in the context of glycoscience.
 ==Funding==
-GlyGen is an international project funded by The National Institutes of Health to facilitate glycoscience research by integrating diverse kinds of information, including glycomics, genomics, proteomics (and glycoproteomics), cell biology, developmental biology and biochemistry. GlyGen is supported and funded by the [[GlyGen Wikipedia#External links|NIH Glycoscience Common Fund Program]] managed by the [[GlyGen Wikipedia#External links|Office of Strategic Coordination]] at [[GlyGen Wikipedia#External links|National Institute of Health (NIH)]] under the grant [[GlyGen Wikipedia#External links|1U01GM125267-01]].
+GlyGen is an international project funded by the [[National Institutes of Health]] (NIH) to facilitate glycoscience research by integrating diverse kinds of information, including [[glycomics]], [[genomics]], [[proteomics]] (and [[glycoproteomics]]), [[cell biology]], [[developmental biology]] and [[biochemistry]]. GlyGen is supported by the [[GlyGen Wikipedia#External links|NIH Glycoscience Common Fund Program]], managed by the [[GlyGen Wikipedia#External links|Office of Strategic Coordination]] at the NIH, under grant [[GlyGen Wikipedia#External links|1U01GM125267-01]].
-==Databasess==
-We have chosen to apply the [[GlyGen Wikipedia#External links|Creative Commons Attribution 4.0 International (CC BY 4.0)]] license to all our database sets. This means that you are free to copy, distribute, display and make commercial use of these databases in all legislations, provided you give us credit.
-The source code of the project is released under [[GlyGen Wikipedia#External links|GNU General Public License v3]] and is available in our [[GlyGen Wikipedia#External links|GlyGen GitHub]] repository.
-Below are some of the databases we integrate data from, along with their license information.
-{| class="wikitable"
-|-
-! Databases !! Description !! License Type
-|-
-| [https://hive.biochemistry.gwu.edu/bioxpress BioXpress] || BioXpress is a gene/miRNA expression and disease association database with expression levels mapped to genes or miRNAs. || [https://creativecommons.org/licenses/by/4.0/ CC BY 4.0]
-|-
-| [https://hive.biochemistry.gwu.edu/bioxpress BioXpress] || BioXpress is a gene/miRNA expression and disease association database with expression levels mapped to genes or miRNAs. || [https://creativecommons.org/licenses/by/4.0/ CC BY 4.0]
-|-
-|}
-== Resources ==
-A list of publicly available databases, repositories and knowledgebases providing glycan-related information.
-{| class="wikitable"
+== See also ==
-|-
+{{See also|proteomics|glycomics|data warehouse|sparql}}
-! Category !! Website !! Description !! Publications
-|-
-| Database || [http://www.cazy.org/ CAZy] || The CAZy database describes the families of structurally-related catalytic and carbohydrate-binding modules (or functional domains) of enzymes that degrade, modify, or create glycosidic bonds. ||
-|-
-| Database || [http://csdb.glycoscience.ru/ CSDB: Carbohydrate Structure Database] || A set of manually curated carbohydrate structure databases with associated metadata (taxonomic annotation, bibliography, and NMR data). There are subdatabases focusing on bacterial carbohydrates, plant and fungal carbohydrates as well as a database for glycosyltransferases. || Yes
-|-
-| Database || [https://www.glycostore.org/ GlycoStore] || GlycoStore is a curated chromatographic, electrophoretic and mass-spectrometry composition database of N-, O-, glycosphingolipid (GSL) glycans and free oligosaccharides associated with a range of glycoproteins, glycolipids and biotherapeutics. This database associates glycan structures with standardized retention times (UPLC, HILIC-UPLC, CE-LIF, HILIC-HPLC, PGC-LC-ESI-MS/MS). || Yes
-|-
-| Database || [https://ccr2.cancer.gov/resources/Cbl/Tools/Antibody/Default.aspx DAGR] || Database of Anti-Glycan Reagents (DAGR) is a publicly available, comprehensive resource for anticarbohydrate antibodies, their applications, availability, and quality. DAGR allows to search and identify antibodies and reagent lectins to various carbohydrates as well as to obtain information about those antibodies. In addition, users can add new antibodies/lectins to the database or information about them.|| Yes
-|-
-| Knowledgebase || [https://www.cazypedia.org/index.php/Main_Page CAZypedia] || CAZypedia is a community-driven resource to assemble a comprehensive encyclopedia of the carbohydrate-active enzymes and associated carbohydrate-binding modules involved in the synthesis and degradation of complex carbohydrates. CAZypedia is closely connected with, the actively curated CAZy Database. ||
-|-
-| Knowledgebase || [http://www.unicarbkb.org/ UniCarbKB] || UniCarbKB is a curated knowledgebase of glycosylated proteins, the attached glycan structures and if known, the glycosylation site. || Yes
-|-
-| Knowledgebase || [https://jcggdb.jp/GlycoPOD/protocolListShow GlycoPOD]|| GlycoPOD is the glycosciences protocol database. This resource contains an extensive and curated list of experimental protocols and detailed instructions on the execution of these protocols as well as the necessary reagents and hardware. || Yes
-|-
-| Repository || [https://glytoucan.org/ GlyTouCan]|| GlyTouCan is the international glycan structure repository. This repository is the uncurated registry for glycan structures that assigns globally unique accession numbers to any glycan independent of the level of information provided by the experimental method used to identify the structure.|| Yes
-|-
-|}
+==References==
+{{reflist}}
 ==External links==
 ;Official
-*[http://glygen.org/ Official website]
+*[https://www.glygen.org/ Official website]
 ;Funding
 *[https://commonfund.nih.gov/glycoscience NIH Glycoscience Common Fund Program]
 *[https://commonfund.nih.gov/about/osc Office of Strategic Coordination]
-*[https://www.nih.gov/| National Institute of Health (NIH)]
 *[https://projectreporter.nih.gov/project_info_details.cfm?aid=9391499&icde=0 Grant 1U01GM125267-01]
-;Databases
+;Availability
-*[https://creativecommons.org/licenses/by/4.0/| Creative Commons Attribution 4.0 International (CC BY 4.0)]
+*[https://github.com/glygener/ GlyGen GitHub repository]
-*[https://www.gnu.org/licenses/gpl-3.0.en.html| GNU General Public License v3]
+*[https://www.glygen.org/ Portal]
-*[https://github.com/glygener| GlyGen GitHub repository]
+*[https://data.glygen.org/ Data]
+*[https://sparql.glygen.org/ SPARQL]
+*[https://api.glygen.org/ API]
+{{Bioinformatics}}
+<!-- Categories -->
+[[Category:Biological databases]]
+[[Category:Online databases]]
+[[Category:Proteomics]]
+[[Category:Glycomics]]
