GlyGen Wikipedia

From GlyGen Wiki
Revision as of 18:24, 14 October 2019 by Tvwillia
GlyGen
Content
Description: GlyGen is the Computational and Informatics Resources for Glycoscience.
Data types captured: Glycans, proteins, and glycoproteins
Organisms: Homo sapiens, Mus musculus, and Rattus norvegicus
Access
Data format: FASTA, JSON
Website: www.glygen.org
Web service URL: Yes (Python API; other frameworks are also available)
Miscellaneous
Versioning: Yes
Data release frequency: Every 12 weeks
Curation policy: Yes, manual and automatic. Rules for automatic annotation are generated by database curators and computational algorithms.

GlyGen is a data integration and dissemination project for carbohydrate- and glycoconjugate-related data. GlyGen retrieves information from multiple international data sources and integrates and harmonizes these data. The GlyGen web portal allows users to explore these data and to perform unique searches that cannot be executed in any of the integrated databases alone.

About

GlyGen (Computational and Informatics Resources for Glycoscience) retrieves information from multiple international data sources, integrates and harmonizes these data, and allows users to explore them through unique searches that cannot be executed in any of the existing databases alone.

The GlyGen mission is to provide computational and informatics resources and tools for glycoscience research, to integrate data and knowledge from diverse disciplines relevant to glycobiology, and to address the needs of researchers both inside and outside the glycoscience community.

The major goal of GlyGen is to develop an integrated, extendable, and cross-disciplinary resource that provides tools and data to address specific questions in glycoscience. Currently, these questions can be answered only through extensive literature-based research and/or manual collection of data from disparate databases and websites. The GlyGen project builds on insights gained during workshops that evaluated existing resources and identified pressing community needs.

The GlyGen effort. GlyGen is a cooperative, global, community-driven project that provides an open, standardized environment in which other investigators can independently develop and integrate additional research tools. More than 15 investigators in four countries play key roles in the project, and its planning involved two years of organized discussion with nearly 100 investigators.

GlyGen as the resource. Ongoing technical advances are accelerating the pace and sophistication of glycoscience data acquisition and of the transformation of these data into glycobiology knowledge and insight. However, understanding is limited by the lack of glycoinformatics databases and tools that combine information from related disciplines, and the functional and biomedical interpretation of glycobiology data is slowed by our limited ability to integrate it with biological knowledge from diverse disciplines. GlyGen addresses these needs as a broadly relevant and sustainable glycoinformatics resource that provides a roadmap for exploring data from diverse domains in the context of glycoscience.

Funding

GlyGen is an international project funded by the National Institutes of Health (NIH) to facilitate glycoscience research by integrating diverse kinds of information, including glycomics, genomics, proteomics (and glycoproteomics), cell biology, developmental biology, and biochemistry. GlyGen is funded by the NIH Glycoscience Common Fund Program, managed by the Office of Strategic Coordination at the NIH, under grant 1U01GM125267-01.

Databases

We have chosen to apply the Creative Commons Attribution 4.0 International (CC BY 4.0) license to all our datasets. This means that you are free to copy, distribute, display, and make commercial use of these datasets in all jurisdictions, provided that you give us credit. The source code of the project is released under the GNU General Public License v3 and is available in the GlyGen GitHub repository. Below are some of the databases from which we integrate data, along with their license information.

Database | Description | License
BioXpress | BioXpress is a gene/miRNA expression and disease association database with expression levels mapped to genes or miRNAs. | CC BY 4.0
BioMuta | BioMuta is a single-nucleotide variation (SNV) and disease association database in which variations are mapped to genomes and RefSeq nucleotide entries and unified through UniProtKB/Swiss-Prot positional coordinates. | CC BY 4.0
Disease Ontology | The Disease Ontology has been developed as a standardized ontology for human disease, with the purpose of providing the biomedical community with consistent, reusable, and sustainable descriptions of human disease terms, phenotype characteristics, and related medical vocabulary disease concepts. | CC0 1.0
GlyTouCan | GlyTouCan is an international glycan repository that enables the registration, query, and linking of glycan structures through a browser or programmable interface. The service also provides a variety of methods to query and browse relationships between structures based on logic inherent in the structures themselves. | CC BY 4.0
MGI | MGI is the international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease. | Warranty Disclaimer and Copyright Notice
NCBI HomoloGene | HomoloGene is an automated system for constructing putative homology groups from the complete gene sets of a wide range of eukaryotic species. | NCBI License
NCBI PubChem | PubChem is the world's largest collection of freely accessible chemical information. Users can search chemicals by name, molecular formula, structure, and other identifiers, and find chemical and physical properties, biological activities, safety and toxicity information, patents, literature citations, and more. | NCBI License
NCBI PubMed | PubMed comprises more than 30 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full-text content from PubMed Central and publisher websites. | NCBI License
NCBI RefSeq | RefSeq is a comprehensive, integrated, non-redundant, well-annotated set of reference sequences, including genomic, transcript, and protein sequences. | NCBI License
NCBI Taxonomy | The Taxonomy Database is a curated classification and nomenclature for all of the organisms in the public sequence databases. It currently represents about 10% of the described species of life on the planet. | NCBI License
OMA Browser | The OMA ("Orthologous MAtrix") project is a method and database for the inference of orthologs among complete genomes. Its distinctive features are its broad scope and size, the high quality of its inferences, a feature-rich web interface, and the availability of data in a wide range of formats and interfaces. | CC BY-SA 2.5
PRO | The PRotein Ontology (PRO) formally defines taxon-specific and taxon-neutral protein-related entities in three major areas: proteins related by evolution, proteins produced from a given gene, and protein-containing complexes. | CC BY 4.0
RCSB PDB | The Protein Data Bank (PDB) is an archive of information about the 3D shapes of proteins, nucleic acids, and complex assemblies that helps students and researchers understand all aspects of biomedicine and agriculture, from protein synthesis to health and disease. | RCSB PDB Policies
The Monarch Initiative | The Monarch Initiative uses computational reasoning to enable phenotype comparison both within and across species, with the ultimate goal of improving biomedical research. It aims to integrate biological information using semantics and to present it in a novel way, leveraging phenotypes to bridge the knowledge gap. | CC BY 3.0
UniCarbKB | UniCarbKB is an initiative that aims to promote the creation of an online information storage and search platform for glycomics and glycobiology research. The knowledgebase will offer a freely accessible, information-rich resource supported by querying interfaces, annotation technologies, and the adoption of common standards to integrate structural, experimental, and functional data. | CC BY-NC-ND 3.0
UniProtKB | The UniProt Knowledgebase (UniProtKB) is the central hub for the collection of functional information on proteins, with accurate, consistent, and rich annotation. In addition to the core data mandatory for each UniProtKB entry (mainly the amino acid sequence, protein name or description, taxonomic data, and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications, and cross-references, as well as clear indications of annotation quality in the form of evidence attribution for experimental and computational data. | CC BY 4.0

Resources

A list of publicly available databases, repositories and knowledgebases providing glycan-related information.

Category | Website | Description | Publications
Database | CAZy | The CAZy database describes the families of structurally related catalytic and carbohydrate-binding modules (or functional domains) of enzymes that degrade, modify, or create glycosidic bonds. |
Database | CSDB: Carbohydrate Structure Database | A set of manually curated carbohydrate structure databases with associated metadata (taxonomic annotation, bibliography, and NMR data). Subdatabases focus on bacterial carbohydrates and on plant and fungal carbohydrates, and there is also a database for glycosyltransferases. | Yes
Database | GlycoStore | GlycoStore is a curated chromatographic, electrophoretic, and mass-spectrometry composition database of N-glycans, O-glycans, glycosphingolipid (GSL) glycans, and free oligosaccharides associated with a range of glycoproteins, glycolipids, and biotherapeutics. The database associates glycan structures with standardized retention times (UPLC, HILIC-UPLC, CE-LIF, HILIC-HPLC, PGC-LC-ESI-MS/MS). | Yes
Database | DAGR | The Database of Anti-Glycan Reagents (DAGR) is a publicly available, comprehensive resource for anti-carbohydrate antibodies, their applications, availability, and quality. DAGR allows users to search for and identify antibodies and lectins against various carbohydrates and to obtain information about these reagents. Users can also add new antibodies or lectins, or information about them, to the database. | Yes
Knowledgebase | CAZypedia | CAZypedia is a community-driven resource to assemble a comprehensive encyclopedia of the carbohydrate-active enzymes and associated carbohydrate-binding modules involved in the synthesis and degradation of complex carbohydrates. CAZypedia is closely connected with the actively curated CAZy database. |
Knowledgebase | UniCarbKB | UniCarbKB is a curated knowledgebase of glycosylated proteins, their attached glycan structures, and, if known, the glycosylation sites. | Yes
Knowledgebase | GlycoPOD | GlycoPOD is the glycosciences protocol database. It contains an extensive, curated list of experimental protocols with detailed instructions on their execution, as well as the necessary reagents and hardware. | Yes
Repository | GlyTouCan | GlyTouCan is the international glycan structure repository. It is an uncurated registry that assigns globally unique accession numbers to any glycan structure, independent of the level of information provided by the experimental method used to identify it. | Yes


External links

Official
Funding
Databases