Protein details/Sequence

From GlyGen Wiki
Revision as of 16:36, 6 December 2021 by Jmurrow

Function

This section contains the following information:

  • Sequence: The protein sequence in FASTA format from the UniProtKB database. Selecting an annotation, such as N-linked Sites, Sequon, or Phosphorylation, highlights the corresponding residues in the sequence. The sequence can also be opened in the ProtVista tool, which displays various annotations alongside it.
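
As an illustration of the kind of highlighting described above, the following is a minimal Python sketch that reads a FASTA record and marks N-linked sequons. It is not GlyGen's implementation. It assumes the commonly used sequon motif N-X-S/T, where X is any residue except proline; the function names `parse_fasta`, `find_sequons`, and `highlight`, and the example record, are hypothetical.

```python
import re

def parse_fasta(text):
    """Parse FASTA text into a dict mapping header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}

# Assumed motif: Asn, then any residue except Pro, then Ser or Thr.
# The lookahead keeps overlapping sequons (e.g. "NNTS") from being skipped.
SEQUON = re.compile(r"N(?=[^P][ST])")

def find_sequons(seq):
    """Return 1-based positions of the Asn residue of each sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]

def highlight(seq, positions):
    """Wrap the residues at the given 1-based positions in brackets."""
    marked = set(positions)
    return "".join(
        f"[{aa}]" if i in marked else aa
        for i, aa in enumerate(seq, start=1)
    )

# Hypothetical record, made up for illustration only.
example = ">hypothetical_protein\nMKNGSA\nNPTNNTS\n"
for name, seq in parse_fasta(example).items():
    print(name, highlight(seq, find_sequons(seq)))
```

Positions are reported as 1-based residue numbers, which is the usual convention for protein annotations.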

Source of information

The sequence data is collected and integrated from the UniProtKB, Glycam, and GlyTouCan databases.

  • UniProtKB - protein FASTA sequences gathered from the UniProtKB database.
  • Glycam - Glycan sequences in Glycam IUPAC format for associated glycans.
  • GlyTouCan - Glycan sequences in IUPAC extended format for associated glycans (GlyTouCan Accessions).

Data Access

The collected data is processed and stored at data.glygen.org in the following datasets:

Homo sapiens (Human) Datasets

  • Human Protein Canonical sequences (UniProtKB; GLY_000002)
  • Human Protein Isoform sequences (UniProtKB; GLY_000053)
  • Human Protein Sequence Info (UniProtKB; GLY_000398)

Hepatitis C Virus Datasets

  • HCV1a Protein Canonical sequences (UniProtKB; GLY_000346)
  • HCV1b Protein Canonical sequences (UniProtKB; GLY_000347)
  • HCV1a Protein Isoform Sequences (UniProtKB; GLY_000348)
  • HCV1b Protein Isoform Sequences (UniProtKB; GLY_000349)
  • HCV1a Protein Sequence Info (UniProtKB; GLY_000350)
  • HCV1b Protein Sequence Info (UniProtKB; GLY_000351)

Mus musculus (Mouse) Datasets

  • Mouse Protein Canonical sequences (UniProtKB; GLY_000012)
  • Mouse Protein Isoform sequences (UniProtKB; GLY_000053)
  • Mouse Protein Sequence Info (UniProtKB; GLY_000399)

Rattus norvegicus (Rat) Datasets

  • Rat Protein Canonical sequences (UniProtKB; GLY_000240)
  • Rat Protein Isoform sequences (UniProtKB; GLY_000255)
  • Rat Protein Sequence Info (UniProtKB; GLY_000400)

SARS Coronavirus Datasets

  • SARS-CoV1 Protein Isoform sequences (UniProtKB; GLY_000411)
  • SARS-CoV2 Protein Isoform sequences (UniProtKB; GLY_000412)
  • SARS-CoV1 Protein Canonical sequences (UniProtKB; GLY_000415)
  • SARS-CoV2 Protein Canonical sequences (UniProtKB; GLY_000416)
  • SARS-CoV1 Protein Sequence Info (UniProtKB; GLY_000400)
  • SARS-CoV2 Protein Sequence Info (UniProtKB; GLY_000441)


Glycan Sequence Datasets

  • Glycan Sequences Glycam IUPAC (Glycam; GLY_000287)
  • Glycan Sequences GlycoCT (GlyTouCan; GLY_000288)
  • Glycan Sequences InChI (GlyTouCan, PubChem; GLY_000289)
  • Glycan Sequences IUPAC Extended (GlyTouCan; GLY_000290)
  • Glycan Sequences SMILES Isomeric (GlyTouCan, PubChem; GLY_000291)
  • Glycan Sequences WURCS (GlyTouCan; GLY_000292)
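
The dataset entries listed on this page follow a consistent "name (source; accession)" pattern, so they can be turned into structured records when scripting against the list. The sketch below is one possible way to do that in Python. It assumes accessions always have the form `GLY_` followed by six digits, as in the entries above; the function name `parse_entry` is hypothetical.

```python
import re

# Assumed entry layout: "<name> (<source>[, <source>...]; GLY_######)",
# optionally preceded by a bullet character.
ENTRY = re.compile(
    r"^\s*(?:•\s*)?(?P<name>.+?)\s*"
    r"\((?P<sources>[^;()]+);\s*(?P<accession>GLY_\d{6})\)\s*$"
)

def parse_entry(line):
    """Return a dict for a dataset entry line, or None if it does not match."""
    m = ENTRY.match(line)
    if m is None:
        return None
    sources = [s.strip() for s in m.group("sources").split(",")]
    return {
        "name": m.group("name"),
        "sources": sources,
        "accession": m.group("accession"),
    }

lines = [
    "Glycan Sequence Datasets",
    "  • Glycan Sequences InChI (GlyTouCan, PubChem; GLY_000289)",
    "  • Glycan Sequences WURCS (GlyTouCan; GLY_000292)",
]
# Heading lines do not match the pattern and are skipped.
records = [r for r in map(parse_entry, lines) if r is not None]
for record in records:
    print(record["accession"], record["name"], record["sources"])
```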