Protein details/Sequence

From GlyGen Wiki

The Sequence section of the Protein details page in GlyGen displays the canonical protein sequence. When an annotation type is selected, the section highlights the corresponding positions in the sequence, for example N-linked glycosylation sites, sequons, or phosphorylation sites.
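This page does not describe how GlyGen renders the highlighting. The following is only a minimal sketch of the idea, assuming an annotation is available as a set of 1-based residue positions. The function name `highlight` and the bracket marker are hypothetical choices for illustration.

```python
def highlight(sequence: str, positions: set, marker: str = "[]") -> str:
    """Wrap every residue at a 1-based position in `positions` with the marker pair.

    Hypothetical helper for illustration; not GlyGen's implementation.
    """
    open_m, close_m = marker[0], marker[1]
    out = []
    # enumerate from 1 so positions match the usual 1-based residue numbering
    for i, residue in enumerate(sequence, start=1):
        if i in positions:
            out.append(f"{open_m}{residue}{close_m}")
        else:
            out.append(residue)
    return "".join(out)


print(highlight("MNGSANPTNAT", {2, 9}))  # M[N]GSANPT[N]AT
```

Keeping the annotation as a plain set of positions means the same rendering step can serve any annotation type (N-linked sites, phosphorylation, and so on).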

Sequence

This section contains the following information:

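  • Sequence - The canonical protein sequence, in FASTA format, from the UniProtKB database
  • N-Linked Sites - Highlights N-linked glycosylation sites, where a glycan is attached to an asparagine (N) residue
  • O-Linked Sites - Highlights O-linked glycosylation sites, where a glycan is attached to the hydroxyl group of a residue such as serine (S) or threonine (T)
  • Variation from mutation - Highlights positions where the sequence varies because of a mutation
  • Sequon - Highlights N-glycosylation sequons, the N-X-S/T motif in which X is any amino acid except proline
  • Phosphorylation - Highlights phosphorylation sites
  • Glycation - Highlights glycation sites, where a sugar is attached non-enzymatically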
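The Sequon option highlights N-glycosylation sequons. The usual consensus motif is N-X-S/T, where X is any amino acid except proline. The sketch below locates candidate sequons with a regular expression. It is an illustration of the motif, not GlyGen's own code: the name `find_sequons` is hypothetical, and it ignores refinements such as excluding a proline that follows the S/T.

```python
import re

# N, then any residue except proline, then S or T. The pattern is wrapped in a
# zero-width lookahead so overlapping sequons (e.g. "NNST") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> list:
    """Return the 1-based position of the N in every N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


print(find_sequons("MNGSANPTNAT"))  # [2, 9]  (N6 is skipped: it is followed by P)
print(find_sequons("NNST"))         # [1, 2]
```

A sequon only marks a position where N-linked glycosylation is possible. Whether a site is actually glycosylated is a separate annotation.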

Source of information

The sequence data is collected from the UniProtKB, Glycam, and GlyTouCan databases and integrated.

  • UniProtKB - Protein sequences in FASTA format from the UniProtKB database.
  • Glycam - Glycan sequences in Glycam IUPAC format for associated glycans.
  • GlyTouCan - Glycan sequences in IUPAC extended format for associated glycans (GlyTouCan accessions).
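Protein sequences from UniProtKB come as FASTA records: a header line starting with ">" followed by one or more sequence lines. The sketch below reads such text into a dictionary keyed by accession. It assumes UniProtKB-style headers of the form `sp|ACCESSION|ENTRY_NAME ...` and falls back to the first header token otherwise. The function name `parse_fasta` and the example records are made up for illustration.

```python
def parse_fasta(text: str) -> dict:
    """Map accession -> sequence for every record in a FASTA string (illustrative sketch)."""
    records = {}
    accession = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if accession is not None:
                records[accession] = "".join(chunks)
            header = line[1:].split()[0]
            # UniProtKB headers look like "sp|ACCESSION|ENTRY_NAME"; take the middle field.
            parts = header.split("|")
            accession = parts[1] if len(parts) >= 3 else header
            chunks = []
        else:
            chunks.append(line)
    if accession is not None:
        records[accession] = "".join(chunks)
    return records


example = """>sp|EXAMPLE1|EX1_HUMAN Made-up protein OS=Homo sapiens
MNGSA
NPTNAT
>EXAMPLE2
ACDE
"""
print(parse_fasta(example))  # {'EXAMPLE1': 'MNGSANPTNAT', 'EXAMPLE2': 'ACDE'}
```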

Data access

The collected data is processed and stored at data.glygen.org in the following datasets:

Homo sapiens (Human) Datasets

  • Human Protein Canonical sequences (UniProtKB; GLY_000002)
  • Human Protein Isoform sequences (UniProtKB; GLY_000053)
  • Human Protein Sequence Info (UniProtKB; GLY_000398)
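The canonical and isoform datasets are linked through UniProtKB's accession convention, in which an isoform is written as the base accession plus a hyphenated number (for example P14210-2). As a minimal sketch, assuming the dataset files use these identifiers, isoform accessions can be grouped under their base accession. The function name `group_isoforms` is hypothetical, and the accession strings below are illustrative inputs only.

```python
from collections import defaultdict


def group_isoforms(accessions: list) -> dict:
    """Group UniProtKB accessions such as "P14210-2" by their base accession."""
    groups = defaultdict(list)
    for acc in accessions:
        # Everything before the first hyphen is the base accession;
        # an accession without a suffix maps to itself.
        base = acc.split("-", 1)[0]
        groups[base].append(acc)
    return dict(groups)


print(group_isoforms(["P14210-1", "P14210-2", "P02768-1"]))
# {'P14210': ['P14210-1', 'P14210-2'], 'P02768': ['P02768-1']}
```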

Hepatitis C Virus Datasets

  • HCV1a Protein Canonical sequences (UniProtKB; GLY_000346)
  • HCV1b Protein Canonical sequences (UniProtKB; GLY_000347)
  • HCV1a Protein Isoform sequences (UniProtKB; GLY_000348)
  • HCV1b Protein Isoform sequences (UniProtKB; GLY_000349)
  • HCV1a Protein Sequence Info (UniProtKB; GLY_000350)
  • HCV1b Protein Sequence Info (UniProtKB; GLY_000351)

Mus musculus (Mouse) Datasets

  • Mouse Protein Canonical sequences (UniProtKB; GLY_000012)
  • Mouse Protein Isoform sequences (UniProtKB; GLY_000053)
  • Mouse Protein Sequence Info (UniProtKB; GLY_000399)

Rattus norvegicus (Rat) Datasets

  • Rat Protein Canonical sequences (UniProtKB; GLY_000240)
  • Rat Protein Isoform sequences (UniProtKB; GLY_000255)
  • Rat Protein Sequence Info (UniProtKB; GLY_000400)

SARS Coronavirus Datasets

  • SARS-CoV1 Protein Isoform sequences (UniProtKB; GLY_000411)
  • SARS-CoV2 Protein Isoform sequences (UniProtKB; GLY_000412)
  • SARS-CoV1 Protein Canonical sequences (UniProtKB; GLY_000415)
  • SARS-CoV2 Protein Canonical sequences (UniProtKB; GLY_000416)
  • SARS-CoV1 Protein Sequence Info (UniProtKB; GLY_000400)
  • SARS-CoV2 Protein Sequence Info (UniProtKB; GLY_000441)

Glycan Sequence Datasets

  • Glycan Sequences Glycam IUPAC (Glycam; GLY_000287)
  • Glycan Sequences GlycoCT (GlyTouCan; GLY_000288)
  • Glycan Sequences InChI (GlyTouCan, PubChem; GLY_000289)
  • Glycan Sequences IUPAC Extended (GlyTouCan; GLY_000290)
  • Glycan Sequences SMILES Isomeric (GlyTouCan, PubChem; GLY_000291)
  • Glycan Sequences WURCS (GlyTouCan; GLY_000292)
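The glycan datasets store the same structures in several notations. Three of them can be recognised from the start of the string: WURCS sequences begin with "WURCS=", InChI strings begin with "InChI=", and GlycoCT (condensed) begins with a "RES" section. The sketch below is a heuristic built only on those prefixes. It deliberately returns None for the IUPAC variants and SMILES, which need more context to tell apart. The function name `guess_glycan_format` is hypothetical, and the example strings are shortened for illustration.

```python
from typing import Optional


def guess_glycan_format(seq: str) -> Optional[str]:
    """Heuristically name the notation of a glycan sequence string.

    Only formats with an unambiguous prefix are detected; anything else
    (Glycam IUPAC, IUPAC extended, SMILES) returns None.
    """
    s = seq.lstrip()
    if s.startswith("WURCS="):
        return "WURCS"
    if s.startswith("InChI="):
        return "InChI"
    if s.startswith("RES"):
        return "GlycoCT"
    return None


print(guess_glycan_format("WURCS=2.0/1,1,0/[a2122h-1b_1-5]/1/"))  # WURCS
print(guess_glycan_format("Gal(b1-4)GlcNAc"))                     # None
```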

Data harmonization

Data filtering