Protein details/Sequence: Difference between revisions

From GlyGen Wiki
Revision as of 15:08, 9 December 2021

The Sequence section of the Protein details page in GlyGen displays the canonical protein sequence. When an annotation such as N-linked Sites, Sequon, or Phosphorylation is selected, the corresponding residues are highlighted in the sequence.
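The highlighting behaviour described above can be illustrated with a minimal sketch. This is not GlyGen's implementation; the function name `highlight` and the bracket notation are assumptions chosen for illustration, and positions are assumed to be 1-based.

```python
from typing import Iterable


def highlight(sequence: str, positions: Iterable[int]) -> str:
    """Wrap the residues at the given 1-based positions in square brackets.

    Hypothetical helper: the bracket style is illustrative only and does not
    reflect how the GlyGen page renders highlights.
    """
    selected = set(positions)
    marked = []
    for index, residue in enumerate(sequence, start=1):
        # Mark only residues whose position belongs to the selected annotation.
        marked.append(f"[{residue}]" if index in selected else residue)
    return "".join(marked)


print(highlight("MNGTA", [2]))  # M[N]GTA
```

Passing an empty list of positions returns the sequence unchanged, which mirrors the state in which no annotation is selected.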

Sequence

This section contains the following information:

  • Sequence - The canonical protein sequence in FASTA format from the UniProtKB database
  • N-Linked Sites -
  • O-Linked Sites -
  • Variation from mutation -
  • Sequon -
  • Phosphorylation -
  • Glycation -
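As an illustration of the Sequon annotation, the sketch below locates N-glycosylation sequons in a sequence. It assumes the commonly used definition N-X-S/T, where X is any residue except proline; this page does not state which definition GlyGen applies, and the function name `find_sequons` is hypothetical.

```python
import re

# Assumed definition: Asn, then any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. "NNSS") be reported separately.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence: str) -> list:
    """Return the 1-based positions of the Asn residue of each sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


print(find_sequons("MNGTANPSNNSS"))  # [2, 9, 10]
```

In the example, the asparagine at position 6 is skipped because it is followed by proline.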

Source of information

The sequence data is collected and integrated from the UniProtKB, Glycam, and GlyTouCan databases.

  • UniProtKB - protein FASTA sequences gathered from the UniProtKB database.
  • Glycam - Glycan sequences gathered in Glycam IUPAC format for associated glycans.
  • GlyTouCan - Glycan sequences in IUPAC extended format for associated glycans (GlyTouCan Accessions).
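Protein sequences in FASTA format can be read with a few lines of code. The parser below is a minimal sketch, assuming plain multi-record FASTA text; the function name `parse_fasta` and the example record are hypothetical and are not taken from a GlyGen dataset.

```python
def parse_fasta(text: str) -> dict:
    """Map each FASTA header (without the leading '>') to its full sequence."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical record for illustration only.
example = ">example_protein made-up entry\nMNGTA\nKLV\n"
print(parse_fasta(example))
```

Wrapped sequence lines are joined, so each record yields a single uninterrupted sequence string.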

Data access

The collected data is processed and stored at data.glygen.org in the following datasets:

Homo sapiens (Human) Datasets

  • Human Protein Canonical sequences (UniProtKB; GLY_000002)
  • Human Protein Isoform sequences (UniProtKB; GLY_000053)
  • Human Protein Sequence Info (UniProtKB; GLY_000398)

Hepatitis C Virus Datasets

  • HCV1a Protein Canonical sequences (UniProtKB; GLY_000346)
  • HCV1b Protein Canonical sequences (UniProtKB; GLY_000347)
  • HCV1a Protein Isoform Sequences (UniProtKB; GLY_000348)
  • HCV1b Protein Isoform Sequences (UniProtKB; GLY_000349)
  • HCV1a Protein Sequence Info (UniProtKB; GLY_000350)
  • HCV1b Protein Sequence Info (UniProtKB; GLY_000351)

Mus musculus (Mouse) Datasets

  • Mouse Protein Canonical sequences (UniProtKB; GLY_000012)
  • Mouse Protein Isoform sequences (UniProtKB; GLY_000053)
  • Mouse Protein Sequence Info (UniProtKB; GLY_000399)

Rattus norvegicus (Rat) datasets

  • Rat Protein Canonical sequences (UniProtKB; GLY_000240)
  • Rat Protein Isoform sequences (UniProtKB; GLY_000255)
  • Rat Protein Sequence Info (UniProtKB; GLY_000400)

SARS Coronavirus datasets

  • SARS-CoV1 Protein Isoform sequences (UniProtKB; GLY_000411)
  • SARS-CoV2 Protein Isoform sequences (UniProtKB; GLY_000412)
  • SARS-CoV1 Protein Canonical sequences (UniProtKB; GLY_000415)
  • SARS-CoV2 Protein Canonical sequences (UniProtKB; GLY_000416)
  • SARS-CoV1 Protein Sequence Info (UniProtKB; GLY_000400)
  • SARS-CoV2 Protein Sequence Info (UniProtKB; GLY_000441)

Glycan Sequence Datasets

  • Glycan Sequences Glycam IUPAC (Glycam; GLY_000287)
  • Glycan Sequences GlycoCT (GlyTouCan; GLY_000288)
  • Glycan Sequences InChI (GlyTouCan, PubChem; GLY_000289)
  • Glycan Sequences IUPAC Extended (GlyTouCan; GLY_000290)
  • Glycan Sequences SMILES Isomeric (GlyTouCan, PubChem; GLY_000291)
  • Glycan Sequences WURCS (GlyTouCan; GLY_000292)
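The dataset accessions listed above can be turned into links to data.glygen.org. The sketch below assumes that dataset pages follow the pattern `https://data.glygen.org/<accession>`, as used by this wiki's dataset links, and that accessions have the form `GLY_` plus six digits, as in the lists above; the function name `dataset_url` is hypothetical.

```python
import re

# Assumed accession format, inferred from the accessions listed on this page.
ACCESSION = re.compile(r"GLY_\d{6}")


def dataset_url(accession: str) -> str:
    """Build the assumed data.glygen.org page URL for a dataset accession."""
    if not ACCESSION.fullmatch(accession):
        raise ValueError(f"Unexpected accession format: {accession!r}")
    return f"https://data.glygen.org/{accession}"


print(dataset_url("GLY_000002"))  # https://data.glygen.org/GLY_000002
```

Validating the accession before building the URL catches typos early instead of producing a link that may not resolve.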

Data harmonization

Data filtering