Protein details/Sequence: Difference between revisions

From GlyGen Wiki
Jump to navigation Jump to search
No edit summary
No edit summary
Line 11: Line 11:
*[https://glytoucan.org/ '''GlyTouCan'''] - Glycan sequences in IUPAC extended format for associated glycans (GlyTouCan Accessions).
*[https://glytoucan.org/ '''GlyTouCan'''] - Glycan sequences in IUPAC extended format for associated glycans (GlyTouCan Accessions).


==Data Access==
==Data access==
The collected data is processed and stored at '''[https://data.glygen.org/ data.glygen.org]''' in the following datasets:
The collected data is processed and stored at '''[https://data.glygen.org/ data.glygen.org]''' in the following datasets:


Line 60: Line 60:


*Glycan sequences Byonic (Glycam; [https://data.glygen.org/GLY_000559 GLY_000559])
*Glycan sequences Byonic (Glycam; [https://data.glygen.org/GLY_000559 GLY_000559])
== Data harmonization ==
== Data filtering ==

Revision as of 17:08, 6 December 2021

Function

This section contains the following information:

  • Sequence: The protein sequence in FASTA format from UniProtKB database. The Sequence section offers highlighting of certain annotations such as N-linked Sites, Sequon, Phosphorylation etc when the annotation is selected. The Sequence section also offers to view the protein sequence in ProtVista tool that also allows viewing of various annotations.

Source of information

The sequencce data is collected and integrated from UniProtKB, Glycam, and GlyToucan databases.

  • UniProtKB - protein FASTA sequences gathered from the UniProtKB database.
  • Glycam - Glycan sequences gathered in Glycam IUPAC format for associated glyans.
  • GlyTouCan - Glycan sequences in IUPAC extended format for associated glycans (GlyTouCan Accessions).

Data access

The collected data is processed and stored at data.glygen.org in the following datasets:

Homo Sapiens (Human) Datasets

  • Human Protein Canonical sequences (UniProtKB; GLY_000002)
  • Human Protein Isoform sequences (UniProtKB; GLY_000053)
  • Human Protein Sequence Info (UniProtKB; GLY_000398)

Hepatitis C Virus Datasets

  • HCV1a Protein Canonical sequences (UniProtKB; GLY_000346)
  • HCV1b Protein Canonical sequences (UniProtKB; GLY_000347)
  • HCV1a Protein Isoform Sequences (UniProtKB; GLY_000348)
  • HCV1b Protein Isoform Sequences (UniProtKB; GLY_000349)
  • HCV1a Protein Sequence Info (UniProtKB; GLY_000350)
  • HCV1b Protein Sequence Info (UniProtKB; GLY_000351)

Mus musculus (Mouse) Datasets

  • Mouse Protein Canonical sequences (UniProtKB; GLY_000012)
  • Mouse Protein Isoform sequences (UniProtKB; GLY_000053)
  • Mouse Protein Sequence Info (UniProtKB; GLY_000399)

Rattus norvegicus (Rat) datasets

  • Rat Protein Canonical sequences (UniProtKB; GLY_000240)
  • Rat Protein Isoform sequences (UniProtKB; GLY_000255)
  • Rat Protein Sequence Info (UniProtKB; GLY_000400)

SARS Coronavirus datasets

  • SARS-CoV1 Protein Isoform sequences (UniProtKB; GLY_000411)
  • SARS-CoV2 Protein Isoform sequences (UniProtKB; GLY_000412)
  • SARS-CoV1 Protein Canonical sequences (UniProtKB; GLY_000415)
  • SARS-CoV2 Protein Canonical sequences (UniProtKB; GLY_000416)
  • SARS-CoV1 Protein Sequence Info (UniProtKB; GLY_000400)
  • SARS-CoV2 Protein Sequence Info (UniProtKB; GLY_000441)

Glycan Sequence Datasets

  • Glycan Sequences Glycam IUPAC (Glycam; GLY_000287)
  • Glycan Sequences GlycoCT (GlyTouCan; GLY_000288)
  • Glycan Sequences InChI (GlyTouCan, PubChem; GLY_000289)
  • Glycan Sequences IUPAC Extended (GlyTouCan; GLY_000290)
  • Glycan Sequences SMILES Isomeric (GlyTouCan, PubChem; GLY_000291)
  • Glycan Sequences WURCS (GlyTouCan; GLY_000292)

Data harmonization

Data filtering