Protein details/Sequence

From GlyGen Wiki
Latest revision as of 19:22, 9 December 2021

The Sequence section of the Protein details page in GlyGen displays the canonical protein sequence. When an annotation such as N-linked sites, sequons, or phosphorylation is selected, the corresponding positions are highlighted in the sequence.

Sequence

This section contains the following information:

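  • Sequence - The canonical protein sequence in FASTA format from the UniProtKB database.
  • N-Linked Sites - Highlights N-linked glycosylation sites in the sequence.
  • O-Linked Sites - Highlights O-linked glycosylation sites in the sequence.
  • Variation from mutation - Highlights residues with annotated sequence variation from mutations.
  • Sequon - Highlights N-glycosylation sequons (the consensus motif N-X-S/T, where X is any residue except proline).
  • Phosphorylation - Highlights phosphorylation sites in the sequence.
  • Glycation - Highlights glycation sites in the sequence.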
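Highlighting sequons amounts to scanning the sequence for a short motif. The sketch below assumes the classic N-glycosylation sequon N-X-S/T (X any residue except proline); GlyGen's own matching rules are not documented on this page, and `find_sequons` is a hypothetical helper, not part of any GlyGen API.

```python
import re
from typing import List, Tuple

# Assumption: a sequon is the classic N-glycosylation motif N-X-S/T,
# where X is any residue except proline (P). GlyGen's exact rule may differ.
# The zero-width lookahead lets overlapping sequons (e.g. in "NNST") all match.
SEQUON_PATTERN = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based position of the N, matched motif) for each sequon."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in SEQUON_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real UniProtKB entry.
    print(find_sequons("MKNGTAANPSLNNSTQ"))
    # → [(3, 'NGT'), (12, 'NNS'), (13, 'NST')]
```

Positions are 1-based to match UniProtKB residue numbering. In the toy sequence, the NPS at position 8 is skipped because of the proline, while the overlapping NNS and NST at positions 12 and 13 are both reported.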

Source of information

The sequence data is collected and integrated from the UniProtKB, Glycam, and GlyTouCan databases.

  • UniProtKB - Protein sequences in FASTA format from the UniProtKB database.
  • Glycam - Glycan sequences in Glycam IUPAC format for associated glycans.
  • GlyTouCan - Glycan sequences in IUPAC extended format for associated glycans (GlyTouCan accessions).
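UniProtKB distributes protein sequences in FASTA format: a `>` definition line followed by one or more sequence lines. A minimal parser sketch, assuming well-formed input; `parse_fasta` and `uniprot_accession` are hypothetical helpers, and the `db|accession|entry name` header layout is the standard UniProtKB convention rather than anything specific to GlyGen.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into a {header: sequence} dict.

    The header is everything after '>' on a definition line. Sequence lines
    that follow are trimmed and concatenated. Lines before the first header
    are ignored, and a repeated header overwrites the earlier record.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def uniprot_accession(header: str) -> str:
    """Extract the accession from a 'db|accession|entry name ...' header."""
    parts = header.split("|")
    if len(parts) >= 3:
        return parts[1]
    words = header.split()
    return words[0] if words else ""  # fall back to the first word


if __name__ == "__main__":
    # Hypothetical toy record; P00000 is a placeholder, not a real entry.
    example = """>sp|P00000|TOY_HUMAN Hypothetical toy protein
MKNGTAANPS
LNNSTQ
"""
    for header, seq in parse_fasta(example).items():
        print(uniprot_accession(header), len(seq))  # → P00000 16
```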

Data access

The collected data is processed and stored at data.glygen.org in the following datasets:

Homo sapiens (Human) datasets

  • Human Protein Canonical sequences (UniProtKB; GLY_000002)
  • Human Protein Isoform sequences (UniProtKB; GLY_000053)
  • Human Protein Sequence Info (UniProtKB; GLY_000398)

Hepatitis C virus datasets

  • HCV1a Protein Canonical sequences (UniProtKB; GLY_000346)
  • HCV1b Protein Canonical sequences (UniProtKB; GLY_000347)
  • HCV1a Protein Isoform Sequences (UniProtKB; GLY_000348)
  • HCV1b Protein Isoform Sequences (UniProtKB; GLY_000349)
  • HCV1a Protein Sequence Info (UniProtKB; GLY_000350)
  • HCV1b Protein Sequence Info (UniProtKB; GLY_000351)

Mus musculus (Mouse) datasets

  • Mouse Protein Canonical sequences (UniProtKB; GLY_000012)
  • Mouse Protein Isoform sequences (UniProtKB; GLY_000054)
  • Mouse Protein Sequence Info (UniProtKB; GLY_000399)

Rattus norvegicus (Rat) datasets

  • Rat Protein Canonical sequences (UniProtKB; GLY_000240)
  • Rat Protein Isoform sequences (UniProtKB; GLY_000255)
  • Rat Protein Sequence Info (UniProtKB; GLY_000400)

SARS Coronavirus datasets

  • SARS-CoV1 Protein Isoform sequences (UniProtKB; GLY_000411)
  • SARS-CoV2 Protein Isoform sequences (UniProtKB; GLY_000412)
  • SARS-CoV1 Protein Canonical sequences (UniProtKB; GLY_000415)
  • SARS-CoV2 Protein Canonical sequences (UniProtKB; GLY_000416)
  • SARS-CoV1 Protein Sequence Info (UniProtKB; GLY_000440)
  • SARS-CoV2 Protein Sequence Info (UniProtKB; GLY_000441)

Glycan sequence datasets

  • Glycan Sequences Glycam IUPAC (Glycam; GLY_000287)
  • Glycan Sequences GlycoCT (GlyTouCan; GLY_000288)
  • Glycan Sequences InChI (GlyTouCan, PubChem; GLY_000289)
  • Glycan Sequences IUPAC Extended (GlyTouCan; GLY_000290)
  • Glycan Sequences SMILES Isomeric (GlyTouCan, PubChem; GLY_000291)
  • Glycan Sequences WURCS (GlyTouCan; GLY_000292)
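Each dataset ID listed on this page corresponds to a landing page at data.glygen.org, for example https://data.glygen.org/GLY_000002 for the human canonical sequences. The sketch below only builds these URLs; the dictionary keys, `dataset_url`, and the six-digit ID check are hypothetical conveniences, and it does not download anything, since the file names and download endpoints of each dataset are not described here.

```python
import re
from typing import Dict

# Canonical protein sequence datasets listed on this page, keyed by organism.
# The keys are hypothetical labels chosen for this example.
CANONICAL_SEQUENCE_DATASETS: Dict[str, str] = {
    "human": "GLY_000002",
    "mouse": "GLY_000012",
    "rat": "GLY_000240",
    "hcv1a": "GLY_000346",
    "hcv1b": "GLY_000347",
    "sars-cov1": "GLY_000415",
    "sars-cov2": "GLY_000416",
}

# Assumption: IDs are "GLY_" followed by six digits, as in every ID above.
DATASET_ID = re.compile(r"GLY_\d{6}")


def dataset_url(dataset_id: str) -> str:
    """Return the data.glygen.org landing-page URL for a dataset ID."""
    if not DATASET_ID.fullmatch(dataset_id):
        raise ValueError(f"not a GlyGen dataset ID: {dataset_id!r}")
    return f"https://data.glygen.org/{dataset_id}"


if __name__ == "__main__":
    for organism, ds in CANONICAL_SEQUENCE_DATASETS.items():
        print(organism, dataset_url(ds))
```

Validating the ID before formatting the URL catches copy-paste slips, such as a truncated ID, early rather than producing a broken link.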

Data harmonization

Data filtering