Protein details/General

From GlyGen Wiki
Revision as of 20:03, 19 May 2022 by Karinamartinez (talk | contribs)

The General section of the Protein details page in GlyGen provides identifying information about the protein from reference databases and sources.

General

Screenshot of the General section on the Protein details page in GlyGen.

This section provides the following information:

  • Gene Name - Name of the gene that encodes the protein. E.g. EGFR
  • Gene Location - The fixed position on a chromosome where the gene is located, given as the chromosome number together with the gene's start and end positions on that chromosome. Gene location information is retrieved from the Ensembl database. E.g. Chromosome: 7 (55,019,021 - 55,211,628)
  • UniProtKB ID - The entry name (mnemonic identifier) assigned to the protein accession. It follows a defined naming convention and consists of up to 11 uppercase alphanumeric characters for a UniProtKB/Swiss-Prot entry and up to 16 uppercase alphanumeric characters for a UniProtKB/TrEMBL entry. E.g. EGFR_HUMAN
  • UniProtKB Accession - Accession assigned to the protein in the UniProtKB database. E.g. P00533-1
  • Protein Length - Proteins are chains built from the 20 standard amino acids; the protein length is the number of amino acids in a given protein sequence. Protein length information is retrieved from the UniProtKB database. E.g. 793
  • UniProtKB Entry Name - The recommended protein name according to the UniProtKB database. E.g. Epidermal growth factor receptor
  • Chemical Mass - Molecular weight of the protein, given in daltons (Da, the unified atomic mass unit). Molecular weight information is retrieved from the UniProtKB database. E.g. 134,277 Da
  • RefSeq Accession - Accession assigned by the NCBI Reference Sequence (RefSeq) database, formatted as a two-letter prefix followed by an underscore. RefSeq accessions mapped to UniProtKB accessions are retrieved from the UniProtKB database. E.g. NP_001333828.1
  • RefSeq Name - Name assigned by the NCBI Reference Sequence (RefSeq) database. E.g. epidermal growth factor receptor isoform g precursor
  • Organism - Name of the organism that is the source of the protein sequence. The NCBI Taxonomy ID for the organism is also shown in the Organism field. E.g. Homo sapiens (Human) [9606]
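
For readers who copy these values into their own scripts, the display formats shown in the examples above can be split into typed parts. The following is a minimal Python sketch, not part of GlyGen: the function names and regular expressions are hypothetical and are inferred only from the example strings on this page, so other entries may need looser patterns.

```python
import re

# Patterns below are assumptions inferred from the example values on this
# page; they are not an official GlyGen, UniProtKB, or RefSeq grammar.
GENE_LOCATION_RE = re.compile(
    r"^Chromosome:\s*(?P<chrom>\w+)\s*"
    r"\((?P<start>[\d,]+)\s*-\s*(?P<end>[\d,]+)\)$"
)
ORGANISM_RE = re.compile(r"^(?P<name>.+?)\s*\[(?P<tax_id>\d+)\]$")
REFSEQ_RE = re.compile(
    r"^(?P<prefix>[A-Z]{2})_(?P<number>\d+)(?:\.(?P<version>\d+))?$"
)


def parse_gene_location(text):
    """Split 'Chromosome: 7 (55,019,021 - 55,211,628)' into (chrom, start, end)."""
    match = GENE_LOCATION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"Unrecognized gene location: {text!r}")
    # Remove the thousands separators before converting to integers.
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    return match["chrom"], start, end


def parse_organism(text):
    """Split 'Homo sapiens (Human) [9606]' into (name, NCBI Taxonomy ID)."""
    match = ORGANISM_RE.match(text.strip())
    if match is None:
        raise ValueError(f"Unrecognized organism field: {text!r}")
    return match["name"], int(match["tax_id"])


def parse_refseq_accession(text):
    """Split 'NP_001333828.1' into (two-letter prefix, number, version or None)."""
    match = REFSEQ_RE.match(text.strip())
    if match is None:
        raise ValueError(f"Unrecognized RefSeq accession: {text!r}")
    version = int(match["version"]) if match["version"] else None
    return match["prefix"], match["number"], version


if __name__ == "__main__":
    print(parse_gene_location("Chromosome: 7 (55,019,021 - 55,211,628)"))
    print(parse_organism("Homo sapiens (Human) [9606]"))
    print(parse_refseq_accession("NP_001333828.1"))
```

The numeric part of the RefSeq accession is kept as a string so that leading zeros are preserved.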

Source of Information

The data in the General section is collected and integrated from the UniProtKB, NCBI RefSeq, and Ensembl databases.