Glycomics paper curation

From GlyGen Wiki
Revision as of 17:08, 22 August 2023 by MindyPorterfield (new version of table)

This article describes the curation of glycomics papers as part of the GlyGen project.

Curation workflow

  • Identify a paper of interest, and look for glycan structures being reported in both primary and supplemental data sections.
  • Draw the glycans in GRITS Toolbox.
    • Draw them exactly as they are presented in the paper, with no assumptions from the curator, including:
      • Reducing end type (reduced, free, etc.)
      • Derivatization status (permethylated C12, native, etc.)
      • Topology (which arm monosaccharides are placed on)
      • Linkages
  • Export the glycan drawings to GWS format and put the file in a folder in SharePoint.
  • Request that the GWS file be processed (by Sena) into an Excel sheet with GlyTouCan IDs added. Structures without GlyTouCan IDs are submitted to GlyTouCan and registered. The finished Excel file is placed in SharePoint and a notification is sent.
  • Once this is complete, retrieve the Excel file from SharePoint.
  • Check that the GlyTouCan structure matches the structure you submitted.
  • Add the meta information in the specified format (see the Curation table section) by filling in those columns.
  • Make a copy of the tab and delete the columns not specified in the curation table (e.g. file name, row number, both cartoons, status, error).
  • Save the cleaned tab as Final-GlyGen to designate which tab to use in downstream steps.
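
The column clean-up in the last steps can be sketched in Python, assuming the sheet has already been read into a list of dictionaries (one per row). The function name, the shortened column list, and the sample row are hypothetical and only illustrate the idea; the real header names come from the processed Excel sheet.

```python
# Hypothetical, shortened column list; the full list is given in the curation table.
CURATION_COLUMNS = ["GlyTouCan ID", "Paper Evidence", "Species"]


def keep_curation_columns(rows, wanted=CURATION_COLUMNS):
    """Return copies of the rows holding only the wanted columns, in the wanted order."""
    return [{col: row.get(col, "") for col in wanted} for row in rows]


# Made-up row as it might look after processing (extra columns still present).
exported = [
    {"file name": "paper1.gws", "row number": "1", "GlyTouCan ID": "G17689DH",
     "status": "ok", "error": "", "Paper Evidence": "PMID:25753706", "Species": "9606"},
]

final_glygen = keep_curation_columns(exported)
print(final_glygen)
```

Missing columns are filled with an empty string here, so the column order stays fixed even when a value has not been curated yet.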

Curation table

Table structure for glycomics information

The file will be a CSV file using “,” as the cell delimiter, and all cells will be quoted.
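
As an illustration, a file with this layout could be written with Python's standard csv module. This is a minimal sketch, not the project's actual export code; the function name and the three-column sample are assumptions.

```python
import csv
import io


def write_curation_csv(header, rows, handle):
    """Write rows as comma-delimited CSV with every cell quoted."""
    writer = csv.writer(handle, delimiter=",", quoting=csv.QUOTE_ALL,
                        lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


# Write to an in-memory buffer for demonstration; a real file handle works the same way.
buffer = io.StringIO()
write_curation_csv(
    ["GlyTouCan ID", "Paper Evidence", "Species"],
    [["G17689DH", "PMID:25753706", "9606"]],
    buffer,
)
csv_text = buffer.getvalue()
print(csv_text)
```

csv.QUOTE_ALL quotes every cell, including numeric-looking ones such as the species ID.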

List of columns in the curation file that need to be filled for glycomics information.

Bold Red indicates mandatory information

Keep in this order

Extra columns can be added to the end if needed

  • GlyTouCan ID
    • Example: G17689DH
    • Comment: From Sena's spreadsheet.
  • Paper Evidence
    • Example: PMID:25753706 or DOI:10.1007/978-1-4939-2343-4_8
    • Comment: PMID or DOI.
  • Species
    • Example: 9606
    • Comment: From the NCBI Taxonomy browser.
  • Strain
    • Example: Oregon-R
    • Comment: For model organisms (fly, yeast, mouse). Add the species number to the UniProt link https://www.uniprot.org/taxonomy/<taxID> and take the text name from there (in the column on the right). Look for the strain on the right side of the page. Sometimes synonyms are given in parentheses; use the term at the beginning of the line. If no match is found => audio meeting.
  • Tissue
    • Example: UBERON:0002107
    • Comment: From Uberon; if it cannot be found, we will discuss.
  • Cell line ID
    • Example: Cellosaurus:CVCL_A4VI
    • Comment: From Cellosaurus.
  • Disease
    • Example: DOID:3571
    • Comment: From the Human Disease Ontology.
  • Glycan dictionary term ID
    • Example: GSD000011
  • has_abundance
    • Values: yes, no
    • Comment: Are there numbers associated with the amount present in a sample?
  • has_expression
    • Values: yes, no
    • Comment: Do not use for now, until this has been discussed with Karina.
  • Functional annotation/Keyword
    • Example: <term1>|<term2>
    • Comment: Mike will provide a dictionary (~15 terms) that Mindy will use. See the list at the end of the sheet.
  • Experimental technique
    • Example: LC-MS|MS profile
    • Comment: Free text; we create a dictionary to avoid variants such as LC/MS and LC-MS. Dictionary terms: MS, MS/MS, LC-MS/MS, LC-MS, CE-MS, CE-MS/MS, CE, HPLC, GC, GC-MS.
  • Glycan dictionary term ID
    • Example: GSD000011
    • Comment: From the Glycan structure dictionary.
  • Variant (Fly, yeast, mouse)
    • Example: Wild Type, Tollo395
    • Comment: Gene name and position (if known) as text.
  • Organismal/cellular Phenotype
    • Example: HP:0012373
    • Comment: HPO (human), https://hpo.jax.org/app/. If no match is found => audio meeting.
  • Molecular Phenotype
    • Example: Gene name
    • Comment: Discuss with Jeet when an example is found.
  • Contributor
    • Example: createdBy:Mindy Porterfield(mindy@something.de, CCRC)|createdBy:Name(email,institution)
    • Comment: From ticket 42.
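
A quick sanity check of the identifier columns can be sketched as follows. The regular expressions are inferred only from the example values in the table; they are assumptions, not official format definitions of GlyTouCan, Uberon, Cellosaurus, DOID, or HPO. The function name is hypothetical.

```python
import re

# Patterns inferred from the example values above (assumptions, not official formats).
ID_PATTERNS = {
    "GlyTouCan ID": re.compile(r"G\d{5}[A-Z]{2}"),
    "Paper Evidence": re.compile(r"(PMID:\d+|DOI:10\.\S+)"),
    "Species": re.compile(r"\d+"),
    "Tissue": re.compile(r"UBERON:\d{7}"),
    "Cell line ID": re.compile(r"Cellosaurus:CVCL_[A-Z0-9]{4}"),
    "Disease": re.compile(r"DOID:\d+"),
    "Glycan dictionary term ID": re.compile(r"GSD\d{6}"),
    "Organismal/cellular Phenotype": re.compile(r"HP:\d{7}"),
}


def invalid_cells(row):
    """Return the names of non-empty cells with an entry that does not match its pattern.

    Every cell is split on "|" so that multi-entry cells are checked entry by entry.
    """
    bad = []
    for column, pattern in ID_PATTERNS.items():
        value = row.get(column, "").strip()
        if value and not all(pattern.fullmatch(part.strip()) for part in value.split("|")):
            bad.append(column)
    return bad


example_row = {
    "GlyTouCan ID": "G17689DH",
    "Paper Evidence": "PMID:25753706",
    "Species": "9606",
    "Tissue": "UBERON:0002107",
    "Cell line ID": "Cellosaurus:CVCL_A4VI",
    "Disease": "DOID:3571",
    "Glycan dictionary term ID": "GSD000011",
    "Organismal/cellular Phenotype": "HP:0012373",
}
print(invalid_cells(example_row))
```

Empty cells are skipped, since the check only concerns the format of values that are present.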


The following columns can have multiple entries per line/cell:

  • Disease  
  • Functional annotation
  • Keywords
  • Glycan dictionary term ID
  • Contributor
  • Experimental technique


The following format will be used for these cells: <term1>|<term2>
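
A minimal sketch of reading and writing such multi-entry cells in Python (the helper names are hypothetical):

```python
def split_cell(cell):
    """Split a multi-entry cell on "|" and drop surrounding whitespace and empty parts."""
    return [term.strip() for term in cell.split("|") if term.strip()]


def join_terms(terms):
    """Join terms into one cell; a term containing "|" would corrupt the format."""
    for term in terms:
        if "|" in term:
            raise ValueError(f"term must not contain '|': {term!r}")
    return "|".join(terms)


print(split_cell("LC-MS|MS/MS"))
print(join_terms(["adhesion", "homing"]))
```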


For experimental techniques the following (non-comprehensive) dictionary will be used:

  • MS
  • MS/MS
  • LC-MS/MS
  • LC-MS
  • CE-MS
  • CE-MS/MS
  • CE
  • HPLC
  • GC
  • GC-MS

This list will be extended if new experimental techniques are detected in the papers.
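
One possible way to map spelling variants such as LC/MS onto the dictionary spelling is sketched below. The matching rule (ignore case, spaces, "-" and "/") is an assumption chosen for illustration, and the function names are hypothetical.

```python
import re

TECHNIQUES = ["MS", "MS/MS", "LC-MS/MS", "LC-MS", "CE-MS",
              "CE-MS/MS", "CE", "HPLC", "GC", "GC-MS"]


def _key(name):
    # Compare names while ignoring case, whitespace, and the separators "-" and "/".
    return re.sub(r"[\s/\-]", "", name).upper()


_LOOKUP = {_key(term): term for term in TECHNIQUES}


def normalize_technique(name):
    """Return the dictionary spelling, or None if the technique is not in the dictionary."""
    return _LOOKUP.get(_key(name))


print(normalize_technique("LC/MS"))
print(normalize_technique("MALDI-TOF"))
```

A result of None marks a technique that would need to be added to the dictionary.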


For Functional annotation use the following (non-comprehensive) dictionary:

  • adhesion
  • homing
  • inflammation
  • protein targeting
  • protein secretion
  • protein stability
  • protein folding
  • ER stress
  • protein degradation
  • circulating half-life
  • clearance
  • internalization
  • metastasis
  • shielding
  • recognition
  • toxin receptor
  • viral receptor
  • microbial receptor
  • receptor signaling
  • sperm maturation
  • Added terms (Mike):
    • differentiation
    • biomarker

This list will be extended if new functional annotation terms are detected in the papers.
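
To spot terms that are candidates for extending the dictionary, a cell can be compared against the list above. This is a sketch only; the case-insensitive comparison and the function name are assumptions.

```python
FUNCTIONAL_TERMS = {
    "adhesion", "homing", "inflammation", "protein targeting", "protein secretion",
    "protein stability", "protein folding", "ER stress", "protein degradation",
    "circulating half-life", "clearance", "internalization", "metastasis",
    "shielding", "recognition", "toxin receptor", "viral receptor",
    "microbial receptor", "receptor signaling", "sperm maturation",
    "differentiation", "biomarker",
}

_KNOWN = {term.lower() for term in FUNCTIONAL_TERMS}


def unknown_terms(cell):
    """Return the entries of a "|"-separated cell that are not in the dictionary."""
    entries = [entry.strip() for entry in cell.split("|")]
    return [entry for entry in entries if entry and entry.lower() not in _KNOWN]


print(unknown_terms("adhesion|Homing|angiogenesis"))
```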

Curation rules