Glycomics paper curation: Difference between revisions

From GlyGen Wiki
m (link to composition browser changed)
 
(8 intermediate revisions by the same user not shown)
Line 4: Line 4:


* Identify a paper of interest, and look for glycan structures being reported in both primary and supplemental data sections.  
* Identify a paper of interest, and look for glycan structures being reported in both primary and supplemental data sections.  
* Draw the glycans in Grits Toolbox,  
* Draw the glycans represented by cartoons in Grits Toolbox,  
** Pay close attention to drawing them exactly as they are presented in the paper with no assumptions from the curator including
** Pay close attention to drawing them exactly as they are presented in the paper with no assumptions from the curator including
*** Reducing end type (reduced, free etc)  
*** Reducing end type (reduced, free etc)  
Line 10: Line 10:
*** Topology (which arm monosaccharides are placed on)
*** Topology (which arm monosaccharides are placed on)
*** Linkages
*** Linkages
*** only draw cartoons if they are drawn in the paper, draw compositions as compositions, may have multiple representations for ex a spectra with cartoons and a table with compositions, make both
* Export the glycan drawings to gws format and put in a folder in SharePoint  
* Export the glycan drawings to gws format and put in a folder in SharePoint  
* Request the gws file to be processed (by Sena) into an excel sheet and glytoucan id's added. Structures without glytoucan ids are submitted to glytoucan and registered. The finished excel is placed in Sharepoint and notification is given.
* Request the gws file to be processed (by Sena) into an excel sheet and glytoucan id's added. Structures without glytoucan ids are submitted to glytoucan and registered. The finished excel is placed in Sharepoint and notification is given.
Line 15: Line 16:
* Check that the glytoucan structure matches the structure you submitted.  
* Check that the glytoucan structure matches the structure you submitted.  
* Add meta info using the specified format (see curation table section) and fill in those columns.  
* Add meta info using the specified format (see curation table section) and fill in those columns.  
* Make a copy tab and delete columns not specified in the Curation table (ex file name, row number, both cartoons, status, error)
* Make a copy as a new tab and delete columns not specified in the Curation table (ex file name, row number, both cartoons, status, error). The column format must be the same, same order, same column name.
* Save a tab as Final-GlyGen to designate which tab to use in downstream steps
* Save a tab as Final-GlyGen to designate which tab to be used in downstream processing
* For structures that are compositions only you must Look up GlyToucan ID's manually and create the same table as for cartoons.
** by going to  https://gnome.glyomics.org/CompositionBrowser.html and selecting the appropriate number of each residue. Select the double ?? forms (top ? means ring size, bottom ? means anomer)
** or by searching by mass in GlyGen


== Curation table ==
== Curation table ==




Table structure for glycomics information  
 
Table structure for glycomics information


The file will be a CSV file using “,” as cell delimiter and all cells will be quoted.   
The file will be a CSV file using “,” as cell delimiter and all cells will be quoted.   


List of columns in the curation file that needs to be filled for glycomics information. Bold indicates mandatory information:
List of columns in the curation file that needs to be filled for glycomics information.  
 
Bold indicates mandatory information  
 
Keep in this order and the column names the same


Extra columns can be added to the end if needed
{| class="wikitable"
{| class="wikitable"
|GlyTouCan ID  
 
|}
{| class="wikitable"
|'''GlyTouCan ID'''
|G17689DH  
|G17689DH  
|From Senas spreadsheet  
|From Senas spreadsheet, or manually look up
|-
|-
|Paper  
|'''Evidence'''
|PMID:25753706  
|PMID:25753706  


Line 38: Line 51:
|PMID or DOI  
|PMID or DOI  
|-
|-
|Species  
|'''Species'''  
|9606  
|9606  
|From NCBI Taxonomy browser  
|From NCBI Taxonomy browser  
|-
|-
|Strain (fly, yeast, mouse)
|Strain  
|Oregon-R  
|Oregon-R  
|Flybase for fly  
|(Model organisms fly, yeast, mouse)  


SGD for Yeast  
Add species number to Uniprotlink and get text name from there (in column on rt)


Mouse  
<nowiki>https://www.uniprot.org/taxonomy/</nowiki><taxID>


Or text from paper if not in a dictionary
Look for the strain on the right side. Find the strain you are looking for. Sometimes synonymes are given in parenthesis. Use the term in the beginning of the line.
 
 
 
If no match is found => Audio meeting
|-
|-
|Tissue  
|Tissue  
Line 61: Line 78:
|From Cellosaurus  
|From Cellosaurus  
|-
|-
|Disease  
|*Disease  
|DOID:3571  
|DOID:3571  
|From Human Disease Ontology  
|From Human Disease Ontology  
|-
|*Glycan dictionary term ID
|GSD000011
|
|-
|-
|has_abundance  
|has_abundance  
Line 77: Line 98:
|Don't use for now until talk with Karina  
|Don't use for now until talk with Karina  
|-
|-
|Functional annotation/Keyword  
|*Functional annotation/Keyword  
|<nowiki><term1>|<term2> </nowiki>
|<nowiki><term1>|<term2> </nowiki>
|We (Mike) will provide a dictionary (~15 terms) that Mindy will use.  
|Mike will provide a dictionary (~15 terms) that Mindy will use. See list end of sheet
|-
|*Experimental technique
 
|<nowiki>LC-MS|MS profile </nowiki>
|Free text, we create dictionary to avoid things like LC/MS LC-MS,  
 
* MS
* MS/MS
* LC-MS/MS
* LC-MS
* CE-MS
* CE-MS/MS
* CE
* HPLC
* GC
* GC-MS
 
|-
|-
|Glycan dictionary term ID  
|*Glycan dictionary term ID  
|GSD000011  
|GSD000011  
|From the [[Glycan structure dictionary]].  
|From the [[Glycan structure dictionary]].  
|-
|-
|Contributor
|Variant (Fly, yeast, mouse)
|<nowiki>createdBy:Mindy Porterfield(mindy@something.de, CCRC)|createdBy:Name(email,institution) </nowiki>
|Wild Type, Tollo395,  
|From ticket 42
|Gene name and position (if known) as text  
 
|-
|-
|Experimental technique
|Organismal/cellular Phenotype  
|<nowiki>LC-MS|MS profile </nowiki>
|HP:0012373  
|Free text, we create dictionary to avoid things like LC/MS LC-MS
|HPO (human)
 
<nowiki>https://hpo.jax.org/app/</nowiki>  
 


If no match is found => Audio meeting


|-
|-
|Variant (Fly, yeast, mouse)  
|*Molecular Phenotype
|Wild type, Tollo,  
|Flybase for fly
|Gene name
 
Discuss with Jeet when find example
|-
|*Contributor
|<nowiki>curatedBy:Mindy Porterfield(mindy@something.de, CCRC)|curatedBy:Name(email,institution) </nowiki>
|<ins>'''Example'''</ins>
 
'''createdWith''':GRITS Toolbox (<nowiki>http://www.grits-toolbox.org/</nowiki>) |'''createdWith''':Software/Tool/Code Name (URL or Sourecode URL)
 
<ins>'''Combined Example'''</ins>
 
'''curatedBy''':Mindy Porterfield (mindyp@ccrc.uga.edu, CCRC)|'''curatedBy''':Anh Nyugen (anh@gwu.edu, GW)|'''createdWith''':GRITS Toolbox (<nowiki>http://www.grits-toolbox.org/</nowiki>)


SGD for Yeast
The field can have multiple contribution types separated by <code>|</code>


There is no need to add a corresponding author or other author name list if curating the paper by yourself.


|-
If you are curating the paper use <code>curatedBy</code>, if you are using a software/tool/code to add information to the dataset use <code>createdWith</code>. For final dataset to be integrated GW will use <code>createdBy</code> . For final dataset to be integrated GW will use <code>createdBy</code>. If the initial dataset is shared to you by a researcher use <code>contributedBy</code> or <code>authoredBy</code> whichever is applicable as per this. There is no need to add corresponding author or other author name list if curating the paper by yourself.
|Organismal/cellular Phenotype  
|Eye color, blood type,  
|Mondo


HPO
<ins>'''Example'''</ins>


Fly anatomy FBBD
'''curatedBy''':Mindy Porterfiel (mindy@something.de,CCRC)|curatedBy:Name (email,institution).


'''createdWith''':GRITS Toolbox (<nowiki>http://www.grits-toolbox.org/)|createdWith:Software/Tool/Code</nowiki> Name (URL or Sourecode URL)


Text in paper, observable features, including disease phenotypes such as diabetes, autism spectrum disorder and cancer, or traits such as height, hair color and blood type.
'''contributedBy''':Bob Haltiwanger (bh@ccrc.edu,CCRC)|contributedBy:Name (email,institution)
|-
|Molecular Phenotype
|APOE,  
|direct effect of a variant at the molecular level
|}


'''authoredBy''': Daniel Williamson (daniel.williamson25@uga.edu,CCRC)|authoredBy:Name (email,instituition)


The following columns can have multiple entries per line/cell:  
'''createdBy''':Jeet Vora (jeetvora@gwu.edu,GW)|createdBy:Name (email,instituition)
|}


* Disease  
* Functional annotation
* Keywords


* Glycan dictionary term ID
* Contributor
* Experimental technique


An Asterisk * notes column can have multiple entries per line/cell: The following format will be used for these cells: <term1>|<term2>


The following format will be used for these cells: <term1>|<term2>




Line 184: Line 229:


This list will be extended if new function annotation terms are detected in the papers.  
This list will be extended if new function annotation terms are detected in the papers.  
'''Contributor''' 
{| class="wikitable"
|provenance property
|Definition
|-
|createdBy
|The creator agent participated in some activity that generated the entity. This could be someone who created the dataset or entry.
|-
|contributedBy
|The contributor participated in some activity that generated the entity. This could be someone who provided the data and may not be the one who generated the data. The person could have have made modifications to the data.
|-
|authoredBy
|The author participated in some activity that generated the entity. This could be someone who authored and generated the data.
|-
|curatedBy
|The curator participated in some activity that generated the entity. This is someone who curates data from the publication or raw data.
|-
|importedFrom
|The original resource of imported information. Import means that the content has been preserved, but transcribed somehow, for instance to fit a different representation model by converting formats or structure. The imported resource does not have to be complete but should be consistent with the knowledge conveyed by the original resource. This is the resource from where the data or dataset was downloaded and processed with modifying/transforming the original data or dataset.
|-
|retrievedFrom
|The resource from where the URI or dataset has been retrieved from. Retrieval indicates that this resource has the same representation as the original resource. This is the resource from where the data or dataset was downloaded and processed without modifying the original data. This is the resource from where the data or dataset was downloaded and processed without modifying/ transforming the original data or dataset.
|-
|createdWith
|The software or tool from the which the content was generated. The software is the author itself. AI machine learning tool, software suite, etc
|}


== Curation rules ==
== Curation rules ==
only draw cartoons if they are drawn in the paper, draw compositions as compositions, may have multiple representations for ex a spectra with cartoons and a table with compositions, make both

Latest revision as of 18:40, 16 February 2024

This article describes the curation of glycomics papers as part of the GlyGen project.

Curation workflow

  • Identify a paper of interest, and look for glycan structures being reported in both primary and supplemental data sections.
  • Draw the glycans that the paper represents as cartoons in GRITS Toolbox.
    • Draw them exactly as they are presented in the paper, with no assumptions by the curator, including:
      • Reducing end type (reduced, free, etc.)
      • Derivatization status (permethylated C12, native, etc.)
      • Topology (which arm monosaccharides are placed on)
      • Linkages
      • Only draw cartoons if they are drawn in the paper; record compositions as compositions. A paper may use multiple representations (for example, a spectrum with cartoons and a table with compositions); in that case, make both.
  • Export the glycan drawings to gws format and put in a folder in SharePoint
  • Request that the gws file be processed (by Sena) into an Excel sheet with GlyTouCan IDs added. Structures without GlyTouCan IDs are submitted to GlyTouCan and registered. The finished Excel file is placed in SharePoint and a notification is sent.
  • Once this is complete, retrieve the Excel file from SharePoint.
  • Check that the GlyTouCan structure matches the structure you submitted.
  • Add metadata in the specified format (see the Curation table section), filling in those columns.
  • Copy the sheet to a new tab and delete the columns not specified in the Curation table (e.g. file name, row number, both cartoons, status, error). The remaining columns must match the Curation table exactly: same order, same column names.
  • Save that tab as Final-GlyGen to designate it as the tab to be used in downstream processing.
  • For structures that are compositions only, look up the GlyTouCan IDs manually and create the same table as for cartoons.

Curation table

Table structure for glycomics information

The file will be a CSV file using “,” as cell delimiter and all cells will be quoted.
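
As an illustration of the file format described above, here is a minimal sketch using Python's standard csv module. The function name write_curation_csv is hypothetical, and the three-column header is only a shortened example; the real file uses the full column list from the table in this section.

```python
import csv
import io


def write_curation_csv(rows, fieldnames, handle):
    """Write rows as comma-delimited CSV with every cell quoted."""
    writer = csv.DictWriter(
        handle,
        fieldnames=fieldnames,
        delimiter=",",
        quoting=csv.QUOTE_ALL,   # quote all cells, including the header
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)


# Shortened example header (assumption: the real file lists every column).
example_columns = ["GlyTouCan ID", "Evidence", "Species"]
example_rows = [
    {"GlyTouCan ID": "G17689DH", "Evidence": "PMID:25753706", "Species": "9606"},
]

buffer = io.StringIO()
write_curation_csv(example_rows, example_columns, buffer)
csv_text = buffer.getvalue()
```

Quoting every cell keeps values that contain commas, such as a Contributor entry with an email and institution, inside a single cell.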

The following columns in the curation file need to be filled in for glycomics information.

Mandatory columns are shown in bold on the wiki page: GlyTouCan ID, Evidence and Species.

Keep the columns in this order and keep the column names unchanged.

Extra columns can be added to the end if needed.
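
The ordering rule can be checked mechanically before a file is handed over. The sketch below is one possible check, not part of the GlyGen tooling: check_header is a hypothetical helper, and the required list here contains only the first three columns for brevity.

```python
def check_header(actual, required):
    """Return a list of problems; an empty list means the header is acceptable.

    The required columns must appear first, in order, with identical names.
    Any columns after them are treated as permitted extras.
    """
    problems = []
    for position, name in enumerate(required):
        found = actual[position] if position < len(actual) else None
        if found != name:
            problems.append(
                f"column {position + 1}: expected {name!r}, found {found!r}"
            )
    return problems


# Assumption: a shortened required list used only for illustration.
required_columns = ["GlyTouCan ID", "Evidence", "Species"]

ok = check_header(["GlyTouCan ID", "Evidence", "Species", "Notes"], required_columns)
swapped = check_header(["Evidence", "GlyTouCan ID", "Species"], required_columns)
missing = check_header(["GlyTouCan ID", "Evidence"], required_columns)
```

Here ok is empty because the extra "Notes" column sits at the end, while swapped and missing each report the positions that do not match.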

GlyTouCan ID G17689DH From Sena's spreadsheet, or look up manually
Evidence PMID:25753706

DOI:10.1007/978-1-4939-2343-4_8  

PMID or DOI
Species   9606 From NCBI Taxonomy browser
Strain   Oregon-R (Model organisms: fly, yeast, mouse)

Append the species taxonomy ID to the UniProt link below and take the text name from that page (in the column on the right):

https://www.uniprot.org/taxonomy/<taxID>

Find the strain you are looking for on the right side of the page. Synonyms are sometimes given in parentheses; use the term at the beginning of the line.

If no match is found, raise it in an audio meeting.

Tissue   UBERON:0002107 From Uberon; if it cannot be found, we will discuss.
Cell line ID Cellosaurus:CVCL_A4VI From Cellosaurus
*Disease   DOID:3571 From Human Disease Ontology
*Glycan dictionary term ID GSD000011
has_abundance yes

no

Are there numbers associated with the amount present in a sample?
has_expression yes

no

Don't use for now until talk with Karina
*Functional annotation/Keyword <term1>|<term2> Mike will provide a dictionary (~15 terms) that Mindy will use. See the list below.
*Experimental technique LC-MS|MS profile Free text; we maintain a dictionary to avoid variants such as LC/MS versus LC-MS:
  • MS
  • MS/MS
  • LC-MS/MS
  • LC-MS
  • CE-MS
  • CE-MS/MS
  • CE
  • HPLC
  • GC
  • GC-MS
*Glycan dictionary term ID GSD000011 From the Glycan structure dictionary.
Variant (Fly, yeast, mouse) Wild Type, Tollo395,   Gene name and position (if known) as text  
Organismal/cellular Phenotype   HP:0012373   HPO (human)

https://hpo.jax.org/app/


If no match is found, raise it in an audio meeting.

*Molecular Phenotype   Gene name

Discuss with Jeet when an example is found.

*Contributor curatedBy:Mindy Porterfield(mindy@something.de, CCRC)|curatedBy:Name(email,institution) Example

createdWith:GRITS Toolbox (http://www.grits-toolbox.org/)|createdWith:Software/Tool/Code Name (URL or source code URL)

Combined Example

curatedBy:Mindy Porterfield (mindyp@ccrc.uga.edu, CCRC)|curatedBy:Anh Nyugen (anh@gwu.edu, GW)|createdWith:GRITS Toolbox (http://www.grits-toolbox.org/)

The field can have multiple contribution types separated by |

There is no need to add a corresponding author or other author name list if curating the paper by yourself.

If you are curating the paper, use curatedBy. If you are using software, a tool or code to add information to the dataset, use createdWith. For the final dataset to be integrated, GW will use createdBy. If the initial dataset was shared with you by a researcher, use contributedBy or authoredBy, whichever applies (see the provenance property definitions under Contributor).

Example

curatedBy:Mindy Porterfield (mindy@something.de,CCRC)|curatedBy:Name (email,institution)

createdWith:GRITS Toolbox (http://www.grits-toolbox.org/)|createdWith:Software/Tool/Code Name (URL or source code URL)

contributedBy:Bob Haltiwanger (bh@ccrc.edu,CCRC)|contributedBy:Name (email,institution)

authoredBy:Daniel Williamson (daniel.williamson25@uga.edu,CCRC)|authoredBy:Name (email,institution)

createdBy:Jeet Vora (jeetvora@gwu.edu,GW)|createdBy:Name (email,institution)


An asterisk (*) marks a column that can have multiple entries per cell. These cells use the following format: <term1>|<term2>
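
A small sketch of how such multi-entry cells could be read and written follows. The helper names split_cell and join_terms are hypothetical, and trimming whitespace around each term is an assumption rather than a documented rule.

```python
SEPARATOR = "|"


def split_cell(cell):
    """Split a multi-entry cell into its terms, dropping empty pieces."""
    return [term.strip() for term in cell.split(SEPARATOR) if term.strip()]


def join_terms(terms):
    """Build a multi-entry cell from a list of terms."""
    cleaned = [term.strip() for term in terms if term.strip()]
    for term in cleaned:
        if SEPARATOR in term:
            # A term containing the separator would be split on re-reading.
            raise ValueError(f"term contains {SEPARATOR!r}: {term!r}")
    return SEPARATOR.join(cleaned)


techniques = split_cell("LC-MS|MS profile")
annotation_cell = join_terms(["adhesion", " homing "])
```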


For experimental techniques the following (non-comprehensive) dictionary will be used:

  • MS
  • MS/MS
  • LC-MS/MS
  • LC-MS
  • CE-MS
  • CE-MS/MS
  • CE
  • HPLC
  • GC
  • GC-MS

This list will be extended if new experimental techniques are detected in the papers.


For Functional annotation use the following (non-comprehensive) dictionary:

  • adhesion
  • homing
  • inflammation
  • protein targeting
  • protein secretion
  • protein stability
  • protein folding
  • ER stress
  • protein degradation
  • circulating half-life
  • clearance
  • internalization
  • metastasis
  • shielding
  • recognition
  • toxin receptor
  • viral receptor
  • microbial receptor
  • receptor signaling
  • sperm maturation
  • Terms added by Mike:
  • differentiation
  • biomarker

This list will be extended if new functional annotation terms are detected in the papers.

Contributor

provenance property Definition
createdBy The creator agent participated in some activity that generated the entity. This could be someone who created the dataset or entry.
contributedBy The contributor participated in some activity that generated the entity. This could be someone who provided the data and may not be the one who generated the data. The person could have made modifications to the data.
authoredBy The author participated in some activity that generated the entity. This could be someone who authored and generated the data.
curatedBy The curator participated in some activity that generated the entity. This is someone who curates data from the publication or raw data.
importedFrom The original resource of imported information. Import means that the content has been preserved, but transcribed somehow, for instance to fit a different representation model by converting formats or structure. The imported resource does not have to be complete but should be consistent with the knowledge conveyed by the original resource. This is the resource from where the data or dataset was downloaded and processed with modifying/transforming the original data or dataset.
retrievedFrom The resource from which the URI or dataset has been retrieved. Retrieval indicates that this resource has the same representation as the original resource. This is the resource from where the data or dataset was downloaded and processed without modifying/transforming the original data or dataset.
createdWith The software or tool from which the content was generated. The software is the author itself, e.g. an AI or machine learning tool, a software suite, etc.
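
The Contributor cell format can be parsed with a short routine. The following is a sketch under stated assumptions: parse_contributor and the regular expression are hypothetical, the property names are taken from the table above, and names containing parentheses are not handled.

```python
import re

# Provenance properties listed in the Contributor table above.
PROPERTIES = {
    "createdBy", "contributedBy", "authoredBy", "curatedBy",
    "importedFrom", "retrievedFrom", "createdWith",
}

# property:Name (detail, detail); whitespace around the parts is tolerated.
ENTRY = re.compile(
    r"^\s*(?P<prop>\w+)\s*:\s*(?P<name>[^()]+?)\s*\((?P<details>[^()]*)\)\s*$"
)


def parse_contributor(cell):
    """Split a Contributor cell into (property, name, details) tuples."""
    entries = []
    for raw in cell.split("|"):
        match = ENTRY.match(raw)
        if match is None:
            raise ValueError(f"unrecognized contributor entry: {raw!r}")
        prop = match.group("prop")
        if prop not in PROPERTIES:
            raise ValueError(f"unknown provenance property: {prop!r}")
        details = [part.strip() for part in match.group("details").split(",")]
        entries.append((prop, match.group("name"), details))
    return entries


combined = parse_contributor(
    "curatedBy:Mindy Porterfield (mindyp@ccrc.uga.edu, CCRC)"
    "|curatedBy:Anh Nyugen (anh@gwu.edu, GW)"
    "|createdWith:GRITS Toolbox (http://www.grits-toolbox.org/)"
)
```

For a person the details hold the email and institution; for createdWith they hold the single URL.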

Curation rules

Only draw cartoons if they are drawn in the paper; record compositions as compositions. A paper may use multiple representations (for example, a spectrum with cartoons and a table with compositions); in that case, make both.