GlyGen Monthly Talks

From GlyGen Wiki
Revision as of 21:49, 15 October 2020 by Xiying

On the first Tuesday of every month at 11 AM EST, the GlyGen team meets to present a monthly progress report. Invited speakers give talks on possible collaborations with GlyGen. New technological breakthroughs, new discoveries, and possible improvements and proposals that may promote the development of GlyGen are discussed and shared.

October 10/06/2020

Attendees: Jeet, Nathan, Nhat Duong, Rahi, Raja, Akul, Frederique, Lihua, Preethi, Mike, Rene, Sujeet, Rupali, Ten, Vijay, Will, Tatiana

Agenda:

  • Monthly Progress Report
  • Talk by - Nhat Duong and Nathan Edwards - Title: “Glycan Structure Extraction from Scientific Literature”

https://edwardslab.bmcb.georgetown.edu/glyimgdemo/

Abstract:

The extraction and representation of accepted glycobiology knowledge is challenging due to the widespread use of images to represent glycans in published literature. In the absence of an explicit computer-readable glycan sequence or accession number, human curation is required to extract published glycosylation knowledge for our glycomics data-resources. Glycan structure images in published literature are highly stylized but poorly standardized, despite efforts by the Symbol Nomenclature for Glycans (SNFG) group to make them more consistent. Automated extraction of glycan structures from figures in published manuscripts will ease the curation effort.

We use a combination of open-source Python modules for parsing and manipulating PDF files; neural network-based object classification to locate glycans in manuscripts’ figures; and OpenCV-based image analysis to extract glycan structure details. Figures from GlyGen and UniCarbKB manuscript annotations were curated to construct a training set of figures with glycan bounding-box locations. This object classification approach successfully identifies glycan structures in manuscript figures for subsequent detailed glycan image analysis and highlights the glycan structures in the manuscript.

Glycan structure details are extracted using the open-source OpenCV library. Using color masks, monosaccharides' distinctive colors and shapes are recognized, establishing the monosaccharide composition of the glycan. Monosaccharide linkage is then determined, where possible. The glycan structure’s orientation and the reducing-end monosaccharide are then established by identifying common glycan cores. Together, this information is sufficient to construct a GlycoCT format sequence for the structure’s topology and to then search for a matching glycan in GlyTouCan. Once matched, the GlyTouCan accession is used to embed a targeted, clickable link to the GNOme Structure Browser on top of the glycan structure in the original PDF file.
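The composition and orientation steps described above can be illustrated with a small, self-contained sketch. It is not the speakers' implementation: it assumes the image-analysis stage has already reduced each detected symbol to a mean RGB color and a shape label, and it uses plain Python rather than OpenCV. The palette values are approximate SNFG colors, and the names `classify`, `composition`, and `reducing_end` are hypothetical helpers introduced only for this example. The reducing end is located by matching just one common core, the N-glycan core (GlcNAc-GlcNAc-Man with two Man branches).

```python
from collections import Counter

# Approximate SNFG palette as RGB triples (assumed values, for illustration only).
SNFG_COLORS = {
    "blue": (0, 144, 188),
    "green": (0, 166, 81),
    "yellow": (255, 212, 0),
    "red": (237, 28, 36),
    "purple": (165, 67, 153),
}

# (color, shape) -> monosaccharide, following SNFG symbol conventions.
SNFG_SYMBOLS = {
    ("blue", "circle"): "Glc",
    ("green", "circle"): "Man",
    ("yellow", "circle"): "Gal",
    ("blue", "square"): "GlcNAc",
    ("yellow", "square"): "GalNAc",
    ("red", "triangle"): "Fuc",
    ("purple", "diamond"): "Neu5Ac",
}


def nearest_color(rgb):
    """Return the palette color closest to rgb (squared Euclidean distance)."""
    def dist2(name):
        return sum((a - b) ** 2 for a, b in zip(rgb, SNFG_COLORS[name]))
    return min(SNFG_COLORS, key=dist2)


def classify(symbol):
    """Map a detected (rgb, shape) symbol to a monosaccharide name."""
    rgb, shape = symbol
    return SNFG_SYMBOLS.get((nearest_color(rgb), shape), "unknown")


def composition(symbols):
    """Count monosaccharides among the detected symbols."""
    return Counter(classify(s) for s in symbols)


def reducing_end(names, links):
    """Return the index of the reducing-end GlcNAc of an N-glycan core, or None.

    names: monosaccharide name per node; links: undirected (i, j) index pairs.
    """
    neighbors = {i: set() for i in range(len(names))}
    for i, j in links:
        neighbors[i].add(j)
        neighbors[j].add(i)
    for root in neighbors:
        if names[root] != "GlcNAc":
            continue
        for second in neighbors[root]:
            if names[second] != "GlcNAc":
                continue
            for man in neighbors[second] - {root}:
                if names[man] != "Man":
                    continue
                branches = [k for k in neighbors[man] - {second} if names[k] == "Man"]
                if len(branches) == 2:
                    return root
    return None


# Usage: a Man3GlcNAc2 core with slightly noisy colors, as an image might yield.
symbols = [
    ((5, 140, 190), "square"),
    ((0, 150, 185), "square"),
    ((10, 160, 90), "circle"),
    ((0, 170, 80), "circle"),
    ((3, 165, 85), "circle"),
]
links = [(0, 1), (1, 2), (2, 3), (2, 4)]
names = [classify(s) for s in symbols]
print(composition(symbols))
print(reducing_end(names, links))
```

Nearest-color matching stands in for the color masks mentioned in the abstract. A real pipeline would also need to handle white or uncolored symbols, other core types, and figures that do not follow SNFG conventions.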

This infrastructure provides a surprisingly effective tool for extracting glycan structures from the figures of published glycobiology manuscripts. The method successfully identifies glycans' positions on all pages, extracts their topological structure in GlycoCT format, and annotates them in-place with deep-links to GNOme so the curator can verify or refine the specific details of the structure. This prototype demonstrates the potential utility of automated extraction of glycan structures from published manuscript figures, significantly lowering the curation burden for the representation of glycosylation knowledge in glycomics resources.

Minutes:

September 09/01/2020

Attendees:

Agenda:

  • Monthly Progress Report
  • Talk by - Gareth Owen - Title: "Integration of GlyGen/GlyTouCan glycans in ChEBI"

Abstract - The importance of glycans and glycoproteins in molecular biology has grown rapidly in recent years. In this talk, the development of procedures to enhance the flexibility of glycan registration in ChEBI through a collaboration between ChEBI and GlyGen is described.

Minutes: