Extended GAL File: Difference between revisions

From GlyGen Wiki
(Initial version of the page with page structure)
 
No edit summary
Line 1: Line 1:
The '''extended GAL file''' is a file format that upgrades the GAL (GenePix Array List) file with additional information required to exchange glycan array slide layout data and metadata. The extended GAL is compatible with the normal GAL file allowing to use to use in the scanner software as well as for the data exchange.
The '''extended GAL file''' is a file format that upgrades the [[GAL File|GAL file]] with additional information required to exchange glycan array slide layout data and metadata. The extended GAL is compatible with the normal GAL file allowing the use in the scanner software as well as for the data exchange.


== Motivation ==
== Motivation ==
https://support.moleculardevices.com/s/article/GenePix-File-Formats#gal
While for gene arrays the 5 columns in the [[GAL File|GAL file]] are sufficient to describe location and identify of the feature/spot it is not for glycan array. The molecule printed on the spot can not be simply described by an identifier or name alone since it consists of multiple parts. For example a linker and a glycan or a protein, up to multiple linkers and multiple glycans. It is also possible to have mixtures of glycoconjugates on a single spot. In addition to use the file for exchanging minimum reporting requirements it is also necessary to store depended metadata for each spot as well. To compensate the shortcomings of a [[GAL File|GAL file]] an extended version was developed which is compatible with the original file and its use in an array experiment.


== Compatibility to GAL file ==
== Format ==
The extended GAL file follows the format of the [[GAL File|GAL file]] but adds additional columns and rows to [[GAL File#Header|header]] and [[GAL File#Record section|record section]].


== Common GAL file parts ==
==Record section==
For the ID column of the file the feature ID from the array data repository are used. If multiple features have been printed on the same spot these IDs are separated by "|". The repsotiory feature ID describes a molecule completely including all its parts (glycans, linker, protein etc.).


=== Block ===
Beyond the common columns of the [[GAL File|GAL file]] several additional columns have been added to the extented GAL file to allow capturing of glycan information and metadata in the file. The columns are added after the ID column of the [[GAL File#Record section|GAL file]] in the described order.
The block number for the feature (required).


=== Row ===
===SequenceType ''(optional)''===
The column location within the block (required).
This column is used in combination with the next two columns to describe the sequence of the feature(s). In this column the format used in the next two columns is defined. Possible formats are ...
{{Expand section|small=no}}


=== Column ===
=== Sequence ''(optional)''===
The row location within the block (required).
This column contains the sequence of the feature in the format specified in the previous column. The sequence also contains the linker. If multiple features are printed on the same spot the sequences are separated by "|"


=== Name ===
=== SequenceNoLinker ''(optional)''===
(Optional)  Name to be displayed for the given feature (optional; limited to 40 characters in GenePix Pro 4.0 and earlier, no limit in 4.1).
This column contains the sequence of the glycan without the linker in the format specified in the SequenceType column. If multiple features are printed on the same spot the sequences are separated by "|".


=== ID ===
=== Linker ''(optional)''===
(Required)  Identifier for each feature (required; limited to 40 characters in GenePix Pro 4.0 and earlier, no limit in 4.1).
This column contains the linker name. If multiple features are printed on the same spot the sequences are separated by "|".


== Extension columns ==
=== Concentration ''(optional)''===
Beyond the common columns mentioned above several additional columns have been added to the extented GAL file to allow capturing of glycan information and metadata in the file.
This column contains the concentration of each feature on the spot. This is the concentration of the mixture on the spot and is provided in ...<Unit of measurement>. The value is not provided the concentration will be considered ''unknown''.
 
{{Expand section|small=no}}
=== SequenceType===
 
=== Sequence===
 
=== SequenceNoLinker ===
 
=== Linker ===
 
=== Concentration ===


=== Type ===
=== Type ===
{{Expand section|small=no}}


=== Mixture ===
=== Mixture ===
{{Expand section|small=no}}


=== Buffer ===
=== Buffer ===


=== Ration ===
=== Ratio ''(optional)'' ===
1:0.1
If more than one feature is printed on the spot this column is used to specify the ration between the features in percent (e.g., 90:10 or 40:40:20). The sum has to be 100. The order of the ration corresponds to the order of features in the ID column or the sequences in the sequence column. If more than one feature is printed and the ration column is empty the ratio is considered as ''unknown''. If only one feature is printed an empty column is equivalent to 100%
 
=== PrintingFlags ===
=== PrintingFlags ===
{{Expand section|small=no}}


=== Group ===
=== Group ===
{{Expand section|small=no}}


=== Carrier ===
=== Carrier ===
{{Expand section|small=no}}


=== Method ===
=== Method ===
{{Expand section|small=no}}


=== Reference ===
=== Reference ===
{{Expand section|small=no}}


=== Volumne ===
=== Volumne ===
{{Expand section|small=no}}


=== Dispenses ===
=== Dispenses ===
{{Expand section|small=no}}

Revision as of 03:57, 29 November 2021

The extended GAL file is a file format that extends the GAL file with the additional information required to exchange glycan array slide layout data and metadata. The extended GAL file remains compatible with the normal GAL file, so it can be used in the scanner software as well as for data exchange.

Motivation

While for gene arrays the 5 columns in the GAL file are sufficient to describe the location and identity of the feature/spot, they are not sufficient for glycan arrays. The molecule printed on a spot cannot be described by an identifier or name alone, since it consists of multiple parts: for example, a linker and a glycan or a protein, up to multiple linkers and multiple glycans. It is also possible to have mixtures of glycoconjugates on a single spot. In addition, to use the file for exchanging data according to minimum reporting requirements, the dependent metadata for each spot must be stored as well. To compensate for these shortcomings of the GAL file, an extended version was developed that is compatible with the original file and its use in an array experiment.

Format

The extended GAL file follows the format of the GAL file but adds columns and rows to the header and the record section.

Record section

The ID column of the file holds the feature ID from the array data repository. If multiple features have been printed on the same spot, their IDs are separated by "|". The repository feature ID describes a molecule completely, including all its parts (glycans, linker, protein, etc.).

Beyond the common columns of the GAL file, several columns have been added to the extended GAL file to capture glycan information and metadata. These columns are added after the ID column of the GAL file, in the order described below.
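Because the extension columns are appended after the common ones, a reader can look columns up by header name rather than by position. The following is a minimal sketch, assuming the record section is tab-delimited and starts with a header row. The header order and the sample values in `example` are hypothetical and only illustrate the idea; they are not taken from a real extended GAL file.

```python
import csv
import io


def read_records(record_section: str) -> list[dict[str, str]]:
    """Parse a tab-delimited record section whose first line is the header row.

    Assumption: fields are tab-separated. Columns are looked up by header
    name, so tools that only know the common GAL columns can ignore the rest.
    """
    reader = csv.DictReader(io.StringIO(record_section), delimiter="\t")
    return [dict(row) for row in reader]


# Hypothetical record section: common columns plus two extension columns.
example = (
    "Block\tRow\tColumn\tName\tID\tSequenceType\tSequence\n"
    "1\t1\t1\tSpot A\tF001\t\t\n"
    "1\t1\t2\tSpot B\tF002|F003\t\t\n"
)

records = read_records(example)
for record in records:
    # Optional extension columns that were left empty come back as "".
    print(record["ID"], repr(record["Sequence"]))
```

Reading by header name keeps the sketch independent of the exact column positions, which this page describes only as "after the ID column".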

SequenceType (optional)

This column is used together with the next two columns to describe the sequence of the feature(s). It defines the format used in those two columns. Possible formats are ...

Sequence (optional)

This column contains the sequence of the feature in the format specified in the previous column. The sequence includes the linker. If multiple features are printed on the same spot, the sequences are separated by "|".

SequenceNoLinker (optional)

This column contains the sequence of the glycan without the linker, in the format specified in the SequenceType column. If multiple features are printed on the same spot, the sequences are separated by "|".

Linker (optional)

This column contains the linker name. If multiple features are printed on the same spot, the linker names are separated by "|".
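The "|"-separated columns described above (ID, Sequence, SequenceNoLinker, Linker) can be split into one entry per feature. The sketch below is illustrative only. It assumes that a non-empty column lists its values in the same order as the ID column, that a count mismatch should be treated as an error, and that the chosen sequence format does not itself use the "|" character. None of these rules is stated by this page. The function name `split_features` and the sample values are hypothetical.

```python
def split_features(row: dict[str, str]) -> list[dict[str, str]]:
    """Split the '|'-separated columns of one spot into per-feature entries."""
    columns = ("ID", "Sequence", "SequenceNoLinker", "Linker")
    feature_count = len(row["ID"].split("|"))
    features: list[dict[str, str]] = [{} for _ in range(feature_count)]

    for column in columns:
        value = row.get(column, "")
        if not value:
            continue  # optional column left empty for this spot
        parts = value.split("|")
        if len(parts) != feature_count:
            # Assumption: every filled column has one value per feature ID.
            raise ValueError(
                f"{column}: expected {feature_count} values, got {len(parts)}"
            )
        for feature, part in zip(features, parts):
            feature[column] = part
    return features


# Hypothetical spot carrying two features.
spot = {"ID": "F002|F003", "Sequence": "", "Linker": "LinkerA|LinkerB"}
print(split_features(spot))
```

If the sequence format in use can contain "|" inside a sequence, a plain split is not sufficient and the splitting rule would need to be defined by the format specification.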

Concentration (optional)

This column contains the concentration of each feature on the spot. This is the concentration of the mixture on the spot and is provided in ...<Unit of measurement>. If the value is not provided, the concentration is considered unknown.

Type

Mixture

Buffer

Ratio (optional)

If more than one feature is printed on the spot, this column specifies the ratio between the features in percent (e.g., 90:10 or 40:40:20). The values must sum to 100. The order of the ratio values corresponds to the order of the features in the ID column or of the sequences in the Sequence column. If more than one feature is printed and the Ratio column is empty, the ratio is considered unknown. If only one feature is printed, an empty column is equivalent to 100%.
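The Ratio rules above can be checked mechanically. The following is a minimal sketch, assuming the values are separated by ":" as in the examples and that a small floating-point tolerance on the sum is acceptable. The function name `parse_ratio` and the use of `None` for "unknown" are illustrative choices, not part of the format.

```python
from typing import Optional


def parse_ratio(ratio: str, feature_count: int) -> Optional[list[float]]:
    """Return the percentages per feature, or None if the ratio is unknown."""
    ratio = ratio.strip()
    if not ratio:
        # Empty column: 100% for a single feature, unknown for a mixture.
        return [100.0] if feature_count == 1 else None

    parts = [float(part) for part in ratio.split(":")]
    if len(parts) != feature_count:
        raise ValueError(
            f"expected {feature_count} ratio values, got {len(parts)}"
        )
    if abs(sum(parts) - 100.0) > 1e-6:
        raise ValueError(f"ratio values sum to {sum(parts)}, not 100")
    return parts


print(parse_ratio("90:10", 2))
print(parse_ratio("40:40:20", 3))
print(parse_ratio("", 1))
print(parse_ratio("", 2))
```

Returning `None` keeps "unknown" distinct from any numeric value, so downstream code cannot mistake a missing ratio for an even split.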

PrintingFlags

Group

Carrier

Method

Reference

Volume

Dispenses