Extended GAL File
The extended GAL file is a file format that extends the GAL file with the additional information required to exchange glycan array slide layout data and metadata. It remains compatible with the standard GAL file, so the same file can be used in scanner software as well as for data exchange.
Motivation
For gene arrays, the 5 columns of the GAL file are sufficient to describe the location and identity of each feature/spot. For glycan arrays they are not. The molecule printed on a spot cannot be described by an identifier or name alone because it consists of multiple parts: for example, a linker and a glycan or a protein, or even multiple linkers and multiple glycans. A single spot may also contain a mixture of glycoconjugates. In addition, when the file is used to exchange data that meets minimum reporting requirements, the dependent metadata for each spot must be stored as well. To compensate for these shortcomings of the GAL file, an extended version was developed that is compatible with the original file and its use in an array experiment.
Format
The extended GAL file follows the format of the GAL file but adds columns and rows to the header and record sections.
Record section
The ID column of the file holds the feature ID from the array data repository. If multiple features have been printed on the same spot, their IDs are separated by "||". A repository feature ID describes a molecule completely, including all its parts (glycans, linker, protein, etc.).
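The "||" separator convention can be illustrated with a minimal Python sketch. The function name `split_feature_ids` and the example IDs are hypothetical and used only for illustration; the whitespace trimming is an assumption, not part of the format description.

```python
def split_feature_ids(id_field: str) -> list[str]:
    """Split the ID column of one record into individual feature IDs.

    Assumption: whitespace around each ID is not significant and is trimmed.
    """
    return [part.strip() for part in id_field.split("||") if part.strip()]


# Hypothetical feature IDs, for illustration only.
single = split_feature_ids("FEATURE_A")
mixture = split_feature_ids("FEATURE_A||FEATURE_B")
print(single, mixture)
```

A spot with a single feature yields a one-element list, so downstream code can treat single features and mixtures uniformly.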
Beyond the common columns of the GAL file, several columns have been added to the extended GAL file to capture glycan information and metadata. These columns are added after the ID column of the GAL file, in the order described below.
Group (optional)
Identification number of features in this array. The average intensity value is calculated from the intensity values of all features that have the same group number. For example, this can be used to identify features whose printing solutions were prepared on different dates or that were printed on different dates (such as reprinting).
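The group-based averaging can be sketched as follows. This is only an illustration of the idea: the function name `average_by_group` and the representation of records as (group, intensity) pairs are assumptions, and the sketch uses a plain arithmetic mean.

```python
from collections import defaultdict
from statistics import mean


def average_by_group(records):
    """Average intensity values over features sharing a group number.

    `records` is an iterable of (group, intensity) pairs; this record
    layout is an assumption made for the example.
    """
    grouped = defaultdict(list)
    for group, intensity in records:
        grouped[group].append(intensity)
    return {group: mean(values) for group, values in grouped.items()}


# Hypothetical intensities: group "1" was printed twice, group "2" once.
averages = average_by_group([("1", 100.0), ("1", 200.0), ("2", 50.0)])
print(averages)
```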
Concentration
This is the concentration of each feature on the spot, given in one of the following units:
- fmol/spot
- ul/spot
- mM
- uM
- ug/ml
- mg/ml
#N/A can be used to represent an unknown or unavailable value.
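A reader of the file might check the Concentration column against the unit list above. The sketch below assumes the value is written as a number followed by a space and the unit (for example "100 uM"); this description does not specify that encoding, so treat it, and the name `parse_concentration`, as assumptions.

```python
ALLOWED_UNITS = {"fmol/spot", "ul/spot", "mM", "uM", "ug/ml", "mg/ml"}


def parse_concentration(field: str):
    """Return (value, unit), or None when the field is #N/A.

    Assumption: the field looks like "<number> <unit>", e.g. "100 uM".
    """
    text = field.strip()
    if text == "#N/A":
        return None
    value, _, unit = text.partition(" ")
    unit = unit.strip()
    if unit not in ALLOWED_UNITS:
        raise ValueError(f"unsupported concentration unit: {unit!r}")
    return float(value), unit


print(parse_concentration("100 uM"))
print(parse_concentration("#N/A"))
```

Returning `None` for #N/A keeps "unknown" distinct from a numeric value of zero.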
Ratio (optional)
If more than one feature is printed on the spot, this column specifies the ratio between the features (e.g., 1:0.1 or 2:2:1). The order of the ratio values corresponds to the order of the features in the ID column. If more than one feature is printed and the Ratio column is empty, the ratio is considered unknown. If only one feature is printed, an empty column is equivalent to 100%.
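The rules for the Ratio column can be sketched in Python. The function name `parse_ratio` is hypothetical, and normalizing the ratio to fractions that sum to 1 is a choice made for this example rather than something the format requires.

```python
def parse_ratio(ratio_field: str, feature_count: int):
    """Interpret the Ratio column for a spot with `feature_count` features.

    Returns a list of fractions in ID-column order, or None when the
    ratio is unknown (several features but an empty Ratio column).
    """
    text = ratio_field.strip()
    if not text:
        # A single feature with an empty column is equivalent to 100%.
        return [1.0] if feature_count == 1 else None
    parts = [float(p) for p in text.split(":")]
    if len(parts) != feature_count:
        raise ValueError("ratio entries must match the number of feature IDs")
    total = sum(parts)
    return [p / total for p in parts]


print(parse_ratio("2:2:1", 3))
print(parse_ratio("", 1))
print(parse_ratio("", 2))
```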
Buffer
This is the buffer composition of the solution used for printing. If this column is empty, the buffer composition is considered unknown. Using #N/A is recommended to indicate that the buffer composition information is not available.
Volume
Dispenses
Carrier
Method
Reference
Comment (optional)