Extended GAL File

From GlyGen Wiki

The extended GAL file is a file format that extends the GAL file with the additional information required to exchange glycan array slide layout data and metadata. It remains compatible with the standard GAL file, so the same file can be used in the scanner software and for data exchange.

Motivation

For gene arrays, the five columns of the GAL file are sufficient to describe the location and identity of each feature (spot). For glycan arrays they are not. The molecule printed on a spot cannot be described by an identifier or name alone, because it consists of multiple parts: for example, a linker and a glycan or a protein, or even multiple linkers and multiple glycans. A single spot can also hold a mixture of glycoconjugates. In addition, to use the file for exchanging data that meets minimum reporting requirements, the dependent metadata for each spot must be stored as well. To overcome these shortcomings of the GAL file, an extended version was developed that remains compatible with the original file and its use in an array experiment.

Format

The extended GAL file follows the format of the GAL file but adds rows to the header and columns to the record section.

Header

In addition to the traditional header rows defined in the GAL file, the extended GAL file can describe default parameters: default concentration, buffer, volume, dispenses, carrier, method and reference. To define them, add the following lines after the Block definitions in the header.

DefaultConcentration= 100 uM
DefaultBuffer= 100 mM Phosphate Buffer pH 8.5
DefaultVolume= 330 pL
DefaultDispenses= 1
DefaultCarrier=
DefaultMethod=
DefaultReference= 
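As a minimal sketch of how a reader might collect these default rows, the Python below splits each header row at the first "=" and keeps the rows whose key starts with "Default". The function name parse_defaults, the choice to map empty values to None, and the tolerance for surrounding double quotes are assumptions made for illustration, not part of the format definition.

```python
def parse_defaults(header_lines):
    """Collect 'Default...=' header rows into a dict; empty values become None."""
    defaults = {}
    for line in header_lines:
        key, sep, value = line.partition("=")
        key = key.strip().strip('"')
        if sep and key.startswith("Default"):
            # Assumption: rows may be wrapped in double quotes; strip them.
            value = value.strip().strip('"').strip()
            defaults[key] = value or None
    return defaults


header = [
    "DefaultConcentration= 100 uM",
    "DefaultBuffer= 100 mM Phosphate Buffer pH 8.5",
    "DefaultVolume= 330 pL",
    "DefaultDispenses= 1",
    "DefaultCarrier=",
]
defaults = parse_defaults(header)
```

Rows left empty, such as DefaultCarrier above, end up as None, so later code can tell "no default defined" apart from a real value.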

Note that because rows are added to the header and columns (described below) are added to the record section, the ATF definition in the second line of the file must be adjusted accordingly, as explained in the header description of the GAL file.
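The adjustment can be sketched as follows, assuming the usual ATF layout in which the second line holds the number of header rows followed by the number of data columns, separated by a tab. The helper atf_counts_line, the Block1 values, and the order of the five standard columns are illustrative assumptions; only the counts matter here.

```python
def atf_counts_line(header_rows, column_names):
    """Build the second line of the file: header-row count, then column count."""
    return f"{len(header_rows)}\t{len(column_names)}"


# Hypothetical header: four standard GAL rows plus the seven default rows.
header_rows = [
    "Type=GenePix ArrayList V1.0",
    "BlockCount=1",
    "BlockType=0",
    "Block1= 400, 400, 100, 10, 200, 10, 200",
    "DefaultConcentration= 100 uM",
    "DefaultBuffer= 100 mM Phosphate Buffer pH 8.5",
    "DefaultVolume= 330 pL",
    "DefaultDispenses= 1",
    "DefaultCarrier=",
    "DefaultMethod=",
    "DefaultReference=",
]

# Five standard columns (order illustrative) plus extended columns.
column_names = [
    "Block", "Column", "Row", "Name", "ID",
    "RepoID", "Group", "Concentration", "Ratio", "Buffer", "Volume",
    "Dispenses", "Carrier", "Method", "Reference", "Comment", "PrintingFlags",
]

second_line = atf_counts_line(header_rows, column_names)
```

Recomputing the line from the actual rows and columns, rather than editing it by hand, keeps the counts consistent whenever a default row or an optional column is added or dropped.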

Record section

ID (required)

The ID column holds the feature ID from the array data repository. A repository feature ID describes a molecule completely, including all of its parts (glycans, linker, protein, etc.). The feature with this ID should already exist in the array repository. If multiple features have been printed on the same spot, their IDs are separated by "||". Note that the ID or RepoID should also be present in your final results file.
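A minimal sketch of reading such a cell, assuming whitespace around the separator should be ignored. The function name split_features and the IDs in the usage lines are hypothetical.

```python
def split_features(cell):
    """Split an ID cell on '||' into a list of feature IDs, dropping blanks."""
    return [part.strip() for part in cell.split("||") if part.strip()]


single = split_features("GAF0001")             # hypothetical feature ID
mixture = split_features("GAF0001 || GAF0002")  # two features on one spot
```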

Additional Columns

Beyond the common columns of the GAL file, several columns have been added to the extended GAL file to capture glycan information and metadata. They are added after the ID column of the GAL file, in the order described here.

RepoID (optional)

If the ID column does not match the feature ID in the array repository, use this column to enter the repository feature ID as described above. If this column is filled, it takes precedence over the ID column when matching the corresponding feature to the spot. Note that the ID or RepoID should also be present in your final results file.
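The precedence rule can be sketched as below, assuming each record has been read into a dictionary keyed by column name. The function name matching_id and the sample values are hypothetical.

```python
def matching_id(row):
    """Return the cell used for matching: RepoID if filled, otherwise ID."""
    repo_id = (row.get("RepoID") or "").strip()
    if repo_id:
        return repo_id
    return (row.get("ID") or "").strip()


with_repo = matching_id({"ID": "local-17", "RepoID": "GAF0001"})
without_repo = matching_id({"ID": "GAF0002", "RepoID": ""})
```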

Group (optional)

Identification number for a group of features in this array. The average intensity is calculated from the intensity values of all features that share the same group number. For example, the group number can distinguish features with the same ID whose printing solutions were prepared on different dates, or which were printed on different dates (such as reprints).
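The grouping rule might be sketched like this, assuming each spot is a dictionary with a "Group" value and a measured "intensity" value. The "intensity" key and the function name are hypothetical, since intensities come from the results file rather than from the extended GAL file itself.

```python
from collections import defaultdict
from statistics import mean


def mean_intensity_by_group(spots):
    """Average the intensity of all spots that share the same group number."""
    by_group = defaultdict(list)
    for spot in spots:
        by_group[spot["Group"]].append(spot["intensity"])
    return {group: mean(values) for group, values in by_group.items()}


spots = [
    {"ID": "GAF0001", "Group": 1, "intensity": 100.0},
    {"ID": "GAF0001", "Group": 1, "intensity": 200.0},
    {"ID": "GAF0001", "Group": 2, "intensity": 50.0},  # same ID, reprinted later
]
averages = mean_intensity_by_group(spots)
```

Here the third spot has the same ID as the first two but a different group number, so it is averaged separately.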

Concentration

The concentration of each feature on the spot. If this column is empty and a default concentration is defined in the header section, the default is used. Use #N/A to represent "not available". If there are multiple features on the spot, their concentrations are separated by "||", as in the ID column.

Available concentration units are:

  • fmol/spot
  • ul/spot
  • mM
  • uM
  • ug/ml
  • mg/ml
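The rules for this column can be sketched as a small parser. It assumes that value and unit are separated by whitespace, as in the header example "100 uM". The function name parse_concentrations and the choice to represent #N/A as None are illustrative assumptions.

```python
ALLOWED_UNITS = {"fmol/spot", "ul/spot", "mM", "uM", "ug/ml", "mg/ml"}


def parse_concentrations(cell, default=None):
    """Return one (value, unit) tuple per feature; None stands for #N/A."""
    cell = cell.strip() or (default or "").strip()  # empty cell: use default
    results = []
    for part in cell.split("||"):
        part = part.strip()
        if not part:
            continue
        if part == "#N/A":
            results.append(None)
            continue
        pieces = part.split(None, 1)
        if len(pieces) != 2 or pieces[1] not in ALLOWED_UNITS:
            raise ValueError(f"unrecognised concentration: {part!r}")
        results.append((float(pieces[0]), pieces[1]))
    return results


from_default = parse_concentrations("", default="100 uM")
mixture = parse_concentrations("50 uM||#N/A")
```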

Ratio (optional)

If more than one feature is printed on the spot, this column specifies the ratio between the features (e.g., 1:0.1 or 2:2:1). The order of the ratio corresponds to the order of the features in the ID column (or RepoID column). If more than one feature is printed and the ratio column is empty, the ratio is considered unknown. If only one feature is printed, an empty column is equivalent to 100%.
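A minimal sketch of these rules, assuming the number of features on the spot is already known from the ID (or RepoID) column. The function name parse_ratio, the use of None for an unknown ratio, and the use of [1.0] for a lone feature are illustrative choices.

```python
def parse_ratio(cell, n_features):
    """Parse 'a:b:c' into floats; None means the ratio is unknown."""
    cell = cell.strip()
    if not cell:
        # A single feature with an empty column is equivalent to 100%.
        return [1.0] if n_features == 1 else None
    parts = [float(p) for p in cell.split(":")]
    if len(parts) != n_features:
        raise ValueError("ratio length does not match the number of features")
    return parts


three_way = parse_ratio("2:2:1", 3)
unknown = parse_ratio("", 2)
lone = parse_ratio("", 1)
```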

Buffer

The buffer composition of the solution used for printing. If this column is empty and a default buffer composition is defined in the header section, the default is used. Use #N/A to indicate that the buffer composition information is “not available”.

Volume (optional)

The volume of solution deposited per spot per drop (if known). If this column is empty and a default volume is defined in the header section, the default is used. If no default is defined, an empty volume column is considered "unknown". #N/A can be used to represent a “not available” volume. Available volume units are uL, nL and pL.
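The three possible outcomes for this column (a known value, "unknown", or "not available") can be sketched as follows. Converting to picolitres, the function name resolve_volume, and the (status, value) return shape are assumptions made for illustration.

```python
TO_PICOLITRES = {"uL": 1_000_000.0, "nL": 1_000.0, "pL": 1.0}


def resolve_volume(cell, default=None):
    """Return (status, volume in pL); volume is None unless status is 'known'."""
    cell = cell.strip() or (default or "").strip()  # empty cell: use default
    if not cell:
        return ("unknown", None)
    if cell == "#N/A":
        return ("not available", None)
    value, unit = cell.split()
    return ("known", float(value) * TO_PICOLITRES[unit])


from_default = resolve_volume("", default="330 pL")
explicit = resolve_volume("0.5 nL")
no_default = resolve_volume("")
```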

Dispenses

The number of drops, or the number of pin contacts, made per spot (drops / contacts / strikes), for example "3 drops" or "1 contacts". If this column is empty and a default dispense value is defined in the header section, the default is used. Use #N/A to indicate that the dispense information is "not available".
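A value in this column might be parsed as sketched below. The pattern assumes a whole number optionally followed by one of the three words mentioned above; the bare "1" from the header example is accepted without a word. The function name and the use of None for #N/A or a missing value are illustrative.

```python
import re

_DISPENSE = re.compile(r"(\d+)(?:\s+(drops?|contacts?|strikes?))?", re.IGNORECASE)


def parse_dispenses(cell, default=None):
    """Return (count, kind) or None when the value is missing or #N/A."""
    cell = cell.strip() or (default or "").strip()  # empty cell: use default
    if not cell or cell == "#N/A":
        return None
    match = _DISPENSE.fullmatch(cell)
    if match is None:
        raise ValueError(f"unrecognised dispense value: {cell!r}")
    return int(match.group(1)), match.group(2)


drops = parse_dispenses("3 drops")
from_default = parse_dispenses("", default="1")
```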

Carrier (optional)

Name of the carrier reagent used in the formulation. If this column is empty and a default carrier is defined in the header section, the default is used. If no default is defined, an empty carrier column is considered "unknown". #N/A can be used to indicate that the carrier information is “not available”.

Method (optional)

Description of the method used to prepare the formulation. If this column is empty and a default method is defined in the header section, the default is used. If no default is defined, an empty method column is considered "unknown". #N/A can be used to indicate that the method information is “not available”.

Reference (optional)

A reference that describes the formulation. The reference type (PMID, DOI or URL) must be included to indicate the reference format, for example “PMID: 34299556” or “DOI: 10.3390/molecules26144281”. If this column is empty and a default reference is defined in the header section, the default is used. If no default is defined, an empty reference column is considered "unknown". #N/A can be used to indicate that the reference information is “not available”.
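The "type: value" form of a reference can be checked with a short pattern, sketched here under the assumption that the type is written in upper case and followed by a colon, as in the two examples above. The function name parse_reference and the example URL are hypothetical.

```python
import re

_REFERENCE = re.compile(r"(PMID|DOI|URL)\s*:\s*(\S+)")


def parse_reference(cell):
    """Split a reference cell into (type, value); raise if the type is missing."""
    match = _REFERENCE.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"reference lacks a PMID/DOI/URL prefix: {cell!r}")
    return match.group(1), match.group(2)


pmid = parse_reference("PMID: 34299556")
doi = parse_reference("DOI: 10.3390/molecules26144281")
url = parse_reference("URL: https://example.org/protocol")  # hypothetical URL
```

Empty cells and #N/A would be handled before calling this function, in the same way as for the other optional columns.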

Comment (optional)

General comments regarding the spot.

PrintingFlags (optional)