GAL File

The GenePix Array List format, or GAL file for short, describes the size and position of blocks on a microarray slide. For each spot within a block, the file stores the coordinates (row, column) and information about the feature on the spot (name and identifier).

Format

The GAL file follows the Axon Text File (ATF) definition, a special variation of a CSV file. The file consists of a header and a record section that contains one record per line. Records are organized in columns separated by a field separator. The separator can be a tab or a comma character; space characters around the separator are ignored. Text strings should be surrounded by quotation marks to avoid interpretation conflicts that may occur if tabs, commas or quotes are part of the text.
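The separator rules above can be sketched in Python. This is a minimal illustration, not a reference parser: the function name split_atf_line is hypothetical, and the rule "use tab if the line contains one, otherwise comma" is an assumption that would misfire on a comma-separated line with a tab inside a quoted string.

```python
import csv


def split_atf_line(line: str) -> list[str]:
    """Split one ATF/GAL line into its fields (hypothetical helper)."""
    # Assumption: a line that contains a tab is tab-separated,
    # any other line is comma-separated.
    delimiter = "\t" if "\t" in line else ","
    reader = csv.reader(
        [line.rstrip("\r\n")],
        delimiter=delimiter,
        quotechar='"',
        skipinitialspace=True,  # ignore spaces that follow a separator
    )
    # Strip the remaining spaces around each field.
    return [field.strip() for field in next(reader)]


if __name__ == "__main__":
    print(split_atf_line('1\t1\t1\t"MGAT4"\t"NM_001013872.1"'))
    print(split_atf_line('1, 1, 1, "a, b", "x"'))
```

Quoting is what keeps the comma inside "a, b" from being read as a field separator.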

Example

ATF	1.0
10	5
"Type=GenePix ArrayList V1.0"	
"BlockCount=8"
"URL=https://www.ncbi.nlm.nih.gov/nuccore/[ID]"
"Block1=  500,  500,  100,   24,  180,   21,  180"		
"Block2= 4996,  500,  100,   24,  180,   21,  180"		
"Block3= 9492,  500,  100,   24,  180,   21,  180"		
"Block4= 13988,  500,  100,   24,  180,   21,  180"		
"Block5=  500, 4996,  100,   24,  180,   21,  180"		
"Block6= 4996, 4996,  100,   24,  180,   21,  180"		
"Block7= 9492, 4996,  100,   24,  180,   21,  180"		
"Block8= 13988, 4996,  100,   24,  180,   21,  180"					
"Block"	"Column"	"Row"	"Name"	"ID"
1	1	1	"MGAT4"	"NM_001013872.1"

Header

The header of the GAL file consists of the first few lines of the file, which do not follow the record format. There are four required header lines and a list of optional lines. Except for the first two lines and the last line, all header lines are key-value pairs in the format key=value, surrounded by quotes.

File format (required)

The first line of the file is the same in all GAL files and consists of the file format (ATF) and the format version (1.0), separated by a tab.

Header and record lines (required)

The second line of the file consists of two tab-separated numbers. The first is the number of optional rows in the header. Adding the four mandatory lines to this number gives the total number of lines in the header. The second number is the number of data columns, which for a standard GAL file is usually 5.
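As a small sketch of how the first two lines could be read, the following Python function applies the counting rule as stated on this page (optional rows plus four mandatory lines). The function name read_atf_preamble is hypothetical and only serves the illustration.

```python
def read_atf_preamble(lines: list[str]) -> tuple[str, int, int, int]:
    """Read the first two lines of a GAL file (hypothetical helper).

    Returns (version, optional_rows, data_columns, total_header_lines).
    """
    file_format, version = lines[0].split()
    if file_format != "ATF":
        raise ValueError(f"not an ATF file: {lines[0]!r}")

    optional_rows, data_columns = (int(value) for value in lines[1].split())

    # Counting rule as described on this page:
    # optional rows + four mandatory lines = total header lines.
    total_header_lines = optional_rows + 4
    return version, optional_rows, data_columns, total_header_lines


if __name__ == "__main__":
    # First two lines of the example above.
    print(read_atf_preamble(["ATF\t1.0", "10\t5"]))
```

For the example file this gives 14 header lines, so under this rule the first record would be on line 15.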

Type (required)

Type of the GAL file. For all standard GAL files the type is "GenePix ArrayList V1.0".

BlockCount (optional)

Number of blocks on the slide described in this file. This number should match the number of lines describing the position and dimension of the blocks.

BlockType (optional)

Type of the block layout. Usually "BlockType=0" for rectangular blocks.

URL (optional)

A base URL that can be used together with the ID column in the record section to hyperlink the individual entries. The part of the URL that needs to be replaced with the record specific ID is marked using "[ID]".
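The substitution of the "[ID]" marker could look like the following Python sketch. The function name feature_url is hypothetical, and percent-encoding the identifier before inserting it is an assumption of this example, not something the GAL format prescribes.

```python
from urllib.parse import quote


def feature_url(base_url: str, feature_id: str) -> str:
    """Build the hyperlink for one record (hypothetical helper)."""
    # Assumption: the identifier is percent-encoded so that characters
    # such as spaces or slashes do not break the resulting URL.
    return base_url.replace("[ID]", quote(feature_id, safe=""))


if __name__ == "__main__":
    base = "https://www.ncbi.nlm.nih.gov/nuccore/[ID]"
    print(feature_url(base, "NM_001013872.1"))
```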

Supplier (optional)

Manufacturer that supplied the array slide or person/group that printed the slide.

ArrayerSoftwareName (optional)

Name of the arrayer/printer software used to print the slide.

ArrayerSoftwareVersion (optional)

Version number of the arrayer/printer software.

ArrayName (optional)

Name of the array/slide as provided or sold by the manufacturer.

ArrayRevision (optional)

Version number of the array/slide if several different versions with the same name have been generated.

SlideBarcode (optional)

Barcode of the slide if available.

Block (optional)

For each block on the slide, a separate line is required in this section. The number of block lines should match the "BlockCount" above. Each line uses the format "Blockn= xOrigin, yOrigin, FeatureDiameter, xFeatures, xSpacing, yFeatures, ySpacing", where n is the number of the block, usually starting at 1 and increasing in increments of 1. This number has to correspond with the Block column in the record section below. xOrigin and yOrigin are the x and y coordinates of the center of the first feature/spot of the block (top left corner), measured in µm from the top left corner of the slide. FeatureDiameter is the intended diameter of each feature/spot in the block in µm. xFeatures and yFeatures are the numbers of features/spots in the x and y direction, that is, the number of columns and rows of the block; they have to correspond with the Column and Row columns in the record section. xSpacing and ySpacing are the spacings between the columns and rows of the block in µm. All numbers following the "=" are separated by ",".
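To illustrate how a block line maps to spot positions, here is a minimal Python sketch. The names Block, parse_block_line and spot_center are hypothetical. The sketch assumes integer values (as in the example file) and assumes that xSpacing and ySpacing are center-to-center distances, which this page does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Block:
    number: int
    x_origin: int          # µm, center of the first (top left) spot
    y_origin: int          # µm
    feature_diameter: int  # µm
    x_features: int        # number of columns
    x_spacing: int         # µm
    y_features: int        # number of rows
    y_spacing: int         # µm


def parse_block_line(line: str) -> Block:
    """Parse one 'Blockn= ...' header line (hypothetical helper)."""
    text = line.strip().strip('"')
    key, _, values = text.partition("=")
    # Rejects keys such as BlockCount or BlockType.
    if not key.startswith("Block") or not key[5:].isdigit():
        raise ValueError(f"not a block line: {line!r}")
    numbers = [int(value) for value in values.split(",")]
    if len(numbers) != 7:
        raise ValueError(f"expected 7 values, got {len(numbers)}")
    return Block(int(key[5:]), *numbers)


def spot_center(block: Block, column: int, row: int) -> tuple[int, int]:
    """Center of a spot in µm from the top left corner of the slide.

    Assumption: spacing is measured center to center.
    """
    x = block.x_origin + (column - 1) * block.x_spacing
    y = block.y_origin + (row - 1) * block.y_spacing
    return x, y


if __name__ == "__main__":
    block = parse_block_line('"Block1=  500,  500,  100,   24,  180,   21,  180"')
    print(block)
    print(spot_center(block, column=24, row=21))
```

Under these assumptions the last spot of Block1 in the example (column 24, row 21) is centered at 500 + 23 × 180 = 4640 µm in x and 500 + 20 × 180 = 4100 µm in y.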

User Defined (optional)

Users can add any number of additional rows using the format "User Defined=...". Please note that this is the only key in the header that can be repeated.
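Because "User Defined" is the only key that may repeat, a reader of the key-value header lines has to collect it separately. The following Python sketch shows one way to do that; the function name parse_header_pairs is hypothetical, and treating any other repeated key as an error is a design choice of this example.

```python
def parse_header_pairs(lines: list[str]) -> tuple[dict[str, str], list[str]]:
    """Split key=value header lines into unique keys and user-defined values.

    Hypothetical helper: 'lines' are the quoted key=value header lines only,
    without the first two lines and without the column heading line.
    """
    pairs: dict[str, str] = {}
    user_defined: list[str] = []
    for line in lines:
        text = line.strip().strip('"')
        # Split at the first "=" only, so values such as URLs may contain "=".
        key, separator, value = text.partition("=")
        if not separator:
            raise ValueError(f"not a key=value line: {line!r}")
        key = key.strip()
        if key == "User Defined":
            user_defined.append(value)  # the only key that may repeat
        elif key in pairs:
            raise ValueError(f"key repeated in header: {key}")
        else:
            pairs[key] = value
    return pairs, user_defined


if __name__ == "__main__":
    header = ['"BlockCount=8"', '"User Defined=batch 7"', '"User Defined=operator A"']
    print(parse_header_pairs(header))
```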

Record section heading (required)

This line is the last line of the header and contains the column names of the record section. The usual text for this line is "Block" "Column" "Row" "Name" "ID". The quotes are not required but recommended.

Record section

This section follows right after the header section and contains all features of the slide line by line. In a standard GAL file there are five columns per record with four of the columns being required and one being optional.

Block (required)

The number of the block for the feature. This number needs to correspond with a line in the block section that provides the coordinate information for each feature/spot.

Column (required)

The column number of the feature/spot in the block. Columns are enumerated starting with 1. This number cannot be greater than the xFeatures value in the corresponding line in the block section of the header.

Row (required)

The row number of the feature/spot in the block. Rows are enumerated starting with 1. This number cannot be greater than the yFeatures value in the corresponding line in the block section of the header.
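The constraints on the Block, Column and Row values can be checked with a few comparisons. The Python sketch below is only an illustration: the function name check_record_position is hypothetical, and the blocks argument is assumed to be a dictionary that maps each block number to its (xFeatures, yFeatures) pair taken from the header.

```python
def check_record_position(
    block: int,
    column: int,
    row: int,
    blocks: dict[int, tuple[int, int]],
) -> list[str]:
    """Return a list of problems for one record (empty if it is consistent).

    Hypothetical helper; 'blocks' maps block number -> (xFeatures, yFeatures).
    """
    if block not in blocks:
        return [f"block {block} has no line in the header"]

    x_features, y_features = blocks[block]
    problems = []
    # Columns and rows are enumerated starting with 1.
    if not 1 <= column <= x_features:
        problems.append(f"column {column} outside 1..{x_features}")
    if not 1 <= row <= y_features:
        problems.append(f"row {row} outside 1..{y_features}")
    return problems


if __name__ == "__main__":
    layout = {1: (24, 21)}  # Block1 of the example: 24 columns, 21 rows
    print(check_record_position(1, 1, 1, layout))
    print(check_record_position(1, 25, 22, layout))
```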

Name (optional)

Name of the feature to be displayed. This is the only optional value, and this column can be empty. In older versions of the GenePix Pro software (4.0 and before) this name is limited to 40 characters. From version 4.1 onward there is no limit.

ID (required)

Identifier of the feature. As with the name, this field is limited to 40 characters in GenePix Pro 4.0 and earlier and has no limit from version 4.1 onward. The identifier can also be used in combination with the URL field from the header to create a hyperlink for the feature. If the spot is empty, the keyword "empty" is used as ID.

These are the five predefined columns in the GAL file. Users can add more columns with additional information, which are not used for the array layout alignment but are all added to the GPR file after spot alignment. This is the basis for the extended GAL file, which adds many additional columns to store glycan array related information and metadata.
