Processed data file

From GlyGen Wiki
Revision as of 18:54, 3 March 2022 by Rene (talk | contribs)

The processed data file is used to upload the final data values of a glycan array experiment to the Glycan array data repository as part of the processed data upload. The file is an Excel workbook that may consist of multiple sheets. However, one sheet has to be labeled "Repo Data" and has to follow the format instructions below. Only this sheet is used to extract data and load it into the repository. The sheet is designed to be human-readable as well and can contain additional information, which is ignored by the repository.

File Format

The processed data file is a standard Excel spreadsheet in *.xlsx format. The file may contain more than one sheet, but one of the sheets needs to be labeled "Repo Data". This sheet has to contain the following columns, with the corresponding label in row one. All columns have to be present in the spreadsheet, but columns marked as optional may be left empty.

Example of the table format with the necessary columns.
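The column requirement can be checked before uploading. The following is a minimal sketch, not the repository's own validation: it assumes that the labels in row one match the column names on this page without the "(required)" or "(optional)" note, and that the header row has already been read into a Python list (for example with a spreadsheet library). The function name is hypothetical.

```python
# Labels expected in row one of the "Repo Data" sheet.
# Assumption: they match the column names on this page exactly.
EXPECTED_COLUMNS = [
    "ID", "Name", "RFU", "SD", "CV", "Rank",
    "Name2", "RepoID", "GroupID", "Concentration",
]


def missing_columns(header_row):
    """Return the expected labels that are absent from the header row."""
    # Empty header cells are skipped; surrounding whitespace is ignored.
    present = {str(cell).strip() for cell in header_row if cell is not None}
    return [label for label in EXPECTED_COLUMNS if label not in present]
```

An empty result means every expected column is present; additional columns in the header row do not affect the result.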

ID (required)

A user-chosen ID that has to be unique for each row. This ID will not be imported into the repository but will be used by the repository to report errors in individual rows. The ID may be an auto-incremented number or any alphanumeric text.
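Uniqueness of the ID column can be verified with a few lines of code. This sketch assumes the ID cells have been collected into a list; the function name is hypothetical, and comparing the values as trimmed text (so that the number 2 and the text "2" count as the same ID) is an assumption rather than documented repository behavior.

```python
from collections import Counter


def duplicate_ids(ids):
    """Return the ID values used by more than one row, in first-seen order."""
    # IDs are compared as trimmed text, so 2 and "2" are treated as equal.
    counts = Counter(str(value).strip() for value in ids)
    return [value for value, n in counts.items() if n > 1]
```

For example, `duplicate_ids([1, 2, "A7", 2, "A7", 3])` reports the IDs `"2"` and `"A7"`.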

Name (optional)

Human readable name of the feature. This name is optional and will not be used by the repository to associate the data with the features. The repository will use RepoID instead. The purpose of this column is to support human readability of this spreadsheet.

RFU (required)

Averaged intensity value for the feature. This value will be extracted by the repository and has to be an integer or floating-point number.

SD (required)

The standard deviation obtained when averaging the intensity values of all spots with the same feature. This value will be extracted by the repository and has to be an integer or floating-point number.
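Since RFU and SD both have to be numeric, it can help to check the cell values before uploading. The sketch below assumes the cell values are available as Python objects, as most spreadsheet readers provide them. The function name is hypothetical, and rejecting numbers stored as text, empty cells, and NaN is an assumption about what counts as a valid number, not documented repository behavior.

```python
import math


def is_valid_number(value):
    """True for integers and finite floats; False for anything else."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return False
    if isinstance(value, int):
        return True
    # Floats must be finite: NaN and infinity are rejected.
    return isinstance(value, float) and math.isfinite(value)
```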

CV (optional)

The coefficient of variation is calculated from the standard deviation (SD) and the average intensity value (RFU). This value is not extracted from the spreadsheet but recalculated by the repository.
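For filling in the CV column by hand, the usual definition is the standard deviation divided by the mean. The sketch below expresses the result as a percentage, which is an assumption: this page does not state the exact formula or unit the repository uses when it recalculates the value. The function name is hypothetical.

```python
def coefficient_of_variation(sd, rfu):
    """Return SD / RFU as a percentage, or None when RFU is zero."""
    if rfu == 0:
        # The ratio is undefined for a mean intensity of zero.
        return None
    return sd / rfu * 100.0
```

With an SD of 250 and an RFU of 1000, this gives a CV of 25 percent.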

Rank (optional)

A user-defined column that can be used to provide an additional sorting criterion beyond the RFU column. This column is not extracted when loading data into the repository.

Name2 (optional)

A column used to provide a secondary name or feature sequence. This column is not extracted by the repository.

RepoID (required)

ID of the feature in the repository. This ID must already be present in the repository and must be used in the slide that the processed data is associated with. If a row describes a mixture of features printed on the same spot, the feature IDs are separated by "||" (e.g., Feature123||Feature124).
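The "||" separator can be handled with a simple split when checking a sheet. This is a minimal sketch with a hypothetical function name; trimming whitespace around each ID is an added convenience and not something this page specifies.

```python
def split_repo_ids(cell):
    """Split a RepoID cell into its individual feature IDs."""
    # A single feature yields a one-element list; a mixture yields several.
    return [part.strip() for part in str(cell).split("||") if part.strip()]
```

For the example above, `split_repo_ids("Feature123||Feature124")` returns the two IDs `Feature123` and `Feature124`, each of which can then be checked against the features used on the slide.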

GroupID (optional)

This column contains the group ID that is associated with the feature when creating the block layout. As long as all spots with the same feature/concentration are averaged into a single value, this column can be left empty. However, if a feature is present multiple times on the same slide and the intensities are not averaged into one value (e.g., one value for all replicates of the feature on block 1 and another for block 2), the group ID is used to identify which feature this intensity value belongs to.
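One way to spot rows that need a group ID is to look for rows that share the same RepoID and GroupID. The sketch below assumes each row has been read into a dictionary keyed by column label and that the combination of RepoID and GroupID is what distinguishes intensity values; both the function name and that interpretation are assumptions, not documented repository behavior.

```python
from collections import defaultdict


def ambiguous_rows(rows):
    """Map each (RepoID, GroupID) pair used by several rows to their row IDs."""
    seen = defaultdict(list)
    for row in rows:
        # An empty or missing GroupID is treated as "no group".
        key = (row["RepoID"], row.get("GroupID") or "")
        seen[key].append(row["ID"])
    return {key: ids for key, ids in seen.items() if len(ids) > 1}
```

An empty result suggests that every intensity value can be assigned unambiguously. Any pair that is returned points to rows where a distinct GroupID is probably missing.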

Concentration (optional)

Concentration of the feature on the spot. This value is not extracted by the repository since this information is already provided when creating the block layout. The column is only used for human readability.

Additional columns (optional)

Additional columns can be added to the end of the table. These columns will be ignored by the repository.