The ASPEX package: affected sib-pair exclusion mapping: Basic syntax and parameter specifications

3. Basic syntax and parameter specifications

The ASPEX programs use the ``TCL'' library for reading and parsing parameter files. TCL is a simple, flexible scripting language that is freely available for many different systems. The same parameter file can generally be used for all of the ASPEX programs: each program extracts just the information it needs and ignores the rest.

All the ASPEX programs accept the following command-line parameters:

-V

Reports the ASPEX release number for this program.

-v

Selects more verbose output. Multiple -v options can be specified for increasing levels of verbosity.

-q

Selects ``quiet'' output: suppresses warnings about Mendelian incompatibilities in marker data files.

-c cmd

Executes the specified TCL command.

-f file

Executes TCL commands from the specified file.

The programs read parameters from the command line (-c), from separate parameter files (-f), or from both. Multiple -c and -f options are evaluated in order from left to right.

Here is a sample TCL parameter file:


# The number of marker loci
set nloc 5
# Marker names
set loc { l1 l2 l3 l4 l5 }
# Distances between markers
set dist { 0.1 0.1 0.1 0.1 0.1 0.1 }
# Sibling recurrence risk ratio
set risk 2.0
# Identity by descent probabilities: additive model
set z "[expr 0.25/$risk] 0.50 [expr 0.50-0.25/$risk]"

Parameters are specified by name (for example, ``set nloc 5''). Comments are preceded by the ``#'' character. The TCL interpreter can perform basic math, as shown in the last line of the sample, where the z values are computed from risk.

Lists of values can be grouped using braces or quotes. List elements are separated by blanks, not commas. Lists can also be split across several lines, for example:


set dist {
    0.1 0.1 0.1
    0.1 0.1 0.1
}
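
The choice between braces and quotes matters when a list contains computed values. The following is a minimal sketch, assuming standard TCL substitution rules and an ordinary tclsh session; the variable names z_quoted, dist_braced, and z_list are illustrative only, and the puts calls are not needed in a real parameter file.

```tcl
set risk 2.0

# Quotes: $risk and [expr ...] are substituted before the list is stored.
set z_quoted "[expr 0.25/$risk] 0.50 [expr 0.50-0.25/$risk]"

# Braces: no substitution takes place; the elements are stored as typed.
set dist_braced { 0.1 0.1 0.1 0.1 0.1 0.1 }

# The list command is another way to build a list from computed values.
set z_list [list [expr {0.25/$risk}] 0.50 [expr {0.50-0.25/$risk}]]

puts $z_quoted   ;# prints: 0.125 0.50 0.375
puts $z_list
```

Braces are therefore the safer choice for plain lists of names and numbers, while quotes (or list) are needed whenever an element depends on another parameter.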

The ASPEX programs will also use default parameter files if present. The aspexrc1 file in the current directory, if it exists, is processed before any command-line parameters, and the aspexrc2 file, if it exists, is processed after them. The names and paths of these parameter files can be changed by setting the ASPEXRC1 and ASPEXRC2 environment variables.

The following parameters are common to all the ASPEX programs:

nloc

An integer: the number of marker loci. If not specified explicitly, it will be determined from the loc parameter.

loc

A list of nloc strings identifying the markers. The markers should be listed in map order along the chromosome.
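
Because nloc is derived from loc when it is omitted, a parameter file does not have to state the count twice. A minimal sketch, assuming the standard TCL llength command is available to the parameter-file interpreter (the marker names are placeholders):

```tcl
# Marker names, listed in map order along the chromosome.
set loc { l1 l2 l3 l4 l5 }

# Optional: nloc may be left out entirely. If it is set, deriving it
# from loc keeps the two parameters consistent.
set nloc [llength $loc]
```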

omit

A list of strings identifying families or individuals whose data should be left out of analyses. Specify an individual using a string of the form ``X.Y'' where X is the family ID and Y is the person's ID. To omit an entire family, specify ``X.*'' where X is the family ID.
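
As an illustration of the two forms, the fragment below uses hypothetical family and person IDs. Braces are used so that the ``*'' is stored literally.

```tcl
# Hypothetical IDs: leave out person 3 of family 12, and all of family 17.
set omit { 12.3 17.* }
```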

blank

A string: the allele identifier used for missing data. The default is ``0''.

fid_width

An integer: the width of the family identifiers, in characters, to use when formatting tabular output. The default is 2.

pid_width

An integer: the width of the person identifiers, in characters, to use when formatting tabular output. The default is 2.

allele_width

An integer: the width of the allele names, in characters, to use when formatting tabular output. The default is 2.

loc_width

An integer: the width of marker names, in characters, to use when formatting tabular output. The default is 10.

discard_partial

A boolean value: indicates whether partially typed genotypes (one allele known, one blank) should be discarded, that is, treated as if both alleles were blank, rather than counted. The default is 1 or true (discard). The sib_tdt program always discards partial genotypes.

sex_linked

A boolean value: indicates if the allele data is for the sex chromosomes, as opposed to autosomes. The default is false (autosomal).

sick_set

A string: the list of character values for the affected status field that are interpreted as ``affected''. The default is ``YyTt2''.

well_set

A string: the list of character values for the affected status field that are interpreted as ``unaffected''. The default is ``NnFf1''.

unk_set

A string: the list of character values for the affected status field that are interpreted as ``unknown''. The default is ``Uu?0''.
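
For a data file that codes affection status differently from the defaults, the three sets can be overridden together. The codes below are hypothetical and chosen only to illustrate the syntax; each character should appear in only one of the sets.

```tcl
# Hypothetical coding: A = affected, N = unaffected, X or - = unknown.
set sick_set "Aa"
set well_set "Nn"
set unk_set  "Xx-"
```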

Programs that generate exclusion maps (sib_ibd and sib_phase) use the following additional parameters:

dist

A list of nloc+1 map distances between all the markers, including distances from the end markers to the corresponding telomere. All map distances are specified in Morgans.
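
Maps are often published in centimorgans, so a conversion and a length check can be done in the parameter file itself. This is a sketch under stated assumptions: dist_cM is a hypothetical helper variable (not an ASPEX parameter), the distances are invented, and foreach, lappend, and error are assumed to be available as in any standard TCL interpreter.

```tcl
set loc { l1 l2 l3 l4 l5 }

# Hypothetical intervals in centimorgans:
# telomere-l1, l1-l2, l2-l3, l3-l4, l4-l5, l5-telomere.
set dist_cM { 10 10 10 10 10 10 }

# Convert to Morgans, the unit expected by dist.
set dist {}
foreach d $dist_cM {
    lappend dist [expr {$d / 100.0}]
}

# dist needs one more entry than loc.
if {[llength $dist] != [llength $loc] + 1} {
    error "dist must contain nloc+1 values"
}
```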

dist_xx, dist_xy

These parameters are similar to the dist parameter, but specify sex-specific recombination maps.

mapping

Specifies the mapping function for recombination fractions. Valid values are ``Kosambi'' or ``Haldane''. The default is ``Kosambi''.
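
For reference, the textbook forms of the two mapping functions can be written as short TCL procedures. These are the standard definitions, not code taken from ASPEX; the procedure names haldane and kosambi are illustrative, and d is a map distance in Morgans.

```tcl
# Haldane: assumes no crossover interference.
proc haldane {d} {
    expr {0.5 * (1.0 - exp(-2.0 * $d))}
}

# Kosambi: allows for some crossover interference.
proc kosambi {d} {
    expr {0.5 * tanh(2.0 * $d)}
}

# Recombination fractions for a 0.1 Morgan interval under each function.
puts [format "%.4f %.4f" [haldane 0.1] [kosambi 0.1]]
```

For short intervals the two functions give similar recombination fractions; they diverge as the distance grows.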

z

A list of three numbers: the probabilities of two siblings being identical by descent for 0, 1, or 2 alleles, given that they are both affected. For sex-linked data, the list should consist of only two numbers, since identity by descent is only calculated for the maternal alleles. These values are only used when most_likely is false.

max_step

The maximum gap to leave between data points interpolated between markers in the LOD map, in units of map distance. The default is 0.01 Morgans.

fix_step

A flag indicating if the gap size between map points should be fixed at max_step, or whether it should be allowed to vary between markers. The default is false, which guarantees that there will be a data point at every marker position.

most_likely

A flag indicating if a maximum likelihood calculation of the sharing at each locus should be done. The default is false (don't do the calculation). When this flag is set, for each marker position or point along the map, the programs will determine the set of Z values that give the highest LOD score at that position. The corresponding % sharing and maximized LOD scores are reported.

linear_model

A flag indicating if the maximum likelihood calculation should fit to a linear model, or to a two-parameter model over all possible Z values. The default is true (i.e., use just a linear model). At present, the two-parameter model does not use a ``possible triangle'' constraint.

no_Dv

A flag indicating if maximum likelihood calculations with a linear model should use a model with no dominance variance. The default is false. When false, maximum likelihood calculations assume the following ``multiplicative'' model for z values:


z[2] = y^2
z[1] = 2*y*(1-y) 
z[0] = (1-y)^2

where y is the sharing at this locus. If no_Dv is true, then this model is replaced by an additive model, where z[1] is fixed at 0.5:


z[2] = y-0.25
z[1] = 0.5
z[0] = 0.75-y

In terms of the sibling recurrence risk ratio, lambda, the multiplicative model has the form:


z[2] = 1 + 0.25/lambda - 1/sqrt(lambda)
z[1] = 1/sqrt(lambda) - 0.5/lambda
z[0] = 0.25/lambda

and the additive model has the form:


z[2] = 0.5 - 0.25/lambda
z[1] = 0.5
z[0] = 0.25/lambda
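
The formulas above translate directly into TCL, which can be convenient for setting z from a chosen risk ratio. The sketch below is illustrative only: the procedure names z_multiplicative, z_additive, and sharing are hypothetical and not part of ASPEX. Each procedure returns the list in the order z[0], z[1], z[2] used by the z parameter, and the sharing is recovered as y = z[2] + z[1]/2, which holds for both models.

```tcl
# z values for the multiplicative model, given the risk ratio lambda.
proc z_multiplicative {lambda} {
    set r [expr {1.0 / sqrt($lambda)}]
    list [expr {0.25 / $lambda}] \
         [expr {$r - 0.5 / $lambda}] \
         [expr {1.0 + 0.25 / $lambda - $r}]
}

# z values for the additive model (z[1] fixed at 0.5).
proc z_additive {lambda} {
    list [expr {0.25 / $lambda}] 0.5 [expr {0.5 - 0.25 / $lambda}]
}

# Sharing y implied by a z list: y = z[2] + z[1]/2.
proc sharing {z} {
    expr {[lindex $z 2] + 0.5 * [lindex $z 1]}
}

set z [z_additive 2.0]
puts $z                               ;# prints: 0.125 0.5 0.375
puts [sharing [z_multiplicative 4.0]] ;# prints: 0.75
```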

truncate_sharing

If most_likely is turned on, then this flag indicates if sharing should be required to be at least 50% for the likelihood maximization. The effect is that positive LOD scores will only be indicated for positions with greater than expected sharing. For affected sib pairs, this is sensible because a predicted sharing of less than 50% would be inconsistent with any simple genetic model. If count_discordant is enabled, then the direction of truncation is reversed: sharing is required to be no more than 50%. The default is 1 or true.

exclusion_level

A floating point value, in LOD score units. If set, then instead of finding a maximum likelihood model, the programs will find the model farthest from the null hypothesis that has a LOD score no higher than the specified value. Thus, this finds an upper bound on the effect of a putative gene at a given position, for exclusion at this level. The default (0.0) disables the exclusion calculation. This value should never be positive.
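
The maximum likelihood and exclusion parameters described above might be combined in a parameter file as follows. This is a sketch: the values are illustrative, the first four simply restate or enable documented settings, and the exclusion threshold of -2.0 is an assumed example rather than a recommended value. How exclusion_level interacts with most_likely is not spelled out here, so the exclusion setting is shown commented out as an alternative.

```tcl
# Maximum likelihood estimate of sharing at each map position,
# using the linear model with the multiplicative form of z.
set most_likely 1
set linear_model 1
set no_Dv 0
set truncate_sharing 1

# Alternative: find the largest effect that can be excluded at a given
# LOD threshold (must not be positive; 0.0 disables the calculation).
# set exclusion_level -2.0
```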

count_once

A boolean value: indicates if only strictly independent sib pairs should be counted. Normally, for families with more than two sibs, all pairwise combinations are scored. If this flag is set, then only pairs that include the first affected sib will be counted. The default is 0 or false.

first_pair

A boolean value: indicates if only the first appropriate sib pair in each family should be counted, as opposed to all pairs, or all pairs including the first sib, as indicated by count_once. The default is 0 or false.

count_unaffected

A boolean value: indicates if sib pairs should be counted where the first sib is unaffected. The default is 0 or false, i.e., count pairs in which the first sib is affected.

count_discordant

A boolean value: indicates if the disease status for the second sib in a pair should be discordant with the first sib. The default is 0 or false, i.e., count pairs that are concordant for disease status.

error_freq

A floating-point number: this specifies the probability of a typing error at an arbitrary marker position. The sib_ibd, sib_phase, and sib_map programs use this to identify marker data that is likely to represent typing errors. The method is based on detection of unlikely recombination patterns, so it is only effective in regions that are densely typed. When an error is detected, all marker data at that position for that family will be excluded from subsequent calculations. The default error frequency is 0 (meaning that all data is assumed to be correct). Reasonable values are on the order of 0.01.

Be careful when using count_once in conjunction with count_discordant. The disease status of the first member of each pair is always determined by count_unaffected. When count_once is enabled, the number of discordant sib pairs counted will depend on whether count_unaffected is set or not. If count_unaffected is false, then within each family, pairs of the first affected sib with all unaffected sibs will be counted. If count_unaffected is true, then pairs of the first unaffected sib with all affected sibs will be counted.
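
The two cases described in the preceding paragraph correspond to the following settings, shown here as a parameter-file fragment:

```tcl
# Discordant pairs: the first affected sib in each family is paired
# with every unaffected sib.
set count_once 1
set count_discordant 1
set count_unaffected 0

# To pair the first unaffected sib with every affected sib instead,
# change the last setting:
# set count_unaffected 1
```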
