The ASPEX package: affected sib-pair exclusion mapping: Tips and Tricks

12. Tips and Tricks

Here are a few random bits of useful information that may be helpful when using ASPEX. Many of the tricks depend on features of the TCL command language embedded in all the ASPEX tools. A discussion of TCL syntax is beyond the scope of this document, but this section at least gives an idea of the sorts of things that can be done with TCL. As you will see, TCL syntax, in particular the meaning of different quoting and bracketing constructs, is somewhat obscure. If you want to learn more about TCL, a very good reference is ``Tcl and the Tk Toolkit'' by John Ousterhout.

12.1 Data organization

I typically organize files for a project so that ASPEX parameter files are in the main directory, and data files are in a subdirectory (maybe organized by chromosome, depending on how the files were generated). I also keep map files (containing just the loc and dist parameters) in a separate subdirectory. Since ASPEX input file handling is very flexible, I try to leave the raw data as close to its original form as possible: if the data is organized one marker per file, or one chromosome per file, or one gel per file, I keep that organization, and let ASPEX take care of extracting the right information for a particular analysis.

The omit parameter also helps here, because it allows subsetting of data without having to touch the raw genotype files. In my limited experience, editing raw data files in the course of a study (say, if an individual is excluded due to a nonpaternity, or a family is excluded due to a misdiagnosis) is best kept to a minimum: it is nice to be able to leave the data alone, and just have a command file that describes (with comments) who is being included and excluded, and why.

Use the aspexrc1 and aspexrc2 files to store all the defaults for a project (things like field widths, the blank allele identifier, and the setting of count_once). They can also be used to select default analysis options. The order of evaluation of these files can be important: set something in aspexrc1 if you might want to override it with a command-line option, but if you want to evaluate something depending on a command-line option, it needs to go in aspexrc2. Options that will probably never change can be put in either place.

For example, here could be a reasonable aspexrc1 file:


# These won't ever change
set blank "N"
set fid_width 3
set pid_width 3
set allele_width 3
# Defaults we may want to override
set count_once true
set truncate_sharing false
set sex_linked false
set no_Dv false

And here is something that could go in aspexrc2. This initializes z based on the current setting of no_Dv if a sibling recurrence risk ratio, risk, is available. Otherwise, it selects a maximum likelihood calculation.


# This needs to execute after the command-line options
if [ info exists risk ] then
    if $sex_linked {
        set z "[expr 0.5/$risk] [expr 1.0-0.5/$risk]"
    } else {
        set z1 [expr $no_Dv ? 0.50 : 1/sqrt($risk) - 0.50/$risk]
        set z "[expr 0.25/$risk] $z1 [expr 1.0 - 0.25/$risk - $z1]"
    }
} else {
    set most_likely true
}

Another data organization trick that is useful when working with ASPEX is to keep affected-status information in a separate file that does not include any genotype data. This is particularly useful for analyses with more than one definition of disease status, because the same set of raw genotype data files can be analyzed with several separate disease-status files.

12.2 More command file tricks

Here is another useful bit of TCL code for aspexrc1: it allows a marker map to be given in absolute rather than relative positions.


proc map {a} {
    global loc dist
    set n [ expr [llength $a] - 1 ]
    set d1 [lindex $a 0]
    for {set i 1} {$i < $n} {incr i} {
        set d2 [lindex $a $i]
        lappend dist [ expr $d2 - $d1 ]
        set d1 $d2
        incr i
        lappend loc [lindex $a $i]
    }
    lappend dist [ expr [lindex $a $n] - $d1 ]
}

So... the following normal aspex map description:


set loc { A B C }
set dist { 0.1 0.2 0.2 0.1 }

could instead be specified as:


map {
  0.0
  0.1 A
  0.3 B
  0.5 C
  0.6
}

If you prefer specifying distances in centimorgans, put the following in aspexrc2:


# Convert the dist array to centimorgans
set d ""
set n [ expr [llength $dist] ]
for {set i 0} {$i < $n} {incr i} {
    lappend d [ expr 0.01 * [lindex $dist $i] ]
}
set dist $d

A parameter file can also be structured to contain different sets of options for different ASPEX programs. The name of the invoking program is available as a TCL variable, argv0. For example:


# Omit families with half sibs from regular analysis
if { $argv0 != "sib_kin" } {
    set omit { 101060.* 200442.* }
}

12.3 Shortcuts for simple analyses

Sometimes, it is convenient to specify lists of markers on the command line rather than in a parameter file. For instance, to calculate the recombination distance between two markers:


sib_map -q -c "set loc { mark1 mark2 }" files ...

or to calculate a TDT or sharing statistics at a single marker:


sib_tdt -q -v -c "set loc mark1" files ...
sib_ibd -q -c "set loc mark1" files ...

These shortcuts work best if you have already put all your default parameter settings in aspexrc1 and aspexrc2.

12.4 The example subdirectory

The files in the example subdirectory demonstrate several ways of organizing linkage data for use with ASPEX.

We generated simulated data for 50 three-sib families, assuming a gene with a sibling risk recurrence ratio of 10 and a succeptibility allele frequency of 0.10, with a population frequency of disease of 0.02. The example files include:

aspexrc1 and aspexrc2: sample ASPEX startup files, with lots of comments.
c1.map: a marker map for the simulated data, again with comments.
c1_all.dat: affected status and genotype data for the 12 markers described in c1.map.
c1_1.dat, c1_2.dat, status.dat: the same data as c1_all.dat, divided into several genotype data files and a file containing just the affected status information.

Here are some examples of simple analysis commands to try with these files, to familiarize yourself with ASPEX syntax:


# Basic multipoint IBD analysis with typed parents
sib_ibd -f c1.map c1_all.dat
# Suppress warnings about Mendelian incompatibilities
sib_ibd -q -f c1.map c1_all.dat
# The same, but arranging the input data differently
sib_ibd -q -f c1.map status.dat c1_1.dat c1_2.dat

# Multipoint using allele frequencies to fill in missing parents
sib_phase -f c1.map c1_all.dat
# The same, but for discordant pairs instead of affected pairs
sib_phase -f c1.map -c "set count_discordant true" c1_all.dat
# Score affected pairs for a specific relative risk ratio
sib_phase -f c1.map -c "set risk 5.0" c1_all.dat
# Generate a detailed multipoint map for plotting
sib_phase -q -v -f c1.map -c "set risk 5.0" c1_all.dat

# Remove data inconsistencies, then re-run analysis
sib_clean -f c1.map c1_all.dat > clean.dat
sib_phase -f c1.map clean.dat

# TDT with multi-allele score statistics for each marker
sib_tdt -f c1.map c1_all.dat
# Detailed TDT scores for each allele
sib_tdt -v -f c1.map c1_all.dat
# Shorthand for scoring just one marker
sib_tdt -v -c "set loc M7" c1_all.dat

# Generate a distance map from the sib data
sib_map -f c1.map c1_all.dat
# Shorthand for recombination distance between two markers
sib_map -q -c "set loc { M7 M8 }" c1_all.dat

Next Previous Contents