Here are a few random bits of useful information that may be helpful when using ASPEX. Many of the tricks depend on features of the TCL command language embedded in all the ASPEX tools. A discussion of TCL syntax is beyond the scope of this document, but this section at least gives an idea of the sorts of things that can be done with TCL. As you will see, TCL syntax, in particular the meaning of different quoting and bracketing constructs, is somewhat obscure. If you want to learn more about TCL, a very good reference is ``Tcl and the Tk Toolkit'' by John Ousterhout.
I typically organize files for a project so that ASPEX parameter files
are in the main directory, and data files are in a subdirectory (maybe
organized by chromosome, depending on how the files were generated).
I also keep map files (containing just the loc
and dist
parameters) in a separate subdirectory. Since ASPEX input file
handling is very flexible, I try to leave the raw data as close to its
original form as possible: if the data is organized one marker per
file, or one chromosome per file, or one gel per file, I keep that
organization, and let ASPEX take care of extracting the right
information for a particular analysis.
The omit
parameter also helps here, because it allows subsetting
of data without having to touch the raw genotype files. In my limited
experience, editing raw data files in the course of a study (say, if an
individual is excluded due to a nonpaternity, or a family is excluded
due to a misdiagnosis) is best kept to a minimum: it is nice to be
able to leave the data alone, and just have a command file that
describes (with comments) who is being included and excluded, and
why.
Use the aspexrc1
and aspexrc2
files to store all the
defaults for a project (things like field widths, the blank allele
identifier, and the setting of count_once
). They can also be
used to select default analysis options. The order of evaluation of
these files can be important: set something in aspexrc1
if you
might want to override it with a command-line option, but if you want
to evaluate something depending on a command-line option, it needs to
go in aspexrc2
. Options that will probably never change can be
put in either place.
For example, here could be a reasonable aspexrc1
file:
# These won't ever change
set blank "N"
set fid_width 3
set pid_width 3
set allele_width 3
# Defaults we may want to override
set count_once true
set truncate_sharing false
set sex_linked false
set no_Dv false
And here is something that could go in aspexrc2
. This initializes
z
based on the current setting of no_Dv
if a sibling
recurrence risk ratio, risk
, is available. Otherwise, it
selects a maximum likelihood calculation.
# This needs to execute after the command-line options
if [ info exists risk ] then
if $sex_linked {
set z "[expr 0.5/$risk] [expr 1.0-0.5/$risk]"
} else {
set z1 [expr $no_Dv ? 0.50 : 1/sqrt($risk) - 0.50/$risk]
set z "[expr 0.25/$risk] $z1 [expr 1.0 - 0.25/$risk - $z1]"
}
} else {
set most_likely true
}
Another data organization trick that is useful when working with ASPEX is to keep affected-status information in a separate file that does not include any genotype data. This is particularly useful for analyses with more than one definition of disease status, because the same set of raw genotype data files can be analyzed with several separate disease-status files.
Here is another useful bit of TCL code for aspexrc1
: it allows a
marker map to be given in absolute rather than relative positions.
proc map {a} {
global loc dist
set n [ expr [llength $a] - 1 ]
set d1 [lindex $a 0]
for {set i 1} {$i < $n} {incr i} {
set d2 [lindex $a $i]
lappend dist [ expr $d2 - $d1 ]
set d1 $d2
incr i
lappend loc [lindex $a $i]
}
lappend dist [ expr [lindex $a $n] - $d1 ]
}
So... the following normal aspex map description:
set loc { A B C }
set dist { 0.1 0.2 0.2 0.1 }
could instead be specified as:
map {
0.0
0.1 A
0.3 B
0.5 C
0.6
}
If you prefer specifying distances in centimorgans, put the following
in aspexrc2
:
# Convert the dist array to centimorgans
set d ""
set n [ expr [llength $dist] ]
for {set i 0} {$i < $n} {incr i} {
lappend d [ expr 0.01 * [lindex $dist $i] ]
}
set dist $d
A parameter file can also be structured to contain different sets of
options for different ASPEX programs. The name of the invoking
program is available as a TCL variable, argv0
. For example:
# Omit families with half sibs from regular analysis
if { $argv0 != "sib_kin" } {
set omit { 101060.* 200442.* }
}
Sometimes, it is convenient to specify lists of markers on the command line rather than in a parameter file. For instance, to calculate the recombination distance between two markers:
sib_map -q -c "set loc { mark1 mark2 }" files ...
or to calculate a TDT or sharing statistics at a single marker:
sib_tdt -q -v -c "set loc mark1" files ...
sib_ibd -q -c "set loc mark1" files ...
These shortcuts work best if you have already put all your default
parameter settings in aspexrc1
and aspexrc2
.
The files in the example
subdirectory demonstrate several ways of
organizing linkage data for use with ASPEX.
We generated simulated data for 50 three-sib families, assuming a gene with a sibling risk recurrence ratio of 10 and a succeptibility allele frequency of 0.10, with a population frequency of disease of 0.02. The example files include:
aspexrc1
and aspexrc2
: sample ASPEX startup files,
with lots of comments.c1.map
: a marker map for the simulated data, again with
comments.c1_all.dat
: affected status and genotype data for the 12
markers described in c1.map
.c1_1.dat
, c1_2.dat
, status.dat
: the same data as
c1_all.dat
, divided into several genotype data files and a file
containing just the affected status information.Here are some examples of simple analysis commands to try with these files, to familiarize yourself with ASPEX syntax:
# Basic multipoint IBD analysis with typed parents
sib_ibd -f c1.map c1_all.dat
# Suppress warnings about Mendelian incompatibilities
sib_ibd -q -f c1.map c1_all.dat
# The same, but arranging the input data differently
sib_ibd -q -f c1.map status.dat c1_1.dat c1_2.dat
# Multipoint using allele frequencies to fill in missing parents
sib_phase -f c1.map c1_all.dat
# The same, but for discordant pairs instead of affected pairs
sib_phase -f c1.map -c "set count_discordant true" c1_all.dat
# Score affected pairs for a specific relative risk ratio
sib_phase -f c1.map -c "set risk 5.0" c1_all.dat
# Generate a detailed multipoint map for plotting
sib_phase -q -v -f c1.map -c "set risk 5.0" c1_all.dat
# Remove data inconsistencies, then re-run analysis
sib_clean -f c1.map c1_all.dat > clean.dat
sib_phase -f c1.map clean.dat
# TDT with multi-allele score statistics for each marker
sib_tdt -f c1.map c1_all.dat
# Detailed TDT scores for each allele
sib_tdt -v -f c1.map c1_all.dat
# Shorthand for scoring just one marker
sib_tdt -v -c "set loc M7" c1_all.dat
# Generate a distance map from the sib data
sib_map -f c1.map c1_all.dat
# Shorthand for recombination distance between two markers
sib_map -q -c "set loc { M7 M8 }" c1_all.dat