This program reads a file of genotype data for nuclear families consisting of two parents and one or more ``affected'' siblings. It reports frequencies for all alleles in the parents, and allele transmission frequencies for transmission disequilibrium (TDT) tests.
This version of sib_tdt will calculate empirical probabilities for chi-squared statistics, which will accurately reflect association independent of linkage within families. This calculation is done by permuting parent alleles while fixing the IBD status of sibs within a family.
There are two run-time parameters that affect the accuracy of p-values calculated using the permutation procedure: min_reps and max_reps. For each replicate dataset, a TDT score is calculated and compared with the score for the actual data. The empirical p-value equals the number of replicate scores that exceed the actual score, divided by the total number of replicates. The algorithm will generate new replicates until the numerator reaches min_reps, or until the denominator reaches max_reps, whichever happens first.
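The stopping rule described above can be sketched in Python. This is only an illustration, not the program's own code: the function name and the draw_replicate_score callable (which stands in for scoring one permuted dataset) are hypothetical, and the loop assumes the "at least min_reps" and "never more than max_reps" reading given in the parameter descriptions.

```python
def empirical_p_value(observed, draw_replicate_score,
                      min_reps=1000, max_reps=40000):
    """Adaptive permutation p-value (illustrative sketch).

    draw_replicate_score is a hypothetical callable that returns the
    TDT score of one freshly permuted replicate dataset.
    """
    exceed = 0  # numerator: replicates scoring above the observed data
    total = 0   # denominator: replicates generated so far
    while exceed < min_reps and total < max_reps:
        total += 1
        if draw_replicate_score() > observed:
            exceed += 1
    return exceed / total
```

With this rule, large p-values stop early once min_reps exceedances have been seen, while small p-values run all the way to max_reps replicates.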
sib_tdt [-v] [-c cmd] [-f file] [marker_data ...] [> tdt_data]
One or more marker data files can be listed on the command line. If no files are specified, marker data will be read from standard input. The TDT listing will be sent to standard output.
The following parameters should be specified using TCL commands via either the -c or -f mechanisms:
A boolean value: indicates that transmissions to just the first sib in each family should be scored. The default is false.
min_reps: During p-value estimation, stop generating replicates when at least this many have scores larger than the observed TDT score. This determines the accuracy of large p-values. The default is 1000.
max_reps: During p-value estimation, never generate more than this number of sample replicates. This determines the accuracy of small p-values. The default is 40000.
Here is a sample parameter file for sib_tdt:
set loc { "l1" "l2" "l3" "l4" }
set blank "00"
set discard_partial true
The default non-verbose output is a table summarizing sample coverage and chi-squared statistics for each marker. The table lists, for each position, the number of distinct alleles, the heterozygosity based on just typed parents, and the overall percentage of typed individuals. The TDT results are summarized by the sum of chi-squared statistics for transmission of all alleles, and the maximum chi-squared obtained for any one allele, and the corresponding estimated p-values.
If sex_split is true, then separate chi-squared scores and p-values are reported for maternal and paternal transmissions.
For the default min_reps and max_reps settings, p-values are accurate to within about 5%, but gradually become less accurate for p<0.01.
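The accuracy figure above can be checked with a rough calculation. The sketch below is an approximation and not something the program reports: it assumes replicates are independent, treats the number of replicates as fixed at its expected value, and uses the binomial standard error. The function name is hypothetical.

```python
import math

def relative_std_error(p, min_reps=1000, max_reps=40000):
    """Approximate relative standard error of an empirical p-value.

    Assumption: the count of exceeding replicates is binomial with
    success probability p, over n replicates.
    """
    # Expected number of replicates generated before the rule stops:
    # about min_reps / p, capped at max_reps.
    n = min(max_reps, min_reps / p)
    return math.sqrt((1.0 - p) / (n * p))

for p in (0.5, 0.025, 0.01, 0.001):
    print(p, round(relative_std_error(p), 3))
```

Under these assumptions the relative error is roughly 2% at p=0.5, 3% at p=0.025, 5% at p=0.01, and 16% at p=0.001, which is consistent with the loss of accuracy for small p-values noted above.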
The sum statistic and maximum statistic are generally similar, but the sum statistic is more sensitive to cases where multiple alleles are in disequilibrium.
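The two summary statistics can be illustrated as follows. This sketch assumes the usual McNemar-type TDT score, (t - n)^2 / (t + n), for an allele transmitted t times and not transmitted n times. The exact formula used by sib_tdt is not stated here, so treat the per-allele score as an assumption; the function names are hypothetical.

```python
def allele_chi_squared(transmitted, not_transmitted):
    """McNemar-type score for one allele (assumed formula)."""
    total = transmitted + not_transmitted
    if total == 0:
        return 0.0  # allele never seen in an informative transmission
    return (transmitted - not_transmitted) ** 2 / total

def summarize(counts):
    """counts maps allele name -> (transmitted, not_transmitted).

    Returns (sum statistic, maximum statistic) over all alleles.
    """
    scores = [allele_chi_squared(t, n) for t, n in counts.values()]
    return sum(scores), max(scores)

# Two alleles distorted in opposite directions: each scores 10.0,
# so the sum statistic is 20.0 while the maximum stays at 10.0.
print(summarize({"1": (30, 10), "2": (10, 30)}))
```

When only one allele is distorted, the sum and the maximum are close. When several are, the sum grows while the maximum does not.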
If verbose (-v) output is selected, the summary table will be replaced by allele transmission tables for each marker, with the form:
[al] [n] [%] [ft] [fn] [fc] [mt] [mn] [mc] [st] [sn] [sc]
where [al] is the allele name, [n] is the number of times it is seen in the parents, and [%] is the percent frequency. [ft] is the number of times the allele was transmitted through the father, [fn] is the number of times it was not transmitted, and [fc] is the chi-squared score for this outcome. [mt], [mn], and [mc] are the same, for the mother. Likewise, [st], [sn], and [sc] combine the results for both parents.
The combined counts may be larger than the sum of the counts for the two parents. If two parents and a child are all heterozygous with the same genotype at a given position, one copy of each of the child's alleles was transmitted and the other was not. These cases are added into the ``combined'' totals.
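The ambiguous case described above can be detected by listing every parental assignment that is consistent with a trio. This is a sketch under simple assumptions: genotypes are pairs of allele names, all three individuals are typed, and the function names are hypothetical rather than taken from sib_tdt.

```python
def parental_origins(father, mother, child):
    """Return the set of (paternal_allele, maternal_allele) assignments
    consistent with the trio. An empty set means a Mendelian error."""
    a, b = child
    options = set()
    for pat, mat in ((a, b), (b, a)):
        if pat in father and mat in mother:
            options.add((pat, mat))
    return options

def is_ambiguous(father, mother, child):
    """True when the child's alleles cannot be assigned to specific parents."""
    return len(parental_origins(father, mother, child)) > 1

def combined_counts_if_ambiguous(child):
    """In the ambiguous case each of the child's alleles was transmitted by
    one parent and withheld by the other: (transmitted, not_transmitted)."""
    return {allele: (1, 1) for allele in child}

print(is_ambiguous(("A", "B"), ("A", "B"), ("A", "B")))  # True
print(is_ambiguous(("A", "B"), ("A", "C"), ("A", "B")))  # False
```

In the ambiguous trio the per-parent columns cannot be incremented, but the combined columns can, which is why the combined counts may exceed the sum of the two parental counts.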
If only one parent is typed at a particular marker, transmission through that parent will be scored in cases where it is not biased by allele frequencies. This reduces to cases where the parent and child are both heterozygous, but have only one allele in common (i.e., parent AB, child AC). This differs from treatment in previous versions: prior to version 1.11, single-parent transmissions were scored (incorrectly) even in situations that were biased, and from version 1.11 through 1.16, single-parent cases were never scored.
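The single-parent rule above can be written as a small check. This is a sketch only: genotypes are assumed to be pairs of allele names, and the function names are hypothetical.

```python
def single_parent_scorable(parent, child):
    """True when parent and child are both heterozygous and share
    exactly one allele (e.g. parent AB, child AC)."""
    parent_het = parent[0] != parent[1]
    child_het = child[0] != child[1]
    shared = set(parent) & set(child)
    return parent_het and child_het and len(shared) == 1

def single_parent_transmission(parent, child):
    """Return (transmitted, not_transmitted) for the typed parent,
    or None when the case should not be scored."""
    if not single_parent_scorable(parent, child):
        return None
    shared = (set(parent) & set(child)).pop()
    not_transmitted = parent[0] if parent[1] == shared else parent[1]
    return shared, not_transmitted

print(single_parent_transmission(("A", "B"), ("A", "C")))  # ('A', 'B')
print(single_parent_transmission(("A", "B"), ("A", "B")))  # None
```

In the scorable case the shared allele must have come from the typed parent, because the child's other allele is one that parent does not carry.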
If very verbose (-vv) output is selected, the TDT results are replaced by a detailed listing of allele transmissions for each child. The listing indicates which allele was inherited from which parent, for all children. In the special case described above where sib_tdt cannot assign the transmitted alleles to specific parents, the allele pair is enclosed in square brackets.