
7. The sib_phase program

The sib_phase program generates a multipoint exclusion map from marker data for affected sibling pairs. Although sib_phase still calculates a separate likelihood for each affected pair, assuming the pairs are independent, it uses all the available marker data from a family when calculating likelihoods. It uses allele frequencies to reconstruct missing parents, and also phases the parents across multiple markers. For two-sib families in which parental marker data are available, sib_phase is equivalent to (but slower than) sib_ibd.

The execution time of sib_phase depends strongly on the amount of missing marker information. The calculation can become very time-consuming for large families if there are adjacent loci for which the available marker data still allow a large number of possible inheritance states. One example is when two adjacent loci are untyped for most but not all children. Another is when at least one parent is untyped and all typed individuals are homozygous for the same allele. When sib_phase encounters a family that it expects will take a long time to evaluate, it prints a warning message with an estimated "cost" for the family, which is roughly on the order of the calculation time in seconds. The sib_phase program also has a strict upper limit of 16 sibs per family.

The memory requirements of sib_phase are an exponential function of sibship size. The worst-case requirements scale as 8*2^(2N-2) for a sibship of size N. However, in the common situation where each sib is either untyped or typed at nearly all markers, memory requirements should scale as 8*2^(N-1), a much more manageable number for all reasonable sibship sizes. In practice, memory requirements should rarely be an issue, even with large sibships.
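To get a feel for these two scaling formulas, the following Python sketch evaluates both expressions for a few sibship sizes. The function names are illustrative only, and the unit of the result is not stated above, so reading it as bytes is an assumption.

```python
def worst_case_memory(n_sibs):
    """Worst-case requirement, 8*2^(2N-2), for a sibship of size N."""
    return 8 * 2 ** (2 * n_sibs - 2)


def typical_memory(n_sibs):
    """Requirement, 8*2^(N-1), when each sib is untyped or typed at nearly all markers."""
    return 8 * 2 ** (n_sibs - 1)


if __name__ == "__main__":
    # 16 is the documented upper limit on sibs per family.
    for n in (2, 4, 8, 16):
        print(n, worst_case_memory(n), typical_memory(n))
```

At the 16-sib limit the worst-case expression evaluates to 8*2^30, while the common-case expression evaluates to 8*2^15.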

There are two analysis modes in sib_phase, depending on the value of the use_allele_freq parameter. Allele frequencies are estimated from all the marker data, including parents and children. If use_allele_freq is disabled, sib_phase should give the same results as sib_ibd, with the exception that it can reconstruct parents when both are untyped. If use_allele_freq is enabled, then when parents cannot be reconstructed, the allele frequencies are used to estimate the probabilities of each IBD state.

Like sib_ibd, sib_phase will use gender-specific recombination maps if specified with the dist_xx and dist_xy parameters. Separate maternal and paternal LOD scores will be reported if the sex_split parameter is used. With sib_phase, the parental LOD scores will generally not be additive, unless all parents are typed. The reconstruction of missing parental genotypes using allele frequencies is done for both parents together, so the LOD score for one parent is "contaminated" with information derived from the other parent that is difficult to remove. While the resulting parental LOD scores are not independent, they may still be useful for qualitatively assessing how excess sharing is distributed between the two parents. If most_likely is specified along with sex_split, then the maternal and paternal scores are reported using the overall maximum-likelihood model.

7.1 Command syntax

sib_phase [-v] [-c cmd] [-f file] [marker_data ...] [> map_data]

One or more marker data files can be listed on the command line. If no files are specified, marker data will be read from standard input. The exclusion map will be sent to standard output.

7.2 TCL parameters

The following parameters should be specified using TCL commands via either the -c or -f mechanisms:

use_allele_freq

A boolean value: indicates if allele frequencies should be used to estimate IBD probabilities when parents are missing. The default is true.

fixed_freq

A boolean value: if false, indicates that allele frequencies should be estimated from the given marker data. If true, then frequencies should be specified in the parameter file. The default is false.

sex_split

This parameter specifies whether separate LOD scores should be calculated for paternal and maternal sharing.

show_freqs

A boolean value: if set, then allele frequencies are reported.

show_pairs

A boolean value: if set, then posterior probabilities of IBD=0,1,2 for each location are reported for each sib pair.

There is a new TCL command, freq, for specifying fixed allele frequencies in the parameter file, when fixed_freq is enabled. The syntax is:

freq [loc] { [a1] [f1] [a2] [f2] ... }

where [loc] is the marker name, [a1] and [f1] are an allele name and its frequency, and so on.
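When frequencies come from another source, the freq lines can be generated rather than typed by hand. The following Python sketch is one way to do that; the helper name format_freq_line is hypothetical, and the check that frequencies sum to 1 is an assumption of the sketch rather than a documented requirement of sib_phase. Names are quoted, as in the sample parameter file in this section.

```python
def format_freq_line(marker, freqs, tol=1e-6):
    """Return one freq command line for a marker.

    freqs maps allele name -> frequency. Requiring the frequencies to
    sum to 1 is an assumption of this sketch, not a documented rule.
    """
    total = sum(freqs.values())
    if abs(total - 1.0) > tol:
        raise ValueError(f"frequencies for {marker} sum to {total}, not 1")
    # Emit alternating quoted allele names and frequencies.
    pairs = " ".join(f'"{allele}" {freq}' for allele, freq in freqs.items())
    return f'freq "{marker}" {{ {pairs} }}'


if __name__ == "__main__":
    print(format_freq_line("l1", {"1": 0.3, "2": 0.7}))
```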

Here is a sample parameter file for sib_phase:

set nloc 4
set loc { "l1" "l2" "l3" "l4" }
set blank "0"
set discard_partial true
set dist { 0.1 0.1 0.1 0.1 0.1 }
set risk 2.0
set z "[expr 0.25/$risk] 0.50 [expr 0.50-0.25/$risk]"
set mapping Haldane
set most_likely true
set no_Dv true
set fixed_freq true
freq "l1" { "1" 0.3 "2" 0.7 }
freq "l2" { "1" 0.5 "2" 0.5 }
freq "l3" { "1" 0.9 "2" 0.1 }
freq "l4" { "1" 0.5 "2" 0.5 }

In this example, the identical-by-descent probabilities are derived from a simple sibling recurrence risk ratio assuming an additive model.
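The arithmetic behind the z line in the sample file can be checked outside Tcl. The Python sketch below mirrors the three expr expressions; the function name ibd_probabilities is illustrative, and rejecting non-positive risk values is an assumption of the sketch.

```python
def ibd_probabilities(risk):
    """Return (z0, z1, z2), mirroring the Tcl expressions in the sample file."""
    if risk <= 0:
        raise ValueError("risk must be positive")
    z0 = 0.25 / risk          # probability of sharing 0 alleles IBD
    z1 = 0.50                 # probability of sharing 1 allele IBD
    z2 = 0.50 - 0.25 / risk   # probability of sharing 2 alleles IBD
    return z0, z1, z2


if __name__ == "__main__":
    print(ibd_probabilities(2.0))
```

With risk set to 2.0, as in the sample file, the three values are 0.125, 0.5, and 0.375, which sum to 1. A risk of 1.0 gives 0.25, 0.5, and 0.25, the null expectation for siblings.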

7.3 Output

The default output of sib_phase is a table summarizing the actual and expected identity by state at each marker position. A chi-squared value for the IBS totals is generated for each marker, and the multipoint LOD score is also shown. If most_likely is enabled, then sib_phase will instead calculate a maximum likelihood estimate of the sharing at each marker position (or point along the map). As with sib_ibd, the table will list the percent sharing, model parameters, and LOD score at each marker position.

For autosomal data, the identity by state estimate is given by:

IBS = 0.5 + (0.75 * S2) - (0.5 * S3) + (0.25 * S4)

where S2 is the sum of squared allele frequencies at this position, S3 is the sum of cubed frequencies, and S4 is the sum of fourth powers of the frequencies.
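As a worked illustration, the autosomal formula can be evaluated directly from a list of allele frequencies. This Python sketch is not the program's own code; the function names are illustrative.

```python
def power_sum(freqs, k):
    """Sum of the k-th powers of the allele frequencies."""
    return sum(p ** k for p in freqs)


def expected_ibs_autosomal(freqs):
    """Expected identity by state at an autosomal marker, per the formula above."""
    s2 = power_sum(freqs, 2)
    s3 = power_sum(freqs, 3)
    s4 = power_sum(freqs, 4)
    return 0.5 + 0.75 * s2 - 0.5 * s3 + 0.25 * s4


if __name__ == "__main__":
    print(expected_ibs_autosomal([0.3, 0.7]))
```

For a marker with a single allele (frequency 1), S2, S3, and S4 all equal 1 and the estimate is 1. As the number of equally rare alleles grows, all three sums approach 0 and the estimate approaches 0.5.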

For sex-linked data, the identity by state estimate is given by:

IBS = 0.5 + ((1.0 - SAME/2.0) * S2) - ((0.5 - SAME/2.0) * S3)

where S2 and S3 are as for an autosomal marker, and SAME is the fraction of sib pairs in the sample that are same-sex.
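The sex-linked formula can be evaluated in the same way. In this Python sketch the function name and the range check on SAME are illustrative assumptions, not part of sib_phase.

```python
def expected_ibs_sex_linked(freqs, same_sex_fraction):
    """Expected identity by state at a sex-linked marker, per the formula above.

    same_sex_fraction is SAME, the fraction of sib pairs that are same-sex.
    """
    if not 0.0 <= same_sex_fraction <= 1.0:
        raise ValueError("same_sex_fraction must lie between 0 and 1")
    s2 = sum(p ** 2 for p in freqs)
    s3 = sum(p ** 3 for p in freqs)
    half_same = same_sex_fraction / 2.0
    return 0.5 + (1.0 - half_same) * s2 - (0.5 - half_same) * s3


if __name__ == "__main__":
    print(expected_ibs_sex_linked([0.5, 0.5], 0.5))
```

When every pair is same-sex (SAME = 1), the S3 term vanishes and the estimate reduces to 0.5 + 0.5 * S2.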

If verbose (-v) output is selected, sib_phase will generate a multipoint exclusion map. The format is the same as for sib_ibd.

