The sib_phase program generates a multipoint exclusion map from
marker data for affected sibling pairs. While sib_phase still
calculates separate likelihoods for each affected pair assuming they
are independent, it will use all the available marker data from a
family when calculating likelihoods. It will use allele frequencies
to reconstruct missing parents, and will also phase the parents across
multiple markers.
For two-sib families in which parental marker data is available,
sib_phase is equivalent to (but slower than) sib_ibd.
The execution time of sib_phase depends sharply on the amount of
missing marker information. The calculation can become very
time-consuming for large families if there are adjacent loci for which the
available marker data still allows a large number of possible
inheritance states. One example of such a case is when two adjacent
loci are untyped for most but not all children. Another case is when
at least one parent is untyped, and all typed individuals are
homozygous for the same allele. When sib_phase encounters a
family that it expects will take a long time to evaluate, it will
print a warning message with an estimated "cost" for the family,
which roughly approximates the calculation time in seconds.
The sib_phase program also has a strict upper limit of 16 sibs
per family.
The memory requirements of sib_phase are an exponential function
of sibship size. The worst-case requirements scale as 8*2^(2N-2) for
a sibship of size N. However, for the common situation in which a sib
is either untyped, or typed at nearly all markers, memory requirements
should actually scale as 8*2^(N-1), a much more manageable number for
all reasonable sibship sizes. In practice, memory requirements should
rarely be an issue, even with large sibships.
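As a rough illustration of the two scaling rules above, the following sketch evaluates both expressions for a few sibship sizes. The function names are hypothetical, and because the text does not state the units of the result, the values are printed as raw numbers.

```python
def worst_case_memory(n_sibs: int) -> int:
    """Worst-case scaling quoted in the text: 8 * 2^(2N-2)."""
    if n_sibs < 1:
        raise ValueError("sibship size must be at least 1")
    return 8 * 2 ** (2 * n_sibs - 2)


def typical_memory(n_sibs: int) -> int:
    """Scaling quoted for the common case: 8 * 2^(N-1)."""
    if n_sibs < 1:
        raise ValueError("sibship size must be at least 1")
    return 8 * 2 ** (n_sibs - 1)


# Compare the two expressions up to the 16-sib limit mentioned above.
for n in (2, 4, 8, 16):
    print(n, worst_case_memory(n), typical_memory(n))
```

At the 16-sib limit the two expressions differ by a factor of 2^15, which is why the common case remains manageable where the worst case does not.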
There are two analysis modes in sib_phase, depending on the value
of the use_allele_freq parameter. Allele frequencies are
estimated from all the marker data, including parents and children.
If use_allele_freq is disabled, sib_phase should give the same
results as sib_ibd, with the exception that it can reconstruct parents
when both are untyped. If use_allele_freq is enabled, then when
parents cannot be reconstructed, the allele frequencies are used to
estimate the probabilities of each IBD state.
Like sib_ibd, sib_phase will use gender-specific
recombination maps if specified with the dist_xx and dist_xy
parameters. Separate maternal and paternal LOD scores will be
reported if the sex_split parameter is used. With
sib_phase, the parental LOD scores will generally not be
additive, unless all parents are typed. The reconstruction of missing
parental genotypes using allele frequencies is done for both parents
together, so the LOD score for one parent is "contaminated" with
information derived from the other parent that is difficult to remove.
While the resulting parental LOD scores are not independent, they may
still be useful for qualitatively assessing how excess sharing is
distributed between the two parents. If most_likely is specified
along with sex_split, then the maternal and paternal scores are
reported using the overall maximum-likelihood model.
sib_phase [-v] [-c cmd] [-f file] [marker_data ...] [> map_data]
One or more marker data files can be listed on the command line. If no files are specified, marker data will be read from standard input. The exclusion map will be sent to standard output.
The following parameters should be specified using TCL commands via
either the -c or -f mechanisms:
use_allele_freq
A boolean value: indicates if allele frequencies should be used to estimate IBD probabilities when parents are missing. The default is true.
fixed_freq
A boolean value: if false, indicates that allele frequencies should be estimated from the given marker data. If true, then frequencies should be specified in the parameter file. The default is false.
sex_split
This parameter specifies if separate LOD scores should be calculated for paternal and maternal sharing.
A boolean value: if set, then allele frequencies are reported.
A boolean value: if set, then posterior probabilities of IBD=0,1,2 for each location are reported for each sib pair.
There is a new TCL command, freq, for specifying fixed allele
frequencies in the parameter file when fixed_freq is enabled.
The syntax is:
freq [loc] { [a1] [f1] [a2] [f2] ... }
where [loc] is the marker name, [a1] and [f1] are an
allele name and its frequency, and so on.
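When frequencies come from another source, it may be convenient to generate the freq lines rather than type them by hand. The helper below is a hypothetical sketch and not part of sib_phase. It formats one freq command in the syntax shown above, and it assumes that the frequencies for a marker should sum to 1. Whether sib_phase itself enforces that is not stated here.

```python
import math


def format_freq_command(marker: str, freqs: dict[str, float]) -> str:
    """Format one `freq` line: freq "loc" { "a1" f1 "a2" f2 ... }."""
    if not freqs:
        raise ValueError("at least one allele is required")
    total = sum(freqs.values())
    # Assumption: allele frequencies at a marker should sum to 1.
    if not math.isclose(total, 1.0, abs_tol=1e-6):
        raise ValueError(f"frequencies for {marker} sum to {total}, not 1")
    pairs = " ".join(f'"{allele}" {freq:g}' for allele, freq in freqs.items())
    return f'freq "{marker}" {{ {pairs} }}'


print(format_freq_command("l1", {"1": 0.3, "2": 0.7}))
```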
Here is a sample parameter file for sib_phase:
set nloc 4
set loc { "l1" "l2" "l3" "l4" }
set blank "0"
set discard_partial true
set dist { 0.1 0.1 0.1 0.1 0.1 }
set risk 2.0
set z "[expr 0.25/$risk] 0.50 [expr 0.50-0.25/$risk]"
set mapping Haldane
set most_likely true
set no_Dv true
set fixed_freq true
freq "l1" { "1" 0.3 "2" 0.7 }
freq "l2" { "1" 0.5 "2" 0.5 }
freq "l3" { "1" 0.9 "2" 0.1 }
freq "l4" { "1" 0.5 "2" 0.5 }
In this example, the identical-by-descent probabilities are derived from a simple sibling recurrence risk ratio assuming an additive model.
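The arithmetic in the z line of the sample file can be checked with a short sketch. It mirrors the three expr terms exactly. Reading them as the probabilities of IBD=0, 1 and 2, in that order, is an assumption, and the function name is hypothetical.

```python
def ibd_probabilities(risk: float) -> tuple[float, float, float]:
    """Mirror the sample file: z = { 0.25/risk, 0.50, 0.50 - 0.25/risk }."""
    if risk <= 0:
        raise ValueError("risk ratio must be positive")
    z0 = 0.25 / risk         # assumed to be P(IBD=0)
    z1 = 0.50                # assumed to be P(IBD=1)
    z2 = 0.50 - 0.25 / risk  # assumed to be P(IBD=2)
    return z0, z1, z2


# With "set risk 2.0" as in the sample parameter file:
print(ibd_probabilities(2.0))  # (0.125, 0.5, 0.375)
```

The three values always sum to 1, and a risk ratio of 1.0 gives 0.25, 0.50, 0.25.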
The default output of sib_phase is a table summarizing the
actual and expected identity by state at each marker position. A
chi-squared value for the IBS totals is generated for each marker, and
the multipoint LOD score is also shown.
If most_likely is enabled, then sib_phase will instead calculate
a maximum likelihood estimate of the sharing at each marker position
(or point along the map). As with sib_ibd, the table will list
the percent sharing, model parameters, and LOD score at each marker
position.
For autosomal data, the identity by state estimate is given by:
IBS = 0.5 + (0.75 * S2) - (0.5 * S3) + (0.25 * S4)
where S2 is the sum of squared allele frequencies at this
position, S3 is the sum of cubed frequencies, and S4 is the
sum of fourth powers of the frequencies.
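To make the autosomal formula concrete, here is a minimal sketch that evaluates it for a list of allele frequencies. The function names are hypothetical. The code implements only the expression given above and makes no claim about how sib_phase computes it internally.

```python
def power_sum(freqs: list[float], k: int) -> float:
    """Sum of the k-th powers of the allele frequencies."""
    return sum(f ** k for f in freqs)


def expected_ibs_autosomal(freqs: list[float]) -> float:
    """IBS = 0.5 + 0.75*S2 - 0.5*S3 + 0.25*S4, as given in the text."""
    s2 = power_sum(freqs, 2)
    s3 = power_sum(freqs, 3)
    s4 = power_sum(freqs, 4)
    return 0.5 + (0.75 * s2) - (0.5 * s3) + (0.25 * s4)


# Two equally frequent alleles, as for markers l2 and l4 in the sample file.
print(expected_ibs_autosomal([0.5, 0.5]))  # 0.78125
```

As a sanity check, a marker with a single allele (frequency 1.0) gives an IBS of exactly 1.0.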
For sex-linked data, the identity by state estimate is given by:
IBS = 0.5 + ((1.0 - SAME/2.0) * S2) - ((0.5 - SAME/2.0) * S3)
where S2 and S3 are as for an autosomal marker, and
SAME is the fraction of sib pairs in the sample that are
same-sex.
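The sex-linked expression can be evaluated the same way. This is a sketch of the formula as written above, with hypothetical function and argument names. The argument same_frac stands for SAME.

```python
def power_sum(freqs: list[float], k: int) -> float:
    """Sum of the k-th powers of the allele frequencies."""
    return sum(f ** k for f in freqs)


def expected_ibs_sex_linked(freqs: list[float], same_frac: float) -> float:
    """IBS = 0.5 + (1 - SAME/2)*S2 - (0.5 - SAME/2)*S3, as given in the text."""
    if not 0.0 <= same_frac <= 1.0:
        raise ValueError("same_frac must be a fraction between 0 and 1")
    s2 = power_sum(freqs, 2)
    s3 = power_sum(freqs, 3)
    return 0.5 + ((1.0 - same_frac / 2.0) * s2) - ((0.5 - same_frac / 2.0) * s3)


# Two equally frequent alleles, all sib pairs same-sex.
print(expected_ibs_sex_linked([0.5, 0.5], same_frac=1.0))  # 0.75
```

With a single allele of frequency 1.0 the result is 1.0 for any value of SAME, matching the autosomal case.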
If verbose (-v) output is selected, sib_phase will generate a
multipoint exclusion map. The format is the same as for sib_ibd.