This program reads a file of genotype data for nuclear families consisting of two parents and two or more ``affected'' siblings. It reports identity-by-descent information for all affected sibling pairs.
The program works by first trying to identify the parental origin of each sibling allele. At a given locus, for each parent, the program compares the corresponding alleles for each member of an affected sibling pair. If both alleles can be uniquely identified, the pair is scored as either identical or non-identical by descent from that parent at that position. If the match is ambiguous, the position is scored as uninformative.
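The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: both parents are typed, genotypes are given as pairs of allele names, and the function names `allele_from_parent` and `score_pair` are hypothetical rather than taken from the program. The result symbols follow the very verbose output format (``+'' for a match, ``-'' for no match, ``0'' for not informative).

```python
from typing import Optional, Tuple

Genotype = Tuple[str, str]  # two allele names at one locus (assumed representation)


def allele_from_parent(child: Genotype, parent: Genotype, other: Genotype) -> Optional[str]:
    """Return the allele the child received from `parent`, or None if ambiguous."""
    candidates = set()
    for i in (0, 1):
        from_parent, from_other = child[i], child[1 - i]
        # This assignment is possible only if each parent carries the allele
        # attributed to them.
        if from_parent in parent and from_other in other:
            candidates.add(from_parent)
    return candidates.pop() if len(candidates) == 1 else None


def score_pair(sib1: Genotype, sib2: Genotype, parent: Genotype, other: Genotype) -> str:
    """Score one sib pair at one locus for one parent: '+', '-', or '0'."""
    if parent[0] == parent[1]:
        return "0"  # homozygous parent: the two transmitted copies cannot be told apart
    a1 = allele_from_parent(sib1, parent, other)
    a2 = allele_from_parent(sib2, parent, other)
    if a1 is None or a2 is None:
        return "0"  # parental origin is ambiguous for at least one sib
    return "+" if a1 == a2 else "-"
```

For example, with a father typed 1/2 and a mother typed 3/4, sibs typed 1/3 and 1/4 score ``+'' through the father and ``-'' through the mother. Missing genotypes and parent reconstruction are not handled in this sketch.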
If marker data from additional siblings is available, it will be used to reconstruct missing parents. The reconstruction will depend on the settings of the count_discordant, count_unaffected, and count_once parameters. When scoring a given sib pair, the first sib in that pair is used for reconstruction. Also, any sibs that are not part of any counted pair will be used. In the common case of concordant affected sib pairs, this means that one affected and all unaffected sibs will be used for reconstruction. Be aware that the IBD results for different types of sib pairs (affected, unaffected, discordant) cannot be simply added to get results for all pairs, due to the different reconstruction criteria used in each case.
A side effect of using the first sib from each pair in parent reconstruction is that the choice of that sib, and thus the reconstruction and sharing results, will depend on the order of sibs in the input files. The limit_build parameter can be used to prevent this, at the cost of losing some information for small families with untyped parents. Alternatively, the best_order parameter specifies that sibs should be sorted by number of typed loci. The ``first sib'' will be the one with the fewest untyped loci, which may improve reconstructions if many subjects have incomplete data. This also affects the count_once option: all pairs will be constructed using the most completely typed sib.
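The ordering that best_order describes can be illustrated as follows. This is a sketch, not the program's code: the dictionary layout, the function names, and the rule that a locus counts as untyped when either allele equals the blank code are all assumptions. The blank code "00" is borrowed from the sample parameter file.

```python
from typing import Dict, List, Tuple


def count_untyped(genotypes: List[Tuple[str, str]], blank: str = "00") -> int:
    """Count loci where at least one allele is the blank (untyped) code."""
    return sum(1 for a, b in genotypes if a == blank or b == blank)


def best_order(sibs: List[Dict], blank: str = "00") -> List[Dict]:
    """Return sibs sorted so the one with the fewest untyped loci comes first.

    sorted() is stable, so sibs with equal counts keep their input order.
    """
    return sorted(sibs, key=lambda s: count_untyped(s["genotypes"], blank))
```

With this ordering, the ``first sib'' no longer depends on the input order, except among sibs with the same number of untyped loci.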
The LOD score at a particular probe position is determined by finding the nearest informative markers flanking the probe on either side, for both the maternal and paternal alleles. The probability of the given marker data, assuming the probe is in a particular IBD state, is calculated from the known IBD states of these flanking alleles and the recombination distances to the probe. These probabilities are then used to determine the odds of the marker data, given a disease gene of known penetrance at the probe locus, versus the odds of the marker data if the probe has zero penetrance.
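The calculation described above might look roughly like the following sketch. It is not the program's implementation. It assumes the Haldane mapping function, flanking distances in Morgans, and the standard result that a pair's sharing status through one parent flips between two loci with probability 2θ(1-θ), since two meioses are involved. The disease model is represented by IBD probabilities z0, z1, z2, compared against the null values 1/4, 1/2, 1/4. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Obs = Tuple[bool, float]  # (allele shared at the marker?, distance to probe in Morgans)


def haldane_theta(d: float) -> float:
    """Recombination fraction for map distance d (Morgans), Haldane mapping."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def marker_likelihood(obs: List[Obs], probe_shared: bool) -> float:
    """P(flanking marker results for one parent | sharing status at the probe)."""
    like = 1.0
    for marker_shared, dist in obs:
        theta = haldane_theta(dist)
        flip = 2.0 * theta * (1.0 - theta)  # sharing status changes between loci
        like *= (1.0 - flip) if marker_shared == probe_shared else flip
    return like


def pair_lod(pat: List[Obs], mat: List[Obs],
             z: Sequence[float] = (0.125, 0.5, 0.375)) -> float:
    """LOD for one sib pair at the probe: model z versus the null (1/4, 1/2, 1/4)."""
    ps, pu = marker_likelihood(pat, True), marker_likelihood(pat, False)
    ms, mu = marker_likelihood(mat, True), marker_likelihood(mat, False)
    l0 = pu * mu                      # IBD = 0: neither parental allele shared
    l1 = 0.5 * (ps * mu + pu * ms)    # IBD = 1: exactly one shared, either parent
    l2 = ps * ms                      # IBD = 2: both shared
    num = z[0] * l0 + z[1] * l1 + z[2] * l2
    den = 0.25 * l0 + 0.5 * l1 + 0.25 * l2
    # den is zero only for contradictory observations at zero distance
    return math.log10(num / den)
```

A pair with no informative flanking markers contributes a LOD of zero, and so does any pair when z equals the null values.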
In a family with more than two affected siblings, each sibling's marker data will contribute to several sib pairs, as each pair is counted independently. While this does not bias the observed sharing results, it does mean that data from large families will carry more weight than data from small families. This effect can be controlled with the count_once parameter.
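The weighting effect is easy to see by counting pairs. The sketch below compares all independent pairs with one reading of count_once, in which the first sib is paired with each remaining sib. That reading is an assumption based on the statement that all pairs are constructed using the first sib; the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def all_pairs(sibs: List[str]) -> List[Tuple[str, str]]:
    """Every sib pair counted independently: n*(n-1)/2 pairs for n sibs."""
    return list(combinations(sibs, 2))


def once_pairs(sibs: List[str]) -> List[Tuple[str, str]]:
    """Assumed count_once behaviour: pair the first sib with each other sib (n-1 pairs)."""
    if not sibs:
        return []
    first, rest = sibs[0], sibs[1:]
    return [(first, s) for s in rest]
```

Under these assumptions, a family with four affected sibs contributes six pairs by default but only three pairs with count_once.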
The sib_ibd program can use gender-specific recombination maps for calculation of LOD scores, if available. In this case, distances should be specified with the dist_xx and dist_xy parameters.
It is also possible to request separate LOD scores for maternal and paternal sharing contributions, with the sex_split parameter. These LOD scores are only independent (and additive) if a ``multiplicative'' disease model is used. If most_likely is specified along with sex_split, then the maternal and paternal scores are reported using the overall maximum-likelihood model.
sib_ibd [-v] [-c cmd] [-f file] [marker_data ...] [> ibd_data]
One or more marker data files can be listed on the command line. If no files are specified, marker data will be read from standard input. The IBD listing will be sent to standard output.
The following parameters should be specified using Tcl commands via either the -c or -f mechanisms:
The limit_build parameter restricts parent reconstructions to only use sibs that are not part of counted pairs. The default is false, meaning that the first sib in a pair will also be used for reconstructions.
The best_order parameter indicates that sibs should be sorted by the number of typed loci. The internal ordering of sibs within a family determines how parent reconstructions are done, and determines the choice of pairs if count_once is used. The default is false, meaning the order of sibs in the input data is preserved.
The sex_split parameter specifies whether separate LOD scores should be calculated for paternal and maternal sharing.
A boolean value: if set, then posterior probabilities of IBD=0,1,2 for each location are reported for each sib pair.
Here is a sample parameter file for sib_ibd:
set nloc 4
set loc { "l1" "l2" "l3" "l4" }
set dist { 0.1 0.1 0.1 0.1 0.1 }
set blank "00"
set fid_width 6
set discard_partial true
set risk 2.0
set z "[expr 0.25/$risk] 0.50 [expr 0.50-0.25/$risk]"
set mapping Haldane
In this example, the identical-by-descent probabilities are derived from a simple sibling recurrence risk ratio assuming an additive model.
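The arithmetic behind the z setting in the sample file can be checked with a short sketch. The function name `ibd_probs` is hypothetical; the formulas are the ones in the Tcl expressions of the sample file.

```python
from typing import Tuple


def ibd_probs(risk: float) -> Tuple[float, float, float]:
    """IBD probabilities (z0, z1, z2) as computed in the sample parameter file."""
    z0 = 0.25 / risk          # [expr 0.25/$risk]
    z1 = 0.50                 # fixed at 0.50 in the sample file
    z2 = 0.50 - 0.25 / risk   # [expr 0.50-0.25/$risk]
    return z0, z1, z2
```

With risk set to 2.0, as in the sample file, this gives z0 = 0.125, z1 = 0.5, and z2 = 0.375, which sum to 1. A risk of 1.0 gives the null values 0.25, 0.5, and 0.25.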
The sib_ibd program will automatically reorder markers based on the map in the loc parameter. The order in which markers are given in the input file (or files) is not important. Families do not need to be present in all input files; however, if a family is present in more than one file, all family members must be present and in the same order in each.
The default output is a summary of sharing at each locus, broken down by parent. The markers in the output file will be sorted into their proper order along the chromosome, as specified in the loc parameter. A chi-squared statistic for the observed sharing, with one degree of freedom, is calculated for each marker, along with a multipoint likelihood.
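The exact form of the chi-squared statistic is not stated here. A common one-degree-of-freedom test for sib pair sharing compares the counts of shared and unshared alleles against the equal split expected under no linkage. The sketch below assumes that form; the function name is hypothetical.

```python
def sharing_chi2(shared: int, unshared: int) -> float:
    """Chi-squared (1 df) for observed sharing versus an expected 50:50 split.

    Assumed form: (shared - unshared)^2 / (shared + unshared).
    """
    total = shared + unshared
    if total == 0:
        return 0.0  # no informative observations at this marker
    return (shared - unshared) ** 2 / total
```

For example, 30 shared and 10 unshared alleles would give a statistic of 10.0 under this assumed formula.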
In some cases, sib_ibd can determine that for a particular sib pair, one allele is shared and one is not, but there is no way of knowing if it is the maternal allele or the paternal allele. These cases will be included in the ``Combined'' sharing table.
If most_likely is enabled, then sib_ibd will calculate a maximum likelihood estimate of the sharing at each marker position. In this case, the table will also include maximum likelihood estimates of the z0, z1, and z2 values (for the specified model); the corresponding percent sharing; and the multipoint LOD score, for each marker position.
If verbose (-v) output is selected, a multipoint exclusion map will be generated, with lines like:
[offset] [lod score]
where [offset] is a genetic recombination distance, and [lod score] is the log likelihood ratio of the marker data for a disease gene at this position with the given sibling recurrence risk ratio, versus a sibling risk ratio of 1.
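Map lines in this two-column format are simple to post-process. The sketch below collects the offsets whose LOD score falls at or below a threshold. The function name is hypothetical, and the default of -2 is only the conventional exclusion threshold, not a value taken from sib_ibd. Lines that do not consist of exactly two numbers are skipped.

```python
from typing import Iterable, List


def excluded_offsets(lines: Iterable[str], threshold: float = -2.0) -> List[float]:
    """Return offsets whose LOD score is at or below `threshold`."""
    excluded = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # not an "[offset] [lod score]" line
        try:
            offset, lod = float(fields[0]), float(fields[1])
        except ValueError:
            continue  # two fields, but not both numeric
        if lod <= threshold:
            excluded.append(offset)
    return excluded
```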
If most_likely is enabled, the verbose output is like:
[offset] [sharing] [z0] [z1] [z2] [mlod score]
where [sharing] and [mlod score] are the percent sharing and LOD score obtained by the maximum likelihood calculation, and z0, z1, and z2 are the maximum likelihood model parameters.
If sex_split is enabled, then instead of reporting a single LOD score at each marker or map position, three LOD scores are reported: the partial paternal and maternal scores, as well as the regular combined score. If separate paternal and maternal maps are available, then both map positions will be included in the map output, like:
[male map] [female map] [sharing] [z0] [z1] [z2] [mlod]
or, if sex_split is set:
[male map] [female map] [sharing] [z0] [z1] [z2] [pat] [mat] [mlod]
If very verbose (-vv) output is requested, then haplotype data and the raw IBD results will be displayed. For each sibling in a family, there will be a line like:
[fid] [pid] [a1f] [a1m] [a2f] [a2m] ...
where [fid] is the family ID, [pid] is the person ID of a child, [a1f] is the allele at position 1 inherited from the father, and [a1m] is the allele inherited from the mother. The alleles are given as either ``ND'' for not determined, ``NC'' for not consistent, or the allele name from the input data.
Following the genotype data for a family will be IBD results for each sib pair, of the form:
[fid] [pid1] [pid2] [ibd1f] [ibd1m] [ibd2f] [ibd2m] ...
where [fid] is the family ID, [pid1] and [pid2] are the person IDs of two siblings, [ibd1f] is the IBD result for marker 1 through the father, and [ibd1m] is the same through the mother. The IBD result is either ``0'' for not informative, ``-'' for no match, or ``+'' for a match. When sib_ibd determines that one allele is shared (but cannot determine which it is), the IBD result will be listed as ``?''.
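A line in this format can be split into per-marker results with a few lines of Python. This is a sketch for post-processing the very verbose output, assuming whitespace-separated fields; the function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def parse_ibd_line(line: str) -> Tuple[str, str, str, List[Tuple[str, str]]]:
    """Split "[fid] [pid1] [pid2] [ibd1f] [ibd1m] ..." into IDs and per-marker results."""
    fields = line.split()
    if len(fields) < 3 or (len(fields) - 3) % 2 != 0:
        raise ValueError("expected three IDs followed by father/mother result pairs")
    fid, pid1, pid2 = fields[:3]
    results = fields[3:]
    # Results alternate father, mother for each marker in map order.
    per_marker = [(results[i], results[i + 1]) for i in range(0, len(results), 2)]
    return fid, pid1, pid2, per_marker


def tally(per_marker: List[Tuple[str, str]]) -> Tuple[Counter, Counter]:
    """Count '+', '-', '0', and '?' results separately for father and mother."""
    father = Counter(f for f, _ in per_marker)
    mother = Counter(m for _, m in per_marker)
    return father, mother
```

For a made-up line such as `F001 3 4 + - 0 + ? ?`, the parser returns three markers, and the father tally contains one each of ``+'', ``0'', and ``?''.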