The ASPEX package: affected sib-pair exclusion mapping: The sib

5. The sib_ibd program

This program reads a file of genotype data for nuclear families consisting of two parents and two or more ``affected'' siblings. It reports identity-by-descent information for all affected sibling pairs.

The program works by first trying to identify the parental origin of each sibling allele. At a given locus, for each parent, the program compares the corresponding alleles for each member of an affected sibling pair. If both alleles can be uniquely identified, the pair is scored as either identical or non-identical by descent from that parent at that position. If the match is ambiguous, the position is scored as uninformative.
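The scoring rule described above can be sketched in Python. This is only an illustration under simplifying assumptions, not the sib_ibd source: the function names assign_origin and score_ibd are hypothetical, genotypes are modelled as unordered allele pairs, both parents are assumed to be typed, and a homozygous parent is treated as uninformative. The ``ND'', ``NC'', ``+'', ``-'' and ``0'' codes are borrowed from the very verbose output format described in the Output section.

```python
def assign_origin(child, father, mother):
    """Return the child's (paternal, maternal) alleles, or ND/NC codes.

    Assumption: each genotype is a pair of allele names and both parents
    are typed. This is a sketch, not the rule used by sib_ibd itself.
    """
    x, y = child
    candidates = set()
    if x in father and y in mother:
        candidates.add((x, y))
    if y in father and x in mother:
        candidates.add((y, x))
    if not candidates:
        return ("NC", "NC")   # not consistent with the parental genotypes
    if len(candidates) > 1:
        return ("ND", "ND")   # two distinct assignments fit: not determined
    return next(iter(candidates))


def score_ibd(parent, allele1, allele2):
    """Score one parent's contribution for a sib pair: '+', '-' or '0'."""
    if allele1 in ("ND", "NC") or allele2 in ("ND", "NC"):
        return "0"            # origin unknown for at least one sib
    if parent[0] == parent[1]:
        return "0"            # homozygous parent: assumed uninformative
    return "+" if allele1 == allele2 else "-"


# Made-up family: father 1/2, mother 3/4, sibs 1/3 and 1/4.
father, mother = ("1", "2"), ("3", "4")
sib1 = assign_origin(("1", "3"), father, mother)
sib2 = assign_origin(("1", "4"), father, mother)
paternal = score_ibd(father, sib1[0], sib2[0])
maternal = score_ibd(mother, sib1[1], sib2[1])
print(paternal, maternal)  # prints: + -
```

In this made-up family both sibs inherit allele 1 from the father (a match) and different alleles from the mother (no match).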

If marker data from additional siblings is available, it will be used to reconstruct missing parents. The reconstruction will depend on the settings of the count_discordant, count_unaffected, and count_once parameters. When scoring a given sib pair, the first sib in that pair is used for reconstruction. Also, any sibs that are not part of any counted pair will be used. In the common case of concordant affected sib pairs, this means that one affected and all unaffected sibs will be used for reconstruction. Be aware that the IBD results for different types of sib pairs (affected, unaffected, discordant) cannot be simply added to get results for all pairs, due to the different reconstruction criteria used in each case.

A side effect of using the first sib from each pair in parent reconstruction is that the choice of that sib, and thus the reconstruction and sharing results, will depend on the order of sibs in the input files. The limit_build parameter can be used to prevent this, at the cost of losing some information for small families with untyped parents. Alternatively, the best_order parameter specifies that sibs should be sorted by number of typed loci. The ``first sib'' will then be the one with the fewest untyped loci, which may improve reconstructions if many subjects have incomplete data. This also affects the count_once option: all pairs will be constructed using the most completely typed sib.
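The effect of best_order can be pictured with a short Python sketch. The data layout, the helper names untyped_count and best_order_sort, and the tie-breaking rule (ties keep their input order) are all assumptions made for illustration; only the idea of sorting by completeness and the ``00'' blank code (from the sample parameter file) come from this manual.

```python
BLANK = "00"  # untyped-allele code, as in the sample parameter file


def untyped_count(genotypes):
    """Count loci where at least one allele is untyped."""
    return sum(1 for a, b in genotypes if a == BLANK or b == BLANK)


def best_order_sort(sibs):
    """Sort (person_id, genotypes) records, most completely typed first.

    Python's sort is stable, so sibs with equal counts keep their input
    order. Whether sib_ibd breaks ties this way is an assumption.
    """
    return sorted(sibs, key=lambda sib: untyped_count(sib[1]))


# Made-up family with three sibs typed at three loci.
sibs = [
    ("3", [("00", "00"), ("1", "2"), ("00", "00")]),
    ("4", [("1", "1"), ("1", "2"), ("2", "3")]),
    ("5", [("1", "2"), ("00", "00"), ("2", "2")]),
]
ordered = [pid for pid, _ in best_order_sort(sibs)]
print(ordered)  # prints: ['4', '5', '3']
```

Here sib 4 becomes the ``first sib'' because it has no untyped loci.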

The LOD score at a particular probe position is determined by finding the nearest informative markers flanking the probe on either side, for both the maternal and paternal alleles. The probability of the given marker data assuming the probe is in a particular IBD state is calculated from the known IBD states of these flanking alleles, and the recombination distances to the probe. These probabilities are then used to determine the odds of the marker data, given a disease gene of known penetrance at the probe locus, versus the odds of the marker data if the probe has zero penetrance.
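A rough numerical sketch of this calculation for a single sib pair, in Python, is given here. It rests on several assumptions that this manual does not state: distances are already expressed as recombination fractions, there is no interference (as in the Haldane mapping), the IBD status through one parent is the same at two loci a fraction theta apart with probability theta^2 + (1 - theta)^2, and the null hypothesis corresponds to z = (0.25, 0.50, 0.25). All function names are hypothetical and are not taken from the sib_ibd source.

```python
import math


def same_state_prob(theta):
    """P(IBD status through one parent is equal at two loci theta apart)."""
    return theta ** 2 + (1.0 - theta) ** 2


def flank_likelihood(probe_shared, flanks):
    """P(flanking marker states | probe state) for one parent.

    flanks is a list of (marker_shared, theta) tuples, at most one per
    side. The sides are treated as independent given the probe state.
    """
    p = 1.0
    for marker_shared, theta in flanks:
        same = same_state_prob(theta)
        p *= same if marker_shared == probe_shared else 1.0 - same
    return p


def pair_lod(pat_flanks, mat_flanks, z):
    """log10 odds of the marker data under z versus (0.25, 0.50, 0.25)."""
    z0, z1, z2 = z
    prior = {
        (False, False): z0,      # IBD = 0
        (True, False): z1 / 2,   # IBD = 1, paternal allele shared
        (False, True): z1 / 2,   # IBD = 1, maternal allele shared
        (True, True): z2,        # IBD = 2
    }
    alt = null = 0.0
    for (pat, mat), weight in prior.items():
        like = flank_likelihood(pat, pat_flanks) * flank_likelihood(mat, mat_flanks)
        alt += weight * like
        null += 0.25 * like
    return math.log10(alt / null)


z = (0.125, 0.50, 0.375)  # the values produced by risk 2.0 in the sample file
lod_at_marker = pair_lod([(True, 0.0)], [(True, 0.0)], z)
lod_flanked = pair_lod([(True, 0.1), (True, 0.1)], [(False, 0.1)], z)
print(round(lod_at_marker, 4))
```

A probe sitting exactly on a marker where both alleles are shared gives log10(z2 / 0.25). With no informative flanking markers at all, the score is zero.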

In a family with more than two affected siblings, each sibling's marker data will contribute to several sib pairs, as each pair is counted independently. While this does not bias the observed sharing results, it does mean that data from large families will carry more weight than data from small families. This effect can be controlled with the count_once parameter.
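The growth in weight is easy to see by counting pairs. The Python sketch below compares counting every pair independently with pairing the first sib against each of the others. The second rule is only a guess at what count_once does, based on the remark that with count_once all pairs are constructed using the first sib; the function names are hypothetical.

```python
from itertools import combinations


def all_pairs(sibs):
    """Every unordered pair of affected sibs, each counted independently."""
    return list(combinations(sibs, 2))


def pairs_with_first(sibs):
    """Pairs that each involve the first sib (assumed count_once behaviour)."""
    first, rest = sibs[0], sibs[1:]
    return [(first, other) for other in rest]


for size in (2, 3, 4, 5):
    sibs = list(range(1, size + 1))
    print(size, len(all_pairs(sibs)), len(pairs_with_first(sibs)))
```

A family with n affected sibs yields n(n-1)/2 independent pairs but only n-1 pairs involving the first sib, so the gap widens quickly as families get larger.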

The sib_ibd program can use gender-specific recombination maps for calculation of LOD scores, if available. In this case, distances should be specified with the dist_xx and dist_xy parameters. It is also possible to request separate LOD scores for maternal and paternal sharing contributions, with the sex_split parameter. These LOD scores are only independent (and additive) if a ``multiplicative'' disease model is used. If most_likely is specified along with sex_split, then the maternal and paternal scores are reported using the overall maximum-likelihood model.

5.1 Command syntax

sib_ibd [-v] [-c cmd] [-f file] [marker_data ...] [> ibd_data]

One or more marker data files can be listed on the command line. If no files are specified, marker data will be read from standard input. The ibd listing will be sent to standard output.

5.2 TCL parameters

The following parameters should be specified using TCL commands via either the -c or -f mechanisms:

limit_build

This parameter restricts parent reconstructions to only use sibs that are not part of counted pairs. The default is false, meaning that the first sib in a pair will also be used for reconstructions.

best_order

This indicates that sibs should be sorted using the number of typed loci. The internal ordering of sibs within a family determines how parent reconstructions are done, and determines the choice of pairs if count_once is used. The default is false, meaning the order of sibs in the input data is preserved.

sex_split

This parameter specifies whether separate LOD scores should be calculated for paternal and maternal sharing.

show_pairs

A boolean value: if set, then posterior probabilities of IBD=0,1,2 for each location are reported for each sib pair.

Here is a sample parameter file for sib_ibd:


set nloc 4
set loc { "l1" "l2" "l3" "l4" }
set dist { 0.1 0.1 0.1 0.1 0.1 }
set blank "00"
set fid_width 6
set discard_partial true
set risk 2.0
set z "[expr 0.25/$risk] 0.50 [expr 0.50-0.25/$risk]"
set mapping Haldane

In this example, the identical-by-descent probabilities are derived from a simple sibling recurrence risk ratio assuming an additive model.
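For readers less familiar with TCL's expr, the arithmetic behind the z setting can be restated in Python. The function name z_from_risk is hypothetical; the formula itself is copied from the sample file.

```python
def z_from_risk(risk):
    """IBD probabilities (z0, z1, z2) from a sibling recurrence risk ratio.

    Mirrors the sample file: z0 = 0.25/risk, z1 = 0.50, z2 = 0.50 - 0.25/risk.
    """
    z0 = 0.25 / risk
    z1 = 0.50
    z2 = 0.50 - 0.25 / risk
    return (z0, z1, z2)


z = z_from_risk(2.0)
print(z)  # prints: (0.125, 0.5, 0.375)
```

A risk ratio of 1 reproduces the no-linkage expectation (0.25, 0.50, 0.25), and the three values always sum to 1.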

The sib_ibd program will automatically reorder markers based on the map in the loc parameter. The order in which markers are given in the input file (or files) is not important. Families do not need to be present in all input files. However, if a family is present, all family members must be present and in the same order in each case.

5.3 Output

The default output is a summary of sharing at each locus, broken down by parent. The markers in the output file will be sorted into their proper order along the chromosome, as specified in the loc parameter. A chi-squared statistic for the observed sharing, with one degree of freedom, is calculated for each marker, along with a multipoint likelihood.
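As a point of reference, a one-degree-of-freedom chi-squared statistic for sharing is commonly computed by comparing the shared and non-shared counts against an expected 50/50 split. The Python sketch below shows that standard form. Whether sib_ibd uses exactly this formula is an assumption, the function name is hypothetical, and the counts are invented.

```python
def sharing_chi2(shared, not_shared):
    """Chi-squared (1 df) for observed sharing versus a 50/50 expectation.

    With n = shared + not_shared and n/2 expected in each class, the usual
    sum of (observed - expected)^2 / expected reduces to
    (shared - not_shared)^2 / n.
    """
    n = shared + not_shared
    if n == 0:
        return 0.0  # no informative pairs at this marker
    return (shared - not_shared) ** 2 / n


chi2 = sharing_chi2(60, 40)
print(chi2)  # prints: 4.0
```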

In some cases, sib_ibd can determine that for a particular sib pair, one allele is shared and one is not, but there is no way of knowing if it is the maternal allele or the paternal allele. These cases will be included in the ``Combined'' sharing table.

If most_likely is enabled, then sib_ibd will calculate a maximum likelihood estimate of the sharing at each marker position. In this case, the table will also include maximum likelihood estimates of the z0, z1, and z2 values (for the specified model); the corresponding percent sharing; and the multipoint LOD score, for each marker position.

If verbose (-v) output is selected, a multipoint exclusion map will be generated, with lines like:


[offset] [lod score]

where [offset] is a genetic recombination distance, and [lod score] is the log likelihood ratio of the marker data for a disease gene at this position with the given sibling recurrence risk ratio, versus a sibling risk ratio of 1.
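Lines of this form are easy to post-process. The Python sketch below collects the map points and lists the offsets at which the score falls at or below a threshold. It assumes whitespace-separated numeric fields, skips anything else, and uses -2 as the default threshold because that is the conventional exclusion criterion in linkage analysis, not because sib_ibd prescribes it. The function names and the sample lines are made up.

```python
def read_exclusion_map(lines):
    """Collect (offset, lod) points from lines with exactly two numbers."""
    points = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # not a map line (e.g. part of the summary table)
        try:
            points.append((float(fields[0]), float(fields[1])))
        except ValueError:
            continue  # two fields, but not both numeric
    return points


def excluded_offsets(points, threshold=-2.0):
    """Offsets whose LOD score is at or below the exclusion threshold."""
    return [offset for offset, lod in points if lod <= threshold]


# Invented sample lines, for illustration only.
sample = ["0.00 -3.10", "0.05 -2.00", "0.10 -0.75", "Locus Shared", "0.15 0.42"]
points = read_exclusion_map(sample)
print(excluded_offsets(points))  # prints: [0.0, 0.05]
```

If other parts of the verbose output also happen to consist of two numeric fields, a stricter filter would be needed.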

If most_likely is enabled, the verbose output is like:


[offset] [sharing] [z0] [z1] [z2] [mlod score]

where [sharing] and [mlod score] are the percent sharing and lod score obtained by the maximum likelihood calculation, and z0, z1, and z2 are the maximum likelihood model parameters.

If sex_split is enabled, then instead of reporting a single LOD score at each marker or map position, three LOD scores are reported: the partial paternal and maternal scores, as well as the regular combined score. If separate paternal and maternal maps are available, then both map positions will be included in the map output, like:


[male map] [female map] [sharing] [z0] [z1] [z2] [mlod]

or if sex_split is set:


[male map] [female map] [sharing] [z0] [z1] [z2] [pat] [mat] [mlod]

If very verbose (-vv) output is requested, then haplotype data and the raw IBD results will be displayed. For each sibling in a family, there will be a line like:


[fid] [pid] [a1f] [a1m] [a2f] [a2m] ...

where [fid] is the family ID, [pid] is the person ID of a child, [a1f] is the allele at position 1 inherited from the father, and [a1m] is the allele inherited from the mother. The alleles are given as either ``ND'' for not determined, ``NC'' for not consistent, or the allele name from the input data.

Following the genotype data for a family will be IBD results for each sib pair, of the form:


[fid] [pid1] [pid2] [ibd1f] [ibd1m] [ibd2f] [ibd2m] ...

where [fid] is the family ID, [pid1] and [pid2] are the person IDs of two siblings, [ibd1f] is the ibd result for marker 1 through the father, and [ibd1m] is the same through the mother. The ibd result is either ``0'' for not informative, ``-'' for no match, or ``+'' for a match. When sib_ibd determines that one allele is shared (but cannot determine which it is), the ibd result will be listed as ``?''.
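A line in this format can be split into per-marker results with a few lines of Python. The sketch assumes the fields are separated by whitespace and that IDs contain no spaces. The function names and the sample line are invented for illustration.

```python
def parse_ibd_line(line):
    """Split an IBD result line into IDs and (father, mother) code pairs."""
    fields = line.split()
    fid, pid1, pid2 = fields[:3]
    codes = fields[3:]
    if len(codes) % 2 != 0:
        raise ValueError("expected one paternal and one maternal code per marker")
    per_marker = [(codes[i], codes[i + 1]) for i in range(0, len(codes), 2)]
    return fid, pid1, pid2, per_marker


def count_codes(per_marker):
    """Tally how often each result code appears across both parents."""
    counts = {"+": 0, "-": 0, "0": 0, "?": 0}
    for pair in per_marker:
        for code in pair:
            counts[code] += 1
    return counts


# Invented line: family 101, sibs 3 and 4, three markers.
fid, pid1, pid2, per_marker = parse_ibd_line("101 3 4 + - 0 + ? ?")
print(per_marker)  # prints: [('+', '-'), ('0', '+'), ('?', '?')]
print(count_codes(per_marker))
```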
