
8. The sib_map program

The sib_map program generates two-point and multipoint maximum likelihood estimates of the map distances between markers, based on marker data from nuclear families. It will also determine support intervals. Allele frequencies will be used to reconstruct missing parents. All sibling marker data is used to determine likelihoods, without ever being broken down into sib pairs.

The sib_map program uses the same likelihood engine used in sib_phase, so it also has a strict limit of 16 siblings per family.

The multipoint maximization algorithm determines the complete set of distances that gives the global maximum likelihood for the marker data across all markers. The two-point algorithm considers each pair of adjacent markers separately, ignoring information from more distant markers.

The support intervals calculated for multipoint distance estimates assume that all other distances are fixed at their maximum likelihood positions while one target interval is varied. In a future version, we may implement a method which will optimize the other distances as the target interval is varied.

If error_freq is non-zero, then sib_map will attempt to identify likely typing errors, using provisional distance maps computed with the multipoint algorithm. The map is recalculated iteratively until no new errors are discovered.

By default, sib_map calculates a single sex-averaged map. If sex_split is enabled, then separate maps are calculated based on the observed paternal and maternal recombination rates.

In a second mode of operation, sib_map calculates three-point distances for one marker against all other pairs of adjacent markers along a map. This data can be used to verify map orders, or to position new markers on an already determined map. This mode always uses sex-averaged distances.

8.1 Command syntax

sib_map [-v] [-c cmd] [-f file] [marker_data ...] [> map_data]

One or more marker data files can be listed on the command line. If no files are specified, marker data will be read from standard input. The map(s) will be written to standard output.

8.2 TCL parameters

The following additional parameters can be specified using TCL commands via either the -c or -f mechanisms:

support

Specifies the LOD score cutoff for support intervals, in LOD units. A support interval covers the range of distances that give LOD scores within this cutoff of the LOD score of the most likely distance. The default is 0.6 LOD units.

epsilon

Specifies the convergence criterion for the iterative distance refinement routines. The default is 0.00001 Morgans.

do_shuffle

A boolean value: specifies if sib_map should perform the map shuffling function, to verify map order and/or place new markers on an existing map. The default is false.

sex_split

A boolean value: if set, then separate paternal and maternal recombination maps will be calculated. The default is false.

show_freqs

A boolean value: if set, then allele frequencies are reported.

Here is a sample parameter file for sib_map:

set nloc 4
set loc { "M1" "M2" "M3" "M4" }
set blank "0"
set mapping Kosambi
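
The sample above selects the Kosambi mapping function. As a rough illustration of what such a function does, the Python sketch below converts between a recombination fraction and a map distance in Morgans using the standard Kosambi formula. The function names are hypothetical and are not part of sib_map; how sib_map applies the mapping internally is not described here.

```python
import math


def kosambi_distance(theta):
    """Map distance in Morgans for a recombination fraction theta.

    Standard Kosambi formula: d = 0.25 * ln((1 + 2*theta) / (1 - 2*theta)).
    theta must lie in [0, 0.5); theta = 0.5 corresponds to unlinked markers.
    """
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return 0.25 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))


def kosambi_theta(distance):
    """Inverse mapping: recombination fraction for a distance in Morgans."""
    if distance < 0.0:
        raise ValueError("distance must be non-negative")
    return 0.5 * math.tanh(2.0 * distance)


if __name__ == "__main__":
    d = kosambi_distance(0.10)
    print(f"theta = 0.10 -> {d:.4f} Morgans")
    print(f"{d:.4f} Morgans -> theta = {kosambi_theta(d):.4f}")
```

For small recombination fractions the distance is close to theta itself; the two diverge as theta approaches 0.5.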

8.3 Output

The normal output of sib_map consists of the two-point and multipoint distances between each pair of adjacent markers, and the corresponding support intervals. If two markers are determined to be unlinked, their distance will be reported as "[inf]" (for "infinity"). In this case, the support interval will give a lower bound on the most likely distance between the pair. A LOD score is reported for each interval, giving the likelihood for the most likely distance, compared to the likelihood of the two markers being unlinked.
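
To make the LOD score concrete, here is a minimal Python sketch for the simplest textbook case: phase-known meioses scored directly as recombinant or non-recombinant. This is an assumption made for illustration only; sib_map works from nuclear family data with possibly missing parents, and its likelihood engine is considerably more involved. The function name and the counts are hypothetical.

```python
import math


def two_point_lod(recombinants, meioses):
    """Return (theta_hat, lod) for phase-known recombination counts.

    theta_hat is the maximum likelihood recombination fraction, capped at
    0.5 (unlinked). The LOD score is log10 of the likelihood at theta_hat
    divided by the likelihood at theta = 0.5.
    """
    if meioses <= 0 or not 0 <= recombinants <= meioses:
        raise ValueError("need 0 <= recombinants <= meioses and meioses > 0")

    non_recombinants = meioses - recombinants
    theta_hat = min(recombinants / meioses, 0.5)

    def log10_likelihood(theta):
        # Terms with a zero count are skipped so that 0 * log10(0) counts as 0.
        total = 0.0
        if recombinants:
            total += recombinants * math.log10(theta)
        if non_recombinants:
            total += non_recombinants * math.log10(1.0 - theta)
        return total

    lod = log10_likelihood(theta_hat) - log10_likelihood(0.5)
    return theta_hat, lod


if __name__ == "__main__":
    theta_hat, lod = two_point_lod(recombinants=2, meioses=20)
    print(f"theta_hat = {theta_hat:.2f}, LOD = {lod:.2f}")
```

In this toy model, an estimate of 0.5 gives a LOD score of zero, which corresponds to the unlinked case.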

From simulations, we estimate that for support levels of 0.2, 0.6, and 0.8 LOD units, the true distance should be within the support interval about 70%, 90%, and 95% of the time, respectively. These are not strict confidence intervals, however, so these probabilities should be used only as rough guidelines.

If verbose (-v) output is selected, then tables of LOD scores versus distance for each marker pair will also be generated.
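
A table of LOD scores versus distance can be turned into a support interval by keeping the distances around the maximum whose LOD scores stay within the support cutoff. The Python sketch below does this for an in-memory list of (distance, LOD) pairs. The table layout, the function name, and the sample values are assumptions, since the exact format of the verbose output is not specified here.

```python
def support_interval(table, support):
    """Return (lower, upper) distances bounding the support interval.

    table   -- list of (distance, lod) pairs, sorted by increasing distance
    support -- LOD cutoff, e.g. the value of the support parameter

    The interval is the contiguous run of grid points around the maximum
    whose LOD scores are within `support` of the maximum LOD score.
    """
    if not table:
        raise ValueError("table must not be empty")

    best = max(range(len(table)), key=lambda i: table[i][1])
    cutoff = table[best][1] - support

    lower = best
    while lower > 0 and table[lower - 1][1] >= cutoff:
        lower -= 1

    upper = best
    while upper < len(table) - 1 and table[upper + 1][1] >= cutoff:
        upper += 1

    return table[lower][0], table[upper][0]


if __name__ == "__main__":
    # Hypothetical LOD table; distances in Morgans.
    lod_table = [
        (0.00, 1.0),
        (0.05, 2.7),
        (0.10, 3.2),
        (0.15, 2.9),
        (0.20, 2.3),
        (0.25, 1.5),
    ]
    print(support_interval(lod_table, support=0.6))
```

The resolution of the result is limited by the spacing of the table; a finer grid, or interpolation between neighbouring rows, would give tighter bounds.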

If do_shuffle is true, then the output is, for each marker, a table giving three-point distance estimates for that marker with every other pair of adjacent markers along the map. The total distance spanning the three markers, and the corresponding LOD score, is generated for all possible orders of markers (XAB, AXB, ABX). Thus, a comparison of the LOD scores indicates where the test marker is likely to be in relation to the pair.

Following the distances and LOD scores, sib_map will print one of several symbols based on a comparison of the LOD scores. If the test marker appears to be to the left of the specified pair, then "<" or "<<" will be printed. Similarly, ">" or ">>" will be printed if the marker is to the right of the pair, and "+" or "++" will be printed if the marker is most likely to be between the specified pair. The number of characters in the symbol indicates that the LOD score difference exceeds that number times the value of support. If the map order is correct and well supported by the data, the symbols for each marker should show a pattern of ">" rows, then one "+" row, then "<" rows, as the marker is shuffled through its true position.

To use the output to position new loci on a map, list the new loci first in the map in the parameter file for sib_map, followed by all the already-mapped loci in their proper order.
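
The symbol rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not sib_map's actual code: it assumes the LOD score difference is taken between the best order and the runner-up, and that nothing is printed when that difference does not exceed support. The function name is hypothetical.

```python
def shuffle_symbol(lod_xab, lod_axb, lod_abx, support):
    """Pick the symbol for test marker X against the adjacent pair A, B.

    lod_xab -- LOD score for order XAB (X to the left of the pair)
    lod_axb -- LOD score for order AXB (X between the pair)
    lod_abx -- LOD score for order ABX (X to the right of the pair)
    support -- value of the support parameter, in LOD units
    """
    scores = {"<": lod_xab, "+": lod_axb, ">": lod_abx}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_symbol, best_lod = ranked[0]
    runner_up_lod = ranked[1][1]

    # Assumption: the margin is measured against the second-best order.
    margin = best_lod - runner_up_lod
    if margin > 2 * support:
        return best_symbol * 2
    if margin > support:
        return best_symbol
    return ""  # Assumption: no symbol when the orders are not separated.


if __name__ == "__main__":
    print(repr(shuffle_symbol(5.0, 3.0, 1.0, support=0.6)))
    print(repr(shuffle_symbol(3.0, 3.8, 1.0, support=0.6)))
```

Scanning the symbols row by row for one test marker then shows where the preferred placement switches from right of the pair, to between, to left of it.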

