
10. The sib_clean program

The sib_clean program is a stripped-down version of sib_phase that only checks its input data for Mendelian inconsistencies and likely typing errors. Its output is a new linkage file from which the detected errors have been removed: wherever an error is found, the genotype data at that marker are deleted for all members of the affected family.
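To make the idea of a Mendelian inconsistency concrete, here is a minimal sketch of such a check for one family at one marker. It is an illustration only, not the algorithm sib_clean uses: the function names and the data layout are hypothetical, the blank allele "0" is borrowed from the sample parameter file, and the sketch ignores both untyped parents and the detection of likely typing errors.

```python
from itertools import product

# Assumed blank genotype, following `set blank "0"` in the sample parameter file.
BLANK = ("0", "0")

def consistent(father, mother, child):
    """Return True if the child could inherit one allele from each parent."""
    if BLANK in (father, mother, child):
        # Untyped individuals are not checked in this simplified sketch.
        return True
    return any(sorted(pair) == sorted(child) for pair in product(father, mother))

def clean_marker(family):
    """Blank every genotype in the family at this marker if any child is inconsistent.

    `family` is a hypothetical dict with keys 'father', 'mother' and 'children'.
    """
    if all(consistent(family["father"], family["mother"], c)
           for c in family["children"]):
        return family
    return {
        "father": BLANK,
        "mother": BLANK,
        "children": [BLANK] * len(family["children"]),
    }

family = {
    "father": ("1", "2"),
    "mother": ("1", "1"),
    "children": [("1", "1"), ("2", "2")],  # second child cannot get a "2" from the mother
}
cleaned = clean_marker(family)
```

In this example the second child is inconsistent with the mother's genotype, so the whole family is blanked at that marker, mirroring the behaviour described above.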

10.1 Command syntax

sib_clean [-v] [-c cmd] [-f file] [marker_data ...] [> clean_data]

One or more marker data files can be listed on the command line. If no files are specified, marker data will be read from standard input. The updated, clean linkage data is written to standard output.
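As a rough sketch of how the command might be driven from a script, the following Python helper builds the argument list and redirects standard output to a file. Only the -f option and the positional marker files come from the syntax line above; the helper names and the file names are hypothetical.

```python
import shutil
import subprocess

def sib_clean_argv(param_file, marker_files=()):
    """Build the argument list: sib_clean -f <param_file> [marker_data ...]."""
    return ["sib_clean", "-f", param_file, *marker_files]

def run_sib_clean(param_file, marker_files, out_path):
    """Run sib_clean and write the cleaned linkage data to out_path."""
    argv = sib_clean_argv(param_file, marker_files)
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError("sib_clean was not found on PATH")
    with open(out_path, "w") as out:
        # With no marker files, the program reads marker data from standard input.
        subprocess.run(argv, stdout=out, check=True)

# Hypothetical file names, for illustration only.
argv = sib_clean_argv("clean.tcl", ["chr1.pre", "chr2.pre"])
```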

10.2 TCL parameters

The following parameters should be specified using TCL commands via either the -c or -f mechanisms:

fixed_freq

A boolean value. If false, allele frequencies are estimated from the given marker data. If true, the frequencies must be specified in the parameter file. The default is false.

show_freqs

A boolean value. If true, the allele frequencies are reported.

The TCL command freq specifies fixed allele frequencies in the parameter file when fixed_freq is true. The syntax is:

freq [loc] { [a1] [f1] [a2] [f2] ... }

where [loc] is the marker name, [a1] and [f1] are an allele name and its frequency, and so on.
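When frequencies come from another source, the freq lines can be generated rather than typed by hand. The sketch below formats one freq command per marker from a Python dict. The helper name is hypothetical, and the check that the frequencies sum to 1 is a sanity check added here; the text does not say whether sib_clean enforces it.

```python
import math

def freq_command(locus, freqs):
    """Format a freq command: freq "loc" { "a1" f1 "a2" f2 ... }."""
    total = sum(freqs.values())
    # Sanity check added for this sketch, not documented behaviour of sib_clean.
    if not math.isclose(total, 1.0, abs_tol=1e-6):
        raise ValueError(f"frequencies for {locus} sum to {total}, not 1")
    pairs = " ".join(f'"{allele}" {freq}' for allele, freq in freqs.items())
    return f'freq "{locus}" {{ {pairs} }}'

line = freq_command("l1", {"1": 0.3, "2": 0.7})
```

The resulting string can be written into the parameter file after the line that sets fixed_freq to true.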

Here is a sample parameter file for sib_clean:

set loc { "l1" "l2" "l3" "l4" }
set blank "0"
set fid_width 3
set pid_width 3
set allele_width 3
set error_freq 0.01
set dist { 0.1 0.1 0.1 0.1 0.1 }
set fixed_freq true
freq "l1" { "1" 0.3 "2" 0.7 }
freq "l2" { "1" 0.5 "2" 0.5 }
freq "l3" { "1" 0.9 "2" 0.1 }
freq "l4" { "1" 0.5 "2" 0.5 }

A distance map (dist) needs to be specified only if error_freq is non-zero.

10.3 Output

The output is the input data, in LINKAGE format, with inconsistent data and likely typing errors replaced by blank alleles. As with the other ASPEX programs, all the inconsistencies and likely typing errors are also reported.

Reanalysis of a dataset after cleaning with sib_clean may yield different results in some cases. When estimating allele frequencies from the input data, the ASPEX programs use the raw input data without removing inconsistencies or likely errors, so the cleaned data will yield slightly different allele frequency estimates. The differences in analysis results will generally be very small.
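The effect on frequency estimates can be illustrated with a small sketch that assumes simple allele counting; how ASPEX actually estimates frequencies is not described here, and the function name and the toy genotypes are hypothetical.

```python
from collections import Counter

def estimate_freqs(genotypes, blank="0"):
    """Estimate allele frequencies by counting non-blank alleles."""
    counts = Counter(a for g in genotypes for a in g if a != blank)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()} if total else {}

# Toy data at one marker: the last two genotypes belong to a family that
# is blanked in the cleaned file.
raw = [("1", "2"), ("1", "1"), ("2", "2"), ("1", "2")]
cleaned = [("1", "2"), ("1", "1"), ("0", "0"), ("0", "0")]

raw_freqs = estimate_freqs(raw)
cleaned_freqs = estimate_freqs(cleaned)
```

With only four genotypes the shift is exaggerated; in a realistic dataset, where few families are blanked at any marker, the change in the estimates would be correspondingly small.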

