Improvements to FASTA
-
The distribution of similarity scores as a function of ln n, where n
is the library sequence length, is used to scale similarity scores and
derive accurate expectation values. Expectation values for the
highest-scoring unrelated sequences typically range from 0.5 to 2.0.
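A minimal sketch of how such scaling might work, assuming a simple
least-squares regression of score against ln n and an extreme-value
form for the tail probability. The function names are hypothetical,
and the constant-variance residual model is a simplification for
illustration, not a description of the FASTA source code.

```python
import math

def fit_ln_regression(lengths, scores):
    """Least-squares fit of score = a + b * ln(n); returns (a, b)."""
    xs = [math.log(n) for n in lengths]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(scores) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def z_scores(lengths, scores):
    """Scale each score by its residual from the ln(n) regression line."""
    a, b = fit_ln_regression(lengths, scores)
    residuals = [s - (a + b * math.log(n)) for n, s in zip(lengths, scores)]
    # Assumption: one pooled standard deviation for all lengths.
    sd = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    if sd == 0:
        raise ValueError("scores lie exactly on the regression line")
    return [r / sd for r in residuals]

def expectation(z, library_size):
    """Expected number of library sequences scoring >= z by chance.

    Assumption: z follows an extreme-value distribution with mean 0 and
    variance 1, so P(Z >= z) = 1 - exp(-exp(-(pi/sqrt(6)) * z - 0.5772...)).
    """
    p = 1.0 - math.exp(-math.exp(-(math.pi / math.sqrt(6)) * z - 0.5772156649))
    return library_size * p
```

Under these assumptions, the expectation value shrinks rapidly as z
grows and scales linearly with the number of library sequences.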
-
Optimized scores are calculated for about half of the library
sequences. Searches with ktup=1 use a band width of 32; searches with
ktup=2 use a width of 16. DNA searches also use a width of 16.
This calculation improves the performance of FASTA significantly but
slows the program about 2-fold.
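The idea of a band-limited optimization can be sketched as a local
alignment restricted to a strip of diagonals. This is an illustration
only: the function name, the `center_diag` parameter, the reading of
"width" as the total strip width, and the single linear gap penalty
are all assumptions made for brevity.

```python
def banded_local_score(a, b, center_diag=0, width=16,
                       match=5, mismatch=-4, gap=-16):
    """Best local alignment score using only cells (i, j) whose diagonal
    j - i lies within width // 2 of center_diag.

    Assumption: a single linear gap penalty per gapped residue.
    """
    half = width // 2
    best = 0
    prev = {}  # column index -> score in the previous row (band cells only)
    for i in range(1, len(a) + 1):
        cur = {}
        lo = max(1, i + center_diag - half)
        hi = min(len(b), i + center_diag + half)
        for j in range(lo, hi + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Cells outside the band are treated as score 0, i.e. an
            # alignment may start at the band edge but not cross it.
            score = max(0,
                        prev.get(j - 1, 0) + s,   # extend along the diagonal
                        prev.get(j, 0) + gap,     # gap opposite a[i-1]
                        cur.get(j - 1, 0) + gap)  # gap opposite b[j-1]
            cur[j] = score
            best = max(best, score)
        prev = cur
    return best
```

Only about `width` cells are evaluated per row, so the cost grows with
the band width rather than with the full length of the second sequence.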
-
The BLOSUM50 matrix is used for protein sequence comparison by
default. Default gap penalties are -12, -2 for FASTA and SSEARCH;
-14, -4 for LALIGN; and -16, -4 for TFASTA. The DNA scoring matrix is
the same as that used by BLASTN (+5 for a match, -4 for a mismatch),
with a default gap penalty of -16, -4. (Gap penalties can be changed
on the command line.)
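As a worked illustration of these defaults, the sketch below tabulates
them and computes the cost of a gap of a given length. It assumes the
first number of each pair is the penalty for the first residue in a
gap and the second is the penalty for each additional residue; that
reading, and all names used here, are assumptions for illustration.

```python
# Default (first, additional) gap penalties as listed above.
DEFAULT_GAP_PENALTIES = {
    "FASTA": (-12, -2),
    "SSEARCH": (-12, -2),
    "LALIGN": (-14, -4),
    "TFASTA": (-16, -4),
    "DNA": (-16, -4),
}

def gap_cost(length, first, additional):
    """Total penalty for one gap of `length` residues.

    Assumption: `first` is charged for the first gapped residue and
    `additional` for every residue after it.
    """
    if length <= 0:
        return 0
    return first + (length - 1) * additional

def dna_score(x, y):
    """BLASTN-style DNA scoring: +5 for a match, -4 for a mismatch."""
    return 5 if x == y else -4
```

Under this reading, a three-residue gap in a FASTA protein search
would cost -12 + 2 * (-2) = -16.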
-
Final alignments are produced with the Smith-Waterman algorithm with
no limits on gaps.
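For reference, an unbanded Smith-Waterman score with affine gap
penalties can be sketched as follows. This is a generic textbook
formulation (score only, no traceback), not the program's own code.
The function name, the simple match/mismatch scoring, and the
first-residue/additional-residue reading of the gap penalties are
assumptions.

```python
def smith_waterman_score(a, b, match=5, mismatch=-4,
                         first=-16, additional=-4):
    """Best local alignment score over the full matrix (no band)."""
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    h_prev = [0] * (m + 1)        # best score ending at (i-1, j)
    f_prev = [neg_inf] * (m + 1)  # best score ending in a gap down column j
    best = 0
    for i in range(1, n + 1):
        h_cur = [0] * (m + 1)
        f_cur = [neg_inf] * (m + 1)
        e = neg_inf               # best score ending in a gap along row i
        for j in range(1, m + 1):
            # Open a new gap or extend an existing one, in each direction.
            e = max(h_cur[j - 1] + first, e + additional)
            f_cur[j] = max(h_prev[j] + first, f_prev[j] + additional)
            s = match if a[i - 1] == b[j - 1] else mismatch
            h_cur[j] = max(0, h_prev[j - 1] + s, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best
```

Because every cell of the matrix is evaluated, a gap of any length
can be placed anywhere in the alignment, at the cost of time
proportional to the product of the two sequence lengths.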
wrp@virginia.edu