
The LocalAlign Function

When two sequences align well except at the ends, it is sometimes desirable to ignore these tails and align only the two subsequences (contiguous stretches of the original sequences) that align best. The alignment from the previous subsection between the tyrosine protein kinases ABL1_CAEEL (C. elegans) and ABL2_HUMAN (human) is extremely poor at the beginning: only three residues align before an extremely long gap, which is interrupted by a five-residue alignment before yet another long gap.

The LocalAlign function performs such a subsequence alignment, or local alignment. It is an implementation of the classic Smith-Waterman algorithm [25], a straightforward variant of dynamic programming with some nice properties that make it extremely fast.
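
To make the idea concrete, here is a minimal Python sketch of Smith-Waterman local alignment. It is not Darwin code and not the LocalAlign implementation: the function name smith_waterman is hypothetical, and the flat match/mismatch scores and the linear gap penalty are simplifying assumptions standing in for the Dayhoff matrix DM and Darwin's gap costs. The key difference from global alignment is that every cell is floored at zero, so a poorly matching tail can never drag down the score of a good region.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Scores are illustrative assumptions: a flat match/mismatch value and a
    linear gap penalty, not a Dayhoff matrix with length-dependent gap costs.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] is the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 lets an alignment restart here instead of carrying a
            # negative score forward; this is what makes the alignment local.
            H[i][j] = max(0, H[i - 1][j - 1] + pair,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        pair = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + pair:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append('_')  # gap in b
            i -= 1
        else:
            out_a.append('_'); out_b.append(b[j - 1])  # gap in a
            j -= 1
    return best, ''.join(reversed(out_a)), ''.join(reversed(out_b))


# Two toy sequences that share a core but have unrelated ends.
score, top, bottom = smith_waterman("XXXXGWVPSNYYYY", "ZZGWVPSNZZ")
print(score, top, bottom)
```

With these toy inputs only the shared core is reported and the unrelated ends are dropped, which is the behaviour the Darwin session below shows on real protein sequences.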

> DB:=ReadDb('Sample/SH2');
> CreateDayMatrices();                  # calculate matrix DM
> m1 := Match(op(Sequence(Entry(1))), op(Sequence(Entry(2)))):
> Glob_m1 := GlobalAlign(m1, DM);
Glob_m1 := Match(1127.6,367,1338,492,618,250)
> Loc_m1 := LocalAlign(m1,DM);
Loc_m1 := Match(1330.5,378,1477,481,479,250)
> print(");

lengths=481,479 simil=1330.5, PAM_dist=250, offsets=378,1477,
  identity=52.8%, similarity=29.3%
ID=ABL1_CAEEL   AC=P03949;   DE=TYROSINE-PROTEIN KINASE ABL-1 (EC 2.7.1.112)
(FRAGMENT).   OS=CAENORHABDITIS ELEGANS.   
ID=ABL2_HUMAN   AC=P42684;   DE=TYROSINE-PROTEIN KINASE ABL2 (EC 2.7.1.112)
(TYROSINE KINASE ARG).   OS=HOMO SAPIENS (HUMAN).   
TRKNDASNQRRLGEIGWVPSNFIAPYNSLDKYTWYHGKISRSDSEAILGSGITGSFLVRESETSIGQYTISVRHDGRVFH
::::! |:.|..:..||||||!|:|.|||!|::||||.!|||.:|.!|:| |:|||||||||:| ||.:||:|:!|||!|
NQNGEWSEVRSKNGQGWVPSNYITPVNSLEKHSWYHGPVSRSAAEYLLSSLINGSFLVRESESSPGQLSISLRYEGRVYH

YRINVDNTEKMFITQEVKFRTLGELVHHHSVHADGLICLLMYPASKKDKGRGLFSLSPNAPDEWELDRSEIIMHNKLGGG
||||:....|:!!|.|.!|.||:|||||||:.||||!..|.|||:| :|.. :!::|| ..|:||!!|:!|.|::|||||
YRINTTADGKVYVTAESRFSTLAELVHHHSTVADGLVTTLHYPAPKCNKPT_VYGVSP_IHDKWEMERTDITMKHKLGGG

QYGDVYEGYWKRHDCTIAVKALKEDAMPLHEFLAEAAIMKDLHHKNLVRLLGVCTHEAPFYIITEFMCNGNLLEYLRRTD
|||!||.|.||!::.|!|||:||||:|.::|||.|||!||!!:|.|||:||||||.|:||||!||!| .||||!|||:.:
QYGEVYVGVWKKYSLTVAVKTLKEDTMEVEEFLKEAAVMKEIKHPNLVQLLGVCTLEPPFYIVTEYMPYGNLLDYLRECN

KSLLPPIILVQMASQIASGMSYLEARHFIHRDLAARNCLVSEHNIVKIADFGLARFMKEDTYTAHAGAKFPIKWTAPEGL
!: :::!!|:.||:||:|:|:|||.!:|||||||||||||:|::!||!|||||:|:|:.|||||||||||||||||||:|
REEVTAVVLLYMATQISSAMEYLEKKNFIHRDLAARNCLVGENHVVKVADFGLSRLMTGDTYTAHAGAKFPIKWTAPESL

AFNTFSSKSDVWAFGVLLWEIATYGMAPYPGVELSNVYGLLENGFRMDGPQGCPPSVYRLMLQCWNWSPSDRPRFRDIHF
|!||||.|||||||||||||||||||:||||!!||:||:|||:|!||!.|:||||:||:||..||:|||:|||.|.!.| 
AYNTFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYDLLEKGYRMEQPEGCPPKVYELMRACWKWSPADRPSFAETHQ

NLENLISSNSLNDEVQKQLKKNNDKKLESDKRRSNVRERSDSKSRHSSHHDRDRDRESLHSRNSNPEIPNRSFIRTDDSV
.:|:!:.::|!:!||.::|.!..:::........ .. .|::!:.:::.::!!: :.:.::.:::::....:|||..::.
AFETMFHDSSISEEVAEELGRAASSSSVVPYLPRLPILPSKTRTLKKQVENKENIEGAQDATENSASSLAPGFIRGAQAS

S
|
S

The similarity score has climbed by more than 200 points, from 1127.6 to 1330.5 (it is now 10^133.05 times more likely that these sequences share a common ancestor than that they are simply a random alignment). Comparing the two alignments, one can see that the first two gaps (plus three extra residues of low-quality alignment) have been removed. The two gaps in the original alignment created with GlobalAlign were of lengths 91 and 37, respectively. These contributed

(-19.814 - 1.396*(91-1)) + (-19.814 - 1.396*(37-1)) = -215.524

to the overall score.
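
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the names GAP_OPEN, GAP_EXTEND and gap_cost are hypothetical, with the two constants read directly off the formula above. The final line assumes that the score is ten times the base-10 logarithm of the likelihood ratio, which is how the exponent quoted for the score 1330.5 appears to be derived.

```python
GAP_OPEN = -19.814    # cost of the first position of a gap (from the formula above)
GAP_EXTEND = -1.396   # cost of each additional gap position


def gap_cost(length):
    """Cost of one gap of the given length: opening plus (length - 1) extensions."""
    return GAP_OPEN + GAP_EXTEND * (length - 1)


# The two gaps that GlobalAlign introduced and LocalAlign avoided.
total = gap_cost(91) + gap_cost(37)

# Score improvement reported in the session above.
gain = 1330.5 - 1127.6

# Assumed reading of the score: 10 * log10(likelihood ratio).
log10_ratio = 1330.5 / 10

print(f"gap penalty removed: {total:.3f}")
print(f"score gain:          {gain:.1f}")
print(f"log10 of ratio:      {log10_ratio:.2f}")
```

The removed gap penalty (about 215.5 points) is slightly larger than the observed gain (about 202.9 points), since trimming the ends also discards the few aligned residues that had contributed a small positive score.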


Gaston Gonnet
1998-09-15