
Sequence alignment.

Aligning two sequences is the process of associating some positions of each sequence with positions of the other sequence, in a way that preserves the order of both sequences. For example:
 VNRLQQNIVSL____________EVDHKVANYKPQVEPFGHGPIFMATALVPGLYLGVPWF
 VNRLQQSIVSLRDAFNDGTKLLEELDHRVLNYKPQANPFGNGPIFMVTAIVPGLHLGAPWF
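To make the definition concrete, here is a minimal Python sketch that reads two gapped rows like the ones above and recovers the position association they encode. It assumes, as in the example, that `_` marks an unassociated position. The names `aligned_pairs` and `preserves_order` are hypothetical helpers for illustration, not part of any library.

```python
def aligned_pairs(row1, row2, gap="_"):
    """Recover the position association encoded by two gapped alignment rows.

    Returns (pairs, unassociated1, unassociated2). pairs lists (i, j) for every
    column where both rows hold a residue; i and j are 0-based positions in the
    ungapped sequences. The other two lists hold positions with no partner.
    """
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have the same length")
    i = j = 0  # next position in each ungapped sequence
    pairs, unassociated1, unassociated2 = [], [], []
    for a, b in zip(row1, row2):
        if a != gap and b != gap:
            pairs.append((i, j))
        elif a != gap:
            unassociated1.append(i)
        elif b != gap:
            unassociated2.append(j)
        # advance the counter of every row that has a residue in this column
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs, unassociated1, unassociated2


def preserves_order(pairs):
    """True if positions increase strictly in both sequences along the pairs."""
    return all(p[0] < q[0] and p[1] < q[1] for p, q in zip(pairs, pairs[1:]))
```

Applied to the two rows above, every position of the first sequence is associated, the positions of the second sequence facing the underscores are left unassociated, and `preserves_order` returns `True` for the resulting pairs.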
Unassociated positions are called insertions (or, seen from the other sequence, deletions). Aligning protein sequences by dynamic programming (DP) using Dayhoff matrices is equivalent to finding the alignment that maximizes the probability that the two sequences evolved from a common ancestral sequence, as opposed to being random sequences. More precisely, we are comparing two events:

a)
that the two sequences are independent of each other, and hence that an arbitrary position with amino acid $i$ aligned to another arbitrary position with amino acid $j$ has a probability equal to the product of the individual frequencies:

\begin{displaymath}Pr \{ \mbox{$i$\space and $j$\space are independent} \} = f_i f_j \end{displaymath}

b)
that the two sequences have evolved from some common ancestral sequence, each after $t$ units of evolution:

\begin{eqnarray*}Pr \{ \mbox{$i$\space and $j$\space descended from $x$ } \}
& = & \sum_x f_x (M^t)_{ix} (M^t)_{jx} = \sum_x f_j (M^t)_{ix} (M^t)_{xj} \\
& = & f_j (M^{2t})_{ij} = f_i (M^{2t})_{ji}
\end{eqnarray*}
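Both probabilities can be checked numerically. The Python sketch below assumes a hypothetical three-letter alphabet and a toy mutation matrix, not a Dayhoff matrix: `m[i][j]` is the probability that symbol `j` mutates to `i` in one unit, with off-diagonal entries `s * f[i]`. This toy model is reversible with stationary frequencies `f`, which is what the equalities above need. The function names are illustrative only.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, t):
    """m raised to a non-negative integer power t, by repeated squaring."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    while t > 0:
        if t & 1:
            result = mat_mul(result, m)
        m = mat_mul(m, m)
        t >>= 1
    return result


def toy_mutation_matrix(f, s):
    """Hypothetical one-unit mutation matrix, m[i][j] = Pr{j mutates to i}.

    Off-diagonal entries are s * f[i]; diagonal entries make each column sum
    to 1. Then m[i][j] * f[j] == m[j][i] * f[i], so the model is reversible
    and f is its stationary distribution.
    """
    n = len(f)
    return [[1 - s * (1 - f[j]) if i == j else s * f[i] for j in range(n)]
            for i in range(n)]


def pr_independent(f, i, j):
    """Event a): i and j drawn independently with their individual frequencies."""
    return f[i] * f[j]


def pr_common_ancestor(f, m, t, i, j):
    """Event b): sum over every ancestor x of f_x (M^t)_{ix} (M^t)_{jx}."""
    mt = mat_pow(m, t)
    return sum(f[x] * mt[i][x] * mt[j][x] for x in range(len(f)))
```

For instance, with `f = [0.5, 0.3, 0.2]`, `m = toy_mutation_matrix(f, 0.01)` and `t = 20`, `pr_common_ancestor(f, m, t, i, j)` agrees with `f[j] * mat_pow(m, 2 * t)[i][j]` and with `f[i] * mat_pow(m, 2 * t)[j][i]` up to rounding, for every pair `i`, `j`.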


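We use $\sum_i$ as a shorthand for $\sum_{i \in \Sigma}$, that is, a sum over all symbols of the alphabet $\Sigma$. In the formula above, $x$ ranges over all possible ancestral amino acids, and the equalities rely on the reversibility of the mutation process, $f_x (M^t)_{jx} = f_j (M^t)_{xj}$. The entries of the Dayhoff matrix are ten times the base-10 logarithm of the ratio of these two probabilities: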

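\begin{displaymath}D_{ij} = 10 \log_{10} \left ( \frac
{ Pr \{ \mbox{$i$\space and $j$\space descended from $x$ } \} }
{ Pr \{ \mbox{$i$\space and $j$\space are independent} \} } \right )
= 10 \log_{10} \left ( \frac{ f_j (M^{2t})_{ij} }{ f_i f_j } \right )
= 10 \log_{10} \left ( \frac{ (M^{2t})_{ij} }{ f_i } \right )
\end{displaymath}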
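The definition translates directly into code. This Python sketch computes every $D_{ij}$ from the frequencies and from $M^{2t}$. For a self-contained example it assumes the same hypothetical toy model $M = (1-s)I + s f \mathbf{1}^T$ (not a Dayhoff matrix), whose powers have the closed form $M^k = (1-s)^k I + (1-(1-s)^k) f \mathbf{1}^T$ because $f \mathbf{1}^T$ is idempotent. `toy_m_power`, `dayhoff_score` and `dayhoff_matrix` are illustrative names.

```python
import math


def toy_m_power(f, s, k):
    """M^k for the hypothetical model M = (1 - s) I + s f 1^T.

    Entry [i][j] is Pr{j becomes i after k units}. Because f 1^T is
    idempotent, M^k = (1 - s)^k I + (1 - (1 - s)^k) f 1^T in closed form.
    """
    keep = (1 - s) ** k
    n = len(f)
    return [[(keep if i == j else 0.0) + (1 - keep) * f[i] for j in range(n)]
            for i in range(n)]


def dayhoff_score(f, m2t, i, j):
    """10 log10 of Pr{common ancestor} / Pr{independent} for symbols i, j."""
    p_ancestor = f[j] * m2t[i][j]  # f_j (M^2t)_ij
    p_independent = f[i] * f[j]    # f_i f_j
    return 10 * math.log10(p_ancestor / p_independent)


def dayhoff_matrix(f, m2t):
    """All entries D_ij for the given frequencies and matrix M^(2t)."""
    n = len(f)
    return [[dayhoff_score(f, m2t, i, j) for j in range(n)] for i in range(n)]
```

With `f = [0.5, 0.3, 0.2]` and `m2t = toy_m_power(f, 0.01, 250)`, the resulting matrix is symmetric (a consequence of reversibility), its diagonal is positive (identical residues are more likely under common descent) and, in this toy model, its off-diagonal entries are negative.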

Since DP maximizes the sum of the similarity scores of the aligned positions, it maximizes the sum of these logarithms, and hence the product of the probability ratios. Consequently, DP finds the alignment that maximizes the probability of the two sequences having evolved from a common ancestor (a maximum likelihood alignment), measured against the null hypothesis that they are independent. This gives sequence alignment with Dayhoff matrices a sound probabilistic basis.
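The DP itself can be sketched as a textbook global alignment recurrence that maximizes the sum of `score(a, b)` over associated pairs. In practice `score` would return the Dayhoff entry $D_{ab}$. This text does not say how insertions and deletions are scored, so the constant penalty `gap` per unassociated position is a simplifying assumption of this sketch, not a model taken from the source. `align` is a hypothetical name.

```python
def align(s1, s2, score, gap=-10.0):
    """Global DP alignment of s1 and s2.

    Maximizes the sum of score(a, b) over associated pairs plus `gap` for
    every unassociated position (a hypothetical linear gap model).
    Returns (best_total, row1, row2), with '_' marking unassociated positions.
    """
    n, m = len(s1), len(s2)
    # best[i][j] = best total for aligning the prefixes s1[:i] and s2[:j]
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = best[i - 1][0] + gap
    for j in range(1, m + 1):
        best[0][j] = best[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + score(s1[i - 1], s2[j - 1]),  # associate
                best[i - 1][j] + gap,  # s1[i-1] unassociated
                best[i][j - 1] + gap,  # s2[j-1] unassociated
            )
    # Trace back one optimal path; recomputing the same float sums
    # reproduces the stored values exactly, so == is safe here.
    row1, row2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and best[i][j] == best[i - 1][j - 1] + score(s1[i - 1], s2[j - 1]):
            row1.append(s1[i - 1])
            row2.append(s2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and best[i][j] == best[i - 1][j] + gap:
            row1.append(s1[i - 1])
            row2.append("_")
            i -= 1
        else:
            row1.append("_")
            row2.append(s2[j - 1])
            j -= 1
    return best[n][m], "".join(reversed(row1)), "".join(reversed(row2))
```

For example, with a simple stand-in score of 5 for identical residues and -3 otherwise, `align("VNRLQQ", "VNRQQ", lambda a, b: 5.0 if a == b else -3.0, gap=-4.0)` returns `(21.0, "VNRLQQ", "VNR_QQ")`. Because each $D_{ij}$ is a logarithm, the sum this recurrence maximizes corresponds to the product of probability ratios discussed above.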


Gaston Gonnet
1998-07-14