
   
Insertions and Deletions

Mutations occur at the DNA level and through the mechanism of gene expression these changes may manifest themselves in the protein encoded by the gene. The commonly used model for mutations at the DNA level partitions events into three categories:
1. Point accepted mutations. Base i mutates to base j.
2. Insertions. A subsequence of DNA is inserted into a sequence.
3. Deletions. A subsequence of DNA is deleted from a sequence.

In the simplest theoretical model, all sequence mutations are explained as deletions and insertions of genetic material. A point accepted mutation corresponds to the deletion of a base followed by the insertion of a base at the same position in the sequence. Gene shuffling can be viewed as a series of deletions and insertions of subsequences. However, such a simplistic model does not take into account the relative probabilities of these events occurring. Intuitively, a single point mutation should be much more probable than a deletion event followed by an insertion event at the same position in the sequence. For this reason, we distinguish between point mutations and insertion/deletion events (indels).
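
To make the equivalence concrete, the following is a minimal Python sketch (not Darwin code) showing that a point mutation yields the same sequence as a one-base deletion followed by a one-base insertion at the same position. The function names `delete`, `insert` and `point_mutation`, and the toy DNA string, are hypothetical and chosen only for illustration.

```python
def delete(seq, pos, length=1):
    """Remove `length` symbols starting at index `pos` (0-based)."""
    return seq[:pos] + seq[pos + length:]


def insert(seq, pos, fragment):
    """Insert `fragment` so that it starts at index `pos` (0-based)."""
    return seq[:pos] + fragment + seq[pos:]


def point_mutation(seq, pos, base):
    """Replace the single symbol at index `pos` with `base`."""
    return seq[:pos] + base + seq[pos + 1:]


original = "ACGTAC"  # toy sequence, not taken from the text

# One event: the G at index 2 mutates to T.
direct = point_mutation(original, 2, "T")

# Two events: delete the base at index 2, then insert T at index 2.
via_indels = insert(delete(original, 2), 2, "T")

assert direct == via_indels == "ACTTAC"
```

Both routes produce the same final sequence, so the sequence alone cannot tell them apart. The difference lies only in how probable each explanation is, which is why the two kinds of event are modelled separately.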

As is the case with our model for point mutations, our model for deletions and insertions is entirely symmetric. Given an alignment of two sequences A and B containing at least one insertion or deletion event, such as the following:

    Sequence A:  ALAEGLGVIACIGEKLDEREAGITEKVVFEQTKVIADNVKDW
    Sequence B:  CKNLGLETIVCTNN______________INTSKAVAALSPDY
we cannot determine whether sequence A has undergone an insertion or sequence B has undergone a deletion unless we know the ancestor of A and B. Because we cannot resolve this issue without an assumed origin, we treat both events in the same fashion.
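
The symmetric treatment can be sketched in Python (again, not Darwin code) by recording each gap only as a position and a length, without assigning it to an insertion in one sequence or a deletion in the other. The helper names `gap_runs` and `indel_events` are hypothetical. The gap length of 14 is an assumption, chosen so that both rows of the alignment above have the same length.

```python
GAP = "_"  # gap character used in the alignment shown above


def gap_runs(aligned):
    """Return (start, length) for each maximal run of gap characters."""
    runs = []
    start = None
    for i, ch in enumerate(aligned):
        if ch == GAP:
            if start is None:
                start = i  # a new gap run begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:  # the row ends inside a gap run
        runs.append((start, len(aligned) - start))
    return runs


def indel_events(row_a, row_b):
    """Collect gap runs from both rows without assigning a direction."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    return sorted(gap_runs(row_a) + gap_runs(row_b))


seq_a = "ALAEGLGVIACIGEKLDEREAGITEKVVFEQTKVIADNVKDW"
# Assumption: the gap spans 14 positions, so both rows have 42 columns.
seq_b = "CKNLGLETIVCTNN" + GAP * 14 + "INTSKAVAALSPDY"

events = indel_events(seq_a, seq_b)

# Swapping the two rows gives the same list of events.
assert events == indel_events(seq_b, seq_a)
```

Because the result does not depend on the order of the arguments, the same event is reported whether we think of it as an insertion in A or a deletion in B.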



 
Gaston Gonnet
1998-09-15