next up previous
Next: Bibliography Up: Tree construction using gaps Previous: Tree construction using gaps

Resolving conflicts

Usuall we do not have any guarantee that our MSA is correct. Hence we don't know if all the gaps appear at the right places. This leads to conflicts: assume that the gap in sequence B of event 4 in Figure [*] is shifted to the left by e.g. 3 positions. Let us consider events 1 and 4 only. Because of event 1 we conclude that sequences $\{A, B\}$ and $\{C, D, E\}$ are grouped together. But at the same time sequences $\{A, C, D\}$ and $\{B, E\}$ should also be grouped together, due to event 4, which leads to a contradiction. Generally we define a conflict as follows:

Definition 4.1 (Conflict)   Given an MSA $\mbox{\rm A}= \langle a_1, .., a_n\rangle$ and two indel events (gap blocks) that split the sequences into the sets A1, B1 (event 1) and A2, B2 (event 2). Then event 1 and event 2 have a conflict, if $A_1 \not \subseteq
A_2$, $B_1 \not \subseteq A_2$, $A_1 \not \subseteq B_2$ and $B_1
\not \subseteq B_2$ (see Figure [*]).

Finding the minimum number of events to ignore such that we have no conflicts is called the "vertex cover problem" in graph theory.

Problem 4.1 (Vertex Cover)   Given is a graph G = (V, E) with vertices V and edges E. The problem is to find the smallest subset $V' \in V$ such that for each edge $(a, b) \in G: a \in V'$ or $b \in V'$.

In our case each indel event (or gap block) is a vertex, and whenever there is a conflict between two events an edge is drawn that connects the two vertices. A minimum vertex cover corresponds to a minimum number of events such that the MSA ignoring those events is conflict free. Once all conflicts are removed, we can build an evolutionary tree from the remaining events (if we have enough events).
  
Figure: Gap conflict graph: each vertex corresponds to an indel event, each edge means a conflict between two events. Here an optimal solution is to remove vertices 5, 6 and 7.
\begin{figure}
\vspace{-0.5cm}
\begin{center}
\mbox{\psfig{file=gaps/graph1.ps,height=0.18\textheight,angle=-90} }
\end{center}
\end{figure}

The problem of finding the minimum vertex cover is NP-complete [25]. But usually the number of indel events is small (smaller than 50), when the size of the MSA is restricted to the size of a domain (which is usually at most 70 amino acids long). Vertex cover problems of this magnitude can be readily solved, there is a fixed parameter tractable algorithm [27] to solve the problem deterministically, when parameterized by the size of the vertex cover [26,29]. With this algorithm, conflict graphs with minimum vertex covers of size larger than 150 can be resolved in reasonable time. A more detailed version of the paper can be obtained at
http://www.inf.ethz.ch/personal/korosten/papers/gaps.ps. All algorithms described here have been implemented into the Darwin system [28] and are available upon request. The programs can also be used via our computational biochemistry server: http://cbrg.ethz.ch.
next up previous
Next: Bibliography Up: Tree construction using gaps Previous: Tree construction using gaps
Chantal Korostensky
1999-07-14