At the top of our tower lies secondary structure prediction. This is, in some sense, the ultimate goal of our system and the end of the bioinformatist's job. Our secondary structure predictions are based on multiple sequence alignments. These alignments indicate conserved and non-conserved areas of a protein and highlight the different structural units, such as alpha helices and beta sheets. Of course, this implies that the accuracy of our structure predictions depends on the accuracy of our multiple sequence alignments. In turn, the accuracy of our alignments depends on how accurately our phylogenetic trees represent the true ancestral relationships between the species from which the sequences are taken. Our phylogenetic trees are constructed from the pairwise distances and variances derived from the pairwise comparison of protein sequences. And, at the bottom of our tower, the protein sequences are extracted from the raw DNA or RNA supplied to us by the biochemist.
Of course, any mistake at any level of this tower percolates upwards. Conversely, so does any improvement to an algorithm.
We do not claim that the solutions presented herein are the only or the best way to solve any particular bioinformatics problem. The algorithms we have chosen to include in the Darwin libraries have strong arguments, both mathematical and biological, suggesting they will perform well in practice. However, there are other methods (possibly with unrealistic resource demands) that may be more pertinent to your particular situation and data. The strength of Darwin lies in the fact that any method (assuming it is algorithmic) can be programmed in the language.
Each of the following chapters contains:
Beyond the understanding of the Darwin libraries, we hope such a presentation gives users