37-524 Computational Biology 2000-01

Discussion: IFW E42 Thursday 14:00-15:00
Lecture: IFW E42 Thursday 15:00-17:00

Professor Gaston Gonnet
Dr. Gina Cannarozzi
Computational Biochemistry Research Group, E.T.H. Zürich
Ch-8092 Zürich, Switzerland
gonnet,cannaroz@inf.ethz.ch

Email Gina cannaroz@inf.ethz.ch if you have questions or want to be added to the mailing list for the class.

Books No books will be required but much of the material will come from CBRG papers and the Darwin Manual.
We will also follow closely some chapters from the book by Dan Gusfield entitled Algorithms on Strings, Trees, and Sequences . An introduction to biology can be found in the Cartoon Guide to Genetics by Larry Gonick.

Core computer science/mathematics topics covered in the course:

1.
Basic probability and statistics theory, Markov processes, jackknifing, bootstrapping (models of evolution),
2.
Dynamic programming (string alignment),
3.
Local and global search techniques for problems modelled discretely (phylogenetic tree construction),
4.
Exact algorithms for discrete problems (parsimony, Bunneman trees),
5.
Parameterized algorithms for discrete problems (character compatability),
6.
Basic approximation techniques and heuristic design (tree construction, genome level events),
7.
Fundamental/advanced data structures (patricia and suffix trees),
8.
Numerical techniques (Brent's algorithm, least squares fit, ultrametrics),
9.
Programming with the specialized interpreted language Darwin,
10.
Molecular computing (time permitted).
Biological concepts covered in the course:
1.
Models of evolution,
2.
Aligning an amino acid sequence against amino acid sequences,
3.
Aligning an amino acid sequence against a nucleotide sequence,
4.
Phylogenetic (evolutionary) tree construction,
5.
the use of multiple sequences alignments to predict a proteins secondary structure,
6.
protein domains and protein domain families,
7.
evolutionary events at the genomic level (inversions, duplications, lateral transfers).
8.
Protein identification through mass spectrometry

Week 1 (Oct. 26, 2000) - Basic Biology

Introduction to Bioinformatics. DNA, RNA, Amino acid sequences, genes, biochemical pathways, genomes. Mutation. Insertion. Deletion. Duplication. Reversion. Speciation. Phylogenetic (evolutionary trees).

Uebung: Web Resources


Week 2 (Nov. 2, 2000) - Approximate String Alignment I

Longest Common Subsequence. Dynamic programming: (1) Recurrence Relations, (2) Tabular Computation (3) Traceback. Alignment. Similarity Scores. Constant Gap Penalty. Cost-free End spaces.

Week 2 notes


Week 3 (Nov. 9, 2000) - Approximate String Alignment continued

Local Alignment. Affine Gap Penalties. Arbitrary Gap Penalties. Linear Space. FASTA/BLAST. Note on 4 Russians Speedup.

Week 3 notes


Week 4 (Nov. 16, 2000) - Approximate String Alignment continued

Construction of Mutation Matrices. Empirical Evidence Supporting Gap Penalties. Maximum Likelihood. Darwin Techniques for Aligning.
Week 4 notes


Week 5 (Nov. 23, 2000) - Measuring Genetic Change - Suffix Trees

Jukes-Cantor, Kimura's 2 Parameter Model, Felsenstein 81, REV model, Gamma Parameters. Discussion of genetic databases - rates of growth, nature of the data. P against All. Threshold All against All. Introduction to Suffix Trees. Linear time construction of Suffix trees.
Week 5 notes


Week 6 (Nov. 30, 2000) - Phylogenetic Trees I: Introduction

Basic Introduction to Phylogenetic Trees. History. Tree Definitions. Optimality Criteria. Fitch-Wagner Parsimony. Dollo Parsimony (Character Compatibility issues). Generalized Parsimony (Sankoff's Algorithm). Felsenstein's Proof of Parsimony's Statistical Inconsistency.
Week 6 notes


Week 7 (Dec. 7, 2000) - Phylogenetic Trees II: Parsimony and Introduction to Maximum Likelihood

Parsimony with DNA sequences. Fitch's Algorithm. Dayhoff and Eck's Algorithm. Assumptions of Parsimony. Inconsistency Revisited. Introduction to Maximum Likelihood. Inverting Markov Processes. Consistency. Different Likelihood Models. Calculating the Likelihood of a Tree. Negative Misleadingness.

Week 7 notes


Week 8 (Dec. 14, 2000)- Phylogenetic Trees III: Maximum Likelihood and Distance Based Methods

Review of Maximum Likelihood. Example of Scoring a Tree. Distance Based Methods. L-norms. Example Using Pairwise Distance Measures. The Gap Problem (Introduction to Multiple Sequence Alignment). Calculation of the Size of the Space.

NOTE: This week will have 3 hours of lecture and no uebung because there will be no class held on Dec. 21.

Week 8 notes


Week 9 (Dec. 21, 2000) - Phylogenetic Trees IV: Initial Topologies, Fitting Distances to Trees

Note on Hardness of All Three Techniques. Ultrametrics. Additivity. Initial Topologies. Cluster Methods. Neighbour Joining. Note of the Factor 3 Approximation. Fitting Distances to Trees.

NOTE: There will be no class or uebung this week. This class is rescheduled to the uebung hour of Dec. 14.


Week 10 (Jan 11, 2001) - Multiple Sequence Alignment I

Basic Introduction. Notion of Scoring an alignment: Sum of Squares. k- dimensional dynamic Programming. Criticism of Sum of Squares.

Week 10 notes


Week 11 (Jan 18, 2001) - Multiple Sequence Alignment II

Circular Order Scoring of Multiple Sequence Alignments. Two Reasons Why a Tree is Important for Building a MSA. Building a MSA given a tree. A Note of Practical Algorithms for this Problem. Definition of Protein Families. Characterizing Protein Families. Sequence vs. Family Comparison.

Week 11 notes


Week 12 (Jan 25, 2000) Identifying Proteins from Molecular Weights with Mass Spectrometers.

2D gels. Mass Spectrometers. Applications. Aligning Weights against a Sequence Database. Problems: Error, Too Many Matches, No Order Information, AA Weights Not Unique. Maximum Likelihood Formulation. Generating Functions. Experimental results.

Week 12 notes


Week 13 (Feb 1, 2000) Secondary Structure and Tertiary Prediction, Long Distance Homologies.

Overview of Common Experimental Approaches- NMR, X-ray Crystallography vs "in silico predictions". Secondary structure- Alpha Helices, Beta Strands, Sheets, Turns. Super Secondary Structure. Enzymes. SAINT: Predicting Parses, Active Sites, Surface/Interior Residues, Secondary Structure. A note on PHD (neural networks), nnpredict, SOPMA. Tertiary Structure, Swiss Model, Topit (long Distance Homologs) Dali.

Week 13 notes


Week 14 (Feb 8, 2000) Secondary Structure and Tertiary Prediction, Long Distance Homologies. II

Gina Cannarozzi

2000-10-25