Professor Gaston Gonnet
Dr. Gina Cannarozzi
Computational Biochemistry Research Group, E.T.H. Zürich
Ch-8092 Zürich, Switzerland
gonnet,cannaroz@inf.ethz.ch
Email Gina cannaroz@inf.ethz.ch if you have questions or want to be added to the mailing list for the class.
Books No books will be required but much of the material will
come from CBRG papers and the Darwin
Manual.
We will also follow closely some chapters from the book by Dan Gusfield
entitled Algorithms on Strings, Trees, and Sequences . An introduction
to biology can be found in the Cartoon Guide to Genetics by Larry
Gonick.
Core computer science/mathematics topics covered in the course:
Week 1 (Oct. 26, 2000) - Basic Biology
Introduction to Bioinformatics. DNA, RNA, Amino acid sequences, genes, biochemical pathways, genomes. Mutation. Insertion. Deletion. Duplication. Reversion. Speciation. Phylogenetic (evolutionary trees).
Uebung: Web Resources
Week 2 (Nov. 2, 2000) - Approximate String Alignment I
Longest Common Subsequence. Dynamic programming: (1) Recurrence Relations, (2) Tabular Computation (3) Traceback. Alignment. Similarity Scores. Constant Gap Penalty. Cost-free End spaces.
Week 3 (Nov. 9, 2000) - Approximate String Alignment continued
Local Alignment. Affine Gap Penalties. Arbitrary Gap Penalties. Linear Space. FASTA/BLAST. Note on 4 Russians Speedup.
Week 4 (Nov. 16, 2000) - Approximate String Alignment continued
Construction of Mutation Matrices. Empirical Evidence Supporting Gap
Penalties. Maximum Likelihood. Darwin Techniques for Aligning.
Week
4 notes
Week 5 (Nov. 23, 2000) - Measuring Genetic Change - Suffix Trees
Jukes-Cantor, Kimura's 2 Parameter Model, Felsenstein 81, REV model,
Gamma Parameters. Discussion of genetic databases - rates of growth, nature
of the data. P against All. Threshold All against All. Introduction to
Suffix Trees. Linear time construction of Suffix trees.
Week
5 notes
Week 6 (Nov. 30, 2000) - Phylogenetic Trees I: Introduction
Basic Introduction to Phylogenetic Trees. History. Tree Definitions.
Optimality Criteria. Fitch-Wagner Parsimony. Dollo Parsimony (Character
Compatibility issues). Generalized Parsimony (Sankoff's Algorithm). Felsenstein's
Proof of Parsimony's Statistical Inconsistency.
Week
6 notes
Week 7 (Dec. 7, 2000) - Phylogenetic Trees II: Parsimony and Introduction to Maximum Likelihood
Parsimony with DNA sequences. Fitch's Algorithm. Dayhoff and Eck's Algorithm. Assumptions of Parsimony. Inconsistency Revisited. Introduction to Maximum Likelihood. Inverting Markov Processes. Consistency. Different Likelihood Models. Calculating the Likelihood of a Tree. Negative Misleadingness.
Week 8 (Dec. 14, 2000)- Phylogenetic Trees III: Maximum Likelihood and Distance Based Methods
Review of Maximum Likelihood. Example of Scoring a Tree. Distance Based Methods. L-norms. Example Using Pairwise Distance Measures. The Gap Problem (Introduction to Multiple Sequence Alignment). Calculation of the Size of the Space.
NOTE: This week will have 3 hours of lecture and no uebung because there will be no class held on Dec. 21.
Week 9 (Dec. 21, 2000) - Phylogenetic Trees IV: Initial Topologies, Fitting Distances to Trees
Note on Hardness of All Three Techniques. Ultrametrics. Additivity. Initial Topologies. Cluster Methods. Neighbour Joining. Note of the Factor 3 Approximation. Fitting Distances to Trees.
NOTE: There will be no class or uebung this week. This class is rescheduled to the uebung hour of Dec. 14.
Week 10 (Jan 11, 2001) - Multiple Sequence Alignment I
Basic Introduction. Notion of Scoring an alignment: Sum of Squares. k- dimensional dynamic Programming. Criticism of Sum of Squares.
Week 11 (Jan 18, 2001) - Multiple Sequence Alignment II
Circular Order Scoring of Multiple Sequence Alignments. Two Reasons Why a Tree is Important for Building a MSA. Building a MSA given a tree. A Note of Practical Algorithms for this Problem. Definition of Protein Families. Characterizing Protein Families. Sequence vs. Family Comparison.
Week 12 (Jan 25, 2000) Identifying Proteins from Molecular Weights with Mass Spectrometers.
2D gels. Mass Spectrometers. Applications. Aligning Weights against a Sequence Database. Problems: Error, Too Many Matches, No Order Information, AA Weights Not Unique. Maximum Likelihood Formulation. Generating Functions. Experimental results.
Week 13 (Feb 1, 2000) Secondary Structure and Tertiary Prediction, Long Distance Homologies.
Overview of Common Experimental Approaches- NMR, X-ray Crystallography vs "in silico predictions". Secondary structure- Alpha Helices, Beta Strands, Sheets, Turns. Super Secondary Structure. Enzymes. SAINT: Predicting Parses, Active Sites, Surface/Interior Residues, Secondary Structure. A note on PHD (neural networks), nnpredict, SOPMA. Tertiary Structure, Swiss Model, Topit (long Distance Homologs) Dali.
Week 14 (Feb 8, 2000) Secondary Structure and Tertiary Prediction, Long Distance Homologies. II
Gina Cannarozzi