RESOURCES
Supplementary Readings
Chapter 1: BLAST and Karlin-Altschul Statistics
- A Model of Evolutionary Change in Proteins
- Identification of Common Molecular Subsequences
- Viral src gene products are related to the catalytic chain of mammalian cAMP-dependent protein kinase
- Basic Local Alignment Search Tool
- Amino Acid Substitution Matrices from an Information Theoretic Perspective
- Applications and statistics for multiple high-scoring segments in molecular sequences
- On-Line Construction of Suffix Trees
- BLAT—The BLAST-Like Alignment Tool
- BLAST Program Selection Guide
Chapter 2: Genome Assembly and Lander-Waterman Statistics
- Genomic Mapping by Fingerprinting Random Clones: A Mathematical Analysis
- A New Algorithm for DNA Sequence Assembly
- The Sequence of the Human Genome
- An Eulerian path approach to DNA fragment assembly
- Whole-genome shotgun assembly and comparison of human genome assemblies
- Velvet: Algorithms for de novo short read assembly using de Bruijn graphs
- How to apply de Bruijn graphs to genome assembly
- Why are de Bruijn graphs useful for genome assembly?
- Evaluation of the impact of Illumina error correction tools on de novo genome assembly
- Human contamination in bacterial genomes has created thousands of spurious proteins
- Scalable Genome Assembly through Parallel de Bruijn Graph Construction for Multiple k-mers
Chapter 3: Coalescent Theory and Ancestral Recombination Graphs
- Genealogical Trees, Coalescent Theory and the Analysis of Genetic Polymorphisms
- Comparative immunopeptidomics of humans and their pathogens
- Coalescent-Based Association Mapping and Fine Mapping of Complex Trait Loci
- Mapping Trait Loci by Use of Inferred Ancestral Recombination Graphs
- Approximating the coalescent with recombination
- Genome-Wide Inference of Ancestral Recombination Graphs
- Developments in coalescent theory from single loci to chromosomes
Chapter 4: Hidden Markov Models - The Learning Problem
- Maximum Likelihood from Incomplete Data via the EM Algorithm
- CpG Islands in Vertebrate Genomes
- A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition
- Maximum-Likelihood Estimation of Molecular Haplotype Frequencies in a Diploid Population
- A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models
- The EM Algorithm and the Rise of Computational Biology
- What is a hidden Markov model?
- Identification of CpG islands in DNA sequences using statistically optimal null filters
Chapter 5: Clustering Theory and Spectral Clustering
- Gene Expression Clustering with Functional Mixture Models
- Incremental genetic K-means algorithm and its application in gene expression data analysis
- A Tutorial on Spectral Clustering
- Spectral Clustering Based Classification Algorithm for Text Classification
- Spectral clustering via half-quadratic optimization
Chapter 6: Protein Folding Algorithms
- The Protein Folding Problem
- Introduction to Protein Structure
- Protein Folding Quotes
- Folding on 2D Lattices and the Hart-Istrail Algorithm
- Contact Maps and Skolnick Clustering
- Protein Structure Alignment