Schedule 2014


date topic notes homework recommended readings
Major Unit. Genomics Foundations: Modeling and Measuring Evolution: Linkage Disequilibrium (LD), Urn Models
Sept. 4 The Human Genome and Variation CSCI 2951-N Overview
SNPs and the Human Genome
Haplotype Blocks and Tagging SNPs
From Human Genome to Clinical Theraphy
Sept. 9 Introduction to Linkage Disequilibrium
R. Lewontin's seminal paper introducing D
M. Slatkin "Linkage Disequilibrium - understanding evolutionary past .."
Sept. 11 Urn Models and the Wright-Fisher Model
The Hardy-Weinberg Model
A comparison paper of most used LD measures
A rigorous review of Statistical Hypothesis Testing
Sir Ronald Fisher's Puzzle "The Lady Tasting Tea"
Major Unit. Algorithms: Graph-Theory Algorithms: Haplotype Phasing I. Clark Methods and the EM Algorithm
Sept. 16 Haplotype Phasing Algorithms: The Clark Method
Andy Clark's seminal 1990 paper introducing the Clark Method
Homework 1
Dan Gusfield's paper proving the NP-completeness of the problem of minimizing the number of unresolved (un-phased) genotypes
Sept. 18 Urn Models, the Infinite Alleles Models, and the Ewens Sampling Lemma
An article presenting Ewens Sampling Lemma's proof and generalizations
Ewens Sanmpling Lemma - several other applications
Sept. 30 The Expectation-Maximization (EM) Algorithm for Haplotype Phasing
The Excoffier-Slatkin Expectation-Maximization (EM) Algorithm paper for Computing Haplotype Frequencies and Haplotype Phasing

Oct. 2 Theorem (E. Hubbell, 2000) Haplotype Parsimony Phasing is NP-complete, and Haplotype Maximum Likelihood Phasing is NP-complete
Major Unit. Algorithms: Graph-Theory Algorithms: Set Cover and Dominating Set: Haplotype Tagging SNPs Algorithms
Oct. 7 Tagging SNPs - The LD-Select Algorithm
Oct. 9 Tagging SNPs - The Informativeness Algorithm
Tagging SNPs and Haplotype Blocks

Homework 2
Oct. 14 Informativeness Algorithm and Haplotype Blocks
Major Unit. Genome-Wide Association Studies: Computational Workflows
Oct. 16 GWAS Issues and Caveats
GWAS: Computational Workflows

GWAS Tutorial on Statistical Methods for Population Genomics Studies

Oct. 23 Computational Workflows for GWAS - Rare variants
Genetic Mapping in Human Disease and GWAS successes
List of GWASs and diseases - to pick from for presentation

Guilt by association
GWAS and assessment of disease risk - an overview of GWAS methodology
GWAS and human disease - a review

Major Unit. Algorithms: Graph-Theory Algorithms: Haplotype Phasing II. Long-Range Phasing and Identical-by-Descent Inference
Oct. 28 deCode Long-Range Haplotype Phasing Algorithm

Oct. 30 The Transmission Disequilibrium Test

Homework 3
Major Unit. Algorithms: Markov Chain Monte-Carlo and the Metropolis Algorithm; application: the STRUCTURE Algorithm for Inference of Population Substructure in a sample of genotyped individuals
Nov. 4 The seminal paper by Jonathan Pritchard, Matthew Stephens, and Peter Donnelly "Inference of Population Structure Using Multilocos Genotype Data" Genetics, (2000). It presents the algorithm STRUCTRE based on the Markov Chain Monte Carlo method for infering population substructure in a sample of genotyped individuals such as a GWAS sample.

The seminal paper by Persi Diaconis "The Markov Chain Monte Carlo Revoultion"; it contains the "Criminal/prison code"and its breaking via an MCMC algorithm as we discussed in class.

"Many basic scientific problems are now routinely solved by simulation: a fancy random walk is performed on the system of interest. Averages computed from the walk give useful answers to formerly intractable problems."
Nov. 6 A tutorial on Markov Chain Monte Carlo Method. In particular it has theory on how long the chains should be run to reach the stationary distribution, the so called "Mixing-Time" theory, which we did not cover in class.

Nov. 11 The original Metropolis et al paper, i.e., Nicholas Metropolis, Ariana Rosenbluth, Marshall Rosenbluth, Augusta Teller, and Edward Teller "Equation of State Calculations for Fast Computing Machines" The Journal of Chemical Physics, 21, 1087 (1953)

Understanding the Metropolis-Hastings Algorithm

Nov. 13 A paper by Nick Metropolis, who coined the name "Monte Carlo" for the method: N. Metropolis "The Beginning of the Monte Carlo Method" Los Alamos Science, Special Isssue 1987, pp. 125-130; he gives a historic introduction of the Monte Carlo Method introduced by two famous mathematicians working at Los Alamos on the Manhattan Project: Stanislaw Ulam and John von Neumann; the method was "reinvented" by Enrico Fermi 15 years before but just as a numerical simulation technique; the mathematical development and foundations are due to Ulam and von Neumann.

Metropilis and MANIAC (one of the first electronic computers)

A perspective on the Metropolis method
Major Unit. The Coalescent: Theory and Applications; The Miniciello-Durbin Ancestral Recombination Graph Inference Algorithm; Hudson Simulator
Nov. 18 The seminal Miniciello-Durbin Ancestral Recombination Graph Algorithm; reference: Mark Miniciello and Richard Durbin, " Mapping Trait Loci by use of Ancestral Recombination Graphs", American Journal of Human Genetics, vol 79, pp. 910-922 (2006)

Nov. 20 A survey by Noah Rosenberg and Magnus Mordborg, "Genealogical Trees, Coalescent Theory and the Analysis of Genetic Polymorphism" Nature Reviews Genetics, 380, vol 3 (2002)

A Tutorial on the Coalescent Theory, Magnus Norburg, "Coalescent Theory" 37 pp (2000)

Homework 4
Stephens et al the PHASE Algorithm paper
The Haplotype Phasing Competition paper
Major Unit. Genome-Wide Association Studies (GWAS): Statistical associations, the missing heritabililty problem, genetic heterogeneity, genomic privacy.
Major Unit. The Social Network of Protein Folds: Individual Preferences of Amino Acids and the Thermodynamic "Social Choice" Hypothesis
Major Unit. Algorithms: Statistical Hypothesis Testing and Knapsack Algorithms: The Neyman-Pearson Lemma, Multiple Testing
Major Unit. Algorithms: Voting Theory Algorithms and von Neumann-Morgenstern Utility Theory: Protein Folding Energy Function Inference