Major Unit. Genomics Foundations: Modeling and Measuring Evolution: Linkage Disequilibrium (LD), Urn Models 
Sept. 4 
The Human Genome and Variation 
CSCI 2951N Overview
SNPs and the Human Genome
Haplotype Blocks and Tagging SNPs
From Human Genome to Clinical Therapy


Sept. 9 

Introduction to Linkage Disequilibrium
 
R. Lewontin's seminal paper introducing D
M. Slatkin, "Linkage Disequilibrium: understanding the evolutionary past ..."
Sept. 11 

Urn Models and the Wright-Fisher Model
The Hardy-Weinberg Model
 
A paper comparing the most commonly used LD measures
A rigorous review of Statistical Hypothesis Testing
Sir Ronald Fisher's Puzzle "The Lady Tasting Tea"

Major Unit. Algorithms: Graph-Theory Algorithms: Haplotype Phasing I. Clark Methods and the EM Algorithm

Sept. 16 

Haplotype Phasing Algorithms: The Clark Method
Andy Clark's seminal 1990 paper introducing the Clark Method

Homework 1
dataset1genotypes

Dan Gusfield's paper proving the NP-completeness of the problem of minimizing the number of unresolved (unphased) genotypes

Sept. 18 

Urn Models, the Infinite Alleles Model, and the Ewens Sampling Lemma
 
An article presenting the proof of the Ewens Sampling Lemma and its generalizations
Ewens Sampling Lemma: several other applications

Sept. 30 

The Expectation-Maximization (EM) Algorithm for Haplotype Phasing
 
The Excoffier-Slatkin Expectation-Maximization (EM) Algorithm paper for Computing Haplotype Frequencies and Haplotype Phasing

Oct. 2 

Theorem (E. Hubbell, 2000): Haplotype Parsimony Phasing is NP-complete, and Haplotype Maximum Likelihood Phasing is NP-complete
 

Major Unit. Algorithms: Graph-Theory Algorithms: Set Cover and Dominating Set: Haplotype Tagging SNPs Algorithms

Oct. 7 

Tagging SNPs: The LDSelect Algorithm
 

Oct. 9 

Tagging SNPs: The Informativeness Algorithm
 
Tagging SNPs and Haplotype Blocks




Homework 2 

Oct. 14 

Informativeness Algorithm and Haplotype Blocks
 

Major Unit. Genome-Wide Association Studies: Computational Workflows

Oct. 16 

GWAS Issues and Caveats
GWAS: Computational Workflows

GWAS Tutorial on Statistical Methods for Population Genomics Studies

Oct. 23 

Computational Workflows for GWAS: Rare variants
Genetic Mapping in Human Disease and GWAS successes
List of GWASs and diseases to pick from for presentation


Guilt by association
GWAS and assessment of disease risk: an overview of GWAS methodology
GWAS and human disease: a review

Major Unit. Algorithms: Graph-Theory Algorithms: Haplotype Phasing II. Long-Range Phasing and Identical-by-Descent Inference

Oct. 28 

deCode Long-Range Haplotype Phasing Algorithm



Oct. 30 

The Transmission Disequilibrium Test






Homework 3 

Major Unit. Algorithms: Markov Chain Monte Carlo and the Metropolis Algorithm; application: the STRUCTURE Algorithm for Inference of Population Substructure in a sample of genotyped individuals

Nov. 4 

The seminal paper by Jonathan Pritchard, Matthew Stephens, and Peter Donnelly, "Inference of Population Structure Using Multilocus Genotype Data", Genetics (2000). It presents the STRUCTURE algorithm, which uses the Markov Chain Monte Carlo method to infer population substructure in a sample of genotyped individuals such as a GWAS sample.


The seminal paper by Persi Diaconis, "The Markov Chain Monte Carlo Revolution"; it contains the "Criminal/prison code" and its breaking via an MCMC algorithm, as we discussed in class.
"Many basic scientific problems are now routinely solved by simulation: a fancy random walk is performed on the system of interest. Averages computed from the walk give useful answers to formerly intractable problems."
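
To make the "fancy random walk" in the quote concrete, here is a minimal sketch of a random-walk Metropolis sampler. It is an illustration only, not code from any of the papers listed: the function name `metropolis`, the uniform proposal, the step size, and the standard normal target are all assumptions chosen for the example.

```python
import math
import random


def metropolis(log_target, x0, n_steps, step_size=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target.

    log_target: function returning the log of an unnormalized density.
    All parameter names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    x = x0
    log_p = log_target(x)
    samples = []
    for _ in range(n_steps):
        # Symmetric proposal: a uniform step around the current state.
        proposal = x + rng.uniform(-step_size, step_size)
        log_p_new = log_target(proposal)
        # Accept with probability min(1, p_new / p_old).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        # A rejected proposal repeats the current state in the chain.
        samples.append(x)
    return samples


if __name__ == "__main__":
    # Assumed target: standard normal, known only up to a constant.
    chain = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_steps=50000,
                       step_size=2.0, seed=1)
    mean = sum(chain) / len(chain)
    var = sum((s - mean) ** 2 for s in chain) / len(chain)
    print("estimated mean:", mean)
    print("estimated variance:", var)
```

The averages computed from the walk (here the mean and variance) approximate the corresponding expectations under the target; how long the chain must run for this to be reliable is the subject of mixing-time theory.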

Nov. 6 

A tutorial on the Markov Chain Monte Carlo Method. In particular, it covers the theory of how long the chains should be run to reach the stationary distribution, the so-called "Mixing Time" theory, which we did not cover in class.



Nov. 11 

The original Metropolis et al. paper, i.e., Nicholas Metropolis, Arianna Rosenbluth, Marshall Rosenbluth, Augusta Teller, and Edward Teller, "Equation of State Calculations by Fast Computing Machines", The Journal of Chemical Physics, 21, 1087 (1953)


Understanding the Metropolis-Hastings Algorithm

Nov. 13 

A paper by Nick Metropolis, who coined the name "Monte Carlo" for the method: N. Metropolis, "The Beginning of the Monte Carlo Method", Los Alamos Science, Special Issue 1987, pp. 125-130. He gives a historical introduction to the Monte Carlo method, introduced by two famous mathematicians working at Los Alamos on the Manhattan Project: Stanislaw Ulam and John von Neumann. The method had already been devised independently by Enrico Fermi 15 years earlier, but just as a numerical simulation technique; the mathematical development and foundations are due to Ulam and von Neumann.


Metropolis and MANIAC (one of the first electronic computers)
A perspective on the Metropolis method

Major Unit. The Coalescent: Theory and Applications; The Minichiello-Durbin Ancestral Recombination Graph Inference Algorithm; Hudson Simulator

Nov. 18 

The seminal Minichiello-Durbin Ancestral Recombination Graph Algorithm; reference: Mark Minichiello and Richard Durbin, "Mapping Trait Loci by Use of Ancestral Recombination Graphs", American Journal of Human Genetics, vol. 79, pp. 910-922 (2006)



Nov. 20 

A survey by Noah Rosenberg and Magnus Nordborg, "Genealogical Trees, Coalescent Theory and the Analysis of Genetic Polymorphisms", Nature Reviews Genetics, vol. 3, 380 (2002)


A tutorial on Coalescent Theory: Magnus Nordborg, "Coalescent Theory", 37 pp. (2000)




Homework 4
Stephens et al., the PHASE Algorithm paper
The Haplotype Phasing Competition paper


Major Unit. Genome-Wide Association Studies (GWAS): Statistical associations, the missing heritability problem, genetic heterogeneity, genomic privacy
Major Unit. The Social Network of Protein Folds: Individual Preferences of Amino Acids and the Thermodynamic "Social Choice" Hypothesis 
Major Unit. Algorithms: Statistical Hypothesis Testing and Knapsack Algorithms: The Neyman-Pearson Lemma, Multiple Testing
Major Unit. Algorithms: Voting Theory Algorithms and von Neumann-Morgenstern Utility Theory: Protein Folding Energy Function Inference