The past decade has witnessed tremendous technological improvements in
measuring the fundamental molecules of life: DNA, RNA, and
protein. In particular, next-generation DNA sequencing
technologies are producing exponentially increasing volumes of DNA
sequence data. Interpreting this data presents numerous
computational challenges requiring algorithmic, statistical, and
machine learning techniques. This course will examine several
problems arising in the sequencing of human and cancer genomes,
including genome assembly and variant detection; identification
of functional mutations/variants from multiple genome sequences; and
discovery of combinations of mutations/variants that influence
an observed trait/phenotype.
News
Oct. 9, 2011: Posted Lecture 4 slides. Updated reading for Lectures 3 and 4.
Sept. 26, 2011: Updated schedule. Computer assignment and project proposal posted. See below and schedule.
Sept. 10, 2011: Course website undergoing revisions. Please check
back on Sept. 12 for updated information.
Proposal: Due Nov. 9 (Specific Aims and Significance) and Dec. 16 (All).
Course Organization
The course will be organized as a seminar in which students read
and present recent papers on the topics listed below. Each topic
will be introduced with background lectures. Students will
undertake a project to further study one of the topics. To the
extent possible, projects will be adjusted to the
background/interest of the student and could range from the theoretical
(e.g., designing a new algorithm and proving its correctness) to
the practical (e.g., a software implementation).
Prerequisites
Undergraduate-level knowledge of probability: random variables, distributions,
etc.
Undergraduate-level knowledge of algorithms and/or statistics.
No biology background is assumed. Necessary background will be
introduced in lectures and reading.