CSCI 1820

Algorithmic Foundations of Computational Biology

Not offered this year
Offered most years; last taught: Spring 2024

This course is devoted to computational and statistical methods, as well as software tools, for DNA, RNA, and protein sequence analysis. The focus is on understanding the algorithmic and mathematical foundations of the methods, the design of the associated genomics tools, and their applications. A comprehensive set of programming assignments gives students a hands-on journey into the complexities of real genomic data. The assignments include:

- basic components of a genome assembler;
- mapping sets of sequences to the genome, such as those generated by high-throughput sequencing platforms like Illumina/Solexa and 454;
- a BLAST-like search tool;
- HMM algorithms for gene prediction;
- suffix trees;
- prediction of transcription factor motifs in promoters;
- genome mapping of genetic variation, including SNPs, haplotypes, and copy number.

The course has several unifying themes:

- alignment;
- comparative genomics;
- protein structure;
- the newly unveiled role of RNA in the regulatory genome;
- the intertwining of statistics and algorithmics in the design of powerful genomic tools.

The course is open to computer and mathematical sciences students as well as biological and medical students.

Both advanced undergraduates and graduate students are welcome. Biomedical students may substitute comparable work on a final project for the programming assignments. Graduate credit requires a final project devoted to a research problem. Two grassroots projects are being built incrementally through students' final projects in this class:

- Genomathica is a library of genomic tools written in Mathematica, designed so that biologists can easily tinker with the code.
- Cellarium is a programming language framework for bioinformatics workflows.

The instructor has taught earlier versions of this course in the Departments of Biology, Computer Science, and Biochemistry and Cell Biology (in the Medical School).

Prerequisite: CSCI 1810

Instructor(s):
CRN: 26692