Instructor: Thomas Hofmann

The course provides a systematic introduction to pattern recognition and machine learning, covering introductory as well as intermediate-level material. It is mainly geared towards graduate students, but may also be suitable for advanced undergraduate students with a solid mathematical background. There are no strictly enforced prerequisites, but familiarity with probability theory (for example, CS 155), calculus, and linear algebra is a plus.

Covered topics include: decision theory, maximum likelihood estimation, Bayesian statistics, linear classifiers, support vector machines, nearest neighbor classification, Parzen windows, linear regression, regularization theory, neural networks, boosting, model selection, statistical learning theory, feature selection, graphical models, and various techniques for unsupervised learning.

Students from other departments are welcome.

The course will use the following textbook:

Pattern Classification

by Richard O. Duda, Peter E. Hart, and David G. Stork

John Wiley & Sons, Inc., New York, Second Edition, 2001.

In addition, chapters and excerpts from other books will be used in class.

Organization

- Standard lecture format
- Additional discussion sessions
- Problem assignments (total of 8)
- Programming exercises (total of 4)
- Final student presentations on applications (10-15 minutes)

Grading

- Problem Assignments 1-8: 40% (5% each)
- Programming Exercises A-D: 40% (10% each)
- Presentation: 20%
- Participation: mandatory
- Final Projects: voluntary, as preparation for research projects

Continuations

- Reading & Research classes
- Additional lectures and papers in reading group meetings.

Schedule

- Sep 4, 6: No class (European Conference on Machine Learning)
- Sep 11: Introduction; Bayesian Decision Theory (part I). [DHS 2.1-2.2, A.4] slides
- Sep 13: Normal Distribution; Bayesian Decision Theory (part II). [DHS 2.5, 2.6, 2.9, A.4] slides. Assignment 1: Decision Theory
- Sep 18: Maximum Likelihood Parameter Estimation; Bayesian Parameter Estimation. [DHS 3.1-3.5] slides
- Sep 20: Theory of Point Estimators, Sufficient Statistics; Cramer-Rao Theorem, Rao-Blackwell Theorem. [DHS 3.6; Lindsey 7.4 (copy at CIT 505)] slides. Assignment 2: Parameter Estimation. Programming Exercise A: Naive Bayes Classifier for Text Categorization
- Sep 25: Discussion & Interaction; Exponential Family. [DHS 3.6] slides
- Sep 27: Fisher's Linear Discriminant; Perceptron Algorithm. [DHS 5.1-5.2, 3.8.2; CS 2.1.1 (copy at CIT 505)] slides
- Oct 2: Support Vector Machines for Classification. [CS 6.1] slides
- Oct 4: Soft-Margin Classifiers; Quadratic Programming. [CS 5.1-5.3; DHS A.3] slides
- Oct 9: Non-linear Discriminant Functions via Kernels. [CS 3.1-3.3] slides. Assignment 3: Linear Discriminant Functions & SVMs
- Oct 11: Discussion & Interaction; Nearest Neighbor Classifier. [DHS 4.5-4.6] slides
- Oct 16: Linear Regression, Ridge Regression, Regularization. [CS 2.2; HTF 3] slides
- Oct 18: Radial Basis Function Networks, K-means, Interpolation. [Bishop 5] slides
- Oct 23: Regularization Networks, SVM Regression. [Bishop 5; CS 6.2] slides. Programming Exercise B: Classification & Regression
- Oct 25: Neural Networks, Multilayer Perceptrons, Backpropagation. [Bishop 4; DHS 6.3, 6.6] slides
- Oct 30: Discussion & Interaction
- Nov 1: Resampling for Model Evaluation & Ensemble Methods: Bagging, Boosting, Jackknife, Bootstrap. [DHS 9.4-9.5] slides
- Nov 6: Ensemble Methods: AdaBoost, Hierarchical Mixtures of Experts. [HTF 9.5, 10] slides
- Nov 8: Statistical Learning Theory. [CS 4.1-4.2] slides
- Nov 13: Graphical Models: Markov Networks, Bayesian Networks. [Jordan, Bishop: Chapter 2 (draft version, not for circulation)]
- Nov 15: Inference in Graphical Models; The Junction Tree Algorithm, Part I. [Jordan, Bishop: Chapter 14 (draft version, not for circulation)]. Assignment 4: Boosting, Backpropagation & Learning Theory
- Nov 20: The Junction Tree Algorithm, Part II; Hidden Markov Models. [Jordan, Bishop: Chapter 15 (draft version, not for circulation)]. Assignment 6: Graphical Models
- Nov 22: No class (Thanksgiving Recess)
- Nov 27: Hidden Markov Models (continued); Mixture Models. [DHS 10.1-10.4]
- Nov 29: Learning in Graphical Models; Expectation Maximization Algorithm. Tutorial by David Heckerman (html). Programming Exercise C: Unsupervised Learning
- Dec 3: Discussion & Interaction (self-organized)
- Dec 6: No class (Neural Information Processing Systems); reading & preparation of student presentations
- Dec 11: Student Presentations: Applications of Machine Learning Methods [*]
- Dec 13: Student Presentations: Applications of Machine Learning Methods [*]