CSCI 2952-M
The Works that Made and Changed Machine Learning
(Fall 2021)
Course Description
This seminar is aimed at current and prospective graduate students who want to gain technical depth and perspective on the field of statistical machine learning. Students will read, present, and discuss some of the original papers that had a transformative impact on the development of machine learning. Topics range from mathematical foundations, through major algorithmic advances, to breakthrough works on deep learning and its applications in vision and NLP. Ideal students will have a mix of: 1) motivation to learn how to read, present, and evaluate technical papers, 2) mathematical maturity and basic ML background, and 3) willingness to participate and contribute to discussions. Enrollment will be limited and will be finalized after the first class. No formal prerequisite.
Essential Info
Instructor: Eli Upfal
Class Meetings: Mondays, 3:00-5:00pm, CIT 316.
Prerequisites: No formal prerequisite. Previous experience in statistical machine learning (through CSCI 1550 or equivalent research experience) is recommended.
IMPORTANT: Check Canvas for additional information on the first lecture (including Zoom link).
Contact
For questions, discussion, and other course-related posts, use Canvas.
If you have an atypical question that you are certain does not belong on Canvas, email the instructor.
Background Material
P. W. L. Fong. How to Read a CS Research Paper?
S. Keshav. How to Read a Paper.
Format
1. Paper(s) assigned for each meeting will be posted on the course website a week before the meeting.
2. Students will read the assigned paper(s) and submit a 1-2 page writeup before the meeting, answering the following questions:
a. What is the main contribution of the paper?
b. Why is the contribution important (or not)?
c. What is not solved in the paper? What research would you suggest following this paper?
3. Each student will give at least one class presentation.
4. Presentations will be followed by a class discussion.
Course Papers
The seminar will cover a subset of the following papers, and possibly other papers suggested by the participants.
Perceptron (Before Deep Learning...)
- F. Rosenblatt. The Perceptron — a perceiving and recognizing automaton. Report 85-460-1. Cornell Aeronautical Laboratory (1957).
- A. B. J. Novikoff. On convergence proofs on perceptrons. Proceedings of the Symposium on the Mathematical Theory of Automata, Vol. XII (1962).
- S. Shalev-Shwartz, Y. Singer. A new perspective on an old perceptron algorithm. COLT (2005).
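
As a starting point for discussion, here is a minimal sketch of the perceptron update rule these papers study (illustrative only; labels are assumed to be in {-1, +1}):

    import numpy as np

    def perceptron(X, y, epochs=100):
        # Rosenblatt's update rule: X is an (n, d) array, y has entries in {-1, +1}.
        w, b = np.zeros(X.shape[1]), 0.0
        for _ in range(epochs):
            mistakes = 0
            for x_i, y_i in zip(X, y):
                if y_i * (w @ x_i + b) <= 0:  # misclassified (or on the boundary)
                    w += y_i * x_i            # move the separator toward the point
                    b += y_i
                    mistakes += 1
            if mistakes == 0:                 # linearly separable data: done
                break
        return w, b

Novikoff's proof bounds the number of updates by (R/gamma)^2, where R bounds the norm of the examples and gamma is the separation margin.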
SVM - Support Vector Machine (The First Modern ML Algorithm?)
- B. E. Boser, I. M. Guyon, V. N. Vapnik. A training algorithm for optimal margin classifiers. COLT (1992).
- H. Drucker, C. J. C. Burges, L. Kaufman, A. J. Smola, V. N. Vapnik. Support Vector Regression Machines. NeurIPS (1996).
- C. Cortes, V. N. Vapnik. Support-vector networks. Machine Learning 20 (3) (1995).
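
The Cortes–Vapnik soft-margin problem is usually solved as a dual quadratic program. Purely for intuition, here is a primal sketch that minimizes the equivalent regularized hinge loss by subgradient descent (the step size, regularization, and epoch count are arbitrary choices, not from the papers):

    import numpy as np

    def soft_margin_svm(X, y, lam=0.01, lr=0.1, epochs=200):
        # Minimize  lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * (w @ x_i + b)).
        n, d = X.shape
        w, b = np.zeros(d), 0.0
        for _ in range(epochs):
            margins = y * (X @ w + b)
            active = margins < 1                       # margin violators
            w -= lr * (lam * w - (y[active] @ X[active]) / n)
            b -= lr * (-y[active].sum() / n)
            # subgradient step: the hinge loss is not differentiable at margin 1
        return w, b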
VC – PAC Learning (The theory)
- V. N. Vapnik, A. Chervonenkis. On the uniform convergence of the relative frequencies of events to their probabilities. Theory of Probability and Its Applications, Volume XVI, Number 2 (1971).
- L. Valiant. A theory of the learnable. Communications of the ACM, 27 (1984).
- A. Blumer, A. Ehrenfeucht, D. Haussler, M. K. Warmuth. Learnability and the Vapnik–Chervonenkis dimension. Journal of the ACM, 36 (4) (1989).
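
The headline result of the Blumer et al. paper, stated informally and with constants omitted, ties the PAC sample complexity to the VC dimension of the hypothesis class:

    % Sample size sufficient for PAC learning to accuracy epsilon with
    % confidence 1 - delta (informal; constants omitted):
    m(\varepsilon, \delta) = O\!\left( \frac{1}{\varepsilon} \left( d_{\mathrm{VC}} \log \frac{1}{\varepsilon} + \log \frac{1}{\delta} \right) \right)

The same paper shows finite VC dimension is also necessary, so a class is PAC learnable if and only if its VC dimension is finite.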
Boosting - AdaBoost
- Y. Freund, R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1) (1997).
- Y. Freund, R. E. Schapire. A brief introduction to boosting. IJCAI (1999).
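
A compact sketch of the AdaBoost loop from the 1997 paper; weak_learner below is a hypothetical interface (any routine that fits a {-1, +1}-valued classifier to a weighted sample):

    import numpy as np

    def adaboost(X, y, weak_learner, rounds=50):
        # y has entries in {-1, +1}; w is the distribution over training examples.
        n = len(y)
        w = np.full(n, 1.0 / n)
        hypotheses, alphas = [], []
        for _ in range(rounds):
            h = weak_learner(X, y, w)            # fit to the current weighting
            pred = h.predict(X)
            eps = w[pred != y].sum()             # weighted training error
            if eps >= 0.5:                       # no better than random guessing
                break
            alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))
            w *= np.exp(-alpha * y * pred)       # up-weight the mistakes
            w /= w.sum()                         # renormalize to a distribution
            hypotheses.append(h); alphas.append(alpha)
        return lambda Xn: np.sign(sum(a * h.predict(Xn)
                                      for a, h in zip(alphas, hypotheses)))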
Rademacher Complexity (The “modern” theory)
- P. L. Bartlett, S. Mendelson. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results. JMLR 3 (2002).
- S. Mendelson. Rademacher averages and phase transitions in Glivenko-Cantelli classes. IEEE Transactions on Information Theory, 48(1) (2002).
- V. Koltchinskii. Rademacher penalties and structural risk minimization. IEEE Transactions on Information Theory, 47(5) (2001).
- V. I. Koltchinskii, D. Panchenko. Rademacher processes and bounding the risk of function learning. In E. Gine, D. Mason, and J. Wellner, editors, High Dimensional Probability II, pages 443–459 (2000).
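
For orientation, the central definition these papers build on (the exact constants in the resulting risk bounds vary from paper to paper):

    % Empirical Rademacher complexity of a class F on a sample x_1, ..., x_n,
    % with sigma_1, ..., sigma_n i.i.d. uniform on {-1, +1}:
    \widehat{\mathfrak{R}}_n(F) = \mathbb{E}_{\sigma} \left[ \sup_{f \in F} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i) \right]

Bounds of the Bartlett–Mendelson type then control the gap between true and empirical risk, uniformly over F, by roughly twice this quantity plus a sqrt(log(1/delta)/n) confidence term.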
Transfer Learning
- S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, J. W. Vaughan. A theory of learning from different domains. Machine Learning 79(1-2) (2010).
- S. Ben-David, J. Blitzer, K. Crammer, F. Pereira. Analysis of Representations for Domain Adaptation. NeurIPS (2006).
- M. Ackerman, S. Ben-David, D. Loker. Impossibility Theorems for Domain Adaptation. COLT (2010).
- S. Ben-David, R. Urner. Domain adaptation—can quantity compensate for quality? Annals of Mathematics and Artificial Intelligence, vol. 70 (2014).
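
The central bound of the 2010 Ben-David et al. paper, stated informally: for every hypothesis h in the class H,

    % epsilon_S, epsilon_T: risks under the source and target distributions;
    % d_{H Delta H}: a divergence between the two marginals, estimable from
    % unlabeled samples; lambda: the risk of the best joint hypothesis.
    \varepsilon_T(h) \le \varepsilon_S(h)
        + \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T)
        + \lambda,
    \qquad \lambda = \min_{h' \in \mathcal{H}} \left[ \varepsilon_S(h') + \varepsilon_T(h') \right]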
Deep Learning
- D. E. Rumelhart, G. E. Hinton, R. J. Williams. Learning internal representations by error propagation. Parallel distributed processing: explorations in the microstructure of cognition, vol. 1 (1986).
- Y. LeCun, L. Bottou, Y. Bengio, P. Haffner. Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE (1998).
- A. Krizhevsky, I. Sutskever, G. E. Hinton. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6) (2017).
- G. Cybenko. Approximation by Superpositions of a Sigmoidal Function. Mathematics of Control, Signals and Systems 2 (1989).
- Goodfellow et al. Generative Adversarial Nets. NeurIPS (2014).
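
A minimal illustration of error backpropagation in the spirit of Rumelhart et al.: a two-layer sigmoid network with squared-error loss (all sizes and the learning rate are arbitrary choices for the sketch):

    import numpy as np

    def backprop_step(x, y, W1, b1, W2, b2, lr=0.1):
        # One gradient step on L = 0.5 * ||y_hat - y||^2.
        sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
        h = sigmoid(W1 @ x + b1)                    # forward: hidden layer
        y_hat = sigmoid(W2 @ h + b2)                # forward: output layer
        delta2 = (y_hat - y) * y_hat * (1 - y_hat)  # error signal at the output
        delta1 = (W2.T @ delta2) * h * (1 - h)      # propagated back one layer
        W2 -= lr * np.outer(delta2, h); b2 -= lr * delta2
        W1 -= lr * np.outer(delta1, x); b1 -= lr * delta1
        return W1, b1, W2, b2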
NLP – Natural Language Processing
- Vaswani et al. Attention Is All You Need. NeurIPS (2017).
- Y. Bengio, R. Ducharme, P. Vincent, C. Jauvin. A Neural Probabilistic Language Model. Journal of Machine Learning Research 3 (2003).
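
The core operation of the Vaswani et al. paper is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V; a single-head, unmasked sketch:

    import numpy as np

    def attention(Q, K, V):
        # Q: (n, d_k), K: (m, d_k), V: (m, d_v); returns an (n, d_v) array.
        scores = Q @ K.T / np.sqrt(Q.shape[-1])         # scaled similarities
        scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        return weights @ V                              # weighted average of values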
Reinforcement Learning
- R. S. Sutton. Learning to Predict by the Methods of Temporal Differences. Machine Learning 3 (1988).
- C. Watkins, P. Dayan. Q-learning. Machine Learning 8 (1992).
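
A tabular Q-learning sketch in the Watkins–Dayan setting; the env object is a hypothetical interface with reset() returning a state and step(a) returning (next_state, reward, done):

    import numpy as np

    def q_learning(env, n_states, n_actions, episodes=500,
                   alpha=0.1, gamma=0.99, eps=0.1, seed=0):
        Q = np.zeros((n_states, n_actions))
        rng = np.random.default_rng(seed)
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                # epsilon-greedy action selection
                a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
                s_next, r, done = env.step(a)
                # temporal-difference update toward r + gamma * max_a' Q(s', a')
                Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
                s = s_next
        return Q

Note the update is off-policy: it bootstraps from the greedy value max_a' Q(s', a') regardless of which action the behavior policy actually takes next.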
MCMC
- N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, E. Teller. Equation of State Calculations by Fast Computing Machines. Journal of Chemical Physics 21(6) (1953).
- W. K. Hastings. Monte Carlo Sampling Methods Using Markov Chains and Their Applications. Biometrika 57(1) (1970).
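
A random-walk Metropolis sampler with a symmetric Gaussian proposal (the 1953 setting; Hastings' 1970 generalization adds a proposal-ratio correction for asymmetric proposals):

    import numpy as np

    def metropolis(log_target, x0, steps=10000, scale=1.0, seed=0):
        # Samples from the density proportional to exp(log_target(x)).
        rng = np.random.default_rng(seed)
        x, samples = x0, []
        for _ in range(steps):
            proposal = x + scale * rng.standard_normal()
            # accept with probability min(1, target(proposal) / target(x))
            if np.log(rng.random()) < log_target(proposal) - log_target(x):
                x = proposal
            samples.append(x)
        return np.array(samples)

For example, log_target = lambda x: -0.5 * x**2 yields (correlated) samples from a standard normal distribution.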