CSCI 1952Q: Algorithmic Aspects of Machine Learning (Spring 2023)
Basic Information
- Lectures: Tue/Thu 2:30-3:50 pm in CIT 368.
- Instructor: Yu Cheng.
- Email: yu_cheng@brown.edu
- Office Hours: Tuesday 4-5 pm in CIT 413, or by appointment.
- TAs:
- Mattie Ji (Head TA). Office Hours: Friday 4-6 pm.
- Linghai Liu. Office Hours: Wednesday 7-9 pm.
- Rohit Mohanty. Office Hours: Tuesday 7-9 pm.
- Colin Savage. Office Hours: Thursday 8-10 pm.
- Nora Tang. Office Hours: Monday 8-10 pm.
TA office hours will be conducted remotely on https://hours.cs.brown.edu/.
- Textbook:
- [LRU] Mining of Massive Datasets (Third Edition). Jure Leskovec, Anand Rajaraman, Jeff Ullman.
- [M] Algorithmic Aspects of Machine Learning. Ankur Moitra.
- [A] Theory of Deep Learning (book draft). Sanjeev Arora et al.
Course Description
In this course, we will explore the theoretical foundations of machine learning and deep learning, focusing on the design and analysis of machine learning algorithms with provable guarantees. More specifically, we will (1) introduce basic tools in linear algebra and optimization, including the power method, singular value decomposition, matrix calculus, (matrix) concentration inequalities, and (stochastic) gradient descent; (2) cover many examples where one can design algorithms with provable guarantees for fundamental problems in machine learning (under certain assumptions), including topic modeling, tensor decomposition, sparse coding, and matrix completion; and (3) discuss the emerging theory of deep learning, including landscape analysis, generalization and over-parameterization, neural tangent kernels, generalization bounds, and implicit regularization.
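As a small taste of the first tool listed above, here is a minimal NumPy sketch of the power method for estimating the leading eigenvalue and eigenvector of a symmetric matrix. This is illustrative only, not course-provided code; the example matrix, iteration count, and tolerance are arbitrary choices.

```python
# Illustrative sketch of the power method (not course material).
import numpy as np

def power_method(A, num_iters=1000, tol=1e-10, seed=0):
    """Estimate the leading eigenvalue and eigenvector of a symmetric matrix A."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        w = A @ v                          # one matrix-vector product per iteration
        norm_w = np.linalg.norm(w)
        if norm_w == 0.0:                  # the iterate landed in the null space of A
            break
        w /= norm_w
        # Converged up to sign (handles a negative leading eigenvalue).
        if min(np.linalg.norm(w - v), np.linalg.norm(w + v)) < tol:
            v = w
            break
        v = w
    return v @ A @ v, v                    # Rayleigh quotient and eigenvector estimate

# Example: the leading eigenvalue of [[2, 1], [1, 2]] is 3, eigenvector ~ [1, 1]/sqrt(2).
A = np.array([[2.0, 1.0], [1.0, 2.0]])
eigval, eigvec = power_method(A)
print(eigval, eigvec)
```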
Top Projects and Reviewers
The course uses a voluntary peer-review process to evaluate all final project submissions. Based on this process, the following projects and students were chosen as top projects and top reviewers:
- Top Projects:
- Top Reviewers:
- Michael Freeman.
- Daniel Li.
- Jay Sarva.
- Yiyang Nan.
- List of all reviewers. The names of the Teaching Assistants (TAs) have been omitted from the list of (top) reviewers. The following students participated in the peer-review process: Jason Eveleth, Michael Freeman, Kalen Frieberg, Brayden Goldstein-Gelb, Klimentina Krstevska, Tanadol Tiger Lamlertprasertkul, Mason Lee, Daniel Li, Yiyang Nan, Mikhail Okunev, Nishka Pant, Jay Sarva, Benjamin Shih, Yixiang Sun, Serdar Sungun, Panos Syrgkanis, Michael Youssef, Jiaqi Zhang, Tony Zhu. Thank you for your time and effort in reviewing the final projects of CSCI 1952Q!
Homeworks
Warning: the following assignments and due dates are from the past. If you are taking CSCI 1952Q this semester, please refer to the current course webpage.
Course Project
- Project guidelines.
- Important dates:
- Mar 9: Decide on teams and topics.
- Apr 13 - Apr 27: Presentations.
- May 15: Project report due.
Schedule
- Week 1 (Jan 23): Introduction.
- Week 2 (Jan 30): Non-Convex Optimization I (Chapter 7 of [A], Chapter 9 of [LRU], Chapter 8 of [M]).
- Week 3 (Feb 6): Non-Convex Optimization II (Chapter 7 of [A], Chapter 9 of [LRU], Chapter 8 of [M]).
- Week 4 (Feb 13): Nonnegative Matrix Factorization I (Chapter 1 of [M], Chapter 11 of [LRU]).
- Week 5 (Feb 20): Nonnegative Matrix Factorization II (Chapter 1 of [M], Chapter 11 of [LRU]).
- Week 6 (Feb 27): Finding Similar Items I (Chapter 3 of [LRU]).
- Week 7 (Mar 6): Finding Similar Items II (Chapter 3 of [LRU]).
- Locality-Sensitive Hashing (Lecture 11).
- Locality-Sensitive Functions and Amplification (Lecture 12).
- Week 8 (Mar 13): Algorithms for Data Streams (Chapter 4 of [LRU]).
- Sampling Data in a Stream (Lecture 13).
- Bloom Filter, Counting Distinct Elements in a Stream (Lecture 14). A small illustrative Bloom filter sketch appears after the additional readings below.
- Week 9 (Mar 20): Network Analysis (Chapters 5 and 10 of [LRU]).
- PageRank and Computation of PageRank (Lecture 15).
- Personalized PageRank, Graph Partitioning (Lecture 16).
- Week 10 (Mar 27): No classes. Spring Recess.
- Week 11 (Apr 3): Theory of Deep Learning (Chapters 9 and 10 of [A]).
- Spectral Clustering, Local Graph Algorithms (Lecture 17).
- Global Convergence of Gradient Descent (Lecture 18).
- Weeks 12-14 (starting Apr 10):
- Implicit Bias of Gradient Descent (Lecture 19).
- Project Presentations.
- Additional Readings (optional):
- Non-Convex Optimization:
- Nonnegative Matrix Factorization:
- Locality-Sensitive Hashing:
- A. Gionis, P. Indyk, R. Motwani. Similarity Search in High Dimensions via Hashing. VLDB 1999.
- A. Z. Broder, M. Charikar, A. M. Frieze, M. Mitzenmacher. Min-Wise Independent Permutations. STOC 1998.
- M. Datar, N. Immorlica, P. Indyk, V. S. Mirrokni. Locality-Sensitive Hashing Scheme Based on p-Stable Distributions. SCG 2004.
- R. O'Donnell, Y. Wu, Y. Zhou. Optimal Lower Bounds for Locality Sensitive Hashing (Except When q Is Tiny). ICS 2011.
- A. Andoni, I. Razenshteyn. Optimal Data-Dependent Hashing for Approximate Near Neighbors. STOC 2015.
- Streaming Algorithms:
- Network Analysis:
- Theoretical Deep Learning:
- S. S. Du, X. Zhai, B. Póczos, A. Singh. Gradient Descent Provably Optimizes Over-parameterized Neural Networks. ICLR 2019.
- Z. Allen-Zhu, Y. Li, Z. Song. A Convergence Theory for Deep Learning via Over-Parameterization. ICML 2019.
- J. D. Lee, Q. Lei, N. Saunshi, J. Zhuo. Predicting What You Already Know Helps: Provable Self-Supervised Learning. NeurIPS 2021.
- C. Zhang, S. Bengio, M. Hardt, B. Recht, O. Vinyals. Understanding Deep Learning (Still) Requires Rethinking Generalization. Commun. ACM. 2021.
- M. Belkin, D. Hsu, S. Ma, S. Mandal. Reconciling Modern Machine Learning and the Bias-Variance Trade-off. PNAS 2019.
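To illustrate one of the Week 8 topics, here is a minimal sketch of a Bloom filter. It is not the course's reference implementation; the bit-array size, the number of hash functions, and the SHA-256-based index derivation are illustrative assumptions.

```python
# Illustrative Bloom filter sketch (sizes and hashing scheme are arbitrary choices).
import hashlib

class BloomFilter:
    """A tiny Bloom filter over strings; parameters here are illustrative, not tuned."""

    def __init__(self, num_bits=1024, num_hashes=5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _indices(self, item):
        # Derive num_hashes indices from 4-byte chunks of a single SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        for i in range(self.num_hashes):
            chunk = digest[4 * i : 4 * i + 4]
            yield int.from_bytes(chunk, "big") % self.num_bits

    def add(self, item):
        for idx in self._indices(item):
            self.bits[idx] = True

    def __contains__(self, item):
        # May report a false positive, but never a false negative.
        return all(self.bits[idx] for idx in self._indices(item))

bf = BloomFilter()
for word in ["apple", "orange"]:
    bf.add(word)
print("apple" in bf)    # True
print("banana" in bf)   # False with high probability
```

Membership queries may return false positives but never false negatives, which is the trade-off that makes Bloom filters useful for filtering data streams.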
Grading
- Homework assignments (50%): We will have 4 homework assignments and 3 coding assignments. You are encouraged to discuss the problems with other students, but you must acknowledge who you worked with and write up your solutions independently. Late submissions are not accepted (exceptions for good reasons may be requested in advance).
- Course project (50%): Students will form teams to work on research projects or explore in depth a topic that we did not cover in class. Each group will do a project presentation and write a project report.
- Bonus points (20%): Bonus points are available for extra-credit homework questions and for top final projects and top reviewers.
Academic Integrity
Academic achievement is evaluated on the basis of work that a student produces independently. A student who obtains credit for work, words, or ideas which are not the products of his or her own effort is dishonest. Such dishonesty undermines the integrity of academic standards of the University. Infringement of the Academic Code entails penalties ranging from reprimand to suspension, dismissal or expulsion from the University. Students who have questions on any aspect of the Academic Code should consult the instructor or one of the deans of the Graduate School to avoid the serious charge of academic dishonesty.
Disability Policies
Brown University is committed to the full inclusion of all students. Any student with a documented disability is welcome to contact the instructor as early in the semester as possible so that reasonable accommodations can be arranged. If you need accommodations for online learning or in the classroom, please reach out to Student Accessibility Services (SAS) for assistance (sas@brown.edu, 401-863-9588, Brown SAS Website). You may also call SAS at 401-863-9588 to discuss the process for requesting accommodations.
Religious Holidays
Students who wish to observe their religious holidays shall inform the instructor within the first four weeks of the semester, unless the religious holiday is observed in the first four weeks. In such cases, the student shall notify the instructor at least five days in advance of the date of absence. The instructor shall make every reasonable effort to honor the request.