Advanced Topics in Machine Learning: Modeling and Clustering High-Dimensional Data
Course Description
In the era of data deluge, the development of methods for discovering structure in high-dimensional data is becoming increasingly important. This course will cover state-of-the-art methods from algebraic geometry, sparse and low-rank representations, and statistical learning for modeling and clustering high-dimensional data. The first part of the course will cover methods for modeling data with a single low-dimensional subspace, such as PCA, Robust PCA, Kernel PCA, and manifold learning techniques. The second part of the course will cover methods for modeling data with multiple subspaces, such as algebraic, statistical, sparse and low-rank subspace clustering techniques. The third part of the course will cover applications of these methods in image processing, computer vision, and biomedical imaging.
Syllabus
- Background
- Introduction (Chapter 1)
- Basic Facts from Linear Algebra
- Basic Facts from Optimization (Appendix A)
- Basic Facts from Mathematical Statistics (Appendix B)
- Modeling Data with a Single Subspace (Part I)
- Principal Component Analysis (Chapter 2)
- Statistical View
- Geometric View
- Model Selection
- Applications in Face Recognition
- Robust Principal Component Analysis (Chapter 3)
- PCA with Missing Entries
- PCA with Corrupted Entries
- PCA with Outliers
- Applications in Face Recognition
- Nonlinear and Nonparametric Extensions (Chapter 4)
- Nonlinear and Kernel PCA
- Nonparametric Manifold Learning
- K-means and Spectral Clustering
- Applications in Face Recognition
- Modeling Data with Multiple Subspaces (Part II)
- Algebraic-Geometric Methods (Chapter 5)
- Line, Plane, and Hyperplane Clustering
- Algebraic Subspace Clustering
- Model Selection for Multiple Subspaces
- Statistical Methods (Chapter 6)
- K-Subspaces
- MPPCA
- Applications in Face Clustering
- Spectral Methods (Chapter 7)
- Local Methods: LSA, SLBF, LLMC
- Global Methods: SCC, SASC
- Applications in Face Clustering
- Sparse and Low-Rank Methods (Chapter 8)
- Low-Rank Subspace Clustering
- Sparse Subspace Clustering
- Applications in Face Clustering
- Applications (Part III)
- Image Representation (Chapter 9)
- Image Segmentation (Chapter 10)
- Motion Segmentation (Chapter 11)
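To make the Part I material concrete, the following is a minimal sketch of classical PCA via the SVD (Chapter 2). It is written in Python/NumPy for illustration only (course assignments use MATLAB), and all names in it are illustrative, not taken from the textbook.

```python
import numpy as np

def pca(X, r):
    """Fit an r-dimensional affine subspace to the columns of the d x N matrix X."""
    mu = X.mean(axis=1, keepdims=True)               # sample mean
    U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
    B = U[:, :r]                                     # orthonormal subspace basis
    Y = B.T @ (X - mu)                               # low-dimensional coordinates
    return mu, B, Y

# Noiseless data lying on a 2-D subspace of R^5 is reconstructed exactly.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 100))
mu, B, Y = pca(X, 2)
X_hat = mu + B @ Y
```

The geometric view of PCA covered in class corresponds to choosing `B` to minimize the reconstruction error `||X - mu - B Y||`, which the truncated SVD solves optimally.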
Textbook
Course Schedule
01/27/16: Introduction + Basics of Optimization
02/01/16: Basics of Optimization
02/03/16: Basics of Linear Algebra
02/08/16: Statistical View of PCA
02/10/16: Geometric View of PCA
02/17/16: Rank Minimization View of PCA
02/22/16: Probabilistic PCA
02/24/16: Probabilistic PCA
02/29/16: Model Selection for PCA
03/02/16: PCA with Missing Entries
03/07/16: PCA with Missing Entries via Alternating Minimization
03/07/16: PCA with Missing Entries via Convex Optimization
03/11/16: PCA with Corrupted Entries via Alternating Minimization
03/21/16: PCA with Corrupted Entries via Convex Optimization
03/23/16: PCA with Outliers via Convex Optimization (ℓ2,1-norm)
03/28/16: PCA with Outliers via Convex Optimization (ℓ2,1- and ℓ1-norms)
03/30/16: Nonparametric Manifold Learning (LLE and LE)
04/04/16: K-means and Spectral Clustering
04/06/16: K-subspaces
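The schedule closes with K-means and spectral clustering, the workhorses behind the spectral subspace clustering methods of Chapter 7. Below is a hedged sketch (Python for illustration; assignments use MATLAB) of spectral clustering on a given affinity matrix `W`: embed the points using the eigenvectors of the normalized Laplacian, then cluster the rows with a few Lloyd (k-means) iterations. The function name and the farthest-first initialization are illustrative choices, not the course's prescribed implementation.

```python
import numpy as np

def spectral_clustering(W, k, iters=50):
    """Cluster n points into k groups from an n x n symmetric affinity matrix W."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt     # normalized Laplacian
    vals, vecs = np.linalg.eigh(L)
    Y = vecs[:, :k]                                      # k smallest eigenvectors
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)     # row-normalize the embedding
    # Farthest-first initialization of the k centroids.
    C = [Y[0]]
    for _ in range(1, k):
        dists = np.min([((Y - c) ** 2).sum(axis=1) for c in C], axis=0)
        C.append(Y[np.argmax(dists)])
    C = np.array(C)
    # Lloyd iterations (plain k-means) on the embedded rows.
    for _ in range(iters):
        labels = np.argmin(((Y[:, None] - C[None]) ** 2).sum(-1), axis=1)
        C = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    return labels

# Toy example: two disconnected cliques should form two clusters.
W = np.zeros((6, 6))
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
labels = spectral_clustering(W, 2)
```

In the subspace clustering methods of Chapters 7-8, the affinity `W` is built from the data (e.g., from sparse or low-rank self-expressive coefficients) before this spectral step is applied.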
Grading
- Homeworks (40%): There will be two homeworks. Homework problems will include both analytical exercises and programming assignments in MATLAB.
- Exam (30%): There will be one exam on Monday April 25th, 12:00PM-1:15PM.
- Project (30%): There will be a final project, done either individually or in teams of two students. Each team will write a 6-page report and give a 10-minute presentation (including 3 minutes for questions) on the scheduled exam day, Saturday May 7th, 2-5PM.
- Sample theoretical project
- Do theoretical analysis of alternating minimization for matrix completion (e.g., prove convergence, prove correctness of recovered matrix)
- Do theoretical analysis of convex optimization method for matrix completion with noise (e.g., prove correctness of recovered matrix)
- Do theoretical analysis of sparse subspace clustering with noise (e.g., prove correctness of the recovered clustering)
- Sample evaluation project
- Evaluate and compare all methods for matrix completion, PCA with corrupted entries, PCA with outliers, on synthetic data
- Evaluate and compare all methods for subspace clustering on synthetic data
- Sample application project
- Develop and evaluate face recognition algorithm with variations not only in illumination, but also in pose
Administrative
- Late policy:
- Homeworks and projects are due on the specified dates.
- No late homeworks or projects will be accepted.
- Honor policy:
The strength of the university depends on academic and personal
integrity. In this course, you must be honest and truthful. Ethical
violations include cheating on exams, plagiarism, reuse of
assignments, improper use of the Internet and electronic devices,
unauthorized collaboration, alteration of graded assignments, forgery
and falsification, lying, facilitating academic dishonesty, and unfair
competition.
- Homeworks and exams are strictly individual.
- Projects can be done in teams of two students.