Advanced Topics in Machine Learning: Modeling and Clustering High-Dimensional Data
Time: MW 12:00-1:15 p.m.

Place: Hackerman 320

Instructor: Rene Vidal
Course Description
In the era of data deluge, the development of methods for discovering structure in high-dimensional data is becoming increasingly important. This course will cover state-of-the-art methods from algebraic geometry, sparse and low-rank representations, and statistical learning for modeling and clustering high-dimensional data. The first part of the course will cover methods for modeling data with a single low-dimensional subspace, such as PCA, Robust PCA, Kernel PCA, and manifold learning techniques. The second part of the course will cover methods for modeling data with multiple subspaces, such as algebraic, statistical, sparse and low-rank subspace clustering techniques. The third part of the course will cover applications of these methods in image processing, computer vision, and biomedical imaging.
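To make the single-subspace setting concrete, here is a minimal MATLAB sketch on synthetic data (all variable names are illustrative, not from the course materials). It samples noisy points near a d-dimensional subspace of R^D and recovers an orthonormal basis via the SVD, which is the core computation behind PCA (Chapter 2).

    % Fit a d-dimensional subspace to noisy data via PCA (SVD).
    D = 100; d = 5; N = 500;                      % ambient dim., subspace dim., #points
    U_true = orth(randn(D, d));                   % random orthonormal basis
    X = U_true*randn(d, N) + 0.01*randn(D, N);    % noisy points near the subspace
    Xc = bsxfun(@minus, X, mean(X, 2));           % center the data
    [U, ~, ~] = svd(Xc, 'econ');                  % principal directions = left singular vectors
    U_hat = U(:, 1:d);                            % estimated subspace basis
    err = norm(U_hat*U_hat' - U_true*U_true', 'fro');  % distance between projectors
    fprintf('Subspace estimation error: %.4f\n', err);

In practice the dimension d is unknown, which is the model selection problem discussed in Chapter 2.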
Syllabus
  1. Background
    • Introduction (Chapter 1)
    • Basic Facts from Linear Algebra
    • Basic Facts from Optimization (Appendix A)
    • Basic Facts from Mathematical Statistics (Appendix B)
  2. Modeling Data with a Single Subspace (Part I)
    • Principal Component Analysis (Chapter 2)
      • Statistical View
      • Geometric View
      • Model Selection
      • Applications in Face Recognition
    • Robust Principal Component Analysis (Chapter 3)
      • PCA with Missing Entries
      • PCA with Corrupted Entries
      • PCA with Outliers
      • Applications in Face Recognition
    • Nonlinear and Nonparametric Extensions (Chapter 4)
      • Nonlinear and Kernel PCA
      • Nonparametric Manifold Learning
      • K-means and Spectral Clustering
      • Applications in Face Recognition
  3. Modeling Data with Multiple Subspaces (Part II)
    • Algebraic-Geometric Methods (Chapter 5)
      • Line, Plane, and Hyperplane Clustering
      • Algebraic Subspace Clustering
      • Model Selection for Multiple Subspaces
    • Statistical Methods (Chapter 6)
      • K-Subspaces (see the sketch after this outline)
      • Mixtures of Probabilistic PCA (MPPCA)
      • Applications in Face Clustering
    • Spectral Methods (Chapter 7)
      • Local Methods: LSA, SLBF, LLMC
      • Global Methods: SCC, SASC
      • Applications in Face Clustering
    • Sparse and Low-Rank Methods (Chapter 8)
      • Low-Rank Subspace Clustering
      • Sparse Subspace Clustering
      • Applications in Face Clustering
  4. Applications (Part III)
    • Image Representation (Chapter 9)
    • Image Segmentation (Chapter 10)
    • Motion Segmentation (Chapter 11)
Textbook
R. Vidal, Y. Ma, and S. S. Sastry. Generalized Principal Component Analysis. Springer, 2016. Chapter and appendix references in the syllabus above refer to this book.
Course Schedule
01/27/16: Introduction + Basics of Optimization

02/01/16: Basics of Optimization

02/03/16: Basics of Linear Algebra

02/08/16: Statistical View of PCA

02/10/16: Geometric View of PCA

02/17/16: Rank Minimization View of PCA

02/22/16: Probabilistic PCA

02/24/16: Probabilistic PCA

02/29/16: Model Selection for PCA

03/02/16: PCA with Missing Entries

03/07/16: PCA with Missing Entries via Alternating Minimization

03/09/16: PCA with Missing Entries via Convex Optimization

03/11/16: PCA with Corrupted Entries via Alternating Minimization

03/21/16: PCA with Corrupted Entries via Convex Optimization

03/23/16: PCA with Outliers via Convex Optimization (ℓ2,1 norm)

03/28/16: PCA with Outliers via Convex Optimization (ℓ2,1 and ℓ1 norms)

03/30/16: Nonparametric Manifold Learning (LLE and LE)

04/04/16: K-means and Spectral Clustering

04/06/16: K-subspaces

Grading
  1. Homeworks (40%): There will be two homeworks. Homework problems will include both analytical exercises and programming assignments in MATLAB.
  2. Exam (30%): There will be one exam, on Monday, April 25th, 12:00-1:15 p.m.
  3. Project (30%): There will be a final project, to be done either individually or in teams of two students. Each team will write a 6-page report and give a 10-minute presentation (including 3 minutes for questions) on the scheduled exam day, Saturday, May 7th, 2-5 p.m.
    • Sample theoretical project
      • Do theoretical analysis of alternating minimization for matrix completion (e.g., prove convergence, prove correctness of recovered matrix)
      • Do theoretical analysis of convex optimization method for matrix completion with noise (e.g., prove correctness of recovered matrix)
      • Do theoretical analysis of sparse subspace clustering with noise (e.g., prove correctness of the recovered clustering)
    • Sample evaluation project
      • Evaluate and compare all methods for matrix completion, PCA with corrupted entries, and PCA with outliers on synthetic data (a minimal matrix completion baseline is sketched after this list)
      • Evaluate and compare all methods for subspace clustering on synthetic data
    • Sample application project
      • Develop and evaluate a face recognition algorithm that handles variations not only in illumination but also in pose
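As a concrete starting point for the matrix completion comparison above, here is a minimal MATLAB sketch of alternating minimization on synthetic data (a simplified baseline with illustrative variable names, not a prescribed solution).

    % Matrix completion via alternating minimization on synthetic data.
    m = 50; n = 60; r = 3;                        % matrix size and rank
    M = randn(m, r)*randn(r, n);                  % ground-truth low-rank matrix
    Omega = rand(m, n) < 0.4;                     % observe ~40% of the entries
    U = randn(m, r); V = randn(n, r);             % random initialization
    for iter = 1:100
        for i = 1:m                               % update row i of U by least squares
            idx = Omega(i, :);
            U(i, :) = (V(idx, :) \ M(i, idx)')';
        end
        for j = 1:n                               % update row j of V by least squares
            idx = Omega(:, j);
            V(j, :) = (U(idx, :) \ M(idx, j))';
        end
    end
    relerr = norm(U*V' - M, 'fro')/norm(M, 'fro');
    fprintf('Relative recovery error: %.2e\n', relerr);

A natural first experiment is to compare this baseline against the convex (nuclear-norm) formulation covered in class on the same synthetic data, varying the rank and the fraction of observed entries.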
Administrative
  • Late policy:
    • Homeworks and projects are due on the specified dates.
    • No late homeworks or projects will be accepted.
  • Honor policy:

    The strength of the university depends on academic and personal integrity. In this course, you must be honest and truthful. Ethical violations include cheating on exams, plagiarism, reuse of assignments, improper use of the Internet and electronic devices, unauthorized collaboration, alteration of graded assignments, forgery and falsification, lying, facilitating academic dishonesty, and unfair competition.

  • Homeworks and exams are strictly individual.
  • Projects can be done in teams of two students.