Unsupervised Learning: From Big Data to Low-Dimensional Representations
Time: MF 12:00-1:15 p.m.

Place: Hodson 313

Instructor: Rene Vidal
Course Description
In the era of data deluge, the development of methods for discovering structure in high-dimensional data is becoming increasingly important. This course will cover state-of-the-art methods from algebraic geometry, sparse and low-rank representations, and statistical learning for modeling and clustering high-dimensional data. The first part of the course will cover methods for modeling data with a single low-dimensional subspace, such as PCA, Robust PCA, Kernel PCA, and manifold learning techniques. The second part of the course will cover methods for modeling data with multiple subspaces, such as algebraic, statistical, sparse and low-rank subspace clustering techniques. The third part of the course will cover applications of these methods in image processing, computer vision, and biomedical imaging.
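As a taste of the first part of the course, here is an illustrative Python/NumPy sketch (the course assignments themselves use MATLAB) of fitting a single low-dimensional subspace with PCA via the SVD; the function name and synthetic data are my own, not from the textbook:

```python
import numpy as np

def pca_subspace(X, d):
    """Fit a d-dimensional affine subspace to the columns of X via the SVD."""
    mu = X.mean(axis=1, keepdims=True)               # sample mean
    U, _, _ = np.linalg.svd(X - mu, full_matrices=False)
    return mu, U[:, :d]                              # mean + orthonormal basis

# Synthetic data: 100 points near a 2-D subspace of R^5, plus small noise.
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 2))
X = B @ rng.standard_normal((2, 100)) + 0.01 * rng.standard_normal((5, 100))

mu, U = pca_subspace(X, d=2)
residual = (X - mu) - U @ (U.T @ (X - mu))           # projection residual
print(U.shape, np.linalg.norm(residual) / np.linalg.norm(X - mu))
```

The relative residual is tiny because the data truly lie near a two-dimensional subspace; the geometric and statistical interpretations of this computation are developed in Chapter 2.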
  1. Background
    • Introduction (Chapter 1)
    • Basic Facts from Linear Algebra
    • Basic Facts from Optimization (Appendix A)
    • Basic Facts from Mathematical Statistics (Appendix B)
  2. Modeling Data with a Single Subspace (Part I)
    • Principal Component Analysis (Chapter 2)
      • Statistical View
      • Geometric View
      • Model Selection
      • Applications in Face Recognition
    • Robust Principal Component Analysis (Chapter 3)
      • PCA with Missing Entries
      • PCA with Corrupted Entries
      • PCA with Outliers
      • Applications in Face Recognition
    • Nonlinear and Nonparametric Extensions (Chapter 4)
      • Nonlinear and Kernel PCA
      • Nonparametric Manifold Learning
      • K-means and Spectral Clustering
      • Applications in Face Recognition
  3. Modeling Data with Multiple Subspaces (Part II)
    • Algebraic-Geometric Methods (Chapter 5)
      • Line, Plane, and Hyperplane Clustering
      • Algebraic Subspace Clustering
      • Model Selection for Multiple Subspaces
    • Statistical Methods (Chapter 6)
      • K-Subspaces
      • MPPCA
      • Applications in Face Clustering
    • Spectral Methods (Chapter 7)
      • Local Methods: LSA, SLBF, LLMC
      • Global Methods: SCC, SASC
      • Applications in Face Clustering
    • Sparse and Low-Rank Methods (Chapter 8)
      • Low-Rank Subspace Clustering
      • Sparse Subspace Clustering
      • Applications in Face Clustering
  4. Applications (Part III)
    • Image Representation (Chapter 9)
    • Image Segmentation (Chapter 10)
    • Motion Segmentation (Chapter 11)
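To illustrate the self-expressiveness idea behind the sparse subspace clustering methods of Chapter 8, here is a rough Python/NumPy sketch (the function name and the simple ISTA solver are my own simplification, not the book's algorithm): each point is written as a sparse combination of the other points, and the coefficient magnitudes define an affinity on which spectral clustering is run.

```python
import numpy as np

def ssc_affinity(X, lam=0.01, n_iter=500):
    """Sketch of the sparse self-expression step of SSC.

    For each column x_j, approximately solve
        min_c 0.5 * ||x_j - X_{-j} c||^2 + lam * ||c||_1
    with ISTA (proximal gradient), then symmetrize |C| into an affinity.
    """
    _, N = X.shape
    C = np.zeros((N, N))
    step = 1.0 / np.linalg.norm(X, 2) ** 2           # 1 / Lipschitz constant
    for j in range(N):
        A = X.copy()
        A[:, j] = 0                                   # enforce c_j = 0
        c = np.zeros(N)
        for _ in range(n_iter):
            c = c - step * (A.T @ (A @ c - X[:, j]))  # gradient step
            c = np.sign(c) * np.maximum(np.abs(c) - lam * step, 0.0)  # soft threshold
            c[j] = 0.0
        C[:, j] = c
    return np.abs(C) + np.abs(C).T                    # symmetric affinity

# Two orthogonal lines in R^3: within-line affinities are nonzero, while
# cross-line affinities vanish, so spectral clustering on W separates the lines.
rng = np.random.default_rng(1)
a = rng.uniform(1.0, 2.0, 20) * rng.choice([-1, 1], 20)
b = rng.uniform(1.0, 2.0, 20) * rng.choice([-1, 1], 20)
X = np.hstack([np.outer([1, 0, 0], a), np.outer([0, 1, 0], b)])
W = ssc_affinity(X)
print(W[:20, 20:].max())  # ~0: no cross-subspace connections
```

The block-diagonal structure of W is exactly what the spectral methods of Chapter 7 exploit to recover the clusters.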
Video Lectures

Slides on Subspace Clustering

01/30/17: Syllabus + Introduction + Basics of Linear Algebra

02/03/17: Basics of Linear Algebra

02/06/17: Statistical View of PCA

02/10/17: Geometric View of PCA

02/20/17: Rank Minimization View of PCA

02/22/17: Model Selection for PCA

02/24/17: PCA with Missing Entries via Convex Optimization

02/27/17: PCA with Corrupted Entries via Convex Optimization

03/01/17: PCA with Outliers via Convex Optimization (L21)

03/03/17: Extensions + PCA with Outliers via Convex Optimization (L1)

03/06/17: Robust PCA via Alternating Minimization

03/10/17: Robust PCA via Alternating Minimization

03/13/17: Nonlinear PCA

03/17/17: Kernel PCA

03/27/17: Locally Linear Embedding (LLE)

03/31/17: Laplacian Eigenmaps (LE)

04/03/17: Exam 1

04/07/17: Spectral Clustering

04/10/17: Spectral Clustering

04/14/17: K-means

04/17/17: K-subspaces

04/21/17: Spectral Subspace Clustering: Local Methods

04/24/17: Spectral Subspace Clustering: Global Methods

04/28/17: Sparse Subspace Clustering

05/01/17: Sparse Subspace Clustering

05/05/17: Low-Rank Subspace Clustering

Grading
  1. Homeworks (30%): There will be homeworks approximately every other week, which will include both analytical exercises and programming assignments in MATLAB.
    • Homework 1: Due 02/20/2017, 23:59
    • Exercise 2.1
      Exercise 2.4
      Exercise 2.17 (PCA part only, do not do PPCA or model selection)
      Submission instructions. Please send an email to the TA with subject 600.692:HW1 and attachment firstname-lastname-hw1-learning16.zip or firstname-lastname-hw1-learning16.tar.gz. The attachment should have the following content:
      1. For analytical questions, please submit a file called hw1.pdf containing your answers to each one of the analytical questions. If at all possible, you should generate this file using the LaTeX template hw1-learning14.tex. If not possible, you may use another editor, or scan your handwritten solutions. But note that you must submit a single PDF file with all your answers.
      2. For coding questions, please submit a file called README, which contains instructions on how to run your code. Please use separate directories for each coding problem, each one containing all the functions and scripts you are asked to write in separate files. For example, for HW1 the structure of your submission could look like (a) README (b) hw1.pdf (c) hw1q3: hw1q3c.m, hw1q3e.m The TA will run your scripts to generate the results. Thus, your script should include all needed plotting commands so that figures pop up automatically. Please make sure that the figure numbers match those you describe in hw1.pdf. You do not need to submit input or output images. The output images should be automatically generated by your scripts so that the TA can see the results by just running the scripts. In writing your code, you should assume that the TA will place the input images in the directory that is relevant to the question solved by your script. Also, make sure to comment your code properly.
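      The submission layout described above could be assembled as in this shell sketch (the touched files are placeholders; substitute your actual write-up and scripts):

      ```shell
      # Illustrative layout for the HW1 submission archive (file names as in the example above).
      mkdir -p hw1q3
      touch README hw1.pdf                      # run instructions and write-up
      touch hw1q3/hw1q3c.m hw1q3/hw1q3e.m       # one directory per coding problem
      tar -czf firstname-lastname-hw1-learning16.tar.gz README hw1.pdf hw1q3
      ```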
    • Homework 2: Due 03/06/2017, 11:59AM
    • Exercise 3.5
      Exercise 2.16
      Exercise 3.1
    • Homework 3: Due 03/17/2017, 11:59PM
    • Exercise 3.7 (40 points)
      Exercise 3.8 (20 points): Solve part 1 only.
      Exercise 3.9 (20 points): Implement and evaluate LRMC only.
      Exercise 3.10 (20 points): Implement and evaluate rpca_admm only.
    • Homework 4: Due 03/31/2017, 11:59PM
    • Exercise 3.11
    • Homework 5: Due 05/05/2017, 11:59AM
    • Exercise 4.2 (10 points)
      Exercise 4.13 (30 points)
      Exercise 6.1 (20 points)
      Exercise 7.2 (10 points)
      Exercise 7.3 (10 points)
      Exercise 8.1 (20 points)
  2. Exams (40%):
    • Exam 1: 04/03/2017, 12:00PM-01:15PM, Hodson 313.
    • Exam 2: 05/12/2017, 09:00AM-10:15AM, Hodson 313.
  3. Project (30%): There will be a final project to be done either individually or in teams of up to three students. Please see detailed instructions here.
    • Project proposal: Due 04/26/2017, 11:59PM. Please submit a one page proposal describing (1) the problem you intend to solve, (2) the algorithms you intend to use, and (3) the datasets and metrics by which you plan to evaluate the algorithms. Please see detailed instructions here.
    • Project presentation and report: Due 05/12/2017, 10:15AM-12:00PM, Hodson 313. Please submit a 6-page report (e.g., CVPR 2017 double column format) containing title, abstract, introduction, problem description, proposed solution, experiments, conclusions, and references. Please give a 10-minute presentation (including 3 minutes for questions).
  • Late policy:
    • Homeworks and projects are due on the specified dates.
    • No late homeworks or projects will be accepted.
  • Honor policy:

    The strength of the university depends on academic and personal integrity. In this course, you must be honest and truthful. Ethical violations include cheating on exams, plagiarism, reuse of assignments, improper use of the Internet and electronic devices, unauthorized collaboration, alteration of graded assignments, forgery and falsification, lying, facilitating academic dishonesty, and unfair competition.

  • Homeworks and exams are strictly individual
  • Projects can be done in teams of two students