Unsupervised Learning: From Big Data to Low-Dimensional Representations

Time: MF 12:00-1:15 p.m.
Place: Hodson 313
Instructor: René Vidal

Course Description

In the era of data deluge, the development of methods for discovering structure in high-dimensional data is becoming increasingly important. This course will cover state-of-the-art methods from algebraic geometry, sparse and low-rank representations, and statistical learning for modeling and clustering high-dimensional data. The first part of the course will cover methods for modeling data with a single low-dimensional subspace, such as PCA, Robust PCA, Kernel PCA, and manifold learning techniques. The second part of the course will cover methods for modeling data with multiple subspaces, such as algebraic, statistical, sparse and low-rank subspace clustering techniques. The third part of the course will cover applications of these methods in image processing, computer vision, and biomedical imaging.
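As a taste of the single-subspace modeling covered in the first part of the course, classical PCA fits a low-dimensional affine subspace by truncating the SVD of the mean-subtracted data matrix. The sketch below is illustrative Python/NumPy (course assignments are in MATLAB), and the function name and synthetic data are ours:

```python
import numpy as np

def pca(X, d):
    """Fit a d-dimensional affine subspace to the rows of X (one point per row)
    by truncating the SVD of the mean-subtracted data matrix."""
    mu = X.mean(axis=0)                                   # sample mean
    U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W = Vt[:d].T                                          # principal directions (D x d)
    Y = (X - mu) @ W                                      # low-dimensional coordinates
    return Y, W, mu

# Synthetic example: 100 points near a 1-dimensional subspace of R^3.
rng = np.random.default_rng(0)
t = rng.normal(size=(100, 1))
X = t @ np.array([[1.0, 2.0, 3.0]]) + 0.01 * rng.normal(size=(100, 3))
Y, W, mu = pca(X, 1)                                      # Y is 100 x 1, W is 3 x 1
```

Reconstructing `X` as `Y @ W.T + mu` recovers the data up to the small noise, which is the sense in which PCA gives a low-dimensional representation.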

Syllabus

**Background**
- Introduction (Chapter 1)
- Basic Facts from Linear Algebra
- Basic Facts from Optimization (Appendix A)
- Basic Facts from Mathematical Statistics (Appendix B)

**Modeling Data with a Single Subspace (Part I)**
- Principal Component Analysis (Chapter 2)
  - Statistical View
  - Geometric View
  - Model Selection
  - Applications in Face Recognition
- Robust Principal Component Analysis (Chapter 3)
  - PCA with Missing Entries
  - PCA with Corrupted Entries
  - PCA with Outliers
  - Applications in Face Recognition
- Nonlinear and Nonparametric Extensions (Chapter 4)
  - Nonlinear and Kernel PCA
  - Nonparametric Manifold Learning
  - K-means and Spectral Clustering
  - Applications in Face Recognition

**Modeling Data with Multiple Subspaces (Part II)**
- Algebraic-Geometric Methods (Chapter 5)
  - Line, Plane, and Hyperplane Clustering
  - Algebraic Subspace Clustering
  - Model Selection for Multiple Subspaces
- Statistical Methods (Chapter 6)
  - K-Subspaces
  - MPPCA
  - Applications in Face Clustering
- Spectral Methods (Chapter 7)
  - Local Methods: LSA, SLBF, LLMC
  - Global Methods: SCC, SASC
  - Applications in Face Clustering
- Sparse and Low-Rank Methods (Chapter 8)
  - Low-Rank Subspace Clustering
  - Sparse Subspace Clustering
  - Applications in Face Clustering

**Applications (Part III)**
- Image Representation (Chapter 9)
- Image Segmentation (Chapter 10)
- Motion Segmentation (Chapter 11)
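Several of the clustering topics above (Chapters 4 and 7) build on the spectral clustering pipeline: affinity matrix, normalized graph Laplacian, low-dimensional spectral embedding, then k-means. The following is a minimal illustrative sketch in Python/NumPy (function name, parameters, and initialization scheme are ours, not the textbook's; it assumes no cluster goes empty during the k-means iterations):

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0, iters=50):
    """Gaussian affinity -> symmetric normalized Laplacian -> eigenvectors of
    the k smallest eigenvalues -> k-means on the rows of the embedding."""
    n = len(X)
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    W = np.exp(-D2 / (2 * sigma ** 2))                    # Gaussian affinities
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)
    L = np.eye(n) - W / np.sqrt(np.outer(d, d))           # L = I - D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(L)                           # eigenvalues in ascending order
    V = vecs[:, :k]                                       # spectral embedding (n x k)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)      # row-normalize
    C = V[[0]]                                            # farthest-point init for k-means
    for _ in range(k - 1):
        gap = ((V[:, None, :] - C[None]) ** 2).sum(-1).min(axis=1)
        C = np.vstack([C, V[gap.argmax()]])
    for _ in range(iters):                                # Lloyd iterations
        labels = ((V[:, None, :] - C[None]) ** 2).sum(-1).argmin(axis=1)
        C = np.array([V[labels == j].mean(axis=0) for j in range(k)])
    return labels

# Demo: two well-separated Gaussian blobs should be split perfectly.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.2, size=(20, 2)),
               rng.normal(4, 0.2, size=(20, 2))])
labels = spectral_clustering(X, 2)
```

The subspace clustering methods of Chapters 7 and 8 follow the same template; they differ mainly in how the affinity matrix is built (local fits, sparse representations, or low-rank representations rather than Gaussian distances).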

Textbook

Lectures

Video Lectures
Slides on Subspace Clustering
01/30/17: Syllabus + Introduction + Basics of Linear Algebra
02/03/17: Basics of Linear Algebra
02/06/17: Statistical View of PCA
02/10/17: Geometric View of PCA
02/20/17: Rank Minimization View of PCA
02/22/17: Model Selection for PCA
02/24/17: PCA with Missing Entries via Convex Optimization
02/27/17: PCA with Corrupted Entries via Convex Optimization
03/01/17: PCA with Outliers via Convex Optimization (ℓ2,1)
03/03/17: Extensions + PCA with Outliers via Convex Optimization (ℓ1)
03/06/17: Robust PCA via Alternating Minimization
03/10/17: Robust PCA via Alternating Minimization
03/13/17: Nonlinear PCA
03/17/17: Kernel PCA
03/27/17: Locally Linear Embedding (LLE)
03/31/17: Laplacian Eigenmaps (LE)
04/03/17: Exam 1
04/07/17: Spectral Clustering
04/10/17: Spectral Clustering
04/14/17: K-means
04/17/17: K-subspaces
04/21/17: Spectral Subspace Clustering: Local Methods
04/24/17: Spectral Subspace Clustering: Global Methods
04/28/17: Sparse Subspace Clustering
05/01/17: Sparse Subspace Clustering
05/05/17: Low-Rank Subspace Clustering
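The Robust PCA lectures above (02/27–03/10) revolve around principal component pursuit: decomposing a data matrix into a low-rank part plus a sparse corruption. The core augmented-Lagrangian iteration is short enough to sketch. This is an illustrative Python/NumPy version with the standard default parameters from the PCP literature, not the course's own `rpca_admm` code:

```python
import numpy as np

def svt(X, tau):
    """Singular-value thresholding: proximal operator of tau * (nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def shrink(X, tau):
    """Entrywise soft thresholding: proximal operator of tau * (l1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca(M, iters=200, rho=1.05):
    """Principal component pursuit: min ||L||_* + lam*||S||_1  s.t.  L + S = M,
    solved by an augmented-Lagrangian iteration with penalty continuation."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))            # standard sparsity weight
    mu = m * n / (4.0 * np.abs(M).sum())      # standard initial penalty
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                      # scaled dual variable
    for _ in range(iters):
        L = svt(M - S + Y / mu, 1.0 / mu)     # low-rank update
        S = shrink(M - L + Y / mu, lam / mu)  # sparse update
        Y += mu * (M - L - S)                 # dual ascent on the constraint
        mu *= rho                             # gradually tighten the penalty
    return L, S

# Demo: a rank-2 matrix plus 5% gross corruptions.
rng = np.random.default_rng(0)
L0 = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 30))
S0 = np.zeros((30, 30))
S0.flat[rng.choice(900, 45, replace=False)] = 10.0 * rng.choice([-1.0, 1.0], 45)
L, S = rpca(L0 + S0)
```

In the exact-recovery regime (low rank, sparse corruptions), `L` approaches the uncorrupted matrix even though the corruptions are large, which is the point of the convex-optimization view developed in these lectures.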

Grading

**Homeworks (30%)**: There will be homeworks approximately every other week, consisting of both analytical exercises and programming assignments in MATLAB.

- **Homework 1: Due 02/20/2017, 11:59PM**
  - Exercise 2.1
- **Homework 2: Due 03/06/2017, 11:59AM**
  - Exercise 3.5
- **Homework 3: Due 03/17/2017, 11:59PM**
  - Exercise 3.7 (40 points)
- **Homework 4: Due 03/31/2017, 11:59PM**
  - Exercise 3.11
- **Homework 5: Due 05/05/2017, 11:59AM**
  - Exercise 4.2 (10 points)

**Exams (40%)**:

- **Exam 1:** 04/03/2017, 12:00PM-01:15PM, Hodson 313.
- **Exam 2:** 05/12/2017, 09:00AM-10:15AM, Hodson 313.

**Project (30%)**: There will be a final project, to be done either individually or in teams of up to three students. Please see detailed instructions here.

- **Project proposal: Due 04/26/2017, 11:59PM.** Please submit a one-page proposal describing (1) the problem you intend to solve, (2) the algorithms you intend to use, and (3) the datasets and metrics by which you plan to evaluate the algorithms. Please see detailed instructions here.
- **Project presentation and report: Due 05/12/2017, 10:15AM-12:00PM, Hodson 313.** Please submit a 6-page report (e.g., CVPR 2017 double-column format) containing title, abstract, introduction, problem description, proposed solution, experiments, conclusions, and references. Please give a 10-minute presentation (including 3 minutes for questions).

Exercise 2.4

Exercise 2.17 (PCA part only, do not do PPCA or model selection)

1. For analytical questions, please submit a file called hw1.pdf containing your answers to each of the analytical questions. If at all possible, you should generate this file using the LaTeX template hw1-learning14.tex. Otherwise, you may use another editor or scan your handwritten solutions, but note that you must submit a single PDF file with all your answers.

2. For coding questions, please submit a file called README containing instructions on how to run your code. Please use a separate directory for each coding problem, each containing all the functions and scripts you are asked to write in separate files. For example, for HW1 the structure of your submission could look like: (a) README, (b) hw1.pdf, (c) hw1q3: hw1q3c.m, hw1q3e.m. The TA will run your scripts to generate the results, so each script should include all needed plotting commands so that figures pop up automatically. Please make sure that the figure numbers match those you describe in hw1.pdf. You do not need to submit input or output images; the output images should be generated automatically by your scripts so that the TA can see the results by just running them. In writing your code, you should assume that the TA will place the input images in the directory relevant to the question solved by your script. Also, make sure to comment your code properly.

Exercise 2.16

Exercise 3.1

Exercise 3.8 (20 points): Solve part 1 only.

Exercise 3.9 (20 points): Implement and evaluate LRMC only.

Exercise 3.10 (20 points): Implement and evaluate rpca_admm only.

Exercise 4.13 (30 points)

Exercise 6.1 (20 points)

Exercise 7.2 (10 points)

Exercise 7.3 (10 points)

Exercise 8.1 (20 points)

Administrative

- Late policy:
  - Homeworks and projects are due on the specified dates.
  - No late homeworks or projects will be accepted.
- Honor policy:
  The strength of the university depends on academic and personal integrity. In this course, you must be honest and truthful. Ethical violations include cheating on exams, plagiarism, reuse of assignments, improper use of the Internet and electronic devices, unauthorized collaboration, alteration of graded assignments, forgery and falsification, lying, facilitating academic dishonesty, and unfair competition.

- Homeworks and exams are strictly individual
- Projects can be done in teams of two students