Sparse and Low Rank Methods for Imbalanced and Heterogeneous Data

Project Summary

In recent years, sparse and low-rank modeling techniques have emerged as powerful tools for efficiently processing visual data in non-traditional ways. However, existing methods generally succeed only in the presence of sufficient amounts of homogeneous and balanced training data that are well matched to the actual test conditions. In practice, when the data are heterogeneous and imbalanced, the performance of existing methods can be much worse than expected. This project will develop a new recognition framework based on novel sparse and low-rank modeling techniques, which will be able to deal with imbalanced, heterogeneous and multi-modal data. Imbalanced data will be handled using convex optimization techniques that automatically divide a dataset into common and rare patterns, and select a small set of representatives for the common patterns that are then combined with the rare patterns to form a balanced dataset. Heterogeneous and multi-modal data will be handled using non-convex optimization techniques that learn a latent representation from multiple domains or modalities. Classification and clustering algorithms can be applied to the latent representation. Applications of these methods include image and video-based object recognition, activity recognition, video summarization, and surveillance.

Clustering Imbalanced Data in a Union of Hyperplanes

In [1,2] we developed a clustering algorithm that is able to provably cluster imbalanced data drawn from a union of hyperplanes. By imbalanced data in a union of hyperplanes we mean that the number of points in hyperplane H1 is much larger than the number of points in hyperplane H2 and so forth. The algorithm is sequential in the sense that it first computes the normal vector to the dominant hyperplane H1, then the normal vector to the second dominant hyperplane H2 and so on. The normal vector to H1 is computed by solving an L1 non-convex minimization problem on the sphere. The extensive theoretical analysis of our ICML 2017 paper reveals that as long as there are enough points in the dataset from the dominant hyperplane H1 (as compared to points from other hyperplanes), then the normal vector to H1 is the unique up to sign solution to the L1 non-convex problem. Moreover, this solution can be provably computed by iterating a finite number of times a recursion of linear programs. Alternatively, an Iteratively-Reweighted-Least-Squares (IRLS) approach can be employed to furnish such a normal vector in an efficient fashion. To compute the normal vector to H2, instead of removing points that are close to H1, a procedure that is known to be sensitive to the choice of the thresholding parameter, we weight all points according to their distance to the estimated normal to H1, and then solve the non-convex problem on the weighted data. Experiments on synthetic data show that our algorithm dramatically improves the performance of similar sequential algorithms based on traditional approaches such as RANSAC or modern methods such as REAPER.

A Scalable Exemplar-based Subspace Clustering Algorithm for Class-Imbalanced Data

In [3] we developed an exemplar-based subspace clustering method that is able to efficiently clustering
imbalanced data in a union of subspaces. Our method searches for a subset of the data that best
represents all data points as measured by the l1 norm of the representation coefficients.
To solve our model efficiently, we introduce a farthest first search algorithm, which
iteratively selects the least well-represented point as an exemplar. When data come from
a union of subspaces, we proved that the computed subset contains enough exemplars
from each subspace for expressing all data points even if the data are imbalanced. We
evaluated the above method on two imbalanced image datasets: the EMNIST handwritten
letter dataset and the GTSRB street sign dataset. Experimental results showed that our
method outperforms the state-of-the-art in terms of clustering performance and running
time. We also demonstrated through experiments on the Extended Yale B face database
that the exemplars selected by our model could be used for unsupervised subset selection
tasks, where the goal is to select a subset from a big data set that may be used to train a
classifier that incurs minimum performance loss.

A Nonconvex Formulation for Clustering Data from Multiple Domains

We addressed the problem of learning domain adaptive sparse representations for clustering data from multiple domains. In the proposed
approach, we search for a projection from each domain to a shared latent space where we
perform subspace clustering by enforcing the self-expressiveness property of data in a
union of subspaces. From an optimization perspective, this formulation requires solving for
both the projections from each domain as well as the sparse representation in the shared
latent space. While one can solve for both variables by using alternating minimization
scheme, the challenge is that the optimization problem is non-convex. Recent work on nonconvex
analysis has shown that when the objective function satisfies the so-called strict
saddle property, simple methods such as stochastic gradient descent can find the global
optimum with overwhelming probability. However, the strict saddle property is hard to
verify in general. In our work, we exploit the fact that our optimization problem has a
bilinear structure to reduce the strict saddle property in two variables (the projections and
the sparse representation) to a strict saddle property in one of the variables. Specifically,
we show that if the objective function is strongly convex with respect to one of the
variables and satisfied the strict saddle property with respect to the second variable, then
it satisfies the strict saddle property with respect to both variables.

A Nonconvex Formulation for Low Rank Subspace Clustering

In [4] we studied the problem of subspace clustering with data that is
potentially corrupted by both dense noise and sparse gross errors. In particular, we
studied a recently proposed low-rank subspace clustering approach based on a
nonconvex modeling formulation. This formulation includes a nonconvex spectral function
in the objective function that makes the optimization task challenging, e.g., it is unknown
whether the alternating direction method of multipliers (ADMM) framework proposed to
solve the nonconvex model formulation is provably convergent. We established that the
spectral function is differentiable and gave a formula for computing its derivative.
Moreover, we showed that the derivative of the spectral function is Lipschitz continuous
and provided an explicit value for the Lipschitz constant. These facts were then used to
provide a lower bound for how the penalty parameter in the ADMM method should be
chosen. As long as the penalty parameter is chosen according to this bound, we showed
that the ADMM algorithm computes iterates that have a limit point satisfying first-order
optimality conditions. We also presented a second strategy for solving the nonconvex
problem that is based on proximal gradient calculations. The convergence and
performance of the algorithms was verified through experiments on real data from face
and digit clustering and motion segmentation.

Theoretical Analysis for Affine Subspace Clustering

We studied the problem of subspace clustering when the underlying subspaces are
affine. A classical approach for dealing with the affine subspaces in self-representation based subspace
clustering methods has been to add an affine constraint to the self-representations. However, since affine
subspaces can always be embedded into linear subspaces of one extra dimension, it is unclear if the affine
constraint is necessary. Our work provided both theoretical and empirical evidences that when the
dimension of the ambient space is high relative to the sum of the dimensions of the affine subspaces, the
affine constraint has a negligible effect on the clustering performance. Specifically, our theoretical analysis
provided conditions that guarantee the correctness of the methods for the affine subspace clustering
problem both with and without the affine constraint, and shows that both conditions are satisfied for
high-dimensional data under the assumption that the affine subspaces are drawn from a random model.
Our discovery provided important guidance for practitioners when picking the best model for their
specific subspace clustering tasks. This work was accepted for publication at the International
Conference on Computer Vision (ICCV).

Acknowledgement

Work supported by NSF grant 1618637.

Publications

[1]

Hyperplane Clustering via Dual Principal Component Pursuit

arXiv:1706.01604v2[cs.CV].

[2]

Hyperplane Clustering via Dual Principal Component Pursuit

International Conference on Machine Learning (ICML) 2017.

[3]

A Scalable Exemplar-based Subspace Clustering Algorithm for Class-Imbalanced Data

European Conference on Computer Vision (ECCV) 2018.

[4]

A Nonconvex Formulation for Low Rank Subspace Clustering: Algorithms and Convergence Analysis

Computational Optimization and Applications 2018.

[5]

Is an Affine Constraint Needed for Affine Subspace Clustering?

International Conference on Computer Vision (ICCV) 2019.

[6]

Classifying and Comparing Approaches to Subspace Clustering with Missing Data

International Workshop on Robust Subspace Learning and Applications in Computer Vision 2019.

[7]

Global Optimality in Neural Network Training

IEEE Conference on Computer Vision and Pattern Recognition 2017.