Sparse and Low Rank Methods for Imbalanced and Heterogeneous Data
Project Summary
In recent years, sparse and low-rank modeling techniques have emerged as powerful tools for efficiently processing visual data in non-traditional ways. However, existing methods generally succeed only in the presence of sufficient amounts of homogeneous and balanced training data that are well matched to the actual test conditions. In practice, when the data are heterogeneous and imbalanced, the performance of existing methods can be much worse than expected. This project will develop a new recognition framework based on novel sparse and low-rank modeling techniques, which will be able to deal with imbalanced, heterogeneous and multi-modal data. Imbalanced data will be handled using convex optimization techniques that automatically divide a dataset into common and rare patterns, and select a small set of representatives for the common patterns that are then combined with the rare patterns to form a balanced dataset. Heterogeneous and multi-modal data will be handled using non-convex optimization techniques that learn a latent representation from multiple domains or modalities. Classification and clustering algorithms can be applied to the latent representation. Applications of these methods include image and video-based object recognition, activity recognition, video summarization, and surveillance.
Clustering Imbalanced Data in a Union of Hyperplanes
In [1,2] we developed a clustering algorithm that provably clusters imbalanced data drawn from a union of hyperplanes. By imbalanced data in a union of hyperplanes we mean that the number of points in hyperplane H1 is much larger than the number of points in hyperplane H2, and so on. The algorithm is sequential in the sense that it first computes the normal vector to the dominant hyperplane H1, then the normal vector to the second dominant hyperplane H2, and so on. The normal vector to H1 is computed by solving a non-convex L1 minimization problem on the sphere. The theoretical analysis of our ICML 2017 paper [2] shows that, as long as the dataset contains enough points from the dominant hyperplane H1 relative to points from the other hyperplanes, the normal vector to H1 is the unique (up to sign) solution to the non-convex L1 problem. Moreover, this solution can be provably computed by running a recursion of linear programs for a finite number of iterations. Alternatively, an Iteratively-Reweighted-Least-Squares (IRLS) scheme can be used to compute the normal vector efficiently. To compute the normal vector to H2, instead of removing points that are close to H1, a procedure known to be sensitive to the choice of the thresholding parameter, we weight all points according to their distance to the estimated hyperplane H1, and then solve the non-convex problem on the weighted data. Experiments on synthetic data show that our algorithm dramatically improves upon similar sequential algorithms based on traditional approaches such as RANSAC or on modern methods such as REAPER.
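To make the IRLS approach concrete, below is a minimal sketch for the problem min ||X^T b||_1 over unit-norm b; the initialization, damping constant delta, and stopping rule are illustrative choices rather than the exact ones in [1,2]:

```python
import numpy as np

def dpcp_irls(X, max_iter=100, delta=1e-9, tol=1e-6):
    """IRLS sketch for min ||X^T b||_1 over unit vectors b (X is D x N,
    columns are data points). Under the conditions of [1,2], the solution
    is, up to sign, the normal vector to the dominant hyperplane H1."""
    # Initialize with the least-significant left singular vector (the L2 solution).
    b = np.linalg.svd(X)[0][:, -1]
    for _ in range(max_iter):
        # Reweight: points close to the current hyperplane get large weight,
        # so the next estimate is pulled toward fitting them exactly.
        w = 1.0 / np.maximum(np.abs(X.T @ b), delta)
        # Weighted L2 step: smallest eigenvector of sum_j w_j x_j x_j^T.
        _, eigvecs = np.linalg.eigh((X * w) @ X.T)
        b_new = eigvecs[:, 0]
        # Stop when the estimate stabilizes (modulo the sign ambiguity).
        if min(np.linalg.norm(b_new - b), np.linalg.norm(b_new + b)) < tol:
            return b_new
        b = b_new
    return b
```

For the second hyperplane, the same loop could be run after reweighting each data point by its distance to the first estimated hyperplane, so that points already explained by H1 contribute little.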
A Scalable Exemplar-based Subspace Clustering Algorithm for Class-Imbalanced Data
In [3] we developed an exemplar-based subspace clustering method that can efficiently cluster imbalanced data lying in a union of subspaces. Our method searches for a subset of the data that best represents all data points, as measured by the L1 norm of the representation coefficients. To solve our model efficiently, we introduced a farthest-first search algorithm, which iteratively selects the least well-represented point as a new exemplar. When the data come from a union of subspaces, we proved that the computed subset contains enough exemplars from each subspace to express all data points, even when the data are imbalanced. We evaluated the method on two imbalanced image datasets: the EMNIST handwritten letter dataset and the GTSRB street sign dataset. Experimental results showed that our method outperforms the state of the art in both clustering performance and running time. We also demonstrated, through experiments on the Extended Yale B face database, that the exemplars selected by our model can be used for unsupervised subset selection, where the goal is to select from a large dataset a subset on which a classifier can be trained with minimal loss in performance.
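As a rough sketch of the farthest-first search (with scikit-learn's Lasso standing in for the L1-regularized self-expressive solver, and with an illustrative seeding rule and regularization weight):

```python
import numpy as np
from sklearn.linear_model import Lasso

def representation_cost(x, E, lam=0.1):
    """L1-regularized cost of representing x with the exemplar matrix E
    (D x k). scikit-learn's Lasso objective carries a 1/(2*n_samples)
    factor, so this is a surrogate for the exact cost in [3]."""
    model = Lasso(alpha=lam, fit_intercept=False, max_iter=2000)
    model.fit(E, x)
    r = x - E @ model.coef_
    return 0.5 * r @ r + lam * np.abs(model.coef_).sum()

def farthest_first_exemplars(X, k, lam=0.1):
    """Greedily grow the exemplar set by adding, at each step, the point
    that the current exemplars represent worst. X is D x N; returns the
    indices of the k selected exemplars."""
    N = X.shape[1]
    # Seed with the largest-norm point (an illustrative choice of seed).
    selected = [int(np.argmax(np.linalg.norm(X, axis=0)))]
    while len(selected) < k:
        E = X[:, selected]
        costs = np.array([representation_cost(X[:, j], E, lam)
                          for j in range(N)])
        costs[selected] = -np.inf  # exemplars are already perfectly covered
        selected.append(int(np.argmax(costs)))
    return selected
```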
A Nonconvex Formulation for Clustering Data from Multiple Domains
We addressed the problem of learning domain-adaptive sparse representations for clustering data from multiple domains. In the proposed approach, we search for a projection from each domain to a shared latent space, where we perform subspace clustering by enforcing the self-expressiveness property of data lying in a union of subspaces. From an optimization perspective, this formulation requires solving for both the projections from each domain and the sparse representation in the shared latent space. While one can solve for both variables using an alternating minimization scheme, the challenge is that the optimization problem is non-convex. Recent work on nonconvex analysis has shown that when the objective function satisfies the so-called strict saddle property, simple methods such as stochastic gradient descent can find the global optimum with overwhelming probability. However, the strict saddle property is hard to verify in general. In our work, we exploit the bilinear structure of our optimization problem to reduce the strict saddle property in two variables (the projections and the sparse representation) to a strict saddle property in one of the variables. Specifically, we show that if the objective function is strongly convex with respect to one of the variables and satisfies the strict saddle property with respect to the second variable, then it satisfies the strict saddle property with respect to both variables.
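While the exact objective in our paper is more involved, a representative formulation with this bilinear structure (a sketch under our own notational assumptions, not the paper's precise objective) is

\[
\min_{\{P_i\},\,C}\ \sum_{i=1}^{m} \big\| P_i X_i - Y C \big\|_F^2 + \lambda_1 \|C\|_1 + \lambda_2 \|C\|_F^2
\quad \text{s.t.} \quad Y = [P_1 X_1, \ldots, P_m X_m],\ \operatorname{diag}(C) = 0,
\]

where P_i projects the data X_i of domain i into the shared latent space and C is the joint sparse self-expression matrix. For fixed projections, the quadratic regularizer makes the subproblem in C strongly convex, matching the first hypothesis of the reduction above; the strict saddle property then only needs to be verified with respect to the projections.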
A Nonconvex Formulation for Low Rank Subspace Clustering
In [4] we studied the problem of subspace clustering with data that are potentially corrupted by both dense noise and sparse gross errors. In particular, we studied a recently proposed low-rank subspace clustering approach based on a nonconvex modeling formulation. This formulation includes a nonconvex spectral function in the objective, which makes the optimization task challenging; for example, it was previously unknown whether the alternating direction method of multipliers (ADMM) framework proposed to solve the nonconvex formulation is provably convergent. We established that the spectral function is differentiable and gave a formula for computing its derivative. Moreover, we showed that the derivative of the spectral function is Lipschitz continuous and provided an explicit value for the Lipschitz constant. These facts were then used to derive a lower bound on how the penalty parameter in the ADMM method should be chosen. As long as the penalty parameter is chosen according to this bound, we showed that the ADMM iterates have a limit point satisfying first-order optimality conditions. We also presented a second strategy for solving the nonconvex problem, based on proximal gradient calculations. The convergence and performance of both algorithms were verified through experiments on real data from face clustering, digit clustering, and motion segmentation.
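To make the proximal gradient strategy concrete, here is a minimal sketch for a simplified instance with a quadratic data-fit term and an abstract spectral penalty (the actual nonconvex penalty and data-fit structure in [4] differ; soft-thresholding appears below only as a placeholder scalar prox):

```python
import numpy as np

def prox_spectral(A, prox_scalar, t):
    """Proximal map of a spectral function f(A) = sum_i phi(sigma_i(A)):
    apply the scalar proximal map of phi to the singular values of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * prox_scalar(s, t)) @ Vt

def proximal_gradient(X, prox_scalar, step=0.9, max_iter=500, tol=1e-7):
    """Proximal gradient sketch for min_A 0.5*||X - A||_F^2 + f(A). The
    gradient of the data-fit term is (A - X) with Lipschitz constant 1, so
    any step below 1 suffices; [4] derives the analogous lower bound on the
    ADMM penalty parameter from the Lipschitz constant of the derivative of
    the spectral term."""
    A = X.copy()
    for _ in range(max_iter):
        A_new = prox_spectral(A - step * (A - X), prox_scalar, step)
        if np.linalg.norm(A_new - A) < tol:
            return A_new
        A = A_new
    return A

# Soft-thresholding as a placeholder scalar prox (nuclear-norm-like); the
# nonconvex penalty analyzed in [4] would supply its own scalar prox here.
A_hat = proximal_gradient(np.random.randn(50, 80),
                          lambda s, t: np.maximum(s - t, 0.0))
```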
Theoretical Analysis for Affine Subspace Clustering
We studied the problem of subspace clustering when the underlying subspaces are affine. A classical approach to dealing with affine subspaces in self-representation-based subspace clustering methods has been to add an affine constraint to the self-representations. However, since affine subspaces can always be embedded into linear subspaces of one extra dimension, it is unclear whether the affine constraint is necessary. Our work provided both theoretical and empirical evidence that when the dimension of the ambient space is high relative to the sum of the dimensions of the affine subspaces, the affine constraint has a negligible effect on clustering performance. Specifically, our theoretical analysis provided conditions that guarantee the correctness of the methods for the affine subspace clustering problem both with and without the affine constraint, and showed that both conditions are satisfied for high-dimensional data under the assumption that the affine subspaces are drawn from a random model. This finding provides practitioners with important guidance when choosing a model for a specific subspace clustering task. This work [5] was accepted for publication at the International Conference on Computer Vision (ICCV).
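To illustrate the two model variants, here is a minimal sketch of the noiseless self-expressive program with and without the affine constraint, written with CVXPY (the exact-representation constraint is illustrative; practical methods typically relax the equality to tolerate noise):

```python
import numpy as np
import cvxpy as cp

def self_expressive_coeffs(X, j, affine=False):
    """Sparse self-expression of column j of X (D x N) by the other columns,
    in the noiseless setting. With affine=True, the constraint sum(c) = 1 is
    added, which is the classical way of handling affine subspaces; [5]
    shows that for high-dimensional ambient spaces the two variants behave
    nearly identically."""
    N = X.shape[1]
    c = cp.Variable(N)
    constraints = [X @ c == X[:, j], c[j] == 0]
    if affine:
        constraints.append(cp.sum(c) == 1)
    cp.Problem(cp.Minimize(cp.norm1(c)), constraints).solve()
    return c.value

# The embedding alternative: appending a coordinate of ones turns each
# affine subspace into a linear one of one extra dimension, e.g.
# X_h = np.vstack([X, np.ones(X.shape[1])]).
```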
Acknowledgement
Work supported by NSF grant 1618637.
Publications
[1] M. C. Tsakiris and R. Vidal. Hyperplane Clustering via Dual Principal Component Pursuit. arXiv:1706.01604v2 [cs.CV].
[2] M. C. Tsakiris and R. Vidal. Hyperplane Clustering via Dual Principal Component Pursuit. International Conference on Machine Learning (ICML), 2017.
[3] C. You, C. Li, D. Robinson, and R. Vidal. A Scalable Exemplar-based Subspace Clustering Algorithm for Class-Imbalanced Data. European Conference on Computer Vision (ECCV), 2018.
[4] H. Jiang, D. Robinson, R. Vidal, and C. You. A Nonconvex Formulation for Low Rank Subspace Clustering: Algorithms and Convergence Analysis. Computational Optimization and Applications, 2018.
[5] C. You, C.-G. Li, D. Robinson, and R. Vidal. Is an Affine Constraint Needed for Affine Subspace Clustering? International Conference on Computer Vision (ICCV), 2019.
[6] C. Lane, R. Boger, C. You, M. Tsakiris, B. Haeffele, and R. Vidal. Classifying and Comparing Approaches to Subspace Clustering with Missing Data. International Workshop on Robust Subspace Learning and Applications in Computer Vision, 2019.
[7] B. Haeffele and R. Vidal. Global Optimality in Neural Network Training. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.