An Optimization Framework for Understanding Deep Networks
Project Summary
This project seeks to develop a mathematical framework for the analysis of a broad class of non-convex optimization problems, including matrix factorization, tensor factorization, and deep learning. In particular, this project will study the problem of minimizing the sum of a loss function and a regularization function, both of which can be non-convex. Specific goals include characterizing conditions on the loss function, the product map, the network architecture, and the regularization strategy that guarantee global optimality, provide theoretical insight into the success of existing optimization algorithms and architectures, and offer guidance for the design of novel network architectures.
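As a point of reference, the class of problems studied can be written in the following generic form (the notation here is illustrative rather than taken from the papers below: W denotes the factors or network parameters, Φ the product or network map, ℓ the loss, and Θ the regularizer):

\[
\min_{W} \; \ell\big(Y, \Phi(X, W)\big) + \lambda\, \Theta(W),
\]

where both ℓ and Θ are allowed to be non-convex in W.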
Global Optimality in Structured Low-rank Matrix Factorization Problems
This work studies various classes of matrix factorization problems. In particular, we study sufficient conditions under which all local minima are guaranteed to be globally optimal, as well as conditions under which a global minimum can be found from any initialization using a local descent strategy. Our conditions require the regularization function to control the size of the factorization via a positively homogeneous function of degree two; an illustrative instance is sketched below. These results were submitted for publication to the IEEE Transactions on Pattern Analysis and Machine Intelligence [1].
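As an illustrative special case (the specific notation is ours, not quoted from [1]), consider factorizing a data matrix Y as U Vᵀ with r columns per factor and a column-wise regularizer:

\[
\min_{r \in \mathbb{N}} \;\; \min_{U \in \mathbb{R}^{m \times r},\, V \in \mathbb{R}^{n \times r}} \; \ell\big(Y, U V^\top\big) + \lambda \sum_{i=1}^{r} \theta(U_i, V_i),
\]

where θ is positively homogeneous of degree two, i.e. θ(αu, αv) = α²θ(u, v) for α ≥ 0. For example, the choice θ(u, v) = ‖u‖₂‖v‖₂ recovers the variational form of the nuclear norm, since ‖X‖₊ = min over U Vᵀ = X of Σᵢ ‖Uᵢ‖₂‖Vᵢ‖₂, while allowing r to vary lets the regularizer control the size (rank) of the factorization.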
Global Optimality in Neural Network Training
This work studies the training problem for certain classes of neural networks. In particular, we study sufficient conditions under which all local minima are guaranteed to be globally optimal, as well as conditions under which a global minimum can be found from any initialization using a local descent strategy. Our conditions require both the network output and the regularization to be positively homogeneous functions of the network parameters, with the regularization designed to control the network size; a sketch of such an objective is given below. Our results apply to networks with one hidden layer, where size is measured by the number of neurons in the hidden layer, and to networks with multiple deep subnetworks connected in parallel, where size is measured by the number of subnetworks. These results were submitted to CVPR 2017 [2], and our paper was accepted for oral presentation (top 2.65% of submissions).
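As a sketch of how the homogeneity assumptions can be met in practice (again with illustrative notation, assuming a network with one hidden layer of r rectified linear units), take

\[
\Phi(X; W^1, W^2) = \sum_{i=1}^{r} \psi\big(X W^1_i\big)\, (W^2_i)^\top, \qquad \psi(z) = \max(z, 0),
\]

with a regularizer such as Θ(W¹, W²) = Σᵢ ‖W¹ᵢ‖‖W²ᵢ‖. Because ψ is positively homogeneous of degree one, both Φ and Θ are positively homogeneous of degree two in each pair of parameters (W¹ᵢ, W²ᵢ), and Θ encourages hidden units to be switched off entirely, so letting r vary allows the optimization to adapt the network size.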
Acknowledgement
Work supported by NSF grant 1618485.
Publications
[1]
B. Haeffele and R. Vidal.
Structured Low-Rank Matrix Factorization: Global Optimality, Algorithms, and Applications.
Submitted for publication to the IEEE Transactions on Pattern Analysis and Machine Intelligence.
[2]
B. Haeffele and R. Vidal.
Global Optimality in Neural Network Training.
Accepted for publication and oral presentation (top 2.65% of submissions) at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.