Scene Understanding using 3D Wireframe Models
Project Summary
Object detection, pose estimation and semantic segmentation are core research problems in the area of computer vision. Even though these problems are solved almost trivially by humans, they have been surprisingly resistant to decades of research. We hope to tackle these fundamental problems using a new class of 3D object models called 3D wireframe models.

A wireframe model is a sparse collection of 3D points, edges and surface normals defined only at a few points on the boundaries of the 3D object. The model is designed so that, when projected onto the image, it resembles a 2D HOG template for the object. It can therefore be easily matched to the image by performing fine-grained 3D pose estimation, which yields a 2D detection as a byproduct. In this project, we aim to design algorithms to (1) learn deformable wireframe models from 2D images and (2) use these models for holistic scene understanding (semantic segmentation with 3D pose and layout estimation). The proposed learning algorithm replaces the 3D reconstruction error with a pose estimation score, yielding a new correspondence-free non-rigid structure-from-motion algorithm. The project also aims to design new top-down energy terms based on 3D wireframe models that combine semantic segmentation, 3D pose and layout in a CRF-based energy, so that these problems can be solved jointly in a principled manner, and to derive optimization strategies that solve the resulting formulations efficiently.
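
To make the idea of projecting a wireframe model onto the image concrete, the following is a minimal Python sketch and not the project's implementation. It assumes a pinhole camera with a hypothetical focal length `focal`, a pose restricted to a rotation about the vertical axis plus a translation, and primitives that carry only a 3D point and a 3D edge direction (surface normals are omitted). The names `WireframePrimitive`, `rotate_y`, `project` and `project_primitive` are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class WireframePrimitive:
    point: tuple      # 3D position (x, y, z) in object coordinates
    direction: tuple  # unit 3D edge direction at that point


def rotate_y(v, angle):
    """Rotate a 3D vector about the y axis by `angle` radians."""
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def project(p, focal):
    """Pinhole projection of a camera-frame point; assumes z > 0."""
    x, y, z = p
    return (focal * x / z, focal * y / z)


def project_primitive(prim, angle, translation, focal, eps=1e-3):
    """Return the 2D location and 2D edge orientation of one primitive.

    The orientation is estimated by projecting a second point a small
    step `eps` along the edge, and is folded into [0, pi) because an
    unsigned orientation is assumed here.
    """
    # Rigid-body transformation: rotate, then translate into the camera frame.
    p = tuple(a + t for a, t in zip(rotate_y(prim.point, angle), translation))
    d = rotate_y(prim.direction, angle)
    q = tuple(a + eps * b for a, b in zip(p, d))
    u0, v0 = project(p, focal)
    u1, v1 = project(q, focal)
    theta = math.atan2(v1 - v0, u1 - u0) % math.pi
    return (u0, v0), theta
```

Each projected primitive then has an image location and an orientation, which is the kind of information a HOG cell also encodes. That is why, under these assumptions, the projected model can be compared against HOG features of the image.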

Applications of this research include autonomous navigation (detection and localization in 3D of vehicles and pedestrians for cars) and robotics (identifying and interacting with objects, locating obstacles and determining room layout for navigation).
Object Localization and Pose Estimation using 3D Wireframe Models
This work [1] introduces a new class of 3D object models, called 3D wireframe models, which allow for efficient 3D object localization and fine-grained 3D pose estimation from a single 2D image. The approach follows the classical paradigm of matching a 3D model to 2D observations. The 3D object model is composed of a set of 3D edge primitives learned from 2D object blueprints, which can be viewed as a 3D generalization of HOG features. This model is used to define a matching score, obtained by applying a rigid-body transformation to the 3D object model, projecting it onto the image plane, and matching the projected model to HOG features extracted from the input image. We also introduce a very efficient branch-and-bound algorithm for finding the 3D pose that maximizes the matching score. For this, 3D integral images of quantized HOGs are employed to evaluate in constant time the maximum attainable matching scores of individual model primitives. Experimental evaluation on three different datasets of cars demonstrates promising results, with testing times that can drop below half a second.
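
The constant-time evaluation mentioned above relies on integral images. The sketch below shows only the generic building block and does not reproduce the bound used in [1]. It assumes the quantized HOG is stored as a volume indexed by (row, column, orientation bin), builds its 3D integral image, and answers a box-sum query with eight lookups regardless of box size. The function names `integral_volume` and `box_sum` are illustrative.

```python
def integral_volume(vol):
    """Build the padded 3D integral image of vol[y][x][b].

    S[i][j][k] holds the sum of vol over rows < i, columns < j, bins < k,
    so S has shape (H + 1, W + 1, B + 1) with a zero border.
    """
    H, W, B = len(vol), len(vol[0]), len(vol[0][0])
    S = [[[0] * (B + 1) for _ in range(W + 1)] for _ in range(H + 1)]
    for y in range(H):
        for x in range(W):
            for b in range(B):
                # Inclusion-exclusion over the seven already-filled neighbours.
                S[y + 1][x + 1][b + 1] = (
                    vol[y][x][b]
                    + S[y][x + 1][b + 1] + S[y + 1][x][b + 1] + S[y + 1][x + 1][b]
                    - S[y][x][b + 1] - S[y][x + 1][b] - S[y + 1][x][b]
                    + S[y][x][b]
                )
    return S


def box_sum(S, y0, y1, x0, x1, b0, b1):
    """Sum of the original volume over the half-open box
    [y0, y1) x [x0, x1) x [b0, b1), using eight lookups."""
    return (
        S[y1][x1][b1]
        - S[y0][x1][b1] - S[y1][x0][b1] - S[y1][x1][b0]
        + S[y0][x0][b1] + S[y0][x1][b0] + S[y1][x0][b0]
        - S[y0][x0][b0]
    )
```

In a branch-and-bound search over pose, a set of candidate poses maps each primitive to a range of image locations and orientation bins. A query of this kind summarizes the HOG response over that whole range at a fixed cost, which is what makes cheap per-primitive bounds plausible. How [1] turns such queries into the maximum attainable score is not specified here.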
Work supported by NSF grant 1527340.
[1] E. Yoruk and R. Vidal.
A 3D wireframe model for efficient object localization and pose estimation
In Workshop on 3D Representation and Recognition at IEEE International Conference on Computer Vision, 2013.
S. Mahendran and R. Vidal.
arXiv 2016
S. Mahendran, H. Ali and R. Vidal.
In Workshop on Deep Learning for Robotic Vision at Conference on Computer Vision and Pattern Recognition, 2017