Camera Sensor Networks
Project Summary
Recent hardware innovations have produced low-power embedded devices (also known as motes) that can be equipped with small cameras and can communicate with neighbouring units over wireless interfaces. These motes can organize themselves into a "Smart Camera Network" and represent an attractive platform for applications at the intersection of sensor networks and computer vision. Flexibility is a key aspect of this technology: networks of motes are easy to deploy and can grow to hundreds of cameras.

Figure: The CITRIC mote.

The main challenge in this setting is that each node alone has limited power and computational capabilities. Standard computer vision algorithms (e.g., Structure from Motion, target tracking, object recognition) are centralized and are not suitable for direct deployment, because they would quickly exhaust the resources of a single node. To perform more complex and demanding tasks, the nodes therefore need to collaborate by means of distributed algorithms. However, standard algorithms from the Sensor Network community (e.g., average consensus) cannot be employed directly either, because of the special structures that arise in computer vision applications.
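For reference, the standard Euclidean averaging consensus iteration mentioned above can be sketched in a few lines of NumPy: at every round each node moves its value toward its neighbours' values, x_i <- x_i + eps * sum_j (x_j - x_i). The ring graph, the step size eps and the function name are illustrative assumptions; the only communication needed is between neighbours, and on a connected undirected graph with eps below 1 / (maximum degree) every node converges to the average of the initial values.

```python
import numpy as np

def average_consensus(x0, neighbours, eps=0.2, iters=200):
    """Euclidean averaging consensus: x_i <- x_i + eps * sum_j (x_j - x_i).

    x0: (n, d) array of initial values, one row per node.
    neighbours: list of neighbour-index lists (undirected, connected graph).
    eps should be smaller than 1 / (maximum degree) for convergence.
    """
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        # Synchronous round: every node uses the values from the previous round.
        x = x + eps * np.array([
            sum(x[j] - x[i] for j in neighbours[i]) for i in range(len(x))
        ])
    return x

# Five nodes on a ring (hypothetical topology), each holding one scalar measurement.
ring = [[(i - 1) % 5, (i + 1) % 5] for i in range(5)]
x0 = np.array([[1.0], [2.0], [3.0], [4.0], [10.0]])
x = average_consensus(x0, ring)
print(x.ravel())  # every entry converges to the initial mean, 4.0
```

Because each edge contributes equal and opposite terms to its two endpoints, the sum of the values is preserved at every round, which is why the common limit is exactly the initial average.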
In our research we aim to extend algorithms from the Sensor Network and Control communities to obtain distributed solutions for Computer Vision applications. We have focused on two particular problems:
  1. Distributed pose averaging: we have developed algorithms that, given individual estimates of the pose of an object at each node, consistently compute their average in a distributed fashion. Our solution differs from previous work by taking into account the non-Euclidean structure of the space of poses. We have also applied our algorithms to face recognition applications.
  2. Camera network localization and calibration: when the fields of view of two cameras intersect, the cameras can estimate their relative pose using two-view epipolar geometry. We propose algorithms that, given noisy pose estimates between pairs of cameras, consistently localize every camera in the network. Moreover, we propose algorithms that also calibrate the internal parameters of the cameras by exploiting the presence of a small number of already calibrated cameras in the network.
Consensus on manifolds
In this line of research our goal is to develop distributed algorithms for averaging quantities that lie on non-Euclidean manifolds. We are particularly interested in the case of pose estimation. We assume that the nodes are localized (i.e., each mote knows its position relative to its neighbours). We also assume that each node can estimate the pose of a common object (e.g., a face).

Figure: A network of cameras looking at the same object

Our goal is then to aggregate these estimates through distributed averaging. In Sensor Networks, consensus algorithms are a popular class of algorithms for computing averages. These techniques are attractive because they require communication only between neighbours and converge under mild network connectivity requirements. Unfortunately, these algorithms are designed for Euclidean quantities. Poses, on the other hand, are represented by a pair (R, T), where R is a rotation and T is a translation. Rotations, in particular, live on the manifold of the special orthogonal group SO(3). This means that the appropriate metric on the manifold (dm in the Figure) has to be used instead of the Euclidean one (de). This metric also induces a different notion of average, the Karcher mean: the rotation that minimizes the sum of squared distances dm to the individual estimates. We have developed algorithms that:

Figure: Difference between the geodesic distance on the manifold and the Euclidean distance.

  • Take into account the appropriate measure of distance and the corresponding concept of Karcher mean in SO(3).
  • Extend consensus algorithms to average quantities in SO(3).
  • Provably converge to the correct average.
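To make the distinction between dm and de concrete, here is a minimal NumPy sketch of the geodesic distance on SO(3) (the rotation angle of R1^T R2), the Euclidean (Frobenius) distance between rotation matrices, and a centralized Karcher mean computed by Riemannian gradient descent. The helper names (exp_so3, log_so3, karcher_mean) are our own; the Karcher mean routine is a standard textbook iteration that assumes the rotations lie in a small enough region for it to converge, and it is not the distributed algorithm described above.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix [w]_x, so that hat(w) @ v == np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_so3(w):
    """Rodrigues' formula: rotation vector (axis * angle) -> rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    K = hat(np.asarray(w) / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def log_so3(R):
    """Inverse of exp_so3 for rotation angles in [0, pi)."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if theta < 1e-12:
        return np.zeros(3)
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return theta / (2.0 * np.sin(theta)) * w

def geodesic_dist(R1, R2):
    """d_m: length of the shortest path on SO(3), i.e. the angle of R1^T R2."""
    return np.linalg.norm(log_so3(R1.T @ R2))

def euclidean_dist(R1, R2):
    """d_e: Frobenius distance between the matrices, ignoring the manifold."""
    return np.linalg.norm(R1 - R2)

def karcher_mean(rotations, iters=100, tol=1e-12):
    """Minimize sum_i d_m(R, R_i)^2 by Riemannian gradient descent."""
    R = rotations[0].copy()
    for _ in range(iters):
        # Mean of the tangent vectors pointing from R toward each R_i.
        v = np.mean([log_so3(R.T @ Ri) for Ri in rotations], axis=0)
        R = R @ exp_so3(v)
        if np.linalg.norm(v) < tol:
            break
    return R

z = np.array([0.0, 0.0, 1.0])
I, Rz90 = np.eye(3), exp_so3(np.pi / 2 * z)
print(geodesic_dist(I, Rz90), euclidean_dist(I, Rz90))  # approximately 1.571 (pi/2) and 2.0

Rs = [exp_so3(a * z) for a in (0.1, 0.2, 0.6)]
print(log_so3(karcher_mean(Rs)))  # approximately [0, 0, 0.3]
M = np.mean(Rs, axis=0)           # naive entrywise average of the matrices
print(np.allclose(M.T @ M, np.eye(3)))  # False: M is not a rotation
```

The last line illustrates why the Euclidean notion of average is not enough: the entrywise mean of rotation matrices is generally not itself a rotation, whereas the Karcher mean always is.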
We have also applied our algorithm to face recognition: each node estimates the pose of the face using Eigenfaces/Tensorfaces, and the individual estimates are then aggregated with our consensus algorithm on SO(3).

Figure: Our algorithm applied to face recognition. Each camera estimates the pose of the face and communicates with its neighbors to aggregate the estimates.
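A minimal sketch of one way to carry the consensus update over to SO(3): each node moves its rotation estimate along the geodesic toward its neighbours' estimates, replacing the Euclidean difference x_j - x_i with the tangent vector log(R_i^T R_j). This is an illustrative Riemannian consensus iteration under our own assumptions (a ring of five cameras, a step size eps, and face-orientation estimates already expressed in a common frame, which the localization assumption above makes possible); it is not necessarily the exact algorithm developed in this project.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix [w]_x."""
    return np.array([[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]])

def exp_so3(w):
    """Rotation vector -> rotation matrix (Rodrigues' formula)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    K = hat(np.asarray(w) / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def log_so3(R):
    """Rotation matrix -> rotation vector (angles in [0, pi))."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if theta < 1e-12:
        return np.zeros(3)
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return theta / (2.0 * np.sin(theta)) * w

def so3_consensus(R0, neighbours, eps=0.2, iters=300):
    """Riemannian consensus: R_i <- R_i exp(eps * sum_j log(R_i^T R_j)).

    Replacing log/exp with subtraction/addition gives back the Euclidean
    averaging consensus update x_i <- x_i + eps * sum_j (x_j - x_i).
    """
    R = [Ri.copy() for Ri in R0]
    for _ in range(iters):
        # Synchronous round: all nodes use the estimates from the previous round.
        R = [R[i] @ exp_so3(eps * sum(log_so3(R[i].T @ R[j]) for j in neighbours[i]))
             for i in range(len(R))]
    return R

rng = np.random.default_rng(0)
ring = [[(i - 1) % 5, (i + 1) % 5] for i in range(5)]
base = exp_so3([0.3, -0.5, 1.0])
# Hypothetical noisy per-camera estimates of the face orientation.
R0 = [base @ exp_so3(rng.normal(scale=0.2, size=3)) for _ in range(5)]
R = so3_consensus(R0, ring)
spread = max(np.linalg.norm(log_so3(Ri.T @ Rj)) for Ri in R for Rj in R)
print(spread)  # close to zero: all nodes hold the same rotation
```

Unlike the Euclidean iteration, this plain version is only shown here to make the nodes agree; in general the common value need not coincide exactly with the Karcher mean of the initial estimates. It does when all rotations share one axis, because the update then reduces to scalar consensus on the rotation angles.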

Distributed localization and calibration of camera networks
The localization problem in a camera sensor network consists of uniquely determining the pose of every camera with respect to a fixed reference frame. More precisely, assume we have a network of n cameras deployed in 3-D. The pose of each camera with respect to a reference frame is defined by its rotation R and its translation T with respect to that frame. We say that the network is localized if there is a set of relative transformations (Rij, Tij) between cameras i and j such that, once the reference frame for node 1 is fixed at (R1, T1), the other poses (Ri, Ti) are uniquely determined. When the fields of view of two cameras intersect, their relative transformation (Rij, Tij) can be estimated using two-view epipolar geometry. However, using these pairwise estimates directly raises two problems:
  1. The estimates may be inconsistent when the whole network is taken into consideration. For instance, the composition of all relative transformations along a loop in the network should give the identity, but with noisy pairwise estimates it generally does not.
  2. The translations are obtained only up to scale. The constraints from the network topology must be exploited to recover these unknown scale factors.
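The following NumPy sketch illustrates both issues with three cameras whose ground-truth poses are made up. Exact relative transformations compose to the identity around the loop 1-2-3-1 (indices are 0-based in the code), a small rotation error on a single edge leaves a residual rotation, and replacing each relative translation by a unit vector, which is all that two-view epipolar geometry provides, breaks loop closure even when the rotations are exact. The conventions (4x4 homogeneous matrices g_i mapping camera to world coordinates, relative transforms g_ij = g_i^{-1} g_j) and all helper names are our own assumptions.

```python
import numpy as np

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def pose(R, T):
    """4x4 homogeneous matrix with rotation R and translation T."""
    g = np.eye(4)
    g[:3, :3], g[:3, 3] = R, T
    return g

def loop_residual(relatives):
    """Compose relative transforms along a loop; return (angle, translation norm)."""
    G = np.eye(4)
    for g in relatives:
        G = G @ g
    angle = np.arccos(np.clip((np.trace(G[:3, :3]) - 1.0) / 2.0, -1.0, 1.0))
    return angle, np.linalg.norm(G[:3, 3])

# Hypothetical ground-truth poses of three cameras.
g = [pose(np.eye(3), [0.0, 0.0, 0.0]),
     pose(rot_z(0.3), [1.0, 0.0, 0.0]),
     pose(rot_z(-0.2), [0.5, 2.0, 0.0])]
rel = {(i, j): np.linalg.inv(g[i]) @ g[j] for i in range(3) for j in range(3)}
loop = [rel[(0, 1)], rel[(1, 2)], rel[(2, 0)]]
print(loop_residual(loop))   # exact measurements: both residuals essentially zero

# A small rotation error on one edge breaks loop closure.
noisy = [pose(loop[0][:3, :3] @ rot_x(0.05), loop[0][:3, 3])] + loop[1:]
print(loop_residual(noisy))  # rotation residual of about 0.05 rad

# Unit-norm translations (scale unknown): the loop no longer closes.
unit = [pose(h[:3, :3], h[:3, 3] / np.linalg.norm(h[:3, 3])) for h in loop]
print(loop_residual(unit))   # rotation residual ~0, translation residual about 0.51
```

In the last case the rotations are perfect, so only the missing scale factors can explain the gap; recovering them is exactly where the loop constraints of the network topology come in.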
Previous work is either not applicable in this setting (typically, only range measurements have been considered) or does not correctly exploit the manifold structure of SE(3). In our work we developed a distributed algorithm that, given the pairwise measurements, finds a complete, consistent localization of the network. Moreover, our algorithm is optimal with respect to the appropriate metric on SE(3).
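For the rotation part of the problem, here is a minimal sketch of a distributed scheme in this spirit. Every camera except an anchor (playing the role of node 1, whose reference frame is fixed) repeatedly adjusts its rotation toward what its neighbours' current estimates and the pairwise measurements predict; this is Riemannian gradient descent on the sum of squared geodesic errors, and with all measured relative rotations equal to the identity it reduces to a consensus update on SO(3). This is our own illustrative construction (ring topology, step size eps, noise-free measurements, hypothetical function names), not the algorithm of this project: it ignores translations and scales, and it only converges from a reasonable initialization.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix [w]_x."""
    return np.array([[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]])

def exp_so3(w):
    """Rotation vector -> rotation matrix (Rodrigues' formula)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    K = hat(np.asarray(w) / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def log_so3(R):
    """Rotation matrix -> rotation vector (angles in [0, pi))."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if theta < 1e-12:
        return np.zeros(3)
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return theta / (2.0 * np.sin(theta)) * w

def localize_rotations(R_init, R_rel, neighbours, anchor=0, eps=0.1, iters=1000):
    """Distributed gradient descent on sum over edges of d_m(R_i R_ij, R_j)^2.

    R_rel[(i, j)] is a measurement of R_i^T R_j, stored for both orientations
    of every edge. The anchor keeps its rotation fixed (removes the global gauge).
    """
    R = [Ri.copy() for Ri in R_init]
    for _ in range(iters):
        new_R = []
        for i in range(len(R)):
            if i == anchor:
                new_R.append(R[i])
                continue
            # Neighbour j predicts R_i as R_j R_ij^T; step toward those predictions.
            v = sum(log_so3(R[i].T @ R[j] @ R_rel[(i, j)].T) for j in neighbours[i])
            new_R.append(R[i] @ exp_so3(eps * v))
        R = new_R
    return R

rng = np.random.default_rng(1)
n = 5
ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
R_true = [exp_so3(rng.normal(size=3)) for _ in range(n)]
R_rel = {(i, j): R_true[i].T @ R_true[j] for i in range(n) for j in ring[i]}
# Perturbed initial guesses; the anchor (node 0) starts at its true rotation.
R_init = [R_true[0]] + [Rt @ exp_so3(rng.normal(scale=0.1, size=3)) for Rt in R_true[1:]]
R_est = localize_rotations(R_init, R_rel, ring)
errors = [np.linalg.norm(log_so3(Re.T @ Rt)) for Re, Rt in zip(R_est, R_true)]
print(max(errors))  # tiny: the true rotations are recovered from exact measurements
```

Each update uses only the node's own estimate, its neighbours' estimates and the measurements on its incident edges, so the scheme needs no central coordinator.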

Figure: A pictorial representation of a camera setup.

Distributed Calibration of Camera Sensor Networks
Calibration of a single camera has been widely studied in the computer vision literature. Such methods typically require the user to show a calibration rig to the camera. These methods do not scale well to large camera sensor networks, because they would require manual calibration of each camera. Self-calibration methods, on the other hand, calibrate the cameras automatically by solving nonlinear equations such as Kruppa's equations. While elegant, these methods suffer from the fact that solving Kruppa's equations is numerically ill-conditioned. Moreover, they assume that all cameras share the same intrinsic parameters, so they are not readily applicable to camera sensor networks, where each camera can have different intrinsic parameters. We show that the problem of automatically calibrating a large number of cameras in a sensor network can be solved in a distributed way by solving a set of linear equations, under the mild assumption that only one of the cameras is calibrated in advance.
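The reason a calibrated neighbour helps can be seen from a standard two-view fact, sketched here with made-up numbers: if F is the fundamental matrix between a calibrated camera 1 (intrinsics K1) and a camera 2, then K2^T F K1 is an essential matrix, with two equal nonzero singular values, when K2 is the true intrinsic matrix of camera 2, so a wrong guess of K2 typically violates this condition. The sketch only checks the constraint for a correct and a wrong focal length; it is not the linear, distributed calibration method described above, and the helper names are hypothetical.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x."""
    return np.array([[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def essential_sv_ratio(F, K1, K2):
    """Ratio of the two largest singular values of K2^T F K1 (1.0 for an essential matrix)."""
    s = np.linalg.svd(K2.T @ F @ K1, compute_uv=False)
    return s[1] / s[0]

# Hypothetical setup: X2 = R X1 + T, and x2^T F x1 = 0 in pixel coordinates.
R, T = rot_y(0.2), np.array([1.0, 0.2, 0.1])
K1 = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])  # calibrated camera 1
K2 = np.array([[600.0, 0, 300], [0, 600, 200], [0, 0, 1]])  # true intrinsics of camera 2
E = skew(T) @ R                                             # essential matrix
F = np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)             # fundamental matrix

K2_wrong = K2.copy()
K2_wrong[0, 0] = K2_wrong[1, 1] = 900.0                     # wrong focal length guess
r_true = essential_sv_ratio(F, K1, K2)
r_wrong = essential_sv_ratio(F, K1, K2_wrong)
print(r_true)   # 1.0 up to rounding
print(r_wrong)  # about 0.67
```

Each such pairwise constraint involves only two neighbouring cameras, which is what makes it natural to propagate calibration outward from the calibrated camera through the network.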
Once the cameras' intrinsic and extrinsic parameters are known, the next problem is to recover the 3D structure of the scene (the triangulation problem). Solving the multiple-view triangulation problem is difficult in a sensor network setup: while each pair of motes can easily compute the 3D structure of the scene via linear triangulation, the estimates from different pairs of motes may disagree, especially in real situations where the image data are noisy. We propose a method based on distributed consensus algorithms for estimating the 3D structure of a scene in a distributed way. We show that all the motes compute the same 3D structure, even though each of them communicates with only a few neighbors in the network. The method requires that all the cameras observe the same scene and that the graph over which the nodes communicate be connected.
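One simple way to realize this idea is sketched below under our own assumptions: four hypothetical cameras on a line looking at a single 3D point, known projection matrices, 0.2-pixel Gaussian image noise and a ring communication graph. Each mote first triangulates the point together with one neighbour using linear (DLT) triangulation; the motes then run averaging consensus on these local estimates so that all of them end up with the same 3D point. This is an illustrative stand-in rather than the method described above, which may combine the measurements differently.

```python
import numpy as np

def project(P, X):
    """Pinhole projection of a 3D point to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) two-view triangulation: null vector of a 4x4 system."""
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    X = np.linalg.svd(A)[2][-1]  # right singular vector of the smallest singular value
    return X[:3] / X[3]

def average_consensus(x0, neighbours, eps=0.3, iters=200):
    """Euclidean averaging consensus; eps < 1 / (maximum degree)."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x = x + eps * np.array([sum(x[j] - x[i] for j in neighbours[i])
                                for i in range(len(x))])
    return x

rng = np.random.default_rng(2)
n = 4
K = np.array([[500.0, 0, 320], [0, 500, 240], [0, 0, 1]])
# Hypothetical rig: camera centres on the x-axis, all looking down +z (R = I, t = -C).
centers = [np.array([i - 1.5, 0.0, 0.0]) for i in range(n)]
P = [K @ np.hstack([np.eye(3), -c[:, None]]) for c in centers]
X_true = np.array([0.2, -0.1, 5.0])
obs = [project(Pi, X_true) + rng.normal(scale=0.2, size=2) for Pi in P]

ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
# Step 1: each mote triangulates with its next neighbour; the estimates differ.
local = [triangulate_dlt(P[i], P[(i + 1) % n], obs[i], obs[(i + 1) % n])
         for i in range(n)]
spread = np.max(local, axis=0) - np.min(local, axis=0)
# Step 2: averaging consensus over the ring makes all motes agree.
agreed = average_consensus(local, ring)
print(spread, agreed[0])
```

Since averaging consensus preserves the mean, every mote ends up with the average of the pairwise estimates, and the only requirement on the communication graph is that it be connected.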
Publications
[1] R. Tron, R. Vidal and A. Terzis. International Conference on Distributed Smart Cameras, 2008.
[2] R. Tron and R. Vidal. Distributed Face Recognition via Consensus on SE(3). Workshop on Omnidirectional Vision, 2008.
[3] E. Elhamifar and R. Vidal. Distributed Calibration of Camera Sensor Networks. International Conference on Distributed Smart Cameras, 2009.
[4] R. Tron and R. Vidal. Distributed 3-D Localization in Camera Networks. Conference on Decision and Control, 2009.
This work was supported by the WSE/APL Contract: Information Fusion and Localization in Distributed Sensor Systems, and by NSF grant CSR-0834470.