Computer Vision for Rehabilitation Therapy
Project Summary
GEAR (Grounded Early Adaptive Rehabilitation) is a collaborative research effort between the University of Delaware and Johns Hopkins University that brings together robotics engineers, cognitive scientists, and physical therapists to design new rehabilitation environments and methods for young children with mobility disorders. The envisioned pediatric rehabilitation environment consists of three components: a portable harness system that partially compensates for body weight and facilitates the children’s mobility within a 10 x 10 foot area; a small humanoid robot that interacts socially with the children, engaging them in games designed to keep them at particular levels of physical activity; and a network of cameras that captures and identifies motion in the environment and informs the robot, so that the robot can adjust its behavior to that of the child.

The realization of this system presents unique new research challenges in the fields of pediatric rehabilitation, robot control, machine vision, and computational learning. One of them is to develop activity recognition methods, which are essential for facilitating child-robot interaction. Our team aims to develop highly interpretable, structured representations and models of children’s movements that capture spatial and temporal relationships among moving body parts, actions, and activities, and that can be learned automatically from multimodal time-series data.

Activity Models
We have been working on the development of a library of activity models that are specifically designed for children (Aim 2A). As a first step toward Aim 2A, we have used existing datasets such as MSR Action 3D, MSR DailyActivity3D, and Berkeley MHAD, which were collected from adults performing various activities, e.g., hand waving, clapping, jumping, and drinking. In particular, we have been developing so-called "moving poselets" [1]: a library of movements, each associated with a specific body part configuration (e.g., hand moving forward). We used motion capture data from body parts to learn a library of moving poselets, together with activity classifiers based on them. This work was published in the ChaLearn Looking at People Workshop at the International Conference on Computer Vision (2015) [1]. More recently, we have been extending this work to video data. In particular, we have been developing a spatiotemporal convolutional neural network model for predicting fine-grained activities that can be decomposed into a sequence of actions. This work will be presented at the European Conference on Computer Vision (2016) [2].
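To make the idea of a movement tied to a body part configuration more concrete, the following is a minimal sketch of how such a detector could be scored on motion capture data. It is an illustration under stated assumptions, not the implementation from [1]: the descriptor layout (part-relative joint positions plus frame-to-frame velocities over a short window), the max-over-time pooling, and the names `window_descriptor` and `poselet_response` are all hypothetical, and in practice the filter weights would be learned from data rather than set by hand.

```python
from typing import List, Sequence

# One frame of motion capture: a list of joints, each given as [x, y, z].
Frame = List[List[float]]


def window_descriptor(frames: Sequence[Frame], joint_ids: Sequence[int],
                      start: int, length: int) -> List[float]:
    """Describe one body part over a short temporal window.

    Assumed layout (hypothetical): joint positions relative to the part's
    first joint for every frame in the window, followed by frame-to-frame
    velocities of the same joints.
    """
    desc: List[float] = []
    # Pose: positions of the part's joints relative to its first joint.
    for t in range(start, start + length):
        root = frames[t][joint_ids[0]]
        for j in joint_ids:
            desc.extend(p - r for p, r in zip(frames[t][j], root))
    # Motion: finite-difference velocities between consecutive frames.
    for t in range(start + 1, start + length):
        for j in joint_ids:
            desc.extend(c - p for c, p in zip(frames[t][j], frames[t - 1][j]))
    return desc


def poselet_response(frames: Sequence[Frame], joint_ids: Sequence[int],
                     weights: Sequence[float], length: int) -> float:
    """Score a linear filter on every window and keep the maximum over time."""
    best = float("-inf")
    for start in range(len(frames) - length + 1):
        desc = window_descriptor(frames, joint_ids, start, length)
        score = sum(w * x for w, x in zip(weights, desc))
        best = max(best, score)
    return best


# Toy sequence: joint 0 (shoulder) is fixed, joint 1 (hand) moves along x.
moving = [[[0.0, 0.0, 0.0], [0.1 * t, 0.0, 0.0]] for t in range(5)]

# Hand-set filter that responds only to the hand's x-velocity.
# With 2 joints and a 2-frame window the descriptor has 12 position
# entries followed by 6 velocity entries; index 15 is the hand's x-velocity.
hand_forward = [0.0] * 18
hand_forward[15] = 1.0

response = poselet_response(moving, joint_ids=[0, 1],
                            weights=hand_forward, length=2)
```

In this sketch, an activity classifier would take the vector of such responses, one per detector in the library, as its input features. Because each feature corresponds to a named body part and movement, the resulting model stays easy to interpret.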
R. Vidal, E. Mavroudi, L. Tao.
Work supported by NIH grant R01HD87133-01.
References
[1] L. Tao and R. Vidal. Moving Poselets: A Discriminative and Interpretable Skeletal Motion Representation for Action Recognition. In ChaLearn Looking at People Workshop, International Conference on Computer Vision, December 2015.
[2] C. Lea, A. Reiter, R. Vidal. Segmental Spatio-Temporal CNNs for Fine-grained Action Segmentation and Classification. In European Conference on Computer Vision, October 2016.