Computer Vision for Rehabilitation Therapy
GEAR (Grounded Early Adaptive Rehabilitation) is a collaborative research effort between the University
of Delaware, the University of California, Riverside, and Johns Hopkins University that brings together robotics
engineers, cognitive scientists, and physical therapists to design new rehabilitation
environments and methods for young children with mobility disorders. The envisioned pediatric
rehabilitation environment consists of three components: a portable harness system intended to partially
compensate for body weight and facilitate the children's mobility within a 10 x 10 feet area; a small
humanoid robot that socially interacts with the children, engaging them in games designed to maintain
particular levels of physical activity; and a network of cameras that captures and identifies the motion in
the environment and informs the robot, so that the latter can adjust its behavior to that of the child.
The realization of this system presents new research challenges in the fields of pediatric
rehabilitation, robot control, machine vision, and computational learning. One of them is the development of
activity recognition methods, which are essential for facilitating child-robot interaction. Our team
aims to develop highly interpretable, structured representations and models of children's movements that
capture spatial and temporal relationships among moving body parts, actions, and activities, and that can be
automatically learned from multimodal time-series data.
We have been working on the development of a library of activity models specifically designed
for children. As a first step toward this goal, we used datasets such as MSR Action3D, MSR
DailyActivity3D, and Berkeley MHAD, which were collected from adults performing various activities, e.g.,
hand waving, clapping, jumping, and drinking. In particular, we have developed so-called
"moving poselets", a library of movements associated with specific body-part configurations
(e.g., a hand moving forward). We used motion capture data from body parts to learn a library of moving
poselets, as well as activity classifiers based on them. This work was published in the ChaLearn
Looking at People Workshop at the International Conference on Computer Vision (2015). We then
extended this work to video data by developing a spatiotemporal convolutional neural network model for
predicting fine-grained activities that can be decomposed as a sequence of actions. This work was
presented at the European Conference on Computer Vision (2016).
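The structure of the moving-poselets representation can be conveyed with a toy sketch. Everything below is an illustrative assumption, not the published pipeline: synthetic motion data stands in for motion capture, a minimal pure-NumPy k-means stands in for the dictionary learning, and a nearest-class-mean classifier stands in for the discriminative activity classifiers. What the sketch does show is the overall recipe: cluster short windows of body-part motion into a dictionary of movement prototypes, encode each sequence as a histogram of prototype assignments, and classify activities from those histograms.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_sequence(label, T=40, D=6):
    # Toy stand-in for motion-capture input: T frames x D joint coordinates,
    # with class-dependent amplitude so the two "activities" are separable.
    base = np.sin(np.linspace(0, 3, T))[:, None] * (label + 1)
    return base + 0.1 * rng.standard_normal((T, D))

def segments(seq, win=10, step=5):
    # Short sliding windows of body-part motion: candidate "moving poselets".
    return np.stack([seq[s:s + win].ravel()
                     for s in range(0, len(seq) - win + 1, step)])

def kmeans(X, k, iters=20):
    # Minimal k-means to learn a dictionary of movement prototypes.
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        ids = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(ids == j):
                centers[j] = X[ids == j].mean(axis=0)
    return centers

def encode(seq, centers):
    # Histogram of poselet assignments = sequence-level representation.
    ids = np.argmin(((segments(seq)[:, None] - centers) ** 2).sum(-1), axis=1)
    return np.bincount(ids, minlength=len(centers)) / len(ids)

train = [(make_sequence(c), c) for c in (0, 1) for _ in range(20)]
centers = kmeans(np.vstack([segments(s) for s, _ in train]), k=8)

# Nearest-class-mean classifier on poselet histograms (a stand-in for the
# discriminative classifiers used in the actual work).
X = np.stack([encode(s, centers) for s, _ in train])
y = np.array([c for _, c in train])
means = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(seq):
    h = encode(seq, centers)
    return int(np.argmin(((h - means) ** 2).sum(-1)))

test = [(make_sequence(c), c) for c in (0, 1) for _ in range(5)]
acc = np.mean([predict(s) == c for s, c in test])
```

In the published work, the poselet dictionary and the activity classifiers are learned jointly and discriminatively per body part; the sketch only conveys the dictionary-plus-histogram structure of the representation.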
Multiview Action Classification
Our team from the University of Delaware designed the envisioned pediatric rehabilitation environment
and acquired multi-camera data of infants (7 to 24 months old) performing actions in it.
In these scenes, robots and adults are present alongside the infants, with the infant being one
of the smallest actors in the scene. Moreover, the setup is challenging because the infants are often
occluded by other actors or elements in the scene, so the information from a given camera is not
always useful for action classification purposes. From this multiview data we aim to classify the main motor
actions seen in infant development (crawling, sitting, standing, and walking). We first addressed
this problem with a multiple-instance learning SVM scheme (MI-SVM), which treats the views as instances of the
same sample and accounts for the fact that the action might not be observed in all of them. This work was published
in the Journal of Neuroengineering and Rehabilitation. More recently, we have been working on addressing
the challenges imposed by the complexity of the scene by using local features from spatial regions of
interest in a detection-based multiview action classification scheme. We propose to leverage deep
networks for feature extraction and classification, while introducing learnable fusion coefficients to weigh
the importance of each view in the final prediction. This work is currently under review at an
international refereed conference.
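The difference between the two fusion strategies can be sketched in a few lines. The toy below is an illustrative assumption (synthetic per-view scores and hand-set fusion coefficients), not our actual system: it contrasts a multiple-instance-style maximum over views, where the action only needs to be visible in some view, with a convex combination of views using softmax-normalized coefficients, which in the real system would be learned jointly with the deep network.

```python
import numpy as np

rng = np.random.default_rng(1)

V, C = 4, 3  # number of camera views, number of action classes

def sample_scores(true_class, informative=(0, 2)):
    # Toy per-view class scores for one sample: in a cluttered scene some
    # views are uninformative (the infant is occluded), so only a subset of
    # views carries the true signal.
    s = rng.standard_normal((V, C))
    for v in informative:
        s[v, true_class] += 4.0
    return s

def fuse_max(scores):
    # MI-style fusion: treat the views as instances of one bag and take the
    # per-class maximum over views (the action need not be visible in all).
    return scores.max(axis=0)

def fuse_weighted(scores, alpha_logits):
    # Learnable fusion: convex combination of views with softmax-normalized
    # coefficients (hand-set here to favour the informative views; in the
    # real system they would be learned end-to-end).
    w = np.exp(alpha_logits) / np.exp(alpha_logits).sum()
    return w @ scores

alpha = np.array([2.0, -2.0, 2.0, -2.0])
trials = 200
acc_max = acc_w = 0.0
for _ in range(trials):
    c = int(rng.integers(C))
    s = sample_scores(c)
    acc_max += np.argmax(fuse_max(s)) == c
    acc_w += np.argmax(fuse_weighted(s, alpha)) == c
acc_max /= trials
acc_w /= trials
```

Both rules cope with occluded views: the max ignores uninformative views outright, while the learned coefficients can down-weight them softly and keep a gradient path for training.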
R. Vidal, E. Mavroudi, C. Pacheco, L. Tao.
Work supported by NIH grant R01HD87133-01.
Moving Poselets: A Discriminative and Interpretable Skeletal Motion Representation for Action Recognition
In Chalearn Looking at People Workshop, International Conference on Computer Vision, December 2015.
Segmental Spatio-Temporal CNNs for Fine-grained Action Segmentation and Classification
In European Conference on Computer Vision, October 2016.
Gearing Smart Environments for Pediatric Motor Rehabilitation
Journal of Neuroengineering and Rehabilitation, vol. 17, no. 1, 2020.
A Detection-based Approach to Multiview Action Classification in Infants