The Shape of Motion

We have developed a novel vision system that can recognize people by the way they walk. The system computes optical flow for an image sequence of a person walking, and then characterizes the shape of the motion with a set of sinusoidally-varying scalars. Feature vectors composed of the phases of the sinusoids are able to discriminate among people.

Overview

Our goal is to describe the motion of a moving human figure in order to recognize individuals by variation in the characteristics of the motion description. We begin with a short sequence of images of a moving figure, taken by a static camera, and derive dense optical flow data, (u(x,y),v(x,y)), for the sequence. We determine a range of scale-independent scalar features of each flow image that characterize the spatial distribution of the flow, i.e., the shape of the motion. The scalars are based on various moments of the set of moving points. To characterize the shape of the motion, not the shape of moving points, the points are weighted by |u|, |v|, or |(u,v)|.
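
As a concrete illustration, the scalars for a single flow frame can be computed along the following lines. This is only a sketch in Python/NumPy, not the code used in the system; the magnitude threshold and the use of second central moments for the aspect ratio are assumptions made for the example.

    import numpy as np

    def shape_scalars(u, v, mag_thresh=0.5):
        """Illustrative flow-shape scalars for one frame of (u, v) flow.

        Pixels whose flow magnitude exceeds mag_thresh count as "moving"
        (the threshold is an assumption for this sketch).
        """
        mag = np.hypot(u, v)
        ys, xs = np.nonzero(mag > mag_thresh)      # coordinates of moving points
        if xs.size < 2:
            return None                            # not enough motion in the frame
        w = mag[ys, xs]                            # |(u,v)| weights

        def centroid_and_aspect(weights):
            cx = np.average(xs, weights=weights)
            cy = np.average(ys, weights=weights)
            # second central moments -> principal axes -> aspect ratio
            cov = np.cov(np.vstack([xs, ys]), aweights=weights)
            evals = np.linalg.eigvalsh(cov)        # eigenvalues in ascending order
            return cx, cy, np.sqrt(evals[1] / max(evals[0], 1e-9))

        centx, centy, aspct = centroid_and_aspect(np.ones_like(w))
        wcentx, wcenty, waspct = centroid_and_aspect(w)
        return {"centx": centx, "centy": centy, "aspct": aspct,
                "wcentx": wcentx, "wcenty": wcenty, "waspct": waspct}

The |u|- and |v|-weighted variants (uwcentx, vwcenty, and so on) follow the same pattern, with np.abs(u) or np.abs(v) in place of the magnitude.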

We then analyze the periodic structure of these sequences of scalars. All of these sequences share the same fundamental period of the gait, but they differ in phase. Although there are several regularities in the relative phases of the signals, some of the phases show significant statistical variation. Therefore, we are able to use vectors of phase measurements derived for each image sequence to recognize individuals by the shape of their motion.

The representation is model-free, and therefore could be used to characterize the motion of other non-rigid bodies.

System Description


Data-Flow Diagram

As shown in the data-flow diagram (above), the steps in the system are, from top to bottom:

  1. The system begins with a motion sequence of n+1 frames of a person walking.
  2. We compute the optical flow (Bulthoff, Little, and Poggio - source code) of the motion sequence to get n frames of (u,v) data, where u is the x-direction flow and v is the y-direction flow.

    Flow images: u, v, and magnitude
    In the u and v images, red indicates negative flow while green indicates positive. To save computation time, we manually tracked each subject and computed flow only in a box surrounding the person. You can see artifacts from this box in the MPEG images of the flow sequences.
  3. Spurious motion in the image background can confound recognition; the aspect ratio signals are particularly sensitive to outliers. To eliminate as many outliers as possible, we compute connected components of the moving regions and discard the small components, leaving moving "blobs" with less spurious motion (a sketch of this step appears below, after the feature vector plot).

    blobs for sample sequence.
  4. For each frame of the flow, we compute a set of scalars (see source code) that characterize the shape of the flow in that frame. Examples include x and y coordinates of the moving region (centx and centy), and the aspect ratio of the moving region (aspct).
    View the cropped motion sequence with the position of (centx,centy) shown as a "+", the |(u,v)|-weighted centroid (wcentx,wcenty) shown as a box, a solid ellipse corresponding to aspct, and a dashed ellipse corresponding to the aspect ratio of the |(u,v)|-weighted moving region, waspct.

    Scalar sequence
    Note how the weighted ellipse (particularly sensitive to fast-moving parts of the body) follows the motion of the feet as they accelerate in the gait.
  5. We rearrange the scalars to form one time series for each scalar.
    Raw Time Series for centx, centy, aspct
    centx, centy, aspct with Linear Background Removed
  6. The time series are sinusoidal and have a common frequency. We use least-squares linear prediction spectrum analysis (Barrodale and Erikson's method - source code) to find the fundamental frequency and phase for each series. A Matlab example that demonstrates the advantages of this method is available; a simplified frequency-and-phase fit is also sketched at the end of this section.
    Frequency Spectrum of aspct (20 coefficients)
    aspct with Sinusoid of Fitted Frequency and Phase
  7. Phase values are arbitrary and depend on the point in the gait at which the image sequence begins. We select a single signal to be a reference for the others. Phases, with the reference phase subtracted, are features.
  8. Selected features combine to form a feature vector for recognition. The plot shows the average feature values and the maximum and minimum values for each subject in our experiment.

Feature Vector Plot
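
The outlier removal in step 3 can be sketched with a standard connected-components pass; here SciPy's labelling is used as a stand-in, and the magnitude threshold and minimum blob size are illustrative values rather than the parameters of the original system.

    import numpy as np
    from scipy import ndimage

    def moving_blobs(u, v, mag_thresh=0.5, min_size=50):
        """Keep only large connected regions of moving pixels (a sketch)."""
        moving = np.hypot(u, v) > mag_thresh                  # binary "moving" mask
        labels, n = ndimage.label(moving)                     # 4-connected components
        sizes = ndimage.sum(moving, labels, index=np.arange(1, n + 1))
        keep_labels = 1 + np.flatnonzero(sizes >= min_size)   # labels of big blobs
        return np.isin(labels, keep_labels)                   # mask of retained pixels

The shape scalars of step 4 are then computed only over the retained pixels.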

The entire process is controlled from a shell script that executes programs to compute the optical flow, calculate the scalar shape descriptions, and analyze the signals for frequency and phase.
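
For steps 6 and 7 the system uses Barrodale and Erikson's least-squares linear prediction method; the sketch below is a much simpler stand-in that picks the dominant FFT bin and then fits a sinusoid at that frequency by least squares. The 30 Hz frame rate and the fitting approach are assumptions made for the example, not the method used in the system.

    import numpy as np

    def frequency_and_phase(x, fs=30.0):
        """Dominant frequency and phase of a scalar time series (a sketch)."""
        x = np.asarray(x, dtype=float)
        x = x - x.mean()                               # remove the DC offset
        spectrum = np.abs(np.fft.rfft(x))
        freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
        f0 = freqs[1 + np.argmax(spectrum[1:])]        # skip the DC bin
        # least-squares fit of a*cos(2*pi*f0*t) + b*sin(2*pi*f0*t)
        t = np.arange(x.size) / fs
        A = np.column_stack([np.cos(2 * np.pi * f0 * t),
                             np.sin(2 * np.pi * f0 * t)])
        (a, b), *_ = np.linalg.lstsq(A, x, rcond=None)
        return f0, np.arctan2(-b, a)                   # x ~ r*cos(2*pi*f0*t + phase)

    def relative_phase(phase, reference_phase):
        """Phase relative to the reference signal, wrapped to (-pi, pi]."""
        d = phase - reference_phase
        return np.arctan2(np.sin(d), np.cos(d))

Here relative_phase corresponds to step 7: the phase of the reference signal (centy) is subtracted and the difference is wrapped, so the feature does not depend on where in the gait cycle the sequence begins.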

Experiment

To verify that the system is capable of recognition, we sampled the gaits of six people using the apparatus depicted below.


Experimental Apparatus

A camera fixed on a tripod points toward a static, non-reflecting background. Subjects walk a circular path, passing through the camera's field of view on one side of the path and behind the camera on the other. Only one subject is in the field of view at any one time. Each subject walks this path for about 15 minutes while the images are recorded on videotape.

Later, we digitize sequences for the six subjects. We discard the first two or three passes for each person and digitize seven sequences for each subject (42 sequences total).

Images from the tape are digitized in 24-bit color at a resolution of 640 by 480 pixels. We resample and crop the images to obtain black-and-white images of 320 by 160 pixels.

Click to view an example sequence for each of the six subjects:


  • Person #1

  • Person #2

  • Person #3

  • Person #4

  • Person #5

  • Person #6

The complete data set.

Results

We analyzed the following scalars and their phase features:

  1. centx: x coordinate of centroid of moving region
  2. wcentx: x coordinate of centroid of moving region weighted by |(u,v)|
  3. wcenty: y coordinate of centroid of moving region weighted by |(u,v)|
  4. dcentx: wcentx - centx
  5. dcenty: wcenty - centy
  6. aspct: aspect ratio of moving region
  7. waspct: aspect ratio of moving region weighted by |(u,v)|
  8. daspct: aspct - waspct
  9. uwcentx: x coordinate of centroid of moving region weighted by |u|
  10. uwcenty: y coordinate of centroid of moving region weighted by |u|
  11. vwcentx: x coordinate of centroid of moving region weighted by |v|
  12. vwcenty: y coordinate of centroid of moving region weighted by |v|

centy (y coordinate of the centroid of moving region) is used only as the phase reference signal.

Analysis of variance, including post hoc testing, indicated the following:

  • All features showed significant variations between people.
  • The features showing the greatest variation were aspct, dcenty, wcenty, and vwcenty.
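
A minimal version of the per-feature test can be written with SciPy's one-way ANOVA. The data layout below (for each feature, one array of phase values per subject) is assumed for the illustration, and the post hoc tests are not shown.

    from scipy import stats

    def anova_by_feature(features):
        """One-way ANOVA per phase feature across subjects (a sketch).

        features maps a feature name to a list of per-subject arrays,
        each holding that subject's phase values over the seven sequences.
        """
        return {name: stats.f_oneway(*groups) for name, groups in features.items()}

A small p-value for a feature indicates that its mean differs significantly between subjects.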

The following scatter plots show three of these features, centx, wcentx, and aspct:


Scatter Plot of aspct versus centx

Stereo 3-D Scatter Plot of aspct versus centx and wcentx (right and left views)

We tested recognition using a variety of algorithms. The best results were obtained by finding the nearest neighbor among a set of exemplars, the vectors of mean feature values for each subject. To get an unbiased estimate of the recognition rate, we used a leave-one-out procedure. Using the full feature vector gave a recognition rate of about 90%; using only the four best features, daspct, dcenty, vwcenty, and waspct, raised the rate to as high as 95%. By comparison, recognition by chance would yield a rate of about 17%.
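
A sketch of this test, assuming the 42 phase feature vectors are stacked in an array with one subject label per row; Euclidean distance is used here because the metric is not specified above, and phase wrap-around is ignored for simplicity.

    import numpy as np

    def leave_one_out_rate(X, labels):
        """Leave-one-out recognition rate with nearest-exemplar classification.

        X: (n_sequences, n_features) phase feature vectors.
        labels: subject label for each sequence.
        Exemplars are per-subject mean feature vectors, recomputed with the
        test sequence held out.
        """
        X, labels = np.asarray(X, dtype=float), np.asarray(labels)
        correct = 0
        for i in range(len(X)):
            train = np.arange(len(X)) != i
            subjects = np.unique(labels[train])
            exemplars = np.array([X[train & (labels == s)].mean(axis=0)
                                  for s in subjects])
            nearest = subjects[np.argmin(np.linalg.norm(exemplars - X[i], axis=1))]
            correct += int(nearest == labels[i])
        return correct / len(X)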

Recognition was possible for a variety of flow sources varying in spatial resolution. Although the exemplars and the features showing the greatest variation changed across sources, recognition remained successful as long as the parameters of the flow computation were kept constant.

Analysis of variance and our recognition test show that the features have the following approximate order of significance for recognition: daspct > dcenty > wcenty > vwcenty > aspct > vwcentx > waspct > centx > uwcentx > wcentx > uwcenty > dcentx
