Tracking Humans in Videos

Faculty: A. Goshtasby

Student: Y. Li

Sponsor: Information Technology Research Institute

Abstract: This video understanding project aims to index and retrieve videos based on their content. First, human faces and other exposed body parts are detected and tracked. Then other moving image regions are detected and tracked. Finally, it is determined which moving regions belong to which individuals in a video, and whether a moving region corresponds to a face, hand, leg, etc. The objective is to characterize different human activities in a video by understanding the motion of different body parts. The following demonstrates the progress made so far in this project.

Example: Suppose a video clip is given. We first detect faces in the first frame and then track the faces in subsequent frames using color and shape information of the detected faces. The program to detect faces in the first image frame has been described elsewhere.

A video clip.


To characterize the motion of a region, the coordinates of the center of the region are plotted as a function of time (frame number). In the above example, the center of the region corresponds to the tip of the nose of the detected face. The track of the tip of the nose of the person over the 30 frames of the video clip is shown below. These curves describe the kind of motion exhibited by the head of the individual. By training our system to recognize motions from curves like these, the objective is to recognize the motions of different parts of the body and, by combining those motions, to infer different human activities in a video.

Motion of the tip of the nose of the person in the video clip.

Left: The vertical motion. Right: The horizontal motion.
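The motion curves described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's program: each tracked region is assumed to be given as a list of (row, column) pixel coordinates, and the region center is taken to be the mean of those coordinates. The names region_center and motion_signature are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]  # (row, column) of one pixel in a tracked region


def region_center(pixels: Sequence[Point]) -> Tuple[float, float]:
    """Return the mean (row, column) of the pixels in one region.

    Assumption: the region center is the centroid of its pixels.
    """
    if not pixels:
        raise ValueError("region has no pixels")
    count = len(pixels)
    mean_row = sum(row for row, _ in pixels) / count
    mean_col = sum(col for _, col in pixels) / count
    return mean_row, mean_col


def motion_signature(
    regions_per_frame: Sequence[Sequence[Point]],
) -> Tuple[List[float], List[float]]:
    """Return the (vertical, horizontal) center coordinates, one value per frame."""
    centers = [region_center(pixels) for pixels in regions_per_frame]
    vertical = [row for row, _ in centers]
    horizontal = [col for _, col in centers]
    return vertical, horizontal


# Toy input: a 2x2-corner region that shifts one pixel down and right.
frames = [
    [(0, 0), (0, 2), (2, 0), (2, 2)],
    [(1, 1), (1, 3), (3, 1), (3, 3)],
]
vertical, horizontal = motion_signature(frames)
```

Plotting vertical and horizontal against the frame index gives two curves of the kind shown in the figure.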

Another example:


Yet another example:


Details of our method: To detect faces in the first frame, a chroma chart representative of skin colors of different races is used. Once faces in the first frame are detected, a chroma chart is computed for each detected individual and that chroma chart is used to detect and track faces in subsequent frames.

Left: Chroma chart representing the skin colors of persons of different races. Right: Chroma chart representing the skin color of one individual. The brightness of each point shows the likelihood that the corresponding chroma belongs to skin.
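A chroma chart of this kind can be approximated by a two-dimensional histogram of skin-pixel chromas. The sketch below is an assumption about one possible construction, not the chart used in this project: it uses normalized (r, g) chromaticity as the chroma space, a hypothetical resolution of 32 cells per axis, and scales the counts so that the most frequent chroma has likelihood 1. All function names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

RGB = Tuple[int, int, int]
Cell = Tuple[int, int]

BINS = 32  # hypothetical chart resolution along each chroma axis


def chroma_cell(pixel: RGB, bins: int = BINS) -> Cell:
    """Map an RGB pixel to a chart cell using normalized (r, g) chroma."""
    red, green, blue = pixel
    total = red + green + blue
    if total == 0:
        return (0, 0)  # assumption: black, which has no chroma, goes to cell (0, 0)
    r = red / total
    g = green / total
    return (min(int(r * bins), bins - 1), min(int(g * bins), bins - 1))


def build_chroma_chart(skin_pixels: Iterable[RGB], bins: int = BINS) -> Dict[Cell, float]:
    """Count sample skin pixels per cell and scale so the peak cell equals 1."""
    counts: Dict[Cell, int] = {}
    for pixel in skin_pixels:
        cell = chroma_cell(pixel, bins)
        counts[cell] = counts.get(cell, 0) + 1
    if not counts:
        return {}
    peak = max(counts.values())
    return {cell: count / peak for cell, count in counts.items()}


def skin_likelihood(pixel: RGB, chart: Dict[Cell, float], bins: int = BINS) -> float:
    """Look up the likelihood of a pixel; chromas never seen as skin get 0."""
    return chart.get(chroma_cell(pixel, bins), 0.0)


# Toy samples: two pixels of one chroma and one pixel of another.
chart = build_chroma_chart([(200, 120, 100), (200, 120, 100), (100, 200, 100)])
```

Because the chart is indexed by chroma only, a darker pixel with the same color proportions, such as (100, 60, 50), falls in the same cell as (200, 120, 100). Building the chart from the pixels of one detected face rather than from many people gives a per-individual chart in the same way.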

(a) First frame in video clip seq31c1.mpg. (b) 10th frame. (c) 30th frame.

Skin-likelihood image: The intensity of each pixel represents the likelihood that the pixel belongs to skin.

Images segmented to extract skin regions.
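One simple way to go from a skin-likelihood image to skin regions is to threshold the likelihoods and group the surviving pixels into connected components. The source does not state which segmentation method is used, so the following is only a sketch under that assumption; the threshold of 0.5, the 4-connectivity, and the name segment_skin_regions are all hypothetical choices.

```python
from collections import deque
from typing import List, Sequence, Tuple

Point = Tuple[int, int]  # (row, column)


def segment_skin_regions(
    likelihood: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[List[Point]]:
    """Group pixels with likelihood >= threshold into 4-connected regions."""
    rows = len(likelihood)
    cols = len(likelihood[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    regions: List[List[Point]] = []

    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or likelihood[r][c] < threshold:
                continue
            # Breadth-first flood fill starting from an unvisited skin pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            region: List[Point] = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (
                        0 <= ny < rows
                        and 0 <= nx < cols
                        and not seen[ny][nx]
                        and likelihood[ny][nx] >= threshold
                    ):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


# Toy likelihood image with two separate skin-like blobs.
toy = [
    [0.9, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.7],
    [0.0, 0.0, 0.0, 0.6],
]
regions = segment_skin_regions(toy)
```

In practice very small regions would likely be discarded as noise before tracking, but that step is left out here.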

Templates used to extract the exact color patterns in the video, which are then used to detect the corresponding faces in the subsequent frame.

Patterns from the first frame, 10th frame, and 30th frame of the video that are used to detect and verify the faces in the subsequent frame.

This process is repeated, using information from frame i to process frame i+1, until all video frames are processed.
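The frame-to-frame propagation can be summarized as a loop in which whatever is learned from frame i is used to locate the region in frame i+1. The sketch below shows only that control flow. The function track_sequence and its two callables, build_model and find_region, are hypothetical placeholders for the chroma-chart and template steps; they are not part of the project's program.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")
Region = TypeVar("Region")
Model = TypeVar("Model")


def track_sequence(
    frames: Sequence[Frame],
    first_region: Region,
    build_model: Callable[[Frame, Region], Model],
    find_region: Callable[[Frame, Model], Region],
) -> List[Region]:
    """Propagate a region through a clip, one frame at a time.

    first_region is the detection in frames[0]. For every later frame, a
    model is built from the previous frame and its region, and that model
    is used to find the region in the current frame.
    """
    regions: List[Region] = []
    region = first_region
    for index, frame in enumerate(frames):
        if index > 0:
            model = build_model(frames[index - 1], region)  # information from frame i
            region = find_region(frame, model)              # applied to frame i+1
        regions.append(region)
    return regions


# Toy stand-ins: a "frame" is a row of gray values, a "region" is a column
# index, and the "model" is the gray value found at that index.
def toy_build_model(frame: List[int], index: int) -> int:
    return frame[index]


def toy_find_region(frame: List[int], value: int) -> int:
    return min(range(len(frame)), key=lambda i: abs(frame[i] - value))


toy_frames = [[0, 9, 0, 0], [0, 0, 9, 0], [0, 0, 0, 9]]
track = track_sequence(toy_frames, 1, toy_build_model, toy_find_region)
```

Rebuilding the model at every step lets it follow gradual changes in appearance, at the cost of possibly drifting if one frame is matched incorrectly.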

Tracking Multiple Regions

The following video clip shows the result of tracking regions from two individuals. Once the skin regions (faces, hands) are obtained, the faces and hands are tracked.
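When several regions are tracked at once, each region found in a new frame has to be associated with one of the regions from the previous frame. The source does not say how this association is made. The sketch below assumes a simple greedy nearest-center rule, and the labels and the name match_regions are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Center = Tuple[float, float]  # (row, column) of a region center


def match_regions(previous: Dict[str, Center], current: List[Center]) -> Dict[str, Center]:
    """Assign each labeled region from the previous frame to a current center.

    Greedy rule (an assumption): consider all (label, candidate) pairs in
    order of increasing distance and accept a pair if neither its label nor
    its candidate has been used yet. Labels left without a candidate are
    omitted from the result.
    """
    pairs = sorted(
        (math.dist(center, candidate), label, index)
        for label, center in previous.items()
        for index, candidate in enumerate(current)
    )
    assigned: Dict[str, Center] = {}
    used = set()
    for _, label, index in pairs:
        if label in assigned or index in used:
            continue
        assigned[label] = current[index]
        used.add(index)
    return assigned


# Toy example: three labeled regions and three unlabeled centers in the next frame.
previous = {"face A": (10.0, 10.0), "hand A": (40.0, 12.0), "face B": (10.0, 60.0)}
current = [(41.0, 15.0), (11.0, 58.0), (12.0, 11.0)]
matched = match_regions(previous, current)
```

A rule based on distance alone can swap labels when two regions cross, for example when a hand passes in front of a face, so color or shape information would be needed to resolve such cases.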


Motion signatures for the faces and the hands look like the following:


For more information, contact A. Goshtasby.

Last modified: 7/23/99.