We felt that the advantages of active contour models would allow them to detect regions that are difficult for traditional region detectors. In particular, if we want to track an object that is fading into the background or partly disappearing from view as it rotates, we felt that active contour models should afford better region detection and motion tracking than traditional region detectors.
Experiment Setup
Because our experiment compares the relative performance of active contour models and traditional region detectors in motion tracking, we needed a standardized way of processing the images as well as a standardized way of evaluating the results. This is especially important because the two algorithms take different inputs and each has several parameters that must be tuned.
Snake Algorithm
Our program takes as input a blurred, edge-enhanced image together with an ASCII text file containing the coordinates of the control points, and then applies the snake algorithm iteratively.
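The snake code itself is not listed in this report. For readers unfamiliar with the method, here is a minimal Python sketch of one common variant: the greedy snake of Williams and Shah. We assume this variant purely for illustration. The energy terms, normalization and stopping rule in our program may differ. In this sketch, each control point moves to the lowest-energy pixel in its 3×3 neighborhood, and the energy has three weighted terms:

- α weights continuity, which keeps the points evenly spaced.
- β weights curvature, which keeps the contour smooth.
- γ weights edge strength in the blurred edge image.

The function names `greedy_pass` and `run_snake` are hypothetical.

```python
import numpy as np

# Candidate moves; "stay put" comes first so that ties favor not moving.
OFFSETS = [(0, 0)] + [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                      if (dr, dc) != (0, 0)]

def _normalize(x):
    """Scale one energy term to [0, 1] over the neighborhood (a simplification)."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def greedy_pass(points, edge, alpha, beta, gamma):
    """One greedy pass over a closed snake; returns (new points, number moved)."""
    pts = points.copy()
    n = len(pts)
    rows, cols = edge.shape
    # Average spacing between neighboring points: the continuity target.
    mean_d = np.mean(np.linalg.norm(pts - np.roll(pts, 1, axis=0), axis=1))
    moved = 0
    for i in range(n):
        prev_pt, next_pt = pts[i - 1], pts[(i + 1) % n]
        cand = np.array([(pts[i][0] + dr, pts[i][1] + dc) for dr, dc in OFFSETS
                         if 0 <= pts[i][0] + dr < rows and 0 <= pts[i][1] + dc < cols])
        # Continuity: penalize spacing that differs from the mean spacing.
        cont = np.abs(mean_d - np.linalg.norm(cand - prev_pt, axis=1))
        # Curvature: squared second difference around the candidate position.
        curv = np.sum((prev_pt - 2 * cand + next_pt) ** 2, axis=1).astype(float)
        # Image: negated edge strength, so strong edges lower the energy.
        img = -edge[cand[:, 0], cand[:, 1]].astype(float)
        energy = (alpha * _normalize(cont) + beta * _normalize(curv)
                  + gamma * _normalize(img))
        best = cand[np.argmin(energy)]
        if not np.array_equal(best, pts[i]):
            pts[i] = best
            moved += 1
    return pts, moved

def run_snake(points, edge, alpha=1.0, beta=1.0, gamma=1.5, max_iter=200):
    """Repeat greedy passes until no point moves or max_iter is reached."""
    pts = np.asarray(points, dtype=int)
    for _ in range(max_iter):
        pts, moved = greedy_pass(pts, edge, alpha, beta, gamma)
        if moved == 0:
            break
    return pts
```

Points are stored as integer (row, column) pairs, so the returned array can be printed directly as the "locked-on" control points.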
For edge enhancement, we used a Sobel edge detector and convolved the resulting edge image with a Gaussian filter to obtain the program's input. The program accepts this edge image and the ASCII control point file as inputs, and requires the user to specify the α, β and γ parameters. Our literature review suggested values of 1.0, 1.0 and 1.5, respectively, which we used for our testing. The program then prints the final ‘locked-on’ control points found by the snake.
>> vsobel image.correct | vconv k=gaussian.vx of=imageedge.correct
>> ./vsnake if=imageedge.correct ig=coordfile a=1.0 b=1.0 g=1.5
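For readers without VisionX, the same preprocessing can be approximated in NumPy. This is a sketch under two assumptions:

- vsobel is taken to produce the Sobel gradient magnitude.
- The width of the kernel in gaussian.vx is not given, so `sigma` is a hypothetical parameter.

The code uses correlation rather than convolution. For the symmetric Gaussian kernel the two are identical. For the Sobel kernels, correlation only flips the sign of each gradient component, which the magnitude discards.

```python
import numpy as np

def correlate2d(img, kernel):
    """Same-size 2-D correlation, replicating edge pixels at the border."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def gaussian_kernel(sigma=1.0):
    """Normalized 2-D Gaussian truncated at 3 sigma."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def blurred_edges(img, sigma=1.0):
    """Sobel gradient magnitude followed by Gaussian smoothing."""
    img = np.asarray(img, dtype=float)
    mag = np.hypot(correlate2d(img, SOBEL_X), correlate2d(img, SOBEL_Y))
    return correlate2d(mag, gaussian_kernel(sigma))
```

Blurring spreads each edge over a few pixels. This gives the snake's image term a slope it can follow from a few pixels away, rather than a one-pixel-wide ridge.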
We modified our code slightly so that it could process image sequences efficiently. For an image sequence, the inputs are the blurred, edge-enhanced image sequence and a coordinate file containing the control points for the first image, as marked by a human observer in vdview. The coordinate file is obtained from the bb file with the command:
>> vxftoz image.correct.1.bb image.correct |vpr
The modified code prints the coordinates of the ‘locked-on’ control points for each frame and also uses these points as the initial control points for the next frame in the sequence. The points obtained for each frame are taken as the boundary of the segmentation, and we used vbndpix to convert this boundary into a binary image showing the segmented region.
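The frame-to-frame propagation and the boundary filling can be sketched as follows. Both functions are hypothetical stand-ins:

- `snake_fn` represents one run of the snake on a single edge image.
- `boundary_to_mask` is an even-odd (ray-casting) polygon fill used in place of vbndpix, whose treatment of pixels lying exactly on the boundary may differ.

```python
import numpy as np

def track_sequence(frames, init_points, snake_fn):
    """Run the snake on each frame, seeding it with the previous frame's result.

    snake_fn(points, edge_image) -> locked-on points (hypothetical signature).
    """
    points = np.asarray(init_points)
    results = []
    for edge in frames:
        points = snake_fn(points, edge)   # locked-on points for this frame
        results.append(points.copy())     # also the seed for the next frame
    return results

def boundary_to_mask(points, shape):
    """Fill a closed polygon of (row, col) vertices into a boolean mask.

    Pixel centers are tested with the even-odd rule, so pixels on the right
    and bottom edges of the polygon may be counted as outside.
    """
    rr, cc = np.mgrid[0:shape[0], 0:shape[1]]
    inside = np.zeros(shape, dtype=bool)
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    for k in range(n):
        r1, c1 = pts[k]
        r2, c2 = pts[(k + 1) % n]
        # Does this polygon edge cross the horizontal line through each pixel?
        crosses = (r1 > rr) != (r2 > rr)
        # Column of the crossing (the guard avoids dividing by zero on
        # horizontal edges, which never cross anyway).
        denom = np.where(r2 != r1, r2 - r1, 1.0)
        c_cross = c1 + (rr - r1) * (c2 - c1) / denom
        # Toggle pixels that lie to the left of the crossing.
        inside ^= crosses & (cc < c_cross)
    return inside
```

In use, each element of `track_sequence(...)` would be passed to `boundary_to_mask` with the image size. That yields one binary segmentation per frame.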
Traditional Methods
For the traditional region detector, we used a combination of region growing and thresholding: we first ran region growing on the image, then thresholded the result to isolate the desired region. To choose parameters for a fair comparison, we found the lower and upper bounds of each parameter (such as the range for region growing) that produce an acceptable result. We then computed the result for several values within those bounds and used the mean as the result for that algorithm. Our intuition is that someone using such traditional algorithms would similarly try various parameter values until obtaining an acceptable image, so our average should represent the typical image such a user would obtain. We found that the best results came from setting the range large (so that the regions were large and there were fewer than 255 regions in the image) and then picking out the desired region by thresholding. Each image in the sequence was processed separately, and the results were concatenated to form the final result.
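A rough Python sketch of this pipeline and of the parameter averaging follows. Several details are assumptions made for illustration, and the VisionX tools we actually used may define "range" differently:

- The region-growing rule: a 4-connected neighbor joins a region when its intensity is within `range_` of the region's seed pixel.
- The thresholding step: each region is kept or discarded according to its mean intensity.
- All function names.

The score averaged here is the fraction of correctly classified pixels described under Comparison Metric.

```python
import numpy as np
from collections import deque

def region_grow(img, range_):
    """Label 4-connected regions whose pixels stay within range_ of the seed."""
    img = np.asarray(img, dtype=float)
    labels = -np.ones(img.shape, dtype=int)
    rows, cols = img.shape
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0, c0] >= 0:
                continue                      # already in some region
            seed = img[r0, c0]
            labels[r0, c0] = next_label
            queue = deque([(r0, c0)])
            while queue:                      # breadth-first growth
                r, c = queue.popleft()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and labels[nr, nc] < 0
                            and abs(img[nr, nc] - seed) <= range_):
                        labels[nr, nc] = next_label
                        queue.append((nr, nc))
            next_label += 1
    return labels, next_label

def grow_then_threshold(img, range_, lo, hi):
    """Replace each region by its mean intensity, then keep means in [lo, hi]."""
    labels, n = region_grow(img, range_)
    img = np.asarray(img, dtype=float)
    sums = np.bincount(labels.ravel(), weights=img.ravel(), minlength=n)
    counts = np.bincount(labels.ravel(), minlength=n)
    region_mean = (sums / counts)[labels]
    return (region_mean >= lo) & (region_mean <= hi)

def mean_score(img, truth, ranges, lo, hi):
    """Average fraction-correct score over several region-growing ranges."""
    truth = np.asarray(truth, dtype=bool)
    scores = [np.mean(grow_then_threshold(img, r, lo, hi) == truth)
              for r in ranges]
    return float(np.mean(scores))
```

Averaging over the whole acceptable range, rather than reporting the single best setting, avoids giving the traditional method an advantage that a typical user would not have.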
Ground Truth
We decided that our ground truth should be based on the human visual system's judgment of where a particular image region lies. The human visual system is dynamic, combines high-level knowledge with low-level sensory information, and is ultimately the final judge of how accurate an image boundary is. Three individuals (judges) independently marked the region boundaries on our test images. We then defined the ground truth as the areas marked as part of the region by two or more of the judges.
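The two-of-three rule amounts to a pixel-wise majority vote. Here is a minimal sketch, assuming each judge's boundary marking has already been filled into a binary mask of the image's size. The function name is hypothetical.

```python
import numpy as np

def majority_ground_truth(masks, min_votes=2):
    """Return the pixels marked as part of the region by at least min_votes judges."""
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks])
    return stack.sum(axis=0) >= min_votes
```

With three judges and `min_votes=2`, a single judge's stray marking cannot enter the ground truth. Likewise, a single judge's omission cannot remove a pixel from it.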
Comparison Metric
Given this ground truth, we scored each segmentation as follows:
$$\text{score} = \frac{\text{number of correctly classified pixels in region}}{\text{total number of pixels in region}}$$
If the region produced by the algorithm matches the ground truth exactly, the score is 1.00. If pixels are wrongly included in the region, or pixels of the true region are excluded, the score falls below 1.00. This metric is commonly known as Fraction Correct Pixel.
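Here is a minimal sketch of the score. It assumes that the "region" in the denominator is the evaluated image area (here, the whole mask array), so that every pixel counts as either correctly or incorrectly classified. `fraction_correct` is a hypothetical name. Both wrongly included and wrongly excluded pixels count as errors, so only an exact match scores 1.00.

```python
import numpy as np

def fraction_correct(predicted, truth):
    """Fraction of pixels whose in/out label agrees with the ground truth."""
    predicted = np.asarray(predicted, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if predicted.shape != truth.shape:
        raise ValueError("masks must have the same shape")
    return float(np.mean(predicted == truth))
```

For an image sequence, the per-frame masks can be stacked (for example with `np.stack`) and scored in one call. This weights every pixel of every frame equally.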