The SVN address for the paper is:
svn://cessna.cs.umass.edu/Hough Object Tracking Paper
Abstract: This paper presents a method for extracting distinctive invariant features from
images that can be used to perform reliable matching between different views of
an object or scene. The features are invariant to image scale and rotation, and
are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination.
The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from
many images. This paper also describes an approach to using these features
for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor
algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for
consistent pose parameters. This approach to recognition can robustly identify
objects among clutter and occlusion while achieving near real-time performance.
Download
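The Hough-transform clustering step described in the abstract above can be sketched briefly: each feature match implies a similarity pose (translation, scale, rotation) of the model, and coarse binning of these poses lets geometrically consistent matches vote for the same object hypothesis. The bin widths and vote threshold below are illustrative choices, not the paper's exact values.

```python
import numpy as np
from collections import defaultdict

def pose_bin(model_feat, image_feat, loc_bin=50.0, ori_bin=np.pi / 6):
    """Quantise the similarity pose implied by one match into a hash key."""
    mx, my, ms, mo = model_feat          # model feature: x, y, scale, orientation
    ix, iy, is_, io = image_feat         # matching image feature
    scale = is_ / ms                     # predicted model-to-image scale
    rot = (io - mo) % (2 * np.pi)        # predicted rotation
    # Predicted translation of the model origin in image coordinates.
    tx, ty = ix - scale * mx, iy - scale * my
    return (int(tx // loc_bin), int(ty // loc_bin),
            int(np.round(np.log2(scale))), int(rot // ori_bin))

def hough_cluster(matches, min_votes=3):
    """Group matches whose implied poses fall into the same coarse bin."""
    bins = defaultdict(list)
    for m_feat, i_feat in matches:
        bins[pose_bin(m_feat, i_feat)].append((m_feat, i_feat))
    return [v for v in bins.values() if len(v) >= min_votes]

# Three matches consistent with a near-identity pose plus one gross outlier.
matches = [((10, 10, 1.0, 0.0), (12, 11, 1.0, 0.05)),
           ((40, 20, 1.0, 0.0), (41, 22, 1.0, 0.02)),
           ((25, 60, 1.0, 0.0), (26, 61, 1.0, 0.01)),
           ((30, 30, 1.0, 0.0), (300, 400, 4.0, 2.5))]
clusters = hough_cluster(matches)
```

The three consistent matches land in one bin and survive the vote threshold; the outlier's bin collects a single vote and is discarded.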
Abstract: An object recognition system has been developed that uses a
new class of local image features. The features are invariant
to image scaling, translation, and rotation, and partially invariant
to illumination changes and affine or 3D projection.
These features share similar properties with neurons in inferior
temporal cortex that are used for object recognition
in primate vision. Features are efficiently detected through
a staged filtering approach that identifies stable points in
scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales.
The keys are used as input to a nearest-neighbor indexing
method that identifies candidate object matches. Final verification of each match is achieved by finding a low-residual
least-squares solution for the unknown model parameters.
Experimental results show that robust object recognition
can be achieved in cluttered partially-occluded images with
a computation time of under 2 seconds.
Download
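The final verification step mentioned above can be sketched as a small linear least-squares problem: each matched point pair contributes two rows to an overdetermined system in the six affine parameters, and a low residual confirms the match set. The point values below are invented for illustration.

```python
import numpy as np

# Affine model: u = a*x + b*y + c,  v = d*x + e*y + f
def fit_affine(model_pts, image_pts):
    """Solve the overdetermined system A p = b for the six affine parameters."""
    A, b = [], []
    for (x, y), (u, v) in zip(model_pts, image_pts):
        A.append([x, y, 1, 0, 0, 0]); b.append(u)
        A.append([0, 0, 0, x, y, 1]); b.append(v)
    A, b = np.array(A, float), np.array(b, float)
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    residual = np.linalg.norm(A @ p - b)   # low residual => consistent matches
    return p, residual

# Points related by a pure translation of (5, -2): a low residual is expected.
model = [(0, 0), (10, 0), (0, 10), (10, 10)]
image = [(5, -2), (15, -2), (5, 8), (15, 8)]
params, res = fit_affine(model, image)
```

For this consistent toy input the solver recovers a = 1, c = 5, f = -2 with residual near machine precision; an inconsistent match set would show up as a large residual.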
Abstract: This paper approaches the problem of finding correspondences between
images in which there are large changes in viewpoint, scale and illumination. Recent work has shown that scale-space ‘interest points’ may
be found with good repeatability in spite of such changes. Furthermore,
the high entropy of the surrounding image regions means that
local descriptors are highly discriminative for matching. For descriptors at interest points to be robustly matched between images, they
must be as far as possible invariant to the imaging process.
In this work we introduce a family of features which use groups
of interest points to form geometrically invariant descriptors of image
regions. Feature descriptors are formed by resampling the image
relative to canonical frames defined by the points. In addition to robust
matching, a key advantage of this approach is that each match implies
a hypothesis of the local 2D (projective) transformation. This allows
us to immediately reject most of the false matches using a Hough
transform. We reject remaining outliers using RANSAC and the epipolar
constraint. Results show that dense feature matching can be achieved
in a few seconds of computation on 1GHz Pentium III machines.
Download
Abstract: The goal of this article is to review the state-of-the-art tracking methods, classify them into different
categories, and identify new trends. Object tracking, in general, is a challenging problem. Difficulties in tracking
objects can arise due to abrupt object motion, changing appearance patterns of both the object and the scene,
nonrigid object structures, object-to-object and object-to-scene occlusions, and camera motion. Tracking is
usually performed in the context of higher-level applications that require the location and/or shape of the
object in every frame. Typically, assumptions are made to constrain the tracking problem in the context of
a particular application. In this survey, we categorize the tracking methods on the basis of the object and
motion representations used, provide detailed descriptions of representative methods in each category, and
examine their pros and cons. Moreover, we discuss the important issues related to tracking including the use
of appropriate image features, selection of motion models, and detection of objects.
Download
Abstract: No feature-based vision system can work unless good
features can be identified and tracked from frame to
frame. Although tracking itself is by and large a solved
problem, selecting features that can be tracked well and
correspond to physical points in the world is still hard.
We propose a feature selection criterion that is optimal
by construction because it is based on how the tracker
works, and a feature monitoring method that can detect occlusions, disocclusions, and features that do not
correspond to points in the world. These methods are
based on a new tracking algorithm that extends previous Newton-Raphson style search methods to work
under affine image transformations. We test performance with several simulations and experiments.
Download
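The selection criterion this abstract refers to is widely known as the minimum-eigenvalue test: a window is trackable only if both eigenvalues of its 2x2 gradient matrix G = sum of [Ix^2, IxIy; IxIy, Iy^2] are large, since that matrix must be well conditioned for the Newton-Raphson style tracker to converge. A minimal numpy sketch; the window size and synthetic test image are illustrative assumptions.

```python
import numpy as np

def min_eig_score(img, x, y, half=2):
    """Smaller eigenvalue of the gradient matrix over a window at (x, y)."""
    # Take one extra pixel on each side so central differences are valid.
    win = img[y - half - 1:y + half + 2, x - half - 1:x + half + 2].astype(float)
    Iy, Ix = np.gradient(win)
    Ix, Iy = Ix[1:-1, 1:-1], Iy[1:-1, 1:-1]   # drop the one-sided border
    G = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    return np.linalg.eigvalsh(G)[0]           # eigenvalues come back ascending

# Synthetic image: a bright square on a black background.
img = np.zeros((30, 30))
img[10:20, 10:20] = 1.0

corner_score = min_eig_score(img, 10, 10)   # corner: both eigenvalues large
edge_score = min_eig_score(img, 15, 10)     # edge: one eigenvalue near zero
flat_score = min_eig_score(img, 3, 3)       # flat region: both near zero
```

Only the corner scores well: an edge has gradient energy in one direction only, so its gradient matrix is rank deficient and the tracker cannot localise along the edge.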
Abstract: This paper describes a novel multi-view matching frame-
work based on a new type of invariant feature. Our features are located at Harris corners in discrete scale-space
and oriented using a blurred local gradient. This defines a
rotationally invariant frame in which we sample a feature
descriptor, which consists of an 8×8 patch of bias/gain
normalised intensity values. The density of features in the
image is controlled using a novel adaptive non-maximal
suppression algorithm, which gives a better spatial distribution of features than previous approaches. Matching is
achieved using a fast nearest neighbour algorithm that in-
dexes features based on their low frequency Haar wavelet
coefficients. We also introduce a novel outlier rejection pro-
cedure that verifies a pairwise feature match based on a
background distribution of incorrect feature matches. Feature matches are refined using RANSAC and used in an
automatic 2D panorama stitcher that has been extensively
tested on hundreds of sample inputs.
Download
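The adaptive non-maximal suppression idea above is commonly formulated like this: each keypoint is given a suppression radius equal to its distance to the nearest clearly stronger keypoint, and the n keypoints with the largest radii are kept, which spreads features evenly instead of letting strong responses crowd one region. The robustness constant and the toy keypoints below are invented for illustration.

```python
import numpy as np

def anms(points, strengths, n, c_robust=0.9):
    """Return indices of the n keypoints with the largest suppression radii."""
    pts = np.asarray(points, float)
    s = np.asarray(strengths, float)
    radii = np.full(len(pts), np.inf)      # global maximum keeps radius = inf
    for i in range(len(pts)):
        stronger = s > s[i] / c_robust     # neighbours that clearly dominate i
        if np.any(stronger):
            d = np.linalg.norm(pts[stronger] - pts[i], axis=1)
            radii[i] = d.min()
    return np.argsort(-radii)[:n]

# A tight cluster of strong corners plus two isolated weaker ones:
# ANMS keeps the cluster's best point and both isolated points.
points = [(0, 0), (1, 0), (0, 1), (100, 100), (200, 0)]
strengths = [10.0, 8.0, 8.5, 5.0, 4.0]
kept = anms(points, strengths, 3)
```

Plain top-n selection by strength would keep all three clustered points and discard the isolated ones; the radius criterion reverses that.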
Abstract: In this paper, we describe the application of the novel SURF
(Speeded Up Robust Features) algorithm [1] for the recognition of objects
of art. For this purpose, we developed a prototype of a mobile interactive
museum guide consisting of a tablet PC that features a touchscreen and
a webcam. This guide recognises objects in museums based on images
taken by the visitor. Using different image sets of real museum objects,
we demonstrate that both the object recognition performance and
the speed of the SURF algorithm surpass the results obtained with
SIFT, its main contender.
Download
Abstract: This paper addresses the problem of automatic
construction of a hierarchical map from images. Our approach
starts from a large collection of omnidirectional images taken
at many locations in a building. First a low-level map is built
that consists of a graph in which relations between images are
represented. For this we use a metric based on visual landmarks
(SIFT features) and geometrical constraints. Then we use a
graph partitioning method to cluster nodes and in this way
construct the high-level map. Experiments on real data show
that meaningful higher and lower level maps are obtained,
which can be used for accurate localization and planning.
Download
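The abstract does not name the graph partitioning method used to build the high-level map; a standard choice for splitting a similarity graph in two is spectral bisection, thresholding the Fiedler vector (the eigenvector of the graph Laplacian with the second-smallest eigenvalue). A small numpy sketch on an invented adjacency matrix of SIFT-match counts.

```python
import numpy as np

def spectral_bisect(W):
    """Split nodes into two clusters by the sign of the Fiedler vector."""
    W = np.asarray(W, float)
    L = np.diag(W.sum(axis=1)) - W      # unnormalised graph Laplacian
    vals, vecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    fiedler = vecs[:, 1]                # second-smallest eigenvalue's vector
    return fiedler >= 0                 # boolean cluster assignment

# Two "rooms": images 0-2 match each other strongly, images 3-5 likewise,
# with only one weak link (a doorway view) between the groups.
W = np.array([[0, 9, 8, 0, 0, 0],
              [9, 0, 7, 1, 0, 0],
              [8, 7, 0, 0, 0, 0],
              [0, 1, 0, 0, 9, 8],
              [0, 0, 0, 9, 0, 7],
              [0, 0, 0, 8, 7, 0]], float)
side = spectral_bisect(W)
```

The two densely connected groups end up on opposite sides of the cut, which is exactly the room-level structure a high-level map needs.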
Abstract: Face identification systems relying on local descriptors are
increasingly used because of their perceived robustness with
respect to occlusions and to global geometrical deformations.
Descriptors of this type – based on a set of oriented Gaussian
derivative filters – are used in our identification system. In this
paper, we explore a pose-invariant multiview face identification
system that does not use explicit geometrical information. The
basic idea of the approach is to find discriminant features to
describe a face across different views. A boosting procedure is
used to select features out of a large feature pool of local features
collected from the positive training examples. We describe
experiments on well-known, though small, face databases with
excellent recognition rate.
Download
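The boosting-based feature selection described above can be sketched in an AdaBoost style: each round picks the single feature (a decision stump on one column) with the lowest weighted error, then reweights the examples so later rounds focus on the mistakes. The stump threshold, data, and labels below are toy assumptions, not the paper's setup.

```python
import numpy as np

def select_features(X, y, rounds=2):
    """Return the feature indices chosen greedily by boosting."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                 # uniform example weights
    chosen = []
    for _ in range(rounds):
        best = None
        for j in range(d):
            if j in chosen:                 # each feature picked at most once
                continue
            thr = X[:, j].mean()            # crude stump threshold
            for sign in (1, -1):
                pred = np.where(sign * (X[:, j] - thr) > 0, 1, -1)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, pred)
        err, j, pred = best
        chosen.append(j)
        alpha = 0.5 * np.log((1 - err + 1e-9) / (err + 1e-9))
        w = w * np.exp(-alpha * y * pred)   # upweight misclassified examples
        w = w / w.sum()
    return chosen

# Feature 0 separates the two classes perfectly; feature 1 is noise.
X = np.array([[0.9, 0.2], [0.8, 0.9], [0.1, 0.8], [0.2, 0.1]])
y = np.array([1, 1, -1, -1])
picked = select_features(X, y)
```

The discriminative feature is selected first; the noisy one only enters once the better option is exhausted, which is the behaviour that makes boosting useful as a feature selector.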
Abstract: This paper proposes a novel method for estimating the
geospatial trajectory of a moving camera. The proposed
method uses a set of reference images with known GPS
(global positioning system) locations to recover the trajectory of a moving camera using geometric constraints. The
proposed method has three main steps. First, scale invariant feature transform (SIFT) features are detected and matched between the reference images and the video frames to calculate a weighted adjacency matrix (WAM) based on the number of SIFT matches. Second, using the estimated WAM, the
maximum matching reference image is selected for the current video frame, which is then used to estimate the relative
position (rotation and translation) of the video frame using
the fundamental matrix constraint. The relative position is
recovered up to a scale factor and a triangulation among
the video frame and two reference images is performed to
resolve the scale ambiguity. Third, an outlier rejection and
trajectory smoothing (using b-spline) post processing step
is employed. This is because the estimated camera locations may be noisy due to bad point correspondences or degenerate estimates of fundamental matrices. Results of recovering the camera trajectory are reported for real sequences.
Download
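The second step above, selecting the maximum-matching reference image per video frame, reduces to a row-wise argmax once the weighted adjacency matrix is a frames-by-references array of SIFT match counts. The counts and the reliability threshold below are invented for illustration.

```python
import numpy as np

# WAM[f, r] = number of SIFT matches between video frame f and reference r.
wam = np.array([[120,  15,   3],
                [ 40,  95,  10],
                [  5,  60,  80],
                [  2,  12, 140],
                [  4,   6,   3]])

best_ref = wam.argmax(axis=1)   # best-matching reference for each frame
best_cnt = wam.max(axis=1)      # how strong that best match is

# Frames whose best reference has few matches are flagged as unreliable;
# the threshold of 50 is an assumed tuning parameter, not from the paper.
reliable = best_cnt >= 50
```

The last frame is flagged: with only 6 matches its pose estimate would likely be degenerate, which is what the outlier rejection step in the abstract guards against.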
Abstract: The SIFT operator developed by David Lowe (Lowe 1999, Lowe 2004) is an algorithm
for object recognition in images.
This dissertation is an exploration of the SIFT operator, with the goal of identifying and
exploring areas of possible improvement. These might be either in performance characteristics of the implementation of the algorithm or general improvements to the stability or
robustness of the algorithm in analysing different images and detecting objects.
First the algorithm will be implemented in C++ on a Windows operating system, then
once it has been successfully implemented, avenues of improvement will be identified and
explored. The areas will be identified by experimentation and further research during the
course of the project.
Download
Abstract: This paper introduces a new operator to characterize a point in an
image in a distinctive and invariant way. The robust recognition of points is a
key technique in computer vision: algorithms for stereo correspondence, motion
tracking and object recognition rely heavily on this type of operator. The goal in
this paper is to describe the salient point to be characterized by a constellation
of surrounding anchor points. Salient points are the most reliably localized
points extracted by an interest point operator. The anchor points are multiple
interest points in a visually homogeneous segment surrounding the salient point.
Because of its appearance, this constellation is called a spider. With a prototype
of the spider operator, results in this paper demonstrate how a point can be recognized in spite of significant image noise, inhomogeneous change in illumination and altered perspective. For an example that requires a high performance
close to object / background boundaries, the prototype yields better results than
David Lowe’s SIFT operator.
Download
Abstract: Photo-realistic 3D modeling is a challenging problem and
has been a research topic for many years. Quick generation of photo-realistic three-dimensional calibrated models
using a hand-held device is highly desirable for applications ranging from forensic investigation, mining, to mobile
robotics. In this paper, we present the instant Scene Modeler (iSM), a 3D imaging system that automatically creates
3D models using an off-the-shelf hand-held stereo camera.
The user points the camera at a scene of interest and the
system will create a photo-realistic 3D calibrated model automatically within minutes. Field tests in various environments have been carried out with promising results.
Download
Abstract: This paper presents a new approach to full automatic relative orientation of several digital images taken with a calibrated camera.
This approach uses new algorithms for feature extraction and relative orientation developed in the last few years. There is no need
for special markers in the scene nor for approximate values for the parameters of the exterior orientation. We use the point operator
developed by D. G. Lowe (Lowe, 2004), which extracts points with scale- and rotation-invariant descriptors (SIFT-features). These
descriptors allow a successful matching of image points even in situations with highly convergent images. The approach consists of
the following steps: After extracting image points on all images each image pair is matched using the SIFT parameters only. No
prior information about the pose of the images or the overlapping parts of the images is needed. For every image pair a relative
orientation is computed using a RANSAC procedure. Here we use the new 5-point algorithm developed by D. Nister (Nister, 2004).
Based on these orientations approximate values for the orientation parameters and the object coordinates are calculated. This is
achieved by computing the relative scale and transforming into a common coordinate system. Several tests are carried out to ensure
reliable inputs for the currently final step: a bundle block adjustment. The paper discusses the practical impacts of the algorithms
involved. Examples of different indoor- and outdoor-scenes including a dataset of tilted aerial images are presented and the results of
the approach are evaluated. These results show that the approach can be used for a wide range of scenes with different types of
image geometry and taken with different types of cameras including inexpensive consumer cameras. In particular we investigate
the robustness of the algorithms, e.g. in geometric tests on image triplets. In the outlook further developments like the use of image
pyramids with a modified matching are discussed.
Download
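The RANSAC procedure used above for relative orientation has a generic structure worth sketching: repeatedly fit a model to a minimal random sample, count inliers, and keep the best hypothesis. For readability the minimal solver here is a deliberately simple stand-in, a 2D translation estimated from a single correspondence; the real system would plug in Nister's 5-point algorithm and an epipolar error instead.

```python
import random

def ransac(pairs, fit, error, min_samples, thresh, iters=200, seed=0):
    """Return the model with the most inliers, and its inlier list."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        sample = rng.sample(pairs, min_samples)   # minimal random sample
        model = fit(sample)
        inliers = [p for p in pairs if error(model, p) < thresh]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers

# Stand-in minimal solver: one point pair fixes a translation (tx, ty).
fit = lambda s: (s[0][1][0] - s[0][0][0], s[0][1][1] - s[0][0][1])
err = lambda t, p: abs(p[1][0] - p[0][0] - t[0]) + abs(p[1][1] - p[0][1] - t[1])

# Four pairs consistent with translation (5, 5) plus two gross outliers.
pairs = [((0, 0), (5, 5)), ((1, 2), (6, 7)), ((3, 1), (8, 6)),
         ((4, 4), (9, 9)), ((2, 2), (40, -3)), ((5, 0), (0, 50))]
model, inliers = ransac(pairs, fit, err, min_samples=1, thresh=0.5)
```

Any iteration that happens to sample an inlier pair recovers the true translation and collects all four consistent pairs, so the mismatches never contaminate the final estimate.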
Abstract: A new method for motion segmentation is presented for clustering features that belong to independently moving objects. It is based on the geometric constraints imposed on the image positions of points and lines arising from rigidly moving objects in the world. The motion of points and lines in the image over three views is linked by the trilinear (or trifocal) constraint, which plays a similar role in three views to that played by the fundamental matrix in two. The fundamental matrix only imposes a one dimensional constraint on the location of features in the second image given its location in the first, whereas the trilinear constraint gives the exact location of a feature in a third image given its location in the other two. The trilinear constraint discriminates a wider range of motion than the epipolar geometry. Furthermore the trilinear constraint has the advantage that it constrains the location of lines as well as points. The segmentation problem is transformed into that of grouping the features in the image consistent with different trilinear constraints. Feasible clusters are generated using robust techniques. It is essential that the method be robust due to the prevalence of mismatches generated by state of the art feature matchers. Degenerate cases are explored, with specific emphasis on the three view constraint on points and lines imposed by the affine camera.
Download
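The two-view side of the comparison in the abstract above is easy to show concretely: for a fundamental matrix F, a point x1 in the first image only constrains its match x2 to lie on the epipolar line l = F x1, a one-dimensional constraint checked by x2 . l = 0. The F and points below are a contrived consistent toy (a rectified pair translating along x), not from a real camera.

```python
import numpy as np

# Fundamental matrix of a rectified stereo pair moving along the x axis:
# corresponding points must share the same image row.
F = np.array([[0, 0,  0],
              [0, 0, -1],
              [0, 1,  0]], float)

x1 = np.array([3.0, 2.0, 1.0])         # homogeneous point in image one
x2_good = np.array([5.0, 2.0, 1.0])    # same row: a valid match
x2_bad = np.array([5.0, 7.0, 1.0])     # off the epipolar line

line = F @ x1                          # epipolar line in image two
res_good = abs(x2_good @ line)         # ~0: satisfies the constraint
res_bad = abs(x2_bad @ line)           # large: rejected as a mismatch
```

Note what the residual does not tell you: any point on the line passes, which is exactly the one-dimensional ambiguity that the trilinear constraint over three views removes.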