PE&RS December 2017 Public

4FP-Structure: A Robust Local Region Feature Descriptor

Jiayuan Li, Qingwu Hu, and Mingyao Ai

Abstract

Establishing reliable correspondences between images of the same scene remains challenging because of repetitive texture and unknown distortion. In this paper, we propose a region-matching method that simultaneously filters false matches and maximizes good correspondences between images, even those with irregular distortion. First, a novel region descriptor, represented by a structure formed by four feature points (4FP-Structure), is presented to simplify matching under severe deformation. Furthermore, an expansion stage based on the special 4FP-Structure is adopted to detect and select as many correspondences with high location accuracy as possible under a local affine-transformation constraint. Extensive experiments on both rigid and non-rigid image datasets demonstrate that the proposed algorithm has a very high degree of correctness and significantly outperforms other state-of-the-art methods.

Introduction

As a basic step for many remote sensing and computer vision applications, such as image registration (Brown and Lowe, 2003), structure from motion (Snavely et al., 2006), and simultaneous localization and mapping (SLAM) (Montemerlo et al., 2002), automatic image matching has been well studied in recent years. Current feature matching algorithms (Bay et al., 2008; Ke and Sukthankar, 2004; Lourenço et al., 2012; Lowe, 2004; Rublee et al., 2011; Tola et al., 2010) typically consist of three major stages: keypoint detection, keypoint description, and keypoint matching. In the first stage, salient and stable interest points are extracted. In the second stage, these keypoints are described based on their photometric neighborhoods using properties such as local gradients. In the third stage, the distances between the descriptor vectors are calculated to recognize reliable correspondences. Among these methods, the best known is the scale-invariant feature transform (SIFT) (Lowe, 2004), owing to its robustness to changes in image scale, rotation, illumination, and viewpoint.
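The third stage described above can be sketched as a brute-force nearest-neighbor search over descriptor vectors. This is a minimal illustration, not the matching procedure of any cited method: the function name `match_descriptors`, the Euclidean distance, and the `ratio` threshold (in the spirit of the ratio test of Lowe, 2004) are assumptions made for the example.

```python
import math


def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbor in desc_b.

    A match is kept only if the nearest distance is clearly smaller than the
    second-nearest one (assumed ratio test; `ratio` is a hypothetical default).
    """
    matches = []
    for i, da in enumerate(desc_a):
        # Distances from descriptor i to every descriptor in the other image.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) < 2:
            continue  # the ratio test needs two candidates
        (d1, j1), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((i, j1))
    return matches
```

With toy 3-D descriptors `[(0, 0, 1), (1, 0, 0), (0.5, 0.5, 0.5)]` and `[(1, 0, 0.05), (0, 0.1, 1), (0, 1, 0)]`, the first two descriptors find distinctive partners and the ambiguous third one is rejected, which is the kind of ambiguity that repetitive texture produces at scale.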

For rigid scenes, such a framework can achieve remarkable results: point correspondences can be produced with a high correctness rate. Although some false matches arise from ambiguities caused by poor or repetitive texture, they can be handled by a postprocessing step such as RANSAC (Fischler and Bolles, 1981) or graph matching (Conte et al., 2004).

The RANSAC algorithm is a robust technique for fitting a model to data containing noise and outliers, and it has been widely used in computer vision and machine learning. The basic idea of RANSAC is simple but effective. First, randomly select a subset of correspondences and compute a candidate fundamental or homography matrix, since perspective images satisfy the epipolar or homography constraint. Then, count the number of correspondences that support this transformation model. If the number is sufficiently large, the transformation matrix can be considered a good solution. The matches that support it are accepted as inliers; the others are discarded as outliers.
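The hypothesize-and-verify loop described above can be sketched in a few lines. To keep the example self-contained, it assumes a 2-D similarity transform w = a·z + b on complex-valued point coordinates as a stand-in for the fundamental or homography matrix; the function names, the threshold, the iteration count, and the synthetic data are all hypothetical.

```python
import cmath
import math
import random


def fit_similarity(m1, m2):
    """Fit w = a*z + b from two matches; each match is (z, w) as complex numbers."""
    (z1, w1), (z2, w2) = m1, m2
    if z1 == z2:
        return None  # degenerate sample: both source points coincide
    a = (w1 - w2) / (z1 - z2)  # rotation and scale
    b = w1 - a * z1            # translation
    return a, b


def ransac_similarity(matches, threshold=1.0, iterations=200, seed=0):
    """Return (model, inlier_indices) for the hypothesis with the most support."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        # Step 1: minimal random sample -> candidate model.
        model = fit_similarity(*rng.sample(matches, 2))
        if model is None:
            continue
        a, b = model
        # Step 2: count the matches that support the candidate.
        inliers = [i for i, (z, w) in enumerate(matches)
                   if abs(a * z + b - w) < threshold]
        # Step 3: keep the candidate with the largest support.
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


def demo_matches():
    """Hypothetical data: 20 exact inliers followed by 10 gross outliers."""
    a, b = cmath.rect(1.5, math.pi / 6), complex(20, -5)
    matches = []
    for i in range(20):
        z = complex(10 * (i % 5), 10 * (i // 5))
        matches.append((z, a * z + b))
    for k in range(10):
        z = complex(7 * k + 3, 5 * k + 1)
        matches.append((z, a * z + b + complex(50 + 7 * k, -40 - 3 * k)))
    return matches
```

Calling `ransac_similarity(demo_matches())` should return the 20 inlier indices and a model close to the one used to generate them. A real pipeline would swap in a homography or fundamental-matrix estimator and a larger minimal sample, which is exactly where the prerequisites discussed next start to matter.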

RANSAC, however, works well only if two prerequisites are satisfied. The first is a sufficiently high inlier rate. Liu and Marlet (2012) report that RANSAC-like methods (Chum and Matas, 2005b; Chum et al., 2003; Torr and Zisserman, 2000) may fail and become very time-consuming when the inlier rate is less than 50 percent. If the inlier rate is very small, the number of required iterations becomes huge. The second is the transformation model: a putative model must be given in advance, and the inlier set should satisfy this model well.
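The growth in iterations can be made concrete with the standard RANSAC bound, which is not stated in this paper: to draw at least one all-inlier sample with probability p, given inlier rate w and minimal sample size s, about N = log(1 − p) / log(1 − w^s) iterations are needed. The sketch below evaluates that bound; the function name and the default confidence of 0.99 are assumptions.

```python
import math


def ransac_iterations(inlier_rate, sample_size, confidence=0.99):
    """Iterations needed to draw at least one all-inlier sample with the
    requested confidence (standard bound, assumed here for illustration)."""
    if not 0.0 < inlier_rate <= 1.0:
        raise ValueError("inlier_rate must be in (0, 1]")
    p_good = inlier_rate ** sample_size  # chance that one sample is all inliers
    if p_good >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good))


# 4-point samples (homography) at a 50 percent inlier rate: 72 iterations.
# 8-point samples (fundamental matrix) at the same rate: 1177 iterations.
```

Because w is raised to the power s, lowering the inlier rate or enlarging the minimal sample inflates the count quickly, which is consistent with the behavior reported for inlier rates below 50 percent.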

Graph matching (Cho and Lee, 2012; Conte et al., 2004; Duchenne et al., 2011) is another powerful and general tool for feature matching. It represents scene images as graphs built from feature points, and correct correspondences are extracted by solving a global optimization problem that minimizes the structural distortion between the graphs (Cho and Lee, 2012). Unlike the RANSAC algorithm, which uses only rigid geometric constraints, graph matching can also be applied to non-rigid scenes. However, current methods still assume that the inlier rate is relatively high, and the large number of outliers arising from strong distortion may make them impractical. For instance, Duchenne et al. (2011) show that if the outlier rate is more than 70 percent, the performance of graph matching drops severely. Another problem is that graph matching is NP-hard, so its computational costs in time and memory limit the permissible sizes of the input graphs.
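As a toy illustration of both the objective and the cost, the sketch below matches two small point sets by exhaustively searching all assignments and scoring each one by how much it changes pairwise distances. The distortion measure and all function names are assumptions chosen for clarity; they are not the formulations used in the cited graph-matching works.

```python
import itertools
import math


def pairwise_distances(points):
    """Full matrix of Euclidean distances between 2-D points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def distortion(dist_a, dist_b, assignment):
    """Sum of edge-length differences when node i of A maps to assignment[i] of B."""
    n = len(assignment)
    return sum(abs(dist_a[i][j] - dist_b[assignment[i]][assignment[j]])
               for i in range(n) for j in range(i + 1, n))


def brute_force_graph_match(points_a, points_b):
    """Try every assignment of A's nodes to B's nodes (needs len(B) >= len(A))."""
    dist_a = pairwise_distances(points_a)
    dist_b = pairwise_distances(points_b)
    candidates = itertools.permutations(range(len(points_b)), len(points_a))
    best = min(candidates, key=lambda perm: distortion(dist_a, dist_b, perm))
    return best, distortion(dist_a, dist_b, best)
```

For four nodes the search covers only 24 assignments, but the count grows factorially with the number of nodes, so exhaustive search is impractical beyond very small graphs and practical methods rely on approximations.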

In this paper, we also focus on feature matching for non-rigid scenes, e.g., fisheye images. A fisheye lens has a large field of view (FOV), which is needed for many vision tasks in photogrammetry and computer vision. For instance, five fisheye images are sufficient for 360° panoramic stitching, whereas nine perspective images are needed; self-driving vehicles (Geiger et al., 2012) need a large FOV to sense the environment accurately and plan their routes. However, fisheye images have an inherent drawback: severe distortion. Because of this, SIFT (Lowe, 2004) usually does not work well, and the outlier rate may be very high (higher than 50 percent). In addition, a fisheye image no longer satisfies the homography constraint and has its own epipolar geometry, which can be applied only if calibration information is provided. These issues make feature matching challenging, because the prerequisites of RANSAC and graph matching are not well satisfied.

To distinguish inliers from outliers accurately for both rigid and non-rigid images, a region-matching method is proposed. We first define a 4FP-Structure, formed by four neighborhood feature points, to represent the local region. Using local regions instead of feature points for matching has two advantages: (a) The 4FP-Structure is a 4-node graph that can resist the distortion in a small region, and it contains four feature points that can restrain each other to

School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, China (huqw@whu.edu.cn).

Photogrammetric Engineering & Remote Sensing, Vol. 83, No. 12, December 2017, pp. 813–826. 0099-1112/17/813–826. doi: 10.14358/PERS.83.12.813
