PE&RS July 2016 Public

The incorporation of elevation data in building detection requires coregistration between the VHR optical images and the corresponding DSMs. However, direct integration of these two datasets often introduces a misregistration problem (Van de Voorde et al., 2007). For highly elevated buildings in off-nadir images, the misregistration is a serious and challenging problem due to the severe relief displacement of such buildings. This particular problem is thoroughly discussed in Suliman and Zhang (2015).

To circumvent this problem, one may use disparity information as the alternative stereo-based third dimension. Disparity maps are constructed by executing an image matching technique to measure the distance, in pixels, between each pixel in one image and its conjugate in the other stereo mate along the epipolar direction. The advantage of using disparity maps is that they, by definition, have exactly the same reference frame as one of the stereo images. Replacing elevation models with disparity maps avoids several computationally expensive steps (e.g., aerial triangulation and accurate coregistration) required for implementing building detection methods using stereo-based elevations. The following section is a brief review of building detection methods based on disparity information proposed in the existing literature.
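The matching step described above can be illustrated on a single epipolar row. The following is a minimal sketch, not the matcher used in any of the cited works: it assumes a rectified pair, non-negative integer disparities, and a sum-of-absolute-differences cost. The function name `scanline_disparity` and its parameters are hypothetical.

```python
def scanline_disparity(left, right, max_disp, half_win=1):
    """Estimate disparity along one epipolar row by SAD block matching.

    left, right: lists of intensities for the same row of a rectified pair.
    Returns a list d such that left[x] best matches right[x - d[x]];
    entries are None where the matching window does not fit in the row.
    """
    n = len(left)
    disparities = [None] * n
    for x in range(half_win, n - half_win):
        best_cost, best_d = None, None
        for d in range(max_disp + 1):
            if x - d - half_win < 0:
                break  # candidate window would leave the image
            cost = sum(
                abs(left[x + k] - right[x - d + k])
                for k in range(-half_win, half_win + 1)
            )
            # Strict "<" keeps the smallest d on ties, so textureless
            # stretches are ambiguous and default to the first candidate.
            if best_cost is None or cost < best_cost:
                best_cost, best_d = cost, d
        disparities[x] = best_d
    return disparities


# Toy row: the textured feature (9, 5, 7) sits 2 pixels further left
# in the right image than in the left image.
left_row = [0, 0, 0, 0, 0, 9, 5, 7, 0, 0, 0, 0]
right_row = [0, 0, 0, 9, 5, 7, 0, 0, 0, 0, 0, 0]
disp = scanline_disparity(left_row, right_row, max_disp=3)
```

Because the result is indexed by the pixel positions of the left row, the disparity values share the reference frame of that image, which is the property the text relies on.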

Related Works and Challenges

Disparity-based building detection methods rarely appear in research publications. Oriot (2003) proposed a technique based on segmenting the disparity map into building and background classes. Beumier (2008) introduced a simple and fast technique to detect buildings in directly acquired epipolar Ikonos stereo pairs. The technique uses the disparity of building edges for the detection. Unlike these two techniques, which assume flat terrain, an approach proposed by Krauß and Reinartz (2010) for urban object modeling was based on fusing disparity maps with VHR optical stereo data. The approach includes an appealing technique to extract terrain disparity. More recently, Krauß et al. (2012) made use of the classification power of eight-band VHR stereo images from the WorldView-2 sensor to complete and enhance the generated disparity map. Although these reviewed methods are promising, two major challenges are identified: the occlusion effect and the terrain variation effect. These two difficulties become extremely challenging when dealing with off-nadir VHR images captured over dense and non-flat urban areas.

Occlusion is the result of off-nadir acquisition angles, which create the leaning appearance of buildings in VHR images. It is impossible to find point matches in the areas hidden by the leaning buildings and, therefore, to measure the disparity there. The consequence is disparity maps with many no-data regions that are normally filled by interpolating the surrounding data. However, interpolation in urban areas results in oversmoothed building boundaries and missing narrow roads. This misleading information degrades the quality of the generated disparity map and affects the subsequent processes. To minimize the occlusion effect, the reviewed literature recommends that the convergence angle of the stereo images be small to guarantee high similarity of the overlapped images. Another possible solution is to use multiple stereo pairs to eliminate the occlusion effect. Unfortunately, these two options are not always available.
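The effect of gap filling on building boundaries and narrow roads can be reproduced on a one-dimensional disparity profile. This is a minimal sketch under simple assumptions: no-data cells are marked with `None`, gaps are filled by linear interpolation between the nearest valid neighbours, and the disparity values are made up. The function name `fill_gaps_linear` is hypothetical, and the cited works may use other interpolators.

```python
def fill_gaps_linear(profile):
    """Fill None entries of a 1-D disparity profile by linear interpolation
    between the nearest valid values on each side of the gap.
    Gaps that touch either end of the profile are left as None."""
    filled = list(profile)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is None:
            start = i - 1              # last valid index before the gap
            j = i
            while j < n and filled[j] is None:
                j += 1                 # j ends at first valid index after gap
            if start >= 0 and j < n:
                step = (filled[j] - filled[start]) / (j - start)
                for k in range(i, j):
                    filled[k] = filled[start] + step * (k - start)
            i = j
        else:
            i += 1
    return filled


# Roof (disparity 10) next to ground (0) with an occluded strip between:
# the vertical wall becomes a smooth ramp.
edge = fill_gaps_linear([10, 10, None, None, None, 0, 0])

# Narrow street fully occluded between two roofs: the street disappears.
street = fill_gaps_linear([10, 10, None, None, 10, 10])
```

In the first profile the sharp roof edge is replaced by the values 7.5, 5.0, and 2.5; in the second, the occluded street is filled with roof-level disparity, so no ground is left between the two buildings.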

The other challenge that faces disparity-based building detection methods is the need to remove bare-earth effects. Terrain elevations may cause buildings with the same aboveground height to have different disparity values. Thus, a terrain disparity map (TDM), representing the bare earth, needs to be extracted and subtracted from its corresponding surface disparity map (SDM) that describes the visible surface. The result is a normalized surface disparity map (nSDM) that represents only the objects above the extracted TDM (i.e., nSDM = SDM – TDM).
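The normalization nSDM = SDM – TDM, and the reason it is needed, can be checked with a small numeric example. This is a sketch with made-up disparity values; the function name `normalize_disparity` and the use of `None` for no-data cells are assumptions made for illustration.

```python
def normalize_disparity(sdm, tdm):
    """Return nSDM = SDM - TDM cell by cell.
    A None (no-data) cell in either input yields None in the output."""
    return [
        [None if s is None or t is None else s - t
         for s, t in zip(s_row, t_row)]
        for s_row, t_row in zip(sdm, tdm)
    ]


# Two buildings with the same aboveground disparity (6 pixels) standing on
# terrain of different disparity (2 in the top row, 5 in the bottom row).
sdm = [[2, 8, 2],
       [5, 11, None]]
tdm = [[2, 2, 2],
       [5, 5, 5]]
nsdm = normalize_disparity(sdm, tdm)
```

In the SDM the two buildings have different values (8 and 11), while in the nSDM both equal 6 and the ground cells equal 0, so a single threshold on the nSDM would separate buildings from terrain.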

To identify bare earth, an algorithm that uses different ranks of percentile filters to approximate the terrain variation was introduced by Weidner and Förstner (1995). This empirical algorithm assumes that the ground-level areas dominate the scene. Arefi et al. (2007) developed the geodesic algorithm that iteratively executes a dilation process until a predefined marker surface is met. However, this algorithm requires repetitive computation and the results vary depending on the selected surface. Krauß and Reinartz (2010) proposed a steep-edge algorithm that uses the difference between the results of median filters of two sizes to detect the areas at the bottom of steep walls. It assumes that the detected areas are on the ground and not occluded. However, in dense urban areas, different levels of building roofs may be adjacent to each other in a way that satisfies the condition of the steep-edge algorithm, thus producing false surface information.
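The assumption behind rank-filter terrain extraction can be made concrete with a one-dimensional moving-window percentile filter. This is only an illustrative sketch, not the algorithm of Weidner and Förstner (1995): the window size, the percentile, the nearest-rank rule, and the function name `percentile_filter_1d` are all placeholders chosen here.

```python
def percentile_filter_1d(values, half_win, percentile):
    """Moving-window rank filter over a 1-D profile.
    For each position, return the requested percentile (0-100, nearest
    rank) of the valid values inside the window; None if none are valid."""
    out = []
    for i in range(len(values)):
        window = sorted(
            v for v in values[max(0, i - half_win): i + half_win + 1]
            if v is not None
        )
        if not window:
            out.append(None)
            continue
        rank = round(percentile / 100 * (len(window) - 1))
        out.append(window[rank])
    return out


# Ground (1) dominates every window: the building (9) is filtered out.
open_scene = percentile_filter_1d([1, 1, 1, 9, 9, 1, 1, 1, 1],
                                  half_win=3, percentile=20)

# Roofs (9) dominate and the window is small: roof values leak into
# the terrain estimate at both ends of the profile.
dense_scene = percentile_filter_1d([9, 9, 9, 1, 9, 9, 9],
                                   half_win=1, percentile=20)
```

The first call returns a flat terrain estimate of 1 everywhere. The second returns 9 wherever the window contains only roof cells, which illustrates why the method depends on ground-level areas dominating the scene.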

Being based on specific assumptions, these algorithms have limitations in extracting terrain from incomplete urban disparity maps, especially when the occlusion in off-nadir images is serious. In contrast, the local-minima technique developed by Zhang et al. (2004) is a more general technique without prior assumptions. It is based on interpolating ground-level points, which are detected by a moving window of a constant size that looks for local minimum values.
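The local-minima idea can be sketched on a single disparity row as two steps: pick the cells that are the minimum of their moving window, then interpolate between them. This is a simplified reading of the description above, not the implementation of Zhang et al. (2004); the function name `local_minima_terrain`, the linear interpolation, and the constant extrapolation at the ends are assumptions.

```python
def local_minima_terrain(row, half_win):
    """Approximate the terrain disparity along one row.

    Step 1: a valid cell is a ground seed if it equals the minimum of the
            valid cells in the window centred on it.
    Step 2: terrain is linearly interpolated between consecutive seeds and
            held constant before the first and after the last seed.
    """
    n = len(row)
    seeds = []
    for i, v in enumerate(row):
        if v is None:
            continue
        window = [w for w in row[max(0, i - half_win): i + half_win + 1]
                  if w is not None]
        if v == min(window):
            seeds.append(i)
    if not seeds:
        return [None] * n

    terrain = [None] * n
    for i in range(n):
        left = max((s for s in seeds if s <= i), default=None)
        right = min((s for s in seeds if s >= i), default=None)
        if left is None:
            terrain[i] = row[right]
        elif right is None:
            terrain[i] = row[left]
        elif left == right:
            terrain[i] = row[i]
        else:
            t = (i - left) / (right - left)
            terrain[i] = row[left] + t * (row[right] - row[left])
    return terrain


# Terrain disparity rising from 0 to 4 with two buildings on it.
sdm_row = [0, 0, 8, 8, 8, 2, 2, 10, 10, 10, 4, 4]
tdm_row = local_minima_terrain(sdm_row, half_win=3)
nsdm_row = [s - t for s, t in zip(sdm_row, tdm_row)]
```

Note that the window must be wider than the widest building; otherwise a window can fall entirely on a roof and roof cells are taken as ground seeds. When street-level cells are occluded or wrongly filled, the seeds themselves are unreliable, which is the difficulty summarized in the next paragraph.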

In summary, off-nadir VHR stereo images are burdened by inherent occlusion at building edges when acquired over dense urban areas. If the convergence angle of the stereo images is not small enough, the interpolation technique implemented to fill the gaps will result in misleading terrain disparities over narrow streets and between adjacent buildings, which degrade the quality of the subsequent processes.

Research Objective and Hypothesis

The ultimate aim of this research is to detect building roofs in off-nadir VHR satellite images acquired over a dense and reasonably non-flat urban area. The adopted approach is based on using normalized disparity data derived from a stereo pair. Thus, crucial to this study is developing a technique for generating normalized disparity maps. For that purpose, we propose that if the original stereo images are (a) rectified to eliminate the y-direction disparity (y-parallax) of all corresponding pixels (thereby creating epipolar images), and (b) coregistered with the corresponding ground-level objects (e.g., roads) to eliminate the x-disparity of the terrain, then the remaining measurable x-disparity should represent only the off-terrain objects (i.e., nSDM). Consequently, both the interpolation step (to fill data gaps caused by occlusion) and the terrain extraction process (to normalize the SDM) will be bypassed.
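The hypothesis can be restated with the quantities already defined. The block below is only a notational sketch: the symbols $d_x$ and $d_y$ for the x- and y-disparity and the primed maps (measured after coregistration) are assumptions introduced here, and the second line simply rearranges nSDM = SDM – TDM.

```latex
\begin{align}
  \text{(a) epipolar rectification:}\quad
    & d_y(x, y) = 0 \\
  \text{before coregistration:}\quad
    & \mathrm{SDM}(x, y) = \mathrm{TDM}(x, y) + \mathrm{nSDM}(x, y) \\
  \text{(b) terrain coregistration:}\quad
    & \mathrm{TDM}'(x, y) \approx 0 \\
  \text{hence:}\quad
    & d_x'(x, y) = \mathrm{SDM}'(x, y) \approx \mathrm{nSDM}(x, y)
\end{align}
```

Read this way, the terrain term is removed from the images before matching rather than from the disparity map afterwards, which is why neither a separate TDM nor gap interpolation for terrain extraction would be required.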

The novelty of this technique is in the concept of coregistering terrain-level objects to directly measure aboveground disparity information without the need for applying either interpolation or data normalization. An earlier version of this technique is presented in Suliman et al. (2016). The developed technique is then incorporated in the actual task of stereo-based building detection using disparity data. The details of the proposed technique and its implementation are described in the next section of this paper.

Next, the validation procedure is described. It evaluates the performance of the proposed technique in building detection relative to similar results based on published methods for epipolar rectification, gap surface interpolation, and stereo data normalization. This is followed by all of the experimental results achieved in this study. Finally, the results are compared and discussed, followed by the conclusions.

Methodology

The methodology for this research aims at tackling the two tasks stated before: generating a normalized disparity map and incorporating it in the detection of buildings. These two tasks are followed by a validation procedure, as detailed in


July 2016

PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING
