PE&RS July 2016 Public - page 536

The incorporation of elevation data in building detection requires a
co-registration between the VHR optical images and the corresponding
DSMs. However, direct integration of these two datasets often
introduces a misregistration problem (Van de Voorde et al., 2007). For
highly elevated buildings in off-nadir images, the misregistration is a
serious and challenging problem due to the severe relief displacement
of such buildings. This particular problem is thoroughly discussed in
Suliman and Zhang (2015).
To circumvent this problem, one may use disparity information as an
alternative stereo-based third dimension. Disparity maps are
constructed by executing an image-matching technique to measure the
distance, in pixels, between each pixel in one image and its conjugate
in the other image of the stereo pair along the epipolar direction. The
advantage of using disparity maps is that they, by definition, have
exactly the same reference frame as one of the stereo images. Replacing
elevation models with disparity maps avoids several computationally
expensive steps (e.g., aerial triangulation and accurate
coregistration) required for implementing building detection methods
using stereo-based elevations. The following section is a brief review
of building detection methods based on disparity information proposed
in the existing literature.
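As a rough illustration of how a disparity value is measured along the epipolar direction, the sketch below matches one image row against its counterpart with a sum-of-absolute-differences (SAD) window search. It is a minimal toy example, not the matching technique used in this paper: the function name `row_disparity`, the parameters `half_window` and `max_disp`, and the sign convention (disparity = left column minus right column) are all assumptions made for illustration.

```python
def row_disparity(left_row, right_row, half_window=1, max_disp=4):
    """Estimate disparity along one epipolar row by SAD window matching.

    Assumed convention (illustrative): a pixel at column x in the left row
    has its conjugate at column x - d in the right row, with 0 <= d <= max_disp.
    Returns one integer disparity per pixel, or None where the matching
    window does not fit inside the row.
    """
    n = len(left_row)
    disparities = [None] * n
    for x in range(half_window, n - half_window):
        best_cost, best_d = None, None
        for d in range(max_disp + 1):
            xr = x - d  # candidate conjugate column in the right row
            if xr - half_window < 0:
                break  # window would leave the right row
            cost = sum(
                abs(left_row[x + k] - right_row[xr + k])
                for k in range(-half_window, half_window + 1)
            )
            # Keep the first candidate with the lowest cost.
            if best_cost is None or cost < best_cost:
                best_cost, best_d = cost, d
        disparities[x] = best_d
    return disparities


# Toy rows: the bright feature sits 2 pixels further left in the right row.
left = [0, 0, 0, 0, 5, 9, 5, 0, 0, 0]
right = [0, 0, 5, 9, 5, 0, 0, 0, 0, 0]
disparity_row = row_disparity(left, right)
```

The resulting values are indexed by the columns of the left row, which mirrors the point made above: the disparity map shares the reference frame of one of the stereo images. In textureless stretches the minimum cost is ambiguous, so only the columns around the feature carry a meaningful estimate here.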
Related Works and Challenges
Disparity-based building detection methods rarely appear in research
publications. Oriot (2003) proposed a technique based on segmenting the
disparity map into building and background classes. Beumier (2008)
introduced a simple and fast technique to detect buildings in directly
acquired epipolar Ikonos stereo pairs. The technique uses the disparity
of building edges for the detection. Unlike these two techniques, which
assume flat terrain, an approach proposed by Krauß and Reinartz (2010)
for urban object modeling was based on fusing disparity maps with VHR
optical stereo data. The approach includes an appealing technique to
extract terrain disparity. More recently, Krauß et al. (2012) made use
of the classification power of eight-band VHR stereo images from the
WorldView-2 sensor to complete and enhance the generated disparity map.
Although these reviewed methods are promising, two major challenges are
identified: the occlusion effect and the terrain variation effect.
These two difficulties become especially severe when dealing with
off-nadir VHR images captured over dense and non-flat urban areas.
Occlusion is the result of off-nadir acquisition angles, which create
the leaning appearance of buildings in VHR images and hide the areas
behind them. It is impossible to find point matches in these hidden
areas and, therefore, to measure the disparity. The consequence is
disparity maps with many no-data regions that are normally filled by
interpolating the surrounding data. However, interpolation in urban
areas results in over-smoothed building boundaries and missing narrow
roads. This misleading information degrades the quality of the
generated disparity map and affects the subsequent processes. To
minimize the occlusion effect, the reviewed literature recommends that
the convergence angle of the stereo images be small to guarantee high
similarity of the overlapping images. Another possible solution is to
use multiple stereo pairs to eliminate the occlusion effect.
Unfortunately, these two options are not always available.
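The effect of gap filling on narrow roads can be seen in a small one-dimensional sketch. It assumes, for illustration only, that no-data pixels are marked with `None` and filled by linear interpolation between the nearest valid neighbors in the same row; the function name `fill_gaps_linear` and the sample values are hypothetical, and real gap-filling methods may differ.

```python
def fill_gaps_linear(row):
    """Fill runs of None by linear interpolation between valid neighbors.

    Gaps that touch either end of the row have only one valid neighbor
    and are left as None.
    """
    filled = list(row)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        j = i
        while j < n and filled[j] is None:
            j += 1  # j ends at the first valid pixel after the gap, or at n
        if i > 0 and j < n:
            left_value, right_value = filled[i - 1], filled[j]
            span = j - (i - 1)
            for k in range(i, j):
                t = (k - (i - 1)) / span
                filled[k] = left_value + t * (right_value - left_value)
        i = j
    return filled


# True profile: two roofs (disparity 12) separated by a narrow street (0).
true_row = [12, 12, 0, 0, 12, 12]
# The street is fully occluded, so the matcher returns no data there.
observed_row = [12, 12, None, None, 12, 12]
filled_row = fill_gaps_linear(observed_row)
```

In this toy case every filled pixel takes the roof value, so the street disappears from the disparity profile. This is the kind of misleading information described above.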
The other challenge that faces disparity-based building detection
methods is the need to remove bare-earth effects. Terrain elevations
may cause buildings with the same aboveground height to have different
disparity values. Thus, a terrain disparity map (TDM), representing the
bare earth, needs to be extracted and subtracted from its corresponding
surface disparity map (SDM) that describes the visible surface. The
result is a normalized surface disparity map (nSDM) that represents
only the objects above the extracted TDM (i.e., nSDM = SDM - TDM).
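The normalization nSDM = SDM - TDM can be illustrated with a short per-pixel sketch. It assumes that both maps are stored as equally sized nested lists and that no-data pixels are marked with `None`; these storage choices, the function name `normalize_disparity`, and the sample values are assumptions made for illustration.

```python
def normalize_disparity(sdm, tdm):
    """Compute nSDM = SDM - TDM per pixel, keeping no-data pixels as None."""
    return [
        [None if s is None or t is None else s - t for s, t in zip(s_row, t_row)]
        for s_row, t_row in zip(sdm, tdm)
    ]


# Two buildings with the same aboveground disparity (10) on sloping terrain:
# their surface disparities differ (13 and 15) until the terrain is removed.
sdm = [[3, 13, 4],
       [5, 15, None]]
tdm = [[3, 3, 4],
       [5, 5, 5]]
nsdm = normalize_disparity(sdm, tdm)
```

After subtraction both roofs carry the same value and the ground pixels drop to zero, which is the bare-earth removal described above.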
To identify bare earth, Weidner and Förstner (1995) introduced an
algorithm that uses different ranks of percentile filters to
approximate the terrain variation. This empirical algorithm assumes
that ground-level areas dominate the scene. Arefi et al. (2007)
developed the geodesic algorithm, which iteratively executes a dilation
process until a predefined marker surface is met. However, this
algorithm requires repetitive computation, and the results vary
depending on the selected surface. Krauß and Reinartz (2010) proposed a
steep-edge algorithm that uses the difference between the results of
two median filters of different sizes to detect the areas at the bottom
of steep walls. It assumes that the detected areas are on the ground
and not occluded. However, in dense urban areas, different levels of
building roofs may be adjacent to each other in a way that satisfies
the condition of the steep-edge algorithm, thus producing false surface
information.
Being based on specific assumptions, these algorithms have limitations
in extracting terrain from incomplete urban disparity maps, especially
when the occlusion in off-nadir images is serious. In contrast, the
local-minima technique developed by Zhang et al. (2004) is a more
general technique without prior assumptions. It is based on
interpolating ground-level points, which are detected by a moving
window of a constant size that looks for local minimum values.
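The idea behind a local-minima approach can be sketched in one dimension as follows. This is a simplified illustration and not the published algorithm of Zhang et al. (2004): it uses consecutive, non-overlapping windows instead of a true moving window, linear interpolation between the detected minima, and constant extrapolation at the row ends. The function names `local_minima_seeds` and `interpolate_terrain` and the sample row are hypothetical.

```python
def local_minima_seeds(row, window):
    """Return (column, value) of the lowest valid pixel in each window."""
    seeds = []
    for start in range(0, len(row), window):
        candidates = [
            (value, col)
            for col, value in enumerate(row[start:start + window], start)
            if value is not None
        ]
        if candidates:
            value, col = min(candidates)  # ties resolve to the leftmost column
            seeds.append((col, value))
    return seeds


def interpolate_terrain(seeds, n):
    """Piecewise-linear terrain profile through the seeds for n columns."""
    terrain = [None] * n
    if not seeds:
        return terrain
    for x in range(n):
        if x <= seeds[0][0]:
            terrain[x] = float(seeds[0][1])
        elif x >= seeds[-1][0]:
            terrain[x] = float(seeds[-1][1])
        else:
            for (x0, v0), (x1, v1) in zip(seeds, seeds[1:]):
                if x0 <= x <= x1:
                    terrain[x] = v0 + (v1 - v0) * (x - x0) / (x1 - x0)
                    break
    return terrain


# Surface disparity row: roofs (12 to 14) standing on gently rising ground (2 to 4).
sdm_row = [2, 12, 12, 2, 3, 13, 13, 3, 4, 14, 14, 4]
seeds = local_minima_seeds(sdm_row, window=4)
tdm_row = interpolate_terrain(seeds, len(sdm_row))
```

In this sketch the window must be wider than the widest roof, otherwise a window that falls entirely on a roof would contribute a roof value as a ground seed. The same reasoning suggests why gaps filled with roof-level values can mislead any minimum-based terrain estimate.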
In summary, off-nadir VHR stereo images are burdened by inherent
occlusion at building edges when acquired over dense urban areas. If
the convergence angle of the stereo images is not small enough, the
interpolation technique used to fill the gaps will produce misleading
terrain disparities over narrow streets and between adjacent buildings,
which degrade the quality of the subsequent processes.
Research Objective and Hypothesis
The ultimate aim of this research is to detect building roofs in
off-nadir VHR satellite images acquired over a dense and reasonably
non-flat urban area. The adopted approach is based on using normalized
disparity data derived from a stereo pair. Thus, crucial to this study
is developing a technique for generating normalized disparity maps. For
that purpose, we propose that if the original stereo images are (a)
rectified to eliminate the y-direction disparity (y-parallax) of all
corresponding pixels (thereby creating epipolar images), and (b)
coregistered with the corresponding ground-level objects (e.g., roads)
to eliminate the x-disparity of the terrain, then the remaining
measurable x-disparity should represent only the off-terrain objects
(i.e., nSDM). Consequently, both the interpolation step (to fill data
gaps caused by occlusion) and the terrain extraction process (to
normalize the SDM) will be bypassed.
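The reasoning behind step (b) can be illustrated with a one-dimensional toy model. The text on this page does not specify the transformation used for coregistration, so the sketch below assumes a simple 1D affine mapping x_left = p + q * x_right, fitted by least squares to ground-level tie points (e.g., points on roads). The function names `fit_affine_1d` and `residual_disparity`, the parameters `p` and `q`, and all coordinates are hypothetical.

```python
def fit_affine_1d(x_right, x_left):
    """Least-squares fit of x_left = p + q * x_right from ground tie points."""
    n = len(x_right)
    mean_r = sum(x_right) / n
    mean_l = sum(x_left) / n
    s_rr = sum((r - mean_r) ** 2 for r in x_right)
    s_rl = sum((r - mean_r) * (l - mean_l) for r, l in zip(x_right, x_left))
    q = s_rl / s_rr
    p = mean_l - q * mean_r
    return p, q


def residual_disparity(x_left, x_right, p, q):
    """x-disparity that remains after mapping the right column into the left frame."""
    return x_left - (p + q * x_right)


# Ground tie points on sloping terrain: raw disparities are 2, 3, and 4 pixels.
ground_left = [0.0, 10.0, 20.0]
ground_right = [-2.0, 7.0, 16.0]
p, q = fit_affine_1d(ground_right, ground_left)

# After coregistration the ground points show (numerically) zero disparity.
ground_residuals = [
    residual_disparity(l, r, p, q) for l, r in zip(ground_left, ground_right)
]

# A roof point keeps a clearly nonzero disparity, with no TDM subtraction.
roof_residual = residual_disparity(5.0, -5.5, p, q)
```

Under these assumptions the terrain contribution is absorbed by the fitted mapping before any matching takes place, so the disparity that is subsequently measured belongs to off-terrain objects only, in line with the hypothesis stated above.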
The novelty of this technique is in the concept of coregistering
terrain-level objects to directly measure aboveground disparity
information without the need for applying either interpolation or data
normalization. An earlier version of this technique is presented in
Suliman et al. (2016). The developed technique is then incorporated in
the actual task of stereo-based building detection using disparity
data. The details of the proposed technique and its implementation are
described in the next section of this paper.
Next, the validation procedure is described. It evaluates the
performance of the proposed technique in building detection relative to
similar results based on published methods for epipolar rectification,
gap surface interpolation, and stereo data normalization. The
experimental results achieved in this study are then presented.
Finally, the results are compared and discussed, followed by the
conclusions.
Methodology
The methodology for this research aims at tackling the two
tasks stated before: generating a normalized disparity map
and incorporating it in the detection of buildings. These two
tasks are followed by a validation procedure, as detailed in