PE&RS September 2014

PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING


A Hierarchical Building Detection Method for Very High Resolution Remotely Sensed Images Combined with DSM Using Graph Cut Optimization

Rongjun Qin and Wei Fang

Abstract

Detecting buildings in remotely sensed data plays an important role in urban analysis and geographical information systems. This study proposes a hierarchical approach for extracting buildings from very high resolution (9 cm GSD (Ground Sampling Distance)), multi-spectral aerial images and matched DSMs (Digital Surface Models). The proposed method has three steps: first, shadows are detected with a morphological index and corrected for NDVI (Normalized Difference Vegetation Index) computation; second, the NDVI is incorporated using a top-hat reconstruction of the DSM to obtain the initial building mask; finally, a graph cut optimization based on modified superpixel segmentation is carried out to consolidate building segments with high probability and thus eliminate segments that have a low probability of being buildings. Experiments were performed over the whole Vaihingen dataset, covering 3.4 km² with around 3000 buildings. The proposed algorithm effectively extracted 94 percent of the buildings with 87 percent correctness. This demonstrates that the proposed method achieves satisfactory results over a large dataset and has potential for many practical applications.

Introduction

The identification and localization of buildings in an urban area is very important for planning, building analysis, automatic 3D reconstruction of building models, and change detection (Qin and Gruen, 2014). The development of very high resolution (VHR) remote sensing images (Qin et al., 2013) creates a possible avenue to sense individual buildings in an urban scenario, e.g., Ikonos with 1-meter resolution, or Worldview with 0.5-meter resolution. Sensors with even higher resolution are in the planning stages (e.g., Geoeye-2 and Worldview-3 with 0.3-meter resolution). However, this increasing level of detail does not necessarily improve the accuracy of building detection (Huang and Zhang, 2011). Indeed, more detailed image content actually increases spectral ambiguities in remotely sensed images, caused by objects such as symbol patterns on the road and large vehicles. Therefore, researchers have devoted considerable effort to using multi-source data and designing better detection strategies to increase the building detection rate.

Multispectral images provide shadow information as primitives for building locations. Furthermore, shadow information is especially effective in single-image based methods (Huang and Zhang, 2012; Ok, 2013; Ok et al., 2013). Meanwhile, NDVI data extracted from a multispectral image can be used as a vegetation indicator to eliminate trees. Vector features such as parallel lines and corner junctions reveal the characteristics of rectangular buildings, and have been investigated and used to develop single-image based methods for building detection (Lin and Nevatia, 1998; Sirmacek and Unsalan, 2011; Sirmacek and Unsalan, 2010; Sirmaçek and Unsalan, 2009).
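
As an illustration of how NDVI can act as a vegetation indicator, the following minimal sketch computes the standard index, (NIR - Red) / (NIR + Red), per pixel and thresholds it. The function names and the threshold of 0.3 are assumptions chosen for illustration only; they are not taken from this paper, and a suitable threshold depends on the sensor and scene.

```python
import numpy as np

def ndvi(nir, red):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red).

    Pixels where NIR + Red == 0 are assigned 0 to avoid division by zero
    (an illustrative convention, not prescribed by the paper).
    """
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    denom = nir + red
    out = np.zeros_like(denom)
    # Divide only where the denominator is non-zero; elsewhere keep 0.
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

def vegetation_mask(nir, red, threshold=0.3):
    """Boolean mask of likely vegetation; `threshold` is a hypothetical value."""
    return ndvi(nir, red) > threshold
```

Pixels flagged by such a mask could then be excluded from building candidates, which is the role the text assigns to NDVI.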

Lidar (Light Detection and Ranging) point clouds provide height information for a ground scene and are used for building detection. By subtracting the DTM (Digital Terrain Model) from the DSM (Digital Surface Model), an nDSM (normalized DSM) can be computed to obtain off-terrain points for building detection (Weidner and Förstner, 1995). In addition, the multi-return characteristics of lidar provide useful information for eliminating vegetation in point-cloud based methods (Ekhtari et al., 2008; Meng et al., 2009), which increases the accuracy of building detection.
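
The nDSM computation described above can be sketched on gridded height rasters as follows. This is a minimal illustration, assuming the DSM and DTM are co-registered arrays in meters with missing values stored as NaN. The function names, the clipping of negative differences to zero, and the 2.5 m height threshold are assumptions for illustration, not values from the cited work.

```python
import numpy as np

def normalized_dsm(dsm, dtm):
    """nDSM = DSM - DTM, i.e. height above terrain for each grid cell.

    Small negative differences (e.g. from interpolation noise) are clipped
    to 0; this clipping is an illustrative assumption. NaN cells stay NaN.
    """
    dsm = np.asarray(dsm, dtype=np.float64)
    dtm = np.asarray(dtm, dtype=np.float64)
    return np.clip(dsm - dtm, 0.0, None)

def off_terrain_mask(ndsm, min_height=2.5):
    """Cells higher than `min_height` meters above terrain.

    `min_height` is a hypothetical threshold; NaN cells are never selected.
    """
    ndsm = np.asarray(ndsm, dtype=np.float64)
    valid = ~np.isnan(ndsm)
    mask = np.zeros(ndsm.shape, dtype=bool)
    mask[valid] = ndsm[valid] > min_height
    return mask
```

An off-terrain mask of this kind contains both buildings and trees, which is why additional cues such as multi-return lidar characteristics or spectral indices are needed to remove vegetation.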

Both multispectral images and lidar point clouds have their advantages and deficiencies. Complex algorithms based on a single image usually make assumptions concerning building distribution and are sometimes only able to detect certain types of buildings. For example, methods based on feature point extraction from a single image are only able to detect isolated buildings with regular patterns, and methods relying on parallel lines are not able to detect dome roofs. Compared to multi-spectral images, lidar point clouds provide accurate height information but less accurate boundaries. Lidar point clouds also contain null values due to occlusion and specular reflection from water surfaces on the roofs. Therefore, integration of both sources is a possible direction for improving building detection accuracy as well as robustness.

There has been a spate of integrated methods proposed in the literature. Rottensteiner et al. (2007) and Rottensteiner et al. (2005) proposed a supervised classification-based build

Rongjun Qin is with the Singapore ETH Center, Future Cities Laboratory, ETH, Zurich. 1 CREATE Way, #06-01 CREATE Tower, Singapore 138602 (rqin@student.ethz.ch).

Wei Fang is with the Singapore ETH Center, Future Cities Laboratory, ETH, Zurich. 1 CREATE Way, #06-01 CREATE Tower, Singapore 138602, and the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University, China. #129, Luoyu Road, Wuchang District, LIESMARS, Wuhan University, Wuhan, P. R. China, 430079.

Photogrammetric Engineering & Remote Sensing

Vol. 80, No. 9, September 2014, pp. 873–883.

0099-1112/14/8009–873


doi: 10.14358/PERS.80.9.873
