04-20 April PE&RS Public

Building Extraction from High-Resolution

Remote Sensing Images Based on GrabCut

with Automatic Selection of Foreground and

Background Samples

Ka Zhang, Hui Chen, Wen Xiao, Yehua Sheng, Dong Su, and Pengbo Wang

Abstract

This article proposes a new building extraction method from

high-resolution remote sensing images, based on GrabCut,

which can automatically select foreground and background

samples under the constraints of building elevation contour

lines. First, the image is rotated according to the direction

of pixel displacement calculated by the rational function

model. Second, the Canny operator, combined with morphology

and the Hough transform, is used to extract the

building’s elevation contour lines. Third, seed points and

interesting points of the building are selected under the

constraint of the contour line and the geodesic distance.

Then foreground and background samples are obtained

according to these points. Fourth, GrabCut and geomet-

ric features are used to carry out image segmentation and

extract buildings. Finally, WorldView satellite images are

used to verify the proposed method. Experimental results

show that the average accuracy can reach 86.34%, which

is 15.12% higher than other building extraction methods.
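
The line-detection part of the second step can be illustrated with a minimal sketch of Hough voting. It is not the authors' implementation: it assumes the edge pixels have already been produced by an edge detector such as Canny, and the function name `hough_lines`, the one-degree angular resolution, and the vote threshold are hypothetical choices made for illustration.

```python
import math


def hough_lines(edge_points, n_theta=180, threshold=5):
    """Return (rho, theta_index, votes) for lines supported by the edge points.

    edge_points: iterable of (x, y) pixel coordinates, assumed to come from
    an edge detector (the detector itself is not part of this sketch).
    n_theta and threshold are illustrative defaults, not values from the article.
    """
    accumulator = {}
    for x, y in edge_points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            # Normal form of a line: rho = x*cos(theta) + y*sin(theta)
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            accumulator[(rho, t)] = accumulator.get((rho, t), 0) + 1
    lines = [(rho, t, votes) for (rho, t), votes in accumulator.items()
             if votes >= threshold]
    # Strongest candidates first
    lines.sort(key=lambda line: line[2], reverse=True)
    return lines


# Twenty edge pixels lying on the vertical line x = 10
vertical_edge = [(10, y) for y in range(20)]
candidates = hough_lines(vertical_edge)
```

For a short segment, neighboring angle bins may collect as many votes as the true line, so a practical pipeline would usually follow the voting with some form of peak selection before treating a candidate as a contour line.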

Introduction

Buildings, as an important component of the living environ-

ment, have been the focus of numerous studies, including in

urban planning and construction, change detection, population-density

estimation, and disaster assessment. Automatic

and efficient extraction of geometric and spatial information

of buildings has always been an important research topic in

the field of geoinformation science (X. Huang et al. 2017).

With the advance of remote sensing technology, spatial resolutions

of images from very-high-resolution satellites (e.g., SPOT-5,

WorldView-1 through WorldView-4, and QuickBird)

have reached meter level, providing more detailed spatial and

textural information (Cheng and Han 2016). Therefore, extraction

of building information from high-resolution remote sensing

images has become a research hot spot (Cao et al. 2016).

However, accurate building extraction from high-resolution

images remains a challenge due to various factors in remote

sensing images, such as diversity of objects, complexity of

buildings, noise, occlusions, shadows, and low contrast. To

make matters worse, when the viewing perspective is oblique,

building elevations occupy a large part of the image.

When the top contours of buildings are extracted automatically

from monocular optical images, those elevation areas are hard

to distinguish from building tops (Cui, Yan and Reinartz 2012;

J. Wang et al. 2015), yet the main goal of building extraction is

to obtain a clean boundary for each building (J. Wang et al. 2015).

At present, methods based on shadow and auxiliary in-

formation are frequently used. However, in locating a build-

ing, shadow-based methods treat the elevation and the roof

equally, producing inaccurate boundaries (Ok, Senaras and

Yuksel 2013; Ngo, Collet and Mazet 2015; Gao et al. 2018).

Other methods based on auxiliary data such as lidar (light

detection and ranging) can distinguish the roof and elevation

well, but the cost of obtaining such data is high (Zarea and

Mohammadzadeh 2016; Fernandes and Dal Poz 2017; S. Kim

and Rhee 2018). In addition, deep learning-based image

object extraction is a new research trend, but this kind of

method needs large amounts of training data, and existing

training sets usually do not contain many building

elevations (J. Huang et al. 2019; Wurm et al. 2019).

This article proposes a building extraction method that can

distinguish the roof from the elevation under the constraint of

the building’s elevation contours without any other types of

data or training data. First, building elevation contour lines are

extracted. Then the foreground samples are selected under the

constraints of elevation contours, which are used as back-

ground samples. Finally, GrabCut is used for image segmenta-

tion, and geometric features of the segmented area are used to

accurately extract buildings from high-resolution images. The

method is tested on two urban data sets (Guangdong, China,

and Tripoli, Libya) using WorldView-2 and WorldView-3 satel-

lite image configurations. All results are evaluated qualita-

tively and quantitatively compared with ground truths. The re-

sults show that building tops can be accurately distinguished

from the elevations of buildings in the monocular images of

highly oblique viewing angles with a high level of automation.
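
How selected samples can seed a GrabCut segmentation can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `build_grabcut_mask`, the example pixel lists, and the choice to mark unlabeled pixels as probable background are assumptions. The four label values follow the mask convention used by OpenCV's GrabCut implementation.

```python
# Label values matching OpenCV's GrabCut mask convention
GC_BGD = 0      # definite background
GC_FGD = 1      # definite foreground
GC_PR_BGD = 2   # probable background
GC_PR_FGD = 3   # probable foreground


def build_grabcut_mask(height, width, foreground_px, background_px):
    """Build an initial label mask from sample pixels given as (row, col).

    Pixels without a sample start as probable background (an assumption of
    this sketch). If a pixel appears in both lists, the foreground label wins
    because it is written last. Out-of-range samples are ignored.
    """
    mask = [[GC_PR_BGD] * width for _ in range(height)]
    for label, pixels in ((GC_BGD, background_px), (GC_FGD, foreground_px)):
        for row, col in pixels:
            if 0 <= row < height and 0 <= col < width:
                mask[row][col] = label
    return mask


# Hypothetical samples: one roof pixel, two pixels on an elevation contour
mask = build_grabcut_mask(
    height=3, width=4,
    foreground_px=[(1, 1)],
    background_px=[(0, 0), (2, 3)],
)
```

With OpenCV available, a mask of this kind, converted to an 8-bit array, could be passed to `cv2.grabCut` in `cv2.GC_INIT_WITH_MASK` mode so that the segmentation starts from the samples instead of a user-drawn rectangle; that call is not shown here.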

Ka Zhang is with the Key Laboratory of Virtual Geographic

Environment, Nanjing Normal University, Ministry of

Education, Nanjing, China; the School of Geography, Nanjing

Normal University, Nanjing, China; the Jiangsu Center

for Collaborative Innovation in Geographical Information

Resource Development and Application, China; the State Key

Laboratory Cultivation Base of Geographical Environment

Evolution (Jiangsu Province), China; and the Key Laboratory

of Urban Land Resources Monitoring and Simulation, MNR,

China (zhangka81@126.com).

Hui Chen (co-first author), Yehua Sheng (co-first author),

Dong Su, and Pengbo Wang are with the Key Laboratory of

Virtual Geographic Environment, Nanjing Normal University,

Ministry of Education, Nanjing, China; and the School of

Geography, Nanjing Normal University, Nanjing, China.

Wen Xiao (co-first author) is with the School of Engineering,

Newcastle University, Newcastle upon Tyne, United Kingdom.

Photogrammetric Engineering & Remote Sensing

Vol. 86, No. 4, April 2020, pp. 235–245.

0099-1112/20/235–245

© 2020 American Society for Photogrammetry and Remote Sensing

doi: 10.14358/PERS.86.4.235
