PE&RS June 2016 Full - page 441

Street Addressing and Mapping Board (WVSAMB) datasets (available from the West Virginia GIS Technical Center; http://wvgis.wvu.edu/), which were originally created using manual photointerpretation of the same leaf-off orthophotography used to produce the DEM data used in this study. The WVSAMB data were supplemented with the 1:24 000 scale National Hydrography Dataset (NHD). The Cost Distance tool creates a surface in which each cell is assigned the accumulative cost to the closest waterbody. Slope was used as the measure of movement impedance.

Topographic slope in degrees (Burrough and McDonnell, 1998), surface curvature, plan curvature (curvature perpendicular to slope), and profile curvature (curvature in the direction of the slope) (Moore et al., 1991; Zevenbergen and Thorne, 1987) were also calculated using the Spatial Analyst Extension of ArcMap 10.2 (Esri, 2012). Additional variables were calculated using the ArcGIS Geomorphometry & Gradient Metrics Toolbox (Evans et al., 2014), including CTMI (Moore et al., 1993; Gessler et al., 1995), slope position (Berry, 2002), roughness (Blaszczynski, 1997; Riley et al., 1999), and dissection (Evans, 1972). Slope position, roughness, and dissection rely on focal statistics calculated using a moving window; thus, the result depends on the window size used. For this study, we used window sizes of 11 × 11 pixels, 21 × 21 pixels, 41 × 41 pixels, and 51 × 51 pixels in an attempt to capture terrain variability at the hillslope scale. The window sizes were chosen based on the range of typical ridge to valley bottom distances in the state. We also calculated summary measures from the average of the outputs at all four window sizes. This resulted in a total of five variables (four variables for the individual window sizes and one average) for each of slope position, roughness, and dissection. A total of 21 predictor variables was therefore produced (Table 1). The terrain predictor variables were then associated with the training and validation pixels using the software tool Geospatial Modeling Environment (GME) (Beyer, 2012).
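The multi-scale focal statistics described above can be illustrated with a minimal Python sketch. It is not the toolbox's implementation: it assumes slope position is the cell elevation minus the focal mean elevation (a common formulation, not confirmed by the text), uses a toy 5 × 5 grid, and substitutes window sizes of 3 and 5 for the 11 to 51 pixel windows used in the study. The function names `focal_mean`, `slope_position`, and `multiscale_layers` are hypothetical.

```python
def focal_mean(grid, window):
    """Mean of the window x window neighbourhood of every cell (clipped at edges)."""
    half = window // 2
    n_rows, n_cols = len(grid), len(grid[0])
    out = []
    for r in range(n_rows):
        row_out = []
        for c in range(n_cols):
            cells = [grid[i][j]
                     for i in range(max(0, r - half), min(n_rows, r + half + 1))
                     for j in range(max(0, c - half), min(n_cols, c + half + 1))]
            row_out.append(sum(cells) / len(cells))
        out.append(row_out)
    return out


def slope_position(grid, window):
    """Assumed formulation: cell elevation minus the focal mean elevation."""
    means = focal_mean(grid, window)
    return [[z - m for z, m in zip(z_row, m_row)]
            for z_row, m_row in zip(grid, means)]


def multiscale_layers(grid, windows):
    """One layer per window size, plus a 'mean' layer averaging all of them."""
    layers = {w: slope_position(grid, w) for w in windows}
    layers["mean"] = [
        [sum(layers[w][r][c] for w in windows) / len(windows)
         for c in range(len(grid[0]))]
        for r in range(len(grid))
    ]
    return layers


# Toy 5 x 5 DEM with a single 9 m peak; windows 3 and 5 stand in for 11, 21, 41, 51.
dem = [[0.0] * 5 for _ in range(5)]
dem[2][2] = 9.0
layers = multiscale_layers(dem, windows=(3, 5))
```

With the study's four window sizes, the same pattern would yield five layers per metric: four individual scales and their average.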

Probability Model Creation

The random forest algorithm was implemented using the randomForest package (Liaw and Wiener, 2002) within the statistical software tool R (R Core Development Team, 2012). This algorithm requires two user-defined parameters: the number of trees produced (ntree) and the number of predictor variables randomly sampled as candidates at each node (mtry). For each model, ntree was set to 501 trees, as this was found to be large enough to produce stable results, and mtry was set to the default value, the square root of the number of predictor variables, which resulted in a value of five. Five separate models were produced using five separate training sets, and these models were then combined to form a single model.
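The text does not say how the five models were combined. One plausible reading, assumed here, is that the trees of the five forests were pooled (as the `combine` function of the randomForest package does), in which case the combined wetland probability for a pixel is the fraction of all pooled trees voting "wetland". The following Python sketch works through that arithmetic; the vote counts are hypothetical illustration values, not results from the study.

```python
def combined_probability(votes_per_forest, trees_per_forest):
    """Probability from pooled trees: total positive votes / total trees."""
    if len(votes_per_forest) != len(trees_per_forest):
        raise ValueError("one vote count is needed per forest")
    for votes, trees in zip(votes_per_forest, trees_per_forest):
        if not 0 <= votes <= trees:
            raise ValueError("votes must lie between 0 and the number of trees")
    return sum(votes_per_forest) / sum(trees_per_forest)


NTREE = 501      # trees per model, as stated in the text
N_MODELS = 5     # five models from five training sets

# Hypothetical numbers of trees voting "wetland" for one pixel in each model.
wetland_votes = [480, 455, 470, 490, 462]

p_combined = combined_probability(wetland_votes, [NTREE] * N_MODELS)

# With equal forest sizes, pooling equals the simple mean of the five probabilities.
p_mean = sum(v / NTREE for v in wetland_votes) / N_MODELS
```

Because every model has the same number of trees, pooling the trees and averaging the five per-model probabilities give the same value; the two would differ only if the forests were of unequal size.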

Model Validation

As we were primarily interested in the probabilistic prediction as opposed to the per-pixel classification in this study, we relied on receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC) measure to evaluate and compare the models produced. An ROC curve plots the true positive rate (in this case, the proportion of wetlands correctly classified as wetlands) against the false positive rate (the proportion of absence (not wetland) points incorrectly classified as wetlands) at various probability thresholds for a binary classifier. The AUC measure is the area under the curve in the ROC plot, and is equivalent to the probability that the classifier will rank a randomly chosen positive (true) record higher than a randomly chosen negative (false) record. It is equivalent to the Wilcoxon test of ranks. Generally, values over 0.9 indicate excellent prediction rates (Hanley and McNeil, 1982; Swets et al., 2000; Fawcett, 2007). ROC curves and AUC measures were produced using the pROC package (Robin et al., 2011) in R (R Core Development Team, 2012). As one goal of this study was to compare multiple models, we also made use of DeLong's test for two ROC curves to assess the difference in model performance. This test, which provides a p-value for statistical comparison, is available in the pROC package (Hanley and McNeil, 1982; DeLong et al., 1988; Venkatraman and Begg, 1996; Venkatraman, 2000; Pepe et al., 2009; Robin et al., 2011; Wickham, 2011).
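The equivalence between the area under the ROC curve and the ranking probability can be checked numerically. The Python sketch below is an illustration only (the study used the pROC package in R); the predicted probabilities are hypothetical, and the function names are not taken from any library. It computes the AUC once by comparing every wetland/non-wetland pair and once by trapezoidal integration of the ROC curve.

```python
def auc_by_ranking(pos_scores, neg_scores):
    """Share of positive/negative pairs ranked correctly (ties count one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def roc_points(pos_scores, neg_scores):
    """(false positive rate, true positive rate) at every observed threshold."""
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(1 for s in pos_scores if s >= t) / len(pos_scores)
        fpr = sum(1 for s in neg_scores if s >= t) / len(neg_scores)
        points.append((fpr, tpr))
    return points


def auc_by_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical predicted wetland probabilities for validation pixels.
wetland = [0.9, 0.8, 0.7, 0.4]
not_wetland = [0.6, 0.3, 0.2, 0.1]

auc_rank = auc_by_ranking(wetland, not_wetland)
auc_area = auc_by_trapezoid(roc_points(wetland, not_wetland))
```

In this toy case 15 of the 16 wetland/non-wetland pairs are ordered correctly, so both routes give an AUC of 0.9375.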

One strength of the random forest algorithm is its ability to generate measures of variable importance during the training process by excluding each variable sequentially and recording the resulting increase in OOB error (Breiman, 2001; Rodríguez-Galiano et al., 2012a; Rodríguez-Galiano et al., 2012b). This ancillary output of random forest was used to assess the relative contribution of each terrain variable for predicting the probability of wetland occurrence. However, variable importance from standard random forest tends to be biased towards correlated predictor variables, and the predictors in this study are correlated (Strobl et al., 2008; Strobl et al., 2009; Genuer et al., 2010). Therefore, we used the conditional variable importance measure, available in the R party package, which is more robust in the presence of highly correlated input variables (Strobl et al., 2008; Strobl et al., 2009).
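The idea behind the standard importance measure can be sketched in a few lines of Python. In Breiman's formulation the values of one predictor are randomly permuted and the resulting increase in error is recorded; the sketch below applies that idea to a fixed toy classifier on a single dataset rather than to the OOB samples of each tree, and it does not implement the conditional importance of the party package. The predictor names, threshold, and data are hypothetical.

```python
import random


def error_rate(model, rows, labels):
    """Fraction of rows for which the model's prediction differs from the label."""
    wrong = sum(1 for row, y in zip(rows, labels) if model(row) != y)
    return wrong / len(rows)


def permutation_importance(model, rows, labels, n_repeats=10, seed=0):
    """Mean increase in error after shuffling each predictor in turn."""
    rng = random.Random(seed)
    baseline = error_rate(model, rows, labels)
    importance = {}
    for name in rows[0]:
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[name] for row in rows]
            rng.shuffle(shuffled)
            # Copy each row, replacing only the predictor under test.
            permuted = [dict(row, **{name: value})
                        for row, value in zip(rows, shuffled)]
            increases.append(error_rate(model, permuted, labels) - baseline)
        importance[name] = sum(increases) / n_repeats
    return importance


def toy_model(row):
    """Hypothetical rule: predict wetland (1) when cost distance is small."""
    return 1 if row["cost_distance"] < 50 else 0


# Hypothetical data: 'noise' is ignored by the model, so shuffling it changes nothing.
rows = [{"cost_distance": 10 + 5 * i, "noise": (7 * i) % 11} for i in range(20)]
labels = [toy_model(row) for row in rows]

importance = permutation_importance(toy_model, rows, labels)
```

The predictor the model depends on receives a positive importance, while the ignored predictor receives exactly zero. When predictors are correlated, shuffling one of them independently of the others is what introduces the bias that the conditional measure is designed to reduce.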

Results and Discussion

Importance of Physiography in Mapping Wetlands

Plate 1 shows a subset of the classification results within each selected ecological subregion in West Virginia for the PEM wetlands, masked to the extent of grass cover in the state, and PFO

Table 2. AUC values for PEM models for each ecological subregion and the entire state. Bold text indicates the highest AUC value obtained for each subregion being predicted; * indicates a statistical difference at the 95% confidence level (significance level = 0.05) between the model and the model trained and predicted in that region.

PEM model comparison. Columns give the random forest model; rows give the subregion where the model was applied.

| Subregion where model applied | Great Valley of Virginia | Pittsburgh Low Plateau | Ridge and Valley | Western Allegheny Mountains | Western Coal Fields | Statewide |
|---|---|---|---|---|---|---|
| Great Valley of Virginia | 0.974 | 0.931* | 0.945* | 0.958* | 0.953* | 0.963* |
| Pittsburgh Low Plateau | 0.924* | 0.946 | 0.918* | 0.931* | 0.934* | 0.939* |
| Ridge and Valley | 0.903* | 0.901* | 0.940 | 0.913* | 0.891* | 0.900* |
| Western Allegheny Mountains | 0.932* | 0.936* | 0.920* | 0.954 | 0.935* | 0.941* |
| Western Coal Fields | 0.916* | 0.938* | 0.877* | 0.911* | 0.962 | 0.943* |
| West Virginia Total | 0.893* | 0.936* | 0.879* | 0.931* | 0.937* | 0.947 |

Table 3. AUC values for PFO/PSS models for each ecological subregion and the entire state. Bold text indicates the highest AUC value obtained for each subregion being predicted; * indicates a statistical difference at the 95% confidence level (significance level = 0.05) between the model and the model trained and predicted in that region.

PFO/PSS model comparison. Each row gives the subregion where the model was applied, followed by the AUC values for the random forest models in column order: Great Valley of Virginia, Pittsburgh Low Plateau, Ridge and Valley, Western Allegheny Mountains, Western Coal Fields, Statewide.

Great Valley of Virginia: 0.963, 0.886*, 0.912*, 0.914*, 0.920*, 0.884*
Pittsburgh Low Plateau: 0.980*, 0.993, 0.990*, 0.992, 0.991*, 0.993
Ridge and Valley: 0.986*, 0.993, 0.994, 0.992*, 0.992 (only five of the six values survive in the source; column positions uncertain)
Western Allegheny Mountains: 0.975*, 0.988*, 0.982*, 0.991, 0.986*, 0.990*
Western Coal Fields: 0.995*, 0.997, 0.997*, 0.998 (only four of the six values survive in the source; column positions uncertain)
West Virginia Total: 0.963*, 0.989*, 0.985*, 0.991, 0.988*, 0.991

