PE&RS June 2016 Full

An Assessment of Algorithmic Parameters Affecting Image Classification Accuracy by Random Forests

Di Shi and Xiaojun Yang

Abstract

Random forests, a promising ensemble learning algorithm, have been increasingly used for remote sensor image classification and are found to perform as well as or better than some popular classifiers. With only two algorithmic parameters, they are relatively easy to implement. Existing literature suggests that the performance of random forests is insensitive to changing algorithmic parameters. However, this conclusion was largely based on the classifier’s accuracy, which does not necessarily represent the resulting thematic map accuracy. The current study extends beyond the classifier’s accuracy assessment and investigates how the algorithmic parameters could affect the resulting thematic map accuracy by random forests. A set of random forest models with different parameter settings was carefully constructed and then used to classify a satellite image into multiple land cover categories. Both the classifier’s accuracy and the map accuracy were assessed. The results reveal that these parameters can affect the map accuracy by up to 9 to 16 percent for some classes, although their impact on the classifier’s accuracy was quite limited. A careful parameterization prioritizing thematic map accuracy can help improve the performance of random forests in image classification, especially for spectrally complex land cover classes. These findings can help establish practical guidance on the use of random forests in the remote sensing community.

Introduction

Random forests (RF), a promising ensemble learning algorithm, have been increasingly used in the remote sensing community. They can be used for image classification by constructing multiple full-grown tree classifiers that vote for the most popular class as the labeled result. Random forests are found to outperform individual tree classifiers (Breiman, 2001; Gislason et al., 2006; Rodriguez-Galiano et al., 2012), and to perform as well as or better than several advanced pattern recognizers such as artificial neural networks (ANN) (Chan and Paelinckx, 2008; Liu et al., 2013), support vector machines (SVM) (Pal, 2005; Statnikov et al., 2008; Adam et al., 2014), and bagging and boosting methods (Breiman, 2001; Pal, 2003; Gislason et al., 2006; Chan and Paelinckx, 2008; Ghimire et al., 2012). Additionally, random forests are more straightforward and efficient than ANN and SVM, which can be challenging to use because of the difficulty of parameterization and their high algorithmic complexity (Breiman, 2001; Pal, 2005; Chan and Paelinckx, 2008). Some further discussion on the theoretical underpinnings of random forests will be given in the next section.
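
The voting step described above can be illustrated with a minimal sketch. It assumes that each tree has already produced a class label for every pixel; the function name `majority_vote` and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Combine per-tree labels into one label per pixel by plurality vote.

    tree_predictions[t][i] is the class label that tree t assigns to
    pixel i (hypothetical toy input, not data from the study).
    """
    n_pixels = len(tree_predictions[0])
    labels = []
    for i in range(n_pixels):
        votes = Counter(tree[i] for tree in tree_predictions)
        # most_common(1) returns the label with the highest vote count;
        # in this sketch, ties go to the label encountered first.
        labels.append(votes.most_common(1)[0][0])
    return labels


# Three hypothetical trees voting on four pixels.
trees = [
    ["water", "forest", "urban", "forest"],
    ["water", "urban", "urban", "forest"],
    ["forest", "forest", "urban", "water"],
]
voted = majority_vote(trees)
```

In a real random forest each tree is also grown on a bootstrap sample with a random subset of features considered at each split; this sketch covers only the final vote.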

Like other classifiers, random forests are subject to a wide range of external and internal factors that may affect their performance in image classification. Some studies found that external factors such as training sample size and quality, which normally affect the performance of other popular classifiers, can also affect the classification accuracy by random forests (e.g., Breiman, 1999; Ham et al., 2005; Rodriguez-Galiano et al., 2012). Compared with other advanced pattern recognizers such as ANN and SVM, random forests are relatively easy to implement since only two algorithmic parameters (i.e., the feature number and the tree number) and a random seed number need to be specified (Pal, 2005). Over the years, various studies have been conducted to evaluate how these internal parameter settings could affect the performance of random forests. Several earlier studies (e.g., Breiman, 2001; Liaw and Wiener, 2002) found that the performance of random forests was relatively robust with respect to changing internal parameters. Similar findings have been reported in several more recent studies (e.g., Pal, 2005; Lawrence et al., 2006; Puissant et al., 2014). Nevertheless, these existing studies were largely based on the out-of-bag (OOB) error estimate using left-out subsets or set-aside samples of the training data, in which the training and test samples are from the same probability distribution. For land cover mapping from remote sensor imagery, training samples are generally selected from spectrally homogeneous pixels. Because the training data are not typically selected through a probability sampling design, an accuracy estimate based on a subset of training samples is biased (Steele, 2005) and is not appropriate for assessing the end-product accuracy (Stehman and Foody, 2009). In the remote sensing community, thematic map accuracy assessment is normally conducted through error matrix analysis with a randomly selected reference dataset after a map is generated (Congalton, 1991). Because of these essential differences, an algorithm with stronger classifier performance does not necessarily guarantee better thematic map accuracy (Richards, 1996; Steele, 2005; Stehman and Foody, 2009). Understanding how the internal parameters could affect the thematic mapping accuracy by random forests can help guide the use of this promising ensemble learning algorithm in the remote sensing community.
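
The error matrix analysis mentioned above can be sketched as follows. This is a minimal, hypothetical illustration of how overall, producer’s, and user’s accuracy are derived from independent reference samples; the function names and the toy labels are assumptions for illustration and are not code or data from the study.

```python
from collections import defaultdict


def error_matrix(reference, mapped):
    """Count (mapped label, reference label) pairs over the reference samples."""
    if len(reference) != len(mapped):
        raise ValueError("reference and mapped must have the same length")
    matrix = defaultdict(int)
    for ref, mp in zip(reference, mapped):
        matrix[(mp, ref)] += 1
    return matrix


def accuracy_measures(reference, mapped):
    """Return overall accuracy plus per-class producer's and user's accuracy."""
    matrix = error_matrix(reference, mapped)
    classes = sorted(set(reference) | set(mapped))
    correct = sum(matrix[(c, c)] for c in classes)
    overall = correct / len(reference)
    producers, users = {}, {}
    for c in classes:
        ref_total = sum(matrix[(m, c)] for m in classes)  # column total
        map_total = sum(matrix[(c, r)] for r in classes)  # row total
        # Producer's accuracy: share of reference samples of class c mapped as c.
        producers[c] = matrix[(c, c)] / ref_total if ref_total else None
        # User's accuracy: share of samples mapped as c that truly are c.
        users[c] = matrix[(c, c)] / map_total if map_total else None
    return overall, producers, users


# Hypothetical reference labels and the labels read from a classified map.
reference = ["water", "water", "forest", "forest",
             "urban", "urban", "urban", "forest"]
mapped = ["water", "water", "forest", "urban",
          "urban", "urban", "forest", "forest"]
overall, producers, users = accuracy_measures(reference, mapped)
```

The key design point is that `reference` should come from a probability sample of the finished map, not from held-out training pixels, which is what separates map accuracy from the OOB estimate.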

The current work extends beyond the classifier’s error estimate and investigates how changing algorithmic parameters could affect the resulting thematic map accuracy. This is a neglected area in the pattern recognition literature, but the issue is quite important for the remote sensing community since remote sensing-based thematic mapping has become more end-user oriented (Congalton, 1991; Richards, 1996). The research comprised several major components. First, a set of random forest models with different internal parameter settings was carefully constructed and trained. Then, these models were used to classify a satellite image into multiple

Department of Geography, Florida State University, 311 Collegiate Loop, Tallahassee, Florida 32306 (xyang@fsu.edu).

Photogrammetric Engineering & Remote Sensing

Vol. 82, No. 6, June 2016, pp. 407–417.

0099-1112/16/407–417

doi: 10.14358/PERS.82.6.407
