PE&RS June 2016 Full - page 407

An Assessment of Algorithmic Parameters
Affecting Image Classification Accuracy by
Random Forests
Di Shi and Xiaojun Yang
Abstract
Random forests, a promising ensemble learning algorithm,
have been increasingly used for remote sensor image classification
and are found to perform as well as or better than some
popular classifiers. With only two algorithmic parameters,
they are relatively easy to implement. Existing literature
suggests that the performance of random forests is insensitive
to changing algorithmic parameters. However, this finding was largely
based on the classifier’s accuracy, which does not necessarily
represent the resulting thematic map accuracy. The current
study extends beyond the classifier’s accuracy assessment
and investigates how the algorithmic parameters could affect
the resulting thematic map accuracy of random forests. A
set of random forest models with different parameter settings
was carefully constructed and then used to classify a satellite
image into multiple land cover categories. Both the classifier’s
accuracy and the map accuracy were assessed. The results reveal
that these parameters can affect the map accuracy by as much as
9 to 16 percent for some classes, although their impact on the
classifier’s accuracy was quite limited. A careful parameterization
prioritizing thematic map accuracy can help improve
the performance of random forests in image classification,
especially for spectrally complex land cover classes. These
findings can help establish practical guidance on the use of
random forests in the remote sensing community.
Introduction
Random forests (RF), a promising ensemble learning algorithm,
have been increasingly used in the remote sensing community.
They can be used for image classification by constructing
multiple full-grown tree classifiers that vote for the most
popular class, which becomes the assigned label. Random forests
are found to outperform individual tree classifiers (Breiman,
2001; Gislason et al., 2006; Rodriguez-Galiano et al., 2012),
and to perform as well as or better than several advanced
pattern recognizers such as artificial neural networks (ANN)
(Chan and Paelinckx, 2008; Liu et al., 2013), support vector
machines (SVM) (Pal, 2005; Statnikov et al., 2008; Adam et al.,
2014), and bagging and boosting methods (Breiman, 2001; Pal,
2003; Gislason et al., 2006; Chan and Paelinckx, 2008; Ghimire
et al., 2012). Additionally, random forests are more
straightforward and efficient than ANN and SVM, which can be
challenging to use because of the difficulty in parameterization
and the high algorithmic complexity (Breiman, 2001; Pal,
2005; Chan and Paelinckx, 2008). Some further discussion on
the theoretical underpinnings of random forests will be given
in the next section.
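The voting step described above can be illustrated with a minimal sketch. It assumes each tree has already produced one label per pixel. The function name `majority_vote`, the tie-breaking rule, and the sample labels are hypothetical and chosen only for illustration; they are not taken from the study.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Combine per-tree labels into one label per pixel by plurality vote.

    tree_predictions[t][i] is the label that tree t assigns to pixel i.
    """
    if not tree_predictions:
        raise ValueError("at least one tree is required")
    n_pixels = len(tree_predictions[0])
    if any(len(tree) != n_pixels for tree in tree_predictions):
        raise ValueError("every tree must label the same number of pixels")

    labels = []
    for i in range(n_pixels):
        votes = Counter(tree[i] for tree in tree_predictions)
        top = max(votes.values())
        # Assumption: ties go to the smallest label so the result is
        # deterministic; real implementations may break ties differently.
        labels.append(min(lab for lab, count in votes.items() if count == top))
    return labels


# Hypothetical votes from three trees for four pixels.
votes = [
    ["water", "forest", "urban", "forest"],
    ["water", "urban", "urban", "forest"],
    ["forest", "forest", "urban", "water"],
]
print(majority_vote(votes))  # ['water', 'forest', 'urban', 'forest']
```

With more trees, the vote for each pixel generally becomes more stable, which is one reason the tree number is treated as an algorithmic parameter.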
Like other classifiers, random forests are subject to a wide
range of external and internal factors that may affect their
performance in image classification. Some studies found that
external factors such as training sample size and quality,
which normally affect the performance of other popular
classifiers, can also affect the classification accuracy of
random forests (e.g., Breiman, 1999; Ham et al., 2005;
Rodriguez-Galiano et al., 2012). Compared with other advanced
pattern recognizers such as ANN and SVM, random forests are
relatively easy to implement since only two algorithmic
parameters (i.e., the feature number and the tree number) and
a random seed number need to be specified (Pal, 2005). Over
the years, various studies have been conducted to evaluate how
these internal parameter settings could affect the performance
of random forests. Several earlier studies (e.g., Breiman,
2001; Liaw and Wiener, 2002) found that the performance of
random forests was relatively robust with respect to changing
internal parameters. Similar findings have been reported in
several more recent studies (e.g., Pal, 2005; Lawrence et al.,
2006; Puissant et al., 2014). Nevertheless, these existing
studies were largely based on the out-of-bag (OOB) error
estimate using left-out subsets or set-aside samples of the
training data, in which the training and test samples are from
the same probability distribution. For land cover mapping from
remote sensor imagery, training samples are generally selected
from spectrally homogeneous pixels. Because the training data
are not typically selected through a probability sampling
design, an accuracy estimate based on a subset of training
samples is biased (Steele, 2005) and is not appropriate for
assessing the end-product accuracy (Stehman and Foody, 2009).
In the remote sensing community, thematic map accuracy
assessment is normally conducted through error matrix analysis
with a randomly selected reference dataset after a map is
generated (Congalton, 1991). Because of these essential
differences, an algorithm with stronger classifier performance
does not necessarily guarantee better thematic map accuracy
(Richards, 1996; Steele, 2005; Stehman and Foody, 2009).
Understanding how the internal parameters could affect the
thematic mapping accuracy of random forests can help guide the
use of this promising ensemble learning algorithm in the
remote sensing community.
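The error matrix analysis mentioned above can be sketched as follows. This is a minimal illustration that assumes the common layout in which rows hold the mapped classes and columns hold the reference classes. The function names, class names, and sample labels are hypothetical and are not data from the study.

```python
def error_matrix(reference, mapped, classes):
    """Build an error matrix: rows are mapped classes, columns are reference classes."""
    if len(reference) != len(mapped):
        raise ValueError("reference and mapped labels must have equal length")
    index = {c: k for k, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for ref, pred in zip(reference, mapped):
        matrix[index[pred]][index[ref]] += 1
    return matrix


def accuracies(matrix, classes):
    """Return overall accuracy plus per-class user's and producer's accuracy."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(classes)))
    users, producers = {}, {}
    for k, c in enumerate(classes):
        row_total = sum(matrix[k])                 # pixels mapped as class c
        col_total = sum(row[k] for row in matrix)  # reference pixels of class c
        users[c] = matrix[k][k] / row_total if row_total else float("nan")
        producers[c] = matrix[k][k] / col_total if col_total else float("nan")
    return correct / total, users, producers


# Hypothetical reference sample and the labels the map assigns at the same sites.
classes = ["water", "forest", "urban"]
reference = ["water", "water", "forest", "forest", "forest", "urban", "urban", "urban"]
mapped = ["water", "water", "forest", "urban", "forest", "urban", "urban", "forest"]

matrix = error_matrix(reference, mapped, classes)
overall, users, producers = accuracies(matrix, classes)
print(matrix)   # [[2, 0, 0], [0, 2, 1], [0, 1, 2]]
print(overall)  # 0.75
```

Because the reference sites in such an assessment are drawn independently of the training data, these figures can differ from the OOB estimate even for the same model.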
The current work extends beyond the classifier’s error estimate
and investigates how changing algorithmic parameters
could affect the resulting thematic map accuracy. This is a neglected
area in the pattern recognition literature, but the issue
is quite important for the remote sensing community since
remote sensing-based thematic mapping has been more end-user
oriented (Congalton, 1991; Richards, 1996). The entire
research work comprised several major components. First, a
set of random forest models with different internal parameter
settings was carefully constructed and trained. Then, these
models were used to classify a satellite image into multiple
Department of Geography, Florida State University, 311 Collegiate
Loop, Tallahassee, Florida 32306 (
).
Photogrammetric Engineering & Remote Sensing
Vol. 82, No. 6, June 2016, pp. 407–417.
0099-1112/16/407–417
© 2016 American Society for Photogrammetry
and Remote Sensing
doi: 10.14358/PERS.82.6.407