PE&RS March 2016 full version - page 194

and examined the observed values, predicted values, and 95
percent prediction intervals (Figure 4). Three out of 100 predictions (3
percent) fell outside the prediction interval (denoted by the
open circles in Figure 4). However, across all draws of 100 ob-
servations in the population we would expect 5 percent of the
true values, on average, to fall outside the prediction interval
(95 percent would be in the prediction intervals). We would
further expect the proportion of true values falling outside the
95 percent prediction interval for any subregion of predicted
values (e.g., predicted values < −10 in Figure 4) to be 0.05.
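The coverage check described here can be sketched in a few lines. The data below are synthetic stand-ins (normal errors with an assumed spread and a fixed ±1.96σ interval), not the paper's populations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative only: simulate 100 observations with known normal noise,
# then check how many fall inside a nominal 95 percent prediction interval.
n = 100
predicted = rng.uniform(-30, 10, n)      # hypothetical predicted values
sigma = 2.0                              # assumed error spread
observed = predicted + rng.normal(0.0, sigma, n)

# With known sigma, a +/-1.96*sigma band is a 95 percent prediction interval.
half_width = 1.96 * sigma
inside = np.abs(observed - predicted) <= half_width
coverage = inside.mean()

# Across repeated draws the coverage averages 0.95; any single draw of
# 100 observations will vary (e.g., 97/100 inside).
print(round(coverage, 2))
```

Any single draw's coverage scatters around the nominal 0.95, which is why a few observations falling outside the interval is expected behavior rather than a failure of the method.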
Overall, 0.97 of predictions for the Normal High (Y1)
population fell within the 95 percent prediction interval. For
the Normal Low (Y2) and Model Misspecification (Y3) populations,
0.98 and 0.96 of the observations fell within the 95 percent
prediction interval, respectively. We further examined the
behavior of our uncertainty approach using all pairs of
observed and predicted values and the frequency at which
observed values were contained in their 95 percent prediction
intervals (Plate 2). The frequency was computed as the
proportion of prediction intervals that contained the observed
value within integer bins of predicted values (i.e., the
predicted values were rounded to the nearest integer). We
expected the proportion in the 95 percent prediction interval
to be 0.95. We found that prediction intervals were generally
conservative for 99 percent of predicted values in the Monte
Carlo assessment. Results were generally adequate within this
range of predicted values, although for the Model Misspecification
(Y3) population the prediction interval width was underestimated
in the right tail of the distribution.
Uncertainty estimates for predictions that had little or no
representation in the Monte Carlo analysis were somewhat
spurious: in some cases the prediction interval width was
underestimated (intervals too narrow) and in other cases
overestimated (intervals too wide).
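The binned coverage computation used in this assessment might be sketched as follows. The population, interval half-widths, and bin range below are invented for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sketch of the binned check: round predicted values to the
# nearest integer, then compute the proportion of observations whose
# 95 percent prediction interval contains the truth within each bin.
# All data here are synthetic assumptions, not the paper's populations.
n = 5000
predicted = rng.uniform(-25, 5, n)
sigma = 2.0
observed = predicted + rng.normal(0.0, sigma, n)
half_width = np.full(n, 1.96 * sigma)    # per-prediction interval half-widths

inside = np.abs(observed - predicted) <= half_width
bins = np.rint(predicted).astype(int)    # integer bins of predicted values

coverage_by_bin = {b: inside[bins == b].mean() for b in np.unique(bins)}

# Each bin's proportion should sit near the nominal 0.95; sparse bins
# (few observations) can only move in coarse steps.
print(len(coverage_by_bin))
```

Note that a bin containing only a handful of observations cannot resolve coverage finely, which is the granularity issue raised later in the Discussion.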
Multiple Regression Results
We also examined the behavior of multiple regression models
for the three populations. The y1, y2, and y3 models had r2
values of 0.93, 0.98, and 0.03, respectively. The RMSE was
1.98, 0.99, and 19.12 for y1, y2, and y3, respectively. These fit statistics were
not based on cross-validation. Overall, 0.95 of predictions for
the Normal High (Y1) population fell within the 95 percent
prediction interval. For the Normal Low (Y2) and Model
Misspecification (Y3) populations, 0.95 and 0.94 of the observations
fell within the 95 percent prediction interval, respectively. As
noted above, the expected proportion of true values in the 95
percent prediction intervals was 0.95. For both the Normal
High (Y1) and Normal Low (Y2) populations this expectation
held within the 99 percent quantile of predicted values (Plate
3). Outside the 99 percent quantile, results varied slightly. The
Model Misspecification (Y3) population behaved somewhat
differently because the assumptions regarding the distribution of
errors for multiple regression were intentionally violated. This
resulted in predicted values that were relatively close to the
mean and prediction intervals that were generally too narrow
outside the 99 percent quantile of predicted values (Plate 3).
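For comparison, the multiple regression prediction intervals referred to here have a classical closed form. The sketch below computes one from scratch on synthetic data; the design matrix, coefficients, and noise level are assumptions for illustration, not the paper's populations:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Synthetic regression problem: intercept plus three normal predictors.
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
beta = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta + rng.normal(0.0, 2.0, n)

# Ordinary least squares fit via the normal equations.
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
dof = n - X.shape[1]
s2 = resid @ resid / dof                 # residual variance estimate

# 95 percent prediction interval at a new point x0:
#   y_hat +/- t_{0.975, dof} * sqrt(s2 * (1 + x0' (X'X)^-1 x0))
x0 = np.array([1.0, 0.5, -0.2, 1.0])
y_hat = x0 @ beta_hat
t_crit = stats.t.ppf(0.975, dof)
half_width = t_crit * np.sqrt(s2 * (1.0 + x0 @ XtX_inv @ x0))
lower, upper = y_hat - half_width, y_hat + half_width
```

This parametric interval relies on the normal-error assumption; when that assumption is intentionally violated, as in the Model Misspecification population, the nominal coverage can break down in the tails.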
Case Example in Georgia
We developed 95 percent prediction intervals for the study
area. The half-width of the 95 percent prediction interval
ranged from 0.27 percent to 90 percent (Plate 4A). In general,
prediction intervals were wider in heterogeneous
areas such as edges between treed areas and non-treed
areas. We also developed a “masked” version of the percent
tree canopy cover map and the procedure provided reasonable
results (Plate 4B). Areas that were clearly agricultural or
un-vegetated developed land were readily masked out of the
canopy cover predictions.
Discussion
Since the late 1980s a substantial amount of research has
focused on uncertainty in spatial products (Foody and Atkin-
son, 2002), though, as previously noted, there is much more
methodological maturation with respect to categorical vari-
ables (particularly for parametric classifiers) than for continu-
ous variables. With (a) the increasing prevalence of continuous
field mapping (e.g., leaf area index, tree canopy cover, biomass,
water turbidity, and the like), (b) the production of continuous
field maps using non-parametric approaches, and (c) the use of
these maps in subsequent geospatial modeling, there is a clear
and present need for robust methods by which pixel-specific
uncertainty can be estimated. The prior literature, while
sparse, does clearly make the case for estimation (Wang et al.,
2005) and visualization (Dungan et al., 2003) of pixel-specific
uncertainty. Further, the potential role of simulation has long
been established (Englund, 1993), albeit without the specifics needed
for operational implementation in our particular use-case,
namely continuous field estimation using random forest.
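One data-driven route to pixel-specific intervals for random forest is to treat the per-tree predictions as an empirical distribution and read off its tail percentiles. The sketch below illustrates that idea on synthetic data; it is an illustration in this spirit, not the exact procedure used in this paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)

# Synthetic stand-ins for pixel predictors and a continuous response
# (e.g., percent canopy cover); all values here are assumptions.
n = 1000
X = rng.uniform(0, 100, (n, 4))
y = 0.5 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(0, 5, n)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

x_new = X[:5]                            # "pixels" to predict

# Collect every tree's prediction: shape (n_trees, n_pixels).
per_tree = np.stack([t.predict(x_new) for t in rf.estimators_])

# Empirical 95 percent interval from the per-tree spread.
lower = np.percentile(per_tree, 2.5, axis=0)
upper = np.percentile(per_tree, 97.5, axis=0)
point = per_tree.mean(axis=0)            # the forest's averaged prediction
```

The per-tree spread reflects how variable the ensemble is at a given pixel, so intervals widen in heterogeneous or poorly represented regions of the predictor space, consistent with the behavior reported above.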
The Monte Carlo approach presented here is data driven
and generally provided conservative prediction interval
widths (i.e., wider than needed) within the 99 percent quan-
tile of predicted values. Outside the 99 percent quantile,
prediction interval widths could be too wide or too narrow.
There are several ways in which this could have occurred.
Our approach was data driven and therefore underperformed
in sparse areas of the distribution. For example, for our test
we drew a sample of 500 observations from the population
(0.05 percent). Increasing the sample to 5,000 observations (a
0.5 percent sample) improved results substantially. Further, in
the sparse parts of the distribution, observations can be rare
enough that the way we analyzed the results (the proportion of
observations within their 95 percent prediction intervals; e.g.,
Plate 2) may not have sufficient information to estimate the proportion. For
example, in the Normal High (Y1) population fewer than 20
observations had predicted values < −24. This means that the
estimated proportion within the 95 percent prediction intervals
could only change in steps of 0.05 or greater (1/20 = 0.05). In short, the technique
presented here assumes that the sampled data provide enough
Figure 4. Predicted versus observed values and prediction intervals for 100 randomly selected predictions of Y1. Prediction intervals are denoted by the grey error bars. The open circles represent predictions whose 95 percent prediction interval does not contain the observed value.