
hard to obtain images taken by UAVs§ containing the entire plant structure without any occlusion from a densely planted crop field. Therefore, we decided to ignore the morphological information by using patch sequences. This strategy enables our proposed method to be applicable and robust in detecting the drought plant stress condition in challenging datasets, such as the MSTCivil Dataset.
Feature Extraction
After we obtain the patch sequences, instead of using handcrafted features, we use Convolutional Neural Networks (CNNs) to extract discriminative features from each image in the patch sequence. In previous works, handcrafted features have been widely used for representing crop stress conditions [35], [19]. However, due to the complex physiological effects of the drought stress condition, handcrafted features, which mostly focus on specific characteristics, might discard significant amounts of the underlying conceptual information. The difference between handcrafted features and features learned by a CNN is that multi-layered learning models such as CNNs are not only able to explore low-level features in the lower layers, but can also yield conceptual abstractions from the higher layers. Hence, a CNN is suitable for our feature extraction task.
In this work, the pre-trained CNN model, VGG-16 [26] (shown in Figure 4), is adopted for the feature extraction. Based on the VGG-16 model with weights pre-trained on ImageNet [5], we first fine-tune the model for the drought plant classification task, and then we use the fine-tuned model as a feature extractor on the image patch sequences.
Fine-tuning a deep learning network is a procedure based on the concept of transfer learning [2], [6]. We first initialize the VGG-16 model using the weights learned from the ImageNet dataset. Then, we truncate the last layer of the model, which is a softmax layer targeting the 1,000 classes of the ImageNet dataset, and replace it with a new softmax layer targeting two classes (the drought condition and the control condition). The new softmax layer is trained using the backpropagation algorithm with our plant image data, which are the image patches in the patch sequences (each image patch inherits the label of the patch sequence it belongs to). In order to transfer the knowledge learned from the broad domain (the ImageNet dataset) into our specific domain, we freeze the weights of the first ten layers so that they remain intact throughout the fine-tuning process. To fine-tune the model, we minimize the cross-entropy function using the stochastic gradient descent algorithm with an initial learning rate of 10⁻⁴, which is smaller than the learning rate used for training the model from scratch. Finally, we use the fine-tuned model to extract spatial features from each image in the patch sequence. Each image in the patch sequence is fed to the fine-tuned model as input and passed through a stack of convolutional layers and three fully connected layers. The activation before the last fully connected layer is taken as the extracted feature vector of the input. Thus, after the feature extraction step, each patch sequence is represented by a sequence of feature vectors for the classification, where each image in the patch sequence is represented by a 1 × 4096 feature vector in the feature vector sequence.
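As a rough sketch of how the new two-class softmax layer is trained with cross entropy and SGD at a learning rate of 10⁻⁴, the following illustration uses random placeholder data standing in for the 4096-dimensional activations and the patch labels; it is not the authors' implementation, and it trains only the replacement head rather than the full network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: in the paper, X would be the 4096-d activations taken
# before the last fully connected layer, and y the label that each image
# patch inherits from its patch sequence.
N, D, C = 64, 4096, 2
X = rng.normal(size=(N, D))
y = rng.integers(0, C, size=N)

# New two-class softmax layer replacing the 1,000-class ImageNet head.
W = rng.normal(scale=0.01, size=(D, C))
b = np.zeros(C)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(p, labels):
    return -np.log(p[np.arange(len(labels)), labels]).mean()

lr = 1e-4  # the small fine-tuning learning rate from the text
loss0 = cross_entropy(softmax(X @ W + b), y)
for _ in range(200):
    p = softmax(X @ W + b)       # forward pass
    g = p.copy()
    g[np.arange(N), y] -= 1.0    # dL/dlogits for cross entropy
    g /= N
    W -= lr * (X.T @ g)          # gradient descent update
    b -= lr * g.sum(axis=0)

loss = cross_entropy(softmax(X @ W + b), y)
```

After training, the cross-entropy loss on these placeholder patches is lower than before training, mirroring the fine-tuning objective described above.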
Bidirectional Long Short-Term Memory Recurrent Neural Network
A Recurrent Neural Network (RNN) is a layered neural network that uses its cyclic connections to learn temporal dependencies in sequential data. The structure of a simple RNN is shown in Figure 5. Compared to feed-forward layered neural networks, for instance the multilayer perceptron [22], which can only learn static pattern mappings, an RNN can propagate information from prior time steps forward to the current time step, learning the context information in a sequence of feature vectors. In other words, the hidden layer of an RNN serves as a memory function.
Figure 5. An example of a recurrent neural network.
An RNN can be described mathematically as follows. Suppose there is a sequence of feature vectors denoted as x_t, t ∈ [1, T]. In the RNN, the hidden layer output vector h_t and the output layer output y_t are calculated as follows:
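As a minimal sketch of the per-step computation of h_t and y_t, the following assumes the common simple-RNN formulation, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h) and y_t = softmax(W_hy h_t + b_y); the exact activation functions used in the paper may differ, and the weights here are random placeholders rather than learned parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

T, D, H, C = 5, 4096, 128, 2   # sequence length, feature dim, hidden units, classes
xs = rng.normal(size=(T, D))   # stand-in for the 1 x 4096 feature vector sequence

# Simple-RNN parameters (random here; learned by backpropagation in practice).
W_xh = rng.normal(scale=0.01, size=(D, H))   # input -> hidden
W_hh = rng.normal(scale=0.01, size=(H, H))   # hidden -> hidden (the cyclic connection)
W_hy = rng.normal(scale=0.01, size=(H, C))   # hidden -> output
b_h, b_y = np.zeros(H), np.zeros(C)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

h = np.zeros(H)   # h_0; the hidden state acts as the network's memory
ys = []
for t in range(T):
    # h_t mixes the current input x_t with the previous state h_{t-1},
    # which is how prior-time information propagates forward.
    h = np.tanh(xs[t] @ W_xh + h @ W_hh + b_h)
    ys.append(softmax(h @ W_hy + b_y))
```

Each y_t is a probability distribution over the two classes, and because h_t feeds into h_{t+1}, every output depends on the entire prefix of the feature vector sequence.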
§ UAV, unmanned aerial vehicle, commonly known as a drone, is an aircraft without a human pilot aboard.
Figure 4. The architecture of the VGG-16 CNN model [26].
PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING, July 2018, page 462