Boundary Aware U-Net for Glacier Segmentation

Large-scale study of glaciers improves our understanding of global glacier change and is imperative for monitoring the ecological environment, preventing disasters, and studying the effects of global climate change. Glaciers in the Hindu Kush Himalaya (HKH) are particularly interesting as the HKH is one of the world's most sensitive regions for climate change. In this work, we: (1) propose a modified version of the U-Net for large-scale, spatially non-overlapping, clean glacial ice, and debris-covered glacial ice segmentation; (2) introduce a novel self-learning boundary-aware loss to improve debris-covered glacial ice segmentation performance; and (3) propose a feature-wise saliency score to understand the contribution of each feature in the multispectral Landsat 7 imagery for glacier mapping. Our results show that the debris-covered glacial ice segmentation model trained using self-learning boundary-aware loss outperformed the model trained using dice loss. Furthermore, we conclude that red, shortwave infrared, and near-infrared bands have the highest contribution toward debris-covered glacial ice segmentation from Landsat 7 images.


Introduction
Glacier delineation using remote sensing imagery has seen a growing use of deep learning in recent years [5,14,15,34,36]. This can be attributed to factors such as the availability of large-scale remote sensing data from multiple sources, the development of state-of-the-art deep learning architectures for image analysis, and the growing interest due to the impacts of climate change on glacier basins within the region and not across the region as a whole.
Glaciers form when snow compresses under its own weight and hardens over long timescales [12]. Near its formation, glacial ice has snow or ice surface cover and is known as clean glacial ice. As glacial ice slowly moves down valleys under gravity, avalanches can deposit debris (rocks and sediment) on top of the glacier. Glacial ice having a significant covering of dirt/rocks/boulders is known as debris-covered glacial ice. Clean glacial ice and debris-covered glacial ice appear differently in remote sensing imagery. The challenge lies in differentiating clean glacial ice from temporary snow/ice cover and debris-covered glacial ice from moraines and the surrounding valleys. The spectral uniqueness of clean glacial ice compared to surrounding terrain makes it relatively easy to identify and localize. However, the delineation of debris-covered glacial ice poses significant challenges because of its non-unique spectral signature.
The earliest approaches for debris-covered glacial ice segmentation involving deep learning used multilayer perceptrons to estimate the supraglacial debris loads of Himalayan glaciers using pre-defined glacier outlines [7,8]. More recent methods for glacier segmentation use Convolutional Neural Networks (CNNs) due to their success in image-based applications [5,25,36,39]. Most recent approaches to learning-based image segmentation use variants of the U-Net architecture [30]. Originally introduced for biomedical image segmentation, the U-Net has seen successes in numerous applications involving satellite image segmentation [23,29,38]. Moreover, different architectures based on the U-Net have also been used for glacier segmentation in recent years [5,36]. However, unlike segmentation for general images, the results when it comes to glacier segmentation are not very good, particularly for debris-covered glacial ice.
Here we present a variant of the U-Net and train it using multispectral images from Landsat 7 as inputs. Researchers have shown that the performance of deep learning models can be improved by learning multiple objectives from a shared representation [11]. Early approaches to learn multiple tasks use weighted sum of losses, where the loss weights are either constant or manually tuned [13,21,31]. We propose a method to combine two different loss functions -masked dice loss [2] and boundary loss [9] -to simultaneously learn multiple objectives automatically during the training process for improved performance.
While deep learning models have been shown to perform well on various tasks involving computer vision, the interpretability of these models is limited. Deep neural networks are often considered black boxes, since their decision rules can not be described easily. Unlike coefficients and decision boundaries of simpler machine learning methods such as linear regression and decision trees, weights of neurons in deep neural networks can not be understood as knowledge directly. The development of transparent, understandable, and explainable models is imperative for the wide-scale adoption of deep learning models. Over the years, many have proposed different approaches to describe deep leaning models [22,37,39]. One of the most widely used methods to envision which pixels in the input image affect the outputs the most is by visualizing saliency maps [33]. A saliency map is obtained by calculating the gradient of the given output class with respect to the input image by letting gradients backpropagate to the input. In the case of multispectral or hyperspectral images, spectral saliency [20] is used to visualize salient pixels of an image. Image saliency maps, computed independently for all channels on a multispectral image, can be used to visualize the contribution of each pixel in each channel toward the final output. We propose a method to quantify each channel's contributions towards the final label in the context of glacier segmentation using Landsat 7 imagery.

Dataset and Methodology
The HKH region covers an area of about 4.2 million km 2 from about 15 • to 39 • N latitude and about 60 • to 105 • E longitude extending across eight countries consisting of Afghanistan, Bangladesh, Bhutan, China, India, Myanmar, Nepal, and Pakistan [4]. The geographic extent of the glaciers within the HKH, however, ranges from about 27 • to 38 • N and about 67 • to 98 • E ( Figure 1).
We downloaded the Landsat 7 images used for label creation using Google Earth Engine. Landsat 7 contains the Enhanced Thematic Mapper Plus (ETM+) sensor which captures multiple spectral bands as shown in Table 1. The thermal infrared bands were upsampled from 60 meters to 30 meters resulting in all bands having a spatial resolution of 30 meters. The glacier outlines (labels) [3] were downloaded from International Centre for Integrated Mountain Development (ICIMOD) Regional Database System. (http://rds.icimod. org/Home/DataDetail?metadataId=31029) The glacier labels contain information on clean-ice and debris-covered glaciers in the HKH for regions within Afghanistan, Bhutan, India, Nepal, and Pakistan. The ICIMOD glacier outline labels used in this research were derived using the objectbased image classification methods separately for clean-ice and debris-covered glaciers and fine-tuned with manual intervention [4]. The Landsat 7 images that were used for delineating glacier labels [4] overlap spatially. To avoid spatial overlap between train and test regions, we created polygon features representing a fishnet of rectangular cells for the entire geographical region. We then created a mosaic of all Landsat 7 images used for labeling into a single raster and clipped the raster mosaic to country boundaries for glacier labels ( Figure 1) to avoid false negative glacier labels in the dataset. Finally, we discarded the rasters within the polygon cells that do not contain any glacier labels and downloaded clipped regions within selected cells. The Google Earth Engine code to replicate this process can be found in repository https: //code.earthengine.google.com/?accept_ repo=users/bibekaryal7/get_hkh_tiff. The selected cells were then randomly split into train, validation, and test sets with no geospatial overlap. 1163 out of 1364 cells were filtered out to leave us with 141, 20, and 40 cells in the training, validation, and test sets respectively. Each cell was then cropped into multiple sub-images of 512 × 512 pixels and the sub-images with less than 10% of pixels as glacier labels were discarded to reduce class imbalance. These sub-images are then normalized and provided as input to the model. There are 333, 68, and 98 sub-images in the training, validation, and test sets respectively. Every pixel within each sub-image can have one of four different classes as can be seen in Figure 2. The step-by-step processing we followed to prepare input features for the model is shown in Figure 3. The distribution of pixels for train, validation, and test set across different classes is shown in Table 2 and highlights that the distribution of pixels across different sets is similar and labels are heavily imbalanced across classes. We used a modified version of the U-Net architecture [30] as shown in Figure 4. Each input subimage is 512 × 512 pixels in size. Zero padding was added during each convolution operation to make the output labels the same size as input sub-images. We replaced the Rectified Linear Unit (ReLU) in the original U-Net architecture with Gaussian Er- ror Linear Units (GELU) [16]. We applied batch normalization after each convolution operation and spatial dropout [35] of 0.1 after each down-sampling and up-sampling block to reduce overfitting. We also randomly modified 15% of the training samples by either rotating (90 • , 180 • , 270 • ) or flipping (horizontal/vertical) the input sub-images to the model. We trained the modified U-Net architecture for 250 epochs using the Adam optimizer and evaluated the performance based on precision, recall, and Intersection over Union (IoU). We trained two separate models, one for segmenting clean glacial ice and one for debris-covered glacial ice, and combined the outputs to produce the final segmentation map. Definitions of what constitutes debris-covered glacial ice vary widely, however, as a glacier does not have to be fully covered by debris to be classified as debris-covered glacial ice [24]. Therefore, for the pixels where debris-covered glacial ice labels overlapped with clean glacial ice labels on the final segmentation map, the output label was set as debris-covered glacial ice. The code to replicate our process can be found in the GitHub repository (https://github. com/Aryal007/glacier_mapping.git).

Self-learning Boundary-aware Loss
The subject of this section of our work lies at the intersection of two branches of research, which are penalizing misalignment of label boundaries by using a boundary-aware loss and learning multi-task weights during the training process. We propose a combined loss (L Combined ) that is a weighted sum of masked dice loss (L M Dice ) and boundary loss (L Boundary ), as described in Equation 1. We also compare the performance of our methods using the modified U-Net to the standard U-Net trained on cross entropy loss (L CE ).
(1) The value of hyperparameter α can be set manually between 0 and 1. Having an α of 1 is equivalent to training the model exclusively using masked dice loss and an α of 0 is the same as training the model exclusively using boundary loss. However, tuning the value of α manually for the best results is an expensive process. In order to learn the weights for L Boundary and L M Dice through backpropagation, we initially set α to 0.5 and let the model find the best value of α. However, we observe that without any constraints on the value of α, the network updates α such that L Combined is minimized without necessarily having to minimize L M Dice or L Boundary . This results in poor performance. Inspired by [18] for weighing two different loss func- *clean = clean glacial ice; debris = debris-covered glacial ice tions, we propose Self-Learning Boundary-Aware loss (L SLBA ) that is a combination of L M Dice and L Boundary .
(2) In the case of L SLBA , α 1 and α 2 both are initially set to 1 and we let the model find the best value for α 1 and α 2 through backpropagation. In Table 3 we show performance for different values of α in the case of L Combined and performance of L SLBA . One advantage of using L SLBA over L Combined is that there is no extra hyperparameter that requires fine-tuning. All experiments in Table 3 use eight features from Landsat 7 imagery as inputs.  Table 3, we see that L SLBA performs the best for debris-covered glacial ice segmentation and eliminates the need to fine-tune loss weights. We can also see that the model fails to converge when training solely on boundary loss (α = 0) and training on glacier boundaries by incorporating boundary loss along with masked dice loss results in an overall improvement in performance for debriscovered glacial ice regardless of the weighting factor. Figure 5 shows the weights for masked dice loss

Representation Analysis
To understand the contribution of each feature in the multispectral image toward the final label, we computed a Saliency Score (SS) for each feature by summing all pixels in the Saliency Map (SM) for that feature.  debris-covered glacial ice segmentation in decreasing order are: red, shortwave infrared 1, near infrared, green, high-gain thermal infrared, shortwave infrared 2, low-gain thermal infrared, and blue. Similarly, for clean glacial ice segmentation, the channel-wise contributions in decreasing order are: shortwave infrared 2, blue, shortwave infrared 1, high-gain thermal infrared, red, low-gain thermal infrared, near infrared, and green. As shown in Figure 6, the segmentation models have different high contributing channels for clean glacial ice and debris-covered glacial ice segmentation.

Discussion
Glaciers have been melting at an unprecedented rate in recent years due to global climate change [17,19]. Glaciers are the largest freshwater reservoir on the planet [32], so it is necessary to understand the changes they undergo.
As a result, numerous approaches to automatically delineate glacier boundaries have been proposed [5,6,14,15,34,36]. We frequently observe deep learning methods outperforming traditional machine learning methods for glacier segmentation in the literature [1,5]. However, the results have not been very good, particularly in the case of debris-covered glacial ice.
In this work, we modify U-Net and train it using a novel loss function that allows the modified U-Net to focus on glacier boundaries. From Table 3, we see that standard U-Net is not able to detect debris-covered glacial ice in input sub-images. We can also see from Table 2 that only 2.44% of pixels in the training set correspond to debris-covered glacial ice. This shows that our proposed method is more robust than the original U-Net to imbalanced labels, which are common in remote sensing datasets.
From Figure 5, we can see how the weights change for L SLBA while training. A higher weight is assigned to masked dice loss at the beginning and the weights for boundary loss are gradually increased during training. The reason behind this could be that for an untrained model, it may be easier to learn glacier instances over trying to learn the boundaries. However once the network learns to label instances, it is easier to learn the glacier boundaries. This also explains why the model fails to converge when training solely on L Boundary from scratch as can be seen from the results in Table 3.
We presented methods to improve debris-covered glacial ice segmentation from remote sensing imagery using deep learning. While we were able to show significant improvements over existing methods, the IoU for debris-covered glacial ice still leaves much to be desired. The existing body of literature on the topic has shown that the performance for debris-covered glacial ice segmentation can be improved by incorporating thermal signatures [28] and topographical information [10,26,27] from other satellites. Since debris-covered glacial ice is common in low-gradient areas due to how it forms and has cooler surface temperatures compared to the surrounding non-glaciated regions, we suspect that adding this information can further help improve the performance of debris-covered glacial ice segmentation. We may also be able to see an improvement in performance by using images from the recently-launched Landsat 9 satellite, instead 6 of the Landsat 7 images used in this work. The Operational Land Imager 2 (OLI-2) and the Thermal Infrared Sensor 2 (TIRS-2) sensors on Landsat 9 provide data that is radiometrically and geometrically superior to instruments on the previous generation Landsat satellites. With the higher radiometric resolution, Landsat 9 can differentiate 16,384 shades of a given wavelength compared to only 256 shades in Landsat 7. Meanwhile, the TIRS-2 in Landsat 9 enables improved atmospheric correction and more accurate surface temperature measurements. Future work includes using the images captured through these improved sensors and incorporating additional information such as a digital elevation model for improving debris-covered glacial ice segmentation performance.

Conclusion
In this research study, we proposed a modified version of the U-Net architecture for large-scale debris-covered glacial ice and clean glacial ice segmentation in the HKH from Landsat 7 multispectral imagery and concluded that debris-covered glacial ice (IoU: 35.94%) is significantly harder to delineate compared to clean glacial ice (IoU: 68.17%)( Table 3). We also introduced two different methods to combine commonly-used masked dice loss and boundary loss to incorporate label boundaries into the training process. We show that the performance of debris-covered glacial ice segmentation can be improved by encouraging the deep learning model to focus on label boundaries. The performance can be improved further by correctly weighing loss terms. Furthermore, the relative weights can be learned automatically from the data during the training process using our proposed loss (L SLBA ). Figure 7 shows the performance of the models trained using L SLBA on a sample image from the test set. We also introduced the concept of feature saliency scores to quantify the contribution of each feature (channel) in the input image toward the final label and concluded that the red, shortwave infrared, and near infrared bands contribute the most towards the final label for debris-covered glacial ice segmentation, while shortwave infrared 2, blue, shortwave infrared 1 bands contributed the most towards the final label for clean glacial ice segmentation.