Detection of forest roads in Sentinel-2 images using U-Net

This paper presents a new method for semiautomatic detection of interventions in nature in Sentinel-2 satellite images with 10 m spatial resolution. The Norwegian Environment Agency is maintaining a map of undisturbed nature in Norway. U-Net was used for automated detection of new roads, as these are often the cause whenever the area of undisturbed nature is reduced. The method was able to detect many new roads, but with some false positives and possibly some false negatives (i.e., missing new roads). In conclusion, we have demonstrated that automated detection of new roads, for the purpose of updating the map of undisturbed nature, is possible. We have also suggested several improvements to the method to increase its usefulness.


Introduction
The Norwegian Environment Agency (in Norwegian: Miljødirektoratet) is maintaining a map of undisturbed nature in Norway. Undisturbed nature is defined as areas 1 km or more from heavy technical interventions, such as roads, railways, built-up areas, power lines, hydropower dams and wind power turbines. Excluding Svalbard, about 45% of Norway's land area is undisturbed nature. The database of undisturbed nature consists of several thousand polygons, of which 2860 are larger than 1 km².
The main source for updating the map of undisturbed nature is the regular update of the national digital map series, based on aerial photography followed by manual photo interpretation. For urban areas, the update cycle is 1-3 years, but for rural areas, forests and mountain areas, the update cycle may be more than five years. Thus, alternative sources for updating the map of undisturbed nature were sought.
* Corresponding Author: trier@nr.no
The Sentinel-2 satellites of the European Space Agency provide weekly coverage of Norway in the form of multispectral images at 10 m resolution. Visual inspection of such images indicated that forest roads and construction roads may be detected by automatic methods. However, due to varying cloud cover, one may need to wait several weeks until a new road is visible in the images, so there may be a delay of several weeks from when a new road is constructed until it can be seen in the Sentinel-2 satellite images. Also, roads may be hidden by snow cover in the winter season. Despite these limitations, it seemed realistic that Sentinel-2 could be used to update the map of undisturbed nature much more frequently, preferably using automated methods.
In recent years, deep neural networks have become the preferred method for many image detection tasks in remote sensing. Salberg et al. [6] and Liu et al. [2] investigated the potential of deep segmentation networks for mapping forest roads from lidar data. Yang et al. [8] used U-Net [5] to detect roads in Google Earth images with 1.2 m resolution. The hypothesis in this study was that U-Net could be trained to detect roads also at the coarser resolution of 10 m, with the advantage of having more spectral bands.

Data
Multispectral satellite data from Sentinel-2 covering southern Norway were used. Each Sentinel-2 satellite carries a multispectral optical sensor that images the ground at up to 10 m spatial resolution in 13 spectral channels with wavelengths from 492 nm to 2202 nm. We used 10 spectral channels (the 10 m and 20 m bands) as input to the deep learning model. The lower-resolution bands were resampled to 10 m spatial resolution using bilinear interpolation. All 13 bands (10 m, 20 m and 60 m) were used for cloud detection. The data are delivered as 100 km × 100 km image tiles projected to the nearest UTM zone, which in this case was zone 32N. Of the 38 tiles, 34 were used for training the parameters of the deep neural network, and two were used for validation during training and were thus part of the data seen during training. The validation tiles were T32VNL and T32VNN, covering the Larvik and Lillehammer areas, respectively. The remaining two tiles, T32VMN and T32VNR, covering the Geilo and Trondheim areas, respectively, were reserved for testing and not seen during training.
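The band selection and resampling can be sketched with NumPy as follows. This is a minimal illustration, not the authors' code: it assumes the standard ESA band names (B02, B03, B04, B08 at 10 m; B05, B06, B07, B8A, B11, B12 at 20 m), a hypothetical stacking order, and that each 20 m band covers exactly half the rows and columns of the 10 m bands.

```python
import numpy as np

# Standard Sentinel-2 band names; the stacking order is an assumption.
BANDS_10M = ["B02", "B03", "B04", "B08"]
BANDS_20M = ["B05", "B06", "B07", "B8A", "B11", "B12"]


def upsample_bilinear_x2(band: np.ndarray) -> np.ndarray:
    """Bilinearly resample a 20 m band onto the 10 m grid (factor 2)."""
    h, w = band.shape
    # Centre of each 10 m output pixel, expressed in 20 m input pixel coordinates.
    ys = np.clip((np.arange(2 * h) + 0.5) / 2 - 0.5, 0, h - 1)
    xs = np.clip((np.arange(2 * w) + 0.5) / 2 - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :]  # horizontal interpolation weights
    top = band[y0][:, x0] * (1 - wx) + band[y0][:, x1] * wx
    bottom = band[y1][:, x0] * (1 - wx) + band[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def stack_bands(bands: dict) -> np.ndarray:
    """Return a (10, H, W) float32 array with all input bands on the 10 m grid."""
    layers = [bands[name].astype(np.float32) for name in BANDS_10M]
    layers += [upsample_bilinear_x2(bands[name].astype(np.float32)) for name in BANDS_20M]
    return np.stack(layers).astype(np.float32)
```

The 60 m bands are left out of the stack because, according to the text, they were only used for cloud detection.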
Vector data of all known roads, ranging from highways to forest roads, were used as ground truth data and converted to a 10 m resolution raster road mask for each 100 km × 100 km tile. Similarly, vector data of lakes and rivers were converted to a 10 m resolution water mask for each tile. The purpose of the water masks was to mask false forest road predictions in dry periods, when rivers and lake shorelines may have exposed bare rock surfaces with spectral signatures similar to gravel roads.
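Converting road vectors to a 10 m road mask, and suppressing predictions inside the water mask, might look like the sketch below. The polyline format, the upper-left `origin`, the quarter-pixel sampling (a simple stand-in for a proper line rasterizer) and the 0.5 probability threshold are all assumptions for illustration.

```python
import numpy as np


def rasterize_lines(lines, origin, shape, pixel_size=10.0):
    """Burn polylines into a boolean mask.

    lines: list of polylines, each a list of (easting, northing) vertices in metres.
    origin: (easting, northing) of the tile's upper-left corner (assumed convention).
    """
    mask = np.zeros(shape, dtype=bool)
    x_ul, y_ul = origin
    for line in lines:
        pts = np.asarray(line, dtype=float)
        for (xa, ya), (xb, yb) in zip(pts[:-1], pts[1:]):
            # Sample each segment about every quarter pixel.
            length = np.hypot(xb - xa, yb - ya)
            n = max(int(np.ceil(4 * length / pixel_size)), 1) + 1
            t = np.linspace(0.0, 1.0, n)
            cols = np.floor((xa + t * (xb - xa) - x_ul) / pixel_size).astype(int)
            rows = np.floor((y_ul - (ya + t * (yb - ya))) / pixel_size).astype(int)
            inside = (rows >= 0) & (rows < shape[0]) & (cols >= 0) & (cols < shape[1])
            mask[rows[inside], cols[inside]] = True
    return mask


def apply_water_mask(road_prob, water_mask, threshold=0.5):
    """Threshold road probabilities and force every water pixel to background."""
    return (road_prob >= threshold) & ~water_mask
```

The same rasterization would serve for lake and river vectors, although lakes are polygons and would need a polygon fill rather than line burning.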

Convolutional Neural Network architecture
U-Net [5] has been used for many image detection problems in remote sensing. We modified the U-Net architecture to accept input images with 10 spectral bands. The network consists of an encoder part and a decoder part. Both are built from a basic block of three convolutional layers with batch normalization and a ReLU activation function. In the encoder part, the images go through four convolution blocks, each followed by downsampling by a factor of two.
In the decoder part, the resulting feature maps go through four convolution blocks, each followed by a bilinear upsampling operation, also by a factor of two. The feature maps then go through one more convolution block before a final convolution and softmax layer that maps each pixel to two channels, giving the scores for forest road and background. Skip connections pass high-resolution feature maps from the encoder to the decoder, so that the network can also make use of local high-resolution information.
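To make the resolutions in this encoder-decoder concrete, the following sketch traces the (channels, height, width) shape of a 256×256 crop through the network. The channel widths (64 to 1024) are hypothetical, since the text does not state them, and concatenating each skip connection right after the corresponding upsampling is an assumption about where the skip connections join.

```python
def unet_shapes(in_bands=10, size=256, widths=(64, 128, 256, 512, 1024), n_classes=2):
    """Trace (channels, height, width) through the described U-Net variant.

    widths are hypothetical channel counts; the text does not state them.
    """
    if size % 2**4:
        raise ValueError("crop size must be divisible by 16 (four 2x downsamplings)")
    trace = [("input", (in_bands, size, size))]
    skips = []  # encoder outputs kept for the skip connections
    s = size
    for c in widths[:4]:
        trace.append(("encoder block", (c, s, s)))
        skips.append((c, s))
        s //= 2
        trace.append(("downsample x2", (c, s, s)))
    for c in (widths[4], widths[3], widths[2], widths[1]):
        trace.append(("decoder block", (c, s, s)))
        s *= 2
        skip_c, skip_s = skips.pop()  # encoder map at the same resolution
        assert skip_s == s
        trace.append(("upsample x2 + concat skip", (c + skip_c, s, s)))
    trace.append(("extra block", (widths[0], s, s)))
    trace.append(("final conv + softmax", (n_classes, s, s)))
    return trace


for name, shape in unet_shapes():
    print(f"{name:28s} {shape}")
```

The trace shows why the crop size must be a multiple of 16: four halvings take a 256×256 crop down to 16×16 before the decoder restores the full resolution.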

Training procedure
The U-Net was trained and tested on cloud-free Sentinel-2 image tiles of southern Norway acquired from 2018 to 2021, in the months April to September. Images with snow-covered roads were not used. We used a cross-entropy loss function. The optimization was done with the Adam algorithm [1] with a learning rate of 0.004. An iteration is the process of running one batch through the network. We trained the network with 5,000 batches of 16 random image crops of 256×256 pixels at 10 m resolution, validated the model every 500th iteration, and selected the checkpoint with the best validation accuracy. Each batch was generated from random samples of random Sentinel-2 images, and such random batches were generated until the total number of batches (5,000) was reached.
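The batch sampling and checkpoint selection can be sketched in a framework-independent way. The constants come from the text; `train_step` and `validate` are hypothetical callables standing in for one Adam step on the cross-entropy loss and for evaluation on the validation tiles. In a real framework, the returned state should be a copy of the weights.

```python
import numpy as np

BATCH_SIZE = 16       # crops per batch (from the text)
CROP = 256            # crop size in pixels at 10 m (from the text)
N_ITERATIONS = 5000   # total number of batches (from the text)
VALIDATE_EVERY = 500  # validation interval in iterations (from the text)


def sample_batch(images, masks, rng, batch_size=BATCH_SIZE, crop=CROP):
    """Draw batch_size random crops, each from a randomly chosen image.

    images: list of (bands, H, W) arrays; masks: list of matching (H, W) label masks.
    """
    n_bands = images[0].shape[0]
    xb = np.empty((batch_size, n_bands, crop, crop), dtype=np.float32)
    yb = np.empty((batch_size, crop, crop), dtype=np.int64)
    for i in range(batch_size):
        k = rng.integers(len(images))
        _, h, w = images[k].shape
        r = rng.integers(h - crop + 1)
        c = rng.integers(w - crop + 1)
        xb[i] = images[k][:, r:r + crop, c:c + crop]
        yb[i] = masks[k][r:r + crop, c:c + crop]
    return xb, yb


def train(images, masks, train_step, validate, n_iterations=N_ITERATIONS,
          validate_every=VALIDATE_EVERY, batch_size=BATCH_SIZE, crop=CROP, seed=0):
    """Run n_iterations random batches and keep the best-validating checkpoint.

    train_step(xb, yb) -> state and validate(state) -> accuracy are hypothetical.
    """
    rng = np.random.default_rng(seed)
    best_acc, best_state = -np.inf, None
    for it in range(1, n_iterations + 1):
        state = train_step(*sample_batch(images, masks, rng, batch_size, crop))
        if it % validate_every == 0:
            acc = validate(state)
            if acc > best_acc:
                best_acc, best_state = acc, state
    return best_acc, best_state
```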

Prediction pipeline
The detection method was then implemented in an automatic pilot service. Each evening, new Sentinel-2 images covering southern Norway, and with less than 10% cloud cover, were downloaded from https://scihub.copernicus.eu/dhus/. The following processing steps were used:
The pilot service was run this summer until 1 October, both to demonstrate the usefulness of an automatic detection service and to identify necessary improvements.
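The only selection rule stated for the nightly download is the 10% cloud-cover limit. A minimal sketch of such a filter, assuming a hypothetical `Scene` metadata record (real metadata would come from the download service), could be:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Scene:
    """Hypothetical scene metadata record."""
    tile: str
    acquired: date
    cloud_cover_pct: float


def scenes_to_process(scenes, tiles, max_cloud_pct=10.0, since=None):
    """Scenes over the given tiles with cloud cover below the limit, oldest first.

    since: if given, only scenes acquired after this date (e.g. the previous run).
    """
    selected = (
        s for s in scenes
        if s.tile in tiles
        and s.cloud_cover_pct < max_cloud_pct
        and (since is None or s.acquired > since)
    )
    return sorted(selected, key=lambda s: (s.acquired, s.tile))
```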

Use of gradient of digital terrain model
We wanted to investigate whether the detection results could be improved by adding an extra input band derived from a digital terrain model (DTM). A DTM of Norway in 10 m resolution and in UTM zone 33N was downloaded from https://hoydedata.no. The DTM is derived from airborne lidar data with 2-10 points per m². From the DTM, a gradient image G was obtained using the Sobel operator. To give the gradient image roughly the same value range as the spectral bands, it was scaled as 0.1·√G.
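The extra band can be sketched as follows. The text does not say whether G was normalized by the pixel spacing or by the Sobel weight sum, so this sketch assumes the raw Sobel gradient magnitude, computed on a DTM that has already been resampled from UTM zone 33N to the zone 32N Sentinel-2 grid.

```python
import numpy as np


def sobel_gradient_magnitude(dtm):
    """Gradient magnitude G of a DTM from the 3x3 Sobel operator, edge-replicated borders."""
    p = np.pad(np.asarray(dtm, dtype=float), 1, mode="edge")
    # Horizontal Sobel: right column minus left column, row weights 1, 2, 1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Vertical Sobel: bottom row minus top row, column weights 1, 2, 1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)


def dtm_input_band(dtm):
    """Extra U-Net input band 0.1 * sqrt(G)."""
    return 0.1 * np.sqrt(sobel_gradient_magnitude(dtm))
```

An 11-band input would then be formed by appending `dtm_input_band(dtm)[None]` to the 10 spectral bands with `np.concatenate(..., axis=0)`. On a surface rising 1 m per pixel, G is 8 (the Sobel weights sum to 4 on each side), so the band value is about 0.28.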

Detection of existing roads in Sentinel-2 images
By running the method on the test data, rates of true positive, false positive and false negative predictions were obtained (Table 1). Two versions of the ground truth data were used. When tractor roads, i.e., the smallest forest roads, were included in both training and testing, the true positive rate dropped from 41 % to 29 % compared with when tractor roads were excluded from training and testing (Table 1). At the same time, the false positive rate was slightly higher when tractor roads were included (7 %) than when they were excluded (6 %). The decreased detection performance may be because tractor roads are less visible in the satellite data, which makes them harder to learn than the larger roads. This was also indicated by the best validation scores during training, which were 70 % for roads excluding tractor roads and 66 % for roads including tractor roads. When a buffer was added to all roads, so that pixels next to a road were ignored during training and testing, the true positive rate increased from 41 % to 57 % (Table 2). However, the false positive rate increased at the same time from 6 % to 18 %. Another observation was that the true positive rate varied between acquisition dates for the same tile. For T32VMN, the true positive rate varied from 53 % to 64 %, and for T32VNR, from 48 % to 68 % (Table 2, columns 3-5). This indicates that combining several acquisition dates may increase the detection performance compared with using a single acquisition date.
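The text does not define the rates exactly, so the following pixel-wise scoring is a sketch under stated assumptions: the true positive rate is TP/(TP+FN), the false negative rate is its complement, and the false positive rate is assumed to be FP/(TP+FP), the fraction of predicted road pixels outside known roads. The ignore buffer excludes pixels within `buffer_px` pixels of a road but not the road pixels themselves.

```python
import numpy as np


def dilate(mask, radius):
    """Binary dilation with a (2*radius+1) x (2*radius+1) square, via shifted copies."""
    h, w = mask.shape
    p = np.pad(mask, radius)  # pads with False
    out = np.zeros_like(mask)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= p[dy:dy + h, dx:dx + w]
    return out


def road_scores(pred, road, buffer_px=0):
    """Pixel-wise rates, ignoring non-road pixels within buffer_px of a road.

    The false positive rate definition FP / (TP + FP) is an assumption.
    """
    if buffer_px:
        valid = ~(dilate(road, buffer_px) & ~road)
    else:
        valid = np.ones_like(road, dtype=bool)
    tp = np.sum(pred & road & valid)
    fn = np.sum(~pred & road & valid)
    fp = np.sum(pred & ~road & valid)
    tpr = tp / (tp + fn)
    return {"tpr": tpr, "fnr": 1 - tpr, "fpr": fp / (tp + fp)}
```

With this definition, a prediction that is one pixel off a road counts as a false positive without the buffer but is simply ignored with it.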

Detection of existing roads in Sentinel-2 images and DTM gradient
When U-Net was trained on all roads except tractor roads, with a buffer on all roads, and the DTM gradient was added as an extra input band, the true positive rate increased from 57 % to 61 %.

Detection of new roads
However, the main purpose of the detection method was not to find existing roads, but to find new roads, built after 1 January 2018 and not present in the current vector data. Since correct ground truth for new roads was not available, the predicted roads could only be assessed by visual inspection of the satellite images. In tile T32VMN, there were several predictions of new roads that seemed to be correct (Figure 1a-b). However, there were also predictions that were wrong (Figure 1c-e). Small creeks that were not included in the water mask were often mistaken for small roads. Also, ridges of bare rock and gravel landslides were sometimes mistaken for roads if they were long and narrow. In tile T32VNR, there were also many predictions of new roads that appeared to be correct (Figure 2). In one of these (Figure 2a), only parts of the new road were predicted. However, by using another image acquisition date, almost all parts of the new road were predicted.
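Candidate new roads are predicted road pixels that are explained neither by the existing road vectors nor by the water mask. A minimal sketch, assuming a hypothetical one-pixel tolerance around known roads to absorb small misregistrations:

```python
import numpy as np


def dilate(mask, radius):
    """Binary dilation with a (2*radius+1) x (2*radius+1) square, via shifted copies."""
    h, w = mask.shape
    p = np.pad(mask, radius)  # pads with False
    out = np.zeros_like(mask)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= p[dy:dy + h, dx:dx + w]
    return out


def new_road_candidates(pred, known_roads, water, tolerance_px=1):
    """Predicted road pixels away from known roads and outside the water mask.

    tolerance_px is a hypothetical margin around the rasterized road vectors.
    """
    return pred & ~dilate(known_roads, tolerance_px) & ~water
```

As the inspection above shows, such candidates still need review, since creeks missing from the water mask, bare-rock ridges and gravel slides pass this test too.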

Detection using DTM gradient and Sentinel-2 images
By adding an extra band with the gradient of the DTM to the spectral bands from Sentinel-2, some of the false predictions of new roads disappeared. The number of false predictions was reduced in the following situations: creeks (Figure 3), elongated gravel slides (Figure 4) and ridges with bare rock surfaces (Figure 5). In addition, a few new roads were more completely predicted (Figure 6). However, some false detections of creeks, ridges and gravel slides remained, which the DTM gradient was not able to eliminate. Due to lack of time, we were not able to manually count all the improvements in the predictions of new roads.

Discussion and Conclusions
The false negative rate of the proposed method was high (43 % without the DTM gradient and 39 % with the DTM gradient) when applied to individual image acquisitions. However, it was observed that a road missed at one acquisition date could be predicted at another date. Thus, a possible improvement of the method could be to accumulate road predictions from several image acquisition dates. The false positive rate (18 % without the DTM gradient and 23 % with the DTM gradient) seemed to be fairly high. However, this number includes both correct predictions of new roads, which are absent from the ground truth, and false road predictions. Obtaining the true positive rate for new roads may be difficult, since new roads could also be missed by manual inspection of the satellite images.
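Accumulating predictions over several acquisition dates could be as simple as a per-pixel vote. In this sketch, `min_votes=1` gives the union (a road found on any date is kept) and higher values trade recall for fewer one-off false positives. The optional `valid` masks, for example cloud-free pixels per date, are an assumption beyond the text.

```python
import numpy as np


def accumulate_predictions(preds, valid=None, min_votes=1):
    """Combine binary road predictions of one tile from several acquisition dates.

    preds: list of (H, W) boolean predictions, one per date.
    valid: optional list of (H, W) masks; detections outside them are not counted.
    """
    stack = np.stack(preds).astype(bool)
    if valid is not None:
        stack &= np.stack(valid).astype(bool)
    return stack.sum(axis=0) >= min_votes
```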
False predictions of roads at the locations of rivers were avoided by using a water mask that included rivers. However, many narrow creeks seemed to be missing from the water mask, so many creeks were falsely predicted as new roads. A possible solution could be to derive a map of drainage channels, or stream network [4], from the detailed DTM of Norway. At the same time, we suspect that additional vector data of creeks exist that we have not obtained, either from the Norwegian Mapping Authority or the Norwegian Water Resources and Energy Directorate.
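One common way to derive a stream network from a DTM is D8 flow accumulation: each cell drains to its steepest downslope neighbour, and cells with a large upstream area are marked as channels. This is a standard technique swapped in for illustration, not necessarily the method of [4]; it does not fill sinks, the `min_cells` threshold is hypothetical, and the pure-Python loop is only practical for small areas.

```python
import numpy as np

# The eight D8 neighbour offsets (dy, dx).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_flow_accumulation(dtm):
    """Number of cells draining through each cell (including itself), D8 rule."""
    h, w = dtm.shape
    acc = np.ones((h, w))
    # Visit cells from highest to lowest so all upstream cells are done first.
    for idx in np.argsort(dtm, axis=None)[::-1]:
        y, x = divmod(int(idx), w)
        best, target = 0.0, None
        for dy, dx in OFFSETS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                slope = (dtm[y, x] - dtm[ny, nx]) / np.hypot(dy, dx)
                if slope > best:  # strictly downhill only; pits keep their flow
                    best, target = slope, (ny, nx)
        if target is not None:
            acc[target] += acc[y, x]
    return acc


def stream_mask(dtm, min_cells=100):
    """Cells with at least min_cells upstream cells: a candidate creek mask."""
    return d8_flow_accumulation(dtm) >= min_cells
```

The resulting mask could be added to the water mask before new-road candidates are extracted.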
Masking of drainage lines could potentially also remove false predictions of roads at long and narrow gravel landslides. Another possible source of false road predictions could be areas where locally delayed snowmelt has reduced the vegetation cover. Whenever these areas are long and narrow, they might be mistaken for new roads.
False predictions of roads at ridges of bare rock could be more difficult to avoid, since new roads could occasionally be located on ridges. However, since ridge lines may also be derived from automatic terrain analysis, masking of ridge lines could be tested. Terrain analysis could also be used to mask areas where new roads are unlikely, due to steep terrain or other unsuitable conditions.
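Masking steep terrain could be sketched as a slope threshold on the DTM. The 30° limit is a hypothetical value, not one from the text, and would need tuning, since roads with switchbacks do cross steep slopes.

```python
import numpy as np


def slope_degrees(dtm, pixel_size=10.0):
    """Terrain slope in degrees from central differences on a DTM in metres."""
    dz_dy, dz_dx = np.gradient(np.asarray(dtm, dtype=float), pixel_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))


def mask_steep(pred, dtm, max_slope_deg=30.0, pixel_size=10.0):
    """Drop predicted road pixels on terrain steeper than max_slope_deg (hypothetical)."""
    return pred & (slope_degrees(dtm, pixel_size) <= max_slope_deg)
```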
Another heuristic rule to remove false road predictions could be to remove predictions that are clearly not connected to the existing road network, since building a new road in isolation is impractical for a number of reasons. Thus, predicted new roads that are, say, more than 200 m from the nearest existing road could be discarded. A gap as large as 200 m may be needed because parts of a new road may be missing from the prediction.
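The connectivity rule can be sketched by labelling the predicted new-road pixels into 8-connected components and keeping only components that come within 200 m (20 pixels at 10 m) of the existing network. The pure-Python labelling and shift-based disk dilation are for illustration; a production version would more likely use an optimized connected-components and distance-transform implementation.

```python
from collections import deque

import numpy as np


def connected_components(mask):
    """Label 8-connected components of a boolean mask; returns (labels, count)."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=int)
    count = 0
    for y, x in zip(*np.nonzero(mask)):
        if labels[y, x]:
            continue
        count += 1
        labels[y, x] = count
        queue = deque([(y, x)])
        while queue:  # breadth-first flood fill
            cy, cx = queue.popleft()
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = cy + dy, cx + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = count
                        queue.append((ny, nx))
    return labels, count


def near_mask(mask, radius):
    """Pixels within Euclidean distance radius (in pixels) of any True pixel."""
    h, w = mask.shape
    p = np.pad(mask, radius)
    out = np.zeros_like(mask)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy * dy + dx * dx <= radius * radius:
                out |= p[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
    return out


def keep_connected(new_roads, existing_roads, max_gap_m=200.0, pixel_size=10.0):
    """Discard predicted new-road components farther than max_gap_m from existing roads."""
    reach = near_mask(existing_roads, int(round(max_gap_m / pixel_size)))
    labels, count = connected_components(new_roads)
    keep = np.zeros(count + 1, dtype=bool)
    keep[np.unique(labels[reach & new_roads])] = True
    keep[0] = False  # background
    return keep[labels]
```

A component is kept whole as soon as any of its pixels lies within the gap, so a long new road only needs to approach the existing network at one end.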
There is also potential to improve the learning of the U-Net by using different loss functions. This could be achieved by adding class weights to the cross-entropy loss function, applying a sampling strategy that ensures that road segments are included in each mini-batch, or using other loss functions such as the adaptive class weighting loss proposed by Liu et al. [3]. Another loss function that may be considered is the clDice loss proposed by Shit et al. [7], which is designed to preserve the topology of tubular and curvilinear structures such as roads.
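Two of these options are sketched below in NumPy for a single image: a class-weighted binary cross-entropy, with hypothetical weights favouring the rare road class, and the soft clDice loss following the soft-skeleton construction of Shit et al. [7] (iterated min-pooling erosion and max-pooling dilation). In training, both would be written in the deep learning framework so that gradients flow; the adaptive class weighting loss of Liu et al. [3] is not shown.

```python
import numpy as np


def weighted_bce(p, y, w_road=10.0, w_bg=1.0, eps=1e-7):
    """Class-weighted binary cross-entropy; the weights are hypothetical."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(w_road * y * np.log(p) + w_bg * (1 - y) * np.log(1 - p))


def _max_pool(x, ky, kx):
    # Stride-1 max pooling with same-size output; -inf padding never wins the max.
    py, px = ky // 2, kx // 2
    p = np.pad(np.asarray(x, dtype=float), ((py, py), (px, px)), constant_values=-np.inf)
    h, w = x.shape
    return np.max([p[dy:dy + h, dx:dx + w] for dy in range(ky) for dx in range(kx)], axis=0)


def soft_erode(x):
    # Min pooling with a cross-shaped neighbourhood (3x1 and 1x3).
    return np.minimum(-_max_pool(-x, 3, 1), -_max_pool(-x, 1, 3))


def soft_dilate(x):
    return _max_pool(x, 3, 3)


def soft_skeleton(x, iterations=10):
    relu = lambda a: np.maximum(a, 0.0)
    skel = relu(x - soft_dilate(soft_erode(x)))  # x minus its soft opening
    for _ in range(iterations):
        x = soft_erode(x)
        delta = relu(x - soft_dilate(soft_erode(x)))
        skel = skel + relu(delta - skel * delta)
    return skel


def cldice_loss(p, y, iterations=10, smooth=1.0):
    """1 - clDice between a soft road prediction p and a binary road mask y."""
    p, y = np.asarray(p, dtype=float), np.asarray(y, dtype=float)
    skel_p, skel_y = soft_skeleton(p, iterations), soft_skeleton(y, iterations)
    t_prec = (np.sum(skel_p * y) + smooth) / (np.sum(skel_p) + smooth)  # topology precision
    t_sens = (np.sum(skel_y * p) + smooth) / (np.sum(skel_y) + smooth)  # topology sensitivity
    return 1.0 - 2.0 * t_prec * t_sens / (t_prec + t_sens)
```

Shit et al. combine clDice with a standard segmentation loss rather than using it alone, so a weighted sum of the two functions above would be a natural starting point.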
In conclusion, we have demonstrated that automated detection of new roads, for the purpose of updating the map of undisturbed nature, is possible. We have also suggested several improvements of the method to increase its usefulness.