Proceedings of the Northern Lights Deep Learning Workshop 2023-01-23T12:39:43+01:00 Benjamin Ricaud Open Journal Systems <p>Deep learning is an emerging subfield of machine learning that has in recent years achieved state-of-the-art performance in image classification, object detection, segmentation, time series prediction and speech recognition, to name a few. This workshop will gather researchers at both a national and international level to exchange ideas, encourage collaborations and present cutting-edge research.</p> Learning to solve arithmetic problems with a virtual abacus 2022-11-23T22:30:49+01:00 Flavio Petruzzellis Ling Xuan Chen Alberto Testolin <p>Acquiring mathematical skills is considered a key challenge for modern Artificial Intelligence systems. Inspired by the way humans discover numerical knowledge, here we introduce a deep reinforcement learning framework that makes it possible to simulate how cognitive agents could gradually learn to solve arithmetic problems by interacting with a virtual abacus. The proposed model successfully learns to perform multi-digit additions and subtractions, achieving an error rate below 1% even when operands are much longer than those observed during training. We also compare the performance of learning agents receiving different amounts of explicit supervision, and we analyze the most common error patterns to better understand the limitations and biases resulting from our design choices.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Flavio Petruzzellis, Ling Xuan Chen, Alberto Testolin Improved Imagery Throughput via Cascaded Uncertainty Pruning on U-Net++ 2022-11-24T11:09:42+01:00 Mingshi Li Zifu Wang Matthew Blaschko <p>The use of machine learning inference in real-life earth observation and remote sensing applications has grown extensively over recent years. 
Network pruning has been carefully studied in various applications to speed up the machine learning workflow, but mainstream pruning strategies often focus on the significance of specific connections rather than on sample difficulty. U-Net++, a well-established and capable deep convolutional neural network architecture for semantic segmentation, and its equivalents all face the challenge of overconfidence, which creates barriers to designing a robust uncertainty-based pruning strategy. In the following study, we analyzed the efficiency of deep neural networks and semantic segmentation in satellite imagery analysis, and proposed a new tailored workflow of dynamic pruning for U-Net++ that combines the ideas of network calibration and uncertainty and defines the inference complexity of network input samples. We tested and illustrated the capability of this new workflow and delivered a comparative study of its effectiveness on the DeepGlobe satellite imagery road extraction dataset, showing how it can greatly reduce computational cost with little performance drop.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Mingshi Li, Zifu Wang, Matthew Blaschko Multi-lingual agents through multi-headed neural networks 2022-11-24T11:45:39+01:00 Jonathan D. Thomas Raúl Santos-Rodríguez Mihai Anca Robert Piechocki <p>This paper considers cooperative Multi-Agent Reinforcement Learning, focusing on emergent communication in settings where multiple pairs of independent learners interact at varying frequencies. In this context, multiple distinct and incompatible languages can emerge. When an agent encounters a speaker of an alternative language, a period of adaptation is required before they can converse efficiently. This adaptation results in the emergence of a new language and the forgetting of the previous one. 
In principle, this is an example of the Catastrophic Forgetting problem, which can be mitigated by enabling the agents to learn and maintain multiple languages. We take inspiration from the Continual Learning literature and equip our agents with multi-headed neural networks, enabling them to be multi-lingual. Our method is empirically validated within a referential MNIST-based communication game and is shown to be able to maintain multiple languages where existing approaches cannot.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Jonathan D. Thomas, Raúl Santos-Rodríguez, Mihai Anca, Robert Piechocki 3D Masked Modelling Advances Lesion Classification in Axial T2w Prostate MRI 2022-11-23T13:14:28+01:00 Alvaro Fernandez-Quilez Christoffer Gabrielsen Andersen Trygve Eftestøl Svein Reidar Kjosavik Ketil Oppedal <p>Masked Image Modelling (MIM) has been shown to be an efficient self-supervised learning (SSL) pre-training paradigm when paired with transformer architectures and in the presence of a large amount of unlabelled natural images. The combination of the difficulty of accessing and obtaining large amounts of labeled data with the availability of unlabelled data in the medical imaging domain makes MIM an interesting approach for advancing deep learning (DL) applications based on 3D medical imaging data. Nevertheless, SSL and, in particular, MIM applications with medical imaging data are rather scarce and there is still uncertainty around the potential of such a learning paradigm in the medical domain. We study MIM in the context of Prostate Cancer (PCa) lesion classification with T2-weighted (T2w) axial magnetic resonance imaging (MRI) data. 
In particular, we explore the effect of MIM when coupled with convolutional neural networks (CNNs) under different conditions, such as different masking strategies, and obtain better results in terms of AUC than other pre-training strategies such as ImageNet weight initialization.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Alvaro Fernandez-Quilez, Christoffer Gabrielsen, Trygve Eftestøl, Svein Reidar Kjosavik, Ketil Oppedal Automatic Postoperative Brain Tumor Segmentation with Limited Data using Transfer Learning and Triplet Attention 2022-11-24T16:10:56+01:00 Jingpeng Li Atle Bjørnerud <p>Accurate brain tumor segmentation is clinically important for diagnosis and treatment planning. Convolutional neural networks (CNNs) have achieved promising performance in various visual recognition tasks. Training such networks usually requires large amounts of labeled data, which is often challenging to obtain for medical applications. In this work, we address the segmentation problem by applying transfer learning to downstream segmentation tasks. Specifically, we explore how knowledge acquired from a large preoperative dataset can be transferred to postoperative tumor segmentation on a smaller dataset. To this end, we have developed a 3D CNN for brain tumor segmentation and fine-tuned the pretrained models on the target domain data. To better exploit inter-channel and spatial information, triplet attention has been incorporated and extended into the existing segmentation network. 
Extensive experiments on our dataset demonstrate the effectiveness of transfer learning and attention modules for improved postoperative tumor segmentation performance when only a limited amount of annotated data is available.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Jingpeng Li, Atle Bjørnerud Evaluating current state of monocular 3D pose models for golf 2022-11-23T15:06:32+01:00 Christian Keilstrup Ingwersen Janus Nørtoft Jensen Morten Rieger Hannemose Anders Bjorholm Dahl <p>Monocular 3D human pose estimation has reached an impressive performance. State-of-the-art models predict joint locations that can be accurately reprojected back into the image, resulting in visually convincing detections. However, our aim is to use the predicted poses in a domain with high-frequency movements, that is, for video of athletes performing golf swings. Our investigation is based on accurate marker-based motion capture data. Also, for our data, the predicted 3D joint locations look convincing when we reproject them into the image. However, by quantitatively comparing the results with the motion capture data, we see model errors that are too large for any kinematic analysis of the movements. Thus we conclude that the current models cannot be used out of the box for advanced golf analytics.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Christian Keilstrup Ingwersen, Janus Nørtoft Jensen, Morten Rieger Hannemose, Anders Bjorholm Dahl Efficient Self-Supervision using Patch-based Contrastive Learning for Histopathology Image Segmentation 2022-11-23T15:47:01+01:00 Nicklas Boserup Raghavendra Selvan <p>Learning discriminative representations of unlabelled data is a challenging task. Contrastive self-supervised learning provides a framework to learn meaningful representations using learned notions of similarity measures from simple pretext tasks. 
In this work, we propose a simple and efficient framework for self-supervised image segmentation using contrastive learning on image patches, without using explicit pretext tasks or any further labeled fine-tuning. A fully convolutional neural network (FCNN) is trained in a self-supervised manner to discern features in the input images and obtain confidence maps which capture the network's belief about the objects belonging to the same class. Positive and negative patches are sampled based on the average entropy in the confidence maps for contrastive learning. Convergence is assumed when the information separation between the positive patches is small and that between the positive-negative pairs is large. The proposed model consists only of a simple FCNN with 10.8k parameters and requires about 5 minutes to converge on the high-resolution microscopy datasets, which is orders of magnitude smaller than relevant self-supervised methods attaining similar performance. We evaluate the proposed method on the task of segmenting nuclei from two histopathology datasets, and show comparable performance with relevant self-supervised and supervised methods.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Nicklas Boserup, Raghavendra Selvan Multi-modal data generation with a deep metric variational autoencoder 2022-11-23T22:03:22+01:00 Josefine Vilsbøll Sundgaard Morten Rieger Hannemose Søren Laugesen Peter Bray James Harte Yosuke Kamide Chiemi Tanaka Rasmus R. Paulsen Anders Nymark Christensen <p>We present a deep metric variational autoencoder for multi-modal data generation. The variational autoencoder employs triplet loss in the latent space, which allows for conditional data generation by sampling new embeddings in the latent space within each class cluster. The approach is evaluated on a multi-modal dataset consisting of otoscopy images of the tympanic membrane with corresponding wideband tympanometry measurements. 
The modalities in this dataset are correlated, as they represent different aspects of the state of the middle ear, but they do not exhibit a direct pixel-to-pixel correlation. The approach shows promising results for the conditional generation of pairs of images and tympanograms, and will allow for efficient data augmentation of data from multi-modal sources.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Josefine Vilsbøll Sundgaard, Morten Rieger Hannemose, Søren Laugesen, Peter Bray, James Harte, Yosuke Kamide, Chiemi Tanaka, Rasmus R. Paulsen, Anders Nymark Christensen Solving Nonlinear Conservation Laws of Partial Differential Equations Using Graph Neural Networks 2022-11-23T22:44:40+01:00 Qing Li Jiahui Geng Steinar Evje Chunming Rong <p>Nonlinear Conservation Laws of Partial Differential Equations (PDEs) are widely used in different domains. Solving these types of equations is a significant and challenging task. Graph Neural Networks (GNNs) have recently been established as fast and accurate alternatives to principled solvers when applied to standard equations with regular solutions. There have been few investigations of GNNs applied to complex PDEs with nonlinear conservation laws. Herein, we explore GNNs to solve the following problem</p> <p><em>u</em><sub>t</sub> + f(<em>u</em>, <em>β</em>)<sub>x</sub> = 0 </p> <p>where f(<em>u</em>, <em>β</em>) is the nonlinear flux function of the scalar conservation law, <em>u</em> is the main variable, and <em>β</em> is the physical parameter. The main challenge of nonlinear conservation laws is that solutions typically develop shocks. That is, one or several jumps of the form (<em>u</em><sub>L</sub>, <em>u</em><sub>R</sub>) with <em>u</em><sub>L</sub> ≠ <em>u</em><sub>R</sub> move in space and possibly change over time, such that information about f(<em>u</em>) in the interval associated with such a jump is not present in the observation data. 
We demonstrate that GNNs can achieve accurate estimates of PDE solutions based on new initial conditions and physical parameters within a specific parameter range. </p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Qing Li, Jiahui Geng, Steinar Evje, Chunming Rong Automatic Consistency Checking of Table and Text in Financial Documents 2022-11-24T11:28:41+01:00 Syed Musharraf Ali Tobias Deußer Sebastian Houben Lars Hillebrand Tim Metzler Rafet Sifa <p>A company's financial documents use tables along with text to organize data containing key performance indicators (KPIs) (such as profit and loss) and the financial quantities linked to them. The quantity linked to a KPI in a table might not equal the quantity of the similarly described KPI in the text. Auditors spend substantial time manually auditing these financial mistakes, a process called consistency checking. In contrast to existing work, this paper attempts to automate this task with the help of transformer-based models. Furthermore, for consistency checking it is essential that the embeddings of the table's KPIs encode both the semantic knowledge of the KPIs and the structural knowledge of the table. Therefore, this paper proposes a pipeline that uses a tabular model to obtain the embeddings of the table's KPIs. The pipeline takes table and text KPIs as input, generates their embeddings, and then checks whether these KPIs are identical. The pipeline is evaluated on financial documents in the German language, and a comparative analysis of the quality of the cell embeddings from the three tabular models is also presented. 
From the evaluation results, the experiment that used English-translated text and table KPIs and the Tabbie model to generate the table KPIs' embeddings achieved an accuracy of 72.81% on the consistency checking task, outperforming the benchmark and the other tabular models.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Syed Musharraf Ali, Tobias Deußer, Sebastian Houben, Lars Hillebrand, Tim Metzler, Rafet Sifa A contrastive learning approach for individual re-identification in a wild fish population 2022-11-24T15:56:54+01:00 Ørjan Langøy Olsen Tonje Knutsen Sørdalen Morten Goodwin Ketil Malde Kristian Muri Knausgård Kim Tallaksen Halvorsen <p>In both terrestrial and marine ecology, physical tagging is a frequently used method to study population dynamics and behavior. However, such tagging techniques are increasingly being replaced by individual re-identification using image analysis.</p> <p>This paper introduces a contrastive learning-based model for identifying individuals. The model uses the first parts of the Inception v3 network, supported by a projection head, and we use contrastive learning to find similar or dissimilar image pairs from a collection of uniform photographs. We apply this technique to the corkwing wrasse, Symphodus melops, an ecologically and commercially important fish species. 
Photos are taken during repeated catches of the same individuals from a wild population, where the intervals between individual sightings might range from a few days to several years.</p> <p>Our model achieves a one-shot accuracy of 0.35, a 5-shot accuracy of 0.56, and a 100-shot accuracy of 0.88 on our dataset.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Ørjan Langøy Olsen, Tonje Knutsen Sørdalen, Morten Goodwin, Ketil Malde, Kristian Muri Knausgård, Kim Tallaksen Halvorsen Simplifying Clustering with Graph Neural Networks 2022-11-23T13:57:40+01:00 Filippo Maria Bianchi <p>The objective functions used in spectral clustering are generally composed of two terms: i) a term that minimizes the local quadratic variation of the cluster assignments on the graph; and ii) a term that balances the clustering partition and helps avoid degenerate solutions. <br />This paper shows that a graph neural network, equipped with suitable message passing layers, can generate good cluster assignments by optimizing only a balancing term.<br />Results on attributed graph datasets show the effectiveness of the proposed approach in terms of clustering performance and computation time.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Filippo Maria Bianchi Explainability in subgraphs-enhanced Graph Neural Networks 2022-11-23T15:33:26+01:00 Michele Guerra Indro Spinelli Simone Scardapane Filippo Maria Bianchi <p>Recently, subgraphs-enhanced Graph Neural Networks (SGNNs) have been introduced to enhance the expressive power of Graph Neural Networks (GNNs), which was proved to be no higher than the 1-dimensional Weisfeiler-Leman isomorphism test. <br />The new paradigm suggests using subgraphs extracted from the input graph to improve the model's expressiveness, but the additional complexity exacerbates an already challenging problem in GNNs: explaining their predictions. In this work, we adapt PGExplainer, one of the most recent explainers for GNNs, to SGNNs. 
The proposed explainer accounts for the contributions of all the different subgraphs and can produce meaningful explanations that humans can interpret. Experiments performed on both real and synthetic datasets show that our framework successfully explains the decision process of an SGNN on graph classification tasks.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Michele Guerra, Indro Spinelli, Simone Scardapane, Filippo Maria Bianchi RIDDLE: Rule Induction with Deep Learning 2022-11-23T16:08:56+01:00 Cosimo Persia Ricardo Guimarães <p>Numerous applications rely on the efficiency of Deep Learning models to address complex classification tasks for critical decision-making. However, we may not know how each feature in these models contributes towards the prediction. In contrast, Rule Induction algorithms provide an interpretable way to extract patterns from data, but traditional approaches suffer in terms of scalability. In this work, we bridge Deep Learning and Rule Induction and define the RIDDLE (Rule Induction with Deep Learning) architecture. We show through an empirical evaluation that RIDDLE achieves state-of-the-art performance in Rule Induction.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Cosimo Persia, Ricardo Guimarães Improving Wind Speed Uncertainty Forecasts Using Recurrent Neural Networks 2022-11-23T22:36:28+01:00 Juri Backes Wolfgang Renz <p>For the integration of growing amounts of volatile renewable energy into the European electricity system, reliable weather prognoses gain importance. However, depending on weather conditions, the reliability of wind speed forecasts used to predict wind power can vary drastically over time. Thus, risk-aware system operation strategies based on wind speed uncertainty are increasingly relevant; a measure of this uncertainty is provided by the standard deviations of the ensemble forecasts of the German Weather Service. 
However, the lacking validity of this measure is a long-standing problem.<br>This work therefore investigates how machine learning, based on a suitably selected set of physical quantities from weather ensemble data as well as historic wind data, allows for a more realistic uncertainty quantification. A recurrent neural network (RNN) based sequence-to-sequence architecture is implemented and probabilistic wind speed forecasts are generated for a region in northern Germany.<br>The results are evaluated and compared with the forecasts of the German Weather Service, revealing improved validity of such deep-learning-based uncertainty measures.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Juri Backes, Wolfgang Renz Using deep convolutional neural networks to predict patients age based on ECGs from an independent test cohort 2022-11-24T11:15:46+01:00 Bjørn-Jostein Singstad Belal Tavashi <p>Electrocardiography is one of the most frequently used methods to evaluate cardiovascular diseases. However, the last decade has shown that deep convolutional neural networks (CNNs) can extract information from the electrocardiogram (ECG) that goes beyond traditional diagnostics, such as predicting a person's age. In this study, we trained two different 1-dimensional CNNs on open datasets to predict age from a person's ECG.</p> <p>The models were trained and validated using 10-second 12-lead ECG records, resampled to 100 Hz. 59,355 ECGs were used for training and cross-validation, while 21,748 ECGs from a separate cohort were used as the test set. We compared the performance achieved in cross-validation with the performance on the test set. 
Furthermore, we used cardiologist-annotated cardiovascular conditions to categorize the patients in the test set, in order to assess whether certain cardiac conditions lead to greater discrepancies between CNN-predicted age and chronological age.</p> <p>The best CNN model, using an Inception Time architecture, showed a significant drop in performance, in terms of mean absolute error (MAE), from cross-validation on the training set (7.90 ± 0.04 years) to the test set (8.3 years). On the other hand, the mean squared error (MSE) improved from the training set (117.5 ± 2.7 years^2) to the test set (111 years^2). We also observed that the cardiovascular condition with the highest deviation between predicted and chronological age, in terms of MAE, was pacing rhythm (10.5 years), while patients with a prolonged QT interval had the smallest deviation (7.4 years).</p> <p>This work contributes to existing knowledge of age prediction using deep CNNs on ECGs by showing how a trained model performs on a test set from a cohort separate from that used for training.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Bjørn-Jostein Singstad, Belal Tavashi Contrastive learning for unsupervised medical image clustering and reconstruction 2022-11-24T11:51:05+01:00 Matteo Ferrante Tommaso Boccato Andrea Duggento Simeon Spasov Nicola Toschi <p>The lack of large labeled medical imaging datasets, along with significant inter-individual variability compared to clinically established disease classes, poses significant challenges to exploiting medical imaging information in a precision medicine paradigm, where in principle dense patient-specific data can be employed to formulate individual predictions and/or to stratify patients into finer-grained groups that may follow more homogeneous trajectories and therefore empower clinical trials. 
In order to efficiently explore, in an unsupervised manner, the effective degrees of freedom underlying variability in medical images, in this work we propose an unsupervised autoencoder framework augmented with a contrastive loss to encourage high separability in the latent space. The model is validated on (medical) benchmark datasets. Since a cluster label is assigned to each example according to its cluster assignment, we compare performance with a supervised transfer learning baseline. Our method achieves performance similar to the supervised architecture, indicating that separation in the latent space reproduces expert medical observer-assigned labels. The proposed method could be beneficial for patient stratification, for exploring new subdivisions of larger classes or pathological continua, or, due to its sampling abilities in a variational setting, for data augmentation in medical image processing.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Matteo Ferrante, Tommaso Boccato, Andrea Duggento, Simeon Spasov, Nicola Toschi FastDTI: Drug-Target Interaction Prediction using Multimodality and Transformers 2022-11-23T13:43:22+01:00 Mathijs Boezer Maryam Tavakol Zahra Sajadi <p>Recent advances in machine learning have proved effective in the application of drug discovery by predicting the drugs that are likely to interact with a protein target of a certain disease, helping to prioritize drug development and re-purposing efforts. State-of-the-art techniques in Drug-Target Interaction (DTI) prediction are often computationally expensive and can only be trained on small specialized datasets. In this paper, we propose a novel architecture, called FastDTI, utilizing pretrained transformers and graph neural networks in a self-supervised manner on large-scale (unlabeled) data, which additionally allows for the embedding of multimodal input representations of both drug and protein properties. 
An extensive empirical study demonstrates that our approach outperforms state-of-the-art DTI methods on the KIBA benchmark dataset while greatly improving the computational cost of training, being about 200 times faster.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Mathijs Boezer, Maryam Tavakol, Zahra Sajadi Measuring Adversarial Robustness using a Voronoi-Epsilon Adversary 2022-11-24T16:14:50+01:00 Hyeongji Kim Pekka Parviainen Ketil Malde <p>Previous studies on robustness have argued that there is a tradeoff between accuracy and adversarial accuracy. The tradeoff can be inevitable even when we neglect generalization. We argue that the tradeoff is inherent to the commonly used definition of adversarial accuracy, which uses an adversary that can construct adversarial points constrained by $\epsilon$-balls around data points. As $\epsilon$ gets large, the adversary may use real data points from other classes as adversarial examples. We propose a Voronoi-epsilon adversary which is constrained both by Voronoi cells and by $\epsilon$-balls. This adversary balances two notions of perturbation. As a result, adversarial accuracy based on this adversary avoids a tradeoff between accuracy and adversarial accuracy on training data even when $\epsilon$ is large. Finally, we show that a nearest neighbor classifier is the maximally robust classifier against the proposed adversary on the training data.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Hyeongji Kim, Pekka Parviainen, Ketil Malde Dense FixMatch: a simple semi-supervised learning method for pixel-wise prediction tasks 2022-11-23T15:14:16+01:00 Miquel Martí i Rabadán Alessandro Pieropan Hossein Azizpour Atsuto Maki <p>We propose Dense FixMatch, a simple method for online semi-supervised learning of dense and structured prediction tasks, combining pseudo-labeling and consistency regularization via strong data augmentation. 
We enable the application of FixMatch to semi-supervised learning problems beyond image classification by adding a matching operation on the pseudo-labels. This allows us to still use the full strength of data augmentation pipelines, including geometric transformations. We evaluate it on semi-supervised semantic segmentation on Cityscapes and Pascal VOC with different percentages of labeled data, and ablate design choices and hyper-parameters. Dense FixMatch significantly improves results compared to supervised learning using only labeled data, approaching its performance with 1/4 of the labeled samples.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Miquel Martí i Rabadán, Alessandro Pieropan, Hossein Azizpour, Atsuto Maki Contradiction Detection in Financial Reports 2022-11-23T15:50:32+01:00 Tobias Deußer Maren Pielka Lisa Pucknat Basil Jacob Tim Dilmaghani Mahdis Nourimand Bernd Kliem Rüdiger Loitz Christian Bauckhage Rafet Sifa <p>Finding and amending contradictions in a financial report is crucial for the publishing company and its financial auditors. To automate this process, we introduce a novel approach that incorporates informed pre-training into its transformer-based architecture to infuse the model with additional Part-Of-Speech knowledge. Furthermore, we fine-tune the model on the public Stanford Natural Language Inference Corpus and our proprietary financial contradiction dataset. It achieves an exceptional contradiction detection F1 score of 89.55% on our real-world financial contradiction dataset, beating our baselines by a considerable margin. 
During the model selection process we also tested various financial-document-specific transformer models and found that they underperform more general embedding approaches.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Tobias Deußer, Maren Pielka, Lisa Pucknat, Basil Jacob, Tim Dilmaghani, Mahdis Nourimand, Bernd Kliem, Rüdiger Loitz, Christian Bauckhage, Rafet Sifa Questionable Practices in Methodological Deep Learning Research 2022-11-23T22:14:11+01:00 Daniel J. Trosten <p>Evaluation of new methodology in deep learning (DL) research is typically done by reporting point estimates of a few performance metrics, calculated from a single training run. This paper argues that this frequently used evaluation protocol in DL is fundamentally flawed; it presents 8 questionable practices that are widely adopted in the evaluation of new DL methods. The questionable practices are derived from violations of statistical principles of the scientific method, and from Hansson's definition of pseudoscience. A survey of recent publications from a top-tier DL conference indicates the widespread adoption of these practices in state-of-the-art DL research. Lastly, arguments in favor of the questionable practices, possible reasons for their adoption, and measures that have been taken to remove them, are discussed.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Daniel J. Trosten Signal and Visual Approaches for Parkinson's Disease Detection from Spiral Drawings 2022-11-23T22:52:10+01:00 Manuel Gil-Martín Cristina Luna-Jiménez Fernando Fernández-Martínez Rubén San-Segundo <p>The development of medical decision-support technologies that provide accurate biomarkers to physicians is an important research area. For example, in the case of Parkinson's Disease (PD), the current supervision of patients is intrusive, occasional, and subjective. 
However, new technologies such as wearable devices, signal processing, computer vision, and deep learning could offer a non-intrusive, continuous, and objective solution to help physicians with patient monitoring. The public Parkinson's Disease Spiral Drawings dataset was selected to address PD detection in this work by comparing four representation methods for the X, Y, and pressure time series: signal, visual, hand-crafted, and fusion. The signal approach uses the Fast Fourier Transform of recording windows and a Convolutional Neural Network for modeling; the visual strategy employs visual transformer features from gray-scale images; the hand-crafted technique utilizes statistics calculated from the temporal signals; and the fusion combines the information from the previous approaches. In these procedures, a Random Forest classifier was used for PD detection using the attributes extracted from each type of representation. The best results showed F1 scores of 93.33% and 93.06% at the user level, using the signal approach with all three signals for the Static Spiral Task and an image-based proposal with the X and Y coordinates for the Dynamic Spiral Task, respectively.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Manuel Gil-Martín, Cristina Luna-Jiménez, Fernando Fernández-Martínez, Rubén San-Segundo Semi- and weak-supervised learning for Norwegian tree species detection 2022-11-24T11:39:20+01:00 Martijn Vermeer David Völgyes Tord Kriznik Sørensen Heidrun Miller Daniele Fantin <p>Tree species mapping of Norwegian production forests is a time-consuming process, as forest associations largely rely on manual interpretation of earth observation data. Deep learning based image segmentation techniques have the potential to improve automated tree species classification, but a major challenge is the limited quality and availability of training data. 
Semi-supervised techniques could alleviate the need for training labels, while weak supervision enables the handling of coarse-grained and noisy labels. In this study, we evaluated the added value of semi-supervised deep learning methods in a weakly supervised setting. Specifically, consistency training and pseudo-labeling are applied for tree species classification from aerial orthoimagery in Norway. The techniques are generic and relevant for the wider earth observation domain, especially for other land cover segmentation tasks. The results show that consistency training gives a significant performance increase. Pseudo-labeling, on the other hand, does not; this is potentially due to varying convergence speeds for different classes causing confirmation bias, or to a partial violation of the cluster assumption.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Martijn Vermeer, David Völgyes, Tord Kriznik Sørensen, Heidrun Miller, Daniele Fantin Nearest Unitary and Toeplitz matrix techniques for adaptation of Deep Learning models in photonic FPGA 2022-11-24T16:03:22+01:00 Georgios Agrafiotis Eftychia Makri Ilias Kalamaras Antonios Lalas Konstantinos Votis Dimitrios Tzovaras <p>Photonic circuits pave the way to extremely fast computation and real-time inference in critical applications, such as imaging flow cytometry (IFC). Nevertheless, current photonic FPGA implementations display intrinsic limitations that restrict the complexity of Deep Learning (DL) models that could be sustained. One of these restrictions requires the weight matrices to be unitary. Thus, machine learning mechanisms to transform weight matrices to their nearest unitary counterparts are essential for the effective deployment of such demanding tasks. Furthermore, DL models that perform convolutions require special handling to fit into the photonic system. 
In this work, several methods have been investigated for converting non-unitary matrices to unitary ones, as well as linear algebra techniques for the transformation of Convolutional Neural Networks (CNNs) to Feed-Forward models, with the aim of discovering the best candidate for the photonic FPGA in terms of accuracy and restrictions. Experimental results showed that post-training or iterative techniques to find the nearest unitary weight matrix can be applied for photonic chips with minimal loss in accuracy, while CNNs adapted well to a photonic configuration employing a Toeplitz matrix implementation. The proposed approach envisions efficiently tackling the limitations of DL models for deployment in photonic FPGAs.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Georgios Agrafiotis, Eftychia Makri, Ilias Kalamaras, Antonios Lalas, Konstantinos Votis, Dimitrios Tzovaras Using Mask R-CNN for Underwater Fish Instance Segmentation as Novel Objects: A Proof of Concept 2022-11-23T14:58:54+01:00 I-Hao Chen Nabil Belbachir <p>Instance Segmentation in general deals with detecting, segmenting and classifying individual instances of objects in an image. Underwater instance segmentation methods often involve aquatic animals like fish as the objects to be detected. 
In order to train deep learning models for instance segmentation in an underwater environment, rigorous human annotation in the form of instance segmentation masks with labels is usually required, since the aquatic environment poses challenges due to dynamic background patterns and optical distortions.<br />However, annotating instance segmentation masks on images is especially time- and cost-intensive compared to classification tasks.<br />Here we show an unsupervised instance learning and segmentation approach that introduces a novel class, e.g., "fish", to a pre-trained Mask R-CNN model using its own detection and segmentation capabilities in underwater images.<br />Our results demonstrate a robust detection and segmentation of underwater fish in aquaculture without the need for human annotations.<br />This proof of concept shows that there is room for novel objects within trained instance segmentation models in the paradigm of supervised learning.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 I-Hao Chen, Nabil Belbachir PLM-AS: Pre-trained Language Models Augmented with Scanpaths for Sentiment Classification 2022-11-23T15:42:35+01:00 Duo Yang Nora Hollenstein <p>Recent research demonstrated that deep neural networks could generate meaningful feature representations from both eye-tracking data and sentences without designing handcrafted features, achieving competitive performance across cognitive NLP tasks, such as sentiment classification over gaze datasets. However, previous works mainly encode the text and gaze data separately without considering the interaction between these two modalities or applying large-scale pre-trained models. To address these challenges, we introduce PLM-AS, a novel framework to take full advantage of textual and eye-tracking features by sequence modeling in a highly interactive way for multimodal fusion. 
It is also the first attempt to combine large-scale pre-trained models with eye-tracking features in the cognitive reading task. We show that PLM-AS captures cognitive signals from eye-tracking data and shows improved performance on sentiment classification within and across three datasets of different domains.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Duo Yang, Nora Hollenstein Hybrid Bayesian convolutional neural network object detection architectures for tracking small markers in automotive crash test videos 2022-11-23T16:11:49+01:00 Felix Neubürger Daniel Gierse Thomas Kopinski <p>Automotive crash tests are an important aspect of everyday safety, where accurate measurements and evaluations play a crucial role. In order to automate this process, we implement a Bayesian hybrid computer vision model to detect two different kinds of target markers. These are commonly used in crash tests, e.g. by automotive manufacturers, to aid the localisation of the car’s positional data by attaching these markers at different positions on the vehicle. A tracking algorithm is subsequently used to add contextual time information to the marker objects. The extracted information can then be used in downstream tasks to calculate important metrics for the crash test evaluation, e.g. the speed, momentum, acceleration and trajectory at particular parts of the car during different stages of the crash test. The model consists of a pre-trained Faster-RCNN for the region proposals with the addition of a Bayesian convolutional neural network to estimate a statistical uncertainty on the model’s classifications. This uncertainty estimation can be used as a tool to improve safety in uncertain edge cases in videos where lighting conditions and light reflections are not optimal. Our pipeline achieves an average recall and precision of 0.89 and 0.99, respectively, when applied to test data. 
This outperforms the recall of state-of-the-art models like Faster-RCNN ResNet-152 by more than 28% while delivering slightly better precision, increasing robustness in most of the tested use cases.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Felix Neubürger, Daniel Gierse, Thomas Kopinski A comparison between Tsetlin machines and deep neural networks in the context of recommendation systems 2022-11-23T22:39:28+01:00 Karl Audun Kagnes Borgersen Morten Goodwin Jivitesh Sharma <p>Recommendation Systems (RSs) are ubiquitous in modern society and are one of the largest points of interaction between humans and AI. Modern RSs are often implemented using deep learning models, which are infamously difficult to interpret. This problem is particularly exacerbated in the context of recommendation scenarios, as it erodes the user's trust in the RS. In contrast, the newly introduced Tsetlin Machines (TM) possess some valuable properties due to their inherent interpretability. TMs are still fairly young as a technology. As no RS has been developed for TMs before, it has become necessary to perform some preliminary research regarding the practicality of such a system. In this paper, we develop the first RS based on TMs to evaluate its practicality in this application domain. This paper compares the viability of TMs with other machine learning models prevalent in the field of RS. We train and investigate the performance of the TM compared with a vanilla feed-forward deep learning model. These comparisons are based on model performance, interpretability/explainability, and scalability. 
Further, we provide some benchmark performance comparisons to similar machine learning solutions relevant to RSs.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Karl Audun Kagnes Borgersen, Morten Goodwin, Jivitesh Sharma Reducing Annotator's Burden: Cross-Pseudo Supervision for Brain Tumor Segmentation 2022-11-24T11:20:37+01:00 Lidia Luque Jon André Ottesen Atle Bjørnerud Kyrre Eeg Emblem Bradley J. MacIntosh <p>Deep learning has been proven to help with common medical image processing procedures, notably segmentation. Labeling data is a core requirement for training a deep learning model; this is time-consuming and expert annotators are in short supply. Strategies that lower data annotation requirements are highly desirable. In this study, we adapt cross-pseudo supervision (CPS), a state-of-the-art semi-supervised deep-learning method in which labeled and unlabeled data are used in conjunction to further improve the resulting model, to 3D medical segmentation. Using the 2021 BraTS dataset, a fully labeled publicly available brain tumor dataset, we train CPS-based networks using a varying number of labeled and unlabeled samples and compare the resulting models against the fully-supervised baseline. The results show that CPS improves performance scores across all combinations of dataset sizes, with an increase in the Dice similarity coefficient (DSC) of 2.6-4.2% and a decrease in the 95th percentile Hausdorff distance (95% HD) of 24-27%.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Lidia Luque, Jon André Ottesen, Kyrre Emblem, Atle Bjørnerud, Bradley MacIntosh Interactive Scribble Segmentation 2022-11-24T15:49:24+01:00 Mathias Micheelsen Lowes Jakob L. Christensen Bjørn Schreblowski Hansen Morten Rieger Hannemose Anders Bjorholm Dahl Vedrana Dahl <p>We present a deep learning model for image segmentation that uses weakly supervised inputs consisting of scribbles. A user can draw scribbles on an image with a brush tool corresponding to the labels they want segmented. 
The network can segment images in real time while scribbles are being drawn, giving instant feedback to the user. It is easy to correct mistakes made by the network, as more scribbles can be added. During training we use a similar pseudo-interactive and iterative setup to make sure that the network is optimized towards the human-in-the-loop inference setting. In contrast, standard scribble segmentation methods do not consider the training of the algorithm as an interactive setting and thus are not suited for interactive inference. Our model is class-agnostic and we are able to generalize across many different data modalities. We compare our model with other weakly supervised methods such as bounding box and extrema point methods, and we show our model achieves a better mean Dice score.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Mathias Micheelsen Lowes, Jakob L. Christensen, Bjørn Schreblowski, Morten Rieger Hannemose, Anders Bjorholm Dahl, Vedrana Dahl Boundary Aware U-Net for Glacier Segmentation 2022-11-23T13:50:01+01:00 Bibek Aryal Katie E. Miles Sergio A. Vargas Zesati Olac Fuentes <p>Large-scale study of glaciers improves our understanding of global glacier change and is imperative for monitoring the ecological environment, preventing disasters, and studying the effects of global climate change. Glaciers in the Hindu Kush Himalaya (HKH) are particularly interesting as the HKH is one of the world's most sensitive regions for climate change. In this work, we: (1) propose a modified version of the U-Net for large-scale, spatially non-overlapping, clean glacial ice, and debris-covered glacial ice segmentation; (2) introduce a novel self-learning boundary-aware loss to improve debris-covered glacial ice segmentation performance; and (3) propose a feature-wise saliency score to understand the contribution of each feature in the multispectral Landsat 7 imagery for glacier mapping. 
Our results show that the debris-covered glacial ice segmentation model trained using self-learning boundary-aware loss outperformed the model trained using Dice loss. Furthermore, we conclude that red, shortwave infrared, and near-infrared bands have the highest contribution toward debris-covered glacial ice segmentation from Landsat 7 images.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Bibek Aryal, Katie E. Miles, Sergio A. Vargas Zesati, Olac Fuentes Identifying and Mitigating Flaws of Deep Perceptual Similarity Metrics 2022-11-23T15:26:05+01:00 Oskar Sjögren Gustav Grund Pihlgren Fredrik Sandin Marcus Liwicki <p>Measuring the similarity of images is a fundamental problem in computer vision for which no universal solution exists. While simple metrics such as the pixel-wise L2-norm have been shown to have significant flaws, they remain popular. One group of recent state-of-the-art metrics that mitigates some of those flaws are Deep Perceptual Similarity (DPS) metrics, where the similarity is evaluated as the distance in the deep features of neural networks. However, DPS metrics themselves have been less thoroughly examined for their benefits and, especially, their flaws. This work investigates the most common DPS metric, where deep features are compared by spatial position, along with metrics comparing the averaged and sorted deep features. The metrics are analyzed in-depth to understand their strengths and weaknesses by using images designed specifically to challenge them. This work contributes new insights into the flaws of DPS, and further suggests improvements to the metrics. 
An implementation of this work is available online: <a href=""></a></p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Oskar Sjögren, Gustav Grund Pihlgren, Fredrik Sandin, Marcus Liwicki Robin Pre-Training for the Deep Ritz Method 2022-11-23T16:02:41+01:00 Luca Courte Marius Zeinhofer <p>We analyze the training process of the Deep Ritz Method for elliptic equations with Dirichlet boundary conditions and highlight problems arising from essential boundary values. Typically, one employs a penalty approach to enforce essential boundary conditions; however, the naive approach to this problem becomes unstable for large penalizations. A novel method to compensate for this problem is proposed, using a small penalization strength to pre-train the model before the main training on the target penalization strength is conducted. We present numerical evidence that the proposed method is beneficial.</p> 2023-01-23T00:00:00+01:00 Copyright (c) 2023 Luca Courte, Marius Zeinhofer