Search | arXiv e-print repository

Encoder vs Decoder: Comparative Analysis of Encoder and Decoder Language Models on Multilingual NLU Tasks

Authors: Dan Saattrup Nielsen, Kenneth Enevoldsen, Peter Schneider-Kamp

Abstract: This paper explores the performance of encoder and decoder language models on multilingual Natural Language Understanding (NLU) tasks, with a broad focus on Germanic languages. Building upon the ScandEval benchmark, which initially was restricted to evaluating encoder models, we extend the evaluation framework to include decoder models. We introduce a method for evaluating decoder models on NLU tasks and apply it to the languages Danish, Swedish, Norwegian, Icelandic, Faroese, German, Dutch, and English. Through a series of experiments and analyses, we address key research questions regarding the comparative performance of encoder and decoder models, the impact of NLU task types, and the variation across language resources. Our findings reveal that decoder models can achieve significantly better NLU performance than encoder models, with nuances observed across different tasks and languages. Additionally, we investigate the correlation between decoders and task performance via a UMAP analysis, shedding light on the unique capabilities of decoder and encoder models. This study contributes to a deeper understanding of language model paradigms in NLU tasks and provides valuable insights for model selection and evaluation in multilingual settings.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 14 pages, 2 figures

ACM Class: I.2.7

arXiv:2403.15409

Coupled generator decomposition for fusion of electro- and magnetoencephalography data

Authors: Anders Stevnhoved Olsen, Jesper Duemose Nielsen, Morten Mørup

Abstract: Data fusion modeling can identify common features across diverse data sources while accounting for source-specific variability. Here we introduce the concept of a coupled generator decomposition and demonstrate how it generalizes sparse principal component analysis (SPCA) for data fusion. Leveraging data from a multisubject, multimodal (electro- and magnetoencephalography (EEG and MEG)) neuroimaging experiment, we demonstrate the efficacy of the framework in identifying common features in response to face perception stimuli, while accommodating modality- and subject-specific variability. Through split-half cross-validation of EEG/MEG trials, we investigate the optimal model order and regularization strengths for models of varying complexity, comparing these to a group-level model assuming shared brain responses to stimuli. Our findings reveal altered $\sim 170$ ms fusiform face area activation for scrambled faces, as opposed to real faces, particularly evident in the multimodal, multisubject model. Model parameters were inferred using stochastic optimization in PyTorch, demonstrating comparable performance to conventional quadratic programming inference for SPCA but with considerably faster execution. We provide an easily accessible toolbox for coupled generator decomposition that includes data fusion for SPCA, archetypal analysis and directional archetypal analysis. Overall, our approach offers a promising new avenue for data fusion.

Submitted 2 March, 2024; originally announced March 2024.

arXiv:2311.09145

Model Agnostic Explainable Selective Regression via Uncertainty Estimation

Authors: Andrea Pugnana, Carlos Mougan, Dan Saattrup Nielsen

Abstract: With the wide adoption of machine learning techniques, requirements have evolved beyond sheer high performance, often requiring models to be trustworthy. A common approach to increase the trustworthiness of such systems is to allow them to refrain from predicting. Such a framework is known as selective prediction. While selective prediction for classification tasks has been widely analyzed, the problem of selective regression is understudied. This paper presents a novel approach to selective regression that utilizes model-agnostic non-parametric uncertainty estimation. Our proposed framework showcases superior performance compared to state-of-the-art selective regressors, as demonstrated through comprehensive benchmarking on 69 datasets. Finally, we use explainable AI techniques to gain an understanding of the drivers behind selective regression. We implement our selective regression method in the open-source Python package doubt and release the code used to reproduce our experiments.

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2311.07264

Danish Foundation Models

Authors: Kenneth Enevoldsen, Lasse Hansen, Dan S. Nielsen, Rasmus A. F. Egebæk, Søren V. Holm, Martin C. Nielsen, Martin Bernstorff, Rasmus Larsen, Peter B. Jørgensen, Malte Højmark-Bertelsen, Peter B. Vahlstrup, Per Møldrup-Dalum, Kristoffer Nielbo

Abstract: Large language models, sometimes referred to as foundation models, have transformed multiple fields of research. However, smaller languages risk falling behind due to high training costs and small incentives for large companies to train these models. To combat this, the Danish Foundation Models project seeks to provide and maintain open, well-documented, and high-quality foundation models for the Danish language. This is achieved through broad cooperation with public and private institutions, to ensure high data quality and applicability of the trained models. We present the motivation of the project, the current status, and future perspectives.

Submitted 13 November, 2023; originally announced November 2023.

Comments: 4 pages, 2 tables

arXiv:2306.05370

Detecting Human Rights Violations on Social Media during Russia-Ukraine War

Authors: Poli Nemkova, Solomon Ubani, Suleyman Olcay Polat, Nayeon Kim, Rodney D. Nielsen

Abstract: The present-day Russia-Ukraine military conflict has exposed the pivotal role of social media in enabling the transparent and unbridled sharing of information directly from the frontlines. In conflict zones where freedom of expression is constrained and information warfare is pervasive, social media has emerged as an indispensable lifeline. Anonymous social media platforms, as publicly available sources for disseminating war-related information, have the potential to serve as effective instruments for monitoring and documenting Human Rights Violations (HRV). Our research focuses on the analysis of data from Telegram, the leading social media platform for reading independent news in post-Soviet regions. We gathered a dataset of posts sampled from 95 public Telegram channels that cover politics and war news, which we have utilized to identify potential occurrences of HRV. Employing an mBERT-based text classifier, we have conducted an analysis to detect any mentions of HRV in the Telegram data. Our final approach yielded an $F_2$ score of 0.71 for HRV detection, representing an improvement of 0.38 over the multilingual BERT base model. We release two datasets that contain Telegram posts: (1) a large corpus with over 2.3 million posts and (2) a dataset annotated at the sentence level to indicate HRVs. The Telegram posts are in the context of the Russia-Ukraine war. We posit that our findings hold significant implications for NGOs, governments, and researchers by providing a means to detect and document possible human rights violations.

Submitted 6 June, 2023; originally announced June 2023.

Comments: 9 pages

arXiv:2304.00906

ScandEval: A Benchmark for Scandinavian Natural Language Processing

Authors: Dan Saattrup Nielsen

Abstract: This paper introduces a Scandinavian benchmarking platform, ScandEval, which can benchmark any pretrained model on four different tasks in the Scandinavian languages. The datasets used in two of the tasks, linguistic acceptability and question answering, are new. We develop and release a Python package and command-line interface, scandeval, which can benchmark any model that has been uploaded to the Hugging Face Hub, with reproducible results. Using this package, we benchmark more than 100 Scandinavian or multilingual models and present the results of these in an interactive online leaderboard, as well as provide an analysis of the results. The analysis shows that there is substantial cross-lingual transfer among the Mainland Scandinavian languages (Danish, Swedish and Norwegian), with limited cross-lingual transfer between the group of Mainland Scandinavian languages and the group of Insular Scandinavian languages (Icelandic and Faroese). The benchmarking results also show that the investment in language technology in Norway, Sweden and Denmark has led to language models that outperform massively multilingual models such as XLM-RoBERTa and mDeBERTaV3. We release the source code for both the package and leaderboard.

Submitted 3 April, 2023; originally announced April 2023.

Comments: 17 pages, 11 figures, camera-ready NoDaLiDa 2023 submission

arXiv:2304.00277

Energy Consumption Optimization in Radio Access Networks (ECO-RAN)

Authors: Anders Mariegaard, Kim G. Larsen, Marco Muniz, Thomas Dyhre Nielsen

Abstract: In recent years, mobile network operators have shown interest in reducing energy consumption. Toward this goal, in cooperation with the Danish company 2Operate, we have developed a stochastic simulation environment for mobile networks. Our simulator interacts with historical data from 2Operate and allows us to turn network cells on and off, replay traffic loads, etc. We have developed an optimization tool which is based on stochastic and distributed controllers computed by UPPAAL. We have conducted experiments in our simulation tool. Experiments show that there is a potential to save up to 10% of energy. We observe that for larger networks, there exists a larger potential for saving energy. Our simulator and UPPAAL controllers have been constructed in accordance with the 2Operate data and infrastructure. However, a main difference is that current equipment does not support updating schedulers on an hourly basis. Nevertheless, new equipment, e.g. new Huawei equipment, does support changing schedulers on an hourly basis. Therefore, integrating our solution in the production server of 2Operate is possible. However, rigorous testing in the production system is required.

Submitted 1 April, 2023; originally announced April 2023.

Comments: Report for Energy Cluster Denmark project of the year. https://www.energycluster.dk/en/eco-ran-wins-innovation-project-of-the-year/

arXiv:2303.11042

Hospitalization Length of Stay Prediction using Patient Event Sequences

Authors: Emil Riis Hansen, Thomas Dyhre Nielsen, Thomas Mulvad, Mads Nibe Strausholm, Tomer Sagi, Katja Hose

Abstract: Predicting patients' hospital length of stay (LOS) is essential for improving resource allocation and supporting decision-making in healthcare organizations. This paper proposes a novel approach for predicting LOS by modeling patient information as sequences of events. Specifically, we present a transformer-based model, termed Medic-BERT (M-BERT), for LOS prediction using the unique features describing patients' medical event sequences. We performed empirical experiments on a cohort of more than 45k emergency care patients from a large Danish hospital. Experimental results show that M-BERT can achieve high accuracy on a variety of LOS problems and outperforms traditional nonsequence-based machine learning approaches.

Submitted 20 March, 2023; originally announced March 2023.

Comments: 11 pages, 5 figures

MSC Class: 68T07

ACM Class: I.2.7; J.3

arXiv:2210.09014

doi 10.1080/23299460.2023.2222514

Addressing contingency in algorithmic (mis)information classification: Toward a responsible machine learning agenda

Authors: Andrés Domínguez Hernández, Richard Owen, Dan Saattrup Nielsen, Ryan McConville

Abstract: Machine learning (ML) enabled classification models are becoming increasingly popular for tackling the sheer volume and speed of online misinformation and other content that could be identified as harmful. In building these models, data scientists need to take a stance on the legitimacy, authoritativeness and objectivity of the sources of "truth" used for model training and testing. This has political, ethical and epistemic implications which are rarely addressed in technical papers. Despite (and due to) their reported high accuracy and performance, ML-driven moderation systems have the potential to shape online public debate and create downstream negative impacts such as undue censorship and the reinforcing of false beliefs. Using collaborative ethnography and theoretical insights from social studies of science and expertise, we offer a critical analysis of the process of building ML models for (mis)information classification: we identify a series of algorithmic contingencies--key moments during model development that could lead to different future outcomes, uncertainty and harmful effects as these tools are deployed by social media platforms. We conclude by offering a tentative path toward reflexive and responsible development of ML tools for moderating misinformation and other harmful content online.

Submitted 13 April, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

Comments: Andrés Domínguez Hernández, Richard Owen, Dan Saattrup Nielsen and Ryan McConville. 2023. Addressing contingency in algorithmic (mis)information classification: Toward a responsible machine learning agenda. Accepted in 2023 ACM Conference on Fairness, Accountability, and Transparency (FAccT '23), June 12-15, 2023, Chicago, United States of America. ACM, New York, NY, USA, 16 pages

arXiv:2208.14645

doi 10.1109/TC.2020.3002697

PaRTAA: A Real-time Multiprocessor for Mixed-Criticality Airborne Systems

Authors: Shibarchi Majumder, Jens F D Nielsen, Thomas Bak

Abstract: Mixed-criticality systems, where multiple systems with varying criticality-levels share a single hardware platform, require isolation between tasks with different criticality-levels. Isolation can be achieved with software-based solutions or can be enforced by hardware-level partitioning. An asymmetric multiprocessor architecture offers hardware-based isolation at the cost of underutilized hardware resources, and the inter-core communication mechanism is often a single point of failure in such architectures. In contrast, a partitioned uniprocessor offers efficient resource utilization at the cost of limited scalability. We propose a partitioned real-time asymmetric architecture (PaRTAA) specifically designed for mixed-criticality airborne systems, featuring robust partitioning within processing elements for establishing isolation between tasks with varying criticality. The granularity in the processing element offers efficient resource utilization where inter-dependent tasks share the same processing element for sequential execution while preserving isolation, and independent tasks simultaneously execute on different processing elements as per system requirements.

Submitted 31 August, 2022; originally announced August 2022.

Journal ref: in IEEE Transactions on Computers, vol. 69, no. 8, pp. 1221-1232, 1 Aug. 2020

arXiv:2208.14158

doi 10.1109/TCAD.2019.2960359

Ærø: A Platform Architecture for Mixed-Criticality Airborne Systems

Authors: Shibarchi Majumder, Jens Frederik Dalsgaard Nielsen, Thomas Bak

Abstract: Real-time embedded platforms with resource constraints can benefit from a mixed-criticality system where applications with different criticality levels share computational resources, with isolation in the temporal and spatial domain. A conventional software-based isolation mechanism adds additional overhead and requires certification with the highest level of criticality present in the system, which is often an expensive process. In this article, we present a different approach where the required isolation is established at the hardware-level by featuring partitions within the processor. A four-stage pipelined soft-processor with replicated resources in the data-path is introduced to establish isolation and avert interference between the partitions. A cycle-accurate scheduling mechanism is implemented in the hardware for hard-real-time partition scheduling that can accommodate different periodicity and execution time for each partition as per user needs, while preserving time-predictability at the individual application level. Applications running within a partition have no sense of the virtualization and can execute either on a host-software or directly on the hardware. The proposed architecture is implemented on FPGA thread and demonstrated with an avionics use case.

Submitted 30 August, 2022; originally announced August 2022.

Journal ref: in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 39, no. 10, pp. 2307-2318, Oct. 2020

arXiv:2206.09048

doi 10.5281/zenodo.6554616

ICLR 2022 Challenge for Computational Geometry and Topology: Design and Results

Authors: Adele Myers, Saiteja Utpala, Shubham Talbar, Sophia Sanborn, Christian Shewmake, Claire Donnat, Johan Mathe, Umberto Lupo, Rishi Sonthalia, Xinyue Cui, Tom Szwagier, Arthur Pignet, Andri Bergsson, Soren Hauberg, Dmitriy Nielsen, Stefan Sommer, David Klindt, Erik Hermansen, Melvin Vaupel, Benjamin Dunn, Jeffrey Xiong, Noga Aharony, Itsik Pe'er, Felix Ambellan, Martin Hanik, et al. (3 additional authors not shown)

Abstract: This paper presents the computational challenge on differential geometry and topology that was hosted within the ICLR 2022 workshop "Geometric and Topological Representation Learning". The competition asked participants to provide implementations of machine learning algorithms on manifolds that would respect the API of the open-source software Geomstats (manifold part) and Scikit-Learn (machine learning part) or PyTorch. The challenge attracted seven teams in its two-month duration. This paper describes the design of the challenge and summarizes its main findings.

Submitted 26 June, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

arXiv:2206.07696

Diffusion Models for Video Prediction and Infilling

Authors: Tobias Höppe, Arash Mehrjou, Stefan Bauer, Didrik Nielsen, Andrea Dittadi

Abstract: Predicting and anticipating future outcomes or reasoning about missing information in a sequence are critical skills for agents to be able to make intelligent decisions. This requires strong, temporally coherent generative capabilities. Diffusion models have shown remarkable success in several generative tasks, but have not been extensively explored in the video domain. We present Random-Mask Video Diffusion (RaMViD), which extends image diffusion models to videos using 3D convolutions, and introduces a new conditioning technique during training. By varying the mask we condition on, the model is able to perform video prediction, infilling, and upsampling. Due to our simple conditioning scheme, we can utilize the same architecture as used for unconditional training, which allows us to train the model in a conditional and unconditional fashion at the same time. We evaluate RaMViD on two benchmark datasets for video prediction, on which we achieve state-of-the-art results, and one for video generation. High-resolution videos are provided at https://sites.google.com/view/video-diffusion-prediction.

Submitted 14 November, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

Comments: Published in TMLR (11/2022)

arXiv:2205.15463

Few-Shot Diffusion Models

Authors: Giorgio Giannone, Didrik Nielsen, Ole Winther

Abstract: Denoising diffusion probabilistic models (DDPM) are powerful hierarchical latent variable models with remarkable sample generation quality and training stability. These properties can be attributed to parameter sharing in the generative hierarchy, as well as a parameter-free diffusion-based inference procedure. In this paper, we present Few-Shot Diffusion Models (FSDM), a framework for few-shot generation leveraging conditional DDPMs. FSDMs are trained to adapt the generative process conditioned on a small set of images from a given class by aggregating image patch information using a set-based Vision Transformer (ViT). At test time, the model is able to generate samples from previously unseen classes conditioned on as few as 5 samples from that class. We empirically show that FSDM can perform few-shot generation and transfer to new datasets. We benchmark variants of our method on complex vision datasets for few-shot learning and compare to unconditional and conditional DDPM baselines. Additionally, we show how conditioning the model on patch-based input set information improves training convergence.

Submitted 30 May, 2022; originally announced May 2022.

arXiv:2204.12270

Graph Neural Networks for Microbial Genome Recovery

Authors: Andre Lamurias, Alessandro Tibo, Katja Hose, Mads Albertsen, Thomas Dyhre Nielsen

Abstract: Microbes have a profound impact on our health and environment, but our understanding of the diversity and function of microbial communities is severely limited. Through DNA sequencing of microbial communities (metagenomics), DNA fragments (reads) of the individual microbes can be obtained, which through assembly graphs can be combined into long contiguous DNA sequences (contigs). Given the complexity of microbial communities, single contig microbial genomes are rarely obtained. Instead, contigs are eventually clustered into bins, with each bin ideally making up a full genome. This process is referred to as metagenomic binning. Current state-of-the-art techniques for metagenomic binning rely only on the local features for the individual contigs. These techniques therefore fail to exploit the similarities between contigs as encoded by the assembly graph, in which the contigs are organized. In this paper, we propose to use Graph Neural Networks (GNNs) to leverage the assembly graph when learning contig representations for metagenomic binning. Our method, VaeG-Bin, combines variational autoencoders for learning latent representations of the individual contigs, with GNNs for refining these representations by taking into account the neighborhood structure of the contigs in the assembly graph. We explore several types of GNNs and demonstrate that VaeG-Bin recovers more high-quality genomes than other state-of-the-art binners on both simulated and real-world datasets.

Submitted 26 April, 2022; originally announced April 2022.

arXiv:2204.09889

Inducing Gaussian Process Networks

Authors: Alessandro Tibo, Thomas Dyhre Nielsen

Abstract: Gaussian processes (GPs) are powerful but computationally expensive machine learning models, requiring an estimate of the kernel covariance matrix for every prediction. In large and complex domains, such as graphs, sets, or images, the choice of a suitable kernel can also be non-trivial to determine, providing an additional obstacle to the learning task. Over the last decade, these challenges have resulted in significant advances being made in terms of scalability and expressivity, exemplified by, e.g., the use of inducing points and neural network kernel approximations. In this paper, we propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points. The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains while also facilitating scalable gradient-based learning methods. We consider both regression and (binary) classification tasks and report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods. We also demonstrate how IGNs can be used to effectively model complex domains using neural network architectures.

Submitted 21 April, 2022; originally announced April 2022.

arXiv:2202.11684

MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset

Authors: Dan Saattrup Nielsen, Ryan McConville

Abstract: Misinformation is becoming increasingly prevalent on social media and in news articles. It has become so widespread that we require algorithmic assistance utilising machine learning to detect such content. Training these machine learning models requires datasets of sufficient scale, diversity and quality. However, datasets in the field of automatic misinformation detection are predominantly monolingual, include a limited number of modalities and are not of sufficient scale and quality. Addressing this, we develop a data collection and linking system (MuMiN-trawl), to build a public misinformation graph dataset (MuMiN), containing rich social media data (tweets, replies, users, images, articles, hashtags) spanning 21 million tweets belonging to 26 thousand Twitter threads, each of which has been semantically linked to 13 thousand fact-checked claims across dozens of topics, events and domains, in 41 different languages, spanning more than a decade. The dataset is made available as a heterogeneous graph via a Python package (mumin). We provide baseline results for two node classification tasks related to the veracity of a claim involving social media, and demonstrate that these are challenging tasks, with the highest macro-average F1-score being 62.55% and 61.45% for the two tasks, respectively. The MuMiN ecosystem is available at https://mumin-dataset.github.io/, including the data, documentation, tutorials and leaderboards.

Submitted 8 March, 2022; v1 submitted 23 February, 2022; originally announced February 2022.

Comments: 9+3 pages

arXiv:2201.11676 [pdf, other]

Monitoring Model Deterioration with Explainable Uncertainty Estimation via Non-parametric Bootstrap

Authors: Carlos Mougan, Dan Saattrup Nielsen

Abstract: Monitoring machine learning models once they are deployed is challenging. It is even more challenging to decide when to retrain models in real-case scenarios when labeled data is beyond reach, and monitoring performance metrics becomes unfeasible. In this work, we use non-parametric bootstrapped uncertainty estimates and SHAP values to provide explainable uncertainty estimation as a technique that aims to monitor the deterioration of machine learning models in deployment environments, as well as determine the source of model deterioration when target labels are not available. Classical methods are purely aimed at detecting distribution shift, which can lead to false positives in the sense that the model has not deteriorated despite a shift in the data distribution. To estimate model uncertainty we construct prediction intervals using a novel bootstrap method, which improves upon the work of Kumar & Srivastava (2012). We show that both our model deterioration detection system as well as our uncertainty estimation method achieve better performance than the current state-of-the-art. Finally, we use explainable AI techniques to gain an understanding of the drivers of model deterioration. We release an open source Python package, doubt, which implements our proposed methods, as well as the code used to reproduce our experiments.

Submitted 22 November, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

Comments: 7+6 pages. Accepted at AAAI'23 Safe and Robust AI track

arXiv:2104.13321 [pdf, other]

UniTE -- The Best of Both Worlds: Unifying Function-Fitting and Aggregation-Based Approaches to Travel Time and Travel Speed Estimation

Authors: Tobias Skovgaard Jepsen, Christian S. Jensen, Thomas Dyhre Nielsen

Abstract: Travel time or speed estimation are part of many intelligent transportation applications. Existing estimation approaches rely on either function fitting or aggregation and represent different trade-offs between generalizability and accuracy. Function-fitting approaches learn functions that map feature vectors of, e.g., routes, to travel time or speed estimates, which enables generalization to unseen routes. However, mapping functions are imperfect and offer poor accuracy in practice. Aggregation-based approaches instead form estimates by aggregating historical data, e.g., traversal data for routes. This enables very high accuracy given sufficient data. However, they rely on simplistic heuristics when insufficient data is available, yielding poor generalizability. We present a Unifying approach to Travel time and speed Estimation (UniTE) that combines function-fitting and aggregation-based approaches into a unified framework that aims to achieve the generalizability of function-fitting approaches and the accuracy of aggregation-based approaches. An empirical study finds that an instance of UniTE can improve the accuracies of travel speed distribution and travel time estimation by $40-64\%$ and $3-23\%$, respectively, compared to using function fitting or aggregation alone.

Submitted 27 April, 2021; originally announced April 2021.

arXiv:2102.05379 [pdf, other]

Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions

Authors: Emiel Hoogeboom, Didrik Nielsen, Priyank Jaini, Patrick Forré, Max Welling

Abstract: Generative flows and diffusion models have been predominantly trained on ordinal data, for example natural images. This paper introduces two extensions of flows and diffusion for categorical data such as language or image segmentation: Argmax Flows and Multinomial Diffusion. Argmax Flows are defined by a composition of a continuous distribution (such as a normalizing flow), and an argmax function. To optimize this model, we learn a probabilistic inverse for the argmax that lifts the categorical data to a continuous space. Multinomial Diffusion gradually adds categorical noise in a diffusion process, for which the generative denoising process is learned. We demonstrate that our method outperforms existing dequantization approaches on text modelling and modelling on image segmentation maps in log-likelihood.

Submitted 22 October, 2021; v1 submitted 10 February, 2021; originally announced February 2021.

Comments: Accepted at Neural Information Processing Systems (NeurIPS 2021)

arXiv:2102.02374 [pdf, other]

Sampling in Combinatorial Spaces with SurVAE Flow Augmented MCMC

Authors: Priyank Jaini, Didrik Nielsen, Max Welling

Abstract: Hybrid Monte Carlo is a powerful Markov Chain Monte Carlo method for sampling from complex continuous distributions. However, a major limitation of HMC is its inability to be applied to discrete domains due to the lack of gradient signal. In this work, we introduce a new approach based on augmenting Monte Carlo methods with SurVAE Flows to sample from discrete distributions using a combination of neural transport methods like normalizing flows and variational dequantization, and the Metropolis-Hastings rule. Our method first learns a continuous embedding of the discrete space using a surjective map and subsequently learns a bijective transformation from the continuous space to an approximately Gaussian distributed latent variable. Sampling proceeds by simulating MCMC chains in the latent space and mapping these samples to the target discrete space via the learned transformations. We demonstrate the efficacy of our algorithm on a range of examples from statistics, computational physics and machine learning, and observe improvements compared to alternative algorithms.

Submitted 1 March, 2021; v1 submitted 3 February, 2021; originally announced February 2021.

Comments: Accepted at AISTATS 2021; added experiments with longer MCMC chains

arXiv:2007.02731 [pdf, other]

SurVAE Flows: Surjections to Bridge the Gap between VAEs and Flows

Authors: Didrik Nielsen, Priyank Jaini, Emiel Hoogeboom, Ole Winther, Max Welling

Abstract: Normalizing flows and variational autoencoders are powerful generative models that can represent complicated density functions. However, they both impose constraints on the models: Normalizing flows use bijective transformations to model densities whereas VAEs learn stochastic transformations that are non-invertible and thus typically do not provide tractable estimates of the marginal likelihood. In this paper, we introduce SurVAE Flows: A modular framework of composable transformations that encompasses VAEs and normalizing flows. SurVAE Flows bridge the gap between normalizing flows and VAEs with surjective transformations, wherein the transformations are deterministic in one direction -- thereby allowing exact likelihood computation, and stochastic in the reverse direction -- hence providing a lower bound on the corresponding likelihood. We show that several recently proposed methods, including dequantization and augmented normalizing flows, can be expressed as SurVAE Flows. Finally, we introduce common operations such as the max value, the absolute value, sorting and stochastic permutation as composable layers in SurVAE Flows.

Submitted 30 October, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

arXiv:2006.09030 [pdf, other]

doi 10.1109/TITS.2020.3011799

Relational Fusion Networks: Graph Convolutional Networks for Road Networks

Authors: Tobias Skovgaard Jepsen, Christian S. Jensen, Thomas Dyhre Nielsen

Abstract: The application of machine learning techniques in the setting of road networks holds the potential to facilitate many important intelligent transportation applications. Graph Convolutional Networks (GCNs) are neural networks that are capable of leveraging the structure of a network. However, many implicit assumptions of GCNs do not apply to road networks. We introduce the Relational Fusion Network (RFN), a novel type of GCN designed specifically for road networks. In particular, we propose methods that outperform state-of-the-art GCNs by 21%-40% on two machine learning tasks in road networks. Furthermore, we show that state-of-the-art GCNs may fail to effectively leverage road network structure and may not generalize well to other road networks.

Submitted 14 September, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

Comments: IEEE Transactions on Intelligent Transportation Systems (2020). arXiv admin note: substantial text overlap with arXiv:1908.11567

arXiv:2002.02547 [pdf, other]

Closing the Dequantization Gap: PixelCNN as a Single-Layer Flow

Authors: Didrik Nielsen, Ole Winther

Abstract: Flow models have recently made great progress at modeling ordinal discrete data such as images and audio. Due to the continuous nature of flow models, dequantization is typically applied when using them for such discrete data, resulting in lower bound estimates of the likelihood. In this paper, we introduce subset flows, a class of flows that can tractably transform finite volumes and thus allow exact computation of likelihoods for discrete data. Based on subset flows, we identify ordinal discrete autoregressive models, including WaveNets, PixelCNNs and Transformers, as single-layer flows. We use the flow formulation to compare models trained and evaluated with either the exact likelihood or its dequantization lower bound. Finally, we study multilayer flows composed of PixelCNNs and non-autoregressive coupling layers and demonstrate state-of-the-art results on CIFAR-10 for flow models trained with dequantization.

Submitted 30 October, 2020; v1 submitted 6 February, 2020; originally announced February 2020.

arXiv:1911.06217 [pdf, other]

doi 10.1109/BigData.2018.8622416

On Network Embedding for Machine Learning on Road Networks: A Case Study on the Danish Road Network

Authors: Tobias Skovgaard Jepsen, Christian S. Jensen, Thomas Dyhre Nielsen

Abstract: Road networks are a type of spatial network, where edges may be associated with qualitative information such as road type and speed limit. Unfortunately, such information is often incomplete; for instance, OpenStreetMap only has speed limits for 13% of all Danish road segments. This is problematic for analysis tasks that rely on such information for machine learning. To enable machine learning in such circumstances, one may consider the application of network embedding methods to extract structural information from the network. However, these methods have so far mostly been used in the context of social networks, which differ significantly from road networks in terms of, e.g., node degree and level of homophily (which are key to the performance of many network embedding methods). We analyze the use of network embedding methods, specifically node2vec, for learning road segment embeddings in road networks. Due to the often limited availability of information on other relevant road characteristics, the analysis focuses on leveraging the spatial network structure. Our results suggest that network embedding methods can indeed be used for deriving relevant network features (that may, e.g., be used for predicting speed limits), but that the qualities of the embeddings differ from embeddings for social networks.

Submitted 15 November, 2019; v1 submitted 14 November, 2019; originally announced November 2019.

Comments: Best Paper at the 3rd IEEE International Workshop on Big Spatial Data (BSD 2018)

Journal ref: 2018 IEEE International Conference on Big Data (Big Data), 2018, pp. 3422-3431

arXiv:1908.11567 [pdf, other]

doi 10.1145/3347146.3359094

Graph Convolutional Networks for Road Networks

Authors: Tobias Skovgaard Jepsen, Christian S. Jensen, Thomas Dyhre Nielsen

Abstract: Machine learning techniques for road networks hold the potential to facilitate many important transportation applications. Graph Convolutional Networks (GCNs) are neural networks that are capable of leveraging the structure of a road network by utilizing information of, e.g., adjacent road segments. While state-of-the-art GCNs target node classification tasks in social, citation, and biological networks, machine learning tasks in road networks differ substantially from such tasks. In road networks, prediction tasks concern edges representing road segments, and many tasks involve regression. In addition, road networks differ substantially from the networks assumed in the GCN literature in terms of the attribute information available and the network characteristics. Many implicit assumptions of GCNs do therefore not apply. We introduce the notion of Relational Fusion Network (RFN), a novel type of GCN designed specifically for machine learning on road networks. In particular, we propose methods that outperform state-of-the-art GCNs on both a road segment regression task and a road segment classification task by 32-40% and 21-24%, respectively. In addition, we provide experimental evidence of the short-comings of state-of-the-art GCNs in the context of road networks: unlike our method, they cannot effectively leverage the road network structure for road segment classification and fail to outperform a regular multi-layer perceptron.

Submitted 22 July, 2020; v1 submitted 30 August, 2019; originally announced August 2019.

Comments: Ten-page pre-print version of a four-page ACM SIGSPATIAL 2019 poster paper

arXiv:1908.03442 [pdf, other]

Probabilistic Models with Deep Neural Networks

Authors: Andrés R. Masegosa, Rafael Cabañas, Helge Langseth, Thomas D. Nielsen, Antonio Salmerón

Abstract: Recent advances in statistical inference have significantly expanded the toolbox of probabilistic modeling. Historically, probabilistic modeling has been constrained to (i) very restricted model classes where exact or approximate probabilistic inference were feasible, and (ii) small or medium-sized data sets which fit within the main memory of the computer. However, developments in variational inference, a general form of approximate probabilistic inference originated in statistical physics, are allowing probabilistic modeling to overcome these restrictions: (i) Approximate probabilistic inference is now possible over a broad class of probabilistic models containing a large number of parameters, and (ii) scalable inference methods based on stochastic gradient descent and distributed computation engines allow to apply probabilistic modeling over massive data sets. One important practical consequence of these advances is the possibility to include deep neural networks within a probabilistic model to capture complex non-linear stochastic relationships between random variables. These advances in conjunction with the release of novel probabilistic modeling toolboxes have greatly expanded the scope of application of probabilistic models, and allow these models to take advantage of the recent strides made by the deep learning community. In this paper we review the main concepts, methods and tools needed to use deep neural networks within a probabilistic modeling framework.

Submitted 2 October, 2019; v1 submitted 9 August, 2019; originally announced August 2019.

arXiv:1904.03713 [pdf, other]

AI Meets Austen: Towards Human-Robot Discussions of Literary Metaphor

Authors: Natalie Parde, Rodney D. Nielsen

Abstract: Artificial intelligence is revolutionizing formal education, fueled by innovations in learning assessment, content generation, and instructional delivery. Informal, lifelong learning settings have been the subject of less attention. We provide a proof-of-concept for an embodied book discussion companion, designed to stimulate conversations with readers about particularly creative metaphors in fiction literature. We collect ratings from 26 participants, each of whom discuss Jane Austen's "Pride and Prejudice" with the robot across one or more sessions, and find that participants rate their interactions highly. This suggests that companion robots could be an interesting entryway for the promotion of lifelong learning and cognitive exercise in future applications.

Submitted 7 April, 2019; originally announced April 2019.

Comments: Accepted to the 20th International Conference on Artificial Intelligence in Education (AIED 2019)

arXiv:1811.04504 [pdf, other]

SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient

Authors: Aaron Mishkin, Frederik Kunstner, Didrik Nielsen, Mark Schmidt, Mohammad Emtiyaz Khan

Abstract: Uncertainty estimation in large deep-learning models is a computationally challenging task, where it is difficult to form even a Gaussian approximation to the posterior distribution. In such situations, existing methods usually resort to a diagonal approximation of the covariance matrix despite the fact that these matrices are known to result in poor uncertainty estimates. To address this issue, we propose a new stochastic, low-rank, approximate natural-gradient (SLANG) method for variational inference in large, deep models. Our method estimates a "diagonal plus low-rank" structure based solely on back-propagated gradients of the network log-likelihood. This requires strictly less gradient computations than methods that compute the gradient of the whole variational objective. Empirical evaluations on standard benchmarks confirm that SLANG enables faster and more accurate estimation of uncertainty than mean-field methods, and performs comparably to state-of-the-art methods.

Submitted 11 January, 2019; v1 submitted 11 November, 2018; originally announced November 2018.

Comments: NeurIPS 2018 final version

arXiv:1807.04489 [pdf, other]

Fast yet Simple Natural-Gradient Descent for Variational Inference in Complex Models

Authors: Mohammad Emtiyaz Khan, Didrik Nielsen

Abstract: Bayesian inference plays an important role in advancing machine learning, but faces computational challenges when applied to complex models such as deep neural networks. Variational inference circumvents these challenges by formulating Bayesian inference as an optimization problem and solving it using gradient-based optimization. In this paper, we argue in favor of natural-gradient approaches which, unlike their gradient-based counterparts, can improve convergence by exploiting the information geometry of the solutions. We show how to derive fast yet simple natural-gradient updates by using a duality associated with exponential-family distributions. An attractive feature of these methods is that, by using natural-gradients, they are able to extract accurate local approximations for individual model components. We summarize recent results for Bayesian deep learning showing the superiority of natural-gradient approaches over their gradient counterparts.

Submitted 2 August, 2018; v1 submitted 12 July, 2018; originally announced July 2018.

Comments: Camera-ready version

Journal ref: International Symposium on Information Theory and Its Applications (ISITA), 2018

arXiv:1806.04854 [pdf, other]

Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam

Authors: Mohammad Emtiyaz Khan, Didrik Nielsen, Voot Tangkaratt, Wu Lin, Yarin Gal, Akash Srivastava

Abstract: Uncertainty computation in deep learning is essential to design robust and reliable systems. Variational inference (VI) is a promising approach for such computation, but requires more effort to implement and execute compared to maximum-likelihood methods. In this paper, we propose new natural-gradient algorithms to reduce such efforts for Gaussian mean-field VI. Our algorithms can be implemented within the Adam optimizer by perturbing the network weights during gradient evaluations, and uncertainty estimates can be cheaply obtained by using the vector that adapts the learning rate. This requires lower memory, computation, and implementation effort than existing VI methods, while obtaining uncertainty estimates of comparable quality. Our empirical results confirm this and further suggest that the weight-perturbation in our algorithm could be useful for exploration in reinforcement learning and stochastic optimization.

Submitted 2 August, 2018; v1 submitted 13 June, 2018; originally announced June 2018.

Comments: Camera ready version

Journal ref: Thirty-fifth International Conference on Machine Learning, 2018

arXiv:1806.03369 [pdf, other]

#SarcasmDetection is soooo general! Towards a Domain-Independent Approach for Detecting Sarcasm

Authors: Natalie Parde, Rodney D. Nielsen

Abstract: Automatic sarcasm detection methods have traditionally been designed for maximum performance on a specific domain. This poses challenges for those wishing to transfer those approaches to other existing or novel domains, which may be typified by very different language characteristics. We develop a general set of features and evaluate it under different training scenarios utilizing in-domain and/or out-of-domain training data. The best-performing scenario, training on both while employing a domain adaptation step, achieves an F1 of 0.780, which is well above baseline F1-measures of 0.515 and 0.345. We also show that the approach outperforms the best results from prior work on the same target domain.

Submitted 8 June, 2018; originally announced June 2018.

Comments: Proceedings of the 30th International Florida Artificial Intelligence Research Society Conference

arXiv:1805.05470 [pdf, other]

doi 10.1145/3208903.3208924

Adaptive User-Oriented Direct Load-Control of Residential Flexible Devices

Authors: Davide Frazzetto, Bijay Neupane, Torben Bach Pedersen, Thomas Dyhre Nielsen

Abstract: Demand Response (DR) schemes are effective tools to maintain a dynamic balance in energy markets with higher integration of fluctuating renewable energy sources. DR schemes can be used to harness residential devices' flexibility and to utilize it to achieve social and financial objectives. However, existing DR schemes suffer from low user participation as they fail at taking into account the users' requirements. First, DR schemes are highly demanding for the users, as users need to provide direct information, e.g. via surveys, on their energy consumption preferences. Second, the user utility models based on these surveys are hard-coded and do not adapt over time. Third, the existing scheduling techniques require the users to input their energy requirements on a daily basis. As an alternative, this paper proposes a DR scheme for user-oriented direct load-control of residential appliances operations. Instead of relying on user surveys to evaluate the user utility, we propose an online data-driven approach for estimating user utility functions, purely based on available load consumption data, that adaptively models the users' preference over time. Our scheme is based on a day-ahead scheduling technique that transparently prescribes the users with optimal device operation schedules that take into account both financial benefits and user-perceived quality of service. To model day-ahead user energy demand and flexibility, we propose a probabilistic approach for generating flexibility models under uncertainty. Results on both real-world and simulated datasets show that our DR scheme can provide significant financial benefits while preserving the user-perceived quality of service.

Submitted 9 May, 2018; originally announced May 2018.

Comments: 10 pages plus 1 page references, 11 figures, conference: ACM e-Energy 2018

arXiv:1711.05560 [pdf, other]

Variational Adaptive-Newton Method for Explorative Learning

Authors: Mohammad Emtiyaz Khan, Wu Lin, Voot Tangkaratt, Zuozhu Liu, Didrik Nielsen

Abstract: We present the Variational Adaptive Newton (VAN) method which is a black-box optimization method especially suitable for explorative-learning tasks such as active learning and reinforcement learning. Similar to Bayesian methods, VAN estimates a distribution that can be used for exploration, but requires computations that are similar to continuous optimization methods. Our theoretical contribution reveals that VAN is a second-order method that unifies existing methods in distinct fields of continuous optimization, variational inference, and evolution strategies. Our experimental results show that VAN performs well on a wide variety of learning tasks. This work presents a general-purpose explorative-learning method that has the potential to improve learning in areas such as active learning and reinforcement learning.

Submitted 15 November, 2017; originally announced November 2017.

arXiv:1707.02293 [pdf, other]

Bayesian Models of Data Streams with Hierarchical Power Priors

Authors: Andres Masegosa, Thomas D. Nielsen, Helge Langseth, Dario Ramos-Lopez, Antonio Salmeron, Anders L. Madsen

Abstract: Making inferences from data streams is a pervasive problem in many modern data analysis applications. But it requires to address the problem of continuous model updating and adapt to changes or drifts in the underlying data generating distribution. In this paper, we approach these problems from a Bayesian perspective covering general conjugate exponential models. Our proposal makes use of non-conjugate hierarchical priors to explicitly model temporal changes of the model parameters. We also derive a novel variational inference scheme which overcomes the use of non-conjugate priors while maintaining the computational efficiency of variational methods over conjugate models. The approach is validated on three real data sets over three latent variable models.

Submitted 7 July, 2017; originally announced July 2017.

Comments: ICML 2017

arXiv:1704.01427 [pdf, ps, other]

doi 10.1016/j.knosys.2018.09.019

AMIDST: a Java Toolbox for Scalable Probabilistic Machine Learning

Authors: Andrés R. Masegosa, Ana M. Martínez, Darío Ramos-López, Rafael Cabañas, Antonio Salmerón, Thomas D. Nielsen, Helge Langseth, Anders L. Madsen

Abstract: The AMIDST Toolbox is software for scalable probabilistic machine learning with a special focus on (massive) streaming data. The toolbox supports a flexible modeling language based on probabilistic graphical models with latent variables and temporal dependencies. The specified models can be learnt from large data sets using parallel or distributed implementations of Bayesian learning algorithms for either streaming or batch data. These algorithms are based on a flexible variational message passing scheme, which supports discrete and continuous variables from a wide range of probability distributions. AMIDST also leverages existing functionality and algorithms by interfacing to software tools such as Flink, Spark, MOA, Weka, R and HUGIN. AMIDST is an open source toolbox written in Java and available at http://www.amidsttoolbox.com under the Apache Software License version 2.0.

Submitted 4 April, 2017; originally announced April 2017.

ACM Class: I.2.6

arXiv:1511.03603 [pdf, other]

doi 10.1109/EMBC.2014.6944375

Automatic Measurement of Physical Mobility in Get-Up-and-Go Test Using Kinect Sensor

Authors: Amir H. Kargar B., Ali Mollahosseini, Taylor Struemph, Wilson Pace, Rodney D. Nielsen, Mohammad H. Mahoor

Abstract: The Get-Up-and-Go Test is commonly used by physicians for assessing the physical mobility of the elderly. This paper presents a method for automatic analysis and classification of human gait in the Get-Up-and-Go Test using a Microsoft Kinect sensor. Two types of features are automatically extracted from the human skeleton data provided by the Kinect sensor. The first type of feature is related to the human gait (e.g., number of steps, step duration, and turning duration), whereas the other one describes the anatomical configuration (e.g., knee angles, leg angle, and distance between elbows). These features characterize the degree of human physical mobility. State-of-the-art machine learning algorithms (i.e., Bag of Words and Support Vector Machines) are used to classify the severity of gaits in 12 subjects with ages ranging between 65 and 90 enrolled in a pilot study. Our experimental results show that these features can discriminate between patients who have a high risk for falling and patients with a lower fall risk.

Submitted 11 November, 2015; originally announced November 2015.

Comments: Published in: Engineering in Medicine and Biology Society (EMBC), 2014 36th Annual International Conference of the IEEE

arXiv:1301.6729 [pdf]

Welldefined Decision Scenarios

Authors: Thomas D. Nielsen, Finn Verner Jensen

Abstract: Influence diagrams serve as a powerful tool for modelling symmetric decision problems. When solving an influence diagram we determine a set of strategies for the decisions involved. A strategy for a decision variable is in principle a function over its past. However, some of the past may be irrelevant for the decision, and for computational reasons it is important not to deal with redundant variables in the strategies. We show that current methods (e.g. the "Decision Bayes-ball" algorithm by Shachter UAI98) do not determine the relevant past, and we present a complete algorithm. Actually, this paper takes a more general outset: When formulating a decision scenario as an influence diagram, a linear temporal ordering of the decision variables is required. This constraint ensures that the decision scenario is welldefined. However, the structure of a decision scenario often yields certain decisions conditionally independent, and it is therefore unnecessary to impose a linear temporal ordering on the decisions. In this paper we deal with partial influence diagrams, i.e. influence diagrams with only a partial temporal ordering specified. We present a set of conditions which are necessary and sufficient to ensure that a partial influence diagram is welldefined. These conditions are used as a basis for the construction of an algorithm for determining whether or not a partial influence diagram is welldefined.

Submitted 23 January, 2013; originally announced January 2013.

Comments: Appears in Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI1999)

Report number: UAI-P-1999-PG-502-511

arXiv:1301.3880 [pdf]

Using ROBDDs for Inference in Bayesian Networks with Troubleshooting as an Example

Authors: Thomas D. Nielsen, Pierre-Henri Wuillemin, Finn Verner Jensen, Uffe Kjærulff

Abstract: When using Bayesian networks for modelling the behavior of man-made machinery, it usually happens that a large part of the model is deterministic. For such Bayesian networks, the deterministic part of the model can be represented as a Boolean function, and a central part of belief updating reduces to the task of calculating the number of satisfying configurations in a Boolean function. In this paper we explore how advances in the calculation of Boolean functions can be adopted for belief updating, in particular within the context of troubleshooting. We present experimental results indicating a substantial speed-up compared to traditional junction tree propagation.

Submitted 16 January, 2013; originally announced January 2013.

Comments: Appears in Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI2000)

Report number: UAI-P-2000-PG-426-435

arXiv:1301.3879 [pdf]

Representing and Solving Asymmetric Bayesian Decision Problems

Authors: Thomas D. Nielsen, Finn Verner Jensen

Abstract: This paper deals with the representation and solution of asymmetric Bayesian decision problems. We present a formal framework, termed asymmetric influence diagrams, that is based on the influence diagram and allows an efficient representation of asymmetric decision problems. As opposed to existing frameworks, the asymmetric influence diagram primarily encodes asymmetry at the qualitative level and it can therefore be read directly from the model. We give an algorithm for solving asymmetric influence diagrams. The algorithm initially decomposes the asymmetric decision problem into a structure of symmetric subproblems organized as a tree. A solution to the decision problem can then be found by propagating from the leaves toward the root using existing evaluation methods to solve the sub-problems.

Submitted 16 January, 2013; originally announced January 2013.

Comments: Appears in Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI2000)

Report number: UAI-P-2000-PG-416-425

arXiv:1212.3873 [pdf, ps, other]

doi 10.4204/EPTCS.103.6

Learning Markov Decision Processes for Model Checking

Authors: Hua Mao, Yingke Chen, Manfred Jaeger, Thomas D. Nielsen, Kim G. Larsen, Brian Nielsen

Abstract: Constructing an accurate system model for formal model verification can be both resource demanding and time-consuming. To alleviate this shortcoming, algorithms have been proposed for automatically learning system models based on observed system behaviors. In this paper we extend the algorithm on learning probabilistic automata to reactive systems, where the observed system behavior is in the form of alternating sequences of inputs and outputs. We propose an algorithm for automatically learning a deterministic labeled Markov decision process model from the observed behavior of a reactive system. The proposed learning algorithm is adapted from algorithms for learning deterministic probabilistic finite automata, and extended to include both probabilistic and nondeterministic transitions. The algorithm is empirically analyzed and evaluated by learning system models of slot machines. The evaluation is performed by analyzing the probabilistic linear temporal logic properties of the system as well as by analyzing the schedulers, in particular the optimal schedulers, induced by the learned models.

Submitted 16 December, 2012; originally announced December 2012.

Comments: In Proceedings QFM 2012, arXiv:1212.3454

Journal ref: EPTCS 103, 2012, pp. 49-63

arXiv:1212.2500 [pdf]

On Local Optima in Learning Bayesian Networks

Authors: Jens D. Nielsen, Tomas Kocka, Jose M. Pena

Abstract: This paper proposes and evaluates the k-greedy equivalence search algorithm (KES) for learning Bayesian networks (BNs) from complete data. The main characteristic of KES is that it allows a trade-off between greediness and randomness, thus exploring different good local optima. When greediness is set at maximum, KES corresponds to the greedy equivalence search algorithm (GES). When greediness is kept at minimum, we prove that under mild assumptions KES asymptotically returns any inclusion optimal BN with nonzero probability. Experimental results for both synthetic and real data are reported showing that KES often finds better local optima than GES. Moreover, we use KES to experimentally confirm that the number of different local optima is often huge.

Submitted 19 October, 2012; originally announced December 2012.

Comments: Appears in Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI2003)

Report number: UAI-P-2003-PG-435-442

Showing 1–42 of 42 results for author: Nielsen, D