Challenges in explaining deep learning models for data with biological variation

Authors: Lenka Tětková, Erik Schou Dreier, Robin Malm, Lars Kai Hansen

Abstract: Much machine learning research progress is based on developing models and evaluating them on a benchmark dataset (e.g., ImageNet for images). However, applying such benchmark-successful methods to real-world data often does not work as expected. This is particularly the case for biological data where we expect variability at multiple time and spatial scales. In this work, we are using grain data and the goal is to detect diseases and damages. Pink fusarium, skinned grains, and other diseases and damages are key factors in setting the price of grains or excluding dangerous grains from food production. Apart from challenges stemming from differences of the data from the standard toy datasets, we also present challenges that need to be overcome when explaining deep learning models. For example, explainability methods have many hyperparameters that can give different results, and the ones published in the papers do not work on dissimilar images. Other challenges are more general: problems with visualization of the explanations and their comparison since the magnitudes of their values differ from method to method. An open fundamental question also is: How to evaluate explanations? It is a non-trivial task because the "ground truth" is usually missing or ill-defined. Also, human annotators may create what they think is an explanation of the task at hand, yet the machine learning model might solve it in a different and perhaps counter-intuitive way. We discuss several of these challenges and evaluate various post-hoc explainability methods on grain data. We focus on robustness, quality of explanations, and similarity to particular "ground truth" annotations made by experts. The goal is to find the methods that overall perform well and could be used in this challenging task. We hope the proposed pipeline will be used as a framework for evaluating explainability methods in specific use cases.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2404.07008 [pdf, other]

Knowledge graphs for empirical concept retrieval

Authors: Lenka Tětková, Teresa Karen Scheidt, Maria Mandrup Fogh, Ellen Marie Gaunby Jørgensen, Finn Årup Nielsen, Lars Kai Hansen

Abstract: Concept-based explainable AI is promising as a tool to improve the understanding of complex models at the premises of a given user, viz. as a tool for personalized explainability. An important class of concept-based explainability methods is constructed with empirically defined concepts, indirectly defined through a set of positive and negative examples, as in the TCAV approach (Kim et al., 2018). While it is appealing to the user to avoid formal definitions of concepts and their operationalization, it can be challenging to establish relevant concept datasets. Here, we address this challenge using general knowledge graphs (such as, e.g., Wikidata or WordNet) for comprehensive concept definition and present a workflow for user-driven data collection in both text and image domains. The concepts derived from knowledge graphs are defined interactively, providing an opportunity for personalization and ensuring that the concepts reflect the user's intentions. We test the retrieved concept datasets on two concept-based explainability methods, namely concept activation vectors (CAVs) and concept activation regions (CARs) (Crabbe and van der Schaar, 2022). We show that CAVs and CARs based on these empirical concept datasets provide robust and accurate explanations. Importantly, we also find good alignment between the models' representations of concepts and the structure of knowledge graphs, i.e., human representations. This supports our conclusion that knowledge graph-based concepts are relevant for XAI.

Submitted 10 April, 2024; originally announced April 2024.

Comments: Preprint. Accepted to The 2nd World Conference on eXplainable Artificial Intelligence

arXiv:2311.18364 [pdf, other]

Hubness Reduction Improves Sentence-BERT Semantic Spaces

Authors: Beatrix M. G. Nielsen, Lars Kai Hansen

Abstract: Semantic representations of text, i.e. representations of natural language which capture meaning by geometry, are essential for areas such as information retrieval and document grouping. High-dimensional trained dense vectors have received much attention in recent years as such representations. We investigate the structure of semantic spaces that arise from embeddings made with Sentence-BERT and find that the representations suffer from a well-known problem in high dimensions called hubness. Hubness results in asymmetric neighborhood relations, such that some texts (the hubs) are neighbours of many other texts while most texts (so-called anti-hubs), are neighbours of few or no other texts. We quantify the semantic quality of the embeddings using hubness scores and error rate of a neighbourhood based classifier. We find that when hubness is high, we can reduce error rate and hubness using hubness reduction methods. We identify a combination of two methods as resulting in the best reduction. For example, on one of the tested pretrained models, this combined method can reduce hubness by about 75% and error rate by about 9%. Thus, we argue that mitigating hubness in the embedding space provides better semantic representations of text.

Submitted 30 November, 2023; originally announced November 2023.

Comments: Accepted at NLDL 2024
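
The asymmetric neighbourhood relations described in the abstract above can be illustrated with a small sketch. It counts how often each point appears in the k-nearest-neighbour lists of the other points (the k-occurrence) and summarises the counts by their skewness, which is one common way to score hubness. Whether the paper uses exactly this score is an assumption, and the function names `k_occurrence` and `skewness` are illustrative, not taken from the paper.

```python
import math


def k_occurrence(points, k):
    """Count how often each point is among the k nearest neighbours of the others."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Rank every other point by Euclidean distance to point i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in others[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Population skewness (third standardised moment) of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    third = sum((v - mean) ** 3 for v in values) / n
    return third / var ** 1.5
```

A strongly positive skewness of the k-occurrence counts would suggest that a few points act as hubs while many points are rarely or never chosen as neighbours (anti-hubs).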

arXiv:2307.12745 [pdf, ps, other]

Concept-based explainability for an EEG transformer model

Authors: Anders Gjølbye Madsen, William Theodor Lehn-Schiøler, Áshildur Jónsdóttir, Bergdís Arnardóttir, Lars Kai Hansen

Abstract: Deep learning models are complex due to their size, structure, and inherent randomness in training procedures. Additional complexity arises from the selection of datasets and inductive biases. Addressing these challenges for explainability, Kim et al. (2018) introduced Concept Activation Vectors (CAVs), which aim to understand deep models' internal states in terms of human-aligned concepts. These concepts correspond to directions in latent space, identified using linear discriminants. Although this method was first applied to image classification, it was later adapted to other domains, including natural language processing. In this work, we attempt to apply the method to electroencephalogram (EEG) data for explainability in Kostas et al.'s BENDR (2021), a large-scale transformer model. A crucial part of this endeavor involves defining the explanatory concepts and selecting relevant datasets to ground concepts in the latent space. Our focus is on two mechanisms for EEG concept formation: the use of externally labeled EEG datasets, and the application of anatomically defined concepts. The former approach is a straightforward generalization of methods used in image classification, while the latter is novel and specific to EEG. We present evidence that both approaches to concept formation yield valuable insights into the representations learned by deep EEG models.

Submitted 24 July, 2023; originally announced July 2023.

Comments: To appear in proceedings of 2023 IEEE International workshop on Machine Learning for Signal Processing

arXiv:2306.03009 [pdf, other]

doi 10.1038/s43588-023-00573-5

Using Sequences of Life-events to Predict Human Lives

Authors: Germans Savcisens, Tina Eliassi-Rad, Lars Kai Hansen, Laust Mortensen, Lau Lilleholt, Anna Rogers, Ingo Zettler, Sune Lehmann

Abstract: Over the past decade, machine learning has revolutionized computers' ability to analyze text through flexible computational models. Due to their structural similarity to written language, transformer-based architectures have also shown promise as tools to make sense of a range of multi-variate sequences from protein-structures, music, electronic health records to weather-forecasts. We can also represent human lives in a way that shares this structural similarity to language. From one perspective, lives are simply sequences of events: People are born, visit the pediatrician, start school, move to a new location, get married, and so on. Here, we exploit this similarity to adapt innovations from natural language processing to examine the evolution and predictability of human lives based on detailed event sequences. We do this by drawing on arguably the most comprehensive registry data in existence, available for an entire nation of more than six million individuals across decades. Our data include information about life-events related to health, education, occupation, income, address, and working hours, recorded with day-to-day resolution. We create embeddings of life-events in a single vector space showing that this embedding space is robust and highly structured. Our models allow us to predict diverse outcomes ranging from early mortality to personality nuances, outperforming state-of-the-art models by a wide margin. Using methods for interpreting deep learning models, we probe the algorithm to understand the factors that enable our predictions. Our framework allows researchers to identify new potential mechanisms that impact life outcomes and associated possibilities for personalized interventions.

Submitted 5 June, 2023; originally announced June 2023.

arXiv:2306.00561 [pdf, other]

Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners

Authors: Sarthak Yadav, Sergios Theodoridis, Lars Kai Hansen, Zheng-Hua Tan

Abstract: In this work, we propose a Multi-Window Masked Autoencoder (MW-MAE) fitted with a novel Multi-Window Multi-Head Attention (MW-MHA) module that facilitates the modelling of local-global interactions in every decoder transformer block through attention heads of several distinct local and global windows. Empirical results on ten downstream audio tasks show that MW-MAEs consistently outperform standard MAEs in overall performance and learn better general-purpose audio representations, along with demonstrating considerably better scaling characteristics. Investigating attention distances and entropies reveals that MW-MAE encoders learn heads with broader local and global attention. Analyzing attention head feature representations through Projection Weighted Canonical Correlation Analysis (PWCCA) shows that attention heads with the same window sizes across the decoder layers of the MW-MAE learn correlated feature representations which enables each block to independently capture local and global information, leading to a decoupled decoder feature hierarchy. Code for feature extraction and downstream experiments along with pre-trained models will be released publicly.

Submitted 1 October, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

arXiv:2305.17154 [pdf, other]

On convex decision regions in deep network representations

Authors: Lenka Tětková, Thea Brüsch, Teresa Karen Scheidt, Fabian Martin Mager, Rasmus Ørtoft Aagaard, Jonathan Foldager, Tommy Sonne Alstrøm, Lars Kai Hansen

Abstract: Current work on human-machine alignment aims at understanding machine-learned latent spaces and their correspondence to human representations. Gärdenfors' conceptual spaces is a prominent framework for understanding human representations. Convexity of object regions in conceptual spaces is argued to promote generalizability, few-shot learning, and interpersonal alignment. Based on these insights, we investigate the notion of convexity of concept regions in machine-learned latent spaces. We develop a set of tools for measuring convexity in sampled data and evaluate emergent convexity in layered representations of state-of-the-art deep networks. We show that convexity is robust to basic re-parametrization and, hence, meaningful as a quality of machine-learned latent spaces. We find that approximate convexity is pervasive in neural representations in multiple application domains, including models of images, audio, human activity, text, and medical images. Generally, we observe that fine-tuning increases the convexity of label regions. We find evidence that pretraining convexity of class label regions predicts subsequent fine-tuning performance.

Submitted 6 October, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

arXiv:2304.08984 [pdf, other]

Robustness of Visual Explanations to Common Data Augmentation

Authors: Lenka Tětková, Lars Kai Hansen

Abstract: As the use of deep neural networks continues to grow, understanding their behaviour has become more crucial than ever. Post-hoc explainability methods are a potential solution, but their reliability is being called into question. Our research investigates the response of post-hoc visual explanations to naturally occurring transformations, often referred to as augmentations. We anticipate explanations to be invariant under certain transformations, such as changes to the colour map, while responding in an equivariant manner to transformations like translation, object scaling, and rotation. We have found remarkable differences in robustness depending on the type of transformation, with some explainability methods (such as LRP composites and Guided Backprop) being more stable than others. We also explore the role of training with data augmentation. We provide evidence that explanations are typically less robust to augmentation than classification performance, regardless of whether data augmentation is used in training or not.

Submitted 18 April, 2023; originally announced April 2023.

Comments: Accepted to The 2nd Explainable AI for Computer Vision (XAI4CV) Workshop at CVPR 2023

arXiv:2301.05983 [pdf, other]

On the role of Model Uncertainties in Bayesian Optimization

Authors: Jonathan Foldager, Mikkel Jordahn, Lars Kai Hansen, Michael Riis Andersen

Abstract: Bayesian optimization (BO) is a popular method for black-box optimization, which relies on uncertainty as part of its decision-making process when deciding which experiment to perform next. However, not much work has addressed the effect of uncertainty on the performance of the BO algorithm and to what extent calibrated uncertainties improve the ability to find the global optimum. In this work, we provide an extensive study of the relationship between the BO performance (regret) and uncertainty calibration for popular surrogate models and compare them across both synthetic and real-world experiments. Our results confirm that Gaussian Processes are strong surrogate models and that they tend to outperform other popular models. Our results further show a positive association between calibration error and regret, but interestingly, this association disappears when we control for the type of model in the analysis. We also studied the effect of re-calibration and demonstrate that it generally does not lead to improved regret. Finally, we provide theoretical justification for why uncertainty calibration might be difficult to combine with BO due to the small sample sizes commonly used.

Submitted 14 January, 2023; originally announced January 2023.

Comments: 14 pages, 4 figures, 2 tables

arXiv:2111.03935 [pdf, other]

Noise-Assisted Variational Quantum Thermalization

Authors: Jonathan Foldager, Arthur Pesah, Lars Kai Hansen

Abstract: Preparing thermal states on a quantum computer can have a variety of applications, from simulating many-body quantum systems to training machine learning models. Variational circuits have been proposed for this task on near-term quantum computers, but several challenges remain, such as finding a scalable cost-function, avoiding the need of purification, and mitigating noise effects. We propose a new algorithm for thermal state preparation that tackles those three challenges by exploiting the noise of quantum circuits. We consider a variational architecture containing a depolarizing channel after each unitary layer, with the ability to directly control the level of noise. We derive a closed-form approximation for the free-energy of such circuit and use it as a cost function for our variational algorithm. By evaluating our method on a variety of Hamiltonians and system sizes, we find several systems for which the thermal state can be approximated with a high fidelity. However, we also show that the ability for our algorithm to learn the thermal state strongly depends on the temperature: while a high fidelity can be obtained for high and low temperatures, we identify a specific range for which the problem becomes more challenging. We hope that this first study on noise-assisted thermal state preparation will inspire future research on exploiting noise in variational algorithms.

Submitted 6 November, 2021; originally announced November 2021.

Comments: 13 pages, 7 figures. Submitted to Scientific Reports

arXiv:2109.12306 [pdf, other]

Topic Model Robustness to Automatic Speech Recognition Errors in Podcast Transcripts

Authors: Raluca Alexandra Fetic, Mikkel Jordahn, Lucas Chaves Lima, Rasmus Arpe Fogh Egebæk, Martin Carsten Nielsen, Benjamin Biering, Lars Kai Hansen

Abstract: For a multilingual podcast streaming service, it is critical to be able to deliver relevant content to all users independent of language. Podcast content relevance is conventionally determined using various metadata sources. However, with the increasing quality of speech recognition in many languages, utilizing automatic transcriptions to provide better content recommendations becomes possible. In this work, we explore the robustness of a Latent Dirichlet Allocation topic model when applied to transcripts created by an automatic speech recognition engine. Specifically, we explore how increasing transcription noise influences topics obtained from transcriptions in Danish, a low-resource language. First, we observe a baseline of cosine similarity scores between topic embeddings from automatic transcriptions and the descriptions of the podcasts written by the podcast creators. We then observe how the cosine similarities decrease as transcription noise increases and conclude that even when automatic speech recognition transcripts are erroneous, it is still possible to obtain high-quality topic embeddings from the transcriptions.

Submitted 25 September, 2021; originally announced September 2021.

arXiv:2107.02253 [pdf, other]

Generalization by design: Shortcuts to Generalization in Deep Learning

Authors: Petr Taborsky, Lars Kai Hansen

Abstract: We take a geometrical viewpoint and present a unifying view on supervised deep learning with the Bregman divergence loss function - this entails frequent classification and prediction tasks. Motivated by simulations we suggest that there is principally no implicit bias of vanilla stochastic gradient descent training of deep models towards "simpler" functions. Instead, we show that good generalization may be instigated by bounded spectral products over layers leading to a novel geometric regularizer. It is revealed that in deep enough models such a regularizer enables both, extreme accuracy and generalization, to be reached. We associate popular regularization techniques like weight decay, drop out, batch normalization, and early stopping with this perspective. Backed up by theory we further demonstrate that "generalization by design" is practically possible and that good generalization may be encoded into the structure of the network. We design two such easy-to-use structural regularizers that insert an additional generalization layer into a model architecture, one with a skip connection and another one with drop-out. We verify our theoretical results in experiments on various feedforward and convolutional architectures, including ResNets, and datasets (MNIST, CIFAR10, synthetic data). We believe this work opens up new avenues of research towards better generalizing architectures.

Submitted 5 July, 2021; originally announced July 2021.

Comments: 16 pages + 9 pages supplementary

arXiv:2010.15718 [pdf, other]

Minimal Model Structure Analysis for Input Reconstruction in Federated Learning

Authors: Jia Qian, Hiba Nassar, Lars Kai Hansen

Abstract: Federated learning (FL) proposed a distributed machine learning (ML) framework where every distributed worker owns a complete copy of the global model and their own data. Training occurs locally, which assures no direct transmission of training data. However, recent work (Zhu et al., 2019) demonstrated that input data from a neural network may be reconstructed using only knowledge of the gradients of that network, which completely breached the promise of FL and sabotaged user privacy. In this work, we aim to further explore the theoretical limits of reconstruction, and to speed up and stabilize the reconstruction procedure. We show that a single input may be reconstructed in analytical form, regardless of network depth, using a fully-connected neural network with one hidden node. Then we generalize this result to a gradient averaged over batches of size $B$. In this case, the full batch can be reconstructed if the number of hidden units exceeds $B$. For a convolutional neural network (CNN), the number of required kernels in convolutional layers is decided by multiple factors, e.g., padding, kernel and stride size. We require the number of kernels $h\geq (\frac{d}{d^{\prime}})^2C$, where we define $d$ as input width, $d^{\prime}$ as output width after the convolutional layer, and $C$ as the channel number of the input. We validate our observation and demonstrate the improvements using bio-medical (fMRI, WBC) and benchmark data (MNIST, Kuzushiji-MNIST, CIFAR100, ImageNet and face images).

Submitted 5 November, 2021; v1 submitted 29 October, 2020; originally announced October 2020.
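
The single-input case mentioned in the abstract above can be illustrated with a minimal sketch, assuming one hidden unit with a bias, a tanh activation, and a squared-error loss (all three are choices made here for illustration, not details taken from the paper). For z = w·x + b the chain rule gives dL/dw = (dL/dz)·x and dL/db = dL/dz, so the input follows from the shared gradients as x = (dL/dw) / (dL/db) whenever dL/db is non-zero. The helper names `gradients` and `reconstruct_input` are hypothetical.

```python
import math


def gradients(x, w, b, target):
    """Gradients of 0.5 * (tanh(w.x + b) - target)**2 with respect to w and b."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    out = math.tanh(z)
    # Chain rule: dL/dz = (out - target) * (1 - out**2).
    dz = (out - target) * (1.0 - out ** 2)
    grad_w = [dz * xi for xi in x]  # dL/dw = dL/dz * x
    grad_b = dz                     # dL/db = dL/dz
    return grad_w, grad_b


def reconstruct_input(grad_w, grad_b):
    """Recover x from the gradients alone: x = (dL/dw) / (dL/db)."""
    if grad_b == 0.0:
        raise ValueError("bias gradient is zero; the input cannot be recovered this way")
    return [g / grad_b for g in grad_w]
```

In this toy setting the party that sees only `grad_w` and `grad_b` never needs the weights, the target, or the loss value to recover the input.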

arXiv:2007.06381 [pdf, other]

A simple defense against adversarial attacks on heatmap explanations

Authors: Laura Rieger, Lars Kai Hansen

Abstract: With machine learning models being used for more sensitive applications, we rely on interpretability methods to prove that no discriminating attributes were used for classification. A potential concern is the so-called "fair-washing" - manipulating a model such that the features used in reality are hidden and more innocuous features are shown to be important instead. In our work we present an effective defence against such adversarial attacks on neural networks. By a simple aggregation of multiple explanation methods, the network becomes robust against manipulation. This holds even when the attacker has exact knowledge of the model weights and the explanation methods used.

Submitted 13 July, 2020; originally announced July 2020.

Comments: Accepted at 2020 Workshop on Human Interpretability in Machine Learning (WHI)
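
The aggregation of explanation methods described in the abstract above could look roughly like the following sketch. It assumes each method yields a flat list of per-pixel relevance scores, rescales each heatmap so its absolute values sum to one, and takes the pixel-wise mean. The abstract does not specify the normalisation or the aggregation rule, so both are assumptions here, and the names `normalise` and `aggregate_mean` are illustrative.

```python
def normalise(heatmap):
    """Scale a flat heatmap so that its absolute values sum to 1."""
    total = sum(abs(v) for v in heatmap)
    if total == 0:
        # An all-zero heatmap carries no relevance; return it unchanged.
        return list(heatmap)
    return [v / total for v in heatmap]


def aggregate_mean(heatmaps):
    """Average several normalised heatmaps pixel by pixel."""
    if not heatmaps:
        raise ValueError("need at least one heatmap")
    normalised = [normalise(h) for h in heatmaps]
    n = len(normalised)
    # zip(*...) groups the values that belong to the same pixel.
    return [sum(pixel) / n for pixel in zip(*normalised)]
```

Normalising first keeps a method with large raw magnitudes from dominating the average, which matters because the scales of different explanation methods are not directly comparable.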

arXiv:2007.04806 [pdf, other]

Client Adaptation improves Federated Learning with Simulated Non-IID Clients

Authors: Laura Rieger, Rasmus M. Th. Høegh, Lars K. Hansen

Abstract: We present a federated learning approach for learning a client adaptable, robust model when data is non-identically and non-independently distributed (non-IID) across clients. By simulating heterogeneous clients, we show that adding learned client-specific conditioning improves model performance, and the approach is shown to work on balanced and imbalanced data sets from both audio and image domains. The client adaptation is implemented by a conditional gated activation unit and is particularly beneficial when there are large differences between the data distribution for each client, a common scenario in federated learning.

Submitted 9 July, 2020; originally announced July 2020.

Comments: 11 pages, 11 figures. To appear at International Workshop on Federated Learning for User Privacy and Data Confidentiality in Conjunction with ICML 2020

arXiv:2006.09046 [pdf, other]

Probabilistic Decoupling of Labels in Classification

Authors: Jeppe Nørregaard, Lars Kai Hansen

Abstract: In this paper we develop a principled, probabilistic, unified approach to non-standard classification tasks, such as semi-supervised, positive-unlabelled, multi-positive-unlabelled and noisy-label learning. We train a classifier on the given labels to predict the label-distribution. We then infer the underlying class-distributions by variationally optimizing a model of label-class transitions.

Submitted 16 June, 2020; originally announced June 2020.

Comments: Submitted to ICML 2020 (not accepted)

arXiv:2004.12482 [pdf, other]

doi 10.5220/0010377112001209

On the Limits to Multi-Modal Popularity Prediction on Instagram -- A New Robust, Efficient and Explainable Baseline

Authors: Christoffer Riis, Damian Konrad Kowalczyk, Lars Kai Hansen

Abstract: Our global population contributes visual content on platforms like Instagram, attempting to express themselves and engage their audiences, at an unprecedented and increasing rate. In this paper, we revisit the popularity prediction on Instagram. We present a robust, efficient, and explainable baseline for population-based popularity prediction, achieving strong ranking performance. We employ the latest methods in computer vision to maximize the information extracted from the visual modality. We use transfer learning to extract visual semantics such as concepts, scenes, and objects, allowing a new level of scrutiny in an extensive, explainable ablation study. We inform feature selection towards a robust and scalable model, but also illustrate feature interactions, offering new directions for further inquiry in computational social science. Our strongest models inform a lower limit to population-based predictability of popularity on Instagram. The models are immediately applicable to social media monitoring and influencer identification.

Submitted 20 February, 2021; v1 submitted 26 April, 2020; originally announced April 2020.

Comments: Presented at ICAART 2021

Journal ref: Proceedings of the 13th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-484-8, pages 1200-1209, 2021

arXiv:2003.08747 [pdf, other]

IROF: a low resource evaluation metric for explanation methods

Authors: Laura Rieger, Lars Kai Hansen

Abstract: The adoption of machine learning in health care hinges on the transparency of the algorithms used, which makes explanation methods necessary. However, despite a growing literature on explaining neural networks, no consensus has been reached on how to evaluate those explanation methods. We propose IROF, a new approach to evaluating explanation methods that circumvents the need for manual evaluation. Compared to other recent work, our approach requires several orders of magnitude less computational resources and no human input, making it accessible to lower resource groups and robust to human bias.

Submitted 9 March, 2020; originally announced March 2020.

arXiv:2002.05038 [pdf, other]

Robustness analytics to data heterogeneity in edge computing

Authors: Jia Qian, Lars Kai Hansen, Xenofon Fafoutis, Prayag Tiwari, Hari Mohan Pandey

Abstract: Federated Learning is a framework that jointly trains a model with complete knowledge on a remotely placed centralized server, but without the requirement of accessing the data stored in distributed machines. Some work assumes that the data generated from edge devices are identically and independently sampled from a common population distribution. However, such ideal sampling may not be realistic in many contexts. Also, models based on intrinsic agency, such as active sampling schemes, may lead to highly biased sampling. So an imminent question is: how robust is Federated Learning to biased sampling? In this work (https://github.com/jiaqian/robustness_of_FL), we experimentally investigate two such scenarios. First, we study a centralized classifier aggregated from a collection of local classifiers trained with data having categorical heterogeneity. Second, we study a classifier aggregated from a collection of local classifiers trained by data through active sampling at the edge. We present evidence in both scenarios that Federated Learning is robust to data heterogeneity when local training iterations and communication frequency are appropriately chosen.

Submitted 24 October, 2021; v1 submitted 12 February, 2020; originally announced February 2020.

arXiv:1910.02807 [pdf, ps, other]

doi 10.5220/0009169709180925

The Complexity of Social Media Response: Statistical Evidence For One-Dimensional Engagement Signal in Twitter

Authors: Damian Konrad Kowalczyk, Lars Kai Hansen

Abstract: Many years after online social networks exceeded our collective attention, social influence is still built on attention capital. Quality is not a prerequisite for viral spreading, yet large diffusion cascades remain the hallmark of a social influencer. Consequently, our exposure to low-quality content and questionable influence is expected to increase. Since the conception of influence maximization frameworks, multiple content performance metrics became available, albeit raising the complexity of influence analysis. In this paper, we examine and consolidate a diverse set of content engagement metrics. The correlations discovered lead us to propose a new, more holistic, one-dimensional engagement signal. We then show it is more predictable than any individual influence predictors previously investigated. Our proposed model achieves strong engagement ranking performance and is the first to explain half of the variance with features available early. We share the detailed numerical workflow to compute the new compound engagement signal. The model is immediately applicable to social media monitoring, influencer identification, campaign engagement forecasting, and curating user feeds.

Submitted 15 February, 2020; v1 submitted 7 October, 2019; originally announced October 2019.

Comments: Presented at ICAART 2020

Report number: ICAART20-RP-238

Journal ref: Proceedings of the 12th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART (2020) 918-925

arXiv:1906.10718 [pdf, other]

Active Learning Solution on Distributed Edge Computing

Authors: Jia Qian, Sayantan Sengupta, Lars Kai Hansen

Abstract: Industry 4.0 becomes possible through the convergence between Operational and Information Technologies. All the requirements to realize the convergence are integrated on the Fog Platform. The Fog Platform is introduced between the cloud server and edge devices when the unprecedented generation of data burdens the cloud server, leading to non-negligible latency. In this new paradigm, we divide the computation tasks and push them down to edge devices. Furthermore, local computing (at the edge side) may improve privacy and trust. To address these problems, we present a new method, in which we decompose the data aggregation and processing by dividing them between edge devices and fog nodes intelligently. We apply active learning on edge devices and federated learning on the fog node, which significantly reduces the number of data samples needed to train the model as well as the communication cost. To show the effectiveness of the proposed method, we implemented and evaluated its performance for an image classification task. In addition, we consider two settings, massively distributed and non-massively distributed, and offer the corresponding solutions.

Submitted 25 June, 2019; originally announced June 2019.

arXiv:1905.12403 [pdf, other]

Probabilistic Decoupling of Labels in Classification

Authors: Jeppe Nørregaard, Lars Kai Hansen

Abstract: We investigate probabilistic decoupling of labels supplied for training, from the underlying classes for prediction. Decoupling enables an inference scheme general enough to implement many classification problems, including supervised, semi-supervised, positive-unlabelled, noisy-label and suggests a general solution to the multi-positive-unlabelled learning problem. We test the method on the Fashion MNIST and 20 News Groups datasets for performance benchmarks, where we simulate noise, partial labelling etc.

Submitted 29 May, 2019; originally announced May 2019.

Comments: 8 pages + 10 pages of supplementary material. NeurIPS preprint

arXiv:1905.00709 [pdf, ps, other]

Phase transition in PCA with missing data: Reduced signal-to-noise ratio, not sample size!

Authors: Niels Bruun Ipsen, Lars Kai Hansen

Abstract: How does missing data affect our ability to learn signal structures? It has been shown that learning signal structure in terms of principal components is dependent on the ratio of sample size and dimensionality and that a critical number of observations is needed before learning starts (Biehl and Mietzner, 1993). Here we generalize this analysis to include missing data. Probabilistic principal component analysis is regularly used for estimating signal structures in datasets with missing data. Our analytic result suggests that the effect of missing data is to effectively reduce signal-to-noise ratio rather than - as generally believed - to reduce sample size. The theory predicts a phase transition in the learning curves and this is indeed found both in simulation data and in real datasets.

Submitted 2 May, 2019; originally announced May 2019.

Comments: Accepted to ICML 2019. This version is the submitted paper

Journal ref: International Conference on Machine Learning. 2019. pp. 2951-2960

arXiv:1903.00519 [pdf, other]

Aggregating explanation methods for stable and robust explainability

Authors: Laura Rieger, Lars Kai Hansen

Abstract: Despite a growing literature on explaining neural networks, no consensus has been reached on how to explain a neural network decision or how to evaluate an explanation. Our contributions in this paper are twofold. First, we investigate schemes to combine explanation methods and reduce model uncertainty to obtain a single aggregated explanation. We provide evidence that the aggregation is better at identifying important features than individual methods. Adversarial attacks on explanations are a recent active research topic. As our second contribution, we present evidence that aggregate explanations are much more robust to attacks than individual explanation methods.

Submitted 20 March, 2020; v1 submitted 1 March, 2019; originally announced March 2019.

arXiv:1802.02343 [pdf, ps, other]

doi 10.1162/NECO_a_00774

Multi-View Bayesian Correlated Component Analysis

Authors: Simon Kamronn, Andreas Trier Poulsen, Lars Kai Hansen

Abstract: Correlated component analysis as proposed by Dmochowski et al. (2012) is a tool for investigating brain process similarity in the responses to multiple views of a given stimulus. Correlated components are identified under the assumption that the involved spatial networks are identical. Here we propose a hierarchical probabilistic model that can infer the level of universality in such multi-view data, from completely unrelated representations, corresponding to canonical correlation analysis, to identical representations as in correlated component analysis. This new model, which we denote Bayesian correlated component analysis, evaluates favourably against three relevant algorithms in simulated data. A well-established benchmark EEG dataset is used to further validate the new model and infer the variability of spatial representations across multiple subjects.

Submitted 7 February, 2018; originally announced February 2018.

Journal ref: Neural Computation, 27(10):2207-30, 2015

arXiv:1710.11379 [pdf, other]

Latent Space Oddity: on the Curvature of Deep Generative Models

Authors: Georgios Arvanitidis, Lars Kai Hansen, Søren Hauberg

Abstract: Deep generative models provide a systematic way to learn nonlinear data distributions, through a set of latent variables and a nonlinear "generator" function that maps latent points into the input space. The nonlinearity of the generator implies that the latent space gives a distorted view of the input space. Under mild conditions, we show that this distortion can be characterized by a stochastic Riemannian metric, and demonstrate that distances and interpolants are significantly improved under this metric. This in turn improves probability distributions, sampling algorithms and clustering in the latent space. Our geometric analysis further reveals that current generators provide poor variance estimates and we propose a new generator architecture with vastly improved variance estimates. Results are demonstrated on convolutional and fully connected variational autoencoders, but the formalism easily generalizes to other deep generative models.

Submitted 13 December, 2021; v1 submitted 31 October, 2017; originally announced October 2017.

Comments: Published at International Conference on Learning Representations (ICLR) 2018

arXiv:1710.00633 [pdf, other]

Deep Convolutional Neural Networks for Interpretable Analysis of EEG Sleep Stage Scoring

Authors: Albert Vilamala, Kristoffer H. Madsen, Lars K. Hansen

Abstract: Sleep studies are important for diagnosing sleep disorders such as insomnia, narcolepsy or sleep apnea. They rely on manual scoring of sleep stages from raw polysomnography signals, which is a tedious visual task requiring the workload of highly trained professionals. Consequently, research efforts to pursue automatic stage scoring based on machine learning techniques have been carried out over the last years. In this work, we resort to multitaper spectral analysis to create visually interpretable images of sleep patterns from EEG signals as inputs to a deep convolutional network trained to solve visual recognition tasks. As a working example of transfer learning, a system able to accurately classify sleep stages in new unseen patients is presented. Evaluations in a widely-used publicly available dataset favourably compare to state-of-the-art results, while providing a framework for visual interpretation of outcomes.

Submitted 2 October, 2017; originally announced October 2017.

Comments: 8 pages, 1 figure, 2 tables, IEEE 2017 International Workshop on Machine Learning for Signal Processing

arXiv:1710.00629 [pdf, other]

doi 10.1109/PRNI.2017.7981499

Adaptive Smoothing in fMRI Data Processing Neural Networks

Authors: Albert Vilamala, Kristoffer Hougaard Madsen, Lars Kai Hansen

Abstract: Functional Magnetic Resonance Imaging (fMRI) relies on multi-step data processing pipelines to accurately determine brain activity; among them, the crucial step of spatial smoothing. These pipelines are commonly suboptimal, given the local optimisation strategy they use, treating each step in isolation. With the advent of new tools for deep learning, recent work has proposed to turn these pipelines into end-to-end learning networks. This change of paradigm offers new avenues to improvement as it allows for a global optimisation. The current work aims at benefitting from this paradigm shift by defining a smoothing step as a layer in these networks able to adaptively modulate the degree of smoothing required by each brain volume to better accomplish a given data analysis task. The viability is evaluated on real fMRI data where subjects alternated between left and right finger tapping tasks.

Submitted 2 October, 2017; originally announced October 2017.

Comments: 4 pages, 3 figures, 1 table, IEEE 2017 International Workshop on Pattern Recognition in Neuroimaging (PRNI)

arXiv:1704.05748 [pdf, other]

EEG source imaging assists decoding in a face recognition task

Authors: Rasmus S. Andersen, Anders U. Eliasen, Nicolai Pedersen, Michael Riis Andersen, Sofie Therese Hansen, Lars Kai Hansen

Abstract: EEG based brain state decoding has numerous applications. State of the art decoding is based on processing of the multivariate sensor space signal; however, evidence is mounting that EEG source reconstruction can assist decoding. EEG source imaging leads to high-dimensional representations and rather strong a priori information must be invoked. Recent work by Edelman et al. (2016) has demonstrated that introduction of a spatially focal source space representation can improve decoding of motor imagery. In this work we explore the generality of the hypothesis of Edelman et al. by considering decoding of face recognition. This task concerns the differentiation of brain responses to images of faces and scrambled faces and poses a rather difficult decoding problem at the single trial level. We implement the pipeline using spatially focused features and show that this approach is challenged and source imaging does not lead to an improved decoding. We design a distributed pipeline in which the classifier has access to brain wide features, which in turn does lead to a 15% reduction in the error rate using source space features. Hence, our work presents supporting evidence for the hypothesis that source imaging improves decoding.

Submitted 17 April, 2017; originally announced April 2017.

arXiv:1610.04079 [pdf, other]

Towards end-to-end optimisation of functional image analysis pipelines

Authors: Albert Vilamala, Kristoffer Hougaard Madsen, Lars Kai Hansen

Abstract: The study of neurocognitive tasks requiring accurate localisation of activity often relies on functional Magnetic Resonance Imaging, a widely adopted technique that makes use of a pipeline of data processing modules, each involving a variety of parameters. These parameters are frequently set according to the local goal of each specific module, not accounting for the rest of the pipeline. Given recent success of neural network research in many different domains, we propose to convert the whole data pipeline into a deep neural network, where the parameters involved are jointly optimised by the network to best serve a common global goal. As a proof of concept, we develop a module able to adaptively apply the most suitable spatial smoothing to every brain volume for each specific neuroimaging task, and we validate its results in a standard brain decoding experiment.

Submitted 13 October, 2016; originally announced October 2016.

Comments: 7 pages, 2 figures

arXiv:1606.02518 [pdf, other]

A Locally Adaptive Normal Distribution

Authors: Georgios Arvanitidis, Lars Kai Hansen, Søren Hauberg

Abstract: The multivariate normal density is a monotonic function of the distance to the mean, and its ellipsoidal shape is due to the underlying Euclidean metric. We suggest replacing this metric with a locally adaptive, smoothly changing (Riemannian) metric that favors regions of high local density. The resulting locally adaptive normal distribution (LAND) is a generalization of the normal distribution to the "manifold" setting, where data is assumed to lie near a potentially low-dimensional manifold embedded in $\mathbb{R}^D$. The LAND is parametric, depending only on a mean and a covariance, and is the maximum entropy distribution under the given metric. The underlying metric is, however, non-parametric. We develop a maximum likelihood algorithm to infer the distribution parameters that relies on a combination of gradient descent and Monte Carlo integration. We further extend the LAND to mixture models, and provide the corresponding EM algorithm. We demonstrate the efficiency of the LAND to fit non-trivial probability distributions over both synthetic data, and EEG measurements of human sleep.

Submitted 23 September, 2016; v1 submitted 8 June, 2016; originally announced June 2016.

arXiv:1604.03019 [pdf, other]

EEG in the classroom: Synchronised neural recordings during video presentation

Authors: Andreas Trier Poulsen, Simon Kamronn, Jacek Dmochowski, Lucas C. Parra, Lars Kai Hansen

Abstract: We performed simultaneous recordings of electroencephalography (EEG) from multiple students in a classroom, and measured the inter-subject correlation (ISC) of activity evoked by a common video stimulus. The neural reliability, as quantified by ISC, has been linked to engagement and attentional modulation in earlier studies that used high-grade equipment in laboratory settings. Here we reproduce many of the results from these studies using portable low-cost equipment, focusing on the robustness of using ISC for subjects experiencing naturalistic stimuli. The present data shows that stimulus-evoked neural responses, known to be modulated by attention, can be tracked for groups of students with synchronized EEG acquisition. This is a step towards real-time inference of engagement in the classroom.

Submitted 27 December, 2016; v1 submitted 11 April, 2016; originally announced April 2016.

Comments: 14 pages, 5 figures, 3 tables. Preprint version. Revision of original preprint. Supplementary materials added as ancillary file

arXiv:1510.02795 [pdf, other]

Dreaming More Data: Class-dependent Distributions over Diffeomorphisms for Learned Data Augmentation

Authors: Søren Hauberg, Oren Freifeld, Anders Boesen Lindbo Larsen, John W. Fisher III, Lars Kai Hansen

Abstract: Data augmentation is a key element in training high-dimensional models. In this approach, one synthesizes new observations by applying pre-specified transformations to the original training data; e.g. new images are formed by rotating old ones. Current augmentation schemes, however, rely on manual specification of the applied transformations, making data augmentation an implicit form of feature engineering. With an eye towards true end-to-end learning, we suggest learning the applied transformations on a per-class basis. Particularly, we align image pairs within each class under the assumption that the spatial transformation between images belongs to a large class of diffeomorphisms. We then learn class-specific probabilistic generative models of the transformations in a Riemannian submanifold of the Lie group of diffeomorphisms. We demonstrate significant performance improvements in training deep neural nets over manually-specified augmentation schemes. Our code and augmented datasets are available online.

Submitted 30 June, 2016; v1 submitted 9 October, 2015; originally announced October 2015.

Journal ref: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, pp. 342-350, 2016

arXiv:1509.04752 [pdf, other]

Bayesian inference for spatio-temporal spike-and-slab priors

Authors: Michael Riis Andersen, Aki Vehtari, Ole Winther, Lars Kai Hansen

Abstract: In this work, we address the problem of solving a series of underdetermined linear inverse problems subject to a sparsity constraint. We generalize the spike-and-slab prior distribution to encode a priori correlation of the support of the solution in both space and time by imposing a transformed Gaussian process on the spike-and-slab probabilities. An expectation propagation (EP) algorithm for posterior inference under the proposed model is derived. For large scale problems, the standard EP algorithm can be prohibitively slow. We therefore introduce three different approximation schemes to reduce the computational complexity. Finally, we demonstrate the proposed model using numerical experiments based on both synthetic and real data sets.

Submitted 1 December, 2017; v1 submitted 15 September, 2015; originally announced September 2015.

Comments: 58 pages, 17 figures

Journal ref: Journal of Machine Learning Research, 18(139):1-58, 2017

arXiv:1508.04556 [pdf, ps, other]

Spatio-temporal Spike and Slab Priors for Multiple Measurement Vector Problems

Authors: Michael Riis Andersen, Ole Winther, Lars Kai Hansen

Abstract: We are interested in solving the multiple measurement vector (MMV) problem for instances where the underlying sparsity pattern exhibits spatio-temporal structure, motivated by the electroencephalogram (EEG) source localization problem. We propose a probabilistic model that takes this structure into account by generalizing the structured spike and slab prior and the associated Expectation Propagation inference scheme. Based on numerical experiments, we demonstrate the viability of the model and the approximate inference scheme.

Submitted 19 August, 2015; originally announced August 2015.

Comments: 6 pages, 6 figures, accepted for presentation at SPARS 2015

arXiv:1405.6886 [pdf, other]

A Topic Model Approach to Multi-Modal Similarity

Authors: Rasmus Troelsgård, Bjørn Sand Jensen, Lars Kai Hansen

Abstract: Calculating similarities between objects defined by many heterogeneous data modalities is an important challenge in many multimedia applications. We use a multi-modal topic model as a basis for defining such a similarity between objects. We propose to compare the resulting similarities from different model realizations using the non-parametric Mantel test. The approach is evaluated on a music dataset.

Submitted 27 May, 2014; originally announced May 2014.

Comments: topic modelling workshop at NIPS 2013

arXiv:1403.2745 [pdf, other]

Privacy for Personal Neuroinformatics

Authors: Arkadiusz Stopczynski, Dazza Greenwood, Lars Kai Hansen, Alex Pentland

Abstract: Human brain activity collected in the form of Electroencephalography (EEG), even with a low number of sensors, is an extremely rich signal. Traces collected from multiple channels and with high sampling rates capture many important aspects of participants' brain activity and can be used as a unique personal identifier. The motivation for sharing EEG signals is significant, as a means to understand the relation between brain activity and well-being, or for communication with medical services. As the equipment for such data collection becomes more available and widely used, the opportunities for using the data are growing; at the same time, however, inherent privacy risks are mounting. The same raw EEG signal can be used, for example, to diagnose mental diseases, find traces of epilepsy, and decode personality traits. The current practice of the informed consent of the participants for the use of the data either prevents reuse of the raw signal or does not truly respect participants' right to privacy by reusing the same raw data for purposes much different than originally consented to. Here we propose an integration of a personal neuroinformatics system, Smartphone Brain Scanner, with a general privacy framework openPDS. We show how raw high-dimensionality data can be collected on a mobile device, uploaded to a server, and subsequently operated on and accessed by applications or researchers, without disclosing the raw signal. Those extracted features of the raw signal, called answers, are of significantly lower dimensionality, and provide the full utility of the data in a given context, without the risk of disclosing the sensitive raw signal. Such architecture significantly mitigates a very serious privacy risk related to raw EEG recordings floating around and being used and reused for various purposes.

Submitted 11 March, 2014; originally announced March 2014.

arXiv:1311.6976 [pdf, ps, other]

Dimensionality reduction for click-through rate prediction: Dense versus sparse representation

Authors: Bjarne Ørum Fruergaard, Toke Jansen Hansen, Lars Kai Hansen

Abstract: In online advertising, display ads are increasingly being placed based on real-time auctions where the advertiser who wins gets to serve the ad. This is called real-time bidding (RTB). In RTB, auctions have very tight time constraints on the order of 100ms. Therefore mechanisms for bidding intelligently such as clickthrough rate prediction need to be sufficiently fast. In this work, we propose to use dimensionality reduction of the user-website interaction graph in order to produce simplified features of users and websites that can be used as predictors of clickthrough rate. We demonstrate that the Infinite Relational Model (IRM) as a dimensionality reduction offers comparable predictive performance to conventional dimensionality reduction schemes, while achieving the most economical usage of features and fastest computations at run-time. For applications such as real-time bidding, where fast database I/O and few computations are key to success, we thus recommend using IRM based features as predictors to exploit the recommender effects from bipartite graphs.

Submitted 13 May, 2014; v1 submitted 27 November, 2013; originally announced November 2013.

Comments: Presented at the Probabilistic Models for Big Data workshop at NIPS 2013

arXiv:1310.5089 [pdf, other]

doi 10.1109/MSP.2013.2250591

Kernel Multivariate Analysis Framework for Supervised Subspace Learning: A Tutorial on Linear and Kernel Multivariate Methods

Authors: Jerónimo Arenas-García, Kaare Brandt Petersen, Gustavo Camps-Valls, Lars Kai Hansen

Abstract: Feature extraction and dimensionality reduction are important tasks in many fields of science dealing with signal processing and analysis. The relevance of these techniques is increasing as current sensory devices are developed with ever higher resolution, and problems involving multimodal data sources become more common. A plethora of feature extraction methods are available in the literature collectively grouped under the field of Multivariate Analysis (MVA). This paper provides a uniform treatment of several methods: Principal Component Analysis (PCA), Partial Least Squares (PLS), Canonical Correlation Analysis (CCA) and Orthonormalized PLS (OPLS), as well as their non-linear extensions derived by means of the theory of reproducing kernel Hilbert spaces. We also review their connections to other methods for classification and statistical dependence estimation, and introduce some recent developments to deal with the extreme cases of large-scale and low-sized problems. To illustrate the wide applicability of these methods in both classification and regression problems, we analyze their performance in a benchmark of publicly available data sets, and pay special attention to specific real applications involving audio processing for music genre prediction and hyperspectral satellite images for Earth and climate monitoring.

Submitted 18 October, 2013; originally announced October 2013.

Journal ref: IEEE Signal Processing Magazine, 30(4), 16-29, 2013

arXiv:1304.0357 [pdf, other]

doi 10.1371/journal.pone.0086733

The Smartphone Brain Scanner: A Mobile Real-time Neuroimaging System

Authors: Arkadiusz Stopczynski, Carsten Stahlhut, Jakob Eg Larsen, Michael Kai Petersen, Lars Kai Hansen

Abstract: Combining low cost wireless EEG sensors with smartphones offers novel opportunities for mobile brain imaging in an everyday context. We present a framework for building multi-platform, portable EEG applications with real-time 3D source reconstruction. The system - Smartphone Brain Scanner - combines an off-the-shelf neuroheadset or EEG cap with a smartphone or tablet, and as such represents the first fully mobile system for real-time 3D EEG imaging. We discuss the benefits and challenges of a fully portable system, including technical limitations as well as real-time reconstruction of 3D images of brain activity. We present examples of the brain activity captured in a simple experiment involving imagined finger tapping, showing that the acquired signal in a relevant brain region is similar to that obtained with standard EEG lab equipment. Although the quality of the signal in a mobile solution using a off-the-shelf consumer neuroheadset is lower compared to that obtained using high density standard EEG equipment, we propose that mobile application development may offset the disadvantages and provide completely new opportunities for neuroimaging in natural settings.

Submitted 1 April, 2013; originally announced April 2013.

arXiv:1303.3229 [pdf, other]

doi 10.1016/j.ijmedinf.2013.01.005

FindZebra: A search engine for rare diseases

Authors: Radu Dragusin, Paula Petcu, Christina Lioma, Birger Larsen, Henrik L. Jørgensen, Ingemar J. Cox, Lars Kai Hansen, Peter Ingwersen, Ole Winther

Abstract: Background: The web has become a primary information resource about illnesses and treatments for both medical and non-medical users. Standard web search is by far the most common interface for such information. It is therefore of interest to find out how well web search engines work for diagnostic queries and what factors contribute to successes and failures. Among diseases, rare (or orphan) diseases represent an especially challenging and thus interesting class to diagnose as each is rare, diverse in symptoms and usually has scattered resources associated with it. Methods: We use an evaluation approach for web search engines for rare disease diagnosis which includes 56 real life diagnostic cases, state-of-the-art evaluation measures, and curated information resources. In addition, we introduce FindZebra, a specialized (vertical) rare disease search engine. FindZebra is powered by open source search technology and uses curated freely available online medical information. Results: FindZebra outperforms Google Search in both default setup and customised to the resources used by FindZebra. We extend FindZebra with specialized functionalities exploiting medical ontological information and UMLS medical concepts to demonstrate different ways of displaying the retrieved results to medical experts. Conclusions: Our results indicate that a specialized search engine can improve the diagnostic quality without compromising the ease of use of the currently widely popular web search engines. The proposed evaluation approach can be valuable for future development and benchmarking. The FindZebra search engine is available at http://www.findzebra.com/.

Submitted 13 March, 2013; originally announced March 2013.

Journal ref: International Journal of Medical Informatics, Available online 23 February 2013, ISSN 1386-5056

arXiv:1101.5097 [pdf, ps, other]

Infinite Multiple Membership Relational Modeling for Complex Networks

Authors: Morten Mørup, Mikkel N. Schmidt, Lars Kai Hansen

Abstract: Learning latent structure in complex networks has become an important problem fueled by many types of networked data originating from practically all fields of science. In this paper, we propose a new non-parametric Bayesian multiple-membership latent feature model for networks. Contrary to existing multiple-membership models that scale quadratically in the number of vertices the proposed model scales linearly in the number of links admitting multiple-membership analysis in large scale networks. We demonstrate a connection between the single membership relational model and multiple membership models and show on "real" size benchmark network data that accounting for multiple memberships improves the learning of latent structure as measured by link prediction while explicitly accounting for multiple membership result in a more compact representation of the latent structure of networks.

Submitted 26 January, 2011; originally announced January 2011.

Comments: 8 pages, 4 figures

arXiv:1101.0510 [pdf, ps, other]

Good Friends, Bad News - Affect and Virality in Twitter

Authors: Lars Kai Hansen, Adam Arvidsson, Finn Årup Nielsen, Elanor Colleoni, Michael Etter

Abstract: The link between affect, defined as the capacity for sentimental arousal on the part of a message, and virality, defined as the probability that it be sent along, is of significant theoretical and practical importance, e.g. for viral marketing. A quantitative study of emailing of articles from the NY Times finds a strong link between positive affect and virality, and, based on psychological theories it is concluded that this relation is universally valid. The conclusion appears to be in contrast with classic theory of diffusion in news media emphasizing negative affect as promoting propagation. In this paper we explore the apparent paradox in a quantitative analysis of information diffusion on Twitter. Twitter is interesting in this context as it has been shown to present both the characteristics social and news media. The basic measure of virality in Twitter is the probability of retweet. Twitter is different from email in that retweeting does not depend on pre-existing social relations, but often occur among strangers, thus in this respect Twitter may be more similar to traditional news media. We therefore hypothesize that negative news content is more likely to be retweeted, while for non-news tweets positive sentiments support virality. To test the hypothesis we analyze three corpora: A complete sample of tweets about the COP15 climate summit, a random sample of tweets, and a general text corpus including news. The latter allows us to train a classifier that can distinguish tweets that carry news and non-news information. We present evidence that negative sentiment enhances virality in the news segment, but not in the non-news segment. We conclude that the relation between affect and virality is more complex than expected based on the findings of Berger and Milkman (2010), in short 'if you want to be cited: Sweet talk your friends or serve bad news to the public'.

Submitted 3 January, 2011; originally announced January 2011.

Comments: 14 pages, 1 table. Submitted to The 2011 International Workshop on Social Computing, Network, and Services (SocialComNet 2011)

MSC Class: 1D30 ACM Class: H.4.3; J.4

arXiv:1008.1398 [pdf, ps, other]

Semi-Supervised Kernel PCA

Authors: Christian Walder, Ricardo Henao, Morten Mørup, Lars Kai Hansen

Abstract: We present three generalisations of Kernel Principal Components Analysis (KPCA) which incorporate knowledge of the class labels of a subset of the data points. The first, MV-KPCA, penalises within class variances similar to Fisher discriminant analysis. The second, LSKPCA is a hybrid of least squares regression and kernel PCA. The final LR-KPCA is an iteratively reweighted version of the previous which achieves a sigmoid loss function on the labeled points. We provide a theoretical risk bound as well as illustrative experiments on real and toy data sets.

Submitted 8 August, 2010; originally announced August 2010.

arXiv:0903.0687 [pdf, ps, other]

doi 10.1007/978-3-319-54241-6_1

Second-Order Assortative Mixing in Social Networks

Authors: Shi Zhou, Ingemar J. Cox, Lars K. Hansen

Abstract: In a social network, the number of links of a node, or node degree, is often assumed as a proxy for the node's importance or prominence within the network. It is known that social networks exhibit the (first-order) assortative mixing, i.e. if two nodes are connected, they tend to have similar node degrees, suggesting that people tend to mix with those of comparable prominence. In this paper, we report the second-order assortative mixing in social networks. If two nodes are connected, we measure the degree correlation between their most prominent neighbours, rather than between the two nodes themselves. We observe very strong second-order assortative mixing in social networks, often significantly stronger than the first-order assortative mixing. This suggests that if two people interact in a social network, then the importance of the most prominent person each knows is very likely to be the same. This is also true if we measure the average prominence of neighbours of the two people. This property is weaker or negative in non-social networks. We investigate a number of possible explanations for this property. However, none of them was found to provide an adequate explanation. We therefore conclude that second-order assortative mixing is a new property of social networks.

Submitted 23 October, 2017; v1 submitted 3 March, 2009; originally announced March 2009.

Comments: Cite as: Zhou S., Cox I.J., Hansen L.K. (2017) Second-Order Assortative Mixing in Social Networks. In: Goncalves B., Menezes R., Sinatra R., Zlatic V. (eds) Complex Networks VIII. CompleNet 2017. Springer Proceedings in Complexity. Springer, Cham. https://doi.org/10.1007/978-3-319-54241-6_1

arXiv:0710.4867 [pdf, ps, other]

doi 10.1103/PhysRevE.78.016108

Bi-clique Communities

Authors: Sune Lehmann, Martin Schwartz, Lars Kai Hansen

Abstract: We present a novel method for detecting communities in bipartite networks. Based on an extension of the $k$-clique community detection algorithm, we demonstrate how modular structure in bipartite networks presents itself as overlapping bicliques. If bipartite information is available, the bi-clique community detection algorithm retains all of the advantages of the $k$-clique algorithm, but avoids discarding important structural information when performing a one-mode projection of the network. Further, the bi-clique community detection algorithm provides a new level of flexibility by incorporating independent clique thresholds for each of the non-overlapping node sets in the bipartite network.

Submitted 7 July, 2008; v1 submitted 25 October, 2007; originally announced October 2007.

Comments: 10 pages, 6 figures

Journal ref: Phys. Rev. E, v78, p016108 (2008)

arXiv:physics/0701348 [pdf, ps, other]

doi 10.1140/epjb/e2007-00313-2

Deterministic Modularity Optimization

Authors: Sune Lehmann, Lars Kai Hansen

Abstract: We study community structure of networks. We have developed a scheme for maximizing the modularity Q based on mean field methods. Further, we have defined a simple family of random networks with community structure; we understand the behavior of these networks analytically. Using these networks, we show how the mean field methods display better performance than previously known deterministic methods for optimization of Q.

Submitted 1 March, 2007; v1 submitted 31 January, 2007; originally announced January 2007.

Comments: 7 pages, 4 figures, minor changes

Showing 1–47 of 47 results for author: Hansen, L K