-
Particle Denoising Diffusion Sampler
Authors:
Angus Phillips,
Hai-Dang Dau,
Michael John Hutchinson,
Valentin De Bortoli,
George Deligiannidis,
Arnaud Doucet
Abstract:
Denoising diffusion models have become ubiquitous for generative modeling. The core idea is to transport the data distribution to a Gaussian by using a diffusion. Approximate samples from the data distribution are then obtained by estimating the time-reversal of this diffusion using score matching ideas. We follow here a similar strategy to sample from unnormalized probability densities and compute their normalizing constants. However, the time-reversed diffusion is here simulated by using an original iterative particle scheme relying on a novel score matching loss. Contrary to standard denoising diffusion models, the resulting Particle Denoising Diffusion Sampler (PDDS) provides asymptotically consistent estimates under mild assumptions. We demonstrate PDDS on multimodal and high dimensional sampling tasks.
Submitted 15 June, 2024; v1 submitted 9 February, 2024;
originally announced February 2024.
-
SleepNet: Attention-Enhanced Robust Sleep Prediction using Dynamic Social Networks
Authors:
Maryam Khalid,
Elizabeth B. Klerman,
Andrew W. Mchill,
Andrew J. K. Phillips,
Akane Sano
Abstract:
Sleep behavior significantly impacts health and acts as an indicator of physical and mental well-being. Monitoring and predicting sleep behavior with ubiquitous sensors may therefore assist in both sleep management and tracking of related health conditions. While sleep behavior depends on, and is reflected in the physiology of a person, it is also impacted by external factors such as digital media usage, social network contagion, and the surrounding weather. In this work, we propose SleepNet, a system that exploits social contagion in sleep behavior through graph networks and integrates it with physiological and phone data extracted from ubiquitous mobile and wearable devices for predicting next-day sleep labels about sleep duration. Our architecture overcomes the limitations of large-scale graphs containing connections irrelevant to sleep behavior by devising an attention mechanism. The extensive experimental evaluation highlights the improvement provided by incorporating social networks in the model. Additionally, we conduct robustness analysis to demonstrate the system's performance in real-life conditions. The outcomes affirm the stability of SleepNet against perturbations in input data. Further analyses emphasize the significance of network topology in prediction performance revealing that users with higher eigenvalue centrality are more vulnerable to data perturbations.
Submitted 26 January, 2024; v1 submitted 19 January, 2024;
originally announced January 2024.
-
Which algorithm to select in sports timetabling?
Authors:
David Van Bulck,
Dries Goossens,
Jan-Patrick Clarner,
Angelos Dimitsas,
George H. G. Fonseca,
Carlos Lamas-Fernandez,
Martin Mariusz Lester,
Jaap Pedersen,
Antony E. Phillips,
Roberto Maria Rosati
Abstract:
Any sports competition needs a timetable, specifying when and where teams meet each other. The recent International Timetabling Competition (ITC2021) on sports timetabling showed that, although it is possible to develop general algorithms, the performance of each algorithm varies considerably over the problem instances. This paper provides an instance space analysis for sports timetabling, resulting in powerful insights into the strengths and weaknesses of eight state-of-the-art algorithms. Based on machine learning techniques, we propose an algorithm selection system that predicts which algorithm is likely to perform best when given the characteristics of a sports timetabling problem instance. Furthermore, we identify which characteristics are important in making that prediction, providing insights in the performance of the algorithms, and suggestions to further improve them. Finally, we assess the empirical hardness of the instances. Our results are based on large computational experiments involving about 50 years of CPU time on more than 500 newly generated problem instances.
Submitted 4 September, 2023;
originally announced September 2023.
-
Bayesian Decision Trees Inspired from Evolutionary Algorithms
Authors:
Efthyvoulos Drousiotis,
Alexander M. Phillips,
Paul G. Spirakis,
Simon Maskell
Abstract:
Bayesian Decision Trees (DTs) are generally considered a more advanced and accurate model than a regular Decision Tree (DT) because they can handle complex and uncertain data. Existing work on Bayesian DTs uses Markov Chain Monte Carlo (MCMC) with an accept-reject mechanism and samples using naive proposals to proceed to the next iteration, which can be slow because of the burn-in time needed. We can reduce the burn-in period by proposing a more sophisticated way of sampling or by designing a different numerical Bayesian approach. In this paper, we propose replacing MCMC with an inherently parallel algorithm, Sequential Monte Carlo (SMC), and a more effective sampling strategy inspired by Evolutionary Algorithms (EA). Experiments show that SMC combined with the EA can produce more accurate results than MCMC in 100 times fewer iterations.
Submitted 30 May, 2023;
originally announced May 2023.
-
Unlocking the potential of two-point cells for energy-efficient and resilient training of deep nets
Authors:
Ahsan Adeel,
Adewale Adetomi,
Khubaib Ahmed,
Amir Hussain,
Tughrul Arslan,
W. A. Phillips
Abstract:
Context-sensitive two-point layer 5 pyramidal cells (L5PCs) were discovered as long ago as 1999. However, the potential of this discovery to provide useful neural computation has yet to be demonstrated. Here we show for the first time how a transformative L5PCs-driven deep neural network (DNN), termed the multisensory cooperative computing (MCC) architecture, can effectively process large amounts of heterogeneous real-world audio-visual (AV) data, using far less energy compared to the best available 'point' neuron-driven DNNs. A novel highly-distributed parallel implementation on a Xilinx UltraScale+ MPSoC device estimates energy savings up to $245759 \times 50000$ $μ$J (i.e., 62% less than the baseline model in a semi-supervised learning setup) where a single synapse consumes $8 \times 10^{-5}$ $μ$J. In a supervised learning setup, the energy-saving can potentially reach up to 1250x less (per feedforward transmission) than the baseline model. The significantly reduced neural activity in MCC leads to inherently fast learning and resilience against sudden neural damage. This remarkable performance in pilot experiments demonstrates the embodied neuromorphic intelligence of our proposed cooperative L5PC that receives input from diverse neighbouring neurons as context to amplify the transmission of most salient and relevant information for onward transmission, from overwhelmingly large multimodal information utilised at the early stages of on-chip training. Our proposed approach opens new cross-disciplinary avenues for future on-chip DNN training implementations and posits a radical shift in current neuromorphic computing paradigms.
Submitted 22 December, 2022; v1 submitted 24 October, 2022;
originally announced November 2022.
-
Spectral Diffusion Processes
Authors:
Angus Phillips,
Thomas Seror,
Michael Hutchinson,
Valentin De Bortoli,
Arnaud Doucet,
Emile Mathieu
Abstract:
Score-based generative modelling (SGM) has proven to be a very effective method for modelling densities on finite-dimensional spaces. In this work we propose to extend this methodology to learn generative models over functional spaces. To do so, we represent functional data in spectral space to dissociate the stochastic part of the processes from their space-time part. Using dimensionality reduction techniques we then sample from their stochastic component using finite dimensional SGM. We demonstrate our method's effectiveness for modelling various multimodal datasets.
Submitted 28 November, 2022; v1 submitted 28 September, 2022;
originally announced September 2022.
-
A comparison of partial information decompositions using data from real and simulated layer 5b pyramidal cells
Authors:
Jim W. Kay,
Jan M. Schulz,
W. A. Phillips
Abstract:
Partial information decomposition allows the joint mutual information between an output and a set of inputs to be divided into components that are synergistic or shared or unique to each input. We consider five different decompositions and compare their results on data from layer 5b pyramidal cells in two different studies. The first study was of the amplification of somatic action potential output by apical dendritic input and its regulation by dendritic inhibition. We find that two of the decompositions produce much larger estimates of synergy and shared information than the others, as well as large levels of unique misinformation. When within-neuron differences in the components are examined, the five methods produce more similar results for all but the shared information component, for which two methods produce a different statistical conclusion from the others. There are some differences in the expression of unique information asymmetry among the methods. It is significantly larger, on average, under dendritic inhibition. Three of the methods support a previous conclusion that apical amplification is reduced by dendritic inhibition. The second study used a detailed compartmental model to produce action potentials for many combinations of the numbers of basal and apical synaptic inputs. Two analyses of decompositions are conducted on subsets of the data. In the first, the decompositions reveal a bifurcation in unique information asymmetry. For three of the methods this suggests that apical drive switches to basal drive as the strength of the basal input increases, while the other two show changing mixtures of information and misinformation. Decompositions produced using the second set of subsets show that all five decompositions provide support for properties of cooperative context-sensitivity - to varying extents.
Submitted 13 June, 2022;
originally announced June 2022.
-
Sex and Gender in the Computer Graphics Research Literature
Authors:
Ana Dodik,
Silvia Sellán,
Theodore Kim,
Amanda Phillips
Abstract:
We survey the treatment of sex and gender in the Computer Graphics research literature from an algorithmic fairness perspective. The established practices on the use of gender and sex in our community are scientifically incorrect and constitute a form of algorithmic bias with potential harmful effects. We propose ways of addressing these as technical limitations.
Submitted 1 June, 2022;
originally announced June 2022.
-
Connected Components for Infinite Graph Streams: Theory and Practice
Authors:
Jonathan W. Berry,
Cynthia A Phillips,
Alexandra M. Porter
Abstract:
Motivated by the properties of unending real-world cybersecurity streams, we present a new graph streaming model: XStream. We maintain a streaming graph and its connected components at single-edge granularity. In cybersecurity graph applications, input streams typically consist of edge insertions; individual deletions are not explicit. Analysts maintain as much history as possible and will trigger customized bulk deletions when necessary. Despite a variety of dynamic graph processing systems and some canonical literature on theoretical sliding-window graph streaming, XStream is the first model explicitly designed to accommodate this usage model. Users can provide Boolean predicates to define bulk deletions. Edge arrivals are expected to occur continuously and must always be handled. XStream is implemented via a ring of finite-memory processors. We give algorithms to maintain connected components on the input stream, answer queries about connectivity, and perform bulk deletion. The system requires bandwidth for internal messages that is some constant factor greater than the stream arrival rate. We prove a relationship among four quantities: the proportion of query downtime allowed, the proportion of edges that survive an aging event, the proportion of duplicated edges, and the bandwidth expansion factor. In addition to presenting the theory behind XStream, we present computational results for a single-threaded prototype implementation. Stream ingestion rates are bounded by computer architecture. We determine this bound for XStream inter-process message-passing rates in Intel TBB applications on Intel Sky Lake processors: between one and five million graph edges per second. Our single-threaded prototype runs our full protocols through multiple aging events at between one half and one million edges per second, and we give ideas for speeding this up by orders of magnitude.
Submitted 30 November, 2021;
originally announced December 2021.
-
Adaptive Transfer Learning: a simple but effective transfer learning
Authors:
Jung H Lee,
Henry J Kvinge,
Scott Howland,
Zachary New,
John Buckheit,
Lauren A. Phillips,
Elliott Skomski,
Jessica Hibler,
Courtney D. Corley,
Nathan O. Hodas
Abstract:
Transfer learning (TL) leverages previously obtained knowledge to learn new tasks efficiently and has been used to train deep learning (DL) models with a limited amount of data. When TL is applied to DL, pretrained (teacher) models are fine-tuned to build domain-specific (student) models. This fine-tuning relies on the fact that a DL model can be decomposed into classifiers and feature extractors, and a line of studies showed that the same feature extractors can be used to train classifiers on multiple tasks. Furthermore, recent studies proposed multiple algorithms that can fine-tune teacher models' feature extractors to train student models more efficiently. We note that regardless of the fine-tuning of feature extractors, the classifiers of student models are trained with the final outputs of feature extractors (i.e., the outputs of penultimate layers). However, a recent study suggested that feature maps in ResNets across layers could be functionally equivalent, raising the possibility that feature maps inside the feature extractors can also be used to train student models' classifiers. Inspired by this study, we tested whether feature maps in the hidden layers of the teacher models can be used to improve the student models' accuracy (i.e., TL's efficiency). Specifically, we developed 'adaptive transfer learning (ATL)', which can choose an optimal set of feature maps for TL, and tested it in the few-shot learning setting. Our empirical evaluations suggest that ATL can help DL models learn more efficiently, especially when available examples are limited.
Submitted 21 November, 2021;
originally announced November 2021.
-
Gaps, Ambiguity, and Establishing Complexity-Class Containments via Iterative Constant-Setting
Authors:
Lane A. Hemaspaandra,
Mandar Juvekar,
Arian Nadjimzadah,
Patrick A. Phillips
Abstract:
Cai and Hemachandra used iterative constant-setting to prove that Few $\subseteq$ $\oplus$P (and thus that FewP $\subseteq$ $\oplus$P). In this paper, we note that there is a tension between the nondeterministic ambiguity of the class one is seeking to capture, and the density (or, to be more precise, the needed "nongappy"-ness) of the easy-to-find "targets" used in iterative constant-setting. In particular, we show that even less restrictive gap-size upper bounds regarding the targets allow one to capture ambiguity-limited classes. Through a flexible, metatheorem-based approach, we do so for a wide range of classes including the logarithmic-ambiguity version of Valiant's unambiguous nondeterminism class UP. Our work lowers the bar for what advances regarding the existence of infinite, P-printable sets of primes would suffice to show that restricted counting classes based on the primes have the power to accept superconstant-ambiguity analogues of UP. As an application of our work, we prove that the Lenstra-Pomerance-Wagstaff Conjecture implies that all $(O(1) + \log\log n)$-ambiguity NP sets are in the restricted counting class $\rm RC_{PRIMES}$.
Submitted 9 February, 2024; v1 submitted 29 September, 2021;
originally announced September 2021.
-
One Representation to Rule Them All: Identifying Out-of-Support Examples in Few-shot Learning with Generic Representations
Authors:
Henry Kvinge,
Scott Howland,
Nico Courts,
Lauren A. Phillips,
John Buckheit,
Zachary New,
Elliott Skomski,
Jung H. Lee,
Sandeep Tiwari,
Jessica Hibler,
Courtney D. Corley,
Nathan O. Hodas
Abstract:
The field of few-shot learning has made remarkable strides in developing powerful models that can operate in the small data regime. Nearly all of these methods assume every unlabeled instance encountered will belong to a handful of known classes for which one has examples. This can be problematic for real-world use cases where one routinely finds 'none-of-the-above' examples. In this paper we describe this challenge of identifying what we term 'out-of-support' (OOS) examples. We describe how this problem is subtly different from out-of-distribution detection and describe a new method of identifying OOS examples within the Prototypical Networks framework using a fixed point which we call the generic representation. We show that our method outperforms other existing approaches in the literature as well as other approaches that we propose in this paper. Finally, we investigate how the use of such a generic point affects the geometry of a model's feature space.
Submitted 2 June, 2021;
originally announced June 2021.
-
Controller Synthesis for Multi-Agent Systems with Intermittent Communication and Metric Temporal Logic Specifications
Authors:
Zhe Xu,
Federico M. Zegers,
Bo Wu,
Alexander J. Phillips,
Warren Dixon,
Ufuk Topcu
Abstract:
This paper investigates the controller synthesis problem for a multi-agent system (MAS) with intermittent communication. We adopt a relay-explorer scheme, where a mobile relay agent with absolute position sensors switches among a set of explorers with relative position sensors to provide intermittent state information. We model the MAS as a switched system where the explorers' dynamics can be either fully-actuated or underactuated. The objective of the explorers is to reach approximate consensus to a predetermined goal region. To guarantee the stability of the switched system and the approximate consensus of the explorers, we derive maximum dwell-time conditions to constrain the length of time each explorer goes without state feedback (from the relay agent). Furthermore, the relay agent needs to satisfy practical constraints such as charging its battery and staying in specific regions of interest. Both the maximum dwell-time conditions and these practical constraints can be expressed by metric temporal logic (MTL) specifications. We iteratively compute the optimal control inputs for the relay agent to satisfy the MTL specifications, while guaranteeing stability and approximate consensus of the explorers. We implement the proposed method on a case study with the CoppeliaSim robot simulator.
Submitted 5 February, 2021;
originally announced April 2021.
-
Fuzzy Simplicial Networks: A Topology-Inspired Model to Improve Task Generalization in Few-shot Learning
Authors:
Henry Kvinge,
Zachary New,
Nico Courts,
Jung H. Lee,
Lauren A. Phillips,
Courtney D. Corley,
Aaron Tuor,
Andrew Avila,
Nathan O. Hodas
Abstract:
Deep learning has shown great success in settings with massive amounts of data but has struggled when data is limited. Few-shot learning algorithms, which seek to address this limitation, are designed to generalize well to new tasks with limited data. Typically, models are evaluated on unseen classes and datasets that are defined by the same fundamental task as they are trained for (e.g. category membership). One can also ask how well a model can generalize to fundamentally different tasks within a fixed dataset (for example: moving from category membership to tasks that involve detecting object orientation or quantity). To formalize this kind of shift we define a notion of "independence of tasks" and identify three new sets of labels for established computer vision datasets that test a model's ability to generalize to tasks which draw on orthogonal attributes in the data. We use these datasets to investigate the failure modes of metric-based few-shot models. Based on our findings, we introduce a new few-shot model called Fuzzy Simplicial Networks (FSN) which leverages a construction from topology to more flexibly represent each class from limited data. In particular, FSN models can not only form multiple representations for a given class but can also begin to capture the low-dimensional structure which characterizes class manifolds in the encoded space of deep networks. We show that FSN outperforms state-of-the-art models on the challenging tasks we introduce in this paper while remaining competitive on standard few-shot benchmarks.
Submitted 23 September, 2020;
originally announced September 2020.
-
CheXphoto: 10,000+ Photos and Transformations of Chest X-rays for Benchmarking Deep Learning Robustness
Authors:
Nick A. Phillips,
Pranav Rajpurkar,
Mark Sabini,
Rayan Krishnan,
Sharon Zhou,
Anuj Pareek,
Nguyet Minh Phu,
Chris Wang,
Mudit Jain,
Nguyen Duong Du,
Steven QH Truong,
Andrew Y. Ng,
Matthew P. Lungren
Abstract:
Clinical deployment of deep learning algorithms for chest x-ray interpretation requires a solution that can integrate into the vast spectrum of clinical workflows across the world. An appealing approach to scaled deployment is to leverage the ubiquity of smartphones by capturing photos of x-rays to share with clinicians using messaging services like WhatsApp. However, the application of chest x-ray algorithms to photos of chest x-rays requires reliable classification in the presence of artifacts not typically encountered in digital x-rays used to train machine learning models. We introduce CheXphoto, a dataset of smartphone photos and synthetic photographic transformations of chest x-rays sampled from the CheXpert dataset. To generate CheXphoto we (1) automatically and manually captured photos of digital x-rays under different settings, and (2) generated synthetic transformations of digital x-rays targeted to make them look like photos of digital x-rays and x-ray films. We release this dataset as a resource for testing and improving the robustness of deep learning algorithms for automated chest x-ray interpretation on smartphone photos of chest x-rays.
Submitted 11 December, 2020; v1 submitted 13 July, 2020;
originally announced July 2020.
-
Constant-Depth and Subcubic-Size Threshold Circuits for Matrix Multiplication
Authors:
Ojas Parekh,
Cynthia A. Phillips,
Conrad D. James,
James B. Aimone
Abstract:
Boolean circuits of McCulloch-Pitts threshold gates are a classic model of neural computation studied heavily in the late 20th century as a model of general computation. Recent advances in large-scale neural computing hardware have made their practical implementation a near-term possibility. We describe a theoretical approach for multiplying two $N$ by $N$ matrices that integrates threshold gate logic with conventional fast matrix multiplication algorithms that perform $O(N^ω)$ arithmetic operations for a positive constant $ω< 3$. Our approach converts such a fast matrix multiplication algorithm into a constant-depth threshold circuit with approximately $O(N^ω)$ gates. Prior to our work, it was not known whether the $Θ(N^3)$-gate barrier for matrix multiplication was surmountable by constant-depth threshold circuits.
Dense matrix multiplication is a core operation in convolutional neural network training. Performing this work on a neural architecture instead of off-loading it to a GPU may be an appealing option.
Submitted 25 June, 2020;
originally announced June 2020.
-
Fractional Decomposition Tree Algorithm: A tool for studying the integrality gap of Integer Programs
Authors:
Robert D. Carr,
Arash Haddadan,
Cynthia A. Phillips
Abstract:
We present a new algorithm, Fractional Decomposition Tree (FDT) for finding a feasible solution for an integer program (IP) where all variables are binary. FDT runs in polynomial time and is guaranteed to find a feasible integer solution provided the integrality gap is bounded. The algorithm gives a construction for Carr and Vempala's theorem that any feasible solution to the IP's linear-programming relaxation, when scaled by the instance integrality gap, dominates a convex combination of feasible solutions. FDT is also a tool for studying the integrality gap of IP formulations. We demonstrate that with experiments studying the integrality gap of two problems: optimally augmenting a tree to a 2-edge-connected graph and finding a minimum-cost 2-edge-connected multi-subgraph (2EC). We also give a simplified algorithm, Dom2IP, that more quickly determines if an instance has an unbounded integrality gap. We show that FDT's speed and approximation quality compare well to that of feasibility pump on moderate-sized instances of the vertex cover problem. For a particular set of hard-to-decompose fractional 2EC solutions, FDT always gave a better integer solution than the best previous approximation algorithm (Christofides).
Submitted 11 August, 2020; v1 submitted 12 June, 2020;
originally announced June 2020.
-
LAReQA: Language-agnostic answer retrieval from a multilingual pool
Authors:
Uma Roy,
Noah Constant,
Rami Al-Rfou,
Aditya Barua,
Aaron Phillips,
Yinfei Yang
Abstract:
We present LAReQA, a challenging new benchmark for language-agnostic answer retrieval from a multilingual candidate pool. Unlike previous cross-lingual tasks, LAReQA tests for "strong" cross-lingual alignment, requiring semantically related cross-language pairs to be closer in representation space than unrelated same-language pairs. Building on multilingual BERT (mBERT), we study different strategies for achieving strong alignment. We find that augmenting training data via machine translation is effective, and improves significantly over using mBERT out-of-the-box. Interestingly, the embedding baseline that performs the best on LAReQA falls short of competing baselines on zero-shot variants of our task that only target "weak" alignment. This finding underscores our claim that language-agnostic retrieval is a substantively new kind of cross-lingual evaluation.
Submitted 11 April, 2020;
originally announced April 2020.
-
Probing a Set of Trajectories to Maximize Captured Information
Authors:
Sándor P. Fekete,
Alexander Hill,
Dominik Krupke,
Tyler Mayer,
Joseph S. B. Mitchell,
Ojas Parekh,
Cynthia A. Phillips
Abstract:
We study a trajectory analysis problem we call the Trajectory Capture Problem (TCP), in which, for a given input set ${\cal T}$ of trajectories in the plane, and an integer $k\geq 2$, we seek to compute a set of $k$ points (``portals'') to maximize the total weight of all subtrajectories of ${\cal T}$ between pairs of portals. This problem naturally arises in trajectory analysis and summarization.
We show that the TCP is NP-hard (even in very special cases) and give some first approximation results. Our main focus is on attacking the TCP with practical algorithm-engineering approaches, including integer linear programming (to solve instances to provable optimality) and local search methods. We study the integrality gap arising from such approaches. We analyze our methods on different classes of data, including benchmark instances that we generate. Our goal is to understand the best performing heuristics, based on both solution time and solution quality. We demonstrate that we are able to compute provably optimal solutions for real-world instances.
Submitted 7 April, 2020;
originally announced April 2020.
-
Efficient Amortised Bayesian Inference for Hierarchical and Nonlinear Dynamical Systems
Authors:
Geoffrey Roeder,
Paul K. Grant,
Andrew Phillips,
Neil Dalchau,
Edward Meeds
Abstract:
We introduce a flexible, scalable Bayesian inference framework for nonlinear dynamical systems characterised by distinct and hierarchical variability at the individual, group, and population levels. Our model class is a generalisation of nonlinear mixed-effects (NLME) dynamical systems, the statistical workhorse for many experimental sciences. We cast parameter inference as stochastic optimisation of an end-to-end differentiable, block-conditional variational autoencoder. We specify the dynamics of the data-generating process as an ordinary differential equation (ODE) such that both the ODE and its solver are fully differentiable. This model class is highly flexible: the ODE right-hand sides can be a mixture of user-prescribed or "white-box" sub-components and neural network or "black-box" sub-components. Using stochastic optimisation, our amortised inference algorithm could seamlessly scale up to massive data collection pipelines (common in labs with robotic automation). Finally, our framework supports interpretability with respect to the underlying dynamics, as well as predictive generalization to unseen combinations of group components (also called "zero-shot" learning). We empirically validate our method by predicting the dynamic behaviour of bacteria that were genetically engineered to function as biosensors. Our implementation of the framework, the dataset, and all code to reproduce the experimental results is available at https://www.github.com/Microsoft/vi-hds .
Submitted 1 October, 2019; v1 submitted 28 May, 2019;
originally announced May 2019.
-
The Online Event-Detection Problem
Authors:
Michael A. Bender,
Jonathan W. Berry,
Martin Farach-Colton,
Rob Johnson,
Thomas M. Kroeger,
Prashant Pandey,
Cynthia A. Phillips,
Shikha Singh
Abstract:
Given a stream $S = (s_1, s_2, ..., s_N)$, a $φ$-heavy hitter is an item $s_i$ that occurs at least $φN$ times in $S$. The problem of finding heavy-hitters has been extensively studied in the database literature. In this paper, we study a related problem. We say that there is a $φ$-event at time $t$ if $s_t$ occurs exactly $φN$ times in $(s_1, s_2, ..., s_t)$. Thus, for each $φ$-heavy hitter there is a single $φ$-event which occurs when its count reaches the reporting threshold $φN$. We define the online event-detection problem (OEDP) as: given $φ$ and a stream $S$, report all $φ$-events as soon as they occur.
Many real-world monitoring systems demand event detection where all events must be reported (no false negatives), in a timely manner, with no non-events reported (no false positives), and a low reporting threshold. As a result, the OEDP requires a large amount of space ($Ω(N)$ words) and is not solvable in the streaming model or via standard sampling-based approaches.
Since OEDP requires large space, we focus on cache-efficient algorithms in the external-memory model.
We provide algorithms for the OEDP that are within a log factor of optimal. Our algorithms are tunable: their parameters can be set to allow for a bounded number of false positives and a bounded delay in reporting. None of our relaxations allow false negatives since reporting all events is a strict requirement of our applications. Finally, we show improved results when the count of items in the input stream follows a power-law distribution.
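To make the definition of a $φ$-event concrete, the following is a minimal in-memory sketch, not the external-memory algorithms the abstract refers to. It keeps an exact count per item and reports an event the moment a count reaches the threshold. The function name `detect_events` and the choice to round the threshold up to `ceil(phi * N)` are assumptions made for illustration.

```python
import math
from collections import defaultdict


def detect_events(stream, phi):
    """Return (t, item) pairs, with t 1-indexed, for every phi-event.

    Illustrative baseline only: it stores one counter per distinct item,
    so it uses space linear in the number of distinct items.
    """
    n = len(stream)
    # Assumed rounding rule: an event fires when a count reaches ceil(phi * N).
    threshold = max(1, math.ceil(phi * n))
    counts = defaultdict(int)
    events = []
    for t, item in enumerate(stream, start=1):
        counts[item] += 1
        # Equality (not >=) guarantees exactly one report per heavy hitter.
        if counts[item] == threshold:
            events.append((t, item))
    return events


stream = ["a", "b", "a", "c", "a", "b", "a", "b"]
print(detect_events(stream, 0.375))  # threshold is 3 for N = 8
```

Because every event is reported at the exact time step its count hits the threshold, this baseline has no false positives, no false negatives, and no delay; the cost is the per-item counter, which is what motivates the cache-efficient designs described above.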
Submitted 23 December, 2018;
originally announced December 2018.
-
Improving the Modularity of AUV Control Systems using Behaviour Trees
Authors:
Christopher Iliffe Sprague,
Özer Özkahraman,
Andrea Munafo,
Rachel Marlow,
Alexander Phillips,
Petter Ögren
Abstract:
In this paper, we show how behaviour trees (BTs) can be used to design modular, versatile, and robust control architectures for mission-critical systems. In particular, we show this in the context of autonomous underwater vehicles (AUVs). Robustness, in terms of system safety, is important since manual recovery of AUVs is often extremely difficult. Furthermore, versatility is important to be able to execute many different kinds of missions. Finally, modularity is needed to achieve a combination of robustness and versatility, as the complexity of a versatile system needs to be encapsulated in modules, in order to create a simple overall structure enabling robustness analysis. The proposed design is illustrated using a typical AUV mission.
Submitted 1 November, 2018;
originally announced November 2018.
-
Fully Convolutional Network for Melanoma Diagnostics
Authors:
Adon Phillips,
Iris Teo,
Jochen Lang
Abstract:
This work seeks to determine how modern machine learning techniques may be applied to the previously unexplored topic of melanoma diagnostics using digital pathology. We curated a new dataset of 50 patient cases of cutaneous melanoma using digital pathology. We provide gold standard annotations for three tissue types (tumour, epidermis, and dermis) which are important for the prognostic measurements known as Breslow thickness and Clark level. Then, we devised a novel multi-stride fully convolutional network (FCN) architecture that outperformed other networks trained and evaluated using the same data according to standard metrics. Finally, we trained a model to detect and localize the target tissue types. When processing previously unseen cases, our model's output is qualitatively very similar to the gold standard. In addition to the standard metrics computed as a baseline for our approach, we asked three additional pathologists to measure the Breslow thickness on the network's output. Their responses were diagnostically equivalent to the ground truth measurements, and when removing cases where a measurement was not appropriate, inter-rater reliability (IRR) between the four pathologists was 75.0%. Given the qualitative and quantitative results, it is possible to overcome the discriminative challenges of the skin and tumour anatomy for segmentation using modern machine learning techniques, though more work is required to improve the network's performance on dermis segmentation. Further, we show that it is possible to achieve a level of accuracy required to manually perform the Breslow thickness measurement.
Submitted 12 June, 2018;
originally announced June 2018.
-
Contrasting information theoretic decompositions of modulatory and arithmetic interactions in neural information processing systems
Authors:
Jim W. Kay,
William A. Phillips
Abstract:
Biological and artificial neural systems are composed of many local processors, and their capabilities depend upon the transfer function that relates each local processor's outputs to its inputs. This paper uses a recent advance in the foundations of information theory to study the properties of local processors that use contextual input to amplify or attenuate transmission of information about their driving inputs. This advance enables the information transmitted by processors with two distinct inputs to be decomposed into those components unique to each input, that shared between the two inputs, and that which depends on both though it is in neither, i.e. synergy. The decompositions that we report here show that contextual modulation has information processing properties that contrast with those of all four simple arithmetic operators, that it can take various forms, and that the form used in our previous studies of artificial neural nets composed of local processors with both driving and contextual inputs is particularly well-suited to provide the distinctive capabilities of contextual modulation under a wide range of conditions. We argue that the decompositions reported here could be compared with those obtained from empirical neurobiological and psychophysical data under conditions thought to reflect contextual modulation. That would then shed new light on the underlying processes involved. Finally, we suggest that such decompositions could aid the design of context-sensitive machine learning algorithms.
Submitted 15 March, 2018;
originally announced March 2018.
-
Geometric Hitting Set for Segments of Few Orientations
Authors:
Sándor P. Fekete,
Kan Huang,
Joseph S. B. Mitchell,
Ojas Parekh,
Cynthia A. Phillips
Abstract:
We study several natural instances of the geometric hitting set problem for input consisting of sets of line segments (and rays, lines) having a small number of distinct slopes. These problems model path monitoring (e.g., on road networks) using the fewest sensors (the "hitting points"). We give approximation algorithms for cases including (i) lines of 3 slopes in the plane, (ii) vertical lines and horizontal segments, (iii) pairs of horizontal/vertical segments. We give hardness and hardness of approximation results for these problems. We prove that the hitting set problem for vertical lines and horizontal rays is polynomially solvable.
Submitted 17 December, 2016; v1 submitted 19 March, 2016;
originally announced March 2016.
-
Partial Information Decomposition as a Unified Approach to the Specification of Neural Goal Functions
Authors:
Michael Wibral,
Viola Priesemann,
Jim W. Kay,
Joseph T. Lizier,
William A. Phillips
Abstract:
In many neural systems anatomical motifs are present repeatedly, but despite their structural similarity they can serve very different tasks. A prime example of such a motif is the canonical microcircuit of six-layered neo-cortex, which is repeated across cortical areas, and is involved in a number of different tasks (e.g. sensory, cognitive, or motor tasks). This observation has spawned interest in finding a common underlying principle, a 'goal function', of information processing implemented in this structure. By definition such a goal function, if universal, cannot be cast in processing-domain specific language (e.g. 'edge filtering', 'working memory'). Thus, to formulate such a principle, we have to use a domain-independent framework. Information theory offers such a framework. However, while the classical framework of information theory focuses on the relation between one input and one output (Shannon's mutual information), we argue that neural information processing crucially depends on the combination of \textit{multiple} inputs to create the output of a processor. To account for this, we use a very recent extension of Shannon information theory, called partial information decomposition (PID). PID allows us to quantify the information that several inputs provide individually (unique information), redundantly (shared information) or only jointly (synergistic information) about the output. First, we review the framework of PID. Then we apply it to reevaluate and analyze several earlier proposals of information theoretic neural goal functions (predictive coding, infomax, coherent infomax, efficient coding). We find that PID allows us to compare these goal functions in a common framework, and also provides a versatile approach to design new goal functions from first principles. Building on this, we design and analyze a novel goal function, called 'coding with synergy'. [...]
Submitted 3 October, 2015;
originally announced October 2015.
-
Why do simple algorithms for triangle enumeration work in the real world?
Authors:
Jonathan W. Berry,
Luke A. Fostvedt,
Daniel J. Nordman,
Cynthia A. Phillips,
C. Seshadhri,
Alyson G. Wilson
Abstract:
Listing all triangles is a fundamental graph operation. Triangles can have important interpretations in real-world graphs, especially social and other interaction networks. Despite the lack of provably efficient (linear, or slightly super-linear) worst-case algorithms for this problem, practitioners run simple, efficient heuristics to find all triangles in graphs with millions of vertices. How are these heuristics exploiting the structure of these special graphs to provide major speedups in running time?
We study one of the most prevalent algorithms used by practitioners. A trivial algorithm enumerates all paths of length $2$, and checks if each such path is incident to a triangle. A good heuristic is to enumerate only those paths of length $2$ where the middle vertex has the lowest degree. It is easily implemented and is empirically known to give remarkable speedups over the trivial algorithm.
We study the behavior of this algorithm over graphs with heavy-tailed degree distributions, a defining feature of real-world graphs. The erased configuration model (ECM) efficiently generates a graph with asymptotically (almost) any desired degree sequence. We show that the expected running time of this algorithm over the distribution of graphs created by the ECM is controlled by the $\ell_{4/3}$-norm of the degree sequence. Norms of the degree sequence are a measure of the heaviness of the tail, and it is precisely this feature that allows non-trivial speedups of simple triangle enumeration algorithms. As a corollary of our main theorem, we prove expected linear-time performance for degree sequences following a power law with exponent $α\geq 7/3$, and non-trivial speedup whenever $α\in (2,3)$.
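The lowest-degree-middle-vertex heuristic described above can be sketched as follows. This is an illustrative reading of the abstract, not the authors' code: the function name `enumerate_triangles`, the edge-list input format, and the tie-breaking rule (by vertex label when degrees are equal) are assumptions.

```python
from itertools import combinations


def enumerate_triangles(edges):
    """List each triangle once, as a sorted tuple of its three vertices.

    A path u - v - w is only examined when v ranks lowest among the three,
    where rank is (degree, label). Labels must be mutually comparable.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Ties in degree are broken by label so that the order is total.
    rank = {v: (len(nbrs), v) for v, nbrs in adj.items()}

    triangles = []
    for v, nbrs in adj.items():
        # Only neighbours that rank above v can be endpoints of a path
        # whose middle vertex v has the lowest degree.
        higher = [u for u in nbrs if rank[u] > rank[v]]
        for u, w in combinations(higher, 2):
            if w in adj[u]:  # the path u - v - w closes into a triangle
                triangles.append(tuple(sorted((v, u, w))))
    return triangles


print(enumerate_triangles([(0, 1), (0, 2), (1, 2), (2, 3)]))
```

Each triangle is reported exactly once, at its lowest-ranked vertex, so high-degree hubs never act as the middle vertex and the quadratic blow-up of their neighbour pairs is avoided.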
Submitted 3 July, 2014;
originally announced July 2014.
-
Stochastic Simulation of Process Calculi for Biology
Authors:
Andrew Phillips,
Matthew Lakin,
Loïc Paulevé
Abstract:
Biological systems typically involve large numbers of components with complex, highly parallel interactions and intrinsic stochasticity. To model this complexity, numerous programming languages based on process calculi have been developed, many of which are expressive enough to generate unbounded numbers of molecular species and reactions. As a result of this expressiveness, such calculi cannot rely on standard reaction-based simulation methods, which require fixed numbers of species and reactions. Rather than implementing custom stochastic simulation algorithms for each process calculus, we propose to use a generic abstract machine that can be instantiated to a range of process calculi and a range of reaction-based simulation algorithms. The abstract machine functions as a just-in-time compiler, which dynamically updates the set of possible reactions and chooses the next reaction in an iterative cycle. In this short paper we give a brief summary of the generic abstract machine, and show how it can be instantiated with the stochastic simulation algorithm known as Gillespie's Direct Method. We also discuss the wider implications of such an abstract machine, and outline how it can be used to simulate multiple calculi simultaneously within a common framework.
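For readers unfamiliar with Gillespie's Direct Method, a minimal sketch of the basic algorithm is given below. It assumes a fixed list of reactions, so it does not reproduce the generic abstract machine, which updates the set of possible reactions dynamically. The function name `gillespie_direct` and the representation of a reaction as a `(propensity_function, state_change)` pair are assumptions made for illustration.

```python
import random


def gillespie_direct(state, reactions, t_end, rng=None):
    """Simulate one trajectory and return a list of (time, state) snapshots.

    state:     dict mapping species name to molecule count
    reactions: list of (propensity_fn, change) pairs, where propensity_fn
               maps a state dict to a non-negative rate and change maps
               species name to an integer increment
    """
    if rng is None:
        rng = random.Random()
    state = dict(state)
    t = 0.0
    trajectory = [(t, dict(state))]
    while True:
        propensities = [rate(state) for rate, _ in reactions]
        total = sum(propensities)
        if total <= 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next reaction is exponential with rate `total`.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Pick the next reaction with probability proportional to its propensity.
        _, change = rng.choices(reactions, weights=propensities)[0]
        for species, delta in change.items():
            state[species] += delta
        trajectory.append((t, dict(state)))
    return trajectory


# Hypothetical example: first-order decay A -> (nothing) with rate constant 1.0.
decay = [(lambda s: 1.0 * s["A"], {"A": -1})]
run = gillespie_direct({"A": 10}, decay, t_end=1e9, rng=random.Random(0))
print(run[-1])
```

In a process-calculus setting the `reactions` list would be recomputed after every step rather than fixed in advance, which is the role the abstract assigns to the just-in-time abstract machine.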
Submitted 1 November, 2010;
originally announced November 2010.
-
Communication-Aware Processor Allocation for Supercomputers
Authors:
Michael A. Bender,
David P. Bunde,
Erik D. Demaine,
Sandor P. Fekete,
Vitus J. Leung,
Henk Meijer,
Cynthia A. Phillips
Abstract:
This paper gives processor-allocation algorithms for minimizing the average number of communication hops between the assigned processors for grid architectures, in the presence of occupied cells. The simpler problem of assigning processors on a free grid has been studied by Karp, McKellar, and Wong who show that the solutions have nontrivial structure; they left open the complexity of the problem.
The associated clustering problem is as follows: Given n points in $\mathbb{R}^d$, find k points that minimize their average pairwise L1 distance. We present a natural approximation algorithm and show that it is a 7/4-approximation for 2D grids. For d-dimensional space, the approximation guarantee is 2-(1/2d), which is tight. We also give a polynomial-time approximation scheme (PTAS) for constant dimension d, and report on experimental results.
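The clustering objective can be illustrated with a small brute-force sketch. This is an exponential-time reference for tiny inputs only, not the approximation algorithm or the PTAS from the paper; the function names `avg_pairwise_l1` and `best_subset_bruteforce` are hypothetical.

```python
from itertools import combinations


def avg_pairwise_l1(points):
    """Average L1 (Manhattan) distance over all unordered pairs of points."""
    pairs = list(combinations(points, 2))
    total = sum(sum(abs(a - b) for a, b in zip(p, q)) for p, q in pairs)
    return total / len(pairs)


def best_subset_bruteforce(points, k):
    """Try every k-subset (k >= 2) and return one minimising the objective."""
    return min(combinations(points, k), key=avg_pairwise_l1)


# Hypothetical free cells on a 2D grid; (5, 5) is far from the rest.
cells = [(0, 0), (0, 1), (1, 0), (5, 5), (1, 1)]
chosen = best_subset_bruteforce(cells, 3)
print(chosen, avg_pairwise_l1(chosen))
```

A reference like this is mainly useful for checking the output of a faster heuristic on small grids, since the number of subsets grows as n choose k.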
Submitted 6 December, 2005; v1 submitted 24 July, 2004;
originally announced July 2004.