Search | arXiv e-print repository

Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models

Authors: Raeid Saqur, Anastasis Kratsios, Florian Krach, Yannick Limmer, Jacob-Junqi Tian, John Willes, Blanka Horvath, Frank Rudzicz

Abstract: We propose MoE-F -- a formalised mechanism for combining $N$ pre-trained expert Large Language Models (LLMs) in online time-series prediction tasks by adaptively forecasting the best weighting of LLM predictions at every time step. Our mechanism leverages the conditional information in each expert's running performance to forecast the best combination of LLMs for predicting the time series in its… ▽ More We propose MoE-F -- a formalised mechanism for combining $N$ pre-trained expert Large Language Models (LLMs) in online time-series prediction tasks by adaptively forecasting the best weighting of LLM predictions at every time step. Our mechanism leverages the conditional information in each expert's running performance to forecast the best combination of LLMs for predicting the time series in its next step. Diverging from static (learned) Mixture of Experts (MoE) methods, MoE-F employs time-adaptive stochastic filtering techniques to combine experts. By framing the expert selection problem as a finite state-space, continuous-time Hidden Markov model (HMM), we can leverage the Wohman-Shiryaev filter. Our approach first constructs $N$ parallel filters corresponding to each of the $N$ individual LLMs. Each filter proposes its best combination of LLMs, given the information that they have access to. Subsequently, the $N$ filter outputs are aggregated to optimize a lower bound for the loss of the aggregated LLMs, which can be optimized in closed-form, thus generating our ensemble predictor. Our contributions here are: (I) the MoE-F algorithm -- deployable as a plug-and-play filtering harness, (II) theoretical optimality guarantees of the proposed filtering-based gating algorithm, and (III) empirical evaluation and ablative results using state of the art foundational and MoE LLMs on a real-world Financial Market Movement task where MoE-F attains a remarkable 17% absolute and 48.5% relative F1 measure improvement over the next best performing individual LLM expert. △ Less

Submitted 5 June, 2024; originally announced June 2024.

Comments: 29 pages, 5 Appendix sections

MSC Class: 60J05; 60G35; 68T20; 68T42; 68T50 ACM Class: I.2.6; I.2.7; G.3

arXiv:2404.11599 [pdf, other]

Variational Bayesian Last Layers

Authors: James Harrison, John Willes, Jasper Snoek

Abstract: We introduce a deterministic variational formulation for training Bayesian last layer neural networks. This yields a sampling-free, single-pass model and loss that effectively improves uncertainty estimation. Our variational Bayesian last layer (VBLL) can be trained and evaluated with only quadratic complexity in last layer width, and is thus (nearly) computationally free to add to standard archit… ▽ More We introduce a deterministic variational formulation for training Bayesian last layer neural networks. This yields a sampling-free, single-pass model and loss that effectively improves uncertainty estimation. Our variational Bayesian last layer (VBLL) can be trained and evaluated with only quadratic complexity in last layer width, and is thus (nearly) computationally free to add to standard architectures. We experimentally investigate VBLLs, and show that they improve predictive accuracy, calibration, and out of distribution detection over baselines across both regression and classification. Finally, we investigate combining VBLL layers with variational Bayesian feature learning, yielding a lower variance collapsed variational inference method for Bayesian neural networks. △ Less

Submitted 17 April, 2024; originally announced April 2024.

Comments: International Conference on Learning Representations (ICLR) 2024

arXiv:2312.03140 [pdf, other]

FlexModel: A Framework for Interpretability of Distributed Large Language Models

Authors: Matthew Choi, Muhammad Adil Asif, John Willes, David Emerson

Abstract: With the growth of large language models, now incorporating billions of parameters, the hardware prerequisites for their training and deployment have seen a corresponding increase. Although existing tools facilitate model parallelization and distributed training, deeper model interactions, crucial for interpretability and responsible AI techniques, still demand thorough knowledge of distributed co… ▽ More With the growth of large language models, now incorporating billions of parameters, the hardware prerequisites for their training and deployment have seen a corresponding increase. Although existing tools facilitate model parallelization and distributed training, deeper model interactions, crucial for interpretability and responsible AI techniques, still demand thorough knowledge of distributed computing. This often hinders contributions from researchers with machine learning expertise but limited distributed computing background. Addressing this challenge, we present FlexModel, a software package providing a streamlined interface for engaging with models distributed across multi-GPU and multi-node configurations. The library is compatible with existing model distribution libraries and encapsulates PyTorch models. It exposes user-registerable HookFunctions to facilitate straightforward interaction with distributed model internals, bridging the gap between distributed and single-device model paradigms. Primarily, FlexModel enhances accessibility by democratizing model interactions and promotes more inclusive research in the domain of large-scale neural networks. The package is found at https://github.com/VectorInstitute/flex_model. △ Less

Submitted 5 December, 2023; originally announced December 2023.

Comments: 14 pages, 8 figures. To appear at the Socially Responsible Language Modelling Research (SoLaR) Workshop, 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

arXiv:2308.05711 [pdf, other]

A Comparison of Classical and Deep Reinforcement Learning Methods for HVAC Control

Authors: Marshall Wang, John Willes, Thomas Jiralerspong, Matin Moezzi

Abstract: Reinforcement learning (RL) is a promising approach for optimizing HVAC control. RL offers a framework for improving system performance, reducing energy consumption, and enhancing cost efficiency. We benchmark two popular classical and deep RL methods (Q-Learning and Deep-Q-Networks) across multiple HVAC environments and explore the practical consideration of model hyper-parameter selection and re… ▽ More Reinforcement learning (RL) is a promising approach for optimizing HVAC control. RL offers a framework for improving system performance, reducing energy consumption, and enhancing cost efficiency. We benchmark two popular classical and deep RL methods (Q-Learning and Deep-Q-Networks) across multiple HVAC environments and explore the practical consideration of model hyper-parameter selection and reward tuning. The findings provide insight for configuring RL agents in HVAC systems, promoting energy-efficient and cost-effective operation. △ Less

Submitted 10 August, 2023; originally announced August 2023.

arXiv:2209.12487 [pdf, other]

Tartarus: A Benchmarking Platform for Realistic And Practical Inverse Molecular Design

Authors: AkshatKumar Nigam, Robert Pollice, Gary Tom, Kjell Jorner, John Willes, Luca A. Thiede, Anshul Kundaje, Alan Aspuru-Guzik

Abstract: The efficient exploration of chemical space to design molecules with intended properties enables the accelerated discovery of drugs, materials, and catalysts, and is one of the most important outstanding challenges in chemistry. Encouraged by the recent surge in computer power and artificial intelligence development, many algorithms have been developed to tackle this problem. However, despite the… ▽ More The efficient exploration of chemical space to design molecules with intended properties enables the accelerated discovery of drugs, materials, and catalysts, and is one of the most important outstanding challenges in chemistry. Encouraged by the recent surge in computer power and artificial intelligence development, many algorithms have been developed to tackle this problem. However, despite the emergence of many new approaches in recent years, comparatively little progress has been made in develo** realistic benchmarks that reflect the complexity of molecular design for real-world applications. In this work, we develop a set of practical benchmark tasks relying on physical simulation of molecular systems mimicking real-life molecular design problems for materials, drugs, and chemical reactions. Additionally, we demonstrate the utility and ease of use of our new benchmark set by demonstrating how to compare the performance of several well-established families of algorithms. Surprisingly, we find that model performance can strongly depend on the benchmark domain. We believe that our benchmark suite will help move the field towards more realistic molecular design benchmarks, and move the development of inverse molecular design algorithms closer to designing molecules that solve existing problems in both academia and industry alike. △ Less

Submitted 11 October, 2023; v1 submitted 26 September, 2022; originally announced September 2022.

Comments: 29+21 pages, 6+19 figures, 6+2 tables

arXiv:2208.08041 [pdf, other]

InterTrack: Interaction Transformer for 3D Multi-Object Tracking

Authors: John Willes, Cody Reading, Steven L. Waslander

Abstract: 3D multi-object tracking (MOT) is a key problem for autonomous vehicles, required to perform well-informed motion planning in dynamic environments. Particularly for densely occupied scenes, associating existing tracks to new detections remains challenging as existing systems tend to omit critical contextual information. Our proposed solution, InterTrack, introduces the Interaction Transformer for… ▽ More 3D multi-object tracking (MOT) is a key problem for autonomous vehicles, required to perform well-informed motion planning in dynamic environments. Particularly for densely occupied scenes, associating existing tracks to new detections remains challenging as existing systems tend to omit critical contextual information. Our proposed solution, InterTrack, introduces the Interaction Transformer for 3D MOT to generate discriminative object representations for data association. We extract state and shape features for each track and detection, and efficiently aggregate global information via attention. We then perform a learned regression on each track/detection feature pair to estimate affinities, and use a robust two-stage data association and track management approach to produce the final tracks. We validate our approach on the nuScenes 3D MOT benchmark, where we observe significant improvements, particularly on classes with small physical sizes and clustered objects. As of submission, InterTrack ranks 1st in overall AMOTA among methods using CenterPoint detections. △ Less

Submitted 6 May, 2023; v1 submitted 16 August, 2022; originally announced August 2022.

Comments: Accepted to CRV 2023

arXiv:2107.13682 [pdf, other]

Bayesian Embeddings for Few-Shot Open World Recognition

Authors: John Willes, James Harrison, Ali Harakeh, Chelsea Finn, Marco Pavone, Steven Waslander

Abstract: As autonomous decision-making agents move from narrow operating environments to unstructured worlds, learning systems must move from a closed-world formulation to an open-world and few-shot setting in which agents continuously learn new classes from small amounts of information. This stands in stark contrast to modern machine learning systems that are typically designed with a known set of classes… ▽ More As autonomous decision-making agents move from narrow operating environments to unstructured worlds, learning systems must move from a closed-world formulation to an open-world and few-shot setting in which agents continuously learn new classes from small amounts of information. This stands in stark contrast to modern machine learning systems that are typically designed with a known set of classes and a large number of examples for each class. In this work we extend embedding-based few-shot learning algorithms to the open-world recognition setting. We combine Bayesian non-parametric class priors with an embedding-based pre-training scheme to yield a highly flexible framework which we refer to as few-shot learning for open world recognition (FLOWR). We benchmark our framework on open-world extensions of the common MiniImageNet and TieredImageNet few-shot learning datasets. Our results show, compared to prior methods, strong classification accuracy performance and up to a 12% improvement in H-measure (a measure of novel class detection) from our non-parametric open-world few-shot learning scheme. △ Less

Submitted 5 October, 2022; v1 submitted 28 July, 2021; originally announced July 2021.

arXiv:astro-ph/0302583 [pdf, ps, other]

doi 10.1111/j.1365-2966.2004.07363.x

Electron-cyclotron maser emission from white-dwarf pairs and white-dwarf planetary systems

Authors: Andrew J. Willes, Kinwah Wu

Abstract: By analogy to Jovian radio emissions powered by the electromagnetic interaction between Jupiter and its moons, we propose that close magnetic-nonmagnetic white-dwarf pairs and white-dwarf planetary systems are strong radio sources. A simple model is developed to predict the flux densities of radio emission generated by a loss-cone-driven electron-cyclotron maser. The radio emission from these sy… ▽ More By analogy to Jovian radio emissions powered by the electromagnetic interaction between Jupiter and its moons, we propose that close magnetic-nonmagnetic white-dwarf pairs and white-dwarf planetary systems are strong radio sources. A simple model is developed to predict the flux densities of radio emission generated by a loss-cone-driven electron-cyclotron maser. The radio emission from these systems has high brightness temperatures, is highly polarized, and varies on a periodic cycle following the orbital rotation. Masers from magnetic-nonmagnetic white-dwarf pairs, with orbital periods <10 min, are expected to be detectable over a wide range of radio frequencies. Terrestrial planets in close orbits about magnetic white dwarfs, with orbital periods $\la 30$ hr, can also produce detectable radio emission, thus providing a means to identify Earth-sized extrasolar planets. △ Less

Submitted 27 February, 2003; originally announced February 2003.

Comments: 13 pages, 5 figures, submitted to MNRAS

Report number: USYD-03-328

Journal ref: Mon.Not.Roy.Astron.Soc. 348 (2004) 285

Showing 1–8 of 8 results for author: Willes, J