-
High-Dimensional Non-Convex Landscapes and Gradient Descent Dynamics
Authors:
Tony Bonnaire,
Davide Ghio,
Kamesh Krishnamurthy,
Francesca Mignacco,
Atsushi Yamamura,
Giulio Biroli
Abstract:
In these lecture notes we present different methods and concepts developed in statistical physics to analyze gradient descent dynamics in high-dimensional non-convex landscapes. Our aim is to show how approaches developed in physics, mainly statistical physics of disordered systems, can be used to tackle open questions on high-dimensional dynamics in Machine Learning.
Submitted 10 November, 2023; v1 submitted 7 August, 2023;
originally announced August 2023.
-
R-LPIPS: An Adversarially Robust Perceptual Similarity Metric
Authors:
Sara Ghazanfari,
Siddharth Garg,
Prashanth Krishnamurthy,
Farshad Khorrami,
Alexandre Araujo
Abstract:
Similarity metrics have played a significant role in computer vision to capture the underlying semantics of images. In recent years, advanced similarity metrics, such as the Learned Perceptual Image Patch Similarity (LPIPS), have emerged. These metrics leverage deep features extracted from trained neural networks and have demonstrated a remarkable ability to closely align with human perception when evaluating relative image similarity. However, it is now well-known that neural networks are susceptible to adversarial examples, i.e., small perturbations invisible to humans crafted to deliberately mislead the model. Consequently, the LPIPS metric is also sensitive to such adversarial examples. This susceptibility introduces significant security concerns, especially considering the widespread adoption of LPIPS in large-scale applications. In this paper, we propose the Robust Learned Perceptual Image Patch Similarity (R-LPIPS) metric, a new metric that leverages adversarially trained deep features. Through a comprehensive set of experiments, we demonstrate the superiority of R-LPIPS compared to the classical LPIPS metric. The code is available at https://github.com/SaraGhazanfari/R-LPIPS.
Submitted 31 July, 2023; v1 submitted 27 July, 2023;
originally announced July 2023.
-
Spinon continuum in the Heisenberg quantum chain compound Sr$_2$V$_3$O$_9$
Authors:
Shang Gao,
Ling-Fang Lin,
Pontus Laurell,
Qiang Chen,
Qing Huang,
Clarina dela Cruz,
Krishnamurthy V. Vemuru,
Mark D. Lumsden,
Stephen E. Nagler,
Gonzalo Alvarez,
Elbio Dagotto,
Haidong Zhou,
Andrew D. Christianson,
Matthew B. Stone
Abstract:
Magnetic excitations in the spin chain candidate Sr$_2$V$_3$O$_9$ have been investigated by inelastic neutron scattering on a single crystal sample. A spinon continuum with a bandwidth of $\sim22$ meV is observed along the chain formed by alternating magnetic V$^{4+}$ and nonmagnetic V$^{5+}$ ions. Incipient magnetic Bragg peaks due to weak ferromagnetic interchain couplings emerge when approaching the magnetic transition at $T_N\sim 5.3$ K while the excitations remain gapless within the instrumental resolution. Comparisons to the Bethe ansatz, density matrix renormalization group (DMRG) calculations, and effective field theories confirm Sr$_2$V$_3$O$_9$ as a host of weakly coupled $S = 1/2$ chains dominated by antiferromagnetic intrachain interactions of $\sim7.1$(1) meV.
Submitted 25 July, 2023; v1 submitted 22 July, 2023;
originally announced July 2023.
-
Using Circulation to Mitigate Spurious Equilibria in Control Barrier Function -- Extended Version
Authors:
Vinicius Mariano Goncalves,
Prashanth Krishnamurthy,
Anthony Tzes,
Farshad Khorrami
Abstract:
Control Barrier Functions and Quadratic Programming are increasingly used for designing controllers that consider critical safety constraints. However, like Artificial Potential Fields, they can suffer from the stable spurious equilibrium point problem, which can result in the controller failing to reach the goal. To address this issue, we propose introducing circulation inequalities as a constraint. These inequalities force the system to explicitly circulate the obstacle region in configuration space, thus avoiding undesirable equilibria. We conduct a theoretical analysis of the proposed framework and demonstrate its efficacy through simulation studies. By mitigating spurious equilibria, our approach enhances the reliability of CBF-based controllers, making them more suitable for real-world applications.
Submitted 19 July, 2023;
originally announced July 2023.
-
Trainability, Expressivity and Interpretability in Gated Neural ODEs
Authors:
Timothy Doyeon Kim,
Tankut Can,
Kamesh Krishnamurthy
Abstract:
Understanding how the dynamics in biological and artificial neural networks implement the computations required for a task is a salient open question in machine learning and neuroscience. In particular, computations requiring complex memory storage and retrieval pose a significant challenge for these networks to implement or learn. Recently, a family of models described by neural ordinary differential equations (nODEs) has emerged as powerful dynamical neural network models capable of capturing complex dynamics. Here, we extend nODEs by endowing them with adaptive timescales using gating interactions. We refer to these as gated neural ODEs (gnODEs). Using a task that requires memory of continuous quantities, we demonstrate the inductive bias of the gnODEs to learn (approximate) continuous attractors. We further show how reduced-dimensional gnODEs retain their modeling power while greatly improving interpretability, even allowing explicit visualization of the structure of learned attractors. We introduce a novel measure of expressivity which probes the capacity of a neural network to generate complex trajectories. Using this measure, we explore how the phase-space dimension of the nODEs and the complexity of the function modeling the flow field contribute to expressivity. We see that a more complex function for modeling the flow field allows a lower-dimensional nODE to capture a given target dynamics. Finally, we demonstrate the benefit of gating in nODEs on several real-world tasks.
Submitted 12 July, 2023;
originally announced July 2023.
-
Differential Analysis of Triggers and Benign Features for Black-Box DNN Backdoor Detection
Authors:
Hao Fu,
Prashanth Krishnamurthy,
Siddharth Garg,
Farshad Khorrami
Abstract:
This paper proposes a data-efficient detection method for deep neural networks against backdoor attacks under a black-box scenario. The proposed approach is motivated by the intuition that features corresponding to triggers have a higher influence in determining the backdoored network output than any other benign features. To quantitatively measure the effects of triggers and benign features on determining the backdoored network output, we introduce five metrics. To calculate the five-metric values for a given input, we first generate several synthetic samples by injecting the input's partial contents into clean validation samples. Then, the five metrics are computed by using the output labels of the corresponding synthetic samples. One contribution of this work is the use of a tiny clean validation dataset. Having the computed five metrics, five novelty detectors are trained from the validation dataset. A meta novelty detector fuses the output of the five trained novelty detectors to generate a meta confidence score. During online testing, our method determines if online samples are poisoned or not via assessing their meta confidence scores output by the meta novelty detector. We show the efficacy of our methodology through a broad range of backdoor attacks, including ablation studies and comparison to existing approaches. Our methodology is promising since the proposed five metrics quantify the inherent differences between clean and poisoned samples. Additionally, our detection method can be incrementally improved by appending more metrics that may be proposed to address future advanced attacks.
Submitted 14 July, 2023; v1 submitted 11 July, 2023;
originally announced July 2023.
-
FODVid: Flow-guided Object Discovery in Videos
Authors:
Silky Singh,
Shripad Deshmukh,
Mausoom Sarkar,
Rishabh Jain,
Mayur Hemani,
Balaji Krishnamurthy
Abstract:
Segmentation of objects in a video is challenging due to nuances such as motion blur, parallax, occlusions, and changes in illumination. Instead of addressing these nuances separately, we focus on building a generalizable solution that avoids overfitting to the individual intricacies. Such a solution would also help us save the enormous resources involved in human annotation of video corpora. To solve Video Object Segmentation (VOS) in an unsupervised setting, we propose a new pipeline (FODVid) based on the idea of guiding segmentation outputs using flow-guided graph-cut and temporal consistency. Specifically, we design a segmentation model incorporating intra-frame appearance and flow similarities, and inter-frame temporal continuation of the objects under consideration. We perform an extensive experimental analysis of our straightforward methodology on the standard DAVIS16 video benchmark. Though simple, our approach produces results comparable (within a range of ~2 mIoU) to the existing top approaches in unsupervised VOS. The simplicity and effectiveness of our technique open up new avenues for research in the video domain.
Submitted 10 July, 2023;
originally announced July 2023.
-
Markov Persuasion Processes with Endogenous Agent Beliefs
Authors:
Krishnamurthy Iyer,
Haifeng Xu,
You Zu
Abstract:
We consider a dynamic Bayesian persuasion setting where a single long-lived sender persuades a stream of "short-lived" agents (receivers) by sharing information about a payoff-relevant state. The state transitions are Markovian and the sender seeks to maximize the long-run average reward by committing to a (possibly history-dependent) signaling mechanism. While most previous studies of Markov persuasion consider exogenous agent beliefs that are independent of the chain, we study a more natural variant with endogenous agent beliefs that depend on the chain's realized history. A key challenge in analyzing such settings is modeling the agents' partial knowledge about the history information. We analyze a Markov persuasion process (MPP) under various information models that differ in the amount of information the receivers have about the history of the process. Specifically, we formulate a general partial-information model where each receiver observes the history with an $\ell$-period lag. Our technical contributions start with analyzing two benchmark models, i.e., the full-history information model and the no-history information model. We establish an ordering of the sender's payoff as a function of the informativeness of the agents' information model (with no-history as the least informative), and develop efficient algorithms to compute optimal solutions for these two benchmarks. For general $\ell$, we present the technical challenges in finding an optimal signaling mechanism, where even determining the right dependency on the history becomes difficult. To bypass the difficulties, we use a robustness framework to design a "simple" history-independent signaling mechanism that approximately achieves optimal payoff when $\ell$ is reasonably large.
Submitted 13 July, 2023; v1 submitted 6 July, 2023;
originally announced July 2023.
-
Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization
Authors:
Sanath Kumar Krishnamurthy,
Ruohan Zhan,
Susan Athey,
Emma Brunskill
Abstract:
In many applications, e.g. in healthcare and e-commerce, the goal of a contextual bandit may be to learn an optimal treatment assignment policy at the end of the experiment. That is, to minimize simple regret. However, this objective remains understudied. We propose a new family of computationally efficient bandit algorithms for the stochastic contextual bandit setting, where a tuning parameter determines the weight placed on cumulative regret minimization (where we establish near-optimal minimax guarantees) versus simple regret minimization (where we establish state-of-the-art guarantees). Our algorithms work with any function class, are robust to model misspecification, and can be used in continuous arm settings. This flexibility comes from constructing and relying on "conformal arm sets" (CASs). CASs provide a set of arms for every context, encompassing the context-specific optimal arm with a certain probability across the context distribution. Our positive results on simple and cumulative regret guarantees are contrasted with a negative result, which shows that no algorithm can achieve instance-dependent simple regret guarantees while simultaneously achieving minimax optimal cumulative regret guarantees.
Submitted 2 November, 2023; v1 submitted 5 July, 2023;
originally announced July 2023.
-
Optimal Surrogate Boundary Selection and Scalability Studies for the Shifted Boundary Method on Octree Meshes
Authors:
Cheng-Hau Yang,
Kumar Saurabh,
Guglielmo Scovazzi,
Claudio Canuto,
Adarsh Krishnamurthy,
Baskar Ganapathysubramanian
Abstract:
The accurate and efficient simulation of Partial Differential Equations (PDEs) in and around arbitrarily defined geometries is critical for many application domains. Immersed boundary methods (IBMs) alleviate the usually laborious and time-consuming process of creating body-fitted meshes around complex geometry models (described by CAD or other representations, e.g., STL, point clouds), especially when high levels of mesh adaptivity are required. In this work, we advance the field of IBM in the context of the recently developed Shifted Boundary Method (SBM). In the SBM, the location where boundary conditions are enforced is shifted from the actual boundary of the immersed object to a nearby surrogate boundary, and boundary conditions are corrected utilizing Taylor expansions. This approach allows choosing surrogate boundaries that conform to a Cartesian mesh without losing accuracy or stability. Our contributions in this work are as follows: (a) we show that the SBM numerical error can be greatly reduced by an optimal choice of the surrogate boundary, (b) we mathematically prove the optimal convergence of the SBM for this optimal choice of the surrogate boundary, (c) we deploy the SBM on massively parallel octree meshes, including algorithmic advances to handle incomplete octrees, and (d) we showcase the applicability of these approaches with a wide variety of simulations involving complex shapes, sharp corners, and different topologies. Specific emphasis is given to Poisson's equation and the linear elasticity equations.
Submitted 4 July, 2023;
originally announced July 2023.
-
Screening Mixed-Metal Sn$_2$M(III)Ch$_2$X$_3$ Chalcohalides for Photovoltaic Applications
Authors:
Pascal Henkel,
Jingrui Li,
G. Krishnamurthy Grandhi,
Paola Vivo,
Patrick Rinke
Abstract:
Quaternary mixed-metal chalcohalides (Sn$_2$BCh$_2$X$_3$) are emerging as promising lead-free perovskite-inspired photovoltaic absorbers. Motivated by recent developments of a first Sn$_2$BCh$_2$X$_3$-based device, we used density functional theory to identify lead-free Sn$_2$BCh$_2$X$_3$ materials that are structurally and energetically stable within Cmcm, Cmc2$_1$ and P2$_1$/c space groups and have a band gap in the range of 0.7 to 2.0 eV to cover outdoor and indoor photovoltaic applications. A total of 27 Sn$_2$BCh$_2$X$_3$ materials were studied, including Sb, Bi, In for B-site, S, Se, Te for Ch-site and Cl, Br, I for X-site. We identified 12 materials with a direct band gap that meet our requirements, namely: Sn$_2$InS$_2$Br$_3$, Sn$_2$InS$_2$I$_3$, Sn$_2$InSe$_2$Cl$_3$, Sn$_2$InSe$_2$Br$_3$, Sn$_2$InTe$_2$Br$_3$, Sn$_2$InTe$_2$Cl$_3$, Sn$_2$SbS$_2$I$_3$, Sn$_2$SbSe$_2$Cl$_3$, Sn$_2$SbSe$_2$I$_3$, Sn$_2$SbTe$_2$Cl$_3$, Sn$_2$BiS$_2$I$_3$ and Sn$_2$BiTe$_2$Cl$_3$. A database scan reveals that 9 out of 12 are new compositions. For all 27 materials, P2$_1$/c is the thermodynamically preferred structure, followed by Cmc2$_1$. In Cmcm and Cmc2$_1$ mainly direct gaps occur, whereas mostly indirect gaps occur in P2$_1$/c. To open up the possibility of band gap tuning in the future, we identified 12 promising Sn$_2$B$_{1-{a}}$B$'_{a}$Ch$_{2-{b}}$Ch$'_{b}$X$_{3-{c}}$X$_{c}$ alloys which fulfill our requirements and an additional 69 materials by combining direct and indirect band gap compounds.
Submitted 12 September, 2023; v1 submitted 30 June, 2023;
originally announced June 2023.
-
SARC: Soft Actor Retrospective Critic
Authors:
Sukriti Verma,
Ayush Chopra,
Jayakumar Subramanian,
Mausoom Sarkar,
Nikaash Puri,
Piyush Gupta,
Balaji Krishnamurthy
Abstract:
The two-time scale nature of SAC, which is an actor-critic algorithm, is characterised by the fact that the critic estimate has not converged for the actor at any given time, but since the critic learns faster than the actor, it ensures eventual consistency between the two. Various strategies have been introduced in the literature to learn better gradient estimates to help achieve better convergence. Since gradient estimates depend upon the critic, we posit that improving the critic can provide a better gradient estimate for the actor at each time. Utilizing this, we propose Soft Actor Retrospective Critic (SARC), where we augment the SAC critic loss with another loss term, the retrospective loss, leading to faster critic convergence and, consequently, better policy gradient estimates for the actor. An existing implementation of SAC can be easily adapted to SARC with minimal modifications. Through extensive experimentation and analysis, we show that SARC provides consistent improvement over SAC on benchmark environments. We plan to open-source the code and all experiment data at: https://github.com/sukritiverma1996/SARC.
Submitted 28 June, 2023;
originally announced June 2023.
-
A semi-implicit finite volume scheme for dissipative measure-valued solutions to the barotropic Euler system
Authors:
K. R. Arun,
Amogh Krishnamurthy
Abstract:
A semi-implicit in time, entropy stable finite volume scheme for the compressible barotropic Euler system is designed and analyzed, and its weak convergence to a dissipative measure-valued (DMV) solution [E. Feireisl et al., Dissipative measure-valued solutions to the compressible Navier-Stokes system, Calc. Var. Partial Differential Equations, 2016] of the Euler system is shown. The entropy stability is achieved by introducing a shifted velocity in the convective fluxes of the mass and momentum balances, provided some CFL-like condition is satisfied to ensure stability. A consistency analysis is performed in the spirit of Lax's equivalence theorem under some physically reasonable boundedness assumptions. The concept of K-convergence [E. Feireisl et al., K-convergence as a new tool in numerical analysis, IMA J. Numer. Anal., 2020] is used in order to obtain some strong convergence results, which are then illustrated via rigorous numerical case studies. The convergence of the scheme to a DMV solution, a weak solution and a strong solution of the Euler system using the weak-strong uniqueness principle and relative entropy is presented.
Submitted 6 December, 2023; v1 submitted 19 June, 2023;
originally announced June 2023.
-
Selective Concept Models: Permitting Stakeholder Customisation at Test-Time
Authors:
Matthew Barker,
Katherine M. Collins,
Krishnamurthy Dvijotham,
Adrian Weller,
Umang Bhatt
Abstract:
Concept-based models perform prediction using a set of concepts that are interpretable to stakeholders. However, such models often involve a fixed, large number of concepts, which may place a substantial cognitive load on stakeholders. We propose Selective COncept Models (SCOMs) which make predictions using only a subset of concepts and can be customised by stakeholders at test-time according to their preferences. We show that SCOMs only require a fraction of the total concepts to achieve optimal accuracy on multiple real-world datasets. Further, we collect and release a new dataset, CUB-Sel, consisting of human concept set selections for 900 bird images from the popular CUB dataset. Using CUB-Sel, we show that humans have unique individual preferences for the choice of concepts they prefer to reason about, and struggle to identify the most theoretically informative concepts. The customisation and concept selection provided by SCOM improves the efficiency of interpretation and intervention for stakeholders.
Submitted 14 June, 2023;
originally announced June 2023.
-
ZeroForge: Feedforward Text-to-Shape Without 3D Supervision
Authors:
Kelly O. Marshall,
Minh Pham,
Ameya Joshi,
Anushrut Jignasu,
Aditya Balu,
Adarsh Krishnamurthy,
Chinmay Hegde
Abstract:
Current state-of-the-art methods for text-to-shape generation either require supervised training using a labeled dataset of pre-defined 3D shapes, or perform expensive inference-time optimization of implicit neural representations. In this work, we present ZeroForge, an approach for zero-shot text-to-shape generation that avoids both pitfalls. To achieve open-vocabulary shape generation, we require careful architectural adaptation of existing feed-forward approaches, as well as a combination of data-free CLIP-loss and contrastive losses to avoid mode collapse. Using these techniques, we are able to considerably expand the generative ability of existing feed-forward text-to-shape models such as CLIP-Forge. We support our method via extensive qualitative and quantitative evaluations.
Submitted 15 June, 2023; v1 submitted 13 June, 2023;
originally announced June 2023.
-
Oracle-Efficient Pessimism: Offline Policy Optimization in Contextual Bandits
Authors:
Lequn Wang,
Akshay Krishnamurthy,
Aleksandrs Slivkins
Abstract:
We consider offline policy optimization (OPO) in contextual bandits, where one is given a fixed dataset of logged interactions. While pessimistic regularizers are typically used to mitigate distribution shift, prior implementations thereof are either specialized or computationally inefficient. We present the first general oracle-efficient algorithm for pessimistic OPO: it reduces to supervised learning, leading to broad applicability. We obtain statistical guarantees analogous to those for prior pessimistic approaches. We instantiate our approach for both discrete and continuous actions and perform experiments in both settings, showing advantage over unregularized OPO across a wide range of configurations.
Submitted 25 October, 2023; v1 submitted 13 June, 2023;
originally announced June 2023.
-
Faithful Knowledge Distillation
Authors:
Tom A. Lamb,
Rudy Brunel,
Krishnamurthy DJ Dvijotham,
M. Pawan Kumar,
Philip H. S. Torr,
Francisco Eiras
Abstract:
Knowledge distillation (KD) has received much attention due to its success in compressing networks to allow for their deployment in resource-constrained systems. While the problem of adversarial robustness has been studied before in the KD setting, previous works overlook what we term the relative calibration of the student network with respect to its teacher in terms of soft confidences. In particular, we focus on two crucial questions with regard to a teacher-student pair: (i) do the teacher and student disagree at points close to correctly classified dataset examples, and (ii) is the distilled student as confident as the teacher around dataset examples? These are critical questions when considering the deployment of a smaller student network trained from a robust teacher within a safety-critical setting. To address these questions, we introduce a faithful imitation framework to discuss the relative calibration of confidences and provide empirical and certified methods to evaluate the relative calibration of a student w.r.t. its teacher. Further, to verifiably align the relative calibration incentives of the student to those of its teacher, we introduce faithful distillation. Our experiments on the MNIST, Fashion-MNIST and CIFAR-10 datasets demonstrate the need for such an analysis and the advantages of the increased verifiability of faithful distillation over alternative adversarial distillation methods.
Submitted 11 August, 2023; v1 submitted 7 June, 2023;
originally announced June 2023.
-
Exposing Attention Glitches with Flip-Flop Language Modeling
Authors:
Bingbin Liu,
Jordan T. Ash,
Surbhi Goel,
Akshay Krishnamurthy,
Cyril Zhang
Abstract:
Why do large language models sometimes output factual inaccuracies and exhibit erroneous reasoning? The brittleness of these models, particularly when executing long chains of reasoning, currently seems to be an inevitable price to pay for their advanced capabilities of coherently synthesizing knowledge, pragmatics, and abstract thought. Towards making sense of this fundamentally unsolved problem, this work identifies and analyzes the phenomenon of attention glitches, in which the Transformer architecture's inductive biases intermittently fail to capture robust reasoning. To isolate the issue, we introduce flip-flop language modeling (FFLM), a parametric family of synthetic benchmarks designed to probe the extrapolative behavior of neural language models. This simple generative task requires a model to copy binary symbols over long-range dependencies, ignoring the tokens in between. We find that Transformer FFLMs suffer from a long tail of sporadic reasoning errors, some of which we can eliminate using various regularization techniques. Our preliminary mechanistic analyses show why the remaining errors may be very difficult to diagnose and resolve. We hypothesize that attention glitches account for (some of) the closed-domain hallucinations in natural LLMs.
Submitted 30 October, 2023; v1 submitted 1 June, 2023;
originally announced June 2023.
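The copying task described in the abstract above can be illustrated with a small generator. This is only a sketch under stated assumptions: the instruction names (`w` for write, `i` for ignore, `r` for read), the function names, and the `p_ignore` parameter are hypothetical choices for illustration and are not taken from the abstract, which says only that a model must copy binary symbols across long-range dependencies while ignoring the tokens in between.

```python
import random


def make_flip_flop_sequence(num_pairs, p_ignore=0.8, seed=0):
    """Return a list of (instruction, bit) pairs of length num_pairs (>= 2).

    Hypothetical vocabulary (an assumption, not from the abstract):
      'w' writes a bit to memory, 'i' is a distractor to be ignored,
      'r' must be followed by the most recently written bit.
    """
    rng = random.Random(seed)
    memory = rng.randint(0, 1)
    seq = [("w", memory)]  # start with a write so the final read is well defined
    for _ in range(num_pairs - 2):
        if rng.random() < p_ignore:
            seq.append(("i", rng.randint(0, 1)))  # distractor bit, never stored
        else:
            memory = rng.randint(0, 1)
            seq.append(("w", memory))  # overwrite the stored bit
    seq.append(("r", memory))  # the read must reproduce the stored bit
    return seq


def is_valid(seq):
    """Check that every read returns the most recently written bit."""
    memory = None
    for instr, bit in seq:
        if instr == "w":
            memory = bit
        elif instr == "r" and bit != memory:
            return False
    return True
```

Raising `p_ignore` lengthens the typical gap between a write and the read that depends on it, which is the long-range dependency the task is meant to probe.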
-
Bandwidth Optimal Pipeline Schedule for Collective Communication
Authors:
Liangyu Zhao,
Arvind Krishnamurthy
Abstract:
We present a strongly polynomial-time algorithm to generate bandwidth optimal allgather/reduce-scatter on any network topology, with or without switches. Our algorithm constructs pipeline schedules achieving provably the best possible bandwidth performance on a given topology. To provide a universal solution, we model the network topology as a directed graph with heterogeneous link capacities and switches directly as vertices in the graph representation. The algorithm is strongly polynomial-time with respect to the topology size. This work heavily relies on previous graph theory work on edge-disjoint spanning trees and edge splitting. While we focus on allgather, the methods in this paper can be easily extended to generate schedules for reduce, broadcast, reduce-scatter, and allreduce.
Submitted 31 May, 2023; v1 submitted 29 May, 2023;
originally announced May 2023.
-
Training Private Models That Know What They Don't Know
Authors:
Stephan Rabanser,
Anvith Thudi,
Abhradeep Thakurta,
Krishnamurthy Dvijotham,
Nicolas Papernot
Abstract:
Training reliable deep learning models which avoid making overconfident but incorrect predictions is a longstanding challenge. This challenge is further exacerbated when learning has to be differentially private: protection provided to sensitive data comes at the price of injecting additional randomness into the learning process. In this work, we conduct a thorough empirical investigation of selective classifiers -- that can abstain when they are unsure -- under a differential privacy constraint. We find that several popular selective prediction approaches are ineffective in a differentially private setting as they increase the risk of privacy leakage. At the same time, we identify that a recent approach that only uses checkpoints produced by an off-the-shelf private learning algorithm stands out as particularly suitable under DP. Further, we show that differential privacy does not just harm utility but also degrades selective classification performance. To analyze this effect across privacy levels, we propose a novel evaluation mechanism which isolates selective prediction performance across model utility levels. Our experimental results show that recovering the performance level attainable by non-private models is possible but comes at a considerable coverage cost as the privacy budget decreases.
Submitted 28 May, 2023;
originally announced May 2023.
-
Expressive Losses for Verified Robustness via Convex Combinations
Authors:
Alessandro De Palma,
Rudy Bunel,
Krishnamurthy Dvijotham,
M. Pawan Kumar,
Robert Stanforth,
Alessio Lomuscio
Abstract:
In order to train networks for verified adversarial robustness, it is common to over-approximate the worst-case loss over perturbation regions, resulting in networks that attain verifiability at the expense of standard performance. As shown in recent work, better trade-offs between accuracy and robustness can be obtained by carefully coupling adversarial training with over-approximations. We hypothesize that the expressivity of a loss function, which we formalize as the ability to span a range of trade-offs between lower and upper bounds to the worst-case loss through a single parameter (the over-approximation coefficient), is key to attaining state-of-the-art performance. To support our hypothesis, we show that trivial expressive losses, obtained via convex combinations between adversarial attacks and IBP bounds, yield state-of-the-art results across a variety of settings in spite of their conceptual simplicity. We provide a detailed analysis of the relationship between the over-approximation coefficient and performance profiles across different expressive losses, showing that, while expressivity is essential, better approximations of the worst-case loss are not necessarily linked to superior robustness-accuracy trade-offs.
Submitted 18 March, 2024; v1 submitted 23 May, 2023;
originally announced May 2023.
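One simple reading of "convex combinations between adversarial attacks and IBP bounds" in the abstract above can be written out as follows. The symbols are assumptions introduced here for illustration and are not the paper's notation: $\mathcal{L}_{\mathrm{adv}}$ is the loss at an adversarial example (a lower bound on the worst-case loss), $\mathcal{L}_{\mathrm{IBP}}$ is the loss computed from IBP bounds (an upper bound), and $\alpha$ plays the role of the over-approximation coefficient.

```latex
\mathcal{L}_{\alpha}(\theta; x, y)
  = (1 - \alpha)\, \mathcal{L}_{\mathrm{adv}}(\theta; x, y)
  + \alpha\, \mathcal{L}_{\mathrm{IBP}}(\theta; x, y),
\qquad \alpha \in [0, 1],
\qquad
\mathcal{L}_{\mathrm{adv}} \le \mathcal{L}_{\alpha} \le \mathcal{L}_{\mathrm{IBP}} .
```

Under this reading, $\alpha = 0$ recovers the adversarial lower bound and $\alpha = 1$ the IBP upper bound, so a single parameter spans the range of trade-offs the abstract describes.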
-
TSoR: TCP Socket over RDMA Container Network for Cloud Native Computing
Authors:
Yulin Sun,
Qingming Qu,
Chenxingyu Zhao,
Arvind Krishnamurthy,
Hong Chang,
Ying Xiong
Abstract:
Cloud-native containerized applications constantly seek high-performance and easy-to-operate container network solutions. RDMA network is a potential enabler with higher throughput and lower latency than the standard TCP/IP network stack. However, several challenges remain in equipping containerized applications with RDMA network: 1) How to deliver transparent improvements without modifying application code; 2) How to integrate RDMA-based network solutions with container orchestration systems; 3) How to efficiently utilize RDMA for container networks.
In this paper, we present an RDMA-based container network solution, TCP Socket over RDMA (TSoR), which addresses all the above challenges. To transparently accelerate applications using POSIX socket interfaces without modifications, we integrate TSoR with a container runtime that can intercept system calls for socket interfaces. To be compatible with orchestration systems like Kubernetes, TSoR implements a container network following the Kubernetes network model and satisfies all requirements of the model. To leverage RDMA benefits, TSoR designs a high-performance network stack that efficiently transfers TCP traffic using RDMA network. Thus, TSoR provides a turn-key solution for existing Kubernetes clusters to adopt the high-performance RDMA network with minimal effort.
Our evaluation results show that TSoR provides up to 2.3x higher throughput and 64% lower latency for existing containerized applications, such as Redis key-value store and Node.js web server, with no code changes. TSoR code will be open-sourced.
Submitted 17 May, 2023;
originally announced May 2023.
-
Efficient Error Certification for Physics-Informed Neural Networks
Authors:
Francisco Eiras,
Adel Bibi,
Rudy Bunel,
Krishnamurthy Dj Dvijotham,
Philip Torr,
M. Pawan Kumar
Abstract:
Recent work provides promising evidence that Physics-Informed Neural Networks (PINN) can efficiently solve partial differential equations (PDE). However, previous works have failed to provide guarantees on the worst-case residual error of a PINN across the spatio-temporal domain - a measure akin to the tolerance of numerical solvers - focusing instead on point-wise comparisons between their solution and the ones obtained by a solver on a set of inputs. In real-world applications, one cannot consider tests on a finite set of points to be sufficient grounds for deployment, as the performance could be substantially worse on a different set. To alleviate this issue, we establish guaranteed error-based conditions for PINNs over their continuous applicability domain. To verify the extent to which they hold, we introduce $\partial$-CROWN: a general, efficient and scalable post-training framework to bound PINN residual errors. We demonstrate its effectiveness in obtaining tight certificates by applying it to two classically studied PINNs - Burgers' and Schrödinger's equations - and two more challenging ones with real-world applications - the Allen-Cahn and Diffusion-Sorption equations.
Submitted 29 May, 2024; v1 submitted 17 May, 2023;
originally announced May 2023.
-
A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot
Authors:
Aanisha Bhattacharya,
Yaman K Singla,
Balaji Krishnamurthy,
Rajiv Ratn Shah,
Changyou Chen
Abstract:
Multimedia content, such as advertisements and story videos, exhibits a rich blend of creativity and multiple modalities. It incorporates elements like text, visuals, audio, and storytelling techniques, employing devices like emotions, symbolism, and slogans to convey meaning. There is a dearth of large annotated training datasets in the multimedia domain, hindering the development of supervised learning models with satisfactory performance for real-world applications. On the other hand, the rise of large language models (LLMs) has witnessed remarkable zero-shot performance in various natural language processing (NLP) tasks, such as emotion classification, question-answering, and topic classification. To leverage such advanced techniques to bridge this performance gap in multimedia understanding, we propose verbalizing long videos to generate their descriptions in natural language, followed by performing video-understanding tasks on the generated story as opposed to the original video. Through extensive experiments on fifteen video-understanding tasks, we demonstrate that our method, despite being zero-shot, achieves significantly better results than supervised baselines for video understanding. Furthermore, to alleviate a lack of story understanding benchmarks, we publicly release the first dataset on a crucial task in computational social science on persuasion strategy identification.
Submitted 26 October, 2023; v1 submitted 16 May, 2023;
originally announced May 2023.
-
HyHTM: Hyperbolic Geometry based Hierarchical Topic Models
Authors:
Simra Shahid,
Tanay Anand,
Nikitha Srikanth,
Sumit Bhatia,
Balaji Krishnamurthy,
Nikaash Puri
Abstract:
Hierarchical Topic Models (HTMs) are useful for discovering topic hierarchies in a collection of documents. However, traditional HTMs often produce hierarchies where lower-level topics are unrelated and not specific enough to their higher-level topics. Additionally, these methods can be computationally expensive. We present HyHTM - a Hyperbolic geometry based Hierarchical Topic Models - that addresses these limitations by incorporating hierarchical information from hyperbolic geometry to explicitly model hierarchies in topic models. Experimental results with four baselines show that HyHTM can better attend to parent-child relationships among topics. HyHTM produces coherent topic hierarchies that specialise in granularity from generic higher-level topics to specific lower-level topics. Further, our model is significantly faster and leaves a much smaller memory footprint than our best-performing baseline. We have made the source code for our algorithm publicly accessible.
Submitted 16 May, 2023;
originally announced May 2023.
-
Geometric Modeling and Physics Simulation Framework for Building a Digital Twin of Extrusion-based Additive Manufacturing
Authors:
Dhruv Gamdha,
Kumar Saurabh,
Baskar Ganapathysubramanian,
Adarsh Krishnamurthy
Abstract:
Accurate simulation of the printing process is essential for improving print quality, reducing waste, and optimizing the printing parameters of extrusion-based additive manufacturing. Traditional additive manufacturing simulations are very compute-intensive and are not scalable to simulate even moderately-sized geometries. In this paper, we propose a general framework for creating a digital twin of the dynamic printing process by performing physics simulations with the intermediate print geometries. Our framework takes a general extrusion-based additive manufacturing G-code, generates an analysis-suitable voxelized geometry representation from the print schedule, and performs physics-based (transient thermal and phase change) simulations of the printing process. Our approach leverages parallel adaptive octree meshes for both voxelated geometry representation as well as for fast simulations to address real-time predictions. We demonstrate the effectiveness of our method by simulating the printing of complex geometries at high voxel resolutions with both sparse and dense infills. Our results show that this approach scales to high voxel resolutions and can predict the transient heat distribution as the print progresses. This work lays the computational and algorithmic foundations for building real-time digital twins and performing rapid virtual print sequence exploration to improve print quality and further reduce material waste.
Submitted 9 May, 2023;
originally announced May 2023.
-
REMaQE: Reverse Engineering Math Equations from Executables
Authors:
Meet Udeshi,
Prashanth Krishnamurthy,
Hammond Pearce,
Ramesh Karri,
Farshad Khorrami
Abstract:
Cybersecurity attacks on embedded devices for industrial control systems and cyber-physical systems may cause catastrophic physical damage as well as economic loss. This could be achieved by infecting device binaries with malware that modifies the physical characteristics of the system operation. Mitigating such attacks benefits from reverse engineering tools that recover sufficient semantic knowledge in terms of mathematical equations of the implemented algorithm. Conventional reverse engineering tools can decompile binaries to low-level code, but offer little semantic insight. This paper proposes the REMaQE automated framework for reverse engineering of math equations from binary executables. Improving over state-of-the-art, REMaQE handles equation parameters accessed via registers, the stack, global memory, or pointers, and can reverse engineer object-oriented implementations such as C++ classes. Using REMaQE, we discovered a bug in the Linux kernel thermal monitoring tool "tmon". To evaluate REMaQE, we generate a dataset of 25,096 binaries with math equations implemented in C and Simulink. REMaQE successfully recovers a semantically matching equation for all 25,096 binaries. REMaQE executes in 0.48 seconds on average and in up to 2 seconds for complex equations. Real-time execution enables integration in an interactive math-oriented reverse engineering workflow.
Submitted 11 April, 2024; v1 submitted 11 May, 2023;
originally announced May 2023.
-
INGENIOUS: Using Informative Data Subsets for Efficient Pre-Training of Language Models
Authors:
H S V N S Kowndinya Renduchintala,
Krishnateja Killamsetty,
Sumit Bhatia,
Milan Aggarwal,
Ganesh Ramakrishnan,
Rishabh Iyer,
Balaji Krishnamurthy
Abstract:
A salient characteristic of pre-trained language models (PTLMs) is a remarkable improvement in their generalization capability and emergence of new capabilities with increasing model capacity and pre-training dataset size. Consequently, we are witnessing the development of enormous models pushing the state-of-the-art. It is, however, imperative to realize that this inevitably leads to prohibitively long training times, extortionate computing costs, and a detrimental environmental impact. Significant efforts are underway to make PTLM training more efficient through innovations in model architectures, training pipelines, and loss function design, with scant attention being paid to optimizing the utility of training data. The key question that we ask is whether it is possible to train PTLMs by employing only highly informative subsets of the training data while maintaining downstream performance? Building upon the recent progress in informative data subset selection, we show how we can employ submodular optimization to select highly representative subsets of the training corpora and demonstrate that the proposed framework can be applied to efficiently train multiple PTLMs (BERT, BioBERT, GPT-2) using only a fraction of data. Further, we perform a rigorous empirical evaluation to show that the resulting models achieve up to $\sim99\%$ of the performance of the fully-trained models. We made our framework publicly available at https://github.com/Efficient-AI/ingenious.
Submitted 19 October, 2023; v1 submitted 11 May, 2023;
originally announced May 2023.
-
State Constrained Stochastic Optimal Control for Continuous and Hybrid Dynamical Systems Using DFBSDE
Authors:
Bolun Dai,
Prashanth Krishnamurthy,
Andrew Papanicolaou,
Farshad Khorrami
Abstract:
We develop a computationally efficient learning-based forward-backward stochastic differential equations (FBSDE) controller for both continuous and hybrid dynamical (HD) systems subject to stochastic noise and state constraints. Solutions to stochastic optimal control (SOC) problems satisfy the Hamilton-Jacobi-Bellman (HJB) equation. Using current FBSDE-based solutions, the optimal control can be obtained from the HJB equations using deep neural networks (e.g., long short-term memory (LSTM) networks). To ensure the learned controller respects the constraint boundaries, we enforce the state constraints using a soft penalty function. In addition to previous works, we adapt the deep FBSDE (DFBSDE) control framework to handle HD systems consisting of continuous dynamics and a deterministic discrete state change. We demonstrate our proposed algorithm in simulation on a continuous nonlinear system (cart-pole) and a hybrid nonlinear system (five-link biped).
Submitted 10 May, 2023;
originally announced May 2023.
-
Multiple-stopping time Sequential Detection for Energy Efficient Mining in Blockchain-Enabled IoT
Authors:
Anurag Gupta,
Vikram Krishnamurthy
Abstract:
What are the optimal times for an Internet of Things (IoT) device to act as a blockchain miner? The aim is to minimize the energy consumed by low-power IoT devices that log their data into a secure (tamper-proof) distributed ledger. We formulate a multiple stopping time Bayesian sequential detection problem to address energy-efficient blockchain mining for IoT devices. The objective is to identify $L$ optimal stops for mining, thereby maximizing the probability of successfully adding a block to the blockchain; we also present a model to optimize the number of stops (mining instants). The formulation is equivalent to a multiple stopping time POMDP. Since POMDPs are in general computationally intractable to solve, we show mathematically using submodularity arguments that the optimal mining policy has a useful structure: 1) it is monotone in belief space, and 2) it exhibits a threshold structure, which divides the belief space into two connected sets. Exploiting the structural results, we formulate a computationally-efficient linear mining policy for the blockchain-enabled IoT device. We present a policy gradient technique to optimize the parameters of the linear mining policy. Finally, we use synthetic and real Bitcoin datasets to study the performance of our proposed mining policy. We demonstrate the energy efficiency achieved by the optimal linear mining policy in contrast to other heuristic strategies.
Submitted 17 August, 2023; v1 submitted 9 May, 2023;
originally announced May 2023.
-
Explaining RL Decisions with Trajectories
Authors:
Shripad Vilasrao Deshmukh,
Arpan Dasgupta,
Balaji Krishnamurthy,
Nan Jiang,
Chirag Agarwal,
Georgios Theocharous,
Jayakumar Subramanian
Abstract:
Explanation is a key component for the adoption of reinforcement learning (RL) in many real-world decision-making problems. In the literature, the explanation is often provided by saliency attribution to the features of the RL agent's state. In this work, we propose a complementary approach to these explanations, particularly for offline RL, where we attribute the policy decisions of a trained RL agent to the trajectories encountered by it during training. To do so, we encode trajectories in offline training data individually as well as collectively (encoding a set of trajectories). We then attribute policy decisions to a set of trajectories in this encoded space by estimating the sensitivity of the decision with respect to that set. Further, we demonstrate the effectiveness of the proposed approach in terms of quality of attributions as well as practical scalability in diverse environments that involve both discrete and continuous state and action spaces such as grid-worlds, video games (Atari) and continuous control (MuJoCo). We also conduct a human study on a simple navigation task to observe how their understanding of the task compares with data attributed for a trained RL policy. Keywords -- Explainable AI, Verifiability of AI Decisions, Explainable RL.
Submitted 22 January, 2024; v1 submitted 6 May, 2023;
originally announced May 2023.
-
The mass determination of TOI-519 b: a close-in giant planet transiting a metal-rich mid-M dwarf
Authors:
Taiki Kagetani,
Norio Narita,
Tadahiro Kimura,
Teruyuki Hirano,
Masahiro Ikoma,
Hiroyuki Tako Ishikawa,
Steven Giacalone,
Akihiko Fukui,
Takanori Kodama,
Rebecca Gore,
Ashley Schroeder,
Yasunori Hori,
Kiyoe Kawauchi,
Noriharu Watanabe,
Mayuko Mori,
Yujie Zou,
Kai Ikuta,
Vigneshwaran Krishnamurthy,
Jon Zink,
Kevin Hardegree-Ullman,
Hiroki Harakawa,
Tomoyuki Kudo,
Takayuki Kotani,
Takashi Kurokawa,
Nobuhiko Kusakabe
, et al. (11 additional authors not shown)
Abstract:
We report the mass determination of TOI-519 b, a transiting substellar object around a mid-M dwarf. We carried out radial velocity measurements using Subaru / InfraRed Doppler (IRD), revealing that TOI-519 b is a planet with a mass of $0.463^{+0.082}_{-0.088}~M_{\rm Jup}$. We also find that the host star is metal rich ($\rm [Fe/H] = 0.27 \pm 0.09$ dex) and has the lowest effective temperature ($T_{\rm eff}=3322 \pm 49$ K) among all stars hosting known close-in giant planets based on the IRD spectra and mid-resolution infrared spectra obtained with NASA Infrared Telescope Facility / SpeX. The core mass of TOI-519 b inferred from a thermal evolution model ranges from $0$ to $\sim30~M_\oplus$, which can be explained by both the core accretion and disk instability models as the formation origins of this planet. However, TOI-519 is in line with the emerging trend that M dwarfs with close-in giant planets tend to have high metallicity, which may indicate that they formed in the core accretion model. The system is also consistent with the potential trend that close-in giant planets around M dwarfs tend to be less massive than those around FGK dwarfs.
Submitted 1 May, 2023; v1 submitted 28 April, 2023;
originally announced April 2023.
-
Statistical Detection of Coordination in a Cognitive Radar Network through Inverse Multi-objective Optimization
Authors:
Luke Snow,
Vikram Krishnamurthy
Abstract:
Consider a target being tracked by a cognitive radar network. If the target can intercept noisy radar emissions, how can it detect coordination in the radar network? By 'coordination' we mean that the radar emissions satisfy Pareto optimality with respect to multi-objective optimization over the objective functions of each radar and a constraint on total network power output. This paper provides a novel inverse multi-objective optimization approach for statistically detecting Pareto optimal ('coordinating') behavior, from a finite dataset of noisy radar emissions. Specifically, we develop necessary and sufficient conditions for radar network emissions to be consistent with multi-objective optimization (coordination), and we provide a statistical detector with theoretical guarantees for determining this consistency when radar emissions are observed in noise. We also provide numerical simulations which validate our approach. Note that while we make use of the specific framework of a radar network coordination problem, our results apply more generally to the field of inverse multi-objective optimization.
Submitted 18 April, 2023;
originally announced April 2023.
-
Finite-Sample Bounds for Adaptive Inverse Reinforcement Learning using Passive Langevin Dynamics
Authors:
Luke Snow,
Vikram Krishnamurthy
Abstract:
This paper provides a finite-sample analysis of a passive stochastic gradient Langevin dynamics algorithm (PSGLD) designed to achieve adaptive inverse reinforcement learning (IRL). By passive, we mean that the noisy gradients available to the PSGLD algorithm (inverse learning process) are evaluated at randomly chosen points by an external stochastic gradient algorithm (forward learner) that aims to optimize a cost function. The PSGLD algorithm acts as a randomized sampler to achieve adaptive IRL by reconstructing this cost function nonparametrically from the stationary measure of a Langevin diffusion. Previous work has analyzed the asymptotic performance of this passive algorithm using weak convergence techniques. This paper analyzes the non-asymptotic (finite-sample) performance using a logarithmic-Sobolev inequality and the Otto-Villani Theorem. We obtain finite-sample bounds on the 2-Wasserstein distance between the estimates generated by the PSGLD algorithm and the cost function. Apart from achieving finite-sample guarantees for adaptive IRL, this work extends a line of research in analysis of passive stochastic gradient algorithms to the finite-sample regime for Langevin dynamics.
Submitted 27 September, 2023; v1 submitted 18 April, 2023;
originally announced April 2023.
-
Safe Navigation and Obstacle Avoidance Using Differentiable Optimization Based Control Barrier Functions
Authors:
Bolun Dai,
Rooholla Khorrambakht,
Prashanth Krishnamurthy,
Vinícius Gonçalves,
Anthony Tzes,
Farshad Khorrami
Abstract:
Control barrier functions (CBFs) have been widely applied to safety-critical robotic applications. However, the construction of control barrier functions for robotic systems remains a challenging task. Recently, collision detection using differentiable optimization has provided a way to compute the minimum uniform scaling factor that results in an intersection between two convex shapes and to also compute the Jacobian of the scaling factor. In this letter, we propose a framework that uses this scaling factor, with an offset, to systematically define a CBF for obstacle avoidance tasks. We provide theoretical analyses of the continuity and continuous differentiability of the proposed CBF. We empirically evaluate the proposed CBF's behavior and show that the resulting optimal control problem is computationally efficient, which makes it applicable for real-time robotic control. We validate our approach, first using a 2D mobile robot example, then on the Franka-Emika Research 3 (FR3) robot manipulator both in simulation and experiment.
Submitted 21 November, 2023; v1 submitted 17 April, 2023;
originally announced April 2023.
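The optimal control problem mentioned in the abstract above is, in the CBF literature, commonly a quadratic program that minimally modifies a nominal control subject to a barrier constraint. The sketch below illustrates that idea only. It assumes a single-integrator robot (x_dot = u) and a simple hand-written barrier h(x) = ||x - c||^2 - r^2 around a disk obstacle, not the scaling-factor CBF proposed in the letter; the function name `cbf_safety_filter` and the gain `gamma` are hypothetical. With a single linear constraint the QP has a closed-form solution, so no solver is needed.

```python
def cbf_safety_filter(x, u_nom, center, radius, gamma=1.0):
    """Project a nominal control onto the half-space defined by a CBF constraint.

    Assumptions (illustrative, not taken from the paper):
      - dynamics x_dot = u (single integrator),
      - barrier h(x) = ||x - center||^2 - radius^2, safe when h >= 0,
      - constraint grad_h(x) . u + gamma * h(x) >= 0.
    """
    dx = [xi - ci for xi, ci in zip(x, center)]
    h = sum(d * d for d in dx) - radius ** 2
    a = [2.0 * d for d in dx]          # gradient of h, so h_dot = a . u
    b = -gamma * h                     # constraint reads a . u >= b
    a_dot_u = sum(ai * ui for ai, ui in zip(a, u_nom))
    if a_dot_u >= b:
        return list(u_nom)             # nominal control already satisfies the constraint
    a_norm_sq = sum(ai * ai for ai in a)
    if a_norm_sq == 0.0:
        raise ValueError("barrier gradient vanishes; no correcting direction exists")
    # Closest point to u_nom on the boundary a . u = b (minimum-norm correction).
    scale = (b - a_dot_u) / a_norm_sq
    return [ui + scale * ai for ui, ai in zip(u_nom, a)]


# Robot at (2, 0) driving straight at a unit disk centered at the origin.
u_safe = cbf_safety_filter([2.0, 0.0], [-5.0, 0.0], [0.0, 0.0], 1.0)
# A nominal control that moves away from the obstacle is left unchanged.
u_free = cbf_safety_filter([2.0, 0.0], [1.0, 0.0], [0.0, 0.0], 1.0)
```

In the first call the filter slows the approach so that the constraint holds with equality; in the second it returns the nominal control untouched.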
-
Higher-order Bragg gaps in the electronic band structure of bilayer graphene renormalized by recursive supermoiré potential
Authors:
Mohit Kumar Jat,
Priya Tiwari,
Robin Bajaj,
Ishita Shitut,
Shinjan Mandal,
Kenji Watanabe,
Takashi Taniguchi,
H. R. Krishnamurthy,
Manish Jain,
Aveek Bid
Abstract:
This letter presents our findings on the recursive band gap engineering of chiral fermions in bilayer graphene doubly aligned with hBN. By utilizing two interfering moiré potentials, we generate a supermoiré pattern which renormalizes the electronic bands of the pristine bilayer graphene, resulting in higher-order fractal gaps even at very low energies. These Bragg gaps can be mapped using a unique linear combination of periodic areas within the system. To validate our findings, we used electronic transport measurements to identify the position of these gaps as functions of the carrier density and establish their agreement with the predicted carrier densities and corresponding quantum numbers obtained using the continuum model. Our work provides direct experimental evidence of the quantization of the area of quasi-Brillouin zones in supermoiré systems. It fills essential gaps in understanding the band structure engineering of Dirac fermions by a recursive doubly periodic superlattice potential.
Submitted 4 April, 2023;
originally announced April 2023.
-
Who You Play Affects How You Play: Predicting Sports Performance Using Graph Attention Networks With Temporal Convolution
Authors:
Rui Luo,
Vikram Krishnamurthy
Abstract:
This study presents a novel deep learning method, called GATv2-GCN, for predicting player performance in sports. To construct a dynamic player interaction graph, we leverage player statistics and their interactions during gameplay. We use a graph attention network to capture the attention that each player pays to each other, allowing for more accurate modeling of the dynamic player interactions. To handle the multivariate player statistics time series, we incorporate a temporal convolution layer, which provides the model with temporal predictive power. We evaluate the performance of our model using real-world sports data, demonstrating its effectiveness in predicting player performance. Furthermore, we explore the potential use of our model in a sports betting context, providing insights into profitable strategies that leverage our predictive power. The proposed method has the potential to advance the state-of-the-art in player performance prediction and to provide valuable insights for sports analytics and betting industries.
Submitted 29 March, 2023;
originally announced March 2023.
-
Parameter Efficient Local Implicit Image Function Network for Face Segmentation
Authors:
Mausoom Sarkar,
Nikitha SR,
Mayur Hemani,
Rishabh Jain,
Balaji Krishnamurthy
Abstract:
Face parsing is defined as the per-pixel labeling of images containing human faces. The labels are defined to identify key facial regions like eyes, lips, nose, hair, etc. In this work, we make use of the structural consistency of the human face to propose a lightweight face-parsing method using a Local Implicit Function network, FP-LIIF. We propose a simple architecture with a convolutional encoder and a pixel MLP decoder that uses 1/26th the number of parameters of state-of-the-art models and yet matches or outperforms them on multiple datasets, such as CelebAMask-HQ and LaPa. We do not use any pretraining, and, unlike other works, our network can also generate segmentation at different resolutions without any changes in the input resolution. This work enables the use of facial segmentation on low-compute or low-bandwidth devices because of its higher FPS and smaller model size.
Submitted 27 March, 2023;
originally announced March 2023.
-
Efficient Symbolic Reasoning for Neural-Network Verification
Authors:
Zi Wang,
Somesh Jha,
Krishnamurthy Dvijotham
Abstract:
Neural networks have become an integral part of modern software systems. However, they still suffer from various problems, in particular vulnerability to adversarial attacks. In this work, we present a novel program reasoning framework for neural-network verification, which we refer to as symbolic reasoning. The key components of our framework are the use of the symbolic domain and the quadratic relation. The symbolic domain has very flexible semantics, and the quadratic relation is quite expressive. Together they allow us to encode many verification problems for neural networks as quadratic programs. Our scheme then relaxes the quadratic programs to semidefinite programs, which can be efficiently solved. This framework allows us to verify various neural-network properties under different scenarios, especially those that appear challenging for non-symbolic domains. Moreover, it introduces new representations and perspectives for the verification tasks. We believe that our framework can bring new theoretical insights and practical tools to verification problems for neural networks.
Submitted 23 March, 2023;
originally announced March 2023.
-
Human Uncertainty in Concept-Based AI Systems
Authors:
Katherine M. Collins,
Matthew Barker,
Mateo Espinosa Zarlenga,
Naveen Raman,
Umang Bhatt,
Mateja Jamnik,
Ilia Sucholutsky,
Adrian Weller,
Krishnamurthy Dvijotham
Abstract:
Placing a human in the loop may abate the risks of deploying AI systems in safety-critical settings (e.g., a clinician working with a medical AI system). However, mitigating risks arising from human error and uncertainty within such human-AI interactions is an important and understudied issue. In this work, we study human uncertainty in the context of concept-based models, a family of AI systems that enable human feedback via concept interventions where an expert intervenes on human-interpretable concepts relevant to the task. Prior work in this space often assumes that humans are oracles who are always certain and correct. Yet, real-world decision-making by humans is prone to occasional mistakes and uncertainty. We study how existing concept-based models deal with uncertain interventions from humans using two novel datasets: UMNIST, a visual dataset with controlled simulated uncertainty based on the MNIST dataset, and CUB-S, a relabeling of the popular CUB concept dataset with rich, densely-annotated soft labels from humans. We show that training with uncertain concept labels may help mitigate weaknesses of concept-based systems when handling uncertain interventions. These results allow us to identify several open challenges, which we argue can be tackled through future multidisciplinary research on building interactive uncertainty-aware systems. To facilitate further research, we release a new elicitation platform, UElic, to collect uncertain feedback from humans in collaborative prediction tasks.
Submitted 22 March, 2023;
originally announced March 2023.
-
Deep Learning Pipeline for Preprocessing and Segmenting Cardiac Magnetic Resonance of Single Ventricle Patients from an Image Registry
Authors:
Tina Yao,
Nicole St. Clair,
Gabriel F. Miller,
Adam L. Dorfman,
Mark A. Fogel,
Sunil Ghelani,
Rajesh Krishnamurthy,
Christopher Z. Lam,
Joshua D. Robinson,
David Schidlow,
Timothy C. Slesnick,
Justin Weigand,
Michael Quail,
Rahul Rathod,
Jennifer A. Steeden,
Vivek Muthurangu
Abstract:
Purpose: To develop and evaluate an end-to-end deep learning pipeline for segmentation and analysis of cardiac magnetic resonance images to provide core-lab processing for a multi-centre registry of Fontan patients.
Materials and Methods: This retrospective study used training (n = 175), validation (n = 25) and testing (n = 50) cardiac magnetic resonance image exams collected from 13 institutions in the UK, US and Canada. The data was used to train and evaluate a pipeline containing three deep-learning models. The pipeline's performance was assessed on the Dice and IoU score between the automated and reference standard manual segmentation. Cardiac function values were calculated from both the automated and manual segmentation and evaluated using Bland-Altman analysis and paired t-tests. The overall pipeline was further evaluated qualitatively on 475 unseen patient exams.
Results: For the 50 test exams, the pipeline achieved a median Dice score of 0.91 (0.89-0.94) for end-diastolic volume, 0.86 (0.82-0.89) for end-systolic volume, and 0.74 (0.70-0.77) for myocardial mass. The deep learning-derived end-diastolic volume, end-systolic volume, myocardial mass, stroke volume and ejection fraction showed no statistically significant difference from the same values derived from manual segmentation, with p values all greater than 0.05. For the 475 unseen patient exams, the pipeline achieved 68% adequate segmentation in both systole and diastole, 26% needed minor adjustments in either systole or diastole, 5% needed major adjustments, and the cropping model failed in only 0.4%.
Conclusion: A deep learning pipeline can provide standardised 'core-lab' segmentation for Fontan patients. This pipeline can now be applied to the >4500 cardiac magnetic resonance exams currently in the FORCE registry as well as to any new patients who are recruited.
Submitted 21 March, 2023;
originally announced March 2023.
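The Dice and IoU overlap scores used to assess the pipeline in the abstract above follow standard definitions: Dice = 2|A∩B| / (|A| + |B|) and IoU = |A∩B| / |A∪B|. A minimal sketch on flattened binary masks is given below; the function name `dice_and_iou` and the toy masks are illustrative and are not taken from the study.

```python
def dice_and_iou(pred, ref):
    """Return (Dice, IoU) for two equal-length binary masks given as flat 0/1 sequences."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    pred_area = sum(1 for p in pred if p)
    ref_area = sum(1 for r in ref if r)
    union = pred_area + ref_area - intersection
    if union == 0:
        # Both masks are empty; treating this as perfect agreement is a convention assumed here.
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_area + ref_area)
    iou = intersection / union
    return dice, iou


# Toy example: automated vs. manual segmentation of a six-pixel strip.
automated = [0, 1, 1, 1, 0, 0]
manual = [0, 0, 1, 1, 1, 0]
dice, iou = dice_and_iou(automated, manual)
```

The two scores are monotonically related through Dice = 2·IoU / (1 + IoU), so they rank segmentations identically but are not numerically interchangeable.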
-
Fréchet Statistics Based Change Point Detection in Dynamic Social Networks
Authors:
Rui Luo,
Vikram Krishnamurthy
Abstract:
This paper proposes a method to detect change points in dynamic social networks using Fréchet statistics. We address two main questions: (1) what metric can quantify the distances between graph Laplacians in a dynamic network and enable efficient computation, and (2) how can the Fréchet statistics be extended to detect multiple change points while maintaining the significance level of the hypothesis test? Our solution defines a metric space for graph Laplacians using the Log-Euclidean metric, enabling a closed-form formula for Fréchet mean and variance. We present a framework for change point detection using Fréchet statistics and extend it to multiple change points with binary segmentation. The proposed algorithm uses incremental computation for Fréchet mean and variance to improve efficiency and is validated on simulated and two real-world datasets, namely the UCI message dataset and the Enron email dataset.
Submitted 19 March, 2023;
originally announced March 2023.
-
Mutual Information Measure for Glass Ceiling Effect in Preferential Attachment Models
Authors:
Rui Luo,
Buddhika Nettasinghe,
Vikram Krishnamurthy
Abstract:
We propose a new way to measure inequalities such as the glass ceiling effect in attributed networks. Existing measures typically rely solely on node degree distribution or degree assortativity, but our approach goes beyond these measures by using mutual information (based on Shannon and more generally, Renyi entropy) between the conditional probability distributions of node attributes given node degrees of adjacent nodes. We show that this mutual information measure aligns with both the analytical structural inequality model and historical publication data, making it a reliable approach to capture the complexities of attributed networks. Specifically, we demonstrate this through an analysis of citation networks. Moreover, we propose a stochastic optimization algorithm using a parameterized conditional logit model for edge addition, which outperforms a baseline uniform distribution. By recommending links at random using this algorithm, we can mitigate the glass ceiling effect, which is a crucial tool in addressing structural inequalities in networks.
Submitted 17 March, 2023;
originally announced March 2023.
-
Neural Lyapunov Control for Nonlinear Systems with Unstructured Uncertainties
Authors:
Shiqing Wei,
Prashanth Krishnamurthy,
Farshad Khorrami
Abstract:
Stabilizing controller design and region of attraction (RoA) estimation are essential in nonlinear control. Moreover, it is challenging to implement a control Lyapunov function (CLF) in practice when only partial knowledge of the system is available. We propose a learning framework that can synthesize state-feedback controllers and a CLF for control-affine nonlinear systems with unstructured uncertainties. Based on a regularity condition on these uncertainties, we model them as bounded disturbances and prove that a CLF for the nominal system (estimate of the true system) is an input-to-state stable control Lyapunov function (ISS-CLF) for the true system when the CLF's gradient is bounded. We integrate the robust Lyapunov analysis with the learning of both the control law and CLF. We demonstrate the effectiveness of our learning framework on several examples, such as an inverted pendulum system, a strict-feedback system, and a cart-pole system.
Submitted 16 March, 2023;
originally announced March 2023.
-
Data-Driven Deep Learning Based Feedback Linearization of Systems with Unknown Dynamics
Authors:
Raktim Gautam Goswami,
Prashanth Krishnamurthy,
Farshad Khorrami
Abstract:
A methodology is developed to learn a feedback linearization (i.e., nonlinear change of coordinates and input transformation) using a data-driven approach for a single input control-affine nonlinear system with unknown dynamics. We employ deep neural networks to learn the feedback law (input transformation) in conjunction with an extension of invertible neural networks to learn the nonlinear change of coordinates (state transformation). We also learn the matrices A and B of the transformed linear system and define loss terms to ensure controllability of the pair (A, B). The efficacy of our approach is demonstrated by simulations on several nonlinear systems. Furthermore, we show that state feedback controllers designed using the feedback linearized system yield expected closed-loop behavior when applied to the original nonlinear system, further demonstrating validity of the learned feedback linearization.
Submitted 21 May, 2023; v1 submitted 15 March, 2023;
originally announced March 2023.
-
Data-Efficient Control Barrier Function Refinement
Authors:
Bolun Dai,
Heming Huang,
Prashanth Krishnamurthy,
Farshad Khorrami
Abstract:
Control barrier functions (CBFs) have been widely used for synthesizing controllers in safety-critical applications. When used as a safety filter, a CBF provides a simple and computationally efficient way to obtain safe controls from a possibly unsafe performance controller. Despite its conceptual simplicity, constructing a valid CBF is well known to be challenging, especially for high-relative-degree systems under nonconvex constraints. Recently, work has been done to learn a valid CBF from data based on a handcrafted CBF (HCBF). Even though the HCBF gives a good initialization point, it still requires a large amount of data to train the CBF network. In this work, we propose a new method to learn more efficiently from the collected data through a novel prioritized data sampling strategy. A priority score is computed from the loss value of each data point. Then, a probability distribution based on the priority score of the data points is used to sample data and update the learned CBF. Using our proposed approach, we can learn a valid CBF that recovers a larger portion of the true safe set using a smaller amount of data. The effectiveness of our method is demonstrated in simulation on a unicycle and a two-link arm.
Submitted 10 March, 2023;
originally announced March 2023.
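The prioritized sampling step described in the abstract above (a priority score from each data point's loss, then sampling from a distribution built on those scores) can be sketched as follows. The abstract does not give the exact formula, so the power-law form `(|loss| + eps) ** alpha` and the parameters `alpha` and `eps` are assumptions borrowed from common prioritized-replay practice, and the function names are hypothetical.

```python
import random


def priority_probabilities(losses, alpha=1.0, eps=1e-6):
    """Turn per-sample losses into a sampling distribution.

    Assumed form: priority_i = (|loss_i| + eps) ** alpha, normalized to sum to 1.
    eps keeps zero-loss points sampleable; alpha = 0 recovers uniform sampling.
    """
    priorities = [(abs(loss) + eps) ** alpha for loss in losses]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_batch(data, losses, batch_size, rng):
    """Draw a batch (with replacement) with probability proportional to priority."""
    probs = priority_probabilities(losses)
    return rng.choices(data, weights=probs, k=batch_size)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
data = ["x0", "x1", "x2", "x3"]
losses = [0.0, 0.1, 0.1, 0.8]
probs = priority_probabilities(losses)
batch = sample_batch(data, losses, batch_size=8, rng=rng)
```

In a training loop, the losses would be recomputed after each update of the learned CBF so that the distribution keeps tracking the points the network currently fits worst.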
-
Streaming Active Learning with Deep Neural Networks
Authors:
Akanksha Saran,
Safoora Yousefi,
Akshay Krishnamurthy,
John Langford,
Jordan T. Ash
Abstract:
Active learning is perhaps most naturally posed as an online learning problem. However, prior active learning approaches with deep neural networks assume offline access to the entire dataset ahead of time. This paper proposes VeSSAL, a new algorithm for batch active learning with deep neural networks in streaming settings, which samples groups of points to query for labels at the moment they are encountered. Our approach trades off between uncertainty and diversity of queried samples to match a desired query rate without requiring any hand-tuned hyperparameters. Altogether, we expand the applicability of deep neural networks to realistic active learning scenarios, such as applications relevant to HCI and large, fractured datasets.
Submitted 6 June, 2023; v1 submitted 4 March, 2023;
originally announced March 2023.
-
Learning Hidden Markov Models Using Conditional Samples
Authors:
Sham M. Kakade,
Akshay Krishnamurthy,
Gaurav Mahajan,
Cyril Zhang
Abstract:
This paper is concerned with the computational complexity of learning the Hidden Markov Model (HMM). Although HMMs are some of the most widely used tools in sequential and time series modeling, they are cryptographically hard to learn in the standard setting where one has access to i.i.d. samples of observation sequences. In this paper, we depart from this setup and consider an interactive access model, in which the algorithm can query for samples from the conditional distributions of the HMMs. We show that interactive access to the HMM enables computationally efficient learning algorithms, thereby bypassing cryptographic hardness. Specifically, we obtain efficient algorithms for learning HMMs in two settings:
(a) An easier setting where we have query access to the exact conditional probabilities. Here our algorithm runs in polynomial time and makes polynomially many queries to approximate any HMM in total variation distance.
(b) A harder setting where we can only obtain samples from the conditional distributions. Here the performance of the algorithm depends on a new parameter, called the fidelity of the HMM. We show that this captures cryptographically hard instances and previously known positive results.
We also show that these results extend to a broader class of distributions with latent low rank structure. Our algorithms can be viewed as generalizations and robustifications of Angluin's $L^*$ algorithm for learning deterministic finite automata from membership queries.
Submitted 24 February, 2024; v1 submitted 28 February, 2023;
originally announced February 2023.
-
Improving Expert Specialization in Mixture of Experts
Authors:
Yamuna Krishnamurthy,
Chris Watkins,
Thomas Gaertner
Abstract:
Mixture of experts (MoE), introduced over 20 years ago, is the simplest gated modular neural network architecture. There is renewed interest in MoE because the conditional computation allows only parts of the network to be used during each inference, as was recently demonstrated in large scale natural language processing models. MoE is also of potential interest for continual learning, as experts may be reused for new tasks, and new experts introduced. The gate in the MoE architecture learns task decompositions and individual experts learn simpler functions appropriate to the gate's decomposition. In this paper: (1) we show that the original MoE architecture and its training method do not guarantee intuitive task decompositions and good expert utilization, indeed they can fail spectacularly even for simple data such as MNIST and FashionMNIST; (2) we introduce a novel gating architecture, similar to attention, that improves performance and results in a lower entropy task decomposition; and (3) we introduce a novel data-driven regularization that improves expert specialization. We empirically validate our methods on MNIST, FashionMNIST and CIFAR-100 datasets.
Submitted 28 February, 2023;
originally announced February 2023.
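For readers unfamiliar with the architecture discussed in the abstract above, the basic (original) mixture-of-experts combination can be sketched as a softmax gate weighting scalar expert outputs. The entropy of the gate weights is shown as one simple proxy for how decisively the gate assigns an input to an expert. This is an illustrative sketch, not the attention-like gate or the regularizer proposed in the paper, and the paper's entropy measure may be defined differently; all function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of gate logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_output(gate_logits, expert_outputs):
    """Return the gate-weighted sum of scalar expert outputs and the gate weights."""
    weights = softmax(gate_logits)
    mixed = sum(w * y for w, y in zip(weights, expert_outputs))
    return mixed, weights


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# An undecided gate averages the experts; a peaked gate mostly selects one expert.
y_uniform, w_uniform = moe_output([0.0, 0.0], [1.0, 3.0])
y_peaked, w_peaked = moe_output([10.0, 0.0], [1.0, 3.0])
```

A gate with lower entropy routes each input to fewer experts, which is the regime in which conditional computation can skip the unused experts.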
-
Statistical Learning under Heterogeneous Distribution Shift
Authors:
Max Simchowitz,
Anurag Ajay,
Pulkit Agrawal,
Akshay Krishnamurthy
Abstract:
This paper studies the prediction of a target $\mathbf{z}$ from a pair of random variables $(\mathbf{x},\mathbf{y})$, where the ground-truth predictor is additive: $\mathbb{E}[\mathbf{z} \mid \mathbf{x},\mathbf{y}] = f_\star(\mathbf{x}) +g_{\star}(\mathbf{y})$. We study the performance of empirical risk minimization (ERM) over functions $f+g$, $f \in F$ and $g \in G$, fit on a given training distribution, but evaluated on a test distribution which exhibits covariate shift. We show that, when the class $F$ is "simpler" than $G$ (measured, e.g., in terms of its metric entropy), our predictor is more resilient to heterogeneous covariate shifts in which the shift in $\mathbf{x}$ is much greater than that in $\mathbf{y}$. Our analysis proceeds by demonstrating that ERM behaves qualitatively similarly to orthogonal machine learning: the rate at which ERM recovers the $f$-component of the predictor has only a lower-order dependence on the complexity of the class $G$, adjusted for partial non-identifiability introduced by the additive structure. These results rely on a novel Hölder-style inequality for the Dudley integral which may be of independent interest. Moreover, we corroborate our theoretical findings with experiments demonstrating improved resilience to shifts in "simpler" features across numerous domains.
Submitted 27 October, 2023; v1 submitted 27 February, 2023;
originally announced February 2023.