-
An Optimization Framework for Processing and Transfer Learning for the Brain Tumor Segmentation
Authors:
Tianyi Ren,
Ethan Honey,
Harshitha Rebala,
Abhishek Sharma,
Agamdeep Chopra,
Mehmet Kurt
Abstract:
Tumor segmentation from multi-modal brain MRI images is a challenging task due to limited samples, high variance in shapes, and the uneven distribution of tumor morphology. The performance of automated medical image segmentation has improved significantly with recent advances in deep learning. However, model predictions have not yet reached the level of accuracy and generalizability required for clinical use. To address the distinct problems presented in Challenges 1, 2, and 3 of BraTS 2023, we have constructed an optimization framework based on a 3D U-Net model for brain tumor segmentation. This framework incorporates a range of techniques, including various pre-processing and post-processing techniques and transfer learning. On the validation datasets, this multi-modality brain tumor segmentation framework achieves average lesion-wise Dice scores of 0.79, 0.72, and 0.74 on Challenges 1, 2, and 3, respectively.
Submitted 10 February, 2024;
originally announced February 2024.
-
BaMn$_2$P$_2$: Highest magnetic ordering temperature 122-pnictide compound
Authors:
B. S. Jacobs,
Abhishek Pandey
Abstract:
We report the growth of high-quality single crystals of ThCr$_2$Si$_2$-type tetragonal BaMn$_2$P$_2$ and investigation of its structural, electrical transport, thermal and magnetic properties. Our results of basal plane electrical resistivity and heat capacity measurements show that the compound has an insulating ground state with a small band gap. Anisotropic susceptibility $χ_{ab,c}(T)$ data indicate a collinear local-moment Néel-type antiferromagnetic (AFM) ground state below the ordering temperature $T_{\rm N} = 795(15)$~K, which is the highest among all the ThCr$_2$Si$_2$- and CaAl$_2$Si$_2$-type 122-pnictide compounds reported so far, suggesting that the magnetic exchange interactions are strongest in this material. The magnetic transition temperatures of BaMn$_2$$Pn_{2}$ ($Pn$ = P, As, Sb, Bi) compounds exhibit a monotonic decrease with the increase of tetragonal unit cell parameters $a$ and $c$, suggesting a strong dependence of the strength of the decisive magnetic exchange interactions on the separation between the localized spins residing on the Mn-ions. The observed monotonic increase of both $χ_{ab}$ and $χ_{c}$ for $T > T_{\rm N}$ suggests that short-range dynamic quasi-two dimensional AFM correlations persist above $T_{\rm N}$ up to the highest temperature of the measurements. The large $T_{\rm N}$ of BaMn$_2$P$_2$ calls for systematic hole-doping studies on this material, as similar investigations on related BaMn$_2$As$_{2}$ with $T_{\rm N} = 618$~K have led to the discovery of an outstanding ground state where AFM of localized Mn-spins and itinerant half-metallic ferromagnetism with $T_{\rm c} \approx 100$~K originating from the doped holes coexist.
Submitted 9 February, 2024;
originally announced February 2024.
-
A plastic correction algorithm for full-field elasto-plastic finite element simulations : critical assessment of predictive capabilities and improvement by machine learning
Authors:
Abhishek Palchoudhary,
Simone Peter,
Vincent Maurel,
Cristian Ovalle,
Pierre Kerfriden
Abstract:
This paper introduces a new local plastic correction algorithm developed to accelerate finite element simulations for structures with elasto-plastic constitutive laws. The proposed method belongs to the category of generalized multiaxial Neuber-type methods enabled by pointwise proportional evolution rules. The algorithm numerically integrates J2 plasticity laws as a function of the finite element elastic response of the structure, to obtain full-field 3D elasto-plastic quantities for any proportionally applied loading. Examples of the numerical capabilities of this algorithm are shown on a structure containing a distribution of pores, for monotonic and fatigue loading. The approximation errors due to the proposed local plastic correction are also investigated. As a second point of innovation, we show that the proposed local plastic correction can be accelerated when dealing with large-scale structures by employing a simple meta-model, with virtually no added errors. Finally, we develop and investigate the merits of an additional deep-learning-based corrective layer to reduce approximation errors on a subset of structures for which full elasto-plastic FE simulations are performed, the solutions of which are subsequently used as a training set for a Convolutional Neural Network algorithm designed to learn the error between full FE and plastic correction approximations.
Submitted 9 February, 2024;
originally announced February 2024.
-
Trustful Coopetitive Infrastructures for the New Space Exploration Era
Authors:
Renan Lima Baima,
Loïck Chovet,
Eduard Hartwich,
Abhishek Bera,
Johannes Sedlmeir,
Gilbert Fridgen,
Miguel Angel Olivares-Mendez
Abstract:
In the new space economy, space agencies, large enterprises, and start-ups aim to launch space multi-robot systems (MRS) for various in-situ resource utilization (ISRU) purposes, such as mapping, soil evaluation, and utility provisioning. However, these stakeholders' competing economic interests may hinder effective collaboration on a centralized digital platform. To address this issue, neutral and transparent infrastructures could facilitate coordination and value exchange among heterogeneous space MRS. While related work has expressed legitimate concerns about the technical challenges associated with blockchain use in space, we argue that weighing its potential economic benefits against its drawbacks is necessary. This paper presents a novel architectural framework and a comprehensive set of requirements for integrating blockchain technology in MRS, aiming to enhance coordination and data integrity in space exploration missions. We explored distributed ledger technology (DLT) to design a non-proprietary architecture for heterogeneous MRS and validated the prototype in a simulated lunar environment. The analyses of our implementation suggest global ISRU efficiency improvements for map exploration, compared to a corresponding group of individually acting robots, and that fostering a coopetitive environment may provide additional revenue opportunities for stakeholders.
Submitted 8 February, 2024;
originally announced February 2024.
-
Efficient Stagewise Pretraining via Progressive Subnetworks
Authors:
Abhishek Panigrahi,
Nikunj Saunshi,
Kaifeng Lyu,
Sobhan Miryoosefi,
Sashank Reddi,
Satyen Kale,
Sanjiv Kumar
Abstract:
Recent developments in large language models have sparked interest in efficient pretraining methods. A recent effective paradigm is to perform stage-wise training, where the size of the model is gradually increased over the course of training (e.g. gradual stacking (Reddi et al., 2023)). While the resource and wall-time savings are appealing, this paradigm has limitations, particularly the inability to evaluate the full model during earlier stages, and degradation in model quality due to smaller model capacity in the initial stages. In this work, we propose an alternative framework, progressive subnetwork training, that maintains the full model throughout training, but only trains subnetworks within the model in each step. We focus on a simple instantiation of this framework, Random Path Training (RaPTr), that only trains a sub-path of layers in each step, progressively increasing the path lengths in stages. RaPTr achieves better pre-training loss for BERT and UL2 language models while requiring 20-33% fewer FLOPs compared to standard training, and is competitive with or better than other efficient training methods. Furthermore, RaPTr shows better downstream performance on UL2, improving QA tasks and SuperGLUE by 1-5% compared to standard training and stacking. Finally, we provide a theoretical basis for RaPTr to justify (a) the increasing complexity of subnetworks in stages, and (b) the stability in loss across stage transitions due to residual connections and layer norm.
Submitted 8 February, 2024;
originally announced February 2024.
-
Random Methods for Variational Inequalities
Authors:
Abhishek Chakraborty,
Angelia Nedić
Abstract:
This paper considers a variational inequality (VI) problem arising from a game among multiple agents, where each agent aims to minimize its own cost function subject to its constrained set represented as the intersection of a (possibly infinite) number of convex functional level sets. A direct projection-based approach or Lagrangian-based techniques for such a problem can be computationally expensive if not impossible to implement. To deal with the problem, we consider randomized methods that avoid the projection step on the whole constraint set by employing random feasibility updates. In particular, we propose and analyze such random methods for solving VIs based on the projection method, Korpelevich method, and Popov method. We establish the almost sure convergence of the methods and, also, provide their convergence rate guarantees. We illustrate the performance of the methods in simulations for two-agent games.
Submitted 8 February, 2024;
originally announced February 2024.
-
On the Effect of Image Resolution on Semantic Segmentation
Authors:
Ritambhara Singh,
Abhishek Jain,
Pietro Perona,
Shivani Agarwal,
Junfeng Yang
Abstract:
High-resolution semantic segmentation requires substantial computational resources. Traditional approaches in the field typically downscale the input images before processing and then upscale the low-resolution outputs back to their original dimensions. While this strategy effectively identifies broad regions, it often misses finer details. In this study, we demonstrate that a streamlined model capable of directly producing high-resolution segmentations can match the performance of more complex systems that generate lower-resolution results. By simplifying the network architecture, we enable the processing of images at their native resolution. Our approach leverages a bottom-up information propagation technique across various scales, which we have empirically shown to enhance segmentation accuracy. We have rigorously tested our method using leading-edge semantic segmentation datasets. Specifically, for the Cityscapes dataset, we further boost accuracy by applying the Noisy Student Training technique.
Submitted 7 February, 2024;
originally announced February 2024.
-
Logical Specifications-guided Dynamic Task Sampling for Reinforcement Learning Agents
Authors:
Yash Shukla,
Tanushree Burman,
Abhishek Kulkarni,
Robert Wright,
Alvaro Velasquez,
Jivko Sinapov
Abstract:
Reinforcement Learning (RL) has made significant strides in enabling artificial agents to learn diverse behaviors. However, learning an effective policy often requires a large number of environment interactions. To mitigate sample complexity issues, recent approaches have used high-level task specifications, such as Linear Temporal Logic (LTL$_f$) formulas or Reward Machines (RM), to guide the learning progress of the agent. In this work, we propose a novel approach, called Logical Specifications-guided Dynamic Task Sampling (LSTS), that learns a set of RL policies to guide an agent from an initial state to a goal state based on a high-level task specification, while minimizing the number of environmental interactions. Unlike previous work, LSTS does not assume information about the environment dynamics or the Reward Machine, and dynamically samples promising tasks that lead to successful goal policies. We evaluate LSTS on a gridworld and show that it achieves improved time-to-threshold performance on complex sequential decision-making problems compared to state-of-the-art RM and Automaton-guided RL baselines, such as Q-Learning for Reward Machines and Compositional RL from logical Specifications (DIRL). Moreover, we demonstrate that our method outperforms RM and Automaton-guided RL baselines in terms of sample-efficiency, both in a partially observable robotic task and in a continuous control robotic manipulation task.
Submitted 2 April, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Ultrafast Nuclear Dynamics in Double-Core Ionized Water Molecules
Authors:
Iyas Ismail,
Ludger Inhester,
Tatiana Marchenko,
Florian Trinter,
Abhishek Verma,
Alberto De Fanis,
Anthony Ferte,
Daniel E. Rivas,
Dawei Peng,
Dimitris Koulentianos,
Edwin Kukk,
Francis Penent,
Gilles Doumy,
Giuseppe Sansone,
John D. Bozek,
Kai Li,
Linda Young,
Markus Ilchen,
Maria Novella Piancastelli,
Michael Meyer,
Nicolas Velasquez,
Oksana Travnikova,
Rebecca Boll,
Renaud Guillemin,
Reinhard Dorner
, et al. (8 additional authors not shown)
Abstract:
Double-core-hole (DCH) states in isolated water and heavy water molecules, resulting from the sequential absorption of two x-ray photons, have been investigated. A comparison of the subsequent Auger emission spectra from the two isotopes provides direct evidence of ultrafast nuclear motion during the 1.5 fs lifetime of these DCH states. Our numerical results align well with the experimental data, providing, for various DCH states, an in-depth study of the dynamics responsible for the observed isotope effect.
Submitted 11 March, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Multi-Agent Reinforcement Learning for Offloading Cellular Communications with Cooperating UAVs
Authors:
Abhishek Mondal,
Deepak Mishra,
Ganesh Prasad,
George C. Alexandropoulos,
Azzam Alnahari,
Riku Jantti
Abstract:
Effective solutions for intelligent data collection in terrestrial cellular networks are crucial, especially in the context of Internet of Things applications. The limited spectrum and coverage area of terrestrial base stations (BSs) pose challenges in meeting the escalating data rate demands of network users. Unmanned aerial vehicles (UAVs), known for their high agility, mobility, and flexibility, present an alternative means to offload data traffic from terrestrial BSs, serving as additional access points. This paper introduces a novel approach to efficiently maximize the utilization of multiple UAVs for data traffic offloading from terrestrial BSs. Specifically, the focus is on maximizing user association with UAVs by jointly optimizing UAV trajectories and user association indicators under quality-of-service constraints. Since the formulated UAV control problem is nonconvex and combinatorial, this study leverages the multi-agent reinforcement learning framework. In this framework, each UAV acts as an independent agent, aiming to maintain inter-UAV cooperative behavior. The proposed approach utilizes a finite-state Markov decision process to account for the UAVs' velocity constraints and the relationship between their trajectories and state space. A low-complexity distributed state-action-reward-state-action algorithm is presented to determine the UAVs' optimal sequential decision-making policies over training episodes. Extensive simulation results validate the proposed analysis and offer valuable insights into the optimal UAV trajectories. The derived trajectories demonstrate superior average UAV association performance compared to benchmark techniques such as Q-learning and particle swarm optimization.
Submitted 31 May, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
ClipFormer: Key-Value Clipping of Transformers on Memristive Crossbars for Write Noise Mitigation
Authors:
Abhiroop Bhattacharjee,
Abhishek Moitra,
Priyadarshini Panda
Abstract:
Transformers have revolutionized various real-world applications from natural language processing to computer vision. However, the traditional von Neumann computing paradigm faces memory and bandwidth limitations in accelerating transformers owing to their massive model sizes. To this end, In-memory Computing (IMC) crossbars based on Non-volatile Memories (NVMs), due to their ability to perform highly parallelized Matrix-Vector-Multiplications (MVMs) with high energy efficiency, have emerged as a promising solution for accelerating transformers. However, analog MVM operations in crossbars introduce non-idealities, such as stochastic read & write noise, which affect the inference accuracy of the deployed transformers. Specifically, we find pre-trained Vision Transformers (ViTs) to be vulnerable on crossbars due to the impact of write noise on the dynamically-generated Key (K) and Value (V) matrices in the attention layers, an effect not accounted for in prior studies. We thus propose ClipFormer, a transformation on the K and V matrices during inference, to boost the non-ideal accuracies of pre-trained ViT models. ClipFormer requires no additional hardware or training overhead and is amenable to transformers deployed on any memristive crossbar platform. Our experiments on the ImageNet-1k dataset using pre-trained DeiT-S transformers, subjected to standard training and variation-aware training, show >10-40% higher non-ideal accuracies in the high write noise regime when ClipFormer is applied.
Submitted 4 February, 2024;
originally announced February 2024.
-
Improved Upper Bound for the Size of a Trifferent Code
Authors:
Siddharth Bhandari,
Abhishek Khetan
Abstract:
A subset $\mathcal{C}\subseteq\{0,1,2\}^n$ is said to be a $\textit{trifferent}$ code (of block length $n$) if for every three distinct codewords $x,y, z \in \mathcal{C}$, there is a coordinate $i\in \{1,2,\ldots,n\}$ where they all differ, that is, $\{x(i),y(i),z(i)\}$ is the same as $\{0,1,2\}$. Let $T(n)$ denote the size of the largest trifferent code of block length $n$. Understanding the asymptotic behavior of $T(n)$ is closely related to determining the zero-error capacity of the $(3/2)$-channel defined by Elias'88, and is a long-standing open problem in the area. Elias had shown that $T(n)\leq 2\times (3/2)^n$, and prior to our work the best upper bound was $T(n)\leq 0.6937 \times (3/2)^n$ due to Kurz'23. We improve this bound to $T(n)\leq c \times n^{-2/5}\times (3/2)^n$, where $c$ is an absolute constant.
Submitted 4 February, 2024;
originally announced February 2024.
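The trifferent-code definition in the abstract above can be checked directly on small examples. The following is a minimal illustrative sketch, not code from the paper: the function names `is_trifferent` and `elias_bound` are hypothetical, and codewords are assumed to be equal-length tuples over {0, 1, 2}.

```python
from itertools import combinations


def is_trifferent(code):
    """Check the definition: every three distinct codewords must have
    some coordinate where their symbols are exactly {0, 1, 2}."""
    words = list(set(code))  # keep only distinct codewords
    for x, y, z in combinations(words, 3):
        # Look for a coordinate in which all three symbols differ.
        if not any({a, b, c} == {0, 1, 2} for a, b, c in zip(x, y, z)):
            return False
    return True  # vacuously true for fewer than three codewords


def elias_bound(n):
    """Elias's upper bound 2 * (3/2)^n quoted in the abstract."""
    return 2 * (3 / 2) ** n


# Hypothetical example codes of block length 2.
good = [(0, 0), (0, 1), (1, 2), (2, 2)]
bad = [(0, 0), (0, 1), (1, 1)]
print(is_trifferent(good), is_trifferent(bad), elias_bound(2))
```

The check costs on the order of |C|^3 · n operations, so it is only practical for small codes; it illustrates the definition and says nothing about how the improved bound is proved.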
-
Solution of the Probabilistic Lambert's Problem: Optimal Transport Approach
Authors:
Alexis M. H. Teter,
Iman Nodozi,
Abhishek Halder
Abstract:
The deterministic variant of Lambert's problem was posed by Lambert in the 18th century and its solution for conic trajectory has been derived by many, including Euler, Lambert, Lagrange, Laplace, Gauss and Legendre. The solution amounts to designing velocity control for steering a spacecraft from a given initial to a given terminal position subject to gravitational potential and flight time constraints. In recent years, a probabilistic variant of Lambert's problem has received attention in the aerospace community where the endpoint position constraints are softened to endpoint joint probability distributions over the respective positions. Such probabilistic specifications account for the estimation errors, modeling uncertainties, etc. Building on a deterministic optimal control reformulation via analytical mechanics, we show that the probabilistic Lambert's problem is a generalized dynamic optimal mass transport problem where the gravitational potential plays the role of an additive state cost. This allows us to rigorously prove the existence-uniqueness of the solution for the probabilistic Lambert's problem both with and without process noise. In the latter case, the problem and its solution correspond to a generalized Schrödinger bridge, much like how the classical Schrödinger bridge can be seen as stochastic regularization of the optimal mass transport. We deduce the large deviation principle enjoyed by the Lambertian Schrödinger bridge. Leveraging these newfound connections, we design a computational algorithm to illustrate the nonparametric numerical solution of the probabilistic Lambert's problem.
Submitted 2 February, 2024;
originally announced February 2024.
-
Nano-ironing van der Waals Heterostructures Towards Electrically Controlled Quantum Dots
Authors:
Teymour Talha-Dean,
Yaoju Tarn,
Subhrajit Mukherjee,
John Wellington John,
Ding Huang,
Ivan A. Verzhbitskiy,
Dasari Venkatakrishnarao,
Sarthak Das,
Rainer Lee,
Abhishek Mishra,
Shuhua Wang,
Yee Sin Ang,
Kuan Eng Johnson Goh,
Chit Siong Lau
Abstract:
Assembling two-dimensional van der Waals layered materials into heterostructures is an exciting development that sparked the discovery of rich correlated electronic phenomena and offers possibilities for designer device applications. However, resist residue from fabrication processes is a major limitation. Resulting disordered interfaces degrade device performance and mask underlying transport physics. Conventional cleaning processes are inefficient and can cause material and device damage. Here, we show that thermal scanning probe based cleaning can effectively eliminate resist residue to recover pristine material surfaces. Our technique is compatible at both the material- and device-level, and we demonstrate the significant improvement in the electrical performance of 2D WS2 transistors. We also demonstrate the cleaning of van der Waals heterostructures to achieve interfaces with low disorder. This enables the electrical formation and control of quantum dots that can be tuned from macroscopic current flow to the single-electron tunnelling regime. Such material processing advances are crucial for constructing high-quality vdW heterostructures that are important platforms for fundamental studies and building blocks for quantum and nano-electronics applications.
Submitted 2 February, 2024;
originally announced February 2024.
-
Score-based Causal Representation Learning: Linear and General Transformations
Authors:
Burak Varıcı,
Emre Acartürk,
Karthikeyan Shanmugam,
Abhishek Kumar,
Ali Tajer
Abstract:
This paper addresses intervention-based causal representation learning (CRL) under a general nonparametric latent causal model and an unknown transformation that maps the latent variables to the observed variables. Linear and general transformations are investigated. The paper addresses both the identifiability and achievability aspects. Identifiability refers to determining algorithm-agnostic conditions that ensure recovering the true latent causal variables and the latent causal graph underlying them. Achievability refers to the algorithmic aspects and addresses designing algorithms that achieve identifiability guarantees. By drawing novel connections between score functions (i.e., the gradients of the logarithm of density functions) and CRL, this paper designs a score-based class of algorithms that ensures both identifiability and achievability. First, the paper focuses on linear transformations and shows that one stochastic hard intervention per node suffices to guarantee identifiability. It also provides partial identifiability guarantees for soft interventions, including identifiability up to ancestors for general causal models and perfect latent graph recovery for sufficiently non-linear causal models. Secondly, it focuses on general transformations and shows that two stochastic hard interventions per node suffice for identifiability. Notably, one does not need to know which pair of interventional environments have the same node intervened.
Submitted 26 February, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
Spherical maximal functions and Hardy spaces for Fourier integral operators
Authors:
Abhishek Ghosh,
Naijia Liu,
Jan Rozendaal,
Liang Song
Abstract:
We use the Hardy spaces for Fourier integral operators to obtain bounds for spherical maximal functions in $L^{p}(\mathbb{R}^{n})$, $n\geq2$, where the radii of the spheres are restricted to a compact subset of $(0,\infty)$. These bounds extend to general hypersurfaces with non-vanishing Gaussian curvature, to the complex spherical means, and to geodesic spheres on compact manifolds. We also obtain improved maximal function bounds and pointwise convergence statements for wave equations, both on $\mathbb{R}^{n}$ and on compact manifolds. The maximal function bounds are essentially sharp for all $p\in[1,2]\cup [\frac{2(n+1)}{n-1},\infty]$, for each such hypersurface, every complex spherical mean, and on every manifold.
Submitted 29 March, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
-
SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning
Authors:
Jianlan Luo,
Zheyuan Hu,
Charles Xu,
You Liang Tan,
Jacob Berg,
Archit Sharma,
Stefan Schaal,
Chelsea Finn,
Abhishek Gupta,
Sergey Levine
Abstract:
In recent years, significant progress has been made in the field of robotic reinforcement learning (RL), enabling methods that handle complex image observations, train in the real world, and incorporate auxiliary data, such as demonstrations and prior experience. However, despite these advances, robotic RL remains hard to use. It is acknowledged among practitioners that the particular implementation details of these algorithms are often just as important (if not more so) for performance as the choice of algorithm. We posit that a significant challenge to widespread adoption of robotic RL, as well as further development of robotic RL methods, is the comparative inaccessibility of such methods. To address this challenge, we developed a carefully implemented library containing a sample efficient off-policy deep RL method, together with methods for computing rewards and resetting the environment, a high-quality controller for a widely-adopted robot, and a number of challenging example tasks. We provide this library as a resource for the community, describe its design choices, and present experimental results. Perhaps surprisingly, we find that our implementation can achieve very efficient learning, acquiring policies for PCB board assembly, cable routing, and object relocation within 25 to 50 minutes of training per policy on average, improving over state-of-the-art results reported for similar tasks in the literature. These policies achieve perfect or near-perfect success rates and extreme robustness even under perturbations, and exhibit emergent recovery and correction behaviors. We hope that these promising results and our high-quality open-source implementation will provide a tool for the robotics community to facilitate further developments in robotic RL. Our code, documentation, and videos can be found at https://serl-robot.github.io/
Submitted 12 February, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
Fermions, quantum gravity and holography in two dimensions
Authors:
Muhammad Asaduzzaman,
Simon Catterall,
Abhishek Samlodia
Abstract:
We study a model comprising $N$ flavors of Kähler Dirac fermion propagating on a triangulated two dimensional disk which is constrained to have a negative average bulk curvature. Dirichlet boundary conditions are chosen for the fermions. Quantum fluctuations of the geometry are included by summing over all possible triangulations consistent with these constraints. We show in the limit $N\to \infty$ that the partition function is dominated by a regular triangulation of two dimensional hyperbolic space. We use strong coupling expansions and Monte Carlo simulation to show that in this limit boundary correlators of the fermions have a power law dependence on boundary separation as one expects from holography. However we argue that this behavior breaks down for any finite number of massive fields in the thermodynamic limit and quantum fluctuations of the bulk geometry drive the theory into a non-holographic phase. In contrast, for massless fermions we find evidence that the boundary is conformal even for finite $N$. This is consistent with theoretical results in quantum Liouville theory.
Submitted 29 February, 2024; v1 submitted 27 January, 2024;
originally announced January 2024.
-
Reversibility and algebraic characterization of the quaternionic Möbius transformations
Authors:
Krishnendu Gongopadhyay,
Tejbir Lohan,
Abhishek Mukherjee
Abstract:
Let $\mathrm {SL}(2, \mathbb{H})$ be the group of $2 \times 2$ quaternionic matrices with quaternionic determinant $1$. The group $\mathrm {SL}(2, \mathbb{H})$ acts on the four-dimensional sphere $\widehat {\mathbb{H}}=\mathbb{H} \cup \{\infty\}$ by the (orientation-preserving) quaternionic Möbius transformations: $$A=\begin{pmatrix} a & b \\ c & d \end{pmatrix}: z \mapsto (az+b)(cz+d)^{-1}.$$ Using this action, the Möbius group may be identified with $\mathrm {PSL}(2, \mathbb{H})$. A quaternionic Möbius transformation $[A]$ is reversible if it is conjugate to its inverse in $\mathrm {PSL}(2, \mathbb{H})$. Given an element $A$ in $\mathrm {SL}(2, \mathbb{H})$, we provide a characterization in terms of the conjugacy invariants of $A$ that recognizes whether or not $[A]$ is reversible. Equivalently, this characterization detects whether $[A]$ is a product of two involutions or not. Further, we revisit the algebraic characterization of the quaternionic Möbius transformations.
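The action $z \mapsto (az+b)(cz+d)^{-1}$ can be illustrated numerically. The following is a minimal sketch, not code from the paper: the `Quaternion` class and the `mobius` helper are hypothetical names introduced here, the point at infinity is not handled, and the determinant condition on the matrix is not checked. Because quaternion multiplication is non-commutative, the order of the factors in the formula matters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quaternion:
    """Quaternion w + x*i + y*j + z*k (illustrative helper, not from the paper)."""
    w: float
    x: float
    y: float
    z: float

    def __add__(self, o):
        return Quaternion(self.w + o.w, self.x + o.x, self.y + o.y, self.z + o.z)

    def __mul__(self, o):
        # Hamilton product; note that q1 * q2 != q2 * q1 in general.
        return Quaternion(
            self.w * o.w - self.x * o.x - self.y * o.y - self.z * o.z,
            self.w * o.x + self.x * o.w + self.y * o.z - self.z * o.y,
            self.w * o.y - self.x * o.z + self.y * o.w + self.z * o.x,
            self.w * o.z + self.x * o.y - self.y * o.x + self.z * o.w,
        )

    def norm_sq(self):
        return self.w ** 2 + self.x ** 2 + self.y ** 2 + self.z ** 2

    def inverse(self):
        # q^{-1} = conjugate(q) / |q|^2, defined for q != 0.
        n = self.norm_sq()
        if n == 0:
            raise ZeroDivisionError("zero quaternion has no inverse")
        return Quaternion(self.w / n, -self.x / n, -self.y / n, -self.z / n)


def mobius(a, b, c, d, z):
    """Apply z -> (a z + b)(c z + d)^{-1}; assumes c z + d != 0."""
    return (a * z + b) * (c * z + d).inverse()


ZERO = Quaternion(0, 0, 0, 0)
ONE = Quaternion(1, 0, 0, 0)
I = Quaternion(0, 1, 0, 0)
J = Quaternion(0, 0, 1, 0)

# The identity matrix fixes every point; diag(i, i) acts as z -> i z i^{-1}.
fixed = mobius(ONE, ZERO, ZERO, ONE, J)
conjugated = mobius(I, ZERO, ZERO, I, J)
```

In this sketch the diagonal matrix with both entries equal to $i$ sends $j$ to $i j i^{-1} = -j$, a small reminder that scalar-looking matrices can act non-trivially in the quaternionic setting.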
Submitted 27 January, 2024;
originally announced January 2024.
-
Backscatter Measurements and Models for RF Sensing Applications in Cluttered Environments
Authors:
Dmitry Chizhik,
**feng Du,
Jakub Sapis,
Reinaldo A. Valenzuela,
Abhishek Adhikari,
Gil Zussman,
Manuel A. Almendra,
Mauricio Rodriguez,
Rodolfo Feick
Abstract:
A statistical backscatter channel model for indoor clutter is developed for indoor RF sensing applications based on measurements. A narrowband 28 GHz sounder used a quasi-monostatic radar arrangement with an omnidirectional transmit antenna illuminating an indoor scene and a spinning horn receive antenna less than 1 m away collecting backscattered power as a function of azimuth. Median average backscatter power was found to vary over a 12 dB range, with average power generally decreasing with increasing room size. A deterministic model of average backscattered power dependent on distance to nearest wall and wall reflection coefficient reproduces observations with 4.0 dB RMS error. Distribution of power variation in azimuth around this average is reproduced within 1 dB by a random azimuth spectrum with a lognormal amplitude distribution and uniformly random phase. The model is extended to provide power distribution over both azimuth and delay (conveying range to scatterer) by combining azimuthal distribution with published results on power delay profiles in reverberant environments. The statistical model does not require a detailed room layout description, aiming to reproduce backscatter clutter statistics, as opposed to a deterministic response.
Submitted 26 January, 2024;
originally announced January 2024.
-
Scanning Tunneling Microscopy for Molecules: Effects of Electron Propagation into Vacuum
Authors:
Abhishek Grewal,
Christopher C. Leon,
Klaus Kuhnke,
Klaus Kern,
Olle Gunnarsson
Abstract:
Using scanning tunneling microscopy (STM), we experimentally and theoretically investigate isolated platinum phthalocyanine (PtPc) molecules adsorbed on atomically thin NaCl(100) vapor deposited on Au(111). We obtain good agreement between theory and constant-height STM topography. We examine why strong distortions of STM images occur as a function of distance between molecule and STM tip. The images of the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO) exhibit, for increasing distance, significant radial expansion due to electron propagation in the vacuum. Additionally, the imaged angular dependence is substantially distorted. The LUMO image has substantial intensity along the molecular diagonals where PtPc has no atoms. In the electronic transport gap the image differs drastically from HOMO and LUMO, even at energies very close to these orbitals. As the tunneling becomes increasingly off-resonant, the eight angular lobes of the HOMO or of the degenerate LUMOs diminish and reveal four lobes with maxima along the molecular axes, where both HOMO and LUMO have little or no weight. These images are strongly influenced by low-lying PtPc orbitals that have simple angular structures.
Submitted 26 January, 2024;
originally announced January 2024.
-
Omnipredictors for Regression and the Approximate Rank of Convex Functions
Authors:
Parikshit Gopalan,
Princewill Okoroafor,
Prasad Raghavendra,
Abhishek Shetty,
Mihir Singhal
Abstract:
Consider the supervised learning setting where the goal is to learn to predict labels $\mathbf y$ given points $\mathbf x$ from a distribution. An \textit{omnipredictor} for a class $\mathcal L$ of loss functions and a class $\mathcal C$ of hypotheses is a predictor whose predictions incur less expected loss than the best hypothesis in $\mathcal C$ for every loss in $\mathcal L$. Since the work of [GKR+21] that introduced the notion, there has been a large body of work in the setting of binary labels where $\mathbf y \in \{0, 1\}$, but much less is known about the regression setting where $\mathbf y \in [0,1]$ can be continuous. Our main conceptual contribution is the notion of \textit{sufficient statistics} for loss minimization over a family of loss functions: these are a set of statistics about a distribution such that knowing them allows one to take actions that minimize the expected loss for any loss in the family. The notion of sufficient statistics relates directly to the approximate rank of the family of loss functions.
Our key technical contribution is a bound of $O(1/\varepsilon^{2/3})$ on the $\varepsilon$-approximate rank of convex, Lipschitz functions on the interval $[0,1]$, which we show is tight up to a factor of $\mathrm{polylog} (1/\varepsilon)$. This yields improved runtimes for learning omnipredictors for the class of all convex, Lipschitz loss functions under weak learnability assumptions about the class $\mathcal C$. We also give efficient omnipredictors when the loss families have low-degree polynomial approximations, or arise from generalized linear models (GLMs). This translation from sufficient statistics to faster omnipredictors is made possible by lifting the technique of loss outcome indistinguishability introduced by [GKH+23] for Boolean labels to the regression setting.
Submitted 25 January, 2024;
originally announced January 2024.
-
Investigating the Quality of DermaMNIST and Fitzpatrick17k Dermatological Image Datasets
Authors:
Kumar Abhishek,
Aditi Jain,
Ghassan Hamarneh
Abstract:
The remarkable progress of deep learning in dermatological tasks has brought us closer to achieving diagnostic accuracies comparable to those of human experts. However, while large datasets play a crucial role in the development of reliable deep neural network models, the quality of data therein and their correct usage are of paramount importance. Several factors can impact data quality, such as the presence of duplicates, data leakage across train-test partitions, mislabeled images, and the absence of a well-defined test partition. In this paper, we conduct meticulous analyses of two popular dermatological image datasets, DermaMNIST and Fitzpatrick17k, uncover these data quality issues, measure the effects of these problems on the benchmark results, and propose corrections to the datasets. Besides ensuring the reproducibility of our analysis, by making our analysis pipeline and the accompanying code publicly available, we aim to encourage similar explorations and to facilitate the identification and addressing of potential data quality issues in other large datasets.
Submitted 25 January, 2024;
originally announced January 2024.
-
FoVA-Depth: Field-of-View Agnostic Depth Estimation for Cross-Dataset Generalization
Authors:
Daniel Lichy,
Hang Su,
Abhishek Badki,
Jan Kautz,
Orazio Gallo
Abstract:
Wide field-of-view (FoV) cameras efficiently capture large portions of the scene, which makes them attractive in multiple domains, such as automotive and robotics. For such applications, estimating depth from multiple images is a critical task, and therefore, a large amount of ground truth (GT) data is available. Unfortunately, most of the GT data is for pinhole cameras, making it impossible to properly train depth estimation models for large-FoV cameras. We propose the first method to train a stereo depth estimation model on the widely available pinhole data, and to generalize it to data captured with larger FoVs. Our intuition is simple: We warp the training data to a canonical, large-FoV representation and augment it to allow a single network to reason about diverse types of distortions that otherwise would prevent generalization. We show strong generalization ability of our approach on both indoor and outdoor datasets, which was not possible with previous methods.
Submitted 24 January, 2024;
originally announced January 2024.
-
Flaring Stars in a Non-targeted mm-wave Survey with SPT-3G
Authors:
C. Tandoi,
S. Guns,
A. Foster,
P. A. R. Ade,
A. J. Anderson,
B. Ansarinejad,
M. Archipley,
L. Balkenhol,
K. Benabed,
A. N. Bender,
B. A. Benson,
F. Bianchini,
L. E. Bleem,
F. R. Bouchet,
L. Bryant,
E. Camphuis,
J. E. Carlstrom,
T. W. Cecil,
C. L. Chang,
P. Chaubal,
P. M. Chichura,
T. -L. Chou,
A. Coerver,
T. M. Crawford,
A. Cukierman
, et al. (74 additional authors not shown)
Abstract:
We present a flare star catalog from four years of non-targeted millimeter-wave survey data from the South Pole Telescope (SPT). The data were taken with the SPT-3G camera and cover a 1500-square-degree region of the sky from $20^{h}40^{m}0^{s}$ to $3^{h}20^{m}0^{s}$ in right ascension and $-42^{\circ}$ to $-70^{\circ}$ in declination. This region was observed on a nearly daily cadence from 2019 to 2022 and chosen to avoid the plane of the galaxy. A short-duration transient search of this survey yields 111 flaring events from 66 stars, increasing the number of both flaring events and detected flare stars by an order of magnitude from the previous SPT-3G data release. We provide cross-matching to Gaia DR3, as well as matches to X-ray point sources found in the second ROSAT all-sky survey. We have detected flaring stars across the main sequence, from early-type A stars to M dwarfs, as well as a large population of evolved stars. These stars are mostly nearby, spanning 10 to 1000 parsecs in distance. Most of the flare spectral indices are constant or gently rising as a function of frequency at 95/150/220 GHz. The timescale of these events can range from minutes to hours, and the peak $\nu L_\nu$ luminosities range from $10^{27}$ to $10^{31}$ erg s$^{-1}$ in the SPT-3G frequency bands.
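For readers unfamiliar with the quoted luminosities, the standard conversion from an observed flux density $S_\nu$ and distance $d$ is $\nu L_\nu = 4\pi d^2 \nu S_\nu$, assuming isotropic emission. The sketch below applies that textbook relation; the function name and the input values are hypothetical and are not taken from the catalog.

```python
import math

JY_TO_CGS = 1e-23     # 1 Jy expressed in erg s^-1 cm^-2 Hz^-1
PC_TO_CM = 3.0857e18  # 1 parsec expressed in cm


def nu_l_nu(flux_mjy, freq_ghz, distance_pc):
    """Return nu * L_nu in erg/s for an isotropic emitter.

    flux_mjy: flux density in millijansky
    freq_ghz: observing frequency in GHz
    distance_pc: distance to the star in parsecs
    """
    flux_cgs = flux_mjy * 1e-3 * JY_TO_CGS
    nu_hz = freq_ghz * 1e9
    d_cm = distance_pc * PC_TO_CM
    return 4.0 * math.pi * d_cm ** 2 * nu_hz * flux_cgs


# Hypothetical flare: 100 mJy at 150 GHz from a star 100 pc away.
example = nu_l_nu(flux_mjy=100.0, freq_ghz=150.0, distance_pc=100.0)
```

With these illustrative inputs the result is of order $10^{29}$ erg s$^{-1}$, which sits inside the range reported in the abstract.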
Submitted 24 January, 2024;
originally announced January 2024.
-
A proof theory of right-linear (omega-)grammars via cyclic proofs
Authors:
Anupam Das,
Abhishek De
Abstract:
Right-linear (or left-linear) grammars are a well-known class of context-free grammars computing just the regular languages. They may naturally be written as expressions with (least) fixed points but with products restricted to letters as left arguments, giving an alternative to the syntax of regular expressions. In this work, we investigate the resulting logical theory of this syntax. Namely, we propose a theory of right-linear algebras (RLA) over this syntax and a cyclic proof system CRLA for reasoning about them.
We show that CRLA is sound and complete for the intended model of regular languages. From here we recover the same completeness result for RLA by extracting inductive invariants from cyclic proofs, rendering the model of regular languages the free right-linear algebra.
Finally, we extend system CRLA by greatest fixed points, nuCRLA, naturally modelled by languages of omega-words thanks to right-linearity. We show a similar soundness and completeness result of (the guarded fragment of) nuCRLA for the model of omega-regular languages, employing game theoretic techniques.
Submitted 24 January, 2024;
originally announced January 2024.
-
Free Form Medical Visual Question Answering in Radiology
Authors:
Abhishek Narayanan,
Rushabh Musthyala,
Rahul Sankar,
Anirudh Prasad Nistala,
Pranav Singh,
Jacopo Cirrone
Abstract:
Visual Question Answering (VQA) in the medical domain presents a unique, interdisciplinary challenge, combining fields such as Computer Vision, Natural Language Processing, and Knowledge Representation. Despite its importance, research in medical VQA has been scant, only gaining momentum since 2018. Addressing this gap, our research delves into the effective representation of radiology images and the joint learning of multimodal representations, surpassing existing methods. We innovatively augment the SLAKE dataset, enabling our model to respond to a more diverse array of questions, not limited to the immediate content of radiology or pathology images. Our model achieves a top-1 accuracy of 79.55\% with a less complex architecture, demonstrating comparable performance to current state-of-the-art models. This research not only advances medical VQA but also opens avenues for practical applications in diagnostic settings.
Submitted 23 January, 2024;
originally announced January 2024.
-
A new "temperature inversion" estimator to detect CMB patchy screening by large-scale structure
Authors:
Theo Schutt,
Abhishek S. Maniyar,
Emmanuel Schaan,
William R. Coulton,
Nishant Mishra
Abstract:
Thomson scattering of cosmic microwave background (CMB) photons imprints various properties of the baryons around galaxies on the CMB. One such imprint, called patchy screening, is a direct probe of the gas density profile around galaxies. It usefully complements the information from the kinematic and thermal Sunyaev-Zel'dovich effects and does not require individual redshifts. In this paper, we derive new estimators of patchy screening called the "temperature inversion" (TI) and "signed" estimators, analogous to the gradient inversion estimator of CMB lensing. Pedagogically, we clarify the relation between these estimators and the standard patchy screening quadratic estimator (QE). The new estimators trade optimality for robustness to biases caused by the dominant CMB lensing and foreground contaminants, allowing the use of smaller angular scales. We perform a simulated analysis to realistically forecast the expected precision of patchy screening measurements from four CMB experiments, ACT, SPT, Simons Observatory (SO) and CMB-S4, cross-correlated with three galaxy samples from BOSS, unWISE and the simulated Rubin LSST Data Challenge 2 catalog. Our results give further confidence in the first detection of this effect from the ACT$\times$unWISE data in the companion paper and show patchy screening will be a powerful observable for future surveys like SO, CMB-S4 and LSST. Implementations of the patchy screening QE and the TI and signed estimators are publicly available in our LensQuEst and ThumbStack software packages, available at https://github.com/EmmanuelSchaan/LensQuEst and https://github.com/EmmanuelSchaan/ThumbStack , respectively.
Submitted 23 January, 2024;
originally announced January 2024.
-
The Atacama Cosmology Telescope: Detection of Patchy Screening of the Cosmic Microwave Background
Authors:
William R. Coulton,
Theo Schutt,
Abhishek S. Maniyar,
Emmanuel Schaan,
Rui An,
Zachary Atkins,
Nicholas Battaglia,
J Richard Bond,
Erminia Calabrese,
Steve K. Choi,
Mark J. Devlin,
Adriaan J. Duivenvoorden,
Jo Dunkley,
Simone Ferraro,
Vera Gluscevic,
J. Colin Hill,
Matt Hilton,
Adam D. Hincks,
Arthur Kosowsky,
Darby Kramer,
Aleksandra Kusiak,
Adrien La Posta,
Thibaut Louis,
Mathew S. Madhavacheril,
Gabriela A. Marques
, et al. (15 additional authors not shown)
Abstract:
Spatial variations in the cosmic electron density after reionization generate cosmic microwave background anisotropies via Thomson scattering, a process known as the "patchy screening" effect. In this paper, we propose a new estimator for the patchy screening effect that is designed to mitigate biases from the dominant foreground signals. We use it to measure the cross-correlation between \textit{unWISE} galaxies and patchy screening, the latter measured by the Atacama Cosmology Telescope and \textit{Planck} satellite. We report the first detection of the patchy screening effect, with the statistical significance of the cross-correlation exceeding $7\sigma$. This measurement directly probes the distribution of electrons around these galaxies and provides strong evidence that gas is more extended than the underlying dark matter. By comparing our measurements to electron profiles extracted from simulations, we demonstrate the power of these observations to constrain galaxy evolution models. Requiring only the 2D positions of objects and no individual redshifts or velocity estimates, this approach is complementary to existing gas probes, such as those based on the kinetic Sunyaev-Zeldovich effect.
Submitted 23 January, 2024;
originally announced January 2024.
-
arXiv:2401.10508
[pdf]
physics.optics
cond-mat.mes-hall
cond-mat.mtrl-sci
physics.app-ph
quant-ph
Photonic Supercoupling in Silicon Topological Waveguides
Authors:
Ridong Jia,
Yi Ji Tan,
Nikhil Navaratna,
Abhishek Kumar,
Ranjan Singh
Abstract:
Electromagnetic wave coupling between photonic systems relies on the evanescent field typically confined within a single wavelength. Extending evanescent coupling distance requires low refractive index contrast and perfect momentum matching for achieving a large coupling ratio. Here, we report the discovery of photonic supercoupling in a topological valley Hall pair of waveguides, showing a substantial improvement in coupling efficiency across multiple wavelengths. Experimentally, we realize ultra-high coupling ratios between waveguides through valley-conserved vortex flow of electromagnetic energy, attaining 95% coupling efficiency for separations of up to three wavelengths. This demonstration of photonic supercoupling in topological systems significantly extends the coupling distance between on-chip waveguides and components, paving the path for the development of supercoupled photonic integrated devices, optical sensing, and telecommunications.
Submitted 19 January, 2024;
originally announced January 2024.
-
SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild
Authors:
Andreas Engelhardt,
Amit Raj,
Mark Boss,
Yunzhi Zhang,
Abhishek Kar,
Yuanzhen Li,
Deqing Sun,
Ricardo Martin Brualla,
Jonathan T. Barron,
Hendrik P. A. Lensch,
Varun Jampani
Abstract:
We present SHINOBI, an end-to-end framework for the reconstruction of shape, material, and illumination from object images captured with varying lighting, pose, and background. Inverse rendering of an object based on unconstrained image collections is a long-standing challenge in computer vision and graphics and requires a joint optimization over shape, radiance, and pose. We show that an implicit shape representation based on a multi-resolution hash encoding enables faster and more robust shape reconstruction with joint camera alignment optimization that outperforms prior work. Further, to enable the editing of illumination and object reflectance (i.e. material) we jointly optimize BRDF and illumination together with the object's shape. Our method is class-agnostic and works on in-the-wild image collections of objects to produce relightable 3D assets for several use cases such as AR/VR, movies, games, etc. Project page: https://shinobi.aengelhardt.com Video: https://www.youtube.com/watch?v=iFENQ6AcYd8&feature=youtu.be
Submitted 29 March, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
-
Plug-in for visualizing 3D tool tracking from videos of Minimally Invasive Surgeries
Authors:
Shubhangi Nema,
Abhishek Mathur,
Leena Vachhani
Abstract:
This paper tackles instrument tracking and 3D visualization challenges in minimally invasive surgery (MIS), crucial for computer-assisted interventions. Conventional and robot-assisted MIS encounter issues with limited 2D camera projections and minimal hardware integration. The objective is to track and visualize the entire surgical instrument, including shaft and metallic clasper, enabling safe navigation within the surgical environment. The proposed method involves 2D tracking based on segmentation maps, facilitating creation of a labeled dataset without extensive ground-truth knowledge. Geometric changes in 2D intervals express motion, and kinematics-based algorithms process results into 3D tracking information. Synthesized and experimental results in 2D and 3D motion estimates demonstrate negligible errors, validating the method for labeling and motion tracking of instruments in MIS videos. The conclusion underscores the proposed 2D segmentation technique's simplicity and computational efficiency, emphasizing its potential as a direct plug-in for 3D visualization in instrument tracking and MIS practices.
Submitted 12 January, 2024;
originally announced January 2024.
-
TT-SNN: Tensor Train Decomposition for Efficient Spiking Neural Network Training
Authors:
Donghyun Lee,
Ruokai Yin,
Youngeun Kim,
Abhishek Moitra,
Yuhang Li,
Priyadarshini Panda
Abstract:
Spiking Neural Networks (SNNs) have gained significant attention as a potentially energy-efficient alternative for standard neural networks with their sparse binary activation. However, SNNs suffer from memory and computation overhead due to spatio-temporal dynamics and multiple backpropagation computations across timesteps during training. To address this issue, we introduce Tensor Train Decomposition for Spiking Neural Networks (TT-SNN), a method that reduces model size through trainable weight decomposition, resulting in reduced storage, FLOPs, and latency. In addition, we propose a parallel computation pipeline as an alternative to the typical sequential tensor computation, which can be flexibly integrated into various existing SNN architectures. To the best of our knowledge, this is the first-of-its-kind application of tensor decomposition in SNNs. We validate our method using both static and dynamic datasets, CIFAR10/100 and N-Caltech101, respectively. We also propose a TT-SNN-tailored training accelerator to fully harness the parallelism in TT-SNN. Our results demonstrate substantial reductions in parameter size (7.98X), FLOPs (9.25X), training time (17.7%), and training energy (28.3%) during training for the N-Caltech101 dataset, with negligible accuracy degradation.
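To give a feel for why a tensor train (TT) factorization shrinks a weight tensor, the sketch below counts parameters in the generic TT format, where a tensor of shape $(n_1, \dots, n_d)$ is stored as cores of shape $(r_{k-1}, n_k, r_k)$ with $r_0 = r_d = 1$. This is only an illustration of the general format: the function names, the layer shape, and the ranks are assumptions chosen here, and the abstract does not specify the exact factorization used in TT-SNN.

```python
def full_param_count(shape):
    """Number of entries in a dense tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


def tt_param_count(shape, ranks):
    """Number of entries across all TT cores.

    shape: mode sizes (n_1, ..., n_d)
    ranks: TT ranks (r_0, ..., r_d) with r_0 == r_d == 1
    Core k has shape (r_{k-1}, n_k, r_k).
    """
    if len(ranks) != len(shape) + 1:
        raise ValueError("need exactly len(shape) + 1 ranks")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary TT ranks must both be 1")
    return sum(ranks[k] * shape[k] * ranks[k + 1] for k in range(len(shape)))


# Hypothetical 3x3 convolution with 64 input and 64 output channels,
# factorized with all internal TT ranks set to 8.
conv_shape = (64, 64, 3, 3)
tt_ranks = (1, 8, 8, 8, 1)

dense = full_param_count(conv_shape)
compressed = tt_param_count(conv_shape, tt_ranks)
ratio = dense / compressed
```

The compression ratio depends strongly on the chosen ranks, so the value obtained here should not be compared directly with the 7.98X figure reported in the abstract.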
Submitted 15 January, 2024;
originally announced January 2024.
-
Solution of the Probabilistic Lambert Problem: Connections with Optimal Mass Transport, Schrödinger Bridge and Reaction-Diffusion PDEs
Authors:
Alexis M. H. Teter,
Iman Nodozi,
Abhishek Halder
Abstract:
Lambert's problem concerns transferring a spacecraft from a given initial to a given terminal position within prescribed flight time via velocity control subject to a gravitational force field. We consider a probabilistic variant of the Lambert problem where the knowledge of the endpoint constraints in position vectors is replaced by the knowledge of their respective joint probability density functions. We show that the Lambert problem with endpoint joint probability density constraints is a generalized optimal mass transport (OMT) problem, thereby connecting this classical astrodynamics problem with a burgeoning area of research in modern stochastic control and stochastic machine learning. This newfound connection allows us to rigorously establish the existence and uniqueness of solution for the probabilistic Lambert problem. The same connection also helps to numerically solve the probabilistic Lambert problem via diffusion regularization, i.e., by leveraging further connection of the OMT with the Schrödinger bridge problem (SBP). This also shows that the probabilistic Lambert problem with additive dynamic process noise is in fact a generalized SBP, and can be solved numerically using the so-called Schrödinger factors, as we do in this work. We explain how the resulting analysis leads to solving a boundary-coupled system of reaction-diffusion PDEs where the nonlinear gravitational potential appears as the reaction rate. We propose novel algorithms for the same, and present illustrative numerical results. Our analysis and the algorithmic framework are nonparametric, i.e., we make neither statistical (e.g., Gaussian, first few moments, mixture or exponential family, finite dimensionality of the sufficient statistic) nor dynamical (e.g., Taylor series) approximations.
Submitted 18 March, 2024; v1 submitted 15 January, 2024;
originally announced January 2024.
-
2.5-D MHD Simulation of the Formation and Evolution of Plasmoids in Coronal Current Sheets
Authors:
Sripan Mondal,
Abhishek K Srivastava,
David I. Pontin,
Ding Yuan,
Eric R. Priest
Abstract:
In the present paper, using MPI-AMRVAC, we perform a 2.5-D numerical MHD simulation of the dynamics and associated thermodynamical evolution of an initially force-free Harris current sheet subjected to an external velocity perturbation under the condition of uniform resistivity. The amplitude of the magnetic field is taken to be 10 Gauss, typical of the solar corona. We impose a Gaussian velocity pulse across this current sheet mimicking the interaction of fast magnetoacoustic waves with a current sheet in the corona. This leads to a variety of dynamics and plasma processes in the current sheet, which is initially quasi-static. The initial pulse interacts with the current sheet and splits into a pair of counter-propagating wavefronts, which form a rarefied region and lead to inflow and a thinning of the current sheet. The thinning results in Petschek-type magnetic reconnection followed by tearing instability and plasmoid formation. The reconnection outflows containing outward-moving plasmoids have accelerated motions with velocities ranging from 105 to 303 km/s. The average temperature and density of the plasmoids are found to be 8 MK and twice the background density of the solar corona, respectively. These estimates of velocity, temperature and density of plasmoids are similar to values reported from various solar coronal observations. Therefore, we infer that the external triggering of a quasi-static current sheet by a single velocity pulse is capable of initiating magnetic reconnection and plasmoid formation in the absence of a localized enhancement of resistivity in the solar corona.
Submitted 13 January, 2024;
originally announced January 2024.
-
Universality in coupled stochastic Burgers systems with degenerate flux Jacobian
Authors:
Dipankar Roy,
Abhishek Dhar,
Konstantin Khanin,
Manas Kulkarni,
Herbert Spohn
Abstract:
In our contribution we study stochastic models in one space dimension with two conservation laws. One model is the coupled continuum stochastic Burgers equation, for which each current is a sum of quadratic non-linearities, linear diffusion, and spacetime white noise. The second model is a two-lane stochastic lattice gas. As distinct from previous studies, the two conserved densities are tuned such that the flux Jacobian, a $2 \times 2$ matrix, has coinciding eigenvalues. In the steady state, we investigate the spacetime correlations of the conserved fields and the time-integrated currents at the origin. For a particular choice of couplings, the dynamical exponent 3/2 is confirmed. Furthermore, at these couplings, the continuum stochastic Burgers equation and the lattice gas are demonstrated to be in the same universality class.
Submitted 12 January, 2024;
originally announced January 2024.
-
Improving the energy density and flexibility of PMN-0.3PT based piezoelectric generator by composite designing
Authors:
Abhishek Kumar,
Kaushik Das,
Amritendu Roy
Abstract:
Ceramic-based piezoelectric generators are known for their high energy density and poor flexibility. In this work, a v_r-PMN-0.3PT/PDMS 2-2 composite with optimum PMN-0.3PT content (v_r) was designed that demonstrated enhanced output energy density and superior mechanical flexibility under dynamic mechanical excitation. v_r-PMN-0.3PT/PDMS 2-2 composites with different PMN-PT reinforcement contents (v_r) and two different reinforcement configurations were fabricated and characterized for effective electro-elastic properties and energy harvesting response. In parallel, effective electromechanical properties were calculated using the finite element method and analytical models. Composites with parallel connectivity of the reinforcement phase demonstrated an enhanced piezoelectric charge coefficient even with low PMN-0.3PT content, whereas the relative permittivity and elastic modulus exhibited a linearly increasing trend with reinforcement volume fraction. At a compressive load of 50 N and 5 Hz frequency, a piezoelectric generator (PG) based on a v_r = 0.2, v_r-PMN-0.3PT/PDMS 2-2 composite with parallel connectivity produced a maximum short-circuit current density of 69 nA/cm$^2$ and an open-circuit electric field of 189 V/cm, translating to a maximum output power density of ~13 μW/cm$^3$, higher than that of a pristine PMN-0.3PT based piezoelectric generator. The estimated mechanical flexibility was found to be ~53 % higher than that of pristine PMN-0.3PT.
Submitted 11 January, 2024;
originally announced January 2024.
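As a quick consistency check on the figures quoted in the abstract above, the short-circuit current density and the open-circuit electric field can be multiplied directly. This is only an arithmetic sketch: treating that product as the quoted maximum output power density is an assumption, since the abstract does not say how the power density was derived, and the variable names are illustrative.

```python
# Arithmetic check of the piezoelectric generator figures quoted in the abstract.
# Assumption (not stated in the abstract): the power density is estimated as
# (short-circuit current density) x (open-circuit electric field).

current_density_A_per_cm2 = 69e-9   # 69 nA/cm^2, from the abstract
electric_field_V_per_cm = 189.0     # 189 V/cm, from the abstract

# A/cm^2 multiplied by V/cm gives W/cm^3.
power_density_W_per_cm3 = current_density_A_per_cm2 * electric_field_V_per_cm
power_density_uW_per_cm3 = power_density_W_per_cm3 * 1e6

print(f"{power_density_uW_per_cm3:.1f} uW/cm^3")  # prints "13.0 uW/cm^3"
```

Under that assumption the product agrees with the ~13 μW/cm$^3$ stated in the abstract.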
-
Dynamic Capital Requirements for Markov Decision Processes
Authors:
William B. Haskell,
Abhishek Gupta,
Shiping Shao
Abstract:
We build on the theory of capital requirements (CRs) to create a new framework for modeling dynamic risk preferences. The key question is how to evaluate the risk of a payoff stream sequentially as new information is revealed. In our model, we associate each payoff stream with a disbursement strategy and a premium schedule to form a triple of stochastic processes. We characterize risk preferences in terms of a single set that we call the risk frontier which characterizes acceptable triples. We then propose the generalized capital requirement (GCR) which evaluates the risk of a payoff stream by minimizing the premium schedule over acceptable triples. We apply this model to a risk-aware decision maker (DM) who controls a Markov decision process (MDP) and wants to find a policy to minimize the GCR of its payoff stream. The resulting GCR-MDP recovers many well-known risk-aware MDPs as special cases. To make this approach computationally viable, we obtain the temporal decomposition of the GCR in terms of the risk frontier. Then, we connect the temporal decomposition with the notion of an information state to compactly capture the dependence of DM's risk preferences on the problem history, where augmented dynamic programming can be used to compute an optimal policy. We report numerical experiments for the GCR-minimizing newsvendor.
Submitted 11 January, 2024;
originally announced January 2024.
-
The Undecidable Charge Gap and the Oil Drop Experiment
Authors:
Abhishek Majhi
Abstract:
Decision problems in physics have been an active field of research for quite a few decades, resulting in some interesting findings in recent years. However, such research investigations are based on a priori knowledge of theoretical computer science and the technical jargon of set theory. Here, I discuss a particular, but significant, instance of how decision problems in physics can be realized without such specific prerequisites. I expose a hitherto unnoticed contradiction, which can be posed as a decision problem, concerning the oil drop experiment and thereby resolve it by refining the notion of ``existence'' in physics. This consequently leads to the undecidability of the charge spectral gap through the notion of ``undecidable charges'', which is in tandem with the completeness condition of a theory as was stated by Einstein, Podolsky and Rosen in their seminal work. Decision problems can now be realized in connection to basic physics, in general, rather than quantum physics, in particular, as per some recent claims.
Submitted 10 January, 2024;
originally announced January 2024.
-
Machine Learning (ML)-assisted Beam Management in millimeter (mm)Wave Distributed Multiple Input Multiple Output (D-MIMO) systems
Authors:
Karthik R M,
Dhiraj Nagaraja Hegde,
Muris Sarajlic,
Abhishek Sarkar
Abstract:
Beam management (BM) protocols are critical for establishing and maintaining connectivity between network radio nodes and User Equipments (UEs). In Distributed Multiple Input Multiple Output (D-MIMO) systems, a number of access points (APs), coordinated by a central processing unit (CPU), serve a number of UEs. At mmWave frequencies, the problem of finding the best AP and beam to serve the UEs is challenging due to the large number of beams that need to be sounded with Downlink (DL) reference signals. The objective of this paper is to investigate whether the best AP/beam can be reliably inferred by sounding only a small subset of beams and leveraging AI/ML to infer the best beam/AP. We use Random Forest (RF), MissForest (MF) and conditional Generative Adversarial Networks (c-GAN) to demonstrate the performance benefits of inference.
Submitted 30 December, 2023;
originally announced January 2024.
-
Resolving non-equilibrium shape variations amongst millions of gold nanoparticles
Authors:
Zhou Shen,
Salah Awel,
Anton Barty,
Richard Bean,
Johan Bielecki,
Martin Bergemann,
Benedikt J. Daurer,
Tomas Ekeberg,
Armando D. Estillore,
Hans Fangohr,
Klaus Giewekemeyer,
Mark S. Hunter,
Mikhail Karnevskiy,
Richard A. Kirian,
Henry Kirkwood,
Yoonhee Kim,
Jayanath Koliyadu,
Holger Lange,
Romain Letrun,
Jannik Lübke,
Abhishek Mall,
Thomas Michelat,
Andrew J. Morgan,
Nils Roth,
Amit K. Samanta,
et al. (14 additional authors not shown)
Abstract:
Nanoparticles, exhibiting functionally relevant structural heterogeneity, are at the forefront of cutting-edge research. Now, high-throughput single-particle imaging (SPI) with x-ray free-electron lasers (XFELs) creates unprecedented opportunities for recovering the shape distributions of millions of particles that exhibit functionally relevant structural heterogeneity. To realize this potential, three challenges have to be overcome: (1) simultaneous parametrization of structural variability in real and reciprocal spaces; (2) efficiently inferring the latent parameters of each SPI measurement; (3) scaling up comparisons between $10^5$ structural models and $10^6$ XFEL-SPI measurements. Here, we describe how we overcame these three challenges to resolve the non-equilibrium shape distributions within millions of gold nanoparticles imaged at the European XFEL. These shape distributions allowed us to quantify the degree of asymmetry in these particles, discover a relatively stable `shape envelope' amongst nanoparticles, discern finite-size effects related to shape-controlling surfactants, and extrapolate nanoparticles' shapes to their idealized thermodynamic limit. Ultimately, these demonstrations show that XFEL SPI can help transform nanoparticle shape characterization from anecdotally interesting to statistically meaningful.
Submitted 9 January, 2024;
originally announced January 2024.
-
On the Target Detection Performance of a Molecular Communication Network with Multiple Mobile Nanomachines
Authors:
Nithin V. Sabu,
Abhishek K. Gupta
Abstract:
A network of nanomachines (NMs) can be used to build a target detection system for a variety of promising applications. They have the potential to detect toxic chemicals, infectious bacteria, and biomarkers of dangerous diseases such as cancer within the human body. Many diseases and health disorders can be detected early and efficiently treated in the future by utilizing these systems. To fully grasp the potential of these systems, mathematical analysis is required. This paper describes an analytical framework for modeling and analyzing the performance of target detection systems composed of multiple mobile nanomachines of varying sizes with passive/absorbing boundaries. We consider both direct contact detection, in which NMs must physically contact the target to detect it, and indirect sensing, in which NMs must detect the marker molecules emitted by the target. The detection performance of such systems is calculated for degradable and non-degradable targets, as well as mobile and stationary targets. The derived expressions provide various insights, such as the effect of NM density and target degradation on detection probability.
Submitted 9 January, 2024;
originally announced January 2024.
-
A Polynomial Kernel for Proper Helly Circular-arc Vertex Deletion
Authors:
Akanksha Agrawal,
Satyabrata Jana,
Abhishek Sahu
Abstract:
A proper Helly circular-arc graph is an intersection graph of a set of arcs on a circle such that none of the arcs properly contains any other arc and every set of pairwise intersecting arcs has a common intersection. The Proper Helly Circular-arc Vertex Deletion problem takes as input a graph $G$ and an integer $k$, and the goal is to check if we can remove at most $k$ vertices from the graph to obtain a proper Helly circular-arc graph; the parameter is $k$. Recently, Cao et al.~[MFCS 2023] obtained an FPT algorithm for this (and related) problem. In this work, we obtain a polynomial kernel for the problem.
Submitted 7 January, 2024;
originally announced January 2024.
-
Decision Making in Non-Stationary Environments with Policy-Augmented Search
Authors:
Ava Pettet,
Yunuo Zhang,
Baiting Luo,
Kyle Wray,
Hendrik Baier,
Aron Laszka,
Abhishek Dubey,
Ayan Mukhopadhyay
Abstract:
Sequential decision-making under uncertainty is present in many important problems. Two popular approaches for tackling such problems are reinforcement learning and online search (e.g., Monte Carlo tree search). While the former learns a policy by interacting with the environment (typically done before execution), the latter uses a generative model of the environment to sample promising action trajectories at decision time. Decision-making is particularly challenging in non-stationary environments, where the environment in which an agent operates can change over time. Both approaches have shortcomings in such settings -- on the one hand, policies learned before execution become stale when the environment changes and relearning takes both time and computational effort. Online search, on the other hand, can return sub-optimal actions when there are limitations on allowed runtime. In this paper, we introduce \textit{Policy-Augmented Monte Carlo tree search} (PA-MCTS), which combines action-value estimates from an out-of-date policy with an online search using an up-to-date model of the environment. We prove theoretical results showing conditions under which PA-MCTS selects the one-step optimal action and also bound the error accrued while following PA-MCTS as a policy. We compare and contrast our approach with AlphaZero, another hybrid planning approach, and Deep Q Learning on several OpenAI Gym environments. Through extensive experiments, we show that under non-stationary settings with limited time constraints, PA-MCTS outperforms these baselines.
Submitted 20 January, 2024; v1 submitted 6 January, 2024;
originally announced January 2024.
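The PA-MCTS abstract above describes combining action-value estimates from an out-of-date policy with those from an online search. A minimal sketch of one way such a combination could look is given below. The convex weighting, the parameter name `alpha`, the function name, and the toy action values are all assumptions made for illustration; the abstract does not specify the combination rule.

```python
from typing import Dict, Hashable, Tuple


def combine_action_values(
    q_policy: Dict[Hashable, float],
    q_search: Dict[Hashable, float],
    alpha: float,
) -> Tuple[Hashable, Dict[Hashable, float]]:
    """Blend stale-policy and online-search action values, then act greedily.

    alpha = 1.0 trusts only the (possibly out-of-date) policy estimates;
    alpha = 0.0 trusts only the online search estimates.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Only actions scored by both sources can be blended.
    actions = q_policy.keys() & q_search.keys()
    if not actions:
        raise ValueError("no action has estimates from both sources")
    blended = {
        a: alpha * q_policy[a] + (1.0 - alpha) * q_search[a] for a in actions
    }
    best_action = max(blended, key=blended.get)
    return best_action, blended


# Toy example: the stale policy prefers "left", the online search prefers "right".
q_policy = {"left": 1.0, "right": 0.2}
q_search = {"left": 0.1, "right": 0.9}

action_mostly_search, _ = combine_action_values(q_policy, q_search, alpha=0.25)
action_policy_only, _ = combine_action_values(q_policy, q_search, alpha=1.0)
print(action_mostly_search, action_policy_only)  # prints "right left"
```

In this sketch, a single weight controls how much the stale estimates are trusted, so lowering it shifts the chosen action toward whatever the up-to-date search favors.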
-
LLM Augmented LLMs: Expanding Capabilities through Composition
Authors:
Rachit Bansal,
Bidisha Samanta,
Siddharth Dalmia,
Nitish Gupta,
Shikhar Vashishth,
Sriram Ganapathy,
Abhishek Bapna,
Prateek Jain,
Partha Talukdar
Abstract:
Foundational models with billions of parameters which have been trained on large corpora of data have demonstrated non-trivial skills in a variety of domains. However, due to their monolithic structure, it is challenging and expensive to augment them or impart new skills. On the other hand, due to their adaptation abilities, several new instances of these models are being trained towards new domains and tasks. In this work, we study the problem of efficient and practical composition of existing foundation models with more specific models to enable newer capabilities. To this end, we propose CALM -- Composition to Augment Language Models -- which introduces cross-attention between models to compose their representations and enable new capabilities. Salient features of CALM are: (i) Scales up LLMs on new tasks by 're-using' existing LLMs along with a few additional parameters and data, (ii) Existing model weights are kept intact, and hence preserves existing capabilities, and (iii) Applies to diverse domains and settings. We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13\% on tasks like translation into English and arithmetic reasoning for low-resource languages. Similarly, when PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40\% over the base model for code generation and explanation tasks -- on-par with fully fine-tuned counterparts.
Submitted 4 January, 2024;
originally announced January 2024.
-
Act as You Learn: Adaptive Decision-Making in Non-Stationary Markov Decision Processes
Authors:
Baiting Luo,
Yunuo Zhang,
Abhishek Dubey,
Ayan Mukhopadhyay
Abstract:
A fundamental (and largely open) challenge in sequential decision-making is dealing with non-stationary environments, where exogenous environmental conditions change over time. Such problems are traditionally modeled as non-stationary Markov decision processes (NSMDPs). However, existing approaches for decision-making in NSMDPs have two major shortcomings: first, they assume that the updated environmental dynamics at the current time are known (although future dynamics can change); and second, planning is largely pessimistic, i.e., the agent acts ``safely'' to account for the non-stationary evolution of the environment. We argue that both these assumptions are invalid in practice -- updated environmental conditions are rarely known, and as the agent interacts with the environment, it can learn about the updated dynamics and avoid being pessimistic, at least in states whose dynamics it is confident about. We present a heuristic search algorithm called \textit{Adaptive Monte Carlo Tree Search (ADA-MCTS)} that addresses these challenges. We show that the agent can learn the updated dynamics of the environment over time and then act as it learns, i.e., if the agent is in a region of the state space about which it has updated knowledge, it can avoid being pessimistic. To quantify ``updated knowledge,'' we disintegrate the aleatoric and epistemic uncertainty in the agent's updated belief and show how the agent can use these estimates for decision-making. We compare the proposed approach with multiple state-of-the-art approaches in decision-making across multiple well-established open-source problems and empirically show that our approach is faster and highly adaptive without sacrificing safety.
Submitted 21 January, 2024; v1 submitted 3 January, 2024;
originally announced January 2024.
-
Zero-shot Active Learning Using Self Supervised Learning
Authors:
Abhishek Sinha,
Shreya Singh
Abstract:
Deep learning algorithms are often said to be data hungry. The performance of such algorithms generally improves as more and more annotated data is fed into the model. While collecting unlabelled data is easier (as it can be scraped easily from the internet), annotating it is a tedious and expensive task. Given a fixed budget available for data annotation, Active Learning helps select the best subset of data for annotation, such that the deep learning model, when trained over that subset, will have maximum generalization performance under this budget. In this work, we aim to propose a new Active Learning approach that is model-agnostic and does not require an iterative process. We aim to leverage features learnt through self-supervision for the task of Active Learning. The benefit of self-supervised learning is that one can get a useful feature representation of the input data without having any annotation.
Submitted 3 January, 2024;
originally announced January 2024.
-
Multiform Evolution for High-Dimensional Problems with Low Effective Dimensionality
Authors:
Yaqing Hou,
Mingyang Sun,
Abhishek Gupta,
Yaochu Jin,
Haiyin Piao,
Hongwei Ge,
Qiang Zhang
Abstract:
In this paper, we scale evolutionary algorithms to high-dimensional optimization problems that deceptively possess a low effective dimensionality (certain dimensions do not significantly affect the objective function). To this end, an instantiation of the multiform optimization paradigm is presented, where multiple low-dimensional counterparts of a target high-dimensional task are generated via random embeddings. Since the exact relationship between the auxiliary (low-dimensional) tasks and the target is a priori unknown, a multiform evolutionary algorithm is developed for unifying all formulations into a single multi-task setting. The resultant joint optimization enables the target task to efficiently reuse solutions evolved across various low-dimensional searches via cross-form genetic transfers, hence speeding up overall convergence characteristics. To validate the overall efficacy of our proposed algorithmic framework, comprehensive experimental studies are carried out on well-known continuous benchmark functions as well as a set of practical problems in the hyper-parameter tuning of machine learning models and deep learning models in classification tasks and Predator-Prey games, respectively.
Submitted 30 December, 2023;
originally announced January 2024.
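The abstract above mentions generating several low-dimensional counterparts of a high-dimensional task via random embeddings. The sketch below illustrates only that step, under stated assumptions: a Gaussian embedding matrix, clipping to a box, a made-up objective in which just two of 100 coordinates matter, and plain random search standing in for the evolutionary algorithm. The cross-form genetic transfer described in the abstract is not modeled, and all names are hypothetical.

```python
import random
from typing import List, Tuple

HIGH_DIM = 100  # ambient dimensionality of the target task (assumed)
LOW_DIM = 2     # dimensionality of each auxiliary task (assumed)


def make_random_embedding(high_dim: int, low_dim: int, seed: int) -> List[List[float]]:
    """Return a high_dim x low_dim matrix with i.i.d. standard normal entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(low_dim)] for _ in range(high_dim)]


def lift(embedding: List[List[float]], y: List[float],
         lower: float = -1.0, upper: float = 1.0) -> List[float]:
    """Map a low-dimensional point y to x = A y, clipped to the box [lower, upper]."""
    x = []
    for row in embedding:
        value = sum(a * b for a, b in zip(row, y))
        x.append(min(upper, max(lower, value)))
    return x


def objective(x: List[float]) -> float:
    """Hypothetical target: only the first two coordinates affect the value."""
    return (x[0] - 0.3) ** 2 + (x[1] + 0.2) ** 2


def random_search(embedding: List[List[float]], iters: int,
                  seed: int) -> Tuple[List[float], float]:
    """Search one low-dimensional form; a stand-in for an evolutionary optimizer."""
    rng = random.Random(seed)
    best_y: List[float] = [0.0] * LOW_DIM
    best_val = objective(lift(embedding, best_y))
    for _ in range(iters):
        y = [rng.uniform(-1.0, 1.0) for _ in range(LOW_DIM)]
        val = objective(lift(embedding, y))
        if val < best_val:
            best_y, best_val = y, val
    return best_y, best_val


# Build three alternative low-dimensional forms of the same task and search each.
embeddings = [make_random_embedding(HIGH_DIM, LOW_DIM, seed) for seed in range(3)]
results = [random_search(A, iters=200, seed=10 + i) for i, A in enumerate(embeddings)]
best_val = min(val for _, val in results)
print(f"best objective over all forms: {best_val:.4f}")
```

Each form searches only two variables even though the objective is defined on 100, which is the property the multiform setting exploits. How solutions are shared between forms is left out here.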
-
Coupled pyroelectric-photovoltaic effect in 2D ferroelectric $α$-In$_2$Se$_3$
Authors:
Michael Uzhansky,
Abhishek Rakshit,
Yoav Kalcheim,
Elad Koren
Abstract:
Pyroelectric and photovoltaic effects are vital in cutting-edge thermal imaging, infrared sensors, and thermal and solar energy harvesting. Recent advances revealed the great potential of the bulk photovoltaic effect in two-dimensional (2D) semiconductor-ferroelectric materials to enable reconfigurable p-n junction operation with the potential to surpass the Shockley-Queisser limit. Moreover, the extremely low thickness, high thermal conductivity, dangling-bond-free interface, and room-temperature stable ferroelectricity down to a single monolayer endow 2D ferroelectrics with a superior pyroelectric figure of merit. Herein, we performed direct pyroelectric measurements of 2D $α$-In$_2$Se$_3$ under dark and light conditions. The results reveal a gigantic pyroelectric coefficient of 30.7 mC/m$^2$K and a figure of merit of 135.9 m$^2$/C. In addition, we perform temperature-dependent short-circuit photovoltaic response measurements in which the excess photocurrent is modulated in proportion to the temperature variations due to the induced in-plane potential variations. Consequently, the discovered pyroelectric-photovoltaic effect allows the combination of direct temperature (photovoltaic) and temperature-derivative (pyroelectric) sensing. Finally, we utilized the intercoupled ferroelectricity of In$_2$Se$_3$ to realize a non-volatile, self-powered photovoltaic memory operation, demonstrating stable short-circuit current switching with a decent $10^3$ ON-OFF ratio. The coupled pyroelectric-photovoltaic effect, along with the reconfigurable photocurrent, paves the way for a novel monolithic device technology with integrated thermal and optical response, in-memory logic and energy harvesting.
Submitted 28 December, 2023;
originally announced December 2023.
-
Multilingual Bias Detection and Mitigation for Indian Languages
Authors:
Ankita Maity,
Anubhav Sharma,
Rudra Dhar,
Tushar Abhishek,
Manish Gupta,
Vasudeva Varma
Abstract:
A lack of diverse perspectives causes neutrality bias in Wikipedia content, leading to millions of readers worldwide being exposed to potentially inaccurate information. Hence, neutrality bias detection and mitigation is a critical problem. Although previous studies have proposed effective solutions for English, no work exists for Indian languages. First, we contribute two large datasets, mWikiBias and mWNC, covering 8 languages, for the bias detection and mitigation tasks respectively. Next, we investigate the effectiveness of popular multilingual Transformer-based models for the two tasks by modeling detection as a binary classification problem and mitigation as a style transfer problem. We make the code and data publicly available.
Submitted 23 December, 2023;
originally announced December 2023.