Search | arXiv e-print repository

Exploring Altermagnetism in Orthorhombic $Pnma$ structure through Group Theory and DFT Calculations

Authors: Suman Rooj, Sugandha Saxena, Nirmal Ganguli

Abstract: Antiferromagnetism, initially considered interesting but useless, recently emerged as one of the most promising magnetic phases for technology. Recently, a low-symmetry antiferromagnetic phase, known as the altermagnetic phase, has been discovered, where no time reversal ($\mathcal{T}$) symmetry is observed in spite of a vanishing net magnetization, leading to non-degenerate bands from the opposite magnetic sublattices. In this work, we consider two representatives of the orthorhombic $Pnma$ space group, namely BiFeO$_3$ and CaMnO$_3$, and find an altermagnetic lowest-energy phase in both from our density functional theory calculations. We find a substantial spin-splitting in both systems along a high-symmetry path in the Brillouin zone without considering the spin-orbit interaction (SOI). Detailed features of the band dispersion obtained from our calculation confirm the lifting of sublattice spin degeneracy only in the $k_y$-$k_z$ plane while preserving the spin degeneracy in the other planes of the Brillouin zone. We provide a comprehensive symmetry analysis based on the magnetic space group (MSG) to explain our DFT findings and an insightful symmetry-allowed model Hamiltonian, which qualitatively agrees with our results. Additionally, we extend our symmetry analysis to encompass two other potential MSGs within the $Pnma$ space group that may host the spin-splitting phenomenon without considering SOI and the likely form of their Hamiltonian. These detailed studies pave the way for a deeper understanding of the spin-splitting phenomena within the $Pnma$ space group, offering insights into the intricate interplay between symmetry and electronic as well as magnetic properties.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 10 pages, 6 figures

arXiv:2405.16616 [pdf, other]

DPHGNN: A Dual Perspective Hypergraph Neural Networks

Authors: Siddhant Saxena, Shounak Ghatak, Raghu Kolla, Debashis Mukherjee, Tanmoy Chakraborty

Abstract: Message passing on hypergraphs has been a standard framework for learning higher-order correlations between hypernodes. Recently-proposed hypergraph neural networks (HGNNs) can be categorized into spatial and spectral methods based on their design choices. In this work, we analyze the impact of change in hypergraph topology on the suboptimal performance of HGNNs and propose DPHGNN, a novel dual-perspective HGNN that introduces equivariant operator learning to capture lower-order semantics by inducing topology-aware spatial and spectral inductive biases. DPHGNN employs a unified framework to dynamically fuse lower-order explicit feature representations from the underlying graph into the super-imposed hypergraph structure. We benchmark DPHGNN over eight benchmark hypergraph datasets for the semi-supervised hypernode classification task and obtain superior performance compared to seven state-of-the-art baselines. We also provide a theoretical framework and a synthetic hypergraph isomorphism test to express the power of spatial HGNNs and quantify the expressivity of DPHGNN beyond the Generalized Weisfeiler Leman (1-GWL) test. Finally, DPHGNN was deployed by our partner e-commerce company for the Return-to-Origin (RTO) prediction task, which shows ~7% higher macro F1-Score than the best baseline.

Submitted 26 May, 2024; originally announced May 2024.

Comments: Accepted in SIGKDD'24 -- Research Track

arXiv:2405.15811 [pdf, other]

Maximizing Weighted Dominance in the Plane

Authors: Waseem Akram, Sanjeev Saxena

Abstract: Let P be a set of n weighted points, Q be a set of m unweighted points in the plane, and k a non-negative integer. We consider the problem of computing a subset $Q'\subseteq Q$ with size at most k such that the sum of the weights of the points of P dominated by at least one point in the set Q' is maximized. A point q in the plane dominates another point p if and only if $x(q)\ge x(p)$ and $y(q)\ge y(p)$, and at least one inequality is strict. We present a solution to the problem that takes $O(n+m)$-space and $O(k \min\{n+m, \frac{n}{k}+m^2\}\log m)$-time. We (conditionally) improve upon the existing result (the bounds of our solution are interesting when $m = o(\sqrt{n})$). Moreover, we also present a simple algorithm solving the problem in $O(km^2+n\log m)$-time and $O(n+m)$-space. The bounds of the algorithm are interesting when $m = o(\sqrt{n})$.
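The dominance relation and the objective stated in the abstract above can be illustrated with a small brute-force sketch. This is not the algorithm from the paper and does not achieve its time bounds; it enumerates every subset of size at most k and is only meant to make the problem statement concrete on tiny inputs. The function names and the tuple layout (x, y, weight) are assumptions chosen for illustration, and weights are assumed to be non-negative.

```python
from itertools import combinations


def dominates(q, p):
    """True if q dominates p: x(q) >= x(p), y(q) >= y(p), at least one strict."""
    return q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])


def covered_weight(weighted_points, chosen):
    """Total weight of points (x, y, w) dominated by at least one chosen point."""
    return sum(
        w
        for (x, y, w) in weighted_points
        if any(dominates(q, (x, y)) for q in chosen)
    )


def best_subset_bruteforce(weighted_points, candidates, k):
    """Exhaustively search subsets of `candidates` of size at most k.

    Returns (best_weight, best_subset). Exponential in k; illustration only.
    The empty subset (weight 0) is the starting point.
    """
    best = (0, ())
    for size in range(1, min(k, len(candidates)) + 1):
        for subset in combinations(candidates, size):
            weight = covered_weight(weighted_points, subset)
            if weight > best[0]:
                best = (weight, subset)
    return best
```

For example, with P = [(1, 1, 5), (2, 3, 2), (4, 1, 3)] and Q = [(2, 2), (4, 2), (1, 4)], no point of Q dominates (2, 3), so that weight can never be collected regardless of k.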

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.04333 [pdf]

A Fourth Wave of Open Data? Exploring the Spectrum of Scenarios for Open Data and Generative AI

Authors: Hannah Chafetz, Sampriti Saxena, Stefaan G. Verhulst

Abstract: Since late 2022, generative AI has taken the world by storm, with widespread use of tools including ChatGPT, Gemini, and Claude. Generative AI and large language model (LLM) applications are transforming how individuals find and access data and knowledge. However, the intricate relationship between open data and generative AI, and the vast potential it holds for driving innovation in this field remain underexplored areas. This white paper seeks to unpack the relationship between open data and generative AI and explore possible components of a new Fourth Wave of Open Data: Is open data becoming AI ready? Is open data moving towards a data commons approach? Is generative AI making open data more conversational? Will generative AI improve open data quality and provenance? Towards this end, we provide a new Spectrum of Scenarios framework. This framework outlines a range of scenarios in which open data and generative AI could intersect and what is required from a data quality and provenance perspective to make open data ready for those specific scenarios. These scenarios include: pretraining, adaptation, inference and insight generation, data augmentation, and open-ended exploration. Through this process, we found that in order for data holders to embrace generative AI to improve open data access and develop greater insights from open data, they first must make progress around five key areas: enhance transparency and documentation, uphold quality and integrity, promote interoperability and standards, improve accessibility and usability, and address ethical considerations.

Submitted 7 May, 2024; originally announced May 2024.

Comments: 58 pages

arXiv:2404.13113 [pdf, other]

Towards quantum computing for clinical trial design and optimization: A perspective on new opportunities and challenges

Authors: Hakan Doga, M. Emre Sahin, Joao Bettencourt-Silva, Anh Pham, Eunyoung Kim, Alan Andress, Sudhir Saxena, Aritra Bose, Laxmi Parida, Jan Lukas Robertus, Hideaki Kawaguchi, Radwa Soliman, Daniel Blankenberg

Abstract: Clinical trials are pivotal in the drug discovery process to determine the safety and efficacy of a drug candidate. The high failure rates of these trials are attributed to deficiencies in clinical model development and protocol design. Improvements in the clinical drug design process could therefore yield significant benefits for all stakeholders involved. This paper examines the current challenges faced in clinical trial design and optimization, reviews established classical computational approaches, and introduces quantum algorithms aimed at enhancing these processes. Specifically, the focus is on three critical aspects: clinical trial simulations, site selection, and cohort identification. This study aims to provide a comprehensive framework that leverages quantum computing to innovate and refine the efficiency and effectiveness of clinical trials.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2403.00952 [pdf, other]

MediSwift: Efficient Sparse Pre-trained Biomedical Language Models

Authors: Vithursan Thangarasa, Mahmoud Salem, Shreyas Saxena, Kevin Leong, Joel Hestness, Sean Lie

Abstract: Large language models (LLMs) are typically trained on general source data for various domains, but a recent surge in domain-specific LLMs has shown their potential to outperform general-purpose models in domain-specific tasks (e.g., biomedicine). Although domain-specific pre-training enhances efficiency and leads to smaller models, the computational costs of training these LLMs remain high, posing budgeting challenges. We introduce MediSwift, a suite of biomedical LMs that leverage sparse pre-training on domain-specific biomedical text data. By inducing up to 75% weight sparsity during the pre-training phase, MediSwift achieves a 2-2.5x reduction in training FLOPs. Notably, all sparse pre-training was performed on the Cerebras CS-2 system, which is specifically designed to realize the acceleration benefits from unstructured weight sparsity, thereby significantly enhancing the efficiency of the MediSwift models. Through subsequent dense fine-tuning and strategic soft prompting, MediSwift models outperform existing LLMs up to 7B parameters on biomedical tasks, setting new benchmarks with respect to efficiency-accuracy on tasks such as PubMedQA. Our results show that sparse pre-training, along with dense fine-tuning and soft prompting, offers an effective method for creating high-performing, computationally efficient models in specialized domains.

Submitted 1 March, 2024; originally announced March 2024.

arXiv:2402.12531 [pdf, other]

Improving Deep Generative Models on Many-To-One Image-to-Image Translation

Authors: Sagar Saxena, Mohammad Nayeem Teli

Abstract: Deep generative models have been applied to multiple applications in image-to-image translation. Generative Adversarial Networks and Diffusion Models have presented impressive results, setting new state-of-the-art results on these tasks. Most methods have symmetric setups across the different domains in a dataset. These methods assume that all domains have either multiple modalities or only one modality. However, there are many datasets that have a many-to-one relationship between two domains. In this work, we first introduce a Colorized MNIST dataset and a Color-Recall score that can provide a simple benchmark for evaluating models on many-to-one translation. We then introduce a new asymmetric framework to improve existing deep generative models on many-to-one image-to-image translation. We apply this framework to StarGAN V2 and show that in both unsupervised and semi-supervised settings, the performance of this new model improves on many-to-one image-to-image translation.

Submitted 22 February, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: 11 pages, 6 figures; template format corrected

arXiv:2402.12247 [pdf, other]

Finite-temperature grain boundary properties from quasistatic atomistics

Authors: Miguel Spínola, Shashank Saxena, Prateek Gupta, Brandon Runnels, Dennis M. Kochmann

Abstract: Grain boundary (GB) properties greatly influence the mechanical, electrical, and thermal response of polycrystalline materials. Most computational studies of GB properties at finite temperatures use molecular dynamics (MD), which is computationally expensive, limited in the range of accessible timescales, and requires cumbersome techniques like thermodynamic integration to estimate free energies. This restricts the reasonable computation (without incurring excessive computational expense) of GB properties to regimes that are often unrealistic, such as zero temperature or extremely high strain rates. Consequently, there is a need for simulation methodology that avoids the timescale limitations of MD, while providing reliable estimates of GB properties. The Gaussian Phase-Packet (GPP) method is a temporal coarse-graining technique that can predict relaxed atomic structures at finite temperature in the quasistatic limit. This work applies GPP, combined with the quasiharmonic approximation for computing the free energy, to the problem of determining the free energy and shear coupling factor of grain boundaries in metals over a range of realistic temperatures. Validation is achieved by comparison to thermodynamic integration, which confirms that the presented approach captures relaxed-energy GB structures and shear coupling factors at finite temperature with a high degree of accuracy.

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2401.14502 [pdf, other]

MResT: Multi-Resolution Sensing for Real-Time Control with Vision-Language Models

Authors: Saumya Saxena, Mohit Sharma, Oliver Kroemer

Abstract: Leveraging sensing modalities across diverse spatial and temporal resolutions can improve performance of robotic manipulation tasks. Multi-spatial resolution sensing provides hierarchical information captured at different spatial scales and enables both coarse and precise motions. Simultaneously, multi-temporal resolution sensing enables the agent to exhibit high reactivity and real-time control. In this work, we propose a framework, MResT (Multi-Resolution Transformer), for learning generalizable language-conditioned multi-task policies that utilize sensing at different spatial and temporal resolutions using networks of varying capacities to effectively perform real-time control of precise and reactive tasks. We leverage off-the-shelf pretrained vision-language models to operate on low-frequency global features along with small non-pretrained models to adapt to high-frequency local feedback. Through extensive experiments in 3 domains (coarse, precise and dynamic manipulation tasks), we show that our approach significantly improves (2X on average) over recent multi-task baselines. Further, our approach generalizes well to visual and geometric variations in target objects and to varying interaction forces.

Submitted 25 January, 2024; originally announced January 2024.

Comments: CoRL'23, Project website: http://tinyurl.com/multi-res-realtime-control

arXiv:2312.13252 [pdf, other]

Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model

Authors: Saurabh Saxena, Junhwa Hur, Charles Herrmann, Deqing Sun, David J. Fleet

Abstract: While methods for monocular depth estimation have made significant strides on standard benchmarks, zero-shot metric depth estimation remains unsolved. Challenges include the joint modeling of indoor and outdoor scenes, which often exhibit significantly different distributions of RGB and depth, and the depth-scale ambiguity due to unknown camera intrinsics. Recent work has proposed specialized multi-head architectures for jointly modeling indoor and outdoor scenes. In contrast, we advocate a generic, task-agnostic diffusion model, with several advancements such as log-scale depth parameterization to enable joint modeling of indoor and outdoor scenes, conditioning on the field-of-view (FOV) to handle scale ambiguity and synthetically augmenting FOV during training to generalize beyond the limited camera intrinsics in training datasets. Furthermore, by employing a more diverse training mixture than is common, and an efficient diffusion parameterization, our method, DMD (Diffusion for Metric Depth) achieves a 25\% reduction in relative error (REL) on zero-shot indoor and 33\% reduction on zero-shot outdoor datasets over the current SOTA using only a small number of denoising steps. For an overview see https://diffusion-vision.github.io/dmd

Submitted 20 December, 2023; originally announced December 2023.

arXiv:2312.04560 [pdf, other]

NeRFiller: Completing Scenes via Generative 3D Inpainting

Authors: Ethan Weber, Aleksander Hołyński, Varun Jampani, Saurabh Saxena, Noah Snavely, Abhishek Kar, Angjoo Kanazawa

Abstract: We propose NeRFiller, an approach that completes missing portions of a 3D capture via generative 3D inpainting using off-the-shelf 2D visual generative models. Often parts of a captured 3D scene or object are missing due to mesh reconstruction failures or a lack of observations (e.g., contact regions, such as the bottom of objects, or hard-to-reach areas). We approach this challenging 3D inpainting problem by leveraging a 2D inpainting diffusion model. We identify a surprising behavior of these models, where they generate more 3D consistent inpaints when images form a 2$\times$2 grid, and show how to generalize this behavior to more than four images. We then present an iterative framework to distill these inpainted regions into a single consistent 3D scene. In contrast to related works, we focus on completing scenes rather than deleting foreground objects, and our approach does not require tight 2D object masks or text. We compare our approach to relevant baselines adapted to our setting on a variety of scenes, where NeRFiller creates the most 3D consistent and plausible scene completions. Our project page is at https://ethanweber.me/nerfiller.

Submitted 7 December, 2023; originally announced December 2023.

Comments: Project page: https://ethanweber.me/nerfiller

arXiv:2311.13878 [pdf, other]

Minimizing Factual Inconsistency and Hallucination in Large Language Models

Authors: Muneeswaran I, Shreya Saxena, Siva Prasad, M V Sai Prakash, Advaith Shankar, Varun V, Vishal Vaddina, Saisubramaniam Gopalakrishnan

Abstract: Large Language Models (LLMs) are widely used in critical fields such as healthcare, education, and finance due to their remarkable proficiency in various language-related tasks. However, LLMs are prone to generating factually incorrect responses or "hallucinations," which can lead to a loss of credibility and trust among users. To address this issue, we propose a multi-stage framework that generates the rationale first, verifies and refines incorrect ones, and uses them as supporting references to generate the answer. The generated rationale enhances the transparency of the answer and our framework provides insights into how the model arrived at this answer, by using this rationale and the references to the context. In this paper, we demonstrate its effectiveness in improving the quality of responses to drug-related inquiries in the life sciences industry. Our framework improves traditional Retrieval Augmented Generation (RAG) by enabling OpenAI GPT-3.5-turbo to be 14-25% more faithful and 16-22% more accurate on two datasets. Furthermore, fine-tuning samples based on our framework improves the accuracy of smaller open-access LLMs by 33-42% and competes with RAG on commercial models.

Submitted 23 November, 2023; originally announced November 2023.

arXiv:2310.11020 [pdf]

On the Relationship of Dichotomy of Mars and Occurrence of Dust Devils with Crustal Magnetic Fields

Authors: Shivam Saxena, Jayesh P. Pabari

Abstract: A dichotomy is a partition or separation of a whole into two parts; on Mars, the dichotomy between the southern and northern regions is a very important feature. Another thing that makes Mars special is the occurrence of dust devils. We survey dust devil occurrence over the whole of Mars in different Martian years. We create a 2D map of the Martian surface and plot the coordinates where dust devils were captured during their activity, as well as the locations where they left tracks behind after passing, commonly referred to as dust devil tracks. We thus plot them in two categories: direct observations and indirect observations of dust devils. The plotted locations (coordinates) of the dust devils (DDs) vary with the dichotomy in a serpent-like pattern: we find that most dust devils occur on the dichotomy and in its nearby regions, following the serpent-like trajectory of the Martian dichotomy. A further observation is that these locations lie in the remanent magnetic field zones of Mars, referred to as the crustal magnetic fields of Mars. This previously unknown relationship between crustal magnetic fields, the dichotomy of Mars, and the occurrence of dust devils is examined here.

Submitted 17 October, 2023; originally announced October 2023.

Comments: 14 Pages, 6 Figures, 2 Tables

arXiv:2309.04477 [pdf, other]

High pressure behaviour of the magnetic van der Waals molecular framework Ni(NCS)$_2$

Authors: Madeleine Geers, David M. Jarvis, Cheng Liu, Siddharth S. Saxena, Jem Pitcairn, Emily Myatt, Sebastian A. Hallweger, Silva M. Kronawitter, Gregor Kieslich, Sanliang Ling, Andrew B. Cairns, Dominik Daisenberger, Oscar Fabelo, Laura Cañadillas-Delgado, Matthew J. Cliffe

Abstract: Two-dimensional materials offer a unique range of magnetic, electronic and mechanical properties which can be controlled by external stimuli. Pressure is a particularly important stimulus, as it can be achieved readily and can produce large responses, especially in low-dimensional materials. In this paper we explore the pressure-dependence of the structural and magnetic properties of a two-dimensional van der Waals (vdW) molecular framework antiferromagnet with ferromagnetic layers, Ni(NCS)$_2$, up to 8.4 kbar. Through a combination of X-ray and neutron diffraction analysis, we find that Ni(NCS)$_2$ is significantly more compressible than comparable vdW metal halides, and its response is anisotropic not only out of the plane, but also within the layers. Using bulk magnetisation and neutron diffraction data, we show that the ambient layered antiferromagnetic phase is maintained up to the largest investigated pressure, but with an enhanced Néel temperature, $T_\mathrm{N}$, ($\Delta T_\mathrm{N} / T_\mathrm{N} = +19$ %) and a large pressure sensitivity ($Q = \frac{1}{T_\mathrm{N}} \frac{\mathrm{d}T_\mathrm{N}}{\mathrm{d}P} = +2.3$ % kbar$^{-1}$), one of the larger values of magnetic pressure responsiveness for a vdW material. Density functional theory calculations suggest that this is due to increasing three-dimensionality. These results provide some of the first insights into the pressure response of molecular framework vdW magnets and suggest investigation of other molecular framework vdW magnets might uncover contenders for future pressure-switchable devices.
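As a rough consistency check on the two figures quoted in the abstract above, one can compare the total fractional change in $T_\mathrm{N}$ with the quoted pressure sensitivity $Q$. The sketch below assumes that $T_\mathrm{N}$ rises approximately linearly with pressure over the full range up to 8.4 kbar; that linearity is an assumption made here for illustration and is not stated in the abstract.

```latex
% Assumption: T_N varies roughly linearly with P between ambient pressure and 8.4 kbar,
% so the derivative dT_N/dP is approximated by the finite difference Delta T_N / Delta P.
\[
Q \approx \frac{1}{T_\mathrm{N}} \frac{\Delta T_\mathrm{N}}{\Delta P}
  = \frac{19\,\%}{8.4\ \mathrm{kbar}}
  \approx 2.3\,\%\ \mathrm{kbar}^{-1}
\]
```

Under that assumption the average slope agrees with the quoted value of $Q$ to two significant figures.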

Submitted 4 October, 2023; v1 submitted 3 August, 2023; originally announced September 2023.

Comments: 10 pages, 7 figures

arXiv:2308.13624 [pdf]

Field Testing of Residential Bidirectional Electric Vehicle Charger for Power System Applications

Authors: Shivam Saxena, Hany Farag, Khunsha Nasr, Leigh St. Hilaire

Abstract: Bidirectional electric vehicle (EV) charging is a technology that is gaining rapid popularity due to its ability to provide economic and environmental benefits to both EV owners and power system operators (PSOs). Using the EV as a flexible source of energy, an EV owner can provide power to homes/buildings, or even participate in grid services such as demand response and frequency regulation. However, there is a lack of real-world testing and validation for bidirectional charging technology, particularly in the residential segment. As such, this paper presents real-world field testing of a bidirectional EV charger deployed in a home. Control software is developed to dispatch the EV according to static setpoints, as well as automated load following, and its accuracy and responsiveness are reported. The results of the testing with the charger and 2019 Nissan Leaf combination indicate a responsiveness of 6-8 seconds and accuracy of over 99%, which suggests feasible participation for applications such as load following, arbitrage, and demand response.

Submitted 25 August, 2023; originally announced August 2023.

Comments: 5 pages

ACM Class: J.2

arXiv:2308.12047 [pdf, other]

doi 10.1364/OPTICA.503936

Terahertz imaging through emissivity control

Authors: Michal Mrnka, Harry Penketh, Ian R. Hooper, Sonal Saxena, Nicholas E. Grant, John D. Murphy, David B. Phillips, Euan Hendry

Abstract: Adoption of terahertz technologies is hindered by the lack of cost-effective THz sources. Here we demonstrate a fundamentally new way to generate and control THz radiation, via spatio-temporal emissivity modulation. By patterning the optical photoexcitation of a surface-passivated silicon wafer, we locally control the free-electron density, and thereby pattern the wafer's emissivity in the THz part of the electromagnetic spectrum. We show how this unconventional source of controllable THz radiation enables a new form of incoherent computational THz imaging. We use it to image various concealed objects, demonstrating that this scheme has the penetrating capability of state-of-the-art THz imaging approaches, without the requirement of femtosecond pulsed laser sources. Furthermore, the incoherent nature of thermal radiation also ensures the obtained images are free of interference artifacts. Our spatio-temporal emissivity control paves the way towards a new family of long-wavelength structured illumination, imaging and spectroscopy systems.

Submitted 23 August, 2023; originally announced August 2023.

arXiv:2306.02794 [pdf]

Long-term normalized difference urban index (NDUI) data time series for urban studies

Authors: Manmeet Singh, Subhasis Ghosh, Harsh Kamath, Vaisakh SB, Chandana Mitra, Shivam Saxena, Suryachandra Rao, Marshall Shepherd, Dev Niyogi

Abstract: Keeping continuous, long-term data to examine changes in urban surroundings is crucial as cities expand and develop. The DMSP OLS nighttime lights data and the Landsat NDVI were used to create the Normalized Difference Urbanization Index (NDUI), which has proven to be an invaluable resource for studying urban areas. However, DMSP's reach and usefulness are constrained by the fact that data collection ended in 2014, while VIIRS has continued to collect the nighttime lights data since 2012. The unavailability of DMSP translates to a challenge in performing urban studies using the NDUI. In this work, we address this difficulty and suggest a novel approach to bringing the NDUI time series up to date. We first map the VIIRS to DMSP using 2012 as a calibration year and then construct an updated NDUI time series. ClimateDownscaleSuite is used and Swin Transformer is selected as the best model for the mapping. The Swin Transformer model and the sophisticated machine learning capabilities it offers are used in conjunction with the VIIRS evening lighting data collected after 2012. By using this strategy, not only is the NDUI time series extended, but the potential of AI in filling in data gaps and boosting urban studies is also highlighted.

Submitted 5 June, 2023; originally announced June 2023.

arXiv:2306.01923 [pdf, other]

The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation

Authors: Saurabh Saxena, Charles Herrmann, Junhwa Hur, Abhishek Kar, Mohammad Norouzi, Deqing Sun, David J. Fleet

Abstract: Denoising diffusion probabilistic models have transformed image generation with their impressive fidelity and diversity. We show that they also excel in estimating optical flow and monocular depth, surprisingly, without task-specific architectures and loss functions that are predominant for these tasks. Compared to the point estimates of conventional regression-based methods, diffusion models also enable Monte Carlo inference, e.g., capturing uncertainty and ambiguity in flow and depth. With self-supervised pre-training, the combined use of synthetic and real data for supervised training, and technical innovations (infilling and step-unrolled denoising diffusion training) to handle noisy-incomplete training data, and a simple form of coarse-to-fine refinement, one can train state-of-the-art diffusion models for depth and optical flow estimation. Extensive experiments focus on quantitative performance against benchmarks, ablations, and the model's ability to capture uncertainty and multimodality, and impute missing values. Our model, DDVM (Denoising Diffusion Vision Model), obtains a state-of-the-art relative depth error of 0.074 on the indoor NYU benchmark and an Fl-all outlier rate of 3.26\% on the KITTI optical flow benchmark, about 25\% better than the best published method. For an overview see https://diffusion-vision.github.io.

Submitted 5 December, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

Comments: NeurIPS 2023 (Oral)

arXiv:2305.07639 [pdf, other]

Efficient Neural Network based Classification and Outlier Detection for Image Moderation using Compressed Sensing and Group Testing

Authors: Sabyasachi Ghosh, Sanyam Saxena, Ajit Rajwade

Abstract: Popular social media platforms employ neural network based image moderation engines to classify images uploaded on them as having potentially objectionable content. Such moderation engines must answer a large number of queries with heavy computational cost, even though the actual number of images with objectionable content is usually a tiny fraction. Inspired by recent work on Neural Group Testing, we propose an approach which exploits this fact to reduce the overall computational cost of such engines using the technique of Compressed Sensing (CS). We present the quantitative matrix-pooled neural network (QMPNN), which takes as input $n$ images, and an $m \times n$ binary pooling matrix with $m < n$, whose rows indicate $m$ pools of images, i.e., selections of $r$ images out of $n$. The QMPNN efficiently outputs the product of this matrix with the unknown sparse binary vector indicating whether each image is objectionable or not, i.e., it outputs the number of objectionable images in each pool. For suitable matrices, this is decoded using CS decoding algorithms to predict which images were objectionable. The computational cost of running the QMPNN and the CS algorithms is significantly lower than the cost of using a neural network with the same number of parameters separately on each image to classify the images, which we demonstrate via extensive experiments. Our technique is inherently resilient to moderate levels of errors in the prediction from the QMPNN.
Furthermore, we present pooled deep outlier detection, which brings CS and group testing techniques to deep outlier detection, to provide for the case when the objectionable images do not belong to a set of pre-defined classes. This technique enables efficient automated moderation of off-topic images shared on topical forums dedicated to sharing images of a certain single class, many of which are currently human-moderated.

Submitted 12 May, 2023; originally announced May 2023.
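
The pooled measurement described in the abstract above (a binary pooling matrix multiplied by a sparse binary label vector, giving the number of objectionable images per pool) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names `pooled_counts` and `decode_sparse`, the 3 x 6 pooling matrix, and the brute-force decoder, which merely stands in for the CS decoding algorithms the abstract mentions. In the paper the counts are predicted by a neural network; here they are computed exactly.

```python
# Hypothetical illustration of pooled measurements; not the authors' code.
from itertools import combinations


def pooled_counts(pool_matrix, labels):
    """Return y = A x: the number of objectionable images in each pool."""
    return [sum(a * x for a, x in zip(row, labels)) for row in pool_matrix]


def decode_sparse(pool_matrix, counts, max_positives):
    """Brute-force stand-in for a CS decoder.

    Searches for the sparsest binary vector x with A x == counts,
    trying supports of size 0, 1, ..., max_positives in turn.
    """
    n = len(pool_matrix[0])
    for k in range(max_positives + 1):
        for support in combinations(range(n), k):
            candidate = [1 if i in support else 0 for i in range(n)]
            if pooled_counts(pool_matrix, candidate) == counts:
                return candidate
    return None  # no consistent vector within the sparsity budget


# Assumed example: n = 6 images, m = 3 pools, r = 3 images per pool.
# Column i is the binary code of i + 1, so every column is distinct.
A = [
    [1, 0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1, 1],
]
x_true = [0, 0, 0, 0, 1, 0]  # only image 4 is objectionable

y = pooled_counts(A, x_true)
x_hat = decode_sparse(A, y, max_positives=1)
print(y, x_hat)
```

With at most one objectionable image, three pooled evaluations suffice to identify it among six images, because each column of the assumed matrix is distinct. Practical decoders must also tolerate errors in the predicted counts, which this sketch does not attempt.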

arXiv:2304.12170 [pdf]

doi 10.1016/j.heliyon.2023.e15388

On Using Non-Kekule' Triangular Graphene Quantum Dots for Scavenging Hazardous Sulfur Hexafluoride Components

Authors: Vaishali Roondhe, Basant Roondhe, Sumit Saxena, Rajeev Ahuja, Alok Shukla

Abstract: The goal of the present study is to explore how the size and functionalization of graphene quantum dots (GQDs) affect their sensing capabilities. Specifically, we investigated the adsorption of SO$_2$, SOF$_2$, SO$_2$F$_2$, and SF$_6$ on GQDs that were functionalized with -CH$_3$, -COCH$_3$, and -NH$_2$. We used density functional theory to analyze the electronic properties of these functionalized GQDs and found that the functionalization significantly altered their electronic properties. For example, the B3LYP H-L gap of pristine triangulene was 3.9 eV, while the H-L gap of functionalized triangulene ranged from 2.8 eV to 3.6 eV (using the B3LYP functional). Our results indicate that -NH$_2$ functionalized phenalenyl and triangulene provide strong interaction with SO$_2$, with adsorption energies of -0.429 eV and -0.427 eV, respectively. These adsorption properties exhibit physisorption, leading to high gas sensitivity and superior recovery time. The findings of this study provide new insights into the potential use of GQDs for detecting the decomposed constituents of sulfur hexafluoride, which can be beneficial for assessing the operation status of SF$_6$ insulated devices. Overall, our calculations suggest that functionalized GQDs can be employed in gas insulated systems for partial discharge detection.

Submitted 24 April, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

Comments: 37 pages; 30 pages main manuscript (13 figures); 7 pages supporting information (8 figures)

Journal ref: Heliyon 9, e15388 (2023)

arXiv:2303.16088 [pdf, other]

GNN-Assisted Phase Space Integration with Application to Atomistics

Authors: Shashank Saxena, Jan-Hendrik Bastek, Miguel Spinola, Prateek Gupta, Dennis M. Kochmann

Abstract: Overcoming the time scale limitations of atomistics can be achieved by switching from the state-space representation of Molecular Dynamics (MD) to a statistical-mechanics-based representation in phase space, where approximations such as maximum-entropy or Gaussian phase packets (GPP) evolve the atomistic ensemble in a time-coarsened fashion. In practice, this requires the computation of expensive high-dimensional integrals over all of phase space of an atomistic ensemble. This, in turn, is commonly accomplished efficiently by low-order numerical quadrature. We show that numerical quadrature in this context, unfortunately, comes with a set of inherent problems, which corrupt the accuracy of simulations -- especially when dealing with crystal lattices with imperfections. As a remedy, we demonstrate that Graph Neural Networks, trained on Monte-Carlo data, can serve as a replacement for commonly used numerical quadrature rules, overcoming their deficiencies and significantly improving the accuracy. This is showcased by three benchmarks: the thermal expansion of copper, the martensitic phase transition of iron, and the energy of grain boundaries. We illustrate the benefits of the proposed technique over classically used third- and fifth-order Gaussian quadrature, we highlight the impact on time-coarsened atomistic predictions, and we discuss the computational efficiency.
The latter is of general importance when performing frequent evaluation of phase space or other high-dimensional integrals, which is why the proposed framework promises applications beyond the scope of atomistics.

Submitted 20 March, 2023; originally announced March 2023.

arXiv:2303.11525 [pdf, other]

Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency

Authors: Vithursan Thangarasa, Shreyas Saxena, Abhay Gupta, Sean Lie

Abstract: Recent research has focused on weight sparsity in neural network training to reduce FLOPs, aiming for improved efficiency (test accuracy w.r.t training FLOPs). However, sparse weight training often sacrifices accuracy, requiring extended training schedules to attain the accuracy of dense models. In contrast, our approach, Sparse Iso-FLOP Transformations (Sparse-IFT), uses sparsity to improve accuracy while maintaining dense model FLOPs. Using a single hyperparameter (i.e., sparsity level), Sparse-IFTs efficiently replace dense layers, expanding the search space for optimal sparse masks. In addition, dynamic sparse training with Sparse-IFT models effectively navigates this larger sparse mask-weight space, which is evidenced by a spectral analysis using Ramanujan graph properties. Our study reveals a robust correlation among mask topology, weights, and final performance. Notably, without adjusting hyperparameters, replacing dense layers with Sparse-IFT yields significant improvements, such as a +3.5% boost for ResNet-18 on ImageNet and +0.9% for GPT-3 Small on the Open LLM leaderboard. To our knowledge, this is the first work to demonstrate the use of sparsity for improving the accuracy of dense models through a simple-to-use set of sparse transformations. Code is available at: https://github.com/CerebrasResearch/Sparse-IFT.

Submitted 5 March, 2024; v1 submitted 20 March, 2023; originally announced March 2023.

Comments: 13 pages, 5 figures (Main Paper) + 9 pages (Supplementary Material)

arXiv:2303.10464 [pdf, other]

SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models

Authors: Vithursan Thangarasa, Abhay Gupta, William Marshall, Tianda Li, Kevin Leong, Dennis DeCoste, Sean Lie, Shreyas Saxena

Abstract: The pre-training and fine-tuning paradigm has contributed to a number of breakthroughs in Natural Language Processing (NLP). Instead of directly training on a downstream task, language models are first pre-trained on large datasets with cross-domain knowledge (e.g., Pile, MassiveText, etc.) and then fine-tuned on task-specific data (e.g., natural language generation, text summarization, etc.). Scaling the model and dataset size has helped improve the performance of LLMs, but unfortunately, this has also led to highly prohibitive computational costs. Pre-training LLMs often requires orders of magnitude more FLOPs than fine-tuning, and the model capacity often remains the same between the two phases. To achieve training efficiency w.r.t training FLOPs, we propose to decouple the model capacity between the two phases and introduce Sparse Pre-training and Dense Fine-tuning (SPDF). In this work, we show the benefits of using unstructured weight sparsity to train only a subset of weights during pre-training (Sparse Pre-training) and then recover the representational capacity by allowing the zeroed weights to learn (Dense Fine-tuning). We demonstrate that we can induce up to 75% sparsity into a 1.3B parameter GPT-3 XL model resulting in a 2.5x reduction in pre-training FLOPs, without a significant loss in accuracy on the downstream tasks relative to the dense baseline. By rigorously evaluating multiple downstream tasks, we also establish a relationship between sparsity, task complexity and dataset size.
Our work presents a promising direction to train large GPT models at a fraction of the training FLOPs using weight sparsity, while retaining the benefits of pre-trained textual representations for downstream tasks.

Submitted 29 July, 2023; v1 submitted 18 March, 2023; originally announced March 2023.

Comments: Accepted to Uncertainty in Artificial Intelligence (UAI) 2023 Conference; 13 pages, 4 figures (Main Paper) + 5 pages (Supplementary Material)

arXiv:2302.14816 [pdf, other]

Monocular Depth Estimation using Diffusion Models

Authors: Saurabh Saxena, Abhishek Kar, Mohammad Norouzi, David J. Fleet

Abstract: We formulate monocular depth estimation using denoising diffusion models, inspired by their recent successes in high fidelity image generation. To that end, we introduce innovations to address problems arising due to noisy, incomplete depth maps in training data, including step-unrolled denoising diffusion, an $L_1$ loss, and depth infilling during training. To cope with the limited availability of data for supervised training, we leverage pre-training on self-supervised image-to-image translation tasks. Despite the simplicity of the approach, with a generic loss and architecture, our DepthGen model achieves SOTA performance on the indoor NYU dataset, and near SOTA results on the outdoor KITTI dataset. Further, with a multimodal posterior, DepthGen naturally represents depth ambiguity (e.g., from transparent surfaces), and its zero-shot performance, combined with depth imputation, enables a simple but effective text-to-3D pipeline. Project page: https://depth-gen.github.io

Submitted 28 February, 2023; originally announced February 2023.

arXiv:2302.11821 [pdf, ps, other]

Storage in Computational Geometry

Authors: Yijie Han, Sanjeev Saxena

Abstract: We show that $n$ real numbers can be stored in a constant number of real numbers such that each original real number can be fetched in $O(\log n)$ time. Although our result has implications for many computational geometry problems, we show here that, combined with Han's $O(n\sqrt{\log n})$ time real number sorting algorithm [3, arXiv:1801.00776], we can improve the complexity of Kirkpatrick's point location algorithm [8] to $O(n\sqrt{\log n})$ preprocessing time, a constant number of real numbers for storage and $O(\log n)$ point location time. Kirkpatrick's algorithm uses $O(n\log n)$ preprocessing time, $O(n)$ storage and $O(\log n)$ point location time. The complexity of Kirkpatrick's algorithm was the previous best result. Although Lipton and Tarjan's algorithm [10] predates Kirkpatrick's algorithm and has the same complexity, Kirkpatrick's algorithm is simpler and has a better structure. This paper can be viewed as a companion paper of paper [3, arXiv:1801.00776].

Submitted 23 February, 2023; originally announced February 2023.

Comments: This is an interesting result, especially when read together with paper [3]

arXiv:2302.06854 [pdf, other]

doi 10.1109/BigData55660.2022.10020725

Large-Scale Knowledge Synthesis and Complex Information Retrieval from Biomedical Documents

Authors: Shreya Saxena, Raj Sangani, Siva Prasad, Shubham Kumar, Mihir Athale, Rohan Awhad, Vishal Vaddina

Abstract: Recent advances in the healthcare industry have led to an abundance of unstructured data, making it challenging to perform tasks such as efficient and accurate information retrieval at scale. Our work offers an all-in-one scalable solution for extracting and exploring complex information from large-scale research documents, which would otherwise be tedious. First, we briefly explain our knowledge synthesis process to extract helpful information from unstructured text data of research documents. Then, on top of the knowledge extracted from the documents, we perform complex information retrieval using three major components: Paragraph Retrieval, Triplet Retrieval from Knowledge Graphs, and Complex Question Answering (QA). These components combine lexical and semantic-based methods to retrieve paragraphs and triplets and perform faceted refinement for filtering these search results. The complexity of biomedical queries and documents necessitates using a QA system capable of handling queries more complex than factoid queries, which we evaluate qualitatively on the COVID-19 Open Research Dataset (CORD-19) to demonstrate the effectiveness and value-add.

Submitted 14 February, 2023; originally announced February 2023.

arXiv:2212.10247 [pdf, other]

Dominance for Containment Problems

Authors: Waseem Akram, Sanjeev Saxena

Abstract: In a containment problem, the goal is to preprocess a set of geometric objects so that, given a geometric query object, we can report all the objects containing the query object. We consider the containment problem where input objects are homothetic triangles and the query objects considered are line segments, circles, and trapezoids with bases parallel to either axis. We show that this problem can be solved using the 3-d query dominance problem. The solutions presented can also be extended for higher dimensions.

Submitted 20 December, 2022; originally announced December 2022.

arXiv:2210.06366 [pdf, other]

A Generalist Framework for Panoptic Segmentation of Images and Videos

Authors: Ting Chen, Lala Li, Saurabh Saxena, Geoffrey Hinton, David J. Fleet

Abstract: Panoptic segmentation assigns semantic and instance ID labels to every pixel of an image. As permutations of instance IDs are also valid solutions, the task requires learning a high-dimensional one-to-many mapping. As a result, state-of-the-art approaches use customized architectures and task-specific loss functions. We formulate panoptic segmentation as a discrete data generation problem, without relying on inductive bias of the task. A diffusion model is proposed to model panoptic masks, with a simple architecture and generic loss function. By simply adding past predictions as a conditioning signal, our method is capable of modeling video (in a streaming setting) and thereby learns to track object instances automatically. With extensive experiments, we demonstrate that our simple approach can perform competitively to state-of-the-art specialist methods in similar settings.

Submitted 12 October, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

Comments: ICCV'23. Code at https://github.com/google-research/pix2seq

arXiv:2209.15132 [pdf, other]

Dynamic Inference on Graphs using Structured Transition Models

Authors: Saumya Saxena, Oliver Kroemer

Abstract: Enabling robots to perform complex dynamic tasks such as picking up an object in one sweeping motion or pushing off a wall to quickly turn a corner is a challenging problem. The dynamic interactions implicit in these tasks are critical towards the successful execution of such tasks. Graph neural networks (GNNs) provide a principled way of learning the dynamics of interactive systems but can suffer from scaling issues as the number of interactions increases. Furthermore, the problem of using learned GNN-based models for optimal control is insufficiently explored. In this work, we present a method for efficiently learning the dynamics of interacting systems by simultaneously learning a dynamic graph structure and a stable and locally linear forward model of the system. The dynamic graph structure encodes evolving contact modes along a trajectory by making probabilistic predictions over the edges of the graph. Additionally, we introduce a temporal dependence in the learned graph structure which allows us to incorporate contact measurement updates during execution thus enabling more accurate forward predictions. The learned stable and locally linear dynamics enable the use of optimal control algorithms such as iLQR for long-horizon planning and control for complex interactive tasks. Through experiments in simulation and in the real world, we evaluate the performance of our method by using the learned interaction dynamics for control and demonstrate generalization to more objects and interactions not seen during training.
We introduce a control scheme that takes advantage of contact measurement updates and hence is robust to prediction inaccuracies during execution.

Submitted 29 September, 2022; originally announced September 2022.

arXiv:2209.05353 [pdf, other]

doi 10.21468/SciPostPhys.15.1.020

Pressure-induced transitions in FePS$_3$: Structural, magnetic and electronic properties

Authors: Shiyu Deng, Siyu Chen, Bartomeu Monserrat, Emilio Artacho, Siddharth S Saxena

Abstract: FePS$_3$ is a prototype van der Waals layered antiferromagnet and a Mott insulator under ambient conditions, which has been recently reported to go through a pressure-induced dimensionality crossover and an insulator-to-metal transition. These transitions also lead to the appearance of a novel magnetic metallic state. To further understand these emergent structural and physical properties, we have performed a first-principles study using van der Waals and Hubbard $U$ corrected density functional theory including a random structure search. Our computational study attempts to interpret the experimental coexistence of the low- and intermediate-pressure phases and we predict a novel high-pressure phase with distinctive dimensionality and different possible origins of metallicity.

Submitted 10 April, 2023; v1 submitted 12 September, 2022; originally announced September 2022.

Comments: Re-Submission to SciPost

Journal ref: SciPost Phys. 15, 020 (2023)

arXiv:2208.02186 [pdf, ps, other]

On Brooks' Theorem

Authors: Gopalan Sajith, Sanjeev Saxena

Abstract: In this note we give two proofs of Brooks' Theorem. The first is obtained by modifying an earlier proof and the second by combining two earlier proofs. We believe these proofs are easier to teach in Computer Science courses.

Submitted 3 August, 2022; originally announced August 2022.

Comments: 5 pages

MSC Class: 05C15 ACM Class: G.2.2

arXiv:2207.11954 [pdf, ps, other]

Simpler O(1) Query Algorithm for Level Ancestors

Authors: Sanjeev Saxena

Abstract: This note describes a very simple O(1) query time algorithm for finding level ancestors. This is basically a serial (re)-implementation of the parallel algorithm of Berkman and Vishkin (O.Berkman and U.Vishkin, Finding level-ancestors in trees, JCSS, 48, 214--230, 1994). Although the basic algorithm has preprocessing time of O(n log n), by having additional levels or using table lookup, the preprocessing time can be reduced to almost linear or linear. The table lookup algorithm can be built in O(1) parallel time with $n$ processors and can also be used to simplify the parallel algorithm of Berkman and Vishkin and make it optimal.

Submitted 9 April, 2024; v1 submitted 25 July, 2022; originally announced July 2022.

arXiv:2207.02419 [pdf, other]

BioTABQA: Instruction Learning for Biomedical Table Question Answering

Authors: Man Luo, Sharad Saxena, Swaroop Mishra, Mihir Parmar, Chitta Baral

Abstract: Table Question Answering (TQA) is an important but under-explored task. Most of the existing QA datasets are in unstructured text format and only a few of them use tables as the context. To the best of our knowledge, no TQA datasets exist in the biomedical domain, where tables are frequently used to present information. In this paper, we first curate a table question answering dataset, BioTABQA, using 22 templates and the context from a biomedical textbook on differential diagnosis. BioTABQA can not only be used to teach a model how to answer questions from tables but also evaluate how a model generalizes to unseen questions, an important scenario for biomedical applications. To achieve the generalization evaluation, we divide the templates into 17 training and 5 cross-task evaluations. Then, we develop two baselines using single- and multi-task learning on BioTABQA. Furthermore, we explore instructional learning, a recent technique showing impressive generalizing performance. Experimental results show that our instruction-tuned model outperforms single- and multi-task baselines on average by ~23% and ~6% across various evaluation settings, and more importantly, the instruction-tuned model outperforms baselines by ~5% on cross-tasks.

Submitted 5 July, 2022; originally announced July 2022.

Comments: BioASQ10 Workshop

arXiv:2206.07669 [pdf, other]

A Unified Sequence Interface for Vision Tasks

Authors: Ting Chen, Saurabh Saxena, Lala Li, Tsung-Yi Lin, David J. Fleet, Geoffrey Hinton

Abstract: While language tasks are naturally expressed in a single, unified, modeling framework, i.e., generating sequences of tokens, this has not been the case in computer vision. As a result, there is a proliferation of distinct architectures and loss functions for different vision tasks. In this work we show that a diverse set of "core" computer vision tasks can also be unified if formulated in terms of a shared pixel-to-sequence interface. We focus on four tasks, namely, object detection, instance segmentation, keypoint detection, and image captioning, all with diverse types of outputs, e.g., bounding boxes or dense masks. Despite that, by formulating the output of each task as a sequence of discrete tokens with a unified interface, we show that one can train a neural network with a single model architecture and loss function on all these tasks, with no task-specific customization. To solve a specific task, we use a short prompt as task description, and the sequence output adapts to the prompt so it can produce task-specific output. We show that such a model can achieve competitive performance compared to well-established task-specific models.

Submitted 15 October, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

Comments: The first three authors contributed equally

arXiv:2206.02518 [pdf]

A Model for Predicting Ignition Potential of Complex Fuel in Diurnally Variable Environment

Authors: Saurabh Saxena, Ritambhara Dubey, Neda Yaghoobian

Abstract: Fuel ignition potential is one of the primary drivers influencing the extent of damage in wildland and wildland-urban interface fires. Determining fire and ember exposure of fuels that vary spatially and temporally will help to recognize necessary defensive actions and reduce damages. In this paper, the development of a new computational model, Temperature And Moisture Evolution predictor for complex Fuel in Open Environment (TAMEFOE), is presented. TAMEFOE predicts the diurnal temperature and moisture content evolution and vulnerability to flame ignition of objects/fuels with complex shapes or settings and materials under variable environmental conditions. The model is applicable to complex fuel scenarios (e.g., interface or intermix communities) composed of natural and manmade random-shaped objects in open atmosphere under the influence of local weather and diurnal solar radiation. The vulnerability of fuel to ember or fire ignition is determined by predicting the transient temperature and dryness of fuel in connection with the surrounding, local environment, and flame heat if any exists. In this regard, a detailed surface energy balance analysis, coupled with a water budget analysis, is performed in high spatiotemporal resolution. The model performance was validated against several existing analytical and measured data. The discrete, high-resolution surface temperature and moisture content information obtained from the model can also provide unsteady boundary conditions for computational fluid dynamics simulations when coupled physics is desired.

Submitted 16 January, 2023; v1 submitted 8 May, 2022; originally announced June 2022.

arXiv:2205.11487 [pdf, other]

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

Authors: Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, Mohammad Norouzi

Abstract: We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. Imagen builds on the power of large transformer language models in understanding text and hinges on the strength of diffusion models in high-fidelity image generation. Our key discovery is that generic large language models (e.g. T5), pretrained on text-only corpora, are surprisingly effective at encoding text for image synthesis: increasing the size of the language model in Imagen boosts both sample fidelity and image-text alignment much more than increasing the size of the image diffusion model. Imagen achieves a new state-of-the-art FID score of 7.27 on the COCO dataset, without ever training on COCO, and human raters find Imagen samples to be on par with the COCO data itself in image-text alignment. To assess text-to-image models in greater depth, we introduce DrawBench, a comprehensive and challenging benchmark for text-to-image models. With DrawBench, we compare Imagen with recent methods including VQ-GAN+CLIP, Latent Diffusion Models, and DALL-E 2, and find that human raters prefer Imagen over other models in side-by-side comparisons, both in terms of sample quality and image-text alignment. See https://imagen.research.google/ for an overview of the results.

Submitted 23 May, 2022; originally announced May 2022.

arXiv:2204.13028 [pdf, other]

doi 10.1016/j.commatsci.2022.111511

Finite-temperature surface elasticity of crystalline solids

Authors: Shashank Saxena, Miguel Spinola, Prateek Gupta, Dennis M. Kochmann

Abstract: Surface energies and surface elasticity largely affect the mechanical response of nanostructures as well as the physical phenomena associated with surfaces such as evaporation and adsorption. Studying surface energies at finite temperatures is therefore of immense interest for nanoscale applications. However, calculating surface energies and derived quantities from atomistic ensembles is usually limited to zero temperature or involves cumbersome thermodynamic integration techniques at finite temperature. Here, we illustrate a technique to identify the energy and elastic properties of surfaces of solids at non-zero temperature based on a Gaussian phase packets (GPP) approach (which in the isothermal limit coincides with a maximum-entropy formulation). Using this setup, we investigate the effect of temperature on the surface properties of different crystal faces for six pure metals -- copper, nickel, aluminum, iron, tungsten and vanadium -- thus covering both FCC and BCC lattice structures. While the obtained surface energies and stresses usually show a decreasing trend with increasing temperature, the elastic constants do not show such a consistent trend across the different materials and are quite sensitive to temperature changes. Validation is performed by comparing the obtained surface energy densities of selected BCC and FCC materials to those calculated via molecular dynamics.

Submitted 27 April, 2022; originally announced April 2022.

arXiv:2203.12146 [pdf]

doi 10.1016/j.ijhydene.2022.02.161

Role of Functionalized Graphene Quantum Dots in Hydrogen Evolution Reaction: A Density Functional Theory Study

Authors: Vaishali Sharma, Basant Roondhe, Sumit Saxena, Alok Shukla

Abstract: Density functional theory (DFT) can be quite advantageous in advancing the field of catalysis because of the microscopic insights it provides, and thus can guide experimental searches of novel catalysts. Several recent works have demonstrated that low-dimensional materials can be very efficient catalysts. Graphene quantum dots (GQDs) have gained much attention in past years due to their unique properties like low toxicity, chemical inertness, biocompatibility, crystallinity, etc. These properties of GQDs, which are due to quantum confinement and edge effects, facilitate their applications in various fields like sensing, photoelectronics, catalysis, and many more. Furthermore, the properties of GQDs can be enhanced by doping and functionalization. In order to understand the effects of functionalization by oxygen and boron based groups on the catalytic properties relevant to the hydrogen-evolution reaction (HER), we perform a systematic study of GQDs functionalized with oxygen (O), borinic acid (BC$_2$O), and boronic acid (BCO$_2$). All calculations that included geometry optimization, electronic and adsorption mechanism, were carried out using the Gaussian16 package, employing the hybrid functional B3LYP, and the basis set 6-31G(d,p). With the variation in functionalization groups in GQDs, we observe significant changes in their electronic properties. The adsorption energy E$_{ads}$ of hydrogen over O-GQD, BC$_2$O-GQD, and BCO$_2$-GQD is -0.059 eV, -0.031 eV and -0.032 eV respectively.
Accordingly, Gibbs free energy ($ΔG$) of hydrogen adsorption is extraordinarily near the ideal value (0 eV) for all the three types of functionalized GQDs. Thus, the present work suggests pathways for experimental realization of low-cost and multifunctional GQD-based catalysts for clean and renewable hydrogen energy production.

Submitted 22 March, 2022; originally announced March 2022.

Comments: 25 pages, 7 figures (included); in press in Int. J. Hyd. Energy

arXiv:2203.07015 [pdf, other]

doi 10.1103/PhysRevB.107.054106

Comparative structural evolution under pressure of powder and single crystals of the layered antiferromagnet FePS$_3$

Authors: David M. Jarvis, Matthew J. Coak, Hayrullo Hamidov, Charles R. S. Haines, Giulio I. Lampronti, Cheng Liu, Shiyu Deng, Dominik Daisenberger, David R. Allan, Mark R. Warren, Andrew R. Wildes, Siddharth S. Saxena

Abstract: The layered antiferromagnet FePS$_3$ has been shown to undergo a structural transition under pressure linked to an insulator-metal transition, with two incompatible models previously proposed for the highest-pressure structure. We present a study of the high-pressure crystal structures of FePS$_3$ using both single-crystal and powder x-ray diffraction. We show that the highest pressure transition involves a collapse of the inter-planar spacing of this material, along with an increase in symmetry from a monoclinic to a trigonal structure, to the exclusion of other models. The extent of this volume collapse is shown to be sensitive to the presence of a helium pressure medium in the sample environment, indicating that consideration of such experimental factors is important for understanding high-pressure behaviours in this material.

Submitted 14 March, 2022; originally announced March 2022.

Comments: 5 pages, 3 figures

arXiv:2112.12625 [pdf, other]

Comparison and Analysis of Image-to-Image Generative Adversarial Networks: A Survey

Authors: Sagar Saxena, Mohammad Nayeem Teli

Abstract: Generative Adversarial Networks (GANs) have recently introduced effective methods of performing Image-to-Image translations. These models can be applied and generalized to a variety of domains in Image-to-Image translation without changing any parameters. In this paper, we survey and analyze eight Image-to-Image Generative Adversarial Networks: Pix2Pix, CycleGAN, CoGAN, StarGAN, MUNIT, StarGAN2, DA-GAN, and Self Attention GAN. Each of these models presented state-of-the-art results and introduced new techniques to build Image-to-Image GANs. In addition to a survey of the models, we also survey the 18 datasets they were trained on and the 9 metrics they were evaluated on. Finally, we present results of a controlled experiment for 6 of these models on a common set of metrics and datasets. The results were mixed and showed that, on certain datasets, tasks, and metrics, some models outperformed others. The last section of this paper discusses those results and establishes areas of future research. As researchers continue to innovate new Image-to-Image GANs, it is important to gain a good understanding of the existing methods, datasets, and metrics. This paper provides a comprehensive overview and discussion to help build this foundation.

Submitted 26 August, 2022; v1 submitted 23 December, 2021; originally announced December 2021.

Comments: 36 pages, 22 figures, Preprint; format changed, typos corrected

arXiv:2112.02750 [pdf, other]

doi 10.1063/5.0081205

An FPGA-based Timing and Control System for the Dynamic Compression Sector

Authors: Shefali Saxena, Daniel R. Paskvan, Nicholas R. Weir, Nicholas Sinclair

Abstract: A field programmable gate array (FPGA) based timing and trigger control system has been developed for the Dynamic Compression Sector (DCS) user facility located at the Advanced Photon Source (APS) at Argonne National Laboratory. The DCS is a first-of-its-kind capability dedicated to dynamic compression science. All components of the DCS laser shock station - x-ray choppers, single-shot shutter, internal laser triggers, and shot diagnostics - must be synchronized with respect to the arrival of x-rays in the hutch. An FPGA synchronized to the APS storage ring radio frequency (RF) clock (352 MHz) generates trigger signals for each stage of the laser and x-ray shutter system with low jitter. The system is composed of a Zynq FPGA, a debug card, line drivers and power supply. The delay and offsets of trigger signals can be adjusted with high precision using a user-friendly graphical user interface (GUI). The details of the system architecture, timing requirements, firmware, and software implementation along with the performance evaluation are presented in this paper. The system offers low timing jitter (15.5 ps r.m.s.) with respect to the APS 352 MHz clock, suitable for the 50 ps r.m.s. x-ray bunch duration at the APS.

Submitted 5 December, 2021; originally announced December 2021.

arXiv:2112.02530 [pdf, other]

Exploring and Mitigating Gender Bias in Recommender Systems with Explicit Feedback

Authors: Shrikant Saxena, Shweta Jain

Abstract: Recommender systems are indispensable because they influence our day-to-day behavior and decisions by giving us personalized suggestions. Services like Kindle, Youtube, and Netflix depend heavily on the performance of their recommender systems to ensure that their users have a good experience and to increase revenues. Despite their popularity, it has been shown that recommender systems reproduce and amplify the bias present in the real world. The resulting feedback creates a self-perpetuating loop that deteriorates the user experience and results in homogenizing recommendations over time. Further, biased recommendations can also reinforce stereotypes based on gender or ethnicity, thus reinforcing the filter bubbles that we live in. In this paper, we address the problem of gender bias in recommender systems with explicit feedback. We propose a model to quantify the gender bias present in book rating datasets and in the recommendations produced by the recommender systems. Our main contribution is to provide a principled approach to mitigate the bias being produced in the recommendations. We theoretically show that the proposed approach provides unbiased recommendations despite biased data. Through empirical evaluation on publicly available book rating datasets, we further show that the proposed model can significantly reduce bias without significant impact on accuracy. Our method is model agnostic and can be applied to any recommender system.
To demonstrate the performance of our model, we present the results on four recommender algorithms, two from the K-nearest neighbors family, UserKNN and ItemKNN, and the other two from the matrix factorization family, Alternating least square and Singular value decomposition.

Submitted 5 December, 2021; originally announced December 2021.

Comments: 19 pages, 13 figures

arXiv:2112.01860 [pdf, ps, other]

doi 10.1007/978-3-031-34347-6_2

Point Enclosure Problem for Homothetic Polygons

Authors: Waseem Akram, Sanjeev Saxena

Abstract: In this paper, we investigate the homothetic point enclosure problem: given a set $S$ of $n$ triangles with sides parallel to three fixed directions, find a data structure for $S$ that can report all the triangles of $S$ that contain a query point efficiently. The problem is "inverse" of the homothetic range search problem. We present an $O(n\log n)$ space solution that supports the queries in $O(\log n + k)$ time, where $k$ is the output size. The preprocessing time is $O(n\log n)$. The same results also hold for homothetic polygons.

Submitted 3 December, 2021; originally announced December 2021.

Journal ref: In: Combinatorial Algorithms. IWOCA 2023. Lecture Notes in Computer Science, vol 13889

arXiv:2110.00557 [pdf, other]

doi 10.1063/5.0076249

The QICK (Quantum Instrumentation Control Kit): Readout and control for qubits and detectors

Authors: Leandro Stefanazzi, Ken Treptow, Neal Wilcer, Chris Stoughton, Salvatore Montella, Collin Bradford, Gustavo Cancelo, Shefali Saxena, Horacio Arnaldi, Sara Sussman, Andrew Houck, Ankur Agrawal, Helin Zhang, Chunyang Ding, David I Schuster

Abstract: We introduce a Xilinx RFSoC-based qubit controller (called the Quantum Instrumentation Control Kit, or QICK for short) which supports the direct synthesis of control pulses with carrier frequencies of up to 6 GHz. The QICK can control multiple qubits or other quantum devices. The QICK consists of a digital board hosting an RFSoC (RF System-on-Chip) FPGA \cite{zcu111}, custom firmware and software and an optional companion custom-designed analog front-end board. We characterize the analog performance of the system, as well as its digital latency, important for quantum error correction and feedback protocols. We benchmark the controller by performing standard characterizations of a transmon qubit. We achieve an average Clifford gate fidelity of $\mathcal{F}_{avg}=99.93\%$. All of the schematics, firmware, and software are open-source \cite{QICKrepo}.

Submitted 10 March, 2022; v1 submitted 1 October, 2021; originally announced October 2021.

arXiv:2109.10852 [pdf, other]

Pix2seq: A Language Modeling Framework for Object Detection

Authors: Ting Chen, Saurabh Saxena, Lala Li, David J. Fleet, Geoffrey Hinton

Abstract: We present Pix2Seq, a simple and generic framework for object detection. Unlike existing approaches that explicitly integrate prior knowledge about the task, we cast object detection as a language modeling task conditioned on the observed pixel inputs. Object descriptions (e.g., bounding boxes and class labels) are expressed as sequences of discrete tokens, and we train a neural network to perceive the image and generate the desired sequence. Our approach is based mainly on the intuition that if a neural network knows about where and what the objects are, we just need to teach it how to read them out. Beyond the use of task-specific data augmentations, our approach makes minimal assumptions about the task, yet it achieves competitive results on the challenging COCO dataset, compared to highly specialized and well optimized detection algorithms.

Submitted 27 March, 2022; v1 submitted 22 September, 2021; originally announced September 2021.

Comments: ICLR'22. Code and pretrained models at https://github.com/google-research/pix2seq

arXiv:2109.08771 [pdf, other]

Search-Based Task Planning with Learned Skill Effect Models for Lifelong Robotic Manipulation

Authors: Jacky Liang, Mohit Sharma, Alex LaGrassa, Shivam Vats, Saumya Saxena, Oliver Kroemer

Abstract: Robots deployed in many real-world settings need to be able to acquire new skills and solve new tasks over time. Prior works on planning with skills often make assumptions on the structure of skills and tasks, such as subgoal skills, shared skill implementations, or task-specific plan skeletons, which limit adaptation to new skills and tasks. By contrast, we propose doing task planning by jointly searching in the space of parameterized skills using high-level skill effect models learned in simulation. We use an iterative training procedure to efficiently generate relevant data to train such models. Our approach allows flexible skill parameterizations and task specifications to facilitate lifelong learning in general-purpose domains. Experiments demonstrate the ability of our planner to integrate new skills in a lifelong manner, finding new task strategies with lower costs in both train and test tasks. We additionally show that our method can transfer to the real world without further fine-tuning.

Submitted 13 April, 2022; v1 submitted 17 September, 2021; originally announced September 2021.

Comments: To appear in the International Conference on Robotics and Automation (ICRA) 2022

arXiv:2109.02482 [pdf, other]

doi 10.1063/5.0075580

Direct statistical simulation of the Lorenz63 system

Authors: Kuan Li, J. B. Marston, Saloni Saxena, Steven M. Tobias

Abstract: We use direct statistical simulation (DSS) to find the low-order statistics of the well-known dynamical system, the Lorenz63 model. Instead of accumulating statistics from numerical simulation of the dynamical systems, we solve the equations of motion for the statistics themselves after closing them by making several different choices for the truncation. Fixed points of the statistics are obtained either by time evolving, or by iterative methods. Statistics so obtained are compared to those found by the traditional approach.

Submitted 16 October, 2021; v1 submitted 27 August, 2021; originally announced September 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2103.12741

Journal ref: Chaos 32, 043111 (2022)

arXiv:2108.11554 [pdf, other]

XCI-Sketch: Extraction of Color Information from Images for Generation of Colored Outlines and Sketches

Authors: V Manushree, Sameer Saxena, Parna Chowdhury, Manisimha Varma, Harsh Rathod, Ankita Ghosh, Sahil Khose

Abstract: Sketches are a medium to convey a visual scene from an individual's creative perspective. The addition of color substantially enhances the overall expressivity of a sketch. This paper proposes two methods to mimic human-drawn colored sketches by utilizing the Contour Drawing Dataset. Our first approach renders colored outline sketches by applying image processing techniques aided by k-means color clustering. The second method uses a generative adversarial network to develop a model that can generate colored sketches from previously unobserved images. We assess the results obtained through quantitative and qualitative evaluations.

Submitted 7 January, 2022; v1 submitted 25 August, 2021; originally announced August 2021.

Comments: ML for Creativity and Design workshop at NeurIPS 2021

arXiv:2106.06129 [pdf, other]

Instance-Level Task Parameters: A Robust Multi-task Weighting Framework

Authors: Pavan Kumar Anasosalu Vasu, Shreyas Saxena, Oncel Tuzel

Abstract: Recent works have shown that deep neural networks benefit from multi-task learning by learning a shared representation across several related tasks. However, performance of such systems depends on relative weighting between various losses involved during training. Prior works on loss weighting schemes assume that instances are equally easy or hard for all tasks. In order to break this assumption, we let the training process dictate the optimal weighting of tasks for every instance in the dataset. More specifically, we equip every instance in the dataset with a set of learnable parameters (instance-level task parameters) where the cardinality is equal to the number of tasks learned by the model. These parameters model the weighting of each task for an instance. They are updated by gradient descent and do not require hand-crafted rules. We conduct extensive experiments on SURREAL and CityScapes datasets, for human shape and pose estimation, depth estimation and semantic segmentation tasks. In these tasks, our approach outperforms recent dynamic loss weighting approaches, e.g. reducing surface estimation errors by 8.97% on SURREAL. When applied to datasets where one or more tasks can have noisy annotations, the proposed method learns to prioritize learning from clean labels for a given task, e.g. reducing surface estimation errors by up to 60%. We also show that we can reliably detect corrupt labels for a given task as a by-product from learned instance-level task parameters.

Submitted 10 June, 2021; originally announced June 2021.

arXiv:2105.13464 [pdf, other]

Training With Data Dependent Dynamic Learning Rates

Authors: Shreyas Saxena, Nidhi Vyas, Dennis DeCoste

Abstract: Recently many first and second order variants of SGD have been proposed to facilitate training of Deep Neural Networks (DNNs). A common limitation of these works stems from the fact that they use the same learning rate across all instances present in the dataset. This setting is widely adopted under the assumption that loss functions for each instance are similar in nature, and hence, a common learning rate can be used. In this work, we relax this assumption and propose an optimization framework which accounts for difference in loss function characteristics across instances. More specifically, our optimizer learns a dynamic learning rate for each instance present in the dataset. Learning a dynamic learning rate for each instance allows our optimization framework to focus on different modes of training data during optimization. When applied to an image classification task, across different CNN architectures, learning dynamic learning rates leads to consistent gains over standard optimizers. When applied to a dataset containing corrupt instances, our framework reduces the learning rates on noisy instances, and improves over the state-of-the-art. Finally, we show that our optimization framework can be used for personalization of a machine learning model towards a known targeted data distribution.

Submitted 27 May, 2021; originally announced May 2021.

Showing 1–50 of 150 results for author: Saxena, S