Search | arXiv e-print repository

Calibrated Forecasting and Persuasion

Abstract: How should an expert send forecasts to maximize her utility subject to passing a calibration test? We consider a dynamic game where an expert sends probabilistic forecasts to a decision maker. The decision maker uses a calibration test based on past outcomes to verify the expert's forecasts. We characterize the optimal forecasting strategy by reducing the dynamic game to a static persuasion problem. A distribution of forecasts is implementable by a calibrated strategy if and only if it is a mean-preserving contraction of the distribution of conditionals (honest forecasts). We characterize the value of information by comparing what an informed and uninformed expert can attain. Moreover, we consider a decision maker who uses regret minimization, instead of the calibration test, to take actions. We show that the expert can achieve the same payoff against a regret minimizer as under the calibration test, and in some instances, she can achieve strictly more.

Submitted 21 June, 2024; originally announced June 2024.

Comments: The conference version of this work has been accepted to the Twenty-Fifth ACM Conference on Economics and Computation (EC'24)
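
The implementability condition in the abstract above (calibrated forecasts are mean-preserving contractions of the honest forecasts) can be illustrated with a small sketch. The calibration check used here, the largest gap between an announced forecast and the empirical frequency of the event when it was announced, is an assumption made for illustration and is not necessarily the test used in the paper. The data and the names `calibration_error`, `honest`, `pooled`, and `inflated` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def calibration_error(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Largest gap between a forecast value and the empirical frequency of
    the event over the periods in which that forecast was announced."""
    if len(forecasts) != len(outcomes) or len(forecasts) == 0:
        raise ValueError("need equally long, non-empty sequences")
    by_forecast: Dict[float, List[int]] = defaultdict(list)
    for f, y in zip(forecasts, outcomes):
        by_forecast[f].append(y)
    return max(abs(f - sum(ys) / len(ys)) for f, ys in by_forecast.items())


# Hypothetical data: the event occurs in 4 of the first 5 periods
# (conditional probability 0.8) and in 1 of the last 5 (probability 0.2).
outcomes = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

honest = [0.8] * 5 + [0.2] * 5   # the conditionals themselves
pooled = [0.5] * 10              # same mean, less spread: a mean-preserving contraction
inflated = [0.9] * 10            # mean 0.9 instead of 0.5: not mean-preserving

honest_err = calibration_error(honest, outcomes)
pooled_err = calibration_error(pooled, outcomes)
inflated_err = calibration_error(inflated, outcomes)
```

In this toy data both the honest and the pooled forecasts match the empirical frequencies, while the inflated forecast misses by about 0.4, which is the kind of deviation a calibration test is meant to catch.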

arXiv:2406.13031 [pdf, other]

A machine learning pipeline for automated insect monitoring

Authors: Aditya Jain, Fagner Cunha, Michael Bunsen, Léonard Pasi, Anna Viklund, Maxim Larrivée, David Rolnick

Abstract: Climate change and other anthropogenic factors have led to a catastrophic decline in insects, endangering both biodiversity and the ecosystem services on which human society depends. Data on insect abundance, however, remains woefully inadequate. Camera traps, conventionally used for monitoring terrestrial vertebrates, are now being modified for insects, especially moths. We describe a complete, open-source machine learning-based software pipeline for automated monitoring of moths via camera traps, including object detection, moth/non-moth classification, fine-grained identification of moth species, and tracking individuals. We believe that our tools, which are already in use across three continents, represent the future of massively scalable data collection in entomology.

Submitted 18 June, 2024; originally announced June 2024.

Journal ref: NeurIPS 2023 Workshop on Tackling Climate Change with Machine Learning

arXiv:2406.12452 [pdf, other]

Insect Identification in the Wild: The AMI Dataset

Authors: Aditya Jain, Fagner Cunha, Michael James Bunsen, Juan Sebastián Cañas, Léonard Pasi, Nathan Pinoy, Flemming Helsing, JoAnne Russo, Marc Botham, Michael Sabourin, Jonathan Fréchette, Alexandre Anctil, Yacksecari Lopez, Eduardo Navarro, Filonila Perez Pimentel, Ana Cecilia Zamora, José Alejandro Ramirez Silva, Jonathan Gagnon, Tom August, Kim Bjerge, Alba Gomez Segura, Marc Bélisle, Yves Basset, Kent P. McFarland, David Roy, et al. (3 additional authors not shown)

Abstract: Insects represent half of all global biodiversity, yet many of the world's insects are disappearing, with severe implications for ecosystems and agriculture. Despite this crisis, data on insect diversity and abundance remain woefully inadequate, due to the scarcity of human experts and the lack of scalable tools for monitoring. Ecologists have started to adopt camera traps to record and study insects, and have proposed computer vision algorithms as an answer for scalable data processing. However, insect monitoring in the wild poses unique challenges that have not yet been addressed within computer vision, including the combination of long-tailed data, extremely similar classes, and significant distribution shifts. We provide the first large-scale machine learning benchmarks for fine-grained insect recognition, designed to match real-world tasks faced by ecologists. Our contributions include a curated dataset of images from citizen science platforms and museums, and an expert-annotated dataset drawn from automated camera traps across multiple continents, designed to test out-of-distribution generalization under field conditions. We train and evaluate a variety of baseline algorithms and introduce a combination of data augmentation techniques that enhance generalization across geographies and hardware setups. Code and datasets are made publicly available.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.11500 [pdf, other]

ESI-GAL: EEG Source Imaging-based Kinematics Parameter Estimation for Grasp and Lift Task

Authors: Anant Jain, Lalan Kumar

Abstract: Objective: Electroencephalogram (EEG) signals-based motor kinematics prediction (MKP) has been an active area of research to develop brain-computer interface (BCI) systems such as exosuits, prostheses, and rehabilitation devices. However, EEG source imaging (ESI) based kinematics prediction is sparsely explored in the literature. Approach: In this study, pre-movement EEG features are utilized to predict three-dimensional (3D) hand kinematics for the grasp-and-lift motor task. A public dataset, WAY-EEG-GAL, is utilized for MKP analysis. In particular, sensor-domain (EEG data) and source-domain (ESI data) based features from the frontoparietal region are explored for MKP. Deep learning-based models are explored to achieve efficient kinematics decoding. Various time lags and window sizes are analyzed for hand kinematics prediction. Subsequently, intra-subject and inter-subject MKP analysis is performed to investigate the subject-specific and subject-independent motor-learning capabilities of the neural decoders. The Pearson correlation coefficient (PCC) is used as the performance metric for kinematics trajectory decoding. Main results: The rEEGNet neural decoder achieved the best performance with sensor-domain and source-domain features with the time lag and window size of 100 ms and 450 ms, respectively. The highest mean PCC values of 0.790, 0.795, and 0.637 are achieved using sensor-domain features, while 0.769, 0.777, and 0.647 are achieved using source-domain features in x, y, and z-directions, respectively. Significance: This study explores the feasibility of trajectory prediction using sensor-domain and source-domain EEG features for the grasp-and-lift task. Furthermore, inter-subject trajectory estimation is performed using the proposed deep learning decoder with EEG source domain features.

Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.
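
The abstract above reports decoding quality as a Pearson correlation coefficient (PCC) between trajectories. As a minimal sketch of that metric, the function below computes the PCC for one kinematic axis; the name `pearson_cc` and the sample trajectories are hypothetical and are not taken from the paper or the WAY-EEG-GAL dataset.

```python
import math
from typing import Sequence


def pearson_cc(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long trajectories."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Covariance and standard deviations (the common 1/n factors cancel).
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    dev_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    dev_m = math.sqrt(sum((m - mean_m) ** 2 for m in measured))
    if dev_p == 0.0 or dev_m == 0.0:
        raise ValueError("PCC is undefined for a constant trajectory")
    return cov / (dev_p * dev_m)


# Hypothetical hand positions along one axis (arbitrary units).
measured_x = [0.0, 1.0, 2.0, 3.0, 4.0]
scaled_prediction = [0.5, 2.5, 4.5, 6.5, 8.5]   # 2 * x + 0.5: perfectly correlated
reversed_prediction = [4.0, 3.0, 2.0, 1.0, 0.0]  # perfectly anti-correlated

pcc_scaled = pearson_cc(scaled_prediction, measured_x)
pcc_reversed = pearson_cc(reversed_prediction, measured_x)
```

Because the PCC is invariant to scaling and offset, a high value indicates that the decoded trajectory follows the shape of the measured one, not that the two agree in absolute position.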

arXiv:2406.09607 [pdf, other]

Gate voltage modulation of the superconducting state in a degenerate semiconductor

Authors: Bikash C. Barik, Himadri Chakraborti, Buddhadeb Pal, Aditya K. Jain, Swagata Bhunia, Sounak Samanta, Apurba Laha, Suddhasatta Mahapatra, K. Das Gupta

Abstract: In this work, we demonstrate that the modulation of carrier density can alter the superconducting transition temperature by up to $204$ mK in epitaxial Indium Nitride on Gallium Nitride, accounting for $10$% of the transition temperature in ungated conditions. Our samples are likely free from strong localization effects and significant granularity, as indicated by $( k_{f l} \gg 1 )$, suggesting that the primary determinant of the transition temperature in InN is carrier density, rather than disorder scattering. The observed behavior is consistent with BCS s-wave superconductivity, corroborated by the superconducting parameters we measured. Furthermore, we observed a $60$% bipolar suppression of the supercurrent in our experiments.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 6 pages, 5 figures, supplementary material attached. Comments are welcome

arXiv:2406.08728 [pdf, other]

Primordial magnetic relics and their signatures

Authors: Arka Banerjee, Lalit Singh Bhandari, Ashwat Jain, Arun M. Thalapillil

Abstract: Primordial black holes bearing magnetic charges may bypass the constraints imposed by Hawking radiation, thereby enabling reasonable present-day populations, even for masses below $10^{15}\,\text{g}$ -- a range previously considered improbable. They could, therefore, conceivably contribute to a component of dark matter. We investigate novel Faraday rotation signatures exhibited by primordial magnetic black holes while also establishing new Parker-type bounds on their populations. For the latter, we bound the dark matter fraction from intergalactic magnetic fields in cosmic voids $\left(f_{\text{DM}} \lesssim 10^{-8}\right)$ and cosmic web filaments $\left(f_{\text{ DM}} \lesssim 10^{-4}\right)$, notably eclipsing previous bounds. Exploring Faraday rotation effects, we discern a pronounced rotation of the polarization angle and the rotation measure values for extremal primordial magnetic black holes with masses $M^{\text{ ex.}}_{\text{ BH}}\gtrsim 10^{-6}~ \text{M}_\odot$. This makes them potentially detectable in current observations. A comparative investigation finds that the effects are notably greater than for a neutron star, like a Magnetar, with a similar magnetic field at the surface. Moreover, the polarization angle maps for primordial magnetic black holes exhibit unique features, notably absent in other astrophysical magnetic configurations. In this context, we also introduce a simple integral measure, offering a quantitative measure for their discrimination in many scenarios. These traits potentially suggest a robust avenue for their observational detection and differentiation.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 52 pages, 13 figures

arXiv:2406.08431 [pdf, other]

Diffusion Soup: Model Merging for Text-to-Image Diffusion Models

Authors: Benjamin Biggs, Arjun Seshadri, Yang Zou, Achin Jain, Aditya Golatkar, Yusheng Xie, Alessandro Achille, Ashwin Swaminathan, Stefano Soatto

Abstract: We present Diffusion Soup, a compartmentalization method for Text-to-Image Generation that averages the weights of diffusion models trained on sharded data. By construction, our approach enables training-free continual learning and unlearning with no additional memory or inference costs, since models corresponding to data shards can be added or removed by re-averaging. We show that Diffusion Soup samples from a point in weight space that approximates the geometric mean of the distributions of constituent datasets, which offers anti-memorization guarantees and enables zero-shot style mixing. Empirically, Diffusion Soup outperforms a paragon model trained on the union of all data shards and achieves a 30% improvement in Image Reward (.34 $\to$ .44) on domain sharded data, and a 59% improvement in IR (.37 $\to$ .59) on aesthetic data. In both cases, souping also prevails in TIFA score (respectively, 85.5 $\to$ 86.5 and 85.6 $\to$ 86.8). We demonstrate robust unlearning -- removing any individual domain shard only lowers performance by 1% in IR (.45 $\to$ .44) -- and validate our theoretical insights on anti-memorization using real data. Finally, we showcase Diffusion Soup's ability to blend the distinct styles of models finetuned on different shards, resulting in the zero-shot generation of hybrid styles.

Submitted 12 June, 2024; originally announced June 2024.
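
The Diffusion Soup abstract above describes averaging the weights of per-shard models, with shards added or removed by re-averaging. A minimal sketch of that idea follows, assuming uniform averaging and representing each model as a plain dictionary of parameter lists. The function `soup`, the shard names, and the parameter values are hypothetical; real diffusion models would use tensors, and the paper may weight shards differently.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def soup(shard_models: Dict[str, Weights]) -> Weights:
    """Uniformly average the parameters of all per-shard models."""
    if not shard_models:
        raise ValueError("need at least one shard model")
    models = list(shard_models.values())
    averaged: Weights = {}
    for name in models[0]:
        # zip(*...) walks the same parameter position across all models.
        columns = zip(*(model[name] for model in models))
        averaged[name] = [sum(values) / len(models) for values in columns]
    return averaged


# Hypothetical models trained on three data shards (one tiny parameter each).
shard_models = {
    "shard_a": {"layer.w": [1.0, 2.0]},
    "shard_b": {"layer.w": [3.0, 4.0]},
    "shard_c": {"layer.w": [5.0, 9.0]},
}

full_soup = soup(shard_models)

# "Unlearning" shard_c needs no training: drop its model and re-average.
remaining = {k: v for k, v in shard_models.items() if k != "shard_c"}
unlearned_soup = soup(remaining)
```

The merged model has the same parameter count as a single shard model, which is why the abstract can claim no additional memory or inference cost.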

arXiv:2406.07334 [pdf, other]

Fractonic solids

Authors: Akash Jain

Abstract: Fractons are exotic quasiparticles whose mobility in space is restricted by symmetries. In potential real-world realisations, fractons are likely lodged to a physical material rather than absolute space. Motivated by this, we propose and explore a new symmetry principle that restricts the motion of fractons relative to a physical solid. Unlike models with restricted mobility in absolute space, these fractonic solids admit gauge-invariant momentum density, are compatible with boost symmetry, and can consistently be coupled to gravity. We also propose a holographic model for fractonic solids.

Submitted 11 June, 2024; originally announced June 2024.

Comments: 5 pages + bibliography and supplementary material; a supplementary mathematica notebook is included containing the details of dispersion relations

arXiv:2406.06808 [pdf, ps, other]

Fast White-Box Adversarial Streaming Without a Random Oracle

Authors: Ying Feng, Aayush Jain, David P. Woodruff

Abstract: Recently, the question of adversarially robust streaming, where the stream is allowed to depend on the randomness of the streaming algorithm, has gained a lot of attention. In this work, we consider a strong white-box adversarial model (Ajtai et al. PODS 2022), in which the adversary has access to all past random coins and the parameters used by the streaming algorithm. We focus on the sparse recovery problem and extend our result to other tasks such as distinct element estimation and low-rank approximation of matrices and tensors. The main drawback of previous work is that it requires a random oracle, which is especially problematic in the streaming model since the amount of randomness is counted in the space complexity of a streaming algorithm. Also, the previous work suffers from large update time. We construct a near-optimal solution for the sparse recovery problem in white-box adversarial streams, based on the subexponentially secure Learning with Errors assumption. Importantly, our solution does not require a random oracle and has a polylogarithmic per item processing time. We also give results in a related white-box adversarially robust distributed model. Our constructions are based on homomorphic encryption schemes satisfying very mild structural properties that are currently satisfied by most known schemes.

Submitted 10 June, 2024; originally announced June 2024.

Comments: ICML 2024

arXiv:2406.00287 [pdf, other]

GenPalm: Contactless Palmprint Generation with Diffusion Models

Authors: Steven A. Grosz, Anil K. Jain

Abstract: The scarcity of large-scale palmprint databases poses a significant bottleneck to advancements in contactless palmprint recognition. To address this, researchers have turned to synthetic data generation. While Generative Adversarial Networks (GANs) have been widely used, they suffer from instability and mode collapse. Recently, diffusion probabilistic models have emerged as a promising alternative, offering stable training and better distribution coverage. This paper introduces a novel palmprint generation method using diffusion probabilistic models, develops an end-to-end framework for synthesizing multiple palm identities, and validates the realism and utility of the generated palmprints. Experimental results demonstrate the effectiveness of our approach in generating palmprint images which enhance contactless palmprint recognition performance across several test databases utilizing challenging cross-database and time-separated evaluation protocols.

Submitted 31 May, 2024; originally announced June 2024.

arXiv:2406.00237 [pdf, other]

A Comparative Study of CNN, ResNet, and Vision Transformers for Multi-Classification of Chest Diseases

Authors: Ananya Jain, Aviral Bhardwaj, Kaushik Murali, Isha Surani

Abstract: Large language models, notably utilizing Transformer architectures, have emerged as powerful tools due to their scalability and ability to process large amounts of data. Dosovitskiy et al. expanded this architecture to introduce Vision Transformers (ViT), extending its applicability to image processing tasks. Motivated by this advancement, we fine-tuned two variants of ViT models, one pre-trained on ImageNet and another trained from scratch, using the NIH Chest X-ray dataset containing over 100,000 frontal-view X-ray images. Our study evaluates the performance of these models in the multi-label classification of 14 distinct diseases, while using Convolutional Neural Networks (CNNs) and ResNet architectures as baseline models for comparison. Through rigorous assessment based on accuracy metrics, we identify that the pre-trained ViT model surpasses CNNs and ResNet in this multi-label classification task, highlighting its potential for accurate diagnosis of various lung conditions from chest X-ray images.

Submitted 31 May, 2024; originally announced June 2024.

Comments: 8 pages, 6 figures

arXiv:2405.18296 [pdf, other]

Bias in Motion: Theoretical Insights into the Dynamics of Bias in SGD Training

Authors: Anchit Jain, Rozhin Nobahari, Aristide Baratin, Stefano Sarao Mannelli

Abstract: Machine learning systems often acquire biases by leveraging undesired features in the data, impacting accuracy variably across different sub-populations. Current understanding of bias formation mostly focuses on the initial and final stages of learning, leaving a gap in knowledge regarding the transient dynamics. To address this gap, this paper explores the evolution of bias in a teacher-student setup modeling different data sub-populations with a Gaussian-mixture model. We provide an analytical description of the stochastic gradient descent dynamics of a linear classifier in this setting, which we prove to be exact in high dimension. Notably, our analysis reveals how different properties of sub-populations influence bias at different timescales, showing a shifting preference of the classifier during training. Applying our findings to fairness and robustness, we delineate how and when heterogeneous data and spurious features can generate and amplify bias. We empirically validate our results in more complex scenarios by training deeper networks on synthetic and real datasets, including CIFAR10, MNIST, and CelebA.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.15282 [pdf, other]

Prompt Tuning Strikes Back: Customizing Foundation Models with Low-Rank Prompt Adaptation

Authors: Abhinav Jain, Swarat Chaudhuri, Thomas Reps, Chris Jermaine

Abstract: Parameter-Efficient Fine-Tuning (PEFT) has become the standard for customising Foundation Models (FMs) to user-specific downstream tasks. However, typical PEFT methods require storing multiple task-specific adapters, creating scalability issues as these adapters must be housed and run at the FM server. Traditional prompt tuning offers a potential solution by customising them through task-specific input prefixes, but it under-performs compared to other PEFT methods like LoRA. To address this gap, we propose Low-Rank Prompt Adaptation (LOPA), a prompt-tuning-based approach that performs on par with state-of-the-art PEFT methods and full fine-tuning while being more parameter-efficient and not requiring a server-based adapter. LOPA generates soft prompts by balancing between sharing task-specific information across instances and customization for each instance. It uses a low-rank decomposition of the soft-prompt component encoded for each instance to achieve parameter efficiency. We provide a comprehensive evaluation on multiple natural language understanding and code generation and understanding tasks across a wide range of foundation models with varying sizes.

Submitted 24 May, 2024; originally announced May 2024.

Comments: 11 pages, 4 figures, 3 tables
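
The LOPA abstract above attributes its parameter efficiency to a low-rank decomposition of the instance-specific soft-prompt component. The sketch below illustrates only the generic arithmetic of such a factorization: an m x d matrix written as the product of an m x r and an r x d factor. The shapes (10 prompt tokens, width 768, rank 4) and all function names are hypothetical, and the sketch does not reproduce the paper's architecture.

```python
from typing import List

Matrix = List[List[float]]


def full_param_count(m: int, d: int) -> int:
    """Parameters in a dense m x d soft-prompt matrix."""
    return m * d


def low_rank_param_count(m: int, d: int, r: int) -> int:
    """Parameters in the two factors U (m x r) and V (r x d)."""
    return r * (m + d)


def low_rank_product(u: Matrix, v: Matrix) -> Matrix:
    """Reconstruct the m x d matrix as the product U @ V."""
    rank = len(v)
    if any(len(row) != rank for row in u):
        raise ValueError("inner dimensions of U and V do not match")
    return [
        [sum(u_row[k] * v[k][j] for k in range(rank)) for j in range(len(v[0]))]
        for u_row in u
    ]


# Hypothetical sizes: 10 prompt tokens, embedding width 768, rank 4.
dense_count = full_param_count(10, 768)
factored_count = low_rank_param_count(10, 768, 4)

# Tiny worked example of the reconstruction (3 x 2 times 2 x 4).
u = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
prompt = low_rank_product(u, v)
```

With these illustrative sizes the factored form stores 3112 values instead of 7680. The saving grows as the rank shrinks relative to m and d, and it disappears when the rank is not small, as in the tiny 3 x 4 example.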

arXiv:2405.13930 [pdf, other]

AlabOS: A Python-based Reconfigurable Workflow Management Framework for Autonomous Laboratories

Authors: Yuxing Fei, Bernardus Rendy, Rishi Kumar, Olympia Dartsi, Hrushikesh P. Sahasrabuddhe, Matthew J. McDermott, Zheren Wang, Nathan J. Szymanski, Lauren N. Walters, David Milsted, Yan Zeng, Anubhav Jain, Gerbrand Ceder

Abstract: The recent advent of autonomous laboratories, coupled with algorithms for high-throughput screening and active learning, promises to accelerate materials discovery and innovation. As these autonomous systems grow in complexity, the demand for robust and efficient workflow management software becomes increasingly critical. In this paper, we introduce AlabOS, a general-purpose software framework for orchestrating experiments and managing resources, with an emphasis on automated laboratories for materials synthesis and characterization. We demonstrate the implementation of AlabOS in a prototype autonomous materials laboratory. AlabOS features a reconfigurable experiment workflow model, enabling the simultaneous execution of varied workflows composed of modular tasks. Therefore, AlabOS is well-suited to handle the rapidly changing experimental protocols defining the progress of self-driving laboratory development for materials research.

Submitted 22 May, 2024; originally announced May 2024.

Comments: 30 pages, 5 figures

arXiv:2405.13653 [pdf, other]

Downlink Power Control based UE-Sided Initial Access for Tactical 5G NR

Authors: Akshay Jain, Karthik Upadhya, Mikko A. Uusitalo, Harish Viswanathan

Abstract: Communication technologies play a crucial role in battlefields. They are an inalienable part of any tactical response, whether at the battlefront or inland. Such scenarios require that the communication technologies be versatile, scalable, cost-effective, and stealthy. While multiple studies and past products have tried to address these requirements, none of them have been able to solve all four challenges simultaneously. Hence, in this paper, we propose a tactical solution that is based on the versatile, scalable, and cost-effective 5G NR system. Our focus is on the initial-access phase, which is subject to a high probability of detection by an eavesdropper. To address this issue, we propose a novel approach that involves some modifications to the initial access procedure that lower the probability of detection while not affecting standards compliance and not requiring any modifications to the user equipment chipset implementation. Further, we demonstrate that with a simple downlink power control algorithm, we reduce the probability of detection at an eavesdropper. The result is a 5G NR based initial-access method that improves stealthiness when compared with a vanilla 5G NR implementation.

Submitted 14 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

Comments: Submitted to IEEE MILCOM 2024

arXiv:2405.12842 [pdf, other]

SmartFlow: Robotic Process Automation using LLMs

Authors: Arushi Jain, Shubham Paliwal, Monika Sharma, Lovekesh Vig, Gautam Shroff

Abstract: Robotic Process Automation (RPA) systems face challenges in handling complex processes and diverse screen layouts that require advanced human-like decision-making capabilities. These systems typically rely on pixel-level encoding through drag-and-drop or automation frameworks such as Selenium to create navigation workflows, rather than visual understanding of screen elements. In this context, we present SmartFlow, an AI-based RPA system that uses pre-trained large language models (LLMs) coupled with deep-learning based image understanding. Our system can adapt to new scenarios, including changes in the user interface and variations in input data, without the need for human intervention. SmartFlow uses computer vision and natural language processing to perceive visible elements on the graphical user interface (GUI) and convert them into a textual representation. This information is then utilized by LLMs to generate a sequence of actions that are executed by a scripting engine to complete an assigned task. To assess the effectiveness of SmartFlow, we have developed a dataset that includes a set of generic enterprise applications with diverse layouts, which we are releasing for research use. Our evaluations on this dataset demonstrate that SmartFlow exhibits robustness across different layouts and applications. SmartFlow can automate a wide range of business processes such as form filling, customer service, invoice processing, and back-office operations. SmartFlow can thus assist organizations in enhancing productivity by automating an even larger fraction of screen-based workflows. The demo-video and dataset are available at https://smartflow-4c5a0a.webflow.io/.

Submitted 21 May, 2024; originally announced May 2024.

Comments: 32nd ACM International Conference on Information and Knowledge Management

arXiv:2405.12742 [pdf, other]

Multi-Subject Personalization

Authors: Arushi Jain, Shubham Paliwal, Monika Sharma, Vikram Jamwal, Lovekesh Vig

Abstract: Creative story illustration requires a consistent interplay of multiple characters or objects. However, conventional text-to-image models face significant challenges while producing images featuring multiple personalized subjects. For example, they distort the subject rendering, or the text descriptions fail to render coherent subject interactions. We present Multi-Subject Personalization (MSP) to alleviate some of these challenges. We implement MSP using Stable Diffusion and assess our approach against other text-to-image models, showcasing its consistent generation of good-quality images representing intended subjects and interactions.

Submitted 21 May, 2024; originally announced May 2024.

Comments: 2023 Conference on Neural Information Processing Systems

arXiv:2405.12704 [pdf, other]

LPD-Aware Uplink CSI-based 5G NR Downlink Synchronization for Tactical Networks

Authors: Karthik Upadhya, Akshay Jain, Mikko A. Uusitalo, Harish Viswanathan

Abstract: 5G NR is touted to be an attractive candidate for tactical networks owing to its versatility, scalability, and low cost. However, tactical networks need to be stealthy, where an adversary is not able to detect or intercept the tactical communication. In this paper, we investigate the stealthiness of 5G NR by looking at the probability with which an adversary that monitors the downlink synchronization signals can detect the presence of the network. We simulate a single-cell single-eavesdropper scenario and evaluate the probability with which the eavesdropper can detect the synchronization signal block when using either a correlator or an energy detector. We show that this probability is close to $ 100\% $ suggesting that 5G out-of-the-box is not suitable for a tactical network. We then propose utilizing the uplink channel-state-information to beamform the downlink synchronization-signals towards the tactical user-equipment (UE) to lower the eavesdropper detection probability while not compromising the performance of the legitimate tactical UE.

Submitted 14 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

Comments: Submitted to IEEE MILCOM 2024

arXiv:2405.12531 [pdf, other]

CustomText: Customized Textual Image Generation using Diffusion Models

Authors: Shubham Paliwal, Arushi Jain, Monika Sharma, Vikram Jamwal, Lovekesh Vig

Abstract: Textual image generation spans diverse fields like advertising, education, product packaging, social media, information visualization, and branding. Despite recent strides in language-guided image synthesis using diffusion models, current models excel in image generation but struggle with accurate text rendering and offer limited control over font attributes. In this paper, we aim to enhance the synthesis of high-quality images with precise text customization, thereby contributing to the advancement of image generation models. We call our proposed method CustomText. Our implementation leverages a pre-trained TextDiffuser model to enable control over font color, background, and types. Additionally, to address the challenge of accurately rendering small-sized fonts, we train the ControlNet model for a consistency decoder, significantly enhancing text-generation performance. We assess the performance of CustomText in comparison to previous methods of textual image generation on the publicly available CTW-1500 dataset and a self-curated dataset for small-text generation, showcasing superior results.

Submitted 21 May, 2024; originally announced May 2024.

Comments: Accepted by AI for Content Creation (AI4CC) workshop at CVPR 2024

arXiv:2405.11023 [pdf, ps, other]

Hydrodynamics of thermal active matter

Authors: Jay Armas, Akash Jain, Ruben Lier

Abstract: Active matter concerns many-body systems comprised of living or self-driven agents that collectively exhibit macroscopic phenomena distinct from conventional passive matter. Using Schwinger-Keldysh effective field theory, we develop a novel hydrodynamic framework for thermal active matter that accounts for local temperature variations and the ensuing stochastic effects. This framework provides a deeper understanding of energy balance, second law of thermodynamics, and thermostated steady states in active matter, while also addressing the systematic violations of fluctuation-dissipation theorem and detailed balance. We use our framework of active hydrodynamics to develop effective field theory actions for active superfluids and active nematics that offer a first-principle derivation of various active transport coefficients and feature activity-induced phase transitions.

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.07838 [pdf, other]

Adaptive Exploration for Data-Efficient General Value Function Evaluations

Authors: Arushi Jain, Josiah P. Hanna, Doina Precup

Abstract: General Value Functions (GVFs) (Sutton et al, 2011) are an established way to represent predictive knowledge in reinforcement learning. Each GVF computes the expected return for a given policy, based on a unique pseudo-reward. Multiple GVFs can be estimated in parallel using off-policy learning from a single stream of data, often sourced from a fixed behavior policy or pre-collected dataset. This leaves an open question: how can behavior policy be chosen for data-efficient GVF learning? To address this gap, we propose GVFExplorer, which aims at learning a behavior policy that efficiently gathers data for evaluating multiple GVFs in parallel. This behavior policy selects actions in proportion to the total variance in the return across all GVFs, reducing the number of environmental interactions. To enable accurate variance estimation, we use a recently proposed temporal-difference-style variance estimator. We prove that each behavior policy update reduces the mean squared error in the summed predictions over all GVFs. We empirically demonstrate our method's performance in both tabular representations and nonlinear function approximation.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 20 pages, 9 figures, Under Review

arXiv:2405.07417 [pdf, other]

Identifying Hate Speech Peddlers in Online Platforms. A Bayesian Social Learning Approach for Large Language Model Driven Decision-Makers

Authors: Adit Jain, Vikram Krishnamurthy

Abstract: This paper studies the problem of autonomous agents performing Bayesian social learning for sequential detection when the observations of the state belong to a high-dimensional space and are expensive to analyze. Specifically, when the observations are textual, the Bayesian agent can use a large language model (LLM) as a map to get a low-dimensional private observation. The agent performs Bayesian learning and takes an action that minimizes the expected cost and is visible to subsequent agents. We prove that a sequence of such Bayesian agents herd in finite time to the public belief and take the same action disregarding the private observations. We propose a stopping time formulation for quickest time herding in social learning and optimally balance privacy and herding. Structural results are shown on the threshold nature of the optimal policy to the stopping time problem. We illustrate the application of our framework when autonomous Bayesian detectors aim to sequentially identify if a user is a hate speech peddler on an online platform by parsing text observations using an LLM. We numerically validate our results on real-world hate speech datasets. We show that autonomous Bayesian agents designed to flag hate speech peddlers in online platforms herd and misclassify the users when the public prior is strong. We also numerically show the effect of a threshold policy in delaying herding.

Submitted 12 May, 2024; originally announced May 2024.

arXiv:2405.07415 [pdf, ps, other]

Structured Reinforcement Learning for Incentivized Stochastic Covert Optimization

Authors: Adit Jain, Vikram Krishnamurthy

Abstract: This paper studies how a stochastic gradient algorithm (SG) can be controlled to hide the estimate of the local stationary point from an eavesdropper. Such problems are of significant interest in distributed optimization settings like federated learning and inventory management. A learner queries a stochastic oracle and incentivizes the oracle to obtain noisy gradient measurements and perform SG. The oracle probabilistically returns either a noisy gradient of the function or a non-informative measurement, depending on the oracle state and incentive. The learner's query and incentive are visible to an eavesdropper who wishes to estimate the stationary point. This paper formulates the problem of the learner performing covert optimization by dynamically incentivizing the stochastic oracle and obfuscating the eavesdropper as a finite-horizon Markov decision process (MDP). Using conditions for interval-dominance on the cost and transition probability structure, we show that the optimal policy for the MDP has a monotone threshold structure. We propose searching for the optimal stationary policy with the threshold structure using a stochastic approximation algorithm and a multi-armed bandit approach. The effectiveness of our methods is numerically demonstrated on a covert federated learning hate-speech classification task.

Submitted 12 May, 2024; originally announced May 2024.

arXiv:2405.06989 [pdf, other]

Stabilizing Circular Motion Within Nonconcentric Circular Boundary: A Mobius Transformation-Based Approach

Authors: Shubham Singh, Anoop Jain

Abstract: Nonuniform motion constraints are ubiquitous in robotic applications. Geofencing control is one such paradigm where the motion of a robot must be constrained within a predefined boundary. This paper addresses the problem of stabilizing a unicycle robot around a desired circular orbit while confining its motion within a nonconcentric external circular boundary. Our solution approach relies on the concept of the so-called Mobius transformation that, under certain practical conditions, maps two nonconcentric circles to a pair of concentric circles, and hence, results in uniform spatial motion constraints. The choice of such a Mobius transformation is governed by the roots of a quadratic equation in the post-design analysis that decides how the regions enclosed by the two circles are mapped onto the two planes. We show that the problem can be formulated either as a trajectory-constraining problem or an obstacle-avoidance problem in the transformed plane, depending on these roots. Exploiting the idea of the barrier Lyapunov function, we propose a unique control law that solves both these contrasting problems in the transformed plane and renders a solution to the original problem in the actual plane. By relating parameters of two planes under Mobius transformation and its inverse map, we further establish a connection between the control laws in two planes and determine the control law to be applied in the actual plane. Simulation and experimental results are provided to illustrate the key theoretical developments.

Submitted 11 May, 2024; originally announced May 2024.

arXiv:2405.04030 [pdf, other]

Uncovering implementable dormant pruning decisions from three different stakeholder perspectives

Authors: Deanna Flynn, Abhinav Jain, Heather Knight, Cristina G. Wilson, Cindy Grimm

Abstract: Dormant pruning, or the removal of unproductive portions of a tree while a tree is not actively growing, is an important orchard task to help maintain yield, requiring years to build expertise. Because of long training periods and an increasing labor shortage in agricultural jobs, pruning could benefit from robotic automation. However, to program robots to prune branches, we first need to understand how pruning decisions are made, and what variables in the environment (e.g., branch size and thickness) we need to capture. Working directly with three pruning stakeholders -- horticulturists, growers, and pruners -- we find that each group of human experts approaches pruning decision-making differently. To capture this knowledge, we present three studies and two extracted pruning protocols from field work conducted in Prosser, Washington in January 2022 and 2023. We interviewed six stakeholders (two in each group) and observed pruning across three cultivars -- Bing Cherries, Envy Apples, and Jazz Apples -- and two tree architectures -- Upright Fruiting Offshoot and V-Trellis. Leveraging participant interviews and video data, this analysis uses grounded coding to extract pruning terminology, discover horticultural contexts that influence pruning decisions, and find implementable pruning heuristics for autonomous systems. The results include a validated terminology set, which we offer for use by both pruning stakeholders and roboticists, to communicate general pruning concepts and heuristics. The results also highlight seven pruning heuristics utilizing this terminology set that would be relevant for use by future autonomous robot pruning systems, and characterize three discovered horticultural contexts (i.e., environmental management, crop-load management, and replacement wood) across all three cultivars.

Submitted 7 May, 2024; originally announced May 2024.

Comments: 36 pages; 21 figures

arXiv:2405.01734 [pdf, other]

Diabetic Retinopathy Detection Using Quantum Transfer Learning

Authors: Ankush Jain, Rinav Gupta, Jai Singhal

Abstract: Diabetic Retinopathy (DR), a prevalent complication in diabetes patients, can lead to vision impairment due to lesions formed on the retina. Detecting DR at an advanced stage often results in irreversible blindness. The traditional process of diagnosing DR through retina fundus images by ophthalmologists is not only time-intensive but also expensive. While classical transfer learning models have been widely adopted for computer-aided detection of DR, their high maintenance costs can hinder their detection efficiency. In contrast, Quantum Transfer Learning offers a more effective solution to this challenge. This approach is notably advantageous because it operates on heuristic principles, making it highly optimized for the task. Our proposed methodology leverages this hybrid quantum transfer learning technique to detect DR. To construct our model, we utilize the APTOS 2019 Blindness Detection dataset, available on Kaggle. We employ the ResNet-18, ResNet34, ResNet50, ResNet101, ResNet152 and Inception V3, pre-trained classical neural networks, for the initial feature extraction. For the classification stage, we use a Variational Quantum Classifier. Our hybrid quantum model has shown remarkable results, achieving an accuracy of 97% for ResNet-18. This demonstrates that quantum computing, when integrated with quantum machine learning, can perform tasks with a level of power and efficiency unattainable by classical computers alone. By harnessing these advanced technologies, we can significantly improve the detection and diagnosis of Diabetic Retinopathy, potentially saving many from the risk of blindness. Keywords: Diabetic Retinopathy, Quantum Transfer Learning, Deep Learning

Submitted 2 May, 2024; originally announced May 2024.

Comments: 14 pages, 12 figures and 5 tables

arXiv:2404.18890 [pdf, other]

Hide and Seek: How Does Watermarking Impact Face Recognition?

Authors: Yuguang Yao, Steven Grosz, Sijia Liu, Anil Jain

Abstract: The recent progress in generative models has revolutionized the synthesis of highly realistic images, including face images. This technological development has undoubtedly helped face recognition, such as training data augmentation for higher recognition accuracy and data privacy. However, it has also introduced novel challenges concerning the responsible use and proper attribution of computer generated images. We investigate the impact of digital watermarking, a technique for embedding ownership signatures into images, on the effectiveness of face recognition models. We propose a comprehensive pipeline that integrates face image generation, watermarking, and face recognition to systematically examine this question. The proposed watermarking scheme, based on an encoder-decoder architecture, successfully embeds and recovers signatures from both real and synthetic face images while preserving their visual fidelity. Through extensive experiments, we unveil that while watermarking enables robust image attribution, it results in a slight decline in face recognition accuracy, particularly evident for face images with challenging poses and expressions. Additionally, we find that directly training face recognition models on watermarked images offers only a limited alleviation of this performance decline. Our findings underscore the intricate trade off between watermarking and face recognition accuracy. This work represents a pivotal step towards the responsible utilization of generative models in face recognition and serves to initiate discussions regarding the broader implications of watermarking in biometrics.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.17627 [pdf, other]

Impact of Traffic-Following on Order of Autonomous Airspace Operations

Authors: Anahita Jain, Husni R. Idris, John-Paul Clarke

Abstract: In this paper, we investigate the dynamic emergence of traffic order in a distributed multi-agent system, aiming to minimize inefficiencies that stem from unnecessary structural impositions. We introduce a methodology for developing a dynamically-updating traffic pattern map of the airspace by leveraging information about the consistency and frequency of flow directions used by current as well as preceding traffic. Informed by this map, an agent can discern the degree to which it is advantageous to follow traffic by trading off utilities such as time and order. We show that for the traffic levels studied, for low degrees of traffic-following behavior, there is minimal penalty in terms of aircraft travel times while improving the overall orderliness of the airspace. On the other hand, heightened traffic-following behavior may result in increased aircraft travel times, while marginally reducing the overall entropy of the airspace. Ultimately, the methods and metrics presented in this paper can be used to optimally and dynamically adjust an agent's traffic-following behavior based on these trade-offs.

Submitted 3 June, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.17390 [pdf, other]

How Could AI Support Design Education? A Study Across Fields Fuels Situating Analytics

Authors: Ajit Jain, Andruid Kerne, Hannah Fowler, Jinsil Seo, Galen Newman, Nic Lupfer, Aaron Perrine

Abstract: We use the process and findings from a case study of design educators' practices of assessment and feedback to fuel theorizing about how to make AI useful in service of human experience. We build on Suchman's theory of situated actions. We perform a qualitative study of 11 educators in 5 fields, who teach design processes situated in project-based learning contexts. Through qualitative data gathering and analysis, we derive codes: design process; assessment and feedback challenges; and computational support. We twice invoke creative cognition's family resemblance principle. First, to explain how design instructors already use assessment rubrics and second, to explain the analogous role for design creativity analytics: no particular trait is necessary or sufficient; each only tends to indicate good design work. Human teachers remain essential. We develop a set of situated design creativity analytics--Fluency, Flexibility, Visual Consistency, Multiscale Organization, and Legible Contrast--to support instructors' efforts, by providing on-demand, learning objectives-based assessment and feedback to students. We theorize a methodology, which we call situating analytics, firstly because making AI support living human activity depends on aligning what analytics measure with situated practices. Further, we realize that analytics can become most significant to users by situating them through interfaces that integrate them into the material contexts of their use. Here, this means situating design creativity analytics into actual design environments. Through the case study, we identify situating analytics as a methodology for explaining analytics to users, because the iterative process of alignment with practice has the potential to enable data scientists to derive analytics that make sense as part of and support situated human experiences.

Submitted 26 April, 2024; originally announced April 2024.

Comments: 31 pages, 3 figures, Submitted to ACM

ACM Class: H.5.2

arXiv:2404.13791 [pdf, other]

Universal Fingerprint Generation: Controllable Diffusion Model with Multimodal Conditions

Authors: Steven A. Grosz, Anil K. Jain

Abstract: The utilization of synthetic data for fingerprint recognition has garnered increased attention due to its potential to alleviate privacy concerns surrounding sensitive biometric data. However, current methods for generating fingerprints have limitations in creating impressions of the same finger with useful intra-class variations. To tackle this challenge, we present GenPrint, a framework to produce fingerprint images of various types while maintaining identity and offering humanly understandable control over different appearance factors such as fingerprint class, acquisition type, sensor device, and quality level. Unlike previous fingerprint generation approaches, GenPrint is not confined to replicating style characteristics from the training dataset alone: it enables the generation of novel styles from unseen devices without requiring additional fine-tuning. To accomplish these objectives, we developed GenPrint using latent diffusion models with multimodal conditions (text and image) for consistent generation of style and identity. Our experiments leverage a variety of publicly available datasets for training and evaluation. Results demonstrate the benefits of GenPrint in terms of identity preservation, explainable control, and universality of generated images. Importantly, the GenPrint-generated images yield comparable or even superior accuracy to models trained solely on real data and further enhance performance when augmenting the diversity of existing real fingerprint datasets.

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.06645 [pdf, other]

GenCHiP: Generating Robot Policy Code for High-Precision and Contact-Rich Manipulation Tasks

Authors: Kaylee Burns, Ajinkya Jain, Keegan Go, Fei Xia, Michael Stark, Stefan Schaal, Karol Hausman

Abstract: Large Language Models (LLMs) have been successful at generating robot policy code, but so far these results have been limited to high-level tasks that do not require precise movement. It is an open question how well such approaches work for tasks that require reasoning over contact forces and working within tight success tolerances. We find that, with the right action space, LLMs are capable of successfully generating policies for a variety of contact-rich and high-precision manipulation tasks, even under noisy conditions, such as perceptual errors or grasping inaccuracies. Specifically, we reparameterize the action space to include compliance with constraints on the interaction forces and stiffnesses involved in reaching a target pose. We validate this approach on subtasks derived from the Functional Manipulation Benchmark (FMB) and NIST Task Board Benchmarks. Exposing this action space alongside methods for estimating object poses improves policy generation with an LLM by greater than 3x and 4x when compared to non-compliant action spaces.

Submitted 9 April, 2024; originally announced April 2024.

Comments: 14 pages, 12 figures

ACM Class: I.2.9

arXiv:2404.05870 [pdf, other]

CoBT: Collaborative Programming of Behaviour Trees from One Demonstration for Robot Manipulation

Authors: Aayush Jain, Philip Long, Valeria Villani, John D. Kelleher, Maria Chiara Leva

Abstract: Mass customization and shorter manufacturing cycles are becoming more important among small and medium-sized companies. However, classical industrial robots struggle to cope with product variation and dynamic environments. In this paper, we present CoBT, a collaborative programming by demonstration framework for generating reactive and modular behavior trees. CoBT relies on a single demonstration and a combination of data-driven machine learning methods with logic-based declarative learning to learn a task, thus eliminating the need for programming expertise or long development times. The proposed framework is experimentally validated on 7 manipulation tasks and we show that CoBT achieves approx. 93% success rate overall with an average of 7.5s programming time. We conduct a pilot study with non-expert users to provide feedback regarding the usability of CoBT.

Submitted 10 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

Comments: Accepted for presentation at IEEE ICRA 2024

arXiv:2404.05417 [pdf, other]

Indexing Analytics to Instances: How Integrating a Dashboard can Support Design Education

Authors: Ajit Jain, Andruid Kerne, Nic Lupfer, Gabriel Britain, Aaron Perrine, Yoonsuck Choe, John Keyser, Ruihong Huang, Jinsil Seo, Annie Sungkajun, Robert Lightfoot, Timothy McGuire

Abstract: We investigate how to use AI-based analytics to support design education. The analytics at hand measure multiscale design, that is, students' use of space and scale to visually and conceptually organize their design work. With the goal of making the analytics intelligible to instructors, we developed a research artifact integrating a design analytics dashboard with design instances, and the design environment that students use to create them. We theorize about how Suchman's notion of mutual intelligibility requires contextualized investigation of AI in order to develop findings about how analytics work for people. We studied the research artifact in 5 situated course contexts, in 3 departments. A total of 236 students used the multiscale design environment. The 9 instructors who taught those students experienced the analytics via the new research artifact. We derive findings from a qualitative analysis of interviews with instructors regarding their experiences. Instructors reflected on how the analytics and their presentation in the dashboard have the potential to affect design education. We develop research implications addressing: (1) how indexing design analytics in the dashboard to actual design work instances helps design instructors reflect on what they mean and, more broadly, is a technique for how AI-based design analytics can support instructors' assessment and feedback experiences in situated course contexts; and (2) how multiscale design analytics, in particular, have the potential to support design education. By indexing, we mean linking which provides context, here connecting the numbers of the analytics with visually annotated design work instances.

Submitted 8 April, 2024; originally announced April 2024.

Comments: 22 pages, 4 figures, Submitted to ACM DIS

ACM Class: H.5.2

arXiv:2403.15697 [pdf, ps, other]

Passivity-based Attack Identification and Mitigation with Event-triggered Observer Feedback and Switching Controller

Authors: Pushkal Purohit, Anoop Jain

Abstract: This paper addresses the problem of output consensus in linear passive multi-agent systems under a False Data Injection (FDI) attack, considering the unavailability of complete state information. Our formulation relies on an event-based cryptographic authentication scheme for sensor integrity and considers FDI attacks at the actuator end, inspired by their practical nature and usages. For secure consensus, we propose (i) a passivity-based approach for detecting FDI attacks on the system and (ii) a Zeno-free event-triggered observer-based switching controller, which switches between the normal and the defense modes following an attack detection. We show that the closed-loop system achieves practical consensus under the controller's action in the defense mode. Simulation examples are provided to support the theoretical findings.

Submitted 22 March, 2024; originally announced March 2024.

arXiv:2403.14852 [pdf, other]

KeyPoint Relative Position Encoding for Face Recognition

Authors: Minchul Kim, Yiyang Su, Feng Liu, Anil Jain, Xiaoming Liu

Abstract: In this paper, we address the challenge of making ViT models more robust to unseen affine transformations. Such robustness becomes useful in various recognition tasks such as face recognition when image alignment failures occur. We propose a novel method called KP-RPE, which leverages key points (e.g., facial landmarks) to make ViT more resilient to scale, translation, and pose variations. We begin with the observation that Relative Position Encoding (RPE) is a good way to bring affine transform generalization to ViTs. RPE, however, can only inject the model with prior knowledge that nearby pixels are more important than far pixels. Keypoint RPE (KP-RPE) is an extension of this principle, where the significance of pixels is not solely dictated by their proximity but also by their relative positions to specific keypoints within the image. By anchoring the significance of pixels around keypoints, the model can more effectively retain spatial relationships, even when those relationships are disrupted by affine transformations. We show the merit of KP-RPE in face and gait recognition. The experimental results demonstrate the effectiveness in improving face recognition performance from low-quality images, particularly where alignment is prone to failure. Code and pre-trained models are available.

Submitted 21 March, 2024; originally announced March 2024.

Comments: To appear in CVPR2024

arXiv:2403.12945 [pdf, other]

DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

Authors: Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, Peter David Fagan, Joey Hejna, Masha Itkina, Marion Lepert, Yecheng Jason Ma, Patrick Tree Miller, Jimmy Wu, Suneel Belkhale, Shivin Dass, Huy Ha, Arhan Jain, Abraham Lee, Youngwoon Lee, Marius Memmel, Sungjae Park , et al. (74 additional authors not shown)

Abstract: The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a result, even the most general robot manipulation policies today are mostly trained on data collected in a small number of environments with limited scene and task diversity. In this work, we introduce DROID (Distributed Robot Interaction Dataset), a diverse robot manipulation dataset with 76k demonstration trajectories or 350 hours of interaction data, collected across 564 scenes and 84 tasks by 50 data collectors in North America, Asia, and Europe over the course of 12 months. We demonstrate that training with DROID leads to policies with higher performance and improved generalization ability. We open source the full dataset, policy learning code, and a detailed guide for reproducing our robot hardware setup.

Submitted 19 March, 2024; originally announced March 2024.

Comments: Project website: https://droid-dataset.github.io/

arXiv:2403.12267 [pdf, other]

Data-Efficient Contrastive Language-Image Pretraining: Prioritizing Data Quality over Quantity

Authors: Siddharth Joshi, Arnav Jain, Ali Payani, Baharan Mirzasoleiman

Abstract: Contrastive Language-Image Pre-training (CLIP) on large-scale image-caption datasets learns representations that can achieve remarkable zero-shot generalization. However, such models require a massive amount of pre-training data. Improving the quality of the pre-training data has been shown to be much more effective in improving CLIP's performance than increasing its volume. Nevertheless, finding small subsets of training data that provably generalize the best has remained an open question. In this work, we propose the first theoretically rigorous data selection method for CLIP. We show that subsets that closely preserve the cross-covariance of the images and captions of the full data provably achieve a superior generalization performance. Our extensive experiments on ConceptualCaptions3M and ConceptualCaptions12M demonstrate that subsets found by our method achieve over 2.7x and 1.4x the accuracy of the next best baseline on ImageNet and its shifted versions. Moreover, we show that our subsets obtain 1.5x the average accuracy of the next best baseline across 11 downstream datasets. The code is available at: https://github.com/BigML-CS-UCLA/clipcov-data-efficient-clip.

Submitted 19 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: AISTATS 2024, Code: https://github.com/BigML-CS-UCLA/clipcov-data-efficient-clip

arXiv:2403.12047 [pdf, other]

Alpha-wolves and Alpha-mammals: Exploring Dictionary Attacks on Iris Recognition Systems

Authors: Sudipta Banerjee, Anubhav Jain, Zehua Jiang, Nasir Memon, Julian Togelius, Arun Ross

Abstract: A dictionary attack in a biometric system entails the use of a small number of strategically generated images or templates to successfully match with a large number of identities, thereby compromising security. We focus on dictionary attacks at the template level, specifically the IrisCodes used in iris recognition systems. We present a hitherto unknown vulnerability wherein we mix IrisCodes using simple bitwise operators to generate alpha-mixtures - alpha-wolves (combining a set of "wolf" samples) and alpha-mammals (combining a set of users selected via search optimization) that increase false matches. We evaluate this vulnerability using the IITD, CASIA-IrisV4-Thousand and Synthetic datasets, and observe that an alpha-wolf (from two wolves) can match up to 71 identities @FMR=0.001%, while an alpha-mammal (from two identities) can match up to 133 other identities @FMR=0.01% on the IITD dataset.

Submitted 20 November, 2023; originally announced March 2024.

Comments: 8 pages, 5 figures, 13 tables, Workshop on Manipulation, Adversarial, and Presentation Attacks in Biometrics, Winter Conference on Applications of Computer Vision

arXiv:2403.10955 [pdf, other]

Agonist-Antagonist Pouch Motors: Bidirectional Soft Actuators Enhanced by Thermally Responsive Peltier Elements

Authors: Trevor Exley, Rashmi Wijesundara, Nathan Tan, Akshay Sunkara, Xinyu He, Shuopu Wang, Bonnie Chan, Aditya Jain, Luis Espinosa, Amir Jafari

Abstract: In this study, we introduce a novel Mylar-based pouch motor design that leverages the reversible actuation capabilities of Peltier junctions to enable agonist-antagonist muscle mimicry in soft robotics. Addressing the limitations of traditional silicone-based materials, such as leakage and phase-change fluid degradation, our pouch motors filled with Novec 7000 provide a durable and leak-proof solution for geometric modeling. The integration of flexible Peltier junctions offers a significant advantage over conventional Joule heating methods by allowing active and reversible heating and cooling cycles. This innovation not only enhances the reliability and longevity of soft robotic applications but also broadens the scope of design possibilities, including the development of agonist-antagonist artificial muscles, grippers that can manipulate through flexion and extension, and an anchor-slip style simple crawler design. Our findings indicate that this approach could lead to more efficient, versatile, and durable robotic systems, marking a significant advancement in the field of soft robotics.

Submitted 16 March, 2024; originally announced March 2024.

Comments: submitted to IROS 2024, 7 pages, 9 figures

arXiv:2403.09611 [pdf, other]

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Authors: Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Ankur Jain, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman , et al. (7 additional authors not shown)

Abstract: In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons. For example, we demonstrate that for large-scale multimodal pre-training, using a careful mix of image-caption, interleaved image-text, and text-only data is crucial for achieving state-of-the-art (SOTA) few-shot results across multiple benchmarks, compared to other published pre-training results. Further, we show that the image encoder together with image resolution and the image token count has substantial impact, while the vision-language connector design is of comparatively negligible importance. By scaling up the presented recipe, we build MM1, a family of multimodal models up to 30B parameters, including both dense models and mixture-of-experts (MoE) variants, that are SOTA in pre-training metrics and achieve competitive performance after supervised fine-tuning on a range of established multimodal benchmarks. Thanks to large-scale pre-training, MM1 enjoys appealing properties such as enhanced in-context learning and multi-image reasoning, enabling few-shot chain-of-thought prompting.

Submitted 18 April, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.03998 [pdf, other]

OpenVPN is Open to VPN Fingerprinting

Authors: Diwen Xue, Reethika Ramesh, Arham Jain, Michalis Kallitsis, J. Alex Halderman, Jedidiah R. Crandall, Roya Ensafi

Abstract: VPN adoption has seen steady growth over the past decade due to increased public awareness of privacy and surveillance threats. In response, certain governments are attempting to restrict VPN access by identifying connections using "dual use" DPI technology. To investigate the potential for VPN blocking, we develop mechanisms for accurately fingerprinting connections using OpenVPN, the most popular protocol for commercial VPN services. We identify three fingerprints based on protocol features such as byte pattern, packet size, and server response. Playing the role of an attacker who controls the network, we design a two-phase framework that performs passive fingerprinting and active probing in sequence. We evaluate our framework in partnership with a million-user ISP and find that we identify over 85% of OpenVPN flows with only negligible false positives, suggesting that OpenVPN-based services can be effectively blocked with little collateral damage. Although some commercial VPNs implement countermeasures to avoid detection, our framework successfully identified connections to 34 out of 41 "obfuscated" VPN configurations. We discuss the implications of the VPN fingerprintability for different threat models and propose short-term defenses. In the longer term, we urge commercial VPN providers to be more transparent about their obfuscation approaches and to adopt more principled detection countermeasures, such as those developed in censorship circumvention research.

Submitted 6 March, 2024; originally announced March 2024.

Comments: In: USENIX Security Symposium 2022 (USENIX Security '22)

Journal ref: 31st USENIX Security Symposium (USENIX Security 22). 2022

arXiv:2403.02709 [pdf, other]

RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches

Authors: Priya Sundaresan, Quan Vuong, Jiayuan Gu, Peng Xu, Ted Xiao, Sean Kirmani, Tianhe Yu, Michael Stark, Ajinkya Jain, Karol Hausman, Dorsa Sadigh, Jeannette Bohg, Stefan Schaal

Abstract: Natural language and images are commonly used as goal representations in goal-conditioned imitation learning (IL). However, natural language can be ambiguous and images can be over-specified. In this work, we propose hand-drawn sketches as a modality for goal specification in visual imitation learning. Sketches are easy for users to provide on the fly like language, but similar to images they can also help a downstream policy to be spatially-aware and even go beyond images to disambiguate task-relevant from task-irrelevant objects. We present RT-Sketch, a goal-conditioned policy for manipulation that takes a hand-drawn sketch of the desired scene as input, and outputs actions. We train RT-Sketch on a dataset of paired trajectories and corresponding synthetically generated goal sketches. We evaluate this approach on six manipulation skills involving tabletop object rearrangements on an articulated countertop. Experimentally we find that RT-Sketch is able to perform on a similar level to image or language-conditioned agents in straightforward settings, while achieving greater robustness when language goals are ambiguous or visual distractors are present. Additionally, we show that RT-Sketch has the capacity to interpret and act upon sketches with varied levels of specificity, ranging from minimal line drawings to detailed, colored drawings. For supplementary material and videos, please refer to our website: http://rt-sketch.github.io.

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2403.02082 [pdf, ps, other]

Double magnetic transition, complex field-induced phases, and large magnetocaloric effect in the frustrated garnet compound Mn$_{3}$Cr$_{2}$Ge$_{3}$O$_{12}$

Authors: S. Mohanty, A. Magar, Vikram Singh, S. S. Islam, S. Guchhait, A. Jain, S. M. Yusuf, A. A. Tsirlin, R. Nath

Abstract: A detailed study of the magnetic and magnetocaloric properties of a garnet compound Mn$_{3}$Cr$_{2}$Ge$_{3}$O$_{12}$ is carried out using x-ray diffraction, magnetization, heat capacity, and neutron diffraction measurements as well as \textit{ab initio} band-structure calculations. This compound manifests two successive magnetic transitions at $T_{\rm N1} \simeq 4.5$ K and $T_{\rm N2} \simeq 2.7$ K. Neutron powder diffraction experiments reveal that these two transitions correspond to the collinear and non-collinear antiferromagnetic ordering of the nonfrustrated Cr$^{3+}$ and frustrated Mn$^{2+}$ sublattices, respectively. The interactions within each of the Cr and Mn sublattices are antiferromagnetic, while the inter-sublattice interactions are ferromagnetic. The $H-T$ phase diagram is quite complex and displays multiple phases under magnetic field, which can be attributed to the frustrated nature of the spin lattice. Mn$_{3}$Cr$_{2}$Ge$_{3}$O$_{12}$ shows a large magnetocaloric effect with a maximum value of isothermal entropy change $ΔS_{\rm m} \simeq -23$ J/kg-K and adiabatic temperature change $ΔT_{\rm ad} \simeq 9$ K for a field change of 7 T. Further, a large value of the relative cooling power ($RCP \simeq 360$ J/kg) demonstrates the promise of using this compound in magnetic refrigeration.

Submitted 13 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: Phys. Rev. B (Accepted) 12 pages, 10 figures, 54 references

arXiv:2403.01410 [pdf, other]

Barrier Functions Inspired Reward Shaping for Reinforcement Learning

Authors: Nilaksh Nilaksh, Abhishek Ranjan, Shreenabh Agrawal, Aayush Jain, Pushpak Jagtap, Shishir Kolathaya

Abstract: Reinforcement Learning (RL) has progressed from simple control tasks to complex real-world challenges with large state spaces. While RL excels in these tasks, training time remains a limitation. Reward shaping is a popular solution, but existing methods often rely on value functions, which face scalability issues. This paper presents a novel safety-oriented reward-shaping framework inspired by barrier functions, offering simplicity and ease of implementation across various environments and tasks. To evaluate the effectiveness of the proposed reward formulations, we conduct simulation experiments on CartPole, Ant, and Humanoid environments, along with real-world deployment on the Unitree Go1 quadruped robot. Our results demonstrate that our method leads to 1.4-2.8 times faster convergence and as low as 50-60% actuation effort compared to the vanilla reward. In a sim-to-real experiment with the Go1 robot, we demonstrated better control and dynamics of the bot with our reward framework.

Submitted 1 April, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

Comments: 7 pages, 10 figures, Accepted as contributed paper at ICRA 2024

ACM Class: I.2.9

arXiv:2403.01248 [pdf, other]

SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code

Authors: Ziniu Hu, Ahmet Iscen, Aashi Jain, Thomas Kipf, Yisong Yue, David A. Ross, Cordelia Schmid, Alireza Fathi

Abstract: This paper introduces SceneCraft, a Large Language Model (LLM) Agent converting text descriptions into Blender-executable Python scripts which render complex scenes with up to a hundred 3D assets. This process requires complex spatial planning and arrangement. We tackle these challenges through a combination of advanced abstraction, strategic planning, and library learning. SceneCraft first models a scene graph as a blueprint, detailing the spatial relationships among assets in the scene. SceneCraft then writes Python scripts based on this graph, translating relationships into numerical constraints for asset layout. Next, SceneCraft leverages the perceptual strengths of vision-language foundation models like GPT-V to analyze rendered images and iteratively refine the scene. On top of this process, SceneCraft features a library learning mechanism that compiles common script functions into a reusable library, facilitating continuous self-improvement without expensive LLM parameter tuning. Our evaluation demonstrates that SceneCraft surpasses existing LLM-based agents in rendering complex scenes, as shown by its adherence to constraints and favorable human assessments. We also showcase the broader application potential of SceneCraft by reconstructing detailed 3D scenes from the Sintel movie and guiding a video generative model with generated scenes as an intermediary control signal.

Submitted 2 March, 2024; originally announced March 2024.

arXiv:2402.19458 [pdf, ps, other]

Higher-group global symmetry and the bosonic M5 brane

Authors: Jay Armas, Giorgos Batzios, Akash Jain

Abstract: Higher-group symmetries are combinations of higher-form symmetries which appear in various field theories. In this paper, we explain how higher-group symmetries arise in 10d and 11d supergravities when the latter are coupled to brane sources. Motivated by this observation, we study field theories at zero and finite temperature invariant under a class of continuous Abelian higher-group symmetries. We restrict the analysis to the low-energy regime where the dynamical field content exclusively consists of Goldstone fields arising from the spontaneous breaking of higher-group and spacetime symmetries. Invariant quantities are constructed and the phases of matter are classified according to the pattern of spontaneous symmetry breaking. With respect to supergravity, we highlight how such Goldstone effective theories provide a symmetry-based interpretation for the theories living on D/M-branes. As an explicit example we construct a 6-group invariant action for the bosonic M5 brane, consistent with the self-duality of the 3-form field strength on the brane. While the self-duality condition in the bosonic case needs to be imposed externally as a constraint at zero temperature, we find an equilibrium effective action for the bosonic M5 brane at finite temperature that inherently implements self-duality.

Submitted 20 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: 36 pages. v2: minor text improvements

arXiv:2402.17101 [pdf, other]

T-HITL Effectively Addresses Problematic Associations in Image Generation and Maintains Overall Visual Quality

Authors: Susan Epstein, Li Chen, Alessandro Vecchiato, Ankit Jain

Abstract: Generative AI image models may inadvertently generate problematic representations of people. Past research has noted that millions of users engage daily across the world with these models and that the models, including through problematic representations of people, have the potential to compound and accelerate real-world discrimination and other harms (Bianchi et al, 2023). In this paper, we focus on addressing the generation of problematic associations between demographic groups and semantic concepts that may reflect and reinforce negative narratives embedded in social data. Building on sociological literature (Blumer, 1958) and mapping representations to model behaviors, we have developed a taxonomy to study problematic associations in image generation models. We explore the effectiveness of fine tuning at the model level as a method to address these associations, identifying a potential reduction in visual quality as a limitation of traditional fine tuning. We also propose a new methodology with twice-human-in-the-loop (T-HITL) that promises improvements in both reducing problematic associations and also maintaining visual quality. We demonstrate the effectiveness of T-HITL by providing evidence of three problematic associations addressed by T-HITL at the model level. Our contributions to scholarship are two-fold. By defining problematic associations in the context of machine learning models and generative AI, we introduce a conceptual and technical taxonomy for addressing some of these associations. Finally, we provide a method, T-HITL, that addresses these associations and simultaneously maintains visual quality of image model generations. This mitigation need not be a tradeoff, but rather an enhancement.

Submitted 26 February, 2024; originally announced February 2024.

Comments: 11 pages, 8 figures

MSC Class: I.I.2 ACM Class: I.2.1

arXiv:2402.13363 [pdf, other]

Low frequency resistance fluctuations in an ionic liquid gated channel probed by cross-correlation noise spectroscopy

Authors: Bikash C. Barik, Himadri Chakraborti, Aditya K. Jain, Buddhadeb Pal, H. E. Beere, D. A. Ritchie, K. Das Gupta

Abstract: A system in equilibrium keeps "exploring" nearby states in the phase space and, consequently, fluctuations can contain information that the mean value does not. However, such measurements involve a fairly complex interplay of effects arising in the device and measurement electronics that are non-trivial to disentangle. In this paper, we briefly analyse some of these issues and show the relevance of a two-amplifier cross-correlation technique for semiconductors and thin films commonly encountered. We show that by using home-built amplifiers costing less than $10$ USD/piece one can measure spectral densities as low as $\sim 10^{-18}-10^{-19}~ {\rm {V^2}{Hz^{-1}}}$. We apply this method to an ionic liquid gated Ga:ZnO channel and show that the glass transition of the ionic liquid brings about a change in the exponent of the low frequency resistance fluctuations. Our analysis suggests that a log-normal distribution of the Debye relaxation times of the fluctuations and an increased weight of the long timescale relaxations can give a semi-quantitative explanation of the observed change in the exponent.

Submitted 20 February, 2024; originally announced February 2024.

Comments: 6 pages, 5 figures, supplementary material attached. Comments are welcome

arXiv:2402.11911 [pdf, other]

Analyzing the Impact of Design Factors on Solar Module Thermomechanical Durability Using Interpretable Machine Learning Techniques

Authors: Xin Chen, Todd Karin, Anubhav Jain

Abstract: Solar modules in utility-scale systems are expected to maintain decades of lifetime to rival conventional energy sources. However, cyclic thermomechanical loading often degrades their long-term performance, highlighting the importance of effective design to mitigate thermal expansion mismatches between module materials. Given the complex composition of solar modules, isolating the impact of individual components on overall durability remains a challenging task. In this work, we analyze a comprehensive data set that comprises bill-of-materials (BOM) and thermal cycling power loss from 251 distinct module designs to identify the predominant design factors and their impacts on the thermomechanical durability of modules. The methodology of our analysis combines machine learning modeling (random forest) and Shapley additive explanation (SHAP) to correlate design factors with power loss and interpret the model's decision-making. The interpretation reveals that silicon type (monocrystalline or polycrystalline), encapsulant thickness, busbar numbers, and wafer thickness predominantly influence the degradation. With lower power loss of around 0.6% on average in the SHAP analysis, monocrystalline cells present better durability than polycrystalline cells. This finding is further substantiated by statistical testing on our raw data set. The SHAP analysis also demonstrates that while thicker encapsulants lead to reduced power loss, further increasing their thickness over around 0.6 to 0.7 mm does not yield additional benefits, particularly for the front-side encapsulant. In addition, other important BOM features such as the number of busbars are analyzed. This study provides a blueprint for utilizing explainable machine learning techniques in a complex material system and can potentially guide future research on optimizing the design of solar modules.

Submitted 12 May, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.11897 [pdf]

Enhancing Power Prediction of Photovoltaic Systems: Leveraging Dynamic Physical Model for Irradiance-to-Power Conversion

Authors: Baojie Li, Xin Chen, Anubhav Jain

Abstract: Power prediction is crucial to the efficiency and reliability of Photovoltaic (PV) systems. For the model-chain-based (also named indirect or physical) power prediction, the conversion of ground environmental data (plane-of-array irradiance and module temperature) to the output power is a fundamental step, commonly accomplished through physical modeling. The core of the physical model lies in the parameters. However, traditional parameter estimation either relies on datasheet information that cannot reflect the system's current health status or necessitates additional I-V characterization of the entire array, which is not commonly available. To address this, our paper introduces PVPro, a dynamic physical modeling method for irradiance-to-power conversion. It extracts model parameters from the recent production data without requiring I-V curve measurements. This dynamic model, periodically updated (as often as daily), can closely capture the actual health status, enabling precise power estimation. To evaluate the performance, PVPro is compared with the smart persistence, nominal physical, and various machine learning models for day-ahead power prediction. The results indicate that PVPro achieves an outstanding power estimation performance with the average nMAE = 1.4% across four field PV systems, reducing the error by 17.6% compared to the best of other techniques. Furthermore, PVPro demonstrates robustness across different seasons and weather conditions. More importantly, PVPro can also perform well with a limited amount of historical production data (3 days), rendering it applicable for new PV systems. The tool is available as a Python package at: https://github.com/DuraMAT/pvpro.

Submitted 19 February, 2024; originally announced February 2024.

Showing 1–50 of 717 results for author: Jain, A