Search | arXiv e-print repository

doi 10.1109/CEC53210.2023.10254162

A Linear Programming Enhanced Genetic Algorithm for Hyperparameter Tuning in Machine Learning

Abstract: In this paper, we formulate the hyperparameter tuning problem in machine learning as a bilevel program. The bilevel program is solved using a micro genetic algorithm that is enhanced with a linear program. While the genetic algorithm searches over discrete hyperparameters, the linear program enhancement allows hyper local search over continuous hyperparameters. The major contribution in this paper… ▽ More In this paper, we formulate the hyperparameter tuning problem in machine learning as a bilevel program. The bilevel program is solved using a micro genetic algorithm that is enhanced with a linear program. While the genetic algorithm searches over discrete hyperparameters, the linear program enhancement allows hyper local search over continuous hyperparameters. The major contribution in this paper is the formulation of a linear program that supports fast search over continuous hyperparameters, and can be integrated with any hyperparameter search technique. It can also be applied directly on any trained machine learning or deep learning model for the purpose of fine-tuning. We test the performance of the proposed approach on two datasets, MNIST and CIFAR-10. Our results clearly demonstrate that using the linear program enhancement offers significant promise when incorporated with any population-based approach for hyperparameter tuning. △ Less

Submitted 30 June, 2024; originally announced July 2024.

Comments: 8 pages; https://ieeexplore.ieee.org/document/10254162/

MSC Class: 49M37 ACM Class: I.2.6

arXiv:2406.17104 [pdf, other]

Automated Adversarial Discovery for Safety Classifiers

Authors: Yash Kumar Lal, Preethi Lahoti, Aradhana Sinha, Yao Qin, Ananth Balashankar

Abstract: Safety classifiers are critical in mitigating toxicity on online forums such as social media and in chatbots. Still, they continue to be vulnerable to emergent, and often innumerable, adversarial attacks. Traditional automated adversarial data generation methods, however, tend to produce attacks that are not diverse, but variations of previously observed harm types. We formalize the task of automa… ▽ More Safety classifiers are critical in mitigating toxicity on online forums such as social media and in chatbots. Still, they continue to be vulnerable to emergent, and often innumerable, adversarial attacks. Traditional automated adversarial data generation methods, however, tend to produce attacks that are not diverse, but variations of previously observed harm types. We formalize the task of automated adversarial discovery for safety classifiers - to find new attacks along previously unseen harm dimensions that expose new weaknesses in the classifier. We measure progress on this task along two key axes (1) adversarial success: does the attack fool the classifier? and (2) dimensional diversity: does the attack represent a previously unseen harm type? Our evaluation of existing attack generation methods on the CivilComments toxicity task reveals their limitations: Word perturbation attacks fail to fool classifiers, while prompt-based LLM attacks have more adversarial success, but lack dimensional diversity. Even our best-performing prompt-based method finds new successful attacks on unseen harm dimensions of attacks only 5\% of the time. Automatically finding new harmful dimensions of attack is crucial and there is substantial headroom for future research on our new task. △ Less

Submitted 24 June, 2024; originally announced June 2024.

Comments: Published at Fourth Workshop on TrustworthyNLP (TrustNLP) at NAACL 2024

arXiv:2406.16135 [pdf, other]

Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models

Authors: Lynn Chua, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chulin Xie, Chiyuan Zhang

Abstract: Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora. But can these models relate corresponding concepts across languages, effectively being crosslingual? This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks. We observe that while these models show promising surface-level crosslingual abilities on machine translation… ▽ More Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora. But can these models relate corresponding concepts across languages, effectively being crosslingual? This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks. We observe that while these models show promising surface-level crosslingual abilities on machine translation and embedding space analyses, they struggle with deeper crosslingual knowledge transfer, revealing a crosslingual knowledge barrier in both general (MMLU benchmark) and domain-specific (Harry Potter quiz) contexts. We observe that simple inference-time mitigation methods offer only limited improvement. On the other hand, we propose fine-tuning of LLMs on mixed-language data, which effectively reduces these gaps, even when using out-of-domain datasets like WikiText. Our findings suggest the need for explicit optimization to unlock the full crosslingual potential of LLMs. Our code is publicly available at https://github.com/google-research/crosslingual-knowledge-barriers. △ Less

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.14322 [pdf, other]

Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning

Authors: Lynn Chua, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Daogao Liu, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang

Abstract: Large language models (LLMs) have emerged as powerful tools for tackling complex tasks across diverse domains, but they also raise privacy concerns when fine-tuned on sensitive data due to potential memorization. While differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit, current evaluations on LLMs most… ▽ More Large language models (LLMs) have emerged as powerful tools for tackling complex tasks across diverse domains, but they also raise privacy concerns when fine-tuned on sensitive data due to potential memorization. While differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit, current evaluations on LLMs mostly treat each example (text record) as the privacy unit. This leads to uneven user privacy guarantees when contributions per user vary. We therefore study user-level DP motivated by applications where it necessary to ensure uniform privacy protection across users. We present a systematic evaluation of user-level DP for LLM fine-tuning on natural language generation tasks. Focusing on two mechanisms for achieving user-level DP guarantees, Group Privacy and User-wise DP-SGD, we investigate design choices like data selection strategies and parameter tuning for the best privacy-utility tradeoff. △ Less

Submitted 3 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.09760 [pdf, other]

Bootstrap** Language Models with DPO Implicit Rewards

Authors: Changyu Chen, Zichen Liu, Chao Du, Tianyu Pang, Qian Liu, Arunesh Sinha, Pradeep Varakantham, Min Lin

Abstract: Human alignment in large language models (LLMs) is an active area of research. A recent groundbreaking work, direct preference optimization (DPO), has greatly simplified the process from past work in reinforcement learning from human feedback (RLHF) by bypassing the reward learning stage in RLHF. DPO, after training, provides an implicit reward model. In this work, we make a novel observation that… ▽ More Human alignment in large language models (LLMs) is an active area of research. A recent groundbreaking work, direct preference optimization (DPO), has greatly simplified the process from past work in reinforcement learning from human feedback (RLHF) by bypassing the reward learning stage in RLHF. DPO, after training, provides an implicit reward model. In this work, we make a novel observation that this implicit reward model can by itself be used in a bootstrap** fashion to further align the LLM. Our approach is to use the rewards from a current LLM model to construct a preference dataset, which is then used in subsequent DPO rounds. We incorporate refinements that debias the length of the responses and improve the quality of the preference dataset to further improve our approach. Our approach, named self-alignment with DPO ImpliCit rEwards (DICE), shows great improvements in alignment and achieves superior performance than Gemini Pro on AlpacaEval 2, reaching 27.55% length-controlled win rate against GPT-4 Turbo, but with only 8B parameters and no external feedback. Our code is available at https://github.com/sail-sg/dice. △ Less

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.09390 [pdf, other]

LLAVIDAL: Benchmarking Large Language Vision Models for Daily Activities of Living

Authors: Rajatsubhra Chakraborty, Arkaprava Sinha, Dominick Reilly, Manish Kumar Govind, Pu Wang, Francois Bremond, Srijan Das

Abstract: Large Language Vision Models (LLVMs) have demonstrated effectiveness in processing internet videos, yet they struggle with the visually perplexing dynamics present in Activities of Daily Living (ADL) due to limited pertinent datasets and models tailored to relevant cues. To this end, we propose a framework for curating ADL multiview datasets to fine-tune LLVMs, resulting in the creation of ADL-X,… ▽ More Large Language Vision Models (LLVMs) have demonstrated effectiveness in processing internet videos, yet they struggle with the visually perplexing dynamics present in Activities of Daily Living (ADL) due to limited pertinent datasets and models tailored to relevant cues. To this end, we propose a framework for curating ADL multiview datasets to fine-tune LLVMs, resulting in the creation of ADL-X, comprising 100K RGB video-instruction pairs, language descriptions, 3D skeletons, and action-conditioned object trajectories. We introduce LLAVIDAL, an LLVM capable of incorporating 3D poses and relevant object trajectories to understand the intricate spatiotemporal relationships within ADLs. Furthermore, we present a novel benchmark, ADLMCQ, for quantifying LLVM effectiveness in ADL scenarios. When trained on ADL-X, LLAVIDAL consistently achieves state-of-the-art performance across all ADL evaluation metrics. Qualitative analysis reveals LLAVIDAL's temporal reasoning capabilities in understanding ADL. The link to the dataset is provided at: https://adl-x.github.io/ △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.06967 [pdf, other]

Dual Thinking and Perceptual Analysis of Deep Learning Models using Human Adversarial Examples

Authors: Kailas Dayanandan, Anand Sinha, Brejesh Lall

Abstract: The dual thinking framework considers fast, intuitive processing and slower, logical processing. The perception of dual thinking in vision requires images where inferences from intuitive and logical processing differ. We introduce an adversarial dataset to provide evidence for the dual thinking framework in human vision, which also aids in studying the qualitative behavior of deep learning models.… ▽ More The dual thinking framework considers fast, intuitive processing and slower, logical processing. The perception of dual thinking in vision requires images where inferences from intuitive and logical processing differ. We introduce an adversarial dataset to provide evidence for the dual thinking framework in human vision, which also aids in studying the qualitative behavior of deep learning models. Our study also addresses a major criticism of using classification models as computational models of human vision by using instance segmentation models that localize objects. The evidence underscores the importance of shape in identifying instances in human vision and shows that deep learning models lack an understanding of sub-structures, as indicated by errors related to the position and number of sub-components. Additionally, the similarity in errors made by models and intuitive human processing indicates that models only address intuitive thinking in human vision. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.04724 [pdf, other]

Probabilistic Perspectives on Error Minimization in Adversarial Reinforcement Learning

Authors: Roman Belaire, Arunesh Sinha, Pradeep Varakantham

Abstract: Deep Reinforcement Learning (DRL) policies are critically vulnerable to adversarial noise in observations, posing severe risks in safety-critical scenarios. For example, a self-driving car receiving manipulated sensory inputs about traffic signs could lead to catastrophic outcomes. Existing strategies to fortify RL algorithms against such adversarial perturbations generally fall into two categorie… ▽ More Deep Reinforcement Learning (DRL) policies are critically vulnerable to adversarial noise in observations, posing severe risks in safety-critical scenarios. For example, a self-driving car receiving manipulated sensory inputs about traffic signs could lead to catastrophic outcomes. Existing strategies to fortify RL algorithms against such adversarial perturbations generally fall into two categories: (a) using regularization methods that enhance robustness by incorporating adversarial loss terms into the value objectives, and (b) adopting "maximin" principles, which focus on maximizing the minimum value to ensure robustness. While regularization methods reduce the likelihood of successful attacks, their effectiveness drops significantly if an attack does succeed. On the other hand, maximin objectives, although robust, tend to be overly conservative. To address this challenge, we introduce a novel objective called Adversarial Counterfactual Error (ACoE), which naturally balances optimizing value and robustness against adversarial attacks. To optimize ACoE in a scalable manner in model-free settings, we propose a theoretically justified surrogate objective known as Cumulative-ACoE (C-ACoE). The core idea of optimizing C-ACoE is utilizing the belief about the underlying true state given the adversarially perturbed observation. Our empirical evaluations demonstrate that our method outperforms current state-of-the-art approaches for addressing adversarial RL problems across all established benchmarks (MuJoCo, Atari, and Highway) used in the literature. △ Less

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.03253 [pdf, other]

Generating Explanations for Cellular Neural Networks

Authors: Akshit Sinha, Sreeram Vennam, Charu Sharma, Ponnurangam Kumaraguru

Abstract: Recent advancements in graph learning contributed to explaining predictions generated by Graph Neural Networks. However, existing methodologies often fall short when applied to real-world datasets. We introduce HOGE, a framework to capture higher-order structures using cell complexes, which excel at modeling higher-order relationships. In the real world, higher-order structures are ubiquitous like… ▽ More Recent advancements in graph learning contributed to explaining predictions generated by Graph Neural Networks. However, existing methodologies often fall short when applied to real-world datasets. We introduce HOGE, a framework to capture higher-order structures using cell complexes, which excel at modeling higher-order relationships. In the real world, higher-order structures are ubiquitous like in molecules or social networks, thus our work significantly enhances the practical applicability of graph explanations. HOGE produces clearer and more accurate explanations compared to prior methods. Our method can be integrated with all existing graph explainers, ensuring seamless integration into current frameworks. We evaluate on GraphXAI benchmark datasets, HOGE achieves improved or comparable performance with minimal computational overhead. Ablation studies show that the performance gain observed can be attributed to the higher-order structures that come from introducing cell complexes. △ Less

Submitted 5 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

ACM Class: I.2.4

arXiv:2405.18506 [pdf, other]

An Algorithm for the Decomposition of Complete Graph into Minimum Number of Edge-disjoint Trees

Authors: Antika Sinha, Sanjoy Kumar Saha, Partha Basuchowdhuri

Abstract: In this work, we study methodical decomposition of an undirected, unweighted complete graph ($K_n$ of order $n$, size $m$) into minimum number of edge-disjoint trees. We find that $x$, a positive integer, is minimum and $x=\lceil\frac{n}{2}\rceil$ as the edge set of $K_n$ is decomposed into edge-disjoint trees of size sequence $M = \{m_1,m_2,...,m_x\}$ where $m_i\le(n-1)$ and $Σ_{i=1}^{x} m_i$ =… ▽ More In this work, we study methodical decomposition of an undirected, unweighted complete graph ($K_n$ of order $n$, size $m$) into minimum number of edge-disjoint trees. We find that $x$, a positive integer, is minimum and $x=\lceil\frac{n}{2}\rceil$ as the edge set of $K_n$ is decomposed into edge-disjoint trees of size sequence $M = \{m_1,m_2,...,m_x\}$ where $m_i\le(n-1)$ and $Σ_{i=1}^{x} m_i$ = $\frac{n(n-1)}{2}$. For decomposing the edge set of $K_n$ into minimum number of edge-disjoint trees, our proposed algorithm takes total $O(m)$ time. △ Less

Submitted 28 May, 2024; originally announced May 2024.

Comments: 10 pages, 4 figures and 3 tables

arXiv:2405.17533 [pdf, other]

PAE: LLM-based Product Attribute Extraction for E-Commerce Fashion Trends

Authors: Apurva Sinha, Ekta Gujral

Abstract: Product attribute extraction is an growing field in e-commerce business, with several applications including product ranking, product recommendation, future assortment planning and improving online shop** customer experiences. Understanding the customer needs is critical part of online business, specifically fashion products. Retailers uses assortment planning to determine the mix of products to… ▽ More Product attribute extraction is an growing field in e-commerce business, with several applications including product ranking, product recommendation, future assortment planning and improving online shop** customer experiences. Understanding the customer needs is critical part of online business, specifically fashion products. Retailers uses assortment planning to determine the mix of products to offer in each store and channel, stay responsive to market dynamics and to manage inventory and catalogs. The goal is to offer the right styles, in the right sizes and colors, through the right channels. When shoppers find products that meet their needs and desires, they are more likely to return for future purchases, fostering customer loyalty. Product attributes are a key factor in assortment planning. In this paper we present PAE, a product attribute extraction algorithm for future trend reports consisting text and images in PDF format. Most existing methods focus on attribute extraction from titles or product descriptions or utilize visual information from existing product images. Compared to the prior works, our work focuses on attribute extraction from PDF files where upcoming fashion trends are explained. This work proposes a more comprehensive framework that fully utilizes the different modalities for attribute extraction and help retailers to plan the assortment in advance. Our contributions are three-fold: (a) We develop PAE, an efficient framework to extract attributes from unstructured data (text and images); (b) We provide catalog matching methodology based on BERT representations to discover the existing attributes using upcoming attribute values; (c) We conduct extensive experiments with several baselines and show that PAE is an effective, flexible and on par or superior (avg 92.5% F1-Score) framework to existing state-of-the-art for attribute value extraction task. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: Attribute Extraction, PDF files, Bert Embedding, Hashtag, Large Language Model (LLM), Text and Images

arXiv:2405.13081 [pdf]

Children's Mental Models of Generative Visual and Text Based AI Models

Authors: Eliza Kosoy, Soo** Jeong, Anoop Sinha, Alison Gopnik, Tanya Kraljic

Abstract: In this work we investigate how children ages 5-12 perceive, understand, and use generative AI models such as a text-based LLMs ChatGPT and a visual-based model DALL-E. Generative AI is newly being used widely since chatGPT. Children are also building mental models of generative AI. Those haven't been studied before and it is also the case that the children's models are dynamic as they use the too… ▽ More In this work we investigate how children ages 5-12 perceive, understand, and use generative AI models such as a text-based LLMs ChatGPT and a visual-based model DALL-E. Generative AI is newly being used widely since chatGPT. Children are also building mental models of generative AI. Those haven't been studied before and it is also the case that the children's models are dynamic as they use the tools, even with just very short usage. Upon surveying and experimentally observing over 40 children ages 5-12, we found that children generally have a very positive outlook towards AI and are excited about the ways AI may benefit and aid them in their everyday lives. In a forced choice, children robustly associated AI with positive adjectives versus negative ones. We also categorize what children are querying AI models for and find that children search for more imaginative things that don't exist when using a visual-based AI and not when using a text-based one. Our follow-up study monitored children's responses and feelings towards AI before and after interacting with GenAI models. We even find that children find AI to be less scary after interacting with it. We hope that these findings will shine a light on children's mental models of AI and provide insight for how to design the best possible tools for children who will inevitably be using AI in their lifetimes. The motivation of this work is to bridge the gap between Human-Computer Interaction (HCI) and Psychology in an effort to study the effects of AI on society. We aim to identify the gaps in humans' mental models of what AI is and how it works. Previous work has investigated how both adults and children perceive various kinds of robots, computers, and other technological concepts. However, there is very little work investigating these concepts for generative AI models and not simply embodied robots or physical technology. △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: 7 pages, 6 figures

arXiv:2405.09296 [pdf, ps, other]

Tight Bounds for Online Convex Optimization with Adversarial Constraints

Authors: Abhishek Sinha, Rahul Vaze

Abstract: A well-studied generalization of the standard online convex optimization (OCO) is constrained online convex optimization (COCO). In COCO, on every round, a convex cost function and a convex constraint function are revealed to the learner after the action for that round is chosen. The objective is to design an online policy that simultaneously achieves a small regret while ensuring small cumulative… ▽ More A well-studied generalization of the standard online convex optimization (OCO) is constrained online convex optimization (COCO). In COCO, on every round, a convex cost function and a convex constraint function are revealed to the learner after the action for that round is chosen. The objective is to design an online policy that simultaneously achieves a small regret while ensuring small cumulative constraint violation (CCV) against an adaptive adversary. A long-standing open question in COCO is whether an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $O(\sqrt{T})$ CCV without any restrictive assumptions. For the first time, we answer this in the affirmative and show that an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ CCV. We establish this result by effectively combining the adaptive regret bound of the AdaGrad algorithm with Lyapunov optimization - a classic tool from control theory. Surprisingly, the analysis is short and elegant. △ Less

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.02705 [pdf, other]

Peak Age of Information under Tandem of Queues

Authors: Ashirwad Sinha, Shubhransh Singhvi, Praful D. Mankar, Harpreet S. Dhillon

Abstract: This paper considers a communication system where a source sends time-sensitive information to its destination via queues in tandem. We assume that the arrival process as well as the service process (of each server) are memoryless, and each of the servers has no buffer. For this setup, we develop a recursive framework to characterize the mean peak age of information (PAoI) under preemptive and non… ▽ More This paper considers a communication system where a source sends time-sensitive information to its destination via queues in tandem. We assume that the arrival process as well as the service process (of each server) are memoryless, and each of the servers has no buffer. For this setup, we develop a recursive framework to characterize the mean peak age of information (PAoI) under preemptive and non-preemptive policies with $N$ servers having different service rates. For the preemptive case, the proposed framework also allows to obtain mean age of information (AoI). △ Less

Submitted 4 May, 2024; originally announced May 2024.

Comments: Accepted at IEEE ISIT'24

arXiv:2404.16312 [pdf, other]

3D Guidance Law for Maximal Coverage and Target Enclosing with Inherent Safety

Authors: Praveen Kumar Ranjan, Abhinav Sinha, Yongcan Cao

Abstract: In this paper, we address the problem of enclosing an arbitrarily moving target in three dimensions by a single pursuer, which is an unmanned aerial vehicle (UAV), for maximum coverage while also ensuring the pursuer's safety by preventing collisions with the target. The proposed guidance strategy steers the pursuer to a safe region of space surrounding the target, allowing it to maintain a certai… ▽ More In this paper, we address the problem of enclosing an arbitrarily moving target in three dimensions by a single pursuer, which is an unmanned aerial vehicle (UAV), for maximum coverage while also ensuring the pursuer's safety by preventing collisions with the target. The proposed guidance strategy steers the pursuer to a safe region of space surrounding the target, allowing it to maintain a certain distance from the latter while offering greater flexibility in positioning and converging to any orbit within this safe zone. Our approach is distinguished by the use of nonholonomic constraints to model vehicles with accelerations serving as control inputs and coupled engagement kinematics to craft the pursuer's guidance law meticulously. Furthermore, we leverage the concept of the Lyapunov Barrier Function as a powerful tool to constrain the distance between the pursuer and the target within asymmetric bounds, thereby ensuring the pursuer's safety within the predefined region. To validate the efficacy and robustness of our algorithm, we conduct experimental tests by implementing a high-fidelity quadrotor model within Software-in-the-loop (SITL) simulations, encompassing various challenging target maneuver scenarios. The results obtained showcase the resilience of the proposed guidance law, effectively handling arbitrarily maneuvering targets, vehicle/autopilot dynamics, and external disturbances. Our method consistently delivers stable global enclosing behaviors, even in response to aggressive target maneuvers, and requires only relative information for successful execution. △ Less

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.13252 [pdf, other]

3D-Convolution Guided Spectral-Spatial Transformer for Hyperspectral Image Classification

Authors: Shyam Varahagiri, Aryaman Sinha, Shiv Ram Dubey, Satish Kumar Singh

Abstract: In recent years, Vision Transformers (ViTs) have shown promising classification performance over Convolutional Neural Networks (CNNs) due to their self-attention mechanism. Many researchers have incorporated ViTs for Hyperspectral Image (HSI) classification. HSIs are characterised by narrow contiguous spectral bands, providing rich spectral data. Although ViTs excel with sequential data, they cann… ▽ More In recent years, Vision Transformers (ViTs) have shown promising classification performance over Convolutional Neural Networks (CNNs) due to their self-attention mechanism. Many researchers have incorporated ViTs for Hyperspectral Image (HSI) classification. HSIs are characterised by narrow contiguous spectral bands, providing rich spectral data. Although ViTs excel with sequential data, they cannot extract spectral-spatial information like CNNs. Furthermore, to have high classification performance, there should be a strong interaction between the HSI token and the class (CLS) token. To solve these issues, we propose a 3D-Convolution guided Spectral-Spatial Transformer (3D-ConvSST) for HSI classification that utilizes a 3D-Convolution Guided Residual Module (CGRM) in-between encoders to "fuse" the local spatial and spectral information and to enhance the feature propagation. Furthermore, we forego the class token and instead apply Global Average Pooling, which effectively encodes more discriminative and pertinent high-level features for classification. Extensive experiments have been conducted on three public HSI datasets to show the superiority of the proposed model over state-of-the-art traditional, convolutional, and Transformer models. The code is available at https://github.com/ShyamVarahagiri/3D-ConvSST. △ Less

Submitted 19 April, 2024; originally announced April 2024.

Comments: Accepted in IEEE Conference on Artificial Intelligence, 2024

arXiv:2404.09473 [pdf, other]

Exploring the Nexus Between Retrievability and Query Generation Strategies

Authors: Aman Sinha, Priyanshu Raj Mall, Dwaipayan Roy

Abstract: Quantifying bias in retrieval functions through document retrievability scores is vital for assessing recall-oriented retrieval systems. However, many studies investigating retrieval model bias lack validation of their query generation methods as accurate representations of retrievability for real users and their queries. This limitation results from the absence of established criteria for query g… ▽ More Quantifying bias in retrieval functions through document retrievability scores is vital for assessing recall-oriented retrieval systems. However, many studies investigating retrieval model bias lack validation of their query generation methods as accurate representations of retrievability for real users and their queries. This limitation results from the absence of established criteria for query generation in retrievability assessments. Typically, researchers resort to using frequent collocations from document corpora when no query log is available. In this study, we address the issue of reproducibility and seek to validate query generation methods by comparing retrievability scores generated from artificially generated queries to those derived from query logs. Our findings demonstrate a minimal or negligible correlation between retrievability scores from artificial queries and those from query logs. This suggests that artificially generated queries may not accurately reflect retrievability scores as derived from query logs. We further explore alternative query generation techniques, uncovering a variation that exhibits the highest correlation. This alternative approach holds promise for improving reproducibility when query logs are unavailable. △ Less

Submitted 15 April, 2024; originally announced April 2024.

Comments: Accepted at ECIR 2024

arXiv:2404.05938 [pdf, other]

Neural networks can be FLOP-efficient integrators of 1D oscillatory integrands

Authors: Anshuman Sinha, Spencer H. Bryngelson

Abstract: We demonstrate that neural networks can be FLOP-efficient integrators of one-dimensional oscillatory integrands. We train a feed-forward neural network to compute integrals of highly oscillatory 1D functions. The training set is a parametric combination of functions with varying characters and oscillatory behavior degrees. Numerical examples show that these networks are FLOP-efficient for sufficie… ▽ More We demonstrate that neural networks can be FLOP-efficient integrators of one-dimensional oscillatory integrands. We train a feed-forward neural network to compute integrals of highly oscillatory 1D functions. The training set is a parametric combination of functions with varying characters and oscillatory behavior degrees. Numerical examples show that these networks are FLOP-efficient for sufficiently oscillatory integrands with an average FLOP gain of 1000 FLOPs. The network calculates oscillatory integrals better than traditional quadrature methods under the same computational budget or number of floating point operations. We find that feed-forward networks of 5 hidden layers are satisfactory for a relative accuracy of 0.001. The computational burden of inference of the neural network is relatively small, even compared to inner-product pattern quadrature rules. We postulate that our result follows from learning latent patterns in the oscillatory integrands that are otherwise opaque to traditional numerical integrators. △ Less

Submitted 8 April, 2024; originally announced April 2024.

Comments: 11 pages, 7 figures, 3 tables. Published in TMLR 03/2024. Code at https://github.com/comp-physics/deepOscillations

Journal ref: Transactions on Machine Learning Research; ISSN 2835-8856 (2024); https://openreview.net/forum?id=z9SIj-IM7tn

arXiv:2404.04497 [pdf, other]

Self-organizing Multiagent Target Enclosing under Limited Information and Safety Guarantees

Authors: Praveen Kumar Ranjan, Abhinav Sinha, Yongcan Cao

Abstract: This paper introduces an approach to address the target enclosing problem using non-holonomic multiagent systems, where agents autonomously self-organize themselves in the desired formation around a fixed target. Our approach combines global enclosing behavior and local collision avoidance mechanisms by devising a novel potential function and sliding manifold. In our approach, agents independently… ▽ More This paper introduces an approach to address the target enclosing problem using non-holonomic multiagent systems, where agents autonomously self-organize themselves in the desired formation around a fixed target. Our approach combines global enclosing behavior and local collision avoidance mechanisms by devising a novel potential function and sliding manifold. In our approach, agents independently move toward the desired enclosing geometry when apart and activate the collision avoidance mechanism when a collision is imminent, thereby guaranteeing inter-agent safety. We rigorously show that an agent does not need to ensure safety with every other agent and put forth a concept of the nearest colliding agent (for any arbitrary agent) with whom ensuring safety is sufficient to avoid collisions in the entire swarm. The proposed control eliminates the need for a fixed or pre-established agent arrangement around the target and requires only relative information between an agent and the target. This makes our design particularly appealing for scenarios with limited global information, hence significantly reducing communication requirements. We finally present simulation results to vindicate the efficacy of the proposed method. △ Less

Submitted 6 April, 2024; originally announced April 2024.

arXiv:2404.00618 [pdf, other]

A Multi-Branched Radial Basis Network Approach to Predicting Complex Chaotic Behaviours

Authors: Aarush Sinha

Abstract: In this study, we propose a multi branched network approach to predict the dynamics of a physics attractor characterized by intricate and chaotic behavior. We introduce a unique neural network architecture comprised of Radial Basis Function (RBF) layers combined with an attention mechanism designed to effectively capture nonlinear inter-dependencies inherent in the attractor's temporal evolution.… ▽ More In this study, we propose a multi branched network approach to predict the dynamics of a physics attractor characterized by intricate and chaotic behavior. We introduce a unique neural network architecture comprised of Radial Basis Function (RBF) layers combined with an attention mechanism designed to effectively capture nonlinear inter-dependencies inherent in the attractor's temporal evolution. Our results demonstrate successful prediction of the attractor's trajectory across 100 predictions made using a real-world dataset of 36,700 time-series observations encompassing approximately 28 minutes of activity. To further illustrate the performance of our proposed technique, we provide comprehensive visualizations depicting the attractor's original and predicted behaviors alongside quantitative measures comparing observed versus estimated outcomes. Overall, this work showcases the potential of advanced machine learning algorithms in elucidating hidden structures in complex physical systems while offering practical applications in various domains requiring accurate short-term forecasting capabilities. △ Less

Submitted 30 May, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

Comments: 9 pages, 6 figures

arXiv:2403.17673 [pdf, other]

How Private are DP-SGD Implementations?

Authors: Lynn Chua, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang

Abstract: We demonstrate a substantial gap between the privacy guarantees of the Adaptive Batch Linear Queries (ABLQ) mechanism under different types of batch sampling: (i) Shuffling, and (ii) Poisson subsampling; the typical analysis of Differentially Private Stochastic Gradient Descent (DP-SGD) follows by interpreting it as a post-processing of ABLQ. While shuffling-based DP-SGD is more commonly used in p… ▽ More We demonstrate a substantial gap between the privacy guarantees of the Adaptive Batch Linear Queries (ABLQ) mechanism under different types of batch sampling: (i) Shuffling, and (ii) Poisson subsampling; the typical analysis of Differentially Private Stochastic Gradient Descent (DP-SGD) follows by interpreting it as a post-processing of ABLQ. While shuffling-based DP-SGD is more commonly used in practical implementations, it has not been amenable to easy privacy analysis, either analytically or even numerically. On the other hand, Poisson subsampling-based DP-SGD is challenging to scalably implement, but has a well-understood privacy analysis, with multiple open-source numerically tight privacy accountants available. This has led to a common practice of using shuffling-based DP-SGD in practice, but using the privacy analysis for the corresponding Poisson subsampling version. Our result shows that there can be a substantial gap between the privacy analysis when using the two types of batch sampling, and thus advises caution in reporting privacy parameters for DP-SGD. △ Less

Submitted 6 June, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

Comments: Proceedings of ICML 2024

arXiv:2403.08974 [pdf, other]

$TrIND$: Representing Anatomical Trees by Denoising Diffusion of Implicit Neural Fields

Authors: Ashish Sinha, Ghassan Hamarneh

Abstract: Anatomical trees play a central role in clinical diagnosis and treatment planning. However, accurately representing anatomical trees is challenging due to their varying and complex topology and geometry. Traditional methods for representing tree structures, captured using medical imaging, while invaluable for visualizing vascular and bronchial networks, exhibit drawbacks in terms of limited resolu… ▽ More Anatomical trees play a central role in clinical diagnosis and treatment planning. However, accurately representing anatomical trees is challenging due to their varying and complex topology and geometry. Traditional methods for representing tree structures, captured using medical imaging, while invaluable for visualizing vascular and bronchial networks, exhibit drawbacks in terms of limited resolution, flexibility, and efficiency. Recently, implicit neural representations (INRs) have emerged as a powerful tool for representing shapes accurately and efficiently. We propose a novel approach, $TrIND$, for representing anatomical trees using INR, while also capturing the distribution of a set of trees via denoising diffusion in the space of INRs. We accurately capture the intricate geometries and topologies of anatomical trees at any desired resolution. Through extensive qualitative and quantitative evaluation, we demonstrate high-fidelity tree reconstruction with arbitrary resolution yet compact storage, and versatility across anatomical sites and tree complexities. The code is available at: \texttt{\url{https://github.com/sinashish/TreeDiffusion}}. △ Less

Submitted 18 June, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

Comments: Accepted to MICCAI 2024

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content. △ Less

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2402.19292 [pdf, other]

Fundamental Limits of Throughput and Availability: Applications to prophet inequalities & transaction fee mechanism design

Authors: Aadityan Ganesh, Jason Hartline, Atanu R Sinha, Matthew vonAllmen

Abstract: This paper studies the fundamental limits of availability and throughput for independent and heterogeneous demands of a limited resource. Availability is the probability that the demands are below the capacity of the resource. Throughput is the expected fraction of the resource that is utilized by the demands. We offer a concentration inequality generator that gives lower bounds on feasible availa… ▽ More This paper studies the fundamental limits of availability and throughput for independent and heterogeneous demands of a limited resource. Availability is the probability that the demands are below the capacity of the resource. Throughput is the expected fraction of the resource that is utilized by the demands. We offer a concentration inequality generator that gives lower bounds on feasible availability and throughput pairs with a given capacity and independent but not necessarily identical distributions of up-to-unit demands. We show that availability and throughput cannot both be poor. These bounds are analogous to tail inequalities on sums of independent random variables, but hold throughout the support of the demand distribution. This analysis gives analytically tractable bounds supporting the unit-demand characterization of Chawla, Devanur, and Lykouris (2023) and generalizes to up-to-unit demands. Our bounds also provide an approach towards improved multi-unit prophet inequalities (Hajiaghayi, Kleinberg, and Sandholm, 2007). They have applications to transaction fee mechanism design (for blockchains) where high availability limits the probability of profitable user-miner coalitions (Chung and Shi, 2023). △ Less

Submitted 19 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: 34 pages, 7 figures; updated author information to include institutions and email addresses

arXiv:2402.17721 [pdf, other]

Content-Centric Prototy** of Generative AI Applications: Emerging Approaches and Challenges in Collaborative Software Teams

Authors: Hari Subramonyam, Divy Thakkar, Jürgen Dieber, Anoop Sinha

Abstract: Generative AI models are increasingly powering software applications, offering the capability to produce expressive content across varied contexts. However, unlike previous iterations of human-AI design, the emerging design process for generative capabilities primarily hinges on prompt engineering strategies. Given this fundamental shift in approach, our work aims to understand how collaborative s… ▽ More Generative AI models are increasingly powering software applications, offering the capability to produce expressive content across varied contexts. However, unlike previous iterations of human-AI design, the emerging design process for generative capabilities primarily hinges on prompt engineering strategies. Given this fundamental shift in approach, our work aims to understand how collaborative software teams set up and apply design guidelines and values, iteratively prototype prompts, and evaluate prompts to achieve desired outcomes. We conducted design studies with 39 industry professionals, including designers, software engineers, and product managers. Our findings reveal a content-centric prototy** approach in which teams begin with the content they want to generate, then identify specific attributes, constraints, and values, and explore methods to give users the ability to influence and interact with those attributes. Based on associated challenges, such as the lack of model interpretability and overfitting the design to examples, we outline considerations for generative AI prototy**. △ Less

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.06176 [pdf, other]

Cooperative Nonlinear Guidance Strategies for Guaranteed Pursuit-Evasion

Authors: Saurabh Kumar, Shashi Ranjan Kumar, Abhinav Sinha

Abstract: This paper addresses the pursuit-evasion problem involving three agents -- a purser, an evader, and a defender. We develop cooperative guidance laws for the evader-defender team that guarantee that the defender intercepts the pursuer before it reaches the vicinity of the evader. Unlike heuristic methods, optimal control, differential game formulation, and recently proposed time-constrained guidanc… ▽ More This paper addresses the pursuit-evasion problem involving three agents -- a purser, an evader, and a defender. We develop cooperative guidance laws for the evader-defender team that guarantee that the defender intercepts the pursuer before it reaches the vicinity of the evader. Unlike heuristic methods, optimal control, differential game formulation, and recently proposed time-constrained guidance techniques, we propose a geometric solution to safeguard the evader from the pursuer's incoming threat. The proposed strategy is computationally efficient and expected to be scalable as the number of agents increases. Another alluring feature of the proposed strategy is that the evader-defender team does not require the knowledge of the pursuer's strategy and that the pursuer's interception is guaranteed from arbitrary initial engagement geometries. We further show that the necessary error variables for the evader-defender team vanish within a time that can be exactly prescribed prior to the three-body engagement. Finally, we demonstrate the efficacy of the proposed cooperative defense strategy via simulation in diverse engagement scenarios. △ Less

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.05918 [pdf, other]

Consensus-driven Deviated Pursuit for Guaranteed Simultaneous Interception of Moving Targets

Authors: Abhinav Sinha, Dwaipayan Mukherjee, Shashi Ranjan Kumar

Abstract: This work proposes a cooperative strategy that employs deviated pursuit guidance to simultaneously intercept a moving (but not manoeuvring) target. As opposed to many existing cooperative guidance strategies which use estimates of time-to-go, based on proportional-navigation guidance, the proposed strategy uses an exact expression for time-to-go to ensure simultaneous interception. The guidance de… ▽ More This work proposes a cooperative strategy that employs deviated pursuit guidance to simultaneously intercept a moving (but not manoeuvring) target. As opposed to many existing cooperative guidance strategies which use estimates of time-to-go, based on proportional-navigation guidance, the proposed strategy uses an exact expression for time-to-go to ensure simultaneous interception. The guidance design considers nonlinear engagement kinematics, allowing the proposed strategy to remain effective over a large operating regime. Unlike existing strategies on simultaneous interception that achieve interception at the average value of their initial time-to-go estimates, this work provides flexibility in the choice of impact time. By judiciously choosing the edge weights of the communication network, a weighted consensus in time-to-go can be achieved. It has been shown that by allowing an edge weight to be negative, consensus in time-to-go can even be achieved for an impact time that lies outside the convex hull of the set of initial time-to-go values of the individual interceptors. The bounds on such negative weights have been analysed for some special graphs, using Nyquist criterion. Simulations are provided to vindicate the efficacy of the proposed strategy. △ Less

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.03388 [pdf, other]

doi 10.1145/3583780.3614839

Delivery Optimized Discovery in Behavioral User Segmentation under Budget Constraint

Authors: Harshita Chopra, Atanu R. Sinha, Sunav Choudhary, Ryan A. Rossi, Paavan Kumar Indela, Veda Pranav Parwatala, Srinjayee Paul, Aurghya Maiti

Abstract: Users' behavioral footprints online enable firms to discover behavior-based user segments (or, segments) and deliver segment specific messages to users. Following the discovery of segments, delivery of messages to users through preferred media channels like Facebook and Google can be challenging, as only a portion of users in a behavior segment find match in a medium, and only a fraction of those… ▽ More Users' behavioral footprints online enable firms to discover behavior-based user segments (or, segments) and deliver segment specific messages to users. Following the discovery of segments, delivery of messages to users through preferred media channels like Facebook and Google can be challenging, as only a portion of users in a behavior segment find match in a medium, and only a fraction of those matched actually see the message (exposure). Even high quality discovery becomes futile when delivery fails. Many sophisticated algorithms exist for discovering behavioral segments; however, these ignore the delivery component. The problem is compounded because (i) the discovery is performed on the behavior data space in firms' data (e.g., user clicks), while the delivery is predicated on the static data space (e.g., geo, age) as defined by media; and (ii) firms work under budget constraint. We introduce a stochastic optimization based algorithm for delivery optimized discovery of behavioral user segmentation and offer new metrics to address the joint optimization. We leverage optimization under a budget constraint for delivery combined with a learning-based component for discovery. Extensive experiments on a public dataset from Google and a proprietary dataset show the effectiveness of our approach by simultaneously improving delivery metrics, reducing budget spend and achieving strong predictive performance in discovery. △ Less

Submitted 15 March, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

arXiv:2401.15897 [pdf, other]

Red-Teaming for Generative AI: Silver Bullet or Security Theater?

Authors: Michael Feffer, Anusha Sinha, Wesley Hanwen Deng, Zachary C. Lipton, Hoda Heidari

Abstract: In response to rising concerns surrounding the safety, security, and trustworthiness of Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red-teaming as a key component of their strategies for identifying and mitigating these risks. However, despite AI red-teaming's central role in policy discussions and corporate messaging, significant questions remain about what… ▽ More In response to rising concerns surrounding the safety, security, and trustworthiness of Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red-teaming as a key component of their strategies for identifying and mitigating these risks. However, despite AI red-teaming's central role in policy discussions and corporate messaging, significant questions remain about what precisely it means, what role it can play in regulation, and how it relates to conventional red-teaming practices as originally conceived in the field of cybersecurity. In this work, we identify recent cases of red-teaming activities in the AI industry and conduct an extensive survey of relevant research literature to characterize the scope, structure, and criteria for AI red-teaming practices. Our analysis reveals that prior methods and practices of AI red-teaming diverge along several axes, including the purpose of the activity (which is often vague), the artifact under evaluation, the setting in which the activity is conducted (e.g., actors, resources, and methods), and the resulting decisions it informs (e.g., reporting, disclosure, and mitigation). In light of our findings, we argue that while red-teaming may be a valuable big-tent idea for characterizing GenAI harm mitigations, and that industry may effectively apply red-teaming and other strategies behind closed doors to safeguard AI, gestures towards red-teaming (based on public definitions) as a panacea for every possible risk verge on security theater. To move toward a more robust toolbox of evaluations for generative AI, we synthesize our recommendations into a question bank meant to guide and scaffold future AI red-teaming practices. △ Less

Submitted 15 May, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.15724 [pdf, other]

RE-GAINS & EnChAnT: Intelligent Tool Manipulation Systems For Enhanced Query Responses

Authors: Sahil Girhepuje, Siva Sankar Sajeev, Purvam Jain, Arya Sikder, Adithya Rama Varma, Ryan George, Akshay Govind Srinivasan, Mahendra Kurup, Ashmit Sinha, Sudip Mondal

Abstract: Large Language Models (LLMs) currently struggle with tool invocation and chaining, as they often hallucinate or miss essential steps in a sequence. We propose RE-GAINS and EnChAnT, two novel frameworks that empower LLMs to tackle complex user queries by making API calls to external tools based on tool descriptions and argument lists. Tools are chained based on the expected output, without receivin… ▽ More Large Language Models (LLMs) currently struggle with tool invocation and chaining, as they often hallucinate or miss essential steps in a sequence. We propose RE-GAINS and EnChAnT, two novel frameworks that empower LLMs to tackle complex user queries by making API calls to external tools based on tool descriptions and argument lists. Tools are chained based on the expected output, without receiving the actual results from each individual call. EnChAnT, an open-source solution, leverages an LLM format enforcer, OpenChat 3.5 (an LLM), and ToolBench's API Retriever. RE-GAINS utilizes OpenAI models and embeddings with a specialized prompt based on the $\underline{R}$easoning vi$\underline{a}$ $\underline{P}$lanning $(RAP)$ framework. Both frameworks are low cost (0.01\$ per query). Our key contribution is enabling LLMs for tool invocation and chaining using modifiable, externally described tools. △ Less

Submitted 20 June, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

arXiv:2401.15246 [pdf, other]

Training Differentially Private Ad Prediction Models with Semi-Sensitive Features

Authors: Lynn Chua, Qiliang Cui, Badih Ghazi, Charlie Harrison, Pritish Kamath, Walid Krichene, Ravi Kumar, Pasin Manurangsi, Krishna Giri Narra, Amer Sinha, Avinash Varadarajan, Chiyuan Zhang

Abstract: Motivated by problems arising in digital advertising, we introduce the task of training differentially private (DP) machine learning models with semi-sensitive features. In this setting, a subset of the features is known to the attacker (and thus need not be protected) while the remaining features as well as the label are unknown to the attacker and should be protected by the DP guarantee. This ta… ▽ More Motivated by problems arising in digital advertising, we introduce the task of training differentially private (DP) machine learning models with semi-sensitive features. In this setting, a subset of the features is known to the attacker (and thus need not be protected) while the remaining features as well as the label are unknown to the attacker and should be protected by the DP guarantee. This task interpolates between training the model with full DP (where the label and all features should be protected) or with label DP (where all the features are considered known, and only the label should be protected). We present a new algorithm for training DP models with semi-sensitive features. Through an empirical evaluation on real ads datasets, we demonstrate that our algorithm surpasses in utility the baselines of (i) DP stochastic gradient descent (DP-SGD) run on all features (known and unknown), and (ii) a label DP algorithm run only on the known features (while discarding the unknown ones). △ Less

Submitted 26 January, 2024; originally announced January 2024.

Comments: 7 pages, 4 figures

arXiv:2401.14322 [pdf, other]

Generalized People Diversity: Learning a Human Perception-Aligned Diversity Representation for People Images

Authors: Hansa Srinivasan, Candice Schumann, Aradhana Sinha, David Madras, Gbolahan Oluwafemi Olanubi, Alex Beutel, Susanna Ricco, Jilin Chen

Abstract: Capturing the diversity of people in images is challenging: recent literature tends to focus on diversifying one or two attributes, requiring expensive attribute labels or building classifiers. We introduce a diverse people image ranking method which more flexibly aligns with human notions of people diversity in a less prescriptive, label-free manner. The Perception-Aligned Text-derived Human repr… ▽ More Capturing the diversity of people in images is challenging: recent literature tends to focus on diversifying one or two attributes, requiring expensive attribute labels or building classifiers. We introduce a diverse people image ranking method which more flexibly aligns with human notions of people diversity in a less prescriptive, label-free manner. The Perception-Aligned Text-derived Human representation Space (PATHS) aims to capture all or many relevant features of people-related diversity, and, when used as the representation space in the standard Maximal Marginal Relevance (MMR) ranking algorithm, is better able to surface a range of types of people-related diversity (e.g. disability, cultural attire). PATHS is created in two stages. First, a text-guided approach is used to extract a person-diversity representation from a pre-trained image-text model. Then this representation is fine-tuned on perception judgments from human annotators so that it captures the aspects of people-related similarity that humans find most salient. Empirical results show that the PATHS method achieves diversity better than baseline methods, according to side-by-side ratings from human annotators. △ Less

Submitted 25 January, 2024; originally announced January 2024.

arXiv:2401.01690 [pdf, other]

Zero-shot Active Learning Using Self Supervised Learning

Authors: Abhishek Sinha, Shreya Singh

Abstract: Deep learning algorithms are often said to be data hungry. The performance of such algorithms generally improve as more and more annotated data is fed into the model. While collecting unlabelled data is easier (as they can be scraped easily from the internet), annotating them is a tedious and expensive task. Given a fixed budget available for data annotation, Active Learning helps selecting the be… ▽ More Deep learning algorithms are often said to be data hungry. The performance of such algorithms generally improve as more and more annotated data is fed into the model. While collecting unlabelled data is easier (as they can be scraped easily from the internet), annotating them is a tedious and expensive task. Given a fixed budget available for data annotation, Active Learning helps selecting the best subset of data for annotation, such that the deep learning model when trained over that subset will have maximum generalization performance under this budget. In this work, we aim to propose a new Active Learning approach which is model agnostic as well as one doesn't require an iterative process. We aim to leverage self-supervised learnt features for the task of Active Learning. The benefit of self-supervised learning, is that one can get useful feature representation of the input data, without having any annotation. △ Less

Submitted 3 January, 2024; originally announced January 2024.

arXiv:2312.17476 [pdf, other]

doi 10.18653/v1/2023.findings-emnlp.241

Exploring the Sensitivity of LLMs' Decision-Making Capabilities: Insights from Prompt Variation and Hyperparameters

Authors: Manikanta Loya, Divya Anand Sinha, Richard Futrell

Abstract: The advancement of Large Language Models (LLMs) has led to their widespread use across a broad spectrum of tasks including decision making. Prior studies have compared the decision making abilities of LLMs with those of humans from a psychological perspective. However, these studies have not always properly accounted for the sensitivity of LLMs' behavior to hyperparameters and variations in the pr… ▽ More The advancement of Large Language Models (LLMs) has led to their widespread use across a broad spectrum of tasks including decision making. Prior studies have compared the decision making abilities of LLMs with those of humans from a psychological perspective. However, these studies have not always properly accounted for the sensitivity of LLMs' behavior to hyperparameters and variations in the prompt. In this study, we examine LLMs' performance on the Horizon decision making task studied by Binz and Schulz (2023) analyzing how LLMs respond to variations in prompts and hyperparameters. By experimenting on three OpenAI language models possessing different capabilities, we observe that the decision making abilities fluctuate based on the input prompts and temperature settings. Contrary to previous findings language models display a human-like exploration exploitation tradeoff after simple adjustments to the prompt. △ Less

Submitted 29 December, 2023; originally announced December 2023.

Comments: EMNLP 2023

arXiv:2312.16564 [pdf, other]

Balancing Priorities in Patrolling with Rabbit Walks

Authors: Rugved Katole, Deepak Mallya, Leena Vachhani, Arpita Sinha

Abstract: In an environment with certain locations of higher priority, it is required to patrol these locations as frequently as possible due to their importance. However, the Non-Priority locations are often neglected during the task. It is necessary to balance the patrols on both kinds of sites to avoid breaches in security. We present a distributed online algorithm that assigns the routes to agents that… ▽ More In an environment with certain locations of higher priority, it is required to patrol these locations as frequently as possible due to their importance. However, the Non-Priority locations are often neglected during the task. It is necessary to balance the patrols on both kinds of sites to avoid breaches in security. We present a distributed online algorithm that assigns the routes to agents that ensures a finite time visit to the Non-Priority locations along with Priority Patrolling. The proposed algorithm generates offline patrol routes (Rabbit Walks) with three segments (Hops) to explore non-priority locations. The generated number of offline walks depends exponentially on a parameter introduced in the proposed algorithm, thereby facilitating the scalable implementation based on the onboard resources available on each patrolling robot. A systematic performance evaluation through simulations and experimental results validates the proportionately balanced visits and suggests the proposed algorithm's versatile applicability in the implementation of deterministic and non-deterministic scenarios. △ Less

Submitted 27 December, 2023; originally announced December 2023.

Comments: 8 pages, 8 figures, 2 tables

arXiv:2312.16177 [pdf, other]

Learning to Infer Unobserved Behaviors: Estimating User's Preference for a Site over Other Sites

Authors: Atanu R Sinha, Tanay Anand, Paridhi Maheshwari, A V Lakshmy, Vishal Jain

Abstract: A site's recommendation system relies on knowledge of its users' preferences to offer relevant recommendations to them. These preferences are for attributes that comprise items and content shown on the site, and are estimated from the data of users' interactions with the site. Another form of users' preferences is material too, namely, users' preferences for the site over other sites, since that s… ▽ More A site's recommendation system relies on knowledge of its users' preferences to offer relevant recommendations to them. These preferences are for attributes that comprise items and content shown on the site, and are estimated from the data of users' interactions with the site. Another form of users' preferences is material too, namely, users' preferences for the site over other sites, since that shows users' base level propensities to engage with the site. Estimating users' preferences for the site, however, faces major obstacles because (a) the focal site usually has no data of its users' interactions with other sites; these interactions are users' unobserved behaviors for the focal site; and (b) the Machine Learning literature in recommendation does not offer a model of this situation. Even if (b) is resolved, the problem in (a) persists since without access to data of its users' interactions with other sites, there is no ground truth for evaluation. Moreover, it is most useful when (c) users' preferences for the site can be estimated at the individual level, since the site can then personalize recommendations to individual users. We offer a method to estimate individual user's preference for a focal site, under this premise. In particular, we compute the focal site's share of a user's online engagements without any data from other sites. We show an evaluation framework for the model using only the focal site's data, allowing the site to test the model. We rely upon a Hierarchical Bayes Method and perform estimation in two different ways - Markov Chain Monte Carlo and Stochastic Gradient with Langevin Dynamics. Our results find good support for the approach to computing personalized share of engagement and for its evaluation. △ Less

Submitted 15 December, 2023; originally announced December 2023.

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI. △ Less

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.04566 [pdf, other]

Gen2Det: Generate to Detect

Authors: Saksham Suri, Fanyi Xiao, Animesh Sinha, Sean Chang Culatana, Raghuraman Krishnamoorthi, Chenchen Zhu, Abhinav Shrivastava

Abstract: Recently diffusion models have shown improvement in synthetic image quality as well as better control in generation. We motivate and present Gen2Det, a simple modular pipeline to create synthetic training data for object detection for free by leveraging state-of-the-art grounded image generation methods. Unlike existing works which generate individual object instances, require identifying foregrou… ▽ More Recently diffusion models have shown improvement in synthetic image quality as well as better control in generation. We motivate and present Gen2Det, a simple modular pipeline to create synthetic training data for object detection for free by leveraging state-of-the-art grounded image generation methods. Unlike existing works which generate individual object instances, require identifying foreground followed by pasting on other images, we simplify to directly generating scene-centric images. In addition to the synthetic data, Gen2Det also proposes a suite of techniques to best utilize the generated data, including image-level filtering, instance-level filtering, and better training recipe to account for imperfections in the generation. Using Gen2Det, we show healthy improvements on object detection and segmentation tasks under various settings and agnostic to detection methods. In the long-tailed detection setting on LVIS, Gen2Det improves the performance on rare categories by a large margin while also significantly improving the performance on other categories, e.g. we see an improvement of 2.13 Box AP and 1.84 Mask AP over just training on real data on LVIS with Mask R-CNN. In the low-data regime setting on COCO, Gen2Det consistently improves both Box and Mask AP by 2.27 and 1.85 points. In the most general detection setting, Gen2Det still demonstrates robust performance gains, e.g. it improves the Box and Mask AP on COCO by 0.45 and 0.32 points. △ Less

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2312.04557 [pdf, other]

GenTron: Diffusion Transformers for Image and Video Generation

Authors: Shoufa Chen, Mengmeng Xu, Jiawei Ren, Yuren Cong, Sen He, Animesh Sinha, ** Luo, Tao Xiang, Juan-Manuel Perez-Rua

Abstract: In this study, we explore Transformer-based diffusion models for image and video generation. Despite the dominance of Transformer architectures in various fields due to their flexibility and scalability, the visual generative domain primarily utilizes CNN-based U-Net architectures, particularly in diffusion-based models. We introduce GenTron, a family of Generative models employing Transformer-bas… ▽ More In this study, we explore Transformer-based diffusion models for image and video generation. Despite the dominance of Transformer architectures in various fields due to their flexibility and scalability, the visual generative domain primarily utilizes CNN-based U-Net architectures, particularly in diffusion-based models. We introduce GenTron, a family of Generative models employing Transformer-based diffusion, to address this gap. Our initial step was to adapt Diffusion Transformers (DiTs) from class to text conditioning, a process involving thorough empirical exploration of the conditioning mechanism. We then scale GenTron from approximately 900M to over 3B parameters, observing significant improvements in visual quality. Furthermore, we extend GenTron to text-to-video generation, incorporating novel motion-free guidance to enhance video quality. In human evaluations against SDXL, GenTron achieves a 51.1% win rate in visual quality (with a 19.8% draw rate), and a 42.3% win rate in text alignment (with a 42.9% draw rate). GenTron also excels in the T2I-CompBench, underscoring its strengths in compositional generation. We believe this work will provide meaningful insights and serve as a valuable reference for future research. △ Less

Submitted 2 June, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

Comments: CVPR2024 Camera Ready. Website: https://www.shoufachen.com/gentron_website/

arXiv:2312.03584 [pdf, other]

Context Diffusion: In-Context Aware Image Generation

Authors: Ivona Najdenkoska, Animesh Sinha, Abhimanyu Dubey, Dhruv Mahajan, Vignesh Ramanathan, Filip Radenovic

Abstract: We propose Context Diffusion, a diffusion-based framework that enables image generation models to learn from visual examples presented in context. Recent work tackles such in-context learning for image generation, where a query image is provided alongside context examples and text prompts. However, the quality and fidelity of the generated images deteriorate when the prompt is not present, demonst… ▽ More We propose Context Diffusion, a diffusion-based framework that enables image generation models to learn from visual examples presented in context. Recent work tackles such in-context learning for image generation, where a query image is provided alongside context examples and text prompts. However, the quality and fidelity of the generated images deteriorate when the prompt is not present, demonstrating that these models are unable to truly learn from the visual context. To address this, we propose a novel framework that separates the encoding of the visual context and preserving the structure of the query images. This results in the ability to learn from the visual context and text prompts, but also from either one of them. Furthermore, we enable our model to handle few-shot settings, to effectively address diverse in-context learning scenarios. Our experiments and user study demonstrate that Context Diffusion excels in both in-domain and out-of-domain tasks, resulting in an overall enhancement in image quality and fidelity compared to counterpart models. △ Less

Submitted 6 December, 2023; originally announced December 2023.

arXiv:2311.17681 [pdf, other]

A low-cost Framework for Decentralized Autonomous Intersection Management

Authors: Rugved Katole, Arpita Sinha

Abstract: This paper addresses the traffic management problem for autonomous vehicles at intersections without traffic signals. In the current system, a road junction has no traffic signals when the traffic volume is low to medium. Installing infrastructure at each unsignalled crossing to coordinate autonomous cars can be formidable. We propose a novel low-cost solution strategy where the vehicles use a har… ▽ More This paper addresses the traffic management problem for autonomous vehicles at intersections without traffic signals. In the current system, a road junction has no traffic signals when the traffic volume is low to medium. Installing infrastructure at each unsignalled crossing to coordinate autonomous cars can be formidable. We propose a novel low-cost solution strategy where the vehicles use a harmony matrix to find the best possible combination of the cars to cross the intersection without any crashes. The harmony matrix defines the connection between different vehicle maneuvers and is queried online for intersection management. We maximize the throughput of the intersection by solving a maximal clique problem formulated based on the vehicles present at the intersection. The proposed algorithm relies on the intent perceived by the autonomous vehicles. We compare our work with a communication-based strategy that uses V2I communication protocols, and through extensive simulation, we showed that our algorithm is comparable when the traffic volume is less than 500 PCUs/hr/lane. △ Less

Submitted 28 March, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

Comments: Revised version

arXiv:2311.15373 [pdf, other]

Confidence Is All You Need for MI Attacks

Authors: Abhishek Sinha, Himanshi Tibrewal, Mansi Gupta, Nikhar Waghela, Shivank Garg

Abstract: In this evolving era of machine learning security, membership inference attacks have emerged as a potent threat to the confidentiality of sensitive data. In this attack, adversaries aim to determine whether a particular point was used during the training of a target model. This paper proposes a new method to gauge a data point's membership in a model's training set. Instead of correlating loss wit… ▽ More In this evolving era of machine learning security, membership inference attacks have emerged as a potent threat to the confidentiality of sensitive data. In this attack, adversaries aim to determine whether a particular point was used during the training of a target model. This paper proposes a new method to gauge a data point's membership in a model's training set. Instead of correlating loss with membership, as is traditionally done, we have leveraged the fact that training examples generally exhibit higher confidence values when classified into their actual class. During training, the model is essentially being 'fit' to the training data and might face particular difficulties in generalization to unseen data. This asymmetry leads to the model achieving higher confidence on the training data as it exploits the specific patterns and noise present in the training data. Our proposed approach leverages the confidence values generated by the machine learning model. These confidence values provide a probabilistic measure of the model's certainty in its predictions and can further be used to infer the membership of a given data point. Additionally, we also introduce another variant of our method that allows us to carry out this attack without knowing the ground truth(true class) of a given data point, thus offering an edge over existing label-dependent attack methods. △ Less

Submitted 19 June, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

Comments: 2 pages, 1 figure

arXiv:2311.15341 [pdf, other]

Generative Modelling of Stochastic Actions with Arbitrary Constraints in Reinforcement Learning

Authors: Changyu Chen, Ramesha Karunasena, Thanh Hong Nguyen, Arunesh Sinha, Pradeep Varakantham

Abstract: Many problems in Reinforcement Learning (RL) seek an optimal policy with large discrete multidimensional yet unordered action spaces; these include problems in randomized allocation of resources such as placements of multiple security resources and emergency response units, etc. A challenge in this setting is that the underlying action space is categorical (discrete and unordered) and large, for w… ▽ More Many problems in Reinforcement Learning (RL) seek an optimal policy with large discrete multidimensional yet unordered action spaces; these include problems in randomized allocation of resources such as placements of multiple security resources and emergency response units, etc. A challenge in this setting is that the underlying action space is categorical (discrete and unordered) and large, for which existing RL methods do not perform well. Moreover, these problems require validity of the realized action (allocation); this validity constraint is often difficult to express compactly in a closed mathematical form. The allocation nature of the problem also prefers stochastic optimal policies, if one exists. In this work, we address these challenges by (1) applying a (state) conditional normalizing flow to compactly represent the stochastic policy -- the compactness arises due to the network only producing one sampled action and the corresponding log probability of the action, which is then used by an actor-critic method; and (2) employing an invalid action rejection method (via a valid action oracle) to update the base policy. The action rejection is enabled by a modified policy gradient that we derive. Finally, we conduct extensive experiments to show the scalability of our approach compared to prior methods and the ability to enforce arbitrary state-conditional constraints on the support of the distribution of actions in any state. △ Less

Submitted 26 November, 2023; originally announced November 2023.

Comments: Accepted in NeurIPS 2023. Website: https://cameron-chen.github.io/flow-iar/

arXiv:2311.12875 [pdf, other]

Nav-Q: Quantum Deep Reinforcement Learning for Collision-Free Navigation of Self-Driving Cars

Authors: Akash Sinha, Antonio Macaluso, Matthias Klusch

Abstract: The task of collision-free navigation (CFN) of self-driving cars is an NP-hard problem usually tackled using Deep Reinforcement Learning (DRL). While DRL methods have proven to be effective, their implementation requires substantial computing resources and extended training periods to develop a robust agent. On the other hand, quantum reinforcement learning has recently demonstrated faster converg… ▽ More The task of collision-free navigation (CFN) of self-driving cars is an NP-hard problem usually tackled using Deep Reinforcement Learning (DRL). While DRL methods have proven to be effective, their implementation requires substantial computing resources and extended training periods to develop a robust agent. On the other hand, quantum reinforcement learning has recently demonstrated faster convergence and improved stability in simple, non-real-world environments. In this work, we propose Nav-Q, the first quantum-supported DRL algorithm for CFN of self-driving cars, that leverages quantum computation for improving the training performance without the requirement for onboard quantum hardware. Nav-Q is based on the actor-critic approach, where the critic is implemented using a hybrid quantum-classical algorithm suitable for near-term quantum devices. We assess the performance of Nav-Q using the CARLA driving simulator, a de facto standard benchmark for evaluating state-of-the-art DRL methods. Our empirical evaluations showcase that Nav-Q surpasses its classical counterpart in terms of training stability and, in certain instances, with respect to the convergence rate. Furthermore, we assess Nav-Q in relation to effective dimension, unveiling that the incorporation of a quantum component results in a model with greater descriptive power compared to classical baselines. Finally, we evaluate the performance of Nav-Q using noisy quantum simulation, observing that the quantum noise deteriorates the training performances but enhances the exploratory tendencies of the agent during training. △ Less

Submitted 23 December, 2023; v1 submitted 20 November, 2023; originally announced November 2023.

Comments: 28 pages, 12 figures, 4 tables

arXiv:2311.10794 [pdf, other]

Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression

Authors: Animesh Sinha, Bo Sun, Anmol Kalia, Arantxa Casanova, Elliot Blanchard, David Yan, Winnie Zhang, Tony Nelli, Jiahui Chen, Hardik Shah, Licheng Yu, Mitesh Kumar Singh, Ankit Ramchandani, Maziar Sanjabi, Sonal Gupta, Amy Bearman, Dhruv Mahajan

Abstract: We introduce Style Tailoring, a recipe to finetune Latent Diffusion Models (LDMs) in a distinct domain with high visual quality, prompt alignment and scene diversity. We choose sticker image generation as the target domain, as the images significantly differ from photorealistic samples typically generated by large-scale LDMs. We start with a competent text-to-image model, like Emu, and show that r… ▽ More We introduce Style Tailoring, a recipe to finetune Latent Diffusion Models (LDMs) in a distinct domain with high visual quality, prompt alignment and scene diversity. We choose sticker image generation as the target domain, as the images significantly differ from photorealistic samples typically generated by large-scale LDMs. We start with a competent text-to-image model, like Emu, and show that relying on prompt engineering with a photorealistic model to generate stickers leads to poor prompt alignment and scene diversity. To overcome these drawbacks, we first finetune Emu on millions of sticker-like images collected using weak supervision to elicit diversity. Next, we curate human-in-the-loop (HITL) Alignment and Style datasets from model generations, and finetune to improve prompt alignment and style alignment respectively. Sequential finetuning on these datasets poses a tradeoff between better style alignment and prompt alignment gains. To address this tradeoff, we propose a novel fine-tuning method called Style Tailoring, which jointly fits the content and style distribution and achieves best tradeoff. Evaluation results show our method improves visual quality by 14%, prompt alignment by 16.2% and scene diversity by 15.3%, compared to prompt engineering the base Emu model for stickers generation. △ Less

Submitted 16 November, 2023; originally announced November 2023.

Comments: 10 pages, 5 figures

arXiv:2311.10348 [pdf, other]

A Comparative Analysis of Retrievability and PageRank Measures

Authors: Aman Sinha, Priyanshu Raj Mall, Dwaipayan Roy

Abstract: The accessibility of documents within a collection holds a pivotal role in Information Retrieval, signifying the ease of locating specific content in a collection of documents. This accessibility can be achieved via two distinct avenues. The first is through some retrieval model using a keyword or other feature-based search, and the other is where a document can be navigated using links associated… ▽ More The accessibility of documents within a collection holds a pivotal role in Information Retrieval, signifying the ease of locating specific content in a collection of documents. This accessibility can be achieved via two distinct avenues. The first is through some retrieval model using a keyword or other feature-based search, and the other is where a document can be navigated using links associated with them, if available. Metrics such as PageRank, Hub, and Authority illuminate the pathways through which documents can be discovered within the network of content while the concept of Retrievability is used to quantify the ease with which a document can be found by a retrieval model. In this paper, we compare these two perspectives, PageRank and retrievability, as they quantify the importance and discoverability of content in a corpus. Through empirical experimentation on benchmark datasets, we demonstrate a subtle similarity between retrievability and PageRank particularly distinguishable for larger datasets. △ Less

Submitted 17 November, 2023; originally announced November 2023.

Comments: Accepted at FIRE 2023

arXiv:2311.09998 [pdf, other]

DeepEMD: A Transformer-based Fast Estimation of the Earth Mover's Distance

Authors: Atul Kumar Sinha, Francois Fleuret

Abstract: The Earth Mover's Distance (EMD) is the measure of choice between point clouds. However the computational cost to compute it makes it prohibitive as a training loss, and the standard approach is to use a surrogate such as the Chamfer distance. We propose an attention-based model to compute an accurate approximation of the EMD that can be used as a training loss for generative models. To get the ne… ▽ More The Earth Mover's Distance (EMD) is the measure of choice between point clouds. However the computational cost to compute it makes it prohibitive as a training loss, and the standard approach is to use a surrogate such as the Chamfer distance. We propose an attention-based model to compute an accurate approximation of the EMD that can be used as a training loss for generative models. To get the necessary accurate estimation of the gradients we train our model to explicitly compute the matching between point clouds instead of EMD itself. We cast this new objective as the estimation of an attention matrix that approximates the ground truth matching matrix. Experiments show that this model provides an accurate estimate of the EMD and its gradient with a wall clock speed-up of more than two orders of magnitude with respect to the exact Hungarian matching algorithm and one order of magnitude with respect to the standard approximate Sinkhorn algorithm, allowing in particular to train a point cloud VAE with the EMD itself. Extensive evaluation show the remarkable behaviour of this model when operating out-of-distribution, a key requirement for a distance surrogate. Finally, the model generalizes very well to point clouds during inference several times larger than during training. △ Less

Submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.08357 [pdf, other]

Sparsity-Preserving Differentially Private Training of Large Embedding Models

Authors: Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang

Abstract: As the use of large embedding models in recommendation systems and language applications increases, concerns over user data privacy have also risen. DP-SGD, a training algorithm that combines differential privacy with stochastic gradient descent, has been the workhorse in protecting user privacy without compromising model accuracy by much. However, applying DP-SGD naively to embedding models can d… ▽ More As the use of large embedding models in recommendation systems and language applications increases, concerns over user data privacy have also risen. DP-SGD, a training algorithm that combines differential privacy with stochastic gradient descent, has been the workhorse in protecting user privacy without compromising model accuracy by much. However, applying DP-SGD naively to embedding models can destroy gradient sparsity, leading to reduced training efficiency. To address this issue, we present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during private training of large embedding models. Our algorithms achieve substantial reductions ($10^6 \times$) in gradient size, while maintaining comparable levels of accuracy, on benchmark real-world datasets. △ Less

Submitted 14 November, 2023; originally announced November 2023.

Comments: Neural Information Processing Systems (NeurIPS) 2023

arXiv:2311.06542 [pdf]

Generation Of Colors using Bidirectional Long Short Term Memory Networks

Authors: A. Sinha

Abstract: Human vision can distinguish between a vast spectrum of colours, estimated to be between 2 to 7 million discernible shades. However, this impressive range does not inherently imply that all these colours have been precisely named and described within our lexicon. We often associate colours with familiar objects and concepts in our daily lives. This research endeavors to bridge the gap between our… ▽ More Human vision can distinguish between a vast spectrum of colours, estimated to be between 2 to 7 million discernible shades. However, this impressive range does not inherently imply that all these colours have been precisely named and described within our lexicon. We often associate colours with familiar objects and concepts in our daily lives. This research endeavors to bridge the gap between our visual perception of countless shades and our ability to articulate and name them accurately. A novel model has been developed to achieve this goal, leveraging Bidirectional Long Short-Term Memory (BiLSTM) networks with Active learning. This model operates on a proprietary dataset meticulously curated for this study. The primary objective of this research is to create a versatile tool for categorizing and naming previously unnamed colours or identifying intermediate shades that elude traditional colour terminology. The findings underscore the potential of this innovative approach in revolutionizing our understanding of colour perception and language. Through rigorous experimentation and analysis, this study illuminates a promising avenue for Natural Language Processing (NLP) applications in diverse industries. By facilitating the exploration of the vast colour spectrum the potential applications of NLP are extended beyond conventional boundaries. △ Less

Submitted 31 December, 2023; v1 submitted 11 November, 2023; originally announced November 2023.

Comments: 8 pages

arXiv:2311.05705 [pdf, other]

doi 10.1007/s12648-024-03103-9

Achieving Maximum Utilization in Optimal Time for Learning or Convergence in the Kolkata Paise Restaurant Problem

Authors: Aniruddha Biswas, Antika Sinha, Bikas K. Chakrabarti

Abstract: The objective of the KPR agents are to learn themselves in the minimum (learning) time to have maximum success or utilization probability ($f$). A dictator can easily solve the problem with $f = 1$ in no time, by asking every one to form a queue and go to the respective restaurant, resulting in no fluctuation and full utilization from the first day (convergence time $τ= 0$). It has already been sh… ▽ More The objective of the KPR agents are to learn themselves in the minimum (learning) time to have maximum success or utilization probability ($f$). A dictator can easily solve the problem with $f = 1$ in no time, by asking every one to form a queue and go to the respective restaurant, resulting in no fluctuation and full utilization from the first day (convergence time $τ= 0$). It has already been shown that if each agent chooses randomly the restaurants, $f = 1 - e^{-1} \simeq 0.63$ (where $e \simeq 2.718$ denotes the Euler number) in zero time ($τ= 0$). With the only available information about yesterday's crowd size in the restaurant visited by the agent (as assumed for the rest of the strategies studied here), the crowd avoiding (CA) strategies can give higher values of $f$ but also of $τ$. Several numerical studies of modified learning strategies actually indicated increased value of $f = 1 - α$ for $α\to 0$, with $τ\sim 1/α$. We show here using Monte Carlo technique, a modified Greedy Crowd Avoiding (GCA) Strategy can assure full utilization ($f = 1$) in convergence time $τ\simeq eN$, with of course non-zero probability for an even larger convergence time. All these observations suggest that the strategies with single step memory of the individuals can never collectively achieve full utilization ($f = 1$) in finite convergence time and perhaps the maximum possible utilization that can be achieved is about eighty percent ($f \simeq 0.80$) in an optimal time $τ$ of order ten, even when $N$ the number of customers or of the restaurants goes to infinity. △ Less

Submitted 15 February, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

Comments: 9 pages, 6 figures included in manuscript; Accepted for publication in Indian Journal of Physics

Showing 1–50 of 269 results for author: Sinha, A