Search | arXiv e-print repository

On Fourier analysis of sparse Boolean functions over certain Abelian groups

Authors: Sourav Chakraborty, Swarnalipa Datta, Pranjal Dutta, Arijit Ghosh, Swagato Sanyal

Abstract: Given an Abelian group G, a Boolean-valued function f: G -> {-1,+1}, is said to be s-sparse, if it has at most s-many non-zero Fourier coefficients over the domain G. In a seminal paper, Gopalan et al. proved "Granularity" for Fourier coefficients of Boolean valued functions over Z_2^n, that have found many diverse applications in theoretical computer science and combinatorics. They also studied s… ▽ More Given an Abelian group G, a Boolean-valued function f: G -> {-1,+1}, is said to be s-sparse, if it has at most s-many non-zero Fourier coefficients over the domain G. In a seminal paper, Gopalan et al. proved "Granularity" for Fourier coefficients of Boolean valued functions over Z_2^n, that have found many diverse applications in theoretical computer science and combinatorics. They also studied structural results for Boolean functions over Z_2^n which are approximately Fourier-sparse. In this work, we obtain structural results for approximately Fourier-sparse Boolean valued functions over Abelian groups G of the form,G:= Z_{p_1}^{n_1} \times ... \times Z_{p_t}^{n_t}, for distinct primes p_i. We also obtain a lower bound of the form 1/(m^{2}s)^ceiling(phi(m)/2), on the absolute value of the smallest non-zero Fourier coefficient of an s-sparse function, where m=p_1 ... p_t, and phi(m)=(p_1-1) ... (p_t-1). We carefully apply probabilistic techniques from Gopalan et al., to obtain our structural results, and use some non-trivial results from algebraic number theory to get the lower bound. We construct a family of at most s-sparse Boolean functions over Z_p^n, where p > 2, for arbitrarily large enough s, where the minimum non-zero Fourier coefficient is 1/omega(n). The "Granularity" result of Gopalan et al. implies that the absolute values of non-zero Fourier coefficients of any s-sparse Boolean valued function over Z_2^n are 1/O(s). So, our result shows that one cannot expect such a lower bound for general Abelian groups. Using our new structural results on the Fourier coefficients of sparse functions, we design an efficient testing algorithm for Fourier-sparse Boolean functions, thata requires poly((ms)^phi(m),1/epsilon)-many queries. Further, we prove an Omega(sqrt{s}) lower bound on the query complexity of any adaptive sparsity testing algorithm. △ Less

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.09547

doi 10.1145/3637528.3671899

FLea: Addressing Data Scarcity and Label Skew in Federated Learning via Privacy-preserving Feature Augmentation

Authors: Tong Xia, Abhirup Ghosh, Xinchi Qiu, Cecilia Mascolo

Abstract: Federated Learning (FL) enables model development by leveraging data distributed across numerous edge devices without transferring local data to a central server. However, existing FL methods still face challenges when dealing with scarce and label-skewed data across devices, resulting in local model overfitting and drift, consequently hindering the performance of the global model. In response to… ▽ More Federated Learning (FL) enables model development by leveraging data distributed across numerous edge devices without transferring local data to a central server. However, existing FL methods still face challenges when dealing with scarce and label-skewed data across devices, resulting in local model overfitting and drift, consequently hindering the performance of the global model. In response to these challenges, we propose a pioneering framework called FLea, incorporating the following key components: i) A global feature buffer that stores activation-target pairs shared from multiple clients to support local training. This design mitigates local model drift caused by the absence of certain classes; ii) A feature augmentation approach based on local and global activation mix-ups for local training. This strategy enlarges the training samples, thereby reducing the risk of local overfitting; iii) An obfuscation method to minimize the correlation between intermediate activations and the source data, enhancing the privacy of shared features. To verify the superiority of FLea, we conduct extensive experiments using a wide range of data modalities, simulating different levels of local data scarcity and label skew. The results demonstrate that FLea consistently outperforms state-of-the-art FL counterparts (among 13 of the experimented 18 settings, the improvement is over 5% while concurrently mitigating the privacy vulnerabilities associated with shared features. Code is available at https://github.com/XTxiatong/FLea.git. △ Less

Submitted 18 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: This work was intended as a replacement of arXiv:2312.02327 and any subsequent updates will appear there

arXiv:2406.07661 [pdf, other]

ROADWork Dataset: Learning to Recognize, Observe, Analyze and Drive Through Work Zones

Authors: Anurag Ghosh, Robert Tamburo, Shen Zheng, Juan R. Alvarez-Padilla, Hailiang Zhu, Michael Cardei, Nicholas Dunn, Christoph Mertz, Srinivasa G. Narasimhan

Abstract: Perceiving and navigating through work zones is challenging and under-explored, even with major strides in self-driving research. An important reason is the lack of open datasets for develo** new algorithms to address this long-tailed scenario. We propose the ROADWork dataset to learn how to recognize, observe and analyze and drive through work zones. We find that state-of-the-art foundation mod… ▽ More Perceiving and navigating through work zones is challenging and under-explored, even with major strides in self-driving research. An important reason is the lack of open datasets for develo** new algorithms to address this long-tailed scenario. We propose the ROADWork dataset to learn how to recognize, observe and analyze and drive through work zones. We find that state-of-the-art foundation models perform poorly on work zones. With our dataset, we improve upon detecting work zone objects (+26.2 AP), while discovering work zones with higher precision (+32.5%) at a much higher discovery rate (12.8 times), significantly improve detecting (+23.9 AP) and reading (+14.2% 1-NED) work zone signs and describing work zones (+36.7 SPICE). We also compute drivable paths from work zone navigation videos and show that it is possible to predict navigational goals and pathways such that 53.6% goals have angular error (AE) < 0.5 degrees (+9.9 %) and 75.3% pathways have AE < 0.5 degrees (+8.1 %). △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.05288 [pdf, other]

Optimal Eye Surgeon: Finding Image Priors through Sparse Generators at Initialization

Authors: Avrajit Ghosh, Xitong Zhang, Kenneth K. Sun, Qing Qu, Saiprasad Ravishankar, Rongrong Wang

Abstract: We introduce Optimal Eye Surgeon (OES), a framework for pruning and training deep image generator networks. Typically, untrained deep convolutional networks, which include image sampling operations, serve as effective image priors (Ulyanov et al., 2018). However, they tend to overfit to noise in image restoration tasks due to being overparameterized. OES addresses this by adaptively pruning networ… ▽ More We introduce Optimal Eye Surgeon (OES), a framework for pruning and training deep image generator networks. Typically, untrained deep convolutional networks, which include image sampling operations, serve as effective image priors (Ulyanov et al., 2018). However, they tend to overfit to noise in image restoration tasks due to being overparameterized. OES addresses this by adaptively pruning networks at random initialization to a level of underparameterization. This process effectively captures low-frequency image components even without training, by just masking. When trained to fit noisy images, these pruned subnetworks, which we term Sparse-DIP, resist overfitting to noise. This benefit arises from underparameterization and the regularization effect of masking, constraining them in the manifold of image priors. We demonstrate that subnetworks pruned through OES surpass other leading pruning methods, such as the Lottery Ticket Hypothesis, which is known to be suboptimal for image recovery tasks (Wu et al., 2023). Our extensive experiments demonstrate the transferability of OES-masks and the characteristics of sparse-subnetworks for image generation. Code is available at https://github.com/Avra98/Optimal-Eye-Surgeon.git. △ Less

Submitted 7 June, 2024; originally announced June 2024.

Comments: Pruning image generator networks at initialization to alleviate overfitting

Journal ref: International Conference on Machine Learning (ICML 2024)

arXiv:2406.04231 [pdf, other]

Quantifying Misalignment Between Agents

Authors: Aidan Kierans, Avijit Ghosh, Hananel Hazan, Shiri Dori-Hacohen

Abstract: Growing concerns about the AI alignment problem have emerged in recent years, with previous work focusing mainly on (1) qualitative descriptions of the alignment problem; (2) attempting to align AI actions with human interests by focusing on value specification and learning; and/or (3) focusing on a single agent or on humanity as a singular unit. Recent work in sociotechnical AI alignment has made… ▽ More Growing concerns about the AI alignment problem have emerged in recent years, with previous work focusing mainly on (1) qualitative descriptions of the alignment problem; (2) attempting to align AI actions with human interests by focusing on value specification and learning; and/or (3) focusing on a single agent or on humanity as a singular unit. Recent work in sociotechnical AI alignment has made some progress in defining alignment inclusively, but the field as a whole still lacks a systematic understanding of how to specify, describe, and analyze misalignment among entities, which may include individual humans, AI agents, and complex compositional entities such as corporations, nation-states, and so forth. Previous work on controversy in computational social science offers a mathematical model of contention among populations (of humans). In this paper, we adapt this contention model to the alignment problem, and show how misalignment can vary depending on the population of agents (human or otherwise) being observed, the domain in question, and the agents' probability-weighted preferences between possible outcomes. Our model departs from value specification approaches and focuses instead on the morass of complex, interlocking, sometimes contradictory goals that agents may have in practice. We apply our model by analyzing several case studies ranging from social media moderation to autonomous vehicle behavior. By applying our model with appropriately representative value data, AI engineers can ensure that their systems learn values maximally aligned with diverse human interests. △ Less

Submitted 6 June, 2024; originally announced June 2024.

Comments: 10 pages, 2 figures, 4 tables, submitted to AIES-24

ACM Class: I.2.11; K.4.m

arXiv:2406.04146 [pdf, other]

Towards Understanding Task-agnostic Debiasing Through the Lenses of Intrinsic Bias and Forgetfulness

Authors: Guangliang Liu, Milad Afshari, Xitong Zhang, Zhiyu Xue, Avrajit Ghosh, Bidhan Bashyal, Rongrong Wang, Kristen Johnson

Abstract: While task-agnostic debiasing provides notable generalizability and reduced reliance on downstream data, its impact on language modeling ability and the risk of relearning social biases from downstream task-specific data remain as the two most significant challenges when debiasing Pretrained Language Models (PLMs). The impact on language modeling ability can be alleviated given a high-quality and… ▽ More While task-agnostic debiasing provides notable generalizability and reduced reliance on downstream data, its impact on language modeling ability and the risk of relearning social biases from downstream task-specific data remain as the two most significant challenges when debiasing Pretrained Language Models (PLMs). The impact on language modeling ability can be alleviated given a high-quality and long-contextualized debiasing corpus, but there remains a deficiency in understanding the specifics of relearning biases. We empirically ascertain that the effectiveness of task-agnostic debiasing hinges on the quantitative bias level of both the task-specific data used for downstream applications and the debiased model. We empirically show that the lower bound of the bias level of the downstream fine-tuned model can be approximated by the bias level of the debiased model, in most practical cases. To gain more in-depth understanding about how the parameters of PLMs change during fine-tuning due to the forgetting issue of PLMs, we propose a novel framework which can Propagate Socially-fair Debiasing to Downstream Fine-tuning, ProSocialTuning. Our proposed framework can push the fine-tuned model to approach the bias lower bound during downstream fine-tuning, indicating that the ineffectiveness of debiasing can be alleviated by overcoming the forgetting issue through regularizing successfully debiased attention heads based on the PLMs' bias levels from stages of pretraining and debiasing. △ Less

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.03864 [pdf, other]

PairNet: Training with Observed Pairs to Estimate Individual Treatment Effect

Authors: Lokesh Nagalapatti, Pranava Singhal, Avishek Ghosh, Sunita Sarawagi

Abstract: Given a dataset of individuals each described by a covariate vector, a treatment, and an observed outcome on the treatment, the goal of the individual treatment effect (ITE) estimation task is to predict outcome changes resulting from a change in treatment. A fundamental challenge is that in the observational data, a covariate's outcome is observed only under one treatment, whereas we need to infe… ▽ More Given a dataset of individuals each described by a covariate vector, a treatment, and an observed outcome on the treatment, the goal of the individual treatment effect (ITE) estimation task is to predict outcome changes resulting from a change in treatment. A fundamental challenge is that in the observational data, a covariate's outcome is observed only under one treatment, whereas we need to infer the difference in outcomes under two different treatments. Several existing approaches address this issue through training with inferred pseudo-outcomes, but their success relies on the quality of these pseudo-outcomes. We propose PairNet, a novel ITE estimation training strategy that minimizes losses over pairs of examples based on their factual observed outcomes. Theoretical analysis for binary treatments reveals that PairNet is a consistent estimator of ITE risk, and achieves smaller generalization error than baseline models. Empirical comparison with thirteen existing methods across eight benchmarks, covering both discrete and continuous treatments, shows that PairNet achieves significantly lower ITE error compared to the baselines. Also, it is model-agnostic and easy to implement. △ Less

Submitted 6 June, 2024; originally announced June 2024.

Comments: Lokesh and Pranava contributed equally. Accepted at ICML-24

arXiv:2406.01149 [pdf, ps, other]

Agnostic Learning of Mixed Linear Regressions with EM and AM Algorithms

Authors: Avishek Ghosh, Arya Mazumdar

Abstract: Mixed linear regression is a well-studied problem in parametric statistics and machine learning. Given a set of samples, tuples of covariates and labels, the task of mixed linear regression is to find a small list of linear relationships that best fit the samples. Usually it is assumed that the label is generated stochastically by randomly selecting one of two or more linear functions, applying th… ▽ More Mixed linear regression is a well-studied problem in parametric statistics and machine learning. Given a set of samples, tuples of covariates and labels, the task of mixed linear regression is to find a small list of linear relationships that best fit the samples. Usually it is assumed that the label is generated stochastically by randomly selecting one of two or more linear functions, applying this chosen function to the covariates, and potentially introducing noise to the result. In that situation, the objective is to estimate the ground-truth linear functions up to some parameter error. The popular expectation maximization (EM) and alternating minimization (AM) algorithms have been previously analyzed for this. In this paper, we consider the more general problem of agnostic learning of mixed linear regression from samples, without such generative models. In particular, we show that the AM and EM algorithms, under standard conditions of separability and good initialization, lead to agnostic learning in mixed linear regression by converging to the population loss minimizers, for suitably defined loss functions. In some sense, this shows the strength of AM and EM algorithms that converges to ``optimal solutions'' even in the absence of realizable generative models. △ Less

Submitted 3 June, 2024; originally announced June 2024.

Comments: To appear in ICML 2024

arXiv:2406.00808 [pdf, other]

EchoNet-Synthetic: Privacy-preserving Video Generation for Safe Medical Data Sharing

Authors: Hadrien Reynaud, Qingjie Meng, Mischa Dombrowski, Arijit Ghosh, Thomas Day, Alberto Gomez, Paul Leeson, Bernhard Kainz

Abstract: To make medical datasets accessible without sharing sensitive patient information, we introduce a novel end-to-end approach for generative de-identification of dynamic medical imaging data. Until now, generative methods have faced constraints in terms of fidelity, spatio-temporal coherence, and the length of generation, failing to capture the complete details of dataset distributions. We present a… ▽ More To make medical datasets accessible without sharing sensitive patient information, we introduce a novel end-to-end approach for generative de-identification of dynamic medical imaging data. Until now, generative methods have faced constraints in terms of fidelity, spatio-temporal coherence, and the length of generation, failing to capture the complete details of dataset distributions. We present a model designed to produce high-fidelity, long and complete data samples with near-real-time efficiency and explore our approach on a challenging task: generating echocardiogram videos. We develop our generation method based on diffusion models and introduce a protocol for medical video dataset anonymization. As an exemplar, we present EchoNet-Synthetic, a fully synthetic, privacy-compliant echocardiogram dataset with paired ejection fraction labels. As part of our de-identification protocol, we evaluate the quality of the generated dataset and propose to use clinical downstream tasks as a measurement on top of widely used but potentially biased image quality metrics. Experimental outcomes demonstrate that EchoNet-Synthetic achieves comparable dataset fidelity to the actual dataset, effectively supporting the ejection fraction regression task. Code, weights and dataset are available at https://github.com/HReynaud/EchoNet-Synthetic. △ Less

Submitted 2 June, 2024; originally announced June 2024.

Comments: Accepted at MICCAI 2024

arXiv:2405.20933 [pdf, ps, other]

Concentration Bounds for Optimized Certainty Equivalent Risk Estimation

Authors: Ayon Ghosh, L. A. Prashanth, Krishna Jagannathan

Abstract: We consider the problem of estimating the Optimized Certainty Equivalent (OCE) risk from independent and identically distributed (i.i.d.) samples. For the classic sample average approximation (SAA) of OCE, we derive mean-squared error as well as concentration bounds (assuming sub-Gaussianity). Further, we analyze an efficient stochastic approximation-based OCE estimator, and derive finite sample b… ▽ More We consider the problem of estimating the Optimized Certainty Equivalent (OCE) risk from independent and identically distributed (i.i.d.) samples. For the classic sample average approximation (SAA) of OCE, we derive mean-squared error as well as concentration bounds (assuming sub-Gaussianity). Further, we analyze an efficient stochastic approximation-based OCE estimator, and derive finite sample bounds for the same. To show the applicability of our bounds, we consider a risk-aware bandit problem, with OCE as the risk. For this problem, we derive bound on the probability of mis-identification. Finally, we conduct numerical experiments to validate the theoretical findings. △ Less

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.10167 [pdf, ps, other]

Near Uniform Triangle Sampling Over Adjacency List Graph Streams

Authors: Arijit Bishnu, Arijit Ghosh, Gopinath Mishra, Sayantan Sen

Abstract: Triangle counting and sampling are two fundamental problems for streaming algorithms. Arguably, designing sampling algorithms is more challenging than their counting variants. It may be noted that triangle counting has received far greater attention in the literature than the sampling variant. In this work, we consider the problem of approximately sampling triangles in different models of streamin… ▽ More Triangle counting and sampling are two fundamental problems for streaming algorithms. Arguably, designing sampling algorithms is more challenging than their counting variants. It may be noted that triangle counting has received far greater attention in the literature than the sampling variant. In this work, we consider the problem of approximately sampling triangles in different models of streaming with the focus being on the adjacency list model. In this problem, the edges of a graph $G$ will arrive over a data stream. The goal is to design efficient streaming algorithms that can sample and output a triangle from a distribution, over the triangles in $G$, that is close to the uniform distribution over the triangles in $G$. The distance between distributions is measured in terms of $\ell_1$-distance. The main technical contribution of this paper is to design algorithms for this triangle sampling problem in the adjacency list model with the space complexities matching their counting variants. For the sake of completeness, we also show results on the vertex and edge arrival models. △ Less

Submitted 16 May, 2024; originally announced May 2024.

Comments: 26 pages

arXiv:2405.09589 [pdf, other]

Unveiling Hallucination in Text, Image, Video, and Audio Foundation Models: A Comprehensive Survey

Authors: Pranab Sahoo, Prabhash Meharia, Akash Ghosh, Sriparna Saha, Vinija Jain, Aman Chadha

Abstract: The rapid advancement of foundation models (FMs) across language, image, audio, and video domains has shown remarkable capabilities in diverse tasks. However, the proliferation of FMs brings forth a critical challenge: the potential to generate hallucinated outputs, particularly in high-stakes applications. The tendency of foundation models to produce hallucinated content arguably represents the b… ▽ More The rapid advancement of foundation models (FMs) across language, image, audio, and video domains has shown remarkable capabilities in diverse tasks. However, the proliferation of FMs brings forth a critical challenge: the potential to generate hallucinated outputs, particularly in high-stakes applications. The tendency of foundation models to produce hallucinated content arguably represents the biggest hindrance to their widespread adoption in real-world scenarios, especially in domains where reliability and accuracy are paramount. This survey paper presents a comprehensive overview of recent developments that aim to identify and mitigate the problem of hallucination in FMs, spanning text, image, video, and audio modalities. By synthesizing recent advancements in detecting and mitigating hallucination across various modalities, the paper aims to provide valuable insights for researchers, developers, and practitioners. Essentially, it establishes a clear framework encompassing definition, taxonomy, and detection strategies for addressing hallucination in multimodal foundation models, laying the foundation for future research in this pivotal area. △ Less

Submitted 20 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.06671 [pdf, other]

Parameter-Efficient Instruction Tuning of Large Language Models For Extreme Financial Numeral Labelling

Authors: Subhendu Khatuya, Rajdeep Mukherjee, Akash Ghosh, Manjunath Hegde, Koustuv Dasgupta, Niloy Ganguly, Saptarshi Ghosh, Pawan Goyal

Abstract: We study the problem of automatically annotating relevant numerals (GAAP metrics) occurring in the financial documents with their corresponding XBRL tags. Different from prior works, we investigate the feasibility of solving this extreme classification problem using a generative paradigm through instruction tuning of Large Language Models (LLMs). To this end, we leverage metric metadata informatio… ▽ More We study the problem of automatically annotating relevant numerals (GAAP metrics) occurring in the financial documents with their corresponding XBRL tags. Different from prior works, we investigate the feasibility of solving this extreme classification problem using a generative paradigm through instruction tuning of Large Language Models (LLMs). To this end, we leverage metric metadata information to frame our target outputs while proposing a parameter efficient solution for the task using LoRA. We perform experiments on two recently released financial numeric labeling datasets. Our proposed model, FLAN-FinXC, achieves new state-of-the-art performances on both the datasets, outperforming several strong baselines. We explain the better scores of our proposed model by demonstrating its capability for zero-shot as well as the least frequently occurring tags. Also, even when we fail to predict the XBRL tags correctly, our generated output has substantial overlap with the ground-truth in majority of the cases. △ Less

Submitted 15 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

Comments: This work has been accepted to appear at North American Chapter of the Association for Computational Linguistics (NAACL), 2024

arXiv:2405.02173 [pdf, other]

Task Synthesis for Elementary Visual Programming in XLogoOnline Environment

Authors: Chao Wen, Ahana Ghosh, Jacqueline Staub, Adish Singla

Abstract: In recent years, the XLogoOnline programming platform has gained popularity among novice learners. It integrates the Logo programming language with visual programming, providing a visual interface for learning computing concepts. However, XLogoOnline offers only a limited set of tasks, which are inadequate for learners to master the computing concepts that require sufficient practice. To address t… ▽ More In recent years, the XLogoOnline programming platform has gained popularity among novice learners. It integrates the Logo programming language with visual programming, providing a visual interface for learning computing concepts. However, XLogoOnline offers only a limited set of tasks, which are inadequate for learners to master the computing concepts that require sufficient practice. To address this, we introduce XLogoSyn, a novel technique for synthesizing high-quality tasks for varying difficulty levels. Given a reference task, XLogoSyn can generate practice tasks at varying difficulty levels that cater to the varied needs and abilities of different learners. XLogoSyn achieves this by combining symbolic execution and constraint satisfaction techniques. Our expert study demonstrates the effectiveness of XLogoSyn. We have also deployed synthesized practice tasks into XLogoOnline, highlighting the educational benefits of these synthesized practice tasks. △ Less

Submitted 3 May, 2024; originally announced May 2024.

Comments: Accepted as a paper at the AIED'24 conference in the late-breaking results track

arXiv:2404.16156 [pdf, other]

Guardians of the Quantum GAN

Authors: Archisman Ghosh, Debarshi Kundu, Avimita Chatterjee, Swaroop Ghosh

Abstract: Quantum Generative Adversarial Networks (qGANs) are at the forefront of image-generating quantum machine learning models. To accommodate the growing demand for Noisy Intermediate-Scale Quantum (NISQ) devices to train and infer quantum machine learning models, the number of third-party vendors offering quantum hardware as a service is expected to rise. This expansion introduces the risk of untruste… ▽ More Quantum Generative Adversarial Networks (qGANs) are at the forefront of image-generating quantum machine learning models. To accommodate the growing demand for Noisy Intermediate-Scale Quantum (NISQ) devices to train and infer quantum machine learning models, the number of third-party vendors offering quantum hardware as a service is expected to rise. This expansion introduces the risk of untrusted vendors potentially stealing proprietary information from the quantum machine learning models. To address this concern we propose a novel watermarking technique that exploits the noise signature embedded during the training phase of qGANs as a non-invasive watermark. The watermark is identifiable in the images generated by the qGAN allowing us to trace the specific quantum hardware used during training hence providing strong proof of ownership. To further enhance the security robustness, we propose the training of qGANs on a sequence of multiple quantum hardware, embedding a complex watermark comprising the noise signatures of all the training hardware that is difficult for adversaries to replicate. We also develop a machine learning classifier to extract this watermark robustly, thereby identifying the training hardware (or the suite of hardware) from the images generated by the qGAN validating the authenticity of the model. We note that the watermark signature is robust against inferencing on hardware different than the hardware that was used for training. We obtain watermark extraction accuracy of 100% and ~90% for training the qGAN on individual and multiple quantum hardware setups (and inferencing on different hardware), respectively. Since parameter evolution during training is strongly modulated by quantum noise, the proposed watermark can be extended to other quantum machine learning models as well. △ Less

Submitted 15 May, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

Comments: 11 pages, 10 figures

arXiv:2404.14332 [pdf, other]

Full Event Particle-Level Unfolding with Variable-Length Latent Variational Diffusion

Authors: Alexander Shmakov, Kevin Greif, Michael James Fenton, Aishik Ghosh, Pierre Baldi, Daniel Whiteson

Abstract: The measurements performed by particle physics experiments must account for the imperfect response of the detectors used to observe the interactions. One approach, unfolding, statistically adjusts the experimental data for detector effects. Recently, generative machine learning models have shown promise for performing unbinned unfolding in a high number of dimensions. However, all current generati… ▽ More The measurements performed by particle physics experiments must account for the imperfect response of the detectors used to observe the interactions. One approach, unfolding, statistically adjusts the experimental data for detector effects. Recently, generative machine learning models have shown promise for performing unbinned unfolding in a high number of dimensions. However, all current generative approaches are limited to unfolding a fixed set of observables, making them unable to perform full-event unfolding in the variable dimensional environment of collider data. A novel modification to the variational latent diffusion model (VLD) approach to generative unfolding is presented, which allows for unfolding of high- and variable-dimensional feature spaces. The performance of this method is evaluated in the context of semi-leptonic top quark pair production at the Large Hadron Collider. △ Less

Submitted 22 April, 2024; originally announced April 2024.

Comments: Submission to SciPost

arXiv:2404.10875 [pdf, other]

A Dataset for Large Language Model-Driven AI Accelerator Generation

Authors: Mahmoud Nazzal, Deepak Vungarala, Mehrdad Morsali, Chao Zhang, Arnob Ghosh, Abdallah Khreishah, Shaahin Angizi

Abstract: In the ever-evolving landscape of Deep Neural Networks (DNN) hardware acceleration, unlocking the true potential of systolic array accelerators has long been hindered by the daunting challenges of expertise and time investment. Large Language Models (LLMs) offer a promising solution for automating code generation which is key to unlocking unprecedented efficiency and performance in various domains… ▽ More In the ever-evolving landscape of Deep Neural Networks (DNN) hardware acceleration, unlocking the true potential of systolic array accelerators has long been hindered by the daunting challenges of expertise and time investment. Large Language Models (LLMs) offer a promising solution for automating code generation which is key to unlocking unprecedented efficiency and performance in various domains, including hardware descriptive code. However, the successful application of LLMs to hardware accelerator design is contingent upon the availability of specialized datasets tailored for this purpose. To bridge this gap, we introduce the Systolic Array-based Accelerator DataSet (SA-DS). SA-DS comprises of a diverse collection of spatial arrays following the standardized Berkeley's Gemmini accelerator generator template, enabling design reuse, adaptation, and customization. SA-DS is intended to spark LLM-centred research on DNN hardware accelerator architecture. We envision that SA-DS provides a framework which will shape the course of DNN hardware acceleration research for generations to come. SA-DS is open-sourced under the permissive MIT license at this https://github.com/ACADLab/SA-DS. △ Less

Submitted 16 April, 2024; originally announced April 2024.

Comments: 4 pages, 4 Figures

arXiv:2404.07214 [pdf, other]

Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions

Authors: Akash Ghosh, Arkadeep Acharya, Sriparna Saha, Vinija Jain, Aman Chadha

Abstract: The advent of Large Language Models (LLMs) has significantly reshaped the trajectory of the AI revolution. Nevertheless, these LLMs exhibit a notable limitation, as they are primarily adept at processing textual information. To address this constraint, researchers have endeavored to integrate visual capabilities with LLMs, resulting in the emergence of Vision-Language Models (VLMs). These advanced… ▽ More The advent of Large Language Models (LLMs) has significantly reshaped the trajectory of the AI revolution. Nevertheless, these LLMs exhibit a notable limitation, as they are primarily adept at processing textual information. To address this constraint, researchers have endeavored to integrate visual capabilities with LLMs, resulting in the emergence of Vision-Language Models (VLMs). These advanced models are instrumental in tackling more intricate tasks such as image captioning and visual question answering. In our comprehensive survey paper, we delve into the key advancements within the realm of VLMs. Our classification organizes VLMs into three distinct categories: models dedicated to vision-language understanding, models that process multimodal inputs to generate unimodal (textual) outputs and models that both accept and produce multimodal inputs and outputs.This classification is based on their respective capabilities and functionalities in processing and generating various modalities of data.We meticulously dissect each model, offering an extensive analysis of its foundational architecture, training data sources, as well as its strengths and limitations wherever possible, providing readers with a comprehensive understanding of its essential components. We also analyzed the performance of VLMs in various benchmark datasets. By doing so, we aim to offer a nuanced understanding of the diverse landscape of VLMs. Additionally, we underscore potential avenues for future research in this dynamic domain, anticipating further breakthroughs and advancements. △ Less

Submitted 12 April, 2024; v1 submitted 20 February, 2024; originally announced April 2024.

Comments: The most extensive and up to date Survey on Visual Language Models covering 76 Visual Language Models

arXiv:2404.04224 [pdf, other]

Active Causal Learning for Decoding Chemical Complexities with Targeted Interventions

Authors: Zachary R. Fox, Ayana Ghosh

Abstract: Predicting and enhancing inherent properties based on molecular structures is paramount to design tasks in medicine, materials science, and environmental management. Most of the current machine learning and deep learning approaches have become standard for predictions, but they face challenges when applied across different datasets due to reliance on correlations between molecular representation a… ▽ More Predicting and enhancing inherent properties based on molecular structures is paramount to design tasks in medicine, materials science, and environmental management. Most of the current machine learning and deep learning approaches have become standard for predictions, but they face challenges when applied across different datasets due to reliance on correlations between molecular representation and target properties. These approaches typically depend on large datasets to capture the diversity within the chemical space, facilitating a more accurate approximation, interpolation, or extrapolation of the chemical behavior of molecules. In our research, we introduce an active learning approach that discerns underlying cause-effect relationships through strategic sampling with the use of a graph loss function. This method identifies the smallest subset of the dataset capable of encoding the most information representative of a much larger chemical space. The identified causal relations are then leveraged to conduct systematic interventions, optimizing the design task within a chemical space that the models have not encountered previously. While our implementation focused on the QM9 quantum-chemical dataset for a specific design task-finding molecules with a large dipole moment-our active causal learning approach, driven by intelligent sampling and interventions, holds potential for broader applications in molecular, materials design and discovery. △ Less

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.04125 [pdf, other]

No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance

Authors: Vishaal Udandarao, Ameya Prabhu, Adhiraj Ghosh, Yash Sharma, Philip H. S. Torr, Adel Bibi, Samuel Albanie, Matthias Bethge

Abstract: Web-crawled pretraining datasets underlie the impressive "zero-shot" evaluation performance of multimodal models, such as CLIP for classification/retrieval and Stable-Diffusion for image generation. However, it is unclear how meaningful the notion of "zero-shot" generalization is for such multimodal models, as it is not known to what extent their pretraining datasets encompass the downstream conce… ▽ More Web-crawled pretraining datasets underlie the impressive "zero-shot" evaluation performance of multimodal models, such as CLIP for classification/retrieval and Stable-Diffusion for image generation. However, it is unclear how meaningful the notion of "zero-shot" generalization is for such multimodal models, as it is not known to what extent their pretraining datasets encompass the downstream concepts targeted for during "zero-shot" evaluation. In this work, we ask: How is the performance of multimodal models on downstream concepts influenced by the frequency of these concepts in their pretraining datasets? We comprehensively investigate this question across 34 models and five standard pretraining datasets (CC-3M, CC-12M, YFCC-15M, LAION-400M, LAION-Aesthetics), generating over 300GB of data artifacts. We consistently find that, far from exhibiting "zero-shot" generalization, multimodal models require exponentially more data to achieve linear improvements in downstream "zero-shot" performance, following a sample inefficient log-linear scaling trend. This trend persists even when controlling for sample-level similarity between pretraining and downstream datasets, and testing on purely synthetic data distributions. Furthermore, upon benchmarking models on long-tailed data sampled based on our analysis, we demonstrate that multimodal models across the board perform poorly. We contribute this long-tail test set as the "Let it Wag!" benchmark to further research in this direction. Taken together, our study reveals an exponential need for training data which implies that the key to "zero-shot" generalization capabilities under large-scale training paradigms remains to be found. △ Less

Submitted 8 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

Comments: Extended version of the short paper accepted at DPFM, ICLR'24

arXiv:2404.00401 [pdf, other]

How Robust are the Tabular QA Models for Scientific Tables? A Study using Customized Dataset

Authors: Akash Ghosh, B Venkata Sahith, Niloy Ganguly, Pawan Goyal, Mayank Singh

Abstract: Question-answering (QA) on hybrid scientific tabular and textual data deals with scientific information, and relies on complex numerical reasoning. In recent years, while tabular QA has seen rapid progress, understanding their robustness on scientific information is lacking due to absence of any benchmark dataset. To investigate the robustness of the existing state-of-the-art QA models on scientif… ▽ More Question-answering (QA) on hybrid scientific tabular and textual data deals with scientific information, and relies on complex numerical reasoning. In recent years, while tabular QA has seen rapid progress, understanding their robustness on scientific information is lacking due to absence of any benchmark dataset. To investigate the robustness of the existing state-of-the-art QA models on scientific hybrid tabular data, we propose a new dataset, "SciTabQA", consisting of 822 question-answer pairs from scientific tables and their descriptions. With the help of this dataset, we assess the state-of-the-art Tabular QA models based on their ability (i) to use heterogeneous information requiring both structured data (table) and unstructured data (text) and (ii) to perform complex scientific reasoning tasks. In essence, we check the capability of the models to interpret scientific tables and text. Our experiments show that "SciTabQA" is an innovative dataset to study question-answering over scientific heterogeneous data. We benchmark three state-of-the-art Tabular QA models, and find that the best F1 score is only 0.462. △ Less

Submitted 30 March, 2024; originally announced April 2024.

arXiv:2403.12712 [pdf, other]

Addressing Source Scale Bias via Image War** for Domain Adaptation

Authors: Shen Zheng, Anurag Ghosh, Srinivasa G. Narasimhan

Abstract: In visual recognition, scale bias is a key challenge due to the imbalance of object and image size distribution inherent in real scene datasets. Conventional solutions involve injecting scale invariance priors, oversampling the dataset at different scales during training, or adjusting scale at inference. While these strategies mitigate scale bias to some extent, their ability to adapt across diver… ▽ More In visual recognition, scale bias is a key challenge due to the imbalance of object and image size distribution inherent in real scene datasets. Conventional solutions involve injecting scale invariance priors, oversampling the dataset at different scales during training, or adjusting scale at inference. While these strategies mitigate scale bias to some extent, their ability to adapt across diverse datasets is limited. Besides, they increase computational load during training and latency during inference. In this work, we use adaptive attentional processing -- oversampling salient object regions by war** images in-place during training. Discovering that shifting the source scale distribution improves backbone features, we developed a instance-level war** guidance aimed at object region sampling to mitigate source scale bias in domain adaptation. Our approach improves adaptation across geographies, lighting and weather conditions, is agnostic to the task, domain adaptation algorithm, saliency guidance, and underlying model architecture. Highlights include +6.1 mAP50 for BDD100K Clear $\rightarrow$ DENSE Foggy, +3.7 mAP50 for BDD100K Day $\rightarrow$ Night, +3.0 mAP50 for BDD100K Clear $\rightarrow$ Rainy, and +6.3 mIoU for Cityscapes $\rightarrow$ ACDC. Our approach adds minimal memory during training and has no additional latency at inference time. Please see Appendix for more results and analysis. △ Less

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.12227 [pdf, other]

Analyzing-Evaluating-Creating: Assessing Computational Thinking and Problem Solving in Visual Programming Domains

Authors: Ahana Ghosh, Liina Malva, Adish Singla

Abstract: Computational thinking (CT) and problem-solving skills are increasingly integrated into K-8 school curricula worldwide. Consequently, there is a growing need to develop reliable assessments for measuring students' proficiency in these skills. Recent works have proposed tests for assessing these skills across various CT concepts and practices, in particular, based on multi-choice items enabling psy… ▽ More Computational thinking (CT) and problem-solving skills are increasingly integrated into K-8 school curricula worldwide. Consequently, there is a growing need to develop reliable assessments for measuring students' proficiency in these skills. Recent works have proposed tests for assessing these skills across various CT concepts and practices, in particular, based on multi-choice items enabling psychometric validation and usage in large-scale studies. Despite their practical relevance, these tests are limited in how they measure students' computational creativity, a crucial ability when applying CT and problem solving in real-world settings. In our work, we have developed ACE, a novel test focusing on the three higher cognitive levels in Bloom's Taxonomy, i.e., Analyze, Evaluate, and Create. ACE comprises a diverse set of 7x3 multi-choice items spanning these three levels, grounded in elementary block-based visual programming. We evaluate the psychometric properties of ACE through a study conducted with 371 students in grades 3-7 from 10 schools. Based on several psychometric analysis frameworks, our results confirm the reliability and validity of ACE. Our study also shows a positive correlation between students' performance on ACE and performance on Hour of Code: Maze Challenge by Code.org. △ Less

Submitted 18 March, 2024; originally announced March 2024.

Comments: This extended version of the SIGCSE 2024 paper includes all 21 test items from ACE along with their answers in the appendix

arXiv:2403.06890 [pdf, other]

Application of Quantum Tensor Networks for Protein Classification

Authors: Debarshi Kundu, Archisman Ghosh, Srinivasan Ekambaram, Jian Wang, Nikolay Dokholyan, Swaroop Ghosh

Abstract: We show that protein sequences can be thought of as sentences in natural language processing and can be parsed using the existing Quantum Natural Language framework into parameterized quantum circuits of reasonable qubits, which can be trained to solve various protein-related machine-learning problems. We classify proteins based on their subcellular locations, a pivotal task in bioinformatics that… ▽ More We show that protein sequences can be thought of as sentences in natural language processing and can be parsed using the existing Quantum Natural Language framework into parameterized quantum circuits of reasonable qubits, which can be trained to solve various protein-related machine-learning problems. We classify proteins based on their subcellular locations, a pivotal task in bioinformatics that is key to understanding biological processes and disease mechanisms. Leveraging the quantum-enhanced processing capabilities, we demonstrate that Quantum Tensor Networks (QTN) can effectively handle the complexity and diversity of protein sequences. We present a detailed methodology that adapts QTN architectures to the nuanced requirements of protein data, supported by comprehensive experimental results. We demonstrate two distinct QTNs, inspired by classical recurrent neural networks (RNN) and convolutional neural networks (CNN), to solve the binary classification task mentioned above. Our top-performing quantum model has achieved a 94% accuracy rate, which is comparable to the performance of a classical model that uses the ESM2 protein language model embeddings. It's noteworthy that the ESM2 model is extremely large, containing 8 million parameters in its smallest configuration, whereas our best quantum model requires only around 800 parameters. We demonstrate that these hybrid models exhibit promising performance, showcasing their potential to compete with classical models of similar complexity. △ Less

Submitted 11 March, 2024; originally announced March 2024.

Comments: 7 pages, 8 figures

arXiv:2403.01234 [pdf]

Active Deep Kernel Learning of Molecular Functionalities: Realizing Dynamic Structural Embeddings

Authors: Ayana Ghosh, Maxim Ziatdinov and, Sergei V. Kalinin

Abstract: Exploring molecular spaces is crucial for advancing our understanding of chemical properties and reactions, leading to groundbreaking innovations in materials science, medicine, and energy. This paper explores an approach for active learning in molecular discovery using Deep Kernel Learning (DKL), a novel approach surpassing the limits of classical Variational Autoencoders (VAEs). Employing the QM… ▽ More Exploring molecular spaces is crucial for advancing our understanding of chemical properties and reactions, leading to groundbreaking innovations in materials science, medicine, and energy. This paper explores an approach for active learning in molecular discovery using Deep Kernel Learning (DKL), a novel approach surpassing the limits of classical Variational Autoencoders (VAEs). Employing the QM9 dataset, we contrast DKL with traditional VAEs, which analyze molecular structures based on similarity, revealing limitations due to sparse regularities in latent spaces. DKL, however, offers a more holistic perspective by correlating structure with properties, creating latent spaces that prioritize molecular functionality. This is achieved by recalculating embedding vectors iteratively, aligning with the experimental availability of target properties. The resulting latent spaces are not only better organized but also exhibit unique characteristics such as concentrated maxima representing molecular functionalities and a correlation between predictive uncertainty and error. Additionally, the formation of exclusion regions around certain compounds indicates unexplored areas with potential for groundbreaking functionalities. This study underscores DKL's potential in molecular research, offering new avenues for understanding and discovering molecular functionalities beyond classical VAE limitations. △ Less

Submitted 2 March, 2024; originally announced March 2024.

arXiv:2402.17604 [pdf, ps, other]

Equivariant ideals of polynomials

Authors: Arka Ghosh, Sławomir Lasota

Abstract: We study existence and computability of finite bases for ideals of polynomials over infinitely many variables. In our setting, variables come from a countable logical structure A, and embeddings from A to A act on polynomials by renaming variables. First, we give a sufficient and necessary condition for A to guarantee the following generalisation of Hilbert's Basis Theorem: every polynomial ideal… ▽ More We study existence and computability of finite bases for ideals of polynomials over infinitely many variables. In our setting, variables come from a countable logical structure A, and embeddings from A to A act on polynomials by renaming variables. First, we give a sufficient and necessary condition for A to guarantee the following generalisation of Hilbert's Basis Theorem: every polynomial ideal which is equivariant, i.e. invariant under renaming of variables, is finitely generated. Second, we develop an extension of classical Buchberger's algorithm to compute a Gröbner basis of a given equivariant ideal. This implies decidability of the membership problem for equivariant ideals. Finally, we sketch upon various applications of these results to register automata, Petri nets with data, orbit-finitely generated vector spaces, and orbit-finite systems of linear equations. △ Less

Submitted 22 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

ACM Class: F.1.1; F.4.1

arXiv:2402.10012 [pdf, other]

Countably Colorful Hyperplane Transversal

Authors: Sutanoya Chakraborty, Arijit Ghosh, Soumi Nandi

Abstract: Let $\left\{ \mathcal{F}_{n}\right\}_{n \in \mathbb{N}}$ be an infinite sequence of families of compact connected sets in $\mathbb{R}^{d}$. An infinite sequence of compact connected sets $\left\{ B_{n} \right\}_{n\in \mathbb{N}}$ is called heterochromatic sequence from $\left\{ \mathcal{F}_{n}\right\}_{n \in \mathbb{N}}$ if there exists an infinite sequence… ▽ More Let $\left\{ \mathcal{F}_{n}\right\}_{n \in \mathbb{N}}$ be an infinite sequence of families of compact connected sets in $\mathbb{R}^{d}$. An infinite sequence of compact connected sets $\left\{ B_{n} \right\}_{n\in \mathbb{N}}$ is called heterochromatic sequence from $\left\{ \mathcal{F}_{n}\right\}_{n \in \mathbb{N}}$ if there exists an infinite sequence $\left\{ i_{n} \right\}_{n\in \mathbb{N}}$ of natural numbers satisfying the following two properties: (a) $\{i_{n}\}_{n\in \mathbb{N}}$ is a monotonically increasing sequence, and (b) for all $n \in \mathbb{N}$, we have $B_{n} \in \mathcal{F}_{i_n}$. We show that if every heterochromatic sequence from $\left\{ \mathcal{F}_{n}\right\}_{n \in \mathbb{N}}$ contains $d+1$ sets that can be pierced by a single hyperplane then there exists a finite collection $\mathcal{H}$ of hyperplanes from $\mathbb{R}^{d}$ that pierces all but finitely many families from $\left\{ \mathcal{F}_{n}\right\}_{n \in \mathbb{N}}$. As a direct consequence of our result, we get that if every countable subcollection from an infinite family $\mathcal{F}$ of compact connected sets in $\mathbb{R}^{d}$ contains $d+1$ sets that can be pierced by a single hyperplane then $\mathcal{F}$ can be pierced by finitely many hyperplanes. To establish the optimality of our result we show that, for all $d \in \mathbb{N}$, there exists an infinite sequence $\left\{ \mathcal{F}_{n}\right\}_{n \in \mathbb{N}}$ of families of compact connected sets satisfying the following two conditions: (1) for all $n \in \mathbb{N}$, $\mathcal{F}_{n}$ is not pierceable by finitely many hyperplanes, and (2) for any $m \in \mathbb{N}$ and every sequence $\left\{B_n\right\}_{n=m}^{\infty}$ of compact connected sets in $\mathbb{R}^d$, where $B_i\in\mathcal{F}_i$ for all $i \geq m$, there exists a hyperplane in $\mathbb{R}^d$ that pierces at least $d+1$ sets in the sequence. △ Less

Submitted 15 February, 2024; originally announced February 2024.

Comments: 23 pages, 7 figures

MSC Class: 52A35

arXiv:2402.09030 [pdf, other]

Awareness in robotics: An early perspective from the viewpoint of the EIC Pathfinder Challenge "Awareness Inside''

Authors: Cosimo Della Santina, Carlos Hernandez Corbato, Burak Sisman, Luis A. Leiva, Ioannis Arapakis, Michalis Vakalellis, Jean Vanderdonckt, Luis Fernando D'Haro, Guido Manzi, Cristina Becchio, Aïda Elamrani, Mohsen Alirezaei, Ginevra Castellano, Dimos V. Dimarogonas, Arabinda Ghosh, Sofie Haesaert, Sadegh Soudjani, Sybert Stroeve, Paul Verschure, Davide Bacciu, Ophelia Deroy, Bahador Bahrami, Claudio Gallicchio, Sabine Hauert, Ricardo Sanz , et al. (6 additional authors not shown)

Abstract: Consciousness has been historically a heavily debated topic in engineering, science, and philosophy. On the contrary, awareness had less success in raising the interest of scholars in the past. However, things are changing as more and more researchers are getting interested in answering questions concerning what awareness is and how it can be artificially generated. The landscape is rapidly evolvi… ▽ More Consciousness has been historically a heavily debated topic in engineering, science, and philosophy. On the contrary, awareness had less success in raising the interest of scholars in the past. However, things are changing as more and more researchers are getting interested in answering questions concerning what awareness is and how it can be artificially generated. The landscape is rapidly evolving, with multiple voices and interpretations of the concept being conceived and techniques being developed. The goal of this paper is to summarize and discuss the ones among these voices connected with projects funded by the EIC Pathfinder Challenge called ``Awareness Inside'', a nonrecurring call for proposals within Horizon Europe designed specifically for fostering research on natural and synthetic awareness. In this perspective, we dedicate special attention to challenges and promises of applying synthetic awareness in robotics, as the development of mature techniques in this new field is expected to have a special impact on generating more capable and trustworthy embodied systems. △ Less

Submitted 14 February, 2024; originally announced February 2024.

arXiv:2402.08541 [pdf, other]

Continuous-Time Best-Response and Related Dynamics in Tullock Contests with Convex Costs

Authors: Edith Elkind, Abheek Ghosh, Paul W. Goldberg

Abstract: Tullock contests model real-life scenarios that range from competition among proof-of-work blockchain miners to rent-seeking and lobbying activities. We show that continuous-time best-response dynamics in Tullock contests with convex costs converges to the unique equilibrium using Lyapunov-style arguments. We then use this result to provide an algorithm for computing an approximate equilibrium. We… ▽ More Tullock contests model real-life scenarios that range from competition among proof-of-work blockchain miners to rent-seeking and lobbying activities. We show that continuous-time best-response dynamics in Tullock contests with convex costs converges to the unique equilibrium using Lyapunov-style arguments. We then use this result to provide an algorithm for computing an approximate equilibrium. We also establish convergence of related discrete-time dynamics, e.g., when the agents best-respond to the empirical average action of other agents. These results indicate that the equilibrium is a reliable predictor of the agents' behavior in these games. △ Less

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2402.07039 [pdf, other]

Coordinated Disclosure for AI: Beyond Security Vulnerabilities

Authors: Sven Cattell, Avijit Ghosh, Lucie-Aimée Kaffee

Abstract: Harm reporting in the field of Artificial Intelligence (AI) currently operates on an ad hoc basis, lacking a structured process for disclosing or addressing algorithmic flaws. In contrast, the Coordinated Vulnerability Disclosure (CVD) ethos and ecosystem play a pivotal role in software security and transparency. Globally, there are ongoing efforts to establish frameworks that promote transparency… ▽ More Harm reporting in the field of Artificial Intelligence (AI) currently operates on an ad hoc basis, lacking a structured process for disclosing or addressing algorithmic flaws. In contrast, the Coordinated Vulnerability Disclosure (CVD) ethos and ecosystem play a pivotal role in software security and transparency. Globally, there are ongoing efforts to establish frameworks that promote transparency and collaboration in addressing AI-related issues, though challenges persist. Algorithmic flaws in machine learning (ML) models present distinct challenges compared to traditional software vulnerabilities, warranting a specialized approach. To address this gap, we propose the implementation of a dedicated Coordinated Flaw Disclosure (CFD) framework tailored to the intricacies of machine learning and artificial intelligence issues. This paper delves into the historical landscape of disclosures in ML, encompassing the ad hoc reporting of harms and the emergence of participatory auditing. By juxtaposing these practices with the well-established disclosure norms in cybersecurity, we argue that the broader adoption of CFD has the potential to enhance public trust through transparent processes that carefully balance the interests of both organizations and the community. △ Less

Submitted 24 May, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

arXiv:2402.05592 [pdf, other]

MERP: Metaverse Extended Realtiy Portal

Authors: Anisha Ghosh, Aditya Mitra, Anik Saha, Sibi Chakkaravarthy Sethuraman, Anitha Subramanian

Abstract: A standardized control system called Metaverse Extended Reality Portal (MERP) is presented as a solution to the issues with conventional VR eyewear. The MERP system improves user awareness of the physical world while offering an immersive 3D view of the metaverse by using a shouldermounted projector to display a Heads-Up Display (HUD) in a designated Metaverse Experience Room. To provide natural a… ▽ More A standardized control system called Metaverse Extended Reality Portal (MERP) is presented as a solution to the issues with conventional VR eyewear. The MERP system improves user awareness of the physical world while offering an immersive 3D view of the metaverse by using a shouldermounted projector to display a Heads-Up Display (HUD) in a designated Metaverse Experience Room. To provide natural and secure interaction inside the metaverse, a compass module and gyroscope integration enable accurate map** of real-world motions to avatar actions. Through user tests and research, the MERP system shows that it may reduce mishaps brought on by poor spatial awareness, offering an improved metaverse experience and laying the groundwork for future developments in virtual reality technology. MERP, which is compared with existing Virtual Reality (VR) glasses used to traverse the metaverse, is projected to become a seamless, novel and better alternative. Existing VR headsets and AR glasses have well-known drawbacks that making them ineffective for prolonged usage as it causes harm to the eyes. △ Less

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.01033 [pdf, other]

End-to-End Deep Learning for TDD MIMO Systems in the 6G Upper Midbands

Authors: Juseong Park, Foad Sohrabi, Amitava Ghosh, Jeffrey G. Andrews

Abstract: This paper proposes and analyzes novel deep learning methods for downlink (DL) single-user multiple-input multiple-output (SU-MIMO) and multi-user MIMO (MU-MIMO) systems operating in time division duplex (TDD) mode. A motivating application is the 6G upper midbands (7-24 GHz), where the base station (BS) antenna arrays are large, user equipment (UE) array sizes are moderate, and theoretically opti… ▽ More This paper proposes and analyzes novel deep learning methods for downlink (DL) single-user multiple-input multiple-output (SU-MIMO) and multi-user MIMO (MU-MIMO) systems operating in time division duplex (TDD) mode. A motivating application is the 6G upper midbands (7-24 GHz), where the base station (BS) antenna arrays are large, user equipment (UE) array sizes are moderate, and theoretically optimal approaches are practically infeasible for several reasons. To deal with uplink (UL) pilot overhead and low signal power issues, we introduce the channel-adaptive pilot, as part of an analog channel state information feedback mechanism. Deep neural network (DNN)-generated pilots are used to linearly transform the UL channel matrix into lower-dimensional latent vectors. Meanwhile, the BS employs a second DNN that processes the received UL pilots to directly generate near-optimal DL precoders. The training is end-to-end which exploits synergies between the two DNNs. For MU-MIMO precoding, we propose a DNN structure inspired by theoretically optimum linear precoding. The proposed methods are evaluated against genie-aided upper bounds and conventional approaches, using realistic upper midband datasets. Numerical results demonstrate the potential of our approach to achieve significantly increased sum-rate, particularly at moderate to high signal-to-noise ratio (SNR) and when UL pilot overhead is constrained. △ Less

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2401.17594 [pdf, other]

5G NR Positioning Enhancements in 3GPP Release-18

Authors: Hyun-Su Cha, Gilsoo Lee, Amitava Ghosh, Matthew Baker, Sean Kelley, Juergen Hofmann

Abstract: New radio (NR) positioning in the Third Generation Partnership Project (3GPP) Release 18 (Rel-18) enables 5G-advanced networks to achieve ultra-high accuracy positioning without dependence on global navigation satellite systems (GNSS) with key enablers such as the carrier phase positioning technique, standardized for the first time in a cellular communications standard and setting a new baseline f… ▽ More New radio (NR) positioning in the Third Generation Partnership Project (3GPP) Release 18 (Rel-18) enables 5G-advanced networks to achieve ultra-high accuracy positioning without dependence on global navigation satellite systems (GNSS) with key enablers such as the carrier phase positioning technique, standardized for the first time in a cellular communications standard and setting a new baseline for future generations. In addition, Rel-18 NR supports positioning functionalities for reduced capability (RedCap) user equipment and bandwidth aggregation for positioning measurements. Moreover, the low power solutions are designed for low power high accuracy positioning use cases. Lastly, sidelink-based positioning is introduced in Rel-18. This article constitutes a comprehensive treatment of the Rel-18 NR positioning enhancements crucial for the development of next-generation networks. △ Less

Submitted 30 January, 2024; originally announced January 2024.

arXiv:2401.09839 [pdf, other]

doi 10.1016/j.commatsci.2023.112659

MatSciRE: Leveraging Pointer Networks to Automate Entity and Relation Extraction for Material Science Knowledge-base Construction

Authors: Ankan Mullick, Akash Ghosh, G Sai Chaitanya, Samir Ghui, Tapas Nayak, Seung-Cheol Lee, Satadeep Bhattacharjee, Pawan Goyal

Abstract: Material science literature is a rich source of factual information about various categories of entities (like materials and compositions) and various relations between these entities, such as conductivity, voltage, etc. Automatically extracting this information to generate a material science knowledge base is a challenging task. In this paper, we propose MatSciRE (Material Science Relation Extrac… ▽ More Material science literature is a rich source of factual information about various categories of entities (like materials and compositions) and various relations between these entities, such as conductivity, voltage, etc. Automatically extracting this information to generate a material science knowledge base is a challenging task. In this paper, we propose MatSciRE (Material Science Relation Extractor), a Pointer Network-based encoder-decoder framework, to jointly extract entities and relations from material science articles as a triplet ($entity1, relation, entity2$). Specifically, we target the battery materials and identify five relations to work on - conductivity, coulombic efficiency, capacity, voltage, and energy. Our proposed approach achieved a much better F1-score (0.771) than a previous attempt using ChemDataExtractor (0.716). The overall graphical framework of MatSciRE is shown in Fig 1. The material information is extracted from material science literature in the form of entity-relation triplets using MatSciRE. △ Less

Submitted 18 January, 2024; originally announced January 2024.

Journal ref: Computational Material Science 2023 (Elsevier)

arXiv:2401.01596 [pdf, other]

MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English Clinical Queries

Authors: Akash Ghosh, Arkadeep Acharya, Prince Jha, Aniket Gaudgaul, Rajdeep Majumdar, Sriparna Saha, Aman Chadha, Raghav Jain, Setu Sinha, Shivani Agarwal

Abstract: In the healthcare domain, summarizing medical questions posed by patients is critical for improving doctor-patient interactions and medical decision-making. Although medical data has grown in complexity and quantity, the current body of research in this domain has primarily concentrated on text-based methods, overlooking the integration of visual cues. Also prior works in the area of medical quest… ▽ More In the healthcare domain, summarizing medical questions posed by patients is critical for improving doctor-patient interactions and medical decision-making. Although medical data has grown in complexity and quantity, the current body of research in this domain has primarily concentrated on text-based methods, overlooking the integration of visual cues. Also prior works in the area of medical question summarisation have been limited to the English language. This work introduces the task of multimodal medical question summarization for codemixed input in a low-resource setting. To address this gap, we introduce the Multimodal Medical Codemixed Question Summarization MMCQS dataset, which combines Hindi-English codemixed medical queries with visual aids. This integration enriches the representation of a patient's medical condition, providing a more comprehensive perspective. We also propose a framework named MedSumm that leverages the power of LLMs and VLMs for this task. By utilizing our MMCQS dataset, we demonstrate the value of integrating visual information from images to improve the creation of medically detailed summaries. This multimodal strategy not only improves healthcare decision-making but also promotes a deeper comprehension of patient queries, paving the way for future exploration in personalized and responsive medical care. Our dataset, code, and pre-trained models will be made publicly available. △ Less

Submitted 3 January, 2024; originally announced January 2024.

Comments: ECIR 2024

arXiv:2401.00629 [pdf, other]

Adversarially Trained Actor Critic for offline CMDPs

Authors: Honghao Wei, Xiyue Peng, Xin Liu, Arnob Ghosh

Abstract: We propose a Safe Adversarial Trained Actor Critic (SATAC) algorithm for offline reinforcement learning (RL) with general function approximation in the presence of limited data coverage. SATAC operates as a two-player Stackelberg game featuring a refined objective function. The actor (leader player) optimizes the policy against two adversarially trained value critics (follower players), who focus… ▽ More We propose a Safe Adversarial Trained Actor Critic (SATAC) algorithm for offline reinforcement learning (RL) with general function approximation in the presence of limited data coverage. SATAC operates as a two-player Stackelberg game featuring a refined objective function. The actor (leader player) optimizes the policy against two adversarially trained value critics (follower players), who focus on scenarios where the actor's performance is inferior to the behavior policy. Our framework provides both theoretical guarantees and a robust deep-RL implementation. Theoretically, we demonstrate that when the actor employs a no-regret optimization oracle, SATAC achieves two guarantees: (i) For the first time in the offline RL setting, we establish that SATAC can produce a policy that outperforms the behavior policy while maintaining the same level of safety, which is critical to designing an algorithm for offline RL. (ii) We demonstrate that the algorithm guarantees policy improvement across a broad range of hyperparameters, indicating its practical robustness. Additionally, we offer a practical version of SATAC and compare it with existing state-of-the-art offline safe-RL algorithms in continuous control environments. SATAC outperforms all baselines across a range of tasks, thus validating the theoretical performance. △ Less

Submitted 31 December, 2023; originally announced January 2024.

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI. △ Less

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.11541 [pdf, other]

CLIPSyntel: CLIP and LLM Synergy for Multimodal Question Summarization in Healthcare

Authors: Akash Ghosh, Arkadeep Acharya, Raghav Jain, Sriparna Saha, Aman Chadha, Setu Sinha

Abstract: In the era of modern healthcare, swiftly generating medical question summaries is crucial for informed and timely patient care. Despite the increasing complexity and volume of medical data, existing studies have focused solely on text-based summarization, neglecting the integration of visual information. Recognizing the untapped potential of combining textual queries with visual representations of… ▽ More In the era of modern healthcare, swiftly generating medical question summaries is crucial for informed and timely patient care. Despite the increasing complexity and volume of medical data, existing studies have focused solely on text-based summarization, neglecting the integration of visual information. Recognizing the untapped potential of combining textual queries with visual representations of medical conditions, we introduce the Multimodal Medical Question Summarization (MMQS) Dataset. This dataset, a major contribution to our work, pairs medical queries with visual aids, facilitating a richer and more nuanced understanding of patient needs. We also propose a framework, utilizing the power of Contrastive Language Image Pretraining(CLIP) and Large Language Models(LLMs), consisting of four modules that identify medical disorders, generate relevant context, filter medical concepts, and craft visually aware summaries. Our comprehensive framework harnesses the power of CLIP, a multimodal foundation model, and various general-purpose LLMs, comprising four main modules: the medical disorder identification module, the relevant context generation module, the context filtration module for distilling relevant medical concepts and knowledge, and finally, a general-purpose LLM to generate visually aware medical question summaries. Leveraging our MMQS dataset, we showcase how visual cues from images enhance the generation of medically nuanced summaries. This multimodal approach not only enhances the decision-making process in healthcare but also fosters a more nuanced understanding of patient queries, laying the groundwork for future research in personalized and responsive medical care △ Less

Submitted 15 December, 2023; originally announced December 2023.

Comments: AAAI 2024

arXiv:2312.10725 [pdf, other]

Addressing Sample Inefficiency in Multi-View Representation Learning

Authors: Kumar Krishna Agrawal, Arna Ghosh, Adam Oberman, Blake Richards

Abstract: Non-contrastive self-supervised learning (NC-SSL) methods like BarlowTwins and VICReg have shown great promise for label-free representation learning in computer vision. Despite the apparent simplicity of these techniques, researchers must rely on several empirical heuristics to achieve competitive performance, most notably using high-dimensional projector heads and two augmentations of the same i… ▽ More Non-contrastive self-supervised learning (NC-SSL) methods like BarlowTwins and VICReg have shown great promise for label-free representation learning in computer vision. Despite the apparent simplicity of these techniques, researchers must rely on several empirical heuristics to achieve competitive performance, most notably using high-dimensional projector heads and two augmentations of the same image. In this work, we provide theoretical insights on the implicit bias of the BarlowTwins and VICReg loss that can explain these heuristics and guide the development of more principled recommendations. Our first insight is that the orthogonality of the features is more critical than projector dimensionality for learning good representations. Based on this, we empirically demonstrate that low-dimensional projector heads are sufficient with appropriate regularization, contrary to the existing heuristic. Our second theoretical insight suggests that using multiple data augmentations better represents the desiderata of the SSL objective. Based on this, we demonstrate that leveraging more augmentations per sample improves representation quality and trainability. In particular, it improves optimization convergence, leading to better features emerging earlier in the training. Remarkably, we demonstrate that we can reduce the pretraining dataset size by up to 4x while maintaining accuracy and improving convergence simply by using more data augmentations. Combining these insights, we present practical pretraining recommendations that improve wall-clock time by 2x and improve performance on CIFAR-10/STL-10 datasets using a ResNet-50 backbone. Thus, this work provides a theoretical insight into NC-SSL and produces practical recommendations for enhancing its sample and compute efficiency. △ Less

Submitted 17 December, 2023; originally announced December 2023.

arXiv:2312.02327 [pdf, other]

FLea: Improving federated learning on scarce and label-skewed data via privacy-preserving feature augmentation

Authors: Tong Xia, Abhirup Ghosh, Cecilia Mascolo

Abstract: Learning a global model by abstracting the knowledge, distributed across multiple clients, without aggregating the raw data is the primary goal of Federated Learning (FL). Typically, this works in rounds alternating between parallel local training at several clients, followed by model aggregation at a server. We found that existing FL methods under-perform when local datasets are small and present… ▽ More Learning a global model by abstracting the knowledge, distributed across multiple clients, without aggregating the raw data is the primary goal of Federated Learning (FL). Typically, this works in rounds alternating between parallel local training at several clients, followed by model aggregation at a server. We found that existing FL methods under-perform when local datasets are small and present severe label skew as these lead to over-fitting and local model bias. This is a realistic setting in many real-world applications. To address the problem, we propose \textit{FLea}, a unified framework that tackles over-fitting and local bias by encouraging clients to exchange privacy-protected features to aid local training. The features refer to activations from an intermediate layer of the model, which are obfuscated before being shared with other clients to protect sensitive information in the data. \textit{FLea} leverages a novel way of combining local and shared features as augmentations to enhance local model learning. Our extensive experiments demonstrate that \textit{FLea} outperforms the start-of-the-art FL methods, sharing only model parameters, by up to $17.6\%$, and FL methods that share data augmentations by up to $6.3\%$, while reducing the privacy vulnerability associated with shared data augmentations. △ Less

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2311.17057 [pdf, other]

ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions

Authors: Anindita Ghosh, Rishabh Dabral, Vladislav Golyanik, Christian Theobalt, Philipp Slusallek

Abstract: Current approaches for 3D human motion synthesis generate high-quality animations of digital humans performing a wide variety of actions and gestures. However, a notable technological gap exists in addressing the complex dynamics of multi-human interactions within this paradigm. In this work, we present ReMoS, a denoising diffusion-based model that synthesizes full-body reactive motion of a person… ▽ More Current approaches for 3D human motion synthesis generate high-quality animations of digital humans performing a wide variety of actions and gestures. However, a notable technological gap exists in addressing the complex dynamics of multi-human interactions within this paradigm. In this work, we present ReMoS, a denoising diffusion-based model that synthesizes full-body reactive motion of a person in a two-person interaction scenario. Assuming the motion of one person is given, we employ a combined spatio-temporal cross-attention mechanism to synthesize the reactive body and hand motion of the second person, thereby completing the interactions between the two. We demonstrate ReMoS across challenging two-person scenarios such as pair-dancing, Ninjutsu, kickboxing, and acrobatics, where one person's movements have complex and diverse influences on the other. We also contribute the ReMoCap dataset for two-person interactions containing full-body and finger motions. We evaluate ReMoS through multiple quantitative metrics, qualitative visualizations, and a user study, and also indicate usability in interactive motion editing applications. △ Less

Submitted 26 March, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

Comments: 17 pages, 7 figures, 5 tables

arXiv:2311.13177 [pdf, other]

Volumetric Reconstruction Resolves Off-Resonance Artifacts in Static and Dynamic PROPELLER MRI

Authors: Annesha Ghosh, Gordon Wetzstein, Mert Pilanci, Sara Fridovich-Keil

Abstract: Off-resonance artifacts in magnetic resonance imaging (MRI) are visual distortions that occur when the actual resonant frequencies of spins within the imaging volume differ from the expected frequencies used to encode spatial information. These discrepancies can be caused by a variety of factors, including magnetic field inhomogeneities, chemical shifts, or susceptibility differences within the ti… ▽ More Off-resonance artifacts in magnetic resonance imaging (MRI) are visual distortions that occur when the actual resonant frequencies of spins within the imaging volume differ from the expected frequencies used to encode spatial information. These discrepancies can be caused by a variety of factors, including magnetic field inhomogeneities, chemical shifts, or susceptibility differences within the tissues. Such artifacts can manifest as blurring, ghosting, or misregistration of the reconstructed image, and they often compromise its diagnostic quality. We propose to resolve these artifacts by lifting the 2D MRI reconstruction problem to 3D, introducing an additional "spectral" dimension to model this off-resonance. Our approach is inspired by recent progress in modeling radiance fields, and is capable of reconstructing both static and dynamic MR images as well as separating fat and water, which is of independent clinical interest. We demonstrate our approach in the context of PROPELLER (Periodically Rotated Overlap** ParallEL Lines with Enhanced Reconstruction) MRI acquisitions, which are popular for their robustness to motion artifacts. Our method operates in a few minutes on a single GPU, and to our knowledge is the first to correct for chemical shift in gradient echo PROPELLER MRI reconstruction without additional measurements or pretraining data. △ Less

Submitted 22 November, 2023; originally announced November 2023.

Comments: Code is available at https://github.com/sarafridov/volumetric-propeller

arXiv:2310.15119 [pdf, other]

Compressed Sensing of Generative Sparse-latent (GSL) Signals

Authors: Antoine Honoré, Anubhab Ghosh, Saikat Chatterjee

Abstract: We consider reconstruction of an ambient signal in a compressed sensing (CS) setup where the ambient signal has a neural network based generative model. The generative model has a sparse-latent input and we refer to the generated ambient signal as generative sparse-latent signal (GSL). The proposed sparsity inducing reconstruction algorithm is inherently non-convex, and we show that a gradient bas… ▽ More We consider reconstruction of an ambient signal in a compressed sensing (CS) setup where the ambient signal has a neural network based generative model. The generative model has a sparse-latent input and we refer to the generated ambient signal as generative sparse-latent signal (GSL). The proposed sparsity inducing reconstruction algorithm is inherently non-convex, and we show that a gradient based search provides a good reconstruction performance. We evaluate our proposed algorithm using simulated data. △ Less

Submitted 16 October, 2023; originally announced October 2023.

Comments: Accepted at 31st European Signal Processing Conference, EUSIPCO 2023

arXiv:2310.10543 [pdf, other]

ViPE: Visualise Pretty-much Everything

Authors: Hassan Shahmohammadi, Adhiraj Ghosh, Hendrik P. A. Lensch

Abstract: Figurative and non-literal expressions are profoundly integrated in human communication. Visualising such expressions allow us to convey our creative thoughts, and evoke nuanced emotions. Recent text-to-image models like Stable Diffusion, on the other hand, struggle to depict non-literal expressions. Recent works primarily deal with this issue by compiling humanly annotated datasets on a small sca… ▽ More Figurative and non-literal expressions are profoundly integrated in human communication. Visualising such expressions allow us to convey our creative thoughts, and evoke nuanced emotions. Recent text-to-image models like Stable Diffusion, on the other hand, struggle to depict non-literal expressions. Recent works primarily deal with this issue by compiling humanly annotated datasets on a small scale, which not only demands specialised expertise but also proves highly inefficient. To address this issue, we introduce ViPE: Visualise Pretty-much Everything. ViPE offers a series of lightweight and robust language models that have been trained on a large-scale set of lyrics with noisy visual descriptions that represent their implicit meaning. The synthetic visual descriptions are generated by GPT3.5 relying on neither human annotations nor images. ViPE effectively expresses any arbitrary piece of text into a visualisable description, enabling meaningful and high-quality image generation. We provide compelling evidence that ViPE is more robust than GPT3.5 in synthesising visual elaborations. ViPE also exhibits an understanding of figurative expressions comparable to human experts, providing a powerful and open-source backbone to many downstream applications such as music video and caption generation. △ Less

Submitted 16 October, 2023; originally announced October 2023.

Comments: To be presented in EMNLP2023 Main Conference

arXiv:2310.08914 [pdf, ps, other]

doi 10.5220/0012251500003595

Differential Evolution Algorithm based Hyper-Parameters Selection of Convolutional Neural Network for Speech Command Recognition

Authors: Sandipan Dhar, Anuvab Sen, Aritra Bandyopadhyay, Nanda Dulal Jana, Arjun Ghosh, Zahra Sarayloo

Abstract: Speech Command Recognition (SCR), which deals with identification of short uttered speech commands, is crucial for various applications, including IoT devices and assistive technology. Despite the promise shown by Convolutional Neural Networks (CNNs) in SCR tasks, their efficacy relies heavily on hyper-parameter selection, which is typically laborious and time-consuming when done manually. This pa… ▽ More Speech Command Recognition (SCR), which deals with identification of short uttered speech commands, is crucial for various applications, including IoT devices and assistive technology. Despite the promise shown by Convolutional Neural Networks (CNNs) in SCR tasks, their efficacy relies heavily on hyper-parameter selection, which is typically laborious and time-consuming when done manually. This paper introduces a hyper-parameter selection method for CNNs based on the Differential Evolution (DE) algorithm, aiming to enhance performance in SCR tasks. Training and testing with the Google Speech Command (GSC) dataset, the proposed approach showed effectiveness in classifying speech commands. Moreover, a comparative analysis with Genetic Algorithm based selections and other deep CNN (DCNN) models highlighted the efficiency of the proposed DE algorithm in hyper-parameter selection for CNNs in SCR tasks. △ Less

Submitted 13 October, 2023; originally announced October 2023.

Comments: 8 Pages, 7 Figures, 4 Tables, Accepted by the 15th International Joint Conference on Computational Intelligence (IJCCI 2023), November 13-15, 2023, Rome, Italy

Journal ref: Proceedings of the 15th International Joint Conference on Computational Intelligence (2023)

arXiv:2310.03528 [pdf, ps, other]

Best-Response Dynamics in Tullock Contests with Convex Costs

Authors: Abheek Ghosh

Abstract: We study the convergence of best-response dynamics in Tullock contests with convex cost functions (these games always have a unique pure-strategy Nash equilibrium). We show that best-response dynamics rapidly converges to the equilibrium for homogeneous agents. For two homogeneous agents, we show convergence to an $ε$-approximate equilibrium in $Θ(\log\log(1/ε))$ steps. For $n \ge 3$ agents, the d… ▽ More We study the convergence of best-response dynamics in Tullock contests with convex cost functions (these games always have a unique pure-strategy Nash equilibrium). We show that best-response dynamics rapidly converges to the equilibrium for homogeneous agents. For two homogeneous agents, we show convergence to an $ε$-approximate equilibrium in $Θ(\log\log(1/ε))$ steps. For $n \ge 3$ agents, the dynamics is not unique because at each step $n-1 \ge 2$ agents can make non-trivial moves. We consider the model proposed by Ghosh and Goldberg (2023), where the agent making the move is randomly selected at each time step. We show convergence to an $ε$-approximate equilibrium in $O(β\log(n/(εδ)))$ steps with probability $1-δ$, where $β$ is a parameter of the agent selection process, e.g., $β= n^2 \log(n)$ if agents are selected uniformly at random at each time step. We complement this result with a lower bound of $Ω(n + \log(1/ε)/\log(n))$ applicable for any agent selection process. △ Less

Submitted 6 October, 2023; v1 submitted 5 October, 2023; originally announced October 2023.

Comments: 43 pages. WINE '23 version

arXiv:2308.14435 [pdf, other]

Do Successful Researchers Reach the Self-Organized Critical Point?

Authors: Asim Ghosh, Bikas K. Chakrabarti

Abstract: The index of success of the researchers is now mostly measured using the Hirsch index ($h$). Our recent precise demonstration, that statistically $h \sim \sqrt {N_c} \sim \sqrt {N_p}$, where $N_p$ and $N_c$ denote respectively the total number of publications and total citations for the researcher, suggests that average number of citations per paper ($N_c/N_p$), and hence $h$, are statistical numb… ▽ More The index of success of the researchers is now mostly measured using the Hirsch index ($h$). Our recent precise demonstration, that statistically $h \sim \sqrt {N_c} \sim \sqrt {N_p}$, where $N_p$ and $N_c$ denote respectively the total number of publications and total citations for the researcher, suggests that average number of citations per paper ($N_c/N_p$), and hence $h$, are statistical numbers (Dunbar numbers) depending on the community or network to which the researcher belongs. We show here, extending our earlier observations, that the indications of success are not reflected by the total citations $N_c$, rather by the inequalities among citations from publications to publications. Specifically, we show that for very successful authors, the yearly variations in the Gini index ($g$, giving the average inequality of citations for the publications) and the Kolkata index ($k$, giving the fraction of total citations received by the top $1 - k$ fraction of publications; $k = 0.80$ corresponds to Pareto's 80/20 law) approach each other to $g = k \simeq 0.82$, signaling a precursor for the arrival of (or departure from) the Self-Organized Critical (SOC) state of his/her publication statistics. Analyzing the citation statistics (from Google Scholar) of thirty successful scientists throughout their recorded publication history, we find that the $g$ and $k$ for very successful among them (mostly Nobel Laureates, highest rank Stanford Cite-Scorers, and a few others) reach and hover just above (and then) below that $g = k \simeq 0.82$ mark, while for others they remain below that mark. We also find that all the lower (than the SOC mark 0.82) values of $k$ and $g$ fit a linear relationship $k = 1/2 + cg$, with $c = 0.39$. △ Less

Submitted 4 December, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

Comments: Invited contribution to Galam Special Issue in Physics (MDPI, in press)

arXiv:2308.12199 [pdf, other]

Towards Real-Time Analysis of Broadcast Badminton Videos

Authors: Nitin Nilesh, Tushar Sharma, Anurag Ghosh, C. V. Jawahar

Abstract: Analysis of player movements is a crucial subset of sports analysis. Existing player movement analysis methods use recorded videos after the match is over. In this work, we propose an end-to-end framework for player movement analysis for badminton matches on live broadcast match videos. We only use the visual inputs from the match and, unlike other approaches which use multi-modal sensor data, our… ▽ More Analysis of player movements is a crucial subset of sports analysis. Existing player movement analysis methods use recorded videos after the match is over. In this work, we propose an end-to-end framework for player movement analysis for badminton matches on live broadcast match videos. We only use the visual inputs from the match and, unlike other approaches which use multi-modal sensor data, our approach uses only visual cues. We propose a method to calculate the on-court distance covered by both the players from the video feed of a live broadcast badminton match. To perform this analysis, we focus on the gameplay by removing replays and other redundant parts of the broadcast match. We then perform player tracking to identify and track the movements of both players in each frame. Finally, we calculate the distance covered by each player and the average speed with which they move on the court. We further show a heatmap of the areas covered by the player on the court which is useful for analyzing the gameplay of the player. Our proposed framework was successfully used to analyze live broadcast matches in real-time during the Premier Badminton League 2019 (PBL 2019), with commentators and broadcasters appreciating the utility. △ Less

Submitted 23 August, 2023; originally announced August 2023.

arXiv:2308.10480 [pdf, other]

Dimension Independent Helly Theorem for Lines and Flats

Authors: Sutanoya Chakraborty, Arijit Ghosh, Soumi Nandi

Abstract: We give a generalization of dimension independent Helly Theorem of Adiprasito, Bárány, Mustafa, and Terpai (Discrete & Computational Geometry 2022) to higher dimensional transversal. We also prove some impossibility results that establish the tightness of our extension. We give a generalization of dimension independent Helly Theorem of Adiprasito, Bárány, Mustafa, and Terpai (Discrete & Computational Geometry 2022) to higher dimensional transversal. We also prove some impossibility results that establish the tightness of our extension. △ Less

Submitted 21 August, 2023; originally announced August 2023.

Comments: 10 pages

MSC Class: 52A35

arXiv:2308.10479 [pdf, ps, other]

Stabbing boxes with finitely many axis-parallel lines and flats

Authors: Sutanoya Chakraborty, Arijit Ghosh, Soumi Nandi

Abstract: We give necessary and sufficient condition for an infinite collection of axis-parallel boxes in $\mathbb{R}^{d}$ to be pierceable by finitely many axis-parallel $k$-flats, where $0 \leq k < d$. We also consider colorful generalizations of the above result and establish their feasibility. The problem considered in this paper is an infinite variant of the Hadwiger-Debrunner $(p,q)$-problem. We give necessary and sufficient condition for an infinite collection of axis-parallel boxes in $\mathbb{R}^{d}$ to be pierceable by finitely many axis-parallel $k$-flats, where $0 \leq k < d$. We also consider colorful generalizations of the above result and establish their feasibility. The problem considered in this paper is an infinite variant of the Hadwiger-Debrunner $(p,q)$-problem. △ Less

Submitted 21 August, 2023; originally announced August 2023.

Comments: 13 pages

MSC Class: 52A35

Showing 1–50 of 381 results for author: Ghosh, A