
A Perspective on Foundation Models for the Electric Power Grid

Authors: Hendrik F. Hamann, Thomas Brunschwiler, Blazhe Gjorgiev, Leonardo S. A. Martins, Alban Puech, Anna Varbella, Jonas Weiss, Juan Bernabe-Moreno, Alexandre Blondin Massé, Seong Choi, Ian Foster, Bri-Mathias Hodge, Rishabh Jain, Kibaek Kim, Vincent Mai, François Mirallès, Martin De Montigny, Octavio Ramos-Leaños, Hussein Suprême, Le Xie, El-Nasser S. Youssef, Arnaud Zinflou, Alexander J. Belvi, Ricardo J. Bessa, Bishnu Prasad Bhattari, et al. (2 additional authors not shown)

Abstract: Foundation models (FMs) currently dominate news headlines. They employ advanced deep learning architectures to extract structural information autonomously from vast datasets through self-supervision. The resulting rich representations of complex systems and dynamics can be applied to many downstream applications. Therefore, FMs can find uses in electric power grids, challenged by the energy transition and climate change. In this paper, we call for the development of, and state why we believe in, the potential of FMs for electric grids. We highlight their strengths and weaknesses amidst the challenges of a changing grid. We argue that an FM learning from diverse grid data and topologies could unlock transformative capabilities, pioneering a new approach in leveraging AI to redefine how we manage complexity and uncertainty in the electric grid. Finally, we discuss a power grid FM concept, namely GridFM, based on graph neural networks and show how different downstream tasks benefit.

Submitted 12 July, 2024; originally announced July 2024.

Comments: Lead contact: H.F.H.; Major equal contributors: H.F.H., T.B., B.G., L.S.A.M., A.P., A.V., J.W.; Significant equal contributors: J.B., A.B.M., S.C., I.F., B.H., R.J., K.K., V.M., F.M., M.D.M., O.R., H.S., L.X., E.S.Y., A.Z.; Other equal contributors: A.J.B., R.J.B., B.P.B., J.S., S.S

arXiv:2407.08323 [pdf, other]

Leveraging GPT for the Generation of Multi-Platform Social Media Datasets for Research

Authors: Henry Tari, Danial Khan, Justus Rutten, Darian Othman, Rishabh Kaushal, Thales Bertaglia, Adriana Iamnitchi

Abstract: Social media datasets are essential for research on disinformation, influence operations, social sensing, hate speech detection, cyberbullying, and other significant topics. However, access to these datasets is often restricted due to costs and platform regulations. As such, acquiring datasets that span multiple platforms, which are crucial for a comprehensive understanding of the digital ecosystem, is particularly challenging. This paper explores the potential of large language models to create lexically and semantically relevant social media datasets across multiple platforms, aiming to match the quality of real datasets. We employ ChatGPT to generate synthetic data from two real datasets, each consisting of posts from three different social media platforms. We assess the lexical and semantic properties of the synthetic data and compare them with those of the real data. Our empirical findings suggest that using large language models to generate synthetic multi-platform social media data is promising. However, further enhancements are necessary to improve the fidelity of the outputs.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.07914 [pdf, other]

doi 10.1145/3462203.3475898

Health Misinformation Detection in Web Content via Web2Vec: A Structural-, Content-based, and Context-aware Approach based on Web2Vec

Authors: Rishabh Upadhyay, Gabriella Pasi, Marco Viviani

Abstract: In recent years, we have witnessed the proliferation of large amounts of online content generated directly by users with virtually no form of external control, leading to the possible spread of misinformation. The search for effective solutions to this problem is still ongoing, and covers different areas of application, from opinion spam to fake news detection. A more recently investigated scenario, despite the serious risks that incurring disinformation could entail, is that of the online dissemination of health information. Early approaches in this area focused primarily on user-based studies applied to Web page content. More recently, automated approaches have been developed for both Web pages and social media content, particularly with the advent of the COVID-19 pandemic. These approaches are primarily based on handcrafted features extracted from online content in association with Machine Learning. In this scenario, we focus on Web page content, where there is still room for research to study structural-, content- and context-based features to assess the credibility of Web pages. Therefore, this work aims to study the effectiveness of such features in association with a deep learning model, starting from an embedded representation of Web pages that has been recently proposed in the context of phishing Web page detection, i.e., Web2Vec.

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.07726 [pdf, other]

PaliGemma: A versatile 3B VLM for transfer

Authors: Lucas Beyer, Andreas Steiner, André Susano Pinto, Alexander Kolesnikov, Xiao Wang, Daniel Salz, Maxim Neumann, Ibrahim Alabdulmohsin, Michael Tschannen, Emanuele Bugliarello, Thomas Unterthiner, Daniel Keysers, Skanda Koppula, Fangyu Liu, Adam Grycner, Alexey Gritsenko, Neil Houlsby, Manoj Kumar, Keran Rong, Julian Eisenschlos, Rishabh Kabra, Matthias Bauer, Matko Bošnjak, Xi Chen, Matthias Minderer, et al. (10 additional authors not shown)

Abstract: PaliGemma is an open Vision-Language Model (VLM) that is based on the SigLIP-So400m vision encoder and the Gemma-2B language model. It is trained to be a versatile and broadly knowledgeable base model that is effective to transfer. It achieves strong performance on a wide variety of open-world tasks. We evaluate PaliGemma on almost 40 diverse tasks including standard VLM benchmarks, but also more specialized tasks such as remote-sensing and segmentation.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.04622 [pdf, other]

On scalable oversight with weak LLMs judging strong LLMs

Authors: Zachary Kenton, Noah Y. Siegel, János Kramár, Jonah Brown-Cohen, Samuel Albanie, Jannis Bulian, Rishabh Agarwal, David Lindner, Yunhao Tang, Noah D. Goodman, Rohin Shah

Abstract: Scalable oversight protocols aim to enable humans to accurately supervise superhuman AI. In this paper we study debate, where two AIs compete to convince a judge; consultancy, where a single AI tries to convince a judge that asks questions; and compare to a baseline of direct question-answering, where the judge just answers outright without the AI. We use large language models (LLMs) as both AI agents and as stand-ins for human judges, taking the judge models to be weaker than agent models. We benchmark on a diverse range of asymmetries between judges and agents, extending previous work on a single extractive QA task with information asymmetry, to also include mathematics, coding, logic and multimodal reasoning asymmetries. We find that debate outperforms consultancy across all tasks when the consultant is randomly assigned to argue for the correct/incorrect answer. Comparing debate to direct question answering, the results depend on the type of task: in extractive QA tasks with information asymmetry debate outperforms direct question answering, but in other tasks without information asymmetry the results are mixed. Previous work assigned debaters/consultants an answer to argue for. When we allow them to instead choose which answer to argue for, we find judges are less frequently convinced by the wrong answer in debate than in consultancy. Further, we find that stronger debater models increase judge accuracy, though more modestly than in previous studies.

Submitted 12 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

Comments: 15 pages (53 including appendices). V2: minor correction to Figure 3; add Figure A.9 comparing open vs assigned consultancy; add a reference

arXiv:2407.02665 [pdf, other]

SMILe: Leveraging Submodular Mutual Information For Robust Few-Shot Object Detection

Authors: Anay Majee, Ryan Sharp, Rishabh Iyer

Abstract: Confusion and forgetting of object classes have been challenges of prime interest in Few-Shot Object Detection (FSOD). To overcome these pitfalls in metric learning based FSOD techniques, we introduce a novel Submodular Mutual Information Learning (SMILe) framework which adopts combinatorial mutual information functions to enforce the creation of tighter and discriminative feature clusters in FSOD. Our proposed approach generalizes to several existing approaches in FSOD, agnostic of the backbone architecture demonstrating elevated performance gains. A paradigm shift from instance based objective functions to combinatorial objectives in SMILe naturally preserves the diversity within an object class resulting in reduced forgetting when subjected to few training examples. Furthermore, the application of mutual information between the already learnt (base) and newly added (novel) objects ensures sufficient separation between base and novel classes, minimizing the effect of class confusion. Experiments on popular FSOD benchmarks, PASCAL-VOC and MS-COCO show that our approach generalizes to State-of-the-Art (SoTA) approaches improving their novel class performance by up to 5.7% (3.3 mAP points) and 5.4% (2.6 mAP points) on the 10-shot setting of VOC (split 3) and 30-shot setting of COCO datasets respectively. Our experiments also demonstrate better retention of base class performance and up to 2x faster convergence over existing approaches agnostic of the underlying architecture.

Submitted 2 July, 2024; originally announced July 2024.

Comments: Accepted to ECCV 2024, 16 pages, 5 figures, 7 tables

arXiv:2407.02089 [pdf, other]

GPTCast: a weather language model for precipitation nowcasting

Authors: Gabriele Franch, Elena Tomasi, Rishabh Wanjari, Virginia Poli, Chiara Cardinali, Pier Paolo Alberoni, Marco Cristoforetti

Abstract: This work introduces GPTCast, a generative deep-learning method for ensemble nowcast of radar-based precipitation, inspired by advancements in large language models (LLMs). We employ a GPT model as a forecaster to learn spatiotemporal precipitation dynamics using tokenized radar images. The tokenizer is based on a Quantized Variational Autoencoder featuring a novel reconstruction loss tailored for the skewed distribution of precipitation that promotes faithful reconstruction of high rainfall rates. The approach produces realistic ensemble forecasts and provides probabilistic outputs with accurate uncertainty estimation. The model is trained without resorting to randomness; all variability is learned solely from the data and exposed by the model at inference for ensemble generation. We train and test GPTCast using a 6-year radar dataset over the Emilia-Romagna region in Northern Italy, showing superior results compared to state-of-the-art ensemble extrapolation methods.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 16 pages, 10 figures
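The GPTCast abstract describes tokenizing radar images with a quantized autoencoder before a GPT model forecasts the token sequence. As a minimal sketch of the quantization step only, assuming a standard VQ-VAE nearest-codebook lookup (the paper's encoder, codebook size, and precipitation-specific reconstruction loss are not shown, and the toy codebook is hypothetical), each latent vector becomes the integer index of its closest codebook entry:

```python
import numpy as np


def vq_tokenize(latents: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map latents of shape [N, D] to token ids in [0, K) given a codebook [K, D].

    Assumes the standard VQ-VAE rule: choose the codebook entry with the
    smallest squared Euclidean distance to each latent vector.
    """
    # Pairwise squared distances, shape [N, K]
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def vq_detokenize(tokens: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Replace each token id with its codebook vector (the decoder's input)."""
    return codebook[tokens]


# Toy example with a hypothetical 3-entry, 2-D codebook
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
latents = np.array([[0.1, 0.2], [4.6, 5.1], [0.9, 1.2]])
tokens = vq_tokenize(latents, codebook)
print(tokens)  # [0 2 1]
```

The resulting integer ids form the kind of sequence a GPT-style forecaster can model autoregressively; drawing several sampled continuations is one plausible way such a model exposes ensemble variability at inference.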

arXiv:2406.17415 [pdf, other]

Layer-Wise Quantization: A Pragmatic and Effective Method for Quantizing LLMs Beyond Integer Bit-Levels

Authors: Razvan-Gabriel Dumitru, Vikas Yadav, Rishabh Maheshwary, Paul-Ioan Clotan, Sathwik Tejaswi Madhusudhan, Mihai Surdeanu

Abstract: We present a simple variable quantization approach that quantizes different layers of a large language model (LLM) at different bit levels. Specifically, we quantize the most important layers to higher bit precision and less important layers to lower bits to achieve floating point quantization levels. We propose two effective strategies to measure the importance of layers within LLMs: the first measures the importance of a layer based on how different its output embeddings are from the input embeddings (the higher the better); the second estimates the importance of a layer using the number of layer weights that are much larger than average (the smaller the better). We show that quantizing different layers at varying bits according to our importance scores results in minimal performance drop with a far more compressed model size. Finally, we present several practical key takeaways from our variable layer-wise quantization experiments: (a) LLM performance under variable quantization remains close to the original model until 25-50% of layers are moved in lower quantization using our proposed ordering but only until 5-10% if moved using no specific ordering; (b) Quantizing LLMs to lower bits performs substantially better than pruning unless extreme quantization (2-bit) is used; and (c) Layer-wise quantization to lower bits works better in the case of larger LLMs with more layers compared to smaller LLMs with fewer layers. The code used to run the experiments is available at: https://github.com/RazvanDu/LayerwiseQuant.

Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

Comments: submitted to EMNLP, 15 pages, 10 figures, 4 tables

ACM Class: I.2.7; I.2.0
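The two layer-importance measures and the variable bit assignment in the abstract above can be sketched as follows. This is an illustrative NumPy sketch, not the authors' code (that is at the linked repository). It assumes the "difference" between a layer's input and output embeddings is one minus their mean cosine similarity, and that a "much larger than average" weight is one whose magnitude exceeds a hypothetical `factor` times the mean magnitude. The hypothetical `assign_bits` keeps important layers at `high_bits` and moves the least important `frac_low` share to `low_bits`, giving a non-integer average bit level.

```python
import numpy as np


def embedding_change_importance(h_in: np.ndarray, h_out: np.ndarray) -> float:
    """Importance = 1 - mean cosine similarity between a layer's input and
    output hidden states, each of shape [tokens, dim]. Higher = more important."""
    dot = np.sum(h_in * h_out, axis=-1)
    norms = np.linalg.norm(h_in, axis=-1) * np.linalg.norm(h_out, axis=-1)
    return float(1.0 - np.mean(dot / (norms + 1e-12)))


def outlier_weight_count(weights: np.ndarray, factor: float = 3.0) -> int:
    """Count weights whose magnitude exceeds `factor` times the mean magnitude.
    `factor` is a hypothetical threshold; the abstract says smaller is better."""
    mag = np.abs(weights)
    return int(np.sum(mag > factor * mag.mean()))


def assign_bits(importance, frac_low: float,
                high_bits: int = 4, low_bits: int = 2) -> np.ndarray:
    """Give the least important `frac_low` share of layers `low_bits`, the rest `high_bits`."""
    importance = np.asarray(importance, dtype=float)
    n_low = int(round(frac_low * importance.size))
    order = np.argsort(importance)            # least important first
    bits = np.full(importance.size, high_bits)
    bits[order[:n_low]] = low_bits
    return bits


# Four hypothetical layer scores; half of the layers go to 2 bits
bits = assign_bits([0.1, 0.5, 0.3, 0.9], frac_low=0.5)
print(bits, bits.mean())  # [2 4 2 4] 3.0
```

To rank layers by the second measure, one could pass the negated outlier counts to `assign_bits`, assuming "the smaller the better" means fewer outliers marks a more important layer.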

arXiv:2406.16996 [pdf, other]

Study of Heavy Quarkonia in the presence of magnetic field by Nikiforov Uvarov method

Authors: Rishabh Sharma, Siddhartha Solanki, Manohar Lal, Vineet Kumar Agotiya

Abstract: The N-dimensional radial Schrödinger equation has been solved using the Nikiforov Uvarov (NU) method, in which we used the medium modified form of Cornell potential and quasi-particle Debye mass with strong magnetic field background. The binding energies and the mass spectra of heavy quarkonium have been studied in the N-dimensional space for different values of magnetic field; the binding energy decreases with increasing magnetic field, which shows early dissociation of heavy quarkonium system. The influence of dimensionality number has also been discussed on binding energies of J/ψ and Υ for fixed value of magnetic field. It is found that with an increase in dimensionality, the binding energy starts decreasing from a higher initial value. The results obtained are quite consistent with recent studies.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.16783 [pdf, other]

M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models

Authors: Rishabh Maheshwary, Vikas Yadav, Hoang Nguyen, Khyati Mahajan, Sathwik Tejaswi Madhusudhan

Abstract: Instruction finetuning (IFT) is critical for aligning Large Language Models (LLMs) to follow instructions. While many effective IFT datasets have been introduced recently, they predominantly focus on high-resource languages like English. To better align LLMs across a broad spectrum of languages and tasks, we propose a fully synthetic, novel taxonomy (Evol) guided Multilingual, Multi-turn instruction finetuning dataset, called M2Lingual. It is constructed by first selecting a diverse set of seed examples and then utilizing the proposed Evol taxonomy to convert these seeds into complex and challenging multi-turn instructions. We demonstrate the effectiveness of M2Lingual by training LLMs of varying sizes and showcasing the enhanced performance across a diverse set of languages. We contribute the two-step Evol taxonomy with the guided generation code: https://github.com/ServiceNow/M2Lingual, as well as the first fully synthetic, general and task-oriented, multi-turn, multilingual dataset built with Evol - M2Lingual: https://huggingface.co/datasets/ServiceNow-AI/M2Lingual - containing 182K total IFT pairs, covering 70 languages and 17+ NLP tasks.

Submitted 28 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

Comments: 39 pages

arXiv:2406.15025 [pdf, other]

SiT: Symmetry-Invariant Transformers for Generalisation in Reinforcement Learning

Authors: Matthias Weissenbacher, Rishabh Agarwal, Yoshinobu Kawahara

Abstract: An open challenge in reinforcement learning (RL) is the effective deployment of a trained policy to new or slightly different situations as well as semantically-similar environments. We introduce Symmetry-Invariant Transformer (SiT), a scalable vision transformer (ViT) that leverages both local and global data patterns in a self-supervised manner to improve generalisation. Central to our approach is Graph Symmetric Attention, which refines the traditional self-attention mechanism to preserve graph symmetries, resulting in invariant and equivariant latent representations. We showcase SiT's superior generalization over ViTs on MiniGrid and Procgen RL benchmarks, and its sample efficiency on Atari 100k and CIFAR10.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 9 main pages, accepted to ICML2024

arXiv:2406.14580 [pdf, other]

Quantum theory of a potential biological magnetic field sensor: radical pair mechanism in flavin adenine dinucleotide biradicals

Authors: Amirhosein Sotoodehfar, Rishabh, Hadi Zadeh-Haghighi, Christoph Simon

Abstract: Recent studies in vitro and in vivo suggest that flavin adenine dinucleotide (FAD) on its own might be able to act as a biological magnetic field sensor. Motivated by these observations, in this study, we develop a detailed quantum theoretical model for the radical pair mechanism (RPM) for the flavin adenine biradical within the FAD molecule. We perform molecular dynamics simulations to determine the distance between the radicals on FAD, which we then feed into a quantum master equation treatment of the RPM. In contrast to previous semi-classical models which are limited to the low-field and high-field cases, our quantum model can predict the full magnetic field dependence of the transient absorption signal. Our model's predictions are consistent with experiments.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 12 pages, 14 figures (9 main + 5 supplementary)

arXiv:2406.13839 [pdf, other]

RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design

Authors: Rishabh Anand, Chaitanya K. Joshi, Alex Morehead, Arian R. Jamasb, Charles Harris, Simon V. Mathis, Kieran Didi, Bryan Hooi, Pietro Liò

Abstract: We introduce RNA-FrameFlow, the first generative model for 3D RNA backbone design. We build upon SE(3) flow matching for protein backbone generation and establish protocols for data preparation and evaluation to address unique challenges posed by RNA modeling. We formulate RNA structures as a set of rigid-body frames and associated loss functions which account for larger, more conformationally flexible RNA backbones (13 atoms per nucleotide) vs. proteins (4 atoms per residue). Toward tackling the lack of diversity in 3D RNA datasets, we explore training with structural clustering and cropping augmentations. Additionally, we define a suite of evaluation metrics to measure whether the generated RNA structures are globally self-consistent (via inverse folding followed by forward folding) and locally recover RNA-specific structural descriptors. The most performant version of RNA-FrameFlow generates locally realistic RNA backbones of 40-150 nucleotides, over 40% of which pass our validity criteria as measured by a self-consistency TM-score >= 0.45, at which two RNAs have the same global fold. Open-source code: https://github.com/rish-16/rna-backbone-design

Submitted 19 June, 2024; originally announced June 2024.

Comments: To be presented as an Oral at ICML 2024 Structured Probabilistic Inference & Generative Modeling Workshop, and a Spotlight at ICML 2024 AI4Science Workshop

arXiv:2406.12049 [pdf, ps, other]

Combinatorial interpretations of cranks of overpartitions and partitions into distinct odd parts

Authors: F. G. Garvan, Rishabh Sarma

Abstract: We give combinatorial interpretations of two residual cranks of overpartitions defined by Bringmann, Lovejoy and Osburn in 2009 analogous to the crank of partitions given by Andrews and Garvan in 1988. As a consequence, we improve upon their definitions and find the true residual cranks of overpartitions. Furthermore, we investigate the combinatorial interpretation of an $M_2$-crank of partitions without repeated odd parts and explore connections of these statistics with their companion rank counterparts and the tenth order mock theta functions of Ramanujan.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 16 pages

MSC Class: Primary 11P81; Secondary 33D15; 05A17; 11F37
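The abstract builds on the crank of ordinary partitions given by Andrews and Garvan in 1988. For reference, here is a small Python sketch of that classical crank (not the paper's new overpartition cranks): if a partition has no parts equal to 1, its crank is its largest part; otherwise it is the number of parts larger than the number of 1s, minus the number of 1s. The final check illustrates the known property that the cranks of the five partitions of 4 fall into each residue class mod 5 exactly once.

```python
from collections import Counter
from typing import Iterator, List, Optional


def partitions(n: int, max_part: Optional[int] = None) -> Iterator[List[int]]:
    """Yield partitions of n as non-increasing lists of positive parts."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield []
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest


def crank(parts: List[int]) -> int:
    """Andrews and Garvan crank of a partition of n > 1."""
    ones = parts.count(1)
    if ones == 0:
        return max(parts)
    # Parts strictly larger than the number of ones, minus the number of ones
    return sum(1 for p in parts if p > ones) - ones


residues = Counter(crank(p) % 5 for p in partitions(4))
print(sorted(residues.items()))  # [(0, 1), (1, 1), (2, 1), (3, 1), (4, 1)]
```

Python's `%` always returns a non-negative residue for a positive modulus, so negative cranks such as -2 and -4 land in classes 3 and 1 as intended.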

arXiv:2406.11654 [pdf, other]

Ruby Teaming: Improving Quality Diversity Search with Memory for Automated Red Teaming

Authors: Vernon Toh Yan Han, Rishabh Bhardwaj, Soujanya Poria

Abstract: We propose Ruby Teaming, a method that improves on Rainbow Teaming by including a memory cache as its third dimension. The memory dimension provides cues to the mutator to yield better-quality prompts, both in terms of attack success rate (ASR) and quality diversity. The prompt archive generated by Ruby Teaming has an ASR of 74%, which is 20% higher than the baseline. In terms of quality diversity, Ruby Teaming outperforms Rainbow Teaming by 6% and 3% on Shannon's Evenness Index (SEI) and Simpson's Diversity Index (SDI), respectively.

Submitted 17 June, 2024; originally announced June 2024.
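Shannon's Evenness Index, Simpson's Diversity Index, and attack success rate, as quoted in the abstract above, are simple statistics over a prompt archive. A minimal sketch, assuming they are computed over the number of archived prompts per category; the abstract states neither the grouping nor the exact Simpson variant, so this uses the common Gini-Simpson form 1 - sum of p_i squared, and the zero evenness for a single category is a convention chosen here.

```python
import math
from typing import Sequence


def shannon_evenness(counts: Sequence[int]) -> float:
    """Shannon (Pielou) evenness H / ln(S) over categories with nonzero counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if len(probs) <= 1:
        return 0.0  # convention chosen here: one category has no evenness
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(len(probs))


def simpson_diversity(counts: Sequence[int]) -> float:
    """Gini-Simpson index: probability that two draws (with replacement) differ."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def attack_success_rate(successes: Sequence[bool]) -> float:
    """Fraction of prompts judged to be successful attacks."""
    return sum(successes) / len(successes)


# Hypothetical archive with four equally populated categories
print(round(shannon_evenness([5, 5, 5, 5]), 6), simpson_diversity([5, 5, 5, 5]))
# 1.0 0.75
```

Both indices reach their maximum for a perfectly balanced archive, which is why they serve as quality-diversity measures: a mutator that keeps generating prompts in the same few categories lowers both.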

arXiv:2406.11617 [pdf, other]

DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling

Authors: Pala Tej Deep, Rishabh Bhardwaj, Soujanya Poria

Abstract: With the proliferation of domain-specific models, model merging has emerged as a set of techniques that combine the capabilities of multiple models into one that can multitask without the cost of additional training. In this paper, we propose a new model merging technique, Drop and rEscaLe via sampLing with mAgnitude (DELLA-Merging), that employs a novel pruning technique, MAGPRUNE, which shows significant advantages over DARE and TIES. MAGPRUNE first ranks the parameters in order of their magnitude and assigns higher dropout probabilities (p) to parameters with lower ranks corresponding to lower magnitudes. To approximate the original embeddings, MAGPRUNE employs a rescaling operation on the parameters that survive the random dropping by 1/(1 - p). On three different expert models considered for merging (LM, Math, Code) and corresponding benchmark datasets (AlpacaEval, GSM8K, MBPP), DELLA shows an average improvement of 2.4 points over baseline methods employing delta parameter pruning (an improvement of 3.6 points over TIES, 1.2 points over DARE), and 11.1 points over the no-pruning baseline (TA). We release the source code at: https://github.com/declare-lab/della.

Submitted 17 June, 2024; originally announced June 2024.
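The MAGPRUNE step described above (rank by magnitude, drop low-magnitude parameters with higher probability, rescale survivors by 1/(1 - p)) can be sketched on a flat array of delta parameters. This is not the released implementation: the linear schedule from `p_max` (smallest magnitude) to `p_min` (largest magnitude) and its default values are assumptions, and DELLA-Merging's later merging steps are omitted. Rescaling each survivor by its own 1/(1 - p_i) keeps every parameter's expected value unchanged.

```python
from typing import Optional, Tuple

import numpy as np


def magprune(delta: np.ndarray, p_min: float = 0.3, p_max: float = 0.7,
             rng: Optional[np.random.Generator] = None) -> Tuple[np.ndarray, np.ndarray]:
    """Magnitude-aware random drop with rescaling.

    delta: 1-D array of delta parameters (fine-tuned minus base weights).
    Returns (pruned and rescaled delta, per-parameter drop probabilities).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = delta.size
    order = np.argsort(np.abs(delta))      # ascending magnitude
    ranks = np.empty(n, dtype=float)
    ranks[order] = np.arange(n)            # rank 0 = smallest magnitude
    # Assumed linear schedule: smallest magnitude -> p_max, largest -> p_min
    p = p_max - (p_max - p_min) * ranks / max(n - 1, 1)
    keep = rng.random(n) >= p              # each entry survives with prob 1 - p
    pruned = np.where(keep, delta / (1.0 - p), 0.0)
    return pruned, p


delta = np.array([0.1, -2.0, 0.5])
pruned, p = magprune(delta, rng=np.random.default_rng(0))
print(np.round(p, 3))  # [0.7 0.3 0.5]
```

Averaging `pruned` over many random draws recovers `delta`, which is the sense in which the 1/(1 - p) rescaling approximates the original parameters.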

arXiv:2406.10723

Eye in the Sky: Detection and Compliance Monitoring of Brick Kilns using Satellite Imagery

Authors: Rishabh Mondal, Shataxi Dubey, Vannsh Jani, Shrimay Shah, Suraj Jaiswal, Zeel B Patel, Nipun Batra

Abstract: Air pollution kills 7 million people annually. The brick manufacturing industry accounts for 8%-14% of air pollution in the densely populated Indo-Gangetic plain. Due to the unorganized nature of brick kilns, policy violation detection, such as proximity to human habitats, remains challenging. While previous studies have utilized computer vision-based machine learning methods for brick kiln detection from satellite imagery, they utilize proprietary satellite data and rarely focus on compliance with government policies. In this research, we introduce a scalable framework for brick kiln detection and automatic compliance monitoring. We use Google Maps Static API to download the satellite imagery followed by the YOLOv8x model for detection. We identified and hand-verified 19579 new brick kilns across 9 states within the Indo-Gangetic plain. Furthermore, we automate and test the compliance to the policies affecting human habitats, rivers and hospitals. Our results show that a substantial number of brick kilns do not meet the compliance requirements. Our framework offers a valuable tool for governments worldwide to automate and enforce policy regulations for brick kilns, addressing critical environmental and public health concerns.

Submitted 23 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

Comments: The PI was not in favour of making the work public on arXiv as the content is not yet ready to be released
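The abstract describes automated compliance checks on a kiln's proximity to human habitats, rivers and hospitals. A minimal sketch of such a distance check, assuming point coordinates in degrees and a great-circle (haversine) distance; the minimum distances the policies require are not given in the abstract, so `min_km` and the coordinates below are hypothetical values, not regulatory figures.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_sites(kiln: Point, sites: List[Point], min_km: float) -> List[Point]:
    """Return sites closer than min_km to the kiln, i.e. potential violations."""
    return [s for s in sites if haversine_km(*kiln, *s) < min_km]


# Hypothetical kiln and site coordinates with a hypothetical 1 km rule
kiln = (26.0, 80.0)
sites = [(26.0, 80.005), (26.5, 80.0)]
print(nearby_sites(kiln, sites, min_km=1.0))  # [(26.0, 80.005)]
```

Point-to-point distances suit hospitals and small settlements; rivers are line features, so a real check would more likely measure distance to the nearest point of a river polyline.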

arXiv:2406.09292 [pdf, other]

Neural Assets: 3D-Aware Multi-Object Scene Synthesis with Image Diffusion Models

Authors: Ziyi Wu, Yulia Rubanova, Rishabh Kabra, Drew A. Hudson, Igor Gilitschenski, Yusuf Aytar, Sjoerd van Steenkiste, Kelsey R. Allen, Thomas Kipf

Abstract: We address the problem of multi-object 3D pose control in image diffusion models. Instead of conditioning on a sequence of text tokens, we propose to use a set of per-object representations, Neural Assets, to control the 3D pose of individual objects in a scene. Neural Assets are obtained by pooling visual representations of objects from a reference image, such as a frame in a video, and are trained to reconstruct the respective objects in a different image, e.g., a later frame in the video. Importantly, we encode object visuals from the reference image while conditioning on object poses from the target frame. This enables learning disentangled appearance and pose features. Combining visual and 3D pose representations in a sequence-of-tokens format allows us to keep the text-to-image architecture of existing models, with Neural Assets in place of text tokens. By fine-tuning a pre-trained text-to-image diffusion model with this information, our approach enables fine-grained 3D pose and placement control of individual objects in a scene. We further demonstrate that Neural Assets can be transferred and recomposed across different scenes. Our model achieves state-of-the-art multi-object editing results on both synthetic 3D scene datasets, as well as two real-world video datasets (Objectron, Waymo Open).

Submitted 13 June, 2024; originally announced June 2024.

Comments: Additional details and video results are available at https://neural-assets-paper.github.io/

arXiv:2406.07027 [pdf]

Inverse melting and re-entrant transformations of the vortex lattice in amorphous Re6Zr thin film

Authors: Rishabh Duhan, Subhamita Sengupta, John Jesudasan, Somak Basistha, Pratap Raychaudhuri

Abstract: Melting of a solid is one of the most ubiquitous phenomena observed in nature. Most solids, when heated, melt from a crystalline state to an isotropic liquid at a characteristic temperature. There are, however, situations where an increase in temperature can induce a transition to a more ordered state. Such transitions are broadly termed "inverse melting", and experimental realisations of them are rare. Here, we report such a phenomenon in the 2-dimensional vortex liquid that forms in a moderately pinned amorphous Re6Zr (a-ReZr) thin film, from direct imaging of the vortex lattice using a scanning tunnelling microscope. At low temperature and magnetic fields, we find that the vortices form a "pinned liquid", characterised by low vortex mobility and a spatially inhomogeneous vortex density. As the temperature or magnetic field is increased, the vortices become more ordered, eventually forming a nearly perfectly ordered vortex lattice. Above this temperature/magnetic field, the ordered vortex lattice melts again into a vortex liquid. This re-entrant transformation from a liquid to a solid-like state and then back to a liquid also leaves a distinct signature in the magnetotransport properties of the superconductor.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.05697 [pdf, other]

Decision-Focused Surrogate Modeling for Mixed-Integer Linear Optimization

Authors: Shivi Dixit, Rishabh Gupta, Qi Zhang

Abstract: Mixed-integer optimization is at the core of many online decision-making systems that demand frequent updates of decisions in real time. However, due to their combinatorial nature, mixed-integer linear programs (MILPs) can be difficult to solve, rendering them often unsuitable for time-critical online applications. To address this challenge, we develop a data-driven approach for constructing surrogate optimization models in the form of linear programs (LPs) that can be solved much more efficiently than the corresponding MILPs. We train these surrogate LPs in a decision-focused manner such that for different model inputs, they achieve the same or close to the same optimal solutions as the original MILPs. One key advantage of the proposed method is that it allows the incorporation of all the original MILP's linear constraints, which significantly increases the likelihood of obtaining feasible predicted solutions. Results from two computational case studies indicate that this decision-focused surrogate modeling approach is highly data-efficient and provides very accurate predictions of the optimal solutions. In these examples, it outperforms more commonly used neural-network-based optimization proxies.

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.04410 [pdf, other]

On Unitarity of Bespoke Amplitudes

Authors: Rishabh Bhardwaj, Marcus Spradlin, Anastasia Volovich, He-Chen Weng

Abstract: We use partial wave unitarity to constrain various bespoke four-point amplitudes. We start by constructing bespoke generalizations of the type I superstring amplitude, which we show satisfy dual resonance and have suitable high-energy limits. By analyzing the behavior of partial wave coefficients for highly massive states, we strictly rule out all bespoke amplitudes with asymptotically non-linear Regge trajectories and place constraints on the first few non-trivial parameters in asymptotically linear cases. Finally, we argue that while a large class of unitary bespoke amplitudes fails to satisfy Regge Sum Rules, there exists a smaller sub-class with a vanishing mass gap that is superpolynomially bounded.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 36 pages, 7 figures

arXiv:2406.02567 [pdf, other]

Tunable drag drop via flow-induced snap-through in origami

Authors: Rishabh Nain, Tom Marzin, Sophie Ramananarivo

Abstract: We leverage the snap-through response of a bistable origami mechanism to induce a discontinuous evolution of drag with flow speed. The transition between equilibrium states is passively actuated by airflow, and we demonstrate that large shape reconfiguration over a small increment of flow velocity leads to a pronounced and sudden drop in drag. Moreover, we show that systematically varying the geometrical and mechanical properties of the origami unit enables the tuning of this drag discontinuity and the critical speed and loading at which it occurs. Experimental results are supported by a theoretical aero-elastic model, which further guides inverse design to identify the combination of structural origami parameters for targeted drag collapse. This approach sheds light on harnessing origami-inspired mechanisms for efficient passive drag control in a fluid environment, applicable to load alleviation or situations requiring swift transitions in aerodynamic performance.

Submitted 6 May, 2024; originally announced June 2024.

Comments: Main text: 14 pages with 5 figures; supplementary material: 8 pages

arXiv:2406.02272 [pdf, other]

Computation-Aware Learning for Stable Control with Gaussian Process

Authors: Wenhan Cao, Alexandre Capone, Rishabh Yadav, Sandra Hirche, Wei Pan

Abstract: In Gaussian Process (GP) dynamical model learning for robot control, particularly for systems constrained by computational resources like small quadrotors equipped with low-end processors, analyzing stability and designing a stable controller present significant challenges. This paper distinguishes between two types of uncertainty within the posteriors of GP dynamical models: the well-documented mathematical uncertainty stemming from limited data, and the computational uncertainty arising from constrained computational capabilities, which has been largely overlooked in prior research. Our work demonstrates that computational uncertainty, quantified through a probabilistic approximation of the inverse covariance matrix in GP dynamical models, is essential for stable control under computational constraints. We show that incorporating computational uncertainty can prevent overestimating the region of attraction, a safe subset of the state space with asymptotic stability, thus improving system safety. Building on these insights, we propose an innovative controller design methodology that integrates computational uncertainty within a second-order cone programming framework. Simulations of canonical stable control tasks and quadrotor tracking experiments demonstrate the effectiveness of our method under computational constraints.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2405.20648 [pdf, other]

Shotluck Holmes: A Family of Efficient Small-Scale Large Language Vision Models For Video Captioning and Summarization

Authors: Richard Luo, Austin Peng, Adithya Vasudev, Rishabh Jain

Abstract: Video is an increasingly prominent and information-dense medium, yet it poses substantial challenges for language models. A typical video consists of a sequence of shorter segments, or shots, that collectively form a coherent narrative. Each shot is analogous to a word in a sentence where multiple data streams of information (such as visual and auditory data) must be processed simultaneously. Comprehension of the entire video requires not only understanding the visual-audio information of each shot but also linking the ideas across shots to generate a larger, all-encompassing story. Despite significant progress in the field, current works often overlook videos' more granular shot-by-shot semantic information. In this project, we propose Shotluck Holmes, a family of efficient large language vision models (LLVMs) to boost video summarization and captioning. By leveraging better pretraining and data collection strategies, we extend the abilities of existing small LLVMs from understanding a single picture to understanding a sequence of frames. Specifically, we show that Shotluck Holmes outperforms state-of-the-art results on the Shot2Story video captioning and summarization task with significantly smaller and more computationally efficient models.

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.19107 [pdf, ps, other]

Offline Regularised Reinforcement Learning for Large Language Models Alignment

Authors: Pierre Harvey Richemond, Yunhao Tang, Daniel Guo, Daniele Calandriello, Mohammad Gheshlaghi Azar, Rafael Rafailov, Bernardo Avila Pires, Eugene Tarassov, Lucas Spangher, Will Ellsworth, Aliaksei Severyn, Jonathan Mallinson, Lior Shani, Gil Shamir, Rishabh Joshi, Tianqi Liu, Remi Munos, Bilal Piot

Abstract: The dominant framework for alignment of large language models (LLMs), whether through reinforcement learning from human feedback or direct preference optimisation, is to learn from preference data. This involves building datasets where each element is a quadruplet composed of a prompt, two independent responses (completions of the prompt) and a human preference between the two independent responses, yielding a preferred and a dis-preferred response. Such data is typically scarce and expensive to collect. On the other hand, single-trajectory datasets, where each element is a triplet composed of a prompt, a response and human feedback, are naturally more abundant. A canonical element of such datasets is, for instance, an LLM's response to a user's prompt followed by the user's feedback, such as a thumbs-up/down. Consequently, in this work, we propose DRO, or Direct Reward Optimisation, as a framework and associated algorithms that do not require pairwise preferences. DRO uses a simple mean-squared objective that can be implemented in various ways. We validate our findings empirically, using T5 encoder-decoder language models, and show DRO's performance over selected baselines such as Kahneman-Tversky Optimization (KTO). Thus, we confirm that DRO is a simple and empirically compelling method for single-trajectory policy optimisation.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.18508 [pdf, other]

Exploring waveforms with non-GR deviations for extreme mass-ratio inspirals

Authors: Shailesh Kumar, Rishabh Kumar Singh, Abhishek Chowdhuri, Arpan Bhattacharyya

Abstract: Detecting and examining the polarization modes of gravitational waves plays a pivotal role in improving our grasp of the precise mechanisms behind their generation. A thorough investigation is essential for probing the nature of gravitational waves and for rigorously testing the range of modified gravity theories. Along these lines, a general description of black holes in theories beyond general relativity is useful, since distinct deviation parameters can be mapped to solutions representing distinct theories. Employing a refined version of the deformed Kerr geometry, which is free from pathological behaviours such as unphysical divergences in the metric, we explore an extreme mass-ratio inspiral system, wherein a stellar-mass object perturbs a supermassive black hole. We compute the effects of deformation parameters on gravitational wave fluxes, orbital evolution and phase dynamics with leading order post-Newtonian corrections. With the waveform analysis, we assess the plausibility of detecting deviations from general relativity through observations facilitated by the Laser Interferometer Space Antenna (LISA), simultaneously constraining the extent of these deviations. Therefore, this analysis highlights the essential role of observations in advancing our understanding of gravitational phenomena beyond general relativity.

Submitted 28 May, 2024; originally announced May 2024.

Comments: 26 pages, 3 Figures

arXiv:2405.18081 [pdf, other]

Optimality of Approximate Message Passing Algorithms for Spiked Matrix Models with Rotationally Invariant Noise

Authors: Rishabh Dudeja, Songbin Liu, Junjie Ma

Abstract: We study the problem of estimating a rank-one signal matrix from an observed matrix generated by corrupting the signal with additive rotationally invariant noise. We develop a new class of approximate message-passing algorithms for this problem and provide a simple and concise characterization of their dynamics in the high-dimensional limit. At each iteration, these algorithms exploit prior knowledge about the noise structure by applying a non-linear matrix denoiser to the eigenvalues of the observed matrix and prior information regarding the signal structure by applying a non-linear iterate denoiser to the previous iterates generated by the algorithm. We exploit our result on the dynamics of these algorithms to derive the optimal choices for the matrix and iterate denoisers. We show that the resulting algorithm achieves the smallest possible asymptotic estimation error among a broad class of iterative algorithms under a fixed iteration budget.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.13370 [pdf, other]

Low-Resolution Chest X-ray Classification via Knowledge Distillation and Multi-task Learning

Authors: Yasmeena Akhter, Rishabh Ranjan, Richa Singh, Mayank Vatsa

Abstract: This research addresses the challenges of diagnosing chest X-rays (CXRs) at low resolutions, a common limitation in resource-constrained healthcare settings. High-resolution CXR imaging is crucial for identifying small but critical anomalies, such as nodules or opacities. However, when images are downsized for processing in Computer-Aided Diagnosis (CAD) systems, vital spatial details and receptive fields are lost, hampering diagnostic accuracy. To address this, this paper presents the Multilevel Collaborative Attention Knowledge (MLCAK) method. This approach leverages the self-attention mechanism of Vision Transformers (ViT) to transfer critical diagnostic knowledge from high-resolution images to enhance the diagnostic efficacy of low-resolution CXRs. MLCAK incorporates local pathological findings to boost model explainability, enabling more accurate global predictions in a multi-task framework tailored for low-resolution CXR analysis. Our research, utilizing the VinDr-CXR dataset, shows a considerable enhancement in the ability to diagnose diseases from low-resolution images (e.g., 28 × 28), suggesting a critical transition away from the traditional reliance on high-resolution imaging (e.g., 224 × 224).

Submitted 22 May, 2024; originally announced May 2024.

Comments: IEEE ISBI 2024

arXiv:2405.10895 [pdf, other]

The unluckiest star: A spectroscopically confirmed repeated partial tidal disruption event AT 2022dbl

Authors: Zheyu Lin, Ning Jiang, Tinggui Wang, Xu Kong, Dongyue Li, Han He, Yibo Wang, Jiazheng Zhu, Wentao Li, Ji-an Jiang, Avinash Singh, Rishabh Singh Teja, D. K. Sahu, Chichuan **, Keiichi Maeda, Shifeng Huang

Abstract: The unluckiest star orbits a supermassive black hole on an elliptical orbit. Every time it reaches pericenter, it passes shallowly inside the tidal radius and is partially tidally disrupted, producing a series of flares. Confirmation of a repeated partial tidal disruption event (pTDE) requires not only evidence ruling out other types of transients, but also proof that only one star is involved, as TDEs from multiple stars can also produce similar flares. In this letter, we report the discovery of a repeated pTDE, AT 2022dbl. In a quiescent galaxy at z=0.0284, two separate optical/UV flares have been observed in 2022 and 2024, with no bright X-ray, radio or mid-infrared counterparts. Compared to the first flare, the second flare has a similar blackbody temperature of ~26,000 K, slightly lower peak luminosity, and slower rise and fall phases. Their blackbody parameters, bolometric energies and light curve shapes are all similar to those of the ZTF TDEs. The spectra taken during the second flare show a steeper continuum than the late-time spectra of the previous flare, consistent with a newly risen flare. More importantly, the possibility of two independent TDEs can be largely ruled out because the optical spectra taken around the peak of the two flares exhibit highly similar broad Balmer, N III and possible He II emission lines, especially the extreme ~4100 Å emission lines. This represents the first robust spectroscopic evidence for a repeated pTDE, which can soon be verified by observing the third flare, given its short orbital period.

Submitted 17 May, 2024; originally announced May 2024.

Comments: 15 pages, 8 figures, submitted to ApJ Letters on 2024 Apr 27

arXiv:2405.10170 [pdf, other]

A Mess of Memory System Benchmarking, Simulation and Application Profiling

Authors: Pouya Esmaili-Dokht, Francesco Sgherzi, Valeria Soldera Girelli, Isaac Boixaderas, Mariana Carmin, Alireza Momeni, Adria Armejach, Estanislao Mercadal, German Llort, Petar Radojkovic, Miquel Moreto, Judit Gimenez, Xavier Martorell, Eduard Ayguade, Jesus Labarta, Emanuele Confalonieri, Rishabh Dubey, Jason Adlard

Abstract: The Memory stress (Mess) framework provides a unified view of memory system benchmarking, simulation and application profiling. The Mess benchmark provides a holistic and detailed memory system characterization. It is based on hundreds of measurements that are represented as a family of bandwidth–latency curves. The benchmark extends coverage beyond that of all previous tools and leads to new findings in the behavior of actual and simulated memory systems. We deploy the Mess benchmark to characterize Intel, AMD, IBM, Fujitsu, Amazon and NVIDIA servers with DDR4, DDR5, HBM2 and HBM2E memory. The Mess memory simulator uses the bandwidth–latency concept for memory performance simulation. We integrate Mess with widely used CPU simulators, enabling modeling of all high-end memory technologies. The Mess simulator is fast, easy to integrate and closely matches actual system performance. By design, it enables quick adoption of new memory technologies in hardware simulators. Finally, Mess application profiling positions the application in the bandwidth–latency space of the target memory system. This information can be correlated with other application runtime activities and the source code, leading to a better overall understanding of the application's behavior. The current Mess benchmark release covers all major CPU and GPU ISAs: x86, ARM, Power, RISC-V, and NVIDIA's PTX.
We also release as open source ZSim, gem5 and OpenPiton Metro-MPI integrated with the Mess memory simulator for DDR4, DDR5, Optane, HBM2, HBM2E and CXL memory expanders. Mess application profiling is already integrated into a suite of production HPC performance analysis tools.

Submitted 16 May, 2024; originally announced May 2024.

Comments: 17 pages

arXiv:2405.07867 [pdf, other]

doi 10.1021/acs.jpcc.4c02062

Phonon Assisted Exciton Processes in Two-Dimensional Tungsten Monocarbide

Authors: Rishabh Saraswat, Miroslav Kolos, Rekha Verma, František Karlický, Sitangshu Bhattacharya

Abstract: In this study, we utilize a rigorous ab initio-based finite-momentum Bethe-Salpeter equation to investigate photoluminescence emission in two-dimensional hexagonal tungsten carbide (h-WC). This thermodynamically stable monolayer exhibits an indirect optical gap, resulting in phonon-assisted emission. We observe that light absorption is a direct process centered around the direct quasiparticle gap, while light emission is indirect and requires phonon modes between $\Gamma$ and $M$ in the phonon dispersion. The emission lines feature prominent phonon replicas at cryogenic temperatures, particularly at near-infrared energies (1.09 and 1.17 eV), and we observe exciton thermalization with the crystal beyond 25 K. Additionally, non-radiative recombination is a remarkably fast process, occurring on the order of a few femtoseconds (4.8 fs at 0 K and 2.8 fs at 300 K), compared to radiative recombination (2.3 ps at 0 K and 214 ns at 300 K). These optical characteristics suggest that 2D h-WC may be promising for photon-emitter devices for near-infrared signal communication.

Submitted 13 May, 2024; originally announced May 2024.

Journal ref: J. Phys. Chem. C 128, 8341 (2024)
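The abstract quotes the two phonon-replica lines as photon energies. A back-of-envelope conversion, not taken from the paper, using $\lambda = hc/E$ with $hc \approx 1239.84$ eV·nm, shows that both fall in the near infrared, just above 1 µm:

```latex
\begin{aligned}
\lambda &= \frac{hc}{E}, \qquad hc \approx 1239.84\ \text{eV\,nm},\\
\lambda(1.09\ \text{eV}) &\approx \frac{1239.84}{1.09}\ \text{nm} \approx 1137\ \text{nm},\\
\lambda(1.17\ \text{eV}) &\approx \frac{1239.84}{1.17}\ \text{nm} \approx 1060\ \text{nm}.
\end{aligned}
```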

arXiv:2405.05053 [pdf, other]

Observation of an unconventional giant negative exchange bias effect in La$_{0.5}$Sr$_{0.5}$Co$_{0.85}$Nb$_{0.15}$O$_3$

Authors: Rishabh Shukla, B. Schwarz, R. S. Dhaka

Abstract: We find an unconventional giant negative exchange bias (EB) of $H_{\rm EB} = -14.1$ kOe at 2 K (cooling field of 50 kOe) in the cluster spin-glass (CSG) perovskite cobaltite La$_{0.5}$Sr$_{0.5}$Co$_{0.85}$Nb$_{0.15}$O$_3$. The magnetic memory effect, aging measurements, and nonlinearity in the specific heat capacity reveal the glassy magnetic state at low temperatures. Further, a detailed analysis of the ac magnetic susceptibility confirms the glassy state below $\sim$58 K, and the obtained characteristic spin-relaxation time scale of $\tau_0 = 8.4\times10^{-10}$ s indicates the presence of CSG. Moreover, analysis of the magnetic training effect using the classical EB relaxation model reveals that the frozen spins relax slowly compared to the rotatable spins at the interface of antiferromagnetic/ferromagnetic (AFM/FM) regions in the CSG. Interestingly, the dependence of the EB parameters is unconventional for cooling fields $>$50 kOe: $H_{\rm EB}$ and $M_{\rm EB}$ show a decreasing trend instead of the expected saturation at higher fields. This unusual behavior emerges from large negative values of the intrinsic interface exchange coupling ($J_i$), i.e., $-10.24 \pm 0.22$ meV and $-12.55 \pm 0.49$ meV for measuring fields of $\pm$50 kOe and $\pm$90 kOe, respectively, whereas the number of spins in the FM cluster ($N_{\rm FM}$) is found to be small, in the range 2.4–3.1.
These values of $J_i$ and $N_{\rm FM}$ indicate dominant AFM interactions and the presence of FM clusters in the AFM matrix, respectively, which correlate well with the observed unconventional giant negative exchange bias in the present sample.

Submitted 8 May, 2024; originally announced May 2024.

Comments: submitted

arXiv:2405.03904 [pdf, other]

Transformer models classify random numbers

Authors: Rishabh Goel, YiZi Xiao, Ramin Ramezani

Abstract: Random numbers are essential in a variety of fields, and validating them remains important. A Quantum Random Number Generator (QRNG) can theoretically generate truly random numbers; however, this does not remove the need to thoroughly test their randomness. Generally, the task of validating random numbers has been delegated to statistical tests such as those from the NIST Statistical Test Suite (STS), which are often slow and perform only one task at a time. Our work presents a deep learning model that uses the transformer architecture to encode some of the tests from the NIST STS in a single model that also runs much faster. This model performs multi-label classification on these tests and outputs the probability of passing each statistical test that it encodes. We perform a thorough hyperparameter optimization to converge on the best possible model and, as a result, achieve a high degree of accuracy, with a sample F1 score above 0.9.

Submitted 6 May, 2024; originally announced May 2024.

Comments: 13 pages, 5 figures
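For readers unfamiliar with what a single NIST STS test computes, the following is a minimal Python sketch of the frequency (monobit) test as specified in NIST SP 800-22. It illustrates the kind of per-test statistic the transformer model is trained to emulate; it is not the authors' model. The function names are hypothetical, and the 0.01 significance level follows the SP 800-22 decision rule.

```python
import math


def monobit_p_value(bits: str) -> float:
    """Frequency (monobit) test from NIST SP 800-22.

    `bits` is a string of '0'/'1' characters; returns the test's p-value.
    """
    n = len(bits)
    if n == 0 or any(b not in "01" for b in bits):
        raise ValueError("bits must be a non-empty string of '0' and '1'")
    # Map '1' -> +1 and '0' -> -1, then sum to get the statistic S_n.
    s_n = sum(1 if b == "1" else -1 for b in bits)
    # Normalise: s_obs = |S_n| / sqrt(n).
    s_obs = abs(s_n) / math.sqrt(n)
    # p-value = erfc(s_obs / sqrt(2)).
    return math.erfc(s_obs / math.sqrt(2))


def passes_monobit(bits: str, alpha: float = 0.01) -> bool:
    """The sequence is judged non-random if its p-value falls below alpha."""
    return monobit_p_value(bits) >= alpha


if __name__ == "__main__":
    # Worked example from SP 800-22 (n = 10).
    print(round(monobit_p_value("1011010101"), 6))  # prints 0.527089
    print(passes_monobit("1" * 100))  # heavily biased sequence: prints False
```

SP 800-22 recommends at least 100 bits per sequence for this test, so the 10-bit input above only checks the arithmetic. Each such test is run separately over the data, which is the cost the paper's single multi-label model is meant to reduce.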

arXiv:2405.03155 [pdf, other]

CushSense: Soft, Stretchable, and Comfortable Tactile-Sensing Skin for Physical Human-Robot Interaction

Authors: Boxin Xu, Luoyan Zhong, Grace Zhang, Xiaoyu Liang, Diego Virtue, Rishabh Madan, Tapomayukh Bhattacharjee

Abstract: Whole-arm tactile feedback is crucial for robots to ensure safe physical interaction with their surroundings. This paper introduces CushSense, a fabric-based soft and stretchable tactile-sensing skin designed for physical human-robot interaction (pHRI) tasks such as robotic caregiving. Using stretchable fabric and hyper-elastic polymer, CushSense identifies contacts by monitoring capacitive changes due to skin deformation. CushSense is cost-effective ($\sim$US\$7 per taxel) and easy to fabricate. We detail the sensor design and fabrication process and perform characterization, highlighting its high sensing accuracy (relative error of 0.58%) and durability (0.054% accuracy drop after 1000 interactions). We also present a user study underscoring its perceived safety and comfort for the assistive task of limb manipulation. We open source all sensor-related resources on https://emprise.cs.cornell.edu/cushsense.

Submitted 6 May, 2024; originally announced May 2024.

Comments: 8 pages, 8 figures, ICRA2024

arXiv:2405.01488 [pdf, other]

Digital Twin Generators for Disease Modeling

Authors: Nameyeh Alam, Jake Basilico, Daniele Bertolini, Satish Casie Chetty, Heather D'Angelo, Ryan Douglas, Charles K. Fisher, Franklin Fuller, Melissa Gomes, Rishabh Gupta, Alex Lang, Anton Loukianov, Rachel Mak-McCully, Cary Murray, Hanalei Pham, Susanna Qiao, Elena Ryapolova-Webb, Aaron Smith, Dimitri Theoharatos, Anil Tolwani, Eric W. Tramel, Anna Vidovszky, Judy Viduya, Jonathan R. Walsh

Abstract: A patient's digital twin is a computational model that describes the evolution of their health over time. Digital twins have the potential to revolutionize medicine by enabling individual-level computer simulations of human health, which can be used to conduct more efficient clinical trials or to recommend personalized treatment options. Due to the overwhelming complexity of human biology, machine learning approaches that leverage large datasets of historical patients' longitudinal health records to generate patients' digital twins are more tractable than potential mechanistic models. In this manuscript, we describe a neural network architecture that can learn conditional generative models of clinical trajectories, which we call Digital Twin Generators (DTGs), that can create digital twins of individual patients. We show that the same neural network architecture can be trained to generate accurate digital twins for patients across 13 different indications simply by changing the training set and tuning hyperparameters. By introducing a general purpose architecture, we aim to unlock the ability to scale machine learning approaches to larger datasets and across more indications so that a digital twin could be created for any patient in the world.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2405.00597

Non-abelian symmetry-resolved entanglement entropy

Authors: Eugenio Bianchi, Pietro Dona, Rishabh Kumar

Abstract: We introduce a mathematical framework for symmetry-resolved entanglement entropy with a non-abelian symmetry group. To obtain a reduced density matrix that is block-diagonal in the non-abelian charges, we define subsystems operationally in terms of subalgebras of invariant observables. We derive exact formulas for the average and the variance of the typical entanglement entropy for the ensemble of random pure states with fixed non-abelian charges. We focus on compact, semisimple Lie groups. We show that, compared to the abelian case, new phenomena arise from the interplay of locality and non-abelian symmetry, such as the asymmetry of the entanglement entropy under subsystem exchange, which we show in detail by computing the Page curve of a many-body system with $SU(2)$ symmetry.

Submitted 1 May, 2024; originally announced May 2024.

Comments: 50 pages, 5 figures

arXiv:2404.14158

Temporal characterization of laser pulses using an air-based knife-edge technique

Authors: Pierre Béjot, Rishabh Kumar Bhalavi, Adrien Leblanc, Antoine Dubrouil, Franck Billard, Olivier Faucher, Edouard Hertz

Abstract: We present the characterization of ultrashort laser pulses by using the plasma-induced frequency resolved optical switching (PI-FROSt) technique, implemented in ambient air. This recently developed method allows for a temporal reconstruction of a pulse at its focal spot by utilizing a moderately intense pump laser pulse for generating an ionization-induced ultrafast defocusing lens. When propagating through the produced plasma lens, the probe beam to be characterized experiences an increase in its size in the far field. The spectrum of the defocused probe field, measured as a function of the pump-probe delay, allows for a comprehensive characterization of the temporal and spectral attributes of the pulse. We report herein the ability of this technique, initially designed for use in rare gases, to operate in ambient air conditions with similar performance. The method is remarkably straightforward to implement and requires no additional optical component other than a focusing mirror, while delivering laser pulse reconstructions of high reliability.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.11283

Robust and composable device-independent quantum protocols for oblivious transfer and bit commitment

Authors: Rishabh Batra, Sayantan Chakraborty, Rahul Jain, Upendra Kapshikar

Abstract: We present robust and composable device-independent quantum protocols for oblivious transfer (OT) and bit commitment (BC) using Magic Square devices. We assume there is no long-term quantum memory, that is, after a finite time interval, referred to as DELAY, the states stored in the devices decohere. By robustness, which is a highlight of our protocols, we mean that the protocols are correct and secure even when devices are slightly off from their ideal specifications (the faulty but non-malicious regime). This is an important property, since in the real world, devices would certainly have small manufacturing errors and cannot be expected to be ideal. To the best of our understanding and knowledge, none of the known DI protocols for OT and BC in the literature are robust; they cannot guarantee correctness in the faulty but non-malicious regime. Our protocols are sequentially composable and hence can be used as building blocks to construct larger protocols, while still preserving security guarantees.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.11018

Many-Shot In-Context Learning

Authors: Rishabh Agarwal, Avi Singh, Lei M. Zhang, Bernd Bohnet, Luis Rosias, Stephanie Chan, Biao Zhang, Ankesh Anand, Zaheer Abbas, Azade Nova, John D. Co-Reyes, Eric Chu, Feryal Behbahani, Aleksandra Faust, Hugo Larochelle

Abstract: Large language models (LLMs) excel at few-shot in-context learning (ICL) -- learning from a few examples provided in context at inference, without any weight updates. Newly expanded context windows allow us to investigate ICL with hundreds or thousands of examples -- the many-shot regime. Going from few-shot to many-shot, we observe significant performance gains across a wide variety of generative and discriminative tasks. While promising, many-shot ICL can be bottlenecked by the available amount of human-generated examples. To mitigate this limitation, we explore two new settings: Reinforced and Unsupervised ICL. Reinforced ICL uses model-generated chain-of-thought rationales in place of human examples. Unsupervised ICL removes rationales from the prompt altogether, and prompts the model only with domain-specific questions. We find that both Reinforced and Unsupervised ICL can be quite effective in the many-shot regime, particularly on complex reasoning tasks. Finally, we demonstrate that, unlike few-shot learning, many-shot learning is effective at overriding pretraining biases, can learn high-dimensional functions with numerical inputs, and performs comparably to fine-tuning. Our analysis also reveals the limitations of next-token prediction loss as an indicator of downstream ICL performance.
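The three prompting regimes described above (human-written shots, Reinforced ICL, and Unsupervised ICL) can be illustrated with a small prompt builder. This is a minimal sketch, not the authors' code: the `Example` container, the `Q:`/`A:` template, and the `is_correct` filter for Reinforced ICL (keeping only model-generated rationales whose final answer passes a check) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Example:
    """One in-context example (hypothetical container)."""
    question: str
    rationale: Optional[str] = None  # human-written or model-generated chain of thought
    answer: Optional[str] = None


def build_prompt(shots: Sequence[Example], query: str, unsupervised: bool = False) -> str:
    """Concatenate k shots followed by the query; in the many-shot regime k can be
    in the hundreds or thousands."""
    blocks: List[str] = []
    for ex in shots:
        if unsupervised:
            # Unsupervised ICL: domain-specific questions only, no rationales or answers.
            blocks.append(f"Q: {ex.question}")
        else:
            blocks.append(f"Q: {ex.question}\nA: {ex.rationale} The answer is {ex.answer}.")
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)


def reinforced_shots(candidates: Sequence[Example],
                     is_correct: Callable[[Example], bool],
                     k: int) -> List[Example]:
    """Reinforced ICL (assumed filtering rule): keep up to k model-generated
    rationales whose final answer passes a correctness check."""
    return [ex for ex in candidates if is_correct(ex)][:k]
```

The same builder serves all three regimes; only the source of the shots (human pool, filtered model outputs, or bare questions) changes.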

Submitted 22 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.07815

Post-Hoc Reversal: Are We Selecting Models Prematurely?

Authors: Rishabh Ranjan, Saurabh Garg, Mrigank Raman, Carlos Guestrin, Zachary Chase Lipton

Abstract: Trained models are often composed with post-hoc transforms such as temperature scaling (TS), ensembling and stochastic weight averaging (SWA) to improve performance, robustness, uncertainty estimation, etc. However, such transforms are typically applied only after the base models have already been finalized by standard means. In this paper, we challenge this practice with an extensive empirical study. In particular, we demonstrate a phenomenon that we call post-hoc reversal, where performance trends are reversed after applying these post-hoc transforms. This phenomenon is especially prominent in high-noise settings. For example, while base models overfit badly early in training, both conventional ensembling and SWA favor base models trained for more epochs. Post-hoc reversal can also suppress the appearance of double descent and mitigate mismatches between test loss and test error seen in base models. Based on our findings, we propose post-hoc selection, a simple technique whereby post-hoc metrics inform model development decisions such as early stopping, checkpointing, and broader hyperparameter choices. Our experimental analyses span real-world vision, language, tabular and graph datasets from domains like satellite imaging, language modeling, census prediction and social network analysis. On an LLM instruction tuning dataset, post-hoc selection results in > 1.5x MMLU improvement compared to naive selection. Code is available at https://github.com/rishabh-ranjan/post-hoc-reversal.
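Post-hoc selection can be sketched as ranking checkpoints by their metric after the post-hoc transform rather than before it. The sketch below assumes temperature scaling as the transform, mean validation negative log-likelihood (NLL) as the metric, and a coarse grid search for the temperature; the function names and the grid are illustrative and are not taken from the repository linked above.

```python
import math
from typing import List, Optional, Sequence

Logits = Sequence[Sequence[float]]  # one logit vector per validation example


def nll(logits: Logits, labels: Sequence[int], temperature: float = 1.0) -> float:
    """Mean negative log-likelihood of softmax(logits / temperature)."""
    total = 0.0
    for z, y in zip(logits, labels):
        scaled = [v / temperature for v in z]
        m = max(scaled)  # subtract the max for a numerically stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(v - m) for v in scaled))
        total += log_norm - scaled[y]
    return total / len(labels)


def temperature_scaled_nll(logits: Logits, labels: Sequence[int],
                           grid: Optional[List[float]] = None) -> float:
    """Post-hoc metric: NLL after fitting one temperature by grid search (assumed)."""
    grid = grid or [0.25 * i for i in range(1, 41)]  # T in {0.25, 0.5, ..., 10.0}
    return min(nll(logits, labels, t) for t in grid)


def select_checkpoint(checkpoints: Sequence[Logits], labels: Sequence[int],
                      post_hoc: bool = True) -> int:
    """Index of the checkpoint (e.g. epoch) with the best validation metric.

    post_hoc=False mimics naive selection on the base model;
    post_hoc=True ranks checkpoints by the temperature-scaled metric instead.
    """
    def score(logits: Logits) -> float:
        return temperature_scaled_nll(logits, labels) if post_hoc else nll(logits, labels)
    return min(range(len(checkpoints)), key=lambda i: score(checkpoints[i]))
```

As a toy example with labels [0, 0, 0, 1], a checkpoint A with logits [1, 0] on every example wins under naive selection (NLL about 0.563 versus about 1.25), while an overconfident checkpoint B with logits [10, 0], [10, 0], [10, 0], [5, 0] wins after temperature scaling (about 0.42 versus about 0.56), a small instance of the reversal described in the abstract.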

Submitted 11 April, 2024; originally announced April 2024.

Comments: 9 pages + references + appendix, 7 figures

arXiv:2404.05460

Thermal melting of a vortex lattice in a quasi two-dimensional Bose gas

Authors: Rishabh Sharma, David Rey, Laurent Longchambon, Aurélien Perrin, Hélène Perrin, Romain Dubessy

Abstract: We report the observation of the melting of a vortex lattice in a fast-rotating quasi-two-dimensional Bose gas, under the influence of thermal fluctuations. We image the vortex lattice after a time-of-flight expansion, for increasing rotation frequency at constant atom number and temperature. We detect the vortex positions and study the order of the lattice using the pair correlation function and the orientational correlation function. We identify the melting transition by an abrupt change in the decay of orientational correlations, associated with a proliferation of dislocations. Our findings are consistent with the hexatic to liquid transition in the KTHNY scenario for two-dimensional melting.
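The orientational correlation function mentioned above is commonly built from the local sixfold bond-orientational order parameter ψ6(r_j) = (1/N_j) Σ_k exp(6iθ_jk), where the sum runs over the N_j neighbors of vortex j and θ_jk is the angle of the bond to neighbor k. The sketch below is a generic textbook version, not the authors' analysis pipeline; defining neighbors by a fixed distance cutoff (instead of, for example, a Delaunay triangulation) and averaging over pairs in a distance bin are simplifying assumptions.

```python
import cmath
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def psi6(points: Sequence[Point], cutoff: float) -> List[complex]:
    """Local sixfold order psi6_j = (1/N_j) * sum_k exp(6i * theta_jk) over the
    neighbors k closer than `cutoff` (assumed neighbor criterion)."""
    values = []
    for i, (xi, yi) in enumerate(points):
        acc, count = 0j, 0
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            dx, dy = xj - xi, yj - yi
            if math.hypot(dx, dy) < cutoff:
                acc += cmath.exp(6j * math.atan2(dy, dx))
                count += 1
        values.append(acc / count if count else 0j)
    return values


def g6(points: Sequence[Point], psi: Sequence[complex], r: float, dr: float) -> float:
    """Orientational correlation g6(r) = Re < psi6(r_i) * conj(psi6(r_j)) >,
    averaged over pairs whose separation lies in [r, r + dr)."""
    acc, count = 0j, 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            if r <= d < r + dr:
                acc += psi[i] * psi[j].conjugate()
                count += 1
    return (acc / count).real if count else float("nan")


def triangular_lattice(n: int, spacing: float = 1.0, angle: float = 0.0) -> List[Point]:
    """Perfect triangular lattice patch with (2n+1)^2 sites, optionally rotated."""
    c, s = math.cos(angle), math.sin(angle)
    pts = []
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            x, y = spacing * (i + 0.5 * j), spacing * (math.sqrt(3) / 2) * j
            pts.append((c * x - s * y, s * x + c * y))
    return pts
```

For an interior site of a perfect lattice, all six bond angles are multiples of 60 degrees, so |ψ6| = 1; rotating the lattice by an angle φ only changes the phase to exp(6iφ). Dislocations and a liquid-like arrangement reduce |ψ6| and make g6(r) decay with distance.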

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.04645

HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks

Authors: Yingting Li, Rishabh Bhardwaj, Ambuj Mehrish, Bo Cheng, Soujanya Poria

Abstract: Neural speech synthesis, or text-to-speech (TTS), aims to transform a signal from the text domain to the speech domain. While developing TTS architectures that train and test on the same set of speakers has seen significant improvements, out-of-domain speaker performance still faces enormous limitations. Domain adaptation on a new set of speakers can be achieved by fine-tuning the whole model for each new domain, thus making it parameter-inefficient. This problem can be solved by Adapters that provide a parameter-efficient alternative to domain adaptation. Although popular in NLP, Adapters have not brought much improvement to speech synthesis. In this work, we present HyperTTS, which comprises a small learnable network, "hypernetwork", that generates parameters of the Adapter blocks, allowing us to condition Adapters on speaker representations and making them dynamic. Extensive evaluations of two domain adaptation settings demonstrate its effectiveness in achieving state-of-the-art performance in the parameter-efficient regime. We also compare different variants of HyperTTS against baselines in different studies. Promising results on the dynamic adaptation of adapter parameters using hypernetworks open up new avenues for domain-generic multi-speaker TTS systems. The audio samples and code are available at https://github.com/declare-lab/HyperTTS.
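The core idea, a hypernetwork that emits Adapter weights conditioned on a speaker representation, can be sketched in a few lines. This is an illustrative simplification, not the HyperTTS code: the single linear hypernetwork, the bottleneck adapter with a ReLU and a residual connection, the class name `HyperAdapter`, and the initialization scale are all assumptions.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


class HyperAdapter:
    """Bottleneck adapter whose weights are generated by a linear hypernetwork
    from a speaker embedding (hypothetical simplification)."""

    def __init__(self, d_model: int, d_bottleneck: int, d_speaker: int, seed: int = 0):
        rng = random.Random(seed)
        self.d, self.r = d_model, d_bottleneck
        n_out = 2 * d_model * d_bottleneck  # entries of W_down (r x d) plus W_up (d x r)
        # In this sketch only the hypernetwork weights would be trained;
        # the TTS backbone is assumed to stay frozen.
        self.h = [[rng.gauss(0.0, 0.02) for _ in range(d_speaker)] for _ in range(n_out)]

    def trainable_parameters(self) -> int:
        return len(self.h) * len(self.h[0])

    def generate(self, speaker: Sequence[float]) -> Tuple[Matrix, Matrix]:
        """Map a speaker embedding to adapter weights (W_down, W_up)."""
        flat = matvec(self.h, speaker)
        d, r = self.d, self.r
        w_down = [flat[k * d:(k + 1) * d] for k in range(r)]
        w_up = [flat[r * d + k * r:r * d + (k + 1) * r] for k in range(d)]
        return w_down, w_up

    def __call__(self, hidden: Sequence[float], speaker: Sequence[float]) -> Vector:
        """Residual adapter: hidden + W_up @ relu(W_down @ hidden)."""
        w_down, w_up = self.generate(speaker)
        z = [max(0.0, v) for v in matvec(w_down, hidden)]
        return [h + u for h, u in zip(hidden, matvec(w_up, z))]
```

Because the adapter weights are a function of the speaker embedding, a new speaker changes the adapter without adding per-speaker parameters; the trainable count is fixed at 2 · d_model · d_bottleneck · d_speaker in this linear version.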

Submitted 6 April, 2024; originally announced April 2024.

arXiv:2404.03220

Commitments are equivalent to one-way state generators

Authors: Rishabh Batra, Rahul Jain

Abstract: One-way state generators (OWSG) are natural quantum analogs of classical one-way functions. We show that $O\left(\frac{n}{\log(n)}\right)$-copy OWSGs ($n$ represents the input length) are equivalent to $\mathrm{poly}(n)$-copy OWSGs and to quantum commitments. Since known results show that $o\left(\frac{n}{\log(n)}\right)$-copy OWSGs cannot imply commitments, this shows that $O\left(\frac{n}{\log(n)}\right)$-copy OWSGs are the weakest OWSGs from which we can get commitments (and hence much of quantum cryptography). Our construction follows along the lines of Håstad, Impagliazzo, Levin and Luby [HILL], who obtained classical pseudorandom generators (PRG) from classical one-way functions (OWF), however with crucial modifications. Our construction, when applied to the classical case, provides an alternative to the construction provided by [HILL]. Since we do not argue conditioned on the output of the one-way function, our construction and analysis are arguably simpler and may be of independent interest.

Submitted 17 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

Comments: minor changes to previous version

arXiv:2404.02894

Automated Transparency: A Legal and Empirical Analysis of the Digital Services Act Transparency Database

Authors: Rishabh Kaushal, Jacob van de Kerkhof, Catalina Goanta, Gerasimos Spanakis, Adriana Iamnitchi

Abstract: The Digital Services Act (DSA) is a much-awaited platform liability reform in the European Union that was adopted on 1 November 2022 with the ambition to set a global example in terms of accountability and transparency. Among other obligations, the DSA emphasizes the need for online platforms to report on their content moderation decisions ('statements of reasons' - SoRs), which is a novel transparency mechanism we refer to as automated transparency in this study. SoRs are currently made available in the DSA Transparency Database, launched by the European Commission in September 2023. The DSA Transparency Database marks a historic achievement in platform governance, and allows investigations about the actual transparency gains, both at structure level as well as at the level of platform compliance. This study aims to understand whether the Transparency Database helps the DSA to live up to its transparency promises. We use legal and empirical arguments to show that while there are some transparency gains, compliance remains problematic, as the current database structure allows for a lot of discretion from platforms in terms of transparency practices. In our empirical study, we analyze a representative sample of the Transparency Database (131m SoRs) submitted in November 2023, to characterise and evaluate platform content moderation practices.

Submitted 3 May, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

Comments: accepted to FAccT 2024; camera-ready version; 19 pages

arXiv:2403.18820

MetaCap: Meta-learning Priors from Multi-View Imagery for Sparse-view Human Performance Capture and Rendering

Authors: Guoxing Sun, Rishabh Dabral, Pascal Fua, Christian Theobalt, Marc Habermann

Abstract: Faithful human performance capture and free-view rendering from sparse RGB observations is a long-standing problem in Vision and Graphics. The main challenges are the lack of observations and the inherent ambiguities of the setting, e.g. occlusions and depth ambiguity. As a result, radiance fields, which have shown great promise in capturing high-frequency appearance and geometry details in dense setups, perform poorly when naïvely supervised on sparse camera views, as the field simply overfits to the sparse-view inputs. To address this, we propose MetaCap, a method for efficient and high-quality geometry recovery and novel view synthesis given very sparse or even a single view of the human. Our key idea is to meta-learn the radiance field weights solely from potentially sparse multi-view videos, which can serve as a prior when fine-tuning them on sparse imagery depicting the human. This prior provides a good network weight initialization, thereby effectively addressing ambiguities in sparse-view capture. Due to the articulated structure of the human body and motion-induced surface deformations, learning such a prior is non-trivial. Therefore, we propose to meta-learn the field weights in a pose-canonicalized space, which reduces the spatial feature range and makes feature learning more effective. Consequently, one can fine-tune our field parameters to quickly generalize to unseen poses, novel illumination conditions as well as novel and sparse (even monocular) camera views. For evaluating our method under different scenarios, we collect a new dataset, WildDynaCap, which contains subjects captured in both a dense camera dome and in-the-wild sparse camera rigs, and demonstrate superior results compared to recent state-of-the-art methods on both public datasets and WildDynaCap.

Submitted 27 March, 2024; originally announced March 2024.

Comments: Project page: https://vcai.mpi-inf.mpg.de/projects/MetaCap/

arXiv:2403.17936

ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis

Authors: Muhammad Hamza Mughal, Rishabh Dabral, Ikhsanul Habibie, Lucia Donatelli, Marc Habermann, Christian Theobalt

Abstract: Gestures play a key role in human communication. Recent methods for co-speech gesture generation, while managing to generate beat-aligned motions, struggle to generate gestures that are semantically aligned with the utterance. Compared to beat gestures that align naturally to the audio signal, semantically coherent gestures require modeling the complex interactions between the language and human motion, and can be controlled by focusing on certain words. Therefore, we present ConvoFusion, a diffusion-based approach for multi-modal gesture synthesis, which can not only generate gestures based on multi-modal speech inputs, but can also facilitate controllability in gesture synthesis. Our method proposes two guidance objectives that allow the users to modulate the impact of different conditioning modalities (e.g. audio vs text) as well as to choose certain words to be emphasized during gesturing. Our method is versatile in that it can be trained to generate either monologue gestures or conversational gestures. To further advance the research on multi-party interactive gestures, the DnD Group Gesture dataset is released, which contains 6 hours of gesture data showing 5 people interacting with one another. We compare our method with several recent works and demonstrate its effectiveness on a variety of tasks. We urge the reader to watch our supplementary video at our website.

Submitted 26 March, 2024; originally announced March 2024.

Comments: CVPR 2024. Project Page: https://vcai.mpi-inf.mpg.de/projects/ConvoFusion/

arXiv:2403.16012

Determination of Hilbert modular forms using squarefree coefficients

Authors: Rishabh Agnihotri, Krishnarjun Krishnamoorthy

Abstract: Let $F$ (over $\mathbb{Q}$) be a totally real number field of narrow class number $1$. We generalize a result of Kohnen on the determination of half integral weight modular forms by their Fourier coefficients supported on squarefree (algebraic) integers. We also give a soft proof that infinitely many Fourier coefficients supported on squarefree integers are non-vanishing.

Submitted 24 March, 2024; originally announced March 2024.

Comments: Comments and suggestions welcome

MSC Class: 11F41; 11F37; 11F27

arXiv:2403.15214

InstaSynth: Opportunities and Challenges in Generating Synthetic Instagram Data with ChatGPT for Sponsored Content Detection

Authors: Thales Bertaglia, Lily Heisig, Rishabh Kaushal, Adriana Iamnitchi

Abstract: Large Language Models (LLMs) raise concerns about lowering the cost of generating texts that could be used for unethical or illegal purposes, especially on social media. This paper investigates the promise of such models to help enforce legal requirements related to the disclosure of sponsored content online. We investigate the use of LLMs for generating synthetic Instagram captions with two objectives: The first objective (fidelity) is to produce realistic synthetic datasets. For this, we implement content-level and network-level metrics to assess whether synthetic captions are realistic. The second objective (utility) is to create synthetic data that is useful for sponsored content detection. For this, we evaluate the effectiveness of the generated synthetic data for training classifiers to identify undisclosed advertisements on Instagram. Our investigations show that the objectives of fidelity and utility may conflict and that prompt engineering is a useful but insufficient strategy. Additionally, we find that while individual synthetic posts may appear realistic, collectively they lack diversity, topic connectivity, and realistic user interaction patterns.

Submitted 22 March, 2024; originally announced March 2024.

Comments: To appear at the 18th International AAAI Conference on Web and Social Media (ICWSM 2024) -- please cite accordingly

arXiv:2403.14384

Krylov localization as a probe for ergodicity breaking

Authors: Heiko Georg Menzler, Rishabh Jha

Abstract: Krylov complexity has recently gained attention where the growth of operator complexity in time is measured in terms of the off-diagonal operator Lanczos coefficients. The operator Lanczos algorithm reduces the problem of complexity growth to a single-particle semi-infinite tight-binding chain (known as the Krylov chain). Employing the phenomenon of Anderson localization, we propose the inverse localization length on the Krylov chain as a probe to detect weak ergodicity-breaking. On the Krylov chain we find delocalization in an ergodic regime, as we show for the SYK model, and localization in the case of a weakly ergodicity-broken regime. Considering the dynamics beyond scrambling, we find a collapse across different system sizes at the point of weak ergodicity-breaking leading to a quantitative prediction. We further show universal traits of different operators in the ergodic regime beyond the scrambling dynamics. We test for two settings: (1) the coupled SYK model, and (2) the quantum East model. Our findings open avenues for mapping ergodicity/weak ergodicity-breaking transitions to delocalization/localization phenomenology on the Krylov chain.
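The operator Lanczos algorithm referred to above can be sketched as follows, assuming a finite-dimensional Hamiltonian H, the Liouvillian L = [H, ·], and the infinite-temperature inner product (A|B) = Tr(A†B)/D. This is a generic pure-Python textbook version, not the authors' code: it omits the full reorthogonalization usually needed for long chains, and it does not compute the inverse localization length proposed in the paper. The returned b_n play the role of hopping amplitudes on the Krylov chain.

```python
import math
from typing import List

Matrix = List[List[complex]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def combine(a: Matrix, b: Matrix, alpha: complex, beta: complex) -> Matrix:
    """Elementwise alpha * a + beta * b."""
    return [[alpha * x + beta * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def liouvillian(h: Matrix, o: Matrix) -> Matrix:
    """L O = [H, O]."""
    return combine(matmul(h, o), matmul(o, h), 1, -1)


def inner(a: Matrix, b: Matrix) -> complex:
    """Infinite-temperature inner product (A|B) = Tr(A^dagger B) / D."""
    d = len(a)
    return sum(x.conjugate() * y for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / d


def norm(a: Matrix) -> float:
    return math.sqrt(inner(a, a).real)


def lanczos_coefficients(h: Matrix, o: Matrix, max_steps: int = 50,
                         tol: float = 1e-10) -> List[float]:
    """Recursion A_n = L O_{n-1} - b_{n-1} O_{n-2}, b_n = ||A_n||, O_n = A_n / b_n.
    For Hermitian H and O the diagonal coefficients vanish, so only b_n are kept."""
    zero = [[0j] * len(o) for _ in o]
    prev, cur = zero, combine(o, zero, 1 / norm(o), 0)  # normalized O_0
    b_prev, bs = 0.0, []
    for _ in range(max_steps):
        a = combine(liouvillian(h, cur), prev, 1, -b_prev)
        b = norm(a)
        if b < tol:  # Krylov space exhausted
            break
        bs.append(b)
        prev, cur, b_prev = cur, combine(a, zero, 1 / b, 0), b
    return bs
```

For H = σz and O = σx the recursion stops after one step with b_1 = 2, because σx and iσy already span the Krylov space; an operator that commutes with H yields an empty list.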

Submitted 16 April, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: 5 pages and 4 figures plus references, followed by 13 pages of supplemental material containing 9 figures

arXiv:2403.10443

Celestial soft currents at one-loop and their OPEs

Authors: Rishabh Bhardwaj, Akshay Yelleshpur Srikant

Abstract: Conformally soft operators and their associated soft theorems on the celestial sphere encode the low energy behaviour of bulk scattering amplitudes. They lead to an infinite dimensional symmetry algebra of the celestial CFT at tree-level. In this paper, we introduce new operators in the celestial CFT in order to extend the definition of conformally soft currents to include one-loop effects. We then compute their OPEs with other operators in the theory. We also examine new subtleties that arise in defining OPEs of two conformally soft operators. We elucidate the connection between the new operators and loop corrected soft theorems in the bulk. Finally, we conclude by demonstrating how these operators fit into the framework of a logarithmic CFT.

Submitted 15 March, 2024; originally announced March 2024.

Comments: 23 pages, 0 figures

Showing 1–50 of 644 results for author: Rishabh