
Object-Oriented Material Classification and 3D Clustering for Improved Semantic Perception and Mapping in Mobile Robots

Authors: Siva Krishna Ravipati, Ehsan Latif, Ramviyas Parasuraman, Suchendra M. Bhandarkar

Abstract: Classification of different object surface material types can play a significant role in the decision-making algorithms for mobile robots and autonomous vehicles. RGB-based scene-level semantic segmentation has been well-addressed in the literature. However, improving material recognition using the depth modality and its integration with SLAM algorithms for 3D semantic mapping could unlock new potential benefits in the robotics perception pipeline. To this end, we propose a complementarity-aware deep learning approach for RGB-D-based material classification built on top of an object-oriented pipeline. The approach further integrates the ORB-SLAM2 method for 3D scene mapping with multiscale clustering of the detected material semantics in the point cloud map generated by the visual SLAM algorithm. Extensive experimental results with existing public datasets and newly contributed real-world robot datasets demonstrate a significant improvement in material classification and 3D clustering accuracy compared to state-of-the-art approaches for 3D semantic scene mapping.

Submitted 8 July, 2024; originally announced July 2024.

Comments: Accepted to IROS 2024

arXiv:2407.05931 [pdf, other]

Competing nucleation pathways in nanocrystal formation

Authors: Carlos R. Salazar, Akshay Krishna Ammothum Kandy, Jean Furstoss, Quentin Gromoff, Jacek Goniakowski, Julien Lam

Abstract: Despite numerous efforts from numerical approaches to complement experimental measurements, several fundamental challenges have still hindered one's ability to truly provide an atomistic picture of the nucleation process in nanocrystals. Among them, our study resolves three obstacles: (1) Machine-learning force fields including long-range interactions able to capture the finesse of the underlying atomic interactions, (2) Data-driven characterization of the local ordering in a complex structural landscape associated with several crystal polymorphs and (3) Comparing results from a large range of temperatures using both brute-force and rare-event sampling. Altogether, our simulation strategy has allowed us to study zinc oxide crystallization from nano-droplet melt. Remarkably, our results show that different nucleation pathways compete depending on the investigated degree of supercooling.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05528 [pdf, other]

An accurate detection is not all you need to combat label noise in web-noisy datasets

Authors: Paul Albert, Jack Valmadre, Eric Arazo, Tarun Krishna, Noel E. O'Connor, Kevin McGuinness

Abstract: Training a classifier on web-crawled data demands learning algorithms that are robust to annotation errors and irrelevant examples. This paper builds upon the recent empirical observation that applying unsupervised contrastive learning to noisy, web-crawled datasets yields a feature representation under which the in-distribution (ID) and out-of-distribution (OOD) samples are linearly separable. We show that direct estimation of the separating hyperplane can indeed offer an accurate detection of OOD samples, and yet, surprisingly, this detection does not translate into gains in classification accuracy. Digging deeper into this phenomenon, we discover that the near-perfect detection misses a type of clean examples that are valuable for supervised learning. These examples often represent visually simple images, which are relatively easy to identify as clean examples using standard loss- or distance-based methods despite being poorly separated from the OOD distribution using unsupervised learning. Because we further observe a low correlation with SOTA metrics, this urges us to propose a hybrid solution that alternates between noise detection using linear separation and a state-of-the-art (SOTA) small-loss approach. When combined with the SOTA algorithm PLS, we substantially improve SOTA results for real-world image classification in the presence of web noise. Code is available at github.com/PaulAlbert31/LSA

Submitted 7 July, 2024; originally announced July 2024.

Comments: Accepted in the European Conference on Computer Vision (ECCV) 2024

arXiv:2407.05266 [pdf, other]

CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs

Authors: Akshat Ramachandran, Souvik Kundu, Tushar Krishna

Abstract: We present CLAMP-ViT, a data-free post-training quantization method for vision transformers (ViTs). We identify the limitations of recent techniques, notably their inability to leverage meaningful inter-patch relationships, leading to the generation of simplistic and semantically vague data, impacting quantization accuracy. CLAMP-ViT employs a two-stage approach, cyclically adapting between data generation and model quantization. Specifically, we incorporate a patch-level contrastive learning scheme to generate richer, semantically meaningful data. Furthermore, we leverage contrastive learning in layer-wise evolutionary search for fixed- and mixed-precision quantization to identify optimal quantization parameters while mitigating the effects of a non-smooth loss landscape. Extensive evaluations across various vision tasks demonstrate the superiority of CLAMP-ViT, with performance improvements of up to 3% in top-1 accuracy for classification, 0.6 mAP for object detection, and 1.5 mIoU for segmentation at similar or better compression ratio over existing alternatives. Code is available at https://github.com/georgia-tech-synergy-lab/CLAMP-ViT.git

Submitted 7 July, 2024; originally announced July 2024.

Comments: Accepted to ECCV 2024

arXiv:2407.05185 [pdf]

doi 10.1016/j.compgeo.2024.106525

Sequential hybrid finite element and material point method to simulate slope failures

Authors: Brent Sordo, Ellen Rathje, Krishna Kumar

Abstract: Numerical modeling of slope failures seeks to predict two key phenomena: the initiation of failure and the post-failure runout. Currently, most modeling methods for slope failure analysis excel at one of these two but are deficient in the other. For example, the Finite Element Method (FEM) models the initiation of instability well but quickly loses accuracy when modeling large deformations because of mesh distortion, restricting its ability to predict runout. Conversely, the Material Point Method (MPM) utilizes material points which move freely across a background grid, allowing for indefinite deformations without computational issues. However, MPM is restricted in its ability to model slope failure initiation due to limitations of the available boundary conditions and reduced accuracy of its stress distributions. The sequential hybridization of these two methods, initiating a model in FEM and then transferring to MPM, presents an opportunity to accurately capture both initiation and runout by a single model. The exact time for this transfer is not self-apparent, but it must be conducted after the initiation mechanism and before excessive mesh distortion. By simulating two granular column failures and two slope failures, we demonstrate the effectiveness of this hybrid FEM-MPM method and identify the appropriate time to transfer.

Submitted 6 July, 2024; originally announced July 2024.

Journal ref: Computers and Geotechnics (2024)

arXiv:2407.04953 [pdf, other]

Effective-LDAM: An Effective Loss Function To Mitigate Data Imbalance for Robust Chest X-Ray Disease Classification

Authors: Sree Rama Vamsidhar S, Bhargava Satya, Rama Krishna Gorthi

Abstract: Deep Learning (DL) approaches have gained prominence in medical imaging for disease diagnosis. Chest X-ray (CXR) classification has emerged as an effective method for detecting various diseases. Among these methodologies, Chest X-ray (CXR) classification has proven to be an effective approach for detecting and analyzing various diseases. However, the reliable performance of DL classification algorithms is dependent upon access to large and balanced datasets, which pose challenges in medical imaging due to the impracticality of acquiring sufficient data for every disease category. To tackle this problem, we propose an algorithmic-centric approach called Effective-Label Distribution Aware Margin (E-LDAM), which modifies the margin of the widely adopted Label Distribution Aware Margin (LDAM) loss function using an effective number of samples in each class. Experimental evaluations on the COVIDx CXR dataset focus on Normal, Pneumonia, and COVID-19 classification. The experimental results demonstrate the effectiveness of the proposed E-LDAM approach, achieving a remarkable recall score of 97.81% for the minority class (COVID-19) in CXR image prediction. Furthermore, the overall accuracy of the three-class classification task attains an impressive level of 95.26%.

Submitted 6 July, 2024; originally announced July 2024.

arXiv:2407.04865 [pdf, other]

A differentiable Gillespie algorithm for simulating chemical kinetics, parameter estimation, and designing synthetic biological circuits

Authors: Krishna Rijal, Pankaj Mehta

Abstract: The Gillespie algorithm is commonly used to simulate and analyze complex chemical reaction networks. Here, we leverage recent breakthroughs in deep learning to develop a fully differentiable variant of the Gillespie algorithm. The differentiable Gillespie algorithm (DGA) approximates discontinuous operations in the exact Gillespie algorithm using smooth functions, allowing for the calculation of gradients using backpropagation. The DGA can be used to quickly and accurately learn kinetic parameters using gradient descent and design biochemical networks with desired properties. As an illustration, we apply the DGA to study stochastic models of gene promoters. We show that the DGA can be used to: (i) successfully learn kinetic parameters from experimental measurements of mRNA expression levels from two distinct $\textit{E. coli}$ promoters and (ii) design nonequilibrium promoter architectures with desired input-output relationships. These examples illustrate the utility of the DGA for analyzing stochastic chemical kinetics, including a wide variety of problems of interest to synthetic and systems biology.

Submitted 5 July, 2024; originally announced July 2024.
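
For orientation, the exact (non-differentiable) Gillespie algorithm that the abstract above starts from can be sketched as follows. This is a minimal illustration for a hypothetical birth-death process, not the authors' DGA code; the function name `gillespie_birth_death` and the parameters `k_birth`, `k_death`, `n0`, `t_max` and `seed` are assumptions made for the example. The comments mark the hard reaction choice, which is one of the discontinuous operations a smooth approximation would have to replace.

```python
import math
import random


def gillespie_birth_death(k_birth, k_death, n0, t_max, seed=0):
    """Exact Gillespie simulation of a birth-death process (illustrative only).

    Reactions: 0 -> X at rate k_birth, X -> 0 at rate k_death * n.
    Returns the event times and the molecule count after each event.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        birth_rate = k_birth
        death_rate = k_death * n
        total = birth_rate + death_rate
        if total == 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next event is exponential with rate `total`.
        t += -math.log(1.0 - rng.random()) / total
        if t > t_max:
            break
        # Hard, discrete choice of which reaction fires. This threshold
        # comparison is discontinuous in the rate parameters.
        if rng.random() * total < birth_rate:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts
```

Calling `gillespie_birth_death(10.0, 1.0, 0, 5.0)` returns one stochastic trajectory; averaging many trajectories with different seeds should approach the stationary mean `k_birth / k_death` for this toy model.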

arXiv:2407.04815 [pdf, other]

NSD-DIL: Null-Shot Deblurring Using Deep Identity Learning

Authors: Sree Rama Vamsidhar S, Rama Krishna Gorthi

Abstract: In this paper, we propose to reformulate the blind image deblurring task to directly learn an inverse of the degradation model using a deep linear network. We introduce Deep Identity Learning (DIL), a novel learning strategy that includes a dedicated regularization term based on the properties of linear systems, to exploit the identity relation between the degradation and inverse degradation models. The salient aspect of our proposed framework is that it relies neither on a deblurring dataset nor on a single input blurred image (like Polyblur, a self-supervised method). Since it is purely image-data-independent, we term our model Null-Shot deblurring Using Deep Identity Learning (NSD-DIL). We also provide an explicit representation of the learned deep linear network in a matrix form, called the Deep Restoration Kernel (DRK), for the deblurring task. The proposed framework detours the typical degradation kernel estimation step involved in most of the existing blind deblurring solutions by the proposition of our Random Kernel Gallery (RKG) dataset. In this work, we focus on the restoration of mild blur images, generated by small out-of-focus, lens blur, or slight camera motion, which often occurs in real images. Our experiments show that the proposed method outperforms both traditional and deep learning based deblurring methods, with at least an order of 100 lesser computational resources. The proposed NSD-DIL method can be effortlessly extended to the Image Super-Resolution (ISR) task as well to restore the low-resolution images with fine details. The NSD-DIL model and its kernel form representation (DRK) are lightweight yet robust and restore the mild blur input in a fraction of a second. Hence, they are more suitable for a wide range of real-time applications.

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.04325 [pdf, other]

Understanding the Role of Invariance in Transfer Learning

Authors: Till Speicher, Vedant Nanda, Krishna P. Gummadi

Abstract: Transfer learning is a powerful technique for knowledge-sharing between different tasks. Recent work has found that the representations of models with certain invariances, such as to adversarial input perturbations, achieve higher performance on downstream tasks. These findings suggest that invariance may be an important property in the context of transfer learning. However, the relationship of invariance with transfer performance is not fully understood yet and a number of questions remain. For instance, how important is invariance compared to other factors of the pretraining task? How transferable is learned invariance? In this work, we systematically investigate the importance of representational invariance for transfer learning, as well as how it interacts with other parameters during pretraining. To do so, we introduce a family of synthetic datasets that allow us to precisely control factors of variation both in training and test data. Using these datasets, we a) show that for learning representations with high transfer performance, invariance to the right transformations is as, or often more, important than most other factors such as the number of training samples, the model architecture and the identity of the pretraining classes, b) show conditions under which invariance can harm the ability to transfer representations and c) explore how transferable invariance is between tasks. The code is available at https://github.com/tillspeicher/representation-invariance-transfer.

Submitted 5 July, 2024; originally announced July 2024.

Comments: Published at TMLR 2024

arXiv:2407.03462 [pdf, other]

An Update on the External Calibrator for Hydrogen Observatories (ECHO)

Authors: Yifan Zhao, Daniel C. Jacobs, Titu Samson, Mrudula Gopal Krishna, Michael Horn, Marc-Olivier R. Lalonde, Raven Braithwaite, Logan Skabelund

Abstract: Precision measurements of the beam pattern response are needed to predict the response of a radio telescope. Mapping the beam of a low frequency radio array presents a unique challenge and science cases such as the observation of the 21 cm line at high redshift have demanding requirements. Drone-based systems offer the unique potential for a measurement which is entirely under experimenter control, but progress has been paced by practical implementation challenges. Previously, a prototype drone system, called the External Calibrator for Hydrogen Observatories (ECHO), demonstrated good performance in making a complete hemispherical beam measurement. This paper reports updates to the system focusing on performance of a new drone platform, minimizing interference from the drone, and a new transmitter.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03267 [pdf]

Insulator-to-Metal Transition and Isotropic Gigantic Magnetoresistance in Layered Magnetic Semiconductors

Authors: Gokul Acharya, Bimal Neupane, Chia-Hsiu Hsu, Xian P. Yang, David Graf, Eun Sang Choi, Krishna Pandey, Md Rafique Un Nabi, Santosh Karki Chhetri, Rabindra Basnet, Sumaya Rahman, Jian Wang, Zhengxin Hu, Bo Da, Hugh Churchill, Guoqing Chang, M. Zahid Hasan, Yuanxi Wang, ** Hu

Abstract: Magnetotransport, the response of electrical conduction to external magnetic field, acts as an important tool to reveal fundamental concepts behind exotic phenomena and plays a key role in enabling spintronic applications. Magnetotransport is generally sensitive to magnetic field orientations. In contrast, efficient and isotropic modulation of electronic transport, which is useful in technology applications such as omnidirectional sensing, is rarely seen, especially for pristine crystals. Here we propose a strategy to realize extremely strong modulation of electron conduction by magnetic field which is independent of field direction. GdPS, a layered antiferromagnetic semiconductor with resistivity anisotropies, supports a field-driven insulator-to-metal transition with a paradoxically isotropic gigantic negative magnetoresistance insensitive to magnetic field orientations. This isotropic magnetoresistance originates from the combined effects of a near-zero spin-orbit coupling of Gd3+-based half-filling f-electron system and the strong on-site f-d exchange coupling in Gd atoms. Our results not only provide a novel material system with extraordinary magnetotransport that offers a missing block for antiferromagnet-based ultrafast and efficient spintronic devices, but also demonstrate the key ingredients for designing magnetic materials with desired transport properties for advanced functionalities.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 44 pages, 18 figures

arXiv:2407.03093 [pdf, other]

Revisiting the Performance of Deep Learning-Based Vulnerability Detection on Realistic Datasets

Authors: Partha Chakraborty, Krishna Kanth Arumugam, Mahmoud Alfadel, Meiyappan Nagappan, Shane McIntosh

Abstract: The impact of software vulnerabilities on everyday software systems is significant. Despite deep learning models being proposed for vulnerability detection, their reliability is questionable. Prior evaluations show high recall/F1 scores of up to 99%, but these models underperform in practical scenarios, particularly when assessed on entire codebases rather than just the fixing commit. This paper introduces Real-Vul, a comprehensive dataset representing real-world scenarios for evaluating vulnerability detection models. Evaluating DeepWukong, LineVul, ReVeal, and IVDetect shows a significant drop in performance, with precision decreasing by up to 95 percentage points and F1 scores by up to 91 points. Furthermore, model performance fluctuates based on vulnerability characteristics, with better F1 scores for information leaks or code injection than for path resolution or predictable return values. The results highlight a significant performance gap that needs addressing before deploying deep learning-based vulnerability detection in practical settings. Overfitting is identified as a key issue, and an augmentation technique is proposed, potentially improving performance by up to 30%. Contributions include a dataset creation approach for better model evaluation, the Real-Vul dataset, and empirical evidence of deep learning models struggling in real-world settings.

Submitted 3 July, 2024; originally announced July 2024.

ACM Class: D.2; I.2

Journal ref: 10.1109/TSE.2024.3423712

arXiv:2407.02960 [pdf, other]

ObfuscaTune: Obfuscated Offsite Fine-tuning and Inference of Proprietary LLMs on Private Datasets

Authors: Ahmed Frikha, Nassim Walha, Ricardo Mendes, Krishna Kanth Nakka, Xue Jiang, Xuebing Zhou

Abstract: This work addresses the timely yet underexplored problem of performing inference and finetuning of a proprietary LLM owned by a model provider entity on the confidential/private data of another data owner entity, in a way that ensures the confidentiality of both the model and the data. Hereby, the finetuning is conducted offsite, i.e., on the computation infrastructure of a third-party cloud provider. We tackle this problem by proposing ObfuscaTune, a novel, efficient and fully utility-preserving approach that combines a simple yet effective obfuscation technique with an efficient usage of confidential computing (only 5% of the model parameters are placed on TEE). We empirically demonstrate the effectiveness of ObfuscaTune by validating it on GPT-2 models with different sizes on four NLP benchmark datasets. Finally, we compare to a naïve version of our approach to highlight the necessity of using random matrices with low condition numbers in our approach to reduce errors induced by the obfuscation.

Submitted 3 July, 2024; originally announced July 2024.

Comments: Preprint

arXiv:2407.02956 [pdf, other]

IncogniText: Privacy-enhancing Conditional Text Anonymization via LLM-based Private Attribute Randomization

Authors: Ahmed Frikha, Nassim Walha, Krishna Kanth Nakka, Ricardo Mendes, Xue Jiang, Xuebing Zhou

Abstract: In this work, we address the problem of text anonymization where the goal is to prevent adversaries from correctly inferring private attributes of the author, while keeping the text utility, i.e., meaning and semantics. We propose IncogniText, a technique that anonymizes the text to mislead a potential adversary into predicting a wrong private attribute value. Our empirical evaluation shows a reduction of private attribute leakage by more than 90%. Finally, we demonstrate the maturity of IncogniText for real-world applications by distilling its anonymization capability into a set of LoRA parameters associated with an on-device model.

Submitted 3 July, 2024; originally announced July 2024.

Comments: Preprint

arXiv:2407.02943 [pdf, other]

PII-Compass: Guiding LLM training data extraction prompts towards the target PII via grounding

Authors: Krishna Kanth Nakka, Ahmed Frikha, Ricardo Mendes, Xue Jiang, Xuebing Zhou

Abstract: The latest and most impactful advances in large models stem from their increased size. Unfortunately, this translates into an improved memorization capacity, raising data privacy concerns. Specifically, it has been shown that models can output personal identifiable information (PII) contained in their training data. However, reported PII extraction performance varies widely, and there is no consensus on the optimal methodology to evaluate this risk, resulting in underestimating realistic adversaries. In this work, we empirically demonstrate that it is possible to improve the extractability of PII by over ten-fold by grounding the prefix of the manually constructed extraction prompt with in-domain data. Our approach, PII-Compass, achieves phone number extraction rates of 0.92%, 3.9%, and 6.86% with 1, 128, and 2308 queries, respectively, i.e., the phone number of 1 person in 15 is extractable.

Submitted 3 July, 2024; originally announced July 2024.

Comments: Accepted at ACL 2024

arXiv:2407.02543 [pdf, other]

Towards the Next Frontier in Speech Representation Learning Using Disentanglement

Authors: Varun Krishna, Sriram Ganapathy

Abstract: The popular frameworks for self-supervised learning of speech representations have largely focused on frame-level masked prediction of speech regions. While this has shown promising downstream task performance for speech recognition and related tasks, this has largely ignored factors of speech that are encoded at a coarser level, like characteristics of the speaker or channel that remain consistent throughout a speech utterance. In this work, we propose a framework for Learning Disentangled Self Supervised (termed as Learn2Diss) representations of speech, which consists of frame-level and utterance-level encoder modules. The two encoders are initially learned independently, where the frame-level model is largely inspired by existing self supervision techniques, thereby learning pseudo-phonemic representations, while the utterance-level encoder is inspired by contrastive learning of pooled embeddings, thereby learning pseudo-speaker representations. The joint learning of these two modules consists of disentangling the two encoders using a mutual information based criterion. With several downstream evaluation experiments, we show that the proposed Learn2Diss achieves state-of-the-art results on a variety of tasks, with the frame-level encoder representations improving semantic tasks, while the utterance-level representations improve non-semantic tasks.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.02013 [pdf, other]

DiGRAF: Diffeomorphic Graph-Adaptive Activation Function

Authors: Krishna Sri Ipsit Mantri, Xinzhi Wang, Carola-Bibiane Schönlieb, Bruno Ribeiro, Beatrice Bevilacqua, Moshe Eliasof

Abstract: In this paper, we propose a novel activation function tailored specifically for graph data in Graph Neural Networks (GNNs). Motivated by the need for graph-adaptive and flexible activation functions, we introduce DiGRAF, leveraging Continuous Piecewise-Affine Based (CPAB) transformations, which we augment with an additional GNN to learn a graph-adaptive diffeomorphic activation function in an end-to-end manner. In addition to its graph-adaptivity and flexibility, DiGRAF also possesses properties that are widely recognized as desirable for activation functions, such as differentiability, boundedness within the domain and computational efficiency. We conduct an extensive set of experiments across diverse datasets and tasks, demonstrating a consistent and superior performance of DiGRAF compared to traditional and graph-specific activation functions, highlighting its effectiveness as an activation function for GNNs.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.01732 [pdf, other]

Investigating Nudges toward Related Sellers on E-commerce Marketplaces: A Case Study on Amazon

Authors: Abhisek Dash, Abhijnan Chakraborty, Saptarshi Ghosh, Animesh Mukherjee, Krishna P. Gummadi

Abstract: E-commerce marketplaces provide business opportunities to millions of sellers worldwide. Some of these sellers have special relationships with the marketplace by virtue of using their subsidiary services (e.g., fulfillment and/or shipping services provided by the marketplace) -- we refer to such sellers collectively as Related Sellers. When multiple sellers offer to sell the same product, the marketplace helps a customer in selecting an offer (by a seller) through (a) a default offer selection algorithm, (b) showing features about each of the offers and the corresponding sellers (price, seller performance metrics, seller's number of ratings etc.), and (c) finally evaluating the sellers along these features. In this paper, we perform an end-to-end investigation into how the above apparatus can nudge customers toward the Related Sellers on Amazon's four different marketplaces in India, USA, Germany and France. We find that given explicit choices, customers' preferred offers and algorithmically selected offers can be significantly different. We highlight that Amazon is adopting different performance metric evaluation policies for different sellers, potentially benefiting Related Sellers. For instance, such policies result in notable discrepancy between the actual performance metric and the presented performance metric of Related Sellers.
We further observe that among the seller-centric features visible to customers, sellers' number of ratings influences their decisions the most, yet it may not reflect the true quality of service by the seller, rather reflecting the scale at which the seller operates, thereby implicitly steering customers toward larger Related Sellers. Moreover, when customers are shown the rectified metrics for the different sellers, their preference toward Related Sellers is almost halved.

Submitted 1 July, 2024; originally announced July 2024.

Comments: This work has been accepted for presentation at the ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW) 2024. It will appear in Proceedings of the ACM on Human-Computer Interaction

arXiv:2407.01549 [pdf, other]

FFT and Linear Convolution Implementation with Bit Slicing Multiplier: A Novel Approach

Authors: Aravind Kumar N, Hari Krishna S, Anita Angeline A

Abstract: This paper presents a comprehensive exploration of Fast Fourier Transform (FFT) and linear convolution implementations, integrating both conventional methods and novel approaches leveraging the Bit Slicing Multiplier (BSM) technique. The Bit Slicing Multiplier utilizes Look-Up Tables (LUTs) to execute bitwise operations in parallel, offering efficient arithmetic operations ideally suited for digital signal processing tasks. We extensively investigate the integration of BSM into FFT and linear convolution algorithms, emphasizing its advantages in terms of speed and resource utilization. Additionally, we introduce our own innovative ideas for FFT and convolution algorithms, contributing to the broader discourse on efficient signal processing techniques. Experimental validation of our implementations is conducted using Vivado, a leading FPGA synthesis and implementation tool. Comparative analysis demonstrates the superior performance of our BSM-enhanced approaches, showcasing their potential for real-time signal processing applications. This study not only advances the understanding of FFT and convolution implementations but also highlights the effectiveness of novel techniques like BSM in enhancing computational efficiency in FPGA-based systems.

Submitted 25 April, 2024; originally announced July 2024.

arXiv:2407.00385 [pdf, other]

Sparse Actuator Scheduling for Discrete-Time Linear Dynamical Systems

Authors: Krishna Praveen V. S. Kondapi, Chandrasekhar Sriram, Geethu Joseph, Chandra R. Murthy

Abstract: We consider the control of discrete-time linear dynamical systems using sparse inputs where we limit the number of active actuators at every time step. We develop an algorithm for determining a sparse actuator schedule that ensures the existence of a sparse control input sequence, following the schedule, that takes the system from any given initial state to any desired final state. Since such an actuator schedule is not unique, we look for a schedule that minimizes the energy of sparse inputs. For this, we optimize the trace of the inverse of the resulting controllability Gramian, which is an approximate measure of the average energy of the inputs. We present a greedy algorithm along with its theoretical guarantees. Finally, we empirically show that our greedy algorithm ensures the controllability of the linear system with a small number of active actuators per time step without a significant average energy expenditure compared to the fully actuated system.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2407.00167 [pdf, other]

Can GPT-4 Help Detect Quit Vaping Intentions? An Exploration of Automatic Data Annotation Approach

Authors: Sai Krishna Revanth Vuruma, Dezhi Wu, Saborny Sen Gupta, Lucas Aust, Valerie Lookingbill, Wyatt Bellamy, Yang Ren, Erin Kasson, Li-Shiun Chen, Patricia Cavazos-Rehg, Dian Hu, Ming Huang

Abstract: In recent years, the United States has witnessed a significant surge in the popularity of vaping or e-cigarette use, leading to a notable rise in cases of e-cigarette and vaping use-associated lung injury (EVALI) that caused hospitalizations and fatalities during the EVALI outbreak in 2019, highlighting the urgency to comprehend vaping behaviors and develop effective strategies for cessation. Due to the ubiquity of social media platforms, over 4.7 billion users worldwide use them for connectivity, communications, news, and entertainment with a significant portion of the discourse related to health, thereby establishing social media data as an invaluable organic data resource for public health research. In this study, we extracted a sample dataset from one vaping sub-community on Reddit to analyze users' quit-vaping intentions. Leveraging OpenAI's latest large language model GPT-4 for sentence-level quit vaping intention detection, this study compares the outcomes of this model against layman and clinical expert annotations. Using different prompting strategies such as zero-shot, one-shot, few-shot and chain-of-thought prompting, we developed 8 prompts with varying levels of detail to explain the task to GPT-4 and also evaluated the performance of the strategies against each other. These preliminary findings emphasize the potential of GPT-4 in social media data analysis, especially in identifying users' subtle intentions that may elude human detection.

Submitted 28 June, 2024; originally announced July 2024.

Comments: Accepted for the AI Applications in Public Health and Social Services workshop at the 22nd International Conference on Artificial Intelligence in Medicine (AIME 2024)

arXiv:2406.19954 [pdf, other]

BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5

Authors: Zhehuai Chen, He Huang, Oleksii Hrinchuk, Krishna C. Puvvada, Nithin Rao Koluguri, Piotr Żelasko, Jagadeesh Balam, Boris Ginsburg

Abstract: Incorporating speech understanding capabilities into pretrained large-language models has become a vital research direction (SpeechLLM). The previous architectures can be categorized as: i) GPT-style, prepend speech prompts to the text prompts as a sequence of LLM inputs like a decoder-only model; ii) T5-style, introduce speech cross-attention to each layer of the pretrained LLMs. We propose the BESTOW architecture to bring the BESt features from TwO Worlds into a single model that is highly efficient and has strong multitask capabilities. Moreover, there is no clear streaming solution for either style, especially considering the solution should generalize to speech multitask. We reformulate streamable SpeechLLM as a read-write policy problem and unify the offline and streaming research with the BESTOW architecture. Hence we demonstrate the first open-source SpeechLLM solution that enables Streaming and Multitask at scale (beyond ASR) at the same time. This streamable solution achieves very strong performance on a wide range of speech tasks (ASR, AST, SQA, unseen DynamicSuperb). It is end-to-end optimizable, with lower training/inference cost, and demonstrates LLM knowledge transferability to speech.

Submitted 28 June, 2024; originally announced June 2024.

MSC Class: 68T10 ACM Class: I.2.7

arXiv:2406.19738 [pdf, other]

Classical Bandit Algorithms for Entanglement Detection in Parameterized Qubit States

Authors: Bharati. K, Vikesh Siddhu, Krishna Jagannathan

Abstract: Entanglement is a key resource for a wide range of tasks in quantum information and computing. Thus, verifying availability of this quantum resource is essential. Extensive research on entanglement detection has led to no-go theorems (Lu et al. [Phys. Rev. Lett., 116, 230501 (2016)]) that highlight the need for full state tomography (FST) in the absence of adaptive or joint measurements. Recent advancements, as proposed by Zhu, Teo, and Englert [Phys. Rev. A, 81, 052339, 2010], introduce a single-parameter family of entanglement witness measurements which are capable of conclusively detecting certain entangled states and only resort to FST when all witness measurements are inconclusive. We find a variety of realistic noisy two-qubit quantum states $\mathcal{F}$ that yield conclusive results under this witness family. We solve the problem of detecting entanglement among $K$ quantum states in $\mathcal{F}$, of which $m$ states are entangled, with $m$ potentially unknown. We recognize a structural connection of this problem to the Bad Arm Identification problem in stochastic Multi-Armed Bandits (MAB). In contrast to existing quantum bandit frameworks, we establish a new correspondence tailored for entanglement detection and term it the $(m,K)$-quantum Multi-Armed Bandit. We implement two well-known MAB policies for arbitrary states derived from $\mathcal{F}$, present theoretical guarantees on the measurement/sample complexity and demonstrate the practicality of the policies through numerical simulations.
More broadly, this paper highlights the potential for employing classical machine learning techniques for quantum entanglement detection.

Submitted 28 June, 2024; originally announced June 2024.

Comments: 20 pages, 5 figures

arXiv:2406.19674 [pdf, other]

Less is More: Accurate Speech Recognition & Translation without Web-Scale Data

Authors: Krishna C. Puvvada, Piotr Żelasko, He Huang, Oleksii Hrinchuk, Nithin Rao Koluguri, Kunal Dhawan, Somshubra Majumdar, Elena Rastorgueva, Zhehuai Chen, Vitaly Lavrukhin, Jagadeesh Balam, Boris Ginsburg

Abstract: Recent advances in speech recognition and translation rely on hundreds of thousands of hours of Internet speech data. We argue that state-of-the-art accuracy can be reached without relying on web-scale data. Canary, a multilingual ASR and speech translation model, outperforms current state-of-the-art models (Whisper, OWSM, and Seamless-M4T) on English, French, Spanish, and German languages, while being trained on an order of magnitude less data than these models. Three key factors enable such a data-efficient model: (1) a FastConformer-based attention encoder-decoder architecture, (2) training on synthetic data generated with machine translation, and (3) advanced training techniques: data-balancing, dynamic data blending, dynamic bucketing and noise-robust fine-tuning. The model, weights, and training code will be open-sourced.

Submitted 28 June, 2024; originally announced June 2024.

Comments: Accepted at Interspeech-2024

arXiv:2406.19580 [pdf, other]

FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models

Authors: Saeed Rashidi, William Won, Sudarshan Srinivasan, Puneet Gupta, Tushar Krishna

Abstract: Distributed Deep Neural Network (DNN) training is a technique to reduce the training overhead by distributing the training tasks into multiple accelerators, according to a parallelization strategy. However, high-performance compute and interconnects are needed for maximum speed-up and linear scaling of the system. Wafer-scale systems are a promising technology that allows for tightly integrating high-end accelerators with high-speed wafer-scale interconnects, making them an attractive platform for distributed training. However, the wafer-scale interconnect should offer high performance and flexibility for various parallelization strategies to enable maximum optimizations for compute and memory usage. In this paper, we propose FRED, a wafer-scale interconnect that is tailored for the high-BW requirements of wafer-scale networks and can efficiently execute communication patterns of different parallelization strategies. Furthermore, FRED supports in-switch collective communication execution that reduces the network traffic by approximately 2X. Our results show that FRED can improve the average end-to-end training time of ResNet-152, Transformer-17B, GPT-3, and Transformer-1T by 1.76X, 1.87X, 1.34X, and 1.4X, respectively, when compared to a baseline wafer-scale 2D-Mesh fabric.

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.18915 [pdf, other]

Manipulate-Anything: Automating Real-World Robots using Vision-Language Models

Authors: Jiafei Duan, Wentao Yuan, Wilbert Pumacay, Yi Ru Wang, Kiana Ehsani, Dieter Fox, Ranjay Krishna

Abstract: Large-scale endeavors like RT-1 and widespread community efforts such as Open-X-Embodiment have contributed to growing the scale of robot demonstration data. However, there is still an opportunity to improve the quality, quantity, and diversity of robot demonstration data. Although vision-language models have been shown to automatically generate demonstration data, their utility has been limited to environments with privileged state information, they require hand-designed skills, and are limited to interactions with few object instances. We propose Manipulate-Anything, a scalable automated generation method for real-world robotic manipulation. Unlike prior work, our method can operate in real-world environments without any privileged state information, hand-designed skills, and can manipulate any static object. We evaluate our method using two setups. First, Manipulate-Anything successfully generates trajectories for all 5 real-world and 12 simulation tasks, significantly outperforming existing methods like VoxPoser. Second, Manipulate-Anything's demonstrations can train more robust behavior cloning policies than training with human demonstrations, or from data generated by VoxPoser and Code-As-Policies. We believe Manipulate-Anything can be the scalable method for both generating data for robotics and solving novel tasks in a zero-shot setting.

Submitted 27 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

Comments: Project page: https://robot-ma.github.io/

arXiv:2406.18095 [pdf, other]

Observational Evidence to Logistic Dark Energy Driving the Accelerating Universe

Authors: Sarath Nelleri, Gopi Krishna, Navaneeth Poonthottathil

Abstract: We present the logistic dark energy model (LDEM), where the dark energy density follows a logistic function for the scale factor. The equation of state parameter of dark energy ($w_D$) transitioned from $-1$ in the distant past to its current value of $-0.76$, closely resembling the $Λ$CDM model in the early epoch and showing significant deviation in the late phase. The evolution of the deceleration parameter in the LDEM signifies its success in explaining the late-time cosmic acceleration. Model selection based on the Bayesian Information Criterion (BIC), incorporating observations from Type Ia Supernovae (SNe Ia), Observational Hubble data (OHD), and Baryon Acoustic Oscillation (BAO), strongly favors the LDEM over the conventional $Λ$CDM model, where BIC is estimated to be $\sim -20$. Incorporating the shift parameter derived from the Cosmic Microwave Background (CMB) data shows competing evidence of the LDEM over the standard $Λ$CDM. Remarkably, the Hubble constant ($H_0$) value computed using any of the datasets tends to align closely with the predictions from the CMB, suggesting a need to reconsider the local measurement.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.17968 [pdf, other]

Efficient Document Ranking with Learnable Late Interactions

Authors: Ziwei Ji, Himanshu Jain, Andreas Veit, Sashank J. Reddi, Sadeep Jayasumana, Ankit Singh Rawat, Aditya Krishna Menon, Felix Yu, Sanjiv Kumar

Abstract: Cross-Encoder (CE) and Dual-Encoder (DE) models are two fundamental approaches for query-document relevance in information retrieval. To predict relevance, CE models use joint query-document embeddings, while DE models maintain factorized query and document embeddings; usually, the former has higher quality while the latter benefits from lower latency. Recently, late-interaction models have been proposed to realize more favorable latency-quality tradeoffs, by using a DE structure followed by a lightweight scorer based on query and document token embeddings. However, these lightweight scorers are often hand-crafted, and there is no understanding of their approximation power; further, such scorers require access to individual document token embeddings, which imposes an increased latency and storage burden. In this paper, we propose novel learnable late-interaction models (LITE) that resolve these issues. Theoretically, we prove that LITE is a universal approximator of continuous scoring functions, even for relatively small embedding dimension. Empirically, LITE outperforms previous late-interaction models such as ColBERT on both in-domain and zero-shot re-ranking tasks. For instance, experiments on MS MARCO passage re-ranking show that LITE not only yields a model with better generalization, but also lowers latency and requires 0.25x storage compared to ColBERT.

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.17774 [pdf, other]

Fast and Uncertainty-Aware SVBRDF Recovery from Multi-View Capture using Frequency Domain Analysis

Authors: Ruben Wiersma, Julien Philip, Miloš Hašan, Krishna Mullia, Fujun Luan, Elmar Eisemann, Valentin Deschaintre

Abstract: Relightable object acquisition is a key challenge in simplifying digital asset creation. Complete reconstruction of an object typically requires capturing hundreds to thousands of photographs under controlled illumination, with specialized equipment. The recent progress in differentiable rendering improved the quality and accessibility of inverse rendering optimization. Nevertheless, under uncontrolled illumination and unstructured viewpoints, there is no guarantee that the observations contain enough information to reconstruct the appearance properties of the captured object. We thus propose to consider the acquisition process from a signal-processing perspective. Given an object's geometry and a lighting environment, we estimate the properties of the materials on the object's surface in seconds. We do so by leveraging frequency domain analysis, considering the recovery of material properties as a deconvolution, enabling fast error estimation. We then quantify the uncertainty of the estimation, based on the available data, highlighting the areas for which priors or additional samples would be required for improved acquisition quality. We compare our approach to previous work and quantitatively evaluate our results, showing similar quality as previous work in a fraction of the time, and providing key information about the certainty of the results.

Submitted 25 June, 2024; originally announced June 2024.

Comments: Project page: https://brdf-uncertainty.github.io

arXiv:2406.17562 [pdf]

Low Excess Noise, High Quantum Efficiency Avalanche Photodiodes for Beyond 2 μm Wavelength Detection

Authors: Hyemin Jung, Seunghyun Lee, Xiao Jin, Yifan Liu, Theodore J. Ronningen, Christoph H. Grein, John P. R. David, Sanjay Krishna

Abstract: The increasing concentration of greenhouse gases, notably CH4 and CO2, has fueled global temperature increases, intensifying concerns regarding the prevailing climate crisis. Effectively monitoring these gases demands a detector spanning the extended short-wavelength infrared (~2.4 μm) range, covering wavelengths of CH4 (1.65 μm) and CO2 (2.05 μm). The state-of-the-art HgCdTe avalanche photodetectors (APDs) offer exceptional performance metrics, including high gain (M) and low excess noise (F). However, their widespread adoption is hindered by inherent challenges such as manufacturability, reproducibility, and cost factors. Moreover, their reliance on cryogenic cooling adds to the cost, size, weight, and power of the system. We have demonstrated a linear mode APD combining an InGaAs/GaAsSb type-II superlattice absorber and an AlGaAsSb multiplier lattice matched to InP substrates. This APD has demonstrated a room temperature M of 178, a maximum measurable external quantum efficiency of 3560 % at 2 μm, an extremely low excess noise (F < 2 at M < 20), and a small temperature coefficient of breakdown (7.58 mV/K μm). Such a high performance APD with manufacturable semiconductor materials could lead to a rapid transition to a commercial III-V foundry, holding the promise of revolutionizing high-sensitivity receivers for greenhouse gas monitoring.

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.17377 [pdf, other]

A Three-Pronged Approach to Cross-Lingual Adaptation with Multilingual LLMs

Authors: Vaibhav Singh, Amrith Krishna, Karthika NJ, Ganesh Ramakrishnan

Abstract: Low-resource languages, by their very definition, tend to be underrepresented in the pre-training corpora of Large Language Models. In this work, we investigate three low-resource cross-lingual approaches that enable an LLM to adapt to tasks in previously unseen languages. Llama-2 is an LLM where Indic languages, among many other language families, contribute less than $0.005\%$ of the total $2$ trillion token pre-training corpora. In this work, we experiment with the English-dominated Llama-2 for cross-lingual transfer to three Indic languages, Bengali, Hindi, and Tamil, as target languages. We study three approaches for cross-lingual transfer, under in-context learning (ICL) and fine-tuning. One, we find that adding additional supervisory signals via a dominant language in the LLM leads to improvements, both under ICL and fine-tuning. Two, adapting the target languages to word reordering may be beneficial under ICL, but its impact diminishes with fine-tuning. Finally, continued pre-training in one low-resource language can improve model performance for other related low-resource languages.

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.16820 [pdf]

EFECT -- A Method and Metric to Assess the Reproducibility of Stochastic Simulation Studies

Authors: T. J. Sego, Matthias König, Luis L. Fonseca, Baylor Fain, Adam C. Knapp, Krishna Tiwari, Henning Hermjakob, Herbert M. Sauro, James A. Glazier, Reinhard C. Laubenbacher, Rahuman S. Malik-Sheriff

Abstract: Reproducibility is a foundational standard for validating scientific claims in computational research. Stochastic computational models are employed across diverse fields such as systems biology, financial modelling and environmental sciences. Existing infrastructure and software tools support various aspects of reproducible model development, application, and dissemination, but do not adequately address independently reproducing simulation results that form the basis of scientific conclusions. To bridge this gap, we introduce the Empirical Characteristic Function Equality Convergence Test (EFECT), a data-driven method to quantify the reproducibility of stochastic simulation results. EFECT employs empirical characteristic functions to compare reported results with those independently generated by assessing distributional inequality, termed EFECT error, a metric to quantify the likelihood of equality. Additionally, we establish the EFECT convergence point, a metric for determining the required number of simulation runs to achieve an EFECT error value of a priori statistical significance, setting a reproducibility benchmark. EFECT supports all real-valued and bounded results irrespective of the model or method that produced them, and accommodates stochasticity from intrinsic model variability and random sampling of model inputs. We tested EFECT with stochastic differential equations, agent-based models, and Boolean networks, demonstrating its broad applicability and effectiveness.
EFECT standardizes stochastic simulation reproducibility, establishing a workflow that guarantees reliable results, supporting a wide range of stakeholders, and thereby enhancing validation of stochastic simulation studies, across a model's lifecycle. To promote future standardization efforts, we are developing open source software library libSSR in diverse programming languages for easy integration of EFECT.

Submitted 24 June, 2024; originally announced June 2024.

Comments: 25 pages, 4 figures

arXiv:2406.16532 [pdf]

Terahertz photocurrent probe of quantum geometry and interactions in magic-angle twisted bilayer graphene

Authors: Roshan Krishna Kumar, Geng Li, Riccardo Bertini, Swati Chaudhary, Krystian Nowakowski, Jeong Min Park, Sebastian Castilla, Zhen Zhan, Pierre A. Pantaleón, Hitesh Agarwal, Sergi Battle-Porro, Eike Icking, Matteo Ceccanti, Antoine Reserbat-Plantey, Giulia Piccinini, Julien Barrier, Ekaterina Khestanova, Takashi Taniguchi, Kenji Watanabe, Christoph Stampfer, Gil Refael, Francisco Guinea, Pablo Jarillo-Herrero, Justin C. W. Song, Petr Stepanov , et al. (2 additional authors not shown)

Abstract: Moiré materials represent strongly interacting electron systems bridging topological and correlated physics. Despite significant advances, decoding wavefunction properties underlying the quantum geometry remains challenging. Here, we utilize polarization-resolved photocurrent measurements to probe magic-angle twisted bilayer graphene, leveraging its sensitivity to the Berry connection that encompasses quantum "textures" of electron wavefunctions. Using terahertz light resonant with optical transitions of its flat bands, we observe bulk photocurrents driven by broken symmetries and reveal the interplay between electron interactions and quantum geometry. We observe inversion-breaking gapped states undetectable through quantum transport, sharp changes in the polarization axes caused by interaction-induced band renormalization, and recurring photocurrent patterns at integer fillings of the moiré unit cell that track the evolution of quantum geometry through the cascade of phase transitions. The large and tunable terahertz response intrinsic to flat-band systems offers direct insights into the quantum geometry of interacting electrons and paves the way for innovative terahertz quantum technologies.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.16008 [pdf, other]

Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization

Authors: Cheng-Yu Hsieh, Yung-Sung Chuang, Chun-Liang Li, Zifeng Wang, Long T. Le, Abhishek Kumar, James Glass, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister

Abstract: Large language models (LLMs), even when specifically trained to process long input contexts, struggle to capture relevant information located in the middle of their input. This phenomenon has been known as the lost-in-the-middle problem. In this work, we make three contributions. First, we set out to understand the factors that cause this phenomenon. In doing so, we establish a connection between lost-in-the-middle and LLMs' intrinsic attention bias: LLMs exhibit a U-shaped attention bias where the tokens at the beginning and at the end of the input receive higher attention, regardless of their relevance. Second, we mitigate this positional bias through a calibration mechanism, found-in-the-middle, that allows the model to attend to contexts faithfully according to their relevance, even when they are in the middle. Third, we show found-in-the-middle not only achieves better performance in locating relevant information within a long context, but also eventually leads to improved retrieval-augmented generation (RAG) performance across various tasks, outperforming existing methods by up to 15 percentage points. These findings open up future directions in understanding LLM attention bias and its potential consequences.

Submitted 3 July, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

Comments: ACL Findings 2024

arXiv:2406.14517 [pdf, other]

PostMark: A Robust Blackbox Watermark for Large Language Models

Authors: Yapei Chang, Kalpesh Krishna, Amir Houmansadr, John Wieting, Mohit Iyyer

Abstract: The most effective techniques to detect LLM-generated text rely on inserting a detectable signature -- or watermark -- during the model's decoding process. Most existing watermarking methods require access to the underlying LLM's logits, which LLM API providers are loath to share due to fears of model distillation. As such, these watermarks must be implemented independently by each LLM provider. In this paper, we develop PostMark, a modular post-hoc watermarking procedure in which an input-dependent set of words (determined via a semantic embedding) is inserted into the text after the decoding process has completed. Critically, PostMark does not require logit access, which means it can be implemented by a third party. We also show that PostMark is more robust to paraphrasing attacks than existing watermarking methods: our experiments cover eight baseline algorithms, five base LLMs, and three datasets. Finally, we evaluate the impact of PostMark on text quality using both automated and human assessments, highlighting the trade-off between quality and robustness to paraphrasing. We release our code, outputs, and annotations at https://github.com/lilakk/PostMark.

Submitted 20 June, 2024; originally announced June 2024.

Comments: preprint; 18 pages, 5 figures

arXiv:2406.14486 [pdf, other]

Rule-based outlier detection of AI-generated anatomy segmentations

Authors: Deepa Krishnaswamy, Vamsi Krishna Thiriveedhi, Cosmin Ciausu, David Clunie, Steve Pieper, Ron Kikinis, Andrey Fedorov

Abstract: There is a dire need for medical imaging datasets with accompanying annotations to perform downstream patient analysis. However, it is difficult to manually generate these annotations, due to the time-consuming nature of the work and the variability in clinical conventions. Artificial intelligence has been adopted in the field as a potential method to annotate these large datasets; however, a lack of expert annotations or ground truth can inhibit the adoption of these annotations. We recently made a dataset publicly available including annotations and extracted features of up to 104 organs for the National Lung Screening Trial using the TotalSegmentator method. However, the released dataset does not include expert-derived annotations or an assessment of the accuracy of the segmentations, limiting its usefulness. We propose the development of heuristics to assess the quality of the segmentations, providing methods to measure the consistency of the annotations and a comparison of results to the literature. We make our code and related materials publicly available at https://github.com/ImagingDataCommons/CloudSegmentatorResults and interactive tools at https://huggingface.co/spaces/ImagingDataCommons/CloudSegmentatorResults.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.14458 [pdf, other]

Centimeter Positioning Accuracy using AI/ML for 6G Applications

Authors: Sai Prasanth Kotturi, Radha Krishna Ganti

Abstract: This research looks at using AI/ML to achieve centimeter-level user positioning in 6G applications such as the Industrial Internet of Things (IIoT). Initial results show that our AI/ML-based method can estimate user positions with an accuracy of 17 cm in an indoor factory environment. In this proposal, we highlight our approaches and future directions.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 2 Pages, 2 Figures, ICMLCN Conference, Stockholm, Sweden

arXiv:2406.14433 [pdf]

Structural and Electrical Properties of Grafted Si/GaAsSb Heterojunction

Authors: Haris Naeem Abbasi, Seunghyun Lee, Hyemin Jung, Nathan Gajowski, Yi Lu, Linus Wang, Donghyeok Kim, Jie Zhou, Jiarui Gong, Chris Chae, Jinwoo Hwang, Manisha Muduli, Subramanya Nookala, Zhenqiang Ma, Sanjay Krishna

Abstract: The short-wave infrared (SWIR) wavelength, especially 1.55 um, has attracted significant attention in various areas such as high-speed optical communication and LiDAR systems. Avalanche photodiodes (APDs) are a critical component as a receiver in these systems due to their internal gain which enhances the system performance. Silicon-based APDs are promising since they are CMOS compatible, but they are limited in detecting 1.55 um light. This study proposes a p-type Si on n-type GaAs0.51Sb0.49 (GaAsSb) lattice matched to InP substrates heterojunction formed using a grafting technique for future GaAsSb/Si APD technology. A p+Si nanomembrane is transferred onto the GaAsSb/AlInAs/InP substrate, with an ultrathin ALD-Al2O3 oxide at the interface, which behaves as both double-side passivation and quantum tunneling layers. The devices exhibit excellent surface morphology and interface quality, confirmed by atomic force microscope (AFM) and transmission electron microscope (TEM). Also, the current-voltage (I-V) of the p+Si/n-GaAsSb heterojunction shows ideal rectifying characteristics with an ideality factor of 1.15. The I-V tests across multiple devices confirm high consistency and yield. Furthermore, the X-ray photoelectron spectroscopy (XPS) measurement reveals that GaAsSb and Si have type-II band alignment with a conduction band offset of 50 meV, which is favorable for the high-bandwidth APD application. The demonstration of the GaAsSb/Si heterojunction highlights the potential to advance current SWIR PD technologies.

Submitted 24 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

Comments: 14 pages, 6 figures

arXiv:2406.14072 [pdf, other]

IGRINS observations of WASP-127 b: H$_2$O, CO, and super-Solar atmospheric metallicity in the inflated sub-Saturn

Authors: Krishna Kanumalla, Michael R. Line, Megan Weiner Mansfield, Luis Welbanks, Peter C. B. Smith, Jacob L. Bean, Lorenzo Pino, Matteo Brogi, Vatsal Panwar

Abstract: High resolution spectroscopy of exoplanet atmospheres provides insights into their composition and dynamics from the resolved line shape and depth of thousands of spectral lines. WASP-127 b is an extremely inflated sub-Saturn (R$_\mathrm{p}$= 1.311 R$_\mathrm{Jup}$, M$_\mathrm{p}$= 0.16 M$_\mathrm{Jup}$) with previously reported detections of H$_2$O, CO$_2$, and Na. However, the seeming absence of the primary carbon reservoir expected at WASP-127 b temperatures (T$_{eq}$ $\sim$ 1400 K) from chemical equilibrium, CO, posed a mystery. In this manuscript, we present the analysis of high resolution observations of WASP-127 b with the Immersion GRating INfrared Spectrometer (IGRINS) on Gemini South. We confirm the presence of H$_2$O (8.67 $σ$) and report the detection of CO (4.34 $σ$). Additionally, we conduct a suite of Bayesian retrieval analyses covering a hierarchy of model complexity and self-consistency. When freely fitting for the molecular gas volume mixing ratios, we obtain super-solar metal enrichment for H$_2$O abundance of log$_{10}$X$_\mathrm{H_2O}$ = --1.23$^{+0.29}_{-0.49}$ and a lower limit on the CO abundance of log$_{10}$X$_\mathrm{CO}$ $\ge$ --2.20 at 2$σ$ confidence. We also report tentative evidence of photochemistry in WASP-127 b based upon the indicative depletion of H$_2$S. This is also supported by the data preferring models with photochemistry over free-chemistry and thermochemistry. The overall analysis implies a super-solar ($\sim$ 39$\times$ Solar; [M/H] = $1.59^{+0.30}_{-0.30}$) metallicity for the atmosphere of WASP-127 b and an upper limit on its atmospheric C/O ratio as $<$ 0.68.
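The quoted "$\sim$ 39$\times$ Solar" figure follows from the logarithmic metallicity value. As a worked check, assuming the standard convention that [M/H] is the base-10 logarithm of the metal-to-hydrogen ratio relative to the solar value (the interval below is simply the quoted $\pm 0.30$ dex bounds converted to linear factors, not a result stated in the abstract):

```latex
\[
[\mathrm{M/H}] = \log_{10}\!\left(\frac{(\mathrm{M/H})_{\mathrm{planet}}}{(\mathrm{M/H})_{\odot}}\right)
\quad\Longrightarrow\quad
\frac{(\mathrm{M/H})_{\mathrm{planet}}}{(\mathrm{M/H})_{\odot}} = 10^{1.59} \approx 38.9 \approx 39,
\]
\[
10^{1.59-0.30} = 10^{1.29} \approx 19.5,
\qquad
10^{1.59+0.30} = 10^{1.89} \approx 77.6 .
\]
```

Because the uncertainty is symmetric in dex, the corresponding linear range (roughly 20 to 78 times solar) is asymmetric about 39.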

Submitted 20 June, 2024; originally announced June 2024.

Comments: 18 pages, 15 figures, submitted to AJ, poster at Exo5 conference area-A

arXiv:2406.13868 [pdf, other]

SDQ: Sparse Decomposed Quantization for LLM Inference

Authors: Geonhwa Jeong, Po-An Tsai, Stephen W. Keckler, Tushar Krishna

Abstract: Recently, large language models (LLMs) have shown surprising performance in task-specific workloads as well as general tasks with the given prompts. However, to achieve unprecedented performance, recent LLMs use billions to trillions of parameters, which hinder the wide adaptation of those models due to their extremely large compute and memory requirements. To resolve the issue, various model compression methods are being actively investigated. In this work, we propose SDQ (Sparse Decomposed Quantization) to exploit both structured sparsity and quantization to achieve both high compute and memory efficiency. From our evaluations, we observe that SDQ can achieve 4x effective compute throughput with <1% quality drop.

Submitted 19 June, 2024; originally announced June 2024.

Comments: Preprint

arXiv:2406.13129 [pdf, other]

M3T: Multi-Modal Medical Transformer to bridge Clinical Context with Visual Insights for Retinal Image Medical Description Generation

Authors: Nagur Shareef Shaik, Teja Krishna Cherukuri, Dong Hye Ye

Abstract: Automated retinal image medical description generation is crucial for streamlining medical diagnosis and treatment planning. Existing challenges include the reliance on learned retinal image representations, difficulties in handling multiple imaging modalities, and the lack of clinical context in visual representations. Addressing these issues, we propose the Multi-Modal Medical Transformer (M3T), a novel deep learning architecture that integrates visual representations with diagnostic keywords. Unlike previous studies focusing on specific aspects, our approach efficiently learns contextual information and semantics from both modalities, enabling the generation of precise and coherent medical descriptions for retinal images. Experimental studies on the DeepEyeNet dataset validate the success of M3T in meeting ophthalmologists' standards, demonstrating a substantial 13.5% improvement in BLEU@4 over the best-performing baseline model.

Submitted 18 June, 2024; originally announced June 2024.

Comments: This paper has been accepted for presentation at the IEEE International Conference on Image Processing (ICIP 2024)

arXiv:2406.13126 [pdf, other]

Guided Context Gating: Learning to leverage salient lesions in retinal fundus images

Authors: Teja Krishna Cherukuri, Nagur Shareef Shaik, Dong Hye Ye

Abstract: Effectively representing medical images, especially retinal images, presents a considerable challenge due to variations in appearance, size, and contextual information of pathological signs called lesions. Precise discrimination of these lesions is crucial for diagnosing vision-threatening issues such as diabetic retinopathy. While visual attention-based neural networks have been introduced to learn spatial context and channel correlations from retinal images, they often fall short in capturing localized lesion context. Addressing this limitation, we propose a novel attention mechanism called Guided Context Gating, a unique approach that integrates Context Formulation, Channel Correlation, and Guided Gating to learn global context, spatial correlations, and localized lesion context. Our qualitative evaluation against existing attention mechanisms emphasizes the superiority of Guided Context Gating in terms of explainability. Notably, experiments on the Zenodo-DR-7 dataset reveal a substantial 2.63% accuracy boost over advanced attention mechanisms and an impressive 6.53% improvement over the state-of-the-art Vision Transformer for assessing the severity grade of retinopathy, even with imbalanced and limited training samples for each class.

Submitted 18 June, 2024; originally announced June 2024.

Comments: This paper has been accepted for presentation at the IEEE International Conference on Image Processing (ICIP 2024)

arXiv:2406.12997 [pdf, other]

Suitability of CCA for Generating Latent State/ Variables in Multi-View Textual Data

Authors: Akanksha Mehndiratta, Krishna Asawa

Abstract: The probabilistic interpretation of Canonical Correlation Analysis (CCA) for learning low-dimensional real vectors, called latent variables, has been exploited immensely in various fields. This study takes a step further by demonstrating the potential of CCA in discovering a latent state that captures the contextual information within the textual data under a two-view setting. The interpretation of CCA discussed in this study utilizes the multi-view nature of textual data, i.e. the consecutive sentences in a document or turns in a dyadic conversation, and has a strong theoretical foundation. Furthermore, this study proposes a model using CCA to perform the Automatic Short Answer Grading (ASAG) task. The empirical analysis confirms that the proposed model delivers competitive results and can even beat various sophisticated supervised techniques. The model is simple, linear, and adaptable and should be used as the baseline especially when labeled training data is scarce or nonexistent.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12818 [pdf, other]

Optimal Bailouts in Diversified Financial Networks

Authors: Krishna Dasaratha, Santosh Venkatesh, Rakesh Vohra

Abstract: Widespread default involves substantial deadweight costs which could be countered by injecting capital into failing firms. Injections have positive spillovers that can trigger a repayment cascade. But which firms should a regulator bail out so as to minimize the total injection of capital while ensuring solvency of all firms? While the problem is, in general, NP-hard, for a wide range of networks that arise from a stochastic block model, we show that the optimal bailout can be implemented by a simple policy that targets firms based on their characteristics and position in the network. Specific examples of the setting include core-periphery networks.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12683 [pdf, other]

Spatial Sequence Attention Network for Schizophrenia Classification from Structural Brain MR Images

Authors: Nagur Shareef Shaik, Teja Krishna Cherukuri, Vince Calhoun, Dong Hye Ye

Abstract: Schizophrenia is a debilitating, chronic mental disorder that significantly impacts an individual's cognitive abilities, behavior, and social interactions. It is characterized by subtle morphological changes in the brain, particularly in the gray matter. These changes are often imperceptible through manual observation, demanding an automated approach to diagnosis. This study introduces a deep learning methodology for the classification of individuals with Schizophrenia. We achieve this by implementing a diversified attention mechanism known as Spatial Sequence Attention (SSA) which is designed to extract and emphasize significant feature representations from structural MRI (sMRI). Initially, we employ the transfer learning paradigm by leveraging pre-trained DenseNet to extract initial feature maps from the final convolutional block which contains morphological alterations associated with Schizophrenia. These features are further processed by the proposed SSA to capture and emphasize intricate spatial interactions and relationships across volumes within the brain. Our experimental studies conducted on a clinical dataset have revealed that the proposed attention mechanism outperforms the existing Squeeze & Excitation Network for Schizophrenia classification.

Submitted 18 June, 2024; originally announced June 2024.

Comments: This paper has been accepted for the 21st IEEE International Symposium on Biomedical Imaging (ISBI 2024)

arXiv:2406.12336 [pdf, other]

A Compass for Navigating the World of Sentence Embeddings for the Telecom Domain

Authors: Sujoy Roychowdhury, Sumit Soman, H. G. Ranjani, Vansh Chhabra, Neeraj Gunda, Subhadip Bandyopadhyay, Sai Krishna Bala

Abstract: A plethora of sentence embedding models makes it challenging to choose one, especially for domains such as telecom, rich with specialized vocabulary. We evaluate multiple embeddings obtained from publicly available models and their domain-adapted variants, on both point retrieval accuracies as well as their (95\%) confidence intervals. We establish a systematic method to obtain thresholds for similarity scores for different embeddings. We observe that fine-tuning improves mean bootstrapped accuracies as well as tightens confidence intervals. The pre-training combined with fine-tuning makes confidence intervals even tighter. To understand these variations, we analyse and report significant correlations between the distributional overlap between top-$K$, correct and random sentence similarities with retrieval accuracies and similarity thresholds. Following current literature, we analyze if retrieval accuracy variations can be attributed to isotropy of embeddings. Our conclusions are that isotropy of embeddings (as measured by two independent state-of-the-art isotropy metric definitions) cannot be attributed to better retrieval performance. However, domain adaptation which improves retrieval accuracies also improves isotropy. We establish that domain adaptation moves domain specific embeddings further away from general domain embeddings.
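A mean bootstrapped accuracy with a 95% confidence interval, as mentioned in the abstract, can be computed along the following lines. This is a generic percentile-bootstrap sketch, not the authors' evaluation code: the function name `bootstrap_accuracy_ci`, the number of resamples, and the toy `hits` data (1 for a correct retrieval, 0 otherwise) are assumptions for illustration.

```python
import random


def bootstrap_accuracy_ci(hits, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap for retrieval accuracy.

    hits: list of 0/1 outcomes, one per query (1 = correct item retrieved).
    Returns (mean bootstrapped accuracy, lower bound, upper bound).
    """
    rng = random.Random(seed)
    n = len(hits)
    # Resample queries with replacement and record the accuracy of each resample.
    means = sorted(sum(rng.choices(hits, k=n)) / n for _ in range(n_resamples))
    lo_idx = int((1.0 - level) / 2.0 * n_resamples)
    hi_idx = int((1.0 + level) / 2.0 * n_resamples) - 1
    return sum(means) / n_resamples, means[lo_idx], means[hi_idx]


# Hypothetical outcome vector: 80 correct retrievals out of 100 queries.
hits = [1] * 80 + [0] * 20
mean_acc, lower, upper = bootstrap_accuracy_ci(hits)
print(mean_acc, lower, upper)
```

A tighter interval for one embedding model than another, at similar mean accuracy, indicates that its retrieval quality varies less across resampled query sets.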

Submitted 18 June, 2024; originally announced June 2024.

Comments: 10 pages, 3 figures, 4 tables

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2406.11930 [pdf, other]

A Critical Study of What Code-LLMs (Do Not) Learn

Authors: Abhinav Anand, Shweta Verma, Krishna Narasimhan, Mira Mezini

Abstract: Large Language Models trained on code corpora (code-LLMs) have demonstrated impressive performance in various coding assistance tasks. However, despite their increased size and training dataset, code-LLMs still have limitations such as suggesting codes with syntactic errors, variable misuse etc. Some studies argue that code-LLMs perform well on coding tasks because they use self-attention and hidden representations to encode relations among input tokens. However, previous works have not studied what code properties are not encoded by code-LLMs. In this paper, we conduct a fine-grained analysis of attention maps and hidden representations of code-LLMs. Our study indicates that code-LLMs only encode relations among specific subsets of input tokens. Specifically, by categorizing input tokens into syntactic tokens and identifiers, we found that models encode relations among syntactic tokens and among identifiers, but they fail to encode relations between syntactic tokens and identifiers. We also found that fine-tuned models encode these relations poorly compared to their pre-trained counterparts. Additionally, larger models with billions of parameters encode significantly less information about code than models with only a few hundred million parameters.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11877 [pdf]

Solar Power Prediction Using Satellite Data in Different Parts of Nepal

Authors: Raj Krishna Nepal, Bibek Khanal, Vibek Ghimire, Kismat Neupane, Atul Pokharel, Kshitij Niraula, Baburam Tiwari, Nawaraj Bhattarai, Khem N. Poudyal, Nawaraj Karki, Mohan B Dangi, John Biden

Abstract: Due to the unavailability of solar irradiance data for many potential sites of Nepal, the paper proposes predicting solar irradiance based on alternative meteorological parameters. The study focuses on five distinct regions in Nepal and utilizes a dataset spanning almost ten years, obtained from CERES SYN1deg and MERRA-2. Machine learning models such as Random Forest, XGBoost, K-Nearest Neighbors, and deep learning models like LSTM and ANN-MLP are employed and evaluated for their performance. The results indicate high accuracy in predicting solar irradiance, with R-squared(R2) scores close to unity for both train and test datasets. The impact of parameter integration on model performance is analyzed, revealing the significance of various parameters in enhancing predictive accuracy. Each model demonstrates strong performance across all parameters, consistently achieving MAE values below 6, RMSE values under 10, MBE within |2|, and nearly unity R2 values. Upon removal of various solar parameters such as "Solar_Irradiance_Clear_Sky", "UVA", etc. from the datasets, the model's performance is significantly affected. This exclusion leads to considerable increases in MAE, reaching up to 82, RMSE up to 135, and MBE up to |7|. Among the models, KNN displays the weakest performance, with an R2 of 0.7582546. Conversely, ANN exhibits the strongest performance, boasting an R2 value of 0.9245877. Hence, the study concludes that Artificial Neural Network (ANN) performs exceptionally well, showcasing its versatility even under sparse data parameter conditions.
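For readers unfamiliar with the four scores quoted above, the sketch below computes MAE, RMSE, MBE, and R2 from paired observations and predictions using their textbook definitions. The function name `regression_metrics` and the toy irradiance values are hypothetical, and the sign convention for MBE (predicted minus observed) is an assumption; the paper may use the opposite sign.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, MBE and R2 for paired observed/predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]  # predicted minus observed
    mae = sum(abs(e) for e in errors) / n              # mean absolute error
    rmse = math.sqrt(sum(e * e for e in errors) / n)   # root mean squared error
    mbe = sum(errors) / n                              # mean bias error (signed)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true) # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                         # coefficient of determination
    return {"MAE": mae, "RMSE": rmse, "MBE": mbe, "R2": r2}


# Toy values only; not data from the study.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Note that MBE can be near zero even when MAE is large, because over- and under-predictions cancel; this is why the abstract reports MBE as a magnitude bound alongside MAE and RMSE.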

Submitted 8 June, 2024; originally announced June 2024.

Comments: 20 pages, 12 figures, 5 tables

arXiv:2406.11775 [pdf, other]

Task Me Anything

Authors: Jieyu Zhang, Weikai Huang, Zixian Ma, Oscar Michel, Dong He, Tanmay Gupta, Wei-Chiu Ma, Ali Farhadi, Aniruddha Kembhavi, Ranjay Krishna

Abstract: Benchmarks for large multimodal language models (MLMs) now serve to simultaneously assess the general capabilities of models instead of evaluating for a specific capability. As a result, when a developer wants to identify which models to use for their application, they are overwhelmed by the number of benchmarks and remain uncertain about which benchmark's results are most reflective of their specific use case. This paper introduces Task-Me-Anything, a benchmark generation engine which produces a benchmark tailored to a user's needs. Task-Me-Anything maintains an extendable taxonomy of visual assets and can programmatically generate a vast number of task instances. Additionally, it algorithmically addresses user queries regarding MLM performance efficiently within a computational budget. It contains 113K images, 10K videos, 2K 3D object assets, over 365 object categories, 655 attributes, and 335 relationships. It can generate 750M image/video question-answering pairs, which focus on evaluating MLM perceptual capabilities. Task-Me-Anything reveals critical insights: open-source MLMs excel in object and attribute recognition but lack spatial and temporal understanding; each model exhibits unique strengths and weaknesses; larger models generally perform better, though exceptions exist; and GPT4o demonstrates challenges in recognizing rotating/moving objects and distinguishing colors.

Submitted 17 June, 2024; originally announced June 2024.

Comments: website: https://www.task-me-anything.org

arXiv:2406.11488 [pdf, other]

Reversible Transducers over Infinite Words

Authors: Luc Dartois, Paul Gastin, Loïc Germerie Guizouarn, R. Govind, Shankaranarayanan Krishna

Abstract: Deterministic two-way transducers capture the class of regular functions. The efficiency of composing two-way transducers has a direct implication in algorithmic problems related to reactive synthesis, where transformation specifications are converted into equivalent transducers. These specifications are presented in a modular way, and composing the resultant machines simulates the full specification. An important result by Dartois et al. shows that composition of two-way transducers enjoy a polynomial composition when the underlying transducer is reversible, that is, if they are both deterministic and co-deterministic. This is a major improvement over general deterministic two-way transducers, for which composition causes a doubly exponential blow-up in the size of the inputs in general. Moreover, they show that reversible two-way transducers have the same expressiveness as deterministic two-way transducers. However, the question of expressiveness of reversible transducers over infinite words is still open. In this article, we introduce the class of reversible two-way transducers over infinite words and show that they enjoy the same expressive power as deterministic two-way transducers over infinite words. This is done through a non-trivial, effective construction inducing a single exponential blow-up in the set of states. Further, we also prove that composing two reversible two-way transducers over infinite words incurs only a polynomial complexity, thereby providing foundations for efficient procedure for composition of transducers over infinite words.

Submitted 28 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Showing 1–50 of 3,387 results for author: Krishna