Search | arXiv e-print repository

Wiretapped Commitment over Binary Channels

Authors: Anuj Kumar Yadav, Manideep Mamindlapally, Amitalok J. Budkuley

Abstract: We propose the problem of wiretapped commitment, where two parties, say committer Alice and receiver Bob, engage in a commitment protocol using a noisy channel as a resource, in the presence of an eavesdropper, say Eve. Noisy versions of Alice's transmission over the wiretap channel are received at both Bob and Eve. We seek to determine the maximum commitment throughput in the presence of an eaves… ▽ More We propose the problem of wiretapped commitment, where two parties, say committer Alice and receiver Bob, engage in a commitment protocol using a noisy channel as a resource, in the presence of an eavesdropper, say Eve. Noisy versions of Alice's transmission over the wiretap channel are received at both Bob and Eve. We seek to determine the maximum commitment throughput in the presence of an eavesdropper, i.e., wiretapped commitment capacity, where in addition to the standard security requirements for two-party commitment, one seeks to ensure that Eve doesn't learn about the commit string. A key interest in this work is to explore the effect of collusion (or lack of it) between the eavesdropper Eve and either Alice or Bob. Toward the same, we present results on the wiretapped commitment capacity under the so-called 1-private regime (when Alice or Bob cannot collude with Eve) and the 2-private regime (when Alice or Bob may possibly collude with Eve). △ Less

Submitted 19 June, 2024; originally announced June 2024.

Comments: 13 Pages, 1 figure

arXiv:2406.05599 [pdf, other]

Reliable Quantum Memories with Unreliable Components

Authors: Anuj K. Nayak, Eric Chitambar, Lav R. Varshney

Abstract: Quantum memory systems are vital in quantum information processing for dependable storage and retrieval of quantum states. Inspired by classical reliability theories that synthesize reliable computing systems from unreliable components, we formalize the problem of reliable storage of quantum information using noisy components. We introduce the notion of stable quantum memories and define the stora… ▽ More Quantum memory systems are vital in quantum information processing for dependable storage and retrieval of quantum states. Inspired by classical reliability theories that synthesize reliable computing systems from unreliable components, we formalize the problem of reliable storage of quantum information using noisy components. We introduce the notion of stable quantum memories and define the storage rate as the ratio of the number of logical qubits to the total number of physical qubits, as well as the circuit complexity of the decoder, which includes both quantum gates and measurements. We demonstrate that a strictly positive storage rate can be achieved by constructing a quantum memory system with quantum expander codes. Moreover, by reducing the reliable storage problem to reliable quantum communication, we provide upper bounds on the achievable storage capacity. In the case of physical qubits corrupted by noise satisfying hypercontractivity conditions, we provide a tighter upper bound on storage capacity using an entropy dissipation argument. Furthermore, observing that the time complexity of the decoder scales non-trivially with the number of physical qubits, achieving asymptotic rates may not be possible due to the induced dependence of the noise on the number of physical qubits. In this constrained non-asymptotic setting, we derive upper bounds on storage capacity using finite blocklength communication bounds. Finally, we numerically analyze the gap between upper and lower bounds in both asymptotic and non-asymptotic cases, and provide suggestions to tighten the gap. △ Less

Submitted 8 June, 2024; originally announced June 2024.

Comments: 15 pages, 3 figures

arXiv:2406.04744 [pdf, other]

CRAG -- Comprehensive RAG Benchmark

Authors: Xiao Yang, Kai Sun, Hao Xin, Yushi Sun, Nikita Bhalla, Xiangsen Chen, Sajal Choudhary, Rongze Daniel Gui, Ziran Will Jiang, Ziyu Jiang, Lingkun Kong, Brian Moran, Jiaqi Wang, Yifan Ethan Xu, An Yan, Chenyu Yang, Eting Yuan, Hanwen Zha, Nan Tang, Lei Chen, Nicolas Scheffer, Yue Liu, Nirav Shah, Rakesh Wanga, Anuj Kumar , et al. (2 additional authors not shown)

Abstract: Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate Large Language Model (LLM)'s deficiency in lack of knowledge. Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering bench… ▽ More Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate Large Language Model (LLM)'s deficiency in lack of knowledge. Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering benchmark of 4,409 question-answer pairs and mock APIs to simulate web and Knowledge Graph (KG) search. CRAG is designed to encapsulate a diverse array of questions across five domains and eight question categories, reflecting varied entity popularity from popular to long-tail, and temporal dynamisms ranging from years to seconds. Our evaluation on this benchmark highlights the gap to fully trustworthy QA. Whereas most advanced LLMs achieve <=34% accuracy on CRAG, adding RAG in a straightforward manner improves the accuracy only to 44%. State-of-the-art industry RAG solutions only answer 63% questions without any hallucination. CRAG also reveals much lower accuracy in answering questions regarding facts with higher dynamism, lower popularity, or higher complexity, suggesting future research directions. The CRAG benchmark laid the groundwork for a KDD Cup 2024 challenge, attracting thousands of participants and submissions within the first 50 days of the competition. We commit to maintaining CRAG to serve research communities in advancing RAG solutions and general QA solutions. △ Less

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.03747 [pdf, other]

Instance Segmentation and Teeth Classification in Panoramic X-rays

Authors: Devichand Budagam, Ayush Kumar, Sayan Ghosh, Anuj Shrivastav, Azamat Zhanatuly Imanbayev, Iskander Rafailovich Akhmetov, Dmitrii Kaplun, Sergey Antonov, Artem Rychenkov, Gleb Cyganov, Aleksandr Sinitca

Abstract: Teeth segmentation and recognition are critical in various dental applications and dental diagnosis. Automatic and accurate segmentation approaches have been made possible by integrating deep learning models. Although teeth segmentation has been studied in the past, only some techniques were able to effectively classify and segment teeth simultaneously. This article offers a pipeline of two deep l… ▽ More Teeth segmentation and recognition are critical in various dental applications and dental diagnosis. Automatic and accurate segmentation approaches have been made possible by integrating deep learning models. Although teeth segmentation has been studied in the past, only some techniques were able to effectively classify and segment teeth simultaneously. This article offers a pipeline of two deep learning models, U-Net and YOLOv8, which results in BB-UNet, a new architecture for the classification and segmentation of teeth on panoramic X-rays that is efficient and reliable. We have improved the quality and reliability of teeth segmentation by utilising the YOLOv8 and U-Net capabilities. The proposed networks have been evaluated using the mean average precision (mAP) and dice coefficient for YOLOv8 and BB-UNet, respectively. We have achieved a 3\% increase in mAP score for teeth classification compared to existing methods, and a 10-15\% increase in dice coefficient for teeth segmentation compared to U-Net across different categories of teeth. A new Dental dataset was created based on UFBA-UESC dataset with Bounding-Box and Polygon annotations of 425 dental panoramic X-rays. The findings of this research pave the way for a wider adoption of object detection models in the field of dental diagnosis. △ Less

Submitted 6 June, 2024; originally announced June 2024.

Comments: submtted to Expert Systems with Applications Journal

arXiv:2405.19147 [pdf, ps, other]

Homomorphism Counts to Trees

Authors: Anuj Dawar

Abstract: We construct a pair of non-isomorphic, bipartite graphs which are not distinguished by counting the number of homomorphisms to any tree. This answers a question raised by Atserias et al. (LICS 2021). In order to establish the construction, we analyse the equivalence relations induced by counting homomorphisms to trees of diameter two and three and obtain necessary and sufficient conditions for two… ▽ More We construct a pair of non-isomorphic, bipartite graphs which are not distinguished by counting the number of homomorphisms to any tree. This answers a question raised by Atserias et al. (LICS 2021). In order to establish the construction, we analyse the equivalence relations induced by counting homomorphisms to trees of diameter two and three and obtain necessary and sufficient conditions for two graphs to be equivalent. We show that three is the optimal diameter for our construction. △ Less

Submitted 29 May, 2024; originally announced May 2024.

Comments: 13 pages

MSC Class: 05C60 (primary) 68R10 (secondary) ACM Class: G.2.1; G.2.2

arXiv:2405.14876 [pdf, other]

Precise and Robust Sidewalk Detection: Leveraging Ensemble Learning to Surpass LLM Limitations in Urban Environments

Authors: Ibne Farabi Shihab, Benjir Islam Alvee, Sudesh Ramesh Bhagat, Anuj Sharma

Abstract: This study aims to compare the effectiveness of a robust ensemble model with the state-of-the-art ONE-PEACE Large Language Model (LLM) for accurate detection of sidewalks. Accurate sidewalk detection is crucial in improving road safety and urban planning. The study evaluated the model's performance on Cityscapes, Ade20k, and the Boston Dataset. The results showed that the ensemble model performed… ▽ More This study aims to compare the effectiveness of a robust ensemble model with the state-of-the-art ONE-PEACE Large Language Model (LLM) for accurate detection of sidewalks. Accurate sidewalk detection is crucial in improving road safety and urban planning. The study evaluated the model's performance on Cityscapes, Ade20k, and the Boston Dataset. The results showed that the ensemble model performed better than the individual models, achieving mean Intersection Over Union (mIOU) scores of 93.1\%, 90.3\%, and 90.6\% on these datasets under ideal conditions. Additionally, the ensemble model maintained a consistent level of performance even in challenging conditions such as Salt-and-Pepper and Speckle noise, with only a gradual decrease in efficiency observed. On the other hand, the ONE-PEACE LLM performed slightly better than the ensemble model in ideal scenarios but experienced a significant decline in performance under noisy conditions. These findings demonstrate the robustness and reliability of the ensemble model, making it a valuable asset for improving urban infrastructure related to road safety and curb space management. This study contributes positively to the broader context of urban health and mobility. △ Less

Submitted 1 April, 2024; originally announced May 2024.

arXiv:2405.14830 [pdf, other]

Deep learning lattice gauge theories

Authors: Anuj Apte, Anthony Ashmore, Clay Cordova, Tzu-Chen Huang

Abstract: Monte Carlo methods have led to profound insights into the strong-coupling behaviour of lattice gauge theories and produced remarkable results such as first-principles computations of hadron masses. Despite tremendous progress over the last four decades, fundamental challenges such as the sign problem and the inability to simulate real-time dynamics remain. Neural network quantum states have emerg… ▽ More Monte Carlo methods have led to profound insights into the strong-coupling behaviour of lattice gauge theories and produced remarkable results such as first-principles computations of hadron masses. Despite tremendous progress over the last four decades, fundamental challenges such as the sign problem and the inability to simulate real-time dynamics remain. Neural network quantum states have emerged as an alternative method that seeks to overcome these challenges. In this work, we use gauge-invariant neural network quantum states to accurately compute the ground state of $\mathbb{Z}_N$ lattice gauge theories in $2+1$ dimensions. Using transfer learning, we study the distinct topological phases and the confinement phase transition of these theories. For $\mathbb{Z}_2$, we identify a continuous transition and compute critical exponents, finding excellent agreement with existing numerics for the expected Ising universality class. In the $\mathbb{Z}_3$ case, we observe a weakly first-order transition and identify the critical coupling. Our findings suggest that neural network quantum states are a promising method for precise studies of lattice gauge theory. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.10887 [pdf, ps, other]

Preservation theorems on sparse classes revisited

Authors: Anuj Dawar, Ioannis Eleftheriadis

Abstract: We revisit the work studying homomorphism preservation for first-order logic in sparse classes of structures initiated in [Atserias et al., JACM 2006] and [Dawar, JCSS 2010]. These established that first-order logic has the homomorphism preservation property in any sparse class that is monotone and addable. It turns out that the assumption of addability is not strong enough for the proofs given. W… ▽ More We revisit the work studying homomorphism preservation for first-order logic in sparse classes of structures initiated in [Atserias et al., JACM 2006] and [Dawar, JCSS 2010]. These established that first-order logic has the homomorphism preservation property in any sparse class that is monotone and addable. It turns out that the assumption of addability is not strong enough for the proofs given. We demonstrate this by constructing classes of graphs of bounded treewidth which are monotone and addable but fail to have homomorphism preservation. We also show that homomorphism preservation fails on the class of planar graphs. On the other hand, the proofs of homomorphism preservation can be recovered by replacing addability by a stronger condition of amalgamation over bottlenecks. This is analogous to a similar condition formulated for extension preservation in [Atserias et al., SiCOMP 2008]. △ Less

Submitted 20 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

Comments: 16 pages

MSC Class: 03B70; 03C13; 68Q19; 05C10

arXiv:2405.09247 [pdf, other]

Graph Neural Network based Handwritten Trajectories Recognition

Authors: Anuj Sharma, Sukhdeep Singh, S Ratna

Abstract: The graph neural networks has been proved to be an efficient machine learning technique in real life applications. The handwritten recognition is one of the useful area in real life use where both offline and online handwriting recognition are required. The chain code as feature extraction technique has shown significant results in literature and we have been able to use chain codes with graph neu… ▽ More The graph neural networks has been proved to be an efficient machine learning technique in real life applications. The handwritten recognition is one of the useful area in real life use where both offline and online handwriting recognition are required. The chain code as feature extraction technique has shown significant results in literature and we have been able to use chain codes with graph neural networks. To the best of our knowledge, this work presents first time a novel combination of handwritten trajectories features as chain codes and graph neural networks together. The handwritten trajectories for offline handwritten text has been evaluated using recovery of drawing order, whereas online handwritten trajectories are directly used with chain codes. Our results prove that present combination surpass previous results and minimize error rate in few epochs only. △ Less

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2404.14062 [pdf, other]

GatedLexiconNet: A Comprehensive End-to-End Handwritten Paragraph Text Recognition System

Authors: Lalita Kumari, Sukhdeep Singh, Vaibhav Varish Singh Rathore, Anuj Sharma

Abstract: The Handwritten Text Recognition problem has been a challenge for researchers for the last few decades, especially in the domain of computer vision, a subdomain of pattern recognition. Variability of texts amongst writers, cursiveness, and different font styles of handwritten texts with degradation of historical text images make it a challenging problem. Recognizing scanned document images in neur… ▽ More The Handwritten Text Recognition problem has been a challenge for researchers for the last few decades, especially in the domain of computer vision, a subdomain of pattern recognition. Variability of texts amongst writers, cursiveness, and different font styles of handwritten texts with degradation of historical text images make it a challenging problem. Recognizing scanned document images in neural network-based systems typically involves a two-step approach: segmentation and recognition. However, this method has several drawbacks. These shortcomings encompass challenges in identifying text regions, analyzing layout diversity within pages, and establishing accurate ground truth segmentation. Consequently, these processes are prone to errors, leading to bottlenecks in achieving high recognition accuracies. Thus, in this study, we present an end-to-end paragraph recognition system that incorporates internal line segmentation and gated convolutional layers based encoder. The gating is a mechanism that controls the flow of information and allows to adaptively selection of the more relevant features in handwritten text recognition models. The attention module plays an important role in performing internal line segmentation, allowing the page to be processed line-by-line. During the decoding step, we have integrated a connectionist temporal classification-based word beam search decoder as a post-processing step. In this work, we have extended existing LexiconNet by carefully applying and utilizing gated convolutional layers in the existing deep neural network. Our results at line and page levels also favour our new GatedLexiconNet. This study reported character error rates of 2.27% on IAM, 0.9% on RIMES, and 2.13% on READ-16, and word error rates of 5.73% on IAM, 2.76% on RIMES, and 6.52% on READ-2016 datasets. △ Less

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.13767 [pdf, other]

Autonomous Robot for Disaster Map** and Victim Localization

Authors: Michael Potter, Rahil Bhowal, Richard Zhao, Anuj Patel, **gming Cheng

Abstract: In response to the critical need for effective reconnaissance in disaster scenarios, this research article presents the design and implementation of a complete autonomous robot system using the Turtlebot3 with Robotic Operating System (ROS) Noetic. Upon deployment in closed, initially unknown environments, the system aims to generate a comprehensive map and identify any present 'victims' using Apr… ▽ More In response to the critical need for effective reconnaissance in disaster scenarios, this research article presents the design and implementation of a complete autonomous robot system using the Turtlebot3 with Robotic Operating System (ROS) Noetic. Upon deployment in closed, initially unknown environments, the system aims to generate a comprehensive map and identify any present 'victims' using AprilTags as stand-ins. We discuss our solution for search and rescue missions, while additionally exploring more advanced algorithms to improve search and rescue functionalities. We introduce a Cubature Kalman Filter to help reduce the mean squared error [m] for AprilTag localization and an information-theoretic exploration algorithm to expedite exploration in unknown environments. Just like turtles, our system takes it slow and steady, but when it's time to save the day, it moves at ninja-like speed! Despite Donatello's shell, he's no slowpoke - he zips through obstacles with the agility of a teenage mutant ninja turtle. So, hang on tight to your shells and get ready for a whirlwind of reconnaissance! Full pipeline code https://github.com/rzhao5659/MRProject/tree/main Exploration code https://github.com/rzhao5659/MRProject/tree/main △ Less

Submitted 21 April, 2024; originally announced April 2024.

Comments: Class final project for Northeastern University EECE 5550 Mobile Robotics Course

arXiv:2404.12258 [pdf, ps, other]

DeepLocalization: Using change point detection for Temporal Action Localization

Authors: Mohammed Shaiqur Rahman, Ibne Farabi Shihab, Lynna Chu, Anuj Sharma

Abstract: In this study, we introduce DeepLocalization, an innovative framework devised for the real-time localization of actions tailored explicitly for monitoring driver behavior. Utilizing the power of advanced deep learning methodologies, our objective is to tackle the critical issue of distracted driving-a significant factor contributing to road accidents. Our strategy employs a dual approach: leveragi… ▽ More In this study, we introduce DeepLocalization, an innovative framework devised for the real-time localization of actions tailored explicitly for monitoring driver behavior. Utilizing the power of advanced deep learning methodologies, our objective is to tackle the critical issue of distracted driving-a significant factor contributing to road accidents. Our strategy employs a dual approach: leveraging Graph-Based Change-Point Detection for pinpointing actions in time alongside a Video Large Language Model (Video-LLM) for precisely categorizing activities. Through careful prompt engineering, we customize the Video-LLM to adeptly handle driving activities' nuances, ensuring its classification efficacy even with sparse data. Engineered to be lightweight, our framework is optimized for consumer-grade GPUs, making it vastly applicable in practical scenarios. We subjected our method to rigorous testing on the SynDD2 dataset, a complex benchmark for distracted driving behaviors, where it demonstrated commendable performance-achieving 57.5% accuracy in event classification and 51% in event detection. These outcomes underscore the substantial promise of DeepLocalization in accurately identifying diverse driver behaviors and their temporal occurrences, all within the bounds of limited computational resources. △ Less

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.09432 [pdf, other]

The 8th AI City Challenge

Authors: Shuo Wang, David C. Anastasiu, Zheng Tang, Ming-Ching Chang, Yue Yao, Liang Zheng, Mohammed Shaiqur Rahman, Meenakshi S. Arya, Anuj Sharma, Pranamesh Chakraborty, Sanjita Prajapati, Quan Kong, Norimasa Kobori, Munkhjargal Gochoo, Munkh-Erdene Otgonbold, Fady Alnajjar, Ganzorig Batnasan, **-Yang Chen, Jun-Wei Hsieh, Xunlei Wu, Sameer Satish Pusegaonkar, Yizhou Wang, Sujit Biswas, Rama Chellappa

Abstract: The eighth AI City Challenge highlighted the convergence of computer vision and artificial intelligence in areas like retail, warehouse settings, and Intelligent Traffic Systems (ITS), presenting significant research opportunities. The 2024 edition featured five tracks, attracting unprecedented interest from 726 teams in 47 countries and regions. Track 1 dealt with multi-target multi-camera (MTMC)… ▽ More The eighth AI City Challenge highlighted the convergence of computer vision and artificial intelligence in areas like retail, warehouse settings, and Intelligent Traffic Systems (ITS), presenting significant research opportunities. The 2024 edition featured five tracks, attracting unprecedented interest from 726 teams in 47 countries and regions. Track 1 dealt with multi-target multi-camera (MTMC) people tracking, highlighting significant enhancements in camera count, character number, 3D annotation, and camera matrices, alongside new rules for 3D tracking and online tracking algorithm encouragement. Track 2 introduced dense video captioning for traffic safety, focusing on pedestrian accidents using multi-camera feeds to improve insights for insurance and prevention. Track 3 required teams to classify driver actions in a naturalistic driving analysis. Track 4 explored fish-eye camera analytics using the FishEye8K dataset. Track 5 focused on motorcycle helmet rule violation detection. The challenge utilized two leaderboards to showcase methods, with participants setting new benchmarks, some surpassing existing state-of-the-art achievements. △ Less

Submitted 14 April, 2024; originally announced April 2024.

Comments: Summary of the 8th AI City Challenge Workshop in conjunction with CVPR 2024

arXiv:2404.08744 [pdf, other]

Routing and Spectrum Allocation in Broadband Quantum Entanglement Distribution

Authors: Rohan Bali, Ashley N. Tittelbaugh, Shelbi L. Jenkins, Anuj Agrawal, Jerry Horgan, Marco Ruffini, Daniel C. Kilper, Boulat A. Bash

Abstract: We investigate resource allocation for quantum entanglement distribution over an optical network. We characterize and model a network architecture that employs a single quasi-deterministic time-frequency heralded Einstein-Podolsky-Rosen (EPR) pair source, and develop a routing scheme for distributing entangled photon pairs over such a network. We focus on max-min fairness in entanglement distribut… ▽ More We investigate resource allocation for quantum entanglement distribution over an optical network. We characterize and model a network architecture that employs a single quasi-deterministic time-frequency heralded Einstein-Podolsky-Rosen (EPR) pair source, and develop a routing scheme for distributing entangled photon pairs over such a network. We focus on max-min fairness in entanglement distribution and compare the performance of various spectrum allocation schemes by examining the max-min and median number of EPR-pairs assigned by them, and the Jain index associated with this assignment. Since this presents an NP-hard problem, we identify two approximation algorithms that outperform others in minimum and mean EPR-pair rate distribution and are comparable to others in the Jain index. We also analyze how the network size and connectivity affect these metrics using Watts-Strogatz random graphs. We find that a spectrum allocation approach that achieves high minimum EPR-pair rate can perform significantly worse when the median EPR-pair rate, Jain index, and runtimes are considered. △ Less

Submitted 12 April, 2024; originally announced April 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2311.14613

arXiv:2404.08011 [pdf, other]

An inclusive review on deep learning techniques and their scope in handwriting recognition

Authors: Sukhdeep Singh, Sudhir Rohilla, Anuj Sharma

Abstract: Deep learning expresses a category of machine learning algorithms that have the capability to combine raw inputs into intermediate features layers. These deep learning algorithms have demonstrated great results in different fields. Deep learning has particularly witnessed for a great achievement of human level performance across a number of domains in computer vision and pattern recognition. For t… ▽ More Deep learning expresses a category of machine learning algorithms that have the capability to combine raw inputs into intermediate features layers. These deep learning algorithms have demonstrated great results in different fields. Deep learning has particularly witnessed for a great achievement of human level performance across a number of domains in computer vision and pattern recognition. For the achievement of state-of-the-art performances in diverse domains, the deep learning used different architectures and these architectures used activation functions to perform various computations between hidden and output layers of any architecture. This paper presents a survey on the existing studies of deep learning in handwriting recognition field. Even though the recent progress indicates that the deep learning methods has provided valuable means for speeding up or proving accurate results in handwriting recognition, but following from the extensive literature survey, the present study finds that the deep learning has yet to revolutionize more and has to resolve many of the most pressing challenges in this field, but promising advances have been made on the prior state of the art. Additionally, an inadequate availability of labelled data to train presents problems in this domain. Nevertheless, the present handwriting recognition survey foresees deep learning enabling changes at both bench and bedside with the potential to transform several domains as image processing, speech recognition, computer vision, machine translation, robotics and control, medical imaging, medical information processing, bio-informatics, natural language processing, cyber security, and many others. △ Less

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2403.16419 [pdf, other]

Terrain-Attentive Learning for Efficient 6-DoF Kinodynamic Modeling on Vertically Challenging Terrain

Authors: Aniket Datar, Chenhui Pan, Mohammad Nazeri, Anuj Pokhrel, Xuesu Xiao

Abstract: Wheeled robots have recently demonstrated superior mechanical capability to traverse vertically challenging terrain (e.g., extremely rugged boulders comparable in size to the vehicles themselves). Negotiating such terrain introduces significant variations of vehicle pose in all six Degrees-of-Freedom (DoFs), leading to imbalanced contact forces, varying momentum, and chassis deformation due to non… ▽ More Wheeled robots have recently demonstrated superior mechanical capability to traverse vertically challenging terrain (e.g., extremely rugged boulders comparable in size to the vehicles themselves). Negotiating such terrain introduces significant variations of vehicle pose in all six Degrees-of-Freedom (DoFs), leading to imbalanced contact forces, varying momentum, and chassis deformation due to non-rigid tires and suspensions. To autonomously navigate on vertically challenging terrain, all these factors need to be efficiently reasoned within limited onboard computation and strict real-time constraints. In this paper, we propose a 6-DoF kinodynamics learning approach that is attentive only to the specific underlying terrain critical to the current vehicle-terrain interaction, so that it can be efficiently queried in real-time motion planners onboard small robots. Physical experiment results show our Terrain-Attentive Learning demonstrates on average 51.1% reduction in model prediction error among all 6 DoFs compared to a state-of-the-art model for vertically challenging terrain. △ Less

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.15989 [pdf, other]

Knowledge-guided Machine Learning: Current Trends and Future Prospects

Authors: Anuj Karpatne, Xiaowei Jia, Vipin Kumar

Abstract: This paper presents an overview of scientific modeling and discusses the complementary strengths and weaknesses of ML methods for scientific modeling in comparison to process-based models. It also provides an introduction to the current state of research in the emerging field of scientific knowledge-guided machine learning (KGML) that aims to use both scientific knowledge and data in ML frameworks… ▽ More This paper presents an overview of scientific modeling and discusses the complementary strengths and weaknesses of ML methods for scientific modeling in comparison to process-based models. It also provides an introduction to the current state of research in the emerging field of scientific knowledge-guided machine learning (KGML) that aims to use both scientific knowledge and data in ML frameworks to achieve better generalizability, scientific consistency, and explainability of results. We discuss different facets of KGML research in terms of the type of scientific knowledge used, the form of knowledge-ML integration explored, and the method for incorporating scientific knowledge in ML. We also discuss some of the common categories of use cases in environmental sciences where KGML methods are being developed, using illustrative examples in each category. △ Less

Submitted 1 May, 2024; v1 submitted 23 March, 2024; originally announced March 2024.

arXiv:2403.15637 [pdf, other]

CoNVOI: Context-aware Navigation using Vision Language Models in Outdoor and Indoor Environments

Authors: Adarsh Jagan Sathyamoorthy, Kasun Weerakoon, Mohamed Elnoor, Anuj Zore, Brian Ichter, Fei Xia, Jie Tan, Wenhao Yu, Dinesh Manocha

Abstract: We present ConVOI, a novel method for autonomous robot navigation in real-world indoor and outdoor environments using Vision Language Models (VLMs). We employ VLMs in two ways: first, we leverage their zero-shot image classification capability to identify the context or scenario (e.g., indoor corridor, outdoor terrain, crosswalk, etc) of the robot's surroundings, and formulate context-based naviga… ▽ More We present ConVOI, a novel method for autonomous robot navigation in real-world indoor and outdoor environments using Vision Language Models (VLMs). We employ VLMs in two ways: first, we leverage their zero-shot image classification capability to identify the context or scenario (e.g., indoor corridor, outdoor terrain, crosswalk, etc) of the robot's surroundings, and formulate context-based navigation behaviors as simple text prompts (e.g. ``stay on the pavement"). Second, we utilize their state-of-the-art semantic understanding and logical reasoning capabilities to compute a suitable trajectory given the identified context. To this end, we propose a novel multi-modal visual marking approach to annotate the obstacle-free regions in the RGB image used as input to the VLM with numbers, by correlating it with a local occupancy map of the environment. The marked numbers ground image locations in the real-world, direct the VLM's attention solely to navigable locations, and elucidate the spatial relationships between them and terrains depicted in the image to the VLM. Next, we query the VLM to select numbers on the marked image that satisfy the context-based behavior text prompt, and construct a reference path using the selected numbers. Finally, we propose a method to extrapolate the reference trajectory when the robot's environmental context has not changed to prevent unnecessary VLM queries. We use the reference trajectory to guide a motion planner, and demonstrate that it leads to human-like behaviors (e.g. not cutting through a group of people, using crosswalks, etc.) in various real-world indoor and outdoor scenarios. △ Less

Submitted 22 March, 2024; originally announced March 2024.

Comments: 9 pages, 4 figures

arXiv:2403.15077 [pdf, other]

GTAGCN: Generalized Topology Adaptive Graph Convolutional Networks

Authors: Sukhdeep Singh, Anuj Sharma, Vinod Kumar Chauhan

Abstract: Graph Neural Networks (GNN) have emerged as a popular and standard approach for learning from graph-structured data. The literature on GNN highlights the potential of this evolving research area and its widespread adoption in real-life applications. However, most of the approaches are either new in concept or derived from specific techniques. Therefore, the potential of more than one approach in h… ▽ More Graph Neural Networks (GNN) have emerged as a popular and standard approach for learning from graph-structured data. The literature on GNN highlights the potential of this evolving research area and its widespread adoption in real-life applications. However, most of the approaches are either new in concept or derived from specific techniques. Therefore, the potential of more than one approach in hybrid form has not been studied extensively, which can be well utilized for sequenced data or static data together. We derive a hybrid approach based on two established techniques as generalized aggregation networks and topology adaptive graph convolution networks that solve our purpose to apply on both types of sequenced and static nature of data, effectively. The proposed method applies to both node and graph classification. Our empirical analysis reveals that the results are at par with literature results and better for handwritten strokes as sequenced data, where graph structures have not been explored. △ Less

Submitted 22 March, 2024; originally announced March 2024.

Comments: 2 figures, 3 tables and 26 pages

arXiv:2403.12223 [pdf, ps, other]

HRI in Indian Education: Challenges Opportunities

Authors: Chinmaya Mishra, Anuj Nandanwar, Sashikala Mishra

Abstract: With the recent advancements in the field of robotics and the increased focus on having general-purpose robots widely available to the general public, it has become increasingly necessary to pursue research into Human-robot interaction (HRI). While there have been a lot of works discussing frameworks for teaching HRI in educational institutions with a few institutions already offering courses to s… ▽ More With the recent advancements in the field of robotics and the increased focus on having general-purpose robots widely available to the general public, it has become increasingly necessary to pursue research into Human-robot interaction (HRI). While there have been a lot of works discussing frameworks for teaching HRI in educational institutions with a few institutions already offering courses to students, a consensus on the course content still eludes the field. In this work, we highlight a few challenges and opportunities while designing an HRI course from an Indian perspective. These topics warrant further deliberations as they have a direct impact on the design of HRI courses and wider implications for the entire field. △ Less

Submitted 18 March, 2024; originally announced March 2024.

Comments: Presented at the Designing an Intro to HRI Course Workshop at HRI 2024 (arXiv:2403.05588)

Report number: HRI101/2024/9

arXiv:2403.08283 [pdf, other]

Optimized Detection and Classification on GTRSB: Advancing Traffic Sign Recognition with Convolutional Neural Networks

Authors: Dhruv Toshniwal, Saurabh Loya, Anuj Khot, Yash Marda

Abstract: In the rapidly evolving landscape of transportation, the proliferation of automobiles has made road traffic more complex, necessitating advanced vision-assisted technologies for enhanced safety and navigation. These technologies are imperative for providing critical traffic sign information, influencing driver behavior, and supporting vehicle control, especially for drivers with disabilities and i… ▽ More In the rapidly evolving landscape of transportation, the proliferation of automobiles has made road traffic more complex, necessitating advanced vision-assisted technologies for enhanced safety and navigation. These technologies are imperative for providing critical traffic sign information, influencing driver behavior, and supporting vehicle control, especially for drivers with disabilities and in the burgeoning field of autonomous vehicles. Traffic sign detection and recognition have emerged as key areas of research due to their essential roles in ensuring road safety and compliance with traffic regulations. Traditional computer vision methods have faced challenges in achieving optimal accuracy and speed due to real-world variabilities. However, the advent of deep learning and Convolutional Neural Networks (CNNs) has revolutionized this domain, offering solutions that significantly surpass previous capabilities in terms of speed and reliability. This paper presents an innovative approach leveraging CNNs that achieves an accuracy of nearly 96\%, highlighting the potential for even greater precision through advanced localization techniques. Our findings not only contribute to the ongoing advancement of traffic sign recognition technology but also underscore the critical impact of these developments on road safety and the future of autonomous driving. △ Less

Submitted 13 March, 2024; originally announced March 2024.

Comments: 8 pages, 8 figures, 1 table

arXiv:2403.07867 [pdf, other]

The Virtues of Laziness: Multi-Query Kinodynamic Motion Planning with Lazy Methods

Authors: Anuj Pasricha, Alessandro Roncone

Abstract: In this work, we introduce LazyBoE, a multi-query method for kinodynamic motion planning with forward propagation. This algorithm allows for the simultaneous exploration of a robot's state and control spaces, thereby enabling a wider suite of dynamic tasks in real-world applications. Our contributions are three-fold: i) a method for discretizing the state and control spaces to amortize planning ti… ▽ More In this work, we introduce LazyBoE, a multi-query method for kinodynamic motion planning with forward propagation. This algorithm allows for the simultaneous exploration of a robot's state and control spaces, thereby enabling a wider suite of dynamic tasks in real-world applications. Our contributions are three-fold: i) a method for discretizing the state and control spaces to amortize planning times across multiple queries; ii) lazy approaches to collision checking and propagation of control sequences that decrease the cost of physics-based simulation; and iii) LazyBoE, a robust kinodynamic planner that leverages these two contributions to produce dynamically-feasible trajectories. The proposed framework not only reduces planning time but also increases success rate in comparison to previous approaches. △ Less

Submitted 4 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: Accepted to ICRA 2022 (International Conference on Robotics and Automation)

arXiv:2403.07003 [pdf, other]

Evacuation Management Framework towards Smart City-wide Intelligent Emergency Interactive Response System

Authors: Anuj Abraham, Yi Zhang, Shitala Prasad

Abstract: A smart city solution toward future 6G network deployment allows small and medium sized enterprises (SMEs), industry, and government entities to connect with the infrastructures and play a crucial role in enhancing emergency preparedness with advanced sensors. The objective of this work is to propose a set of coordinated technological solutions to transform an existing emergency response system in… ▽ More A smart city solution toward future 6G network deployment allows small and medium sized enterprises (SMEs), industry, and government entities to connect with the infrastructures and play a crucial role in enhancing emergency preparedness with advanced sensors. The objective of this work is to propose a set of coordinated technological solutions to transform an existing emergency response system into an intelligent interactive system, thereby improving the public services and the quality of life for residents at home, on road, in hospitals, transport hubs, etc. In this context, we consider a city wide view from three different application scenes that are closely related to peoples daily life, to optimize the actions taken at relevant departments. Therefore, using artificial intelligence (AI) and machine learning (ML) techniques to enable the next generation connected vehicle experiences, we specifically focus on accidents happening in indoor households, urban roads, and at large public facilities. This smart interactive response system will benefit from advanced sensor fusion and AI by formulating a real time dynamic model. △ Less

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.02439 [pdf, other]

Root Causing Prediction Anomalies Using Explainable AI

Authors: Ramanathan Vishnampet, Rajesh Shenoy, Jianhui Chen, Anuj Gupta

Abstract: This paper presents a novel application of explainable AI (XAI) for root-causing performance degradation in machine learning models that learn continuously from user engagement data. In such systems a single feature corruption can cause cascading feature, label and concept drifts. We have successfully applied this technique to improve the reliability of models used in personalized advertising. Per… ▽ More This paper presents a novel application of explainable AI (XAI) for root-causing performance degradation in machine learning models that learn continuously from user engagement data. In such systems a single feature corruption can cause cascading feature, label and concept drifts. We have successfully applied this technique to improve the reliability of models used in personalized advertising. Performance degradation in such systems manifest as prediction anomalies in the models. These models are typically trained continuously using features that are produced by hundreds of real time data processing pipelines or derived from other upstream models. A failure in any of these pipelines or an instability in any of the upstream models can cause feature corruption, causing the model's predicted output to deviate from the actual output and the training data to become corrupted. The causal relationship between the features and the predicted output is complex, and root-causing is challenging due to the scale and dynamism of the system. We demonstrate how temporal shifts in the global feature importance distribution can effectively isolate the cause of a prediction anomaly, with better recall than model-to-feature correlation methods. The technique appears to be effective even when approximating the local feature importance using a simple perturbation-based method, and aggregating over a few thousand examples. We have found this technique to be a model-agnostic, cheap and effective way to monitor complex data pipelines in production and have deployed a system for continuously analyzing the global feature importance distribution of continuously trained models. △ Less

Submitted 4 March, 2024; originally announced March 2024.

Comments: Submitted to The 2nd World Conference on eXplainable Artificial Intelligence, 17-19 July, 2024, Malta, Valletta

arXiv:2402.12937 [pdf, other]

GRAPHGINI: Fostering Individual and Group Fairness in Graph Neural Networks

Authors: Anuj Kumar Sirohi, Anjali Gupta, Sayan Ranu, Sandeep Kumar, Amitabha Bagchi

Abstract: We address the growing apprehension that GNNs, in the absence of fairness constraints, might produce biased decisions that disproportionately affect underprivileged groups or individuals. Departing from previous work, we introduce for the first time a method for incorporating the Gini coefficient as a measure of fairness to be used within the GNN framework. Our proposal, GRAPHGINI, works with the… ▽ More We address the growing apprehension that GNNs, in the absence of fairness constraints, might produce biased decisions that disproportionately affect underprivileged groups or individuals. Departing from previous work, we introduce for the first time a method for incorporating the Gini coefficient as a measure of fairness to be used within the GNN framework. Our proposal, GRAPHGINI, works with the two different goals of individual and group fairness in a single system, while maintaining high prediction accuracy. GRAPHGINI enforces individual fairness through learnable attention scores that help in aggregating more information through similar nodes. A heuristic-based maximum Nash social welfare constraint ensures the maximum possible group fairness. Both the individual fairness constraint and the group fairness constraint are stated in terms of a differentiable approximation of the Gini coefficient. This approximation is a contribution that is likely to be of interest even beyond the scope of the problem studied in this paper. Unlike other state-of-the-art, GRAPHGINI automatically balances all three optimization objectives (utility, individual, and group fairness) of the GNN and is free from any manual tuning of weight parameters. Extensive experimentation on real-world datasets showcases the efficacy of GRAPHGINI in making significant improvements in individual fairness compared to all currently available state-of-the-art methods while maintaining utility and group equality. △ Less

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.10600 [pdf, other]

Envisioning the Future Role of 3D Wireless Networks in Preventing and Managing Disasters and Emergency Situations

Authors: Ahmed Alhammadi, Anuj Abraham, Aymen Fakhreddine, Yu Tian, Jun Du, Faouzi Bader

Abstract: In an era marked by unprecedented climatic upheavals and evolving urban landscapes, the role of advanced communication networks in disaster prevention and management is becoming increasingly critical. This paper explores the transformative potential of 3D wireless networks, an innovative amalgamation of terrestrial, aerial, and satellite technologies, in enhancing disaster response mechanisms. We… ▽ More In an era marked by unprecedented climatic upheavals and evolving urban landscapes, the role of advanced communication networks in disaster prevention and management is becoming increasingly critical. This paper explores the transformative potential of 3D wireless networks, an innovative amalgamation of terrestrial, aerial, and satellite technologies, in enhancing disaster response mechanisms. We delve into a myriad of use cases, ranging from large facility evacuations to wildfire management, underscoring the versatility of these networks in ensuring timely communication, real-time situational awareness, and efficient resource allocation during crises. We also present an overview of cutting-edge prototypes, highlighting the practical feasibility and operational efficacy of 3D wireless networks in real-world scenarios. Simultaneously, we acknowledge the challenges posed by aspects such as cybersecurity, cross-border coordination, and physical layer technological hurdles, and propose future directions for research and development in this domain. △ Less

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.10465 [pdf, ps, other]

Subfield codes of $C_D$-codes over $\mathbb{F}_2[x]/\langle x^3-x \rangle$ are really nice!

Authors: Anuj Kumar Bhagat, Ritumoni Sarma, Vidya Sagar

Abstract: A non-zero $\mathbb{F}$-linear map from a finite-dimensional commutative $\mathbb{F}$-algebra to $\mathbb{F}$ is called an $\mathbb{F}$-valued trace if its kernel does not contain any non-zero ideals. In this article, we utilize an $\mathbb{F}_2$-valued trace of the $\mathbb{F}_2$-algebra $\mathcal{R}_2:=\mathbb{F}_2[x]/\langle x^3-x\rangle$ to study binary subfield code $\mathcal{C}_D^{(2)}$ of… ▽ More A non-zero $\mathbb{F}$-linear map from a finite-dimensional commutative $\mathbb{F}$-algebra to $\mathbb{F}$ is called an $\mathbb{F}$-valued trace if its kernel does not contain any non-zero ideals. In this article, we utilize an $\mathbb{F}_2$-valued trace of the $\mathbb{F}_2$-algebra $\mathcal{R}_2:=\mathbb{F}_2[x]/\langle x^3-x\rangle$ to study binary subfield code $\mathcal{C}_D^{(2)}$ of $\mathcal{C}_D:=\{\left(x\cdot d\right)_{d\in D}: x\in \mathcal{R}_2^m\}$ for each defining set $D$ derived from a certain simplicial complex. For $m\in \mathbb{N}$ and $X\subseteq \{1, 2, \dots, m\}$, define $Δ_X:=\{v\in \mathbb{F}_2^m: \Supp(v)\subseteq X\}$ and $D:=(1+u^2)D_1+u^2D_2+(u+u^2)D_3,$ a subset of $\mathcal{R}_2^m,$ where $u=x+\langle x^3-x\rangle, D_1\in \{Δ_L, Δ_L^c\},\, D_2\in \{Δ_M, Δ_M^c\}$ and $ D_3\in \{Δ_N, Δ_N^c\}$, for $L, M, N\subseteq \{1, 2, \dots, m\}.$ The parameters and the Hamming weight distribution of the binary subfield code $\mathcal{C}_D^{(2)}$ of $\mathcal{C}_D$ are determined for each $D.$ These binary subfield codes are minimal under certain mild conditions on the cardinalities of $L, M$ and $N$. Moreover, most of these codes are distance-optimal. Consequently, we obtain a few infinite families of minimal, self-orthogonal and distance-optimal binary linear codes that are either $2$-weight or $4$-weight. It is worth mentioning that we have obtained several new distance-optimal binary linear codes. △ Less

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.08017 [pdf, other]

Lumos : Empowering Multimodal LLMs with Scene Text Recognition

Authors: Ashish Shenoy, Yichao Lu, Srihari Jayakumar, Debojeet Chatterjee, Mohsen Moslehpour, Pierce Chuang, Abhay Harpale, Vikas Bhardwaj, Di Xu, Shicong Zhao, Longfang Zhao, Ankit Ramchandani, Xin Luna Dong, Anuj Kumar

Abstract: We introduce Lumos, the first end-to-end multimodal question-answering system with text understanding capabilities. At the core of Lumos is a Scene Text Recognition (STR) component that extracts text from first person point-of-view images, the output of which is used to augment input to a Multimodal Large Language Model (MM-LLM). While building Lumos, we encountered numerous challenges related to… ▽ More We introduce Lumos, the first end-to-end multimodal question-answering system with text understanding capabilities. At the core of Lumos is a Scene Text Recognition (STR) component that extracts text from first person point-of-view images, the output of which is used to augment input to a Multimodal Large Language Model (MM-LLM). While building Lumos, we encountered numerous challenges related to STR quality, overall latency, and model inference. In this paper, we delve into those challenges, and discuss the system architecture, design choices, and modeling techniques employed to overcome these obstacles. We also provide a comprehensive evaluation for each component, showcasing high quality and efficiency. △ Less

Submitted 1 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

Comments: Accepted to KDD 2024 (ADS Track)

arXiv:2402.07065 [pdf, other]

CAHSOR: Competence-Aware High-Speed Off-Road Ground Navigation in SE(3)

Authors: Anuj Pokhrel, Aniket Datar, Mohammad Nazeri, Xuesu Xiao

Abstract: While the workspace of traditional ground vehicles is usually assumed to be in a 2D plane, i.e., SE(2), such an assumption may not hold when they drive at high speeds on unstructured off-road terrain: High-speed sharp turns on high-friction surfaces may lead to vehicle rollover; Turning aggressively on loose gravel or grass may violate the non-holonomic constraint and cause significant lateral sli… ▽ More While the workspace of traditional ground vehicles is usually assumed to be in a 2D plane, i.e., SE(2), such an assumption may not hold when they drive at high speeds on unstructured off-road terrain: High-speed sharp turns on high-friction surfaces may lead to vehicle rollover; Turning aggressively on loose gravel or grass may violate the non-holonomic constraint and cause significant lateral sliding; Driving quickly on rugged terrain will produce extensive vibration along the vertical axis. Therefore, most offroad vehicles are currently limited to drive only at low speeds to assure vehicle stability and safety. In this work, we aim at empowering high-speed off-road vehicles with competence awareness in SE(3) so that they can reason about the consequences of taking aggressive maneuvers on different terrain with a 6-DoF forward kinodynamic model. The model is learned from visual and inertial Terrain Representation for Off-road Navigation (TRON) using multimodal, self-supervised vehicle-terrain interactions. We demonstrate the efficacy of our Competence-Aware High-Speed Off-Road (CAHSOR) navigation approach on a physical ground robot in both an autonomous navigation and a human shared-control setup and show that CAHSOR can efficiently reduce vehicle instability by 62% while only compromising 8.6% average speed with the help of TRON. △ Less

Submitted 10 February, 2024; originally announced February 2024.

arXiv:2402.06159 [pdf, other]

Passwords Are Meant to Be Secret: A Practical Secure Password Entry Channel for Web Browsers

Authors: Anuj Gautam, Tarun Kumar Yadav, Kent Seamons, Scott Ruoti

Abstract: Password-based authentication faces various security and usability issues. Password managers help alleviate some of these issues by enabling users to manage their passwords effectively. However, malicious client-side scripts and browser extensions can steal passwords after they have been autofilled by the manager into the web page. In this paper, we explore what role the password manager can take… ▽ More Password-based authentication faces various security and usability issues. Password managers help alleviate some of these issues by enabling users to manage their passwords effectively. However, malicious client-side scripts and browser extensions can steal passwords after they have been autofilled by the manager into the web page. In this paper, we explore what role the password manager can take in preventing the theft of autofilled credentials without requiring a change to user behavior. To this end, we identify a threat model for password exfiltration and then use this threat model to explore the design space for secure password entry implemented using a password manager. We identify five potential designs that address this issue, each with varying security and deployability tradeoffs. Our analysis shows the design that best balances security and usability is for the manager to autofill a fake password and then rely on the browser to replace the fake password with the actual password immediately before the web request is handed over to the operating system to be transmitted over the network. This removes the ability for malicious client-side scripts or browser extensions to access and exfiltrate the real password. We implement our design in the Firefox browser and conduct experiments, which show that it successfully thwarts malicious scripts and extensions on 97\% of the Alexa top 1000 websites, while also maintaining the capability to revert to default behavior on the remaining websites, avoiding functionality regressions. Most importantly, this design is transparent to users, requiring no change to user behavior. △ Less

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.03255 [pdf, other]

Security Advice for Parents and Children About Content Filtering and Circumvention as Found on YouTube and TikTok

Authors: Ran Elgedawy, John Sadik, Anuj Gautam, Trinity Bissahoyo, Christopher Childress, Jacob Leonard, Clay Shubert, Scott Ruoti

Abstract: In today's digital age, concerns about online security and privacy have become paramount. However, addressing these issues can be difficult, especially within the context of family relationships, wherein parents and children may have conflicting interests. In this environment, parents and children may turn to online security advice to determine how to proceed. In this paper, we examine the advice… ▽ More In today's digital age, concerns about online security and privacy have become paramount. However, addressing these issues can be difficult, especially within the context of family relationships, wherein parents and children may have conflicting interests. In this environment, parents and children may turn to online security advice to determine how to proceed. In this paper, we examine the advice available to parents and children regarding content filtering and circumvention as found on YouTube and TikTok. In an analysis of 839 videos returned from queries on these topics, we found that half (n=399) provide relevant advice. Our results show that of these videos, roughly three-quarters are accurate, with the remaining one-fourth containing factually incorrect advice. We find that videos targeting children are both more likely to be incorrect and actionable than videos targeting parents, leaving children at increased risk of taking harmful action. Moreover, we find that while advice videos targeting parents will occasionally discuss the ethics of content filtering and device monitoring (including recommendations to respect children's autonomy) no such discussion of the ethics or risks of circumventing content filtering is given to children, leaving them unaware of any risks that may be involved with doing so. Ultimately, our research indicates that video-based social media sites are already effective sources of security advice propagation and that the public would benefit from security researchers and practitioners engaging more with these platforms, both for the creation of content and of tools designed to help with more effective filtering. △ Less

Submitted 5 February, 2024; originally announced February 2024.

Comments: 15 pages, 5 figures, 8 tables

arXiv:2402.00689 [pdf, other]

Ocassionally Secure: A Comparative Analysis of Code Generation Assistants

Authors: Ran Elgedawy, John Sadik, Senjuti Dutta, Anuj Gautam, Konstantinos Georgiou, Farzin Gholamrezae, Fujiao Ji, Kyungchan Lim, Qian Liu, Scott Ruoti

Abstract: $ $Large Language Models (LLMs) are being increasingly utilized in various applications, with code generations being a notable example. While previous research has shown that LLMs have the capability to generate both secure and insecure code, the literature does not take into account what factors help generate secure and effective code. Therefore in this paper we focus on identifying and understan… ▽ More $ $Large Language Models (LLMs) are being increasingly utilized in various applications, with code generations being a notable example. While previous research has shown that LLMs have the capability to generate both secure and insecure code, the literature does not take into account what factors help generate secure and effective code. Therefore in this paper we focus on identifying and understanding the conditions and contexts in which LLMs can be effectively and safely deployed in real-world scenarios to generate quality code. We conducted a comparative analysis of four advanced LLMs--GPT-3.5 and GPT-4 using ChatGPT and Bard and Gemini from Google--using 9 separate tasks to assess each model's code generation capabilities. We contextualized our study to represent the typical use cases of a real-life developer employing LLMs for everyday tasks as work. Additionally, we place an emphasis on security awareness which is represented through the use of two distinct versions of our developer persona. In total, we collected 61 code outputs and analyzed them across several aspects: functionality, security, performance, complexity, and reliability. These insights are crucial for understanding the models' capabilities and limitations, guiding future development and practical applications in the field of automated code generation. △ Less

Submitted 1 February, 2024; originally announced February 2024.

Comments: 12 pages, 2 figures

arXiv:2401.08711 [pdf]

doi 10.55982/openpraxis.16.1.631

Assistant, Parrot, or Colonizing Loudspeaker? ChatGPT Metaphors for Develo** Critical AI Literacies

Authors: Anuj Gupta, Yasser Atef, Anna Mills, Maha Bali

Abstract: This study explores how discussing metaphors for AI can help build awareness of the frames that shape our understanding of AI systems, particularly large language models (LLMs) like ChatGPT. Given the pressing need to teach "critical AI literacy", discussion of metaphor provides an opportunity for inquiry and dialogue with space for nuance, playfulness, and critique. Using a collaborative autoethn… ▽ More This study explores how discussing metaphors for AI can help build awareness of the frames that shape our understanding of AI systems, particularly large language models (LLMs) like ChatGPT. Given the pressing need to teach "critical AI literacy", discussion of metaphor provides an opportunity for inquiry and dialogue with space for nuance, playfulness, and critique. Using a collaborative autoethnographic methodology, we analyzed metaphors from a range of sources, and reflected on them individually according to seven questions, then met and discussed our interpretations. We then analyzed how our reflections contributed to the three kinds of literacies delineated in Selber's multiliteracies framework: functional, critical, and rhetorical. These allowed us to analyze questions of ethics, equity, and accessibility in relation to AI. We explored each metaphor along the dimension of whether or not it was promoting anthropomorphizing, and to what extent such metaphors imply that AI is sentient. Our findings highlight the role of metaphor reflection in fostering a nuanced understanding of AI, suggesting that our collaborative autoethnographic approach as well as the heuristic model of plotting AI metaphors on dimensions of anthropomorphism and multiliteracies, might be useful for educators and researchers in the pursuit of advancing critical AI literacy. △ Less

Submitted 15 January, 2024; originally announced January 2024.

Comments: This is a preprint (accepted version) of an article that has been accepted for publication at the journal Open Praxis: https://openpraxis.org/

ACM Class: I.2.0; K.3.0; K.3.1; K.4.0; K.4.2; J.4; J.5

arXiv:2312.14919 [pdf, other]

Lift-Attend-Splat: Bird's-eye-view camera-lidar fusion using transformers

Authors: James Gunn, Zygmunt Lenyk, Anuj Sharma, Andrea Donati, Alexandru Buburuzan, John Redford, Romain Mueller

Abstract: Combining complementary sensor modalities is crucial to providing robust perception for safety-critical robotics applications such as autonomous driving (AD). Recent state-of-the-art camera-lidar fusion methods for AD rely on monocular depth estimation which is a notoriously difficult task compared to using depth information from the lidar directly. Here, we find that this approach does not levera… ▽ More Combining complementary sensor modalities is crucial to providing robust perception for safety-critical robotics applications such as autonomous driving (AD). Recent state-of-the-art camera-lidar fusion methods for AD rely on monocular depth estimation which is a notoriously difficult task compared to using depth information from the lidar directly. Here, we find that this approach does not leverage depth as expected and show that naively improving depth estimation does not lead to improvements in object detection performance. Strikingly, we also find that removing depth estimation altogether does not degrade object detection performance substantially, suggesting that relying on monocular depth could be an unnecessary architectural bottleneck during camera-lidar fusion. In this work, we introduce a novel fusion method that bypasses monocular depth estimation altogether and instead selects and fuses camera and lidar features in a bird's-eye-view grid using a simple attention mechanism. We show that our model can modulate its use of camera features based on the availability of lidar features and that it yields better 3D object detection on the nuScenes dataset than baselines relying on monocular depth estimation. △ Less

Submitted 21 May, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

Comments: Updated method figure; camera ready

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI. △ Less

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.10437 [pdf]

Tender Notice Extraction from E-papers Using Neural Network

Authors: Ashmin Bhattarai, Anuj Sedhai, Devraj Neupane, Manish Khadka, Rama Bastola

Abstract: Tender notices are usually sought by most of the companies at regular intervals as a means for obtaining the contracts of various projects. These notices consist of all the required information like description of the work, period of construction, estimated amount of project, etc. In the context of Nepal, tender notices are usually published in national as well as local newspapers. The interested… ▽ More Tender notices are usually sought by most of the companies at regular intervals as a means for obtaining the contracts of various projects. These notices consist of all the required information like description of the work, period of construction, estimated amount of project, etc. In the context of Nepal, tender notices are usually published in national as well as local newspapers. The interested bidders should search all the related tender notices in newspapers. However, it is very tedious for these companies to manually search tender notices in every newspaper and figure out which bid is best suited for them. This project is built with the purpose of solving this tedious task of manually searching the tender notices. Initially, the newspapers are downloaded in PDF format using the selenium library of python. After downloading the newspapers, the e-papers are scanned and tender notices are automatically extracted using a neural network. For extraction purposes, different architectures of CNN namely ResNet, GoogleNet and Xception are used and a model with highest performance has been implemented. Finally, these extracted notices are then published on the website and are accessible to the users. This project is helpful for construction companies as well as contractors assuring quality and efficiency. This project has great application in the field of competitive bidding as well as managing them in a systematic manner. △ Less

Submitted 16 December, 2023; originally announced December 2023.

arXiv:2312.10080 [pdf, ps, other]

No prejudice! Fair Federated Graph Neural Networks for Personalized Recommendation

Authors: Nimesh Agrawal, Anuj Kumar Sirohi, Jayadeva, Sandeep Kumar

Abstract: Ensuring fairness in Recommendation Systems (RSs) across demographic groups is critical due to the increased integration of RSs in applications such as personalized healthcare, finance, and e-commerce. Graph-based RSs play a crucial role in capturing intricate higher-order interactions among entities. However, integrating these graph models into the Federated Learning (FL) paradigm with fairness c… ▽ More Ensuring fairness in Recommendation Systems (RSs) across demographic groups is critical due to the increased integration of RSs in applications such as personalized healthcare, finance, and e-commerce. Graph-based RSs play a crucial role in capturing intricate higher-order interactions among entities. However, integrating these graph models into the Federated Learning (FL) paradigm with fairness constraints poses formidable challenges as this requires access to the entire interaction graph and sensitive user information (such as gender, age, etc.) at the central server. This paper addresses the pervasive issue of inherent bias within RSs for different demographic groups without compromising the privacy of sensitive user attributes in FL environment with the graph-based model. To address the group bias, we propose F2PGNN (Fair Federated Personalized Graph Neural Network), a novel framework that leverages the power of Personalized Graph Neural Network (GNN) coupled with fairness considerations. Additionally, we use differential privacy techniques to fortify privacy protection. Experimental evaluation on three publicly available datasets showcases the efficacy of F2PGNN in mitigating group unfairness by 47% - 99% compared to the state-of-the-art while preserving privacy and maintaining the utility. The results validate the significance of our framework in achieving equitable and personalized recommendations using GNN within the FL landscape. △ Less

Submitted 20 December, 2023; v1 submitted 10 December, 2023; originally announced December 2023.

Comments: To appear as a full paper in AAAI 2024

arXiv:2312.02548 [pdf, other]

GeNIe: Generative Hard Negative Images Through Diffusion

Authors: Soroush Abbasi Koohpayegani, Anuj Singh, K L Navaneet, Hadi Jamali-Rad, Hamed Pirsiavash

Abstract: Data augmentation is crucial in training deep models, preventing them from overfitting to limited data. Recent advances in generative AI, e.g., diffusion models, have enabled more sophisticated augmentation techniques that produce data resembling natural images. We introduce GeNIe a novel augmentation method which leverages a latent diffusion model conditioned on a text prompt to merge contrasting… ▽ More Data augmentation is crucial in training deep models, preventing them from overfitting to limited data. Recent advances in generative AI, e.g., diffusion models, have enabled more sophisticated augmentation techniques that produce data resembling natural images. We introduce GeNIe a novel augmentation method which leverages a latent diffusion model conditioned on a text prompt to merge contrasting data points (an image from the source category and a text prompt from the target category) to generate challenging samples. To achieve this, inspired by recent diffusion based image editing techniques, we limit the number of diffusion iterations to ensure the generated image retains low-level and background features from the source image while representing the target category, resulting in a hard negative sample for the source category. We further enhance the proposed approach by finding the appropriate noise level adaptively for each image (coined as GeNIe-Ada) leading to further performance improvement. Our extensive experiments, in both few-shot and long-tail distribution settings, demonstrate the effectiveness of our novel augmentation method and its superior performance over the prior art. Our code is available here: https://github.com/UCDvision/GeNIe △ Less

Submitted 23 March, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

Comments: Our code is available https://github.com/UCDvision/GeNIe

arXiv:2312.00796 [pdf]

Multiple Protein Profiler 1.0 (MPP): A webserver for predicting and visualizing physiochemical properties of proteins at the proteome level

Authors: Gustavo Sganzerla Martinez, Mansi Dutt, Anuj Kumar, David J Kelvin

Abstract: Determining the physicochemical properties of a protein can reveal important insights in their structure, biological functions, stability, and interactions with other molecules. Although tools for computing properties of proteins already existed, we could not find a comprehensive tool that enables the calculations of multiple properties for multiple input proteins on the proteome level at once. Fa… ▽ More Determining the physicochemical properties of a protein can reveal important insights in their structure, biological functions, stability, and interactions with other molecules. Although tools for computing properties of proteins already existed, we could not find a comprehensive tool that enables the calculations of multiple properties for multiple input proteins on the proteome level at once. Facing this limitation, we have developed Multiple Protein Profiler (MPP) 1.0 as an integrated tool that allows the profiling of 12 individual properties of multiple proteins in a significant manner. MPP provides a tabular and graphic visualization of properties of multiple proteins. The tool is freely accessible at https://mproteinprofiler.microbiologyandimmunology.dal.ca/ △ Less

Submitted 17 November, 2023; originally announced December 2023.

arXiv:2312.00038 [pdf, other]

A Posteriori Evaluation of a Physics-Constrained Neural Ordinary Differential Equations Approach Coupled with CFD Solver for Modeling Stiff Chemical Kinetics

Authors: Tadbhagya Kumar, Anuj Kumar, Pinaki Pal

Abstract: The high computational cost associated with solving for detailed chemistry poses a significant challenge for predictive computational fluid dynamics (CFD) simulations of turbulent reacting flows. These models often require solving a system of coupled stiff ordinary differential equations (ODEs). While deep learning techniques have been experimented with to develop faster surrogate models, they oft… ▽ More The high computational cost associated with solving for detailed chemistry poses a significant challenge for predictive computational fluid dynamics (CFD) simulations of turbulent reacting flows. These models often require solving a system of coupled stiff ordinary differential equations (ODEs). While deep learning techniques have been experimented with to develop faster surrogate models, they often fail to integrate reliably with CFD solvers. This instability arises because deep learning methods optimize for training error without ensuring compatibility with ODE solvers, leading to accumulation of errors over time. Recently, NeuralODE-based techniques have offered a promising solution by effectively modeling chemical kinetics. In this study, we extend the NeuralODE framework for stiff chemical kinetics by incorporating mass conservation constraints directly into the loss function during training. This ensures that the total mass and the elemental mass are conserved, a critical requirement for reliable downstream integration with CFD solvers. Proof-of-concept studies are performed with physics-constrained neuralODE (PC-NODE) approach for homogeneous autoignition of hydrogen-air mixture over a range of composition and thermodynamic conditions. Our results demonstrate that this enhancement not only improves the physical consistency with respect to mass conservation criteria but also ensures better robustness. Lastly, a posteriori studies are performed wherein the trained PC-NODE model is coupled with a 3D CFD solver for computing the chemical source terms. PC-NODE is shown to be more accurate relative to the purely data-driven neuralODE approach. Moreover, PC-NODE also exhibits robustness and generalizability to unseen initial conditions from within (interpolative capability) as well as outside (extrapolative capability) the training regime. △ Less

Submitted 4 March, 2024; v1 submitted 22 November, 2023; originally announced December 2023.

arXiv:2311.16766 [pdf, other]

Rescuing referral failures during automated diagnosis of domain-shifted medical images

Authors: Anuj Srivastava, Karm Patel, Pradeep Shenoy, Devarajan Sridharan

Abstract: The success of deep learning models deployed in the real world depends critically on their ability to generalize well across diverse data domains. Here, we address a fundamental challenge with selective classification during automated diagnosis with domain-shifted medical images. In this scenario, models must learn to avoid making predictions when label confidence is low, especially when tested wi… ▽ More The success of deep learning models deployed in the real world depends critically on their ability to generalize well across diverse data domains. Here, we address a fundamental challenge with selective classification during automated diagnosis with domain-shifted medical images. In this scenario, models must learn to avoid making predictions when label confidence is low, especially when tested with samples far removed from the training set (covariate shift). Such uncertain cases are typically referred to the clinician for further analysis and evaluation. Yet, we show that even state-of-the-art domain generalization approaches fail severely during referral when tested on medical images acquired from a different demographic or using a different technology. We examine two benchmark diagnostic medical imaging datasets exhibiting strong covariate shifts: i) diabetic retinopathy prediction with retinal fundus images and ii) multilabel disease prediction with chest X-ray images. We show that predictive uncertainty estimates do not generalize well under covariate shifts leading to non-monotonic referral curves, and severe drops in performance (up to 50%) at high referral rates (>70%). We evaluate novel combinations of robust generalization and post hoc referral approaches, that rescue these failures and achieve significant performance improvements, typically >10%, over baseline methods. Our study identifies a critical challenge with referral in domain-shifted medical images and finds key applications in reliable, automated disease diagnosis. △ Less

Submitted 28 November, 2023; originally announced November 2023.

arXiv:2311.14613 [pdf, other]

Routing and Spectrum Allocation in Broadband Degenerate EPR-Pair Distribution

Authors: Rohan Bali, Ashley Tittelbaugh, Shelbi L. Jenkins, Anuj Agrawal, Jerry Horgan, Marco Ruffini, Daniel Kilper, Boulat A. Bash

Abstract: We investigate resource allocation for quantum entanglement distribution over an optical network. We characterize and model a network architecture that employs a single quasideterministic time-frequency heralded EPR-pair source, and develop a routing scheme for distributing entangled photon pairs over such a network. We focus on fairness in entanglement distribution, and compare both the performan… ▽ More We investigate resource allocation for quantum entanglement distribution over an optical network. We characterize and model a network architecture that employs a single quasideterministic time-frequency heralded EPR-pair source, and develop a routing scheme for distributing entangled photon pairs over such a network. We focus on fairness in entanglement distribution, and compare both the performance of various spectrum allocation schemes as well as their Jain index. △ Less

Submitted 24 November, 2023; originally announced November 2023.

arXiv:2311.04157 [pdf, other]

A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis

Authors: Dipanjyoti Paul, Arpita Chowdhury, Xinqi Xiong, Feng-Ju Chang, David Carlyn, Samuel Stevens, Kaiya L. Provost, Anuj Karpatne, Bryan Carstens, Daniel Rubenstein, Charles Stewart, Tanya Berger-Wolf, Yu Su, Wei-Lun Chao

Abstract: We present a novel usage of Transformers to make image classification interpretable. Unlike mainstream classifiers that wait until the last fully connected layer to incorporate class information to make predictions, we investigate a proactive approach, asking each class to search for itself in an image. We realize this idea via a Transformer encoder-decoder inspired by DEtection TRansformer (DETR)… ▽ More We present a novel usage of Transformers to make image classification interpretable. Unlike mainstream classifiers that wait until the last fully connected layer to incorporate class information to make predictions, we investigate a proactive approach, asking each class to search for itself in an image. We realize this idea via a Transformer encoder-decoder inspired by DEtection TRansformer (DETR). We learn "class-specific" queries (one for each class) as input to the decoder, enabling each class to localize its patterns in an image via cross-attention. We name our approach INterpretable TRansformer (INTR), which is fairly easy to implement and exhibits several compelling properties. We show that INTR intrinsically encourages each class to attend distinctively; the cross-attention weights thus provide a faithful interpretation of the prediction. Interestingly, via "multi-head" cross-attention, INTR could identify different "attributes" of a class, making it particularly suitable for fine-grained classification and analysis, which we demonstrate on eight datasets. Our code and pre-trained models are publicly accessible at the Imageomics Institute GitHub site: https://github.com/Imageomics/INTR. △ Less

Submitted 14 June, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

Comments: Accepted to International Conference on Learning Representations 2024 (ICLR 2024)

arXiv:2310.09441 [pdf, other]

MEMTRACK: A Deep Learning-Based Approach to Microrobot Tracking in Dense and Low-Contrast Environments

Authors: Medha Sawhney, Bhas Karmarkar, Eric J. Leaman, Arka Daw, Anuj Karpatne, Bahareh Behkam

Abstract: Tracking microrobots is challenging, considering their minute size and high speed. As the field progresses towards develo** microrobots for biomedical applications and conducting mechanistic studies in physiologically relevant media (e.g., collagen), this challenge is exacerbated by the dense surrounding environments with feature size and shape comparable to microrobots. Herein, we report Motion… ▽ More Tracking microrobots is challenging, considering their minute size and high speed. As the field progresses towards develo** microrobots for biomedical applications and conducting mechanistic studies in physiologically relevant media (e.g., collagen), this challenge is exacerbated by the dense surrounding environments with feature size and shape comparable to microrobots. Herein, we report Motion Enhanced Multi-level Tracker (MEMTrack), a robust pipeline for detecting and tracking microrobots using synthetic motion features, deep learning-based object detection, and a modified Simple Online and Real-time Tracking (SORT) algorithm with interpolation for tracking. Our object detection approach combines different models based on the object's motion pattern. We trained and validated our model using bacterial micro-motors in collagen (tissue phantom) and tested it in collagen and aqueous media. We demonstrate that MEMTrack accurately tracks even the most challenging bacteria missed by skilled human annotators, achieving precision and recall of 77% and 48% in collagen and 94% and 35% in liquid media, respectively. Moreover, we show that MEMTrack can quantify average bacteria speed with no statistically significant difference from the laboriously-produced manual tracking data. MEMTrack represents a significant contribution to microrobot localization and tracking, and opens the potential for vision-based deep learning approaches to microrobot control in dense and low-contrast settings. All source code for training and testing MEMTrack and reproducing the results of the paper have been made publicly available https://github.com/sawhney-medha/MEMTrack. △ Less

Submitted 13 October, 2023; originally announced October 2023.

arXiv:2310.02351 [pdf]

Investigating Speed Deviation Patterns During Glucose Episodes: A Quantile Regression Approach

Authors: Aparna Joshi, Jennifer Merickel, Cyrus V. Desouza, Matthew Rizzo, Pujitha Gunaratne, Anuj Sharma

Abstract: Given the growing prevalence of diabetes, there has been significant interest in determining how diabetes affects instrumental daily functions, like driving. Complication of glucose control in diabetes includes hypoglycemic and hyperglycemic episodes, which may impair cognitive and psychomotor functions needed for safe driving. The goal of this paper was to determine patterns of diabetes speed beh… ▽ More Given the growing prevalence of diabetes, there has been significant interest in determining how diabetes affects instrumental daily functions, like driving. Complication of glucose control in diabetes includes hypoglycemic and hyperglycemic episodes, which may impair cognitive and psychomotor functions needed for safe driving. The goal of this paper was to determine patterns of diabetes speed behavior during acute glucose to drivers with diabetes who were euglycemic or control drivers without diabetes in a naturalistic driving environment. By employing distribution-based analytic methods which capture distribution patterns, our study advances prior literature that has focused on conventional approach of average speed to explore speed deviation patterns. △ Less

Submitted 3 October, 2023; originally announced October 2023.

Comments: 6 pages, 2 figures, 5 Tables, Accepted and Presented at IEEE ITSC 2023 Conference in Bilbao Spain

arXiv:2309.16058 [pdf, other]

AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model

Authors: Seungwhan Moon, Andrea Madotto, Zhaojiang Lin, Tushar Nagarajan, Matt Smith, Shashank Jain, Chun-Fu Yeh, Prakash Murugesan, Peyman Heidari, Yue Liu, Kavya Srinet, Babak Damavandi, Anuj Kumar

Abstract: We present Any-Modality Augmented Language Model (AnyMAL), a unified model that reasons over diverse input modality signals (i.e. text, image, video, audio, IMU motion sensor), and generates textual responses. AnyMAL inherits the powerful text-based reasoning abilities of the state-of-the-art LLMs including LLaMA-2 (70B), and converts modality-specific signals to the joint textual space through a… ▽ More We present Any-Modality Augmented Language Model (AnyMAL), a unified model that reasons over diverse input modality signals (i.e. text, image, video, audio, IMU motion sensor), and generates textual responses. AnyMAL inherits the powerful text-based reasoning abilities of the state-of-the-art LLMs including LLaMA-2 (70B), and converts modality-specific signals to the joint textual space through a pre-trained aligner module. To further strengthen the multimodal LLM's capabilities, we fine-tune the model with a multimodal instruction set manually collected to cover diverse topics and tasks beyond simple QAs. We conduct comprehensive empirical analysis comprising both human and automatic evaluations, and demonstrate state-of-the-art performance on various multimodal tasks. △ Less

Submitted 27 September, 2023; originally announced September 2023.

arXiv:2309.14601 [pdf, other]

Neuro-Visualizer: An Auto-encoder-based Loss Landscape Visualization Method

Authors: Mohannad Elhamod, Anuj Karpatne

Abstract: In recent years, there has been a growing interest in visualizing the loss landscape of neural networks. Linear landscape visualization methods, such as principal component analysis, have become widely used as they intuitively help researchers study neural networks and their training process. However, these linear methods suffer from limitations and drawbacks due to their lack of flexibility and l… ▽ More In recent years, there has been a growing interest in visualizing the loss landscape of neural networks. Linear landscape visualization methods, such as principal component analysis, have become widely used as they intuitively help researchers study neural networks and their training process. However, these linear methods suffer from limitations and drawbacks due to their lack of flexibility and low fidelity at representing the high dimensional landscape. In this paper, we present a novel auto-encoder-based non-linear landscape visualization method called Neuro-Visualizer that addresses these shortcoming and provides useful insights about neural network loss landscapes. To demonstrate its potential, we run experiments on a variety of problems in two separate applications of knowledge-guided machine learning (KGML). Our findings show that Neuro-Visualizer outperforms other linear and non-linear baselines and helps corroborate, and sometime challenge, claims proposed by machine learning community. All code and data used in the experiments of this paper are available at an anonymous link https://anonymous.4open.science/r/NeuroVisualizer-FDD6 △ Less

Submitted 25 September, 2023; originally announced September 2023.

arXiv:2309.09595 [pdf, ps, other]

$\mathbb{F}$-valued trace of a finite-dimensional commutative $\mathbb{F}$-algebra

Authors: Anuj Kr Bhagat, Ritumoni Sarma

Abstract: A non-zero $\mathbb{F}$-valued $\mathbb{F}$-linear map on a finite dimensional $\mathbb{F}$-algebra is called an $\mathbb{F}$-valued trace if its kernel does not contain any non-zero ideals. However, given an $\mathbb{F}$-algebra such a map may not always exist. We find an infinite class of finite-dimensional commutative $\mathbb{F}$-algebras which admit an $\mathbb{F}$-valued trace. In fact, in t… ▽ More A non-zero $\mathbb{F}$-valued $\mathbb{F}$-linear map on a finite dimensional $\mathbb{F}$-algebra is called an $\mathbb{F}$-valued trace if its kernel does not contain any non-zero ideals. However, given an $\mathbb{F}$-algebra such a map may not always exist. We find an infinite class of finite-dimensional commutative $\mathbb{F}$-algebras which admit an $\mathbb{F}$-valued trace. In fact, in these cases, we explicitly construct a trace map. The existence of an $\mathbb{F}$-valued trace on a finite dimensional commutative $\mathbb{F}$-algebra induces a non-degenerate bilinear form on the $\mathbb{F}$-algebra which may be helpful both theoretically and computationally. In this article, we suggest a couple of applications of an $\mathbb{F}$-valued trace map of an $\mathbb{F}$-algebra to algebraic coding theory. △ Less

Submitted 19 September, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

arXiv:2309.07430 [pdf, other]

doi 10.1038/s41591-024-02855-5

Adapted Large Language Models Can Outperform Medical Experts in Clinical Text Summarization

Authors: Dave Van Veen, Cara Van Uden, Louis Blankemeier, Jean-Benoit Delbrouck, Asad Aali, Christian Bluethgen, Anuj Pareek, Malgorzata Polacin, Eduardo Pontes Reis, Anna Seehofnerova, Nidhi Rohatgi, Poonam Hosamani, William Collins, Neera Ahuja, Curtis P. Langlotz, Jason Hom, Sergios Gatidis, John Pauly, Akshay S. Chaudhari

Abstract: Analyzing vast textual data and summarizing key information from electronic health records imposes a substantial burden on how clinicians allocate their time. Although large language models (LLMs) have shown promise in natural language processing (NLP), their effectiveness on a diverse range of clinical summarization tasks remains unproven. In this study, we apply adaptation methods to eight LLMs,… ▽ More Analyzing vast textual data and summarizing key information from electronic health records imposes a substantial burden on how clinicians allocate their time. Although large language models (LLMs) have shown promise in natural language processing (NLP), their effectiveness on a diverse range of clinical summarization tasks remains unproven. In this study, we apply adaptation methods to eight LLMs, spanning four distinct clinical summarization tasks: radiology reports, patient questions, progress notes, and doctor-patient dialogue. Quantitative assessments with syntactic, semantic, and conceptual NLP metrics reveal trade-offs between models and adaptation methods. A clinical reader study with ten physicians evaluates summary completeness, correctness, and conciseness; in a majority of cases, summaries from our best adapted LLMs are either equivalent (45%) or superior (36%) compared to summaries from medical experts. The ensuing safety analysis highlights challenges faced by both LLMs and medical experts, as we connect errors to potential medical harm and categorize types of fabricated information. Our research provides evidence of LLMs outperforming medical experts in clinical text summarization across multiple tasks. This suggests that integrating LLMs into clinical workflows could alleviate documentation burden, allowing clinicians to focus more on patient care. △ Less

Submitted 11 April, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

Comments: 27 pages, 19 figures

Journal ref: Nature Medicine, 2024

arXiv:2309.04892 [pdf, ps, other]

Descriptive complexity of controllable graphs

Authors: Aida Abiad, Anuj Dawar, Octavio Zapata

Abstract: Let $G$ be a graph on $n$ vertices with adjacency matrix $A$, and let $\mathbf{1}$ be the all-ones vector. We call $G$ controllable if the set of vectors $\mathbf{1}, A\mathbf{1}, \dots, A^{n-1}\mathbf{1}$ spans the whole space $\mathbb{R}^n$. We characterize the isomorphism problem of controllable graphs in terms of other combinatorial, geometric and logical problems. We also describe a polynomia… ▽ More Let $G$ be a graph on $n$ vertices with adjacency matrix $A$, and let $\mathbf{1}$ be the all-ones vector. We call $G$ controllable if the set of vectors $\mathbf{1}, A\mathbf{1}, \dots, A^{n-1}\mathbf{1}$ spans the whole space $\mathbb{R}^n$. We characterize the isomorphism problem of controllable graphs in terms of other combinatorial, geometric and logical problems. We also describe a polynomial time algorithm for graph isomorphism that works for almost all graphs. △ Less

Submitted 9 September, 2023; originally announced September 2023.

Comments: 14 pages

Showing 1–50 of 311 results for author: Anuj