-
SubLock: Sub-Circuit Replacement based Input Dependent Key-based Logic Locking for Robust IP Protection
Authors:
Vijaypal Singh Rathor,
Munesh Singh,
Kshira Sagar Sahoo,
Saraju P. Mohanty
Abstract:
Intellectual Property (IP) piracy, overbuilding, reverse engineering, and hardware Trojans are serious security concerns during integrated circuit (IC) development. Logic locking has proven to be a solid defence for mitigating these threats. The existing logic locking techniques are vulnerable to SAT-based attacks. Although several SAT-resistant logic locking methods have been reported, they require significant overhead. This paper proposes a novel input dependent key-based logic locking (IDKLL) that effectively prevents SAT-based attacks with low overhead. We first introduce a novel idea of IDKLL, where a design is locked such that it functions correctly for all input patterns only when their corresponding valid key sequences are applied. In contrast to conventional logic locking, the proposed IDKLL method uses multiple key sequences (instead of a single key sequence) as a valid key that provides correct functionality for all inputs. Further, we propose a sub-circuit replacement based IDKLL approach called SubLock that locks the design by replacing the original sub-circuitry with the corresponding IDKLL based locked circuit to prevent SAT attack with low overhead. The experimental evaluation on ISCAS benchmarks shows that the proposed SubLock mitigates the SAT attack with high security and reduced overhead over the well-known existing methods.
Submitted 27 June, 2024;
originally announced June 2024.
-
ToSA: Token Selective Attention for Efficient Vision Transformers
Authors:
Manish Kumar Singh,
Rajeev Yasarla,
Hong Cai,
Mingu Lee,
Fatih Porikli
Abstract:
In this paper, we propose a novel token selective attention approach, ToSA, which can identify tokens that need to be attended as well as those that can skip a transformer layer. More specifically, a token selector parses the current attention maps and predicts the attention maps for the next layer, which are then used to select the important tokens that should participate in the attention operation. The remaining tokens simply bypass the next layer and are concatenated with the attended ones to re-form a complete set of tokens. In this way, we reduce the quadratic computation and memory costs as fewer tokens participate in self-attention while maintaining the features for all the image patches throughout the network, which allows it to be used for dense prediction tasks. Our experiments show that by applying ToSA, we can significantly reduce computation costs while maintaining accuracy on the ImageNet classification benchmark. Furthermore, we evaluate on the dense prediction task of monocular depth estimation on NYU Depth V2, and show that we can achieve similar depth prediction accuracy using a considerably lighter backbone with ToSA.
Submitted 13 June, 2024;
originally announced June 2024.
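The token routing described in the ToSA abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function `selective_layer`, the `scores` input (standing in for the token selector's predicted importance), and the toy `double` layer are all hypothetical, and writing attended tokens back to their original positions is an assumption about how the complete token set is re-formed.

```python
from typing import Callable, List, Sequence

Token = List[float]


def selective_layer(
    tokens: Sequence[Token],
    scores: Sequence[float],
    keep: int,
    layer: Callable[[List[Token]], List[Token]],
) -> List[Token]:
    """Run `layer` on the `keep` highest-scoring tokens; the rest bypass it."""
    # Rank token indices by predicted importance (hypothetical selector output).
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    selected = sorted(ranked[:keep])
    # Only the selected tokens take part in the (expensive) layer.
    attended = layer([list(tokens[i]) for i in selected])
    # Re-form the complete token set: bypassed tokens are copied unchanged.
    out = [list(t) for t in tokens]
    for idx, new_tok in zip(selected, attended):
        out[idx] = new_tok
    return out


def double(batch: List[Token]) -> List[Token]:
    """Hypothetical stand-in for a transformer layer."""
    return [[2.0 * x for x in tok] for tok in batch]


tokens = [[1.0], [2.0], [3.0], [4.0]]
scores = [0.1, 0.9, 0.3, 0.8]
result = selective_layer(tokens, scores, keep=2, layer=double)
```

Because every token is still present in the output, a dense prediction head can consume one feature per image patch even though only `keep` tokens were processed.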
-
ComFeAT: Combination of Neural and Spectral Features for Improved Depression Detection
Authors:
Orchid Chetia Phukan,
Sarthak Jain,
Shubham Singh,
Muskaan Singh,
Arun Balaji Buduru,
Rajesh Sharma
Abstract:
In this work, we focus on the detection of depression through speech analysis. Previous research has widely explored features extracted from pre-trained models (PTMs) primarily trained for paralinguistic tasks. Although these features have led to sufficient advances in speech-based depression detection, their performance declines in real-world settings. To address this, in this paper, we introduce ComFeAT, an application that employs a CNN model trained on a combination of features extracted from PTMs, a.k.a. neural features, and spectral features to enhance depression detection. Spectral features are robust to domain variations, but they do not perform as well as neural features. Surprisingly, combining them shows complementary behavior and improves over both neural and spectral features individually. The proposed method also improves over previous state-of-the-art (SOTA) works on the E-DAIC benchmark.
Submitted 10 June, 2024;
originally announced June 2024.
-
I-SIRch: AI-Powered Concept Annotation Tool For Equitable Extraction And Analysis Of Safety Insights From Maternity Investigations
Authors:
Mohit Kumar Singh,
Georgina Cosma,
Patrick Waterson,
Jonathan Back,
Gyuchan Thomas Jun
Abstract:
Maternity care is a complex system involving treatments and interactions between patients, providers, and the care environment. To improve patient safety and outcomes, understanding the human factors (e.g. individuals' decisions, local facilities) influencing healthcare delivery is crucial. However, most current tools for analysing healthcare data focus only on biomedical concepts (e.g. health conditions, procedures and tests), overlooking the importance of human factors. We developed a new approach called I-SIRch, using artificial intelligence to automatically identify and label human factors concepts in maternity healthcare investigation reports describing adverse maternity incidents produced by England's Healthcare Safety Investigation Branch (HSIB). These incident investigation reports aim to identify opportunities for learning and improving maternal safety across the entire healthcare system. I-SIRch was trained using real data and tested on both real and simulated data to evaluate its performance in identifying human factors concepts. When applied to real reports, the model achieved a high level of accuracy, correctly identifying relevant concepts in 90% of the sentences from 97 reports. Applying I-SIRch to analyse these reports revealed that certain human factors disproportionately affected mothers from different ethnic groups. Our work demonstrates the potential of using automated tools to identify human factors concepts in maternity incident investigation reports, rather than focusing solely on biomedical concepts. This approach opens up new possibilities for understanding the complex interplay between social, technical, and organisational factors influencing maternal safety and population health outcomes. By taking a more comprehensive view of maternal healthcare delivery, we can develop targeted interventions to address disparities and improve maternal outcomes.
Submitted 8 June, 2024;
originally announced June 2024.
-
SilentCipher: Deep Audio Watermarking
Authors:
Mayank Kumar Singh,
Naoya Takahashi,
Weihsiang Liao,
Yuki Mitsufuji
Abstract:
In the realm of audio watermarking, it is challenging to simultaneously encode imperceptible messages while enhancing the message capacity and robustness. Although recent advancements in deep learning-based methods bolster the message capacity and robustness over traditional methods, the encoded messages introduce audible artefacts that restrict their usage in professional settings. In this study, we introduce three key innovations. Firstly, our work is the first deep learning-based model to integrate psychoacoustic model based thresholding to achieve imperceptible watermarks. Secondly, we introduce pseudo-differentiable compression layers, enhancing the robustness of our watermarking algorithm. Lastly, we introduce a method to eliminate the need for perceptual losses, enabling us to achieve SOTA in both robustness and imperceptible watermarking. Our contributions lead us to SilentCipher, a model enabling users to encode messages within audio signals sampled at 44.1 kHz.
Submitted 6 June, 2024;
originally announced June 2024.
-
TAGMol: Target-Aware Gradient-guided Molecule Generation
Authors:
Vineeth Dorna,
D. Subhalingam,
Keshav Kolluru,
Shreshth Tuli,
Mrityunjay Singh,
Saurabh Singal,
N. M. Anoop Krishnan,
Sayan Ranu
Abstract:
3D generative models have shown significant promise in structure-based drug design (SBDD), particularly in discovering ligands tailored to specific target binding sites. Existing algorithms often focus primarily on ligand-target binding, characterized by binding affinity. Moreover, models trained solely on target-ligand distribution may fall short in addressing the broader objectives of drug discovery, such as the development of novel ligands with desired properties like drug-likeness and synthesizability, underscoring the multifaceted nature of the drug design process. To overcome these challenges, we decouple the problem into molecular generation and property prediction. The latter synergistically guides the diffusion sampling process, facilitating guided diffusion and resulting in the creation of meaningful molecules with the desired properties. We call this guided molecular generation process TAGMol. Through experiments on benchmark datasets, TAGMol demonstrates superior performance compared to state-of-the-art baselines, achieving a 22% improvement in average Vina Score and yielding favorable outcomes in essential auxiliary properties. This establishes TAGMol as a comprehensive framework for drug generation.
Submitted 3 June, 2024;
originally announced June 2024.
-
Reasoning about concepts with LLMs: Inconsistencies abound
Authors:
Rosario Uceda-Sosa,
Karthikeyan Natesan Ramamurthy,
Maria Chang,
Moninder Singh
Abstract:
The ability to summarize and organize knowledge into abstract concepts is key to learning and reasoning. Many industrial applications rely on the consistent and systematic use of concepts, especially when dealing with decision-critical knowledge. However, we demonstrate that, when methodically questioned, large language models (LLMs) often display significant inconsistencies in their knowledge. Computationally, the basic aspects of the conceptualization of a given domain can be represented as Is-A hierarchies in a knowledge graph (KG) or ontology, together with a few properties or axioms that enable straightforward reasoning. We show that even simple ontologies can be used to reveal conceptual inconsistencies across several LLMs. We also propose strategies that domain experts can use to evaluate and improve the coverage of key domain concepts in LLMs of various sizes. In particular, we have been able to significantly enhance the performance of LLMs of various sizes with openly available weights using simple knowledge-graph (KG) based prompting strategies.
Submitted 30 May, 2024;
originally announced May 2024.
-
CloudSense: A Model for Cloud Type Identification using Machine Learning from Radar data
Authors:
Mehzooz Nizar,
Jha K. Ambuj,
Manmeet Singh,
Vaisakh S. B,
G. Pandithurai
Abstract:
Knowledge of the type of precipitating cloud is crucial for radar-based quantitative estimates of precipitation. We propose a novel model called CloudSense which uses machine learning to accurately identify the type of precipitating clouds over the complex terrain locations in the Western Ghats (WGs) of India. CloudSense uses vertical reflectivity profiles collected during July-August 2018 from an X-band radar to classify clouds into four categories, namely stratiform, mixed stratiform-convective, convective and shallow clouds. The machine learning (ML) model used in CloudSense was trained using a dataset balanced by Synthetic Minority Oversampling Technique (SMOTE), with features selected based on physical characteristics relevant to different cloud types. Among the various ML models evaluated, Light Gradient Boosting Machine (LightGBM) demonstrates superior performance in classifying cloud types, with a BAC of 0.8 and F1-Score of 0.82. CloudSense generated results are also compared against conventional radar algorithms, and we find that CloudSense performs better than radar algorithms. For 200 samples tested, the radar algorithm achieved a BAC of 0.69 and F1-Score of 0.68, whereas CloudSense achieved a BAC and F1-Score of 0.77. Our results show that an ML-based approach can provide more accurate cloud detection and classification, which would be useful to improve precipitation estimates over the complex terrain of the WGs.
Submitted 8 May, 2024;
originally announced May 2024.
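The CloudSense abstract reports results as BAC and F1-Score. As a rough illustration of what these metrics measure, the sketch below computes balanced accuracy (the mean of per-class recall) and a macro-averaged F1 on made-up labels. The macro averaging is an assumption, since the abstract does not state which F1 variant was used, and the toy labels are not data from the paper.

```python
from collections import Counter
from typing import Sequence


def balanced_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean of per-class recall, so rare classes count as much as common ones."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / support[c] for c in support) / len(support)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    total = 0.0
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        total += (2 * tp / denom) if denom else 0.0
    return total / len(labels)


# Toy, imbalanced labels for illustration only.
y_true = ["stratiform"] * 4 + ["convective"] * 2
y_pred = ["stratiform", "stratiform", "stratiform", "convective",
          "convective", "stratiform"]

bac = balanced_accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

On imbalanced cloud-type data, plain accuracy can look good while a minority class is mostly misclassified; averaging recall per class avoids that.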
-
SUKHSANDESH: An Avatar Therapeutic Question Answering Platform for Sexual Education in Rural India
Authors:
Salam Michael Singh,
Shubhmoy Kumar Garg,
Amitesh Misra,
Aaditeshwar Seth,
Tanmoy Chakraborty
Abstract:
Sexual education aims to foster a healthy lifestyle in terms of emotional, mental and social well-being. In countries like India, where adolescents form the largest demographic group, they face significant vulnerabilities concerning sexual health. Unfortunately, sexual education is often stigmatized, creating barriers to providing essential counseling and information to this at-risk population. Consequently, issues such as early pregnancy, unsafe abortions, sexually transmitted infections, and sexual violence become prevalent. Our current proposal aims to provide a safe and trustworthy platform for sexual education to the vulnerable rural Indian population, thereby fostering the healthy and overall growth of the nation. In this regard, we strive towards designing SUKHSANDESH, a multi-staged AI-based Question Answering platform for sexual education tailored to rural India, adhering to safety guardrails and regional language support. By utilizing information retrieval techniques and large language models, SUKHSANDESH will deliver effective responses to user queries. We also propose to anonymise the dataset to mitigate safety risks and set AI guardrails against any harmful or unwanted response generation. Moreover, an innovative feature of our proposal involves integrating "avatar therapy" with SUKHSANDESH. This feature will convert AI-generated responses into real-time audio delivered by an animated avatar speaking regional Indian languages. This approach aims to foster empathy and connection, which is particularly beneficial for individuals with limited literacy skills. Partnering with Gram Vaani, an industry leader, we will deploy SUKHSANDESH to address sexual education needs in rural India.
Submitted 3 May, 2024;
originally announced May 2024.
-
Semantically Aligned Question and Code Generation for Automated Insight Generation
Authors:
Ananya Singha,
Bhavya Chopra,
Anirudh Khatry,
Sumit Gulwani,
Austin Z. Henley,
Vu Le,
Chris Parnin,
Mukul Singh,
Gust Verbruggen
Abstract:
Automated insight generation is a common tactic for helping knowledge workers, such as data scientists, to quickly understand the potential value of new and unfamiliar data. Unfortunately, automated insights produced by large language models can generate code that does not correctly correspond (or align) to the insight. In this paper, we leverage the semantic knowledge of large language models to generate targeted and insightful questions about data and the corresponding code to answer those questions. Then, through an empirical study on data from Open-WikiTable, we show that embeddings can be effectively used for filtering out semantically unaligned pairs of question and code. Additionally, we found that generating questions and code together yields more diverse questions.
Submitted 21 March, 2024;
originally announced May 2024.
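The embedding-based filtering mentioned in the abstract above can be pictured with a small sketch. The abstract does not say how embeddings are compared, so the use of cosine similarity, the `threshold` value, and the toy two-dimensional vectors are all assumptions made for illustration; in practice the vectors would come from an embedding model.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def filter_aligned(
    pairs: List[Tuple[str, Sequence[float], Sequence[float]]],
    threshold: float,
) -> List[str]:
    """Keep pair ids whose question and code embeddings are similar enough."""
    return [pid for pid, q_emb, c_emb in pairs if cosine(q_emb, c_emb) >= threshold]


# Hypothetical (id, question embedding, code embedding) triples.
pairs = [
    ("p1", [1.0, 0.0], [0.9, 0.1]),  # nearly parallel: treated as aligned
    ("p2", [1.0, 0.0], [0.0, 1.0]),  # orthogonal: treated as unaligned
]
kept = filter_aligned(pairs, threshold=0.8)
```

The threshold trades recall for precision: raising it discards more borderline pairs.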
-
3D Face Morphing Attack Generation using Non-Rigid Registration
Authors:
Jag Mohan Singh,
Raghavendra Ramachandra
Abstract:
Face Recognition Systems (FRS) are widely used in commercial environments, such as e-commerce and e-banking, owing to their high accuracy in real-world conditions. However, these systems are vulnerable to facial morphing attacks, which are generated by blending face color images of different subjects. This paper presents a new method for generating 3D face morphs from two bona fide point clouds. The proposed method first selects bona fide point clouds with neutral expressions. The two input point clouds were then registered using a Bayesian Coherent Point Drift (BCPD) without optimization, and the geometry and color of the registered point clouds were averaged to generate a face morphing point cloud. The proposed method generates 388 face-morphing point clouds from 200 bona fide subjects. The effectiveness of the method was demonstrated through extensive vulnerability experiments, achieving a Generalized Morphing Attack Potential (G-MAP) of 97.93%, which is superior to the existing state-of-the-art (SOTA) with a G-MAP of 81.61%.
Submitted 24 April, 2024;
originally announced April 2024.
-
VoxAtnNet: A 3D Point Clouds Convolutional Neural Network for Generalizable Face Presentation Attack Detection
Authors:
Raghavendra Ramachandra,
Narayan Vetrekar,
Sushma Venkatesh,
Savita Nageshker,
Jag Mohan Singh,
R. S. Gad
Abstract:
Facial biometrics are an essential component of smartphones to ensure reliable and trustworthy authentication. However, face biometric systems are vulnerable to Presentation Attacks (PAs), and the availability of more sophisticated presentation attack instruments such as 3D silicone face masks will allow attackers to deceive face recognition systems easily. In this work, we propose a novel Presentation Attack Detection (PAD) algorithm based on 3D point clouds captured using the frontal camera of a smartphone to detect presentation attacks. The proposed PAD algorithm, VoxAtnNet, processes 3D point clouds to obtain voxelization to preserve the spatial structure. Then, the voxelized 3D samples were trained using the novel convolutional attention network to detect PAs on the smartphone. Extensive experiments were carried out on the newly constructed 3D face point cloud dataset comprising bona fide and two different 3D PAIs (3D silicone face mask and wrap photo mask), resulting in 3480 samples. The performance of the proposed method was compared with existing methods to benchmark the detection performance using three different evaluation protocols. The experimental results demonstrate the improved performance of the proposed method in detecting both known and unknown face presentation attacks.
Submitted 19 April, 2024;
originally announced April 2024.
-
Extracting Norms from Contracts Via ChatGPT: Opportunities and Challenges
Authors:
Amanul Haque,
Munindar P. Singh
Abstract:
We investigate the effectiveness of ChatGPT in extracting norms from contracts. Norms provide a natural way to engineer multiagent systems by capturing how to govern the interactions between two or more autonomous parties. We extract norms of commitment, prohibition, authorization, and power, along with associated norm elements (the parties involved, antecedents, and consequents) from contracts. Our investigation reveals ChatGPT's effectiveness and limitations in norm extraction from contracts. ChatGPT demonstrates promising performance in norm extraction without requiring training or fine-tuning, thus obviating the need for annotated data, which is not generally available in this domain. However, we found some limitations of ChatGPT in extracting these norms that lead to incorrect norm extractions. The limitations include oversight of crucial details, hallucination, incorrect parsing of conjunctions, and empty norm elements. Enhanced norm extraction from contracts can foster the development of more transparent and trustworthy formal agent interaction specifications, thereby contributing to the improvement of multiagent systems.
Submitted 2 April, 2024;
originally announced April 2024.
-
How Robust are the Tabular QA Models for Scientific Tables? A Study using Customized Dataset
Authors:
Akash Ghosh,
B Venkata Sahith,
Niloy Ganguly,
Pawan Goyal,
Mayank Singh
Abstract:
Question-answering (QA) on hybrid scientific tabular and textual data deals with scientific information, and relies on complex numerical reasoning. In recent years, while tabular QA has seen rapid progress, understanding their robustness on scientific information is lacking due to absence of any benchmark dataset. To investigate the robustness of the existing state-of-the-art QA models on scientific hybrid tabular data, we propose a new dataset, "SciTabQA", consisting of 822 question-answer pairs from scientific tables and their descriptions. With the help of this dataset, we assess the state-of-the-art Tabular QA models based on their ability (i) to use heterogeneous information requiring both structured data (table) and unstructured data (text) and (ii) to perform complex scientific reasoning tasks. In essence, we check the capability of the models to interpret scientific tables and text. Our experiments show that "SciTabQA" is an innovative dataset to study question-answering over scientific heterogeneous data. We benchmark three state-of-the-art Tabular QA models, and find that the best F1 score is only 0.462.
Submitted 30 March, 2024;
originally announced April 2024.
-
Information Security and Privacy in the Digital World: Some Selected Topics
Authors:
Jaydip Sen,
Joceli Mayer,
Subhasis Dasgupta,
Subrata Nandi,
Srinivasan Krishnaswamy,
Pinaki Mitra,
Mahendra Pratap Singh,
Naga Prasanthi Kundeti,
Chandra Sekhara Rao MVP,
Sudha Sree Chekuri,
Seshu Babu Pallapothu,
Preethi Nanjundan,
Jossy P. George,
Abdelhadi El Allahi,
Ilham Morino,
Salma AIT Oussous,
Siham Beloualid,
Ahmed Tamtaoui,
Abderrahim Bajit
Abstract:
In the era of generative artificial intelligence and the Internet of Things, while there is explosive growth in the volume of data and the associated need for processing, analysis, and storage, several new challenges are faced in identifying spurious and fake information and protecting the privacy of sensitive data. This has led to an increasing demand for more robust and resilient schemes for authentication, integrity protection, encryption, non-repudiation, and privacy-preservation of data. The chapters in this book present some of the state-of-the-art research works in the field of cryptography and security in computing and communications.
Submitted 29 March, 2024;
originally announced April 2024.
-
Review Ecosystems to access Educational XR Experiences: a Scoping Review
Authors:
Shaun Bangay,
Adam P. A. Cardilini,
Sophie McKenzie,
Maria Nicholas,
Manjeet Singh
Abstract:
Educators, developers, and other stakeholders face challenges when creating, adapting, and utilizing virtual and augmented reality (XR) experiences for teaching curriculum topics. User created reviews of these applications provide important information about their relevance and effectiveness in supporting achievement of educational outcomes. To make these reviews accessible, relevant, and useful, they must be readily available and presented in a format that supports decision-making by educators. This paper identifies best practices for developing a new review ecosystem by analyzing existing approaches to providing reviews of interactive experiences. It focuses on the form and format of these reviews, as well as the mechanisms for sharing information about experiences and identifying which ones are most effective. The paper also examines the incentives that drive review creation and maintenance, ensuring that new experiences receive attention from reviewers and that relevant information is updated when necessary. The strategies and opportunities for developing an educational XR (eduXR) review ecosystem include methods for measuring properties such as quality metrics, engaging a broad range of stakeholders in the review process, and structuring the system as a closed loop managed by feedback and incentive structures to ensure stability and productivity. Computing educators are well-positioned to lead the development of these review ecosystems, which can relate XR experiences to the potential opportunities for teaching and learning that they offer.
Submitted 25 March, 2024;
originally announced March 2024.
-
FutureDepth: Learning to Predict the Future Improves Video Depth Estimation
Authors:
Rajeev Yasarla,
Manish Kumar Singh,
Hong Cai,
Yunxiao Shi,
Jisoo Jeong,
Yinhao Zhu,
Shizhong Han,
Risheek Garrepalli,
Fatih Porikli
Abstract:
In this paper, we propose a novel video depth estimation approach, FutureDepth, which enables the model to implicitly leverage multi-frame and motion cues to improve depth estimation by making it learn to predict the future at training. More specifically, we propose a future prediction network, F-Net, which takes the features of multiple consecutive frames and is trained to predict multi-frame features one time step ahead iteratively. In this way, F-Net learns the underlying motion and correspondence information, and we incorporate its features into the depth decoding process. Additionally, to enrich the learning of multi-frame correspondence cues, we further leverage a reconstruction network, R-Net, which is trained via adaptively masked auto-encoding of multi-frame feature volumes. At inference time, both F-Net and R-Net are used to produce queries to work with the depth decoder, as well as a final refinement network. Through extensive experiments on several benchmarks, i.e., NYUDv2, KITTI, DDAD, and Sintel, which cover indoor, driving, and open-domain scenarios, we show that FutureDepth significantly improves upon baseline models, outperforms existing video depth estimation methods, and sets new state-of-the-art (SOTA) accuracy. Furthermore, FutureDepth is more efficient than existing SOTA video depth estimation models and has similar latencies when compared to monocular models.
Submitted 19 March, 2024;
originally announced March 2024.
-
DeCoTR: Enhancing Depth Completion with 2D and 3D Attentions
Authors:
Yunxiao Shi,
Manish Kumar Singh,
Hong Cai,
Fatih Porikli
Abstract:
In this paper, we introduce a novel approach that harnesses both 2D and 3D attentions to enable highly accurate depth completion without requiring iterative spatial propagations. Specifically, we first enhance a baseline convolutional depth completion model by applying attention to 2D features in the bottleneck and skip connections. This effectively improves the performance of this simple network and sets it on par with the latest, complex transformer-based models. Leveraging the initial depths and features from this network, we uplift the 2D features to form a 3D point cloud and construct a 3D point transformer to process it, allowing the model to explicitly learn and exploit 3D geometric features. In addition, we propose normalization techniques to process the point cloud, which improves learning and leads to better accuracy than directly using point transformers off the shelf. Furthermore, we incorporate global attention on downsampled point cloud features, which enables long-range context while still being computationally feasible. We evaluate our method, DeCoTR, on established depth completion benchmarks, including NYU Depth V2 and KITTI, showcasing that it sets new state-of-the-art performance. We further conduct zero-shot evaluations on ScanNet and DDAD benchmarks and demonstrate that DeCoTR has superior generalizability compared to existing approaches.
Submitted 18 March, 2024;
originally announced March 2024.
-
Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations
Authors:
Swapnaja Achintalwar,
Ioana Baldini,
Djallel Bouneffouf,
Joan Byamugisha,
Maria Chang,
Pierre Dognin,
Eitan Farchi,
Ndivhuwo Makondo,
Aleksandra Mojsilovic,
Manish Nagireddy,
Karthikeyan Natesan Ramamurthy,
Inkit Padhi,
Orna Raz,
Jesus Rios,
Prasanna Sattigeri,
Moninder Singh,
Siphiwe Thwala,
Rosario A. Uceda-Sosa,
Kush R. Varshney
Abstract:
The alignment of large language models is usually done by model providers to add or control behaviors that are common or universally understood across use cases and contexts. In contrast, in this article, we present an approach and architecture that empowers application developers to tune a model to their particular values, social norms, laws and other regulations, and orchestrate between potentially conflicting requirements in context. We lay out three main components of such an Alignment Studio architecture: Framers, Instructors, and Auditors that work in concert to control the behavior of a language model. We illustrate this approach with a running example of aligning a company's internal-facing enterprise chatbot to its business conduct guidelines.
Submitted 8 March, 2024;
originally announced March 2024.
-
From Pro, Anti to Informative and Hesitant: An Infoveillance study of COVID-19 vaccines and vaccination discourse on Twitter
Authors:
Pardeep Singh,
Rabindra Lamsal,
Monika Singh,
Satish Chand,
Bhawna Shishodia
Abstract:
The COVID-19 pandemic has brought unprecedented challenges to the world, and vaccination has been a key strategy to combat the disease. Since Twitter is one of the most widely used public microblogging platforms, researchers have analysed COVID-19 vaccines and vaccination Twitter discourse to explore the conversational dynamics around the topic. While contributing to the crisis informatics literature, we curate a large-scale geotagged Twitter dataset, GeoCovaxTweets Extended, and explore the discourse through multiple spatiotemporal analyses. This dataset covers a longer time span of 38 months, from the announcement of the first vaccine to the availability of booster doses. Results show that 43.4% of the collected tweets, although containing phrases and keywords related to vaccines and vaccinations, were unrelated to the COVID-19 context. In total, 23.1% of the discussions on vaccines and vaccinations were classified as Pro, 16% as Hesitant, 11.4% as Anti, and 6.1% as Informative. The trend shifted towards Pro and Informative tweets globally as vaccination programs progressed, indicating a change in the public's perception of COVID-19 vaccines and vaccination. Furthermore, we explored the discourse based on account attributes, i.e., follower counts and tweet counts. Results show a significant pattern of discourse differences. Our findings highlight the potential of harnessing a large-scale geotagged Twitter dataset to understand global public health communication and to inform targeted interventions aimed at addressing vaccine hesitancy.
Submitted 14 March, 2024;
originally announced March 2024.
-
Future of Pandemic Prevention and Response CCC Workshop Report
Authors:
David Danks,
Rada Mihalcea,
Katie Siek,
Mona Singh,
Brian Dixon,
Haley Griffin
Abstract:
This report summarizes the discussions and conclusions of a 2-day multidisciplinary workshop that brought together researchers and practitioners in healthcare, computer science, and social sciences to explore what lessons were learned and what actions, primarily in research, could be taken. One consistent observation was that there is significant merit in thinking not only about pandemic situations, but also about peacetime advances, as many healthcare networks and communities are now in a perpetual state of crisis. Attendees discussed how the COVID-19 pandemic amplified gaps in our health and computing systems, and how current and future computing technologies could fill these gaps and improve the trajectory of the next pandemic.
Three major computing themes emerged from the workshop: models, data, and infrastructure. Computational models are extremely important during pandemics, from anticipating supply needs of hospitals, to determining the care capacity of hospital and social service providers, to projecting the spread of the disease. Accurate, reliable models can save lives, and inform community leaders on policy decisions. Health system users require accurate, reliable data to achieve success when applying models. This requires data and measurement standardization across health care organizations, modernizing the data infrastructure, and methods for ensuring data remains private while shared for model development, validation, and application. Finally, many health care systems lack the data, compute, and communication infrastructures required to build models on their data, use those models in ordinary operations, or even to reliably access their data. Robust and timely computing research has the potential to better support healthcare workers to save lives in times of crisis (e.g., pandemics) and today during relative peacetime.
Submitted 29 February, 2024;
originally announced March 2024.
-
Exploring the Efficacy of Large Language Models in Summarizing Mental Health Counseling Sessions: A Benchmark Study
Authors:
Prottay Kumar Adhikary,
Aseem Srivastava,
Shivani Kumar,
Salam Michael Singh,
Puneet Manuja,
Jini K Gopinath,
Vijay Krishnan,
Swati Kedia,
Koushik Sinha Deb,
Tanmoy Chakraborty
Abstract:
Comprehensive summaries of sessions enable an effective continuity in mental health counseling, facilitating informed therapy planning. Yet, manual summarization presents a significant challenge, diverting experts' attention from the core counseling process. This study evaluates the effectiveness of state-of-the-art Large Language Models (LLMs) in selectively summarizing various components of therapy sessions through aspect-based summarization, aiming to benchmark their performance. We introduce MentalCLOUDS, a counseling-component guided summarization dataset consisting of 191 counseling sessions with summaries focused on three distinct counseling components (aka counseling aspects). Additionally, we assess the capabilities of 11 state-of-the-art LLMs in addressing the task of component-guided summarization in counseling. The generated summaries are evaluated quantitatively using standard summarization metrics and verified qualitatively by mental health professionals. Our findings demonstrate the superior performance of task-specific LLMs such as MentalLlama, Mistral, and MentalBART in terms of standard quantitative metrics such as Rouge-1, Rouge-2, Rouge-L, and BERTScore across all aspects of counseling components. Further, expert evaluation reveals that Mistral supersedes both MentalLlama and MentalBART based on six parameters -- affective attitude, burden, ethicality, coherence, opportunity costs, and perceived effectiveness. However, these models share the same weakness by demonstrating a potential for improvement in the opportunity costs and perceived effectiveness metrics.
Submitted 29 February, 2024;
originally announced February 2024.
-
Beyond Spatio-Temporal Representations: Evolving Fourier Transform for Temporal Graphs
Authors:
Anson Bastos,
Kuldeep Singh,
Abhishek Nadgeri,
Manish Singh,
Toyotaro Suzumura
Abstract:
We present the Evolving Graph Fourier Transform (EFT), the first invertible spectral transform that captures evolving representations on temporal graphs. We motivate our work by the inadequacy of existing methods for capturing the evolving graph spectra, which are also computationally expensive due to the temporal aspect along with the graph vertex domain. We view the problem as an optimization over the Laplacian of the continuous time dynamic graph. Additionally, we propose pseudo-spectrum relaxations that decompose the transformation process, making it highly computationally efficient. The EFT method adeptly captures the evolving graph's structural and positional properties, making it effective for downstream tasks on evolving graphs. Hence, as a reference implementation, we develop a simple neural model induced with EFT for capturing evolving graph spectra. We empirically validate our theoretical findings on a number of large-scale and standard temporal graph benchmarks and demonstrate that our model achieves state-of-the-art performance.
Submitted 18 April, 2024; v1 submitted 25 February, 2024;
originally announced February 2024.
-
Ranking Large Language Models without Ground Truth
Authors:
Amit Dhurandhar,
Rahul Nair,
Moninder Singh,
Elizabeth Daly,
Karthikeyan Natesan Ramamurthy
Abstract:
Evaluation and ranking of large language models (LLMs) has become an important problem with the proliferation of these models and their impact. Evaluation methods either require human responses, which are expensive to acquire, or use pairs of LLMs to evaluate each other, which can be unreliable. In this paper, we provide a novel perspective where, given a dataset of prompts (viz. questions, instructions, etc.) and a set of LLMs, we rank them without access to any ground truth or reference responses. Inspired by real life, where both an expert and a knowledgeable person can identify a novice, our main idea is to consider triplets of models, where each one of them evaluates the other two, correctly identifying the worst model in the triplet with high probability. We also analyze our idea and provide sufficient conditions for it to succeed. Applying this idea repeatedly, we propose two methods to rank LLMs. In experiments on different generative tasks (summarization, multiple-choice, and dialog), our methods reliably recover close to true rankings without reference data. This points to a viable low-resource mechanism for practical use.
Submitted 10 June, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
Remember This Event That Year? Assessing Temporal Information and Reasoning in Large Language Models
Authors:
Himanshu Beniwal,
Kowsik Nandagopan D,
Mayank Singh
Abstract:
Large Language Models (LLMs) are increasingly becoming ubiquitous, yet their ability to reason about and retain temporal information remains limited. This hinders their application in real-world scenarios where understanding the sequential nature of events is crucial. This paper experiments with state-of-the-art models on a novel, large-scale temporal dataset, TempUN, to reveal significant limitations in temporal retention and reasoning abilities. Interestingly, closed-source models indicate knowledge gaps more frequently, potentially suggesting a trade-off between uncertainty awareness and incorrect responses. Further, exploring various fine-tuning approaches yielded no major performance improvements. The associated dataset and code are available at the following URL (https://github.com/lingoiitgn/TempUN).
Submitted 19 February, 2024;
originally announced February 2024.
-
Neural 5G Indoor Localization with IMU Supervision
Authors:
Aleksandr Ermolov,
Shreya Kadambi,
Maximilian Arnold,
Mohammed Hirzallah,
Roohollah Amiri,
Deepak Singh Mahendar Singh,
Srinivas Yerramalli,
Daniel Dijkman,
Fatih Porikli,
Taesang Yoo,
Bence Major
Abstract:
Radio signals are well suited for user localization because they are ubiquitous, can operate in the dark and maintain privacy. Many prior works learn fully supervised mappings between channel state information (CSI) and position. However, that approach relies on position labels which are very expensive to acquire. In this work, this requirement is relaxed by using pseudo-labels during deployment, which are calculated from an inertial measurement unit (IMU). We propose practical algorithms for IMU double integration and training of the localization system. We show decimeter-level accuracy on simulated and challenging real data of 5G measurements. Our IMU-supervised method performs similarly to the fully supervised approach, but requires much less effort to deploy.
Submitted 15 February, 2024;
originally announced February 2024.
-
Decision Theory-Guided Deep Reinforcement Learning for Fast Learning
Authors:
Zelin Wan,
Jin-Hee Cho,
Mu Zhu,
Ahmed H. Anwar,
Charles Kamhoua,
Munindar P. Singh
Abstract:
This paper introduces a novel approach, Decision Theory-guided Deep Reinforcement Learning (DT-guided DRL), to address the inherent cold start problem in DRL. By integrating decision theory principles, DT-guided DRL enhances agents' initial performance and robustness in complex environments, enabling more efficient and reliable convergence during learning. Our investigation encompasses two primary problem contexts: the cart pole and maze navigation challenges. Experimental results demonstrate that the integration of decision theory not only facilitates effective initial guidance for DRL agents but also promotes a more structured and informed exploration strategy, particularly in environments characterized by large and intricate state spaces. The results also show that DT-guided DRL can provide significantly higher rewards than regular DRL. Specifically, during the initial phase of training, DT-guided DRL yields up to a 184% increase in accumulated reward. Moreover, even after reaching convergence, it maintains superior performance, ending with up to 53% more reward than standard DRL in large maze problems. DT-guided DRL represents an advancement in mitigating a fundamental challenge of DRL by leveraging functions informed by human (designer) knowledge, setting a foundation for further research in this promising interdisciplinary domain.
Submitted 8 February, 2024;
originally announced February 2024.
-
Norm Enforcement with a Soft Touch: Faster Emergence, Happier Agents
Authors:
Sz-Ting Tzeng,
Nirav Ajmeri,
Munindar P. Singh
Abstract:
A multiagent system is a society of autonomous agents whose interactions can be regulated via social norms. In general, the norms of a society are not hardcoded but emerge from the agents' interactions. Specifically, how the agents in a society react to each other's behavior and respond to the reactions of others determines which norms emerge in the society. We think of these reactions by an agent to the satisfactory or unsatisfactory behaviors of another agent as communications from the first agent to the second agent. Understanding these communications is a kind of social intelligence: these communications provide natural drivers for norm emergence by pushing agents toward certain behaviors, which can become established as norms. Whereas it is well-known that sanctioning can lead to the emergence of norms, we posit that a broader kind of social intelligence can prove more effective in promoting cooperation in a multiagent system.
Accordingly, we develop Nest, a framework that models social intelligence via a wider variety of communications and understanding of them than in previous work. To evaluate Nest, we develop a simulated pandemic environment and conduct simulation experiments to compare Nest with baselines considering a combination of three kinds of social communication: sanction, tell, and hint.
We find that societies formed of Nest agents achieve norms faster. Moreover, Nest agents effectively avoid undesirable consequences, which are negative sanctions and deviation from goals, and yield higher satisfaction for themselves than baseline agents despite requiring only an equivalent amount of information.
Submitted 5 March, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
Maximizing the Minimum Eigenvalue in Constant Dimension
Authors:
Adam Brown,
Aditi Laddha,
Mohit Singh
Abstract:
In an instance of the minimum eigenvalue problem, we are given a collection of $n$ vectors $v_1,\ldots, v_n \subset {\mathbb{R}^d}$, and the goal is to pick a subset $B\subseteq [n]$ of given vectors to maximize the minimum eigenvalue of the matrix $\sum_{i\in B} v_i v_i^{\top} $. Often, additional combinatorial constraints such as cardinality constraint $\left(|B|\leq k\right)$ or matroid constraint ($B$ is a basis of a matroid defined on $[n]$) must be satisfied by the chosen set of vectors. The minimum eigenvalue problem with matroid constraints models a wide variety of problems including the Santa Claus problem, the E-design problem, and the constructive Kadison-Singer problem.
In this paper, we give a randomized algorithm that finds a set $B\subseteq [n]$ subject to any matroid constraint whose minimum eigenvalue is at least $(1-\epsilon)$ times the optimum, with high probability. The running time of the algorithm is $O\left( n^{O(d\log(d)/\epsilon^2)}\right)$. In particular, our results give a polynomial time asymptotic scheme when the dimension of the vectors is constant. Our algorithm uses a convex programming relaxation of the problem after guessing a rescaling which allows us to apply pipage rounding and matrix Chernoff inequalities to round to a good solution. The key new component is a structural lemma which enables us to "guess" the appropriate rescaling, which could be of independent interest. Our approach generalizes the approximation guarantee to monotone, homogeneous functions and as such we can maximize $\det(\sum_{i\in B} v_i v_i^\top)^{1/d}$, or minimize any norm of the eigenvalues of the matrix $\left(\sum_{i\in B} v_i v_i^\top\right)^{-1} $, with the same running time under some mild assumptions. As a byproduct, we also get a simple algorithm for an algorithmic version of Kadison-Singer problem.
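To make the objective in this abstract concrete, the following is a minimal sketch of the minimum eigenvalue problem under a cardinality constraint, solved by exhaustive search for vectors in dimension 2. It is not the randomized algorithm of the paper (no convex relaxation, rescaling, or pipage rounding is involved); the function names and the toy instance are assumptions made for illustration only.

```python
import math
from itertools import combinations


def min_eigenvalue_2x2(a, b, c):
    """Smallest eigenvalue of the symmetric matrix [[a, b], [b, c]]."""
    return (a + c) / 2 - math.hypot((a - c) / 2, b)


def objective(vectors, subset):
    """Minimum eigenvalue of sum_{i in subset} v_i v_i^T for 2-D vectors."""
    a = sum(vectors[i][0] ** 2 for i in subset)
    b = sum(vectors[i][0] * vectors[i][1] for i in subset)
    c = sum(vectors[i][1] ** 2 for i in subset)
    return min_eigenvalue_2x2(a, b, c)


def brute_force(vectors, k):
    """Best subset of size exactly k, found by trying all of them.

    Adding a vector adds a positive semidefinite term, which never lowers
    the minimum eigenvalue, so for |B| <= k it suffices to check |B| = k
    (assuming k <= len(vectors)).
    """
    best = max(combinations(range(len(vectors)), k),
               key=lambda subset: objective(vectors, subset))
    return best, objective(vectors, best)


if __name__ == "__main__":
    # Hypothetical toy instance: two copies of e_1 and one copy of e_2.
    vectors = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    print(brute_force(vectors, 2))  # picks one e_1 and e_2, value 1.0
```

Exhaustive search takes time exponential in $k$ and is shown only to pin down what is being maximized; the abstract's contribution is an algorithm whose running time is polynomial in $n$ for constant $d$.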
Submitted 25 January, 2024;
originally announced January 2024.
-
Cross-lingual Editing in Multilingual Language Models
Authors:
Himanshu Beniwal,
Kowsik Nandagopan D,
Mayank Singh
Abstract:
The training of large language models (LLMs) necessitates substantial data and computational resources, and updating outdated LLMs entails significant efforts and resources. While numerous model editing techniques (METs) have emerged to efficiently update model outputs without retraining, their effectiveness in multilingual LLMs, where knowledge is stored in diverse languages, remains an underexplored research area. This research paper introduces the cross-lingual model editing (XME) paradigm, wherein a fact is edited in one language, and the subsequent update propagation is observed across other languages. To investigate the XME paradigm, we conducted experiments using BLOOM, mBERT, and XLM-RoBERTa using the two writing scripts: Latin (English, French, and Spanish) and Indic (Hindi, Gujarati, and Bengali). The results reveal notable performance limitations of state-of-the-art METs under the XME setting, mainly when the languages involved belong to two distinct script families. These findings highlight the need for further research and development of XME techniques to address these challenges. For more comprehensive information, the dataset used in this research and the associated code are publicly available at the following URL: https://github.com/lingo-iitgn/XME.
Submitted 3 February, 2024; v1 submitted 19 January, 2024;
originally announced January 2024.
-
LEGOBench: Scientific Leaderboard Generation Benchmark
Authors:
Shruti Singh,
Shoaib Alam,
Husain Malwat,
Mayank Singh
Abstract:
The ever-increasing volume of paper submissions makes it difficult to stay informed about the latest state-of-the-art research. To address this challenge, we introduce LEGOBench, a benchmark for evaluating systems that generate scientific leaderboards. LEGOBench is curated from 22 years of preprint submission data on arXiv and more than 11k machine learning leaderboards on the PapersWithCode portal. We present four graph-based and two language model-based leaderboard generation task configurations. We evaluate popular encoder-only scientific language models as well as decoder-only large language models across these task configurations. State-of-the-art models showcase significant performance gaps in automatic leaderboard generation on LEGOBench. The code is available on GitHub ( https://github.com/lingo-iitgn/LEGOBench ) and the dataset is hosted on OSF ( https://osf.io/9v2py/?view_only=6f91b0b510df498ba01595f8f278f94c ).
Submitted 21 February, 2024; v1 submitted 11 January, 2024;
originally announced January 2024.
-
A case study of Generative AI in MSX Sales Copilot: Improving seller productivity with a real-time question-answering system for content recommendation
Authors:
Manpreet Singh,
Ravdeep Pasricha,
Nitish Singh,
Ravi Prasad Kondapalli,
Manoj R,
Kiran R,
Laurent Boué
Abstract:
In this paper, we design a real-time question-answering system specifically targeted for helping sellers get relevant material/documentation they can share live with their customers or refer to during a call. Taking the Seismic content repository as a relatively large scale example of a diverse dataset of sales material, we demonstrate how LLM embeddings of sellers' queries can be matched with the relevant content. We achieve this by engineering prompts in an elaborate fashion that makes use of the rich set of meta-features available for documents and sellers. Using a bi-encoder with cross-encoder re-ranker architecture, we show how the solution returns the most relevant content recommendations in just a few seconds even for large datasets. Our recommender system is deployed as an AML endpoint for real-time inferencing and has been integrated into a Copilot interface that is now deployed in the production version of the Dynamics CRM, known as MSX, used daily by Microsoft sellers.
Submitted 4 January, 2024;
originally announced January 2024.
-
PythonSaga: Redefining the Benchmark to Evaluate Code Generating LLM
Authors:
Ankit Yadav,
Mayank Singh
Abstract:
Driven by the surge in code generation using large language models (LLMs), numerous benchmarks have emerged to evaluate these LLMs' capabilities. We conducted a large-scale human evaluation of HumanEval and MBPP, two popular benchmarks for Python code generation, analyzing their diversity and difficulty. Our findings unveil a critical bias towards a limited set of programming concepts, neglecting most of the other concepts entirely. Furthermore, we uncover a worrying prevalence of easy tasks, potentially inflating model performance estimations. To address these limitations, we propose a novel benchmark, PythonSaga, featuring 185 hand-crafted prompts on a balanced representation of 38 programming concepts across diverse difficulty levels.
Submitted 26 April, 2024; v1 submitted 8 January, 2024;
originally announced January 2024.
-
Approximation Algorithms for the Weighted Nash Social Welfare via Convex and Non-Convex Programs
Authors:
Adam Brown,
Aditi Laddha,
Madhusudhan Reddy Pittu,
Mohit Singh
Abstract:
In an instance of the weighted Nash Social Welfare problem, we are given a set of $m$ indivisible items, $\mathscr{G}$, and $n$ agents, $\mathscr{A}$, where each agent $i \in \mathscr{A}$ has a valuation $v_{ij}\geq 0$ for each item $j\in \mathscr{G}$. In addition, every agent $i$ has a non-negative weight $w_i$ such that the weights collectively sum up to $1$. The goal is to find an assignment $\sigma:\mathscr{G}\rightarrow \mathscr{A}$ that maximizes $\prod_{i\in \mathscr{A}} \left(\sum_{j\in \sigma^{-1}(i)} v_{ij}\right)^{w_i}$, the product of the weighted valuations of the players. When all the weights equal $\frac1n$, the problem reduces to the classical Nash Social Welfare problem, which has recently received much attention. In this work, we present a $5\cdot\exp\left(2\cdot D_{\text{KL}}(\mathbf{w}\, ||\, \frac{\vec{\mathbf{1}}}{n})\right) = 5\cdot\exp\left(2\log{n} + 2\sum_{i=1}^n w_i \log{w_i}\right)$-approximation algorithm for the weighted Nash Social Welfare problem, where $D_{\text{KL}}(\mathbf{w}\, ||\, \frac{\vec{\mathbf{1}}}{n})$ denotes the KL-divergence between the distribution induced by $\mathbf{w}$ and the uniform distribution on $[n]$.
We show a novel connection between the convex programming relaxations for the unweighted variant of Nash Social Welfare presented in \cite{cole2017convex, anari2017nash}, and generalize the programs to two different mathematical programs for the weighted case. The first program is convex and is necessary for computational efficiency, while the second program is a non-convex relaxation that can be rounded efficiently. The approximation factor derives from the difference in the objective values of the convex and non-convex relaxation.
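The objective and the stated approximation factor can be made concrete with a small numeric sketch. This is only an illustration of the two formulas in the abstract, not the authors' algorithm; the function names `weighted_nsw` and `approximation_factor` and the toy instance are assumptions introduced here.

```python
import math

def weighted_nsw(valuations, weights, assignment):
    """Weighted Nash Social Welfare of an assignment.

    valuations[i][j] is v_ij, weights[i] is w_i (summing to 1),
    and assignment[j] is the agent that receives item j.
    """
    bundle_value = [0.0] * len(weights)
    for item, agent in enumerate(assignment):
        bundle_value[agent] += valuations[agent][item]
    # Product over agents of (value of agent's bundle) ** w_i.
    return math.prod(v ** w for v, w in zip(bundle_value, weights))

def approximation_factor(weights):
    """5 * exp(2 * KL(w || uniform)), the factor quoted in the abstract."""
    n = len(weights)
    # KL(w || 1/n) = sum_i w_i * log(n * w_i) = log n + sum_i w_i * log w_i.
    kl = sum(w * math.log(w * n) for w in weights if w > 0)
    return 5.0 * math.exp(2.0 * kl)

# Toy instance (hypothetical): two agents, two items, equal weights.
valuations = [[4.0, 1.0], [1.0, 9.0]]
print(weighted_nsw(valuations, [0.5, 0.5], [0, 1]))
print(approximation_factor([0.5, 0.5]))
```

With uniform weights the KL term vanishes, so the factor reduces to the constant 5; it grows as the weights become more skewed.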
Submitted 5 January, 2024;
originally announced January 2024.
-
A Novel Sampled Clustering Algorithm for Rice Phenotypic Data
Authors:
Mithun Singh,
Kapil Ahuja,
Milind B. Ratnaparkhe
Abstract:
Phenotypic (or Physical) characteristics of plant species are commonly used to perform clustering. In one of our recent works (Shastri et al. (2021)), we used a probabilistically sampled (using pivotal sampling) and spectrally clustered algorithm to group soybean species. These techniques were used to obtain highly accurate clusterings at a reduced cost. In this work, we extend the earlier algorithm to cluster rice species. We improve the base algorithm in three ways.
First, we propose a new function to build the similarity matrix in Spectral Clustering. Commonly, a natural exponential function is used for this purpose. Based upon the spectral graph theory and the involved Cheeger's inequality, we propose the use of a base "a" exponential function instead. This gives a similarity matrix spectrum favorable for clustering, which we support via an eigenvalue analysis. Also, the function used to build the similarity matrix in Spectral Clustering was earlier scaled with a fixed factor (called global scaling). Based upon the idea of Zelnik-Manor and Perona (2004), we now use a factor that varies with matrix elements (called local scaling) and works better.
Second, to compute the inclusion probability of a species in the pivotal sampling algorithm, we had earlier used the notion of deviation that captured how far a species' characteristic values were from their respective base values (computed over all species). A maximum function was used before to find the base values. We now use a median function, which is more intuitive. We support this choice using a statistical analysis.
Third, with experiments on 1865 rice species, we demonstrate that in terms of silhouette values, our new Sampled Spectral Clustering is 61% better than Hierarchical Clustering (currently prevalent). Also, our new algorithm is significantly faster than Hierarchical Clustering due to the involved sampling.
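The similarity-matrix idea in the first point can be sketched as follows. This is a minimal illustration that combines a base-`a` exponential with local scaling in the style of Zelnik-Manor and Perona (each point scaled by the distance to its k-th nearest neighbour); the exact functional form used in the paper is not given in the abstract, so the formula, the function name `local_scaling_affinity`, and the parameters `base` and `k` are assumptions.

```python
import math

def local_scaling_affinity(points, base=math.e, k=2):
    """Affinity A_ij = base ** (-d_ij^2 / (sigma_i * sigma_j)) for i != j.

    sigma_i is the distance from point i to its k-th nearest neighbour.
    Assumes the points are distinct, so every sigma_i is positive.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Index 0 of each sorted row is the point itself (distance 0).
    sigma = [sorted(row)[min(k, n - 1)] for row in dist]
    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                exponent = -(dist[i][j] ** 2) / (sigma[i] * sigma[j])
                affinity[i][j] = base ** exponent
    return affinity

# Hypothetical toy data: three points on a line, base a = 2, k = 1.
A = local_scaling_affinity([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)], base=2.0, k=1)
print(A[0][1], A[0][2])
```

Setting `base=math.e` recovers the usual natural-exponential kernel, which makes the effect of changing the base easy to compare on the same data.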
Submitted 12 May, 2024; v1 submitted 22 December, 2023;
originally announced December 2023.
-
GroupMixNorm Layer for Learning Fair Models
Authors:
Anubha Pandey,
Aditi Rai,
Maneet Singh,
Deepak Bhatt,
Tanmoy Bhowmik
Abstract:
Recent research has identified discriminatory behavior of automated prediction algorithms towards groups identified on specific protected attributes (e.g., gender, ethnicity, age group, etc.). When deployed in real-world scenarios, such techniques may demonstrate biased predictions resulting in unfair outcomes. Recent literature has witnessed algorithms for mitigating such biased behavior mostly by adding convex surrogates of fairness metrics such as demographic parity or equalized odds in the loss function, which are often not easy to estimate. This research proposes a novel in-processing based GroupMixNorm layer for mitigating bias from deep learning models. The GroupMixNorm layer probabilistically mixes group-level feature statistics of samples across different groups based on the protected attribute. The proposed method improves upon several fairness metrics with minimal impact on overall accuracy. Analysis on benchmark tabular and image datasets demonstrates the efficacy of the proposed method in achieving state-of-the-art performance. Further, the experimental analysis also suggests the robustness of the GroupMixNorm layer against new protected attributes during inference and its utility in eliminating bias from a pre-trained network.
Submitted 19 December, 2023;
originally announced December 2023.
-
Assessing GPT4-V on Structured Reasoning Tasks
Authors:
Mukul Singh,
José Cambronero,
Sumit Gulwani,
Vu Le,
Gust Verbruggen
Abstract:
Multi-modality promises to unlock further uses for large language models. Recently, the state-of-the-art language model GPT-4 was enhanced with vision capabilities. We carry out a prompting evaluation of GPT-4V and five other baselines on structured reasoning tasks, such as mathematical reasoning, visual data analysis, and code generation. We show that visual Chain-of-Thought, an extension of Chain-of-Thought to multi-modal LLMs, yields significant improvements over the vanilla model. We also present a categorized analysis of scenarios where these models perform well and where they struggle, highlighting challenges associated with coherent multimodal reasoning.
Submitted 13 December, 2023;
originally announced December 2023.
-
Accelerated Event-Based Feature Detection and Compression for Surveillance Video Systems
Authors:
Andrew C. Freeman,
Ketan Mayer-Patel,
Montek Singh
Abstract:
The strong temporal consistency of surveillance video enables compelling compression performance with traditional methods, but downstream vision applications operate on decoded image frames with a high data rate. Since it is not straightforward for applications to extract information on temporal redundancy from the compressed video representations, we propose a novel system which conveys temporal redundancy within a sparse decompressed representation. We leverage a video representation framework called ADDER to transcode framed videos to sparse, asynchronous intensity samples. We introduce mechanisms for content adaptation, lossy compression, and asynchronous forms of classical vision algorithms. We evaluate our system on the VIRAT surveillance video dataset, and we show a median 43.7% speed improvement in FAST feature detection compared to OpenCV. We run the same algorithm as OpenCV, but only process pixels that receive new asynchronous events, rather than process every pixel in an image frame. Our work paves the way for upcoming neuromorphic sensors and is amenable to future applications with spiking neural networks.
Submitted 8 February, 2024; v1 submitted 13 December, 2023;
originally announced December 2023.
-
SocialStigmaQA: A Benchmark to Uncover Stigma Amplification in Generative Language Models
Authors:
Manish Nagireddy,
Lamogha Chiazor,
Moninder Singh,
Ioana Baldini
Abstract:
Current datasets for unwanted social bias auditing are limited to studying protected demographic features such as race and gender. In this work, we introduce a comprehensive benchmark that is meant to capture the amplification of social bias, via stigmas, in generative language models. Taking inspiration from social science research, we start with a documented list of 93 US-centric stigmas and curate a question-answering (QA) dataset which involves simple social situations. Our benchmark, SocialStigmaQA, contains roughly 10K prompts, with a variety of prompt styles, carefully constructed to systematically test for both social bias and model robustness. We present results for SocialStigmaQA with two open source generative language models and we find that the proportion of socially biased output ranges from 45% to 59% across a variety of decoding strategies and prompting styles. We demonstrate that the deliberate design of the templates in our benchmark (e.g., adding biasing text to the prompt or using different verbs that change the answer that indicates bias) impacts the model tendencies to generate socially biased output. Additionally, through manual evaluation, we discover problematic patterns in the generated chain-of-thought output that range from subtle bias to lack of reasoning.
Warning: This paper contains examples of text which are toxic, biased, and potentially harmful.
Submitted 27 December, 2023; v1 submitted 12 December, 2023;
originally announced December 2023.
-
Motion Informed Needle Segmentation in Ultrasound Images
Authors:
Raghavv Goel,
Cecilia Morales,
Manpreet Singh,
Artur Dubrawski,
John Galeotti,
Howie Choset
Abstract:
Segmenting a moving needle in ultrasound images is challenging due to the presence of artifacts, noise, and needle occlusion. This task becomes even more demanding in scenarios where data availability is limited. In this paper, we present a novel approach for needle segmentation for 2D ultrasound that combines classical Kalman Filter (KF) techniques with data-driven learning, incorporating both needle features and needle motion. Our method offers three key contributions. First, we propose a compatible framework that seamlessly integrates into commonly used encoder-decoder style architectures. Second, we demonstrate superior performance compared to recent state-of-the-art needle segmentation models using our novel convolutional neural network (CNN) based KF-inspired block, achieving a 15\% reduction in pixel-wise needle tip error and an 8\% reduction in length error. Third, to our knowledge we are the first to implement a learnable filter to incorporate non-linear needle motion for improving needle segmentation.
Submitted 3 May, 2024; v1 submitted 2 December, 2023;
originally announced December 2023.
-
Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression
Authors:
Animesh Sinha,
Bo Sun,
Anmol Kalia,
Arantxa Casanova,
Elliot Blanchard,
David Yan,
Winnie Zhang,
Tony Nelli,
Jiahui Chen,
Hardik Shah,
Licheng Yu,
Mitesh Kumar Singh,
Ankit Ramchandani,
Maziar Sanjabi,
Sonal Gupta,
Amy Bearman,
Dhruv Mahajan
Abstract:
We introduce Style Tailoring, a recipe to finetune Latent Diffusion Models (LDMs) in a distinct domain with high visual quality, prompt alignment and scene diversity. We choose sticker image generation as the target domain, as the images significantly differ from photorealistic samples typically generated by large-scale LDMs. We start with a competent text-to-image model, like Emu, and show that relying on prompt engineering with a photorealistic model to generate stickers leads to poor prompt alignment and scene diversity. To overcome these drawbacks, we first finetune Emu on millions of sticker-like images collected using weak supervision to elicit diversity. Next, we curate human-in-the-loop (HITL) Alignment and Style datasets from model generations, and finetune to improve prompt alignment and style alignment respectively. Sequential finetuning on these datasets poses a tradeoff between better style alignment and prompt alignment gains. To address this tradeoff, we propose a novel fine-tuning method called Style Tailoring, which jointly fits the content and style distribution and achieves best tradeoff. Evaluation results show our method improves visual quality by 14%, prompt alignment by 16.2% and scene diversity by 15.3%, compared to prompt engineering the base Emu model for stickers generation.
Submitted 16 November, 2023;
originally announced November 2023.
-
Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
Authors:
Rohit Girdhar,
Mannat Singh,
Andrew Brown,
Quentin Duval,
Samaneh Azadi,
Sai Saketh Rambhatla,
Akbar Shah,
Xi Yin,
Devi Parikh,
Ishan Misra
Abstract:
We present Emu Video, a text-to-video generation model that factorizes the generation into two steps: first generating an image conditioned on the text, and then generating a video conditioned on the text and the generated image. We identify critical design decisions--adjusted noise schedules for diffusion, and multi-stage training--that enable us to directly generate high quality and high resolution videos, without requiring a deep cascade of models as in prior work. In human evaluations, our generated videos are strongly preferred in quality compared to all prior work--81% vs. Google's Imagen Video, 90% vs. Nvidia's PYOCO, and 96% vs. Meta's Make-A-Video. Our model outperforms commercial solutions such as RunwayML's Gen2 and Pika Labs. Finally, our factorizing approach naturally lends itself to animating images based on a user's text prompt, where our generations are preferred 96% over prior work.
Submitted 17 November, 2023;
originally announced November 2023.
-
Balancing Notions of Equity: Approximation Algorithms for Fair Portfolio of Solutions in Combinatorial Optimization
Authors:
Swati Gupta,
Jai Moondra,
Mohit Singh
Abstract:
Inspired by equity considerations, we consider top-$k$ norm, ordered norm, and symmetric monotonic norm objectives for various combinatorial optimization problems. Top-$k$ norms and ordered norms have natural interpretations in terms of minimizing the impact on individuals bearing largest costs. To model decision-making with multiple equity criteria, we study the notion of portfolios of solutions with the property that each norm or equity criteria has an approximately optimal solution in this portfolio. We attempt to characterize portfolios by their sizes and approximation factor guarantees for various combinatorial problems. For a given problem, we investigate whether (1) there exists a single solution that is approximately optimal for all norms, (2) there exists a small approximately optimal portfolio of size larger than 1, (3) there exist polynomial time algorithms to find these small portfolios. We study an algorithmic framework to obtain single solutions that are approximately optimal for all norms. We show the existence of such a solution for problems such as $k$-clustering, ordered set cover, scheduling for job completion time minimization, and scheduling for machine load minimization on identical machines. We also give efficient algorithms to find these solutions in most cases, except set cover where we show there is a gap in terms of computational complexity. Our work improves upon the best-known approximation factor across all norms for a single solution in $k$-clustering. For uncapacitated facility location and scheduling for machine load minimization with identical jobs, we obtain logarithmic sized portfolios, also providing a matching lower bound in the latter case. Our work results in new open combinatorial questions, which might be of independent interest.
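For readers unfamiliar with the norms named above, the following sketch shows the standard definitions: the top-$k$ norm sums the $k$ largest absolute entries of a cost vector, and an ordered norm takes a weighted sum of the sorted absolute entries with non-negative, non-increasing weights. The function names and the sample cost vector are illustrative assumptions, not taken from the paper.

```python
def top_k_norm(costs, k):
    """Sum of the k largest absolute values in the cost vector."""
    ranked = sorted((abs(c) for c in costs), reverse=True)
    return sum(ranked[:k])

def ordered_norm(costs, weights):
    """Weighted sum of costs sorted in decreasing order of magnitude.

    weights is assumed non-negative and non-increasing, so the largest
    costs receive the largest weights.
    """
    ranked = sorted((abs(c) for c in costs), reverse=True)
    return sum(w * c for w, c in zip(weights, ranked))

# Hypothetical per-individual costs of one solution.
costs = [3, 1, 4, 1, 5]
print(top_k_norm(costs, 1))   # maximum cost (the L-infinity norm)
print(top_k_norm(costs, 5))   # total cost (the L1 norm)
print(ordered_norm(costs, [1, 1, 0, 0, 0]))  # equals the top-2 norm
```

The two extremes, `k = 1` and `k = len(costs)`, show how the top-$k$ family interpolates between worst-off cost and total cost, which is why a single solution that is good for all such norms is a meaningful equity target.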
Submitted 6 November, 2023;
originally announced November 2023.
-
Incorporating Zero-Knowledge Succinct Non-interactive Argument of Knowledge for Blockchain-based Identity Management with off-chain computations
Authors:
Pranay Kothari,
Deepak Chopra,
Manjot Singh,
Shivam Bhardwaj,
Rudresh Dwivedi
Abstract:
In today's world, secure and efficient biometric authentication is of keen importance. Traditional authentication methods are no longer considered reliable due to their susceptibility to cyber-attacks. Biometric authentication, particularly fingerprint authentication, has emerged as a promising alternative, but it raises concerns about the storage and use of biometric data, as well as centralized storage, which could make it vulnerable to cyber-attacks. In this paper, a novel blockchain-based fingerprint authentication system is proposed that integrates zk-SNARKs, which are zero-knowledge proofs that enable secure and efficient authentication without revealing sensitive biometric information. A KNN-based approach on the FVC2002, FVC2004 and FVC2006 datasets is used to generate a cancelable template for secure, faster, and robust biometric registration and authentication which is stored using the Interplanetary File System. The proposed approach provides an average accuracy of 99.01%, 98.97% and 98.52% over the FVC2002, FVC2004 and FVC2006 datasets respectively for fingerprint authentication. Incorporation of zk-SNARK facilitates smaller proof size. Overall, the proposed method has the potential to provide a secure and efficient solution for blockchain-based identity management.
Submitted 30 October, 2023;
originally announced October 2023.
-
Moral Sparks in Social Media Narratives
Authors:
Ruijie Xi,
Munindar P. Singh
Abstract:
There is increasing interest in building computational models of moral reasoning by people to enable effective interaction by Artificial Intelligence (AI) agents. We examine interactions on social media to understand human moral judgments in real-life ethical scenarios. Specifically, we examine posts from a popular Reddit subreddit (i.e., a subcommunity) called r/AmITheAsshole, where authors and commenters share their moral judgments on who (i.e., which participant of the described scenario) is blameworthy. To investigate the underlying reasoning influencing moral judgments, we focus on excerpts (which we term moral sparks) from original posts that some commenters include to indicate what motivates their judgments. To this end, we examine how (1) events activating social commonsense and (2) linguistic signals affect the identified moral sparks and their subsequent judgments. By examining over 24672 posts and 175988 comments, we find that event-related negative character traits (e.g., immature and rude) attract attention and stimulate blame, implying a dependent relationship between character traits and moral values. Specifically, we focus on causal graphs involving events (c-events) that activate social commonsense. We observe that c-events are perceived with varying levels of informativeness, influencing moral spark and judgment assignment in distinct ways. This observation is reinforced by examining linguistic features describing semantically similar c-events. Moreover, language influencing commenters' cognitive processes enhances the probability of an excerpt becoming a moral spark, while factual and concrete descriptions tend to inhibit this effect.
Submitted 21 April, 2024; v1 submitted 30 October, 2023;
originally announced October 2023.
-
CodeFusion: A Pre-trained Diffusion Model for Code Generation
Authors:
Mukul Singh,
José Cambronero,
Sumit Gulwani,
Vu Le,
Carina Negreanu,
Gust Verbruggen
Abstract:
Imagine a developer who can only change their last line of code, how often would they have to start writing a function from scratch before it is correct? Auto-regressive models for code generation from natural language have a similar limitation: they do not easily allow reconsidering earlier tokens generated. We introduce CodeFusion, a pre-trained diffusion code generation model that addresses this limitation by iteratively denoising a complete program conditioned on the encoded natural language. We evaluate CodeFusion on the task of natural language to code generation for Bash, Python, and Microsoft Excel conditional formatting (CF) rules. Experiments show that CodeFusion (75M parameters) performs on par with state-of-the-art auto-regressive systems (350M-175B parameters) in top-1 accuracy and outperforms them in top-3 and top-5 accuracy due to its better balance in diversity versus quality.
Submitted 1 November, 2023; v1 submitted 26 October, 2023;
originally announced October 2023.
-
FormaT5: Abstention and Examples for Conditional Table Formatting with Natural Language
Authors:
Mukul Singh,
José Cambronero,
Sumit Gulwani,
Vu Le,
Carina Negreanu,
Elnaz Nouri,
Mohammad Raza,
Gust Verbruggen
Abstract:
Formatting is an important property in tables for visualization, presentation, and analysis. Spreadsheet software allows users to automatically format their tables by writing data-dependent conditional formatting (CF) rules. Writing such rules is often challenging for users as it requires them to understand and implement the underlying logic. We present FormaT5, a transformer-based model that can generate a CF rule given the target table and a natural language description of the desired formatting logic. We find that user descriptions for these tasks are often under-specified or ambiguous, making it harder for code generation systems to accurately learn the desired rule in a single step. To tackle this problem of under-specification and minimise argument errors, FormaT5 learns to predict placeholders through an abstention objective. These placeholders can then be filled by a second model or, when examples of rows that should be formatted are available, by a programming-by-example system. To evaluate FormaT5 on diverse and real scenarios, we create an extensive benchmark of 1053 CF tasks, containing real-world descriptions collected from four different sources. We release our benchmarks to encourage research in this area. Abstention and filling allow FormaT5 to outperform 8 different neural approaches on our benchmarks, both with and without examples. Our results illustrate the value of building domain-specific learning systems.
Submitted 1 November, 2023; v1 submitted 26 October, 2023;
originally announced October 2023.
-
TST$^\mathrm{R}$: Target Similarity Tuning Meets the Real World
Authors:
Anirudh Khatry,
Sumit Gulwani,
Priyanshu Gupta,
Vu Le,
Ananya Singha,
Mukul Singh,
Gust Verbruggen
Abstract:
Target similarity tuning (TST) is a method of selecting relevant examples in natural language (NL) to code generation through large language models (LLMs) to improve performance. Its goal is to adapt a sentence embedding model to have the similarity between two NL inputs match the similarity between their associated code outputs. In this paper, we propose different methods to apply and improve TST in the real world. First, we replace the sentence transformer with embeddings from a larger model, which reduces sensitivity to the language distribution and thus provides more flexibility in synthetic generation of examples, and we train a tiny model that transforms these embeddings to a space where embedding similarity matches code similarity, which allows the model to remain a black box and only requires a few matrix multiplications at inference time. Second, we show how to efficiently select a smaller number of training examples to train the TST model. Third, we introduce a ranking-based evaluation for TST that does not require end-to-end code generation experiments, which can be expensive to perform.
Submitted 28 October, 2023; v1 submitted 26 October, 2023;
originally announced October 2023.
-
An Online Self-calibrating Refractive Camera Model with Application to Underwater Odometry
Authors:
Mohit Singh,
Mihir Dharmadhikari,
Kostas Alexis
Abstract:
This work presents a camera model for refractive media such as water and its application in underwater visual-inertial odometry. The model is self-calibrating in real-time and is free of known correspondences or calibration targets. It is separable as a distortion model (dependent on refractive index $n$ and radial pixel coordinate) and a virtual pinhole model (as a function of $n$). We derive the self-calibration formulation leveraging epipolar constraints to estimate the refractive index and subsequently correct for distortion. Through experimental studies using an underwater robot integrating cameras and inertial sensing, the model is validated regarding the accurate estimation of the refractive index and its benefits for robust odometry estimation in an extended envelope of conditions. Lastly, we show the transition between media and the estimation of the varying refractive index online, thus allowing computer vision tasks across refractive media.
Submitted 25 October, 2023;
originally announced October 2023.
-
Location Estimation and Recovery using 5G Positioning: Thwarting GNSS Spoofing Attacks
Authors:
Aneet Kumar Dutta,
Sebastian Brandt,
Mridula Singh
Abstract:
The availability of cheap GNSS spoofers can prevent safe navigation and tracking of road users. It can lead to loss of assets, inaccurate fare estimation, enforcing the wrong speed limit, miscalculated toll tax, passengers reaching an incorrect location, etc. The techniques designed to prevent and detect spoofing by using cryptographic solutions or receivers capable of differentiating legitimate and attack signals are insufficient in detecting GNSS spoofing of road users. Recent studies, testbeds, and 3GPP standards are exploring the possibility of hybrid positioning, where GNSS data will be combined with the 5G-NR positioning to increase the security and accuracy of positioning. We design the Location Estimation and Recovery (LER) systems to estimate the correct absolute position using the combination of GNSS and 5G positioning with other road users, where a subset of road users can be malicious and collude to prevent spoofing detection. Our Location Verification Protocol extends the understanding of Message Time of Arrival Codes (MTAC) to prevent attacks against malicious provers. The novel Recovery and Meta Protocol uses road users' dynamic and unpredictable nature to detect GNSS spoofing. This protocol provides fast detection of GNSS spoofing with a very low rate of false positives and can be customized to a large family of settings. Even in a (highly unrealistic) worst-case scenario where each user is malicious with a probability as large as 0.3, our protocol detects GNSS spoofing with high probability after communication and ranging with at most 20 road users, with a false positive rate close to 0. SUMO simulations for road traffic show that we can detect GNSS spoofing within 2.6 minutes of its start under moderate traffic conditions.
Submitted 23 October, 2023;
originally announced October 2023.