-
LiveBench: A Challenging, Contamination-Free LLM Benchmark
Authors:
Colin White,
Samuel Dooley,
Manley Roberts,
Arka Pal,
Ben Feuer,
Siddhartha Jain,
Ravid Shwartz-Ziv,
Neel Jain,
Khalid Saifullah,
Siddartha Naidu,
Chinmay Hegde,
Yann LeCun,
Tom Goldstein,
Willie Neiswanger,
Micah Goldblum
Abstract:
Test set contamination, wherein test data from a benchmark ends up in a newer model's training set, is a well-documented obstacle for fair LLM evaluation and can quickly render benchmarks obsolete. To mitigate this, many recent benchmarks crowdsource new prompts and evaluations from human or LLM judges; however, these can introduce significant biases, and break down when scoring hard questions. In…
▽ More
Test set contamination, wherein test data from a benchmark ends up in a newer model's training set, is a well-documented obstacle for fair LLM evaluation and can quickly render benchmarks obsolete. To mitigate this, many recent benchmarks crowdsource new prompts and evaluations from human or LLM judges; however, these can introduce significant biases, and break down when scoring hard questions. In this work, we introduce a new benchmark for LLMs designed to be immune to both test set contamination and the pitfalls of LLM judging and human crowdsourcing. We release LiveBench, the first benchmark that (1) contains frequently-updated questions from recent information sources, (2) scores answers automatically according to objective ground-truth values, and (3) contains a wide variety of challenging tasks, spanning math, coding, reasoning, language, instruction following, and data analysis. To achieve this, LiveBench contains questions that are based on recently-released math competitions, arXiv papers, news articles, and datasets, and it contains harder, contamination-free versions of tasks from previous benchmarks such as Big-Bench Hard, AMPS, and IFEval. We evaluate many prominent closed-source models, as well as dozens of open-source models ranging from 0.5B to 110B in size. LiveBench is difficult, with top models achieving below 65% accuracy. We release all questions, code, and model answers. Questions will be added and updated on a monthly basis, and we will release new tasks and harder versions of tasks over time so that LiveBench can distinguish between the capabilities of LLMs as they improve in the future. We welcome community engagement and collaboration for expanding the benchmark tasks and models.
△ Less
Submitted 27 June, 2024;
originally announced June 2024.
-
Advancing Question Answering on Handwritten Documents: A State-of-the-Art Recognition-Based Model for HW-SQuAD
Authors:
Aniket Pal,
Ajoy Mondal,
C. V. Jawahar
Abstract:
Question-answering handwritten documents is a challenging task with numerous real-world applications. This paper proposes a novel recognition-based approach that improves upon the previous state-of-the-art on the HW-SQuAD and BenthamQA datasets. Our model incorporates transformer-based document retrieval and ensemble methods at the model level, achieving an Exact Match score of 82.02% and 92.55% i…
▽ More
Question-answering handwritten documents is a challenging task with numerous real-world applications. This paper proposes a novel recognition-based approach that improves upon the previous state-of-the-art on the HW-SQuAD and BenthamQA datasets. Our model incorporates transformer-based document retrieval and ensemble methods at the model level, achieving an Exact Match score of 82.02% and 92.55% in HW-SQuAD and BenthamQA datasets, respectively, surpassing the previous best recognition-based approach by 10.89% and 26%. We also enhance the document retrieval component, boosting the top-5 retrieval accuracy from 90% to 95.30%. Our results demonstrate the significance of our proposed approach in advancing question answering on handwritten documents. The code and trained models will be publicly available to facilitate future research in this critical area of natural language.
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
Large Language Models Must Be Taught to Know What They Don't Know
Authors:
Sanyam Kapoor,
Nate Gruver,
Manley Roberts,
Katherine Collins,
Arka Pal,
Umang Bhatt,
Adrian Weller,
Samuel Dooley,
Micah Goldblum,
Andrew Gordon Wilson
Abstract:
When using large language models (LLMs) in high-stakes applications, we need to know when we can trust their predictions. Some works argue that prompting high-performance LLMs is sufficient to produce calibrated uncertainties, while others introduce sampling methods that can be prohibitively expensive. In this work, we first argue that prompting on its own is insufficient to achieve good calibrati…
▽ More
When using large language models (LLMs) in high-stakes applications, we need to know when we can trust their predictions. Some works argue that prompting high-performance LLMs is sufficient to produce calibrated uncertainties, while others introduce sampling methods that can be prohibitively expensive. In this work, we first argue that prompting on its own is insufficient to achieve good calibration and then show that fine-tuning on a small dataset of correct and incorrect answers can create an uncertainty estimate with good generalization and small computational overhead. We show that a thousand graded examples are sufficient to outperform baseline methods and that training through the features of a model is necessary for good performance and tractable for large open-source models when using LoRA. We also investigate the mechanisms that enable reliable LLM uncertainty estimation, finding that many models can be used as general-purpose uncertainty estimators, applicable not just to their own uncertainties but also the uncertainty of other models. Lastly, we show that uncertainty estimates inform human use of LLMs in human-AI collaborative settings through a user study.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Impact of Non-Standard Unicode Characters on Security and Comprehension in Large Language Models
Authors:
Johan S Daniel,
Anand Pal
Abstract:
The advancement of large language models has significantly improved natural language processing. However, challenges such as jailbreaks (prompt injections that cause an LLM to follow instructions contrary to its intended use), hallucinations (generating incorrect or misleading information), and comprehension errors remain prevalent. In this report, we present a comparative analysis of the performa…
▽ More
The advancement of large language models has significantly improved natural language processing. However, challenges such as jailbreaks (prompt injections that cause an LLM to follow instructions contrary to its intended use), hallucinations (generating incorrect or misleading information), and comprehension errors remain prevalent. In this report, we present a comparative analysis of the performance of fifteen distinct models, with each model undergoing a standardized test comprising 38 queries across three key metrics: jailbreaks, hallucinations, and comprehension errors. The models are assessed based on the total occurrences of jailbreaks, hallucinations, and comprehension errors. Our work exposes these models' inherent vulnerabilities and challenges the notion of human-level language comprehension of these models. We have empirically analysed the impact of non-standard Unicode characters on LLMs and their safeguarding mechanisms on the best-performing LLMs, including GPT-4, Gemini 1.5 Pro, LlaMA-3-70B, and Claude 3 Opus. By incorporating alphanumeric symbols from Unicode outside the standard Latin block and variants of characters in other languages, we observed a reduction in the efficacy of guardrails implemented through Reinforcement Learning Human Feedback (RLHF). Consequently, these models exhibit heightened vulnerability to content policy breaches and prompt leakage. Our study also suggests a need to incorporate non-standard Unicode text in LLM training data to enhance the capabilities of these models.
△ Less
Submitted 23 May, 2024;
originally announced May 2024.
-
Certified Robustness against Sparse Adversarial Perturbations via Data Localization
Authors:
Ambar Pal,
René Vidal,
Jeremias Sulam
Abstract:
Recent work in adversarial robustness suggests that natural data distributions are localized, i.e., they place high probability in small volume regions of the input space, and that this property can be utilized for designing classifiers with improved robustness guarantees for $\ell_2$-bounded perturbations. Yet, it is still unclear if this observation holds true for more general metrics. In this w…
▽ More
Recent work in adversarial robustness suggests that natural data distributions are localized, i.e., they place high probability in small volume regions of the input space, and that this property can be utilized for designing classifiers with improved robustness guarantees for $\ell_2$-bounded perturbations. Yet, it is still unclear if this observation holds true for more general metrics. In this work, we extend this theory to $\ell_0$-bounded adversarial perturbations, where the attacker can modify a few pixels of the image but is unrestricted in the magnitude of perturbation, and we show necessary and sufficient conditions for the existence of $\ell_0$-robust classifiers. Theoretical certification approaches in this regime essentially employ voting over a large ensemble of classifiers. Such procedures are combinatorial and expensive or require complicated certification techniques. In contrast, a simple classifier emerges from our theory, dubbed Box-NN, which naturally incorporates the geometry of the problem and improves upon the current state-of-the-art in certified robustness against sparse attacks for the MNIST and Fashion-MNIST datasets.
△ Less
Submitted 23 May, 2024;
originally announced May 2024.
-
Text and Audio Simplification: Human vs. ChatGPT
Authors:
Gondy Leroy,
David Kauchak,
Philip Harber,
Ankit Pal,
Akash Shukla
Abstract:
Text and audio simplification to increase information comprehension are important in healthcare. With the introduction of ChatGPT, an evaluation of its simplification performance is needed. We provide a systematic comparison of human and ChatGPT simplified texts using fourteen metrics indicative of text difficulty. We briefly introduce our online editor where these simplification tools, including…
▽ More
Text and audio simplification to increase information comprehension are important in healthcare. With the introduction of ChatGPT, an evaluation of its simplification performance is needed. We provide a systematic comparison of human and ChatGPT simplified texts using fourteen metrics indicative of text difficulty. We briefly introduce our online editor where these simplification tools, including ChatGPT, are available. We scored twelve corpora using our metrics: six text, one audio, and five ChatGPT simplified corpora. We then compare these corpora with texts simplified and verified in a prior user study. Finally, a medical domain expert evaluated these texts and five, new ChatGPT simplified versions. We found that simple corpora show higher similarity with the human simplified texts. ChatGPT simplification moves metrics in the right direction. The medical domain expert evaluation showed a preference for the ChatGPT style, but the text itself was rated lower for content retention.
△ Less
Submitted 29 April, 2024;
originally announced May 2024.
-
Thermal Performance of a Liquid-cooling Assisted Thin Wickless Vapor Chamber
Authors:
Arani Mukhopadhyay,
Anish Pal,
Mohamad Jafari Gukeh,
Constantine M. Megaridis
Abstract:
The ever-increasing need for power consumption in electronic devices, coupled with the requirement for thinner size, calls for the development of efficient heat spreading components. Vapor chambers (VCs), because of their ability to effectively spread heat over a large area by two-phase heat transfer, seem ideal for such applications. However, creating thin and efficient vapor chambers that work o…
▽ More
The ever-increasing need for power consumption in electronic devices, coupled with the requirement for thinner size, calls for the development of efficient heat spreading components. Vapor chambers (VCs), because of their ability to effectively spread heat over a large area by two-phase heat transfer, seem ideal for such applications. However, creating thin and efficient vapor chambers that work over a wide range of power inputs is a persisting challenge. VCs that use wicks for circulating the phase changing media, suffer from capillary restrictions, dry-out, clogging, increase in size and weight, and can often be costly. Recent developments in wick-free wettability patterned vapor chambers replace traditional wicks with laser-fabricated wickless components. An experimental setup allows for fast testing and experimental evaluation of water-charged VCs with liquid-assisted cooling. The sealed chamber can maintain vacuum for long durations, and can be used for testing of very thin wick-free VCs. This work extends our previous study by decreasing overall thickness of the wick-free VC down to 3 mm and evaluates its performance. Furthermore, the impact of wettability patterns on VC performance is investigated, by carrying out experiments both in non-patterned and patterned VCs. Experiments are first carried out on a wick-free VC with no wettability patterns and comprising of an entirely superhydrophilic evaporator coupled with a hydrophobic condenser. Thereafter, wettability patterns that aid the rapid return of water to the heated site on the evaporator and improve condensation on the condenser of the vapor chamber are implemented. The thermal characteristics show that the patterned VCs outperform the non-patterned VCs under all scenarios. The patterned VCs exhibit low thermal resistance independent of fluid charging ratio withstanding higher power inputs without thermal dry-outs.
△ Less
Submitted 29 April, 2024;
originally announced April 2024.
-
Evaluation of Thermal Performance of a Wick-free Vapor Chamber in Power Electronics Cooling
Authors:
Arani Mukhopadhyay,
Anish Pal,
Congbo Bao,
Mohamad Jafari Gukeh,
Sudip K. Mazumder,
Constantine M. Megaridis
Abstract:
Efficient thermal management in high-power electronics cooling can be achieved using phase-change heat transfer devices, such as vapor chambers. Traditional vapor chambers use wicks to transport condensate for efficient thermal exchange and to prevent "dry-out" of the evaporator. However, wicks in vapor chambers present significant design challenges arising out of large pressure drops across the w…
▽ More
Efficient thermal management in high-power electronics cooling can be achieved using phase-change heat transfer devices, such as vapor chambers. Traditional vapor chambers use wicks to transport condensate for efficient thermal exchange and to prevent "dry-out" of the evaporator. However, wicks in vapor chambers present significant design challenges arising out of large pressure drops across the wicking material, which slows down condensate transport rates and increases the chances for dry-out. Thicker wicks add to overall thermal resistance, while deterring the development of thinner devices by limiting the total thickness of the vapor chamber. Wickless vapor chambers eliminate the use of metal wicks entirely, by incorporating complementary wettability-patterned flat plates on both the evaporator and the condenser side. Such surface modifications enhance fluid transport on the evaporator side, while allowing the chambers to be virtually as thin as imaginable, thereby permitting design of thermally efficient thin electronic cooling devices. While wick-free vapor chambers have been studied and efficient design strategies have been suggested, we delve into real-life applications of wick-free vapor chambers in forced air cooling of high-power electronics. An experimental setup is developed wherein two Si-based MOSFETs of TO-247-3 packaging having high conduction resistance, are connected in parallel and switched at 100 kHz, to emulate high frequency power electronics operations. A rectangular copper wick-free vapor chamber spreads heat laterally over a surface 13 times larger than the heating area. This chamber is cooled externally by a fan that circulates air at room temperature. The present experimental setup extends our previous work on wick-free vapor chambers, while demonstrating the effectiveness of low-cost air cooling in vapor-chamber enhanced high-power electronics applications.
△ Less
Submitted 29 April, 2024;
originally announced April 2024.
-
Queues with resetting: a perspective
Authors:
Reshmi Roy,
Arup Biswas,
Arnab Pal
Abstract:
Performance modeling is a key issue in queuing theory and operation research. It is well-known that the length of a queue that awaits service or the time spent by a job in a queue depends not only on the service rate, but also crucially on the fluctuations in service time. The larger the fluctuations, the longer the delay becomes and hence, this is a major hindrance for the queue to operate effici…
▽ More
Performance modeling is a key issue in queuing theory and operation research. It is well-known that the length of a queue that awaits service or the time spent by a job in a queue depends not only on the service rate, but also crucially on the fluctuations in service time. The larger the fluctuations, the longer the delay becomes and hence, this is a major hindrance for the queue to operate efficiently. Various strategies have been adapted to prevent this drawback. In this perspective, we investigate the effects of one such novel strategy namely resetting or restart, an emerging concept in statistical physics and stochastic complex process, that was recently introduced to mitigate fluctuations-induced delays in queues. In particular, we show that a service resetting mechanism accompanied with an overhead time can remarkably shorten the average queue lengths and waiting times. We examine various resetting strategies and further shed light on the intricate role of the overhead times to the queuing performance. Our analysis opens up future avenues in operation research where resetting-based strategies can be universally promising.
△ Less
Submitted 13 April, 2024;
originally announced April 2024.
-
[RE] Modeling Personalized Item Frequency Information for Next-basket Recommendation
Authors:
Sławomir Garcarz,
Avik Pal,
Pim Praat
Abstract:
This paper focuses on reproducing and extending the results of the paper: "Modeling Personalized Item Frequency Information for Next-basket Recommendation" which introduced the TIFU-KNN model and proposed to utilize Personalized Item Frequency (PIF) for Next Basket Recommendation (NBR). We utilized publicly available grocery shop** datasets used in the original paper and incorporated additional…
▽ More
This paper focuses on reproducing and extending the results of the paper: "Modeling Personalized Item Frequency Information for Next-basket Recommendation" which introduced the TIFU-KNN model and proposed to utilize Personalized Item Frequency (PIF) for Next Basket Recommendation (NBR). We utilized publicly available grocery shop** datasets used in the original paper and incorporated additional datasets to assess the generalizability of the findings. We evaluated the performance of the models using metrics such as Recall@K, NDCG@K, personalized-hit ratio (PHR), and Mean Reciprocal Rank (MRR). Furthermore, we conducted a thorough examination of fairness by considering user characteristics such as average basket size, item popularity, and novelty. Lastly, we introduced novel $β$-VAE architecture to model NBR. The experimental results confirmed that the reproduced model, TIFU-KNN, outperforms the baseline model, Personal Top Frequency, on various datasets and metrics. The findings also highlight the challenges posed by smaller basket sizes in some datasets and suggest avenues for future research to improve NBR performance.
△ Less
Submitted 27 February, 2024;
originally announced February 2024.
-
Hitting "Probe"rty with Non-Linearity, and More
Authors:
Avik Pal,
Madhura Pawar
Abstract:
Structural probes learn a linear transformation to find how dependency trees are embedded in the hidden states of language models. This simple design may not allow for full exploitation of the structure of the encoded information. Hence, to investigate the structure of the encoded information to its full extent, we incorporate non-linear structural probes. We reformulate the design of non-linear s…
▽ More
Structural probes learn a linear transformation to find how dependency trees are embedded in the hidden states of language models. This simple design may not allow for full exploitation of the structure of the encoded information. Hence, to investigate the structure of the encoded information to its full extent, we incorporate non-linear structural probes. We reformulate the design of non-linear structural probes introduced by White et al. making its design simpler yet effective. We also design a visualization framework that lets us qualitatively assess how strongly two words in a sentence are connected in the predicted dependency tree. We use this technique to understand which non-linear probe variant is good at encoding syntactical information. Additionally, we also use it to qualitatively investigate the structure of dependency trees that BERT encodes in each of its layers. We find that the radial basis function (RBF) is an effective non-linear probe for the BERT model than the linear probe.
△ Less
Submitted 25 February, 2024;
originally announced February 2024.
-
Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive
Authors:
Arka Pal,
Deep Karkhanis,
Samuel Dooley,
Manley Roberts,
Siddartha Naidu,
Colin White
Abstract:
Direct Preference Optimisation (DPO) is effective at significantly improving the performance of large language models (LLMs) on downstream tasks such as reasoning, summarisation, and alignment. Using pairs of preferred and dispreferred data, DPO models the \textit{relative} probability of picking one response over another. In this work, first we show theoretically that the standard DPO loss can le…
▽ More
Direct Preference Optimisation (DPO) is effective at significantly improving the performance of large language models (LLMs) on downstream tasks such as reasoning, summarisation, and alignment. Using pairs of preferred and dispreferred data, DPO models the \textit{relative} probability of picking one response over another. In this work, first we show theoretically that the standard DPO loss can lead to a \textit{reduction} of the model's likelihood of the preferred examples, as long as the relative probability between the preferred and dispreferred classes increases. We then show empirically that this phenomenon occurs when fine-tuning LLMs on common datasets, especially datasets in which the edit distance between pairs of completions is low. Using these insights, we design DPO-Positive (DPOP), a new loss function and training procedure which avoids this failure mode. Surprisingly, we also find that DPOP significantly outperforms DPO across a wide variety of datasets and downstream tasks, including datasets with high edit distances between completions. By fine-tuning with DPOP, we create and release Smaug-34B and Smaug-72B, which achieve state-of-the-art open-source performance. Notably, Smaug-72B is nearly 2\% better than any other open-source model on the HuggingFace Open LLM Leaderboard and becomes the first open-source LLM to surpass an average accuracy of 80\%.
△ Less
Submitted 20 February, 2024;
originally announced February 2024.
-
Gemini Goes to Med School: Exploring the Capabilities of Multimodal Large Language Models on Medical Challenge Problems & Hallucinations
Authors:
Ankit Pal,
Malaikannan Sankarasubbu
Abstract:
Large language models have the potential to be valuable in the healthcare industry, but it's crucial to verify their safety and effectiveness through rigorous evaluation. For this purpose, we comprehensively evaluated both open-source LLMs and Google's new multimodal LLM called Gemini across Medical reasoning, hallucination detection, and Medical Visual Question Answering tasks. While Gemini showe…
▽ More
Large language models have the potential to be valuable in the healthcare industry, but it's crucial to verify their safety and effectiveness through rigorous evaluation. For this purpose, we comprehensively evaluated both open-source LLMs and Google's new multimodal LLM called Gemini across Medical reasoning, hallucination detection, and Medical Visual Question Answering tasks. While Gemini showed competence, it lagged behind state-of-the-art models like MedPaLM 2 and GPT-4 in diagnostic accuracy. Additionally, Gemini achieved an accuracy of 61.45\% on the medical VQA dataset, significantly lower than GPT-4V's score of 88\%. Our analysis revealed that Gemini is highly susceptible to hallucinations, overconfidence, and knowledge gaps, which indicate risks if deployed uncritically. We also performed a detailed analysis by medical subject and test type, providing actionable feedback for developers and clinicians. To mitigate risks, we applied prompting strategies that improved performance. Additionally, we facilitated future research and development by releasing a Python module for medical LLM evaluation and establishing a dedicated leaderboard on Hugging Face for medical domain LLMs. Python module can be found at https://github.com/promptslab/RosettaEval
△ Less
Submitted 10 February, 2024;
originally announced February 2024.
-
SkyScenes: A Synthetic Dataset for Aerial Scene Understanding
Authors:
Sahil Khose,
Anisha Pal,
Aayushi Agarwal,
Deepanshi,
Judy Hoffman,
Prithvijit Chattopadhyay
Abstract:
Real-world aerial scene understanding is limited by a lack of datasets that contain densely annotated images curated under a diverse set of conditions. Due to inherent challenges in obtaining such images in controlled real-world settings, we present SkyScenes, a synthetic dataset of densely annotated aerial images captured from Unmanned Aerial Vehicle (UAV) perspectives. We carefully curate SkySce…
▽ More
Real-world aerial scene understanding is limited by a lack of datasets that contain densely annotated images curated under a diverse set of conditions. Due to inherent challenges in obtaining such images in controlled real-world settings, we present SkyScenes, a synthetic dataset of densely annotated aerial images captured from Unmanned Aerial Vehicle (UAV) perspectives. We carefully curate SkyScenes images from CARLA to comprehensively capture diversity across layout (urban and rural maps), weather conditions, times of day, pitch angles and altitudes with corresponding semantic, instance and depth annotations. Through our experiments using SkyScenes, we show that (1) Models trained on SkyScenes generalize well to different real-world scenarios, (2) augmenting training on real images with SkyScenes data can improve real-world performance, (3) controlled variations in SkyScenes can offer insights into how models respond to changes in viewpoint conditions, and (4) incorporating additional sensor modalities (depth) can improve aerial scene understanding.
△ Less
Submitted 10 December, 2023;
originally announced December 2023.
-
Household navigation and manipulation for everyday object rearrangement tasks
Authors:
Shrutheesh R. Iyer,
Anwesan Pal,
Jiaming Hu,
Akanimoh Adeleye,
Aditya Aggarwal,
Henrik I. Christensen
Abstract:
We consider the problem of building an assistive robotic system that can help humans in daily household cleanup tasks. Creating such an autonomous system in real-world environments is inherently quite challenging, as a general solution may not suit the preferences of a particular customer. Moreover, such a system consists of multi-objective tasks comprising -- (i) Detection of misplaced objects an…
▽ More
We consider the problem of building an assistive robotic system that can help humans in daily household cleanup tasks. Creating such an autonomous system in real-world environments is inherently quite challenging, as a general solution may not suit the preferences of a particular customer. Moreover, such a system consists of multi-objective tasks comprising -- (i) Detection of misplaced objects and prediction of their potentially correct placements, (ii) Fine-grained manipulation for stable object gras**, and (iii) Room-to-room navigation for transferring objects in unseen environments. This work systematically tackles each component and integrates them into a complete object rearrangement pipeline. To validate our proposed system, we conduct multiple experiments on a real robotic platform involving multi-room object transfer, user preference-based placement, and complex pick-and-place tasks. Project page: https://sites.google.com/eng.ucsd.edu/home-robot
△ Less
Submitted 11 December, 2023;
originally announced December 2023.
-
Creating Temporally Correlated High-Resolution Power Injection Profiles Using Physics-Aware GAN
Authors:
Hritik Gopal Shah,
Behrouz Azimian,
Anamitra Pal
Abstract:
Traditional smart meter measurements lack the granularity needed for real-time decision-making. To address this practical problem, we create a generative adversarial networks (GAN) model that enforces temporal consistency on its high-resolution outputs via hard inequality constraints using a convex optimization layer. A unique feature of our GAN model is that it is trained solely on slow timescale…
▽ More
Traditional smart meter measurements lack the granularity needed for real-time decision-making. To address this practical problem, we create a generative adversarial networks (GAN) model that enforces temporal consistency on its high-resolution outputs via hard inequality constraints using a convex optimization layer. A unique feature of our GAN model is that it is trained solely on slow timescale aggregated power information obtained from historical smart meter data. The results demonstrate that the model can successfully create minutely interval temporally-correlated instantaneous power injection profiles from 15-minute average power consumption information. This innovative approach, emphasizing inter-neuron constraints, offers a promising avenue for improved high-speed state estimation in distribution systems and enhances the applicability of data-driven solutions for monitoring such systems.
△ Less
Submitted 21 November, 2023; v1 submitted 20 November, 2023;
originally announced November 2023.
-
A Video-Based Activity Classification of Human Pickers in Agriculture
Authors:
Abhishesh Pal,
Antonio C. Leite,
Jon G. O. Gjevestad,
Pål J. From
Abstract:
In farming systems, harvesting operations are tedious, time- and resource-consuming tasks. Based on this, deploying a fleet of autonomous robots to work alongside farmworkers may provide vast productivity and logistics benefits. Then, an intelligent robotic system should monitor human behavior, identify the ongoing activities and anticipate the worker's needs. In this work, the main contribution c…
▽ More
In farming systems, harvesting operations are tedious, time- and resource-consuming tasks. Based on this, deploying a fleet of autonomous robots to work alongside farmworkers may provide vast productivity and logistics benefits. Then, an intelligent robotic system should monitor human behavior, identify the ongoing activities and anticipate the worker's needs. In this work, the main contribution consists of creating a benchmark model for video-based human pickers detection, classifying their activities to serve in harvesting operations for different agricultural scenarios. Our solution uses the combination of a Mask Region-based Convolutional Neural Network (Mask R-CNN) for object detection and optical flow for motion estimation with newly added statistical attributes of flow motion descriptors, named as Correlation Sensitivity (CS). A classification criterion is defined based on the Kernel Density Estimation (KDE) analysis and K-means clustering algorithm, which are implemented upon in-house collected dataset from different crop fields like strawberry polytunnels and apple tree orchards. The proposed framework is quantitatively analyzed using sensitivity, specificity, and accuracy measures and shows satisfactory results amidst various dataset challenges such as lighting variation, blur, and occlusions.
△ Less
Submitted 17 November, 2023;
originally announced November 2023.
-
Analytical Verification of Performance of Deep Neural Network Based Time-Synchronized Distribution System State Estimation
Authors:
Behrouz Azimian,
Shiva Moshtagh,
Anamitra Pal,
Shanshan Ma
Abstract:
Recently, we demonstrated success of a time-synchronized state estimator using deep neural networks (DNNs) for real-time unobservable distribution systems. In this letter, we provide analytical bounds on the performance of that state estimator as a function of perturbations in the input measurements. It has already been shown that evaluating performance based on only the test dataset might not eff…
▽ More
Recently, we demonstrated success of a time-synchronized state estimator using deep neural networks (DNNs) for real-time unobservable distribution systems. In this letter, we provide analytical bounds on the performance of that state estimator as a function of perturbations in the input measurements. It has already been shown that evaluating performance based on only the test dataset might not effectively indicate a trained DNN's ability to handle input perturbations. As such, we analytically verify robustness and trustworthiness of DNNs to input perturbations by treating them as mixed-integer linear programming (MILP) problems. The ability of batch normalization in addressing the scalability limitations of the MILP formulation is also highlighted. The framework is validated by performing time-synchronized distribution system state estimation for a modified IEEE 34-node system and a real-world large distribution system, both of which are incompletely observed by micro-phasor measurement units.
△ Less
Submitted 22 February, 2024; v1 submitted 12 November, 2023;
originally announced November 2023.
-
Concept-based Anomaly Detection in Retail Stores for Automatic Correction using Mobile Robots
Authors:
Aditya Kapoor,
Vartika Sengar,
Nijil George,
Vighnesh Vatsal,
Jayavardhana Gubbi,
Balamuralidhar P,
Arpan Pal
Abstract:
Tracking of inventory and rearrangement of misplaced items are some of the most labor-intensive tasks in a retail environment. While there have been attempts at using vision-based techniques for these tasks, they mostly use planogram compliance for detection of any anomalies, a technique that has been found lacking in robustness and scalability. Moreover, existing systems rely on human interventio…
▽ More
Tracking of inventory and rearrangement of misplaced items are some of the most labor-intensive tasks in a retail environment. While there have been attempts at using vision-based techniques for these tasks, they mostly use planogram compliance for detection of any anomalies, a technique that has been found lacking in robustness and scalability. Moreover, existing systems rely on human intervention to perform corrective actions after detection. In this paper, we present Co-AD, a Concept-based Anomaly Detection approach using a Vision Transformer (ViT) that is able to flag misplaced objects without using a prior knowledge base such as a planogram. It uses an auto-encoder architecture followed by outlier detection in the latent space. Co-AD has a peak success rate of 89.90% on anomaly detection image sets of retail objects drawn from the RP2K dataset, compared to 80.81% on the best-performing baseline of a standard ViT auto-encoder. To demonstrate its utility, we describe a robotic mobile manipulation pipeline to autonomously correct the anomalies flagged by Co-AD. This work is ultimately aimed towards develo** autonomous mobile robot solutions that reduce the need for human intervention in retail store management.
△ Less
Submitted 21 October, 2023;
originally announced October 2023.
-
Pathologist-Like Explanations Unveiled: an Explainable Deep Learning System for White Blood Cell Classification
Authors:
Aditya Shankar Pal,
Debojyoti Biswas,
Joy Mahapatra,
Debasis Banerjee,
Prantar Chakrabarti,
Utpal Garain
Abstract:
White blood cells (WBCs) play a crucial role in safeguarding the human body against pathogens and foreign substances. Leveraging the abundance of WBC imaging data and the power of deep learning algorithms, automated WBC analysis has the potential for remarkable accuracy. However, the capability of deep learning models to explain their WBC classification remains largely unexplored. In this study, w…
▽ More
White blood cells (WBCs) play a crucial role in safeguarding the human body against pathogens and foreign substances. Leveraging the abundance of WBC imaging data and the power of deep learning algorithms, automated WBC analysis has the potential for remarkable accuracy. However, the capability of deep learning models to explain their WBC classification remains largely unexplored. In this study, we introduce HemaX, an explainable deep neural network-based model that produces pathologist-like explanations using five attributes: granularity, cytoplasm color, nucleus shape, size relative to red blood cells, and nucleus to cytoplasm ratio (N:C), along with cell classification, localization, and segmentation. HemaX is trained and evaluated on a novel dataset, LeukoX, comprising 467 blood smear images encompassing ten (10) WBC types. The proposed model achieves impressive results, with an average classification accuracy of 81.08% and a Jaccard index of 89.16% for cell localization. Additionally, HemaX performs well in generating the five explanations with a normalized mean square error of 0.0317 for N:C ratio and over 80% accuracy for the other four attributes. Comprehensive experiments comparing against multiple state-of-the-art models demonstrate that HemaX's classification accuracy remains unaffected by its ability to provide explanations. Moreover, empirical analyses and validation by expert hematologists confirm the faithfulness of explanations predicted by our proposed model.
△ Less
Submitted 20 October, 2023;
originally announced October 2023.
-
CLIFT: Analysing Natural Distribution Shift on Question Answering Models in Clinical Domain
Authors:
Ankit Pal
Abstract:
This paper introduces a new testbed CLIFT (Clinical Shift) for the clinical domain Question-answering task. The testbed includes 7.5k high-quality question answering samples to provide a diverse and reliable benchmark. We performed a comprehensive experimental study and evaluated several QA deep-learning models under the proposed testbed. Despite impressive results on the original test set, the pe…
▽ More
This paper introduces a new testbed CLIFT (Clinical Shift) for the clinical domain Question-answering task. The testbed includes 7.5k high-quality question answering samples to provide a diverse and reliable benchmark. We performed a comprehensive experimental study and evaluated several QA deep-learning models under the proposed testbed. Despite impressive results on the original test set, the performance degrades when applied to new test sets, which shows the distribution shift. Our findings emphasize the need for and the potential for increasing the robustness of clinical domain models under distributional shifts. The testbed offers one way to track progress in that direction. It also highlights the necessity of adopting evaluation metrics that consider robustness to natural distribution shifts. We plan to expand the corpus by adding more samples and model results. The full paper and the updated benchmark are available at github.com/openlifescience-ai/clift
△ Less
Submitted 19 October, 2023;
originally announced October 2023.
-
Adversarial Examples Might be Avoidable: The Role of Data Concentration in Adversarial Robustness
Authors:
Ambar Pal,
Jeremias Sulam,
René Vidal
Abstract:
The susceptibility of modern machine learning classifiers to adversarial examples has motivated theoretical results suggesting that these might be unavoidable. However, these results can be too general to be applicable to natural data distributions. Indeed, humans are quite robust for tasks involving vision. This apparent conflict motivates a deeper dive into the question: Are adversarial examples…
▽ More
The susceptibility of modern machine learning classifiers to adversarial examples has motivated theoretical results suggesting that these might be unavoidable. However, these results can be too general to be applicable to natural data distributions. Indeed, humans are quite robust for tasks involving vision. This apparent conflict motivates a deeper dive into the question: Are adversarial examples truly unavoidable? In this work, we theoretically demonstrate that a key property of the data distribution -- concentration on small-volume subsets of the input space -- determines whether a robust classifier exists. We further demonstrate that, for a data distribution concentrated on a union of low-dimensional linear subspaces, utilizing structure in data naturally leads to classifiers that enjoy data-dependent polyhedral robustness guarantees, improving upon methods for provable certification in certain regimes.
△ Less
Submitted 25 May, 2024; v1 submitted 27 September, 2023;
originally announced September 2023.
-
A Boosted Machine Learning Framework for the Improvement of Phase and Crystal Structure Prediction of High Entropy Alloys Using Thermodynamic and Configurational Parameters
Authors:
Debsundar Dey,
Suchandan Das,
Anik Pal,
Santanu Dey,
Chandan Kumar Raul,
Arghya Chatterjee
Abstract:
The reason behind the remarkable properties of High-Entropy Alloys (HEAs) is rooted in the diverse phases and the crystal structures they contain. In the realm of material informatics, employing machine learning (ML) techniques to classify phases and crystal structures of HEAs has gained considerable significance. In this study, we assembled a new collection of 1345 HEAs with varying compositions…
▽ More
The reason behind the remarkable properties of High-Entropy Alloys (HEAs) is rooted in the diverse phases and the crystal structures they contain. In the realm of material informatics, employing machine learning (ML) techniques to classify phases and crystal structures of HEAs has gained considerable significance. In this study, we assembled a new collection of 1345 HEAs with varying compositions to predict phases. Within this collection, there were 705 sets of data that were utilized to predict the crystal structures with the help of thermodynamics and electronic configuration. Our study introduces a methodical framework i.e., the Pearson correlation coefficient that helps in selecting the strongly co-related features to increase the prediction accuracy. This study employed five distinct boosting algorithms to predict phases and crystal structures, offering an enhanced guideline for improving the accuracy of these predictions. Among all these algorithms, XGBoost gives the highest accuracy of prediction (94.05%) for phases and LightGBM gives the highest accuracy of prediction of crystal structure of the phases (90.07%). The quantification of the influence exerted by parameters on the model's accuracy was conducted and a new approach was made to elucidate the contribution of individual parameters in the process of phase prediction and crystal structure prediction.
△ Less
Submitted 31 December, 2023; v1 submitted 2 September, 2023;
originally announced September 2023.
-
Giraffe: Adventures in Expanding Context Lengths in LLMs
Authors:
Arka Pal,
Deep Karkhanis,
Manley Roberts,
Samuel Dooley,
Arvind Sundararajan,
Siddartha Naidu
Abstract:
Modern large language models (LLMs) that rely on attention mechanisms are typically trained with fixed context lengths which enforce upper limits on the length of input sequences that they can handle at evaluation time. To use these models on sequences longer than the train-time context length, one might employ techniques from the growing family of context length extrapolation methods -- most of w…
▽ More
Modern large language models (LLMs) that rely on attention mechanisms are typically trained with fixed context lengths which enforce upper limits on the length of input sequences that they can handle at evaluation time. To use these models on sequences longer than the train-time context length, one might employ techniques from the growing family of context length extrapolation methods -- most of which focus on modifying the system of positional encodings used in the attention mechanism to indicate where tokens or activations are located in the input sequence. We conduct a wide survey of existing methods of context length extrapolation on a base LLaMA or LLaMA 2 model, and introduce some of our own design as well -- in particular, a new truncation strategy for modifying the basis for the position encoding.
We test these methods using three new evaluation tasks (FreeFormQA, AlteredNumericQA, and LongChat-Lines) as well as perplexity, which we find to be less fine-grained as a measure of long context performance of LLMs. We release the three tasks publicly as datasets on HuggingFace. We discover that linear scaling is the best method for extending context length, and show that further gains can be achieved by using longer scales at evaluation time. We also discover promising extrapolation capabilities in the truncated basis. To support further research in this area, we release three new 13B parameter long-context models which we call Giraffe: 4k and 16k context models trained from base LLaMA-13B, and a 32k context model trained from base LLaMA2-13B. We also release the code to replicate our results.
△ Less
Submitted 21 August, 2023;
originally announced August 2023.
-
FashionNTM: Multi-turn Fashion Image Retrieval via Cascaded Memory
Authors:
Anwesan Pal,
Sahil Wadhwa,
Ayush Jaiswal,
Xu Zhang,
Yue Wu,
Rakesh Chada,
Pradeep Natarajan,
Henrik I. Christensen
Abstract:
Multi-turn textual feedback-based fashion image retrieval focuses on a real-world setting, where users can iteratively provide information to refine retrieval results until they find an item that fits all their requirements. In this work, we present a novel memory-based method, called FashionNTM, for such a multi-turn system. Our framework incorporates a new Cascaded Memory Neural Turing Machine (…
▽ More
Multi-turn textual feedback-based fashion image retrieval focuses on a real-world setting, where users can iteratively provide information to refine retrieval results until they find an item that fits all their requirements. In this work, we present a novel memory-based method, called FashionNTM, for such a multi-turn system. Our framework incorporates a new Cascaded Memory Neural Turing Machine (CM-NTM) approach for implicit state management, thereby learning to integrate information across all past turns to retrieve new images, for a given turn. Unlike vanilla Neural Turing Machine (NTM), our CM-NTM operates on multiple inputs, which interact with their respective memories via individual read and write heads, to learn complex relationships. Extensive evaluation results show that our proposed method outperforms the previous state-of-the-art algorithm by 50.5%, on Multi-turn FashionIQ -- the only existing multi-turn fashion dataset currently, in addition to having a relative improvement of 12.6% on Multi-turn Shoes -- an extension of the single-turn Shoes dataset that we created in this work. Further analysis of the model in a real-world interactive setting demonstrates two important capabilities of our model -- memory retention across turns, and agnosticity to turn order for non-contradictory feedback. Finally, user study results show that images retrieved by FashionNTM were favored by 83.1% over other multi-turn models. Project page: https://sites.google.com/eng.ucsd.edu/fashionntm
△ Less
Submitted 20 August, 2023;
originally announced August 2023.
-
DOST -- Domain Obedient Self-supervised Training for Multi Label Classification with Noisy Labels
Authors:
Soumadeep Saha,
Utpal Garain,
Arijit Ukil,
Arpan Pal,
Sundeep Khandelwal
Abstract:
The enormous demand for annotated data brought forth by deep learning techniques has been accompanied by the problem of annotation noise. Although this issue has been widely discussed in machine learning literature, it has been relatively unexplored in the context of "multi-label classification" (MLC) tasks which feature more complicated kinds of noise. Additionally, when the domain in question ha…
▽ More
The enormous demand for annotated data brought forth by deep learning techniques has been accompanied by the problem of annotation noise. Although this issue has been widely discussed in machine learning literature, it has been relatively unexplored in the context of "multi-label classification" (MLC) tasks which feature more complicated kinds of noise. Additionally, when the domain in question has certain logical constraints, noisy annotations often exacerbate their violations, making such a system unacceptable to an expert. This paper studies the effect of label noise on domain rule violation incidents in the MLC task, and incorporates domain rules into our learning algorithm to mitigate the effect of noise. We propose the Domain Obedient Self-supervised Training (DOST) paradigm which not only makes deep learning models more aligned to domain rules, but also improves learning performance in key metrics and minimizes the effect of annotation noise. This novel approach uses domain guidance to detect offending annotations and deter rule-violating predictions in a self-supervised manner, thus making it more "data efficient" and domain compliant. Empirical studies, performed over two large scale multi-label classification datasets, demonstrate that our method results in improvement across the board, and often entirely counteracts the effect of noise.
△ Less
Submitted 9 August, 2023;
originally announced August 2023.
-
Med-HALT: Medical Domain Hallucination Test for Large Language Models
Authors:
Ankit Pal,
Logesh Kumar Umapathi,
Malaikannan Sankarasubbu
Abstract:
This research paper focuses on the challenges posed by hallucinations in large language models (LLMs), particularly in the context of the medical domain. Hallucination, wherein these models generate plausible yet unverified or incorrect information, can have serious consequences in healthcare applications. We propose a new benchmark and dataset, Med-HALT (Medical Domain Hallucination Test), design…
▽ More
This research paper focuses on the challenges posed by hallucinations in large language models (LLMs), particularly in the context of the medical domain. Hallucination, wherein these models generate plausible yet unverified or incorrect information, can have serious consequences in healthcare applications. We propose a new benchmark and dataset, Med-HALT (Medical Domain Hallucination Test), designed specifically to evaluate and reduce hallucinations. Med-HALT provides a diverse multinational dataset derived from medical examinations across various countries and includes multiple innovative testing modalities. Med-HALT includes two categories of tests reasoning and memory-based hallucination tests, designed to assess LLMs's problem-solving and information retrieval abilities.
Our study evaluated leading LLMs, including Text Davinci, GPT-3.5, LlaMa-2, MPT, and Falcon, revealing significant differences in their performance. The paper provides detailed insights into the dataset, promoting transparency and reproducibility. Through this work, we aim to contribute to the development of safer and more reliable language models in healthcare. Our benchmark can be found at medhalt.github.io
△ Less
Submitted 14 October, 2023; v1 submitted 28 July, 2023;
originally announced July 2023.
-
Modeling and parametric optimization of 3D tendon-sheath actuator system for upper limb soft exosuit
Authors:
Amit Yadav,
Nitesh Kumar,
Shaurya Surana,
Aravind Ramasamy,
Abhishek Rudra Pal,
Sushma Santapuri,
Lalan Kumar,
Suriya Prakash Muthukrishnan,
Shubhendu Bhasin,
Sitikantha Roy
Abstract:
This paper presents an analysis of parametric characterization of a motor driven tendon-sheath actuator system for use in upper limb augmentation for applications such as rehabilitation, therapy, and industrial automation. The double tendon sheath system, which uses two sets of cables (agonist and antagonist side) guided through a sheath, is considered to produce smooth and natural-looking movemen…
▽ More
This paper presents an analysis of parametric characterization of a motor driven tendon-sheath actuator system for use in upper limb augmentation for applications such as rehabilitation, therapy, and industrial automation. The double tendon sheath system, which uses two sets of cables (agonist and antagonist side) guided through a sheath, is considered to produce smooth and natural-looking movements of the arm. The exoskeleton is equipped with a single motor capable of controlling both the flexion and extension motions. One of the key challenges in the implementation of a double tendon sheath system is the possibility of slack in the tendon, which can impact the overall performance of the system. To address this issue, a robust mathematical model is developed and a comprehensive parametric study is carried out to determine the most effective strategies for overcoming the problem of slack and improving the transmission. The study suggests that incorporating a series spring into the system's tendon leads to a universally applicable design, eliminating the need for individual customization. The results also show that the slack in the tendon can be effectively controlled by changing the pretension, spring constant, and size and geometry of spool mounted on the axle of motor.
△ Less
Submitted 10 September, 2023; v1 submitted 30 June, 2023;
originally announced June 2023.
-
DenseBAM-GI: Attention Augmented DeneseNet with momentum aided GRU for HMER
Authors:
Aniket Pal,
Krishna Pratap Singh
Abstract:
The task of recognising Handwritten Mathematical Expressions (HMER) is crucial in the fields of digital education and scholarly research. However, it is difficult to accurately determine the length and complex spatial relationships among symbols in handwritten mathematical expressions. In this study, we present a novel encoder-decoder architecture (DenseBAM-GI) for HMER, where the encoder has a Bo…
▽ More
The task of recognising Handwritten Mathematical Expressions (HMER) is crucial in the fields of digital education and scholarly research. However, it is difficult to accurately determine the length and complex spatial relationships among symbols in handwritten mathematical expressions. In this study, we present a novel encoder-decoder architecture (DenseBAM-GI) for HMER, where the encoder has a Bottleneck Attention Module (BAM) to improve feature representation and the decoder has a Gated Input-GRU (GI-GRU) unit with an extra gate to make decoding long and complex expressions easier. The proposed model is an efficient and lightweight architecture with performance equivalent to state-of-the-art models in terms of Expression Recognition Rate (exprate). It also performs better in terms of top 1, 2, and 3 error accuracy across the CROHME 2014, 2016, and 2019 datasets. DenseBAM-GI achieves the best exprate among all models on the CROHME 2019 dataset. Importantly, these successes are accomplished with a drop in the complexity of the calculation and a reduction in the need for GPU memory.
△ Less
Submitted 28 June, 2023;
originally announced June 2023.
-
OptimShare: A Unified Framework for Privacy Preserving Data Sharing -- Towards the Practical Utility of Data with Privacy
Authors:
M. A. P. Chamikara,
Seung Ick Jang,
Ian Oppermann,
Dongxi Liu,
Musotto Roberto,
Sushmita Ruj,
Arindam Pal,
Meisam Mohammady,
Seyit Camtepe,
Sylvia Young,
Chris Dorrian,
Nasir David
Abstract:
Tabular data sharing serves as a common method for data exchange. However, sharing sensitive information without adequate privacy protection can compromise individual privacy. Thus, ensuring privacy-preserving data sharing is crucial. Differential privacy (DP) is regarded as the gold standard in data privacy. Despite this, current DP methods tend to generate privacy-preserving tabular datasets tha…
▽ More
Tabular data sharing serves as a common method for data exchange. However, sharing sensitive information without adequate privacy protection can compromise individual privacy. Thus, ensuring privacy-preserving data sharing is crucial. Differential privacy (DP) is regarded as the gold standard in data privacy. Despite this, current DP methods tend to generate privacy-preserving tabular datasets that often suffer from limited practical utility due to heavy perturbation and disregard for the tables' utility dynamics. Besides, there has not been much research on selective attribute release, particularly in the context of controlled partially perturbed data sharing. This has significant implications for scenarios such as cross-agency data sharing in real-world situations. We introduce OptimShare: a utility-focused, multi-criteria solution designed to perturb input datasets selectively optimized for specific real-world applications. OptimShare combines the principles of differential privacy, fuzzy logic, and probability theory to establish an integrated tool for privacy-preserving data sharing. Empirical assessments confirm that OptimShare successfully strikes a balance between better data utility and robust privacy, effectively serving various real-world problem scenarios.
△ Less
Submitted 5 June, 2023;
originally announced June 2023.
-
Machine Learning Approach for Cancer Entities Association and Classification
Authors:
G. Jeyakodi,
Arkadeep Pal,
Debapratim Gupta,
K. Sarukeswari,
V. Amouda
Abstract:
According to the World Health Organization (WHO), cancer is the second leading cause of death globally. Scientific research on different types of cancers grows at an ever-increasing rate, publishing large volumes of research articles every year. The insight information and the knowledge of the drug, diagnostics, risk, symptoms, treatments, etc., related to genes are significant factors that help e…
▽ More
According to the World Health Organization (WHO), cancer is the second leading cause of death globally. Scientific research on different types of cancers grows at an ever-increasing rate, publishing large volumes of research articles every year. The insight information and the knowledge of the drug, diagnostics, risk, symptoms, treatments, etc., related to genes are significant factors that help explore and advance the cancer research progression. Manual screening of such a large volume of articles is very laborious and time-consuming to formulate any hypothesis. The study uses the two most non-trivial NLP, Natural Language Processing functions, Entity Recognition, and text classification to discover knowledge from biomedical literature. Named Entity Recognition (NER) recognizes and extracts the predefined entities related to cancer from unstructured text with the support of a user-friendly interface and built-in dictionaries. Text classification helps to explore the insights into the text and simplifies data categorization, querying, and article screening. Machine learning classifiers are also used to build the classification model and Structured Query Languages (SQL) is used to identify the hidden relations that may lead to significant predictions.
△ Less
Submitted 24 June, 2023; v1 submitted 30 May, 2023;
originally announced June 2023.
-
Understanding Noise-Augmented Training for Randomized Smoothing
Authors:
Ambar Pal,
Jeremias Sulam
Abstract:
Randomized smoothing is a technique for providing provable robustness guarantees against adversarial attacks while making minimal assumptions about a classifier. This method relies on taking a majority vote of any base classifier over multiple noise-perturbed inputs to obtain a smoothed classifier, and it remains the tool of choice to certify deep and complex neural network models. Nonetheless, no…
▽ More
Randomized smoothing is a technique for providing provable robustness guarantees against adversarial attacks while making minimal assumptions about a classifier. This method relies on taking a majority vote of any base classifier over multiple noise-perturbed inputs to obtain a smoothed classifier, and it remains the tool of choice to certify deep and complex neural network models. Nonetheless, non-trivial performance of such smoothed classifier crucially depends on the base model being trained on noise-augmented data, i.e., on a smoothed input distribution. While widely adopted in practice, it is still unclear how this noisy training of the base classifier precisely affects the risk of the robust smoothed classifier, leading to heuristics and tricks that are poorly understood. In this work we analyze these trade-offs theoretically in a binary classification setting, proving that these common observations are not universal. We show that, without making stronger distributional assumptions, no benefit can be expected from predictors trained with noise-augmentation, and we further characterize distributions where such benefit is obtained. Our analysis has direct implications to the practical deployment of randomized smoothing, and we illustrate some of these via experiments on CIFAR-10 and MNIST, as well as on synthetic datasets.
△ Less
Submitted 8 May, 2023;
originally announced May 2023.
-
Dependently Ty** R Vectors, Arrays, and Matrices
Authors:
John Wrenn,
Anjali Pal,
Alexa VanHattum,
Shriram Krishnamurthi
Abstract:
The R programming language is widely used in large-scale data analyses. It contains especially rich built-in support for dealing with vectors, arrays, and matrices. These operations feature prominently in the applications that form R's raison d'être, making their behavior worth understanding. Furthermore, ostensibly for programmer convenience, their behavior in R is a notable extension over the co…
▽ More
The R programming language is widely used in large-scale data analyses. It contains especially rich built-in support for dealing with vectors, arrays, and matrices. These operations feature prominently in the applications that form R's raison d'être, making their behavior worth understanding. Furthermore, ostensibly for programmer convenience, their behavior in R is a notable extension over the corresponding operations in mathematics, thereby offering some challenges for specification and static verification.
We report on progress towards statically ty** this aspect of the R language. The interesting aspects of ty**, in this case, warn programmers about violating bounds, so the types must necessarily be dependent. We explain the ways in which R extends standard mathematical behavior. We then show how R's behavior can be specified in LiquidHaskell, a dependently-typed extension to Haskell. In the general case, actually verifying library and client code is currently beyond LiquidHaskell's reach; therefore, this work provides challenges and opportunities both for ty** R and for progress in dependently-typed programming languages.
△ Less
Submitted 9 April, 2023;
originally announced April 2023.
-
End-to-End Latency Optimization of Multi-view 3D Reconstruction for Disaster Response
Authors:
Xiaojie Zhang,
Mingjun Li,
Andrew Hilton,
Amitangshu Pal,
Soumyabrata Dey,
Saptarshi Debroy
Abstract:
In order to plan rapid response during disasters, first responder agencies often adopt `bring your own device' (BYOD) model with inexpensive mobile edge devices (e.g., drones, robots, tablets) for complex video analytics applications, e.g., 3D reconstruction of a disaster scene. Unlike simpler video applications, widely used Multi-view Stereo (MVS) based 3D reconstruction applications (e.g., openM…
▽ More
In order to plan rapid response during disasters, first responder agencies often adopt `bring your own device' (BYOD) model with inexpensive mobile edge devices (e.g., drones, robots, tablets) for complex video analytics applications, e.g., 3D reconstruction of a disaster scene. Unlike simpler video applications, widely used Multi-view Stereo (MVS) based 3D reconstruction applications (e.g., openMVG/openMVS) are exceedingly time consuming, especially when run on such computationally constrained mobile edge devices. Additionally, reducing the reconstruction latency of such inherently sequential algorithms is challenging as unintelligent, application-agnostic strategies can drastically degrade the reconstruction (i.e., application outcome) quality making them useless. In this paper, we aim to design a latency optimized MVS algorithm pipeline, with the objective to best balance the end-to-end latency and reconstruction quality by running the pipeline on a collaborative mobile edge environment. The overall optimization approach is two-pronged where: (a) application optimizations introduce data-level parallelism by splitting the pipeline into high frequency and low frequency reconstruction components and (b) system optimizations incorporate task-level parallelism to the pipelines by running them opportunistically on available resources with online quality control in order to balance both latency and quality. Our evaluation on a hardware testbed using publicly available datasets shows upto ~54% reduction in latency with negligible loss (~4-7%) in reconstruction quality.
△ Less
Submitted 3 April, 2023;
originally announced April 2023.
-
On the Importance of Signer Overlap for Sign Language Detection
Authors:
Abhilash Pal,
Stephan Huber,
Cyrine Chaabani,
Alessandro Manzotti,
Oscar Koller
Abstract:
Sign language detection, identifying if someone is signing or not, is becoming crucially important for its applications in remote conferencing software and for selecting useful sign data for training sign language recognition or translation tasks. We argue that the current benchmark data sets for sign language detection estimate overly positive results that do not generalize well due to signer ove…
▽ More
Sign language detection, identifying if someone is signing or not, is becoming crucially important for its applications in remote conferencing software and for selecting useful sign data for training sign language recognition or translation tasks. We argue that the current benchmark data sets for sign language detection estimate overly positive results that do not generalize well due to signer overlap between train and test partitions. We quantify this with a detailed analysis of the effect of signer overlap on current sign detection benchmark data sets. Comparing accuracy with and without overlap on the DGS corpus and Signing in the Wild, we observed a relative decrease in accuracy of 4.17% and 6.27%, respectively. Furthermore, we propose new data set partitions that are free of overlap and allow for more realistic performance assessment. We hope this work will contribute to improving the accuracy and generalization of sign language detection systems.
△ Less
Submitted 19 March, 2023;
originally announced March 2023.
-
Locally Regularized Neural Differential Equations: Some Black Boxes Were Meant to Remain Closed!
Authors:
Avik Pal,
Alan Edelman,
Chris Rackauckas
Abstract:
Implicit layer deep learning techniques, like Neural Differential Equations, have become an important modeling framework due to their ability to adapt to new problems automatically. Training a neural differential equation is effectively a search over a space of plausible dynamical systems. However, controlling the computational cost for these models is difficult since it relies on the number of st…
▽ More
Implicit layer deep learning techniques, like Neural Differential Equations, have become an important modeling framework due to their ability to adapt to new problems automatically. Training a neural differential equation is effectively a search over a space of plausible dynamical systems. However, controlling the computational cost for these models is difficult since it relies on the number of steps the adaptive solver takes. Most prior works have used higher-order methods to reduce prediction timings while greatly increasing training time or reducing both training and prediction timings by relying on specific training algorithms, which are harder to use as a drop-in replacement due to strict requirements on automatic differentiation. In this manuscript, we use internal cost heuristics of adaptive differential equation solvers at stochastic time points to guide the training toward learning a dynamical system that is easier to integrate. We "close the black-box" and allow the use of our method with any adjoint technique for gradient calculations of the differential equation solution. We perform experimental studies to compare our method to global regularization to show that we attain similar performance numbers without compromising the flexibility of implementation on ordinary differential equations (ODEs) and stochastic differential equations (SDEs). We develop two sampling strategies to trade off between performance and training time. Our method reduces the number of function evaluations to 0.556-0.733x and accelerates predictions by 1.3-2x.
△ Less
Submitted 2 June, 2023; v1 submitted 3 March, 2023;
originally announced March 2023.
-
Non-Intrusive Driver Behavior Characterization From Road-Side Cameras
Authors:
Pavana Pradeep Kumar,
Krishna Kant,
Amitangshu Pal
Abstract:
In this paper, we demonstrate a proof of concept for characterizing vehicular behavior using only the roadside cameras of the ITS system. The essential advantage of this method is that it can be implemented in the roadside infrastructure transparently and inexpensively and can have a global view of each vehicle's behavior without any involvement of or awareness by the individual vehicles or driver…
▽ More
In this paper, we demonstrate a proof of concept for characterizing vehicular behavior using only the roadside cameras of the ITS system. The essential advantage of this method is that it can be implemented in the roadside infrastructure transparently and inexpensively and can have a global view of each vehicle's behavior without any involvement of or awareness by the individual vehicles or drivers. By using a setup that includes programmatically controlled robot cars (to simulate different types of vehicular behaviors) and an external video camera set up to capture and analyze the vehicular behavior, we show that the driver classification based on the external video analytics yields accuracies that are within 1-2\% of the accuracies of direct vehicle-based characterization. We also show that the residual errors primarily relate to gaps in correct object identification and tracking and thus can be further reduced with a more sophisticated setup. The characterization can be used to enhance both the safety and performance of the traffic flow, particularly in the mixed manual and automated vehicle scenarios that are expected to be common soon.
△ Less
Submitted 25 February, 2023;
originally announced February 2023.
-
An efficient quantum-classical hybrid algorithm for distorted alphanumeric character identification
Authors:
Ankur Pal,
Abhishek Shukla,
Anirban Pathak
Abstract:
An algorithm for image processing is proposed. The proposed algorithm, which can be viewed as a quantum-classical hybrid algorithm, can transform a low-resolution bitonal image of a character from the set of alphanumeric characters (A-Z, 0-9) into a high-resolution image. The quantum part of the proposed algorithm fruitfully utilizes a variant of Grover's search algorithm, known as the fixed point…
▽ More
An algorithm for image processing is proposed. The proposed algorithm, which can be viewed as a quantum-classical hybrid algorithm, can transform a low-resolution bitonal image of a character from the set of alphanumeric characters (A-Z, 0-9) into a high-resolution image. The quantum part of the proposed algorithm fruitfully utilizes a variant of Grover's search algorithm, known as the fixed point search algorithm. Further, the quantum part of the algorithm is simulated using CQASM and the advantage of the algorithm is established through the complexity analysis. Additional analysis has also revealed that this scheme for optical character recognition (OCR) leads to high confidence value and generally works in a more efficient manner compared to the existing classical, quantum, and hybrid algorithms for a similar task.
△ Less
Submitted 25 December, 2022;
originally announced December 2022.
-
Machine Learning Assisted Inverse Design of Microresonators
Authors:
Arghadeep Pal,
Alekhya Ghosh,
Shuangyou Zhang,
Toby Bi,
Pascal DeľHaye
Abstract:
The high demand for fabricating microresonators with desired optical properties has led to various techniques to optimize geometries, mode structures, nonlinearities and dispersion. Depending on applications, the dispersion in such resonators counters their optical nonlinearities and influences the intracavity optical dynamics. In this paper, we demonstrate the use of a machine learning (ML) algor…
▽ More
The high demand for fabricating microresonators with desired optical properties has led to various techniques to optimize geometries, mode structures, nonlinearities and dispersion. Depending on applications, the dispersion in such resonators counters their optical nonlinearities and influences the intracavity optical dynamics. In this paper, we demonstrate the use of a machine learning (ML) algorithm as a tool to determine the geometry of microresonators from their dispersion profiles. The training dataset with ~460 samples is generated by finite element simulations and the model is experimentally verified using integrated silicon nitride microresonators. Two ML algorithms are compared along with suitable hyperparameter tuning, out of which Random Forest (RF) yields the best results. The average error on the simulated data is well below 15%.
△ Less
Submitted 10 November, 2022;
originally announced December 2022.
-
Time-Synchronized Full System State Estimation Considering Practical Implementation Challenges
Authors:
Antos Cheeramban Varghese,
Hritik Shah,
Behrouz Azimian,
Anamitra Pal,
Evangelos Farantatos
Abstract:
As the phasor measurement unit (PMU) placement problem involves a cost-benefit trade-off, more PMUs get placed on the higher voltage buses. However, this causes many of the lower voltage levels of the bulk power system to not be observed by PMUs. This lack of visibility then makes time-synchronized state estimation of the full system a challenging problem. We propose a Deep Neural network-based St…
▽ More
As the phasor measurement unit (PMU) placement problem involves a cost-benefit trade-off, more PMUs get placed on the higher voltage buses. However, this causes many of the lower voltage levels of the bulk power system to not be observed by PMUs. This lack of visibility then makes time-synchronized state estimation of the full system a challenging problem. We propose a Deep Neural network-based State Estimator (DeNSE) to overcome this problem. The DeNSE employs a Bayesian framework to indirectly combine inferences drawn from slow timescale but widespread supervisory control and data acquisition (SCADA) data with fast timescale but select PMU data to attain sub-second situational awareness of the entire system. The practical utility of the proposed approach is demonstrated by considering topology changes, non-Gaussian measurement noise, and bad data detection and correction. The results obtained using the IEEE 118-bus system show the superiority of the DeNSE over a purely SCADA state estimator and a PMU-only linear state estimator from a techno-economic viability perspective. Lastly, scalability of the DeNSE is proven by estimating the states of a large and realistic 2000-bus Synthetic Texas system.
△ Less
Submitted 21 March, 2024; v1 submitted 3 December, 2022;
originally announced December 2022.
-
On Utilizing Relationships for Transferable Few-Shot Fine-Grained Object Detection
Authors:
Ambar Pal,
Arnau Ramisa,
Amit Kumar K C,
René Vidal
Abstract:
State-of-the-art object detectors are fast and accurate, but they require a large amount of well annotated training data to obtain good performance. However, obtaining a large amount of training annotations specific to a particular task, i.e., fine-grained annotations, is costly in practice. In contrast, obtaining common-sense relationships from text, e.g., "a table-lamp is a lamp that sits on top…
▽ More
State-of-the-art object detectors are fast and accurate, but they require a large amount of well annotated training data to obtain good performance. However, obtaining a large amount of training annotations specific to a particular task, i.e., fine-grained annotations, is costly in practice. In contrast, obtaining common-sense relationships from text, e.g., "a table-lamp is a lamp that sits on top of a table", is much easier. Additionally, common-sense relationships like "on-top-of" are easy to annotate in a task-agnostic fashion. In this paper, we propose a probabilistic model that uses such relational knowledge to transform an off-the-shelf detector of coarse object categories (e.g., "table", "lamp") into a detector of fine-grained categories (e.g., "table-lamp"). We demonstrate that our method, RelDetect, achieves performance competitive to finetuning based state-of-the-art object detector baselines when an extremely low amount of fine-grained annotations is available ($0.2\%$ of entire dataset). We also demonstrate that RelDetect is able to utilize the inherent transferability of relationship information to obtain a better performance ($+5$ mAP points) than the above baselines on an unseen dataset (zero-shot transfer). In summary, we demonstrate the power of using relationships for object detection on datasets where fine-grained object categories can be linked to coarse-grained categories via suitable relationships.
△ Less
Submitted 1 December, 2022;
originally announced December 2022.
-
DeepParliament: A Legal domain Benchmark & Dataset for Parliament Bills Prediction
Authors:
Ankit Pal
Abstract:
This paper introduces DeepParliament, a legal domain Benchmark Dataset that gathers bill documents and metadata and performs various bill status classification tasks. The proposed dataset text covers a broad range of bills from 1986 to the present and contains richer information on parliament bill content. Data collection, detailed statistics and analyses are provided in the paper. Moreover, we ex…
▽ More
This paper introduces DeepParliament, a legal domain Benchmark Dataset that gathers bill documents and metadata and performs various bill status classification tasks. The proposed dataset text covers a broad range of bills from 1986 to the present and contains richer information on parliament bill content. Data collection, detailed statistics and analyses are provided in the paper. Moreover, we experimented with different types of models ranging from RNN to pretrained and reported the results. We are proposing two new benchmarks: Binary and Multi-Class Bill Status classification. Models developed for bill documents and relevant supportive tasks may assist Members of Parliament (MPs), presidents, and other legal practitioners. It will help review or prioritise bills, thus speeding up the billing process, improving the quality of decisions and reducing the time consumption in both houses. Considering that the foundation of the country's democracy is Parliament and state legislatures, we anticipate that our research will be an essential addition to the Legal NLP community. This work will be the first to present a Parliament bill prediction task. In order to improve the accessibility of legal AI resources and promote reproducibility, we have made our code and dataset publicly accessible at github.com/monk1337/DeepParliament
△ Less
Submitted 14 November, 2022;
originally announced November 2022.
-
Federated Learning for Healthcare Domain - Pipeline, Applications and Challenges
Authors:
Madhura Joshi,
Ankit Pal,
Malaikannan Sankarasubbu
Abstract:
Federated learning is the process of develo** machine learning models over datasets distributed across data centers such as hospitals, clinical research labs, and mobile devices while preventing data leakage. This survey examines previous research and studies on federated learning in the healthcare sector across a range of use cases and applications. Our survey shows what challenges, methods, an…
▽ More
Federated learning is the process of develo** machine learning models over datasets distributed across data centers such as hospitals, clinical research labs, and mobile devices while preventing data leakage. This survey examines previous research and studies on federated learning in the healthcare sector across a range of use cases and applications. Our survey shows what challenges, methods, and applications a practitioner should be aware of in the topic of federated learning. This paper aims to lay out existing research and list the possibilities of federated learning for healthcare industries.
△ Less
Submitted 19 November, 2022; v1 submitted 14 November, 2022;
originally announced November 2022.
-
A novel approach to preventing SARS-CoV-2 transmission in classrooms: An OpenFOAM based CFD Study
Authors:
Anish Pal,
Riddhideep Biswas,
Ritam Pal,
Sourav Sarkar,
Achintya Mukhopadhyay
Abstract:
The education sector has suffered a catastrophic setback due to ongoing COVID-pandemic, with classrooms being closed indefinitely. The current study aims to solve the existing dilemma by examining COVID transmission inside a classroom and providing long-term sustainable solutions. In this work, a standard 5m x 3m x 5m classroom is considered where 24 students are seated, accompanied by a teacher.…
▽ More
The education sector has suffered a catastrophic setback due to ongoing COVID-pandemic, with classrooms being closed indefinitely. The current study aims to solve the existing dilemma by examining COVID transmission inside a classroom and providing long-term sustainable solutions. In this work, a standard 5m x 3m x 5m classroom is considered where 24 students are seated, accompanied by a teacher. A computational fluid dynamics simulation based on OpenFOAM is performed using a Eulerian-Lagrangian framework. Based on the stochastic dose response framework, we have evaluated the infection risk in the classroom for two distinct cases: (i) certain students are infected (ii) the teacher is infected. If the teacher is infected, the probability of infection could reach 100% for certain students. When certain students are infected, the maximum infection risk for a susceptible person reaches 30%. The commonly used cloth mask proves to be ineffective in providing protection against infection transmission reducing the maximum infection probability by approximately 26% only. Another commonly used solution in the form of shields installed on desks have also failed to provide adequate protection against infection reducing the infection risk only by 50%. Furthermore, the shields serves as a source of fomite mode of infection. Screens suspended from the ceiling, which entrap droplets, have been proposed as a novel solution that reduces the infection risk by 90% and 95% compared to the no screen scenario besides being completely devoid of fomite infection mode. As a result of the screens, the class-time can be extended by 55 minutes.
△ Less
Submitted 12 October, 2022;
originally announced November 2022.
-
Clean Text and Full-Body Transformer: Microsoft's Submission to the WMT22 Shared Task on Sign Language Translation
Authors:
Subhadeep Dey,
Abhilash Pal,
Cyrine Chaabani,
Oscar Koller
Abstract:
This paper describes Microsoft's submission to the first shared task on sign language translation at WMT 2022, a public competition tackling sign language to spoken language translation for Swiss German sign language. The task is very challenging due to data scarcity and an unprecedented vocabulary size of more than 20k words on the target side. Moreover, the data is taken from real broadcast news…
▽ More
This paper describes Microsoft's submission to the first shared task on sign language translation at WMT 2022, a public competition tackling sign language to spoken language translation for Swiss German sign language. The task is very challenging due to data scarcity and an unprecedented vocabulary size of more than 20k words on the target side. Moreover, the data is taken from real broadcast news, includes native signing and covers scenarios of long videos. Motivated by recent advances in action recognition, we incorporate full body information by extracting features from a pre-trained I3D model and applying a standard transformer network. The accuracy of the system is further improved by applying careful data cleaning on the target text. We obtain BLEU scores of 0.6 and 0.78 on the test and dev set respectively, which is the best score among the participants of the shared task. Also in the human evaluation the submission reaches the first place. The BLEU score is further improved to 1.08 on the dev set by applying features extracted from a lip reading model.
△ Less
Submitted 24 October, 2022;
originally announced October 2022.
-
Abstract Interpretation-Based Feature Importance for SVMs
Authors:
Abhinandan Pal,
Francesco Ranzato,
Caterina Urban,
Marco Zanella
Abstract:
We propose a symbolic representation for support vector machines (SVMs) by means of abstract interpretation, a well-known and successful technique for designing and implementing static program analyses. We leverage this abstraction in two ways: (1) to enhance the interpretability of SVMs by deriving a novel feature importance measure, called abstract feature importance (AFI), that does not depend…
▽ More
We propose a symbolic representation for support vector machines (SVMs) by means of abstract interpretation, a well-known and successful technique for designing and implementing static program analyses. We leverage this abstraction in two ways: (1) to enhance the interpretability of SVMs by deriving a novel feature importance measure, called abstract feature importance (AFI), that does not depend in any way on a given dataset of the accuracy of the SVM and is very fast to compute, and (2) for verifying stability, notably individual fairness, of SVMs and producing concrete counterexamples when the verification fails. We implemented our approach and we empirically demonstrated its effectiveness on SVMs based on linear and non-linear (polynomial and radial basis function) kernels. Our experimental results show that, independently of the accuracy of the SVM, our AFI measure correlates much more strongly with the stability of the SVM to feature perturbations than feature importance measures widely available in machine learning software such as permutation feature importance. It thus gives better insight into the trustworthiness of SVMs.
△ Less
Submitted 22 October, 2022;
originally announced October 2022.
-
Distilling the Undistillable: Learning from a Nasty Teacher
Authors:
Surgan Jandial,
Yash Khasbage,
Arghya Pal,
Vineeth N Balasubramanian,
Balaji Krishnamurthy
Abstract:
The inadvertent stealing of private/sensitive information using Knowledge Distillation (KD) has been getting significant attention recently and has guided subsequent defense efforts considering its critical nature. Recent work Nasty Teacher proposed to develop teachers which can not be distilled or imitated by models attacking it. However, the promise of confidentiality offered by a nasty teacher…
▽ More
The inadvertent stealing of private/sensitive information using Knowledge Distillation (KD) has been getting significant attention recently and has guided subsequent defense efforts considering its critical nature. Recent work Nasty Teacher proposed to develop teachers which can not be distilled or imitated by models attacking it. However, the promise of confidentiality offered by a nasty teacher is not well studied, and as a further step to strengthen against such loopholes, we attempt to bypass its defense and steal (or extract) information in its presence successfully. Specifically, we analyze Nasty Teacher from two different directions and subsequently leverage them carefully to develop simple yet efficient methodologies, named as HTC and SCM, which increase the learning from Nasty Teacher by upto 68.63% on standard datasets. Additionally, we also explore an improvised defense method based on our insights of stealing. Our detailed set of experiments and ablations on diverse models/settings demonstrate the efficacy of our approach.
△ Less
Submitted 21 October, 2022;
originally announced October 2022.
-
Sample-efficient Model Predictive Control Design of Soft Robotics by Bayesian Optimization
Authors:
Anuj Pal,
Tianyi He,
Wenpeng Wei
Abstract:
This paper presents a sample-efficient data-driven method to design model predictive control (MPC) for cable-actuated soft robotics using Bayesian optimization. Instead of modeling the complex dynamics of the soft robots, the proposed approach uses Bayesian optimization to search the best-guessed low-dimensional prediction model and its associated controller to minimize the objective function of c…
▽ More
This paper presents a sample-efficient data-driven method to design model predictive control (MPC) for cable-actuated soft robotics using Bayesian optimization. Instead of modeling the complex dynamics of the soft robots, the proposed approach uses Bayesian optimization to search the best-guessed low-dimensional prediction model and its associated controller to minimize the objective function of closed-loop responses. The prediction model is updated by Bayesian optimization from the closed-loop input-output data in each iteration. A linear MPC is then designed based on the updated prediction model, and evaluated based on the closed-loop responses. Different from directly searching controller parameters, the closed-loop system stability, and inputs/outputs constraints can be easily handled in the MPC design. After a few iterations, a convergent solution of a (sub-)optimal controller can be obtained, which minimizes the user-defined closed-loop performance index. The proposed method is simulated and validated by a high-fidelity simulation of a cable-actuated soft robot. The simulation results demonstrate that the proposed approach can achieve desired tracking controller for the soft robot without a prior-known model.
△ Less
Submitted 17 October, 2022;
originally announced October 2022.
-
Legal Case Document Similarity: You Need Both Network and Text
Authors:
Paheli Bhattacharya,
Kripabandhu Ghosh,
Arindam Pal,
Saptarshi Ghosh
Abstract:
Estimating the similarity between two legal case documents is an important and challenging problem, having various downstream applications such as prior-case retrieval and citation recommendation. There are two broad approaches for the task -- citation network-based and text-based. Prior citation network-based approaches consider citations only to prior-cases (also called precedents) (PCNet). This…
▽ More
Estimating the similarity between two legal case documents is an important and challenging problem, having various downstream applications such as prior-case retrieval and citation recommendation. There are two broad approaches for the task -- citation network-based and text-based. Prior citation network-based approaches consider citations only to prior-cases (also called precedents) (PCNet). This approach misses important signals inherent in Statutes (written laws of a jurisdiction). In this work, we propose Hier-SPCNet that augments PCNet with a heterogeneous network of Statutes. We incorporate domain knowledge for legal document similarity into Hier-SPCNet, thereby obtaining state-of-the-art results for network-based legal document similarity. Both textual and network similarity provide important signals for legal case similarity; but till now, only trivial attempts have been made to unify the two signals. In this work, we apply several methods for combining textual and network information for estimating legal case similarity. We perform extensive experiments over legal case documents from the Indian judiciary, where the gold standard similarity between document-pairs is judged by law experts from two reputed Law institutes in India. Our experiments establish that our proposed network-based methods significantly improve the correlation with domain experts' opinion when compared to the existing methods for network-based legal document similarity. Our best-performing combination method (that combines network-based and text-based similarity) improves the correlation with domain experts' opinion by 11.8% over the best text-based method and 20.6\% over the best network-based method. We also establish that our best-performing method can be used to recommend / retrieve citable and similar cases for a source (query) case, which are well appreciated by legal experts.
△ Less
Submitted 26 September, 2022;
originally announced September 2022.
-
A Deep Learning Approach Using Masked Image Modeling for Reconstruction of Undersampled K-spaces
Authors:
Kyler Larsen,
Arghya Pal,
Yogesh Rathi
Abstract:
Magnetic Resonance Imaging (MRI) scans are time consuming and precarious, since the patients remain still in a confined space for extended periods of time. To reduce scanning time, some experts have experimented with undersampled k spaces, trying to use deep learning to predict the fully sampled result. These studies report that as many as 20 to 30 minutes could be saved off a scan that takes an h…
▽ More
Magnetic Resonance Imaging (MRI) scans are time consuming and precarious, since the patients remain still in a confined space for extended periods of time. To reduce scanning time, some experts have experimented with undersampled k spaces, trying to use deep learning to predict the fully sampled result. These studies report that as many as 20 to 30 minutes could be saved off a scan that takes an hour or more. However, none of these studies have explored the possibility of using masked image modeling (MIM) to predict the missing parts of MRI k spaces. This study makes use of 11161 reconstructed MRI and k spaces of knee MRI images from Facebook's fastmri dataset. This tests a modified version of an existing model using baseline shifted window (Swin) and vision transformer architectures that makes use of MIM on undersampled k spaces to predict the full k space and consequently the full MRI image. Modifications were made using pytorch and numpy libraries, and were published to a github repository. After the model reconstructed the k space images, the basic Fourier transform was applied to determine the actual MRI image. Once the model reached a steady state, experimentation with hyperparameters helped to achieve pinpoint accuracy for the reconstructed images. The model was evaluated through L1 loss, gradient normalization, and structural similarity values. The model produced reconstructed images with L1 loss values averaging to <0.01 and gradient normalization values <0.1 after training finished. The reconstructed k spaces yielded structural similarity values of over 99% for both training and validation with the fully sampled k spaces, while validation loss continually decreased under 0.01. These data strongly support the idea that the algorithm works for MRI reconstruction, as they indicate the model's reconstructed image aligns extremely well with the original, fully sampled k space.
△ Less
Submitted 24 August, 2022;
originally announced August 2022.