-
TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing
Authors:
Sherry X. Chen,
Yaron Vaxman,
Elad Ben Baruch,
David Asulin,
Aviad Moreshet,
Kuo-Chin Lien,
Misha Sra,
Pradeep Sen
Abstract:
Despite many attempts to leverage pre-trained text-to-image models (T2I) like Stable Diffusion (SD) for controllable image editing, producing good predictable results remains a challenge. Previous approaches have focused on either fine-tuning pre-trained T2I models on specific datasets to generate certain kinds of images (e.g., with a specific object or person), or on optimizing the weights, text prompts, and/or learning features for each input image in an attempt to coax the image generator to produce the desired result. However, these approaches all have shortcomings and fail to produce good results in a predictable and controllable manner. To address this problem, we present TiNO-Edit, an SD-based method that focuses on optimizing the noise patterns and diffusion timesteps during editing, something previously unexplored in the literature. With this simple change, we are able to generate results that both better align with the original images and reflect the desired result. Furthermore, we propose a set of new loss functions that operate in the latent domain of SD, greatly speeding up the optimization when compared to prior approaches, which operate in the pixel domain. Our method can be easily applied to variations of SD including Textual Inversion and DreamBooth that encode new concepts and incorporate them into the edited results. We present a host of image-editing capabilities enabled by our approach. Our code is publicly available at https://github.com/SherryXTChen/TiNO-Edit.
Submitted 17 April, 2024;
originally announced April 2024.
-
Automated Attribute Extraction from Legal Proceedings
Authors:
Subinay Adhikary,
Sagnik Das,
Sagnik Saha,
Procheta Sen,
Dwaipayan Roy,
Kripabandhu Ghosh
Abstract:
The escalating number of pending cases is a growing concern world-wide. Recent advancements in digitization have opened up possibilities for leveraging artificial intelligence (AI) tools in the processing of legal documents. Adopting a structured representation for legal documents, as opposed to a mere bag-of-words flat text representation, can significantly enhance processing capabilities. With the aim of achieving this objective, we put forward a set of diverse attributes for criminal case proceedings. We use a state-of-the-art sequence labeling framework to automatically extract attributes from the legal documents. Moreover, we demonstrate the efficacy of the extracted attributes in a downstream task, namely legal judgment prediction.
Submitted 18 October, 2023;
originally announced October 2023.
-
Can Word Sense Distribution Detect Semantic Changes of Words?
Authors:
Xiaohang Tang,
Yi Zhou,
Taichi Aida,
Procheta Sen,
Danushka Bollegala
Abstract:
Semantic Change Detection (SCD) of words is an important task for various NLP applications that must make time-sensitive predictions. Some words are used over time in novel ways to express new meanings, and these new meanings establish themselves as novel senses of existing words. On the other hand, Word Sense Disambiguation (WSD) methods associate ambiguous words with sense ids, depending on the context in which they occur. Given this relationship between WSD and SCD, we explore the possibility of predicting whether a target word has its meaning changed between two corpora collected at different time steps, by comparing the distributions of senses of that word in each corpus. For this purpose, we use pretrained static sense embeddings to automatically annotate each occurrence of the target word in a corpus with a sense id. Next, we compute the distribution of sense ids of a target word in a given corpus. Finally, we use different divergence or distance measures to quantify the semantic change of the target word across the two given corpora. Our experimental results on SemEval 2020 Task 1 dataset show that word sense distributions can be accurately used to predict semantic changes of words in English, German, Swedish and Latin.
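The last two steps of this pipeline (sense-id distribution per corpus, then a divergence between the two distributions) can be sketched in a few lines. This is a minimal illustration, assuming the sense ids have already been assigned by some WSD step; Jensen-Shannon divergence is used here as one example of the "divergence or distance measures" the abstract mentions, and the sense labels in the test are hypothetical placeholders.

```python
from collections import Counter
import math


def sense_distribution(sense_ids, inventory):
    """Relative frequency of each sense id, over a fixed, shared sense inventory."""
    counts = Counter(sense_ids)
    total = sum(counts.values())
    return [counts[s] / total for s in inventory]


def kl_divergence(p, q):
    """KL(p || q) in bits; zero-probability terms of p contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint support)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    # m_i >= p_i / 2 > 0 wherever p_i > 0, so the KL terms are always defined.
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def semantic_change_score(senses_t1, senses_t2):
    """Compare a word's sense distributions in two corpora from different periods."""
    inventory = sorted(set(senses_t1) | set(senses_t2))
    p = sense_distribution(senses_t1, inventory)
    q = sense_distribution(senses_t2, inventory)
    return js_divergence(p, q)
```

A higher score suggests that the word is used in different senses in the two periods; a threshold or ranking over target words would then turn scores into change predictions.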
Submitted 16 October, 2023;
originally announced October 2023.
-
Lexical Entrainment for Conversational Systems
Authors:
Zhengxiang Shi,
Procheta Sen,
Aldo Lipani
Abstract:
Conversational agents have become ubiquitous in assisting with daily tasks, and are expected to possess human-like features. One such feature is lexical entrainment (LE), a phenomenon in which speakers in human-human conversations tend to naturally and subconsciously align their lexical choices with those of their interlocutors, leading to more successful and engaging conversations. As an example, if a digital assistant replies 'Your appointment for **ling Noodle Pub is at 7 pm' to the question 'When is my reservation for **ling Noodle Bar today?', it may feel as though the assistant is trying to correct the speaker, whereas a response of 'Your reservation for **ling Noodle Bar is at 7 pm' would likely be perceived as more positive. This highlights the importance of LE in establishing a shared terminology for maximum clarity and reducing ambiguity in conversations. However, we demonstrate in this work that current response generation models do not adequately address this crucial human-like phenomenon. To address this, we propose a new dataset, named MULTIWOZ-ENTR, and a measure for LE for conversational systems. Additionally, we suggest a way to explicitly integrate LE into conversational systems with two new tasks, an LE extraction task and an LE generation task. We also present two baseline approaches for the LE extraction task, which aim to detect LE expressions from dialogue contexts.
Submitted 14 October, 2023;
originally announced October 2023.
-
Automated Argument Generation from Legal Facts
Authors:
Oscar Tuvey,
Procheta Sen
Abstract:
The count of pending cases has shown an exponential rise across nations (e.g., with more than 10 million pending cases in India alone). The main issue lies in the fact that the number of cases submitted to the law system is far greater than the available number of legal professionals present in a country. Given this worldwide context, the utilization of AI technology has gained paramount importance to enhance the efficiency and speed of legal procedures. In this study, we particularly focus on helping legal professionals in the process of analyzing a legal case. Our specific investigation delves into harnessing the generative capabilities of open-sourced large language models to create arguments derived from the facts present in legal cases. Experimental results show that the generated arguments from the best performing method have on average 63% overlap with the benchmark set gold standard annotations.
Submitted 12 October, 2023; v1 submitted 9 October, 2023;
originally announced October 2023.
-
LIPEx-Locally Interpretable Probabilistic Explanations-To Look Beyond The True Class
Authors:
Hongbo Zhu,
Angelo Cangelosi,
Procheta Sen,
Anirbit Mukherjee
Abstract:
In this work, we instantiate a novel perturbation-based multi-class explanation framework, LIPEx (Locally Interpretable Probabilistic Explanation). We demonstrate that LIPEx not only locally replicates the probability distributions output by the widely used complex classification models but also provides insight into how every feature deemed to be important affects the prediction probability for each of the possible classes. We achieve this by defining the explanation as a matrix obtained via regression with respect to the Hellinger distance in the space of probability distributions. Ablation tests on text and image data show that LIPEx-guided removal of important features from the data causes more change in predictions for the underlying model than similar tests based on other saliency-based or feature importance-based Explainable AI (XAI) methods. It is also shown that compared to LIME, LIPEx is more data efficient in terms of using fewer perturbations of the data to obtain a reliable explanation. This data-efficiency is seen to manifest as LIPEx being able to compute its explanation matrix around 53% faster than all-class LIME, for classification experiments with text data.
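The regression described above is measured with the Hellinger distance between probability distributions. A minimal sketch of that distance follows, together with a mean squared Hellinger distance over a batch of perturbations; the batch objective and its name `mean_squared_hellinger` are illustrative assumptions about the general shape of such a loss, not the paper's exact formulation.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions; lies in [0, 1]."""
    if len(p) != len(q):
        raise ValueError("distributions must be over the same classes")
    s = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(s / 2)


def mean_squared_hellinger(model_dists, surrogate_dists):
    """Hypothetical fitting objective: average squared Hellinger distance between
    the black-box model's class distributions and a surrogate's, one pair per
    perturbed input."""
    pairs = list(zip(model_dists, surrogate_dists))
    return sum(hellinger(p, q) ** 2 for p, q in pairs) / len(pairs)
```

Unlike a per-class squared error, this distance treats each output as a whole distribution, which matches the abstract's framing of explaining all class probabilities at once.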
Submitted 7 December, 2023; v1 submitted 7 October, 2023;
originally announced October 2023.
-
Learning Symbolic Rules over Abstract Meaning Representations for Textual Reinforcement Learning
Authors:
Subhajit Chaudhury,
Sarathkrishna Swaminathan,
Daiki Kimura,
Prithviraj Sen,
Keerthiram Murugesan,
Rosario Uceda-Sosa,
Michiaki Tatsubori,
Achille Fokoue,
Pavan Kapanipathi,
Asim Munawar,
Alexander Gray
Abstract:
Text-based reinforcement learning agents have predominantly been neural network-based models with embeddings-based representation, learning uninterpretable policies that often do not generalize well to unseen games. On the other hand, neuro-symbolic methods, specifically those that leverage an intermediate formal representation, are gaining significant attention in language understanding tasks. This is due to their advantages, which include inherent interpretability, a reduced need for training data, and better generalization to scenarios with unseen data. Therefore, in this paper, we propose a modular, NEuro-Symbolic Textual Agent (NESTA) that combines a generic semantic parser with a rule induction system to learn abstract interpretable rules as policies. Our experiments on established text-based game benchmarks show that the proposed NESTA method outperforms deep reinforcement learning-based techniques by achieving better generalization to unseen test games and learning from fewer training interactions.
Submitted 5 July, 2023;
originally announced July 2023.
-
Are Human Explanations Always Helpful? Towards Objective Evaluation of Human Natural Language Explanations
Authors:
Bingsheng Yao,
Prithviraj Sen,
Lucian Popa,
James Hendler,
Dakuo Wang
Abstract:
Human-annotated labels and explanations are critical for training explainable NLP models. However, unlike human-annotated labels whose quality is easier to calibrate (e.g., with a majority vote), human-crafted free-form explanations can be quite subjective. Before blindly using them as ground truth to train ML models, a vital question needs to be asked: How do we evaluate a human-annotated explanation's quality? In this paper, we build on the view that the quality of a human-annotated explanation can be measured based on its helpfulness (or impairment) to the ML models' performance for the desired NLP tasks for which the annotations were collected. In comparison to the commonly used Simulatability score, we define a new metric that can take into consideration the helpfulness of an explanation for model performance at both fine-tuning and inference. With the help of a unified dataset format, we evaluated the proposed metric on five datasets (e.g., e-SNLI) against two model architectures (T5 and BART), and the results show that our proposed metric can objectively evaluate the quality of human-annotated explanations, while Simulatability falls short.
Submitted 22 May, 2023; v1 submitted 4 May, 2023;
originally announced May 2023.
-
Task2KB: A Public Task-Oriented Knowledge Base
Authors:
Procheta Sen,
Xi Wang,
Ruiqing Xu,
Emine Yilmaz
Abstract:
Search engines and conversational assistants are commonly used to help users complete their everyday tasks such as booking travel, cooking, etc. While there are some existing datasets that can be used for this purpose, their coverage is limited to very few domains. In this paper, we propose a novel knowledge base, 'Task2KB', which is constructed using data crawled from WikiHow, an online knowledge resource offering instructional articles on a wide range of tasks. Task2KB encapsulates various types of task-related information and attributes, such as requirements, detailed step descriptions, and available methods to complete tasks. Due to its higher coverage compared to existing related knowledge graphs, Task2KB can be highly useful in the development of general purpose task completion assistants.
Submitted 24 January, 2023;
originally announced February 2023.
-
Deep Appearance Prefiltering
Authors:
Steve Bako,
Pradeep Sen,
Anton Kaplanyan
Abstract:
Physically based rendering of complex scenes can be prohibitively costly with a potentially unbounded and uneven distribution of complexity across the rendered image. The goal of an ideal level of detail (LoD) method is to make rendering costs independent of the 3D scene complexity, while preserving the appearance of the scene. However, current prefiltering LoD methods are limited in the appearances they can support due to their reliance on approximate models and other heuristics. We propose the first comprehensive multi-scale LoD framework for prefiltering 3D environments with complex geometry and materials (e.g., the Disney BRDF), while maintaining the appearance with respect to the ray-traced reference. Using a multi-scale hierarchy of the scene, we perform a data-driven prefiltering step to obtain an appearance phase function and directional coverage mask at each scale. At the heart of our approach is a novel neural representation that encodes this information into a compact latent form that is easy to decode inside a physically based renderer. Once a scene is baked out, our method requires no original geometry, materials, or textures at render time. We demonstrate that our approach compares favorably to state-of-the-art prefiltering methods and achieves considerable savings in memory for complex scenes.
Submitted 8 November, 2022;
originally announced November 2022.
-
A Closer Look at the Calibration of Differentially Private Learners
Authors:
Hanlin Zhang,
Xuechen Li,
Prithviraj Sen,
Salim Roukos,
Tatsunori Hashimoto
Abstract:
We systematically study the calibration of classifiers trained with differentially private stochastic gradient descent (DP-SGD) and observe miscalibration across a wide range of vision and language tasks. Our analysis identifies per-example gradient clipping in DP-SGD as a major cause of miscalibration, and we show that existing approaches for improving calibration with differential privacy only provide marginal improvements in calibration error while occasionally causing large degradations in accuracy. As a solution, we show that differentially private variants of post-processing calibration methods such as temperature scaling and Platt scaling are surprisingly effective and have negligible utility cost to the overall model. Across 7 tasks, temperature scaling and Platt scaling with DP-SGD result in an average 3.1-fold reduction in the in-domain expected calibration error and only incur at most a minor percent drop in accuracy.
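The two post-processing ingredients named above, expected calibration error (ECE) and temperature scaling, can be sketched as follows. This is a non-private illustration only: the paper's contribution is differentially private variants of the calibration step, which this sketch does not implement. The equal-width 10-bin ECE and the grid search over temperatures are common choices assumed here, not details taken from the paper.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - lse for z in scaled]


def softmax(logits, temperature=1.0):
    return [math.exp(v) for v in log_softmax(logits, temperature)]


def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width-bin ECE over the confidence of the top-1 prediction."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        conf = max(p)
        pred = p.index(conf)
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, pred == y))
    n = len(labels)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            acc = sum(ok for _, ok in b) / len(b)
            ece += len(b) / n * abs(avg_conf - acc)
    return ece


def fit_temperature(val_logits, val_labels, grid=None):
    """Choose the temperature minimising validation NLL (plain, non-private grid search)."""
    grid = grid or [0.5 + 0.1 * i for i in range(46)]  # 0.5 .. 5.0

    def nll(t):
        return -sum(log_softmax(z, t)[y] for z, y in zip(val_logits, val_labels))

    return min(grid, key=nll)
```

Because dividing all logits by the same positive temperature preserves their order, temperature scaling changes confidence but never the predicted class, which is consistent with the small accuracy cost reported above.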
Submitted 14 November, 2022; v1 submitted 15 October, 2022;
originally announced October 2022.
-
Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering
Authors:
Priyanka Sen,
Alham Fikri Aji,
Amir Saffari
Abstract:
We introduce Mintaka, a complex, natural, and multilingual dataset designed for experimenting with end-to-end question-answering models. Mintaka is composed of 20,000 question-answer pairs collected in English, annotated with Wikidata entities, and translated into Arabic, French, German, Hindi, Italian, Japanese, Portuguese, and Spanish for a total of 180,000 samples. Mintaka includes 8 types of complex questions, including superlative, intersection, and multi-hop questions, which were naturally elicited from crowd workers. We run baselines over Mintaka, the best of which achieves 38% hits@1 in English and 31% hits@1 multilingually, showing that existing models have room for improvement. We release Mintaka at https://github.com/amazon-research/mintaka.
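The baselines above are scored with hits@1. A minimal sketch of that metric, assuming each question has a ranked list of predicted answers and a set of gold answers (for Mintaka these would be Wikidata entities; the ids in the test are hypothetical placeholders):

```python
def hits_at_1(ranked_predictions, gold_answers):
    """Fraction of questions whose top-ranked prediction is one of the gold answers.

    ranked_predictions: one list of candidate answers per question, best first.
    gold_answers: one set of acceptable answers per question.
    An empty prediction list counts as a miss.
    """
    if len(ranked_predictions) != len(gold_answers):
        raise ValueError("need one prediction list per question")
    hits = sum(
        1
        for preds, gold in zip(ranked_predictions, gold_answers)
        if preds and preds[0] in gold
    )
    return hits / len(gold_answers)
```

Only the first-ranked answer matters, so a correct answer at rank two earns no credit under this metric.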
Submitted 4 October, 2022;
originally announced October 2022.
-
Terahertz Communications Can Work in Rain and Snow: Impact of Adverse Weather Conditions on Channels at 140 GHz
Authors:
Priyangshu Sen,
Jacob Hall,
Michele Polese,
Vitaly Petrov,
Duschia Bodet,
Francesco Restuccia,
Tommaso Melodia,
Josep M. Jornet
Abstract:
Next-generation wireless networks will leverage the spectrum above 100 GHz to enable ultra-high data rate communications over multi-GHz-wide bandwidths. The propagation environment at such high frequencies, however, introduces challenges throughout the whole protocol stack design, from physical layer signal processing to application design. Therefore, it is fundamental to develop a holistic understanding of the channel propagation and fading characteristics over realistic deployment scenarios and ultra-wide bands. In this paper, we conduct an extensive measurement campaign to evaluate the impact of weather conditions on a wireless link in the 130-150 GHz band through a channel sounding campaign with clear weather, rain, and snow in a typical urban backhaul scenario. We present a novel channel sounder design that captures signals with -82 dBm sensitivity and 20 GHz of bandwidth. We analyze link budget, capacity, as well as channel parameters such as the delay spread and the K-factor. Our experimental results indicate that in the considered context the adverse weather does not interrupt the link, but introduces some additional constraints (e.g., high delay spread and increase in path loss in snow conditions) that need to be accounted for in the design of reliable Sixth Generation (6G) communication links above 100 GHz.
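The link budget and capacity analysis mentioned above build on standard textbook formulas: Friis free-space path loss, a dB-domain power budget, and the Shannon capacity. The sketch below shows those baseline formulas only; it does not model the rain and snow effects the measurements study, which are folded into a hypothetical `extra_loss_db` term. Apart from the 140 GHz carrier, the 20 GHz bandwidth, and the -82 dBm sensitivity from the abstract, any numbers used with it are illustrative assumptions.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def fspl_db(distance_m, freq_hz):
    """Friis free-space path loss in dB: 20*log10(4*pi*d*f/c)."""
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT)


def received_power_dbm(tx_dbm, gain_tx_dbi, gain_rx_dbi, distance_m, freq_hz,
                       extra_loss_db=0.0):
    """Simple dB link budget; extra_loss_db is a stand-in for weather/atmospheric loss."""
    return (tx_dbm + gain_tx_dbi + gain_rx_dbi
            - fspl_db(distance_m, freq_hz) - extra_loss_db)


def link_margin_db(rx_dbm, sensitivity_dbm=-82.0):
    """Positive margin means the received signal is above the receiver sensitivity."""
    return rx_dbm - sensitivity_dbm


def shannon_capacity_bps(bandwidth_hz, snr_db):
    """Upper bound on error-free rate: B * log2(1 + SNR)."""
    return bandwidth_hz * math.log2(1 + 10 ** (snr_db / 10))
```

At 140 GHz, free-space loss is already about 75 dB at 1 m and grows by about 6 dB per doubling of distance, which is why directional antenna gain and any additional weather-induced loss weigh heavily in the budget.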
Submitted 29 August, 2022;
originally announced August 2022.
-
Interactive Segmentation and Visualization for Tiny Objects in Multi-megapixel Images
Authors:
Chengyuan Xu,
Boning Dong,
Noah Stier,
Curtis McCully,
D. Andrew Howell,
Pradeep Sen,
Tobias Höllerer
Abstract:
We introduce an interactive image segmentation and visualization framework for identifying, inspecting, and editing tiny objects (just a few pixels wide) in large multi-megapixel high-dynamic-range (HDR) images. Detecting cosmic rays (CRs) in astronomical observations is a cumbersome workflow that requires multiple tools, so we developed an interactive toolkit that unifies model inference, HDR image visualization, segmentation mask inspection and editing into a single graphical user interface. The feature set, initially designed for astronomical data, makes this work a useful research-supporting tool for human-in-the-loop tiny-object segmentation in scientific areas like biomedicine, materials science, remote sensing, etc., as well as computer vision. Our interface features mouse-controlled, synchronized, dual-window visualization of the image and the segmentation mask, a critical feature for locating tiny objects in multi-megapixel images. The browser-based tool can be readily hosted on the web to provide multi-user access and GPU acceleration for any device. The toolkit can also be used as a high-precision annotation tool, or adapted as the frontend for an interactive machine learning framework. Our open-source dataset, CR detection model, and visualization toolkit are available at https://github.com/cy-xu/cosmic-conn.
Submitted 21 April, 2022;
originally announced April 2022.
-
Centralised multi link measurement compression with side information
Authors:
Sayantan Chakraborty,
Arun Padakandla,
Pranab Sen
Abstract:
We prove new one shot achievability results for measurement compression of quantum instruments with side information at the receiver. Unlike previous one shot results for this problem, our one shot bounds are nearly optimal and do not need catalytic randomness. In fact, we state a more general problem called centralised multi link measurement compression with quantum side information and provide one shot achievability results for it. As a simple corollary, we obtain one shot measurement compression results for quantum instruments with side information that we mentioned earlier. All our one shot results lead to the standard results for this problem in the asymptotic iid setting. We prove our achievability bounds by first proving a novel sequential classical quantum multipartite covering lemma, which should be of independent interest.
Submitted 30 March, 2022;
originally announced March 2022.
-
Neuro-Symbolic Inductive Logic Programming with Logical Neural Networks
Authors:
Prithviraj Sen,
Breno W. S. R. de Carvalho,
Ryan Riegel,
Alexander Gray
Abstract:
Recent work on neuro-symbolic inductive logic programming has led to promising approaches that can learn explanatory rules from noisy, real-world data. While some proposals approximate logical operators with differentiable operators from fuzzy or real-valued logic that are parameter-free thus diminishing their capacity to fit the data, other approaches are only loosely based on logic making it difficult to interpret the learned "rules". In this paper, we propose learning rules with the recently proposed logical neural networks (LNN). Compared to others, LNNs offer strong connection to classical Boolean logic thus allowing for precise interpretation of learned rules while harboring parameters that can be trained with gradient-based optimization to effectively fit the data. We extend LNNs to induce rules in first-order logic. Our experiments on standard benchmarking tasks confirm that LNN rules are highly interpretable and can achieve comparable or higher accuracy due to their flexible parameterization.
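To make the "parameters plus connection to Boolean logic" point concrete, the sketch below uses the weighted Łukasiewicz-style conjunction and disjunction associated with logical neural networks, with a bias `beta` and per-input weights. It is an illustrative assumption about the operator form, not the paper's code; in particular, the learning of `beta` and the weights by gradient descent, and any constraints placed on them, are omitted here.

```python
def clamp01(v):
    """Clip a real value to the truth-value range [0, 1]."""
    return max(0.0, min(1.0, v))


def lnn_and(xs, weights, beta):
    """Weighted Lukasiewicz-style AND: clamp(beta - sum_i w_i * (1 - x_i))."""
    return clamp01(beta - sum(w * (1.0 - x) for x, w in zip(xs, weights)))


def lnn_or(xs, weights, beta):
    """Weighted Lukasiewicz-style OR: clamp(1 - beta + sum_i w_i * x_i)."""
    return clamp01(1.0 - beta + sum(w * x for x, w in zip(xs, weights)))
```

With `beta = 1` and unit weights, both operators reproduce the classical truth tables on inputs in {0, 1}, which is what keeps a learned rule interpretable. Weights below 1 soften an input's influence. For example, a low weight on a weak second conjunct leaves the conjunction close to the first input's truth value.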
Submitted 6 December, 2021;
originally announced December 2021.
-
VoRTX: Volumetric 3D Reconstruction With Transformers for Voxelwise View Selection and Fusion
Authors:
Noah Stier,
Alexander Rich,
Pradeep Sen,
Tobias Höllerer
Abstract:
Recent volumetric 3D reconstruction methods can produce very accurate results, with plausible geometry even for unobserved surfaces. However, they face an undesirable trade-off when it comes to multi-view fusion. They can fuse all available view information by global averaging, thus losing fine detail, or they can heuristically cluster views for local fusion, thus restricting their ability to consider all views jointly. Our key insight is that greater detail can be retained without restricting view diversity by learning a view-fusion function conditioned on camera pose and image content. We propose to learn this multi-view fusion using a transformer. To this end, we introduce VoRTX, an end-to-end volumetric 3D reconstruction network using transformers for wide-baseline, multi-view feature fusion. Our model is occlusion-aware, leveraging the transformer architecture to predict an initial, projective scene geometry estimate. This estimate is used to avoid backprojecting image features through surfaces into occluded regions. We train our model on ScanNet and show that it produces better reconstructions than state-of-the-art methods. We also demonstrate generalization without any fine-tuning, outperforming the same state-of-the-art methods on two other datasets, TUM-RGBD and ICL-NUIM.
Submitted 30 November, 2021;
originally announced December 2021.
-
3DVNet: Multi-View Depth Prediction and Volumetric Refinement
Authors:
Alexander Rich,
Noah Stier,
Pradeep Sen,
Tobias Höllerer
Abstract:
We present 3DVNet, a novel multi-view stereo (MVS) depth-prediction method that combines the advantages of previous depth-based and volumetric MVS approaches. Our key idea is the use of a 3D scene-modeling network that iteratively updates a set of coarse depth predictions, resulting in highly accurate predictions which agree on the underlying scene geometry. Unlike existing depth-prediction techniques, our method uses a volumetric 3D convolutional neural network (CNN) that operates in world space on all depth maps jointly. The network can therefore learn meaningful scene-level priors. Furthermore, unlike existing volumetric MVS techniques, our 3D CNN operates on a feature-augmented point cloud, allowing for effective aggregation of multi-view information and flexible iterative refinement of depth maps. Experimental results show our method exceeds state-of-the-art accuracy in both depth prediction and 3D reconstruction metrics on the ScanNet dataset, as well as a selection of scenes from the TUM-RGBD and ICL-NUIM datasets. This shows that our method is both effective and generalizes to new settings.
Submitted 30 November, 2021;
originally announced December 2021.
-
Sparse Fusion for Multimodal Transformers
Authors:
Yi Ding,
Alex Rich,
Mason Wang,
Noah Stier,
Matthew Turk,
Pradeep Sen,
Tobias Höllerer
Abstract:
Multimodal classification is a core task in human-centric machine learning. We observe that information is highly complementary across modalities, thus unimodal information can be drastically sparsified prior to multimodal fusion without loss of accuracy. To this end, we present Sparse Fusion Transformers (SFT), a novel multimodal fusion method for transformers that performs comparably to existing state-of-the-art methods while having greatly reduced memory footprint and computation cost. Key to our idea is a sparse-pooling block that reduces unimodal token sets prior to cross-modality modeling. Evaluations are conducted on multiple multimodal benchmark datasets for a wide range of classification tasks. State-of-the-art performance is obtained on multiple benchmarks under similar experiment conditions, while reporting up to six-fold reduction in computational cost and memory requirements. Extensive ablation studies showcase our benefits of combining sparsification and multimodal learning over naive approaches. This paves the way for enabling multimodal learning on low-resource devices.
Submitted 24 November, 2021; v1 submitted 23 November, 2021;
originally announced November 2021.
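The core idea, shrinking each modality's token set before any cross-modal attention, can be illustrated with a simple top-k pooling. This is a minimal sketch only: scoring tokens by their L2 norm is a hypothetical stand-in and is not claimed to be the scoring used by SFT's sparse-pooling block.

```python
import numpy as np

def topk_token_pool(tokens, k):
    """Keep the k tokens with the largest L2 norm, preserving their order.

    tokens: (n, d) array of unimodal token embeddings.
    The norm-based score is a hypothetical choice for illustration.
    """
    scores = np.linalg.norm(tokens, axis=1)
    keep = np.sort(np.argsort(-scores)[:k])
    return tokens[keep]

def sparse_fuse(modalities, k):
    """Sparsify each modality separately, then concatenate for cross-modal layers."""
    return np.concatenate([topk_token_pool(t, k) for t in modalities], axis=0)
```

Since self-attention cost grows quadratically with sequence length, fusing two modalities of n tokens each after pooling to k tokens reduces the cross-modal attention from (2n)^2 to (2k)^2 pairwise scores.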
-
Multi-Objective Few-shot Learning for Fair Classification
Authors:
Ishani Mondal,
Procheta Sen,
Debasis Ganguly
Abstract:
In this paper, we propose a general framework for mitigating the disparities of the predicted classes with respect to secondary attributes within the data (e.g., race, gender, etc.). Our proposed method involves learning a multi-objective function that, in addition to learning the primary objective of predicting the primary class labels from the data, also employs a clustering-based heuristic to minimize the disparities of the class label distribution with respect to the cluster memberships, with the assumption that each cluster should ideally map to a distinct combination of attribute values. Experiments demonstrate effective mitigation of cognitive biases on a benchmark dataset without the use of annotations of secondary attribute values (the zero-shot case) or with the use of a small number of attribute value annotations (the few-shot case).
Submitted 5 October, 2021;
originally announced October 2021.
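A minimal sketch of such a multi-objective loss: cross-entropy on the primary labels plus a penalty on how much the mean predicted class distribution differs across clusters. The cluster ids are assumed to come from some prior clustering step; the specific disparity measure (largest L1 gap to the overall mean) and the weight `lam` are hypothetical choices, not the paper's formulation.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true labels."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def cluster_disparity(probs, clusters):
    """Largest L1 distance between a cluster's mean prediction and the overall mean.

    probs: (n, c) predicted class probabilities; clusters: (n,) cluster ids.
    """
    overall = probs.mean(axis=0)
    gaps = [np.abs(probs[clusters == g].mean(axis=0) - overall).sum()
            for g in np.unique(clusters)]
    return max(gaps)

def multi_objective_loss(probs, labels, clusters, lam=1.0):
    # Primary objective plus a weighted fairness penalty (lam is hypothetical).
    return cross_entropy(probs, labels) + lam * cluster_disparity(probs, clusters)
```

If predictions are independent of cluster membership the penalty is zero; if one cluster receives systematically different labels, the penalty grows.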
-
Combining Rules and Embeddings via Neuro-Symbolic AI for Knowledge Base Completion
Authors:
Prithviraj Sen,
Breno W. S. R. Carvalho,
Ibrahim Abdelaziz,
Pavan Kapanipathi,
Francois Luus,
Salim Roukos,
Alexander Gray
Abstract:
Recent interest in Knowledge Base Completion (KBC) has led to a plethora of approaches based on reinforcement learning, inductive logic programming and graph embeddings. In particular, rule-based KBC has led to interpretable rules while being comparable in performance with graph embeddings. Even within rule-based KBC, there exist different approaches that lead to rules of varying quality, and previous work has not always been precise in highlighting these differences. Another issue that plagues most rule-based KBC is the non-uniformity of relation paths: some relation sequences occur in very few paths while others appear very frequently. In this paper, we show that not all rule-based KBC models are the same and propose two distinct approaches that learn, respectively, 1) a mixture of relations and 2) a mixture of paths. When implemented on top of neuro-symbolic AI, which learns rules by extending Boolean logic to real-valued logic, the latter model leads to superior KBC accuracy, outperforming state-of-the-art rule-based KBC by 2-10% in terms of mean reciprocal rank. Furthermore, to address the non-uniformity of relation paths, we combine rule-based KBC with graph embeddings, thus improving our results even further and achieving the best of both worlds.
Submitted 16 September, 2021;
originally announced September 2021.
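The gains above are reported in mean reciprocal rank (MRR). For reference, a minimal sketch of the raw (unfiltered) metric: each query contributes the reciprocal of the rank of its first correct answer. KBC benchmarks commonly report a filtered variant that first removes other known-true answers from the ranking, which this sketch omits.

```python
def mean_reciprocal_rank(ranked_lists, answers):
    """Raw MRR: average over queries of 1/rank of the first correct entity (0 if absent).

    ranked_lists: list of candidate-entity lists, best first.
    answers: list of sets of gold entities, one per query.
    """
    total = 0.0
    for ranking, gold in zip(ranked_lists, answers):
        for rank, entity in enumerate(ranking, start=1):
            if entity in gold:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)
```

For example, one query answered at rank 1 and another at rank 2 give an MRR of (1 + 1/2) / 2 = 0.75.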
-
End-to-End Entity Resolution and Question Answering Using Differentiable Knowledge Graphs
Authors:
Armin Oliya,
Amir Saffari,
Priyanka Sen,
Tom Ayoola
Abstract:
Recently, end-to-end (E2E) trained models for question answering over knowledge graphs (KGQA) have delivered promising results using only a weakly supervised dataset. However, these models are trained and evaluated in a setting where hand-annotated question entities are supplied to the model, leaving the important and non-trivial task of entity resolution (ER) outside the scope of E2E learning. In this work, we extend the boundaries of E2E learning for KGQA to include the training of an ER component. Our model only needs the question text and the answer entities to train, and delivers a stand-alone QA model that does not require an additional ER component to be supplied during runtime. Our approach is fully differentiable, thanks to its reliance on a recent method for building differentiable KGs (Cohen et al., 2020). We evaluate our E2E trained model on two public datasets and show that it comes close to baseline models that use hand-annotated entities.
Submitted 13 September, 2021;
originally announced September 2021.
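What makes an E2E pipeline with an ER component trainable is that entity sets are represented as weight vectors and relation following is a matrix product, so gradients flow back to whatever produces the initial entity weights. A minimal dense sketch, assuming a toy per-relation adjacency tensor (a real KG would need sparse matrices, and this is not the exact construction of Cohen et al., 2020):

```python
import numpy as np

def follow(x, r, adj):
    """Differentiable relation following over a toy KG.

    x:   (n_entities,) soft set of current entities, e.g. ER-component output.
    r:   (n_relations,) soft choice of relation, e.g. a softmax output.
    adj: (n_relations, n_entities, n_entities); adj[k, i, j] = 1 iff the
         triple (entity i, relation k, entity j) is in the KG.
    """
    # Mix the per-relation adjacency matrices, then propagate entity weights.
    return x @ np.tensordot(r, adj, axes=1)
```

Every step is linear in `x` and `r`, so a loss on the answer entities can be backpropagated into both the relation predictor and the entity-resolution scores.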
-
Expanding End-to-End Question Answering on Differentiable Knowledge Graphs with Intersection
Authors:
Priyanka Sen,
Amir Saffari,
Armin Oliya
Abstract:
End-to-end question answering using a differentiable knowledge graph is a promising technique that requires only weak supervision, produces interpretable results, and is fully differentiable. Previous implementations of this technique (Cohen et al., 2020) have focused on single-entity questions using a relation following operation. In this paper, we propose a model that explicitly handles multiple-entity questions by implementing a new intersection operation, which identifies the shared elements between two sets of entities. We find that introducing intersection improves performance over a baseline model on two datasets, WebQuestionsSP (69.6% to 73.3% Hits@1) and ComplexWebQuestions (39.8% to 48.7% Hits@1), and in particular, improves performance on questions with multiple entities by over 14% on WebQuestionsSP and by 19% on ComplexWebQuestions.
Submitted 13 September, 2021;
originally announced September 2021.
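For a two-entity question such as "which film did X direct and Y star in?", one would follow a relation from each entity and keep only the entities reached from both. The abstract does not state how intersection is computed on soft (weighted) entity sets; the elementwise minimum below is one assumption (an elementwise product is another). A Hits@1 helper is included since that is the reported metric.

```python
import numpy as np

def intersect(x1, x2):
    """Soft intersection of two weighted entity sets (elementwise minimum, assumed)."""
    return np.minimum(x1, x2)

def hits_at_1(scores, gold):
    """1.0 if the top-scoring entity is a gold answer, else 0.0."""
    return float(int(np.argmax(scores)) in gold)
```

An entity reached strongly from only one side keeps a low weight after intersection, which is what lets the model discard answers that satisfy just one constraint.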
-
Cosmic-CoNN: A Cosmic Ray Detection Deep-Learning Framework, Dataset, and Toolkit
Authors:
Chengyuan Xu,
Curtis McCully,
Boning Dong,
D. Andrew Howell,
Pradeep Sen
Abstract:
Rejecting cosmic rays (CRs) is essential for the scientific interpretation of CCD-captured data, but detecting CRs in single-exposure images has remained challenging. Conventional CR detectors require experimental parameter tuning for different instruments, and recent deep learning methods only produce instrument-specific models that suffer from performance loss on telescopes not included in the training data. We present Cosmic-CoNN, a generic CR detector deployed for 24 telescopes at the Las Cumbres Observatory, which is made possible by the three contributions in this work: 1) We build a large and diverse ground-based CR dataset leveraging thousands of images from a global telescope network. 2) We propose a novel loss function and a neural network optimized for telescope imaging data to train generic CR detection models. At 95% recall, our model achieves a precision of 93.70% on Las Cumbres imaging data and maintains consistent performance on new ground-based instruments never used for training. Specifically, the Cosmic-CoNN model trained on the Las Cumbres CR dataset maintains high precision of 92.03% and 96.69% on Gemini GMOS-N/S 1x1 and 2x2 binning images, respectively. 3) We build a suite of tools including an interactive CR mask visualization and editing interface, console commands, and Python APIs to make automatic, robust CR detection widely accessible by the community of astronomers. Our dataset, open-source codebase, and trained models are available at https://github.com/cy-xu/cosmic-conn.
Submitted 6 October, 2022; v1 submitted 28 June, 2021;
originally announced June 2021.
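The headline numbers are precision at a fixed 95% recall. A minimal sketch of how that operating point can be read off per-pixel CR probabilities and a ground-truth mask (both flattened to 1-D and assumed to contain at least one CR pixel); this is a generic metric computation, not part of the Cosmic-CoNN toolkit API.

```python
import numpy as np

def precision_at_recall(scores, labels, target_recall=0.95):
    """Precision at the highest threshold whose recall reaches target_recall.

    scores: per-pixel CR probabilities; labels: 1 for CR pixels, 0 otherwise.
    """
    order = np.argsort(-scores)
    hits = labels[order]
    tp = np.cumsum(hits)                        # true positives keeping the top-i pixels
    recall = tp / hits.sum()
    precision = tp / np.arange(1, len(hits) + 1)
    idx = np.argmax(recall >= target_recall)    # first cutoff meeting the target
    return precision[idx]
```

Fixing recall first and then reporting precision makes detectors comparable even when their raw probability scales differ.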
-
LNN-EL: A Neuro-Symbolic Approach to Short-text Entity Linking
Authors:
Hang Jiang,
Sairam Gurajada,
Qiuhao Lu,
Sumit Neelam,
Lucian Popa,
Prithviraj Sen,
Yunyao Li,
Alexander Gray
Abstract:
Entity linking (EL), the task of disambiguating mentions in text by linking them to entities in a knowledge graph, is crucial for text understanding, question answering or conversational systems. Entity linking on short text (e.g., a single sentence or question) poses particular challenges due to limited context. While prior approaches use either heuristics or black-box neural methods, here we propose LNN-EL, a neuro-symbolic approach that combines the advantages of using interpretable rules based on first-order logic with the performance of neural learning. Even though constrained to using rules, LNN-EL performs competitively against SotA black-box neural approaches, with the added benefits of extensibility and transferability. In particular, we show that we can easily blend existing rule templates given by a human expert with multiple types of features (priors, BERT encodings, box embeddings, etc.), and even scores resulting from previous EL methods, thus improving on such methods. For instance, on the LC-QuAD-1.0 dataset, we show more than a 4% increase in F1 score over the previous SotA. Finally, we show that the inductive bias offered by using logic results in learned rules that transfer well across datasets, even without fine-tuning, while maintaining high accuracy.
Submitted 17 June, 2021;
originally announced June 2021.
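Rules in this style are evaluated with real-valued logic rather than Boolean logic. A minimal sketch of a weighted Łukasiewicz-style conjunction of the kind Logical Neural Networks build on; the rule, the feature names and the weights below are hypothetical and only illustrate how soft features combine into a linking score.

```python
import numpy as np

def weighted_and(truths, weights, beta=1.0):
    """Weighted Lukasiewicz-style conjunction over truth values in [0, 1].

    Each input pulls the output down in proportion to its weight times its
    distance from fully true; the result is clipped to [0, 1].
    """
    truths = np.asarray(truths, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.clip(beta - np.sum(weights * (1.0 - truths)), 0.0, 1.0))

# Hypothetical rule: link(mention, entity) <- name_sim AND context_sim AND prior
features = {"name_sim": 0.9, "context_sim": 0.8, "prior": 0.6}
score = weighted_and(list(features.values()), weights=[1.0, 0.5, 0.5])
```

With unit weights and beta = 1, two inputs reduce to the standard Łukasiewicz t-norm max(0, x + y - 1), so Boolean behaviour is recovered at truth values 0 and 1, while learned weights let less reliable features count for less.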
-
Variational Quantum Classifiers Through the Lens of the Hessian
Authors:
Pinaki Sen,
Amandeep Singh Bhatia,
Kamalpreet Singh Bhangu,
Ahmed Elbeltagi
Abstract:
In quantum computing, variational quantum algorithms (VQAs) are well suited for finding optimal combinations of things in specific applications ranging from chemistry all the way to finance. Training VQAs with gradient-descent optimization has shown good convergence. At this early stage, simulating variational quantum circuits on noisy intermediate-scale quantum (NISQ) devices suffers from noisy outputs. Just like classical deep learning, it also suffers from vanishing gradients. It is therefore a realistic goal to study the topology of the loss landscape and to visualize the curvature information and trainability of these circuits in the presence of vanishing gradients. In this paper, we calculate the Hessian and visualize the loss landscape of variational quantum classifiers (VQCs) at different points in parameter space. We interpret the curvature information of VQCs and show the convergence of the loss function. This helps us better understand the behavior of variational quantum circuits so as to tackle optimization problems efficiently. We investigate VQCs via the Hessian on quantum computers, starting with a simple 4-bit parity problem to gain insight into the practical behavior of the Hessian, and then thoroughly analyze the behavior of the Hessian's eigenvalues when training a VQC on the Diabetes dataset. Finally, we show how an adaptive Hessian learning rate can influence convergence while training the variational circuits.
Submitted 24 December, 2021; v1 submitted 21 May, 2021;
originally announced May 2021.
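On quantum hardware the Hessian of a circuit expectation can be estimated with the second-order parameter-shift rule, which needs only shifted circuit evaluations. The sketch below replaces the circuit with a toy two-qubit product circuit whose expectation has a closed form, so it runs without any quantum SDK; it is not the paper's classifier or experimental setup.

```python
import numpy as np

def expectation(theta):
    """Toy circuit: RY(theta[0]) and RY(theta[1]) on |00>, measure Z (x) Z.

    For this product circuit the expectation is exactly cos(theta0) * cos(theta1).
    """
    return np.cos(theta[0]) * np.cos(theta[1])

def parameter_shift_hessian(f, theta, s=np.pi / 2):
    """Hessian via the second-order parameter-shift rule.

    Exact for rotation gates generated by Pauli operators (shift s = pi/2);
    each entry uses four evaluations of the expectation.
    """
    n = len(theta)
    H = np.zeros((n, n))
    eye = np.eye(n)
    for i in range(n):
        for j in range(i, n):
            ei, ej = s * eye[i], s * eye[j]
            H[i, j] = (f(theta + ei + ej) - f(theta + ei - ej)
                       - f(theta - ei + ej) + f(theta - ei - ej)) / 4.0
            H[j, i] = H[i, j]
    return H

theta = np.array([0.3, 1.1])
H = parameter_shift_hessian(expectation, theta)
eigs = np.linalg.eigvalsh(H)  # curvature along the principal directions
```

The eigenvalues summarise local curvature: near-zero values indicate flat directions of the kind associated with vanishing gradients, and the largest eigenvalue bounds how large a stable step size can be.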
-
One-shot inner bounds for sending private classical information over a quantum MAC
Authors:
Sayantan Chakraborty,
Aditya Nema,
Pranab Sen
Abstract:
We provide the first inner bounds for sending private classical information over a quantum multiple access channel. We do so by using three powerful information-theoretic techniques: rate splitting, quantum simultaneous decoding for multiple access channels, and a novel smoothed distributed covering lemma for classical-quantum channels. Our inner bounds are given in the one-shot setting, and accordingly the three techniques used are all very recent ones specifically designed to work in this setting. The last technique is new to this work and is our main technical advancement. For the asymptotic iid setting, our one-shot inner bounds lead to the natural quantum analogue of the best classical inner bounds for this problem.
Submitted 13 May, 2021;
originally announced May 2021.
-
Noise-Aware Video Saliency Prediction
Authors:
Ekta Prashnani,
Orazio Gallo,
Joohwan Kim,
Josef Spjut,
Pradeep Sen,
Iuri Frosio
Abstract:
We tackle the problem of predicting saliency maps for videos of dynamic scenes. We note that the accuracy of the maps reconstructed from the gaze data of a fixed number of observers varies with the frame, as it depends on the content of the scene. This issue is particularly pressing when a limited number of observers are available. In such cases, directly minimizing the discrepancy between the predicted and measured saliency maps, as traditional deep-learning methods do, results in overfitting to the noisy data. We propose a noise-aware training (NAT) paradigm that quantifies and accounts for the uncertainty arising from frame-specific gaze data inaccuracy. We show that NAT is especially advantageous when limited training data is available, with experiments across different models, loss functions, and datasets. We also introduce a video game-based saliency dataset, with rich temporal semantics, and multiple gaze attractors per frame. The dataset and source code are available at https://github.com/NVlabs/NAT-saliency.
Submitted 22 November, 2021; v1 submitted 16 April, 2021;
originally announced April 2021.
-
Deep Indexed Active Learning for Matching Heterogeneous Entity Representations
Authors:
Arjit Jain,
Sunita Sarawagi,
Prithviraj Sen
Abstract:
Given two large lists of records, the task in entity resolution (ER) is to find the pairs from the Cartesian product of the lists that correspond to the same real-world entity. Typically, passive learning methods on such tasks require large amounts of labeled data to yield useful models. Active learning is a promising approach for ER in low-resource settings. However, the search space of informative samples for the user to label grows quadratically for instance-pair tasks, making active learning hard to scale. Previous works in this setting rely on hand-crafted predicates, pre-trained language model embeddings, or rule learning to prune away unlikely pairs from the Cartesian product. This blocking step can miss important regions in the product space, leading to low recall. We propose DIAL, a scalable active learning approach that jointly learns embeddings to maximize recall for blocking and accuracy for matching blocked pairs. DIAL uses an Index-By-Committee framework, where each committee member learns representations based on powerful pre-trained transformer language models. We highlight surprising differences between the matcher and the blocker in the creation of the training data and the objective used to train their parameters. Experiments on five benchmark datasets and a multilingual record matching dataset show the effectiveness of our approach in terms of precision, recall and running time. Code is available at https://github.com/ArjitJ/DIAL.
Submitted 17 January, 2022; v1 submitted 8 April, 2021;
originally announced April 2021.
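Embedding-based blocking replaces the full Cartesian product with a short candidate list per record, and blocking recall measures how many true matches survive that pruning. A minimal sketch, assuming record embeddings are already given (in DIAL they are learned by transformer-based committee members); brute-force cosine similarity stands in here for the approximate nearest-neighbour index a scalable system would use.

```python
import numpy as np

def block_by_embedding(left, right, k):
    """Candidate pairs: each left record with its k most cosine-similar right records."""
    ln = left / np.linalg.norm(left, axis=1, keepdims=True)
    rn = right / np.linalg.norm(right, axis=1, keepdims=True)
    sims = ln @ rn.T                          # (n_left, n_right) cosine similarities
    top = np.argsort(-sims, axis=1)[:, :k]
    return {(i, int(j)) for i in range(len(left)) for j in top[i]}

def blocking_recall(candidates, gold_pairs):
    """Fraction of true matches that survive blocking."""
    return len(candidates & gold_pairs) / len(gold_pairs)
```

The matcher only ever sees pairs in `candidates`, so any gold pair dropped here is lost regardless of matcher accuracy; this is why the blocker is trained for recall.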
-
Logic Embeddings for Complex Query Answering
Authors:
Francois Luus,
Prithviraj Sen,
Pavan Kapanipathi,
Ryan Riegel,
Ndivhuwo Makondo,
Thabang Lebese,
Alexander Gray
Abstract:
Answering logical queries over incomplete knowledge bases is challenging because: 1) it calls for implicit link prediction, and 2) brute-force answering of existential first-order logic queries is exponential in the number of existential variables. Recent work on query embeddings provides fast querying, but most approaches model set logic with closed regions and so lack negation. Query embeddings that do support negation use densities that suffer from drawbacks: they 1) only improvise logic, 2) use expensive distributions, and 3) poorly model answer uncertainty. In this paper, we propose Logic Embeddings, a new approach to embedding complex queries that uses Skolemisation to eliminate existential variables for efficient querying. It supports negation, but improves on density approaches: it 1) integrates well-studied t-norm logic and directly evaluates satisfiability, 2) simplifies modeling with truth values, and 3) models uncertainty with truth bounds. Logic Embeddings are competitively fast and accurate in query answering over large, incomplete knowledge graphs, outperform on negation queries, and in particular, provide improved modeling of answer uncertainty as evidenced by a superior correlation between answer set size and embedding entropy.
Submitted 28 February, 2021;
originally announced March 2021.
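Truth bounds make negation and uncertainty straightforward: a truth value is an interval [lower, upper], and monotone t-norm connectives map bounds to bounds. A generic sketch using the product t-norm, its dual probabilistic sum, and standard negation 1 - x; these particular connectives are an illustrative choice, and the sketch does not implement Skolemisation or the learned embeddings.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Truth:
    """A truth value known only to lie within [lower, upper], 0 <= lower <= upper <= 1."""
    lower: float
    upper: float

    def __and__(self, other):
        # The product t-norm is monotone, so it maps bounds to bounds.
        return Truth(self.lower * other.lower, self.upper * other.upper)

    def __or__(self, other):
        # Probabilistic sum: the t-conorm dual to the product t-norm.
        return Truth(self.lower + other.lower - self.lower * other.lower,
                     self.upper + other.upper - self.upper * other.upper)

    def __invert__(self):
        # Standard negation 1 - x flips and swaps the bounds.
        return Truth(1.0 - self.upper, 1.0 - self.lower)

    def width(self):
        """Gap between the bounds: a simple measure of answer uncertainty."""
        return self.upper - self.lower
```

Unlike closed-region embeddings, negating an interval is just another interval, which is why this representation supports negation directly.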
-
One-shot multi-sender decoupling and simultaneous decoding for the quantum MAC
Authors:
Sayantan Chakraborty,
Aditya Nema,
Pranab Sen
Abstract:
In this work, we prove a novel one-shot multi-sender decoupling theorem generalising Dupuis' result. We start off with a multipartite quantum state, say on A1 A2 R, where A1, A2 are treated as the two sender systems and R is the reference system. We apply independent Haar random unitaries in tensor product on A1 and A2 and then send the resulting systems through a quantum channel. We want the channel output B to be almost in tensor product with the untouched reference R. Our main result shows that this is indeed the case if suitable entropic conditions are met. An immediate application of our main result is to obtain a one-shot simultaneous decoder for sending quantum information over a k-sender entanglement unassisted quantum multiple access channel (QMAC). The rate region achieved by this decoder is the natural one-shot quantum analogue of the pentagonal classical rate region. Assuming a simultaneous smoothing conjecture, this one-shot rate region approaches the optimal rate region of Yard, Devetak and Hayden in the asymptotic iid limit. Our work is the first one to obtain a non-trivial simultaneous decoder for the QMAC with limited entanglement assistance in both one-shot and asymptotic iid settings; previous works used unlimited entanglement assistance.
Submitted 19 February, 2021; v1 submitted 3 February, 2021;
originally announced February 2021.
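For reference, the pentagonal classical region mentioned above is, for two senders and a fixed product input distribution p(x_1)p(x_2), the set of rate pairs satisfying the three constraints below; the one-shot quantum analogue replaces these mutual informations with one-shot entropic quantities, which are not reproduced here.

```latex
\begin{aligned}
R_1       &\le I(X_1; Y \mid X_2), \\
R_2       &\le I(X_2; Y \mid X_1), \\
R_1 + R_2 &\le I(X_1 X_2; Y).
\end{aligned}
```

The two single-rate constraints and the sum-rate constraint cut a pentagon out of the positive quadrant; a simultaneous decoder reaches every point of it without time sharing.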
-
Novel one-shot inner bounds for unassisted fully quantum channels via rate splitting
Authors:
Sayantan Chakraborty,
Aditya Nema,
Pranab Sen
Abstract:
We prove the first non-trivial one-shot inner bounds for sending quantum information over an entanglement unassisted two-sender quantum multiple access channel (QMAC) and an unassisted two-sender two-receiver quantum interference channel (QIC). Previous works only studied the unassisted QMAC in the limit of many independent and identical uses of the channel, also known as the asymptotic iid limit, and did not study the unassisted QIC at all. We employ two techniques, rate splitting and successive cancellation, in order to obtain our inner bound. Rate splitting was earlier used to obtain inner bounds, avoiding time sharing, for classical channels in the asymptotic iid setting. Our main technical contribution is to extend rate splitting from the classical asymptotic iid setting to the quantum one-shot setting. In the asymptotic iid limit our one-shot inner bound for the QMAC approaches the rate region of Yard, Devetak and Hayden. For the QIC we get novel non-trivial rate regions in the asymptotic iid setting. All our results also extend to the case where limited entanglement assistance is provided, in both one-shot and asymptotic iid settings. The limited entanglement results for the one-shot setting are new for both the QMAC and the QIC. For the QIC the limited entanglement results are new even in the asymptotic iid setting.
Submitted 25 March, 2024; v1 submitted 2 February, 2021;
originally announced February 2021.
-
Binary TTC: A Temporal Geofence for Autonomous Navigation
Authors:
Abhishek Badki,
Orazio Gallo,
Jan Kautz,
Pradeep Sen
Abstract:
Time-to-contact (TTC), the time for an object to collide with the observer's plane, is a powerful tool for path planning: it is potentially more informative than the depth, velocity, and acceleration of objects in the scene -- even for humans. TTC presents several advantages, including requiring only a monocular, uncalibrated camera. However, regressing TTC for each pixel is not straightforward, and most existing methods make over-simplifying assumptions about the scene. We address this challenge by estimating TTC via a series of simpler, binary classifications. We predict with low latency whether the observer will collide with an obstacle within a certain time, which is often more critical than knowing exact, per-pixel TTC. For such scenarios, our method offers a temporal geofence in 6.4 ms -- over 25x faster than existing methods. Our approach can also estimate per-pixel TTC with arbitrarily fine quantization (including continuous values), when the computational budget allows for it. To the best of our knowledge, our method is the first to offer TTC information (binary or coarsely quantized) at sufficiently high frame-rates for practical use.
Submitted 28 April, 2021; v1 submitted 12 January, 2021;
originally announced January 2021.
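Why a monocular, uncalibrated camera suffices, and why a binary "contact within tau seconds?" query is simpler than regressing TTC, can be seen from the classical scale-change relation: apparent size is proportional to 1/depth, so under constant approach speed an object whose image grows by a factor alpha over dt seconds has TTC = dt / (alpha - 1). The sketch below uses that geometric relation on a single object's scale ratio; it is not the paper's learned per-pixel classifier.

```python
def ttc_from_scale(scale_ratio, dt):
    """Time to contact from the frame-to-frame image scale change of an object.

    scale_ratio: size_now / size_before (> 1 for an approaching object).
    dt: time between the two frames, in seconds.
    Assumes constant approach speed, so scale_ratio - 1 = dt / TTC.
    """
    if scale_ratio <= 1.0:
        return float("inf")  # not approaching: no contact predicted
    return dt / (scale_ratio - 1.0)

def within_geofence(scale_ratio, dt, tau):
    """Binary query: will contact happen within tau seconds?

    Equivalent to thresholding the scale ratio at 1 + dt / tau,
    without computing TTC itself.
    """
    return scale_ratio > 1.0 + dt / tau
```

No focal length or metric depth appears anywhere, and the binary test reduces to a single comparison per query, which mirrors the accuracy-for-latency trade the abstract describes.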
-
Reduced Graphene Oxide Tattoo as Wearable Proximity Sensor
Authors:
Vaishakh Kedambaimoole,
Neelotpala Kumar,
Vijay Shirhatti,
Suresh Nuthalapati,
Saurabh Kumar,
Mangalore Manjunatha Nayak,
Prosenjit Sen,
Deji Akinwande,
Konandur Rajanna
Abstract:
The human body is punctuated with a wide array of sensory systems that provide a high evolutionary advantage by facilitating the formation of a detailed picture of the immediate surroundings. These sensors span a wide spectrum, from non-contact audio-visual input to contact-based input via pressure and temperature. The ambit of sensing can be extended further by imparting to the body increased non-contact sensing capability through the phenomenon of electrostatics. Here we present a graphene-based tattoo sensor for proximity sensing, employing the principle of electrostatic gating. The sensor shows a remarkable change in resistance upon exposure to objects carrying static charge. Compared to prior work in this field, the sensor has demonstrated the highest recorded proximity detection range of 20 cm. It is ultra-thin, highly skin-conformal and comes with a facile transfer process, so that, unlike other graphene-based proximity sensors reported before, it can be tattooed onto highly curvilinear, rough substrates like human skin. The present work details the operation of the wearable proximity sensor while exploring the effect of the mounting body on the working mechanism. A possible role of the sensor as an alerting system against unwarranted contact with objects in public places, especially during the current SARS-CoV-2 pandemic, has also been explored in the form of an LED bracelet whose color is controlled by the attached proximity sensor.
Submitted 10 December, 2020;
originally announced December 2020.
-
A Survey of the State of Explainable AI for Natural Language Processing
Authors:
Marina Danilevsky,
Kun Qian,
Ranit Aharonov,
Yannis Katsis,
Ban Kawas,
Prithviraj Sen
Abstract:
Recent years have seen important advances in the quality of state-of-the-art models, but this has come at the expense of models becoming less interpretable. This survey presents an overview of the current state of Explainable AI (XAI), considered within the domain of Natural Language Processing (NLP). We discuss the main categorization of explanations, as well as the various ways explanations can be arrived at and visualized. We detail the operations and explainability techniques currently available for generating explanations for NLP model predictions, to serve as a resource for model developers in the community. Finally, we point out the current gaps and encourage directions for future work in this important research area.
Submitted 1 October, 2020;
originally announced October 2020.
-
The damage throttling number of a graph
Authors:
Joshua Carlson,
Robin Eagleton,
Jesse Geneson,
John Petrucci,
Carolyn Reinhart,
Preetul Sen
Abstract:
The cop throttling number of a graph, introduced in 2018 by Breen et al., optimizes the balance between the number of cops used and the number of rounds required to catch the robber in a game of Cops and Robbers. In 2019, Cox and Sanaei studied a variant of Cops and Robbers in which the robber tries to occupy (or damage) as many vertices as possible and the cop tries to minimize this damage. In their paper, they study the minimum number of vertices damaged by the robber over all games played on a given graph $G$, called the damage number of $G$. We introduce the natural parameter called the damage throttling number of a graph, denoted $\operatorname{th}_d(G)$, which optimizes the balance between the number of cops used and the number of vertices damaged in the graph. To this end, we formalize the definition of $k$-damage number, which extends the damage number to games played with $k$ cops. We show that damage throttling and cop throttling share many properties, yet they exhibit interesting differences. We prove that the damage throttling number is tightly bounded above by one less than the cop throttling number. Infinite families of examples and non-examples of tightness in this bound are given. We also find an infinite family of connected graphs $G$ of order $n$ for which $\operatorname{th}_d(G) = \Omega(n^{2/3})$.
Submitted 10 July, 2020; v1 submitted 18 June, 2020;
originally announced June 2020.
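Written out, the two throttling parameters and the bound stated in the abstract look as follows. This is a sketch: the cop throttling form follows Breen et al. (with $\operatorname{capt}_k(G)$ the capture time using $k$ cops), while the damage analogue and the notation $\operatorname{dmg}_k(G)$ for the $k$-damage number are an assumed rendering of the abstract, not quoted from the paper.

$$
\operatorname{th}_c(G) = \min_{k \ge 1} \bigl\{ k + \operatorname{capt}_k(G) \bigr\}, \qquad
\operatorname{th}_d(G) = \min_{k \ge 1} \bigl\{ k + \operatorname{dmg}_k(G) \bigr\},
$$

$$
\operatorname{th}_d(G) \le \operatorname{th}_c(G) - 1 .
$$

Both parameters trade the number of cops $k$ against a cost of the game played with $k$ cops; they differ only in whether that cost counts rounds until capture or damaged vertices.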
-
Bi3D: Stereo Depth Estimation via Binary Classifications
Authors:
Abhishek Badki,
Alejandro Troccoli,
Kihwan Kim,
Jan Kautz,
Pradeep Sen,
Orazio Gallo
Abstract:
Stereo-based depth estimation is a cornerstone of computer vision, with state-of-the-art methods delivering accurate results in real time. For several applications such as autonomous navigation, however, it may be useful to trade accuracy for lower latency. We present Bi3D, a method that estimates depth via a series of binary classifications. Rather than testing if objects are at a particular depth $D$, as existing stereo methods do, it classifies them as being closer or farther than $D$. This property offers a powerful mechanism to balance accuracy and latency. Given a strict time budget, Bi3D can detect objects closer than a given distance in as little as a few milliseconds, or estimate depth with arbitrarily coarse quantization, with complexity linear in the number of quantization levels. Bi3D can also use the allotted quantization levels to estimate continuous depth, but only within a specific depth range. For standard stereo (i.e., continuous depth on the whole range), our method is close to or on par with state-of-the-art, finely tuned stereo methods.
Submitted 1 June, 2020; v1 submitted 14 May, 2020;
originally announced May 2020.
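To make the "closer or farther than $D$" idea concrete, here is a minimal sketch of how per-plane binary decisions turn into depth. It is not Bi3D's network: the per-pixel probabilities `probs_farther` are assumed to come from some classifier, and all function names are hypothetical.

```python
from typing import List, Sequence


def quantized_depth_level(probs_farther: Sequence[float], threshold: float = 0.5) -> int:
    """Hard depth level for one pixel.

    probs_farther[i] is an assumed classifier output: the probability that
    the pixel lies farther than candidate plane D_i (planes sorted near to
    far). Counting the planes the pixel is classified as farther than gives
    its quantized level; one binary decision is made per level, so the cost
    is linear in the number of levels.
    """
    return sum(1 for p in probs_farther if p > threshold)


def expected_depth_level(probs_farther: Sequence[float]) -> float:
    """Soft level: by linearity of expectation, the expected number of
    planes the pixel lies beyond equals the sum of the per-plane probabilities."""
    return float(sum(probs_farther))


def closer_than_mask(probs_farther_at_d: List[List[float]], threshold: float = 0.5) -> List[List[bool]]:
    """A single classification at one plane D flags every pixel closer than D,
    e.g. for a fast obstacle check under a tight time budget."""
    return [[p <= threshold for p in row] for row in probs_farther_at_d]
```

For example, a pixel with probabilities `[0.9, 0.8, 0.3, 0.1]` over four planes falls in level 2 (beyond the first two planes), with a soft estimate of about 2.1.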
-
Towards Socially Responsible AI: Cognitive Bias-Aware Multi-Objective Learning
Authors:
Procheta Sen,
Debasis Ganguly
Abstract:
Human society has a long history of suffering from cognitive biases that lead to social prejudices and mass injustice. The prevalence of cognitive biases in large volumes of historical data poses a threat: AI systems trained on such data may manifest them as unethical and seemingly inhuman predictions. To alleviate this problem, we propose a bias-aware multi-objective learning framework that, given a set of identity attributes (e.g., gender, ethnicity) and a subset of sensitive categories among the possible prediction classes, learns to reduce the frequency of predicting certain combinations of them, e.g., predicting stereotypes such as `most blacks use abusive language', or `fear is a virtue of women'. Our experiments on an emotion prediction task with balanced class priors show that a set of baseline bias-agnostic models exhibit cognitive biases with respect to gender, such as predicting that women are prone to be afraid whereas men are more prone to be angry. In contrast, our proposed bias-aware multi-objective learning methodology is shown to reduce such biases in the predicted emotions.
Submitted 28 July, 2020; v1 submitted 14 May, 2020;
originally announced May 2020.
-
What do Models Learn from Question Answering Datasets?
Authors:
Priyanka Sen,
Amir Saffari
Abstract:
While models have reached superhuman performance on popular question answering (QA) datasets such as SQuAD, they have yet to outperform humans on the task of question answering itself. In this paper, we investigate if models are learning reading comprehension from QA datasets by evaluating BERT-based models across five datasets. We evaluate models on their generalizability to out-of-domain examples, responses to missing or incorrect data, and ability to handle question variations. We find that no single dataset is robust to all of our experiments and identify shortcomings in both datasets and evaluation methods. Following our analysis, we make recommendations for building future QA datasets that better evaluate the task of question answering through reading comprehension. We also release code to convert QA datasets to a shared format for easier experimentation at https://github.com/amazon-research/qa-dataset-converter.
Submitted 13 October, 2020; v1 submitted 7 April, 2020;
originally announced April 2020.
-
Forecasting in multivariate irregularly sampled time series with missing values
Authors:
Shivam Srivastava,
Prithviraj Sen,
Berthold Reinwald
Abstract:
Sparse and irregularly sampled multivariate time series are common in clinical, climate, financial and many other domains. Most recent approaches focus on classification, regression or forecasting tasks on such data. In forecasting, it is necessary to not only forecast the right value but also to forecast when that value will occur in the irregular time series. In this work, we present an approach to forecast not only the values but also the time at which they are expected to occur.
Submitted 5 April, 2020;
originally announced April 2020.
-
A Comprehensive Benchmark Framework for Active Learning Methods in Entity Matching
Authors:
Venkata Vamsikrishna Meduri,
Lucian Popa,
Prithviraj Sen,
Mohamed Sarwat
Abstract:
Entity Matching (EM) is a core data cleaning task, aiming to identify different mentions of the same real-world entity. Active learning is one way to address the challenge of scarce labeled data in practice, by dynamically collecting the necessary examples to be labeled by an oracle and refining the learned model (classifier) upon them. In this paper, we build a unified active learning benchmark framework for EM that allows users to easily combine different learning algorithms with applicable example selection algorithms. The goal of the framework is to enable concrete guidelines for practitioners as to what active learning combinations will work well for EM. Towards this, we perform comprehensive experiments on publicly available EM datasets from product and publication domains to evaluate active learning methods, using a variety of metrics including EM quality, number of labels and example selection latencies. Our most surprising result finds that active learning with fewer labels can learn a classifier of quality comparable to one trained with supervised learning. In fact, for several of the datasets, we show that there is an active learning combination that beats the state-of-the-art supervised learning result. Our framework also includes novel optimizations that improve the quality of the learned model by roughly 9% in terms of F1-score and reduce example selection latencies by up to 10x without affecting the quality of the model.
Submitted 29 March, 2020;
originally announced March 2020.
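The loop the abstract describes (pick examples, ask an oracle for labels, refit the classifier) can be sketched with uncertainty sampling, one common example-selection strategy. This is a toy illustration, not the benchmark's code: each candidate pair is reduced to a single hypothetical similarity score, and the threshold "classifier" and all names are assumptions.

```python
import math
from typing import Callable, List, Tuple

Labeled = List[Tuple[float, bool]]  # (similarity score, is_match)


def fit_threshold(labeled: Labeled) -> float:
    """Toy classifier: a threshold halfway between the mean similarity of
    labeled matches and labeled non-matches (both classes must be present)."""
    matches = [s for s, y in labeled if y]
    non_matches = [s for s, y in labeled if not y]
    return (sum(matches) / len(matches) + sum(non_matches) / len(non_matches)) / 2


def match_probability(threshold: float, score: float, sharpness: float = 10.0) -> float:
    """Logistic confidence that a candidate pair is a match."""
    return 1.0 / (1.0 + math.exp(-sharpness * (score - threshold)))


def active_learning(pool: List[float], seed: Labeled,
                    oracle: Callable[[float], bool], budget: int) -> Tuple[float, Labeled]:
    labeled = list(seed)
    seen = {s for s, _ in seed}
    unlabeled = [s for s in pool if s not in seen]
    for _ in range(min(budget, len(unlabeled))):
        threshold = fit_threshold(labeled)
        # Uncertainty sampling: query the pair whose predicted match
        # probability is closest to 0.5, then refit with its label.
        idx = min(range(len(unlabeled)),
                  key=lambda i: abs(match_probability(threshold, unlabeled[i]) - 0.5))
        score = unlabeled.pop(idx)
        labeled.append((score, oracle(score)))
    return fit_threshold(labeled), labeled
```

With the pool `[0.05, 0.1, 0.2, 0.3, 0.45, 0.55, 0.65, 0.7, 0.8, 0.9, 0.95]`, seed labels `[(0.1, False), (0.9, True)]`, an oracle `lambda s: s >= 0.6` and a budget of 3, the loop queries only the pairs near the decision boundary (0.45, 0.55, 0.65) and ends with a threshold of about 0.57.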
-
Meshlet Priors for 3D Mesh Reconstruction
Authors:
Abhishek Badki,
Orazio Gallo,
Jan Kautz,
Pradeep Sen
Abstract:
Estimating a mesh from an unordered set of sparse, noisy 3D points is a challenging problem that requires carefully selected priors. Existing hand-crafted priors, such as smoothness regularizers, impose an undesirable trade-off between attenuating noise and preserving local detail. Recent deep-learning approaches produce impressive results by learning priors directly from the data. However, the priors are learned at the object level, which makes these algorithms class-specific and even sensitive to the pose of the object. We introduce meshlets, small patches of mesh that we use to learn local shape priors. Meshlets act as a dictionary of local features and thus allow us to use learned priors to reconstruct object meshes in any pose and from unseen classes, even when the noise is large and the samples sparse.
Submitted 1 June, 2020; v1 submitted 6 January, 2020;
originally announced January 2020.
-
HEIDL: Learning Linguistic Expressions with Deep Learning and Human-in-the-Loop
Authors:
Yiwei Yang,
Eser Kandogan,
Yunyao Li,
Walter S. Lasecki,
Prithviraj Sen
Abstract:
While the role of humans is increasingly recognized in the machine learning community, representation of and interaction with models in current human-in-the-loop machine learning (HITL-ML) approaches are too low-level and far removed from humans' conceptual models. We demonstrate HEIDL, a prototype HITL-ML system that exposes the machine-learned model through high-level, explainable linguistic expressions formed of predicates representing the semantic structure of text. In HEIDL, the human's role is elevated from simply evaluating model predictions to interpreting and even updating the model logic directly by enabling interaction with the rule predicates themselves. Raising the currency of interaction to such semantic levels calls for new interaction paradigms between humans and machines that result in improved productivity for the text analytics model development process. Moreover, by involving humans in the process, the human-machine co-created models generalize better to unseen data, as domain experts are able to instill their expertise by extrapolating from what automated algorithms have learned from a small amount of labelled data.
Submitted 25 July, 2019;
originally announced July 2019.
-
Unions, intersections and a one-shot quantum joint typicality lemma
Authors:
Pranab Sen
Abstract:
A fundamental tool to prove inner bounds in classical network information theory is the so-called conditional joint typicality lemma. In addition to the lemma, one often uses unions and intersections of typical sets in the inner bound arguments without so much as giving them a second thought. These arguments do not work in the quantum setting. This bottleneck shows up in the fact that so-called simultaneous decoders, as opposed to successive cancellation decoders, are known for very few channels in quantum network information theory. Another manifestation of this bottleneck is the lack of so-called simultaneous smoothing theorems for quantum states.
In this paper, we overcome the bottleneck by proving for the first time a one-shot quantum joint typicality lemma with robust union and intersection properties. To do so, we develop two novel tools in quantum information theory which may be of independent interest. The first tool is a simple geometric idea called tilting, which increases the angles between a family of subspaces in orthogonal directions. The second tool, called smoothing and augmentation, is a way of perturbing a multipartite quantum state such that the partial trace over any subset of registers does not increase the operator norm by much.
Our joint typicality lemma allows us to construct simultaneous quantum decoders for many multiterminal quantum channels. It provides a powerful tool to extend many results in classical network information theory to the one-shot quantum setting.
Submitted 24 December, 2020; v1 submitted 19 June, 2018;
originally announced June 2018.
-
Inner bounds via simultaneous decoding in quantum network information theory
Authors:
Pranab Sen
Abstract:
We prove new inner bounds for several multiterminal channels with classical inputs and quantum outputs. Our inner bounds are all proved in the one-shot setting, and are natural analogues of the best classical inner bounds for the respective channels. For some of these channels, similar quantum inner bounds were unknown even in the asymptotic iid setting. We prove our inner bounds by appealing to a new classical-quantum joint typicality lemma proved in a companion paper ("A one-shot quantum joint typicality lemma", Pranab Sen, arXiv:1806.07278). This lemma allows us to lift to the quantum setting many inner bound proofs for classical multiterminal channels that use intersections and unions of typical sets.
Submitted 24 December, 2020; v1 submitted 19 June, 2018;
originally announced June 2018.
-
PieAPP: Perceptual Image-Error Assessment through Pairwise Preference
Authors:
Ekta Prashnani,
Hong Cai,
Yasamin Mostofi,
Pradeep Sen
Abstract:
The ability to estimate the perceptual error between images is an important problem in computer vision with many applications. Although it has been studied extensively, no method currently exists that can robustly predict visual differences like humans. Some previous approaches used hand-coded models, but they fail to model the complexity of the human visual system. Others used machine learning to train models on human-labeled datasets, but creating large, high-quality datasets is difficult because people are unable to assign consistent error labels to distorted images. In this paper, we present a new learning-based method that is the first to predict perceptual image error like human observers. Since it is much easier for people to compare two given images and identify the one more similar to a reference than to assign quality scores to each, we propose a new, large-scale dataset labeled with the probability that humans will prefer one image over another. We then train a deep-learning model using a novel, pairwise-learning framework to predict the preference of one distorted image over the other. Our key observation is that our trained network can then be used separately with only one distorted image and a reference to predict its perceptual error, without ever being trained on explicit human perceptual-error labels. The perceptual error estimated by our new metric, PieAPP, is well-correlated with human opinion. Furthermore, it significantly outperforms existing algorithms, beating the state-of-the-art by almost 3x on our test set in terms of binary error rate, while also generalizing to new kinds of distortions, unlike previous learning-based methods.
Submitted 6 June, 2018;
originally announced June 2018.
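The pairwise idea in the PieAPP abstract, predicting which of two distorted images people prefer and later reading off a single image's error score, can be sketched as below. The logistic (Bradley-Terry-style) link on the difference of two error scores is an assumption made for illustration, as is the cross-entropy loss; `error_a`, `error_b` and both function names are hypothetical, and the paper's exact formulation may differ.

```python
import math


def preference_probability(error_a: float, error_b: float) -> float:
    """Probability that an observer prefers distorted image A over B.

    error_a and error_b stand for perceptual-error scores that a network
    would predict for A and B against the same reference (lower = closer).
    A logistic link on their difference is assumed here.
    """
    return 1.0 / (1.0 + math.exp(error_a - error_b))


def pairwise_cross_entropy(p_pred: float, p_human: float, eps: float = 1e-12) -> float:
    """Illustrative training loss comparing the predicted preference with the
    fraction of human raters who preferred A (a soft label in [0, 1])."""
    p = min(max(p_pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p_human * math.log(p) + (1.0 - p_human) * math.log(1.0 - p))
```

Equal error scores give a preference of 0.5, and a much lower error for A pushes the probability toward 1. Because supervision only ever touches the difference of two scores, the score itself can be used at test time on a single distorted image and its reference.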
-
Patch-Based Image Hallucination for Super Resolution with Detail Reconstruction from Similar Sample Images
Authors:
Chieh-Chi Kao,
Yuxiang Wang,
Jonathan Waltman,
Pradeep Sen
Abstract:
Image hallucination and super-resolution have been studied for decades, and many approaches have been proposed to upsample low-resolution images using information from the images themselves, multiple example images, or large image databases. However, most of this work has focused exclusively on small magnification levels because the algorithms simply sharpen the blurry edges in the upsampled images; no actual new detail is typically reconstructed in the final result. In this paper, we present a patch-based algorithm for image hallucination which, for the first time, properly synthesizes novel high-frequency detail. To do this, we pose the synthesis problem as a patch-based optimization which inserts coherent, high-frequency detail from contextually-similar images of the same physical scene/subject provided from either a personal image collection or a large online database. The resulting image is visually plausible and contains coherent high-frequency information. We demonstrate the robustness of our algorithm by testing it on a large number of images and show that its performance is considerably superior to all state-of-the-art approaches, a result that is verified to be statistically significant through a randomized user study.
Submitted 3 June, 2018;
originally announced June 2018.
-
Improving the Resolution of CNN Feature Maps Efficiently with Multisampling
Authors:
Shayan Sadigh,
Pradeep Sen
Abstract:
We describe a new class of subsampling techniques for CNNs, termed multisampling, that significantly increases the amount of information kept by feature maps through subsampling layers. One version of our method, which we call checkered subsampling, significantly improves the accuracy of state-of-the-art architectures such as DenseNet and ResNet without any additional parameters and, remarkably, improves the accuracy of certain pretrained ImageNet models without any training or fine-tuning. We glean possible insight into the nature of data augmentations and demonstrate experimentally that coarse feature maps are bottlenecking the performance of neural networks in image classification.
Submitted 30 September, 2023; v1 submitted 28 May, 2018;
originally announced May 2018.
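A minimal sketch of why checkered subsampling keeps more information than ordinary stride-2 subsampling. It assumes the checkered variant keeps the sampling offsets (0, 0) and (1, 1) of every 2x2 block and returns them as two submaps; the function names are hypothetical and the real method operates on CNN tensors rather than Python lists.

```python
from typing import List

Grid = List[List[float]]


def stride2_subsample(feature_map: Grid) -> Grid:
    """Standard subsampling: keep offset (0, 0) of each 2x2 block (1/4 of the values)."""
    return [row[0::2] for row in feature_map[0::2]]


def checkered_subsample(feature_map: Grid) -> List[Grid]:
    """Keep the two diagonal offsets (0, 0) and (1, 1) of each 2x2 block.

    Together the two submaps cover a checkerboard lattice, i.e. 1/2 of the
    values, and they are stacked as an extra dimension instead of discarded.
    """
    return [[row[offset::2] for row in feature_map[offset::2]] for offset in (0, 1)]
```

On a 4x4 map holding the values 0 to 15 row by row, stride-2 subsampling keeps 4 values (`[[0, 2], [8, 10]]`), while the checkered version also keeps `[[5, 7], [13, 15]]`, i.e. 8 of 16.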
-
On the Optimal Achievable Rates for Linear Computation With Random Homologous Codes
Authors:
Pinar Sen,
Sung Hoon Lim,
Young-Han Kim
Abstract:
The problem of computing a linear combination of sources over a multiple access channel is studied. Inner and outer bounds on the optimal tradeoff between the communication rates are established when encoding is restricted to random ensembles of homologous codes, namely, structured nested coset codes from the same generator matrix and individual shaping functions, but when decoding is optimized with respect to the realization of the encoders. For the special case in which the desired linear combination is "matched" to the structure of the multiple access channel in a natural sense, these inner and outer bounds coincide. This result indicates that most, if not all, coding schemes for computation in the literature that rely on random construction of nested coset codes cannot be improved by using more powerful decoders, such as the maximum likelihood decoder. The proof techniques are adapted to characterize the rate region for broadcast channels achieved by Marton's (random) coding scheme under maximum likelihood decoding.
Submitted 29 October, 2018; v1 submitted 8 May, 2018;
originally announced May 2018.
-
Deep Learning with Apache SystemML
Authors:
Niketan Pansare,
Michael Dusenberry,
Nakul Jindal,
Matthias Boehm,
Berthold Reinwald,
Prithviraj Sen
Abstract:
Enterprises operate large data lakes using Hadoop and Spark frameworks that (1) run a plethora of tools to automate powerful data preparation/transformation pipelines, (2) run on shared, large clusters to (3) perform many different analytics tasks ranging from model preparation, building, evaluation, and tuning for both machine learning and deep learning. Developing machine/deep learning models on data in such shared environments is challenging. Apache SystemML provides a unified framework for implementing machine learning and deep learning algorithms in a variety of shared deployment scenarios. SystemML's novel compilation approach automatically generates runtime execution plans for machine/deep learning algorithms that are composed of single-node and distributed runtime operations depending on data and cluster characteristics such as data size, data sparsity, cluster size, and memory configurations, while still exploiting the capabilities of the underlying big data frameworks.
Submitted 8 February, 2018;
originally announced February 2018.
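The SystemML abstract says plans mix single-node and distributed operations depending on data size, sparsity and memory. A toy version of that decision is sketched below. It is not SystemML's compiler: the byte counts, the CSR-style sparse estimate, the memory budget and the function names are all hypothetical assumptions for illustration.

```python
def estimate_size_bytes(rows: int, cols: int, sparsity: float) -> float:
    """Rough in-memory size of a double-precision matrix: the smaller of a
    dense layout and a CSR-like sparse layout (8-byte values, 4-byte column
    indices, 4-byte row pointers). These byte counts are assumptions."""
    dense = rows * cols * 8
    nnz = rows * cols * sparsity
    sparse = nnz * (8 + 4) + (rows + 1) * 4
    return min(dense, sparse)


def choose_execution(rows: int, cols: int, sparsity: float, memory_budget_bytes: float) -> str:
    """Pick a single-node operator when the operand fits the local memory
    budget, otherwise fall back to a distributed operator."""
    if estimate_size_bytes(rows, cols, sparsity) <= memory_budget_bytes:
        return "single-node"
    return "distributed"
```

A dense 1,000,000 x 1,000 matrix (about 8 GB) would be routed to distributed execution under a 32 MB budget, while the same shape at 0.1% density (about 16 MB in the sparse layout) fits on a single node, which is why sparsity appears alongside size among the characteristics the abstract lists.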