-
Multi-Attention Integrated Deep Learning Frameworks for Enhanced Breast Cancer Segmentation and Identification
Authors:
Pandiyaraju V,
Shravan Venkatraman,
Pavan Kumar S,
Santhosh Malarvannan,
Kannan A
Abstract:
Breast cancer poses a profound threat to lives globally, claiming numerous lives each year. Therefore, timely detection is crucial for early intervention and improved chances of survival. Accurately diagnosing and classifying breast tumors using ultrasound images is a persistent challenge in medicine, demanding cutting-edge solutions for improved treatment strategies. This research introduces multiattention-enhanced deep learning (DL) frameworks designed for the classification and segmentation of breast cancer tumors from ultrasound images. A spatial channel attention mechanism is proposed for segmenting tumors from ultrasound images, utilizing a novel LinkNet DL framework with an InceptionResNet backbone. Following this, the paper proposes a deep convolutional neural network with an integrated multi-attention framework (DCNNIMAF) to classify the segmented tumor as benign, malignant, or normal. From experimental results, it is observed that the segmentation model has recorded an accuracy of 98.1%, with a minimal loss of 0.6%. It has also achieved high Intersection over Union (IoU) and Dice Coefficient scores of 96.9% and 97.2%, respectively. Similarly, the classification model has attained an accuracy of 99.2%, with a low loss of 0.31%. Furthermore, the classification framework has achieved outstanding F1-Score, precision, and recall values of 99.1%, 99.3%, and 99.1%, respectively. By offering a robust framework for early detection and accurate classification of breast cancer, this proposed work significantly advances the field of medical image analysis, potentially improving diagnostic precision and patient outcomes.
Submitted 3 July, 2024;
originally announced July 2024.
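The Intersection over Union (IoU) and Dice Coefficient scores quoted in the abstract above are standard overlap measures between a predicted mask and a ground-truth mask. The following is a minimal sketch of how they are usually computed, not the authors' evaluation code; the function name `iou_and_dice` and the nested-list mask format are assumptions made for illustration.

```python
def iou_and_dice(pred, target):
    """Return (IoU, Dice) for two binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration; masks must contain the same
    number of pixels.
    """
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")

    # Pixels marked as tumor in both masks.
    intersection = sum(1 for p, t in zip(flat_pred, flat_target) if p and t)
    pred_area = sum(1 for p in flat_pred if p)
    target_area = sum(1 for t in flat_target if t)
    union = pred_area + target_area - intersection

    if union == 0:
        # Both masks are empty; treating this as perfect agreement is a
        # convention chosen here, not something stated in the abstract.
        return 1.0, 1.0

    iou = intersection / union
    dice = 2 * intersection / (pred_area + target_area)
    return iou, dice


if __name__ == "__main__":
    predicted = [[1, 1], [0, 0]]
    truth = [[1, 0], [0, 0]]
    print(iou_and_dice(predicted, truth))
```

For any pair of masks the two scores are related by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU.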
-
CollabStory: Multi-LLM Collaborative Story Generation and Authorship Analysis
Authors:
Saranya Venkatraman,
Nafis Irtiza Tripto,
Dongwon Lee
Abstract:
The rise of unifying frameworks that enable seamless interoperability of Large Language Models (LLMs) has made LLM-LLM collaboration for open-ended tasks a possibility. Despite this, there have not been efforts to explore such collaborative writing. We take the next step beyond human-LLM collaboration to explore this multi-LLM scenario by generating the first exclusively LLM-generated collaborative stories dataset called CollabStory. We focus on single-author ($N=1$) to multi-author (up to $N=5$) scenarios, where multiple LLMs co-author stories. We generate over 32k stories using open-source instruction-tuned LLMs. Further, we take inspiration from the PAN tasks that have set the standard for human-human multi-author writing tasks and analysis. We extend their authorship-related tasks for multi-LLM settings and present baselines for LLM-LLM collaboration. We find that current baselines are not able to handle this emerging scenario. Thus, CollabStory is a resource that could help propel an understanding as well as the development of techniques to discern the use of multiple LLMs. This is crucial to study in the context of writing tasks since LLM-LLM collaboration could potentially overwhelm ongoing challenges related to plagiarism detection, credit assignment, maintaining academic integrity in educational settings, and addressing copyright infringement concerns. We make our dataset and code available at https://github.com/saranya-venkatraman/multi_llm_story_writing.
Submitted 18 June, 2024;
originally announced June 2024.
-
Amortizing intractable inference in diffusion models for vision, language, and control
Authors:
Siddarth Venkatraman,
Moksh Jain,
Luca Scimeca,
Minsu Kim,
Marcin Sendera,
Mohsin Hasan,
Luke Rowe,
Sarthak Mittal,
Pablo Lemos,
Emmanuel Bengio,
Alexandre Adam,
Jarrid Rector-Brooks,
Yoshua Bengio,
Glen Berseth,
Nikolay Malkin
Abstract:
Diffusion models have emerged as effective distribution estimators in vision, language, and reinforcement learning, but their use as priors in downstream tasks poses an intractable posterior inference problem. This paper studies amortized sampling of the posterior over data, $\mathbf{x}\sim p^{\rm post}(\mathbf{x})\propto p(\mathbf{x})r(\mathbf{x})$, in a model that consists of a diffusion generative model prior $p(\mathbf{x})$ and a black-box constraint or likelihood function $r(\mathbf{x})$. We state and prove the asymptotic correctness of a data-free learning objective, relative trajectory balance, for training a diffusion model that samples from this posterior, a problem that existing methods solve only approximately or in restricted cases. Relative trajectory balance arises from the generative flow network perspective on diffusion models, which allows the use of deep reinforcement learning techniques to improve mode coverage. Experiments illustrate the broad potential of unbiased inference of arbitrary posteriors under diffusion priors: in vision (classifier guidance), language (infilling under a discrete diffusion LLM), and multimodal data (text-to-image generation). Beyond generative modeling, we apply relative trajectory balance to the problem of continuous control with a score-based behavior prior, achieving state-of-the-art results on benchmarks in offline reinforcement learning.
Submitted 31 May, 2024;
originally announced May 2024.
-
ClassInSight: Designing Conversation Support Tools to Visualize Classroom Discussion for Personalized Teacher Professional Development
Authors:
Tricia J. Ngoon,
S Sushil,
Angela Stewart,
Ung-Sang Lee,
Saranya Venkatraman,
Neil Thawani,
Prasenjit Mitra,
Sherice Clarke,
John Zimmerman,
Amy Ogan
Abstract:
Teaching is one of many professions for which personalized feedback and reflection can help improve dialogue and discussion between the professional and those they serve. However, professional development (PD) is often impersonal as human observation is labor-intensive. Data-driven PD tools in teaching are of growing interest, but open questions about how professionals engage with their data in practice remain. In this paper, we present ClassInSight, a tool that visualizes three levels of teachers' discussion data and structures reflection. Through 22 reflection sessions and interviews with 5 high school science teachers, we found themes related to dissonance, contextualization, and sustainability in how teachers engaged with their data in the tool and in how their professional vision, the use of professional expertise to interpret events, shifted over time. We discuss guidelines for these conversational support tools to support personalized PD in professions beyond teaching where conversation and interaction are important.
Submitted 1 March, 2024;
originally announced March 2024.
-
ALISON: Fast and Effective Stylometric Authorship Obfuscation
Authors:
Eric Xing,
Saranya Venkatraman,
Thai Le,
Dongwon Lee
Abstract:
Authorship Attribution (AA) and Authorship Obfuscation (AO) are two competing tasks of increasing importance in privacy research. Modern AA leverages an author's consistent writing style to match a text to its author using an AA classifier. AO is the corresponding adversarial task, aiming to modify a text in such a way that its semantics are preserved, yet an AA model cannot correctly infer its authorship. To address privacy concerns raised by state-of-the-art (SOTA) AA methods, new AO methods have been proposed but remain largely impractical to use due to their prohibitively slow training and obfuscation speed, often taking hours. To address this challenge, we propose a practical AO method, ALISON, that (1) dramatically reduces training/obfuscation time, demonstrating more than 10x faster obfuscation than SOTA AO methods, (2) achieves better obfuscation success through attacking three transformer-based AA methods on two benchmark datasets, typically performing 15% better than competing methods, (3) does not require direct signals from a target AA classifier during obfuscation, and (4) utilizes unique stylometric features, allowing sound model interpretation for explainable obfuscation. We also demonstrate that ALISON can effectively prevent four SOTA AA methods from accurately determining the authorship of ChatGPT-generated texts, all while minimally changing the original text semantics. To ensure the reproducibility of our findings, our code and data are available at: https://github.com/EricX003/ALISON.
Submitted 1 February, 2024;
originally announced February 2024.
-
A Ship of Theseus: Curious Cases of Paraphrasing in LLM-Generated Texts
Authors:
Nafis Irtiza Tripto,
Saranya Venkatraman,
Dominik Macko,
Robert Moro,
Ivan Srba,
Adaku Uchendu,
Thai Le,
Dongwon Lee
Abstract:
In the realm of text manipulation and linguistic transformation, the question of authorship has been a subject of fascination and philosophical inquiry. Much like the Ship of Theseus paradox, which ponders whether a ship remains the same when each of its original planks is replaced, our research delves into an intriguing question: Does a text retain its original authorship when it undergoes numerous paraphrasing iterations? Specifically, since Large Language Models (LLMs) have demonstrated remarkable proficiency in both the generation of original content and the modification of human-authored texts, a pivotal question emerges concerning the determination of authorship in instances where LLMs or similar paraphrasing tools are employed to rephrase the text--i.e., whether authorship should be attributed to the original human author or the AI-powered tool. Therefore, we embark on a philosophical voyage through the seas of language and authorship to unravel this intricate puzzle. Using a computational approach, we discover that the diminishing performance in text classification models, with each successive paraphrasing iteration, is closely associated with the extent of deviation from the original author's style, thus provoking a reconsideration of the current notion of authorship.
Submitted 6 June, 2024; v1 submitted 14 November, 2023;
originally announced November 2023.
-
The Sentiment Problem: A Critical Survey towards Deconstructing Sentiment Analysis
Authors:
Pranav Narayanan Venkit,
Mukund Srinath,
Sanjana Gautam,
Saranya Venkatraman,
Vipul Gupta,
Rebecca J. Passonneau,
Shomir Wilson
Abstract:
We conduct an inquiry into the sociotechnical aspects of sentiment analysis (SA) by critically examining 189 peer-reviewed papers on their applications, models, and datasets. Our investigation stems from the recognition that SA has become an integral component of diverse sociotechnical systems, exerting influence on both social and technical users. By delving into sociological and technological literature on sentiment, we unveil distinct conceptualizations of this term in domains such as finance, government, and medicine. Our study exposes a lack of explicit definitions and frameworks for characterizing sentiment, resulting in potential challenges and biases. To tackle this issue, we propose an ethics sheet encompassing critical inquiries to guide practitioners in ensuring equitable utilization of SA. Our findings underscore the significance of adopting an interdisciplinary approach to defining sentiment in SA and offer a pragmatic solution for its implementation.
Submitted 18 October, 2023;
originally announced October 2023.
-
GPT-who: An Information Density-based Machine-Generated Text Detector
Authors:
Saranya Venkatraman,
Adaku Uchendu,
Dongwon Lee
Abstract:
The Uniform Information Density (UID) principle posits that humans prefer to spread information evenly during language production. We examine if this UID principle can help capture differences between Large Language Model (LLM)-generated and human-generated texts. We propose GPT-who, the first psycholinguistically-inspired domain-agnostic statistical detector. This detector employs UID-based features to model the unique statistical signature of each LLM and human author for accurate detection. We evaluate our method using 4 large-scale benchmark datasets and find that GPT-who outperforms state-of-the-art detectors (both statistical and non-statistical) such as GLTR, GPTZero, DetectGPT, OpenAI detector, and ZeroGPT by over 20% across domains. In addition to better performance, it is computationally inexpensive and utilizes an interpretable representation of text articles. We find that GPT-who can distinguish texts generated by very sophisticated LLMs, even when the overlying text is indiscernible. UID-based measures for all datasets and code are available at https://github.com/saranya-venkatraman/gpt-who.
Submitted 3 April, 2024; v1 submitted 9 October, 2023;
originally announced October 2023.
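The GPT-who abstract above does not spell out its UID-based features. As an illustration only, the sketch below computes two measures of how evenly information is spread over a text: the variance of per-token surprisal, and the mean squared change in surprisal between neighbouring tokens. The function names and the assumption that per-token probabilities are already available from some language model are hypothetical and not taken from the paper.

```python
import math


def surprisals(token_probs):
    """Per-token surprisal in bits, given each token's model probability."""
    return [-math.log2(p) for p in token_probs]


def uid_features(token_probs):
    """Illustrative UID-style features (assumed here, not GPT-who's exact set)."""
    s = surprisals(token_probs)
    mean = sum(s) / len(s)
    # Global uniformity: low variance means information is spread evenly.
    variance = sum((x - mean) ** 2 for x in s) / len(s)
    # Local uniformity: mean squared change between neighbouring tokens.
    diffs = [(s[i] - s[i - 1]) ** 2 for i in range(1, len(s))]
    local = sum(diffs) / len(diffs) if diffs else 0.0
    return {
        "mean_surprisal": mean,
        "uid_variance": variance,
        "uid_local_diff": local,
    }


if __name__ == "__main__":
    # Hypothetical probabilities that a language model assigned to each token.
    print(uid_features([0.5, 0.25, 0.5, 0.125]))
```

A perfectly uniform text, in which every token has the same probability, gives zero for both uniformity measures. Feature vectors of this kind could then be passed to any ordinary classifier.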
-
Reasoning with Latent Diffusion in Offline Reinforcement Learning
Authors:
Siddarth Venkatraman,
Shivesh Khaitan,
Ravi Tej Akella,
John Dolan,
Jeff Schneider,
Glen Berseth
Abstract:
Offline reinforcement learning (RL) holds promise as a means to learn high-reward policies from a static dataset, without the need for further environment interactions. However, a key challenge in offline RL lies in effectively stitching portions of suboptimal trajectories from the static dataset while avoiding extrapolation errors arising due to a lack of support in the dataset. Existing approaches use conservative methods that are tricky to tune and struggle with multi-modal data (as we show) or rely on noisy Monte Carlo return-to-go samples for reward conditioning. In this work, we propose a novel approach that leverages the expressiveness of latent diffusion to model in-support trajectory sequences as compressed latent skills. This facilitates learning a Q-function while avoiding extrapolation error via batch-constraining. The latent space is also expressive and gracefully copes with multi-modal data. We show that the learned temporally-abstract latent space encodes richer task-specific information for offline RL tasks as compared to raw state-actions. This improves credit assignment and facilitates faster reward propagation during Q-learning. Our method demonstrates state-of-the-art performance on the D4RL benchmarks, particularly excelling in long-horizon, sparse-reward tasks.
Submitted 12 September, 2023;
originally announced September 2023.
-
Sparse reconstruction of ordinary differential equations with inference
Authors:
Sara Venkatraman,
Sumanta Basu,
Martin T. Wells
Abstract:
Sparse regression has emerged as a popular technique for learning dynamical systems from temporal data, beginning with the SINDy (Sparse Identification of Nonlinear Dynamics) framework proposed by arXiv:1509.03580. Quantifying the uncertainty inherent in differential equations learned from data remains an open problem; we therefore propose leveraging recent advances in statistical inference for sparse regression to address this issue. Focusing on systems of ordinary differential equations (ODEs), SINDy assumes that each equation is a parsimonious linear combination of a few candidate functions, such as polynomials, and uses methods such as sequentially-thresholded least squares or the Lasso to identify a small subset of these functions that govern the system's dynamics. We instead employ bias-corrected versions of the Lasso and ridge regression estimators, as well as an empirical Bayes variable selection technique known as SEMMS, to estimate each ODE as a linear combination of terms that are statistically significant. We demonstrate through simulations that this approach allows us to recover the functional terms that correctly describe the dynamics more often than existing methods that do not account for uncertainty.
Submitted 17 August, 2023;
originally announced August 2023.
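The sequentially-thresholded least squares step that the abstract above attributes to SINDy can be sketched in a few lines. This illustrates the baseline technique only, not the bias-corrected Lasso, ridge or SEMMS estimators the paper proposes. The helper names, the threshold value and the toy system dx/dt = -2x are all assumptions chosen for the example.

```python
def solve_least_squares(columns, y):
    """Least-squares fit of y on the given columns via the normal equations.

    Uses Gaussian elimination with partial pivoting; assumes the columns
    are linearly independent (no handling of singular systems).
    """
    k = len(columns)
    a = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)]
         for i in range(k)]
    b = [sum(u * v for u, v in zip(columns[i], y)) for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    coef = [0.0] * k
    for r in range(k - 1, -1, -1):
        tail = sum(a[r][c] * coef[c] for c in range(r + 1, k))
        coef[r] = (b[r] - tail) / a[r][r]
    return coef


def stlsq(columns, y, threshold=0.1, max_iter=10):
    """Sequentially thresholded least squares over candidate-function columns."""
    active = list(range(len(columns)))
    for _ in range(max_iter):
        fitted = solve_least_squares([columns[i] for i in active], y) if active else []
        kept = [i for i, v in zip(active, fitted) if abs(v) >= threshold]
        if kept == active:  # no term was dropped, so the support is stable
            break
        active = kept
    coef = [0.0] * len(columns)
    if active:
        # Final refit on the surviving terms only.
        refit = solve_least_squares([columns[i] for i in active], y)
        for i, v in zip(active, refit):
            coef[i] = v
    return coef


if __name__ == "__main__":
    xs = [0.5, 1.0, 1.5, 2.0]
    dxdt = [-2.0 * x for x in xs]  # toy data from dx/dt = -2x
    library = [[1.0] * len(xs), xs, [x * x for x in xs]]  # candidates: 1, x, x^2
    print(stlsq(library, dxdt))
```

On this noise-free toy data only the coefficient on x survives thresholding. The procedure returns point estimates with no measure of uncertainty, which is the gap the abstract says the paper addresses.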
-
How do decoding algorithms distribute information in dialogue responses?
Authors:
Saranya Venkatraman,
He He,
David Reitter
Abstract:
Humans tend to follow the Uniform Information Density (UID) principle by distributing information evenly in utterances. We study if decoding algorithms implicitly follow this UID principle, and under what conditions adherence to UID might be desirable for dialogue generation. We generate responses using different decoding algorithms with GPT-2 on the Persona-Chat dataset and collect human judgments on their quality using Amazon Mechanical Turk. We find that (i) surprisingly, model-generated responses follow the UID principle to a greater extent than human responses, and (ii) decoding algorithms that promote UID do not generate higher-quality responses. Instead, when we control for surprisal, non-uniformity of information density correlates with the quality of responses with very low/high surprisal. Our findings indicate that encouraging non-uniform responses is a potential solution to the "likelihood trap" problem (quality degradation in very high-likelihood text). Our dataset containing multiple candidate responses per dialog history along with human-annotated quality ratings is available at https://huggingface.co/datasets/saranya132/dialog_uid_gpt2.
Submitted 29 March, 2023;
originally announced March 2023.
-
A Bi-Level Stochastic Game Model for PMU Placement in Power Grid with Cybersecurity Risks
Authors:
Saptarshi Ghosh,
Murali Sankar Venkatraman,
Shehab Ahmed,
Charalambos Konstantinou
Abstract:
Phasor measurement units (PMUs) provide accurate and high-fidelity measurements in order to monitor the state of the power grid and support various control and planning tasks. However, PMUs have a high installation cost prohibiting their massive deployment. Minimizing the number of installed PMUs needs to be achieved while also maintaining full observability of the network. At the same time, data integrity attacks on PMU measurements can mislead power system control and operation routines. In this paper, a bi-level stochastic non-cooperative game-based placement model is proposed for PMU allocation in the presence of cyber-attack risks. In the first level, the protection of individual PMUs placed in a network is addressed, while considering the interaction between the grid operator and the attacker with respective resource constraints. In the second level, the attacker observes the placement of the PMUs and compromises them, with the aim of maximizing the state estimation error and reducing the observability of the network. The proposed technique is deployed in the IEEE-9 bus test system. The results demonstrate a 9% reduction in the cost incurred by the power grid operator for deploying PMUs while considering cyber-risks.
Submitted 15 April, 2023; v1 submitted 31 January, 2023;
originally announced January 2023.
-
Can you even tell left from right? Presenting a new challenge for VQA
Authors:
Sai Raam Venkatraman,
Rishi Rao,
S. Balasubramanian,
Chandra Sekhar Vorugunti,
R. Raghunatha Sarma
Abstract:
Visual Question Answering (VQA) needs a means of evaluating the strengths and weaknesses of models. One aspect of such an evaluation is the evaluation of compositional generalisation, or the ability of a model to answer well on scenes whose scene-setups are different from the training set. Therefore, for this purpose, we need datasets whose train and test sets differ significantly in composition. In this work, we present several quantitative measures of compositional separation and find that popular datasets for VQA are not good evaluators. To solve this, we present Uncommon Objects in Unseen Configurations (UOUC), a synthetic dataset for VQA. UOUC is at once fairly complex while also being well-separated, compositionally. The object-class of UOUC consists of 380 classes taken from 528 characters from the Dungeons and Dragons game. The train set of UOUC consists of 200,000 scenes, whereas the test set consists of 30,000 scenes. In order to study compositional generalisation, simple reasoning and memorisation, each scene of UOUC is annotated with up to 10 novel questions. These deal with spatial relationships, hypothetical changes to scenes, counting, comparison, memorisation and memory-based reasoning. In total, UOUC presents over 2 million questions. UOUC also poses a strong challenge to well-performing models for VQA. Our evaluation of recent models for VQA shows poor compositional generalisation, and comparatively lower ability towards simple reasoning. These results suggest that UOUC could lead to advances in research by being a strong benchmark for VQA.
Submitted 15 March, 2022;
originally announced March 2022.
-
MLNav: Learning to Safely Navigate on Martian Terrains
Authors:
Shreyansh Daftry,
Neil Abcouwer,
Tyler Del Sesto,
Siddarth Venkatraman,
Jialin Song,
Lucas Igel,
Amos Byon,
Ugo Rosolia,
Yisong Yue,
Masahiro Ono
Abstract:
We present MLNav, a learning-enhanced path planning framework for safety-critical and resource-limited systems operating in complex environments, such as rovers navigating on Mars. MLNav makes judicious use of machine learning to enhance the efficiency of path planning while fully respecting safety constraints. In particular, the dominant computational cost in such safety-critical settings is running a model-based safety checker on the proposed paths. Our learned search heuristic can simultaneously predict the feasibility for all path options in a single run, and the model-based safety checker is only invoked on the top-scoring paths. We validate in high-fidelity simulations using both real Martian terrain data collected by the Perseverance rover, as well as a suite of challenging synthetic terrains. Our experiments show that: (i) compared to the baseline ENav path planner on board the Perseverance rover, MLNav can provide a significant improvement in multiple key metrics, such as a 10x reduction in collision checks when navigating real Martian terrains, despite being trained with synthetic terrains; and (ii) MLNav can successfully navigate highly challenging terrains where the baseline ENav fails to find a feasible path before timing out.
Submitted 9 March, 2022;
originally announced March 2022.
-
An empirical Bayes approach to estimating dynamic models of co-regulated gene expression
Authors:
Sara Venkatraman,
Sumanta Basu,
Andrew G. Clark,
Sofie Delbare,
Myung Hee Lee,
Martin T. Wells
Abstract:
Time-course gene expression datasets provide insight into the dynamics of complex biological processes, such as immune response and organ development. It is of interest to identify genes with similar temporal expression patterns because such genes are often biologically related. However, this task is challenging due to the high dimensionality of these datasets and the nonlinearity of gene expression time dynamics. We propose an empirical Bayes approach to estimating ordinary differential equation (ODE) models of gene expression, from which we derive a similarity metric between genes called the Bayesian lead-lag $R^2$ (LLR2). Importantly, the calculation of the LLR2 leverages biological databases that document known interactions amongst genes; this information is automatically used to define informative prior distributions on the ODE model's parameters. As a result, the LLR2 is a biologically-informed metric that can be used to identify clusters or networks of functionally-related genes with co-moving or time-delayed expression patterns. We then derive data-driven shrinkage parameters from Stein's unbiased risk estimate that optimally balance the ODE model's fit to both data and external biological information. Using real gene expression data, we demonstrate that our methodology allows us to recover interpretable gene clusters and sparse networks. These results reveal new insights about the dynamics of biological systems.
Submitted 31 December, 2021;
originally announced December 2021.
-
Machine Learning Based Path Planning for Improved Rover Navigation (Pre-Print Version)
Authors:
Neil Abcouwer,
Shreyansh Daftry,
Siddarth Venkatraman,
Tyler del Sesto,
Olivier Toupet,
Ravi Lanka,
Jialin Song,
Yisong Yue,
Masahiro Ono
Abstract:
Enhanced AutoNav (ENav), the baseline surface navigation software for NASA's Perseverance rover, sorts a list of candidate paths for the rover to traverse, then uses the Approximate Clearance Evaluation (ACE) algorithm to evaluate whether the most highly ranked paths are safe. ACE is crucial for maintaining the safety of the rover, but is computationally expensive. If the most promising candidates in the list of paths are all found to be infeasible, ENav must continue to search the list and run time-consuming ACE evaluations until a feasible path is found. In this paper, we present two heuristics that, given a terrain heightmap around the rover, produce cost estimates that more effectively rank the candidate paths before ACE evaluation. The first heuristic uses Sobel operators and convolution to incorporate the cost of traversing high-gradient terrain. The second heuristic uses a machine learning (ML) model to predict areas that will be deemed untraversable by ACE. We used physics simulations to collect training data for the ML model and to run Monte Carlo trials to quantify navigation performance across a variety of terrains with various slopes and rock distributions. Compared to ENav's baseline performance, integrating the heuristics can lead to a significant reduction in ACE evaluations and average computation time per planning cycle, increase path efficiency, and maintain or improve the rate of successful traverses. This strategy of targeting specific bottlenecks with ML while maintaining the original ACE safety checks provides an example of how ML can be infused into planetary science missions and other safety-critical software.
Submitted 11 November, 2020;
originally announced November 2020.
-
Learning Compositional Structures for Deep Learning: Why Routing-by-agreement is Necessary
Authors:
Sai Raam Venkatraman,
Ankit Anand,
S. Balasubramanian,
R. Raghunatha Sarma
Abstract:
A formal description of the compositionality of neural networks is associated directly with the formal grammar-structure of the objects it seeks to represent. This formal grammar-structure specifies the kind of components that make up an object, and also the configurations they are allowed to be in. In other words, objects can be described as parse-trees of their components -- a structure that can be seen as a candidate for building connection-patterns among neurons in neural networks. We present a formal grammar description of convolutional neural networks and capsule networks that shows how capsule networks can enforce such parse-tree structures, while CNNs do not. Specifically, we show that the entropy of routing coefficients in the dynamic routing algorithm controls this ability. Thus, we introduce the entropy of routing weights as a loss function for better compositionality among capsules. We show by experiments, on data with a compositional structure, that the use of this loss enables capsule networks to better detect changes in compositionality. Our experiments show that as the entropy of the routing weights increases, the ability to detect changes in compositionality reduces. We see that, without routing, capsule networks perform similarly to convolutional neural networks in that both these models perform badly at detecting changes in compositionality. Our results indicate that routing is an important part of capsule networks -- effectively answering recent work that has questioned its necessity. We also, by experiments on SmallNORB, CIFAR-10, and FashionMNIST, show that this loss keeps the accuracy of capsule network models comparable to models that do not use it.
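The quantity this abstract centers on, the entropy of routing coefficients, can be sketched as follows. The abstract does not give the exact form of the loss, so this is only an assumed version: the mean Shannon entropy, in nats, of each shallower capsule's distribution over deeper capsules. The function names and array layout are hypothetical.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def routing_entropy(coeffs, eps=1e-12):
    """Mean Shannon entropy (nats) of routing coefficients.

    coeffs has shape (n_child, n_parent); each row is one shallower capsule's
    distribution over deeper capsules and is assumed to sum to 1.
    """
    per_child = -(coeffs * np.log(coeffs + eps)).sum(axis=1)
    return float(per_child.mean())
```

Low entropy means each shallower capsule routes almost entirely to a single deeper capsule, which matches a tree-like assignment; uniform coefficients give the maximum value, the natural log of the number of deeper capsules.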
Submitted 6 October, 2020; v1 submitted 4 October, 2020;
originally announced October 2020.
-
Exact Solutions Of Time Fractional Generalized Burgers-Fisher Equation Using Exp function and Exponential Rational Function Methods
Authors:
Ramya Selvaraj,
Swaminathan Venkatraman
Abstract:
Using the modified Riemann-Liouville derivative, the Exp function and Exponential rational function methods are implemented to solve the time-fractional generalized Burgers-Fisher equation (TF-GBF). The TF-GBF is transformed into a nonlinear ordinary differential equation (NLODE) by applying a traveling-wave transformation. The suggested methods are then used to formulate exact solutions for the resulting equation. The solutions are depicted using 2D and 3D plots.
Submitted 3 August, 2020;
originally announced August 2020.
-
Deep Residual Neural Networks for Image in Speech Steganography
Authors:
Shivam Agarwal,
Siddarth Venkatraman
Abstract:
Steganography is the art of hiding a secret message inside a publicly visible carrier message. Ideally, it is done without modifying the carrier, and with minimal loss of information in the secret message. Recently, various deep learning based approaches to steganography have been applied to different message types. We propose a deep learning based technique to hide a source RGB image message inside finite length speech segments without perceptual loss. To achieve this, we train three neural networks: an encoding network to hide the message in the carrier, a decoding network to reconstruct the message from the carrier, and an additional image enhancer network to further improve the reconstructed message. We also discuss future improvements to the proposed algorithm.
Submitted 30 March, 2020;
originally announced March 2020.
-
Building Deep, Equivariant Capsule Networks
Authors:
Sairaam Venkatraman,
S. Balasubramanian,
R. Raghunatha Sarma
Abstract:
Capsule networks are constrained by the parameter-expensive nature of their layers, and the general lack of provable equivariance guarantees. We present a variation of capsule networks that aims to remedy this. We identify that learning all pair-wise part-whole relationships between capsules of successive layers is inefficient. Further, we also realise that the choice of prediction networks and the routing mechanism are both key to equivariance. Based on these, we propose an alternative framework for capsule networks that learns to projectively encode the manifold of pose-variations, termed the space-of-variation (SOV), for every capsule-type of each layer. This is done using a trainable, equivariant function defined over a grid of group-transformations. Thus, the prediction-phase of routing involves projection into the SOV of a deeper capsule using the corresponding function. As a specific instantiation of this idea, and also in order to reap the benefits of increased parameter-sharing, we use type-homogeneous group-equivariant convolutions of shallower capsules in this phase. We also introduce an equivariant routing mechanism based on degree-centrality. We show that this particular instance of our general model is equivariant, and hence preserves the compositional representation of an input under transformations. We conduct several experiments on standard object-classification datasets that showcase the increased transformation-robustness, as well as general performance, of our model relative to several capsule baselines.
Submitted 26 September, 2019; v1 submitted 4 August, 2019;
originally announced August 2019.
-
PAC-learning is Undecidable
Authors:
Sairaam Venkatraman,
S Balasubramanian,
R Raghunatha Sarma
Abstract:
The problem of attempting to learn the mapping between data and labels is the crux of any machine learning task. It is, therefore, of interest to the machine learning community on practical as well as theoretical counts to consider the existence of a test or criterion for deciding the feasibility of attempting to learn. We investigate the existence of such a criterion in the setting of PAC-learning, basing the feasibility solely on whether the mapping to be learnt lends itself to approximation by a given class of hypothesis functions. We show that no such criterion exists, exposing a fundamental limitation in the decidability of learning. In other words, we prove that testing for PAC-learnability is undecidable in the Turing sense. We also briefly discuss some of the probable implications of this result for the current practice of machine learning.
Submitted 20 October, 2022; v1 submitted 20 August, 2018;
originally announced August 2018.
-
'Part'ly first among equals: Semantic part-based benchmarking for state-of-the-art object recognition systems
Authors:
Ravi Kiran Sarvadevabhatla,
Shanthakumar Venkatraman,
R. Venkatesh Babu
Abstract:
An examination of object recognition challenge leaderboards (ILSVRC, PASCAL-VOC) reveals that the top-performing classifiers typically exhibit small differences amongst themselves in terms of error rate/mAP. To better differentiate the top performers, additional criteria are required. Moreover, the (test) images, on which the performance scores are based, predominantly contain fully visible objects. Therefore, 'harder' test images, mimicking the challenging conditions (e.g. occlusion) in which humans routinely recognize objects, need to be utilized for benchmarking. To address the concerns mentioned above, we make two contributions. First, we systematically vary the level of local object-part content, global detail and spatial context in images from PASCAL VOC 2010 to create a new benchmarking dataset dubbed PPSS-12. Second, we propose an object-part based benchmarking procedure which quantifies classifiers' robustness to a range of visibility and contextual settings. The benchmarking procedure relies on a semantic similarity measure that naturally addresses potential semantic granularity differences between the category labels in training and test datasets, thus eliminating manual mapping. We use our procedure on the PPSS-12 dataset to benchmark top-performing classifiers trained on the ILSVRC-2012 dataset. Our results show that the proposed benchmarking procedure enables additional differentiation among state-of-the-art object classifiers in terms of their ability to handle missing content and insufficient object detail. Given this capability for additional differentiation, our approach can potentially supplement existing benchmarking procedures used in object recognition challenge leaderboards.
Submitted 24 November, 2016; v1 submitted 23 November, 2016;
originally announced November 2016.