Search | arXiv e-print repository

GraVITON: Graph based garment warping with attention guided inversion for Virtual-tryon

Authors: Sanhita Pathak, Vinay Kaushik, Brejesh Lall

Abstract: Virtual try-on, a rapidly evolving field in computer vision, is transforming e-commerce by improving customer experiences through precise garment warping and seamless integration onto the human body. Existing methods such as TPS and flow address garment warping but overlook the finer contextual details. In this paper, we introduce a novel graph based warping technique which emphasizes the value of context in garment flow. Our graph based warping module generates a warped garment as well as a coarse person image, which is utilised by a simple refinement network to give a coarse virtual tryon image. The proposed work exploits a latent diffusion model to generate the final tryon, treating garment transfer as an inpainting task. The diffusion model is conditioned with decoupled cross attention based inversion of visual and textual information. We introduce an occlusion aware warping constraint that generates a dense warped garment, without any holes and occlusion. Our method, validated on VITON-HD and Dresscode datasets, showcases substantial state-of-the-art qualitative and quantitative results showing considerable improvement in garment warping, texture preservation, and overall realism.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 18 pages, 7 Figures and 6 Tables

arXiv:2405.19179 [pdf, other]

Model Agnostic Defense against Adversarial Patch Attacks on Object Detection in Unmanned Aerial Vehicles

Authors: Saurabh Pathak, Samridha Shrestha, Abdelrahman AlMahmoud

Abstract: Object detection forms a key component in Unmanned Aerial Vehicles (UAVs) for completing high-level tasks that depend on the awareness of objects on the ground from an aerial perspective. In that scenario, adversarial patch attacks on an onboard object detector can severely impair the performance of upstream tasks. This paper proposes a novel model-agnostic defense mechanism against the threat of adversarial patch attacks in the context of UAV-based object detection. We formulate adversarial patch defense as an occlusion removal task. The proposed defense method can neutralize adversarial patches located on objects of interest, without exposure to adversarial patches during training. Our lightweight single-stage defense approach allows us to maintain a model-agnostic nature, that once deployed does not require to be updated in response to changes in the object detection pipeline. The evaluations in digital and physical domains show the feasibility of our method for deployment in UAV object detection pipelines, by significantly decreasing the Attack Success Ratio without incurring significant processing costs. As a result, the proposed defense solution can improve the reliability of object detection for UAVs.

Submitted 29 May, 2024; originally announced May 2024.

Comments: submitted to IROS 2024

ACM Class: I.4.4; I.4.9

arXiv:2404.18631 [pdf, other]

Feature importance to explain multimodal prediction models. A clinical use case

Authors: Jorn-Jan van de Beld, Shreyasi Pathak, Jeroen Geerdink, Johannes H. Hegeman, Christin Seifert

Abstract: Surgery to treat elderly hip fracture patients may cause complications that can lead to early mortality. An early warning system for complications could provoke clinicians to monitor high-risk patients more carefully and address potential complications early, or inform the patient. In this work, we develop a multimodal deep-learning model for post-operative mortality prediction using pre-operative and per-operative data from elderly hip fracture patients. Specifically, we include static patient data, hip and chest images before surgery in pre-operative data, vital signals, and medications administered during surgery in per-operative data. We extract features from image modalities using ResNet and from vital signals using LSTM. Explainable model outcomes are essential for clinical applicability, therefore we compute Shapley values to explain the predictions of our multimodal black box model. We find that i) Shapley values can be used to estimate the relative contribution of each modality both locally and globally, and ii) a modified version of the chain rule can be used to propagate Shapley values through a sequence of models supporting interpretable local explanations. Our findings imply that a multimodal combination of black box models can be explained by propagating Shapley values through the model sequence.

Submitted 29 April, 2024; originally announced April 2024.

Comments: Accepted at World Conference on Explainable Artificial Intelligence; 19 pages, 2 figures, 7 tables

arXiv:2404.07839 [pdf, other]

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

Authors: Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, et al. (37 additional authors not shown)

Abstract: We introduce RecurrentGemma, an open language model which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide a pre-trained model with 2B non-embedding parameters, and an instruction tuned variant. Both models achieve comparable performance to Gemma-2B despite being trained on fewer tokens.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.01234 [pdf, other]

GFLean: An Autoformalisation Framework for Lean via GF

Authors: Shashank Pathak

Abstract: We present an autoformalisation framework for the Lean theorem prover, called GFLean. GFLean uses a high-level grammar writing tool called Grammatical Framework (GF) for parsing and linearisation. GFLean is implemented in Haskell. We explain the functionalities of GFLean and its inner workings, and discuss its limitations. We also discuss how neural network based translation programs and rule based translation programs can be used together, complementing each other, to build robust autoformalisation frameworks.

Submitted 1 April, 2024; originally announced April 2024.

Comments: 19 Pages, 3 Figures

ACM Class: I.2.7

arXiv:2404.00613 [pdf, ps, other]

On $(θ, Θ)$-cyclic codes and their applications in constructing QECCs

Authors: Awadhesh Kumar Shukla, Sachin Pathak, Om Prakash Pandey, Vipul Mishra, Ashish Kumar Upadhyay

Abstract: Let $\mathbb F_q$ be a finite field, where $q$ is an odd prime power. Let $R=\mathbb{F}_q+u\mathbb{F}_q+v\mathbb{F}_q+uv\mathbb F_q$ with $u^2=u,v^2=v,uv=vu$. In this paper, we study the algebraic structure of $(θ, Θ)$-cyclic codes of block length $(r,s )$ over $\mathbb{F}_qR.$ Specifically, we analyze the structure of these codes as left $R[x:Θ]$-submodules of $\mathfrak{R}_{r,s} = \frac{\mathbb{F}_q[x:θ]}{\langle x^r-1\rangle} \times \frac{R[x:Θ]}{\langle x^s-1\rangle}$. Our investigation involves determining generator polynomials and minimal generating sets for this family of codes. Further, we discuss the algebraic structure of separable codes. A relationship between the generator polynomials of $(θ, Θ)$-cyclic codes over $\mathbb F_qR$ and their duals is established. Moreover, we calculate the generator polynomials of dual of $(θ, Θ)$-cyclic codes. As an application of our study, we provide a construction of quantum error-correcting codes (QECCs) from $(θ, Θ)$-cyclic codes of block length $(r,s)$ over $\mathbb{F}_qR$. We support our theoretical results with illustrative examples.

Submitted 31 March, 2024; originally announced April 2024.

Comments: 30 pages, 4 tables

arXiv:2403.20260 [pdf, other]

Prototype-based Interpretable Breast Cancer Prediction Models: Analysis and Challenges

Authors: Shreyasi Pathak, Jörg Schlötterer, Jeroen Veltman, Jeroen Geerdink, Maurice van Keulen, Christin Seifert

Abstract: Deep learning models have achieved high performance in medical applications, however, their adoption in clinical practice is hindered due to their black-box nature. Self-explainable models, like prototype-based models, can be especially beneficial as they are interpretable by design. However, if the learnt prototypes are of low quality then the prototype-based models are as good as black-box. Having high quality prototypes is a pre-requisite for a truly interpretable model. In this work, we propose a prototype evaluation framework for coherence (PEF-C) for quantitatively evaluating the quality of the prototypes based on domain knowledge. We show the use of PEF-C in the context of breast cancer prediction using mammography. Existing works on prototype-based models on breast cancer prediction using mammography have focused on improving the classification performance of prototype-based models compared to black-box models and have evaluated prototype quality through anecdotal evidence. We are the first to go beyond anecdotal evidence and evaluate the quality of the mammography prototypes systematically using our PEF-C. Specifically, we apply three state-of-the-art prototype-based models, ProtoPNet, BRAIxProtoPNet++ and PIP-Net on mammography images for breast cancer prediction and evaluate these models w.r.t. i) classification performance, and ii) quality of the prototypes, on three public datasets.
Our results show that prototype-based models are competitive with black-box models in terms of classification performance, and achieve a higher score in detecting ROIs. However, the quality of the prototypes is not yet sufficient and can be improved in aspects of relevance, purity and learning a variety of prototypes. We call on the XAI community to systematically evaluate the quality of the prototypes to check their true usability in high stake decisions and improve such models further.

Submitted 21 April, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

Comments: Accepted at World Conference on Explainable Artificial Intelligence; 21 pages, 5 figures, 3 tables

arXiv:2403.08295 [pdf, other]

Gemma: Open Models Based on Gemini Research and Technology

Authors: Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari, et al. (83 additional authors not shown)

Abstract: This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Gemma outperforms similarly sized open models on 11 out of 18 text-based tasks, and we present comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of model development. We believe the responsible release of LLMs is critical for improving the safety of frontier models, and for enabling the next wave of LLM innovations.

Submitted 16 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.07750 [pdf, other]

Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings

Authors: Sahand Sharifzadeh, Christos Kaplanis, Shreya Pathak, Dharshan Kumaran, Anastasija Ilic, Jovana Mitrovic, Charles Blundell, Andrea Banino

Abstract: The creation of high-quality human-labeled image-caption datasets presents a significant bottleneck in the development of Visual-Language Models (VLMs). In this work, we investigate an approach that leverages the strengths of Large Language Models (LLMs) and image generation models to create synthetic image-text pairs for efficient and effective VLM training. Our method employs a pretrained text-to-image model to synthesize image embeddings from captions generated by an LLM. Despite the text-to-image model and VLM initially being trained on the same data, our approach leverages the image generator's ability to create novel compositions, resulting in synthetic image embeddings that expand beyond the limitations of the original dataset. Extensive experiments demonstrate that our VLM, finetuned on synthetic data, achieves comparable performance to models trained solely on human-annotated data, while requiring significantly less data. Furthermore, we perform a set of analyses on captions which reveals that semantic diversity and balance are key aspects for better downstream performance. Finally, we show that synthesizing images in the image embedding space is 25% faster than in the pixel space. We believe our work not only addresses a significant challenge in VLM training but also opens up promising avenues for the development of self-improving multi-modal models.

Submitted 7 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: 9 pages, 6 figures

arXiv:2402.16863 [pdf]

Quantum Inspired Chaotic Salp Swarm Optimization for Dynamic Optimization

Authors: Sanjai Pathak, Ashish Mani, Mayank Sharma, Amlan Chatterjee

Abstract: Many real-world problems are dynamic optimization problems that are unknown beforehand. In practice, unpredictable events such as the arrival of new jobs, due date changes, and reservation cancellations, changes in parameters or constraints make the search environment dynamic. Many algorithms are designed to deal with stationary optimization problems, but these algorithms do not face dynamic optimization problems or manage them correctly. Although some optimization algorithms are proposed to deal with the changes in dynamic environments differently, there are still areas of improvement in existing algorithms due to limitations or drawbacks, especially in terms of locating and following the previously identified optima. With this in mind, we studied a variant of SSA known as QSSO, which integrates the principles of quantum computing. An attempt is made to improve the overall performance of standard SSA to deal with the dynamic environment effectively by locating and tracking the global optima for DOPs. This work is an extension of the proposed new algorithm QSSO, known as the Quantum-inspired Chaotic Salp Swarm Optimization (QCSSO) Algorithm, which details the various approaches considered while solving DOPs. A chaotic operator is employed with quantum computing to respond to change and guarantee to increase individual searchability by improving population diversity and the speed at which the algorithm converges.
We experimented by evaluating QCSSO on a well-known generalized dynamic benchmark problem (GDBG) provided for CEC 2009, followed by a comparative numerical study with well-regarded algorithms. As promised, the introduced QCSSO is discovered as the rival algorithm for DOPs.

Submitted 20 January, 2024; originally announced February 2024.

Comments: 14 pages, 2 figures, 1 algorithm

arXiv:2402.08780 [pdf, other]

Enhanced Deep Q-Learning for 2D Self-Driving Cars: Implementation and Evaluation on a Custom Track Environment

Authors: Sagar Pathak, Bidhya Shrestha, Kritish Pahi

Abstract: This research project presents the implementation of a Deep Q-Learning Network (DQN) for a self-driving car on a 2-dimensional (2D) custom track, with the objective of enhancing the DQN network's performance. It encompasses the development of a custom driving environment using Pygame on a track surrounding the University of Memphis map, as well as the design and implementation of the DQN model. The algorithm utilizes data from 7 sensors installed in the car, which measure the distance between the car and the track. These sensors are positioned in front of the vehicle, spaced 20 degrees apart, enabling them to sense a wide area ahead. We successfully implemented the DQN and also a modified version of the DQN with a priority-based action selection mechanism, which we refer to as modified DQN. The model was trained over 1000 episodes, and the average reward received by the agent was found to be around 40, which is approximately 60% higher than the original DQN and around 50% higher than the vanilla neural network.

Submitted 13 February, 2024; originally announced February 2024.

Comments: 8 pages, 8 figures
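
The sensor layout described in the abstract above (7 distance sensors at the front of the car, 20 degrees apart) can be illustrated with a small sketch. It assumes the fan is centred on the car's heading, which the abstract does not state, and the names `sensor_angles` and `ray_directions` are hypothetical; this is not the authors' code.

```python
import math

NUM_SENSORS = 7      # from the abstract
SPACING_DEG = 20.0   # from the abstract


def sensor_angles(num=NUM_SENSORS, spacing=SPACING_DEG):
    """Angular offsets (degrees) of each sensor relative to the heading.

    Assumption: the fan of sensors is symmetric about the heading.
    """
    half_span = spacing * (num - 1) / 2
    return [i * spacing - half_span for i in range(num)]


def ray_directions(heading_deg):
    """Unit direction vectors (dx, dy) for each sensor ray."""
    directions = []
    for offset in sensor_angles():
        angle = math.radians(heading_deg + offset)
        directions.append((math.cos(angle), math.sin(angle)))
    return directions


if __name__ == "__main__":
    print(sensor_angles())
    for dx, dy in ray_directions(heading_deg=0.0):
        print(f"{dx:+.3f} {dy:+.3f}")
```

Under the symmetry assumption, the seven rays cover a 120 degree field of view ahead of the car.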

arXiv:2312.13791 [pdf, other]

Parameterized Guarantees for Almost Envy-Free Allocations

Authors: Siddharth Barman, Debajyoti Kar, Shraddha Pathak

Abstract: We study fair allocation of indivisible goods among agents with additive valuations. We obtain novel approximation guarantees for three of the strongest fairness notions in discrete fair division, namely envy-free up to the removal of any positively-valued good (EFx), pairwise maximin shares (PMMS), and envy-free up to the transfer of any positively-valued good (tEFx). Our approximation guarantees are in terms of an instance-dependent parameter $γ\in (0,1]$ that upper bounds, for each indivisible good in the given instance, the multiplicative range of nonzero values for the good across the agents. First, we consider allocations wherein, between any pair of agents and up to the removal of any positively-valued good, the envy is multiplicatively bounded. Specifically, the current work develops a polynomial-time algorithm that computes a $\left( \frac{2γ}{\sqrt{5+4γ}-1}\right)$-approximately EFx allocation for any given fair division instance with range parameter $γ\in (0,1]$. For instances with $γ\geq 0.511$, the obtained approximation guarantee for EFx surpasses the previously best-known approximation bound of $(φ-1) \approx 0.618$, here $φ$ denotes the golden ratio. Furthermore, for $γ\in (0,1]$, we develop a polynomial-time algorithm for finding allocations wherein the PMMS requirement is satisfied, between every pair of agents, within a multiplicative factor of $\frac{5}{6} γ$. En route to this result, we obtain novel existential and computational guarantees for $\frac{5}{6}$-approximately PMMS allocations under restricted additive valuations.
Finally, we develop an algorithm that efficiently computes a $2γ$-approximately tEFx allocation. Specifically, we obtain existence and efficient computation of exact tEFx allocations for all instances with $γ\in [0.5, 1]$.

Submitted 21 December, 2023; originally announced December 2023.

Comments: 28 pages
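
The EFx guarantee quoted in the abstract above is a closed-form expression in the range parameter γ, so it can be tabulated directly against the earlier bound φ − 1. The sketch below only evaluates the formulas as stated in the abstract; the function names are hypothetical, and reading the tEFx guarantee as min(1, 2γ) is an interpretation of the abstract's statement that the allocation is exact for γ in [0.5, 1].

```python
import math


def efx_ratio(gamma: float) -> float:
    """EFx approximation factor 2γ / (sqrt(5 + 4γ) - 1) from the abstract."""
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    return 2.0 * gamma / (math.sqrt(5.0 + 4.0 * gamma) - 1.0)


def tefx_ratio(gamma: float) -> float:
    """tEFx factor 2γ, capped at 1 (exact) once γ >= 0.5 (interpretation)."""
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    return min(1.0, 2.0 * gamma)


# Previously best-known EFx bound: φ - 1, with φ the golden ratio.
GOLDEN_BOUND = (1.0 + math.sqrt(5.0)) / 2.0 - 1.0

if __name__ == "__main__":
    for gamma in (0.25, 0.5, 0.52, 0.75, 1.0):
        beats = efx_ratio(gamma) > GOLDEN_BOUND
        print(f"gamma={gamma:.2f}  EFx={efx_ratio(gamma):.4f}  "
              f"tEFx={tefx_ratio(gamma):.4f}  beats phi-1: {beats}")
```

At γ = 1 the EFx factor equals 1, and the crossover with φ − 1 falls close to the threshold of 0.511 quoted in the abstract.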

arXiv:2312.07395 [pdf, other]

A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames

Authors: Pinelopi Papalampidi, Skanda Koppula, Shreya Pathak, Justin Chiu, Joe Heyward, Viorica Patraucean, Jiajun Shen, Antoine Miech, Andrew Zisserman, Aida Nematzdeh

Abstract: Understanding long, real-world videos requires modeling of long-range visual dependencies. To this end, we explore video-first architectures, building on the common paradigm of transferring large-scale, image--text models to video via shallow temporal fusion. However, we expose two limitations to the approach: (1) decreased spatial capabilities, likely due to poor video--language alignment in standard video datasets, and (2) higher memory consumption, bottlenecking the number of frames that can be processed. To mitigate the memory bottleneck, we systematically analyze the memory/accuracy trade-off of various efficient methods: factorized attention, parameter-efficient image-to-video adaptation, input masking, and multi-resolution patchification. Surprisingly, simply masking large portions of the video (up to 75%) during contrastive pre-training proves to be one of the most robust ways to scale encoders to videos up to 4.3 minutes at 1 FPS. Our simple approach for training long video-to-text models, which scales to 1B parameters, does not add new architectural complexity and is able to outperform the popular paradigm of using much larger LLMs as an information aggregator over segment-based information on benchmarks with long-range temporal dependencies (YouCook2, EgoSchema).

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2312.05328 [pdf, other]

Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding

Authors: Talfan Evans, Shreya Pathak, Hamza Merzic, Jonathan Schwarz, Ryutaro Tanno, Olivier J. Henaff

Abstract: Power-law scaling indicates that large-scale training with uniform sampling is prohibitively slow. Active learning methods aim to increase data efficiency by prioritizing learning on the most relevant examples. Despite their appeal, these methods have yet to be widely adopted since no one algorithm has been shown to a) generalize across models and tasks b) scale to large datasets and c) yield overall FLOP savings when accounting for the overhead of data selection. In this work we propose a method which satisfies these three properties, leveraging small, cheap proxy models to estimate "learnability" scores for datapoints, which are used to prioritize data for the training of much larger models. As a result, our models require 46% and 51% fewer training updates and up to 25% less total computation to reach the same performance as uniformly trained visual classifiers on JFT and multimodal models on ALIGN. Finally, we find our data-prioritization scheme to be complementary with recent data-curation and learning objectives, yielding a new state-of-the-art in several multimodal transfer tasks.

Submitted 14 February, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

Comments: Technical report

arXiv:2311.18281 [pdf, other]

Utilizing Radiomic Feature Analysis For Automated MRI Keypoint Detection: Enhancing Graph Applications

Authors: Sahar Almahfouz Nasser, Shashwat Pathak, Keshav Singhal, Mohit Meena, Nihar Gupte, Ananya Chinmaya, Prateek Garg, Amit Sethi

Abstract: Graph neural networks (GNNs) present a promising alternative to CNNs and transformers in certain image processing applications due to their parameter-efficiency in modeling spatial relationships. Currently, a major area of research involves converting non-graph input data for GNN-based models, notably in scenarios where the data originates from images. One approach involves converting images into nodes by identifying significant keypoints within them. Super-Retina, a semi-supervised technique, has been utilized for detecting keypoints in retinal images. However, its limitations lie in the dependency on a small initial set of ground truth keypoints, which is progressively expanded to detect more keypoints. Having encountered difficulties in detecting consistent initial keypoints in brain images using SIFT and LoFTR, we proposed a new approach: radiomic feature-based keypoint detection. We demonstrated the anatomical significance of the detected keypoints by showing that they improve registration processes guided by these keypoints. Subsequently, these keypoints were employed as the ground truth for the keypoint detection method (LK-SuperRetina). Furthermore, the study showcases the application of GNNs in image matching, highlighting their superior performance in terms of both the number of good matches and confidence scores. This research sets the stage for expanding GNN applications into various other applications, including but not limited to image classification, segmentation, and registration.

Submitted 30 November, 2023; originally announced November 2023.

arXiv:2310.20274 [pdf, other]

doi 10.1145/3132847.3133141

Extracting Entities of Interest from Comparative Product Reviews

Authors: Jatin Arora, Sumit Agrawal, Pawan Goyal, Sayan Pathak

Abstract: This paper presents a deep learning based approach to extract product comparison information out of user reviews on various e-commerce websites. Any comparative product review has three major entities of information: the names of the products being compared, the user opinion (predicate) and the feature or aspect under comparison. All these informing entities are dependent on each other and bound by the rules of the language, in the review. We observe that their inter-dependencies can be captured well using LSTMs. We evaluate our system on existing manually labeled datasets and observe out-performance over the existing Semantic Role Labeling (SRL) framework popular for this task.

Submitted 31 October, 2023; originally announced October 2023.

Comments: Source Code: https://github.com/jatinarora2702/Review-Information-Extraction

ACM Class: I.2.7; H.3.3

Journal ref: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Pages 1975 - 1978

arXiv:2310.12677 [pdf, other]

Weakly Supervised Learning for Breast Cancer Prediction on Mammograms in Realistic Settings

Authors: Shreyasi Pathak, Jörg Schlötterer, Jeroen Geerdink, Onno Dirk Vijlbrief, Maurice van Keulen, Christin Seifert

Abstract: Automatic methods for early detection of breast cancer on mammography can significantly decrease mortality. Broad uptake of those methods in hospitals is currently hindered because the methods have too many constraints. They assume annotations available for single images or even regions-of-interest (ROIs), and a fixed number of images per patient. Both assumptions do not hold in a general hospital setting. Relaxing those assumptions results in a weakly supervised learning setting, where labels are available per case, but not for individual images or ROIs. Not all images taken for a patient contain malignant regions and the malignant ROIs cover only a tiny part of an image, whereas most image regions represent benign tissue. In this work, we investigate a two-level multi-instance learning (MIL) approach for case-level breast cancer prediction on two public datasets (1.6k and 5k cases) and an in-house dataset of 21k cases. Observing that breast cancer is usually only present in one side, while images of both breasts are taken as a precaution, we propose a domain-specific MIL pooling variant. We show that two-level MIL can be applied in realistic clinical settings where only case labels, and a variable number of images per patient are available. Data in realistic settings scales with continuous patient intake, while manual annotation efforts do not. Hence, research should focus in particular on unsupervised ROI extraction, in order to improve breast cancer prediction for all patients.

Submitted 19 October, 2023; originally announced October 2023.

Comments: 10 pages, 5 figures, 5 tables

arXiv:2310.05024 [pdf, other]

Single Stage Warped Cloth Learning and Semantic-Contextual Attention Feature Fusion for Virtual TryOn

Authors: Sanhita Pathak, Vinay Kaushik, Brejesh Lall

Abstract: Image-based virtual try-on aims to fit an in-shop garment onto a clothed person image. Garment warping, which aligns the target garment with the corresponding body parts in the person image, is a crucial step in achieving this goal. Existing methods often use multi-stage frameworks to handle clothes warping, person body synthesis and tryon generation separately or rely on noisy intermediate parser-based labels. We propose a novel single-stage framework that implicitly learns the same without explicit multi-stage learning. Our approach utilizes a novel semantic-contextual fusion attention module for garment-person feature fusion, enabling efficient and realistic cloth warping and body synthesis from target pose keypoints. By introducing a lightweight linear attention framework that attends to garment regions and fuses multiple sampled flow fields, we also address misalignment and artifacts present in previous methods. To achieve simultaneous learning of warped garment and try-on results, we introduce a Warped Cloth Learning Module. Our proposed approach significantly improves the quality and efficiency of virtual try-on methods, providing users with a more reliable and realistic virtual try-on experience.

Submitted 25 May, 2024; v1 submitted 8 October, 2023; originally announced October 2023.

Comments: Accepted in ICME 2024

arXiv:2309.11326 [pdf, other]

How to turn your camera into a perfect pinhole model

Authors: Ivan De Boi, Stuti Pathak, Marina Oliveira, Rudi Penne

Abstract: Camera calibration is a first and fundamental step in various computer vision applications. Despite being an active field of research, Zhang's method remains widely used for camera calibration due to its implementation in popular toolboxes. However, this method initially assumes a pinhole model with oversimplified distortion models. In this work, we propose a novel approach that involves a pre-processing step to remove distortions from images by means of Gaussian processes. Our method does not need to assume any distortion model and can be applied to severely warped images, even in the case of multiple distortion sources, e.g., a fisheye image of a curved mirror reflection. The Gaussian processes capture all distortions and camera imperfections, resulting in virtual images as though taken by an ideal pinhole camera with square pixels. Furthermore, this ideal GP-camera only needs one image of a square grid calibration pattern. This model allows for a serious upgrade of many algorithms and applications that are designed in a pure projective geometry setting but with a performance that is very sensitive to nonlinear lens distortions. We demonstrate the effectiveness of our method by simplifying Zhang's calibration method, reducing the number of parameters and getting rid of the distortion parameters and iterative optimization. We validate by means of synthetic data and real world images.
The contributions of this work include the construction of a virtual ideal pinhole camera using Gaussian processes, a simplified calibration method and lens distortion removal.

Submitted 20 September, 2023; originally announced September 2023.

Comments: 15 pages, 3 figures, conference CIARP

arXiv:2303.15225 [pdf, other]

GP-PCS: One-shot Feature-Preserving Point Cloud Simplification with Gaussian Processes on Riemannian Manifolds

Authors: Stuti Pathak, Thomas M. McDonald, Seppe Sels, Rudi Penne

Abstract: The processing, storage and transmission of large-scale point clouds is an ongoing challenge in the computer vision community which hinders progress in the application of 3D models to real-world settings, such as autonomous driving, virtual reality and remote sensing. We propose a novel, one-shot point cloud simplification method which preserves both the salient structural features and the overall shape of a point cloud without any prior surface reconstruction step. Our method employs Gaussian processes suitable for functions defined on Riemannian manifolds, allowing us to model the surface variation function across any given point cloud. A simplified version of the original cloud is obtained by sequentially selecting points using a greedy sparsification scheme. The selection criterion used for this scheme ensures that the simplified cloud best represents the surface variation of the original point cloud. We evaluate our method on several benchmark and self-acquired point clouds, compare it to a range of existing methods, demonstrate its application in downstream tasks of registration and surface reconstruction, and show that our method is competitive both in terms of empirical performance and computational efficiency.

Submitted 9 January, 2024; v1 submitted 27 March, 2023; originally announced March 2023.

Comments: 12 pages

arXiv:2301.07608 [pdf, other]

Human-Timescale Adaptation in an Open-Ended Task Space

Authors: Adaptive Agent Team, Jakob Bauer, Kate Baumli, Satinder Baveja, Feryal Behbahani, Avishkar Bhoopchand, Nathalie Bradley-Schmieg, Michael Chang, Natalie Clay, Adrian Collister, Vibhavari Dasagi, Lucy Gonzalez, Karol Gregor, Edward Hughes, Sheleem Kashem, Maria Loks-Thompson, Hannah Openshaw, Jack Parker-Holder, Shreya Pathak, Nicolas Perez-Nieves, Nemanja Rakicevic, Tim Rocktäschel, Yannick Schroecker, Jakub Sygnowski, Karl Tuyls, et al. (3 additional authors not shown)

Abstract: Foundation models have shown impressive adaptation and scalability in supervised and self-supervised learning problems, but so far these successes have not fully translated to reinforcement learning (RL). In this work, we demonstrate that training an RL agent at scale leads to a general in-context learning algorithm that can adapt to open-ended novel embodied 3D problems as quickly as humans. In a vast space of held-out environment dynamics, our adaptive agent (AdA) displays on-the-fly hypothesis-driven exploration, efficient exploitation of acquired knowledge, and can successfully be prompted with first-person demonstrations. Adaptation emerges from three ingredients: (1) meta-reinforcement learning across a vast, smooth and diverse task distribution, (2) a policy parameterised as a large-scale attention-based memory architecture, and (3) an effective automated curriculum that prioritises tasks at the frontier of an agent's capabilities. We demonstrate characteristic scaling laws with respect to network size, memory length, and richness of the training task distribution. We believe our results lay the foundation for increasingly general and adaptive RL agents that perform well across ever-larger open-ended domains.

Submitted 18 January, 2023; originally announced January 2023.

arXiv:2210.15104 [pdf, ps, other]

TRScore: A Novel GPT-based Readability Scorer for ASR Segmentation and Punctuation model evaluation and selection

Authors: Piyush Behre, Sharman Tan, Amy Shah, Harini Kesavamoorthy, Shuangyu Chang, Fei Zuo, Chris Basoglu, Sayan Pathak

Abstract: Punctuation and Segmentation are key to readability in Automatic Speech Recognition (ASR), often evaluated using F1 scores that require high-quality human transcripts and do not reflect readability well. Human evaluation is expensive, time-consuming, and suffers from large inter-observer variability, especially in conversational speech devoid of strict grammatical structures. Large pre-trained models capture a notion of grammatical structure. We present TRScore, a novel readability measure using the GPT model to evaluate different segmentation and punctuation systems. We validate our approach with human experts. Additionally, our approach enables quantitative assessment of text post-processing techniques such as capitalization, inverse text normalization (ITN), and disfluency on overall readability, which traditional word error rate (WER) and slot error rate (SER) metrics fail to capture. TRScore is strongly correlated to traditional F1 and human readability scores, with Pearson's correlation coefficients of 0.67 and 0.98, respectively. It also eliminates the need for human transcriptions for model selection.

Submitted 26 October, 2022; originally announced October 2022.

arXiv:2210.14446 [pdf, other]

Smart Speech Segmentation using Acousto-Linguistic Features with look-ahead

Authors: Piyush Behre, Naveen Parihar, Sharman Tan, Amy Shah, Eva Sharma, Geoffrey Liu, Shuangyu Chang, Hosam Khalil, Chris Basoglu, Sayan Pathak

Abstract: Segmentation for continuous Automatic Speech Recognition (ASR) has traditionally used silence timeouts or voice activity detectors (VADs), which are both limited to acoustic features. This segmentation is often overly aggressive, given that people naturally pause to think as they speak. Consequently, segmentation happens mid-sentence, hindering both punctuation and downstream tasks like machine translation for which high-quality segmentation is critical. Model-based segmentation methods that leverage acoustic features are powerful, but without an understanding of the language itself, these approaches are limited. We present a hybrid approach that leverages both acoustic and language information to improve segmentation. Furthermore, we show that including one word as a look-ahead boosts segmentation quality. On average, our models improve segmentation-F0.5 score by 9.8% over baseline. We show that this approach works for multiple languages. For the downstream task of machine translation, it improves the translation BLEU score by an average of 1.05 points.

Submitted 27 October, 2022; v1 submitted 25 October, 2022; originally announced October 2022.

arXiv:2207.11578 [pdf, other]

A Scalable Bayesian Persuasion Framework for Epidemic Containment on Heterogeneous Networks

Authors: Shraddha Pathak, Ankur A. Kulkarni

Abstract: During an epidemic, the information available to individuals in the society deeply influences their belief of the epidemic spread, and consequently the preventive measures they take to stay safe from the infection. In this paper, we develop a scalable framework for ascertaining the optimal information disclosure a government must make to individuals in a networked society for the purpose of epidemic containment. This information design problem is complicated by the heterogeneous nature of the society, the positive externalities faced by individuals, and the variety in the public response to such disclosures. We use a networked public goods model to capture the underlying societal structure. Our first main result is a structural decomposition of the government's objectives into two independent components -- a component dependent on the utility function of individuals, and another dependent on properties of the underlying network. Since the network dependent term in this decomposition is unaffected by the signals sent by the government, this characterization simplifies the problem of finding the optimal information disclosure policies. We find explicit conditions, in terms of the risk aversion and prudence, under which no disclosure, full disclosure, exaggeration and downplay are the optimal policies. The structural decomposition results are also helpful in studying other forms of interventions like incentive design and network design.

Submitted 23 July, 2022; originally announced July 2022.

MSC Class: 91A28

arXiv:2202.11454 [pdf, ps, other]

On $Z_{p^r}Z_{p^r}Z_{p^s}$-Additive Cyclic Codes

Authors: Cristina Fernández-Córdoba, Sachin Pathak, Ashish Kumar Upadhyay

Abstract: In this paper, we introduce $\mathbb{Z}_{p^r}\mathbb{Z}_{p^r}\mathbb{Z}_{p^s}$-additive cyclic codes for $r\leq s$. These codes can be identified as $\mathbb{Z}_{p^s}[x]$-submodules of $\mathbb{Z}_{p^r}[x]/\langle x^\alpha-1\rangle \times \mathbb{Z}_{p^r}[x]/\langle x^\beta-1\rangle\times \mathbb{Z}_{p^s}[x]/\langle x^\gamma-1\rangle$. We determine the generator polynomials and minimal generating sets for this family of codes. Some previous work has been done for the case $p=2$ with $r=s=1$, $r=s=2$, and $r=1,s=2$. However, we show that in these previous works the classification of these codes was incomplete, and the statements in this paper complete that classification. We also discuss the structure of separable $\mathbb{Z}_{p^r}\mathbb{Z}_{p^r}\mathbb{Z}_{p^s}$-additive cyclic codes and determine their generator polynomials. Further, we also study the duality of $\mathbb{Z}_{p^s}[x]$-submodules. As applications, we present some examples and construct some optimal binary codes.

Submitted 23 February, 2022; originally announced February 2022.

arXiv:2201.08164 [pdf, other]

doi 10.1145/3583558

From Anecdotal Evidence to Quantitative Evaluation Methods: A Systematic Review on Evaluating Explainable AI

Authors: Meike Nauta, Jan Trienes, Shreyasi Pathak, Elisa Nguyen, Michelle Peters, Yasmin Schmitt, Jörg Schlötterer, Maurice van Keulen, Christin Seifert

Abstract: The rising popularity of explainable artificial intelligence (XAI) to understand high-performing black boxes raised the question of how to evaluate explanations of machine learning (ML) models. While interpretability and explainability are often presented as a subjectively validated binary property, we consider it a multi-faceted concept. We identify 12 conceptual properties, such as Compactness and Correctness, that should be evaluated for comprehensively assessing the quality of an explanation. Our so-called Co-12 properties serve as a categorization scheme for systematically reviewing the evaluation practices of more than 300 papers published in the last 7 years at major AI and ML conferences that introduce an XAI method. We find that 1 in 3 papers evaluate exclusively with anecdotal evidence, and 1 in 5 papers evaluate with users. This survey also contributes to the call for objective, quantifiable evaluation methods by presenting an extensive overview of quantitative XAI evaluation methods. Our systematic collection of evaluation methods provides researchers and practitioners with concrete tools to thoroughly validate, benchmark and compare new and existing XAI methods. The Co-12 categorization scheme and our identified evaluation methods open up opportunities to include quantitative metrics as optimization criteria during model training in order to optimize for accuracy and interpretability simultaneously.

Submitted 24 February, 2023; v1 submitted 20 January, 2022; originally announced January 2022.

Comments: Published in ACM Computing Surveys (DOI http://dx.doi.org/10.1145/3583558). This ArXiv version includes the supplementary material. Website with categorization of XAI methods at https://utwente-dmb.github.io/xai-papers/

arXiv:2107.09931 [pdf, other]

The Effectiveness of Intermediate-Task Training for Code-Switched Natural Language Understanding

Authors: Archiki Prasad, Mohammad Ali Rehan, Shreya Pathak, Preethi Jyothi

Abstract: While recent benchmarks have spurred a lot of new work on improving the generalization of pretrained multilingual language models on multilingual tasks, techniques to improve code-switched natural language understanding tasks have been far less explored. In this work, we propose the use of bilingual intermediate pretraining as a reliable technique to derive large and consistent performance gains on three different NLP tasks using code-switched text. We achieve substantial absolute improvements of 7.87%, 20.15%, and 10.99%, on the mean accuracies and F1 scores over previous state-of-the-art systems for Hindi-English Natural Language Inference (NLI), Question Answering (QA) tasks, and Spanish-English Sentiment Analysis (SA) respectively. We show consistent performance gains on four different code-switched language-pairs (Hindi-English, Spanish-English, Tamil-English and Malayalam-English) for SA. We also present a code-switched masked language modelling (MLM) pretraining technique that consistently benefits SA compared to standard MLM pretraining using real code-switched text.

Submitted 21 July, 2021; originally announced July 2021.

arXiv:2105.06424 [pdf, other]

Stateless Model Checking under a Reads-Value-From Equivalence

Authors: Pratyush Agarwal, Krishnendu Chatterjee, Shreya Pathak, Andreas Pavlogiannis, Viktor Toman

Abstract: Stateless model checking (SMC) is one of the standard approaches to the verification of concurrent programs. As scheduling non-determinism creates exponentially large spaces of thread interleavings, SMC attempts to partition this space into equivalence classes and explore only a few representatives from each class. The efficiency of this approach depends on two factors: (a) the coarseness of the partitioning, and (b) the time to generate representatives in each class. For this reason, the search for coarse partitionings that are efficiently explorable is an active research challenge. In this work we present RVF-SMC, a new SMC algorithm that uses a novel reads-value-from (RVF) partitioning. Intuitively, two interleavings are deemed equivalent if they agree on the value obtained in each read event, and read events induce consistent causal orderings between them. The RVF partitioning is provably coarser than recent approaches based on Mazurkiewicz and "reads-from" partitionings. Our experimental evaluation reveals that RVF is quite often a very effective equivalence, as the underlying partitioning is exponentially coarser than other approaches. Moreover, RVF-SMC generates representatives very efficiently, as the reduction in the partitioning is often met with significant speed-ups in the model checking task.

Submitted 13 May, 2021; originally announced May 2021.

Comments: Full technical report of the CAV2021 work

arXiv:2101.09844 [pdf, other]

Pattern Ensembling for Spatial Trajectory Reconstruction

Authors: Shivam Pathak, Mingyi He, Sergey Malinchik, Stanislav Sobolevsky

Abstract: Digital sensing provides an unprecedented opportunity to assess and understand mobility. However, incompleteness, missing information, possible inaccuracies, and temporal heterogeneity in the geolocation data can undermine its applicability. As mobility patterns are often repeated, we propose a method to use similar trajectory patterns from the local vicinity and probabilistically ensemble them to robustly reconstruct missing or unreliable observations. We evaluate the proposed approach in comparison with traditional functional trajectory interpolation using a case of sea vessel trajectory data provided by the Automatic Identification System (AIS). By effectively leveraging the similarities in real-world trajectories, our pattern ensembling method helps to reconstruct missing trajectory segments of extended length and complex geometry. It can be used for locating mobile objects when temporarily unobserved as well as for creating an evenly sampled trajectory interpolation useful for further trajectory mining.

Submitted 24 January, 2021; originally announced January 2021.

Comments: 11 pages, 5 figures

MSC Class: 68W99; ACM Class: I.5

arXiv:2010.01485 [pdf, other]

Improving Lesion Detection by exploring bias on Skin Lesion dataset

Authors: Anusua Trivedi, Sreya Muppalla, Shreyaan Pathak, Azadeh Mobasher, Pawel Janowski, Rahul Dodhia, Juan M. Lavista Ferres

Abstract: All datasets contain some biases, often unintentional, due to how they were acquired and annotated. These biases distort machine-learning models' performance, creating spurious correlations that the models can unfairly exploit, or, conversely, destroying clear correlations that the models could learn. With the popularity of deep learning models, automated skin lesion analysis is starting to play an essential role in the early detection of Melanoma. The ISIC Archive is one of the most used skin lesion sources to benchmark deep learning-based tools. Bissoto et al. experimented with different bounding-box based masks and showed that deep learning models could classify skin lesion images without clinically meaningful information in the input data. Their findings seem confounding since the ablated regions (random rectangular boxes) are not significant. The shape of the lesion is a crucial factor in the clinical characterization of a skin lesion. In that context, we performed a set of experiments that generate shape-preserving masks instead of rectangular bounding-box based masks. A deep learning model trained on these shape-preserving masked images does not outperform models trained on images without clinically meaningful information. That strongly suggests spurious correlations guiding the models. We propose the use of a generative adversarial network (GAN) to mitigate the underlying bias.

Submitted 4 October, 2020; originally announced October 2020.

arXiv:2002.08820 [pdf]

Deep Learning Estimation of Multi-Tissue Constrained Spherical Deconvolution with Limited Single Shell DW-MRI

Authors: Vishwesh Nath, Sudhir K. Pathak, Kurt G. Schilling, Walt Schneider, Bennett A. Landman

Abstract: Diffusion-weighted magnetic resonance imaging (DW-MRI) is the only non-invasive approach for estimation of intra-voxel tissue microarchitecture and reconstruction of in vivo neural pathways for the human brain. With improvement in accelerated MRI acquisition technologies, DW-MRI protocols that make use of multiple levels of diffusion sensitization have gained popularity. A well-known advanced method for reconstruction of white matter microstructure that uses multi-shell data is multi-tissue constrained spherical deconvolution (MT-CSD). MT-CSD substantially improves the resolution of intra-voxel structure over the traditional single shell version, constrained spherical deconvolution (CSD). Herein, we explore the possibility of using deep learning on single shell data (using the b=1000 s/mm2 from the Human Connectome Project (HCP)) to estimate the information content captured by 8th order MT-CSD using the full three shell data (b=1000, 2000, and 3000 s/mm2 from HCP). Briefly, we examine two network architectures: 1.) Sequential network of fully connected dense layers with a residual block in the middle (ResDNN), 2.) Patch based convolutional neural network with a residual block (ResCNN). For both networks an additional output block for estimation of voxel fraction was used with a modified loss function. Each approach was compared against the baseline of using MT-CSD on all data on 15 subjects from the HCP divided into 5 training, 2 validation, and 8 testing subjects with a total of 6.7 million voxels.
The fiber orientation distribution function (fODF) can be recovered with high correlation (0.77 vs 0.74 and 0.65) as compared to the ground truth of MT-CSD, which was derived from the multi-shell DW-MRI acquisitions. Source code and models have been made publicly available.

Submitted 20 February, 2020; originally announced February 2020.

Comments: 10 pages, 7 figures

arXiv:1912.01960 [pdf, other]

Pattern and Anomaly Detection in Urban Temporal Networks

Authors: Mingyi He, Shivam Pathak, Urwa Muaz, Jingtian Zhou, Saloni Saini, Sergey Malinchik, Stanislav Sobolevsky

Abstract: A broad spectrum of urban activities including mobility can be modeled as temporal networks evolving over time. Abrupt changes in urban dynamics caused by events such as disruption of civic operations, mass crowd gatherings, holidays and natural disasters are potentially reflected in these temporal mobility networks. Identification and early detection of such abnormal developments is of critical importance for transportation planning and security. Anomaly detection from high dimensional network data is a challenging task as edge level measurements often have low values and high variance resulting in high noise-to-signal ratio. In this study, we propose a generic three-phase pipeline approach to tackle the curse of dimensionality and noisiness of the original data. Our pipeline consists of i) initial network aggregation leveraging community detection ii) unsupervised dimensionality reduction iii) clustering of the resulting representations for outlier detection. We perform extensive experiments to evaluate the proposed approach on mobility data collected from two major cities, New York City and Taipei. Our results empirically prove that the proposed methodology outperforms traditional approaches for anomaly detection. We further argue that the proposed anomaly detection framework is potentially generalizable to various other types of temporal networks e.g. social interactions, information propagation and epidemic spread.

Submitted 25 November, 2019; originally announced December 2019.

Comments: 12 pages, 3 figures

arXiv:1808.06914 [pdf]

Segmentation of Microscopy Data for finding Nuclei in Divergent Images

Authors: Shivam Singh, Stuti Pathak

Abstract: Every year millions of people die of cancer. Due to its invasive nature, it is very complex to cure even in its primary stages. Hence, the only method to survive this disease completely is via forecasting by analyzing the early mutation in cells of the patient biopsy. Cell segmentation can be used to find cells which have left their nuclei. This enables a faster cure and a high rate of survival. Cell counting is a hard, yet tedious task that would greatly benefit from automation. To accomplish this task, segmentation of cells needs to be accurate. In this paper, we have improved the learning of training data by our network. It can annotate precise masks on test data. We examine the strength of activation functions in the medical image segmentation task by improving learning rates with our proposed Carving Technique. Identifying the cells' nuclei is the starting point for most analyses: identifying nuclei allows researchers to identify each individual cell in a sample, and by measuring how cells react to various treatments, the researcher can understand the underlying biological processes at work. Experimental results show the efficiency of the proposed work.

Submitted 22 August, 2018; v1 submitted 19 August, 2018; originally announced August 2018.

Comments: 7 pages, 7 figures, 1 table. arXiv admin note: text overlap with arXiv:1807.04459, arXiv:1802.10548, arXiv:1807.10165 by other authors

arXiv:1710.01420 [pdf, other]

Usable & Scalable Learning Over Relational Data With Automatic Language Bias

Authors: Jose Picado, Arash Termehchy, Sudhanshu Pathak, Alan Fern, Praveen Ilango, Yunqiao Cai

Abstract: Relational databases are valuable resources for learning novel and interesting relations and concepts. In order to constrain the search through the large space of candidate definitions, users must tune the algorithm by specifying a language bias. Unfortunately, specifying the language bias is done via trial and error and is guided by the expert's intuitions. We propose AutoBias, a system that leverages information in the schema and content of the database to automatically induce the language bias used by popular relational learning systems. We show that AutoBias delivers the same accuracy as using manually-written language bias by imposing only a slight overhead on the running time of the learning algorithm.

Submitted 6 April, 2020; v1 submitted 3 October, 2017; originally announced October 2017.

arXiv:1609.05670 [pdf, ps, other]

Load-aware Performance Analysis of Cell Center/Edge Users in Random HetNets

Authors: Praful D. Mankar, Goutam Das, S. S. Pathak

Abstract: For real-time traffic, the link quality and call blocking probability (both derived from coverage probability) are realized to be poor for cell edge users (CEUs) compared to cell center users (CCUs), as the signal reception in the cell center region is better compared to the cell edge region. In heterogeneous networks (HetNets), the uncoordinated channel access by different types of base stations determines the interference statistics that further arbitrate the coverage probability. Thus, the spectrum allocation techniques have a major impact on the performance of CCUs and CEUs. In this paper, the performance of CCUs and CEUs in a random two-tier network is studied for two spectrum allocation techniques, namely: 1) co-channel (CSA), and 2) shared (SSA). For performance analysis, the widely accepted conception of modeling the tiers of a HetNet using independent homogeneous Poisson point processes (PPPs) is considered to accommodate the spatial randomness in the location of BSs. To incorporate the spatial randomness in the arrival of service and to aid the load-aware analysis, the cellular traffic is modeled using a spatio-temporal PPP. Under this scenario, we have developed an analytical framework to evaluate the load-aware performance, including coverage and blocking probabilities, of CCUs and CEUs under both spectrum allocation techniques. Further, we provide insight into the achievable area energy efficiency for SSA and CSA. The developed analytical framework is validated through extensive simulations. Next, we demonstrate the impact of traffic load and femto access point density on the performance of CCUs/CEUs under CSA and SSA.

Submitted 9 March, 2017; v1 submitted 19 September, 2016; originally announced September 2016.

Comments: 13 pages and 11 figures. This paper is submitted to IEEE Transaction on Vehicular Technology

arXiv:1609.05656 [pdf, ps, other]

Coverage Analysis of Two-Tier HetNets for Co-Channel, Orthogonal, and Partial Spectrum Sharing under Fractional Load Conditions

Authors: Praful D. Mankar, Goutam Das, S. S. Pathak

Abstract: In heterogeneous networks, the random deployment of femto access points (FAPs) and macro base stations (MBSs) with uncoordinated channel access imposes huge inter-tier interference. In real-life networks, the process of MBS deployment exhibits homogeneity; however, the FAPs have the behavioral characteristic of cluster formation, as in malls, apartments, offices, etc. Therefore, composite modeling of the MBSs and the FAPs using a Poisson point process and a Poisson cluster process is employed for the evaluation of coverage probability. The scenario of real-time traffic for the macro-tier and best-effort traffic for the femto-tier is considered. Cognition is introduced in the clustered FAPs to control the inter-tier interference. Furthermore, the impact of macro-tier load is analyzed by exploiting the inherent coupling between coverage probability and the activity factor of an MBS. Further, we study the effect of co-channel, orthogonal, and partial spectrum sharing modes on the coverage for given parameters like load condition, FAP/MBS density, etc. We provide simulation validation for the derived expressions of coverage and present a comparative analysis for the mentioned spectrum sharing modes.

Submitted 18 February, 2017; v1 submitted 19 September, 2016; originally announced September 2016.

Comments: 14 pages and 10 figures. Submitted to IEEE Transaction on Vehicular Technology

arXiv:1606.05124 [pdf, other]

Robust Active Perception via Data-association aware Belief Space planning

Authors: Shashank Pathak, Antony Thomas, Asaf Feniger, Vadim Indelman

Abstract: We develop a belief space planning (BSP) approach that advances the state of the art by incorporating reasoning about data association (DA) within planning, while considering additional sources of uncertainty. Existing BSP approaches typically assume data association is given and perfect, an assumption that can be harder to justify while operating, in the presence of localization uncertainty, in ambiguous and perceptually aliased environments. In contrast, our data association aware belief space planning (DA-BSP) approach explicitly reasons about DA within belief evolution, and as such can better accommodate these challenging real world scenarios. In particular, we show that due to perceptual aliasing, the posterior belief becomes a mixture of probability distribution functions, and design cost functions that measure the expected level of ambiguity and posterior uncertainty. Using these and standard costs (e.g., control penalty, distance to goal) within the objective function yields a general framework that reliably represents action impact and, in particular, is capable of active disambiguation. Our approach is thus applicable to robust active perception and autonomous navigation in perceptually aliased environments. We demonstrate key aspects in basic and realistic simulations.

Submitted 16 June, 2016; originally announced June 2016.

ACM Class: I.2.9; I.2.10; G.3

arXiv:1602.03273 [pdf, ps, other]

YTrace: End-to-end Performance Diagnosis in Large Cloud and Content Providers

Authors: Partha Kanuparthy, Yuchen Dai, Sudhir Pathak, Sambit Samal, Theophilus Benson, Mojgan Ghasemi, P. P. S. Narayan

Abstract: Content providers build serving stacks to deliver content to users. An important goal of a content provider is to ensure good user experience, since user experience has an impact on revenue. In this paper, we describe a system at Yahoo called YTrace that diagnoses bad user experience in near real time. We present the different components of YTrace for end-to-end multi-layer diagnosis (instrumentation, methods and backend system), and the system architecture for delivering diagnosis in near real time across all user sessions at Yahoo. YTrace diagnoses problems across service and network layers in the end-to-end path spanning user host, Internet, CDN and the datacenters, and has three diagnosis goals: detection, localization and root cause analysis (including cascading problems) of performance problems in user sessions with the cloud. The key component of the methods in YTrace is capturing and discovering causality, which we design based on a mix of instrumentation API, domain knowledge and blackbox methods. We show three case studies from production that span a large-scale distributed storage system, a datacenter-wide network, and an end-to-end video serving stack at Yahoo. We end by listing a number of open directions for performance diagnosis in cloud and content providers.

Submitted 25 May, 2016; v1 submitted 10 February, 2016; originally announced February 2016.

ACM Class: B.8.2; C.2.4; C.4

arXiv:1504.08367 [pdf, ps, other]

Another look in the Analysis of Cooperative Spectrum Sensing over Nakagami-$m$ Fading Channels

Authors: Debasish Bera, Sant S. Pathak, Indrajit Chakrabarty, George K. Karagiannidis

Abstract: Modeling and analysis of cooperative spectrum sensing is an important aspect in cognitive radio systems. In this paper, the problem of energy detection (ED) of an unknown signal over Nakagami-$m$ fading is revisited. Specifically, an analytical expression for the local probability of detection is derived, while using the approach of ED at the individual secondary user (SU), a new fusion rule, based on the likelihood ratio test, is presented. The channels between the primary user and the SUs and between the SUs and the fusion center are considered to be independent Nakagami-$m$. The proposed fusion rule uses the channel statistics, instead of the instantaneous channel state information, and is based on the Neyman-Pearson criteria. Closed-form solutions for the system-level probability of detection and probability of false alarm are also derived. Furthermore, a closed-form expression for the optimal number of cooperative SUs, needed to minimize the total error rate, is presented. The usefulness of factor graph and sum-product-algorithm models for computing likelihoods is also discussed to highlight their advantage in terms of computational cost. The performance of the proposed schemes has been evaluated both by analysis and simulations. Results show that the proposed rules perform well over a wide range of the signal-to-noise ratio.

Submitted 28 April, 2015; originally announced April 2015.

Comments: 29 pages, 9 figures

arXiv:1407.5609 [pdf, ps, other]

Efficient Algorithms for the Closest Pair Problem and Applications

Authors: Sanguthevar Rajasekaran, Sudipta Pathak

Abstract: The closest pair problem (CPP) is one of the well studied and fundamental problems in computing. Given a set of points in a metric space, the problem is to identify the pair of closest points. Another closely related problem is the fixed radius nearest neighbors problem (FRNNP). Given a set of points and a radius $R$, the problem is, for every input point $p$, to identify all the other input points that are within a distance of $R$ from $p$. A naive deterministic algorithm can solve these problems in quadratic time. CPP as well as FRNNP play a vital role in computational biology, computational finance, share market analysis, weather prediction, entomology, electro cardiograph, N-body simulations, molecular simulations, etc. As a result, any improvements made in solving CPP and FRNNP will have immediate implications for the solution of numerous problems in these domains. We live in an era of big data, and processing these data takes large amounts of time. Speeding up data processing algorithms is thus much more essential now than ever before. In this paper we present algorithms for CPP and FRNNP that improve (in theory and/or practice) the best-known algorithms reported in the literature for CPP and FRNNP. These algorithms also improve the best-known algorithms for related applications including time series motif mining and the two locus problem in Genome Wide Association Studies (GWAS).

Submitted 21 July, 2014; originally announced July 2014.
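
For orientation, the naive quadratic-time baseline mentioned in the abstract above can be sketched as follows. This is only an illustrative sketch of the brute-force approach, not the algorithms proposed in the paper; the function names `closest_pair` and `fixed_radius_neighbors` are hypothetical, points are assumed to be coordinate tuples under the Euclidean metric, and "within a distance of R" is assumed to be inclusive.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available in Python 3.8+


def closest_pair(points):
    """Brute-force CPP: compare all pairs, O(n^2) distance evaluations."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    # Pick the pair (p, q) with the smallest Euclidean distance.
    return min(combinations(points, 2), key=lambda pq: dist(*pq))


def fixed_radius_neighbors(points, radius):
    """Brute-force FRNNP: for each point index, list the indices of all
    other points within `radius` (inclusive, by assumption)."""
    neighbors = {i: [] for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= radius:
            neighbors[i].append(j)
            neighbors[j].append(i)
    return neighbors


points = [(0, 0), (5, 5), (1, 0), (9, 9)]
print(closest_pair(points))
print(fixed_radius_neighbors(points, 1.5))
```

Both routines examine every pair of points once, which is where the quadratic running time comes from.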

arXiv:1401.0898 [pdf]

Feature Selection Using Classifier in High Dimensional Data

Authors: Vijendra Singh, Shivani Pathak

Abstract: Feature selection is frequently used as a pre-processing step to machine learning. It is a process of choosing a subset of original features so that the feature space is optimally reduced according to a certain evaluation criterion. The central objective of this paper is to reduce the dimension of the data by finding a small set of important features which can give good classification performance. We have applied filter and wrapper approach with different classifiers QDA and LDA respectively. A widely-used filter method is used for bioinformatics data i.e. a univariate criterion separately on each feature, assuming that there is no interaction between features and then applied Sequential Feature Selection method. Experimental results show that filter approach gives better performance in respect of Misclassification Error Rate.

Submitted 5 January, 2014; originally announced January 2014.

arXiv:1306.2425 [pdf]

Ber Performance Analysis of WiMAX PHY Layer under different channel conditions

Authors: Shantanu Pathak, Ranjani S

Abstract: This paper gives an introduction to the IEEE 802.16 standard, WiMAX or Worldwide Interoperability for Microwave Access. The different parts give details on the architectural specifications of WiMAX networks and also on the working principle of WiMAX networks, including the services provided. It also provides brief descriptions of the salient features of this technology and how it benefits the networking industry. A brief outline of the basic building blocks or equipment of the WiMAX architecture is also provided. This paper also evaluates the simulation performance of the IEEE 802.16 OFDM PHY layer. The Stanford University Interim (SUI) channel model under varying parameters is selected for the wireless channel in the simulation. The performance measurements and analysis were done in a simulation developed in MATLAB.

Submitted 11 June, 2013; originally announced June 2013.

Comments: 19 pages, 12 figures

Journal ref: International Journal of Information Sciences and Techniques (IJIST) Vol.3, No.3, May 2013

arXiv:1305.2836 [pdf]

Infrastructure to Vehicle Real Time Secured Communication

Authors: Smita Pathak, Urmila Shrawankar

Abstract: Among civilian communication systems, vehicular networks emerge as one of the most convincing and yet most challenging instantiations of the mobile ad hoc networking technology. Towards the deployment of vehicular communication systems, security and privacy are critical factors and significant challenges to be met. The vehicular communication (VC) system has the potential to improve road safety and driving comfort. Nevertheless, securing the operation is a prerequisite for deployment, so in this paper we focus on the real time experimental design of infrastructure to vehicle communication. We outline how VANET will be a better option than GPS technology. We also discuss IP address passing using DHCP in the network and the security issues.

Submitted 10 May, 2013; originally announced May 2013.

Comments: Pages: 05 Figures: 04, Proceedings of International Symposium on Computing, Communication, and Control, ISBN 978-1-84626, 2009. arXiv admin note: text overlap with arXiv:0912.5391 by other authors

arXiv:1111.2160 [pdf]

Adaptive Subcarrier and Bit Allocation for Downlink OFDMA System with Proportional Fairness

Authors: Sudhir B. Lande, J. B. Helonde, Rajesh Pande, S. S. Pathak

Abstract: This paper investigates the adaptive subcarrier and bit allocation algorithm for OFDMA systems. To minimize overall transmitted power, we propose a novel adaptive subcarrier and bit allocation algorithm based on channel state information (CSI) and quality state information (QSI). A suboptimal approach that separately performs subcarrier allocation and bit loading is proposed. It is shown that a near optimal solution is obtained by the proposed algorithm, which has low complexity compared to that of other conventional algorithms. We study the problem of finding an optimal subcarrier and power allocation strategy for downlink communication to multiple users in an OFDMA based wireless system. Assuming knowledge of the instantaneous channel gains for all users, we propose a multiuser OFDMA subcarrier and bit allocation algorithm to minimize the total transmit power. This is done by assigning each user a set of subcarriers and by determining the number of bits and the transmit power level for each subcarrier. The objective is to minimize the total transmitted power over the entire network to satisfy the application layer and physical layer. We formulate this problem as a constrained optimization problem and present centralized algorithms. The simulation results show that our approach results in an efficient assignment of subcarriers and transmitter power levels in terms of the energy required for transmitting each bit of information. To address this need, we also present a bit loading algorithm for allocating subcarriers and bits in order to satisfy the rate requirements of the links.

Submitted 9 November, 2011; originally announced November 2011.

Showing 1–44 of 44 results for author: Pathak, S