Search | arXiv e-print repository

Detection of Sleep Oxygen Desaturations from Electroencephalogram Signals

Authors: Shashank Manjunath, Aarti Sathyanarayana

Abstract: In this work, we leverage machine learning techniques to identify potential biomarkers of oxygen desaturation during sleep exclusively from electroencephalogram (EEG) signals in pediatric patients with sleep apnea. Development of a machine learning technique which can successfully identify EEG signals from patients with sleep apnea as well as identify latent EEG signals which come from subjects who experience oxygen desaturations but do not themselves occur during oxygen desaturation events would provide a strong step towards developing a brain-based biomarker for sleep apnea in order to aid with easier diagnosis of this disease. We leverage a large corpus of data, and show that machine learning enables us to classify EEG signals as occurring during oxygen desaturations or not occurring during oxygen desaturations with an average 66.8% balanced accuracy. We furthermore investigate the ability of machine learning models to identify subjects who experience oxygen desaturations from EEG data that does not occur during oxygen desaturations. We conclude that there is a potential biomarker for oxygen desaturation in EEG data.

Submitted 8 May, 2024; originally announced May 2024.

Comments: 4 Pages
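
The 66.8% figure above is a balanced accuracy, which for two classes is the mean of sensitivity and specificity. The following is a minimal sketch of that standard metric, not the authors' evaluation code; the function name and the toy labels (1 = during a desaturation, 0 = not) are assumptions made for illustration.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall for binary labels (hypothetical: 1 = desaturation, 0 = none)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must appear in y_true")
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    return (sensitivity + specificity) / 2


# Toy, made-up labels: a classifier that always predicts "no desaturation"
# reaches 80% plain accuracy here, but only 50% balanced accuracy.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10
score = balanced_accuracy(y_true, y_pred)
```

Because desaturation events are typically much rarer than normal segments, balanced accuracy is a more informative summary than plain accuracy for this kind of task.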

arXiv:2405.00808 [pdf, other]

ReeSPOT: Reeb Graph Models Semantic Patterns of Normalcy in Human Trajectories

Authors: Bowen Zhang, S. Shailja, Chandrakanth Gudavalli, Connor Levenson, Amil Khan, B. S. Manjunath

Abstract: This paper introduces ReeSPOT, a novel Reeb graph-based method to model patterns of life in human trajectories (akin to a fingerprint). Human behavior typically follows a pattern of normalcy in day-to-day activities. This is marked by recurring activities within specific time periods. In this paper, we model this behavior using Reeb graphs where any deviation from usual day-to-day activities is encoded as nodes in the Reeb graph. The complexity of the proposed algorithm is linear with respect to the number of time points in a given trajectory. We demonstrate the usage of ReeSPOT and how it captures the critically significant spatial and temporal deviations using the nodes of the Reeb graph. Our case study presented in this paper includes realistic human movement scenarios: visiting uncommon locations, taking odd routes at infrequent times, uncommon time visits, and uncommon stay durations. We analyze the Reeb graph to interpret the topological structure of the GPS trajectories. Potential applications of ReeSPOT include urban planning, security surveillance, and behavioral research.

Submitted 13 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

arXiv:2401.13006 [pdf, other]

CIMGEN: Controlled Image Manipulation by Finetuning Pretrained Generative Models on Limited Data

Authors: Chandrakanth Gudavalli, Erik Rosten, Lakshmanan Nataraj, Shivkumar Chandrasekaran, B. S. Manjunath

Abstract: Content creation and image editing can benefit from flexible user controls. A common intermediate representation for conditional image generation is a semantic map, which encodes the objects present in the image. Compared to raw RGB pixels, a semantic map is much easier to modify: one can selectively insert, remove, or replace objects in the map. The method proposed in this paper takes in the modified semantic map and alters the original image in accordance with the modified map. The method leverages traditional pre-trained image-to-image translation GANs, such as CycleGAN or Pix2Pix GAN, that are fine-tuned on a limited dataset of reference images associated with the semantic maps. We discuss the qualitative and quantitative performance of our technique to illustrate its capacity and possible applications in the fields of image forgery and image editing. We also demonstrate the effectiveness of the proposed image forgery technique in thwarting the numerous deep learning-based image forensic techniques, highlighting the urgent need to develop robust and generalizable image forensic tools in the fight against the spread of fake media.

Submitted 23 January, 2024; originally announced January 2024.

arXiv:2312.07383 [pdf, ps, other]

Delay analysis of the IEEE 802.11bd EDCA with repetitions

Authors: Aditya Sarkar, Sreelakshmi Manjunath

Abstract: We analyse the performance of the IEEE 802.11bd MAC protocol, with Enhanced Distributed Channel Access (EDCA) and repeated transmissions, in terms of the MAC access delay of packets pertaining to safety-related events. We outline Markov chain models for the contention mechanism of priority-based access categories, and derive the associated steady-state probabilities. Using these probabilities, we characterise the delay experienced by the packet in the MAC layer. Further, we characterise the reliability of the protocol in terms of the likelihood that a packet is delivered within a critical time interval. Numerical computations are conducted to understand the impact of various system parameters on the MAC access delay. The analysis indicates that the MAC access delay depends on various system parameters, some of which are influenced by the traffic scenario and nature of safety-critical events. Motivated by this, we used our analysis to study the delay and reliability of the 802.11bd MAC protocol specific to the context of platooning of connected vehicles subject to interruptions by human-driven motorised two wheelers. We observe that while the delay performance of the protocol is as per the QoS requirements of the standard, the protocol may not be reliable for this specific application. Our study suggests that it is desirable to co-design vehicular communication protocols with prevalent safety-related traffic applications.

Submitted 12 December, 2023; originally announced December 2023.
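
The abstract above relies on steady-state probabilities of Markov chain models. As a generic illustration of that step only, the sketch below computes the stationary distribution of a small discrete-time chain by power iteration. The 3-state transition matrix is a made-up toy, not the EDCA contention model from the paper, and the function name is hypothetical; the method assumes the chain is irreducible and aperiodic.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Power iteration for pi satisfying pi = pi P, with P a row-stochastic matrix."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # One step of the chain: nxt[j] = sum_i pi[i] * P[i][j]
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("power iteration did not converge")


# Hypothetical 3-state chain (rows sum to 1); it is not taken from the paper.
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
pi = stationary_distribution(P)
```

For this toy chain the stationary distribution is (0.25, 0.5, 0.25), which can be checked by hand against pi = pi P. Quantities such as a mean delay would then be computed as expectations under pi.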

arXiv:2310.10879 [pdf, other]

BLoad: Enhancing Neural Network Training with Efficient Sequential Data Handling

Authors: Raphael Ruschel, A. S. M. Iftekhar, B. S. Manjunath, Suya You

Abstract: The increasing complexity of modern deep neural network models and the expanding sizes of datasets necessitate the development of optimized and scalable training methods. In this white paper, we address the challenge of efficiently training neural network models using sequences of varying sizes. To address this challenge, we propose a novel training scheme that enables efficient distributed data-parallel training on sequences of different sizes with minimal overhead. By using this scheme we were able to reduce the padding amount by more than 100x while not deleting a single frame, resulting in an overall increased performance on both training time and Recall in our experiments.

Submitted 25 April, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

arXiv:2306.16672 [pdf, ps, other]

Towards understanding the performance of IEEE 802.11p MAC in heterogeneous traffic conditions

Authors: MS Gayathree, Sreelakshmi Manjunath

Abstract: Motivated by the need to study the performance of vehicular communication protocols as applicable to heterogeneous traffic conditions, we study the performance of the IEEE 802.11p medium access protocol under such a traffic setup. We consider a setup comprising connected vehicles and human-driven Motorised Two Wheelers (MTWs), where the connected vehicles are required to move as a platoon with a desired constant headway despite interruptions from the two wheelers. We invoke specific mobility models for the movement of the vehicles--car-following models for connected vehicle platoons and a gap-acceptance model to capture the movement of the MTWs--and use them to configure (i) the traffic setup and (ii) the rate at which data packets related to safety-critical messages need to be transmitted. A control-theoretic analysis of the car-following models yields a bound on the admissible communication delay to ensure non-oscillatory convergence of the platoon headway. We then use suitable Markov chain models to derive the distribution of the MAC access delay experienced by packets pertaining to safety-critical events as well as routine safety messages. The distribution along with the bound on the admissible delay enables us to derive the reliability of the 802.11p MAC protocol in terms of traffic and EDCA parameters. Our study highlights the need for redesign of MAC protocols for vehicular communications for safety-critical applications in heterogeneous conditions.

Submitted 12 December, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

Comments: 15 pages

arXiv:2304.14853 [pdf, other]

doi 10.1109/EMBC40787.2023.10340674

Topological Data Analysis of Electroencephalogram Signals for Pediatric Obstructive Sleep Apnea

Authors: Shashank Manjunath, Jose A. Perea, Aarti Sathyanarayana

Abstract: Topological data analysis (TDA) is an emerging technique for biological signal processing. TDA leverages the invariant topological features of signals in a metric space for robust analysis of signals even in the presence of noise. In this paper, we leverage TDA on brain connectivity networks derived from electroencephalogram (EEG) signals to identify statistical differences between pediatric patients with obstructive sleep apnea (OSA) and pediatric patients without OSA. We leverage a large corpus of data, and show that TDA enables us to see a statistical difference between the brain dynamics of the two groups.

Submitted 28 April, 2023; originally announced April 2023.

arXiv:2304.02767 [pdf, other]

MethaneMapper: Spectral Absorption aware Hyperspectral Transformer for Methane Detection

Authors: Satish Kumar, Ivan Arevalo, ASM Iftekhar, B S Manjunath

Abstract: Methane (CH$_4$) is the chief contributor to global climate change. The recent Airborne Visible-Infrared Imaging Spectrometer-Next Generation (AVIRIS-NG) has been very useful in quantitative mapping of methane emissions. Existing methods for analyzing this data are sensitive to local terrain conditions, often require manual inspection from domain experts, are prone to significant error, and hence are not scalable. To address these challenges, we propose a novel end-to-end spectral absorption wavelength aware transformer network, MethaneMapper, to detect and quantify the emissions. MethaneMapper introduces two novel modules that help to locate the most relevant methane plume regions in the spectral domain and uses them to localize these accurately. Thorough evaluation shows that MethaneMapper achieves 0.63 mAP in detection and reduces the model size (by 5x) compared to the current state of the art. In addition, we also introduce a large-scale dataset of methane plume segmentation masks for over 1200 AVIRIS-NG flight lines from 2015-2022. It contains over 4000 methane plume sites. Our dataset will provide researchers the opportunity to develop and advance new methods for tackling this challenging green-house gas detection problem with significant broader social impact. Dataset and source code are public.

Submitted 5 April, 2023; originally announced April 2023.

Comments: 16 pages, 9 figures, 3 tables, Accepted to Computer Vision and Pattern Recognition (CVPR 2023) conference. CVPR 2023 Highlights paper

ACM Class: I.2; I.3; I.5; I.7; I.m

arXiv:2303.10722 [pdf, other]

Q-RBSA: High-Resolution 3D EBSD Map Generation Using An Efficient Quaternion Transformer Network

Authors: Devendra K. Jangid, Neal R. Brodnik, McLean P. Echlin, Tresa M. Pollock, Samantha H. Daly, B. S. Manjunath

Abstract: Gathering 3D material microstructural information is time-consuming, expensive, and energy-intensive. Acquisition of 3D data has been accelerated by developments in serial sectioning instrument capabilities; however, for crystallographic information, the electron backscatter diffraction (EBSD) imaging modality remains rate limiting. We propose a physics-based efficient deep learning framework to reduce the time and cost of collecting 3D EBSD maps. Our framework uses a quaternion residual block self-attention network (QRBSA) to generate high-resolution 3D EBSD maps from sparsely sectioned EBSD maps. In QRBSA, quaternion-valued convolution effectively learns local relations in orientation space, while self-attention in the quaternion domain captures long-range correlations. We apply our framework to 3D data collected from commercially relevant titanium alloys, showing both qualitatively and quantitatively that our method can predict missing samples (EBSD information between sparsely sectioned mapping points) as compared to high-resolution ground truth 3D EBSD maps.

Submitted 19 March, 2023; originally announced March 2023.

arXiv:2301.07666 [pdf, other]

DDS: Decoupled Dynamic Scene-Graph Generation Network

Authors: A S M Iftekhar, Raphael Ruschel, Satish Kumar, Suya You, B. S. Manjunath

Abstract: Scene-graph generation involves creating a structural representation of the relationships between objects in a scene by predicting subject-object-relation triplets from input data. However, existing methods show poor performance in detecting triplets outside of a predefined set, primarily due to their reliance on dependent feature learning. To address this issue we propose DDS -- a decoupled dynamic scene-graph generation network -- that consists of two independent branches that can disentangle extracted features. The key innovation of the current paper is the decoupling of the features representing the relationships from those of the objects, which enables the detection of novel object-relationship combinations. The DDS model is evaluated on three datasets and outperforms previous methods by a significant margin, especially in detecting previously unseen triplets.

Submitted 18 January, 2023; originally announced January 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2212.07044 [pdf, other]

3D Neuron Morphology Analysis

Authors: Jiaxiang Jiang, Michael Goebel, Cezar Borba, William Smith, B. S. Manjunath

Abstract: We consider the problem of finding an accurate representation of neuron shapes, extracting sub-cellular features, and classifying neurons based on neuron shapes. In neuroscience research, the skeleton representation is often used as a compact and abstract representation of neuron shapes. However, existing methods are limited to obtaining and analyzing "curve" skeletons, which can only be applied to tubular shapes. This paper presents a 3D neuron morphology analysis method for more general and complex neuron shapes. First, we introduce the concept of a skeleton mesh to represent general neuron shapes and propose a novel method for computing mesh representations from 3D surface point clouds. A skeleton graph is then obtained from the skeleton mesh and is used to extract sub-cellular features. Finally, an unsupervised learning method is used to embed the skeleton graph for neuron classification. Extensive experimental results are provided and demonstrate the robustness of our method for analyzing neuron morphology.

Submitted 14 December, 2022; originally announced December 2022.

arXiv:2211.09363 [pdf, other]

Generalizable Deepfake Detection with Phase-Based Motion Analysis

Authors: Ekta Prashnani, Michael Goebel, B. S. Manjunath

Abstract: We propose PhaseForensics, a DeepFake (DF) video detection method that leverages a phase-based motion representation of facial temporal dynamics. Existing methods relying on temporal inconsistencies for DF detection present many advantages over the typical frame-based methods. However, they still show limited cross-dataset generalization and robustness to common distortions. These shortcomings are partially due to error-prone motion estimation and landmark tracking, or the susceptibility of the pixel intensity-based features to spatial distortions and the cross-dataset domain shifts. Our key insight to overcome these issues is to leverage the temporal phase variations in the band-pass components of the Complex Steerable Pyramid on face sub-regions. This not only enables a robust estimate of the temporal dynamics in these regions, but is also less prone to cross-dataset variations. Furthermore, the band-pass filters used to compute the local per-frame phase form an effective defense against the perturbations commonly seen in gradient-based adversarial attacks. Overall, with PhaseForensics, we show improved distortion and adversarial robustness, and state-of-the-art cross-dataset generalization, with 91.2% video-level AUC on the challenging CelebDFv2 (a recent state-of-the-art compares at 86.9%).

Submitted 17 November, 2022; originally announced November 2022.

arXiv:2211.08479 [pdf, other]

Context-Matched Collage Generation for Underwater Invertebrate Detection

Authors: R. Austin McEver, Bowen Zhang, B. S. Manjunath

Abstract: The quality and size of training sets often limit the performance of many state of the art object detectors. However, in many scenarios, it can be difficult to collect images for training, not to mention the costs associated with collecting annotations suitable for training these object detectors. For these reasons, on challenging video datasets such as the Dataset for Underwater Substrate and Invertebrate Analysis (DUSIA), budgets may only allow for collecting and providing partial annotations. To aid in the challenges associated with training with limited and partial annotations, we introduce Context Matched Collages, which leverage explicit context labels to combine unused background examples with existing annotated data to synthesize additional training samples that ultimately improve object detection performance. By combining a set of our generated collage images with the original training set, we see improved performance using three different object detectors on DUSIA, ultimately achieving state of the art object detection performance on the dataset.

Submitted 15 November, 2022; originally announced November 2022.

arXiv:2211.02696 [pdf, other]

MalGrid: Visualization Of Binary Features In Large Malware Corpora

Authors: Tajuddin Manhar Mohammed, Lakshmanan Nataraj, Satish Chikkagoudar, Shivkumar Chandrasekaran, B. S. Manjunath

Abstract: The number of malware is constantly on the rise. Though most new malware are modifications of existing ones, their sheer number is quite overwhelming. In this paper, we present a novel system to visualize and map millions of malware to points in a 2-dimensional (2D) spatial grid. This enables visualizing relationships within large malware datasets that can be used to develop triage solutions to screen different malware rapidly and provide situational awareness. Our approach links two visualizations within an interactive display. Our first view is a spatial point-based visualization of similarity among the samples based on a reduced dimensional projection of binary feature representations of malware. Our second spatial grid-based view provides a better insight into similarities and differences between selected malware samples in terms of the binary-based visual representations they share. We also provide a case study where the effect of packing on the malware data is correlated with the complexity of the packing algorithm.

Submitted 4 November, 2022; originally announced November 2022.

Comments: Submitted version - MILCOM 2022 IEEE Military Communications Conference. The high-quality images in this paper can be found on Github (https://github.com/Mayachitra-Inc/MalGrid)

arXiv:2210.03780 [pdf, other]

LOCL: Learning Object-Attribute Composition using Localization

Authors: Satish Kumar, ASM Iftekhar, Ekta Prashnani, B. S. Manjunath

Abstract: This paper describes LOCL (Learning Object Attribute Composition using Localization), which generalizes composition zero shot learning to objects in cluttered and more realistic settings. The problem of unseen Object Attribute (OA) associations has been well studied in the field; however, the performance of existing methods is limited in challenging scenes. In this context, our key contribution is a modular approach to localizing objects and attributes of interest in a weakly supervised context that generalizes robustly to unseen configurations. Localization coupled with a composition classifier significantly outperforms state of the art (SOTA) methods, with an improvement of about 12% on currently available challenging datasets. Further, the modularity enables the localized feature extractor to be used with existing OA compositional learning methods to improve their overall performance.

Submitted 7 October, 2022; originally announced October 2022.

Comments: 20 pages, 7 figures, 11 tables, Accepted in British Machine Vision Conference 2022

ACM Class: I.2; I.4; I.5; I.7; I.m

arXiv:2208.07997 [pdf, other]

Deep Learning Enabled Time-Lapse 3D Cell Analysis

Authors: Jiaxiang Jiang, Amil Khan, S. Shailja, Samuel A. Belteton, Michael Goebel, Daniel B. Szymanski, B. S. Manjunath

Abstract: This paper presents a method for time-lapse 3D cell analysis. Specifically, we consider the problem of accurately localizing and quantitatively analyzing sub-cellular features, and of tracking individual cells from time-lapse 3D confocal cell image stacks. The heterogeneity of cells and the volume of multi-dimensional images present a major challenge for fully automated analysis of morphogenesis and development of cells. This paper is motivated by the pavement cell growth process and by the goal of building a quantitative morphogenesis model. We propose a deep feature based segmentation method to accurately detect and label each cell region. An adjacency graph based method is used to extract sub-cellular features of the segmented cells. Finally, a robust graph based tracking algorithm using multiple cell features is proposed for associating cells at different time instances. Extensive experimental results are provided and demonstrate the robustness of the proposed method. The code is available on Github and the method is available as a service through the BisQue portal.

Submitted 16 August, 2022; originally announced August 2022.

arXiv:2206.08396 [pdf, other]

User Customizable and Robust Geo-Indistinguishability for Location Privacy

Authors: Primal Pappachan, Chenxi Qiu, Anna Squicciarini, Vishnu Sharma Hunsur Manjunath

Abstract: Location obfuscation functions generated by existing systems for ensuring location privacy are monolithic and do not allow users to customize their obfuscation range. This can lead to the user being mapped in undesirable locations (e.g., shady neighborhoods) to the location-requesting services. Modifying the obfuscation function generated by a centralized server on the user side can result in poor privacy as the original function is not robust against such updates. Users themselves might find it challenging to understand the parameters involved in obfuscation mechanisms (e.g., obfuscation range and granularity of location representation) and therefore struggle to set realistic trade-offs between privacy, utility, and customization. In this paper, we propose a new framework called, CORGI, i.e., CustOmizable Robust Geo-Indistinguishability, which generates location obfuscation functions that are robust against user customization while providing strong privacy guarantees based on the Geo-Indistinguishability paradigm. CORGI utilizes a tree representation of a given region to assist users in specifying their privacy and customization requirements. The server side of CORGI takes these requirements as inputs and generates an obfuscation function that satisfies Geo-Indistinguishability requirements and is robust against customization on the user side. The obfuscation function is returned to the user who can then choose to update the obfuscation function (e.g., obfuscation range, granularity of location representation).
The experimental results on a real dataset demonstrate that CORGI can efficiently generate obfuscation matrices that are more robust to the customization by users.

Submitted 1 October, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

Comments: Under review
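
The abstract above refers to obfuscation matrices that satisfy Geo-Indistinguishability. As commonly defined, a mechanism K satisfies epsilon-geo-indistinguishability if K[x][z] <= exp(epsilon * d(x, x')) * K[x'][z] for all true locations x, x' and every reported location z. The sketch below checks that inequality for a small discrete matrix. It is not CORGI's generation algorithm; the function name, the two locations, and the matrix values are hypothetical, and Euclidean distance is an assumption.

```python
import math


def satisfies_geo_ind(K, locations, epsilon, tol=1e-12):
    """Check K[x][z] <= exp(epsilon * d(x, x')) * K[x'][z] for all x, x', z.

    K[x][z] is the probability of reporting location z when the true location is x.
    """
    n = len(locations)
    for i in range(n):
        for j in range(n):
            # Allowed probability ratio grows with the distance between true locations.
            bound = math.exp(epsilon * math.dist(locations[i], locations[j]))
            for z in range(len(K[i])):
                if K[i][z] > bound * K[j][z] + tol:
                    return False
    return True


# Hypothetical example: two locations one distance unit apart.
locations = [(0.0, 0.0), (1.0, 0.0)]
K = [
    [0.6, 0.4],  # true location 0: report 0 w.p. 0.6, report 1 w.p. 0.4
    [0.4, 0.6],  # true location 1
]
ok_loose = satisfies_geo_ind(K, locations, epsilon=0.5)  # exp(0.5) ~ 1.65 >= 0.6/0.4
ok_tight = satisfies_geo_ind(K, locations, epsilon=0.3)  # exp(0.3) ~ 1.35 <  0.6/0.4
```

A check of this kind also illustrates the robustness concern raised in the abstract: if a user edits rows or columns of K after the server generates it, the inequality can fail for some pair of locations even though the original matrix passed.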

arXiv:2206.00718 [pdf, other]

doi 10.1007/s11263-023-01755-4

Context-Driven Detection of Invertebrate Species in Deep-Sea Video

Authors: R. Austin McEver, Bowen Zhang, Connor Levenson, A S M Iftekhar, B. S. Manjunath

Abstract: Each year, underwater remotely operated vehicles (ROVs) collect thousands of hours of video of unexplored ocean habitats revealing a plethora of information regarding biodiversity on Earth. However, fully utilizing this information remains a challenge as proper annotations and analysis require trained scientists' time, which is both limited and costly. To this end, we present a Dataset for Underwater Substrate and Invertebrate Analysis (DUSIA), a benchmark suite and growing large-scale dataset to train, validate, and test methods for temporally localizing four underwater substrates as well as temporally and spatially localizing 59 underwater invertebrate species. DUSIA currently includes over ten hours of footage across 25 videos captured in 1080p at 30 fps by an ROV following pre-planned transects across the ocean floor near the Channel Islands of California. Each video includes annotations indicating the start and end times of substrates across the video in addition to counts of species of interest. Some frames are annotated with precise bounding box locations for invertebrate species of interest, as seen in Figure 1. To our knowledge, DUSIA is the first dataset of its kind for deep sea exploration, with video from a moving camera, that includes substrate annotations and invertebrate species that are present at significant depths where sunlight does not penetrate.
Additionally, we present the novel context-driven object detector (CDD) where we use explicit substrate classification to influence an object detection network to simultaneously predict a substrate and species class influenced by that substrate. We also present a method for improving training on partially annotated bounding box frames. Finally, we offer a baseline method for automating the counting of invertebrate species of interest.

Submitted 1 June, 2022; originally announced June 2022.

Journal ref: International Journal of Computer Vision 2023

arXiv:2111.04710 [pdf, other]

OMD: Orthogonal Malware Detection Using Audio, Image, and Static Features

Authors: Lakshmanan Nataraj, Tajuddin Manhar Mohammed, Tejaswi Nanjundaswamy, Satish Chikkagoudar, Shivkumar Chandrasekaran, B. S. Manjunath

Abstract: With the growing number of malware and cyber attacks, there is a need for "orthogonal" cyber defense approaches, which are complementary to existing methods by detecting unique malware samples that are not predicted by other methods. In this paper, we propose a novel and orthogonal malware detection (OMD) approach to identify malware using a combination of audio descriptors, image similarity descriptors and other static/statistical features. First, we show how audio descriptors are effective in classifying malware families when the malware binaries are represented as audio signals. Then, we show that the predictions made on the audio descriptors are orthogonal to the predictions made on image similarity descriptors and other static features. Further, we develop a framework for error analysis and a metric to quantify how orthogonal a new feature set (or type) is with respect to other feature sets. This allows us to add new features and detection methods to our overall framework. Experimental results on malware datasets show that our approach provides a robust framework for orthogonal malware detection.

Submitted 8 November, 2021; originally announced November 2021.

Comments: Submitted version - MILCOM 2021 IEEE Military Communications Conference

arXiv:2111.04703 [pdf, other]

HAPSSA: Holistic Approach to PDF Malware Detection Using Signal and Statistical Analysis

Authors: Tajuddin Manhar Mohammed, Lakshmanan Nataraj, Satish Chikkagoudar, Shivkumar Chandrasekaran, B. S. Manjunath

Abstract: Malicious PDF documents present a serious threat to various security organizations that require modern threat intelligence platforms to effectively analyze and characterize the identity and behavior of PDF malware. State-of-the-art approaches use machine learning (ML) to learn features that characterize PDF malware. However, ML models are often susceptible to evasion attacks, in which an adversary obfuscates the malware code to avoid being detected by an Antivirus. In this paper, we derive a simple yet effective holistic approach to PDF malware detection that leverages signal and statistical analysis of malware binaries. This includes combining orthogonal feature space models from various static and dynamic malware detection methods to enable generalized robustness when faced with code obfuscations. Using a dataset of nearly 30,000 PDF files containing both malware and benign samples, we show that our holistic approach maintains a high detection rate (99.92%) of PDF malware and even detects new malicious files created by simple methods that remove the obfuscation conducted by malware authors to hide their malware, which are undetected by most antiviruses.

Submitted 8 November, 2021; originally announced November 2021.

Comments: Submitted version - MILCOM 2021 IEEE Military Communications Conference

arXiv:2109.14696 [pdf, other]

Time-Distributed Feature Learning in Network Traffic Classification for Internet of Things

Authors: Yoga Suhas Kuruba Manjunath, Sihao Zhao, Xiao-Ping Zhang

Abstract: The plethora of Internet of Things (IoT) devices leads to explosive network traffic. Network traffic classification (NTC) is an essential tool to explore behaviours of network flows, and NTC is required for Internet service providers (ISPs) to manage the performance of the IoT network. We propose a novel network data representation, treating the traffic data as a series of images. Thus, the network data is realized as a video stream to employ time-distributed (TD) feature learning. The intra-temporal information within the network statistical data is learned using convolutional neural networks (CNN) and long short-term memory (LSTM), and the inter pseudo-temporal feature among the flows is learned by a TD multi-layer perceptron (MLP). We conduct experiments using a large dataset with a larger number of classes. The experimental results show that TD feature learning elevates the network classification performance by 10%.

Submitted 29 September, 2021; originally announced September 2021.

arXiv:2109.10114 [pdf, other]

Virtual Reality Gaming on the Cloud: A Reality Check

Authors: Sihao Zhao, Hatem Abou-zeid, Ramy Atawia, Yoga Suhas Kuruba Manjunath, Akram Bin Sediq, Xiao-Ping Zhang

Abstract: Cloud virtual reality (VR) gaming traffic characteristics such as frame size, inter-arrival time, and latency need to be carefully studied as a first step toward scalable VR cloud service provisioning. To this end, in this paper we analyze the behavior of VR gaming traffic and Quality of Service (QoS) when VR rendering is conducted remotely in the cloud. We first build a VR testbed utilizing a cloud server, a commercial VR headset, and an off-the-shelf WiFi router. Using this testbed, we collect and process cloud VR gaming traffic data from different games under a number of network conditions and fixed and adaptive video encoding schemes. To analyze the application-level characteristics such as video frame size, frame inter-arrival time, frame loss and frame latency, we develop an interval threshold based identification method for video frames. Based on the frame identification results, we present two statistical models that capture the behaviour of the VR gaming video traffic. The models can be used by researchers and practitioners to generate VR traffic models for simulations and experiments - and are paramount in designing advanced radio resource management (RRM) and network optimization for cloud VR gaming services. To the best of the authors' knowledge, this is the first measurement study and analysis conducted using a commercial cloud VR gaming platform, and under both fixed and adaptive bitrate streaming. We make our VR traffic data-sets publicly available for further research by the community.

Submitted 21 September, 2021; originally announced September 2021.

arXiv:2109.01764 [pdf, other]

doi 10.1007/978-981-16-0289-4_29

Seam Carving Detection and Localization using Two-Stage Deep Neural Networks

Authors: Lakshmanan Nataraj, Chandrakanth Gudavalli, Tajuddin Manhar Mohammed, Shivkumar Chandrasekaran, B. S. Manjunath

Abstract: Seam carving is a method to resize an image in a content aware fashion. However, this method can also be used to carve out objects from images. In this paper, we propose a two-step method to detect and localize seam carved images. First, we build a detector to detect small patches in an image that has been seam carved. Next, we compute a heatmap on an image based on the patch detector's output. Using these heatmaps, we build another detector to detect if a whole image is seam carved or not. Our experimental results show that our approach is effective in detecting and localizing seam carved images.

Submitted 3 September, 2021; originally announced September 2021.
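
For readers unfamiliar with the manipulation being detected in this entry, the following is a minimal sketch of classic seam carving itself (finding and removing one minimum-energy vertical seam by dynamic programming). It is not the paper's detector; the function names and the assumption that a per-pixel energy map is already available are illustrative only.

```python
import numpy as np

def find_vertical_seam(energy: np.ndarray) -> np.ndarray:
    """Return, for each row, the column index of a minimum-energy vertical seam."""
    h, w = energy.shape
    cost = energy.astype(np.float64).copy()
    # Forward pass: cumulative cost of the cheapest path reaching each pixel
    # from the row above (upper-left, upper, or upper-right neighbor).
    for r in range(1, h):
        prev = cost[r - 1]
        left = np.concatenate(([np.inf], prev[:-1]))
        right = np.concatenate((prev[1:], [np.inf]))
        cost[r] += np.minimum(np.minimum(left, prev), right)
    # Backward pass: trace the cheapest path from the bottom row upward.
    seam = np.empty(h, dtype=np.intp)
    seam[-1] = int(np.argmin(cost[-1]))
    for r in range(h - 2, -1, -1):
        c = seam[r + 1]
        lo, hi = max(c - 1, 0), min(c + 2, w)
        seam[r] = lo + int(np.argmin(cost[r, lo:hi]))
    return seam

def remove_vertical_seam(image: np.ndarray, seam: np.ndarray) -> np.ndarray:
    """Delete one pixel per row (the seam), shrinking the width by one."""
    h, w = image.shape[:2]
    mask = np.ones((h, w), dtype=bool)
    mask[np.arange(h), seam] = False
    return image[mask].reshape(h, w - 1, *image.shape[2:])
```

Repeating these two steps narrows an image while avoiding high-energy content, which is why removed seams leave only subtle local traces for a detector to find.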

arXiv:2108.12534 [pdf, other]

SeeTheSeams: Localized Detection of Seam Carving based Image Forgery in Satellite Imagery

Authors: Chandrakanth Gudavalli, Erik Rosten, Lakshmanan Nataraj, Shivkumar Chandrasekaran, B. S. Manjunath

Abstract: Seam carving is a popular technique for content aware image retargeting. It can be used to deliberately manipulate images, for example, change the GPS locations of a building or insert/remove roads in a satellite image. This paper proposes a novel approach for detecting and localizing seams in such images. While there are methods to detect seam carving based manipulations, this is the first time that robust localization and detection of seam carving forgery is made possible. We also propose a seam localization score (SLS) metric to evaluate the effectiveness of localization. The proposed method is evaluated extensively on a large collection of images from different sources, demonstrating a high level of detection and localization performance across these datasets. The datasets curated during this work will be released to the public.

Submitted 27 August, 2021; originally announced August 2021.

arXiv:2108.01175 [pdf, other]

A computational geometry approach for modeling neuronal fiber pathways

Authors: S. Shailja, Angela Zhang, B. S. Manjunath

Abstract: We propose a novel and efficient algorithm to model high-level topological structures of neuronal fibers. Tractography constructs complex neuronal fibers in three dimensions that exhibit the geometry of white matter pathways in the brain. However, most tractography analysis methods are time consuming and intractable. We develop a computational geometry-based tractography representation that aims to simplify the connectivity of white matter fibers. Given the trajectories of neuronal fiber pathways, we model the evolution of trajectories that encodes geometrically significant events and calculate their point correspondence in the 3D brain space. Trajectory inter-distance is used as a parameter to control the granularity of the model that allows local or global representation of the tractogram. Using diffusion MRI data from an Alzheimer's patient study, we extract tractography features from our model for distinguishing Alzheimer's subjects from normal controls. Software implementation of our algorithm is available on GitHub.

Submitted 2 August, 2021; originally announced August 2021.

arXiv:2108.00596 [pdf, other]

GTNet: Guided Transformer Network for Detecting Human-Object Interactions

Authors: A S M Iftekhar, Satish Kumar, R. Austin McEver, Suya You, B. S. Manjunath

Abstract: The human-object interaction (HOI) detection task refers to localizing humans, localizing objects, and predicting the interactions between each human-object pair. HOI is considered one of the fundamental steps in truly understanding complex visual scenes. For detecting HOI, it is important to utilize relative spatial configurations and object semantics to find salient spatial regions of images that highlight the interactions between human-object pairs. This issue is addressed by the novel self-attention based guided transformer network, GTNet. GTNet encodes this spatial contextual information in human and object visual features via self-attention while achieving state of the art results on both the V-COCO and HICO-DET datasets. Code will be made available online.

Submitted 11 September, 2023; v1 submitted 1 August, 2021; originally announced August 2021.

Comments: accepted for presentation in Pattern Recognition and Tracking XXXIV at SPIE commerce+ defence Program

arXiv:2105.14173 [pdf, other]

FoveaTer: Foveated Transformer for Image Classification

Authors: Aditya Jonnalagadda, William Yang Wang, B. S. Manjunath, Miguel P. Eckstein

Abstract: Many animals and humans process the visual field with a varying spatial resolution (foveated vision) and use peripheral processing to make eye movements and point the fovea to acquire high-resolution information about objects of interest. This architecture results in computationally efficient rapid scene exploration. Recent progress in self-attention-based Vision Transformers offers an alternative to the traditionally convolution-reliant computer vision systems. However, the Transformer models do not explicitly model the foveated properties of the visual system nor the interaction between eye movements and the classification task. We propose the Foveated Transformer (FoveaTer) model, which uses pooling regions and eye movements to perform object classification tasks using a Vision Transformer architecture. Using square pooling regions or biologically-inspired radial-polar pooling regions, our proposed model pools the image features from the convolution backbone and uses the pooled features as an input to transformer layers. It decides on subsequent fixation location based on the attention assigned by the Transformer to various locations from past and present fixations. It dynamically allocates more fixation/computational resources to more challenging images before making the final image category decision. Using five ablation studies, we evaluate the contribution of different components of the Foveated model. We perform a psychophysics scene categorization task and use the experimental data to find a suitable radial-polar pooling region combination. We also show that the Foveated model better explains the human decisions in a scene categorization task than a Baseline model. We demonstrate our model's robustness against PGD adversarial attacks with both types of pooling regions, where we see the Foveated model outperform the Baseline model.

Submitted 2 October, 2022; v1 submitted 28 May, 2021; originally announced May 2021.

arXiv:2104.05693 [pdf, other]

Holistic Image Manipulation Detection using Pixel Co-occurrence Matrices

Authors: Lakshmanan Nataraj, Michael Goebel, Tajuddin Manhar Mohammed, Shivkumar Chandrasekaran, B. S. Manjunath

Abstract: Digital image forensics aims to detect images that have been digitally manipulated. Realistic image forgeries involve a combination of splicing, resampling, region removal, smoothing and other manipulation methods. While most detection methods in literature focus on detecting a particular type of manipulation, it is challenging to identify doctored images that involve a host of manipulations. In this paper, we propose a novel approach to holistically detect tampered images using a combination of pixel co-occurrence matrices and deep learning. We extract horizontal and vertical co-occurrence matrices on three color channels in the pixel domain and train a model using a deep convolutional neural network (CNN) framework. Our method is agnostic to the type of manipulation and classifies an image as tampered or untampered. We train and validate our model on a dataset of more than 86,000 images. Experimental results show that our approach is promising and achieves more than 0.99 area under the curve (AUC) evaluation metric on the training and validation subsets. Further, our approach also generalizes well and achieves around 0.81 AUC on an unseen test dataset comprising more than 19,740 images released as part of the Media Forensics Challenge (MFC) 2020. Our score was highest among all other teams that participated in the challenge, at the time of announcement of the challenge results.

Submitted 12 April, 2021; originally announced April 2021.
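
The feature-extraction step described in this entry (horizontal and vertical co-occurrence matrices on three color channels) can be sketched as follows. This is an illustrative sketch only: it assumes 8-bit channels, immediately adjacent pixel pairs, and raw unnormalized counts, none of which the abstract specifies, and the function names are hypothetical.

```python
import numpy as np

def cooccurrence_matrices(channel: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (horizontal, vertical) 256x256 pair counts for one 8-bit channel."""
    ch = np.asarray(channel).astype(np.intp)
    horizontal = np.zeros((256, 256), dtype=np.int64)
    vertical = np.zeros((256, 256), dtype=np.int64)
    # Count every (pixel, right-neighbor) value pair.
    np.add.at(horizontal, (ch[:, :-1].ravel(), ch[:, 1:].ravel()), 1)
    # Count every (pixel, lower-neighbor) value pair.
    np.add.at(vertical, (ch[:-1, :].ravel(), ch[1:, :].ravel()), 1)
    return horizontal, vertical

def image_to_cooccurrence_tensor(image: np.ndarray) -> np.ndarray:
    """Stack both matrices for each channel of an H x W x C image: (2*C, 256, 256)."""
    mats = []
    for c in range(image.shape[2]):
        h, v = cooccurrence_matrices(image[:, :, c])
        mats.extend([h, v])
    return np.stack(mats)
```

For an RGB image this yields a 6 x 256 x 256 tensor of the kind that could be fed to a CNN classifier; how the matrices are actually arranged and normalized before training is not stated in the abstract.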

arXiv:2103.11589 [pdf, other]

Adversarially Optimized Mixup for Robust Classification

Authors: Jason Bunk, Srinjoy Chattopadhyay, B. S. Manjunath, Shivkumar Chandrasekaran

Abstract: Mixup is a procedure for data augmentation that trains networks to make smoothly interpolated predictions between datapoints. Adversarial training is a strong form of data augmentation that optimizes for worst-case predictions in a compact space around each data-point, resulting in neural networks that make much more robust predictions. In this paper, we bring these ideas together by adversarially probing the space between datapoints, using projected gradient descent (PGD). The fundamental approach in this work is to leverage backpropagation through the mixup interpolation during training to optimize for places where the network makes unsmooth and incongruous predictions. Additionally, we also explore several modifications and nuances, like optimization of the mixup ratio and geometrical label assignment, and discuss their impact on enhancing network robustness. Through these ideas, we have been able to train networks that robustly generalize better; experiments on CIFAR-10 and CIFAR-100 demonstrate consistent improvements in accuracy against strong adversaries, including the recent strong ensemble attack AutoAttack. Our source code will be released for reproducibility.

Submitted 22 March, 2021; originally announced March 2021.
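
As background for this entry, the plain mixup interpolation that the paper builds on can be sketched in a few lines. This shows only standard mixup with a randomly sampled ratio; the adversarial optimization of the interpolation via PGD described in the abstract is not implemented here, and the function name and `alpha` default are assumptions.

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=1.0, rng=None):
    """Blend each example with a randomly chosen partner from the same batch."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)          # mixing ratio in [0, 1]
    perm = rng.permutation(len(x))        # partner index for every example
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mixed, y_mixed, lam
```

Because inputs and labels are interpolated with the same ratio, a network trained on the mixed pairs is pushed toward predictions that vary linearly between datapoints.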

arXiv:2103.11002 [pdf, other]

Attribution of Gradient Based Adversarial Attacks for Reverse Engineering of Deceptions

Authors: Michael Goebel, Jason Bunk, Srinjoy Chattopadhyay, Lakshmanan Nataraj, Shivkumar Chandrasekaran, B. S. Manjunath

Abstract: Machine Learning (ML) algorithms are susceptible to adversarial attacks and deception both during training and deployment. Automatic reverse engineering of the toolchains behind these adversarial machine learning attacks will aid in recovering the tools and processes used in these attacks. In this paper, we present two techniques that support automated identification and attribution of adversarial ML attack toolchains using Co-occurrence Pixel statistics and Laplacian Residuals. Our experiments show that the proposed techniques can identify parameters used to generate adversarial samples. To the best of our knowledge, this is the first approach to attribute gradient based adversarial attacks and estimate their parameters. Source code and data are available at: https://github.com/michael-goebel/ei_red

Submitted 19 March, 2021; originally announced March 2021.

arXiv:2101.10578 [pdf, other]

Malware Detection Using Frequency Domain-Based Image Visualization and Deep Learning

Authors: Tajuddin Manhar Mohammed, Lakshmanan Nataraj, Satish Chikkagoudar, Shivkumar Chandrasekaran, B. S. Manjunath

Abstract: We propose a novel method to detect and visualize malware through image classification. The executable binaries are represented as grayscale images obtained from the count of N-grams (N=2) of bytes in the Discrete Cosine Transform (DCT) domain and a neural network is trained for malware detection. A shallow neural network is trained for classification, and its accuracy is compared with deep-network architectures such as ResNet that are trained using transfer learning. Neither dis-assembly nor behavioral analysis of malware is required for these methods. Motivated by the visual similarity of these images for different malware families, we compare our deep neural network models with standard image features like GIST descriptors to evaluate the performance. A joint feature measure is proposed to combine different features using error analysis to get an accurate ensemble model for improved classification performance. A new dataset called MaleX which contains around 1 million malware and benign Windows executable samples is created for large-scale malware detection and classification experiments. Experimental results are quite promising with 96% binary classification accuracy on MaleX. The proposed model is also able to generalize well on larger unseen malware samples and the results compare favorably with state-of-the-art static analysis-based malware detection algorithms.

Submitted 26 January, 2021; originally announced January 2021.

Comments: Submitted version - Proceedings of the 54th Hawaii International Conference on System Sciences (HICSS) 2021
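
One plausible reading of the representation described in this entry (byte 2-gram counts viewed in the DCT domain) is sketched below. The normalization, the orthonormal DCT-II variant, and all function names are assumptions made for illustration; the abstract does not specify them, and this is not presented as the authors' pipeline.

```python
import numpy as np

def bigram_counts(data: bytes) -> np.ndarray:
    """Count consecutive byte pairs into a 256x256 matrix."""
    arr = np.frombuffer(data, dtype=np.uint8).astype(np.intp)
    counts = np.zeros((256, 256), dtype=np.float64)
    if arr.size >= 2:
        np.add.at(counts, (arr[:-1], arr[1:]), 1.0)
    return counts

def dct2_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def bigram_dct_image(data: bytes) -> np.ndarray:
    """Normalize the bigram counts and apply a separable 2D DCT."""
    counts = bigram_counts(data)
    total = counts.sum()
    if total > 0:
        counts = counts / total  # assumed normalization to relative frequencies
    d = dct2_matrix(256)
    return d @ counts @ d.T
```

The resulting 256 x 256 array has a fixed size regardless of the binary's length, which is what makes it usable as an image input to a classifier without disassembly.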

arXiv:2011.09540 [pdf, other]

StressNet: Detecting Stress in Thermal Videos

Authors: Satish Kumar, A S M Iftekhar, Michael Goebel, Tom Bullock, Mary H. MacLean, Michael B. Miller, Tyler Santander, Barry Giesbrecht, Scott T. Grafton, B. S. Manjunath

Abstract: Precise measurement of physiological signals is critical for the effective monitoring of human vital signs. Recent developments in computer vision have demonstrated that signals such as pulse rate and respiration rate can be extracted from digital video of humans, increasing the possibility of contact-less monitoring. This paper presents a novel approach to obtaining physiological signals and classifying stress states from thermal video. The proposed network--"StressNet"--features a hybrid emission representation model that models the direct emission and absorption of heat by the skin and underlying blood vessels. This results in an information-rich feature representation of the face, which is used by a spatio-temporal network for reconstructing the ISTI (Initial Systolic Time Interval: a measure of change in cardiac sympathetic activity that is considered to be a quantitative index of stress in humans). The reconstructed ISTI signal is fed into a stress-detection model to detect and classify the individual's stress state (i.e., stress or no stress). A detailed evaluation demonstrates that StressNet estimates the ISTI signal with 95% accuracy and detects stress with an average precision of 0.842. The source code is available on GitHub.

Submitted 23 November, 2020; v1 submitted 18 November, 2020; originally announced November 2020.

Comments: 11 pages, 10 figures, 2 tables, Conference WACV2021

ACM Class: H.1; I.2; I.3; I.4; I.5; J.3; J.4

arXiv:2010.13343 [pdf, other]

doi 10.1109/ISBI48211.2021.9433831

Semi supervised segmentation and graph-based tracking of 3D nuclei in time-lapse microscopy

Authors: S. Shailja, Jiaxiang Jiang, B. S. Manjunath

Abstract: We propose a novel weakly supervised method to improve the boundary of the 3D segmented nuclei utilizing an over-segmented image. This is motivated by the observation that current state-of-the-art deep learning methods do not result in accurate boundaries when the training data is weakly annotated. Towards this, a 3D U-Net is trained to get the centroid of the nuclei and integrated with a simple linear iterative clustering (SLIC) supervoxel algorithm that provides better adherence to cluster boundaries. To track these segmented nuclei, our algorithm utilizes the relative nuclei location depicting the processes of nuclei division and apoptosis. The proposed algorithmic pipeline achieves better segmentation performance compared to the state-of-the-art method in Cell Tracking Challenge (CTC) 2019 and comparable performance to state-of-the-art methods in IEEE ISBI CTC2020 while utilizing very little pixel-wise annotated data. Detailed experimental results are provided, and the source code is available on GitHub.

Submitted 26 October, 2020; originally announced October 2020.

Comments: To be submitted to ISBI 2021

Report number: 20764170

Journal ref: 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI)

arXiv:2010.09066 [pdf, other]

Exploiting Context for Robustness to Label Noise in Active Learning

Authors: Sudipta Paul, Shivkumar Chandrasekaran, B. S. Manjunath, Amit K. Roy-Chowdhury

Abstract: Several works in computer vision have demonstrated the effectiveness of active learning for adapting the recognition model when new unlabeled data becomes available. Most of these works consider that labels obtained from the annotator are correct. However, in a practical scenario, as the quality of the labels depends on the annotator, some of the labels might be wrong, which results in degraded recognition performance. In this paper, we address the problems of i) how a system can identify which of the queried labels are wrong and ii) how a multi-class active learning system can be adapted to minimize the negative impact of label noise. Towards solving the problems, we propose a noisy label filtering based learning approach where the inter-relationship (context) that is quite common in natural data is utilized to detect the wrong labels. We construct a graphical representation of the unlabeled data to encode these relationships and obtain new beliefs on the graph when noisy labels are available. Comparing the new beliefs with the prior relational information, we generate a dissimilarity score to detect the incorrect labels and update the recognition model with correct labels which result in better recognition performance. This is demonstrated in three different applications: scene classification, activity classification, and document classification.

Submitted 18 October, 2020; originally announced October 2020.

arXiv:2008.05381 [pdf, other]

Improving the Performance of Fine-Grain Image Classifiers via Generative Data Augmentation

Authors: Shashank Manjunath, Aitzaz Nathaniel, Jeff Druce, Stan German

Abstract: Recent advances in machine learning (ML) and computer vision tools have enabled applications in a wide variety of arenas such as financial analytics, medical diagnostics, and even within the Department of Defense. However, their widespread implementation in real-world use cases poses several challenges: (1) many applications are highly specialized, and hence operate in a sparse data domain; (2) ML tools are sensitive to their training sets and typically require cumbersome, labor-intensive data collection and data labelling processes; and (3) ML tools can be extremely "black box," offering users little to no insight into the decision-making process or how new data might affect prediction performance. To address these challenges, we have designed and developed Data Augmentation from Proficient Pre-Training of Robust Generative Adversarial Networks (DAPPER GAN), an ML analytics support tool that automatically generates novel views of training images in order to improve downstream classifier performance. DAPPER GAN leverages high-fidelity embeddings generated by a StyleGAN2 model (trained on the LSUN cars dataset) to create novel imagery for previously unseen classes. We experimentally evaluate this technique on the Stanford Cars dataset, demonstrating improved vehicle make and model classification accuracy and reduced requirements for real data using our GAN based data augmentation framework. The method's validity was supported through an analysis of classifier performance on both augmented and non-augmented datasets, achieving comparable or better accuracy with up to 30% less real data across visually similar classes. To support this method, we developed a novel augmentation method that can manipulate semantically meaningful dimensions (e.g., orientation) of the target object in the embedding space.

Submitted 12 August, 2020; originally announced August 2020.

arXiv:2007.13887 [pdf, other]

3DMaterialGAN: Learning 3D Shape Representation from Latent Space for Materials Science Applications

Authors: Devendra K. Jangid, Neal R. Brodnik, Amil Khan, McLean P. Echlin, Tresa M. Pollock, Sam Daly, B. S. Manjunath

Abstract: In the field of computer vision, unsupervised learning for 2D object generation has advanced rapidly in the past few years. However, 3D object generation has not garnered the same attention or success as its predecessor. To facilitate novel progress at the intersection of computer vision and materials science, we propose a 3DMaterialGAN network that is capable of recognizing and synthesizing individual grains whose morphology conforms to a given 3D polycrystalline material microstructure. This Generative Adversarial Network (GAN) architecture yields complex 3D objects from probabilistic latent space vectors with no additional information from 2D rendered images. We show that this method performs comparably or better than state-of-the-art on benchmark annotated 3D datasets, while also being able to distinguish and generate objects that are not easily annotated, such as grain morphologies. The value of our algorithm is demonstrated with analysis on experimental real-world data, namely generating 3D grain structures found in a commercially relevant wrought titanium alloy, which were validated through statistical shape comparison. This framework lays the foundation for the recognition and synthesis of polycrystalline material microstructures, which are used in additive manufacturing, aerospace, and structural design applications.

Submitted 27 July, 2020; originally announced July 2020.

arXiv:2007.10466 [pdf, other]

Detection, Attribution and Localization of GAN Generated Images

Authors: Michael Goebel, Lakshmanan Nataraj, Tejaswi Nanjundaswamy, Tajuddin Manhar Mohammed, Shivkumar Chandrasekaran, B. S. Manjunath

Abstract: Recent advances in Generative Adversarial Networks (GANs) have led to the creation of realistic-looking digital images that pose a major challenge to their detection by humans or computers. GANs are used in a wide range of tasks, from modifying small attributes of an image (StarGAN [14]), transferring attributes between image pairs (CycleGAN [91]), as well as generating entirely new images (ProGAN [36], StyleGAN [37], SPADE/GauGAN [64]). In this paper, we propose a novel approach to detect, attribute and localize GAN generated images that combines image features with deep learning methods. For every image, co-occurrence matrices are computed on neighborhood pixels of RGB channels in different directions (horizontal, vertical and diagonal). A deep learning network is then trained on these features to detect, attribute and localize these GAN generated/manipulated images. A large scale evaluation of our approach on 5 GAN datasets comprising over 2.76 million images (ProGAN, StarGAN, CycleGAN, StyleGAN and SPADE/GauGAN) shows promising results in detecting GAN generated images.

Submitted 20 July, 2020; originally announced July 2020.

arXiv:2007.05615 [pdf, other]

PCAMs: Weakly Supervised Semantic Segmentation Using Point Supervision

Authors: R. Austin McEver, B. S. Manjunath

Abstract: Current state of the art methods for generating semantic segmentation rely heavily on a large set of images that have each pixel labeled with a class of interest label or background. Coming up with such labels, especially in domains that require an expert to do annotations, comes at a heavy cost in time and money. Several methods have shown that we can learn semantic segmentation from less expensive image-level labels, but the effectiveness of point level labels, a healthy compromise between all pixels labelled and none, still remains largely unexplored. This paper presents a novel procedure for producing semantic segmentation from images given some point level annotations. This method includes point annotations in the training of a convolutional neural network (CNN) for producing improved localization and class activation maps. Then, we use another CNN for predicting semantic affinities in order to propagate rough class labels and create pseudo semantic segmentation labels. Finally, we propose training a CNN that is normally fully supervised using our pseudo labels in place of ground truth labels, which further improves performance and simplifies the inference process by requiring just one CNN during inference rather than two. Our method achieves state of the art results for point supervised semantic segmentation on the PASCAL VOC 2012 dataset \cite{everingham2010pascal}, even outperforming state of the art methods for stronger bounding box and squiggle supervision.

Submitted 10 July, 2020; originally announced July 2020.

arXiv:2003.05541 [pdf, other]

VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions

Authors: Oytun Ulutan, A S M Iftekhar, B. S. Manjunath

Abstract: Comprehensive visual understanding requires detection frameworks that can effectively learn and utilize object interactions while analyzing objects individually. This is the main objective in Human-Object Interaction (HOI) detection task. In particular, relative spatial reasoning and structural connections between objects are essential cues for analyzing interactions, which is addressed by the proposed Visual-Spatial-Graph Network (VSGNet) architecture. VSGNet extracts visual features from the human-object pairs, refines the features with spatial configurations of the pair, and utilizes the structural connections between the pair via graph convolutions. The performance of VSGNet is thoroughly evaluated using the Verbs in COCO (V-COCO) and HICO-DET datasets. Experimental results indicate that VSGNet outperforms state-of-the-art solutions by 8% or 4 mAP in V-COCO and 16% or 3 mAP in HICO-DET.

Submitted 11 March, 2020; originally announced March 2020.

Comments: Accepted in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020)

arXiv:1907.10419 [pdf, other]

doi 10.1007/978-3-030-46640-4_4

Predicting Clinical Outcome of Stroke Patients with Tractographic Feature

Authors: Po-Yu Kao, Jefferson W. Chen, B. S. Manjunath

Abstract: The volume of stroke lesion is the gold standard for predicting the clinical outcome of stroke patients. However, the presence of stroke lesion may cause neural disruptions to other brain regions, and these potentially damaged regions may affect the clinical outcome of stroke patients. In this paper, we introduce the tractographic feature to capture these potentially damaged regions and predict the modified Rankin Scale (mRS), which is a widely used outcome measure in stroke clinical trials. The tractographic feature is built from the stroke lesion and average connectome information from a group of normal subjects. The tractographic feature takes into account different functional regions that may be affected by the stroke, thus complementing the commonly used stroke volume features. The proposed tractographic feature is tested on a public stroke benchmark Ischemic Stroke Lesion Segmentation 2017 and achieves higher accuracy than the stroke volume and the state-of-the-art feature on predicting the mRS grades of stroke patients. In addition, the tractographic feature also yields a lower average absolute error than the commonly used stroke volume feature.

Submitted 19 September, 2019; v1 submitted 22 July, 2019; originally announced July 2019.

Comments: 12 pages, 4 figures, 3 tables. Accepted by MICCAI-BrainLesion 2019 as an oral presentation

arXiv:1907.06302 [pdf, ps, other]

Compound TCP with Random Early Detection (RED): stability, bifurcation and performance analyses

Authors: Sreelakshmi Manjunath, Gaurav Raina

Abstract: The problem of increased queueing delays in the Internet motivates the study of currently implemented transport protocols and active queue management (AQM) policies. We study Compound TCP (default protocol in Windows) with Random Early Detection (RED). RED uses an exponentially weighted moving average of the queue size to make packet-dropping decisions, aiming to control the queue size. One must study RED with current protocols in order to explore its viability in the context of increased queueing delays. We derive a non-linear time-delayed model for Compound TCP-RED. We derive a sufficient condition for local stability of this model, and examine the impact of (i) round-trip time (RTT) of the TCP flows, (ii) queue averaging parameter and (iii) packet-dropping thresholds. Further, we establish that the system undergoes a Hopf bifurcation as any of the above parameters is varied. This suggests the emergence of limit cycles in the queue size, which may lead to synchronisation of TCP flows and loss of link utilisation. Next, we study a regime where queue size averaging is not performed, and packet-dropping decisions are based on instantaneous queue size. In this regime, we derive the necessary and sufficient condition for local stability. A comparison of the stability results for Compound TCP-RED in the two regimes, with and without queue size averaging, reveals that averaging may not be beneficial to system stability. Packet-level simulations show that the queue size indeed exhibits limit cycle oscillations as system parameters are varied.
We then outline a simple threshold-based queue policy that could ensure stable low-latency operation. We show that the threshold policy outperforms RED in terms of queueing delay, flow completion time and packet loss. We highlight that the threshold-based policy could mitigate the issue of increased queueing delays in the Internet.
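As a rough illustration of the averaging step this abstract mentions, the sketch below follows the classic textbook form of RED: an exponentially weighted moving average of the queue size, mapped to a drop probability that rises linearly between two thresholds. It is not the non-linear time-delayed model derived in the paper, and the names `ewma_queue`, `red_drop_probability`, `w`, `min_th`, `max_th` and `max_p` are assumptions chosen for this example.

```python
def ewma_queue(avg, q, w):
    """Update the averaged queue size with one instantaneous sample q.

    w is the queue averaging parameter (0 < w <= 1); w = 1 reduces to
    using the instantaneous queue size with no averaging.
    """
    return (1.0 - w) * avg + w * q


def red_drop_probability(avg, min_th, max_th, max_p):
    """Textbook RED drop profile (an assumption, not the paper's exact model).

    Below min_th nothing is dropped, at or above max_th everything is
    dropped, and in between the probability rises linearly up to max_p.
    """
    if avg < min_th:
        return 0.0
    if avg >= max_th:
        return 1.0
    return max_p * (avg - min_th) / (max_th - min_th)


if __name__ == "__main__":
    avg = 0.0
    for q in [5, 15, 25, 35, 40]:  # hypothetical instantaneous queue sizes
        avg = ewma_queue(avg, q, w=0.5)
        print(q, round(avg, 3), round(red_drop_probability(avg, 10, 30, 0.1), 4))
```

Setting `w = 1` in this sketch corresponds to the second regime the abstract describes, where drop decisions depend on the instantaneous queue size.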

Submitted 14 July, 2019; originally announced July 2019.

Comments: 46 pages, 19 figures

arXiv:1907.00281 [pdf, other]

Improving 3D U-Net for Brain Tumor Segmentation by Utilizing Lesion Prior

Authors: Po-Yu Kao, Jefferson W. Chen, B. S. Manjunath

Abstract: We propose a novel, simple and effective method to integrate lesion prior and a 3D U-Net for improving brain tumor segmentation. First, we utilize the ground-truth brain tumor lesions from a group of patients to generate the heatmaps of different types of lesions. These heatmaps are used to create the volume-of-interest (VOI) map which contains prior information about brain tumor lesions. The VOI map is then integrated with the multimodal MR images and input to a 3D U-Net for segmentation. The proposed method is evaluated on a public benchmark dataset, and the experimental results show that the proposed feature fusion method achieves an improvement over the baseline methods. In addition, our proposed method also achieves a competitive performance compared to state-of-the-art methods.

Submitted 19 February, 2020; v1 submitted 29 June, 2019; originally announced July 2019.

Comments: 5 pages, 4 figures, 1 table, LNCS format

arXiv:1904.07387 [pdf, other]

doi 10.1007/978-3-030-31901-4_2

Predicting Fluid Intelligence of Children using T1-weighted MR Images and a StackNet

Authors: Po-Yu Kao, Angela Zhang, Michael Goebel, Jefferson W. Chen, B. S. Manjunath

Abstract: In this work, we utilize T1-weighted MR images and StackNet to predict fluid intelligence in adolescents. Our framework includes feature extraction, feature normalization, feature denoising, feature selection, training a StackNet, and predicting fluid intelligence. The extracted feature is the distribution of different brain tissues in different brain parcellation regions. The proposed StackNet consists of three layers and 11 models. Each layer uses the predictions from all previous layers including the input layer. The proposed StackNet is tested on a public benchmark Adolescent Brain Cognitive Development Neurocognitive Prediction Challenge 2019 and achieves a mean squared error of 82.42 on the combined training and validation set with 10-fold cross-validation. In addition, the proposed StackNet also achieves a mean squared error of 94.25 on the testing data. The source code is available on GitHub.

Submitted 11 May, 2020; v1 submitted 15 April, 2019; originally announced April 2019.

Comments: 8 pages, 2 figures, 3 tables, Accepted by MICCAI ABCD-NP Challenge 2019; Added NDA

arXiv:1903.06836 [pdf, other]

Detecting GAN generated Fake Images using Co-occurrence Matrices

Authors: Lakshmanan Nataraj, Tajuddin Manhar Mohammed, Shivkumar Chandrasekaran, Arjuna Flenner, Jawadul H. Bappy, Amit K. Roy-Chowdhury, B. S. Manjunath

Abstract: The advent of Generative Adversarial Networks (GANs) has brought about completely novel ways of transforming and manipulating pixels in digital images. GAN based techniques such as Image-to-Image translations, DeepFakes, and other automated methods have become increasingly popular in creating fake images. In this paper, we propose a novel approach to detect GAN generated fake images using a combination of co-occurrence matrices and deep learning. We extract co-occurrence matrices on three color channels in the pixel domain and train a model using a deep convolutional neural network (CNN) framework. Experimental results on two diverse and challenging GAN datasets comprising more than 56,000 images based on unpaired image-to-image translations (cycleGAN [1]) and facial attributes/expressions (StarGAN [2]) show that our approach is promising and achieves more than 99% classification accuracy in both datasets. Further, our approach also generalizes well and achieves good results when trained on one dataset and tested on the other.
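To make the feature-extraction step concrete, here is a minimal sketch of a per-channel pixel co-occurrence matrix. The abstract only says that co-occurrence matrices are extracted on three color channels in the pixel domain; the horizontal right-neighbour offset, the nested-list image layout and the function names `cooccurrence_matrix` and `rgb_cooccurrence` are assumptions made for this illustration, not the authors' implementation.

```python
def cooccurrence_matrix(channel, levels=256):
    """Count pairs (a, b) where b is the pixel immediately right of a.

    channel is a list of rows of integer intensities in [0, levels).
    The horizontal offset is an assumed choice for this sketch.
    """
    counts = [[0] * levels for _ in range(levels)]
    for row in channel:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
    return counts


def rgb_cooccurrence(image, levels=256):
    """Return one co-occurrence matrix per color channel (R, G, B).

    image is a list of rows, each row a list of (r, g, b) tuples.
    """
    return [
        cooccurrence_matrix([[px[c] for px in row] for row in image], levels)
        for c in range(3)
    ]


if __name__ == "__main__":
    # Tiny hypothetical 2x2 image with 3 intensity levels per channel.
    image = [[(0, 1, 2), (1, 1, 0)],
             [(1, 0, 2), (1, 0, 2)]]
    red, green, blue = rgb_cooccurrence(image, levels=3)
    print(red)
```

With 8-bit images each channel yields a 256 x 256 matrix, so the three channels stack into a fixed-size array that a CNN can take as input regardless of the original image size.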

Submitted 2 October, 2019; v1 submitted 15 March, 2019; originally announced March 2019.

arXiv:1903.02495 [pdf, other]

doi 10.1109/TIP.2019.2895466

Hybrid LSTM and Encoder-Decoder Architecture for Detection of Image Forgeries

Authors: Jawadul H. Bappy, Cody Simons, Lakshmanan Nataraj, B. S. Manjunath, Amit K. Roy-Chowdhury

Abstract: With advanced image journaling tools, one can easily alter the semantic meaning of an image by exploiting certain manipulation techniques such as copy-clone, object splicing, and removal, which mislead the viewers. In contrast, the identification of these manipulations becomes a very challenging task as manipulated regions are not visually apparent. This paper proposes a high-confidence manipulation localization architecture which utilizes resampling features, Long-Short Term Memory (LSTM) cells, and encoder-decoder network to segment out manipulated regions from non-manipulated ones. Resampling features are used to capture artifacts like JPEG quality loss, upsampling, downsampling, rotation, and shearing. The proposed network exploits larger receptive fields (spatial maps) and frequency domain correlation to analyze the discriminative characteristics between manipulated and non-manipulated regions by incorporating encoder and LSTM network. Finally, decoder network learns the mapping from low-resolution feature maps to pixel-wise predictions for image tamper localization. With predicted mask provided by final layer (softmax) of the proposed architecture, end-to-end training is performed to learn the network parameters through back-propagation using ground-truth masks. Furthermore, a large image splicing dataset is introduced to guide the training process. The proposed method is capable of localizing image manipulations at pixel level with high precision, which is demonstrated through rigorous experimentation on three diverse datasets.

Submitted 6 March, 2019; originally announced March 2019.

arXiv:1902.04729 [pdf, other]

doi 10.1109/ICIP.2019.8803095

Accurate 3D Cell Segmentation using Deep Feature and CRF Refinement

Authors: Jiaxiang Jiang, Po-Yu Kao, Samuel A. Belteton, Daniel B. Szymanski, B. S. Manjunath

Abstract: We consider the problem of accurately identifying cell boundaries and labeling individual cells in confocal microscopy images, specifically, 3D image stacks of cells with tagged cell membranes. Precise identification of cell boundaries, their shapes, and quantifying inter-cellular space leads to a better understanding of cell morphogenesis. Towards this, we outline a cell segmentation method that uses a deep neural network architecture to extract a confidence map of cell boundaries, followed by a 3D watershed algorithm and a final refinement using a conditional random field. In addition to improving the accuracy of segmentation compared to other state-of-the-art methods, the proposed approach also generalizes well to different datasets without the need to retrain the network for each dataset. Detailed experimental results are provided, and the source code is available on GitHub.

Submitted 12 February, 2019; originally announced February 2019.

Comments: 5 pages, 5 figures, 3 tables

Journal ref: 2019 IEEE International Conference on Image Processing (ICIP)

arXiv:1902.04038 [pdf, other]

Deep Learning Methods for Event Verification and Image Repurposing Detection

Authors: M. Goebel, A. Flenner, L. Nataraj, B. S. Manjunath

Abstract: The authenticity of images posted on social media is an issue of growing concern. Many algorithms have been developed to detect manipulated images, but few have investigated the ability of deep neural network based approaches to verify the authenticity of image labels, such as event names. In this paper, we propose several novel methods to predict if an image was captured at one of several noteworthy events. We use a set of images from several recorded events such as storms, marathons, protests, and other large public gatherings. Two strategies of applying pre-trained Imagenet network for event verification are presented, with two modifications for each strategy. The first method uses the features from the last convolutional layer of a pre-trained network as input to a classifier. We also consider the effects of tuning the convolutional weights of the pre-trained network to improve classification. The second method combines many features extracted from smaller scales and uses the output of a pre-trained network as the input to a second classifier. For both methods, we investigated several different classifiers and tested many different pre-trained networks. Our experiments demonstrate both these approaches are effective for event verification and image re-purposing detection. The classification at the global scale tends to marginally outperform our tested local methods and fine tuning the network further improves the results.

Submitted 11 February, 2019; originally announced February 2019.

arXiv:1901.09088 [pdf, other]

Automated Segmentation of CT Scans for Normal Pressure Hydrocephalus

Authors: Angela Zhang, Po-Yu Kao, Ronald Sahyouni, Ashutosh Shelat, Jefferson Chen, B. S. Manjunath

Abstract: Normal Pressure Hydrocephalus (NPH) is one of the few reversible forms of dementia. Due to their low cost and versatility, Computed Tomography (CT) scans have long been used as an aid to help diagnose intracerebral anomalies such as NPH. However, no well-defined and effective protocol currently exists for the analysis of CT scan-based ventricular, cerebral mass and subarachnoid space volumes in the setting of NPH. The Evan's ratio, an approximation of the ratio of ventricle to brain volume using only one 2D slice of the scan, has been proposed but is not robust. Instead of manually measuring a 2-dimensional proxy for the ratio of ventricle volume to brain volume, this study proposes an automated method of calculating the brain volumes for better recognition of NPH from a radiological standpoint. The method first aligns the subject CT volume to a common space through an affine transformation, then uses a random forest classifier to mask relevant tissue types. A 3D morphological segmentation method is used to partition the brain volume, which in turn is used to train machine learning methods to classify the subjects into non-NPH vs. NPH based on volumetric information. The proposed algorithm has increased sensitivity compared to the Evan's ratio thresholding method.

Submitted 23 July, 2019; v1 submitted 25 January, 2019; originally announced January 2019.

MSC Class: 92-04

arXiv:1812.11631 [pdf, other]

Actor Conditioned Attention Maps for Video Action Detection

Authors: Oytun Ulutan, Swati Rallapalli, Mudhakar Srivatsa, Carlos Torres, B. S. Manjunath

Abstract: While observing complex events with multiple actors, humans do not assess each actor separately, but infer from the context. The surrounding context provides essential information for understanding actions. To this end, we propose to replace region of interest (RoI) pooling with an attention module, which ranks each spatio-temporal region's relevance to a detected actor instead of cropping. We refer to these as Actor-Conditioned Attention Maps (ACAM), which amplify/dampen the features extracted from the entire scene. The resulting actor-conditioned features focus the model on regions that are relevant to the conditioned actor. For actor localization, we leverage pre-trained object detectors, which transfer better. The proposed model is efficient and our action detection pipeline achieves near real-time performance. Experimental results on AVA 2.1 and JHMDB demonstrate the effectiveness of attention maps, with improvements of 7 mAP on AVA and 4 mAP on JHMDB.

Submitted 10 May, 2020; v1 submitted 30 December, 2018; originally announced December 2018.

Comments: WACV2020 Paper

Journal ref: In The IEEE Winter Conference on Applications of Computer Vision (pp. 527-536) 2020

arXiv:1811.02629 [pdf, other]

Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge

Authors: Spyridon Bakas, Mauricio Reyes, Andras Jakab, Stefan Bauer, Markus Rempfler, Alessandro Crimi, Russell Takeshi Shinohara, Christoph Berger, Sung Min Ha, Martin Rozycki, Marcel Prastawa, Esther Alberts, Jana Lipkova, John Freymann, Justin Kirby, Michel Bilello, Hassan Fathallah-Shaykh, Roland Wiest, Jan Kirschke, Benedikt Wiestler, Rivka Colen, Aikaterini Kotrotsou, Pamela Lamontagne, Daniel Marcus, Mikhail Milchenko, et al. (402 additional authors not shown)

Abstract: Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018.
Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients that underwent gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that apart from being diverse on each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset.

Submitted 23 April, 2019; v1 submitted 5 November, 2018; originally announced November 2018.

Comments: The International Multimodal Brain Tumor Segmentation (BraTS) Challenge

Showing 1–50 of 63 results for author: Manjunath, S