Search | arXiv e-print repository

Learning State-Invariant Representations of Objects from Image Collections with State, Pose, and Viewpoint Changes

Abstract: We add one more invariance - state invariance - to the more commonly used other invariances for learning object representations for recognition and retrieval. By state invariance, we mean robust with respect to changes in the structural form of the object, such as when an umbrella is folded, or when an item of clothing is tossed on the floor. Since humans generally have no difficulty in recognizin… ▽ More We add one more invariance - state invariance - to the more commonly used other invariances for learning object representations for recognition and retrieval. By state invariance, we mean robust with respect to changes in the structural form of the object, such as when an umbrella is folded, or when an item of clothing is tossed on the floor. Since humans generally have no difficulty in recognizing objects despite such state changes, we are naturally faced with the question of whether it is possible to devise a neural architecture with similar abilities. To that end, we present a novel dataset, ObjectsWithStateChange, that captures state and pose variations in the object images recorded from arbitrary viewpoints. We believe that this dataset will facilitate research in fine-grained object recognition and retrieval of objects that are capable of state changes. The goal of such research would be to train models capable of generating object embeddings that remain invariant to state changes while also staying invariant to transformations induced by changes in viewpoint, pose, illumination, etc. To demonstrate the usefulness of the ObjectsWithStateChange dataset, we also propose a curriculum learning strategy that uses the similarity relationships in the learned embedding space after each epoch to guide the training process. The model learns discriminative features by comparing visually similar objects within and across different categories, encouraging it to differentiate between objects that may be challenging to distinguish due to changes in their state. We believe that this strategy enhances the model's ability to capture discriminative features for fine-grained tasks that may involve objects with state changes, leading to performance improvements on object-level tasks not only on our new dataset, but also on two other challenging multi-view datasets such as ModelNet40 and ObjectPI. △ Less

Submitted 9 April, 2024; originally announced April 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2403.00272 [pdf, other]

Dual Pose-invariant Embeddings: Learning Category and Object-specific Discriminative Representations for Recognition and Retrieval

Authors: Rohan Sarkar, Avinash Kak

Abstract: In the context of pose-invariant object recognition and retrieval, we demonstrate that it is possible to achieve significant improvements in performance if both the category-based and the object-identity-based embeddings are learned simultaneously during training. In hindsight, that sounds intuitive because learning about the categories is more fundamental than learning about the individual object… ▽ More In the context of pose-invariant object recognition and retrieval, we demonstrate that it is possible to achieve significant improvements in performance if both the category-based and the object-identity-based embeddings are learned simultaneously during training. In hindsight, that sounds intuitive because learning about the categories is more fundamental than learning about the individual objects that correspond to those categories. However, to the best of what we know, no prior work in pose-invariant learning has demonstrated this effect. This paper presents an attention-based dual-encoder architecture with specially designed loss functions that optimize the inter- and intra-class distances simultaneously in two different embedding spaces, one for the category embeddings and the other for the object-level embeddings. The loss functions we have proposed are pose-invariant ranking losses that are designed to minimize the intra-class distances and maximize the inter-class distances in the dual representation spaces. We demonstrate the power of our approach with three challenging multi-view datasets, ModelNet-40, ObjectPI, and FG3D. With our dual approach, for single-view object recognition, we outperform the previous best by 20.0% on ModelNet40, 2.0% on ObjectPI, and 46.5% on FG3D. On the other hand, for single-view object retrieval, we outperform the previous best by 33.7% on ModelNet40, 18.8% on ObjectPI, and 56.9% on FG3D. △ Less

Submitted 29 February, 2024; originally announced March 2024.

Comments: Accepted by IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024)

arXiv:2308.01262 [pdf, other]

Incorporating Season and Solar Specificity into Renderings made by a NeRF Architecture using Satellite Images

Authors: Michael Gableman, Avinash Kak

Abstract: As a result of Shadow NeRF and Sat-NeRF, it is possible to take the solar angle into account in a NeRF-based framework for rendering a scene from a novel viewpoint using satellite images for training. Our work extends those contributions and shows how one can make the renderings season-specific. Our main challenge was creating a Neural Radiance Field (NeRF) that could render seasonal features inde… ▽ More As a result of Shadow NeRF and Sat-NeRF, it is possible to take the solar angle into account in a NeRF-based framework for rendering a scene from a novel viewpoint using satellite images for training. Our work extends those contributions and shows how one can make the renderings season-specific. Our main challenge was creating a Neural Radiance Field (NeRF) that could render seasonal features independently of viewing angle and solar angle while still being able to render shadows. We teach our network to render seasonal features by introducing one more input variable -- time of the year. However, the small training datasets typical of satellite imagery can introduce ambiguities in cases where shadows are present in the same location for every image of a particular season. We add additional terms to the loss function to discourage the network from using seasonal features for accounting for shadows. We show the performance of our network on eight Areas of Interest containing images captured by the Maxar WorldView-3 satellite. This evaluation includes tests measuring the ability of our framework to accurately render novel views, generate height maps, predict shadows, and specify seasonal features independently from shadows. Our ablation studies justify the choices made for network design parameters. △ Less

Submitted 15 December, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

Comments: 18 pages, 17 figures, 10 tables

ACM Class: I.4.8; I.3.3

arXiv:2305.14301 [pdf, other]

A Laplacian Pyramid Based Generative H&E Stain Augmentation Network

Authors: Fangda Li, Zhiqiang Hu, Wen Chen, Avinash Kak

Abstract: Hematoxylin and Eosin (H&E) staining is a widely used sample preparation procedure for enhancing the saturation of tissue sections and the contrast between nuclei and cytoplasm in histology images for medical diagnostics. However, various factors, such as the differences in the reagents used, result in high variability in the colors of the stains actually recorded. This variability poses a challen… ▽ More Hematoxylin and Eosin (H&E) staining is a widely used sample preparation procedure for enhancing the saturation of tissue sections and the contrast between nuclei and cytoplasm in histology images for medical diagnostics. However, various factors, such as the differences in the reagents used, result in high variability in the colors of the stains actually recorded. This variability poses a challenge in achieving generalization for machine-learning based computer-aided diagnostic tools. To desensitize the learned models to stain variations, we propose the Generative Stain Augmentation Network (G-SAN) -- a GAN-based framework that augments a collection of cell images with simulated yet realistic stain variations. At its core, G-SAN uses a novel and highly computationally efficient Laplacian Pyramid (LP) based generator architecture, that is capable of disentangling stain from cell morphology. Through the task of patch classification and nucleus segmentation, we show that using G-SAN-augmented training data provides on average 15.7% improvement in F1 score and 7.3% improvement in panoptic quality, respectively. Our code is available at https://github.com/lifangda01/GSAN-Demo. △ Less

Submitted 14 July, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

arXiv:2304.12560 [pdf]

HexRAN: A Programmable Approach to Open RAN Base Station System Design

Authors: Ahan Kak, Van-Quan Pham, Huu-Trung Thieu, Nakjung Choi

Abstract: In recent years, the radio access network (RAN) domain has witnessed a sea change with increasing levels of virtualization and softwarization driven by emerging paradigms such as the Open RAN (O-RAN) movement. However, the fundamental building block of the cellular network, i.e., the base station, remains unchanged and ill-equipped to handle this architectural evolution. In particular, with refere… ▽ More In recent years, the radio access network (RAN) domain has witnessed a sea change with increasing levels of virtualization and softwarization driven by emerging paradigms such as the Open RAN (O-RAN) movement. However, the fundamental building block of the cellular network, i.e., the base station, remains unchanged and ill-equipped to handle this architectural evolution. In particular, with reference to existing base station architectures, there exists a general lack of programmability and composability along with a protocol stack that grapples with diverging and often conflicting requirements set forth by 3GPP and O-RAN. Recognizing the need for an "O-RAN-native" approach to base station design, this paper introduces HexRAN- a novel base station architecture characterized by key features relating to RAN disaggregation and composability, 3GPP and O-RAN protocol integration and programmability, robust controller interactions, and customizable RAN slicing. Furthermore, the paper also includes a concrete systems-level prototype and comprehensive experimental evaluation of HexRAN on an over-the-air testbed, showcasing the scalability and performance benefits associated with the proposed architecture. △ Less

Submitted 3 July, 2024; v1 submitted 25 April, 2023; originally announced April 2023.

arXiv:2303.06193 [pdf, other]

Adaptive Supervised PatchNCE Loss for Learning H&E-to-IHC Stain Translation with Inconsistent Groundtruth Image Pairs

Authors: Fangda Li, Zhiqiang Hu, Wen Chen, Avinash Kak

Abstract: Immunohistochemical (IHC) staining highlights the molecular information critical to diagnostics in tissue samples. However, compared to H&E staining, IHC staining can be much more expensive in terms of both labor and the laboratory equipment required. This motivates recent research that demonstrates that the correlations between the morphological information present in the H&E-stained slides and t… ▽ More Immunohistochemical (IHC) staining highlights the molecular information critical to diagnostics in tissue samples. However, compared to H&E staining, IHC staining can be much more expensive in terms of both labor and the laboratory equipment required. This motivates recent research that demonstrates that the correlations between the morphological information present in the H&E-stained slides and the molecular information in the IHC-stained slides can be used for H&E-to-IHC stain translation. However, due to a lack of pixel-perfect H&E-IHC groundtruth pairs, most existing methods have resorted to relying on expert annotations. To remedy this situation, we present a new loss function, Adaptive Supervised PatchNCE (ASP), to directly deal with the input to target inconsistencies in a proposed H&E-to-IHC image-to-image translation framework. The ASP loss is built upon a patch-based contrastive learning criterion, named Supervised PatchNCE (SP), and augments it further with weight scheduling to mitigate the negative impact of noisy supervision. Lastly, we introduce the Multi-IHC Stain Translation (MIST) dataset, which contains aligned H&E-IHC patches for 4 different IHC stains critical to breast cancer diagnosis. In our experiment, we demonstrate that our proposed method outperforms existing image-to-image translation methods for stain translation to multiple IHC stains. All of our code and datasets are available at https://github.com/lifangda01/AdaptiveSupervisedPatchNCE. △ Less

Submitted 10 March, 2023; originally announced March 2023.

arXiv:2303.05639 [pdf, other]

Self-Supervised One-Shot Learning for Automatic Segmentation of StyleGAN Images

Authors: Ankit Manerikar, Avinash C. Kak

Abstract: We propose a framework for the automatic one-shot segmentation of synthetic images generated by a StyleGAN. Our framework is based on the observation that the multi-scale hidden features in the GAN generator hold useful semantic information that can be utilized for automatic on-the-fly segmentation of the generated images. Using these features, our framework learns to segment synthetic images usin… ▽ More We propose a framework for the automatic one-shot segmentation of synthetic images generated by a StyleGAN. Our framework is based on the observation that the multi-scale hidden features in the GAN generator hold useful semantic information that can be utilized for automatic on-the-fly segmentation of the generated images. Using these features, our framework learns to segment synthetic images using a self-supervised contrastive clustering algorithm that projects the hidden features into a compact space for per-pixel classification. This contrastive learner is based on using a novel data augmentation strategy and a pixel-wise swapped prediction loss that leads to faster learning of the feature vectors for one-shot segmentation. We have tested our implementation on five standard benchmarks to yield a segmentation performance that not only outperforms the semi-supervised baselines by an average wIoU margin of 1.02 % but also improves the inference speeds by a factor of 4.5. Finally, we also show the results of using the proposed one-shot learner in implementing BagGAN, a framework for producing annotated synthetic baggage X-ray scans for threat detection. This framework was trained and tested on the PIDRay baggage benchmark to yield a performance comparable to its baseline segmenter based on manual annotations. △ Less

Submitted 23 October, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

arXiv:2302.12301 [pdf, other]

An Aligned Multi-Temporal Multi-Resolution Satellite Image Dataset for Change Detection Research

Authors: Rahul Deshmukh, Constantine J. Roros, Amith Kashyap, Avinash C. Kak

Abstract: This paper presents an aligned multi-temporal and multi-resolution satellite image dataset for research in change detection. We expect our dataset to be useful to researchers who want to fuse information from multiple satellites for detecting changes on the surface of the earth that may not be fully visible in any single satellite. The dataset we present was created by augmenting the SpaceNet-7 da… ▽ More This paper presents an aligned multi-temporal and multi-resolution satellite image dataset for research in change detection. We expect our dataset to be useful to researchers who want to fuse information from multiple satellites for detecting changes on the surface of the earth that may not be fully visible in any single satellite. The dataset we present was created by augmenting the SpaceNet-7 dataset with temporally parallel stacks of Landsat and Sentinel images. The SpaceNet-7 dataset consists of time-sequenced Planet images recorded over 101 AOIs (Areas-of-Interest). In our dataset, for each of the 60 AOIs that are meant for training, we augment the Planet datacube with temporally parallel datacubes of Landsat and Sentinel images. The temporal alignments between the high-res Planet images, on the one hand, and the Landsat and Sentinel images, on the other, are approximate since the temporal resolution for the Planet images is one month -- each image being a mosaic of the best data collected over a month. Whenever we have a choice regarding which Landsat and Sentinel images to pair up with the Planet images, we have chosen those that had the least cloud cover. A particularly important feature of our dataset is that the high-res and the low-res images are spatially aligned together with our MuRA framework presented in this paper. Foundational to the alignment calculation is the modeling of inter-satellite misalignment errors with polynomials as in NASA's AROP algorithm. We have named our dataset MuRA-T for the MuRA framework that is used for aligning the cross-satellite images and "T" for the temporal dimension in the dataset. △ Less

Submitted 27 February, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

Comments: 8 pages, 4 figures, 3 tables, satellite image dataset

arXiv:2201.00467 [pdf, other]

maskGRU: Tracking Small Objects in the Presence of Large Background Motions

Authors: Constantine J. Roros, Avinash C. Kak

Abstract: We propose a recurrent neural network-based spatio-temporal framework named maskGRU for the detection and tracking of small objects in videos. While there have been many developments in the area of object tracking in recent years, tracking a small moving object amid other moving objects and actors (such as a ball amid moving players in sports footage) continues to be a difficult task. Existing spa… ▽ More We propose a recurrent neural network-based spatio-temporal framework named maskGRU for the detection and tracking of small objects in videos. While there have been many developments in the area of object tracking in recent years, tracking a small moving object amid other moving objects and actors (such as a ball amid moving players in sports footage) continues to be a difficult task. Existing spatio-temporal networks, such as convolutional Gated Recurrent Units (convGRUs), are difficult to train and have trouble accurately tracking small objects under such conditions. To overcome these difficulties, we developed the maskGRU framework that uses a weighted sum of the internal hidden state produced by a convGRU and a 3-channel mask of the tracked object's predicted bounding box as the hidden state to be used at the next time step of the underlying convGRU. We believe the technique of incorporating a mask into the hidden state through a weighted sum has two benefits: controlling the effect of exploding gradients and introducing an attention-like mechanism into the network by indicating where in the previous video frame the object is located. Our experiments show that maskGRU outperforms convGRU at tracking objects that are small relative to the video resolution even in the presence of other moving objects. △ Less

Submitted 2 January, 2022; originally announced January 2022.

Comments: 12 pages, 3 figures

arXiv:2112.05335 [pdf, other]

Uncertainty, Edge, and Reverse-Attention Guided Generative Adversarial Network for Automatic Building Detection in Remotely Sensed Images

Authors: Somrita Chattopadhyay, Avinash C. Kak

Abstract: Despite recent advances in deep-learning based semantic segmentation, automatic building detection from remotely sensed imagery is still a challenging problem owing to large variability in the appearance of buildings across the globe. The errors occur mostly around the boundaries of the building footprints, in shadow areas, and when detecting buildings whose exterior surfaces have reflectivity pro… ▽ More Despite recent advances in deep-learning based semantic segmentation, automatic building detection from remotely sensed imagery is still a challenging problem owing to large variability in the appearance of buildings across the globe. The errors occur mostly around the boundaries of the building footprints, in shadow areas, and when detecting buildings whose exterior surfaces have reflectivity properties that are very similar to those of the surrounding regions. To overcome these problems, we propose a generative adversarial network based segmentation framework with uncertainty attention unit and refinement module embedded in the generator. The refinement module, composed of edge and reverse attention units, is designed to refine the predicted building map. The edge attention enhances the boundary features to estimate building boundaries with greater precision, and the reverse attention allows the network to explore the features missing in the previously estimated regions. The uncertainty attention unit assists the network in resolving uncertainties in classification. As a measure of the power of our approach, as of December 4, 2021, it ranks at the second place on DeepGlobe's public leaderboard despite the fact that main focus of our approach -- refinement of the building edges -- does not align exactly with the metrics used for leaderboard rankings. Our overall F1-score on DeepGlobe's challenging dataset is 0.745. We also report improvements on the previous-best results for the challenging INRIA Validation Dataset for which our network achieves an overall IoU of 81.28% and an overall accuracy of 97.03%. Along the same lines, for the official INRIA Test Dataset, our network scores 77.86% and 96.41% in overall IoU and accuracy. △ Less

Submitted 9 December, 2021; originally announced December 2021.

Comments: 23 pages

arXiv:2102.10513 [pdf, other]

CheckSoft : A Scalable Event-Driven Software Architecture for Kee** Track of People and Things in People-Centric Spaces

Authors: Rohan Sarkar, Avinash C. Kak

Abstract: We present CheckSoft, a scalable event-driven software architecture for kee** track of people-object interactions in people-centric applications such as airport checkpoint security areas, automated retail stores, smart libraries, and so on. The architecture works off the video data generated in real time by a network of surveillance cameras. Although there are many different aspects to automatin… ▽ More We present CheckSoft, a scalable event-driven software architecture for kee** track of people-object interactions in people-centric applications such as airport checkpoint security areas, automated retail stores, smart libraries, and so on. The architecture works off the video data generated in real time by a network of surveillance cameras. Although there are many different aspects to automating these applications, the most difficult part of the overall problem is kee** track of the interactions between the people and the objects. CheckSoft uses finite-state-machine (FSM) based logic for kee** track of such interactions which allows the system to quickly reject any false detections of the interactions by the video cameras. CheckSoft is easily scalable since the architecture is based on multi-processing in which a separate process is assigned to each human and to each "storage container" for the objects. A storage container may be a shelf on which the objects are displayed or a bin in which the objects are stored, depending on the specific application in which CheckSoft is deployed. △ Less

Submitted 21 February, 2021; originally announced February 2021.

Comments: 33 pages, 25 figures, 6 Tables

arXiv:2010.01041 [pdf, other]

Homography Estimation with Convolutional Neural Networks Under Conditions of Variance

Authors: David Niblick, Avinash Kak

Abstract: Planar homography estimation is foundational to many computer vision problems, such as Simultaneous Localization and Map** (SLAM) and Augmented Reality (AR). However, conditions of high variance confound even the state-of-the-art algorithms. In this report, we analyze the performance of two recently published methods using Convolutional Neural Networks (CNNs) that are meant to replace the more t… ▽ More Planar homography estimation is foundational to many computer vision problems, such as Simultaneous Localization and Map** (SLAM) and Augmented Reality (AR). However, conditions of high variance confound even the state-of-the-art algorithms. In this report, we analyze the performance of two recently published methods using Convolutional Neural Networks (CNNs) that are meant to replace the more traditional feature-matching based approaches to the estimation of homography. Our evaluation of the CNN based methods focuses particularly on measuring the performance under conditions of significant noise, illumination shift, and occlusion. We also measure the benefits of training CNNs to varying degrees of noise. Additionally, we compare the effect of using color images instead of grayscale images for inputs to CNNs. Finally, we compare the results against baseline feature-matching based homography estimation methods using SIFT, SURF, and ORB. We find that CNNs can be trained to be more robust against noise, but at a small cost to accuracy in the noiseless case. Additionally, CNNs perform significantly better in conditions of extreme variance than their feature-matching based counterparts. With regard to color inputs, we conclude that with no change in the CNN architecture to take advantage of the additional information in the color planes, the difference in performance using color inputs or grayscale inputs is negligible. About the CNNs trained with noise-corrupted inputs, we show that training a CNN to a specific magnitude of noise leads to a "Goldilocks Zone" with regard to the noise levels where that CNN performs best. △ Less

Submitted 22 October, 2020; v1 submitted 2 October, 2020; originally announced October 2020.

Comments: 9 pages, 16 figures, submitted to 2020 Computer Vision and Pattern Recognition Conference

arXiv:2008.10271 [pdf, other]

doi 10.1109/JSTARS.2021.3066944

Semantic Labeling of Large-Area Geographic Regions Using Multi-View and Multi-Date Satellite Images and Noisy OSM Training Labels

Authors: Bharath Comandur, Avinash C. Kak

Abstract: We present a novel multi-view training framework and CNN architecture for combining information from multiple overlap** satellite images and noisy training labels derived from OpenStreetMap (OSM) to semantically label buildings and roads across large geographic regions (100 km$^2$). Our approach to multi-view semantic segmentation yields a 4-7% improvement in the per-class IoU scores compared to… ▽ More We present a novel multi-view training framework and CNN architecture for combining information from multiple overlap** satellite images and noisy training labels derived from OpenStreetMap (OSM) to semantically label buildings and roads across large geographic regions (100 km$^2$). Our approach to multi-view semantic segmentation yields a 4-7% improvement in the per-class IoU scores compared to the traditional approaches that use the views independently of one another. A unique (and, perhaps, surprising) property of our system is that modifications that are added to the tail-end of the CNN for learning from the multi-view data can be discarded at the time of inference with a relatively small penalty in the overall performance. This implies that the benefits of training using multiple views are absorbed by all the layers of the network. Additionally, our approach only adds a small overhead in terms of the GPU-memory consumption even when training with as many as 32 views per scene. The system we present is end-to-end automated, which facilitates comparing the classifiers trained directly on true orthophotos vis-a-vis first training them on the off-nadir images and subsequently translating the predicted labels to geographical coordinates. With no human supervision, our IoU scores for the buildings and roads classes are 0.8 and 0.64 respectively which are better than state-of-the-art approaches that use OSM labels and that are not completely automated. △ Less

Submitted 26 June, 2021; v1 submitted 24 August, 2020; originally announced August 2020.

Comments: This work has been accepted by the IEEE for publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:1911.09800 [pdf, other]

A Comparative Evaluation of SGM Variants (including a New Variant, tMGM) for Dense Stereo Matching

Authors: Sonali Patil, Tanmay Prakash, Bharath Comandur, Avinash Kak

Abstract: Our goal here is threefold: [1] To present a new dense-stereo matching algorithm, tMGM, that by combining the hierarchical logic of tSGM with the support structure of MGM achieves 6-8\% performance improvement over the baseline SGM (these performance numbers are posted under tMGM-16 in the Middlebury Benchmark V3 ); and [2] Through an exhaustive quantitative and qualitative comparative study, to c… ▽ More Our goal here is threefold: [1] To present a new dense-stereo matching algorithm, tMGM, that by combining the hierarchical logic of tSGM with the support structure of MGM achieves 6-8\% performance improvement over the baseline SGM (these performance numbers are posted under tMGM-16 in the Middlebury Benchmark V3 ); and [2] Through an exhaustive quantitative and qualitative comparative study, to compare how the major variants of the SGM approach to dense stereo matching, including the new tMGM, perform in the presence of: (a) illumination variations and shadows, (b) untextured or weakly textured regions, (c) repetitive patterns in the scene in the presence of large stereo rectification errors. [3] To present a novel DEM-Sculpting approach for estimating initial disparity search bounds for multi-date satellite stereo pairs. Based on our study, we have found that tMGM generally performs best with respect to all these data conditions. Both tSGM and MGM improve the density of stereo disparity maps and combining the two in tMGM makes it possible to accurately estimate the disparities at a significant number of pixels that would otherwise be declared invalid by SGM. The datasets we have used in our comparative evaluation include the Middlebury2014, KITTI2015, and ETH3D datasets and the satellite images over the San Fernando area from the MVS Challenge dataset. △ Less

Submitted 21 November, 2019; originally announced November 2019.

arXiv:1907.04404 [pdf, other]

A New Stereo Benchmarking Dataset for Satellite Images

Authors: Sonali Patil, Bharath Comandur, Tanmay Prakash, Avinash C. Kak

Abstract: In order to facilitate further research in stereo reconstruction with multi-date satellite images, the goal of this paper is to provide a set of stereo-rectified images and the associated groundtruthed disparities for 10 AOIs (Area of Interest) drawn from two sources: 8 AOIs from IARPA's MVS Challenge dataset and 2 AOIs from the CORE3D-Public dataset. The disparities were groundtruthed by first co… ▽ More In order to facilitate further research in stereo reconstruction with multi-date satellite images, the goal of this paper is to provide a set of stereo-rectified images and the associated groundtruthed disparities for 10 AOIs (Area of Interest) drawn from two sources: 8 AOIs from IARPA's MVS Challenge dataset and 2 AOIs from the CORE3D-Public dataset. The disparities were groundtruthed by first constructing a fused DSM from the stereo pairs and by aligning 30 cm LiDAR with the fused DSM. Unlike the existing benckmarking datasets, we have also carried out a quantitative evaluation of our groundtruthed disparities using human annotated points in two of the AOIs. Additionally, the rectification accuracy in our dataset is comparable to the same in the existing state-of-the-art stereo datasets. In general, we have used the WorldView-3 (WV3) images for the dataset, the exception being the UCSD area for which we have used both WV3 and WorldView-2 (WV2) images. All of the dataset images are now in the public domain. Since multi-date satellite images frequently include images acquired in different seasons (which creates challenges in finding corresponding pairs of pixels for stereo), our dataset also includes for each image a building mask over which the disparities estimated by stereo should prove reliable. Additional metadata included in the dataset includes information about each image's acquisition date and time, the azimuth and elevation angles of the camera, and the intersection angles for the two views in a stereo pair. Also included in the dataset are both quantitative and qualitative analyses of the accuracy of the groundtruthed disparity maps. Our dataset is available for download at \url{https://engineering.purdue.edu/RVL/Database/SatStereo/index.html} △ Less

Submitted 9 July, 2019; originally announced July 2019.

arXiv:1905.00934 [pdf, other]

A Splitting-Based Iterative Algorithm for GPU-Accelerated Statistical Dual-Energy X-Ray CT Reconstruction

Authors: Fangda Li, Ankit Manerikar, Tanmay Prakash, Avinash Kak

Abstract: When dealing with material classification in baggage at airports, Dual-Energy Computed Tomography (DECT) allows characterization of any given material with coefficients based on two attenuative effects: Compton scattering and photoelectric absorption. However, straightforward projection-domain decomposition methods for this characterization often yield poor reconstructions due to the high dynamic… ▽ More When dealing with material classification in baggage at airports, Dual-Energy Computed Tomography (DECT) allows characterization of any given material with coefficients based on two attenuative effects: Compton scattering and photoelectric absorption. However, straightforward projection-domain decomposition methods for this characterization often yield poor reconstructions due to the high dynamic range of material properties encountered in an actual luggage scan. Hence, for better reconstruction quality under a timing constraint, we propose a splitting-based, GPU-accelerated, statistical DECT reconstruction algorithm. Compared to prior art, our main contribution lies in the significant acceleration made possible by separating reconstruction and decomposition within an ADMM framework. Experimental results, on both synthetic and real-world baggage phantoms, demonstrate a significant reduction in time required for convergence. △ Less

Submitted 2 May, 2019; originally announced May 2019.

arXiv:1811.04772 [pdf, other]

doi 10.1117/12.2557638

Adaptive Target Recognition: A Case Study Involving Airport Baggage Screening

Authors: Ankit Manerikar, Tanmay Prakash, Avinash C. Kak

Abstract: This work addresses the question whether it is possible to design a computer-vision based automatic threat recognition (ATR) system so that it can adapt to changing specifications of a threat without having to create a new ATR each time. The changes in threat specifications, which may be warranted by intelligence reports and world events, are typically regarding the physical characteristics of wha… ▽ More This work addresses the question whether it is possible to design a computer-vision based automatic threat recognition (ATR) system so that it can adapt to changing specifications of a threat without having to create a new ATR each time. The changes in threat specifications, which may be warranted by intelligence reports and world events, are typically regarding the physical characteristics of what constitutes a threat: its material composition, its shape, its method of concealment, etc. Here we present our design of an AATR system (Adaptive ATR) that can adapt to changing specifications in materials characterization (meaning density, as measured by its x-ray attenuation coefficient), its mass, and its thickness. Our design uses a two-stage cascaded approach, in which the first stage is characterized by a high recall rate over the entire range of possibilities for the threat parameters that are allowed to change. The purpose of the second stage is to then fine-tune the performance of the overall system for the current threat specifications. The computational effort for this fine-tuning for achieving a desired PD/PFA rate is far less than what it would take to create a new classifier with the same overall performance for the new set of threat specifications. △ Less

Submitted 30 November, 2018; v1 submitted 12 November, 2018; originally announced November 2018.

arXiv:1709.00488 [pdf, other]

RMPD - A Recursive Mid-Point Displacement Algorithm for Path Planning

Authors: Fangda Li, Ankit V. Manerikar, Avinash C. Kak

Abstract: Motivated by what is required for real-time path planning, the paper starts out by presenting sRMPD, a new recursive "local" planner founded on the key notion that, unless made necessary by an obstacle, there must be no deviation from the shortest path between any two points, which would normally be a straight line path in the configuration space. Subsequently, we increase the power of sRMPD by us… ▽ More Motivated by what is required for real-time path planning, the paper starts out by presenting sRMPD, a new recursive "local" planner founded on the key notion that, unless made necessary by an obstacle, there must be no deviation from the shortest path between any two points, which would normally be a straight line path in the configuration space. Subsequently, we increase the power of sRMPD by using it as a "connect" subroutine call in a higher-level sampling-based algorithm mRMPD that is inspired by multi-RRT. As a consequence, mRMPD spawns a larger number of space exploring trees in regions of the configuration space that are characterized by a higher density of obstacles. The overall effect is a hybrid tree growing strategy with a trade-off between random exploration as made possible by multi-RRT based logic and immediate exploitation of opportunities to connect two states as made possible by sRMPD. The mRMPD planner can be biased with regard to this trade-off for solving different kinds of planning problems efficiently. Based on the test cases we have run, our experiments show that mRMPD can reduce planning time by up to 80% compared to basic RRT. △ Less

Submitted 25 February, 2018; v1 submitted 1 September, 2017; originally announced September 2017.

arXiv:1304.1513 [pdf]

Hierarchical Evidence Accumulation in the Pseiki System and Experiments in Model-Driven Mobile Robot Navigation

Authors: A. C. Kak, K. M. Andress, C. Lopez-Abadia, M. S. Carroll, J. R. Lewis

Abstract: In this paper, we will review the process of evidence accumulation in the PSEIKI system for expectation-driven interpretation of images of 3-D scenes. Expectations are presented to PSEIKI as a geometrical hierarchy of abstractions. PSEIKI's job is then to construct abstraction hierarchies in the perceived image taking cues from the abstraction hierarchies in the expectations. The Dempster-Shafe… ▽ More In this paper, we will review the process of evidence accumulation in the PSEIKI system for expectation-driven interpretation of images of 3-D scenes. Expectations are presented to PSEIKI as a geometrical hierarchy of abstractions. PSEIKI's job is then to construct abstraction hierarchies in the perceived image taking cues from the abstraction hierarchies in the expectations. The Dempster-Shafer formalism is used for associating belief values with the different possible labels for the constructed abstractions in the perceived image. This system has been used successfully for autonomous navigation of a mobile robot in indoor environments. △ Less

Submitted 27 March, 2013; originally announced April 2013.

Comments: Appears in Proceedings of the Fifth Conference on Uncertainty in Artificial Intelligence (UAI1989)

Report number: UAI-P-1989-PG-194-207

Showing 1–19 of 19 results for author: Kak, A