-
Dynamically Modulating Visual Place Recognition Sequence Length For Minimum Acceptable Performance Scenarios
Authors:
Connor Malone,
Ankit Vora,
Thierry Peynot,
Michael Milford
Abstract:
Mobile robots and autonomous vehicles are often required to function in environments where critical position estimates from sensors such as GPS become uncertain or unreliable. Single image visual place recognition (VPR) provides an alternative for localization but often requires techniques such as sequence matching to improve robustness, which incurs additional computation and latency costs. Even then, the sequence length required to localize at an acceptable performance level varies widely; and simply setting overly long fixed sequence lengths creates unnecessary latency, computational overhead, and can even degrade performance. In these scenarios it is often more desirable to meet or exceed a set target performance at minimal expense. In this paper we present an approach which uses a calibration set of data to fit a model that modulates sequence length for VPR as needed to exceed a target localization performance. We make use of a coarse position prior, which could be provided by any other localization system, and capture the variation in appearance across this region. We use the correlation between appearance variation and sequence length to curate VPR features and fit a multilayer perceptron (MLP) for selecting the optimal length. We demonstrate that this method is effective at modulating sequence length to maximize the number of sections in a dataset which meet or exceed a target performance whilst minimizing the median length used. We show applicability across several datasets and reveal key phenomena like generalization capabilities, the benefits of curating features and the utility of non-state-of-the-art feature extractors with nuanced properties.
Submitted 30 June, 2024;
originally announced July 2024.
-
Hybrid Approach to Parallel Stochastic Gradient Descent
Authors:
Aakash Sudhirbhai Vora,
Dhrumil Chetankumar Joshi,
Aksh Kantibhai Patel
Abstract:
Stochastic Gradient Descent is used for large datasets to train models to reduce the training time. On top of that, data parallelism is widely used as a method to efficiently train neural networks using multiple worker nodes in parallel. Synchronous and asynchronous approaches to data parallelism are used by most systems to train the model in parallel. However, both of them have their drawbacks. We propose a third approach to data parallelism which is a hybrid between synchronous and asynchronous approaches, using both approaches to train the neural network. When the threshold function is selected appropriately to gradually shift all parameter aggregation from asynchronous to synchronous, we show that in a given time period our hybrid approach outperforms both asynchronous and synchronous approaches.
Submitted 27 June, 2024;
originally announced July 2024.
-
Accurate Patient Alignment without Unnecessary Imaging Dose via Synthesizing Patient-specific 3D CT Images from 2D kV Images
Authors:
Yuzhen Ding,
Jason M. Holmes,
Hongying Feng,
Baoxin Li,
Lisa A. McGee,
Jean-Claude M. Rwigema,
Sujay A. Vora,
Daniel J. Ma,
Robert L. Foote,
Samir H. Patel,
Wei Liu
Abstract:
In radiotherapy, 2D orthogonally projected kV images are used for patient alignment when 3D on-board imaging (OBI) is unavailable. But tumor visibility is constrained due to the projection of the patient's anatomy onto a 2D plane, potentially leading to substantial setup errors. In treatment rooms with 3D-OBI such as cone beam CT (CBCT), the field of view (FOV) of CBCT is limited with unnecessarily high imaging dose, thus unfavorable for pediatric patients. A solution to this dilemma is to reconstruct 3D CT from kV images obtained at the treatment position. Here, we propose a dual-model framework built with hierarchical ViT blocks. Unlike a proof-of-concept approach, our framework considers kV images as the sole input and can synthesize accurate, full-size 3D CT in real time (within milliseconds). We demonstrate the feasibility of the proposed approach on 10 patients with head and neck (H&N) cancer using image quality (MAE: <45 HU), dosimetric accuracy (Gamma passing rate (2%/2mm/10%) >97%) and patient position uncertainty (shift error: <0.4 mm). The proposed framework can generate accurate 3D CT faithfully mirroring real-time patient position, thus significantly improving patient setup accuracy, keeping imaging dose to a minimum, and maintaining treatment veracity.
Submitted 1 April, 2024;
originally announced May 2024.
-
Robust Optimization for Spot Scanning Proton Therapy based on Dose-Linear Energy Transfer (LET) Volume Constraints
Authors:
**gyuan Chen,
Yunze Yang,
Hongying Feng,
Lian Zhang,
Carlos E. Vargas,
Nathan Y. Yu,
Jean-Claude M. Rwigema,
Sameer R. Keole,
Sujay A. Vora,
Jiajian Shen,
Wei Liu
Abstract:
Purpose: Historically, spot scanning proton therapy (SSPT) treatment planning utilizes dose volume constraints and linear-energy-transfer (LET) volume constraints separately to balance tumor control and organs-at-risk (OARs) protection. We propose a novel dose-LET volume constraint (DLVC)-based robust optimization (DLVCRO) method for SSPT in treating prostate cancer to obtain a desirable joint dose and LET distribution to minimize adverse events (AEs).
Methods: DLVCRO treats DLVC as soft constraints controlling the joint distribution of dose and LET. Ten prostate cancer patients were included with rectum and bladder as OARs. DLVCRO was compared with the conventional robust optimization (RO) method using the worst-case analysis method. Besides the dose-volume histogram (DVH) indices, the analogous LETVH and extra-biological-dose (xBD)-volume histogram indices were also used. The Wilcoxon signed rank test was used to measure statistical significance.
Results: In the nominal scenario, DLVCRO significantly improved dose, LET and xBD distributions to protect OARs (rectum: V70Gy: 3.07% vs. 2.90%, p = .0063, RO vs. DLVCRO; $\text{LET}_{\max}$ (keV/um): 11.53 vs. 9.44, p = .0101; $\text{xBD}_{\max}$ (Gy$\cdot$keV/um): 420.55 vs. 398.79, p = .0086; bladder: V65Gy: 4.82% vs. 4.61%, p = .0032; $\text{LET}_{\max}$ 8.97 vs. 7.51, p = .0047; $\text{xBD}_{\max}$ 490.11 vs. 476.71, p = .0641). The physical dose distributions in targets are comparable (D2%: 98.57% vs. 98.39%; p = .0805; CTV D2% - D98%: 7.10% vs. 7.75%, p = .4624). In the worst-case scenario, DLVCRO robustly enhanced OAR protection while maintaining similar plan robustness in target dose coverage and homogeneity.
Conclusion: DLVCRO upgrades 2D DVH-based to 3D DLVH-based treatment planning to adjust dose/LET distributions simultaneously and robustly. DLVCRO is potentially a powerful tool to improve patient outcomes in SSPT.
Submitted 6 May, 2024;
originally announced May 2024.
-
Increasing SLAM Pose Accuracy by Ground-to-Satellite Image Registration
Authors:
Yanhao Zhang,
Yujiao Shi,
Shan Wang,
Ankit Vora,
Akhil Perincherry,
Yongbo Chen,
Hongdong Li
Abstract:
Vision-based localization for autonomous driving has been of great interest among researchers. When a pre-built 3D map is not available, the techniques of visual simultaneous localization and mapping (SLAM) are typically adopted. Due to error accumulation, visual SLAM (vSLAM) usually suffers from long-term drift. This paper proposes a framework to increase the localization accuracy by fusing the vSLAM with a deep-learning-based ground-to-satellite (G2S) image registration method. In this framework, a coarse (spatial correlation bound check) to fine (visual odometry consistency check) method is designed to select the valid G2S prediction. The selected prediction is then fused with the SLAM measurement by solving a scaled pose graph problem. To further increase the localization accuracy, we provide an iterative trajectory fusion pipeline. The proposed framework is evaluated on two well-known autonomous driving datasets, and the results demonstrate the accuracy and robustness in terms of vehicle localization.
Submitted 14 April, 2024;
originally announced April 2024.
-
A Detection and Filtering Framework for Collaborative Localization
Authors:
Thirumalaesh Ashokkumar,
Katherine A Skinner,
Siddarth Agarwal,
Ankit Vora,
Ashutosh Bhown
Abstract:
Increasingly, autonomous vehicles (AVs) are becoming a reality, such as the Advanced Driver Assistance Systems (ADAS) in vehicles that assist drivers in driving and parking functions with vehicles today. The localization problem for AVs relies primarily on multiple sensors, including cameras, LiDARs, and radars. Manufacturing, installing, calibrating, and maintaining these sensors can be very expensive, thereby increasing the overall cost of AVs. This research explores the means to improve localization on vehicles belonging to the ADAS category in a platooning context, where an ADAS vehicle follows a lead "Smart" AV equipped with a highly accurate sensor suite. We propose and produce results by using a filtering framework to combine pose information derived from vision and odometry to improve the localization of the ADAS vehicle that follows the smart vehicle.
Submitted 8 March, 2024;
originally announced March 2024.
-
Noisy probing dose facilitated dose prediction for pencil beam scanning proton therapy: physics enhances generalizability
Authors:
Lian Zhang,
Jason M. Holmes,
Zhengliang Liu,
Hongying Feng,
Terence T. Sio,
Carlos E. Vargas,
Sameer R. Keole,
Kristin Stützer,
Sheng Li,
Tianming Liu,
Jiajian Shen,
William W. Wong,
Sujay A. Vora,
Wei Liu
Abstract:
Purpose: Prior AI-based dose prediction studies in photon and proton therapy often neglect underlying physics, limiting their generalizability to handle outlier clinical cases, especially for pencil beam scanning proton therapy (PBSPT). Our aim is to design a physics-aware and generalizable AI-based PBSPT dose prediction method that has the underlying physics considered to achieve high generalizability to properly handle the outlier clinical cases. Methods and Materials: This study analyzed PBSPT plans of 103 prostate and 78 lung cancer patients from our institution, with each case comprising CT images, structure sets, and plan doses from our Monte-Carlo dose engine (serving as the ground truth). Three methods were evaluated in the ablation study: the ROI-based method, the beam mask and sliding window method, and the noisy probing dose method. Twelve cases with uncommon beam angles or prescription doses tested the methods' generalizability to rare treatment planning scenarios. Performance evaluation used DVH indices, 3D Gamma passing rates (3%/2mm/10%), and dice coefficients for dose agreement. Results: The noisy probing dose method showed improved agreement of DVH indices, 3D Gamma passing rates, and dice coefficients compared to the conventional methods for the testing cases. The noisy probing dose method showed better generalizability in the 6 outlier cases than the ROI-based and beam mask-based methods with 3D Gamma passing rates (for prostate cancer, targets: 89.32%$\pm$1.45% vs. 93.48%$\pm$1.51% vs. 96.79%$\pm$0.83%, OARs: 85.87%$\pm$1.73% vs. 91.15%$\pm$1.13% vs. 94.29%$\pm$1.01%). The dose predictions were completed within 0.3 seconds. Conclusions: We've devised a novel noisy probing dose method for PBSPT dose prediction in prostate and lung cancer patients. With more physics included, it enhances the generalizability of dose prediction in handling outlier clinical cases.
Submitted 1 December, 2023;
originally announced December 2023.
-
Artificial Intelligence-Facilitated Online Adaptive Proton Therapy Using Pencil Beam Scanning Proton Therapy
Authors:
Hongying Feng,
Jie Shan,
Carlos E. Vargas,
Sameer R. Keole,
Jean-Claude M. Rwigema,
Nathan Y. Yu,
Yuzhen Ding,
Lian Zhang,
Steven E. Schild,
William W. Wong,
Sujay A. Vora,
JiaJian Shen,
Wei Liu
Abstract:
We propose an oAPT workflow that incorporates all these functionalities and validate its clinical implementation feasibility with prostate patients. AI-based auto-segmentation tool AccuContour™ (Manteia, Xiamen, China) was seamlessly integrated into oAPT. Initial spot arrangement tool on the vCT for re-optimization was implemented using raytracing. An LET-based biological effect evaluation tool was developed to assess the overlap region of high dose and high LET in selected OARs. Eleven prostate cancer patients were retrospectively selected to verify the efficacy and efficiency of the proposed oAPT workflow. The time cost of each component in the workflow was recorded for analysis. The verification plan showed significant degradation of the CTV coverage and rectum and bladder sparing due to the interfractional anatomical changes. Re-optimization on the vCT resulted in great improvement of the plan quality. No overlap regions of high dose and high LET distributions were observed in bladder or rectum in re-plans. 3D Gamma analyses in PSQA confirmed the accuracy of the re-plan doses before delivery (Gamma passing rate = 99.57%), and after delivery (98.59%). The robustness of the re-plans passed all clinical requirements. The average time for the complete execution of the workflow was 9.12 minutes, excluding manual intervention time. The AI-facilitated oAPT workflow was demonstrated to be both efficient and effective by generating a re-plan that significantly improved the plan quality in prostate cancer treated with PBSPT.
Submitted 1 November, 2023;
originally announced November 2023.
-
Benchmarking a foundation LLM on its ability to re-label structure names in accordance with the AAPM TG-263 report
Authors:
Jason Holmes,
Lian Zhang,
Yuzhen Ding,
Hongying Feng,
Zhengliang Liu,
Tianming Liu,
William W. Wong,
Sujay A. Vora,
Jonathan B. Ashman,
Wei Liu
Abstract:
Purpose: To introduce the concept of using large language models (LLMs) to re-label structure names in accordance with the American Association of Physicists in Medicine (AAPM) Task Group (TG)-263 standard, and to establish a benchmark for future studies to reference.
Methods and Materials: The Generative Pre-trained Transformer (GPT)-4 application programming interface (API) was implemented as a Digital Imaging and Communications in Medicine (DICOM) storage server, which upon receiving a structure set DICOM file, prompts GPT-4 to re-label the structure names of both target volumes and normal tissues according to the AAPM TG-263. Three disease sites, prostate, head and neck, and thorax were selected for evaluation. For each disease site category, 150 patients were randomly selected for manually tuning the instructions prompt (in batches of 50) and 50 patients were randomly selected for evaluation. Structure names that were considered were those that were most likely to be relevant for studies utilizing structure contours for many patients.
Results: The overall re-labeling accuracy of both target volumes and normal tissues for prostate, head and neck, and thorax cases was 96.0%, 98.5%, and 96.9% respectively. Re-labeling of target volumes was less accurate on average except for prostate - 100%, 93.1%, and 91.1% respectively.
Conclusions: Given the accuracy of GPT-4 in re-labeling structure names of both target volumes and normal tissues as presented in this work, LLMs are poised to be the preferred method for standardizing structure names in radiation oncology, especially considering the rapid advancements in LLM capabilities that are likely to continue.
Submitted 5 October, 2023;
originally announced October 2023.
-
View Consistent Purification for Accurate Cross-View Localization
Authors:
Shan Wang,
Yanhao Zhang,
Akhil Perincherry,
Ankit Vora,
Hongdong Li
Abstract:
This paper proposes a fine-grained self-localization method for outdoor robotics that utilizes a flexible number of onboard cameras and readily accessible satellite images. The proposed method addresses limitations in existing cross-view localization methods that struggle to handle noise sources such as moving objects and seasonal variations. It is the first sparse visual-only method that enhances perception in dynamic environments by detecting view-consistent key points and their corresponding deep features from ground and satellite views, while removing off-the-ground objects and establishing homography transformation between the two views. Moreover, the proposed method incorporates a spatial embedding approach that leverages camera intrinsic and extrinsic information to reduce the ambiguity of purely visual matching, leading to improved feature matching and overall pose estimation accuracy. The method exhibits strong generalization and is robust to environmental changes, requiring only geo-poses as ground truth. Extensive experiments on the KITTI and Ford Multi-AV Seasonal datasets demonstrate that our proposed method outperforms existing state-of-the-art methods, achieving median spatial accuracy errors below $0.5$ meters along the lateral and longitudinal directions, and a median orientation accuracy error below 2 degrees.
Submitted 15 August, 2023;
originally announced August 2023.
-
Stereo Visual Odometry with Deep Learning-Based Point and Line Feature Matching using an Attention Graph Neural Network
Authors:
Shenbagaraj Kannapiran,
Nalin Bendapudi,
Ming-Yuan Yu,
Devarth Parikh,
Spring Berman,
Ankit Vora,
Gaurav Pandey
Abstract:
Robust feature matching forms the backbone for most Visual Simultaneous Localization and Mapping (vSLAM), visual odometry, 3D reconstruction, and Structure from Motion (SfM) algorithms. However, recovering feature matches from texture-poor scenes is a major challenge and still remains an open area of research. In this paper, we present a Stereo Visual Odometry (StereoVO) technique based on point and line features which uses a novel feature-matching mechanism based on an Attention Graph Neural Network that is designed to perform well even under adverse weather conditions such as fog, haze, rain, and snow, and dynamic lighting conditions such as nighttime illumination and glare scenarios. We perform experiments on multiple real and synthetic datasets to validate the ability of our method to perform StereoVO under low visibility weather and lighting conditions through robust point and line matches. The results demonstrate that our method achieves more line feature matches than state-of-the-art line matching algorithms, which when complemented with point feature matches perform consistently well in adverse weather and dynamic lighting conditions.
Submitted 2 August, 2023;
originally announced August 2023.
-
Boosting 3-DoF Ground-to-Satellite Camera Localization Accuracy via Geometry-Guided Cross-View Transformer
Authors:
Yujiao Shi,
Fei Wu,
Akhil Perincherry,
Ankit Vora,
Hongdong Li
Abstract:
Image retrieval-based cross-view localization methods often lead to very coarse camera pose estimation, due to the limited sampling density of the database satellite images. In this paper, we propose a method to increase the accuracy of a ground camera's location and orientation by estimating the relative rotation and translation between the ground-level image and its matched/retrieved satellite image. Our approach designs a geometry-guided cross-view transformer that combines the benefits of conventional geometry and learnable cross-view transformers to map the ground-view observations to an overhead view. Given the synthesized overhead view and observed satellite feature maps, we construct a neural pose optimizer with strong global information embedding ability to estimate the relative rotation between them. After aligning their rotations, we develop an uncertainty-guided spatial correlation to generate a probability map of the vehicle locations, from which the relative translation can be determined. Experimental results demonstrate that our method significantly outperforms the state-of-the-art. Notably, the likelihood of restricting the vehicle lateral pose to be within 1m of its Ground Truth (GT) value on the cross-view KITTI dataset has been improved from $35.54\%$ to $76.44\%$, and the likelihood of restricting the vehicle orientation to be within $1^{\circ}$ of its GT value has been improved from $19.64\%$ to $99.10\%$.
Submitted 19 July, 2023; v1 submitted 16 July, 2023;
originally announced July 2023.
-
Achievable Rates for Information Extraction from a Strategic Sender
Authors:
Anuj S. Vora,
Ankur A. Kulkarni
Abstract:
We consider a setting of non-cooperative communication where a receiver wants to recover randomly generated sequences of symbols that are observed by a strategic sender. The sender aims to maximize an average utility that may not align with the recovery criterion of the receiver, whereby the received signals may not be truthful. We pose this problem as a sequential game between the sender and the receiver with the receiver as the leader and determine 'achievable strategies' for the receiver that attain arbitrarily small probability of error for large blocklengths. We show the existence of such achievable strategies under a sufficient condition on the utility of the sender. For the case of the binary alphabet, this condition is also necessary, in the absence of which, the probability of error goes to one for all choices of strategies of the receiver. We show that for reliable recovery, the receiver chooses to correctly decode only a subset of messages received from the sender and deliberately makes an error on messages outside this subset. Due to this decoding strategy, despite a clean channel, our setting exhibits a notion of maximum rate of communication above which the probability of error may not vanish asymptotically and in certain cases, may even tend to one. For the case of the binary alphabet, the maximum rate may be strictly less than unity for certain classes of utilities.
Submitted 5 July, 2023;
originally announced July 2023.
-
Modelling small block aperture in an in-house developed GPU-accelerated Monte Carlo-based dose engine for pencil beam scanning proton therapy
Authors:
Hongying Feng,
Jason M. Holmes,
Sujay A. Vora,
Joshua B. Stoker,
Martin Bues,
William W. Wong,
Terence S. Sio,
Robert L. Foote,
Samir H. Patel,
Jiajian Shen,
Wei Liu
Abstract:
Purpose: To enhance an in-house graphic-processing-unit (GPU) accelerated virtual particle (VP)-based Monte Carlo (MC) proton dose engine (VPMC) to model aperture blocks in both dose calculation and optimization for pencil beam scanning proton therapy (PBSPT)-based stereotactic radiosurgery (SRS). Methods and Materials: A block aperture module was integrated into VPMC. VPMC was validated by an open-source code, MCsquare, in eight water phantom simulations with 3 cm thick brass apertures: four were with aperture openings of 1, 2, 3, and 4 cm without a range shifter, while the other four were with the same aperture opening configurations with a range shifter of 45 mm water equivalent thickness. VPMC was benchmarked with MCsquare and RayStation MC for 10 patients with small targets (average volume 8.4 cc). Finally, 3 patients were selected for robust optimization with aperture blocks using VPMC. Results: In the water phantoms, the 3D gamma passing rate (2%/2mm/10%) between VPMC and MCsquare was 99.71$\pm$0.23%. In the patient geometries, 3D gamma passing rates (3%/2mm/10%) between VPMC/MCsquare and RayStation MC were 97.79$\pm$2.21%/97.78$\pm$1.97%, respectively. The calculation time was greatly decreased from 112.45$\pm$114.08 seconds (MCsquare) to 8.20$\pm$6.42 seconds (VPMC), both having statistical uncertainties of about 0.5%. The robustly optimized plans met all the dose-volume-constraints (DVCs) for the targets and OARs per our institutional protocols. The mean calculation time for 13 influence matrices in robust optimization by VPMC was 41.6 seconds. Conclusion: VPMC has been successfully enhanced to model aperture blocks in dose calculation and optimization for the PBSPT-based SRS.
Submitted 3 July, 2023;
originally announced July 2023.
-
DisPlacing Objects: Improving Dynamic Vehicle Detection via Visual Place Recognition under Adverse Conditions
Authors:
Stephen Hausler,
Sourav Garg,
Punarjay Chakravarty,
Shubham Shrivastava,
Ankit Vora,
Michael Milford
Abstract:
Can knowing where you are assist in perceiving objects in your surroundings, especially under adverse weather and lighting conditions? In this work we investigate whether a prior map can be leveraged to aid in the detection of dynamic objects in a scene without the need for a 3D map or pixel-level map-query correspondences. We contribute an algorithm which refines an initial set of candidate object detections and produces a refined subset of highly accurate detections using a prior map. We begin by using visual place recognition (VPR) to retrieve a reference map image for a given query image, then use a binary classification neural network that compares the query and mapping image regions to validate the query detection. Once our classification network is trained, on approximately 1000 query-map image pairs, it is able to improve the performance of vehicle detection when combined with an existing off-the-shelf vehicle detector. We demonstrate our approach using standard datasets across two cities (Oxford and Zurich) under different settings of train-test separation of map-query traverse pairs. We further emphasize the performance gains of our approach against alternative design choices and show that VPR suffices for the task, eliminating the need for precise ground truth localization.
Submitted 30 June, 2023;
originally announced June 2023.
-
Locking On: Leveraging Dynamic Vehicle-Imposed Motion Constraints to Improve Visual Localization
Authors:
Stephen Hausler,
Sourav Garg,
Punarjay Chakravarty,
Shubham Shrivastava,
Ankit Vora,
Michael Milford
Abstract:
Most 6-DoF localization and SLAM systems use static landmarks but ignore dynamic objects because they cannot be usefully incorporated into a typical pipeline. Where dynamic objects have been incorporated, typical approaches have attempted relatively sophisticated identification and localization of these objects, limiting their robustness or general utility. In this research, we propose a middle ground, demonstrated in the context of autonomous vehicles, using dynamic vehicles to provide limited pose constraint information in a 6-DoF frame-by-frame PnP-RANSAC localization pipeline. We refine initial pose estimates with a motion model and propose a method for calculating the predicted quality of future pose estimates, triggered based on whether or not the autonomous vehicle's motion is constrained by the relative frame-to-frame location of dynamic vehicles in the environment. Our approach detects and identifies suitable dynamic vehicles to define these pose constraints to modify a pose filter, resulting in improved recall across a range of localization tolerances from $0.25m$ to $5m$, compared to a state-of-the-art baseline single image PnP method and its vanilla pose filtering. Our constraint detection system is active for approximately $35\%$ of the time on the Ford AV dataset and localization is particularly improved when the constraint detection is active.
Submitted 30 June, 2023;
originally announced June 2023.
-
DiViNeT: 3D Reconstruction from Disparate Views via Neural Template Regularization
Authors:
Aditya Vora,
Akshay Gadi Patil,
Hao Zhang
Abstract:
We present a volume rendering-based neural surface reconstruction method that takes as few as three disparate RGB images as input. Our key idea is to regularize the reconstruction, which is severely ill-posed and leaves significant gaps between the sparse views, by learning a set of neural templates to act as surface priors. Our method, coined DiViNet, operates in two stages. It first learns the templates, in the form of 3D Gaussian functions, across different scenes, without 3D supervision. In the reconstruction stage, our predicted templates serve as anchors to help "stitch" the surfaces over sparse regions. We demonstrate that our approach is not only able to complete the surface geometry but also reconstructs surface details to a reasonable extent from a few disparate input views. On the DTU and BlendedMVS datasets, our approach achieves the best reconstruction quality among existing methods in the presence of such sparse views and performs on par with, if not better than, competing methods when dense views are employed as inputs.
Submitted 1 November, 2023; v1 submitted 7 June, 2023;
originally announced June 2023.
-
Beam mask and sliding window-facilitated deep learning-based accurate and efficient dose prediction for pencil beam scanning proton therapy
Authors:
Lian Zhang,
Jason M. Holmes,
Zhengliang Liu,
Sujay A. Vora,
Terence T. Sio,
Carlos E. Vargas,
Nathan Y. Yu,
Sameer R. Keole,
Steven E. Schild,
Martin Bues,
Sheng Li,
Tianming Liu,
Jiajian Shen,
William W. Wong,
Wei Liu
Abstract:
Purpose: To develop a DL-based PBSPT dose prediction workflow with high accuracy and balanced complexity to support on-line adaptive proton therapy clinical decision and subsequent replanning.
Methods: PBSPT plans of 103 prostate cancer patients and 83 lung cancer patients previously treated at our institution were included in the study, each with CTs, structure sets, and plan doses calculated by the in-house developed Monte-Carlo dose engine. For the ablation study, we designed three experiments corresponding to the following three methods: 1) Experiment 1, the conventional region of interest (ROI) method. 2) Experiment 2, the beam mask (generated by raytracing of proton beams) method to improve proton dose prediction. 3) Experiment 3, the sliding window method for the model to focus on local details to further improve proton dose prediction. A fully connected 3D-Unet was adopted as the backbone. Dose volume histogram (DVH) indices, 3D Gamma passing rates, and dice coefficients for the structures enclosed by the iso-dose lines between the predicted and the ground truth doses were used as the evaluation metrics. The calculation time for each proton dose prediction was recorded to evaluate the method's efficiency.
Results: Compared to the conventional ROI method, the beam mask method improved the agreement of DVH indices for both targets and OARs and the sliding window method further improved the agreement of the DVH indices. For the 3D Gamma passing rates in the target, OARs, and BODY (outside target and OARs), the beam mask method can improve the passing rates in these regions and the sliding window method further improved them. A similar trend was also observed for the dice coefficients. In fact, this trend was especially remarkable for relatively low prescription isodose lines. The dose predictions for all the testing cases were completed within 0.25s.
Submitted 29 May, 2023;
originally announced May 2023.
-
Satellite Image Based Cross-view Localization for Autonomous Vehicle
Authors:
Shan Wang,
Yanhao Zhang,
Ankit Vora,
Akhil Perincherry,
Hongdong Li
Abstract:
Existing spatial localization techniques for autonomous vehicles mostly use a pre-built 3D-HD map, often constructed using a survey-grade 3D mapping vehicle, which is not only expensive but also laborious. This paper shows that by using an off-the-shelf high-definition satellite image as a ready-to-use map, we are able to achieve cross-view vehicle localization up to a satisfactory accuracy, providing a cheaper and more practical way for localization. While the utilization of satellite imagery for cross-view localization is an established concept, the conventional methodology focuses primarily on image retrieval. This paper introduces a novel approach to cross-view localization that departs from the conventional image retrieval method. Specifically, our method develops (1) a Geometric-align Feature Extractor (GaFE) that leverages measured 3D points to bridge the geometric gap between ground and overhead views, (2) a Pose Aware Branch (PAB) adopting a triplet loss to encourage pose-aware feature extraction, and (3) a Recursive Pose Refine Branch (RPRB) using the Levenberg-Marquardt (LM) algorithm to align the initial pose towards the true vehicle pose iteratively. Our method is validated on KITTI and Ford Multi-AV Seasonal datasets as ground view and Google Maps as the satellite view. The results demonstrate the superiority of our method in cross-view localization with median spatial and angular errors within $1$ meter and $1^\circ$, respectively.
Submitted 20 April, 2023; v1 submitted 27 July, 2022;
originally announced July 2022.
-
Improving Worst Case Visual Localization Coverage via Place-specific Sub-selection in Multi-camera Systems
Authors:
Stephen Hausler,
Ming Xu,
Sourav Garg,
Punarjay Chakravarty,
Shubham Shrivastava,
Ankit Vora,
Michael Milford
Abstract:
6-DoF visual localization systems utilize principled approaches rooted in 3D geometry to perform accurate camera pose estimation of images to a map. Current techniques use hierarchical pipelines and learned 2D feature extractors to improve scalability and increase performance. However, despite gains in typical [email protected] type metrics, these systems still have limited utility for real-world applications like autonomous vehicles because of their `worst' areas of performance - the locations where they provide insufficient recall at a certain required error tolerance. Here we investigate the utility of using `place specific configurations', where a map is segmented into a number of places, each with its own configuration for modulating the pose estimation step, in this case selecting a camera within a multi-camera system. On the Ford AV benchmark dataset, we demonstrate substantially improved worst-case localization performance compared to using off-the-shelf pipelines - minimizing the percentage of the dataset which has low recall at a certain error tolerance, as well as improved overall localization performance. Our proposed approach is particularly applicable to the crowdsharing model of autonomous vehicle deployment, where a fleet of AVs are regularly traversing a known route.
Submitted 28 June, 2022;
originally announced June 2022.
-
Localization of a Smart Infrastructure Fisheye Camera in a Prior Map for Autonomous Vehicles
Authors:
Subodh Mishra,
Armin Parchami,
Enrique Corona,
Punarjay Chakravarty,
Ankit Vora,
Devarth Parikh,
Gaurav Pandey
Abstract:
This work presents a technique for localization of a smart infrastructure node, consisting of a fisheye camera, in a prior map. These cameras can detect objects that are outside the line of sight of the autonomous vehicles (AV) and send that information to AVs using V2X technology. However, in order for this information to be of any use to the AV, the detected objects should be provided in the reference frame of the prior map that the AV uses for its own navigation. Therefore, it is important to know the accurate pose of the infrastructure camera with respect to the prior map. Here we propose to solve this localization problem in two steps: (i) we perform feature matching between perspective projection of fisheye image and bird's eye view (BEV) satellite imagery from the prior map to estimate an initial camera pose, (ii) we refine the initialization by maximizing the Mutual Information (MI) between intensity of pixel values of fisheye image and reflectivity of 3D LiDAR points in the map data. We validate our method on simulated data and also present results with real world data.
Submitted 28 September, 2021;
originally announced September 2021.
-
Infrastructure Node-based Vehicle Localization for Autonomous Driving
Authors:
Elijah S. Lee,
Ankit Vora,
Armin Parchami,
Punarjay Chakravarty,
Gaurav Pandey,
Vijay Kumar
Abstract:
Vehicle localization is essential for autonomous vehicle (AV) navigation and Advanced Driver Assistance Systems (ADAS). Accurate vehicle localization is often achieved via expensive inertial navigation systems or by employing compute-intensive vision processing (LiDAR/camera) to augment the low-cost and noisy inertial sensors. Here we have developed a framework for fusing the information obtained from a smart infrastructure node (ix-node) with the autonomous vehicle's on-board localization engine to estimate the robust and accurate pose of the ego-vehicle even with cheap inertial sensors. A smart ix-node is typically used to augment the perception capability of an autonomous vehicle, especially when the onboard perception sensors of AVs are blocked by the dynamic and static objects in the environment, thereby making them ineffectual. In this work, we utilize this perception output from an ix-node to increase the localization accuracy of the AV. The fusion of ix-node perception output with the vehicle's low-cost inertial sensors allows us to perform reliable vehicle localization without the need for relying on expensive inertial navigation systems or compute-intensive vision processing onboard the AVs. The proposed approach has been tested on real-world datasets collected from a test track in Ann Arbor, Michigan. Detailed analysis of the experimental results shows that incorporating ix-node data improves localization performance.
Submitted 21 September, 2021;
originally announced September 2021.
-
S-BEV: Semantic Birds-Eye View Representation for Weather and Lighting Invariant 3-DoF Localization
Authors:
Mokshith Voodarla,
Shubham Shrivastava,
Sagar Manglani,
Ankit Vora,
Siddharth Agarwal,
Punarjay Chakravarty
Abstract:
We describe a light-weight, weather and lighting invariant, Semantic Bird's Eye View (S-BEV) signature for vision-based vehicle re-localization. A topological map of S-BEV signatures is created during the first traversal of the route, which are used for coarse localization in subsequent route traversal. A fine-grained localizer is then trained to output the global 3-DoF pose of the vehicle using its S-BEV and its coarse localization. We conduct experiments on vKITTI2 virtual dataset and show the potential of the S-BEV to be robust to weather and lighting. We also demonstrate results with 2 vehicles on a 22 km long highway route in the Ford AV dataset.
Submitted 23 January, 2021;
originally announced January 2021.
-
Optimal Questionnaires for Screening of Strategic Agents
Authors:
Anuj S. Vora,
Ankur A. Kulkarni
Abstract:
During the COVID-19 pandemic the health authorities at airports and train stations try to screen and identify the travellers possibly exposed to the virus. However, many individuals avoid getting tested and hence may misreport their travel history. This is a challenge for the health authorities who wish to ascertain the truly susceptible cases in spite of this strategic misreporting. We investigate the problem of questioning travellers to classify them for further testing when the travellers are strategic or are unwilling to reveal their travel histories. We show there are fundamental limits to how many travel histories the health authorities can recover.
Submitted 28 October, 2020;
originally announced October 2020.
-
Ensembling Low Precision Models for Binary Biomedical Image Segmentation
Authors:
Tianyu Ma,
Hang Zhang,
Hanley Ong,
Amar Vora,
Thanh D. Nguyen,
Ajay Gupta,
Yi Wang,
Mert Sabuncu
Abstract:
Segmentation of anatomical regions of interest such as vessels or small lesions in medical images is still a difficult problem that is often tackled with manual input by an expert. One of the major challenges for this task is that the appearance of foreground (positive) regions can be similar to background (negative) regions. As a result, many automatic segmentation algorithms tend to exhibit asymmetric errors, typically producing more false positives than false negatives. In this paper, we aim to leverage this asymmetry and train a diverse ensemble of models with very high recall, while sacrificing their precision. Our core idea is straightforward: A diverse ensemble of low precision and high recall models are likely to make different false positive errors (classifying background as foreground in different parts of the image), but the true positives will tend to be consistent. Thus, in aggregate the false positive errors will cancel out, yielding high performance for the ensemble. Our strategy is general and can be applied with any segmentation model. In three different applications (carotid artery segmentation in a neck CT angiography, myocardium segmentation in a cardiovascular MRI and multiple sclerosis lesion segmentation in a brain MRI), we show how the proposed approach can significantly boost the performance of a baseline segmentation method.
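The aggregation idea in this abstract can be illustrated with a minimal sketch. It assumes binary masks stored as nested lists and a simple vote-fraction rule; the function name `ensemble_vote`, the `threshold` parameter, and the voting rule itself are hypothetical illustrations, not the aggregation scheme the paper actually uses.

```python
def ensemble_vote(masks, threshold=0.5):
    """Combine binary masks by per-pixel vote fraction.

    masks: list of equally sized 2D lists of 0/1 values, one per model.
    A pixel is foreground only if more than `threshold` of the models agree.
    (Hypothetical aggregation rule, chosen for illustration.)
    """
    if not masks:
        raise ValueError("at least one mask is required")
    n_models = len(masks)
    n_rows, n_cols = len(masks[0]), len(masks[0][0])
    return [
        [
            1 if sum(m[r][c] for m in masks) / n_models > threshold else 0
            for c in range(n_cols)
        ]
        for r in range(n_rows)
    ]


# Three high-recall, low-precision toy models: all find the true positive
# at (0, 0), but each makes a false positive at a different pixel.
toy_masks = [
    [[1, 1, 0], [0, 0, 0]],
    [[1, 0, 1], [0, 0, 0]],
    [[1, 0, 0], [0, 1, 0]],
]
combined = ensemble_vote(toy_masks)
print(combined)
```

In this toy case the consistent true positive survives the vote while the scattered false positives, each supported by only one model, are discarded.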
Submitted 16 October, 2020;
originally announced October 2020.
-
Enhanced Normalized Mutual Information for Localization in Noisy Environments
Authors:
Samuel Todd Flanagan,
Drupad K. Khublani,
J. -F. Chamberland,
Siddharth Agarwal,
Ankit Vora
Abstract:
Fine localization is a crucial task for autonomous vehicles. Although many algorithms have been explored in the literature for this specific task, the goal of getting accurate results from commodity sensors remains a challenge. As autonomous vehicles make the transition from expensive prototypes to production items, the need for inexpensive, yet reliable solutions is increasing rapidly. This article considers scenarios where images are captured with inexpensive cameras and localization takes place using pre-loaded fine maps of local roads as side information. The techniques proposed herein extend schemes based on normalized mutual information by leveraging the likelihood of shades rather than exact sensor readings for localization in noisy environments. This algorithmic enhancement, rooted in statistical signal processing, offers substantial gains in performance. Numerical simulations are used to highlight the benefits of the proposed techniques in representative application scenarios. Analysis of a Ford image set is performed to validate the core findings of this work.
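For readers unfamiliar with the baseline measure, the following is a minimal sketch of plain normalized mutual information between two equally sized, already quantized intensity sequences. It uses the common definition NMI = (H(A) + H(B)) / H(A, B); this choice of normalization and the function names are assumptions, and the sketch does not implement the likelihood-based enhancement the abstract describes.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (in nats) of a histogram given as raw counts."""
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def normalized_mutual_information(a, b):
    """NMI = (H(A) + H(B)) / H(A, B) for two quantized intensity sequences.

    Ranges from 1 (independent) to 2 (one determines the other exactly).
    """
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    h_a = entropy(Counter(a).values(), n)
    h_b = entropy(Counter(b).values(), n)
    h_ab = entropy(Counter(zip(a, b)).values(), n)
    if h_ab == 0:
        raise ValueError("NMI is undefined when both inputs are constant")
    return (h_a + h_b) / h_ab


# Toy flattened "images" with four grey levels each.
image = [0, 1, 2, 3, 0, 1, 2, 3]
other = [0, 0, 1, 1, 2, 2, 3, 3]
print(normalized_mutual_information(image, image))
print(normalized_mutual_information(image, other))
```

In a localization setting, a score like this would be evaluated for each candidate alignment of the camera image against the map, and the highest-scoring alignment taken as the pose estimate.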
Submitted 24 August, 2020;
originally announced August 2020.
-
Shannon meets Myerson: Information Extraction from a Strategic Sender
Authors:
Anuj S. Vora,
Ankur A. Kulkarni
Abstract:
We study a setting where a receiver must design a questionnaire to recover a sequence of symbols known to a strategic sender, whose utility may not be incentive compatible. We allow the receiver the possibility of selecting the alternatives presented in the questionnaire, and thereby linking decisions across the components of the sequence. We show that, despite the strategic sender and the noise in the channel, the receiver can recover exponentially many sequences, but also that exponentially many sequences are unrecoverable even by the best strategy. We define the growth rate of the number of recovered sequences as the information extraction capacity. A generalization of the Shannon capacity, it characterizes the optimal amount of communication resources required. We derive bounds leading to an exact evaluation of the information extraction capacity in many cases. Our results form the building blocks of a novel, noncooperative regime of communication involving a strategic sender.
Submitted 15 September, 2022; v1 submitted 18 June, 2020;
originally announced June 2020.
-
Aerial Imagery based LIDAR Localization for Autonomous Vehicles
Authors:
Ankit Vora,
Siddharth Agarwal,
Gaurav Pandey,
James McBride
Abstract:
This paper presents a localization technique using aerial imagery maps and LIDAR based ground reflectivity for autonomous vehicles in urban environments. Traditional localization techniques using LIDAR reflectivity rely on high definition reflectivity maps generated from a mapping vehicle. The cost and effort required to maintain such prior maps are generally very high because it requires a fleet of expensive mapping vehicles. In this work we propose a localization technique where the vehicle localizes using aerial/satellite imagery, eradicating the need to develop and maintain complex high-definition maps. The proposed technique has been tested on a real world dataset collected from a test track in Ann Arbor, Michigan. This research concludes that aerial imagery based maps provide real-time localization performance similar to state-of-the-art LIDAR based maps for autonomous vehicles in urban environments at reduced costs.
Submitted 24 March, 2020;
originally announced March 2020.
-
Ford Multi-AV Seasonal Dataset
Authors:
Siddharth Agarwal,
Ankit Vora,
Gaurav Pandey,
Wayne Williams,
Helen Kourous,
James McBride
Abstract:
This paper presents a challenging multi-agent seasonal dataset collected by a fleet of Ford autonomous vehicles at different days and times during 2017-18. The vehicles traversed an average route of 66 km in Michigan that included a mix of driving scenarios such as the Detroit Airport, freeways, city-centers, university campus and suburban neighbourhoods, etc. Each vehicle used in this data collection is a Ford Fusion outfitted with an Applanix POS-LV GNSS system, four HDL-32E Velodyne 3D-lidar scanners, 6 Point Grey 1.3 MP Cameras arranged on the rooftop for 360-degree coverage and 1 Pointgrey 5 MP camera mounted behind the windshield for the forward field of view. We present the seasonal variation in weather, lighting, construction and traffic conditions experienced in dynamic urban environments. This dataset can help design robust algorithms for autonomous vehicles and multi-agent systems. Each log in the dataset is time-stamped and contains raw data from all the sensors, calibration values, pose trajectory, ground truth pose, and 3D maps. All data is available in Rosbag format that can be visualized, modified and applied using the open-source Robot Operating System (ROS). We also provide the output of state-of-the-art reflectivity-based localization for bench-marking purposes. The dataset can be freely downloaded at our website.
Submitted 17 March, 2020;
originally announced March 2020.
-
Localization in Autonomous Vehicles Using a Generalized Inner Product
Authors:
Samuel Todd Flanagan,
Drupad K. Khublani,
Jean-Francois Chamberland,
Siddharth Agarwal,
Ankit Vora
Abstract:
Fine localization in autonomous driving platforms is a task of broad interest, receiving much attention in recent years. Some localization algorithms use the Euclidean distance as a similarity measure between the local image acquired by a camera and a global map, which acts as side information. The global map is typically expressed in terms of the coordinate system of the road plane. Yet, a road image captured by a camera is subject to distortion in that nearby features on the road have much larger footprints on the focal plane of the camera compared with those of equally-sized features that lie farther ahead of the vehicle. Using commodity computational tools, it is straightforward to execute a transformation and, thereby, bring the distorted image into the frame of reference of the global map. However, this nonlinear transformation results in unequal noise amplification. The noise profile induced by this transformation should be accounted for when trying to match an acquired image to a global map, with more reliable regions being given more weight in the process. This physical reality presents an algorithmic opportunity to improve existing localization algorithms, especially in harsh conditions. This article reviews the physics of road feature acquisition through a camera, and it proposes an improved matching method rooted in statistical analysis. Findings are supported by numerical simulations.
Submitted 28 January, 2020;
originally announced January 2020.
-
Minimax Theorems for Finite Blocklength Lossy Joint Source-Channel Coding over an AVC
Authors:
Anuj S. Vora,
Ankur A. Kulkarni
Abstract:
Motivated by applications in the security of cyber-physical systems, we pose the finite blocklength communication problem in the presence of a jammer as a zero-sum game between the encoder-decoder team and the jammer, by allowing the communicating team as well as the jammer only locally randomized strategies. The communicating team's problem is non-convex under locally randomized codes, and hence, in general, a minimax theorem need not hold for this game. However, we show that approximate minimax theorems hold in the sense that the minimax and maximin values of the game approach each other asymptotically. In particular, for rates strictly below a critical threshold, both the minimax and maximin values approach zero, and for rates strictly above it, they both approach unity. We then show a second order minimax theorem, i.e., for rates approaching the threshold along a specific scaling, the minimax and maximin values approach the same constant value, which is neither zero nor one. Critical to these results is our derivation of finite blocklength bounds on the minimax and maximin values of the game and our derivation of second order dispersion-based bounds.
Submitted 11 July, 2019;
originally announced July 2019.
-
Localization Requirements for Autonomous Vehicles
Authors:
Tyler G. R. Reid,
Sarah E. Houts,
Robert Cammarata,
Graham Mills,
Siddharth Agarwal,
Ankit Vora,
Gaurav Pandey
Abstract:
Autonomous vehicles require precise knowledge of their position and orientation in all weather and traffic conditions for path planning, perception, control, and general safe operation. Here we derive these requirements for autonomous vehicles based on first principles. We begin with the safety integrity level, defining the allowable probability of failure per hour of operation based on desired improvements on road safety today. This draws comparisons with the localization integrity levels required in aviation and rail where similar numbers are derived at 10^-8 probability of failure per hour of operation. We then define the geometry of the problem, where the aim is to maintain knowledge that the vehicle is within its lane and to determine what road level it is on. Longitudinal, lateral, and vertical localization error bounds (alert limits) and 95% accuracy requirements are derived based on US road geometry standards (lane width, curvature, and vertical clearance) and allowable vehicle dimensions. For passenger vehicles operating on freeway roads, the result is a required lateral error bound of 0.57 m (0.20 m, 95%), a longitudinal bound of 1.40 m (0.48 m, 95%), a vertical bound of 1.30 m (0.43 m, 95%), and an attitude bound in each direction of 1.50 deg (0.51 deg, 95%). On local streets, the road geometry makes requirements more stringent where lateral and longitudinal error bounds of 0.29 m (0.10 m, 95%) are needed with an orientation requirement of 0.50 deg (0.17 deg, 95%).
Submitted 3 June, 2019;
originally announced June 2019.
-
FCHD: Fast and accurate head detection in crowded scenes
Authors:
Aditya Vora,
Vinay Chilaka
Abstract:
In this paper, we propose FCHD-Fully Convolutional Head Detector, an end-to-end trainable head detection model. Our proposed architecture is a single fully convolutional network which is responsible for both bounding box prediction and classification. This makes our model lightweight with low inference time and memory requirements. Along with run-time, our model has better overall average precision (AP) which is achieved by selection of anchor sizes based on the effective receptive field of the network. This can be concluded from our experiments on several head detection datasets with varying head counts. We achieve an AP of 0.70 on a challenging head detection dataset which is comparable to some standard benchmarks. Along with this our model runs at 5 FPS on Nvidia Quadro M1000M for VGA resolution images. Code is available at https://github.com/aditya-vora/FCHD-Fully-Convolutional-Head-Detector.
Submitted 5 May, 2019; v1 submitted 24 September, 2018;
originally announced September 2018.
-
Ultra-Broadband Terahertz Perfect Absorber based on Doped Silicon
Authors:
Ankit Vora,
Satyadhar Joshi,
Arun Matai
Abstract:
The requirement for metamaterial perfect absorbers (MPA) based on doped semiconductors is steadily increasing due to the available matured fabrication and simulation technology. There is a particular interest in developing terahertz (THz) perfect absorbers using doped semiconductors for achieving characteristics such as polarization-independence, wide-angle, and broadband absorption. We report MPA based on patterned arrays of tapered micro-cylindrical structures of doped silicon to enable them with broadband, wide-angle, and polarization-independent response. In this work, we modeled the MPA structures using COMSOL Multiphysics to evaluate its electromagnetic wave response using the software's RF Module running in parallel on a Beowulf cluster. We evaluated the doped silicon MPA structures for its response in the frequency spectrum of 0.1 to 5.0 THz for transverse magnetic and transverse electric polarizations at the normal and oblique incidence up to 75 degrees. This proposed doped silicon MPA was found to support a perfect absorption for a wide frequency spectrum from 1.7 to 3.9 THz along with insensitivity towards polarization and incident angles up to 60 deg. The execution of MPA on simplified Beowulf cluster significantly reduced the simulations time by the orders of magnitude compared to the sequential simulations.
Submitted 1 July, 2018;
originally announced July 2018.
-
Optimal Design of Thin-film Plasmonic Solar Cells using Differential Evolution Optimization Algorithms
Authors:
Ankit Vora,
Satyadhar Joshi,
Arun Matai,
Joshua M. Pearce,
Durdu Guney
Abstract:
An approach using a differential evolution (DE) optimization algorithm is proposed to optimize design parameters for improving the optical absorption efficiency of plasmonic solar cells (PSC). This approach is based on formulating the parameters extraction as a search and optimization process in order to maximize the optical absorption in the PSC. Determining the physical parameters of three-dimensional (3-D) PSC is critical for designing and estimating their performance, however, due to the complex design of the PSC, parameters extraction is time and calculation intensive. In this paper, this technique is demonstrated for the case of commercial thin-film hydrogenated amorphous silicon (a-Si:H) solar photovoltaic cells enhanced through patterned silver nano-disk plasmonic structures. The DE optimization of PSC structures was performed to execute a real-time parameter search and optimization. The predicted optical enhancement (OE) in optical absorption in the active layer of the PSC for AM-1.5 solar spectrum was found to be over 19.45% higher compared to the reference cells. The proposed technique offers higher accuracy and automates the tuning of control parameters of PSC in a time-efficient manner.
Submitted 1 July, 2018;
originally announced July 2018.
-
A Classification approach towards Unsupervised Learning of Visual Representations
Authors:
Aditya Vora
Abstract:
In this paper, we present a technique for unsupervised learning of visual representations. Specifically, we train a model for foreground and background classification task, in the process of which it learns visual representations. Foreground and background patches for training come after mining for such patches from hundreds and thousands of unlabelled videos available on the web which we extract using a proposed patch extraction algorithm. Without using any supervision, with just using 150,000 unlabelled videos and the PASCAL VOC 2007 dataset, we train an object recognition model that achieves 45.3 mAP which is close to the best performing unsupervised feature learning technique whereas better than many other proposed algorithms. The code for patch extraction is implemented in Matlab and available open source at the following link .
Submitted 1 June, 2018;
originally announced June 2018.
-
New approach for SCR selection and optimization for Septum magnet power supply with high reliability
Authors:
Ankit Vora
Abstract:
A new approach for selection of Silicon-Controlled Rectifier (SCR) for switching high pulsed currents in the septum magnet power supply is described. In this approach, an attempt is made to select the SCR from its I2t rating given in data sheet. For this, a factor which we have called I2t derating factor is defined as the ratio of I2t rating of SCR to the I2t value of current pulse to be switched. Thus, the SCR to be used in power supply is selected from its I2t derating factor. Three different experiments were performed using SCR of different manufacturers, to estimate the value of I2t derating factors. In these experiments, the SCR was subjected to maximum thermal stress till SCR started showing degradation. The forward blocking voltage and the leakage current are the parameters used for characterizing the SCR before and after the stress. The results of these experiments are presented and discussed. This approach will be of great help when pulse rating of SCR, like, transient thermal impedance is not available. Finally, a method is presented which can contribute to design low-cost and high-reliability septum magnet power supply.
Submitted 5 August, 2017;
originally announced August 2017.
-
Copper-oxide Nanowires based Humidity Sensor
Authors:
Ankit Vora,
Arvind K. Srivastava
Abstract:
This paper presents investigated results of copper-oxide nanowires used as a humidity sensor. Copper-oxide nanowires films were grown over cross-comb type gold electrodes on a SiO2 substrate using thermal annealing technique, and its humidity sensitive characteristics were investigated through resistance across the gold electrodes. These copper-oxide nanowires films revealed high sensitivity and long-term stability with fast response time. It was found that resistance across gold electrodes of the fabricated sensor decreases with increase in humidity almost linearly on a logarithmic scale. It appears that copper-oxide nanowires can be used as low-cost humidity sensor with high output reliability and reproduction rate. The observations were carried out at room temperature (RT) and relative humidity (RH) in the range of 6% to 97%.
Submitted 5 August, 2017;
originally announced August 2017.
-
Iterative Spectral Clustering for Unsupervised Object Localization
Authors:
Aditya Vora,
Shanmuganathan Raman
Abstract:
This paper addresses the problem of unsupervised object localization in an image. Unlike previous supervised and weakly supervised algorithms that require bounding box or image level annotations for training classifiers in order to learn features representing the object, we propose a simple yet effective technique for localization using iterative spectral clustering. This iterative spectral clustering approach along with appropriate cluster selection strategy in each iteration naturally helps in searching of object region in the image. In order to estimate the final localization window, we group the proposals obtained from the iterative spectral clustering step based on the perceptual similarity, and average the coordinates of the proposals from the top scoring groups. We benchmark our algorithm on challenging datasets like Object Discovery and PASCAL VOC 2007, achieving an average CorLoc percentage of 51% and 35% respectively which is comparable to various other weakly supervised algorithms despite being completely unsupervised.
Submitted 29 June, 2017;
originally announced June 2017.
-
Flow-free Video Object Segmentation
Authors:
Aditya Vora,
Shanmuganathan Raman
Abstract:
Segmenting foreground object from a video is a challenging task because of the large deformations of the objects, occlusions, and background clutter. In this paper, we propose a frame-by-frame but computationally efficient approach for video object segmentation by clustering visually similar generic object segments throughout the video. Our algorithm segments various object instances appearing in the video and then perform clustering in order to group visually similar segments into one cluster. Since the object that needs to be segmented appears in most part of the video, we can retrieve the foreground segments from the cluster having maximum number of segments, thus filtering out noisy segments that do not represent any object. We then apply a track and fill approach in order to localize the objects in the frames where the object segmentation framework fails to segment any object. Our algorithm performs comparably to the recent automatic methods for video object segmentation when benchmarked on DAVIS dataset while being computationally much faster.
Submitted 28 June, 2017;
originally announced June 2017.
-
Multi-resonant silver nano-disk patterned thin film hydrogenated amorphous silicon solar cells for Staebler-Wronski effect compensation
Authors:
Ankit Vora,
Jephias Gwamuri,
Joshua M. Pearce,
Paul L. Bergstrom,
Durdu Ö. Güney
Abstract:
We study polarization independent improved light trapping in commercial thin film hydrogenated amorphous silicon (a-Si:H) solar photovoltaic cells using a three-dimensional silver array of multi-resonant nano-disk structures embedded in a silicon nitride anti-reflection coating (ARC) to enhance optical absorption in the intrinsic layer (i-a-Si:H) for the visible spectrum for any polarization angle. Predicted total optical enhancement (OE) in absorption in the i-a-Si:H for AM-1.5 solar spectrum is 18.51% as compared to the reference, and producing a 19.65% improvement in short-circuit current density (JSC) over 11.7 mA/cm2 for a reference cell. The JSC in the nano-disk patterned solar cell (NDPSC) was found to be higher than the commercial reference structure for any incident angle. The NDPSC has a multi-resonant optical response for the visible spectrum and the associated mechanism for OE in i-a-Si:H layer is excitation of Fabry-Perot resonance facilitated by surface plasmon resonances. The detrimental Staebler-Wronski effect (SWE) in a-Si:H solar cell can be minimized by the additional OE in the NDPSC and self-annealing of defect states by additional heat generation, thus likely improving the overall stabilized characteristics of a-Si:H solar cells.
Submitted 7 September, 2014;
originally announced September 2014.
-
Exchanging Ohmic Losses in Metamaterial Absorbers with Useful Optical Absorption for Photovoltaics
Authors:
Ankit Vora,
Jephias Gwamuri,
Nezih Pala,
Anand Kulkarni,
Joshua M. Pearce,
Durdu Ö. Güney
Abstract:
Using metamaterial absorbers, we have shown that metallic layers in the absorbers do not necessarily constitute undesired resistive heating problem for photovoltaics. Tailoring the geometric skin depth of metals and employing the natural bulk absorbance characteristics of the semiconductors in those absorbers can enable the exchange of undesired resistive losses with the useful optical absorbance in the active semiconductors. Thus, Ohmic loss dominated metamaterial absorbers can be converted into photovoltaic near-perfect absorbers with the advantage of harvesting the full potential of light management offered by the metamaterial absorbers. Based on experimental permittivity data for indium gallium nitride, we have shown that between 75%-95% absorbance can be achieved in the semiconductor layers of the converted metamaterial absorbers. Besides other metamaterial and plasmonic devices, our results may also apply to photodetectors and other metal or semiconductor based optical devices where resistive losses and power consumption are important pertaining to the device performance.
Submitted 28 April, 2014;
originally announced April 2014.
-
Universe Detectors for Sybil Defense in Ad Hoc Wireless Networks
Authors:
Adnan Vora,
Mikhail Nesterenko,
Sébastien Tixeuil,
Sylvie Delaët
Abstract:
The Sybil attack in unknown port networks such as wireless is not considered tractable. A wireless node is not capable of independently differentiating the universe of real nodes from the universe of arbitrary non-existent fictitious nodes created by the attacker. Similar to failure detectors, we propose to use universe detectors to help nodes determine which universe is real. In this paper, we (i) define several variants of the neighborhood discovery problem under Sybil attack (ii) propose a set of matching universe detectors (iii) demonstrate the necessity of additional topological constraints for the problems to be solvable: node density and communication range; (iv) present SAND -- an algorithm that solves these problems with the help of appropriate universe detectors, this solution demonstrates that the proposed universe detectors are the weakest detectors possible for each problem.
Submitted 13 May, 2008; v1 submitted 1 May, 2008;
originally announced May 2008.
-
Void Traversal for Guaranteed Delivery in Geometric Routing
Authors:
Mikhail Nesterenko,
Adnan Vora
Abstract:
Geometric routing algorithms like GFG (GPSR) are lightweight, scalable algorithms that can be used to route in resource-constrained ad hoc wireless networks. However, such algorithms run on planar graphs only. To efficiently construct a planar graph, they require a unit-disk graph. To make the topology unit-disk, the maximum link length in the network has to be selected conservatively. In practical setting this leads to the designs where the node density is rather high. Moreover, the network diameter of a planar subgraph is greater than the original graph, which leads to longer routes. To remedy this problem, we propose a void traversal algorithm that works on arbitrary geometric graphs. We describe how to use this algorithm for geometric routing with guaranteed delivery and compare its performance with GFG.
Submitted 25 March, 2008;
originally announced March 2008.