Search | arXiv e-print repository

Dreamitate: Real-World Visuomotor Policy Learning via Video Generation

Authors: Junbang Liang, Ruoshi Liu, Ege Ozguroglu, Sruthi Sudhakar, Achal Dave, Pavel Tokmakov, Shuran Song, Carl Vondrick

Abstract: A key challenge in manipulation is learning a policy that can robustly generalize to diverse visual environments. A promising mechanism for learning robust policies is to leverage video generative models, which are pretrained on large-scale datasets of internet videos. In this paper, we propose a visuomotor policy learning framework that fine-tunes a video diffusion model on human demonstrations o… ▽ More A key challenge in manipulation is learning a policy that can robustly generalize to diverse visual environments. A promising mechanism for learning robust policies is to leverage video generative models, which are pretrained on large-scale datasets of internet videos. In this paper, we propose a visuomotor policy learning framework that fine-tunes a video diffusion model on human demonstrations of a given task. At test time, we generate an example of an execution of the task conditioned on images of a novel scene, and use this synthesized execution directly to control the robot. Our key insight is that using common tools allows us to effortlessly bridge the embodiment gap between the human hand and the robot manipulator. We evaluate our approach on four tasks of increasing complexity and demonstrate that harnessing internet-scale generative models allows the learned policy to achieve a significantly higher degree of generalization than existing behavior cloning approaches. △ Less

Submitted 24 June, 2024; originally announced June 2024.

Comments: Project page: https://dreamitate.cs.columbia.edu/

arXiv:2406.12036 [pdf, other]

MedCalc-Bench: Evaluating Large Language Models for Medical Calculations

Authors: Nikhil Khandekar, Qiao **, Guangzhi Xiong, Soren Dunn, Serina S Applebaum, Zain Anwar, Maame Sarfo-Gyamfi, Conrad W Safranek, Abid A Anwar, Andrew Zhang, Aidan Gilson, Maxwell B Singer, Amisha Dave, Andrew Taylor, Aidong Zhang, Qingyu Chen, Zhiyong Lu

Abstract: As opposed to evaluating computation and logic-based reasoning, current benchmarks for evaluating large language models (LLMs) in medicine are primarily focused on question-answering involving domain knowledge and descriptive reasoning. While such qualitative capabilities are vital to medical diagnosis, in real-world scenarios, doctors frequently use clinical calculators that follow quantitative e… ▽ More As opposed to evaluating computation and logic-based reasoning, current benchmarks for evaluating large language models (LLMs) in medicine are primarily focused on question-answering involving domain knowledge and descriptive reasoning. While such qualitative capabilities are vital to medical diagnosis, in real-world scenarios, doctors frequently use clinical calculators that follow quantitative equations and rule-based reasoning paradigms for evidence-based decision support. To this end, we propose MedCalc-Bench, a first-of-its-kind dataset focused on evaluating the medical calculation capability of LLMs. MedCalc-Bench contains an evaluation set of over 1000 manually reviewed instances from 55 different medical calculation tasks. Each instance in MedCalc-Bench consists of a patient note, a question requesting to compute a specific medical value, a ground truth answer, and a step-by-step explanation showing how the answer is obtained. While our evaluation results show the potential of LLMs in this area, none of them are effective enough for clinical settings. Common issues include extracting the incorrect entities, not using the correct equation or rules for a calculation task, or incorrectly performing the arithmetic for the computation. We hope our study highlights the quantitative knowledge and reasoning gaps in LLMs within medical settings, encouraging future improvements of LLMs for various clinical calculation tasks. △ Less

Submitted 30 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: Github link: https://github.com/ncbi-nlp/MedCalc-Bench HuggingFace link: https://huggingface.co/datasets/nsk7153/MedCalc-Bench

arXiv:2406.11794 [pdf, other]

DataComp-LM: In search of the next generation of training sets for language models

Authors: Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, Samir Gadre, Hritik Bansal, Etash Guha, Sedrick Keh, Kushal Arora, Saurabh Garg, Rui Xin, Niklas Muennighoff, Reinhard Heckel, Jean Mercat, Mayee Chen, Suchin Gururangan, Mitchell Wortsman, Alon Albalak, Yonatan Bitton, Marianna Nezhurina, Amro Abbas, Cheng-Yu Hsieh, Dhruba Ghosh, Josh Gardner , et al. (34 additional authors not shown)

Abstract: We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants in the DCLM benchmark can experiment with dat… ▽ More We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants in the DCLM benchmark can experiment with data curation strategies such as deduplication, filtering, and data mixing at model scales ranging from 412M to 7B parameters. As a baseline for DCLM, we conduct extensive experiments and find that model-based filtering is key to assembling a high-quality training set. The resulting dataset, DCLM-Baseline enables training a 7B parameter language model from scratch to 64% 5-shot accuracy on MMLU with 2.6T training tokens. Compared to MAP-Neo, the previous state-of-the-art in open-data language models, DCLM-Baseline represents a 6.6 percentage point improvement on MMLU while being trained with 40% less compute. Our baseline model is also comparable to Mistral-7B-v0.3 and Llama 3 8B on MMLU (63% & 66%), and performs similarly on an average of 53 natural language understanding tasks while being trained with 6.6x less compute than Llama 3 8B. Our results highlight the importance of dataset design for training language models and offer a starting point for further research on data curation. △ Less

Submitted 20 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: Project page: https://www.datacomp.ai/dclm/

arXiv:2406.10212 [pdf, other]

NeST: Neural Stress Tensor Tomography by leveraging 3D Photoelasticity

Authors: Akshat Dave, Tianyi Zhang, Aaron Young, Ramesh Raskar, Wolfgang Heidrich, Ashok Veeraraghavan

Abstract: Photoelasticity enables full-field stress analysis in transparent objects through stress-induced birefringence. Existing techniques are limited to 2D slices and require destructively slicing the object. Recovering the internal 3D stress distribution of the entire object is challenging as it involves solving a tensor tomography problem and handling phase wrap** ambiguities. We introduce NeST, an… ▽ More Photoelasticity enables full-field stress analysis in transparent objects through stress-induced birefringence. Existing techniques are limited to 2D slices and require destructively slicing the object. Recovering the internal 3D stress distribution of the entire object is challenging as it involves solving a tensor tomography problem and handling phase wrap** ambiguities. We introduce NeST, an analysis-by-synthesis approach for reconstructing 3D stress tensor fields as neural implicit representations from polarization measurements. Our key insight is to jointly handle phase unwrap** and tensor tomography using a differentiable forward model based on Jones calculus. Our non-linear model faithfully matches real captures, unlike prior linear approximations. We develop an experimental multi-axis polariscope setup to capture 3D photoelasticity and experimentally demonstrate that NeST reconstructs the internal stress distribution for objects with varying shape and force conditions. Additionally, we showcase novel applications in stress analysis, such as visualizing photoelastic fringes by virtually slicing the object and viewing photoelastic fringes from unseen viewpoints. NeST paves the way for scalable non-destructive 3D photoelastic analysis. △ Less

Submitted 24 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

Comments: Project webpage: https://akshatdave.github.io/nest

arXiv:2405.17599 [pdf, other]

A Mobility Equity Metric for Multi-Modal Intelligent Transportation Systems

Authors: Heeseung Bang, Aditya Dave, Filippos N. Tzortzoglou, Andreas A. Malikopoulos

Abstract: In this paper, we introduce a metric to evaluate the equity in mobility and a routing framework to enhance the metric within multi-modal intelligent transportation systems. The mobility equity metric (MEM) simultaneously accounts for service accessibility and transportation costs to quantify the equity and fairness in a transportation network. Finally, we develop a system planner integrated with M… ▽ More In this paper, we introduce a metric to evaluate the equity in mobility and a routing framework to enhance the metric within multi-modal intelligent transportation systems. The mobility equity metric (MEM) simultaneously accounts for service accessibility and transportation costs to quantify the equity and fairness in a transportation network. Finally, we develop a system planner integrated with MEM that aims to distribute travel demand for the transportation network, resulting in a socially optimal mobility system. Our framework results in a transportation network that is efficient in terms of travel time, improves accessibility, and ensures equity in transportation. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: 6 pages, 7 figures

arXiv:2405.14868 [pdf, other]

Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis

Authors: Basile Van Hoorick, Rundi Wu, Ege Ozguroglu, Kyle Sargent, Ruoshi Liu, Pavel Tokmakov, Achal Dave, Changxi Zheng, Carl Vondrick

Abstract: Accurate reconstruction of complex dynamic scenes from just a single viewpoint continues to be a challenging task in computer vision. Current dynamic novel view synthesis methods typically require videos from many different camera viewpoints, necessitating careful recording setups, and significantly restricting their utility in the wild as well as in terms of embodied AI applications. In this pape… ▽ More Accurate reconstruction of complex dynamic scenes from just a single viewpoint continues to be a challenging task in computer vision. Current dynamic novel view synthesis methods typically require videos from many different camera viewpoints, necessitating careful recording setups, and significantly restricting their utility in the wild as well as in terms of embodied AI applications. In this paper, we propose $\textbf{GCD}$, a controllable monocular dynamic view synthesis pipeline that leverages large-scale diffusion priors to, given a video of any scene, generate a synchronous video from any other chosen perspective, conditioned on a set of relative camera pose parameters. Our model does not require depth as input, and does not explicitly model 3D scene geometry, instead performing end-to-end video-to-video translation in order to achieve its goal efficiently. Despite being trained on synthetic multi-view video data only, zero-shot real-world generalization experiments show promising results in multiple domains, including robotics, object permanence, and driving environments. We believe our framework can potentially unlock powerful applications in rich dynamic scene understanding, perception for robotics, and interactive 3D video viewing experiences for virtual reality. △ Less

Submitted 23 May, 2024; originally announced May 2024.

Comments: Project webpage is available at: https://gcd.cs.columbia.edu/

arXiv:2405.06640 [pdf, other]

Linearizing Large Language Models

Authors: Jean Mercat, Igor Vasiljevic, Sedrick Keh, Kushal Arora, Achal Dave, Adrien Gaidon, Thomas Kollar

Abstract: Linear transformers have emerged as a subquadratic-time alternative to softmax attention and have garnered significant interest due to their fixed-size recurrent state that lowers inference cost. However, their original formulation suffers from poor scaling and underperforms compute-matched transformers. Recent linear models such as RWKV and Mamba have attempted to address these shortcomings by pr… ▽ More Linear transformers have emerged as a subquadratic-time alternative to softmax attention and have garnered significant interest due to their fixed-size recurrent state that lowers inference cost. However, their original formulation suffers from poor scaling and underperforms compute-matched transformers. Recent linear models such as RWKV and Mamba have attempted to address these shortcomings by proposing novel time-mixing and gating architectures, but pre-training large language models requires significant data and compute investments. Thus, the search for subquadratic architectures is limited by the availability of compute and quality pre-training datasets. As a cost-effective alternative to pre-training linear transformers, we propose Scalable UPtraining for Recurrent Attention (SUPRA). We present a method to uptrain existing large pre-trained transformers into Recurrent Neural Networks (RNNs) with a modest compute budget. This allows us to leverage the strong pre-training data and performance of existing transformer LLMs, while requiring 5% of the training cost. We find that our linearization technique leads to competitive performance on standard benchmarks, but we identify persistent in-context learning and long-context modeling shortfalls for even the largest linear models. Our code and models can be found at https://github.com/TRI-ML/linear_open_lm. △ Less

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2404.11511 [pdf, other]

Event Cameras Meet SPADs for High-Speed, Low-Bandwidth Imaging

Authors: Manasi Muglikar, Siddharth Somasundaram, Akshat Dave, Edoardo Charbon, Ramesh Raskar, Davide Scaramuzza

Abstract: Traditional cameras face a trade-off between low-light performance and high-speed imaging: longer exposure times to capture sufficient light results in motion blur, whereas shorter exposures result in Poisson-corrupted noisy images. While burst photography techniques help mitigate this tradeoff, conventional cameras are fundamentally limited in their sensor noise characteristics. Event cameras and… ▽ More Traditional cameras face a trade-off between low-light performance and high-speed imaging: longer exposure times to capture sufficient light results in motion blur, whereas shorter exposures result in Poisson-corrupted noisy images. While burst photography techniques help mitigate this tradeoff, conventional cameras are fundamentally limited in their sensor noise characteristics. Event cameras and single-photon avalanche diode (SPAD) sensors have emerged as promising alternatives to conventional cameras due to their desirable properties. SPADs are capable of single-photon sensitivity with microsecond temporal resolution, and event cameras can measure brightness changes up to 1 MHz with low bandwidth requirements. We show that these properties are complementary, and can help achieve low-light, high-speed image reconstruction with low bandwidth requirements. We introduce a sensor fusion framework to combine SPADs with event cameras to improves the reconstruction of high-speed, low-light scenes while reducing the high bandwidth cost associated with using every SPAD frame. Our evaluation, on both synthetic and real sensor data, demonstrates significant enhancements ( > 5 dB PSNR) in reconstructing low-light scenes at high temporal resolution (100 kHz) compared to conventional cameras. Event-SPAD fusion shows great promise for real-world applications, such as robotics or medical imaging. △ Less

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2403.13199 [pdf, other]

DecentNeRFs: Decentralized Neural Radiance Fields from Crowdsourced Images

Authors: Zaid Tasneem, Akshat Dave, Abhishek Singh, Kushagra Tiwary, Praneeth Vepakomma, Ashok Veeraraghavan, Ramesh Raskar

Abstract: Neural radiance fields (NeRFs) show potential for transforming images captured worldwide into immersive 3D visual experiences. However, most of this captured visual data remains siloed in our camera rolls as these images contain personal details. Even if made public, the problem of learning 3D representations of billions of scenes captured daily in a centralized manner is computationally intractab… ▽ More Neural radiance fields (NeRFs) show potential for transforming images captured worldwide into immersive 3D visual experiences. However, most of this captured visual data remains siloed in our camera rolls as these images contain personal details. Even if made public, the problem of learning 3D representations of billions of scenes captured daily in a centralized manner is computationally intractable. Our approach, DecentNeRF, is the first attempt at decentralized, crowd-sourced NeRFs that require $\sim 10^4\times$ less server computing for a scene than a centralized approach. Instead of sending the raw data, our approach requires users to send a 3D representation, distributing the high computation cost of training centralized NeRFs between the users. It learns photorealistic scene representations by decomposing users' 3D views into personal and global NeRFs and a novel optimally weighted aggregation of only the latter. We validate the advantage of our approach to learn NeRFs with photorealism and minimal server computation cost on structured synthetic and real-world photo tourism datasets. We further analyze how secure aggregation of global NeRFs in DecentNeRF minimizes the undesired reconstruction of personal content by the server. △ Less

Submitted 28 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.08540 [pdf, other]

Language models scale reliably with over-training and on downstream tasks

Authors: Samir Yitzhak Gadre, Georgios Smyrnis, Vaishaal Shankar, Suchin Gururangan, Mitchell Wortsman, Rulin Shao, Jean Mercat, Alex Fang, Jeffrey Li, Sedrick Keh, Rui Xin, Marianna Nezhurina, Igor Vasiljevic, Jenia Jitsev, Luca Soldaini, Alexandros G. Dimakis, Gabriel Ilharco, Pang Wei Koh, Shuran Song, Thomas Kollar, Yair Carmon, Achal Dave, Reinhard Heckel, Niklas Muennighoff, Ludwig Schmidt

Abstract: Scaling laws are useful guides for derisking expensive training runs, as they predict performance of large models using cheaper, small-scale experiments. However, there remain gaps between current scaling studies and how language models are ultimately trained and evaluated. For instance, scaling is usually studied in the compute-optimal training regime (i.e., "Chinchilla optimal" regime). In contr… ▽ More Scaling laws are useful guides for derisking expensive training runs, as they predict performance of large models using cheaper, small-scale experiments. However, there remain gaps between current scaling studies and how language models are ultimately trained and evaluated. For instance, scaling is usually studied in the compute-optimal training regime (i.e., "Chinchilla optimal" regime). In contrast, models are often over-trained to reduce inference costs. Moreover, scaling laws mostly predict loss on next-token prediction, but models are usually compared on downstream task performance. To address both shortcomings, we create a testbed of 104 models with 0.011B to 6.9B parameters trained with various numbers of tokens on three data distributions. First, we fit scaling laws that extrapolate in both the amount of over-training and the number of model parameters. This enables us to predict the validation loss of a 1.4B parameter, 900B token run (i.e., 32$\times$ over-trained) and a 6.9B parameter, 138B token run (i.e., a compute-optimal run)$\unicode{x2014}$each from experiments that take 300$\times$ less compute. Second, we relate the perplexity of a language model to its downstream task performance by proposing a power law. We use this law to predict top-1 error averaged over downstream tasks for the two aforementioned models, using experiments that take 20$\times$ less compute. Our experiments are available at https://github.com/mlfoundations/scaling. △ Less

Submitted 14 June, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.05742 [pdf, other]

Safe Merging in Mixed Traffic with Confidence

Authors: Heeseung Bang, Aditya Dave, Andreas A. Malikopoulos

Abstract: In this letter, we present an approach for learning human driving behavior, without relying on specific model structures or prior distributions, in a mixed-traffic environment where connected and automated vehicles (CAVs) coexist with human-driven vehicles (HDVs). We employ conformal prediction to obtain theoretical safety guarantees and use real-world traffic data to validate our approach. Then,… ▽ More In this letter, we present an approach for learning human driving behavior, without relying on specific model structures or prior distributions, in a mixed-traffic environment where connected and automated vehicles (CAVs) coexist with human-driven vehicles (HDVs). We employ conformal prediction to obtain theoretical safety guarantees and use real-world traffic data to validate our approach. Then, we design a controller that ensures effective merging of CAVs with HDVs with safety guarantees. We provide numerical simulations to illustrate the efficacy of the control approach. △ Less

Submitted 8 March, 2024; originally announced March 2024.

Comments: 6 pages, 5 figures

arXiv:2403.05715 [pdf, other]

A Framework for Effective AI Recommendations in Cyber-Physical-Human Systems

Authors: Aditya Dave, Heeseung Bang, Andreas A. Malikopoulos

Abstract: Many cyber-physical-human systems (CPHS) involve a human decision-maker who may receive recommendations from an artificial intelligence (AI) platform while holding the ultimate responsibility of making decisions. In such CPHS applications, the human decision-maker may depart from an optimal recommended decision and instead implement a different one for various reasons. In this letter, we develop a… ▽ More Many cyber-physical-human systems (CPHS) involve a human decision-maker who may receive recommendations from an artificial intelligence (AI) platform while holding the ultimate responsibility of making decisions. In such CPHS applications, the human decision-maker may depart from an optimal recommended decision and instead implement a different one for various reasons. In this letter, we develop a rigorous framework to overcome this challenge. In our framework, we consider that humans may deviate from AI recommendations as they perceive and interpret the system's state in a different way than the AI platform. We establish the structural properties of optimal recommendation strategies and develop an approximate human model (AHM) used by the AI. We provide theoretical bounds on the optimality gap that arises from an AHM and illustrate the efficacy of our results in a numerical example. △ Less

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2402.06695 [pdf, other]

Integrating LLMs for Explainable Fault Diagnosis in Complex Systems

Authors: Akshay J. Dave, Tat Nghia Nguyen, Richard B. Vilim

Abstract: This paper introduces an integrated system designed to enhance the explainability of fault diagnostics in complex systems, such as nuclear power plants, where operator understanding is critical for informed decision-making. By combining a physics-based diagnostic tool with a Large Language Model, we offer a novel solution that not only identifies faults but also provides clear, understandable expl… ▽ More This paper introduces an integrated system designed to enhance the explainability of fault diagnostics in complex systems, such as nuclear power plants, where operator understanding is critical for informed decision-making. By combining a physics-based diagnostic tool with a Large Language Model, we offer a novel solution that not only identifies faults but also provides clear, understandable explanations of their causes and implications. The system's efficacy is demonstrated through application to a molten salt facility, showcasing its ability to elucidate the connections between diagnosed faults and sensor data, answer operator queries, and evaluate historical sensor anomalies. Our approach underscores the importance of merging model-based diagnostics with advanced AI to improve the reliability and transparency of autonomous systems. △ Less

Submitted 8 February, 2024; originally announced February 2024.

Comments: 4 pages

arXiv:2401.14398 [pdf, other]

pix2gestalt: Amodal Segmentation by Synthesizing Wholes

Authors: Ege Ozguroglu, Ruoshi Liu, Dídac Surís, Dian Chen, Achal Dave, Pavel Tokmakov, Carl Vondrick

Abstract: We introduce pix2gestalt, a framework for zero-shot amodal segmentation, which learns to estimate the shape and appearance of whole objects that are only partially visible behind occlusions. By capitalizing on large-scale diffusion models and transferring their representations to this task, we learn a conditional diffusion model for reconstructing whole objects in challenging zero-shot cases, incl… ▽ More We introduce pix2gestalt, a framework for zero-shot amodal segmentation, which learns to estimate the shape and appearance of whole objects that are only partially visible behind occlusions. By capitalizing on large-scale diffusion models and transferring their representations to this task, we learn a conditional diffusion model for reconstructing whole objects in challenging zero-shot cases, including examples that break natural and physical priors, such as art. As training data, we use a synthetically curated dataset containing occluded objects paired with their whole counterparts. Experiments show that our approach outperforms supervised baselines on established benchmarks. Our model can furthermore be used to significantly improve the performance of existing object recognition and 3D reconstruction methods in the presence of occlusions. △ Less

Submitted 25 January, 2024; originally announced January 2024.

Comments: Website: https://gestalt.cs.columbia.edu/

arXiv:2401.13020 [pdf, other]

A Safe Reinforcement Learning Algorithm for Supervisory Control of Power Plants

Authors: Yixuan Sun, Sami Khairy, Richard B. Vilim, Rui Hu, Akshay J. Dave

Abstract: Traditional control theory-based methods require tailored engineering for each system and constant fine-tuning. In power plant control, one often needs to obtain a precise representation of the system dynamics and carefully design the control scheme accordingly. Model-free Reinforcement learning (RL) has emerged as a promising solution for control tasks due to its ability to learn from trial-and-e… ▽ More Traditional control theory-based methods require tailored engineering for each system and constant fine-tuning. In power plant control, one often needs to obtain a precise representation of the system dynamics and carefully design the control scheme accordingly. Model-free Reinforcement learning (RL) has emerged as a promising solution for control tasks due to its ability to learn from trial-and-error interactions with the environment. It eliminates the need for explicitly modeling the environment's dynamics, which is potentially inaccurate. However, the direct imposition of state constraints in power plant control raises challenges for standard RL methods. To address this, we propose a chance-constrained RL algorithm based on Proximal Policy Optimization for supervisory control. Our method employs Lagrangian relaxation to convert the constrained optimization problem into an unconstrained objective, where trainable Lagrange multipliers enforce the state constraints. Our approach achieves the smallest distance of violation and violation rate in a load-follow maneuver for an advanced Nuclear Power Plant design. △ Less

Submitted 23 January, 2024; originally announced January 2024.

arXiv:2401.10831 [pdf, other]

Understanding Video Transformers via Universal Concept Discovery

Authors: Matthew Kowal, Achal Dave, Rares Ambrus, Adrien Gaidon, Konstantinos G. Derpanis, Pavel Tokmakov

Abstract: This paper studies the problem of concept-based interpretability of transformer representations for videos. Concretely, we seek to explain the decision-making process of video transformers based on high-level, spatiotemporal concepts that are automatically discovered. Prior research on concept-based interpretability has concentrated solely on image-level tasks. Comparatively, video models deal wit… ▽ More This paper studies the problem of concept-based interpretability of transformer representations for videos. Concretely, we seek to explain the decision-making process of video transformers based on high-level, spatiotemporal concepts that are automatically discovered. Prior research on concept-based interpretability has concentrated solely on image-level tasks. Comparatively, video models deal with the added temporal dimension, increasing complexity and posing challenges in identifying dynamic concepts over time. In this work, we systematically address these challenges by introducing the first Video Transformer Concept Discovery (VTCD) algorithm. To this end, we propose an efficient approach for unsupervised identification of units of video transformer representations - concepts, and ranking their importance to the output of a model. The resulting concepts are highly interpretable, revealing spatio-temporal reasoning mechanisms and object-centric representations in unstructured video models. Performing this analysis jointly over a diverse set of supervised and self-supervised representations, we discover that some of these mechanism are universal in video transformers. Finally, we show that VTCD can be used for fine-grained action recognition and video object segmentation. △ Less

Submitted 10 April, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

Comments: CVPR 2024 (Highlight)

arXiv:2312.16215 [pdf, other]

SUNDIAL: 3D Satellite Understanding through Direct, Ambient, and Complex Lighting Decomposition

Authors: Nikhil Behari, Akshat Dave, Kushagra Tiwary, William Yang, Ramesh Raskar

Abstract: 3D modeling from satellite imagery is essential in areas of environmental science, urban planning, agriculture, and disaster response. However, traditional 3D modeling techniques face unique challenges in the remote sensing context, including limited multi-view baselines over extensive regions, varying direct, ambient, and complex illumination conditions, and time-varying scene changes across capt… ▽ More 3D modeling from satellite imagery is essential in areas of environmental science, urban planning, agriculture, and disaster response. However, traditional 3D modeling techniques face unique challenges in the remote sensing context, including limited multi-view baselines over extensive regions, varying direct, ambient, and complex illumination conditions, and time-varying scene changes across captures. In this work, we introduce SUNDIAL, a comprehensive approach to 3D reconstruction of satellite imagery using neural radiance fields. We jointly learn satellite scene geometry, illumination components, and sun direction in this single-model approach, and propose a secondary shadow ray casting technique to 1) improve scene geometry using oblique sun angles to render shadows, 2) enable physically-based disentanglement of scene albedo and illumination, and 3) determine the components of illumination from direct, ambient (sky), and complex sources. To achieve this, we incorporate lighting cues and geometric priors from remote sensing literature in a neural rendering approach, modeling physical properties of satellite scenes such as shadows, scattered sky illumination, and complex illumination and shading of vegetation and water. We evaluate the performance of SUNDIAL against existing NeRF-based techniques for satellite scene modeling and demonstrate improved scene and lighting disentanglement, novel view and lighting rendering, and geometry and sun direction estimation on challenging scenes with small baselines, sparse inputs, and variable illumination. △ Less

Submitted 23 December, 2023; originally announced December 2023.

Comments: 8 pages, 6 figures

arXiv:2312.14184 [pdf]

Large Language Models in Medical Term Classification and Unexpected Misalignment Between Response and Reasoning

Authors: Xiaodan Zhang, Sandeep Vemulapalli, Nabasmita Talukdar, Sumyeong Ahn, Jiankun Wang, Han Meng, Sardar Mehtab Bin Murtaza, Aakash Ajay Dave, Dmitry Leshchiner, Dimitri F. Joseph, Martin Witteveen-Lane, Dave Chesla, Jiayu Zhou, Bin Chen

Abstract: This study assesses the ability of state-of-the-art large language models (LLMs) including GPT-3.5, GPT-4, Falcon, and LLaMA 2 to identify patients with mild cognitive impairment (MCI) from discharge summaries and examines instances where the models' responses were misaligned with their reasoning. Utilizing the MIMIC-IV v2.2 database, we focused on a cohort aged 65 and older, verifying MCI diagnos… ▽ More This study assesses the ability of state-of-the-art large language models (LLMs) including GPT-3.5, GPT-4, Falcon, and LLaMA 2 to identify patients with mild cognitive impairment (MCI) from discharge summaries and examines instances where the models' responses were misaligned with their reasoning. Utilizing the MIMIC-IV v2.2 database, we focused on a cohort aged 65 and older, verifying MCI diagnoses against ICD codes and expert evaluations. The data was partitioned into training, validation, and testing sets in a 7:2:1 ratio for model fine-tuning and evaluation, with an additional metastatic cancer dataset from MIMIC III used to further assess reasoning consistency. GPT-4 demonstrated superior interpretative capabilities, particularly in response to complex prompts, yet displayed notable response-reasoning inconsistencies. In contrast, open-source models like Falcon and LLaMA 2 achieved high accuracy but lacked explanatory reasoning, underscoring the necessity for further research to optimize both performance and interpretability. The study emphasizes the significance of prompt engineering and the need for further exploration into the unexpected reasoning-response misalignment observed in GPT-4. The results underscore the promise of incorporating LLMs into healthcare diagnostics, contingent upon methodological advancements to ensure accuracy and clinical coherence of AI-generated outputs, thereby improving the trustworthiness of LLMs for medical decision-making. △ Less

Submitted 19 December, 2023; originally announced December 2023.

arXiv:2312.12433 [pdf, other]

TAO-Amodal: A Benchmark for Tracking Any Object Amodally

Authors: Cheng-Yen Hsieh, Kaihua Chen, Achal Dave, Tarasha Khurana, Deva Ramanan

Abstract: Amodal perception, the ability to comprehend complete object structures from partial visibility, is a fundamental skill, even for infants. Its significance extends to applications like autonomous driving, where a clear understanding of heavily occluded objects is essential. However, modern detection and tracking algorithms often overlook this critical capability, perhaps due to the prevalence of \… ▽ More Amodal perception, the ability to comprehend complete object structures from partial visibility, is a fundamental skill, even for infants. Its significance extends to applications like autonomous driving, where a clear understanding of heavily occluded objects is essential. However, modern detection and tracking algorithms often overlook this critical capability, perhaps due to the prevalence of \textit{modal} annotations in most benchmarks. To address the scarcity of amodal benchmarks, we introduce TAO-Amodal, featuring 833 diverse categories in thousands of video sequences. Our dataset includes \textit{amodal} and modal bounding boxes for visible and partially or fully occluded objects, including those that are partially out of the camera frame. We investigate the current lay of the land in both amodal tracking and detection by benchmarking state-of-the-art modal trackers and amodal segmentation methods. We find that existing methods, even when adapted for amodal tracking, struggle to detect and track objects under heavy occlusion. To mitigate this, we explore simple finetuning schemes that can increase the amodal tracking and detection metrics of occluded objects by 2.1\% and 3.3\%. △ Less

Submitted 2 April, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

Comments: Project Page: https://tao-amodal.github.io

arXiv:2311.16588 [pdf]

Ascle: A Python Natural Language Processing Toolkit for Medical Text Generation

Authors: Rui Yang, Qingcheng Zeng, Keen You, Yujie Qiao, Lucas Huang, Chia-Chun Hsieh, Benjamin Rosand, Jeremy Goldwasser, Amisha D Dave, Tiarnan D. L. Keenan, Emily Y Chew, Dragomir Radev, Zhiyong Lu, Hua Xu, Qingyu Chen, Irene Li

Abstract: This study introduces Ascle, a pioneering natural language processing (NLP) toolkit designed for medical text generation. Ascle is tailored for biomedical researchers and healthcare professionals with an easy-to-use, all-in-one solution that requires minimal programming expertise. For the first time, Ascle evaluates and provides interfaces for the latest pre-trained language models, encompassing f… ▽ More This study introduces Ascle, a pioneering natural language processing (NLP) toolkit designed for medical text generation. Ascle is tailored for biomedical researchers and healthcare professionals with an easy-to-use, all-in-one solution that requires minimal programming expertise. For the first time, Ascle evaluates and provides interfaces for the latest pre-trained language models, encompassing four advanced and challenging generative functions: question-answering, text summarization, text simplification, and machine translation. In addition, Ascle integrates 12 essential NLP functions, along with query and search capabilities for clinical databases. The toolkit, its models, and associated data are publicly available via https://github.com/Yale-LILY/MedGen. △ Less

Submitted 9 December, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

Comments: 5 figures, 4 tables

arXiv:2311.03666 [pdf, other]

Stochastic Control with Distributionally Robust Constraints for Cyber-Physical Systems Vulnerable to Attacks

Authors: Nishanth Venkatesh, Aditya Dave, Ioannis Faros, Andreas A. Malikopoulos

Abstract: In this paper, we investigate the control of a cyber-physical system (CPS) while accounting for its vulnerability to external attacks. We formulate a constrained stochastic problem with a robust constraint to ensure robust operation against potential attacks. We seek to minimize the expected cost subject to a constraint limiting the worst-case expected damage an attacker can impose on the CPS. We… ▽ More In this paper, we investigate the control of a cyber-physical system (CPS) while accounting for its vulnerability to external attacks. We formulate a constrained stochastic problem with a robust constraint to ensure robust operation against potential attacks. We seek to minimize the expected cost subject to a constraint limiting the worst-case expected damage an attacker can impose on the CPS. We present a dynamic programming decomposition to compute the optimal control strategy in this robust-constrained formulation and prove its recursive feasibility. We also illustrate the utility of our results by applying them to a numerical simulation. △ Less

Submitted 6 November, 2023; originally announced November 2023.

Comments: 8 pages, 2 Figures with 3 sub-figures each, submitted to the ECC 2024 conference for review

arXiv:2310.06992 [pdf, other]

Zero-Shot Open-Vocabulary Tracking with Large Pre-Trained Models

Authors: Wen-Hsuan Chu, Adam W. Harley, Pavel Tokmakov, Achal Dave, Leonidas Guibas, Katerina Fragkiadaki

Abstract: Object tracking is central to robot perception and scene understanding. Tracking-by-detection has long been a dominant paradigm for object tracking of specific object categories. Recently, large-scale pre-trained models have shown promising advances in detecting and segmenting objects and parts in 2D static images in the wild. This begs the question: can we re-purpose these large-scale pre-trained… ▽ More Object tracking is central to robot perception and scene understanding. Tracking-by-detection has long been a dominant paradigm for object tracking of specific object categories. Recently, large-scale pre-trained models have shown promising advances in detecting and segmenting objects and parts in 2D static images in the wild. This begs the question: can we re-purpose these large-scale pre-trained static image models for open-vocabulary video tracking? In this paper, we re-purpose an open-vocabulary detector, segmenter, and dense optical flow estimator, into a model that tracks and segments objects of any category in 2D videos. Our method predicts object and part tracks with associated language descriptions in monocular videos, rebuilding the pipeline of Tractor with modern large pre-trained models for static image detection and segmentation: we detect open-vocabulary object instances and propagate their boxes from frame to frame using a flow-based motion model, refine the propagated boxes with the box regression module of the visual detector, and prompt an open-world segmenter with the refined box to segment the objects. We decide the termination of an object track based on the objectness score of the propagated boxes, as well as forward-backward optical flow consistency. We re-identify objects across occlusions using deep feature matching. We show that our model achieves strong performance on multiple established video object segmentation and tracking benchmarks, and can produce reasonable tracks in manipulation data. In particular, our model outperforms previous state-of-the-art in UVO and BURST, benchmarks for open-world object tracking and segmentation, despite never being explicitly trained for tracking. We hope that our approach can serve as a simple and extensible framework for future research. △ Less

Submitted 25 January, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

Comments: Project page available at https://wenhsuanchu.github.io/ovtracktor/

arXiv:2310.03047 [pdf, other]

Differentiable Modeling and Optimization of Battery Electrolyte Mixtures Using Geometric Deep Learning

Authors: Shang Zhu, Bharath Ramsundar, Emil Annevelink, Hongyi Lin, Adarsh Dave, Pin-Wen Guan, Kevin Gering, Venkatasubramanian Viswanathan

Abstract: Electrolytes play a critical role in designing next-generation battery systems, by allowing efficient ion transfer, preventing charge transfer, and stabilizing electrode-electrolyte interfaces. In this work, we develop a differentiable geometric deep learning (GDL) model for chemical mixtures, DiffMix, which is applied in guiding robotic experimentation and optimization towards fast-charging batte… ▽ More Electrolytes play a critical role in designing next-generation battery systems, by allowing efficient ion transfer, preventing charge transfer, and stabilizing electrode-electrolyte interfaces. In this work, we develop a differentiable geometric deep learning (GDL) model for chemical mixtures, DiffMix, which is applied in guiding robotic experimentation and optimization towards fast-charging battery electrolytes. In particular, we extend mixture thermodynamic and transport laws by creating GDL-learnable physical coefficients. We evaluate our model with mixture thermodynamics and ion transport properties, where we show improved prediction accuracy and model robustness of DiffMix than its purely data-driven variants. Furthermore, with a robotic experimentation setup, Clio, we improve ionic conductivity of electrolytes by over 18.8% within 10 experimental steps, via differentiable optimization built on DiffMix gradients. By combining GDL, mixture physics laws, and robotic experimentation, DiffMix expands the predictive modeling methods for chemical mixtures and enables efficient optimization in large chemical spaces. △ Less

Submitted 1 November, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

arXiv:2309.12211 [pdf, other]

Physics-informed State-space Neural Networks for Transport Phenomena

Authors: Akshay J. Dave, Richard B. Vilim

Abstract: This work introduces Physics-informed State-space neural network Models (PSMs), a novel solution to achieving real-time optimization, flexibility, and fault tolerance in autonomous systems, particularly in transport-dominated systems such as chemical, biomedical, and power plants. Traditional data-driven methods fall short due to a lack of physical constraints like mass conservation; PSMs address… ▽ More This work introduces Physics-informed State-space neural network Models (PSMs), a novel solution to achieving real-time optimization, flexibility, and fault tolerance in autonomous systems, particularly in transport-dominated systems such as chemical, biomedical, and power plants. Traditional data-driven methods fall short due to a lack of physical constraints like mass conservation; PSMs address this issue by training deep neural networks with sensor data and physics-informing using components' Partial Differential Equations (PDEs), resulting in a physics-constrained, end-to-end differentiable forward dynamics model. Through two in silico experiments -- a heated channel and a cooling system loop -- we demonstrate that PSMs offer a more accurate approach than a purely data-driven model. In the former experiment, PSMs demonstrated significantly lower average root-mean-square errors across test datasets compared to a purely data-driven neural network, with reductions of 44 %, 48 %, and 94 % in predicting pressure, velocity, and temperature, respectively. Beyond accuracy, PSMs demonstrate a compelling multitask capability, making them highly versatile. In this work, we showcase two: supervisory control of a nonlinear system through a sequentially updated state-space representation and the proposal of a diagnostic algorithm using residuals from each of the PDEs. The former demonstrates PSMs' ability to handle constant and time-dependent constraints, while the latter illustrates their value in system diagnostics and fault detection. We further posit that PSMs could serve as a foundation for Digital Twins, constantly updated digital representations of physical systems. △ Less

Submitted 18 December, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

Comments: 19 pages, 13 figures

arXiv:2309.06519 [pdf, other]

doi 10.1109/LCSYS.2023.3339591

A Q-learning Approach for Adherence-Aware Recommendations

Authors: Ioannis Faros, Aditya Dave, Andreas A. Malikopoulos

Abstract: In many real-world scenarios involving high-stakes and safety implications, a human decision-maker (HDM) may receive recommendations from an artificial intelligence while holding the ultimate responsibility of making decisions. In this letter, we develop an "adherence-aware Q-learning" algorithm to address this problem. The algorithm learns the "adherence level" that captures the frequency with wh… ▽ More In many real-world scenarios involving high-stakes and safety implications, a human decision-maker (HDM) may receive recommendations from an artificial intelligence while holding the ultimate responsibility of making decisions. In this letter, we develop an "adherence-aware Q-learning" algorithm to address this problem. The algorithm learns the "adherence level" that captures the frequency with which an HDM follows the recommended actions and derives the best recommendation policy in real time. We prove the convergence of the proposed Q-learning algorithm to the optimal value and evaluate its performance across various scenarios. △ Less

Submitted 12 September, 2023; originally announced September 2023.

Journal ref: IEEE Control Systems Letters (L-CSS), 2024

arXiv:2309.03981 [pdf, other]

Routing in Mixed Transportation Systems for Mobility Equity

Authors: Heeseung Bang, Aditya Dave, Andreas A. Malikopoulos

Abstract: This letter proposes a routing framework in mixed transportation systems for improving mobility equity. We present a strategic routing game that governs interactions between compliant and noncompliant vehicles, where noncompliant vehicles are modeled with cognitive hierarchy theory. Then, we introduce a mobility equity metric (MEM) to quantify the accessibility and fairness in the transportation n… ▽ More This letter proposes a routing framework in mixed transportation systems for improving mobility equity. We present a strategic routing game that governs interactions between compliant and noncompliant vehicles, where noncompliant vehicles are modeled with cognitive hierarchy theory. Then, we introduce a mobility equity metric (MEM) to quantify the accessibility and fairness in the transportation network. We integrate the MEM into the routing framework to optimize it with adjustable weights for different transportation modes. The proposed approach bridges the gap between technological advancements and societal goals in mixed transportation systems to enhance efficiency and equity. We provide numerical examples and analysis of the results. △ Less

Submitted 7 September, 2023; originally announced September 2023.

Comments: 6 pages, 5 figures

arXiv:2309.01238 [pdf, other]

Performance-Sensitive Potential Functions for Efficient Flow of Connected and Automated Vehicles

Authors: Filippos N. Tzortzoglou, Dionysios Theodosis, Aditya Dave, Andreas Malikopoulos

Abstract: Connected and automated vehicles (CAVs) provide the most intriguing opportunity for enabling users to monitor transportation network conditions and make better decisions for improving safety and transportation efficiency. In this paper, we address the problem of effectively coordinating CAVs on lane-based roadways. Our approach utilizes potential functions to generate repulsive forces between CAVs… ▽ More Connected and automated vehicles (CAVs) provide the most intriguing opportunity for enabling users to monitor transportation network conditions and make better decisions for improving safety and transportation efficiency. In this paper, we address the problem of effectively coordinating CAVs on lane-based roadways. Our approach utilizes potential functions to generate repulsive forces between CAVs that ensure collision avoidance. However, such potential functions can lead to unrealistic acceleration profiles and large inter-vehicle distances. The primary contribution of this work is the introduction of performance-sensitive potential functions to address these challenges. In our approach, the parameters of a potential function are determined through an optimization problem aiming to reduce both acceleration and inter-vehicle distances. To circumvent the computational implications due to the complexity of the resulting optimization problem that prevents the derivation of a real-time solution, we train a neural network model to learn the map** of initial conditions to optimal parameters derived offline. Then, we prove sufficient criteria for the sampled-data model to ensure that the neural network output does not activate any of the state and safety constraints. Finally, we provide simulation results to demonstrate the effectiveness of the proposed approach. △ Less

Submitted 3 September, 2023; originally announced September 2023.

arXiv:2306.04616 [pdf]

doi 10.1088/2053-1583/acfa0f

Dielectric breakdown and sub-wavelength patterning of monolayer hexagonal boron nitride using femtosecond pulses

Authors: Sabeeh Irfan Ahmad, Emmanuel Sarpong, Arpit Dave, Hsin-Yu Yao, Joel M. Solomon, **g-Kai Jiang, Chih-Wei Luo, Wen-Hao Chang, Tsing-Hua Her

Abstract: Hexagonal boron nitride (hBN) has emerged as a promising two-dimensional (2D) material for many applications in photonics. Although its linear and nonlinear optical properties have been extensively studied, its interaction with high-intensity laser pulses, which is important for high-harmonic generation, fabricating quantum emitters, and maskless patterning of hBN, has not been investigated. Here… ▽ More Hexagonal boron nitride (hBN) has emerged as a promising two-dimensional (2D) material for many applications in photonics. Although its linear and nonlinear optical properties have been extensively studied, its interaction with high-intensity laser pulses, which is important for high-harmonic generation, fabricating quantum emitters, and maskless patterning of hBN, has not been investigated. Here we report the first study of dielectric breakdown in hBN monolayers induced by single femtosecond laser pulses. We show that hBN has the highest breakdown threshold among all existing 2D materials. This enables us to observe clearly for the first time a linear dependence of breakdown threshold on the bandgap energy for 2D materials, demonstrating such a linear dependency is a universal scaling law independent of the dimensionality. We also observe counter-intuitively that hBN, which has a larger bandgap and mechanical strength than quartz, has a lower breakdown threshold. This implies carrier generation in hBN is much more efficient. Furthermore, we demonstrate the clean removal of hBN without damage to the surrounding hBN film or the substrate, indicating that hBN is optically very robust. The ablated features are shown to possess very small edge roughness, which is attributed to its ultrahigh fracture toughness. Finally, we demonstrate femtosecond laser patterning of hBN with sub-wavelength resolution, including an isolated stripe width of 200 nm. Our work advances the knowledge of light-hBN interaction in the strong field regime and firmly establishes femtosecond lasers as novel and promising tools for one-step deterministic patterning of hBN monolayers. △ Less

Submitted 7 June, 2023; originally announced June 2023.

Comments: 21 pages in total. 16 pages in the main text, the rest are supplementary. 6 figures in the main text, 5 figures in the supplementary data

Journal ref: https://iopscience.iop.org/article/10.1088/2053-1583/acfa0f

arXiv:2304.11489 [pdf, other]

FVCARE:Formal Verification of Security Primitives in Resilient Embedded SoCs

Authors: Avani Dave, Nilanjan Banerjee, Chintan Patel

Abstract: With the increased utilization, the small embedded and IoT devices have become an attractive target for sophisticated attacks that can exploit the devices security critical information and data in malevolent activities. Secure boot and Remote Attestation (RA) techniques verifies the integrity of the devices software state at boot-time and runtime. Correct implementation and formal verification of… ▽ More With the increased utilization, the small embedded and IoT devices have become an attractive target for sophisticated attacks that can exploit the devices security critical information and data in malevolent activities. Secure boot and Remote Attestation (RA) techniques verifies the integrity of the devices software state at boot-time and runtime. Correct implementation and formal verification of these security primitives provide strong security guarantees and enhance user confidence. The formal verification of these security primitives is considered challenging, as it involves complex hardware software interactions, semantics gaps and requires bit-precise reasoning. To address these challenges, this paper presents FVCARE an end to end system co-verification framework. It also defines the security properties for resilient small embedded systems. FVCARE divides the end to end system co verification problem into two modules: 1) verifying the (bit precise) initial system settings, registers, and access control policies by hardware verification techniques, and 2) verifying the system specification, security properties, and functional correctness using source-level software abstraction of the hardware. The evaluation of proposed techniques on SRACARE based systems demonstrates its efficacy in security co verification. △ Less

Submitted 22 April, 2023; originally announced April 2023.

arXiv:2304.07389 [pdf, other]

Shape of You: Precise 3D shape estimations for diverse body types

Authors: Rohan Sarkar, Achal Dave, Gerard Medioni, Benjamin Biggs

Abstract: This paper presents Shape of You (SoY), an approach to improve the accuracy of 3D body shape estimation for vision-based clothing recommendation systems. While existing methods have successfully estimated 3D poses, there remains a lack of work in precise shape estimation, particularly for diverse human bodies. To address this gap, we propose two loss functions that can be readily integrated into p… ▽ More This paper presents Shape of You (SoY), an approach to improve the accuracy of 3D body shape estimation for vision-based clothing recommendation systems. While existing methods have successfully estimated 3D poses, there remains a lack of work in precise shape estimation, particularly for diverse human bodies. To address this gap, we propose two loss functions that can be readily integrated into parametric 3D human reconstruction pipelines. Additionally, we propose a test-time optimization routine that further improves quality. Our method improves over the recent SHAPY method by 17.7% on the challenging SSP-3D dataset. We consider our work to be a step towards a more accurate 3D shape estimation system that works reliably on diverse body types and holds promise for practical applications in the fashion industry. △ Less

Submitted 14 April, 2023; originally announced April 2023.

arXiv:2304.01308 [pdf, other]

Role of Transients in Two-Bounce Non-Line-of-Sight Imaging

Authors: Siddharth Somasundaram, Akshat Dave, Connor Henley, Ashok Veeraraghavan, Ramesh Raskar

Abstract: The goal of non-line-of-sight (NLOS) imaging is to image objects occluded from the camera's field of view using multiply scattered light. Recent works have demonstrated the feasibility of two-bounce (2B) NLOS imaging by scanning a laser and measuring cast shadows of occluded objects in scenes with two relay surfaces. In this work, we study the role of time-of-flight (ToF) measurements, \ie transie… ▽ More The goal of non-line-of-sight (NLOS) imaging is to image objects occluded from the camera's field of view using multiply scattered light. Recent works have demonstrated the feasibility of two-bounce (2B) NLOS imaging by scanning a laser and measuring cast shadows of occluded objects in scenes with two relay surfaces. In this work, we study the role of time-of-flight (ToF) measurements, \ie transients, in 2B-NLOS under multiplexed illumination. Specifically, we study how ToF information can reduce the number of measurements and spatial resolution needed for shape reconstruction. We present our findings with respect to tradeoffs in (1) temporal resolution, (2) spatial resolution, and (3) number of image captures by studying SNR and recoverability as functions of system parameters. This leads to a formal definition of the mathematical constraints for 2B lidar. We believe that our work lays an analytical groundwork for design of future NLOS imaging systems, especially as ToF sensors become increasingly ubiquitous. △ Less

Submitted 3 April, 2023; originally announced April 2023.

arXiv:2304.00397 [pdf, other]

Connected and Automated Vehicles in Mixed-Traffic: Learning Human Driver Behavior for Effective On-Ramp Merging

Authors: Nishanth Venkatesh, Viet-Anh Le, Aditya Dave, Andreas A. Malikopoulos

Abstract: Highway merging scenarios featuring mixed traffic conditions pose significant modeling and control challenges for connected and automated vehicles (CAVs) interacting with incoming on-ramp human-driven vehicles (HDVs). In this paper, we present an approach to learn an approximate information state model of CAV-HDV interactions for a CAV to maneuver safely during highway merging. In our approach, th… ▽ More Highway merging scenarios featuring mixed traffic conditions pose significant modeling and control challenges for connected and automated vehicles (CAVs) interacting with incoming on-ramp human-driven vehicles (HDVs). In this paper, we present an approach to learn an approximate information state model of CAV-HDV interactions for a CAV to maneuver safely during highway merging. In our approach, the CAV learns the behavior of an incoming HDV using approximate information states before generating a control strategy to facilitate merging. First, we validate the efficacy of this framework on real-world data by using it to predict the behavior of an HDV in mixed traffic situations extracted from the Next-Generation Simulation repository. Then, we generate simulation data for HDV-CAV interactions in a highway merging scenario using a standard inverse reinforcement learning approach. Without assuming a prior knowledge of the generating model, we show that our approximate information state model learns to predict the future trajectory of the HDV using only observations. Subsequently, we generate safe control policies for a CAV while merging with HDVs, demonstrating a spectrum of driving behaviors, from aggressive to conservative. We demonstrate the effectiveness of the proposed approach by performing numerical simulations. △ Less

Submitted 1 April, 2023; originally announced April 2023.

arXiv:2303.17790 [pdf, ps, other]

A Study of an Atomic Mobility Game With Uncertainty Under Prospect Theory

Authors: Ioannis Vasileios Chremos, Heeseung Bang, Aditya Dave, Viet-Anh Le, Andreas A. Malikopoulos

Abstract: In this paper, we present a study of a mobility game with uncertainty in the decision-making of travelers and incorporate prospect theory to model travel behavior. We formulate a mobility game that models how travelers distribute their traffic flows in a transportation network with splittable traffic, utilizing the Bureau of Public Roads function to establish the relationship between traffic flow… ▽ More In this paper, we present a study of a mobility game with uncertainty in the decision-making of travelers and incorporate prospect theory to model travel behavior. We formulate a mobility game that models how travelers distribute their traffic flows in a transportation network with splittable traffic, utilizing the Bureau of Public Roads function to establish the relationship between traffic flow and travel time cost. Given the inherent non-linearities and complexity introduced by the uncertainties, we propose a smooth approximation function to estimate the prospect-theoretic cost functions. As part of our analysis, we characterize the best-fit parameters and derive an upper bound for the error. We then show the existence of an equilibrium and its its best-possible approximation. △ Less

Submitted 18 March, 2024; v1 submitted 30 March, 2023; originally announced March 2023.

Comments: arXiv admin note: text overlap with arXiv:2202.07691

arXiv:2303.16321 [pdf, other]

Worst-Case Control and Learning Using Partial Observations Over an Infinite Time-Horizon

Authors: Aditya Dave, Ioannis Faros, Nishanth Venkatesh, Andreas A. Malikopoulos

Abstract: Safety-critical cyber-physical systems require control strategies whose worst-case performance is robust against adversarial disturbances and modeling uncertainties. In this paper, we present a framework for approximate control and learning in partially observed systems to minimize the worst-case discounted cost over an infinite time horizon. We model disturbances to the system as finite-valued un… ▽ More Safety-critical cyber-physical systems require control strategies whose worst-case performance is robust against adversarial disturbances and modeling uncertainties. In this paper, we present a framework for approximate control and learning in partially observed systems to minimize the worst-case discounted cost over an infinite time horizon. We model disturbances to the system as finite-valued uncertain variables with unknown probability distributions. For problems with known system dynamics, we construct a dynamic programming (DP) decomposition to compute the optimal control strategy. Our first contribution is to define information states that improve the computational tractability of this DP without loss of optimality. Then, we describe a simplification for a class of problems where the incurred cost is observable at each time instance. Our second contribution is defining an approximate information state that can be constructed or learned directly from observed data for problems with observable costs. We derive bounds on the performance loss of the resulting approximate control strategy and illustrate the effectiveness of our approach in partially observed decision-making problems with a numerical example. △ Less

Submitted 31 March, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

arXiv:2301.05089 [pdf, other]

Approximate Information States for Worst-Case Control and Learning in Uncertain Systems

Authors: Aditya Dave, Nishanth Venkatesh, Andreas A. Malikopoulos

Abstract: In this paper, we investigate discrete-time decision-making problems in uncertain systems with partially observed states. We consider a non-stochastic model, where uncontrolled disturbances acting on the system take values in bounded sets with unknown distributions. We present a general framework for decision-making in such problems by using the notion of the information state and approximate info… ▽ More In this paper, we investigate discrete-time decision-making problems in uncertain systems with partially observed states. We consider a non-stochastic model, where uncontrolled disturbances acting on the system take values in bounded sets with unknown distributions. We present a general framework for decision-making in such problems by using the notion of the information state and approximate information state, and introduce conditions to identify an uncertain variable that can be used to compute an optimal strategy through a dynamic program (DP). Next, we relax these conditions and define approximate information states that can be learned from output data without knowledge of system dynamics. We use approximate information states to formulate a DP that yields a strategy with a bounded performance loss. Finally, we illustrate the application of our results in control and reinforcement learning using numerical examples. △ Less

Submitted 5 April, 2024; v1 submitted 12 January, 2023; originally announced January 2023.

Comments: Preliminary results related to this article were reported in arXiv:2203.15271

arXiv:2212.12645 [pdf, other]

HandsOff: Labeled Dataset Generation With No Additional Human Annotations

Authors: Austin Xu, Mariya I. Vasileva, Achal Dave, Arjun Seshadri

Abstract: Recent work leverages the expressive power of generative adversarial networks (GANs) to generate labeled synthetic datasets. These dataset generation methods often require new annotations of synthetic images, which forces practitioners to seek out annotators, curate a set of synthetic images, and ensure the quality of generated labels. We introduce the HandsOff framework, a technique capable of pr… ▽ More Recent work leverages the expressive power of generative adversarial networks (GANs) to generate labeled synthetic datasets. These dataset generation methods often require new annotations of synthetic images, which forces practitioners to seek out annotators, curate a set of synthetic images, and ensure the quality of generated labels. We introduce the HandsOff framework, a technique capable of producing an unlimited number of synthetic images and corresponding labels after being trained on less than 50 pre-existing labeled images. Our framework avoids the practical drawbacks of prior work by unifying the field of GAN inversion with dataset generation. We generate datasets with rich pixel-wise labels in multiple challenging domains such as faces, cars, full-body human poses, and urban driving scenes. Our method achieves state-of-the-art performance in semantic segmentation, keypoint detection, and depth estimation compared to prior dataset generation approaches and transfer learning baselines. We additionally showcase its ability to address broad challenges in model development which stem from fixed, hand-annotated datasets, such as the long-tail problem in semantic segmentation. Project page: austinxu87.github.io/handsoff. △ Less

Submitted 30 March, 2023; v1 submitted 23 December, 2022; originally announced December 2022.

Comments: 22 pages, 20 figures. CVPR 2023

arXiv:2212.04531 [pdf, other]

ORCa: Glossy Objects as Radiance Field Cameras

Authors: Kushagra Tiwary, Akshat Dave, Nikhil Behari, Tzofi Klinghoffer, Ashok Veeraraghavan, Ramesh Raskar

Abstract: Reflections on glossy objects contain valuable and hidden information about the surrounding environment. By converting these objects into cameras, we can unlock exciting applications, including imaging beyond the camera's field-of-view and from seemingly impossible vantage points, e.g. from reflections on the human eye. However, this task is challenging because reflections depend jointly on object… ▽ More Reflections on glossy objects contain valuable and hidden information about the surrounding environment. By converting these objects into cameras, we can unlock exciting applications, including imaging beyond the camera's field-of-view and from seemingly impossible vantage points, e.g. from reflections on the human eye. However, this task is challenging because reflections depend jointly on object geometry, material properties, the 3D environment, and the observer viewing direction. Our approach converts glossy objects with unknown geometry into radiance-field cameras to image the world from the object's perspective. Our key insight is to convert the object surface into a virtual sensor that captures cast reflections as a 2D projection of the 5D environment radiance field visible to the object. We show that recovering the environment radiance fields enables depth and radiance estimation from the object to its surroundings in addition to beyond field-of-view novel-view synthesis, i.e. rendering of novel views that are only directly-visible to the glossy object present in the scene, but not the observer. Moreover, using the radiance field we can image around occluders caused by close-by objects in the scene. Our method is trained end-to-end on multi-view images of the object and jointly estimates object geometry, diffuse radiance, and the 5D environment radiance field. △ Less

Submitted 12 December, 2022; v1 submitted 8 December, 2022; originally announced December 2022.

Comments: for more information, see https://ktiwary2.github.io/objectsascam/

arXiv:2211.08691 [pdf, other]

Towards Long-Tailed 3D Detection

Authors: Neehar Peri, Achal Dave, Deva Ramanan, Shu Kong

Abstract: Contemporary autonomous vehicle (AV) benchmarks have advanced techniques for training 3D detectors, particularly on large-scale lidar data. Surprisingly, although semantic class labels naturally follow a long-tailed distribution, contemporary benchmarks focus on only a few common classes (e.g., pedestrian and car) and neglect many rare classes in-the-tail (e.g., debris and stroller). However, AVs… ▽ More Contemporary autonomous vehicle (AV) benchmarks have advanced techniques for training 3D detectors, particularly on large-scale lidar data. Surprisingly, although semantic class labels naturally follow a long-tailed distribution, contemporary benchmarks focus on only a few common classes (e.g., pedestrian and car) and neglect many rare classes in-the-tail (e.g., debris and stroller). However, AVs must still detect rare classes to ensure safe operation. Moreover, semantic classes are often organized within a hierarchy, e.g., tail classes such as child and construction-worker are arguably subclasses of pedestrian. However, such hierarchical relationships are often ignored, which may lead to misleading estimates of performance and missed opportunities for algorithmic innovation. We address these challenges by formally studying the problem of Long-Tailed 3D Detection (LT3D), which evaluates on all classes, including those in-the-tail. We evaluate and innovate upon popular 3D detection codebases, such as CenterPoint and PointPillars, adapting them for LT3D. We develop hierarchical losses that promote feature sharing across common-vs-rare classes, as well as improved detection metrics that award partial credit to "reasonable" mistakes respecting the hierarchy (e.g., mistaking a child for an adult). Finally, we point out that fine-grained tail class accuracy is particularly improved via multimodal fusion of RGB images with LiDAR; simply put, small fine-grained classes are challenging to identify from sparse (lidar) geometry alone, suggesting that multimodal cues are crucial to long-tailed 3D detection. Our modifications improve accuracy by 5% AP on average for all classes, and dramatically improve AP for rare classes (e.g., stroller AP improves from 3.6 to 31.6)! Our code is available at https://github.com/neeharperi/LT3D △ Less

Submitted 19 May, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

Comments: This work has been accepted to the Conference on Robot Learning (CoRL) 2022

arXiv:2210.01917 [pdf, other]

Differentiable Raycasting for Self-supervised Occupancy Forecasting

Authors: Tarasha Khurana, Peiyun Hu, Achal Dave, Jason Ziglar, David Held, Deva Ramanan

Abstract: Motion planning for safe autonomous driving requires learning how the environment around an ego-vehicle evolves with time. Ego-centric perception of driveable regions in a scene not only changes with the motion of actors in the environment, but also with the movement of the ego-vehicle itself. Self-supervised representations proposed for large-scale planning, such as ego-centric freespace, confoun… ▽ More Motion planning for safe autonomous driving requires learning how the environment around an ego-vehicle evolves with time. Ego-centric perception of driveable regions in a scene not only changes with the motion of actors in the environment, but also with the movement of the ego-vehicle itself. Self-supervised representations proposed for large-scale planning, such as ego-centric freespace, confound these two motions, making the representation difficult to use for downstream motion planners. In this paper, we use geometric occupancy as a natural alternative to view-dependent representations such as freespace. Occupancy maps naturally disentangle the motion of the environment from the motion of the ego-vehicle. However, one cannot directly observe the full 3D occupancy of a scene (due to occlusion), making it difficult to use as a signal for learning. Our key insight is to use differentiable raycasting to "render" future occupancy predictions into future LiDAR sweep predictions, which can be compared with ground-truth sweeps for self-supervised learning. The use of differentiable raycasting allows occupancy to emerge as an internal representation within the forecasting network. In the absence of groundtruth occupancy, we quantitatively evaluate the forecasting of raycasted LiDAR sweeps and show improvements of upto 15 F1 points. For downstream motion planners, where emergent occupancy can be directly used to guide non-driveable regions, this representation relatively reduces the number of collisions with objects by up to 17% as compared to freespace-centric motion planners. △ Less

Submitted 18 October, 2022; v1 submitted 4 October, 2022; originally announced October 2022.

Comments: ECCV 2022. Code available at https://github.com/tarashakhurana/emergent-occ-forecasting

arXiv:2209.13787 [pdf, other]

On Robust Control of Partially Observed Uncertain Systems with Additive Costs

Authors: Aditya Dave, Nishanth Venkatesh, Andreas A. Malikopoulos

Abstract: In this paper, we consider the problem of optimizing the worst-case behavior of a partially observed system. All uncontrolled disturbances are modeled as finite-valued uncertain variables. Using the theory of cost distributions, we present a dynamic programming (DP) approach to compute a control strategy that minimizes the maximum possible total cost over a given time horizon. To improve the compu… ▽ More In this paper, we consider the problem of optimizing the worst-case behavior of a partially observed system. All uncontrolled disturbances are modeled as finite-valued uncertain variables. Using the theory of cost distributions, we present a dynamic programming (DP) approach to compute a control strategy that minimizes the maximum possible total cost over a given time horizon. To improve the computational efficiency of the optimal DP, we introduce a general definition for information states and show that many information states constructed in previous research efforts are special cases of ours. Additionally, we define approximate information states and an approximate DP that can further improve computational tractability by conceding a bounded performance loss. We illustrate the utility of these results using a numerical example. △ Less

Submitted 18 February, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

Comments: This article is specializes to additive cost problems the theory and results presented for terminal cost problems in arXiv:2203.15271

arXiv:2209.12118 [pdf, other]

BURST: A Benchmark for Unifying Object Recognition, Segmentation and Tracking in Video

Authors: Ali Athar, Jonathon Luiten, Paul Voigtlaender, Tarasha Khurana, Achal Dave, Bastian Leibe, Deva Ramanan

Abstract: Multiple existing benchmarks involve tracking and segmenting objects in video e.g., Video Object Segmentation (VOS) and Multi-Object Tracking and Segmentation (MOTS), but there is little interaction between them due to the use of disparate benchmark datasets and metrics (e.g. J&F, mAP, sMOTSA). As a result, published works usually target a particular benchmark, and are not easily comparable to eac… ▽ More Multiple existing benchmarks involve tracking and segmenting objects in video e.g., Video Object Segmentation (VOS) and Multi-Object Tracking and Segmentation (MOTS), but there is little interaction between them due to the use of disparate benchmark datasets and metrics (e.g. J&F, mAP, sMOTSA). As a result, published works usually target a particular benchmark, and are not easily comparable to each another. We believe that the development of generalized methods that can tackle multiple tasks requires greater cohesion among these research sub-communities. In this paper, we aim to facilitate this by proposing BURST, a dataset which contains thousands of diverse videos with high-quality object masks, and an associated benchmark with six tasks involving object tracking and segmentation in video. All tasks are evaluated using the same data and comparable metrics, which enables researchers to consider them in unison, and hence, more effectively pool knowledge from different methods across different tasks. Additionally, we demonstrate several baselines for all tasks and show that approaches for one task can be applied to another with a quantifiable and explainable performance difference. Dataset annotations and evaluation code is available at: https://github.com/Ali2500/BURST-benchmark. △ Less

Submitted 22 November, 2022; v1 submitted 24 September, 2022; originally announced September 2022.

arXiv:2209.04334 [pdf, other]

doi 10.1016/j.anucene.2022.109593

Design of a Supervisory Control System for Autonomous Operation of Advanced Reactors

Authors: Akshay J. Dave, Taeseung Lee, Roberto Ponciroli, Richard B. Vilim

Abstract: Advanced reactors to be deployed in the coming decades will face deregulated energy markets, and may adopt flexible operation to boost profitability. To aid in the transition from baseload to flexible operation paradigm, autonomous operation is sought. This work focuses on the control aspect of autonomous operation. Specifically, a hierarchical control system is designed to support constraint enfo… ▽ More Advanced reactors to be deployed in the coming decades will face deregulated energy markets, and may adopt flexible operation to boost profitability. To aid in the transition from baseload to flexible operation paradigm, autonomous operation is sought. This work focuses on the control aspect of autonomous operation. Specifically, a hierarchical control system is designed to support constraint enforcement during routine operational transients. Within the system, data-driven modeling, physics-based state observation, and classical control algorithms are integrated to provide an adaptable and robust solution. A 320 MW Fluoride-cooled High-temperature Pebble-bed Reactor is the design basis for demonstrating the control system. The hierarchical control system consists of a supervisory layer and low-level layer. The supervisory layer receives requests to change the system's operating conditions, and accepts or rejects them based on constraints that have been assigned. Constraints are issued to keep the plant within an optimal operating region. The low-level layer interfaces with the actuators of the system to fulfill requested changes, while maintaining tracking and regulation duties. To accept requests at the supervisory layer, the Reference Governor algorithm was adopted. To model the dynamics of the reactor, a system identification algorithm, Dynamic Mode Decomposition, was utilized. To estimate the evolution of process variables that cannot be directly measured, the Unscented Kalman Filter, incorporating a nonlinear model of nuclear dynamics, was adopted. The composition of these algorithms led to a numerical demonstration of constraint enforcement during a 40 % power drop transient. Adaptability was demonstrated by modifying the constraint values, and enforcing them during the transient. Robustness was demonstrated by enforcing constraints under noisy environments. △ Less

Submitted 1 November, 2022; v1 submitted 9 September, 2022; originally announced September 2022.

Comments: 19 pages, 12 figures

arXiv:2207.03928 [pdf, other]

doi 10.1038/s41524-023-01028-1

Accelerating Material Design with the Generative Toolkit for Scientific Discovery

Authors: Matteo Manica, Jannis Born, Joris Cadow, Dimitrios Christofidellis, Ashish Dave, Dean Clarke, Yves Gaetan Nana Teukam, Giorgio Giannone, Samuel C. Hoffman, Matthew Buchan, Vijil Chenthamarakshan, Timothy Donovan, Hsiang Han Hsu, Federico Zipoli, Oliver Schilter, Akihiro Kishimoto, Lisa Hamada, Inkit Padhi, Karl Wehden, Lauren McHugh, Alexy Khrabrov, Payel Das, Seiji Takeda, John R. Smith

Abstract: With the growing availability of data within various scientific domains, generative models hold enormous potential to accelerate scientific discovery. They harness powerful representations learned from datasets to speed up the formulation of novel hypotheses with the potential to impact material discovery broadly. We present the Generative Toolkit for Scientific Discovery (GT4SD). This extensible… ▽ More With the growing availability of data within various scientific domains, generative models hold enormous potential to accelerate scientific discovery. They harness powerful representations learned from datasets to speed up the formulation of novel hypotheses with the potential to impact material discovery broadly. We present the Generative Toolkit for Scientific Discovery (GT4SD). This extensible open-source library enables scientists, developers, and researchers to train and use state-of-the-art generative models to accelerate scientific discovery focused on material design. △ Less

Submitted 31 January, 2023; v1 submitted 8 July, 2022; originally announced July 2022.

Comments: 15 pages, 2 figures

Journal ref: Nature Partner Journals (npj) Computational Materials 9, 69 (2023)

arXiv:2205.01397 [pdf, other]

Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)

Authors: Alex Fang, Gabriel Ilharco, Mitchell Wortsman, Yuhao Wan, Vaishaal Shankar, Achal Dave, Ludwig Schmidt

Abstract: Contrastively trained language-image models such as CLIP, ALIGN, and BASIC have demonstrated unprecedented robustness to multiple challenging natural distribution shifts. Since these language-image models differ from previous training approaches in several ways, an important question is what causes the large robustness gains. We answer this question via a systematic experimental investigation. Con… ▽ More Contrastively trained language-image models such as CLIP, ALIGN, and BASIC have demonstrated unprecedented robustness to multiple challenging natural distribution shifts. Since these language-image models differ from previous training approaches in several ways, an important question is what causes the large robustness gains. We answer this question via a systematic experimental investigation. Concretely, we study five different possible causes for the robustness gains: (i) the training set size, (ii) the training distribution, (iii) language supervision at training time, (iv) language supervision at test time, and (v) the contrastive loss function. Our experiments show that the more diverse training distribution is the main cause for the robustness gains, with the other factors contributing little to no robustness. Beyond our experimental results, we also introduce ImageNet-Captions, a version of ImageNet with original text annotations from Flickr, to enable further controlled experiments of language-image training. △ Less

Submitted 22 August, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

arXiv:2203.15271 [pdf, other]

Approximate Information States for Worst-case Control of Uncertain Systems

Authors: Aditya Dave, Nishanth Venkatesh, Andreas A. Malikopoulos

Abstract: In this paper, we investigate a worst-case-scenario control problem with a partially observed state. We consider a non-stochastic formulation, where noises and disturbances in our dynamics are uncertain variables which take values in finite sets. In such problems, the optimal control strategy can be derived using a dynamic program (DP) with respect to the memory. The computational complexity of th… ▽ More In this paper, we investigate a worst-case-scenario control problem with a partially observed state. We consider a non-stochastic formulation, where noises and disturbances in our dynamics are uncertain variables which take values in finite sets. In such problems, the optimal control strategy can be derived using a dynamic program (DP) with respect to the memory. The computational complexity of this DP can be improved using a conditional range of the state instead of the memory. We present a more general definition of an information state which is sufficient to construct a DP without loss of optimality, and show that the conditional range is an example of an information state. Next, we extend this notion to define an approximate information state and an approximate DP. We also bound the maximum loss of optimality when using an approximate DP to derive the control strategy. Finally, we illustrate our results in a numerical example. △ Less

Submitted 24 September, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

Journal ref: Proceedings of 61st IEEE Conference on Decision and Control, pp. 4945-4950, 2022

arXiv:2203.13458 [pdf, other]

PANDORA: Polarization-Aided Neural Decomposition Of Radiance

Authors: Akshat Dave, Yongyi Zhao, Ashok Veeraraghavan

Abstract: Reconstructing an object's geometry and appearance from multiple images, also known as inverse rendering, is a fundamental problem in computer graphics and vision. Inverse rendering is inherently ill-posed because the captured image is an intricate function of unknown lighting conditions, material properties and scene geometry. Recent progress in representing scene properties as coordinate-based n… ▽ More Reconstructing an object's geometry and appearance from multiple images, also known as inverse rendering, is a fundamental problem in computer graphics and vision. Inverse rendering is inherently ill-posed because the captured image is an intricate function of unknown lighting conditions, material properties and scene geometry. Recent progress in representing scene properties as coordinate-based neural networks have facilitated neural inverse rendering resulting in impressive geometry reconstruction and novel-view synthesis. Our key insight is that polarization is a useful cue for neural inverse rendering as polarization strongly depends on surface normals and is distinct for diffuse and specular reflectance. With the advent of commodity, on-chip, polarization sensors, capturing polarization has become practical. Thus, we propose PANDORA, a polarimetric inverse rendering approach based on implicit neural representations. From multi-view polarization images of an object, PANDORA jointly extracts the object's 3D geometry, separates the outgoing radiance into diffuse and specular and estimates the illumination incident on the object. We show that PANDORA outperforms state-of-the-art radiance decomposition techniques. PANDORA outputs clean surface reconstructions free from texture artefacts, models strong specularities accurately and estimates illumination under practical unstructured scenarios. △ Less

Submitted 25 March, 2022; originally announced March 2022.

Comments: Project webpage: https://akshatdave.github.io/pandora

arXiv:2202.02094 [pdf, other]

Numerical Demonstration of Multiple Actuator Constraint Enforcement Algorithm for a Molten Salt Loop

Authors: Akshay J. Dave, Haoyu Wang, Roberto Ponciroli, Richard B. Vilim

Abstract: To advance the paradigm of autonomous operation for nuclear power plants, a data-driven machine learning approach to control is sought. Autonomous operation for next-generation reactor designs is anticipated to bolster safety and improve economics. However, any algorithms that are utilized need to be interpretable, adaptable, and robust. In this work, we focus on the specific problem of optimal… ▽ More To advance the paradigm of autonomous operation for nuclear power plants, a data-driven machine learning approach to control is sought. Autonomous operation for next-generation reactor designs is anticipated to bolster safety and improve economics. However, any algorithms that are utilized need to be interpretable, adaptable, and robust. In this work, we focus on the specific problem of optimal control during autonomous operation. We will demonstrate an interpretable and adaptable data-driven machine learning approach to autonomous control of a molten salt loop. To address interpretability, we utilize a data-driven algorithm to identify system dynamics in state-space representation. To address adaptability, a control algorithm will be utilized to modify actuator setpoints while enforcing constant, and time-dependent constraints. Robustness is not addressed in this work, and is part of future work. To demonstrate the approach, we designed a numerical experiment requiring intervention to enforce constraints during a load-follow type transient. △ Less

Submitted 25 February, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

Comments: 4 pages, 6 figures. Submitted to 2022 American Nuclear Society Annual Meeting

arXiv:2112.10743 [pdf, other]

doi 10.1063/5.0078054

Ultrafast Multi-Shot Ablation and Defect Generation in Monolayer Transition Metal Dichalcogenides

Authors: Joel M. Solomon, Sabeeh Irfan Ahmad, Arpit Dave, Li-Syuan Lu, Yu-Chen Wu, Wen-Hao Chang, Chih-Wei Luo, Tsing-Hua Her

Abstract: Transition metal dichalcogenides are known to possess large optical nonlinearities and driving these materials at high intensities is desirable for many applications. Understanding their optical responses under repetitive intense excitation is essential to improve the performance limit of these strong-field devices and to achieve efficient laser patterning. Here, we report the incubation study of… ▽ More Transition metal dichalcogenides are known to possess large optical nonlinearities and driving these materials at high intensities is desirable for many applications. Understanding their optical responses under repetitive intense excitation is essential to improve the performance limit of these strong-field devices and to achieve efficient laser patterning. Here, we report the incubation study of monolayer MoS${}_{2}$ and WS${}_{2}$ induced by 160 fs, 800 nm pulses in air to examine how their ablation threshold scales with the number of admitted laser pulses. Both materials were shown to outperform graphene and most bulk materials; specifically, MoS${}_{2}$ is as resistant to radiation degradation as the best of the bulk thin films with a record fast saturation. Our modeling provides convincing evidence that the small reduction in threshold and fast saturation of MoS${}_{2}$ originates in its excellent bonding integrity against radiation-induced softening. Sub-ablation damages, in the forms of vacancies, lattice disorder, and nano-voids, were revealed by transmission electron microscopy, photoluminescence, Raman, and second harmonic generation studies, which were attributed to the observed incubation. For the first time, a sub-ablation damage threshold is identified for monolayer MoS${}_{2}$ to be 78% of single-shot ablation threshold, below which MoS${}_{2}$ remains intact for many laser pulses. Our results firmly establish MoS${}_{2}$ as a robust material for strong-field devices and for high-throughput laser patterning. △ Less

Submitted 20 December, 2021; originally announced December 2021.

Journal ref: AIP Advances 12, 015217 (2022)

arXiv:2112.06280 [pdf, other]

In-Memory Indexed Caching for Distributed Data Processing

Authors: Alexandru Uta, Bogdan Ghit, Ankur Dave, Jan Rellermeyer, Peter Boncz

Abstract: Powerful abstractions such as dataframes are only as efficient as their underlying runtime system. The de-facto distributed data processing framework, Apache Spark, is poorly suited for the modern cloud-based data-science workloads due to its outdated assumptions: static datasets analyzed using coarse-grained transformations. In this paper, we introduce the Indexed DataFrame, an in-memory cache th… ▽ More Powerful abstractions such as dataframes are only as efficient as their underlying runtime system. The de-facto distributed data processing framework, Apache Spark, is poorly suited for the modern cloud-based data-science workloads due to its outdated assumptions: static datasets analyzed using coarse-grained transformations. In this paper, we introduce the Indexed DataFrame, an in-memory cache that supports a dataframe abstraction which incorporates indexing capabilities to support fast lookup and join operations. Moreover, it supports appends with multi-version concurrency control. We implement the Indexed DataFrame as a lightweight, standalone library which can be integrated with minimum effort in existing Spark programs. We analyze the performance of the Indexed DataFrame in cluster and cloud deployments with real-world datasets and benchmarks using both Apache Spark and Databricks Runtime. In our evaluation, we show that the Indexed DataFrame significantly speeds-up query execution when compared to a non-indexed dataframe, incurring modest memory overhead. △ Less

Submitted 8 February, 2022; v1 submitted 12 December, 2021; originally announced December 2021.

Comments: Accepted for publication at IEEE IPDPS 2022

arXiv:2111.14786 [pdf, other]

doi 10.1038/s41467-022-32938-1

Autonomous optimization of nonaqueous battery electrolytes via robotic experimentation and machine learning

Authors: Adarsh Dave, Jared Mitchell, Sven Burke, Hongyi Lin, Jay Whitacre, Venkatasubramanian Viswanathan

Abstract: In this work, we introduce a novel workflow that couples robotics to machine-learning for efficient optimization of a non-aqueous battery electrolyte. A custom-built automated experiment named "Clio" is coupled to Dragonfly - a Bayesian optimization-based experiment planner. Clio autonomously optimizes electrolyte conductivity over a single-salt, ternary solvent design space. Using this workflow,… ▽ More In this work, we introduce a novel workflow that couples robotics to machine-learning for efficient optimization of a non-aqueous battery electrolyte. A custom-built automated experiment named "Clio" is coupled to Dragonfly - a Bayesian optimization-based experiment planner. Clio autonomously optimizes electrolyte conductivity over a single-salt, ternary solvent design space. Using this workflow, we identify 6 fast-charging electrolytes in 2 work-days and 42 experiments (compared with 60 days using exhaustive search of the 1000 possible candidates, or 6 days assuming only 10% of candidates are evaluated). Our method finds the highest reported conductivity electrolyte in a design space heavily explored by previous literature, converging on a high-conductivity mixture that demonstrates subtle electrolyte chemical physics. △ Less

Submitted 22 November, 2021; originally announced November 2021.

Comments: 26 pages, 5 Figures, 7 Extended Data Figures

Showing 1–50 of 85 results for author: Dave, A