-
PostMark: A Robust Blackbox Watermark for Large Language Models
Authors:
Yapei Chang,
Kalpesh Krishna,
Amir Houmansadr,
John Wieting,
Mohit Iyyer
Abstract:
The most effective techniques to detect LLM-generated text rely on inserting a detectable signature -- or watermark -- during the model's decoding process. Most existing watermarking methods require access to the underlying LLM's logits, which LLM API providers are loath to share due to fears of model distillation. As such, these watermarks must be implemented independently by each LLM provider. In this paper, we develop PostMark, a modular post-hoc watermarking procedure in which an input-dependent set of words (determined via a semantic embedding) is inserted into the text after the decoding process has completed. Critically, PostMark does not require logit access, which means it can be implemented by a third party. We also show that PostMark is more robust to paraphrasing attacks than existing watermarking methods: our experiments cover eight baseline algorithms, five base LLMs, and three datasets. Finally, we evaluate the impact of PostMark on text quality using both automated and human assessments, highlighting the trade-off between quality and robustness to paraphrasing. We release our code, outputs, and annotations at https://github.com/lilakk/PostMark.
Submitted 20 June, 2024;
originally announced June 2024.
-
Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions
Authors:
Hua Shen,
Tiffany Knearem,
Reshmi Ghosh,
Kenan Alkiek,
Kundan Krishna,
Yachuan Liu,
Ziqiao Ma,
Savvas Petridis,
Yi-Hao Peng,
Li Qiwei,
Sushrita Rakshit,
Chenglei Si,
Yutong Xie,
Jeffrey P. Bigham,
Frank Bentley,
Joyce Chai,
Zachary Lipton,
Qiaozhu Mei,
Rada Mihalcea,
Michael Terry,
Diyi Yang,
Meredith Ringel Morris,
Paul Resnick,
David Jurgens
Abstract:
Recent advancements in general-purpose AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. However, the lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve this alignment. In particular, ML- and philosophy-oriented alignment research often views AI alignment as a static, unidirectional process (i.e., aiming to ensure that AI systems' objectives match humans) rather than an ongoing, mutual alignment problem [429]. This perspective largely neglects the long-term interaction and dynamic changes of alignment. To understand these gaps, we introduce a systematic review of over 400 papers published between 2019 and January 2024, spanning multiple domains such as Human-Computer Interaction (HCI), Natural Language Processing (NLP), Machine Learning (ML), and others. We characterize, define and scope human-AI alignment. From this, we present a conceptual framework of "Bidirectional Human-AI Alignment" to organize the literature from a human-centered perspective. This framework encompasses both 1) conventional studies of aligning AI to humans that ensures AI produces the intended outcomes determined by humans, and 2) a proposed concept of aligning humans to AI, which aims to help individuals and society adjust to AI advancements both cognitively and behaviorally. Additionally, we articulate the key findings derived from literature analysis, including discussions about human values, interaction techniques, and evaluations. To pave the way for future studies, we envision three key challenges for future directions and propose examples of potential future solutions.
Submitted 17 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Noncommutative Donoho-Stark-Elad-Bruckstein-Ricaud-Torrésani Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
Let $\{τ_n\}_{n=1}^\infty$ and $\{ω_m\}_{m=1}^\infty$ be two modular Parseval frames for a Hilbert C*-module $\mathcal{E}$. Then for every $x \in \mathcal{E}\setminus\{0\}$, we show that \begin{align} (1) \quad \quad \quad \quad \|θ_τx \|_0 \|θ_ωx \|_0 \geq \frac{1}{\sup_{n, m \in \mathbb{N}} \|\langle τ_n, ω_m\rangle \|^2}. \end{align} We call Inequality (1) the \textbf{Noncommutative Donoho-Stark-Elad-Bruckstein-Ricaud-Torrésani Uncertainty Principle}. Inequality (1) is the noncommutative analogue of the breakthrough Ricaud-Torrésani uncertainty principle \textit{[IEEE Trans. Inform. Theory, 2013]}. In particular, Inequality (1) extends the Elad-Bruckstein uncertainty principle \textit{[IEEE Trans. Inform. Theory, 2002]} and the Donoho-Stark uncertainty principle \textit{[SIAM J. Appl. Math., 1989]}.
Submitted 1 June, 2024;
originally announced June 2024.
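Worked special case (an illustration, not taken from the abstract): the sketch below specializes Inequality (1) of the preceding entry to the finite-dimensional Hilbert space $\mathbb{C}^d$ with the standard and discrete Fourier bases. It assumes that $θ_τ$ denotes the analysis map $x \mapsto (\langle x, τ_n\rangle)_n$ and that $\|\cdot\|_0$ counts nonzero entries; under those assumptions the bound appears to reduce to the classical Donoho-Stark inequality.

```latex
% Assumptions (not from the abstract): \mathcal{E} = \mathbb{C}^d, regarded as a
% Hilbert C*-module over \mathbb{C}; \theta_\tau x = (\langle x, \tau_n\rangle)_n is
% the analysis map; \|\cdot\|_0 counts the nonzero entries of a sequence.
\begin{align*}
\tau_n &= e_n, \qquad
\omega_m = \frac{1}{\sqrt{d}}\left(e^{2\pi i k m/d}\right)_{k=1}^{d},
\qquad 1 \le n, m \le d,\\
|\langle \tau_n, \omega_m\rangle| &= \frac{1}{\sqrt{d}}
\quad \text{for all } n, m,\\
\|\theta_\tau x\|_0 \, \|\theta_\omega x\|_0
&\ge \frac{1}{\left(1/\sqrt{d}\right)^{2}} = d
\quad \text{for every } x \in \mathbb{C}^d \setminus \{0\}.
\end{align*}
```

For $x = e_1$ the first factor equals 1 and all $d$ Fourier coefficients have modulus $1/\sqrt{d}$, so the product equals $d$ and the bound is attained in this special case.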
-
Continuous Krishna-Parthasarathy Entropic Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
In 2002, Krishna and Parthasarathy [\textit{Sankhyā Ser. A}] derived a discrete quantum version of the Maassen-Uffink [\textit{Phys. Rev. Lett., 1988}] entropic uncertainty principle. In this paper, using the notion of continuous operator-valued frames, we derive an entropic uncertainty principle for an arbitrary family of operators indexed by measure spaces having finite measure. We give an application to the special case of compact groups.
Submitted 7 May, 2024;
originally announced May 2024.
-
Open-Set 3D Semantic Instance Maps for Vision Language Navigation -- O3D-SIM
Authors:
Laksh Nanwani,
Kumaraditya Gupta,
Aditya Mathur,
Swayam Agrawal,
A. H. Abdul Hafez,
K. Madhava Krishna
Abstract:
Humans excel at forming mental maps of their surroundings, equipping them to understand object relationships and navigate based on language queries. Our previous work SI Maps [1] showed that having instance-level information and the semantic understanding of an environment helps significantly improve performance for language-guided tasks. We extend this instance-level approach to 3D while increasing the pipeline's robustness and improving quantitative and qualitative results. Our method leverages foundational models for object recognition, image segmentation, and feature extraction. We propose a representation that results in a 3D point cloud map with instance-level embeddings, which bring in the semantic understanding that natural language commands can query. Quantitatively, the work improves upon the success rate of language-guided tasks. At the same time, we qualitatively observe the ability to identify instances more clearly and leverage the foundational models and language and image-aligned embeddings to identify objects that, otherwise, a closed-set approach wouldn't be able to identify.
Submitted 27 April, 2024;
originally announced April 2024.
-
Constrained 6-DoF Grasp Generation on Complex Shapes for Improved Dual-Arm Manipulation
Authors:
Gaurav Singh,
Sanket Kalwar,
Md Faizal Karim,
Bipasha Sen,
Nagamanikandan Govindan,
Srinath Sridhar,
K Madhava Krishna
Abstract:
Efficiently generating grasp poses tailored to specific regions of an object is vital for various robotic manipulation tasks, especially in a dual-arm setup. This scenario presents a significant challenge due to the complex geometries involved, requiring a deep understanding of the local geometry to generate grasps efficiently on the specified constrained regions. Existing methods only explore settings involving table-top/small objects and require augmented datasets to train, limiting their performance on complex objects. We propose CGDF: Constrained Grasp Diffusion Fields, a diffusion-based grasp generative model that generalizes to objects with arbitrary geometries, as well as generates dense grasps on the target regions. CGDF uses a part-guided diffusion approach that enables it to get high sample efficiency in constrained grasping without explicitly training on massive constraint-augmented datasets. We provide qualitative and quantitative comparisons using analytical metrics and in simulation, in both unconstrained and constrained settings to show that our method can generalize to generate stable grasps on complex objects, especially useful for dual-arm manipulation settings, while existing methods struggle to do so.
Submitted 6 April, 2024;
originally announced April 2024.
-
Bi-level Trajectory Optimization on Uneven Terrains with Differentiable Wheel-Terrain Interaction Model
Authors:
Amith Manoharan,
Aditya Sharma,
Himani Belsare,
Kaustab Pal,
K. Madhava Krishna,
Arun Kumar Singh
Abstract:
Navigation of wheeled vehicles on uneven terrain necessitates going beyond the 2D approaches for trajectory planning. Specifically, it is essential to incorporate the full 6dof variation of vehicle pose and its associated stability cost in the planning process. To this end, most recent works aim to learn a neural network model to predict the vehicle evolution. However, such approaches are data-intensive and fraught with generalization issues. In this paper, we present a purely model-based approach that just requires the digital elevation information of the terrain. Specifically, we express the wheel-terrain interaction and 6dof pose prediction as a non-linear least squares (NLS) problem. As a result, trajectory planning can be viewed as a bi-level optimization. The inner optimization layer predicts the pose on the terrain along a given trajectory, while the outer layer deforms the trajectory itself to reduce the stability and kinematic costs of the pose. We improve the state-of-the-art in the following respects. First, we show that our NLS-based pose prediction closely matches the output from a high-fidelity physics engine. This result, coupled with the fact that we can query gradients of the NLS solver, makes our pose predictor a differentiable wheel-terrain interaction model. We further leverage this differentiability to efficiently solve the proposed bi-level trajectory optimization problem. Finally, we perform extensive experiments and comparisons with a baseline to showcase the effectiveness of our approach in obtaining smooth, stable trajectories.
Submitted 11 April, 2024; v1 submitted 4 April, 2024;
originally announced April 2024.
-
Machine Learning Driven Global Optimisation Framework for Analog Circuit Design
Authors:
Ria Rashid,
Komala Krishna,
Clint Pazhayidam George,
Nandakumar Nambath
Abstract:
We propose a machine learning-driven optimisation framework for analog circuit design in this paper. The primary objective is to determine the device sizes for the optimal performance of analog circuits for a given set of specifications. Our methodology entails employing machine learning models and SPICE simulations to direct the optimisation algorithm towards achieving the optimal design for analog circuits. Machine learning-based global offline surrogate models, with the circuit design parameters as the input, are built in the design space for the analog circuits under study and are used to guide the optimisation algorithm, resulting in faster convergence and a reduced number of SPICE simulations. Multi-layer perceptron and random forest regressors are employed to predict the required design specifications of the analog circuit. Since the saturation condition of transistors is vital in the proper working of analog circuits, multi-layer perceptron classifiers are used to predict the saturation condition of each transistor in the circuit. The feasibility of the candidate solutions is verified using machine learning models before invoking SPICE simulations. We validate the proposed framework using three circuit topologies--a bandgap reference, a folded cascode operational amplifier, and a two-stage operational amplifier. The simulation results show better optimum values and lower standard deviations for fitness functions after convergence. Incorporating the machine learning-based predictions proposed in the optimisation method has resulted in the reduction of SPICE calls by 56%, 59%, and 83% when compared with standard approaches in the three test cases considered in the study.
Submitted 26 February, 2024;
originally announced April 2024.
-
Unexpected Uncertainty Principle for Disc Banach Spaces
Authors:
K. Mahesh Krishna
Abstract:
Let $(\{f_n\}_{n=1}^\infty, \{τ_n\}_{n=1}^\infty)$ and $(\{g_n\}_{n=1}^\infty, \{ω_n\}_{n=1}^\infty)$ be unbounded continuous p-Schauder frames ($0<p<1$) for a disc Banach space $\mathcal{X}$. Then for every $x \in ( \mathcal{D}(θ_f) \cap\mathcal{D}(θ_g))\setminus\{0\}$, we show that \begin{align}\label{UB} (1) \quad \quad \quad \quad \|θ_f x\|_0\|θ_g x\|_0 \geq \frac{1}{\left(\displaystyle\sup_{n,m \in \mathbb{N} }|f_n(ω_m)|\right)^p\left(\displaystyle\sup_{n, m \in \mathbb{N}}|g_m(τ_n)|\right)^p}, \end{align} where \begin{align*} & θ_f: \mathcal{D}(θ_f) \ni x \mapsto θ_fx := \{f_n(x)\}_{n=1}^\infty\in \ell^p(\mathbb{N}), \quad θ_g: \mathcal{D}(θ_g) \ni x \mapsto θ_gx := \{g_n(x)\}_{n=1}^\infty\in \ell^p(\mathbb{N}). \end{align*} Inequality (1) is unexpectedly different from both bounded uncertainty principle arXiv:2308.00312v1 and unbounded uncertainty principle arXiv:2312.00366v1 for Banach spaces.
Submitted 1 April, 2024;
originally announced April 2024.
-
LeGo-Drive: Language-enhanced Goal-oriented Closed-Loop End-to-End Autonomous Driving
Authors:
Pranjal Paul,
Anant Garg,
Tushar Choudhary,
Arun Kumar Singh,
K. Madhava Krishna
Abstract:
Existing Vision-Language models (VLMs) estimate either long-term trajectory waypoints or a set of control actions as a reactive solution for closed-loop planning based on their rich scene comprehension. However, these estimations are coarse and are subject to their "world understanding", which may generate sub-optimal decisions due to perception errors. In this paper, we introduce LeGo-Drive, which aims to address this issue by estimating a goal location based on the given language command as an intermediate representation in an end-to-end setting. The estimated goal might fall in a non-desirable region, like on top of a car for a parking-like command, leading to inadequate planning. Hence, we propose to train the architecture in an end-to-end manner, resulting in iterative refinement of both the goal and the trajectory collectively. We validate the effectiveness of our method through comprehensive experiments conducted in diverse simulated environments. We report significant improvements in standard autonomous driving metrics, with a goal-reaching success rate of 81%. We further showcase the versatility of LeGo-Drive across different driving scenarios and linguistic inputs, underscoring its potential for practical deployment in autonomous vehicles and intelligent transportation systems.
Submitted 29 March, 2024;
originally announced March 2024.
-
Nonlinear Heisenberg-Robertson-Schrodinger Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
We derive an uncertainty principle for Lipschitz maps acting on subsets of Banach spaces. We show that this nonlinear uncertainty principle reduces to the Heisenberg-Robertson-Schrodinger uncertainty principle for linear operators acting on Hilbert spaces.
Submitted 1 March, 2024;
originally announced March 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1092 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 14 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
GenAudit: Fixing Factual Errors in Language Model Outputs with Evidence
Authors:
Kundan Krishna,
Sanjana Ramprasad,
Prakhar Gupta,
Byron C. Wallace,
Zachary C. Lipton,
Jeffrey P. Bigham
Abstract:
LLMs can generate factually incorrect statements even when provided access to reference documents. Such errors can be dangerous in high-stakes applications (e.g., document-grounded QA for healthcare or finance). We present GenAudit -- a tool intended to assist fact-checking LLM responses for document-grounded tasks. GenAudit suggests edits to the LLM response by revising or removing claims that are not supported by the reference document, and also presents evidence from the reference for facts that do appear to have support. We train models to execute these tasks, and design an interactive interface to present suggested edits and evidence to users. Comprehensive evaluation by human raters shows that GenAudit can detect errors in 8 different LLM outputs when summarizing documents from diverse domains. To ensure that most errors are flagged by the system, we propose a method that can increase the error recall while minimizing impact on precision. We release our tool (GenAudit) and fact-checking model for public use.
Submitted 16 March, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Nonlinear Maccone-Pati Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
We show that one of the two important uncertainty principles derived by Maccone and Pati \textit{[Phys. Rev. Lett., 2014]} can be derived for arbitrary maps defined on subsets of $\mathcal{L}^p$ spaces for $1< p<\infty$. Our main tool is the Clarkson inequalities. We also derive a nonlinear uncertainty principle for weak parallelogram spaces and Type-p Banach spaces.
Submitted 1 February, 2024;
originally announced February 2024.
-
Functional Kuppinger-Durisi-Bölcskei Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
Let $\mathcal{X}$ be a Banach space. Let $\{τ_j\}_{j=1}^n, \{ω_k\}_{k=1}^m\subseteq \mathcal{X}$ and $\{f_j\}_{j=1}^n$, $\{g_k\}_{k=1}^m\subseteq \mathcal{X}^*$ satisfy $ |f_j(τ_j)|\geq 1$ for all $ 1\leq j \leq n$, $|g_k(ω_k)|\geq 1 $ for all $1\leq k \leq m$. If $x \in \mathcal{X}\setminus \{0\}$ is such that $x=θ_τθ_f x=θ_ωθ_g x$, then we show that \begin{align}\label{FKDB} (1) \quad\quad\quad\quad \|θ_fx\|_0\|θ_gx\|_0\geq \frac{\bigg[1-(\|θ_fx\|_0-1)\max\limits_{1\leq j,r \leq n,j\neq r}|f_j(τ_r)|\bigg]^+\bigg[1-(\|θ_g x\|_0-1)\max\limits_{1\leq k,s \leq m,k\neq s}|g_k(ω_s)|\bigg]^+}{\left(\displaystyle\max_{1\leq j \leq n, 1\leq k \leq m}|f_j(ω_k)|\right)\left(\displaystyle\max_{1\leq j \leq n, 1\leq k \leq m}|g_k(τ_j)|\right)}. \end{align}
We call Inequality (1) the \textbf{Functional Kuppinger-Durisi-Bölcskei Uncertainty Principle}. Inequality (1) improves the uncertainty principle obtained by Kuppinger, Durisi and Bölcskei \textit{[IEEE Trans. Inform. Theory (2012)]} (which improved the Donoho-Stark-Elad-Bruckstein uncertainty principle \textit{[SIAM J. Appl. Math. (1989), IEEE Trans. Inform. Theory (2002)]}). We also derive a functional form of the uncertainty principle obtained by Studer, Kuppinger, Pope and Bölcskei \textit{[IEEE Trans. Inform. Theory (2012)]}.
Submitted 1 January, 2024;
originally announced February 2024.
-
Evaluating the Factuality of Zero-shot Summarizers Across Varied Domains
Authors:
Sanjana Ramprasad,
Kundan Krishna,
Zachary C Lipton,
Byron C Wallace
Abstract:
Recent work has shown that large language models (LLMs) are capable of generating summaries zero-shot (i.e., without explicit supervision) that, under human assessment, are often comparable or even preferred to manually composed reference summaries. However, this prior work has focussed almost exclusively on evaluating news article summarization. How do zero-shot summarizers perform in other (potentially more specialized) domains? In this work we evaluate zero-shot generated summaries across specialized domains including biomedical articles, and legal bills (in addition to standard news benchmarks for reference). We focus especially on the factuality of outputs. We acquire annotations from domain experts to identify inconsistencies in summaries and systematically categorize these errors. We analyze whether the prevalence of a given domain in the pretraining corpus affects extractiveness and faithfulness of generated summaries of articles in this domain. We release all collected annotations to facilitate additional research toward measuring and realizing factually accurate summarization, beyond news articles. The dataset can be downloaded from https://github.com/sanjanaramprasad/zero_shot_faceval_domains
Submitted 5 February, 2024;
originally announced February 2024.
-
ATPPNet: Attention based Temporal Point cloud Prediction Network
Authors:
Kaustab Pal,
Aditya Sharma,
Avinash Sharma,
K. Madhava Krishna
Abstract:
Point cloud prediction is an important yet challenging task in the field of autonomous driving. The goal is to predict future point cloud sequences that maintain object structures while accurately representing their temporal motion. These predicted point clouds help in other subsequent tasks like object trajectory estimation for collision avoidance or estimating locations with the least odometry drift. In this work, we present ATPPNet, a novel architecture that predicts future point cloud sequences given a sequence of previous time step point clouds obtained with a LiDAR sensor. ATPPNet leverages Conv-LSTM along with channel-wise and spatial attention, complemented by a 3D-CNN branch, to extract an enhanced spatio-temporal context and recover high-fidelity predictions of future point clouds. We conduct extensive experiments on publicly available datasets and report performance that exceeds that of existing methods. We also conduct a thorough ablation study of the proposed architecture and provide an application study that highlights the potential of our model for tasks like odometry estimation.
Submitted 30 January, 2024;
originally announced January 2024.
-
Femtosecond laser-assisted selective holding with ultra-low power for direct manipulation of biological specimens
Authors:
Krishangi Krishna,
Joshua A. Burrow,
Zhaowei Jiang,
Wenyu Liu,
Anita Shukla,
Kimani C. Toussaint Jr
Abstract:
Traditional optical tweezers techniques often rely on high-power continuous wave (CW) lasers, which can introduce unwanted thermal effects and photodamage to delicate samples. To overcome these limitations, we demonstrate femtosecond laser assisted selective holding with ultra-low power (FLASH-UP). We find that the FLASH-UP exhibits a five times greater trap stiffness than CW-OT, and can trap at lower intensities. Furthermore, we demonstrate OT of different pathogenic bacteria species and find that FLASH-UP does not impact cell motility. These results pave the way for applications in sorting, bio-sensing, in vivo cell manipulation and single cell analysis.
Submitted 12 January, 2024;
originally announced January 2024.
-
Word-Representability of Graphs with respect to Split Recomposition
Authors:
Tithi Dwary,
K. V. Krishna
Abstract:
In this work, we show that the class of word-representable graphs is closed under split recomposition and determine the representation number of the graph obtained by recomposing two word-representable graphs. Accordingly, we show that the class of parity graphs is word-representable. Further, we obtain a characteristic property by which the recomposition of comparability graphs is a comparability graph. Consequently, we also establish the permutation-representation number (prn) of the resulting comparability graph. We also introduce a subclass of comparability graphs, called prn-irreducible graphs. We provide a criterion such that the split recomposition of two prn-irreducible graphs is a comparability graph and determine the prn of the resultant graph.
Submitted 3 January, 2024;
originally announced January 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Accelerated parameter estimation in Bilby with relative binning
Authors:
Kruthi Krishna,
Aditya Vijaykumar,
Apratim Ganguly,
Colm Talbot,
Sylvia Biscoveanu,
Richard N George,
Natalie Williams,
Aaron Zimmerman
Abstract:
We describe an implementation of the relative binning technique to speed up parameter estimation of gravitational-wave signals. We first give a pedagogical overview of relative binning, discussing also the expressions for the likelihood marginalized over phase and distance. Then, we describe the details of the code in \texttt{Bilby}, an open-source software package commonly used for parameter estimation of gravitational-wave sources. Our code is able to reproduce the parameters of GW170817 in 14 hours on a single-core CPU, performs well on simulated signals, and passes the percentile-percentile (p-p) tests. We also illustrate that relative binning is an ideal technique to estimate the parameters of signals in next-generation gravitational wave detectors.
Submitted 10 December, 2023;
originally announced December 2023.
-
Unbounded Donoho-Stark-Elad-Bruckstein-Ricaud-Torrésani Uncertainty Principles
Authors:
K. Mahesh Krishna
Abstract:
Let $(Ω, μ)$, $(Δ, ν)$ be measure spaces and $p=1$ or $p=\infty$. Let $(\{f_α\}_{α\in Ω}, \{τ_α\}_{α\in Ω})$ and $(\{g_β\}_{β\in Δ}, \{ω_β\}_{β\in Δ})$ be unbounded continuous p-Schauder frames for a Banach space $\mathcal{X}$. Then for every $x \in ( \mathcal{D}(θ_f) \cap\mathcal{D}(θ_g))\setminus\{0\}$, we show that \begin{align}\label{UB}
(1) \quad \quad \quad \quad μ(\operatorname{supp}(θ_f x))ν(\operatorname{supp}(θ_g x)) \geq \frac{1}{\left(\displaystyle\sup_{α\in Ω, β\in Δ}|f_α(ω_β)|\right)\left(\displaystyle\sup_{α\in Ω, β\in Δ}|g_β(τ_α)|\right)}, \end{align} where \begin{align*} &θ_f:\mathcal{D}(θ_f) \ni x \mapsto θ_fx \in \mathcal{L}^p(Ω, μ); \quad θ_fx: Ω\ni α\mapsto (θ_fx) (α):= f_α(x) \in \mathbb{K},\\ &θ_g: \mathcal{D}(θ_g) \ni x \mapsto θ_gx \in \mathcal{L}^p(Δ, ν); \quad θ_gx: Δ\ni β\mapsto (θ_gx) (β):= g_β(x) \in \mathbb{K}. \end{align*} We call Inequality (1) as \textbf{Unbounded Donoho-Stark-Elad-Bruckstein-Ricaud-Torrésani Uncertainty Principle}. Along with recent \textbf{Functional Continuous Uncertainty Principle} [arXiv:2308.00312], Inequality (1) also improves Ricaud-Torrésani uncertainty principle [IEEE Trans. Inform. Theory, 2013]. In particular, it improves Elad-Bruckstein uncertainty principle [IEEE Trans. Inform. Theory, 2002] and Donoho-Stark uncertainty principle [SIAM J. Appl. Math., 1989].
Submitted 1 December, 2023;
originally announced December 2023.
-
Automated Detection and Counting of Windows using UAV Imagery based Remote Sensing
Authors:
Dhruv Patel,
Shivani Chepuri,
Sarvesh Thakur,
K. Harikumar,
Ravi Kiran S.,
K. Madhava Krishna
Abstract:
Despite the technological advancements in the construction and surveying sector, the inspection of salient features like windows in an under-construction or existing building is predominantly a manual process. Moreover, the number of windows present in a building is directly related to the magnitude of deformation it suffers under earthquakes. In this research, a method to accurately detect and count the number of windows of a building by deploying an Unmanned Aerial Vehicle (UAV) based remote sensing system is proposed. The proposed two-stage method automates the identification and counting of windows by developing computer vision pipelines that utilize data from UAV's onboard camera and other sensors. Quantitative and qualitative results show the effectiveness of our proposed approach in accurately detecting and counting the windows compared to the existing method.
Submitted 24 November, 2023;
originally announced November 2023.
-
On the Permutation-Representation Number of Bipartite Graphs using Neighborhood Graphs
Authors:
Khyodeno Mozhui,
K. V. Krishna
Abstract:
The problems of determining the permutation-representation number (prn) and the representation number of bipartite graphs are open in the literature. Moreover, the decision problem corresponding to the determination of the prn of a bipartite graph is NP-complete. However, these numbers were established for certain subclasses of bipartite graphs, e.g., for crown graphs. Further, it was conjectured that the crown graphs have the highest representation number among the bipartite graphs. In this work, first, we reconcile the relation between the prn of a comparability graph and the dimension of its induced poset and review the upper bounds on the prn of bipartite graphs. Then, we study the prn of bipartite graphs using the notion called neighborhood graphs. This approach substantiates the aforesaid conjecture and gives us theoretical evidence. In this connection, we devise a polynomial-time procedure to construct a word that represents a given bipartite graph permutationally. Accordingly, we provide a better upper bound for the prn of bipartite graphs. Further, we construct a class of bipartite graphs, viz., extended crown graphs, defined over posets and investigate its prn using the neighborhood graphs.
Submitted 23 November, 2023;
originally announced November 2023.
-
GEE! Grammar Error Explanation with Large Language Models
Authors:
Yixiao Song,
Kalpesh Krishna,
Rajesh Bhatt,
Kevin Gimpel,
Mohit Iyyer
Abstract:
Grammatical error correction tools are effective at correcting grammatical errors in users' input sentences but do not provide users with \textit{natural language} explanations about their errors. Such explanations are essential for helping users learn the language by gaining a deeper understanding of its grammatical rules (DeKeyser, 2003; Ellis et al., 2006). To address this gap, we propose the task of grammar error explanation, where a system needs to provide one-sentence explanations for each grammatical error in a pair of erroneous and corrected sentences. We analyze the capability of GPT-4 in grammar error explanation, and find that it only produces explanations for 60.2% of the errors using one-shot prompting. To improve upon this performance, we develop a two-step pipeline that leverages fine-tuned and prompted large language models to perform structured atomic token edit extraction, followed by prompting GPT-4 to generate explanations. We evaluate our pipeline on German and Chinese grammar error correction data sampled from language learners with a wide range of proficiency levels. Human evaluation reveals that our pipeline produces 93.9% and 98.0% correct explanations for German and Chinese data, respectively. To encourage further research in this area, we will open-source our data and code.
Submitted 15 November, 2023;
originally announced November 2023.
-
Continuous Rankin Bound for Hilbert and Banach Spaces
Authors:
K. Mahesh Krishna
Abstract:
Let $(Ω, μ)$ be a measure space and $\{τ_α\}_{α\in Ω}$ be a normalized continuous Bessel family for a real Hilbert space $\mathcal{H}$. If the diagonal $Δ:= \{(α, α):α\in Ω\}$ is measurable in the measure space $Ω\times Ω$, then we show that \begin{align} (1) \quad\quad\quad\quad \sup _{α, β\in Ω, α\neq β}\langle τ_α, τ_β\rangle \geq \frac{-(μ\timesμ)(Δ)}{(μ\timesμ)((Ω\timesΩ)\setminusΔ)}. \end{align} We call Inequality (1) the continuous Rankin bound. It improves a 76-year-old result of Rankin [\textit{Ann. of Math., 1947}]. It also answers one of the questions asked by K. M. Krishna in the paper [Continuous Welch bounds with applications, \textit{Commun. Korean Math. Soc., 2023}]. We also derive a Banach space version of Inequality (1).
Submitted 11 November, 2023;
originally announced November 2023.
-
A Branch Group in a Class of Non-Contracting Weakly Regular Branch Groups
Authors:
Sagar Saha,
K. V. Krishna
Abstract:
We provide a class of non-contracting groups containing an infinite family of fractal and weakly regular branch groups, and study certain properties including abelianization, just infiniteness, and word problem. We present an example of a branch group in this class and show that it is of exponential growth. It seems this is the first example of a non-contracting branch group constructed explicitly.
Submitted 10 November, 2023;
originally announced November 2023.
-
NeuroSMPC: A Neural Network guided Sampling Based MPC for On-Road Autonomous Driving
Authors:
Kaustab Pal,
Aditya Sharma,
Mohd Omama,
Parth N. Shah,
K. Madhava Krishna
Abstract:
In this paper we show an effective means of integrating data driven frameworks to sampling based optimal control to vastly reduce the compute time for easy adoption and adaptation to real time applications such as on-road autonomous driving in the presence of dynamic actors. Presented with training examples, a spatio-temporal CNN learns to predict the optimal mean control over a finite horizon that precludes further resampling, an iterative process that makes sampling based optimal control formulations difficult to adopt in real time settings. Generating control samples around the network-predicted optimal mean retains the advantage of sample diversity while enabling real time rollout of trajectories that avoids multiple dynamic obstacles in an on-road navigation setting. Further the 3D CNN architecture implicitly learns the future trajectories of the dynamic agents in the scene resulting in successful collision free navigation despite no explicit future trajectory prediction. We show performance gain over multiple baselines in a number of on-road scenes through closed loop simulations in CARLA. We also showcase the real world applicability of our system by running it on our custom Autonomous Driving Platform (AutoDP).
Submitted 19 October, 2023;
originally announced October 2023.
-
Hilbert Space Embedding-based Trajectory Optimization for Multi-Modal Uncertain Obstacle Trajectory Prediction
Authors:
Basant Sharma,
Aditya Sharma,
K. Madhava Krishna,
Arun Kumar Singh
Abstract:
Safe autonomous driving critically depends on how well the ego-vehicle can predict the trajectories of neighboring vehicles. To this end, several trajectory prediction algorithms have been presented in the existing literature. Many of these approaches output a multi-modal distribution of obstacle trajectories instead of a single deterministic prediction to account for the underlying uncertainty. However, existing planners cannot handle the multi-modality based on just sample-level information of the predictions. With this motivation, this paper proposes a trajectory optimizer that can leverage the distributional aspects of the prediction in a computationally tractable and sample-efficient manner. Our optimizer can work with arbitrarily complex distributions and thus can be used with output distribution represented as a deep neural network. The core of our approach is built on embedding distribution in Reproducing Kernel Hilbert Space (RKHS), which we leverage in two ways. First, we propose an RKHS embedding approach to select probable samples from the obstacle trajectory distribution. Second, we rephrase chance-constrained optimization as distribution matching in RKHS and propose a novel sampling-based optimizer for its solution. We validate our approach with hand-crafted and neural network-based predictors trained on real-world datasets and show improvement over the existing stochastic optimization approaches in safety metrics.
Submitted 12 October, 2023;
originally announced October 2023.
-
DiffPrompter: Differentiable Implicit Visual Prompts for Semantic-Segmentation in Adverse Conditions
Authors:
Sanket Kalwar,
Mihir Ungarala,
Shruti Jain,
Aaron Monis,
Krishna Reddy Konda,
Sourav Garg,
K Madhava Krishna
Abstract:
Semantic segmentation in adverse weather scenarios is a critical task for autonomous driving systems. While foundation models have shown promise, the need for specialized adaptors becomes evident for handling more challenging scenarios. We introduce DiffPrompter, a novel differentiable visual and latent prompting mechanism aimed at expanding the learning capabilities of existing adaptors in foundation models. Our proposed $\nabla$HFC image processing block excels particularly in adverse weather conditions, where conventional methods often fall short. Furthermore, we investigate the advantages of jointly training visual and latent prompts, demonstrating that this combined approach significantly enhances performance in out-of-distribution scenarios. Our differentiable visual prompts leverage parallel and series architectures to generate prompts, effectively improving object segmentation tasks in adverse conditions. Through a comprehensive series of experiments and evaluations, we provide empirical evidence to support the efficacy of our approach. Project page at https://diffprompter.github.io.
Submitted 26 March, 2024; v1 submitted 6 October, 2023;
originally announced October 2023.
-
Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving
Authors:
Tushar Choudhary,
Vikrant Dewangan,
Shivam Chandhok,
Shubham Priyadarshan,
Anushka Jain,
Arun K. Singh,
Siddharth Srivastava,
Krishna Murthy Jatavallabhula,
K. Madhava Krishna
Abstract:
Talk2BEV is a large vision-language model (LVLM) interface for bird's-eye view (BEV) maps in autonomous driving contexts. While existing perception systems for autonomous driving scenarios have largely focused on a pre-defined (closed) set of object categories and driving scenarios, Talk2BEV blends recent advances in general-purpose language and vision models with BEV-structured map representations, eliminating the need for task-specific models. This enables a single system to cater to a variety of autonomous driving tasks encompassing visual and spatial reasoning, predicting the intents of traffic actors, and decision-making based on visual cues. We extensively evaluate Talk2BEV on a large number of scene understanding tasks that rely on both the ability to interpret free-form natural language queries, and in grounding these queries to the visual context embedded into the language-enhanced BEV map. To enable further research in LVLMs for autonomous driving scenarios, we develop and release Talk2BEV-Bench, a benchmark encompassing 1000 human-annotated BEV scenarios, with more than 20,000 questions and ground-truth responses from the NuScenes dataset.
Submitted 14 November, 2023; v1 submitted 3 October, 2023;
originally announced October 2023.
-
Continuous Deutsch Uncertainty Principle and Continuous Kraus Conjecture
Authors:
K. Mahesh Krishna
Abstract:
Let $(Ω, μ)$, $(Δ, ν)$ be measure spaces and $\{τ_α\}_{α\in Ω}$, $\{ω_β\}_{β\in Δ}$ be 1-bounded continuous Parseval frames for a Hilbert space $\mathcal{H}$. Then we show that \begin{align} (1) \quad \quad \quad \quad \log (μ(Ω)ν(Δ))\geq S_τ(h)+S_ω(h)\geq -2 \log \left(\frac{1+\displaystyle \sup_{α\in Ω, β\in Δ}|\langleτ_α, ω_β\rangle|}{2}\right) , \quad \forall h \in \mathcal{H}_τ\cap \mathcal{H}_ω, \end{align} where \begin{align*} &\mathcal{H}_τ:= \{h_1 \in \mathcal{H}: \langle h_1 , τ_α\rangle \neq 0, α\in Ω\}, \quad \mathcal{H}_ω:= \{h_2 \in \mathcal{H}: \langle h_2, ω_β\rangle \neq 0, β\in Δ\},\\ &S_τ(h):= -\displaystyle\int\limits_Ω\left|\left \langle \frac{h}{\|h\|}, τ_α\right\rangle \right|^2\log \left|\left \langle \frac{h}{\|h\|}, τ_α\right\rangle \right|^2\,dμ(α), \quad \forall h \in \mathcal{H}_τ, \\ & S_ω(h):= -\displaystyle\int\limits_Δ\left|\left \langle \frac{h}{\|h\|}, ω_β\right\rangle \right|^2\log \left|\left \langle \frac{h}{\|h\|}, ω_β\right\rangle \right|^2\,dν(β), \quad \forall h \in \mathcal{H}_ω. \end{align*} We call Inequality (1) as \textbf{Continuous Deutsch Uncertainty Principle}. Inequality (1) improves the uncertainty principle obtained by Deutsch \textit{[Phys. Rev. Lett., 1983]}. We formulate Kraus conjecture for 1-bounded continuous Parseval frames. We also derive continuous Deutsch uncertainty principles for Banach spaces.
Submitted 2 October, 2023;
originally announced October 2023.
-
Functional Deutsch Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
Let $\{f_j\}_{j=1}^n$ and $\{g_k\}_{k=1}^m$ be Parseval p-frames for a finite dimensional Banach space $\mathcal{X}$. Then we show that \begin{align} (1) \quad\quad\quad\quad \log (nm)\geq S_f (x)+S_g (x)\geq -p \log \left(\displaystyle\sup_{y \in \mathcal{X}_f\cap \mathcal{X}_g, \|y\|=1}\left(\max_{1\leq j\leq n, 1\leq k\leq m}|f_j(y)g_k(y)|\right)\right), \quad \forall x \in \mathcal{X}_f\cap \mathcal{X}_g, \end{align} where \begin{align*} &\mathcal{X}_f:= \{z\in \mathcal{X}: f_j(z)\neq 0, 1\leq j \leq n\}, \quad \mathcal{X}_g:= \{w\in \mathcal{X}: g_k(w)\neq 0, 1\leq k \leq m\},\\ &S_f (x):= -\sum_{j=1}^{n}\left|f_j\left(\frac{x}{\|x\|}\right)\right|^p\log \left|f_j\left(\frac{x}{\|x\|}\right)\right|^p, \quad S_g (x):= -\sum_{k=1}^{m}\left|g_k\left(\frac{x}{\|x\|}\right)\right|^p\log \left|g_k\left(\frac{x}{\|x\|}\right)\right|^p, \quad \forall x \in \mathcal{X}_g. \end{align*} We call Inequality (1) as \textbf{Functional Deutsch Uncertainty Principle}. For Hilbert spaces, we show that Inequality (1) reduces to the uncertainty principle obtained by Deutsch \textit{[Phys. Rev. Lett., 1983]}. We also derive a dual of Inequality (1).
Submitted 1 September, 2023;
originally announced September 2023.
-
Control of Vortex Dynamics using Invariants
Authors:
Kartik Krishna,
Aditya G. Nair,
Anand Krishnan,
Steven L. Brunton,
Eurika Kaiser
Abstract:
Vortex-dominated flows are ubiquitous in engineering, and the ability to efficiently manipulate the dynamics of these vortices has broad applications, from wake shaping to mixing enhancement. However, the strongly nonlinear behavior of the vortex dynamics makes this a challenging task. In this work, we investigate the control of vortex dynamics by using a change of coordinates from the Biot-Savart equations into well-known invariants, such as the Hamiltonian, linear, and angular impulses, which are Koopman eigenfunctions. We then combine the resulting model with model predictive control to generate control laws that force the vortex system using "virtual cylinders". The invariant model is beneficial as it provides a linear, global description of the vortex dynamics through a recently developed Koopman control scheme for conserved quantities and invariants. The use of this model has not been well studied in the literature in the context of control. In this paper, we seek to understand the effect of changing each invariant individually or multiple invariants simultaneously. We use the 4-vortex system as our primary test bed, as it is the simplest configuration that exhibits chaotic behavior. We show that by controlling to specific invariant quantities, we can modify the transition from chaotic to quasiperiodic states. Finally, we computationally demonstrate the effectiveness of invariant control on a toy example of tracer mixing in the 4-vortex system.
Submitted 7 November, 2023; v1 submitted 7 August, 2023;
originally announced August 2023.
-
Functional Continuous Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
Let $(Ω, μ)$, $(Δ, ν)$ be measure spaces. Let $(\{f_α\}_{α\in Ω}, \{τ_α\}_{α\in Ω})$ and $(\{g_β\}_{β\in Δ}, \{ω_β\}_{β\in Δ})$ be continuous p-Schauder frames for a Banach space $\mathcal{X}$. Then for every $x \in \mathcal{X}\setminus\{0\}$, we show that \begin{align} (1) \quad \quad \quad \quad μ(\operatorname{supp}(θ_f x))^\frac{1}{p} ν(\operatorname{supp}(θ_g x))^\frac{1}{q} \geq \frac{1}{\displaystyle\sup_{α\in Ω, β\in Δ}|f_α(ω_β)|}, \quad ν(\operatorname{supp}(θ_g x))^\frac{1}{p} μ(\operatorname{supp}(θ_f x))^\frac{1}{q}\geq \frac{1}{\displaystyle\sup_{α\in Ω, β\in Δ}|g_β(τ_α)|}. \end{align} where \begin{align*} &θ_f: \mathcal{X} \ni x \mapsto θ_fx \in \mathcal{L}^p(Ω, μ); \quad θ_fx: Ω\ni α\mapsto (θ_fx) (α):= f_α(x) \in \mathbb{K}, &θ_g: \mathcal{X} \ni x \mapsto θ_gx \in \mathcal{L}^p(Δ, ν); \quad θ_gx: Δ\ni β\mapsto (θ_gx) (β):= g_β(x) \in \mathbb{K} \end{align*} and $q$ is the conjugate index of $p$. We call Inequality (1) as \textbf{Functional Continuous Uncertainty Principle}. It improves the Functional Donoho-Stark-Elad-Bruckstein-Ricaud-Torrésani Uncertainty Principle obtained by K. Mahesh Krishna in [arXiv:2304.03324v1 [math.FA], 5 April 2023]. It also answers a question asked by Prof. Philip B. Stark to the author. Based on Donoho-Elad Sparsity Theorem, we formulate Measure Minimization Conjecture.
Submitted 1 August, 2023;
originally announced August 2023.
-
Functional Donoho-Stark Approximate Support Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
Let $(\{f_j\}_{j=1}^n, \{τ_j\}_{j=1}^n)$ and $(\{g_k\}_{k=1}^n, \{ω_k\}_{k=1}^n)$ be two p-orthonormal bases for a finite dimensional Banach space $\mathcal{X}$. If $ x \in \mathcal{X}\setminus\{0\}$ is such that $θ_fx$ is $\varepsilon$-supported on $M\subseteq \{1,\dots, n\}$ w.r.t. p-norm and $θ_gx$ is $δ$-supported on $N\subseteq \{1,\dots, n\}$ w.r.t. p-norm, then we show that \begin{align}\label{ME} (1) \quad \quad \quad \quad &o(M)^\frac{1}{p}o(N)^\frac{1}{q}\geq \frac{1}{\displaystyle \max_{1\leq j,k\leq n}|f_j(ω_k) |}\max \{1-\varepsilon-δ, 0\},\\ (2) \quad \quad \quad \quad&o(M)^\frac{1}{q}o(N)^\frac{1}{p}\geq \frac{1}{\displaystyle \max_{1\leq j,k\leq n}|g_k(τ_j) |}\max \{1-\varepsilon-δ, 0\},\label{ME2} \end{align} where \begin{align*} θ_f: \mathcal{X} \ni x \mapsto (f_j(x) )_{j=1}^n \in \ell^p([n]); \quad θ_g: \mathcal{X} \ni x \mapsto (g_k(x) )_{k=1}^n \in \ell^p([n]) \end{align*} and $q$ is the conjugate index of $p$. We call Inequalities (1) and (2) as \textbf{Functional Donoho-Stark Approximate Support Uncertainty Principle}. Inequalities (1) and (2) improve the finite approximate support uncertainty principle obtained by Donoho and Stark \textit{[SIAM J. Appl. Math., 1989]}.
Submitted 1 July, 2023;
originally announced July 2023.
-
Words for the Graphs with Permutation-Representation Number at most Three
Authors:
Khyodeno Mozhui,
K. V. Krishna
Abstract:
The graphs with permutation-representation number (\textit{prn}) at most two are known. While a characterization for the class of graphs with the \textit{prn} at most three is an open problem, we summarize the graphs of this class that are known so far. Although it is known that the \textit{prn} of trees is at most three, in this work, we devise a polynomial-time algorithm for obtaining a word representing a given tree permutationally. Consequently, we determine the words representing even cycles. Contributing to the class of graphs with the \textit{prn} at most three, we determine the \textit{prn} as well as the representation number of book graphs.
Submitted 29 September, 2023; v1 submitted 1 July, 2023;
originally announced July 2023.
-
HyP-NeRF: Learning Improved NeRF Priors using a HyperNetwork
Authors:
Bipasha Sen,
Gaurav Singh,
Aditya Agarwal,
Rohith Agaram,
K Madhava Krishna,
Srinath Sridhar
Abstract:
Neural Radiance Fields (NeRF) have become an increasingly popular representation to capture high-quality appearance and shape of scenes and objects. However, learning generalizable NeRF priors over categories of scenes or objects has been challenging due to the high dimensionality of network weight space. To address the limitations of existing work on generalization, multi-view consistency and to improve quality, we propose HyP-NeRF, a latent conditioning method for learning generalizable category-level NeRF priors using hypernetworks. Rather than using hypernetworks to estimate only the weights of a NeRF, we estimate both the weights and the multi-resolution hash encodings resulting in significant quality gains. To improve quality even further, we incorporate a denoise and finetune strategy that denoises images rendered from NeRFs estimated by the hypernetwork and finetunes it while retaining multiview consistency. These improvements enable us to use HyP-NeRF as a generalizable prior for multiple downstream tasks including NeRF reconstruction from single-view or cluttered scenes and text-to-NeRF. We provide qualitative comparisons and evaluate HyP-NeRF on three tasks: generalization, compression, and retrieval, demonstrating our state-of-the-art results.
Submitted 23 December, 2023; v1 submitted 9 June, 2023;
originally announced June 2023.
-
UAP-BEV: Uncertainty Aware Planning using Bird's Eye View generated from Surround Monocular Images
Authors:
Vikrant Dewangan,
Basant Sharma,
Tushar Choudhary,
Sarthak Sharma,
Aakash Aanegola,
Arun K. Singh,
K. Madhava Krishna
Abstract:
Autonomous driving requires accurate reasoning of the location of objects from raw sensor data. Recent end-to-end learning methods go from raw sensor data to a trajectory output via Bird's Eye View (BEV) segmentation as an interpretable intermediate representation. Motion planning over cost maps generated via Bird's Eye View (BEV) segmentation has emerged as a prominent approach in autonomous driving. However, the current approaches have two critical gaps. First, the optimization process is simplistic and involves just evaluating a fixed set of trajectories over the cost map. The trajectory samples are not adapted based on their associated cost values. Second, the existing cost maps do not account for the uncertainty in the cost maps that can arise due to noise in RGB images and BEV annotations. As a result, these approaches can struggle in challenging scenarios where there is abrupt cut-in, stopping, overtaking, merging, etc. from the neighboring vehicles.
In this paper, we propose UAP-BEV: A novel approach that models the noise in Spatio-Temporal BEV predictions to create an uncertainty-aware occupancy grid map. Using queries of the distance to the closest occupied cell, we obtain a sample estimate of the collision probability of the ego-vehicle. Subsequently, our approach uses gradient-free sampling-based optimization to compute low-cost trajectories over the cost map. Importantly, the sampling distribution is adapted based on the optimal cost values of the sampled trajectories. By explicitly modeling probabilistic collision avoidance in the BEV space, our approach is able to outperform the cost-map-based baselines in collision avoidance, route completion, time to completion, and smoothness. To further validate our method, we also show results on the real-world dataset NuScenes, where we report improvements in collision avoidance and smoothness.
Submitted 8 June, 2023;
originally announced June 2023.
-
Functional Ghobber-Jaming Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
Let $(\{f_j\}_{j=1}^n, \{τ_j\}_{j=1}^n)$ and $(\{g_k\}_{k=1}^n, \{ω_k\}_{k=1}^n)$ be two p-orthonormal bases for a finite dimensional Banach space $\mathcal{X}$. Let $M,N\subseteq \{1, \dots, n\}$ be such that \begin{align*}
o(M)^\frac{1}{q}o(N)^\frac{1}{p}< \frac{1}{\displaystyle \max_{1\leq j,k\leq n}|g_k(τ_j) |}, \end{align*} where $q$ is the conjugate index of $p$. Then for all $x \in \mathcal{X}$, we show that \begin{align}\label{FGJU} (1) \quad \quad \quad \quad \|x\|\leq \left(1+\frac{1}{1-o(M)^\frac{1}{q}o(N)^\frac{1}{p}\displaystyle\max_{1\leq j,k\leq n}|g_k(τ_j)|}\right)\left[\left(\sum_{j\in M^c}|f_j(x)|^p\right)^\frac{1}{p}+\left(\sum_{k\in N^c}|g_k(x) |^p\right)^\frac{1}{p}\right]. \end{align}
We call Inequality (1) the \textbf{Functional Ghobber-Jaming Uncertainty Principle}. Inequality (1) improves the uncertainty principle obtained by Ghobber and Jaming \textit{[Linear Algebra Appl., 2011]}.
Submitted 1 June, 2023;
originally announced June 2023.
-
USB: A Unified Summarization Benchmark Across Tasks and Domains
Authors:
Kundan Krishna,
Prakhar Gupta,
Sanjana Ramprasad,
Byron C. Wallace,
Jeffrey P. Bigham,
Zachary C. Lipton
Abstract:
While the NLP community has produced numerous summarization benchmarks, none provide the rich annotations required to simultaneously address many important problems related to control and reliability. We introduce a Wikipedia-derived benchmark, complemented by a rich set of crowd-sourced annotations, that supports $8$ interrelated tasks: (i) extractive summarization; (ii) abstractive summarization; (iii) topic-based summarization; (iv) compressing selected sentences into a one-line summary; (v) surfacing evidence for a summary sentence; (vi) predicting the factual accuracy of a summary sentence; (vii) identifying unsubstantiated spans in a summary sentence; (viii) correcting factual errors in summaries. We compare various methods on this benchmark and discover that on multiple tasks, moderately-sized fine-tuned models consistently outperform much larger few-shot prompted language models. For factuality-related tasks, we also evaluate existing heuristics to create training data and find that training on them results in worse performance than training on $20\times$ less human-labeled data. Our articles draw from $6$ domains, facilitating cross-domain analysis. On some tasks, the amount of training data matters more than the domain where it comes from, while for other tasks training specifically on data from the target domain, even if limited, is more beneficial.
Submitted 4 December, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Authors:
Sewon Min,
Kalpesh Krishna,
Xinxi Lyu,
Mike Lewis,
Wen-tau Yih,
Pang Wei Koh,
Mohit Iyyer,
Luke Zettlemoyer,
Hannaneh Hajishirzi
Abstract:
Evaluating the factuality of long-form text generated by large language models (LMs) is non-trivial because (1) generations often contain a mixture of supported and unsupported pieces of information, making binary judgments of quality inadequate, and (2) human evaluation is time-consuming and costly. In this paper, we introduce FACTSCORE, a new evaluation that breaks a generation into a series of atomic facts and computes the percentage of atomic facts supported by a reliable knowledge source. We conduct an extensive human evaluation to obtain FACTSCOREs of people biographies generated by several state-of-the-art commercial LMs -- InstructGPT, ChatGPT, and the retrieval-augmented PerplexityAI -- and report new analysis demonstrating the need for such a fine-grained score (e.g., ChatGPT only achieves 58%). Since human evaluation is costly, we also introduce an automated model that estimates FACTSCORE using retrieval and a strong language model, with less than a 2% error rate. Finally, we use this automated metric to evaluate 6,500 generations from a new set of 13 recent LMs that would have cost $26K if evaluated by humans, with various findings: GPT-4 and ChatGPT are more factual than public models, and Vicuna and Alpaca are some of the best public models. FACTSCORE is available for public use via `pip install factscore`.
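As a rough illustration of the score described in this abstract (the percentage of atomic facts supported by a knowledge source), a minimal Python sketch follows. The function names and the boolean-list input are hypothetical and chosen for illustration only; this is not the interface of the `factscore` package, and it assumes the splitting into atomic facts and the support judgments have already been made elsewhere.

```python
from typing import List


def fact_precision(supported: List[bool]) -> float:
    """Fraction of atomic facts in one generation judged as supported.

    `supported` holds one boolean per atomic fact (hypothetical input format).
    """
    if not supported:
        raise ValueError("a generation must contain at least one atomic fact")
    return sum(supported) / len(supported)


def mean_fact_precision(generations: List[List[bool]]) -> float:
    """Average the per-generation precision over a set of generations."""
    if not generations:
        raise ValueError("need at least one generation")
    return sum(fact_precision(g) for g in generations) / len(generations)


if __name__ == "__main__":
    # Toy judgments: 3 of 4 facts supported, then 1 of 2 facts supported.
    toy = [[True, True, False, True], [True, False]]
    print(mean_fact_precision(toy))
```

Averaging per generation rather than pooling all facts is one possible design choice in this sketch; it keeps long generations from dominating the aggregate.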
Submitted 11 October, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Instance-Level Semantic Maps for Vision Language Navigation
Authors:
Laksh Nanwani,
Anmol Agarwal,
Kanishk Jain,
Raghav Prabhakar,
Aaron Monis,
Aditya Mathur,
Krishna Murthy,
Abdul Hafez,
Vineet Gandhi,
K. Madhava Krishna
Abstract:
Humans have a natural ability to perform semantic associations with the surrounding objects in the environment. This allows them to create a mental map of the environment, allowing them to navigate on-demand when given linguistic instructions. A natural goal in Vision Language Navigation (VLN) research is to impart autonomous agents with similar capabilities. Recent works take a step towards this goal by creating a semantic spatial map representation of the environment without any labeled data. However, their representations are limited for practical applicability as they do not distinguish between different instances of the same object. In this work, we address this limitation by integrating instance-level information into spatial map representation using a community detection algorithm and utilizing word ontology learned by large language models (LLMs) to perform open-set semantic associations in the mapping representation. The resulting map representation improves the navigation performance by two-fold (233%) on realistic language commands with instance-specific descriptions compared to the baseline. We validate the practicality and effectiveness of our approach through extensive qualitative and quantitative experiments.
Submitted 1 July, 2023; v1 submitted 21 May, 2023;
originally announced May 2023.
-
Group-Frames for Banach Spaces
Authors:
K. Mahesh Krishna
Abstract:
In the literature, frames generated by unitary representations of groups (known as group-frames) are studied only for Hilbert spaces. We make the first study of frames for Banach spaces generated by isometric invertible representations of discrete groups on Banach spaces. These frames are characterized using left regular, right regular, Gram-matrices and group-matrices on classical sequence spaces. A sufficiently large collection of functional-vector pairs which generate group-frames for Banach spaces is identified using the double commutant of the representation. Subsequently, we study Schauder frames generated by time-frequency shift operators on finite dimensional Banach spaces. We derive the Moyal formula, the fundamental identity of Gabor analysis, the Wexler-Raz criterion and the Ron-Shen duality in functional form.
△ Less
Submitted 1 May, 2023;
originally announced May 2023.
-
Finite Time Lyapunov Exponent Analysis of Model Predictive Control and Reinforcement Learning
Authors:
Kartik Krishna,
Steven L. Brunton,
Zhuoyuan Song
Abstract:
Finite-time Lyapunov exponents (FTLEs) provide a powerful approach to compute time-varying analogs of invariant manifolds in unsteady fluid flow fields. These manifolds are useful to visualize the transport mechanisms of passive tracers advecting with the flow. However, many vehicles and mobile sensors are not passive, but are instead actuated according to some intelligent trajectory planning or control law; for example, model predictive control and reinforcement learning are often used to design energy-efficient trajectories in a dynamically changing background flow. In this work, we investigate the use of FTLE on such controlled agents to gain insight into optimal transport routes for navigation in known unsteady flows. We find that these controlled FTLE (cFTLE) coherent structures separate the flow field into different regions with similar costs of transport to the goal location. These separatrices are functions of the planning algorithm's hyper-parameters, such as the optimization time horizon and the cost of actuation. Computing the invariant sets and manifolds of active agent dynamics in dynamic flow fields is useful in the context of robust motion control, hyperparameter tuning, and determining safe and collision-free trajectories for autonomous systems. Moreover, these cFTLE structures provide insight into effective deployment locations for mobile agents with actuation and energy constraints to traverse the ocean or atmosphere.
Submitted 17 May, 2023; v1 submitted 6 April, 2023;
originally announced April 2023.
-
Functional Donoho-Stark-Elad-Bruckstein-Ricaud-Torrésani Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
Let $(\{f_j\}_{j=1}^n, \{τ_j\}_{j=1}^n)$ and $(\{g_k\}_{k=1}^m, \{ω_k\}_{k=1}^m)$ be p-Schauder frames for a finite dimensional Banach space $\mathcal{X}$. Then for every $x \in \mathcal{X}\setminus\{0\}$, we show that \begin{align} (1) \quad \|θ_f x\|_0^\frac{1}{p}\|θ_g x\|_0^\frac{1}{q} \geq \frac{1}{\displaystyle\max_{1\leq j\leq n, 1\leq k\leq m}|f_j(ω_k)|}\quad \text{and} \quad \|θ_g x\|_0^\frac{1}{p}\|θ_f x\|_0^\frac{1}{q}\geq \frac{1}{\displaystyle\max_{1\leq j\leq n, 1\leq k\leq m}|g_k(τ_j)|}, \end{align} where \begin{align*} θ_f: \mathcal{X} \ni x \mapsto (f_j(x) )_{j=1}^n \in \ell^p([n]); \quad θ_g: \mathcal{X} \ni x \mapsto (g_k(x) )_{k=1}^m \in \ell^p([m]) \end{align*} and $q$ is the conjugate index of $p$. We call Inequality (1) the \textbf{Functional Donoho-Stark-Elad-Bruckstein-Ricaud-Torrésani Uncertainty Principle}. Inequality (1) improves the Ricaud-Torrésani uncertainty principle \textit{[IEEE Trans. Inform. Theory, 2013]}. In particular, it improves the Elad-Bruckstein uncertainty principle \textit{[IEEE Trans. Inform. Theory, 2002]} and the Donoho-Stark uncertainty principle \textit{[SIAM J. Appl. Math., 1989]}.
Submitted 5 April, 2023;
originally announced April 2023.
-
FinderNet: A Data Augmentation Free Canonicalization aided Loop Detection and Closure technique for Point clouds in 6-DOF separation
Authors:
Sudarshan S Harithas,
Gurkirat Singh,
Aneesh Chavan,
Sarthak Sharma,
Suraj Patni,
Chetan Arora,
K. Madhava Krishna
Abstract:
We focus on the problem of LiDAR point cloud based loop detection (or Finding) and closure (LDC) in a multi-agent setting. State-of-the-art (SOTA) techniques directly generate learned embeddings of a given point cloud, require large data transfers, and are not robust to wide variations in 6 Degrees-of-Freedom (DOF) viewpoint. Moreover, absence of strong priors in an unstructured point cloud leads to highly inaccurate LDC. In this original approach, we propose independent roll and pitch canonicalization of the point clouds using a common dominant ground plane. Discretization of the canonicalized point cloud along the axis perpendicular to the ground plane leads to an image similar to Digital Elevation Maps (DEMs), which exposes strong spatial priors in the scene. Our experiments show that LDC based on learnt embeddings of such DEMs is not only data efficient but also significantly more robust and generalizable than the current SOTA. We report significant performance gain in terms of Average Precision for loop detection and absolute translation/rotation error for relative pose estimation (or loop closure) on Kitti, GPR and Oxford Robot Car over multiple SOTA LDC methods. Our encoder technique compresses the original point cloud by a factor of over 830. To further test the robustness of our technique, we create and open-source a custom dataset called Lidar-UrbanFly Dataset (LUF), which consists of point clouds obtained from a LiDAR mounted on a quadrotor.
Submitted 3 April, 2023;
originally announced April 2023.
-
Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense
Authors:
Kalpesh Krishna,
Yixiao Song,
Marzena Karpinska,
John Wieting,
Mohit Iyyer
Abstract:
The rise in malicious usage of large language models, such as fake content creation and academic plagiarism, has motivated the development of approaches that identify AI-generated text, including those based on watermarking or outlier detection. However, the robustness of these detection algorithms to paraphrases of AI-generated text remains unclear. To stress test these detectors, we build an 11B parameter paraphrase generation model (DIPPER) that can paraphrase paragraphs, condition on surrounding context, and control lexical diversity and content reordering. Using DIPPER to paraphrase text generated by three large language models (including GPT3.5-davinci-003) successfully evades several detectors, including watermarking, GPTZero, DetectGPT, and OpenAI's text classifier. For example, DIPPER drops detection accuracy of DetectGPT from 70.3% to 4.6% (at a constant false positive rate of 1%), without appreciably modifying the input semantics.
To increase the robustness of AI-generated text detection to paraphrase attacks, we introduce a simple defense that relies on retrieving semantically-similar generations and must be maintained by a language model API provider. Given a candidate text, our algorithm searches a database of sequences previously generated by the API, looking for sequences that match the candidate text within a certain threshold. We empirically verify our defense using a database of 15M generations from a fine-tuned T5-XXL model and find that it can detect 80% to 97% of paraphrased generations across different settings while only classifying 1% of human-written sequences as AI-generated. We open-source our models, code and data.
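The retrieval step described in this abstract (search a database of earlier generations for one that matches the candidate within a threshold) can be sketched roughly as below. This is only an illustrative toy under stated assumptions: it swaps in a bag-of-words cosine similarity for whatever semantic retriever the authors actually use, and the function names, the in-memory list acting as the database, and the threshold value of 0.75 are all hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def bag_of_words(text: str) -> Counter:
    """Lowercased whitespace-token counts (a stand-in for a semantic embedding)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def matches_database(candidate: str, database: Iterable[str],
                     threshold: float = 0.75) -> bool:
    """Flag the candidate if any stored generation is similar enough.

    The threshold is an arbitrary illustrative value, not one from the paper.
    """
    cand_vec = bag_of_words(candidate)
    best = max((cosine(cand_vec, bag_of_words(g)) for g in database), default=0.0)
    return best >= threshold


if __name__ == "__main__":
    stored = ["the cat sat on the mat near the door"]
    print(matches_database("the cat sat on the mat near a door", stored))
    print(matches_database("quantum fields interact weakly", stored))
```

A linear scan like this would not scale to a database of millions of generations; at that size an approximate nearest-neighbour index over dense embeddings would be the more plausible design.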
Submitted 17 October, 2023; v1 submitted 23 March, 2023;
originally announced March 2023.
-
Stealing the Decoding Algorithms of Language Models
Authors:
Ali Naseh,
Kalpesh Krishna,
Mohit Iyyer,
Amir Houmansadr
Abstract:
A key component of generating text from modern language models (LM) is the selection and tuning of decoding algorithms. These algorithms determine how to generate text from the internal probability distribution generated by the LM. The process of choosing a decoding algorithm and tuning its hyperparameters takes significant time, manual effort, and computation, and it also requires extensive human evaluation. Therefore, the identity and hyperparameters of such decoding algorithms are considered to be extremely valuable to their owners. In this work, we show, for the first time, that an adversary with typical API access to an LM can steal the type and hyperparameters of its decoding algorithms at very low monetary costs. Our attack is effective against popular LMs used in text generation APIs, including GPT-2, GPT-3 and GPT-Neo. We demonstrate the feasibility of stealing such information with only a few dollars, e.g., $\$0.8$, $\$1$, $\$4$, and $\$40$ for the four versions of GPT-3.
Submitted 1 December, 2023; v1 submitted 8 March, 2023;
originally announced March 2023.
-
Absolutely Summing Morphisms between Hilbert C*-Modules and Modular Pietsch Factorization Problem
Authors:
K. Mahesh Krishna
Abstract:
Motivated by the theory of Hilbert-Schmidt morphisms between Hilbert C*-modules over commutative C*-algebras by Stern and van Suijlekom \textit{[J. Funct. Anal., 2021]}, we introduce the notion of p-absolutely summing morphisms between Hilbert C*-modules over commutative C*-algebras. We show that an adjointable morphism between Hilbert C*-modules over a monotone closed commutative C*-algebra is 2-absolutely summing if and only if it is Hilbert-Schmidt. We formulate a version of the Pietsch factorization problem for p-absolutely summing morphisms and solve it partially.
Submitted 6 February, 2023;
originally announced February 2023.