-
ImageInWords: Unlocking Hyper-Detailed Image Descriptions
Authors:
Roopal Garg,
Andrea Burns,
Burcu Karagol Ayan,
Yonatan Bitton,
Ceslee Montgomery,
Yasumasa Onoe,
Andrew Bunner,
Ranjay Krishna,
Jason Baldridge,
Radu Soricut
Abstract:
Despite the longstanding adage "an image is worth a thousand words," creating accurate and hyper-detailed image descriptions for training Vision-Language models remains challenging. Current datasets typically have web-scraped descriptions that are short, low-granularity, and often contain details unrelated to the visual content. As a result, models trained on such data generate descriptions replete with missing information, visual inconsistencies, and hallucinations. To address these issues, we introduce ImageInWords (IIW), a carefully designed human-in-the-loop annotation framework for curating hyper-detailed image descriptions and a new dataset resulting from this process. We validate the framework through evaluations focused on the quality of the dataset and its utility for fine-tuning with considerations for readability, comprehensiveness, specificity, hallucinations, and human-likeness. Our dataset significantly improves across these dimensions compared to recently released datasets (+66%) and GPT-4V outputs (+48%). Furthermore, models fine-tuned with IIW data excel by +31% against prior work along the same human evaluation dimensions. Given our fine-tuned models, we also evaluate text-to-image generation and vision-language reasoning. Our model's descriptions can generate images closest to the original, as judged by both automated and human metrics. We also find our model produces more compositionally rich descriptions, outperforming the best baseline by up to 6% on ARO, SVO-Probes, and Winoground datasets.
Submitted 4 May, 2024;
originally announced May 2024.
-
Piezoelectric microresonators for sensitive spin detection
Authors:
Cecile Skoryna Kline,
Jorge Monroy-Ruz,
Krishna C Balram
Abstract:
Piezoelectric microresonators are indispensable in wireless communications and underpin radio frequency filtering in mobile phones. These devices are usually analyzed in the quasi-(electro)static regime with the magnetic field effectively ignored. On the other hand, at GHz frequencies and especially in piezoelectric devices exploiting strong dimensional confinement of acoustic fields, the surface magnetic fields ($B_{1}$) can be significant. This $B_1$ field, which oscillates at GHz frequencies but is confined to $\mu$m-scale wavelengths, provides a natural route to efficiently interface with nanoscale spin systems. We show through scaling arguments that $B_1 \propto f^2$ for tightly focused acoustic fields at a given operation frequency $f$. We demonstrate the existence of these surface magnetic fields in a proof-of-principle experiment by showing excess power absorption at the focus of a surface acoustic wave (SAW) when a polished Yttrium-Iron-Garnet (YIG) sphere is positioned in the evanescent field and the magnon resonance is tuned across the SAW transmission. Finally, we outline the prospects for sensitive spin detection using small mode volume piezoelectric microresonators, including the feasibility of electrical detection of single spins at cryogenic temperatures.
Submitted 3 May, 2024;
originally announced May 2024.
-
PipeOrgan: Efficient Inter-operation Pipelining with Flexible Spatial Organization and Interconnects
Authors:
Raveesh Garg,
Hyoukjun Kwon,
Eric Qin,
Yu-Hsin Chen,
Tushar Krishna,
Liangzhen Lai
Abstract:
Because Deep Neural Network (DNN) models are increasingly memory-bound, inter-operator pipelining for DNN accelerators is emerging as a promising optimization. Inter-operator pipelining reduces costly on-chip global memory and off-chip memory accesses by forwarding the output of a layer as the input of the next layer within the compute array, an optimization that previous works have shown to be effective.
However, the design space of inter-operator pipelining is huge and not yet fully explored. In particular, the right depth and granularity of pipelining (or whether to pipeline at all) depend significantly on the layer shapes and on the data volumes of weights and activations, and these differ even within a single domain.
Moreover, prior works divide the substrate into large chunks and map one layer onto each chunk, which requires communicating halfway through or through the global buffer. For fine-grained inter-operation pipelining, however, placing the consumer tile of the next layer close to the producer tile of the current layer better exploits fine-grained spatial reuse.
To support a variable number of layers (i.e., the right depth) and multiple spatial organizations of layers (in accordance with the pipelining granularity) on the substrate, we propose PipeOrgan, a new class of spatial data organization strategy for energy-efficient and congestion-free communication between the PEs across various pipeline depths and granularities. PipeOrgan takes advantage of flexible spatial organization and can allocate layers to PEs based on the granularity of pipelining. We also propose changes to the conventional mesh topology to improve the performance of coarse-grained allocation. PipeOrgan achieves a 1.95x performance improvement over the state-of-the-art pipelined dataflow on XR-bench workloads.
Submitted 2 May, 2024;
originally announced May 2024.
-
CodeFort: Robust Training for Code Generation Models
Authors:
Yuhao Zhang,
Shiqi Wang,
Haifeng Qian,
Zijian Wang,
Mingyue Shang,
Linbo Liu,
Sanjay Krishna Gouda,
Baishakhi Ray,
Murali Krishna Ramanathan,
Xiaofei Ma,
Anoop Deoras
Abstract:
Code generation models are not robust to small perturbations, which often lead to inconsistent and incorrect generations and significantly degrade the performance of these models. Improving the robustness of code generation models is crucial for a better user experience when these models are deployed in real-world applications. However, existing efforts have not addressed this issue for code generation models. To fill this gap, we propose CodeFort, a framework to improve the robustness of code generation models, which generalizes a large variety of code perturbations to enrich the training data and enables various robust training strategies that mix data augmentation, batch augmentation, adversarial logits pairing, and contrastive learning, all carefully designed to support high-throughput training. Extensive evaluations show that we improve the average robust pass rates of baseline CodeGen models from 14.79 to 21.74. Notably, the improvement in robustness against code-syntax perturbations is evidenced by a significant decrease in pass rate drop from 95.04% to 53.35%.
Submitted 11 April, 2024;
originally announced May 2024.
-
Exploring the Influence of Graph Operations on Zero Forcing Sets
Authors:
Krishna Menon,
Anurag Singh
Abstract:
Zero forcing in graphs is a coloring process where a colored vertex can force its unique uncolored neighbor to be colored. A zero forcing set is a set of initially colored vertices capable of eventually coloring all vertices of the graph. In this paper, we focus on the numbers $z(G; i)$, where $z(G; i)$ is the number of zero forcing sets of size $i$ of the graph $G$. These numbers were initially studied by Boyer et al., who conjectured that for any graph $G$ on $n$ vertices, $z(G; i) \leq z(P_n; i)$ for all $i \geq 1$, where $P_n$ is the path graph on $n$ vertices. The main aim of this paper is to show that several classes of graphs, including outerplanar graphs and threshold graphs, satisfy this conjecture. We do this by studying various graph operations and examining how they affect the number of zero forcing sets.
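To make the counted quantity concrete, here is a small brute-force sketch (not the authors' method) that applies the color-change rule described above, counts $z(G; i)$ by enumerating every $i$-subset of vertices, and compares the 5-cycle against the path $P_5$ as the conjectured bound $z(G; i) \leq z(P_n; i)$ predicts. The adjacency-dict representation and all function names are illustrative assumptions. Enumeration is exponential in $n$, so this is only suitable for tiny graphs.

```python
from itertools import combinations


def closure(adj, initial):
    """Apply the zero forcing color-change rule until no vertex can force."""
    colored = set(initial)
    changed = True
    while changed:
        changed = False
        for v in list(colored):
            uncolored = [u for u in adj[v] if u not in colored]
            # A colored vertex with exactly one uncolored neighbor forces it.
            if len(uncolored) == 1:
                colored.add(uncolored[0])
                changed = True
    return colored


def count_zero_forcing_sets(adj, i):
    """Brute-force z(G; i): number of size-i sets whose closure is all of V."""
    vertices = list(adj)
    return sum(
        1
        for subset in combinations(vertices, i)
        if len(closure(adj, subset)) == len(vertices)
    )


def path_graph(n):
    # Hypothetical helper: P_n on vertices 0..n-1.
    return {v: [u for u in (v - 1, v + 1) if 0 <= u < n] for v in range(n)}


def cycle_graph(n):
    # Hypothetical helper: C_n on vertices 0..n-1 (n >= 3).
    return {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}


if __name__ == "__main__":
    n = 5
    print([count_zero_forcing_sets(path_graph(n), i) for i in range(1, n + 1)])   # [2, 9, 10, 5, 1]
    print([count_zero_forcing_sets(cycle_graph(n), i) for i in range(1, n + 1)])  # [0, 5, 10, 5, 1]
```

The final colored set does not depend on the order in which forces are applied, so a simple repeat-until-stable loop is enough here.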
Submitted 2 May, 2024;
originally announced May 2024.
-
An efficient quantifier elimination procedure for Presburger arithmetic
Authors:
Christoph Haase,
Shankara Narayanan Krishna,
Khushraj Madnani,
Om Swostik Mishra,
Georg Zetzsche
Abstract:
All known quantifier elimination procedures for Presburger arithmetic require doubly exponential time for eliminating a single block of existentially quantified variables. It has even been claimed in the literature that this upper bound is tight. We observe that this claim is incorrect and develop, as the main result of this paper, a quantifier elimination procedure eliminating a block of existentially quantified variables in singly exponential time. As corollaries, we can establish the precise complexity of numerous problems. Examples include deciding (i) monadic decomposability for existential formulas, (ii) whether an existential formula defines a well-quasi ordering or, more generally, (iii) certain formulas of Presburger arithmetic with Ramsey quantifiers. Moreover, despite the exponential blowup, our procedure shows that under mild assumptions, even NP upper bounds for decision problems about quantifier-free formulas can be transferred to existential formulas. The technical basis of our results is a kind of small model property for parametric integer programming that generalizes the seminal results by von zur Gathen and Sieveking on small integer points in convex polytopes.
Submitted 2 May, 2024;
originally announced May 2024.
-
CACTUS: Chemistry Agent Connecting Tool-Usage to Science
Authors:
Andrew D. McNaughton,
Gautham Ramalaxmi,
Agustin Kruel,
Carter R. Knutson,
Rohith A. Varikoti,
Neeraj Kumar
Abstract:
Large language models (LLMs) have shown remarkable potential in various domains, but they often lack the ability to access and reason over domain-specific knowledge and tools. In this paper, we introduce CACTUS (Chemistry Agent Connecting Tool-Usage to Science), an LLM-based agent that integrates cheminformatics tools to enable advanced reasoning and problem-solving in chemistry and molecular discovery. We evaluate the performance of CACTUS using a diverse set of open-source LLMs, including Gemma-7b, Falcon-7b, MPT-7b, Llama2-7b, and Mistral-7b, on a benchmark of thousands of chemistry questions. Our results demonstrate that CACTUS significantly outperforms baseline LLMs, with the Gemma-7b and Mistral-7b models achieving the highest accuracy regardless of the prompting strategy used. Moreover, we explore the impact of domain-specific prompting and hardware configurations on model performance, highlighting the importance of prompt engineering and the potential for deploying smaller models on consumer-grade hardware without significant loss in accuracy. By combining the cognitive capabilities of open-source LLMs with domain-specific tools, CACTUS can assist researchers in tasks such as molecular property prediction, similarity searching, and drug-likeness assessment. Furthermore, CACTUS represents a significant milestone in the field of cheminformatics, offering an adaptable tool for researchers engaged in chemistry and molecular discovery. By integrating the strengths of open-source LLMs with domain-specific tools, CACTUS has the potential to accelerate scientific advancement and unlock new frontiers in the exploration of novel, effective, and safe therapeutic candidates, catalysts, and materials. Moreover, CACTUS's ability to integrate with automated experimentation platforms and make data-driven decisions in real time opens up new possibilities for autonomous discovery.
Submitted 1 May, 2024;
originally announced May 2024.
-
Controlled Spalling of Single Crystal 4H-SiC Bulk Substrates
Authors:
Connor P Horn,
Christina Wicker,
Antoni Wellisz,
Cyrus Zeledon,
Pavani Vamsi Krishna Nittala,
F Joseph Heremans,
David D Awschalom,
Supratik Guha
Abstract:
We detail several scientific and engineering innovations that enable the controlled spalling of 10 to 50 micron thick films of single crystal 4H silicon carbide (4H-SiC) from bulk substrates. 4H-SiC's properties, including high thermal conductivity and a wide bandgap, make it an ideal candidate for high-temperature, high-voltage power electronic devices. Moreover, 4H-SiC has been shown to be an excellent host of solid-state atomic defect qubits for quantum computing and quantum networking. Because 4H-SiC single crystal substrates are expensive (due to long growth times and limited yield), techniques for removal and transfer of bulk-quality films in the tens-of-microns thickness range are highly desirable to allow for substrate reuse and integration of the separated films. In this work we utilize novel approaches for stressor layer thickness control and spalling crack initiation to demonstrate controlled spalling of 4H-SiC, the highest-fracture-toughness material spalled to date. Additionally, we demonstrate substrate reuse, bond the spalled films to carrier substrates, and explore the spin coherence of the spalled films. In preliminary studies we are able to achieve coherent spin control of neutral divacancy ($VV^{0}$) qubit ensembles and measure a quasi-bulk spin $T_{2}$ of 79.7 $\mu$s in such spalled films.
Submitted 30 June, 2024; v1 submitted 30 April, 2024;
originally announced April 2024.
-
More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness
Authors:
Aaron J. Li,
Satyapriya Krishna,
Himabindu Lakkaraju
Abstract:
The surge in the development of Large Language Models (LLMs) has led to improved performance on cognitive tasks as well as an urgent need to align these models with human values in order to safely exploit their power. Despite the effectiveness of preference learning algorithms like Reinforcement Learning From Human Feedback (RLHF) in aligning models with human preferences, their assumed improvements on model trustworthiness have not been thoroughly tested. Toward this end, this study investigates how models that have been aligned with general-purpose preference data on helpfulness and harmlessness perform across five trustworthiness verticals: toxicity, stereotypical bias, machine ethics, truthfulness, and privacy. For model alignment, we focus on three widely used RLHF variants: Supervised Finetuning (SFT), Proximal Policy Optimization (PPO), and Direct Preference Optimization (DPO). Through extensive empirical investigations, we discover that the improvement in trustworthiness by RLHF is far from guaranteed, and there exists a complex interplay between preference data, alignment algorithms, and specific trustworthiness aspects. Together, our results underscore the need for more nuanced approaches for model alignment. By shedding light on the intricate dynamics of these components within model alignment, we hope this research will guide the community towards developing language models that are both capable and trustworthy.
Submitted 29 April, 2024;
originally announced April 2024.
-
Optimization of District Heating Network Parameters in Steady-State Operation
Authors:
Sai Krishna K. Hari,
Anatoly Zlotnik,
Shriram Srinivasan,
Kaarthik Sundar,
Mary Ewers
Abstract:
We examine the modeling, simulation, and optimization of district heating systems, which are widely used for thermal transport using steam or hot water as a carrier. We propose a generalizable framework to specify network models and scenario parameters, and develop an optimization method for evaluating system states including pressures, fluid flow rates, and temperatures throughout the network. The network modeling includes pipes, thermal plants, pumps, and passive or controllable loads as system components. We propose basic models for thermodynamic fluid transport and enforce the balance of physical quantities in steady-state flow over co-located outgoing and return networks. We formulate an optimization problem with steam and hot water as the outgoing and return carriers, as in legacy 20th-century systems. The physical laws and engineering limitations are specified for each component type, and the thermal network flow optimization (TNFO) problem is formulated and solved for a realistic test network under several scenarios.
Submitted 29 April, 2024;
originally announced April 2024.
-
Can GPT-4 do L2 analytic assessment?
Authors:
Stefano Bannò,
Hari Krishna Vydana,
Kate M. Knill,
Mark J. F. Gales
Abstract:
Automated essay scoring (AES) to evaluate second language (L2) proficiency has been a firmly established technology used in educational contexts for decades. Although holistic scoring has seen advancements in AES that match or even exceed human performance, analytic scoring still encounters issues as it inherits flaws and shortcomings from the human scoring process. The recent introduction of large language models presents new opportunities for automating the evaluation of specific aspects of L2 writing proficiency. In this paper, we perform a series of experiments using GPT-4 in a zero-shot fashion on a publicly available dataset annotated with holistic scores based on the Common European Framework of Reference and aim to extract detailed information about their underlying analytic components. We observe significant correlations between the automatically predicted analytic scores and multiple features associated with the individual proficiency components.
Submitted 29 April, 2024;
originally announced April 2024.
-
Uncovering an Interfacial Band Resulting from Orbital Hybridization in Nickelate Heterostructures
Authors:
Mingyao Chen,
Huimin Liu,
Xu He,
Minjuan Li,
Chi Sin Tang,
Mengxia Sun,
Krishna Prasad Koirala,
Mark E. Bowden,
Yangyang Li,
Xiongfang Liu,
Difan Zhou,
Shuo Sun,
Mark B. H. Breese,
Chuanbing Cai,
Yingge Du,
Andrew T. S. Wee,
Le Wang,
Xinmao Yin
Abstract:
The interaction of atomic orbitals at the interface of perovskite oxide heterostructures has been investigated for its profound impact on the band structures and electronic properties, giving rise to unique electronic states and a variety of tunable functionalities. In this study, we conducted an extensive investigation of the optical and electronic properties of epitaxial NdNiO3 thin films grown on a series of single crystal substrates. Unlike films synthesized on other substrates, NdNiO3 on SrTiO3 (NNO/STO) gives rise to a unique band structure that features an additional unoccupied band situated above the Fermi level. Our comprehensive investigation, which incorporated a wide array of experimental techniques and density functional theory calculations, revealed that the emergence of the interfacial band structure is primarily driven by the orbital hybridization between Ti 3d orbitals of the STO substrate and O 2p orbitals of the NNO thin film. Furthermore, exciton peaks have been detected in the optical spectra of the NNO/STO film, attributable to the pronounced electron-electron (e-e) and electron-hole (e-h) interactions propagating from the STO substrate into the NNO film. These findings underscore the substantial influence of interfacial orbital hybridization on the electronic structure of oxide thin films, thereby offering key insights into tuning their interfacial properties.
Submitted 29 April, 2024;
originally announced April 2024.
-
Open-Set 3D Semantic Instance Maps for Vision Language Navigation -- O3D-SIM
Authors:
Laksh Nanwani,
Kumaraditya Gupta,
Aditya Mathur,
Swayam Agrawal,
A. H. Abdul Hafez,
K. Madhava Krishna
Abstract:
Humans excel at forming mental maps of their surroundings, equipping them to understand object relationships and navigate based on language queries. Our previous work SI Maps [1] showed that having instance-level information and the semantic understanding of an environment helps significantly improve performance for language-guided tasks. We extend this instance-level approach to 3D while increasing the pipeline's robustness and improving quantitative and qualitative results. Our method leverages foundational models for object recognition, image segmentation, and feature extraction. We propose a representation that results in a 3D point cloud map with instance-level embeddings, which bring in the semantic understanding that natural language commands can query. Quantitatively, the work improves upon the success rate of language-guided tasks. At the same time, we qualitatively observe the ability to identify instances more clearly and to leverage the foundational models and language- and image-aligned embeddings to identify objects that a closed-set approach would otherwise be unable to identify.
Submitted 27 April, 2024;
originally announced April 2024.
-
Using LLMs in Software Requirements Specifications: An Empirical Evaluation
Authors:
Madhava Krishna,
Bhagesh Gaur,
Arsh Verma,
Pankaj Jalote
Abstract:
The creation of a Software Requirements Specification (SRS) document is important for any software development project. Given the recent prowess of Large Language Models (LLMs) in answering natural language queries and generating sophisticated textual outputs, our study explores their capability to produce accurate, coherent, and structured drafts of these documents to accelerate the software development lifecycle. We assess the performance of GPT-4 and CodeLlama in drafting an SRS for a university club management system and compare it against human benchmarks using eight distinct criteria. Our results suggest that LLMs can match the output quality of an entry-level software engineer to generate an SRS, delivering complete and consistent drafts. We also evaluate the capabilities of LLMs to identify and rectify problems in a given requirements document. Our experiments indicate that GPT-4 is capable of identifying issues and giving constructive feedback for rectifying them, while CodeLlama's results for validation were not as encouraging. We repeated the generation exercise for four distinct use cases to study the time saved by employing LLMs for SRS generation. The experiment demonstrates that LLMs may facilitate a significant reduction in development time for entry-level software engineers. Hence, we conclude that the LLMs can be gainfully used by software engineers to increase productivity by saving time and effort in generating, validating and rectifying software requirements.
Submitted 27 April, 2024;
originally announced April 2024.
-
Utilizing Large Language Models to Identify Reddit Users Considering Vaping Cessation for Digital Interventions
Authors:
Sai Krishna Revanth Vuruma,
Dezhi Wu,
Saborny Sen Gupta,
Lucas Aust,
Valerie Lookingbill,
Caleb Henry,
Yang Ren,
Erin Kasson,
Li-Shiun Chen,
Patricia Cavazos-Rehg,
Dian Hu,
Ming Huang
Abstract:
The widespread adoption of social media platforms globally not only enhances users' connectivity and communication but also provides a vital channel for the dissemination of health-related information, thereby establishing social media data as an invaluable organic data resource for public health research. The surge in popularity of vaping or e-cigarette use in the United States and other countries has caused an outbreak of e-cigarette and vaping use-associated lung injury (EVALI), leading to hospitalizations and fatalities in 2019 and highlighting the urgency of understanding vaping behaviors and developing effective strategies for cessation. In this study, we extracted a sample dataset from one vaping sub-community on Reddit to analyze users' intentions to quit vaping. Leveraging large language models, including both the latest GPT-4 and traditional BERT-based language models, for sentence-level quit-vaping intention prediction tasks, this study compares the outcomes of these models against human annotations. Notably, when compared to human evaluators, the GPT-4 model demonstrates superior consistency in adhering to annotation guidelines and processes, showcasing advanced capabilities to detect nuanced quit-vaping intentions that human evaluators might overlook. These preliminary findings emphasize the potential of GPT-4 in enhancing the accuracy and reliability of social media data analysis, especially in identifying subtle user intentions that may elude human detection.
Submitted 25 April, 2024;
originally announced April 2024.
-
Efficient and Near-Optimal Noise Generation for Streaming Differential Privacy
Authors:
Krishnamurthy Dvijotham,
H. Brendan McMahan,
Krishna Pillutla,
Thomas Steinke,
Abhradeep Thakurta
Abstract:
In the task of differentially private (DP) continual counting, we receive a stream of increments and our goal is to output an approximate running total of these increments, without revealing too much about any specific increment. Despite its simplicity, differentially private continual counting has attracted significant attention both in theory and in practice. Existing algorithms for differentially private continual counting are either inefficient in terms of their space usage or add an excessive amount of noise, inducing suboptimal utility.
The most practical DP continual counting algorithms add carefully correlated Gaussian noise to the values. The task of choosing the covariance for this noise can be expressed in terms of factoring the lower-triangular matrix of ones (which computes prefix sums). We present two approaches from this class (for different parameter regimes) that achieve near-optimal utility for DP continual counting and only require logarithmic or polylogarithmic space (and time).
Our first approach is based on a space-efficient streaming matrix multiplication algorithm for a class of Toeplitz matrices. We show that to instantiate this algorithm for DP continual counting, it is sufficient to find a low-degree rational function that approximates the square root on a circle in the complex plane. We then apply and extend tools from approximation theory to achieve this. We also derive efficient closed forms for the objective function for arbitrarily many steps, and show that direct numerical optimization yields a highly practical solution to the problem. Our second approach combines our first approach with a recursive construction similar to the binary tree mechanism.
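The prefix-sum factorization idea can be illustrated with a minimal Python sketch. This is not the paper's algorithm: it uses the dense square-root factorization $A = C C$ of the lower-triangular all-ones matrix $A$, where $C$ is lower-triangular Toeplitz with entries $c_k = \binom{2k}{k}/4^k$ (the Taylor coefficients of $(1-x)^{-1/2}$), and releases $A x + C z$ with Gaussian $z$. It needs $O(n^2)$ time and space, far more than the logarithmic or polylogarithmic space the abstract reports. The function names, the assumption that changing one increment moves it by at most 1, and the `noise_multiplier` parameter are illustrative assumptions; calibrating the multiplier to a target $(\varepsilon, \delta)$ is not shown.

```python
import math
import random


def sqrt_coefficients(n):
    """First n Taylor coefficients of (1 - x)^(-1/2): c_k = binom(2k, k) / 4^k."""
    coeffs = [1.0]
    for k in range(1, n):
        # Ratio c_k / c_{k-1} = (2k - 1) / (2k).
        coeffs.append(coeffs[-1] * (2 * k - 1) / (2 * k))
    return coeffs


def lower_toeplitz(coeffs):
    """Lower-triangular Toeplitz matrix with entry [i][j] = coeffs[i - j] for i >= j."""
    n = len(coeffs)
    return [[coeffs[i - j] if i >= j else 0.0 for j in range(n)] for i in range(n)]


def mat_vec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def column_sensitivity(matrix):
    """Largest column L2 norm, i.e. the L2 sensitivity of x -> C x when one
    increment changes by at most 1 (an assumption of this sketch)."""
    n = len(matrix)
    return max(math.sqrt(sum(matrix[i][j] ** 2 for i in range(n))) for j in range(n))


def noisy_prefix_sums(increments, noise_multiplier, rng):
    """Release A x + C z = C (C x + z), with z ~ N(0, scale^2 I)."""
    C = lower_toeplitz(sqrt_coefficients(len(increments)))
    scale = noise_multiplier * column_sensitivity(C)
    encoded = mat_vec(C, increments)                      # C x
    noisy = [e + rng.gauss(0.0, scale) for e in encoded]  # C x + z
    return mat_vec(C, noisy)                              # C (C x + z) = A x + C z


if __name__ == "__main__":
    rng = random.Random(0)
    print(noisy_prefix_sums([1, 0, 1, 1, 0, 1], noise_multiplier=1.0, rng=rng))
```

Choosing the encoder and decoder to be the same matrix splits the work evenly between sensitivity (column norms of $C$) and error (row norms of $C$); the paper's contribution, per the abstract, is achieving near-optimal utility for this kind of correlated noise without materializing $C$.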
Submitted 6 May, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
Mitigating Automotive Radar Interference using Onboard Intelligent Reflective Surface
Authors:
Shree Prasad Maruthi,
Karrthik G. K.,
Vijaya Krishna A.,
Mahbub Hassan,
**hong Yuan
Abstract:
The use of automotive radars is gaining popularity as a means to enhance a vehicle's sensing capabilities. However, these radars can suffer from interference caused by transmissions from other radars mounted on nearby vehicles. To address this issue, we investigate the use of an onboard intelligent reflective surface (IRS) to artificially increase a vehicle's effective radar cross section (RCS), or its "electromagnetic visibility." Our proposed method utilizes the IRS's ability to form a coherent reflection of the incident radar waveform back towards the source radar, thereby improving radar performance under interference. We evaluated both passive and active IRS options. Passive IRS, which does not support reflection amplification, was found to be counter-productive and actually decreased the vehicle's effective RCS instead of enhancing it. In contrast, active IRS, which can amplify the reflection power of individual elements, effectively combats all types of automotive radar interference when the reflective elements are configured with a 15-35 dB reflection gain.
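To illustrate why the reflection must be coherent and why per-element gain matters, the sketch below compares the power an N-element surface returns when all elements are co-phased versus randomly phased, with each element applying a power gain given in dB. It is a hypothetical toy (unit incident amplitude per element, no path loss, no radar equation), not the authors' system model; `n_elements`, `gain_db`, and `seed` are illustrative parameters.

```python
import cmath
import math
import random


def reflected_power(n_elements, gain_db, coherent=True, seed=0):
    """Toy model: each element re-radiates a unit-amplitude wave scaled by an
    amplitude gain sqrt(g), where g = 10^(gain_db / 10) is the power gain.
    Returns |sum of element phasors|^2 seen back at the source radar."""
    amplitude_gain = math.sqrt(10 ** (gain_db / 10))
    rng = random.Random(seed)
    total = 0j
    for _ in range(n_elements):
        # Co-phased elements add in phase; otherwise each phase is uniform on [0, 2*pi).
        phase = 0.0 if coherent else rng.uniform(0.0, 2 * math.pi)
        total += amplitude_gain * cmath.exp(1j * phase)
    return abs(total) ** 2


for gain_db in (0, 15, 35):
    print(gain_db, reflected_power(64, gain_db), reflected_power(64, gain_db, coherent=False))
```

The co-phased power scales as N²·g, whereas randomly phased elements give only about N·g on average, so a 15 dB gain multiplies the returned power by roughly 31.6 and a 35 dB gain by roughly 3162.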
Submitted 24 April, 2024;
originally announced April 2024.
-
$Δ$ADAPT-VQE: Toward Accurate Calculation of Excitation Energies on Quantum Computers for BODIPY Molecules With Application in Photodynamic Therapy
Authors:
Anton Nykänen,
Leander Thiessen,
Elsi-Mari Borrelli,
Vijay Krishna,
Stefan Knecht,
Fabijan Pavošević
Abstract:
Quantum chemistry simulations offer a cost-effective route to the computational design of BODIPY photosensitizers with potential use in photodynamic therapy (PDT). However, accurate predictions of photophysical properties, such as excitation energies, pose a challenge for the popular time-dependent density functional theory (TDDFT) and equation-of-motion coupled cluster with singles and doubles (EOM-CCSD) methods. By contrast, reliable descriptions can be achieved by multi-reference quantum chemistry methods, though unfortunately, their computational cost grows exponentially with the number of correlated electrons. Alternatively, quantum computing holds great potential for exact simulation of photophysical properties in a computationally more efficient way. To this end, we introduce the state-specific $Δ$UCCSD-VQE (unitary coupled cluster with singles and doubles variational quantum eigensolver) and $Δ$ADAPT-VQE methods, in which the electronically excited state is calculated via a non-Aufbau electronic configuration. The accuracy and capability of the developed methods are assessed against experimentally determined excitation energies for six BODIPY derivatives. We show that the proposed methods predict accurate vertical excitation energies that not only agree well with experimental reference data but also outperform popular quantum chemistry methods, such as TDDFT and EOM-CCSD. Given its impressive performance and simplicity, we are confident that $Δ$ADAPT will emerge as the method of choice for guiding the rational design of photosensitizers for PDT and photocatalysis in the era of near-term quantum computing.
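In a Δ-type approach the vertical excitation energy is the difference between two separately optimized total energies: one for the ground state and one for the state-specific (non-Aufbau) excited configuration. The sketch below performs only that subtraction and the unit conversions. The total energies are hypothetical placeholders, not results from the paper, and computing them with ΔUCCSD-VQE or ΔADAPT-VQE is outside its scope.

```python
HARTREE_TO_EV = 27.211386245988  # CODATA 2018 Hartree energy in eV
EV_NM = 1239.841984  # h*c in eV*nm, so wavelength_nm = EV_NM / energy_ev


def vertical_excitation(e_ground_hartree, e_excited_hartree):
    """Return (excitation energy in eV, corresponding photon wavelength in nm)
    from two separately optimized total energies in Hartree."""
    delta_ev = (e_excited_hartree - e_ground_hartree) * HARTREE_TO_EV
    if delta_ev <= 0:
        raise ValueError("excited-state energy must lie above the ground state")
    return delta_ev, EV_NM / delta_ev


# Hypothetical total energies (Hartree), for illustration only.
energy_ev, wavelength_nm = vertical_excitation(-100.0000, -99.9000)
```

A 0.1 Hartree gap corresponds to about 2.72 eV, i.e. a wavelength near 456 nm, which is the kind of number compared against experimental absorption data.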
Submitted 24 April, 2024;
originally announced April 2024.
-
Explainable AI models for predicting liquefaction-induced lateral spreading
Authors:
Cheng-Hsi Hsiao,
Krishna Kumar,
Ellen Rathje
Abstract:
Earthquake-induced liquefaction can cause substantial lateral spreading, posing threats to infrastructure. Machine learning (ML) can improve lateral spreading prediction models by capturing complex soil characteristics and site conditions. However, the "black box" nature of ML models can hinder their adoption in critical decision-making. This study addresses this limitation by using SHapley Additive exPlanations (SHAP) to interpret an eXtreme Gradient Boosting (XGB) model for lateral spreading prediction, trained on data from the 2011 Christchurch Earthquake. SHAP analysis reveals the factors driving the model's predictions, enhancing transparency and allowing for comparison with established engineering knowledge. The results demonstrate that the XGB model successfully identifies the importance of soil characteristics derived from Cone Penetration Test (CPT) data in predicting lateral spreading, validating its alignment with domain understanding. This work highlights the value of explainable machine learning for reliable and informed decision-making in geotechnical engineering and hazard assessment.
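SHAP attributes a single prediction to the input features using Shapley values. For a handful of features these can be computed exactly by enumerating coalitions, replacing features outside a coalition with a baseline value. The sketch below does this for a hypothetical linear model with made-up feature names; it is not the authors' XGB model, and it is not the `shap` library, which uses faster tree-specific algorithms for gradient-boosted trees.

```python
import math
from itertools import combinations


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).
    A coalition S keeps x's values for features in S and baseline values elsewhere."""
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without i.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical model over (cpt_resistance, groundwater_depth, distance_to_river).
def toy_model(features):
    return 0.5 * features[0] - 0.2 * features[1] - 0.1 * features[2]


x = [4.0, 2.0, 10.0]
baseline = [3.0, 3.0, 5.0]
phi = shapley_values(toy_model, x, baseline)
```

For a linear model each value reduces to w_i(x_i − b_i), and the values always sum to model(x) − model(baseline) (the efficiency property), which makes a convenient check.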
Submitted 24 April, 2024;
originally announced April 2024.
-
Modeling liquefaction-induced runout of a tailings dam using a hybrid finite element and material point method approach
Authors:
Brent Sordo,
Ellen Rathje,
Krishna Kumar
Abstract:
Tailings dams impound large amounts of saturated soil that can be highly susceptible to liquefaction. Liquefaction results in a severe loss of strength in the retained soil and potentially in failure of the dam. If the dam is breached, a massive debris flow of liquefied soil is released, with potentially disastrous consequences downstream. Numerical models are frequently used to predict the liquefaction response of tailings dams and the potential runout, and these analyses inform engineering decisions regarding hazard avoidance and mitigation. The Finite Element Method (FEM) is a widely used tool that excels at modeling liquefaction triggering and initial movements, but it quickly loses accuracy when modeling large deformations due to mesh distortion. Conversely, the Material Point Method (MPM), a hybrid Eulerian-Lagrangian method, employs particles that move freely across a background grid and can account for large deformations without losing accuracy. However, issues with the accuracy of MPM's stress distributions and limits on the available boundary conditions impair its ability to predict liquefaction initiation. In this paper, we use a sequential hybridization of FEM and MPM as a superior alternative to either method individually. To demonstrate the efficacy of this hybrid method in simulating the entire process of tailings dam failure from initiation to runout, we model the 1978 Mochikoshi Tailings Dam failure.
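In MPM, material state is carried by particles and transferred to the background grid each step, where the equations of motion are solved before results are mapped back to the particles. The sketch below shows only the particle-to-grid transfer of mass and momentum in 1D with linear (tent) shape functions. It is a minimal illustration with hypothetical inputs, not the hybrid FEM-MPM scheme described in the paper.

```python
def particle_to_grid(positions, masses, velocities, dx, n_nodes):
    """Scatter particle mass and momentum to a uniform 1D grid (nodes at i * dx)
    using linear shape functions. Returns (node_mass, node_velocity)."""
    node_mass = [0.0] * n_nodes
    node_momentum = [0.0] * n_nodes
    for xp, mp, vp in zip(positions, masses, velocities):
        left = int(xp // dx)   # index of the grid node to the left of the particle
        frac = xp / dx - left  # local coordinate of the particle in [0, 1)
        if left < 0 or left + 1 >= n_nodes:
            raise ValueError("particle outside the grid")
        # The two linear weights sum to one, so mass and momentum are conserved.
        for node, weight in ((left, 1.0 - frac), (left + 1, frac)):
            node_mass[node] += weight * mp
            node_momentum[node] += weight * mp * vp
    # Nodal velocity = momentum / mass on nodes that received any mass.
    node_velocity = [p / m if m > 0 else 0.0 for p, m in zip(node_momentum, node_mass)]
    return node_mass, node_velocity


node_mass, node_velocity = particle_to_grid(
    [0.25, 1.5], [2.0, 1.0], [1.0, -1.0], dx=1.0, n_nodes=3
)
```

Because the grid is only a scratchpad that can be reset each step, particles can travel arbitrarily far without the mesh distortion that limits FEM at large deformations.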
Submitted 24 April, 2024;
originally announced April 2024.
-
BASS: Batched Attention-optimized Speculative Sampling
Authors:
Haifeng Qian,
Sujan Kumar Gonugondla,
Sungsoo Ha,
Mingyue Shang,
Sanjay Krishna Gouda,
Ramesh Nallapati,
Sudipta Sengupta,
Xiaofei Ma,
Anoop Deoras
Abstract:
Speculative decoding has emerged as a powerful method to improve latency and throughput in hosting large language models. However, most existing implementations focus on generating a single sequence. Real-world generative AI applications often require multiple responses, and performing speculative decoding in a batched setting while preserving its latency benefits poses non-trivial challenges. This paper describes a system of batched speculative decoding that sets a new state of the art in multi-sequence generation latency and that demonstrates superior GPU utilization as well as quality of generations within a time budget. For example, for a 7.8B-size model on a single A100 GPU and with a batch size of 8, each sequence is generated at an average speed of 5.8ms per token, the overall throughput being 1.1K tokens per second. These results represent state-of-the-art latency and a 2.15X speed-up over optimized regular decoding. Within a time budget that regular decoding does not finish, our system is able to generate sequences with HumanEval Pass@First of 43% and Pass@All of 61%, far exceeding what's feasible with single-sequence speculative decoding. Our peak GPU utilization during decoding reaches as high as 15.8%, more than 3X the peak of regular decoding and around 10X that of single-sequence speculative decoding.
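As background, the sketch below shows greedy speculative decoding over a small batch: a cheap draft model proposes k tokens per sequence, the target model checks them, and each sequence keeps its longest agreeing prefix plus one token from the target. Because sequences accept different numbers of tokens per step, the batch quickly becomes ragged, which hints at why batching is non-trivial. The two "models" are hypothetical deterministic functions, and this is not BASS's attention-optimized implementation; a real system scores all k drafts in one batched forward pass rather than a Python loop.

```python
def speculative_step(prefix, draft_next, target_next, k):
    """One greedy speculative step for a single sequence.
    draft_next / target_next map a token list to their greedy next token."""
    # 1) Draft k tokens autoregressively with the cheap model.
    drafted = []
    context = list(prefix)
    for _ in range(k):
        token = draft_next(context)
        drafted.append(token)
        context.append(token)
    # 2) Verify: accept drafted tokens while they match the target's greedy choice.
    accepted = []
    context = list(prefix)
    for token in drafted:
        expected = target_next(context)
        if token != expected:
            accepted.append(expected)  # the target's correction is still a valid token
            return accepted
        accepted.append(token)
        context.append(token)
    # All k drafts accepted: the target contributes one bonus token.
    accepted.append(target_next(context))
    return accepted


def batched_generate(prompts, draft_next, target_next, k, max_new_tokens):
    """Run speculative steps over a batch; sequences advance by different amounts."""
    outputs = [list(p) for p in prompts]
    limits = [len(p) + max_new_tokens for p in prompts]
    while any(len(o) < limit for o, limit in zip(outputs, limits)):
        for seq, limit in zip(outputs, limits):
            if len(seq) < limit:
                seq.extend(speculative_step(seq, draft_next, target_next, k))
                del seq[limit:]  # trim any overshoot past the token budget
    return outputs


# Hypothetical toy models: the target counts modulo 10; the draft errs after a 4.
def target_next(ctx):
    return (ctx[-1] + 1) % 10


def draft_next(ctx):
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


outputs = batched_generate([[1], [7, 3]], draft_next, target_next, k=3, max_new_tokens=8)
```

Every emitted token equals the target's greedy choice, so the output matches plain greedy decoding with the target model; speculation only changes how many target calls are needed.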
Submitted 26 June, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision
Authors:
Ankit Vani,
Bac Nguyen,
Samuel Lavoie,
Ranjay Krishna,
Aaron Courville
Abstract:
Selective attention helps us focus on task-relevant aspects in the constant flood of our sensory input. This constraint in our perception allows us to robustly generalize under distractions and to new compositions of perceivable concepts. Transformers employ a similar notion of attention in their architecture, but representation learning models with transformer backbones like CLIP and DINO often fail to demonstrate robustness and compositionality. We highlight a missing architectural prior: unlike human perception, transformer encodings do not separately attend over individual concepts. In response, we propose SPARO, a read-out mechanism that partitions encodings into separately-attended slots, each produced by a single attention head. Using SPARO with CLIP imparts an inductive bias that the vision and text modalities are different views of a shared compositional world with the same corresponding concepts. Using SPARO, we demonstrate improvements on downstream recognition, robustness, retrieval, and compositionality benchmarks with CLIP (up to +14% for ImageNet, +4% for SugarCrepe), and on nearest neighbors and linear probe for ImageNet with DINO (+3% each). We also showcase a powerful ability to intervene and select individual SPARO concepts to further improve downstream task performance (up from +4% to +9% for SugarCrepe) and use this ability to study the robustness of SPARO's representation structure. Finally, we provide insights through ablation experiments and visualization of learned concepts.
Submitted 24 April, 2024;
originally announced April 2024.
-
Minimum Consistent Subset in Trees and Interval Graphs
Authors:
Aritra Banik,
Sayani Das,
Anil Maheshwari,
Bubai Manna,
Subhas C Nandy,
Krishna Priya K M,
Bodhayan Roy,
Sasanka Roy,
Abhishek Sahu
Abstract:
In the Minimum Consistent Subset (MCS) problem, we are presented with a connected simple undirected graph $G=(V,E)$, consisting of a vertex set $V$ of size $n$ and an edge set $E$. Each vertex in $V$ is assigned a color from the set $\{1,2,\ldots, c\}$. The objective is to determine a subset $V' \subseteq V$ with minimum possible cardinality, such that for every vertex $v \in V$, at least one of its nearest neighbors in $V'$ (measured in terms of the hop distance) shares the same color as $v$. The decision problem, which asks whether there exists a subset $V'$ of cardinality at most $l$ for some positive integer $l$, is known to be NP-complete even for planar graphs.
In this paper, we establish that the MCS problem for trees, when the number of colors $c$ is considered an input parameter, is NP-complete. We propose a fixed-parameter tractable (FPT) algorithm for MCS on trees running in $O(2^{6c}n^6)$ time, significantly improving the currently best-known algorithm whose running time is $O(2^{4c}n^{2c+3})$.
In an effort to comprehensively understand the computational complexity of the MCS problem across different graph classes, we extend our investigation to interval graphs. We show that it remains NP-complete for interval graphs, adding them to the graph classes on which MCS is intractable.
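To make the definition concrete, the sketch below checks whether a candidate subset V′ is consistent (every vertex has, among its nearest vertices in V′ by hop distance, at least one vertex of its own color) and finds a minimum one by brute force. The exponential search only suits tiny examples; it is not the paper's FPT algorithm, and the example graph is hypothetical.

```python
from collections import deque
from itertools import combinations


def hop_distances(adj, source):
    """BFS hop distances from source in an unweighted graph given as adjacency lists."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_consistent(adj, color, subset):
    """True if every vertex v has a same-colored vertex among its nearest vertices in subset."""
    if not subset:
        return False
    for v in adj:
        dist = hop_distances(adj, v)  # graph is assumed connected
        nearest = min(dist[u] for u in subset)
        if not any(dist[u] == nearest and color[u] == color[v] for u in subset):
            return False
    return True


def minimum_consistent_subset(adj, color):
    """Brute-force search over subsets in increasing size (exponential; tiny graphs only)."""
    vertices = list(adj)
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            if is_consistent(adj, color, subset):
                return set(subset)
    return set(vertices)


# Hypothetical path graph 0-1-2-3 with colors 1, 1, 2, 2.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
color = {0: 1, 1: 1, 2: 2, 3: 2}
best = minimum_consistent_subset(adj, color)
```

Any consistent subset must contain at least one vertex of each color, so two vertices is optimal here; {0, 1} fails because vertex 2 then has no color-2 vertex in the subset.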
Submitted 23 April, 2024;
originally announced April 2024.
-
Evaluating Fast Adaptability of Neural Networks for Brain-Computer Interface
Authors:
Anupam Sharma,
Krishna Miyapuram
Abstract:
Electroencephalography (EEG) classification is a versatile and portable technique for building non-invasive Brain-computer Interfaces (BCI). However, classifiers that decode cognitive states from EEG data perform poorly when tested on new domains, such as tasks or individuals absent during model training. Researchers have recently used complex strategies like Model-agnostic meta-learning (MAML) for domain adaptation. Nevertheless, a strategy is needed to evaluate the fast adaptability of these models, since quick calibration is essential for real-life BCI applications. We used motor movement and motor imagery signals as input to a Convolutional Neural Network (CNN)-based classifier. Datasets of EEG signals typically have fewer examples and higher time resolution. Even though batch normalization is preferred for CNNs, we empirically show that layer normalization can improve the adaptability of CNN-based EEG classifiers with no more than ten fine-tuning steps. In summary, the present work (i) proposes a simple strategy to evaluate fast adaptability, and (ii) empirically demonstrates fast adaptability across individuals as well as across tasks with simple transfer learning, compared to the MAML approach.
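The two normalizations differ in which axis the statistics are taken over: batch normalization (in training mode) standardizes each feature across the examples in a batch, whereas layer normalization standardizes each example across its own features, so its output does not depend on what else is in the batch. The sketch below shows this on a hypothetical 2-D batch; it omits the learnable scale and shift and the channel/time axes of a real EEG CNN layer.

```python
import math


def _standardize(values, eps):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def batch_norm(batch, eps=1e-5):
    """Normalize each feature (column) using statistics computed across the batch."""
    columns = [_standardize(list(col), eps) for col in zip(*batch)]
    return [list(row) for row in zip(*columns)]


def layer_norm(batch, eps=1e-5):
    """Normalize each example (row) using statistics computed across its own features."""
    return [_standardize(row, eps) for row in batch]


batch = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
```

With a batch of one example, batch normalization maps every feature to zero, while layer normalization gives the same output for that example whatever the batch size, which is relevant when fine-tuning on only a few examples.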
Submitted 14 April, 2024;
originally announced April 2024.
-
UCINet0: A Machine Learning based Receiver for 5G NR PUCCH Format 0
Authors:
Anil Kumar Yerrapragada,
Jeeva Keshav Sattianarayanin,
Radha Krishna Ganti
Abstract:
Accurate decoding of Uplink Control Information (UCI) on the Physical Uplink Control Channel (PUCCH) is essential for enabling 5G wireless links. This paper explores an AI/ML-based receiver design for PUCCH Format 0. Format 0 signaling encodes the UCI content within the phase of a known base waveform and even supports multiplexing of up to 12 users within the same time-frequency resources. Our first-of-a-kind neural network classifier, which we term UCINet0, is capable of predicting when no user is transmitting on the PUCCH, as well as decoding the UCI content of any number of multiplexed users, up to 12. Inference results with both simulated and hardware-captured field datasets show that the UCINet0 model outperforms conventional DFT-based decoders across all SNR ranges.
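One conventional DFT-based detector exploits the fact that, in Format 0, a cyclic shift applies a linear phase ramp exp(j2πmn/12) across the 12 subcarriers of a known base sequence. Removing the base sequence and taking a 12-point DFT concentrates the energy at the transmitted shift m. The sketch below assumes a noiseless single-user channel, a hypothetical unit-modulus base sequence (a stand-in for the 3GPP low-PAPR sequences), and an illustrative energy threshold for declaring that no user is transmitting. It is a baseline illustration, not UCINet0.

```python
import cmath
import math
import random

N_SC = 12  # subcarriers in one resource block


def make_base_sequence(seed=0):
    """Hypothetical unit-modulus base sequence (stand-in for the 3GPP table)."""
    rng = random.Random(seed)
    return [cmath.exp(1j * rng.uniform(0, 2 * math.pi)) for _ in range(N_SC)]


def transmit(base, shift):
    """Apply cyclic shift `shift` as a linear phase ramp over the subcarriers."""
    return [base[n] * cmath.exp(2j * math.pi * shift * n / N_SC) for n in range(N_SC)]


def detect_shift(received, base, threshold=0.5):
    """Return the detected cyclic shift, or None if the peak energy is too low."""
    derotated = [r * b.conjugate() for r, b in zip(received, base)]
    energies = []
    for m in range(N_SC):
        # DFT bin m correlates against the phase ramp exp(+j 2 pi m n / N_SC).
        corr = sum(d * cmath.exp(-2j * math.pi * m * n / N_SC) for n, d in enumerate(derotated))
        energies.append(abs(corr) ** 2 / N_SC**2)
    best = max(range(N_SC), key=energies.__getitem__)
    return best if energies[best] >= threshold else None


base = make_base_sequence()
```

With noise, fading, and several multiplexed users the shifts interfere and the threshold must be tuned, which is the regime where a learned receiver may help.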
Submitted 10 March, 2024;
originally announced April 2024.
-
Evolution of Magnetism in Magnetic Topological Semimetal NdSb$_x$Te$_{2-x+δ}$
Authors:
Santosh Karki Chhetri,
Rabindra Basnet,
Jian Wang,
Krishna Pandey,
Gokul Acharya,
Md Rafique Un Nabi,
Dinesh Upreti,
Josh Sakon,
Mansour Mortazavi,
** Hu
Abstract:
Magnetic topological semimetals LnSbTe (Ln = Lanthanide) have attracted intensive attention because of the interplay between magnetism, topology, and electron correlations, which depends on the choice of magnetic Ln element. Recently, varying the Sb-Te composition has been found to effectively control the electronic and magnetic states in LnSb$_x$Te$_{2-x}$. With this motivation, we report the evolution of magnetic properties with Sb-Te substitution in NdSb$_x$Te$_{2-x+δ}$. Our work reveals an interesting non-monotonic change in the magnetic ordering temperature with varying stoichiometry. In addition, reducing the Sb content x drives a reorientation of the moments from the in-plane (ab-plane) to the out-of-plane (c-axis) direction, resulting in distinct magnetic structures for the two end compounds NdTe$_2$ ($x = 0$) and NdSbTe ($x = 1$). Furthermore, the moment orientation in NdSb$_x$Te$_{2-x+δ}$ is also found to be strongly tunable by a weak applied magnetic field, leading to rich magnetic phases that depend on stoichiometry, temperature, and magnetic field. Such strong tuning of magnetism establishes this material as a promising platform for investigating tunable topological states and correlated topological physics.
Submitted 23 April, 2024;
originally announced April 2024.
-
Cell Balancing for the Transportation Sector: Techniques, Challenges, and Future Research Directions
Authors:
Anupama R Itagi,
Rakhee Kallimani,
Krishna Pai,
Sridhar Iyer,
Onel L. A. Lopez
Abstract:
Efficient and reliable energy systems are key to the progress of society. High-performance batteries are essential for widely used technologies like Electric Vehicles (EVs) and portable electronics. Additionally, an effective Battery Management System (BMS) is crucial to oversee the vital parameters of a battery. However, cell imbalance can arise in a BMS-managed battery due to charging/discharging dynamics, which reduces battery capacity, lifespan, and efficiency, and raises critical safety concerns. This calls for effective cell-balancing techniques. Notably, the existing literature on cell balancing is limited, urgently necessitating a thorough survey to pinpoint key research gaps and suggest prompt solutions. In this article, cell balancing and the corresponding techniques are reviewed. First, we compare passive cell balancing techniques in detail and assess their respective advantages, drawbacks, and practical applications. Then, we discuss the strengths and weaknesses of active cell balancing methods and the applicability of cell balancing to both series- and parallel-connected cells. Additionally, we examine the need for cell balancing in commonly used batteries and its applications in EVs. Lastly, we present detailed prospects, including challenges and directions for future research.
Submitted 22 April, 2024;
originally announced April 2024.
-
Towards Reliable Latent Knowledge Estimation in LLMs: In-Context Learning vs. Prompting Based Factual Knowledge Extraction
Authors:
Qinyuan Wu,
Mohammad Aflah Khan,
Soumi Das,
Vedant Nanda,
Bishwamittra Ghosh,
Camila Kolling,
Till Speicher,
Laurent Bindschaedler,
Krishna P. Gummadi,
Evimaria Terzi
Abstract:
We propose an approach for estimating the latent knowledge embedded inside large language models (LLMs). We leverage the in-context learning (ICL) abilities of LLMs to estimate the extent to which an LLM knows the facts stored in a knowledge base. Our knowledge estimator avoids reliability concerns with previous prompting-based methods, is both conceptually simpler and easier to apply, and we demonstrate that it can surface more of the latent knowledge embedded in LLMs. We also investigate how different design choices affect the performance of ICL-based knowledge estimation. Using the proposed estimator, we perform a large-scale evaluation of the factual knowledge of a variety of open-source LLMs, like OPT, Pythia, Llama(2), Mistral, Gemma, etc., over a large set of relations and facts from the Wikidata knowledge base. We observe differences in factual knowledge between model families and between models of different sizes; we find that some relations are consistently better known than others, although models differ in the precise facts they know; and we observe differences in knowledge between base models and their fine-tuned counterparts.
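The in-context idea can be illustrated by building a prompt from a few known (subject, object) pairs for one relation and letting the model complete a new subject, without any hand-written natural-language template. The sketch below only constructs such a prompt and scores a completion by exact match; the line format, the example facts, and the `generate` callable are hypothetical assumptions, not the paper's exact estimator.

```python
def build_icl_prompt(examples, query_subject):
    """Known (subject, object) pairs for one relation, one per line, then the query subject."""
    lines = [f"{subject} {obj}" for subject, obj in examples]
    lines.append(query_subject)
    return "\n".join(lines)


def knows_fact(generate, examples, query_subject, expected_object):
    """Exact-match check: does the model continue the prompt with the expected object?
    `generate` is any hypothetical callable mapping a prompt string to a completion."""
    completion = generate(build_icl_prompt(examples, query_subject))
    return completion.strip().split("\n")[0] == expected_object


# Hypothetical relation "has capital" with example facts.
examples = [("France", "Paris"), ("Japan", "Tokyo"), ("Kenya", "Nairobi")]
prompt = build_icl_prompt(examples, "Canada")
```

Because the relation is conveyed only through the examples, the estimate does not hinge on how a particular prompt template is phrased.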
Submitted 19 April, 2024;
originally announced April 2024.
-
BLINK: Multimodal Large Language Models Can See but Not Perceive
Authors:
Xingyu Fu,
Yushi Hu,
Bangzheng Li,
Yu Feng,
Haoyu Wang,
Xudong Lin,
Dan Roth,
Noah A. Smith,
Wei-Chiu Ma,
Ranjay Krishna
Abstract:
We introduce Blink, a new benchmark for multimodal large language models (LLMs) that focuses on core visual perception abilities not found in other evaluations. Most of the Blink tasks can be solved by humans "within a blink" (e.g., relative depth estimation, visual correspondence, forensics detection, and multi-view reasoning). However, we find these perception-demanding tasks pose significant challenges for current multimodal LLMs because they resist mediation through natural language. Blink reformats 14 classic computer vision tasks into 3,807 multiple-choice questions, paired with single or multiple images and visual prompting. While humans get 95.70% accuracy on average, Blink is surprisingly challenging for existing multimodal LLMs: even the best-performing GPT-4V and Gemini achieve accuracies of 51.26% and 45.72%, only 13.17% and 7.63% higher than random guessing, indicating that such perception abilities have not "emerged" yet in recent multimodal LLMs. Our analysis also highlights that specialist CV models could solve these problems much better, suggesting potential pathways for future improvements. We believe Blink will stimulate the community to help multimodal LLMs catch up with human-level visual perception.
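Both reported margins over random guessing imply the same chance-level accuracy, which serves as a quick consistency check on the numbers above:

```latex
51.26\% - 13.17\% \;=\; 45.72\% - 7.63\% \;=\; 38.09\%
```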
Submitted 3 July, 2024; v1 submitted 18 April, 2024;
originally announced April 2024.
-
Characterization of Capacity and Outage of RIS-aided Downlink Systems under Rician Fading
Authors:
Kali Krishna Kota,
Praful D. Mankar,
Harpreet S. Dhillon
Abstract:
This letter presents optimal beamforming and outage analysis for a Reconfigurable Intelligent Surface (RIS)-aided multiple-input single-output downlink system under Rician fading on both the direct and the RIS-assisted indirect links. We focus on maximizing the capacity for two transmitter architectures: fully digital (FD) and fully analog (FA). This capacity maximization problem with an optimally configured RIS is shown to reduce to $L_1$-norm maximization with respect to the transmit beamformer. To obtain the optimal FD beamformer, we propose a complex $L_1$-PCA-based algorithm whose complexity is significantly lower than that of the existing semi-definite relaxation-based solutions. We also propose a low-complexity optimal beamforming algorithm to obtain the FA beamformer solution. Further, we derive analytical upper bounds on the SNR achievable by the proposed algorithms and use them to characterize lower bounds on the outage probabilities. The derived bounds are numerically shown to closely match the achievable performance for a low-rank channel matrix and are shown to be exact for a unit-rank channel matrix.
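The reduction to an L1 norm rests on phase alignment: if the received signal is a direct term a₀ plus reflected terms aᵢ·e^{jθᵢ}, choosing θᵢ = arg a₀ − arg aᵢ makes every term add in phase, so the amplitude becomes |a₀| + Σ|aᵢ|, the L1 norm of (a₀, a₁, …, a_N). Since each aᵢ is linear in the transmit beamformer, maximizing capacity over the beamformer then becomes an L1-norm maximization. The sketch below checks only the phase-alignment step numerically with hypothetical random coefficients; it does not implement the paper's L1-PCA algorithm.

```python
import cmath
import math
import random


def aligned_phases(a0, reflected):
    """RIS phase per element that rotates its term onto the direct path's phase."""
    return [cmath.phase(a0) - cmath.phase(a) for a in reflected]


def received_amplitude(a0, reflected, phases):
    """|a0 + sum_i a_i exp(j theta_i)| for a given set of RIS phases."""
    return abs(a0 + sum(a * cmath.exp(1j * t) for a, t in zip(reflected, phases)))


rng = random.Random(1)
a0 = complex(rng.gauss(0, 1), rng.gauss(0, 1))
reflected = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(8)]

best = received_amplitude(a0, reflected, aligned_phases(a0, reflected))
l1_norm = abs(a0) + sum(abs(a) for a in reflected)
random_phases = [rng.uniform(0, math.tau) for _ in reflected]
```

By the triangle inequality no other phase choice can exceed the L1 norm, so the aligned configuration is optimal for a fixed beamformer.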
Submitted 16 April, 2024;
originally announced April 2024.
-
Language Model Cascades: Token-level uncertainty and beyond
Authors:
Neha Gupta,
Harikrishna Narasimhan,
Wittawat Jitkrittum,
Ankit Singh Rawat,
Aditya Krishna Menon,
Sanjiv Kumar
Abstract:
Recent advances in language models (LMs) have led to significant improvements in quality on complex NLP tasks, but at the expense of increased inference costs. Cascading offers a simple strategy to achieve more favorable cost-quality tradeoffs: here, a small model is invoked for most "easy" instances, while a few "hard" instances are deferred to the large model. While the principles underpinning cascading are well studied for classification tasks (with deferral based on predicted class uncertainty favored both theoretically and practically), a similar understanding is lacking for generative LM tasks. In this work, we initiate a systematic study of deferral rules for LM cascades. We begin by examining the natural extension of predicted class uncertainty to generative LM tasks, namely, the predicted sequence uncertainty. We show that this measure suffers from the length bias problem, either over- or under-emphasizing outputs based on their lengths. This is because LMs produce a sequence of uncertainty values, one for each output token, and the number of output tokens varies across examples. To mitigate this issue, we propose to exploit the richer token-level uncertainty information implicit in generative LMs. We argue that naive predicted sequence uncertainty corresponds to a simple aggregation of these uncertainties. By contrast, we show that incorporating token-level uncertainty through learned post-hoc deferral rules can significantly outperform such simple aggregation strategies, via experiments on a range of natural language benchmarks with FLAN-T5 models. We further show that incorporating embeddings from the smaller model and intermediate layers of the larger model can give an additional boost in the overall cost-quality tradeoff.
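The length-bias problem is easy to see numerically: summing token log-probabilities penalizes long outputs even when every token is confident, while averaging can hide a single very uncertain token. The sketch below computes these simple aggregates plus a few token-level quantiles that a learned post-hoc deferral rule could take as input features. The log-probabilities, the threshold, and the feature choice are hypothetical, not the paper's exact setup.

```python
import math


def sequence_scores(token_logprobs):
    """Simple aggregates of the small model's per-token log-probabilities."""
    n = len(token_logprobs)
    return {"sum": sum(token_logprobs), "mean": sum(token_logprobs) / n}


def quantile_features(token_logprobs, qs=(0.0, 0.1, 0.5, 0.9, 1.0)):
    """Token-level features: simple index-based quantiles of the sorted log-probs."""
    ordered = sorted(token_logprobs)
    last = len(ordered) - 1
    return [ordered[round(q * last)] for q in qs]


def defer_by_sum(token_logprobs, threshold):
    """Baseline rule: defer to the large model if the total log-probability is low."""
    return sequence_scores(token_logprobs)["sum"] < threshold


# Long answer with every token confident (p = 0.95) vs short answer with one shaky token.
long_confident = [math.log(0.95)] * 40
short_shaky = [math.log(0.95), math.log(0.3)]
```

With a threshold of −1.5, the sum-based rule defers the confident 40-token answer (total ≈ −2.05) but keeps the shaky 2-token one (≈ −1.26), which is exactly the length bias; richer token-level features let a learned rule separate the two cases.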
Submitted 15 April, 2024;
originally announced April 2024.
-
Beyond the Rings: Polar Ring Galaxy NGC 4262 and its Globular Cluster System
Authors:
Akhil Krishna R,
Sreeja S Kartha,
Blesson Mathew
Abstract:
In the context of the hierarchical model of galaxy evolution, polar ring galaxies (PRGs) are considered the intermediate phase between ongoing mergers and quiescent galaxies. This study explores the globular cluster system (GCS) and its properties in the nearest PRG, NGC4262, serving as a pilot investigation to study GCSs in nearby PRGs. We use wide and deep field observations from the CFHT, taken as part of the NGVS, to investigate the GCS of NGC4262. We present the first optical image of NGC4262 with an optically faint ring component. The photometric analysis of the GCS displays a distinct color bimodality. We estimate the total number of GCs for NGC4262 to be 266$\pm$16, with a specific frequency of 4.2$\pm$0.8 and a specific mass of 0.23$\pm$0.01, which is relatively high compared to other galaxies of similar mass and environmental conditions. The spatial and azimuthal distributions of subpopulations reveal strong evidence of previous interactions within the host galaxy. The color distribution of the GCS in NGC4262 shows a gradient of -0.05$\pm$0.01 within 5.5 arcmin, supporting the notion of past interactions and evolutionary transitions. PRG NGC4262 conforms to the overall trend of GCS mass with respect to halo mass. Furthermore, our investigation of the global scaling relations between GCS and host galaxy parameters provides further support for the hypothesis that PRGs are an intermediate phase connecting ongoing mergers and quiescent galaxies.
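For reference, the specific frequency and specific mass quoted above are conventionally defined as follows, with $N_{\mathrm{GC}}$ the total number of GCs, $M_V$ the host's absolute V-band magnitude, and $M_{\mathrm{GCS}}$ and $M_{\star}$ the total GC-system and host stellar masses. The exact band and mass estimates used in the paper may differ from these conventions.

```latex
S_N = N_{\mathrm{GC}} \, 10^{0.4\,(M_V + 15)}, \qquad
S_M = 100 \, \frac{M_{\mathrm{GCS}}}{M_{\star}}
```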
Submitted 15 April, 2024;
originally announced April 2024.
-
Parametric Sensitivities of a Wind-driven Baroclinic Ocean Using Neural Surrogates
Authors:
Yixuan Sun,
Elizabeth Cucuzzella,
Steven Brus,
Sri Hari Krishna Narayanan,
Balasubramanya Nadiga,
Luke Van Roekel,
Jan Hückelheim,
Sandeep Madireddy,
Patrick Heimbach
Abstract:
Numerical models of the ocean and ice sheets are crucial for understanding and simulating the impact of greenhouse gases on the global climate. Oceanic processes affect phenomena such as hurricanes, extreme precipitation, and droughts. Ocean models rely on subgrid-scale parameterizations that require calibration and often significantly affect model skill. When model sensitivities to parameters can be computed with approaches such as automatic differentiation, they can be used in this calibration to reduce the misfit between model output and data. Because the SOMA model code is challenging to differentiate, we have created neural network-based surrogates for estimating the sensitivity of the ocean model to its parameters. We first generated perturbed parameter ensemble data for an idealized ocean model and trained three surrogate neural network models. The neural surrogates accurately predicted the one-step forward ocean dynamics, from which we then computed the parametric sensitivity.
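Once a differentiable one-step surrogate exists, the parametric sensitivity is the Jacobian of its output with respect to the parameters. A neural surrogate would normally be differentiated with automatic differentiation; to stay dependency-light, this sketch uses central finite differences instead, on a toy tanh network whose weights are random placeholders (not the paper's surrogates or the SOMA parameters).

```python
import numpy as np

def param_sensitivity(surrogate, state, params, eps=1e-5):
    """Jacobian d surrogate(state, params) / d params via central differences."""
    params = np.asarray(params, dtype=float)
    out_size = np.asarray(surrogate(state, params), dtype=float).size
    jac = np.zeros((out_size, params.size))
    for j in range(params.size):
        step = np.zeros_like(params)
        step[j] = eps
        plus = np.asarray(surrogate(state, params + step), dtype=float).ravel()
        minus = np.asarray(surrogate(state, params - step), dtype=float).ravel()
        jac[:, j] = (plus - minus) / (2.0 * eps)
    return jac

# Hypothetical one-step surrogate: (state, params) -> next state, fixed random weights.
rng = np.random.default_rng(0)
W_s = rng.normal(size=(4, 4))
W_p = rng.normal(size=(4, 2))

def toy_surrogate(state, params):
    return np.tanh(W_s @ state + W_p @ params)

state0 = rng.normal(size=4)
params0 = np.array([0.3, -0.1])
J = param_sensitivity(toy_surrogate, state0, params0)  # shape (4, 2)
```

Each column of `J` tells how every output component responds to one parameter, which is the quantity a gradient-based calibration would consume.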
Submitted 15 April, 2024;
originally announced April 2024.
-
Optimum Beamforming and Grating Lobe Mitigation for Intelligent Reflecting Surfaces
Authors:
Sai Sanjay Narayanan,
Uday K Khankhoje,
Radha Krishna Ganti
Abstract:
Ensuring adequate wireless coverage in upcoming communication technologies such as 6G is expected to be challenging. This is because user demand for higher data rates requires an increase in carrier frequencies, which in turn reduces diffraction effects (and hence coverage) in complex multipath environments. Intelligent reflecting surfaces have been proposed as a way of restoring coverage by adaptively reflecting incoming electromagnetic waves in desired directions. This is accomplished by judiciously adding extra phases at different points on the surface. In practice, these extra phases are only available in discrete quantities due to hardware constraints. Computing these extra phases is challenging when they can only be picked from a discrete set, and existing approaches for solving this problem were either heuristic or based on evolutionary algorithms. We solve this problem by proposing fast algorithms with provably optimal solutions. Our algorithms have linear complexity and come with rigorous proofs of their optimality. We show that the proposed algorithms perform better than existing approaches. We analyze situations in which unwanted grating lobes arise in the radiation pattern, and discuss mitigation strategies, such as the use of triangular lattices and prephasing techniques, to eliminate them. We also demonstrate how our algorithms can leverage these techniques to deliver optimum beamforming solutions.
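The paper's linear-time algorithms are not reproduced here. As a hedged illustration of why the discrete-phase problem can be solved exactly rather than heuristically, the sketch below assumes a single-beam objective: maximize the magnitude of the coherent sum of N element contributions a_n, each rotated by one of K uniform phase levels. It uses a simpler exact method that runs in O(N^2 K) and checks it against brute force. The key fact: at the optimum, every element picks the level that rotates a_n closest to the sum's own angle.

```python
import itertools
import numpy as np

def optimal_discrete_phases(a, K):
    """Exactly maximize |sum_n a_n * exp(1j * 2*pi*k_n / K)| over k_n in {0, ..., K-1}.

    Each element's nearest level to a trial angle phi changes only at N*K
    breakpoint angles, so testing one phi per arc between breakpoints suffices.
    Cost: O(N^2 K) (N*K arcs, O(N) work each).
    """
    a = np.asarray(a, dtype=complex)
    alpha = np.angle(a)
    step = 2 * np.pi / K
    # Angles where element n is equidistant from two adjacent phase levels.
    bps = np.sort(np.mod(alpha[:, None] + step * (np.arange(K) + 0.5), 2 * np.pi).ravel())
    # Midpoints of consecutive breakpoints, including the wrap-around arc.
    mids = (bps + np.append(bps[1:], bps[0] + 2 * np.pi)) / 2
    best_val, best_k = -1.0, None
    for phi in mids:
        k = np.mod(np.round((phi - alpha) / step), K).astype(int)
        val = abs(np.sum(a * np.exp(1j * step * k)))
        if val > best_val:
            best_val, best_k = val, k
    return best_k, best_val

def brute_force(a, K):
    """Exhaustive K**N search, only for checking small cases."""
    a = np.asarray(a, dtype=complex)
    return max(abs(np.sum(a * np.exp(2j * np.pi * np.array(ks) / K)))
               for ks in itertools.product(range(K), repeat=a.size))

rng = np.random.default_rng(1)
a = rng.normal(size=6) + 1j * rng.normal(size=6)  # hypothetical per-element channel terms
k_opt, v_opt = optimal_discrete_phases(a, K=4)
```

Rounding each element's phase independently to the nearest level relative to a fixed target is the usual heuristic; sweeping the target angle over all arcs is what turns it into an exact search.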
Submitted 14 April, 2024;
originally announced April 2024.
-
Capacity Maximization for RIS-assisted Multi-user MISO Communication Systems
Authors:
M. S. S. Manasa,
Kali Krishna Kota,
Praful D. Mankar,
Harpreet S. Dhillon
Abstract:
We consider a multi-user multiple input single output (MU-MISO) system assisted by a reconfigurable intelligent surface (RIS). For such a system, we aim to optimally select the RIS phase shifts and precoding vectors to maximize the effective rank of the weighted channel covariance matrix, which in turn improves the channel capacity. For a low-complexity transmitter design, we employ maximum ratio transmission (MRT) and minimum mean square error (MMSE) precoding schemes along with water-filling-based power allocation. Further, we show that MRT and MMSE exhibit equivalent performance and become optimal when the effective rank of the channel is maximized by optimally configuring an RIS with a large number of elements.
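The water-filling power allocation mentioned above is the textbook rule p_i = max(0, mu - 1/g_i), with the water level mu chosen so that the powers sum to the budget. A minimal sketch, assuming the per-stream gains g_i (for example, eigenvalues of the effective channel divided by noise power) are already available; how the paper derives them from the RIS-assisted channel is not reproduced, and the gains below are hypothetical.

```python
import numpy as np

def water_filling(gains, total_power):
    """Classic water-filling: p_i = max(0, mu - 1/g_i) with sum(p_i) = total_power.

    gains: effective per-stream SNR gains g_i > 0.
    """
    g = np.asarray(gains, dtype=float)
    inv = 1.0 / g
    inv_sorted = np.sort(inv)  # strongest streams (smallest 1/g) first
    for k in range(g.size, 0, -1):
        mu = (total_power + inv_sorted[:k].sum()) / k  # water level if k streams are active
        if mu > inv_sorted[k - 1]:  # weakest active stream still receives power
            break
    return np.maximum(mu - inv, 0.0)

gains = np.array([2.0, 1.0, 0.25])  # hypothetical stream gains
p = water_filling(gains, total_power=1.0)
capacity = np.sum(np.log2(1.0 + gains * p))  # bits/s/Hz under this allocation
```

Here the weakest stream is switched off entirely because its "floor" 1/g is above the water level, illustrating why raising the effective rank (more strong streams) increases capacity.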
Submitted 13 April, 2024;
originally announced April 2024.
-
Overcoming Scene Context Constraints for Object Detection in wild using Defilters
Authors:
Vamshi Krishna Kancharla,
Neelam sinha
Abstract:
This paper focuses on improving object detection performance by addressing image distortions, which are commonly encountered in uncontrolled acquisition environments. High-level computer vision tasks such as object detection, recognition, and segmentation are particularly sensitive to image distortion. To address this issue, we propose a novel approach employing an image defilter to rectify image distortion prior to object detection. This method enhances object detection accuracy, as models perform optimally when trained on non-distorted images. Our experiments demonstrate that using defiltered images significantly improves mean average precision compared to training object detection models on distorted images. Consequently, our proposed method offers considerable benefits for real-world applications plagued by image distortion. To our knowledge, our contribution lies in employing a distortion-removal paradigm for object detection on images captured in natural settings. We achieved mean average precision improvements of 0.562 and 0.564 on the validation and test data, respectively.
Submitted 12 April, 2024;
originally announced April 2024.
-
A Novel Optimization-Based Collision Avoidance For Autonomous On-Orbit Assembly
Authors:
Siavash Tavana,
Sepideh Faghihi,
Anton de Ruiter,
Krishna Dev Kumar
Abstract:
Collision avoidance constraints are notoriously non-convex, non-differentiable, and challenging to handle when defined in optimization-based motion planning problems. To overcome these issues, this paper presents a novel non-conservative collision avoidance technique that uses convex optimization to establish the distance between robotic spacecraft and space structures for autonomous on-orbit assembly operations. The proposed technique defines each ellipsoidal- and polyhedral-shaped object as the union of convex compact sets, each represented non-conservatively by a real-valued convex function. These functions are then introduced as constraints in a convex optimization problem, whose optimality conditions yield a new set of differentiable constraints. These new constraints are then fed into an optimal control problem that enforces collision avoidance during motion planning for autonomous on-orbit assembly. Numerical experiments for two assembly scenarios in tight environments demonstrate the capability and effectiveness of the proposed technique. The results show that this framework leads to optimal non-conservative trajectories for robotic spacecraft in tight environments. Although developed for autonomous on-orbit assembly, this technique could be used for any generic motion planning problem where collision avoidance is crucial.
Submitted 15 April, 2024; v1 submitted 11 April, 2024;
originally announced April 2024.
-
Are EEG Sequences Time Series? EEG Classification with Time Series Models and Joint Subject Training
Authors:
Johannes Burchert,
Thorben Werner,
Vijaya Krishna Yalavarthi,
Diego Coello de Portugal,
Maximilian Stubbemann,
Lars Schmidt-Thieme
Abstract:
As in most other data domains, EEG data analysis relies on rich domain-specific preprocessing. Beyond such preprocessing, machine learners would hope to treat such data like any other time series data. For EEG classification, many models have been developed with layer types and architectures we typically do not see in time series classification. Furthermore, a separate model is typically learned for each individual subject, rather than one model for all of them. In this paper, we systematically study the differences between EEG classification models and generic time series classification models. We describe three different model setups for handling EEG data from different subjects: subject-specific models (as in most EEG literature), subject-agnostic models, and subject-conditional models. In experiments on three datasets, we demonstrate that off-the-shelf time series classification models trained per subject perform close to EEG classification models, but do not quite reach the performance of domain-specific modeling. Additionally, we combine time series models with subject embeddings to train one joint subject-conditional classifier on all subjects. The resulting models are competitive with dedicated EEG models on 2 out of 3 datasets, even outperforming all EEG methods on one of them.
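The abstract does not say how subject embeddings are combined with the time series model. The sketch below assumes the simplest option: a learned per-subject embedding is concatenated to the encoder's feature vector before a linear classifier, so one set of weights serves all subjects. The class name, dimensions, and random weights are hypothetical, and training is omitted.

```python
import numpy as np

class SubjectConditionalHead:
    """Hypothetical subject-conditional classifier head.

    Concatenates a per-subject embedding to time-series encoder features,
    then applies one shared linear classifier for all subjects.
    """

    def __init__(self, n_subjects, feat_dim, emb_dim, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.subject_emb = rng.normal(scale=0.1, size=(n_subjects, emb_dim))
        self.W = rng.normal(scale=0.1, size=(feat_dim + emb_dim, n_classes))
        self.b = np.zeros(n_classes)

    def logits(self, features, subject_ids):
        # features: (batch, feat_dim) from any time-series encoder
        # subject_ids: (batch,) integer subject indices
        z = np.concatenate([features, self.subject_emb[subject_ids]], axis=1)
        return z @ self.W + self.b

head = SubjectConditionalHead(n_subjects=9, feat_dim=16, emb_dim=4, n_classes=4)
x = np.ones((2, 16))
out = head.logits(x, np.array([0, 1]))  # same features, two different subjects
```

Compared with one model per subject, this shares almost all parameters across subjects and only lets the small embedding absorb subject-specific variation; a subject-agnostic model is the special case with no embedding.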
Submitted 10 April, 2024;
originally announced April 2024.
-
QueSTMaps: Queryable Semantic Topological Maps for 3D Scene Understanding
Authors:
Yash Mehan,
Kumaraditya Gupta,
Rohit Jayanti,
Anirudh Govil,
Sourav Garg,
Madhava Krishna
Abstract:
Understanding the structural organisation of 3D indoor scenes in terms of rooms is often accomplished via floorplan extraction. Robotic tasks such as planning and navigation require a semantic understanding of the scene as well. This is typically achieved via object-level semantic segmentation. However, such methods struggle to segment out topological regions like "kitchen" in the scene. In this work, we introduce a two-step pipeline. First, we extract a topological map, i.e., a floorplan of the indoor scene, using a novel multi-channel occupancy representation. Then, we generate CLIP-aligned features and semantic labels for every room instance based on the objects it contains, using a self-attention transformer. Our language-topology alignment supports natural language querying, e.g., the query "place to cook" locates the "kitchen". We outperform the current state of the art on room segmentation by ~20% and on room classification by ~12%. Our detailed qualitative analysis and ablation studies provide insights into the problem of joint structural and semantic 3D scene understanding.
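With room features and text queries in a shared embedding space, natural language querying reduces to nearest-neighbour retrieval. A minimal sketch using cosine similarity, assuming the room and query embeddings are already computed; the 3-D vectors below are toy stand-ins, not real CLIP embeddings.

```python
import numpy as np

def query_rooms(query_emb, room_embs, room_labels):
    """Return the room whose embedding has the highest cosine similarity to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    R = room_embs / np.linalg.norm(room_embs, axis=1, keepdims=True)
    scores = R @ q
    return room_labels[int(np.argmax(scores))], scores

# Toy vectors standing in for CLIP-space room features (hypothetical values).
rooms = np.array([[0.9, 0.1, 0.0],
                  [0.1, 0.9, 0.1],
                  [0.0, 0.2, 0.9]])
labels = ["kitchen", "bedroom", "bathroom"]
query = np.array([0.8, 0.2, 0.1])  # stand-in for the text embedding of "place to cook"
best, scores = query_rooms(query, rooms, labels)
```

Normalizing both sides makes the score depend only on direction, which is how CLIP-style embeddings are typically compared.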
Submitted 9 April, 2024;
originally announced April 2024.
-
Causal third-order viscous hydrodynamics within relaxation-time approximation
Authors:
Pushpa Panday,
Amaresh Jaiswal,
Binoy Krishna Patra
Abstract:
In the present work, we derive a linearly stable and causal theory of relativistic third-order viscous hydrodynamics from the Boltzmann equation in the relaxation-time approximation. We employ the viscous correction to the distribution function obtained from a Chapman-Enskog-like iterative solution of the Boltzmann equation. Our derivation highlights the necessity of incorporating a new dynamical degree of freedom, specifically an irreducible tensor of rank three, within this framework. This differs from the recent formulation of causal third-order theory from the method of moments, which requires two dynamical degrees of freedom: an irreducible third-rank tensor and an irreducible fourth-rank tensor. We verify the linear stability and causality of the proposed formulation by examining perturbations around a global equilibrium state.
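For orientation, the relaxation-time (Anderson-Witting) Boltzmann equation and a Chapman-Enskog-like iterative solution are commonly written as below, with $\tau_R$ the relaxation time, $u^\mu$ the fluid four-velocity, and $f_0$ the equilibrium distribution. This is the standard textbook form, not a transcription of the paper; its notation and order-by-order truncation may differ. Each iteration adds one more order in gradients, so third-order hydrodynamics keeps terms through $\delta f^{(3)}$.

```latex
p^\mu \partial_\mu f = -\frac{u \cdot p}{\tau_R}\,\left(f - f_0\right),
\qquad
f_n = f_0 - \frac{\tau_R}{u \cdot p}\, p^\mu \partial_\mu f_{n-1},
\qquad
f = f_0 + \delta f^{(1)} + \delta f^{(2)} + \delta f^{(3)} + \cdots
```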
Submitted 29 May, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
-
EVE: Enabling Anyone to Train Robots using Augmented Reality
Authors:
Jun Wang,
Chun-Cheng Chang,
Jiafei Duan,
Dieter Fox,
Ranjay Krishna
Abstract:
The increasing affordability of robot hardware is accelerating the integration of robots into everyday activities. However, training robots to automate tasks typically requires physical robots and expensive demonstration data from trained human annotators. Consequently, only those with access to physical robots can produce demonstrations to train robots. To mitigate this issue, we introduce EVE, an iOS app that enables everyday users to train robots using intuitive augmented reality visualizations without needing a physical robot. With EVE, users can collect demonstrations by specifying waypoints with their hands, visually inspecting the environment for obstacles, modifying existing waypoints, and verifying collected trajectories. In a user study ($N=14$, $D=30$) consisting of three common tabletop tasks, EVE outperformed three state-of-the-art interfaces in success rate and was comparable to kinesthetic teaching (physically moving a real robot) in completion time, usability, motion intent communication, enjoyment, and preference ($mean_{p}=0.30$). We conclude by enumerating limitations and design considerations for future AR-based demonstration collection systems for robotics.
Submitted 18 May, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
-
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Authors:
Bo Peng,
Daniel Goldstein,
Quentin Anthony,
Alon Albalak,
Eric Alcaide,
Stella Biderman,
Eugene Cheah,
Xingjian Du,
Teddy Ferdinan,
Haowen Hou,
Przemysław Kazienko,
Kranthi Kiran GV,
Jan Kocoń,
Bartłomiej Koptyra,
Satyapriya Krishna,
Ronald McClelland Jr.,
Niklas Muennighoff,
Fares Obeid,
Atsushi Saito,
Guangyu Song,
Haoqin Tu,
Stanisław Woźniak,
Ruichong Zhang,
Bingchen Zhao,
Qihang Zhao
, et al. (3 additional authors not shown)
Abstract:
We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models improving upon the RWKV (RWKV-4) architecture. Our architectural design advancements include multi-headed matrix-valued states and a dynamic recurrence mechanism that improve expressivity while maintaining the inference efficiency characteristics of RNNs. We introduce a new multilingual corpus with 1.12 trillion tokens and a fast tokenizer based on greedy matching for enhanced multilinguality. We trained four Eagle models, ranging from 0.46 to 7.5 billion parameters, and two Finch models with 1.6 and 3.1 billion parameters, and found that they achieve competitive performance across a wide variety of benchmarks. We release all our models on HuggingFace under the Apache 2.0 license.
Models at: https://huggingface.co/RWKV
Training code at: https://github.com/RWKV/RWKV-LM
Inference code at: https://github.com/RWKV/ChatRWKV
Time-parallel training code at: https://github.com/RWKV/RWKV-infctx-trainer
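"Greedy matching" tokenization usually means longest-match-first: at each position, emit the longest vocabulary entry that matches, then advance. The sketch below illustrates that general technique with a plain set lookup for clarity (a trie is the usual efficient structure); it is not the RWKV tokenizer's code, and the toy vocabulary is hypothetical.

```python
def greedy_tokenize(text, vocab):
    """Greedy longest-match tokenization: at each position take the longest vocab entry.

    Assumes every single character of `text` is in `vocab` (e.g., a byte-level
    fallback); otherwise a ValueError is raised.
    """
    max_len = max(len(t) for t in vocab)
    tokens, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            raise ValueError(f"no vocab entry matches at position {i}")
    return tokens

vocab = {"t", "h", "e", " ", "c", "a", "th", "the", "ca", "cat", "he"}
print(greedy_tokenize("the cat", vocab))  # ['the', ' ', 'cat']
```

Unlike BPE, greedy matching needs no merge-rule replay at inference time, which is one reason it can be fast.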
Submitted 10 April, 2024; v1 submitted 8 April, 2024;
originally announced April 2024.
-
Streamlining Ocean Dynamics Modeling with Fourier Neural Operators: A Multiobjective Hyperparameter and Architecture Optimization Approach
Authors:
Yixuan Sun,
Ololade Sowunmi,
Romain Egele,
Sri Hari Krishna Narayanan,
Luke Van Roekel,
Prasanna Balaprakash
Abstract:
Training an effective deep learning model to learn ocean processes involves careful choices of various hyperparameters. We leverage the advanced multiobjective optimization search algorithms in DeepHyper, a scalable hyperparameter optimization software package, to streamline the development of neural networks tailored for ocean modeling. The focus is on optimizing Fourier neural operators (FNOs), a data-driven model capable of simulating complex ocean behaviors. Selecting the correct model and tuning the hyperparameters are challenging tasks that require much effort to ensure model accuracy. DeepHyper allows efficient exploration of hyperparameters associated with data preprocessing, the FNO architecture, and model training strategies. We aim to obtain an optimal set of hyperparameters leading to the most performant model. Moreover, on top of the commonly used mean squared error for model training, we propose adopting the negative anomaly correlation coefficient as an additional loss term to improve model performance, and we investigate the potential trade-off between the two terms. The experimental results show that the optimal set of hyperparameters enhanced model performance in single-timestep forecasting and greatly exceeded the baseline configuration in autoregressive rollouts for long-horizon forecasting up to 30 days. Using DeepHyper, we demonstrate an approach to enhance the use of FNOs in ocean dynamics forecasting, offering a scalable solution with improved precision.
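The anomaly correlation coefficient (ACC) correlates forecast and observed anomalies, i.e. departures from a climatology, so adding its negative to the MSE rewards getting the pattern of anomalies right rather than only their amplitude. A minimal sketch, assuming the uncentered ACC form and a single hypothetical weight on the ACC term; the paper's exact variant and weighting are not stated in the abstract.

```python
import numpy as np

def anomaly_correlation(forecast, observed, climatology):
    """Uncentered anomaly correlation coefficient between forecast and observed anomalies."""
    fa = forecast - climatology
    oa = observed - climatology
    return np.sum(fa * oa) / np.sqrt(np.sum(fa ** 2) * np.sum(oa ** 2))

def combined_loss(forecast, observed, climatology, weight=1.0):
    """MSE plus weight * (negative ACC); `weight` is a hypothetical tuning knob."""
    mse = np.mean((forecast - observed) ** 2)
    return mse - weight * anomaly_correlation(forecast, observed, climatology)
```

Because ACC is scale-invariant, a forecast whose anomalies are exactly twice the observed ones still scores ACC = 1; the MSE term is what penalizes that amplitude error, which is the trade-off between the two terms.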
Submitted 10 April, 2024; v1 submitted 7 April, 2024;
originally announced April 2024.
-
Infrared Spectroscopy for Diagnosing Superlattice Minibands in Magic-angle Twisted Bilayer Graphene
Authors:
Geng Li,
Roshan Krishna Kumar,
Petr Stepanov,
Pierre A. Pantaleón,
Zhen Zhan,
Hitesh Agarwal,
Adrien Bercher,
Julien Barrier,
Kenji Watanabe,
Takashi Taniguchi,
Alexey B. Kuzmenko,
Francisco Guinea,
Iacopo Torre,
Frank H. L. Koppens
Abstract:
Twisted bilayer graphene (TBG) is a highly tunable, strongly correlated electron system owing to its unique flat electronic bands. However, understanding even its single-particle band structure has been challenging due to complex lattice reconstruction effects and a lack of spectroscopic measurements over a broad energy range. Here, we probe the band structure of TBG around the magic angle using infrared spectroscopy. Our measurements reveal spectral features originating from interband transitions whose energies are uniquely defined by the twist angle. By combining these measurements with quantum transport, we connect spectral features over a broad energy range (10 to 700 meV) spanning several superlattice minibands and track their evolution with twist angle. We compare our data with band structure calculations based on the continuum model and find good agreement only when the interlayer/intralayer tunnelling parameters are allowed to vary with twist angle. Our analysis suggests that the magic angle also shifts due to lattice relaxation and is better defined over a wide angular range from 0.9° to 1.1°. Our work provides spectroscopic insights into TBG's band structure and offers an optical fingerprint of the magic angle for screening heterostructures before nanofabrication.
Submitted 8 April, 2024;
originally announced April 2024.
-
Toward the van Benthem Characterization Theorem for Non-Distributive Modal Logic
Authors:
Yiwen Ding,
Krishna Manoorkar,
Mattia Panettiere,
Ruoding Wang
Abstract:
In this paper, we introduce simulations and bisimulations on polarity-based semantics for non-distributive modal logic, which are natural generalizations of those notions on Kripke semantics for modal logic. We also generalize other important model-theoretic notions about Kripke semantics, such as image-finite models, modally saturated models, ultrafilter extensions, and ultrapower extensions, to the non-distributive setting. Using these generalizations, we prove the Hennessy-Milner theorem and the van Benthem characterization theorem for non-distributive modal logic based on polarity-based semantics.
Submitted 8 April, 2024;
originally announced April 2024.
-
Online Learning under Haphazard Input Conditions: A Comprehensive Review and Analysis
Authors:
Rohit Agarwal,
Arijit Das,
Alexander Horsch,
Krishna Agarwal,
Dilip K. Prasad
Abstract:
The domain of online learning has experienced multifaceted expansion owing to its prevalence in real-life applications. Nonetheless, this progression operates under the assumption that the input feature space of the streaming data remains constant. In this survey paper, we address the topic of online learning in the context of haphazard inputs, explicitly forgoing such an assumption. We discuss, classify, evaluate, and compare the methodologies that are adept at modeling haphazard inputs, additionally providing the corresponding code implementations and their carbon footprint. Moreover, we classify the datasets related to the field of haphazard inputs and introduce evaluation metrics specifically designed for datasets exhibiting imbalance. The code for each methodology can be found at https://github.com/Rohit102497/HaphazardInputsReview
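To make "haphazard inputs" concrete: each streaming instance may carry a different subset of features, and features may appear for the first time mid-stream. The sketch below is a hypothetical minimal baseline, not one of the surveyed methods: online logistic regression with weights stored per feature name, new features initialized to zero, and each SGD step touching only the features present.

```python
import math

class HaphazardLogReg:
    """Hypothetical online logistic regression over a per-instance feature set.

    Weights live in a dict keyed by feature name; unseen features start at 0,
    and each update modifies only the features present in the current instance.
    """

    def __init__(self, lr=0.1):
        self.lr = lr
        self.w = {}
        self.bias = 0.0

    def predict_proba(self, x):
        z = self.bias + sum(self.w.get(f, 0.0) * v for f, v in x.items())
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        err = y - self.predict_proba(x)  # gradient of the log-likelihood w.r.t. the logit
        self.bias += self.lr * err
        for f, v in x.items():
            self.w[f] = self.w.get(f, 0.0) + self.lr * err * v

model = HaphazardLogReg()
# Toy stream: feature sets vary per instance, and "d" only appears later in each round.
stream = [({"a": 1.0}, 1), ({"b": 1.0, "c": 0.5}, 0), ({"a": 1.0, "d": 2.0}, 1)]
for x, y in stream * 100:
    model.update(x, y)
```

Absent features simply contribute nothing to the logit; the surveyed methods differ mainly in how they go beyond this, e.g. in handling features that vanish or in weighting partially observed instances.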
Submitted 7 April, 2024;
originally announced April 2024.
-
Constrained 6-DoF Grasp Generation on Complex Shapes for Improved Dual-Arm Manipulation
Authors:
Gaurav Singh,
Sanket Kalwar,
Md Faizal Karim,
Bipasha Sen,
Nagamanikandan Govindan,
Srinath Sridhar,
K Madhava Krishna
Abstract:
Efficiently generating grasp poses tailored to specific regions of an object is vital for various robotic manipulation tasks, especially in a dual-arm setup. This scenario presents a significant challenge due to the complex geometries involved, requiring a deep understanding of the local geometry to generate grasps efficiently on the specified constrained regions. Existing methods only explore settings involving table-top/small objects and require augmented datasets for training, limiting their performance on complex objects. We propose CGDF: Constrained Grasp Diffusion Fields, a diffusion-based grasp generative model that generalizes to objects with arbitrary geometries and generates dense grasps on the target regions. CGDF uses a part-guided diffusion approach that enables it to achieve high sample efficiency in constrained grasping without explicitly training on massive constraint-augmented datasets. We provide qualitative and quantitative comparisons using analytical metrics and in simulation, in both unconstrained and constrained settings, to show that our method can generalize to generate stable grasps on complex objects, which is especially useful for dual-arm manipulation settings, whereas existing methods struggle to do so.
Submitted 6 April, 2024;
originally announced April 2024.
-
PointSAGE: Mesh-independent superresolution approach to fluid flow predictions
Authors:
Rajat Sarkar,
Krishna Sai Sudhir Aripirala,
Vishal Sudam Jadhav,
Sagar Srinivas Sakhinana,
Venkataramana Runkana
Abstract:
Computational Fluid Dynamics (CFD) serves as a powerful tool for simulating fluid flow across diverse industries. High-resolution CFD simulations offer valuable insights into fluid behavior and flow patterns, aiding in optimizing design features or enhancing system performance. However, as resolution increases, computational data requirements and time increase proportionately. This presents a persistent challenge in CFD. Recently, efforts have been directed towards accurately predicting fine-mesh simulations from coarse-mesh simulations, with geometry and boundary conditions as input. Drawing inspiration from models designed for super-resolution, deep learning techniques like UNets have been applied to address this challenge. However, these existing methods are limited to structured data and fail on unstructured meshes, to which standard convolutions cannot be applied. Additionally, incorporating geometry/mesh information in the training process introduces drawbacks such as increased data requirements, difficulty generalizing to unseen geometries for the same physical phenomena, and limited robustness to mesh distortions. To address these concerns, we propose a novel framework, PointSAGE, a mesh-independent network that leverages the unordered, mesh-less nature of point clouds to learn complex fluid flow and directly predict fine simulations, completely disregarding mesh information. Because the framework is adaptable, the model accurately predicts the fine data across diverse point cloud sizes, regardless of the training dataset's dimension. We have evaluated the effectiveness of PointSAGE on diverse datasets in different scenarios, demonstrating notable results and a significant acceleration in computational time for generating fine simulations compared to standard CFD techniques.
Submitted 6 April, 2024;
originally announced April 2024.
-
Rethinking Non-Negative Matrix Factorization with Implicit Neural Representations
Authors:
Krishna Subramani,
Paris Smaragdis,
Takuya Higuchi,
Mehrez Souden
Abstract:
Non-negative Matrix Factorization (NMF) is a powerful technique for analyzing regularly-sampled data, i.e., data that can be stored in a matrix. For audio, this has led to numerous applications using time-frequency (TF) representations like the Short-Time Fourier Transform. However, extending these applications to irregularly-spaced TF representations, like the Constant-Q transform, wavelets, or sinusoidal analysis models, has not been possible, since these representations cannot be directly stored in matrix form. In this paper, we formulate NMF in terms of continuous functions (instead of fixed vectors) and show that NMF can be extended to a wider variety of signal classes that need not be regularly sampled.
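For reference, the matrix form of NMF that the paper generalizes factors a nonnegative matrix V (e.g., a magnitude spectrogram) as V ≈ W H with W, H ≥ 0, and is commonly fit with Lee-Seung multiplicative updates. The sketch below shows that standard baseline with the Euclidean loss on a toy rank-2 matrix; the paper's implicit-neural-representation formulation, which replaces the fixed columns and rows with continuous functions, is not reproduced.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Standard matrix NMF V ~ W @ H via Lee-Seung multiplicative updates (Euclidean loss)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep entries nonnegative; eps avoids division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(1)
V = rng.random((20, 2)) @ rng.random((2, 30))  # toy nonnegative matrix of rank 2
W, H = nmf(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

The updates require V to be a fixed grid of samples, which is exactly the limitation the abstract points to for irregularly spaced TF representations.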
Submitted 5 April, 2024;
originally announced April 2024.
-
H3DFact: Heterogeneous 3D Integrated CIM for Factorization with Holographic Perceptual Representations
Authors:
Zishen Wan,
Che-Kai Liu,
Mohamed Ibrahim,
Hanchen Yang,
Samuel Spetalnick,
Tushar Krishna,
Arijit Raychowdhury
Abstract:
Disentangling attributes of various sensory signals is central to human-like perception and reasoning and a critical task for higher-order cognitive and neuro-symbolic AI systems. An elegant approach to represent this intricate factorization is via high-dimensional holographic vectors drawing on brain-inspired vector symbolic architectures. However, holographic factorization involves iterative computation with high-dimensional matrix-vector multiplications and suffers from non-convergence problems.
In this paper, we present H3DFact, a heterogeneous 3D integrated in-memory compute engine capable of efficiently factorizing high-dimensional holographic representations. H3DFact exploits the computation-in-superposition capability of holographic vectors and the intrinsic stochasticity associated with memristive-based 3D compute-in-memory. Evaluated on large-scale factorization and perceptual problems, H3DFact demonstrates superior capability in factorization accuracy and operational capacity by up to five orders of magnitude, with 5.5x compute density, 1.2x energy efficiency improvements, and 5.9x less silicon footprint compared to iso-capacity 2D designs.
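Holographic factorization of this kind is commonly performed with a resonator network: given a bound vector s = x ⊙ y ⊙ z, each factor estimate is updated by unbinding the other current estimates and cleaning up against its codebook. The sketch below is a plain NumPy software version, assuming bipolar (+1/-1) vectors with elementwise-multiplication binding and deterministic updates; it does not model H3DFact's 3D compute-in-memory hardware or its use of device stochasticity. Codebook sizes and dimensions are hypothetical.

```python
import numpy as np

def resonator_factorize(s, codebooks, n_iter=50):
    """Factorize s = x1 * x2 * ... (elementwise, bipolar) with a resonator network.

    codebooks: list of (M_f, D) arrays of +/-1 codevectors, one per factor.
    Returns the index of the best-matching codevector for each factor.
    """
    sign = lambda v: np.where(v >= 0, 1, -1)
    # Start each estimate from the superposition of its whole codebook.
    est = [sign(cb.sum(axis=0)) for cb in codebooks]
    for _ in range(n_iter):
        for f, cb in enumerate(codebooks):
            # Unbind the other factors' estimates (bipolar vectors are self-inverse).
            others = np.prod([est[g] for g in range(len(codebooks)) if g != f], axis=0)
            u = s * others
            # Clean up: project onto the codebook, weighted by similarity.
            est[f] = sign(cb.T @ (cb @ u))
    # abs(): negating two factors leaves their product unchanged.
    return [int(np.argmax(np.abs(cb @ e))) for cb, e in zip(codebooks, est)]

rng = np.random.default_rng(0)
D, M = 2048, 6
codebooks = [rng.choice([-1, 1], size=(M, D)) for _ in range(3)]
true_idx = [1, 4, 3]
s = np.prod([cb[i] for cb, i in zip(codebooks, true_idx)], axis=0)
print(resonator_factorize(s, codebooks))
```

Each update costs two matrix-vector products per factor, which is the high-dimensional matrix-vector workload the abstract identifies as the target for in-memory compute.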
Submitted 5 April, 2024;
originally announced April 2024.