-
Linear codes with few weights over $\mathbb{F}_{p}+u\mathbb{F}_{p}$
Authors:
Pavan Kumar,
Noor Mohammad Khan
Abstract:
For any positive integer $m$ and an odd prime $p$, let $\mathbb{F}_{q}+u\mathbb{F}_{q}$, where $q=p^{m}$, be a ring extension of the ring $\mathbb{F}_{p}+u\mathbb{F}_{p}$.
In this paper, we construct linear codes over $\mathbb{F}_{p}+u\mathbb{F}_{p}$ by using the trace function defined on $\mathbb{F}_{q}+u\mathbb{F}_{q}$ and determine their Hamming weight distributions by employing the symplectic-weight distributions of their Gray images.
Submitted 26 June, 2024;
originally announced June 2024.
-
Five-Lee-weight linear codes over $\mathbb{F}_{q}+u\mathbb{F}_{q}$
Authors:
Dr. Pavan Kumar,
Dr. Noor Mohammad Khan
Abstract:
In this study, linear codes having their Lee-weight distributions over the semi-local ring $\mathbb{F}_{q}+u\mathbb{F}_{q}$ with $u^{2}=1$ are constructed using the defining set and Gauss sums for an odd prime $q$. Moreover, we derive complete Hamming-weight enumerators for the images of the constructed linear codes under the Gray map. We finally show an application to secret sharing schemes.
Submitted 26 June, 2024;
originally announced June 2024.
-
Overlay Space-Air-Ground Integrated Networks with SWIPT-Empowered Aerial Communications
Authors:
Anuradha Verma,
Pankaj Kumar Sharma,
Pawan Kumar,
Dong In Kim
Abstract:
In this article, we consider overlay space-air-ground integrated networks (OSAGINs) where a low earth orbit (LEO) satellite communicates with ground users (GUs) with the assistance of an energy-constrained coexisting air-to-air (A2A) network. Particularly, a non-linear energy harvester with a hybrid SWIPT utilizing both power-splitting and time-switching energy harvesting (EH) techniques is employed at the aerial transmitter. Specifically, we take the random locations of the satellite, ground and aerial receivers to investigate the outage performance of both the satellite-to-ground and aerial networks leveraging stochastic tools. By taking into account the Shadowed-Rician fading for the satellite link, the Nakagami-$m$ fading for the ground link, and the Rician fading for the aerial link, we derive analytical expressions for the outage probability of these networks. For a comprehensive analysis of the aerial network, we consider both the perfect and imperfect successive interference cancellation (SIC) scenarios. Through our analysis, we illustrate that, unlike linear EH, the implementation of non-linear EH provides accurate figures for any target rate, underscoring the significance of using non-linear EH models. Additionally, the influence of key parameters is emphasized, providing guidelines for the practical design of energy-efficient as well as spectrum-efficient future non-terrestrial networks. Monte Carlo simulations validate the accuracy of our theoretical developments.
Submitted 19 June, 2024;
originally announced June 2024.
-
DocCGen: Document-based Controlled Code Generation
Authors:
Sameer Pimparkhede,
Mehant Kammakomati,
Srikanth G. Tamilselvam,
Prince Kumar,
Ashok Pon Kumar,
Pushpak Bhattacharyya
Abstract:
Recent developments show that Large Language Models (LLMs) produce state-of-the-art performance on natural language (NL) to code generation for resource-rich general-purpose languages like C++, Java, and Python. However, their practical usage for structured domain-specific languages (DSLs) such as YAML and JSON is limited due to domain-specific schema, grammar, and customizations generally unseen by LLMs during pre-training. Efforts have been made to mitigate this challenge via in-context learning through relevant examples or by fine-tuning. However, these approaches suffer from problems such as limited DSL samples and prompt sensitivity, whereas enterprises maintain good documentation of their DSLs. Therefore, we propose DocCGen, a framework that can leverage such rich knowledge by breaking the NL-to-Code generation task for structured code languages into a two-step process. First, it detects the correct libraries using the library documentation that best matches the NL query. Then, it utilizes schema rules extracted from the documentation of these libraries to constrain the decoding. We evaluate our framework for two complex structured languages, Ansible YAML and Bash command, in two settings: Out-of-domain (OOD) and In-domain (ID). Our extensive experiments show that DocCGen consistently improves different-sized language models across all six evaluation metrics, reducing syntactic and semantic errors in structured code. We plan to open-source the datasets and code to motivate research in constrained code generation.
Submitted 17 June, 2024;
originally announced June 2024.
-
Mimicking User Data: On Mitigating Fine-Tuning Risks in Closed Large Language Models
Authors:
Francisco Eiras,
Aleksandar Petrov,
Phillip H. S. Torr,
M. Pawan Kumar,
Adel Bibi
Abstract:
Fine-tuning large language models on small, high-quality datasets can enhance their performance on specific downstream tasks. Recent research shows that fine-tuning on benign, instruction-following data can inadvertently undo the safety alignment process and increase a model's propensity to comply with harmful queries. Although critical, understanding and mitigating safety risks in well-defined tasks remains distinct from the instruction-following context due to structural differences in the data. Our work addresses the gap in our understanding of these risks across diverse types of data in closed models - where providers control how user data is utilized in the fine-tuning process. We demonstrate how malicious actors can subtly manipulate the structure of almost any task-specific dataset to foster significantly more dangerous model behaviors, while maintaining an appearance of innocuity and reasonable downstream task performance. To address this issue, we propose a novel mitigation strategy that mixes in safety data which mimics the task format and prompting style of the user data, showing this is more effective than existing baselines at re-establishing safety alignment while maintaining similar task performance.
Submitted 1 July, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
ARDDQN: Attention Recurrent Double Deep Q-Network for UAV Coverage Path Planning and Data Harvesting
Authors:
Praveen Kumar,
Priyadarshni,
Rajiv Misra
Abstract:
Unmanned Aerial Vehicles (UAVs) have gained popularity in data harvesting (DH) and coverage path planning (CPP) to survey a given area efficiently and collect data from aerial perspectives. While data harvesting aims to gather information from various Internet of Things (IoT) sensor devices, coverage path planning guarantees that every location within the designated area is visited with minimal redundancy and maximum efficiency. We propose the ARDDQN (Attention-based Recurrent Double Deep Q Network), which integrates double deep Q-networks (DDQN) with recurrent neural networks (RNNs) and an attention mechanism to generate path coverage choices that maximize data collection from IoT devices and to learn a control scheme for the UAV that generalizes energy restrictions. We employ a structured environment map comprising a compressed global environment map and a local map showing the UAV agent's location, which scales efficiently to large environments. We have compared long short-term memory (LSTM), bidirectional long short-term memory (Bi-LSTM), gated recurrent unit (GRU), and bidirectional gated recurrent unit (Bi-GRU) as recurrent neural networks (RNNs) against the result without an RNN. We propose integrating the LSTM with the attention mechanism into the existing DDQN model, which works best on the evaluation parameters, i.e., data collection, landing, and coverage ratios for the CPP and data harvesting scenarios.
Submitted 17 May, 2024;
originally announced May 2024.
-
PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models
Authors:
Devansh Jain,
Priyanshu Kumar,
Samuel Gehman,
Xuhui Zhou,
Thomas Hartvigsen,
Maarten Sap
Abstract:
Recent advances in large language models (LLMs) have led to their extensive global deployment, and ensuring their safety calls for comprehensive and multilingual toxicity evaluations. However, existing toxicity benchmarks are overwhelmingly focused on English, posing serious risks to deploying LLMs in other languages. We address this by introducing PolygloToxicityPrompts (PTP), the first large-scale multilingual toxicity evaluation benchmark of 425K naturally occurring prompts spanning 17 languages. We overcome the scarcity of naturally occurring toxicity in web-text and ensure coverage across languages with varying resources by automatically scraping over 100M web-text documents. Using PTP, we investigate research questions to study the impact of model size, prompt language, and instruction and preference-tuning methods on toxicity by benchmarking over 60 LLMs. Notably, we find that toxicity increases as language resources decrease or model size increases. Although instruction- and preference-tuning reduce toxicity, the choice of preference-tuning method does not have any significant impact. Our findings shed light on crucial shortcomings of LLM safeguarding and highlight areas for future research.
Submitted 20 May, 2024; v1 submitted 15 May, 2024;
originally announced May 2024.
-
TOPress3D: 3D topology optimization with design-dependent pressure loads in MATLAB
Authors:
Prabhat Kumar
Abstract:
This paper introduces "TOPress3D," a 3D topology optimization MATLAB code for structures subjected to design-dependent pressure loads. With a primary focus on pedagogical objectives, the code provides an easy learning experience, making it a valuable tool and practical gateway for newcomers, students, and researchers towards this topic. TOPress3D uses Darcy's law with a drainage term to link the given pressure load to design variables, which is converted to consistent nodal loads. Compliance minimization problems with pressure loads, subject to a volume constraint, are solved. Load sensitivities arising due to the design-dependent nature of the loads are evaluated using the adjoint-variable approach. The method of moving asymptotes is used to update the design variables. TOPress3D consists of six main parts, each described in detail. The code is also tailored to solve different problems. The robustness and success of the code are demonstrated while designing a few pressure load-bearing structures. The code is provided in Appendix B and is available with extensions in the supplementary material and publicly at https://github.com/PrabhatIn/TOPress3D.
Submitted 13 May, 2024;
originally announced May 2024.
-
On Existence of Latency Optimal Uncoded Storage Schemes in Geo-Distributed Data Storage Systems
Authors:
Srivathsa Acharya,
P. Vijay Kumar,
Viveck R. Cadambe
Abstract:
We consider the problem of geographically distributed data storage in a network of servers (or nodes) where the nodes are connected to each other via communication links having certain round-trip times (RTTs). Each node serves a specific set of clients, where a client can request any of the files available in the distributed system. The parent node provides the requested file if available locally; else it contacts other nodes that have the data needed to retrieve the requested file. This inter-node communication incurs a delay resulting in a certain latency in servicing the data request. The worst-case latency incurred at a servicing node and the system average latency are important performance metrics of a storage system, which depend not only on inter-node RTTs, but also on how the data is stored across the nodes. Data files could be placed in the nodes as they are, i.e., in uncoded fashion, or can be coded and placed. This paper provides the necessary and sufficient conditions for the existence of uncoded storage schemes that are optimal in terms of both per-node worst-case latency and system average latency. In addition, the paper provides efficient binary storage codes for a specific case where optimal uncoded schemes do not exist.
Submitted 13 May, 2024; v1 submitted 10 May, 2024;
originally announced May 2024.
-
On Streaming Codes for Simultaneously Correcting Burst and Random Erasures
Authors:
Shobhit Bhatnagar,
Biswadip Chakraborty,
P. Vijay Kumar
Abstract:
Streaming codes are packet-level codes that recover dropped packets within a strict decoding-delay constraint. We study streaming codes over a sliding-window (SW) channel model which admits only those erasure patterns which allow either a single burst erasure of $\le b$ packets along with $\le e$ random packet erasures, or else, $\le a$ random packet erasures, in any sliding-window of $w$ time slots. We determine the optimal rate of a streaming code constructed via the popular diagonal embedding (DE) technique over such an SW channel under delay constraint $\tau=(w-1)$ and provide an $O(w)$ field size code construction. For the case $e>1$, we show that it is not possible to significantly reduce this field size requirement, assuming the well-known MDS conjecture. We then provide a block code construction whose DE yields a streaming code achieving the rate derived above, over a field of size sub-linear in $w$, for a family of parameters having $e=1$. We show the field size optimality of this construction for some parameters, and near-optimality for others under a sparsity constraint. Additionally, we derive an upper-bound on the $d_{\text{min}}$ of a cyclic code and characterize cyclic codes which achieve this bound via their ability to simultaneously recover from burst and random erasures.
Submitted 10 May, 2024;
originally announced May 2024.
-
On Streaming Codes for Burst and Random Errors
Authors:
Shobhit Bhatnagar,
P. Vijay Kumar
Abstract:
Streaming codes (SCs) are packet-level codes that recover erased packets within a strict decoding-delay deadline. Streaming codes for various packet erasure channel models such as sliding-window (SW) channel models that admit random or burst erasures in any SW of a fixed length have been studied in the literature, and the optimal rate as well as rate-optimal code constructions of SCs over such channel models are known. In this paper, we study error-correcting streaming codes ($\text{SC}_{\text{ERR}}$s), i.e., packet-level codes which recover erroneous packets within a delay constraint. We study $\text{SC}_{\text{ERR}}$s for two classes of SW channel models, one that admits random packet errors, and another that admits multiple bursts of packet errors, in any SW of a fixed length. For the case of random packet errors, we establish the equivalence of an $\text{SC}_{\text{ERR}}$ and a corresponding SC that recovers from random packet erasures, thus determining the optimal rate of an $\text{SC}_{\text{ERR}}$ for this setting, and providing a rate-optimal code construction for all parameters. We then focus on SCs that recover from multiple erasure bursts and derive a rate-upper-bound for such SCs. We show the necessity of a divisibility constraint for the existence of an SC constructed by the popular diagonal embedding technique, that achieves this rate-bound under a stringent delay requirement. We then show that a construction known in the literature achieves this rate-bound when the divisibility constraint is met. We further show the equivalence of the SCs considered and $\text{SC}_{\text{ERR}}$s for the setting of multiple error bursts, under a stringent delay requirement.
Submitted 10 May, 2024;
originally announced May 2024.
-
Verified Neural Compressed Sensing
Authors:
Rudy Bunel,
Krishnamurthy Dvijotham,
M. Pawan Kumar,
Alessandro De Palma,
Robert Stanforth
Abstract:
We develop the first (to the best of our knowledge) provably correct neural networks for a precise computational task, with the proof of correctness generated by an automated verification algorithm without any human input. Prior work on neural network verification has focused on partial specifications that, even when satisfied, are not sufficient to ensure that a neural network never makes errors. We focus on applying neural network verification to computational tasks with a precise notion of correctness, where a verifiably correct neural network provably solves the task at hand with no caveats. In particular, we develop an approach to train and verify the first provably correct neural networks for compressed sensing, i.e., recovering sparse vectors from a number of measurements smaller than the dimension of the vector. We show that for modest problem dimensions (up to 50), we can train neural networks that provably recover a sparse vector from linear and binarized linear measurements. Furthermore, we show that the complexity of the network (number of neurons/layers) can be adapted to the problem difficulty and solve problems where traditional compressed sensing methods are not known to provably work.
Submitted 8 May, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
Linear Convergence of Independent Natural Policy Gradient in Games with Entropy Regularization
Authors:
Youbang Sun,
Tao Liu,
P. R. Kumar,
Shahin Shahrampour
Abstract:
This work focuses on the entropy-regularized independent natural policy gradient (NPG) algorithm in multi-agent reinforcement learning. In this work, agents are assumed to have access to an oracle with exact policy evaluation and seek to maximize their respective independent rewards. Each individual's reward is assumed to depend on the actions of all the agents in the multi-agent system, leading to a game between agents. We assume all agents make decisions under a policy with bounded rationality, which is enforced by the introduction of entropy regularization. In practice, a smaller regularization implies the agents are more rational and behave closer to Nash policies. On the other hand, agents with larger regularization act more randomly, which ensures more exploration. We show that, under sufficient entropy regularization, the dynamics of this system converge at a linear rate to the quantal response equilibrium (QRE). Although regularization assumptions prevent the QRE from approximating a Nash equilibrium, our findings apply to a wide range of games, including cooperative, potential, and two-player matrix games. We also provide extensive empirical results on multiple games (including Markov games) as a verification of our theoretical analysis.
Submitted 4 May, 2024;
originally announced May 2024.
-
ECC Analyzer: Extract Trading Signal from Earnings Conference Calls using Large Language Model for Stock Performance Prediction
Authors:
Yupeng Cao,
Zhi Chen,
Qingyun Pei,
Prashant Kumar,
K. P. Subbalakshmi,
Papa Momar Ndiaye
Abstract:
In the realm of financial analytics, leveraging unstructured data, such as earnings conference calls (ECCs), to forecast stock performance is a critical challenge that has attracted both academics and investors. While previous studies have used deep learning-based models to obtain a general view of ECCs, they often fail to capture detailed, complex information. Our study introduces a novel framework, ECC Analyzer, combining Large Language Models (LLMs) and multi-modal techniques to extract richer, more predictive insights. The model begins by summarizing the transcript's structure and analyzing the speakers' mode and confidence level by detecting variations in tone and pitch for audio. This analysis helps investors form an overview perception of the ECCs. Moreover, this model uses the Retrieval-Augmented Generation (RAG) based methods to meticulously extract the focuses that have a significant impact on stock performance from an expert's perspective, providing a more targeted analysis. The model goes a step further by enriching these extracted focuses with additional layers of analysis, such as sentiment and audio segment features. By integrating these insights, the ECC Analyzer performs multi-task predictions of stock performance, including volatility, value-at-risk (VaR), and return for different intervals. The results show that our model outperforms traditional analytic benchmarks, confirming the effectiveness of using advanced LLM techniques in financial analytics.
Submitted 29 April, 2024;
originally announced April 2024.
-
PyTOaCNN: Topology optimization using an adaptive convolutional neural network in Python
Authors:
Khaish Singh Chadha,
Prabhat Kumar
Abstract:
This paper introduces an adaptive convolutional neural network (CNN) architecture capable of automating various topology optimization (TO) problems with diverse underlying physics. The proposed architecture has an encoder-decoder-type structure with dense layers added at the bottleneck region to capture complex geometrical features. The network is trained using datasets obtained by the problem-specific open-source TO codes. TensorFlow and Keras are the main libraries employed to develop and train the model. The effectiveness and robustness of the proposed adaptive CNN model are demonstrated through its performance in compliance minimization problems involving constant and design-dependent loads and in addressing bulk modulus optimization. Once trained, the model takes the user's input of the volume fraction as an image and instantly generates an output image of the optimized design. The proposed CNN produces high-quality results resembling those obtained via open-source TO codes with negligible performance and volume fraction errors. The paper includes complete associated Python code (Appendix A) for the proposed CNN architecture and explains each part of the code to facilitate reproducibility and ease of learning.
Submitted 18 April, 2024;
originally announced April 2024.
-
TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals
Authors:
Alexander Vedernikov,
Puneet Kumar,
Haoyu Chen,
Tapio Seppanen,
Xiaobai Li
Abstract:
Engagement analysis finds various applications in healthcare, education, advertisement, and services. Deep Neural Networks, used for analysis, possess complex architecture and need large amounts of input data, computational power, and inference time. These constraints challenge embedding systems into devices for real-time use. To address these limitations, we present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture. To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer. In parallel, to efficiently extract rich patterns from the temporal-frequency domain and boost processing speed, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form. Evaluated on the EngageNet dataset, the proposed method outperforms existing baselines, utilizing only two behavioral features (head pose rotations) compared to the 98 used in baseline models. Furthermore, comparative analysis shows TCCT-Net's architecture offers an order-of-magnitude improvement in inference speed compared to state-of-the-art image-based Recurrent Neural Network (RNN) methods. The code will be released at https://github.com/vedernikovphoto/TCCT_Net.
Submitted 14 May, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
RiskLabs: Predicting Financial Risk Using Large Language Model Based on Multi-Sources Data
Authors:
Yupeng Cao,
Zhi Chen,
Qingyun Pei,
Fabrizio Dimino,
Lorenzo Ausiello,
Prashant Kumar,
K. P. Subbalakshmi,
Papa Momar Ndiaye
Abstract:
The integration of Artificial Intelligence (AI) techniques, particularly large language models (LLMs), in finance has garnered increasing academic attention. Despite progress, existing studies predominantly focus on tasks like financial text summarization, question-answering (Q&A), and stock movement prediction (binary classification), with a notable gap in the application of LLMs for financial risk prediction. Addressing this gap, in this paper, we introduce RiskLabs, a novel framework that leverages LLMs to analyze and predict financial risks. RiskLabs uniquely combines different types of financial data, including textual and vocal information from Earnings Conference Calls (ECCs), market-related time series data, and contextual news data surrounding ECC release dates. Our approach involves a multi-stage process: initially extracting and analyzing ECC data using LLMs, followed by gathering and processing time-series data before the ECC dates to model and understand risk over different timeframes. Using multimodal fusion techniques, RiskLabs amalgamates these varied data features for comprehensive multi-task financial risk prediction. Empirical experiment results demonstrate RiskLabs' effectiveness in forecasting both volatility and variance in financial markets. Through comparative experiments, we demonstrate how different data sources contribute to financial risk assessment and discuss the critical role of LLMs in this context. Our findings not only contribute to the AI in finance application but also open new avenues for applying LLMs in financial risk assessment.
Submitted 10 April, 2024;
originally announced April 2024.
-
A Gauss-Newton Approach for Min-Max Optimization in Generative Adversarial Networks
Authors:
Neel Mishra,
Bamdev Mishra,
Pratik Jawanpuria,
Pawan Kumar
Abstract:
A novel first-order method is proposed for training generative adversarial networks (GANs). It modifies the Gauss-Newton method to approximate the min-max Hessian and uses the Sherman-Morrison inversion formula to calculate the inverse. The method corresponds to a fixed-point method that ensures necessary contraction. To evaluate its effectiveness, numerical experiments are conducted on various datasets commonly used in image generation tasks, such as MNIST, Fashion MNIST, CIFAR10, FFHQ, and LSUN. Our method is capable of generating high-fidelity images with greater diversity across multiple datasets. It also achieves the highest inception score for CIFAR10 among all compared methods, including state-of-the-art second-order methods. Additionally, its execution time is comparable to that of first-order min-max methods.
Submitted 10 April, 2024;
originally announced April 2024.
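The abstract above mentions the Sherman-Morrison inversion formula. As a minimal sketch of that identity only, and not of the authors' training method (the abstract does not say how the formula is applied to the approximated Hessian), the inverse of a rank-one update can be computed from a known inverse as $(A + uv^{T})^{-1} = A^{-1} - \frac{A^{-1}uv^{T}A^{-1}}{1 + v^{T}A^{-1}u}$. The function name and the list-of-lists matrix representation are assumptions made for illustration.

```python
def sherman_morrison_inverse(a_inv, u, v):
    """Return (A + u v^T)^{-1} given A^{-1} (hypothetical helper, illustration only).

    a_inv is a square matrix as a list of rows; u and v are plain lists.
    """
    n = len(u)
    # A^{-1} u, a column vector
    a_inv_u = [sum(a_inv[i][k] * u[k] for k in range(n)) for i in range(n)]
    # v^T A^{-1}, a row vector
    v_a_inv = [sum(v[k] * a_inv[k][j] for k in range(n)) for j in range(n)]
    # Scalar denominator 1 + v^T A^{-1} u; zero means the update is singular
    denom = 1.0 + sum(v[k] * a_inv_u[k] for k in range(n))
    if abs(denom) < 1e-12:
        raise ValueError("rank-one update makes the matrix singular")
    # Subtract the rank-one correction from the known inverse
    return [
        [a_inv[i][j] - a_inv_u[i] * v_a_inv[j] / denom for j in range(n)]
        for i in range(n)
    ]


identity = [[1.0, 0.0], [0.0, 1.0]]
updated_inverse = sherman_morrison_inverse(identity, [1.0, 2.0], [3.0, 4.0])
```

The update costs O(n^2) operations rather than the O(n^3) of a fresh inversion, which is the usual reason for reaching for this identity.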
-
GNNBENCH: Fair and Productive Benchmarking for Single-GPU GNN System
Authors:
Yidong Gong,
Pradeep Kumar
Abstract:
We hypothesize that the absence of a standardized benchmark has allowed several fundamental pitfalls in GNN System design and evaluation that the community has overlooked. In this work, we propose GNNBench, a plug-and-play benchmarking platform focused on system innovation. GNNBench presents a new protocol to exchange their captive tensor data, supports custom classes in System APIs, and allows automatic integration of the same system module to many deep learning frameworks, such as PyTorch and TensorFlow. To demonstrate the importance of such a benchmark framework, we integrated several GNN systems. Our results show that integration with GNNBench helped us identify several measurement issues that deserve attention from the community.
Submitted 5 April, 2024;
originally announced April 2024.
-
Balancing Progress and Responsibility: A Synthesis of Sustainability Trade-Offs of AI-Based Systems
Authors:
Apoorva Nalini Pradeep Kumar,
Justus Bogner,
Markus Funke,
Patricia Lago
Abstract:
Recent advances in artificial intelligence (AI) capabilities have increased the eagerness of companies to integrate AI into software systems. While AI can be used to have a positive impact on several dimensions of sustainability, this is often overshadowed by its potential negative influence. While many studies have explored sustainability factors in isolation, there is insufficient holistic coverage of potential sustainability benefits or costs that practitioners need to consider during decision-making for AI adoption. We therefore aim to synthesize trade-offs related to sustainability in the context of integrating AI into software systems. We want to make the sustainability benefits and costs of integrating AI more transparent and accessible for practitioners.
The study was conducted in collaboration with a Dutch financial organization. We first performed a rapid review that led to the inclusion of 151 research papers. Afterward, we conducted six semi-structured interviews to enrich the data with industry perspectives. The combined results showcase the potential sustainability benefits and costs of integrating AI. The labels synthesized from the review regarding potential sustainability benefits were clustered into 16 themes, with "energy management" being the most frequently mentioned one. 11 themes were identified in the interviews, with the top mentioned theme being "employee wellbeing". Regarding sustainability costs, the review discovered seven themes, with "deployment issues" being the most popular one, followed by "ethics & society". "Environmental issues" was the top theme from the interviews. Our results provide valuable insights to organizations and practitioners for understanding the potential sustainability implications of adopting AI.
Submitted 5 April, 2024;
originally announced April 2024.
-
Multi-Task Learning for Lung sound & Lung disease classification
Authors:
Suma K V,
Deepali Koppad,
Preethi Kumar,
Neha A Kantikar,
Surabhi Ramesh
Abstract:
In recent years, advancements in deep learning techniques have considerably enhanced the efficiency and accuracy of medical diagnostics. In this work, a novel approach using multi-task learning (MTL) for the simultaneous classification of lung sounds and lung diseases is proposed. Our proposed model leverages MTL with four different deep learning models (2D CNN, ResNet50, MobileNet and DenseNet) to extract relevant features from the lung sound recordings. The ICBHI 2017 Respiratory Sound Database was employed in the current study. The MTL for MobileNet model performed better than the other models considered, with an accuracy of 74% for lung sound analysis and 91% for lung disease classification. Results of the experimentation demonstrate the efficacy of our approach in classifying both lung sounds and lung diseases concurrently.
In this study, using the demographic data of the patients from the database, risk level computation for Chronic Obstructive Pulmonary Disease is also carried out. For this computation, three machine learning algorithms, namely Logistic Regression, SVM and Random Forest classifiers, were employed. Among these ML algorithms, the Random Forest classifier had the highest accuracy of 92%. This work helps in considerably reducing the physician's burden of not just diagnosing the pathology but also effectively communicating to the patient about the possible causes or outcomes.
Submitted 5 April, 2024;
originally announced April 2024.
-
Extracting Social Support and Social Isolation Information from Clinical Psychiatry Notes: Comparing a Rule-based NLP System and a Large Language Model
Authors:
Braja Gopal Patra,
Lauren A. Lepow,
Praneet Kasi Reddy Jagadeesh Kumar,
Veer Vekaria,
Mohit Manoj Sharma,
Prakash Adekkanattu,
Brian Fennessy,
Gavin Hynes,
Isotta Landi,
Jorge A. Sanchez-Ruiz,
Euijung Ryu,
Joanna M. Biernacka,
Girish N. Nadkarni,
Ardesheer Talati,
Myrna Weissman,
Mark Olfson,
J. John Mann,
Alexander W. Charney,
Jyotishman Pathak
Abstract:
Background: Social support (SS) and social isolation (SI) are social determinants of health (SDOH) associated with psychiatric outcomes. In electronic health records (EHRs), individual-level SS/SI is typically documented as narrative clinical notes rather than structured coded data. Natural language processing (NLP) algorithms can automate the otherwise labor-intensive process of data extraction.
Data and Methods: Psychiatric encounter notes from Mount Sinai Health System (MSHS, n=300) and Weill Cornell Medicine (WCM, n=225) were annotated to establish a gold-standard corpus. A rule-based system (RBS) involving lexicons and a large language model (LLM) using FLAN-T5-XL were developed to identify mentions of SS and SI and their subcategories (e.g., social network, instrumental support, and loneliness).
Results: For extracting SS/SI, the RBS obtained higher macro-averaged f-scores than the LLM at both MSHS (0.89 vs. 0.65) and WCM (0.85 vs. 0.82). For extracting subcategories, the RBS also outperformed the LLM at both MSHS (0.90 vs. 0.62) and WCM (0.82 vs. 0.81).
Discussion and Conclusion: Unexpectedly, the RBS outperformed the LLM across all metrics. Intensive review demonstrates that this finding is due to the divergent approaches taken by the RBS and the LLM. The RBS was designed and refined to follow the same specific rules as the gold standard annotations. Conversely, the LLM was more inclusive with categorization and conformed to common English-language understanding. Both approaches offer advantages and are made available open-source for future testing.
Submitted 25 March, 2024;
originally announced March 2024.
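The results in the abstract above are reported as macro-averaged f-scores. As a minimal sketch of how that metric is commonly computed, assuming the standard definition (per-class F1, then an unweighted mean over classes), the following may help in reading the numbers. The function name, the label strings, and the toy data are hypothetical and do not come from the paper, whose exact evaluation protocol is not given in the abstract.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper, illustration only)."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    # Every class counts equally, regardless of how many examples it has
    return sum(scores) / len(scores)


# Toy labels, invented for illustration
gold_labels = ["SS", "SS", "SI", "SI"]
predicted_labels = ["SS", "SI", "SI", "SI"]
score = macro_f1(gold_labels, predicted_labels)
```

Because each class contributes equally, a rare subcategory that is handled poorly lowers the macro average as much as a frequent one would.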
-
Cross-layer Modeling and Design of Content Addressable Memories in Advanced Technology Nodes for Similarity Search
Authors:
Siri Narla,
Piyush Kumar,
Mohammad Adnaan,
Azad Naeemi
Abstract:
In this paper we present a comprehensive design and benchmarking study of Content Addressable Memory (CAM) at the 7nm technology node in the context of similarity search applications. We design CAM cells based on SRAM, spin-orbit torque, and ferroelectric field effect transistor devices and from their layouts extract cell parasitics using state of the art EDA tools. These parasitics are used to develop SPICE netlists to model search operations. We use a CAM-based dataset search and a sequential recommendation system to highlight the application-level performance degradation due to interconnect parasitics. We propose and evaluate two solutions to mitigate interconnect effects.
Submitted 22 March, 2024;
originally announced March 2024.
-
Read between the lines -- Functionality Extraction From READMEs
Authors:
Prince Kumar,
Srikanth Tamilselvam,
Dinesh Garg
Abstract:
While text summarization is a well-known NLP task, in this paper, we introduce a novel and useful variant of it called functionality extraction from Git README files. Though this task is a text2text generation task at an abstract level, it involves its own peculiarities and challenges, making existing text2text generation systems not very useful. The motivation behind this task stems from a recent surge in research and development activities around the use of large language models for code-related tasks, such as code refactoring, code summarization, etc. We also release a human-annotated dataset called FuncRead, and develop a battery of models for the task. Our exhaustive experimentation shows that small-size fine-tuned models beat any baseline models that can be designed using popular black-box or white-box large language models (LLMs) such as ChatGPT and Bard. Our best fine-tuned 7 Billion CodeLlama model exhibits 70% and 20% gains on the F1 score against ChatGPT and Bard respectively.
Submitted 15 March, 2024;
originally announced March 2024.
-
Intent-conditioned and Non-toxic Counterspeech Generation using Multi-Task Instruction Tuning with RLAIF
Authors:
Amey Hengle,
Aswini Kumar,
Sahajpreet Singh,
Anil Bandhakavi,
Md Shad Akhtar,
Tanmoy Chakroborty
Abstract:
Counterspeech, defined as a response to mitigate online hate speech, is increasingly used as a non-censorial solution. Addressing hate speech effectively involves dispelling the stereotypes, prejudices, and biases often subtly implied in brief, single-sentence statements or abuses. These implicit expressions challenge language models, especially in seq2seq tasks, as model performance typically excels with longer contexts. Our study introduces CoARL, a novel framework enhancing counterspeech generation by modeling the pragmatic implications underlying social biases in hateful statements. CoARL's first two phases involve sequential multi-instruction tuning, teaching the model to understand intents, reactions, and harms of offensive statements, and then learning task-specific low-rank adapter weights for generating intent-conditioned counterspeech. The final phase uses reinforcement learning to fine-tune outputs for effectiveness and non-toxicity. CoARL outperforms existing benchmarks in intent-conditioned counterspeech generation, showing an average improvement of 3 points in intent-conformity and 4 points in argument-quality metrics. Extensive human evaluation supports CoARL's efficacy in generating superior and more context-appropriate responses compared to existing systems, including prominent LLMs like ChatGPT.
Submitted 15 March, 2024;
originally announced March 2024.
-
Measuring Non-Typical Emotions for Mental Health: A Survey of Computational Approaches
Authors:
Puneet Kumar,
Alexander Vedernikov,
Xiaobai Li
Abstract:
Analysis of non-typical emotions, such as stress, depression and engagement is less common and more complex compared to that of frequently discussed emotions like happiness, sadness, fear, and anger. The importance of these non-typical emotions has been increasingly recognized due to their implications on mental health and well-being. Stress and depression impact the engagement in daily tasks, highlighting the need to understand their interplay. This survey is the first to simultaneously explore computational methods for analyzing stress, depression, and engagement. We discuss the most commonly used datasets, input modalities, data processing techniques, and information fusion methods used for the computational analysis of stress, depression and engagement. A timeline and taxonomy of non-typical emotion analysis approaches along with their generic pipeline and categories are presented. Subsequently, we describe state-of-the-art computational approaches for non-typical emotion analysis, including a performance summary on the most commonly used datasets. Following this, we explore the applications, along with the associated challenges, limitations, and future research directions.
Submitted 9 March, 2024;
originally announced March 2024.
-
IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages
Authors:
Mohammed Safi Ur Rahman Khan,
Priyam Mehta,
Ananth Sankar,
Umashankar Kumaravelan,
Sumanth Doddapaneni,
Suriyaprasaad G,
Varun Balan G,
Sparsh Jain,
Anoop Kunchukuttan,
Pratyush Kumar,
Raj Dabre,
Mitesh M. Khapra
Abstract:
Despite the considerable advancements in English LLMs, the progress in building comparable models for other languages has been hindered due to the scarcity of tailored resources. Our work aims to bridge this divide by introducing an expansive suite of resources specifically designed for the development of Indic LLMs, covering 22 languages, containing a total of 251B tokens and 74.8M instruction-response pairs. Recognizing the importance of both data quality and quantity, our approach combines highly curated manually verified data, unverified yet valuable data, and synthetic data. We build a clean, open-source pipeline for curating pre-training data from diverse sources, including websites, PDFs, and videos, incorporating best practices for crawling, cleaning, flagging, and deduplication. For instruction fine-tuning, we amalgamate existing Indic datasets, translate/transliterate English datasets into Indian languages, and utilize LLaMa2 and Mixtral models to create conversations grounded in articles from Indian Wikipedia and Wikihow. Additionally, we address toxicity alignment by generating toxic prompts for multiple scenarios and then generating non-toxic responses by feeding these toxic prompts to an aligned LLaMa2 model. We hope that the datasets, tools, and resources released as a part of this work will not only propel the research and development of Indic LLMs but also establish an open-source blueprint for extending such efforts to other languages. The data and other artifacts created as part of this work are released with permissive licenses.
Submitted 10 March, 2024;
originally announced March 2024.
-
Provable Policy Gradient Methods for Average-Reward Markov Potential Games
Authors:
Min Cheng,
Ruida Zhou,
P. R. Kumar,
Chao Tian
Abstract:
We study Markov potential games under the infinite horizon average reward criterion. Most previous studies have been for discounted rewards. We prove that both algorithms based on independent policy gradient and independent natural policy gradient converge globally to a Nash equilibrium for the average reward criterion. To set the stage for gradient-based methods, we first establish that the average reward is a smooth function of policies and provide sensitivity bounds for the differential value functions, under certain conditions on ergodicity and the second largest eigenvalue of the underlying Markov decision process (MDP). We prove that three algorithms, policy gradient, proximal-Q, and natural policy gradient (NPG), converge to an $ε$-Nash equilibrium with time complexity $O(\frac{1}{ε^2})$, given a gradient/differential Q function oracle. When policy gradients have to be estimated, we propose an algorithm with $\tilde{O}(\frac{1}{\min_{s,a}π(a|s)δ})$ sample complexity to achieve $δ$ approximation error w.r.t. the $\ell_2$ norm. Equipped with the estimator, we derive the first sample complexity analysis for a policy gradient ascent algorithm, featuring a sample complexity of $\tilde{O}(1/ε^5)$. Simulation studies are presented.
Submitted 8 March, 2024;
originally announced March 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1092 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 14 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Effect of turbulent diffusion in modeling anaerobic digestion
Authors:
Jeremy Z. Yan,
Prashant Kumar,
Wolfgang Rauch
Abstract:
In this study, the impact of turbulent diffusion on mixing of biochemical reaction models is explored by implementing and validating different models. An original codebase called CHAD (Coupled Hydrodynamics and Anaerobic Digestion) is extended to incorporate turbulent diffusion and validated against results from OpenFOAM with 2D Rayleigh-Taylor Instability and lid-driven cavity simulations. The models are then tested in applications to Anaerobic Digestion, a widely used wastewater treatment method. The findings demonstrate that the implemented models accurately capture turbulent diffusion when provided with an accurate flow field. Specifically, a minor effect of chemical turbulent diffusion on biochemical reactions within the anaerobic digestion tank is observed, while thermal turbulent diffusion significantly influences mixing. By successfully implementing turbulent diffusion models in CHAD, its capabilities for more accurate anaerobic digestion simulations are enhanced, aiding in optimizing the design and operation of anaerobic digestion reactors in real-world wastewater treatment applications.
Submitted 7 March, 2024;
originally announced March 2024.
-
Don't Blame the Data, Blame the Model: Understanding Noise and Bias When Learning from Subjective Annotations
Authors:
Abhishek Anand,
Negar Mokhberian,
Prathyusha Naresh Kumar,
Anweasha Saha,
Zihao He,
Ashwin Rao,
Fred Morstatter,
Kristina Lerman
Abstract:
Researchers have raised awareness about the harms of aggregating labels especially in subjective tasks that naturally contain disagreements among human annotators. In this work we show that models that are only provided aggregated labels show low confidence on high-disagreement data instances. While previous studies consider such instances as mislabeled, we argue that the reason the high-disagreement text instances have been hard-to-learn is that the conventional aggregated models underperform in extracting useful signals from subjective tasks. Inspired by recent studies demonstrating the effectiveness of learning from raw annotations, we investigate classifying using Multiple Ground Truth (Multi-GT) approaches. Our experiments show an improvement of confidence for the high-disagreement instances.
Submitted 6 March, 2024;
originally announced March 2024.
-
IndicVoices: Towards building an Inclusive Multilingual Speech Dataset for Indian Languages
Authors:
Tahir Javed,
Janki Atul Nawale,
Eldho Ittan George,
Sakshi Joshi,
Kaushal Santosh Bhogale,
Deovrat Mehendale,
Ishvinder Virender Sethi,
Aparna Ananthanarayanan,
Hafsah Faquih,
Pratiti Palit,
Sneha Ravishankar,
Saranya Sukumaran,
Tripura Panchagnula,
Sunjay Murali,
Kunal Sharad Gandhi,
Ambujavalli R,
Manickam K M,
C Venkata Vaijayanthi,
Krishnan Srinivasa Raghavan Karunganni,
Pratyush Kumar,
Mitesh M Khapra
Abstract:
We present INDICVOICES, a dataset of natural and spontaneous speech containing a total of 7348 hours of read (9%), extempore (74%) and conversational (17%) audio from 16237 speakers covering 145 Indian districts and 22 languages. Of these 7348 hours, 1639 hours have already been transcribed, with a median of 73 hours per language. Through this paper, we share our journey of capturing the cultural, linguistic and demographic diversity of India to create a one-of-its-kind inclusive and representative dataset. More specifically, we share an open-source blueprint for data collection at scale comprising standardised protocols, centralised tools, a repository of engaging questions, prompts and conversation scenarios spanning multiple domains and topics of interest, quality control mechanisms, comprehensive transcription guidelines and transcription tools. We hope that this open-source blueprint will serve as a comprehensive starter kit for data collection efforts in other multilingual regions of the world. Using INDICVOICES, we build IndicASR, the first ASR model to support all the 22 languages listed in the 8th schedule of the Constitution of India. All the data, tools, guidelines, models and other materials developed as a part of this work will be made publicly available.
Submitted 4 March, 2024;
originally announced March 2024.
-
Diffusion Meets DAgger: Supercharging Eye-in-hand Imitation Learning
Authors:
Xiaoyu Zhang,
Matthew Chang,
Pranav Kumar,
Saurabh Gupta
Abstract:
A common failure mode for policies trained with imitation is compounding execution errors at test time. When the learned policy encounters states that are not present in the expert demonstrations, the policy fails, leading to degenerate behavior. The Dataset Aggregation, or DAgger, approach to this problem simply collects more data to cover these failure states. However, in practice, this is often prohibitively expensive. In this work, we propose Diffusion Meets DAgger (DMD), a method to reap the benefits of DAgger without the cost for eye-in-hand imitation learning problems. Instead of collecting new samples to cover out-of-distribution states, DMD uses recent advances in diffusion models to synthesize these samples. This leads to robust performance from few demonstrations. We compare DMD against a behavior cloning (BC) baseline across four tasks: pushing, stacking, pouring, and shirt hanging. In pushing, DMD achieves an 80% success rate with as few as 8 expert demonstrations, where naive behavior cloning reaches only 20%. In stacking, DMD succeeds on average 92% of the time across 5 cups, versus 40% for BC. When pouring coffee beans, DMD transfers to another cup successfully 80% of the time. Finally, DMD attains a 90% success rate for hanging a shirt on a clothing rack.
Submitted 5 June, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
CMFeed: A Benchmark Dataset for Controllable Multimodal Feedback Synthesis
Authors:
Puneet Kumar,
Sarthak Malik,
Balasubramanian Raman,
Xiaobai Li
Abstract:
The Controllable Multimodal Feedback Synthesis (CMFeed) dataset enables the generation of sentiment-controlled feedback from multimodal inputs. It contains images, text, human comments, comments' metadata and sentiment labels. Existing datasets for related tasks such as multimodal summarization, visual question answering, visual dialogue, and sentiment-aware text generation do not incorporate training models using human-generated outputs and their metadata, a gap that CMFeed addresses. This capability is critical for developing feedback systems that understand and replicate human-like spontaneous responses. Based on the CMFeed dataset, we define a novel task of controllable feedback synthesis to generate context-aware feedback aligned with the desired sentiment. We propose a benchmark feedback synthesis system comprising encoder, decoder, and controllability modules. It employs transformer and Faster R-CNN networks to extract features and generate sentiment-specific feedback, achieving a sentiment classification accuracy of 77.23%, which is 18.82% higher than models not leveraging the dataset's unique controllability features. Additionally, we incorporate a similarity module for relevance assessment through rank-based metrics.
△ Less
Submitted 5 June, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
Single-GPU GNN Systems: Traps and Pitfalls
Authors:
Yidong Gong,
Arnab Tarafder,
Saima Afrin,
Pradeep Kumar
Abstract:
Current graph neural network (GNN) systems have established a clear trend of not showing training accuracy results and of relying, directly or indirectly, mostly on smaller datasets for evaluation. Our in-depth analysis shows that this leads to a chain of pitfalls in the system design and evaluation process, questioning the practicality of many of the proposed system optimizations, and affecting conclusions and lessons learned. We analyze many single-GPU systems and show the fundamental impact of these pitfalls. We further develop hypotheses, recommendations, and evaluation methodologies, and provide future directions. Finally, a new reference system is developed to establish a new line of optimizations rooted in solving the system-design pitfalls efficiently and practically. The proposed design can productively be integrated into prior works, thereby truly advancing the state-of-the-art.
Submitted 5 February, 2024;
originally announced February 2024.
-
Explaining the Implicit Neural Canvas: Connecting Pixels to Neurons by Tracing their Contributions
Authors:
Namitha Padmanabhan,
Matthew Gwilliam,
Pulkit Kumar,
Shishira R Maiya,
Max Ehrlich,
Abhinav Shrivastava
Abstract:
The many variations of Implicit Neural Representations (INRs), where a neural network is trained as a continuous representation of a signal, have tremendous practical utility for downstream tasks including novel view synthesis, video compression, and image superresolution. Unfortunately, the inner workings of these networks are seriously under-studied. Our work, eXplaining the Implicit Neural Canvas (XINC), is a unified framework for explaining properties of INRs by examining the strength of each neuron's contribution to each output pixel. We call the aggregate of these contribution maps the Implicit Neural Canvas and we use this concept to demonstrate that the INRs which we study learn to "see" the frames they represent in surprising ways. For example, INRs tend to have highly distributed representations. While lacking high-level object semantics, they have a significant bias for color and edges, and are almost entirely space-agnostic. We arrive at our conclusions by examining how objects are represented across time in video INRs, using clustering to visualize similar neurons across layers and architectures, and show that this is dominated by motion. These insights demonstrate the general usefulness of our analysis framework. Our project page is available at https://namithap10.github.io/xinc.
Submitted 18 January, 2024;
originally announced January 2024.
-
SoRoTop: a hitchhiker's guide to topology optimization MATLAB code for design-dependent pneumatic-driven soft robots
Authors:
Prabhat Kumar
Abstract:
Demands for pneumatic-driven soft robots are constantly rising for various applications. However, they are often designed manually due to the lack of systematic methods. Moreover, design-dependent characteristics of pneumatic actuation pose distinctive challenges. This paper provides a compact MATLAB code, named SoRoTop, and its various extensions for designing pneumatic-driven soft robots using topology optimization. The code uses the method of moving asymptotes as the optimizer and builds upon the approach initially presented in Kumar et al. (Struct Multidiscip Optim 61 (4): 1637-1655, 2020). The pneumatic load is modeled using Darcy's law with a conceptualized drainage term. Consistent nodal loads are determined from the resultant pressure field using the conventional finite element approach. The robust formulation is employed, i.e., the eroded and blueprint design descriptions are used. A min-max optimization problem is formulated using the output displacements of the eroded and blueprint designs. A volume constraint is imposed on the blueprint design, while the eroded design is used to apply a conceptualized strain energy constraint. The latter constraint aids in attaining optimized designs that can endure the applied load without compromising their performance. Sensitivities required for optimization are computed using the adjoint-variable method. The code is explained in detail, and various extensions are also presented. It is structured into pre-optimization, MMA optimization, and post-optimization operations, each of which is comprehensively detailed. The paper also illustrates the impact of load sensitivities on the optimized designs. SoRoTop is provided in Appendix A and is available with extensions in the supplementary material and publicly at https://github.com/PrabhatIn/SoRoTop.
Submitted 6 January, 2024;
originally announced January 2024.
-
Zero-shot Microclimate Prediction with Deep Learning
Authors:
Iman Deznabi,
Peeyush Kumar,
Madalina Fiterau
Abstract:
Weather station data is a valuable resource for climate prediction; however, its reliability can be limited in remote locations. To compound the issue, making local predictions often relies on sensor data that may not be accessible for a new, previously unmonitored location. In response to these challenges, we propose a novel zero-shot learning approach designed to forecast various climate measurements at new and unmonitored locations. Our method surpasses conventional weather forecasting techniques in predicting microclimate variables by leveraging knowledge extracted from other geographic locations.
Submitted 5 January, 2024;
originally announced January 2024.
-
The Zeta ($ζ$) Notation for Complex Asymptotes
Authors:
Anurag Dutta,
K. Lakshmanan,
John Harshith,
A. Ramamoorthy,
C. Pradeep,
Pijush Kanti Kumar
Abstract:
Time Complexity is an important metric to compare algorithms based on their cardinality. The commonly used, trivial notations to qualify the same are the Big-Oh, Big-Omega, Big-Theta, Small-Oh, and Small-Omega notations. All of them consider time to be a real quantity, i.e., time coincides with the horizontal axis in the Argand plane. But what if time, rather than completely coinciding with the real axis of the Argand plane, makes some angle with it? We focus on the case when the time complexity has both real and imaginary components. For instance, if $T\left(n\right)=n\log{n}$, the existing asymptotic notations are capable of handling that in real time. But if we come across a problem where $T\left(n\right)=n\log{n}+i\cdot n^2$, where $i=\sqrt{-1}$, the existing asymptotic notations will not be able to catch up. To mitigate this, we propose the Zeta notation ($ζ$), which qualifies time along both the real and imaginary axes of the Argand plane.
Submitted 1 February, 2024; v1 submitted 24 December, 2023;
originally announced December 2023.
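As a worked illustration of the example in the abstract above, the complex-valued complexity $T(n)=n\log n+i\cdot n^2$ can be split into a modulus and an argument. This is only a sketch using standard complex-number identities, valid for $n>1$; it is not the authors' definition of the $ζ$ notation, which the abstract does not state.

```latex
\begin{align*}
|T(n)| &= \sqrt{(n\log n)^2 + (n^2)^2} = n\sqrt{\log^2 n + n^2},\\
\arg T(n) &= \arctan\!\left(\frac{n^2}{n\log n}\right)
           = \arctan\!\left(\frac{n}{\log n}\right)
           \xrightarrow{\,n\to\infty\,} \frac{\pi}{2}.
\end{align*}
```

Under this reading, the modulus grows like $n^2$ and the angle with the real axis approaches $\pi/2$, so the imaginary component dominates for large $n$.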
-
DSFormer: Effective Compression of Text-Transformers by Dense-Sparse Weight Factorization
Authors:
Rahul Chand,
Yashoteja Prabhu,
Pratyush Kumar
Abstract:
With the tremendous success of large transformer models in natural language understanding, down-sizing them for cost-effective deployments has become critical. Recent studies have explored low-rank weight factorization techniques, which are efficient to train and apply out-of-the-box to any transformer architecture. Unfortunately, the low-rank assumption tends to be over-restrictive and hinders the expressiveness of the compressed model. This paper proposes DSFormer, a simple alternative factorization scheme which expresses a target weight matrix as the product of a small dense and a semi-structured sparse matrix. The resulting approximation is more faithful to the weight distribution in transformers and therefore achieves a stronger efficiency-accuracy trade-off. Another concern with existing factorizers is their dependence on a task-unaware initialization step which degrades the accuracy of the resulting model. DSFormer addresses this issue through a novel Straight-Through Factorizer (STF) algorithm that jointly learns all the weight factorizations to directly maximize the final task accuracy. Extensive experiments on multiple natural language understanding benchmarks demonstrate that DSFormer obtains up to 40% better compression than the state-of-the-art low-rank factorizers, leading semi-structured sparsity baselines and popular knowledge distillation approaches. Our approach is also orthogonal to mainstream compressors and offers up to 50% additional compression when added to popular distilled, layer-shared and quantized transformers. We empirically evaluate the benefits of STF over conventional optimization practices.
Submitted 20 December, 2023;
originally announced December 2023.
-
Privacy Preserving Multi-Agent Reinforcement Learning in Supply Chains
Authors:
Ananta Mukherjee,
Peeyush Kumar,
Boling Yang,
Nishanth Chandran,
Divya Gupta
Abstract:
This paper addresses privacy concerns in multi-agent reinforcement learning (MARL), specifically within the context of supply chains where individual strategic data must remain confidential. Organizations within the supply chain are modeled as agents, each seeking to optimize their own objectives while interacting with others. As each organization's strategy is contingent on neighboring strategies, maintaining privacy of state and action-related information is crucial. To tackle this challenge, we propose a game-theoretic, privacy-preserving mechanism, utilizing a secure multi-party computation (MPC) framework in MARL settings. Our major contribution is the successful implementation of a secure MPC framework, SecFloat on EzPC, to solve this problem. However, simply implementing policy gradient methods such as MADDPG operations using SecFloat, while conceptually feasible, would be programmatically intractable. To overcome this hurdle, we devise a novel approach that breaks down the forward and backward pass of the neural network into elementary operations compatible with SecFloat, creating efficient and secure versions of the MADDPG algorithm. Furthermore, we present a learning mechanism that carries out floating point operations in a privacy-preserving manner, an important feature for successful learning in the MARL framework. Experiments reveal that there is on average 68.19% less supply chain wastage in 2 PC compared to no data share, while also giving on average 42.27% better average cumulative revenue for each player. This work paves the way for practical, privacy-preserving MARL, promising significant improvements in secure computation within supply chain contexts and broadly.
Submitted 9 December, 2023;
originally announced December 2023.
-
GLiDR: Topologically Regularized Graph Generative Network for Sparse LiDAR Point Clouds
Authors:
Prashant Kumar,
Kshitij Madhav Bhat,
Vedang Bhupesh Shenvi Nadkarni,
Prem Kalra
Abstract:
Sparse LiDAR point clouds cause severe loss of detail of static structures and reduce the density of static points available for navigation. Reduced density can be detrimental to navigation under several scenarios. We observe that despite high sparsity, in most cases, the global topology of LiDAR outlining the static structures can be inferred. We utilize this property to obtain a backbone skeleton of a LiDAR scan in the form of a single connected component that is a proxy to its global topology. We utilize the backbone to augment new points along static structures to overcome sparsity. Newly introduced points could correspond to existing static structures or to static points that were earlier obstructed by dynamic objects. To the best of our knowledge, we are the first to use such a strategy for sparse LiDAR point clouds. Existing solutions close to our approach fail to identify and preserve the global static LiDAR topology and generate sub-optimal points. We propose GLiDR, a Graph Generative network that is topologically regularized using 0-dimensional Persistent Homology ($\mathcal{PH}$) constraints. This enables GLiDR to introduce newer static points along a topologically consistent global static LiDAR backbone. GLiDR generates precise static points using $32\times$ sparser dynamic scans and performs better than the baselines across three datasets. GLiDR generates a valuable byproduct: an accurate binary segmentation mask of static and dynamic objects that is helpful for navigation planning and safety in constrained environments. The newly introduced static points allow GLiDR to outperform LiDAR-based navigation using SLAM in several settings. Source code is available at https://kshitijbhat.github.io/glidr
Submitted 24 May, 2024; v1 submitted 29 November, 2023;
originally announced December 2023.
-
Eye vs. AI: Human Gaze and Model Attention in Video Memorability
Authors:
Prajneya Kumar,
Eshika Khandelwal,
Makarand Tapaswi,
Vishnu Sreekumar
Abstract:
Understanding the factors that determine video memorability has important applications in areas such as educational technology and advertising. Towards this goal, we investigate the semantic and temporal attention mechanisms underlying video memorability. We propose a Transformer-based model with spatio-temporal attention that matches SoTA performance on video memorability prediction on a large naturalistic video dataset. More importantly, the self-attention patterns show us where the model looks to predict memorability. We compare model attention against human gaze fixation density maps collected through a small-scale eye-tracking experiment where humans perform a video memory task. Quantitative saliency metrics show that the model attention and human gaze follow similar patterns. Furthermore, while panoptic segmentation confirms that the model and humans attend more to thing classes, stuff classes that receive increased/decreased attention tend to have higher memorability scores. We also observe that the model assigns greater importance to the initial frames, mimicking temporal attention patterns found in humans.
Submitted 26 November, 2023;
originally announced November 2023.
-
Introducing CHAD -- An ADM1 Solver for Direct Linking to Lagrangian CFD Software
Authors:
Prashant Kumar,
Zhenghao Yan,
Soroush Dabiri,
Nikolaus Rauch,
Wolfgang Rauch
Abstract:
Standard methods for modeling anaerobic digestion processes assume homogeneous conditions inside the tank and thus suffer from neglecting hydrodynamics. In this work, we present the software toolbox Coupled Hydrodynamics and Anaerobic Digestion (CHAD), a novel parallelized solver that is capable of utilizing CFD results as the basis for Anaerobic digestion model No.1 (ADMno1) simulations. CHAD uses a particle-based Lagrangian CFD solver, i.e., DualSPHysics (DSPH), as input and provides a parallelized C++ implementation of the standard ADMno1. This paper demonstrates a conceptual and numerical verification of the toolbox and outlines the future pathway to enhance the approach.
Submitted 15 November, 2023;
originally announced November 2023.
-
Value-Biased Maximum Likelihood Estimation for Model-based Reinforcement Learning in Discounted Linear MDPs
Authors:
Yu-Heng Hung,
Ping-Chun Hsieh,
Akshay Mete,
P. R. Kumar
Abstract:
We consider the infinite-horizon linear Markov Decision Processes (MDPs), where the transition probabilities of the dynamic model can be linearly parameterized with the help of a predefined low-dimensional feature mapping. While the existing regression-based approaches have been theoretically shown to achieve nearly-optimal regret, they are computationally rather inefficient due to the need for a large number of optimization runs in each time step, especially when the state and action spaces are large. To address this issue, we propose to solve linear MDPs through the lens of Value-Biased Maximum Likelihood Estimation (VBMLE), which is a classic model-based exploration principle in the adaptive control literature for resolving the well-known closed-loop identification problem of Maximum Likelihood Estimation. We formally show that (i) VBMLE enjoys $\widetilde{O}(d\sqrt{T})$ regret, where $T$ is the time horizon and $d$ is the dimension of the model parameter, and (ii) VBMLE is computationally more efficient as it only requires solving one optimization problem in each time step. In our regret analysis, we offer a generic convergence result of MLE in linear MDPs through a novel supermartingale construct and uncover an interesting connection between linear MDPs and online learning, which could be of independent interest. Finally, the simulation results show that VBMLE significantly outperforms the benchmark method in terms of both empirical regret and computation time.
Submitted 17 October, 2023;
originally announced October 2023.
-
Enhancing ML model accuracy for Digital VLSI circuits using diffusion models: A study on synthetic data generation
Authors:
Prasha Srivastava,
Pawan Kumar,
Zia Abbas
Abstract:
Generative AI has seen remarkable growth over the past few years, with diffusion models being state-of-the-art for image generation. This study investigates the use of diffusion models to generate artificial data for electronic circuits, with the aim of enhancing the accuracy of subsequent machine learning models in tasks such as performance assessment, design, and testing, where training data is usually very limited. We utilize simulations in the HSPICE design environment with 22nm CMOS technology nodes to obtain representative real training data for our proposed diffusion model. Our results demonstrate the close resemblance of the synthetic data generated by the diffusion model to the real data. We validate the quality of the generated data and demonstrate that data augmentation is certainly effective in predictive analysis of VLSI design for digital circuits.
Submitted 15 October, 2023;
originally announced October 2023.
-
Topology optimization of fluidic pressure-driven multi-material compliant mechanisms
Authors:
Prabhat Kumar,
Josh Pinskier,
David Howard,
Matthijs Langelaar
Abstract:
Compliant mechanisms actuated by pneumatic loads are receiving increasing attention due to their direct applicability as soft robots that perform tasks using their flexible bodies. Using multiple materials to build them can further improve their performance and efficiency. Due to developments in additive manufacturing, the fabrication of multi-material soft robots is becoming a real possibility. To exploit this opportunity, there is a need for a dedicated design approach. This paper offers a systematic approach to developing such mechanisms using topology optimization. The extended SIMP scheme is employed for multi-material modeling. The design-dependent nature of the pressure load is modeled using the Darcy law with a volumetric drainage term. The flow coefficient of each element is interpolated using a smoothed Heaviside function. The obtained pressure field is converted to consistent nodal loads. The adjoint-variable approach is employed to determine the sensitivities. A robust formulation is employed, wherein a min-max optimization problem is formulated using the output displacements of the eroded and blueprint designs. Volume constraints are applied to the blueprint design, whereas the strain energy constraint is formulated with respect to the eroded design. The efficacy and success of the approach are demonstrated by designing pneumatically actuated multi-material gripper and contractor mechanisms. A numerical study confirms that multiple-material mechanisms perform relatively better than their single-material counterparts.
Submitted 16 October, 2023;
originally announced October 2023.
-
Alpha Elimination: Using Deep Reinforcement Learning to Reduce Fill-In during Sparse Matrix Decomposition
Authors:
Arpan Dasgupta,
Pawan Kumar
Abstract:
A large number of computational and scientific methods commonly require decomposing a sparse matrix into triangular factors, as in LU decomposition. A common problem faced during this decomposition is that even though the given matrix may be very sparse, the decomposition may lead to denser triangular factors due to fill-in. Significant fill-in may lead to prohibitively large computational costs and memory requirements during decomposition as well as during the solve phase. To this end, several heuristic sparse matrix reordering methods have been proposed to reduce fill-in before the decomposition. However, finding an optimal reordering algorithm that leads to minimal fill-in during such decomposition is known to be an NP-hard problem. A reinforcement learning based approach is proposed for this problem. The sparse matrix reordering problem is formulated as a single player game. More specifically, Monte-Carlo tree search in combination with a neural network is used as a decision making algorithm to search for the best move in our game. The proposed method, alphaElimination, is found to produce significantly fewer non-zeros in the LU decomposition as compared to existing state-of-the-art heuristic algorithms with little to no increase in overall running time of the algorithm. The code for the project will be publicly available at https://github.com/misterpawan/alphaEliminationPaper.
Submitted 15 October, 2023;
originally announced October 2023.
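To make the fill-in problem described in the Alpha Elimination abstract above concrete, the following minimal sketch counts fill-in under two elimination orders. It assumes a structurally symmetric sparsity pattern represented as an undirected graph, and the helper names `fill_in` and `arrow_graph` are hypothetical. It illustrates why ordering matters; it is not the paper's reinforcement learning method.

```python
def fill_in(adjacency, order):
    """Count fill-in edges created by symbolic elimination in the given order.

    Assumes a structurally symmetric pattern: `adjacency` maps each vertex
    to the set of vertices it is coupled to (off-diagonal non-zeros).
    """
    # Work on a copy so the caller's graph is left untouched.
    graph = {v: set(nbrs) for v, nbrs in adjacency.items()}
    fill = 0
    for v in order:
        nbrs = sorted(graph.pop(v))
        for u in nbrs:
            graph[u].discard(v)
        # Eliminating v couples all of its remaining neighbours pairwise;
        # every pair not already coupled becomes a new (fill-in) edge.
        for i, a in enumerate(nbrs):
            for b in nbrs[i + 1:]:
                if b not in graph[a]:
                    graph[a].add(b)
                    graph[b].add(a)
                    fill += 1
    return fill


def arrow_graph(n):
    """Arrow-matrix pattern: vertex 0 is coupled to every other vertex."""
    graph = {v: set() for v in range(n)}
    for v in range(1, n):
        graph[0].add(v)
        graph[v].add(0)
    return graph


g = arrow_graph(5)
hub_first = fill_in(g, [0, 1, 2, 3, 4])  # eliminate the dense row/column first
hub_last = fill_in(g, [1, 2, 3, 4, 0])   # eliminate the dense row/column last
print(hub_first, hub_last)  # prints "6 0"
```

Eliminating the hub first turns its four neighbours into a clique (six new edges), whereas eliminating it last creates no fill-in at all. Reordering heuristics, and the learned policy the abstract describes, search for orders closer to the second case.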
-
Provably Fast Convergence of Independent Natural Policy Gradient for Markov Potential Games
Authors:
Youbang Sun,
Tao Liu,
Ruida Zhou,
P. R. Kumar,
Shahin Shahrampour
Abstract:
This work studies an independent natural policy gradient (NPG) algorithm for the multi-agent reinforcement learning problem in Markov potential games. It is shown that, under mild technical assumptions and the introduction of the suboptimality gap, the independent NPG method with an oracle providing exact policy evaluation asymptotically reaches an $ε$-Nash Equilibrium (NE) within $\mathcal{O}(1/ε)$ iterations. This improves upon the previous best result of $\mathcal{O}(1/ε^2)$ iterations and is of the same order, $\mathcal{O}(1/ε)$, that is achievable for the single-agent case. Empirical results for a synthetic potential game and a congestion game are presented to verify the theoretical bounds.
Submitted 27 October, 2023; v1 submitted 15 October, 2023;
originally announced October 2023.
-
TOaCNN: Adaptive Convolutional Neural Network for Multidisciplinary Topology Optimization
Authors:
Khaish Singh Chadha,
Prabhat Kumar
Abstract:
This paper presents an adaptive convolutional neural network (CNN) architecture that can automate diverse topology optimization (TO) problems having different underlying physics. The architecture uses encoder-decoder networks with dense layers in the middle, which include an additional adaptive layer to capture complex geometrical features. The network is trained using the dataset obtained from three open-source TO codes involving different physics. The robustness and success of the presented adaptive CNN are demonstrated on compliance minimization problems with constant and design-dependent loads and material bulk modulus optimization. The architecture takes the user's input of the volume fraction. It instantly generates optimized designs resembling their counterparts obtained via open-source TO codes with negligible performance and volume fraction error.
Submitted 3 October, 2023;
originally announced October 2023.