Search | arXiv e-print repository

Composite Concept Extraction through Backdooring

Authors: Banibrata Ghosh, Haripriya Harikumar, Khoa D Doan, Svetha Venkatesh, Santu Rana

Abstract: Learning composite concepts, such as "red car", from individual examples -- like a white car representing the concept of "car" and a red strawberry representing the concept of "red" -- is inherently challenging. This paper introduces a novel method called Composite Concept Extractor (CoCE), which leverages techniques from traditional backdoor attacks to learn these composite concepts in a zero-shot setting, requiring only examples of individual concepts. By repurposing the trigger-based model backdooring mechanism, we create a strategic distortion in the manifold of the target object (e.g., "car"), induced by example objects that carry the target property (e.g., "red"), such as a "red strawberry", ensuring the distortion selectively affects the target objects with the target property. Contrastive learning is then employed to further refine this distortion, and a method is formulated for detecting objects that are influenced by the distortion. Extensive experiments with in-depth analysis across different datasets demonstrate the utility and applicability of our proposed approach.

Submitted 21 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.12818 [pdf, other]

Optimal Bailouts in Diversified Financial Networks

Authors: Krishna Dasaratha, Santosh Venkatesh, Rakesh Vohra

Abstract: Widespread default involves substantial deadweight costs which could be countered by injecting capital into failing firms. Injections have positive spillovers that can trigger a repayment cascade. But which firms should a regulator bail out so as to minimize the total injection of capital while ensuring solvency of all firms? While the problem is, in general, NP-hard, for a wide range of networks that arise from a stochastic block model, we show that the optimal bailout can be implemented by a simple policy that targets firms based on their characteristics and position in the network. Specific examples of the setting include core-periphery networks.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2405.16388 [pdf, other]

Multi-Reference Preference Optimization for Large Language Models

Authors: Hung Le, Quan Tran, Dung Nguyen, Kien Do, Saloni Mittal, Kelechi Ogueji, Svetha Venkatesh

Abstract: How can Large Language Models (LLMs) be aligned with human intentions and values? A typical solution is to gather human preferences on model outputs and finetune the LLMs accordingly while ensuring that updates do not deviate too far from a reference model. Recent approaches, such as direct preference optimization (DPO), have eliminated the need for unstable and sluggish reinforcement learning optimization by introducing closed-form supervised losses. However, a significant limitation of the current approach is its design for a single reference model only, neglecting to leverage the collective power of numerous pretrained LLMs. To overcome this limitation, we introduce a novel closed-form formulation for direct preference optimization using multiple reference models. The resulting algorithm, Multi-Reference Preference Optimization (MRPO), leverages broader prior knowledge from diverse reference models, substantially enhancing preference learning capabilities compared to the single-reference DPO. Our experiments demonstrate that LLMs finetuned with MRPO generalize better on various preference data, regardless of data scarcity or abundance. Furthermore, MRPO effectively finetunes LLMs to exhibit superior performance in several downstream natural language processing tasks such as GSM8K and TruthfulQA.

Submitted 25 May, 2024; originally announced May 2024.

Comments: 20 pages

arXiv:2405.15254 [pdf, other]

Novel Kernel Models and Exact Representor Theory for Neural Networks Beyond the Over-Parameterized Regime

Authors: Alistair Shilton, Sunil Gupta, Santu Rana, Svetha Venkatesh

Abstract: This paper presents two models of neural networks and their training applicable to neural networks of arbitrary width, depth and topology, assuming only finite-energy neural activations; and a novel representor theory for neural networks in terms of a matrix-valued kernel. The first model is exact (un-approximated) and global, casting the neural network as an element in a reproducing kernel Banach space (RKBS); we use this model to provide tight bounds on Rademacher complexity. The second model is exact and local, casting the change in neural network function resulting from a bounded change in weights and biases (i.e., a training step) in reproducing kernel Hilbert space (RKHS) in terms of a local-intrinsic neural kernel (LiNK). This local model provides insight into model adaptation through tight bounds on Rademacher complexity of network adaptation. We also prove that the neural tangent kernel (NTK) is a first-order approximation of the LiNK kernel. Finally, and noting that the LiNK does not provide a representor theory for technical reasons, we present an exact novel representor theory for layer-wise neural network training with unregularized gradient descent in terms of a local-extrinsic neural kernel (LeNK). This representor theory gives insight into the role of higher-order statistics in neural network training and the effect of kernel evolution in neural-network kernel models. Throughout the paper (a) feedforward ReLU networks and (b) residual networks (ResNet) are used as illustrative examples.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.14405 [pdf, other]

Qubit-efficient Variational Quantum Algorithms for Image Segmentation

Authors: Supreeth Mysore Venkatesh, Antonio Macaluso, Marlon Nuske, Matthias Klusch, Andreas Dengel

Abstract: Quantum computing is expected to transform a range of computational tasks beyond the reach of classical algorithms. In this work, we examine the application of variational quantum algorithms (VQAs) for unsupervised image segmentation to partition images into separate semantic regions. Specifically, we formulate the task as a graph cut optimization problem and employ two established qubit-efficient VQAs, which we refer to as Parametric Gate Encoding (PGE) and Ancilla Basis Encoding (ABE), to find the optimal segmentation mask. In addition, we propose Adaptive Cost Encoding (ACE), a new approach that leverages the same circuit architecture as ABE but adopts a problem-dependent cost function. We benchmark PGE, ABE and ACE on synthetically generated images, focusing on quality and trainability. ACE shows consistently faster convergence in training the parameterized quantum circuits in comparison to PGE and ABE. Furthermore, we provide a theoretical analysis of the scalability of these approaches against the Quantum Approximate Optimization Algorithm (QAOA), showing a significant cutback in the quantum resources, especially in the number of qubits, which depends logarithmically on the number of pixels. The results validate the strengths of ACE, while concurrently highlighting its inherent limitations and challenges. This paves the way for further research in quantum-enhanced computer vision.

Submitted 23 May, 2024; originally announced May 2024.

Comments: 7 pages, 4 figures, 2 tables

arXiv:2405.04389 [pdf, ps, other]

Triangulated characterizations of singularities

Authors: Pat Lank, Sridhar Venkatesh

Abstract: This work presents a range of triangulated characterizations for important classes of singularities such as derived splinters, rational singularities, and Du Bois singularities. An invariant called 'level' in a triangulated category can be used to measure the failure of a variety to have a prescribed singularity type. We provide explicit computations of this invariant for reduced Nagata schemes of Krull dimension one and for affine cones over smooth projective hypersurfaces. Furthermore, these computations are utilized to produce upper bounds for Rouquier dimension on the respective bounded derived categories.

Submitted 10 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

Comments: Current: Removed properness assumptions, removed Section 4, improved exposition. Previous: Initial version

MSC Class: 14F08 (primary); 14B05 (secondary); 14F17; 14A30; 14E15; 18G80

arXiv:2404.19668 [pdf, other]

SQUAT: Stateful Quantization-Aware Training in Recurrent Spiking Neural Networks

Authors: Sreyes Venkatesh, Razvan Marinescu, Jason K. Eshraghian

Abstract: Weight quantization is used to deploy high-performance deep learning models on resource-limited hardware, enabling the use of low-precision integers for storage and computation. Spiking neural networks (SNNs) share the goal of enhancing efficiency, but adopt an 'event-driven' approach to reduce the power consumption of neural network inference. While extensive research has focused on weight quantization, quantization-aware training (QAT), and their application to SNNs, the precision reduction of state variables during training has been largely overlooked, potentially diminishing inference performance. This paper introduces two QAT schemes for stateful neurons: (i) a uniform quantization strategy, an established method for weight quantization, and (ii) threshold-centered quantization, which allocates exponentially more quantization levels near the firing threshold. Our results show that increasing the density of quantization levels around the firing threshold improves accuracy across several benchmark datasets. We provide an ablation analysis of the effects of weight and state quantization, both individually and combined, and how they impact models. Our comprehensive empirical evaluation includes full precision, 8-bit, 4-bit, and 2-bit quantized SNNs, using QAT, stateful QAT (SQUAT), and post-training quantization methods. The findings indicate that the combination of QAT and SQUAT enhances performance the most, but given the choice of one or the other, QAT improves performance by the larger degree. These trends are consistent across all datasets. Our methods have been made available in our Python library snnTorch: https://github.com/jeshraghian/snntorch.

Submitted 14 April, 2024; originally announced April 2024.

Comments: 10 pages, 4 figures, accepted at NICE 2024

arXiv:2404.12680 [pdf, other]

VoxAtnNet: A 3D Point Clouds Convolutional Neural Network for Generalizable Face Presentation Attack Detection

Authors: Raghavendra Ramachandra, Narayan Vetrekar, Sushma Venkatesh, Savita Nageshker, Jag Mohan Singh, R. S. Gad

Abstract: Facial biometrics are an essential component of smartphones to ensure reliable and trustworthy authentication. However, face biometric systems are vulnerable to Presentation Attacks (PAs), and the availability of more sophisticated presentation attack instruments (PAIs) such as 3D silicone face masks will allow attackers to deceive face recognition systems easily. In this work, we propose a novel Presentation Attack Detection (PAD) algorithm based on 3D point clouds captured using the frontal camera of a smartphone to detect presentation attacks. The proposed PAD algorithm, VoxAtnNet, voxelizes the 3D point clouds to preserve their spatial structure. The novel convolutional attention network is then trained on the voxelized 3D samples to detect PAs on the smartphone. Extensive experiments were carried out on the newly constructed 3D face point cloud dataset comprising bona fide samples and two different 3D PAIs (3D silicone face mask and wrap photo mask), resulting in 3480 samples. The performance of the proposed method was compared with existing methods to benchmark the detection performance using three different evaluation protocols. The experimental results demonstrate the improved performance of the proposed method in detecting both known and unknown face presentation attacks.

Submitted 19 April, 2024; originally announced April 2024.

Comments: Accepted in 2024 18th International Conference on Automatic Face and Gesture Recognition (FG)

arXiv:2404.11870 [pdf, ps, other]

Enhancing Length Extrapolation in Sequential Models with Pointer-Augmented Neural Memory

Authors: Hung Le, Dung Nguyen, Kien Do, Svetha Venkatesh, Truyen Tran

Abstract: We propose Pointer-Augmented Neural Memory (PANM) to help neural networks understand and apply symbol processing to new, longer sequences of data. PANM integrates an external neural memory that uses novel physical addresses and pointer manipulation techniques to mimic human and computer symbol processing abilities. PANM facilitates pointer assignment, dereference, and arithmetic by explicitly using physical pointers to access memory content. Remarkably, it can learn to perform these operations through end-to-end training on sequence data, powering various sequential models. Our experiments demonstrate PANM's exceptional length extrapolating capabilities and improved performance in tasks that require symbol processing, such as algorithmic reasoning and Dyck language recognition. PANM helps Transformer achieve up to 100% generalization accuracy in compositional learning tasks and significantly better results in mathematical reasoning, question answering and machine translation tasks.

Submitted 17 April, 2024; originally announced April 2024.

Comments: Preprint

arXiv:2404.04767 [pdf, ps, other]

The intersection cohomology Hodge module of toric varieties

Authors: Hyunsuk Kim, Sridhar Venkatesh

Abstract: We study the Hodge filtration of the intersection cohomology Hodge module for toric varieties. More precisely, we study the cohomology sheaves of the graded de Rham complex of the intersection cohomology Hodge module and give a precise formula relating it with the stalks of the intersection cohomology as a constructible complex. The main idea is to use the Ishida complex in order to compute the higher direct images of the sheaf of reflexive differentials.

Submitted 22 May, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

Comments: 24 pages, minor changes

MSC Class: 14B05; 14C30; 14F10; 14M25; 14Q99; 32S35; 52B22

arXiv:2402.17701 [pdf, other]

Real-time Low-latency Music Source Separation using Hybrid Spectrogram-TasNet

Authors: Satvik Venkatesh, Arthur Benilov, Philip Coleman, Frederic Roskam

Abstract: There have been significant advances in deep learning for music demixing in recent years. However, there has been little attention given to how these neural networks can be adapted for real-time low-latency applications, which could be helpful for hearing aids, remixing audio streams and live shows. In this paper, we investigate the various challenges involved in adapting current demixing models in the literature for this use case. Subsequently, inspired by the Hybrid Demucs architecture, we propose the Hybrid Spectrogram Time-domain Audio Separation Network (HS-TasNet), which utilises the advantages of spectral and waveform domains. For a latency of 23 ms, the HS-TasNet obtains an overall signal-to-distortion ratio (SDR) of 4.65 on the MusDB test set, which increases to 5.55 with additional training data. These results demonstrate the potential of efficient demixing for real-time low-latency music applications.

Submitted 27 February, 2024; originally announced February 2024.

Comments: Accepted to ICASSP 2024

ACM Class: I.5.1; I.5.4

arXiv:2402.17679 [pdf, ps, other]

The Emergence of Large Language Models in Static Analysis: A First Look through Micro-Benchmarks

Authors: Ashwin Prasad Shivarpatna Venkatesh, Samkutty Sabu, Amir M. Mir, Sofia Reis, Eric Bodden

Abstract: The application of Large Language Models (LLMs) in software engineering, particularly in static analysis tasks, represents a paradigm shift in the field. In this paper, we investigate the role that current LLMs can play in improving callgraph analysis and type inference for Python programs. Using the PyCG, HeaderGen, and TypeEvalPy micro-benchmarks, we evaluate 26 LLMs, including OpenAI's GPT series and open-source models such as LLaMA. Our study reveals that LLMs show promising results in type inference, demonstrating higher accuracy than traditional methods, yet they exhibit limitations in callgraph analysis. This contrast emphasizes the need for specialized fine-tuning of LLMs to better suit specific static analysis tasks. Our findings provide a foundation for further research towards integrating LLMs for static analysis tasks.

Submitted 27 February, 2024; originally announced February 2024.

Comments: To be published in: ICSE FORGE 2024 (AI Foundation Models and Software Engineering)

arXiv:2402.17343 [pdf, other]

Enhanced Bayesian Optimization via Preferential Modeling of Abstract Properties

Authors: Arun Kumar A V, Alistair Shilton, Sunil Gupta, Santu Rana, Stewart Greenhill, Svetha Venkatesh

Abstract: Experimental (design) optimization is a key driver in designing and discovering new products and processes. Bayesian Optimization (BO) is an effective tool for optimizing expensive and black-box experimental design processes. While Bayesian optimization is a principled data-driven approach to experimental optimization, it learns everything from scratch and could greatly benefit from the expertise of its human (domain) experts who often reason about systems at different abstraction levels using physical properties that are not necessarily directly measured (or measurable). In this paper, we propose a human-AI collaborative Bayesian framework to incorporate expert preferences about unmeasured abstract properties into the surrogate modeling to further boost the performance of BO. We provide an efficient strategy that can also handle any incorrect/misleading expert bias in preferential judgments. We discuss the convergence behavior of our proposed framework. Our experimental results involving synthetic functions and real-world datasets show the superiority of our method against the baselines.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 19 Pages, 6 Figures

arXiv:2402.13739 [pdf, ps, other]

Hidden Gems in the Rough: Computational Notebooks as an Uncharted Oasis for IDEs

Authors: Sergey Titov, Konstantin Grotov, Ashwin Prasad S. Venkatesh

Abstract: In this paper, we outline potential ways for the further development of computational notebooks in Integrated Development Environments (IDEs). We discuss notebook integration with IDEs, focusing on three main areas: facilitating experimentation, adding collaborative features, and improving code comprehension. We propose that better support of notebooks will not only benefit the notebooks, but also enhance IDEs by supporting new development processes native to notebooks. In conclusion, we suggest that adapting IDEs for more experimentation-oriented notebook processes will prepare them for the future of AI-powered programming.

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.10931 [pdf, other]

Enabling discovery of materials through enhanced generalisability of deep learning models

Authors: Sherif Abdulkader Tawfik, Tri Minh Nguyen, Salvy P. Russo, Truyen Tran, Sunil Gupta, Svetha Venkatesh

Abstract: The road towards the discovery of new, useful materials is full of imperfections. Machine learning, which is currently the workhorse of material discovery when it works in concert with density functional theory, has been able to accelerate the discovery of new materials for various applications by learning the properties of known, stable materials to infer the properties of unknown, deformed materials. Physics-informed machine learning (PIML) is particularly believed to bridge the gap between known materials and the virtually infinite space of crystal structures with imperfections such as defects, grain boundaries, composition disorders and others. State-of-the-art PIML models struggle to bridge this gap, however. In this work we show that a critical correction of the physics underpinning a PIML, our direct integration of external potential (DIEP) method, which mimics the integration of the external potential term in typical density functional theory calculations, improves the generalisability of the model to a range of imperfect structures, including diamond defects. By training DIEP to predict the potential energy surface, we demonstrate the ability of the model to predict the onset of fracture of pristine and defective carbon nanotubes.

Submitted 30 April, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

Comments: 14 pages, 4 figures

arXiv:2402.03577 [pdf, other]

Revisiting the Dataset Bias Problem from a Statistical Perspective

Authors: Kien Do, Dung Nguyen, Hung Le, Thao Le, Dang Nguyen, Haripriya Harikumar, Truyen Tran, Santu Rana, Svetha Venkatesh

Abstract: In this paper, we study the "dataset bias" problem from a statistical standpoint, and identify the main cause of the problem as the strong correlation between a class attribute u and a non-class attribute b in the input x, represented by p(u|b) differing significantly from p(u). Since p(u|b) appears as part of the sampling distributions in the standard maximum log-likelihood (MLL) objective, a model trained on a biased dataset via MLL inherently incorporates such correlation into its parameters, leading to poor generalization to unbiased test data. From this observation, we propose to mitigate dataset bias via either weighting the objective of each sample n by 1/p(u_n|b_n) or sampling that sample with a weight proportional to 1/p(u_n|b_n). While both methods are statistically equivalent, the former proves more stable and effective in practice. Additionally, we establish a connection between our debiasing approach and causal reasoning, reinforcing our method's theoretical foundation. However, when the bias label is unavailable, computing p(u|b) exactly is difficult. To overcome this challenge, we propose to approximate 1/p(u|b) using a biased classifier trained with "bias amplification" losses. Extensive experiments on various biased datasets demonstrate the superiority of our method over existing debiasing techniques in most settings, validating our theoretical analysis.
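
Illustrative note (not part of the listing or the paper): the reweighting described in this abstract can be sketched for the simple case where bias labels are available, so p(u|b) can be estimated by counting. The function names and the toy data below are hypothetical, and dividing the weighted loss by the sample count is an assumption made here for readability, not something the abstract specifies.

```python
from collections import Counter


def inverse_conditional_weights(class_labels, bias_labels):
    """Return one weight per sample equal to 1 / p(u_n | b_n).

    p(u | b) is estimated empirically as count(u, b) / count(b),
    so the weight is count(b) / count(u, b).
    """
    pair_counts = Counter(zip(class_labels, bias_labels))
    bias_counts = Counter(bias_labels)
    return [
        bias_counts[b] / pair_counts[(u, b)]
        for u, b in zip(class_labels, bias_labels)
    ]


def weighted_mean_loss(per_sample_losses, weights):
    """Average of per-sample losses after multiplying each by its weight."""
    total = sum(w * loss for w, loss in zip(weights, per_sample_losses))
    return total / len(per_sample_losses)


# Toy biased dataset: class u mostly agrees with bias attribute b.
u = [0, 0, 0, 1, 1, 1, 1, 0]
b = [0, 0, 0, 0, 1, 1, 1, 1]
weights = inverse_conditional_weights(u, b)
```

In this toy data the bias-aligned samples have p(u|b) = 0.75 and the two bias-conflicting samples have p(u|b) = 0.25, so the conflicting samples receive three times the weight of the aligned ones. When bias labels are unavailable, the abstract proposes approximating 1/p(u|b) with a biased classifier instead; that step is not sketched here.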

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2312.16882 [pdf, ps, other]

TypeEvalPy: A Micro-benchmarking Framework for Python Type Inference Tools

Authors: Ashwin Prasad Shivarpatna Venkatesh, Samkutty Sabu, Jiawei Wang, Amir M. Mir, Li Li, Eric Bodden

Abstract: In light of the growing interest in type inference research for Python, both researchers and practitioners require a standardized process to assess the performance of various type inference techniques. This paper introduces TypeEvalPy, a comprehensive micro-benchmarking framework for evaluating type inference tools. TypeEvalPy contains 154 code snippets with 845 type annotations across 18 categories that target various Python features. The framework manages the execution of containerized tools, transforms inferred types into a standardized format, and produces meaningful metrics for assessment. Through our analysis, we compare the performance of six type inference tools, highlighting their strengths and limitations. Our findings provide a foundation for further research and optimization in the domain of Python type inference.

Submitted 2 January, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

Comments: To be published in ICSE 2024

arXiv:2312.11818 [pdf, other]

Root Cause Explanation of Outliers under Noisy Mechanisms

Authors: Phuoc Nguyen, Truyen Tran, Sunil Gupta, Thin Nguyen, Svetha Venkatesh

Abstract: Identifying root causes of anomalies in causal processes is vital across disciplines. Once identified, one can isolate the root causes and implement necessary measures to restore the normal operation. Causal processes are often modelled as graphs with entities being nodes and their paths/interconnections as edges. Existing work only considers the contribution of nodes in the generative process, and thus cannot attribute the outlier score to the edges of the mechanism if the anomaly occurs in the connections. In this paper, we consider both the individual edges and nodes of each mechanism when identifying the root causes. We introduce a noisy functional causal model for this purpose. Then, we employ Bayesian learning and inference methods to infer the noises of the nodes and edges. We then represent the functional form of a target outlier leaf as a function of the node and edge noises. Finally, we propose an efficient gradient-based attribution method to compute the anomaly attribution scores, which scales linearly with the number of nodes and edges. Experiments on simulated datasets and two real-world scenario datasets show better anomaly attribution performance of the proposed method compared to the baselines. Our method scales to larger graphs with more nodes and edges.

Submitted 18 December, 2023; originally announced December 2023.

Comments: Accepted AAAI 2024

arXiv:2312.04095 [pdf, other]

Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection

Authors: Tuan Hoang, Santu Rana, Sunil Gupta, Svetha Venkatesh

Abstract: Recent data-privacy laws have sparked interest in machine unlearning, which involves removing the effect of specific training samples from a learnt model as if they were never present in the original training dataset. The challenge of machine unlearning is to discard information about the "forget" data in the learnt model without altering the knowledge about the remaining dataset and to do so more efficiently than the naive retraining approach. To achieve this, we adopt a projected-gradient based learning method, named Projected-Gradient Unlearning (PGU), in which the model takes steps in the orthogonal direction to the gradient subspaces deemed unimportant for the retaining dataset, so that its knowledge is preserved. By utilizing Stochastic Gradient Descent (SGD) to update the model weights, our method can efficiently scale to any model and dataset size. We provide empirical evidence to demonstrate that our unlearning method can produce models that behave similarly to models retrained from scratch across various metrics even when the training dataset is no longer accessible. Our code is available at https://github.com/hnanhtuan/projected_gradient_unlearning.

Submitted 7 December, 2023; originally announced December 2023.

Comments: Accepted to WACV 2024

arXiv:2311.12912 [pdf, other]

Q-Seg: Quantum Annealing-based Unsupervised Image Segmentation

Authors: Supreeth Mysore Venkatesh, Antonio Macaluso, Marlon Nuske, Matthias Klusch, Andreas Dengel

Abstract: In this study, we present Q-Seg, a novel unsupervised image segmentation method based on quantum annealing, tailored for existing quantum hardware. We formulate the pixel-wise segmentation problem, which assimilates spectral and spatial information of the image, as a graph-cut optimization task. Our method efficiently leverages the interconnected qubit topology of the D-Wave Advantage device, offering superior scalability over existing quantum approaches and outperforming state-of-the-art classical methods. Our empirical evaluations on synthetic datasets reveal that Q-Seg offers better runtime performance against the classical optimizer Gurobi. Furthermore, we evaluate our method on segmentation of Earth Observation images, an area of application where the amount of labeled data is usually very limited. In this case, Q-Seg demonstrates near-optimal results in flood mapping detection with respect to classical supervised state-of-the-art machine learning methods. Also, Q-Seg provides enhanced segmentation for forest coverage compared to existing annotated masks. Thus, Q-Seg emerges as a viable alternative for real-world applications using available quantum hardware, particularly in scenarios where the lack of labeled data and computational runtime are critical.

Submitted 30 November, 2023; v1 submitted 21 November, 2023; originally announced November 2023.

Comments: 12 pages, 9 figures, 1 table

arXiv:2311.11566 [pdf, other]

Does complimentary information from multispectral imaging improve face presentation attack detection?

Authors: Narayan Vetrekar, Raghavendra Ramachandra, Sushma Venkatesh, Jyoti D. Pawar, R. S. Gad

Abstract: Presentation Attack Detection (PAD) has been extensively studied, particularly in the visible spectrum. With the advancement of sensing technology beyond the visible range, multispectral imaging has gained significant attention in this direction. We present PAD based on multispectral images constructed for eight different presentation artifacts resulting from three different artifact species. In this work, we introduce the Face Presentation Attack Multispectral (FPAMS) database to demonstrate the significance of employing multispectral imaging. The goal of this work is to study complementary information that can be combined in two different ways (image fusion and score fusion) from multispectral imaging to improve the face PAD. The experimental evaluation results present an extensive qualitative analysis of 61650 sample multispectral images collected for bona fide and artifacts. The PAD based on the score fusion and image fusion method presents superior performance, demonstrating the significance of employing multispectral imaging to detect presentation artifacts.

Submitted 20 November, 2023; originally announced November 2023.

Comments: Accepted in International IEEE Applied Sensing Conference (IEEE APSCON) 2024

arXiv:2310.16808 [pdf, other]

Fingervein Verification using Convolutional Multi-Head Attention Network

Authors: Raghavendra Ramachandra, Sushma Venkatesh

Abstract: Biometric verification systems are deployed in various security-based access-control applications that require user-friendly and reliable person verification. Among the different biometric characteristics, fingervein biometrics have been extensively studied owing to their reliable verification performance. Furthermore, fingervein patterns reside inside the skin and are not visible outside; therefore, they possess inherent resistance to presentation attacks and degradation due to external factors. In this paper, we introduce a novel fingervein verification technique using a convolutional multihead attention network called VeinAtnNet. The proposed VeinAtnNet is designed to achieve light weight with a smaller number of learnable parameters while extracting discriminant information from both normal and enhanced fingervein images. The proposed VeinAtnNet was trained on the newly constructed fingervein dataset with 300 unique fingervein patterns that were captured in multiple sessions to obtain 92 samples per unique fingervein. Extensive experiments were performed on the newly collected dataset FV-300 and the publicly available FV-USM and FV-PolyU fingervein datasets. The performance of the proposed method was compared with five state-of-the-art fingervein verification systems, indicating the efficacy of the proposed VeinAtnNet.

Submitted 25 October, 2023; originally announced October 2023.

Comments: Accepted in IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024

arXiv:2309.13704 [pdf, other]

Sound-Print: Generalised Face Presentation Attack Detection using Deep Representation of Sound Echoes

Authors: Raghavendra Ramachandra, Jag Mohan Singh, Sushma Venkatesh

Abstract: Facial biometrics are widely deployed in smartphone-based applications because of their usability and increased verification accuracy in unconstrained scenarios. The evolving applications of smartphone-based facial recognition have also increased Presentation Attacks (PAs), where an attacker can present a Presentation Attack Instrument (PAI) to maliciously gain access to the application. Because the materials used to generate PAI are not deterministic, the detection of unknown presentation attacks is challenging. In this paper, we present an acoustic echo-based face Presentation Attack Detection (PAD) on a smartphone in which the PAs are detected based on the reflection profiles of the transmitted signal. We propose a novel transmission signal based on the wide pulse that allows us to model the background noise before transmitting the signal and increase the Signal-to-Noise Ratio (SNR). The received signal reflections were processed to remove background noise and accurately represent reflection characteristics. The reflection profiles of the bona fide and PAs are different owing to the different reflection characteristics of the human skin and artefact materials. Extensive experiments are presented using the newly collected Acoustic Sound Echo Dataset (ASED) with 4807 samples captured from bona fide and four different types of PAIs, including print (two types), display, and silicone face-mask attacks. The obtained results indicate the robustness of the proposed method for detecting unknown face presentation attacks.

Submitted 24 September, 2023; originally announced September 2023.

Comments: Accepted in IJCB 2023

arXiv:2308.14288 [pdf, other]

doi 10.1093/mnras/stad2593

GRB Optical and X-ray Plateau Properties Classifier Using Unsupervised Machine Learning

Authors: Shubham Bhardwaj, Maria G. Dainotti, Sachin Venkatesh, Aditya Narendra, Anish Kalsi, Enrico Rinaldi, Agnieszka Pollo

Abstract: The division of Gamma-ray bursts (GRBs) into different classes, other than the "short" and "long", has been an active field of research. We investigate whether GRBs can be classified based on a broader set of parameters, including prompt and plateau emission ones. Observational evidence suggests the existence of more GRB sub-classes, but results so far are either conflicting or not statistically significant. The novelty here is producing a machine-learning-based classification of GRBs using their observed X-ray and optical properties. We used two data samples: the first, composed of 203 GRBs, is from the Neil Gehrels Swift Observatory (Swift/XRT), and the second, composed of 134 GRBs, is from ground-based telescopes and Swift/UVOT. Both samples possess the plateau emission (a flat part of the light curve happening after the prompt emission, the main GRB event). We have applied the Gaussian Mixture Model (GMM) to explore multiple parameter spaces and sub-class combinations to reveal if there is a match between the current observational sub-classes and the statistical classification. With these samples and the algorithm, we spot a few micro-trends in certain cases, but we cannot conclude that any clear trend exists in classifying GRBs. These micro-trends could point towards a deeper understanding of the physical meaning of these classes (e.g., a different environment of the same progenitor or different progenitors). However, a larger sample and different algorithms could achieve such goals. Thus, this methodology can lead to deeper insights in the future.

Submitted 6 September, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

Comments: 20 pages, 10 figures (one has 4 panels, two have a single panel, six have 8 panels, one has 6 panels), 4 tables. Accepted for publication in MNRAS

Report number: RIKEN-iTHEMS-Report-23

Journal ref: MNRAS, Volume 525, Issue 4, pp.5204-5223, November 2023

arXiv:2308.13542 [pdf, other]

LaGR-SEQ: Language-Guided Reinforcement Learning with Sample-Efficient Querying

Authors: Thommen George Karimpanal, Laknath Buddhika Semage, Santu Rana, Hung Le, Truyen Tran, Sunil Gupta, Svetha Venkatesh

Abstract: Large language models (LLMs) have recently demonstrated their impressive ability to provide context-aware responses via text. This ability could potentially be used to predict plausible solutions in sequential decision making tasks pertaining to pattern completion. For example, by observing a partial stack of cubes, LLMs can predict the correct sequence in which the remaining cubes should be stacked by extrapolating the observed patterns (e.g., cube sizes, colors or other attributes) in the partial stack. In this work, we introduce LaGR (Language-Guided Reinforcement learning), which uses this predictive ability of LLMs to propose solutions to tasks that have been partially completed by a primary reinforcement learning (RL) agent, in order to subsequently guide the latter's training. However, as RL training is generally not sample-efficient, deploying this approach would inherently imply that the LLM be repeatedly queried for solutions; a process that can be expensive and infeasible. To address this issue, we introduce SEQ (sample efficient querying), where we simultaneously train a secondary RL agent to decide when the LLM should be queried for solutions. Specifically, we use the quality of the solutions emanating from the LLM as the reward to train this agent. We show that our proposed framework LaGR-SEQ enables more efficient primary RL training, while simultaneously minimizing the number of queries to the LLM. We demonstrate our approach on a series of tasks and highlight the advantages of our approach, along with its limitations and potential future research directions.

Submitted 20 August, 2023; originally announced August 2023.

Comments: 18 pages, 11 figures

arXiv:2308.04836 [pdf, other]

Beyond Surprise: Improving Exploration Through Surprise Novelty

Authors: Hung Le, Kien Do, Dung Nguyen, Svetha Venkatesh

Abstract: We present a new computing model for intrinsic rewards in reinforcement learning that addresses the limitations of existing surprise-driven explorations. The reward is the novelty of the surprise rather than the surprise norm. We estimate the surprise novelty as retrieval errors of a memory network wherein the memory stores and reconstructs surprises. Our surprise memory (SM) augments the capability of surprise-based intrinsic motivators, maintaining the agent's interest in exciting exploration while reducing unwanted attraction to unpredictable or noisy observations. Our experiments demonstrate that the SM combined with various surprise predictors exhibits efficient exploring behaviors and significantly boosts the final performance in sparse reward environments, including Noisy-TV, navigation and challenging Atari games.

Submitted 30 January, 2024; v1 submitted 9 August, 2023; originally announced August 2023.

Comments: 17 pages including Appendix

arXiv:2308.00285 [pdf, other]

Predictive Modeling through Hyper-Bayesian Optimization

Authors: Manisha Senadeera, Santu Rana, Sunil Gupta, Svetha Venkatesh

Abstract: Model selection is an integral problem of model based optimization techniques such as Bayesian optimization (BO). Current approaches often treat model selection as an estimation problem, to be periodically updated with observations coming from the optimization iterations. In this paper, we propose an alternative way to achieve both efficiently. Specifically, we propose a novel way of integrating model selection and BO for the single goal of reaching the function optima faster. The algorithm moves back and forth between BO in the model space and BO in the function space, where the goodness of the recommended model is captured by a score function and fed back, capturing how well the model helped convergence in the function space. The score function is derived in such a way that it neutralizes the effect of the moving nature of the BO in the function space, thus keeping the model selection problem stationary. This back and forth leads to quick convergence for both model selection and BO in the function space. In addition to improved sample efficiency, the framework outputs information about the black-box function. Convergence is proved, and experimental results show significant improvement compared to standard BO.

Submitted 1 August, 2023; originally announced August 2023.

arXiv:2307.12729 [pdf, ps, other]

Persistent-Transient Duality: A Multi-mechanism Approach for Modeling Human-Object Interaction

Authors: Hung Tran, Vuong Le, Svetha Venkatesh, Truyen Tran

Abstract: Humans are highly adaptable, swiftly switching between different modes to progressively handle different tasks, situations and contexts. In Human-object interaction (HOI) activities, these modes can be attributed to two mechanisms: (1) the large-scale consistent plan for the whole activity and (2) the small-scale child interactive actions that start and end along the timeline. While neuroscience and cognitive science have confirmed this multi-mechanism nature of human behavior, machine modeling approaches for human motion are trailing behind. While existing approaches have attempted to use gradually morphing structures (e.g., graph attention networks) to model the dynamic HOI patterns, they miss the expeditious and discrete mode-switching nature of human motion. To bridge that gap, this work proposes to model two concurrent mechanisms that jointly control human motion: the Persistent process that runs continually on the global scale, and the Transient sub-processes that operate intermittently on the local context of the human while interacting with objects. These two mechanisms form an interactive Persistent-Transient Duality that synergistically governs the activity sequences. We model this conceptual duality by a parent-child neural network of Persistent and Transient channels with a dedicated neural module for dynamic mechanism switching. The framework is trialed on HOI motion forecasting. On two rich datasets and a wide variety of settings, the model consistently delivers superior performance, proving its suitability for the challenge.

Submitted 24 July, 2023; originally announced July 2023.

Comments: Accepted at ICCV 2023

arXiv:2306.16574 [pdf, other]

On Lengths of $\mathbb{F}_2[x,y,z]/(x^{d_1}, y^{d_2},z^{d_3}, x+y+z)$

Authors: Fiona Han, Jennifer Kenkel, Daniel Li, Sridhar Venkatesh, Ashley Wiles

Abstract: In this paper, we provide a formula for the vector space dimension of the ring $\mathbb{F}_2[x,y,z]/(x^{d_1}, y^{d_2},z^{d_3}, x+y+z)$ over $\mathbb{F}_2$ when $d_1,d_2,d_3$ all lie between successive powers of $2$. For general $d_1,d_2,d_3$, we provide a simple algorithm to calculate the vector space dimension of $\mathbb{F}_2[x,y,z]/(x^{d_1}, y^{d_2},z^{d_3}, x+y+z)$ by combining our formula with certain results of Chungsim Han (1992).

Submitted 11 March, 2024; v1 submitted 28 June, 2023; originally announced June 2023.

arXiv:2306.10179 [pdf, ps, other]

doi 10.1007/s00229-024-01553-3

Local vanishing for toric varieties

Authors: Wanchun Shen, Sridhar Venkatesh, Anh Duc Vo

Abstract: Let $X$ be a toric variety. We establish vanishing (and non-vanishing) results for the sheaves $R^if_*\Omega^p_{\tilde X}(\log E)$, where $f: \tilde{X} \to X$ is a strong log resolution of singularities with reduced exceptional divisor $E$. These extend the local vanishing theorem for toric varieties in [MOP20]. Our consideration of these sheaves is motivated by the notion of $k$-rational singularities introduced by Friedman and Laza [FL22b]. In particular, our results lead to criteria for toric varieties to have $k$-rational singularities, as defined in [SVV23].

Submitted 28 April, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

Comments: 17 pages; v2: Minor changes, following the referee's comments; to appear in manuscripta mathematica

MSC Class: 14M25; 14F17; 14B05

arXiv:2306.03977 [pdf, ps, other]

On $k$-Du Bois and $k$-rational singularities

Authors: Wanchun Shen, Sridhar Venkatesh, Anh Duc Vo

Abstract: We introduce new notions of $k$-Du Bois and $k$-rational singularities, extending the previous definitions in the case of local complete intersections (lci), to include natural examples outside of this setting. We study the stability of these notions under general hyperplane sections and show that varieties with $k$-rational singularities are $k$-Du Bois, extending previous results in [MP22b] and [FL22b] in the lci and the isolated singularities cases. In the process, we identify the aspects of the theory that depend only on the vanishing of higher cohomologies of Du Bois complexes (or related objects), and not on the behaviour of the Kähler differentials.

Submitted 13 November, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

Comments: improved presentation of proof of Thm B, extended Prop 4.2 and Cor 4.3 to pre-k-rational singularities

arXiv:2305.01294 [pdf, other]

Differential Newborn Face Morphing Attack Detection using Wavelet Scatter Network

Authors: Raghavendra Ramachandra, Sushma Venkatesh, Guoqiang Li, Kiran Raja

Abstract: Face Recognition Systems (FRS) are shown to be vulnerable to morphed images of newborns. Detecting morphing attacks stemming from face images of newborns is important to avoid unwanted consequences, both for security and society. In this paper, we present a new reference-based/Differential Morphing Attack Detection (MAD) method to detect newborn morphing images using Wavelet Scattering Network (WSN). We propose a two-layer WSN with 250 $\times$ 250 pixels and six rotations of wavelets per layer, resulting in 577 paths. The proposed approach is validated on a dataset of 852 bona fide images and 2460 morphing images constructed using face images of 42 unique newborns. The obtained results indicate a gain of over 10% in detection accuracy over other existing D-MAD techniques.

Submitted 2 May, 2023; originally announced May 2023.

Comments: accepted in 5th International Conference on Bio-engineering for Smart Technologies (BIO-SMART 2023)

arXiv:2304.12144 [pdf]

Inductive sensing of magnetic microrobots under actuation by rotating magnetic fields

Authors: Michael G. Christiansen, Lucien Stöcklin, Cameron Forbrigger, Shashaank Abhinav Venkatesh, Simone Schuerle

Abstract: The engineering space for magnetically manipulated biomedical microrobots is rapidly expanding. This includes synthetic, bioinspired, and biohybrid designs, some of which may eventually assume clinical roles aiding drug delivery or performing other therapeutic functions. Actuating these microrobots with rotating magnetic fields (RMFs) and the magnetic torques they exert offers the advantages of efficient mechanical energy transfer and scalable instrumentation. Nevertheless, closed-loop control still requires a complementary noninvasive imaging modality to reveal position and trajectory, such as ultrasound or x-rays, increasing complexity and posing a barrier to use. Here, we investigate the possibility of combining actuation and sensing via inductive detection of model microrobots under field magnitudes ranging from 0.5 mT to 10s of mT rotating at 1 Hz to 100 Hz. A prototype apparatus accomplishes this using adjustment mechanisms for both phase and amplitude to finely balance sense and compensation coils, suppressing the background signal of the driving RMF by 90 dB. Rather than relying on frequency decomposition to analyze signals, we show that, for rotational actuation, phase decomposition is more appropriate. We demonstrate inductive detection of a micromagnet placed in distinct viscous environments using RMFs with fixed and time-varying frequencies. Finally, we show how magnetostatic gating fields can spatially isolate inductive signals from a micromagnet actuated by an RMF, with the resolution set by the relative magnitude of the gating field and the RMF. The concepts developed here lay a foundation for future closed-loop control schemes for magnetic microrobots based on simultaneous inductive sensing and actuation.

Submitted 24 April, 2023; originally announced April 2023.

Comments: 33 pages, 4 main figures, 6 supplementary figures

arXiv:2304.07218 [pdf, other]

doi 10.1145/3587135.3592192

QuACS: Variational Quantum Algorithm for Coalition Structure Generation in Induced Subgraph Games

Authors: Supreeth Mysore Venkatesh, Antonio Macaluso, Matthias Klusch

Abstract: Coalition Structure Generation (CSG) is an NP-Hard problem in which agents are partitioned into mutually exclusive groups to maximize their social welfare. In this work, we propose QuACS, a novel hybrid quantum-classical algorithm for Coalition Structure Generation in Induced Subgraph Games (ISGs). Starting from a coalition structure where all the agents belong to a single coalition, QuACS recursively identifies the optimal partition into two disjoint subsets. This problem is reformulated as a QUBO and then solved using QAOA. Given an $n$-agent ISG, we show that the proposed algorithm outperforms existing approximate classical solvers with a runtime of $\mathcal{O}(n^2)$ and an expected approximation ratio of $92\%$. Furthermore, it requires a significantly lower number of qubits and allows experiments on medium-sized problems compared to existing quantum solutions. To show the effectiveness of QuACS, we perform experiments on standard benchmark datasets using quantum simulation.

Submitted 14 April, 2023; originally announced April 2023.

Comments: 7 pages, 2 figures, 1 table

arXiv:2304.03510 [pdf, other]

Multispectral Imaging for Differential Face Morphing Attack Detection: A Preliminary Study

Authors: Raghavendra Ramachandra, Sushma Venkatesh, Naser Damer, Narayan Vetrekar, Rajendra Gad

Abstract: Face morphing attack detection is emerging as an increasingly challenging problem owing to advancements in high-quality and realistic morphing attack generation. Reliable detection of morphing attacks is essential because these attacks are targeted for border control applications. This paper presents a multispectral framework for differential morphing-attack detection (D-MAD). The D-MAD methods are based on using two facial images that are captured from the ePassport (also called the reference image) and the trusted device (for example, Automatic Border Control (ABC) gates) to detect whether the face image presented in the ePassport is morphed. The proposed multispectral D-MAD framework introduces a multispectral image captured as a trusted capture to acquire seven different spectral bands to detect morphing attacks. Extensive experiments were conducted on the newly created Multispectral Morphed Datasets (MSMD) with 143 unique data subjects that were captured using both visible and multispectral cameras in multiple sessions. The results indicate the superior performance of the proposed multispectral framework compared to visible images.

Submitted 25 October, 2023; v1 submitted 7 April, 2023; originally announced April 2023.

Comments: Accepted in IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024

arXiv:2303.14004 [pdf, other]

Vulnerability of Face Morphing Attacks: A Case Study on Lookalike and Identical Twins

Authors: Raghavendra Ramachandra, Sushma Venkatesh, Gaurav Jaswal, Guoqiang Li

Abstract: Face morphing attacks have emerged as a potential threat, particularly in automatic border control scenarios. Morphing attacks permit more than one individual to use travel documents that can be used to cross borders using automatic border control gates. The potential for morphing attacks depends on the selection of data subjects (accomplice and malicious actors). This work investigates lookalike and identical twins as the source of face morphing generation. We present a systematic study on benchmarking the vulnerability of Face Recognition Systems (FRS) to lookalike and identical twin morphing images. Therefore, we constructed new face morphing datasets using 16 pairs of identical twin and lookalike data subjects. Morphing images from lookalike and identical twins are generated using a landmark-based method. Extensive experiments are carried out to benchmark the attack potential of lookalike and identical twins. Furthermore, experiments are designed to provide insights into the impact of vulnerability with normal face morphing compared with lookalike and identical twin face morphing.

Submitted 24 March, 2023; originally announced March 2023.

Comments: Accepted in IWBF 2023

arXiv:2303.01684 [pdf, other]

BO-Muse: A human expert and AI teaming framework for accelerated experimental design

Authors: Sunil Gupta, Alistair Shilton, Arun Kumar A V, Shannon Ryan, Majid Abdolshah, Hung Le, Santu Rana, Julian Berk, Mahad Rashid, Svetha Venkatesh

Abstract: In this paper we introduce BO-Muse, a new approach to human-AI teaming for the optimization of expensive black-box functions. Inspired by the intrinsic difficulty of extracting expert knowledge and distilling it back into AI models and by observations of human behavior in real-world experimental design, our algorithm lets the human expert take the lead in the experimental process. The human expert can use their domain expertise to its full potential, while the AI plays the role of a muse, injecting novelty and searching for areas of weakness to break the human out of over-exploitation induced by cognitive entrenchment. With mild assumptions, we show that our algorithm converges sub-linearly, at a rate faster than the AI or human alone. We validate our algorithm using synthetic data and with human experts performing real-world experiments.

Submitted 30 March, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

Comments: 34 Pages, 7 Figures and 5 Tables

arXiv:2302.04013 [pdf, other]

Zero-shot Sim2Real Adaptation Across Environments

Authors: Buddhika Laknath Semage, Thommen George Karimpanal, Santu Rana, Svetha Venkatesh

Abstract: Simulation based learning often provides a cost-efficient recourse to reinforcement learning applications in robotics. However, simulators are generally incapable of accurately replicating real-world dynamics, and thus bridging the sim2real gap is an important problem in simulation based learning. Current solutions to bridge the sim2real gap involve hybrid simulators that are augmented with neural residual models. Unfortunately, they require a separate residual model for each individual environment configuration (i.e., a fixed setting of environment variables such as mass, friction etc.), and thus are not transferable to new environments quickly. To address this issue, we propose a Reverse Action Transformation (RAT) policy which learns to imitate simulated policies in the real-world. Once learnt from a single environment, RAT can then be deployed on top of a Universal Policy Network to achieve zero-shot adaptation to new environments. We empirically evaluate our approach in a set of continuous control tasks and observe its advantage as a few-shot and zero-shot learner over competing baselines.

Submitted 8 February, 2023; originally announced February 2023.

arXiv:2302.00205 [pdf, other]

Gradient Descent in Neural Networks as Sequential Learning in RKBS

Authors: Alistair Shilton, Sunil Gupta, Santu Rana, Svetha Venkatesh

Abstract: The study of Neural Tangent Kernels (NTKs) has provided much needed insight into convergence and generalization properties of neural networks in the over-parametrized (wide) limit by approximating the network using a first-order Taylor expansion with respect to its weights in the neighborhood of their initialization values. This allows neural network training to be analyzed from the perspective of reproducing kernel Hilbert spaces (RKHS), which is informative in the over-parametrized regime, but a poor approximation for narrower networks as the weights change more during training. Our goal is to extend beyond the limits of NTK toward a more general theory. We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights as an inner product of two feature maps, respectively from data and weight-step space, to feature space, allowing neural network training to be analyzed from the perspective of reproducing kernel {\em Banach} space (RKBS). We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning in RKBS. Using this, we present a novel bound on uniform convergence where the iteration count and learning rate play a central role, giving new theoretical insight into neural network training.

Submitted 31 January, 2023; originally announced February 2023.

arXiv:2301.06926 [pdf, ps, other]

Memory-Augmented Theory of Mind Network

Authors: Dung Nguyen, Phuoc Nguyen, Hung Le, Kien Do, Svetha Venkatesh, Truyen Tran

Abstract: Social reasoning necessitates the capacity of theory of mind (ToM), the ability to contextualise and attribute mental states to others without having access to their internal cognitive structure. Recent machine learning approaches to ToM have demonstrated that we can train the observer to read the past and present behaviours of other agents and infer their beliefs (including false beliefs about things that no longer exist), goals, intentions and future actions. The challenges arise when the behavioural space is complex, demanding skilful space navigation for rapidly changing contexts for an extended period. We tackle the challenges by equipping the observer with novel neural memory mechanisms to encode, and hierarchical attention to selectively retrieve information about others. The memories allow rapid, selective querying of distal related past behaviours of others to deliberatively reason about their current mental state, beliefs and future behaviours. This results in ToMMY, a theory of mind model that learns to reason while making few assumptions about the underlying mental processes. We also construct a new suite of experiments to demonstrate that memories facilitate the learning process and achieve better theory of mind performance, especially for high-demand false-belief tasks that require inferring through multiple steps of changes.

Submitted 17 January, 2023; originally announced January 2023.

Comments: Accepted for publication at AAAI 2023

arXiv:2301.04949 [pdf, ps, other]

A Formal Power Series Approach to Multiplicative Dynamic Feedback Interconnection

Authors: Kurusch Ebrahimi-Fard, G. S. Venkatesh

Abstract: The goal of the paper is multi-fold. First, an explicit formula is derived to compute the non-commutative generating series of a closed-loop system when a (multi-input, multi-output) plant, given in Chen--Fliess series description is in multiplicative output feedback interconnection with another system, also given as Chen--Fliess series. Furthermore, it is shown that the multiplicative dynamic output feedback connection has a natural interpretation as a transformation group acting on the plant. A computational framework for computing the generating series for multiplicative dynamic output feedback is devised utilizing the Hopf algebras of the coordinate functions corresponding to the shuffle group and the multiplicative feedback group. The pre--Lie algebra in multiplicative feedback is shown to be an example of Foissy's com-pre-Lie algebras indexed by matrices with certain structure.

Submitted 23 October, 2023; v1 submitted 12 January, 2023; originally announced January 2023.

arXiv:2301.04419 [pdf, other]

Static Analysis Driven Enhancements for Comprehension in Machine Learning Notebooks

Authors: Ashwin Prasad Shivarpatna Venkatesh, Samkutty Sabu, Mouli Chekkapalli, Jiawei Wang, Li Li, Eric Bodden

Abstract: Jupyter notebooks enable developers to interleave code snippets with rich-text and in-line visualizations. Data scientists use Jupyter notebook as the de-facto standard for creating and sharing machine-learning based solutions, primarily written in Python. Recent studies have demonstrated, however, that a large portion of Jupyter notebooks available on public platforms are undocumented and lack a narrative structure. This reduces the readability of these notebooks. To address this shortcoming, this paper presents HeaderGen, a novel tool-based approach that automatically annotates code cells with categorical markdown headers based on a taxonomy of ML operations, and classifies and displays function calls according to this taxonomy. For this functionality to be realized, HeaderGen enhances an existing call graph analysis in PyCG. To improve precision, HeaderGen extends PyCG's analysis with support for handling external library code and flow-sensitivity. The former is realized by facilitating the resolution of function return-types. The evaluation on 15 real-world Jupyter notebooks from Kaggle shows that HeaderGen's underlying call graph analysis yields high accuracy (95.6% precision and 95.3% recall). This is because HeaderGen can resolve return-types of external libraries where existing type inference tools such as pytype (by Google), pyright (by Microsoft), and Jedi fall short. The header generation has a precision of 85.7% and a recall rate of 92.8%. In a user study, HeaderGen helps participants finish comprehension and navigation tasks faster.
To further evaluate the type inference capability of tools, we introduce TypeEvalPy, a framework for evaluating type inference tools with a micro-benchmark containing 154 code snippets and 845 type annotations. Our comparative analysis on four tools revealed that HeaderGen outperforms other tools in exact matches with the ground truth.

Submitted 11 June, 2024; v1 submitted 11 January, 2023; originally announced January 2023.

Comments: To be published in: EMSE Journal

arXiv:2212.14863 [pdf, other]

Rabinowitz Fukaya categories and the categorical formal punctured neighborhood of infinity

Authors: Sheel Ganatra, Yuan Gao, Sara Venkatesh

Abstract: This paper constructs and studies the Rabinowitz (wrapped) Fukaya category, a categorical invariant of exact cylindrical Lagrangians in a Liouville manifold whose cohomological morphisms, ``Rabinowitz wrapped Floer homology groups" measure the failure of wrapped Floer cohomology to satisfy Poincare duality (and in particular vanish for any pair with at least one compact Lagrangian). Our main result, answering a conjecture of Abouzaid, relates the Rabinowitz and usual wrapped Fukaya category by way of a general construction introduced by Efimov, the categorical formal punctured neighborhood of infinity. As an application, we show how Rabinowitz Fukaya categories can be fit into - and in particular often computed in terms of - mirror symmetry.

Submitted 30 December, 2022; originally announced December 2022.

Comments: 82 pages, 5 figures, comments welcome

MSC Class: 53D37; 18G70; 14J33; 16E40

arXiv:2212.11372 [pdf, other]

doi 10.1007/978-3-031-36030-5_11

GCS-Q: Quantum Graph Coalition Structure Generation

Authors: Supreeth Mysore Venkatesh, Antonio Macaluso, Matthias Klusch

Abstract: The problem of generating an optimal coalition structure for a given coalition game of rational agents is to find a partition that maximizes their social welfare and is known to be NP-hard. This paper proposes GCS-Q, a novel quantum-supported solution for Induced Subgraph Games (ISGs) in coalition structure generation. GCS-Q starts by considering the grand coalition as initial coalition structure and proceeds by iteratively splitting the coalitions into two nonempty subsets to obtain a coalition structure with a higher coalition value. In particular, given an $n$-agent ISG, the GCS-Q solves the optimal split problem $\mathcal{O} (n)$ times using a quantum annealing device, exploring $\mathcal{O}(2^n)$ partitions at each step. We show that GCS-Q outperforms the currently best classical solvers with its runtime in the order of $n^2$ and an expected worst-case approximation ratio of $93\%$ on standard benchmark datasets.

Submitted 21 December, 2022; originally announced December 2022.

Comments: 6 pages, 3 figures
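
The top-down splitting loop described in the GCS-Q abstract can be illustrated with a small classical sketch. This is not the authors' implementation: the optimal split, which the paper delegates to a quantum annealer, is replaced here by brute-force enumeration, and the names `coalition_value`, `best_split`, `greedy_coalitions` and the example weights are all hypothetical. It assumes an Induced Subgraph Game in which a coalition's value is the sum of the edge weights between its members.

```python
from itertools import combinations


def coalition_value(members, weights):
    """Value of a coalition in an ISG: sum of edge weights among its members."""
    return sum(
        weights.get(frozenset(pair), 0.0)
        for pair in combinations(sorted(members), 2)
    )


def best_split(members, weights):
    """Brute-force stand-in for the annealer: best split into two nonempty parts.

    Returns (value, part_a, part_b); part_b is empty when no split
    strictly improves on keeping the coalition whole.
    """
    members = sorted(members)
    anchor, rest = members[0], members[1:]
    best = (coalition_value(members, weights), set(members), set())
    # Fixing the anchor in part_a enumerates each bipartition exactly once;
    # r < len(rest) keeps part_b nonempty.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            part_a = {anchor, *extra}
            part_b = set(members) - part_a
            value = coalition_value(part_a, weights) + coalition_value(part_b, weights)
            if value > best[0]:
                best = (value, part_a, part_b)
    return best


def greedy_coalitions(agents, weights):
    """Start from the grand coalition and keep splitting while the value improves."""
    pending = [set(agents)]
    final = []
    while pending:
        coalition = pending.pop()
        _, part_a, part_b = best_split(coalition, weights)
        if part_b:
            pending.extend([part_a, part_b])
        else:
            final.append(coalition)
    return final


# Hypothetical 4-agent game: positive weights reward cooperation, negative ones penalise it.
example_weights = {
    frozenset({1, 2}): 5.0,
    frozenset({3, 4}): 4.0,
    frozenset({1, 3}): -3.0,
    frozenset({2, 4}): -2.0,
    frozenset({2, 3}): -1.0,
}
structure = greedy_coalitions([1, 2, 3, 4], example_weights)
total_value = sum(coalition_value(c, example_weights) for c in structure)
```

The brute-force split costs exponential time in the coalition size, which is the step the abstract says is handed to the quantum annealing device.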

arXiv:2211.16036 [pdf, other]

doi 10.3847/2515-5172/aca6ec

PERISTOLE: PackagE that geneRates tIme delay plotS caused by graviTatiOnaL lEnsing

Authors: T. S. Sachin Venkatesh, Gaurav Pundir

Abstract: We present PERISTOLE to study the various time delays associated with the pulsar rotation and other general relativistic aspects of binary pulsars. It is made available as an open-source python package which takes some parameters of the double pulsar system as input and outputs the rotational and latitudinal lensing delays along with the geometric and Shapiro delays that arise due to gravitational lensing. This package was intended to provide a way to quickly analyse, evaluate and study the differences between variations of the same systems and also to quantify the consequences that different parameters have over the system. Through this research note, we briefly describe the motivation behind PERISTOLE and showcase its capabilities using the only double pulsar system ever found, J0737-3039.

Submitted 29 November, 2022; originally announced November 2022.

Comments: 8 subfigures placed as sets of 2 to showcase the graph functions, accepted to RNAAS

Journal ref: Research Notes of the AAS, 6(12), 255

arXiv:2211.13208 [pdf, other]

On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation

Authors: Thanh Nguyen-Tang, Ming Yin, Sunil Gupta, Svetha Venkatesh, Raman Arora

Abstract: Sample-efficient offline reinforcement learning (RL) with linear function approximation has recently been studied extensively. Much of prior work has yielded the minimax-optimal bound of $\tilde{\mathcal{O}}(\frac{1}{\sqrt{K}})$, with $K$ being the number of episodes in the offline data. In this work, we seek to understand instance-dependent bounds for offline RL with function approximation. We present an algorithm called Bootstrapped and Constrained Pessimistic Value Iteration (BCP-VI), which leverages data bootstrapping and constrained optimization on top of pessimism. We show that under a partial data coverage assumption, that of \emph{concentrability} with respect to an optimal policy, the proposed algorithm yields a fast rate of $\tilde{\mathcal{O}}(\frac{1}{K})$ for offline RL when there is a positive gap in the optimal Q-value functions, even when the offline data were adaptively collected. Moreover, when the linear features of the optimal actions in the states reachable by an optimal policy span those reachable by the behavior policy and the optimal actions are unique, offline RL achieves absolute zero sub-optimality error when $K$ exceeds a (finite) instance-dependent threshold. To the best of our knowledge, these are the first $\tilde{\mathcal{O}}(\frac{1}{K})$ bound and absolute zero sub-optimality bound respectively for offline RL with linear function approximation from adaptive data with partial coverage. We also provide instance-agnostic and instance-dependent information-theoretical lower bounds to complement our upper bounds.

Submitted 27 January, 2023; v1 submitted 23 November, 2022; originally announced November 2022.

Comments: AAAI'23

arXiv:2211.07615 [pdf, other]

UGIF: UI Grounded Instruction Following

Authors: Sagar Gubbi Venkatesh, Partha Talukdar, Srini Narayanan

Abstract: Smartphone users often find it difficult to navigate myriad menus to perform common tasks such as "How to block calls from unknown numbers?". Currently, help documents with step-by-step instructions are manually written to aid the user. The user experience can be further enhanced by grounding the instructions in the help document to the UI and overlaying a tutorial on the phone UI. To build such tutorials, several natural language processing components including retrieval, parsing, and grounding are necessary, but there isn't any relevant dataset for such a task. Thus, we introduce UGIF-DataSet, a multi-lingual, multi-modal UI grounded dataset for step-by-step task completion on the smartphone containing 4,184 tasks across 8 languages. As an initial approach to this problem, we propose retrieving the relevant instruction steps based on the user's query and parsing the steps using Large Language Models (LLMs) to generate macros that can be executed on-device. The instruction steps are often available only in English, so the challenge includes cross-modal, cross-lingual retrieval of English how-to pages from user queries in many languages and mapping English instruction steps to UI in a potentially different language. We compare the performance of different LLMs including PaLM and GPT-3 and find that the end-to-end task completion rate is 48% for English UI but the performance drops to 32% for other languages. We analyze the common failure modes of existing models on this task and point out areas for improvement.

Submitted 23 May, 2023; v1 submitted 14 November, 2022; originally announced November 2022.

arXiv:2209.10359 [pdf, other]

Momentum Adversarial Distillation: Handling Large Distribution Shifts in Data-Free Knowledge Distillation

Authors: Kien Do, Hung Le, Dung Nguyen, Dang Nguyen, Haripriya Harikumar, Truyen Tran, Santu Rana, Svetha Venkatesh

Abstract: Data-free Knowledge Distillation (DFKD) has attracted attention recently thanks to its appealing capability of transferring knowledge from a teacher network to a student network without using training data. The main idea is to use a generator to synthesize data for training the student. As the generator gets updated, the distribution of synthetic data will change. Such distribution shift could be large if the generator and the student are trained adversarially, causing the student to forget the knowledge it acquired at previous steps. To alleviate this problem, we propose a simple yet effective method called Momentum Adversarial Distillation (MAD) which maintains an exponential moving average (EMA) copy of the generator and uses synthetic samples from both the generator and the EMA generator to train the student. Since the EMA generator can be considered as an ensemble of the generator's old versions and often undergoes a smaller change in updates compared to the generator, training on its synthetic samples can help the student recall the past knowledge and prevent the student from adapting too quickly to new updates of the generator. Our experiments on six benchmark datasets including big datasets like ImageNet and Places365 demonstrate the superior performance of MAD over competing methods for handling the large distribution shift problem. Our method also compares favorably to existing DFKD methods and even achieves state-of-the-art results in some cases.

Submitted 21 September, 2022; originally announced September 2022.

Comments: Accepted to NeurIPS 2022
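
The exponential moving average (EMA) copy mentioned in the MAD abstract follows the standard EMA rule, which can be sketched as below. This is only an illustration of the generic update, not the paper's training code: the generator is reduced to a plain dictionary of scalar parameters, and `ema_update`, `decay` and the toy values are assumptions made for the example.

```python
import copy


def ema_update(ema_params, params, decay=0.99):
    """Move each EMA parameter a small step toward the current parameter.

    Standard rule: ema <- decay * ema + (1 - decay) * current.
    """
    for name, value in params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params


# Toy "generator" with scalar parameters (hypothetical stand-in for network weights).
generator = {"w": 1.0, "b": 0.0}
ema_generator = copy.deepcopy(generator)

for step in range(3):
    generator["w"] += 1.0  # pretend an optimizer step changed the generator
    ema_update(ema_generator, generator, decay=0.9)
```

After the three steps the EMA copy trails the live generator, which matches the abstract's remark that the EMA generator changes less per update than the generator itself.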

arXiv:2207.12106 [pdf, other]

Black-box Few-shot Knowledge Distillation

Authors: Dang Nguyen, Sunil Gupta, Kien Do, Svetha Venkatesh

Abstract: Knowledge distillation (KD) is an efficient approach to transfer the knowledge from a large "teacher" network to a smaller "student" network. Traditional KD methods require lots of labeled training samples and a white-box teacher (parameters are accessible) to train a good student. However, these resources are not always available in real-world applications. The distillation process often happens at an external party side where we do not have access to much data, and the teacher does not disclose its parameters due to security and privacy concerns. To overcome these challenges, we propose a black-box few-shot KD method to train the student with few unlabeled training samples and a black-box teacher. Our main idea is to expand the training set by generating a diverse set of out-of-distribution synthetic images using MixUp and a conditional variational auto-encoder. These synthetic images along with their labels obtained from the teacher are used to train the student. We conduct extensive experiments to show that our method significantly outperforms recent SOTA few/zero-shot KD methods on image classification tasks. The code and models are available at: https://github.com/nphdang/FS-BBT

Submitted 25 July, 2022; originally announced July 2022.

Comments: To appear at ECCV 2022

arXiv:2207.08300 [pdf, ps, other]

A Formal Power Series Approach to Multiplicative Dynamic and Static Output Feedback

Authors: G. S. Venkatesh

Abstract: The goal of the paper is two-fold. The first is to derive an explicit formula to compute the generating series of a closed-loop system when a plant, given in a Chen-Fliess series description, is in multiplicative output feedback connection with another system given in Chen-Fliess series description. Further, the objective extends to showing that the multiplicative dynamic output feedback connection has a natural interpretation as a transformation group acting on the plant. The second part of the goal is the same as the first, except that the Chen-Fliess series in the feedback is replaced by a memoryless map. The paper provides an explicit formula to compute the generating series of a closed-loop system in multiplicative static output feedback connection and shows that the static feedback has a natural interpretation as a transformation group acting on the plant.

Submitted 17 July, 2022; originally announced July 2022.

Comments: accepted in $25^{th}$ Int. Symposium on Mathematical Theory of Networks and Systems, Bayreuth, Germany, 2022

Showing 1–50 of 253 results for author: Venkatesh, S