-
Preserving Data Privacy for ML-driven Applications in Open Radio Access Networks
Authors:
Pranshav Gajjar,
Azuka Chiejina,
Vijay K. Shah
Abstract:
Deep learning offers a promising solution to improve spectrum access techniques by utilizing data-driven approaches to manage and share limited spectrum resources for emerging applications. For several of these applications, the sensitive wireless data (such as spectrograms) are stored in a shared database or multistakeholder cloud environment and are therefore prone to privacy leaks. This paper aims to address such privacy concerns by examining the representative case study of shared database scenarios in 5G Open Radio Access Network (O-RAN) networks, where a shared database resides within the near-real-time (near-RT) RAN intelligent controller. We focus on securing the data that can be used by machine learning (ML) models for spectrum sharing and interference mitigation applications without compromising model and network performance. The underlying idea is to leverage (i) a shuffling-based learnable encryption technique to encrypt the data and then (ii) a custom Vision Transformer (ViT), trained as the ML model, that is capable of performing accurate inference on such encrypted data. The paper offers a thorough analysis and comparisons with analogous convolutional neural networks (CNNs) as well as deeper architectures (such as ResNet-50) as baselines. Our experiments show that, when operating on encrypted data, the proposed approach significantly outperforms the baseline CNN, with improvements of 24.5% in accuracy and 23.9% in F1-score. Although the deeper ResNet-50 architecture is slightly more accurate, by 4.4%, the proposed approach reduces the number of parameters by 99.32% and thus improves prediction time by nearly 60%.
Submitted 15 February, 2024;
originally announced February 2024.
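One common form of shuffling-based learnable encryption permutes the pixels inside each fixed-size block with a single key-derived permutation. The Python sketch below assumes single-channel images stored as 2D lists whose sides are divisible by `block_size`; `block_shuffle` and its parameters are illustrative names, not the authors' code, and Python's `random` module is used only for readability (it is not a cryptographically secure key schedule).

```python
import random


def block_shuffle(image, block_size, key, inverse=False):
    """Permute pixels inside each block_size x block_size block of a 2D image.

    The same key-derived permutation is reused for every block (hypothetical
    helper; a sketch, not the paper's implementation).
    """
    height, width = len(image), len(image[0])
    if height % block_size or width % block_size:
        raise ValueError("image sides must be divisible by block_size")
    n = block_size * block_size
    perm = list(range(n))
    random.Random(key).shuffle(perm)  # secret permutation derived from the key
    if inverse:
        # Invert the permutation so that decryption undoes encryption.
        inv = [0] * n
        for dst, src in enumerate(perm):
            inv[src] = dst
        perm = inv
    out = [row[:] for row in image]
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            # Flatten the current block in row-major order.
            flat = [image[top + r][left + c]
                    for r in range(block_size) for c in range(block_size)]
            # Write pixel perm[dst] of the block into position dst.
            for dst, src in enumerate(perm):
                r, c = divmod(dst, block_size)
                out[top + r][left + c] = flat[src]
    return out
```

Reusing one permutation for every block is the usual motivation for pairing such schemes with patch-based models: when the block size matches the ViT patch size, the model can learn to work on the permuted patches. Treat this as background on this family of methods rather than a detail confirmed by the abstract.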
-
Cardinal-Utility Matching Markets: The Quest for Envy-Freeness, Pareto-Optimality, and Efficient Computability
Authors:
Thorben Tröbst,
Vijay V. Vazirani
Abstract:
Unlike ordinal-utility matching markets, which are well-developed from the viewpoint of both theory and practice, recent insights from a computer science perspective have left cardinal-utility matching markets in a state of flux. The celebrated pricing-based mechanism for one-sided cardinal-utility matching markets due to Hylland and Zeckhauser, which had long eluded efficient algorithms, was finally shown to be intractable; the problem of computing an approximate equilibrium is PPAD-complete.
This led us to ask the question: is there an alternative, polynomial time, mechanism for one-sided cardinal-utility matching markets which achieves the desirable properties of HZ, i.e., (ex-ante) envy-freeness (EF) and Pareto-optimality (PO)? In this paper we show:
1. The problem of finding an EF+PO lottery in a one-sided cardinal-utility matching market is PPAD-complete.
2. A $(2 + ε)$-approximately envy-free and (exactly) Pareto-optimal lottery can be found in polynomial time using Nash bargaining. Moreover, the resulting mechanism is $(2 + ε)$-approximately incentive compatible.
We also present several results on two-sided cardinal-utility matching markets, including non-existence of EF+PO lotteries as well as existence of justified-envy-free and weak Pareto-optimal lotteries.
Submitted 4 April, 2024; v1 submitted 13 February, 2024;
originally announced February 2024.
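Ex-ante envy-freeness of a lottery can be checked directly from its fractional assignment matrix: agent i must weakly prefer its own row of probabilities to every other agent's row. A minimal Python sketch, assuming `lottery[i][g]` is the probability that agent i receives good g and utilities are evaluated in expectation; the multiplicative `alpha` relaxation is an assumed form of approximate EF and may differ from the paper's definition.

```python
def expected_utility(u_row, lottery_row):
    """Expected cardinal utility of a probability row under a utility row."""
    return sum(a * b for a, b in zip(u_row, lottery_row))


def is_envy_free(utilities, lottery, alpha=1):
    """Check ex-ante (alpha-approximate) envy-freeness of a fractional assignment.

    utilities[i][g]: agent i's cardinal utility for good g.
    lottery[i][g]:   probability that agent i receives good g.
    alpha = 1 is exact EF; alpha > 1 is an assumed multiplicative relaxation.
    """
    n = len(utilities)
    for i in range(n):
        own = expected_utility(utilities[i], lottery[i])
        for j in range(n):
            # Agent i envies j if i values j's row more than (alpha times) its own.
            if j != i and alpha * own < expected_utility(utilities[i], lottery[j]):
                return False
    return True
```

For two agents with identical utilities `[[3, 1], [3, 1]]`, the uniform lottery is envy-free, while the deterministic assignment `[[1, 0], [0, 1]]` is not, because the second agent gets expected utility 1 but values the first agent's row at 3.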
-
Rationality of Learning Algorithms in Repeated Normal-Form Games
Authors:
Shivam Bajaj,
Pranoy Das,
Yevgeniy Vorobeychik,
Vijay Gupta
Abstract:
Many learning algorithms are known to converge to an equilibrium for specific classes of games if the same learning algorithm is adopted by all agents. However, when the agents are self-interested, a natural question is whether agents have a strong incentive to adopt an alternative learning algorithm that yields them greater individual utility. We capture such incentives as an algorithm's rationality ratio, which is the ratio of the highest payoff an agent can obtain by deviating from a learning algorithm to its payoff from following it. We define a learning algorithm to be $c$-rational if its rationality ratio is at most $c$ irrespective of the game. We first establish that popular learning algorithms such as fictitious play and regret matching are not $c$-rational for any constant $c\geq 1$. We then propose and analyze two algorithms that are provably $1$-rational under mild assumptions, and have the same properties as (a generalized version of) fictitious play and regret matching, respectively, if all agents follow them. Finally, we show that if an assumption of perfect monitoring is not satisfied, there are games for which $c$-rational algorithms do not exist, and illustrate our results with numerical case studies.
Submitted 13 February, 2024;
originally announced February 2024.
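Fictitious play, one of the algorithms the abstract shows is not c-rational, has each player best-respond to the empirical frequency of the opponent's past actions. A minimal Python sketch of the classical two-player version (not the generalized variant analysed in the paper), with ties broken toward the lowest action index as an arbitrary assumption.

```python
def best_response(payoff, opponent_counts):
    """Action maximizing payoff against the opponent's empirical action counts.

    payoff[a][b]: this player's payoff for own action a vs opponent action b.
    Ties break toward the lowest index (max returns the first maximum).
    """
    scores = [sum(row[b] * opponent_counts[b] for b in range(len(row)))
              for row in payoff]
    return max(range(len(scores)), key=lambda a: scores[a])


def fictitious_play(payoff_1, payoff_2, rounds):
    """Two-player fictitious play; returns both players' empirical frequencies.

    payoff_1[a][b]: player 1's payoff when 1 plays a and 2 plays b.
    payoff_2[b][a]: player 2's payoff when 2 plays b and 1 plays a.
    """
    counts_1 = [0] * len(payoff_1)
    counts_2 = [0] * len(payoff_2)
    for _ in range(rounds):
        # Both players respond simultaneously to the history so far.
        a = best_response(payoff_1, counts_2)
        b = best_response(payoff_2, counts_1)
        counts_1[a] += 1
        counts_2[b] += 1
    return [c / rounds for c in counts_1], [c / rounds for c in counts_2]
```

In a prisoner's dilemma with payoffs `[[3, 0], [5, 1]]` (action 0 = cooperate, 1 = defect), both players cooperate in round one only because of the tie-break on an empty history, then defect forever, so after 100 rounds each plays defect with frequency 0.99.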
-
MetaVRadar: Measuring Metaverse Virtual Reality Network Activity
Authors:
Minzhao Lyu,
Rahul Dev Tripathi,
Vijay Sivaraman
Abstract:
The "metaverse", wherein users can enter virtual worlds to work, study, play, shop, socialize, and entertain, is fast becoming a reality, attracting billions of dollars in investment from companies such as Meta, Microsoft, and Clipo Labs. Further, virtual reality (VR) headsets from entities like Oculus, HTC, and Microsoft are rapidly maturing to provide fully immersive experiences to metaverse users. However, little is known about the network dynamics of metaverse VR applications in terms of service domains, flow counts, traffic rates and volumes, content location and latency, etc., which are needed to make telecommunications network infrastructure "metaverse ready". This paper is an empirical measurement study of metaverse VR network behavior aimed at helping telecommunications network operators better provision and manage the network to ensure a good user experience. Using illustrative hour-long network traces of metaverse sessions on the Oculus VR headset, we first develop a categorization of user activity into distinct states, ranging from login home to streetwalking and event attendance to asset trading, and undertake a detailed analysis of network traffic per state, identifying unique service domains, protocols, flow profiles, and volumetric patterns, thereby highlighting the vastly more complex nature of a metaverse session compared to streaming video or gaming. Armed with these network behavioral profiles, our second contribution develops MetaVRadar, a real-time method to detect metaverse sessions and classify the user activity state by leveraging formalized flow signatures and volumetric attributes. Our third contribution practically implements MetaVRadar, evaluates its accuracy in our lab environment, and demonstrates its usability in a large university network so that operators can better monitor and plan resources to support the requisite metaverse user experience.
Submitted 13 February, 2024;
originally announced February 2024.
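The two-step idea (detect a session from flow signatures, then classify the activity state from volumetric attributes) can be sketched in Python. Everything specific here is an assumption for illustration: the `Flow` record, the suffix-matching signature, the choice of (downlink Mbps, flow count) as features, and nearest-profile classification are hypothetical and are not MetaVRadar's actual signatures or classifier.

```python
import math
from dataclasses import dataclass


@dataclass
class Flow:
    domain: str      # service domain (e.g., from TLS SNI or DNS), assumed available
    bytes_down: int  # downlink bytes observed in the current window


def detect_session(flows, signature_domains):
    """True if any flow's domain ends with one of the (hypothetical) signature suffixes."""
    return any(f.domain.endswith(s) for f in flows for s in signature_domains)


def classify_state(flows, window_s, profiles):
    """Nearest-profile classification on (downlink Mbps, active flow count).

    profiles maps a state name to a reference (mbps, flow_count) pair supplied
    by the caller; features are unscaled, which a real system would revisit.
    """
    mbps = sum(f.bytes_down for f in flows) * 8 / window_s / 1e6
    features = (mbps, len(flows))
    return min(profiles, key=lambda state: math.dist(features, profiles[state]))
```

For example, two flows carrying 3.75 MB in a one-second window give about 30 Mbps, which would be labeled with whichever caller-supplied profile lies closest.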
-
Arboreal Obstructed Atomic Insulating and Metallic Phases of Fermions
Authors:
Gurkirat Singh,
Surajit Bera,
Vijay B. Shenoy
Abstract:
We explore phases of free fermions on arenas that do not tessellate a manifold. Specializing to arboreal arenas described by tree graphs which possess a notion of translation symmetry, we study possible fermionic phases in the BDI symmetry class on the $p$-coordinated Bethe lattice. We find that there are $p$ distinct obstructed atomic insulating phases that are characterized by distinct edge states, pattern of entanglement, and a winding characteristic that we define here. These distinct insulating phases are always separated by a metallic region in the parameter space rather than isolated quantum critical points. The metallic region itself comprises several distinct metallic phases that are distinguished by the winding characteristic and correlation functions. The correlation functions of distinct subsystems display non-analytic behavior at distinct points in the metallic region, signaling a cascade of subsystem transitions. An intriguing feature of these arboreal metals is the presence of truncated subsystems with zero energy boundary modes despite being gapless. This work suggests new opportunities for synthetic quantum systems to realize these novel phases.
Submitted 12 February, 2024;
originally announced February 2024.
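Numerical work on the infinite Bethe lattice commonly starts from a finite truncation, the Cayley tree, in which the root has $p$ neighbours and every other interior node has one parent and $p-1$ children. Because a tree is bipartite, nearest-neighbour hopping only connects opposite sublattices, which is the structure a sublattice (chiral) symmetry relies on. A minimal Python sketch that builds such a truncation and its sublattice labels; it does not reproduce the paper's model or boundary conditions.

```python
from collections import deque


def cayley_tree(p, depth):
    """Finite Cayley-tree truncation of the p-coordinated Bethe lattice.

    Returns (edges, sublattice): edges are (parent, child) pairs and
    sublattice[v] is 0 or 1 according to the parity of v's distance from the root.
    """
    edges = []
    sublattice = {0: 0}
    frontier = deque([(0, 0)])  # (node id, distance from root)
    next_id = 1
    while frontier:
        node, d = frontier.popleft()
        if d == depth:
            continue  # leaves of the truncated tree get no children
        children = p if node == 0 else p - 1
        for _ in range(children):
            edges.append((node, next_id))
            sublattice[next_id] = 1 - sublattice[node]
            frontier.append((next_id, d + 1))
            next_id += 1
    return edges, sublattice
```

With `p = 3` and `depth = 2` this gives 1 + 3 + 6 = 10 sites and 9 bonds, and every bond joins sites of opposite sublattice.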
-
Secret Collusion Among Generative AI Agents
Authors:
Sumeet Ramesh Motwani,
Mikhail Baranchuk,
Martin Strohmeier,
Vijay Bolina,
Philip H. S. Torr,
Lewis Hammond,
Christian Schroeder de Witt
Abstract:
Recent capability increases in large language models (LLMs) open up applications in which teams of communicating generative AI agents solve joint tasks. This poses privacy and security challenges concerning the unauthorised sharing of information, or other unwanted forms of agent coordination. Modern steganographic techniques could render such dynamics hard to detect. In this paper, we comprehensively formalise the problem of secret collusion in systems of generative AI agents by drawing on relevant concepts from both the AI and security literature. We study incentives for the use of steganography, and propose a variety of mitigation measures. Our investigations result in a model evaluation framework that systematically tests capabilities required for various forms of secret collusion. We provide extensive empirical results across a range of contemporary LLMs. While the steganographic capabilities of current models remain limited, GPT-4 displays a capability jump suggesting the need for continuous monitoring of steganographic frontier model capabilities. We conclude by laying out a comprehensive research program to mitigate future risks of collusion between generative AI models.
Submitted 12 February, 2024;
originally announced February 2024.
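To make the notion of covert coordination concrete, here is a toy form of linguistic steganography: hidden bits are carried by which of two synonyms appears in otherwise ordinary text, so the message reads naturally to a monitor. This Python sketch is far weaker than the modern steganographic techniques the abstract refers to, and the synonym pairs and template are hypothetical.

```python
# Hypothetical synonym pairs: the first word encodes bit 0, the second bit 1.
SYNONYM_PAIRS = [("big", "large"), ("quick", "fast"), ("begin", "start")]


def encode(bits, template):
    """Fill each '{}' slot in template with the synonym selected by the next bit.

    Slot i uses pair i modulo the number of pairs.
    """
    if len(bits) != template.count("{}"):
        raise ValueError("need exactly one bit per slot")
    words = [SYNONYM_PAIRS[i % len(SYNONYM_PAIRS)][b] for i, b in enumerate(bits)]
    return template.format(*words)


def decode(text):
    """Recover bits by checking which member of each pair appears, in order."""
    lookup = {word: bit for pair in SYNONYM_PAIRS for bit, word in enumerate(pair)}
    return [lookup[w] for w in text.replace(".", "").split() if w in lookup]
```

Encoding `[1, 0, 1]` into `"A {} team made a {} decision to {} today."` yields "A large team made a quick decision to start today.", which decodes back to `[1, 0, 1]` for a receiver who knows the pairs.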
-
System-level Analysis of Adversarial Attacks and Defenses on Intelligence in O-RAN based Cellular Networks
Authors:
Azuka Chie**a,
Brian Kim,
Kaushik Chowhdury,
Vijay K. Shah
Abstract:
While the open architecture, open interfaces, and integration of intelligence within Open Radio Access Network technology hold the promise of transforming 5G and 6G networks, they also introduce cybersecurity vulnerabilities that hinder its widespread adoption. In this paper, we conduct a thorough system-level investigation of cyber threats, with a specific focus on machine learning (ML) intelligence components known as xApps within the O-RAN's near-real-time RAN Intelligent Controller (near-RT RIC) platform. Our study begins by developing a malicious xApp designed to execute adversarial attacks on two types of test data, spectrograms and key performance metrics (KPMs), stored in the RIC database within the near-RT RIC. To mitigate these threats, we utilize a distillation technique that involves training a teacher model at a high softmax temperature and transferring its knowledge to a student model trained at a lower softmax temperature, which is deployed as the robust ML model within the xApp. We prototype an over-the-air LTE/5G O-RAN testbed to assess the impact of these attacks and the effectiveness of the distillation defense technique by leveraging an ML-based Interference Classification (InterClass) xApp as an example. We examine two versions of the InterClass xApp under distinct scenarios, one based on Convolutional Neural Networks (CNNs) and another based on Deep Neural Networks (DNNs), using spectrograms and KPMs as input data, respectively. Our findings reveal up to 100% and 96.3% degradation in the accuracy of the CNN and DNN models, respectively, resulting in a significant decline in network performance under the considered adversarial attacks. Under the strict latency constraints of the near-RT RIC closed control loop, our analysis shows that the distillation technique outperforms classical adversarial training by achieving an accuracy of up to 98.3% for mitigating such attacks.
Submitted 13 February, 2024; v1 submitted 9 February, 2024;
originally announced February 2024.
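The distillation defence hinges on the softmax temperature $T$: dividing the logits by a larger $T$ flattens the output distribution, and the teacher's softened outputs then serve as soft labels for the student. A minimal Python sketch of those two pieces; it does not reproduce the paper's training pipeline, and the temperature values in the example are arbitrary.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_cross_entropy(teacher_probs, student_probs):
    """Cross-entropy of student probabilities against teacher soft labels."""
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))
```

For logits `[2.0, 1.0, 0.0]`, raising the temperature from 1 to 20 keeps the same top class but lowers its probability, exposing the relative similarities between classes that a student can learn from.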
-
Astigmatic Speckle-learned OAM Shift Keying and OAM Multiplexing
Authors:
Trishita Das,
Manas Ranjan Pandit,
Venugopal Raskatla,
Purnesh Singh Badavath,
Vijay Kumar
Abstract:
Orbital angular momentum (OAM)-carrying beams have gained significant attention in recent years due to their unique properties and potential to improve spectral efficiency and data transmission rates in optical communication systems. However, fully exploiting the entire OAM mode spectrum remains challenging. The emergence of AI-driven OAM mode identification has revolutionized the demultiplexing process within optical communication channels. OAM beams with different orders are orthogonal, allowing each beam to serve as a distinct signal carrier, so combining multiple OAM beams can effectively enhance channel capacity. In this paper, we adopt speckle-learned demultiplexing, which demultiplexes OAM beams via their speckle patterns and is more resilient to misalignment and noise. However, using only non-intensity-degenerate beams limits the available multiplexing resources. This work aims to leverage the full spectrum of OAM beams by introducing astigmatism into far-field speckle patterns using a tilted spherical convex lens. We then conduct a comprehensive analysis of two innovative information encoding techniques: OAM shift keying and OAM multiplexing. We successfully demonstrate an optical communication link encoded using both OAM shift keying and OAM multiplexing, followed by accurate decoding via speckle-learned demultiplexing.
Submitted 9 February, 2024;
originally announced February 2024.
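The two encodings differ in how bits map onto OAM orders: shift keying sends a single mode per symbol, chosen from $2^k$ modes to carry $k$ bits, while multiplexing sends a superposition in which the presence or absence of each mode carries one bit. A Python sketch with a hypothetical mode set; note that $\pm\ell$ pairs share the same intensity profile, which is why telling them apart requires something like the astigmatic speckle approach described in the abstract.

```python
def shift_keying_encode(bits, modes):
    """OAM shift keying: each group of k bits selects ONE mode out of 2**k modes."""
    k = (len(modes) - 1).bit_length()
    if len(modes) < 2 or len(modes) != 2 ** k or len(bits) % k:
        raise ValueError("need 2**k (k >= 1) modes and a multiple of k bits")
    symbols = []
    for i in range(0, len(bits), k):
        index = int("".join(map(str, bits[i:i + k])), 2)  # bits -> mode index
        symbols.append(modes[index])
    return symbols


def multiplexing_encode(bits, modes):
    """OAM multiplexing: bit j switches mode j on (1) or off (0) in one superposition."""
    if len(bits) != len(modes):
        raise ValueError("need one mode per bit")
    return frozenset(m for b, m in zip(bits, modes) if b)
```

With modes `[-3, -1, 1, 3]`, shift keying sends the bits `0111` as two successive single-mode symbols (-1, then 3), whereas multiplexing sends `1011` at once as the superposition of modes -3, 1, and 3.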
-
Spreading Information via Social Networks: An Irrelevance Result
Authors:
Yu Awaya,
Vijay Krishna
Abstract:
An informed planner wishes to spread information among a group of agents in order to induce efficient coordination -- say the adoption of a new technology with positive externalities. The agents are connected via a social network. The planner informs a seed and then the information spreads via the network. While the structure of the network affects the rate of diffusion, we show that the rate of adoption is the same for all acyclic networks.
Submitted 7 February, 2024;
originally announced February 2024.
-
ASCENT: A Context-Aware Spectrum Coexistence Design and Implementation Toolset for Policymakers in Satellite Bands
Authors:
Ta-seen Reaz Niloy,
Saurav Kumar,
Aniruddha Hore,
Zoheb Hassan,
Carl Dietrich,
Eric W. Burger,
Jeffrey H. Reed,
Vijay K. Shah
Abstract:
This paper introduces ASCENT (context Aware Spectrum Coexistence Design and Implementation), an advanced context-aware terrestrial-satellite spectrum sharing toolset designed for researchers, policymakers, and regulators. It serves two essential purposes: (a) evaluating the potential for harmful interference to primary users in satellite bands and (b) facilitating the analysis, design, and implementation of diverse regulatory policies on spectrum usage and sharing. Notably, ASCENT implements a closed-loop feedback system that allows dynamic adaptation of policies according to a wide range of contextual factors (e.g., weather, buildings, summer/winter foliage) and feedback on the impact of these policies through realistic simulation. Specifically, ASCENT comprises the following components: (i) an interference evaluation tool for evaluating interference at the incumbents in a spectrum-sharing environment while taking the underlying contexts into account; (ii) a dynamic spectrum access (DSA) framework for providing context-aware instructions to adapt networking parameters and control the secondary terrestrial network's access to the shared spectrum band according to context-aware prioritization; (iii) a context broker to acquire essential and relevant contexts from external context information providers; and (iv) a DSA database to store dynamic and static contexts and the regulator's policy information. The closed-loop feedback system of ASCENT is implemented by integrating these components in a modular software architecture. A case study of sharing the lower 12 GHz Ku band (12.2-12.7 GHz) with a 5G terrestrial cellular network is considered, and the usability of ASCENT is demonstrated by dynamically changing the exclusion zone's radius in different weather conditions.
Submitted 15 February, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
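The context-driven exclusion-zone adjustment in the case study can be illustrated with a free-space link budget. A minimal Python sketch, assuming free-space path loss $\mathrm{FSPL} = 20\log_{10} d_{\mathrm{km}} + 20\log_{10} f_{\mathrm{GHz}} + 92.45$ dB plus one lumped `context_loss_db` term standing in for weather or foliage; the abstract does not describe ASCENT's interference evaluation at this level, and real analyses would use terrain-aware propagation models.

```python
import math


def exclusion_radius_km(eirp_dbm, threshold_dbm, freq_ghz, context_loss_db=0.0):
    """Smallest distance (km) at which interference falls to threshold_dbm.

    Solves eirp - FSPL(d) - context_loss = threshold for d, with FSPL in dB
    given by 20 log10(d_km) + 20 log10(f_GHz) + 92.45. The lumped context
    loss is a hypothetical stand-in for weather/foliage effects.
    """
    required_fspl = eirp_dbm - threshold_dbm - context_loss_db
    exponent = (required_fspl - 20 * math.log10(freq_ghz) - 92.45) / 20
    return 10 ** exponent
```

Because an extra loss of $L$ dB scales the radius by $10^{-L/20}$, a 6 dB context loss roughly halves the exclusion radius under this simplified model.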
-
Context-Aware Spectrum Coexistence of Terrestrial Beyond 5G Networks in Satellite Bands
Authors:
Ta Seen Reaz Niloy,
Zoheb Hasan,
Rob Smith,
Vikram R. Anapana,
Vijay K. Shah
Abstract:
Spectrum sharing between terrestrial 5G and incumbent networks in the satellite bands presents a promising avenue to satisfy the ever-increasing bandwidth demand of next-generation wireless networks. However, protecting incumbent operations from harmful interference poses a fundamental challenge in accommodating terrestrial broadband cellular networks in the satellite bands. State-of-the-art spectrum-sharing policies usually rely on several worst-case assumptions and ignore site-specific contextual factors in making spectrum-sharing decisions, and thus often result in under-utilization of the shared band by the secondary licensees. To address such limitations, this paper introduces the CAT3S (Context-Aware Terrestrial-Satellite Spectrum Sharing) framework, which empowers the coexisting terrestrial 5G network to maximize utilization of the shared satellite band without creating harmful interference to the incumbent links by exploiting contextual factors. CAT3S consists of two components: (i) a context-acquisition unit to collect and process essential contextual information for spectrum sharing and (ii) a context-aware base station (BS) control unit to optimize the set of operational BSs and their operation parameters (i.e., transmit power and active beams per sector). To evaluate the performance of CAT3S, a realistic spectrum coexistence case study over the 12 GHz band is considered. Experimental results demonstrate that CAT3S achieves notably higher spectrum utilization than state-of-the-art spectrum-sharing policies in different weather contexts.
Submitted 14 February, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
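The BS control unit's task of choosing which BSs operate can be illustrated with a deliberately simplified stand-in. This Python sketch assumes each BS's interference contribution at a single incumbent receiver is already known in linear units and greedily activates the BSs with the smallest contributions while the aggregate stays under a threshold. For that single constraint, this maximizes the number of active BSs, but it ignores the transmit-power and beam choices CAT3S also optimizes, so it is not the CAT3S optimizer.

```python
def dbm_to_mw(dbm):
    """Convert a power level from dBm to milliwatts."""
    return 10 ** (dbm / 10)


def select_base_stations(interference_mw, threshold_mw):
    """Greedily activate BSs in ascending order of interference at the incumbent.

    interference_mw maps a BS id to the interference (mW) it adds at the
    incumbent receiver; contributions add linearly in mW, never in dB.
    Returns (active BS ids, aggregate interference in mW).
    """
    active, total = [], 0.0
    for bs, contrib in sorted(interference_mw.items(), key=lambda kv: kv[1]):
        if total + contrib <= threshold_mw:
            active.append(bs)
            total += contrib
    return active, total
```

With contributions of 1e-10, 5e-11, and 2e-10 mW for BSs A, B, and C and a 2e-10 mW budget, B and A are switched on, and C is kept off because adding it would push the aggregate past the threshold.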
-
Digits micro-model for accurate and secure transactions
Authors:
Chirag Chhablani,
Nikhita Sharma,
Jordan Hosier,
Vijay K. Gurbani
Abstract:
Automatic Speech Recognition (ASR) systems are used in the financial domain to enhance the caller experience by enabling natural language understanding and facilitating efficient and intuitive interactions. Increasing use of ASR systems requires that such systems exhibit very low error rates. The predominant ASR models used to collect numeric data are large, general-purpose models, either commercial (Google Speech-to-text (STT) or Amazon Transcribe) or open source (OpenAI's Whisper). Such ASR models are trained on hundreds of thousands of hours of audio data and require considerable resources to run. Despite recent progress in large speech recognition models, we highlight the potential of smaller, specialized "micro" models. Such lightweight models can be trained to perform well on number-recognition tasks, competing with general models like Whisper or Google STT while requiring less than 80 minutes of training time and at least an order of magnitude less memory. Also, unlike larger speech recognition models, micro-models are trained on carefully selected and curated datasets, which makes them highly accurate, agile, and easy to retrain while using few compute resources. We present our work on creating micro-models for multi-digit number recognition that handle diverse speaking styles reflecting real-world pronunciation patterns. Our work contributes to domain-specific ASR models, improving digit recognition accuracy and data privacy. As an added advantage, their low resource consumption allows them to be hosted on-premise, keeping private data local instead of uploading it to an external cloud. Our results indicate that our micro-model makes fewer errors than the best-of-breed commercial or open-source ASRs in recognizing digits (1.8% error rate for our best micro-model versus 5.8% for Whisper) and has a low memory footprint (0.66 GB VRAM for our model versus 11 GB VRAM for Whisper).
Submitted 2 February, 2024;
originally announced February 2024.
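The abstract reports digit error rates without defining them; a common choice is a WER-style metric, in which the total edit distance (substitutions, insertions, deletions) between reference and recognized digit strings is divided by the number of reference digits. A Python sketch under that assumption, which may differ from the metric used in the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (one row of the DP table at a time)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]


def digit_error_rate(references, hypotheses):
    """Total digit edits divided by total reference digits (WER-style, assumed)."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return errors / total
```

For example, recognizing "1234" and "568" against references "1234" and "5678" costs one deletion over eight reference digits, a 12.5% digit error rate.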
-
Layered and Staged Monte Carlo Tree Search for SMT Strategy Synthesis
Authors:
Zhengyang Lu,
Stefan Siemer,
Piyush Jha,
Joel Day,
Florin Manea,
Vijay Ganesh
Abstract:
Modern SMT solvers, such as Z3, offer user-controllable strategies, enabling users to tailor solving strategies for their unique set of instances, thus dramatically enhancing solver performance for their use case. However, this approach of strategy customization presents a significant challenge: handcrafting an optimized strategy for a class of SMT instances remains a complex and demanding task for both solver developers and users alike.
In this paper, we address this problem of automatic SMT strategy synthesis via a novel Monte Carlo Tree Search (MCTS) based method. Our method treats strategy synthesis as a sequential decision-making process, whose search tree corresponds to the strategy space, and employs MCTS to navigate this vast search space. The key innovations that enable our method to identify effective strategies, while keeping costs low, are the ideas of layered and staged MCTS search. These novel heuristics allow for a deeper and more efficient exploration of the strategy space, enabling us to synthesize more effective strategies than the default ones in state-of-the-art (SOTA) SMT solvers. We implement our method, dubbed Z3alpha, as part of the Z3 SMT solver. Through extensive evaluations across six important SMT logics, Z3alpha demonstrates superior performance compared to the SOTA synthesis tool FastSMT, the default Z3 solver, and the CVC5 solver on most benchmarks. Remarkably, on a challenging QF_BV benchmark set, Z3alpha solves 42.7% more instances than the default strategy in the Z3 SMT solver.
Submitted 30 April, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
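In MCTS-based strategy synthesis, a tree node can be read as a partial strategy and its children as candidate next tactics or parameter choices; the selection step is typically the UCT rule (UCB1 applied to trees). A Python sketch of that standard rule, not of the layered and staged variants Z3alpha introduces; the `(visits, total_reward)` representation and the exploration constant $\sqrt{2}$ are assumptions.

```python
import math


def uct_select(children, exploration=math.sqrt(2)):
    """Index of the child maximizing UCB1: mean reward + c * sqrt(ln N / n).

    children: list of (visits, total_reward) pairs; N is the sum of child visits.
    Unvisited children are tried first.
    """
    parent_visits = sum(visits for visits, _ in children)
    best, best_score = None, -math.inf
    for i, (visits, total) in enumerate(children):
        if visits == 0:
            return i  # expand untried actions before exploiting
        score = total / visits + exploration * math.sqrt(
            math.log(parent_visits) / visits)
        if score > best_score:
            best, best_score = i, score
    return best
```

The exploration term can outweigh a better mean: a child with mean reward 0.6 over 100 visits loses to a sibling with mean 0.5 over only 2 visits, because the rarely tried sibling's uncertainty bonus is much larger.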
-
AlphaMapleSAT: An MCTS-based Cube-and-Conquer SAT Solver for Hard Combinatorial Problems
Authors:
Piyush Jha,
Zhengyu Li,
Zhengyang Lu,
Curtis Bright,
Vijay Ganesh
Abstract:
This paper introduces AlphaMapleSAT, a novel Monte Carlo Tree Search (MCTS) based Cube-and-Conquer (CnC) SAT solving method aimed at efficiently solving challenging combinatorial problems. Despite the tremendous success of CnC solvers in solving a variety of hard combinatorial problems, the lookahead cubing techniques at the heart of CnC have not evolved much for many years. Part of the reason is the sheer difficulty of coming up with new cubing techniques that are both low-cost and effective in partitioning input formulas into sub-formulas, such that the overall runtime is minimized.
Lookahead cubing techniques used by current state-of-the-art CnC solvers, such as March, keep their cubing costs low by constraining the search for the optimal splitting variables. By contrast, our key innovation is a deductively-driven MCTS-based lookahead cubing technique that performs a deeper heuristic search to find effective cubes while keeping the cubing cost low. We perform an extensive comparison of AlphaMapleSAT against the March CnC solver on challenging combinatorial problems such as the minimum Kochen-Specker and Ramsey problems. We also perform ablation studies to verify the efficacy of the MCTS heuristic search for the cubing problem. Results show up to 2.3x speedup in parallel (and up to 27x in sequential) elapsed real time.
Submitted 24 January, 2024;
originally announced January 2024.
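Cube-and-conquer splits a CNF formula on a few variables: each cube is a partial assignment to those variables, the cubes together cover every assignment, and each simplified subformula is handed to a CDCL solver in the conquer phase. A Python sketch of the splitting and simplification steps using DIMACS-style integer literals; the splitting variables are supplied by hand here, whereas AlphaMapleSAT's contribution is choosing them with MCTS-based lookahead.

```python
from itertools import product


def make_cubes(split_vars):
    """All 2**k cubes (partial assignments) over the chosen splitting variables."""
    return [[v if positive else -v for v, positive in zip(split_vars, signs)]
            for signs in product((True, False), repeat=len(split_vars))]


def simplify(clauses, cube):
    """Apply a cube to a CNF: drop satisfied clauses, remove falsified literals.

    Returns None if some clause becomes empty (the cube refutes the formula).
    """
    assigned = set(cube)
    result = []
    for clause in clauses:
        if any(lit in assigned for lit in clause):
            continue  # clause already satisfied by the cube
        reduced = [lit for lit in clause if -lit not in assigned]
        if not reduced:
            return None
        result.append(reduced)
    return result
```

Since every total assignment extends exactly one cube, the formula is satisfiable if and only if at least one cube's subformula is; a `None` result means that cube is refuted without any search.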
-
Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis
Authors:
Vinotha R,
Hepsiba D,
L. D. Vijay Anand,
Deepak John Reji
Abstract:
Neural text-to-speech (TTS) synthesis is a powerful technology that can generate speech using neural networks. One of the most remarkable features of TTS synthesis is its capability to produce speech in the voice of different speakers. This paper introduces voice cloning and speech synthesis (https://pypi.org/project/voice-cloning/), an open-source Python package for helping people with speech disorders communicate more effectively, as well as for professionals seeking to integrate voice cloning or speech synthesis capabilities into their projects. This package aims to generate synthetic speech that sounds like the natural voice of an individual, but it does not replace the natural human voice. The architecture of the system comprises a speaker verification system, a synthesizer, a vocoder, and noise reduction. The speaker verification system is trained on a varied set of speakers to achieve optimal generalization performance without relying on transcriptions. The synthesizer, trained on both audio and transcriptions, generates a Mel spectrogram from text, and the vocoder converts the generated Mel spectrogram into the corresponding audio signal. The audio signal is then processed by a noise reduction algorithm to eliminate unwanted noise and enhance speech clarity. The performance of synthesized speech from seen and unseen speakers is then evaluated using subjective and objective measures such as Mean Opinion Score (MOS), Gross Pitch Error (GPE), and Spectral Distortion (SD). The model can create speech in distinct voices by including speaker characteristics that are chosen randomly.
Submitted 16 February, 2024; v1 submitted 22 January, 2024;
originally announced January 2024.
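Gross Pitch Error, one of the objective metrics listed, is commonly defined as the fraction of frames voiced in both the reference and the synthesized speech whose fundamental frequency (F0) deviates from the reference by more than 20%. A Python sketch under that common definition, assuming time-aligned F0 tracks with 0 marking unvoiced frames; the paper's exact alignment and threshold are not stated in the abstract.

```python
def gross_pitch_error(f0_ref, f0_syn, tolerance=0.2):
    """Fraction of jointly voiced frames whose F0 deviates by more than tolerance.

    f0_ref, f0_syn: time-aligned F0 tracks in Hz, with 0 marking unvoiced frames.
    """
    voiced = [(r, s) for r, s in zip(f0_ref, f0_syn) if r > 0 and s > 0]
    if not voiced:
        return 0.0
    gross = sum(1 for r, s in voiced if abs(s - r) > tolerance * r)
    return gross / len(voiced)
```

For reference F0 values `[100, 200, 0, 150]` and synthesized values `[110, 260, 120, 150]`, the third frame is skipped as unvoiced in the reference, and only the 200 Hz vs 260 Hz frame exceeds the 20% tolerance, giving a GPE of 1/3.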
-
Fractional Conformal Map, Qubit Dynamics and the Leggett-Garg Inequality
Authors:
Sourav Paul,
Anant Vijay Varma,
Sourin Das
Abstract:
Any pure state of a qubit can be geometrically represented as a point on the extended complex plane through stereographic projection. By employing successive conformal maps on the extended complex plane, we can generate an effective discrete-time evolution of the pure states of the qubit. This work focuses on a subset of analytic maps known as fractional linear conformal maps. We show that these maps serve as a unifying framework for a diverse range of quantum-inspired conceivable dynamics, including (i) unitary dynamics, (ii) non-unitary but linear dynamics, and (iii) non-unitary and non-linear dynamics, where linearity (non-linearity) refers to the action of the discrete-time evolution operator on the Hilbert space. We provide a characterization of these maps in terms of the Leggett-Garg Inequality complemented with No-signaling in Time (NSIT) and Arrow of Time (AoT) conditions.
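A short numerical check helps show why a linear action on the Hilbert space becomes a fractional linear map on the plane. Under one common convention (an assumption here; the paper may use a different one), the state α|0⟩ + β|1⟩ is sent to z = β/α. A 2×2 matrix [[a, b], [c, d]] acting on (α, β) then induces z ↦ (c + dz)/(a + bz). The sketch verifies this for the unitary rotation R_x(2θ); the helper names are hypothetical.

```python
import cmath
import math


def to_plane(alpha, beta):
    """Stereographic coordinate z = beta / alpha of alpha|0> + beta|1>.
    The point at infinity (alpha == 0) is not handled in this sketch."""
    return beta / alpha


def mobius(z, a, b, c, d):
    """Fractional linear map induced by [[a, b], [c, d]] under z = beta / alpha."""
    return (c + d * z) / (a + b * z)


theta = 0.7
# R_x(2*theta) = [[cos t, -i sin t], [-i sin t, cos t]] is unitary.
a, b = math.cos(theta), -1j * math.sin(theta)
c, d = -1j * math.sin(theta), math.cos(theta)

alpha, beta = 1 / math.sqrt(2), cmath.exp(0.3j) / math.sqrt(2)
new_alpha = a * alpha + b * beta  # evolve the state vector directly
new_beta = c * alpha + d * beta

z_direct = to_plane(new_alpha, new_beta)
z_mobius = mobius(to_plane(alpha, beta), a, b, c, d)
print(abs(z_direct - z_mobius) < 1e-12)  # True
```

The identity only uses linearity, so any invertible (not necessarily unitary) matrix also induces a fractional linear map in the same way.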
Submitted 19 January, 2024;
originally announced January 2024.
-
Spontaneous localization at a potential saddle point from edge state reconstruction in a quantum Hall point contact
Authors:
Liam A. Cohen,
Noah L. Samuelson,
Taige Wang,
Kai Klocke,
Cian C. Reeves,
Takashi Taniguchi,
Kenji Watanabe,
Sagar Vijay,
Michael P. Zaletel,
Andrea F. Young
Abstract:
Quantum point contacts (QPCs) are an essential component in mesoscopic devices. Here, we study the transmission of quantum Hall edge modes through a gate-defined QPC in monolayer graphene. We observe resonant tunneling peaks and a nonlinear conductance pattern characteristic of Coulomb-blockaded localized states. The in-plane electric polarizability reveals the states are localized at a classically-unstable electrostatic saddle point. We explain this unexpected finding within a self-consistent Thomas-Fermi model, finding that localization of a zero-dimensional state at the saddle point is favored whenever the applied confinement potential is sufficiently soft compared to the Coulomb energy. Our results provide a direct demonstration of Coulomb-driven reconstruction at the boundary of a quantum Hall system.
Submitted 18 January, 2024;
originally announced January 2024.
-
Catastrophic Interference is Mitigated in Naturalistic Power-Law Learning Environments
Authors:
Atith Gandhi,
Raj Sanjay Shah,
Vijay Marupudi,
Sashank Varma
Abstract:
Neural networks often suffer from catastrophic interference (CI): performance on previously learned tasks drops off significantly when learning a new task. This contrasts strongly with humans, who can sequentially learn new tasks without appreciably forgetting previous tasks. Prior work has explored various techniques for mitigating CI such as regularization, rehearsal, generative replay, and distillation methods. The current work takes a different approach, one guided by cognitive science research showing that in naturalistic environments, the probability of encountering a task decreases as a power-law of the time since it was last performed. We argue that a realistic evaluation of techniques for the mitigation of CI should be performed in simulated naturalistic learning environments. Thus, we evaluate the extent of mitigation of CI when training simple rehearsal-based methods in power-law environments similar to the ones humans face. Our work explores this novel rehearsal-based approach for a domain-incremental task: learning permutations in the MNIST task. We compare our rehearsal environment with other baselines to show its efficacy in promoting continual learning. Additionally, we investigate whether this environment shows forward facilitation, i.e., faster learning of later tasks. Next, we explore the robustness of our learning environment to the number of tasks, model size, and amount of data rehearsed after each task. Notably, our results show that the performance is comparable or superior to that of models trained using popular regularization methods and also to rehearsals in non-power-law environments. The benefits of this training paradigm include simplicity and the lack of a need for extra neural circuitry. In addition, because our method is orthogonal to other methods, future research can combine training in power-law environments with other continual learning mechanisms.
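The central idea, that the probability of encountering a task decays as a power law of the time since it was last performed, can be sketched as a rehearsal sampler. This is an illustrative sketch, not the authors' training code: the exponent `alpha`, the integer time steps, and the function names are assumptions.

```python
import random


def rehearsal_weights(last_seen, now, alpha=1.0):
    """Weight each past task by (now - last_seen) ** -alpha, normalized to 1.
    Assumes every task was last seen strictly before `now`."""
    raw = {task: (now - t) ** -alpha for task, t in last_seen.items()}
    total = sum(raw.values())
    return {task: w / total for task, w in raw.items()}


def sample_rehearsal(last_seen, now, k, alpha=1.0, rng=random):
    """Draw k tasks to rehearse, favoring recently seen ones."""
    weights = rehearsal_weights(last_seen, now, alpha)
    tasks = list(weights)
    return rng.choices(tasks, weights=[weights[t] for t in tasks], k=k)


# Hypothetical permuted-MNIST tasks and the step at which each was last trained.
last_seen = {"perm0": 0, "perm1": 2, "perm2": 3}
w = rehearsal_weights(last_seen, now=4)
print(w)  # perm0: 1/7, perm1: 2/7, perm2: 4/7
print(sample_rehearsal(last_seen, now=4, k=5, rng=random.Random(0)))
```

In a full schedule, a task's `last_seen` entry would be updated whenever it is rehearsed, so frequently revisited tasks keep a high weight.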
Submitted 22 January, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
-
Estimation of Tsallis entropy for exponentially distributed several populations
Authors:
Naveen Kumar,
Ambesh Dixit,
Vivek Vijay
Abstract:
We study the estimation of the Tsallis entropy, for $q>0$, of a finite number of independent populations, each following an exponential distribution with the same scale parameter and distinct location parameters. We derive a Stein-type improved estimate, establishing the inadmissibility of the best affine equivariant estimate of the parameter function. A class of smooth estimates utilizing the Brewster technique is obtained, resulting in a significant improvement in the risk value. We compute the Brewster-Zidek estimates for both one and two populations and compare them with the best affine equivariant and Stein-type estimates. We further derive the Bayesian estimate under an inverse gamma prior, which includes the best affine equivariant estimate as a particular case. We provide a numerical illustration utilizing simulated samples for a single population. The purpose is to demonstrate the impact of sample size, location parameter, and entropic index on the estimates.
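For orientation, the Tsallis entropy $S_q = (1 - \int f^q\,dx)/(q-1)$ of a single exponential population with location $\mu$ and scale $\sigma$ has a closed form: $\int f^q\,dx = \sigma^{1-q}/q$, so $S_q = (1 - \sigma^{1-q}/q)/(q-1)$, which depends on $\sigma$ only and tends to the Shannon value $1 + \ln\sigma$ as $q \to 1$. The sketch below evaluates this formula and checks it by numerical integration; it does not implement the paper's estimators, and the function names are hypothetical.

```python
import math


def tsallis_exponential(q, sigma):
    """Closed-form Tsallis entropy of Exp(location mu, scale sigma); mu drops out."""
    if q <= 0:
        raise ValueError("q must be positive")
    if math.isclose(q, 1.0):
        return 1.0 + math.log(sigma)  # Shannon limit as q -> 1
    return (1.0 - sigma ** (1.0 - q) / q) / (q - 1.0)


def tsallis_numeric(q, sigma, mu=0.0, span=60.0, n=100_000):
    """Midpoint-rule check of (1 - integral of f^q) / (q - 1),
    integrating over [mu, mu + span * sigma] (the tail beyond is negligible)."""
    h = span * sigma / n
    integral = 0.0
    for i in range(n):
        x = mu + (i + 0.5) * h
        f = math.exp(-(x - mu) / sigma) / sigma  # exponential density for x > mu
        integral += f ** q * h
    return (1.0 - integral) / (q - 1.0)


print(tsallis_exponential(2.0, 1.5))          # 2/3
print(tsallis_numeric(2.0, 1.5, mu=3.0))      # approximately 2/3, location has no effect
```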
Submitted 17 January, 2024;
originally announced January 2024.
-
RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture
Authors:
Angels Balaguer,
Vinamra Benara,
Renato Luiz de Freitas Cunha,
Roberto de M. Estevão Filho,
Todd Hendry,
Daniel Holstein,
Jennifer Marsman,
Nick Mecklenburg,
Sara Malvar,
Leonardo O. Nunes,
Rafael Padilha,
Morris Sharp,
Bruno Silva,
Swati Sharma,
Vijay Aski,
Ranveer Chandra
Abstract:
There are two common ways in which developers are incorporating proprietary and domain-specific data when building applications of Large Language Models (LLMs): Retrieval-Augmented Generation (RAG) and fine-tuning. RAG augments the prompt with the external data, while fine-tuning incorporates the additional knowledge into the model itself. However, the pros and cons of both approaches are not well understood. In this paper, we propose a pipeline for fine-tuning and RAG, and present the tradeoffs of both for multiple popular LLMs, including Llama2-13B, GPT-3.5, and GPT-4. Our pipeline consists of multiple stages, including extracting information from PDFs, generating questions and answers, using them for fine-tuning, and leveraging GPT-4 for evaluating the results. We propose metrics to assess the performance of different stages of the RAG and fine-tuning pipeline. We conduct an in-depth study on an agricultural dataset. Agriculture as an industry has not seen much penetration of AI, and we study a potentially disruptive application: what if we could provide location-specific insights to a farmer? Our results show the effectiveness of our dataset generation pipeline in capturing geographic-specific knowledge, and the quantitative and qualitative benefits of RAG and fine-tuning. We see an accuracy increase of over 6 percentage points (p.p.) when fine-tuning the model, and this is cumulative with RAG, which increases accuracy by a further 5 p.p. In one particular experiment, we also demonstrate that the fine-tuned model leverages information from across geographies to answer specific questions, increasing answer similarity from 47% to 72%. Overall, the results point to how systems built using LLMs can be adapted to respond and incorporate knowledge across a dimension that is critical for a specific industry, paving the way for further applications of LLMs in other industrial domains.
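The distinction "RAG augments the prompt with the external data" can be made concrete with a minimal retrieve-then-augment step. This is a toy sketch, not the paper's pipeline: word-overlap scoring stands in for embedding search, and the passages, question, and prompt template are hypothetical placeholders.

```python
import re


def tokenize(text):
    """Lowercased alphanumeric word set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, passages, k=1):
    """Rank passages by word overlap with the question (stand-in for vector search)."""
    q = tokenize(question)
    ranked = sorted(passages, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]


def build_prompt(question, passages, k=1):
    """Prepend the top-k retrieved passages to the question; the model is unchanged."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return f"Answer using the context below.\nContext:\n{context}\nQuestion: {question}\nAnswer:"


passages = [  # placeholder documents, not real agronomic advice
    "Region A: sowing window notes for crop X.",
    "Region B: irrigation schedule notes for crop Y.",
    "General notes on pest scouting.",
]
question = "When should I sow crop X in Region A?"
print(build_prompt(question, passages))
```

Fine-tuning would instead update the model weights on generated question-answer pairs, leaving the prompt as just the question.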
Submitted 30 January, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
Network Anatomy and Real-Time Measurement of Nvidia GeForce NOW Cloud Gaming
Authors:
Minzhao Lyu,
Sharat Chandra Madanapalli,
Arun Vishwanath,
Vijay Sivaraman
Abstract:
Cloud gaming, wherein game graphics are rendered in the cloud and streamed back to the user as real-time video, expands the gaming market to billions of users who do not have gaming consoles or high-power graphics PCs. Companies like Nvidia, Amazon, Sony and Microsoft are investing in building cloud gaming platforms to tap this large unserved market. However, cloud gaming requires the user to have high bandwidth and stable network connectivity: whereas a typical console game needs about 100-200 kbps, a cloud game demands a minimum of 10-20 Mbps. This makes the Internet Service Provider (ISP) a key player in ensuring the end-user's good gaming experience. In this paper, we develop a method to detect Nvidia's GeForce NOW cloud gaming sessions over their network infrastructure, and measure associated user experience. In particular, we envision ISPs taking advantage of our method to provision network capacity at the right time and in the right place to support growth in cloud gaming at the right experience level, as well as to identify the role of contextual factors such as user setup (browser vs app) and connectivity type (wired vs wireless) in performance degradation. We first present a detailed anatomy of flow establishment and volumetric profiles of cloud gaming sessions over multiple platforms, followed by a method to detect gameplay and measure key experience aspects such as latency, frame rate and resolution via real-time analysis of network traffic. The insights and methods are also validated in the lab for the Xbox Cloud Gaming platform. We then implement and deploy our method in a campus network to capture gameplay behaviors and experience measures across various user setups and connectivity types, which we believe are valuable for network operators.
Submitted 13 February, 2024; v1 submitted 11 January, 2024;
originally announced January 2024.
-
Why Change Your Controller When You Can Change Your Planner: Drag-Aware Trajectory Generation for Quadrotor Systems
Authors:
Hanli Zhang,
Anusha Srikanthan,
Spencer Folk,
Vijay Kumar,
Nikolai Matni
Abstract:
Motivated by the increasing use of quadrotors for payload delivery, we consider a joint trajectory generation and feedback control design problem for a quadrotor experiencing aerodynamic wrenches. Unmodeled aerodynamic drag forces from carried payloads can lead to catastrophic outcomes. Prior work models aerodynamic effects as residual dynamics or external disturbances in the control problem, leading to a reactive policy that could be catastrophic. Moreover, redesigning controllers and tuning control gains on hardware platforms is a laborious effort. In this paper, we argue that adapting the trajectory generation component while keeping the controller fixed can improve trajectory tracking for quadrotor systems experiencing drag forces. To achieve this, we formulate a drag-aware planning problem by applying a suitable relaxation to an optimal quadrotor control problem, introducing a tracking cost function which measures the ability of a controller to follow a reference trajectory. This tracking cost function acts as a regularizer in trajectory generation and is learned from data obtained from simulation. Our experiments in both simulation and on the Crazyflie hardware platform show that changing the planner reduces tracking error by as much as 83%. Evaluation on hardware demonstrates that our planned path, as opposed to a baseline, avoids controller saturation and catastrophic outcomes during aggressive maneuvers.
Submitted 10 January, 2024;
originally announced January 2024.
-
LPAC: Learnable Perception-Action-Communication Loops with Applications to Coverage Control
Authors:
Saurav Agarwal,
Ramya Muthukrishnan,
Walker Gosrich,
Vijay Kumar,
Alejandro Ribeiro
Abstract:
Coverage control is the problem of navigating a robot swarm to collaboratively monitor features or a phenomenon of interest not known a priori. The problem is challenging in decentralized settings with robots that have limited communication and sensing capabilities. We propose a learnable Perception-Action-Communication (LPAC) architecture for the problem, wherein a convolutional neural network (CNN) processes localized perception; a graph neural network (GNN) facilitates robot communications; finally, a shallow multi-layer perceptron (MLP) computes robot actions. The GNN enables collaboration in the robot swarm by computing what information to communicate with nearby robots and how to incorporate received information. Evaluations show that the LPAC models, trained using imitation learning, outperform standard decentralized and centralized coverage control algorithms. The learned policy generalizes to environments different from the training dataset, transfers to larger environments with more robots, and is robust to noisy position estimates. The results indicate the suitability of LPAC architectures for decentralized navigation in robot swarms to achieve collaborative behavior.
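The communication stage can be pictured as message passing over a proximity graph: each robot combines its own features with an aggregate of its neighbors' features. The sketch below shows one such step with mean aggregation, scalar weights, and a ReLU. It is a simplified illustration, not the LPAC GNN: in the actual architecture the weights are learned matrices, and the communication radius, features, and function names here are hypothetical.

```python
import math


def neighbors(positions, radius):
    """Communication graph: robots i and j are linked if within `radius` of each other."""
    n = len(positions)
    return [
        [j for j in range(n) if j != i and math.dist(positions[i], positions[j]) <= radius]
        for i in range(n)
    ]


def gnn_layer(features, nbrs, w_self, w_nbr):
    """One message-passing step: h_i' = relu(w_self * h_i + w_nbr * mean_j h_j)."""
    out = []
    for i, h in enumerate(features):
        if nbrs[i]:
            agg = [sum(features[j][k] for j in nbrs[i]) / len(nbrs[i]) for k in range(len(h))]
        else:
            agg = [0.0] * len(h)  # an isolated robot receives no messages
        out.append([max(0.0, w_self * h[k] + w_nbr * agg[k]) for k in range(len(h))])
    return out


positions = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]  # robot 2 is out of range
features = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]   # e.g. CNN perception embeddings
nbrs = neighbors(positions, radius=1.5)
print(nbrs)                                        # [[1], [0], []]
print(gnn_layer(features, nbrs, w_self=1.0, w_nbr=0.5))  # [[1.0, 1.0], [0.5, 2.0], [3.0, 3.0]]
```

Stacking several such layers lets information travel multiple hops, which is how robots can use observations from beyond their direct neighbors.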
Submitted 8 February, 2024; v1 submitted 9 January, 2024;
originally announced January 2024.
-
A novel framework for generalization of deep hidden physics models
Authors:
Vijay Kag,
Birupaksha Pal
Abstract:
Modelling of systems where the full system information is unknown is an oft-encountered problem for various engineering and industrial applications, as it is either impossible to consider all the complex physics involved or simpler models are considered to keep within the limits of the available resources. Recent advances in greybox modelling like the deep hidden physics models address this space by combining data and physics. However, for most real-life applications, model generalizability is a key issue, as retraining a model for every small change in system inputs and parameters or modification in domain configuration can render the model economically unviable. In this work, we present a novel enhancement to the idea of hidden physics models which can generalize for changes in system inputs, parameters and domains. We also show that this approach holds promise in system discovery and helps learn the hidden physics for the changed system inputs, parameters and domain configuration.
Submitted 9 January, 2024;
originally announced January 2024.
-
Empirical Analysis of Efficient Fine-Tuning Methods for Large Pre-Trained Language Models
Authors:
Nigel Doering,
Cyril Gorlla,
Trevor Tuttle,
Adhvaith Vijay
Abstract:
Fine-tuning large pre-trained language models for downstream tasks remains a critical challenge in natural language processing. This paper presents an empirical analysis comparing two efficient fine-tuning methods, BitFit and adapter modules, to standard full model fine-tuning. Experiments conducted on GLUE benchmark datasets (MRPC, CoLA, STS-B) reveal several key insights. The BitFit approach, which trains only bias terms and task heads, matches full fine-tuning performance across varying amounts of training data and time constraints. It demonstrates remarkable stability even with only 30\% of data, outperforming full fine-tuning at intermediate data levels. Adapter modules exhibit high variability, with inconsistent gains over default models. The findings indicate BitFit offers an attractive balance between performance and parameter efficiency. Our work provides valuable perspectives on model tuning, emphasizing robustness and highlighting BitFit as a promising alternative for resource-constrained or streaming task settings. The analysis offers actionable guidelines for efficient adaptation of large pre-trained models, while illustrating open challenges in stabilizing techniques like adapter modules.
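Since BitFit "trains only bias terms and task heads", its parameter selection reduces to a filter over parameter names. The framework-free sketch below applies that filter to a hypothetical, BERT-like set of parameter names and shapes and reports the trainable fraction; the names, shapes, and `head_prefix` are illustrative assumptions.

```python
def bitfit_trainable(param_shapes, head_prefix="classifier."):
    """Names BitFit would update: every bias term plus all task-head parameters."""
    return [n for n in param_shapes if n.endswith(".bias") or n.startswith(head_prefix)]


def numel(shape):
    """Number of scalars in a tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


param_shapes = {  # hypothetical single-layer encoder with a 2-class head
    "encoder.layer0.attn.query.weight": (768, 768),
    "encoder.layer0.attn.query.bias": (768,),
    "encoder.layer0.ffn.weight": (3072, 768),
    "encoder.layer0.ffn.bias": (3072,),
    "classifier.weight": (2, 768),
    "classifier.bias": (2,),
}
trainable = bitfit_trainable(param_shapes)
frac = sum(numel(param_shapes[n]) for n in trainable) / sum(numel(s) for s in param_shapes.values())
print(trainable)
print(f"trainable fraction: {frac:.4%}")
```

In PyTorch, the same filter would typically be applied by iterating over `model.named_parameters()` and setting each tensor's `requires_grad` flag according to whether its name is selected.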
Submitted 8 January, 2024;
originally announced January 2024.
-
Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time Series
Authors:
Vijay Ekambaram,
Arindam Jati,
Pankaj Dayama,
Sumanta Mukherjee,
Nam H. Nguyen,
Wesley M. Gifford,
Chandra Reddy,
Jayant Kalagnanam
Abstract:
Large pre-trained models excel in zero/few-shot learning for language and vision tasks but face challenges in multivariate time series (TS) forecasting due to diverse data characteristics. Consequently, recent research efforts have focused on developing pre-trained TS forecasting models. These models, whether built from scratch or adapted from large language models (LLMs), excel in zero/few-shot forecasting tasks. However, they are limited by slow performance, high computational demands, and neglect of cross-channel and exogenous correlations. To address this, we introduce Tiny Time Mixers (TTM), a compact model (starting from 1M parameters) with effective transfer learning capabilities, trained exclusively on public TS datasets. TTM, based on the light-weight TSMixer architecture, incorporates innovations like adaptive patching, diverse resolution sampling, and resolution prefix tuning to handle pre-training on varied dataset resolutions with minimal model capacity. Additionally, it employs multi-level modeling to capture channel correlations and infuse exogenous signals during fine-tuning. TTM outperforms existing popular benchmarks in zero/few-shot forecasting by 4-40\%, while reducing computational requirements significantly. Moreover, TTMs are lightweight and can be executed even on CPU-only machines, enhancing usability and fostering wider adoption in resource-constrained environments. Model weights for our initial variant (TTM-Q) are available at https://huggingface.co/ibm-granite/granite-timeseries-ttm-v1. Model weights for more sophisticated variants (TTM-B, TTM-E, and TTM-A) will be shared soon. The source code for TTM can be accessed at https://github.com/ibm-granite/granite-tsfm/tree/main/tsfm_public/models/tinytimemixer.
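Patch-based mixers such as TSMixer first split each channel of the input window into fixed-length patches. The abstract does not describe how TTM's adaptive patching is implemented, so the sketch below shows only the basic patching operation and how the same window yields a different number of patches for different patch lengths. It is not code from the linked repository, and `patchify` is a hypothetical name.

```python
def patchify(series, patch_len, stride=None):
    """Split a univariate series into patches of length `patch_len`.
    Non-overlapping by default (stride = patch_len); an incomplete tail is dropped."""
    stride = patch_len if stride is None else stride
    return [series[i:i + patch_len] for i in range(0, len(series) - patch_len + 1, stride)]


window = list(range(12))  # hypothetical context window for one channel
print(patchify(window, 4))            # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
print(len(patchify(window, 2)))       # 6 shorter patches from the same window
print(len(patchify(window, 4, stride=2)))  # 5 overlapping patches
```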
Submitted 5 June, 2024; v1 submitted 8 January, 2024;
originally announced January 2024.
-
Analysis and Validation of Image Search Engines in Histopathology
Authors:
Isaiah Lahr,
Saghir Alfasly,
Peyman Nejat,
Jibran Khan,
Luke Kottom,
Vaishnavi Kumbhar,
Areej Alsaafin,
Abubakr Shafique,
Sobhan Hemati,
Ghazal Alabtah,
Nneka Comfere,
Dennis Murphee,
Aaron Mangold,
Saba Yasir,
Chady Meroueh,
Lisa Boardman,
Vijay H. Shah,
Joaquin J. Garcia,
H. R. Tizhoosh
Abstract:
Searching for similar images in archives of histology and histopathology images is a crucial task that may aid in patient matching for various purposes, ranging from triaging and diagnosis to prognosis and prediction. Whole slide images (WSIs) are highly detailed digital representations of tissue specimens mounted on glass slides. Matching WSI to WSI can serve as the critical method for patient matching. In this paper, we report extensive analysis and validation of four search methods, namely bag of visual words (BoVW), Yottixel, SISH, and RetCCL, and some of their potential variants. We analyze their algorithms and structures and assess their performance. For this evaluation, we utilized four internal datasets ($1269$ patients) and three public datasets ($1207$ patients), totaling more than $200,000$ patches from $38$ different classes/subtypes across five primary sites. Certain search engines, for example, BoVW, exhibit notable efficiency and speed but suffer from low accuracy. Conversely, search engines like Yottixel demonstrate efficiency and speed, providing moderately accurate results. Recent proposals, including SISH, display inefficiency and yield inconsistent outcomes, while alternatives like RetCCL prove inadequate in both accuracy and efficiency. Further research is imperative to address the dual aspects of accuracy and minimal storage requirements in histopathological image search.
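Of the four engines, bag of visual words (BoVW) is the simplest to illustrate: local descriptors are assigned to their nearest codeword, each image becomes a normalized histogram of codeword counts, and search ranks archive images by histogram similarity. The sketch below uses a fixed toy codebook and 2-D descriptors; in practice the codebook is usually learned (for example with k-means) over high-dimensional patch features. All names and values are hypothetical.

```python
import math


def nearest(codebook, v):
    """Index of the codeword closest to descriptor v."""
    return min(range(len(codebook)), key=lambda k: math.dist(codebook[k], v))


def bovw_histogram(descriptors, codebook):
    """Normalized histogram of codeword assignments for one image."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest(codebook, d)] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]


def cosine(a, b):
    """Cosine similarity, 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]  # toy "visual words"
archive = {
    "slide_a": bovw_histogram([(4.9, 5.2), (5.1, 4.8)], codebook),
    "slide_b": bovw_histogram([(0.0, 0.2), (1.2, 1.0), (0.8, 0.9)], codebook),
}
query = bovw_histogram([(0.1, 0.0), (0.9, 1.1), (1.0, 0.8)], codebook)
best = max(archive, key=lambda name: cosine(query, archive[name]))
print(best)  # slide_b
```

The histogram is compact and fast to compare, which is consistent with the speed the abstract reports for BoVW; the coarse quantization into a few words is one plausible reason for its lower accuracy.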
Submitted 8 June, 2024; v1 submitted 6 January, 2024;
originally announced January 2024.
-
Towards Model-Free LQR Control over Rate-Limited Channels
Authors:
Aritra Mitra,
Lintao Ye,
Vijay Gupta
Abstract:
Given the success of model-free methods for control design in many problem settings, it is natural to ask how things will change if realistic communication channels are utilized for the transmission of gradients or policies. While the resulting problem has analogies with the formulations studied under the rubric of networked control systems, the rich literature in that area has typically assumed that the model of the system is known. As a step towards bridging the fields of model-free control design and networked control systems, we ask: \textit{Is it possible to solve basic control problems - such as the linear quadratic regulator (LQR) problem - in a model-free manner over a rate-limited channel?} Toward answering this question, we study a setting where a worker agent transmits quantized policy gradients (of the LQR cost) to a server over a noiseless channel with a finite bit-rate. We propose a new algorithm titled Adaptively Quantized Gradient Descent (\texttt{AQGD}), and prove that above a certain finite threshold bit-rate, \texttt{AQGD} guarantees exponentially fast convergence to the globally optimal policy, with \textit{no deterioration of the exponent relative to the unquantized setting}. More generally, our approach reveals the benefits of adaptive quantization in preserving fast linear convergence rates, and, as such, may be of independent interest to the literature on compressed optimization.
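The benefit of adaptive quantization can be seen on a toy problem: if the quantizer's range shrinks geometrically on a schedule known to both worker and server, a fixed number of bits per step keeps the quantization error proportional to the (shrinking) gradient, so convergence stays linear. The sketch below runs gradient descent on $f(x) = \tfrac12\|x\|^2$ with a 4-bit uniform quantizer. It is a generic illustration under these assumptions, not the paper's AQGD algorithm, and the shrink factor and all names are hypothetical.

```python
def quantize(v, radius, bits):
    """Uniform quantizer on [-radius, radius] with 2**bits levels; values are clipped first."""
    step = 2 * radius / (2 ** bits - 1)
    out = []
    for x in v:
        x = min(max(x, -radius), radius)
        out.append(-radius + round((x + radius) / step) * step)
    return out


def adaptive_quantized_gd(grad, x0, lr, radius0, shrink, bits, iters):
    """Gradient descent where only `bits` bits per coordinate cross the channel."""
    x, radius = list(x0), radius0
    for _ in range(iters):
        q = quantize(grad(x), radius, bits)  # worker -> server message
        x = [xi - lr * qi for xi, qi in zip(x, q)]
        radius *= shrink  # both ends shrink the range on the same schedule
    return x


def grad(x):
    """Gradient of f(x) = 0.5 * ||x||^2."""
    return list(x)


x = adaptive_quantized_gd(grad, [1.0, -0.5], lr=0.5, radius0=1.0, shrink=0.6, bits=4, iters=40)
print(max(abs(v) for v in x))  # below 0.6 ** 40 (about 1.3e-9)
```

Here the error bound holds by induction: if every coordinate satisfies $|x_t| \le R_t$, then $|x_{t+1}| \le 0.5R_t + R_t/30 < 0.6R_t = R_{t+1}$, so the iterates contract at rate 0.6 per step despite using only 4 bits per coordinate.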
Submitted 2 January, 2024;
originally announced January 2024.
-
Facebook Report on Privacy of fNIRS data
Authors:
Md Imran Hossen,
Sai Venkatesh Chilukoti,
Liqun Shan,
Vijay Srinivas Tida,
Xiali Hei
Abstract:
The primary goal of this project is to develop privacy-preserving machine learning model training techniques for fNIRS data. This project will build a local model in a centralized setting with both differential privacy (DP) and certified robustness. It will also explore collaborative federated learning to train a shared model between multiple clients without sharing local fNIRS datasets. To prevent unintentional private information leakage of such clients' private datasets, we will also implement DP in the federated learning setting.
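A common way to train with differential privacy, in both the centralized and the federated setting, is a DP-SGD-style update: clip each example's gradient to a norm bound, sum, add Gaussian noise scaled to that bound, and average. The report does not specify its mechanism, so this is a generic sketch under that assumption; privacy accounting (turning the noise multiplier into an (ε, δ) guarantee) is not shown, and the function names are hypothetical.

```python
import math
import random


def clip(g, max_norm):
    """Scale g down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in g))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in g]


def dp_sgd_update(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip per-example gradients, sum them, add N(0, (noise_multiplier * max_norm)^2)
    noise to each coordinate, and average over the batch."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    clipped = [clip(g, max_norm) for g in per_example_grads]
    summed = [sum(g[k] for g in clipped) for k in range(dim)]
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    return [v / n for v in noisy]


grads = [[3.0, 4.0], [0.3, 0.4]]  # hypothetical per-example gradients
print(dp_sgd_update(grads, max_norm=1.0, noise_multiplier=0.0, rng=random.Random(0)))  # clipping only
print(dp_sgd_update(grads, max_norm=1.0, noise_multiplier=1.1, rng=random.Random(0)))  # clipped and noised
```

Clipping bounds each example's influence on the update, which is what lets the Gaussian noise provide a formal privacy guarantee.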
Submitted 1 January, 2024;
originally announced January 2024.
-
LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning
Authors:
S P Sharan,
Francesco Pittaluga,
Vijay Kumar B G,
Manmohan Chandraker
Abstract:
Although planning is a crucial component of the autonomous driving stack, researchers have yet to develop robust planning algorithms that are capable of safely handling the diverse range of possible driving scenarios. Learning-based planners suffer from overfitting and poor long-tail performance. On the other hand, rule-based planners generalize well, but might fail to handle scenarios that require complex driving maneuvers. To address these limitations, we investigate the possibility of leveraging the common-sense reasoning capabilities of Large Language Models (LLMs) such as GPT-4 and Llama2 to generate plans for self-driving vehicles. In particular, we develop a novel hybrid planner that leverages a conventional rule-based planner in conjunction with an LLM-based planner. Guided by the common-sense reasoning abilities of LLMs, our approach navigates complex scenarios which existing planners struggle with and produces well-reasoned outputs, while remaining grounded by working alongside the rule-based approach. Through extensive evaluation on the nuPlan benchmark, we achieve state-of-the-art performance, outperforming all existing pure learning- and rule-based methods across most metrics. Our code will be available at https://llmassist.github.io.
Submitted 29 December, 2023;
originally announced January 2024.
-
Generating Enhanced Negatives for Training Language-Based Object Detectors
Authors:
Shiyu Zhao,
Long Zhao,
Vijay Kumar B. G,
Yumin Suh,
Dimitris N. Metaxas,
Manmohan Chandraker,
Samuel Schulter
Abstract:
The recent progress in language-based open-vocabulary object detection can be largely attributed to finding better ways of leveraging large-scale data with free-form text annotations. Training such models with a discriminative objective function has proven successful, but requires good positive and negative samples. However, the free-form nature and the open vocabulary of object descriptions make the space of negatives extremely large. Prior works randomly sample negatives or use rule-based techniques to build them. In contrast, we propose to leverage the vast knowledge built into modern generative models to automatically build negatives that are more relevant to the original data. Specifically, we use large language models to generate negative text descriptions, and text-to-image diffusion models to generate corresponding negative images. Our experimental analysis confirms the relevance of the generated negative data, and its use in language-based detectors improves performance on two complex benchmarks. Code is available at \url{https://github.com/xiaofeng94/Gen-Enhanced-Negs}.
Submitted 12 April, 2024; v1 submitted 29 December, 2023;
originally announced January 2024.
-
Physics-informed neural network for modeling dynamic linear elasticity
Authors:
Vijay Kag,
Venkatesh Gopinath
Abstract:
In this work, we present the physics-informed neural network (PINN) model applied particularly to dynamic problems in solid mechanics. We focus on forward and inverse problems. Particularly, we show how a PINN model can be used efficiently for material identification in a dynamic setting. In this work, we assume linear continuum elasticity. We show results for a two-dimensional (2D) plane strain problem and then proceed to apply the same techniques to a three-dimensional (3D) problem. For the training data, we use the solution obtained with the finite element method. We rigorously show that PINN models are accurate, robust and computationally efficient, especially as a surrogate model for material identification problems. Also, we employ state-of-the-art techniques from the PINN literature which are an improvement to the vanilla implementation of PINN. Based on our results, we believe that the framework we have developed can be readily adapted to computational platforms for solving multiple dynamic problems in solid mechanics.
Submitted 4 January, 2024; v1 submitted 23 December, 2023;
originally announced December 2023.
-
Navigating the Concurrency Landscape: A Survey of Race Condition Vulnerability Detectors
Authors:
Aishwarya Upadhyay,
Vijay Laxmi,
Smita Naval
Abstract:
As technology continues to advance and we usher in the era of Industry 5.0, there has been a profound paradigm shift in operating systems, file systems, web, and network applications. The conventional utilization of multiprocessing and multicore systems has made concurrent programming increasingly pervasive. However, this transformation has brought about a new set of issues known as concurrency bugs, which, due to their wide prevalence in concurrent programs, have led to severe failures and potential security exploits. Over the past two decades, numerous researchers have dedicated their efforts to unveiling, detecting, mitigating, and preventing these bugs, with the last decade witnessing a surge in research within this domain. Among the spectrum of concurrency bugs, data races or race condition vulnerabilities stand out as the most prevalent, accounting for a staggering 80% of all concurrency bugs. This survey paper is focused on the realm of race condition bug detectors. We systematically categorize these detectors based on the diverse methodologies they employ. Additionally, we delve into the techniques and algorithms associated with race detection, tracing the evolution of this field over time. Furthermore, we shed light on the application of fuzzing techniques in the detection of race condition vulnerabilities. By reviewing these detectors and their static analyses, we draw conclusions and outline potential future research directions, including enhancing accuracy, performance, applicability, and comprehensiveness in race condition vulnerability detection.
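Many dynamic race detectors reason about the happens-before relation, commonly tracked with vector clocks. The sketch below is a generic illustration, not taken from the survey. It flags two accesses to the same location as a race when they come from different threads, at least one of them is a write, and neither access's vector clock happens-before the other's. The `Access` type and the function names are hypothetical. A real detector would also have to update the clocks on lock, fork, and join operations.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Access:
    """One access to a shared location, as a hypothetical detector might log it."""
    thread: int
    is_write: bool
    clock: Tuple[int, ...]  # vector clock of the accessing thread at access time


def happens_before(a: Tuple[int, ...], b: Tuple[int, ...]) -> bool:
    """a happens-before b iff a <= b componentwise and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def is_race(first: Access, second: Access) -> bool:
    """Accesses to the same location race if they are on different threads,
    at least one is a write, and happens-before orders neither way."""
    if first.thread == second.thread:
        return False
    if not (first.is_write or second.is_write):
        return False
    ordered = happens_before(first.clock, second.clock) or happens_before(
        second.clock, first.clock
    )
    return not ordered


# Thread 0 writes x, then thread 1 writes x without synchronizing.
w0 = Access(thread=0, is_write=True, clock=(1, 0))
w1 = Access(thread=1, is_write=True, clock=(0, 1))
print(is_race(w0, w1))

# If thread 1 first acquires a lock released by thread 0, it inherits
# thread 0's clock entry, so the two writes become ordered.
w1_synced = Access(thread=1, is_write=True, clock=(1, 1))
print(is_race(w0, w1_synced))
```

The first call reports a race and the second does not. Read-read pairs and same-thread pairs are never reported, because neither kind can produce a data race under this definition.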
Submitted 22 December, 2023;
originally announced December 2023.
-
LingoQA: Video Question Answering for Autonomous Driving
Authors:
Ana-Maria Marcu,
Long Chen,
Jan Hünermann,
Alice Karnsund,
Benoit Hanotte,
Prajwal Chidananda,
Saurabh Nair,
Vijay Badrinarayanan,
Alex Kendall,
Jamie Shotton,
Elahe Arani,
Oleg Sinavski
Abstract:
Autonomous driving has long faced a challenge with public acceptance due to the lack of explainability in the decision-making process. Video question-answering (QA) in natural language provides the opportunity for bridging this gap. Nonetheless, evaluating the performance of Video QA models has proved particularly tough due to the absence of comprehensive benchmarks. To fill this gap, we introduce LingoQA, a benchmark specifically for autonomous driving Video QA. The LingoQA trainable metric demonstrates a 0.95 Spearman correlation coefficient with human evaluations. We introduce a Video QA dataset of central London consisting of 419k samples that we release with the paper. We establish a baseline vision-language model and run extensive ablation studies to understand its performance.
Submitted 19 March, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
-
Automated Clinical Coding for Outpatient Departments
Authors:
Viktor Schlegel,
Abhinav Ramesh Kashyap,
Thanh-Tung Nguyen,
Tsung-Han Yang,
Vijay Prakash Dwivedi,
Wei-Hsian Yin,
Jeng Wei,
Stefan Winkler
Abstract:
Computerised clinical coding approaches aim to automate the process of assigning a set of codes to medical records. While there is active research pushing the state of the art on clinical coding for hospitalized patients, the outpatient setting -- where doctors tend to non-hospitalised patients -- is overlooked. Although both settings can be formalised as a multi-label classification task, they present unique and distinct challenges, which raises the question of whether the success of inpatient clinical coding approaches translates to the outpatient setting. This paper is the first to investigate how well state-of-the-art deep learning-based clinical coding approaches work in the outpatient setting at hospital scale. To this end, we collect a large outpatient dataset comprising over 7 million notes documenting over half a million patients. We adapt four state-of-the-art clinical coding approaches to this setting and evaluate their potential to assist coders. We find evidence that clinical coding in outpatient settings can benefit from more innovations in popular inpatient coding benchmarks. A deeper analysis of the factors contributing to the success -- amount and form of data and choice of document representation -- reveals the presence of easy-to-solve examples, the coding of which can be completely automated with a low error rate.
Submitted 24 December, 2023; v1 submitted 20 December, 2023;
originally announced December 2023.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Graph Transformers for Large Graphs
Authors:
Vijay Prakash Dwivedi,
Yozen Liu,
Anh Tuan Luu,
Xavier Bresson,
Neil Shah,
Tong Zhao
Abstract:
Transformers have recently emerged as powerful neural networks for graph learning, showcasing state-of-the-art performance on several graph property prediction tasks. However, these results have been limited to small-scale graphs, where the computational feasibility of the global attention mechanism is possible. The next goal is to scale up these architectures to handle very large graphs on the scale of millions or even billions of nodes. With large-scale graphs, global attention learning is proven impractical due to its quadratic complexity w.r.t. the number of nodes. On the other hand, neighborhood sampling techniques become essential to manage large graph sizes, yet finding the optimal trade-off between speed and accuracy with sampling techniques remains challenging. This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints for developing scalable graph transformer (GT) architectures. We argue such GT requires layers that can adeptly learn both local and global graph representations while swiftly sampling the graph topology. As such, a key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism that encompasses a 4-hop reception field, but achieved through just 2-hop operations. This local node embedding is then integrated with a global node embedding, acquired via another self-attention layer with an approximate global codebook, before finally being sent through a downstream layer for node predictions. The proposed GT framework, named LargeGT, overcomes previous computational bottlenecks and is validated on three large-scale node classification benchmarks. We report a 3x speedup and 16.8% performance gain on ogbn-products and snap-patents, while we also scale LargeGT on ogbn-papers100M with a 5.9% performance improvement.
Submitted 18 December, 2023;
originally announced December 2023.
-
A Survey of Side-Channel Attacks in Context of Cache -- Taxonomies, Analysis and Mitigation
Authors:
Ankit Pulkit,
Smita Naval,
Vijay Laxmi
Abstract:
Side-channel attacks have become prominent attack surfaces in cyberspace. Attackers use the side information generated by the system while performing a task. Among the various side-channel attacks, cache side-channel attacks are leading as there has been an enormous growth in cache memory size in the last decade, especially Last Level Cache (LLC). The adversary infers the information from the observable behavior of shared cache memory. This paper covers the detailed study of cache side-channel attacks and compares different microarchitectures in the context of side-channel attacks. Our main contributions are: (1) We have summarized the fundamentals and essentials of side-channel attacks and various attack surfaces (taxonomies). We also discussed different exploitation techniques, highlighting their capabilities and limitations. (2) We discussed cache side-channel attacks and analyzed the existing literature on cache side-channel attacks on various parameters like microarchitectures, cross-core exploitation, methodology, target, etc. (3) We discussed the detailed analysis of the existing mitigation strategies to prevent cache side-channel attacks. The analysis includes hardware- and software-based countermeasures, examining their strengths and weaknesses. We also discussed the challenges and trade-offs associated with mitigation strategies. This survey aims to give the research community a deeper understanding of the threats posed by these attacks, along with valuable insights into effective defense mechanisms.
Submitted 18 December, 2023;
originally announced December 2023.
-
Indecomposable integrally closed modules of rank 3 over two-dimensional regular local rings
Authors:
Futoshi Hayasaka,
Vijay Kodiyalam
Abstract:
We characterise ideals in two-dimensional regular local rings that arise as ideals of maximal minors of indecomposable integrally closed modules of rank three.
Submitted 18 December, 2023;
originally announced December 2023.
-
Nonadiabatic transitions during a passage near a critical point
Authors:
Nikolai A. Sinitsyn,
Vijay Ganesh Sadhasivam,
Fumika Suzuki
Abstract:
The passage through a critical point of a many-body quantum system leads to abundant nonadiabatic excitations. Here, we explore a regime in which the critical point is not crossed although the system passes slowly very close to it. We show that the leading exponent for the excitation probability can then be obtained by standard arguments of the Dykhne formula, but the exponential prefactor is no longer simple and behaves as a power law in the characteristic transition rate. We derive this prefactor for the nonlinear Landau-Zener (nLZ) model by adjusting Dykhne's approach. Then, we introduce an exactly solvable model of the transition near a critical point in the Stark ladder. We derive the number of the excitations for it without approximations, and find qualitatively similar results for the excitation scaling.
Submitted 16 February, 2024; v1 submitted 17 December, 2023;
originally announced December 2023.
-
A Comparative Analysis of Large Language Models for Code Documentation Generation
Authors:
Shubhang Shekhar Dvivedi,
Vyshnav Vijay,
Sai Leela Rahul Pujari,
Shoumik Lodh,
Dhruv Kumar
Abstract:
This paper presents a comprehensive comparative analysis of Large Language Models (LLMs) for generation of code documentation. Code documentation is an essential part of the software writing process. The paper evaluates models such as GPT-3.5, GPT-4, Bard, Llama2, and Starchat on various parameters like Accuracy, Completeness, Relevance, Understandability, Readability and Time Taken for different levels of code documentation. Our evaluation employs a checklist-based system to minimize subjectivity, providing a more objective assessment. We find that, barring Starchat, all LLMs consistently outperform the original documentation. Notably, closed-source models GPT-3.5, GPT-4, and Bard exhibit superior performance across various parameters compared to open-source/source-available LLMs, namely Llama 2 and StarChat. Considering the time taken for generation, GPT-4 demonstrated the longest duration, followed by Llama2 and Bard, with ChatGPT and Starchat having comparable generation times. Additionally, file-level documentation had a considerably worse performance across all parameters (except for time taken) as compared to inline and function-level documentation.
Submitted 27 April, 2024; v1 submitted 16 December, 2023;
originally announced December 2023.
-
Black Hole Spectroscopy for Precessing Binary Black Hole Coalescences
Authors:
Hengrui Zhu,
Harrison Siegel,
Keefe Mitman,
Maximiliano Isi,
Will M. Farr,
Michael Boyle,
Nils Deppe,
Lawrence E. Kidder,
Sizheng Ma,
Jordan Moxon,
Kyle C. Nelli,
Harald P. Pfeiffer,
Mark A. Scheel,
Saul A. Teukolsky,
William Throwe,
Vijay Varma,
Nils L. Vu
Abstract:
To accurately perform black hole spectroscopy, it is essential to know which quasinormal modes dominate astrophysical ringdown signals. In this Letter, we present a phenomenological description of the quasinormal modes that are excited in the ringdowns of comparable mass, quasi-circular precessing binary black hole coalescences. By analyzing an exhaustive catalog of numerical relativity simulations, we confirm that the relative fundamental quasinormal mode amplitudes of precessing systems are related to those of non-precessing systems by a simple rotation, and that additional structure in the spectrum is connected to the system's kick velocity and other asymmetries in the orbital dynamics. We find that the ringdowns of precessing systems need not be dominated by the ${(\ell,m)=(2,\pm 2)}$ quasinormal modes. These results build upon previous works on waveform modeling, and are consistent with a recent ringdown analysis of the LIGO-Virgo gravitational wave signal GW190521.
Submitted 13 December, 2023;
originally announced December 2023.
-
The entropy of finite gravitating regions
Authors:
Vijay Balasubramanian,
Charlie Cummings
Abstract:
We develop a formalism for calculating the entanglement entropy of an arbitrary spatial region of a gravitating spacetime at a moment of time symmetry. The crucial ingredient is a path integral over embeddings of the region into the overall spacetime, interpretable as a sum over the edge modes associated with the region. We find that the entanglement entropy of a gravitating region equals the minimal surface area among all regions that enclose it. This suggests a notion of "terrestrial holography" where regions of space can encode larger ones, in contrast to the standard form of holography, in which degrees of freedom on the celestial sphere at the boundary of the universe encode the interior.
Submitted 11 January, 2024; v1 submitted 13 December, 2023;
originally announced December 2023.
-
Shape-dependent motility of polar inclusions in active baths
Authors:
Pritha Dolai,
Aditya Singh Rajput,
K. Vijay Kumar
Abstract:
Collections of persistently moving active particles are an example of a nonequilibrium heat bath. One way to study the nature of nonequilibrium fluctuations in such systems is to follow the dynamics of an embedded probe particle. With this aim, we study the dynamics of an anisotropic inclusion embedded in a bath of active particles. By studying various statistical correlation functions of the dynamics, we show that the emergent motility of this inclusion depends on its shape as well as the properties of the active bath. We demonstrate that both the decorrelation time of the net force on the inclusion and the dwell time of bath particles in a geometrical trap on the inclusion have a non-monotonic dependence on its shape. We also find that the motility of the inclusion is optimal when the volume fraction of the active bath is close to the value for the onset of motility induced phase separation.
Submitted 12 December, 2023;
originally announced December 2023.
-
Geometric phases in generalized radical Floquet dynamics
Authors:
Brenden Roberts,
Sagar Vijay,
Arpit Dua
Abstract:
The Pancharatnam phase is a generalization of the Berry phase that applies to discrete sequences of quantum states. Here, we show that the Pancharatnam phase is a natural invariant for a wide class of quantum many-body dynamics involving measurements. We specifically investigate how a non-trivial Pancharatnam phase arises in the trajectories of Floquet quantum error-correcting codes and show that this phase can be extracted in a "computationally-assisted" interferometry protocol, involving additional post-processing based on the measurement record that defines a given quantum many-body trajectory. This Pancharatnam phase can also be directly related to the Berry phase accrued by continuous unitary evolution within a gapped phase. For the $\mathbb Z_2$ Floquet code of Hastings and Haah, we show that the associated family of unitary evolutions is the radical chiral Floquet phase. We demonstrate this correspondence explicitly by studying an exactly-solvable model of interacting spins.
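The gauge invariance that lets the Pancharatnam phase serve as an invariant can be checked numerically. The sketch below is not from the paper. It uses the standard discrete (Bargmann) form, $\gamma = -\arg\prod_k \langle\psi_k|\psi_{k+1}\rangle$ with $\psi_N \equiv \psi_0$, applied to a single qubit. The sign convention (chosen to match the usual Berry-phase sign) and the function names are assumptions. For the cycle $|0\rangle \to |+\rangle \to |{+i}\rangle$ the result is $-\pi/4$, which is minus half the solid angle ($\pi/2$) of the corresponding octant of the Bloch sphere.

```python
import cmath
import math
from typing import List, Sequence

State = Sequence[complex]


def overlap(a: State, b: State) -> complex:
    """Inner product <a|b>, conjugating the bra."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def pancharatnam_phase(states: List[State]) -> float:
    """Phase of the closed product <psi_0|psi_1><psi_1|psi_2>...<psi_{N-1}|psi_0>,
    using the (assumed) Berry-phase sign convention gamma = -arg(product)."""
    product = complex(1.0, 0.0)
    n = len(states)
    for k in range(n):
        product *= overlap(states[k], states[(k + 1) % n])
    return -cmath.phase(product)


s = 1 / math.sqrt(2)
zero = [1 + 0j, 0j]
plus = [s + 0j, s + 0j]
plus_i = [s + 0j, 1j * s]

gamma = pancharatnam_phase([zero, plus, plus_i])
print(round(gamma, 6), round(-math.pi / 4, 6))

# Multiplying any state by a global phase (a gauge change) leaves gamma unchanged.
rephased = [cmath.exp(0.7j) * c for c in plus]
print(math.isclose(pancharatnam_phase([zero, rephased, plus_i]), gamma))
```

Each state appears once as a ket and once as a bra in the closed product, so any phase attached to it cancels. An open sequence lacks this property, and its phase depends on the gauge.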
Submitted 7 December, 2023;
originally announced December 2023.
-
Quantum chaos, integrability, and late times in the Krylov basis
Authors:
Vijay Balasubramanian,
Javier M. Magan,
Qingyue Wu
Abstract:
Quantum chaotic systems are conjectured to display a spectrum whose fine-grained features (gaps and correlations) are well described by Random Matrix Theory (RMT). We propose and develop a complementary version of this conjecture: quantum chaotic systems display a Lanczos spectrum whose local means and covariances are well described by RMT. To support this proposal, we first demonstrate its validity in examples of chaotic and integrable systems. We then show that for Haar-random initial states in RMTs the mean and covariance of the Lanczos spectrum suffices to produce the full long time behavior of general survival probabilities including the spectral form factor, as well as the spread complexity. In addition, for initial states with continuous overlap with energy eigenstates, we analytically find the long time averages of the probabilities of Krylov basis elements in terms of the mean Lanczos spectrum. This analysis suggests a notion of eigenstate complexity, the statistics of which differentiate integrable systems and classes of quantum chaos. Finally, we clarify the relation between spread complexity and the universality classes of RMT by exploring various values of the Dyson index and Poisson distributed spectra.
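The "Lanczos spectrum" is the set of coefficients $a_n$ and $b_n$ produced by the Lanczos recursion. That recursion builds the Krylov basis from $|K_0\rangle = |\psi_0\rangle$ via $b_{n+1}|K_{n+1}\rangle = (H - a_n)|K_n\rangle - b_n|K_{n-1}\rangle$, with $a_n = \langle K_n|H|K_n\rangle$. The sketch below is an illustration, not the authors' code. It assumes a real symmetric $H$ given as nested lists and omits the re-orthogonalization that a numerically robust implementation needs for large systems.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(h: Matrix, v: Vector) -> Vector:
    return [sum(hij * vj for hij, vj in zip(row, v)) for row in h]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def lanczos(h: Matrix, psi0: Vector, tol: float = 1e-12) -> Tuple[List[float], List[float]]:
    """Return (a, b) with a = [a_0, ..., a_{K-1}] and b = [b_1, ..., b_{K-1}]
    for a real symmetric h. Stops early when the Krylov space is exhausted."""
    dim = len(psi0)
    norm0 = math.sqrt(dot(psi0, psi0))
    k_prev = [0.0] * dim                 # |K_{n-1}>, zero before the first step
    k_cur = [x / norm0 for x in psi0]    # |K_0> = normalized initial state
    b_cur = 0.0                          # b_0 = 0 by convention
    a: List[float] = []
    b: List[float] = []
    for n in range(dim):
        hk = matvec(h, k_cur)
        a_n = dot(k_cur, hk)
        a.append(a_n)
        if n == dim - 1:                 # Krylov space cannot exceed dim vectors
            break
        # Unnormalized next vector: (H - a_n)|K_n> - b_n|K_{n-1}>
        residual = [hv - a_n * kc - b_cur * kp for hv, kc, kp in zip(hk, k_cur, k_prev)]
        b_next = math.sqrt(dot(residual, residual))
        if b_next < tol:
            break
        b.append(b_next)
        k_prev, k_cur, b_cur = k_cur, [r / b_next for r in residual], b_next
    return a, b


print(lanczos([[1.0, 0.0], [0.0, 3.0]], [1.0, 1.0]))
```

For $H = \mathrm{diag}(1, 3)$ and an equal superposition of the two levels, the recursion gives $a_0 = a_1 = 2$ and $b_1 = 1$, up to rounding. The statistics studied in the paper are over such coefficient sequences for much larger random Hamiltonians.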
Submitted 6 December, 2023;
originally announced December 2023.
-
Auto DP-SGD: Dual Improvements of Privacy and Accuracy via Automatic Clipping Threshold and Noise Multiplier Estimation
Authors:
Sai Venkatesh Chilukoti,
Md Imran Hossen,
Liqun Shan,
Vijay Srinivas Tida,
Xiai Hei
Abstract:
DP-SGD has emerged as a popular method to protect personally identifiable information in deep learning applications. Unfortunately, DP-SGD's per-sample gradient clipping and uniform noise addition during training can significantly degrade model utility. To enhance the model's utility, researchers proposed various adaptive DP-SGD methods. However, we examine and discover that these techniques result in greater privacy leakage or lower accuracy than the traditional DP-SGD method, or lack evaluation on a complex data set such as CIFAR100. To address these limitations, we propose an Auto DP-SGD. Our method automates clipping threshold estimation based on the DL model's gradient norm and scales the gradients of each training sample without losing gradient information. This helps to improve the algorithm's utility while using a smaller privacy budget. To further improve accuracy, we introduce automatic noise multiplier decay mechanisms to decrease the noise multiplier after every epoch. Finally, we develop closed-form mathematical expressions using tCDP accountant for automatic noise multiplier and automatic clipping threshold estimation. Through extensive experimentation, we demonstrate that Auto DP-SGD outperforms existing SOTA DP-SGD methods in privacy and accuracy on various benchmark datasets. We also show that privacy can be improved by lowering the scale factor and using learning rate schedulers without significantly reducing accuracy. Specifically, Auto DP-SGD, when used with a step noise multiplier, improves accuracy by 3.20, 1.57, 6.73, and 1.42 for the MNIST, CIFAR10, CIFAR100, and AG News Corpus datasets, respectively. Furthermore, it obtains a substantial reduction in the privacy budget of 94.9, 79.16, 67.36, and 53.37 for the corresponding data sets.
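As background for the clipping and noise steps the abstract refers to, the conventional DP-SGD gradient aggregation can be sketched as follows. It clips each per-sample gradient to L2 norm at most $C$, sums the clipped gradients, adds Gaussian noise with standard deviation $\sigma C$, and averages over the batch. This is the standard baseline, not Auto DP-SGD's automatic threshold estimation or noise-multiplier decay. The function names are hypothetical, and no privacy accounting (such as the tCDP accountant mentioned in the abstract) is performed.

```python
import math
import random
from typing import List, Optional

Vector = List[float]


def l2_norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def dp_sgd_aggregate(
    per_sample_grads: List[Vector],
    clip_norm: float,
    noise_multiplier: float,
    rng: Optional[random.Random] = None,
) -> Vector:
    """Baseline DP-SGD step: clip each per-sample gradient to L2 norm <= clip_norm,
    sum, add N(0, (noise_multiplier * clip_norm)^2) noise per coordinate, and
    divide by the batch size."""
    rng = rng or random.Random()
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for g in per_sample_grads:
        norm = l2_norm(g)
        # Scale down only gradients whose norm exceeds the threshold.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, x in enumerate(g):
            total[i] += x * scale
    sigma = noise_multiplier * clip_norm
    batch = len(per_sample_grads)
    return [(t + rng.gauss(0.0, sigma)) / batch for t in total]


# Noise disabled to expose the clipping: [3, 4] is scaled down to [0.6, 0.8],
# and [0.3, 0.4] (norm 0.5) passes through unchanged.
print(dp_sgd_aggregate([[3.0, 4.0], [0.3, 0.4]], clip_norm=1.0, noise_multiplier=0.0))
```

Per the abstract, Auto DP-SGD replaces the fixed threshold with an estimate based on gradient norms and lowers the noise multiplier after every epoch. In this sketch, that would amount to supplying different `clip_norm` and `noise_multiplier` values over training. The paper's closed-form expressions for those values are not reproduced here.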
Submitted 4 December, 2023;
originally announced December 2023.
-
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
Authors:
Kristen Grauman,
Andrew Westbury,
Lorenzo Torresani,
Kris Kitani,
Jitendra Malik,
Triantafyllos Afouras,
Kumar Ashutosh,
Vijay Baiyya,
Siddhant Bansal,
Bikram Boote,
Eugene Byrne,
Zach Chavis,
Joya Chen,
Feng Cheng,
Fu-Jen Chu,
Sean Crane,
Avijit Dasgupta,
Jing Dong,
Maria Escobar,
Cristhian Forigua,
Abrham Gebreselasie,
Sanjay Haresh,
Jing Huang,
Md Mohaiminul Islam,
Suyog Jain
, et al. (76 additional authors not shown)
Abstract:
We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures from 1 to 42 minutes each and 1,286 hours of video combined. The multimodal nature of the dataset is unprecedented: the video is accompanied by multichannel audio, eye gaze, 3D point clouds, camera poses, IMU, and multiple paired language descriptions -- including a novel "expert commentary" done by coaches and teachers and tailored to the skilled-activity domain. To push the frontier of first-person video understanding of skilled human activity, we also present a suite of benchmark tasks and their annotations, including fine-grained activity understanding, proficiency estimation, cross-view translation, and 3D hand/body pose. All resources are open sourced to fuel new research in the community. Project page: http://ego-exo4d-data.org/
Submitted 29 April, 2024; v1 submitted 30 November, 2023;
originally announced November 2023.
-
Is stochastic thermodynamics the key to understanding the energy costs of computation?
Authors:
David Wolpert,
Jan Korbel,
Christopher Lynn,
Farita Tasnim,
Joshua Grochow,
Gülce Kardeş,
James Aimone,
Vijay Balasubramanian,
Eric de Giuli,
David Doty,
Nahuel Freitas,
Matteo Marsili,
Thomas E. Ouldridge,
Andrea Richa,
Paul Riechers,
Édgar Roldán,
Brenda Rubenstein,
Zoltan Toroczkai,
Joseph Paradiso
Abstract:
The relationship between the thermodynamic and computational characteristics of dynamical physical systems has been a major theoretical interest since at least the 19th century, and has been of increasing practical importance as the energetic cost of digital devices has exploded over the last half century. One of the most important thermodynamic features of real-world computers is that they operate very far from thermal equilibrium, in finite time, with many quickly (co-)evolving degrees of freedom. Such computers also must almost always obey multiple physical constraints on how they work. For example, all modern digital computers are periodic processes, governed by a global clock. Another example is that many computers are modular, hierarchical systems, with strong restrictions on the connectivity of their subsystems. These properties hold both for naturally occurring computers, like brains or eukaryotic cells, as well as digital systems. These features of real-world computers are absent in 20th century analyses of the thermodynamics of computational processes, which focused on quasi-statically slow processes. However, the field of stochastic thermodynamics has been developed in the last few decades - and it provides the formal tools for analyzing systems that have exactly these features of real-world computers. We argue here that these tools, together with other tools currently being developed in stochastic thermodynamics, may help us understand at a far deeper level just how the fundamental physical properties of dynamic systems are related to the computation that they perform.
Submitted 30 November, 2023; v1 submitted 28 November, 2023;
originally announced November 2023.
-
Perspective on new implementations of atomtronic circuits
Authors:
Juan Polo,
Wayne J. Chetcuti,
Enrico C. Domanti,
Philip Kitson,
Andreas Osterloh,
Francesco Perciavalle,
Vijay Pal Singh,
Luigi Amico
Abstract:
In this article, we provide perspectives for atomtronics circuits on quantum technology platforms beyond simple bosonic or fermionic cold atom matter-wave currents. Specifically, we consider (i) matter-wave schemes with multi-component quantum fluids; (ii) networks of Rydberg atoms that provide a radically new concept of atomtronics circuits in which the flow, rather than in terms of matter, occurs through excitations; (iii) hybrid matter-wave circuits - cavities systems that can be used to study atomtronic circuits beyond the standard solutions and provide new schemes for integrated matter-wave networks. We also sketch how driving these systems can open new pathways for atomtronics.
Submitted 27 November, 2023;
originally announced November 2023.