Search | arXiv e-print repository

Thinking beyond Bias: Analyzing Multifaceted Impacts and Implications of AI on Gendered Labour

Authors: Satyam Mohla, Bishnupriya Bagh, Anupam Guha

Abstract: Artificial Intelligence, with its multifaceted technologies and integral role in global production, significantly impacts gender dynamics, particularly in gendered labor. This paper emphasizes the need to explore AI's broader impacts on gendered labor beyond its current emphasis on the generation and perpetuation of epistemic biases. We draw attention to how the AI industry, as an integral component of the larger economic structure, is transforming the nature of work. It is expanding the prevalence of platform-based work models and exacerbating job insecurity, particularly for women. Of critical concern is the increasing exclusion of women from meaningful engagement in the digital labor force. This issue, often overlooked, demands urgent attention from the AI research community. Understanding AI's multifaceted role in gendered labor requires a nuanced examination of economic transformation and its implications for gender equity. By shedding light on these intersections, this paper aims to stimulate in-depth discussions and catalyze targeted actions aimed at mitigating the gender disparities accentuated by AI-driven transformations.

Submitted 23 June, 2024; originally announced June 2024.

Comments: Under review. An unindexed peer-reviewed working draft was accepted for presentation at IJCAI 2021 Workshop on AI for Social Good organized by Harvard CRCS

arXiv:2405.20179 [pdf, other]

Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning CodeLLMs

Authors: Zichao Hu, Junyi Jessy Li, Arjun Guha, Joydeep Biswas

Abstract: Large language models (LLMs) have shown great promise at generating robot programs from natural language given domain-specific robot application programming interfaces (APIs). However, the performance gap between proprietary LLMs and smaller open-weight LLMs remains wide. This raises a question: Can we fine-tune smaller open-weight LLMs for generating domain-specific robot programs to close the performance gap with proprietary LLMs? While Self-Instruct is a promising solution by generating a diverse set of training data, it cannot verify the correctness of these programs. In contrast, a robot simulator with a well-defined world can identify execution errors but limits the diversity of programs that it can verify. In this work, we introduce Robo-Instruct, which brings the best of both worlds -- it promotes the diversity of Self-Instruct while providing the correctness of simulator-based checking. Robo-Instruct introduces RoboSim to synthesize a consistent world state on the fly by inferring properties relevant to the program being checked, and simulating actions accordingly. Furthermore, the instructions and programs generated by Self-Instruct may be subtly inconsistent -- such as the program missing a step implied by the instruction. Robo-Instruct further addresses this with InstAlign, an instruction-program alignment procedure that revises the task instruction to reflect the actual results of the generated program. Given a few seed task descriptions and the robot APIs, Robo-Instruct is capable of generating a training dataset using only a small open-weight model. This dataset can then be used to fine-tune small open-weight language models, enabling them to match or even exceed the performance of several proprietary LLMs, such as GPT-3.5-Turbo and Gemini-Pro.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2404.10009 [pdf, ps, other]

Relating interfacial Rossby wave interaction in shear flows with Feynman's two-state coupled quantum system model for the Josephson junction

Authors: Eyal Heifetz, Nimrod Bratspiess, Anirban Guha, Leo Maas

Abstract: Here we show how Feynman's simplified model for the Josephson junction, as a macroscopic two-state coupled quantum system, has a one-to-one correspondence with the stable dynamics of two interfacial Rossby waves in piecewise linear shear flows. The conservation of electric charge and energy of the superconducting electron gas layers become respectively equivalent to the conservation of wave action and pseudoenergy of the Rossby waves. Quantum-like tunneling is enabled via action-at-a-distance between the two Rossby waves. Furthermore, the quantum-like phenomenon of avoided crossing between eigenstates, described by the Klein-Gordon equation, is obtained as well in the classical shear flow system. In the latter, it results from the inherent difference in pseudoenergy between the in-phase and anti-phased normal modes of the interfacial waves. This provides an intuitive physical meaning to the role of the wavefunction's phase in the quantum system. A partial analog to the quantum collapse of the wavefunction is also obtained due to the existence of a separatrix between "normal mode regions of influence" on the phase plane, describing the system's dynamics. As for two-state quantum bits (qubits), the two-Rossby wave system solutions can be represented on a Bloch sphere, where the Hadamard gate transforms the two normal modes/eigenstates into an intuitive computational basis in which only one interface is occupied by a Rossby wave. Yet, it is a classical system which lacks exact analogs to collapse and entanglement, and thus cannot be used for quantum computation, even in principle.

Submitted 11 April, 2024; originally announced April 2024.
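
For orientation, the two-state model named in the abstract above is commonly written as a pair of coupled amplitude equations, following the standard presentation in the Feynman Lectures (Vol. III). The notation below is that textbook notation and is an assumption here; the paper itself may use different symbols. $\psi_1$ and $\psi_2$ are the amplitudes on the two sides of the junction, $U_1$ and $U_2$ their energies, and $K$ the coupling constant.

```latex
\begin{aligned}
i\hbar \frac{\partial \psi_1}{\partial t} &= U_1 \psi_1 + K \psi_2, \\
i\hbar \frac{\partial \psi_2}{\partial t} &= U_2 \psi_2 + K \psi_1,
\qquad \psi_j = \sqrt{\rho_j}\, e^{i\theta_j}, \quad j = 1, 2.
\end{aligned}
```

In this form the densities $\rho_j$ and phases $\theta_j$ are the quantities that the abstract relates to wave action and wave phase in the Rossby-wave system.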

arXiv:2404.01903 [pdf, other]

Activation Steering for Robust Type Prediction in CodeLLMs

Authors: Francesca Lucchetti, Arjun Guha

Abstract: Contemporary LLMs pretrained on code are capable of succeeding at a wide variety of programming tasks. However, their performance is very sensitive to syntactic features, such as the names of variables and types, the structure of code, and presence of type hints. We contribute an inference-time technique to make CodeLLMs more robust to syntactic distractors that are semantically irrelevant. Our methodology relies on activation steering, which involves editing internal model activations to steer the model towards the correct prediction. We contribute a novel way to construct steering vectors by taking inspiration from mutation testing, which constructs minimal semantics-breaking code edits. In contrast, we construct steering vectors from semantics-preserving code edits. We apply our approach to the task of type prediction for the gradually typed languages Python and TypeScript. This approach corrects up to 90% of type mispredictions. Finally, we show that steering vectors calculated from Python activations reliably correct type mispredictions in TypeScript, and vice versa. This result suggests that LLMs may be learning to transfer knowledge of types across programming languages.

Submitted 2 April, 2024; originally announced April 2024.

Comments: 16 pages, 7 figures
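
The activation steering idea in the abstract above can be illustrated with a minimal sketch. It assumes, as one common construction, that the steering vector is the difference between the mean activation on inputs the model handles correctly and the mean activation on edited inputs it mispredicts. The abstract does not state the exact construction, so the function names, the mean-difference rule, and the toy numbers are all hypothetical.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(acts_target, acts_source):
    """Assumed construction: mean(target activations) - mean(source activations)."""
    mean_target = mean_vector(acts_target)
    mean_source = mean_vector(acts_source)
    return [t - s for t, s in zip(mean_target, mean_source)]


def apply_steering(hidden, vector, scale=1.0):
    """Add the scaled steering vector to every token's activation vector."""
    return [[h + scale * v for h, v in zip(row, vector)] for row in hidden]


# Toy 2-dimensional activations (hypothetical values, not model outputs).
acts_target = [[1.0, 2.0], [3.0, 4.0]]   # e.g. original programs, correct prediction
acts_source = [[0.0, 1.0], [2.0, 1.0]]   # e.g. edited programs, misprediction

vector = steering_vector(acts_target, acts_source)
hidden = [[0.0, 0.0], [1.0, -1.0]]       # activations at inference time
steered = apply_steering(hidden, vector)
```

In a real model the addition would be applied to an internal layer's activations during the forward pass; the lists here only stand in for those tensors.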

arXiv:2402.19173 [pdf, other]

StarCoder 2 and The Stack v2: The Next Generation

Authors: Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, et al. (41 additional authors not shown)

Abstract: The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data sources, such as GitHub pull requests, Kaggle notebooks, and code documentation. This results in a training set that is 4x larger than the first StarCoder dataset. We train StarCoder2 models with 3B, 7B, and 15B parameters on 3.3 to 4.3 trillion tokens and thoroughly evaluate them on a comprehensive set of Code LLM benchmarks. We find that our small model, StarCoder2-3B, outperforms other Code LLMs of similar size on most benchmarks, and also outperforms StarCoderBase-15B. Our large model, StarCoder2-15B, significantly outperforms other models of comparable size. In addition, it matches or outperforms CodeLlama-34B, a model more than twice its size. Although DeepSeekCoder-33B is the best-performing model at code completion for high-resource languages, we find that StarCoder2-15B outperforms it on math and code reasoning benchmarks, as well as several low-resource languages. We make the model weights available under an OpenRAIL license and ensure full transparency regarding the training data by releasing the SoftWare Heritage persistent IDentifiers (SWHIDs) of the source code data.

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.13795 [pdf, other]

Estimating the dark matter halo velocity and surface temperature of some known pulsars due to dark matter capture

Authors: Debashree Sen, Atanu Guha

Abstract: Considering four known pulsars J1906+0746, J1933-6211, J2043+1711 and the Vela pulsar, we study the scenario of dark matter (DM) capture in neutron stars (NSs). For the purpose we choose four well-known relativistic mean field models to obtain the radius corresponding to the observed mass of these pulsars and consequently the scattering cross-section of DM with the different particles of the $β$ stable NS matter. The estimated DM-electron scattering cross-section in this work is stringent compared to the current direct detection experimental probe. We then compute the lower limit on the halo velocity of DM for the four pulsars from the knowledge of the upper limit on effective temperature of the individual pulsars. We also extend our work to calculate the value of the effective temperature with the different models using the fitted values of the halo velocity of DM of the four pulsars with respect to their distances from the galactic center. Our findings are consistent with the analysis of the observed data.

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2401.15232 [pdf, other]

How Beginning Programmers and Code LLMs (Mis)read Each Other

Authors: Sydney Nguyen, Hannah McLean Babe, Yangtian Zi, Arjun Guha, Carolyn Jane Anderson, Molly Q Feldman

Abstract: Generative AI models, specifically large language models (LLMs), have made strides towards the long-standing goal of text-to-code generation. This progress has invited numerous studies of user interaction. However, less is known about the struggles and strategies of non-experts, for whom each step of the text-to-code problem presents challenges: describing their intent in natural language, evaluating the correctness of generated code, and editing prompts when the generated code is incorrect. This paper presents a large-scale controlled study of how 120 beginning coders across three academic institutions approach writing and editing prompts. A novel experimental design allows us to target specific steps in the text-to-code process and reveals that beginners struggle with writing and editing prompts, even for problems at their skill level and when correctness is automatically determined. Our mixed-methods evaluation provides insight into student processes and perceptions with key implications for non-expert Code LLM use within and outside of education.

Submitted 26 January, 2024; originally announced January 2024.

Comments: Conditionally Accepted to CHI 2024

arXiv:2401.14419 [pdf, other]

doi 10.1103/PhysRevD.109.043038

Constraining the mass of fermionic dark matter from its feeble interaction with hadronic matter via dark mediators in neutron stars

Authors: Atanu Guha, Debashree Sen

Abstract: Considering ten well-known relativistic mean field models, we invoke feeble interaction between hadronic matter and fermionic dark matter (DM) $χ$ via new physics scalar ($φ$) and vector ($ξ$) mediators in neutron star core, thereby forming DM admixed neutron stars (DMANSs). The chosen masses of the DM fermion ($m_χ$) and the mediators ($m_φ$ and $m_ξ$) are consistent with the self-interaction constraint from Bullet cluster while their respective couplings ($y_φ$ and $y_ξ$) are also constrained by the present day relic abundance. Assuming that both $φ$ and $ξ$ contribute equally to the relic abundance, we compute the equation of state of the DMANSs and consequently their structural properties. We found that for a particular (constant) DM density, the presence of lighter DM results in more massive DMANSs with larger radius. In the light of the various recent constraints like those from the massive pulsar PSR J0740+6620, the gravitational wave (GW170817) data and the results of NICER experiments for PSR J0030+0451 and PSR J0740+6620, we provide a bound on $m_χ$ within the framework of the present work as $m_χ\approx$ (0.1 $-$ 30) GeV for a wide range of fixed DM Fermi momenta $k_F^χ$=(0.01 $-$ 0.06) GeV. In the case of the hadronic models that yield larger radii corresponding to the low mass neutron stars in the no-DM scenario, interaction with comparatively heavier DM fermion is necessary in order to ensure that the DMANSs obtained with such models satisfy the radius constraints from both GW170817 and NICER data for PSR J0030+0451.

Submitted 22 January, 2024; originally announced January 2024.

Comments: Accepted for publication in Phys. Rev. D; 16 pages, 10 figures

Journal ref: Phys. Rev. D, Vol. 109, No. 4 (2024)

arXiv:2401.07750 [pdf, other]

Constraints on cosmic-ray boosted dark matter with realistic cross section

Authors: Atanu Guha, Jong-Chul Park

Abstract: Sub-MeV cold dark-matter particles are unable to produce electronic recoil in conventional dark-matter direct detection experiments such as XENONnT and LUX-ZEPLIN above the detector threshold. The mechanism of boosted dark matter comes into picture to constrain the parameter space of such low mass dark matter from direct detection experiments. We consider the effect of the leading components of cosmic rays to boost the cold dark matter, which results in significant improvements on the exclusion limits compared to the existing ones. To present concrete study results, we choose to work on models consisting of a dark-matter particle $χ$ with an additional $U(1)'$ gauge symmetry including the secluded dark photon, $U(1)_{B-L}$, and $U(1)_{L_e-L_μ}$. We find that the energy dependence of the scattering cross section plays a crucial role in improving the constraints. In addition, we systematically estimate the Earth shielding effect on boosted dark matter in losing energy while traveling to the underground detector through the Earth.

Submitted 15 January, 2024; originally announced January 2024.

Comments: 27 pages, 10 figures, 3 appendices

arXiv:2312.12450 [pdf, other]

Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions

Authors: Federico Cassano, Luisa Li, Akul Sethi, Noah Shinn, Abby Brennan-Jones, Jacob Ginesin, Edward Berman, George Chakhnashvili, Anton Lozhkov, Carolyn Jane Anderson, Arjun Guha

Abstract: A significant amount of research is focused on developing and evaluating large language models for a variety of code synthesis tasks. These include synthesizing code from natural language, synthesizing tests from code, and synthesizing explanations of code. In contrast, the behavior of instructional code editing with LLMs is understudied. These are tasks in which the model is provided a block of code and an instruction to modify the code. The editing instruction may ask for a feature to be added or removed, describe a bug and ask for a fix, or ask for a different kind of solution. We introduce a carefully crafted benchmark of code editing tasks and use it to evaluate several cutting edge LLMs. Our evaluation exposes a significant gap between the capabilities of state-of-the-art open and closed models. For example, even GPT-3.5-Turbo is better than the best open model at code editing tasks. We also introduce a new, carefully curated, permissively licensed training dataset of code editing tasks coupled with natural language instructions. Using this training dataset, we show that we can fine-tune open Code LLMs to significantly improve their code editing capabilities, closing the gap between open and closed models. All code, data, and models are available at https://github.com/nuprl/CanItEdit.

Submitted 19 March, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

arXiv:2311.11183 [pdf, other]

doi 10.1109/LRA.2024.3360020

Deploying and Evaluating LLMs to Program Service Mobile Robots

Authors: Zichao Hu, Francesca Lucchetti, Claire Schlesinger, Yash Saxena, Anders Freeman, Sadanand Modak, Arjun Guha, Joydeep Biswas

Abstract: Recent advancements in large language models (LLMs) have spurred interest in using them for generating robot programs from natural language, with promising initial results. We investigate the use of LLMs to generate programs for service mobile robots leveraging mobility, perception, and human interaction skills, and where accurate sequencing and ordering of actions is crucial for success. We contribute CodeBotler, an open-source robot-agnostic tool to program service mobile robots from natural language, and RoboEval, a benchmark for evaluating LLMs' capabilities of generating programs to complete service robot tasks. CodeBotler performs program generation via few-shot prompting of LLMs with an embedded domain-specific language (eDSL) in Python, and leverages skill abstractions to deploy generated programs on any general-purpose mobile robot. RoboEval evaluates the correctness of generated programs by checking execution traces starting with multiple initial states, and checking whether the traces satisfy temporal logic properties that encode correctness for each task. RoboEval also includes multiple prompts per task to test for the robustness of program generation. We evaluate several popular state-of-the-art LLMs with the RoboEval benchmark, and perform a thorough analysis of the modes of failures, resulting in a taxonomy that highlights common pitfalls of LLMs at generating robot programs. We release our code and benchmark at https://amrl.cs.utexas.edu/codebotler/.

Submitted 21 February, 2024; v1 submitted 18 November, 2023; originally announced November 2023.

Comments: 8 pages, Accepted at IEEE Robotics and Automation Letters (RA-L)

Journal ref: IEEE Robotics and Automation Letters, vol. 9, no. 3, pp. 2853-2860, March 2024
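
The trace checking described in the abstract above can be sketched in a few lines. This is an illustrative simplification, not RoboEval's actual checker or API: a trace is taken to be a list of (action, argument) tuples, and two ordering properties in the spirit of temporal logic are checked by hand. All event names and helper functions are hypothetical.

```python
def eventually(trace, pred, start=0):
    """Index of the first event at or after `start` that satisfies pred, else None."""
    for i in range(start, len(trace)):
        if pred(trace[i]):
            return i
    return None


def holds_in_order(trace, preds):
    """True if events matching each predicate occur in the given order."""
    pos = 0
    for pred in preds:
        i = eventually(trace, pred, pos)
        if i is None:
            return False
        pos = i + 1
    return True


def never_before(trace, bad, until):
    """True if no `bad` event occurs before the first `until` event (weak until)."""
    for event in trace:
        if until(event):
            return True
        if bad(event):
            return False
    return True


def is_pick(event):
    return event == ("pick", "mug")


def is_place(event):
    return event == ("place", "mug")


# Two hypothetical execution traces for a "fetch the mug" task.
good_trace = [("go_to", "kitchen"), ("pick", "mug"), ("go_to", "office"), ("place", "mug")]
bad_trace = [("go_to", "office"), ("place", "mug"), ("go_to", "kitchen"), ("pick", "mug")]

good_ok = holds_in_order(good_trace, [is_pick, is_place]) and never_before(good_trace, is_place, is_pick)
bad_ok = holds_in_order(bad_trace, [is_pick, is_place]) and never_before(bad_trace, is_place, is_pick)
```

Running the same property over traces produced from several initial states, as the abstract describes, would amount to calling these checks once per trace and requiring all of them to hold.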

arXiv:2309.14054 [pdf, other]

Adapt then Unlearn: Exploiting Parameter Space Semantics for Unlearning in Generative Adversarial Networks

Authors: Piyush Tiwary, Atri Guha, Subhodip Panda, Prathosh A. P

Abstract: The increased attention to regulating the outputs of deep generative models, driven by growing concerns about privacy and regulatory compliance, has highlighted the need for effective control over these models. This necessity arises from instances where generative models produce outputs containing undesirable, offensive, or potentially harmful content. To tackle this challenge, the concept of machine unlearning has emerged, aiming to forget specific learned information or to erase the influence of undesired data subsets from a trained model. The objective of this work is to prevent the generation of outputs containing undesired features from a pre-trained GAN where the underlying training data set is inaccessible. Our approach is inspired by a crucial observation: the parameter space of GANs exhibits meaningful directions that can be leveraged to suppress specific undesired features. However, such directions usually result in the degradation of the quality of generated samples. Our proposed method, known as 'Adapt-then-Unlearn,' excels at unlearning such undesirable features while also maintaining the quality of generated samples. This method unfolds in two stages: in the initial stage, we adapt the pre-trained GAN using negative samples provided by the user, while in the subsequent stage, we focus on unlearning the undesired feature. During the latter phase, we train the pre-trained GAN using positive samples, incorporating a repulsion regularizer. This regularizer encourages the model's parameters to be away from the parameters associated with the adapted model from the first stage while also maintaining the quality of generated samples. To the best of our knowledge, our approach stands as the first method addressing unlearning in GANs. We validate the effectiveness of our method through comprehensive experiments.

Submitted 25 September, 2023; originally announced September 2023.

Comments: 15 pages, 12 figures

arXiv:2308.12545 [pdf, other]

npm-follower: A Complete Dataset Tracking the NPM Ecosystem

Authors: Donald Pinckney, Federico Cassano, Arjun Guha, Jonathan Bell

Abstract: Software developers typically rely upon a large network of dependencies to build their applications. For instance, the NPM package repository contains over 3 million packages and serves tens of billions of downloads weekly. Understanding the structure and nature of packages, dependencies, and published code requires datasets that provide researchers with easy access to metadata and code of packages. However, prior work on NPM dataset construction typically has two limitations: 1) only metadata is scraped, and 2) packages or versions that are deleted from NPM cannot be scraped. Over 330,000 versions of packages were deleted from NPM between July 2022 and May 2023. This data is critical for researchers as it often pertains to important questions of security and malware. We present npm-follower, a dataset and crawling architecture which archives metadata and code of all packages and versions as they are published, and is thus able to retain data which is later deleted. The dataset currently includes over 35 million versions of packages, and grows at a rate of about 1 million versions per month. The dataset is designed to be easily used by researchers answering questions involving either metadata or program analysis. Both the code and dataset are available at https://dependencies.science.

Submitted 24 August, 2023; originally announced August 2023.

arXiv:2308.09895 [pdf, other]

Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs

Authors: Federico Cassano, John Gouwar, Francesca Lucchetti, Claire Schlesinger, Anders Freeman, Carolyn Jane Anderson, Molly Q Feldman, Michael Greenberg, Abhinav Jangda, Arjun Guha

Abstract: Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as building blocks for research in programming languages and software engineering. However, Code LLMs produce impressive results on programming languages that are well represented in their training data (e.g., Java, Python, or JavaScript), but struggle with low-resource languages that have limited training data available. Low resource languages include OCaml, Racket, and several others. This paper presents an effective approach for boosting the performance of Code LLMs on low-resource languages using semi-synthetic data. Our approach, MultiPL-T, translates training data from high-resource languages into training data for low-resource languages in the following way. 1) We use a Code LLM to synthesize tests for commented code from a high-resource language, filtering out faulty tests and code with low test coverage. 2) We use a Code LLM to translate Python code to a target low-resource language, and use tests to validate the translation. We apply this approach to generate tens of thousands of validated training items for Julia, Lua, OCaml, R, and Racket. Furthermore, we use an open model (StarCoderBase) with open training data (The Stack), which allows us to decontaminate benchmarks, train models without violating licenses, and run experiments that could not otherwise be done. With MultiPL-T generated data, we present fine-tuned versions of StarCoderBase and Code Llama for Julia, Lua, OCaml, R, and Racket. On established benchmarks (MultiPL-E), these models outperform other open Code LLMs. The MultiPL-T approach is easy to apply to new languages, and is significantly more efficient and effective than alternatives such as training longer.

Submitted 10 February, 2024; v1 submitted 18 August, 2023; originally announced August 2023.
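
The test-based validation step in the abstract above (step 2) can be sketched as a filter that keeps only candidate translations passing every test. This is a heavily simplified stand-in: the candidates are plain Python callables rather than programs in a target language, no LLM is involved, and `passes_all`, `validated_items`, and the item layout are hypothetical names introduced for illustration.

```python
def passes_all(candidate, tests):
    """Run each (args, expected) test; a mismatch or any exception counts as failure."""
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            return False
    return True


def validated_items(source_items, translate_candidates):
    """Keep, per source item, the first candidate translation that passes its tests."""
    kept = []
    for item in source_items:
        for candidate in translate_candidates(item):
            if passes_all(candidate, item["tests"]):
                kept.append((item["name"], candidate))
                break
    return kept


def toy_candidates(item):
    """Stand-in for sampling translations from a model: one wrong, one right."""
    return [lambda a, b: a - b, lambda a, b: a + b]


# One toy source item with its tests as (arguments, expected result) pairs.
items = [{"name": "add", "tests": [((1, 2), 3), ((0, 0), 0)]}]
kept = validated_items(items, toy_candidates)
```

Items for which no candidate passes are simply dropped, which is the property that makes the resulting training set "validated" in the sense the abstract uses.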

arXiv:2308.08347 [pdf, ps, other]

Continuing WebAssembly with Effect Handlers

Authors: Luna Phipps-Costin, Andreas Rossberg, Arjun Guha, Daan Leijen, Daniel Hillerström, KC Sivaramakrishnan, Matija Pretnar, Sam Lindley

Abstract: WebAssembly (Wasm) is a low-level portable code format offering near native performance. It is intended as a compilation target for a wide variety of source languages. However, Wasm provides no direct support for non-local control flow features such as async/await, generators/iterators, lightweight threads, first-class continuations, etc. This means that compilers for source languages with such features must ceremoniously transform whole source programs in order to target Wasm. We present WasmFX, an extension to Wasm which provides a universal target for non-local control features via effect handlers, enabling compilers to translate such features directly into Wasm. Our extension is minimal and only adds three main instructions for creating, suspending, and resuming continuations. Moreover, our primitive instructions are type-safe providing typed continuations which are well-aligned with the design principles of Wasm whose stacks are typed. We present a formal specification of WasmFX and show that the extension is sound. We have implemented WasmFX as an extension to the Wasm reference interpreter and also built a prototype WasmFX extension for Wasmtime, a production-grade Wasm engine, piggybacking on Wasmtime's existing fibers API. The preliminary performance results for our prototype are encouraging, and we outline future plans to realise a native implementation.

Submitted 13 September, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

arXiv:2306.12354 [pdf]

Seat pan angle optimization for vehicle ride comfort using finite element model of human spine

Authors: Raj Desai, Ankit Vekaria, Anirban Guha, P. Seshu

Abstract: Ride comfort of the driver/occupant of a vehicle has usually been analyzed by multibody biodynamic models of human beings. Accurate modeling of critical segments of the human body, e.g. the spine, requires these models to have a very high number of segments. The resultant increase in degrees of freedom makes these models difficult to analyze and unable to provide certain details such as seat pressure distribution, the effect of cushion shapes, material, etc. This work presents a finite element based model of a human being seated in a vehicle in which the spine has been modelled in 3-D. It consists of cervical to coccyx vertebrae, ligaments, and discs and has been validated against modal frequencies reported in the literature. It was then subjected to sinusoidal vertical RMS acceleration of 0.1 g for mimicking road induced vibration. The dynamic characteristics of the human body were studied in terms of the seat to head transmissibility and intervertebral disc pressure. The effect of the seat pan angle on these parameters was studied and it was established that the optimum angle should lie between 15 and 19 degrees. This work is expected to be followed up by more simulations of this nature to study other human body comfort and seat design related parameters leading to optimized seat designs for various ride conditions.

Submitted 21 June, 2023; originally announced June 2023.

arXiv:2306.04556 [pdf, other]

StudentEval: A Benchmark of Student-Written Prompts for Large Language Models of Code

Authors: Hannah McLean Babe, Sydney Nguyen, Yangtian Zi, Arjun Guha, Molly Q Feldman, Carolyn Jane Anderson

Abstract: Code LLMs are being rapidly deployed and there is evidence that they can make professional programmers more productive. Current benchmarks for code generation measure whether models generate correct programs given an expert prompt. In this paper, we present a new benchmark containing multiple prompts per problem, written by a specific population of non-expert prompters: beginning programmers. StudentEval contains 1,749 prompts for 48 problems, written by 80 students who have only completed one semester of Python programming. Our students wrote these prompts while working interactively with a Code LLM, and we observed very mixed success rates. We use StudentEval to evaluate 5 Code LLMs and find that StudentEval is a better discriminator of model performance than existing benchmarks. We analyze the prompts and find significant variation in students' prompting techniques. We also find that nondeterministic LLM sampling could mislead students into thinking that their prompts are more (or less) effective than they actually are, which has implications for how to teach with Code LLMs.

Submitted 7 June, 2023; originally announced June 2023.

arXiv:2305.17145 [pdf, other]

Type Prediction With Program Decomposition and Fill-in-the-Type Training

Authors: Federico Cassano, Ming-Ho Yee, Noah Shinn, Arjun Guha, Steven Holtzen

Abstract: TypeScript and Python are two programming languages that support optional type annotations, which are useful but tedious to introduce and maintain. This has motivated automated type prediction: given an untyped program, produce a well-typed output program. Large language models (LLMs) are promising for type prediction, but there are challenges: fill-in-the-middle performs poorly, programs may not fit into the context window, generated types may not type check, and it is difficult to measure how well-typed the output program is. We address these challenges by building OpenTau, a search-based approach for type prediction that leverages large language models. We propose a new metric for type prediction quality, give a tree-based program decomposition that searches a space of generated types, and present fill-in-the-type fine-tuning for LLMs. We evaluate our work with a new dataset for TypeScript type prediction, and show that 47.4% of files type check (14.5% absolute improvement) with an overall rate of 3.3 type errors per file. All code, data, and models are available at: https://github.com/GammaTauAI/opentau.

Submitted 25 May, 2023; originally announced May 2023.

arXiv:2305.06161 [pdf, other]

StarCoder: may the source be with you!

Authors: Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Mishig Davaadorj, Joel Lamy-Poirier, João Monteiro, Oleh Shliazhko, Nicolas Gontier, Nicholas Meade, Armel Zebaze, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu , et al. (42 additional authors not shown)

Abstract: The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process. We fine-tuned StarCoderBase on 35B Python tokens, resulting in the creation of StarCoder. We perform the most comprehensive evaluation of Code LLMs to date and show that StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms the OpenAI code-cushman-001 model. Furthermore, StarCoder outperforms every model that is fine-tuned on Python, can be prompted to achieve 40% pass@1 on HumanEval, and still retains its performance on other programming languages. We take several important steps towards a safe open-access model release, including an improved PII redaction pipeline and a novel attribution tracing tool, and make the StarCoder models publicly available under a more commercially viable version of the Open Responsible AI Model license.

Submitted 13 December, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

arXiv:2304.07301 [pdf, other]

Inflation and the late time acceleration from Hossenfelder-Verlinde gravity

Authors: Youngsub Yoon, Atanu Guha

Abstract: We show that Hossenfelder's covariant formulation of Verlinde's emergent gravity predicts inflation and the late-time acceleration at the same time, without assuming a separate field such as inflaton, whose sole purpose is producing inflation. In particular, for the current deceleration parameter $q=-0.95$ to $-0.55$, we obtained $λ^2$, the mass of the imposter field, from $1.85\times 10^4$ to $2.26\times 10^4$. We also note that the value of $λ$ around $q=-0.93$ coincides with the inverse of fine structure constant.

Submitted 23 May, 2023; v1 submitted 14 April, 2023; originally announced April 2023.

Comments: Previous numerical mistakes fixed. Simulations for four different values of q. Connection with the fine structure constant suggested

arXiv:2304.01651 [pdf, other]

Socio-economic landscape of digital transformation & public NLP systems: A critical review

Authors: Satyam Mohla, Anupam Guha

Abstract: The current wave of digital transformation has spurred digitisation reforms and has led to prodigious development of AI & NLP systems, with several of them entering the public domain. There is a perception that these systems have a non-trivial impact on society, but there is a dearth of literature in critical AI exploring what kinds of systems exist and how they operate. This paper constructs a broad taxonomy of NLP systems which impact or are impacted by the "public" and provides concrete analyses via various instrumental and normative lenses on the socio-technical nature of these systems. This paper categorises thirty examples of these systems into seven families, namely finance, customer service, policy making, education, healthcare, law, and security, based on their public use cases. It then critically analyses these applications, first the priors and assumptions they are based on, then their mechanisms, possible methods of data collection, the models and error functions used, etc. This paper further delves into exploring the socio-economic and political contexts in which these families of systems are generally used and their potential impact on the same, and the function creep of these systems. It provides commentary on the potential long-term downstream impact of these systems on communities which use them.
Aside from providing a bird's-eye view of what exists, our in-depth analysis provides insights on what is lacking in the current discourse on NLP in particular and critical AI in general, proposes additions to the current framework of analysis, provides recommendations for future research directions, and highlights the importance of exploring the social in this socio-technical system.

Submitted 27 August, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

Comments: Under review

arXiv:2304.00394 [pdf, other]

A Large Scale Analysis of Semantic Versioning in NPM

Authors: Donald Pinckney, Federico Cassano, Arjun Guha, Jonathan Bell

Abstract: The NPM package repository contains over two million packages and serves tens of billions of downloads per week. Nearly every single JavaScript application uses the NPM package manager to install packages from the NPM repository. NPM relies on a "semantic versioning" ('semver') scheme to maintain a healthy ecosystem, where bug-fixes are reliably delivered to downstream packages as quickly as possible, while breaking changes require manual intervention by downstream package maintainers. In order to understand how developers use semver, we build a dataset containing every version of every package on NPM and analyze the flow of updates throughout the ecosystem. We build a time-travelling dependency resolver for NPM, which allows us to determine precisely which versions of each dependency would have been resolved at different times. We segment our analysis to allow for a direct analysis of security-relevant updates (those that introduce or patch vulnerabilities) in comparison to the rest of the ecosystem. We find that when developers use semver correctly, critical updates such as security patches can flow quite rapidly to downstream dependencies in the majority of cases (90.09%), but this does not always occur, due to developers' imperfect use of both semver version constraints and semver version number increments. Our findings have implications for developers and researchers alike. We make our infrastructure and dataset publicly available under an open source license.

Submitted 1 April, 2023; originally announced April 2023.

arXiv:2303.14789 [pdf, other]

5 wave interactions in internal gravity waves

Authors: Saranraj Gururaj, Anirban Guha

Abstract: We use multiple-scale analysis to study a 5-wave system (5WS) composed of two different internal gravity wave triads. Each of these triads consists of a parent wave and two daughter waves, with one daughter wave common between the two triads. The parent waves are assumed to have the same frequency and wavevector norm co-existing in a region of constant background stratification. Such 5-wave systems may emerge in oceans, for example, via tide-topography interactions, generating multiple parent internal waves that overlap. Two 2D cases are considered: Case 1(2) has parent waves with the same horizontal (vertical) wavenumber but with different vertical (horizontal) wavenumber. For both cases, the 5WS is more unstable than triads for $f/ω_1\gtrapprox0.3$, where $ω_1$ and $f$ are the parent wave and the local Coriolis frequency, respectively. For $f/ω_1\gtrapprox0.3$, the common daughter wave's frequency is $\approx ω_1-f $ and $f$ respectively for Cases 1 and 2. For 3D cases, 5WSs become more unstable as the angle ($θ$) between the horizontal wavevectors of the parent waves is decreased. Moreover, for any $θ$, 5WSs have higher growth rates than triads for $f/ω_1\gtrapprox0.3$. Numerical simulations match the theoretical growth rates of 5WSs for a wide range of latitudes, except when $f/ω_1\approx0.5$ (critical latitude). More than three daughter waves are forced by the two parent waves when $f/ω_1\approx0.5$.
We formulate a reduced order model which shows that for any $θ$, the maximum growth rate near the critical latitude is approximately twice the maximum growth rate of all triads. △ Less

Submitted 26 March, 2023; originally announced March 2023.

arXiv:2303.07464 [pdf, other]

doi 10.1175/JPO-D-23-0247.1

Understanding Stokes drift mechanism via crest and trough phase estimates

Authors: Anirban Guha, Akanksha Gupta

Abstract: By providing mathematical estimates, this paper answers a fundamental question -- "what leads to Stokes drift"? Although overwhelmingly understood for water waves, Stokes drift is a generic mechanism that stems from kinematics and occurs in any non-transverse wave in fluids. To showcase its generality, we undertake a comparative study of the pathline equation of sound (1D) and intermediate-depth water (2D) waves. Although we obtain a closed-form solution $\mathbf{x}(t)$ for the specific case of linear sound waves, a more generic and meaningful approach involves the application of asymptotic methods and expressing variables in terms of the Lagrangian phase $θ$. We show that the latter reduces the 2D pathline equation of water waves to 1D. Using asymptotic methods, we solve the respective pathline equation for sound and water waves, and for each case, we obtain a parametric representation of particle position $\mathbf{x}(θ)$ and elapsed time $t(θ)$. Such a parametric description has allowed us to obtain second-order-accurate expressions for the time duration, horizontal displacement, and average horizontal velocity of a particle in the crest and trough phases. All these quantities are of higher magnitude in the crest phase in comparison to the trough, leading to a forward drift, i.e. Stokes drift. We also explore particle trajectory due to second-order Stokes waves and compare it with linear waves. While finite amplitude waves modify the estimates obtained from linear waves, the understanding acquired from linear waves is generally found to be valid.

Submitted 24 February, 2024; v1 submitted 13 March, 2023; originally announced March 2023.

arXiv:2302.12163 [pdf, other]

doi 10.4230/LIPIcs.ECOOP.2023.37

Do Machine Learning Models Produce TypeScript Types That Type Check?

Authors: Ming-Ho Yee, Arjun Guha

Abstract: Type migration is the process of adding types to untyped code to gain assurance at compile time. TypeScript and other gradual type systems facilitate type migration by allowing programmers to start with imprecise types and gradually strengthen them. However, adding types is a manual effort and several migrations on large, industry codebases have been reported to have taken several years. In the research community, there has been significant interest in using machine learning to automate TypeScript type migration. Existing machine learning models report a high degree of accuracy in predicting individual TypeScript type annotations. However, in this paper we argue that accuracy can be misleading, and we should address a different question: can an automatic type migration tool produce code that passes the TypeScript type checker? We present TypeWeaver, a TypeScript type migration tool that can be used with an arbitrary type prediction model. We evaluate TypeWeaver with three models from the literature: DeepTyper, a recurrent neural network; LambdaNet, a graph neural network; and InCoder, a general-purpose, multi-language transformer that supports fill-in-the-middle tasks. Our tool automates several steps that are necessary for using a type prediction model: (1) importing types for a project's dependencies; (2) migrating JavaScript modules to TypeScript notation; (3) inserting predicted type annotations into the program to produce TypeScript when needed; and (4) rejecting non-type predictions when needed.
We evaluate TypeWeaver on a dataset of 513 JavaScript packages, including packages that have never been typed before. With the best type prediction model, we find that only 21% of packages type check, but more encouragingly, 69% of files type check successfully. △ Less

Submitted 11 July, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

Comments: Published at the 37th European Conference on Object-Oriented Programming (ECOOP 2023)

arXiv:2302.02092 [pdf, other]

Interpolation for Robust Learning: Data Augmentation on Wasserstein Geodesics

Authors: Jiacheng Zhu, Jielin Qiu, Aritra Guha, Zhuolin Yang, Xuanlong Nguyen, Bo Li, Ding Zhao

Abstract: We propose to study and promote the robustness of a model as per its performance through the interpolation of training data distributions. Specifically, (1) we augment the data by finding the worst-case Wasserstein barycenter on the geodesic connecting subpopulation distributions of different categories. (2) We regularize the model for smoother performance on the continuous geodesic path connecting subpopulation distributions. (3) Additionally, we provide a theoretical guarantee of robustness improvement and investigate how the geodesic location and the sample size contribute, respectively. Experimental validations of the proposed strategy on four datasets, including CIFAR-100 and ImageNet, establish the efficacy of our method, e.g., our method improves the baselines' certifiable robustness on CIFAR-10 up to 7.7%, with 16.8% on empirical robustness on CIFAR-100. Our work provides a new perspective of model robustness through the lens of Wasserstein geodesic-based interpolation with a practical off-the-shelf strategy that can be combined with existing robust training methods.

Submitted 28 August, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

Comments: 34 pages, 3 figures, 18 tables

Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR 202:43129-43157, 2023

arXiv:2301.11496 [pdf, other]

On Excess Mass Behavior in Gaussian Mixture Models with Orlicz-Wasserstein Distances

Authors: Aritra Guha, Nhat Ho, XuanLong Nguyen

Abstract: Dirichlet Process mixture models (DPMM) in combination with Gaussian kernels have been an important modeling tool for numerous data domains arising from biological, physical, and social sciences. However, this versatility in applications does not extend to strong theoretical guarantees for the underlying parameter estimates, for which only a logarithmic rate is achieved. In this work, we (re)introduce and investigate a metric, named Orlicz-Wasserstein distance, in the study of the Bayesian contraction behavior for the parameters. We show that despite the overall slow convergence guarantees for all the parameters, posterior contraction for parameters happens at almost polynomial rates in outlier regions of the parameter space. Our theoretical results provide new insight in understanding the convergence behavior of parameters arising from various settings of hierarchical Bayesian nonparametric models. In addition, we provide an algorithm to compute the metric by leveraging Sinkhorn divergences and validate our findings through a simulation study.

Submitted 26 January, 2023; originally announced January 2023.

Comments: 2 figures

arXiv:2301.03988 [pdf, other]

SantaCoder: don't reach for the stars!

Authors: Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo , et al. (16 additional authors not shown)

Abstract: The BigCode project is an open-scientific collaboration working on the responsible development of large language models for code. This tech report describes the progress of the collaboration until December 2022, outlining the current state of the Personally Identifiable Information (PII) redaction pipeline, the experiments conducted to de-risk the model architecture, and the experiments investigating better preprocessing methods for the training data. We train 1.1B parameter models on the Java, JavaScript, and Python subsets of The Stack and evaluate them on the MultiPL-E text-to-code benchmark. We find that more aggressive filtering of near-duplicates can further boost performance and, surprisingly, that selecting files from repositories with 5+ GitHub stars deteriorates performance significantly. Our best model outperforms previous open-source multilingual code generation models (InCoder-6.7B and CodeGen-Multi-2.7B) in both left-to-right generation and infilling on the Java, JavaScript, and Python portions of MultiPL-E, despite being a substantially smaller model. All models are released under an OpenRAIL license at https://hf.co/bigcode.

Submitted 24 February, 2023; v1 submitted 9 January, 2023; originally announced January 2023.

arXiv:2211.04568 [pdf, ps, other]

Towards Algorithmic Fairness in Space-Time: Filling in Black Holes

Authors: Cheryl Flynn, Aritra Guha, Subhabrata Majumdar, Divesh Srivastava, Zhengyi Zhou

Abstract: New technologies and the availability of geospatial data have drawn attention to spatio-temporal biases present in society. For example: the COVID-19 pandemic highlighted disparities in the availability of broadband service and its role in the digital divide; the environmental justice movement in the United States has raised awareness of health implications for minority populations stemming from historical redlining practices; and studies have found varying quality and coverage in the collection and sharing of open-source geospatial data. Despite the extensive literature on machine learning (ML) fairness, few algorithmic strategies have been proposed to mitigate such biases. In this paper we highlight the unique challenges for quantifying and addressing spatio-temporal biases, through the lens of use cases presented in the scientific literature and media. We envision a roadmap of ML strategies that need to be developed or adapted to quantify and overcome these challenges -- including transfer learning, active learning, and reinforcement learning techniques. Further, we discuss the potential role of ML in providing guidance to policy makers on issues related to spatial fairness.

Submitted 8 November, 2022; originally announced November 2022.

arXiv:2209.15577 [pdf, other]

On knots that divide ribbon knotted surfaces

Authors: Hans U. Boden, Ceyhun Elmacioglu, Anshul Guha, Homayun Karimi, William Rushworth, Yun-chi Tang, Bryan Wang Peng Jun

Abstract: We define a knot to be half ribbon if it is the cross-section of a ribbon 2-knot, and observe that ribbon implies half ribbon implies slice. We introduce the half ribbon genus of a knot K, the minimum genus of a ribbon knotted surface of which K is a cross-section. We compute this genus for all prime knots up to 12 crossings, and many 13-crossing knots. The same approach yields new computations of the doubly slice genus. We also introduce the half fusion number of a knot K, which measures the complexity of ribbon 2-knots of which K is a cross-section. We show that it is bounded from below by the Levine-Tristram signatures, and differs from the standard fusion number by an arbitrarily large amount.

Submitted 2 November, 2022; v1 submitted 30 September, 2022; originally announced September 2022.

Comments: 13 pages, 1 figure, 1 table. Comments welcome. V2: typographical corrections

MSC Class: 57K10; 57K45

arXiv:2209.09021 [pdf, other]

doi 10.1093/mnras/stac2675

Vector dark boson mediated feeble interaction between fermionic dark matter and strange quark matter in quark stars

Authors: Debashree Sen, Atanu Guha

Abstract: We study the structural properties like the gravitational mass, radius and tidal deformability of dark matter (DM) admixed strange quark stars (SQSs). For this purpose, we consider the vector MIT Bag model to describe the strange quark matter (SQM) and investigate the possible presence of accreted DM in the SQSs, consequently forming DM admixed SQSs. We introduce feeble interaction between SQM and the accreted fermionic DM via a vector dark boson mediator. Considering the present literature, in the context of possible presence of DM in SQSs, this work is the first to consider interaction between DM and SQM in the DM admixed SQSs. The mass of the DM fermion ($m_χ$) and the vector mediator ($m_ξ$) and the coupling ($y_ξ$) between them are determined in accordance with the constraint from Bullet cluster and the present day relic abundance, respectively. We find that the presence of DM reduces both the mass and radius of the star compared to the no-DM case. The more massive the DM fermion, the lower the values of maximum mass and radius of the DM admixed SQSs. For the chosen values of $m_χ$ and corresponding values of $m_ξ$ and $y_ξ$, the computed structural properties of the DM admixed SQSs satisfy all the various present day astrophysical constraints. We obtain massive DM admixed SQSs configurations consistent with the GW190814 observational data. Hence the secondary compact object associated with this event may be a DM admixed SQS.

Submitted 15 September, 2022; originally announced September 2022.

Comments: Accepted for Publication in Monthly Notices of the Royal Astronomical Society

Journal ref: MNRAS 517, 518-525 (2022)

arXiv:2208.09405 [pdf, other]

doi 10.1103/PhysRevD.107.015010

Bounds on boosted dark matter from direct detection: The role of energy-dependent cross sections

Authors: Debjyoti Bardhan, Supritha Bhowmick, Diptimoy Ghosh, Atanu Guha, Divya Sachdeva

Abstract: The recoil threshold of Direct Detection experiments limits the mass range of Dark Matter (DM) particles that can be detected, with most DD experiments being blind to sub-MeV DM particles. However, these light DM particles can be boosted to very high energies via collisions with energetic Cosmic Ray electrons. This allows Dark Matter particles to induce detectable recoil in the target of Direct Detection experiments. We derive constraints on scattering cross section of DM and electron, using XENONnT and Super-Kamiokande data. Vector and scalar mediators are considered, in the heavy and light regimes. We discuss the importance of including energy dependent cross sections (due to specific Lorentz structure of the vertex) in our analysis, and show that the bounds can be significantly different than the results obtained assuming constant energy-independent cross-section, often assumed in the literature for simplicity. Our bounds are also compared with other astrophysical and cosmological constraints.

Submitted 13 January, 2023; v1 submitted 19 August, 2022; originally announced August 2022.

Comments: 11 pages, 3 figures; Title modified

Journal ref: Phys.Rev.D 107 (2023) 1, 015010

arXiv:2208.08227 [pdf, other]

MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation

Authors: Federico Cassano, John Gouwar, Daniel Nguyen, Sydney Nguyen, Luna Phipps-Costin, Donald Pinckney, Ming-Ho Yee, Yangtian Zi, Carolyn Jane Anderson, Molly Q Feldman, Arjun Guha, Michael Greenberg, Abhinav Jangda

Abstract: Large language models have demonstrated the ability to generate both natural language and programming language text. Such models open up the possibility of multi-language code generation: could code generation models generalize knowledge from one language to another? Although contemporary code generation models can generate semantically correct Python code, little is known about their abilities with other languages. We propose MultiPL-E, a system for translating unit test-driven code generation benchmarks to new languages. We create the first massively multilingual code generation benchmark by using MultiPL-E to translate two popular Python code generation benchmarks to 18 additional programming languages. We use MultiPL-E to extend the HumanEval benchmark and MBPP benchmark to 18 languages that encompass a range of programming paradigms and popularity. Using these new parallel benchmarks, we evaluate the multi-language performance of three state-of-the-art code generation models: Codex, CodeGen, and InCoder. We find that Codex matches or even exceeds its performance on Python for several other languages. The range of programming languages represented in MultiPL-E allows us to explore the impact of language frequency and language features on model performance. Finally, the MultiPL-E approach of compiling code generation benchmarks to new programming languages is both scalable and extensible, making it straightforward to evaluate new models, benchmarks, and languages.

Submitted 19 December, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

arXiv:2206.00807 [pdf]

Applied Federated Learning: Architectural Design for Robust and Efficient Learning in Privacy Aware Settings

Authors: Branislav Stojkovic, Jonathan Woodbridge, Zhihan Fang, Jerry Cai, Andrey Petrov, Sathya Iyer, Daoyu Huang, Patrick Yau, Arvind Sastha Kumar, Hitesh Jawa, Anamita Guha

Abstract: The classical machine learning paradigm requires the aggregation of user data in a central location where machine learning practitioners can preprocess data, calculate features, tune models and evaluate performance. The advantage of this approach includes leveraging high performance hardware (such as GPUs) and the ability of machine learning practitioners to do in depth data analysis to improve model performance. However, these advantages may come at a cost to data privacy. User data is collected, aggregated, and stored on centralized servers for model development. Centralization of data poses risks, including a heightened risk of internal and external security incidents as well as accidental data misuse. Federated learning with differential privacy is designed to avoid the server-side centralization pitfall by bringing the ML learning step to users' devices. Learning is done in a federated manner where each mobile device runs a training loop on a local copy of a model. Updates from on-device models are sent to the server via encrypted communication and through differential privacy to improve the global model. In this paradigm, users' personal data remains on their devices. Surprisingly, model training in this manner comes at a fairly minimal degradation in model performance. However, federated learning comes with many other challenges due to its distributed nature, heterogeneous compute environments and lack of data visibility. This paper explores those challenges and outlines an architectural design solution we are exploring and testing to productionize federated learning at Meta scale.

Submitted 7 June, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

arXiv:2203.13737 [pdf, other]

Flexible and Optimal Dependency Management via Max-SMT

Authors: Donald Pinckney, Federico Cassano, Arjun Guha, Jon Bell, Massimiliano Culpo, Todd Gamblin

Abstract: Package managers such as NPM have become essential for software development. The NPM repository hosts over 2 million packages and serves over 43 billion downloads every week. Unfortunately, the NPM dependency solver has several shortcomings. 1) NPM is greedy and often fails to install the newest versions of dependencies; 2) NPM's algorithm leads to duplicated dependencies and bloated code, which is particularly bad for web applications that need to minimize code size; 3) NPM's vulnerability fixing algorithm is also greedy, and can even introduce new vulnerabilities; and 4) NPM's ability to duplicate dependencies can break stateful frameworks and requires a lot of care to work around. Although existing tools try to address these problems they are either brittle, rely on post hoc changes to the dependency tree, do not guarantee optimality, or are not composable. We present PacSolve, a unifying framework and implementation for dependency solving which allows for customizable constraints and optimization goals. We use PacSolve to build MaxNPM, a complete, drop-in replacement for NPM, which empowers developers to combine multiple objectives when installing dependencies. We evaluate MaxNPM with a large sample of packages from the NPM ecosystem and show that it can: 1) reduce more vulnerabilities in dependencies than NPM's auditing tool in 33% of cases; 2) choose newer dependencies than NPM in 14% of cases; and 3) choose fewer dependencies than NPM in 21% of cases. All our code and data is open and available.

Submitted 24 August, 2023; v1 submitted 25 March, 2022; originally announced March 2022.

arXiv:2201.12991 [pdf, ps, other]

Federated Learning with Erroneous Communication Links

Authors: Mahyar Shirvanimoghaddam, Ayoob Salari, Yifeng Gao, Aradhika Guha

Abstract: In this paper, we consider the federated learning (FL) problem in the presence of communication errors. We model the link between the devices and the central node (CN) by a packet erasure channel, where the local parameters from devices are either erased or received correctly by the CN with probability $ε$ and $1-ε$, respectively. We prove that the FL algorithm in the presence of communication errors, where the CN uses the past local update if the fresh one is not received from a device, converges to the same global parameter as the FL algorithm without any communication error. We provide several simulation results to validate our theoretical analysis. We also show that when the dataset is uniformly distributed among devices, the FL algorithm that only uses fresh updates and discards missing updates might converge faster than the FL algorithm that uses past local updates.

Submitted 11 April, 2022; v1 submitted 30 January, 2022; originally announced January 2022.

Comments: The paper is accepted for publication in IEEE Communications Letters
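
Illustrative note (not from the paper): the erasure-channel setting described in the abstract above can be sketched with a toy simulation. Everything here is an assumption made for illustration: the function name `simulate_fl`, scalar quadratic local objectives with minimizers `targets`, one local gradient step per round, and a reliable initial round so that a past update always exists. It is a minimal sketch of "reuse the past local update when the fresh one is erased", not the authors' algorithm or analysis.

```python
import random


def simulate_fl(targets, eps, rounds=500, lr=0.5, seed=0):
    """Toy FL over a packet erasure channel (hypothetical setup).

    Device i holds the local objective 0.5 * (w - targets[i]) ** 2.
    Each round it takes one gradient step from the global parameter and
    sends the result; the packet is erased with probability eps, in which
    case the central node reuses the last update it received from that device.
    """
    rng = random.Random(seed)
    w = 0.0
    # Assumption: the first exchange is reliable, so a past update exists.
    last_received = [w - lr * (w - c) for c in targets]
    for _ in range(rounds):
        for i, c in enumerate(targets):
            fresh = w - lr * (w - c)      # one local gradient step
            if rng.random() >= eps:       # packet received correctly
                last_received[i] = fresh
            # otherwise: erased, keep the stale update
        w = sum(last_received) / len(last_received)
    return w


targets = [1.0, 2.0, 6.0]
w_clean = simulate_fl(targets, eps=0.0)   # error-free links
w_lossy = simulate_fl(targets, eps=0.3)   # 30% erasure probability
print(w_clean, w_lossy)
```

In this toy problem both runs approach the mean of `targets`, which matches the qualitative statement in the abstract; the sketch says nothing about convergence rates or non-quadratic objectives.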

arXiv:2201.02422 [pdf, other]

doi 10.1017/jfm.2022.886

A new Lagrangian drift mechanism due to current-bathymetry interactions: applications in coastal cross-shelf transport

Authors: Akanksha Gupta, Anirban Guha

Abstract: We show that in free surface flows, a uniform, streamwise current over small-amplitude wavy bottom topography generates cross-stream drift velocity. This drift mechanism, referred to as the current-bathymetry interaction induced drift (CBIID), is specifically understood in the context of a simplified nearshore environment consisting of a uniform alongshore current, onshore propagating surface waves, and monochromatic wavy bottom making an oblique angle with the shoreline. CBIID is found to originate from the steady, non-homogeneous solution of the governing system of equations. Similar to Stokes drift induced by surface waves, CBIID also generates a compensating Eulerian return flow to satisfy the no-flux lateral boundaries, e.g. the shoreline. CBIID increases with an increase in the particle's initial depth, bottom undulation's amplitude, and the strength of the alongshore current. Additionally, CBIID near the free (bottom) surface increases (decreases) with an increase in bottom undulation's wavelength. Maximum CBIID is obtained for long wavelength bottom topography that approximately makes $π/4$ angle with the shoreline. Unlike Stokes drift, particle excursions due to current-bathymetry interactions might not be small, hence analytical expressions based on the small-excursion approximation could be inaccurate. We provide an alternative $z$-bounded approximation, which leads to highly accurate expressions for drift velocity and time period of particles especially located near the free surface. Realistic parametric analysis reveals that in some nearshore environments, CBIID's contribution to the net Lagrangian drift can be as important as Stokes drift, implying that CBIID can have major implications in cross-shelf tracer transport.

Submitted 20 October, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

arXiv:2110.06903 [pdf, other]

doi 10.1103/PhysRevD.107.015003

EFT analysis of leptophilic dark matter at future electron-positron colliders in the mono-photon and mono-$Z$ channels

Authors: Saumyen Kundu, Atanu Guha, Prasanta Kumar Das, P. S. Bhupal Dev

Abstract: We consider the possibility that dark matter (DM) only interacts with the Standard Model leptons, but not quarks at tree level, and analyze the future lepton collider prospects of such leptophilic DM in the monophoton and mono-$Z$ (both leptonic and hadronic) channels. Adopting a model-independent effective field theory framework, we consider all possible dimension-six operators of scalar-pseudoscalar (SP), vector-axial vector (VA), and tensor-axial tensor (TAT) types for a fermionic DM and derive the collider sensitivities on the effective cutoff scale $Λ$ as a function of the DM mass. As a concrete example, we take the beam configurations of the International Linear Collider with $\sqrt s=1$ TeV and $8$ ab$^{-1}$ integrated luminosity, including the effect of beam polarization, and show that it can probe leptophilic DM at $3σ$ level up to $Λ$ values of $6.6$, $8.8$, and $7.1$ TeV for the SP-, VA- and TAT-type operators, respectively. This is largely complementary to the direct and indirect searches for leptophilic DM and can potentially provide the best-ever sensitivity in the low-mass DM regime.

Submitted 29 December, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

Comments: 29 pages, 17 figures, 16 tables, version to appear in Phys. Rev. D

Journal ref: Phys. Rev. D, 107:015003 (2023)

arXiv:2110.00025 [pdf, other]

doi 10.1103/PhysRevD.105.103029

Exclusion limits on Dark Matter-Neutrino Scattering Cross-section

Authors: Diptimoy Ghosh, Atanu Guha, Divya Sachdeva

Abstract: We derive new constraints on combination of dark matter - electron cross-section ($σ_{χe}$) and dark matter - neutrino cross-section ($σ_{χν}$) utilising the gain in kinetic energy of the dark matter (DM) particles due to scattering with the cosmic ray electrons and the diffuse supernova neutrino background (DSNB). Since the flux of the DSNB neutrinos is comparable to the CR electron flux in the energy range $\sim 1\,{\rm MeV} - 50 \,{\rm MeV}$, scattering with the DSNB neutrinos can also boost low-mass DM significantly in addition to the boost due to interaction with the cosmic ray electrons. We use the XENON1T as well as the Super-Kamiokande data to derive bounds on $σ_{χe}$ and $σ_{χν}$. While our bounds for $σ_{χe}$ are comparable with those in the literature, we show that the Super-Kamiokande experiment provides the strongest constraint on $σ_{χν}$ for DM masses below a few MeV.

Submitted 27 May, 2022; v1 submitted 30 September, 2021; originally announced October 2021.

Journal ref: Phys. Rev. D 105, 103029 (2022)

arXiv:2109.05049 [pdf, other]

Solver-based Gradual Type Migration

Authors: Luna Phipps-Costin, Carolyn Jane Anderson, Michael Greenberg, Arjun Guha

Abstract: Gradually typed languages allow programmers to mix statically and dynamically typed code, enabling them to incrementally reap the benefits of static typing as they add type annotations to their code. However, this type migration process is typically a manual effort with limited tool support. This paper examines the problem of \emph{automated type migration}: given a dynamic program, infer additional or improved type annotations. Existing type migration algorithms prioritize different goals, such as maximizing type precision, maintaining compatibility with unmigrated code, and preserving the semantics of the original program. We argue that the type migration problem involves fundamental compromises: optimizing for a single goal often comes at the expense of others. Ideally, a type migration tool would flexibly accommodate a range of user priorities. We present TypeWhich, a new approach to automated type migration for the gradually-typed lambda calculus with some extensions. Unlike prior work, which relies on custom solvers, TypeWhich produces constraints for an off-the-shelf MaxSMT solver. This allows us to easily express objectives, such as minimizing the number of necessary syntactic coercions, and constraining the type of the migration to be compatible with unmigrated code. We present the first comprehensive evaluation of GTLC type migration algorithms, and compare TypeWhich to four other tools from the literature. Our evaluation uses prior benchmarks, and a new set of ``challenge problems.'' Moreover, we design a new evaluation methodology that highlights the subtleties of gradual type migration. In addition, we apply TypeWhich to a suite of benchmarks for Grift, a programming language based on the GTLC. TypeWhich is able to reconstruct all human-written annotations on all but one program.

Submitted 10 September, 2021; originally announced September 2021.

arXiv:2107.06407 [pdf]

Watching Single Unmodified Enzymes at Work

Authors: Cuifeng Ying, Edona Karakaci, Esteban Bermudez-Urena, Alessandro Ianiro, Ceri Foster, Saurabh Awasthi, Anirvan Guha, Louise Bryan, Jonathan List, Sandor Balog, Guillermo P. Acuna, Reuven Gordon, Michael Mayer

Abstract: Many proteins undergo conformational changes during their activity. A full understanding of the function of these proteins can only be obtained if different conformations and transitions between them can be monitored in aqueous solution, with adequate temporal resolution and, ideally, on a single-molecule level. Interrogating conformational dynamics of single proteins remains, however, exquisitely challenging and typically requires site-directed chemical modification combined with rigorous minimization of possible artifacts. These obstacles limit the number of single-protein investigations. The work presented here introduces an approach that traps single unmodified proteins from solution in a plasmonic hotspot and makes it possible to assign changes in refractive index to changes in protein conformation while monitoring these changes for minutes to hours with a temporal resolution at least as fast as 40 microseconds. The resulting single molecule data reveals that adenylate kinase employs a hidden enzymatic sub-cycle during catalysis, that citrate synthase populates a previously unknown intermediate conformation, which is more important for its enzymatic activity than its well-known open conformation, that hemoglobin transitions in several steps from its deoxygenated and rigid T state to its oxygenated and flexible R state, and that apo-calmodulin thermally unfolds and refolds in steps that correspond to conformational changes of individual protein domains.

Submitted 13 July, 2021; originally announced July 2021.

Comments: 20 pages, 4 figures

arXiv:2106.10353 [pdf, other]

doi 10.1088/1475-7516/2021/09/027

Feeble DM-SM Interaction via New Scalar and Vector Mediators in Rotating Neutron Stars

Authors: Atanu Guha, Debashree Sen

Abstract: We investigate the possible presence of dark matter (DM) in massive and rotating neutron stars (NSs). For the purpose we extend our previous work [1] to introduce a light new physics vector mediator besides a scalar one in order to ensure feeble interaction between fermionic DM and $β$ stable hadronic matter in NSs. The masses of DM fermion, the mediators and the couplings are chosen consistent with the self-interaction constraint from Bullet cluster and from present day relic abundance. Assuming that both the scalar and vector mediators contribute equally to the relic abundance, we compute the equation of state (EoS) of the DM admixed NSs to find that the present consideration of the vector new physics mediator does not bring any significant change to the EoS and static NS properties of DM admixed NSs compared to the case where only the scalar mediator was considered [1]. However, the obtained structural properties in static conditions are in good agreement with the various constraints on them from massive pulsars like PSR J0348+0432 and PSR J0740+6620, the gravitational wave (GW170817) data and the recently obtained results of NICER experiments for PSR J0030+0451 and PSR J0740+6620. We also extended our work to compute the rotational properties of DM admixed NSs rotating at different angular velocities. The present results in this regard suggest that the secondary component of GW190814 may be a rapidly rotating massive DM admixed NS. The constraints on rotational frequency from pulsars like PSR B1937+21 and PSR J1748-2446ad are also satisfied by our present results. Also, the constraints on moment of inertia are satisfied considering slow rotation. The universality relation in terms of normalized moment of inertia also holds good with our DM admixed EoS.

Submitted 29 August, 2021; v1 submitted 18 June, 2021; originally announced June 2021.

Comments: 20 Pages, 8 figures, Accepted for Publication in JCAP

Report number: JCAP_070P_0621

Journal ref: JCAP 09 (2021) 027

arXiv:2106.03198 [pdf, other]

doi 10.1017/jfm.2022.431

Resonant and near-resonant internal wave triads for non-uniform stratifications. Part 2: Vertically bounded domain with mild-slope bathymetry

Authors: Saranraj Gururaj, Anirban Guha

Abstract: Weakly nonlinear internal wave-wave interaction is a key mechanism that cascades energy from large to small scales, leading to ocean turbulence and mixing. Oceans typically have a non-uniform density stratification profile; moreover, submarine topography leads to a spatially varying ocean depth ($h$). Under these conditions and assuming mild-slope bathymetry, we employ multiple-scale analysis to derive the wave amplitude equations for triadic- and self-interactions. The waves are assumed to have a slowly (rapidly) varying amplitude (phase) in space and time. For uniform stratifications, the horizontal wavenumber ($k$) condition for waves ($1$,$2$,$3$), given by ${k}_{(1,a)}+{k}_{(2,b)}+{k}_{(3,c)}=0$, is unaffected as $h$ is varied, where $(a,b,c)$ denote the modenumber. Moreover, the nonlinear coupling coefficients (NLC) are proportional to $1/h^2$, implying that triadic waves grow faster while travelling up a seamount. For non-uniform stratifications, triads that do not satisfy the condition $a=b=c$ may not satisfy the horizontal wavenumber condition as $h$ is varied, and unlike uniform stratification, the NLC may not decrease (increase) monotonically with increasing (decreasing) $h$. NLC, and hence wave growth rates for both triads and self-interactions, can also vary rapidly with $h$. The most unstable daughter wave combination of a triad with a mode-1 parent wave can also change for relatively small changes in $h$. We also investigate higher-order self-interactions in the presence of a monochromatic, small amplitude bathymetry; here the bathymetry behaves as a zero frequency wave. We derive the amplitude evolution equations and show that higher-order self-interactions might be a viable mechanism of energy cascade.

Submitted 3 June, 2022; v1 submitted 6 June, 2021; originally announced June 2021.

Comments: Accepted in the Journal of Fluid Mechanics

arXiv:2105.06577 [pdf, other]

Online Algorithms and Policies Using Adaptive and Machine Learning Approaches

Authors: Anuradha M. Annaswamy, Anubhav Guha, Yingnan Cui, Sunbochen Tang, Peter A. Fisher, Joseph E. Gaudio

Abstract: This paper considers the problem of real-time control and learning in dynamic systems subjected to parametric uncertainties. We propose a combination of a Reinforcement Learning (RL) based policy in the outer loop suitably chosen to ensure stability and optimality for the nominal dynamics, together with Adaptive Control (AC) in the inner loop so that in real-time AC contracts the closed-loop dynamics towards a stable trajectory traced out by RL. Two classes of nonlinear dynamic systems are considered, both of which are control-affine. The first class of dynamic systems utilizes equilibrium points %with expansion forms around these points and a Lyapunov approach while the second class of nonlinear systems uses contraction theory. AC-RL controllers are proposed for both classes of systems and shown to lead to online policies that guarantee stability using a high-order tuner and accommodate parametric uncertainties and magnitude limits on the input. In addition to establishing a stability guarantee with real-time control, the AC-RL controller is also shown to lead to parameter learning with persistent excitation for the first class of systems. Numerical validations of all algorithms are carried out using a quadrotor landing task on a moving platform.

Submitted 9 June, 2023; v1 submitted 13 May, 2021; originally announced May 2021.

Comments: 38 pages

arXiv:2104.06141 [pdf, other]

doi 10.1093/mnras/stab1056

Implications of Feebly Interacting Dark Sector on Neutron Star Properties and Constraints from GW170817

Authors: Debashree Sen, Atanu Guha

Abstract: We investigate the effect of feeble interaction of dark matter (DM) with hadronic matter on the equation of state (EoS) and structural properties of neutron stars (NSs) in static conditions. For the purpose we adopt the effective chiral model for the hadronic sector and for the first time in the context of possible existence of DM inside NSs, we introduce DM-SM interaction through light new physics mediator. Moreover, the mass of DM fermion, the mediator and the coupling are adopted from the self-interaction constraint from Bullet cluster and from present day relic abundance. Within the considered framework, the work highlights the underlying stiffening of EoS in presence of DM fermion of mass of the order of a few GeV compared to the no-DM scenario. Consequently, the maximum gravitational mass of NS is obtained consistent with the bounds from the most massive pulsars which were not satisfied with the hadronic matter EoS alone. The estimates of radius and tidal deformability of 1.4 $M_{\odot}$ NS and the tidal deformabilities of the individual components of the binary neutron stars (BNS) associated with GW170817 are all in good agreement with the individual constraints obtained from GW170817 observation of BNS merger.

Submitted 7 June, 2021; v1 submitted 13 April, 2021; originally announced April 2021.

Comments: 12 Pages, 13 figures, Minor typos are corrected in the latest version

Journal ref: Mon.Not.Roy.Astron.Soc. 504 (2021) 3, 3354-3363

arXiv:2103.16551 [pdf, other]

Online Policies for Real-Time Control Using MRAC-RL

Authors: Anubhav Guha, Anuradha Annaswamy

Abstract: In this paper, we propose the Model Reference Adaptive Control & Reinforcement Learning (MRAC-RL) approach to developing online policies for systems in which modeling errors occur in real-time. Although reinforcement learning (RL) algorithms have been successfully used to develop control policies for dynamical systems, discrepancies between simulated dynamics and the true target dynamics can cause trained policies to fail to generalize and adapt appropriately when deployed in the real-world. The MRAC-RL framework generates online policies by utilizing an inner-loop adaptive controller together with a simulation-trained outer-loop RL policy. This structure allows MRAC-RL to adapt and operate effectively in a target environment, even when parametric uncertainties exist. We propose a set of novel MRAC algorithms, apply them to a class of nonlinear systems, derive the associated control laws, provide stability guarantees for the resulting closed-loop system, and show that the adaptive tracking objective is achieved. Using a simulation study of an automated quadrotor landing task, we demonstrate that the MRAC-RL approach improves upon state-of-the-art RL algorithms and techniques through the generation of online policies.

Submitted 30 March, 2021; originally announced March 2021.

Comments: Submitted to CDC 2021

arXiv:2103.04880 [pdf, other]

Iterative Program Synthesis for Adaptable Social Navigation

Authors: Jarrett Holtz, Simon Andrews, Arjun Guha, Joydeep Biswas

Abstract: Robot social navigation is influenced by human preferences and environment-specific scenarios such as elevators and doors, thus necessitating end-user adaptability. State-of-the-art approaches to social navigation fall into two categories: model-based social constraints and learning-based approaches. While effective, these approaches have fundamental limitations -- model-based approaches require constraint and parameter tuning to adapt to preferences and new scenarios, while learning-based approaches require reward functions, significant training data, and are hard to adapt to new social scenarios or new domains with limited demonstrations. In this work, we propose Iterative Dimension Informed Program Synthesis (IDIPS) to address these limitations by learning and adapting social navigation in the form of human-readable symbolic programs. IDIPS works by combining program synthesis, parameter optimization, predicate repair, and iterative human demonstration to learn and adapt model-free action selection policies from orders of magnitude less data than learning-based approaches. We introduce a novel predicate repair technique that can accommodate previously unseen social scenarios or preferences by growing existing policies. We present experimental results showing that IDIPS: 1) synthesizes effective policies that model user preference, 2) can adapt existing policies to changing preferences, 3) can extend policies to handle novel social scenarios such as locked doors, and 4) generates policies that can be transferred from simulation to real-world robots with minimal effort.

Submitted 30 August, 2021; v1 submitted 8 March, 2021; originally announced March 2021.

Comments: IROS 2021

arXiv:2103.00364 [pdf]

doi 10.1038/s41467-021-25503-9

Predicting post-operative right ventricular failure using video-based deep learning

Authors: Rohan Shad, Nicolas Quach, Robyn Fong, Patpilai Kasinpila, Cayley Bowles, Miguel Castro, Ashrith Guha, Eddie Suarez, Stefan Jovinge, Sang** Lee, Theodore Boeve, Myriam Amsallem, Xiu Tang, Francois Haddad, Yasuhiro Shudo, Y. Joseph Woo, Jeffrey Teuteberg, John P. Cunningham, Curt P. Langlotz, William Hiesinger

Abstract: Non-invasive and cost-effective in nature, the echocardiogram allows for a comprehensive assessment of the cardiac musculature and valves. Despite progressive improvements over the decades, the rich temporally resolved data in echocardiography videos remain underutilized. Human reads of echocardiograms reduce the complex patterns of cardiac wall motion to a small list of measurements of heart function. Furthermore, all modern echocardiography artificial intelligence (AI) systems are similarly limited by design - automating measurements of the same reductionist metrics rather than utilizing the wealth of data embedded within each echo study. This underutilization is most evident in situations where clinical decision making is guided by subjective assessments of disease acuity, and tools that predict disease onset within clinically actionable timeframes are unavailable. Predicting the likelihood of developing post-operative right ventricular failure (RV failure) in the setting of mechanical circulatory support is one such clinical example. To address this, we developed a novel video AI system trained to predict post-operative RV failure, using the full spatiotemporal density of information from pre-operative echocardiography scans. We achieve an AUC of 0.729, specificity of 52% at 80% sensitivity and 46% sensitivity at 80% specificity. Furthermore, we show that our ML system significantly outperforms a team of human experts tasked with predicting RV failure on independent clinical evaluation. Finally, the methods we describe are generalizable to any cardiac clinical decision support application where treatment or patient selection is guided by qualitative echocardiography assessments.

Submitted 27 February, 2021; originally announced March 2021.

Comments: 12 pages, 3 figures

Journal ref: Nat Commun 12, 5192 (2021)

arXiv:2102.07695 [pdf, other]

Scalable nonparametric Bayesian learning for heterogeneous and dynamic velocity fields

Authors: Sunrit Chakraborty, Aritra Guha, Rayleigh Lei, XuanLong Nguyen

Abstract: Analysis of heterogeneous patterns in complex spatio-temporal data finds usage across various domains in applied science and engineering, including training autonomous vehicles to navigate in complex traffic scenarios. Motivated by applications arising in the transportation domain, in this paper we develop a model for learning heterogeneous and dynamic patterns of velocity field data. We draw from basic nonparametric Bayesian modeling elements such as the hierarchical Dirichlet process and the infinite hidden Markov model, while the smoothness of each homogeneous velocity field element is captured with a Gaussian process prior. Of particular focus is a scalable approximate inference method for the proposed model; this is achieved by employing sequential MAP estimates from the infinite HMM model and an efficient sequential GP posterior computation technique, which is shown to work effectively on simulated data sets. Finally, we demonstrate the effectiveness of our techniques on the NGSIM dataset of complex multi-vehicle interactions.

Submitted 15 February, 2021; originally announced February 2021.

Comments: 5 tables, 8 figures

arXiv:2102.03895 [pdf, other]

Functional optimal transport: map estimation and domain adaptation for functional data

Authors: Jiacheng Zhu, Aritra Guha, Dat Do, Mengdi Xu, XuanLong Nguyen, Ding Zhao

Abstract: We introduce a formulation of the optimal transport problem for distributions on function spaces, where the stochastic map between functional domains can be partially represented in terms of an (infinite-dimensional) Hilbert-Schmidt operator mapping a Hilbert space of functions to another. For numerous machine learning tasks, data can be naturally viewed as samples drawn from spaces of functions, such as curves and surfaces, in high dimensions. Optimal transport for functional data analysis provides a useful framework of treatment for such domains. Since probability measures in infinite dimensional spaces generally lack absolute continuity (that is, with respect to non-degenerate Gaussian measures), the Monge map in the standard optimal transport theory for finite dimensional spaces may not exist. Our approach to the optimal transport problem in infinite dimensions is by a suitable regularization technique -- we restrict the class of transport maps to be a Hilbert-Schmidt space of operators. To this end, we develop an efficient algorithm for finding the stochastic transport map between functional domains and provide theoretical guarantees on the existence, uniqueness, and consistency of our estimate for the Hilbert-Schmidt operator. We validate our method on synthetic datasets and examine the functional properties of the transport map. Experiments on real-world datasets of robot arm trajectories further demonstrate the effectiveness of our method on applications in domain adaptation.

Submitted 28 August, 2023; v1 submitted 7 February, 2021; originally announced February 2021.

Comments: 48 pages, 10 figures, 3 tables

Showing 1–50 of 129 results for author: Guha, A