Showing 1–50 of 58 results for author: Guha, A

  1. arXiv:2406.16207  [pdf, other]

    cs.CY

    Thinking beyond Bias: Analyzing Multifaceted Impacts and Implications of AI on Gendered Labour

    Authors: Satyam Mohla, Bishnupriya Bagh, Anupam Guha

    Abstract: Artificial Intelligence, with its multifaceted technologies and integral role in global production, significantly impacts gender dynamics, particularly in gendered labor. This paper emphasizes the need to explore AI's broader impacts on gendered labor beyond its current emphasis on the generation and perpetuation of epistemic biases. We draw attention to how the AI industry as an integral component of…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Under review. An unindexed peer-reviewed working draft was accepted for presentation at IJCAI 2021 Workshop on AI for Social Good organized by Harvard CRCS

  2. arXiv:2405.20179  [pdf, other]

    cs.CL cs.AI cs.RO

    Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning CodeLLMs

    Authors: Zichao Hu, Junyi Jessy Li, Arjun Guha, Joydeep Biswas

    Abstract: Large language models (LLMs) have shown great promise at generating robot programs from natural language given domain-specific robot application programming interfaces (APIs). However, the performance gap between proprietary LLMs and smaller open-weight LLMs remains wide. This raises a question: Can we fine-tune smaller open-weight LLMs for generating domain-specific robot programs to close the pe…

    Submitted 30 May, 2024; originally announced May 2024.

  3. arXiv:2404.01903  [pdf, other]

    cs.CL cs.LG cs.PL

    Activation Steering for Robust Type Prediction in CodeLLMs

    Authors: Francesca Lucchetti, Arjun Guha

    Abstract: Contemporary LLMs pretrained on code are capable of succeeding at a wide variety of programming tasks. However, their performance is very sensitive to syntactic features, such as the names of variables and types, the structure of code, and presence of type hints. We contribute an inference-time technique to make CodeLLMs more robust to syntactic distractors that are semantically irrelevant. Our me…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 16 pages, 7 figures

  4. arXiv:2402.19173  [pdf, other]

    cs.SE cs.AI

    StarCoder 2 and The Stack v2: The Next Generation

    Authors: Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, et al. (41 additional authors not shown)

    Abstract: The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data…

    Submitted 29 February, 2024; originally announced February 2024.

  5. arXiv:2401.15232  [pdf, other]

    cs.HC

    How Beginning Programmers and Code LLMs (Mis)read Each Other

    Authors: Sydney Nguyen, Hannah McLean Babe, Yangtian Zi, Arjun Guha, Carolyn Jane Anderson, Molly Q Feldman

    Abstract: Generative AI models, specifically large language models (LLMs), have made strides towards the long-standing goal of text-to-code generation. This progress has invited numerous studies of user interaction. However, less is known about the struggles and strategies of non-experts, for whom each step of the text-to-code problem presents challenges: describing their intent in natural language, evaluat…

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: Conditionally Accepted to CHI 2024

  6. arXiv:2312.12450  [pdf, other]

    cs.SE cs.AI cs.LG cs.PL

    Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions

    Authors: Federico Cassano, Luisa Li, Akul Sethi, Noah Shinn, Abby Brennan-Jones, Jacob Ginesin, Edward Berman, George Chakhnashvili, Anton Lozhkov, Carolyn Jane Anderson, Arjun Guha

    Abstract: A significant amount of research is focused on developing and evaluating large language models for a variety of code synthesis tasks. These include synthesizing code from natural language, synthesizing tests from code, and synthesizing explanations of code. In contrast, the behavior of instructional code editing with LLMs is understudied. These are tasks in which the model is provided a block of c…

    Submitted 19 March, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

  7. Deploying and Evaluating LLMs to Program Service Mobile Robots

    Authors: Zichao Hu, Francesca Lucchetti, Claire Schlesinger, Yash Saxena, Anders Freeman, Sadanand Modak, Arjun Guha, Joydeep Biswas

    Abstract: Recent advancements in large language models (LLMs) have spurred interest in using them for generating robot programs from natural language, with promising initial results. We investigate the use of LLMs to generate programs for service mobile robots leveraging mobility, perception, and human interaction skills, and where accurate sequencing and ordering of actions is crucial for success. We contr…

    Submitted 21 February, 2024; v1 submitted 18 November, 2023; originally announced November 2023.

    Comments: 8 pages, Accepted at IEEE Robotics and Automation Letters (RA-L)

    Journal ref: IEEE Robotics and Automation Letters, vol. 9, no. 3, pp. 2853-2860, March 2024

  8. arXiv:2309.14054  [pdf, other]

    cs.LG cs.AI cs.CV

    Adapt then Unlearn: Exploiting Parameter Space Semantics for Unlearning in Generative Adversarial Networks

    Authors: Piyush Tiwary, Atri Guha, Subhodip Panda, Prathosh A. P

    Abstract: The increased attention to regulating the outputs of deep generative models, driven by growing concerns about privacy and regulatory compliance, has highlighted the need for effective control over these models. This necessity arises from instances where generative models produce outputs containing undesirable, offensive, or potentially harmful content. To tackle this challenge, the concept of mach…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: 15 pages, 12 figures

  9. arXiv:2308.12545  [pdf, other]

    cs.SE

    npm-follower: A Complete Dataset Tracking the NPM Ecosystem

    Authors: Donald Pinckney, Federico Cassano, Arjun Guha, Jonathan Bell

    Abstract: Software developers typically rely upon a large network of dependencies to build their applications. For instance, the NPM package repository contains over 3 million packages and serves tens of billions of downloads weekly. Understanding the structure and nature of packages, dependencies, and published code requires datasets that provide researchers with easy access to metadata and code of package…

    Submitted 24 August, 2023; originally announced August 2023.

  10. arXiv:2308.09895  [pdf, other]

    cs.PL cs.LG

    Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs

    Authors: Federico Cassano, John Gouwar, Francesca Lucchetti, Claire Schlesinger, Anders Freeman, Carolyn Jane Anderson, Molly Q Feldman, Michael Greenberg, Abhinav Jangda, Arjun Guha

    Abstract: Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as building blocks for research in programming languages and software engineering. However, Code LLMs produce impressive results on programming languages that are well represented in their training data (e.g., Java, Python, or JavaScript)…

    Submitted 10 February, 2024; v1 submitted 18 August, 2023; originally announced August 2023.

  11. arXiv:2308.08347  [pdf, ps, other]

    cs.PL

    Continuing WebAssembly with Effect Handlers

    Authors: Luna Phipps-Costin, Andreas Rossberg, Arjun Guha, Daan Leijen, Daniel Hillerström, KC Sivaramakrishnan, Matija Pretnar, Sam Lindley

    Abstract: WebAssembly (Wasm) is a low-level portable code format offering near native performance. It is intended as a compilation target for a wide variety of source languages. However, Wasm provides no direct support for non-local control flow features such as async/await, generators/iterators, lightweight threads, first-class continuations, etc. This means that compilers for source languages with such fe…

    Submitted 13 September, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

  12. arXiv:2306.12354  [pdf]

    physics.med-ph cs.HC

    Seat pan angle optimization for vehicle ride comfort using finite element model of human spine

    Authors: Raj Desai, Ankit Vekaria, Anirban Guha, P. Seshu

    Abstract: Ride comfort of the driver/occupant of a vehicle has been usually analyzed by multibody biodynamic models of human beings. Accurate modeling of critical segments of the human body, e.g. the spine requires these models to have a very high number of segments. The resultant increase in degrees of freedom makes these models difficult to analyze and not able to provide certain details such as seat pres…

    Submitted 21 June, 2023; originally announced June 2023.

  13. arXiv:2306.04556  [pdf, other]

    cs.LG cs.HC cs.SE

    StudentEval: A Benchmark of Student-Written Prompts for Large Language Models of Code

    Authors: Hannah McLean Babe, Sydney Nguyen, Yangtian Zi, Arjun Guha, Molly Q Feldman, Carolyn Jane Anderson

    Abstract: Code LLMs are being rapidly deployed and there is evidence that they can make professional programmers more productive. Current benchmarks for code generation measure whether models generate correct programs given an expert prompt. In this paper, we present a new benchmark containing multiple prompts per problem, written by a specific population of non-expert prompters: beginning programmers. Stud…

    Submitted 7 June, 2023; originally announced June 2023.

  14. arXiv:2305.17145  [pdf, other]

    cs.SE cs.LG cs.PL

    Type Prediction With Program Decomposition and Fill-in-the-Type Training

    Authors: Federico Cassano, Ming-Ho Yee, Noah Shinn, Arjun Guha, Steven Holtzen

    Abstract: TypeScript and Python are two programming languages that support optional type annotations, which are useful but tedious to introduce and maintain. This has motivated automated type prediction: given an untyped program, produce a well-typed output program. Large language models (LLMs) are promising for type prediction, but there are challenges: fill-in-the-middle performs poorly, programs may not…

    Submitted 25 May, 2023; originally announced May 2023.

  15. arXiv:2305.06161  [pdf, other]

    cs.CL cs.AI cs.PL cs.SE

    StarCoder: may the source be with you!

    Authors: Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Mishig Davaadorj, Joel Lamy-Poirier, João Monteiro, Oleh Shliazhko, Nicolas Gontier, Nicholas Meade, Armel Zebaze, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu, et al. (42 additional authors not shown)

    Abstract: The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large colle…

    Submitted 13 December, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

  16. arXiv:2304.01651  [pdf, other]

    cs.CY cs.HC

    Socio-economic landscape of digital transformation & public NLP systems: A critical review

    Authors: Satyam Mohla, Anupam Guha

    Abstract: The current wave of digital transformation has spurred digitisation reforms and has led to prodigious development of AI & NLP systems, with several of them entering the public domain. There is a perception that these systems have a non-trivial impact on society, but there is a dearth of literature in critical AI exploring what kinds of systems exist and how they operate. This paper constructs a…

    Submitted 27 August, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

    Comments: Under review

  17. arXiv:2304.00394  [pdf, other]

    cs.SE

    A Large Scale Analysis of Semantic Versioning in NPM

    Authors: Donald Pinckney, Federico Cassano, Arjun Guha, Jonathan Bell

    Abstract: The NPM package repository contains over two million packages and serves tens of billions of downloads per week. Nearly every single JavaScript application uses the NPM package manager to install packages from the NPM repository. NPM relies on a "semantic versioning" ('semver') scheme to maintain a healthy ecosystem, where bug-fixes are reliably delivered to downstream packages as quickly as possi…

    Submitted 1 April, 2023; originally announced April 2023.

  18. Do Machine Learning Models Produce TypeScript Types That Type Check?

    Authors: Ming-Ho Yee, Arjun Guha

    Abstract: Type migration is the process of adding types to untyped code to gain assurance at compile time. TypeScript and other gradual type systems facilitate type migration by allowing programmers to start with imprecise types and gradually strengthen them. However, adding types is a manual effort and several migrations on large, industry codebases have been reported to have taken several years. In the re…

    Submitted 11 July, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

    Comments: Published at the 37th European Conference on Object-Oriented Programming (ECOOP 2023)

  19. arXiv:2302.02092  [pdf, other]

    cs.LG stat.ML

    Interpolation for Robust Learning: Data Augmentation on Wasserstein Geodesics

    Authors: Jiacheng Zhu, Jielin Qiu, Aritra Guha, Zhuolin Yang, Xuanlong Nguyen, Bo Li, Ding Zhao

    Abstract: We propose to study and promote the robustness of a model as per its performance through the interpolation of training data distributions. Specifically, (1) we augment the data by finding the worst-case Wasserstein barycenter on the geodesic connecting subpopulation distributions of different categories. (2) We regularize the model for smoother performance on the continuous geodesic path connectin…

    Submitted 28 August, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

    Comments: 34 pages, 3 figures, 18 tables

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR 202:43129-43157, 2023

  20. arXiv:2301.03988  [pdf, other]

    cs.SE cs.AI cs.LG

    SantaCoder: don't reach for the stars!

    Authors: Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, et al. (16 additional authors not shown)

    Abstract: The BigCode project is an open-scientific collaboration working on the responsible development of large language models for code. This tech report describes the progress of the collaboration until December 2022, outlining the current state of the Personally Identifiable Information (PII) redaction pipeline, the experiments conducted to de-risk the model architecture, and the experiments investigat…

    Submitted 24 February, 2023; v1 submitted 9 January, 2023; originally announced January 2023.

  21. arXiv:2211.04568  [pdf, ps, other]

    stat.AP cs.CY cs.LG

    Towards Algorithmic Fairness in Space-Time: Filling in Black Holes

    Authors: Cheryl Flynn, Aritra Guha, Subhabrata Majumdar, Divesh Srivastava, Zhengyi Zhou

    Abstract: New technologies and the availability of geospatial data have drawn attention to spatio-temporal biases present in society. For example: the COVID-19 pandemic highlighted disparities in the availability of broadband service and its role in the digital divide; the environmental justice movement in the United States has raised awareness to health implications for minority populations stemming from h…

    Submitted 8 November, 2022; originally announced November 2022.

  22. arXiv:2208.08227  [pdf, other]

    cs.LG cs.PL

    MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation

    Authors: Federico Cassano, John Gouwar, Daniel Nguyen, Sydney Nguyen, Luna Phipps-Costin, Donald Pinckney, Ming-Ho Yee, Yangtian Zi, Carolyn Jane Anderson, Molly Q Feldman, Arjun Guha, Michael Greenberg, Abhinav Jangda

    Abstract: Large language models have demonstrated the ability to generate both natural language and programming language text. Such models open up the possibility of multi-language code generation: could code generation models generalize knowledge from one language to another? Although contemporary code generation models can generate semantically correct Python code, little is known about their abilities wi…

    Submitted 19 December, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

  23. arXiv:2206.00807  [pdf]

    cs.LG

    Applied Federated Learning: Architectural Design for Robust and Efficient Learning in Privacy Aware Settings

    Authors: Branislav Stojkovic, Jonathan Woodbridge, Zhihan Fang, Jerry Cai, Andrey Petrov, Sathya Iyer, Daoyu Huang, Patrick Yau, Arvind Sastha Kumar, Hitesh Jawa, Anamita Guha

    Abstract: The classical machine learning paradigm requires the aggregation of user data in a central location where machine learning practitioners can preprocess data, calculate features, tune models and evaluate performance. The advantage of this approach includes leveraging high performance hardware (such as GPUs) and the ability of machine learning practitioners to do in depth data analysis to improve mo…

    Submitted 7 June, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

  24. arXiv:2203.13737  [pdf, other]

    cs.SE

    Flexible and Optimal Dependency Management via Max-SMT

    Authors: Donald Pinckney, Federico Cassano, Arjun Guha, Jon Bell, Massimiliano Culpo, Todd Gamblin

    Abstract: Package managers such as NPM have become essential for software development. The NPM repository hosts over 2 million packages and serves over 43 billion downloads every week. Unfortunately, the NPM dependency solver has several shortcomings. 1) NPM is greedy and often fails to install the newest versions of dependencies; 2) NPM's algorithm leads to duplicated dependencies and bloated code, which i…

    Submitted 24 August, 2023; v1 submitted 25 March, 2022; originally announced March 2022.

  25. arXiv:2201.12991  [pdf, ps, other]

    cs.LG cs.IT

    Federated Learning with Erroneous Communication Links

    Authors: Mahyar Shirvanimoghaddam, Ayoob Salari, Yifeng Gao, Aradhika Guha

    Abstract: In this paper, we consider the federated learning (FL) problem in the presence of communication errors. We model the link between the devices and the central node (CN) by a packet erasure channel, where the local parameters from devices are either erased or received correctly by CN with probability $ε$ and $1-ε$, respectively. We proved that the FL algorithm in the presence of communication errors…

    Submitted 11 April, 2022; v1 submitted 30 January, 2022; originally announced January 2022.

    Comments: The paper is accepted for publication in IEEE Communications Letters

  26. arXiv:2109.05049  [pdf, other]

    cs.PL

    Solver-based Gradual Type Migration

    Authors: Luna Phipps-Costin, Carolyn Jane Anderson, Michael Greenberg, Arjun Guha

    Abstract: Gradually typed languages allow programmers to mix statically and dynamically typed code, enabling them to incrementally reap the benefits of static typing as they add type annotations to their code. However, this type migration process is typically a manual effort with limited tool support. This paper examines the problem of automated type migration: given a dynamic program, infer addition…

    Submitted 10 September, 2021; originally announced September 2021.

  27. arXiv:2105.06577  [pdf, other]

    cs.LG math.DS math.OC

    Online Algorithms and Policies Using Adaptive and Machine Learning Approaches

    Authors: Anuradha M. Annaswamy, Anubhav Guha, Yingnan Cui, Sunbochen Tang, Peter A. Fisher, Joseph E. Gaudio

    Abstract: This paper considers the problem of real-time control and learning in dynamic systems subjected to parametric uncertainties. We propose a combination of a Reinforcement Learning (RL) based policy in the outer loop suitably chosen to ensure stability and optimality for the nominal dynamics, together with Adaptive Control (AC) in the inner loop so that in real-time AC contracts the closed-loop dynam…

    Submitted 9 June, 2023; v1 submitted 13 May, 2021; originally announced May 2021.

    Comments: 38 pages

  28. arXiv:2103.04880  [pdf, other]

    cs.RO cs.PL

    Iterative Program Synthesis for Adaptable Social Navigation

    Authors: Jarrett Holtz, Simon Andrews, Arjun Guha, Joydeep Biswas

    Abstract: Robot social navigation is influenced by human preferences and environment-specific scenarios such as elevators and doors, thus necessitating end-user adaptability. State-of-the-art approaches to social navigation fall into two categories: model-based social constraints and learning-based approaches. While effective, these approaches have fundamental limitations -- model-based approaches require c…

    Submitted 30 August, 2021; v1 submitted 8 March, 2021; originally announced March 2021.

    Comments: IROS 2021

  29. Predicting post-operative right ventricular failure using video-based deep learning

    Authors: Rohan Shad, Nicolas Quach, Robyn Fong, Patpilai Kasinpila, Cayley Bowles, Miguel Castro, Ashrith Guha, Eddie Suarez, Stefan Jovinge, Sangjin Lee, Theodore Boeve, Myriam Amsallem, Xiu Tang, Francois Haddad, Yasuhiro Shudo, Y. Joseph Woo, Jeffrey Teuteberg, John P. Cunningham, Curt P. Langlotz, William Hiesinger

    Abstract: Non-invasive and cost effective in nature, the echocardiogram allows for a comprehensive assessment of the cardiac musculature and valves. Despite progressive improvements over the decades, the rich temporally resolved data in echocardiography videos remain underutilized. Human reads of echocardiograms reduce the complex patterns of cardiac wall motion, to a small list of measurements of heart fun…

    Submitted 27 February, 2021; originally announced March 2021.

    Comments: 12 pages, 3 figures

    Journal ref: Nat Commun 12, 5192 (2021)

  30. arXiv:2102.07695  [pdf, other]

    stat.ML cs.LG stat.ME

    Scalable nonparametric Bayesian learning for heterogeneous and dynamic velocity fields

    Authors: Sunrit Chakraborty, Aritra Guha, Rayleigh Lei, XuanLong Nguyen

    Abstract: Analysis of heterogeneous patterns in complex spatio-temporal data finds usage across various domains in applied science and engineering, including training autonomous vehicles to navigate in complex traffic scenarios. Motivated by applications arising in the transportation domain, in this paper we develop a model for learning heterogeneous and dynamic patterns of velocity field data. We draw from…

    Submitted 15 February, 2021; originally announced February 2021.

    Comments: 5 tables, 8 figures

  31. arXiv:2102.03895  [pdf, other]

    stat.ML cs.LG stat.AP

    Functional optimal transport: map estimation and domain adaptation for functional data

    Authors: Jiacheng Zhu, Aritra Guha, Dat Do, Mengdi Xu, XuanLong Nguyen, Ding Zhao

    Abstract: We introduce a formulation of optimal transport problem for distributions on function spaces, where the stochastic map between functional domains can be partially represented in terms of an (infinite-dimensional) Hilbert-Schmidt operator mapping a Hilbert space of functions to another. For numerous machine learning tasks, data can be naturally viewed as samples drawn from spaces of functions, such…

    Submitted 28 August, 2023; v1 submitted 7 February, 2021; originally announced February 2021.

    Comments: 48 pages, 10 figures, 3 tables

  32. arXiv:2011.10562  [pdf, other]

    eess.SY cs.LG cs.RO

    MRAC-RL: A Framework for On-Line Policy Adaptation Under Parametric Model Uncertainty

    Authors: Anubhav Guha, Anuradha Annaswamy

    Abstract: Reinforcement learning (RL) algorithms have been successfully used to develop control policies for dynamical systems. For many such systems, these policies are trained in a simulated environment. Due to discrepancies between the simulated model and the true system dynamics, RL trained policies often fail to generalize and adapt appropriately when deployed in the real-world environment. Current res…

    Submitted 20 November, 2020; originally announced November 2020.

    Comments: Short version submitted to Learning for Dynamics & Control (L4DC) 2021 Conference

  33. Wasm/k: Delimited Continuations for WebAssembly

    Authors: Donald Pinckney, Arjun Guha, Yuriy Brun

    Abstract: WebAssembly is designed to be an alternative to JavaScript that is a safe, portable, and efficient compilation target for a variety of languages. The performance of high-level languages depends not only on the underlying performance of WebAssembly, but also on the quality of the generated WebAssembly code. In this paper, we identify several features of high-level languages that current approaches…

    Submitted 4 October, 2020; originally announced October 2020.

    Journal ref: Proceedings of the ACM SIGPLAN International Symposium on Dynamic Languages (DLS 2020)

  34. arXiv:2009.06693  [pdf, other]

    cs.DC cs.LG

    Accelerating Graph Sampling for Graph Machine Learning using GPUs

    Authors: Abhinav Jangda, Sandeep Polisetty, Arjun Guha, Marco Serafini

    Abstract: Representation learning algorithms automatically learn the features of data. Several representation learning algorithms for graph data, such as DeepWalk, node2vec, and GraphSAGE, sample the graph to produce mini-batches that are suitable for training a DNN. However, sampling time can be a significant fraction of training time, and existing systems do not efficiently parallelize sampling. Samplin…

    Submitted 10 May, 2021; v1 submitted 14 September, 2020; originally announced September 2020.

    Comments: Published in EuroSys 2021

  35. Multimodal Noisy Segmentation based fragmented burn scars identification in Amazon Rainforest

    Authors: Satyam Mohla, Sidharth Mohla, Anupam Guha, Biplab Banerjee

    Abstract: Detection of burn marks due to wildfires in inaccessible rain forests is important for various disaster management and ecological studies. The fragmented nature of arable landscapes and diverse cropping patterns often thwart the precise mapping of burn scars. Recent advances in remote-sensing and availability of multimodal data offer a viable solution to this mapping problem. However, the task to…

    Submitted 9 September, 2020; originally announced September 2020.

    Comments: 5 pages, 5 figures. Accepted at IEEE International Conference on Systems, Man and Cybernetics 2020. Earlier draft presented at Harvard CRCS AI for Social Good Workshop 2020

    Journal ref: 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC)

  36. arXiv:2008.04133  [pdf, other]

    cs.AI cs.PL cs.RO

    Robot Action Selection Learning via Layered Dimension Informed Program Synthesis

    Authors: Jarrett Holtz, Arjun Guha, Joydeep Biswas

    Abstract: Action selection policies (ASPs), used to compose low-level robot skills into complex high-level tasks are commonly represented as neural networks (NNs) in the state of the art. Such a paradigm, while very effective, suffers from a few key problems: 1) NNs are opaque to the user and hence not amenable to verification, 2) they require significant amounts of training data, and 3) they are hard to re…

    Submitted 12 November, 2020; v1 submitted 10 August, 2020; originally announced August 2020.

  37. arXiv:2006.10241  [pdf, other]

    cs.LG cs.MA cs.RO stat.AP stat.ML

    Robust Unsupervised Learning of Temporal Dynamic Interactions

    Authors: Aritra Guha, Rayleigh Lei, Jiacheng Zhu, XuanLong Nguyen, Ding Zhao

    Abstract: Robust representation learning of temporal dynamic interactions is an important problem in robotic learning in general and automated unsupervised learning in particular. Temporal dynamic interactions can be described by (multiple) geometric trajectories in a suitable space over which unsupervised learning techniques may be applied to extract useful features from raw and high-dimensional data measu…

    Submitted 17 June, 2020; originally announced June 2020.

  38. arXiv:2001.04397  [pdf, other]

    cs.RO

    SMT-based Robot Transition Repair

    Authors: Jarrett Holtz, Arjun Guha, Joydeep Biswas

    Abstract: State machines are a common model for robot behaviors. Transition functions often rely on parameterized conditions to model preconditions for the controllers, where the correct values of the parameters depend on factors relating to the environment or the specific robot. In the absence of specific calibration procedures a roboticist must painstakingly adjust the parameters through a series of trial…

    Submitted 9 January, 2020; originally announced January 2020.

    Comments: In submission to AIJ. arXiv admin note: text overlap with arXiv:1802.01706

  39. arXiv:1911.02178  [pdf, other]

    cs.DC cs.PL

    A Language-based Serverless Function Accelerator

    Authors: Emily Herbert, Arjun Guha

    Abstract: Serverless computing is an approach to cloud computing that allows programmers to run serverless functions in response to external events. Serverless functions are priced at sub-second granularity, support transparent elasticity, and relieve programmers from managing the operating system. Thus serverless functions allow programmers to focus on writing application code, and the cloud provider to ma…

    Submitted 3 August, 2020; v1 submitted 5 November, 2019; originally announced November 2019.

  40. arXiv:1909.07190  [pdf, other]

    cs.PL cs.DC

    Model-Based Warp Overlapped Tiling for Image Processing Programs on GPUs

    Authors: Abhinav Jangda, Arjun Guha

    Abstract: Domain-specific languages that execute image processing pipelines on GPUs, such as Halide and Forma, operate by 1) dividing the image into overlapped tiles, and 2) fusing loops to improve memory locality. However, current approaches have limitations: 1) they require intra thread block synchronization, which has a non-trivial cost, 2) they must choose between small tiles that require more overlapped…

    Submitted 8 September, 2020; v1 submitted 16 September, 2019; originally announced September 2019.

  41. arXiv:1909.03110  [pdf, other]

    cs.CY cs.PL cs.RO

    Making High-Performance Robots Safe and Easy to Use for an Introduction to Computing

    Authors: Joseph Spitzer, Joydeep Biswas, Arjun Guha

    Abstract: Robots are a popular platform for introducing computing and artificial intelligence to novice programmers. However, programming state-of-the-art robots is very challenging, and requires knowledge of concurrency, operation safety, and software engineering skills, which can take years to teach. In this paper, we present an approach to introducing computing that allows students to safely and easily p…

    Submitted 21 November, 2019; v1 submitted 6 September, 2019; originally announced September 2019.

    Comments: 8 pages, 7 figures, 4 tables

  42. arXiv:1909.02778  [pdf, other]

    cs.RO

    Automatic Failure Recovery for End-User Programs on Service Mobile Robots

    Authors: Jenna Claire Hammond, Joydeep Biswas, Arjun Guha

    Abstract: For service mobile robots to be most effective, it must be possible for non-experts and even end-users to program them to do new tasks. Regardless of the programming method (e.g., by demonstration or traditional programming), robot task programs are challenging to write, because they rely on multiple actions to succeed, including human-robot interactions. Unfortunately, interactions are prone to f…

    Submitted 6 September, 2019; originally announced September 2019.

  43. arXiv:1905.11009  [pdf, other]

    stat.ML cs.LG

    Dirichlet Simplex Nest and Geometric Inference

    Authors: Mikhail Yurochkin, Aritra Guha, Yuekai Sun, XuanLong Nguyen

    Abstract: We propose Dirichlet Simplex Nest, a class of probabilistic models suitable for a variety of data types, and develop fast and provably accurate inference algorithms by accounting for the model's convex geometry and low dimensional simplicial structure. By exploiting the connection to Voronoi tessellation and properties of Dirichlet distribution, the proposed inference algorithm is shown to achieve…

    Submitted 27 May, 2019; originally announced May 2019.

    Comments: ICML 2019

  44. Formal Foundations of Serverless Computing

    Authors: Abhinav Jangda, Donald Pinckney, Yuriy Brun, Arjun Guha

    Abstract: Serverless computing (also known as functions as a service) is a new cloud computing abstraction that makes it easier to write robust, large-scale web services. In serverless computing, programmers write what are called serverless functions, and the cloud platform transparently manages the operating system, resource allocation, load-balancing, and fault tolerance. When demand for the service spike…

    Submitted 4 October, 2020; v1 submitted 15 February, 2019; originally announced February 2019.

    Journal ref: PACMPL, OOPSLA issue, vol. 3, October 2019, pp. 149:1-149:26

  45. Not So Fast: Analyzing the Performance of WebAssembly vs. Native Code

    Authors: Abhinav Jangda, Bobby Powers, Emery Berger, Arjun Guha

    Abstract: All major web browsers now support WebAssembly, a low-level bytecode intended to serve as a compilation target for code written in languages like C and C++. A key goal of WebAssembly is performance parity with native code; previous work reports near parity, with many applications compiled to WebAssembly running on average 10% slower than native code. However, this evaluation was limited to a suite…

    Submitted 31 May, 2019; v1 submitted 25 January, 2019; originally announced January 2019.

    Comments: Accepted (to appear) at USENIX Annual Technical Conference 2019

  46. arXiv:1809.08738  [pdf, other]

    stat.ML cs.CL cs.LG

    Scalable inference of topic evolution via models for latent geometric structures

    Authors: Mikhail Yurochkin, Zhiwei Fan, Aritra Guha, Paraschos Koutris, XuanLong Nguyen

    Abstract: We develop new models and algorithms for learning the temporal dynamics of the topic polytopes and related geometric objects that arise in topic model based inference. Our model is nonparametric Bayesian and the corresponding inference algorithm is able to discover new topics as the time progresses. By exploiting the connection between the modeling of topic polytope evolution, Beta-Bernoulli proce…

    Submitted 1 November, 2019; v1 submitted 23 September, 2018; originally announced September 2018.

    Comments: NeurIPS 2019

  47. arXiv:1802.02974  [pdf, other]

    cs.PL

    Putting in All the Stops: Execution Control for JavaScript

    Authors: Samuel Baxter, Rachit Nigam, Joe Gibbs Politz, Shriram Krishnamurthi, Arjun Guha

    Abstract: Scores of compilers produce JavaScript, enabling programmers to use many languages on the Web, reuse existing code, and even use Web IDEs. Unfortunately, most compilers inherit the browser's compromised execution model, so long-running programs freeze the browser tab, infinite loops crash IDEs, and so on. The few compilers that avoid these problems suffer poor performance and are difficult to engi…

    Submitted 15 April, 2018; v1 submitted 8 February, 2018; originally announced February 2018.

    Comments: In proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) 2018

  48. arXiv:1802.01706  [pdf, other]

    cs.RO cs.PL

    Interactive Robot Transition Repair With SMT

    Authors: Jarrett Holtz, Arjun Guha, Joydeep Biswas

    Abstract: Complex robot behaviors are often structured as state machines, where states encapsulate actions and a transition function switches between states. Since transitions depend on physical parameters, when the environment changes, a roboticist has to painstakingly readjust the parameters to work in the new environment. We present interactive SMT-based Robot Transition Repair (SRTR): instead of manuall…

    Submitted 5 May, 2018; v1 submitted 5 February, 2018; originally announced February 2018.

    Comments: International Joint Conference on Artificial Intelligence (IJCAI), 2018

  49. Tortoise: Interactive System Configuration Repair

    Authors: Aaron Weiss, Arjun Guha, Yuriy Brun

    Abstract: System configuration languages provide powerful abstractions that simplify managing large-scale, networked systems. Thousands of organizations now use configuration languages, such as Puppet. However, specifications written in configuration languages can have bugs and the shell remains the simplest way to debug a misconfigured system. Unfortunately, it is unsafe to use the shell to fix problems wh…

    Submitted 15 September, 2017; originally announced September 2017.

    Comments: Published version in proceedings of IEEE/ACM International Conference on Automated Software Engineering (ASE) 2017

    Journal ref: in Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), 2017, pp. 625-636

  50. arXiv:1701.04548  [pdf, ps, other]

    cs.DM

    Analytic Connectivity in General Hypergraphs

    Authors: Ashwin Guha, Muni Sreenivas Pydi, Biswajit Paria, Ambedkar Dukkipati

    Abstract: In this paper we extend the known results of analytic connectivity to non-uniform hypergraphs. We prove a modified Cheeger's inequality and also give a bound on analytic connectivity with respect to the degree sequence and diameter of a hypergraph.

    Submitted 17 January, 2017; originally announced January 2017.