-
The Programmer's Assistant: Conversational Interaction with a Large Language Model for Software Development
Authors:
Steven I. Ross,
Fernando Martinez,
Stephanie Houde,
Michael Muller,
Justin D. Weisz
Abstract:
Large language models (LLMs) have recently been applied in software engineering to perform tasks such as translating code between programming languages, generating code from natural language, and autocompleting code as it is being written. When used within development tools, these systems typically treat each model invocation independently from all previous invocations, and only a specific limited functionality is exposed within the user interface. This approach to user interaction misses an opportunity for users to more deeply engage with the model by having the context of their previous interactions, as well as the context of their code, inform the model's responses. We developed a prototype system -- the Programmer's Assistant -- in order to explore the utility of conversational interactions grounded in code, as well as software engineers' receptiveness to the idea of conversing with, rather than invoking, a code-fluent LLM. Through an evaluation with 42 participants with varied levels of programming experience, we found that our system was capable of conducting extended, multi-turn discussions, and that it enabled additional knowledge and capabilities beyond code generation to emerge from the LLM. Despite skeptical initial expectations for conversational programming assistance, participants were impressed by the breadth of the assistant's capabilities, the quality of its responses, and its potential for improving their productivity. Our work demonstrates the unique potential of conversational interactions with LLMs for co-creative processes like software development.
Submitted 14 February, 2023;
originally announced February 2023.
-
A Case Study in Engineering a Conversational Programming Assistant's Persona
Authors:
Steven I. Ross,
Michael Muller,
Fernando Martinez,
Stephanie Houde,
Justin D. Weisz
Abstract:
The Programmer's Assistant is an experimental prototype software development environment that integrates a chatbot with a code editor. Conversational capability was achieved by using an existing code-fluent Large Language Model and providing it with a prompt that establishes a conversational interaction pattern, a set of conventions, and a style of interaction appropriate for the application. A discussion of the evolution of the prompt provides a case study in how to coax an existing foundation model to behave in a desirable manner for a particular application.
Submitted 13 January, 2023;
originally announced January 2023.
-
Toward General Design Principles for Generative AI Applications
Authors:
Justin D. Weisz,
Michael Muller,
Jessica He,
Stephanie Houde
Abstract:
Generative AI technologies are growing in power, utility, and use. As generative technologies are being incorporated into mainstream applications, there is a need for guidance on how to design those applications to foster productive and safe use. Based on recent research on human-AI co-creation within the HCI and AI communities, we present a set of seven principles for the design of generative AI applications. These principles are grounded in an environment of generative variability. Six principles are focused on designing for characteristics of generative AI: multiple outcomes & imperfection; exploration & control; and mental models & explanations. In addition, we urge designers to design against potential harms that may be caused by a generative model's hazardous output, misuse, or potential for human displacement. We anticipate these principles to usefully inform design decisions made in the creation of novel human-AI applications, and we invite the community to apply, revise, and extend these principles to their own work.
Submitted 13 January, 2023;
originally announced January 2023.
-
Better Together? An Evaluation of AI-Supported Code Translation
Authors:
Justin D. Weisz,
Michael Muller,
Steven I. Ross,
Fernando Martinez,
Stephanie Houde,
Mayank Agarwal,
Kartik Talamadupula,
John T. Richards
Abstract:
Generative machine learning models have recently been applied to source code, for use cases including translating code between programming languages, creating documentation from code, and auto-completing methods. Yet, state-of-the-art models often produce code that is erroneous or incomplete. In a controlled study with 32 software engineers, we examined whether such imperfect outputs are helpful in the context of Java-to-Python code translation. When aided by the outputs of a code translation model, participants produced code with fewer errors than when working alone. We also examined how the quality and quantity of AI translations affected the work process and quality of outcomes, and observed that providing multiple translations had a larger impact on the translation process than varying the quality of provided translations. Our results tell a complex, nuanced story about the benefits of generative code models and the challenges software engineers face when working with their outputs. Our work motivates the need for intelligent user interfaces that help software engineers effectively work with generative code models in order to understand and evaluate their outputs and achieve superior outcomes to working alone.
Submitted 15 February, 2022;
originally announced February 2022.
-
Investigating Explainability of Generative AI for Code through Scenario-based Design
Authors:
Jiao Sun,
Q. Vera Liao,
Michael Muller,
Mayank Agarwal,
Stephanie Houde,
Kartik Talamadupula,
Justin D. Weisz
Abstract:
What does it mean for a generative AI model to be explainable? The emergent discipline of explainable AI (XAI) has made great strides in helping people understand discriminative models. Less attention has been paid to generative models that produce artifacts, rather than decisions, as output. Meanwhile, generative AI (GenAI) technologies are maturing and being applied to application domains such as software engineering. Using scenario-based design and question-driven XAI design approaches, we explore users' explainability needs for GenAI in three software engineering use cases: natural language to code, code translation, and code auto-completion. We conducted 9 workshops with 43 software engineers in which real examples from state-of-the-art generative AI models were used to elicit users' explainability needs. Drawing from prior work, we also propose 4 types of XAI features for GenAI for code and gathered additional design ideas from participants. Our work explores explainability needs for GenAI for code and demonstrates how human-centered approaches can drive the technical development of XAI in novel domains.
Submitted 10 February, 2022;
originally announced February 2022.
-
Using Document Similarity Methods to create Parallel Datasets for Code Translation
Authors:
Mayank Agarwal,
Kartik Talamadupula,
Fernando Martinez,
Stephanie Houde,
Michael Muller,
John Richards,
Steven I Ross,
Justin D. Weisz
Abstract:
Translating source code from one programming language to another is a critical, time-consuming task in modernizing legacy applications and codebases. Recent work in this space has drawn inspiration from the software naturalness hypothesis by applying natural language processing techniques towards automating the code translation task. However, due to the paucity of parallel data in this domain, supervised techniques have only been applied to a limited set of popular programming languages. To bypass this limitation, unsupervised neural machine translation techniques have been proposed to learn code translation using only monolingual corpora. In this work, we propose to use document similarity methods to create noisy parallel datasets of code, thus enabling supervised techniques to be applied for automated code translation without having to rely on the availability or expensive curation of parallel code datasets. We explore the noise tolerance of models trained on such automatically-created datasets and show that these models perform comparably to models trained on ground truth for reasonable levels of noise. Finally, we exhibit the practical utility of the proposed method by creating parallel datasets for languages beyond the ones explored in prior work, thus expanding the set of programming languages for automated code translation.
Submitted 11 October, 2021;
originally announced October 2021.
-
AI Explainability 360: Impact and Design
Authors:
Vijay Arya,
Rachel K. E. Bellamy,
Pin-Yu Chen,
Amit Dhurandhar,
Michael Hind,
Samuel C. Hoffman,
Stephanie Houde,
Q. Vera Liao,
Ronny Luss,
Aleksandra Mojsilovic,
Sami Mourad,
Pablo Pedemonte,
Ramya Raghavendra,
John Richards,
Prasanna Sattigeri,
Karthikeyan Shanmugam,
Moninder Singh,
Kush R. Varshney,
Dennis Wei,
Yunfeng Zhang
Abstract:
As artificial intelligence and machine learning algorithms become increasingly prevalent in society, multiple stakeholders are calling for these algorithms to provide explanations. At the same time, these stakeholders, whether they be affected citizens, government regulators, domain experts, or system developers, have different explanation needs. To address these needs, in 2019, we created AI Explainability 360 (Arya et al. 2020), an open source software toolkit featuring ten diverse and state-of-the-art explainability methods and two evaluation metrics. This paper examines the impact of the toolkit with several case studies, statistics, and community feedback. The different ways in which users have experienced AI Explainability 360 have resulted in multiple types of impact and improvements in multiple metrics, highlighted by the adoption of the toolkit by the independent LF AI & Data Foundation. The paper also describes the flexible design of the toolkit, examples of its use, and the significant educational material and documentation available to its users.
Submitted 24 September, 2021;
originally announced September 2021.
-
ModalPINN: an extension of Physics-Informed Neural Networks with enforced truncated Fourier decomposition for periodic flow reconstruction using a limited number of imperfect sensors
Authors:
Gaetan Raynaud,
Sebastien Houde,
Frederick P. Gosselin
Abstract:
Continuous reconstructions of periodic phenomena provide powerful tools to understand, predict and model natural situations and engineering problems. In line with the recent method called Physics-Informed Neural Networks (PINN) where a multi layer perceptron directly approximates any physical quantity as a symbolic function of time and space coordinates, we present an extension, namely ModalPINN, that encodes the approximation of a limited number of Fourier mode shapes. In addition to the added interpretability, this representation performs up to two orders of magnitude more precisely for a similar number of degrees of freedom and training time in some cases as illustrated through the test case of laminar shedding of vortices over a cylinder. This added simplicity proves to be robust in regards to flow reconstruction using only a limited number of sensors with asymmetric data that simulates an experimental configuration, even when a Gaussian noise or a random delay is added, imitating imperfect and sparse information.
Submitted 8 April, 2022; v1 submitted 19 August, 2021;
originally announced August 2021.
-
Perfection Not Required? Human-AI Partnerships in Code Translation
Authors:
Justin D. Weisz,
Michael Muller,
Stephanie Houde,
John Richards,
Steven I. Ross,
Fernando Martinez,
Mayank Agarwal,
Kartik Talamadupula
Abstract:
Generative models have become adept at producing artifacts such as images, videos, and prose at human-like levels of proficiency. New generative techniques, such as unsupervised neural machine translation (NMT), have recently been applied to the task of generating source code, translating it from one programming language to another. The artifacts produced in this way may contain imperfections, such as compilation or logical errors. We examine the extent to which software engineers would tolerate such imperfections and explore ways to aid the detection and correction of those errors. Using a design scenario approach, we interviewed 11 software engineers to understand their reactions to the use of an NMT model in the context of application modernization, focusing on the task of translating source code from one language to another. Our three-stage scenario sparked discussions about the utility and desirability of working with an imperfect AI system, how acceptance of that system's outputs would be established, and future opportunities for generative AI in application modernization. Our study highlights how UI features such as confidence highlighting and alternate translations help software engineers work with and better understand generative NMT models.
Submitted 8 April, 2021;
originally announced April 2021.
-
Quality Estimation & Interpretability for Code Translation
Authors:
Mayank Agarwal,
Kartik Talamadupula,
Stephanie Houde,
Fernando Martinez,
Michael Muller,
John Richards,
Steven Ross,
Justin D. Weisz
Abstract:
Recently, the automated translation of source code from one programming language to another by using automatic approaches inspired by Neural Machine Translation (NMT) methods for natural languages has come under study. However, such approaches suffer from the same problem as previous NMT approaches on natural languages, viz. the lack of an ability to estimate and evaluate the quality of the translations; and consequently ascribe some measure of interpretability to the model's choices. In this paper, we attempt to estimate the quality of source code translations built on top of the TransCoder model. We consider the code translation task as an analog of machine translation (MT) for natural languages, with some added caveats. We present our main motivation from a user study built around code translation; and present a technique that correlates the confidences generated by that model to lint errors in the translated code. We conclude with some observations on these correlations, and some ideas for future work.
Submitted 26 April, 2021; v1 submitted 4 December, 2020;
originally announced December 2020.
-
Towards evaluating and eliciting high-quality documentation for intelligent systems
Authors:
David Piorkowski,
Daniel González,
John Richards,
Stephanie Houde
Abstract:
A vital component of trust and transparency in intelligent systems built on machine learning and artificial intelligence is the development of clear, understandable documentation. However, such systems are notorious for their complexity and opaqueness, making quality documentation a non-trivial task. Furthermore, little is known about what makes such documentation "good." In this paper, we propose and evaluate a set of quality dimensions to identify in what ways this type of documentation falls short. Then, using those dimensions, we evaluate three different approaches for eliciting intelligent system documentation. We show how the dimensions identify shortcomings in such documentation and posit how such dimensions can be used to further enable users to provide documentation that is suitable to a given persona or use case.
Submitted 17 November, 2020;
originally announced November 2020.
-
A Methodology for Creating AI FactSheets
Authors:
John Richards,
David Piorkowski,
Michael Hind,
Stephanie Houde,
Aleksandra Mojsilović
Abstract:
As AI models and services are used in a growing number of high-stakes areas, a consensus is forming around the need for a clearer record of how these models and services are developed to increase trust. Several proposals for higher quality and more consistent AI documentation have emerged to address ethical and legal concerns and general social impacts of such systems. However, there is little published work on how to create this documentation. This is the first work to describe a methodology for creating the form of AI documentation we call FactSheets. We have used this methodology to create useful FactSheets for nearly two dozen models. This paper describes this methodology and shares the insights we have gathered. Within each step of the methodology, we describe the issues to consider and the questions to explore with the relevant people in an organization who will be creating and consuming the AI facts in a FactSheet. This methodology will accelerate the broader adoption of transparent AI documentation.
Submitted 27 June, 2020; v1 submitted 24 June, 2020;
originally announced June 2020.
-
Business (mis)Use Cases of Generative AI
Authors:
Stephanie Houde,
Vera Liao,
Jacquelyn Martino,
Michael Muller,
David Piorkowski,
John Richards,
Justin Weisz,
Yunfeng Zhang
Abstract:
Generative AI is a class of machine learning technology that learns to generate new data from training data. While deep fakes and media- and art-related generative AI breakthroughs have recently caught people's attention and imagination, the overall area is in its infancy for business use. Further, little is known about generative AI's potential for malicious misuse at large scale. Using co-creation design fictions with AI engineers, we explore the plausibility and severity of business misuse cases.
Submitted 2 March, 2020;
originally announced March 2020.
-
Experiences with Improving the Transparency of AI Models and Services
Authors:
Michael Hind,
Stephanie Houde,
Jacquelyn Martino,
Aleksandra Mojsilovic,
David Piorkowski,
John Richards,
Kush R. Varshney
Abstract:
AI models and services are used in a growing number of high-stakes areas, resulting in a need for increased transparency. Consistent with this, several proposals for higher quality and more consistent documentation of AI data, models, and systems have emerged. Little is known, however, about the needs of those who would produce or consume these new forms of documentation. Through semi-structured developer interviews and two document creation exercises, we have assembled a clearer picture of these needs and the various challenges faced in creating accurate and useful AI documentation. Based on the observations from this work, supplemented by feedback received during multiple design explorations and stakeholder conversations, we make recommendations for easing the collection and flexible presentation of AI facts to promote transparency.
Submitted 11 November, 2019;
originally announced November 2019.
-
One Explanation Does Not Fit All: A Toolkit and Taxonomy of AI Explainability Techniques
Authors:
Vijay Arya,
Rachel K. E. Bellamy,
Pin-Yu Chen,
Amit Dhurandhar,
Michael Hind,
Samuel C. Hoffman,
Stephanie Houde,
Q. Vera Liao,
Ronny Luss,
Aleksandra Mojsilović,
Sami Mourad,
Pablo Pedemonte,
Ramya Raghavendra,
John Richards,
Prasanna Sattigeri,
Karthikeyan Shanmugam,
Moninder Singh,
Kush R. Varshney,
Dennis Wei,
Yunfeng Zhang
Abstract:
As artificial intelligence and machine learning algorithms make further inroads into society, calls are increasing from multiple stakeholders for these algorithms to explain their outputs. At the same time, these stakeholders, whether they be affected citizens, government regulators, domain experts, or system developers, present different requirements for explanations. Toward addressing these needs, we introduce AI Explainability 360 (http://aix360.mybluemix.net/), an open-source software toolkit featuring eight diverse and state-of-the-art explainability methods and two evaluation metrics. Equally important, we provide a taxonomy to help entities requiring explanations to navigate the space of explanation methods, not only those in the toolkit but also in the broader literature on explainability. For data scientists and other users of the toolkit, we have implemented an extensible software architecture that organizes methods according to their place in the AI modeling pipeline. We also discuss enhancements to bring research innovations closer to consumers of explanations, ranging from simplified, more accessible versions of algorithms, to tutorials and an interactive web demo to introduce AI explainability to different audiences and application domains. Together, our toolkit and taxonomy can help identify gaps where more explainability methods are needed and provide a platform to incorporate them as they are developed.
Submitted 14 September, 2019; v1 submitted 6 September, 2019;
originally announced September 2019.
-
AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias
Authors:
Rachel K. E. Bellamy,
Kuntal Dey,
Michael Hind,
Samuel C. Hoffman,
Stephanie Houde,
Kalapriya Kannan,
Pranay Lohia,
Jacquelyn Martino,
Sameep Mehta,
Aleksandra Mojsilovic,
Seema Nagar,
Karthikeyan Natesan Ramamurthy,
John Richards,
Diptikalyan Saha,
Prasanna Sattigeri,
Moninder Singh,
Kush R. Varshney,
Yunfeng Zhang
Abstract:
Fairness is an increasingly important concern as machine learning models are used to support decision making in high-stakes applications such as mortgage lending, hiring, and prison sentencing. This paper introduces a new open source Python toolkit for algorithmic fairness, AI Fairness 360 (AIF360), released under an Apache v2.0 license (https://github.com/ibm/aif360). The main objectives of this toolkit are to help facilitate the transition of fairness research algorithms to use in an industrial setting and to provide a common framework for fairness researchers to share and evaluate algorithms.
The package includes a comprehensive set of fairness metrics for datasets and models, explanations for these metrics, and algorithms to mitigate bias in datasets and models. It also includes an interactive Web experience (https://aif360.mybluemix.net) that provides a gentle introduction to the concepts and capabilities for line-of-business users, as well as extensive documentation, usage guidance, and industry-specific tutorials to enable data scientists and practitioners to incorporate the most appropriate tool for their problem into their work products. The architecture of the package has been engineered to conform to a standard paradigm used in data science, thereby further improving usability for practitioners. Such architectural design and abstractions enable researchers and developers to extend the toolkit with their new algorithms and improvements, and to use it for performance benchmarking. A built-in testing infrastructure maintains code quality.
Submitted 3 October, 2018;
originally announced October 2018.
-
FactSheets: Increasing Trust in AI Services through Supplier's Declarations of Conformity
Authors:
Matthew Arnold,
Rachel K. E. Bellamy,
Michael Hind,
Stephanie Houde,
Sameep Mehta,
Aleksandra Mojsilovic,
Ravi Nair,
Karthikeyan Natesan Ramamurthy,
Darrell Reimer,
Alexandra Olteanu,
David Piorkowski,
Jason Tsay,
Kush R. Varshney
Abstract:
Accuracy is an important concern for suppliers of artificial intelligence (AI) services, but considerations beyond accuracy, such as safety (which includes fairness and explainability), security, and provenance, are also critical elements to engender consumers' trust in a service. Many industries use transparent, standardized, but often not legally required documents called supplier's declarations of conformity (SDoCs) to describe the lineage of a product along with the safety and performance testing it has undergone. SDoCs may be considered multi-dimensional fact sheets that capture and quantify various aspects of the product and its development to make it worthy of consumers' trust. Inspired by this practice, we propose FactSheets to help increase trust in AI services. We envision such documents to contain purpose, performance, safety, security, and provenance information to be completed by AI service providers for examination by consumers. We suggest a comprehensive set of declaration items tailored to AI and provide examples for two fictitious AI services in the appendix of the paper.
Submitted 7 February, 2019; v1 submitted 22 August, 2018;
originally announced August 2018.