-
Are We Done with MMLU?
Authors:
Aryo Pradipta Gema,
Joshua Ong Jun Leang,
Giwon Hong,
Alessio Devoto,
Alberto Carlo Maria Mancino,
Rohit Saxena,
Xuanli He,
Yu Zhao,
Xiaotang Du,
Mohammad Reza Ghasemi Madani,
Claire Barale,
Robert McHardy,
Joshua Harris,
Jean Kaddour,
Emile van Krieken,
Pasquale Minervini
Abstract:
Maybe not. We identify and analyse errors in the popular Massive Multitask Language Understanding (MMLU) benchmark. Even though MMLU is widely adopted, our analysis demonstrates numerous ground truth errors that obscure the true capabilities of LLMs. For example, we find that 57% of the analysed questions in the Virology subset contain errors. To address this issue, we introduce a comprehensive framework for identifying dataset errors using a novel error taxonomy. Then, we create MMLU-Redux, which is a subset of 3,000 manually re-annotated questions across 30 MMLU subjects. Using MMLU-Redux, we demonstrate significant discrepancies with the model performance metrics that were originally reported. Our results strongly advocate for revising MMLU's error-ridden questions to enhance its future utility and reliability as a benchmark. Therefore, we open up MMLU-Redux for additional annotation at https://huggingface.co/datasets/edinburgh-dawg/mmlu-redux.
Submitted 7 June, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
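To illustrate how re-annotation can shift a reported score, here is a minimal sketch that scores one model's predictions against the original answer keys and against re-annotated keys. It assumes each question carries the original answer, an error flag, and a corrected answer where one exists; the record type and field names are hypothetical and are not the MMLU-Redux schema, and dropping flagged items that have no single correct answer is an assumption of this sketch, not necessarily the paper's protocol.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Question:
    # Hypothetical record for one re-annotated question (not the real dataset schema).
    original_answer: str             # answer key shipped with MMLU
    has_error: bool                  # annotator flagged the item as erroneous
    corrected_answer: Optional[str]  # None if no single correct option exists


def accuracy(preds: List[str], keys: List[str]) -> float:
    """Fraction of predictions that match the reference keys."""
    pairs = list(zip(preds, keys))
    return sum(p == k for p, k in pairs) / len(pairs) if pairs else 0.0


def compare_scores(questions: List[Question], predictions: List[str]) -> Tuple[float, float, float]:
    """Return (original accuracy, re-annotated accuracy, fraction of flagged items)."""
    if not questions:
        raise ValueError("no questions to score")
    original = accuracy(predictions, [q.original_answer for q in questions])
    # Assumption: keep error-free items plus flagged items that have a corrected key.
    kept = [(p, q.corrected_answer if q.has_error else q.original_answer)
            for p, q in zip(predictions, questions)
            if not q.has_error or q.corrected_answer is not None]
    redux = accuracy([p for p, _ in kept], [k for _, k in kept])
    error_rate = sum(q.has_error for q in questions) / len(questions)
    return original, redux, error_rate
```

With four questions where one key is wrong and one item has no valid answer, a model can score 0.5 against the original keys but about 0.67 against the re-annotated ones, which is the kind of discrepancy the abstract refers to.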
-
Evaluating Large Language Models for Public Health Classification and Extraction Tasks
Authors:
Joshua Harris,
Timothy Laurence,
Leo Loman,
Fan Grayson,
Toby Nonnenmacher,
Harry Long,
Loes WalsGriffith,
Amy Douglas,
Holly Fountain,
Stelios Georgiou,
Jo Hardstaff,
Kathryn Hopkins,
Y-Ling Chi,
Galena Kuyumdzhieva,
Lesley Larkin,
Samuel Collins,
Hamish Mohammed,
Thomas Finnie,
Luke Hounsome,
Steven Riley
Abstract:
Advances in Large Language Models (LLMs) have led to significant interest in their potential to support human experts across a range of domains, including public health. In this work we present automated evaluations of LLMs for public health tasks involving the classification and extraction of free text. We combine six externally annotated datasets with seven new internally annotated datasets to evaluate LLMs for processing text related to: health burden, epidemiological risk factors, and public health interventions. We initially evaluate five open-weight LLMs (7-70 billion parameters) across all tasks using zero-shot in-context learning. We find that Llama-3-70B-Instruct is the highest performing model, achieving the best results on 15/17 tasks (using micro-F1 scores). We see significant variation across tasks with all open-weight LLMs scoring below 60% micro-F1 on some challenging tasks, such as Contact Classification, while all LLMs achieve greater than 80% micro-F1 on others, such as GI Illness Classification. For a subset of 12 tasks, we also evaluate GPT-4 and find comparable results to Llama-3-70B-Instruct, which scores equal to or better than GPT-4 on 6 of the 12 tasks. Overall, based on these initial results we find promising signs that LLMs may be useful tools for public health experts to extract information from a wide variety of free text sources, and support public health surveillance, research, and interventions.
Submitted 23 May, 2024;
originally announced May 2024.
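The results above are reported as micro-F1. A minimal sketch of that metric follows, assuming each example's gold and predicted outputs can be represented as sets of labels; the paper's exact scoring rules for extraction tasks (for example, string normalisation of extracted spans) are not given in the abstract and are not modelled here.

```python
from typing import Iterable, Sequence


def micro_f1(gold: Sequence[Iterable[str]], pred: Sequence[Iterable[str]]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives over all labels."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g, p = set(g), set(p)
        tp += len(g & p)  # labels predicted and present in gold
        fp += len(p - g)  # labels predicted but absent from gold
        fn += len(g - p)  # gold labels the model missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, gold sets [{"fever"}, {"cough", "rash"}, set()] against predictions [{"fever"}, {"cough"}, {"rash"}] give 2 true positives, 1 false positive and 1 false negative, so micro-F1 is 2/3. When every example has exactly one gold and one predicted label, micro-F1 reduces to plain accuracy.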
-
Which Artificial Intelligences Do People Care About Most? A Conjoint Experiment on Moral Consideration
Authors:
Ali Ladak,
Jamie Harris,
Jacy Reese Anthis
Abstract:
Many studies have identified particular features of artificial intelligences (AI), such as their autonomy and emotion expression, that affect the extent to which they are treated as subjects of moral consideration. However, there has not yet been a comparison of the relative importance of features as is necessary to design and understand increasingly capable, multi-faceted AI systems. We conducted an online conjoint experiment in which 1,163 participants evaluated descriptions of AIs that varied on these features. All 11 features increased how morally wrong participants considered it to harm the AIs. The largest effects were from human-like physical bodies and prosociality (i.e., emotion expression, emotion recognition, cooperation, and moral judgment). For human-computer interaction designers, the importance of prosociality suggests that, because AIs are often seen as threatening, the highest levels of moral consideration may only be granted if the AI has positive intentions.
Submitted 14 March, 2024;
originally announced March 2024.
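The abstract does not state which estimator the study used. A common estimand for conjoint experiments with independently randomised attributes is the average marginal component effect (AMCE), which for a binary attribute can be estimated as a difference in mean outcomes. The sketch below assumes hypothetical present/absent (1/0) feature coding and a numeric wrongness rating per profile, and it ignores respondent-level clustering of standard errors.

```python
from statistics import mean
from typing import Dict, List


def amce(profiles: List[Dict[str, int]], ratings: List[float], feature: str) -> float:
    """Difference in mean rating between profiles with (1) and without (0) a binary feature.

    Valid as an AMCE estimate only if the feature was randomised independently
    of the other attributes (assumption of this sketch).
    """
    with_f = [r for p, r in zip(profiles, ratings) if p[feature] == 1]
    without_f = [r for p, r in zip(profiles, ratings) if p[feature] == 0]
    if not with_f or not without_f:
        raise ValueError("feature must take both values in the sample")
    return mean(with_f) - mean(without_f)
```

In practice the same quantity is usually obtained from a regression of ratings on attribute indicators, which also makes it straightforward to estimate all features at once.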
-
Challenges and Applications of Large Language Models
Authors:
Jean Kaddour,
Joshua Harris,
Maximilian Mozes,
Herbie Bradley,
Roberta Raileanu,
Robert McHardy
Abstract:
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify the remaining challenges and already fruitful application areas. In this paper, we aim to establish a systematic set of open problems and application successes so that ML researchers can comprehend the field's current state more quickly and become productive.
Submitted 19 July, 2023;
originally announced July 2023.
-
Perception, performance, and detectability of conversational artificial intelligence across 32 university courses
Authors:
Hazem Ibrahim,
Fengyuan Liu,
Rohail Asim,
Balaraju Battu,
Sidahmed Benabderrahmane,
Bashar Alhafni,
Wifag Adnan,
Tuka Alhanai,
Bedoor AlShebli,
Riyadh Baghdadi,
Jocelyn J. Bélanger,
Elena Beretta,
Kemal Celik,
Moumena Chaqfeh,
Mohammed F. Daqaq,
Zaynab El Bernoussi,
Daryl Fougnie,
Borja Garcia de Soto,
Alberto Gandolfi,
Andras Gyorgy,
Nizar Habash,
J. Andrew Harris,
Aaron Kaufman,
Lefteris Kirousis,
Korhan Kocak
, et al. (14 additional authors not shown)
Abstract:
The emergence of large language models has led to the development of powerful tools such as ChatGPT that can produce text indistinguishable from human-generated work. With the increasing accessibility of such technology, students across the globe may utilize it to help with their school work -- a possibility that has sparked discussions on the integrity of student evaluations in the age of artificial intelligence (AI). To date, it is unclear how such tools perform compared to students on university-level courses. Further, students' perspectives regarding the use of such tools, and educators' perspectives on treating their use as plagiarism, remain unknown. Here, we compare the performance of ChatGPT against students on 32 university-level courses. We also assess the degree to which its use can be detected by two classifiers designed specifically for this purpose. Additionally, we conduct a survey across five countries, as well as a more in-depth survey at the authors' institution, to discern students' and educators' perceptions of ChatGPT's use. We find that ChatGPT's performance is comparable, if not superior, to that of students in many courses. Moreover, current AI-text classifiers cannot reliably detect ChatGPT's use in school work, due to their propensity to classify human-written answers as AI-generated, as well as the ease with which AI-generated text can be edited to evade detection. Finally, we find an emerging consensus among students to use the tool, and among educators to treat this as plagiarism. Our findings offer insights that could guide policy discussions addressing the integration of AI into educational frameworks.
Submitted 7 May, 2023;
originally announced May 2023.
-
GPT-4 Technical Report
Authors:
OpenAI,
Josh Achiam,
Steven Adler,
Sandhini Agarwal,
Lama Ahmad,
Ilge Akkaya,
Florencia Leoni Aleman,
Diogo Almeida,
Janko Altenschmidt,
Sam Altman,
Shyamal Anadkat,
Red Avila,
Igor Babuschkin,
Suchir Balaji,
Valerie Balcom,
Paul Baltescu,
Haiming Bao,
Mohammad Bavarian,
Jeff Belgum,
Irwan Bello,
Jake Berdine,
Gabriel Bernadett-Shapiro,
Christopher Berner,
Lenny Bogdonoff,
Oleg Boiko
, et al. (256 additional authors not shown)
Abstract:
We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures of factuality and adherence to desired behavior. A core component of this project was developing infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to accurately predict some aspects of GPT-4's performance based on models trained with no more than 1/1,000th the compute of GPT-4.
Submitted 4 March, 2024; v1 submitted 15 March, 2023;
originally announced March 2023.
-
Application Experiences on a GPU-Accelerated Arm-based HPC Testbed
Authors:
Wael Elwasif,
William Godoy,
Nick Hagerty,
J. Austin Harris,
Oscar Hernandez,
Balint Joo,
Paul Kent,
Damien Lebrun-Grandie,
Elijah Maccarthy,
Veronica G. Melesse Vergara,
Bronson Messer,
Ross Miller,
Sarp Opal,
Sergei Bastrakov,
Michael Bussmann,
Alexander Debus,
Klaus Steinger,
Jan Stephan,
Rene Widera,
Spencer H. Bryngelson,
Henry Le Berre,
Anand Radhakrishnan,
Jefferey Young,
Sunita Chandrasekaran,
Florina Ciorba
, et al. (6 additional authors not shown)
Abstract:
This paper assesses and reports the experience of ten teams working to port, validate, and benchmark several High Performance Computing applications on a novel GPU-accelerated Arm testbed system. The testbed consists of eight NVIDIA Arm HPC Developer Kit systems built by GIGABYTE, each one equipped with a server-class Arm CPU from Ampere Computing and an A100 data center GPU from NVIDIA Corp. The systems are connected together using a high-bandwidth, low-latency InfiniBand interconnect. The selected applications and mini-apps are written using several programming languages and use multiple accelerator-based programming models for GPUs such as CUDA, OpenACC, and OpenMP offloading. Working on application porting requires a robust and easy-to-access programming environment, including a variety of compilers and optimized scientific libraries. The goal of this work is to evaluate platform readiness and assess the effort required from developers to deploy well-established scientific workloads on current and future generation Arm-based GPU-accelerated HPC systems. The reported case studies demonstrate that the current level of maturity and diversity of software and tools is already adequate for large-scale production deployments.
Submitted 19 December, 2022; v1 submitted 20 September, 2022;
originally announced September 2022.
-
Web3 Challenges and Opportunities for the Market
Authors:
Dan Sheridan,
James Harris,
Frank Wear,
Jerry Cowell Jr,
Easton Wong,
Abbas Yazdinejad
Abstract:
The inability of a computer to think has been a limiter in its usefulness and a point of reassurance for humanity since the first computers were created. The semantic web is the first step toward removing that barrier, enabling computers to operate based on conceptual understanding, and AI and ML are the second. Both semantic knowledge and the ability to learn are fundamental to Web3, as are blockchain, decentralization, transactional transparency, and ownership. Web3 is the next generational step in the information age, where the web evolves into a more digestible medium for users and machines to browse knowledge. The slow introduction of Web3 across the global software ecosystem will impact the people who enable the current iteration. This evolution of the internet space will expand the way knowledge is shared, consumed, and owned, which will lessen the requirement for a global standard and allow data to interact efficiently, no matter the construction of the knowledge. At its heart, this paper seeks to understand: 1) the enablement of Web3 across the digital ecosystem; 2) what a Web3 developer will look like; and 3) how this alteration will evolve the market around software and knowledge in general.
Submitted 6 September, 2022;
originally announced September 2022.
-
Flash-X, a multiphysics simulation software instrument
Authors:
Anshu Dubey,
Klaus Weide,
Jared O'Neal,
Akash Dhruv,
Sean Couch,
J. Austin Harris,
Tom Klosterman,
Rajeev Jain,
Johann Rudi,
Bronson Messer,
Michael Pajkos,
Jared Carlson,
Ran Chu,
Mohamed Wahib,
Saurabh Chawdhary,
Paul M. Ricker,
Dongwook Lee,
Katie Antypas,
Katherine M. Riley,
Christopher Daley,
Murali Ganapathy,
Francis X. Timmes,
Dean M. Townsley,
Marcos Vanella,
John Bachan
, et al. (6 additional authors not shown)
Abstract:
Flash-X is a highly composable multiphysics software system that can be used to simulate physical phenomena in several scientific domains. It derives some of its solvers from FLASH, which was first released in 2000. Flash-X has a new framework that relies on abstractions and asynchronous communications for performance portability across a range of increasingly heterogeneous hardware platforms. Flash-X is meant primarily for solving Eulerian formulations of applications with compressible and/or incompressible reactive flows. It also has a built-in, versatile Lagrangian framework that can be used in many different ways, including implementing tracers, particle-in-cell simulations, and immersed boundary methods.
Submitted 24 August, 2022;
originally announced August 2022.
-
The History of AI Rights Research
Authors:
Jamie Harris
Abstract:
This report documents the history of research on AI rights and other moral consideration of artificial entities. It highlights key intellectual influences on this literature as well as research and academic discussion addressing the topic more directly. We find that researchers addressing AI rights have often seemed to be unaware of the work of colleagues whose interests overlap with their own. Academic interest in this topic has grown substantially in recent years; this reflects wider trends in academic research, but it seems that certain influential publications, the gradual, accumulating ubiquity of AI and robotic technology, and relevant news events may all have encouraged increased academic interest in this specific topic. We suggest four levers that, if pulled on in the future, might increase interest further: the adoption of publication strategies similar to those of the most successful previous contributors; increased engagement with adjacent academic fields and debates; the creation of specialized journals, conferences, and research institutions; and more exploration of legal rights for artificial entities.
Submitted 27 August, 2022; v1 submitted 6 July, 2022;
originally announced August 2022.
-
Towards Neural Numeric-To-Text Generation From Temporal Personal Health Data
Authors:
Jonathan Harris,
Mohammed J. Zaki
Abstract:
With an increased interest in the production of personal health technologies designed to track user data (e.g., nutrient intake, step counts), there is now more opportunity than ever to surface meaningful behavioral insights to everyday users in the form of natural language. This knowledge can increase their behavioral awareness and allow them to take action to meet their health goals. It can also bridge the gap between the vast collection of personal health data and the summary generation required to describe an individual's behavioral tendencies. Previous work has focused on rule-based time-series data summarization methods designed to generate natural language summaries of interesting patterns found within temporal personal health data. We examine recurrent, convolutional, and Transformer-based encoder-decoder models to automatically generate natural language summaries from numeric temporal personal health data. We showcase the effectiveness of our models on real user health data logged in MyFitnessPal and show that we can automatically generate high-quality natural language summaries. Our work serves as a first step towards the ambitious goal of automatically generating novel and meaningful temporal summaries from personal health data.
Submitted 11 July, 2022;
originally announced July 2022.
-
FC$^3$: Feasibility-Based Control Chain Coordination
Authors:
Jason Harris,
Danny Driess,
Marc Toussaint
Abstract:
Hierarchical coordination of controllers often uses symbolic state representations that fully abstract their underlying low-level controllers, treating them as "black boxes" to the symbolic action abstraction. This paper proposes a framework to realize robust behavior, which we call Feasibility-based Control Chain Coordination (FC$^3$). Our controllers expose the geometric features and constraints they operate on. Based on this, FC$^3$ can reason over the controllers' feasibility and their sequence feasibility. For a given task, FC$^3$ first automatically constructs a library of potential controller chains using a symbolic action tree, which is then used to coordinate controllers in a chain, evaluate task feasibility, and switch between controller chains if necessary. In several real-world experiments we demonstrate FC$^3$'s robustness and awareness of the task's feasibility through its own actions and gradual responses to different interferences.
Submitted 9 May, 2022;
originally announced May 2022.
-
Sequence-of-Constraints MPC: Reactive Timing-Optimal Control of Sequential Manipulation
Authors:
Marc Toussaint,
Jason Harris,
Jung-Su Ha,
Danny Driess,
Wolfgang Hönig
Abstract:
Task and Motion Planning has made great progress in solving hard sequential manipulation problems. However, a gap between such planning formulations and control methods for reactive execution remains. In this paper we propose a model predictive control approach dedicated to robustly executing a single sequence of constraints, which corresponds to a discrete decision sequence of a TAMP plan. We decompose the overall control problem into three sub-problems (solving for sequential waypoints, their timing, and a short receding horizon path), each of which is a non-linear program solved online in each MPC cycle. The resulting control strategy can account for long-term interdependencies of constraints and reactively plan for a timing-optimal transition through all constraints. We additionally propose phase backtracking when running constraints of the current phase cannot be fulfilled, leading to a fluent re-initiation behavior that is robust to perturbations and interferences by an experimenter.
Submitted 22 September, 2022; v1 submitted 10 March, 2022;
originally announced March 2022.
-
RBO Hand 3 -- A Platform for Soft Dexterous Manipulation
Authors:
Steffen Puhlmann,
Jason Harris,
Oliver Brock
Abstract:
We present the RBO Hand 3, a highly capable and versatile anthropomorphic soft hand based on pneumatic actuation. The RBO Hand 3 is designed to enable dexterous manipulation, to facilitate transfer of insights about human dexterity, and to serve as a robust research platform for extensive real-world experiments. It achieves these design goals by combining many degrees of actuation with intrinsic compliance, replicating relevant functioning of the human hand, and by combining robust components in a modular design. The RBO Hand 3 possesses 16 independent degrees of actuation, implemented in a dexterous opposable thumb, two-chambered fingers, an actuated palm, and the ability to spread the fingers. In this work, we derive the design objectives that are based on experimentation with the hand's predecessors, observations about human grasping, and insights about principles of dexterity. We explain in detail how the design features of the RBO Hand 3 achieve these goals and evaluate the hand by demonstrating its ability to achieve the highest possible score in the Kapandji test for thumb opposition, to realize all 33 grasp types of the comprehensive GRASP taxonomy, to replicate common human grasping strategies, and to perform dexterous in-hand manipulation.
Submitted 26 January, 2022;
originally announced January 2022.
-
Personal Health Knowledge Graph for Clinically Relevant Diet Recommendations
Authors:
Oshani Seneviratne,
Jonathan Harris,
Ching-Hua Chen,
Deborah L. McGuinness
Abstract:
We propose a knowledge model for capturing dietary preferences and personal context to provide personalized dietary recommendations. We develop a knowledge model called the Personal Health Ontology, which is grounded in semantic technologies, and represents a patient's combined medical information, social determinants of health, and observations of daily living elicited from interviews with diabetic patients. We then generate a personal health knowledge graph that captures temporal patterns from synthetic food logs, annotated with concepts from the Personal Health Ontology. We further discuss how lifestyle guidelines grounded in semantic technologies can be reasoned with the generated personal health knowledge graph to provide appropriate dietary recommendations that satisfy the user's medical and other lifestyle needs.
Submitted 19 October, 2021;
originally announced October 2021.
-
The Moral Consideration of Artificial Entities: A Literature Review
Authors:
Jamie Harris,
Jacy Reese Anthis
Abstract:
Ethicists, policy-makers, and the general public have questioned whether artificial entities such as robots warrant rights or other forms of moral consideration. There is little synthesis of the research on this topic so far. We identify 294 relevant research or discussion items in our literature review of this topic. There is widespread agreement among scholars that some artificial entities could warrant moral consideration in the future, if not also the present. The reasoning varies, such as concern for the effects on artificial entities and concern for the effects on human society. Beyond the conventional consequentialist, deontological, and virtue ethicist ethical frameworks, some scholars encourage "information ethics" and "social-relational" approaches, though there are opportunities for more in-depth ethical research on the nuances of moral consideration of artificial entities. There is limited relevant empirical data collection, primarily in a few psychological studies on current moral and social attitudes of humans towards robots and other artificial entities. This suggests an important gap for social science research on how artificial entities will be integrated into society and the factors that will determine how the interests of sentient artificial entities are considered.
Submitted 26 January, 2021;
originally announced February 2021.
-
Analysis of Models for Decentralized and Collaborative AI on Blockchain
Authors:
Justin D. Harris
Abstract:
Machine learning has recently enabled large advances in artificial intelligence, but these results can be highly centralized. The large datasets required are generally proprietary; predictions are often sold on a per-query basis; and published models can quickly become out of date without effort to acquire more data and maintain them. Published proposals to provide models and data for free for certain tasks include Microsoft Research's Decentralized and Collaborative AI on Blockchain. The framework allows participants to collaboratively build a dataset and use smart contracts to share a continuously updated model on a public blockchain. The initial proposal gave an overview of the framework omitting many details of the models used and the incentive mechanisms in real world scenarios. In this work, we evaluate the use of several models and configurations in order to propose best practices when using the Self-Assessment incentive mechanism so that models can remain accurate and well-intended participants that submit correct data have the chance to profit. We have analyzed simulations for each of three models: Perceptron, Naïve Bayes, and a Nearest Centroid Classifier, with three different datasets: predicting a sport with user activity from Endomondo, sentiment analysis on movie reviews from IMDB, and determining if a news article is fake. We compare several factors for each dataset when models are hosted in smart contracts on a public blockchain: their accuracy over time, balances of a good and bad user, and transaction costs (or gas) for deploying, updating, collecting refunds, and collecting rewards. A free and open source implementation for the Ethereum blockchain and simulations written in Python is provided at https://github.com/microsoft/0xDeCA10B. This version has updated gas costs using newer optimizations written after the original publication.
Submitted 21 September, 2020; v1 submitted 14 September, 2020;
originally announced September 2020.
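One reason a Nearest Centroid Classifier suits a shared, continuously updated model is that each contributed sample changes only one class centroid, via an O(d) running-mean update. The sketch below illustrates that idea in floating-point Python under that assumption; it is not the Solidity contract in the linked repository, whose arithmetic and data layout are not reproduced here, and the class and method names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


class NearestCentroidClassifier:
    """Online nearest-centroid classifier: one running mean vector per class."""

    def __init__(self, dim: int):
        self.dim = dim
        self.centroids: Dict[str, List[float]] = {}
        self.counts: Dict[str, int] = {}

    def update(self, x: Sequence[float], label: str) -> None:
        """Fold one contributed sample into its class centroid (incremental mean)."""
        if len(x) != self.dim:
            raise ValueError("feature vector has wrong dimension")
        if label not in self.centroids:
            self.centroids[label] = [0.0] * self.dim
            self.counts[label] = 0
        self.counts[label] += 1
        n = self.counts[label]
        c = self.centroids[label]
        for i, xi in enumerate(x):
            c[i] += (xi - c[i]) / n  # mean_n = mean_{n-1} + (x - mean_{n-1}) / n

    def predict(self, x: Sequence[float]) -> str:
        """Return the label whose centroid is closest in Euclidean distance."""
        if not self.centroids:
            raise ValueError("model has no data yet")
        return min(self.centroids, key=lambda lbl: math.dist(x, self.centroids[lbl]))
```

Because an update never revisits earlier samples, the per-contribution cost stays constant as the dataset grows, which matters when every update is paid for as a transaction.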
-
Flood & Loot: A Systemic Attack On The Lightning Network
Authors:
Jona Harris,
Aviv Zohar
Abstract:
The Lightning Network promises to alleviate Bitcoin's known scalability problems. The operation of such second-layer approaches relies on the ability of participants to turn to the blockchain to claim funds at any time, which is assumed to happen rarely. One of the risks identified early on is a wide systemic attack on the protocol, in which an attacker triggers the closure of many Lightning channels at once. The resulting high volume of transactions will not allow all debts to be settled properly on the blockchain, and attackers may get away with stealing some funds. This paper explores the details of such an attack and evaluates its cost and overall impact on Bitcoin and the Lightning Network. Specifically, we show that an attacker is able to simultaneously cause victim nodes to overload the Bitcoin blockchain with requests and to steal funds that were locked in channels. We go on to examine the interaction of Lightning nodes with the fee estimation mechanism and show that the attacker can continuously lower the fee of transactions that the victim will later use in its attempts to recover funds, eventually reaching a state in which only a small fraction of each block is available for Lightning transactions. The attack is made even easier because the Lightning protocol allows the attacker to raise the fees offered by their own transactions. We further show empirically that the vast majority of nodes accept channel opening requests from unknown sources and are therefore susceptible to this attack. We highlight differences between various implementations of the Lightning Network protocol and review the susceptibility of each one to the attack. Finally, we propose mitigation strategies to lower the systemic attack risk of the network.
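The core capacity argument can be sketched as back-of-the-envelope arithmetic. Claims that cannot be confirmed before their HTLCs expire are the ones exposed to theft. This is a hypothetical estimator, not the paper's model. The transaction size, expiry window, and usable block fraction below are illustrative assumptions, not measured values. The only fixed fact used is Bitcoin's block limit of 4,000,000 weight units, which equals 1,000,000 virtual bytes.

```python
def unconfirmed_claims(num_claims: int, tx_vbytes: int, blocks_until_expiry: int,
                       usable_fraction: float, block_vbytes: int = 1_000_000) -> int:
    """Hypothetical estimate of claim transactions that miss their HTLC deadline.

    usable_fraction models the share of each block the victims' low-fee
    transactions can actually obtain (the attacker drives this down by
    lowering their fees and raising his own).
    """
    per_block = int(block_vbytes * usable_fraction) // tx_vbytes  # claims confirmed per block
    capacity = per_block * blocks_until_expiry                    # claims confirmed before expiry
    return max(0, num_claims - capacity)


# Illustrative numbers only: 10,000 claims of ~700 vbytes, 20 blocks, 10% of block space.
print(unconfirmed_claims(10_000, 700, 20, 0.1))  # 7160
```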
Submitted 27 August, 2020; v1 submitted 15 June, 2020;
originally announced June 2020.
-
A Framework for Generating Explanations from Temporal Personal Health Data
Authors:
Jonathan J. Harris,
Ching-Hua Chen,
Mohammed J. Zaki
Abstract:
Whereas it has become easier for individuals to track their personal health data (e.g., heart rate, step count, food log), there is still a wide chasm between the collection of data and the generation of meaningful explanations to help users better understand what their data means to them. With an increased comprehension of their data, users will be able to act upon the newfound information and work toward their health goals. We aim to bridge the gap between data collection and explanation generation by mining the data for interesting behavioral findings that may provide hints about a user's tendencies. Our focus is on improving the explainability of temporal personal health data via a set of informative summary templates, or "protoforms." These protoforms span both evaluation-based summaries that help users evaluate their health goals and pattern-based summaries that explain their implicit behaviors. Beyond individual users, the protoforms are also designed to produce population-level summaries. We apply our approach to generate summaries (both univariate and multivariate) from real user data and show that our system can generate interesting and useful explanations.
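To make "protoform" concrete, here is a minimal sketch of the classic Zadeh/Yager-style linguistic summary "Q y's are P". Its truth value is T = μ_Q((1/n) Σ μ_P(y_i)). The paper's own protoforms and membership functions may differ. The step-count thresholds below are hypothetical. The ramp used for "most" (0 below 0.3, 1 above 0.8) is a common textbook choice.

```python
def ramp(x: float, lo: float, hi: float) -> float:
    """Piecewise-linear membership: 0 at or below lo, 1 at or above hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def truth_of_summary(values, mu_summarizer, mu_quantifier) -> float:
    # Average degree to which each observation satisfies the summarizer P,
    # then evaluate the quantifier Q on that proportion.
    r = sum(mu_summarizer(v) for v in values) / len(values)
    return mu_quantifier(r)


steps = [12000, 9000, 4000, 11000, 10500, 7500, 3000]  # one week of (made-up) step counts
high = lambda s: ramp(s, 5000, 10000)                  # hypothetical "high step count"
most = lambda r: ramp(r, 0.3, 0.8)                     # quantifier "most"

t = truth_of_summary(steps, high, most)
print(f"On most days, your step count was high (truth {t:.2f})")  # truth 0.63
```

A summary generator would evaluate many such candidate sentences (different quantifiers, summarizers, and time windows) and report only those with high truth values.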
Submitted 9 March, 2021; v1 submitted 20 March, 2020;
originally announced March 2020.
-
Improving Neural Question Generation using World Knowledge
Authors:
Deepak Gupta,
Kaheer Suleman,
Mahmoud Adada,
Andrew McNamara,
Justin Harris
Abstract:
In this paper, we propose a method for incorporating world knowledge (linked entities and fine-grained entity types) into a neural question generation model. This world knowledge helps encode additional information about the entities in the passage that is needed to generate human-like questions. We evaluate our models on both SQuAD and MS MARCO to demonstrate the usefulness of the world knowledge features. The proposed world-knowledge-enriched question generation model outperforms the vanilla neural question generation model by 1.37 and 1.59 absolute BLEU-4 points on the SQuAD and MS MARCO test sets, respectively.
Submitted 10 September, 2019; v1 submitted 9 September, 2019;
originally announced September 2019.
-
Decentralized & Collaborative AI on Blockchain
Authors:
Justin D. Harris,
Bo Waggoner
Abstract:
Machine learning has recently enabled large advances in artificial intelligence, but these tend to be highly centralized. The large datasets required are generally proprietary; predictions are often sold on a per-query basis; and published models can quickly become out of date without effort to acquire more data and re-train them. We propose a framework for participants to collaboratively build a dataset and use smart contracts to host a continuously updated model. This model will be shared publicly on a blockchain where it can be free to use for inference. Ideal learning problems include scenarios where a model is used many times for similar input such as personal assistants, playing games, recommender systems, etc. In order to maintain the model's accuracy with respect to some test set we propose both financial and non-financial (gamified) incentive structures for providing good data. A free and open source implementation for the Ethereum blockchain is provided at https://github.com/microsoft/0xDeCA10B.
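The financial incentive can be illustrated with a simplified, off-chain Python sketch of a deposit/refund/report scheme in the spirit of Self-Assessment. It rests on these assumptions:

* Contributors stake a deposit with each sample.
* After a waiting period, they can reclaim it only if the model still agrees with their label.
* A contributor who has earned at least one refund can report a disagreeing sample and take its deposit.

All names are hypothetical. Model training is omitted, since the current classifier is simply passed in as `predict`. The mechanism in the paper and repository has more detail than this, for example in how a reported deposit is divided among reporters.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Contribution:
    contributor: str
    x: tuple
    label: int
    deposit: int
    submitted_at: int
    claimed: bool = False


class SelfAssessmentSketch:
    """Hypothetical, simplified deposit/refund/report bookkeeping."""

    def __init__(self, predict: Callable[[tuple], int], wait_time: int) -> None:
        self.predict = predict          # current model (assumed to be trained elsewhere)
        self.wait_time = wait_time      # how long data must survive before a refund
        self.contributions: list[Contribution] = []
        self.verified: set[str] = set() # contributors with at least one refund

    def submit(self, who: str, x: tuple, label: int, deposit: int, now: int) -> int:
        self.contributions.append(Contribution(who, x, label, deposit, now))
        return len(self.contributions) - 1

    def refund(self, idx: int, who: str, now: int) -> int:
        c = self.contributions[idx]
        if c.claimed or c.contributor != who or now - c.submitted_at < self.wait_time:
            return 0
        if self.predict(c.x) != c.label:  # model disagrees: no refund
            return 0
        c.claimed = True
        self.verified.add(who)
        return c.deposit

    def report(self, idx: int, reporter: str, now: int) -> int:
        c = self.contributions[idx]
        if c.claimed or reporter not in self.verified or reporter == c.contributor:
            return 0
        if now - c.submitted_at < self.wait_time or self.predict(c.x) == c.label:
            return 0
        c.claimed = True                  # reporter takes the bad contributor's deposit
        return c.deposit


model = lambda x: 1 if x[0] > 0 else 0
market = SelfAssessmentSketch(model, wait_time=10)
good = market.submit("alice", (1.0,), 1, deposit=5, now=0)
bad = market.submit("mallory", (1.0,), 0, deposit=5, now=0)
print(market.refund(good, "alice", now=10), market.report(bad, "alice", now=10))  # 5 5
```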
Submitted 16 July, 2019;
originally announced July 2019.
-
Open Chemistry: RESTful Web APIs, JSON, NWChem and the Modern Web Application
Authors:
Marcus D. Hanwell,
Wibe A. de Jong,
Christopher J. Harris
Abstract:
An end-to-end platform for chemical science research has been developed that integrates data from computational and experimental approaches through a modern web-based interface. The platform offers a highly interactive visualization and analytics environment that functions well on mobile, laptop and desktop devices. It offers pragmatic solutions to ensure that large and complex data sets are more accessible. Existing desktop applications/frameworks were extended to integrate with high-performance computing (HPC) resources and to offer command-line tools that automate interaction, connecting distributed teams to this software platform on their own terms. The platform was developed openly, with all source code hosted on GitHub and automated deployment possible using Ansible coupled with standard Ubuntu-based machine images deployed to cloud machines.
The platform is designed to enable teams to reap the benefits of the connected web, going beyond what conventional search and analytics platforms offer in this area. It also aims to support federated instances that can be customized to the sites and research performed. Data is stored as JSON, extending earlier approaches that used XML, in structures built to support computational chemistry calculations. These structures were designed to make data easy to process across different languages and to send to a JavaScript web client.
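To show why JSON helps here, the following Python sketch serializes a small molecule record and parses it back. The same text could be consumed unchanged by a JavaScript client via `JSON.parse`. The layout is illustrative and only loosely modeled on a chemical JSON style. The field names are assumptions, not the platform's documented schema. The water geometry (in Å) and calculation settings are example values.

```python
import json

# Hypothetical record layout: element numbers plus a flat x, y, z coordinate list,
# with calculation metadata alongside. Field names are illustrative only.
water = {
    "name": "water",
    "atoms": {
        "elements": {"number": [8, 1, 1]},
        "coords": {"3d": [0.0, 0.0, 0.1173,
                          0.0, 0.7572, -0.4692,
                          0.0, -0.7572, -0.4692]},
    },
    "calculation": {"code": "NWChem", "theory": "dft", "basis": "6-31G*"},
}

text = json.dumps(water, indent=2)  # language-neutral text for storage or a REST response
restored = json.loads(text)         # any JSON-aware language can read it back

n_atoms = len(restored["atoms"]["elements"]["number"])
print(n_atoms, len(restored["atoms"]["coords"]["3d"]) // 3)  # 3 3
```

Storing coordinates as a flat array keeps the structure compact and maps directly onto typed arrays on the web client.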
Submitted 13 July, 2017;
originally announced July 2017.
-
Frames: A Corpus for Adding Memory to Goal-Oriented Dialogue Systems
Authors:
Layla El Asri,
Hannes Schulz,
Shikhar Sharma,
Jeremie Zumer,
Justin Harris,
Emery Fine,
Rahul Mehrotra,
Kaheer Suleman
Abstract:
This paper presents the Frames dataset (Frames is available at http://datasets.maluuba.com/Frames), a corpus of 1369 human-human dialogues with an average of 15 turns per dialogue. We developed this dataset to study the role of memory in goal-oriented dialogue systems. Based on Frames, we introduce a task called frame tracking, which extends state tracking to a setting where several states are tracked simultaneously. We propose a baseline model for this task. We show that Frames can also be used to study memory in dialogue management and information presentation through natural language generation.
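Frame tracking differs from ordinary state tracking in that earlier states are kept rather than overwritten. The toy Python sketch below shows the idea under a deliberately naive, hypothetical rule: a new frame is opened whenever the user gives a slot value that conflicts with the active frame, and a later reference can switch back to an older frame. The slot names are illustrative. This is not the paper's baseline model, which is learned.

```python
from dataclasses import dataclass, field


@dataclass
class FrameTracker:
    """Hypothetical rule-based tracker that keeps every frame, not just the latest state."""

    frames: list = field(default_factory=lambda: [{}])
    active: int = 0

    def update(self, slots: dict) -> int:
        current = self.frames[self.active]
        conflict = any(k in current and current[k] != v for k, v in slots.items())
        if conflict:
            # Changing a constraint opens a new frame that inherits the other slots.
            self.frames.append({**current, **slots})
            self.active = len(self.frames) - 1
        else:
            current.update(slots)
        return self.active

    def switch_to(self, slots: dict) -> int:
        # Return to the first earlier frame matching the referenced values.
        for i, frame in enumerate(self.frames):
            if all(frame.get(k) == v for k, v in slots.items()):
                self.active = i
                break
        return self.active


t = FrameTracker()
t.update({"dst_city": "Paris", "budget": 1500})  # frame 0
t.update({"dst_city": "Rome"})                   # conflict -> frame 1
print(t.frames, t.switch_to({"dst_city": "Paris"}))
```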
Submitted 13 April, 2017; v1 submitted 31 March, 2017;
originally announced April 2017.
-
NewsQA: A Machine Comprehension Dataset
Authors:
Adam Trischler,
Tong Wang,
Xingdi Yuan,
Justin Harris,
Alessandro Sordoni,
Philip Bachman,
Kaheer Suleman
Abstract:
We present NewsQA, a challenging machine comprehension dataset of over 100,000 human-generated question-answer pairs. Crowdworkers supply questions and answers based on a set of over 10,000 news articles from CNN, with answers consisting of spans of text from the corresponding articles. We collect this dataset through a four-stage process designed to solicit exploratory questions that require reasoning. A thorough analysis confirms that NewsQA demands abilities beyond simple word matching and recognizing textual entailment. We measure human performance on the dataset and compare it to several strong neural models. The performance gap between humans and machines (0.198 in F1) indicates that significant progress can be made on NewsQA through future research. The dataset is freely available at https://datasets.maluuba.com/NewsQA.
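The human-machine gap is reported in F1, which for span answers is usually the token-overlap F1 popularized by SQuAD. Here is a minimal Python sketch, assuming simple lowercase whitespace tokenization. Official evaluation scripts typically also strip punctuation and articles, so NewsQA's exact normalization may differ.

```python
from collections import Counter


def token_f1(prediction: str, ground_truth: str) -> float:
    """Token-overlap F1 between a predicted span and a reference span (simplified)."""
    pred = prediction.lower().split()
    gold = ground_truth.lower().split()
    common = Counter(pred) & Counter(gold)  # multiset intersection of tokens
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


print(token_f1("the White House", "White House"))  # 0.8
```

Averaged over a test set, scores like this fall in [0, 1], which is the scale on which the 0.198 gap is expressed.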
Submitted 7 February, 2017; v1 submitted 29 November, 2016;
originally announced November 2016.