-
A Lightweight Algorithm for Classifying Ex Vivo Tissues Samples
Authors:
Tzu-Hao Li,
Ethan Murphy,
Allaire Doussan,
Ryan Halter,
Kofi Odame
Abstract:
In this paper, we present a novel algorithm for classifying ex vivo tissue that comprises multi-channel bioimpedance analysis and a hardware neural network. When implemented in a mixed-signal 180 nm CMOS process, the classifier has an estimated power budget of 39 mW and an area of 30 mm2. This means that the classifier can be integrated into the tip of a surgical margin assessment probe, for in vi…
▽ More
In this paper, we present a novel algorithm for classifying ex vivo tissue that comprises multi-channel bioimpedance analysis and a hardware neural network. When implemented in a mixed-signal 180 nm CMOS process, the classifier has an estimated power budget of 39 mW and an area of 30 mm2. This means that the classifier can be integrated into the tip of a surgical margin assessment probe, for in vivo use during radical prostatectomy. We tested our classifier on digital phantoms of prostate tissue and also on an animal model of ex vivo bovine tissue. The classifier achieved an accuracy of 90% on the prostate tissue phantoms, and an accuracy of 84% on the animal model.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
Affirmative safety: An approach to risk management for high-risk AI
Authors:
Akash R. Wasil,
Joshua Clymer,
David Krueger,
Emily Dardaman,
Simeon Campos,
Evan R. Murphy
Abstract:
Prominent AI experts have suggested that companies develo** high-risk AI systems should be required to show that such systems are safe before they can be developed or deployed. The goal of this paper is to expand on this idea and explore its implications for risk management. We argue that entities develo** or deploying high-risk AI systems should be required to present evidence of affirmative…
▽ More
Prominent AI experts have suggested that companies develo** high-risk AI systems should be required to show that such systems are safe before they can be developed or deployed. The goal of this paper is to expand on this idea and explore its implications for risk management. We argue that entities develo** or deploying high-risk AI systems should be required to present evidence of affirmative safety: a proactive case that their activities keep risks below acceptable thresholds. We begin the paper by highlighting global security risks from AI that have been acknowledged by AI experts and world governments. Next, we briefly describe principles of risk management from other high-risk fields (e.g., nuclear safety). Then, we propose a risk management approach for advanced AI in which model developers must provide evidence that their activities keep certain risks below regulator-set thresholds. As a first step toward understanding what affirmative safety cases should include, we illustrate how certain kinds of technical evidence and operational evidence can support an affirmative safety case. In the technical section, we discuss behavioral evidence (evidence about model outputs), cognitive evidence (evidence about model internals), and developmental evidence (evidence about the training process). In the operational section, we offer examples of organizational practices that could contribute to affirmative safety cases: information security practices, safety culture, and emergency response capacity. Finally, we briefly compare our approach to the NIST AI Risk Management Framework. Overall, we hope our work contributes to ongoing discussions about national and global security risks posed by AI and regulatory approaches to address these risks.
△ Less
Submitted 14 April, 2024;
originally announced June 2024.
-
On the referential capacity of language models: An internalist rejoinder to Mandelkern & Linzen
Authors:
Giosue Baggio,
Elliot Murphy
Abstract:
In a recent paper, Mandelkern & Linzen (2024) - henceforth M&L - address the question of whether language models' (LMs) words refer. Their argument draws from the externalist tradition in philosophical semantics, which views reference as the capacity of words to "achieve 'word-to-world' connections". In the externalist framework, causally uninterrupted chains of usage, tracing every occurrence of…
▽ More
In a recent paper, Mandelkern & Linzen (2024) - henceforth M&L - address the question of whether language models' (LMs) words refer. Their argument draws from the externalist tradition in philosophical semantics, which views reference as the capacity of words to "achieve 'word-to-world' connections". In the externalist framework, causally uninterrupted chains of usage, tracing every occurrence of a name back to its bearer, guarantee that, for example, 'Peano' refers to the individual Peano (Kripke 1980). This account is externalist both because words pick out referents 'out there' in the world, and because what determines reference are coordinated linguistic actions by members of a community, and not individual mental states. The "central question to ask", for M&L, is whether LMs too belong to human linguistic communities, such that words by LMs may also trace back causally to their bearers. Their answer is a cautious "yes": inputs to LMs are linguistic "forms with particular histories of referential use"; "those histories ground the referents of those forms"; any occurrence of 'Peano' in LM outputs is as causally connected to the individual Peano as any other occurrence of the same proper name in human speech or text; therefore, occurrences of 'Peano' in LM outputs refer to Peano. In this commentary, we first qualify M&L's claim as applying to a narrow class of natural language expressions. Thus qualified, their claim is valid, and we emphasise an additional motivation for that in Section 2. Next, we discuss the actual scope of their claim, and we suggest that the way they formulate it may lead to unwarranted generalisations about reference in LMs. Our critique is likewise applicable to other externalist accounts of LMs (e.g., Lederman & Mahowald 2024; Mollo & Milliere 2023). Lastly, we conclude with a comment on the status of LMs as members of human linguistic communities.
△ Less
Submitted 31 May, 2024;
originally announced June 2024.
-
Benchmark Early and Red Team Often: A Framework for Assessing and Managing Dual-Use Hazards of AI Foundation Models
Authors:
Anthony M. Barrett,
Krystal Jackson,
Evan R. Murphy,
Nada Madkour,
Jessica Newman
Abstract:
A concern about cutting-edge or "frontier" AI foundation models is that an adversary may use the models for preparing chemical, biological, radiological, nuclear, (CBRN), cyber, or other attacks. At least two methods can identify foundation models with potential dual-use capability; each has advantages and disadvantages: A. Open benchmarks (based on openly available questions and answers), which a…
▽ More
A concern about cutting-edge or "frontier" AI foundation models is that an adversary may use the models for preparing chemical, biological, radiological, nuclear, (CBRN), cyber, or other attacks. At least two methods can identify foundation models with potential dual-use capability; each has advantages and disadvantages: A. Open benchmarks (based on openly available questions and answers), which are low-cost but accuracy-limited by the need to omit security-sensitive details; and B. Closed red team evaluations (based on private evaluation by CBRN and cyber experts), which are higher-cost but can achieve higher accuracy by incorporating sensitive details. We propose a research and risk-management approach using a combination of methods including both open benchmarks and closed red team evaluations, in a way that leverages advantages of both methods. We recommend that one or more groups of researchers with sufficient resources and access to a range of near-frontier and frontier foundation models run a set of foundation models through dual-use capability evaluation benchmarks and red team evaluations, then analyze the resulting sets of models' scores on benchmark and red team evaluations to see how correlated those are. If, as we expect, there is substantial correlation between the dual-use potential benchmark scores and the red team evaluation scores, then implications include the following: The open benchmarks should be used frequently during foundation model development as a quick, low-cost measure of a model's dual-use potential; and if a particular model gets a high score on the dual-use potential benchmark, then more in-depth red team assessments of that model's dual-use capability should be performed. We also discuss limitations and mitigations for our approach, e.g., if model developers try to game benchmarks by including a version of benchmark test data in a model's training data.
△ Less
Submitted 15 May, 2024;
originally announced May 2024.
-
A Comparative Investigation of Compositional Syntax and Semantics in DALL-E 2
Authors:
Elliot Murphy,
Jill de Villiers,
Sofia Lucero Morales
Abstract:
In this study we compared how well DALL-E 2 visually represented the meaning of linguistic prompts also given to young children in comprehension tests. Sentences representing fundamental components of grammatical knowledge were selected from assessment tests used with several hundred English-speaking children aged 2-7 years for whom we had collected original item-level data. DALL-E 2 was given the…
▽ More
In this study we compared how well DALL-E 2 visually represented the meaning of linguistic prompts also given to young children in comprehension tests. Sentences representing fundamental components of grammatical knowledge were selected from assessment tests used with several hundred English-speaking children aged 2-7 years for whom we had collected original item-level data. DALL-E 2 was given these prompts five times to generate 20 cartoons per item, for 9 adult judges to score. Results revealed no conditions in which DALL-E 2-generated images that matched the semantic accuracy of children, even at the youngest age (2 years). DALL-E 2 failed to assign the appropriate roles in reversible forms; it failed on negation despite an easier contrastive prompt than the children received; it often assigned the adjective to the wrong noun; it ignored implicit agents in passives. This work points to a clear absence of compositional sentence representations for DALL-E 2.
△ Less
Submitted 18 March, 2024;
originally announced March 2024.
-
What is a word?
Authors:
Elliot Murphy
Abstract:
In order to design strong paradigms for isolating lexical access and semantics, we need to know what a word is. Surprisingly few linguists and philosophers have a clear model of what a word is, even though words impact basically every aspect of human life. Researchers that regularly publish academic papers about language often rely on outdated, or inaccurate, assumptions about wordhood. This short…
▽ More
In order to design strong paradigms for isolating lexical access and semantics, we need to know what a word is. Surprisingly few linguists and philosophers have a clear model of what a word is, even though words impact basically every aspect of human life. Researchers that regularly publish academic papers about language often rely on outdated, or inaccurate, assumptions about wordhood. This short pedagogical document outlines what the lexicon is most certainly not (though is often mistakenly taken to be), what it might be (based on current good theories), and what some implications for experimental design are.
△ Less
Submitted 19 February, 2024;
originally announced February 2024.
-
The Quo Vadis of the Relationship between Language and Large Language Models
Authors:
Evelina Leivada,
Vittoria Dentella,
Elliot Murphy
Abstract:
In the field of Artificial (General) Intelligence (AI), the several recent advancements in Natural language processing (NLP) activities relying on Large Language Models (LLMs) have come to encourage the adoption of LLMs as scientific models of language. While the terminology employed for the characterization of LLMs favors their embracing as such, it is not clear that they are in a place to offer…
▽ More
In the field of Artificial (General) Intelligence (AI), the several recent advancements in Natural language processing (NLP) activities relying on Large Language Models (LLMs) have come to encourage the adoption of LLMs as scientific models of language. While the terminology employed for the characterization of LLMs favors their embracing as such, it is not clear that they are in a place to offer insights into the target system they seek to represent. After identifying the most important theoretical and empirical risks brought about by the adoption of scientific models that lack transparency, we discuss LLMs relating them to every scientific model's fundamental components: the object, the medium, the meaning and the user. We conclude that, at their current stage of development, LLMs hardly offer any explanations for language, and then we provide an outlook for more informative future research directions on this topic.
△ Less
Submitted 17 October, 2023;
originally announced October 2023.
-
Global Synthesis of CNOT Circuits with Holes
Authors:
Ewan Murphy,
Aleks Kissinger
Abstract:
A common approach to quantum circuit transformation is to use the properties of a specific gate set to create an efficient representation of a given circuit's unitary, such as a parity matrix or stabiliser tableau, and then resynthesise an improved circuit, e.g. with fewer gates or respecting connectivity constraints. Since these methods rely on a restricted gate set, generalisation to arbitrary c…
▽ More
A common approach to quantum circuit transformation is to use the properties of a specific gate set to create an efficient representation of a given circuit's unitary, such as a parity matrix or stabiliser tableau, and then resynthesise an improved circuit, e.g. with fewer gates or respecting connectivity constraints. Since these methods rely on a restricted gate set, generalisation to arbitrary circuits usually involves slicing the circuit into pieces that can be resynthesised and working with these separately. The choices made about what gates should go into each slice can have a major effect on the performance of the resynthesis. In this paper we propose an alternative approach to generalising these resynthesis algorithms to general quantum circuits. Instead of cutting the circuit into slices, we "cut out" the gates we can't resynthesise leaving holes in our quantum circuit. The result is a second-order process called a quantum comb, which can be resynthesised directly. We apply this idea to the RowCol algorithm, which resynthesises CNOT circuits for topologically constrained hardware, explaining how we were able to extend it to work for quantum combs. We then compare the generalisation of RowCol using our method to the naive "slice and build" method empirically on a variety of circuit sizes and hardware topologies. Finally, we outline how quantum combs could be used to help generalise other resynthesis algorithms.
△ Less
Submitted 31 August, 2023;
originally announced August 2023.
-
A Sentence is Worth a Thousand Pictures: Can Large Language Models Understand Human Language?
Authors:
Gary Marcus,
Evelina Leivada,
Elliot Murphy
Abstract:
Artificial Intelligence applications show great potential for language-related tasks that rely on next-word prediction. The current generation of large language models have been linked to claims about human-like linguistic performance and their applications are hailed both as a key step towards Artificial General Intelligence and as major advance in understanding the cognitive, and even neural bas…
▽ More
Artificial Intelligence applications show great potential for language-related tasks that rely on next-word prediction. The current generation of large language models have been linked to claims about human-like linguistic performance and their applications are hailed both as a key step towards Artificial General Intelligence and as major advance in understanding the cognitive, and even neural basis of human language. We analyze the contribution of large language models as theoretically informative representations of a target system vs. atheoretical powerful mechanistic tools, and we identify the key abilities that are still missing from the current state of development and exploitation of these models.
△ Less
Submitted 26 July, 2023;
originally announced August 2023.
-
ROSE: A Neurocomputational Architecture for Syntax
Authors:
Elliot Murphy
Abstract:
A comprehensive model of natural language processing in the brain must accommodate four components: representations, operations, structures and encoding. It further requires a principled account of how these components mechanistically, and causally, relate to each another. While previous models have isolated regions of interest for structure-building and lexical access, many gaps remain with respe…
▽ More
A comprehensive model of natural language processing in the brain must accommodate four components: representations, operations, structures and encoding. It further requires a principled account of how these components mechanistically, and causally, relate to each another. While previous models have isolated regions of interest for structure-building and lexical access, many gaps remain with respect to bridging distinct scales of neural complexity. By expanding existing accounts of how neural oscillations can index various linguistic processes, this article proposes a neurocomputational architecture for syntax, termed the ROSE model (Representation, Operation, Structure, Encoding). Under ROSE, the basic data structures of syntax are atomic features, types of mental representations (R), and are coded at the single-unit and ensemble level. Elementary computations (O) that transform these units into manipulable objects accessible to subsequent structure-building levels are coded via high frequency gamma activity. Low frequency synchronization and cross-frequency coupling code for recursive categorial inferences (S). Distinct forms of low frequency coupling and phase-amplitude coupling (delta-theta coupling via pSTS-IFG; theta-gamma coupling via IFG to conceptual hubs) then encode these structures onto distinct workspaces (E). Causally connecting R to O is spike-phase/LFP coupling; connecting O to S is phase-amplitude coupling; connecting S to E is a system of frontotemporal traveling oscillations; connecting E to lower levels is low-frequency phase resetting of spike-LFP coupling. ROSE is reliant on neurophysiologically plausible mechanisms, is supported at all four levels by a range of recent empirical research, and provides an anatomically precise and falsifiable grounding for the basic property of natural language syntax: hierarchical, recursive structure-building.
△ Less
Submitted 15 March, 2023;
originally announced March 2023.
-
Testing AI performance on less frequent aspects of language reveals insensitivity to underlying meaning
Authors:
Vittoria Dentella,
Elliot Murphy,
Gary Marcus,
Evelina Leivada
Abstract:
Advances in computational methods and big data availability have recently translated into breakthroughs in AI applications. With successes in bottom-up challenges partially overshadowing shortcomings, the 'human-like' performance of Large Language Models has raised the question of how linguistic performance is achieved by algorithms. Given systematic shortcomings in generalization across many AI s…
▽ More
Advances in computational methods and big data availability have recently translated into breakthroughs in AI applications. With successes in bottom-up challenges partially overshadowing shortcomings, the 'human-like' performance of Large Language Models has raised the question of how linguistic performance is achieved by algorithms. Given systematic shortcomings in generalization across many AI systems, in this work we ask whether linguistic performance is indeed guided by language knowledge in Large Language Models. To this end, we prompt GPT-3 with a grammaticality judgement task and comprehension questions on less frequent constructions that are thus unlikely to form part of Large Language Models' training data. These included grammatical 'illusions', semantic anomalies, complex nested hierarchies and self-embeddings. GPT-3 failed for every prompt but one, often offering answers that show a critical lack of understanding even of high-frequency words used in these less frequent grammatical constructions. The present work sheds light on the boundaries of the alleged AI human-like linguistic competence and argues that, far from human-like, the next-word prediction abilities of LLMs may face issues of robustness, when pushed beyond training data.
△ Less
Submitted 27 February, 2023; v1 submitted 23 February, 2023;
originally announced February 2023.
-
Natural Language Syntax Complies with the Free-Energy Principle
Authors:
Elliot Murphy,
Emma Holmes,
Karl Friston
Abstract:
Natural language syntax yields an unbounded array of hierarchically structured expressions. We claim that these are used in the service of active inference in accord with the free-energy principle (FEP). While conceptual advances alongside modelling and simulation work have attempted to connect speech segmentation and linguistic communication with the FEP, we extend this program to the underlying…
▽ More
Natural language syntax yields an unbounded array of hierarchically structured expressions. We claim that these are used in the service of active inference in accord with the free-energy principle (FEP). While conceptual advances alongside modelling and simulation work have attempted to connect speech segmentation and linguistic communication with the FEP, we extend this program to the underlying computations responsible for generating syntactic objects. We argue that recently proposed principles of economy in language design - such as "minimal search" criteria from theoretical syntax - adhere to the FEP. This affords a greater degree of explanatory power to the FEP - with respect to higher language functions - and offers linguistics a grounding in first principles with respect to computability. We show how both tree-geometric depth and a Kolmogorov complexity estimate (recruiting a Lempel-Ziv compression algorithm) can be used to accurately predict legal operations on syntactic workspaces, directly in line with formulations of variational free energy minimization. This is used to motivate a general principle of language design that we term Turing-Chomsky Compression (TCC). We use TCC to align concerns of linguists with the normative account of self-organization furnished by the FEP, by marshalling evidence from theoretical linguistics and psycholinguistics to ground core principles of efficient syntactic computation within active inference.
△ Less
Submitted 26 October, 2022;
originally announced October 2022.
-
DALL-E 2 Fails to Reliably Capture Common Syntactic Processes
Authors:
Evelina Leivada,
Elliot Murphy,
Gary Marcus
Abstract:
Machine intelligence is increasingly being linked to claims about sentience, language processing, and an ability to comprehend and transform natural language into a range of stimuli. We systematically analyze the ability of DALL-E 2 to capture 8 grammatical phenomena pertaining to compositionality that are widely discussed in linguistics and pervasive in human language: binding principles and core…
▽ More
Machine intelligence is increasingly being linked to claims about sentience, language processing, and an ability to comprehend and transform natural language into a range of stimuli. We systematically analyze the ability of DALL-E 2 to capture 8 grammatical phenomena pertaining to compositionality that are widely discussed in linguistics and pervasive in human language: binding principles and coreference, passives, word order, coordination, comparatives, negation, ellipsis, and structural ambiguity. Whereas young children routinely master these phenomena, learning systematic map**s between syntax and semantics, DALL-E 2 is unable to reliably infer meanings that are consistent with the syntax. These results challenge recent claims concerning the capacity of such systems to understand of human language. We make available the full set of test materials as a benchmark for future testing.
△ Less
Submitted 25 October, 2022; v1 submitted 23 October, 2022;
originally announced October 2022.
-
Operations for Autonomous Spacecraft
Authors:
Rebecca Castano,
Tiago Vaquero,
Federico Rossi,
Vandi Verma,
Ellen Van Wyk,
Dan Allard,
Bennett Huffmann,
Erin M. Murphy,
Nihal Dhamani,
Robert A. Hewitt,
Scott Davidoff,
Rashied Amini,
Anthony Barrett,
Julie Castillo-Rogez,
Steve A. Chien,
Mathieu Choukroun,
Alain Dadaian,
Raymond Francis,
Benjamin Gorr,
Mark Hofstadter,
Mitch Ingham,
Cristina Sorice,
Iain Tierney
Abstract:
Onboard autonomy technologies such as planning and scheduling, identification of scientific targets, and content-based data summarization, will lead to exciting new space science missions. However, the challenge of operating missions with such onboard autonomous capabilities has not been studied to a level of detail sufficient for consideration in mission concepts. These autonomy capabilities will…
▽ More
Onboard autonomy technologies such as planning and scheduling, identification of scientific targets, and content-based data summarization, will lead to exciting new space science missions. However, the challenge of operating missions with such onboard autonomous capabilities has not been studied to a level of detail sufficient for consideration in mission concepts. These autonomy capabilities will require changes to current operations processes, practices, and tools. We have developed a case study to assess the changes needed to enable operators and scientists to operate an autonomous spacecraft by facilitating a common model between the ground personnel and the onboard algorithms. We assess the new operations tools and workflows necessary to enable operators and scientists to convey their desired intent to the spacecraft, and to be able to reconstruct and explain the decisions made onboard and the state of the spacecraft. Mock-ups of these tools were used in a user study to understand the effectiveness of the processes and tools in enabling a shared framework of understanding, and in the ability of the operators and scientists to effectively achieve mission science objectives.
△ Less
Submitted 21 November, 2021;
originally announced November 2021.
-
An Analysis of Introductory Programming Courses at UK Universities
Authors:
Ellen Murphy,
Tom Crick,
James H. Davenport
Abstract:
Context: In the context of exploring the art, science and engineering of programming, the question of which programming languages should be taught first has been fiercely debated since computer science teaching started in universities. Failure to grasp programming readily almost certainly implies failure to progress in computer science. Inquiry: What first programming languages are being taught? T…
▽ More
Context: In the context of exploring the art, science and engineering of programming, the question of which programming languages should be taught first has been fiercely debated since computer science teaching started in universities. Failure to grasp programming readily almost certainly implies failure to progress in computer science. Inquiry: What first programming languages are being taught? There have been regular national-scale surveys in Australia and New Zealand, with the only US survey reporting on a small subset of universities. This the first such national survey of universities in the UK. Approach: We report the results of the first survey of introductory programming courses (N=80) taught at UK universities as part of their first year computer science (or related) degree programmes, conducted in the first half of 2016. We report on student numbers, programming paradigm, programming languages and environment/tools used, as well as the underpinning rationale for these choices. Knowledge: The results in this first UK survey indicate a dominance of Java at a time when universities are still generally teaching students who are new to programming (and computer science), despite the fact that Python is perceived, by the same respondents, to be both easier to teach as well as to learn. Grounding: We compare the results of this survey with a related survey conducted since 2010 (as well as earlier surveys from 2001 and 2003) in Australia and New Zealand. Importance: This survey provides a starting point for valuable pedagogic baseline data for the analysis of the art, science and engineering of programming, in the context of substantial computer science curriculum reform in UK schools, as well as increasing scrutiny of teaching excellence and graduate employability for UK universities.
△ Less
Submitted 31 March, 2017; v1 submitted 21 September, 2016;
originally announced September 2016.
-
The Role of Time in the Creation of Knowledge
Authors:
Roy E. Murphy
Abstract:
This paper I assume that in humans the creation of knowledge depends on a discrete time, or stage, sequential decision-making process subjected to a stochastic, information transmitting environment. For each time-stage, this environment randomly transmits Shannon type information-packets to the decision-maker, who examines each of them for relevancy and then determines his optimal choices. Using…
▽ More
This paper I assume that in humans the creation of knowledge depends on a discrete time, or stage, sequential decision-making process subjected to a stochastic, information transmitting environment. For each time-stage, this environment randomly transmits Shannon type information-packets to the decision-maker, who examines each of them for relevancy and then determines his optimal choices. Using this set of relevant information-packets, the decision-maker adapts, over time, to the stochastic nature of his environment, and optimizes the subjective expected rate-of-growth of knowledge. The decision-maker's optimal actions, lead to a decision function that involves, over time, his view of the subjective entropy of the environmental process and other important parameters at each time-stage of the process. Using this model of human behavior, one could create psychometric experiments using computer simulation and real decision-makers, to play programmed games to measure the resulting human performance.
△ Less
Submitted 3 July, 2007;
originally announced July 2007.