Search | arXiv e-print repository

Hidden Flaws Behind Expert-Level Accuracy of GPT-4 Vision in Medicine

Authors: Qiao Jin, Fangyuan Chen, Yiliang Zhou, Ziyang Xu, Justin M. Cheung, Robert Chen, Ronald M. Summers, Justin F. Rousseau, Peiyun Ni, Marc J Landsman, Sally L. Baxter, Subhi J. Al'Aref, Yijia Li, Alex Chen, Josef A. Brejt, Michael F. Chiang, Yifan Peng, Zhiyong Lu

Abstract: Recent studies indicate that Generative Pre-trained Transformer 4 with Vision (GPT-4V) outperforms human physicians in medical challenge tasks. However, these evaluations primarily focused on the accuracy of multiple-choice questions alone. Our study extends the current scope by conducting a comprehensive analysis of GPT-4V's rationales for image comprehension, recall of medical knowledge, and step-by-step multimodal reasoning when solving New England Journal of Medicine (NEJM) Image Challenges, an imaging quiz designed to test the knowledge and diagnostic capabilities of medical professionals. Evaluation results confirmed that GPT-4V performs comparably to human physicians in multiple-choice accuracy (81.6% vs. 77.8%). GPT-4V also performs well in cases where physicians answer incorrectly, with over 78% accuracy. However, we discovered that GPT-4V frequently presents flawed rationales in cases where it makes the correct final choice (35.5%), most prominently in image comprehension (27.2%). Despite GPT-4V's high accuracy on multiple-choice questions, our findings emphasize the necessity of further in-depth evaluations of its rationales before integrating such multimodal AI models into clinical workflows.

Submitted 22 April, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

Comments: Under review

arXiv:1811.06526 [pdf, other]

Artificial Intelligence for Interstellar Travel

Authors: Andreas M. Hein, Stephen Baxter

Abstract: The large distances involved in interstellar travel require a high degree of spacecraft autonomy, realized by artificial intelligence. The tasks artificial intelligence could perform on such spacecraft include maintenance, data collection, and designing and constructing infrastructure using in-situ resources. Despite its importance, existing publications on artificial intelligence and interstellar travel are limited to cursory descriptions where little detail is given about the nature of the artificial intelligence. This article explores the role of artificial intelligence for interstellar travel by compiling use cases, exploring capabilities, and proposing typologies, system and mission architectures. Estimations for the required intelligence level for specific types of interstellar probes are given, along with potential system and mission architectures, covering those proposed in the literature but also presenting novel ones. Finally, a generic design for interstellar probes with an AI payload is proposed. Given current rates of increase in computational power, a spacecraft with computational power similar to that of the human brain would have a mass of dozens to hundreds of tons in the 2050-2060 timeframe.
Given that the first interstellar missions and artificial general intelligence are estimated to arrive by the mid-21st century, a more in-depth exploration of the relationship between the two should be attempted, focusing on neglected areas such as protecting the artificial intelligence payload from radiation in interstellar space and the role of artificial intelligence in self-replication.

Submitted 19 November, 2018; v1 submitted 15 November, 2018; originally announced November 2018.

Journal ref: Journal of the British Interplanetary Society 2019

arXiv:1802.02974 [pdf, other]

Putting in All the Stops: Execution Control for JavaScript

Authors: Samuel Baxter, Rachit Nigam, Joe Gibbs Politz, Shriram Krishnamurthi, Arjun Guha

Abstract: Scores of compilers produce JavaScript, enabling programmers to use many languages on the Web, reuse existing code, and even use Web IDEs. Unfortunately, most compilers inherit the browser's compromised execution model, so long-running programs freeze the browser tab, infinite loops crash IDEs, and so on. The few compilers that avoid these problems suffer poor performance and are difficult to engineer. This paper presents Stopify, a source-to-source compiler that extends JavaScript with debugging abstractions and blocking operations, and easily integrates with existing compilers. We apply Stopify to 10 programming languages and develop a Web IDE that supports stopping, single-stepping, breakpointing, and long-running computations. For nine languages, Stopify requires no or trivial compiler changes. For eight, our IDE is the first that provides these features. Two of our subject languages have compilers with similar features. Stopify's performance is competitive with these compilers and it makes them dramatically simpler. Stopify's abstractions rely on first-class continuations, which it provides by compiling JavaScript to JavaScript. We also identify sub-languages of JavaScript that compilers implicitly use, and exploit these to improve performance. Finally, Stopify needs to repeatedly interrupt and resume program execution. We use a sampling-based technique to estimate program speed that outperforms other systems.

Submitted 15 April, 2018; v1 submitted 8 February, 2018; originally announced February 2018.

Comments: In proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) 2018
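Stopify's execution control rests on being able to suspend a running program and later resume it. As a rough analogy only (this is not Stopify's implementation, and it is Python rather than JavaScript), Python generators capture "the rest of the computation" at each `yield`, a restricted one-shot form of continuation. The sketch below runs a long loop in bounded slices so a host could stop it between slices; the names `count_up`, `Runner`, and `run_slice` are hypothetical.

```python
from typing import Generator, Optional


def count_up(limit: int) -> Generator[None, None, int]:
    """A long-running loop instrumented with explicit yield points.

    Each `yield` marks a place where the runner may pause; it stands in for
    the interruption points a compiler such as Stopify inserts automatically.
    """
    total = 0
    for i in range(limit):
        total += i
        yield  # yield point: control returns to the runner
    return total


class Runner:
    """Runs a generator in bounded slices so the host can stop or resume it."""

    def __init__(self, program: Generator[None, None, int]) -> None:
        self.program = program
        self.result: Optional[int] = None
        self.done = False

    def run_slice(self, max_steps: int) -> bool:
        """Advance at most `max_steps` yield points; return True when finished."""
        if self.done:
            return True
        for _ in range(max_steps):
            try:
                next(self.program)
            except StopIteration as stop:
                self.result = stop.value  # the generator's return value
                self.done = True
                return True
        return False  # paused: the suspended generator holds the remaining work


runner = Runner(count_up(10))
slices = 0
while not runner.run_slice(max_steps=3):
    slices += 1  # between slices, a host event loop could handle a "stop" request
print(runner.result, slices)
```

Here the slice size is fixed; Stopify instead estimates program speed by sampling to decide when to interrupt, which this sketch does not attempt.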

arXiv:1511.03629 [pdf, other]

A Continuous Max-Flow Approach to Cyclic Field Reconstruction

Authors: John S. H. Baxter, Jonathan McLeod, Terry M. Peters

Abstract: Reconstruction of an image from noisy data using Markov Random Field theory has been explored by both the graph-cuts and continuous max-flow communities in the form of the Potts and Ishikawa models. However, neither model takes into account the particular cyclic topology of specific intensity types such as the hue in natural colour images, or the phase in complex-valued MRI. This paper presents cyclic continuous max-flow image reconstruction, which models the intensity being reconstructed as having a fundamentally cyclic topology. This model complements the Ishikawa model in that it is designed with image reconstruction in mind, having the topology of the intensity space inherent in the model while being readily extendable to an arbitrary intensity resolution.

Submitted 11 November, 2015; originally announced November 2015.

Comments: 8 pages, 1 figure
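To make the "cyclic topology" point in the cyclic continuous max-flow abstract concrete: on a linearly ordered label axis (as in an Ishikawa-style chain), hues of 350 and 10 degrees look far apart, although both are close to red. The sketch below only compares the two distance notions; it is an illustration under that assumption, not the paper's max-flow model, and the function names are hypothetical.

```python
import math


def linear_distance(a: float, b: float) -> float:
    """Distance on an ordered (linear) label axis."""
    return abs(a - b)


def cyclic_distance(a: float, b: float, period: float = 360.0) -> float:
    """Shortest distance between two values on a circle of length `period`."""
    d = abs(a - b) % period
    return min(d, period - d)


# Two hues on either side of red (0/360 degrees).
h1, h2 = 350.0, 10.0
print(linear_distance(h1, h2))  # 340.0
print(cyclic_distance(h1, h2))  # 20.0

# MRI phase wraps at 2*pi in the same way.
print(round(cyclic_distance(0.1, 2 * math.pi - 0.1, period=2 * math.pi), 6))  # 0.2
```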

arXiv:1510.04706 [pdf, other]

Shape Complexes in Continuous Max-Flow Hierarchical Multi-Labeling Problems

Authors: John S. H. Baxter, Jing Yuan, Terry M. Peters

Abstract: Although topological considerations amongst multiple labels have been previously investigated in the context of continuous max-flow image segmentation, similar investigations have yet to be made about shape considerations in a general and extendable manner. This paper presents shape complexes for segmentation, which capture more complex shapes by combining multiple labels and super-labels constrained by geodesic star convexity. Shape complexes combine geodesic star convexity constraints with hierarchical label organization, which together allow for more complex shapes to be represented. This framework avoids the use of coordinate system warping techniques to convert shape constraints into topological constraints, which may be ambiguous or ill-defined for certain segmentation problems.

Submitted 15 October, 2015; originally announced October 2015.

Comments: 9 pages, 1 figure
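The shape-complex abstract constrains labels by geodesic star convexity: every point of a region must be reachable from a chosen centre along a path that stays inside the region. The sketch below checks the simpler Euclidean variant (straight segments instead of geodesic paths) on a small binary mask; that simplification, the sampling of segments by rounding, and the example masks are assumptions for illustration only.

```python
from typing import List, Tuple

Mask = List[List[int]]  # 1 = foreground, 0 = background


def segment_points(c: Tuple[int, int], p: Tuple[int, int]) -> List[Tuple[int, int]]:
    """Integer grid points sampled along the straight segment from c to p."""
    (r0, c0), (r1, c1) = c, p
    n = max(abs(r1 - r0), abs(c1 - c0))
    if n == 0:
        return [c]
    return [(round(r0 + (r1 - r0) * t / n), round(c0 + (c1 - c0) * t / n))
            for t in range(n + 1)]


def is_star_convex(mask: Mask, center: Tuple[int, int]) -> bool:
    """True if every foreground pixel is visible from `center` along a straight line."""
    for r, row in enumerate(mask):
        for c, v in enumerate(row):
            if v and not all(mask[i][j] for i, j in segment_points(center, (r, c))):
                return False
    return True


cross = [[0, 1, 0],
         [1, 1, 1],
         [0, 1, 0]]
u_shape = [[1, 0, 1],
           [1, 0, 1],
           [1, 1, 1]]
print(is_star_convex(cross, (1, 1)))    # True
print(is_star_convex(u_shape, (0, 0)))  # False: the gap blocks the view of (0, 2)
```

A geodesic version would replace straight segments with shortest paths under an image-dependent metric, which is what lets star-convex regions follow curved anatomy.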

arXiv:1501.07844 [pdf, ps, other]

A Proximal Bregman Projection Approach to Continuous Max-Flow Problems Using Entropic Distances

Authors: John S. H. Baxter, Martin Rajchl, Jing Yuan, Terry M. Peters

Abstract: One issue limiting the adoption of large-scale multi-region segmentation is the sometimes prohibitive memory requirements. This is especially troubling considering advances in massively parallel computing and commercial graphics processing units, because of their already limited memory compared to the random access memory used in more traditional computation. To address this issue in the field of continuous max-flow segmentation, we have developed a pseudo-flow framework using the theory of Bregman proximal projections and entropic distances which implicitly represents flow variables between labels and designated source and sink nodes. This reduces the memory requirements for max-flow segmentation by approximately 20% for Potts models and approximately 30% for hierarchical max-flow (HMF) and directed acyclic graph max-flow (DAGMF) models. This represents a great improvement over the state of the art in max-flow segmentation, allowing much larger problems to be addressed and accelerated using commercially available graphics processing hardware.

Submitted 30 January, 2015; originally announced January 2015.

Comments: 10 pages
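One general property of entropic (KL) distances, which may help explain their appeal in the Bregman proximal projection abstract, is that a proximal step on the probability simplex has a closed form: a multiplicative update followed by normalization, with no Euclidean simplex projection needed. The sketch below shows that generic step for one pixel's relaxed labelling over three labels; it is not the paper's pseudo-flow algorithm and says nothing about its memory savings. The function name and example costs are hypothetical.

```python
import math
from typing import List


def entropic_prox_step(u: List[float], cost: List[float], tau: float) -> List[float]:
    """One entropic (KL) proximal step on the probability simplex.

    Solves  argmin_{v in simplex}  <cost, v> + (1/tau) * KL(v || u),
    whose closed-form solution is v_i proportional to u_i * exp(-tau * cost_i).
    """
    w = [ui * math.exp(-tau * ci) for ui, ci in zip(u, cost)]
    s = sum(w)
    return [wi / s for wi in w]


# One pixel, three labels, starting from a uniform labelling.
u0 = [1 / 3, 1 / 3, 1 / 3]
v = entropic_prox_step(u0, cost=[0.2, 1.0, 0.5], tau=2.0)
print(v)  # non-negative, sums to 1, largest weight on the cheapest label

# With zero cost the step reduces to the KL projection of u onto the simplex.
print(entropic_prox_step([2.0, 1.0, 1.0], [0.0, 0.0, 0.0], tau=1.0))  # [0.5, 0.25, 0.25]
```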

arXiv:1405.0892 [pdf, ps, other]

A Continuous Max-Flow Approach to Multi-Labeling Problems under Arbitrary Region Regularization

Authors: John S. H. Baxter, Martin Rajchl, Jing Yuan, Terry M. Peters

Abstract: The incorporation of region regularization into max-flow segmentation has traditionally focused on ordering and part-whole relationships. A side effect of the development of such models is that they constrained regularization to only those cases, rather than allowing for arbitrary region regularization. Directed Acyclic Graphical Max-Flow (DAGMF) segmentation overcomes these limitations by allowing the algorithm designer to specify an arbitrary directed acyclic graph to structure a max-flow segmentation. This allows individual 'parts' to be members of multiple distinct 'wholes.'

Submitted 5 June, 2014; v1 submitted 5 May, 2014; originally announced May 2014.

Comments: 10 pages, 2 figures, 3 algorithms - v2: Fixed typos / grammatical errors and mathematical errors in the primal/dual formulation. Extended methods for weighted DAGs rather than DAGs with edge multiplicity
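The key structural idea in the DAGMF abstract is that a label graph need only be a directed acyclic graph, so a leaf "part" can sit under several "wholes". The sketch below represents such a graph as plain data, rejects cycles with Kahn's algorithm, and computes which leaf labels each node covers. It covers only the graph bookkeeping, not the max-flow optimization, and the label names are hypothetical.

```python
from typing import Dict, FrozenSet, List

# Hypothetical label DAG (node -> children). Leaves are the labels assigned to
# pixels; each leaf here belongs to two different wholes (a side and a tissue).
DAG: Dict[str, List[str]] = {
    "root": ["left", "right", "gray", "white"],
    "left": ["L_gray", "L_white"],
    "right": ["R_gray", "R_white"],
    "gray": ["L_gray", "R_gray"],
    "white": ["L_white", "R_white"],
    "L_gray": [], "L_white": [], "R_gray": [], "R_white": [],
}


def topological_order(dag: Dict[str, List[str]]) -> List[str]:
    """Kahn's algorithm: parents before children; raises ValueError on a cycle."""
    indegree = {n: 0 for n in dag}
    for children in dag.values():
        for c in children:
            indegree[c] += 1
    ready = [n for n, d in indegree.items() if d == 0]
    order: List[str] = []
    while ready:
        n = ready.pop()
        order.append(n)
        for c in dag[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    if len(order) != len(dag):
        raise ValueError("label graph contains a cycle")
    return order


def leaves_under(dag: Dict[str, List[str]]) -> Dict[str, FrozenSet[str]]:
    """For every node, the set of leaf labels it covers (its 'whole')."""
    covered: Dict[str, FrozenSet[str]] = {}
    for n in reversed(topological_order(dag)):  # children are visited before parents
        kids = dag[n]
        covered[n] = frozenset([n]) if not kids else frozenset().union(*(covered[c] for c in kids))
    return covered


cov = leaves_under(DAG)
print(sorted(cov["gray"]))  # ['L_gray', 'R_gray']
parents = {leaf: sorted(p for p, kids in DAG.items() if leaf in kids) for leaf in cov["root"]}
print(parents["L_gray"])  # ['gray', 'left']
```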

arXiv:1404.2571 [pdf, other]

RANCOR: Non-Linear Image Registration with Total Variation Regularization

Authors: Martin Rajchl, John S. H. Baxter, Wu Qiu, Ali R. Khan, Aaron Fenster, Terry M. Peters, Jing Yuan

Abstract: Optimization techniques have been widely used in deformable registration, allowing for the incorporation of similarity metrics with regularization mechanisms. These regularization mechanisms are designed to mitigate the effects of trivial solutions to ill-posed registration problems and to otherwise ensure the resulting deformation fields are well-behaved. This paper introduces a novel deformable registration algorithm, RANCOR, which uses iterative convexification to address deformable registration problems under total-variation regularization. Initial comparative results against four state-of-the-art registration algorithms are presented using the Internet Brain Segmentation Repository (IBSR) database.

Submitted 9 April, 2014; originally announced April 2014.

Comments: 9 pages, 1 figure, technical note
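RANCOR regularizes the deformation field with total variation. The abstract does not state its exact discretization, so the sketch below assumes a common one: isotropic TV with forward differences, applied to each displacement component and summed. It shows the behaviour that makes TV attractive for registration: a uniform translation costs nothing, while a sharp jump is charged in proportion to its size and extent. The function name and fields are hypothetical.

```python
import math
from typing import List

Field = List[List[float]]  # one displacement component sampled on a 2-D grid


def total_variation(ux: Field, uy: Field) -> float:
    """Isotropic discrete TV of a 2-D displacement field (ux, uy).

    Uses forward differences, with zero difference past the last row/column,
    and sums sqrt(dx^2 + dy^2) over all pixels and both components.
    """
    tv = 0.0
    rows, cols = len(ux), len(ux[0])
    for comp in (ux, uy):
        for i in range(rows):
            for j in range(cols):
                dx = comp[i][j + 1] - comp[i][j] if j + 1 < cols else 0.0
                dy = comp[i + 1][j] - comp[i][j] if i + 1 < rows else 0.0
                tv += math.hypot(dx, dy)
    return tv


zeros = [[0.0] * 3 for _ in range(3)]
shift = [[1.0] * 3 for _ in range(3)]        # uniform translation by 1 pixel in x
jump = [[0.0, 0.0, 1.0] for _ in range(3)]   # right column displaced by 1 pixel

print(total_variation(shift, zeros))  # 0.0
print(total_variation(jump, zeros))   # 3.0
```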

arXiv:1404.0336 [pdf, ps, other]

A Continuous Max-Flow Approach to General Hierarchical Multi-Labeling Problems

Authors: John S. H. Baxter, Martin Rajchl, Jing Yuan, Terry M. Peters

Abstract: Multi-region segmentation algorithms often have the onus of incorporating complex anatomical knowledge representing spatial or geometric relationships between objects, and general-purpose methods of addressing this knowledge in an optimization-based manner have thus been lacking. This paper presents Generalized Hierarchical Max-Flow (GHMF) segmentation, which captures simple anatomical part-whole relationships in the form of an unconstrained hierarchy. Regularization can then be applied to both parts and wholes independently, allowing for spatial grouping and clustering of labels in a globally optimal convex optimization framework. For the purposes of ready integration into a variety of segmentation tasks, the hierarchies can be specified at run time, allowing the segmentation problem to be readily specified and alternatives explored without undue programming effort or recompilation.

Submitted 5 June, 2014; v1 submitted 1 April, 2014; originally announced April 2014.

Comments: 11 pages, 1 figure, 3 algorithms - v2: Fixed typos / grammatical errors
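The GHMF abstract stresses that the hierarchy is supplied at run time, so changing the segmentation problem means editing data rather than recompiling. The sketch below treats the hierarchy as a nested dictionary and derives a labelling function for every whole from its parts. It assumes that a whole's relaxed labelling is the pixelwise sum of its children's, which is consistent with part-whole semantics when the leaf labellings sum to one at each pixel; no optimization is performed, and the tree, label names, and values are hypothetical.

```python
from typing import Dict, List, Optional, Union

# Internal nodes map to sub-hierarchies; leaves map to None.
Hierarchy = Dict[str, Union["Hierarchy", None]]


def aggregate(tree: Hierarchy, leaf_u: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Labelling function for every node: the pixelwise sum of its children's."""
    out: Dict[str, List[float]] = {}

    def visit(name: str, sub: Optional[Hierarchy]) -> List[float]:
        if sub is None:
            u = leaf_u[name]  # leaf: labelling supplied by the caller
        else:
            child_us = [visit(child, grand) for child, grand in sub.items()]
            u = [sum(vals) for vals in zip(*child_us)]  # whole = union of its parts
        out[name] = u
        return u

    for name, sub in tree.items():
        visit(name, sub)
    return out


# A hierarchy that could be loaded from a config file at run time.
tree: Hierarchy = {"brain": {"gray": None, "white": None}, "background": None}
leaf_u = {  # three pixels; leaf labellings sum to 1 at each pixel
    "gray": [0.5, 0.0, 0.25],
    "white": [0.5, 0.0, 0.25],
    "background": [0.0, 1.0, 0.5],
}
out = aggregate(tree, leaf_u)
print(out["brain"])  # [1.0, 0.0, 0.5]
```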

arXiv:nucl-ex/0006011 [pdf, ps, other]

doi 10.1103/PhysRevC.62.034308

Crossing the Dripline to 11N Using Elastic Resonance Scattering

Authors: K. Markenroth, L. Axelsson, S. Baxter, M. J. G. Borge, C. Donzaud, S. Fayans, H. O. U. Fynbo, V. Z. Goldberg, S. Grevy, D. Guillemaud-Mueller, B. Jonson, K. -M. Kallman, S. Leenhardt, M. Lewitowicz, T. Lonnroth, P. Manngard, I. Martel, A. C. Mueller, I. Mukha, T. Nilsson, G. Nyman, N. A. Orr, K. Riisager, G. V. Rogachev, M. -G. Saint-Laurent, et al. (11 additional authors not shown)

Abstract: The level structure of the unbound nucleus 11N has been studied by 10C+p elastic resonance scattering in inverse geometry with the LISE3 spectrometer at GANIL, using a 10C beam with an energy of 9.0 MeV/u. An additional measurement was done at the A1200 spectrometer at MSU. The excitation function above the 10C+p threshold has been determined up to 5 MeV. A potential-model analysis revealed three resonance states at energies 1.27 (+0.18/-0.05) MeV (Γ = 1.44 ± 0.2 MeV), 2.01 (+0.15/-0.05) MeV (Γ = 0.84 ± 0.2 MeV) and 3.75 (±0.05) MeV (Γ = 0.60 ± 0.05 MeV), with the spin-parity assignments I^π = 1/2+, 1/2- and 5/2+, respectively. Hence, 11N is shown to have a ground-state parity inversion completely analogous to its mirror partner, 11Be. A narrow resonance in the excitation function at 4.33 (±0.05) MeV was also observed and assigned spin-parity 3/2-.

Submitted 21 June, 2000; originally announced June 2000.

Comments: 14 pages, 9 figures, two-column. Accepted for publication in PRC

Journal ref: Phys.Rev. C62 (2000) 034308
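In the inverse geometry described in the 11N abstract, a heavy 10C beam strikes a proton target, so only a fraction of the beam's kinetic energy is available in the centre-of-mass frame. The sketch below applies the standard non-relativistic relation E_cm = E_lab * m_target / (m_beam + m_target), approximating masses by mass numbers (an assumption); for 9.0 MeV/u it gives roughly 8.2 MeV, the most available before any energy loss in the target. The function name is hypothetical.

```python
def cm_energy(e_per_u: float, a_beam: int, a_target: int) -> float:
    """Non-relativistic centre-of-mass energy in MeV.

    The beam has `a_beam` nucleons at `e_per_u` MeV per nucleon and hits a target
    nucleus of `a_target` nucleons at rest; mass numbers stand in for exact masses.
    """
    e_lab = e_per_u * a_beam  # total beam kinetic energy in the lab frame
    return e_lab * a_target / (a_beam + a_target)


# 10C beam at 9.0 MeV/u on a proton target.
print(round(cm_energy(9.0, a_beam=10, a_target=1), 2))  # 8.18
```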

Showing 1–10 of 10 results for author: Baxter, S