Showing 1–19 of 19 results for author: Atkinson, D

Searching in archive cs.
  1. arXiv:2407.01968  [pdf, ps, other]

    cs.CY

    Unsettled Law: Time to Generate New Approaches?

    Authors: David Atkinson, Jacob Morrison

    Abstract: We identify several important and unsettled legal questions with profound ethical and societal implications arising from generative artificial intelligence (GenAI), focusing on its distinguishable characteristics from traditional software and earlier AI models. Our key contribution is formally identifying the issues that are unique to GenAI so scholars, practitioners, and others can conduct more u…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 14 pages

  2. arXiv:2406.20086  [pdf, other]

    cs.CL cs.LG

    Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs

    Authors: Sheridan Feucht, David Atkinson, Byron Wallace, David Bau

    Abstract: LLMs process text as sequences of tokens that roughly correspond to words, where less common words are represented by multiple tokens. However, individual tokens are often semantically unrelated to the meanings of the words/concepts they comprise. For example, Llama-2-7b's tokenizer splits the word "northeastern" into the tokens ['_n', 'ort', 'he', 'astern'], none of which correspond to semantical…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 13 pages, 14 figures. Code and data at https://footprints.baulab.info/

    ACM Class: I.2.7

  3. arXiv:2406.18842  [pdf]

    cs.CY cs.AI cs.CL

    The global landscape of academic guidelines for generative AI and Large Language Models

    Authors: Junfeng Jiao, Saleh Afroogh, Kevin Chen, David Atkinson, Amit Dhurandhar

    Abstract: The integration of Generative Artificial Intelligence (GAI) and Large Language Models (LLMs) in academia has spurred a global discourse on their potential pedagogical benefits and ethical considerations. Positive reactions highlight some potential, such as collaborative creativity, increased access to education, and empowerment of trainers and trainees. However, negative reactions raise concerns a…

    Submitted 27 June, 2024; v1 submitted 26 May, 2024; originally announced June 2024.

  4. arXiv:2404.09479  [pdf, ps, other]

    cs.CY

    A Legal Risk Taxonomy for Generative Artificial Intelligence

    Authors: David Atkinson, Jacob Morrison

    Abstract: For the first time, this paper presents a taxonomy of legal risks associated with generative AI (GenAI) by breaking down complex legal concepts to provide a common understanding of potential legal challenges for developing and deploying GenAI models. The methodology is based on (1) examining the legal claims that have been filed in existing lawsuits and (2) evaluating the reasonably foreseeable le…

    Submitted 23 May, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: 29 pages, 2 tables, preprint

  5. arXiv:2404.03646  [pdf, other]

    cs.CL

    Locating and Editing Factual Associations in Mamba

    Authors: Arnab Sen Sharma, David Atkinson, David Bau

    Abstract: We investigate the mechanisms of factual recall in the Mamba state space model. Our work is inspired by previous findings in autoregressive transformer language models suggesting that their knowledge recall is localized to particular modules at specific token locations; we therefore ask whether factual recall in Mamba can be similarly localized. To investigate this, we conduct four lines of experi…

    Submitted 4 April, 2024; originally announced April 2024.

  6. arXiv:2403.05812  [pdf, other]

    cs.CL cs.AI

    Algorithmic progress in language models

    Authors: Anson Ho, Tamay Besiroglu, Ege Erdil, David Owen, Robi Rahman, Zifan Carl Guo, David Atkinson, Neil Thompson, Jaime Sevilla

    Abstract: We investigate the rate at which algorithms for pre-training language models have improved since the advent of deep learning. Using a dataset of over 200 language model evaluations on Wikitext and Penn Treebank spanning 2012-2023, we find that the compute required to reach a set performance threshold has halved approximately every 8 months, with a 95% confidence interval of around 5 to 14 months,…

    Submitted 9 March, 2024; originally announced March 2024.

  7. arXiv:2402.00838  [pdf, other]

    cs.CL

    OLMo: Accelerating the Science of Language Models

    Authors: Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, et al. (18 additional authors not shown)

    Abstract: Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models…

    Submitted 7 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  8. arXiv:2402.00159  [pdf, other]

    cs.CL

    Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

    Authors: Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, et al. (11 additional authors not shown)

    Abstract: Information about pretraining corpora used to train the current best-performing language models is seldom discussed: commercial models rarely detail their data, and even open models are often released without accompanying training data or recipes to reproduce them. As a result, it is challenging to conduct and advance scientific research on language modeling, such as understanding how training dat…

    Submitted 6 June, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    Comments: Accepted at ACL 2024; Dataset: https://hf.co/datasets/allenai/dolma; Code: https://github.com/allenai/dolma

  9. arXiv:2311.10538  [pdf, other]

    cs.AI

    Testing Language Model Agents Safely in the Wild

    Authors: Silen Naihin, David Atkinson, Marc Green, Merwane Hamadi, Craig Swift, Douglas Schonholtz, Adam Tauman Kalai, David Bau

    Abstract: A prerequisite for safe autonomy-in-the-wild is safe testing-in-the-wild. Yet real-world autonomous tests face several unique safety challenges, both due to the possibility of causing harm during a test, as well as the risk of encountering new unsafe agent behavior through interactions with real-world and potentially malicious actors. We propose a framework for conducting safe autonomous agent tes…

    Submitted 3 December, 2023; v1 submitted 17 November, 2023; originally announced November 2023.

  10. arXiv:2309.06268  [pdf]

    eess.IV cs.LG

    ssVERDICT: Self-Supervised VERDICT-MRI for Enhanced Prostate Tumour Characterisation

    Authors: Snigdha Sen, Saurabh Singh, Hayley Pye, Caroline M. Moore, Hayley Whitaker, Shonit Punwani, David Atkinson, Eleftheria Panagiotaki, Paddy J. Slator

    Abstract: Purpose: Demonstrating and assessing self-supervised machine learning fitting of the VERDICT (Vascular, Extracellular and Restricted DIffusion for Cytometry in Tumours) model for prostate. Methods: We derive a self-supervised neural network for fitting VERDICT (ssVERDICT) that estimates parameter maps without training data. We compare the performance of ssVERDICT to two established baseline method…

    Submitted 27 September, 2023; v1 submitted 12 September, 2023; originally announced September 2023.

    Comments: 12 pages, 5 figures. Submitted to Magnetic Resonance in Medicine

  11. Combiner and HyperCombiner Networks: Rules to Combine Multimodality MR Images for Prostate Cancer Localisation

    Authors: Wen Yan, Bernard Chiu, Ziyi Shen, Qianye Yang, Tom Syer, Zhe Min, Shonit Punwani, Mark Emberton, David Atkinson, Dean C. Barratt, Yipeng Hu

    Abstract: One of the distinct characteristics in radiologists' reading of multiparametric prostate MR scans, using reporting systems such as PI-RADS v2.1, is to score individual types of MR modalities, T2-weighted, diffusion-weighted, and dynamic contrast-enhanced, and then combine these image-modality-specific scores using standardised decision rules to predict the likelihood of clinically significant canc…

    Submitted 20 January, 2024; v1 submitted 17 July, 2023; originally announced July 2023.

    Comments: 30 pages, 6 figures

    MSC Class: 68T07

    Journal ref: Medical Image Analysis, volume 91, 103030, 2024, Elsevier

  12. arXiv:2208.07167  [pdf, other]

    cs.CV cs.AI

    Where is VALDO? VAscular Lesions Detection and segmentatiOn challenge at MICCAI 2021

    Authors: Carole H. Sudre, Kimberlin Van Wijnen, Florian Dubost, Hieab Adams, David Atkinson, Frederik Barkhof, Mahlet A. Birhanu, Esther E. Bron, Robin Camarasa, Nish Chaturvedi, Yuan Chen, Zihao Chen, Shuai Chen, Qi Dou, Tavia Evans, Ivan Ezhov, Haojun Gao, Marta Girones Sanguesa, Juan Domingo Gispert, Beatriz Gomez Anson, Alun D. Hughes, M. Arfan Ikram, Silvia Ingala, H. Rolf Jaeger, Florian Kofler, et al. (24 additional authors not shown)

    Abstract: Imaging markers of cerebral small vessel disease provide valuable information on brain health, but their manual assessment is time-consuming and hampered by substantial intra- and interrater variability. Automated rating may benefit biomedical research, as well as clinical assessment, but diagnostic reliability of existing algorithms is unknown. Here, we present the results of the \textit{VAscular…

    Submitted 15 August, 2022; originally announced August 2022.

  13. Cross-Modality Image Registration using a Training-Time Privileged Third Modality

    Authors: Qianye Yang, David Atkinson, Yunguan Fu, Tom Syer, Wen Yan, Shonit Punwani, Matthew J. Clarkson, Dean C. Barratt, Tom Vercauteren, Yipeng Hu

    Abstract: In this work, we consider the task of pairwise cross-modality image registration, which may benefit from exploiting additional images available only at training time from an additional modality that is different to those being registered. As an example, we focus on aligning intra-subject multiparametric Magnetic Resonance (mpMR) images, between T2-weighted (T2w) scans and diffusion-weighted scans…

    Submitted 26 July, 2022; originally announced July 2022.

    Comments: Accepted by IEEE Transactions on Medical Imaging (TMI, 2022)

  14. arXiv:1911.00523  [pdf, other]

    cs.CL cs.AI cs.CY cs.HC cs.LG

    What Gets Echoed? Understanding the "Pointers" in Explanations of Persuasive Arguments

    Authors: David Atkinson, Kumar Bhargav Srinivasan, Chenhao Tan

    Abstract: Explanations are central to everyday life, and are a topic of growing interest in the AI community. To investigate the process of providing natural language explanations, we leverage the dynamics of the /r/ChangeMyView subreddit to build a dataset with 36K naturally occurring explanations of why an argument is persuasive. We propose a novel word-level prediction task to investigate how explanation…

    Submitted 1 November, 2019; originally announced November 2019.

    Comments: 19 pages, 3 figures, EMNLP 2019, the code and dataset are available at https://chenhaot.com/papers/explanation-pointers.html

  15. arXiv:1908.08431  [pdf, other]

    eess.IV cs.CV cs.LG physics.med-ph

    Improved MR to CT synthesis for PET/MR attenuation correction using Imitation Learning

    Authors: Kerstin Kläser, Thomas Varsavsky, Pawel Markiewicz, Tom Vercauteren, David Atkinson, Kris Thielemans, Brian Hutton, M Jorge Cardoso, Sebastien Ourselin

    Abstract: The ability to synthesise Computed Tomography images - commonly known as pseudo CT, or pCT - from MRI input data is commonly assessed using an intensity-wise similarity, such as an L2-norm between the ground truth CT and the pCT. However, given that the ultimate purpose is often to use the pCT as an attenuation map ($μ$-map) in Positron Emission Tomography Magnetic Resonance Imaging (PET/MRI), min…

    Submitted 27 August, 2019; v1 submitted 21 August, 2019; originally announced August 2019.

    Comments: Accepted at SASHIMI2019

  16. arXiv:1808.07431  [pdf, other]

    physics.med-ph cs.AI stat.ML

    Deep Boosted Regression for MR to CT Synthesis

    Authors: Kerstin Kläser, Pawel Markiewicz, Marta Ranzini, Wenqi Li, Marc Modat, Brian F Hutton, David Atkinson, Kris Thielemans, M Jorge Cardoso, Sebastien Ourselin

    Abstract: Attenuation correction is an essential requirement of positron emission tomography (PET) image reconstruction to allow for accurate quantification. However, attenuation correction is particularly challenging for PET-MRI as neither PET nor magnetic resonance imaging (MRI) can directly image tissue attenuation properties. MRI-based computed tomography (CT) synthesis has been proposed as an alternati…

    Submitted 22 August, 2018; originally announced August 2018.

    Comments: Accepted at SASHIMI2018

  17. arXiv:1202.1542  [pdf, ps, other]

    math.CO cs.DM

    Pattern classes and priority queues

    Authors: Michael Albert, M. D. Atkinson

    Abstract: When a set of permutations comprising a pattern class C is submitted as input to a priority queue, the resulting output is again a pattern class C'. The basis of C' is determined for pattern classes C whose basis elements have length 3, and is finite in these cases. An example is given of a class C with basis 2431 for which C' is not finitely based.

    Submitted 7 February, 2012; originally announced February 2012.

    MSC Class: 05A05; 68P05

  18. arXiv:cs/0702097  [pdf, ps, other]

    cs.CR cs.MA

    Avoiding bias in cards cryptography

    Authors: M. D. Atkinson, H. P. van Ditmarsch, S. Roehling

    Abstract: We outline the need for stricter requirements for unconditionally secure cryptographic protocols inspired by the Russian Cards problem. A new requirement CA4 is proposed that checks for bias in single card occurrence in announcements consisting of alternatives for players' holdings of cards. This requirement CA4 is shown to be equivalent to an alternative requirement CA5. All announcements found…

    Submitted 16 February, 2007; originally announced February 2007.

    Comments: 11 pages

    Journal ref: Australasian Journal of Combinatorics 44:3-17, 2009

  19. arXiv:cs/0209016  [pdf, ps, other]

    cs.DM cs.DS math.CO

    Sorting with a forklift

    Authors: M. H. Albert, M. D. Atkinson

    Abstract: A fork stack is a generalised stack which allows pushes and pops of several items at a time. We consider the problem of determining which input streams can be sorted using a single forkstack, or dually, which permutations of a fixed input stream can be produced using a single forkstack. An algorithm is given to solve the sorting problem and the minimal unsortable sequences are found. The results…

    Submitted 10 September, 2002; originally announced September 2002.

    Comments: 24 pages, 2 figures

    ACM Class: G.2.1