
Showing 1–50 of 91 results for author: Gray, J

Searching in archive cs.
  1. arXiv:2405.17029 [pdf, other]

    eess.IV cs.CV

    Multi-view Disparity Estimation Using a Novel Gradient Consistency Model

    Authors: James L. Gray, Aous T. Naman, David S. Taubman

    Abstract: Variational approaches to disparity estimation typically use a linearised brightness constancy constraint, which only applies in smooth regions and over small distances. Accordingly, current variational approaches rely on a schedule to progressively include image data. This paper proposes the use of Gradient Consistency information to assess the validity of the linearisation; this information is u…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 11 pages, 11 figures. Submitted to Transactions on Image Processing

  2. arXiv:2310.16113 [pdf]

    cs.LG q-bio.GN q-bio.NC

    Compressed representation of brain genetic transcription

    Authors: James K Ruffle, Henry Watkins, Robert J Gray, Harpreet Hyare, Michel Thiebaut de Schotten, Parashkev Nachev

    Abstract: The architecture of the brain is too complex to be intuitively surveyable without the use of compressed representations that project its variation into a compact, navigable space. The task is especially challenging with high-dimensional data, such as gene expression, where the joint complexity of anatomical and transcriptional patterns demands maximum compression. Established practice is to use st…

    Submitted 20 June, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: 22 pages, 5 main figures, 1 supplementary figure

  3. arXiv:2309.07096 [pdf]

    q-bio.NC cs.CV eess.IV

    Computational limits to the legibility of the imaged human brain

    Authors: James K Ruffle, Robert J Gray, Samia Mohinta, Guilherme Pombo, Chaitanya Kaul, Harpreet Hyare, Geraint Rees, Parashkev Nachev

    Abstract: Our knowledge of the organisation of the human brain at the population-level is yet to translate into power to predict functional differences at the individual-level, limiting clinical applications, and casting doubt on the generalisability of inferred mechanisms. It remains unknown whether the difficulty arises from the absence of individuating biological patterns within the brain, or from limite…

    Submitted 2 April, 2024; v1 submitted 23 August, 2023; originally announced September 2023.

    Comments: 38 pages, 6 figures, 1 table, 2 supplementary figures, 1 supplementary table

  4. arXiv:2308.07039 [pdf]

    cs.CV cs.AI q-bio.NC

    The minimal computational substrate of fluid intelligence

    Authors: Amy PK Nelson, Joe Mole, Guilherme Pombo, Robert J Gray, James K Ruffle, Edgar Chan, Geraint E Rees, Lisa Cipolotti, Parashkev Nachev

    Abstract: The quantification of cognitive powers rests on identifying a behavioural task that depends on them. Such dependence cannot be assured, for the powers a task invokes cannot be experimentally controlled or constrained a priori, resulting in unknown vulnerability to failure of specificity and generalisability. Evaluating a compact version of Raven's Advanced Progressive Matrices (RAPM), a widely use…

    Submitted 14 August, 2023; originally announced August 2023.

    Comments: 26 pages, 5 figures

  5. arXiv:2306.15004 [pdf, other]

    quant-ph cond-mat.dis-nn cond-mat.stat-mech cs.CC physics.comp-ph

    One-step replica symmetry breaking in the language of tensor networks

    Authors: Nicola Pancotti, Johnnie Gray

    Abstract: We develop an exact mapping between the one-step replica symmetry breaking cavity method and tensor networks. The two schemes come with complementary mathematical and numerical toolboxes that could be leveraged to improve the respective states of the art. As an example, we construct a tensor-network representation of Survey Propagation, one of the best deterministic k-SAT solvers. The resulting al…

    Submitted 26 June, 2023; originally announced June 2023.

  6. arXiv:2303.00823 [pdf, other]

    physics.plasm-ph cs.LG physics.acc-ph physics.comp-ph

    Automated control and optimisation of laser driven ion acceleration

    Authors: B. Loughran, M. J. V. Streeter, H. Ahmed, S. Astbury, M. Balcazar, M. Borghesi, N. Bourgeois, C. B. Curry, S. J. D. Dann, S. DiIorio, N. P. Dover, T. Dzelzanis, O. C. Ettlinger, M. Gauthier, L. Giuffrida, G. D. Glenn, S. H. Glenzer, J. S. Green, R. J. Gray, G. S. Hicks, C. Hyland, V. Istokskaia, M. King, D. Margarone, O. McCusker, et al. (10 additional authors not shown)

    Abstract: The interaction of relativistically intense lasers with opaque targets represents a highly non-linear, multi-dimensional parameter space. This limits the utility of sequential 1D scanning of experimental parameters for the optimisation of secondary radiation, although to-date this has been the accepted methodology due to low data acquisition rates. High repetition-rate (HRR) lasers augmented by ma…

    Submitted 1 March, 2023; originally announced March 2023.

    Comments: 11 pages

  7. arXiv:2210.05492 [pdf, other]

    cs.GT cs.AI cs.LG cs.MA

    Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning

    Authors: Anton Bakhtin, David J Wu, Adam Lerer, Jonathan Gray, Athul Paul Jacob, Gabriele Farina, Alexander H Miller, Noam Brown

    Abstract: No-press Diplomacy is a complex strategy game involving both cooperation and competition that has served as a benchmark for multi-agent AI research. While self-play reinforcement learning has resulted in numerous successes in purely adversarial games like chess, Go, and poker, self-play alone is insufficient for achieving optimal performance in domains involving cooperation with humans. We address…

    Submitted 11 October, 2022; originally announced October 2022.

  8. arXiv:2209.03042 [pdf, other]

    hep-ex astro-ph.IM cs.LG physics.data-an physics.ins-det

    Graph Neural Networks for Low-Energy Event Classification & Reconstruction in IceCube

    Authors: R. Abbasi, M. Ackermann, J. Adams, N. Aggarwal, J. A. Aguilar, M. Ahlers, M. Ahrens, J. M. Alameddine, A. A. Alves Jr., N. M. Amin, K. Andeen, T. Anderson, G. Anton, C. Argüelles, Y. Ashida, S. Athanasiadou, S. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, V. Basu, R. Bay, J. J. Beatty, K.-H. Becker, et al. (359 additional authors not shown)

    Abstract: IceCube, a cubic-kilometer array of optical sensors built to detect atmospheric and astrophysical neutrinos between 1 GeV and 1 PeV, is deployed 1.45 km to 2.45 km below the surface of the ice sheet at the South Pole. The classification and reconstruction of events from the in-ice detectors play a central role in the analysis of data from IceCube. Reconstructing and classifying events is a challen…

    Submitted 11 October, 2022; v1 submitted 7 September, 2022; originally announced September 2022.

    Comments: Prepared for submission to JINST

  9. Archangel: A Hybrid UAV-based Human Detection Benchmark with Position and Pose Metadata

    Authors: Yi-Ting Shen, Yaesop Lee, Heesung Kwon, Damon M. Conover, Shuvra S. Bhattacharyya, Nikolas Vale, Joshua D. Gray, G. Jeremy Leong, Kenneth Evensen, Frank Skirlo

    Abstract: Learning to detect objects, such as humans, in imagery captured by an unmanned aerial vehicle (UAV) usually suffers from tremendous variations caused by the UAV's position towards the objects. In addition, existing UAV-based benchmark datasets do not provide adequate dataset metadata, which is essential for precise model diagnosis and learning features invariant to those variations. In this paper,…

    Submitted 8 August, 2023; v1 submitted 31 August, 2022; originally announced September 2022.

    Comments: IEEE Access

    Journal ref: IEEE Access, vol. 11, pp. 80958-80972, 2023

  10. arXiv:2206.06120 [pdf]

    cs.CV cs.AI q-bio.TO

    Brain tumour segmentation with incomplete imaging data

    Authors: James K Ruffle, Samia Mohinta, Robert J Gray, Harpreet Hyare, Parashkev Nachev

    Abstract: The complex heterogeneity of brain tumours is increasingly recognized to demand data of magnitudes and richness only fully-inclusive, large-scale collections drawn from routine clinical care could plausibly offer. This is a task contemporary machine learning could facilitate, especially in neuroimaging, but its ability to deal with incomplete data common in real world clinical practice remains unk…

    Submitted 22 February, 2023; v1 submitted 13 June, 2022; originally announced June 2022.

    Comments: 26 pages, 8 figures, 4 supplementary tables

  11. arXiv:2203.06566 [pdf, other]

    cs.HC

    PromptChainer: Chaining Large Language Model Prompts through Visual Programming

    Authors: Tongshuang Wu, Ellen Jiang, Aaron Donsbach, Jeff Gray, Alejandra Molina, Michael Terry, Carrie J Cai

    Abstract: While LLMs can effectively help prototype single ML functionalities, many real-world applications involve complex tasks that cannot be easily handled via a single run of an LLM. Recent work has found that chaining multiple LLM runs together (with the output of one step being the input to the next) can help users accomplish these more complex tasks, and in a way that is perceived to be more transpa…

    Submitted 12 March, 2022; originally announced March 2022.

    Comments: CHI LBW 2022

  12. arXiv:2112.07782 [pdf, other]

    q-bio.BM cs.LG

    Deciphering antibody affinity maturation with language models and weakly supervised learning

    Authors: Jeffrey A. Ruffolo, Jeffrey J. Gray, Jeremias Sulam

    Abstract: In response to pathogens, the adaptive immune system generates specific antibodies that bind and neutralize foreign antigens. Understanding the composition of an individual's immune repertoire can provide insights into this process and reveal potential therapeutic antibodies. In this work, we explore the application of antibody-specific language models to aid understanding of immune repertoires. W…

    Submitted 14 December, 2021; originally announced December 2021.

    Comments: Presented at Machine Learning for Structural Biology Workshop, NeurIPS 2021

  13. arXiv:2111.13280 [pdf, other]

    cs.CV

    Efficient Self-Ensemble for Semantic Segmentation

    Authors: Walid Bousselham, Guillaume Thibault, Lucas Pagano, Archana Machireddy, Joe Gray, Young Hwan Chang, Xubo Song

    Abstract: Ensemble of predictions is known to perform better than individual predictions taken separately. However, for tasks that require heavy computational resources, e.g. semantic segmentation, creating an ensemble of learners that needs to be trained separately is hardly tractable. In this work, we propose to leverage the performance boost offered by ensemble methods to enhance the semantic segmentatio…

    Submitted 22 March, 2022; v1 submitted 25 November, 2021; originally announced November 2021.

    Comments: Code available at https://github.com/WalBouss/SenFormer

  14. arXiv:2110.08904 [pdf]

    cs.DL cs.LG

    Deep forecasting of translational impact in medical research

    Authors: Amy PK Nelson, Robert J Gray, James K Ruffle, Henry C Watkins, Daniel Herron, Nick Sorros, Danil Mikhailov, M. Jorge Cardoso, Sebastien Ourselin, Nick McNally, Bryan Williams, Geraint E. Rees, Parashkev Nachev

    Abstract: The value of biomedical research--a $1.7 trillion annual investment--is ultimately determined by its downstream, real-world impact. Current objective predictors of impact rest on proxy, reductive metrics of dissemination, such as paper citation rates, whose relation to real-world translation remains unquantified. Here we sought to determine the comparative predictability of future real-world trans…

    Submitted 17 October, 2021; originally announced October 2021.

    Comments: 28 pages, 6 figures

  15. Welsch Based Multiview Disparity Estimation

    Authors: James L. Gray, Aous T. Naman, David S. Taubman

    Abstract: In this work, we explore disparity estimation from a high number of views. We experimentally identify occlusions as a key challenge for disparity estimation for applications with high numbers of views. In particular, occlusions can actually result in a degradation in accuracy as more views are added to a dataset. We propose the use of a Welsch loss function for the data term in a global variationa…

    Submitted 2 October, 2021; originally announced October 2021.

    Comments: Published in 2021 IEEE International Conference on Image Processing (ICIP), 5 pages

    Journal ref: 2021 IEEE International Conference on Image Processing (ICIP), 2021, pp. 3223-3227

  16. arXiv:2106.13367 [pdf, other]

    cs.AI cs.DB

    SeaNet -- Towards A Knowledge Graph Based Autonomic Management of Software Defined Networks

    Authors: Qianru Zhou, Alasdair J. G. Gray, Stephen McLaughlin

    Abstract: Automatic network management driven by Artificial Intelligent technologies has been heatedly discussed over decades. However, current reports mainly focus on theoretic proposals and architecture designs, works on practical implementations on real-life networks are yet to appear. This paper proposes our effort toward the implementation of knowledge graph driven approach for autonomic network manage…

    Submitted 27 May, 2022; v1 submitted 24 June, 2021; originally announced June 2021.

  17. arXiv:2105.04408 [pdf]

    cs.RO cs.AI

    The Challenges and Opportunities of Human-Centered AI for Trustworthy Robots and Autonomous Systems

    Authors: Hongmei He, John Gray, Angelo Cangelosi, Qinggang Meng, T. Martin McGinnity, Jörn Mehnen

    Abstract: The trustworthiness of Robots and Autonomous Systems (RAS) has gained a prominent position on many research agendas towards fully autonomous systems. This research systematically explores, for the first time, the key facets of human-centered AI (HAI) for trustworthy RAS. In this article, five key properties of a trustworthy RAS initially have been identified. RAS must be (i) safe in any uncertain…

    Submitted 7 May, 2021; originally announced May 2021.

    Comments: 15 pages, 4 figures

    MSC Class: 68T05; 68T40

    ACM Class: I.2

  18. Designing Building Blocks for Open-Ended Early Literacy Software

    Authors: Ivan Sysoev, James H. Gray, Susan Fine, Deb Roy

    Abstract: English has a convoluted relationship between its pronunciation and spelling, which obscures its phonological structure for early literacy learners. This convoluted relationship has implications for early literacy software, particularly for open-ended, child-driven designs. A tempting way to bypass this issue is to use manipulables (blocks) that are directly tied to phonemes. However, creating pho…

    Submitted 30 March, 2021; originally announced March 2021.

    Comments: This is a published manuscript for the paper published in the International Journal of Child-Computer Interaction. Sharing on ArXiv is in accordance with Elsevier sharing policy

    Journal ref: International Journal of Child-Computer Interaction. Article 100273 (2021)

  19. arXiv:2101.06124 [pdf, other]

    cs.CR

    Identifying Authorship Style in Malicious Binaries: Techniques, Challenges & Datasets

    Authors: Jason Gray, Daniele Sgandurra, Lorenzo Cavallaro

    Abstract: Attributing a piece of malware to its creator typically requires threat intelligence. Binary attribution increases the level of difficulty as it mostly relies upon the ability to disassemble binaries to identify authorship style. Our survey explores malicious author style and the adversarial techniques used by them to remain anonymous. We examine the adversarial impact on the state-of-the-art meth…

    Submitted 18 January, 2021; v1 submitted 15 January, 2021; originally announced January 2021.

    Comments: 31 pages, 3 figures, 10 tables; Modified table headings to make them readable

  20. SARA -- A Semantic Access Point Resource Allocation Service for Heterogenous Wireless Networks

    Authors: Qianru Zhou, Alasdair J. G. Gray, Dimitrios Pezaros, Stephen McLaughlin

    Abstract: In this paper, we present SARA, a Semantic Access point Resource Allocation service for heterogenous wireless networks with various wireless access technologies existing together. By automatically reasoning on the knowledge base of the full system provided by a knowledge based autonomic network management system -- SEANET, SARA selects the access point providing the best quality of service among t…

    Submitted 11 November, 2020; originally announced November 2020.

    Comments: 2019 IEEE Wireless Day

  21. arXiv:2010.02923 [pdf, other]

    cs.AI cs.GT cs.LG

    Human-Level Performance in No-Press Diplomacy via Equilibrium Search

    Authors: Jonathan Gray, Adam Lerer, Anton Bakhtin, Noam Brown

    Abstract: Prior AI breakthroughs in complex games have focused on either the purely adversarial or purely cooperative settings. In contrast, Diplomacy is a game of shifting alliances that involves both cooperation and competition. For this reason, Diplomacy has proven to be a formidable research challenge. In this paper we describe an agent for the no-press variant of Diplomacy that combines supervised lear…

    Submitted 3 May, 2021; v1 submitted 5 October, 2020; originally announced October 2020.

  22. Efficient Quantum State Sample Tomography with Basis-dependent Neural-networks

    Authors: Alistair W. R. Smith, Johnnie Gray, M. S. Kim

    Abstract: We use a meta-learning neural-network approach to analyse data from a measured quantum state. Once our neural network has been trained it can be used to efficiently sample measurements of the state in measurement bases not contained in the training data. These samples can be used to calculate expectation values and other useful quantities. We refer to this process as "state sample tomography". We enc…

    Submitted 10 June, 2021; v1 submitted 16 September, 2020; originally announced September 2020.

    Journal ref: PRX Quantum 2, 020348 (2021)

  23. COVID-19 SignSym: a fast adaptation of a general clinical NLP tool to identify and normalize COVID-19 signs and symptoms to OMOP common data model

    Authors: Jingqi Wang, Noor Abu-el-rub, Josh Gray, Huy Anh Pham, Yujia Zhou, Frank Manion, Mei Liu, Xing Song, Hua Xu, Masoud Rouhizadeh, Yaoyun Zhang

    Abstract: The COVID-19 pandemic swept across the world rapidly, infecting millions of people. An efficient tool that can accurately recognize important clinical concepts of COVID-19 from free text in electronic health records (EHRs) will be valuable to accelerate COVID-19 clinical research. To this end, this study aims at adapting the existing CLAMP natural language processing tool to quickly build COVID-19…

    Submitted 7 April, 2021; v1 submitted 13 July, 2020; originally announced July 2020.

    Comments: Journal of the American Medical Informatics Association, 2021

  24. arXiv:2007.08383 [pdf, other]

    q-bio.BM cs.LG

    Deep Learning in Protein Structural Modeling and Design

    Authors: Wenhao Gao, Sai Pooja Mahajan, Jeremias Sulam, Jeffrey J. Gray

    Abstract: Deep learning is catalyzing a scientific revolution fueled by big data, accessible toolkits, and powerful computational resources, impacting many fields including protein structural modeling. Protein structural modeling, such as predicting structure from amino acid sequence and evolutionary information, designing proteins toward desirable functionality, or predicting properties or behavior of a pr…

    Submitted 16 July, 2020; originally announced July 2020.

  25. arXiv:2003.08567 [pdf, other]

    cs.CR cs.CY cs.DC

    Apps Gone Rogue: Maintaining Personal Privacy in an Epidemic

    Authors: Ramesh Raskar, Isabel Schunemann, Rachel Barbar, Kristen Vilcans, Jim Gray, Praneeth Vepakomma, Suraj Kapa, Andrea Nuzzo, Rajiv Gupta, Alex Berke, Dazza Greenwood, Christian Keegan, Shriank Kanaparti, Robson Beaudry, David Stansbury, Beatriz Botero Arcila, Rishank Kanaparti, Vitor Pamplona, Francesco M Benedetti, Alina Clough, Riddhiman Das, Kaushal Jain, Khahlil Louisy, Greg Nadeau, Steve Penrod, et al. (7 additional authors not shown)

    Abstract: Containment, the key strategy in quickly halting an epidemic, requires rapid identification and quarantine of the infected individuals, determination of whom they have had close contact with in the previous days and weeks, and decontamination of locations the infected individual has visited. Achieving containment demands accurate and timely collection of the infected individual's location and cont…

    Submitted 21 December, 2022; v1 submitted 19 March, 2020; originally announced March 2020.

    Comments: 15 pages

  26. arXiv:1907.09273 [pdf, other]

    cs.AI cs.CL

    Why Build an Assistant in Minecraft?

    Authors: Arthur Szlam, Jonathan Gray, Kavya Srinet, Yacine Jernite, Armand Joulin, Gabriel Synnaeve, Douwe Kiela, Haonan Yu, Zhuoyuan Chen, Siddharth Goyal, Demi Guo, Danielle Rothermel, C. Lawrence Zitnick, Jason Weston

    Abstract: In this document we describe a rationale for a research program aimed at building an open "assistant" in the game Minecraft, in order to make progress on the problems of natural language understanding and learning from dialogue.

    Submitted 25 July, 2019; v1 submitted 22 July, 2019; originally announced July 2019.

  27. arXiv:1907.08584 [pdf, other]

    cs.AI

    CraftAssist: A Framework for Dialogue-enabled Interactive Agents

    Authors: Jonathan Gray, Kavya Srinet, Yacine Jernite, Haonan Yu, Zhuoyuan Chen, Demi Guo, Siddharth Goyal, C. Lawrence Zitnick, Arthur Szlam

    Abstract: This paper describes an implementation of a bot assistant in Minecraft, and the tools and platform allowing players to interact with the bot and to record those interactions. The purpose of building such an assistant is to facilitate the study of agents that can complete tasks specified by dialogue, and eventually, to learn from dialogue interactions.

    Submitted 19 July, 2019; originally announced July 2019.

  28. arXiv:1907.03920 [pdf, other]

    cs.CL physics.soc-ph

    Hahahahaha, Duuuuude, Yeeessss!: A two-parameter characterization of stretchable words and the dynamics of mistypings and misspellings

    Authors: Tyler J. Gray, Christopher M. Danforth, Peter Sheridan Dodds

    Abstract: Stretched words like `heellllp' or `heyyyyy' are a regular feature of spoken language, often used to emphasize or exaggerate the underlying meaning of the root word. While stretched words are rarely found in formal written language and dictionaries, they are prevalent within social media. In this paper, we examine the frequency distributions of `stretchable words' found in roughly 100 billion twee…

    Submitted 8 July, 2019; originally announced July 2019.

    Comments: 18 pages, 18 figures, and 9 tables. Online appendices at http://compstorylab.org/stretchablewords/

  29. arXiv:1906.04063 [pdf, other]

    stat.ML cs.LG stat.CO

    On the Insufficiency of the Large Margins Theory in Explaining the Performance of Ensemble Methods

    Authors: Waldyn Martinez, J. Brian Gray

    Abstract: Boosting and other ensemble methods combine a large number of weak classifiers through weighted voting to produce stronger predictive models. To explain the successful performance of boosting algorithms, Schapire et al. (1998) showed that AdaBoost is especially effective at increasing the margins of the training data. Schapire et al. (1998) also developed an upper bound on the generalization error…

    Submitted 10 June, 2019; originally announced June 2019.

  30. arXiv:1906.03123 [pdf, other]

    stat.ML cs.LG stat.CO

    On the Current State of Research in Explaining Ensemble Performance Using Margins

    Authors: Waldyn Martinez, J. Brian Gray

    Abstract: Empirical evidence shows that ensembles, such as bagging, boosting, random and rotation forests, generally perform better in terms of their generalization error than individual classifiers. To explain this performance, Schapire et al. (1998) developed an upper bound on the generalization error of an ensemble based on the margins of the training data, from which it was concluded that larger margins…

    Submitted 7 June, 2019; originally announced June 2019.

  31. arXiv:1905.01978 [pdf, other]

    cs.CL cs.AI

    CraftAssist Instruction Parsing: Semantic Parsing for a Minecraft Assistant

    Authors: Yacine Jernite, Kavya Srinet, Jonathan Gray, Arthur Szlam

    Abstract: We propose a large scale semantic parsing dataset focused on instruction-driven communication with an agent in Minecraft. We describe the data collection process which yields additional 35K human generated instructions with their semantic annotations. We report the performance of three baseline models and find that while a dataset of this size helps us train a usable instruction parser, it still p…

    Submitted 17 April, 2019; originally announced May 2019.

  32. arXiv:1903.05372 [pdf, other]

    cs.CY cs.NI

    Lost Silence: An emergency response early detection service through continuous processing of telecommunication data streams

    Authors: Qianru Zhou, Stephen McLaughlin, Alasdair J. G. Gray, Shangbin Wu, Chengxiang Wang

    Abstract: Early detection of significant traumatic events, e.g. a terrorist attack or a ship capsizing, is important to ensure that a prompt emergency response can occur. In the modern world telecommunication systems could play a key role in ensuring a successful emergency response by detecting such incidents through significant changes in calls and access to the networks. In this paper a methodology is ill…

    Submitted 13 March, 2019; originally announced March 2019.

    Comments: 15 pages, 4 figures, WSP ISWC 2017 conference

    Journal ref: ISWC WSP 2017, pp. 33--47

  33. arXiv:1803.09745 [pdf, other]

    cs.CL physics.soc-ph

    English verb regularization in books and tweets

    Authors: Tyler J. Gray, Andrew J. Reagan, Peter Sheridan Dodds, Christopher M. Danforth

    Abstract: The English language has evolved dramatically throughout its lifespan, to the extent that a modern speaker of Old English would be incomprehensible without translation. One concrete indicator of this process is the movement from irregular to regular (-ed) forms for the past tense of verbs. In this study we quantify the extent of verb regularization using two vastly disparate datasets: (1) Six year…

    Submitted 3 January, 2019; v1 submitted 26 March, 2018; originally announced March 2018.

    Comments: 16 pages, 10 figures, and 4 tables. Online appendices at https://www.uvm.edu/storylab/share/papers/gray2018a/ ; Updated to journal version with minor differences from first version

    Journal ref: PLOS ONE 13(12): e0209651, 2018

  34. arXiv:1803.06617 [pdf, other]

    cs.AR

    Towards an Area-Efficient Implementation of a High ILP EDGE Soft Processor

    Authors: Jan Gray, Aaron Smith

    Abstract: In-order scalar RISC architectures have been the dominant paradigm in FPGA soft processor design for twenty years. Prior out-of-order superscalar implementations have not exhibited competitive area or absolute performance. This paper describes a new way to build fast and area-efficient out-of-order superscalar soft processors by utilizing an Explicit Data Graph Execution (EDGE) instruction set arc…

    Submitted 18 March, 2018; originally announced March 2018.

  35. arXiv:1801.01322 [pdf]

    cs.SI physics.soc-ph

    Narrating Networks

    Authors: Liliana Bounegru, Tommaso Venturini, Jonathan Gray, Mathieu Jacomy

    Abstract: Networks have become the de facto diagram of the Big Data age (try searching Google Images for [big data AND visualisation] and see). The concept of networks has become central to many fields of human inquiry and is said to revolutionise everything from medicine to markets to military intelligence. While the mathematical and analytical capabilities of networks have been extensively studied over th…

    Submitted 4 January, 2018; originally announced January 2018.

    Journal ref: Digital Journalism, Routledge, 2017, 5 (6), pp.699-730

  36. arXiv:1705.09413 [pdf, other]

    cs.PL cs.CY cs.HC cs.SE

    Learnable Programming: Blocks and Beyond

    Authors: David Bau, Jeff Gray, Caitlin Kelleher, Josh Sheldon, Franklyn Turbak

    Abstract: Blocks-based programming has become the lingua franca for introductory coding. Studies have found that experience with blocks-based programming can help beginners learn more traditional text-based languages. We explore how blocks environments improve learnability for novices by 1) favoring recognition over recall, 2) reducing cognitive load, and 3) preventing errors. Increased usability of blocks…

    Submitted 25 May, 2017; originally announced May 2017.

    ACM Class: K.3.2; H.5.2; D.1.7; D.2.6

    Journal ref: Communications of the ACM, June 2017, pp. 72-80

  37. arXiv:1606.01037 [pdf]

    cs.AR

    GRVI Phalanx: A Massively Parallel RISC-V FPGA Accelerator Accelerator

    Authors: Jan Gray

    Abstract: GRVI is an FPGA-efficient RISC-V RV32I soft processor. Phalanx is a parallel processor and accelerator array framework. Groups of processors and accelerators form shared memory clusters. Clusters are interconnected with each other and with extreme bandwidth I/O and memory devices by a 300-bit-wide Hoplite NOC. An example Kintex UltraScale KU040 system has 400 RISC-V cores, peak throughput of 100,…

    Submitted 3 June, 2016; originally announced June 2016.

    Comments: Presented at 2nd International Workshop on Overlay Architectures for FPGAs (OLAF 2016) arXiv:1605.08149

    Report number: OLAF/2016/05

  38. PAV ontology: Provenance, Authoring and Versioning

    Authors: Paolo Ciccarese, Stian Soiland-Reyes, Khalid Belhajjame, Alasdair J G Gray, Carole Goble, Tim Clark

    Abstract: Provenance is a critical ingredient for establishing trust of published scientific content. This is true whether we are considering a data set, a computational workflow, a peer-reviewed publication or a simple scientific claim with supportive evidence. Existing vocabularies such as DC Terms and the W3C PROV-O are domain-independent and general-purpose and they allow and encourage for extensions to…

    Submitted 6 December, 2013; v1 submitted 26 April, 2013; originally announced April 2013.

    Comments: 22 pages (incl 5 tables and 19 figures). Submitted to Journal of Biomedical Semantics 2013-04-26 (#1858276535979415). Revised article submitted 2013-08-30. Second revised article submitted 2013-10-06. Accepted 2013-10-07. Author proofs sent 2013-10-09 and 2013-10-16. Published 2013-11-22. Final version 2013-12-06. http://www.jbiomedsem.com/content/4/1/37

    Report number: University of Manchester eScholar: uk-ac-man-scw:193385

    ACM Class: I.2.4; H.2.1; H.3.7; I.7.4

    Journal ref: Journal of Biomedical Semantics 2013, 4:37

  39. arXiv:cs/0701173 [pdf]

    cs.DB cs.CE

    SkyServer Traffic Report - The First Five Years

    Authors: Vik Singh, Jim Gray, Ani Thakar, Alexander S. Szalay, Jordan Raddick, Bill Boroski, Svetlana Lebedeva, Brian Yanny

    Abstract: The SkyServer is an Internet portal to the Sloan Digital Sky Survey Catalog Archive Server. From 2001 to 2006, there were a million visitors in 3 million sessions generating 170 million Web hits, 16 million ad-hoc SQL queries, and 62 million page views. The site currently averages 35 thousand visitors and 400 thousand sessions per month. The Web and SQL logs are public. We analyzed traffic and s…

    Submitted 26 January, 2007; originally announced January 2007.

    Report number: MSR TR-2006-190

  40. arXiv:cs/0701172  [pdf]

    cs.DB cs.CE

    Cross-Matching Multiple Spatial Observations and Dealing with Missing Data

    Authors: Jim Gray, Alex Szalay, Tamas Budavari, Robert Lupton, Maria Nieto-Santisteban, Ani Thakar

    Abstract: Cross-match spatially clusters and organizes several astronomical point-source measurements from one or more surveys. Ideally, each object would be found in each survey. Unfortunately, the observation conditions and the objects themselves change continually. Even some stationary objects are missing in some observations; sometimes objects have a variable light flux and sometimes the seeing is wor…

    Submitted 26 January, 2007; originally announced January 2007.

    Report number: MSR TR 2006-175

  41. arXiv:cs/0701171  [pdf]

    cs.DB cs.DS

    The Zones Algorithm for Finding Points-Near-a-Point or Cross-Matching Spatial Datasets

    Authors: Jim Gray, Maria A. Nieto-Santisteban, Alexander S. Szalay

    Abstract: Zones index an N-dimensional Euclidean or metric space to efficiently support points-near-a-point queries either within a dataset or between two datasets. The approach uses relational algebra and the B-Tree mechanism found in almost all relational database systems. Hence, the Zones Algorithm gives a portable-relational implementation of points-near-point, spatial cross-match, and self-match quer…

    Submitted 26 January, 2007; originally announced January 2007.

    Report number: MSR TR 2006 52

  42. arXiv:cs/0701170  [pdf]

    cs.DB cs.CE

    Life Under Your Feet: An End-to-End Soil Ecology Sensor Network, Database, Web Server, and Analysis Service

    Authors: Katalin Szlavecz, Andreas Terzis, Stuart Ozer, Razvan Musaloiu-E, Joshua Cogan, Sam Small, Randal Burns, Jim Gray, Alex Szalay

    Abstract: Wireless sensor networks can revolutionize soil ecology by providing measurements at temporal and spatial granularities previously impossible. This paper presents a soil monitoring system we developed and deployed at an urban forest in Baltimore as a first step towards realizing this vision. Motes in this network measure and save soil moisture and temperature in situ every minute. Raw measuremen…

    Submitted 26 January, 2007; originally announced January 2007.

    Report number: MSR TR 2006 90

  43. arXiv:cs/0701168  [pdf]

    cs.DB

    To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem?

    Authors: Russell Sears, Catharine Van Ingen, Jim Gray

    Abstract: Application designers often face the question of whether to store large objects in a filesystem or in a database. Often this decision is made for application design simplicity. Sometimes, performance measurements are also used. This paper looks at the question of fragmentation - one of the operational issues that can affect the performance and/or manageability of the system as deployed long term…

    Submitted 25 January, 2007; originally announced January 2007.

    Report number: MSR-TR-2006-45

    Journal ref: CIDR 2007

  44. arXiv:cs/0701167  [pdf]

    cs.DB cs.CE

    Large-Scale Query and XMatch, Entering the Parallel Zone

    Authors: Maria A. Nieto-Santisteban, Aniruddha R. Thakar, Alexander S. Szalay, Jim Gray

    Abstract: Current and future astronomical surveys are producing catalogs with millions and billions of objects. On-line access to such big datasets for data mining and cross-correlation is usually as highly desired as unfeasible. Providing these capabilities is becoming critical for the Virtual Observatory framework. In this paper we present various performance tests that show how using Relational Databas…

    Submitted 25 January, 2007; originally announced January 2007.

    Comments: Astronomical Data Analysis Software and Systems XV in San Lorenzo de El Escorial, Madrid, Spain, October 2005, to appear in the ASP Conference Series

    Report number: MSR-TR-2005-169

  45. arXiv:cs/0701166  [pdf]

    cs.DB cs.AR

    Empirical Measurements of Disk Failure Rates and Error Rates

    Authors: Jim Gray, Catharine van Ingen

    Abstract: The SATA advertised bit error rate of one error in 10 terabytes is frightening. We moved 2 PB through low-cost hardware and saw five disk read error events, several controller failures, and many system reboots caused by security patches. We conclude that SATA uncorrectable read errors are not yet a dominant system-fault source - they happen, but are rare compared to other problems. We also concl…

    Submitted 25 January, 2007; originally announced January 2007.

    Report number: MSR-TR-2005-166

  46. arXiv:cs/0701165  [pdf]

    cs.DB cs.AR

    Petascale Computational Systems

    Authors: Gordon Bell, Jim Gray, Alex Szalay

    Abstract: Computational science is changing to be data intensive. Super-Computers must be balanced systems; not just CPU farms but also petascale IO and networking arrays. Anyone building CyberInfrastructure should allocate resources to support a balanced Tier-1 through Tier-3 design.

    Submitted 25 January, 2007; originally announced January 2007.

  47. arXiv:cs/0701164  [pdf]

    cs.DB cs.DS

    Indexing the Sphere with the Hierarchical Triangular Mesh

    Authors: Alexander S. Szalay, Jim Gray, George Fekete, Peter Z. Kunszt, Peter Kukol, Ani Thakar

    Abstract: We describe a method to subdivide the surface of a sphere into spherical triangles of similar, but not identical, shapes and sizes. The Hierarchical Triangular Mesh (HTM) is a quad-tree that is particularly good at supporting searches at different resolutions, from arc seconds to hemispheres. The subdivision scheme is universal, providing the basis for addressing and for fast lookups. The HTM pr…

    Submitted 25 January, 2007; originally announced January 2007.

    Report number: MSR-TR-2005-123

  48. arXiv:cs/0701163  [pdf]

    cs.DB cs.CE

    Using Table Valued Functions in SQL Server 2005 To Implement a Spatial Data Library

    Authors: Jim Gray, Alex Szalay, Gyorgy Fekete

    Abstract: This article explains how to add spatial search functions (point-near-point and point in polygon) to Microsoft SQL Server 2005 using C# and table-valued functions. It is possible to use this library to add spatial search to your application without writing any special code. The library implements the public-domain C# Hierarchical Triangular Mesh (HTM) algorithms from Johns Hopkins University. Th…

    Submitted 25 January, 2007; originally announced January 2007.

    Report number: MSR-TR-2005-122

  49. arXiv:cs/0701162  [pdf]

    cs.DB cs.PF

    A Measure of Transaction Processing 20 Years Later

    Authors: Jim Gray

    Abstract: This provides a retrospective of the paper "A Measure of Transaction Processing" published in 1985. It shows that transaction processing peak performance and price-performance have improved about 100,000x respectively and that sort/sequential performance has approximately doubled each year (so a million fold improvement) even though processor performance plateaued in 1995.

    Submitted 25 January, 2007; originally announced January 2007.

    Comments: This article appeared in the IEEE Data Engineering, Fall 2005

    Report number: MSR-TR-2005-57

  50. arXiv:cs/0701161  [pdf]

    cs.DB cs.PF

    Thousands of DebitCredit Transactions-Per-Second: Easy and Inexpensive

    Authors: Jim Gray, Charles Levine

    Abstract: A $2k computer can execute about 8k transactions per second. This is 80x more than one of the largest US bank's 1970's traffic - it approximates the total US 1970's financial transaction volume. Very modest modern computers can easily solve yesterday's problems.

    Submitted 25 January, 2007; originally announced January 2007.

    Report number: MSR-TR-2005-39