Showing 1–47 of 47 results for author: Fox, G

Searching in archive cs.
  1. arXiv:2406.12898  [pdf, other]

    physics.ins-det cs.AI hep-ex physics.data-an

    A Comprehensive Evaluation of Generative Models in Calorimeter Shower Simulation

    Authors: Farzana Yasmin Ahmad, Vanamala Venkataswamy, Geoffrey Fox

    Abstract: The pursuit of understanding fundamental particle interactions has reached unparalleled precision levels. Particle physics detectors play a crucial role in generating low-level object signatures that encode collision physics. However, simulating these particle collisions is a demanding task in terms of memory and computation, which will be exacerbated with larger data volumes, more complex detector…

    Submitted 8 June, 2024; originally announced June 2024.

  2. arXiv:2403.15721  [pdf, other]

    cs.DC

    Design and Implementation of an Analysis Pipeline for Heterogeneous Data

    Authors: Arup Kumar Sarker, Aymen Alsaadi, Niranda Perera, Mills Staylor, Gregor von Laszewski, Matteo Turilli, Ozgur Ozan Kilic, Mikhail Titov, Andre Merzky, Shantenu Jha, Geoffrey Fox

    Abstract: Managing and preparing complex data for deep learning, a prevalent approach in large-scale data science, can be challenging. Data transfer for model training also presents difficulties, impacting scientific fields like genomics, climate modeling, and astronomy. A large-scale solution like Google Pathways with a distributed execution environment for deep learning models exists but is proprietary. In…

    Submitted 7 April, 2024; v1 submitted 23 March, 2024; originally announced March 2024.

    Comments: 14 pages, 16 figures, 2 tables

    ACM Class: H.2.4; D.2.7; D.2.2

  3. arXiv:2401.08636   

    cs.DC cs.AI

    MLCommons Cloud Masking Benchmark with Early Stopping

    Authors: Varshitha Chennamsetti, Gregor von Laszewski, Ruochen Gu, Laiba Mehnaz, Juri Papay, Samuel Jackson, Jeyan Thiyagalingam, Sergey V. Samsonau, Geoffrey C. Fox

    Abstract: In this paper, we report on work performed for the MLCommons Science Working Group on the cloud masking benchmark. MLCommons is a consortium that develops and maintains several scientific benchmarks that aim to benefit developments in AI. The benchmarks are conducted on the High Performance Computing (HPC) Clusters of New York University and University of Virginia, as well as a commodity desktop.…

    Submitted 30 May, 2024; v1 submitted 11 December, 2023; originally announced January 2024.

    Comments: NYU did not approve the publication of the paper

  4. arXiv:2312.14199  [pdf, other]

    cs.CR

    Report on 2023 CyberTraining PI Meeting, 26-27 September 2023

    Authors: Geoffrey Fox, Mary P Thomas, Sajal Bhatia, Marisa Brazil, Nicole M Gasparini, Venkatesh Mohan Merwade, Henry J. Neeman, Jeff Carver, Henri Casanova, Vipin Chaudhary, Dirk Colbry, Lonnie Crosby, Prasun Dewan, Jessica Eisma, Nicole M Gasparini, Ahmed Irfan, Kate Kaehey, Qianqian Liu, Zhen Ni, Sushil Prasad, Apan Qasem, Erik Saule, Prabha Sundaravadivel, Karen Tomko

    Abstract: This document describes a two-day meeting held for the Principal Investigators (PIs) of NSF CyberTraining grants. The report covers invited talks, panels, and six breakout sessions. The meeting involved over 80 PIs and NSF program managers (PMs). The lessons recorded in detail in the report are a wealth of information that could help current and future PIs, as well as NSF PMs, understand the futur…

    Submitted 28 December, 2023; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: 38 pages, 3 main sections and 2 Appendix sections, 2 figures, 19 tables; updated version: author corrections

  5. arXiv:2312.02368  [pdf, other]

    cs.DB cs.DC cs.LG cs.PF

    RINAS: Training with Dataset Shuffling Can Be General and Fast

    Authors: Tianle Zhong, Jiechen Zhao, Xindi Guo, Qiang Su, Geoffrey Fox

    Abstract: Deep learning datasets are expanding at an unprecedented pace, creating new challenges for data processing in model training pipelines. A crucial aspect of these pipelines is dataset shuffling, which significantly improves unbiased learning and convergence accuracy by adhering to the principles of random sampling. However, loading shuffled data for large datasets incurs significant overhead in the…

    Submitted 4 December, 2023; originally announced December 2023.

  6. arXiv:2311.01635  [pdf, other]

    cs.DC cs.AI cs.NI

    RTP: Rethinking Tensor Parallelism with Memory Deduplication

    Authors: Cheng Luo, Tianle Zhong, Geoffrey Fox

    Abstract: In the evolving landscape of neural network models, one prominent challenge stands out: the significant memory overheads associated with training expansive models. Addressing this challenge, this study delves deep into the Rotated Tensor Parallelism (RTP). RTP is an innovative approach that strategically focuses on memory deduplication in distributed training environments. It boasts of unique featu…

    Submitted 2 November, 2023; originally announced November 2023.

  7. arXiv:2310.17013  [pdf, other]

    cs.DC

    Whitepaper on Reusable Hybrid and Multi-Cloud Analytics Service Framework

    Authors: Gregor von Laszewski, Wo Chang, Russell Reinsch, Olivera Kotevska, Ali Karimi, Abdul Rahman Sattar, Garry Mazzaferro, Geoffrey C. Fox

    Abstract: Over the last several years, the computation landscape for conducting data analytics has completely changed. While in the past, a lot of the activities have been undertaken in isolation by companies, and research institutions, today's infrastructure constitutes a wealth of services offered by a variety of providers that offer opportunities for reuse, and interactions while leveraging service colla…

    Submitted 25 October, 2023; originally announced October 2023.

  8. arXiv:2307.01394  [pdf, ps, other]

    cs.DC cs.AI cs.IR cs.LG

    In-depth Analysis On Parallel Processing Patterns for High-Performance Dataframes

    Authors: Niranda Perera, Arup Kumar Sarker, Mills Staylor, Gregor von Laszewski, Kaiying Shan, Supun Kamburugamuve, Chathura Widanage, Vibhatha Abeykoon, Thejaka Amila Kanewela, Geoffrey Fox

    Abstract: The Data Science domain has expanded monumentally in both research and industry communities during the past decade, predominantly owing to the Big Data revolution. Artificial Intelligence (AI) and Machine Learning (ML) are bringing more complexities to data engineering applications, which are now integrated into data processing pipelines to process terabytes of data. Typically, a significant amoun…

    Submitted 3 July, 2023; originally announced July 2023.

    Report number: FGCS-D-23-00577R1

  9. arXiv:2306.04025  [pdf, other]

    cs.AI

    Designing explainable artificial intelligence with active inference: A framework for transparent introspection and decision-making

    Authors: Mahault Albarracin, Inês Hipólito, Safae Essafi Tremblay, Jason G. Fox, Gabriel René, Karl Friston, Maxwell J. D. Ramstead

    Abstract: This paper investigates the prospect of developing human-interpretable, explainable artificial intelligence (AI) systems based on active inference and the free energy principle. We first provide a brief overview of active inference, and in particular, of how it applies to the modeling of decision-making, introspection, as well as the generation of overt and covert actions. We then discuss how acti…

    Submitted 6 June, 2023; originally announced June 2023.

  10. arXiv:2302.03786  [pdf, other]

    cs.LG cond-mat.dis-nn

    Analyzing the Performance of Deep Encoder-Decoder Networks as Surrogates for a Diffusion Equation

    Authors: J. Quetzalcoatl Toledo-Marin, James A. Glazier, Geoffrey Fox

    Abstract: Neural networks (NNs) have proven to be a viable alternative to traditional direct numerical algorithms, with the potential to accelerate computational time by several orders of magnitude. In the present paper we study the use of encoder-decoder convolutional neural network (CNN) as surrogates for steady-state diffusion solvers. The construction of such surrogates requires the selection of an appr…

    Submitted 7 February, 2023; originally announced February 2023.

    Comments: 21 pages, 17 figures, 8 tables

  11. arXiv:2301.07896  [pdf, other]

    cs.DC cs.DB

    Supercharging Distributed Computing Environments For High Performance Data Engineering

    Authors: Niranda Perera, Kaiying Shan, Supun Kamburugamuwe, Thejaka Amila Kanewela, Chathura Widanage, Arup Sarker, Mills Staylor, Tianle Zhong, Vibhatha Abeykoon, Geoffrey Fox

    Abstract: The data engineering and data science community has embraced the idea of using Python & R dataframes for regular applications. Driven by the big data revolution and artificial intelligence, these applications are now essential in order to process terabytes of data. They can easily exceed the capabilities of a single machine, but also demand significant developer time & effort. Therefore it is esse…

    Submitted 19 January, 2023; originally announced January 2023.

  12. arXiv:2212.13732  [pdf, ps, other]

    cs.DC

    Hybrid Cloud and HPC Approach to High-Performance Dataframes

    Authors: Kaiying Shan, Niranda Perera, Damitha Lenadora, Tianle Zhong, Arup Sarker, Supun Kamburugamuve, Thejaka Amila Kanewela, Chathura Widanage, Geoffrey Fox

    Abstract: Data pre-processing is a fundamental component in any data-driven application. With the increasing complexity of data processing operations and volume of data, Cylon, a distributed dataframe system, is developed to facilitate data processing both as a standalone application and as a library, especially for Python applications. While Cylon shows promising performance results, we experienced difficu…

    Submitted 29 December, 2022; v1 submitted 28 December, 2022; originally announced December 2022.

  13. arXiv:2212.01354  [pdf, other]

    cs.AI cs.MA nlin.AO

    Designing Ecosystems of Intelligence from First Principles

    Authors: Karl J Friston, Maxwell J D Ramstead, Alex B Kiefer, Alexander Tschantz, Christopher L Buckley, Mahault Albarracin, Riddhi J Pitliya, Conor Heins, Brennan Klein, Beren Millidge, Dalton A R Sakthivadivel, Toby St Clere Smithe, Magnus Koudahl, Safae Essafi Tremblay, Capm Petersen, Kaiser Fung, Jason G Fox, Steven Swanson, Dan Mapes, Gabriel René

    Abstract: This white paper lays out a vision of research and development in the field of artificial intelligence for the next decade (and beyond). Its denouement is a cyber-physical ecosystem of natural and synthetic sense-making, in which humans are integral participants -- what we call ''shared intelligence''. This vision is premised on active inference, a formulation of adaptive behavior that can be read…

    Submitted 11 January, 2024; v1 submitted 2 December, 2022; originally announced December 2022.

    Comments: 23+18 pages, one figure, one six page appendix

    Journal ref: Collective Intelligence, 3(1), 2024

  14. An Implicit Parametric Morphable Dental Model

    Authors: Congyi Zhang, Mohamed Elgharib, Gereon Fox, Min Gu, Christian Theobalt, Wenping Wang

    Abstract: 3D Morphable models of the human body capture variations among subjects and are useful in reconstruction and editing applications. Current dental models use an explicit mesh scene representation and model only the teeth, ignoring the gum. In this work, we present the first parametric 3D morphable dental model for both teeth and gum. Our model uses an implicit scene representation and is learned fr…

    Submitted 21 November, 2022; originally announced November 2022.

  15. arXiv:2210.16941  [pdf, other]

    cs.DC

    Hybrid Reusable Computational Analytics Workflow Management with Cloudmesh

    Authors: Gregor von Laszewski, J. P. Fleischer, Geoffrey C. Fox

    Abstract: In this paper, we summarize our effort to create and utilize a simple framework to coordinate computational analytics tasks with the help of a workflow system. Our design is based on a minimalistic approach while at the same time allowing to access computational resources offered through the owner's computer, HPC computing centers, cloud resources, and distributed systems in general. The access to…

    Submitted 30 October, 2022; originally announced October 2022.

    Comments: 12 pages, 3 appendices, 23 figures, 4 tables

  16. arXiv:2210.08973  [pdf, ps, other]

    cs.CY cs.HC cs.LG hep-ex

    FAIR for AI: An interdisciplinary and international community building perspective

    Authors: E. A. Huerta, Ben Blaiszik, L. Catherine Brinson, Kristofer E. Bouchard, Daniel Diaz, Caterina Doglioni, Javier M. Duarte, Murali Emani, Ian Foster, Geoffrey Fox, Philip Harris, Lukas Heinrich, Shantenu Jha, Daniel S. Katz, Volodymyr Kindratenko, Christine R. Kirkpatrick, Kati Lassila-Perini, Ravi K. Madduri, Mark S. Neubauer, Fotis E. Psomopoulos, Avik Roy, Oliver Rübel, Zhizhen Zhao, Ruike Zhu

    Abstract: A foundational set of findable, accessible, interoperable, and reusable (FAIR) principles were proposed in 2016 as prerequisites for proper data management and stewardship, with the goal of enabling the reusability of scholarly data. The principles were also meant to apply to other digital assets, at a high level, and over time, the FAIR guiding principles have been re-interpreted or extended to i…

    Submitted 1 August, 2023; v1 submitted 30 September, 2022; originally announced October 2022.

    Comments: 10 pages, comments welcome!; v2: 12 pages, accepted to Scientific Data

    ACM Class: I.2.0; E.0

    Journal ref: Scientific Data 10, 487 (2023)

  17. High Performance Dataframes from Parallel Processing Patterns

    Authors: Niranda Perera, Supun Kamburugamuve, Chathura Widanage, Vibhatha Abeykoon, Ahmet Uyar, Kaiying Shan, Hasara Maithree, Damitha Lenadora, Thejaka Amila Kanewala, Geoffrey Fox

    Abstract: The data science community today has embraced the concept of Dataframes as the de facto standard for data representation and manipulation. Ease of use, massive operator coverage, and popularization of R and Python languages have heavily influenced this transformation. However, most widely used serial Dataframes today (R, pandas) experience performance limitations even while working on even moderat…

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: Will be presented in PPAM 2022

  18. arXiv:2201.06717  [pdf, other]

    cs.LG cs.AI

    GTrans: Spatiotemporal Autoregressive Transformer with Graph Embeddings for Nowcasting Extreme Events

    Authors: Bo Feng, Geoffrey Fox

    Abstract: Spatiotemporal time series nowcasting should preserve temporal and spatial dynamics in the sense that generated new sequences from models respect the covariance relationship from history. Conventional feature extractors are built with deep convolutional neural networks (CNN). However, CNN models have limits to image-like applications where data can be formed with high-dimensional arrays. In contra…

    Submitted 17 January, 2022; originally announced January 2022.

  19. arXiv:2201.01869  [pdf]

    physics.geo-ph cs.LG

    Earthquake Nowcasting with Deep Learning

    Authors: Geoffrey Fox, John Rundle, Andrea Donnellan, Bo Feng

    Abstract: We review previous approaches to nowcasting earthquakes and introduce new approaches based on deep learning using three distinct models based on recurrent neural networks and transformers. We discuss different choices for observables and measures presenting promising initial results for a region of Southern California from 1950-2020. Earthquake activity is predicted as a function of 0.1-degree spa…

    Submitted 18 December, 2021; originally announced January 2022.

  20. arXiv:2110.12773  [pdf]

    cs.LG physics.comp-ph

    Scientific Machine Learning Benchmarks

    Authors: Jeyan Thiyagalingam, Mallikarjun Shankar, Geoffrey Fox, Tony Hey

    Abstract: The breakthrough in Deep Learning neural networks has transformed the use of AI and machine learning technologies for the analysis of very large experimental datasets. These datasets are typically generated by large-scale experimental facilities at national laboratories. In the context of science, scientific machine learning focuses on training machines to identify patterns, trends, and anomalies…

    Submitted 25 October, 2021; originally announced October 2021.

    ACM Class: I.2

  21. arXiv:2110.11466  [pdf, other]

    cs.LG cs.DC

    MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems

    Authors: Steven Farrell, Murali Emani, Jacob Balma, Lukas Drescher, Aleksandr Drozd, Andreas Fink, Geoffrey Fox, David Kanter, Thorsten Kurth, Peter Mattson, Dawei Mu, Amit Ruhela, Kento Sato, Koichi Shirahata, Tsuguchika Tabaru, Aristeidis Tsaris, Jan Balewski, Ben Cumming, Takumi Danjo, Jens Domke, Takaaki Fukai, Naoto Fukumoto, Tatsuya Fukushi, Balazs Gerofi, Takumi Honda , et al. (18 additional authors not shown)

    Abstract: Scientific communities are increasingly adopting machine learning and deep learning models in their applications to accelerate scientific insights. High performance computing systems are pushing the frontiers of performance with a rich diversity of hardware resources and massive scale-out capabilities. There is a critical need to understand fair and effective benchmarking of machine learning appli…

    Submitted 26 October, 2021; v1 submitted 21 October, 2021; originally announced October 2021.

  22. arXiv:2108.06001  [pdf, other]

    cs.DC cs.AI

    HPTMT Parallel Operators for High Performance Data Science & Data Engineering

    Authors: Vibhatha Abeykoon, Supun Kamburugamuve, Chathura Widanage, Niranda Perera, Ahmet Uyar, Thejaka Amila Kanewala, Gregor von Laszewski, Geoffrey Fox

    Abstract: Data-intensive applications are becoming commonplace in all science disciplines. They are comprised of a rich set of sub-domains such as data engineering, deep learning, and machine learning. These applications are built around efficient data abstractions and operators that suit the applications of different domains. Often lack of a clear definition of data structures and operators in the field ha…

    Submitted 12 August, 2021; originally announced August 2021.

  23. arXiv:2107.12807  [pdf, other]

    cs.DC cs.AI

    HPTMT: Operator-Based Architecture for Scalable High-Performance Data-Intensive Frameworks

    Authors: Supun Kamburugamuve, Chathura Widanage, Niranda Perera, Vibhatha Abeykoon, Ahmet Uyar, Thejaka Amila Kanewala, Gregor von Laszewski, Geoffrey Fox

    Abstract: Data-intensive applications impact many domains, and their steadily increasing size and complexity demands high-performance, highly usable environments. We integrate a set of ideas developed in various data science and data engineering frameworks. They employ a set of operators on specific data abstractions that include vectors, matrices, tensors, graphs, and tables. Our key concepts are inspired…

    Submitted 29 July, 2021; v1 submitted 27 July, 2021; originally announced July 2021.

  24. arXiv:2107.07224  [pdf, other]

    cs.CV

    StyleVideoGAN: A Temporal Generative Model using a Pretrained StyleGAN

    Authors: Gereon Fox, Ayush Tewari, Mohamed Elgharib, Christian Theobalt

    Abstract: Generative adversarial models (GANs) continue to produce advances in terms of the visual quality of still images, as well as the learning of temporal correlations. However, few works manage to combine these two interesting capabilities for the synthesis of video content: Most methods require an extensive training dataset to learn temporal correlations, while being rather limited in the resolution…

    Submitted 30 November, 2021; v1 submitted 15 July, 2021; originally announced July 2021.

    Comments: Final draft

  25. arXiv:2104.09014  [pdf]

    cs.AI cs.DC

    Multidimensional Scaling for Gene Sequence Data with Autoencoders

    Authors: Pulasthi Wickramasinghe, Geoffrey Fox

    Abstract: Multidimensional scaling of gene sequence data has long played a vital role in analysing gene sequence data to identify clusters and patterns. However the computation complexities and memory requirements of state-of-the-art dimensional scaling algorithms make it infeasible to scale to large datasets. In this paper we present an autoencoder-based dimensional reduction model which can easily scale t…

    Submitted 18 April, 2021; originally announced April 2021.

  26. arXiv:2102.05527  [pdf, other]

    cond-mat.soft cond-mat.dis-nn cs.LG

    Deep learning approaches to surrogates for solving the diffusion equation for mechanistic real-world simulations

    Authors: J. Quetzalcóatl Toledo-Marín, Geoffrey Fox, James P. Sluka, James A. Glazier

    Abstract: In many mechanistic medical, biological, physical and engineered spatiotemporal dynamic models the numerical solution of partial differential equations (PDEs) can make simulations impractically slow. Biological models require the simultaneous calculation of the spatial variation of concentration of dozens of diffusing chemical species. Machine learning surrogates, neural networks trained to provid…

    Submitted 10 February, 2021; originally announced February 2021.

    Comments: 17 pp, 2 tables, 11 figs, 1sm, 12sm-figs, code available at GitHub

  27. arXiv:2012.14336  [pdf, other]

    physics.geo-ph cs.CV cs.LG

    Spatiotemporal Pattern Mining for Nowcasting Extreme Earthquakes in Southern California

    Authors: Bo Feng, Geoffrey C. Fox

    Abstract: Geoscience and seismology have utilized the most advanced technologies and equipment to monitor seismic events globally from the past few decades. With the enormous amount of data, modern GPU-powered deep learning presents a promising approach to analyze data and discover patterns. In recent years, there are plenty of successful deep learning models for picking seismic waves. However, forecasting…

    Submitted 11 September, 2021; v1 submitted 20 December, 2020; originally announced December 2020.

    Journal ref: IEEE eScience 2021

  28. arXiv:2010.14596  [pdf, other]

    cs.DC cs.IR

    A Fast, Scalable, Universal Approach For Distributed Data Aggregations

    Authors: Niranda Perera, Vibhatha Abeykoon, Chathura Widanage, Supun Kamburugamuve, Thejaka Amila Kanewala, Pulasthi Wickramasinghe, Ahmet Uyar, Hasara Maithree, Damitha Lenadora, Geoffrey Fox

    Abstract: In the current era of Big Data, data engineering has transformed into an essential field of study across many branches of science. Advancements in Artificial Intelligence (AI) have broadened the scope of data engineering and opened up new applications in both enterprise and research communities. Aggregations (also termed reduce in functional programming) are an integral functionality in these appl…

    Submitted 14 December, 2020; v1 submitted 27 October, 2020; originally announced October 2020.

  29. arXiv:2010.11796  [pdf, other]

    cs.CR cs.AI

    CryptoGRU: Low Latency Privacy-Preserving Text Analysis With GRU

    Authors: Bo Feng, Qian Lou, Lei Jiang, Geoffrey C. Fox

    Abstract: Billions of text analysis requests containing private emails, personal text messages, and sensitive online reviews, are processed by recurrent neural networks (RNNs) deployed on public clouds every day. Although prior secure networks combine homomorphic encryption (HE) and garbled circuit (GC) to preserve users' privacy, naively adopting the HE and GC hybrid technique to implement RNNs suffers fro…

    Submitted 9 September, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

    Journal ref: The 2021 Conference on Empirical Methods in Natural Language Processing

  30. arXiv:2010.06312  [pdf, other]

    cs.DC cs.CY cs.PF cs.SE

    Data Engineering for HPC with Python

    Authors: Vibhatha Abeykoon, Niranda Perera, Chathura Widanage, Supun Kamburugamuve, Thejaka Amila Kanewala, Hasara Maithree, Pulasthi Wickramasinghe, Ahmet Uyar, Geoffrey Fox

    Abstract: Data engineering is becoming an increasingly important part of scientific discoveries with the adoption of deep learning and machine learning. Data engineering deals with a variety of data formats, storage, data extraction, transformation, and data movements. One goal of data engineering is to transform data from original data to vector/matrix/tensor formats accepted by deep learning and machine l…

    Submitted 13 October, 2020; originally announced October 2020.

    Comments: 9 pages, 11 images, Accepted in 9th Workshop on Python for High-Performance and Scientific Computing (In conjunction with Supercomputing 20)

  31. arXiv:2010.03757  [pdf, other]

    cs.LG stat.ML

    AICov: An Integrative Deep Learning Framework for COVID-19 Forecasting with Population Covariates

    Authors: Geoffrey C. Fox, Gregor von Laszewski, Fugang Wang, Saumyadipta Pyne

    Abstract: The COVID-19 pandemic has profound global consequences on health, economic, social, political, and almost every major aspect of human life. Therefore, it is of great importance to model COVID-19 and other pandemics in terms of the broader social contexts in which they take place. We present the architecture of AICov, which provides an integrative deep learning framework for COVID-19 forecasting wi…

    Submitted 8 October, 2020; originally announced October 2020.

    Comments: 25 pages, 4 tables, 19 figures

  32. arXiv:2010.03712  [pdf, other]

    cs.CV

    Deep Tiered Image Segmentation For Detecting Internal Ice Layers in Radar Imagery

    Authors: Yuchen Wang, Mingze Xu, John Paden, Lora Koenig, Geoffrey Fox, David Crandall

    Abstract: Understanding the structure of Earth's polar ice sheets is important for modeling how global warming will impact polar ice and, in turn, the Earth's climate. Ground-penetrating radar is able to collect observations of the internal structure of snow and ice, but the process of manually labeling these observations is slow and laborious. Recent work has developed automatic techniques for finding the…

    Submitted 6 April, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: ICME version

  33. arXiv:2007.09589  [pdf, other]

    cs.DC cs.DB

    High Performance Data Engineering Everywhere

    Authors: Chathura Widanage, Niranda Perera, Vibhatha Abeykoon, Supun Kamburugamuve, Thejaka Amila Kanewala, Hasara Maithree, Pulasthi Wickramasinghe, Ahmet Uyar, Gurhan Gunduz, Geoffrey Fox

    Abstract: The amazing advances being made in the fields of machine and deep learning are a highlight of the Big Data era for both enterprise and research communities. Modern applications require resources beyond a single node's ability to provide. However this is just a small part of the issues facing the overall data processing environment, which must also support a raft of data engineering for pre- and po…

    Submitted 19 July, 2020; originally announced July 2020.

  34. arXiv:2005.10360  [pdf, other]

    cs.CV

    VideoForensicsHQ: Detecting High-quality Manipulated Face Videos

    Authors: Gereon Fox, Wentao Liu, Hyeongwoo Kim, Hans-Peter Seidel, Mohamed Elgharib, Christian Theobalt

    Abstract: There are concerns that new approaches to the synthesis of high quality face videos may be misused to manipulate videos with malicious intent. The research community therefore developed methods for the detection of modified footage and assembled benchmark datasets for this task. In this paper, we examine how the performance of forgery detectors depends on the presence of artefacts that the human e…

    Submitted 2 June, 2021; v1 submitted 20 May, 2020; originally announced May 2020.

    Comments: ICME 2021 camera-ready

  35. arXiv:2004.06493  [pdf, other]

    physics.comp-ph cond-mat.soft cs.LG stat.ML

    Solving Newton's Equations of Motion with Large Timesteps using Recurrent Neural Networks based Operators

    Authors: JCS Kadupitiya, Geoffrey C. Fox, Vikram Jadhao

    Abstract: Classical molecular dynamics simulations are based on solving Newton's equations of motion. Using a small timestep, numerical integrators such as Verlet generate trajectories of particles as solutions to Newton's equations. We introduce operators derived using recurrent neural networks that accurately solve Newton's equations utilizing sequences of past trajectory data, and produce energy-conservi…

    Submitted 13 December, 2021; v1 submitted 12 April, 2020; originally announced April 2020.

    Comments: 15 pages, 12 figures; updated content

  36. arXiv:1911.07101  [pdf, other]

    cs.LG stat.ML

    Glyph: Fast and Accurately Training Deep Neural Networks on Encrypted Data

    Authors: Qian Lou, Bo Feng, Geoffrey C. Fox, Lei Jiang

    Abstract: Big data is one of the cornerstones to enabling and training deep neural networks (DNNs). Because of the lack of expertise, to gain benefits from their data, average users have to rely on and upload their private data to big data companies they may not trust. Due to the compliance, legal, or privacy constraints, most users are willing to contribute only their encrypted data, and lack interests or…

    Submitted 21 October, 2020; v1 submitted 16 November, 2019; originally announced November 2019.

    Comments: 10 pages, 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada

  37. arXiv:1911.05878  [pdf, other]

    eess.IV cs.CV cs.LG

    Scientific Image Restoration Anywhere

    Authors: Vibhatha Abeykoon, Zhengchun Liu, Rajkumar Kettimuthu, Geoffrey Fox, Ian Foster

    Abstract: The use of deep learning models within scientific experimental facilities frequently requires low-latency inference, so that, for example, quality control operations can be performed while data are being collected. Edge computing devices can be useful in this context, as their low cost and compact form factor permit them to be co-located with the experimental apparatus. Can such devices, with thei…

    Submitted 12 November, 2019; originally announced November 2019.

    Comments: 6 pages, 8 figures, 1 table

  38. arXiv:1909.13340  [pdf]

    cs.LG stat.ML

    Learning Everywhere: A Taxonomy for the Integration of Machine Learning and Simulations

    Authors: Geoffrey Fox, Shantenu Jha

    Abstract: We present a taxonomy of research on Machine Learning (ML) applied to enhance simulations together with a catalog of some activities. We cover eight patterns for the link of ML to the simulations or systems plus three algorithmic areas: particle dynamics, agent-based models and partial differential equations. The patterns are further divided into three action areas: Improving simulation with Confi…

    Submitted 13 October, 2019; v1 submitted 29 September, 2019; originally announced September 2019.

    Comments: 15th International Conference eScience 2019, September 24-27, 2019, San Diego, California

  39. arXiv:1909.02363  [pdf, ps, other]

    cs.LG cs.DC physics.comp-ph stat.ML

    Understanding ML driven HPC: Applications and Infrastructure

    Authors: Geoffrey Fox, Shantenu Jha

    Abstract: We recently outlined the vision of "Learning Everywhere" which captures the possibility and impact of how learning methods and traditional HPC methods can be coupled together. A primary driver of such coupling is the promise that Machine Learning (ML) will give major performance improvements for traditional HPC simulations. Motivated by this potential, the ML around HPC class of integration is of…

    Submitted 5 September, 2019; originally announced September 2019.

    Comments: Invited talk to "Visionary Track" at IEEE eScience 2019. arXiv admin note: text overlap with arXiv:1806.04731 by other authors

  40. arXiv:1907.00097  [pdf, other]

    cs.DC q-bio.QM

    Parallel Performance of Molecular Dynamics Trajectory Analysis

    Authors: Mahzad Khoshlessan, Ioannis Paraskevakos, Geoffrey C. Fox, Shantenu Jha, Oliver Beckstein

    Abstract: The performance of biomolecular molecular dynamics simulations has steadily increased on modern high performance computing resources but acceleration of the analysis of the output trajectories has lagged behind so that analyzing simulations is becoming a bottleneck. To close this gap, we studied the performance of parallel trajectory analysis with MPI and the Python MDAnalysis library on three dif…

    Submitted 27 March, 2020; v1 submitted 28 June, 2019; originally announced July 2019.

    Comments: accepted manuscript, to appear in 'Concurrency and Computation: Practice and Experience'

    ACM Class: D.1.3; J.2

  41. arXiv:1905.01219  [pdf, other]

    cs.DC cs.LG

    Performance Optimization on Model Synchronization in Parallel Stochastic Gradient Descent Based SVM

    Authors: Vibhatha Abeykoon, Geoffrey Fox, Minje Kim

    Abstract: Understanding the bottlenecks in implementing a stochastic gradient descent (SGD)-based distributed support vector machine (SVM) algorithm is important for training on larger data sets. The communication time needed for model synchronization across the parallel processes is the main bottleneck that causes inefficiency in the training process. The model synchronization is directly affected by the mini-ba…

    Submitted 3 May, 2019; originally announced May 2019.

    Comments: Paper Accepted in HPML 2019 Held in conjunction with IEEE/ACM CCGRID 2019

  42. arXiv:1902.10810  [pdf, ps, other]

    cs.DC physics.comp-ph

    Learning Everywhere: Pervasive Machine Learning for Effective High-Performance Computation

    Authors: Geoffrey Fox, James A. Glazier, JCS Kadupitiya, Vikram Jadhao, Minje Kim, Judy Qiu, James P. Sluka, Endre Somogyi, Madhav Marathe, Abhijin Adiga, Jiangzhuo Chen, Oliver Beckstein, Shantenu Jha

    Abstract: The convergence of HPC and data-intensive methodologies provides a promising approach to major performance improvements. This paper provides a general description of the interaction between traditional HPC and ML approaches and motivates the Learning Everywhere paradigm for HPC. We introduce the concept of effective performance that one can achieve by combining learning methodologies with simulatio…

    Submitted 27 February, 2019; originally announced February 2019.

  43. arXiv:1801.07630  [pdf, other]

    cs.DC

    Task-parallel Analysis of Molecular Dynamics Trajectories

    Authors: Ioannis Paraskevakos, Andre Luckow, Mahzad Khoshlessan, George Chantzialexiou, Thomas E. Cheatham, Oliver Beckstein, Geoffrey C. Fox, Shantenu Jha

    Abstract: Different parallel frameworks for implementing data analysis applications have been proposed by the HPC and Big Data communities. In this paper, we investigate three task-parallel frameworks: Spark, Dask and RADICAL-Pilot with respect to their ability to support data analytics on HPC resources and compare them with MPI. We investigate the data analysis requirements of Molecular Dynamics (MD) simul…

    Submitted 10 June, 2018; v1 submitted 23 January, 2018; originally announced January 2018.

  44. arXiv:1801.03986  [pdf, other]

    cs.CV

    Multi-Task Spatiotemporal Neural Networks for Structured Surface Reconstruction

    Authors: Mingze Xu, Chenyou Fan, John D Paden, Geoffrey C Fox, David J Crandall

    Abstract: Deep learning methods have surpassed the performance of traditional techniques on a wide range of problems in computer vision, but nearly all of this work has studied consumer photos, where precisely correct output is often not critical. It is less clear how well these techniques may apply on structured prediction problems where fine-grained output with high precision is required, such as in scien…

    Submitted 20 July, 2018; v1 submitted 11 January, 2018; originally announced January 2018.

    Comments: 10 pages, 7 figures, published in WACV 2018

  45. arXiv:1712.07758  [pdf, ps, other]

    cs.CV

    Automatic Estimation of Ice Bottom Surfaces from Radar Imagery

    Authors: Mingze Xu, David J Crandall, Geoffrey C Fox, John D Paden

    Abstract: Ground-penetrating radar on planes and satellites now makes it practical to collect 3D observations of the subsurface structure of the polar ice sheets, providing crucial data for understanding and tracking global climate change. But converting these noisy readings into useful observations is generally done by hand, which is impractical at a continental scale. In this paper, we propose a computer…

    Submitted 20 December, 2017; originally announced December 2017.

    Comments: 5 pages, 3 figures, published in ICIP 2017

  46. Status of Serverless Computing and Function-as-a-Service(FaaS) in Industry and Research

    Authors: Geoffrey C. Fox, Vatche Ishakian, Vinod Muthusamy, Aleksander Slominski

    Abstract: This whitepaper summarizes issues raised during the First International Workshop on Serverless Computing (WoSC) 2017, held June 5th 2017, especially in the panel and associated discussion that concluded the workshop. We also include comments from the keynote and submitted papers. A glossary at the end (section 8) defines many technical terms used in this report.

    Submitted 26 August, 2017; originally announced August 2017.

    Comments: Technical Report

  47. arXiv:1403.1528  [pdf, other]

    cs.DC

    A Tale of Two Data-Intensive Paradigms: Applications, Abstractions, and Architectures

    Authors: Shantenu Jha, Judy Qiu, Andre Luckow, Pradeep Mantha, Geoffrey C. Fox

    Abstract: Scientific problems that depend on processing large amounts of data require overcoming challenges in multiple areas: managing large-scale data distribution, co-placement and scheduling of data with compute resources, and storing and transferring large volumes of data. We analyze the ecosystems of the two prominent paradigms for data-intensive applications, hereafter referred to as the high-perform…

    Submitted 22 June, 2014; v1 submitted 6 March, 2014; originally announced March 2014.

    Comments: 8 pages, 2 figures