-
Mapping Housing Stock Characteristics from Drone Images for Climate Resilience in the Caribbean
Authors:
Isabelle Tingzon,
Nuala Margaret Cowan,
Pierre Chrzanowski
Abstract:
Comprehensive information on housing stock is crucial for climate adaptation initiatives aiming to reduce the adverse impacts of climate-extreme hazards in high-risk regions like the Caribbean. In this study, we propose a workflow for rapidly generating critical baseline housing stock data using very high-resolution drone images and deep learning techniques. Specifically, our work leverages the Segment Anything Model and convolutional neural networks for the automated generation of building footprints and roof classification maps. By strengthening local capacity within government agencies to leverage AI and Earth Observation-based solutions, this work seeks to improve the climate resilience of the housing sector in small island developing states in the Caribbean.
Submitted 15 December, 2023;
originally announced December 2023.
-
Fusing VHR Post-disaster Aerial Imagery and LiDAR Data for Roof Classification in the Caribbean
Authors:
Isabelle Tingzon,
Nuala Margaret Cowan,
Pierre Chrzanowski
Abstract:
Accurate and up-to-date information on building characteristics is essential for vulnerability assessment; however, the high costs and long timeframes associated with conducting traditional field surveys can be an obstacle to obtaining critical exposure datasets needed for disaster risk management. In this work, we leverage deep learning techniques for the automated classification of roof characteristics from very high-resolution orthophotos and airborne LiDAR data obtained in Dominica following Hurricane Maria in 2017. We demonstrate that the fusion of multimodal earth observation data performs better than using any single data source alone. Using our proposed methods, we achieve F1 scores of 0.93 and 0.92 for roof type and roof material classification, respectively. This work is intended to help governments produce more timely building information to improve resilience and disaster response in the Caribbean.
Submitted 9 October, 2023; v1 submitted 30 July, 2023;
originally announced July 2023.
-
GC3: An Optimizing Compiler for GPU Collective Communication
Authors:
Meghan Cowan,
Saeed Maleki,
Madanlal Musuvathi,
Olli Saarikivi,
Yifan Xiong
Abstract:
Machine learning models made up of millions or billions of parameters are trained and served on large multi-GPU systems. As models grow in size and execute on more GPUs, the collective communications used in these applications become a bottleneck. Custom collective algorithms optimized for both particular network topologies and application specific communication patterns can alleviate this bottleneck and help these applications scale. However, correctly and efficiently implementing custom algorithms is challenging.
This paper introduces GC3, a system for programmable GPU communication. GC3 provides a domain specific language for writing collective communication algorithms and an optimizing compiler for lowering them to an executable form, which can be executed efficiently and flexibly in an interpreter based runtime. We used GC3 to write novel collective algorithms for AllReduce and AllToAll that are up to $1.9\times$ and $1.3\times$ faster than hand-optimized implementations, respectively.
Submitted 19 July, 2022; v1 submitted 27 January, 2022;
originally announced January 2022.
-
TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches
Authors:
Aashaka Shah,
Vijay Chidambaram,
Meghan Cowan,
Saeed Maleki,
Madan Musuvathi,
Todd Mytkowicz,
Jacob Nelson,
Olli Saarikivi,
Rachee Singh
Abstract:
Machine learning models are increasingly being trained across multiple GPUs and servers. In this setting, data is transferred between GPUs using communication collectives such as AlltoAll and AllReduce, which can become a significant bottleneck in training large models. Thus, it is important to use efficient algorithms for collective communication. We develop TACCL, a tool that enables algorithm designers to guide a synthesizer into automatically generating algorithms for a given hardware configuration and communication collective. TACCL uses a novel communication sketch abstraction to get crucial information from the designer to significantly reduce the search space and guide the synthesizer towards better algorithms. TACCL also uses a novel encoding of the problem that allows it to scale beyond single-node topologies. We use TACCL to synthesize algorithms for three collectives and two hardware topologies: DGX-2 and NDv2. We demonstrate that the algorithms synthesized by TACCL outperform the Nvidia Collective Communication Library (NCCL) by up to 6.7x. We also show that TACCL can speed up end-to-end training of Transformer-XL and BERT models by 11%--2.3x for different batch sizes.
Submitted 5 October, 2022; v1 submitted 8 November, 2021;
originally announced November 2021.
-
Analysis and Mitigations of Reverse Engineering Attacks on Local Feature Descriptors
Authors:
Deeksha Dangwal,
Vincent T. Lee,
Hyo Jin Kim,
Tianwei Shen,
Meghan Cowan,
Rajvi Shah,
Caroline Trippel,
Brandon Reagen,
Timothy Sherwood,
Vasileios Balntas,
Armin Alaghi,
Eddy Ilg
Abstract:
As autonomous driving and augmented reality evolve, a practical concern is data privacy. In particular, these applications rely on localization based on user images. The widely adopted technology uses local feature descriptors, which are derived from the images, and it was long thought that they could not be reverted. However, recent work has demonstrated that under certain conditions reverse engineering attacks are possible and allow an adversary to reconstruct RGB images. This poses a potential risk to user privacy. We take this a step further and model potential adversaries using a privacy threat model. Subsequently, we show under controlled conditions a reverse engineering attack on sparse feature maps and analyze the vulnerability of popular descriptors including FREAK, SIFT and SOSNet. Finally, we evaluate potential mitigation techniques that select a subset of descriptors to carefully balance privacy reconstruction risk while preserving image matching accuracy; our results show that similar accuracy can be obtained when revealing less information.
Submitted 8 May, 2021;
originally announced May 2021.
-
SoK: Opportunities for Software-Hardware-Security Codesign for Next Generation Secure Computing
Authors:
Deeksha Dangwal,
Meghan Cowan,
Armin Alaghi,
Vincent T. Lee,
Brandon Reagen,
Caroline Trippel
Abstract:
Users are demanding increased data security. As a result, security is rapidly becoming a first-order design constraint in next generation computing systems. Researchers and practitioners are exploring various security technologies to meet user demand such as trusted execution environments (e.g., Intel SGX, ARM TrustZone), homomorphic encryption, and differential privacy. Each technique provides some degree of security, but differs with respect to threat coverage, performance overheads, as well as implementation and deployment challenges. In this paper, we present a systemization of knowledge (SoK) on these design considerations and trade-offs using several prominent security technologies. Our study exposes the need for software-hardware-security codesign to realize efficient and effective solutions for securing user data. In particular, we explore how design considerations across applications, hardware, and security mechanisms must be combined to overcome fundamental limitations in current technologies so that we can minimize performance overhead while achieving sufficient threat model coverage. Finally, we propose a set of guidelines to facilitate putting these secure computing technologies into practice.
Submitted 1 May, 2021;
originally announced May 2021.
-
Porcupine: A Synthesizing Compiler for Vectorized Homomorphic Encryption
Authors:
Meghan Cowan,
Deeksha Dangwal,
Armin Alaghi,
Caroline Trippel,
Vincent T. Lee,
Brandon Reagen
Abstract:
Homomorphic encryption (HE) is a privacy-preserving technique that enables computation directly on encrypted data. Despite its promise, HE has seen limited use due to performance overheads and compilation challenges. Recent work has made significant advances to address the performance overheads but automatic compilation of efficient HE kernels remains relatively unexplored.
This paper presents Porcupine, an optimizing compiler, and HE DSL named Quill to automatically generate HE code using program synthesis. HE poses three major compilation challenges: it only supports a limited set of SIMD-like operators, it uses long-vector operands, and decryption can fail if ciphertext noise growth is not managed properly. Quill captures the underlying HE operator behavior that enables Porcupine to reason about the complex trade-offs imposed by the challenges and generate optimized, verified HE kernels. To improve synthesis time, we propose a series of optimizations including a sketch design tailored to HE and instruction restriction to narrow the program search space. We evaluate Porcupine using a set of kernels and show speedups of up to 51% (11% geometric mean) compared to heuristic-driven hand-optimized kernels. Analysis of Porcupine's synthesized code reveals that optimal solutions are not always intuitive, underscoring the utility of automated reasoning in this domain.
Submitted 19 January, 2021;
originally announced January 2021.
-
Automating Generation of Low Precision Deep Learning Operators
Authors:
Meghan Cowan,
Thierry Moreau,
Tianqi Chen,
Luis Ceze
Abstract:
State of the art deep learning models have made steady progress in the fields of computer vision and natural language processing, at the expense of growing model sizes and computational complexity. Deploying these models on low power and mobile devices poses a challenge due to their limited compute capabilities and strict energy budgets. One solution that has generated significant research interest is deploying highly quantized models that operate on low precision inputs and weights less than eight bits, trading off accuracy for performance. These models have a significantly reduced memory footprint (up to 32x reduction) and can replace multiply-accumulates with bitwise operations during compute intensive convolution and fully connected layers.
Most deep learning frameworks rely on highly engineered linear algebra libraries such as ATLAS or Intel's MKL to implement efficient deep learning operators. To date, none of the popular deep learning frameworks directly support low precision operators, partly due to a lack of optimized low precision libraries. In this paper we introduce a workflow to quickly generate high performance low precision deep learning operators for arbitrary precision that target multiple CPU architectures and include optimizations such as memory tiling and vectorization. We present an extensive case study on a low power ARM Cortex-A53 CPU, and show how we can generate 1-bit and 2-bit convolutions with speedups up to 16x over an optimized 16-bit integer baseline and 2.3x better than handwritten implementations.
Submitted 25 October, 2018;
originally announced October 2018.
-
Spark-MPI: Approaching the Fifth Paradigm of Cognitive Applications
Authors:
Nikolay Malitsky,
Ralph Castain,
Matt Cowan
Abstract:
Over the past decade, the fourth paradigm of data-intensive science rapidly became a major driving concept of multiple application domains encompassing and generating large-scale devices such as light sources and cutting edge telescopes. The success of data-intensive projects subsequently triggered the next generation of machine learning approaches. These new artificial intelligence systems clearly represent a paradigm shift from data processing pipelines towards the fifth paradigm of composite cognitive applications requiring the integration of Big Data processing platforms and HPC technologies. The paper addresses the existing impedance mismatch between data-intensive and compute-intensive ecosystems by presenting the Spark-MPI approach based on the MPI Exascale Process Management Interface (PMIx). The approach is demonstrated within the context of hybrid MPI/GPU ptychographic image reconstruction pipelines and distributed deep learning applications.
Submitted 15 May, 2018;
originally announced June 2018.
-
Building Near-Real-Time Processing Pipelines with the Spark-MPI Platform
Authors:
Nikolay Malitsky,
Aashish Chaudhary,
Sebastien Jourdain,
Matt Cowan,
Patrick O'Leary,
Marcus Hanwell,
Kerstin Kleese Van Dam
Abstract:
Advances in detectors and computational technologies provide new opportunities for applied research and the fundamental sciences. Concurrently, dramatic increases in the three Vs (Volume, Velocity, and Variety) of experimental data and the scale of computational tasks produced the demand for new real-time processing systems at experimental facilities. Recently, this demand was addressed by the Spark-MPI approach connecting the Spark data-intensive platform with the MPI high-performance framework. In contrast with existing data management and analytics systems, Spark introduced a new middleware based on resilient distributed datasets (RDDs), which decoupled various data sources from high-level processing algorithms. The RDD middleware significantly advanced the scope of data-intensive applications, spreading from SQL queries to machine learning to graph processing. Spark-MPI further extended the Spark ecosystem with the MPI applications using the Process Management Interface. The paper explores this integrated platform within the context of online ptychographic and tomographic reconstruction pipelines.
Submitted 13 May, 2018;
originally announced May 2018.
-
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
Authors:
Tianqi Chen,
Thierry Moreau,
Ziheng Jiang,
Lianmin Zheng,
Eddie Yan,
Meghan Cowan,
Haichen Shen,
Leyuan Wang,
Yuwei Hu,
Luis Ceze,
Carlos Guestrin,
Arvind Krishnamurthy
Abstract:
There is an increasing need to bring machine learning to a wide diversity of hardware devices. Current frameworks rely on vendor-specific operator libraries and optimize for a narrow range of server-class GPUs. Deploying workloads to new platforms -- such as mobile phones, embedded devices, and accelerators (e.g., FPGAs, ASICs) -- requires significant manual effort. We propose TVM, a compiler that exposes graph-level and operator-level optimizations to provide performance portability to deep learning workloads across diverse hardware back-ends. TVM solves optimization challenges specific to deep learning, such as high-level operator fusion, mapping to arbitrary hardware primitives, and memory latency hiding. It also automates optimization of low-level programs to hardware characteristics by employing a novel, learning-based cost modeling method for rapid exploration of code optimizations. Experimental results show that TVM delivers performance across hardware back-ends that are competitive with state-of-the-art, hand-tuned libraries for low-power CPU, mobile GPU, and server-class GPUs. We also demonstrate TVM's ability to target new accelerator back-ends, such as the FPGA-based generic deep learning accelerator. The system is open sourced and in production use inside several major companies.
Submitted 5 October, 2018; v1 submitted 12 February, 2018;
originally announced February 2018.
-
Exploring Computation-Communication Tradeoffs in Camera Systems
Authors:
Amrita Mazumdar,
Thierry Moreau,
Sung Kim,
Meghan Cowan,
Armin Alaghi,
Luis Ceze,
Mark Oskin,
Visvesh Sathe
Abstract:
Cameras are the de facto sensor. The growing demand for real-time and low-power computer vision, coupled with trends towards high-efficiency heterogeneous systems, has given rise to a wide range of image processing acceleration techniques at the camera node and in the cloud. In this paper, we characterize two novel camera systems that use acceleration techniques to push the extremes of energy and performance scaling, and explore the computation-communication tradeoffs in their design. The first case study targets a camera system designed to detect and authenticate individual faces, running solely on energy harvested from RFID readers. We design a multi-accelerator SoC operating in the sub-mW range, and evaluate it with real-world workloads to show performance and energy efficiency improvements over a general purpose microprocessor. The second camera system supports a 16-camera rig processing over 32 Gb/s of data to produce real-time 3D-360 degree virtual reality video. We design a multi-FPGA processing pipeline that outperforms CPU and GPU configurations by up to 10x in computation time, producing panoramic stereo video directly from the camera rig at 30 frames per second. We find that an early data reduction step, either before complex processing or offloading, is the most critical optimization for in-camera systems.
Submitted 16 October, 2017; v1 submitted 12 June, 2017;
originally announced June 2017.
-
Computationally efficient methods for modelling laser wakefield acceleration in the blowout regime
Authors:
B. M. Cowan,
S. Y. Kalmykov,
A. Beck,
X. Davoine,
K. Bunkers,
A. F. Lifschitz,
E. Lefebvre,
D. L. Bruhwiler,
B. A. Shadwick,
D. P. Umstadter
Abstract:
Electron self-injection and acceleration until dephasing in the blowout regime is studied for a set of initial conditions typical of recent experiments with 100 terawatt-class lasers. Two different approaches to computationally efficient, fully explicit, three-dimensional particle-in-cell modelling are examined. First, the Cartesian code VORPAL using a perfect-dispersion electromagnetic solver precisely describes the laser pulse and bubble dynamics, taking advantage of coarser resolution in the propagation direction, with a proportionally larger time step. Using third-order splines for macroparticles helps suppress the sampling noise while keeping the usage of computational resources modest. The second way to reduce the simulation load is using reduced-geometry codes. In our case, the quasi-cylindrical code CALDER-CIRC uses decomposition of fields and currents into a set of poloidal modes, while the macroparticles move in the Cartesian 3D space. Cylindrical symmetry of the interaction allows using just two modes, reducing the computational load to roughly that of a planar Cartesian simulation while preserving the 3D nature of the interaction. This significant economy of resources allows using fine resolution in the direction of propagation and a small time step, making numerical dispersion vanishingly small, together with a large number of particles per cell, enabling good particle statistics. Quantitative agreement of the two simulations indicates that they are free of numerical artefacts. Both approaches thus retrieve physically correct evolution of the plasma bubble, recovering the intrinsic connection of electron self-injection to the nonlinear optical evolution of the driver.
Submitted 3 April, 2012;
originally announced April 2012.
-
Three-dimensional dielectric photonic crystal structures for laser-driven acceleration
Authors:
Benjamin M. Cowan
Abstract:
We present the design and simulation of a three-dimensional photonic crystal waveguide for linear laser-driven acceleration in vacuum. The structure confines a synchronous speed-of-light accelerating mode in both transverse dimensions. We report the properties of this mode, including sustainable gradient and optical-to-beam efficiency. We present a novel method for confining a particle beam using optical fields as focusing elements. This technique, combined with careful structure design, is shown to have a large dynamic aperture and minimal emittance growth, even over millions of optical wavelengths.
Submitted 20 November, 2007;
originally announced November 2007.
-
Photonic crystal laser-driven accelerator structures
Authors:
Benjamin M. Cowan
Abstract:
Laser-driven acceleration holds great promise for significantly improving accelerating gradient. However, scaling the conventional process of structure-based acceleration in vacuum down to optical wavelengths requires a substantially different kind of structure. We require an optical waveguide that (1) is constructed out of dielectric materials, (2) has transverse size on the order of a wavelength, and (3) supports a mode with speed-of-light phase velocity in vacuum. Photonic crystals--structures whose electromagnetic properties are spatially periodic--can meet these requirements.
We discuss simulated photonic crystal accelerator structures and describe their properties. We begin with a class of two-dimensional structures which serves to illustrate the design considerations and trade-offs involved. We then present a three-dimensional structure, and describe its performance in terms of accelerating gradient and efficiency. We discuss particle beam dynamics in this structure, demonstrating a method for keeping a beam confined to the waveguide.
We also discuss material and fabrication considerations. Since accelerating gradient is limited by optical damage to the structure, the damage threshold of the dielectric is a critical parameter. We experimentally measure the damage threshold of silicon for picosecond pulses in the infrared, and determine that our structure is capable of sustaining an accelerating gradient of 300 MV/m at 1550 nm. Finally, we discuss possibilities for manufacturing these structures using common microfabrication techniques.
Submitted 23 August, 2007;
originally announced August 2007.
-
Mesoscopic phase statistics of diffuse ultrasound in dynamic matter
Authors:
M. L. Cowan,
D. Anache-Ménier,
W. K. Hildebrand,
J. H. Page,
B. A. van Tiggelen
Abstract:
Temporal fluctuations in the phase of waves transmitted through a dynamic, strongly scattering, mesoscopic sample are investigated using ultrasonic waves, and compared with theoretical predictions based on circular Gaussian statistics. The fundamental role of phase in Diffusing Acoustic Wave Spectroscopy is revealed, and phase statistics are also shown to provide a sensitive and accurate way to probe scatterer motions at both short and long time scales.
Submitted 1 June, 2007; v1 submitted 25 January, 2007;
originally announced January 2007.