-
Enhancing Multilingual Voice Toxicity Detection with Speech-Text Alignment
Authors:
Joseph Liu,
Mahesh Kumar Nandwana,
Janne Pylkkönen,
Hannes Heikinheimo,
Morgan McGuire
Abstract:
Toxicity classification for voice heavily relies on the semantic content of speech. We propose a novel framework that utilizes cross-modal learning to integrate the semantic embedding of text into a multilabel speech toxicity classifier during training. This enables us to incorporate textual information during training while still requiring only audio during inference. We evaluate this classifier on large-scale datasets with real-world characteristics to validate the effectiveness of this framework. Through ablation studies, we demonstrate that general-purpose semantic text embeddings are rich and aligned with speech for toxicity classification purposes. Conducting experiments across multiple languages at scale, we show improvements in voice toxicity classification across five languages and different toxicity categories.
Submitted 14 June, 2024;
originally announced June 2024.
-
Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation
Authors:
Nameer Hirschkind,
Xiao Yu,
Mahesh Kumar Nandwana,
Joseph Liu,
Eloi DuBois,
Dao Le,
Nicolas Thiebaut,
Colin Sinclair,
Kyle Spence,
Charles Shang,
Zoe Abrams,
Morgan McGuire
Abstract:
We introduce DiffuseST, a low-latency, direct speech-to-speech translation system capable of preserving the input speaker's voice zero-shot while translating from multiple source languages into English. We experiment with the synthesizer component of the architecture, comparing a Tacotron-based synthesizer to a novel diffusion-based synthesizer. We find the diffusion-based synthesizer to improve MOS and PESQ audio quality metrics by 23\% each and speaker similarity by 5\% while maintaining comparable BLEU scores. Despite having more than double the parameter count, the diffusion synthesizer has lower latency, allowing the entire model to run more than 5$\times$ faster than real-time.
Submitted 14 June, 2024;
originally announced June 2024.
-
Lessons from Deploying CropFollow++: Under-Canopy Agricultural Navigation with Keypoints
Authors:
Arun N. Sivakumar,
Mateus V. Gasparino,
Michael McGuire,
Vitor A. H. Higuti,
M. Ugur Akcal,
Girish Chowdhary
Abstract:
We present a vision-based navigation system for under-canopy agricultural robots using semantic keypoints. Autonomous under-canopy navigation is challenging due to the tight spacing between the crop rows ($\sim 0.75$ m), degradation in RTK-GPS accuracy due to multipath error, and noise in LiDAR measurements from the excessive clutter. Our system, CropFollow++, introduces a modular and interpretable perception architecture with a learned semantic keypoint representation. We deployed CropFollow++ on multiple under-canopy cover crop planting robots at large scale (25 km in total) in various field conditions, and we discuss the key lessons learned from this deployment.
Submitted 26 April, 2024;
originally announced April 2024.
-
AdaptNet: Policy Adaptation for Physics-Based Character Control
Authors:
Pei Xu,
Kaixiang Xie,
Sheldon Andrews,
Paul G. Kry,
Michael Neff,
Morgan McGuire,
Ioannis Karamouzas,
Victor Zordan
Abstract:
Motivated by humans' ability to adapt skills in the learning of new ones, this paper presents AdaptNet, an approach for modifying the latent space of existing policies to allow new behaviors to be quickly learned from like tasks in comparison to learning from scratch. Building on top of a given reinforcement learning controller, AdaptNet uses a two-tier hierarchy that augments the original state embedding to support modest changes in a behavior and further modifies the policy network layers to make more substantive changes. The technique is shown to be effective for adapting existing physics-based controllers to a wide range of new styles for locomotion, new task targets, changes in character morphology and extensive changes in environment. Furthermore, it exhibits significant increase in learning efficiency, as indicated by greatly reduced training times when compared to training from scratch or using other approaches that modify existing policies. Code is available at https://motion-lab.github.io/AdaptNet.
Submitted 14 November, 2023; v1 submitted 29 September, 2023;
originally announced October 2023.
-
Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models
Authors:
Liam Dugan,
Anshul Wadhawan,
Kyle Spence,
Chris Callison-Burch,
Morgan McGuire,
Victor Zordan
Abstract:
Recent work in speech-to-speech translation (S2ST) has focused primarily on offline settings, where the full input utterance is available before any output is given. This, however, is not reasonable in many real-world scenarios. In latency-sensitive applications, rather than waiting for the full utterance, translations should be spoken as soon as the information in the input is present. In this work, we introduce a system for simultaneous S2ST targeting real-world use cases. Our system supports translation from 57 languages to English with tunable parameters for dynamically adjusting the latency of the output -- including four policies for determining when to speak an output sequence. We show that these policies achieve offline-level accuracy with minimal increases in latency over a Greedy (wait-$k$) baseline. We open-source our evaluation code and interactive test script to aid future SimulS2ST research and application development.
Submitted 1 June, 2023;
originally announced June 2023.
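The Greedy (wait-$k$) baseline named in the abstract above is usually described as reading $k$ source tokens first and then alternating one write with one read. The following is a minimal sketch of that schedule, assuming token-level inputs; the function name `wait_k_schedule` and its parameters are hypothetical, and this is not the authors' released code.

```python
from typing import List, Tuple


def wait_k_schedule(num_source: int, num_target: int, k: int) -> List[Tuple[str, int]]:
    """Return the READ/WRITE action sequence of a generic wait-k policy.

    Hypothetical helper for illustration only: token counts are given up
    front, whereas a real simultaneous system discovers them as it runs.
    """
    actions: List[Tuple[str, int]] = []
    read = 0      # number of source tokens consumed so far
    written = 0   # number of target tokens emitted so far
    while written < num_target:
        # Stay k tokens ahead of the output until the source is exhausted.
        if read < min(written + k, num_source):
            actions.append(("READ", read))
            read += 1
        else:
            actions.append(("WRITE", written))
            written += 1
    return actions
```

A larger `k` delays the first write and so raises latency, which is the trade-off the tunable policies in the abstract are meant to adjust.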
-
Variable Bitrate Neural Fields
Authors:
Towaki Takikawa,
Alex Evans,
Jonathan Tremblay,
Thomas Müller,
Morgan McGuire,
Alec Jacobson,
Sanja Fidler
Abstract:
Neural approximations of scalar and vector fields, such as signed distance functions and radiance fields, have emerged as accurate, high-quality representations. State-of-the-art results are obtained by conditioning a neural approximation with a lookup from trainable feature grids that take on part of the learning task and allow for smaller, more efficient neural networks. Unfortunately, these feature grids usually come at the cost of significantly increased memory consumption compared to stand-alone neural network models. We present a dictionary method for compressing such feature grids, reducing their memory consumption by up to 100x and permitting a multiresolution representation which can be useful for out-of-core streaming. We formulate the dictionary optimization as a vector-quantized auto-decoder problem which lets us learn end-to-end discrete neural representations in a space where no direct supervision is available and with dynamic topology and structure. Our source code will be available at https://github.com/nv-tlabs/vqad.
Submitted 15 June, 2022;
originally announced June 2022.
-
Experimental Augmented Reality User Experience
Authors:
Josef Spjut,
Fengyuan Zhu,
Xiaolei Huang,
Yichen Shou,
Ben Boudaoud,
Omer Shapira,
Morgan McGuire
Abstract:
Augmented Reality (AR) is an emerging field ripe for experimentation, especially when it comes to developing the kinds of applications and experiences that will drive mass adoption of the technology. While we aren't aware of any current consumer products that realize a wearable, wide Field of View (FoV), AR Head Mounted Display (HMD), such devices will certainly come. In order for these sophisticated, likely high-cost hardware products to succeed, it is important they provide a high quality user experience. To that end, we prototyped 4 experimental applications for wide FoV displays that will likely exist in the future. Given current AR HMD limitations, we used an AR simulator built on web technology and VR headsets to demonstrate these applications, allowing users and designers to peer into the future.
Submitted 10 February, 2022;
originally announced February 2022.
-
FirstPersonScience: Quantifying Psychophysics for First Person Shooter Tasks
Authors:
Josef Spjut,
Ben Boudaoud,
Kamran Binaee,
Zander Majercik,
Morgan McGuire,
Joohwan Kim
Abstract:
In the emerging field of esports research, there is an increasing demand for quantitative results that can be used by players, coaches and analysts to make decisions and present meaningful commentary for spectators. We present FirstPersonScience, a software application intended to fill this need in the esports community by allowing scientists to design carefully controlled experiments and capture accurate results in the First Person Shooter esports genre. An experiment designer can control a variety of parameters including target motion, weapon configuration, 3D scene, frame rate, and latency. Furthermore, we validate this application through careful end-to-end latency analysis and provide a case study showing how it can be used to demonstrate the training effect of one user given repeated task performance.
Submitted 10 February, 2022;
originally announced February 2022.
-
Wavelet Transparency
Authors:
Maksim Aizenshtein,
Niklas Smal,
Morgan McGuire
Abstract:
Order-independent transparency schemes rely on low-order approximations of transmittance as a function of depth. We introduce a new wavelet representation of this function and an algorithm for building and evaluating it efficiently on a GPU. We then extend the order-independent Phenomenological Transparency algorithm to our representation and introduce a new phenomenological approximation of chromatic aberration under refraction. This generates comparable image quality to reference A-buffering for challenging cases such as smoke coverage, more realistic refraction, and comparable or better performance and bandwidth to the state-of-the-art Moment transparency with a simpler implementation.
Submitted 31 December, 2021;
originally announced January 2022.
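The abstract above refers to transmittance as a function of depth, the quantity that order-independent schemes approximate. As a hedged illustration of that reference function only (not the wavelet representation itself), the sketch below assumes each fragment is a hypothetical `(depth, alpha)` pair and multiplies the attenuation of every fragment in front of the query depth.

```python
from typing import Iterable, Tuple


def transmittance(fragments: Iterable[Tuple[float, float]], z: float) -> float:
    """Fraction of light surviving to depth z.

    Each fragment is an assumed (depth, alpha) pair. The product does not
    depend on the order in which fragments are visited, which is why an
    approximation of this function can be built without sorting.
    """
    t = 1.0
    for depth, alpha in fragments:
        if depth < z:
            t *= 1.0 - alpha
    return t
```

For example, two fragments with alpha 0.5 at depths 1 and 2 leave a transmittance of 0.25 at any depth beyond 2, regardless of submission order.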
-
Dynamic Diffuse Global Illumination Resampling
Authors:
Zander Majercik,
Thomas Müller,
Alexander Keller,
Derek Nowrouzezahrai,
Morgan McGuire
Abstract:
Interactive global illumination remains a challenge in radiometrically- and geometrically-complex scenes. Specialized sampling strategies are effective for specular and near-specular transport because the scattering has relatively low directional variance per scattering event. In contrast, the high variance from transport paths comprising multiple rough glossy or diffuse scattering events remains notoriously difficult to resolve with a small number of samples. We extend unidirectional path tracing to address this by combining screen-space reservoir resampling and sparse world-space probes, significantly improving sample efficiency for transport contributions that terminate on diffuse scattering events. Our experiments demonstrate a clear improvement -- at equal time and equal quality -- over purely path traced and purely probe-based baselines. Moreover, when combined with commodity denoisers, we are able to interactively render global illumination in complex scenes.
Submitted 11 August, 2021;
originally announced August 2021.
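Reservoir resampling, mentioned in the abstract above, builds on streaming weighted reservoir sampling: candidates arrive one at a time and a single survivor is kept with probability proportional to its weight. The sketch below shows that generic building block only; the class name `Reservoir` and its fields are hypothetical, and it does not reproduce the paper's screen-space or probe logic.

```python
import random
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Reservoir:
    """Single-sample weighted reservoir (illustrative, not the paper's code)."""
    sample: Optional[Any] = None
    weight_sum: float = 0.0
    count: int = 0

    def update(self, candidate: Any, weight: float, rng: random.Random) -> None:
        """Stream in one candidate; keep it with probability weight / weight_sum."""
        self.count += 1
        if weight <= 0.0:
            return  # zero-weight candidates are counted but can never be selected
        self.weight_sum += weight
        if rng.random() < weight / self.weight_sum:
            self.sample = candidate
```

Because only the running weight sum and one sample are stored, the memory cost stays constant no matter how many candidates are streamed through.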
-
Efficient Dataflow Modeling of Peripheral Encoding in the Human Visual System
Authors:
Rachel Brown,
Vasha DuTell,
Bruce Walter,
Ruth Rosenholtz,
Peter Shirley,
Morgan McGuire,
David Luebke
Abstract:
Computer graphics seeks to deliver compelling images, generated within a computing budget, targeted at a specific display device, and ultimately viewed by an individual user. The foveated nature of human vision offers an opportunity to efficiently allocate computation and compression to appropriate areas of the viewer's visual field, especially with the rise of high resolution and wide field-of-view display devices. However, while the ongoing study of foveal vision is advanced, much less is known about how humans process imagery in the periphery of their vision -- which comprises, at any given moment, the vast majority of the pixels in the image. We advance computational models for peripheral vision aimed toward their eventual use in computer graphics. In particular, we present a dataflow computational model of peripheral encoding that is more efficient than prior pooling-based methods and more compact than contrast sensitivity-based methods. Further, we account for the explicit encoding of "end stopped" features in the image, which was missing from previous methods. Finally, we evaluate our model in the context of perception of textures in the periphery. Our improved peripheral encoding may simplify development and testing of more sophisticated, complete models in more robust and realistic settings relevant to computer graphics.
Submitted 23 July, 2021;
originally announced July 2021.
-
High Throughput Soybean Pod-Counting with In-Field Robotic Data Collection and Machine-Vision Based Data Analysis
Authors:
Michael McGuire,
Chinmay Soman,
Brian Diers,
Girish Chowdhary
Abstract:
We report promising results for high-throughput on-field soybean pod counting with small mobile robots and machine-vision algorithms. Our results show that the machine-vision based soybean pod counts are strongly correlated with soybean yield. While pod counts have a strong correlation with soybean yield, pod counting is extremely labor intensive and has been difficult to automate. Our results establish that an autonomous robot equipped with vision sensors can autonomously collect soybean data at maturity. Machine-vision algorithms can be used to estimate pod counts across a large diversity panel planted across experimental units (EUs, or plots) in a high-throughput, automated manner. We report a correlation of 0.67 between our automated pod counts and soybean yield. The data was collected in an experiment consisting of 1463 single-row plots maintained by the University of Illinois soybean breeding program during the 2020 growing season. We also report a correlation of 0.88 between automated pod counts and manual pod counts over a smaller data set of 16 plots.
Submitted 27 May, 2021; v1 submitted 21 May, 2021;
originally announced May 2021.
-
Robust Vision-Based Cheat Detection in Competitive Gaming
Authors:
Aditya Jonnalagadda,
Iuri Frosio,
Seth Schneider,
Morgan McGuire,
Joohwan Kim
Abstract:
Game publishers and anti-cheat companies have been unsuccessful in blocking cheating in online gaming. We propose a novel, vision-based approach that captures the final state of the frame buffer and detects illicit overlays. To this end, we train and evaluate a DNN detector on a new dataset, collected using two first-person shooter games and three cheating software packages. We study the advantages and disadvantages of different DNN architectures operating on a local or global scale. We use output confidence analysis to avoid unreliable detections and inform when network retraining is required. In an ablation study, we show how to use Interval Bound Propagation to build a detector that is also resistant to potential adversarial attacks and study its interaction with confidence analysis. Our results show that robust and effective anti-cheating through machine learning is practically feasible and can be used to guarantee fair play in online gaming.
Submitted 27 March, 2021; v1 submitted 18 March, 2021;
originally announced March 2021.
-
A Distributed, Decoupled System for Losslessly Streaming Dynamic Light Probes to Thin Clients
Authors:
Michael Stengel,
Zander Majercik,
Benjamin Boudaoud,
Morgan McGuire
Abstract:
We present a networked, high performance graphics system that combines dynamic, high quality, ray traced global illumination computed on a server with direct illumination and primary visibility computed on a client. This approach provides many of the image quality benefits of real-time ray tracing on low-power and legacy hardware, while maintaining a low latency response and mobile form factor. Our system distributes the graphics pipeline over a network by computing diffuse global illumination on a remote machine. Global illumination is computed using a recent irradiance volume representation combined with a novel, lossless, HEVC-based, hardware-accelerated encoding, and a perceptually-motivated update scheme. Our experimental implementation streams thousands of irradiance probes per second and requires less than 50 Mbps of throughput, reducing the consumed bandwidth by 99.4% when streaming at 60 Hz compared to traditional lossless texture compression. This bandwidth reduction allows higher quality and lower latency graphics than state-of-the-art remote rendering via video streaming. In addition, our split-rendering solution decouples remote computation from local rendering and so does not limit local display update rate or resolution.
Submitted 10 March, 2021;
originally announced March 2021.
-
Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Shapes
Authors:
Towaki Takikawa,
Joey Litalien,
Kangxue Yin,
Karsten Kreis,
Charles Loop,
Derek Nowrouzezahrai,
Alec Jacobson,
Morgan McGuire,
Sanja Fidler
Abstract:
Neural signed distance functions (SDFs) are emerging as an effective representation for 3D shapes. State-of-the-art methods typically encode the SDF with a large, fixed-size neural network to approximate complex shapes with implicit surfaces. Rendering with these large networks is, however, computationally expensive since it requires many forward passes through the network for every pixel, making these representations impractical for real-time graphics. We introduce an efficient neural representation that, for the first time, enables real-time rendering of high-fidelity neural SDFs, while achieving state-of-the-art geometry reconstruction quality. We represent implicit surfaces using an octree-based feature volume which adaptively fits shapes with multiple discrete levels of detail (LODs), and enables continuous LOD with SDF interpolation. We further develop an efficient algorithm to directly render our novel neural SDF representation in real-time by querying only the necessary LODs with sparse octree traversal. We show that our representation is 2-3 orders of magnitude more efficient in terms of rendering speed compared to previous works. Furthermore, it produces state-of-the-art reconstruction quality for complex shapes under both 3D geometric and 2D image-space metrics.
Submitted 26 January, 2021;
originally announced January 2021.
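The abstract above says continuous LOD is obtained by interpolating SDF values between discrete levels of detail. One simple way to picture this is sketched below, under the assumption that the blend is linear between the two nearest integer levels; the function `blend_lod` and its list-based input are hypothetical stand-ins for per-level network outputs.

```python
import math
from typing import Sequence


def blend_lod(sdf_per_level: Sequence[float], lod: float) -> float:
    """Blend signed distances from adjacent discrete LODs.

    sdf_per_level[i] is the (assumed) distance predicted at integer level i
    for a single query point; lod is a fractional level of detail.
    """
    top = len(sdf_per_level) - 1
    lod = min(max(lod, 0.0), float(top))   # clamp to the available levels
    lo = int(math.floor(lod))
    hi = min(lo + 1, top)
    a = lod - lo                           # fractional position between lo and hi
    return (1.0 - a) * sdf_per_level[lo] + a * sdf_per_level[hi]
```

With per-level distances `[1.0, 0.5, 0.25]`, a request for LOD 0.5 blends the first two levels equally, and any request above the finest level is clamped to it.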
-
Learning Deformable Tetrahedral Meshes for 3D Reconstruction
Authors:
Jun Gao,
Wenzheng Chen,
Tommy Xiang,
Clement Fuji Tsang,
Alec Jacobson,
Morgan McGuire,
Sanja Fidler
Abstract:
3D shape representations that accommodate learning-based 3D reconstruction are an open problem in machine learning and computer graphics. Previous work on neural 3D reconstruction demonstrated benefits, but also limitations, of point cloud, voxel, surface mesh, and implicit function representations. We introduce Deformable Tetrahedral Meshes (DefTet) as a particular parameterization that utilizes volumetric tetrahedral meshes for the reconstruction problem. Unlike existing volumetric approaches, DefTet optimizes for both vertex placement and occupancy, and is differentiable with respect to standard 3D reconstruction loss functions. It is thus simultaneously high-precision, volumetric, and amenable to learning-based neural architectures. We show that it can represent arbitrary, complex topology, is both memory and computationally efficient, and can produce high-fidelity reconstructions with a significantly smaller grid size than alternative volumetric approaches. The predicted surfaces are also inherently defined as tetrahedral meshes, thus do not require post-processing. We demonstrate that DefTet matches or exceeds both the quality of the previous best approaches and the performance of the fastest ones. Our approach obtains high-quality tetrahedral meshes computed directly from noisy point clouds, and is the first to showcase high-quality 3D tet-mesh results using only a single image as input. Our project webpage: https://nv-tlabs.github.io/DefTet/
Submitted 23 November, 2020; v1 submitted 2 November, 2020;
originally announced November 2020.
-
Scaling Probe-Based Real-Time Dynamic Global Illumination for Production
Authors:
Zander Majercik,
Adam Marrs,
Josef Spjut,
Morgan McGuire
Abstract:
We contribute several practical extensions to the probe based irradiance-field-with-visibility representation to improve image quality, constant and asymptotic performance, memory efficiency, and artist control. We developed these extensions in the process of incorporating the previous work into the global illumination solutions of the NVIDIA RTXGI SDK, the Unity and Unreal Engine 4 game engines, and proprietary engines for several commercial games. These extensions include: a single, intuitive tuning parameter (the "self-shadow" bias); heuristics to speed transitions in the global illumination; reuse of irradiance data as prefiltered radiance for recursive glossy reflection; a probe state machine to prune work that will not affect the final image; and multiresolution cascaded volumes for large worlds.
Submitted 21 June, 2021; v1 submitted 22 September, 2020;
originally announced September 2020.
-
Hardware Trojan with Frequency Modulation
Authors:
Ash Luft,
Mihai Sima,
Michael McGuire
Abstract:
The use of third-party IP cores in implementing applications in FPGAs has given rise to the threat of malicious alterations through the insertion of hardware Trojans. To address this threat, it is important to predict the way hardware Trojans are built and to identify their weaknesses. This paper describes a logic family for implementing robust hardware Trojans, which can evade the two major detection methods, namely unused-circuit identification and side-channel analysis. This robustness is achieved by encoding information in frequency rather than amplitude so that the Trojan trigger circuitry's state will never stay constant during 'normal' operation. In addition, the power consumption of Trojan circuits built using the proposed logic family can be concealed with minimal design effort and supplementary hardware resources. Defense measures against hardware Trojans with frequency modulation are described.
Submitted 2 April, 2020;
originally announced April 2020.
-
Machine Vision for Natural Gas Methane Emissions Detection Using an Infrared Camera
Authors:
Jingfan Wang,
Lyne P. Tchapmi,
Arvind P. Ravikumara,
Mike McGuire,
Clay S. Bell,
Daniel Zimmerle,
Silvio Savarese,
Adam R. Brandt
Abstract:
It is crucial to reduce natural gas methane emissions, which can potentially offset the climate benefits of replacing coal with gas. Optical gas imaging (OGI) is a widely-used method to detect methane leaks, but is labor-intensive and cannot provide leak detection results without operators' judgment. In this paper, we develop a computer vision approach to OGI-based leak detection using convolutional neural networks (CNNs) trained on methane leak images to enable automatic detection. First, we collect ~1 M frames of labeled video of methane leaks from different leaking equipment to build the CNN model, covering a wide range of leak sizes (5.3-2051.6 gCH4/h) and imaging distances (4.6-15.6 m). Second, we examine different background subtraction methods to extract the methane plume in the foreground. Third, we test three CNN model variants, collectively called GasNet, to detect plumes in videos taken at other pieces of leaking equipment. We assess the ability of GasNet to perform leak detection by comparing it to a baseline method that uses an optical-flow-based change detection algorithm. We explore the sensitivity of results to the CNN structure, with a moderate-complexity variant performing best across distances. We find that the detection accuracy can reach as high as 99%, and that the overall detection accuracy can exceed 95% for a case covering all leak sizes and imaging distances. Binary detection accuracy exceeds 97% for large leaks (~710 gCH4/h) imaged closely (~5-7 m). At closer imaging distances (~5-10 m), CNN-based models have greater than 94% accuracy across all leak sizes. At the farthest distances (~13-16 m), performance degrades rapidly, but the models can still achieve above 95% accuracy in detecting large leaks (>950 gCH4/h). The GasNet-based computer vision approach could be deployed in OGI surveys to allow automatic vigilance of methane leak detection with high detection accuracy in the real world.
Submitted 1 April, 2019;
originally announced April 2019.