-
Probabilistic Emulation of a Global Climate Model with Spherical DYffusion
Authors:
Salva Rühling Cachay,
Brian Henn,
Oliver Watt-Meyer,
Christopher S. Bretherton,
Rose Yu
Abstract:
Data-driven deep learning models are on the verge of transforming global weather forecasting. It is an open question if this success can extend to climate modeling, where long inference rollouts and data complexity pose significant challenges. Here, we present the first conditional generative model able to produce global climate ensemble simulations that are accurate and physically consistent. Our model runs at 6-hourly time steps and is shown to be stable for 10-year-long simulations. Our approach beats relevant baselines and nearly reaches a gold standard for successful climate model emulation. We discuss the key design choices behind our dynamics-informed diffusion model-based approach which enables this significant step towards efficient, data-driven climate simulations that can help us better understand the Earth and adapt to a changing climate.
Submitted 20 June, 2024;
originally announced June 2024.
-
Strong Approximations for Empirical Processes Indexed by Lipschitz Functions
Authors:
Matias D. Cattaneo,
Ruiqi Rae Yu
Abstract:
This paper presents new uniform Gaussian strong approximations for empirical processes indexed by classes of functions based on $d$-variate random vectors ($d\geq1$). First, a uniform Gaussian strong approximation is established for general empirical processes indexed by Lipschitz functions, encompassing and improving on all previous results in the literature. When specialized to the setting considered by Rio (1994), and certain constraints on the function class hold, our result improves the approximation rate $n^{-1/(2d)}$ to $n^{-1/\max\{d,2\}}$, up to the same $\operatorname{polylog} n$ term, where $n$ denotes the sample size. Remarkably, we establish a valid uniform Gaussian strong approximation at the optimal rate $n^{-1/2}\log n$ for $d=2$, which was previously known to be valid only for univariate ($d=1$) empirical processes via the celebrated Hungarian construction (Komlós et al., 1975). Second, a uniform Gaussian strong approximation is established for a class of multiplicative separable empirical processes indexed by Lipschitz functions, which addresses some outstanding problems in the literature (Chernozhukov et al., 2014, Section 3). In addition, two other uniform Gaussian strong approximation results are presented for settings where the function class takes the form of a sequence of Haar bases based on generalized quasi-uniform partitions. We demonstrate the improvements and usefulness of our new strong approximation results with several statistical applications to nonparametric density and regression estimation.
Submitted 6 June, 2024;
originally announced June 2024.
-
Re-evaluating the impact of hormone replacement therapy on heart disease using match-adaptive randomization inference
Authors:
Samuel D. Pimentel,
Ruoqi Yu
Abstract:
Matching is an appealing way to design observational studies because it mimics the data structure produced by stratified randomized trials, pairing treated individuals with similar controls. After matching, inference is often conducted using methods tailored for stratified randomized trials in which treatments are permuted within matched pairs. However, in observational studies, matched pairs are not predetermined before treatment; instead, they are constructed based on observed treatment status. This introduces a challenge as the permutation distributions used in standard inference methods do not account for the possibility that permuting treatments might lead to a different selection of matched pairs ($Z$-dependence). To address this issue, we propose a novel and computationally efficient algorithm that characterizes and enables sampling from the correct conditional distribution of treatment after an optimal propensity score matching, accounting for $Z$-dependence. We show how this new procedure, called match-adaptive randomization inference, corrects for an anticonservative result in a well-known observational study investigating the impact of hormone replacement therapy (HRT) on coronary heart disease and corroborates experimental findings about heterogeneous effects of HRT across different ages of initiation in women. Keywords: matching, causal inference, propensity score, permutation test, Type I error, graphs.
Submitted 2 March, 2024;
originally announced March 2024.
-
OptEx: Expediting First-Order Optimization with Approximately Parallelized Iterations
Authors:
Yao Shu,
Jiongfeng Fang,
Ying Tiffany He,
Fei Richard Yu
Abstract:
First-order optimization (FOO) algorithms are pivotal in numerous computational domains such as machine learning and signal denoising. However, their application to complex tasks like neural network training often entails significant inefficiencies due to the need for many sequential iterations for convergence. In response, we introduce first-order optimization expedited with approximately parallelized iterations (OptEx), the first framework that enhances the efficiency of FOO by leveraging parallel computing to mitigate its iterative bottleneck. OptEx employs kernelized gradient estimation to make use of gradient history for future gradient prediction, enabling parallelization of iterations -- a strategy once considered impractical because of the inherent iterative dependency in FOO. We provide theoretical guarantees for the reliability of our kernelized gradient estimation and the iteration complexity of SGD-based OptEx, confirming that estimation errors diminish to zero as historical gradients accumulate and that SGD-based OptEx enjoys an effective acceleration rate of $\Omega(\sqrt{N})$ over standard SGD given parallelism of $N$. We also use extensive empirical studies, including synthetic functions, reinforcement learning tasks, and neural network training across various datasets, to underscore the substantial efficiency improvements achieved by OptEx.
Submitted 17 February, 2024;
originally announced February 2024.
-
Learning Granger Causality from Instance-wise Self-attentive Hawkes Processes
Authors:
Dongxia Wu,
Tsuyoshi Idé,
Aurélie Lozano,
Georgios Kollias,
Jiří Navrátil,
Naoki Abe,
Yi-An Ma,
Rose Yu
Abstract:
We address the problem of learning Granger causality from asynchronous, interdependent, multi-type event sequences. In particular, we are interested in discovering instance-level causal structures in an unsupervised manner. Instance-level causality identifies causal relationships among individual events, providing more fine-grained information for decision-making. Existing work in the literature either requires strong assumptions, such as linearity in the intensity function, or heuristically defined model parameters that do not necessarily meet the requirements of Granger causality. We propose Instance-wise Self-Attentive Hawkes Processes (ISAHP), a novel deep learning framework that can directly infer the Granger causality at the event instance level. ISAHP is the first neural point process model that meets the requirements of Granger causality. It leverages the self-attention mechanism of the transformer to align with the principles of Granger causality. We empirically demonstrate that ISAHP is capable of discovering complex instance-level causal structures that cannot be handled by classical models. We also show that ISAHP achieves state-of-the-art performance in proxy tasks involving type-level causal discovery and instance-level event type prediction.
Submitted 29 February, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
Discovering Mixtures of Structural Causal Models from Time Series Data
Authors:
Sumanth Varambally,
Yi-An Ma,
Rose Yu
Abstract:
Discovering causal relationships from time series data is significant in fields such as finance, climate science, and neuroscience. However, contemporary techniques rely on the simplifying assumption that data originates from the same causal model, while in practice, data is heterogeneous and can stem from different causal models. In this work, we relax this assumption and perform causal discovery from time series data originating from a mixture of causal models. We propose a general variational inference-based framework called MCD to infer the underlying causal models as well as the mixing probability of each sample. Our approach employs an end-to-end training process that maximizes an evidence-lower bound for the data likelihood. We present two variants: MCD-Linear for linear relationships and independent noise, and MCD-Nonlinear for nonlinear causal relationships and history-dependent noise. We demonstrate that our method surpasses state-of-the-art benchmarks in causal discovery tasks through extensive experimentation on synthetic and real-world datasets, particularly when the data emanates from diverse underlying causal graphs. Theoretically, we prove the identifiability of such a model under some mild assumptions.
Submitted 23 June, 2024; v1 submitted 10 October, 2023;
originally announced October 2023.
-
Automatic Integration for Spatiotemporal Neural Point Processes
Authors:
Zihao Zhou,
Rose Yu
Abstract:
Learning continuous-time point processes is essential to many discrete event forecasting tasks. However, integration poses a major challenge, particularly for spatiotemporal point processes (STPPs), as it involves calculating the likelihood through triple integrals over space and time. Existing methods for integrating STPPs either assume a parametric form of the intensity function, which lacks flexibility, or approximate the intensity with Monte Carlo sampling, which introduces numerical errors. Recent work by Omi et al. [2019] proposes a dual network approach for efficient integration of a flexible intensity function. However, their method only focuses on the 1D temporal point process. In this paper, we introduce a novel paradigm: AutoSTPP (Automatic Integration for Spatiotemporal Neural Point Processes) that extends the dual network approach to 3D STPP. While previous work provides a foundation, its direct extension overly restricts the intensity function and leads to computational challenges. In response, we introduce a decomposable parametrization for the integral network using ProdNet. This approach, leveraging the product of simplified univariate graphs, effectively sidesteps the computational complexities inherent in multivariate computational graphs. We prove the consistency of AutoSTPP and validate it on synthetic data and benchmark real-world datasets. AutoSTPP shows a significant advantage in recovering complex intensity functions from irregular spatiotemporal events, particularly when the intensity is sharply localized. Our code is open-source at https://github.com/Rose-STL-Lab/AutoSTPP.
Submitted 31 October, 2023; v1 submitted 9 October, 2023;
originally announced October 2023.
-
Balancing Weights for Causal Inference in Observational Factorial Studies
Authors:
Ruoqi Yu,
Peng Ding
Abstract:
Many scientific questions in biomedical, environmental, and psychological research involve understanding the impact of multiple factors on outcomes. While randomized factorial experiments are ideal for this purpose, randomization is infeasible in many empirical studies. Therefore, investigators often rely on observational data, where drawing reliable causal inferences for multiple factors remains challenging. As the number of treatment combinations grows exponentially with the number of factors, some treatment combinations can be rare or even missing by chance in observed data, further complicating factorial effects estimation. To address these challenges, we propose a novel weighting method tailored to observational studies with multiple factors. Our approach uses weighted observational data to emulate a randomized factorial experiment, enabling simultaneous estimation of the effects of multiple factors and their interactions. Our investigations reveal a crucial nuance: achieving balance among covariates, as in single-factor scenarios, is necessary but insufficient for unbiasedly estimating factorial effects. Our findings suggest that balancing the factors is also essential in multi-factor settings. Moreover, we extend our weighting method to handle missing treatment combinations in observed data. Finally, we study the asymptotic behavior of the new weighting estimators and propose a consistent variance estimator, providing reliable inferences on factorial effects in observational studies.
Submitted 6 October, 2023;
originally announced October 2023.
-
DYffusion: A Dynamics-informed Diffusion Model for Spatiotemporal Forecasting
Authors:
Salva Rühling Cachay,
Bo Zhao,
Hailey Joren,
Rose Yu
Abstract:
While diffusion models can successfully generate data and make predictions, they are predominantly designed for static images. We propose an approach for efficiently training diffusion models for probabilistic spatiotemporal forecasting, where generating stable and accurate rollout forecasts remains challenging. Our method, DYffusion, leverages the temporal dynamics in the data, directly coupling it with the diffusion steps in the model. We train a stochastic, time-conditioned interpolator and a forecaster network that mimic the forward and reverse processes of standard diffusion models, respectively. DYffusion naturally facilitates multi-step and long-range forecasting, allowing for highly flexible, continuous-time sampling trajectories and the ability to trade off performance with accelerated sampling at inference time. In addition, the dynamics-informed diffusion process in DYffusion imposes a strong inductive bias and significantly improves computational efficiency compared to traditional Gaussian noise-based diffusion models. Our approach performs competitively on probabilistic forecasting of complex dynamics in sea surface temperatures, Navier-Stokes flows, and spring mesh systems.
Submitted 11 October, 2023; v1 submitted 2 June, 2023;
originally announced June 2023.
-
Long-term Forecasting with TiDE: Time-series Dense Encoder
Authors:
Abhimanyu Das,
Weihao Kong,
Andrew Leach,
Shaan Mathur,
Rajat Sen,
Rose Yu
Abstract:
Recent work has shown that simple linear models can outperform several Transformer-based approaches in long-term time-series forecasting. Motivated by this, we propose a Multi-layer Perceptron (MLP) based encoder-decoder model, Time-series Dense Encoder (TiDE), for long-term time-series forecasting that enjoys the simplicity and speed of linear models while also being able to handle covariates and non-linear dependencies. Theoretically, we prove that the simplest linear analogue of our model can achieve a near-optimal error rate for linear dynamical systems (LDS) under some assumptions. Empirically, we show that our method can match or outperform prior approaches on popular long-term time-series forecasting benchmarks while being 5-10x faster than the best Transformer-based model.
Submitted 4 April, 2024; v1 submitted 17 April, 2023;
originally announced April 2023.
-
Scalable Bayesian optimization with high-dimensional outputs using randomized prior networks
Authors:
Mohamed Aziz Bhouri,
Michael Joly,
Robert Yu,
Soumalya Sarkar,
Paris Perdikaris
Abstract:
Several fundamental problems in science and engineering consist of global optimization tasks involving unknown high-dimensional (black-box) functions that map a set of controllable variables to the outcomes of an expensive experiment. Bayesian Optimization (BO) techniques are known to be effective in tackling global optimization problems using a relatively small number of objective function evaluations, but their performance suffers when dealing with high-dimensional outputs. To overcome the major challenge of dimensionality, here we propose a deep learning framework for BO and sequential decision making based on bootstrapped ensembles of neural architectures with randomized priors. Using appropriate architecture choices, we show that the proposed framework can approximate functional relationships between design variables and quantities of interest, even in cases where the latter take values in high-dimensional vector spaces or even infinite-dimensional function spaces. In the context of BO, we augment the proposed probabilistic surrogates with re-parameterized Monte Carlo approximations of multiple-point (parallel) acquisition functions, as well as methodological extensions for accommodating black-box constraints and multi-fidelity information sources. We test the proposed framework against state-of-the-art methods for BO and demonstrate superior performance across several challenging tasks with high-dimensional outputs, including a constrained multi-fidelity optimization task involving shape optimization of rotor blades in turbo-machinery.
Submitted 14 September, 2023; v1 submitted 14 February, 2023;
originally announced February 2023.
-
Copula Conformal Prediction for Multi-step Time Series Forecasting
Authors:
Sophia Sun,
Rose Yu
Abstract:
Accurate uncertainty measurement is a key step to building robust and reliable machine learning systems. Conformal prediction is a distribution-free uncertainty quantification algorithm popular for its ease of implementation, statistical coverage guarantees, and versatility for underlying forecasters. However, existing conformal prediction algorithms for time series are limited to single-step prediction without considering the temporal dependency. In this paper, we propose a Copula Conformal Prediction algorithm for multivariate, multi-step Time Series forecasting, CopulaCPTS. We prove that CopulaCPTS has a finite-sample validity guarantee. On several synthetic and real-world multivariate time series datasets, we show that CopulaCPTS produces more calibrated and sharper confidence intervals for multi-step prediction tasks than existing techniques.
Submitted 18 March, 2024; v1 submitted 6 December, 2022;
originally announced December 2022.
-
Koopman Neural Forecaster for Time Series with Temporal Distribution Shifts
Authors:
Rui Wang,
Yihe Dong,
Sercan Ö. Arik,
Rose Yu
Abstract:
Temporal distributional shifts, with underlying dynamics changing over time, frequently occur in real-world time series and pose a fundamental challenge for deep neural networks (DNNs). In this paper, we propose a novel deep sequence model based on the Koopman theory for time series forecasting: Koopman Neural Forecaster (KNF), which leverages DNNs to learn the linear Koopman space and the coefficients of chosen measurement functions. KNF imposes appropriate inductive biases for improved robustness against distributional shifts, employing both a global operator to learn shared characteristics and a local operator to capture changing dynamics, as well as a specially-designed feedback loop to continuously update the learned operators over time for rapidly varying behaviors. We demonstrate that KNF achieves superior performance compared to the alternatives on multiple time series datasets that are shown to suffer from distribution shifts.
Submitted 28 February, 2023; v1 submitted 7 October, 2022;
originally announced October 2022.
-
Deep Bayesian Active Learning for Accelerating Stochastic Simulation
Authors:
Dongxia Wu,
Ruijia Niu,
Matteo Chinazzi,
Alessandro Vespignani,
Yi-An Ma,
Rose Yu
Abstract:
Stochastic simulations such as large-scale, spatiotemporal, age-structured epidemic models are computationally expensive at fine-grained resolution. While deep surrogate models can speed up the simulations, doing so for stochastic simulations and with active learning approaches is an underexplored area. We propose Interactive Neural Process (INP), a deep Bayesian active learning framework for learning deep surrogate models to accelerate stochastic simulations. INP consists of two components: a spatiotemporal surrogate model built upon the Neural Process (NP) family and an acquisition function for active learning. For surrogate modeling, we develop Spatiotemporal Neural Process (STNP) to mimic the simulator dynamics. For active learning, we propose a novel acquisition function, Latent Information Gain (LIG), calculated in the latent space of NP-based models. We perform a theoretical analysis and demonstrate that LIG reduces sample complexity compared with random sampling in high dimensions. We also conduct empirical studies on three complex spatiotemporal simulators for reaction diffusion, heat flow, and infectious disease. The results demonstrate that STNP outperforms the baselines in the offline learning setting and LIG achieves the state-of-the-art for Bayesian active learning.
Submitted 4 June, 2023; v1 submitted 4 June, 2021;
originally announced June 2021.
-
Quantifying Uncertainty in Deep Spatiotemporal Forecasting
Authors:
Dongxia Wu,
Liyao Gao,
Xinyue Xiong,
Matteo Chinazzi,
Alessandro Vespignani,
Yi-An Ma,
Rose Yu
Abstract:
Deep learning is gaining increasing popularity for spatiotemporal forecasting. However, prior works have mostly focused on point estimates without quantifying the uncertainty of the predictions. In high-stakes domains, being able to generate probabilistic forecasts with confidence intervals is critical to risk assessment and decision making. Yet a systematic study of uncertainty quantification (UQ) methods for spatiotemporal forecasting is missing in the community. In this paper, we describe two types of spatiotemporal forecasting problems: regular grid-based and graph-based. Then we analyze UQ methods from both the Bayesian and the frequentist point of view, casting them in a unified framework via statistical decision theory. Through extensive experiments on real-world road network traffic, epidemics, and air quality forecasting tasks, we reveal the statistical and computational trade-offs for different UQ methods: Bayesian methods are typically more robust in mean prediction, while confidence levels obtained from frequentist methods provide more extensive coverage over data variations. Computationally, quantile regression type methods are cheaper for a single confidence interval but require re-training for different intervals. Sampling-based methods generate samples that can form multiple confidence intervals, albeit at a higher computational cost.
Submitted 12 June, 2021; v1 submitted 25 May, 2021;
originally announced May 2021.
-
Treatment Effects Estimation by Uniform Transformer
Authors:
Ruoqi Yu,
Shulei Wang
Abstract:
In observational studies, balancing covariates in different treatment groups is essential to estimate treatment effects. One of the most commonly used methods for such purposes is weighting. The performance of this class of methods usually depends on strong regularity conditions for the underlying model, which might not hold in practice. In this paper, we investigate weighting methods from a functional estimation perspective and argue that the weights needed for covariate balancing could differ from those needed for treatment effects estimation under low regularity conditions. Motivated by this observation, we introduce a new framework of weighting that directly targets the treatment effects estimation. Unlike existing methods, the resulting estimator for a treatment effect under this new framework is a simple kernel-based $U$-statistic after applying a data-driven transformation to the observed covariates. We characterize the theoretical properties of the new estimators of treatment effects under a nonparametric setting and show that they are able to work robustly under low regularity conditions. The new framework is also applied to several numerical examples to demonstrate its practical merits.
Submitted 5 July, 2021; v1 submitted 9 August, 2020;
originally announced August 2020.
-
Dynamic Relational Inference in Multi-Agent Trajectories
Authors:
Ruichao Xiao,
Manish Kumar Singh,
Rose Yu
Abstract:
Inferring interactions from multi-agent trajectories has broad applications in physics, vision and robotics. Neural relational inference (NRI) is a deep generative model that can reason about relations in complex dynamics without supervision. In this paper, we take a careful look at this approach for relational inference in multi-agent trajectories. First, we discover that NRI can be fundamentally limited without sufficient long-term observations. Its ability to accurately infer interactions degrades drastically for short output sequences. Next, we consider a more general setting of relational inference when interactions are changing over time. We propose an extension of NRI, which we call the DYnamic multi-Agent Relational Inference (DYARI) model, that can reason about dynamic relations. We conduct exhaustive experiments to study the effect of model architecture, underlying dynamics and training scheme on the performance of dynamic relational inference using a simulated physics system. We also showcase the usage of our model on real-world multi-agent basketball trajectories.
Submitted 8 October, 2020; v1 submitted 16 July, 2020;
originally announced July 2020.
-
Learning Disentangled Representations of Video with Missing Data
Authors:
Armand Comas-Massagué,
Chi Zhang,
Zlatan Feric,
Octavia Camps,
Rose Yu
Abstract:
Missing data poses significant challenges while learning representations of video sequences. We present Disentangled Imputed Video autoEncoder (DIVE), a deep generative model that imputes and predicts future video frames in the presence of missing data. Specifically, DIVE introduces a missingness latent variable and disentangles the hidden video representations into static and dynamic appearance, pose, and missingness factors for each object. DIVE imputes each object's trajectory where data is missing. On a moving MNIST dataset with various missing scenarios, DIVE outperforms the state-of-the-art baselines by a substantial margin. We also present comparisons on the real-world MOTSChallenge pedestrian dataset, which demonstrate the practical value of our method in a more realistic setting. Our code and data can be found at https://github.com/Rose-STL-Lab/DIVE.
Submitted 3 November, 2020; v1 submitted 23 June, 2020;
originally announced June 2020.
-
A Tutorial on VAEs: From Bayes' Rule to Lossless Compression
Authors:
Ronald Yu
Abstract:
The Variational Auto-Encoder (VAE) is a simple, efficient, and popular deep maximum likelihood model. Though usage of VAEs is widespread, the derivation of the VAE is not as widely understood. In this tutorial, we will provide an overview of the VAE and a tour through various derivations and interpretations of the VAE objective. From a probabilistic standpoint, we will examine the VAE through the lens of Bayes' Rule, importance sampling, and the change-of-variables formula. From an information theoretic standpoint, we will examine the VAE through the lens of lossless compression and transmission through a noisy channel. We will then identify two common misconceptions over the VAE formulation and their practical consequences. Finally, we will visualize the capabilities and limitations of VAEs using a code example (with an accompanying Jupyter notebook) on toy 2D data.
Submitted 30 June, 2020; v1 submitted 18 June, 2020;
originally announced June 2020.
-
Aortic Pressure Forecasting with Deep Sequence Learning
Authors:
Eliza Huang,
Rui Wang,
Uma Chandrasekaran,
Rose Yu
Abstract:
Mean aortic pressure (MAP) is a major determinant of perfusion in all organ systems. The ability to forecast MAP would enhance the ability of physicians to estimate the prognosis of the patient and assist in early detection of hemodynamic instability. However, forecasting MAP is challenging because the blood pressure (BP) time series is noisy and can be highly non-stationary. The aim of this study was to forecast the mean aortic pressure five minutes in advance, using the 25 Hz time series data of the previous five minutes as input. We provide a benchmark study of different deep learning models for BP forecasting. We investigate a left ventricular dwelling transvalvular micro-axial device, the Impella, in patients undergoing high-risk percutaneous intervention. The Impella provides hemodynamic support, thus aiding in native heart function recovery. It is also equipped with pressure sensors to capture high-frequency MAP measurements at origin, instead of peripherally. Our dataset and the clinical application are novel in the BP forecasting field. We performed a comprehensive study on time series with increasing, decreasing, and stationary trends. The experiments show that recurrent neural networks with Legendre Memory Unit achieve the best performance with an overall forecasting error of 1.8 mmHg.
Submitted 16 October, 2020; v1 submitted 11 May, 2020;
originally announced May 2020.
-
Multiresolution Tensor Learning for Efficient and Interpretable Spatial Analysis
Authors:
Jung Yeon Park,
Kenneth Theo Carr,
Stephan Zheng,
Yisong Yue,
Rose Yu
Abstract:
Efficient and interpretable spatial analysis is crucial in many fields such as geology, sports, and climate science. Tensor latent factor models can describe higher-order correlations for spatial data. However, they are computationally expensive to train and are sensitive to initialization, leading to spatially incoherent, uninterpretable results. We develop a novel Multiresolution Tensor Learning (MRTL) algorithm for efficiently learning interpretable spatial patterns. MRTL initializes the latent factors from an approximate full-rank tensor model for improved interpretability and progressively learns from a coarse resolution to the fine resolution to reduce computation. We also prove the theoretical convergence and computational complexity of MRTL. When applied to two real-world datasets, MRTL demonstrates a 4-5x speedup compared to a fixed-resolution approach while yielding accurate and interpretable latent factors.
Submitted 14 August, 2020; v1 submitted 13 February, 2020;
originally announced February 2020.
-
Incorporating Symmetry into Deep Dynamics Models for Improved Generalization
Authors:
Rui Wang,
Robin Walters,
Rose Yu
Abstract:
Recent work has shown deep learning can accelerate the prediction of physical dynamics relative to numerical solvers. However, limited physical accuracy and an inability to generalize under distributional shift limit its applicability to the real world. We propose to improve accuracy and generalization by incorporating symmetries into convolutional neural networks. Specifically, we employ a variety of methods each tailored to enforce a different symmetry. Our models are both theoretically and experimentally robust to distributional shift by symmetry group transformations and enjoy favorable sample complexity. We demonstrate the advantage of our approach on a variety of physical dynamics including Rayleigh-Bénard convection and real-world ocean currents and temperatures. Compared with image or text applications, our work is a significant step towards applying equivariant neural networks to high-dimensional systems with complex dynamics. We open-source our simulation, data, and code at https://github.com/Rose-STL-Lab/Equivariant-Net.
Submitted 15 March, 2021; v1 submitted 7 February, 2020;
originally announced February 2020.
-
Deep Technology Tracing for High-tech Companies
Authors:
Han Wu,
Kun Zhang,
Guangyi Lv,
Qi Liu,
Runlong Yu,
Weihao Zhao,
Enhong Chen,
Jianhui Ma
Abstract:
Technological change and innovation are vitally important, especially for high-tech companies. However, the factors influencing their future research and development (R&D) trends are both complicated and varied, which makes technology tracing for high-tech companies quite difficult. To this end, in this paper, we develop a novel data-driven solution, i.e., the Deep Technology Forecasting (DTF) framework, to automatically find the most likely technology directions customized to each high-tech company. Specifically, DTF consists of three components: Potential Competitor Recognition (PCR), Collaborative Technology Recognition (CTR), and the Deep Technology Tracing (DTT) neural network. On the one hand, PCR and CTR aim to capture competitive relations among enterprises and collaborative relations among technologies, respectively. On the other hand, DTT is designed for modeling dynamic interactions between companies and technologies with the above relations involved. Finally, we evaluate our DTF framework on real-world patent data, and the experimental results clearly show that DTF can help to precisely forecast the future technology emphasis of companies by exploiting hybrid factors.
Submitted 2 January, 2020;
originally announced January 2020.
-
Towards Physics-informed Deep Learning for Turbulent Flow Prediction
Authors:
Rui Wang,
Karthik Kashinath,
Mustafa Mustafa,
Adrian Albert,
Rose Yu
Abstract:
While deep learning has shown tremendous success in a wide range of domains, it remains a grand challenge to incorporate physical principles in a systematic manner to the design, training, and inference of such models. In this paper, we aim to predict turbulent flow by learning its highly nonlinear dynamics from spatiotemporal velocity fields of large-scale fluid flow simulations of relevance to turbulence modeling and climate modeling. We adopt a hybrid approach by marrying two well-established turbulent flow simulation techniques with deep learning. Specifically, we introduce trainable spectral filters in a coupled model of Reynolds-averaged Navier-Stokes (RANS) and Large Eddy Simulation (LES), followed by a specialized U-net for prediction. Our approach, which we call turbulent-Flow Net (TF-Net), is grounded in a principled physics model, yet offers the flexibility of learned representations. We compare our model, TF-Net, with state-of-the-art baselines and observe significant reductions in error for predictions 60 frames ahead. Most importantly, our method predicts physical fields that obey desirable physical characteristics, such as conservation of mass, whilst faithfully emulating the turbulent kinetic energy field and spectrum, which are critical for accurate prediction of turbulent flows.
Submitted 13 June, 2020; v1 submitted 19 November, 2019;
originally announced November 2019.
-
Robust Federated Learning with Noisy Communication
Authors:
Fan Ang,
Li Chen,
Nan Zhao,
Yunfei Chen,
Weidong Wang,
F. Richard Yu
Abstract:
Federated learning is a communication-efficient training process that alternates between local training at the edge devices and averaging the updated local models at the central server. Nevertheless, it is impractical to achieve a perfect acquisition of the local models in wireless communication due to noise, which also seriously affects federated learning. To tackle this challenge, we propose a robust design for federated learning to alleviate the effects of noise. Considering noise in the two aforementioned steps, we first formulate the training problem as a parallel optimization for each node under the expectation-based model and the worst-case model. Due to the non-convexity of the problem, a regularization for the loss function approximation method is proposed to make it tractable. Regarding the worst-case model, we develop a feasible training scheme which utilizes the sampling-based successive convex approximation algorithm to tackle the unavailable maxima or minima noise condition and the non-convex issue of the objective function. Furthermore, the convergence rates of both new designs are analyzed from a theoretical point of view. Finally, the improvement of prediction accuracy and the reduction of the loss function are demonstrated via simulations for the proposed designs.
Submitted 1 November, 2019;
originally announced November 2019.
-
On the Global Optima of Kernelized Adversarial Representation Learning
Authors:
Bashir Sadeghi,
Runyi Yu,
Vishnu Naresh Boddeti
Abstract:
Adversarial representation learning is a promising paradigm for obtaining data representations that are invariant to certain sensitive attributes while retaining the information necessary for predicting target attributes. Existing approaches solve this problem through iterative adversarial minimax optimization and lack theoretical guarantees. In this paper, we first study the "linear" form of this problem, i.e., the setting where all the players are linear functions. We show that the resulting optimization problem is both non-convex and non-differentiable. We obtain an exact closed-form expression for its global optima through spectral learning and provide performance guarantees in terms of analytical bounds on the achievable utility and invariance. We then extend this solution and analysis to non-linear functions through kernel representation. Numerical experiments on UCI, Extended Yale B and CIFAR-100 datasets indicate that (a) practically, our solution is ideal for "imparting" provable invariance to any biased pre-trained data representation, and (b) empirically, the trade-off between utility and invariance provided by our solution is comparable to iterative minimax optimization of existing deep neural network based approaches. Code is available at https://github.com/human-analysis/Kernel-ARL.
Submitted 25 December, 2019; v1 submitted 16 October, 2019;
originally announced October 2019.
-
Adversarial shape perturbations on 3D point clouds
Authors:
Daniel Liu,
Ronald Yu,
Hao Su
Abstract:
The importance of training robust neural networks grows as 3D data is increasingly utilized in deep learning for vision tasks in robotics, drone control, and autonomous driving. One commonly used 3D data type is 3D point clouds, which describe shape information. We examine the problem of creating robust models from the perspective of the attacker, which is necessary for understanding how 3D neural networks can be exploited. We explore two categories of attacks: distributional attacks that involve imperceptible perturbations to the distribution of points, and shape attacks that involve deforming the shape represented by a point cloud. We explore three possible shape attacks for attacking 3D point cloud classification and show that some of them remain effective even against preprocessing steps, like the previously proposed point-removal defenses.
Submitted 23 October, 2020; v1 submitted 16 August, 2019;
originally announced August 2019.
-
Understanding the Representation Power of Graph Neural Networks in Learning Graph Topology
Authors:
Nima Dehmamy,
Albert-László Barabási,
Rose Yu
Abstract:
To deepen our understanding of graph neural networks, we investigate the representation power of Graph Convolutional Networks (GCN) through the looking glass of graph moments, a key property of graph topology encoding paths of various lengths. We find that GCNs are rather restrictive in learning graph moments. Without careful design, GCNs can fail miserably even with multiple layers and nonlinear activation functions. We analyze theoretically the expressiveness of GCNs, concluding that a modular GCN design, using different propagation rules with residual connections, could significantly improve the performance of GCN. We demonstrate that such modular designs are capable of distinguishing graphs from different graph generation models for surprisingly small graphs, a notoriously difficult problem in network science. Our investigation suggests that depth is much more influential than width, with deeper GCNs being more capable of learning higher-order graph moments. Additionally, combining GCN modules with different propagation rules is critical to the representation power of GCNs.
Submitted 31 October, 2019; v1 submitted 11 July, 2019;
originally announced July 2019.
-
NAOMI: Non-Autoregressive Multiresolution Sequence Imputation
Authors:
Yukai Liu,
Rose Yu,
Stephan Zheng,
Eric Zhan,
Yisong Yue
Abstract:
Missing value imputation is a fundamental problem in spatiotemporal modeling, from motion tracking to the dynamics of physical systems. Deep autoregressive models suffer from error propagation, which becomes catastrophic for imputing long-range sequences. In this paper, we take a non-autoregressive approach and propose a novel deep generative model: Non-AutOregressive Multiresolution Imputation (NAOMI) to impute long-range sequences given arbitrary missing patterns. NAOMI exploits the multiresolution structure of spatiotemporal data and decodes recursively from coarse to fine-grained resolutions using a divide-and-conquer strategy. We further enhance our model with adversarial training. When evaluated extensively on benchmark datasets from systems of both deterministic and stochastic dynamics, NAOMI demonstrates significant improvement in imputation accuracy (reducing average prediction error by 60% compared to autoregressive counterparts) and generalization for long-range sequences.
Submitted 29 October, 2019; v1 submitted 30 January, 2019;
originally announced January 2019.
-
Extending Adversarial Attacks and Defenses to Deep 3D Point Cloud Classifiers
Authors:
Daniel Liu,
Ronald Yu,
Hao Su
Abstract:
3D object classification and segmentation using deep neural networks has been extremely successful. As the problem of identifying 3D objects has many safety-critical applications, the neural networks have to be robust against adversarial changes to the input data set. There is a growing body of research on generating human-imperceptible adversarial attacks and defenses against them in the 2D image classification domain. However, 3D objects have various differences with 2D images, and this specific domain has not been rigorously studied so far.
We present a preliminary evaluation of adversarial attacks on deep 3D point cloud classifiers, namely PointNet and PointNet++, by evaluating both white-box and black-box adversarial attacks that were proposed for 2D images and extending those attacks to reduce the perceptibility of the perturbations in 3D space. We also show the high effectiveness of simple defenses against those attacks by proposing new defenses that exploit the unique structure of 3D point clouds. Finally, we attempt to explain the effectiveness of the defenses through the intrinsic structures of both the point clouds and the neural network architectures. Overall, we find that networks that process 3D point cloud data are weak to adversarial attacks, but they are also more easily defensible compared to 2D image classifiers. Our investigation will provide the groundwork for future studies on improving the robustness of deep neural networks that handle 3D data.
Submitted 28 June, 2019; v1 submitted 9 January, 2019;
originally announced January 2019.
-
Efficient Tensor Decomposition with Boolean Factors
Authors:
Sung-En Chang,
Xun Zheng,
Ian E. H. Yen,
Pradeep Ravikumar,
Rose Yu
Abstract:
Tensor decomposition has been extensively used as a tool for exploratory analysis. Motivated by neuroscience applications, we study tensor decomposition with Boolean factors. The resulting optimization problem is challenging due to the non-convex objective and the combinatorial constraints. We propose Binary Matching Pursuit (BMP), a novel generalization of the matching pursuit strategy to decompose the tensor efficiently. BMP iteratively searches for atoms in a greedy fashion. The greedy atom search step is solved efficiently via a MAXCUT-like boolean quadratic program. We prove that BMP is guaranteed to converge sublinearly to the optimal solution and recover the factors under mild identifiability conditions. Experiments demonstrate the superior performance of our method over baselines on synthetic and real datasets. We also showcase the application of BMP in quantifying neural interactions underlying high-resolution spatiotemporal ECoG recordings.
Submitted 11 November, 2020; v1 submitted 10 October, 2018;
originally announced October 2018.
-
Scene Learning: Deep Convolutional Networks For Wind Power Prediction by Embedding Turbines into Grid Space
Authors:
Ruiguo Yu,
Zhiqiang Liu,
Xuewei Li,
Wenhuan Lu,
Mei Yu,
Jianrong Wang,
Bin Li
Abstract:
Wind power prediction is of vital importance in wind power utilization. There has been a lot of research based on the time series of wind power or speed, but in fact these time series cannot express the temporal and spatial changes of wind, which fundamentally hinders the advance of wind power prediction. In this paper, a new kind of feature that can describe the process of temporal and spatial variation is proposed, namely, Spatio-Temporal Features. We first map the data collected at each moment from the wind turbines to the plane to form the state map, namely, the scene, according to their relative positions. The scene time series over a period of time is a multi-channel image, i.e., the Spatio-Temporal Features. Based on the Spatio-Temporal Features, a deep convolutional network is applied to predict the wind power, achieving far better accuracy than the existing methods. Compared with the state-of-the-art method, the mean-square error (MSE) of our method is reduced by 49.83%, and the average time cost for training models can be shortened by a factor of more than 150.
Submitted 17 July, 2018; v1 submitted 15 July, 2018;
originally announced July 2018.
-
Utilizing Bluetooth and Adaptive Signal Control Data for Urban Arterials Safety Analysis
Authors:
Jinghui Yuan,
Mohamed Abdel-Aty,
Ling Wang,
Jaeyoung Lee,
Rongjie Yu,
Xuesong Wang
Abstract:
Real-time safety analysis has become a hot research topic as it can more accurately reveal the relationships between real-time traffic characteristics and crash occurrence, and these results could be applied to improve active traffic management systems and enhance safety performance. Most of the previous studies have been applied to freeways and seldom to arterials. This study attempts to examine the relationship between crash occurrence and real-time traffic and weather characteristics based on four urban arterials in Central Florida. Considering the substantial difference between the interrupted urban arterials and the access-controlled freeways, the adaptive signal phasing data was introduced in addition to the traditional traffic data. Bayesian conditional logistic models were developed by incorporating the Bluetooth, adaptive signal control, and weather data, which were extracted for a period of 20 minutes (four 5-minute intervals) before the time of crash occurrence. Model comparison results indicated that the model based on the 5-10 minute interval dataset performs the best. It revealed that the average speed, upstream left-turn volume, downstream green ratio, and rainy indicator have significant effects on crash occurrence. Furthermore, both Bayesian random parameters logistic and Bayesian random parameters conditional logistic models were developed to compare with the Bayesian conditional logistic model, and the Bayesian random parameters conditional logistic model was found to have the best model performance in terms of the AUC and DIC values. These results are important in real-time safety applications in the context of Integrated Active Traffic Management.
Submitted 20 May, 2018;
originally announced May 2018.
-
Real-Time Crash Risk Analysis of Urban Arterials Incorporating Bluetooth, Weather, and Adaptive Signal Control Data
Authors:
Jinghui Yuan,
Mohamed Abdel-Aty,
Ling Wang,
Jaeyoung Lee,
Xuesong Wang,
Rongjie Yu
Abstract:
Real-time safety analysis has become a hot research topic as it can reveal the relationship between real-time traffic characteristics and crash occurrence more accurately, and these results could be applied to improve active traffic management systems and enhance safety performance. Most of the previous studies have been applied to freeways and seldom to arterials. Therefore, this study attempts to examine the relationship between crash occurrence and real-time traffic and weather characteristics based on four urban arterials in Central Florida. Considering the substantial difference between the interrupted traffic flow on urban arterials and the free flow on freeways, the adaptive signal phasing was also introduced in this study. Bayesian conditional logistic models were developed by incorporating the Bluetooth, adaptive signal control, and weather data, which were extracted for a period of 20 minutes (four 5-minute intervals) before the time of crash occurrence. Model comparison results indicate that the model based on the 5-10 minute interval dataset is the most appropriate model. It reveals that the average speed, upstream volume, and rainy weather indicator have significant effects on crash occurrence. Furthermore, both Bayesian logistic and Bayesian random effects logistic models were developed to compare with the Bayesian conditional logistic model, and the Bayesian conditional logistic model was found to be much better than the other two models. These results are important in real-time safety applications in the context of Integrated Active Traffic Management.
Submitted 20 May, 2018;
originally announced May 2018.
-
Tensor Regression Meets Gaussian Processes
Authors:
Rose Yu,
Guangyu Li,
Yan Liu
Abstract:
Low-rank tensor regression, a new model class that learns high-order correlation from data, has recently received considerable attention. At the same time, Gaussian processes (GP) are well-studied machine learning models for structure learning. In this paper, we demonstrate interesting connections between the two, especially for multi-way data analysis. We show that low-rank tensor regression is essentially learning a multi-linear kernel in Gaussian processes, and the low-rank assumption translates to the constrained Bayesian inference problem. We prove the oracle inequality and derive the average case learning curve for the equivalent GP model. Our finding implies that low-rank tensor regression, though empirically successful, is highly dependent on the eigenvalues of covariance functions as well as variable correlations.
Submitted 31 October, 2017;
originally announced October 2017.
-
Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting
Authors:
Yaguang Li,
Rose Yu,
Cyrus Shahabi,
Yan Liu
Abstract:
Spatiotemporal forecasting has various applications in the neuroscience, climate, and transportation domains. Traffic forecasting is one canonical example of such a learning task. The task is challenging due to (1) complex spatial dependency on road networks, (2) non-linear temporal dynamics with changing road conditions and (3) inherent difficulty of long-term forecasting. To address these challenges, we propose to model the traffic flow as a diffusion process on a directed graph and introduce Diffusion Convolutional Recurrent Neural Network (DCRNN), a deep learning framework for traffic forecasting that incorporates both spatial and temporal dependency in the traffic flow. Specifically, DCRNN captures the spatial dependency using bidirectional random walks on the graph, and the temporal dependency using the encoder-decoder architecture with scheduled sampling. We evaluate the framework on two real-world large-scale road network traffic datasets and observe consistent improvement of 12%-15% over state-of-the-art baselines.
Submitted 22 February, 2018; v1 submitted 6 July, 2017;
originally announced July 2017.
-
Socratic Learning: Augmenting Generative Models to Incorporate Latent Subsets in Training Data
Authors:
Paroma Varma,
Bryan He,
Dan Iter,
Peng Xu,
Rose Yu,
Christopher De Sa,
Christopher Ré
Abstract:
A challenge in training discriminative models like neural networks is obtaining enough labeled training data. Recent approaches use generative models to combine weak supervision sources, like user-defined heuristics or knowledge bases, to label training data. Prior work has explored learning accuracies for these sources even without ground truth labels, but they assume that a single accuracy parameter is sufficient to model the behavior of these sources over the entire training set. In particular, they fail to model latent subsets in the training data in which the supervision sources perform differently than on average. We present Socratic learning, a paradigm that uses feedback from a corresponding discriminative model to automatically identify these subsets and augments the structure of the generative model accordingly. Experimentally, we show that without any ground truth labels, the augmented generative model reduces error by up to 56.06% for a relation extraction task compared to a state-of-the-art weak supervision technique that utilizes generative models.
Submitted 28 September, 2017; v1 submitted 25 October, 2016;
originally announced October 2016.