Search | arXiv e-print repository

Anatomy of Elite and Mass Polarization in Social Networks

Authors: Ali Salloum, Ted Hsuan Yun Chen, Mikko Kivelä

Abstract: Existing methods for quantifying polarization in social networks typically report a single value describing the amount of polarization in a social system. While this approach can be used to confirm the observation that many societies have witnessed an increase in political polarization in recent years, it misses the complexities that could be used to understand the reasons behind this phenomenon.… ▽ More Existing methods for quantifying polarization in social networks typically report a single value describing the amount of polarization in a social system. While this approach can be used to confirm the observation that many societies have witnessed an increase in political polarization in recent years, it misses the complexities that could be used to understand the reasons behind this phenomenon. Notably, opposing groups can have unequal impact on polarization, and the elites are often understood to be more divided than the masses, making it critical to differentiate their roles in polarized systems. We propose a method to characterize these distinct hierarchies in polarized networks, enabling separate polarization measurements for these groups within a single social system. Applied to polarized topics in the Finnish Twittersphere surrounding the 2019 and 2023 parliamentary elections, our analysis reveals valuable insights: 1) The impact of opposing groups on observed polarization is rarely balanced, and 2) while the elite strongly contributes to structural polarization and consistently display greater alignment across various topics, the masses have also recently experienced a surge in issue alignment, a special form of polarization. Our findings suggest that the masses may not be as immune to an increasingly polarized environment as previously thought. △ Less

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.10148 [pdf, other]

A Primal-Dual-Assisted Penalty Approach to Bilevel Optimization with Coupled Constraints

Authors: Liuyuan Jiang, Quan Xiao, Victor M. Tenorio, Fernando Real-Rojas, Antonio Marques, Tianyi Chen

Abstract: Interest in bilevel optimization has grown in recent years, partially due to its applications to tackle challenging machine-learning problems. Several exciting recent works have been centered around develo** efficient gradient-based algorithms that can solve bilevel optimization problems with provable guarantees. However, the existing literature mainly focuses on bilevel problems either without… ▽ More Interest in bilevel optimization has grown in recent years, partially due to its applications to tackle challenging machine-learning problems. Several exciting recent works have been centered around develo** efficient gradient-based algorithms that can solve bilevel optimization problems with provable guarantees. However, the existing literature mainly focuses on bilevel problems either without constraints, or featuring only simple constraints that do not couple variables across the upper and lower levels, excluding a range of complex applications. Our paper studies this challenging but less explored scenario and develops a (fully) first-order algorithm, which we term BLOCC, to tackle BiLevel Optimization problems with Coupled Constraints. We establish rigorous convergence theory for the proposed algorithm and demonstrate its effectiveness on two well-known real-world applications - hyperparameter selection in support vector machine (SVM) and infrastructure planning in transportation networks using the real data from the city of Seville. △ Less

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.04713 [pdf, other]

FlowMM: Generating Materials with Riemannian Flow Matching

Authors: Benjamin Kurt Miller, Ricky T. Q. Chen, Anuroop Sriram, Brandon M Wood

Abstract: Crystalline materials are a fundamental component in next-generation technologies, yet modeling their distribution presents unique computational challenges. Of the plausible arrangements of atoms in a periodic lattice only a vanishingly small percentage are thermodynamically stable, which is a key indicator of the materials that can be experimentally realized. Two fundamental tasks in this area ar… ▽ More Crystalline materials are a fundamental component in next-generation technologies, yet modeling their distribution presents unique computational challenges. Of the plausible arrangements of atoms in a periodic lattice only a vanishingly small percentage are thermodynamically stable, which is a key indicator of the materials that can be experimentally realized. Two fundamental tasks in this area are to (a) predict the stable crystal structure of a known composition of elements and (b) propose novel compositions along with their stable structures. We present FlowMM, a pair of generative models that achieve state-of-the-art performance on both tasks while being more efficient and more flexible than competing methods. We generalize Riemannian Flow Matching to suit the symmetries inherent to crystals: translation, rotation, permutation, and periodic boundary conditions. Our framework enables the freedom to choose the flow base distributions, drastically simplifying the problem of learning crystal structures compared with diffusion models. In addition to standard benchmarks, we validate FlowMM's generated structures with quantum chemistry calculations, demonstrating that it is about 3x more efficient, in terms of integration steps, at finding stable materials compared to previous open methods. △ Less

Submitted 7 June, 2024; originally announced June 2024.

Comments: https://github.com/facebookresearch/flowmm

Journal ref: ICML 2024

arXiv:2406.00288 [pdf, other]

Neural Optimal Transport with Lagrangian Costs

Authors: Aram-Alexandre Pooladian, Carles Domingo-Enrich, Ricky T. Q. Chen, Brandon Amos

Abstract: We investigate the optimal transport problem between probability measures when the underlying cost function is understood to satisfy a least action principle, also known as a Lagrangian cost. These generalizations are useful when connecting observations from a physical system where the transport dynamics are influenced by the geometry of the system, such as obstacles (e.g., incorporating barrier f… ▽ More We investigate the optimal transport problem between probability measures when the underlying cost function is understood to satisfy a least action principle, also known as a Lagrangian cost. These generalizations are useful when connecting observations from a physical system where the transport dynamics are influenced by the geometry of the system, such as obstacles (e.g., incorporating barrier functions in the Lagrangian), and allows practitioners to incorporate a priori knowledge of the underlying system such as non-Euclidean geometries (e.g., paths must be circular). Our contributions are of computational interest, where we demonstrate the ability to efficiently compute geodesics and amortize spline-based paths, which has not been done before, even in low dimensional problems. Unlike prior work, we also output the resulting Lagrangian optimal transport map without requiring an ODE solver. We demonstrate the effectiveness of our formulation on low-dimensional examples taken from prior work. The source code to reproduce our experiments is available at https://github.com/facebookresearch/lagrangian-ot. △ Less

Submitted 31 May, 2024; originally announced June 2024.

Comments: UAI 2024

arXiv:2405.16381 [pdf, other]

Trivialized Momentum Facilitates Diffusion Generative Modeling on Lie Groups

Authors: Yuchen Zhu, Tianrong Chen, Lingkai Kong, Evangelos A. Theodorou, Molei Tao

Abstract: The generative modeling of data on manifold is an important task, for which diffusion models in flat spaces typically need nontrivial adaptations. This article demonstrates how a technique called `trivialization' can transfer the effectiveness of diffusion models in Euclidean spaces to Lie groups. In particular, an auxiliary momentum variable was algorithmically introduced to help transport the po… ▽ More The generative modeling of data on manifold is an important task, for which diffusion models in flat spaces typically need nontrivial adaptations. This article demonstrates how a technique called `trivialization' can transfer the effectiveness of diffusion models in Euclidean spaces to Lie groups. In particular, an auxiliary momentum variable was algorithmically introduced to help transport the position variable between data distribution and a fixed, easy-to-sample distribution. Normally, this would incur further difficulty for manifold data because momentum lives in a space that changes with the position. However, our trivialization technique creates to a new momentum variable that stays in a simple $\textbf{fixed vector space}$. This design, together with a manifold preserving integrator, simplifies implementation and avoids inaccuracies created by approximations such as projections to tangent space and manifold, which were typically used in prior work, hence facilitating generation with high-fidelity and efficiency. The resulting method achieves state-of-the-art performance on protein and RNA torsion angle generation and sophisticated torus datasets. We also, arguably for the first time, tackle the generation of data on high-dimensional Special Orthogonal and Unitary groups, the latter essential for quantum problems. △ Less

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.15920 [pdf, other]

SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning

Authors: Shuai Zhang, Heshan Devaka Fernando, Miao Liu, Keerthiram Murugesan, Songtao Lu, Pin-Yu Chen, Tianyi Chen, Meng Wang

Abstract: This paper studies the transfer reinforcement learning (RL) problem where multiple RL problems have different reward functions but share the same underlying transition dynamics. In this setting, the Q-function of each RL problem (task) can be decomposed into a successor feature (SF) and a reward map**: the former characterizes the transition dynamics, and the latter characterizes the task-specif… ▽ More This paper studies the transfer reinforcement learning (RL) problem where multiple RL problems have different reward functions but share the same underlying transition dynamics. In this setting, the Q-function of each RL problem (task) can be decomposed into a successor feature (SF) and a reward map**: the former characterizes the transition dynamics, and the latter characterizes the task-specific reward function. This Q-function decomposition, coupled with a policy improvement operator known as generalized policy improvement (GPI), reduces the sample complexity of finding the optimal Q-function, and thus the SF \& GPI framework exhibits promising empirical performance compared to traditional RL methods like Q-learning. However, its theoretical foundations remain largely unestablished, especially when learning the successor features using deep neural networks (SF-DQN). This paper studies the provable knowledge transfer using SFs-DQN in transfer RL problems. We establish the first convergence analysis with provable generalization guarantees for SF-DQN with GPI. The theory reveals that SF-DQN with GPI outperforms conventional RL approaches, such as deep Q-network, in terms of both faster convergence rate and better generalization. Numerical experiments on real and synthetic RL tasks support the superior performance of SF-DQN \& GPI, aligning with our theoretical findings. △ Less

Submitted 24 May, 2024; originally announced May 2024.

Comments: arXiv admin note: text overlap with arXiv:2310.16173

arXiv:2405.11111 [pdf, other]

Euclidean mirrors and first-order changepoints in network time series

Authors: Tianyi Chen, Zachary Lubberts, Avanti Athreya, Youngser Park, Carey E. Priebe

Abstract: We describe a model for a network time series whose evolution is governed by an underlying stochastic process, known as the latent position process, in which network evolution can be represented in Euclidean space by a curve, called the Euclidean mirror. We define the notion of a first-order changepoint for a time series of networks, and construct a family of latent position process networks with… ▽ More We describe a model for a network time series whose evolution is governed by an underlying stochastic process, known as the latent position process, in which network evolution can be represented in Euclidean space by a curve, called the Euclidean mirror. We define the notion of a first-order changepoint for a time series of networks, and construct a family of latent position process networks with underlying first-order changepoints. We prove that a spectral estimate of the associated Euclidean mirror localizes these changepoints, even when the graph distribution evolves continuously, but at a rate that changes. Simulated and real data examples on organoid networks show that this localization captures empirically significant shifts in network evolution. △ Less

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.07098 [pdf, other]

Interpretable global minima of deep ReLU neural networks on sequentially separable data

Authors: Thomas Chen, Patricia Muñoz Ewald

Abstract: We explicitly construct zero loss neural network classifiers. We write the weight matrices and bias vectors in terms of cumulative parameters, which determine truncation maps acting recursively on input space. The configurations for the training data considered are (i) sufficiently small, well separated clusters corresponding to each class, and (ii) equivalence classes which are sequentially linea… ▽ More We explicitly construct zero loss neural network classifiers. We write the weight matrices and bias vectors in terms of cumulative parameters, which determine truncation maps acting recursively on input space. The configurations for the training data considered are (i) sufficiently small, well separated clusters corresponding to each class, and (ii) equivalence classes which are sequentially linearly separable. In the best case, for $Q$ classes of data in $\mathbb{R}^M$, global minimizers can be described with $Q(M+2)$ parameters. △ Less

Submitted 11 May, 2024; originally announced May 2024.

Comments: AMS Latex, 22 pages, 3 figures

MSC Class: 57R70; 62M45

arXiv:2404.06336 [pdf, other]

Quantum State Generation with Structure-Preserving Diffusion Model

Authors: Yuchen Zhu, Tianrong Chen, Evangelos A. Theodorou, Xie Chen, Molei Tao

Abstract: This article considers the generative modeling of the (mixed) states of quantum systems, and an approach based on denoising diffusion model is proposed. The key contribution is an algorithmic innovation that respects the physical nature of quantum states. More precisely, the commonly used density matrix representation of mixed-state has to be complex-valued Hermitian, positive semi-definite, and t… ▽ More This article considers the generative modeling of the (mixed) states of quantum systems, and an approach based on denoising diffusion model is proposed. The key contribution is an algorithmic innovation that respects the physical nature of quantum states. More precisely, the commonly used density matrix representation of mixed-state has to be complex-valued Hermitian, positive semi-definite, and trace one. Generic diffusion models, or other generative methods, may not be able to generate data that strictly satisfy these structural constraints, even if all training data do. To develop a machine learning algorithm that has physics hard-wired in, we leverage mirror diffusion and borrow the physical notion of von Neumann entropy to design a new map, for enabling strict structure-preserving generation. Both unconditional generation and conditional generation via classifier-free guidance are experimentally demonstrated efficacious, the latter enabling the design of new quantum states when generated on unseen labels. △ Less

Submitted 25 May, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

arXiv:2403.02060 [pdf, other]

Expectile Periodograms

Authors: Tianbo Chen

Abstract: In this paper, we introduce a periodogram-like function, called expectile periodograms, for detecting and estimating hidden periodicity from observations with asymmetrically distributed noise. The expectile periodograms are constructed from trigonometric expectile regression where a specially designed objective function is used to substitute the squared $l_2$ norm that leads to the ordinary period… ▽ More In this paper, we introduce a periodogram-like function, called expectile periodograms, for detecting and estimating hidden periodicity from observations with asymmetrically distributed noise. The expectile periodograms are constructed from trigonometric expectile regression where a specially designed objective function is used to substitute the squared $l_2$ norm that leads to the ordinary periodograms. The expectile periodograms have properties which are analogous to quantile periodograms, which provide a broader view of the time series by examining different expectile levels, but are much faster to calculate. The asymptotic properties are discussed and simulations show its efficiency and robustness in the presence of hidden periodicities with asymmetric or heavy-tailed noise. Finally, we leverage the inherent two-dimensional characteristics of the expectile periodograms and train a deep-learning (DL) model to classify the earthquake waveform data. Remarkably, our approach achieves heightened classification testing accuracy when juxtaposed with alternative periodogram-based methodologies. △ Less

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.06886 [pdf, other]

Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF

Authors: Han Shen, Zhuoran Yang, Tianyi Chen

Abstract: Bilevel optimization has been recently applied to many machine learning tasks. However, their applications have been restricted to the supervised learning setting, where static objective functions with benign structures are considered. But bilevel problems such as incentive design, inverse reinforcement learning (RL), and RL from human feedback (RLHF) are often modeled as dynamic objective functio… ▽ More Bilevel optimization has been recently applied to many machine learning tasks. However, their applications have been restricted to the supervised learning setting, where static objective functions with benign structures are considered. But bilevel problems such as incentive design, inverse reinforcement learning (RL), and RL from human feedback (RLHF) are often modeled as dynamic objective functions that go beyond the simple static objective structures, which pose significant challenges of using existing bilevel solutions. To tackle this new class of bilevel problems, we introduce the first principled algorithmic framework for solving bilevel RL problems through the lens of penalty formulation. We provide theoretical studies of the problem landscape and its penalty-based (policy) gradient algorithms. We demonstrate the effectiveness of our algorithms via simulations in the Stackelberg Markov game, RL from human feedback and incentive design. △ Less

Submitted 31 May, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

Comments: Shorter version accepted to ICML 2024

arXiv:2401.06980 [pdf, other]

Joint Unsupervised and Supervised Training for Automatic Speech Recognition via Bilevel Optimization

Authors: A F M Saif, Xiaodong Cui, Han Shen, Songtao Lu, Brian Kingsbury, Tianyi Chen

Abstract: In this paper, we present a novel bilevel optimization-based training approach to training acoustic models for automatic speech recognition (ASR) tasks that we term {bi-level joint unsupervised and supervised training (BL-JUST)}. {BL-JUST employs a lower and upper level optimization with an unsupervised loss and a supervised loss respectively, leveraging recent advances in penalty-based bilevel op… ▽ More In this paper, we present a novel bilevel optimization-based training approach to training acoustic models for automatic speech recognition (ASR) tasks that we term {bi-level joint unsupervised and supervised training (BL-JUST)}. {BL-JUST employs a lower and upper level optimization with an unsupervised loss and a supervised loss respectively, leveraging recent advances in penalty-based bilevel optimization to solve this challenging ASR problem with affordable complexity and rigorous convergence guarantees.} To evaluate BL-JUST, extensive experiments on the LibriSpeech and TED-LIUM v2 datasets have been conducted. BL-JUST achieves superior performance over the commonly used pre-training followed by fine-tuning strategy. △ Less

Submitted 13 January, 2024; originally announced January 2024.

Comments: This paper has been accepted in ICASSP-2024 conference

arXiv:2401.03482 [pdf, other]

Uncertainty Quantification on Clinical Trial Outcome Prediction

Authors: Tianyi Chen, Yingzhou Lu, Nan Hao, Capucine Van Rechem, **tai Chen, Tianfan Fu

Abstract: The importance of uncertainty quantification is increasingly recognized in the diverse field of machine learning. Accurately assessing model prediction uncertainty can help provide deeper understanding and confidence for researchers and practitioners. This is especially critical in medical diagnosis and drug discovery areas, where reliable predictions directly impact research quality and patient h… ▽ More The importance of uncertainty quantification is increasingly recognized in the diverse field of machine learning. Accurately assessing model prediction uncertainty can help provide deeper understanding and confidence for researchers and practitioners. This is especially critical in medical diagnosis and drug discovery areas, where reliable predictions directly impact research quality and patient health. In this paper, we proposed incorporating uncertainty quantification into clinical trial outcome predictions. Our main goal is to enhance the model's ability to discern nuanced differences, thereby significantly improving its overall performance. We have adopted a selective classification approach to fulfill our objective, integrating it seamlessly with the Hierarchical Interaction Network (HINT), which is at the forefront of clinical trial prediction modeling. Selective classification, encompassing a spectrum of methods for uncertainty quantification, empowers the model to withhold decision-making in the face of samples marked by ambiguity or low confidence, thereby amplifying the accuracy of predictions for the instances it chooses to classify. A series of comprehensive experiments demonstrate that incorporating selective classification into clinical trial predictions markedly enhances the model's performance, as evidenced by significant upticks in pivotal metrics such as PR-AUC, F1, ROC-AUC, and overall accuracy. Specifically, the proposed method achieved 32.37\%, 21.43\%, and 13.27\% relative improvement on PR-AUC over the base model (HINT) in phase I, II, and III trial outcome prediction, respectively. When predicting phase III, our method reaches 0.9022 PR-AUC scores. These findings illustrate the robustness and prospective utility of this strategy within the area of clinical trial predictions, potentially setting a new benchmark in the field. △ Less

Submitted 18 June, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

arXiv:2401.03228 [pdf, other]

Reflected Schrödinger Bridge for Constrained Generative Modeling

Authors: Wei Deng, Yu Chen, Nicole Tianjiao Yang, Hengrong Du, Qi Feng, Ricky T. Q. Chen

Abstract: Diffusion models have become the go-to method for large-scale generative models in real-world applications. These applications often involve data distributions confined within bounded domains, typically requiring ad-hoc thresholding techniques for boundary enforcement. Reflected diffusion models (Lou23) aim to enhance generalizability by generating the data distribution through a backward process… ▽ More Diffusion models have become the go-to method for large-scale generative models in real-world applications. These applications often involve data distributions confined within bounded domains, typically requiring ad-hoc thresholding techniques for boundary enforcement. Reflected diffusion models (Lou23) aim to enhance generalizability by generating the data distribution through a backward process governed by reflected Brownian motion. However, reflected diffusion models may not easily adapt to diverse domains without the derivation of proper diffeomorphic map**s and do not guarantee optimal transport properties. To overcome these limitations, we introduce the Reflected Schrodinger Bridge algorithm: an entropy-regularized optimal transport approach tailored for generating data within diverse bounded domains. We derive elegant reflected forward-backward stochastic differential equations with Neumann and Robin boundary conditions, extend divergence-based likelihood training to bounded domains, and explore natural connections to entropic optimal transport for the study of approximate linear convergence - a valuable insight for practical training. Our algorithm yields robust generative modeling in diverse domains, and its scalability is demonstrated in real-world constrained generative modeling through standard image benchmarks. △ Less

Submitted 6 January, 2024; originally announced January 2024.

arXiv:2312.13331 [pdf, other]

A Bayesian Spatial Berkson error approach to estimate small area opioid mortality rates accounting for population-at-risk uncertainty

Authors: Emily N Peterson, Rachel C. Nethery, Jarvis T. Chen, Loni P. Tabb, Brent A. Coull, Frederic B. Piel, Lance A Waller

Abstract: Monitoring small-area geographical population trends in opioid mortality has large scale implications to informing preventative resource allocation. A common approach to obtain small area estimates of opioid mortality is to use a standard disease map** approach in which population-at-risk estimates are treated as fixed and known. Assuming fixed populations ignores the uncertainty surrounding sma… ▽ More Monitoring small-area geographical population trends in opioid mortality has large scale implications to informing preventative resource allocation. A common approach to obtain small area estimates of opioid mortality is to use a standard disease map** approach in which population-at-risk estimates are treated as fixed and known. Assuming fixed populations ignores the uncertainty surrounding small area population estimates, which may bias risk estimates and under-estimate their associated uncertainties. We present a Bayesian Spatial Berkson Error (BSBE) model to incorporate population-at-risk uncertainty within a disease map** model. We compare the BSBE approach to the naive (treating denominators as fixed) using simulation studies to illustrate potential bias resulting from this assumption. We show the application of the BSBE model to obtain 2020 opioid mortality risk estimates for 159 counties in GA accounting for population-at-risk uncertainty. Utilizing our proposed approach will help to inform interventions in opioid related public health responses, policies, and resource allocation. Additionally, we provide a general framework to improve in the estimation and map** of health indicators. △ Less

Submitted 20 December, 2023; originally announced December 2023.

arXiv:2312.05250 [pdf, other]

TaskMet: Task-Driven Metric Learning for Model Learning

Authors: Dishank Bansal, Ricky T. Q. Chen, Mustafa Mukadam, Brandon Amos

Abstract: Deep learning models are often deployed in downstream tasks that the training procedure may not be aware of. For example, models solely trained to achieve accurate predictions may struggle to perform well on downstream tasks because seemingly small prediction errors may incur drastic task errors. The standard end-to-end learning approach is to make the task loss differentiable or to introduce a di… ▽ More Deep learning models are often deployed in downstream tasks that the training procedure may not be aware of. For example, models solely trained to achieve accurate predictions may struggle to perform well on downstream tasks because seemingly small prediction errors may incur drastic task errors. The standard end-to-end learning approach is to make the task loss differentiable or to introduce a differentiable surrogate that the model can be trained on. In these settings, the task loss needs to be carefully balanced with the prediction loss because they may have conflicting objectives. We propose take the task loss signal one level deeper than the parameters of the model and use it to learn the parameters of the loss function the model is trained on, which can be done by learning a metric in the prediction space. This approach does not alter the optimal prediction model itself, but rather changes the model learning to emphasize the information important for the downstream task. This enables us to achieve the best of both worlds: a prediction model trained in the original prediction space while also being valuable for the desired downstream task. We validate our approach through experiments conducted in two main settings: 1) decision-focused model learning scenarios involving portfolio optimization and budget allocation, and 2) reinforcement learning in noisy environments with distracting states. The source code to reproduce our experiments is available at https://github.com/facebookresearch/taskmet △ Less

Submitted 8 December, 2023; originally announced December 2023.

Comments: NeurIPS 2023

arXiv:2312.02027 [pdf, other]

Stochastic Optimal Control Matching

Authors: Carles Domingo-Enrich, Jiequn Han, Brandon Amos, Joan Bruna, Ricky T. Q. Chen

Abstract: Stochastic optimal control, which has the goal of driving the behavior of noisy systems, is broadly applicable in science, engineering and artificial intelligence. Our work introduces Stochastic Optimal Control Matching (SOCM), a novel Iterative Diffusion Optimization (IDO) technique for stochastic optimal control that stems from the same philosophy as the conditional score matching loss for diffu… ▽ More Stochastic optimal control, which has the goal of driving the behavior of noisy systems, is broadly applicable in science, engineering and artificial intelligence. Our work introduces Stochastic Optimal Control Matching (SOCM), a novel Iterative Diffusion Optimization (IDO) technique for stochastic optimal control that stems from the same philosophy as the conditional score matching loss for diffusion models. That is, the control is learned via a least squares problem by trying to fit a matching vector field. The training loss, which is closely connected to the cross-entropy loss, is optimized with respect to both the control function and a family of reparameterization matrices which appear in the matching vector field. The optimization with respect to the reparameterization matrices aims at minimizing the variance of the matching vector field. Experimentally, our algorithm achieves lower error than all the existing IDO techniques for stochastic optimal control for three out of four control problems, in some cases by an order of magnitude. The key idea underlying SOCM is the path-wise reparameterization trick, a novel technique that may be of independent interest. Code at https://github.com/facebookresearch/SOC-matching △ Less

Submitted 28 June, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

arXiv:2312.01260 [pdf, other]

Rethinking PGD Attack: Is Sign Function Necessary?

Authors: Junjie Yang, Tianlong Chen, Xuxi Chen, Zhangyang Wang, Yingbin Liang

Abstract: Neural networks have demonstrated success in various domains, yet their performance can be significantly degraded by even a small input perturbation. Consequently, the construction of such perturbations, known as adversarial attacks, has gained significant attention, many of which fall within "white-box" scenarios where we have full access to the neural network. Existing attack algorithms, such as… ▽ More Neural networks have demonstrated success in various domains, yet their performance can be significantly degraded by even a small input perturbation. Consequently, the construction of such perturbations, known as adversarial attacks, has gained significant attention, many of which fall within "white-box" scenarios where we have full access to the neural network. Existing attack algorithms, such as the projected gradient descent (PGD), commonly take the sign function on the raw gradient before updating adversarial inputs, thereby neglecting gradient magnitude information. In this paper, we present a theoretical analysis of how such sign-based update algorithm influences step-wise attack performance, as well as its caveat. We also interpret why previous attempts of directly using raw gradients failed. Based on that, we further propose a new raw gradient descent (RGD) algorithm that eliminates the use of sign. Specifically, we convert the constrained optimization problem into an unconstrained one, by introducing a new hidden variable of non-clipped perturbation that can move beyond the constraint. The effectiveness of the proposed RGD algorithm has been demonstrated extensively in experiments, outperforming PGD and other competitors in various settings, without incurring any additional computational overhead. The codes is available in https://github.com/JunjieYang97/RGD. △ Less

Submitted 20 May, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

arXiv:2311.16375 [pdf, other]

Testing for a difference in means of a single feature after clustering

Authors: Yiqun T. Chen, Lucy L. Gao

Abstract: For many applications, it is critical to interpret and validate groups of observations obtained via clustering. A common validation approach involves testing differences in feature means between observations in two estimated clusters. In this setting, classical hypothesis tests lead to an inflated Type I error rate. To overcome this problem, we propose a new test for the difference in means in a s… ▽ More For many applications, it is critical to interpret and validate groups of observations obtained via clustering. A common validation approach involves testing differences in feature means between observations in two estimated clusters. In this setting, classical hypothesis tests lead to an inflated Type I error rate. To overcome this problem, we propose a new test for the difference in means in a single feature between a pair of clusters obtained using hierarchical or $k$-means clustering. The test based on the proposed $p$-value controls the selective Type I error rate in finite samples and can be efficiently computed. We further illustrate the validity and power of our proposal in simulation and demonstrate its use on single-cell RNA-sequencing data. △ Less

Submitted 27 November, 2023; originally announced November 2023.

MSC Class: 62H30; 62H15; 62P10

arXiv:2311.15487 [pdf, ps, other]

Global $\mathcal{L}^2$ minimization at uniform exponential rate via geometrically adapted gradient descent in Deep Learning

Authors: Thomas Chen

Abstract: We consider the scenario of supervised learning in Deep Learning (DL) networks, and exploit the arbitrariness of choice in the Riemannian metric relative to which the gradient descent flow can be defined (a general fact of differential geometry). In the standard approach to DL, the gradient flow on the space of parameters (weights and biases) is defined with respect to the Euclidean metric. Here i… ▽ More We consider the scenario of supervised learning in Deep Learning (DL) networks, and exploit the arbitrariness of choice in the Riemannian metric relative to which the gradient descent flow can be defined (a general fact of differential geometry). In the standard approach to DL, the gradient flow on the space of parameters (weights and biases) is defined with respect to the Euclidean metric. Here instead, we choose the gradient flow with respect to the Euclidean metric in the output layer of the DL network. This naturally induces two modified versions of the gradient descent flow in the parameter space, one adapted for the overparametrized setting, and the other for the underparametrized setting. In the overparametrized case, we prove that, provided that a rank condition holds, all orbits of the modified gradient descent drive the ${\mathcal L}^2$ cost to its global minimum at a uniform exponential convergence rate; one thereby obtains an a priori stop** time for any prescribed proximity to the global minimum. We point out relations of the latter to sub-Riemannian geometry. Moreover, we generalize the above framework to the situation in which the rank condition does not hold; in particular, we show that local equilibria can only exist if a rank loss occurs, and that generically, they are not isolated points, but elements of a critical submanifold of parameter space. △ Less

Submitted 10 April, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

Comments: AMS Latex, 20 pages. Significantly edited and extended, abstract changed

MSC Class: 57R70; 62M45

arXiv:2311.13443 [pdf, other]

Guided Flows for Generative Modeling and Decision Making

Authors: Qinqing Zheng, Matt Le, Neta Shaul, Yaron Lipman, Aditya Grover, Ricky T. Q. Chen

Abstract: Classifier-free guidance is a key component for enhancing the performance of conditional generative models across diverse tasks. While it has previously demonstrated remarkable improvements for the sample quality, it has only been exclusively employed for diffusion models. In this paper, we integrate classifier-free guidance into Flow Matching (FM) models, an alternative simulation-free approach t… ▽ More Classifier-free guidance is a key component for enhancing the performance of conditional generative models across diverse tasks. While it has previously demonstrated remarkable improvements for the sample quality, it has only been exclusively employed for diffusion models. In this paper, we integrate classifier-free guidance into Flow Matching (FM) models, an alternative simulation-free approach that trains Continuous Normalizing Flows (CNFs) based on regressing vector fields. We explore the usage of \emph{Guided Flows} for a variety of downstream applications. We show that Guided Flows significantly improves the sample quality in conditional image generation and zero-shot text-to-speech synthesis, boasting state-of-the-art performance. Notably, we are the first to apply flow models for plan generation in the offline reinforcement learning setting, showcasing a 10x speedup in computation compared to diffusion models while maintaining comparable performance. △ Less

Submitted 7 December, 2023; v1 submitted 22 November, 2023; originally announced November 2023.

arXiv:2311.11913 [pdf, other]

Deep Calibration of Market Simulations using Neural Density Estimators and Embedding Networks

Authors: Namid R. Stillman, Rory Baggott, Justin Lyon, Jianfei Zhang, Dingqiu Zhu, Tao Chen, Perukrishnen Vytelingum

Abstract: The ability to construct a realistic simulator of financial exchanges, including reproducing the dynamics of the limit order book, can give insight into many counterfactual scenarios, such as a flash crash, a margin call, or changes in macroeconomic outlook. In recent years, agent-based models have been developed that reproduce many features of an exchange, as summarised by a set of stylised facts… ▽ More The ability to construct a realistic simulator of financial exchanges, including reproducing the dynamics of the limit order book, can give insight into many counterfactual scenarios, such as a flash crash, a margin call, or changes in macroeconomic outlook. In recent years, agent-based models have been developed that reproduce many features of an exchange, as summarised by a set of stylised facts and statistics. However, the ability to calibrate simulators to a specific period of trading remains an open challenge. In this work, we develop a novel approach to the calibration of market simulators by leveraging recent advances in deep learning, specifically using neural density estimators and embedding networks. We demonstrate that our approach is able to correctly identify high probability parameter sets, both when applied to synthetic and historical data, and without reliance on manually selected or weighted ensembles of stylised facts. △ Less

Submitted 27 November, 2023; v1 submitted 20 November, 2023; originally announced November 2023.

Comments: 4th ACM International Conference on AI in Finance (ICAIF 2023)

arXiv:2311.07065 [pdf, ps, other]

Non-approximability of constructive global $\mathcal{L}^2$ minimizers by gradient descent in Deep Learning

Authors: Thomas Chen, Patricia Muñoz Ewald

Abstract: We analyze geometric aspects of the gradient descent algorithm in Deep Learning (DL) networks. In particular, we prove that the globally minimizing weights and biases for the $\mathcal{L}^2$ cost obtained constructively in [Chen-Munoz Ewald 2023] for underparametrized ReLU DL networks can generically not be approximated via the gradient descent flow. We therefore conclude that the method introduce… ▽ More We analyze geometric aspects of the gradient descent algorithm in Deep Learning (DL) networks. In particular, we prove that the globally minimizing weights and biases for the $\mathcal{L}^2$ cost obtained constructively in [Chen-Munoz Ewald 2023] for underparametrized ReLU DL networks can generically not be approximated via the gradient descent flow. We therefore conclude that the method introduced in [Chen-Munoz Ewald 2023] is disjoint from the gradient descent method. △ Less

Submitted 12 November, 2023; originally announced November 2023.

Comments: AMS Latex, 7 pages

MSC Class: 57R70; 62M45

arXiv:2311.06978 [pdf, other]

Augmented Bridge Matching

Authors: Valentin De Bortoli, Guan-Horng Liu, Tianrong Chen, Evangelos A. Theodorou, Weilie Nie

Abstract: Flow and bridge matching are a novel class of processes which encompass diffusion models. One of the main aspect of their increased flexibility is that these models can interpolate between arbitrary data distributions i.e. they generalize beyond generative modeling and can be applied to learning stochastic (and deterministic) processes of arbitrary transfer tasks between two given distributions. I… ▽ More Flow and bridge matching are a novel class of processes which encompass diffusion models. One of the main aspect of their increased flexibility is that these models can interpolate between arbitrary data distributions i.e. they generalize beyond generative modeling and can be applied to learning stochastic (and deterministic) processes of arbitrary transfer tasks between two given distributions. In this paper, we highlight that while flow and bridge matching processes preserve the information of the marginal distributions, they do \emph{not} necessarily preserve the coupling information unless additional, stronger optimality conditions are met. This can be problematic if one aims at preserving the original empirical pairing. We show that a simple modification of the matching process recovers this coupling by augmenting the velocity field (or drift) with the information of the initial sample point. Doing so, we lose the Markovian property of the process but preserve the coupling information between distributions. We illustrate the efficiency of our augmentation in learning mixture of image translation tasks. △ Less

Submitted 12 November, 2023; originally announced November 2023.

arXiv:2311.04691 [pdf]

Sustainable Collaborative Strategy in Pharmaceutical Refrigerated Logistics Routing Problem

Authors: Tingting Chen, Feng Chu, Jiantong Zhang, Jiaqing Sun

Abstract: The rapid growth of pharmaceutical refrigerated logistics poses sustainability challenges, including elevated costs, energy consumption, and resource inefficiency. Collaborating multiple depots can enhance logistics efficiency when standalone distribution centers have limited transport resources, i.e., refrigerated vehicles. However, the sustainable benefits and performance across different strate… ▽ More The rapid growth of pharmaceutical refrigerated logistics poses sustainability challenges, including elevated costs, energy consumption, and resource inefficiency. Collaborating multiple depots can enhance logistics efficiency when standalone distribution centers have limited transport resources, i.e., refrigerated vehicles. However, the sustainable benefits and performance across different strategies remain unexplored. This study fills this research gap by addressing a refrigerated pharmaceutical routing problem. While many collaborative strategies prioritize economic and environmental benefits, our approach highlights a vital social indicator: maintaining vehicle flow equilibrium at each depot during collaboration. This ensures the stability of transport resources for all stakeholders, promoting sustainable collaborative logistics. The problem is formulated as a multi-depot vehicle routing problem with time windows (MDVRPTW). Three collaborative strategies using Clustering VRP (CLUVRP) and improved Open VRP (OVRP) are proposed and compared. We develop two approaches to address traditional OVRP limitations in ensuring vehicle flow equilibrium at each depot. Our models consider perishable pharmaceuticals and time-dependent travel speeds. Three hybrid heuristics based on Simulated Annealing and Variable Neighborhood Search (SAVNS) are proposed and evaluated for efficacy. Computational experiments and a case study demonstrate distinct sustainable benefits across various strategies, offering valuable insights for decision-makers in the refrigerated logistics market. △ Less

Submitted 8 November, 2023; originally announced November 2023.

arXiv:2310.08649 [pdf, other]

Time-vectorized numerical integration for systems of ODEs

Authors: Mark C. Messner, Tianchen Hu, Tianju Chen

Abstract: Stiff systems of ordinary differential equations (ODEs) and sparse training data are common in scientific problems. This paper describes efficient, implicit, vectorized methods for integrating stiff systems of ordinary differential equations through time and calculating parameter gradients with the adjoint method. The main innovation is to vectorize the problem both over the number of independent… ▽ More Stiff systems of ordinary differential equations (ODEs) and sparse training data are common in scientific problems. This paper describes efficient, implicit, vectorized methods for integrating stiff systems of ordinary differential equations through time and calculating parameter gradients with the adjoint method. The main innovation is to vectorize the problem both over the number of independent times series and over a batch or "chunk" of sequential time steps, effectively vectorizing the assembly of the implicit system of ODEs. The block-bidiagonal structure of the linearized implicit system for the backward Euler method allows for further vectorization using parallel cyclic reduction (PCR). Vectorizing over both axes of the input data provides a higher bandwidth of calculations to the computing device, allowing even problems with comparatively sparse data to fully utilize modern GPUs and achieving speed ups of greater than 100x, compared to standard, sequential time integration. We demonstrate the advantages of implicit, vectorized time integration with several example problems, drawn from both analytical stiff and non-stiff ODE models as well as neural ODE models. We also describe and provide a freely available open-source implementation of the methods developed here. △ Less

Submitted 12 October, 2023; originally announced October 2023.

arXiv:2310.02679 [pdf, other]

Diffusion Generative Flow Samplers: Improving learning signals through partial trajectory optimization

Authors: Dinghuai Zhang, Ricky T. Q. Chen, Cheng-Hao Liu, Aaron Courville, Yoshua Bengio

Abstract: We tackle the problem of sampling from intractable high-dimensional density functions, a fundamental task that often appears in machine learning and statistics. We extend recent sampling-based approaches that leverage controlled stochastic processes to model approximate samples from these target densities. The main drawback of these approaches is that the training objective requires full trajector… ▽ More We tackle the problem of sampling from intractable high-dimensional density functions, a fundamental task that often appears in machine learning and statistics. We extend recent sampling-based approaches that leverage controlled stochastic processes to model approximate samples from these target densities. The main drawback of these approaches is that the training objective requires full trajectories to compute, resulting in sluggish credit assignment issues due to use of entire trajectories and a learning signal present only at the terminal time. In this work, we present Diffusion Generative Flow Samplers (DGFS), a sampling-based framework where the learning process can be tractably broken down into short partial trajectory segments, via parameterizing an additional "flow function". Our method takes inspiration from the theory developed for generative flow networks (GFlowNets), allowing us to make use of intermediate learning signals. Through various challenging experiments, we demonstrate that DGFS achieves more accurate estimates of the normalization constant than closely-related prior methods. △ Less

Submitted 9 March, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: Accepted by ICLR 2024

arXiv:2310.02233 [pdf, other]

Generalized Schrödinger Bridge Matching

Authors: Guan-Horng Liu, Yaron Lipman, Maximilian Nickel, Brian Karrer, Evangelos A. Theodorou, Ricky T. Q. Chen

Abstract: Modern distribution matching algorithms for training diffusion or flow models directly prescribe the time evolution of the marginal distributions between two boundary distributions. In this work, we consider a generalized distribution matching setup, where these marginals are only implicitly described as a solution to some task-specific objective function. The problem setup, known as the Generaliz… ▽ More Modern distribution matching algorithms for training diffusion or flow models directly prescribe the time evolution of the marginal distributions between two boundary distributions. In this work, we consider a generalized distribution matching setup, where these marginals are only implicitly described as a solution to some task-specific objective function. The problem setup, known as the Generalized Schrödinger Bridge (GSB), appears prevalently in many scientific areas both within and without machine learning. We propose Generalized Schrödinger Bridge Matching (GSBM), a new matching algorithm inspired by recent advances, generalizing them beyond kinetic energy minimization and to account for task-specific state costs. We show that such a generalization can be cast as solving conditional stochastic optimal control, for which efficient variational approximations can be used, and further debiased with the aid of path integral theory. Compared to prior methods for solving GSB problems, our GSBM algorithm better preserves a feasible transport map between the boundary distributions throughout training, thereby enabling stable convergence and significantly improved scalability. We empirically validate our claims on an extensive suite of experimental setups, including crowd navigation, opinion depolarization, LiDAR manifolds, and image domain transfer. Our work brings new algorithmic opportunities for training diffusion models enhanced with task-specific optimality structures. Code available at https://github.com/facebookresearch/generalized-schrodinger-bridge-matching △ Less

Submitted 18 April, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

Comments: ICLR 2024 Camera Ready

arXiv:2310.01236 [pdf, other]

Mirror Diffusion Models for Constrained and Watermarked Generation

Authors: Guan-Horng Liu, Tianrong Chen, Evangelos A. Theodorou, Molei Tao

Abstract: Modern successes of diffusion models in learning complex, high-dimensional data distributions are attributed, in part, to their capability to construct diffusion processes with analytic transition kernels and score functions. The tractability results in a simulation-free framework with stable regression losses, from which reversed, generative processes can be learned at scale. However, when data i… ▽ More Modern successes of diffusion models in learning complex, high-dimensional data distributions are attributed, in part, to their capability to construct diffusion processes with analytic transition kernels and score functions. The tractability results in a simulation-free framework with stable regression losses, from which reversed, generative processes can be learned at scale. However, when data is confined to a constrained set as opposed to a standard Euclidean space, these desirable characteristics appear to be lost based on prior attempts. In this work, we propose Mirror Diffusion Models (MDM), a new class of diffusion models that generate data on convex constrained sets without losing any tractability. This is achieved by learning diffusion processes in a dual space constructed from a mirror map, which, crucially, is a standard Euclidean space. We derive efficient computation of mirror maps for popular constrained sets, such as simplices and $\ell_2$-balls, showing significantly improved performance of MDM over existing methods. For safety and privacy purposes, we also explore constrained sets as a new mechanism to embed invisible but quantitative information (i.e., watermarks) in generated data, for which MDM serves as a compelling approach. Our work brings new algorithmic opportunities for learning tractable diffusion on complex domains. Our code is available at https://github.com/ghliu/mdm △ Less

Submitted 29 February, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: submitted to NeurIPS on 5/18 but did not arxiv per NeurIPS policy, accepted on 9/22

arXiv:2309.10639 [pdf, ps, other]

Geometric structure of Deep Learning networks and construction of global ${\mathcal L}^2$ minimizers

Authors: Thomas Chen, Patricia Muñoz Ewald

Abstract: In this paper, we explicitly determine local and global minimizers of the $\mathcal{L}^2$ cost function in underparametrized Deep Learning (DL) networks; our main goal is to shed light on their geometric structure and properties. We accomplish this by a direct construction, without invoking the gradient descent flow at any point of this work. We specifically consider $L$ hidden layers, a ReLU ramp… ▽ More In this paper, we explicitly determine local and global minimizers of the $\mathcal{L}^2$ cost function in underparametrized Deep Learning (DL) networks; our main goal is to shed light on their geometric structure and properties. We accomplish this by a direct construction, without invoking the gradient descent flow at any point of this work. We specifically consider $L$ hidden layers, a ReLU ramp activation function, an $\mathcal{L}^2$ Schatten class (or Hilbert-Schmidt) cost function, input and output spaces $\mathbb{R}^Q$ with equal dimension $Q\geq1$, and hidden layers also defined on $\mathbb{R}^{Q}$; the training inputs are assumed to be sufficiently clustered. The training input size $N$ can be arbitrarily large - thus, we are considering the underparametrized regime. More general settings are left to future work. We construct an explicit family of minimizers for the global minimum of the cost function in the case $L\geq Q$, which we show to be degenerate. Moreover, we determine a set of $2^Q-1$ distinct degenerate local minima of the cost function. In the context presented here, the concatenation of hidden layers of the DL network is reinterpreted as a recursive application of a {\em truncation map} which "curates" the training inputs by minimizing their noise to signal ratio. △ Less

Submitted 14 March, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

Comments: AMS Latex, 22 pages. Typos corrected, slightly extended

MSC Class: 57R70; 62M45

arXiv:2309.10370 [pdf, ps, other]

Geometric structure of shallow neural networks and constructive ${\mathcal L}^2$ cost minimization

Authors: Thomas Chen, Patricia Muñoz Ewald

Abstract: In this paper, we approach the problem of cost (loss) minimization in underparametrized shallow neural networks through the explicit construction of upper bounds, without any use of gradient descent. A key focus is on elucidating the geometric structure of approximate and precise minimizers. We consider shallow neural networks with one hidden layer, a ReLU activation function, an ${\mathcal L}^2$… ▽ More In this paper, we approach the problem of cost (loss) minimization in underparametrized shallow neural networks through the explicit construction of upper bounds, without any use of gradient descent. A key focus is on elucidating the geometric structure of approximate and precise minimizers. We consider shallow neural networks with one hidden layer, a ReLU activation function, an ${\mathcal L}^2$ Schatten class (or Hilbert-Schmidt) cost function, input space ${\mathbb R}^M$, output space ${\mathbb R}^Q$ with $Q\leq M$, and training input sample size $N>QM$ that can be arbitrarily large. We prove an upper bound on the minimum of the cost function of order $O(δ_P)$ where $δ_P$ measures the signal to noise ratio of training inputs. In the special case $M=Q$, we explicitly determine an exact degenerate local minimum of the cost function, and show that the sharp value differs from the upper bound obtained for $Q\leq M$ by a relative error $O(δ_P^2)$. The proof of the upper bound yields a constructively trained network; we show that it metrizes a particular $Q$-dimensional subspace in the input space ${\mathbb R}^M$. We comment on the characterization of the global minimum of the cost function in the given context. △ Less

Submitted 17 March, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

Comments: AMS Latex, 28 pages. Exposition has been streamlined

MSC Class: 57R70; 62M45

arXiv:2309.08165 [pdf, other]

To Predict or to Reject: Causal Effect Estimation with Uncertainty on Networked Data

Authors: Hechuan Wen, Tong Chen, Li Kheng Chai, Shazia Sadiq, Kai Zheng, Hongzhi Yin

Abstract: Due to the imbalanced nature of networked observational data, the causal effect predictions for some individuals can severely violate the positivity/overlap assumption, rendering unreliable estimations. Nevertheless, this potential risk of individual-level treatment effect estimation on networked data has been largely under-explored. To create a more trustworthy causal effect estimator, we propose… ▽ More Due to the imbalanced nature of networked observational data, the causal effect predictions for some individuals can severely violate the positivity/overlap assumption, rendering unreliable estimations. Nevertheless, this potential risk of individual-level treatment effect estimation on networked data has been largely under-explored. To create a more trustworthy causal effect estimator, we propose the uncertainty-aware graph deep kernel learning (GraphDKL) framework with Lipschitz constraint to model the prediction uncertainty with Gaussian process and identify unreliable estimations. To the best of our knowledge, GraphDKL is the first framework to tackle the violation of positivity assumption when performing causal effect estimation with graphs. With extensive experiments, we demonstrate the superiority of our proposed method in uncertainty-aware causal effect estimation on networked data. △ Less

Submitted 15 September, 2023; originally announced September 2023.

Comments: Accepted by ICDM'23

arXiv:2309.07867 [pdf, other]

Beta Diffusion

Authors: Mingyuan Zhou, Tianqi Chen, Zhendong Wang, Huangjie Zheng

Abstract: We introduce beta diffusion, a novel generative modeling method that integrates demasking and denoising to generate data within bounded ranges. Using scaled and shifted beta distributions, beta diffusion utilizes multiplicative transitions over time to create both forward and reverse diffusion processes, maintaining beta distributions in both the forward marginals and the reverse conditionals, giv… ▽ More We introduce beta diffusion, a novel generative modeling method that integrates demasking and denoising to generate data within bounded ranges. Using scaled and shifted beta distributions, beta diffusion utilizes multiplicative transitions over time to create both forward and reverse diffusion processes, maintaining beta distributions in both the forward marginals and the reverse conditionals, given the data at any point in time. Unlike traditional diffusion-based generative models relying on additive Gaussian noise and reweighted evidence lower bounds (ELBOs), beta diffusion is multiplicative and optimized with KL-divergence upper bounds (KLUBs) derived from the convexity of the KL divergence. We demonstrate that the proposed KLUBs are more effective for optimizing beta diffusion compared to negative ELBOs, which can also be derived as the KLUBs of the same KL divergence with its two arguments swapped. The loss function of beta diffusion, expressed in terms of Bregman divergence, further supports the efficacy of KLUBs for optimization. Experimental results on both synthetic data and natural images demonstrate the unique capabilities of beta diffusion in generative modeling of range-bounded data and validate the effectiveness of KLUBs in optimizing diffusion models, thereby making them valuable additions to the family of diffusion-based generative models and the optimization techniques used to train them. △ Less

Submitted 24 December, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

Comments: NeurIPS 2023

arXiv:2308.08177 [pdf]

Develo** a Conceptual Tribal Crash Safety Dashboard: Data-Driven Strategies for Identifying High-Risk Areas and Enhancing Tribal Safety Programs

Authors: Tianyi Chen, Haotian Shi, Steven T. Parker, Glenn Vorhes, David A Noyce, Bin Ran

Abstract: Tribal lands in the United States have consistently exhibited higher crash rates and injury severities compared to other regions. To address this issue, effective data-driven safety analysis methods are essential for resource allocation and tribal safety program development. This study outlines the minimum data requirements and presents a generic tribal crash dashboard prototype to enable feasible… ▽ More Tribal lands in the United States have consistently exhibited higher crash rates and injury severities compared to other regions. To address this issue, effective data-driven safety analysis methods are essential for resource allocation and tribal safety program development. This study outlines the minimum data requirements and presents a generic tribal crash dashboard prototype to enable feasible and reproducible tribal crash analysis. The dashboard offers statistical performance measurement techniques, tracks tribal safety trends using various indicators, and updates in with incoming data, making it adaptable for any state. Specifically, the dashboard's capabilities are showcased using Wisconsin tribal crash data, locating high-risk tribal land areas while analyzing and comparing statewide crashes and tribal crashes based on severity. The results reveal that tribal land roads, particularly rural ones, are more dangerous than those in other Wisconsin areas. The dashboard also examines crash types to uncover the causes and insights behind Wisconsin tribal crashes. Demonstrating this dashboard concept reveals its potential to facilitate a more intuitive understanding of tribal crashes from a statistical standpoint, thereby enabling more effective applications of available data sources and the development of targeted safety countermeasures. △ Less

Submitted 16 August, 2023; originally announced August 2023.

arXiv:2306.17361 [pdf, other]

iSCAN: Identifying Causal Mechanism Shifts among Nonlinear Additive Noise Models

Authors: Tianyu Chen, Kevin Bello, Bryon Aragam, Pradeep Ravikumar

Abstract: Structural causal models (SCMs) are widely used in various disciplines to represent causal relationships among variables in complex systems. Unfortunately, the underlying causal structure is often unknown, and estimating it from data remains a challenging task. In many situations, however, the end goal is to localize the changes (shifts) in the causal mechanisms between related datasets instead of… ▽ More Structural causal models (SCMs) are widely used in various disciplines to represent causal relationships among variables in complex systems. Unfortunately, the underlying causal structure is often unknown, and estimating it from data remains a challenging task. In many situations, however, the end goal is to localize the changes (shifts) in the causal mechanisms between related datasets instead of learning the full causal structure of the individual datasets. Some applications include root cause analysis, analyzing gene regulatory network structure changes between healthy and cancerous individuals, or explaining distribution shifts. This paper focuses on identifying the causal mechanism shifts in two or more related datasets over the same set of variables -- without estimating the entire DAG structure of each SCM. Prior work under this setting assumed linear models with Gaussian noises; instead, in this work we assume that each SCM belongs to the more general class of nonlinear additive noise models (ANMs). A key technical contribution of this work is to show that the Jacobian of the score function for the mixture distribution allows for the identification of shifts under general non-parametric functional mechanisms. Once the shifted variables are identified, we leverage recent work to estimate the structural differences, if any, for the shifted variables. Experiments on synthetic and real-world data are provided to showcase the applicability of this approach. Code implementing the proposed method is open-source and publicly available at https://github.com/kevinsbello/iSCAN. △ Less

Submitted 12 January, 2024; v1 submitted 29 June, 2023; originally announced June 2023.

Comments: 36 pages, 18 figures. Published at NeurIPS 2023

arXiv:2306.13271 [pdf, other]

Variational Counterfactual Prediction under Runtime Domain Corruption

Authors: Hechuan Wen, Tong Chen, Li Kheng Chai, Shazia Sadiq, Junbin Gao, Hongzhi Yin

Abstract: To date, various neural methods have been proposed for causal effect estimation based on observational data, where a default assumption is the same distribution and availability of variables at both training and inference (i.e., runtime) stages. However, distribution shift (i.e., domain shift) could happen during runtime, and bigger challenges arise from the impaired accessibility of variables. Th… ▽ More To date, various neural methods have been proposed for causal effect estimation based on observational data, where a default assumption is the same distribution and availability of variables at both training and inference (i.e., runtime) stages. However, distribution shift (i.e., domain shift) could happen during runtime, and bigger challenges arise from the impaired accessibility of variables. This is commonly caused by increasing privacy and ethical concerns, which can make arbitrary variables unavailable in the entire runtime data and imputation impractical. We term the co-occurrence of domain shift and inaccessible variables runtime domain corruption, which seriously impairs the generalizability of a trained counterfactual predictor. To counter runtime domain corruption, we subsume counterfactual prediction under the notion of domain adaptation. Specifically, we upper-bound the error w.r.t. the target domain (i.e., runtime covariates) by the sum of source domain error and inter-domain distribution distance. In addition, we build an adversarially unified variational causal effect model, named VEGAN, with a novel two-stage adversarial domain adaptation scheme to reduce the latent distribution disparity between treated and control groups first, and between training and runtime variables afterwards. We demonstrate that VEGAN outperforms other state-of-the-art baselines on individual-level treatment effect estimation in the presence of runtime domain corruption on benchmark datasets. △ Less

Submitted 22 June, 2023; originally announced June 2023.

arXiv:2306.06626 [pdf, other]

On Kinetic Optimal Probability Paths for Generative Models

Authors: Neta Shaul, Ricky T. Q. Chen, Maximilian Nickel, Matt Le, Yaron Lipman

Abstract: Recent successful generative models are trained by fitting a neural network to an a-priori defined tractable probability density path taking noise to training examples. In this paper we investigate the space of Gaussian probability paths, which includes diffusion paths as an instance, and look for an optimal member in some useful sense. In particular, minimizing the Kinetic Energy (KE) of a path i… ▽ More Recent successful generative models are trained by fitting a neural network to an a-priori defined tractable probability density path taking noise to training examples. In this paper we investigate the space of Gaussian probability paths, which includes diffusion paths as an instance, and look for an optimal member in some useful sense. In particular, minimizing the Kinetic Energy (KE) of a path is known to make particles' trajectories simple, hence easier to sample, and empirically improve performance in terms of likelihood of unseen data and sample generation quality. We investigate Kinetic Optimal (KO) Gaussian paths and offer the following observations: (i) We show the KE takes a simplified form on the space of Gaussian paths, where the data is incorporated only through a single, one dimensional scalar function, called the \emph{data separation function}. (ii) We characterize the KO solutions with a one dimensional ODE. (iii) We approximate data-dependent KO paths by approximating the data separation function and minimizing the KE. (iv) We prove that the data separation function converges to $1$ in the general case of arbitrary normalized dataset consisting of $n$ samples in $d$ dimension as $n/\sqrt{d}\rightarrow 0$. A consequence of this result is that the Conditional Optimal Transport (Cond-OT) path becomes \emph{kinetic optimal} as $n/\sqrt{d}\rightarrow 0$. We further support this theory with empirical experiments on ImageNet. △ Less

Submitted 11 June, 2023; originally announced June 2023.

arXiv:2305.18375 [pdf, other]

Learning to Jump: Thinning and Thickening Latent Counts for Generative Modeling

Authors: Tianqi Chen, Mingyuan Zhou

Abstract: Learning to denoise has emerged as a prominent paradigm to design state-of-the-art deep generative models for natural images. How to use it to model the distributions of both continuous real-valued data and categorical data has been well studied in recently proposed diffusion models. However, it is found in this paper to have limited ability in modeling some other types of data, such as count and… ▽ More Learning to denoise has emerged as a prominent paradigm to design state-of-the-art deep generative models for natural images. How to use it to model the distributions of both continuous real-valued data and categorical data has been well studied in recently proposed diffusion models. However, it is found in this paper to have limited ability in modeling some other types of data, such as count and non-negative continuous data, that are often highly sparse, skewed, heavy-tailed, and/or overdispersed. To this end, we propose learning to jump as a general recipe for generative modeling of various types of data. Using a forward count thinning process to construct learning objectives to train a deep neural network, it employs a reverse count thickening process to iteratively refine its generation through that network. We demonstrate when learning to jump is expected to perform comparably to learning to denoise, and when it is expected to perform better. For example, learning to jump is recommended when the training data is non-negative and exhibits strong sparsity, skewness, heavy-tailedness, and/or heterogeneity. △ Less

Submitted 28 May, 2023; originally announced May 2023.

Comments: ICML 2023

arXiv:2303.04871 [pdf, other]

Discovering a change point and piecewise linear structure in a time series of organoid networks via the iso-mirror

Authors: Tianyi Chen, Youngser Park, Ali Saad-Eldin, Zachary Lubberts, Avanti Athreya, Benjamin D. Pedigo, Joshua T. Vogelstein, Francesca Puppo, Gabriel A. Silva, Alysson R. Muotri, Weiwei Yang, Christopher M. White, Carey E. Priebe

Abstract: Recent advancements have been made in the development of cell-based in-vitro neuronal networks, or organoids. In order to better understand the network structure of these organoids, a super-selective algorithm has been proposed for inferring the effective connectivity networks from multi-electrode array data. In this paper, we apply a novel statistical method called spectral mirror estimation to t… ▽ More Recent advancements have been made in the development of cell-based in-vitro neuronal networks, or organoids. In order to better understand the network structure of these organoids, a super-selective algorithm has been proposed for inferring the effective connectivity networks from multi-electrode array data. In this paper, we apply a novel statistical method called spectral mirror estimation to the time series of inferred effective connectivity organoid networks. This method produces a one-dimensional iso-mirror representation of the dynamics of the time series of the networks which exhibits a piecewise linear structure. A classical change point algorithm is then applied to this representation, which successfully detects a change point coinciding with the neuroscientifically significant time inhibitory neurons start appearing and the percentage of astrocytes increases dramatically. This finding demonstrates the potential utility of applying the iso-mirror dynamic structure discovery method to inferred effective connectivity time series of organoid networks. △ Less

Submitted 12 April, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

arXiv:2303.01751 [pdf, other]

Deep Momentum Multi-Marginal Schrödinger Bridge

Authors: Tianrong Chen, Guan-Horng Liu, Molei Tao, Evangelos A. Theodorou

Abstract: It is a crucial challenge to reconstruct population dynamics using unlabeled samples from distributions at coarse time intervals. Recent approaches such as flow-based models or Schrödinger Bridge (SB) models have demonstrated appealing performance, yet the inferred sample trajectories either fail to account for the underlying stochasticity or are $\underline{D}$eep $\underline{M}$omentum Multi-Mar… ▽ More It is a crucial challenge to reconstruct population dynamics using unlabeled samples from distributions at coarse time intervals. Recent approaches such as flow-based models or Schrödinger Bridge (SB) models have demonstrated appealing performance, yet the inferred sample trajectories either fail to account for the underlying stochasticity or are $\underline{D}$eep $\underline{M}$omentum Multi-Marginal $\underline{S}$chrödinger $\underline{B}$ridge(DMSB), a novel computational framework that learns the smooth measure-valued spline for stochastic systems that satisfy position marginal constraints across time. By tailoring the celebrated Bregman Iteration and extending the Iteration Proportional Fitting to phase space, we manage to handle high-dimensional multi-marginal trajectory inference tasks efficiently. Our algorithm outperforms baselines significantly, as evidenced by experiments for synthetic datasets and a real-world single-cell RNA sequence dataset. Additionally, the proposed approach can reasonably reconstruct the evolution of velocity distribution, from position snapshots only, when there is a ground truth velocity that is nevertheless inaccessible. △ Less

Submitted 5 October, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

arXiv:2303.00039 [pdf, other]

M-L2O: Towards Generalizable Learning-to-Optimize by Test-Time Fast Self-Adaptation

Authors: Junjie Yang, Xuxi Chen, Tianlong Chen, Zhangyang Wang, Yingbin Liang

Abstract: Learning to Optimize (L2O) has drawn increasing attention as it often remarkably accelerates the optimization procedure of complex tasks by ``overfitting" specific task type, leading to enhanced performance compared to analytical optimizers. Generally, L2O develops a parameterized optimization method (i.e., ``optimizer") by learning from solving sample problems. This data-driven procedure yields L… ▽ More Learning to Optimize (L2O) has drawn increasing attention as it often remarkably accelerates the optimization procedure of complex tasks by ``overfitting" specific task type, leading to enhanced performance compared to analytical optimizers. Generally, L2O develops a parameterized optimization method (i.e., ``optimizer") by learning from solving sample problems. This data-driven procedure yields L2O that can efficiently solve problems similar to those seen in training, that is, drawn from the same ``task distribution". However, such learned optimizers often struggle when new test problems come with a substantially deviation from the training task distribution. This paper investigates a potential solution to this open challenge, by meta-training an L2O optimizer that can perform fast test-time self-adaptation to an out-of-distribution task, in only a few steps. We theoretically characterize the generalization of L2O, and further show that our proposed framework (termed as M-L2O) provably facilitates rapid task adaptation by locating well-adapted initial points for the optimizer weight. Empirical observations on several classic tasks like LASSO and Quadratic, demonstrate that M-L2O converges significantly faster than vanilla L2O with only $5$ steps of adaptation, echoing our theoretical results. Codes are available in https://github.com/VITA-Group/M-L2O. △ Less

Submitted 28 February, 2023; originally announced March 2023.

Comments: This paper is accepted in ICLR 2023

arXiv:2302.11085 [pdf, other]

Learning to Generalize Provably in Learning to Optimize

Authors: Junjie Yang, Tianlong Chen, Mingkang Zhu, Fengxiang He, Dacheng Tao, Yingbin Liang, Zhangyang Wang

Abstract: Learning to optimize (L2O) has gained increasing popularity, which automates the design of optimizers by data-driven approaches. However, current L2O methods often suffer from poor generalization performance in at least two folds: (i) applying the L2O-learned optimizer to unseen optimizees, in terms of lowering their loss function values (optimizer generalization, or ``generalizable learning of op… ▽ More Learning to optimize (L2O) has gained increasing popularity, which automates the design of optimizers by data-driven approaches. However, current L2O methods often suffer from poor generalization performance in at least two folds: (i) applying the L2O-learned optimizer to unseen optimizees, in terms of lowering their loss function values (optimizer generalization, or ``generalizable learning of optimizers"); and (ii) the test performance of an optimizee (itself as a machine learning model), trained by the optimizer, in terms of the accuracy over unseen data (optimizee generalization, or ``learning to generalize"). While the optimizer generalization has been recently studied, the optimizee generalization (or learning to generalize) has not been rigorously studied in the L2O context, which is the aim of this paper. We first theoretically establish an implicit connection between the local entropy and the Hessian, and hence unify their roles in the handcrafted design of generalizable optimizers as equivalent metrics of the landscape flatness of loss functions. We then propose to incorporate these two metrics as flatness-aware regularizers into the L2O framework in order to meta-train optimizers to learn to generalize, and theoretically show that such generalization ability can be learned during the L2O meta-training process and then transformed to the optimizee loss function. Extensive experiments consistently validate the effectiveness of our proposals with substantially improved generalization on multiple sophisticated L2O models and diverse optimizees. Our code is available at: https://github.com/VITA-Group/Open-L2O/tree/main/Model_Free_L2O/L2O-Entropy. △ Less

Submitted 28 March, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

Comments: This paper is accepted in AISTATS 2023

arXiv:2302.10324 [pdf, other]

Bayesian subty** for multi-state brain functional connectome with application on adolescent brain cognition

Authors: Tianqi Chen, Chichun Tan, Hongyu Zhao, Todd Constable, Sarah Yip, Yize Zhao

Abstract: Converging evidence indicates that the heterogeneity of cognitive profiles may arise through detectable alternations in brain functions. Particularly, brain functional connectivity, measured under resting and cognitive states, characterizes the unique neuronal interconnections across large-scale brain networks. Despite an unprecedented opportunity to uncover neurobiological subtypes through cluste… ▽ More Converging evidence indicates that the heterogeneity of cognitive profiles may arise through detectable alternations in brain functions. Particularly, brain functional connectivity, measured under resting and cognitive states, characterizes the unique neuronal interconnections across large-scale brain networks. Despite an unprecedented opportunity to uncover neurobiological subtypes through clustering or subty** analyses on multi-state functional connectivity, few existing approaches are applicable here to accommodate the network topology and unique biological architecture of functional connectivity. To address this issue, we propose an innovative Bayesian nonparametric network-variate clustering analysis to uncover subgroups with homogeneous brain functional network patterns integrating different cognitive states. In light of the existing neuroscience literature, we assume there are unknown state-specific modular structures within functional connectivity and simultaneously impose selection to identify informative network features for defining subtypes within unsupervised learning. To further facilitate practical use, we develop a computationally efficient variational inference algorithm to perform posterior inference with satisfactory estimation accuracy. Extensive simulations show the superior clustering accuracy and plausible result of our method. Applying the method to the landmark Adolescent Brain Cognitive Development (ABCD) study, we successfully establish neurodevelopmental subtypes linked with impulsivity related behavior trait, and identify brain sub-network phenotypes under each state to signal neurobiological heterogeneity. △ Less

Submitted 20 February, 2023; originally announced February 2023.

arXiv:2302.05793 [pdf, other]

Distributional GFlowNets with Quantile Flows

Authors: Dinghuai Zhang, Ling Pan, Ricky T. Q. Chen, Aaron Courville, Yoshua Bengio

Abstract: Generative Flow Networks (GFlowNets) are a new family of probabilistic samplers where an agent learns a stochastic policy for generating complex combinatorial structure through a series of decision-making steps. Despite being inspired from reinforcement learning, the current GFlowNet framework is relatively limited in its applicability and cannot handle stochasticity in the reward function. In thi… ▽ More Generative Flow Networks (GFlowNets) are a new family of probabilistic samplers where an agent learns a stochastic policy for generating complex combinatorial structure through a series of decision-making steps. Despite being inspired from reinforcement learning, the current GFlowNet framework is relatively limited in its applicability and cannot handle stochasticity in the reward function. In this work, we adopt a distributional paradigm for GFlowNets, turning each flow function into a distribution, thus providing more informative learning signals during training. By parameterizing each edge flow through their quantile functions, our proposed \textit{quantile matching} GFlowNet learning algorithm is able to learn a risk-sensitive policy, an essential component for handling scenarios with risk uncertainty. Moreover, we find that the distributional approach can achieve substantial improvement on existing benchmarks compared to prior methods due to our enhanced training algorithm, even in settings with deterministic rewards. △ Less

Submitted 17 February, 2024; v1 submitted 11 February, 2023; originally announced February 2023.

Comments: Accepted by TMLR

arXiv:2302.05185 [pdf, other]

On Penalty-based Bilevel Gradient Descent Method

Authors: Han Shen, Quan Xiao, Tianyi Chen

Abstract: Bilevel optimization enjoys a wide range of applications in hyper-parameter optimization, meta-learning and reinforcement learning. However, bilevel optimization problems are difficult to solve. Recent progress on scalable bilevel algorithms mainly focuses on bilevel optimization problems where the lower-level objective is either strongly convex or unconstrained. In this work, we tackle the bileve… ▽ More Bilevel optimization enjoys a wide range of applications in hyper-parameter optimization, meta-learning and reinforcement learning. However, bilevel optimization problems are difficult to solve. Recent progress on scalable bilevel algorithms mainly focuses on bilevel optimization problems where the lower-level objective is either strongly convex or unconstrained. In this work, we tackle the bilevel problem through the lens of the penalty method. We show that under certain conditions, the penalty reformulation recovers the solutions of the original bilevel problem. Further, we propose the penalty-based bilevel gradient descent (PBGD) algorithm and establish its finite-time convergence for the constrained bilevel problem without lower-level strong convexity. Experiments showcase the efficiency of the proposed PBGD algorithm. △ Less

Submitted 12 September, 2023; v1 submitted 10 February, 2023; originally announced February 2023.

Comments: Improved Section 4 by removing a critical assumption; Added Section 5 and citations

arXiv:2302.03660 [pdf, other]

Flow Matching on General Geometries

Authors: Ricky T. Q. Chen, Yaron Lipman

Abstract: We propose Riemannian Flow Matching (RFM), a simple yet powerful framework for training continuous normalizing flows on manifolds. Existing methods for generative modeling on manifolds either require expensive simulation, are inherently unable to scale to high dimensions, or use approximations for limiting quantities that result in biased training objectives. Riemannian Flow Matching bypasses thes… ▽ More We propose Riemannian Flow Matching (RFM), a simple yet powerful framework for training continuous normalizing flows on manifolds. Existing methods for generative modeling on manifolds either require expensive simulation, are inherently unable to scale to high dimensions, or use approximations for limiting quantities that result in biased training objectives. Riemannian Flow Matching bypasses these limitations and offers several advantages over previous approaches: it is simulation-free on simple geometries, does not require divergence computation, and computes its target vector field in closed-form. The key ingredient behind RFM is the construction of a relatively simple premetric for defining target vector fields, which encompasses the existing Euclidean case. To extend to general geometries, we rely on the use of spectral decompositions to efficiently compute premetrics on the fly. Our method achieves state-of-the-art performance on many real-world non-Euclidean datasets, and we demonstrate tractable training on general geometries, including triangular meshes with highly non-trivial curvature and boundaries. △ Less

Submitted 26 February, 2024; v1 submitted 7 February, 2023; originally announced February 2023.

Journal ref: ICLR 2024

arXiv:2302.03157 [pdf, other]

A distribution-free mixed-integer optimization approach to hierarchical modelling of clustered and longitudinal data

Authors: Madhav Sankaranarayanan, Intekhab Hossain, Tom Chen

Abstract: Recent advancements in Mixed Integer Optimization (MIO) algorithms, paired with hardware enhancements, have led to significant speedups in resolving MIO problems. These strategies have been utilized for optimal subset selection, specifically for choosing $k$ features out of $p$ in linear regression given $n$ observations. In this paper, we broaden this method to facilitate cluster-aware regression… ▽ More Recent advancements in Mixed Integer Optimization (MIO) algorithms, paired with hardware enhancements, have led to significant speedups in resolving MIO problems. These strategies have been utilized for optimal subset selection, specifically for choosing $k$ features out of $p$ in linear regression given $n$ observations. In this paper, we broaden this method to facilitate cluster-aware regression, where selection aims to choose $λ$ out of $K$ clusters in a linear mixed effects (LMM) model with $n_k$ observations for each cluster. Through comprehensive testing on a multitude of synthetic and real datasets, we exhibit that our method efficiently solves problems within minutes. Through numerical experiments, we also show that the MIO approach outperforms both Gaussian- and Laplace-distributed LMMs in terms of generating sparse solutions with high predictive power. Traditional LMMs typically assume that clustering effects are independent of individual features. However, we introduce an innovative algorithm that evaluates cluster effects for new data points, thereby increasing the robustness and precision of this model. The inferential and predictive efficacy of this approach is further illustrated through its application in student scoring and protein expression. △ Less

Submitted 25 March, 2024; v1 submitted 6 February, 2023; originally announced February 2023.

arXiv:2212.14384 [pdf]

Towards data-driven modeling and real-time prediction of solar flares and coronal mass ejections

Authors: M. Rempel, Y. Fan, M. Dikpati, A. Malanushenko, M. D. Kazachenko, M. C. M. Cheung, G. Chintzoglou, X. Sun, G. H. Fisher, T. Y. Chen

Abstract: Modeling of transient events in the solar atmosphere requires the confluence of 3 critical elements: (1) model sophistication, (2) data availability, and (3) data assimilation. This white paper describes required advances that will enable statistical flare and CME forecasting (e.g. eruption probability and timing, estimation of strength, and CME details, such as speed and magnetic field orientatio… ▽ More Modeling of transient events in the solar atmosphere requires the confluence of 3 critical elements: (1) model sophistication, (2) data availability, and (3) data assimilation. This white paper describes required advances that will enable statistical flare and CME forecasting (e.g. eruption probability and timing, estimation of strength, and CME details, such as speed and magnetic field orientation) similar to weather prediction on Earth. △ Less

Submitted 29 December, 2022; originally announced December 2022.

Comments: Heliophysics 2050 White Paper

arXiv:2212.13659 [pdf, other]

Latent Discretization for Continuous-time Sequence Compression

Authors: Ricky T. Q. Chen, Matthew Le, Matthew Muckley, Maximilian Nickel, Karen Ullrich

Abstract: Neural compression offers a domain-agnostic approach to creating codecs for lossy or lossless compression via deep generative models. For sequence compression, however, most deep sequence models have costs that scale with the sequence length rather than the sequence complexity. In this work, we instead treat data sequences as observations from an underlying continuous-time process and learn how to… ▽ More Neural compression offers a domain-agnostic approach to creating codecs for lossy or lossless compression via deep generative models. For sequence compression, however, most deep sequence models have costs that scale with the sequence length rather than the sequence complexity. In this work, we instead treat data sequences as observations from an underlying continuous-time process and learn how to efficiently discretize while retaining information about the full sequence. As a consequence of decoupling sequential information from its temporal discretization, our approach allows for greater compression rates and smaller computational complexity. Moreover, the continuous-time approach naturally allows us to decode at different time intervals. We empirically verify our approach on multiple domains involving compression of video and motion capture sequences, showing that our approaches can automatically achieve reductions in bit rates by learning how to discretize. △ Less

Submitted 27 December, 2022; originally announced December 2022.

arXiv:2211.10926 [pdf, other]

Unraveling implicit human behavioral effects on dynamic characteristics of Covid-19 daily infection rates in Taiwan

Authors: Ting-Li Chen, Elizabeth P. Chou, Min-Yi Chen, Hsieh Fushing

Abstract: We study Covid-19 spreading dynamics underlying 84 curves of daily Covid-19 infection rates pertaining to 84 districts belonging to the largest seven cities in Taiwan during her pristine surge period. Our computational developments begin with selecting and extracting 18 features from each smoothed district-specific curve. This step of computing effort allows unstructured data to be converted into… ▽ More We study Covid-19 spreading dynamics underlying 84 curves of daily Covid-19 infection rates pertaining to 84 districts belonging to the largest seven cities in Taiwan during her pristine surge period. Our computational developments begin with selecting and extracting 18 features from each smoothed district-specific curve. This step of computing effort allows unstructured data to be converted into structured data, with which we then demonstrate asymmetric growth and decline dynamics among all involved curves. Specifically, based on Theoretical Information measurements of conditional entropy and mutual information, we compute major factors of order-1 and order-2 that reveal significant effects on affecting the curves' peak value and curvature at peak, which are two essential features characterizing all the curves. Further, we investigate and demonstrate major factors determining the geographic and social-economic induced behavioral effects by encoding each of these 84 districts with two binary characteristics: North-vs-South and Unban-vs-suburban. Furthermore, based on this data-driven knowledge on the district scale, we go on to study fine-scale behavioral effects on infectious disease spreading through similarity among 96 age-group-specific curves of daily infection rate within 12 urban districts of Taipei and 12 suburban districts of New Taipei City, which counts for almost one-quarter of the island nation's total population. We conclude that human living, traveling, and working behaviors do implicitly affect the spreading dynamics of Covid-19 across Taiwan profoundly. △ Less

Submitted 20 November, 2022; originally announced November 2022.

Showing 1–50 of 225 results for author: Chen, T