Search | arXiv e-print repository

Evaluating the World Model Implicit in a Generative Model

Authors: Keyon Vafa, Justin Y. Chen, Jon Kleinberg, Sendhil Mullainathan, Ashesh Rambachan

Abstract: Recent work suggests that large language models may implicitly learn world models. How should we assess this possibility? We formalize this question for the case where the underlying reality is governed by a deterministic finite automaton. This includes problems as diverse as simple logical reasoning, geographic navigation, game-playing, and chemistry. We propose new evaluation metrics for world model recovery inspired by the classic Myhill-Nerode theorem from language theory. We illustrate their utility in three domains: game playing, logic puzzles, and navigation. In all domains, the generative models we consider do well on existing diagnostics for assessing world models, but our evaluation metrics reveal their world models to be far less coherent than they appear. Such incoherence creates fragility: using a generative model to solve related but subtly different tasks can lead it to fail badly. Building generative models that meaningfully capture the underlying logic of the domains they model would be immensely valuable; our results suggest new ways to assess how close a given model is to that goal.

Submitted 22 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.
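
The abstract does not spell out the metrics, but the Myhill-Nerode idea they build on can be illustrated: two prefixes that reach the same DFA state must admit exactly the same continuations. The sketch below is a hypothetical check in that spirit, not the paper's metric; the toy DFA, the `flawed_next` "model", and all function names are illustrative assumptions. It scores the fraction of same-state prefix pairs on which a next-symbol predictor accepts identical length-2 continuations.

```python
from itertools import product

# Hypothetical toy DFA (not from the paper) over the alphabet {"a", "b"}.
# A symbol is legal only if (state, symbol) has a transition.
ALPHABET = ("a", "b")
START = 0
TRANSITIONS = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0}


def run(prefix):
    """Return the DFA state reached after a legal prefix."""
    state = START
    for sym in prefix:
        state = TRANSITIONS[(state, sym)]
    return state


def true_next(prefix):
    """Ground-truth set of legal next symbols after `prefix`."""
    state = run(prefix)
    return {sym for sym in ALPHABET if (state, sym) in TRANSITIONS}


def flawed_next(prefix):
    """Hypothetical 'model' that only looks at the last symbol."""
    return {"a"} if prefix and prefix[-1] == "a" else {"a", "b"}


def accepted_suffixes(next_fn, prefix, depth):
    """All length-`depth` suffixes that `next_fn` treats as legal after `prefix`."""
    accepted = set()
    for suffix in product(ALPHABET, repeat=depth):
        seq = list(prefix)
        for sym in suffix:
            if sym not in next_fn(seq):
                break
            seq.append(sym)
        else:  # no illegal symbol encountered
            accepted.add(suffix)
    return frozenset(accepted)


def legal_prefixes(max_len):
    """Enumerate every legal prefix with up to `max_len` symbols."""
    out, frontier = [()], [()]
    for _ in range(max_len):
        frontier = [p + (s,) for p in frontier for s in sorted(true_next(p))]
        out.extend(frontier)
    return out


def compression_score(next_fn, prefixes, depth=2):
    """Fraction of same-state prefix pairs whose accepted suffixes agree."""
    pairs = [(p, q) for i, p in enumerate(prefixes)
             for q in prefixes[i + 1:] if run(p) == run(q)]
    agree = sum(accepted_suffixes(next_fn, p, depth) == accepted_suffixes(next_fn, q, depth)
                for p, q in pairs)
    return agree / len(pairs)


PREFIXES = legal_prefixes(3)
print(compression_score(true_next, PREFIXES))    # 1.0
print(compression_score(flawed_next, PREFIXES))  # below 1.0
```

The last-symbol heuristic predicts legal moves correctly for many prefixes, yet it treats the same-state prefixes `("b",)` and `("a", "a")` differently, which is the kind of incoherence a state-level check exposes and a next-token accuracy check can miss.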

arXiv:2406.01382 [pdf, other]

Do Large Language Models Perform the Way People Expect? Measuring the Human Generalization Function

Authors: Keyon Vafa, Ashesh Rambachan, Sendhil Mullainathan

Abstract: What makes large language models (LLMs) impressive is also what makes them hard to evaluate: their diversity of uses. To evaluate these models, we must understand the purposes they will be used for. We consider a setting where these deployment decisions are made by people, and in particular, people's beliefs about where an LLM will perform well. We model such beliefs as the consequence of a human generalization function: having seen what an LLM gets right or wrong, people generalize to where else it might succeed. We collect a dataset of 19K examples of how humans make generalizations across 79 tasks from the MMLU and BIG-Bench benchmarks. We show that the human generalization function can be predicted using NLP methods: people have consistent, structured ways to generalize. We then evaluate LLM alignment with the human generalization function. Our results show that, especially for cases where the cost of mistakes is high, more capable models (e.g. GPT-4) can do worse on the instances people choose to use them for, exactly because they are not aligned with the human generalization function.

Submitted 3 June, 2024; originally announced June 2024.

Comments: To appear in ICML 2024

arXiv:2405.16297 [pdf, other]

LUCIE: A Lightweight Uncoupled ClImate Emulator with long-term stability and physical consistency for O(1000)-member ensembles

Authors: Haiwen Guan, Troy Arcomano, Ashesh Chattopadhyay, Romit Maulik

Abstract: We present LUCIE, a $1000$-member ensemble data-driven atmospheric emulator that remains stable during autoregressive inference for thousands of years without a drifting climatology. LUCIE has been trained on $9.5$ years of coarse-resolution ERA5 data with $4$ prognostic variables on a single A100 GPU for $2.4$ h. Owing to the low computational cost of inference, a $1000$-member ensemble is run for $5$ years to compute an uncertainty-quantified climatology for the prognostic variables that closely matches the climatology obtained from ERA5. Unlike all other state-of-the-art AI weather models, LUCIE neither becomes unstable nor produces hallucinations that result in unphysical drift of the emulated climate. Furthermore, LUCIE \textbf{does not impose} "true" sea-surface temperature (SST) from a coupled numerical model to enforce the annual cycle in temperature. We demonstrate the long-term climatology obtained from LUCIE as well as subseasonal-to-seasonal scale prediction skill on the prognostic variables. We also demonstrate a $20$-year emulation with LUCIE here: https://drive.google.com/file/d/1mRmhx9RRGiF3uGo_mRQK8RpwQatrCiMn/view

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2404.19035 [pdf, other]

Improved pressure-gradient sensor for the prediction of separation onset in RANS models

Authors: Kevin Patrick Griffin, Ganesh Vijayakumar, Ashesh Sharma, Michael A. Sprague

Abstract: We improve upon two key aspects of the Menter shear stress transport (SST) turbulence model: (1) we propose a more robust adverse pressure gradient sensor based on the strength of the pressure gradient in the direction of the local mean flow; (2) we propose two alternative eddy viscosity models to be used in the adverse pressure gradient regions identified by our sensor. Direct numerical simulations of the Boeing Gaussian bump are used to identify the terms in the baseline SST model that need correction, and a posteriori Reynolds-averaged Navier-Stokes calculations are used to calibrate coefficient values, leading to a model that is both physics-driven and data-informed. The two sensor-equipped models are applied to two thick airfoils representative of modern wind turbine applications, the FFA-W3-301 and the DU00-W-212, with maximum thicknesses of 30% and 20% of their chord lengths, respectively. While the baseline SST model predicts stall (onset of separation) $3^\circ$ to $5^\circ$ late for all cases considered, the proposed models predict stall within the margins of experimental uncertainty, which greatly improves the prediction of the maximum lift generated. For the FFA airfoil, the models also improve the prediction of the linear region of the lift curve, likely due to their improved prediction of a pressure-side separation at low angles of attack.
The models are shown to generalize well across the two airfoil geometries (despite their difference in thickness) and across almost a factor of 10 in chord-based Reynolds number, from $1.6\times10^6$ to $1.5\times10^7$.

Submitted 29 April, 2024; originally announced April 2024.
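
The sensor is described as based on the pressure gradient in the direction of the local mean flow. A minimal NumPy sketch of that directional derivative on a 2D structured grid follows; it is an illustrative quantity only, and the paper's actual sensor (its normalization, thresholds, and blending into SST) is not reproduced. Function and variable names are hypothetical.

```python
import numpy as np


def streamwise_pressure_gradient(p, u, v, dx, dy, eps=1e-12):
    """Directional derivative of pressure along the local flow, (u dp/dx + v dp/dy) / |U|.

    Arrays are indexed [y, x]. Positive values mean pressure rises along the flow,
    i.e. an adverse pressure gradient.
    """
    dpdy, dpdx = np.gradient(p, dy, dx)  # derivatives along axis 0 (y) and axis 1 (x)
    speed = np.sqrt(u**2 + v**2)
    return (u * dpdx + v * dpdy) / np.maximum(speed, eps)  # eps guards stagnation points


# Example: pressure increasing linearly in x on a uniform grid, flow in +x.
x = np.linspace(0.0, 1.0, 11)
y = np.linspace(0.0, 1.0, 6)
X, Y = np.meshgrid(x, y)           # shape (6, 11): axis 0 is y
p = 2.0 * X
u, v = np.ones_like(X), np.zeros_like(X)
dpds = streamwise_pressure_gradient(p, u, v, dx=x[1] - x[0], dy=y[1] - y[0])
adverse = dpds > 0                 # flow moving toward higher pressure
```

Projecting onto the flow direction, rather than using the gradient magnitude, is what lets the sign distinguish adverse from favorable regions: reversing `u` in this example flips `dpds` from +2 to -2.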

arXiv:2404.10111 [pdf, other]

From Predictive Algorithms to Automatic Generation of Anomalies

Authors: Sendhil Mullainathan, Ashesh Rambachan

Abstract: Machine learning algorithms can find predictive signals that researchers fail to notice; yet they are notoriously hard to interpret. How can we extract theoretical insights from these black boxes? History provides a clue. Facing a similar problem (how to extract theoretical insights from their intuitions), researchers often turned to "anomalies": constructed examples that highlight flaws in an existing theory and spur the development of new ones. Canonical examples include the Allais paradox and the Kahneman-Tversky choice experiments for expected utility theory. We suggest anomalies can extract theoretical insights from black box predictive algorithms. We develop procedures to automatically generate anomalies for an existing theory when given a predictive algorithm. We cast anomaly generation as an adversarial game between a theory and a falsifier, the solutions to which are anomalies: instances where the black box algorithm predicts that, were we to collect data, we would likely observe violations of the theory. As an illustration, we generate anomalies for expected utility theory using a large, publicly available dataset on real lottery choices. Based on an estimated neural network that predicts lottery choices, our procedures recover known anomalies and discover new ones for expected utility theory. In incentivized experiments, subjects violate expected utility theory on these algorithmically generated anomalies; moreover, the violation rates are similar to observed rates for the Allais paradox and the common ratio effect.

Submitted 15 April, 2024; originally announced April 2024.
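
Because the anomalies above are framed as violations of expected utility theory, the classic Allais paradox gives a concrete reference point. The sketch below illustrates the textbook paradox, not the paper's generation procedure: it checks that choosing A over B and D over C cannot be rationalized by any utility function, because both expected-utility gaps reduce to the same expression and therefore always share a sign. Exact `Fraction` arithmetic avoids rounding; the sampled utilities are arbitrary.

```python
import random
from fractions import Fraction as F

# Classic Allais lotteries: prize (in $ millions) -> probability.
A = {1: F(1)}
B = {5: F(10, 100), 1: F(89, 100), 0: F(1, 100)}
C = {1: F(11, 100), 0: F(89, 100)}
D = {5: F(10, 100), 0: F(90, 100)}


def expected_utility(lottery, u):
    """Expected utility of a lottery under utility values u[prize]."""
    return sum(p * u[prize] for prize, p in lottery.items())


def allais_gaps(u):
    """EU(A) - EU(B) and EU(C) - EU(D); both equal 0.11 u(1) - 0.10 u(5) - 0.01 u(0)."""
    return (expected_utility(A, u) - expected_utility(B, u),
            expected_utility(C, u) - expected_utility(D, u))


def consistent_with_eu(chose_a, chose_c, u):
    """Does utility u rationalize the choices (A vs B, C vs D) as strict preferences?"""
    gap_ab, gap_cd = allais_gaps(u)
    return (gap_ab > 0) == chose_a and (gap_cd > 0) == chose_c


# The typical Allais pattern (A over B, D over C) fails for every sampled increasing utility.
rng = random.Random(0)
utilities = []
for _ in range(1000):
    u1 = rng.randint(1, 100)
    utilities.append({0: 0, 1: u1, 5: u1 + rng.randint(1, 100)})
print(any(consistent_with_eu(True, False, u) for u in utilities))  # False
```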

arXiv:2403.11854 [pdf, other]

denoiSplit: a method for joint image splitting and unsupervised denoising

Authors: Ashesh Ashesh, Florian Jug

Abstract: In this work we present denoiSplit, a method to tackle a new analysis task, namely the challenge of joint semantic image splitting and unsupervised denoising. This dual task has important applications in fluorescence microscopy, where semantic image splitting is valuable but noise generally hinders the downstream analysis of image content. Image splitting involves dissecting an image into its distinguishable semantic structures. We show that the current state-of-the-art method for this task struggles in the presence of image noise, inadvertently distributing the noise across the predicted outputs. The method we present here can deal with image noise by integrating an unsupervised denoising sub-task. This integration results in improved semantic image unmixing, even in the presence of notable and realistic levels of imaging noise. A key innovation in denoiSplit is the use of specifically formulated noise models and a suitable adjustment of the KL-divergence loss for the high-dimensional hierarchical latent space we are training. We showcase the performance of denoiSplit across 4 tasks on real-world microscopy images. Additionally, we perform qualitative and quantitative evaluations and compare results to existing benchmarks, demonstrating the effectiveness of denoiSplit: a single Variational Splitting Encoder-Decoder (VSE) Network using two suitable noise models to jointly perform semantic splitting and denoising.

Submitted 25 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.00271 [pdf]

Assessing Bilateral Neurovascular Bundles Function with Pulsed Wave Doppler Ultrasound: Implications for Reducing Erectile Dysfunction Following Prostate Radiotherapy

Authors: **g Wang, Xiaofeng Yang, Boran Zhou, James Sohn, Richard Qiu, Pretesh Patel, Ashesh B. Jani, Tian Liu

Abstract: This study aims to evaluate the functional status of bilateral neurovascular bundles (NVBs) using pulsed wave Doppler ultrasound in patients undergoing prostate radiotherapy (RT). Sixty-two patients (mean age: 66.1 ± 7.2 years) underwent transrectal ultrasound scans using a conventional ultrasound scanner, a 7.5 MHz bi-plane probe, and a mechanical stepper. The ultrasound protocol comprised 3 steps: 1) 3D B-mode scans of the entire prostate, 2) localization of NVBs using color flow Doppler imaging, and 3) measurement of NVB function using pulsed wave Doppler. Five pulsed Doppler waveform features were extracted: peak systolic velocity (PSV), end-diastolic velocity (EDV), mean velocity (Vm), resistive index (RI), and pulsatile index (PI). In summary, this study presents a Doppler evaluation of NVBs in patients undergoing prostate RT. It highlights substantial differences in Doppler ultrasound waveform features between bilateral NVBs. The proposed ultrasound method may prove valuable as clinicians strive to deliver NVB-sparing RT to preserve sexual function effectively and enhance patients' overall well-being.

Submitted 29 February, 2024; originally announced March 2024.

Comments: 14 pages, 4 figures

MSC Class: 68U10
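
The five waveform features listed in the abstract are linked by standard formulas: RI = (PSV - EDV) / PSV and PI = (PSV - EDV) / Vm. A small sketch computing them from one sampled cardiac cycle follows, assuming the study uses these conventional definitions and that Vm is the time average over the cycle; the sampling assumptions in the docstring and the example velocities are ours, not the paper's.

```python
import numpy as np


def doppler_indices(velocity):
    """Waveform features from one cardiac cycle of pulsed-wave Doppler velocities.

    Assumes `velocity` is uniformly sampled, starts at the systolic upstroke, and
    ends at end-diastole, so the last sample is taken as EDV.
    """
    v = np.asarray(velocity, dtype=float)
    psv = v.max()             # peak systolic velocity
    edv = v[-1]               # end-diastolic velocity
    vm = v.mean()             # time-averaged mean velocity over the cycle
    return {
        "PSV": psv,
        "EDV": edv,
        "Vm": vm,
        "RI": (psv - edv) / psv,   # resistive (Pourcelot) index
        "PI": (psv - edv) / vm,    # pulsatility (Gosling) index
    }


print(doppler_indices([10.0, 30.0, 20.0, 10.0]))  # hypothetical velocities in cm/s
```

Both indices are ratios of velocities, so they are independent of the velocity unit and of a uniform scaling of the Doppler angle correction.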

arXiv:2401.17671 [pdf, other]

Contextual Feature Extraction Hierarchies Converge in Large Language Models and the Brain

Authors: Gavin Mischler, Yinghao Aaron Li, Stephan Bickel, Ashesh D. Mehta, Nima Mesgarani

Abstract: Recent advancements in artificial intelligence have sparked interest in the parallels between large language models (LLMs) and human neural processing, particularly in language comprehension. While prior research has established similarities in the representation of LLMs and the brain, the underlying computational principles that cause this convergence, especially in the context of evolving LLMs, remain elusive. Here, we examined a diverse selection of high-performance LLMs with similar parameter sizes to investigate the factors contributing to their alignment with the brain's language processing mechanisms. We find that as LLMs achieve higher performance on benchmark tasks, they not only become more brain-like as measured by higher performance when predicting neural responses from LLM embeddings, but also their hierarchical feature extraction pathways map more closely onto the brain's while using fewer layers to do the same encoding. We also compare the feature extraction pathways of the LLMs to each other and identify new ways in which high-performing models have converged toward similar hierarchical processing mechanisms. Finally, we show the importance of contextual information in improving model performance and brain similarity. Our findings reveal the converging aspects of language processing in the brain and LLMs and offer new directions for developing models that align more closely with human cognitive processing.

Submitted 31 January, 2024; originally announced January 2024.

Comments: 19 pages, 5 figures and 4 supplementary figures

arXiv:2311.17078 [pdf, other]

Data Imbalance, Uncertainty Quantification, and Generalization via Transfer Learning in Data-driven Parameterizations: Lessons from the Emulation of Gravity Wave Momentum Transport in WACCM

Authors: Y. Qiang Sun, Hamid A. Pahlavan, Ashesh Chattopadhyay, Pedram Hassanzadeh, Sandro W. Lubis, M. Joan Alexander, Edwin Gerber, Aditi Sheshadri, Yifei Guan

Abstract: Neural networks (NNs) are increasingly used for data-driven subgrid-scale parameterization in weather and climate models. While NNs are powerful tools for learning complex nonlinear relationships from data, there are several challenges in using them for parameterizations. Three of these challenges are 1) data imbalance related to learning rare (often large-amplitude) samples; 2) uncertainty quantification (UQ) of the predictions to provide an accuracy indicator; and 3) generalization to other climates, e.g., those with higher radiative forcing. Here, we examine the performance of methods for addressing these challenges using NN-based emulators of the Whole Atmosphere Community Climate Model (WACCM) physics-based gravity wave (GW) parameterizations as the test case. WACCM has complex, state-of-the-art parameterizations for orography-, convection-, and frontal-driven GWs. Convection- and orography-driven GWs have significant data imbalance due to the absence of convection or orography at many grid points. We address data imbalance using resampling and/or weighted loss functions, enabling the successful emulation of parameterizations for all three sources. We demonstrate that three UQ methods (Bayesian NNs, variational auto-encoders, and dropout) provide ensemble spreads that correspond to accuracy during testing, offering criteria on when an NN gives inaccurate predictions. Finally, we show that the accuracy of these NNs decreases for a warmer climate (4×CO2).
However, the generalization accuracy is significantly improved by applying transfer learning, e.g., re-training only one layer using ~1% new data from the warmer climate. The findings of this study offer insights for developing reliable and generalizable data-driven parameterizations for various processes, including (but not limited to) GWs.

Submitted 27 November, 2023; originally announced November 2023.
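
The transfer-learning result (re-training only one layer on ~1% new data) can be illustrated with a toy NumPy sketch. Here a frozen random hidden layer stands in for a trained emulator, the output layer is the one re-trained, and closed-form ridge least squares replaces gradient-based training; the data, the shifted "warmer" mapping, and the choice of which layer to re-train are all assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 32

# Frozen hidden layer of a hypothetical, already-trained emulator.
W1 = rng.normal(size=(n_in, n_hidden))
b1 = rng.normal(size=n_hidden)


def features(x):
    """Hidden-layer activations; W1 and b1 are never updated below."""
    return np.tanh(x @ W1 + b1)


def fit_output_layer(x, y, ridge=1e-6):
    """Solve for the output-layer weights only (ridge-regularized least squares)."""
    H = features(x)
    return np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ y)


def mse(w_out, x, y):
    return float(np.mean((features(x) @ w_out - y) ** 2))


# Control regime: plentiful data. Shifted regime: 1% as much data and a changed mapping.
x_old = rng.normal(size=(10_000, n_in))
y_old = np.sin(x_old).sum(axis=1)
x_new = rng.normal(loc=0.5, size=(100, n_in))
y_new = np.sin(x_new).sum(axis=1) + 0.5 * x_new[:, 0]

w_base = fit_output_layer(x_old, y_old)   # stands in for the original training
w_tl = fit_output_layer(x_new, y_new)     # re-train only the output layer on new data
print(mse(w_base, x_new, y_new), mse(w_tl, x_new, y_new))
```

Freezing all but one layer keeps the number of re-fitted parameters small (32 here), which is why a sample only 1% the size of the original data can still determine them.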

arXiv:2310.00813 [pdf, other]

OceanNet: A principled neural operator-based digital twin for regional oceans

Authors: Ashesh Chattopadhyay, Michael Gray, Tianning Wu, Anna B. Lowe, Ruoying He

Abstract: While data-driven approaches demonstrate great potential in atmospheric modeling and weather forecasting, ocean modeling poses distinct challenges due to complex bathymetry, land, vertical structure, and flow nonlinearity. This study introduces OceanNet, a principled neural operator-based digital twin for ocean circulation. OceanNet uses a Fourier neural operator and a predictor-evaluate-corrector integration scheme to mitigate autoregressive error growth and enhance stability over extended time scales. A spectral regularizer counteracts spectral bias at smaller scales. OceanNet is applied to the northwest Atlantic Ocean western boundary current (the Gulf Stream), focusing on the task of seasonal prediction for Loop Current eddies and the Gulf Stream meander. Trained using historical sea surface height (SSH) data, OceanNet demonstrates competitive forecast skill, outperforming SSH predictions by an uncoupled, state-of-the-art dynamical ocean model forecast while reducing computational cost by a factor of 500,000. These accomplishments demonstrate the potential of physics-inspired deep neural operators as cost-effective alternatives to high-resolution numerical ocean models.

Submitted 1 October, 2023; originally announced October 2023.

Comments: Supplementary information can be found in: https://drive.google.com/file/d/1NoxJLa967naJT787a5-IfZ7f_MmRuZMP/view?usp=sharing
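
OceanNet pairs a neural operator with a predictor-evaluate-corrector (PEC) time integrator. The abstract does not give the exact scheme, so the sketch below uses one common PEC pairing (explicit Euler predictor, trapezoidal corrector, i.e. Heun's method) with a toy tendency `decay` standing in for the learned operator. It compares the PEC error against plain forward Euler on dy/dt = -y.

```python
import numpy as np


def pec_step(f, y, dt):
    """One predict-evaluate-correct step (explicit Euler predictor, trapezoidal corrector)."""
    f_now = f(y)                            # evaluate the tendency at the current state
    y_pred = y + dt * f_now                 # predict
    f_pred = f(y_pred)                      # evaluate at the predicted state
    return y + 0.5 * dt * (f_now + f_pred)  # correct


def euler_step(f, y, dt):
    """Plain forward Euler, for comparison."""
    return y + dt * f(y)


def integrate(f, y0, dt, n_steps, step=pec_step):
    """Autoregressive rollout: each step consumes the previous output."""
    y = np.asarray(y0, dtype=float)
    for _ in range(n_steps):
        y = step(f, y, dt)
    return y


def decay(y):
    """Toy tendency standing in for a learned operator (hypothetical)."""
    return -y


y_pec = integrate(decay, 1.0, 0.01, 100)
y_euler = integrate(decay, 1.0, 0.01, 100, step=euler_step)
print(abs(y_pec - np.exp(-1.0)), abs(y_euler - np.exp(-1.0)))
```

The extra evaluation per step buys second-order accuracy, so the per-step error that an autoregressive rollout accumulates is much smaller than with a single explicit update.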

arXiv:2309.17434 [pdf, other]

Local Changes in Protein Filament Properties Drive Large-Scale Membrane Transformations Involved in Endosome Tethering and Fusion

Authors: Ashesh Ghosh, Andrew J. Spakowitz

Abstract: Large-scale cellular transformations are triggered by subtle physical and structural changes in individual biomacromolecular and membrane components. A prototypical example of such an event is the orchestrated fusion of membranes within an endosome that enables transport of cargo and processing of biochemical moieties. In this work, we demonstrate how protein filaments on the endosomal membrane surface can leverage a rigid-to-flexible transformation to elicit a large-scale change in membrane flexibility to enable membrane fusion. We develop a polymer field-theoretic model that captures molecular alignment arising from nematic interactions with varying surface density and fraction of flexible filaments, which are biologically controlled within the endosomal membrane. We then predict the collective elasticity of the filament brush in response to changes in the filament alignment, finding a greater than 20-fold increase of the effective membrane elasticity over the bare membrane elasticity that is triggered by filament alignment. These results show that the endosome can modulate the filament properties to orchestrate membrane fluidization that facilitates vesicle fusion, providing an example of how active processes that modulate local molecular properties can result in large-scale transformations that are essential to cellular survival.

Submitted 29 September, 2023; originally announced September 2023.

arXiv:2309.13211 [pdf, other]

doi 10.1029/2023MS004033

Interpretable structural model error discovery from sparse assimilation increments using spectral bias-reduced neural networks: A quasi-geostrophic turbulence test case

Authors: Rambod Mojgani, Ashesh Chattopadhyay, Pedram Hassanzadeh

Abstract: Earth system models suffer from various structural and parametric errors in their representation of nonlinear, multi-scale processes, leading to uncertainties in their long-term projections. The effects of many of these errors (particularly those due to fast physics) can be quantified in short-term simulations, e.g., as differences between the predicted and observed states (analysis increments). With the increase in the availability of high-quality observations and simulations, learning nudging from these increments to correct model errors has become an active research area. However, most studies focus on using neural networks, which, while powerful, are hard to interpret, are data-hungry, and generalize poorly out of distribution. Here, we show the capabilities of Model Error Discovery with Interpretability and Data Assimilation (MEDIDA), a general, data-efficient framework that uses sparsity-promoting equation-discovery techniques to learn model errors from analysis increments. Using two-layer quasi-geostrophic turbulence as the test case, MEDIDA is shown to successfully discover various linear and nonlinear structural/parametric errors when full observations are available. Discovery from spatially sparse observations is found to require highly accurate interpolation schemes. While NNs have shown success as interpolators in recent studies, here they are found inadequate due to their inability to accurately represent small scales, a phenomenon known as spectral bias.
We show that a general remedy, adding a random Fourier feature layer to the NN, resolves this issue, enabling MEDIDA to successfully discover model errors from sparse observations. These promising results suggest that with further development, MEDIDA could be scaled up to models of the Earth system and real observations.

Submitted 15 February, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

Comments: 26 pages, 5+1 figures
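
The random Fourier feature layer mentioned above can be sketched as a fixed random projection followed by cosines and sines. This is one common formulation, assumed here rather than taken from the paper; `n_features`, `sigma`, and the 2D coordinate input are illustrative choices.

```python
import numpy as np


def random_fourier_features(x, n_features=64, sigma=10.0, seed=0):
    """Map inputs of shape [N, d] to [cos(2*pi*x@B), sin(2*pi*x@B)] with B ~ N(0, sigma^2).

    B is sampled once and kept fixed (not trained); a larger sigma injects higher
    frequencies, which is intended to help a downstream MLP represent small scales.
    """
    rng = np.random.default_rng(seed)
    B = rng.normal(scale=sigma, size=(x.shape[1], n_features))
    proj = 2.0 * np.pi * (x @ B)
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=1)


# Example: 2D coordinates, e.g. locations of sparse observations.
coords = np.random.default_rng(1).uniform(size=(500, 2))
phi = random_fourier_features(coords)   # shape (500, 128); feed this to the MLP
```

Because `B` must stay the same between training and inference, the fixed `seed` (or storing `B` alongside the network weights) matters in practice.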

arXiv:2306.05014 [pdf, other]

doi 10.1029/2023MS003874

Learning Closed-form Equations for Subgrid-scale Closures from High-fidelity Data: Promises and Challenges

Authors: Karan Jakhar, Yifei Guan, Rambod Mojgani, Ashesh Chattopadhyay, Pedram Hassanzadeh

Abstract: There is growing interest in discovering interpretable, closed-form equations for subgrid-scale (SGS) closures/parameterizations of complex processes in Earth systems. Here, we apply a common equation-discovery technique with expansive libraries to learn closures from filtered direct numerical simulations of 2D turbulence and Rayleigh-Bénard convection (RBC). Across common filters (e.g., Gaussian, box), we robustly discover closures of the same form for momentum and heat fluxes. These closures depend on nonlinear combinations of gradients of filtered variables, with constants that are independent of the fluid/flow properties and depend only on filter type/size. We show that these closures are the nonlinear gradient model (NGM), which is derivable analytically using a Taylor series. Indeed, we suggest that with common (physics-free) equation-discovery algorithms, for many common systems/physics, discovered closures are consistent with the leading term of the Taylor series (except when cutoff filters are used). As in previous studies, we find that large-eddy simulations with NGM closures are unstable, despite significant similarities between the true and NGM-predicted fluxes (correlations $> 0.95$). We identify two shortcomings as reasons for these instabilities: in 2D, NGM produces zero kinetic energy transfer between resolved and subgrid scales, lacking both diffusion and backscattering. In RBC, potential energy backscattering is poorly predicted.
Moreover, we show that SGS fluxes diagnosed from data, presumed the "truth" for discovery, depend on filtering procedures and are not unique. Accordingly, to learn accurate, stable closures in future work, we propose several ideas around using physics-informed libraries, loss functions, and metrics. These findings are relevant to closure modeling of any multi-scale system.

Submitted 7 July, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

Comments: 40 pages, 4 figures. The code for 2D-FHIT solver "py2d" is available at https://github.com/envfluids/py2d. The code and data used for analysis in this work can be found at https://github.com/jakharkaran/EqsDiscovery_2D-FHIT_RBC and https://doi.org/10.5281/zenodo.7500647, respectively

MSC Class: 76F65 (Primary) 86A08; 68T01; 76F05; 76F35 (Secondary) ACM Class: J.2; I.2.0; G.1.8
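
The abstract notes that the NGM closure can be derived analytically with a Taylor series. A minimal sketch of that standard argument follows, assuming a symmetric filter kernel $G$ with zero first moment and second moment $\Delta^2/12$ in each direction (true of a box filter of width $\Delta$ and of the Gaussian filter commonly matched to it), keeping only the leading term and summing over the repeated index $k$. The notation ($G$, $\Delta$, $\theta$) is ours.

```latex
\begin{align*}
\bar f &= \int G(\mathbf r)\, f(\mathbf x - \mathbf r)\, d\mathbf r
        = f + \frac{\Delta^2}{24}\,\nabla^2 f + O(\Delta^4),\\
\tau_{ij} &\equiv \overline{u_i u_j} - \bar u_i \bar u_j
        = \frac{\Delta^2}{24}\Big[\nabla^2(u_i u_j) - u_i \nabla^2 u_j - u_j \nabla^2 u_i\Big] + O(\Delta^4)\\
      &= \frac{\Delta^2}{12}\,\frac{\partial u_i}{\partial x_k}\frac{\partial u_j}{\partial x_k} + O(\Delta^4)
        \approx \frac{\Delta^2}{12}\,\frac{\partial \bar u_i}{\partial x_k}\frac{\partial \bar u_j}{\partial x_k},\\
\tau_{\theta j} &\equiv \overline{u_j \theta} - \bar u_j \bar\theta
        \approx \frac{\Delta^2}{12}\,\frac{\partial \bar u_j}{\partial x_k}\frac{\partial \bar\theta}{\partial x_k}.
\end{align*}
```

The only constant, $\Delta^2/12$, comes from the filter's second moment, which is consistent with the abstract's observation that the discovered constants depend on filter type and size but not on fluid or flow properties.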

arXiv:2305.00385 [pdf]

Cross-Shaped Windows Transformer with Self-supervised Pretraining for Clinically Significant Prostate Cancer Detection in Bi-parametric MRI

Authors: Yuheng Li, Jacob Wynne, **g Wang, Richard L. J. Qiu, Justin Roper, Shaoyan Pan, Ashesh B. Jani, Tian Liu, Pretesh R. Patel, Hui Mao, Xiaofeng Yang

Abstract: Biparametric magnetic resonance imaging (bpMRI) has demonstrated promising results in prostate cancer (PCa) detection using convolutional neural networks (CNNs). Recently, transformers have achieved competitive performance compared to CNNs in computer vision. Large-scale transformers need abundant annotated data for training, which is difficult to obtain in medical imaging. Self-supervised learning (SSL) utilizes unlabeled data to generate meaningful semantic representations without the need for costly annotations, enhancing model performance on tasks with limited labeled data. We introduce a novel end-to-end Cross-Shaped windows (CSwin) transformer UNet model, CSwin UNet, to detect clinically significant prostate cancer (csPCa) in prostate bi-parametric MR imaging (bpMRI) and demonstrate the effectiveness of our proposed self-supervised pre-training framework. Using a large prostate bpMRI dataset with 1500 patients, we first pretrain the CSwin transformer using multi-task self-supervised learning to improve data efficiency and network generalizability. We then fine-tune using lesion annotations to perform csPCa detection. Five-fold cross-validation shows that self-supervised CSwin UNet achieves 0.888 AUC and 0.545 Average Precision (AP), significantly outperforming four comparable models (Swin UNETR, DynUNet, Attention UNet, UNet). Using a separate bpMRI dataset with 158 patients, we evaluate our method's robustness to external hold-out data.
Self-supervised CSwin UNet achieves 0.79 AUC and 0.45 AP, still outperforming all other comparable methods and demonstrating good generalization to external data.

Submitted 17 March, 2024; v1 submitted 30 April, 2023; originally announced May 2023.
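
The reported AUC and AP can be computed from per-case scores and binary labels. The sketch below uses the rank (Mann-Whitney) form of ROC AUC and a step-wise average precision; it assumes untied scores for AP and does not reproduce the paper's lesion- or patient-level evaluation protocol. The example labels and scores are made up.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which positives appear (assumes no tied scores)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank   # precision at this positive's rank
    return total / n_pos


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores), average_precision(labels, scores))  # 0.75 and 5/6 (about 0.833)
```

AUC rewards correct pairwise ordering regardless of class balance, while AP concentrates on how highly the positives are ranked, which is why the two can diverge (e.g. 0.888 AUC versus 0.545 AP above) when positives are rare.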

arXiv:2304.07029 [pdf, other]

Long-term instabilities of deep learning-based digital twins of the climate system: The cause and a solution

Authors: Ashesh Chattopadhyay, Pedram Hassanzadeh

Abstract: Long-term stability is a critical property for deep learning-based data-driven digital twins of the Earth system. Such data-driven digital twins enable sub-seasonal and seasonal predictions of extreme environmental events, probabilistic forecasts, that require a large number of ensemble members, and computationally tractable high-resolution Earth system models where expensive components of the mod… ▽ More Long-term stability is a critical property for deep learning-based data-driven digital twins of the Earth system. Such data-driven digital twins enable sub-seasonal and seasonal predictions of extreme environmental events, probabilistic forecasts, that require a large number of ensemble members, and computationally tractable high-resolution Earth system models where expensive components of the models can be replaced with cheaper data-driven surrogates. Owing to computational cost, physics-based digital twins, though long-term stable, are intractable for real-time decision-making. Data-driven digital twins offer a cheaper alternative to them and can provide real-time predictions. However, such digital twins can only provide short-term forecasts accurately since they become unstable when time-integrated beyond 20 days. Currently, the cause of the instabilities is unknown, and the methods that are used to improve their stability horizons are ad-hoc and lack rigorous theory. In this paper, we reveal that the universal causal mechanism for these instabilities in any turbulent flow is due to \textit{spectral bias} wherein, \textit{any} deep learning architecture is biased to learn only the large-scale dynamics and ignores the small scales completely. We further elucidate how turbulence physics and the absence of convergence in deep learning-based time-integrators amplify this bias leading to unstable error propagation. 
Finally, using the quasigeostrophic flow and ECMWF Reanalysis data as test cases, we bridge the gap between deep learning theory and fundamental numerical analysis to propose one mitigative solution to such instabilities. We develop long-term stable data-driven digital twins for the climate system and demonstrate accurate short-term forecasts, and hundreds of years of long-term stable time-integration with accurate mean and variability.

Submitted 14 April, 2023; originally announced April 2023.

Comments: Supplementary information is given at https://drive.google.com/file/d/1J0k20Qk___PbDQob0Z4vnSVWEpnDFlif/view?usp=share_link
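Spectral bias, as described above, shows up as missing energy at small scales (high wavenumbers). A common way to check for it is to compare the energy spectrum of a forecast with that of the truth. The sketch below does this for a 1D periodic field with NumPy; the function name and the toy signals are illustrative assumptions, not the authors' diagnostic, which concerns 2D flows and reanalysis data.

```python
import numpy as np


def energy_spectrum_1d(u):
    """Return wavenumbers k >= 0 and the energy E(k) of a real, periodic 1D field.

    Normalized so that E.sum() equals the mean of u**2 (Parseval's theorem).
    """
    u = np.asarray(u, dtype=float)
    n = u.size
    coeffs = np.fft.rfft(u) / n             # Fourier coefficients for k = 0 .. n//2
    energy = np.abs(coeffs) ** 2
    energy[1:] *= 2.0                       # account for the matching negative wavenumbers
    if n % 2 == 0:
        energy[-1] /= 2.0                   # the Nyquist mode has no separate partner
    return np.arange(energy.size), energy


# Toy comparison (hypothetical signals): the "forecast" keeps only the large scale.
n = 128
x = 2.0 * np.pi * np.arange(n) / n
truth = np.sin(x) + 0.1 * np.sin(20 * x)
forecast = np.sin(x)

k, e_truth = energy_spectrum_1d(truth)
_, e_forecast = energy_spectrum_1d(forecast)
print(e_truth[20], e_forecast[20])          # small-scale energy: present vs. missing
```

Both fields carry the same energy at k = 1, but only the truth has energy at k = 20; on a log-log plot of E(k), this gap at high k is the signature the abstract attributes to spectral bias.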

arXiv:2303.16969 [pdf, other]

Importance of Many Particle Correlations to the Collective Debye-Waller Factor in a Single-Particle Activated Dynamic Theory of the Glass Transition

Authors: Ashesh Ghosh

Abstract: We theoretically study the importance of many-body correlations for the collective Debye-Waller (DW) factor in the context of the Nonlinear Langevin Equation (NLE) single-particle activated dynamics theory of the glass transition and its extension to include collective elasticity (ECNLE theory). This microscopic force-based approach envisions structural alpha relaxation as a coupled local-nonlocal process involving correlated local cage and longer-range collective barriers. The crucial question addressed here is the importance of the deGennes narrowing contribution versus a literal Vineyard approximation for the collective DW factor that enters the construction of the dynamic free energy in NLE theory. While the Vineyard-deGennes approach-based NLE theory and its ECNLE extension yield predictions that agree well with experimental and simulation results, use of a literal Vineyard approximation for the collective DW factor massively overpredicts the activated relaxation time. The current study suggests that many-particle correlations are crucial for a reliable activated dynamics description of model hard-sphere fluids.

Submitted 29 March, 2023; originally announced March 2023.

arXiv:2212.09844 [pdf, other]

Robust Design and Evaluation of Predictive Algorithms under Unobserved Confounding

Authors: Ashesh Rambachan, Amanda Coston, Edward Kennedy

Abstract: Predictive algorithms inform consequential decisions in settings where the outcome is selectively observed given choices made by human decision makers. We propose a unified framework for the robust design and evaluation of predictive algorithms in selectively observed data. We impose general assumptions on how much the outcome may vary on average between unselected and selected units conditional on observed covariates and identified nuisance parameters, formalizing popular empirical strategies for imputing missing data such as proxy outcomes and instrumental variables. We develop debiased machine learning estimators for the bounds on a large class of predictive performance estimands, such as the conditional likelihood of the outcome, a predictive algorithm's mean square error, true/false positive rate, and many others, under these assumptions. In an administrative dataset from a large Australian financial institution, we illustrate how varying assumptions on unobserved confounding lead to meaningful changes in default risk predictions and evaluations of credit scores across sensitive groups.

Submitted 19 May, 2024; v1 submitted 19 December, 2022; originally announced December 2022.
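To make the bounding idea above concrete, here is a minimal plug-in sketch. It assumes a bounded outcome in [0, 1], a discrete covariate, selected units in every covariate cell, and that the unselected mean in each cell differs from the selected mean by at most a hypothetical tolerance `delta`. It is a simple sample-average calculation, not the debiased machine learning estimators the abstract describes, and all names and data are illustrative.

```python
import numpy as np


def mean_outcome_bounds(x, s, y, delta):
    """Plug-in bounds on E[Y] when Y is observed only for selected units (s == 1).

    Illustrative assumptions: Y lies in [0, 1], the covariate x is discrete, every
    covariate cell contains selected units, and within each cell the mean outcome of
    unselected units differs from that of selected units by at most `delta`.
    """
    x, s, y = (np.asarray(a, dtype=float) for a in (x, s, y))
    lower = upper = 0.0
    for cell in np.unique(x):
        in_cell = x == cell
        p_cell = in_cell.mean()                        # P(X = cell)
        p_sel = s[in_cell].mean()                      # P(S = 1 | X = cell)
        mu_sel = y[in_cell & (s == 1)].mean()          # E[Y | S = 1, X = cell]
        lo_unsel = max(0.0, mu_sel - delta)            # worst-case unselected mean
        hi_unsel = min(1.0, mu_sel + delta)            # best-case unselected mean
        lower += p_cell * (p_sel * mu_sel + (1 - p_sel) * lo_unsel)
        upper += p_cell * (p_sel * mu_sel + (1 - p_sel) * hi_unsel)
    return lower, upper


# Hypothetical toy data; outcomes of unselected units are unobserved (NaN).
x = [0, 0, 0, 0, 1, 1, 1, 1]
s = [1, 1, 0, 0, 1, 1, 1, 0]
y = [1, 0, np.nan, np.nan, 1, 1, 0, np.nan]
print(mean_outcome_bounds(x, s, y, delta=0.0))   # point-identified case
print(mean_outcome_bounds(x, s, y, delta=0.1))   # an interval
```

With `delta = 0` the unselected units are assumed to behave like the selected ones and the bounds collapse to a point; until clipping at 0 or 1 kicks in, the interval widens linearly in `delta`, scaled by the share of unselected units.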

arXiv:2211.12872 [pdf, other]

μSplit: efficient image decomposition for microscopy data

Authors: Ashesh, Alexander Krull, Moises Di Sante, Francesco Silvio Pasqualini, Florian Jug

Abstract: We present μSplit, a dedicated approach for trained image decomposition in the context of fluorescence microscopy images. We find that the best results using regular deep architectures are achieved when large image patches are used during training, making memory consumption the limiting factor to further improving performance. We therefore introduce lateral contextualization (LC), a novel meta-architecture that enables the memory-efficient incorporation of large image context, which we observe is a key ingredient to solving the image decomposition task at hand. We integrate LC with U-Nets, Hierarchical AEs, and Hierarchical VAEs, for which we formulate a modified ELBO loss. Additionally, LC enables training deeper hierarchical models than otherwise possible and, interestingly, helps to reduce tiling artefacts that are inherently impossible to avoid when using tiled VAE predictions. We apply μSplit to five decomposition tasks, one on a synthetic dataset and four others derived from real microscopy data. Our method consistently achieves the best results (average improvements to the best baseline of 2.25 dB PSNR), while simultaneously requiring considerably less GPU memory. Our code and datasets can be found at https://github.com/juglab/uSplit.

Submitted 16 August, 2023; v1 submitted 23 November, 2022; originally announced November 2022.

Comments: Published at ICCV 2023. 10 pages, 7 figures, 9 pages supplement, 8 supplementary figures
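The results above are reported in dB of PSNR. For reference, a minimal NumPy sketch of the standard definition is below; choosing the data range from the target's min-max span is one common convention and an assumption here, since microscopy pipelines normalize images in different ways, so μSplit's reported numbers need not use exactly this form.

```python
import numpy as np


def psnr(target, prediction, data_range=None):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range**2 / MSE).

    If data_range is not given, it is taken from the target's min-max span
    (one common convention; normalization choices vary between papers).
    """
    target = np.asarray(target, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    if data_range is None:
        data_range = target.max() - target.min()
    mse = np.mean((target - prediction) ** 2)   # a perfect prediction (mse == 0) gives inf
    return 10.0 * np.log10(data_range ** 2 / mse)


# At a fixed data range, a gain of g dB divides the MSE by 10 ** (g / 10).
print(10 ** (2.25 / 10))   # MSE ratio corresponding to 2.25 dB
```

Because PSNR is logarithmic in the MSE, a 2.25 dB gain at a fixed data range corresponds to dividing the MSE by about 1.68. Note that an average of per-task dB gains does not translate exactly into an average MSE reduction.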

arXiv:2208.08419 [pdf, other]

doi 10.1122/8.0000546

Microscopic Activated Dynamics Theory of the Shear Rheology and Stress Overshoot in Ultra-Dense Glass-Forming Fluids and Colloidal Suspensions

Authors: Ashesh Ghosh, Kenneth S. Schweizer

Abstract: We formulate a microscopic, force-level, activated dynamics-based statistical-mechanical theory for the continuous startup nonlinear shear rheology of ultra-dense glass-forming hard-sphere fluids and colloidal suspensions in the context of the ECNLE approach. Activated structural relaxation is described as a coupled local-nonlocal event involving caging and longer-range collective elasticity, which controls the characteristic stress relaxation time. Theoretical predictions for the deformation-induced mobility enhancement, onset of relaxation acceleration at low values of stress, strain, or shear rate, apparent power-law thinning of the steady-state structural relaxation time and viscosity, a non-vanishing activation barrier in the shear-thinning regime, an apparent Herschel-Bulkley form of the rate dependence of the steady-state shear stress, exponential growth of different measures of a dynamic yield or flow stress with packing fraction, and reduced fragility and dynamic heterogeneity under deformation were previously shown to be in good agreement with experiment. The central new question addressed here is the defining feature of the transient response: the stress-overshoot. In contrast to the steady-state flow regime, understanding the transient response requires an explicit treatment of the coupled nonequilibrium evolution of structure, elastic modulus, and stress relaxation time. We formulate a new quantitative model for this aspect in a physically motivated and computationally tractable manner.
Theoretical predictions for the stress-overshoot are shown to be in good agreement with experimental observations in the metastable ultra-dense regime of hard-sphere colloidal suspensions as a function of shear-rate and packing fraction, and accounting for deformation-assisted activated motion is crucial for both the transient and steady-state responses.

Submitted 17 August, 2022; originally announced August 2022.

arXiv:2206.04811 [pdf, other]

doi 10.1016/j.jcp.2023.111918

Deep learning-enhanced ensemble-based data assimilation for high-dimensional nonlinear dynamical systems

Authors: Ashesh Chattopadhyay, Ebrahim Nabizadeh, Eviatar Bach, Pedram Hassanzadeh

Abstract: Data assimilation (DA) is a key component of many forecasting models in science and engineering. DA allows one to estimate better initial conditions using an imperfect dynamical model of the system and noisy/sparse observations available from the system. The ensemble Kalman filter (EnKF) is a DA algorithm that is widely used in applications involving high-dimensional nonlinear dynamical systems. However, EnKF requires evolving large ensembles of forecasts using the dynamical model of the system. This often becomes computationally intractable, especially when the number of states of the system is very large, e.g., for weather prediction. With small ensembles, the estimated background error covariance matrix in the EnKF algorithm suffers from sampling error, leading to an erroneous estimate of the analysis state (initial condition for the next forecast cycle). In this work, we propose a hybrid ensemble Kalman filter (H-EnKF), which is applied to a two-layer quasi-geostrophic flow system as a test case. This framework utilizes a pre-trained deep learning-based data-driven surrogate that inexpensively generates and evolves a large data-driven ensemble of the states of the system to accurately compute the background error covariance matrix with less sampling error. The H-EnKF framework estimates a better initial condition without the need for any ad hoc localization strategies. H-EnKF can be extended to any ensemble-based DA algorithm, e.g., particle filters, which are currently difficult to use for high-dimensional systems.

Submitted 9 June, 2022; originally announced June 2022.
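The abstract's central point is that the EnKF estimates its background error covariance from the ensemble, so small ensembles give noisy covariances. Below is a textbook stochastic (perturbed-observation) EnKF analysis step in NumPy. The optional `cov_ens` argument, which estimates the covariance from a separate, larger ensemble, is a hypothetical hook showing where a surrogate-generated ensemble could enter; it is not the H-EnKF implementation, and the toy dimensions and noise levels are assumptions.

```python
import numpy as np


def enkf_analysis(forecast_ens, obs, H, R, rng, cov_ens=None):
    """Stochastic (perturbed-observation) EnKF analysis step.

    forecast_ens: (n_state, n_members) forecast ensemble to update.
    obs: (n_obs,) observations; H: (n_obs, n_state) linear observation operator;
    R: (n_obs, n_obs) observation-error covariance.
    cov_ens: optional (n_state, m) ensemble used only to estimate the background
        covariance (hypothetical hook for a larger, cheaper surrogate ensemble).
    """
    source = forecast_ens if cov_ens is None else cov_ens
    anomalies = source - source.mean(axis=1, keepdims=True)
    P = anomalies @ anomalies.T / (source.shape[1] - 1)     # background error covariance
    S = H @ P @ H.T + R                                      # innovation covariance
    K = np.linalg.solve(S, H @ P).T                          # Kalman gain P H^T S^-1
    n_members = forecast_ens.shape[1]
    obs_pert = rng.multivariate_normal(np.zeros(len(obs)), R, size=n_members).T
    return forecast_ens + K @ (obs[:, None] + obs_pert - H @ forecast_ens)


# Toy usage (hypothetical numbers): 2 state variables, only the first is observed.
rng = np.random.default_rng(0)
ens = rng.normal(size=(2, 20))
H = np.array([[1.0, 0.0]])
R = np.array([[1e-4]])
analysis = enkf_analysis(ens, np.array([3.0]), H, R, rng)
big = rng.normal(size=(2, 2000))     # stand-in for a large surrogate-generated ensemble
analysis_hybrid = enkf_analysis(ens, np.array([3.0]), H, R, rng, cov_ens=big)
```

With a very small observation error, the analyzed first component is pulled almost exactly onto the observation. The hybrid call updates the same 20 members but uses a covariance estimated from 2000 samples, which is the sampling-error reduction the abstract targets.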

arXiv:2206.03198 [pdf, other]

doi 10.1093/pnasnexus/pgad015

Explaining the physics of transfer learning a data-driven subgrid-scale closure to a different turbulent flow

Authors: Adam Subel, Yifei Guan, Ashesh Chattopadhyay, Pedram Hassanzadeh

Abstract: Transfer learning (TL) is becoming a powerful tool in scientific applications of neural networks (NNs), such as weather/climate prediction and turbulence modeling. TL enables out-of-distribution generalization (e.g., extrapolation in parameters) and effective blending of disparate training sets (e.g., simulations and observations). In TL, selected layers of a NN, already trained for a base system, are re-trained using a small dataset from a target system. For effective TL, we need to know 1) which layers are best to re-train and 2) what physics are learned during TL. Here, we present novel analyses and a new framework to address (1)-(2) for a broad range of multi-scale, nonlinear systems. Our approach combines spectral analyses of the systems' data with spectral analyses of convolutional NNs' activations and kernels, explaining the inner workings of TL in terms of the system's nonlinear physics. Using subgrid-scale modeling of several setups of 2D turbulence as test cases, we show that the learned kernels are combinations of low-, band-, and high-pass filters, and that TL learns new filters whose nature is consistent with the spectral differences of base and target systems. We also find that the shallowest layers are the best to re-train in these cases, which runs against the common wisdom guiding TL in the machine learning literature. Our framework identifies the best layer(s) to re-train beforehand, based on physics and NN theory.
Together, these analyses explain the physics learned in TL and provide a framework to guide TL for wide-ranging applications in science and engineering, such as climate change modeling.

Submitted 7 June, 2022; originally announced June 2022.

Comments: 21 pages, 6 figures
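The abstract describes learned convolution kernels as low-, band-, and high-pass filters. One simple way to label a kernel is to look at the magnitude of its Fourier transform and see where it peaks. The sketch below does this for 1D kernels. The 1D setting, the tercile thresholds, and the function names are illustrative assumptions; the paper analyzes 2D CNN kernels and its own spectral diagnostics.

```python
import numpy as np


def frequency_response(kernel, n_freq=64):
    """|H(w)| of a 1D convolution kernel at n_freq frequencies from 0 to Nyquist."""
    kernel = np.asarray(kernel, dtype=float)
    return np.abs(np.fft.rfft(kernel, n=2 * (n_freq - 1)))


def classify_filter(kernel, n_freq=64):
    """Crude label based on where the response peaks (tercile thresholds are arbitrary)."""
    peak = int(np.argmax(frequency_response(kernel, n_freq)))
    third = n_freq // 3
    if peak < third:
        return "low-pass"
    if peak < 2 * third:
        return "band-pass"
    return "high-pass"


print(classify_filter([0.25, 0.5, 0.25]))             # smoothing kernel
print(classify_filter([-0.25, 0.5, -0.25]))           # negated, scaled second difference
print(classify_filter([-0.25, 0.0, 0.5, 0.0, -0.25])) # peaks at half the Nyquist frequency
```

The three kernels have responses (1 + cos w)/2, (1 - cos w)/2, and (1 - cos 2w)/2, which peak at w = 0, at the Nyquist frequency, and at w = pi/2, respectively.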

arXiv:2205.04601 [pdf, other]

Long-term stability and generalization of observationally-constrained stochastic data-driven models for geophysical turbulence

Authors: Ashesh Chattopadhyay, Jaideep Pathak, Ebrahim Nabizadeh, Wahid Bhimji, Pedram Hassanzadeh

Abstract: Recent years have seen a surge in interest in building deep learning-based fully data-driven models for weather prediction. Such deep learning models, if trained on observations, can mitigate certain biases in current state-of-the-art weather models, some of which stem from inaccurate representation of subgrid-scale processes. However, these data-driven models, being over-parameterized, require a large amount of training data, which may not be available from reanalysis (observational data) products. Moreover, an accurate, noise-free initial condition to start forecasting with a data-driven weather model is not available in realistic scenarios. Finally, deterministic data-driven forecasting models suffer from issues with long-term stability and unphysical climate drift, which make these data-driven models unsuitable for computing climate statistics. Given these challenges, previous studies have tried to pre-train deep learning-based weather forecasting models on a large amount of imperfect long-term climate model simulations and then re-train them on available observational data. In this paper, we propose a convolutional variational autoencoder-based stochastic data-driven model that is pre-trained on an imperfect climate model simulation from a 2-layer quasi-geostrophic flow and re-trained, using transfer learning, on a small number of noisy observations from a perfect simulation. This re-trained model then performs stochastic forecasting with a noisy initial condition sampled from the perfect simulation.
We show that our ensemble-based stochastic data-driven model outperforms a baseline deterministic encoder-decoder-based convolutional model in terms of short-term skill while remaining stable for long-term climate simulations, yielding accurate climatology.

Submitted 9 May, 2022; originally announced May 2022.

arXiv:2202.11214 [pdf, other]

FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators

Authors: Jaideep Pathak, Shashank Subramanian, Peter Harrington, Sanjeev Raja, Ashesh Chattopadhyay, Morteza Mardani, Thorsten Kurth, David Hall, Zongyi Li, Kamyar Azizzadenesheli, Pedram Hassanzadeh, Karthik Kashinath, Animashree Anandkumar

Abstract: FourCastNet, short for Fourier Forecasting Neural Network, is a global data-driven weather forecasting model that provides accurate short- to medium-range global predictions at 0.25° resolution. FourCastNet accurately forecasts high-resolution, fast-timescale variables such as the surface wind speed, precipitation, and atmospheric water vapor. It has important implications for planning wind energy resources and for predicting extreme weather events such as tropical cyclones, extra-tropical cyclones, and atmospheric rivers. FourCastNet matches the forecasting accuracy of the ECMWF Integrated Forecasting System (IFS), a state-of-the-art Numerical Weather Prediction (NWP) model, at short lead times for large-scale variables, while outperforming IFS for variables with complex fine-scale structure, including precipitation. FourCastNet generates a week-long forecast in less than 2 seconds, orders of magnitude faster than IFS. The speed of FourCastNet enables the creation of rapid and inexpensive large-ensemble forecasts with thousands of ensemble members for improving probabilistic forecasting. We discuss how data-driven deep learning models such as FourCastNet are a valuable addition to the meteorology toolkit to aid and augment NWP models.

Submitted 22 February, 2022; originally announced February 2022.
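FourCastNet's Adaptive Fourier Neural Operator mixes information in Fourier space. As a much simpler illustration of that general idea, the sketch below applies a 1D spectral convolution in the style of the original Fourier neural operator: transform, scale the lowest modes by complex weights, drop the rest, and transform back. This is an assumption-laden toy, not FourCastNet's architecture; AFNO itself differs in several ways (for example, it mixes channels with block-diagonal MLPs over 2D Fourier modes), and the weights here are plain inputs rather than learned parameters.

```python
import numpy as np


def spectral_conv_1d(u, weights):
    """Multiply the lowest len(weights) Fourier modes of u by complex weights.

    All higher modes are dropped. In a trained model the weights would be learned;
    here they are plain inputs.
    """
    u = np.asarray(u, dtype=float)
    u_hat = np.fft.rfft(u)
    n_modes = len(weights)
    if n_modes > u_hat.size:
        raise ValueError("more weights than available Fourier modes")
    out_hat = np.zeros_like(u_hat)
    out_hat[:n_modes] = u_hat[:n_modes] * np.asarray(weights)
    return np.fft.irfft(out_hat, n=u.size)


n = 64
x = 2.0 * np.pi * np.arange(n) / n
u = np.sin(x) + np.sin(10 * x)
smooth = spectral_conv_1d(u, np.ones(4))      # keeps wavenumbers 0-3, so only sin(x) survives
scaled = spectral_conv_1d(u, 2.0 * np.ones(4))
```

Because the operation is a pointwise product in Fourier space, each layer has a global receptive field at the cost of one FFT pair, which is part of why Fourier-based models can be fast at inference time.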

arXiv:2201.07347 [pdf, ps, other]

doi 10.1016/j.physd.2022.133568

Learning physics-constrained subgrid-scale closures in the small-data regime for stable and accurate LES

Authors: Yifei Guan, Adam Subel, Ashesh Chattopadhyay, Pedram Hassanzadeh

Abstract: We demonstrate how incorporating physics constraints into convolutional neural networks (CNNs) enables learning subgrid-scale (SGS) closures for stable and accurate large-eddy simulations (LES) in the small-data regime (i.e., when the availability of high-quality training data is limited). Using several setups of forced 2D turbulence as the testbeds, we examine the a priori and a posteriori performance of three methods for incorporating physics: 1) data augmentation (DA), 2) CNN with group convolutions (GCNN), and 3) loss functions that enforce a global enstrophy-transfer conservation (EnsCon). While the data-driven closures from physics-agnostic CNNs trained in the big-data regime are accurate and stable, and outperform dynamic Smagorinsky (DSMAG) closures, their performance deteriorates substantially when these CNNs are trained with 40x fewer samples (the small-data regime). We show that CNN with DA and GCNN address this issue and each produce accurate and stable data-driven closures in the small-data regime. Despite its simplicity, DA, which adds appropriately rotated samples to the training set, performs as well as or in some cases even better than GCNN, which uses a sophisticated equivariance-preserving architecture. EnsCon, which combines structural modeling with aspects of functional modeling, also produces accurate and stable closures in the small-data regime. Overall, GCNN+EnsCon, which combines these two physics constraints, shows the best a posteriori performance in this regime.
These results illustrate the power of physics-constrained learning in the small-data regime for accurate and stable LES.

Submitted 18 January, 2022; originally announced January 2022.

Comments: 23 pages, 9 figures
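The data augmentation (DA) method above adds rotated samples to the training set. A minimal NumPy sketch of that idea for scalar fields on a square grid is below. The array layout, the restriction to 90-degree rotations, and the random stand-in data are assumptions; the paper's exact augmentation set is not specified here. Vector fields such as velocity would also need their components rotated, which this sketch does not do.

```python
import numpy as np


def augment_with_rotations(inputs, targets):
    """Append 90-, 180-, and 270-degree rotations of each (input, target) pair.

    inputs, targets: arrays of shape (n_samples, ny, nx) holding scalar fields on a
    square grid. Vector fields would also need their components rotated (not done here).
    """
    inputs = np.asarray(inputs)
    targets = np.asarray(targets)
    aug_inputs, aug_targets = [inputs], [targets]
    for k in (1, 2, 3):
        # rotate every sample in the (y, x) plane; input and target get the same rotation
        aug_inputs.append(np.rot90(inputs, k=k, axes=(1, 2)))
        aug_targets.append(np.rot90(targets, k=k, axes=(1, 2)))
    return np.concatenate(aug_inputs), np.concatenate(aug_targets)


rng = np.random.default_rng(0)
vorticity = rng.normal(size=(10, 32, 32))   # hypothetical filtered vorticity snapshots
sgs_term = rng.normal(size=(10, 32, 32))    # hypothetical SGS targets
x_aug, y_aug = augment_with_rotations(vorticity, sgs_term)
```

Rotating inputs and targets together teaches an ordinary CNN the rotational symmetry that a group-convolution network (GCNN) builds into its architecture, at the cost of a 4x larger training set.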

arXiv:2201.02702 [pdf]

An Improved Mathematical Model of Sepsis: Modeling, Bifurcation Analysis, and Optimal Control Study for Complex Nonlinear Infectious Disease System

Authors: Yuyang Chen, Kaiming Bi, Chih-Hang J. Wu, David Ben-Arieh, Ashesh Sinha

Abstract: Sepsis is a life-threatening medical emergency, a major cause of death worldwide, and the second highest cause of mortality in the United States. Researching the optimal control treatment or intervention strategy for the comprehensive sepsis system is key to reducing mortality. For this purpose, this paper first improves a complex nonlinear sepsis model proposed in our previous work. Then, bifurcation analyses are conducted for each sepsis subsystem to study the model behaviors under some system parameters. The bifurcation analysis results further indicate the necessity of control treatment and intervention therapy: without any control, under some parameter and initial system value settings, the system exhibits persistent inflammation over time. Therefore, we develop our improved complex nonlinear sepsis model into a sepsis optimal control model, and then use some effective biomarkers recommended in existing clinical practice as the optimization objective function to measure the development of sepsis. In addition, a Bayesian optimization algorithm combined with a recurrent neural network (the RNN-BO algorithm) is introduced to predict the optimal control strategy for the studied sepsis optimal control system.
What distinguishes the RNN-BO algorithm from other optimization algorithms is that, given any new initial system value setting (the initial value is associated with the initial conditions of patients), it can quickly predict a corresponding time-series optimal control based on the historical optimal control data for any new sepsis patient. To demonstrate the effectiveness and efficiency of the RNN-BO algorithm in solving for the optimal control of the complex nonlinear sepsis system, this paper presents numerical simulations comparing it with other optimization algorithms.

Submitted 7 January, 2022; originally announced January 2022.

Comments: 25 pages, 7 figures, 1 table

arXiv:2201.00147 [pdf]

High-dimensional Bayesian Optimization Algorithm with Recurrent Neural Network for Disease Control Models in Time Series

Authors: Yuyang Chen, Kaiming Bi, Chih-Hang J. Wu, David Ben-Arieh, Ashesh Sinha

Abstract: The Bayesian Optimization algorithm has become a promising approach for nonlinear global optimization problems and many machine learning applications. Over the past few years, improvements and enhancements have been proposed, and they have shown promising results in solving complex dynamic problems, such as systems of ordinary differential equations, where the objective functions are computationally expensive to evaluate. However, a straightforward implementation of the Bayesian Optimization algorithm performs well only for optimization problems with 10-20 dimensions. The study presented in this paper proposes a new high-dimensional Bayesian Optimization algorithm combined with recurrent neural networks, which is expected to predict the optimal solution for global optimization problems with high-dimensional or time-series decision models. The proposed RNN-BO algorithm solves the optimal control problems in a lower-dimensional space, then uses the recurrent neural network to learn from the historical optimal-solution data and predict the optimal control strategy for any new initial system value setting. In addition, providing the optimal control strategy accurately and quickly is essential to control epidemic spread effectively and efficiently while minimizing the associated financial costs. Therefore, to verify the effectiveness of the proposed algorithm, computational experiments are carried out on a deterministic SEIR epidemic model and a stochastic SIS optimal control model.
Finally, we also discuss the impact of different numbers of RNN layers and training epochs on the trade-off between solution quality and related computational effort.

Submitted 1 January, 2022; originally announced January 2022.

Comments: 16 pages, 9 figures, 2 tables
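The experiments above use a deterministic SEIR epidemic model as an optimal control testbed. A generic normalized SEIR model with a time-varying control u(t) that scales down transmission can be simulated as below. The control form, parameter values, initial state, and step size are illustrative assumptions, not the paper's specification. A control with one value per time step is what makes the decision vector high-dimensional: a 100-day horizon at dt = 0.1 already has 1000 decision variables.

```python
import numpy as np


def simulate_seir(beta, sigma, gamma, y0, control, dt=0.1):
    """Forward-Euler integration of a normalized SEIR model with a transmission control.

    dS/dt = -(1 - u) beta S I         dE/dt = (1 - u) beta S I - sigma E
    dI/dt = sigma E - gamma I         dR/dt = gamma I
    control: one value u in [0, 1] per time step (hypothetical control form).
    """
    traj = np.empty((len(control) + 1, 4))
    traj[0] = y0
    for t, u in enumerate(control):
        s, e, i, _ = traj[t]
        infections = (1.0 - u) * beta * s * i
        rates = np.array([-infections, infections - sigma * e, sigma * e - gamma * i, gamma * i])
        traj[t + 1] = traj[t] + dt * rates
    return traj


# Illustrative parameters and a 100-day horizon: the control is a 1000-dimensional vector.
steps = 1000
no_control = simulate_seir(0.5, 0.2, 0.1, [0.99, 0.0, 0.01, 0.0], np.zeros(steps))
half_control = simulate_seir(0.5, 0.2, 0.1, [0.99, 0.0, 0.01, 0.0], np.full(steps, 0.5))
```

The four rates sum to zero, so the compartments keep summing to 1 at every step. An optimizer such as BO would search over the `control` vector to trade off, for example, infections against the cost of intervention.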

arXiv:2110.00546 [pdf, other]

doi 10.1063/5.0091282

Discovery of interpretable structural model errors by combining Bayesian sparse regression and data assimilation: A chaotic Kuramoto-Sivashinsky test case

Authors: Rambod Mojgani, Ashesh Chattopadhyay, Pedram Hassanzadeh

Abstract: Models of many engineering and natural systems are imperfect. The discrepancy between the mathematical representations of a true physical system and its imperfect model is called the model error. These model errors can lead to substantial differences between the numerical solutions of the model and the state of the system, particularly in systems involving nonlinear, multi-scale phenomena. Thus, there is increasing interest in reducing model errors, particularly by leveraging the rapidly growing observational data to understand their physics and sources. Here, we introduce a framework named MEDIDA: Model Error Discovery with Interpretability and Data Assimilation. MEDIDA only requires a working numerical solver of the model and a small number of noise-free or noisy sporadic observations of the system. In MEDIDA, first the model error is estimated from differences between the observed states and model-predicted states (the latter are obtained from a number of one-time-step numerical integrations from the previous observed states). If observations are noisy, a data assimilation (DA) technique such as the ensemble Kalman filter (EnKF) is employed to provide the analysis state of the system, which is then used to estimate the model error. Finally, an equation-discovery technique, here the relevance vector machine (RVM), a sparsity-promoting Bayesian method, is used to identify an interpretable, parsimonious, and closed-form representation of the model error.
Using the chaotic Kuramoto-Sivashinsky (KS) system as the test case, we demonstrate the excellent performance of MEDIDA in discovering different types of structural/parametric model errors, representing different types of missing physics, using noise-free and noisy observations.

Submitted 2 June, 2022; v1 submitted 1 October, 2021; originally announced October 2021.

Comments: 9 pages, 2 figures

Journal ref: Chaos 32, 061105 (2022)
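MEDIDA's noise-free pipeline has two steps: estimate the model error from one-step differences between observed states and model predictions, then fit a sparse closed-form expression to it. The sketch below follows those two steps on a toy scalar ODE but, to stay short, swaps the RVM for sequentially thresholded least squares (STLSQ), a different and non-Bayesian sparse regression. The toy system, the candidate library, and the threshold are assumptions. Recovery is exact here only because the "observations" were generated with the same Euler scheme as the model.

```python
import numpy as np


def stlsq(library, target, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares: sparse xi with library @ xi ≈ target."""
    xi = np.linalg.lstsq(library, target, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0                                   # prune small coefficients
        keep = ~small
        if keep.any():
            xi[keep] = np.linalg.lstsq(library[:, keep], target, rcond=None)[0]
    return xi


# Hypothetical toy problem: true dynamics du/dt = -u + 0.5 u^2; the model omits 0.5 u^2.
dt = 0.01
u = np.empty(500)
u[0] = 1.5
for n in range(len(u) - 1):
    u[n + 1] = u[n] + dt * (-u[n] + 0.5 * u[n] ** 2)      # noise-free "observations"

prev = u[:-1]
model_pred = prev + dt * (-prev)                          # one step of the imperfect model
model_error = (u[1:] - model_pred) / dt                   # estimated missing tendency
library = np.column_stack([np.ones_like(prev), prev, prev ** 2, prev ** 3])
coeffs = stlsq(library, model_error)                      # weights on [1, u, u^2, u^3]
print(coeffs)
```

The recovered coefficients single out the missing 0.5 u^2 term and zero out the other candidates. With noisy observations, the abstract's pipeline would first replace the observed states with EnKF analysis states.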

arXiv:2109.13602 [pdf, other]

SafetyNet: Safe planning for real-world self-driving vehicles using machine-learned policies

Authors: Matt Vitelli, Yan Chang, Yawei Ye, Maciej Wołczyk, Błażej Osiński, Moritz Niendorf, Hugo Grimmett, Qiangui Huang, Ashesh Jain, Peter Ondruska

Abstract: In this paper we present the first safe system for full control of self-driving vehicles trained from human demonstrations and deployed in challenging, real-world, urban environments. Current industry-standard solutions use rule-based systems for planning. Although they perform reasonably well in common scenarios, the engineering complexity renders this approach incompatible with human-level performance. On the other hand, the performance of machine-learned (ML) planning solutions can be improved by simply adding more exemplar data. However, ML methods cannot offer safety guarantees and sometimes behave unpredictably. To combat this, our approach uses a simple yet effective rule-based fallback layer that performs sanity checks on an ML planner's decisions (e.g., avoiding collision, assuring physical feasibility). This allows us to leverage ML to handle complex situations while still assuring safety, reducing ML planner-only collisions by 95%. We train our ML planner on 300 hours of expert driving demonstrations using imitation learning and deploy it along with the fallback layer in downtown San Francisco, where it takes complete control of a real vehicle and navigates a wide variety of challenging urban driving scenarios.

Submitted 28 September, 2021; originally announced September 2021.
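The fallback layer described above runs sanity checks, such as collision avoidance and physical feasibility, on the ML planner's output. A minimal sketch of that pattern is below. The specific checks (a finite-difference acceleration limit and a static-obstacle clearance), the thresholds, the plan format, and all function names are hypothetical, not SafetyNet's actual rules. The fallback plan is assumed to be safe and is not itself checked.

```python
import numpy as np


def passes_sanity_checks(plan, dt, obstacles, max_accel=3.0, min_clearance=1.0):
    """Hypothetical rule-based checks on a planned trajectory.

    plan: (T, 2) planned x/y positions in meters at a fixed time step dt (seconds).
    obstacles: (K, 2) obstacle positions, treated as static for simplicity.
    The acceleration limit (m/s^2) and clearance (m) are illustrative values.
    """
    plan = np.asarray(plan, dtype=float)
    accel = np.diff(plan, n=2, axis=0) / dt ** 2          # finite-difference acceleration
    if accel.size and np.linalg.norm(accel, axis=1).max() > max_accel:
        return False                                       # physically infeasible
    obstacles = np.asarray(obstacles, dtype=float).reshape(-1, 2)
    if obstacles.size:
        gaps = np.linalg.norm(plan[:, None, :] - obstacles[None, :, :], axis=2)
        if gaps.min() < min_clearance:
            return False                                   # too close: potential collision
    return True


def select_plan(ml_plan, fallback_plan, dt, obstacles):
    """Use the ML plan only if it passes the checks; otherwise use the fallback plan."""
    return ml_plan if passes_sanity_checks(ml_plan, dt, obstacles) else fallback_plan


dt = 0.1
ml_plan = np.column_stack([np.arange(10) * 1.0, np.zeros(10)])   # 10 m/s straight line
brake_plan = np.zeros((10, 2))                                   # hypothetical "stay put" fallback
chosen = select_plan(ml_plan, brake_plan, dt, obstacles=[[5.0, 0.5]])
```

In this toy call, the straight-line plan passes within 0.5 m of the obstacle, so the fallback is chosen. The design keeps the learned planner in charge whenever its output is plausible and lets simple, auditable rules override it otherwise.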

arXiv:2108.02289 [pdf]

High dimensional Bayesian Optimization Algorithm for Complex System in Time Series

Authors: Yuyang Chen, Kaiming Bi, Chih-Hang J. Wu, David Ben-Arieh, Ashesh Sinha

Abstract: At present, high-dimensional global optimization problems with time-series models have received much attention from engineering fields. Since it was proposed, Bayesian optimization has quickly become a popular and promising approach for solving global optimization problems. However, the standard Bayesian optimization algorithm is insufficient for finding the global optimal solution when the model is high-dimensional. Hence, this paper presents a novel high-dimensional Bayesian optimization algorithm that considers dimension reduction and different dimension fill-in strategies. Most existing literature on Bayesian optimization algorithms does not discuss the sampling strategies used to optimize the acquisition function. This study proposes a new sampling method based on both the multi-armed bandit and random search methods for optimizing the acquisition function. In addition, based on the time-dependent or dimension-dependent characteristics of the model, the proposed algorithm can reduce the dimension evenly. Then, five different dimension fill-in strategies are discussed and compared. Finally, to increase the final accuracy of the optimal solution, the proposed algorithm adds a local search based on a series of Adam-based steps at the final stage. Our computational experiments demonstrate that the proposed Bayesian optimization algorithm can achieve reasonable solutions with excellent performance for high-dimensional global optimization problems with a time-series optimal control model.

Submitted 4 August, 2021; originally announced August 2021.

Comments: 18 pages, 13 figures

arXiv:2108.00062 [pdf]

A New Bayesian Optimization Algorithm for Complex High-Dimensional Disease Epidemic Systems

Authors: Yuyang Chen, Kaiming Bi, Chih-Hang J. Wu, David Ben-Arieh, Ashesh Sinha

Abstract: This paper presents an Improved Bayesian Optimization (IBO) algorithm to find the optimal control solution of complex high-dimensional epidemic models. Evaluating the total objective function value for disease control models with hundreds of thousands of control time periods incurs a high computational cost. In this paper, we improve the conventional Bayesian Optimization (BO) approach in two respects. Existing BO methods perform the minimization step only once during each acquisition function update. To find a better solution for each acquisition function update, we perform additional local minimization steps to tune the algorithm. When the model is high-dimensional and the objective function is complicated, a limited number of acquisition function updates may not find the global optimal solution, so the IBO algorithm adds a series of Adam-based steps at the final stage to increase the solution's accuracy. Comparative simulation experiments using different kernel functions and acquisition functions show that the IBO algorithm is effective and suitable for handling the large-scale, complex epidemic models under study. The IBO algorithm is then compared with four other global optimization algorithms on three well-known synthetic test functions. Its effectiveness and robustness are also demonstrated through simulation experiments comparing it with the Particle Swarm Optimization and Random Search algorithms.
With its reliable convergence behavior and straightforward implementation, the IBO algorithm has great potential for solving other complex, high-dimensional optimal control problems.

Submitted 30 July, 2021; originally announced August 2021.

Comments: 17 pages, 14 figures
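Both Bayesian optimization abstracts above end with a local search made of Adam-based steps. The sketch below shows what such a final polishing stage could look like, assuming the objective's gradient is available. The abstracts give no step counts, learning rates, or objectives, so the quadratic objective, the `adam_refine` helper, and its hyperparameters are hypothetical; the update itself is the standard Adam rule with bias-corrected moment estimates.

```python
import numpy as np


def adam_refine(grad, x0, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Polish a candidate solution with plain Adam updates (hypothetical helper)."""
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)  # running mean of gradients (first moment)
    v = np.zeros_like(x)  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x


# Hypothetical objective: a shifted quadratic whose minimizer is `target`.
target = np.array([1.0, -2.0, 0.5])


def f(x):
    return float(np.sum((x - target) ** 2))


def grad_f(x):
    return 2.0 * (x - target)


# Pretend this point is the best candidate returned by the BO loop.
x_bo = np.array([0.8, -1.7, 0.9])
x_final = adam_refine(grad_f, x_bo)
print(f(x_bo), f(x_final))
```

Running Adam only at the end keeps the expensive surrogate-driven search global while the cheap gradient steps handle local accuracy; in a real epidemic-control model the gradient would have to come from automatic differentiation or finite differences.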

arXiv:2107.08142 [pdf, other]

Autonomy 2.0: Why is self-driving always 5 years away?

Authors: Ashesh Jain, Luca Del Pero, Hugo Grimmett, Peter Ondruska

Abstract: Despite the numerous successes of machine learning over the past decade (image recognition, decision-making, NLP, image synthesis), self-driving technology has not yet followed the same trend. In this paper, we study the history, composition, and development bottlenecks of the modern self-driving stack. We argue that the slow progress is caused by approaches that require too much hand-engineering, an over-reliance on road testing, and high fleet deployment costs. We observe that the classical stack has several bottlenecks that preclude the scale needed to capture the long tail of rare events. To resolve these problems, we outline the principles of Autonomy 2.0, an ML-first approach to self-driving, as a viable alternative to the currently adopted state of the art. This approach is based on (i) a fully differentiable AV stack trainable from human demonstrations, (ii) closed-loop data-driven reactive simulation, and (iii) large-scale, low-cost data collection as critical solutions to scalability issues. We outline the general architecture, survey promising works in this direction, and propose key challenges to be addressed by the community in the future.

Submitted 9 August, 2021; v1 submitted 16 July, 2021; originally announced July 2021.

arXiv:2103.09360 [pdf, other]

Towards physically consistent data-driven weather forecasting: Integrating data assimilation with equivariance-preserving deep spatial transformers

Authors: Ashesh Chattopadhyay, Mustafa Mustafa, Pedram Hassanzadeh, Eviatar Bach, Karthik Kashinath

Abstract: There is growing interest in data-driven weather prediction (DDWP), for example using convolutional neural networks such as U-NETs that are trained on data from models or reanalysis. Here, we propose 3 components to integrate with commonly used DDWP models in order to improve their physical consistency and forecast accuracy. These components are 1) a deep spatial transformer added to the latent space of the U-NETs to preserve a property called equivariance, which is related to correctly capturing rotations and scalings of features in spatio-temporal data, 2) a data-assimilation (DA) algorithm to ingest noisy observations and improve the initial conditions for subsequent forecasts, and 3) a multi-time-step algorithm, which combines forecasts from DDWP models with different time steps through DA, improving the accuracy of forecasts at short intervals. To show the benefit and feasibility of each component, we use geopotential height at 500 hPa (Z500) from ERA5 reanalysis and examine the short-term forecast accuracy of specific setups of the DDWP framework. Results show that the equivariance-preserving networks (U-STNs) clearly outperform the U-NETs, for example improving the forecast skill by 45%. Using a sigma-point ensemble Kalman filter (SPEnKF) algorithm for DA and U-STN as the forward model, we show that stable, accurate DA cycles are achieved even with high observation noise. The DDWP+DA framework substantially benefits from large (O(1000)) ensembles that are inexpensively generated with the data-driven forward model in each DA cycle.
The multi-time-step DDWP+DA framework also shows promise; e.g., it reduces the average error by factors of 2-3.

Submitted 16 March, 2021; originally announced March 2021.

Comments: Under review in Geoscientific Model Development
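The DDWP+DA entry above relies on a large ensemble produced cheaply by the data-driven forward model. As a rough illustration of how such an ensemble enters a DA cycle, here is a minimal sketch of a *stochastic (perturbed-observation) ensemble Kalman filter* analysis step. This is a swapped-in, simpler variant, not the sigma-point EnKF (SPEnKF) used in the paper; the linear observation operator `H`, the diagonal observation error, and the toy 4-variable state are all assumptions for illustration.

```python
import numpy as np


def enkf_analysis(ensemble, y_obs, H, obs_var, rng):
    """Stochastic (perturbed-observation) EnKF analysis step.

    ensemble: (n_members, n_state) forecast states.
    y_obs:    (n_obs,) noisy observation vector.
    H:        (n_obs, n_state) linear observation operator (assumed).
    obs_var:  scalar observation-error variance, so R = obs_var * I (assumed).
    """
    n_members = ensemble.shape[0]
    anomalies = ensemble - ensemble.mean(axis=0)
    P = anomalies.T @ anomalies / (n_members - 1)  # sample forecast covariance
    R = obs_var * np.eye(len(y_obs))
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain, (n_state, n_obs)
    # Each member assimilates its own perturbed copy of the observations.
    y_pert = y_obs + rng.normal(0.0, np.sqrt(obs_var), size=(n_members, len(y_obs)))
    innovations = y_pert - ensemble @ H.T           # (n_members, n_obs)
    return ensemble + innovations @ K.T


rng = np.random.default_rng(0)
truth = np.array([1.0, 2.0, 3.0, 4.0])
H = np.eye(4)[[0, 2]]  # observe state components 0 and 2 only
forecast = truth + rng.normal(0.0, 1.0, size=(1000, 4))  # O(1000) members
y = H @ truth + rng.normal(0.0, 0.1, size=2)
analysis = enkf_analysis(forecast, y, H, obs_var=0.01, rng=rng)
print(forecast[:, 0].std(), analysis[:, 0].std())
```

With accurate observations (small `obs_var`) the spread of the observed components collapses toward the observation error, which is the behavior a DA cycle exploits to correct the initial conditions of the next forecast.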

arXiv:2103.02108 [pdf]

Linear and Nonlinear Viscoelasticity of Concentrated Thermoresponsive Microgel Suspensions

Authors: Gaurav Chaudhary, Ashesh Ghosh, Jin Gu Kang, Paul V. Braun, Randy H. Ewoldt, Kenneth S. Schweizer

Abstract: This is an integrated experimental and theoretical study of the dynamics and rheology of self-crosslinked, slightly charged, temperature-responsive soft poly(N-isopropylacrylamide) (pNIPAM) microgels over a wide range of concentration and temperature spanning the sharp change in particle size and intermolecular interactions across the lower critical solution temperature (LCST). Dramatic, non-monotonic changes in viscoelasticity are observed with temperature, with distinctive concentration dependences in the dense fluid, glassy, and soft-jammed states. Motivated by our experimental observations, we formulate a minimalistic model for the size dependence of a single microgel particle and the change of interparticle interaction from purely repulsive to attractive upon heating. Using microscopic equilibrium and time-dependent statistical mechanical theories, theoretical predictions are quantitatively compared with experimental measurements of the shear modulus. Good agreement is found for the nonmonotonic temperature behavior, which arises from the competition between reduced microgel packing fraction and increasing interparticle attractions. Testable predictions are made for nonlinear rheological properties such as the yield stress and strain. To the best of our knowledge, this is the first attempt to quantitatively understand in a unified manner the viscoelasticity of dense, temperature-responsive microgel suspensions spanning a wide range of temperatures and concentrations.

Submitted 2 March, 2021; originally announced March 2021.

Comments: additional supplementary information provided

arXiv:2102.11400 [pdf, other]

doi 10.1016/j.jcp.2022.111090

Stable a posteriori LES of 2D turbulence using convolutional neural networks: Backscattering analysis and generalization to higher Re via transfer learning

Authors: Yifei Guan, Ashesh Chattopadhyay, Adam Subel, Pedram Hassanzadeh

Abstract: There is growing interest in developing data-driven subgrid-scale (SGS) models for large-eddy simulation (LES) using machine learning (ML). In a priori (offline) tests, some recent studies have found that ML-based data-driven SGS models trained on high-fidelity data (e.g., from direct numerical simulation, DNS) outperform baseline physics-based models and accurately capture the inter-scale transfers, both forward (diffusion) and backscatter. While promising, instabilities in a posteriori (online) tests and the inability to generalize to a different flow (e.g., with a higher Reynolds number, Re) remain major obstacles to broadening the applications of such data-driven SGS models. For example, many of these same studies encountered instabilities that required often ad hoc remedies to stabilize the LES at the expense of accuracy. Here, using 2D decaying turbulence as the testbed, we show that deep fully convolutional neural networks (CNNs) can accurately predict the SGS forcing terms and the inter-scale transfers in a priori tests and, if trained with enough samples, lead to stable and accurate a posteriori LES-CNN. Further analysis attributes these instabilities to the disproportionately lower accuracy of the CNNs in capturing backscattering when the training set is small. We also show that transfer learning, which involves re-training the CNN with a small amount of data (e.g., 1%) from the new flow, enables accurate and stable a posteriori LES-CNN for flows with 16x higher Re (as well as higher grid resolution if needed).
These results show the promise of CNNs with transfer learning to provide stable, accurate, and generalizable LES for practical use.

Submitted 22 February, 2021; originally announced February 2021.

Comments: 30 pages, 12 figures

arXiv:2102.08175 [pdf, other]

doi 10.1175/AIES-D-21-0005.1

Accurate and Clear Precipitation Nowcasting with Consecutive Attention and Rain-map Discrimination

Authors: Ashesh, Buo-Fu Chen, Treng-Shi Huang, Boyo Chen, Chia-Tung Chang, Hsuan-Tien Lin

Abstract: Precipitation nowcasting is an important task for weather forecasting. Many recent works aim to predict high-rainfall events more accurately with the help of deep learning techniques, but such events are relatively rare. The rarity is often addressed by formulations that re-weight the rare events. However, such formulations have the side effect of producing "blurry" predictions in low-rainfall regions, which fails to convince meteorologists of their practical usability. We address this trust issue by introducing a discriminator that encourages the prediction model to generate realistic rain-maps without sacrificing predictive accuracy. Furthermore, we extend the nowcasting time frame from one hour to three hours to better meet the needs of meteorologists. The extension is based on consecutive attention across different hours. We propose a new deep learning model for precipitation nowcasting that includes both the discrimination and attention techniques. The model is examined on a newly built benchmark dataset that contains both radar data and actual rain data. The benchmark, which will be publicly released, not only establishes the superiority of the proposed model but is also expected to encourage future research on precipitation nowcasting.

Submitted 16 February, 2021; originally announced February 2021.

arXiv:2101.00352 [pdf, other]

Characterizing Fairness Over the Set of Good Models Under Selective Labels

Authors: Amanda Coston, Ashesh Rambachan, Alexandra Chouldechova

Abstract: Algorithmic risk assessments are used to inform decisions in a wide variety of high-stakes settings. Often multiple predictive models deliver similar overall performance but differ markedly in their predictions for individual cases, an empirical phenomenon known as the "Rashomon Effect." These models may have different properties over various groups, and therefore have different predictive fairness properties. We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance, or "the set of good models." Our framework addresses the empirically relevant challenge of selectively labelled data in the setting where the selection decision and outcome are unconfounded given the observed data features. Our framework can be used to 1) replace an existing model with one that has better fairness properties; or 2) audit for predictive bias. We illustrate these use cases on a real-world credit-scoring task and a recidivism prediction task.

Submitted 30 April, 2021; v1 submitted 1 January, 2021; originally announced January 2021.

Comments: Added comparison methods to the empirical lending analysis
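The "set of good models" idea in the fairness entry above can be illustrated on a finite list of already-fitted candidates: keep every model whose loss is within a tolerance of the best, then report the range of a fairness disparity across that set. This is a simplified sketch, not the paper's framework, which characterizes the whole set of good models and handles selectively labelled outcomes; the helper names, the additive tolerance `epsilon`, and the numbers below are hypothetical.

```python
def good_model_set(losses, epsilon):
    """Indices of models whose loss is within `epsilon` of the best loss."""
    best = min(losses)
    return [i for i, loss in enumerate(losses) if loss <= best + epsilon]


def fairness_range(losses, disparities, epsilon):
    """Smallest and largest fairness disparity among the good models."""
    good = good_model_set(losses, epsilon)
    values = [disparities[i] for i in good]
    return min(values), max(values)


# Hypothetical candidates: near-identical losses, very different disparities.
losses = [0.300, 0.305, 0.310, 0.400]
disparities = [0.12, 0.03, 0.08, 0.01]
print(fairness_range(losses, disparities, epsilon=0.012))  # (0.03, 0.12)
```

The wide range among models with almost the same loss is the Rashomon effect in miniature: an auditor can ask whether a lower-disparity model exists at essentially no cost in accuracy. The last candidate has the lowest disparity but is excluded because its loss is not competitive.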

arXiv:2012.06664 [pdf, other]

doi 10.1063/5.0040286

Data-driven subgrid-scale modeling of forced Burgers turbulence using deep learning with generalization to higher Reynolds numbers via transfer learning

Authors: Adam Subel, Ashesh Chattopadhyay, Yifei Guan, Pedram Hassanzadeh

Abstract: Developing data-driven subgrid-scale (SGS) models for large eddy simulations (LES) has received substantial attention recently. Despite some success, particularly in a priori (offline) tests, challenges have been identified that include numerical instabilities in a posteriori (online) tests and generalization (i.e., extrapolation) of trained data-driven SGS models, for example to higher Reynolds numbers. Here, using stochastically forced Burgers turbulence as the test-bed, we show that deep neural networks trained using properly pre-conditioned (augmented) data yield stable and accurate a posteriori LES models. Furthermore, we show that transfer learning enables accurate and stable generalization to a flow with a 10x higher Reynolds number.

Submitted 11 December, 2020; originally announced December 2020.

Journal ref: Physics of Fluids, 2021

arXiv:2010.05274 [pdf]

Full Automation for Rapid Modulator Characterization and Accurate Analysis Using SciPy

Authors: T. L. Yap, A. Sasidhara, N. X. Ang, X. Guo, W. Wang, K. S. Ang, S. L. Tan

Abstract: Modulator testing involves complex biasing conditions, hardware connections, and data analysis. In addition, any optical signal distortion due to the grating coupler effect could make it more difficult to set the correct bias condition for an accurate measurement of the modulator performance. In this paper, we propose using SciPy, an open-source scientific computing library, to automate bias setting and data analysis in silicon modulator testing.

Submitted 11 October, 2020; originally announced October 2020.

Comments: 7 pages with 4 figures

arXiv:2009.06924 [pdf, other]

360-Degree Gaze Estimation in the Wild Using Multiple Zoom Scales

Authors: Ashesh, Chu-Song Chen, Hsuan-Tien Lin

Abstract: Gaze estimation involves predicting where a person is looking within an image or video. Technically, gaze information can be inferred at two different magnification levels: face orientation and eye orientation. Such inference is not always feasible for gaze estimation in the wild, given the lack of clear eye patches in conditions like extreme left/right gazes or occlusions. In this work, we design a model that mimics humans' ability to estimate gaze by aggregating focused looks, each at a different magnification level of the face area. The model avoids the need to extract clear eye patches and at the same time addresses another important issue of face-scale variation for gaze estimation in the wild. We further extend the model to handle the challenging task of 360-degree gaze estimation by encoding backward gazes in a polar representation along with a robust averaging scheme. Experimental results on the ETH-XGaze dataset, which does not contain scale-varying faces, demonstrate the model's effectiveness in assimilating information from multiple scales. On other benchmark datasets with many scale-varying faces (Gaze360 and RT-GENE), the proposed model achieves state-of-the-art performance for gaze estimation using either images or videos. Our code and pretrained models can be accessed at https://github.com/ashesh-0/MultiZoomGaze.

Submitted 26 October, 2021; v1 submitted 15 September, 2020; originally announced September 2020.

Comments: accepted at BMVC 2021

arXiv:2008.00602 [pdf, other]

Design-Based Uncertainty for Quasi-Experiments

Authors: Ashesh Rambachan, Jonathan Roth

Abstract: This paper develops a finite-population, design-based theory of uncertainty for studying quasi-experimental settings in the social sciences. In our framework, treatment is determined by stochastic idiosyncratic factors, but individuals may differ in their probability of receiving treatment in ways unknown to the researcher, thus allowing for rich selection into treatment. We derive formulas for the bias of common estimators (including difference-in-means and difference-in-differences), and provide conditions under which they are unbiased for an interpretable causal estimand (e.g. analogs to the ATE or ATT). We further show that when the finite population is large, conventional standard errors are valid but typically conservative for the variance of the estimator over the randomization distribution. An interesting feature of our framework is that conventional standard errors tend to become more conservative when treatment probabilities vary more across units, i.e. when there is more selection into treatment. This conservativeness can (at least partially) mitigate undercoverage of conventional confidence intervals when the estimator is biased because of selection. Our results also have implications for the appropriate level to cluster standard errors, and for the analysis of linear covariate adjustment and instrumental variables in quasi-experimental settings.

Submitted 13 February, 2024; v1 submitted 2 August, 2020; originally announced August 2020.
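For reference, the difference-in-means estimator and the "conventional standard error" discussed in the design-based entry above can be computed as follows. This sketch only implements the textbook formula (unequal-variance, Neyman-style SE); it does not implement the paper's design-based bias or variance results, and the `diff_in_means` helper and toy data are hypothetical.

```python
import math
from statistics import mean, variance


def diff_in_means(y, d):
    """Difference-in-means estimate and its conventional standard error.

    y: outcomes; d: 0/1 treatment indicators (hypothetical inputs).
    """
    treated = [yi for yi, di in zip(y, d) if di == 1]
    control = [yi for yi, di in zip(y, d) if di == 0]
    estimate = mean(treated) - mean(control)
    # statistics.variance uses the n - 1 (sample variance) denominator.
    se = math.sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return estimate, se


outcomes = [3, 5, 4, 6, 1, 2, 2, 3]
treatment = [1, 1, 1, 1, 0, 0, 0, 0]
estimate, se = diff_in_means(outcomes, treatment)
print(f"{estimate:.3f} (SE {se:.3f})")  # 2.500 (SE 0.764)
```

Per the abstract, in a large finite population this conventional SE is valid but typically conservative for the randomization variance, and more so when treatment probabilities vary across units.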

arXiv:2006.14480 [pdf, other]

One Thousand and One Hours: Self-driving Motion Prediction Dataset

Authors: John Houston, Guido Zuidhof, Luca Bergamini, Yawei Ye, Long Chen, Ashesh Jain, Sammy Omari, Vladimir Iglovikov, Peter Ondruska

Abstract: Motivated by the impact of large-scale datasets on ML systems, we present the largest self-driving dataset for motion prediction to date, containing over 1,000 hours of data. This was collected by a fleet of 20 autonomous vehicles along a fixed route in Palo Alto, California, over a four-month period. It consists of 170,000 scenes, where each scene is 25 seconds long and captures the perception output of the self-driving system, which encodes the precise positions and motions of nearby vehicles, cyclists, and pedestrians over time. On top of this, the dataset contains a high-definition semantic map with 15,242 labelled elements and a high-definition aerial view over the area. We show that using a dataset of this size dramatically improves performance for key self-driving problems. Combined with the provided software kit, this collection forms the largest and most detailed dataset to date for the development of self-driving machine learning tasks, such as motion forecasting, motion planning and simulation. The full dataset is available at http://level5.lyft.com/.

Submitted 16 November, 2020; v1 submitted 25 June, 2020; originally announced June 2020.

Comments: Presented at CoRL 2020

arXiv:2006.14005 [pdf, other]

doi 10.1063/5.0026258

The Role of Collective Elasticity on Activated Structural Relaxation, Yielding and Steady State Flow in Hard Sphere Fluids and Colloidal Suspensions Under Strong Deformation

Authors: Ashesh Ghosh, Kenneth S. Schweizer

Abstract: We theoretically study the effect of external deformation on activated structural relaxation and elementary aspects of the nonlinear mechanical response of glassy hard sphere fluids in the context of the nonequilibrium version of the Elastically Collective Nonlinear Langevin Equation (ECNLE) theory. ECNLE theory describes activated relaxation as a coupled local-nonlocal event involving local caging and longer range collective elasticity, with the latter becoming more important with increasing packing fraction. The central new question is how this physical picture, and the relative importance of caging versus elasticity physics, depends on external stress, strain and shear rate. Theoretical predictions are presented for deformation induced enhancement of mobility, onset of relaxation speed up at remarkably low values of stress, strain or dimensionless shear rate, thinning of the structural relaxation time and viscosity with apparent power law exponents, a non-vanishing activation barrier in the shear thinning regime, a Herschel-Bulkley form of rate dependence of the steady state shear stress, exponential growth of dynamic yield stresses with packing fraction, and reduced dynamic fragility and heterogeneity under deformation. The results are contrasted with experiments and simulations, and qualitative or better agreement is found.
An overarching conclusion is that deformation strongly reduces the importance of longer range collective elastic effects for most, but not all, physical questions, with stress-dependent dynamic heterogeneity phenomena being qualitatively sensitive to collective elasticity. Overall, nonlinear rheology is a more local cage scale problem than quiescent relaxation, albeit with deformation-modified activated processes still important.

Submitted 24 June, 2020; originally announced June 2020.

Comments: 13 pages, 14 figures
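The Herschel-Bulkley rate dependence mentioned in the ECNLE entry above has the standard textbook form below (written here for reference; the abstract does not report the fitted parameter values):

```latex
\sigma_{ss}(\dot{\gamma}) = \sigma_y + K\,\dot{\gamma}^{\,n},
\qquad
\eta(\dot{\gamma}) = \frac{\sigma_{ss}}{\dot{\gamma}} = \frac{\sigma_y}{\dot{\gamma}} + K\,\dot{\gamma}^{\,n-1}
```

Here $\sigma_{ss}$ is the steady-state shear stress, $\sigma_y$ the (dynamic) yield stress, $K$ a consistency coefficient, and $n$ the flow index. With $n < 1$ both terms of the apparent viscosity $\eta$ decrease with shear rate, which is the shear-thinning behavior the abstract describes.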

arXiv:2003.09915 [pdf, other]

Panel Experiments and Dynamic Causal Effects: A Finite Population Perspective

Authors: Iavor Bojinov, Ashesh Rambachan, Neil Shephard

Abstract: In panel experiments, we randomly assign units to different interventions, measure their outcomes, and repeat the procedure over several periods. Using the potential outcomes framework, we define finite population dynamic causal effects that capture the relative effectiveness of alternative treatment paths. For a rich class of dynamic causal effects, we provide a nonparametric estimator that is unbiased over the randomization distribution and derive its finite population limiting distribution as either the sample size or the duration of the experiment increases. We develop two methods for inference: a conservative test for weak null hypotheses and an exact randomization test for sharp null hypotheses. We further analyze the finite population probability limit of linear fixed effects estimators. These commonly-used estimators do not recover a causally interpretable estimand if there are dynamic causal effects and serial correlation in the assignments, highlighting the value of our proposed estimator.

Submitted 27 May, 2021; v1 submitted 22 March, 2020; originally announced March 2020.

Comments: Forthcoming in Quantitative Economics

arXiv:2003.06720 [pdf, other]

doi 10.1103/PhysRevE.101.060601

Microscopic Theory of Onset of De-Caging and Bond Breaking Activated Dynamics in Ultra-Dense Fluids with Strong Short Range Attractions

Authors: Ashesh Ghosh, Kenneth S. Schweizer

Abstract: We theoretically study thermally activated elementary dynamical processes that precede full structural relaxation in ultra-dense particle liquids interacting via strong short range attractive forces. Our approach is based on a microscopic theory formulated at the particle trajectory level built on the dynamic free energy concept and an explicit treatment of how attractions control physical bonding. Mean time scales for bond breaking, the early stage of cage escape, and a fixed non-Fickian displacement are analyzed in the repulsive glass, bonded repulsive (attractive) glass, fluid, and dense gel regimes. The theory predicts a strong length-scale-dependent growth of these time scales with attractive force strength at fixed packing fraction, a much weaker slowing down with density at fixed attraction strength, and a strong decoupling of the shorter bond breaking time from the other two time scales, which are controlled mainly by perturbed steric caging. All results are in good accord with simulations, and additional testable predictions are made. The classic statistical mechanical projection approximation of replacing all bare attractive and repulsive forces with a single effective force determined by pair structure incurs major errors for describing processes associated with thermally activated escape from transiently localized states.

Submitted 14 March, 2020; originally announced March 2020.

Comments: 6 pages, 4 figures

Journal ref: Phys. Rev. E 101, 060601 (2020)

arXiv:2002.11167 [pdf, other]

doi 10.1029/2020MS002084

Data-driven super-parameterization using deep learning: Experimentation with multi-scale Lorenz 96 systems and transfer-learning

Authors: Ashesh Chattopadhyay, Adam Subel, Pedram Hassanzadeh

Abstract: To make weather/climate modeling computationally affordable, small-scale processes are usually represented in terms of the large-scale, explicitly resolved processes using physics-based or semi-empirical parameterization schemes. Another approach, computationally more demanding but often more accurate, is super-parameterization (SP), which involves integrating the equations of small-scale processes on high-resolution grids embedded within the low-resolution grids of large-scale processes. Recently, studies have used machine learning (ML) to develop data-driven parameterization (DD-P) schemes. Here, we propose a new approach, data-driven SP (DD-SP), in which the equations of the small-scale processes are integrated in a data-driven manner using ML methods such as recurrent neural networks. Employing multi-scale Lorenz 96 systems as a testbed, we compare the cost and accuracy (in terms of both short-term prediction and long-term statistics) of parameterized low-resolution (LR), SP, DD-P, and DD-SP models. We show that, at the same computational cost, DD-SP substantially outperforms LR and is better than DD-P, particularly when scale separation is lacking. DD-SP is much cheaper than SP, yet it is equally accurate in reproducing long-term statistics and often comparable in short-term forecasting. We also investigate generalization, finding that when models trained on data from one system are applied to a system with different forcing (e.g., more chaotic), the models often do not generalize, particularly when short-term prediction accuracy is examined. However, we show that transfer learning, which involves re-training the data-driven model with a small amount of data from the new system, significantly improves generalization. Potential applications of DD-SP and transfer learning in climate/weather modeling, and the expected challenges, are discussed.
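For readers unfamiliar with the testbed, the sketch below computes the tendencies of the classic two-tier Lorenz 96 model in its commonly cited form and advances it with one fourth-order Runge-Kutta step. It is an illustration under assumptions, not the authors' code: the parameter values (F = 20, h = 1, b = c = 10) are illustrative, and the three-tier X/Y/Z variant described in arXiv:1906.08829 would add a further coupled tier that is omitted here. Roughly speaking, an LR model integrates only X and replaces the coupling term with a parameterization in terms of X, whereas SP-style approaches integrate the small-scale tier explicitly.

```python
import numpy as np

def l96_two_tier_tendency(X, Y, F=20.0, h=1.0, b=10.0, c=10.0):
    """Tendencies of the two-tier Lorenz 96 model (illustrative parameter values).

    dX_k/dt    = X_{k-1}(X_{k+1} - X_{k-2}) - X_k + F - (h c / b) * sum_j Y_{j,k}
    dY_{j,k}/dt = -c b Y_{j+1,k}(Y_{j+2,k} - Y_{j-1,k}) - c Y_{j,k} + (h c / b) X_k

    X: shape (K,), slow/large-scale variables, cyclic in k.
    Y: shape (K*J,), fast/small-scale variables (J per X_k), cyclic over all K*J.
    """
    K = X.size
    J = Y.size // K
    # Feedback of the J fast variables attached to each X_k (the term LR models parameterize)
    coupling = (h * c / b) * Y.reshape(K, J).sum(axis=1)
    # np.roll(a, 1)[k] = a[k-1], np.roll(a, -1)[k] = a[k+1], np.roll(a, 2)[k] = a[k-2]
    dX = np.roll(X, 1) * (np.roll(X, -1) - np.roll(X, 2)) - X + F - coupling
    dY = (-c * b * np.roll(Y, -1) * (np.roll(Y, -2) - np.roll(Y, 1))
          - c * Y + (h * c / b) * np.repeat(X, J))
    return dX, dY

def rk4_step(X, Y, dt, **params):
    """One classical fourth-order Runge-Kutta step for the coupled (X, Y) system."""
    k1x, k1y = l96_two_tier_tendency(X, Y, **params)
    k2x, k2y = l96_two_tier_tendency(X + 0.5 * dt * k1x, Y + 0.5 * dt * k1y, **params)
    k3x, k3y = l96_two_tier_tendency(X + 0.5 * dt * k2x, Y + 0.5 * dt * k2y, **params)
    k4x, k4y = l96_two_tier_tendency(X + dt * k3x, Y + dt * k3y, **params)
    X_new = X + dt / 6.0 * (k1x + 2 * k2x + 2 * k3x + k4x)
    Y_new = Y + dt / 6.0 * (k1y + 2 * k2y + 2 * k3y + k4y)
    return X_new, Y_new

# Short spin-up from a slightly perturbed rest state (K = 8, J = 32 are assumed sizes)
K, J = 8, 32
X = 20.0 + 0.01 * np.arange(K)
Y = np.zeros(K * J)
for _ in range(1000):
    X, Y = rk4_step(X, Y, dt=0.001)
```

A small time step is used because the fast tier evolves roughly c times faster than X.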

Submitted 25 February, 2020; originally announced February 2020.

Journal ref: Journal of Advances in Modeling Earth Systems 2020

arXiv:1910.01284 [pdf]

doi 10.1063/1.5129941

Microscopic Theory of the Influence of Strong Attractive Forces on the Activated Dynamics of Dense Glass and Gel Forming Fluids

Authors: Ashesh Ghosh, Kenneth S. Schweizer

Abstract: We theoretically study the non-monotonic (re-entrant) activated dynamics associated with a repulsive glass to fluid to attractive glass transition in high-density particle suspensions interacting via strong short-range attractive forces. The classic theoretical projection approximation, which replaces all microscopic forces by a single effective force determined solely by equilibrium pair correlations, is revisited based on the projectionless dynamic theory (PDT), which avoids force projection. A hybrid-PDT is formulated that explicitly quantifies how attractive forces induce dynamical constraints, while singular hard-core interactions are treated based on the projection approach. Both the effects of interference between repulsive and attractive forces and the structural changes due to attraction-induced bond formation that competes with caging are included. Combined with the microscopic Elastically Collective Nonlinear Langevin Equation (ECNLE) theory of activated relaxation, the resulting approach appears to properly capture both the re-entrant dynamic crossover behavior and the strong non-monotonic variation of the activated structural relaxation time with attraction strength and range at very high volume fractions. Qualitative differences from ECNLE theory-based results that adopt the full projection approximation are identified, and testable predictions are made. The new formulation appears qualitatively consistent with multiple experimental and simulation studies, and provides a new perspective on the overall problem that is rooted in activated motion and interference between repulsive and attractive forces. This is conceptually distinct from empirical shifting or other ad hoc modifications of ideal mode coupling theory, which do not take activated dynamics into account. Implications for thermal glass-forming liquids are briefly discussed.

Submitted 2 October, 2019; originally announced October 2019.

Comments: 17 pages, 10 figures (additional Supplementary Material)

arXiv:1909.08518 [pdf, other]

doi 10.4230/LIPIcs.FORC.2020.6

Bias In, Bias Out? Evaluating the Folk Wisdom

Authors: Ashesh Rambachan, Jonathan Roth

Abstract: We evaluate the folk wisdom that algorithmic decision rules trained on data produced by biased human decision-makers necessarily reflect this bias. We consider a setting where training labels are only generated if a biased decision-maker takes a particular action, and so "biased" training data arise due to discriminatory selection into the training data. In our baseline model, the more biased the decision-maker is against a group, the more the algorithmic decision rule favors that group. We refer to this phenomenon as "bias reversal." We then clarify the conditions that give rise to bias reversal. Whether a prediction algorithm reverses or inherits bias depends critically on how the decision-maker affects the training data as well as the label used in training. We illustrate our main theoretical results in a simulation study applied to the New York City Stop, Question and Frisk dataset.
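The selection mechanism behind "bias reversal" can be illustrated with a toy simulation. This is a minimal sketch under our own assumptions (true risk uniform on [0, 1] and identically distributed across groups, a threshold decision rule, and a predictor that sees only group membership); it is not the authors' model and does not use the Stop, Question and Frisk data. A decision-maker biased against a group acts on weaker evidence, modeled here as a lower threshold, so the labeled training data for that group contain more low-risk cases.

```python
import numpy as np

def selected_hit_rates(thresholds, n=200_000, seed=0):
    """Toy selective-labels model (assumed: uniform risk, threshold decision rule).

    thresholds: dict mapping group -> risk threshold above which the decision-maker acts.
    A lower threshold means acting on weaker evidence, i.e. bias against that group.
    Outcomes are observed only when the decision-maker acts, so the returned rate is
    what a group-only predictor trained on those labels would learn.
    """
    rng = np.random.default_rng(seed)
    rates = {}
    for group, t in thresholds.items():
        risk = rng.uniform(0.0, 1.0, size=n)          # same true risk distribution in every group
        outcome = rng.uniform(0.0, 1.0, size=n) < risk
        acted = risk > t                              # selection into the training data
        rates[group] = outcome[acted].mean()          # analytically E[risk | risk > t] = (1 + t) / 2
    return rates

# Group B faces a lower threshold (more bias), so its learned risk is lower
rates = selected_hit_rates({"A": 0.5, "B": 0.3})
```

Under these assumptions the learned rates come out close to the analytic values 0.75 for A and 0.65 for B. Lowering B's threshold further lowers its learned risk, so a rule built on that risk treats B more favorably, which is the direction of the reversal the abstract describes.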

Submitted 19 December, 2020; v1 submitted 18 September, 2019; originally announced September 2019.

Journal ref: 1st Symposium on Foundations of Responsible Computing (FORC 2020)

arXiv:1909.05943 [pdf]

Spline-based Interface Modeling and Optimization (SIMO) for Surface Tension and Contact Angle Measurements

Authors: Karan Jakhar, Ashesh Chattopadhyay, Atul Thakur, Rishi Raj

Abstract: Surface tension and contact angle measurements are fundamental characterization techniques relevant to thermal and fluidic applications. Drop shape analysis techniques for measuring interfacial tension are powerful, versatile, and flexible. Here we develop a Spline-based Interface Modeling and Optimization (SIMO) tool for estimating the surface tension and the contact angle from the profiles of sessile and pendant drops of various sizes. The strategy models the profile using a vector-parametrized cubic spline, which is then evolved to the eventual equilibrium shape using a novel iterative algorithm based on thermodynamic free-energy minimization. Our experiments show that, by comparison, typical fitting-based techniques are very sensitive to errors due to image acquisition, digitization, and edge detection, and do not predict the correct surface tension and contact angle values. We mimic these errors in theoretical drop profiles by applying Gaussian noise and smoothing filters. We then demonstrate that our optimization algorithm can drive even such inaccurate digitized profiles to the minimum-energy equilibrium shape, enabling precise estimation of the surface tension and the contact angle. We compare our scheme with software tools available in the public domain to characterize the accuracy and precision of SIMO.

Submitted 12 September, 2019; originally announced September 2019.

Comments: 28 pages, 8 Figures. Supporting Information can be accessed on: https://drive.google.com/open?id=13Tp0ILu0lOf0V7lxSX7ERjRMjFCDH3y4

arXiv:1907.11617 [pdf, other]

doi 10.1029/2019MS001958

Analog forecasting of extreme-causing weather patterns using deep learning

Authors: Ashesh Chattopadhyay, Ebrahim Nabizadeh, Pedram Hassanzadeh

Abstract: Numerical weather prediction (NWP) models require ever-growing computing time and resources, but still have difficulty predicting weather extremes. Here we introduce a data-driven framework that is based on analog forecasting (prediction using past similar patterns) and employs a novel deep learning pattern-recognition technique (capsule neural networks, CapsNets) and an impact-based auto-labeling strategy. CapsNets are trained on mid-tropospheric large-scale circulation patterns (Z500) labeled $0-4$ depending on the existence and geographical region of surface temperature extremes over North America several days ahead. The trained networks predict the occurrence/region of cold or heat waves, using only Z500, with accuracies (recalls) of $69\%-45\%$ $(77\%-48\%)$ or $62\%-41\%$ $(73\%-47\%)$ $1-5$ days ahead. CapsNets outperform simpler techniques such as convolutional neural networks and logistic regression. Using both temperature and Z500, accuracies (recalls) with CapsNets increase to $\sim 80\%$ $(88\%)$, showing the promise of multi-modal data-driven frameworks for accurate/fast extreme weather prediction, which can augment NWP efforts in providing early warnings.
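Classic analog forecasting, the idea this framework builds on, predicts an outcome by finding the most similar past patterns and reusing their outcomes. The sketch below is a plain nearest-neighbor version for illustration only: the Euclidean distance, the choice of k, and the flattened-map input format are assumptions, and the paper replaces this hand-chosen similarity with trained CapsNets. Labels are integers in 0-4, as in the abstract.

```python
import numpy as np

def analog_forecast(library, labels, pattern, k=1, n_classes=5):
    """Predict a class for `pattern` from the labels of its k closest past analogs.

    library: (N, D) array of past patterns (e.g., flattened Z500 maps; assumed format).
    labels:  (N,) integer labels in 0..n_classes-1.
    pattern: (D,) new pattern to classify.
    """
    distances = np.linalg.norm(library - pattern, axis=1)   # Euclidean similarity (an assumption)
    nearest = np.argsort(distances)[:k]                      # indices of the k best analogs
    votes = np.bincount(labels[nearest], minlength=n_classes)
    return int(np.argmax(votes))                              # majority vote; ties go to the lowest label
```

With k = 1 this reduces to copying the label of the single best analog; larger k trades sensitivity to individual noisy maps for smoother decisions.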

Submitted 12 January, 2020; v1 submitted 26 July, 2019; originally announced July 2019.

Comments: Accepted in Journal of Advances in Modeling Earth Systems

arXiv:1906.08829 [pdf, other]

doi 10.5194/npg-27-373-2020

Data-driven prediction of a multi-scale Lorenz 96 chaotic system using deep learning methods: Reservoir computing, ANN, and RNN-LSTM

Authors: Ashesh Chattopadhyay, Pedram Hassanzadeh, Devika Subramanian

Abstract: In this paper, the performance of three deep learning methods for predicting short-term evolution and for reproducing the long-term statistics of a multi-scale spatio-temporal Lorenz 96 system is examined. The methods are: echo state network (a type of reservoir computing, RC-ESN), deep feed-forward artificial neural network (ANN), and recurrent neural network with long short-term memory (RNN-LSTM). This Lorenz 96 system has three tiers of nonlinearly interacting variables representing slow/large-scale ($X$), intermediate ($Y$), and fast/small-scale ($Z$) processes. For training or testing, only $X$ is available; $Y$ and $Z$ are never known or used. We show that RC-ESN substantially outperforms ANN and RNN-LSTM for short-term prediction, e.g., accurately forecasting the chaotic trajectories for hundreds of numerical-solver time steps, equivalent to several Lyapunov timescales. The RNN-LSTM and ANN show some prediction skill as well, with RNN-LSTM outperforming ANN. Furthermore, even after losing the trajectory, data predicted by RC-ESN and RNN-LSTM have probability density functions (PDFs) that closely match the true PDF, even at the tails. The PDF of the data predicted using ANN, however, deviates from the true PDF. Implications, caveats, and applications to data-driven and data-assisted surrogate modeling of complex nonlinear dynamical systems such as weather/climate are discussed.
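The core of an echo state network can be sketched in a few lines: a fixed random recurrent reservoir is driven by the observed variables, only a linear readout is trained (here by ridge regression) to predict the next state, and forecasting then runs the network in closed loop by feeding each prediction back as the next input. This is a minimal sketch, not the authors' configuration: the reservoir size, spectral radius 0.9, input scale, ridge parameter, and washout length are illustrative assumptions, and any readout refinements used in the paper are not reproduced.

```python
import numpy as np

class EchoStateNetwork:
    """Minimal echo state network sketch: fixed random reservoir, ridge-regression readout."""

    def __init__(self, n_inputs, n_reservoir=300, spectral_radius=0.9,
                 input_scale=0.5, beta=1e-5, seed=0):
        rng = np.random.default_rng(seed)
        W = rng.uniform(-1.0, 1.0, size=(n_reservoir, n_reservoir))
        # Rescale so the largest |eigenvalue| equals spectral_radius (fading memory of inputs)
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W = W
        self.W_in = rng.uniform(-input_scale, input_scale, size=(n_reservoir, n_inputs))
        self.beta = beta
        self.W_out = None
        self.r = np.zeros(n_reservoir)

    def _step(self, u):
        # Reservoir update r <- tanh(W r + W_in u); W and W_in stay fixed, only W_out is trained
        self.r = np.tanh(self.W @ self.r + self.W_in @ u)
        return self.r

    def fit(self, U, washout=100):
        """U: (T, n_inputs) training series (e.g., the observed X only). Learns u_t -> u_{t+1}."""
        states = np.array([self._step(u) for u in U[:-1]])
        R, Y = states[washout:], U[washout + 1:]   # drop transient states before fitting
        # Ridge regression: W_out^T = (R^T R + beta I)^{-1} R^T Y
        A = R.T @ R + self.beta * np.eye(R.shape[1])
        self.W_out = np.linalg.solve(A, R.T @ Y).T
        self._step(U[-1])  # absorb the last observation so forecasts start right after the data
        return self

    def predict(self, n_steps):
        """Closed-loop forecast: each prediction is fed back as the next input."""
        preds = []
        u = self.W_out @ self.r
        for _ in range(n_steps):
            preds.append(u)
            u = self.W_out @ self._step(u)
        return np.array(preds)

# Toy usage on a 1-D sine series standing in for the observed variables
t = np.arange(2000)
U = np.sin(0.1 * t)[:, None]
forecast = EchoStateNetwork(n_inputs=1).fit(U).predict(50)
```

Training reduces to one linear solve because the reservoir weights are never updated, which is one reason reservoir computing is cheap to train compared with backpropagation-trained ANN or RNN-LSTM models.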

Submitted 5 December, 2019; v1 submitted 20 June, 2019; originally announced June 2019.

Comments: Some changes to figures, addition of an appendix, etc.

Journal ref: Nonlin. Processes Geophys. 2020

Showing 1–50 of 69 results for author: Ashesh