Search | arXiv e-print repository

Evidential Deep Learning: Enhancing Predictive Uncertainty Estimation for Earth System Science Applications

Authors: John S. Schreck, David John Gagne II, Charlie Becker, William E. Chapman, Kim Elmore, Da Fan, Gabrielle Gantos, Eliot Kim, Dhamma Kimpara, Thomas Martin, Maria J. Molina, Vanessa M. Pryzbylo, Jacob Radford, Belen Saavedra, Justin Willson, Christopher Wirz

Abstract: Robust quantification of predictive uncertainty is critical for understanding factors that drive weather and climate outcomes. Ensembles provide predictive uncertainty estimates and can be decomposed physically, but both physics and machine learning ensembles are computationally expensive. Parametric deep learning can estimate uncertainty with one model by predicting the parameters of a probability distribution, but it does not account for epistemic uncertainty. Evidential deep learning, a technique that extends parametric deep learning to higher-order distributions, can account for both aleatoric and epistemic uncertainty with one model. This study compares the uncertainty derived from evidential neural networks to that obtained from ensembles. Through applications of classification of winter precipitation type and regression of surface layer fluxes, we show evidential deep learning models attaining predictive accuracy rivaling standard methods, while robustly quantifying both sources of uncertainty. We evaluate the uncertainty in terms of how well the predictions are calibrated and how well the uncertainty correlates with prediction error. Analyses of uncertainty in the context of the inputs reveal sensitivities to underlying meteorological processes, facilitating interpretation of the models. The conceptual simplicity, interpretability, and computational efficiency of evidential neural networks make them highly extensible, offering a promising approach for reliable and practical uncertainty quantification in Earth system science modeling.
To encourage broader adoption of evidential deep learning in Earth system science, we have developed a new Python package, MILES-GUESS (https://github.com/ai2es/miles-guess), that enables users to train and evaluate both evidential and ensemble deep learning models.

Submitted 19 February, 2024; v1 submitted 22 September, 2023; originally announced September 2023.
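The abstract does not spell out the evidential parameterization. A common choice, assumed here rather than taken from the paper or from MILES-GUESS, is the Normal-Inverse-Gamma output of Amini et al. (2020) for regression and the Dirichlet output of Sensoy et al. (2018) for classification. Under those assumptions, the aleatoric and epistemic parts separate in closed form, as in this minimal sketch (the function names are hypothetical):

```python
def nig_uncertainty(gamma, nu, alpha, beta):
    """Split Normal-Inverse-Gamma outputs into a mean and two variances.

    Assumed parameterization (Amini et al., 2020): mu ~ N(gamma, sigma^2 / nu),
    sigma^2 ~ Inv-Gamma(alpha, beta), valid for nu > 0, alpha > 1, beta > 0.
    """
    if nu <= 0 or alpha <= 1 or beta <= 0:
        raise ValueError("need nu > 0, alpha > 1, beta > 0")
    aleatoric = beta / (alpha - 1)          # E[sigma^2]: noise in the data
    epistemic = beta / (nu * (alpha - 1))   # Var[mu]: uncertainty in the model
    return gamma, aleatoric, epistemic


def dirichlet_uncertainty(evidence):
    """Class probabilities and total uncertainty mass from non-negative evidence.

    Assumed setup (Sensoy et al., 2018): alpha_k = e_k + 1, S = sum(alpha),
    p_k = alpha_k / S, and uncertainty mass u = K / S for K classes.
    """
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)
    probs = [a / strength for a in alpha]
    return probs, len(alpha) / strength
```

For example, `nig_uncertainty(0.5, 2.0, 3.0, 4.0)` gives an aleatoric variance of 2.0 and an epistemic variance of 1.0, and zero evidence for every class gives uniform probabilities with total uncertainty 1.0.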

arXiv:2309.09386

Axioms for Distanceless Graph Partitioning

Authors: James Willson, Tandy Warnow

Abstract: In 2002, Kleinberg proposed three axioms for distance-based clustering, and proved that it was impossible for a clustering method to satisfy all three. While there has been much subsequent work examining and modifying these axioms for distance-based clustering, little work has been done to explore axioms relevant to the graph partitioning problem when the graph is unweighted and given without a distance matrix. Here, we propose and explore axioms for graph partitioning for this case, including modifications of Kleinberg's axioms and three others: two axioms relevant to the "Resolution Limit" and one addressing well-connectedness. We prove that clustering under the Constant Potts Model satisfies all the axioms, while Modularity clustering and iterative k-core both fail many axioms we pose. These theoretical properties of the clustering methods are relevant both to theoretical investigation and to practitioners considering which methods to use for their domain science studies.

Submitted 17 June, 2024; v1 submitted 17 September, 2023; originally announced September 2023.
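The Constant Potts Model mentioned above scores a partition of an unweighted graph by rewarding edges inside clusters and charging a resolution parameter $\gamma$ for every vertex pair inside a cluster. This sketch assumes the usual form of the CPM objective, $Q = \sum_c \left(e_c - \gamma \binom{n_c}{2}\right)$, which the abstract does not restate; `cpm_quality` is a hypothetical helper, not code from the paper:

```python
def cpm_quality(edges, partition, resolution):
    """Constant Potts Model score of a partition of an unweighted graph.

    Q = sum over clusters c of (e_c - resolution * n_c * (n_c - 1) / 2),
    where e_c counts edges with both endpoints in c and n_c = |c|.
    """
    label = {}
    for idx, cluster in enumerate(partition):
        for v in cluster:
            label[v] = idx
    internal = [0] * len(partition)
    for u, v in edges:
        if label[u] == label[v]:          # edge lies inside one cluster
            internal[label[u]] += 1
    return sum(
        internal[i] - resolution * len(c) * (len(c) - 1) / 2
        for i, c in enumerate(partition)
    )
```

On two triangles joined by a single edge with $\gamma = 0.5$, splitting into the two triangles scores 3.0, while putting all six vertices in one cluster scores -0.5, so CPM prefers the split.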

arXiv:2308.13194

Fast and Accurate Simulations of Partially Delocalised Charge Separation in Organic Semiconductors

Authors: Jacob T. Willson, Daniel Balzer, Ivan Kassal

Abstract: Accurate computational screening of candidate materials promises to accelerate the discovery of higher-efficiency organic photovoltaics (OPVs). However, modelling charge separation in OPVs is challenging because accurate models must include disorder, polaron formation, and charge delocalisation. Delocalised kinetic Monte Carlo (dKMC) includes these three essential ingredients, but it suffers from high computational cost. Recently, we developed jumping kinetic Monte Carlo (jKMC), a computationally cheap and accurate model of delocalised charge transport that models transport over a lattice of identical, spherical polarons. Here, we extend jKMC to describe the separation of a charge-transfer state, showing that this simplified approach can reproduce the considerable improvements in charge-separation efficiencies caused by delocalisation and first seen in dKMC. The low computational cost and simplicity of jKMC allow it to be applied to parameter regimes intractable for dKMC, and ensure that jKMC can be easily incorporated into any existing KMC model.

Submitted 25 August, 2023; originally announced August 2023.
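Both jKMC papers build on kinetic Monte Carlo. The abstracts do not give the event-selection scheme, so the following is only the generic rejection-free KMC step (often called the Gillespie or BKL algorithm), not jKMC itself: an event is chosen with probability proportional to its rate, and time advances by an exponentially distributed waiting time. `kmc_step` is a hypothetical name:

```python
import math
import random


def kmc_step(rates, rng):
    """One rejection-free kinetic Monte Carlo step.

    Picks event i with probability rates[i] / sum(rates) and draws the waiting
    time from an exponential distribution whose rate is sum(rates).
    """
    total = sum(rates)
    if total <= 0:
        raise ValueError("at least one rate must be positive")
    target = rng.random() * total          # uniform in [0, total)
    cumulative = 0.0
    for i, k in enumerate(rates):
        cumulative += k
        if target < cumulative:            # first event whose cumulative rate exceeds target
            break
    dt = -math.log(1.0 - rng.random()) / total   # 1 - U lies in (0, 1], so log is finite
    return i, dt
```

In a charge-transport model the `rates` list would hold the hop rates out of the current configuration; after each step the chosen hop is applied, the clock advances by `dt`, and the rates are recomputed.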

arXiv:2211.16165

doi 10.1021/acs.jpclett.3c00388

Jumping kinetic Monte Carlo: Fast and accurate simulations of partially delocalised charge transport in organic semiconductors

Authors: Jacob T. Willson, William Liu, Daniel Balzer, Ivan Kassal

Abstract: Developing devices using disordered organic semiconductors requires accurate and practical models of charge transport. In these materials, charge transport occurs through partially delocalised states in an intermediate regime between localised hopping and delocalised band conduction. Partial delocalisation can increase mobilities by orders of magnitude over conventional hopping, making it important for materials and device design. Although delocalisation, disorder, and polaron formation can be described using delocalised kinetic Monte Carlo (dKMC), it is a computationally expensive method. Here, we develop jumping kinetic Monte Carlo (jKMC), a model that approaches the accuracy of dKMC with a computational cost comparable to conventional hopping. jKMC achieves its computational performance by modelling conduction using identical spherical polarons, yielding a simple delocalisation correction to the Marcus hopping rate that allows polarons to jump over their nearest neighbours. jKMC can be used in regimes of partial delocalisation inaccessible to dKMC to show that modest delocalisation can increase mobilities by as much as two orders of magnitude.

Submitted 29 November, 2022; originally announced November 2022.

Journal ref: J. Phys. Chem. Lett. 14, 3757 (2023)
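The abstract describes a delocalisation correction to the Marcus hopping rate but does not state that correction, so it is not reproduced here. For reference, this sketch evaluates only the conventional non-adiabatic Marcus rate, $k = \frac{2\pi}{\hbar}|J|^2 (4\pi\lambda k_B T)^{-1/2} \exp\left(-\frac{(\Delta E + \lambda)^2}{4\lambda k_B T}\right)$, with energies in eV; the function and argument names are illustrative:

```python
import math

HBAR_EV_S = 6.582119569e-16   # reduced Planck constant, eV*s
KB_EV_K = 8.617333262e-5      # Boltzmann constant, eV/K


def marcus_rate(coupling, delta_e, reorg, temperature):
    """Conventional (uncorrected) Marcus hopping rate in s^-1.

    coupling: electronic coupling J in eV
    delta_e: energy change E_final - E_initial in eV
    reorg: reorganisation energy lambda in eV
    temperature: temperature in K
    """
    kt = KB_EV_K * temperature
    prefactor = (2 * math.pi / HBAR_EV_S) * coupling**2 / math.sqrt(4 * math.pi * reorg * kt)
    return prefactor * math.exp(-((delta_e + reorg) ** 2) / (4 * reorg * kt))
```

Two sanity checks follow from the formula: forward and backward rates obey detailed balance, $k(\Delta E)/k(-\Delta E) = e^{-\Delta E/k_B T}$, and the rate peaks in the activationless case $\Delta E = -\lambda$.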

arXiv:2107.09191

doi 10.1016/j.laa.2021.07.008

On the Geometry of Numerical Ranges Over Finite Fields

Authors: Kristin A. Camenga, Brandon Collins, Gage Hoefer, Jonny Quezada, Patrick X. Rault, James Willson, Rebekah B. Johnson Yates

Abstract: Numerical ranges over a certain family of finite fields were classified in 2016 by a team including our fifth author. Soon afterward, in 2017 Ballico generalized these results to all finite fields and published some new results about the cardinality of the finite field numerical range. In this paper we study the geometry of these finite fields using the boundary generating curve, first introduced by Kippenhahn in 1951. We restrict our study to square matrices of dimension 2, with at least one eigenvalue in $\mathbb F_{q^2}$.

Submitted 19 July, 2021; originally announced July 2021.

Comments: 17 pages, 2 figures; will appear in "Linear Algebra and its Applications", July 2021

MSC Class: 15A60
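To make the object of study concrete, this brute-force sketch assumes the usual finite-field analogue of the numerical range, $W(A) = \{x^* A x : x \in \mathbb F_{q^2}^2,\ x^* x = 1\}$, where $x^*$ is the conjugate transpose under the Frobenius map $x \mapsto x^q$. It fixes $q = 3$ and models $\mathbb F_9$ as $\mathbb F_3[i]$ with $i^2 = -1$; the representation and helper names are assumptions for illustration, not the paper's method:

```python
from itertools import product

P = 3  # base field F_3; F_9 = F_3[i] with i^2 = -1 (x^2 + 1 is irreducible mod 3)


def add(x, y):
    return ((x[0] + y[0]) % P, (x[1] + y[1]) % P)


def mul(x, y):
    a, b = x
    c, d = y
    return ((a * c - b * d) % P, (a * d + b * c) % P)


def conj(x):
    # Frobenius x -> x**3 sends a + b*i to a - b*i in this representation
    return (x[0], (-x[1]) % P)


FIELD = [(a, b) for a in range(P) for b in range(P)]
ZERO, ONE = (0, 0), (1, 0)


def numerical_range(matrix):
    """W(A) = {x* A x : x in F_9^2, x* x = 1} for a 2x2 matrix over F_9."""
    values = set()
    for x in product(FIELD, repeat=2):
        norm = add(mul(conj(x[0]), x[0]), mul(conj(x[1]), x[1]))
        if norm != ONE:
            continue
        total = ZERO
        for r in range(2):
            for c in range(2):
                total = add(total, mul(mul(conj(x[r]), matrix[r][c]), x[c]))
        values.add(total)
    return values
```

For instance, the identity matrix gives $W(I) = \{1\}$, while $\mathrm{diag}(0, 1)$ gives all of $\mathbb F_3 = \{0, 1, 2\}$.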

arXiv:1811.08185

doi 10.1007/s10898-019-00804-y

Approximation Algorithm for the Partial Set Multi-Cover Problem

Authors: Yishuo Shi, Yingli Ran, Zhao Zhang, James Willson, Guangmo Tong, Ding-Zhu Du

Abstract: The partial set cover problem and the set multi-cover problem are two generalizations of the set cover problem. In this paper, we consider the partial set multi-cover problem, which combines them: given an element set $E$, a collection of sets $\mathcal S\subseteq 2^E$, and a total covering ratio $q$ (a constant between 0 and 1), where each set $S\in\mathcal S$ is associated with a cost $c_S$ and each element $e\in E$ is associated with a covering requirement $r_e$, the goal is to find a minimum-cost sub-collection $\mathcal S'\subseteq\mathcal S$ that fully covers at least $q|E|$ elements, where element $e$ is fully covered if it belongs to at least $r_e$ sets of $\mathcal S'$. Denote by $r_{\max}=\max\{r_e\colon e\in E\}$ the maximum covering requirement. We present an $(O(\frac{r_{\max}\log^2n}{\varepsilon}),1-\varepsilon)$-bicriteria approximation algorithm; that is, the output of our algorithm has cost at most $O(\frac{r_{\max}\log^2 n}{\varepsilon})$ times the optimal value, while the number of fully covered elements is at least $(1-\varepsilon)q|E|$.

Submitted 20 November, 2018; originally announced November 2018.

MSC Class: 68W25

ACM Class: G.2.1

Journal ref: Journal of Global Optimization, 2019
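The bicriteria guarantee is easiest to read against the exact problem. This sketch states feasibility directly (an element is fully covered when it lies in at least $r_e$ chosen sets) and finds the exact optimum by exhaustive search, which is exponential in the number of sets and only suitable for checking tiny instances. It is not the paper's approximation algorithm, and all names are hypothetical:

```python
from itertools import combinations


def fully_covered(elements, chosen, requirement):
    """Elements e that lie in at least requirement[e] of the chosen sets."""
    return {e for e in elements if sum(e in s for s in chosen) >= requirement[e]}


def brute_force_psmc(elements, sets, cost, requirement, q):
    """Exact minimum-cost sub-collection fully covering at least q*|E| elements.

    Returns (best_cost, chosen_indices), or (inf, None) if no sub-collection is feasible.
    """
    need = q * len(elements)
    best = (float("inf"), None)
    for k in range(len(sets) + 1):
        for idx in combinations(range(len(sets)), k):
            chosen = [sets[i] for i in idx]
            if len(fully_covered(elements, chosen, requirement)) >= need:
                c = sum(cost[i] for i in idx)
                if c < best[0]:
                    best = (c, idx)
    return best
```

With $E = \{1,2,3,4\}$, sets $\{1,2\}, \{1,2,3\}, \{3,4\}, \{1,4\}$ of costs 1, 2, 1, 1, $r_1 = 2$ and every other $r_e = 1$, and $q = 0.5$, the single set $\{3,4\}$ (cost 1) is optimal; raising $q$ to 1 raises the optimal cost to 3.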

arXiv:1501.07528

doi 10.1007/s00026-016-0324-y

Comparing and simplifying distinct-cluster phylogenetic networks

Authors: Stephen J. Willson

Abstract: Phylogenetic networks are rooted acyclic directed graphs in which the leaves are identified with members of a set X of species. The cluster of a vertex is the set of leaves that are descendants of the vertex. A network is "distinct-cluster" if distinct vertices have distinct clusters. This paper focuses on the set DC(X) of distinct-cluster networks whose leaves are identified with the members of X. For a fixed X, a metric on DC(X) is defined. There is a "cluster-preserving" simplification process by which vertices or certain arcs may be removed without changing the clusters of any remaining vertices. Many of the resulting networks may be uniquely determined without regard to the order of the simplifying operations.

Submitted 6 August, 2016; v1 submitted 29 January, 2015; originally announced January 2015.

Comments: This is version 2. A previous version is already on arXiv.

Journal ref: Annals of Combinatorics (2016), 1-22
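The cluster and distinct-cluster definitions translate directly into code. This sketch assumes a network stored as a dict from each vertex to its list of children and treats every leaf as its own descendant, so a leaf's cluster is the leaf itself; the names are hypothetical:

```python
def cluster_map(children, leaves):
    """Map each vertex of a rooted acyclic digraph to its cluster of descendant leaves."""
    memo = {}

    def cluster(v):
        if v not in memo:
            if v in leaves:
                memo[v] = frozenset([v])
            else:
                memo[v] = frozenset().union(*(cluster(c) for c in children.get(v, ())))
        return memo[v]

    vertices = set(children) | set(leaves)
    for kids in children.values():
        vertices.update(kids)
    return {v: cluster(v) for v in vertices}


def is_distinct_cluster(children, leaves):
    """True if no two distinct vertices share the same cluster."""
    cmap = cluster_map(children, leaves)
    return len(set(cmap.values())) == len(cmap)
```

A hybrid vertex with a single leaf child shares that child's cluster, so such a network fails the test; giving the hybrid two leaf children restores distinct clusters.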

arXiv:1005.4956

Restricted trees: simplifying networks with bottlenecks

Authors: Stephen J. Willson

Abstract: Suppose N is a phylogenetic network indicating a complicated relationship among individuals and taxa. Often of interest is a much simpler network, for example, a species tree T, that summarizes the most fundamental relationships. The meaning of a species tree is made more complicated by the recent discovery of the importance of hybridizations and lateral gene transfers. Hence it is desirable to describe uniform well-defined procedures that yield a tree given a network N. A useful tool toward this end is a connected surjective digraph (CSD) map f from N to N' where N' is generally a much simpler network than N. A set W of vertices in N is "restricted" if there is at most one vertex from which there is an arc into W, thus yielding a bottleneck in N. A CSD map f from N to N' is "restricted" if the inverse image of each vertex in N' is restricted in N. This paper describes a uniform procedure that, given a network N, yields a well-defined tree called the "restricted tree" of N. There is a restricted CSD map from N to the restricted tree. Many relationships in the tree can be proved to appear also in N.

Submitted 26 May, 2010; originally announced May 2010.

Comments: 17 pages, 2 figures

MSC Class: 92D15 (Primary); 05C20 (Secondary); 05C05

Journal ref: Bulletin of Mathematical Biology (2011) 73, 2322-2338
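The "restricted" condition can be checked mechanically. The abstract does not say whether arcs between vertices of W count, so this sketch assumes only arcs entering W from outside matter. `is_restricted_map` checks only the restriction condition on inverse images, not that the map is CSD; both names are hypothetical:

```python
def is_restricted(arcs, w):
    """True if at most one vertex outside W has an arc into W (assumed reading)."""
    w = set(w)
    sources = {u for u, v in arcs if v in w and u not in w}
    return len(sources) <= 1


def is_restricted_map(arcs, f):
    """True if every inverse image of the vertex map f is restricted in N."""
    fibres = {}
    for v, image in f.items():
        fibres.setdefault(image, set()).add(v)
    return all(is_restricted(arcs, fibre) for fibre in fibres.values())
```

In a network where a hybrid h has parents a and b, the set {h, y} is not restricted because it has two entry points, whereas {a, b, h, y}, entered only from the root, is restricted.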

arXiv:1005.2108

doi 10.1109/TCBB.2012.52

CSD Homomorphisms Between Phylogenetic Networks

Authors: Stephen J. Willson

Abstract: Since Darwin, species trees have been used as a simplified description of the relationships which summarize the complicated network $N$ of reality. Recent evidence of hybridization and lateral gene transfer, however, suggests that there are situations where trees are inadequate. Consequently it is important to determine properties that characterize networks closely related to $N$ and possibly more complicated than trees but lacking the full complexity of $N$. A connected surjective digraph map (CSD) is a map $f$ from one network $N$ to another network $M$ such that every arc is either collapsed to a single vertex or is taken to an arc, such that $f$ is surjective, and such that the inverse image of a vertex is always connected. CSD maps are shown to behave well under composition. It is proved that if there is a CSD map from $N$ to $M$, then there is a way to lift an undirected version of $M$ into $N$, often with added resolution. A CSD map from $N$ to $M$ puts strong constraints on $N$. In general, it may be useful to study classes of networks such that, for any $N$, there exists a CSD map from $N$ to some standard member of that class.

Submitted 6 August, 2016; v1 submitted 12 May, 2010; originally announced May 2010.

Comments: 19 pages, 3 figures

MSC Class: 92D15; 05C20

Journal ref: IEEE/ACM Transactions on Computational Biology and Bioinformatics (2012) 9: 1128-1138
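The three conditions in the CSD definition can be tested directly on small examples. This sketch makes two assumptions the abstract leaves implicit: surjectivity is required on both the vertices and the arcs of $M$, and "connected" means connected in the underlying undirected graph, using only arcs of $N$ inside the inverse image. The function name is hypothetical:

```python
def is_csd_map(n_vertices, n_arcs, m_vertices, m_arcs, f):
    """Check whether the vertex map f from network N to network M is a CSD map.

    Assumptions: f must be onto both the vertices and the arcs of M, and each
    inverse image must be connected via arcs of N with directions ignored.
    """
    m_arc_set = set(m_arcs)
    hit_arcs = set()
    for u, v in n_arcs:
        fu, fv = f[u], f[v]
        if fu == fv:
            continue                      # arc collapsed to a single vertex
        if (fu, fv) not in m_arc_set:
            return False                  # arc sent to a non-arc of M
        hit_arcs.add((fu, fv))
    if {f[v] for v in n_vertices} != set(m_vertices) or hit_arcs != m_arc_set:
        return False                      # not surjective
    fibres = {}
    for v in n_vertices:
        fibres.setdefault(f[v], set()).add(v)
    for fibre in fibres.values():
        adjacency = {v: set() for v in fibre}
        for u, v in n_arcs:
            if u in fibre and v in fibre:
                adjacency[u].add(v)
                adjacency[v].add(u)
        start = next(iter(fibre))
        seen, stack = {start}, [start]
        while stack:                      # depth-first search inside the fibre
            node = stack.pop()
            for nxt in adjacency[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        if seen != fibre:
            return False                  # inverse image not connected
    return True
```

Mapping the root and both parents of a hybrid to one vertex of a star tree, and the hybrid together with its leaf child to another, passes all three checks; sending an unrelated leaf into the hybrid's inverse image breaks connectivity.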

arXiv:0902.2970

Regular networks are determined by their trees

Authors: Stephen J. Willson

Abstract: A rooted acyclic digraph N with labelled leaves displays a tree T when there exists a way to select a unique parent of each hybrid vertex resulting in the tree T. Let Tr(N) denote the set of all trees displayed by the network N. In general, there may be many other networks M such that Tr(M) = Tr(N). A network is regular if it is isomorphic with its cover digraph. This paper shows that if N is regular, there is a procedure to reconstruct N given Tr(N). Hence if N and M are regular networks and Tr(N) = Tr(M), it follows that N = M, proving that a regular network is uniquely determined by its displayed trees.

Submitted 17 February, 2009; originally announced February 2009.

Comments: 16 pages

Journal ref: IEEE/ACM Transactions on Computational Biology and Bioinformatics 8 (2011) 785-796
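For small networks, Tr(N) can be enumerated by trying every choice of parent for each hybrid vertex. This sketch encodes each resulting tree by its set of non-empty clusters, which, for rooted trees with labelled leaves, determines the tree after deleting unlabelled leaves and suppressing vertices with a single child. The encoding and function name are assumptions for illustration, not the paper's reconstruction procedure:

```python
from itertools import product


def displayed_cluster_systems(children, leaves):
    """Return Tr(N), with each displayed tree encoded by its set of non-empty clusters."""
    parents = {}
    for u, kids in children.items():
        for v in kids:
            parents.setdefault(v, []).append(u)
    hybrids = sorted(v for v, ps in parents.items() if len(ps) > 1)
    trees = set()
    for choice in product(*(parents[h] for h in hybrids)):
        keep = dict(zip(hybrids, choice))
        # drop every arc into a hybrid except the one from its selected parent
        tree_children = {
            u: [v for v in kids if v not in keep or keep[v] == u]
            for u, kids in children.items()
        }
        memo = {}

        def cluster(v):
            if v not in memo:
                if v in leaves:
                    memo[v] = frozenset([v])
                else:
                    memo[v] = frozenset().union(*(cluster(c) for c in tree_children.get(v, ())))
            return memo[v]

        vertices = set(children) | set(parents)
        trees.add(frozenset(c for c in map(cluster, vertices) if c))
    return trees
```

For the network with root r, children a and b, a hybrid h below both, and leaves x (under a), y (under h), and z (under b), the two parent choices display the trees ((x, y), z) and (x, (y, z)).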

Showing 1–10 of 10 results for author: Willson, J