-
Advancing Extrapolative Predictions of Material Properties through Learning to Learn
Authors:
Kohei Noda,
Araki Wakiuchi,
Yoshihiro Hayashi,
Ryo Yoshida
Abstract:
Recent advancements in machine learning have showcased its potential to significantly accelerate the discovery of new materials. Central to this progress is the development of rapidly computable property predictors, enabling the identification of novel materials with desired properties from vast material spaces. However, the limited availability of data resources poses a significant challenge in data-driven materials research, particularly hindering the exploration of innovative materials beyond the boundaries of existing data. While machine learning predictors are inherently interpolative, establishing a general methodology to create an extrapolative predictor remains a fundamental challenge, limiting the search for innovative materials beyond existing data boundaries. In this study, we leverage an attention-based architecture of neural networks and meta-learning algorithms to acquire extrapolative generalization capability. The meta-learners, experienced repeatedly with arbitrarily generated extrapolative tasks, can acquire outstanding generalization capability in unexplored material spaces. Through the tasks of predicting the physical properties of polymeric materials and hybrid organic--inorganic perovskites, we highlight the potential of such extrapolatively trained models, particularly with their ability to rapidly adapt to unseen material domains in transfer learning scenarios.
Submitted 25 March, 2024;
originally announced April 2024.
-
Tropical Fermat-Weber Polytropes
Authors:
David Barnhill,
John Sabol,
Ruriko Yoshida,
Keiji Miura
Abstract:
We study the geometry of tropical Fermat-Weber points in terms of the symmetric tropical metric over the tropical projective torus. It is well known that a tropical Fermat-Weber point of a given sample is not unique, and in this paper we show that the set of all possible Fermat-Weber points forms a polytrope. We then introduce tropical Fermat-Weber gradients and use them to show that the tropical Fermat-Weber polytrope is a bounded cell of a tropical hyperplane arrangement given by both min- and max-tropical hyperplanes whose apices are observations in the input data.
Submitted 23 February, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
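As a small illustration of the objects named in the abstract above, the following sketch computes the tropical metric, taken here in its usual form max_i(x_i - y_i) - min_i(x_i - y_i), and the Fermat-Weber objective (the sum of tropical distances from a candidate point to each observation). The function names and the toy sample are assumptions made for this example and are not taken from the paper.

```python
def tropical_distance(x, y):
    """Tropical metric: max_i(x_i - y_i) - min_i(x_i - y_i)."""
    diffs = [a - b for a, b in zip(x, y)]
    return max(diffs) - min(diffs)


def fermat_weber_objective(point, sample):
    """Sum of tropical distances from `point` to every observation in `sample`."""
    return sum(tropical_distance(point, obs) for obs in sample)


# Hypothetical toy sample of three observations in the tropical projective torus.
sample = [(0, 0, 0), (0, 3, 1), (0, 2, 5)]

# Evaluate the objective at each observation; a Fermat-Weber point minimizes it.
for candidate in sample:
    print(candidate, fermat_weber_objective(candidate, sample))
```

Because the metric depends only on coordinate differences, adding the same constant to every coordinate of a point leaves all distances unchanged, which is why points are treated as elements of the tropical projective torus.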
-
Tree-Planted Transformers: Unidirectional Transformer Language Models with Implicit Syntactic Supervision
Authors:
Ryo Yoshida,
Taiga Someya,
Yohei Oseki
Abstract:
Syntactic Language Models (SLMs) can be trained efficiently to reach relatively high performance; however, they have trouble with inference efficiency due to the explicit generation of syntactic structures. In this paper, we propose a new method dubbed tree-planting: instead of explicitly generating syntactic structures, we "plant" trees into attention weights of unidirectional Transformer LMs to implicitly reflect syntactic structures of natural language. Specifically, unidirectional Transformer LMs trained with tree-planting will be called Tree-Planted Transformers (TPT), which inherit the training efficiency from SLMs without changing the inference efficiency of their underlying Transformer LMs. Targeted syntactic evaluations on the SyntaxGym benchmark demonstrated that TPTs, despite the lack of explicit generation of syntactic structures, significantly outperformed not only vanilla Transformer LMs but also various SLMs that generate hundreds of syntactic structures in parallel. This result suggests that TPTs can learn human-like syntactic knowledge as data-efficiently as SLMs while maintaining the modeling space of Transformer LMs unchanged.
Submitted 6 June, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Emergent Word Order Universals from Cognitively-Motivated Language Models
Authors:
Tatsuki Kuribayashi,
Ryo Ueda,
Ryo Yoshida,
Yohei Oseki,
Ted Briscoe,
Timothy Baldwin
Abstract:
The world's languages exhibit certain so-called typological or implicational universals; for example, Subject-Object-Verb (SOV) languages typically use postpositions. Explaining the source of such biases is a key goal of linguistics. We study word-order universals through a computational simulation with language models (LMs). Our experiments show that typologically-typical word orders tend to have lower perplexity estimated by LMs with cognitively plausible biases: syntactic biases, specific parsing strategies, and memory limitations. This suggests that the interplay of cognitive biases and predictability (perplexity) can explain many aspects of word-order universals. It also showcases the advantage of cognitively-motivated LMs, typically employed in cognitive modeling, in the simulation of language universals.
Submitted 7 June, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Tropical Decision Boundaries for Neural Networks Are Robust Against Adversarial Attacks
Authors:
Kurt Pasque,
Christopher Teska,
Ruriko Yoshida,
Keiji Miura,
Jefferson Huang
Abstract:
We introduce a simple, easy to implement, and computationally efficient tropical convolutional neural network architecture that is robust against adversarial attacks. We exploit the tropical nature of piece-wise linear neural networks by embedding the data in the tropical projective torus in a single hidden layer which can be added to any model. We study the geometry of its decision boundary theoretically and show its robustness against adversarial attacks on image datasets using computational experiments.
Submitted 1 February, 2024;
originally announced February 2024.
-
Tropical neural networks and its applications to classifying phylogenetic trees
Authors:
Ruriko Yoshida,
Georgios Aliatimis,
Keiji Miura
Abstract:
Deep neural networks show great success when input vectors are in a Euclidean space. However, these classical neural networks perform poorly when the inputs are phylogenetic trees, which can be written as vectors in the tropical projective torus. Here we propose a tropical embedding that transforms a vector in the tropical projective torus into a vector in Euclidean space via the tropical metric. We introduce a tropical neural network in which the first layer is a tropical embedding layer and the following layers are the same as the classical ones. We prove that this neural network with the tropical metric is a universal approximator, and we derive a backpropagation rule for deep neural networks. We then provide TensorFlow 2 code for implementing a tropical neural network in the same fashion as a classical one, where the weight initialization problem is treated using extreme value statistics. We apply our method to empirical data, including sequences of hemagglutinin for influenza virus from New York. Finally, we show that a tropical neural network can be interpreted as a generalization of tropical logistic regression.
Submitted 23 September, 2023;
originally announced September 2023.
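The tropical embedding described in the abstract above can be sketched roughly as follows. This is an assumption-laden illustration, not the authors' TensorFlow 2 code: it assumes each output coordinate is the tropical distance from the input x to the negated weight vector -w, that is max_i(x_i + w_i) - min_i(x_i + w_i). The function name and the toy weights are hypothetical.

```python
def tropical_embedding(x, weights):
    """Map x to one Euclidean coordinate per weight vector w.

    Each coordinate is max_i(x_i + w_i) - min_i(x_i + w_i), i.e. the
    tropical distance between x and -w (assumed form for this sketch).
    """
    out = []
    for w in weights:
        shifted = [xi + wi for xi, wi in zip(x, w)]
        out.append(max(shifted) - min(shifted))
    return out


# Hypothetical input vector and two hypothetical weight vectors.
x = [0.0, 2.0, 5.0]
weights = [[0.0, 0.0, 0.0], [0.0, -2.0, -5.0]]
z = tropical_embedding(x, weights)
print(z)
```

The output z is an ordinary Euclidean vector, so under this reading any classical dense layers could follow it unchanged; the embedding is also invariant to adding a constant to every coordinate of x.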
-
Tropical Geometric Tools for Machine Learning: the TML package
Authors:
David Barnhill,
Ruriko Yoshida,
Georgios Aliatimis,
Keiji Miura
Abstract:
In the last decade, developments in tropical geometry have provided a number of uses directly applicable to problems in statistical learning. The TML package is the first R package which contains a comprehensive set of tools and methods used for basic computations related to tropical convexity, visualization of tropically convex sets, as well as supervised and unsupervised learning models using the tropical metric under the max-plus algebra over the tropical projective torus. Primarily, the TML package employs a Hit and Run Markov chain Monte Carlo sampler in conjunction with the tropical metric as its main tool for statistical inference. In addition to basic computation and various applications of the tropical HAR sampler, we also focus on several supervised and unsupervised methods incorporated in the TML package including tropical principal component analysis, tropical logistic regression and tropical kernel density estimation.
Submitted 24 September, 2023; v1 submitted 3 September, 2023;
originally announced September 2023.
-
Imputing phylogenetic trees using tropical polytopes over the space of phylogenetic trees
Authors:
Ruriko Yoshida
Abstract:
When we apply comparative phylogenetic analyses to genome data, it is a well-known problem that some of the given species (or taxa) often have missing genes. In such a case, we have to impute the missing part of a gene tree from a sample of gene trees. In this short paper we propose a novel method to infer a missing part of a phylogenetic tree using an analogue of classical linear regression in the setting of tropical geometry. In our approach, we consider a tropical polytope, a convex hull with respect to the tropical metric closest to the data points. We give a condition under which the tree estimated by our method is guaranteed to be within Robinson-Foulds (RF) distance four of the ground truth, and computational experiments with simulated data show that our method works well.
Submitted 3 July, 2023; v1 submitted 30 June, 2023;
originally announced June 2023.
-
Tropical Logistic Regression Model on Space of Phylogenetic Trees
Authors:
Georgios Aliatimis,
Ruriko Yoshida,
Burak Boyaci,
James A. Grant
Abstract:
Classification of gene trees is an important task both in the analysis of multi-locus phylogenetic data, and assessment of the convergence of Markov Chain Monte Carlo (MCMC) analyses used in Bayesian phylogenetic tree reconstruction. The logistic regression model is one of the most popular classification models in statistical learning, thanks to its computational speed and interpretability. However, it is not appropriate to directly apply the standard logistic regression model to a set of phylogenetic trees, as the space of phylogenetic trees is non-Euclidean and thus contradicts the standard assumptions on covariates. It is well-known in tropical geometry and phylogenetics that the space of phylogenetic trees is a tropical linear space in terms of the max-plus algebra. Therefore, in this paper, we propose an analogue of the logistic regression model in the setting of tropical geometry. Our proposed method outperforms classical logistic regression in terms of Area under the ROC Curve (AUC) in numerical examples, including with data generated by the multi-species coalescent model. Theoretical properties such as statistical consistency have been proved and generalization error rates have been derived. Finally, our classification algorithm is proposed as an MCMC convergence criterion for MrBayes. Unlike the convergence metric used by MrBayes, which depends only on tree topologies, our method is sensitive to branch lengths and therefore provides a more robust metric for convergence. In a test case, it is illustrated that the tropical logistic regression can differentiate between two independently run MCMC chains, even when the standard metric cannot.
Submitted 7 June, 2024; v1 submitted 14 June, 2023;
originally announced June 2023.
-
Shotgun crystal structure prediction using machine-learned formation energies
Authors:
Chang Liu,
Hiromasa Tamaki,
Tomoyasu Yokoyama,
Kensuke Wakasugi,
Satoshi Yotsuhashi,
Minoru Kusaba,
Ryo Yoshida
Abstract:
Stable or metastable crystal structures of assembled atoms can be predicted by finding the global or local minima of the energy surface defined on the space of the atomic configurations. Generally, this requires repeated first-principles energy calculations that are impractical for large systems, such as those containing more than 30 atoms in the unit cell. Here, we have made significant progress in solving the crystal structure prediction problem with a simple but powerful machine-learning workflow; using a machine-learning surrogate for first-principles energy calculations, we performed non-iterative, single-shot screening using a large library of virtually created crystal structures. The present method relies on two key technical components: transfer learning, which enables a highly accurate energy prediction of pre-relaxed crystalline states given only a small set of training samples from first-principles calculations, and generative models to create promising and diverse crystal structures for screening. Here, first-principles calculations were performed only to generate the training samples, and for the optimization of a dozen or fewer finally narrowed-down crystal structures. Our shotgun method proved to be computationally less demanding compared to conventional methods, which heavily rely on iterations of first-principles calculations, and achieved an exceptional prediction accuracy, reaching 92.2% in a benchmark task involving the prediction of 90 different crystal structures.
Submitted 27 March, 2024; v1 submitted 3 May, 2023;
originally announced May 2023.
-
Maximum Inscribed and Minimum Enclosing Tropical Balls of Tropical Polytopes and Applications to Volume Estimation and Uniform Sampling
Authors:
David Barnhill,
Ruriko Yoshida,
Keiji Miura
Abstract:
We consider minimum enclosing and maximum inscribed tropical balls for any given tropical polytope over the tropical projective torus in terms of the tropical metric with the max-plus algebra. We show that we can obtain such tropical balls via linear programming. Then we apply minimum enclosing and maximum inscribed tropical balls of any given tropical polytope to estimate the volume of and sample uniformly from the tropical polytope.
Submitted 4 March, 2023;
originally announced March 2023.
-
Composition, Attention, or Both?
Authors:
Ryo Yoshida,
Yohei Oseki
Abstract:
In this paper, we propose a novel architecture called Composition Attention Grammars (CAGs) that recursively composes subtrees into a single vector representation with a composition function, and selectively attends to previous structural information with a self-attention mechanism. We investigate whether these components -- the composition function and the self-attention mechanism -- can both induce human-like syntactic generalization. Specifically, we train language models (LMs) with and without these two components with the model sizes carefully controlled, and evaluate their syntactic generalization performance against six test circuits on the SyntaxGym benchmark. The results demonstrated that the composition function and the self-attention mechanism both play an important role in making LMs more human-like, and closer inspection of linguistic phenomena implied that the composition function allowed syntactic features, but not semantic features, to percolate into subtree representations.
Submitted 10 May, 2023; v1 submitted 24 October, 2022;
originally announced October 2022.
-
Thermodynamic approach for enhancing superconducting critical current performance
Authors:
Masashi Miura,
Go Tsuchiya,
Takumu Harada,
Keita Sakuma,
Hodaka Kurokawa,
Naoto Sekiya,
Yasuyuki Kato,
Ryuji Yoshida,
Takeharu Kato,
Koichi Nakaoka,
Teruo Izumi,
Fuyuki Nabeshima,
Atsutaka Maeda,
Tatsumori Okada,
Satoshi Awaji,
Leonardo Civale,
Boris Maiorov
Abstract:
The addition of artificial pinning centers has led to an impressive increase in critical current density ($J_{\rm c}$) in a superconductor, enabling record-breaking all-superconducting magnets and other applications. $J_{\rm c}$ has reached $\sim 0.2$-$0.3$ $J_{\rm d}$, where $J_{\rm d}$ is the depairing current density, and the numerical factor depends on the pinning optimization. By modifying $λ$ and/or $ξ$, the penetration depth and coherence length, respectively, we can increase $J_{\rm d}$. For (Y$_{0.77}$Gd$_{0.23}$)Ba$_2$Cu$_3$O$_y$ ((Y,Gd)123) we achieve this by controlling the carrier density, which is related to $λ$ and $ξ$. We also tune $λ$ and $ξ$ by controlling the chemical pressure in the Fe-based superconductors, BaFe$_2$(As$_{1-x}$P$_x$)$_2$ films. The variation of $λ$ and $ξ$ leads to an intrinsic improvement of $J_{\rm c}$, via $J_{\rm d}$, obtaining extremely high values of $J_{\rm c}$ of $130$ MA/cm$^2$ and $8.0$ MA/cm$^2$ at $4.2$ K, consistent with an enhancement of $J_{\rm d}$ of a factor of $2$ for both incoherent nanoparticle-doped (Y,Gd)123 coated conductors (CCs) and BaFe$_2$(As$_{1-x}$P$_x$)$_2$ films, showing that this new material design is useful to achieving high critical current densities for a wide array of superconductors. The remarkably high vortex-pinning force in combination with this thermodynamic and pinning optimization route for the (Y,Gd)123 CCs reached $\sim 3.17$ TN/m$^3$ at $4.2$ K and 18 T (${\bf H}\parallel c$), the highest values ever reported in any superconductor.
Submitted 20 October, 2022;
originally announced October 2022.
-
Transfer learning with affine model transformation
Authors:
Shunya Minami,
Kenji Fukumizu,
Yoshihiro Hayashi,
Ryo Yoshida
Abstract:
Supervised transfer learning has received considerable attention due to its potential to boost the predictive power of machine learning in scenarios where data are scarce. Generally, a given set of source models and a dataset from a target domain are used to adapt the pre-trained models to a target domain by statistically learning domain shift and domain-specific factors. While such procedurally and intuitively plausible methods have achieved great success in a wide range of real-world applications, the lack of a theoretical basis hinders further methodological development. This paper presents a general class of transfer learning regression called affine model transfer, following the principle of expected-square loss minimization. It is shown that the affine model transfer broadly encompasses various existing methods, including the most common procedure based on neural feature extractors. Furthermore, the current paper clarifies theoretical properties of the affine model transfer such as generalization error and excess risk. Through several case studies, we demonstrate the practical benefits of modeling and estimating inter-domain commonality and domain-specific factors separately with the affine-type transfer models.
Submitted 19 January, 2024; v1 submitted 18 October, 2022;
originally announced October 2022.
-
Hit and Run Sampling from Tropically Convex Sets
Authors:
Ruriko Yoshida,
Keiji Miura,
David Barnhill
Abstract:
In this paper we propose Hit and Run (HAR) sampling from a tropically convex set. The key ingredient of HAR sampling from a tropically convex set is sampling uniformly from a tropical line segment over the tropical projective torus, which has linear time complexity. We show that this HAR sampling method samples uniformly from a tropical polytope, which is the smallest tropical convex set of finitely many vertices. Finally, we apply this novel method to any given distribution using Metropolis-Hastings filtering over a tropical polytope.
Submitted 29 September, 2022;
originally announced September 2022.
-
Tropical Density Estimation of Phylogenetic Trees
Authors:
Ruriko Yoshida,
David Barnhill,
Keiji Miura,
Daniel Howe
Abstract:
Much evidence from biological theory and empirical data indicates that gene trees, phylogenetic trees reconstructed from different genes (loci), do not have to have exactly the same tree topologies. Such incongruence between gene trees might be caused by some ``unusual'' evolutionary events, such as meiotic sexual recombination in eukaryotes or horizontal transfers of genetic material in prokaryotes. However, most gene trees are constrained by the tree topology of their species tree, that is, the phylogenetic tree of the given species following their evolutionary history. In order to discover ``outlying'' gene trees which do not follow the ``main distribution(s)'' of trees, we propose to apply the ``tropical metric'' with the max-plus algebra from tropical geometry to a non-parametric estimation of gene trees over the space of phylogenetic trees. In this research we apply the ``tropical metric,'' a well-defined metric over the space of phylogenetic trees under the max-plus algebra, to non-parametric estimation of the gene tree distribution over the tree space. The kernel density estimator (KDE) is one of the most popular non-parametric estimators of a distribution from a given sample, and we propose an analogue of the classical KDE in the setting of tropical geometry with the tropical metric, which measures the length of an intrinsic geodesic between trees over the tree space. We estimate the probability of an observed tree by empirical frequencies of nearby trees, with the level of influence determined by the tropical metric. Then, with simulated data generated from the multispecies coalescent model, we show that the non-parametric estimation of the gene tree distribution using the tropical metric performs better than one using the Billera-Holmes-Vogtmann (BHV) metric developed by Weyenberg et al. in terms of computational time and accuracy. We then apply it to Apicomplexa data.
Submitted 11 July, 2023; v1 submitted 8 June, 2022;
originally announced June 2022.
-
Connecting Tables with Allowing Negative Cell Counts
Authors:
Ruriko Yoshida,
David Barnhill
Abstract:
It is well-known that computing a Markov basis for a discrete loglinear model is very hard in general. Thus, we focus on connecting tables in a fiber via a subset of a Markov basis, and in this paper we consider connecting tables if we allow cell counts in each table to be $-1$. We show that if a subset of a Markov basis connects all tables in the fiber which contains a table with all ones, then moves in this subset connect tables in the fiber if we allow cell counts to be $-1$. In addition, we show that in some cases under the no-three-way interaction model, we can connect tables by all basic moves of $2 \times 2 \times 2$ minors while allowing $X_{ijk} \geq -1$. We then apply this Markov Chain Monte Carlo (MCMC) scheme to empirical data on Naval officer and enlisted population. Our computational experiments show it works well, and we end with a conjecture on the no-three-way interaction model.
Submitted 23 January, 2023; v1 submitted 14 May, 2022;
originally announced May 2022.
-
Bayesian Sequential Stacking Algorithm for Concurrently Designing Molecules and Synthetic Reaction Networks
Authors:
Qi Zhang,
Chang Liu,
Stephen Wu,
Ryo Yoshida
Abstract:
In the last few years, de novo molecular design using machine learning has made great technical progress but its practical deployment has not been as successful. This is mostly owing to the cost and technical difficulty of synthesizing such computationally designed molecules. To overcome such barriers, various methods for synthetic route design using deep neural networks have been studied intensively in recent years. However, little progress has been made in designing molecules and their synthetic routes simultaneously. Here, we formulate the problem of simultaneously designing molecules with the desired set of properties and their synthetic routes within the framework of Bayesian inference. The design variables consist of a set of reactants in a reaction network and its network topology. The design space is extremely large because it consists of all combinations of purchasable reactants, often in the order of millions or more. In addition, the designed reaction networks can adopt any topology beyond simple multistep linear reaction routes. To solve this hard combinatorial problem, we present a powerful sequential Monte Carlo algorithm that recursively designs a synthetic reaction network by sequentially building up single-step reactions. In a case study of designing drug-like molecules based on commercially available compounds, compared with heuristic combinatorial search methods, the proposed method shows overwhelming performance in terms of computational efficiency and coverage and novelty with respect to existing compounds.
Submitted 1 March, 2022;
originally announced April 2022.
-
RadonPy: Automated Physical Property Calculation using All-atom Classical Molecular Dynamics Simulations for Polymer Informatics
Authors:
Yoshihiro Hayashi,
Junichiro Shiomi,
Junko Morikawa,
Ryo Yoshida
Abstract:
The rapid growth of data-driven materials research has made it necessary to develop systematically designed, open databases of material properties. However, there are few open databases for polymeric materials compared to other material systems such as inorganic crystals. To this end, we developed RadonPy, the world's first open-source Python library for fully automated all-atom classical molecular dynamics (MD) simulations. For a given polymer repeating unit, the entire process of molecular modeling, equilibrium and nonequilibrium MD calculations, and property calculations can be conducted fully automatically. In this study, 15 different properties, including the thermal conductivity, density, specific heat capacity, thermal expansion coefficients, and refractive index, were calculated for more than 1,000 unique amorphous polymers. The calculated properties were compared and validated systematically with experimental values from PoLyInfo. During the high-throughput data production, eight amorphous polymers with extremely high thermal conductivities, exceeding 0.4 W/mK, were identified, including six polymers with unreported thermal conductivities. These polymers were found to have a high density of hydrogen bonding units or rigid backbones. A decomposition analysis of the heat conduction, which is implemented in RadonPy, revealed the underlying mechanisms that yield a high thermal conductivity of the amorphous polymers: heat transfer via hydrogen bonds and dipole-dipole interactions between the polymer chains with their hydrogen bonding units or via the covalent bonds of the polymer backbone with high rigidity. The creation of massive amounts of computational property data using RadonPy will facilitate the development of polymer informatics, similar to how the emergence of the first-principles computational database for inorganic crystals significantly advanced materials informatics.
Submitted 26 March, 2022;
originally announced March 2022.
-
Whitepaper submitted to Snowmass21: Advanced accelerator linear collider demonstration facility at intermediate energy
Authors:
C. Benedetti,
S. S. Bulanov,
E. Esarey,
C. G. R. Geddes,
A. J. Gonsalves,
P. M. Jacobs,
S. Knapen,
B. Nachman,
K. Nakamura,
S. Pagan Griso,
C. B. Schroeder,
D. Terzani,
J. van Tilborg,
M. Turner,
W. -M. Yao,
R. Bernstein,
V. Shiltsev,
S. J. Gessner,
M. J. Hogan,
T. Nelson,
C. Jing,
I. Low,
X. Lu,
R. Yoshida,
C. Lee,
P. Meade
, et al. (8 additional authors not shown)
Abstract:
It is widely accepted that the next lepton collider beyond a Higgs factory would require center-of-mass energy of the order of up to 15 TeV. Since, given reasonable space and cost restrictions, conventional accelerator technology reaches its limits near this energy, high-gradient advanced acceleration concepts are attractive. Advanced and novel accelerators (ANAs) are leading candidates due to their ability to produce acceleration gradients on the order of 1-100 GV/m, leading to compact acceleration structures. Over the last 10-15 years significant progress has been achieved in accelerating electron beams by ANAs. For example, the demonstration of several-GeV electron beams from laser-powered capillary discharge waveguides, as well as the proof-of-principle coupling of two accelerating structures powered by different laser pulses, has increased interest in ANAs as a viable technology to be considered for a compact, TeV-class, lepton linear collider.
However, intermediate facilities are required to test the technology and demonstrate key subsystems. A 20-100 GeV center-of-mass energy ANA-based lepton collider can be a possible candidate for an intermediate facility. Apart from being a test beam facility for accelerator and detector studies, this collider will provide opportunities to study muon and proton beam acceleration, investigate charged particle interactions with extreme electromagnetic fields (relevant for beam delivery system designs and to study the physics at the interaction point), as well as precision Quantum Chromodynamics and Beyond the Standard Model physics measurements. Possible applications of this collider include the studies of $γγ$ and $e$-ion collider designs.
Submitted 15 April, 2022; v1 submitted 16 March, 2022;
originally announced March 2022.
-
Crystal structure prediction with machine learning-based element substitution
Authors:
Minoru Kusaba,
Chang Liu,
Ryo Yoshida
Abstract:
The prediction of energetically stable crystal structures formed by a given chemical composition is a central problem in solid-state physics. In principle, the crystalline state of assembled atoms can be determined by optimizing the energy surface, which in turn can be evaluated using first-principles calculations. However, performing the iterative gradient descent on the potential energy surface using first-principles calculations is prohibitively expensive for complex systems, such as those with many atoms per unit cell. Here, we present a unique methodology for crystal structure prediction (CSP) that relies on a machine learning algorithm called metric learning. It is shown that a binary classifier, trained on a large number of already identified crystal structures, can determine the isomorphism of crystal structures formed by two given chemical compositions with an accuracy of approximately 96.4%. For a given query composition with an unknown crystal structure, the model is used to automatically select from a crystal structure database a set of template crystals with nearly identical stable structures to which element substitution is to be applied. Apart from the local relaxation calculation of the identified templates, the proposed method does not use ab initio calculations. The potential of this substitution-based CSP is demonstrated for a wide variety of crystal systems.
Submitted 31 May, 2022; v1 submitted 26 January, 2022;
originally announced January 2022.
-
Plücker Coordinates of the best-fit Stiefel Tropical Linear Space to a Mixture of Gaussian Distributions
Authors:
Keiji Miura,
Ruriko Yoshida
Abstract:
In this research, we investigate a tropical principal component analysis (PCA) as a best-fit Stiefel tropical linear space to a given sample over the tropical projective torus for its dimensionality reduction and visualization. In particular, we characterize the best-fit Stiefel tropical linear space to a sample generated from a mixture of Gaussian distributions as the variances of the Gaussians go to zero. For a single Gaussian distribution, we show that the sum of residuals in terms of the tropical metric with the max-plus algebra over a given sample to a fitted Stiefel tropical linear space converges to zero by giving an upper bound for its convergence rate. Meanwhile, for a mixture of Gaussian distributions, we show that the best-fit tropical linear space can be determined uniquely when we send variances to zero. We briefly consider the best-fit tropical polynomial as an extension for the mixture of more than two Gaussians over the tropical projective space of dimension three. We show some geometric properties of these tropical linear spaces and polynomials.
Submitted 22 January, 2023; v1 submitted 22 December, 2021;
originally announced December 2021.
-
Solving reward-collecting problems with UAVs: a comparison of online optimization and Q-learning
Authors:
Yixuan Liu,
Chrysafis Vogiatzis,
Ruriko Yoshida,
Erich Morman
Abstract:
Uncrewed autonomous vehicles (UAVs) have made significant contributions to reconnaissance and surveillance missions in past US military campaigns. As the prevalence of UAVs increases, there have also been improvements in counter-UAV technology that make it difficult for them to successfully obtain valuable intelligence within an area of interest. Hence, it has become important that modern UAVs can accomplish their missions while maximizing their chances of survival. In this work, we specifically study the problem of identifying a short path from a designated start to a goal, while collecting all rewards and avoiding adversaries that move randomly on the grid. We also provide a possible application of the framework in a military setting, that of autonomous casualty evacuation. We present a comparison of three methods to solve this problem: namely, we implement a Deep Q-Learning model, an $\varepsilon$-greedy tabular Q-Learning model, and an online optimization framework. Our computational experiments, designed using simple grid-world environments with random adversaries, showcase how these approaches work and compare them in terms of performance, accuracy, and computational time.
Submitted 30 November, 2021;
originally announced December 2021.
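For readers unfamiliar with the $\varepsilon$-greedy tabular Q-Learning method named in this abstract, the following is a minimal sketch of the textbook update rule and action selection. It is not the authors' implementation: the two-state table, the learning rate `alpha`, the discount `gamma`, and the function names are all hypothetical choices made for illustration.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a uniformly random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step: Q(s,a) += alpha * (TD target - Q(s,a))."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Hypothetical toy table: 2 states x 2 actions, all values start at zero.
q_table = [[0.0, 0.0], [0.0, 0.0]]
rng = random.Random(0)

# Taking action 1 in state 0 yields reward 1.0 and moves to state 1.
new_value = q_update(q_table, state=0, action=1, reward=1.0, next_state=1)

# With epsilon = 0 the selection is purely greedy.
greedy_action = epsilon_greedy(q_table[0], epsilon=0.0, rng=rng)
```

In a grid world such as the one described, the state would presumably encode the agent position and the remaining rewards, but that encoding is not specified in the abstract.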
-
SARS-CoV-2 Dissemination using a Network of the United States Counties
Authors:
Patrick Urrutia,
David Wren,
Chrysafis Vogiatzis,
Ruriko Yoshida
Abstract:
During 2020 and 2021, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) transmission has been increasing amongst the world's population at an alarming rate. Reducing the spread of SARS-CoV-2 and other diseases that are spread in similar manners is paramount for public health officials as they seek to effectively manage resources and potential population control measures such as social distancing and quarantines. By analyzing the United States' county network structure, one can model and interdict potential higher infection areas. County officials can provide targeted information, preparedness training, as well as increase testing in these areas. While these approaches may provide adequate countermeasures for localized areas, they are inadequate for the holistic United States. We solve this problem by collecting coronavirus disease 2019 (COVID-19) infections and deaths from the Centers for Disease Control and Prevention, and adjacency between all counties obtained from the United States Census Bureau. Generalized network autoregressive (GNAR) time series models have been proposed as an efficient learning algorithm for networked datasets. This work fuses network science and operations research techniques to univariately model COVID-19 cases, deaths, and current survivors across the United States' county network structure.
Submitted 20 March, 2022; v1 submitted 26 November, 2021;
originally announced November 2021.
-
Descriptors of intrinsic hydrodynamic thermal transport: screening a phonon database in a machine learning approach
Authors:
Pol Torres,
Stephen Wu,
Shenghong Ju,
Chang Liu,
Terumasa Tadano,
Ryo Yoshida,
Junichiro Shiomi
Abstract:
Machine learning techniques are used to explore the intrinsic origins of the hydrodynamic thermal transport and to find new materials interesting for science and engineering. The hydrodynamic thermal transport is governed intrinsically by the hydrodynamic scale and the thermal conductivity. The correlations between these intrinsic properties and harmonic and anharmonic properties, and a large number of compositional (290) and structural (1224) descriptors of 131 crystal compound materials are obtained, revealing some of the key descriptors that determine the magnitude of the intrinsic hydrodynamic effects, most of them related to the phonon relaxation times. Then, a trained black-box model is applied to screen more than 5000 materials. The results identify materials with potential technological applications. Understanding the properties correlated to hydrodynamic thermal transport can help to find new thermoelectric materials and to design new materials that ease heat dissipation in electronic devices.
Submitted 19 October, 2021;
originally announced October 2021.
-
Modeling Human Sentence Processing with Left-Corner Recurrent Neural Network Grammars
Authors:
Ryo Yoshida,
Hiroshi Noji,
Yohei Oseki
Abstract:
In computational linguistics, it has been shown that hierarchical structures make language models (LMs) more human-like. However, the previous literature has been agnostic about a parsing strategy of the hierarchical models. In this paper, we investigated whether hierarchical structures make LMs more human-like, and if so, which parsing strategy is most cognitively plausible. In order to address this question, we evaluated three LMs against human reading times in Japanese with head-final left-branching structures: Long Short-Term Memory (LSTM) as a sequential model and Recurrent Neural Network Grammars (RNNGs) with top-down and left-corner parsing strategies as hierarchical models. Our computational modeling demonstrated that left-corner RNNGs outperformed top-down RNNGs and LSTM, suggesting that hierarchical and left-corner architectures are more cognitively plausible than top-down or sequential architectures. In addition, the relationships between the cognitive plausibility and (i) perplexity, (ii) parsing, and (iii) beam size will also be discussed.
Submitted 5 October, 2023; v1 submitted 10 September, 2021;
originally announced September 2021.
-
Machine Learning-Assisted Exploration of Thermally Conductive Polymers Based on High-Throughput Molecular Dynamics Simulations
Authors:
Ruimin Ma,
Hanfeng Zhang,
Jiaxin Xu,
Yoshihiro Hayashi,
Ryo Yoshida,
Junichiro Shiomi,
Tengfei Luo
Abstract:
Finding amorphous polymers with higher thermal conductivity is important, as they are ubiquitous in heat transfer applications. With recent progress in material informatics, machine learning approaches have been increasingly adopted for finding or designing materials with desired properties. However, relatively limited effort has been put into finding thermally conductive polymers using machine learning, mainly due to the lack of polymer thermal conductivity databases with reasonable data volume. In this work, we combine high-throughput molecular dynamics (MD) simulations and machine learning to explore polymers with relatively high thermal conductivity (> 0.300 W/m-K). We first randomly select 365 polymers from the existing PolyInfo database and calculate their thermal conductivity using MD simulations. The data are then employed to train a machine learning regression model to quantify the structure-thermal conductivity relation, which is further leveraged to screen polymer candidates in the PolyInfo database with thermal conductivity > 0.300 W/m-K. 133 polymers with MD-calculated thermal conductivity above this threshold are eventually identified. Polymers with a wide range of thermal conductivity values are selected for re-calculation under different simulation conditions, and those polymers found with thermal conductivity above 0.300 W/m-K are mostly calculated to maintain values above this threshold despite fluctuation in the exact values. A classification model is also constructed, and similar results were obtained compared to the regression model in predicting polymers with thermal conductivity above or below 0.300 W/m-K. The strategy and results from this work may contribute to automating the design of polymers with high thermal conductivity.
Submitted 6 September, 2021;
originally announced September 2021.
-
Lower Perplexity is Not Always Human-Like
Authors:
Tatsuki Kuribayashi,
Yohei Oseki,
Takumi Ito,
Ryo Yoshida,
Masayuki Asahara,
Kentaro Inui
Abstract:
In computational psycholinguistics, various language models have been evaluated against human reading behavior (e.g., eye movement) to build human-like computational models. However, most previous efforts have focused almost exclusively on English, despite the recent trend towards linguistic universal within the general community. In order to fill the gap, this paper investigates whether the established results in computational psycholinguistics can be generalized across languages. Specifically, we re-examine an established generalization -- the lower perplexity a language model has, the more human-like the language model is -- in Japanese with typologically different structures from English. Our experiments demonstrate that this established generalization exhibits a surprising lack of universality; namely, lower perplexity is not always human-like. Moreover, this discrepancy between English and Japanese is further explored from the perspective of (non-)uniform information density. Overall, our results suggest that a cross-lingual evaluation will be necessary to construct human-like computational models.
Submitted 1 November, 2022; v1 submitted 2 June, 2021;
originally announced June 2021.
-
Designing high-performance superconductors with nanoparticle inclusions: comparisons to strong pinning theory
Authors:
Sarah C. Jones,
Masashi Miura,
Ryuji Yoshida,
Takeharu Kato,
Leonardo Civale,
Roland Willa,
Serena Eley
Abstract:
One of the most promising routes for achieving unprecedentedly high critical currents in superconductors is to incorporate dispersed, non-superconducting nanoparticles to control the dissipative motion of vortices. However, these inclusions reduce the overall superconducting volume and can strain the interlaying superconducting matrix, which can detrimentally reduce $T_c$. Consequently, an optimal balance must be achieved between the nanoparticle density $n_p$ and size $d$. Determining this balance requires garnering a better understanding of vortex-nanoparticle interactions, described by strong pinning theory. Here, we map the dependence of the critical current on nanoparticle size and density in (Y$_{0.77}$,Gd$_{0.23}$)Ba$_2$Cu$_3$O$_{7-δ}$ films in magnetic fields up to 35 T, and compare the trends to recent results from time-dependent Ginzburg-Landau simulations. We identify consistencies between the field-dependent critical current $J_c(B)$ and expectations from strong pinning theory. Specifically, we find that $J_c \propto B^{-α}$, where $α$ decreases from $0.66$ to $0.2$ with increasing density of nanoparticles and increases roughly linearly with nanoparticle size $d/ξ$ (normalized to the coherence length). At high fields, the critical current decays faster ($\sim B^{-1}$), suggesting that each nanoparticle has captured a vortex. When nanoparticles capture more than one vortex, a small, high-field peak is expected in $J_c(B)$. Due to a spread in defect sizes, this novel peak effect remains unresolved here. Lastly, we reveal that the dependence of the vortex creep rate $S$ on nanoparticle size and density roughly mirrors that of $α$, and compare our results to low-$T$ nonlinearities in $S(T)$ that are predicted by strong pinning theory.
Submitted 21 May, 2021;
originally announced May 2021.
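The power law $J_c \propto B^{-α}$ quoted in this abstract is commonly estimated as the negative slope of a straight-line fit in log-log space. The sketch below illustrates that generic procedure only; it is not the authors' analysis, the function name is hypothetical, and the data are synthetic, noise-free values generated inside the example rather than measurements from the paper.

```python
import math


def fit_power_law_exponent(fields, currents):
    """Least-squares slope of log(J_c) versus log(B); returns alpha in J_c ~ B**(-alpha)."""
    xs = [math.log(b) for b in fields]
    ys = [math.log(j) for j in currents]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


# Synthetic illustration data (assumed prefactor 3.0 and exponent 0.5, arbitrary units).
fields_tesla = [1.0, 2.0, 4.0, 8.0, 16.0]
true_alpha = 0.5
currents = [3.0 * b ** (-true_alpha) for b in fields_tesla]

alpha_hat = fit_power_law_exponent(fields_tesla, currents)
```

With real data the fit would normally be restricted to the field range where the power law holds, since the abstract notes a faster decay at high fields.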
-
Tree Topologies along a Tropical Line Segment
Authors:
Ruriko Yoshida,
Shelby Cox
Abstract:
Tropical geometry with the max-plus algebra has been applied to statistical learning models over tree spaces because geometry with the tropical metric over tree spaces has some nice properties such as convexity in terms of the tropical metric. One of the challenges in applications of tropical geometry to tree spaces is the difficulty of interpreting outcomes of statistical models with the tropical metric. This paper focuses on combinatorics of tree topologies along a tropical line segment, an intrinsic geodesic with the tropical metric, between two phylogenetic trees over the tree space, and we show some properties of a tropical line segment between two trees. Specifically, we show that the probability of a tropical line segment of two randomly chosen trees going through the origin (the star tree) is zero if the number of leaves is greater than four, and we also show that if two given trees differ by only one nearest neighbor interchange (NNI) move, then the tree topology of a tree in the tropical line segment between them is the same as the tree topology of one of these given two trees with possible zero branch lengths.
Submitted 30 October, 2021; v1 submitted 18 April, 2021;
originally announced April 2021.
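As a small illustration of the tropical metric and the tropical line segment mentioned in this abstract, the sketch below uses the standard max-plus definitions on the tropical projective torus: the distance is the largest minus the smallest coordinate of the difference vector, and points on the segment between u and v have the form max(u + t, v) coordinate-wise. The function names, the normalization to a zero first coordinate, and the example vectors are assumptions made for illustration and are not taken from the paper.

```python
def tropical_distance(u, v):
    """Tropical metric: max minus min of the coordinate-wise differences."""
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)


def tropical_segment_point(u, v, t):
    """Point max(u + t, v) on the max-plus tropical line segment between u and v.

    The result is shifted so that its first coordinate is 0, which picks one
    representative of the class modulo adding a constant to all coordinates.
    Sweeping t from min(v - u) to max(v - u) moves the point from v to u.
    """
    point = [max(a + t, b) for a, b in zip(u, v)]
    return [x - point[0] for x in point]


# Hypothetical example vectors in three coordinates.
u = [0.0, 0.0, 0.0]
v = [0.0, 3.0, 1.0]

total = tropical_distance(u, v)
midpoint = tropical_segment_point(u, v, 1.0)

# Along a geodesic, the two partial distances add up to the total distance.
split = tropical_distance(u, midpoint) + tropical_distance(midpoint, v)
```

For phylogenetic trees the vectors would be the pairwise leaf-to-leaf distances of each tree, which is the setting the abstract refers to.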
-
Snowy Night-to-Day Translator and Semantic Segmentation Label Similarity for Snow Hazard Indicator
Authors:
Takato Yasuno,
Hiroaki Sugawara,
Junichiro Fujii,
Ryuto Yoshida
Abstract:
In 2021, Japan recorded more than three times as much snowfall as usual, exposing road users to dangerous situations. The poor visibility caused by snow triggers traffic accidents. For example, on January 19, 2021, dry snow and strong winds of 27 m/s caused blizzards that eliminated visibility. Because of this whiteout, multiple accidents with 17 casualties occurred, and 134 vehicles were stuck for 10 hours over a 1 km stretch. At night, the temperature drops and the road surface tends to freeze. CCTV images of the road surface have the advantage of allowing the status of major points to be monitored simultaneously. Road managers must decide on road closures and snow removal work based on road surface conditions, even at night. In parallel, they need to alert road users to hazardous road surfaces. This paper proposes a method to automate a snow hazard indicator in which the road surface region is generated from a night snow image using the conditional GAN pix2pix. In addition, the road surface and the snow-covered ROI are predicted using the semantic segmentation model DeepLabv3+ with a MobileNet backbone, and the snow hazard indicator automatically computes how much of the night road surface is covered with snow. We demonstrate several results for a cold, snowy region of Japan from January 19 to 21, 2021, and discuss how the high similarity between the fake day output translated from snowy night images and real snowy day images is useful for night snow visibility.
Submitted 28 February, 2021;
originally announced March 2021.
-
Tropical Support Vector Machines: Evaluations and Extension to Function Spaces
Authors:
Ruriko Yoshida,
Misaki Takamori,
Hideyuki Matsumoto,
Keiji Miura
Abstract:
Support Vector Machines (SVMs) are one of the most popular supervised learning models to classify using a hyperplane in a Euclidean space. Similar to SVMs, tropical SVMs classify data points using a tropical hyperplane under the tropical metric with the max-plus algebra. In this paper, first we show generalization error bounds of tropical SVMs over the tropical projective torus. While the generalization error bounds attained via Vapnik-Chervonenkis (VC) dimensions in a distribution-free manner still depend on the dimension, we also show numerically and theoretically by extreme value statistics that the tropical SVMs for classifying data points from two Gaussian distributions as well as empirical data sets of different neuron types are fairly robust against the curse of dimensionality. Extreme value statistics also underlie the anomalous scaling behaviors of the tropical distance between random vectors with additional noise dimensions. Finally, we define tropical SVMs over a function space with the tropical metric.
Submitted 4 October, 2022; v1 submitted 27 January, 2021;
originally announced January 2021.
-
Potentials and challenges of polymer informatics: exploiting machine learning for polymer design
Authors:
Stephen Wu,
Hironao Yamada,
Yoshihiro Hayashi,
Massimiliano Zamengo,
Ryo Yoshida
Abstract:
There has been rapidly growing demand for polymeric materials coming from different aspects of modern life because of the highly diverse physical and chemical properties of polymers. Polymer informatics is an interdisciplinary research field of polymer science, computer science, information science and machine learning that serves as a platform to exploit existing polymer data for efficient design of functional polymers. Despite many potential benefits of employing a data-driven approach to polymer design, there have been notable challenges in the development of polymer informatics attributed to the complex hierarchical structures of polymers, such as the lack of open databases and unified structural representation. In this study, we review and discuss the applications of machine learning to different aspects of the polymer design process through four perspectives: polymer databases, representation (descriptor) of polymers, predictive models for polymer properties, and polymer design strategy. We hope that this paper can serve as an entry point for researchers interested in the field of polymer informatics.
Submitted 15 October, 2020;
originally announced October 2020.
-
Tropical Geometric Variation of Phylogenetic Tree Shapes
Authors:
Bo Lin,
Anthea Monod,
Ruriko Yoshida
Abstract:
We study the behavior of phylogenetic tree shapes in the tropical geometric interpretation of tree space. Tree shapes are formally referred to as tree topologies; a tree topology can also be thought of as a tree combinatorial type, which is given by the tree's branching configuration and leaf labeling. We use the tropical line segment as a framework to define notions of variance as well as invariance of tree topologies: we provide a combinatorial search theorem that describes all tree topologies occurring along a tropical line segment, as well as a setting under which tree topologies do not change along a tropical line segment. Our study is motivated by comparison to the moduli space endowed with a geodesic metric proposed by Billera, Holmes, and Vogtmann (referred to as BHV space); we consider the tropical geometric setting as an alternative framework to BHV space for sets of phylogenetic trees. We give an algorithm to compute tropical line segments which is lower in computational complexity than the fastest method currently available for BHV geodesics and show that its trajectory behaves more subtly: while the BHV geodesic traverses the origin for vastly different tree topologies, the tropical line segment bypasses it.
Submitted 19 February, 2022; v1 submitted 10 October, 2020;
originally announced October 2020.
-
A General Class of Transfer Learning Regression without Implementation Cost
Authors:
Shunya Minami,
Song Liu,
Stephen Wu,
Kenji Fukumizu,
Ryo Yoshida
Abstract:
We propose a novel framework that unifies and extends existing methods of transfer learning (TL) for regression. To bridge a pretrained source model to the model on a target task, we introduce a density-ratio reweighting function, which is estimated through the Bayesian framework with a specific prior distribution. By changing two intrinsic hyperparameters and the choice of the density-ratio model, the proposed method can integrate three popular methods of TL: TL based on cross-domain similarity regularization, a probabilistic TL using the density-ratio estimation, and fine-tuning of pretrained neural networks. Moreover, the proposed method can benefit from its simple implementation without any additional cost; the regression model can be fully trained using off-the-shelf libraries for supervised learning in which the original output variable is simply transformed to a new output variable. We demonstrate its simplicity, generality, and applicability using various real data applications.
Submitted 16 December, 2020; v1 submitted 23 June, 2020;
originally announced June 2020.
-
Tropical Data Science
Authors:
Ruriko Yoshida
Abstract:
Phylogenomics is a new field that applies tools from phylogenetics to genome data. Owing to new technology and an increasing amount of data, we face new challenges in analyzing such data over a space of phylogenetic trees. Because a space of phylogenetic trees with a fixed set of labels on leaves is not Euclidean, we cannot simply apply standard tools from data science. In this paper we survey some new developments of machine learning models using tropical geometry to analyze a set of phylogenetic trees over a tree space.
Submitted 13 May, 2020;
originally announced May 2020.
-
A Bayesian algorithm for retrosynthesis
Authors:
Zhongliang Guo,
Stephen Wu,
Mitsuru Ohno,
Ryo Yoshida
Abstract:
The identification of synthetic routes that end with a desired product has been an inherently time-consuming process that is largely dependent on expert knowledge regarding a limited fraction of the entire reaction space. At present, emerging machine-learning technologies are overturning the process of retrosynthetic planning. The objective of this study is to discover synthetic routes backwardly from a given desired molecule to commercially available compounds. The problem is reduced to a combinatorial optimization task with the solution space subject to the combinatorial complexity of all possible pairs of purchasable reactants. We address this issue within the framework of Bayesian inference and computation. The workflow consists of two steps: a deep neural network is trained that forwardly predicts a product of the given reactants with a high level of accuracy, following which this forward model is inverted into the backward one via Bayes' law of conditional probability. Using the backward model, a diverse set of highly probable reaction sequences ending with a given synthetic target is exhaustively explored using a Monte Carlo search algorithm. The Bayesian retrosynthesis algorithm could successfully rediscover 80.3% and 50.0% of known synthetic routes of single-step and two-step reactions within top-10 accuracy, respectively, thereby outperforming state-of-the-art algorithms in terms of the overall accuracy. Remarkably, the Monte Carlo method, which was specifically designed for the presence of diverse multiple routes, often revealed a ranked list of hundreds of reaction routes to the same synthetic target. We investigated the potential applicability of such diverse candidates based on expert knowledge from synthetic organic chemistry.
Submitted 6 March, 2020;
originally announced March 2020.
-
Tropical Support Vector Machine and its Applications to Phylogenomics
Authors:
Xiaoxian Tang,
Houjie Wang,
Ruriko Yoshida
Abstract:
Most data in genome-wide phylogenetic analysis (phylogenomics) is essentially multidimensional, posing a major challenge to human comprehension and computational analysis. Also, we cannot directly apply statistical learning models in data science to a set of phylogenetic trees since the space of phylogenetic trees is not Euclidean. In fact, the space of phylogenetic trees is a tropical Grassmannian in terms of max-plus algebra. Therefore, to classify multi-locus data sets for phylogenetic analysis, we propose tropical support vector machines (SVMs). Like classical SVMs, a tropical SVM is a discriminative classifier defined by the tropical hyperplane which maximizes the minimum tropical distance from data points to itself in order to separate these data points into sectors (half-spaces) in the tropical projective torus. Both hard margin tropical SVMs and soft margin tropical SVMs can be formulated as linear programming problems. We focus on classifying two categories of data, and we study a simpler case by assuming the data points from the same category ideally stay in the same sector of a tropical separating hyperplane. For hard margin tropical SVMs, we prove the necessary and sufficient conditions for two categories of data points to be separated, and we show an explicit formula for the optimal value of the feasible linear programming problem. For soft margin tropical SVMs, we develop novel methods to compute an optimal tropical separating hyperplane. Computational experiments show our methods work well. We end this paper with open problems.
Submitted 24 March, 2020; v1 submitted 2 March, 2020;
originally announced March 2020.
-
Recreation of the Periodic Table with an Unsupervised Machine Learning Algorithm
Authors:
Minoru Kusaba,
Chang Liu,
Yukinori Koyama,
Kiyoyuki Terakura,
Ryo Yoshida
Abstract:
In 1869, the first draft of the periodic table was published by Russian chemist Dmitri Mendeleev. In terms of data science, his achievement can be viewed as a successful example of feature embedding based on human cognition: chemical properties of all known elements at that time were compressed onto the two-dimensional grid system for tabular display. In this study, we seek to answer the question of whether machine learning can reproduce or recreate the periodic table by using observed physicochemical properties of the elements. To achieve this goal, we developed a periodic table generator (PTG). The PTG is an unsupervised machine learning algorithm based on the generative topographic mapping (GTM), which can automate the translation of high-dimensional data into a tabular form with varying layouts on-demand. The PTG autonomously produced various arrangements of chemical symbols, which organized a two-dimensional array such as Mendeleev's periodic table or three-dimensional spiral table according to the underlying periodicity in the given data. We further showed what the PTG learned from the element data and how the element features, such as melting point and electronegativity, are compressed to the lower-dimensional latent spaces.
Submitted 28 February, 2021; v1 submitted 23 December, 2019;
originally announced December 2019.
-
Tropical principal component analysis on the space of ultrametrics
Authors:
Robert Page,
Leon Zhang,
Ruriko Yoshida
Abstract:
In 2019, Yoshida et al. introduced a notion of tropical principal component analysis (PCA). The output is a tropical polytope with a fixed number of vertices that best fits the data. We here apply tropical PCA to dimension reduction and visualization of data sampled from the space of phylogenetic trees. Our main results are twofold: the existence of a tropical cell decomposition into regions of fixed tree topology and the development of a stochastic optimization method to estimate the tropical PCA using a Markov Chain Monte Carlo (MCMC) approach. This method performs well with simulation studies, and it is applied to three empirical datasets: Apicomplexa and African coelacanth genomes as well as sequences of hemagglutinin for influenza from New York.
Submitted 24 November, 2019;
originally announced November 2019.
-
Exploring diamond-like lattice thermal conductivity crystals via feature-based transfer learning
Authors:
Shenghong Ju,
Ryo Yoshida,
Chang Liu,
Kenta Hongo,
Terumasa Tadano,
Junichiro Shiomi
Abstract:
Ultrahigh lattice thermal conductivity materials hold great importance since they play a critical role in the thermal management of electronic and optical devices. Models using machine learning can search for materials with outstanding higher-order properties like thermal conductivity. However, the lack of sufficient data to train a model is a serious hurdle. Herein we show that big data can complement small data for accurate predictions when lower-order feature properties available in big data are selected properly and applied to transfer learning. The connection between the crystal information and thermal conductivity is directly built with a neural network by transferring descriptors acquired through a pre-trained model for the feature property. Successful transfer learning shows the ability of extrapolative prediction and reveals descriptors for lattice anharmonicity. Transfer learning is employed to screen over 60000 compounds to identify novel crystals that can serve as alternatives to diamond.
Submitted 24 September, 2019;
originally announced September 2019.
-
Pion and Kaon Structure at the Electron-Ion Collider
Authors:
Arlene C. Aguilar,
Zafir Ahmed,
Christine Aidala,
Salina Ali,
Vincent Andrieux,
John Arrington,
Adnan Bashir,
Vladimir Berdnikov,
Daniele Binosi,
Lei Chang,
Chen Chen,
Muyang Chen,
João Pacheco B. C. de Melo,
Markus Diefenthaler,
Minghui Ding,
Rolf Ent,
Tobias Frederico,
Fei Gao,
Ralf W. Gothe,
Mohammad Hattawy,
Timothy J. Hobbs,
Tanja Horn,
Garth M. Huber,
Shaoyang Jia,
Cynthia Keppel,
et al. (26 additional authors not shown)
Abstract:
Understanding the origin and dynamics of hadron structure and in turn that of atomic nuclei is a central goal of nuclear physics. This challenge entails the questions of how does the roughly 1 GeV mass-scale that characterizes atomic nuclei appear; why does it have the observed value; and, enigmatically, why are the composite Nambu-Goldstone (NG) bosons in quantum chromodynamics (QCD) abnormally light in comparison? In this perspective, we provide an analysis of the mass budget of the pion and proton in QCD; discuss the special role of the kaon, which lies near the boundary between dominance of strong and Higgs mass-generation mechanisms; and explain the need for a coherent effort in QCD phenomenology and continuum calculations, in exa-scale computing as provided by lattice QCD, and in experiments to make progress in understanding the origins of hadron masses and the distribution of that mass within them. We compare the unique capabilities foreseen at the electron-ion collider (EIC) with those at the hadron-electron ring accelerator (HERA), the only previous electron-proton collider; and describe five key experimental measurements, enabled by the EIC and aimed at delivering fundamental insights that will generate concrete answers to the questions of how mass and structure arise in the pion and kaon, the Standard Model's NG modes, whose surprisingly low mass is critical to the evolution of our Universe.
Submitted 16 September, 2019; v1 submitted 18 July, 2019;
originally announced July 2019.
-
Tropical Geometry of Phylogenetic Tree Space: A Statistical Perspective
Authors:
Anthea Monod,
Bo Lin,
Ruriko Yoshida,
Qiwen Kang
Abstract:
Phylogenetic trees are the fundamental mathematical representation of evolutionary processes in biology. They are also objects of interest in pure mathematics, such as algebraic geometry and combinatorics, due to their discrete geometry. Although they are important data structures, they face the significant challenge that sets of trees form a non-Euclidean phylogenetic tree space, which means that standard computational and statistical methods cannot be directly applied. In this work, we explore the statistical feasibility of a pure mathematical representation of the set of all phylogenetic trees based on tropical geometry for both descriptive and inferential statistics, and unsupervised and supervised machine learning. Our exploration is both theoretical and practical. We show that the tropical geometric phylogenetic tree space endowed with a generalized Hilbert projective metric exhibits analytic, geometric, and topological properties that are desirable for theoretical studies in probability and statistics and allow for well-defined questions to be posed. We illustrate the statistical feasibility of the tropical geometric perspective for phylogenetic trees with an example of both a descriptive and inferential statistical task. Moreover, this approach exhibits increased computational efficiency and statistical performance over the current state-of-the-art, which we illustrate with a real data example on seasonal influenza. Our results demonstrate the viability of the tropical geometric setting for parametric statistical and probabilistic studies of sets of phylogenetic trees.
Submitted 29 June, 2022; v1 submitted 31 May, 2018;
originally announced May 2018.
-
Tropical Principal Component Analysis and its Application to Phylogenetics
Authors:
Ruriko Yoshida,
Leon Zhang,
Xu Zhang
Abstract:
Principal component analysis is a widely-used method for the dimensionality reduction of a given data set in a high-dimensional Euclidean space. Here we define and analyze two analogues of principal component analysis in the setting of tropical geometry. In one approach, we study the Stiefel tropical linear space of fixed dimension closest to the data points in the tropical projective torus; in the other approach, we consider the tropical polytope with a fixed number of vertices closest to the data points. We then give approximative algorithms for both approaches and apply them to phylogenetics, testing the methods on simulated phylogenetic data and on an empirical dataset of Apicomplexa genomes.
Submitted 14 October, 2017; v1 submitted 7 October, 2017;
originally announced October 2017.
-
Principal component analysis and the locus of the Frechet mean in the space of phylogenetic trees
Authors:
Tom M. W. Nye,
Xiaoxian Tang,
Grady Weyenberg,
Ruriko Yoshida
Abstract:
Most biological data are multidimensional, posing a major challenge to human comprehension and computational analysis. Principal component analysis is the most popular approach to rendering two- or three-dimensional representations of the major trends in such multidimensional data. The problem of multidimensionality is acute in the rapidly growing area of phylogenomics. Evolutionary relationships are represented by phylogenetic trees, and very typically a phylogenomic analysis results in a collection of such trees, one for each gene in the analysis. Principal component analysis offers a means of quantifying variation and summarizing a collection of phylogenies by dimensional reduction. However, the space of all possible phylogenies on a fixed set of species does not form a Euclidean vector space, so principal component analysis must be reformulated in the geometry of tree-space, which is a CAT(0) geodesic metric space. Previous work has focused on construction of the first principal component, or principal geodesic. Here we propose a geometric object which represents a $k$-th order principal component: the locus of the weighted Fréchet mean of $k+1$ points in tree-space, where the weights vary over the standard $k$-dimensional simplex. We establish basic properties of these objects, in particular that locally they generically have dimension $k$, and we propose an efficient algorithm for projection onto these surfaces. Combined with a stochastic optimization algorithm, this projection algorithm gives a procedure for constructing a principal component of arbitrary order in tree-space. Simulation studies confirm these algorithms perform well, and they are applied to data sets of Apicomplexa gene trees and the African coelacanth genome. The results enable visualizations of slices of tree-space, revealing structure within these complex data sets.
Submitted 10 September, 2016;
originally announced September 2016.
-
Probing nuclear gluons with heavy quarks at EIC
Authors:
E. Chudakov,
D. Higinbotham,
Ch. Hyde,
S. Furletov,
Yu. Furletova,
D. Nguyen,
M. Stratmann,
M. Strikman,
C. Weiss,
R. Yoshida
Abstract:
We explore the feasibility of direct measurements of nuclear gluon densities using heavy-quark production (open charm, beauty) at a future Electron-Ion Collider (EIC). We focus on the regions x > 0.3 (EMC effect) and x ~ 0.05-0.1 (antishadowing), where the nuclear modifications of the gluon density offer insight into non-nucleonic degrees of freedom and the QCD structure of nucleon-nucleon interactions. We describe the charm production rates and momentum distributions in nuclear deep-inelastic scattering (DIS) at large x_B, and comment on the possible methods for charm reconstruction using next-generation detectors at the EIC (pi/K identification, tracking, vertex detection).
Submitted 30 August, 2016;
originally announced August 2016.
-
Semigroups --- A Computational Approach
Authors:
Florian Kohl,
Yanxi Li,
Johannes Rauh,
Ruriko Yoshida
Abstract:
The question of whether there exists an integral solution to the system of linear equations with non-negative constraints, $A\mathbf{x} = \mathbf{b}, \, \mathbf{x} \ge 0$, where $A \in \mathbb{Z}^{m\times n}$ and $\mathbf{b} \in \mathbb{Z}^m$, has applications in many areas, such as operations research, number theory and statistics. In order to solve this problem, we have to understand the semigroup generated by the columns of the matrix $A$ and the structure of the "holes", which are the difference between the semigroup generated by the columns of the matrix $A$ and its saturation. In this paper, we discuss the implementation of an algorithm by Hemmecke, Takemura, and Yoshida that computes the set of holes of a semigroup, and we discuss applications to problems in combinatorics. Moreover, we compute the set of holes for the common diagonal effect model, and we show that the $n$th linear ordering polytope has the integer-decomposition property for $n\leq 7$. The software is available at
\url{http://ehrhart.math.fu-berlin.de/People/fkohl/HASE/}.
Submitted 6 April, 2017; v1 submitted 10 August, 2016;
originally announced August 2016.
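To make the notion of "holes" in the abstract above concrete, the following is a minimal sketch for the simplest possible case, a one-row matrix $A$ with positive integer entries whose greatest common divisor is 1. Under that assumption the saturation is the set of all non-negative integers, so the holes are the non-negative integers that cannot be written as a non-negative integer combination of the generators. The function name `holes_1d` and the `bound` parameter are hypothetical and chosen for illustration; this is not the Hemmecke, Takemura, and Yoshida algorithm, which handles general matrices.

```python
from functools import reduce
from math import gcd


def holes_1d(generators, bound):
    """List the holes up to `bound` of the semigroup generated by `generators`.

    Toy one-dimensional case only: assumes positive integer generators with
    gcd 1, so that the saturation is all of {0, 1, 2, ...}.
    """
    if any(g <= 0 for g in generators):
        raise ValueError("generators must be positive integers")
    if reduce(gcd, generators) != 1:
        raise ValueError("this sketch assumes the generators have gcd 1")

    # reachable[n] is True when n is a non-negative integer combination
    # of the generators (simple dynamic programming over 0..bound).
    reachable = [False] * (bound + 1)
    reachable[0] = True
    for n in range(1, bound + 1):
        reachable[n] = any(n >= g and reachable[n - g] for g in generators)

    return [n for n in range(bound + 1) if not reachable[n]]


if __name__ == "__main__":
    # A = [3 5]: which right-hand sides b <= 20 admit no solution x >= 0?
    print(holes_1d([3, 5], 20))
```

For $A = (3 \;\, 5)$ the system $A\mathbf{x} = b$, $\mathbf{x} \ge 0$ has no integral solution exactly when $b$ is one of the listed holes, even though every non-negative integer $b$ lies in the saturation.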
-
Tropical Fermat-Weber points
Authors:
Bo Lin,
Ruriko Yoshida
Abstract:
In a metric space, the Fermat-Weber points of a sample are statistics to measure the central tendency of the sample and it is well-known that the Fermat-Weber point of a sample is not necessarily unique in the metric space. We investigate the computation of Fermat-Weber points under the tropical metric on the quotient space $\mathbb{R}^{n} \!/ \mathbb{R} {\bf 1}$ with a fixed $n \in \mathbb{N}$, motivated by its application to the space of equidistant phylogenetic trees with $N$ leaves (in this case $n=\binom{N}{2}$) realized as the tropical linear space of all ultrametrics. We show that the set of all tropical Fermat-Weber points of a finite sample is always a classical convex polytope, and we present a combinatorial formula for a key value associated to this set. We identify conditions under which this set is a singleton. We apply numerical experiments to analyze the set of the tropical Fermat-Weber points within a space of phylogenetic trees. We discuss the issues in the computation of the tropical Fermat-Weber points.
Submitted 15 February, 2018; v1 submitted 15 April, 2016;
originally announced April 2016.
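As a small illustration of the objects named in the abstract above, the sketch below evaluates the tropical metric on $\mathbb{R}^{n} / \mathbb{R}\mathbf{1}$, taken here to be $d_{\mathrm{tr}}(x, y) = \max_i (x_i - y_i) - \min_i (x_i - y_i)$, and the Fermat-Weber objective, the sum of distances from a candidate point to the sample. The function names are hypothetical, and the code only evaluates the objective; it does not compute the polytope of all Fermat-Weber points described in the paper.

```python
def tropical_distance(x, y):
    """Tropical metric on R^n / R1: max of (x - y) minus min of (x - y)."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of coordinates")
    diffs = [a - b for a, b in zip(x, y)]
    return max(diffs) - min(diffs)


def fermat_weber_objective(candidate, sample):
    """Sum of tropical distances from `candidate` to every point in `sample`."""
    return sum(tropical_distance(candidate, p) for p in sample)


if __name__ == "__main__":
    sample = [[0, 0, 0], [0, 2, 4]]
    # Adding a constant to every coordinate gives the same point in the quotient.
    print(tropical_distance([0, 0, 0], [1, 1, 1]))
    # Two different candidates, compared by their objective values.
    print(fermat_weber_objective([0, 0, 0], sample))
    print(fermat_weber_objective([0, 1, 2], sample))
```

With this two-point sample, both candidates attain the same objective value, which equals the distance between the two sample points and therefore cannot be improved by the triangle inequality. This is consistent with the abstract's remark that a Fermat-Weber point need not be unique.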
-
Convexity in Tree Spaces
Authors:
Bo Lin,
Bernd Sturmfels,
Xiaoxian Tang,
Ruriko Yoshida
Abstract:
We study the geometry of metrics and convexity structures on the space of phylogenetic trees, which is here realized as the tropical linear space of all ultrametrics. The ${\rm CAT}(0)$-metric of Billera-Holmes-Vogtmann arises from the theory of orthant spaces. While its geodesics can be computed by the Owen-Provan algorithm, geodesic triangles are complicated. We show that the dimension of such a triangle can be arbitrarily high. Tropical convexity and the tropical metric behave better. They exhibit properties desirable for geometric statistics, such as geodesics of small depth.
Submitted 14 June, 2016; v1 submitted 29 October, 2015;
originally announced October 2015.
-
Quadratic Fermi Node in a 3D Strongly Correlated Semimetal
Authors:
Takeshi Kondo,
M. Nakayama,
R. Chen,
J. J. Ishikawa,
E. -G. Moon,
T. Yamamoto,
Y. Ota,
W. Malaeb,
H. Kanai,
Y. Nakashima,
Y. Ishida,
R. Yoshida,
H. Yamamoto,
M. Matsunami,
S. Kimura,
N. Inami,
K. Ono,
H. Kumigashira,
S. Nakatsuji,
L. Balents,
S. Shin
Abstract:
Strong spin-orbit coupling fosters exotic electronic states such as topological insulators and superconductors, but the combination of strong spin-orbit and strong electron-electron interactions is just beginning to be understood. Central to this emerging area are the 5d transition metal iridium oxides. Here, in the pyrochlore iridate Pr2Ir2O7, we identify a nontrivial state with a single point Fermi node protected by cubic and time-reversal symmetries, using a combination of angle-resolved photoemission spectroscopy and first principles calculations. Owing to its quadratic dispersion, the unique coincidence of four degenerate states at the Fermi energy, and strong Coulomb interactions, non-Fermi liquid behavior is predicted, for which we observe some evidence. Our discovery implies that Pr2Ir2O7 is a parent state that can be manipulated to produce other strongly correlated topological phases, such as topological Mott insulator, Weyl semi-metal, and quantum spin and anomalous Hall states.
Submitted 16 December, 2015; v1 submitted 27 October, 2015;
originally announced October 2015.