-
MELODY: Robust Semi-Supervised Hybrid Model for Entity-Level Online Anomaly Detection with Multivariate Time Series
Authors:
Jingchao Ni,
Gauthier Guinet,
Peihong Jiang,
Laurent Callot,
Andrey Kan
Abstract:
In large IT systems, software deployment is a crucial process for online services, whose code is regularly updated. However, a faulty code change may degrade the target service's performance and cause cascading outages in downstream services. Thus, software deployments should be comprehensively monitored, and their anomalies should be detected in a timely manner. In this paper, we study the problem of anomaly detection for deployments. We begin by identifying the challenges unique to this anomaly detection problem, which operates at the entity level (e.g., deployments), relative to the more typical problem of anomaly detection in multivariate time series (MTS). The unique challenges include the heterogeneity of deployments, the low latency tolerance, the ambiguous anomaly definition, and the limited supervision. To address them, we propose a novel framework, semi-supervised hybrid Model for Entity-Level Online Detection of anomalY (MELODY). MELODY first transforms the MTS of different entities to the same feature space by an online feature extractor, then uses a newly proposed semi-supervised deep one-class model for detecting anomalous entities. We evaluated MELODY on real data of cloud services with 1.2M+ time series. The relative F1 score improvement of MELODY over the state-of-the-art methods ranges from 7.6% to 56.5%. The user evaluation suggests MELODY is suitable for monitoring deployments in large online systems.
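The abstract does not spell out MELODY's one-class objective. The sketch below is only a rough point of reference. It shows the widely used Deep SAD-style semi-supervised one-class loss, applied to precomputed entity embeddings. Unlabeled and labeled-normal embeddings are pulled towards a centre `c`, and labeled anomalies are pushed away through an inverse-distance term.

The following are all assumptions made for illustration:
- the function names;
- the label convention (0 unlabeled, +1 normal, -1 anomaly);
- the `eta` and `eps` parameters.

This is not MELODY's model or its online feature extractor.

```python
import numpy as np

def one_class_score(z, c):
    """Anomaly score: squared distance of each embedding to the centre c."""
    z = np.asarray(z, dtype=float)
    return np.sum((z - np.asarray(c, dtype=float)) ** 2, axis=1)

def semi_supervised_one_class_loss(z, c, y, eta=1.0, eps=1e-6):
    """Deep SAD-style loss (illustrative only, not MELODY's objective).

    z: (n, d) entity embeddings in a shared feature space
    c: (d,) hypersphere centre
    y: (n,) labels, 0 = unlabeled, +1 = labeled normal, -1 = labeled anomaly
    """
    y = np.asarray(y)
    dist2 = one_class_score(z, c)
    unlabeled = y == 0
    # Unlabeled entities are assumed to be mostly normal and are pulled towards c.
    loss_unlabeled = dist2[unlabeled].sum()
    # Labeled normals: (d^2 + eps)^(+1) pulls them in;
    # labeled anomalies: (d^2 + eps)^(-1) is minimised by pushing them away.
    labeled = ~unlabeled
    loss_labeled = eta * np.sum((dist2[labeled] + eps) ** y[labeled].astype(float))
    return (loss_unlabeled + loss_labeled) / len(y)
```

At detection time, entities would be ranked by `one_class_score`, with larger distances treated as more anomalous.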
Submitted 6 June, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
-
Option pricing under stochastic volatility on a quantum computer
Authors:
Guoming Wang,
Angus Kan
Abstract:
We develop quantum algorithms for pricing Asian and barrier options under the Heston model, a popular stochastic volatility model, and estimate their costs, in terms of T-count, T-depth and number of logical qubits, on instances under typical market conditions. These algorithms are based on combining well-established numerical methods for stochastic differential equations with the quantum amplitude estimation technique. In particular, we empirically show that, despite its simplicity, the weak Euler method achieves the same level of accuracy as the better-known strong Euler method in this task. Furthermore, by eliminating the expensive procedure of preparing Gaussian states, the quantum algorithm based on the weak Euler scheme achieves drastically better efficiency than the one based on the strong Euler scheme. Our resource analysis suggests that option pricing under stochastic volatility is a promising application of quantum computers, and that our algorithms render the hardware requirement for reaching practical quantum advantage in financial applications less stringent than prior art.
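To make the weak/strong Euler distinction concrete, here is a purely classical Monte Carlo sketch (not the quantum algorithm) for an arithmetic-average Asian call under Heston dynamics.
- The strong scheme draws Gaussian increments.
- The weak scheme replaces each increment with a random sign times $\sqrt{dt}$. This matches the low-order moments that weak convergence needs and requires no Gaussian sampling. It is the classical counterpart of avoiding Gaussian state preparation, as mentioned in the abstract.

The following are assumptions chosen for illustration:
- the full-truncation treatment of negative variance;
- the monitoring grid (every time step);
- the function name;
- all parameter values.

```python
import numpy as np

def asian_call_heston_mc(s0, v0, r, kappa, theta, xi, rho, strike, T,
                         n_steps=64, n_paths=100_000, scheme="weak", seed=0):
    """Discounted arithmetic-average Asian call price under Heston (classical MC)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    log_s = np.full(n_paths, np.log(s0))
    v = np.full(n_paths, float(v0))
    running_sum = np.zeros(n_paths)
    for _ in range(n_steps):
        if scheme == "weak":
            # Weak Euler: two-point increments (+1 or -1 with equal probability).
            z1 = 2.0 * rng.integers(0, 2, size=n_paths) - 1.0
            z2 = 2.0 * rng.integers(0, 2, size=n_paths) - 1.0
        elif scheme == "strong":
            # Strong (Euler-Maruyama) scheme: Gaussian increments.
            z1 = rng.standard_normal(n_paths)
            z2 = rng.standard_normal(n_paths)
        else:
            raise ValueError("scheme must be 'weak' or 'strong'")
        dw1 = np.sqrt(dt) * z1
        dw2 = np.sqrt(dt) * (rho * z1 + np.sqrt(1.0 - rho**2) * z2)  # correlated driver
        v_pos = np.maximum(v, 0.0)  # full truncation keeps sqrt(v) real
        log_s += (r - 0.5 * v_pos) * dt + np.sqrt(v_pos) * dw1
        v += kappa * (theta - v_pos) * dt + xi * np.sqrt(v_pos) * dw2
        running_sum += np.exp(log_s)  # monitor the price at each time step
    payoff = np.maximum(running_sum / n_steps - strike, 0.0)
    return np.exp(-r * T) * payoff.mean()
```

With the same parameters, the two schemes should agree to within Monte Carlo and discretization error. No specific reference price is claimed here.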
Submitted 3 March, 2024; v1 submitted 25 December, 2023;
originally announced December 2023.
-
Directed Evolution of Microorganisms for Engineered Living Materials
Authors:
Julie M. Laurent,
Ankit Jain,
Anton Kan,
Mathias Steinacher,
Nadia Enrriquez Casimiro,
Stavros Stavrakis,
Andrew J. deMello,
André R. Studart
Abstract:
Microorganisms can create engineered materials with exquisite structures and living functionalities. Although synthetic biology tools to genetically manipulate microorganisms continue to expand, the bottom-up rational design of engineered living materials still relies on prior knowledge of genotype-phenotype links for the function of interest. Here, we utilize a high-throughput directed evolution platform to enhance the fitness of whole microorganisms under selection pressure and identify novel genetic pathways to program the functionalities of engineered living materials. Using Komagataeibacter sucrofermentans as a model cellulose-producing microorganism, we show that our droplet-based microfluidic platform enables the directed evolution of these bacteria towards a small number of cellulose overproducers from an initial pool of 40'000 random mutants. Sequencing of the evolved strains reveals an unexpected link between the cellulose-forming ability of the bacteria and a gene encoding a protease complex responsible for protein turnover in the cell. The ability to enhance the fitness of microorganisms towards specific phenotypes and to discover new genotype-phenotype links makes this high-throughput directed evolution platform a promising tool for the development of the next generation of engineered living materials.
Submitted 22 November, 2023;
originally announced November 2023.
-
Fluctuation-Induced Transitions in Anisotropic Two-Dimensional Turbulence
Authors:
Lichuan Xu,
Adrian van Kan,
Chang Liu,
Edgar Knobloch
Abstract:
Two-dimensional (2D) turbulence features an inverse energy cascade that produces large-scale flow structures such as large-scale vortices (LSVs) and unidirectional jets. We investigate the dynamics of such structures using extensive direct numerical simulations (DNS) of randomly forced, viscously damped 2D turbulence within a periodic rectangular (Cartesian) domain $[0,L_x]\times[0,L_y]$. LSVs form and dominate the system when the domain aspect ratio $\delta = L_x/L_y \approx 1$, while unidirectional jets predominate at $\delta \gtrsim 1.1$. At intermediate $\delta$, both structures are metastable, with noise-induced transitions between LSVs and jets. We derive and verify predictions for the dependence of kinetic energy $E$ and flow polarity on the nondimensional control parameters. We further collect detailed statistics on the lifetimes of LSVs and jets from DNS runs that are up to 10738 viscous diffusive times long. The distribution of the lifetimes is consistent with that of a memoryless process. Our DNS show an exponential dependence of the mean lifetime on $\delta$. Mean lifetimes depend sensitively on the Reynolds number $Re$: as $Re$ increases, the energy gap between LSV (lower $E$) and jet states (higher $E$) arising from anisotropic dissipation increases, leading to an approximately exponential increase in lifetimes with $Re$ for both LSVs and jets. Similarly, as the forcing scale decreases, transitions become less frequent. We study the transitions in detail, revealing that they occur in two stages: an initial, rapid redistribution of kinetic energy by nonlinear triadic interactions deforms LSVs into jets or vice versa. In the second stage, the energy of the newly formed structure slowly adjusts to its associated equilibrium value on a longer, viscous timescale, producing hysteresis. Our findings shed new light on the dynamics of coherent large-scale structures in anisotropic turbulence.
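A memoryless lifetime distribution is exponential. Two quick diagnostics follow from this:
- a coefficient of variation near 1;
- $P(\tau > s + t \mid \tau > s) \approx P(\tau > t)$.

The sketch below computes both. It also fits a log-linear law for an exponential dependence of the mean lifetime on a control parameter such as $\delta$.

It runs on synthetic samples and reproduces nothing from the DNS data. The function names and the choice $t$ = mean lifetime are assumptions.

```python
import numpy as np

def lifetime_stats(lifetimes, s):
    """Memorylessness diagnostics for observed state lifetimes (e.g. LSV or jet episodes)."""
    tau = np.asarray(lifetimes, dtype=float)
    mean = tau.mean()                          # maximum-likelihood mean for an exponential
    cv = tau.std(ddof=1) / mean                # equals 1 for an exponential distribution
    p_uncond = np.mean(tau > mean)             # P(tau > t) with t = mean lifetime
    survivors = tau[tau > s]
    p_cond = np.mean(survivors > s + mean)     # P(tau > s + t | tau > s)
    return mean, cv, p_uncond, p_cond

def exponential_rate(control, mean_lifetimes):
    """Fit mean_lifetime = A * exp(rate * control) by least squares in log space."""
    rate, log_a = np.polyfit(np.asarray(control, float), np.log(mean_lifetimes), 1)
    return rate, np.exp(log_a)
```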
Submitted 23 April, 2024; v1 submitted 13 November, 2023;
originally announced November 2023.
-
Flexible entangled state generation in linear optics
Authors:
Brendan Pankovich,
Alex Neville,
Angus Kan,
Srikrishna Omkar,
Kwok Ho Wan,
Kamil Brádler
Abstract:
Fault-tolerant quantum computation can be achieved by creating constant-sized, entangled resource states and performing entangling measurements on subsets of their qubits. Linear optical quantum computers can be designed based on this approach, even though entangling operations at the qubit level are non-deterministic in this platform. Probabilistic generation and measurement of entangled states must be pushed beyond the required threshold by some combination of scheme optimisation, introduction of redundancy and auxiliary state assistance. We report progress in each of these areas. We explore multi-qubit fusion measurements on dual-rail photonic qubits and their role in measurement-based resource state generation, showing that it is possible to boost the success probability of photonic GHZ state analysers with single photon auxiliary states. By incorporating generators of basic entangled "seed" states, we provide a method that simplifies the process of designing and optimising generators of complex, encoded resource states by establishing links to ZX diagrams.
Submitted 10 October, 2023;
originally announced October 2023.
-
From a vortex gas to a vortex crystal in instability-driven two-dimensional turbulence
Authors:
A. van Kan,
B. Favier,
K. Julien,
E. Knobloch
Abstract:
We study structure formation in two-dimensional turbulence driven by an external force, interpolating between linear instability forcing and random stirring, subject to nonlinear damping. Using extensive direct numerical simulations, we uncover a rich parameter space featuring four distinct branches of stationary solutions: large-scale vortices, hybrid states with embedded shielded vortices (SVs) of either sign, and two states composed of many similar SVs. Of the latter, the first is a dense vortex gas where all SVs have the same sign and diffuse across the domain. The second is a hexagonal vortex crystal forming from this gas when the instability is sufficiently weak. These solutions coexist stably over a wide parameter range. The late-time evolution of the system from small-amplitude initial conditions is nearly self-similar, involving three phases: initial inverse cascade, random nucleation of SVs from turbulence and, once a critical number of vortices is reached, a phase of explosive nucleation of SVs, leading to a statistically stationary state. The vortex gas is continued in the forcing parameter, revealing a sharp transition towards the crystal state as the forcing strength decreases. This transition is analysed in terms of the diffusion of individual vortices and tools from statistical physics. The crystal can also decay via an inverse cascade resulting from the breakdown of shielding or insufficient nonlinear damping acting on SVs. Our study highlights the importance of the forcing details in two-dimensional turbulence and reveals the presence of nontrivial SV states in this system, specifically the emergence and melting of a vortex crystal.
Submitted 6 January, 2024; v1 submitted 17 August, 2023;
originally announced August 2023.
-
High photon-loss threshold quantum computing using GHZ-state measurements
Authors:
Brendan Pankovich,
Angus Kan,
Kwok Ho Wan,
Maike Ostmann,
Alex Neville,
Srikrishna Omkar,
Adel Sohbi,
Kamil Brádler
Abstract:
We propose fault-tolerant architectures based on performing projective measurements in the Greenberger-Horne-Zeilinger (GHZ) basis on constant-sized, entangled resource states. We present linear-optical constructions of the architectures, where the GHZ-state measurements are encoded to suppress the errors induced by photon loss and the probabilistic nature of linear optics. Simulations of our constructions demonstrate high single-photon loss thresholds compared to the state-of-the-art linear-optical architecture realized with encoded two-qubit fusion measurements performed on constant-sized resource states. We believe this result shows a resource-efficient path to achieving photonic fault-tolerant quantum computing.
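For readers unfamiliar with GHZ-basis measurements, the sketch below builds the ideal $n$-qubit GHZ basis, $(|x\rangle \pm |\bar{x}\rangle)/\sqrt{2}$. Here $x$ ranges over bit strings starting with 0, and $\bar{x}$ is the bitwise complement of $x$. The sketch then computes the outcome probabilities of a projective measurement in that basis.

It models neither photon loss nor the probabilistic linear-optical implementation discussed in the abstract. The function names are hypothetical.

```python
import itertools
import numpy as np

def ghz_basis(n):
    """Rows are the 2**n GHZ-basis vectors (|x> +/- |x_bar>)/sqrt(2) with x[0] = 0."""
    dim = 2 ** n
    rows = []
    for tail in itertools.product((0, 1), repeat=n - 1):
        x = (0,) + tail
        x_bar = tuple(1 - b for b in x)  # bitwise complement
        i = int("".join(map(str, x)), 2)
        j = int("".join(map(str, x_bar)), 2)
        for sign in (1.0, -1.0):
            vec = np.zeros(dim)
            vec[i] = 1.0 / np.sqrt(2.0)
            vec[j] = sign / np.sqrt(2.0)
            rows.append(vec)
    return np.array(rows)

def ghz_measurement_probs(psi):
    """Outcome probabilities of an ideal projective GHZ-basis measurement on psi."""
    psi = np.asarray(psi, dtype=complex)
    n = int(round(np.log2(psi.size)))
    return np.abs(ghz_basis(n) @ psi) ** 2  # basis vectors are real
```

For example, measuring the product state $|000\rangle$ yields the two outcomes $(|000\rangle \pm |111\rangle)/\sqrt{2}$ with probability 1/2 each.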
Submitted 8 August, 2023;
originally announced August 2023.
-
TidyBot: Personalized Robot Assistance with Large Language Models
Authors:
Jimmy Wu,
Rika Antonova,
Adam Kan,
Marion Lepert,
Andy Zeng,
Shuran Song,
Jeannette Bohg,
Szymon Rusinkiewicz,
Thomas Funkhouser
Abstract:
For a robot to personalize physical assistance effectively, it must learn user preferences that can be generally reapplied to future scenarios. In this work, we investigate personalization of household cleanup with robots that can tidy up rooms by picking up objects and putting them away. A key challenge is determining the proper place to put each object, as people's preferences can vary greatly depending on personal taste or cultural background. For instance, one person may prefer storing shirts in the drawer, while another may prefer them on the shelf. We aim to build systems that can learn such preferences from just a handful of examples via prior interactions with a particular person. We show that robots can combine language-based planning and perception with the few-shot summarization capabilities of large language models (LLMs) to infer generalized user preferences that are broadly applicable to future interactions. This approach enables fast adaptation and achieves 91.2% accuracy on unseen objects in our benchmark dataset. We also demonstrate our approach on a real-world mobile manipulator called TidyBot, which successfully puts away 85.0% of objects in real-world test scenarios.
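The few-shot summarization step can be pictured as prompt construction. Previously observed placements for one person are written out as examples, and the LLM is asked to complete a summary line. That summary can then be reused to place unseen objects.

The Python-style prompt format, the `pick_and_place` notation and the function name below are illustrative assumptions, not the paper's exact prompts. No LLM is called.

```python
def build_summarization_prompt(placements):
    """placements: (object, receptacle) pairs previously observed for one person."""
    objects = [obj for obj, _ in placements]
    receptacles = sorted({rec for _, rec in placements})
    lines = [f"objects = {objects!r}", f"receptacles = {receptacles!r}"]
    lines += [f'pick_and_place("{obj}", "{rec}")' for obj, rec in placements]
    lines.append("# Summary:")  # the LLM is asked to complete this line
    return "\n".join(lines)
```

The completed summary (e.g. a rule about which clothes go in the drawer) would then be prepended to a second prompt listing the unseen objects.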
Submitted 11 October, 2023; v1 submitted 9 May, 2023;
originally announced May 2023.
-
Collisions of localized patterns in a nonvariational Swift-Hohenberg equation
Authors:
Mathi Raja,
Adrian van Kan,
Benjamin Foster,
Edgar Knobloch
Abstract:
The cubic-quintic Swift-Hohenberg equation (SH35) has been proposed as an order parameter description of several convective systems with reflection symmetry in the layer midplane, including binary fluid convection. We use numerical continuation, together with extensive direct numerical simulations, to study SH35 with an additional nonvariational quadratic term to model the effects of breaking the midplane reflection symmetry. The nonvariational structure of the model leads to the propagation of asymmetric spatially localized structures (LSs). An asymptotic prediction for the drift velocity of such structures is validated numerically. Next, we present an extensive study of possible collision scenarios between identical and nonidentical traveling structures, varying a temperature-like control parameter. The final state may be a simple bound state of the initial LSs, or, as a result of nonlinear interactions, a state that is longer or shorter than the sum of the two initial states. The Maxwell point of the variational system is shown to have no bearing on which of these scenarios is realized. Instead, we argue that the stability properties of bound states are key. While individual LSs lie on a modified snakes-and-ladders structure in the nonvariational SH35, the multi-pulse bound states resulting from collisions lie on isolas in parameter space. In the gradient SH35, such isolas are always of figure-eight shape, but in the present non-gradient case they are generically more complex, and some of them terminate in T-point bifurcations. A reduced model consisting of two coupled ordinary differential equations is proposed to describe the linear interactions between the tails of the LSs; its parameters are deduced using gradient descent optimization. For collisions leading to the formation of simple bound states, the reduced model reproduces the trajectories of LSs with high quantitative accuracy.
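The abstract does not state the model equation explicitly. As an assumption, the sketch below uses a common form of SH35 with a nonvariational quadratic term:

$u_t = r u - (1+\partial_x^2)^2 u + b u^3 - u^5 + \epsilon (\partial_x u)^2$

It integrates this equation on a periodic domain with a semi-implicit Fourier pseudo-spectral scheme: linear terms implicit, nonlinear terms explicit, no dealiasing.

Parameter values, step sizes and the function name are hypothetical. This is not the continuation or DNS code used in the paper.

```python
import numpy as np

def simulate_sh35(u0, length, r, b, eps, dt=0.05, n_steps=2000):
    """Integrate the assumed model u_t = r u - (1 + d_xx)^2 u + b u^3 - u^5 + eps (u_x)^2."""
    u = np.asarray(u0, dtype=float).copy()
    n = u.size
    k = 2.0 * np.pi * np.fft.rfftfreq(n, d=length / n)  # angular wavenumbers
    lin = r - (1.0 - k**2) ** 2     # Fourier symbol of r - (1 + d_xx)^2
    denom = 1.0 - dt * lin          # implicit treatment of the linear part
    u_hat = np.fft.rfft(u)
    for _ in range(n_steps):
        u_x = np.fft.irfft(1j * k * u_hat, n=n)           # spectral derivative
        nonlin = b * u**3 - u**5 + eps * u_x**2           # explicit nonlinear terms
        u_hat = (u_hat + dt * np.fft.rfft(nonlin)) / denom
        u = np.fft.irfft(u_hat, n=n)
    return u
```

For $r < 0$, the trivial state is linearly stable, so small perturbations decay. Reaching localized structures would require finite-amplitude initial conditions.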
Submitted 1 March, 2023;
originally announced March 2023.
-
Prostate Lesion Estimation using Prostate Masks from Biparametric MRI
Authors:
Ahmet Karagoz,
Mustafa Ege Seker,
Mert Yergin,
Tarkan Atak Kan,
Mustafa Said Kartal,
Ercan Karaarslan,
Deniz Alis,
Ilkay Oksuz
Abstract:
Biparametric MRI has emerged as an alternative to multiparametric prostate MRI, as it avoids the potential harms to the patient associated with the contrast medium. One major issue with biparametric MRI is the difficulty of detecting clinically significant prostate cancer (csPCA). Deep learning algorithms have emerged as an alternative solution to detect csPCA in cohort studies. We present a workflow that predicts csPCA on biparametric prostate MRI from the PI-CAI 2022 Challenge, which comprises over 10,000 carefully curated prostate MRI exams. We first segment the prostate gland into the central gland (transition + central zone) and the peripheral gland. Then we use these predictions in combination with T2, ADC and DWI images to train an ensemble nnU-Net model. Finally, we use the clinical index PSA and the ADC intensity distributions of lesion regions to reduce false positives. Our method achieves top results in the open-validation stage with an AUROC of 0.888 and an AP of 0.732.
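The two reported metrics can be computed from per-exam scores as below:
- AUROC is the probability that a positive case outranks a negative one, with ties counted as 1/2.
- AP is the mean precision at the rank of each positive case.

This is a generic case-level sketch with hypothetical function names. The PI-CAI challenge's official evaluation (for example, how lesion-level detections are matched for AP) may differ.

```python
import numpy as np

def auroc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (pos.size * neg.size)

def average_precision(scores, labels):
    """Mean of the precision values at the rank of each positive case."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-scores, kind="stable")  # highest score first
    hits = labels[order]
    precision = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return precision[hits].sum() / hits.sum()
```

For scores `[0.9, 0.8, 0.3, 0.1]` with labels `[1, 0, 1, 0]`:
- AUROC is 3/4, because one of the four positive/negative pairs is misordered.
- AP is (1 + 2/3)/2 = 5/6.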
Submitted 11 January, 2023;
originally announced January 2023.
-
Federated Hypergradient Descent
Authors:
Andrew K Kan
Abstract:
In this work, we explore combining automatic hyperparameter tuning and optimization for federated learning (FL) in an online, one-shot procedure. We apply a principled approach to adapting the client learning rate, number of local steps, and batch size. In our federated learning applications, our primary motivations are minimizing the communication budget as well as local computational resources in the training pipeline. Conventionally, hyperparameter tuning methods involve at least some degree of trial-and-error, which is known to be sample inefficient. To address these motivations, we propose FATHOM (Federated AuTomatic Hyperparameter OptiMization) as a one-shot online procedure. We investigate the challenges and solutions of deriving analytical gradients with respect to the hyperparameters of interest. Our approach is inspired by the fact that, with the exception of local data, we have full knowledge of all components involved in our training process, and this fact can be exploited effectively in our algorithm. We show that FATHOM is more communication efficient than Federated Averaging (FedAvg) with optimized, static-valued hyperparameters, and is also more computationally efficient overall. As a communication-efficient, one-shot online procedure, FATHOM solves the bottleneck of costly communication and limited local computation by eliminating a potentially wasteful tuning process and by optimizing the hyperparameters adaptively throughout the training procedure without trial-and-error. We present numerical results from extensive empirical experiments with the Federated EMNIST-62 (FEMNIST) and Federated Stack Overflow (FSO) datasets, using FedJAX as our baseline framework.
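FATHOM's derivations are not reproduced in the abstract. The sketch below only illustrates the underlying idea of a hypergradient, applied to a single, non-federated learning rate.

Because $\theta_t = \theta_{t-1} - \eta\, g_{t-1}$, the derivative of the loss at $\theta_t$ with respect to $\eta$ is $-g_t \cdot g_{t-1}$. The learning rate $\eta$ can therefore itself be updated by gradient descent.

Function and parameter names are assumptions. Adapting the number of local steps and the batch size, and the federated aggregation, are not shown.

```python
import numpy as np

def sgd_with_hypergradient(grad_fn, theta0, lr0=0.01, hyper_lr=1e-3, n_steps=100):
    """Gradient descent whose learning rate is itself adapted by a hypergradient."""
    theta = np.array(theta0, dtype=float)
    lr = lr0
    prev_grad = np.zeros_like(theta)
    for _ in range(n_steps):
        g = np.array(grad_fn(theta), dtype=float)  # copy, so in-place updates cannot alias it
        # d f(theta_t) / d lr = -g_t . g_{t-1}; descend on lr using that hypergradient.
        lr += hyper_lr * float(g @ prev_grad)
        theta -= lr * g
        prev_grad = g
    return theta, lr
```

On a well-conditioned quadratic, successive gradients point the same way, so the learning rate grows from its initial value. It shrinks again if steps start to overshoot and consecutive gradients anti-align.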
Submitted 3 November, 2022;
originally announced November 2022.
-
1/f noise and anomalous scaling in Lévy noise-driven on-off intermittency
Authors:
Adrian van Kan,
François Pétrélis
Abstract:
On-off intermittency occurs in nonequilibrium physical systems close to bifurcation points and is characterised by an aperiodic switching between a large-amplitude "on" state and a small-amplitude "off" state. Lévy on-off intermittency is a recently introduced generalisation of on-off intermittency to multiplicative Lévy noise, which depends on a stability parameter $\alpha$ and a skewness parameter $\beta$. Here, we derive two novel results on Lévy on-off intermittency by leveraging known exact results on the first-passage time statistics of Lévy flights. First, we compute anomalous critical exponents explicitly as a function of arbitrary Lévy noise parameters $(\alpha,\beta)$ for the first time, by a heuristic method, complementing previous results. The predictions are verified using numerical solutions of the fractional Fokker-Planck equation. Second, we derive the power spectrum $S(f)$ of Lévy on-off intermittency and show that it displays a power law $S(f)\propto f^{\kappa}$ at low frequencies $f$, where $\kappa \in (-1,0)$ depends on the noise parameters $\alpha,\beta$. An explicit expression for $\kappa$ is obtained in terms of $(\alpha,\beta)$. The predictions are verified using long time series realisations of Lévy on-off intermittency. Our findings help shed light on instabilities subject to non-equilibrium, power-law-distributed fluctuations, emphasizing that their properties can differ starkly from the case of Gaussian fluctuations.
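A simple way to check a low-frequency law $S(f)\propto f^{\kappa}$ on a simulated time series is a least-squares fit of the log-periodogram against $\log f$ below some cutoff.

The sketch below does this with NumPy. It does not evaluate the paper's explicit expression for $\kappa(\alpha,\beta)$, which the abstract does not give. The function name and the cutoff `f_max` are assumptions that the user must choose.

```python
import numpy as np

def low_frequency_exponent(x, dt=1.0, f_max=None):
    """Estimate kappa in S(f) ~ f**kappa from a log-log fit of the periodogram."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = x.size
    freqs = np.fft.rfftfreq(n, d=dt)[1:]                 # drop f = 0
    power = np.abs(np.fft.rfft(x))[1:] ** 2 * dt / n     # one-sided periodogram
    if f_max is not None:                                # keep only low frequencies
        keep = freqs <= f_max
        freqs, power = freqs[keep], power[keep]
    kappa, _ = np.polyfit(np.log(freqs), np.log(power), 1)
    return kappa
```

As a sanity check, white noise gives $\kappa \approx 0$, and a random walk gives $\kappa \approx -2$ at low frequencies. A value in $(-1,0)$ would be consistent with the behaviour described in the abstract.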
Submitted 4 December, 2022; v1 submitted 18 October, 2022;
originally announced October 2022.
-
Unsupervised Model Selection for Time-series Anomaly Detection
Authors:
Mononito Goswami,
Cristian Challu,
Laurent Callot,
Lenon Minorics,
Andrey Kan
Abstract:
Anomaly detection in time-series has a wide range of practical applications. While numerous anomaly detection methods have been proposed in the literature, a recent survey concluded that no single method is the most accurate across various datasets. To make matters worse, anomaly labels are scarce and rarely available in practice. The practical problem of selecting the most accurate model for a given dataset without labels has received little attention in the literature. This paper addresses the following question: given an unlabeled dataset and a set of candidate anomaly detectors, how can we select the most accurate model? To this end, we identify three classes of surrogate (unsupervised) metrics, namely, prediction error, model centrality, and performance on injected synthetic anomalies, and show that some metrics are highly correlated with standard supervised anomaly detection performance metrics such as the $F_1$ score, but to varying degrees. We formulate metric combination with multiple imperfect surrogate metrics as a robust rank aggregation problem. We then provide theoretical justification behind the proposed approach. Large-scale experiments on multiple real-world datasets demonstrate that our proposed unsupervised approach is as effective as selecting the most accurate model based on partially labeled data.
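The final step combines several imperfect surrogate metrics into a single model choice. It can be pictured with Borda count, a simple (non-robust) rank aggregation: each metric ranks the candidate detectors, and the model with the smallest rank sum wins.

The paper instead formulates a robust rank aggregation problem, which this sketch does not implement. The function name and the example metric values are hypothetical.

```python
import numpy as np

def borda_select(metric_scores, higher_is_better):
    """Pick a model by Borda count.

    metric_scores: (n_metrics, n_models) surrogate metric values
    higher_is_better: one bool per metric (False for e.g. prediction error)
    """
    scores = np.asarray(metric_scores, dtype=float)
    signs = np.where(np.asarray(higher_is_better), 1.0, -1.0)[:, None]
    oriented = scores * signs                       # larger is better for every metric
    ranks = np.argsort(np.argsort(-oriented, axis=1), axis=1)  # 0 = best per metric
    total = ranks.sum(axis=0)                       # Borda rank sum per model (ties ignored)
    return int(np.argmin(total)), total
```

A usage example with three candidate detectors and three illustrative surrogate metrics:
- prediction error (lower is better): `[0.1, 0.5, 0.3]`;
- centrality (higher is better): `[0.9, 0.2, 0.5]`;
- F1 on injected anomalies (higher is better): `[0.7, 0.1, 0.8]`.

These give rank sums `[1, 6, 2]`, so the first detector is selected.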
Submitted 24 January, 2023; v1 submitted 3 October, 2022;
originally announced October 2022.
-
Spontaneous suppression of inverse energy cascade in instability-driven 2D turbulence
Authors:
Adrian van Kan,
Benjamin Favier,
Keith Julien,
Edgar Knobloch
Abstract:
Instabilities of fluid flows often generate turbulence. Using extensive direct numerical simulations, we study two-dimensional turbulence driven by a wavenumber-localised instability superposed on stochastic forcing, in contrast to previous studies of state-independent forcing. As the contribution of the instability forcing, measured by a parameter $\gamma$, increases, the system undergoes two transitions. For $\gamma$ below a first threshold, a regular large-scale vortex condensate forms. Above this threshold, shielded vortices (SVs) emerge within the condensate. At a second, larger value of $\gamma$, the condensate breaks down, and a gas of weakly interacting vortices with broken symmetry spontaneously emerges, characterised by a preponderance of vortices of one sign only and suppressed inverse energy cascade. The latter transition is shown to depend on the damping mechanism. The number density of SVs in the broken symmetry state slowly increases via a random nucleation process. Bistability is observed between the condensate and mixed SV-condensate states. Our findings provide new evidence for a strong dependence of two-dimensional turbulence phenomenology on the forcing.
Submitted 8 November, 2022; v1 submitted 24 June, 2022;
originally announced June 2022.
-
LineVD: Statement-level Vulnerability Detection using Graph Neural Networks
Authors:
David Hin,
Andrey Kan,
Huaming Chen,
M. Ali Babar
Abstract:
Current machine-learning based software vulnerability detection methods are primarily conducted at the function level. However, a key limitation of these methods is that they do not indicate the specific lines of code contributing to vulnerabilities. This limits the ability of developers to efficiently inspect and interpret the predictions from a learnt model, which is crucial for integrating machine-learning based tools into the software development workflow. Graph-based models have shown promising performance in function-level vulnerability detection, but their capability for statement-level vulnerability detection has not been extensively explored. While interpreting function-level predictions through explainable AI is one promising direction, we herein consider the statement-level software vulnerability detection task from a fully supervised learning perspective. We propose a novel deep learning framework, LineVD, which formulates statement-level vulnerability detection as a node classification task. LineVD leverages control and data dependencies between statements using graph neural networks, and a transformer-based model to encode the raw source code tokens. In particular, by addressing the conflicting outputs between function-level and statement-level information, LineVD significantly improves prediction performance without requiring the vulnerability status of function code. We have conducted extensive experiments against a large-scale collection of real-world C/C++ vulnerabilities obtained from multiple real-world projects, and demonstrate an increase of 105% in F1-score over the current state-of-the-art.
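Framing statement-level detection as node classification works as follows:
- Each statement is a graph node with an embedding.
- Control- and data-dependency edges carry messages between statements.
- A classifier outputs per-node vulnerability logits.

The sketch below shows one mean-aggregation message-passing layer in NumPy, with hypothetical weights and a hypothetical function name. It is not LineVD's architecture, which also uses a transformer-based encoder for source tokens.

```python
import numpy as np

def gnn_node_logits(x, edges, w_self, w_neigh, w_out):
    """One mean-aggregation message-passing layer followed by a per-node classifier.

    x: (n_nodes, d) statement embeddings
    edges: (src, dst) pairs, e.g. control- or data-dependency edges
    returns: (n_nodes, n_classes) logits, e.g. [not vulnerable, vulnerable]
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    agg = np.zeros_like(x)
    deg = np.zeros(n)
    for src, dst in edges:          # sum messages from predecessor statements
        agg[dst] += x[src]
        deg[dst] += 1
    agg /= np.maximum(deg, 1)[:, None]               # mean aggregation
    h = np.maximum(0.0, x @ w_self + agg @ w_neigh)  # ReLU
    return h @ w_out
```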
Submitted 25 March, 2022; v1 submitted 10 March, 2022;
originally announced March 2022.
-
Online Time Series Anomaly Detection with State Space Gaussian Processes
Authors:
Christian Bock,
François-Xavier Aubet,
Jan Gasthaus,
Andrey Kan,
Ming Chen,
Laurent Callot
Abstract:
We propose r-ssGPFA, an unsupervised online anomaly detection model for uni- and multivariate time series building on the efficient state space formulation of Gaussian processes. For high-dimensional time series, we propose an extension of Gaussian process factor analysis to identify the common latent processes of the time series, allowing us to detect anomalies efficiently in an interpretable manner. We gain explainability while speeding up computations by imposing an orthogonality constraint on the mapping from the latent to the observed space. Our model's robustness is improved by using a simple heuristic to skip Kalman updates when encountering anomalous observations. We investigate the behaviour of our model on synthetic data and show on standard benchmark datasets that our method is competitive with state-of-the-art methods while being computationally cheaper.
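The robustness heuristic, skipping the Kalman update when an observation looks anomalous, can be illustrated on a one-dimensional local-level model. In the sketch below, an observation is flagged when its standardized innovation exceeds `threshold`. For a flagged observation, only the prediction step is applied.

The following are assumptions:
- the model;
- the noise variances `q` and `r`;
- the threshold rule;
- the function name.

r-ssGPFA's state space Gaussian process and factor-analysis structure are not shown.

```python
import numpy as np

def robust_local_level_filter(y, q=0.01, r=0.1, threshold=3.0, m0=0.0, p0=1.0):
    """Random-walk-plus-noise Kalman filter that skips updates on flagged observations."""
    m, p = m0, p0
    means, flags = [], []
    for obs in y:
        p = p + q                       # predict step (random-walk state)
        s = p + r                       # innovation variance
        z = (obs - m) / np.sqrt(s)      # standardized innovation
        anomalous = abs(z) > threshold
        if not anomalous:               # update only on plausible observations
            k = p / s
            m = m + k * (obs - m)
            p = (1.0 - k) * p
        means.append(m)
        flags.append(anomalous)
    return np.array(means), np.array(flags)
```

Because flagged observations never enter the update, a single large spike cannot drag the filtered state away, although the state uncertainty `p` keeps growing while updates are skipped.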
Submitted 18 January, 2022;
originally announced January 2022.
-
Bistability of the large-scale dynamics in quasi-two-dimensional turbulence
Authors:
Xander M. de Wit,
Adrian van Kan,
Alexandros Alexakis
Abstract:
In many geophysical and astrophysical flows, suppression of fluctuations along one direction of the flow drives a quasi-2D upscale flux of kinetic energy, leading to the formation of strong vortex condensates at the largest scales. Recent studies have shown that the transition towards this condensate state is hysteretic, giving rise to a limited bistable range in which both the condensate state as well as the regular 3D state can exist at the same parameter values. In this work, we use direct numerical simulations of thin-layer flow to investigate whether this bistable range survives as the domain size and turbulence intensity are increased. By studying the time scales at which rare transitions occur from one state into the other, we find that the bistable range grows as the box size and/or Reynolds number Re are increased, showing that the bistability is neither a finite-size nor a finite-Re effect. We furthermore predict a crossover from a bimodal regime at low box size, low Re to a regime of pure hysteresis at high box size, high Re, in which any transition from one state to the other is prohibited at any finite time scale.
Submitted 18 March, 2022; v1 submitted 1 January, 2022;
originally announced January 2022.
-
Estimating nonlinear stability from time series data
Authors:
Adrian van Kan,
Jannes Jegminat,
Jonathan Donges
Abstract:
Basin stability (BS) is a measure of nonlinear stability in multi-stable dynamical systems. BS has previously been estimated using Monte-Carlo simulations, which require explicit knowledge of a dynamical model. We discuss the requirements for estimating BS from time series data in the presence of strong perturbations, and illustrate our approach for two simple models of climate tipping elements: the Amazon rain forest and the thermohaline ocean circulation. We discuss the applicability of our method to observational data as constrained by the relevant time scales of total observation time, typical return time of perturbations and internal convergence time scale of the system of interest and other factors.
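For reference, the model-based Monte-Carlo estimate mentioned above (the baseline, not the paper's time-series method) is the fraction of randomly sampled initial conditions that end up on a given attractor. The sketch assumes a hypothetical toy bistable system dx/dt = x - x^3, uniform sampling on [-2, 2], and forward-Euler integration; none of these come from the paper's climate models.

```python
import numpy as np


def basin_stability(f, attractor, sample_init, n=2000, dt=0.01, steps=2000,
                    tol=0.1, seed=0):
    """Fraction of sampled initial conditions that converge to `attractor`."""
    rng = np.random.default_rng(seed)
    x = sample_init(rng, n)
    for _ in range(steps):
        x = x + dt * f(x)              # forward Euler, vectorized over all samples
    return float(np.mean(np.abs(x - attractor) < tol))


bs = basin_stability(lambda x: x - x**3, attractor=1.0,
                     sample_init=lambda rng, n: rng.uniform(-2.0, 2.0, size=n))
print(bs)
```

For this symmetric toy system the basin boundary sits at x = 0, so the estimate should be close to 0.5; the need for an explicit `f` is exactly the requirement that estimating BS from time series aims to relax.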
Submitted 3 December, 2021;
originally announced December 2021.
-
Mobility of bacterial protein Hfq on dsDNA; Role of C-terminus mediated transient binding
Authors:
Chuan Jie Tan,
Rajib Basak,
Indresh Yadav,
Jeroen A. van Kan,
Veronique Arluison,
Johan R. C. van der Maarel
Abstract:
The mobility of protein is fundamental in the machinery of life. Here, we have investigated the effect of DNA binding in conjunction with DNA internal motion of the bacterial Hfq master regulator devoid of its amyloid C-terminus domain. Hfq is one of the most abundant nucleoid associated proteins that shape the bacterial chromosome and is involved in several aspects of nucleic acid metabolism. Fluorescence microscopy has been used to track a mutant form of Hfq lacking the C-terminus domain on double stranded DNA, which is stretched by confinement to a rectangular nanofluidic channel. The mobility of the mutant is strongly accelerated with respect to the wild type variant. Furthermore, it shows a reverse dependence on the internal motion of DNA, in that slower motion results in slower protein diffusion. The results demonstrate the subtle role of DNA internal motion in controlling the mobility of a nucleoid associated protein, and, in particular, the importance of transient binding and moving DNA strands out of the way.
Submitted 6 December, 2021;
originally announced December 2021.
-
3+1D $θ$-Term on the Lattice from the Hamiltonian Perspective
Authors:
Angus Kan,
Lena Funcke,
Stefan Kühn,
Luca Dellantonio,
Jinglei Zhang,
Jan F. Haase,
Christine A. Muschik,
Karl Jansen
Abstract:
Quantum and tensor network simulations have emerged as prominent sign-problem free approaches to lattice gauge theories. Unlike conventional Markov chain Monte Carlo methods, they are based on the Hamiltonian formulation. In this talk, we fill a gap in the literature and present the first derivation of the Hamiltonian 3+1D $θ$-term -- which is an important sign-problem afflicted term -- for Abelian and non-Abelian lattice gauge theories. Furthermore, we perform exact diagonalization for a 3+1D U(1) lattice gauge theory including the $θ$-term on a unit periodic cube. Our numerical results reveal a novel phase transition at fixed values of $θ$ in the strong-coupling regime. The transition is evidenced by an avoided level crossing in the ground state energy, as well as sudden changes in the plaquette expectation value, the electric energy density, and the topological charge density. Extensions of our work to larger lattices can be readily performed using state-of-the-art tensor network simulations. Moreover, our work provides a concrete starting point for an eventual quantum simulation of the $θ$-dependent phase structure and dynamics of lattice gauge theories in 3+1D. This talk is mainly based on [1]. We expand beyond [1] by including a derivation of the (non-)Abelian fixed-length Higgs term in the Hamiltonian formulation for future studies of (non-)Abelian-Higgs models with a $θ$-term.
Submitted 29 November, 2021; v1 submitted 3 November, 2021;
originally announced November 2021.
-
Lattice Quantum Chromodynamics and Electrodynamics on a Universal Quantum Computer
Authors:
Angus Kan,
Yunseong Nam
Abstract:
It is widely anticipated that a large-scale quantum computer will offer an ever more accurate simulation of nature, opening the floodgates for exciting scientific breakthroughs and technological innovations. Here, we show a complete, instruction-by-instruction rubric to simulate U(1), SU(2), and SU(3) lattice gauge theories on a quantum computer. These theories describe quantum electrodynamics and chromodynamics, the key ingredients that form the fabric of our universe. We further provide a concrete estimate of the quantum computational resources required for an accurate simulation of lattice gauge theories using a second-order product formula. We show that lattice gauge theories in any spatial dimension can be simulated using $\tilde{O}(T^{3/2}N^{3/2}Λ/ε^{1/2})$ T gates, where $N$ is the number of lattice sites, $Λ$ is the bosonic gauge field truncation, and $T$ is the simulation time.
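The quoted T-gate bound can be read as a relative-cost formula. The sketch below evaluates only the leading factor $T^{3/2}N^{3/2}Λ/ε^{1/2}$, dropping the constants and polylogarithmic factors hidden in $\tilde{O}$, so only ratios between configurations are meaningful. Here `eps` plays the role of $ε$, which the abstract does not define; treating it as the target simulation accuracy is an assumption, and the parameter values are hypothetical.

```python
def t_gate_scaling(T, N, Lam, eps):
    """Leading factor T^{3/2} N^{3/2} Lam / eps^{1/2}; constants and log factors dropped."""
    return T**1.5 * N**1.5 * Lam / eps**0.5


base = t_gate_scaling(T=1.0, N=100, Lam=4, eps=1e-3)
ratio_n = t_gate_scaling(T=1.0, N=200, Lam=4, eps=1e-3) / base    # double the lattice
ratio_eps = t_gate_scaling(T=1.0, N=100, Lam=4, eps=1e-5) / base  # 100x smaller eps
print(round(ratio_n, 3), round(ratio_eps, 3))
```

Doubling the number of lattice sites multiplies the estimate by $2^{3/2} \approx 2.83$, while tightening $ε$ by a factor of 100 costs only a factor of 10 because of the $ε^{-1/2}$ dependence.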
Submitted 26 January, 2022; v1 submitted 27 July, 2021;
originally announced July 2021.
-
Energy cascades in rapidly rotating and stratified turbulence within elongated domains
Authors:
Adrian van Kan,
Alexandros Alexakis
Abstract:
We study forced, rapidly rotating and stably stratified turbulence in an elongated domain using an asymptotic expansion at simultaneously low Rossby number $Ro\ll1$ and large domain height compared to the energy injection scale, $h=H/\ell_{in}\gg1$. The resulting equations depend on the parameter $λ=(h Ro )^{-1}$ and the Froude number $Fr$. An extensive set of direct numerical simulations (DNS) is performed to explore the parameter space $(λ,Fr)$. We show that a forward energy cascade occurs in one region of this space, and a split energy cascade outside it. At weak stratification (large $Fr$), an inverse cascade is observed for sufficiently large $λ$. At strong stratification (small $Fr$) the flow becomes approximately hydrostatic and an inverse cascade is always observed. For both weak and strong stratification, we present theoretical arguments supporting the observed energy cascade phenomenology. Our results shed light on an asymptotic region in the phase diagram of rotating and stratified turbulence, which is difficult to attain by brute-force DNS.
Submitted 21 October, 2021; v1 submitted 13 June, 2021;
originally announced June 2021.
-
Investigating a (3+1)D Topological $θ$-Term in the Hamiltonian Formulation of Lattice Gauge Theories for Quantum and Classical Simulations
Authors:
Angus Kan,
Lena Funcke,
Stefan Kühn,
Luca Dellantonio,
Jinglei Zhang,
Jan F. Haase,
Christine A. Muschik,
Karl Jansen
Abstract:
Quantum technologies offer the prospect to efficiently simulate sign-problem afflicted regimes in lattice field theory, such as the presence of topological terms, chemical potentials, and out-of-equilibrium dynamics. In this work, we derive the (3+1)D topological $θ$-term for Abelian and non-Abelian lattice gauge theories in the Hamiltonian formulation, paving the way towards Hamiltonian-based simulations of such terms on quantum and classical computers. We further study numerically the zero-temperature phase structure of a (3+1)D U(1) lattice gauge theory with the $θ$-term via exact diagonalization for a single periodic cube. In the strong coupling regime, our results suggest the occurrence of a phase transition at constant values of $θ$, as indicated by an avoided level-crossing and abrupt changes in the plaquette expectation value, the electric energy density, and the topological charge density. These results could in principle be cross-checked by the recently developed (3+1)D tensor network methods and quantum simulations, once sufficient resources become available.
Submitted 18 October, 2021; v1 submitted 12 May, 2021;
originally announced May 2021.
-
Geometric microcanonical theory of two-dimensional Truncated Euler flows
Authors:
Adrian van Kan,
Alexandros Alexakis,
Marc Brachet
Abstract:
This paper presents a geometric microcanonical ensemble perspective on two-dimensional Truncated Euler flows, which contain a finite number of (Fourier) modes and conserve energy and enstrophy. We explicitly perform phase space volume integrals over shells of constant energy and enstrophy. Two applications are considered. In the first part, we determine the average energy spectrum for highly condensed flow configurations and show that the result is consistent with Kraichnan's canonical ensemble description, despite the fact that no thermodynamic limit is invoked. In the second part, we compute the probability density for the largest-scale mode of a free-slip flow in a square, which displays reversals. We test the results against numerical simulations of a minimal model and find excellent agreement with the microcanonical theory, unlike the canonical theory, which fails to describe the bimodal statistics. This article is part of the theme issue "Mathematical problems in physical fluid dynamics".
Submitted 4 July, 2021; v1 submitted 22 April, 2021;
originally announced April 2021.
-
Lévy on-off intermittency
Authors:
Adrian van Kan,
Alexandros Alexakis,
Marc-Etienne Brachet
Abstract:
We present a new form of intermittency, Lévy on-off intermittency, which arises from multiplicative $α$-stable white noise close to an instability threshold. We study this problem in the linear and nonlinear regimes, both theoretically and numerically, for the case of a pitchfork bifurcation with fluctuating growth rate. We compute the stationary distribution analytically and numerically from the associated fractional Fokker-Planck equation in the Stratonovich interpretation. We characterize the system in the parameter space $(α,β)$ of the noise, with stability parameter $α\in (0,2)$ and skewness parameter $β\in[-1,1]$. Five regimes are identified in this parameter space, in addition to the well-studied Gaussian case $α=2$. Three regimes are located at $1<α<2$, where the noise has finite mean but infinite variance. They are differentiated by $β$ and all display a critical transition at the deterministic instability threshold, with on-off intermittency close to onset. Critical exponents are computed from the stationary distribution. Each regime is characterized by a specific form of the density and specific critical exponents, which differ starkly from the Gaussian case. A finite or infinite number of integer-order moments may converge, depending on parameters. Two more regimes are found at $0<α\leq 1$. There, the mean of the noise diverges, and no critical transition occurs. In one case the origin is always unstable, independently of the distance $μ$ from the deterministic threshold. In the other case, the origin is conversely always stable, independently of $μ$. We thus demonstrate that an instability subject to non-equilibrium, power-law-distributed fluctuations can display substantially different properties than for Gaussian thermal fluctuations, in terms of statistics and critical behavior.
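A quick way to see Lévy on-off intermittency is direct simulation. The sketch assumes the pitchfork normal form $\dot{x} = (μ + ξ)x - x^3$ with symmetric ($β = 0$) standard α-stable noise $ξ$ drawn by the Chambers-Mallows-Stuck method. It integrates the linear part exactly in $y = \ln x$ (exponentiating the noise increment, which for Gaussian noise coincides with the Stratonovich solution) and the cubic term exactly via $x^{-2} \to x^{-2} + 2\,dt$. This split-step Monte-Carlo scheme is swapped in for illustration; the paper works with the fractional Fokker-Planck equation, and `dt`, `steps`, and `mu` are hypothetical.

```python
import numpy as np


def symmetric_alpha_stable(alpha, size, rng):
    """Chambers-Mallows-Stuck sampler for standard symmetric (beta = 0) alpha-stable variates."""
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    if alpha == 1.0:
        return np.tan(v)               # Cauchy case
    return (np.sin(alpha * v) / np.cos(v) ** (1.0 / alpha)
            * (np.cos(v - alpha * v) / w) ** ((1.0 - alpha) / alpha))


def simulate_log_amplitude(mu, alpha, dt=1e-3, steps=50_000, y0=0.0, seed=0):
    """Split-step integration of dx/dt = (mu + noise) x - x^3, tracked as y = ln x."""
    rng = np.random.default_rng(seed)
    # alpha-stable increments over a step of length dt scale as dt^(1/alpha).
    noise = symmetric_alpha_stable(alpha, steps, rng) * dt ** (1.0 / alpha)
    log_2dt = np.log(2.0 * dt)
    y = np.empty(steps)
    cur = y0
    for t in range(steps):
        cur += mu * dt + noise[t]                        # linear step, exact in ln x
        cur = -0.5 * np.logaddexp(-2.0 * cur, log_2dt)   # exact step of dx/dt = -x^3
        y[t] = cur
    return y


y = simulate_log_amplitude(mu=0.5, alpha=1.5)
print(y.min(), y.max())
```

Working in $\ln x$ with `np.logaddexp` keeps heavy-tailed jumps from overflowing; the long excursions of `y` to large negative values are the "off" phases.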
Submitted 24 April, 2021; v1 submitted 13 February, 2021;
originally announced February 2021.
-
Intermittency of three-dimensional perturbations in a point-vortex model
Authors:
Adrian van Kan,
Alexandros Alexakis,
Marc Etienne Brachet
Abstract:
Three-dimensional (3D) instabilities on a (potentially turbulent) two-dimensional (2D) flow are still incompletely understood, despite recent progress. Here, based on known physical properties of such 3D instabilities, we propose a simple, energy-conserving model describing this situation. It consists of a 2D point-vortex flow coupled to localized 3D perturbations (ergophages), such that ergophages can gain energy by altering vortex-vortex distances through an induced divergent velocity field, thus decreasing point-vortex energy. We investigate the model in three distinct stages of evolution: (i) The linear regime, where the ergophage amplitude grows or decays exponentially on average, with a randomly fluctuating instantaneous growth rate. The growth rate has a small auto-correlation time, and follows a probability distribution featuring a power-law tail with exponent between -2 and -5/3 (up to a cut-off), depending on the point-vortex base flow. Consequently, the logarithmic ergophage amplitude performs a free Lévy flight. (ii) The passive-nonlinear regime of the model, where the 2D flow evolves independently of the ergophage amplitudes, which saturate by non-linear self-interactions without affecting the 2D flow. In this regime the system exhibits a new type of on-off intermittency that we name Lévy on-off intermittency, and which we study in a companion paper. We compute the bifurcation diagram for the mean and variance of the perturbation amplitude, as well as the probability density of the perturbation amplitude. (iii) Finally, we characterize the fully nonlinear regime, where ergophages feed back on the 2D flow, and study how the vortex temperature is altered by the interaction with ergophages. It is shown that when the amplitude of the ergophages is sufficiently large, the 2D flow saturates to a zero-temperature state. Given the limitations of existing theories ...
Submitted 24 April, 2021; v1 submitted 23 December, 2020;
originally announced December 2020.
-
J-Recs: Principled and Scalable Recommendation Justification
Authors:
Namyong Park,
Andrey Kan,
Christos Faloutsos,
Xin Luna Dong
Abstract:
Online recommendation is an essential functionality across a variety of services, including e-commerce and video streaming, where items to buy, watch, or read are suggested to users. Justifying recommendations, i.e., explaining why a user might like the recommended item, has been shown to improve user satisfaction and persuasiveness of the recommendation. In this paper, we develop a method for generating post-hoc justifications that can be applied to the output of any recommendation algorithm. Existing post-hoc methods are often limited in providing diverse justifications, as they either use only one of many available types of input data, or rely on predefined templates. We address these limitations of earlier approaches by developing J-Recs, a method for producing concise and diverse justifications. J-Recs is a recommendation model-agnostic method that generates diverse justifications based on various types of product and user data (e.g., purchase history and product attributes). The challenge of jointly processing multiple types of data is addressed by designing a principled graph-based approach for justification generation. In addition to theoretical analysis, we present an extensive evaluation on synthetic and real-world data. Our results show that J-Recs satisfies desirable properties of justifications, and efficiently produces effective justifications, matching user preferences up to 20% more accurately than baselines.
Submitted 11 November, 2020;
originally announced November 2020.
-
Towards simulating 2D effects in lattice gauge theories on a quantum computer
Authors:
Danny Paulson,
Luca Dellantonio,
Jan F. Haase,
Alessio Celi,
Angus Kan,
Andrew Jena,
Christian Kokail,
Rick van Bijnen,
Karl Jansen,
Peter Zoller,
Christine A. Muschik
Abstract:
Gauge theories are the most successful theories for describing nature at its fundamental level, but obtaining analytical or numerical solutions often remains a challenge. We propose an experimental quantum simulation scheme to study ground state properties in two-dimensional quantum electrodynamics (2D QED) using existing quantum technology. The proposal builds on a formulation of lattice gauge theories as effective spin models in arXiv:2006.14160, which reduces the number of qubits needed by eliminating redundant degrees of freedom and by using an efficient truncation scheme for the gauge fields. The latter endows our proposal with the perspective to take a well-controlled continuum limit. Our protocols allow in principle scaling up to large lattices and offer the perspective to connect the lattice simulation to low energy observable quantities, e.g. the hadron spectrum, in the continuum theory. By including both dynamical matter and a non-minimal gauge field truncation, we provide the novel opportunity to observe 2D effects on present-day quantum hardware. More specifically, we present two Variational Quantum Eigensolver (VQE) based protocols for the study of magnetic field effects, and for taking an important first step towards computing the running coupling of QED. For both instances, we include variational quantum circuits for qubit-based hardware, which we explicitly apply to trapped ion quantum computers. We simulate the proposed VQE experiments classically to calculate the required measurement budget under realistic conditions. While this feasibility analysis is done for trapped ions, our approach can be easily adapted to other platforms. The techniques presented here, combined with advancements in quantum hardware, pave the way for reaching beyond the capabilities of classical simulations by extending our framework to include fermionic potentials or topological terms.
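The variational principle behind VQE can be shown on a single qubit with an exact statevector. The sketch assumes a hypothetical Hamiltonian $H = Z + 0.5X$ and the one-parameter ansatz $R_y(θ)|0\rangle$, and replaces the measurement-based energy estimate and classical optimizer of a real VQE run with a dense grid search; it is not one of the paper's 2D QED circuits.

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
H = Z + 0.5 * X                        # hypothetical single-qubit Hamiltonian


def ansatz(theta):
    """Ry(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>."""
    return np.array([np.cos(theta / 2), np.sin(theta / 2)])


def energy(theta):
    """Expectation value <psi(theta)| H |psi(theta)> (real amplitudes here)."""
    psi = ansatz(theta)
    return psi @ H @ psi


thetas = np.linspace(0.0, 2 * np.pi, 20001)
energies = np.array([energy(t) for t in thetas])
print(energies.min(), np.linalg.eigvalsh(H)[0])
```

The minimized energy matches the exact ground-state energy $-\sqrt{1.25}$ because this ansatz can reach the ground state; on hardware each `energy` call would instead be estimated from repeated measurements, which is what sets the measurement budget.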
Submitted 30 July, 2021; v1 submitted 20 August, 2020;
originally announced August 2020.
-
A resource efficient approach for quantum and classical simulations of gauge theories in particle physics
Authors:
Jan F. Haase,
Luca Dellantonio,
Alessio Celi,
Danny Paulson,
Angus Kan,
Karl Jansen,
Christine A. Muschik
Abstract:
Gauge theories establish the standard model of particle physics, and lattice gauge theory (LGT) calculations employing Markov Chain Monte Carlo (MCMC) methods have been pivotal in our understanding of fundamental interactions. The present limitations of MCMC techniques may be overcome by Hamiltonian-based simulations on classical or quantum devices, which further provide the potential to address questions that lie beyond the capabilities of the current approaches. However, for continuous gauge groups, Hamiltonian-based formulations involve infinite-dimensional gauge degrees of freedom that can only be handled by truncation. Current truncation schemes require dramatically increasing computational resources at small values of the bare couplings, where magnetic field effects become important. This limitation precludes one from "taking the continuum limit" while working with finite resources. To overcome this limitation, we provide a resource-efficient protocol to simulate LGTs with continuous gauge groups in the Hamiltonian formulation. Our new method allows for calculations at arbitrary values of the bare coupling and lattice spacing. The approach consists of the combination of a Hilbert space truncation with a regularization of the gauge group, which permits an efficient description of the magnetically dominated regime. We focus here on Abelian gauge theories and use $2+1$ dimensional quantum electrodynamics as a benchmark example to demonstrate this efficient framework to achieve the continuum limit in LGTs. This possibility is a key requirement to make quantitative predictions at the field theory level and offers the long-term perspective to utilise quantum simulations to compute physically meaningful quantities in regimes that are inaccessible to quantum Monte Carlo.
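Why truncation alone is awkward can be seen on a single U(1) link. The sketch below is a generic illustration, not the paper's protocol: it truncates the electric field to eigenvalues $-l,\dots,l$ and compares a plain truncated raising operator with a $\mathbb{Z}_{2l+1}$-regularized (cyclic) one. The function names and the choice $l = 2$ are hypothetical.

```python
import numpy as np


def electric_field(l):
    """Electric-field operator truncated to eigenvalues -l..l (dimension 2l+1)."""
    return np.diag(np.arange(-l, l + 1, dtype=float))


def truncated_link(l):
    """Raising operator |e> -> |e+1>; the top state is mapped to zero (not unitary)."""
    return np.eye(2 * l + 1, k=-1)


def cyclic_link(l):
    """Z_{2l+1} regularization: the top state wraps to the bottom, so U is unitary."""
    return np.roll(np.eye(2 * l + 1), 1, axis=0)


l = 2
E, Ut, Uc = electric_field(l), truncated_link(l), cyclic_link(l)
I = np.eye(2 * l + 1)
print("truncated: unitary?", np.allclose(Ut.T @ Ut, I),
      "| [E,U] = U?", np.allclose(E @ Ut - Ut @ E, Ut))
print("cyclic:    unitary?", np.allclose(Uc.T @ Uc, I),
      "| [E,U] = U?", np.allclose(E @ Uc - Uc @ E, Uc))
```

The truncated link keeps the commutator $[E, U] = U$ but loses unitarity, while the cyclic link is unitary but violates the commutator at the wrap-around entry; this trade-off is one reason a regularization of the gauge group is combined with a Hilbert space truncation.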
Submitted 21 January, 2021; v1 submitted 25 June, 2020;
originally announced June 2020.
-
AutoKnow: Self-Driving Knowledge Collection for Products of Thousands of Types
Authors:
Xin Luna Dong,
Xiang He,
Andrey Kan,
Xian Li,
Yan Liang,
Jun Ma,
Yifan Ethan Xu,
Chenwei Zhang,
Tong Zhao,
Gabriel Blanco Saldana,
Saurabh Deshpande,
Alexandre Michetti Manduca,
Jay Ren,
Surender Pal Singh,
Fan Xiao,
Haw-Shiuan Chang,
Giannis Karamanolakis,
Yuning Mao,
Yaqing Wang,
Christos Faloutsos,
Andrew McCallum,
Jiawei Han
Abstract:
Can one build a knowledge graph (KG) for all products in the world? Knowledge graphs have firmly established themselves as valuable sources of information for search and question answering, and it is natural to wonder if a KG can contain information about products offered at online retail sites. There have been several successful examples of generic KGs, but organizing information about products poses many additional challenges, including sparsity and noise of structured data for products, complexity of the domain with millions of product types and thousands of attributes, heterogeneity across a large number of categories, as well as a large and constantly growing number of products. We describe AutoKnow, our automatic (self-driving) system that addresses these challenges. The system includes a suite of novel techniques for taxonomy construction, product property identification, knowledge extraction, anomaly detection, and synonym discovery. AutoKnow is (a) automatic, requiring little human intervention, (b) multi-scalable, scalable in multiple dimensions (many domains, many products, and many attributes), and (c) integrative, exploiting rich customer behavior logs. AutoKnow has been operational in collecting product knowledge for over 11K product types.
Submitted 24 June, 2020;
originally announced June 2020.
-
MultiImport: Inferring Node Importance in a Knowledge Graph from Multiple Input Signals
Authors:
Namyong Park,
Andrey Kan,
Xin Luna Dong,
Tong Zhao,
Christos Faloutsos
Abstract:
Given multiple input signals, how can we infer node importance in a knowledge graph (KG)? Node importance estimation is a crucial and challenging task that can benefit many applications including recommendation, search, and query disambiguation. A key challenge towards this goal is how to effectively use input from different sources. On the one hand, a KG is a rich source of information, with multiple types of nodes and edges. On the other hand, there are external input signals, such as the number of votes or pageviews, which can directly tell us about the importance of entities in a KG. While several methods have been developed to tackle this problem, their use of these external signals has been limited as they are not designed to consider multiple signals simultaneously. In this paper, we develop an end-to-end model MultiImport, which infers latent node importance from multiple, potentially overlapping, input signals. MultiImport is a latent variable model that captures the relation between node importance and input signals, and effectively learns from multiple signals with potential conflicts. Also, MultiImport provides an effective estimator based on attentive graph neural networks. We ran experiments on real-world KGs to show that MultiImport handles several challenges involved with inferring node importance from multiple input signals, and consistently outperforms existing methods, achieving up to 23.7% higher NDCG@100 than the state-of-the-art method.
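NDCG@100, the metric quoted above, compares the ranking induced by predicted scores with the ideal ranking by true importance. The sketch uses one common variant with linear gain and a $\log_2(i+1)$ position discount; the abstract does not say which gain function the paper uses, so this choice is an assumption.

```python
import numpy as np


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items, linear gain."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rel.size + 2))   # positions 1..k -> log2(i + 1)
    return float(np.sum(rel / discounts))


def ndcg_at_k(predicted_scores, true_importance, k=100):
    """DCG of the predicted ranking divided by the DCG of the ideal ranking."""
    order = np.argsort(predicted_scores)[::-1]          # rank nodes by predicted score
    ideal_dcg = dcg_at_k(np.sort(true_importance)[::-1], k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(np.asarray(true_importance)[order], k) / ideal_dcg


print(ndcg_at_k([3, 2, 1], [3, 2, 1], k=3))   # perfect ranking
print(ndcg_at_k([1, 2, 3], [3, 2, 1], k=3))   # fully reversed ranking
```

A perfect ranking scores 1.0; the reversed ranking still scores about 0.79 because the logarithmic discount penalizes mistakes lower in the list only mildly.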
Submitted 22 June, 2020;
originally announced June 2020.
-
Octet: Online Catalog Taxonomy Enrichment with Self-Supervision
Authors:
Yuning Mao,
Tong Zhao,
Andrey Kan,
Chenwei Zhang,
Xin Luna Dong,
Christos Faloutsos,
Jiawei Han
Abstract:
Taxonomies have found wide applications in various domains, especially online for item categorization, browsing, and search. Despite the prevalent use of online catalog taxonomies, most of them in practice are maintained by humans, which is labor-intensive and difficult to scale. While taxonomy construction from scratch has been extensively studied in the literature, how to effectively enrich existing incomplete taxonomies remains an open yet important research question. Taxonomy enrichment requires not only robustness in handling emerging terms but also consistency between the existing taxonomy structure and newly attached terms. In this paper, we present a self-supervised end-to-end framework, Octet, for Online Catalog Taxonomy EnrichmenT. Octet leverages heterogeneous information unique to online catalog taxonomies such as user queries, items, and their relations to the taxonomy nodes while requiring no other supervision than the existing taxonomies. We propose to distantly train a sequence labeling model for term extraction and employ graph neural networks (GNNs) to capture the taxonomy structure as well as the query-item-taxonomy interactions for term attachment. Extensive experiments in different online domains demonstrate the superiority of Octet over state-of-the-art methods via both automatic and human evaluations. Notably, Octet enriches an online catalog taxonomy in production to twice its original size in the open-world evaluation.
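Distant supervision for a sequence labeling model needs automatically generated tags. One plausible way to produce them, assumed here rather than taken from the paper, is to match existing taxonomy terms against query tokens and emit BIO tags with a greedy longest match. The function name and the example terms are hypothetical.

```python
def distant_bio_labels(tokens, known_terms):
    """Greedy longest-match BIO tagging of `tokens` against `known_terms`."""
    term_tuples = {tuple(t.lower().split()) for t in known_terms}
    max_len = max((len(t) for t in term_tuples), default=0)
    lowered = [tok.lower() for tok in tokens]
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first so multi-word terms win.
        for span in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + span]) in term_tuples:
                labels[i] = "B"
                for j in range(i + 1, i + span):
                    labels[j] = "I"
                i += span
                break
        else:
            i += 1                     # no term starts here
    return labels


query = "organic green tea bags 100 count".split()
print(distant_bio_labels(query, ["green tea", "tea bags", "organic"]))
```

Greedy matching resolves overlaps left to right, so "tea bags" is not tagged once "green tea" has consumed "tea"; such noisy labels are what the downstream sequence labeling model is trained on and expected to generalize beyond.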
Submitted 18 June, 2020;
originally announced June 2020.
-
Role of the forcing dimensionality in thin-layer turbulent energy cascades
Authors:
Basile Poujol,
Adrian van Kan,
Alexandros Alexakis
Abstract:
We investigate the transition from a forward to an inverse energy cascade in turbulent flows in thin layers, varying the functional form of the forcing and the thickness of the layer. We show that, as the forcing function becomes more three-dimensional, the inverse cascade is suppressed and the critical height $h_c$ at which the transition occurs decreases. We study the dependence of this critical height on a parameter $r$ that measures the dimensionality of the forcing, and thus construct a phase diagram in the $(r, h)$ parameter space. We discuss the effects of the Reynolds number and the domain size.
Submitted 25 March, 2020;
originally announced March 2020.
-
Critical transition in fast-rotating turbulence within highly elongated domains
Authors:
Adrian van Kan,
Alexandros Alexakis
Abstract:
We study rapidly rotating turbulent flows in a highly elongated domain using an asymptotic expansion at simultaneously low Rossby number, $Ro \ll 1$, and large domain height compared to the energy injection scale, $h = H/\ell_{in} \gg 1$. We solve the resulting equations using an extensive set of direct numerical simulations for different parameter regimes. As the parameter $\lambda = (h\,Ro)^{-1}$ is increased beyond a threshold $\lambda_c$, a transition is observed from a state without an inverse energy cascade to a state with an inverse energy cascade. For large Reynolds numbers and large horizontal box sizes, we provide evidence, based on the large-scale energy dissipation rate, that the transition is critical.
Submitted 28 May, 2020; v1 submitted 11 December, 2019;
originally announced December 2019.
-
Estimating Node Importance in Knowledge Graphs Using Graph Neural Networks
Authors:
Namyong Park,
Andrey Kan,
Xin Luna Dong,
Tong Zhao,
Christos Faloutsos
Abstract:
How can we estimate the importance of nodes in a knowledge graph (KG)? A KG is a multi-relational graph that has proven valuable for many tasks, including question answering and semantic search. In this paper, we present GENI, a method for tackling the problem of estimating node importance in KGs, which enables several downstream applications such as item recommendation and resource allocation. While a number of approaches have been developed to address this problem for general graphs, they do not fully utilize the information available in KGs, or they lack the flexibility needed to model complex relationships between entities and their importance. To address these limitations, we explore supervised machine learning algorithms. In particular, building upon recent advances in graph neural networks (GNNs), we develop GENI, a GNN-based method designed to deal with the distinctive challenges involved in predicting node importance in KGs. Rather than aggregating node embeddings, our method aggregates importance scores, using a predicate-aware attention mechanism and flexible centrality adjustment. In our evaluation of GENI and existing methods on predicting node importance in real-world KGs with different characteristics, GENI achieves 5-17% higher NDCG@100 than the state of the art.
Submitted 16 June, 2019; v1 submitted 21 May, 2019;
originally announced May 2019.
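The following is a minimal, non-learned Python sketch of aggregating scores rather than embeddings. It is not GENI. The predicate weights are fixed numbers instead of learned attention parameters, and no neural network produces the initial scores. The helper names (`aggregate_scores`, `centrality_adjust`) and the exact form of the log in-degree adjustment are hypothetical and only loosely follow the abstract's description.

```python
import math
from collections import defaultdict


def aggregate_scores(scores, edges, predicate_weight, num_layers=1):
    """Propagate scalar node scores along KG edges (hypothetical, non-learned sketch).

    scores: dict node -> initial score
    edges: list of (source, predicate, target) triples
    predicate_weight: dict predicate -> attention logit (fixed here; a real model learns it)
    """
    incoming = defaultdict(list)
    for src, pred, dst in edges:
        incoming[dst].append((src, pred))
    for _ in range(num_layers):
        updated = {}
        for node, score in scores.items():
            neighbors = incoming.get(node, [])
            if not neighbors:
                updated[node] = score  # no incoming edges: keep the current score
                continue
            # Softmax over incoming edges; logits depend only on the predicate.
            logits = [predicate_weight[pred] for _, pred in neighbors]
            top = max(logits)
            weights = [math.exp(logit - top) for logit in logits]
            total = sum(weights)
            updated[node] = sum(w / total * scores[src] for w, (src, _) in zip(weights, neighbors))
        scores = updated
    return scores


def centrality_adjust(scores, edges, gamma=1.0, beta=0.0, eps=1e-3):
    """Scale each score by a log in-degree term and squash it to (0, 1) (hypothetical form)."""
    in_degree = defaultdict(int)
    for _, _, dst in edges:
        in_degree[dst] += 1
    adjusted = {}
    for node, score in scores.items():
        centrality = gamma * math.log(in_degree[node] + eps) + beta
        adjusted[node] = 1.0 / (1.0 + math.exp(-centrality * score))
    return adjusted


edges = [("a", "p", "c"), ("b", "q", "c")]
initial = {"a": 1.0, "b": 0.0, "c": 0.5}
equal = aggregate_scores(initial, edges, {"p": 0.0, "q": 0.0})
skewed = aggregate_scores(initial, edges, {"p": 10.0, "q": 0.0})
final = centrality_adjust(equal, edges)
```

The softmax is taken per target node. With equal predicate weights, node `c` receives the average of its in-neighbours' scores, so `equal["c"]` is 0.5. Raising the weight of `p` pulls `c` toward the score of `a`, so `skewed["c"]` is close to 1.0.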
-
Rare transitions to thin-layer turbulent condensates
Authors:
Adrian van Kan,
Takahiro Nemoto,
Alexandros Alexakis
Abstract:
Turbulent flows in a thin layer can develop an inverse energy cascade, leading to spectral condensation of energy when the layer height is smaller than a certain threshold. These spectral condensates take the form of large-scale vortices in physical space. Recently, evidence for bistability was found in this system close to the critical height: depending on the initial conditions, the flow is either in a condensate state with most of the energy in the two-dimensional (2-D) large-scale modes, or in a three-dimensional (3-D) flow state with most of the energy in the small-scale modes. This bistable regime is characterised by the statistical properties of random and rare transitions between these two locally stable states. Here, we examine these statistical properties in thin-layer turbulent flows, where the energy is injected by either stochastic or deterministic forcing. To this end, using a large number of direct numerical simulations (DNS), we measure the decay time $\tau_d$ of the 2-D condensate into the 3-D flow state and the build-up time $\tau_b$ of the 2-D condensate. We show that both $\tau_d$ and $\tau_b$ follow an exponential distribution, with mean values increasing faster than exponentially as the layer height approaches the threshold. We further show that the dynamics of the large-scale kinetic energy may be modeled by a stochastic Langevin equation. From time-series analysis of DNS data, we determine an effective potential with two minima, corresponding to the 2-D and 3-D states, when the layer height is close to the threshold.
Submitted 5 June, 2019; v1 submitted 13 March, 2019;
originally announced March 2019.
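The effective-potential step can be illustrated with a common textbook estimator, which may differ from the authors' procedure. For an overdamped Langevin equation $dE = -V'(E)\,dt + \sqrt{2D}\,dW$ with constant noise intensity $D$, the stationary density satisfies $P(E) \propto \exp(-V(E)/D)$. Hence $V(E) = -D \ln P(E)$ up to an additive constant. The sketch below applies this to a histogram of a scalar time series. The function name `effective_potential`, the synthetic bimodal data, and the constant-$D$ assumption are all illustrative.

```python
import numpy as np


def effective_potential(series, bins=40, value_range=None, noise=1.0):
    """Estimate V(E) = -D ln P(E) from a time series (assumes constant noise intensity D)."""
    density, edges = np.histogram(series, bins=bins, range=value_range, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    potential = np.full(density.shape, np.inf)  # empty bins get an infinite potential
    occupied = density > 0
    potential[occupied] = -noise * np.log(density[occupied])
    potential -= potential[occupied].min()  # fix the additive constant: deepest well at 0
    return centers, potential


# Synthetic stand-in for a bistable signal: samples clustered around two states.
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-1.0, 0.2, 10_000), rng.normal(1.0, 0.2, 10_000)])
centers, V = effective_potential(samples, bins=40, value_range=(-2.0, 2.0))
```

In the synthetic example, the estimated potential has wells near $\pm 1$ separated by a barrier near $0$. This is the same qualitative two-minimum shape the abstract describes, although the data here are not kinetic energies. Bins with no samples are assigned an infinite potential rather than being interpolated.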
-
Condensates in thin-layer turbulence
Authors:
Adrian van Kan,
Alexandros Alexakis
Abstract:
We examine the steady state of turbulent flows in thin layers using direct numerical simulations. It is shown that when the layer thickness is smaller than a critical height, an inverse cascade arises, which leads to the formation of a steady-state condensate in which most of the energy is concentrated in the largest scale of the system. For layers thinner than a second critical height, the flow at steady state becomes exactly two-dimensional. The amplitude of the condensate is studied as a function of layer thickness and Reynolds number. Bistability and intermittent bursts are found close to the two critical points. The results are interpreted using a mean-field three-scale model that reproduces some of the basic features of the numerical results.
Submitted 13 March, 2019; v1 submitted 1 August, 2018;
originally announced August 2018.
-
Compaction and condensation of DNA mediated by the C-terminal domain of Hfq
Authors:
Antoine Malabirade,
Kai Jiang,
Krzysztof Kubiak,
Alvaro Diaz-Mendoza,
Fan Liu,
Jeroen A. van Kan,
Jean-François Berret,
Veronique Arluison,
Johan R. C. van der Maarel
Abstract:
Hfq is a bacterial protein that is involved in several aspects of nucleic acid metabolism. It has been described as one of the nucleoid-associated proteins shaping the bacterial chromosome, although it is better known for influencing the translation and turnover of cellular RNAs. Here, we explore the role of the C-terminal domain of Escherichia coli Hfq in the compaction of double-stranded DNA. Various experimental methodologies have been used to follow the assembly of the C-terminal and N-terminal regions of Hfq on DNA. These include fluorescence microscopy imaging of single DNA molecules confined inside nanofluidic channels, atomic force microscopy, isothermal titration microcalorimetry, and electrophoretic mobility assays. The results highlight the role of the Hfq C-terminal arms in DNA binding, in changing the mechanical properties of the double helix, and in compacting DNA into a condensed form. The propensity of the C-terminal domain for bridging and compacting DNA might be related to aggregation of the bound protein and may have implications for gene regulation related to protein binding.
Submitted 12 May, 2017;
originally announced May 2017.
-
Constrained basin stability for studying transient phenomena in dynamical systems
Authors:
Adrian van Kan,
Jannes Jegminat,
Jonathan Donges,
Jürgen Kurths
Abstract:
Transient dynamics are of great interest in many areas of science. Here, a generalization of basin stability (BS) is presented: constrained basin stability (CBS), which is sensitive to various types of transients arising from finite-size perturbations. CBS is applied to the paradigmatic Lorenz system to uncover nonlinear precursory phenomena of a boundary crisis bifurcation. Further, CBS is used in a model of the Earth's carbon cycle as a return-time-dependent stability measure of the system's global attractor. Both case studies illustrate how CBS's sensitivity to transients complements BS in its function as an early-warning signal and as a stability measure. CBS is broadly applicable in systems where transients matter, from physics and engineering to sustainability science. Thus, CBS complements stability analysis with BS as well as classical linear stability analysis and will be a useful tool for many applications.
Submitted 19 March, 2016; v1 submitted 9 January, 2016;
originally announced January 2016.
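The Monte Carlo idea behind basin stability, and one way of adding a constraint to it, can be sketched on a toy one-dimensional system. The following are all assumptions for illustration:

- the system $\dot{x} = x - x^3$;
- the sampling interval $[-2, 2]$;
- the tolerance;
- the use of a maximum return time as the constraint.

The constraints used in the paper, and its systems (the Lorenz model and a carbon-cycle model), may differ.

```python
import random


def velocity(x):
    """Toy bistable flow: stable fixed points at x = -1 and x = +1, unstable one at x = 0."""
    return x - x**3


def return_time(x0, target=1.0, tol=0.05, t_max=20.0, dt=0.01):
    """Time for the forward-Euler trajectory from x0 to come within tol of target, or None."""
    x, t = x0, 0.0
    while t < t_max:
        if abs(x - target) < tol:
            return t
        x += dt * velocity(x)
        t += dt
    return None


def basin_stability(n_samples, max_return_time=None, seed=0):
    """Fraction of random perturbations that return to the target attractor.

    With max_return_time=None this is plain basin stability; otherwise a trajectory only
    counts if it returns within max_return_time (an assumed return-time constraint).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        t = return_time(rng.uniform(-2.0, 2.0))
        if t is not None and (max_return_time is None or t <= max_return_time):
            hits += 1
    return hits / n_samples


bs = basin_stability(500)
cbs = basin_stability(500, max_return_time=1.0)
```

With the same random perturbations, `cbs` can never exceed `bs`, because the constraint only removes trajectories. In this toy system, perturbations starting near the unstable point $x = 0$ still return, but too slowly to count. That is the kind of transient information plain BS does not capture.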
-
A Time Decoupling Approach for Studying Forum Dynamics
Authors:
Andrey Kan,
Jeffrey Chan,
Conor Hayes,
Bernie Hogan,
James Bailey,
Christopher Leckie
Abstract:
Online forums are rich sources of information about user communication activity over time. Finding temporal patterns in online forum communication threads can advance our understanding of the dynamics of conversations. The main challenge of temporal analysis in this context is the complexity of forum data. There can be thousands of interacting users, who can be numerically described in many different ways. Moreover, user characteristics can evolve over time. We propose an approach that decouples temporal information about users into sequences of user events and inter-event times. We develop a new feature space to represent the event sequences as paths, and we model the distribution of the inter-event times. We study over 30,000 users across four Internet forums, and discover novel patterns in user communication. We find that users tend to exhibit consistency over time. Furthermore, in our feature space, we observe regions that represent unlikely user behaviors. Finally, we show how to derive a numerical representation for each forum, and we then use this representation to derive a novel clustering of multiple forums.
Submitted 11 January, 2012;
originally announced January 2012.
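The decoupling step can be pictured with a small sketch. Each user's time-stamped events are split into an ordered sequence of event types and a list of inter-event times, and the inter-event times are then summarized by a parametric model. The log-normal fit, the event labels, and the function names below are assumptions for illustration. The abstract does not say which distribution the authors use, and the path-based feature space for event sequences is not reproduced here.

```python
import math
from statistics import mean, pstdev


def decouple(events):
    """Split (timestamp, event_type) pairs into an event sequence and inter-event times."""
    ordered = sorted(events, key=lambda e: e[0])  # chronological order
    sequence = [event_type for _, event_type in ordered]
    gaps = [later[0] - earlier[0] for earlier, later in zip(ordered, ordered[1:])]
    return sequence, gaps


def fit_lognormal(gaps):
    """Maximum-likelihood log-normal parameters (mu, sigma) for inter-event times.

    Zero gaps (simultaneous events) are dropped because log(0) is undefined.
    """
    logs = [math.log(g) for g in gaps if g > 0]
    return mean(logs), pstdev(logs)


user_events = [(10.0, "reply"), (0.0, "post"), (30.0, "reply")]
sequence, gaps = decouple(user_events)
mu, sigma = fit_lognormal(gaps)
```

For the three events above, `sequence` is `["post", "reply", "reply"]` and `gaps` is `[10.0, 20.0]`. The fitted `sigma` is therefore $\ln 2 / 2 \approx 0.35$. Keeping the two parts separate lets the order of actions and their timing be modeled independently.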
-
Effects of electrostatic screening on the conformation of single DNA molecules confined in a nanochannel
Authors:
Ce Zhang,
Fang Zhang,
Jeroen A. van Kan,
Johan R. C. van der Maarel
Abstract:
Single T4-DNA molecules were confined in rectangular channels with a depth of 300 nm and a width in the range 150-300 nm, cast in a poly(dimethylsiloxane) nanofluidic chip. The extensions of the DNA molecules were measured with fluorescence microscopy as a function of the ionic strength and composition of the buffer, as well as of the DNA intercalation level by the YOYO-1 dye. The data were interpreted with scaling theory for a wormlike polymer in good solvent, including the effects of confinement, charge, and self-avoidance. It was found that the elongation of the DNA molecules with decreasing ionic strength can be interpreted in terms of an increase in the persistence length. Self-avoidance effects on the extension are moderate, owing to the small correlation length imposed by the channel cross-sectional diameter. Intercalation of the dye results in an increase in the DNA contour length and a partial neutralization of the DNA charge. Apart from effects of electrostatic origin, however, it has no significant effect on the bare bending rigidity. In the presence of divalent cations, the DNA molecules were observed to contract, but they did not collapse into a condensed structure. It is proposed that this contraction results from a divalent-counterion-mediated attractive force between the segments of the DNA molecule.
Submitted 6 May, 2008;
originally announced May 2008.