-
Computation of the Łojasiewicz exponents of real bivariate analytic functions
Authors:
Si Tiep Dinh,
Feng Guo,
Hong Duc Nguyen,
Tien Son Pham
Abstract:
The main goal of this paper is to present some explicit formulas for computing the Łojasiewicz exponent in the Łojasiewicz inequality comparing the rate of growth of two real bivariate analytic function germs.
Submitted 10 May, 2024;
originally announced May 2024.
-
Twin Auto-Encoder Model for Learning Separable Representation in Cyberattack Detection
Authors:
Phai Vu Dinh,
Quang Uy Nguyen,
Thai Hoang Dinh,
Diep N. Nguyen,
Bao Son Pham,
Eryk Dutkiewicz
Abstract:
Representation Learning (RL) plays a pivotal role in the success of many problems including cyberattack detection. Most of the RL methods for cyberattack detection are based on the latent vector of Auto-Encoder (AE) models. An AE transforms raw data into a new latent representation that better exposes the underlying characteristics of the input data. Thus, it is very useful for identifying cyberattacks. However, due to the heterogeneity and sophistication of cyberattacks, the representation of AEs is often entangled/mixed, which makes attack detection difficult for downstream models. To tackle this problem, we propose a novel model called Twin Auto-Encoder (TAE). TAE deterministically transforms the latent representation into a more distinguishable representation, namely the \textit{separable representation}, and then reconstructs the separable representation at the output. The output of TAE, called the \textit{reconstruction representation}, is input to downstream models to detect cyberattacks. We extensively evaluate the effectiveness of TAE using a wide range of benchmark datasets. Experimental results show the superior accuracy of TAE over state-of-the-art RL models and well-known machine learning algorithms. Moreover, TAE also outperforms state-of-the-art models on some sophisticated and challenging attacks. We then investigate various characteristics of TAE to further demonstrate its superiority.
Submitted 21 March, 2024;
originally announced March 2024.
-
Comparative Analysis of Plastid Genomes Using Pangenome Research ToolKit (PGR-TK)
Authors:
Richa Jayanti,
Andrew Kim,
Sean Pham,
Athreya Raghavan,
Anish Sharma,
Manoj P. Samanta
Abstract:
Plastid genomes (plastomes) of angiosperms are of great interest among biologists. High-throughput sequencing is making many such genomes accessible, increasing the need for tools to perform rapid comparative analysis. This exploratory analysis investigates whether the Pangenome Research Tool Kit (PGR-TK) is suitable for analyzing plastomes. After determining the optimal parameters for this tool on plastomes, we use it to compare sequences from each of the genera - Magnolia, Solanum, Fragaria and Cotoneaster, as well as a combined set from 20 rosid genera. PGR-TK recognizes large-scale plastome structures, such as the inverted repeats, among combined sequences from distant rosid families. If the plastid genomes are rotated to the same starting point, it also correctly groups different species from the same genus together in a generated cladogram. The visual approach of PGR-TK provides insights into genome evolution without requiring gene annotations.
Submitted 29 October, 2023;
originally announced October 2023.
-
Microscopic crystallographic analysis of dislocations in molecular crystals
Authors:
Sang T. Pham,
Natalia Koniuch,
Emily Wynne,
Andy Brown,
Sean M. Collins
Abstract:
Organic molecular crystals encompass a vast range of materials from pharmaceuticals to organic optoelectronics and proteins to waxes in biological and industrial settings. Crystal defects from grain boundaries to dislocations are known to play key roles in mechanisms of growth and also in the functional properties of molecular crystals. In contrast to the precise analysis of individual defects in metals, ceramics, and inorganic semiconductors enabled by electron microscopy, significantly greater ambiguity remains in the experimental determination of individual dislocation character and slip systems in molecular materials. In large part, nanoscale dislocation analysis in molecular crystals has been hindered by the severely constrained electron exposures required to avoid irreversibly degrading these crystals. Here, we present a low-dose, single-exposure approach enabling nanometre-resolved analysis of individual extended dislocations in molecular crystals. We demonstrate the approach for a range of crystal types to reveal dislocation character and operative slip systems unambiguously.
Submitted 31 August, 2023;
originally announced August 2023.
-
Existence theorems for optimal solutions in semi-algebraic optimization
Authors:
Jae Hyoung Lee,
Gue Myung Lee,
Tien Son Pham
Abstract:
Consider the problem of minimizing a lower semi-continuous semi-algebraic function $f \colon \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ on an unbounded closed semi-algebraic set $S \subset \mathbb{R}^n.$ Employing adequate tools of semi-algebraic geometry, we first establish some properties of the tangency variety of the restriction of $f$ on $S.$ Then we derive verifiable necessary and sufficient conditions for the existence of optimal solutions of the problem as well as the boundedness from below and coercivity of the restriction of $f$ on $S.$ We also present a computable formula for the optimal value of the problem.
Submitted 10 August, 2023;
originally announced August 2023.
-
Subdifferentials at infinity and applications in optimization
Authors:
Do Sang Kim,
Minh Tung Nguyen,
Tien Son Pham
Abstract:
In this work, the notions of normal cones at infinity to unbounded sets and of limiting and singular subdifferentials at infinity for extended-real-valued functions are introduced. Various calculus rules for these notions are established. A complete characterization of the Lipschitz continuity at infinity for lower semi-continuous functions is given. The obtained results are aimed ultimately at applications to diverse problems of optimization, such as optimality conditions, coercive properties, weak sharp minima and stability results.
Submitted 28 July, 2023;
originally announced July 2023.
-
Characterizations of directional openness for set-valued mappings
Authors:
Si Tiep Dinh,
Tien Son Pham
Abstract:
We provide necessary and sufficient conditions for a set-valued mapping between finite dimensional spaces to be directionally open by relating this property with directional regularity, Hölder continuity of the inverse mapping, coderivatives and variations. These generalize and refine some previously known results.
Submitted 25 July, 2022;
originally announced July 2022.
-
Limits of real bivariate rational functions
Authors:
Si Tiep Dinh,
Feng Guo,
Hong Duc Nguyen,
Tien Son Pham
Abstract:
Given two nonzero polynomials $f, g \in\mathbb R[x,y]$ and a point $(a, b) \in \mathbb{R}^2,$ we give some necessary and sufficient conditions for the existence of the limit $\displaystyle \lim_{(x, y) \to (a, b)} \frac{f(x, y)}{g(x, y)}.$ We also show that, if the denominator $g$ has an isolated zero at the given point $(a, b),$ then the set of possible limits of $\displaystyle \lim_{(x, y) \to (a, b)} \frac{f(x, y)}{g(x, y)}$ is a closed interval in $\overline{\mathbb{R}}$ and can be explicitly determined. As an application, we propose an effective algorithm to verify the existence of the limit and compute the limit (if it exists). Our approach is geometric and is based on Puiseux expansions.
Submitted 10 February, 2022;
originally announced February 2022.
-
On definable open continuous mappings
Authors:
Si Tiep Dinh,
Tien Son Pham
Abstract:
For a definable continuous mapping $f$ from a definable connected open subset $Ω$ of $\mathbb R^n$ into $\mathbb R^n,$ we show that the following statements are equivalent:
(i) The mapping $f$ is open.
(ii) The fibers of $f$ are finite and the Jacobian of $f$ does not change sign on the set of points at which $f$ is differentiable.
(iii) The fibers of $f$ are finite and the set of points at which $f$ is not a local homeomorphism has dimension at most $n - 2.$
As an application, we prove that Whyburn's conjecture is true for definable mappings: A definable open continuous mapping of one closed ball into another which maps boundary homeomorphically onto boundary is necessarily a homeomorphism.
Submitted 7 July, 2021; v1 submitted 3 June, 2021;
originally announced June 2021.
-
Some classical analysis results for continuous definable mappings
Authors:
Xuan Duc Ha Truong,
Tien Son Pham
Abstract:
In this paper, we show that some fundamental results for smooth mappings (e.g., the Brouwer degree formula, the implicit function and inverse function theorems, the mean value theorem, Sard's theorem, Hadamard's global invertibility criteria, Pourciau's surjectivity and openness results) have natural extensions for continuous mappings that are definable in o-minimal structures. The arguments rely on nice properties of definable mappings and sets.
Submitted 25 May, 2021;
originally announced May 2021.
-
Some variational properties of tangent directions at infinity of real algebraic sets
Authors:
Si Tiep Dinh,
Tien Son Pham
Abstract:
In this paper, we relate the set of asymptotic critical values of a polynomial function $f$ with the sets of discontinuity of two functions: the multivalued function which associates to each value $t$ the set of tangent directions at infinity of the fiber $f^{-1}(t),$ and the composition of the $(n-2)$-dimensional volume function with the first one. This gives necessary conditions of equisingularity at infinity for the family of the fibers of a real polynomial function.
Submitted 21 May, 2021;
originally announced May 2021.
-
Nichtnegativstellensätze for definable functions in o-minimal structures
Authors:
Si Tiep Dinh,
Tien Son Pham
Abstract:
This paper addresses Nichtnegativstellensätze for definable functions in o-minimal structures on $(\mathbb{R}, +, \cdot).$ Namely, let $f, g_1, \ldots, g_l \colon \mathbb{R}^n \to \mathbb{R}$ be definable $C^p$-functions ($p \ge 2$) and assume that $f$ is non-negative on $S := \{x \in \mathbb{R}^n \ | \ g_1(x) \ge 0, \ldots, g_l(x) \ge 0 \}.$ Under some natural hypotheses on the zeros of $f$ in $S,$ we show that $f$ is expressible in the form $f = φ_0 + \sum_{i = 1}^l φ_i g_i,$ where each $φ_i$ is a sum of squares of definable $C^{p - 2}$-functions. As a consequence, we derive global optimality conditions which generalize the Karush--Kuhn--Tucker optimality conditions for nonlinear optimization.
Submitted 18 May, 2021;
originally announced May 2021.
-
The mountain pass theorem in terms of tangencies
Authors:
Si Tiep Dinh,
Tien Son Pham
Abstract:
This paper addresses the Mountain Pass Theorem for locally Lipschitz functions on finite-dimensional vector spaces in terms of tangencies. Namely, let $f \colon \mathbb R^n \to \mathbb R$ be a locally Lipschitz function with a mountain pass geometry. Let $$c := \inf_{γ\in \mathcal A}\max_{t\in[0,1]}f(γ(t)),$$ where $\mathcal{A}$ is the set of all continuous paths joining $x^*$ to $y^*.$ We show that either $c$ is a critical value of $f$ or $c$ is a tangency value at infinity of $f.$ This reduces to the Mountain Pass Theorem of Ambrosetti and Rabinowitz in the case where the function $f$ is definable (for example, semi-algebraic) in an o-minimal structure.
Submitted 15 May, 2021;
originally announced May 2021.
-
Stability of closedness of semi-algebraic sets under continuous semi-algebraic mappings
Authors:
Si Tiep Dinh,
Zbigniew Jelonek,
Tien Son Pham
Abstract:
Given a closed semi-algebraic set $X \subset \mathbb{R}^n$ and a continuous semi-algebraic mapping $G \colon X \to \mathbb{R}^m,$ it will be shown that there exists an open dense semi-algebraic subset $\mathscr{U}$ of $L(\mathbb{R}^n, \mathbb{R}^m),$ the space of all linear mappings from $\mathbb{R}^n$ to $\mathbb{R}^m,$ such that for all $F \in \mathscr{U},$ the image $(F + G)(X)$ is a closed (semi-algebraic) set in $\mathbb{R}^m.$ To do this, we study the tangent cone at infinity $C_\infty X$ and the set $E_\infty X \subset C_\infty X$ of (unit) exceptional directions at infinity of $X.$ Specifically we show that the set $E_\infty X$ is nowhere dense in $C_\infty X \cap \mathbb{S}^{n - 1}.$
Submitted 2 April, 2021; v1 submitted 10 October, 2020;
originally announced October 2020.
-
Limits of tangent spaces to definable sets
Authors:
Si Tiep Dinh,
Olivier Le Gal,
Tien Son Pham
Abstract:
We study the set of tangent limits at a given point to a set definable in any o-minimal structure by characterizing the set of exceptional rays in the tangent cone to the set at that point and investigating the set of tangent limits along these rays. Several criteria for determining exceptional rays will be given. The main results of the paper generalize, to the o-minimal setting and to arbitrary dimension, the main results of O'Shea--Wilson, which deal with algebraic surfaces in $\mathbb R^3$.
Submitted 7 October, 2020;
originally announced October 2020.
-
Learning Neural Textual Representations for Citation Recommendation
Authors:
Binh Thanh Kieu,
Inigo Jauregi Unanue,
Son Bao Pham,
Hieu Xuan Phan,
Massimo Piccardi
Abstract:
With the rapid growth of the scientific literature, manually selecting appropriate citations for a paper is becoming increasingly challenging and time-consuming. While several approaches for automated citation recommendation have been proposed in recent years, effective document representations for citation recommendation are still elusive to a large extent. For this reason, in this paper we propose a novel approach to citation recommendation which leverages a deep sequential representation of the documents (Sentence-BERT) cascaded with Siamese and triplet networks in a submodular scoring function. To the best of our knowledge, this is the first approach to combine deep representations and submodular selection for the task of citation recommendation. Experiments have been carried out using a popular benchmark dataset - the ACL Anthology Network corpus - and evaluated against baselines and a state-of-the-art approach using metrics such as the MRR and F1-at-k score. The results show that the proposed approach outperforms all the compared approaches in every measured metric.
Submitted 8 July, 2020;
originally announced July 2020.
-
Stability of closedness of closed convex sets under linear mappings
Authors:
Si Tiep Dinh,
Tien Son Pham
Abstract:
We study the problem of when the continuous linear image of a fixed closed convex set $X \subset\mathbb{R}^n$ is closed. Specifically, we improve the main results in the papers \cite{Borwein2009, Borwein2010} by showing that for all, except for at most a $σ$-porous set, of the linear mappings $T$ from $\mathbb{R}^n$ into $\mathbb{R}^m,$ not only is $T(X)$ closed, but there is also an open neighborhood of $T$ whose members also preserve the closedness of $X.$
Submitted 2 April, 2021; v1 submitted 22 January, 2020;
originally announced January 2020.
-
A Vietnamese Question Answering System
Authors:
Dai Quoc Nguyen,
Dat Quoc Nguyen,
Son Bao Pham
Abstract:
Question answering systems aim to produce exact answers to users' questions instead of a list of related documents as used by current search engines. In this paper, we propose an ontology-based Vietnamese question answering system that allows users to express their questions in natural language. To the best of our knowledge, this is the first attempt to enable users to query an ontological knowledge base using Vietnamese natural language. Experiments of our system on an organizational ontology show promising results.
Submitted 26 November, 2019;
originally announced November 2019.
-
A Vietnamese Text-Based Conversational Agent
Authors:
Dai Quoc Nguyen,
Dat Quoc Nguyen,
Son Bao Pham
Abstract:
This paper introduces a Vietnamese text-based conversational agent architecture for a specific knowledge domain, integrated in a question answering system. When the question answering system fails to provide answers to users' input, our conversational agent can step in and interact with users to provide answers. Experimental results are promising: our Vietnamese text-based conversational agent achieves positive feedback in a study conducted in the university academic regulation domain.
Submitted 26 November, 2019;
originally announced November 2019.
-
A Fast Template-based Approach to Automatically Identify Primary Text Content of a Web Page
Authors:
Dat Quoc Nguyen,
Dai Quoc Nguyen,
Son Bao Pham,
The Duy Bui
Abstract:
Search engines have become an indispensable tool for browsing information on the Internet. The user, however, is often annoyed by redundant results from irrelevant Web pages. One reason is that search engines also look at non-informative blocks of Web pages such as advertisements, navigation links, etc. In this paper, we propose a fast algorithm called FastContentExtractor to automatically detect main content blocks in a Web page by improving the ContentExtractor algorithm. By automatically identifying and storing templates representing the structure of content blocks in a website, content blocks of a new Web page from the website can be extracted quickly. The hierarchical order of the output blocks is also maintained, which guarantees that the extracted content blocks are in the same order as the original ones.
Submitted 26 November, 2019;
originally announced November 2019.
-
Statistically Significant Discriminative Patterns Searching
Authors:
Hoang Son Pham,
Gwendal Virlet,
Dominique Lavenier,
Alexandre Termier
Abstract:
Discriminative pattern mining is an essential task of data mining. This task aims to discover patterns which occur more frequently in one class than in other classes in a class-labeled dataset. This type of pattern is valuable in various domains such as bioinformatics and data classification. In this paper, we propose a novel algorithm, named SSDPS, to discover patterns in two-class datasets. The SSDPS algorithm owes its efficiency to an original enumeration strategy of the patterns, which makes it possible to exploit some degrees of anti-monotonicity on the measures of discriminance and statistical significance. Experimental results demonstrate that the performance of the SSDPS algorithm is better than that of others. In addition, the number of generated patterns is much smaller than that of other algorithms. Experiments on real data also show that SSDPS efficiently detects multiple-SNP combinations in genetic data.
Submitted 2 June, 2019;
originally announced June 2019.
-
Autocovariance Varieties of Moving Average Random Fields
Authors:
Carlos Améndola,
Viet Son Pham
Abstract:
We study the autocovariance functions of moving average random fields over the integer lattice $\mathbb{Z}^d$ from an algebraic perspective. These autocovariances are parametrized polynomially by the moving average coefficients, hence tracing out algebraic varieties. We derive dimension and degree of these varieties and we use their algebraic properties to obtain statistical consequences such as identifiability of model parameters. We connect the problem of parameter estimation to the algebraic invariants known as euclidean distance degree and maximum likelihood degree. Throughout, we illustrate the results with concrete examples. In our computations we use tools from commutative algebra and numerical algebraic geometry.
Submitted 20 March, 2019;
originally announced March 2019.
-
Estimation of causal CARMA random fields
Authors:
Claudia Klüppelberg,
Viet Son Pham
Abstract:
We estimate model parameters of Lévy-driven causal CARMA random fields by fitting the empirical variogram to the theoretical counterpart using a weighted least squares (WLS) approach. Subsequent to deriving asymptotic results for the variogram estimator, we show strong consistency and asymptotic normality of the parameter estimator. Furthermore, we conduct a simulation study to assess the quality of the WLS estimator for finite samples. For the simulation we utilize numerical approximation schemes based on truncation and discretization of stochastic integrals and we analyze the associated simulation errors in detail. Finally, we apply our results to real data of the cosmic microwave background.
Submitted 13 February, 2019;
originally announced February 2019.
-
Global mixed Łojasiewicz inequalities and asymptotic critical values
Authors:
Si Tiep Dinh,
Krzysztof Kurdyka,
Tien Son Pham
Abstract:
In this paper, we prove a version of global Łojasiewicz inequality for $C^1$ semialgebraic functions and relate its existence to the set of asymptotic critical values.
Submitted 17 November, 2018;
originally announced November 2018.
-
Solving the Steiner Tree Problem in graphs with Variable Neighborhood Descent
Authors:
Matthieu De Laere,
San Tu Pham,
Patrick De Causmaecker
Abstract:
The Steiner Tree Problem (STP) in graphs is an important problem with various applications in many areas such as design of integrated circuits, evolution theory, networking, etc. In this paper, we propose an algorithm to solve the STP. The algorithm includes a reducer and a solver using Variable Neighborhood Descent (VND), interacting with each other during the search. New constructive heuristics and a vertex score system for intensification purposes are proposed. The algorithm is tested on a set of benchmarks, with encouraging results.
Submitted 13 June, 2018;
originally announced June 2018.
-
Lévy-driven causal CARMA random fields
Authors:
Viet Son Pham
Abstract:
We introduce Lévy-driven causal CARMA random fields on $\mathbb{R}^d$, extending the class of CARMA processes. The definition is based on a system of stochastic partial differential equations which generalize the classical state-space representation of CARMA processes. The resulting CARMA model differs fundamentally from the isotropic CARMA random field of Brockwell and Matsuda. We show existence of the model under mild assumptions and examine some of its features including the second-order structure and path properties. In particular, we investigate the sampling behavior and formulate conditions for the causal CARMA random field to be an ARMA random field when sampled on an equidistant lattice.
Submitted 22 May, 2018;
originally announced May 2018.
-
Volume estimates of sublevel sets of real polynomials
Authors:
Nguyen Quang Dieu,
Dau Hoang Hung,
Tien Son Pham,
Hoang Thieu Anh
Abstract:
We give upper bounds for the volume of sublevel sets of real polynomials. Our method combines a version of the global Łojasiewicz inequality with some well-known estimates on the volume of tubes around real algebraic sets. Some applications to oscillatory integrals and integration indices of real polynomials are also given.
Submitted 17 April, 2018; v1 submitted 13 November, 2017;
originally announced November 2017.
-
Dialogue Act Segmentation for Vietnamese Human-Human Conversational Texts
Authors:
Thi Lan Ngo,
Khac Linh Pham,
Minh Son Cao,
Son Bao Pham,
Xuan Hieu Phan
Abstract:
Dialog act identification plays an important role in understanding conversations. It has been widely applied in many fields such as dialogue systems, automatic machine translation, and automatic speech recognition, and is especially useful in systems with human-computer natural language dialogue interfaces such as virtual assistants and chatbots. The first step in identifying a dialog act is identifying its boundary within an utterance. In this paper, we focus on segmenting the utterance according to the dialog act boundaries, i.e. functional segment identification, for Vietnamese utterances. We carefully investigate functional segment identification with two approaches: (1) a machine learning approach using maximum entropy (ME) and conditional random fields (CRFs); (2) a deep learning approach using bidirectional Long Short-Term Memory (LSTM) with a CRF layer (Bi-LSTM-CRF), on two different conversational datasets: (1) Facebook messages (Message data); (2) transcriptions of phone conversations (Phone data). To the best of our knowledge, this is the first work that applies a deep learning based approach to dialog act segmentation. As the results show, the deep learning approach performs appreciably better than traditional machine learning approaches. Moreover, it is also the first study that tackles dialog act and functional segment identification for Vietnamese.
Submitted 16 August, 2017;
originally announced August 2017.
-
The Intermittent Traveling Salesman Problem with Different Temperature Profiles: Greedy or not?
Authors:
Pieter Leyman,
San Tu Pham,
Patrick De Causmaecker
Abstract:
In this research, we discuss the intermittent traveling salesman problem (ITSP), which extends the traditional traveling salesman problem (TSP) by imposing temperature restrictions on each node. These additional constraints limit the maximum allowable visit time per node, and result in multiple visits for each node which cannot be serviced in a single visit. We discuss three different temperature increase and decrease functions, namely a linear, a quadratic and an exponential function. To solve the problem, we consider three different solution representations as part of a metaheuristic approach. We argue that in case of similar temperature increase and decrease profiles, it is always beneficial to apply a greedy approach, i.e. to process as much as possible given the current node temperature.
Submitted 30 January, 2017;
originally announced January 2017.
-
Volterra-type Ornstein-Uhlenbeck processes in space and time
Authors:
Viet Son Pham,
Carsten Chong
Abstract:
We propose a novel class of tempo-spatial Ornstein-Uhlenbeck processes as solutions to Lévy-driven Volterra equations with additive noise and multiplicative drift. After formulating conditions for the existence and uniqueness of solutions, we derive an explicit solution formula and discuss distributional properties such as stationarity, second-order structure and short versus long memory. Furthermore, we analyze in detail the path properties of the solution process. In particular, we introduce different notions of càdlàg paths in space and time and establish conditions for the existence of versions with these regularity properties. The theoretical results are accompanied by illustrative examples.
Submitted 6 November, 2017; v1 submitted 22 September, 2016;
originally announced September 2016.
-
A MIP Backend for the IDP System
Authors:
San Pham,
Jo Devriendt,
Maurice Bruynooghe,
Patrick De Causmaecker
Abstract:
The IDP knowledge base system currently uses MiniSAT(ID) as its backend Constraint Programming (CP) solver. A few similar systems have used a Mixed Integer Programming (MIP) solver as backend. However, so far little is known about when the MIP solver is preferable. This paper explores this question. It describes the use of CPLEX as a backend for IDP and reports on experiments comparing both backends.
Submitted 2 September, 2016;
originally announced September 2016.
-
The bifurcation set of a real polynomial function of two variables and Newton polygons of singularities at infinity
Authors:
Masaharu Ishikawa,
Tat Thang Nguyen,
Tien Son Pham
Abstract:
In this paper, we determine the bifurcation set of a real polynomial function of two variables in the non-degenerate case in the sense of Newton polygons by using a toric compactification. We also count the number of singular phenomena at infinity, called "cleaving" and "vanishing", in the same setting. Finally, we give an upper bound on the number of elements in the bifurcation set in terms of its Newton polygon. To obtain the upper bound, we apply toric modifications to the singularities at infinity successively.
Submitted 8 August, 2016;
originally announced August 2016.
-
Łojasiewicz inequalities with explicit exponent for smallest singular value functions
Authors:
Si Tiep Dinh,
Tien Son Pham
Abstract:
Let $F(x) := (f_{ij}(x))_{i=1,\ldots,p; j=1,\ldots,q},$ be a ($p\times q$)-real polynomial matrix and let $f(x)$ be the smallest singular value function of $F(x).$ In this paper, we first give the following {\em nonsmooth} version of Łojasiewicz gradient inequality for the function $f$ with an explicit exponent: {\em For any $\bar x\in \Bbb R^n$, there exist $c > 0$ and $ε > 0$ such that we have for all $\|x - \bar{x}\| < ε,$ \begin{equation*} \inf \{ \| w \| \ : \ w \in {\partial} f(x) \} \ \ge \ c\, |f(x)-f(\bar x)|^{1 - \frac{2}{\mathscr R(n+p,2d+2)}}, \end{equation*} where ${\partial} f(x)$ is the limiting subdifferential of $f$ at $x$, $d:=\max_{i=1,\ldots,p; j=1,\ldots,q}\deg f_{i j}$ and $\mathscr R(n, d) := d(3d - 3)^{n-1}$ if $d \ge 2$ and $\mathscr R(n, d) := 1$ if $d = 1.$} Then we establish some versions of Łojasiewicz inequality for the distance function with explicit exponents, locally and globally, for the smallest singular value function $f(x)$ of the matrix $F(x)$.
Submitted 11 April, 2016;
originally announced April 2016.
-
TwoPaCo: An efficient algorithm to build the compacted de Bruijn graph from many complete genomes
Authors:
Ilia Minkin,
Son Pham,
Paul Medvedev
Abstract:
Motivation: De Bruijn graphs have been proposed as a data structure to facilitate the analysis of related whole genome sequences, in both population and comparative genomic settings. However, current approaches do not scale well to many genomes of large size (such as mammalian genomes). Results: In this paper, we present TwoPaCo, a simple and scalable low memory algorithm for the direct construction of the compacted de Bruijn graph from a set of complete genomes. We demonstrate that it can construct the graph for 100 simulated human genomes in less than a day and eight real primates in less than two hours, on a typical shared-memory machine. We believe that this progress will enable novel biological analyses of hundreds of mammalian-sized genomes. Availability: Our code and data are available for download from github.com/medvedevgroup/TwoPaCo Contact: [email protected]
Submitted 18 February, 2016;
originally announced February 2016.
-
Error Bounds for Parametric Polynomial Systems with Applications to Higher-Order Stability Analysis and Convergence Rates
Authors:
G. Li,
B. S. Mordukhovich,
T. T. A. Nghia,
T. S. Pham
Abstract:
The paper addresses parametric inequality systems described by polynomial functions in finite dimensions, where state-dependent infinite parameter sets are given by finitely many polynomial inequalities and equalities. Such systems can be viewed, in particular, as solution sets to problems of generalized semi-infinite programming with polynomial data. Exploiting the imposed polynomial structure together with powerful tools of variational analysis and semialgebraic geometry, we establish a far-going extension of the Łojasiewicz gradient inequality to the general nonsmooth class of supremum marginal functions as well as higher-order (Hölder type) local error bounds results with explicitly calculated exponents. The obtained results are applied to higher-order quantitative stability analysis for various classes of optimization problems including generalized semi-infinite programming with polynomial data, optimization of real polynomials under polynomial matrix inequality constraints, and polynomial second-order cone programming. Other applications provide explicit convergence rate estimates for the cyclic projection algorithm to find common points of convex sets described by matrix polynomial inequalities and for the asymptotic convergence of trajectories of subgradient dynamical systems in semialgebraic settings.
Submitted 12 September, 2015;
originally announced September 2015.
-
Semidefinite approximations of the polynomial abscissa
Authors:
Roxana Heß,
Didier Henrion,
Jean-Bernard Lasserre,
Tien Son Pham
Abstract:
Given a univariate polynomial, its abscissa is the maximum real part of its roots. The abscissa arises naturally when controlling linear differential equations. As a function of the polynomial coefficients, the abscissa is Hölder continuous, and not locally Lipschitz in general, which is a source of numerical difficulties for designing and optimizing control laws. In this paper we propose simple approximations of the abscissa given by polynomials of fixed degree, and hence controlled complexity. Our approximations are computed by a hierarchy of finite-dimensional convex semidefinite programming problems. When their degree tends to infinity, the polynomial approximations converge in norm to the abscissa, either from above or from below.
Submitted 30 July, 2015;
originally announced July 2015.
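The claim that the abscissa is Hölder continuous but not locally Lipschitz can be illustrated on a monic quadratic. The sketch below is not from the paper: the function name `quadratic_abscissa` and the choice of the family x^2 + c are assumptions made for illustration only. Perturbing the constant coefficient of x^2 by 1e-8 moves the abscissa by 1e-4, a square-root rather than linear response.

```python
import cmath


def quadratic_abscissa(b: float, c: float) -> float:
    """Largest real part of the roots of the monic quadratic x^2 + b*x + c."""
    disc = cmath.sqrt(b * b - 4 * c)  # complex square root handles negative discriminants
    roots = ((-b + disc) / 2, (-b - disc) / 2)
    return max(root.real for root in roots)


# x^2 has a double root at 0; x^2 - eps has roots +sqrt(eps) and -sqrt(eps).
eps = 1e-8
shift = quadratic_abscissa(0.0, -eps) - quadratic_abscissa(0.0, 0.0)
print(shift / eps)  # the ratio grows without bound as eps shrinks
```

Shrinking `eps` further makes the printed ratio larger, which is the behaviour that rules out a local Lipschitz bound at the double root.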
-
Outgassing History and Escape of the Martian Atmosphere and Water Inventory
Authors:
H. Lammer,
E. Chassefière,
Ö. Karatekin,
A. Morschhauser,
P. B. Niles,
O. Mousis,
P. Odert,
U. V. Möstl,
D. Breuer,
V. Dehant,
M. Grott,
H. Gröller,
E. Hauber,
L. B. S. Pham
Abstract:
The evolution and escape of the martian atmosphere and the planet's water inventory can be separated into an early and late evolutionary epoch. The first epoch started from the planet's origin and lasted $\sim$500 Myr. Because of the high EUV flux of the young Sun and Mars' low gravity it was accompanied by hydrodynamic blow-off of hydrogen and strong thermal escape rates of dragged heavier species such as O and C atoms. After the main part of the protoatmosphere was lost, impact-related volatiles and mantle outgassing may have resulted in accumulation of a secondary CO$_2$ atmosphere of a few tens to a few hundred mbar around $\sim$4--4.3 Gyr ago. The evolution of the atmospheric surface pressure and water inventory of such a secondary atmosphere during the second epoch which lasted from the end of the Noachian until today was most likely determined by a complex interplay of various nonthermal atmospheric escape processes, impacts, carbonate precipitation, and serpentinization during the Hesperian and Amazonian epochs which led to the present day surface pressure.
Submitted 22 June, 2015;
originally announced June 2015.
-
Convergent Semidefinite Programming Relaxations for Global Bilevel Polynomial Optimization Problems
Authors:
V. Jeyakumar,
J. B. Lasserre,
G. Li,
T. S. Pham
Abstract:
In this paper, we consider a bilevel polynomial optimization problem where the objective and the constraint functions of both the upper and the lower level problems are polynomials. We present methods for finding its global minimizers and global minimum using a sequence of semidefinite programming (SDP) relaxations and provide convergence results for the methods. Our scheme for problems with a convex lower-level problem involves solving a transformed equivalent single-level problem by a sequence of SDP relaxations; whereas our approach for general problems involving a non-convex polynomial lower-level problem solves a sequence of approximation problems via another sequence of SDP relaxations.
Submitted 13 January, 2016; v1 submitted 5 June, 2015;
originally announced June 2015.
-
Łojasiewicz-type inequalities with explicit exponents for the largest eigenvalue function of real symmetric polynomial matrices
Authors:
Si Tiep Dinh,
Tien Son Pham
Abstract:
Let $F(x) := (f_{ij}(x))_{i,j=1,\ldots,p},$ be a real symmetric polynomial matrix of order $p$ and let $f(x)$ be the largest eigenvalue function of the matrix $F(x).$ We denote by ${\partial}^\circ f(x)$ the Clarke subdifferential of $f$ at $x.$ In this paper, we first give the following {\em nonsmooth} version of Łojasiewicz gradient inequality for the function $f$ with an explicit exponent: For any $\bar x\in \Bbb R^n$ there exist $c > 0$ and $ε > 0$ such that we have for all $\|x - \bar{x}\| < ε,$ \begin{equation*} \inf \{ \| w \| \ : \ w \in {\partial}^\circ f(x) \} \ \ge \ c\, |f(x) - f(\bar x)|^{1 - \frac{1}{\mathscr{R}(2n+p(n+1),d+3)}}, \end{equation*} where $d:=\max_{i,j = 1, \ldots, p}\deg f_{i j}$ and $\mathscr{R}$ is a function introduced by D'Acunto and Kurdyka: $\mathscr{R}(n, d) := d(3d - 3)^{n-1}$ if $d \ge 2$ and $\mathscr{R}(n, d) := 1$ if $d = 1.$ Then we establish error bounds with explicitly determined exponents, local and global, for the largest eigenvalue function $f(x)$ of the matrix $F(x)$.
Submitted 4 January, 2016; v1 submitted 7 January, 2015;
originally announced January 2015.
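To make the explicit exponent concrete, the formula quoted in the abstract can be evaluated with exact rational arithmetic. This is only a sketch of the arithmetic: the function names `dacunto_kurdyka_r` and `largest_eigenvalue_exponent` are hypothetical, and the code simply transcribes $\mathscr{R}(n, d)$ and the exponent $1 - 1/\mathscr{R}(2n+p(n+1), d+3)$ as stated above.

```python
from fractions import Fraction


def dacunto_kurdyka_r(n: int, d: int) -> int:
    """R(n, d) = d * (3d - 3)^(n - 1) for d >= 2, and 1 for d = 1."""
    if d == 1:
        return 1
    return d * (3 * d - 3) ** (n - 1)


def largest_eigenvalue_exponent(n: int, p: int, d: int) -> Fraction:
    """Exponent 1 - 1/R(2n + p(n + 1), d + 3) from the inequality in the abstract."""
    return 1 - Fraction(1, dacunto_kurdyka_r(2 * n + p * (n + 1), d + 3))


# Smallest case: one variable, a 1x1 matrix, entries of degree 1.
print(largest_eigenvalue_exponent(n=1, p=1, d=1))
```

Even in this smallest case the exponent is already within 1/2916 of 1, and it approaches 1 very quickly as $n$, $p$, or $d$ grows.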
-
Ripple Down Rules for Question Answering
Authors:
Dat Quoc Nguyen,
Dai Quoc Nguyen,
Son Bao Pham
Abstract:
Recent years have witnessed a new trend of building ontology-based question answering systems. These systems use semantic web information to produce more precise answers to users' queries. However, these systems are mostly designed for English. In this paper, we introduce an ontology-based question answering system named KbQAS which, to the best of our knowledge, is the first one made for Vietnamese. KbQAS employs our question analysis approach that systematically constructs a knowledge base of grammar rules to convert each input question into an intermediate representation element. KbQAS then takes the intermediate representation element with respect to a target ontology and applies concept-matching techniques to return an answer. On a wide range of Vietnamese questions, experimental results show that the performance of KbQAS is promising with accuracies of 84.1% and 82.4% for analyzing input questions and retrieving output answers, respectively. Furthermore, our question analysis approach can easily be applied to new domains and new languages, thus saving time and human effort.
Submitted 4 November, 2015; v1 submitted 12 December, 2014;
originally announced December 2014.
-
A Robust Transformation-Based Learning Approach Using Ripple Down Rules for Part-of-Speech Tagging
Authors:
Dat Quoc Nguyen,
Dai Quoc Nguyen,
Dang Duc Pham,
Son Bao Pham
Abstract:
In this paper, we propose a new approach to construct a system of transformation rules for the Part-of-Speech (POS) tagging task. Our approach is based on an incremental knowledge acquisition method where rules are stored in an exception structure and new rules are only added to correct the errors of existing rules; thus allowing systematic control of the interaction between the rules. Experimental results on 13 languages show that our approach is fast in terms of training time and tagging speed. Furthermore, our approach obtains very competitive accuracy in comparison to state-of-the-art POS and morphological taggers.
Submitted 19 December, 2015; v1 submitted 12 December, 2014;
originally announced December 2014.
-
Sibelia: A scalable and comprehensive synteny block generation tool for closely related microbial genomes
Authors:
Ilya Minkin,
Anand Patel,
Mikhail Kolmogorov,
Nikolay Vyahhi,
Son Pham
Abstract:
Comparing strains within the same microbial species has proven effective in the identification of genes and genomic regions responsible for virulence, as well as in the diagnosis and treatment of infectious diseases. In this paper, we present Sibelia, a tool for finding synteny blocks in multiple closely related microbial genomes using iterative de Bruijn graphs. Unlike most other tools, Sibelia can find synteny blocks that are repeated within genomes as well as blocks shared by multiple genomes. It represents synteny blocks in a hierarchical structure with multiple layers, each representing a different granularity level. Sibelia has been designed to work efficiently with a large number of microbial genomes; it finds synteny blocks in 31 S. aureus genomes within 31 minutes and in 59 E. coli genomes within 107 minutes on a standard desktop. Sibelia software is distributed under the GNU GPL v2 license and is available at: https://github.com/bioinf/Sibelia Sibelia's web-server is available at: http://etool.me/software/sibelia
Submitted 30 July, 2013;
originally announced July 2013.
-
Cerulean: A hybrid assembly using high throughput short and long reads
Authors:
Viraj Deshpande,
Eric DK Fung,
Son Pham,
Vineet Bafna
Abstract:
Genome assembly using high throughput data with short reads, arguably, remains an unresolvable task in repetitive genomes, since when the length of a repeat exceeds the read length, it becomes difficult to unambiguously connect the flanking regions. The emergence of third generation sequencing (Pacific Biosciences) with long reads offers the opportunity to resolve complicated repeats that could not be resolved by the short read data. However, these long reads have a high error rate and it is an uphill task to assemble the genome without using additional high quality short reads. Recently, Koren et al. 2012 proposed an approach to use high quality short read data to correct these long reads and, thus, make the assembly from long reads possible. However, due to the large size of both datasets (short and long reads), error correction of these long reads requires excessively high computational resources, even on small bacterial genomes. In this work, instead of error correction of long reads, we first assemble the short reads and later map these long reads on the assembly graph to resolve repeats.
Contribution: We present a hybrid assembly approach that is both computationally effective and produces high quality assemblies. Our algorithm first operates with a simplified version of the assembly graph consisting only of long contigs and gradually improves the assembly by adding smaller contigs in each iteration. In contrast to the state-of-the-art long reads error correction technique, which requires high computational resources and long running time on a supercomputer even for bacterial genome datasets, our software can produce comparable assembly using only a standard desktop in a short running time.
Submitted 30 July, 2013;
originally announced July 2013.
-
Convergence of the Lasserre Hierarchy of SDP Relaxations for Convex Polynomial Programs without Compactness
Authors:
V. Jeyakumar,
T. S. Pham,
G. Li
Abstract:
The Lasserre hierarchy of semidefinite programming (SDP) relaxations is an effective scheme for finding computationally feasible SDP approximations of polynomial optimization over compact semi-algebraic sets. In this paper, we show that, for convex polynomial optimization, the Lasserre hierarchy with a slightly extended quadratic module always converges asymptotically even in the face of non-compact semi-algebraic feasible sets. We do this by exploiting a coercivity property of convex polynomials that are bounded below. We further establish that the positive definiteness of the Hessian of the associated Lagrangian at a saddle-point (rather than the objective function at each minimizer) guarantees finite convergence of the hierarchy. We obtain finite convergence by first establishing a new sum-of-squares polynomial representation of convex polynomials over convex semi-algebraic sets under a saddle-point condition. We finally prove that the existence of a saddle-point of the Lagrangian for a convex polynomial program is also necessary for the hierarchy to have finite convergence.
Submitted 27 June, 2013;
originally announced June 2013.
-
Efficient algorithms for robust recovery of images from compressed data
Authors:
Duc Son Pham,
Svetha Venkatesh
Abstract:
Compressed sensing (CS) is an important theory for sub-Nyquist sampling and recovery of compressible data. Recently, it has been extended by Pham and Venkatesh to cope with the case where corruption to the CS data is modeled as impulsive noise. The new formulation, termed robust CS, combines robust statistics and CS into a single framework to suppress outliers in the CS recovery. To solve the newly formulated robust CS problem, Pham and Venkatesh suggested a scheme that iteratively solves a number of CS problems, the solutions of which converge to the true robust compressed sensing solution. However, this scheme is rather inefficient as it has to use existing CS solvers as a proxy. To overcome this limitation of the original robust CS algorithm, we propose in this paper to solve the robust CS problem directly and derive more computationally efficient algorithms by following the latest advances in large-scale convex optimization for non-smooth regularization. Furthermore, we also extend the robust CS formulation to various settings, including additional affine constraints, $\ell_1$-norm loss function, mixed-norm regularization, and multi-tasking, so as to further improve robust CS. We also derive simple but effective algorithms to solve these extensions. We demonstrate that the new algorithms are computationally much more efficient than the original robust CS formulation, and effectively solve more sophisticated extensions where the original methods simply cannot. We demonstrate the usefulness of the extensions on several CS imaging tasks.
Submitted 26 November, 2012;
originally announced November 2012.
-
ConeRANK: Ranking as Learning Generalized Inequalities
Authors:
Truyen T. Tran,
Duc Son Pham
Abstract:
We propose a new data mining approach to ranking documents based on the concept of cone-based generalized inequalities between vectors. A partial ordering between two vectors is made with respect to a proper cone and thus learning the preferences is formulated as learning proper cones. A pairwise learning-to-rank algorithm (ConeRank) is proposed to learn a non-negative subspace, formulated as a polyhedral cone, over document-pair differences. The algorithm is regularized by controlling the `volume' of the cone. The experimental studies on the latest and largest ranking dataset LETOR 4.0 show that ConeRank is competitive against other recent ranking approaches.
Submitted 18 June, 2012;
originally announced June 2012.
-
Distribution of PageRank Mass Among Principle Components of the Web
Authors:
Konstantin Avrachenkov,
Nelly Litvak,
Kim Son Pham
Abstract:
We study the PageRank mass of principal components in a bow-tie Web Graph, as a function of the damping factor c. Using a singular perturbation approach, we show that the PageRank share of IN and SCC components remains high even for very large values of the damping factor, in spite of the fact that it drops to zero when c goes to one. However, a detailed study of the OUT component reveals the presence of ``dead-ends'' (small groups of pages linking only to each other) that receive an unfairly high ranking when c is close to one. We argue that this problem can be mitigated by choosing c as small as 1/2.
Submitted 13 September, 2007;
originally announced September 2007.
-
A singular perturbation approach for choosing PageRank damping factor
Authors:
Konstantin Avrachenkov,
Nelly Litvak,
Kim Son Pham
Abstract:
The choice of the PageRank damping factor is not evident. Google's choice of the value c=0.85 was a compromise between the true reflection of the Web structure and numerical efficiency. However, the Markov random walk on the original Web Graph does not reflect the importance of the pages because it is absorbed in dead ends. Thus, the damping factor is needed not only for speeding up the computations but also for establishing a fair ranking of pages. In this paper, we propose new criteria for choosing the damping factor, based on the ergodic structure of the Web Graph and probability flows. Specifically, we require that the core component receives a fair share of the PageRank mass. Using a singular perturbation approach, we conclude that the value c=0.85 is too high and suggest that the damping factor should be chosen around 1/2. As a by-product, we describe the ergodic structure of the OUT component of the Web Graph in detail. Our analytical results are confirmed by experiments on two large samples of the Web Graph.
Submitted 4 December, 2006;
originally announced December 2006.