Search | arXiv e-print repository

Model-Based Multiple Instance Learning

Authors: Ba-Ngu Vo, Dinh Phung, Quang N. Tran, Ba-Tuong Vo

Abstract: While Multiple Instance (MI) data are point patterns -- sets or multi-sets of unordered points -- appropriate statistical point pattern models have not been used in MI learning. This article proposes a framework for model-based MI learning using point process theory. Likelihood functions for point pattern data derived from point process theory enable principled yet conceptually transparent extensions of learning tasks, such as classification, novelty detection and clustering, to point pattern data. Furthermore, tractable point pattern models as well as solutions for learning and decision making from point pattern data are developed.

Submitted 13 August, 2017; v1 submitted 6 March, 2017; originally announced March 2017.

Comments: 16 pages, 15 figures

arXiv:1702.02262 [pdf, other]

Clustering For Point Pattern Data

Authors: Quang N. Tran, Ba-Ngu Vo, Dinh Phung, Ba-Tuong Vo

Abstract: Clustering is one of the most common unsupervised learning tasks in machine learning and data mining. Clustering algorithms have been used in a plethora of applications across several scientific fields. However, there has been limited research in the clustering of point patterns - sets or multi-sets of unordered elements - that are found in numerous applications and data sources. In this paper, we propose two approaches for clustering point patterns. The first is a non-parametric method based on novel distances for sets. The second is a model-based approach, formulated via random finite set theory, and solved by the Expectation-Maximization algorithm. Numerical experiments show that the proposed methods perform well on both simulated and real data.

Submitted 7 February, 2017; originally announced February 2017.

Comments: Preprint: 23rd Int. Conf. Pattern Recognition (ICPR). Cancun, Mexico, December 2016

arXiv:1701.08473 [pdf, other]

Model-based Classification and Novelty Detection For Point Pattern Data

Authors: Ba-Ngu Vo, Quang N. Tran, Dinh Phung, Ba-Tuong Vo

Abstract: Point patterns are sets or multi-sets of unordered elements that can be found in numerous data sources. However, in data analysis tasks such as classification and novelty detection, appropriate statistical models for point pattern data have not received much attention. This paper proposes the modelling of point pattern data via random finite sets (RFS). In particular, we propose appropriate likelihood functions, and a maximum likelihood estimator for learning a tractable family of RFS models. In novelty detection, we propose novel ranking functions based on RFS models, which substantially improve performance.

Submitted 7 February, 2017; v1 submitted 29 January, 2017; originally announced January 2017.

Comments: Preprint: 23rd Int. Conf. Pattern Recognition (ICPR). Cancun, Mexico, December 2016

arXiv:1612.07850 [pdf, other]

doi 10.1109/ICARCV.2016.7838683

Automatic Interpretation of Unordered Point Cloud Data for UAV Navigation in Construction

Authors: M. D. Phung, C. H. Quach, D. T. Chu, N. Q. Nguyen, T. H. Dinh, Q. P. Ha

Abstract: The objective of this work is to develop a data processing system that can automatically generate waypoints for navigation of an unmanned aerial vehicle (UAV) to inspect surfaces of structures like buildings and bridges. The input includes data recorded by two 2D laser scanners, orthogonally mounted on the UAV, and an inertial measurement unit (IMU). To achieve the goal, algorithms are developed to process the data collected. They are separated into three major groups: (i) the data registration and filtering to generate a 3D model of the structure and control the density of point clouds for data completeness enhancement; (ii) the surface and obstacle detection to assist the UAV in monitoring tasks; and (iii) the waypoint generation to set the flight path. Experiments on different data sets show that the developed system is able to reconstruct a 3D point cloud of the structure, extract its surfaces and objects, and generate waypoints for the UAV to accomplish inspection tasks.

Submitted 12 February, 2017; v1 submitted 22 December, 2016; originally announced December 2016.

Comments: In The 14th International Conference on Control, Automation, Robotics and Vision, ICARCV 2016

arXiv:1612.01812 [pdf, other]

Control Matching via Discharge Code Sequences

Authors: Dang Nguyen, Wei Luo, Dinh Phung, Svetha Venkatesh

Abstract: In this paper, we consider the patient similarity matching problem over a cancer cohort of more than 220,000 patients. Our approach first leverages the Word2Vec framework to embed ICD codes into a vector-valued representation. We then propose a sequential algorithm for case-control matching on this representation space of diagnosis codes. The novel practice of applying sequential matching on the vector representation lifted the matching accuracy measured through multiple clinical outcomes. We report the results on a large-scale dataset to demonstrate the effectiveness of our method. For such a large dataset where most clinical information has been codified, the new method is particularly relevant.

Submitted 1 December, 2016; originally announced December 2016.

Comments: 5 pages

arXiv:1612.01034 [pdf]

doi 10.1109/AIM.2013.6584297

Localization of networked robot systems subject to random delay and packet loss

Authors: Manh Duong Phung, Thi Thanh Van Nguyen, Thuan Hoang Tran, Quang Vinh Tran

Abstract: This paper deals with the localization problem of a mobile robot subject to communication delay and packet loss. The delay and loss may appear in a random fashion in both control inputs and observation measurements. A unified state-space representation is constructed to describe these mixed uncertainties. Based on it, the optimal linear estimator is developed. The main idea is the derivation of a relevance factor to incorporate delayed measurements into the current estimate. The estimator is then extended to nonlinear systems. The performance of this method is tested in MATLAB simulations and in experiments on a real robot system. The good localization results demonstrate the efficiency of the method for localization of networked mobile robots.

Submitted 3 December, 2016; originally announced December 2016.

Comments: In 2013 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM)

arXiv:1611.10075 [pdf, ps, other]

Impulse output rapid stabilization for heat equations

Authors: Kim Dang Phung, Gengsheng Wang, Yashan Xu

Abstract: The main aim of this paper is to provide a new feedback law for the heat equations in a bounded domain $Ω$ with Dirichlet boundary condition. Two constraints will be compulsory: first, the controls are active in a subdomain of $Ω$ and at discrete time points; second, the observations are made in another subdomain and at different discrete time points. Our strategy consists in linking an observation estimate at one time, minimal norm impulse control, approximate inverse source problem and rapid output stabilization.

Submitted 30 November, 2016; originally announced November 2016.

arXiv:1611.09431 [pdf]

doi 10.1109/ICCSCE.2012.6487193

Localization of a unicycle-like mobile robot using LRF and omni-directional camera

Authors: Tran Hiep Dinh, Manh Duong Phung, Thuan Hoang Tran, Quang Vinh Tran

Abstract: This paper addresses the localization problem. The extended Kalman filter (EKF) is employed to localize a unicycle-like mobile robot equipped with a laser range finder (LRF) sensor and an omni-directional camera. The LRF is used to scan the environment, which is described with line segments. The segments are extracted by a modified least square quadratic method in which a dynamic threshold is injected. The camera is employed to determine the robot's orientation. The prediction step of the EKF is performed by extracting parameters from the kinematic model and input signal of the robot. The correction step is conducted with the implementation of a line matching algorithm and the comparison between the line parameters of the local and global maps. In the line matching algorithm, a conversion matrix is introduced to reduce the computation cost. Experiments have been carried out in a real mobile robot system and the results demonstrate the applicability of the method for the purpose of localization.

Submitted 28 November, 2016; originally announced November 2016.

Comments: In 2012 IEEE International Conference on Control System, Computing and Engineering (ICCSCE)

arXiv:1611.09427 [pdf]

doi 10.1109/RIVF.2008.4586369

Easy-setup eye movement recording system for human-computer interaction

Authors: Manh Duong Phung, Quang Vinh Tran, Kenji Hara, Hirohito Inagaki, Masanobu Abe

Abstract: Tracking the movement of human eyes is expected to yield natural and convenient applications based on human-computer interaction (HCI). To implement an effective eye-tracking system, eye movements must be recorded without placing any restriction on the user's behavior or causing user discomfort. This paper describes an eye movement recording system that offers a free-head, simple configuration. It does not require the user to wear anything on her head, and she can move her head freely. Instead of using a computer, the system uses a visual digital signal processor (DSP) camera to detect the position of the eye corner and the center of the pupil, and then calculates the eye movement. Evaluation tests show that the sampling rate of the system can be 300 Hz and the accuracy is about 1.8 degree/s.

Submitted 28 November, 2016; originally announced November 2016.

Comments: In IEEE International Conference on Research, Innovation and Vision for the Future (RIVF), 2008

arXiv:1609.08752 [pdf, other]

Stabilizing Linear Prediction Models using Autoencoder

Authors: Shivapratap Gopakumar, Truyen Tran, Dinh Phung, Svetha Venkatesh

Abstract: To date, the instability of prognostic predictors in a sparse high dimensional model, which hinders their clinical adoption, has received little attention. Stable prediction is often overlooked in favour of performance. Yet, stability prevails as key when adopting models in critical areas such as healthcare. Our study proposes a stabilization scheme by detecting higher order feature correlations. Using a linear model as basis for prediction, we achieve feature stability by regularising latent correlation in features. Latent higher order correlation among features is modelled using an autoencoder network. Stability is enhanced by combining a recent technique that uses a feature graph, and augmenting external unlabelled data for training the autoencoder network. Our experiments are conducted on a heart failure cohort from an Australian hospital. Stability was measured using Consistency index for feature subsets and signal-to-noise ratio for model parameters. Our methods demonstrated significant improvement in feature stability and model estimation stability when compared to baselines.

Submitted 27 September, 2016; originally announced September 2016.

Comments: accepted in ADMA 2016

arXiv:1609.04508 [pdf, other]

Column Networks for Collective Classification

Authors: Trang Pham, Truyen Tran, Dinh Phung, Svetha Venkatesh

Abstract: Relational learning deals with data that are characterized by relational structures. An important task is collective classification, which is to jointly classify networked objects. While it holds great promise to produce better accuracy than non-collective classifiers, collective classification is computationally challenging and has not leveraged the recent breakthroughs of deep learning. We present Column Network (CLN), a novel deep learning model for collective classification in multi-relational domains. CLN has many desirable theoretical properties: (i) it encodes multi-relations between any two instances; (ii) it is deep and compact, allowing complex functions to be approximated at the network level with a small set of free parameters; (iii) local and relational features are learned simultaneously; (iv) long-range, higher-order dependencies between instances are supported naturally; and (v) crucially, learning and inference are efficient, linear in the size of the network and the number of relations. We evaluate CLN on multiple real-world applications: (a) delay prediction in software projects, (b) PubMed Diabetes publication classification and (c) film genre classification. In all applications, CLN demonstrates a higher accuracy than state-of-the-art rivals.

Submitted 28 November, 2016; v1 submitted 15 September, 2016; originally announced September 2016.

Comments: Accepted at AAAI'17

arXiv:1609.00096 [pdf]

doi 10.1109/ICARCV.2014.7064437

Image segmentation based on histogram of depth and an application in driver distraction detection

Authors: Tran Hiep Dinh, Minh Trien Pham, Manh Duong Phung, Duc Manh Nguyen, Van Manh Hoang, Quang Vinh Tran

Abstract: This study proposes an approach to segment human objects from a depth image based on a histogram of depth values. The region of interest is first extracted based on a predefined threshold for histogram regions. A region growing process is then employed to separate multiple human bodies with the same depth interval. Our contribution is the identification of an adaptive growth threshold based on the detected histogram region. To demonstrate the effectiveness of the proposed method, an application in driver distraction detection was introduced. After successfully extracting the driver's position inside the car, we came up with a simple solution to track the driver's motion. By analysing the difference between the initial and current frames, a change of cluster position or depth value in the region of interest that crosses the preset threshold is considered a distracted activity. The experimental results demonstrated the success of the algorithm in detecting, in real time, typical distracted driving activities such as using a phone for calling or texting, adjusting internal devices and drinking.

Submitted 31 August, 2016; originally announced September 2016.

Comments: 6 pages In 13th International Conference on Control Automation Robotics & Vision (ICARCV), 2014

arXiv:1608.04830 [pdf, other]

Outlier Detection on Mixed-Type Data: An Energy-based Approach

Authors: Kien Do, Truyen Tran, Dinh Phung, Svetha Venkatesh

Abstract: Outlier detection amounts to finding data points that differ significantly from the norm. Classic outlier detection methods are largely designed for a single data type such as continuous or discrete. However, real world data is increasingly heterogeneous, where a data point can have both discrete and continuous attributes. Handling mixed-type data in a disciplined way remains a great challenge. In this paper, we propose a new unsupervised outlier detection method for mixed-type data based on Mixed-variate Restricted Boltzmann Machine (Mv.RBM). The Mv.RBM is a principled probabilistic method that models data density. We propose to use free-energy derived from Mv.RBM as outlier score to detect outliers as those data points lying in low density regions. The method is fast to learn and compute, and is scalable to massive datasets. At the same time, the outlier score is identical to data negative log-density up to an additive constant. We evaluate the proposed method on synthetic and real-world datasets and demonstrate that (a) proper handling of mixed types is necessary in outlier detection, and (b) free-energy of Mv.RBM is a powerful and efficient outlier scoring method, which is highly competitive against the state of the art.

Submitted 16 August, 2016; originally announced August 2016.

arXiv:1608.03639 [pdf, other]

Faster Training of Very Deep Networks Via p-Norm Gates

Authors: Trang Pham, Truyen Tran, Dinh Phung, Svetha Venkatesh

Abstract: A major contributing factor to the recent advances in deep neural networks is structural units that let sensory information and gradients propagate easily. Gating is one such structure that acts as a flow control. Gates are employed in many recent state-of-the-art recurrent models such as LSTM and GRU, and feedforward models such as Residual Nets and Highway Networks. This enables learning in very deep networks with hundreds of layers and helps achieve record-breaking results in vision (e.g., ImageNet with Residual Nets) and NLP (e.g., machine translation with GRU). However, there is limited work in analysing the role of gating in the learning process. In this paper, we propose a flexible $p$-norm gating scheme, which allows user-controllable flow and, as a consequence, improves the learning speed. This scheme subsumes other existing gating schemes, including those in GRU, Highway Networks and Residual Nets as special cases. Experiments on large sequence and vector datasets demonstrate that the proposed gating scheme helps improve the learning speed significantly without extra overhead.

Submitted 11 August, 2016; originally announced August 2016.

Comments: To appear in ICPR'16

arXiv:1607.08310 [pdf, other]

Preterm Birth Prediction: Deriving Stable and Interpretable Rules from High Dimensional Data

Authors: Truyen Tran, Wei Luo, Dinh Phung, Jonathan Morris, Kristen Rickard, Svetha Venkatesh

Abstract: Preterm births occur at an alarming rate of 10-15%. Preemies have a higher risk of infant mortality, developmental retardation and long-term disabilities. Predicting preterm birth is difficult, even for the most experienced clinicians. The most well-designed clinical study thus far reaches a modest sensitivity of 18.2-24.2% at specificity of 28.6-33.3%. We take a different approach by exploiting databases of normal hospital operations. Our aims are twofold: (i) to derive an easy-to-use, interpretable prediction rule with quantified uncertainties, and (ii) to construct accurate classifiers for preterm birth prediction. Our approach is to automatically generate and select from hundreds (if not thousands) of possible predictors using stability-aware techniques. Derived from a large database of 15,814 women, our simplified prediction rule with only 10 items has sensitivity of 62.3% at specificity of 81.5%.

Submitted 28 July, 2016; originally announced July 2016.

Comments: Presented at 2016 Machine Learning and Healthcare Conference (MLHC 2016), Los Angeles, CA

arXiv:1606.06793 [pdf, other]

Scalable Semi-supervised Learning with Graph-based Kernel Machine

Authors: Trung Le, Khanh Nguyen, Van Nguyen, Vu Nguyen, Dinh Phung

Abstract: Acquiring labels is often costly, whereas unlabeled data are usually easy to obtain in modern machine learning applications. Semi-supervised learning provides a principled machine learning framework to address such situations, and has been applied successfully in many real-world applications and industries. Nonetheless, most existing semi-supervised learning methods encounter two serious limitations when applied to modern and large-scale datasets: computational burden and memory usage demand. To this end, we present in this paper the Graph-based semi-supervised Kernel Machine (GKM), a method that leverages the generalization ability of kernel-based methods with the geometrical and distributive information formulated through a spectral graph induced from data for semi-supervised learning purpose. Our proposed GKM can be solved directly in the primal form using the Stochastic Gradient Descent method with the ideal convergence rate $O(\frac{1}{T})$. Besides, our formulation is suitable for a wide spectrum of important loss functions in the literature of machine learning (e.g., Hinge, smooth Hinge, Logistic, L1, and ε-insensitive) and smoothness functions (i.e., $l_p(t) = |t|^p$ with $p\ge1$). We further show that the well-known Laplacian Support Vector Machine is a special case of our formulation. We validate our proposed method on several benchmark datasets to demonstrate that GKM is appropriate for large-scale datasets since it is optimal in memory usage and yields superior classification accuracy whilst simultaneously achieving a significant computation speed-up in comparison with the state-of-the-art baselines.

Submitted 5 April, 2017; v1 submitted 21 June, 2016; originally announced June 2016.

Comments: 21 pages

arXiv:1605.09198 [pdf, ps, other]

Some inequalities for partial derivatives on time scales

Authors: Tran Dinh Phung

Abstract: We first prove some weighted inequalities for compositions of functions on time scales which are in turn applied to establish some new dynamic Opial-type inequalities in several variables. Some generalizations and applications to partial differential dynamic equations are also considered.

Submitted 19 May, 2016; originally announced May 2016.

Comments: accepted for publication in AMV

MSC Class: 26D15; 26D10; 26E70

arXiv:1605.01954 [pdf, ps, other]

Observation estimate for kinetic transport equation by diffusion approximation

Authors: Claude Bardos, Kim Dang Phung

Abstract: We study the unique continuation property for the neutron transport equation and for a simplified model of the Fokker-Planck equation in a bounded domain with absorbing boundary condition. An observation estimate is derived. It depends on the smallness of the mean free path and the frequency of the velocity average of the initial data. The proof relies on the well known diffusion approximation under convenience scaling and on basic properties of this diffusion. Eventually we propose a direct proof for the observation at one time of parabolic equations. It is based on the analysis of the heat kernel.

Submitted 6 May, 2016; originally announced May 2016.

arXiv:1605.01116 [pdf, other]

An evaluation of randomized machine learning methods for redundant data: Predicting short and medium-term suicide risk from administrative records and risk assessments

Authors: Thuong Nguyen, Truyen Tran, Shivapratap Gopakumar, Dinh Phung, Svetha Venkatesh

Abstract: Accurate prediction of suicide risk in mental health patients remains an open problem. Existing methods including clinician judgments have acceptable sensitivity, but yield many false positives. Exploiting administrative data has great potential, but the data has high dimensionality and redundancies in the recording processes. We investigate the efficacy of three most effective randomized machine learning techniques (random forests, gradient boosting machines, and deep neural nets with dropout) in predicting suicide risk. Using a cohort of mental health patients from a regional Australian hospital, we compare the predictive performance with popular traditional approaches: clinician judgments based on a checklist, sparse logistic regression and decision trees. The randomized methods demonstrated robustness against data redundancies and superior predictive performance on AUC and F-measure.

Submitted 3 May, 2016; originally announced May 2016.

arXiv:1604.06518 [pdf, ps, other]

Approximation Vector Machines for Large-scale Online Learning

Authors: Trung Le, Tu Dinh Nguyen, Vu Nguyen, Dinh Phung

Abstract: One of the most challenging problems in kernel online learning is to bound the model size and to promote the model sparsity. Sparse models not only improve computation and memory usage, but also enhance the generalization capacity, a principle that concurs with the law of parsimony. However, inappropriate sparsity modeling may also significantly degrade the performance. In this paper, we propose Approximation Vector Machine (AVM), a model that can simultaneously encourage the sparsity and safeguard its risk in compromising the performance. When an incoming instance arrives, we approximate this instance by one of its neighbors whose distance to it is less than a predefined threshold. Our key intuition is that since the newly seen instance is expressed by its nearby neighbor the optimal performance can be analytically formulated and maintained. We develop theoretical foundations to support this intuition and further establish an analysis to characterize the gap between the approximation and optimal solutions. This gap crucially depends on the frequency of approximation and the predefined threshold. We perform the convergence analysis for a wide spectrum of loss functions including Hinge, smooth Hinge, and Logistic for classification task, and $l_1$, $l_2$, and $ε$-insensitive for regression task. We conducted extensive experiments for classification task in batch and online modes, and regression task in online mode over several benchmark datasets. The results show that our proposed AVM achieved a comparable predictive performance with current state-of-the-art methods while simultaneously achieving significant computational speed-up due to the ability of the proposed AVM in maintaining the model size.

Submitted 27 May, 2017; v1 submitted 21 April, 2016; originally announced April 2016.

Comments: 54 pages

arXiv:1603.01359 [pdf, other]

Learning deep representation of multityped objects and tasks

Authors: Truyen Tran, Dinh Phung, Svetha Venkatesh

Abstract: We introduce a deep multitask architecture to integrate multityped representations of multimodal objects. This multitype exposition is less abstract than the multimodal characterization, but more machine-friendly, and thus is more precise to model. For example, an image can be described by multiple visual views, which can be in the forms of bag-of-words (counts) or color/texture histograms (real-valued). At the same time, the image may have several social tags, which are best described using a sparse binary vector. Our deep model takes as input multiple type-specific features, narrows the cross-modality semantic gaps, learns cross-type correlation, and produces a high-level homogeneous representation. At the same time, the model supports heterogeneously typed tasks. We demonstrate the capacity of the model on two applications: social image retrieval and multiple concept prediction. The deep architecture produces more compact representation, naturally integrates multiviews and multimodalities, exploits better side information, and most importantly, performs competitively against baselines.

Submitted 4 March, 2016; originally announced March 2016.

arXiv:1602.05285 [pdf, other]

Choice by Elimination via Deep Neural Networks

Authors: Truyen Tran, Dinh Phung, Svetha Venkatesh

Abstract: We introduce Neural Choice by Elimination, a new framework that integrates deep neural networks into probabilistic sequential choice models for learning to rank. Given a set of items to choose from, the elimination strategy starts with the whole item set and iteratively eliminates the least worthy item in the remaining subset. We prove that the choice by elimination is equivalent to marginalizing out the random Gompertz latent utilities. Coupled with the choice model is the recently introduced Neural Highway Networks for approximating arbitrarily complex rank functions. We evaluate the proposed framework on a large-scale public dataset with over 425K items, drawn from the Yahoo! learning to rank challenge. It is demonstrated that the proposed method is competitive against state-of-the-art learning to rank methods.

Submitted 16 February, 2016; originally announced February 2016.

Comments: PAKDD workshop on Biologically Inspired Techniques for Data Mining (BDM'16)

arXiv:1602.02842 [pdf, other]

Collaborative filtering via sparse Markov random fields

Authors: Truyen Tran, Dinh Phung, Svetha Venkatesh

Abstract: Recommender systems play a central role in providing individualized access to information and services. This paper focuses on collaborative filtering, an approach that exploits the shared structure among like-minded users and similar items. In particular, we focus on a formal probabilistic framework known as Markov random fields (MRF). We address the open problem of structure learning and introduce a sparsity-inducing algorithm to automatically estimate the interaction structures between users and between items. Item-item and user-user correlation networks are obtained as a by-product. Large-scale experiments on movie recommendation and date matching datasets demonstrate the power of the proposed method.

Submitted 8 February, 2016; originally announced February 2016.

arXiv:1602.00357 [pdf, other]

DeepCare: A Deep Dynamic Memory Model for Predictive Medicine

Authors: Trang Pham, Truyen Tran, Dinh Phung, Svetha Venkatesh

Abstract: Personalized predictive medicine necessitates the modeling of patient illness and care processes, which inherently have long-term temporal dependencies. Healthcare observations, recorded in electronic medical records, are episodic and irregular in time. We introduce DeepCare, an end-to-end deep dynamic neural network that reads medical records, stores previous illness history, infers current illness states and predicts future medical outcomes. At the data level, DeepCare represents care episodes as vectors in space, models patient health state trajectories through explicit memory of historical records. Built on Long Short-Term Memory (LSTM), DeepCare introduces time parameterizations to handle irregular timed events by moderating the forgetting and consolidation of memory cells. DeepCare also incorporates medical interventions that change the course of illness and shape future medical risk. Moving up to the health state level, historical and present health states are then aggregated through multiscale temporal pooling, before passing through a neural network that estimates future outcomes. We demonstrate the efficacy of DeepCare for disease progression modeling, intervention recommendation, and future risk prediction. On two important cohorts with heavy social and economic burden -- diabetes and mental health -- the results show improved modeling and risk prediction accuracy.

Submitted 10 April, 2017; v1 submitted 31 January, 2016; originally announced February 2016.

Comments: Accepted at JBI under the new name: "Predicting healthcare trajectories from medical records: A deep learning approach"

arXiv:1512.08008 [pdf, other]

Discovering topic structures of a temporally evolving document corpus

Authors: Adham Beykikhoshk, Ognjen Arandjelovic, Dinh Phung, Svetha Venkatesh

Abstract: In this paper we describe a novel framework for the discovery of the topical content of a data corpus, and the tracking of its complex structural changes across the temporal dimension. In contrast to previous work our model does not impose a prior on the rate at which documents are added to the corpus nor does it adopt the Markovian assumption which overly restricts the type of changes that the model can capture. Our key technical contribution is a framework based on (i) discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes: emergence and disappearance, evolution, splitting, and merging. The power of the proposed framework is demonstrated on two medical literature corpora concerned with the autism spectrum disorder (ASD) and the metabolic syndrome (MetS) -- both increasingly important research subjects with significant social and healthcare consequences. In addition to the collected ASD and metabolic syndrome literature corpora which we made freely available, our contribution also includes an extensive empirical analysis of the proposed framework. We describe a detailed and careful examination of the effects that our algorithm's free parameters have on its output, and discuss the significance of the findings both in the context of the practical application of our algorithm as well as in the context of the existing body of work on temporal topic analysis. Our quantitative analysis is followed by several qualitative case studies highly relevant to the current research on ASD and MetS, on which our algorithm is shown to capture well the actual developments in these fields.

Submitted 25 December, 2015; originally announced December 2015.

Comments: 2015

arXiv:1507.02973 [pdf, other]

doi 10.1145/2808797.2808908

Overcoming data scarcity of Twitter: using tweets as bootstrap with application to autism-related topic content analysis

Authors: Adham Beykikhoshk, Ognjen Arandjelovic, Dinh Phung, Svetha Venkatesh

Abstract: Notwithstanding recent work which has demonstrated the potential of using Twitter messages for content-specific data mining and analysis, the depth of such analysis is inherently limited by the scarcity of data imposed by the 140 character tweet limit. In this paper we describe a novel approach for targeted knowledge exploration which uses tweet content analysis as a preliminary step. This step is used to bootstrap more sophisticated data collection from directly related but much richer content sources. In particular we demonstrate that valuable information can be collected by following URLs included in tweets. We automatically extract content from the corresponding web pages and, treating each web page as a document linked to the original tweet, show how a temporal topic model based on a hierarchical Dirichlet process can be used to track the evolution of a complex topic structure of a Twitter community. Using autism-related tweets we demonstrate that our method is capable of capturing a much more meaningful picture of information exchange than user-chosen hashtags.

Submitted 10 July, 2015; originally announced July 2015.

Comments: IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2015

arXiv:1506.00246 [pdf, other]

doi 10.1007/s13278-015-0261-5

Using Twitter to learn about the autism community

Authors: Adham Beykikhoshk, Ognjen Arandjelovic, Dinh Phung, Svetha Venkatesh, Terry Caelli

Abstract: Considering the rising socio-economic burden of autism spectrum disorder (ASD), timely and evidence-driven public policy decision making and communication of the latest guidelines pertaining to the treatment and management of the disorder is crucial. Yet evidence suggests that policy makers and medical practitioners do not always have a good understanding of the practices and relevant beliefs of ASD-afflicted individuals' carers who often follow questionable recommendations and adopt advice poorly supported by scientific data. The key goal of the present work is to explore the idea that Twitter, as a highly popular platform for information exchange, could be used as a data-mining source to learn about the population affected by ASD -- their behaviour, concerns, needs etc. To this end, using a large data set of over 11 million harvested tweets as the basis for our investigation, we describe a series of experiments which examine a range of linguistic and semantic aspects of messages posted by individuals interested in ASD. Our findings, the first of their nature in the published scientific literature, strongly motivate additional research on this topic and present a methodological basis for further work.

Submitted 31 May, 2015; originally announced June 2015.

Comments: Social Network Analysis and Mining, 2015

arXiv:1503.08972 [pdf, ps, other]

doi 10.1103/PhysRevB.91.115140

Mass-imbalance induced metal-insulator transition in a three-component Hubbard model

Authors: Duong-Bo Nguyen, Duy-Khuong Phung, Van-Nham Phan, Minh-Tien Tran

Abstract: The effects of mass imbalance in a three-component Hubbard model are studied by the dynamical mean-field theory combined with exact diagonalization. The model describes a fermion-fermion mixture of two different particle species with a mass imbalance. One species is two-component fermion particles, and the other is single-component ones. The local interaction between particle species is considered isotropically. It is found that the mass imbalance can drive the mixture from insulator to metal. The insulator-metal transition is a species-selective-like transition of lighter mass particles and occurs only at commensurate particle densities and moderate local interactions. For weak and strong local interactions the mass imbalance does not change the ground state of the mixture.

Submitted 31 March, 2015; originally announced March 2015.

Journal ref: Phys. Rev. B 91, 115140 (2015)

arXiv:1502.02233 [pdf, other]

Hierarchical Dirichlet process for tracking complex topical structure evolution and its application to autism research literature

Authors: Adham Beykikhoshk, Ognjen Arandjelovic, Dinh Phung, Svetha Venkatesh

Abstract: In this paper we describe a novel framework for the discovery of the topical content of a data corpus, and the tracking of its complex structural changes across the temporal dimension. In contrast to previous work our model does not impose a prior on the rate at which documents are added to the corpus nor does it adopt the Markovian assumption which overly restricts the type of changes that the model can capture. Our key technical contribution is a framework based on (i) discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes: emergence and disappearance, evolution, and splitting and merging. The power of the proposed framework is demonstrated on the medical literature corpus concerned with the autism spectrum disorder (ASD) - an increasingly important research subject of significant social and healthcare importance. In addition to the collected ASD literature corpus which we will make freely available, our contributions also include two free online tools we built as aids to ASD researchers. These can be used for semantically meaningful navigation and searching, as well as knowledge discovery from this large and rapidly growing corpus of literature.

Submitted 8 February, 2015; originally announced February 2015.

Comments: In Proc. Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2015

arXiv:1408.1162 [pdf, other]

MCMC for Hierarchical Semi-Markov Conditional Random Fields

Authors: Truyen Tran, Dinh Phung, Svetha Venkatesh, Hung H. Bui

Abstract: Deep architecture such as hierarchical semi-Markov models is an important class of models for nested sequential data. Current exact inference schemes either cost cubic time in sequence length, or exponential time in model depth. These costs are prohibitive for large-scale problems with arbitrary length and depth. In this contribution, we propose a new approximation technique that may have the potential to achieve sub-cubic time complexity in length and linear time complexity in depth, at the cost of some loss of quality. The idea is based on two well-known methods: Gibbs sampling and Rao-Blackwellisation. We provide some simulation-based evaluation of the quality of the RGBS with respect to run time and sequence length.

Submitted 5 August, 2014; originally announced August 2014.

Comments: NIPS'09 Workshop on Deep Learning for Speech Recognition and Related Applications

arXiv:1408.1160 [pdf, other]

Mixed-Variate Restricted Boltzmann Machines

Authors: Truyen Tran, Dinh Phung, Svetha Venkatesh

Abstract: Modern datasets are becoming heterogeneous. To this end, we present in this paper Mixed-Variate Restricted Boltzmann Machines for simultaneously modelling variables of multiple types and modalities, including binary and continuous responses, categorical options, multicategorical choices, ordinal assessment and category-ranked preferences. Dependency among variables is modeled using latent binary variables, each of which can be interpreted as a particular hidden aspect of the data. The proposed model, similar to the standard RBMs, allows fast evaluation of the posterior for the latent variables. Hence, it is naturally suitable for many common tasks including, but not limited to, (a) as a pre-processing step to convert complex input data into a more convenient vectorial representation through the latent posteriors, thereby offering a dimensionality reduction capacity, (b) as a classifier supporting binary, multiclass, multilabel, and label-ranking outputs, or a regression tool for continuous outputs and (c) as a data completion tool for multimodal and heterogeneous data. We evaluate the proposed model on a large-scale dataset using the world opinion survey results on three tasks: feature extraction and visualization, data completion and prediction.

Submitted 5 August, 2014; originally announced August 2014.

Comments: Originally published in Proceedings of ACML'11

arXiv:1408.0055 [pdf, other]

Thurstonian Boltzmann Machines: Learning from Multiple Inequalities

Authors: Truyen Tran, Dinh Phung, Svetha Venkatesh

Abstract: We introduce Thurstonian Boltzmann Machines (TBM), a unified architecture that can naturally incorporate a wide range of data inputs at the same time. Our motivation rests in the Thurstonian view that many discrete data types can be considered as being generated from a subset of underlying latent continuous variables, and in the observation that each realisation of a discrete type imposes certain inequalities on those variables. Thus learning and inference in TBM reduce to making sense of a set of inequalities. Our proposed TBM naturally supports the following types: Gaussian, intervals, censored, binary, categorical, multicategorical, ordinal, (in)-complete rank with and without ties. We demonstrate the versatility and capacity of the proposed model on three applications of very different natures; namely handwritten digit recognition, collaborative filtering and complex social survey analysis.

Submitted 31 July, 2014; originally announced August 2014.

Comments: Proceedings of the 30th International Conference on Machine Learning, Atlanta, Georgia, USA, 2013. JMLR: W&CP volume 28

arXiv:1408.0047 [pdf, other]

Cumulative Restricted Boltzmann Machines for Ordinal Matrix Data Analysis

Authors: Truyen Tran, Dinh Phung, Svetha Venkatesh

Abstract: Ordinal data is omnipresent in almost all multiuser-generated feedback - questionnaires, preferences etc. This paper investigates modelling of ordinal data with Gaussian restricted Boltzmann machines (RBMs). In particular, we present the model architecture, learning and inference procedures for both vector-variate and matrix-variate ordinal data. We show that our model is able to capture the latent opinion profile of citizens around the world, and is competitive against state-of-the-art collaborative filtering techniques on large-scale public datasets. The model thus has the potential to extend application of RBMs to diverse domains such as recommendation systems, product reviews and expert assessments.

Submitted 31 July, 2014; originally announced August 2014.

Comments: JMLR: Workshop and Conference Proceedings 25:1-16, 2012; Asian Conference on Machine Learning

arXiv:1408.0043 [pdf, other]

Learning From Ordered Sets and Applications in Collaborative Ranking

Authors: Truyen Tran, Dinh Phung, Svetha Venkatesh

Abstract: Rankings over sets arise when users choose between groups of items. For example, a group may be of those movies deemed $5$ stars to them, or a customized tour package. It turns out, to model this data type properly, we need to investigate the general combinatorics problem of partitioning a set and ordering the subsets. Here we construct a probabilistic log-linear model over a set of ordered subsets. Inference in this combinatorial space is highly challenging: The space size approaches $(N!/2)6.93145^{N+1}$ as $N$ approaches infinity. We propose a \texttt{split-and-merge} Metropolis-Hastings procedure that can explore the state-space efficiently. For discovering hidden aspects in the data, we enrich the model with latent binary variables so that the posteriors can be efficiently evaluated. Finally, we evaluate the proposed model on large-scale collaborative filtering tasks and demonstrate that it is competitive against state-of-the-art methods.

Submitted 31 July, 2014; originally announced August 2014.

Comments: JMLR: Workshop and Conference Proceedings 25:1-16, 2012, Asian Conference on Machine Learning

arXiv:1407.6432 [pdf, other]

Learning Structured Outputs from Partial Labels using Forest Ensemble

Authors: Truyen Tran, Dinh Phung, Svetha Venkatesh

Abstract: Learning structured outputs with general structures is computationally challenging, except for tree-structured models. Thus we propose an efficient boosting-based algorithm AdaBoost.MRF for this task. The idea is based on the realization that a graph is a superimposition of trees. Different from most existing work, our algorithm can handle partial labelling, and thus is particularly attractive in practice where reliable labels are often sparsely observed. In addition, our method works exclusively on trees and thus is guaranteed to converge. We apply the AdaBoost.MRF algorithm to an indoor video surveillance scenario, where activities are modelled at multiple levels.

Submitted 23 July, 2014; originally announced July 2014.

Comments: Conference version appeared in Truyen et al, AdaBoost.MRF: Boosted Markov random forests and application to multilevel activity recognition. CVPR'06

arXiv:1407.6094 [pdf, other]

Stabilizing Sparse Cox Model using Clinical Structures in Electronic Medical Records

Authors: Shivapratap Gopakumar, Truyen Tran, Dinh Phung, Svetha Venkatesh

Abstract: Stability in clinical prediction models is crucial for transferability between studies, yet has received little attention. The problem is paramount in high dimensional data which invites sparse models with feature selection capability. We introduce an effective method to stabilize sparse Cox model of time-to-events using clinical structures inherent in Electronic Medical Records. Model estimation is stabilized using a feature graph derived from two types of EMR structures: temporal structure of disease and intervention recurrences, and hierarchical structure of medical knowledge and practices. We demonstrate the efficacy of the method in predicting time-to-readmission of heart failure patients. On two stability measures - the Jaccard index and the Consistency index - the use of clinical structures significantly increased feature stability without hurting discriminative power. Our model reported a competitive AUC of 0.64 (95% CIs: [0.58,0.69]) for 6 months prediction.

Submitted 22 July, 2014; originally announced July 2014.

Comments: Submitted to International Workshop on Pattern Recognition for Healthcare Analytics 2014, Sweden. Contains 4 pages, 5 figures

arXiv:1407.6089 [pdf, other]

Learning Rank Functionals: An Empirical Study

Authors: Truyen Tran, Dinh Phung, Svetha Venkatesh

Abstract: Ranking is a key aspect of many applications, such as information retrieval, question answering, ad placement and recommender systems. Learning to rank has the goal of estimating a ranking model automatically from training data. In practical settings, the task often reduces to estimating a rank functional of an object with respect to a query. In this paper, we investigate key issues in designing an effective learning to rank algorithm. These include data representation, the choice of rank functionals, and the design of the loss function so that it is correlated with the rank metrics used in evaluation. For the loss function, we study three techniques: approximating the rank metric by a smooth function, decomposition of the loss into a weighted sum of element-wise losses and into a weighted sum of pairwise losses. We then present derivations of piecewise losses using the theory of high-order Markov chains and Markov random fields. In experiments, we evaluate these design aspects on two tasks: answer ranking in a Social Question Answering site, and Web Information Retrieval.

Submitted 7 February, 2015; v1 submitted 22 July, 2014; originally announced July 2014.

arXiv:1407.6084 [pdf, other]

doi 10.1007/s10115-014-0740-4

Stabilized Sparse Ordinal Regression for Medical Risk Stratification

Authors: Truyen Tran, Dinh Phung, Wei Luo, Svetha Venkatesh

Abstract: The recent wide adoption of Electronic Medical Records (EMR) presents great opportunities and challenges for data mining. The EMR data is largely temporal, often noisy, irregular and high dimensional. This paper constructs a novel ordinal regression framework for predicting medical risk stratification from EMR. First, a conceptual view of EMR as a temporal image is constructed to extract a diverse set of features. Second, ordinal modeling is applied for predicting cumulative or progressive risk. The challenges are building a transparent predictive model that works with a large number of weakly predictive features, and at the same time, is stable against resampling variations. Our solution employs sparsity methods that are stabilized through domain-specific feature interaction networks. We introduce two indices that measure the model stability against data resampling. Feature networks are used to generate two multivariate Gaussian priors with sparse precision matrices (the Laplacian and Random Walk). We apply the framework on a large short-term suicide risk prediction problem and demonstrate that our methods outperform clinicians by a large margin, discover suicide risk factors that conform with mental health knowledge, and produce models with enhanced stability.

Submitted 22 July, 2014; originally announced July 2014.

arXiv:1407.5764 [pdf, other]

Preference Networks: Probabilistic Models for Recommendation Systems

Authors: Tran The Truyen, Dinh Q. Phung, Svetha Venkatesh

Abstract: Recommender systems are important to help users select relevant and personalised information from the massive amounts of data available. We propose a unified framework called Preference Network (PN) that jointly models various types of domain knowledge for the task of recommendation. The PN is a probabilistic model that systematically combines both content-based filtering and collaborative filtering into a single conditional Markov random field. Once estimated, it serves as a probabilistic database that supports various useful queries such as rating prediction and top-$N$ recommendation. To handle the challenging problem of learning large networks of users and items, we employ a simple but effective pseudo-likelihood with regularisation. Experiments on the movie rating data demonstrate the merits of the PN.

Submitted 22 July, 2014; originally announced July 2014.

Comments: In Proc. of 6th Australasian Data Mining Conference (AusDM), Gold Coast, Australia, pages 195--202, 2007

arXiv:1407.5754 [pdf, other]

Tree-based iterated local search for Markov random fields with applications in image analysis

Authors: Truyen Tran, Dinh Phung, Svetha Venkatesh

Abstract: The \emph{maximum a posteriori} (MAP) assignment for general structure Markov random fields (MRFs) is computationally intractable. In this paper, we exploit tree-based methods to efficiently address this problem. Our novel method, named Tree-based Iterated Local Search (T-ILS), takes advantage of the tractability of tree-structures embedded within MRFs to derive strong local search in an ILS framework. The method efficiently explores exponentially large neighborhoods and does so with limited memory without any requirement on the cost functions. We evaluate the T-ILS in a simulation of an Ising model and two real-world problems in computer vision: stereo matching and image denoising. Experimental results demonstrate that our methods are competitive against state-of-the-art rivals with a significant computational gain.

Submitted 22 July, 2014; originally announced July 2014.

arXiv:1401.1974 [pdf, ps, other]

Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts

Authors: Vu Nguyen, Dinh Phung, XuanLong Nguyen, Svetha Venkatesh, Hung Hai Bui

Abstract: We present a Bayesian nonparametric framework for multilevel clustering which utilizes group-level context information to simultaneously discover low-dimensional structures of the group contents and partition groups into clusters. Using the Dirichlet process as the building block, our model constructs a product base-measure with a nested structure to accommodate content and context observations at multiple levels. The proposed model possesses properties that link the nested Dirichlet processes (nDP) and the Dirichlet process mixture models (DPM) in an interesting way: integrating out all contents results in the DPM over contexts, whereas integrating out group-specific contexts results in the nDP mixture over content variables. We provide a Polya-urn view of the model and an efficient collapsed Gibbs inference procedure. Extensive experiments on real-world datasets demonstrate the advantage of utilizing context information via our model in both text and image domains.

Submitted 28 January, 2014; v1 submitted 9 January, 2014; originally announced January 2014.

Comments: Full version of ICML 2014

arXiv:1312.2372 [pdf, ps, other]

doi 10.1109/TSP.2014.2364014

Labeled Random Finite Sets and the Bayes Multi-Target Tracking Filter

Authors: B. -N. Vo, B. -T. Vo, D. Phung

Abstract: We present an efficient numerical implementation of the $δ$-Generalized Labeled Multi-Bernoulli multi-target tracking filter. Each iteration of this filter involves an update operation and a prediction operation, both of which result in weighted sums of multi-target exponentials with an intractably large number of terms. To truncate these sums, the ranked assignment and K-th shortest path algorithms are used in the update and prediction, respectively, to determine the most significant terms without exhaustively computing all of the terms. In addition, using tools derived from the same framework, such as probability hypothesis density filtering, we present inexpensive look-ahead strategies to reduce the number of computations. Characterization of the $L_{1}$-error in the multi-target density arising from the truncation is presented.

Submitted 28 February, 2017; v1 submitted 9 December, 2013; originally announced December 2013.

arXiv:1210.4855 [pdf]

A Slice Sampler for Restricted Hierarchical Beta Process with Applications to Shared Subspace Learning

Authors: Sunil Kumar Gupta, Dinh Q. Phung, Svetha Venkatesh

Abstract: The hierarchical beta process has found interesting applications in recent years. In this paper we present a modified hierarchical beta process prior with applications to hierarchical modeling of multiple data sources. The novel use of the prior over a hierarchical factor model allows factors to be shared across different sources. We derive a slice sampler for this model, enabling tractable inference even when the likelihood and the prior over parameters are non-conjugate. This allows the application of the model in much wider contexts without restrictions. We present two different data generative models: a linear Gaussian-Gaussian model for real valued data and a linear Poisson-gamma model for count data. Encouraging transfer learning results are shown for two real world applications: text modeling and content based image retrieval.

Submitted 16 October, 2012; originally announced October 2012.

Comments: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012)

Report number: UAI-P-2012-PG-316-325

arXiv:1205.2611 [pdf]

Ordinal Boltzmann Machines for Collaborative Filtering

Authors: Tran The Truyen, Dinh Q. Phung, Svetha Venkatesh

Abstract: Collaborative filtering is an effective recommendation technique wherein the preference of an individual can potentially be predicted based on preferences of other members. Early algorithms often relied on the strong locality in the preference data, that is, it is enough to predict the preference of a user on a particular item based on a small subset of other users with similar tastes or of other items with similar properties. More recently, dimensionality reduction techniques have proved to be equally competitive, and these are based on the co-occurrence patterns rather than locality. This paper explores and extends a probabilistic model known as the Boltzmann Machine for collaborative filtering tasks. It seamlessly integrates both the similarity and co-occurrence in a principled manner. In particular, we study parameterisation options to deal with the ordinal nature of the preferences, and propose a joint modelling of both the user-based and item-based processes. Experiments on moderate and large-scale movie recommendation show that our framework rivals existing well-known methods.

Submitted 9 May, 2012; originally announced May 2012.

Comments: Appears in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI2009)

Report number: UAI-P-2009-PG-548-556

arXiv:1109.3863 [pdf, ps, other]

An observability for parabolic equations from a measurable set in time

Authors: Kim Dang Phung, Gengsheng Wang

Abstract: This paper presents a new observability estimate for parabolic equations in $Ω\times(0,T)$, where $Ω$ is a convex domain. The observation region is restricted over a product set of an open nonempty subset of $Ω$ and a subset of positive measure in $(0,T)$. This estimate is derived with the aid of a quantitative unique continuation at one point in time. Applications to the bang-bang property for norm and time optimal control problems are provided.

Submitted 18 September, 2011; originally announced September 2011.

arXiv:1102.2712 [pdf, ps, other]

Energy decay for Maxwell's equations with Ohm's law on partially cubic domains

Authors: Kim Dang Phung

Abstract: We prove a polynomial energy decay for Maxwell's equations with Ohm's law on partially cubic domains with trapped rays.

Submitted 14 February, 2011; originally announced February 2011.

arXiv:1009.2009 [pdf, ps, other]

Hierarchical Semi-Markov Conditional Random Fields for Recursive Sequential Data

Authors: Tran The Truyen, Dinh Q. Phung, Hung H. Bui, Svetha Venkatesh

Abstract: Inspired by the hierarchical hidden Markov models (HHMM), we present the hierarchical semi-Markov conditional random field (HSCRF), a generalisation of embedded undirected Markov chains to model complex hierarchical, nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. We demonstrate the HSCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. We show that the HSCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases.

Submitted 10 September, 2010; originally announced September 2010.

Comments: 56 pages, short version presented at NIPS'08

arXiv:1009.1690 [pdf, ps, other]

Probabilistic Models over Ordered Partitions with Application in Learning to Rank

Authors: Tran The Truyen, Dinh Q. Phung, Svetha Venkatesh

Abstract: This paper addresses the general problem of modelling and learning rank data with ties. We propose a probabilistic generative model that models the process as permutations over partitions. This results in a super-exponential combinatorial state space with unknown numbers of partitions and unknown ordering among them. We approach the problem from the discrete choice theory, where subsets are chosen in a stagewise manner, reducing the state space at each stage significantly. Further, we show that with suitable parameterisation, we can still learn the models in linear time. We evaluate the proposed models on the problem of learning to rank with the data from the recently held Yahoo! challenge, and demonstrate that the models are competitive against well-known rivals.

Submitted 4 October, 2010; v1 submitted 9 September, 2010; originally announced September 2010.

Comments: 19 pages, 2 figures

arXiv:0912.2202 [pdf, ps, other]

Waves, damped wave and observation

Authors: Kim Dang Phung

Abstract: We consider the wave equation in a bounded domain (possibly convex). Two kinds of inequality are described when trapped rays occur. Applications to control theory are given. First, we link such estimates with the damped wave equation and its decay rate. Next, we describe the design of an approximate control function by an iterative time reversal method.

Submitted 11 December, 2009; originally announced December 2009.

Comments: 6 figures, French-Chinese Summer Institute on Applied Mathematics, references are updated

arXiv:math/0512331 [pdf, ps, other]

The cost of approximate controllability for semilinear heat equations in one space dimension

Authors: Kim Dang Phung

Abstract: This paper deals with the approximate controllability for the semilinear heat equation in one space dimension. Our aim is to provide an estimate of the cost of the control.

Submitted 14 December, 2005; originally announced December 2005.

Showing 151–200 of 201 results for author: Phung, D