-
Effective Clustering on Large Attributed Bipartite Graphs
Authors:
Renchi Yang,
Yidu Wu,
Xiaoyang Lin,
Qichen Wang,
Tsz Nam Chan,
Jieming Shi
Abstract:
Attributed bipartite graphs (ABGs) are an expressive data model for describing the interactions between two sets of heterogeneous nodes that are associated with rich attributes, such as customer-product purchase networks and author-paper authorship graphs. Partitioning the target node set in such graphs into k disjoint clusters (referred to as k-ABGC) finds widespread use in various domains, including social network analysis, recommendation systems, information retrieval, and bioinformatics. However, the majority of existing solutions towards k-ABGC either overlook attribute information or fail to capture bipartite graph structures accurately, engendering severely compromised result quality. The severity of these issues is accentuated in real ABGs, which often encompass millions of nodes and a sheer volume of attribute data, rendering effective k-ABGC over such graphs highly challenging.
In this paper, we propose TPO, an effective and efficient approach to k-ABGC that achieves superb clustering performance on multiple real datasets. TPO obtains high clustering quality through two major contributions: (i) a novel formulation and transformation of the k-ABGC problem based on multi-scale attribute affinity specialized for capturing attribute affinities between nodes with the consideration of their multi-hop connections in ABGs, and (ii) a highly efficient solver that includes a suite of carefully-crafted optimizations for sidestepping explicit affinity matrix construction and facilitating faster convergence. Extensive experiments, comparing TPO against 19 baselines over 5 real ABGs, showcase the superior clustering quality of TPO measured against ground-truth labels. Moreover, compared to the state of the art, TPO is often more than 40x faster over both small and large ABGs.
Submitted 20 May, 2024;
originally announced May 2024.
-
Treatment Outcome Prediction for Intracerebral Hemorrhage via Generative Prognostic Model with Imaging and Tabular Data
Authors:
Wenao Ma,
Cheng Chen,
Jill Abrigo,
Calvin Hoi-Kwan Mak,
Yuqi Gong,
Nga Yan Chan,
Chu Han,
Zaiyi Liu,
Qi Dou
Abstract:
Intracerebral hemorrhage (ICH) is the second most common and deadliest form of stroke. Despite medical advances, predicting treatment outcomes for ICH remains a challenge. This paper proposes a novel prognostic model that utilizes both imaging and tabular data to predict treatment outcome for ICH. Our model is trained on observational data collected from non-randomized controlled trials, providing reliable predictions of treatment success. Specifically, we propose to employ a variational autoencoder model to generate a low-dimensional prognostic score, which can effectively address the selection bias resulting from the non-randomized controlled trials. Importantly, we develop a variational distributions combination module that combines the information from imaging data, non-imaging clinical data, and treatment assignment to accurately generate the prognostic score. We conducted extensive experiments on a real-world clinical dataset of intracerebral hemorrhage. Our proposed method demonstrates a substantial improvement in treatment outcome prediction compared to existing state-of-the-art approaches. Code is available at https://github.com/med-air/TOP-GPM
Submitted 24 July, 2023;
originally announced July 2023.
-
Opti Code Pro: A Heuristic Search-based Approach to Code Refactoring
Authors:
Sourena Khanzadeh,
Samad Alias Nyein Chan,
Richard Valenzano,
Manar Alalfi
Abstract:
This paper presents an approach that evaluates best-first search methods for code refactoring. The motivation for code refactoring could be to improve the design, structure, or implementation of an existing program without changing its functionality. To solve a very specific problem of coupling and cohesion, we propose using heuristic search-based techniques on an approximation of the full code refactoring problem, to guide the refactoring process toward solutions that have high cohesion and low coupling. We evaluated our approach by providing demonstrative examples of the effectiveness of this approach on random state problems and created a tool to implement the algorithm on Java projects.
Submitted 12 May, 2023;
originally announced May 2023.
-
Diffusion Model based Semi-supervised Learning on Brain Hemorrhage Images for Efficient Midline Shift Quantification
Authors:
Shizhan Gong,
Cheng Chen,
Yuqi Gong,
Nga Yan Chan,
Wenao Ma,
Calvin Hoi-Kwan Mak,
Jill Abrigo,
Qi Dou
Abstract:
Brain midline shift (MLS) is one of the most critical factors to be considered for clinical diagnosis and treatment decision-making for intracranial hemorrhage. Existing computational methods on MLS quantification not only require intensive labeling in millimeter-level measurement but also suffer from poor performance due to their dependence on specific landmarks or simplified anatomical assumptions. In this paper, we propose a novel semi-supervised framework to accurately measure the scale of MLS from head CT scans. We formulate the MLS measurement task as a deformation estimation problem and solve it using a few MLS slices with sparse labels. Meanwhile, with the help of diffusion models, we are able to use a great number of unlabeled MLS data and 2793 non-MLS cases for representation learning and regularization. The extracted representation reflects how the image is different from a non-MLS image and regularization serves an important role in the sparse-to-dense refinement of the deformation field. Our experiment on a real clinical brain hemorrhage dataset has achieved state-of-the-art performance and can generate interpretable deformation fields.
Submitted 1 January, 2023;
originally announced January 2023.
-
Attention-Guided Progressive Neural Texture Fusion for High Dynamic Range Image Restoration
Authors:
Jie Chen,
Zaifeng Yang,
Tsz Nam Chan,
Hui Li,
Junhui Hou,
Lap-Pui Chau
Abstract:
High Dynamic Range (HDR) imaging via multi-exposure fusion is an important task for most modern imaging platforms. In spite of recent developments in both hardware and algorithm innovations, challenges remain over content association ambiguities caused by saturation, motion, and various artifacts introduced during multi-exposure fusion such as ghosting, noise, and blur. In this work, we propose an Attention-guided Progressive Neural Texture Fusion (APNT-Fusion) HDR restoration model which aims to address these issues within one framework. An efficient two-stream structure is proposed which separately focuses on texture feature transfer over saturated regions and multi-exposure tonal and texture feature fusion. A neural feature transfer mechanism is proposed which establishes spatial correspondence between different exposures based on multi-scale VGG features in the masked saturated HDR domain for discriminative contextual clues over the ambiguous image areas. A progressive texture blending module is designed to blend the encoded two-stream features in a multi-scale and progressive manner. In addition, we introduce several novel attention mechanisms, i.e., the motion attention module detects and suppresses the content discrepancies among the reference images; the saturation attention module facilitates differentiating the misalignment caused by saturation from those caused by motion; and the scale attention module ensures texture blending consistency between different coder/decoder scales. We carry out comprehensive qualitative and quantitative evaluations and ablation studies, which validate that these novel modules work coherently under the same framework and outperform state-of-the-art methods.
Submitted 13 July, 2021;
originally announced July 2021.
-
Securing Isosceles Triangular Formations under Heterogeneous Sensing and Mixed Constraints
Authors:
Nelson P. K. Chan,
Bayu Jayawardhana,
Hector Garcia de Marina
Abstract:
This paper focuses on securing a triangular shape (up to translation) for a team of three mobile robots that uses heterogeneous sensing mechanisms. Based on the available local information, each robot employs the popular gradient-based control law to attain the assigned individual task(s). In the current work, robots are assigned either distance and signed area task(s) or bearing task(s). We provide a sufficient condition on the gain ratio $R_{\text{Ad}}$ between the signed area and the distance control term such that the desired formation shape, an isosceles triangle, is reached from all feasible starting positions. Numerical simulations are provided to support the theoretical analyses.
Submitted 2 January, 2021;
originally announced January 2021.
-
Gradient Descent over Metagrammars for Syntax-Guided Synthesis
Authors:
Nicolas Chan,
Elizabeth Polgreen,
Sanjit A. Seshia
Abstract:
The performance of a syntax-guided synthesis algorithm is highly dependent on the provision of a good syntactic template, or grammar. Provision of such a template is often left to the user to do manually, though in the absence of such a grammar, state-of-the-art solvers will provide their own default grammar, which is dependent on the signature of the target program to be synthesized. In this work, we speculate this default grammar could be improved upon substantially. We build sets of rules, or metagrammars, for constructing grammars, and perform a gradient descent over these metagrammars aiming to find a metagrammar which solves more benchmarks and on average faster. We show the resulting metagrammar enables CVC4 to solve 26% more benchmarks than the default grammar within a 300s time-out, and that metagrammars learnt from tens of benchmarks generalize to performance on 100s of benchmarks.
Submitted 16 July, 2020; v1 submitted 13 July, 2020;
originally announced July 2020.
-
Autonomous Vehicle Benchmarking using Unbiased Metrics
Authors:
David Paz,
Po-jung Lai,
Nathan Chan,
Yuqing Jiang,
Henrik I. Christensen
Abstract:
With the recent development of autonomous vehicle technology, there have been active efforts on the deployment of this technology at different scales that include urban and highway driving. While many of the prototypes showcased have been shown to operate under specific cases, little effort has been made to better understand their shortcomings and generalizability to new areas. Distance, uptime and number of manual disengagements performed during autonomous driving provide a high-level idea on the performance of an autonomous system but without proper data normalization, testing location information, and the number of vehicles involved in testing, the disengagement reports alone do not fully encompass system performance and robustness. Thus, in this study a complete set of metrics is applied for benchmarking autonomous vehicle systems in a variety of scenarios that can be extended for comparison with human drivers and other autonomous vehicle systems. These metrics have been used to benchmark UC San Diego's autonomous vehicle platforms during early deployments for micro-transit and autonomous mail delivery applications.
Submitted 11 September, 2020; v1 submitted 3 June, 2020;
originally announced June 2020.
-
Angle-Constrained Formation Control for Circular Mobile Robots
Authors:
Nelson P. K. Chan,
Bayu Jayawardhana,
Hector Garcia de Marina
Abstract:
In this letter, we investigate the formation control problem of mobile robots moving in the plane where, instead of assuming robots to be simple points, each robot is assumed to have the form of a disk with equal radius. Based on interior angle measurements of the neighboring robots' disk, which can be obtained from low-cost vision sensors, we propose a gradient-based distributed control law and show the exponential convergence property of the associated error system. By construction, the proposed control law has the appealing property of ensuring collision avoidance between neighboring robots. We also present simulation results for a team of four circular mobile robots forming a rectangular shape.
Submitted 10 May, 2020;
originally announced May 2020.
-
PolyFit: Polynomial-based Indexing Approach for Fast Approximate Range Aggregate Queries
Authors:
Zhe Li,
Tsz Nam Chan,
Man Lung Yiu,
Christian S. Jensen
Abstract:
Range aggregate queries find frequent application in data analytics. In some use cases, approximate results are preferred over accurate results if they can be computed rapidly and satisfy approximation guarantees. Inspired by a recent indexing approach, we provide means of representing a discrete point data set by continuous functions that can then serve as compact index structures. More specifically, we develop a polynomial-based indexing approach, called PolyFit, for processing approximate range aggregate queries. PolyFit is capable of supporting multiple types of range aggregate queries, including COUNT, SUM, MIN and MAX aggregates, with guaranteed absolute and relative error bounds. Experiment results show that PolyFit is faster and more accurate and compact than existing learned index structures.
Submitted 10 February, 2021; v1 submitted 17 March, 2020;
originally announced March 2020.
-
Matrices of optimal tree-depth and a row-invariant parameterized algorithm for integer programming
Authors:
Timothy F. N. Chan,
Jacob W. Cooper,
Martin Koutecky,
Daniel Kral,
Kristyna Pekarkova
Abstract:
A long line of research on fixed parameter tractability of integer programming culminated with showing that integer programs with n variables and a constraint matrix with dual tree-depth d and largest entry D are solvable in time g(d,D)poly(n) for some function g. However, the dual tree-depth of a constraint matrix is not preserved by row operations, i.e., a given integer program can be equivalent to another with a smaller dual tree-depth, and thus does not reflect its geometric structure.
We prove that the minimum dual tree-depth of a row-equivalent matrix is equal to the branch-depth of the matroid defined by the columns of the matrix. We design a fixed parameter algorithm for computing branch-depth of matroids represented over a finite field and a fixed parameter algorithm for computing a row-equivalent matrix with minimum dual tree-depth. Finally, we use these results to obtain an algorithm for integer programming running in time g(d*,D)poly(n) where d* is the branch-depth of the constraint matrix; the branch-depth cannot be replaced by the more permissive notion of branch-width.
Submitted 31 January, 2022; v1 submitted 15 July, 2019;
originally announced July 2019.
-
Distributed estimation from relative measurements of heterogeneous and uncertain quality
Authors:
Chiara Ravazzi,
Nelson P. K. Chan,
Paolo Frasca
Abstract:
This paper studies the problem of estimation from relative measurements in a graph, in which a vector indexed over the nodes has to be reconstructed from pairwise measurements of differences between its components associated to nodes connected by an edge. In order to model heterogeneity and uncertainty of the measurements, we assume them to be affected by additive noise distributed according to a Gaussian mixture. In this original setup, we formulate the problem of computing the Maximum-Likelihood (ML) estimates and we design two novel algorithms, based on Least Squares regression and Expectation-Maximization (EM). The first algorithm (LS-EM) is centralized and performs the estimation from relative measurements, the soft classification of the measurements, and the estimation of the noise parameters. The second algorithm (Distributed LS-EM) is distributed and performs estimation and soft classification of the measurements, but requires the knowledge of the noise parameters. We provide rigorous proofs of convergence of both algorithms and we present numerical experiments to evaluate and compare their performance with classical solutions. The experiments show the robustness of the proposed methods against different kinds of noise and, for the Distributed LS-EM, against errors in the knowledge of noise parameters.
Submitted 26 July, 2018; v1 submitted 24 October, 2017;
originally announced October 2017.