Search | arXiv e-print repository

Good things come in three: Generating SO Post Titles with Pre-Trained Models, Self Improvement and Post Ranking

Authors: Duc Anh Le, Anh M. T. Bui, Phuong T. Nguyen, Davide Di Ruscio

Abstract: Stack Overflow is a prominent Q and A forum, supporting developers in seeking suitable resources on programming-related matters. Having high-quality question titles is an effective means to attract developers' attention. Unfortunately, this is often underestimated, leaving room for improvement. Research has been conducted, predominantly leveraging pre-trained models to generate titles from code sn… ▽ More Stack Overflow is a prominent Q and A forum, supporting developers in seeking suitable resources on programming-related matters. Having high-quality question titles is an effective means to attract developers' attention. Unfortunately, this is often underestimated, leaving room for improvement. Research has been conducted, predominantly leveraging pre-trained models to generate titles from code snippets and problem descriptions. Yet, getting high-quality titles is still a challenging task, attributed to both the quality of the input data (e.g., containing noise and ambiguity) and inherent constraints in sequence generation models. In this paper, we present FILLER as a solution to generating Stack Overflow post titles using a fine-tuned language model with self-improvement and post ranking. Our study focuses on enhancing pre-trained language models for generating titles for Stack Overflow posts, employing a training and subsequent fine-tuning paradigm for these models. To this end, we integrate the model's predictions into the training process, enabling it to learn from its errors, thereby lessening the effects of exposure bias. Moreover, we apply a post-ranking method to produce a variety of sample candidates, subsequently selecting the most suitable one. To evaluate FILLER, we perform experiments using benchmark datasets, and the empirical findings indicate that our model provides high-quality recommendations. Moreover, it significantly outperforms all the baselines, including Code2Que, SOTitle, CCBERT, M3NSCT5, and GPT3.5-turbo. A user study also shows that FILLER provides more relevant titles, with respect to SOTitle and GPT3.5-turbo. △ Less

Submitted 21 June, 2024; originally announced June 2024.

Comments: The paper has been per-reviewed and accepted for publication to the International Symposium on Empirical Software Engineering and Measurement (ESEM 2024)

arXiv:2403.13204 [pdf, other]

Diversity-Aware Agnostic Ensemble of Sharpness Minimizers

Authors: Anh Bui, Vy Vo, Tung Pham, Dinh Phung, Trung Le

Abstract: There has long been plenty of theoretical and empirical evidence supporting the success of ensemble learning. Deep ensembles in particular take advantage of training randomness and expressivity of individual neural networks to gain prediction diversity, ultimately leading to better generalization, robustness and uncertainty estimation. In respect of generalization, it is found that pursuing wider… ▽ More There has long been plenty of theoretical and empirical evidence supporting the success of ensemble learning. Deep ensembles in particular take advantage of training randomness and expressivity of individual neural networks to gain prediction diversity, ultimately leading to better generalization, robustness and uncertainty estimation. In respect of generalization, it is found that pursuing wider local minima result in models being more robust to shifts between training and testing sets. A natural research question arises out of these two approaches as to whether a boost in generalization ability can be achieved if ensemble learning and loss sharpness minimization are integrated. Our work investigates this connection and proposes DASH - a learning algorithm that promotes diversity and flatness within deep ensembles. More concretely, DASH encourages base learners to move divergently towards low-loss regions of minimal sharpness. We provide a theoretical backbone for our method along with extensive empirical evidence demonstrating an improvement in ensemble generalizability. △ Less

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.12326 [pdf, other]

Removing Undesirable Concepts in Text-to-Image Generative Models with Learnable Prompts

Authors: Anh Bui, Khanh Doan, Trung Le, Paul Montague, Tamas Abraham, Dinh Phung

Abstract: Generative models have demonstrated remarkable potential in generating visually impressive content from textual descriptions. However, training these models on unfiltered internet data poses the risk of learning and subsequently propagating undesirable concepts, such as copyrighted or unethical content. In this paper, we propose a novel method to remove undesirable concepts from text-to-image gene… ▽ More Generative models have demonstrated remarkable potential in generating visually impressive content from textual descriptions. However, training these models on unfiltered internet data poses the risk of learning and subsequently propagating undesirable concepts, such as copyrighted or unethical content. In this paper, we propose a novel method to remove undesirable concepts from text-to-image generative models by incorporating a learnable prompt into the cross-attention module. This learnable prompt acts as additional memory to transfer the knowledge of undesirable concepts into it and reduce the dependency of these concepts on the model parameters and corresponding textual inputs. Because of this knowledge transfer into the prompt, erasing these undesirable concepts is more stable and has minimal negative impact on other concepts. We demonstrate the effectiveness of our method on the Stable Diffusion model, showcasing its superiority over state-of-the-art erasure methods in terms of removing undesirable content while preserving other unrelated elements. △ Less

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.05873 [pdf, other]

LEGION: Harnessing Pre-trained Language Models for GitHub Topic Recommendations with Distribution-Balance Loss

Authors: Yen-Trang Dang, Thanh-Le Cong, Phuc-Thanh Nguyen, Anh M. T. Bui, Phuong T. Nguyen, Bach Le, Quyet-Thang Huynh

Abstract: Open-source development has revolutionized the software industry by promoting collaboration, transparency, and community-driven innovation. Today, a vast amount of various kinds of open-source software, which form networks of repositories, is often hosted on GitHub - a popular software development platform. To enhance the discoverability of the repository networks, i.e., groups of similar reposito… ▽ More Open-source development has revolutionized the software industry by promoting collaboration, transparency, and community-driven innovation. Today, a vast amount of various kinds of open-source software, which form networks of repositories, is often hosted on GitHub - a popular software development platform. To enhance the discoverability of the repository networks, i.e., groups of similar repositories, GitHub introduced repository topics in 2017 that enable users to more easily explore relevant projects by type, technology, and more. It is thus crucial to accurately assign topics for each GitHub repository. Current methods for automatic topic recommendation rely heavily on TF-IDF for encoding textual data, presenting challenges in understanding semantic nuances. This paper addresses the limitations of existing techniques by proposing Legion, a novel approach that leverages Pre-trained Language Models (PTMs) for recommending topics for GitHub repositories. The key novelty of Legion is three-fold. First, Legion leverages the extensive capabilities of PTMs in language understanding to capture contextual information and semantic meaning in GitHub repositories. Second, Legion overcomes the challenge of long-tailed distribution, which results in a bias toward popular topics in PTMs, by proposing a Distribution-Balanced Loss (DB Loss) to better train the PTMs. Third, Legion employs a filter to eliminate vague recommendations, thereby improving the precision of PTMs. Our empirical evaluation on a benchmark dataset of real-world GitHub repositories shows that Legion can improve vanilla PTMs by up to 26% on recommending GitHubs topics. Legion also can suggest GitHub topics more precisely and effectively than the state-of-the-art baseline with an average improvement of 20% and 5% in terms of Precision and F1-score, respectively. △ Less

Submitted 9 March, 2024; originally announced March 2024.

Comments: Accepted to EASE'24

arXiv:2402.11101 [pdf]

doi 10.1039/D4EE00911H

Physics-based material parameters extraction from perovskite experiments via Bayesian optimization

Authors: Hualin Zhan, Viqar Ahmad, Azul Mayon, Grace Tabi, Anh Dinh Bui, Zhuofeng Li, Daniel Walter, Hieu Nguyen, Klaus Weber, Thomas White, Kylie Catchpole

Abstract: The ability to extract material parameters of perovskite from quantitative experimental analysis is essential for rational design of photovoltaic and optoelectronic applications. However, the difficulty of this analysis increases significantly with the complexity of the theoretical model and the number of material parameters for perovskite. Here we use Bayesian optimization to develop an analysis… ▽ More The ability to extract material parameters of perovskite from quantitative experimental analysis is essential for rational design of photovoltaic and optoelectronic applications. However, the difficulty of this analysis increases significantly with the complexity of the theoretical model and the number of material parameters for perovskite. Here we use Bayesian optimization to develop an analysis platform that can extract up to 8 fundamental material parameters of an organometallic perovskite semiconductor from a transient photoluminescence experiment, based on a complex full physics model that includes drift-diffusion of carriers and dynamic defect occupation. An example study of thermal degradation reveals that the carrier mobility and trap-assisted recombination coefficient are reduced noticeably, while the defect energy level remains nearly unchanged. The reduced carrier mobility can dominate the overall effect on thermal degradation of perovskite solar cells by reducing the fill factor, despite the opposite effect of the reduced trap-assisted recombination coefficient on increasing the fill factor. In future, this platform can be conveniently applied to other experiments or to combinations of experiments, accelerating materials discovery and optimization of semiconductor materials for photovoltaics and other applications. △ Less

Submitted 29 May, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

Comments: The work is published in Energy & Environmental Science (DOI: 10.1039/D4EE00911H). This work is supported by the Australian Centre for Advanced Photovoltaics (ACAP) and received funding from the Australian Renewable Energy Agency (ARENA). H.Z. acknowledges the support of the ACAP Fellowship. H.Z. thanks Pawsey for providing the Nimbus Research Cloud Service

arXiv:2402.00024 [pdf, other]

Can Large Language Models Understand Molecules?

Authors: Shaghayegh Sadeghi, Alan Bui, Ali Forooghi, Jianguo Lu, Alioune Ngom

Abstract: Purpose: Large Language Models (LLMs) like GPT (Generative Pre-trained Transformer) from OpenAI and LLaMA (Large Language Model Meta AI) from Meta AI are increasingly recognized for their potential in the field of cheminformatics, particularly in understanding Simplified Molecular Input Line Entry System (SMILES), a standard method for representing chemical structures. These LLMs also have the abi… ▽ More Purpose: Large Language Models (LLMs) like GPT (Generative Pre-trained Transformer) from OpenAI and LLaMA (Large Language Model Meta AI) from Meta AI are increasingly recognized for their potential in the field of cheminformatics, particularly in understanding Simplified Molecular Input Line Entry System (SMILES), a standard method for representing chemical structures. These LLMs also have the ability to decode SMILES strings into vector representations. Method: We investigate the performance of GPT and LLaMA compared to pre-trained models on SMILES in embedding SMILES strings on downstream tasks, focusing on two key applications: molecular property prediction and drug-drug interaction prediction. Results: We find that SMILES embeddings generated using LLaMA outperform those from GPT in both molecular property and DDI prediction tasks. Notably, LLaMA-based SMILES embeddings show results comparable to pre-trained models on SMILES in molecular prediction tasks and outperform the pre-trained models for the DDI prediction tasks. Conclusion: The performance of LLMs in generating SMILES embeddings shows great potential for further investigation of these models for molecular embedding. We hope our study bridges the gap between LLMs and molecular embedding, motivating additional research into the potential of LLMs in the molecular representation field. GitHub: https://github.com/sshaghayeghs/LLaMA-VS-GPT △ Less

Submitted 20 May, 2024; v1 submitted 5 January, 2024; originally announced February 2024.

arXiv:2311.09671 [pdf, ps, other]

Robust Contrastive Learning With Theory Guarantee

Authors: Ngoc N. Tran, Lam Tran, Hoang Phan, Anh Bui, Tung Pham, Toan Tran, Dinh Phung, Trung Le

Abstract: Contrastive learning (CL) is a self-supervised training paradigm that allows us to extract meaningful features without any label information. A typical CL framework is divided into two phases, where it first tries to learn the features from unlabelled data, and then uses those features to train a linear classifier with the labeled data. While a fair amount of existing theoretical works have analyz… ▽ More Contrastive learning (CL) is a self-supervised training paradigm that allows us to extract meaningful features without any label information. A typical CL framework is divided into two phases, where it first tries to learn the features from unlabelled data, and then uses those features to train a linear classifier with the labeled data. While a fair amount of existing theoretical works have analyzed how the unsupervised loss in the first phase can support the supervised loss in the second phase, none has examined the connection between the unsupervised loss and the robust supervised loss, which can shed light on how to construct an effective unsupervised loss for the first phase of CL. To fill this gap, our work develops rigorous theories to dissect and identify which components in the unsupervised loss can help improve the robust supervised loss and conduct proper experiments to verify our findings. △ Less

Submitted 16 November, 2023; originally announced November 2023.

Comments: 27 pages, 0 figures. arXiv admin note: text overlap with arXiv:2305.10252

arXiv:2311.00737 [pdf]

Real-Time Magnetic Tracking and Diagnosis of COVID-19 via Machine Learning

Authors: Dang Nguyen, Phat K. Huynh, Vinh Duc An Bui, Kee Young Hwang, Nityanand Jain, Chau Nguyen, Le Huu Nhat Minh, Le Van Truong, Xuan Thanh Nguyen, Dinh Hoang Nguyen, Le Tien Dung, Trung Q. Le, Manh-Huong Phan

Abstract: The COVID-19 pandemic underscored the importance of reliable, noninvasive diagnostic tools for robust public health interventions. In this work, we fused magnetic respiratory sensing technology (MRST) with machine learning (ML) to create a diagnostic platform for real-time tracking and diagnosis of COVID-19 and other respiratory diseases. The MRST precisely captures breathing patterns through thre… ▽ More The COVID-19 pandemic underscored the importance of reliable, noninvasive diagnostic tools for robust public health interventions. In this work, we fused magnetic respiratory sensing technology (MRST) with machine learning (ML) to create a diagnostic platform for real-time tracking and diagnosis of COVID-19 and other respiratory diseases. The MRST precisely captures breathing patterns through three specific breath testing protocols: normal breath, holding breath, and deep breath. We collected breath data from both COVID-19 patients and healthy subjects in Vietnam using this platform, which then served to train and validate ML models. Our evaluation encompassed multiple ML algorithms, including support vector machines and deep learning models, assessing their ability to diagnose COVID-19. Our multi-model validation methodology ensures a thorough comparison and grants the adaptability to select the most optimal model, striking a balance between diagnostic precision with model interpretability. The findings highlight the exceptional potential of our diagnostic tool in pinpointing respiratory anomalies, achieving over 90% accuracy. This innovative sensor technology can be seamlessly integrated into healthcare settings for patient monitoring, marking a significant enhancement for the healthcare infrastructure. △ Less

Submitted 1 November, 2023; originally announced November 2023.

arXiv:2306.04178 [pdf, other]

Optimal Transport Model Distributional Robustness

Authors: Van-Anh Nguyen, Trung Le, Anh Tuan Bui, Thanh-Toan Do, Dinh Phung

Abstract: Distributional robustness is a promising framework for training deep learning models that are less vulnerable to adversarial examples and data distribution shifts. Previous works have mainly focused on exploiting distributional robustness in the data space. In this work, we explore an optimal transport-based distributional robustness framework in model spaces. Specifically, we examine a model dist… ▽ More Distributional robustness is a promising framework for training deep learning models that are less vulnerable to adversarial examples and data distribution shifts. Previous works have mainly focused on exploiting distributional robustness in the data space. In this work, we explore an optimal transport-based distributional robustness framework in model spaces. Specifically, we examine a model distribution within a Wasserstein ball centered on a given model distribution that maximizes the loss. We have developed theories that enable us to learn the optimal robust center model distribution. Interestingly, our developed theories allow us to flexibly incorporate the concept of sharpness awareness into training, whether it's a single model, ensemble models, or Bayesian Neural Networks, by considering specific forms of the center model distribution. These forms include a Dirac delta distribution over a single model, a uniform distribution over several models, and a general Bayesian Neural Network. Furthermore, we demonstrate that Sharpness-Aware Minimization (SAM) is a specific case of our framework when using a Dirac delta distribution over a single model, while our framework can be seen as a probabilistic extension of SAM. To validate the effectiveness of our framework in the aforementioned settings, we conducted extensive experiments, and the results reveal remarkable improvements compared to the baselines. △ Less

Submitted 1 November, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: Accepted at NeurIPs 2023

Journal ref: Advances in Neural Information Processing Systems, 2023

arXiv:2304.13229 [pdf, other]

Generating Adversarial Examples with Task Oriented Multi-Objective Optimization

Authors: Anh Bui, Trung Le, He Zhao, Quan Tran, Paul Montague, Dinh Phung

Abstract: Deep learning models, even the-state-of-the-art ones, are highly vulnerable to adversarial examples. Adversarial training is one of the most efficient methods to improve the model's robustness. The key factor for the success of adversarial training is the capability to generate qualified and divergent adversarial examples which satisfy some objectives/goals (e.g., finding adversarial examples that… ▽ More Deep learning models, even the-state-of-the-art ones, are highly vulnerable to adversarial examples. Adversarial training is one of the most efficient methods to improve the model's robustness. The key factor for the success of adversarial training is the capability to generate qualified and divergent adversarial examples which satisfy some objectives/goals (e.g., finding adversarial examples that maximize the model losses for simultaneously attacking multiple models). Therefore, multi-objective optimization (MOO) is a natural tool for adversarial example generation to achieve multiple objectives/goals simultaneously. However, we observe that a naive application of MOO tends to maximize all objectives/goals equally, without caring if an objective/goal has been achieved yet. This leads to useless effort to further improve the goal-achieved tasks, while putting less focus on the goal-unachieved tasks. In this paper, we propose \emph{Task Oriented MOO} to address this issue, in the context where we can explicitly define the goal achievement for a task. Our principle is to only maintain the goal-achieved tasks, while letting the optimizer spend more effort on improving the goal-unachieved tasks. We conduct comprehensive experiments for our Task Oriented MOO on various adversarial example generation schemes. The experimental results firmly demonstrate the merit of our proposed approach. Our code is available at \url{https://github.com/tuananhbui89/TAMOO}. △ Less

Submitted 1 June, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

arXiv:2304.10175 [pdf]

Automated Dynamic Bayesian Networks for Predicting Acute Kidney Injury Before Onset

Authors: David Gordon, Panayiotis Petousis, Anders O. Garlid, Keith Norris, Katherine Tuttle, Susanne B. Nicholas, Alex A. T. Bui

Abstract: Several algorithms for learning the structure of dynamic Bayesian networks (DBNs) require an a priori ordering of variables, which influences the determined graph topology. However, it is often unclear how to determine this order if feature importance is unknown, especially as an exhaustive search is usually impractical. In this paper, we introduce Ranking Approaches for Unknown Structures (RAUS),… ▽ More Several algorithms for learning the structure of dynamic Bayesian networks (DBNs) require an a priori ordering of variables, which influences the determined graph topology. However, it is often unclear how to determine this order if feature importance is unknown, especially as an exhaustive search is usually impractical. In this paper, we introduce Ranking Approaches for Unknown Structures (RAUS), an automated framework to systematically inform variable ordering and learn networks end-to-end. RAUS leverages existing statistical methods (Cramers V, chi-squared test, and information gain) to compare variable ordering, resultant generated network topologies, and DBN performance. RAUS enables end-users with limited DBN expertise to implement models via command line interface. We evaluate RAUS on the task of predicting impending acute kidney injury (AKI) from inpatient clinical laboratory data. Longitudinal observations from 67,460 patients were collected from our electronic health record (EHR) and Kidney Disease Improving Global Outcomes (KDIGO) criteria were then applied to define AKI events. RAUS learns multiple DBNs simultaneously to predict a future AKI event at different time points (i.e., 24-, 48-, 72-hours in advance of AKI). We also compared the results of the learned AKI prediction models and variable orderings to baseline techniques (logistic regression, random forests, and extreme gradient boosting). The DBNs generated by RAUS achieved 73-83% area under the receiver operating characteristic curve (AUCROC) within 24-hours before AKI; and 71-79% AUCROC within 48-hours before AKI of any stage in a 7-day observation window. Insights from this automated framework can help efficiently implement and interpret DBNs for clinical decision support. The source code for RAUS is available in GitHub at https://github.com/dgrdn08/RAUS . △ Less

Submitted 20 April, 2023; originally announced April 2023.

Comments: 27 pages (including 8 pages supplementary information)

arXiv:2212.03069 [pdf, other]

Multiple Perturbation Attack: Attack Pixelwise Under Different $\ell_p$-norms For Better Adversarial Performance

Authors: Ngoc N. Tran, Anh Tuan Bui, Dinh Phung, Trung Le

Abstract: Adversarial machine learning has been both a major concern and a hot topic recently, especially with the ubiquitous use of deep neural networks in the current landscape. Adversarial attacks and defenses are usually likened to a cat-and-mouse game in which defenders and attackers evolve over the time. On one hand, the goal is to develop strong and robust deep networks that are resistant to maliciou… ▽ More Adversarial machine learning has been both a major concern and a hot topic recently, especially with the ubiquitous use of deep neural networks in the current landscape. Adversarial attacks and defenses are usually likened to a cat-and-mouse game in which defenders and attackers evolve over the time. On one hand, the goal is to develop strong and robust deep networks that are resistant to malicious actors. On the other hand, in order to achieve that, we need to devise even stronger adversarial attacks to challenge these defense models. Most of existing attacks employs a single $\ell_p$ distance (commonly, $p\in\{1,2,\infty\}$) to define the concept of closeness and performs steepest gradient ascent w.r.t. this $p$-norm to update all pixels in an adversarial example in the same way. These $\ell_p$ attacks each has its own pros and cons; and there is no single attack that can successfully break through defense models that are robust against multiple $\ell_p$ norms simultaneously. Motivated by these observations, we come up with a natural approach: combining various $\ell_p$ gradient projections on a pixel level to achieve a joint adversarial perturbation. Specifically, we learn how to perturb each pixel to maximize the attack performance, while maintaining the overall visual imperceptibility of adversarial examples. Finally, through various experiments with standardized benchmarks, we show that our method outperforms most current strong attacks across state-of-the-art defense mechanisms, while retaining its ability to remain clean visually. △ Less

Submitted 7 December, 2022; v1 submitted 5 December, 2022; originally announced December 2022.

Comments: 18 pages, 8 figures, 7 tables

arXiv:2211.04773 [pdf, other]

SG-Shuffle: Multi-aspect Shuffle Transformer for Scene Graph Generation

Authors: Anh Duc Bui, Soyeon Caren Han, Josiah Poon

Abstract: Scene Graph Generation (SGG) serves a comprehensive representation of the images for human understanding as well as visual understanding tasks. Due to the long tail bias problem of the object and predicate labels in the available annotated data, the scene graph generated from current methodologies can be biased toward common, non-informative relationship labels. Relationship can sometimes be non-m… ▽ More Scene Graph Generation (SGG) serves a comprehensive representation of the images for human understanding as well as visual understanding tasks. Due to the long tail bias problem of the object and predicate labels in the available annotated data, the scene graph generated from current methodologies can be biased toward common, non-informative relationship labels. Relationship can sometimes be non-mutually exclusive, which can be described from multiple perspectives like geometrical relationships or semantic relationships, making it even more challenging to predict the most suitable relationship label. In this work, we proposed the SG-Shuffle pipeline for scene graph generation with 3 components: 1) Parallel Transformer Encoder, which learns to predict object relationships in a more exclusive manner by grou** relationship labels into groups of similar purpose; 2) Shuffle Transformer, which learns to select the final relationship labels from the category-specific feature generated in the previous step; and 3) Weighted CE loss, used to alleviate the training bias caused by the imbalanced dataset. △ Less

Submitted 9 November, 2022; originally announced November 2022.

arXiv:2203.00553 [pdf, other]

Global-Local Regularization Via Distributional Robustness

Authors: Hoang Phan, Trung Le, Trung Phung, Tuan Anh Bui, Nhat Ho, Dinh Phung

Abstract: Despite superior performance in many situations, deep neural networks are often vulnerable to adversarial examples and distribution shifts, limiting model generalization ability in real-world applications. To alleviate these problems, recent approaches leverage distributional robustness optimization (DRO) to find the most challenging distribution, and then minimize loss function over this most cha… ▽ More Despite superior performance in many situations, deep neural networks are often vulnerable to adversarial examples and distribution shifts, limiting model generalization ability in real-world applications. To alleviate these problems, recent approaches leverage distributional robustness optimization (DRO) to find the most challenging distribution, and then minimize loss function over this most challenging distribution. Regardless of achieving some improvements, these DRO approaches have some obvious limitations. First, they purely focus on local regularization to strengthen model robustness, missing a global regularization effect which is useful in many real-world applications (e.g., domain adaptation, domain generalization, and adversarial machine learning). Second, the loss functions in the existing DRO approaches operate in only the most challenging distribution, hence decouple with the original distribution, leading to a restrictive modeling capability. In this paper, we propose a novel regularization technique, following the veins of Wasserstein-based DRO framework. Specifically, we define a particular joint distribution and Wasserstein-based uncertainty, allowing us to couple the original and most challenging distributions for enhancing modeling capability and applying both local and global regularizations. Empirical studies on different learning problems demonstrate that our proposed approach significantly outperforms the existing regularization approaches in various domains: semi-supervised learning, domain adaptation, domain generalization, and adversarial machine learning. △ Less

Submitted 12 February, 2023; v1 submitted 1 March, 2022; originally announced March 2022.

Comments: Accepted to International Conference on Artificial Intelligence and Statistics (AISTATS 2023)

arXiv:2202.13437 [pdf, other]

A Unified Wasserstein Distributional Robustness Framework for Adversarial Training

Authors: Tuan Anh Bui, Trung Le, Quan Tran, He Zhao, Dinh Phung

Abstract: It is well-known that deep neural networks (DNNs) are susceptible to adversarial attacks, exposing a severe fragility of deep learning systems. As the result, adversarial training (AT) method, by incorporating adversarial examples during training, represents a natural and effective approach to strengthen the robustness of a DNN-based classifier. However, most AT-based methods, notably PGD-AT and T… ▽ More It is well-known that deep neural networks (DNNs) are susceptible to adversarial attacks, exposing a severe fragility of deep learning systems. As the result, adversarial training (AT) method, by incorporating adversarial examples during training, represents a natural and effective approach to strengthen the robustness of a DNN-based classifier. However, most AT-based methods, notably PGD-AT and TRADES, typically seek a pointwise adversary that generates the worst-case adversarial example by independently perturbing each data sample, as a way to "probe" the vulnerability of the classifier. Arguably, there are unexplored benefits in considering such adversarial effects from an entire distribution. To this end, this paper presents a unified framework that connects Wasserstein distributional robustness with current state-of-the-art AT methods. We introduce a new Wasserstein cost function and a new series of risk functions, with which we show that standard AT methods are special cases of their counterparts in our framework. This connection leads to an intuitive relaxation and generalization of existing AT methods and facilitates the development of a new family of distributional robustness AT-based algorithms. Extensive experiments show that our distributional robustness AT algorithms robustify further their standard AT counterparts in various settings. △ Less

Submitted 27 February, 2022; originally announced February 2022.

arXiv:2111.13646 [pdf]

doi 10.1109/TPAMI.2023.3346212

Dimension Reduction with Prior Information for Knowledge Discovery

Authors: Anh Tuan Bui

Abstract: This paper addresses the problem of map** high-dimensional data to a low-dimensional space, in the presence of other known features. This problem is ubiquitous in science and engineering as there are often controllable/measurable features in most applications. To solve this problem, this paper proposes a broad class of methods, which is referred to as conditional multidimensional scaling (MDS).… ▽ More This paper addresses the problem of map** high-dimensional data to a low-dimensional space, in the presence of other known features. This problem is ubiquitous in science and engineering as there are often controllable/measurable features in most applications. To solve this problem, this paper proposes a broad class of methods, which is referred to as conditional multidimensional scaling (MDS). An algorithm for optimizing the objective function of conditional MDS is also developed. The convergence of this algorithm is proven under mild assumptions. Conditional MDS is illustrated with kinship terms, facial expressions, textile fabrics, car-brand perception, and cylinder machining examples. These examples demonstrate the advantages of conditional MDS over conventional dimension reduction in improving the estimation quality of the reduced-dimension space and simplifying visualization and knowledge discovery tasks. Computer codes for this work are available in the open-source cml R package. △ Less

Submitted 29 December, 2023; v1 submitted 26 November, 2021; originally announced November 2021.

Comments: Article accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence, 12 pages, 8 figures

arXiv:2101.10027 [pdf, other]

Understanding and Achieving Efficient Robustness with Adversarial Supervised Contrastive Learning

Authors: Anh Bui, Trung Le, He Zhao, Paul Montague, Seyit Camtepe, Dinh Phung

Abstract: Contrastive learning (CL) has recently emerged as an effective approach to learning representation in a range of downstream tasks. Central to this approach is the selection of positive (similar) and negative (dissimilar) sets to provide the model the opportunity to `contrast' between data and class representation in the latent space. In this paper, we investigate CL for improving model robustness… ▽ More Contrastive learning (CL) has recently emerged as an effective approach to learning representation in a range of downstream tasks. Central to this approach is the selection of positive (similar) and negative (dissimilar) sets to provide the model the opportunity to `contrast' between data and class representation in the latent space. In this paper, we investigate CL for improving model robustness using adversarial samples. We first designed and performed a comprehensive study to understand how adversarial vulnerability behaves in the latent space. Based on this empirical evidence, we propose an effective and efficient supervised contrastive learning to achieve model robustness against adversarial attacks. Moreover, we propose a new sample selection strategy that optimizes the positive/negative sets by removing redundancy and improving correlation with the anchor. Extensive experiments show that our Adversarial Supervised Contrastive Learning (ASCL) approach achieves comparable performance with the state-of-the-art defenses while significantly outperforms other CL-based defense methods by using only $42.8\%$ positives and $6.3\%$ negatives. △ Less

Submitted 22 October, 2021; v1 submitted 25 January, 2021; originally announced January 2021.

arXiv:2012.06916 [pdf, other]

Concept Drift Monitoring and Diagnostics of Supervised Learning Models via Score Vectors

Authors: Kungang Zhang, Anh T. Bui, Daniel W. Apley

Abstract: Supervised learning models are one of the most fundamental classes of models. Viewing supervised learning from a probabilistic perspective, the set of training data to which the model is fitted is usually assumed to follow a stationary distribution. However, this stationarity assumption is often violated in a phenomenon called concept drift, which refers to changes over time in the predictive rela… ▽ More Supervised learning models are one of the most fundamental classes of models. Viewing supervised learning from a probabilistic perspective, the set of training data to which the model is fitted is usually assumed to follow a stationary distribution. However, this stationarity assumption is often violated in a phenomenon called concept drift, which refers to changes over time in the predictive relationship between covariates $\mathbf{X}$ and a response variable $Y$ and can render trained models suboptimal or obsolete. We develop a comprehensive and computationally efficient framework for detecting, monitoring, and diagnosing concept drift. Specifically, we monitor the Fisher score vector, defined as the gradient of the log-likelihood for the fitted model, using a form of multivariate exponentially weighted moving average, which monitors for general changes in the mean of a random vector. In spite of the substantial performance advantages that we demonstrate over popular error-based methods, a score-based approach has not been previously considered for concept drift monitoring. Advantages of the proposed score-based framework include applicability to any parametric model, more powerful detection of changes as shown in theory and experiments, and inherent diagnostic capabilities for hel** to identify the nature of the changes. △ Less

Submitted 12 September, 2022; v1 submitted 12 December, 2020; originally announced December 2020.

arXiv:2009.09612 [pdf, other]

Improving Ensemble Robustness by Collaboratively Promoting and Demoting Adversarial Robustness

Authors: Anh Bui, Trung Le, He Zhao, Paul Montague, Olivier deVel, Tamas Abraham, Dinh Phung

Abstract: Ensemble-based adversarial training is a principled approach to achieve robustness against adversarial attacks. An important technique of this approach is to control the transferability of adversarial examples among ensemble members. We propose in this work a simple yet effective strategy to collaborate among committee models of an ensemble model. This is achieved via the secure and insecure sets… ▽ More Ensemble-based adversarial training is a principled approach to achieve robustness against adversarial attacks. An important technique of this approach is to control the transferability of adversarial examples among ensemble members. We propose in this work a simple yet effective strategy to collaborate among committee models of an ensemble model. This is achieved via the secure and insecure sets defined for each model member on a given sample, hence help us to quantify and regularize the transferability. Consequently, our proposed framework provides the flexibility to reduce the adversarial transferability as well as to promote the diversity of ensemble members, which are two crucial factors for better robustness in our ensemble approach. We conduct extensive and comprehensive experiments to demonstrate that our proposed method outperforms the state-of-the-art ensemble baselines, at the same time can detect a wide range of adversarial examples with a nearly perfect accuracy. Our code is available at: https://github.com/tuananhbui89/Crossing-Collaborative-Ensemble. △ Less

Submitted 4 February, 2022; v1 submitted 21 September, 2020; originally announced September 2020.

arXiv:2007.05123 [pdf, other]

Improving Adversarial Robustness by Enforcing Local and Global Compactness

Authors: Anh Bui, Trung Le, He Zhao, Paul Montague, Olivier deVel, Tamas Abraham, Dinh Phung

Abstract: The fact that deep neural networks are susceptible to crafted perturbations severely impacts the use of deep learning in certain domains of application. Among many developed defense models against such attacks, adversarial training emerges as the most successful method that consistently resists a wide range of attacks. In this work, based on an observation from a previous study that the representa… ▽ More The fact that deep neural networks are susceptible to crafted perturbations severely impacts the use of deep learning in certain domains of application. Among many developed defense models against such attacks, adversarial training emerges as the most successful method that consistently resists a wide range of attacks. In this work, based on an observation from a previous study that the representations of a clean data example and its adversarial examples become more divergent in higher layers of a deep neural net, we propose the Adversary Divergence Reduction Network which enforces local/global compactness and the clustering assumption over an intermediate layer of a deep neural network. We conduct comprehensive experiments to understand the isolating behavior of each component (i.e., local/global compactness and the clustering assumption) and compare our proposed model with state-of-the-art adversarial training methods. The experimental results demonstrate that augmenting adversarial training with our proposed components can further improve the robustness of the network, leading to higher unperturbed and adversarial predictive performances. △ Less

Submitted 9 July, 2020; originally announced July 2020.

Comments: Proceeding of the European Conference on Computer Vision (ECCV) 2020

arXiv:1908.06337 [pdf, other]

doi 10.1016/j.media.2020.101834

EigenRank by Committee: A Data Subset Selection and Failure Prediction paradigm for Robust Deep Learning based Medical Image Segmentation

Authors: Bilwaj Gaonkar, Joel Beckett, Mark Attiah, Christine Ahn, Matthew Edwards, Bayard Wilson, Azim Laiwalla, Banafsheh Salehi, Bryan Yoo, Alex Bui, Luke Macyszyn

Abstract: Translation of fully automated deep learning based medical image segmentation technologies to clinical workflows face two main algorithmic challenges. The first, is the collection and archival of large quantities of manually annotated ground truth data for both training and validation. The second is the relative inability of the majority of deep learning based segmentation techniques to alert phys… ▽ More Translation of fully automated deep learning based medical image segmentation technologies to clinical workflows face two main algorithmic challenges. The first, is the collection and archival of large quantities of manually annotated ground truth data for both training and validation. The second is the relative inability of the majority of deep learning based segmentation techniques to alert physicians to a likely segmentation failure. Here we propose a novel algorithm, named `Eigenrank' which addresses both of these challenges. Eigenrank can select for manual labeling, a subset of medical images from a large database, such that a U-Net trained on this subset is superior to one trained on a randomly selected subset of the same size. Eigenrank can also be used to pick out, cases in a large database, where deep learning segmentation will fail. We present our algorithm, followed by results and a discussion of how Eigenrank exploits the Von Neumann information to perform both data subset selection and failure prediction for medical image segmentation using deep learning. △ Less

Submitted 18 January, 2021; v1 submitted 17 August, 2019; originally announced August 2019.

MSC Class: 68T45 (Primary) 68T05; 68T20 (Secondary) ACM Class: I.5.4; I.4.6

Journal ref: Medical Image Analysis, Volume 67, 2021, Medical Image Analysis, Volume 67,2021,101834,ISSN 1361-8415,

arXiv:1810.01621 [pdf, other]

Extreme Augmentation : Can deep learning based medical image segmentation be trained using a single manually delineated scan?

Authors: Bilwaj Gaonkar, Matthew Edwards, Alex Bui, Matthew Brown, Luke Macyszyn

Abstract: Yes, it can. Data augmentation is perhaps the oldest preprocessing step in computer vision literature. Almost every computer vision model trained on imaging data uses some form of augmentation. In this paper, we use the inter-vertebral disk segmentation task alongside a deep residual U-Net as the learning model, to explore the effectiveness of augmentation. In the extreme, we observed that a model… ▽ More Yes, it can. Data augmentation is perhaps the oldest preprocessing step in computer vision literature. Almost every computer vision model trained on imaging data uses some form of augmentation. In this paper, we use the inter-vertebral disk segmentation task alongside a deep residual U-Net as the learning model, to explore the effectiveness of augmentation. In the extreme, we observed that a model trained on patches extracted from just one scan, with each patch augmented 50 times; achieved a Dice score of 0.73 in a validation set of 40 cases. Qualitative evaluation indicated a clinically usable segmentation algorithm, which appropriately segments regions of interest, alongside limited false positive specks. When the initial patches are extracted from nine scans the average Dice coefficient jumps to 0.86 and most of the false positives disappear. While this still falls short of state-of-the-art deep learning based segmentation of discs reported in literature, qualitative examination reveals that it does yield segmentation, which can be amended by expert clinicians with minimal effort to generate additional data for training improved deep models. Extreme augmentation of training data, should thus be construed as a strategy for training deep learning based algorithms, when very little manually annotated data is available to work with. Models trained with extreme augmentation can then be used to accelerate the generation of manually labelled data. Hence, we show that extreme augmentation can be a valuable tool in addressing scaling up small imaging data sets to address medical image segmentation tasks. △ Less

Submitted 6 September, 2019; v1 submitted 3 October, 2018; originally announced October 2018.

arXiv:1806.00712 [pdf, ps, other]

An Interpretable Deep Hierarchical Semantic Convolutional Neural Network for Lung Nodule Malignancy Classification

Authors: Shiwen Shen, Simon X. Han, Denise R. Aberle, Alex A. T. Bui, Willliam Hsu

Abstract: While deep learning methods are increasingly being applied to tasks such as computer-aided diagnosis, these models are difficult to interpret, do not incorporate prior domain knowledge, and are often considered as a "black-box." The lack of model interpretability hinders them from being fully understood by target users such as radiologists. In this paper, we present a novel interpretable deep hier… ▽ More While deep learning methods are increasingly being applied to tasks such as computer-aided diagnosis, these models are difficult to interpret, do not incorporate prior domain knowledge, and are often considered as a "black-box." The lack of model interpretability hinders them from being fully understood by target users such as radiologists. In this paper, we present a novel interpretable deep hierarchical semantic convolutional neural network (HSCNN) to predict whether a given pulmonary nodule observed on a computed tomography (CT) scan is malignant. Our network provides two levels of output: 1) low-level radiologist semantic features, and 2) a high-level malignancy prediction score. The low-level semantic outputs quantify the diagnostic features used by radiologists and serve to explain how the model interprets the images in an expert-driven manner. The information from these low-level tasks, along with the representations learned by the convolutional layers, are then combined and used to infer the high-level task of predicting nodule malignancy. This unified architecture is trained by optimizing a global loss function including both low- and high-level tasks, thereby learning all the parameters within a joint framework. Our experimental results using the Lung Image Database Consortium (LIDC) show that the proposed method not only produces interpretable lung cancer predictions but also achieves significantly better results compared to common 3D CNN approaches. △ Less

Submitted 2 June, 2018; originally announced June 2018.

arXiv:1706.06087 [pdf]

Aztec: A Platform to Render Biomedical Software Findable, Accessible, Interoperable, and Reusable

Authors: Wei Wang, Brian Bleakley, Chelsea Ju, Vincent Kyi, Patrick Tan, Howard Choi, Xinxin Huang, Yichao Zhou, Justin Wood, Ding Wang, Alex Bui, Peipei **

Abstract: Precision medicine and health requires the characterization and phenoty** of biological systems and patient datasets using a variety of data formats. This scenario mandates the centralization of various tools and resources in a unified platform to render them Findable, Accessible, Interoperable, and Reusable (FAIR Principles). Leveraging these principles, Aztec provides the scientific community… ▽ More Precision medicine and health requires the characterization and phenoty** of biological systems and patient datasets using a variety of data formats. This scenario mandates the centralization of various tools and resources in a unified platform to render them Findable, Accessible, Interoperable, and Reusable (FAIR Principles). Leveraging these principles, Aztec provides the scientific community with a new platform that promotes a long-term, sustainable ecosystem of biomedical research software. Aztec is available at https://aztec.bio and its source code is hosted at https://github.com/BD2K-Aztec. △ Less

Submitted 19 June, 2017; originally announced June 2017.

Comments: 21 pages, 4 figures, 2 tables

ACM Class: H.2.8; H.3.1; H.3.3; H.3.6; H.3.7; I.2.6; I.2.7

arXiv:1112.4536 [pdf, ps, other]

Intractability of the Minimum-Flip Supertree problem and its variants

Authors: Sebastian Böcker, Quang Bao Anh Bui, Francois Nicolas, Anke Truss

Abstract: Computing supertrees is a central problem in phylogenetics. The supertree method that is by far the most widely used today was introduced in 1992 and is called Matrix Representation with Parsimony analysis (MRP). Matrix Representation using Flip** (MRF)}, which was introduced in 2002, is an interesting variant of MRP: MRF is arguably more relevant that MRP and various efficient implementations o… ▽ More Computing supertrees is a central problem in phylogenetics. The supertree method that is by far the most widely used today was introduced in 1992 and is called Matrix Representation with Parsimony analysis (MRP). Matrix Representation using Flip** (MRF)}, which was introduced in 2002, is an interesting variant of MRP: MRF is arguably more relevant that MRP and various efficient implementations of MRF have been presented. From a theoretical point of view, implementing MRF or MRP is solving NP-hard optimization problems. The aim of this paper is to study the approximability and the fixed-parameter tractability of the optimization problem corresponding to MRF, namely Minimum-Flip Supertree. We prove strongly negative results. △ Less

Submitted 19 December, 2011; originally announced December 2011.

Comments: To be submitted

arXiv:1109.3561 [pdf, other]

Universal adaptive self-stabilizing traversal scheme: random walk and reloading wave

Authors: Thibault Bernard, Alain Bui, Devan Sohier

Abstract: In this paper, we investigate random walk based token circulation in dynamic environments subject to failures. We describe hypotheses on the dynamic environment that allow random walks to meet the important property that the token visits any node infinitely often. The randomness of this scheme allows it to work on any topology, and require no adaptation after a topological change, which is a desir… ▽ More In this paper, we investigate random walk based token circulation in dynamic environments subject to failures. We describe hypotheses on the dynamic environment that allow random walks to meet the important property that the token visits any node infinitely often. The randomness of this scheme allows it to work on any topology, and require no adaptation after a topological change, which is a desirable property for applications to dynamic systems. For random walks to be a traversal scheme and to answer the concurrence problem, one needs to guarantee that exactly one token circulates in the system. In the presence of transient failures, configurations with multiple tokens or with no token can occur. The meeting property of random walks solves the cases with multiple tokens. The reloading wave mechanism we propose, together with timeouts, allows to detect and solve cases with no token. This traversal scheme is self-stabilizing, and universal, meaning that it needs no assumption on the system topology. We describe conditions on the dynamicity (with a local detection criterion) under which the algorithm is tolerant to dynamic reconfigurations. We conclude by a study on the time between two visits of the token to a node, which we use to tune the parameters of the reloading wave mechanism according to some system characteristics. △ Less

Submitted 16 September, 2011; originally announced September 2011.

arXiv:1011.2953 [pdf, other]

A Distributed Clustering Algorithm for Dynamic Networks

Authors: Thibault Bernard, Alain Bui, Laurence Pilard, Devan Sohier

Abstract: We propose an algorithm that builds and maintains clusters over a network subject to mobility. This algorithm is fully decentralized and makes all the different clusters grow concurrently. The algorithm uses circulating tokens that collect data and move according to a random walk traversal scheme. Their task consists in (i) creating a cluster with the nodes it discovers and (ii) managing the clust… ▽ More We propose an algorithm that builds and maintains clusters over a network subject to mobility. This algorithm is fully decentralized and makes all the different clusters grow concurrently. The algorithm uses circulating tokens that collect data and move according to a random walk traversal scheme. Their task consists in (i) creating a cluster with the nodes it discovers and (ii) managing the cluster expansion; all decisions affecting the cluster are taken only by a node that owns the token. The size of each cluster is maintained higher than $m$ nodes ($m$ is a parameter of the algorithm). The obtained clustering is locally optimal in the sense that, with only a local view of each clusters, it computes the largest possible number of clusters (\emph{ie} the sizes of the clusters are as close to $m$ as possible). This algorithm is designed as a decentralized control algorithm for large scale networks and is mobility-adaptive: after a series of topological changes, the algorithm converges to a clustering. This recomputation only affects nodes in clusters in which topological changes happened, and in adjacent clusters. △ Less

Submitted 12 November, 2010; originally announced November 2010.

arXiv:0812.0736 [pdf, ps, other]

Fully distributed and fault tolerant task management based on diffusions

Authors: Alain Bui, Olivier Flauzac, Cyril Rabat

Abstract: The task management is a critical component for the computational grids. The aim is to assign tasks on nodes according to a global scheduling policy and a view of local resources of nodes. A peer-to-peer approach for the task management involves a better scalability for the grid and a higher fault tolerance. But some mechanisms have to be proposed to avoid the computation of replicated tasks tha… ▽ More The task management is a critical component for the computational grids. The aim is to assign tasks on nodes according to a global scheduling policy and a view of local resources of nodes. A peer-to-peer approach for the task management involves a better scalability for the grid and a higher fault tolerance. But some mechanisms have to be proposed to avoid the computation of replicated tasks that can reduce the efficiency and increase the load of nodes. In the same way, these mechanisms have to limit the number of exchanged messages to avoid the overload of the network. In a previous paper, we have proposed two methods for the task management called active and passive. These methods are based on a random walk: they are fully distributed and fault tolerant. Each node owns a local tasks states set updated thanks to a random walk and each node is in charge of the local assignment. Here, we propose three methods to improve the efficiency of the active method. These new methods are based on a circulating word. The nodes local tasks states sets are updated thanks to periodical diffusions along trees built from the circulating word. Particularly, we show that these methods increase the efficiency of the active method: they produce less replicated tasks. These three methods are also fully distributed and fault tolerant. On the other way, the circulating word can be exploited for other applications like the resources management or the nodes synchronization. △ Less

Submitted 3 December, 2008; originally announced December 2008.

arXiv:0807.3632 [pdf, other]

How to Compute Times of Random Walks based Distributed Algorithms

Authors: Alain Bui, Devan Sohier

Abstract: Random walk based distributed algorithms make use of a token that circulates in the system according to a random walk scheme to achieve their goal. To study their efficiency and compare it to one of the deterministic solutions, one is led to compute certain quantities, namely the hitting times and the cover time. Until now, only bounds on these quantities were defined. First, this paper presents… ▽ More Random walk based distributed algorithms make use of a token that circulates in the system according to a random walk scheme to achieve their goal. To study their efficiency and compare it to one of the deterministic solutions, one is led to compute certain quantities, namely the hitting times and the cover time. Until now, only bounds on these quantities were defined. First, this paper presents two generalizations of the notions of hitting and cover times to weighted graphs. Indeed, the properties of random walks on symmetrically weighted graphs provide interesting results on random walk based distributed algorithms, such as local load balancing. Both of these generalization are proposed to precisely represent the behaviour of these algorithms, and to take into account what the weights represent. Then, we propose an algorithm to compute the n^2 hitting times on a weighted graph of n vertices, which we improve to obtain a O(n^3) complexity. This complexity is the lowest up to now. This algorithm computes both of the generalizations that we propose for the hitting times on a weighted graph. Finally, we provide the first algorithm to compute the cover time (in both senses) of a graph. We improve it to achieve a complexity of O(n^3 2^n). The algorithms that we present are all robust to a topological change in a limited number of edges. This property allows us to use them on dynamic graphs. △ Less

Submitted 23 July, 2008; originally announced July 2008.

Comments: 18 pages

Showing 1–29 of 29 results for author: Bui, A