
Showing 1–29 of 29 results for author: Le, H M

Searching in archive cs.
  1. arXiv:2406.19765

    cs.SE cs.LG

    Systematic Literature Review on Application of Learning-based Approaches in Continuous Integration

    Authors: Ali Kazemi Arani, Triet Huynh Minh Le, Mansooreh Zahedi, M. Ali Babar

    Abstract: Context: Machine learning (ML) and deep learning (DL) analyze raw data to extract valuable insights in specific phases. The rise of continuous practices in software projects emphasizes automating Continuous Integration (CI) with these learning-based methods, while the growing adoption of such approaches underscores the need for systematizing knowledge. Objective: Our objective is to comprehensivel…

    Submitted 2 July, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted to be published in IEEE Access

  2. arXiv:2404.17110

    cs.SE cs.CR cs.LG

    Software Vulnerability Prediction in Low-Resource Languages: An Empirical Study of CodeBERT and ChatGPT

    Authors: Triet H. M. Le, M. Ali Babar, Tung Hoang Thai

    Abstract: Background: Software Vulnerability (SV) prediction in emerging languages is increasingly important to ensure software security in modern systems. However, these languages usually have limited SV data for developing high-performing prediction models. Aims: We conduct an empirical study to evaluate the impact of SV data scarcity in emerging languages on the state-of-the-art SV prediction model and i…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Accepted in the 4th International Workshop on Software Security co-located with the 28th International Conference on Evaluation and Assessment in Software Engineering (EASE) 2024

  3. arXiv:2402.01955

    cs.LG cs.AI math.FA

    OPSurv: Orthogonal Polynomials Quadrature Algorithm for Survival Analysis

    Authors: Lilian W. Bialokozowicz, Hoang M. Le, Tristan Sylvain, Peter A. I. Forsyth, Vineel Nagisetty, Greg Mori

    Abstract: This paper introduces the Orthogonal Polynomials Quadrature Algorithm for Survival Analysis (OPSurv), a new method providing time-continuous functional outputs for both single and competing risks scenarios in survival analysis. OPSurv utilizes the initial zero condition of the Cumulative Incidence function and a unique decomposition of probability densities using orthogonal polynomials, allowing i…

    Submitted 2 February, 2024; originally announced February 2024.

    MSC Class: 68W25 (Primary); 65Z05 (Secondary) ACM Class: I.2.0; J.3

  4. arXiv:2401.11105

    cs.SE cs.CR cs.LG

    Are Latent Vulnerabilities Hidden Gems for Software Vulnerability Prediction? An Empirical Study

    Authors: Triet H. M. Le, Xiaoning Du, M. Ali Babar

    Abstract: Collecting relevant and high-quality data is integral to the development of effective Software Vulnerability (SV) prediction models. Most of the current SV datasets rely on SV-fixing commits to extract vulnerable functions and lines. However, none of these datasets have considered latent SVs existing between the introduction and fix of the collected SVs. There is also little known about the useful…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: Accepted as a full paper in the technical track at the 21st International Conference on Mining Software Repositories (MSR) 2024

  5. arXiv:2305.12736   

    cs.SE

    Mitigating ML Model Decay in Continuous Integration with Data Drift Detection: An Empirical Study

    Authors: Ali Kazemi Arani, Triet Huynh Minh Le, Mansooreh Zahedi, Muhammad Ali Babar

    Abstract: Background: Machine Learning (ML) methods are being increasingly used for automating different activities, e.g., Test Case Prioritization (TCP), of Continuous Integration (CI). However, ML models need frequent retraining as a result of changes in the CI environment, more commonly known as data drift. Also, continuously retraining ML models consumes a lot of time and effort. Hence, there is an urgen…

    Submitted 17 July, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: This paper got a rejection and we need to address the comments and upload the new version with new results

  6. arXiv:2305.12695   

    cs.SE cs.LG

    Systematic Literature Review on Application of Machine Learning in Continuous Integration

    Authors: Ali Kazemi Arani, Triet Huynh Minh Le, Mansooreh Zahedi, Muhammad Ali Babar

    Abstract: This research conducted a systematic review of the literature on machine learning (ML)-based methods in the context of Continuous Integration (CI) over the past 22 years. The study aimed to identify and describe the techniques used in ML-based solutions for CI and analyzed various aspects such as data engineering, feature engineering, hyper-parameter tuning, ML models, evaluation methods, and metr…

    Submitted 17 July, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: This paper got a rejection and we need to address the comments and upload the new version with new results

  7. arXiv:2304.11743

    cs.CV

    GamutMLP: A Lightweight MLP for Color Loss Recovery

    Authors: Hoang M. Le, Brian Price, Scott Cohen, Michael S. Brown

    Abstract: Cameras and image-editing software often process images in the wide-gamut ProPhoto color space, encompassing 90% of all visible colors. However, when images are encoded for sharing, this color-rich representation is transformed and clipped to fit within the small-gamut standard RGB (sRGB) color space, representing only 30% of visible colors. Recovering the lost color information is challenging due…

    Submitted 23 April, 2023; originally announced April 2023.

  8. arXiv:2304.02829

    cs.SE cs.LG

    SoK: Machine Learning for Continuous Integration

    Authors: Ali Kazemi Arani, Mansooreh Zahedi, Triet Huynh Minh Le, Muhammad Ali Babar

    Abstract: Continuous Integration (CI) has become a well-established software development practice for automatically and continuously integrating code changes during software development. An increasing number of Machine Learning (ML) based approaches for automation of CI phases are being reported in the literature. It is timely and relevant to provide a Systemization of Knowledge (SoK) of ML-based approaches…

    Submitted 5 April, 2023; originally announced April 2023.

    Comments: 6 pages, 2 figures, accepted in the ICSE'23 Workshop on Cloud Intelligence / AIOps

  9. arXiv:2207.11708

    cs.SE cs.CR cs.LG

    Towards an Improved Understanding of Software Vulnerability Assessment Using Data-Driven Approaches

    Authors: Triet H. M. Le

    Abstract: The thesis advances the field of software security by providing knowledge and automation support for software vulnerability assessment using data-driven approaches. Software vulnerability assessment provides important and multifaceted information to prevent and mitigate dangerous cyber-attacks in the wild. The key contributions include a systematisation of knowledge, along with a suite of novel da…

    Submitted 20 June, 2023; v1 submitted 24 July, 2022; originally announced July 2022.

    Comments: A thesis submitted for the degree of Doctor of Philosophy at The University of Adelaide. The official version of the thesis can be found at the institutional repository: https://hdl.handle.net/2440/135914

  10. arXiv:2206.09546

    cs.LG cs.AI cs.LO

    Policy Optimization with Linear Temporal Logic Constraints

    Authors: Cameron Voloshin, Hoang M. Le, Swarat Chaudhuri, Yisong Yue

    Abstract: We study the problem of policy optimization (PO) with linear temporal logic (LTL) constraints. The language of LTL allows flexible description of tasks that may be unnatural to encode as a scalar cost function. We consider LTL-constrained PO as a systematic framework, decoupling task specification from policy selection, and as an alternative to the standard of cost shaping. With access to a genera…

    Submitted 19 October, 2022; v1 submitted 19 June, 2022; originally announced June 2022.

  11. arXiv:2203.08417

    cs.SE cs.CR cs.LG

    On the Use of Fine-grained Vulnerable Code Statements for Software Vulnerability Assessment Models

    Authors: Triet H. M. Le, M. Ali Babar

    Abstract: Many studies have developed Machine Learning (ML) approaches to detect Software Vulnerabilities (SVs) in functions and fine-grained code statements that cause such SVs. However, there is little work on leveraging such detection outputs for data-driven SV assessment to give information about exploitability, impact, and severity of SVs. The information is important to understand SVs and prioritize t…

    Submitted 16 March, 2022; originally announced March 2022.

    Comments: Accepted as a full paper in the technical track at the 19th International Conference on Mining Software Repositories (MSR) 2022

  12. arXiv:2109.04029

    cs.CR cs.AI cs.LG

    Automated Security Assessment for the Internet of Things

    Authors: Xuanyu Duan, Mengmeng Ge, Triet H. M. Le, Faheem Ullah, Shang Gao, Xuequan Lu, M. Ali Babar

    Abstract: Internet of Things (IoT) based applications face an increasing number of potential security risks, which need to be systematically assessed and addressed. Expert-based manual assessment of IoT security is a predominant approach, which is usually inefficient. To address this problem, we propose an automated security assessment framework for IoT networks. Our framework first leverages machine learni…

    Submitted 9 September, 2021; originally announced September 2021.

    Comments: Accepted for publication at the 26th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC 2021)

  13. arXiv:2108.08041

    cs.SE cs.CR cs.LG

    DeepCVA: Automated Commit-level Vulnerability Assessment with Deep Multi-task Learning

    Authors: Triet H. M. Le, David Hin, Roland Croft, M. Ali Babar

    Abstract: It is increasingly suggested to identify Software Vulnerabilities (SVs) in code commits to give early warnings about potential security risks. However, there is a lack of effort to assess vulnerability-contributing commits right after they are detected to provide timely information about the exploitability, impact and severity of SVs. Such information is important to plan and prioritize the mitiga…

    Submitted 18 August, 2021; originally announced August 2021.

    Comments: Accepted as a full paper at the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE) 2021

  14. arXiv:2107.08364

    cs.SE cs.AI cs.CR cs.LG

    A Survey on Data-driven Software Vulnerability Assessment and Prioritization

    Authors: Triet H. M. Le, Huaming Chen, M. Ali Babar

    Abstract: Software Vulnerabilities (SVs) are increasing in complexity and scale, posing great security risks to many software systems. Given the limited resources in practice, SV assessment and prioritization help practitioners devise optimal SV mitigation plans based on various SV characteristics. The surges in SV data sources and data-driven techniques such as Machine Learning and Deep Learning have taken…

    Submitted 3 April, 2022; v1 submitted 18 July, 2021; originally announced July 2021.

    Comments: Accepted for publication in the ACM Computing Surveys journal (CSUR), 2022

    Journal ref: ACM Comput. Surv., 55, 5 (2022), Article 100

  15. Automated Software Vulnerability Assessment with Concept Drift

    Authors: Triet H. M. Le, Bushra Sabir, M. Ali Babar

    Abstract: Software Engineering researchers are increasingly using Natural Language Processing (NLP) techniques to automate Software Vulnerabilities (SVs) assessment using the descriptions in public repositories. However, the existing NLP-based approaches suffer from concept drift. This problem is caused by a lack of proper treatment of new (out-of-vocabulary) terms for the evaluation of unseen SVs over time…

    Submitted 21 March, 2021; originally announced March 2021.

    Comments: Published as a full paper at the 16th International Conference on Mining Software Repositories 2019

    Journal ref: Proceedings of the 16th International Conference on Mining Software Repositories, 2019, pp. 371-382

  16. A Large-scale Study of Security Vulnerability Support on Developer Q&A Websites

    Authors: Triet H. M. Le, Roland Croft, David Hin, M. Ali Babar

    Abstract: Context: Security Vulnerabilities (SVs) pose many serious threats to software systems. Developers usually seek solutions for addressing these SVs on developer Question and Answer (Q&A) websites. However, there is still little known about on-going SV-specific discussions on different developer Q&A sites. Objective: We present a large-scale empirical study to understand developers' SV discussions and…

    Submitted 21 April, 2021; v1 submitted 10 August, 2020; originally announced August 2020.

    Comments: Accepted for publication at the 25th International Conference on Evaluation and Assessment in Software Engineering (EASE 2021)

  17. arXiv:2003.03741

    cs.SE cs.IR cs.LG

    PUMiner: Mining Security Posts from Developer Question and Answer Websites with PU Learning

    Authors: Triet H. M. Le, David Hin, Roland Croft, M. Ali Babar

    Abstract: Security is an increasing concern in software development. Developer Question and Answer (Q&A) websites provide a large amount of security discussion. Existing studies have used human-defined rules to mine security discussions, but these works still miss many posts, which may lead to an incomplete analysis of the security practices reported on Q&A websites. Traditional supervised Machine Learning…

    Submitted 8 March, 2020; originally announced March 2020.

    Comments: Accepted for publication at the 17th Mining Software Repositories 2020 conference

  18. arXiv:2002.05442

    cs.SE cs.AI cs.LG

    Deep Learning for Source Code Modeling and Generation: Models, Applications and Challenges

    Authors: Triet H. M. Le, Hao Chen, M. Ali Babar

    Abstract: Deep Learning (DL) techniques for Natural Language Processing have been evolving remarkably fast. Recently, the DL advances in language modeling, machine translation and paragraph understanding are so prominent that the potential of DL in Software Engineering cannot be overlooked, especially in the field of program learning. To facilitate further research and applications of DL in this field, we p…

    Submitted 13 February, 2020; originally announced February 2020.

    Journal ref: ACM Comput. Surv., 53, 3 (2020), Article 62

  19. arXiv:1911.06854

    cs.LG cs.AI cs.RO stat.ML

    Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning

    Authors: Cameron Voloshin, Hoang M. Le, Nan Jiang, Yisong Yue

    Abstract: We offer an experimental benchmark and empirical study for off-policy policy evaluation (OPE) in reinforcement learning, which is a key problem in many safety-critical applications. Given the increasing interest in deploying learning-based methods, there has been a flurry of recent proposals for OPE methods, leading to a need for standardized empirical analyses. Our work takes a strong focus on div…

    Submitted 27 November, 2021; v1 submitted 15 November, 2019; originally announced November 2019.

  20. arXiv:1911.03992

    math.OC cs.LG math.NA

    Stochastic DCA for minimizing a large sum of DC functions with application to Multi-class Logistic Regression

    Authors: Hoai An Le Thi, Hoai Minh Le, Duy Nhat Phan, Bach Tran

    Abstract: We consider the large sum of DC (Difference of Convex) functions minimization problem, which appears in several different areas, especially in stochastic optimization and machine learning. Two DCA (DC Algorithm) based algorithms are proposed: stochastic DCA and inexact stochastic DCA. We prove that the convergence of both algorithms to a critical point is guaranteed with probability one. Furthermore…

    Submitted 10 November, 2019; originally announced November 2019.

  21. arXiv:1907.05431

    cs.LG cs.AI cs.PL stat.ML

    Imitation-Projected Programmatic Reinforcement Learning

    Authors: Abhinav Verma, Hoang M. Le, Yisong Yue, Swarat Chaudhuri

    Abstract: We study the problem of programmatic reinforcement learning, in which policies are represented as short programs in a symbolic language. Programmatic policies can be more interpretable, generalizable, and amenable to formal verification than neural policies; however, designing rigorous learning approaches for such policies remains a challenge. Our approach to this challenge -- a meta-algorithm cal…

    Submitted 19 January, 2021; v1 submitted 11 July, 2019; originally announced July 2019.

    Comments: Published in Advances in Neural Information Processing Systems (NeurIPS) 2019

  22. arXiv:1903.08738

    cs.LG cs.AI math.OC stat.ML

    Batch Policy Learning under Constraints

    Authors: Hoang M. Le, Cameron Voloshin, Yisong Yue

    Abstract: When learning policies for real-world domains, two important questions arise: (i) how to efficiently use pre-collected off-policy, non-optimal behavior data; and (ii) how to mediate among different competing objectives and constraints. We thus study the problem of batch policy learning under multiple constraints, and offer a systematic solution. We first propose a flexible meta-algorithm that admi…

    Submitted 20 March, 2019; originally announced March 2019.

  23. A Control Lyapunov Perspective on Episodic Learning via Projection to State Stability

    Authors: Andrew J. Taylor, Victor D. Dorobantu, Meera Krishnamoorthy, Hoang M. Le, Yisong Yue, Aaron D. Ames

    Abstract: The goal of this paper is to understand the impact of learning on control synthesis from a Lyapunov function perspective. In particular, rather than consider uncertainties in the full system dynamics, we employ Control Lyapunov Functions (CLFs) as low-dimensional projections. To understand and characterize the uncertainty that these projected dynamics introduce in the system, we introduce a new no…

    Submitted 17 March, 2019; originally announced March 2019.

  24. Episodic Learning with Control Lyapunov Functions for Uncertain Robotic Systems

    Authors: Andrew J. Taylor, Victor D. Dorobantu, Hoang M. Le, Yisong Yue, Aaron D. Ames

    Abstract: Many modern nonlinear control methods aim to endow systems with guaranteed properties, such as stability or safety, and have been successfully applied to the domain of robotics. However, model uncertainty remains a persistent challenge, weakening theoretical guarantees and causing implementation failures on physical systems. This paper develops a machine learning framework centered around Control…

    Submitted 4 March, 2019; originally announced March 2019.

  25. arXiv:1806.09620

    math.OC cs.LG math.NA

    A DCA-Like Algorithm and its Accelerated Version with Application in Data Visualization

    Authors: Hoai An Le Thi, Hoai Minh Le, Duy Nhat Phan, Bach Tran

    Abstract: In this paper, we present two variants of DCA (Difference of Convex functions Algorithm) to solve the constrained sum of differentiable function and composite functions minimization problem, with the aim of increasing the convergence speed of DCA. In the first variant, DCA-Like, we introduce a new technique to iteratively modify the decomposition of the objective function. This successive decomposi…

    Submitted 25 June, 2018; originally announced June 2018.

  26. arXiv:1803.00590

    cs.LG cs.AI stat.ML

    Hierarchical Imitation and Reinforcement Learning

    Authors: Hoang M. Le, Nan Jiang, Alekh Agarwal, Miroslav Dudík, Yisong Yue, Hal Daumé III

    Abstract: We study how to effectively leverage expert feedback to learn sequential decision-making policies. We focus on problems with sparse rewards and long time horizons, which typically pose significant challenges in reinforcement learning. We propose an algorithmic framework, called hierarchical guidance, that leverages the hierarchical structure of the underlying problem to integrate different modes o…

    Submitted 9 June, 2018; v1 submitted 1 March, 2018; originally announced March 2018.

    Comments: Proceedings of the 35th International Conference on Machine Learning (ICML 2018)

  27. arXiv:1703.03121

    cs.LG

    Coordinated Multi-Agent Imitation Learning

    Authors: Hoang M. Le, Yisong Yue, Peter Carr, Patrick Lucey

    Abstract: We study the problem of imitation learning from demonstrations of multiple coordinating agents. One key challenge in this setting is that learning a good model of coordination can be difficult, since coordination is often implicit in the demonstrations and must be inferred as a latent variable. We propose a joint approach that simultaneously learns a latent coordination model along with the indivi…

    Submitted 25 May, 2018; v1 submitted 8 March, 2017; originally announced March 2017.

    Comments: International Conference on Machine Learning 2017

    Journal ref: Hoang M. Le, Yisong Yue, Peter Carr, Patrick Lucey ; Proceedings of the 34th International Conference on Machine Learning, PMLR 70:1995-2003, 2017

  28. arXiv:1606.00968

    cs.LG

    Smooth Imitation Learning for Online Sequence Prediction

    Authors: Hoang M. Le, Andrew Kang, Yisong Yue, Peter Carr

    Abstract: We study the problem of smooth imitation learning for online sequence prediction, where the goal is to train a policy that can smoothly imitate demonstrated behavior in a dynamic and continuous environment in response to online, sequential context input. Since the mapping from context to behavior is often complex, we take a learning reduction approach to reduce smooth imitation learning to a regre…

    Submitted 3 June, 2016; originally announced June 2016.

    Comments: ICML 2016

  29. arXiv:1407.0286

    math.NA cs.LG stat.ML

    DC approximation approaches for sparse optimization

    Authors: Hoai An Le Thi, Tao Pham Dinh, Hoai Minh Le, Xuan Thanh Vo

    Abstract: Sparse optimization refers to an optimization problem involving the zero-norm in objective or constraints. In this paper, nonconvex approximation approaches for sparse optimization have been studied with a unifying point of view in DC (Difference of Convex functions) programming framework. Considering a common DC approximation of the zero-norm including all standard sparse inducing penalty functio…

    Submitted 2 July, 2014; v1 submitted 1 July, 2014; originally announced July 2014.

    Comments: 35 pages

    MSC Class: 90C26; 90C90