Showing 1–13 of 13 results for author: Yusuf, B

  1. arXiv:2406.15877  [pdf, other]

    cs.SE cs.AI cs.CL

    BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

    Authors: Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu, Wenhao Yu, Ratnadira Widyasari, Imam Nur Bani Yusuf, Haolan Zhan, Junda He, Indraneil Paul, Simon Brunner, Chen Gong, Thong Hoang, Armel Randy Zebaze, Xiaoheng Hong, Wen-Ding Li, Jean Kaddour, Ming Xu, Zhihan Zhang, Prateek Yadav, Naman Jain, Alex Gu, Zhoujun Cheng, Jiawei Liu, Qian Liu, et al. (8 additional authors not shown)

    Abstract: Automated software engineering has been greatly empowered by the recent advances in Large Language Models (LLMs) for programming. While current benchmarks have shown that LLMs can perform various software engineering tasks like human developers, the majority of their evaluations are limited to short and self-contained algorithmic tasks. Solving challenging and practical programming tasks requires…

    Submitted 26 June, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

    Comments: 44 pages, 14 figures, 7 tables, built with love by the BigCode community :)

  2. arXiv:2401.07466  [pdf, other]

    cs.SE cs.AI

    Your Instructions Are Not Always Helpful: Assessing the Efficacy of Instruction Fine-tuning for Software Vulnerability Detection

    Authors: Imam Nur Bani Yusuf, Lingxiao Jiang

    Abstract: Software, while beneficial, poses potential cybersecurity risks due to inherent vulnerabilities. Detecting these vulnerabilities is crucial, and deep learning has shown promise as an effective tool for this task due to its ability to perform well without extensive feature engineering. However, a challenge in deploying deep learning for vulnerability detection is the limited availability of trainin…

    Submitted 14 January, 2024; originally announced January 2024.

  3. arXiv:2308.08027  [pdf, other]

    eess.AS cs.CL cs.SD

    End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations

    Authors: Bolaji Yusuf, Jan Cernocky, Murat Saraclar

    Abstract: Conventional keyword search systems operate on automatic speech recognition (ASR) outputs, which causes them to have a complex indexing and search pipeline. This has led to interest in ASR-free approaches to simplify the search procedure. We recently proposed a neural ASR-free keyword search model which achieves competitive performance while maintaining an efficient and simplified pipeline, where…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 2023

    Journal ref: in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 3070-3080, 2023

  4. Isolating Compiler Bugs by Generating Effective Witness Programs with Large Language Models

    Authors: Haoxin Tu, Zhide Zhou, He Jiang, Imam Nur Bani Yusuf, Yuxian Li, Lingxiao Jiang

    Abstract: Compiler bugs pose a significant threat to safety-critical applications, and promptly as well as effectively isolating these bugs is crucial for assuring the quality of compilers. However, the limited availability of debugging information on reported bugs complicates the compiler bug isolation task. Existing compiler bug isolation approaches convert the problem into a test program mutation problem…

    Submitted 8 May, 2024; v1 submitted 2 July, 2023; originally announced July 2023.

    Comments: Accepted by IEEE Transactions on Software Engineering

  5. arXiv:2303.10942  [pdf, other]

    cs.CL cs.SD eess.AS

    On-the-fly Text Retrieval for End-to-End ASR Adaptation

    Authors: Bolaji Yusuf, Aditya Gourav, Ankur Gandhe, Ivan Bulyko

    Abstract: End-to-end speech recognition models are improved by incorporating external text sources, typically by fusion with an external language model. Such language models have to be retrained whenever the corpus of interest changes. Furthermore, since they store the entire corpus in their parameters, rare words can be challenging to recall. In this work, we propose augmenting a transducer-based ASR model…

    Submitted 20 March, 2023; originally announced March 2023.

    Comments: Accepted to ICASSP 2023; Appendix added to include ablations that could not fit into the conference 4-page limit

  6. On the Effectiveness of Pretrained Models for API Learning

    Authors: Mohammad Abdul Hadi, Imam Nur Bani Yusuf, Ferdian Thung, Kien Gia Luong, Jiang Lingxiao, Fatemeh H. Fard, David Lo

    Abstract: Developers frequently use APIs to implement certain functionalities, such as parsing Excel Files, reading and writing text files line by line, etc. Developers can greatly benefit from automatic API usage sequence generation based on natural language queries for building applications in a faster and cleaner manner. Existing approaches utilize information retrieval models to search for matching API…

    Submitted 5 April, 2022; originally announced April 2022.

    Comments: 12 pages, 4 figures, ICPC 2022

    Journal ref: 30th International Conference on Program Comprehension (ICPC '22), May 16-17, 2022, Virtual Event, USA

  7. arXiv:2202.06045  [pdf, other]

    cs.CL cs.SD eess.AS

    USTED: Improving ASR with a Unified Speech and Text Encoder-Decoder

    Authors: Bolaji Yusuf, Ankur Gandhe, Alex Sokolov

    Abstract: Improving end-to-end speech recognition by incorporating external text data has been a longstanding research topic. There has been a recent focus on training E2E ASR models that get the performance benefits of external text data without incurring the extra cost of evaluating an external language model at inference time. In this work, we propose training ASR model jointly with a set of text-to-text…

    Submitted 12 February, 2022; originally announced February 2022.

    Comments: 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2022)

  8. arXiv:2108.10357  [pdf, other]

    eess.AS cs.CL cs.SD

    End-to-End Open Vocabulary Keyword Search

    Authors: Bolaji Yusuf, Alican Gok, Batuhan Gundogdu, Murat Saraclar

    Abstract: Recently, neural approaches to spoken content retrieval have become popular. However, they tend to be restricted in their vocabulary or in their ability to deal with imbalanced test settings. These restrictions limit their applicability in keyword search, where the set of queries is not known beforehand, and where the system should return not just whether an utterance contains a query but the exac…

    Submitted 23 August, 2021; originally announced August 2021.

    Comments: Interspeech 2021

  9. arXiv:2106.04298  [pdf, other]

    cs.CL cs.SD eess.AS

    Unsupervised Word Segmentation from Discrete Speech Units in Low-Resource Settings

    Authors: Marcely Zanon Boito, Bolaji Yusuf, Lucas Ondel, Aline Villavicencio, Laurent Besacier

    Abstract: Documenting languages helps to prevent the extinction of endangered dialects, many of which are otherwise expected to disappear by the end of the century. When documenting oral languages, unsupervised word segmentation (UWS) from speech is a useful, yet challenging, task. It consists in producing time-stamps for slicing utterances into smaller segments corresponding to words, being performed from…

    Submitted 18 May, 2022; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: Accepted to SIGUL 2022

  10. arXiv:2102.01859  [pdf, other]

    cs.SE

    BiasFinder: Metamorphic Test Generation to Uncover Bias for Sentiment Analysis Systems

    Authors: Muhammad Hilmi Asyrofi, Zhou Yang, Imam Nur Bani Yusuf, Hong Jin Kang, Ferdian Thung, David Lo

    Abstract: Artificial Intelligence (AI) software systems, such as Sentiment Analysis (SA) systems, typically learn from large amounts of data that may reflect human biases. Consequently, the machine learning model in such software systems may exhibit unintended demographic bias based on specific characteristics (e.g., gender, occupation, country-of-origin, etc.). Such biases manifest in an SA system when it…

    Submitted 4 October, 2021; v1 submitted 2 February, 2021; originally announced February 2021.

  11. arXiv:2011.03115  [pdf, ps, other]

    eess.AS cs.LG cs.SD

    A Hierarchical Subspace Model for Language-Attuned Acoustic Unit Discovery

    Authors: Bolaji Yusuf, Lucas Ondel, Lukas Burget, Jan Cernocky, Murat Saraclar

    Abstract: In this work, we propose a hierarchical subspace model for acoustic unit discovery. In this approach, we frame the task as one of learning embeddings on a low-dimensional phonetic subspace, and simultaneously specify the subspace itself as an embedding on a hyper-subspace. We train the hyper-subspace on a set of transcribed languages and transfer it to the target language. In the target language,…

    Submitted 9 November, 2020; v1 submitted 4 November, 2020; originally announced November 2020.

    Comments: Submitted to ICASSP 2021

  12. arXiv:2009.11330  [pdf, other]

    cs.LG stat.ML

    Cache Replacement as a MAB with Delayed Feedback and Decaying Costs

    Authors: Farzana Beente Yusuf, Vitalii Stebliankin, Giuseppe Vietri, Giri Narasimhan

    Abstract: Inspired by the cache replacement problem, we propose and solve a new variant of the well-known multi-armed bandit (MAB), thus providing a solution for improving existing state-of-the-art cache management methods. Each arm (or expert) represents a distinct cache replacement policy, which advises on the page to evict from the cache when needed. Feedback on the eviction comes in the form of a "miss"…

    Submitted 19 May, 2021; v1 submitted 23 September, 2020; originally announced September 2020.

  13. arXiv:2005.09282  [pdf, other]

    eess.AS cs.CL cs.LG stat.ML

    Bayesian Subspace HMM for the Zerospeech 2020 Challenge

    Authors: Bolaji Yusuf, Lucas Ondel

    Abstract: In this paper we describe our submission to the Zerospeech 2020 challenge, where the participants are required to discover latent representations from unannotated speech, and to use those representations to perform speech synthesis, with synthesis quality used as a proxy metric for the unit quality. In our system, we use the Bayesian Subspace Hidden Markov Model (SHMM) for unit discovery. The SHMM…

    Submitted 27 July, 2020; v1 submitted 19 May, 2020; originally announced May 2020.