BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Authors: Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu, Wenhao Yu, Ratnadira Widyasari, Imam Nur Bani Yusuf, Haolan Zhan, Junda He, Indraneil Paul, Simon Brunner, Chen Gong, Thong Hoang, Armel Randy Zebaze, Xiaoheng Hong, Wen-Ding Li, Jean Kaddour, Ming Xu, Zhihan Zhang, Prateek Yadav, Naman Jain, Alex Gu, Zhoujun Cheng, Jiawei Liu, Qian Liu, et al. (8 additional authors not shown)

Abstract: Automated software engineering has been greatly empowered by the recent advances in Large Language Models (LLMs) for programming. While current benchmarks have shown that LLMs can perform various software engineering tasks like human developers, the majority of their evaluations are limited to short and self-contained algorithmic tasks. Solving challenging and practical programming tasks requires the capability of utilizing diverse function calls as tools to efficiently implement functionalities like data analysis and web development. In addition, using multiple tools to solve a task needs compositional reasoning by accurately understanding complex instructions. Fulfilling both of these characteristics can pose a great challenge for LLMs. To assess how well LLMs can solve challenging and practical programming tasks, we introduce BigCodeBench, a benchmark that challenges LLMs to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained programming tasks. To evaluate LLMs rigorously, each programming task encompasses 5.6 test cases with an average branch coverage of 99%. In addition, we propose a natural-language-oriented variant of BigCodeBench, BigCodeBench-Instruct, that automatically transforms the original docstrings into short instructions only with essential information. Our extensive evaluation of 60 LLMs shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%. The results underscore the need for further advancements in this area.

Submitted 26 June, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

Comments: 44 pages, 14 figures, 7 tables, built with love by the BigCode community :)

arXiv:2306.12545 [pdf, other]

Neural Multigrid Memory For Computational Fluid Dynamics

Authors: Duc Minh Nguyen, Minh Chau Vu, Tuan Anh Nguyen, Tri Huynh, Nguyen Tri Nguyen, Truong Son Hy

Abstract: Turbulent flow simulation plays a crucial role in various applications, including aircraft and ship design, industrial process optimization, and weather prediction. In this paper, we propose an advanced data-driven method for simulating turbulent flow, representing a significant improvement over existing approaches. Our methodology combines the strengths of Video Prediction Transformer (VPTR) (Ye & Bilodeau, 2022) and Multigrid Architecture (MgConv, MgResnet) (Ke et al., 2017). VPTR excels in capturing complex spatiotemporal dependencies and handling large input data, making it a promising choice for turbulent flow prediction. Meanwhile, Multigrid Architecture utilizes multiple grids with different resolutions to capture the multiscale nature of turbulent flows, resulting in more accurate and efficient simulations. Through our experiments, we demonstrate the effectiveness of our proposed approach, named MGxTransformer, in accurately predicting velocity, temperature, and turbulence intensity for incompressible turbulent flows across various geometries and flow conditions. Our results exhibit superior accuracy compared to other baselines, while maintaining computational efficiency. Our implementation in PyTorch is available publicly at https://github.com/Combi2k2/MG-Turbulent-Flow

Submitted 24 June, 2023; v1 submitted 21 June, 2023; originally announced June 2023.

Comments: arXiv admin note: text overlap with arXiv:1911.08655 by other authors

arXiv:2303.03915 [pdf, other]

The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

Authors: Hugo Laurençon, Lucile Saulnier, Thomas Wang, Christopher Akiki, Albert Villanova del Moral, Teven Le Scao, Leandro Von Werra, Chenghao Mou, Eduardo González Ponferrada, Huu Nguyen, Jörg Frohberg, Mario Šaško, Quentin Lhoest, Angelina McMillan-Major, Gerard Dupont, Stella Biderman, Anna Rogers, Loubna Ben allal, Francesco De Toni, Giada Pistilli, Olivier Nguyen, Somaieh Nikpoor, Maraim Masoud, Pierre Colombo, Javier de la Rosa, et al. (29 additional authors not shown)

Abstract: As language models grow ever larger, the need for large-scale high-quality text datasets has never been more pressing, especially in multilingual settings. The BigScience workshop, a 1-year international and multidisciplinary initiative, was formed with the goal of researching and training large language models as a values-driven undertaking, putting issues of ethics, harm, and governance in the foreground. This paper documents the data creation and curation efforts undertaken by BigScience to assemble the Responsible Open-science Open-collaboration Text Sources (ROOTS) corpus, a 1.6TB dataset spanning 59 languages that was used to train the 176-billion-parameter BigScience Large Open-science Open-access Multilingual (BLOOM) language model. We further release a large initial subset of the corpus and analyses thereof, and hope to empower large-scale monolingual and multilingual modeling projects with both the data and the processing tools, as well as stimulate research around this large multilingual corpus.

Submitted 7 March, 2023; originally announced March 2023.

Comments: NeurIPS 2022, Datasets and Benchmarks Track

ACM Class: I.2.7

arXiv:2211.05575 [pdf]

Mobile Robot Motion Control Using a Combination of Fuzzy Logic Method and Kinematic Model

Authors: Anh-Tu Nguyen, Van-Truong Nguyen, Xuan-Thuan Nguyen, Cong-Thanh Vu

Abstract: Mobile robots have been widely used in various aspects of human life. When a robot moves between different positions in the working area to perform a task, controlling its motion to follow a pre-defined path is the primary task of a mobile robot. Furthermore, the robot must remain at its desired speed to cooperate with other agents. This paper presents the development of a motion controller in which the fuzzy logic method is combined with a kinematic model of a differential drive robot. The simulation results compare well with the experimental results, indicating that the method is effective and applicable to actual mobile robots.

Submitted 23 October, 2022; originally announced November 2022.

arXiv:2211.05100 [pdf, other]

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Authors: BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, et al. (369 additional authors not shown)

Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.

Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

arXiv:2210.08610 [pdf, other]

Robust, General, and Low Complexity Acoustic Scene Classification Systems and An Effective Visualization for Presenting a Sound Scene Context

Authors: Lam Pham, Dusan Salovic, Anahid Jalali, Alexander Schindler, Khoa Tran, Canh Vu, Phu X. Nguyen

Abstract: In this paper, we present a comprehensive analysis of Acoustic Scene Classification (ASC), the task of identifying the scene of an audio recording from its acoustic signature. In particular, we firstly propose an inception-based and low footprint ASC model, referred to as the ASC baseline. The proposed ASC baseline is then compared with benchmark and high-complexity network architectures of MobileNetV1, MobileNetV2, VGG16, VGG19, ResNet50V2, ResNet152V2, DenseNet121, DenseNet201, and Xception. Next, we improve the ASC baseline by proposing a novel deep neural network architecture which leverages residual-inception architectures and multiple kernels. Given the novel residual-inception (NRI) model, we further evaluate the trade off between the model complexity and the model accuracy performance. Finally, we evaluate whether sound events occurring in a sound scene recording can help to improve ASC accuracy, then indicate how a sound scene context is well presented by combining both sound scene and sound event information. We conduct extensive experiments on various ASC datasets, including Crowded Scenes, IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 Task 1A and 1B, 2019 Task 1A and 1B, 2020 Task 1A, 2021 Task 1A, 2022 Task 1. The experimental results on several different ASC challenges highlight two main achievements; the first is to propose robust, general, and low complexity ASC systems which are suitable for real-life applications on a wide range of edge devices and mobiles; the second is to propose an effective visualization method for comprehensively presenting a sound scene context.

Submitted 16 October, 2022; originally announced October 2022.

arXiv:2206.15076 [pdf, other]

BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing

Authors: Jason Alan Fries, Leon Weber, Natasha Seelam, Gabriel Altay, Debajyoti Datta, Samuele Garda, Myungsun Kang, Ruisi Su, Wojciech Kusa, Samuel Cahyawijaya, Fabio Barth, Simon Ott, Matthias Samwald, Stephen Bach, Stella Biderman, Mario Sänger, Bo Wang, Alison Callahan, Daniel León Periñán, Théo Gigant, Patrick Haller, Jenny Chim, Jose David Posada, John Michael Giorgi, Karthik Rangasai Sivaraman, et al. (18 additional authors not shown)

Abstract: Training and evaluating language models increasingly requires the construction of meta-datasets: diverse collections of curated data with clear provenance. Natural language prompting has recently led to improved zero-shot generalization by transforming existing, supervised datasets into a diversity of novel pretraining tasks, highlighting the benefits of meta-dataset curation. While successful in general-domain text, translating these data-centric approaches to biomedical language modeling remains challenging, as labeled biomedical datasets are significantly underrepresented in popular data hubs. To address this challenge, we introduce BigBIO, a community library of 126+ biomedical NLP datasets, currently covering 12 task categories and 10+ languages. BigBIO facilitates reproducible meta-dataset curation via programmatic access to datasets and their metadata, and is compatible with current platforms for prompt engineering and end-to-end few/zero shot language model evaluation. We discuss our process for task schema harmonization, data auditing, contribution guidelines, and outline two illustrative use cases: zero-shot evaluation of biomedical prompts and large-scale, multi-task learning. BigBIO is an ongoing community effort and is available at https://github.com/bigscience-workshop/biomedical

Submitted 30 June, 2022; originally announced June 2022.

Comments: Submitted to NeurIPS 2022 Datasets and Benchmarks Track

arXiv:2109.08863

Streaming algorithms for Budgeted $k$-Submodular Maximization problem

Authors: Canh V. Pham, Quang C. Vu, Dung K. T. Ha, Tai T. Nguyen

Abstract: Stimulated by practical applications arising from viral marketing, this paper investigates a novel Budgeted $k$-Submodular Maximization problem defined as follows: Given a finite set $V$, a budget $B$ and a $k$-submodular function $f: (k+1)^V \mapsto \mathbb{R}_+$, the problem asks to find a solution $\mathbf{s}=(S_1, S_2, \ldots, S_k)$, in which each element $e \in V$ has a cost $c_i(e)$ to be put into the $i$-th set $S_i$, such that the total cost of $\mathbf{s}$ does not exceed $B$ and $f(\mathbf{s})$ is maximized. To address this problem, we propose two streaming algorithms that provide approximation guarantees for the problem. In particular, in the case where each element $e$ has the same cost for all $i$-th sets, we propose a deterministic streaming algorithm which provides an approximation ratio of $\frac{1}{4}-\epsilon$ when $f$ is monotone and $\frac{1}{5}-\epsilon$ when $f$ is non-monotone. For the general case, we propose a random streaming algorithm that provides an approximation ratio of $\min\{\frac{\alpha}{2}, \frac{(1-\alpha)k}{(1+\beta)k-\beta} \}-\epsilon$ when $f$ is monotone and $\min\{\frac{\alpha}{2}, \frac{(1-\alpha)k}{(1+2\beta)k-2\beta} \}-\epsilon$ when $f$ is non-monotone in expectation, where $\beta=\max_{e\in V, i , j \in [k], i\neq j} \frac{c_i(e)}{c_j(e)}$ and $\epsilon, \alpha$ are fixed inputs.

Submitted 22 October, 2021; v1 submitted 18 September, 2021; originally announced September 2021.

Comments: There are some results of the article that need to be corrected

arXiv:2109.06773 [pdf]

Obstacle Avoidance for Autonomous Mobile Robots Based on Mapping Method

Authors: Anh-Tu Nguyen, Cong-Thanh Vu

Abstract: In recent years, the mobile robot has received considerable attention from researchers for its application in various environments. A mobile robot navigating from a starting point to a goal point while traversing through obstacles needs to recognize the obstacles and generate new trajectories to reach the destination. This paper presents an obstacle avoidance method for mobile robots using the open-source Robot Operating System (ROS) combined with the dynamic window approach (DWA) algorithm. The experiment is carried out using a mobile robot in which the navigation data is based on data collected by a laser scanner. The experimental results show that the robot could work well in environments containing static and dynamic obstacles.

Submitted 14 September, 2021; originally announced September 2021.

arXiv:2109.05551 [pdf]

A study and design of localization system for mobile robot based on ROS

Authors: Anh-Tu Nguyen, Cong-Thanh Vu

Abstract: In recent years, mobile robots have been the concern of numerous researchers since they are widely applied in various fields of daily life. This paper applies a virtual Robot Operating System (ROS) platform to develop a localization system for robot motion. The proposed system is based on the combination of relative and absolute measurement methods, in which the data from the encoder, digital compass, and laser scanner sensor are fused using the extended Kalman filter (EKF). The system also successfully eliminates the errors caused by the environment as well as the error accumulation. The experimental results show good accuracy and stability of position and orientation, which can be further applied for a robot working in an indoor environment.

Submitted 12 September, 2021; originally announced September 2021.

Comments: in Vietnamese language

arXiv:1611.06620 [pdf, other]

doi 10.1007/978-3-319-30671-1_47

A Business Zone Recommender System Based on Facebook and Urban Planning Data

Authors: Jovian Lin, Richard J. Oentaryo, Ee-Peng Lim, Casey Vu, Adrian Vu, Agus T. Kwee, Philips K. Prasetyo

Abstract: We present ZoneRec---a zone recommendation system for physical businesses in an urban city, which uses both public business data from Facebook and urban planning data. The system consists of machine learning algorithms that take in a business' metadata and outputs a list of recommended zones to establish the business in. We evaluate our system using data of food businesses in Singapore and assess the contribution of different feature groups to the recommendation quality.

Submitted 20 November, 2016; originally announced November 2016.

Journal ref: Proceedings of the European Conference on Information Retrieval, 2016, pp. 641-647

arXiv:1611.05339 [pdf]

CareerMapper: An Automated Resume Evaluation Tool

Authors: Vivian Lai, Kyong Jin Shim, Richard J. Oentaryo, Philips K. Prasetyo, Casey Vu, Ee-Peng Lim, David Lo

Abstract: The advent of the Web brought about major changes in the way people search for jobs and companies look for suitable candidates. As more employers and recruitment firms turn to the Web for job candidate search, an increasing number of people turn to the Web for uploading and creating their online resumes. Resumes are often the first source of information about candidates and also the first item of evaluation in candidate selection. Thus, it is imperative that resumes are complete, free of errors and well-organized. We present an automated resume evaluation tool called "CareerMapper". Our tool is designed to conduct a thorough review of a user's LinkedIn profile and provide best recommendations for improved online resumes by analyzing a large number of online user profiles.

Submitted 16 November, 2016; originally announced November 2016.

Journal ref: Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2016)

arXiv:1609.02839 [pdf, other]

doi 10.1145/2914586.2914588

Where is the Goldmine? Finding Promising Business Locations through Facebook Data Analytics

Authors: Jovian Lin, Richard Oentaryo, Ee-Peng Lim, Casey Vu, Adrian Vu, Agus Kwee

Abstract: If you were to open your own cafe, would you not want to effortlessly identify the most suitable location to set up your shop? Choosing an optimal physical location is a critical decision for numerous businesses, as many factors contribute to the final choice of the location. In this paper, we seek to address the issue by investigating the use of publicly available Facebook Pages data---which include user check-ins, types of business, and business locations---to evaluate a user-selected physical location with respect to a type of business. Using a dataset of 20,877 food businesses in Singapore, we conduct analysis of several key factors including business categories, locations, and neighboring businesses. From these factors, we extract a set of relevant features and develop a robust predictive model to estimate the popularity of a business location. Our experiments have shown that the popularity of neighboring business contributes the key features to perform accurate prediction. We finally illustrate the practical usage of our proposed approach via an interactive web application system.

Submitted 9 September, 2016; originally announced September 2016.

Journal ref: Proceedings of the ACM Conference on Hypertext and Social Media, Halifax, Canada, 2016, pp. 93-102

Showing 1–13 of 13 results for author: Vu, C