-
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Authors:
Shengding Hu,
Yuge Tu,
Xu Han,
Chaoqun He,
Ganqu Cui,
Xiang Long,
Zhi Zheng,
Yewei Fang,
Yuxiang Huang,
Weilin Zhao,
Xinrong Zhang,
Zheng Leng Thai,
Kaihuo Zhang,
Chongyi Wang,
Yuan Yao,
Chenyang Zhao,
Jie Zhou,
Jie Cai,
Zhongwu Zhai,
Ning Ding,
Chao Jia,
Guoyang Zeng,
Dahai Li,
Zhiyuan Liu,
Maosong Sun
Abstract:
The burgeoning interest in develo** Large Language Models (LLMs) with up to trillion parameters has been met with concerns regarding resource efficiency and practical expense, particularly given the immense cost of experimentation. This scenario underscores the importance of exploring the potential of Small Language Models (SLMs) as a resource-efficient alternative. In this context, we introduce…
▽ More
The burgeoning interest in develo** Large Language Models (LLMs) with up to trillion parameters has been met with concerns regarding resource efficiency and practical expense, particularly given the immense cost of experimentation. This scenario underscores the importance of exploring the potential of Small Language Models (SLMs) as a resource-efficient alternative. In this context, we introduce MiniCPM, specifically the 1.2B and 2.4B non-embedding parameter variants, not only excel in their respective categories but also demonstrate capabilities on par with 7B-13B LLMs. While focusing on SLMs, our approach exhibits scalability in both model and data dimensions for future LLM research. Regarding model scaling, we employ extensive model wind tunnel experiments for stable and optimal scaling. For data scaling, we introduce a Warmup-Stable-Decay (WSD) learning rate scheduler (LRS), conducive to continuous training and domain adaptation. We present an in-depth analysis of the intriguing training dynamics that occurred in the WSD LRS. With WSD LRS, we are now able to efficiently study data-model scaling law without extensive retraining experiments on both axes of model and data, from which we derive the much higher compute optimal data-model ratio than Chinchilla Optimal. Additionally, we introduce MiniCPM family, including MiniCPM-DPO, MiniCPM-MoE and MiniCPM-128K, whose excellent performance further cementing MiniCPM's foundation in diverse SLM applications. MiniCPM models are available publicly at https://github.com/OpenBMB/MiniCPM .
△ Less
Submitted 3 June, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
-
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
Authors:
Chaoqun He,
Renjie Luo,
Yuzhuo Bai,
Shengding Hu,
Zhen Leng Thai,
Junhao Shen,
**yi Hu,
Xu Han,
Yujie Huang,
Yuxiang Zhang,
Jie Liu,
Lei Qi,
Zhiyuan Liu,
Maosong Sun
Abstract:
Recent advancements have seen Large Language Models (LLMs) and Large Multimodal Models (LMMs) surpassing general human capabilities in various tasks, approaching the proficiency level of human experts across multiple domains. With traditional benchmarks becoming less challenging for these models, new rigorous challenges are essential to gauge their advanced abilities. In this work, we present Olym…
▽ More
Recent advancements have seen Large Language Models (LLMs) and Large Multimodal Models (LMMs) surpassing general human capabilities in various tasks, approaching the proficiency level of human experts across multiple domains. With traditional benchmarks becoming less challenging for these models, new rigorous challenges are essential to gauge their advanced abilities. In this work, we present OlympiadBench, an Olympiad-level bilingual multimodal scientific benchmark, featuring 8,476 problems from Olympiad-level mathematics and physics competitions, including the Chinese college entrance exam. Each problem is detailed with expert-level annotations for step-by-step reasoning. Evaluating top-tier models on OlympiadBench, we implement a comprehensive assessment methodology to accurately evaluate model responses. Notably, the best-performing model, GPT-4V, attains an average score of 17.97% on OlympiadBench, with a mere 10.74% in physics, highlighting the benchmark rigor and the intricacy of physical reasoning. Our analysis orienting GPT-4V points out prevalent issues with hallucinations, knowledge omissions, and logical fallacies. We hope that our challenging benchmark can serve as a valuable resource for hel** future AGI research endeavors. The data and evaluation code are available at \url{https://github.com/OpenBMB/OlympiadBench}
△ Less
Submitted 6 June, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
$\infty$Bench: Extending Long Context Evaluation Beyond 100K Tokens
Authors:
Xinrong Zhang,
Yingfa Chen,
Shengding Hu,
Zihang Xu,
Junhao Chen,
Moo Khai Hao,
Xu Han,
Zhen Leng Thai,
Shuo Wang,
Zhiyuan Liu,
Maosong Sun
Abstract:
Processing and reasoning over long contexts is crucial for many practical applications of Large Language Models (LLMs), such as document comprehension and agent construction. Despite recent strides in making LLMs process contexts with more than 100K tokens, there is currently a lack of a standardized benchmark to evaluate this long-context capability. Existing public benchmarks typically focus on…
▽ More
Processing and reasoning over long contexts is crucial for many practical applications of Large Language Models (LLMs), such as document comprehension and agent construction. Despite recent strides in making LLMs process contexts with more than 100K tokens, there is currently a lack of a standardized benchmark to evaluate this long-context capability. Existing public benchmarks typically focus on contexts around 10K tokens, limiting the assessment and comparison of LLMs in processing longer contexts. In this paper, we propose $\infty$Bench, the first LLM benchmark featuring an average data length surpassing 100K tokens. $\infty$Bench comprises synthetic and realistic tasks spanning diverse domains, presented in both English and Chinese. The tasks in $\infty$Bench are designed to require well understanding of long dependencies in contexts, and make simply retrieving a limited number of passages from contexts not sufficient for these tasks. In our experiments, based on $\infty$Bench, we evaluate the state-of-the-art proprietary and open-source LLMs tailored for processing long contexts. The results indicate that existing long context LLMs still require significant advancements to effectively process 100K+ context. We further present three intriguing analyses regarding the behavior of LLMs processing long context.
△ Less
Submitted 24 February, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
Compatible Natural Gradient Policy Search
Authors:
Joni Pajarinen,
Hong Linh Thai,
Riad Akrour,
Jan Peters,
Gerhard Neumann
Abstract:
Trust-region methods have yielded state-of-the-art results in policy search. A common approach is to use KL-divergence to bound the region of trust resulting in a natural gradient policy update. We show that the natural gradient and trust region optimization are equivalent if we use the natural parameterization of a standard exponential policy distribution in combination with compatible value func…
▽ More
Trust-region methods have yielded state-of-the-art results in policy search. A common approach is to use KL-divergence to bound the region of trust resulting in a natural gradient policy update. We show that the natural gradient and trust region optimization are equivalent if we use the natural parameterization of a standard exponential policy distribution in combination with compatible value function approximation. Moreover, we show that standard natural gradient updates may reduce the entropy of the policy according to a wrong schedule leading to premature convergence. To control entropy reduction we introduce a new policy search method called compatible policy search (COPOS) which bounds entropy loss. The experimental results show that COPOS yields state-of-the-art results in challenging continuous control tasks and in discrete partially observable tasks.
△ Less
Submitted 7 February, 2019;
originally announced February 2019.
-
A Survey and Taxonomy of Resource Optimisation for Executing Bag-of-Task Applications on Public Clouds
Authors:
Long Thai,
Blesson Varghese,
Adam Barker
Abstract:
Cloud computing has been widely adopted due to the flexibility in resource provisioning and on-demand pricing models. Entire clusters of Virtual Machines (VMs) can be dynamically provisioned to meet the computational demands of users. However, from a user's perspective, it is still challenging to utilise cloud resources efficiently. This is because an overwhelmingly wide variety of resource types…
▽ More
Cloud computing has been widely adopted due to the flexibility in resource provisioning and on-demand pricing models. Entire clusters of Virtual Machines (VMs) can be dynamically provisioned to meet the computational demands of users. However, from a user's perspective, it is still challenging to utilise cloud resources efficiently. This is because an overwhelmingly wide variety of resource types with different prices and significant performance variations are available.
This paper presents a survey and taxonomy of existing research in optimising the execution of Bag-of-Task applications on cloud resources. A BoT application consists of multiple independent tasks, each of which can be executed by a VM in any order; these applications are widely used by both the scientific communities and commercial organisations. The objectives of this survey are as follows: (i) to provide the reader with a concise understanding of existing research on optimising the execution of BoT applications on the cloud, (ii) to define a taxonomy that categorises current frameworks to compare and contrast them, and (iii) to present current trends and future research directions in the area.
△ Less
Submitted 24 November, 2017;
originally announced November 2017.
-
Cloud Benchmarking For Maximising Performance of Scientific Applications
Authors:
Blesson Varghese,
Ozgur Akgun,
Ian Miguel,
Long Thai,
Adam Barker
Abstract:
How can applications be deployed on the cloud to achieve maximum performance? This question is challenging to address with the availability of a wide variety of cloud Virtual Machines (VMs) with different performance capabilities. The research reported in this paper addresses the above question by proposing a six step benchmarking methodology in which a user provides a set of weights that indicate…
▽ More
How can applications be deployed on the cloud to achieve maximum performance? This question is challenging to address with the availability of a wide variety of cloud Virtual Machines (VMs) with different performance capabilities. The research reported in this paper addresses the above question by proposing a six step benchmarking methodology in which a user provides a set of weights that indicate how important memory, local communication, computation and storage related operations are to an application. The user can either provide a set of four abstract weights or eight fine grain weights based on the knowledge of the application. The weights along with benchmarking data collected from the cloud are used to generate a set of two rankings - one based only on the performance of the VMs and the other takes both performance and costs into account. The rankings are validated on three case study applications using two validation techniques. The case studies on a set of experimental VMs highlight that maximum performance can be achieved by the three top ranked VMs and maximum performance in a cost-effective manner is achieved by at least one of the top three ranked VMs produced by the methodology.
△ Less
Submitted 1 August, 2016;
originally announced August 2016.
-
DocLite: A Docker-Based Lightweight Cloud Benchmarking Tool
Authors:
Blesson Varghese,
Lawan Thamsuhang Subba,
Long Thai,
Adam Barker
Abstract:
Existing benchmarking methods are time consuming processes as they typically benchmark the entire Virtual Machine (VM) in order to generate accurate performance data, making them less suitable for real-time analytics. The research in this paper is aimed to surmount the above challenge by presenting DocLite - Docker Container-based Lightweight benchmarking tool. DocLite explores lightweight cloud b…
▽ More
Existing benchmarking methods are time consuming processes as they typically benchmark the entire Virtual Machine (VM) in order to generate accurate performance data, making them less suitable for real-time analytics. The research in this paper is aimed to surmount the above challenge by presenting DocLite - Docker Container-based Lightweight benchmarking tool. DocLite explores lightweight cloud benchmarking methods for rapidly executing benchmarks in near real-time. DocLite is built on the Docker container technology, which allows a user-defined memory size and number of CPU cores of the VM to be benchmarked. The tool incorporates two benchmarking methods - the first referred to as the native method employs containers to benchmark a small portion of the VM and generate performance ranks, and the second uses historic benchmark data along with the native method as a hybrid to generate VM ranks. The proposed methods are evaluated on three use-cases and are observed to be up to 91 times faster than benchmarking the entire VM. In both methods, small containers provide the same quality of rankings as a large container. The native method generates ranks with over 90% and 86% accuracy for sequential and parallel execution of an application compared against benchmarking the whole VM. The hybrid method did not improve the quality of the rankings significantly.
△ Less
Submitted 23 March, 2016;
originally announced March 2016.
-
Container-Based Cloud Virtual Machine Benchmarking
Authors:
Blesson Varghese,
Lawan Thamsuhang Subba,
Long Thai,
Adam Barker
Abstract:
With the availability of a wide range of cloud Virtual Machines (VMs) it is difficult to determine which VMs can maximise the performance of an application. Benchmarking is commonly used to this end for capturing the performance of VMs. Most cloud benchmarking techniques are typically heavyweight - time consuming processes which have to benchmark the entire VM in order to obtain accurate benchmark…
▽ More
With the availability of a wide range of cloud Virtual Machines (VMs) it is difficult to determine which VMs can maximise the performance of an application. Benchmarking is commonly used to this end for capturing the performance of VMs. Most cloud benchmarking techniques are typically heavyweight - time consuming processes which have to benchmark the entire VM in order to obtain accurate benchmark data. Such benchmarks cannot be used in real-time on the cloud and incur extra costs even before an application is deployed.
In this paper, we present lightweight cloud benchmarking techniques that execute quickly and can be used in near real-time on the cloud. The exploration of lightweight benchmarking techniques are facilitated by the development of DocLite - Docker Container-based Lightweight Benchmarking. DocLite is built on the Docker container technology which allows a user-defined portion (such as memory size and the number of CPU cores) of the VM to be benchmarked. DocLite operates in two modes, in the first mode, containers are used to benchmark a small portion of the VM to generate performance ranks. In the second mode, historic benchmark data is used along with the first mode as a hybrid to generate VM ranks. The generated ranks are evaluated against three scientific high-performance computing applications. The proposed techniques are up to 91 times faster than a heavyweight technique which benchmarks the entire VM. It is observed that the first mode can generate ranks with over 90% and 86% accuracy for sequential and parallel execution of an application. The hybrid mode improves the correlation slightly but the first mode is sufficient for benchmarking cloud VMs.
△ Less
Submitted 15 January, 2016;
originally announced January 2016.
-
Task Scheduling on the Cloud with Hard Constraints
Authors:
Long Thai,
Blesson Varghese,
Adam Barker
Abstract:
Scheduling Bag-of-Tasks (BoT) applications on the cloud can be more challenging than grid and cluster environ- ments. This is because a user may have a budgetary constraint or a deadline for executing the BoT application in order to keep the overall execution costs low. The research in this paper is motivated to investigate task scheduling on the cloud, given two hard constraints based on a user-d…
▽ More
Scheduling Bag-of-Tasks (BoT) applications on the cloud can be more challenging than grid and cluster environ- ments. This is because a user may have a budgetary constraint or a deadline for executing the BoT application in order to keep the overall execution costs low. The research in this paper is motivated to investigate task scheduling on the cloud, given two hard constraints based on a user-defined budget and a deadline. A heuristic algorithm is proposed and implemented to satisfy the hard constraints for executing the BoT application in a cost effective manner. The proposed algorithm is evaluated using four scenarios that are based on the trade-off between performance and the cost of using different cloud resource types. The experimental evaluation confirms the feasibility of the algorithm in satisfying the constraints. The key observation is that multiple resource types can be a better alternative to using a single type of resource.
△ Less
Submitted 20 July, 2015;
originally announced July 2015.
-
Budget Constrained Execution of Multiple Bag-of-Tasks Applications on the Cloud
Authors:
Long Thai,
Blesson Varghese,
Adam Barker
Abstract:
Optimising the execution of Bag-of-Tasks (BoT) applications on the cloud is a hard problem due to the trade- offs between performance and monetary cost. The problem can be further complicated when multiple BoT applications need to be executed. In this paper, we propose and implement a heuristic algorithm that schedules tasks of multiple applications onto different cloud virtual machines in order t…
▽ More
Optimising the execution of Bag-of-Tasks (BoT) applications on the cloud is a hard problem due to the trade- offs between performance and monetary cost. The problem can be further complicated when multiple BoT applications need to be executed. In this paper, we propose and implement a heuristic algorithm that schedules tasks of multiple applications onto different cloud virtual machines in order to maximise performance while satisfying a given budget constraint. Current approaches are limited in task scheduling since they place a limit on the number of cloud resources that can be employed by the applications. However, in the proposed algorithm there are no such limits, and in comparison with other approaches, the algorithm on average achieves an improved performance of 10%. The experimental results also highlight that the algorithm yields consistent performance even with low budget constraints which cannot be achieved by competing approaches.
△ Less
Submitted 20 July, 2015;
originally announced July 2015.
-
Executing Bag of Distributed Tasks on Virtually Unlimited Cloud Resources
Authors:
Long Thai,
Blesson Varghese,
Adam Barker
Abstract:
Bag-of-Distributed-Tasks (BoDT) application is the collection of identical and independent tasks each of which requires a piece of input data located around the world. As a result, Cloud computing offers an ef- fective way to execute BoT application as it not only consists of multiple geographically distributed data centres but also allows a user to pay for what she actually uses only. In this pap…
▽ More
Bag-of-Distributed-Tasks (BoDT) application is the collection of identical and independent tasks each of which requires a piece of input data located around the world. As a result, Cloud computing offers an ef- fective way to execute BoT application as it not only consists of multiple geographically distributed data centres but also allows a user to pay for what she actually uses only. In this paper, BoDT on the Cloud using virtually unlimited cloud resources. A heuristic algorithm is proposed to find an execution plan that takes budget constraints into account. Compared with other approaches, with the same given budget, our algorithm is able to reduce the overall execution time up to 50%.
△ Less
Submitted 1 June, 2015;
originally announced June 2015.
-
Cloud Services Brokerage: A Survey and Research Roadmap
Authors:
Adam Barker,
Blesson Varghese,
Long Thai
Abstract:
A Cloud Services Brokerage (CSB) acts as an intermediary between cloud service providers (e.g., Amazon and Google) and cloud service end users, providing a number of value adding services. CSBs as a research topic are in there infancy. The goal of this paper is to provide a concise survey of existing CSB technologies in a variety of areas and highlight a roadmap, which details five future opportun…
▽ More
A Cloud Services Brokerage (CSB) acts as an intermediary between cloud service providers (e.g., Amazon and Google) and cloud service end users, providing a number of value adding services. CSBs as a research topic are in there infancy. The goal of this paper is to provide a concise survey of existing CSB technologies in a variety of areas and highlight a roadmap, which details five future opportunities for research.
△ Less
Submitted 1 June, 2015;
originally announced June 2015.
-
Cloud Benchmarking for Performance
Authors:
Blesson Varghese,
Ozgur Akgun,
Ian Miguel,
Long Thai,
Adam Barker
Abstract:
How can applications be deployed on the cloud to achieve maximum performance? This question has become significant and challenging with the availability of a wide variety of Virtual Machines (VMs) with different performance capabilities in the cloud. The above question is addressed by proposing a six step benchmarking methodology in which a user provides a set of four weights that indicate how imp…
▽ More
How can applications be deployed on the cloud to achieve maximum performance? This question has become significant and challenging with the availability of a wide variety of Virtual Machines (VMs) with different performance capabilities in the cloud. The above question is addressed by proposing a six step benchmarking methodology in which a user provides a set of four weights that indicate how important each of the following groups: memory, processor, computation and storage are to the application that needs to be executed on the cloud. The weights along with cloud benchmarking data are used to generate a ranking of VMs that can maximise performance of the application. The rankings are validated through an empirical analysis using two case study applications; the first is a financial risk application and the second is a molecular dynamics simulation, which are both representative of workloads that can benefit from execution on the cloud. Both case studies validate the feasibility of the methodology and highlight that maximum performance can be achieved on the cloud by selecting the top ranked VMs produced by the methodology.
△ Less
Submitted 4 November, 2014;
originally announced November 2014.
-
Optimal Deployment of Geographically Distributed Workflow Engines on the Cloud
Authors:
Long Thai,
Adam Barker,
Blesson Varghese,
Ozgur Akgun,
Ian Miguel
Abstract:
When orchestrating Web service workflows, the geographical placement of the orchestration engine(s) can greatly affect workflow performance. Data may have to be transferred across long geographical distances, which in turn increases execution time and degrades the overall performance of a workflow. In this paper, we present a framework that, given a DAG-based workflow specification, computes the o…
▽ More
When orchestrating Web service workflows, the geographical placement of the orchestration engine(s) can greatly affect workflow performance. Data may have to be transferred across long geographical distances, which in turn increases execution time and degrades the overall performance of a workflow. In this paper, we present a framework that, given a DAG-based workflow specification, computes the op- timal Amazon EC2 cloud regions to deploy the orchestration engines and execute a workflow. The framework incorporates a constraint model that solves the workflow deployment problem, which is generated using an automated constraint modelling system. The feasibility of the framework is evaluated by executing different sample workflows representative of sci- entific workloads. The experimental results indicate that the framework reduces the workflow execution time and provides a speed up of 1.3x-2.5x over centralised approaches.
△ Less
Submitted 30 October, 2014;
originally announced October 2014.
-
Executing Bag of Distributed Tasks on the Cloud: Investigating the Trade-offs Between Performance and Cost
Authors:
Long Thai,
Blesson Varghese,
Adam Barker
Abstract:
Bag of Distributed Tasks (BoDT) can benefit from decentralised execution on the Cloud. However, there is a trade-off between the performance that can be achieved by employing a large number of Cloud VMs for the tasks and the monetary constraints that are often placed by a user. The research reported in this paper is motivated towards investigating this trade-off so that an optimal plan for deployi…
▽ More
Bag of Distributed Tasks (BoDT) can benefit from decentralised execution on the Cloud. However, there is a trade-off between the performance that can be achieved by employing a large number of Cloud VMs for the tasks and the monetary constraints that are often placed by a user. The research reported in this paper is motivated towards investigating this trade-off so that an optimal plan for deploying BoDT applications on the cloud can be generated. A heuristic algorithm, which considers the user's preference of performance and cost is proposed and implemented. The feasibility of the algorithm is demonstrated by generating execution plans for a sample application. The key result is that the algorithm generates optimal execution plans for the application over 91\% of the time.
△ Less
Submitted 30 October, 2014;
originally announced October 2014.
-
A Facial Expression Classification System Integrating Canny, Principal Component Analysis and Artificial Neural Network
Authors:
Le Hoang Thai,
Nguyen Do Thai Nguyen,
Tran Son Hai
Abstract:
Facial Expression Classification is an interesting research problem in recent years. There are a lot of methods to solve this problem. In this research, we propose a novel approach using Canny, Principal Component Analysis (PCA) and Artificial Neural Network. Firstly, in preprocessing phase, we use Canny for local region detection of facial images. Then each of local region's features will be pres…
▽ More
Facial Expression Classification is an interesting research problem in recent years. There are a lot of methods to solve this problem. In this research, we propose a novel approach using Canny, Principal Component Analysis (PCA) and Artificial Neural Network. Firstly, in preprocessing phase, we use Canny for local region detection of facial images. Then each of local region's features will be presented based on Principal Component Analysis (PCA). Finally, using Artificial Neural Network (ANN)applies for Facial Expression Classification. We apply our proposal method (Canny_PCA_ANN) for recognition of six basic facial expressions on JAFFE database consisting 213 images posed by 10 Japanese female models. The experimental result shows the feasibility of our proposal method.
△ Less
Submitted 17 November, 2011;
originally announced November 2011.
-
Fingerprint recognition using standardized fingerprint model
Authors:
Le Hoang Thai,
Ha Nhat Tam
Abstract:
Fingerprint recognition is one of most popular and accuracy Biometric technologies. Nowadays, it is used in many real applications. However, recognizing fingerprints in poor quality images is still a very complex problem. In recent years, many algorithms, models...are given to improve the accuracy of recognition system. This paper discusses on the standardized fingerprint model which is used to sy…
▽ More
Fingerprint recognition is one of most popular and accuracy Biometric technologies. Nowadays, it is used in many real applications. However, recognizing fingerprints in poor quality images is still a very complex problem. In recent years, many algorithms, models...are given to improve the accuracy of recognition system. This paper discusses on the standardized fingerprint model which is used to synthesize the template of fingerprints. In this model, after pre-processing step, we find the transformation between templates, adjust parameters, synthesize fingerprint, and reduce noises. Then, we use the final fingerprint to match with others in FVC2004 fingerprint database (DB4) to show the capability of the model.
△ Less
Submitted 15 July, 2011;
originally announced July 2011.