-
Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on Large Language Models
Authors:
Eldar Kurtic,
Amir Moeini,
Dan Alistarh
Abstract:
We introduce Mathador-LM, a new benchmark for evaluating mathematical reasoning in large language models (LLMs), combining ruleset interpretation, planning, and problem-solving. This benchmark is inspired by the Mathador game, where the objective is to reach a target number using basic arithmetic operations on a given set of base numbers, following a simple set of rules. We show that, across leading LLMs, we obtain stable average performance while generating benchmark instances dynamically, following a target difficulty level. Thus, our benchmark alleviates concerns about test-set leakage into training data, an issue that often undermines popular benchmarks. Additionally, we conduct a comprehensive evaluation of both open and closed-source state-of-the-art LLMs on Mathador-LM. Our findings reveal that contemporary models struggle with Mathador-LM, scoring significantly lower than the average 3rd grader. This stands in stark contrast to their strong performance on popular mathematical reasoning benchmarks.
Submitted 19 June, 2024; v1 submitted 18 June, 2024;
originally announced June 2024.
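To make the game described in the abstract concrete, here is a minimal brute-force sketch of a Mathador-style search. It is not the benchmark's generator or scorer: the function name `solve`, the restriction to non-negative integer intermediate results, and the "each base number used at most once" rule are assumptions made for illustration, and the abstract does not spell out the exact ruleset or scoring.

```python
from itertools import combinations


def solve(numbers, target):
    """Return a list of arithmetic steps that reach `target`, or None.

    Assumed rules (not taken from the abstract): each base number is used
    at most once, and every intermediate result is a non-negative integer.
    """

    def search(values, steps):
        if target in values:
            return steps
        # Pick any two available values and combine them with one operation.
        for i, j in combinations(range(len(values)), 2):
            a, b = values[i], values[j]
            rest = [v for k, v in enumerate(values) if k not in (i, j)]
            hi, lo = max(a, b), min(a, b)
            candidates = [
                (hi + lo, f"{hi} + {lo}"),
                (hi * lo, f"{hi} * {lo}"),
                (hi - lo, f"{hi} - {lo}"),
            ]
            # Only allow exact division.
            if lo != 0 and hi % lo == 0:
                candidates.append((hi // lo, f"{hi} / {lo}"))
            for result, text in candidates:
                found = search(rest + [result], steps + [f"{text} = {result}"])
                if found is not None:
                    return found
        return None

    return search(list(numbers), [])


if __name__ == "__main__":
    # Prints the steps found for base numbers 1, 2, 3 and target 7.
    print(solve([1, 2, 3], 7))
```

Each step replaces two values with one, so the search always terminates. Exhaustive search is practical only because the set of base numbers is small.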
-
D2RLIR : an improved and diversified ranking function in interactive recommendation systems based on deep reinforcement learning
Authors:
Vahid Baghi,
Seyed Mohammad Seyed Motehayeri,
Ali Moeini,
Rooholah Abedian
Abstract:
Recently, interactive recommendation systems based on reinforcement learning have attracted researchers' attention because they treat the recommendation procedure as a dynamic process and update the recommendation model based on immediate user feedback, which traditional methods neglect. Existing works have two significant drawbacks. First, they use an inefficient ranking function to produce the Top-N recommendation list. Second, they focus on recommendation accuracy and neglect other evaluation metrics such as diversity. This paper proposes a deep reinforcement learning based recommendation system that uses an Actor-Critic architecture to model users' dynamic interaction with the recommender agent and maximize the expected long-term reward. Furthermore, we propose using Spotify's ANNoy algorithm to find the items most similar to the action generated by the actor network. After that, the Total Diversity Effect Ranking algorithm is used to generate recommendations that account for both relevance and diversity. Moreover, we apply positional encoding to compute representations of the user's interaction sequence without using sequence-aligned recurrent neural networks. Extensive experiments on the MovieLens dataset demonstrate that our proposed model is able to generate a diverse yet relevant recommendation list based on the user's preferences.
Submitted 28 October, 2021; v1 submitted 28 October, 2021;
originally announced October 2021.
-
A Hybrid Approach to Enhance Pure Collaborative Filtering based on Content Feature Relationship
Authors:
Mohammad Maghsoudi Mehrabani,
Hamid Mohayeji,
Ali Moeini
Abstract:
Recommendation systems are gaining importance because of their applications in both academia and industry. With the development of additional data sources and methods of extracting new information beyond users' rating history on items, hybrid recommendation algorithms, in which several methods are combined to improve performance, have become pervasive. In this work, we first introduce a novel method to extract the implicit relationship between content features using a well-known method from the natural language processing domain, namely Word2Vec. In contrast to the typical use of Word2Vec, we treat item features as the words of sentences to produce neural feature embeddings, through which we can calculate the similarity between features. Next, we propose a novel content-based recommendation system (RELFsim) that employs this relationship to determine vector representations for items, from which the similarity between items can be computed. Our evaluation results demonstrate that it can predict the preference a user would have for a set of items as well as pure collaborative filtering. This content-based algorithm is also embedded in a pure item-based collaborative filtering algorithm to deal with the cold-start problem and enhance its accuracy. Our experiments on a benchmark movie dataset corroborate that the proposed approach improves the accuracy of the system.
Submitted 16 May, 2020;
originally announced May 2020.
-
An Integer Programming Model for Binary Knapsack Problem with Value-Related Dependencies among Elements
Authors:
Davoud Mougouei,
David M. W. Powers,
Asghar Moeini
Abstract:
The Binary Knapsack Problem (BKP) is to select a subset of a set of elements (items) with the highest value while keeping the total weight within the capacity of the knapsack. This paper presents an integer programming model for a variation of BKP where the value of each element may depend on selecting or ignoring other elements. The strengths of such value-related dependencies are assumed to be imprecise and hard to specify. To capture this imprecision, we propose modeling value-related dependencies using fuzzy graphs and their algebraic structure.
Submitted 21 February, 2017;
originally announced February 2017.
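For reference, the classical BKP that the abstract starts from can be written as the integer program below. This is the standard textbook formulation, not the paper's dependency-aware model, which the abstract does not give; the symbols $v_i$ (value), $w_i$ (weight), $W$ (capacity), and $x_i$ (selection variable) are notation assumed here.

```latex
\begin{align}
\max_{x} \quad & \sum_{i=1}^{n} v_i x_i \\
\text{s.t.} \quad & \sum_{i=1}^{n} w_i x_i \le W, \\
& x_i \in \{0, 1\}, \qquad i = 1, \dots, n.
\end{align}
```

In the variation described above, the contribution of each element to the objective would additionally depend on which other elements are selected or ignored.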