Search | arXiv e-print repository

Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on Large Language Models

Authors: Eldar Kurtic, Amir Moeini, Dan Alistarh

Abstract: We introduce Mathador-LM, a new benchmark for evaluating the mathematical reasoning of large language models (LLMs), combining ruleset interpretation, planning, and problem-solving. This benchmark is inspired by the Mathador game, where the objective is to reach a target number using basic arithmetic operations on a given set of base numbers, following a simple set of rules. We show that, across leading LLMs, we obtain stable average performance while generating benchmark instances dynamically, following a target difficulty level. Thus, our benchmark alleviates concerns about test-set leakage into training data, an issue that often undermines popular benchmarks. Additionally, we conduct a comprehensive evaluation of both open- and closed-source state-of-the-art LLMs on Mathador-LM. Our findings reveal that contemporary models struggle with Mathador-LM, scoring significantly lower than average third graders. This stands in stark contrast to their strong performance on popular mathematical reasoning benchmarks.

Submitted 19 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

ACM Class: I.2.7
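To make the game mechanics in the abstract concrete, here is a minimal brute-force solver. It is a sketch under stated assumptions, not the benchmark's instance generator or grader: the rules used (positive-integer base numbers, each used at most once, only positive-integer intermediate results, exact division) are a simplified reading of Mathador, and `solve` is a hypothetical helper.

```python
from itertools import combinations
from typing import List, Optional, Tuple

# Each entry pairs an intermediate value with the expression that produced it.
Item = Tuple[int, str]


def solve(numbers: List[int], target: int) -> Optional[str]:
    """Return an expression that reaches `target`, or None if none exists.

    Hypothetical helper. Assumed (simplified) Mathador-style rules: base
    numbers are positive integers, each is used at most once, only +, -, *
    and / are allowed, and every intermediate result must be a positive
    integer, so subtraction must stay positive and division must be exact.
    """

    def search(items: List[Item]) -> Optional[str]:
        # Any value already available that equals the target is a solution.
        for value, expr in items:
            if value == target:
                return expr
        # Combine every unordered pair, then recurse on the shorter list.
        for i, j in combinations(range(len(items)), 2):
            (a, ea), (b, eb) = items[i], items[j]
            if a < b:  # order the pair so that a >= b
                (a, ea), (b, eb) = (b, eb), (a, ea)
            rest = [items[k] for k in range(len(items)) if k not in (i, j)]
            candidates = [(a + b, f"({ea} + {eb})"), (a * b, f"({ea} * {eb})")]
            if a > b:  # keep subtraction results positive
                candidates.append((a - b, f"({ea} - {eb})"))
            if a % b == 0:  # only exact division is allowed
                candidates.append((a // b, f"({ea} / {eb})"))
            for value, expr in candidates:
                found = search(rest + [(value, expr)])
                if found is not None:
                    return found
        return None

    return search([(n, str(n)) for n in numbers])


print(solve([2, 3], 6))  # prints (3 * 2)
```

Exhaustive search is cheap for five base numbers, but the number of combination orders grows factorially with the size of the set, which is one reason difficulty can be tuned through the choice of base numbers and target.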

arXiv:2110.15089 [pdf]

D2RLIR: an improved and diversified ranking function in interactive recommendation systems based on deep reinforcement learning

Authors: Vahid Baghi, Seyed Mohammad Seyed Motehayeri, Ali Moeini, Rooholah Abedian

Abstract: Recently, interactive recommendation systems based on reinforcement learning have attracted researchers' attention because they treat the recommendation procedure as a dynamic process and update the recommendation model based on immediate user feedback, which traditional methods neglect. Existing works have two significant drawbacks. First, they use an inefficient ranking function to produce the Top-N recommendation list. Second, they focus on recommendation accuracy while neglecting other evaluation metrics such as diversity. This paper proposes a deep reinforcement learning based recommendation system that uses an Actor-Critic architecture to model users' dynamic interaction with the recommender agent and to maximize the expected long-term reward. Furthermore, we propose using Spotify's ANNoy algorithm to find the items most similar to the action generated by the actor network. After that, the Total Diversity Effect Ranking algorithm is used to generate recommendations that account for both relevance and diversity. Moreover, we apply positional encoding to compute representations of the user's interaction sequence without using sequence-aligned recurrent neural networks. Extensive experiments on the MovieLens dataset demonstrate that our proposed model is able to generate a recommendation list that is both diverse and relevant to the user's preferences.

Submitted 28 October, 2021; v1 submitted 28 October, 2021; originally announced October 2021.
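The abstract names the Total Diversity Effect Ranking algorithm without describing it, so the sketch below does not reproduce it. Instead it illustrates the general idea of trading relevance against diversity when ranking a candidate set (for example, the nearest neighbours an ANN index returns) using a different, well-known technique: greedy Maximal Marginal Relevance (MMR) re-ranking. The function name, the `lam` trade-off parameter and the toy scores are all hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Sequence


def mmr_rerank(candidates: Sequence[Hashable],
               relevance: Dict[Hashable, float],
               similarity: Callable[[Hashable, Hashable], float],
               n: int,
               lam: float = 0.7) -> List[Hashable]:
    """Greedy Maximal Marginal Relevance (MMR) re-ranking (not TDER).

    At each step pick the candidate maximizing
        lam * relevance - (1 - lam) * (max similarity to items already chosen),
    so lam=1.0 ranks purely by relevance and smaller lam favors diversity.
    """
    selected: List[Hashable] = []
    remaining = list(candidates)

    def marginal_score(item: Hashable) -> float:
        # Redundancy is the highest similarity to anything already selected.
        redundancy = max((similarity(item, s) for s in selected), default=0.0)
        return lam * relevance[item] - (1 - lam) * redundancy

    while remaining and len(selected) < n:
        best = max(remaining, key=marginal_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical candidates: "a" and "b" are near-duplicates, "c" is different.
scores = {"a": 0.9, "b": 0.85, "c": 0.5}


def toy_similarity(x: Hashable, y: Hashable) -> float:
    return 0.95 if {x, y} == {"a", "b"} else 0.0


print(mmr_rerank(["a", "b", "c"], scores, toy_similarity, n=2, lam=0.5))  # ['a', 'c']
print(mmr_rerank(["a", "b", "c"], scores, toy_similarity, n=2, lam=1.0))  # ['a', 'b']
```

With `lam=1.0` the near-duplicate "b" is kept because only relevance counts; with `lam=0.5` the less relevant but dissimilar "c" replaces it, which is the kind of relevance/diversity trade-off the abstract targets.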

arXiv:2005.08148 [pdf, other]

A Hybrid Approach to Enhance Pure Collaborative Filtering based on Content Feature Relationship

Authors: Mohammad Maghsoudi Mehrabani, Hamid Mohayeji, Ali Moeini

Abstract: Recommendation systems have gained increasing importance because of their applications in both academia and industry. With the development of additional data sources and of methods for extracting information beyond the rating history of users on items, hybrid recommendation algorithms, which combine several methods to improve performance, have become pervasive. In this work, we first introduce a novel method to extract the implicit relationships between content features using a well-known method from the natural language processing domain, namely Word2Vec. In contrast to the typical use of Word2Vec, we treat some features of items as the words of sentences to produce neural feature embeddings, through which we can calculate the similarity between features. Next, we propose a novel content-based recommendation system that employs these relationships to determine vector representations of items, from which the similarity between items can be computed (RELFsim). Our evaluation results demonstrate that it can predict the preference a user would have for a set of items as well as pure collaborative filtering. This content-based algorithm is also embedded in a pure item-based collaborative filtering algorithm to deal with the cold-start problem and enhance its accuracy. Our experiments on a benchmark movie dataset corroborate that the proposed approach improves the accuracy of the system.

Submitted 16 May, 2020; originally announced May 2020.

Comments: The 10th Conference on Information and Knowledge Technology (IKT2019)
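The abstract describes deriving item vectors from feature embeddings and then comparing items. The sketch below shows one simple way this could work; it is an assumption, not the paper's RELFsim method. Each item vector is taken to be the mean of its features' embeddings, and items are compared by cosine similarity. In practice the embeddings would come from a Word2Vec model trained with each item's feature list treated as a sentence; here they are hypothetical hand-written 2-D vectors, and all function names are illustrative.

```python
import math
from typing import Dict, List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Component-wise average of equal-length vectors."""
    return [sum(parts) / len(vectors) for parts in zip(*vectors)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def item_similarity(features_a: List[str], features_b: List[str],
                    embeddings: Dict[str, Vector]) -> float:
    """Similarity of two items described by lists of content features.

    Assumption (not from the paper): an item's vector is the mean of
    its feature embeddings.
    """
    item_a = mean_vector([embeddings[f] for f in features_a])
    item_b = mean_vector([embeddings[f] for f in features_b])
    return cosine(item_a, item_b)


# Hypothetical feature embeddings standing in for trained Word2Vec vectors.
toy_embeddings = {
    "action": [1.0, 0.0],
    "thriller": [0.8, 0.6],
    "romance": [0.0, 1.0],
}
print(item_similarity(["action", "romance"], ["thriller"], toy_embeddings))
```

Mean pooling keeps the sketch short; a weighted average (for example, by feature frequency) would be an equally plausible design choice.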

arXiv:1702.06662 [pdf, ps, other]

An Integer Programming Model for Binary Knapsack Problem with Value-Related Dependencies among Elements

Authors: Davoud Mougouei, David M. W. Powers, Asghar Moeini

Abstract: The Binary Knapsack Problem (BKP) is to select a subset of a set of elements (items) with the highest total value while keeping the total weight within the capacity of the knapsack. This paper presents an integer programming model for a variation of BKP where the value of each element may depend on selecting or ignoring other elements. The strengths of such Value-Related Dependencies are assumed to be imprecise and hard to specify. To capture this imprecision, we propose modeling value-related dependencies using fuzzy graphs and their algebraic structure.

Submitted 21 February, 2017; originally announced February 2017.
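For reference, the plain BKP described at the start of the abstract maximizes the sum of v_i x_i subject to the sum of w_i x_i being at most the capacity W, with each x_i in {0, 1}. The sketch below solves only that plain version, where item values are independent, using the textbook dynamic program over integer capacities. It is a swapped-in technique for illustration, not the paper's integer programming model, and it does not model the fuzzy value-related dependencies. The function name and example data are hypothetical.

```python
from typing import List, Tuple


def knapsack(values: List[int], weights: List[int],
             capacity: int) -> Tuple[int, List[int]]:
    """Classic 0/1 knapsack by dynamic programming over capacities.

    Returns (best_value, chosen_indices). Item values are independent here,
    i.e. this is the plain BKP without value-related dependencies.
    """
    n = len(values)
    # best[i][c]: best total value using the first i items with capacity c.
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], weights[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip item i-1
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v  # option 2: take it
    # Walk back through the table to recover which items were taken.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return best[n][capacity], sorted(chosen)


print(knapsack([60, 100, 120], [10, 20, 30], 50))  # (220, [1, 2])
```

The table has (n + 1) x (W + 1) entries, so this is pseudo-polynomial in the capacity; once values depend on which other items are selected, as in the paper's variant, this simple recurrence no longer applies, which is where an integer programming formulation becomes useful.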

Showing 1–4 of 4 results for author: Moeini, A