-
Skolem Meets Schanuel
Authors:
Yuri Bilu,
Florian Luca,
Joris Nieuwveld,
Joël Ouaknine,
David Purser,
James Worrell
Abstract:
The celebrated Skolem-Mahler-Lech Theorem states that the set of zeros of a linear recurrence sequence is the union of a finite set and finitely many arithmetic progressions. The corresponding computational question, the Skolem Problem, asks to determine whether a given linear recurrence sequence has a zero term. Although the Skolem-Mahler-Lech Theorem is almost 90 years old, decidability of the S…
▽ More
The celebrated Skolem-Mahler-Lech Theorem states that the set of zeros of a linear recurrence sequence is the union of a finite set and finitely many arithmetic progressions. The corresponding computational question, the Skolem Problem, asks to determine whether a given linear recurrence sequence has a zero term. Although the Skolem-Mahler-Lech Theorem is almost 90 years old, decidability of the Skolem Problem remains open. The main contribution of this paper is an algorithm to solve the Skolem Problem for simple linear recurrence sequences (those with simple characteristic roots). Whenever the algorithm terminates, it produces a stand-alone certificate that its output is correct -- a set of zeros together with a collection of witnesses that no further zeros exist. We give a proof that the algorithm always terminates assuming two classical number-theoretic conjectures: the Skolem Conjecture (also known as the Exponential Local-Global Principle) and the $p$-adic Schanuel Conjecture. Preliminary experiments with an implementation of this algorithm within the tool \textsc{Skolem} point to the practical applicability of this method.
△ Less
Submitted 28 April, 2022;
originally announced April 2022.
-
Multilingual Argument Mining: Datasets and Analysis
Authors:
Orith Toledo-Ronen,
Matan Orbach,
Yonatan Bilu,
Artem Spector,
Noam Slonim
Abstract:
The growing interest in argument mining and computational argumentation brings with it a plethora of Natural Language Understanding (NLU) tasks and corresponding datasets. However, as with many other NLU tasks, the dominant language is English, with resources in other languages being few and far between. In this work, we explore the potential of transfer learning using the multilingual BERT model…
▽ More
The growing interest in argument mining and computational argumentation brings with it a plethora of Natural Language Understanding (NLU) tasks and corresponding datasets. However, as with many other NLU tasks, the dominant language is English, with resources in other languages being few and far between. In this work, we explore the potential of transfer learning using the multilingual BERT model to address argument mining tasks in non-English languages, based on English datasets and the use of machine translation. We show that such methods are well suited for classifying the stance of arguments and detecting evidence, but less so for assessing the quality of arguments, presumably because quality is harder to preserve under translation. In addition, focusing on the translate-train approach, we show how the choice of languages for translation, and the relations among them, affect the accuracy of the resultant model. Finally, to facilitate evaluation of transfer learning on argument mining tasks, we provide a human-generated dataset with more than 10k arguments in multiple languages, as well as machine translation of the English datasets.
△ Less
Submitted 13 October, 2020;
originally announced October 2020.
-
The workweek is the best time to start a family -- A Study of GPT-2 Based Claim Generation
Authors:
Shai Gretz,
Yonatan Bilu,
Edo Cohen-Karlik,
Noam Slonim
Abstract:
Argument generation is a challenging task whose research is timely considering its potential impact on social media and the dissemination of information. Here we suggest a pipeline based on GPT-2 for generating coherent claims, and explore the types of claims that it produces, and their veracity, using an array of manual and automatic assessments. In addition, we explore the interplay between this…
▽ More
Argument generation is a challenging task whose research is timely considering its potential impact on social media and the dissemination of information. Here we suggest a pipeline based on GPT-2 for generating coherent claims, and explore the types of claims that it produces, and their veracity, using an array of manual and automatic assessments. In addition, we explore the interplay between this task and the task of Claim Retrieval, showing how they can complement one another.
△ Less
Submitted 13 October, 2020;
originally announced October 2020.
-
What if we had no Wikipedia? Domain-independent Term Extraction from a Large News Corpus
Authors:
Yonatan Bilu,
Shai Gretz,
Edo Cohen,
Noam Slonim
Abstract:
One of the most impressive human endeavors of the past two decades is the collection and categorization of human knowledge in the free and accessible format that is Wikipedia. In this work we ask what makes a term worthy of entering this edifice of knowledge, and having a page of its own in Wikipedia? To what extent is this a natural product of on-going human discourse and discussion rather than a…
▽ More
One of the most impressive human endeavors of the past two decades is the collection and categorization of human knowledge in the free and accessible format that is Wikipedia. In this work we ask what makes a term worthy of entering this edifice of knowledge, and having a page of its own in Wikipedia? To what extent is this a natural product of on-going human discourse and discussion rather than an idiosyncratic choice of Wikipedia editors? Specifically, we aim to identify such "wiki-worthy" terms in a massive news corpus, and see if this can be done with no, or minimal, dependency on actual Wikipedia entries. We suggest a five-step pipeline for doing so, providing baseline results for all five, and the relevant datasets for benchmarking them. Our work sheds new light on the domain-specific Automatic Term Extraction problem, with the problem at hand being a domain-independent variant of it.
△ Less
Submitted 17 September, 2020;
originally announced September 2020.
-
Out of the Echo Chamber: Detecting Countering Debate Speeches
Authors:
Matan Orbach,
Yonatan Bilu,
Assaf Toledo,
Dan Lahav,
Michal Jacovi,
Ranit Aharonov,
Noam Slonim
Abstract:
An educated and informed consumption of media content has become a challenge in modern times. With the shift from traditional news outlets to social media and similar venues, a major concern is that readers are becoming encapsulated in "echo chambers" and may fall prey to fake news and disinformation, lacking easy access to dissenting views. We suggest a novel task aiming to alleviate some of thes…
▽ More
An educated and informed consumption of media content has become a challenge in modern times. With the shift from traditional news outlets to social media and similar venues, a major concern is that readers are becoming encapsulated in "echo chambers" and may fall prey to fake news and disinformation, lacking easy access to dissenting views. We suggest a novel task aiming to alleviate some of these concerns -- that of detecting articles that most effectively counter the arguments -- and not just the stance -- made in a given text. We study this problem in the context of debate speeches. Given such a speech, we aim to identify, from among a set of speeches on the same topic and with an opposing stance, the ones that directly counter it. We provide a large dataset of 3,685 such speeches (in English), annotated for this relation, which hopefully would be of general interest to the NLP community. We explore several algorithms addressing this task, and while some are successful, all fall short of expert human performance, suggesting room for further research. All data collected during this work is freely available for research.
△ Less
Submitted 3 May, 2020;
originally announced May 2020.
-
Financial Event Extraction Using Wikipedia-Based Weak Supervision
Authors:
Liat Ein-Dor,
Ariel Gera,
Orith Toledo-Ronen,
Alon Halfon,
Benjamin Sznajder,
Lena Dankin,
Yonatan Bilu,
Yoav Katz,
Noam Slonim
Abstract:
Extraction of financial and economic events from text has previously been done mostly using rule-based methods, with more recent works employing machine learning techniques. This work is in line with this latter approach, leveraging relevant Wikipedia sections to extract weak labels for sentences describing economic events. Whereas previous weakly supervised approaches required a knowledge-base of…
▽ More
Extraction of financial and economic events from text has previously been done mostly using rule-based methods, with more recent works employing machine learning techniques. This work is in line with this latter approach, leveraging relevant Wikipedia sections to extract weak labels for sentences describing economic events. Whereas previous weakly supervised approaches required a knowledge-base of such events, or corresponding financial figures, our approach requires no such additional data, and can be employed to extract economic events related to companies which are not even mentioned in the training data.
△ Less
Submitted 28 November, 2022; v1 submitted 25 November, 2019;
originally announced November 2019.
-
Corpus Wide Argument Mining -- a Working Solution
Authors:
Liat Ein-Dor,
Eyal Shnarch,
Lena Dankin,
Alon Halfon,
Benjamin Sznajder,
Ariel Gera,
Carlos Alzate,
Martin Gleize,
Leshem Choshen,
Yufang Hou,
Yonatan Bilu,
Ranit Aharonov,
Noam Slonim
Abstract:
One of the main tasks in argument mining is the retrieval of argumentative content pertaining to a given topic. Most previous work addressed this task by retrieving a relatively small number of relevant documents as the initial source for such content. This line of research yielded moderate success, which is of limited use in a real-world system. Furthermore, for such a system to yield a comprehen…
▽ More
One of the main tasks in argument mining is the retrieval of argumentative content pertaining to a given topic. Most previous work addressed this task by retrieving a relatively small number of relevant documents as the initial source for such content. This line of research yielded moderate success, which is of limited use in a real-world system. Furthermore, for such a system to yield a comprehensive set of relevant arguments, over a wide range of topics, it requires leveraging a large and diverse corpus in an appropriate manner. Here we present a first end-to-end high-precision, corpus-wide argument mining system. This is made possible by combining sentence-level queries over an appropriate indexing of a very large corpus of newspaper articles, with an iterative annotation scheme. This scheme addresses the inherent label bias in the data and pinpoints the regions of the sample space whose manual labeling is required to obtain high-precision among top-ranked candidates.
△ Less
Submitted 25 November, 2019;
originally announced November 2019.
-
A Dataset of General-Purpose Rebuttal
Authors:
Matan Orbach,
Yonatan Bilu,
Ariel Gera,
Yoav Kantor,
Lena Dankin,
Tamar Lavee,
Lili Kotlerman,
Shachar Mirkin,
Michal Jacovi,
Ranit Aharonov,
Noam Slonim
Abstract:
In Natural Language Understanding, the task of response generation is usually focused on responses to short texts, such as tweets or a turn in a dialog. Here we present a novel task of producing a critical response to a long argumentative text, and suggest a method based on general rebuttal arguments to address it. We do this in the context of the recently-suggested task of listening comprehension…
▽ More
In Natural Language Understanding, the task of response generation is usually focused on responses to short texts, such as tweets or a turn in a dialog. Here we present a novel task of producing a critical response to a long argumentative text, and suggest a method based on general rebuttal arguments to address it. We do this in the context of the recently-suggested task of listening comprehension over argumentative content: given a speech on some specified topic, and a list of relevant arguments, the goal is to determine which of the arguments appear in the speech. The general rebuttals we describe here (written in English) overcome the need for topic-specific arguments to be provided, by proving to be applicable for a large set of topics. This allows creating responses beyond the scope of topics for which specific arguments are available. All data collected during this work is freely available for research.
△ Less
Submitted 1 September, 2019;
originally announced September 2019.
-
Argument Invention from First Principles
Authors:
Yonatan Bilu,
Ariel Gera,
Daniel Hershcovich,
Benjamin Sznajder,
Dan Lahav,
Guy Moshkowich,
Anael Malet,
Assaf Gavron,
Noam Slonim
Abstract:
Competitive debaters often find themselves facing a challenging task -- how to debate a topic they know very little about, with only minutes to prepare, and without access to books or the Internet? What they often do is rely on "first principles", commonplace arguments which are relevant to many topics, and which they have refined in past debates.
In this work we aim to explicitly define a taxon…
▽ More
Competitive debaters often find themselves facing a challenging task -- how to debate a topic they know very little about, with only minutes to prepare, and without access to books or the Internet? What they often do is rely on "first principles", commonplace arguments which are relevant to many topics, and which they have refined in past debates.
In this work we aim to explicitly define a taxonomy of such principled recurring arguments, and, given a controversial topic, to automatically identify which of these arguments are relevant to the topic.
As far as we know, this is the first time that this approach to argument invention is formalized and made explicit in the context of NLP.
The main goal of this work is to show that it is possible to define such a taxonomy. While the taxonomy suggested here should be thought of as a "first attempt" it is nonetheless coherent, covers well the relevant topics and coincides with what professional debaters actually argue in their speeches, and facilitates automatic argument invention for new topics.
△ Less
Submitted 22 August, 2019;
originally announced August 2019.
-
Controversy in Context
Authors:
Benjamin Sznajder,
Ariel Gera,
Yonatan Bilu,
Dafna Sheinwald,
Ella Rabinovich,
Ranit Aharonov,
David Konopnicki,
Noam Slonim
Abstract:
With the growing interest in social applications of Natural Language Processing and Computational Argumentation, a natural question is how controversial a given concept is. Prior works relied on Wikipedia's metadata and on content analysis of the articles pertaining to a concept in question. Here we show that the immediate textual context of a concept is strongly indicative of this property, and,…
▽ More
With the growing interest in social applications of Natural Language Processing and Computational Argumentation, a natural question is how controversial a given concept is. Prior works relied on Wikipedia's metadata and on content analysis of the articles pertaining to a concept in question. Here we show that the immediate textual context of a concept is strongly indicative of this property, and, using simple and language-independent machine-learning tools, we leverage this observation to achieve state-of-the-art results in controversiality prediction. In addition, we analyze and make available a new dataset of concepts labeled for controversiality. It is significantly larger than existing datasets, and grades concepts on a 0-10 scale, rather than treating controversiality as a binary label.
△ Less
Submitted 20 August, 2019;
originally announced August 2019.
-
Towards Effective Rebuttal: Listening Comprehension using Corpus-Wide Claim Mining
Authors:
Tamar Lavee,
Matan Orbach,
Lili Kotlerman,
Yoav Kantor,
Shai Gretz,
Lena Dankin,
Shachar Mirkin,
Michal Jacovi,
Yonatan Bilu,
Ranit Aharonov,
Noam Slonim
Abstract:
Engaging in a live debate requires, among other things, the ability to effectively rebut arguments claimed by your opponent. In particular, this requires identifying these arguments. Here, we suggest doing so by automatically mining claims from a corpus of news articles containing billions of sentences, and searching for them in a given speech. This raises the question of whether such claims indee…
▽ More
Engaging in a live debate requires, among other things, the ability to effectively rebut arguments claimed by your opponent. In particular, this requires identifying these arguments. Here, we suggest doing so by automatically mining claims from a corpus of news articles containing billions of sentences, and searching for them in a given speech. This raises the question of whether such claims indeed correspond to those made in spoken speeches. To this end, we collected a large dataset of $400$ speeches in English discussing $200$ controversial topics, mined claims for each topic, and asked annotators to identify the mined claims mentioned in each speech. Results show that in the vast majority of speeches debaters indeed make use of such claims. In addition, we present several baselines for the automatic detection of mined claims in speeches, forming the basis for future work. All collected data is freely available for research.
△ Less
Submitted 27 July, 2019;
originally announced July 2019.
-
On the practically interesting instances of MAXCUT
Authors:
Yonatan Bilu,
Amit Daniely,
Nati Linial,
Michael Saks
Abstract:
The complexity of a computational problem is traditionally quantified based on the hardness of its worst case. This approach has many advantages and has led to a deep and beautiful theory. However, from the practical perspective, this leaves much to be desired. In application areas, practically interesting instances very often occupy just a tiny part of an algorithm's space of instances, and the v…
▽ More
The complexity of a computational problem is traditionally quantified based on the hardness of its worst case. This approach has many advantages and has led to a deep and beautiful theory. However, from the practical perspective, this leaves much to be desired. In application areas, practically interesting instances very often occupy just a tiny part of an algorithm's space of instances, and the vast majority of instances are simply irrelevant. Addressing these issues is a major challenge for theoretical computer science which may make theory more relevant to the practice of computer science.
Following Bilu and Linial, we apply this perspective to MAXCUT, viewed as a clustering problem. Using a variety of techniques, we investigate practically interesting instances of this problem. Specifically, we show how to solve in polynomial time distinguished, metric, expanding and dense instances of MAXCUT under mild stability assumptions. In particular, $(1+ε)$-stability (which is optimal) suffices for metric and dense MAXCUT. We also show how to solve in polynomial time $Ω(\sqrt{n})$-stable instances of MAXCUT, substantially improving the best previously known result.
△ Less
Submitted 22 May, 2012;
originally announced May 2012.
-
Are stable instances easy?
Authors:
Yonatan Bilu,
Nathan Linial
Abstract:
We introduce the notion of a stable instance for a discrete optimization problem, and argue that in many practical situations only sufficiently stable instances are of interest. The question then arises whether stable instances of NP--hard problems are easier to solve. In particular, whether there exist algorithms that solve correctly and in polynomial time all sufficiently stable instances of s…
▽ More
We introduce the notion of a stable instance for a discrete optimization problem, and argue that in many practical situations only sufficiently stable instances are of interest. The question then arises whether stable instances of NP--hard problems are easier to solve. In particular, whether there exist algorithms that solve correctly and in polynomial time all sufficiently stable instances of some NP--hard problem. The paper focuses on the Max--Cut problem, for which we show that this is indeed the case.
△ Less
Submitted 17 June, 2009;
originally announced June 2009.