Search | arXiv e-print repository

Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems

Authors: Philippe Laban, Alexander R. Fabbri, Caiming Xiong, Chien-Sheng Wu

Abstract: LLMs and RAG systems are now capable of handling millions of input tokens or more. However, evaluating the output quality of such systems on long-context tasks remains challenging, as tasks like Needle-in-a-Haystack lack complexity. In this work, we argue that summarization can play a central role in such evaluation. We design a procedure to synthesize Haystacks of documents, ensuring that specific insights repeat across documents. The "Summary of a Haystack" (SummHay) task then requires a system to process the Haystack and generate, given a query, a summary that identifies the relevant insights and precisely cites the source documents. Since we have precise knowledge of what insights should appear in a haystack summary and what documents should be cited, we implement a highly reproducible automatic evaluation that can score summaries on two aspects: Coverage and Citation. We generate Haystacks in two domains (conversation, news), and perform a large-scale evaluation of 10 LLMs and 50 corresponding RAG systems. Our findings indicate that SummHay is an open challenge for current systems, as even systems provided with an Oracle signal of document relevance lag our estimate of human performance (56%) by 10+ points on a Joint Score. Without a retriever, long-context LLMs like GPT-4o and Claude 3 Opus score below 20% on SummHay. We show SummHay can also be used to study enterprise RAG systems and position bias in long-context models. We hope future systems can equal and surpass human performance on SummHay.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2404.16251

Investigating the prompt leakage effect and black-box defenses for multi-turn LLM interactions

Authors: Divyansh Agarwal, Alexander R. Fabbri, Philippe Laban, Ben Risher, Shafiq Joty, Caiming Xiong, Chien-Sheng Wu

Abstract: Prompt leakage in large language models (LLMs) poses a significant security and privacy threat, particularly in retrieval-augmented generation (RAG) systems. However, leakage in multi-turn LLM interactions, along with mitigation strategies, has not been studied in a standardized manner. This paper investigates LLM vulnerabilities against prompt leakage across 4 diverse domains and 10 closed- and open-source LLMs. Our unique multi-turn threat model leverages the LLM's sycophancy effect, and our analysis dissects task instruction and knowledge leakage in the LLM response. In a multi-turn setting, our threat model elevates the average attack success rate (ASR) to 86.2%, including a 99% leakage with GPT-4 and claude-1.3. We find that some black-box LLMs like Gemini show variable susceptibility to leakage across domains: they are more likely to leak contextual knowledge in the news domain compared to the medical domain. Our experiments measure specific effects of 6 black-box defense strategies, including a query-rewriter in the RAG scenario. Our proposed multi-tier combination of defenses still has an ASR of 5.3% for black-box LLMs, indicating room for enhancement and future direction for LLM security research.

Submitted 26 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

arXiv:2311.09458

Lexical Repetitions Lead to Rote Learning: Unveiling the Impact of Lexical Overlap in Train and Test Reference Summaries

Authors: Prafulla Kumar Choubey, Alexander R. Fabbri, Caiming Xiong, Chien-Sheng Wu

Abstract: Ideal summarization models should generalize to novel summary-worthy content without memorizing reference training summaries by rote. However, a single average performance score on the entire test set is inadequate for determining such model competencies. We propose a fine-grained evaluation protocol by partitioning a test set based on the lexical similarity of reference test summaries with training summaries. We observe up to a 5x difference in ROUGE-2 scores (1.2x in entity recall) between the subsets with the lowest and highest similarity. Next, we show that such training repetitions also make a model vulnerable to rote learning, reproducing data artifacts such as factual errors, especially when reference test summaries are lexically close to training summaries. Consequently, we propose to limit lexical repetitions in training summaries during both supervised fine-tuning and likelihood calibration stages to improve the performance on novel test cases while retaining average performance. Our automatic and human evaluations on novel test subsets and recent news articles show that limiting lexical repetitions in training summaries can prevent rote learning and improve generalization.

Submitted 15 November, 2023; originally announced November 2023.

Comments: EMNLP 2023-Findings

arXiv:2311.09184

Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization

Authors: Yixin Liu, Alexander R. Fabbri, Jiawen Chen, Yilun Zhao, Simeng Han, Shafiq Joty, Pengfei Liu, Dragomir Radev, Chien-Sheng Wu, Arman Cohan

Abstract: While large language models (LLMs) already achieve strong performance on standard generic summarization benchmarks, their performance on more complex summarization task settings is less studied. Therefore, we benchmark LLMs on instruction controllable text summarization, where the model input consists of both a source article and a natural language requirement for the desired summary characteristics. To this end, we curate an evaluation-only dataset for this task setting and conduct human evaluation on 5 LLM-based summarization systems. We then benchmark LLM-based automatic evaluation for this task with 4 different evaluation protocols and 11 LLMs, resulting in 40 evaluation methods in total. Our study reveals that instruction controllable text summarization remains a challenging task for LLMs, since (1) all LLMs evaluated still make factual and other types of errors in their summaries; (2) none of the LLM-based evaluation methods achieves strong alignment with human annotators when judging the quality of candidate summaries; (3) different LLMs show large performance gaps in summary generation and evaluation. We make our collected benchmark, InstruSum, publicly available to facilitate future research in this direction.

Submitted 15 November, 2023; originally announced November 2023.

Comments: GitHub Repo: https://github.com/yale-nlp/InstruSum

arXiv:2309.09369

Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles

Authors: Kung-Hsiang Huang, Philippe Laban, Alexander R. Fabbri, Prafulla Kumar Choubey, Shafiq Joty, Caiming Xiong, Chien-Sheng Wu

Abstract: Previous research in multi-document news summarization has typically concentrated on collating information that all sources agree upon. However, the summarization of diverse information dispersed across multiple articles about an event remains underexplored. In this paper, we propose a new task of summarizing diverse information encountered in multiple news articles encompassing the same event. To facilitate this task, we outline a data collection schema for identifying diverse information and curate a dataset named DiverseSumm. The dataset includes 245 news stories, with each story comprising 10 news articles and paired with a human-validated reference. Next, to enable consistent automatic evaluation, we conduct a comprehensive analysis to pinpoint the position and verbosity biases when utilizing Large Language Model (LLM)-based metrics for evaluating the coverage and faithfulness of summaries. Through correlation analyses, we outline the best practices for effectively using automatic LLM-based metrics on the DiverseSumm dataset. Finally, we study how LLMs summarize multiple news articles by analyzing which types of diverse information LLMs are capable of identifying. Our analyses suggest that despite the extraordinary capabilities of LLMs in single-document summarization, the proposed task remains a complex challenge for them mainly due to their limited coverage, with GPT-4 only able to cover under 40% of the diverse information on average.

Submitted 22 March, 2024; v1 submitted 17 September, 2023; originally announced September 2023.

Comments: NAACL 2024

arXiv:2305.17779

Generating EDU Extracts for Plan-Guided Summary Re-Ranking

Authors: Griffin Adams, Alexander R. Fabbri, Faisal Ladhak, Kathleen McKeown, Noémie Elhadad

Abstract: Two-step approaches, in which summary candidates are generated-then-reranked to return a single summary, can improve ROUGE scores over the standard single-step approach. Yet, standard decoding methods (i.e., beam search, nucleus sampling, and diverse beam search) produce candidates with redundant, and often low-quality, content. In this paper, we design a novel method to generate candidates for re-ranking that addresses these issues. We ground each candidate abstract on its own unique content plan and generate distinct plan-guided abstracts using a model's top beam. More concretely, a standard language model (a BART LM) auto-regressively generates elemental discourse unit (EDU) content plans with an extractive copy mechanism. The top K beams from the content plan generator are then used to guide a separate LM, which produces a single abstractive candidate for each distinct plan. We apply an existing re-ranker (BRIO) to abstractive candidates generated from our method, as well as baseline decoding methods. We show large relevance improvements over previously published methods on widely used single-document news article corpora, with ROUGE-2 F1 gains of 0.88, 2.01, and 0.38 on CNN/DailyMail, NYT, and XSum, respectively. A human evaluation on CNN/DM validates these results. Similarly, on 1k samples from CNN/DM, we show that prompting GPT-3 to follow EDU plans outperforms sampling-based methods by 1.05 ROUGE-2 F1 points. Code to generate and realize plans is available at https://github.com/griff4692/edu-sum.

Submitted 28 May, 2023; originally announced May 2023.

Comments: ACL 2023

arXiv:2305.14540

LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond

Authors: Philippe Laban, Wojciech Kryściński, Divyansh Agarwal, Alexander R. Fabbri, Caiming Xiong, Shafiq Joty, Chien-Sheng Wu

Abstract: With the recent appearance of LLMs in practical settings, having methods that can effectively detect factual inconsistencies is crucial to reduce the propagation of misinformation and improve trust in model outputs. When testing on existing factual consistency benchmarks, we find that a few large language models (LLMs) perform competitively on classification benchmarks for factual inconsistency detection compared to traditional non-LLM methods. However, a closer analysis reveals that most LLMs fail on more complex formulations of the task and exposes issues with existing evaluation benchmarks, affecting evaluation precision. To address this, we propose a new protocol for inconsistency detection benchmark creation and implement it in a 10-domain benchmark called SummEdits. This new benchmark is 20 times more cost-effective per sample than previous benchmarks and highly reproducible, as we estimate inter-annotator agreement at about 0.9. Most LLMs struggle on SummEdits, with performance close to random chance. The best-performing model, GPT-4, is still 8% below estimated human performance, highlighting the gaps in LLMs' ability to reason about facts and detect inconsistencies when they occur.

Submitted 23 May, 2023; originally announced May 2023.

arXiv:2305.14239

On Learning to Summarize with Large Language Models as References

Authors: Yixin Liu, Kejian Shi, Katherine S He, Longtian Ye, Alexander R. Fabbri, Pengfei Liu, Dragomir Radev, Arman Cohan

Abstract: Recent studies have found that summaries generated by large language models (LLMs) are favored by human annotators over the original reference summaries in commonly used summarization datasets. Therefore, we investigate a new learning setting of text summarization models that considers the LLMs as the reference or the gold-standard oracle on these datasets. To examine the standard practices that are aligned with this new learning setting, we investigate two LLM-based summary quality evaluation methods for model training and adopt a contrastive learning training method to leverage the LLM-guided learning signals. Our experiments on the CNN/DailyMail and XSum datasets demonstrate that smaller summarization models can achieve performance similar to LLMs under LLM-based evaluation. However, we find that the smaller models cannot yet reach LLM-level performance under human evaluation despite promising improvements brought by our proposed training methods. Meanwhile, we perform a meta-analysis on this new learning setting that reveals a discrepancy between human and LLM-based evaluation, highlighting the benefits and risks of this LLM-as-reference setting we investigated.

Submitted 16 November, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: GitHub Repo: https://github.com/yixinL7/SumLLM

arXiv:2303.03608

Towards Interpretable and Efficient Automatic Reference-Based Summarization Evaluation

Authors: Yixin Liu, Alexander R. Fabbri, Yilun Zhao, Pengfei Liu, Shafiq Joty, Chien-Sheng Wu, Caiming Xiong, Dragomir Radev

Abstract: Interpretability and efficiency are two important considerations for the adoption of neural automatic metrics. In this work, we develop strong-performing automatic metrics for reference-based summarization evaluation, based on a two-stage evaluation pipeline that first extracts basic information units from one text sequence and then checks the extracted units in another sequence. The metrics we developed include two-stage metrics that can provide high interpretability at both the fine-grained unit level and summary level, and one-stage metrics that achieve a balance between efficiency and interpretability. We make the developed tools publicly available at https://github.com/Yale-LILY/AutoACU.

Submitted 16 November, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

Comments: EMNLP 2023 Camera Ready Version

arXiv:2212.10449

Socratic Pretraining: Question-Driven Pretraining for Controllable Summarization

Authors: Artidoro Pagnoni, Alexander R. Fabbri, Wojciech Kryściński, Chien-Sheng Wu

Abstract: In long document controllable summarization, where labeled data is scarce, pretrained models struggle to adapt to the task and effectively respond to user queries. In this paper, we introduce Socratic pretraining, a question-driven, unsupervised pretraining objective specifically designed to improve controllability in summarization tasks. By training a model to generate and answer relevant questions in a given context, Socratic pretraining enables the model to more effectively adhere to user-provided queries and identify relevant content to be summarized. We demonstrate the effectiveness of this approach through extensive experimentation on two summarization domains, short stories and dialogue, and multiple control strategies: keywords, questions, and factoid QA pairs. Our pretraining method relies only on unlabeled documents and a question generation system and outperforms pre-finetuning approaches that use additional supervised data. Furthermore, our results show that Socratic pretraining cuts task-specific labeled data requirements in half, is more faithful to user-provided queries, and achieves state-of-the-art performance on QMSum and SQuALITY.

Submitted 8 June, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: To appear at ACL 2023

arXiv:2212.07981

Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation

Authors: Yixin Liu, Alexander R. Fabbri, Pengfei Liu, Yilun Zhao, Linyong Nan, Ruilin Han, Simeng Han, Shafiq Joty, Chien-Sheng Wu, Caiming Xiong, Dragomir Radev

Abstract: Human evaluation is the foundation upon which the evaluation of both summarization systems and automatic metrics rests. However, existing human evaluation studies for summarization either exhibit a low inter-annotator agreement or have insufficient scale, and an in-depth analysis of human evaluation is lacking. Therefore, we address the shortcomings of existing summarization evaluation along the following axes: (1) We propose a modified summarization salience protocol, Atomic Content Units (ACUs), which is based on fine-grained semantic units and allows for a high inter-annotator agreement. (2) We curate the Robust Summarization Evaluation (RoSE) benchmark, a large human evaluation dataset consisting of 22,000 summary-level annotations over 28 top-performing systems on three datasets. (3) We conduct a comparative study of four human evaluation protocols, underscoring potential confounding factors in evaluation setups. (4) We evaluate 50 automatic metrics and their variants using the collected human annotations across evaluation protocols and demonstrate how our benchmark leads to more statistically stable and significant results. The metrics we benchmarked include recent methods based on large language models (LLMs), GPTScore and G-Eval. Furthermore, our findings have important implications for evaluating LLMs, as we show that LLMs adjusted by human feedback (e.g., GPT-3.5) may overfit unconstrained human evaluation, which is affected by the annotators' prior, input-agnostic preferences, calling for more robust, targeted evaluation methods.

Submitted 6 June, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

Comments: ACL 2023 Camera Ready

arXiv:2211.15914

Prompted Opinion Summarization with GPT-3.5

Authors: Adithya Bhaskar, Alexander R. Fabbri, Greg Durrett

Abstract: Large language models have shown impressive performance across a wide variety of tasks, including text summarization. In this paper, we show that this strong performance extends to opinion summarization. We explore several pipeline methods for applying GPT-3.5 to summarize a large collection of user reviews in a prompted fashion. To handle arbitrarily large numbers of user reviews, we explore recursive summarization as well as methods for selecting salient content to summarize through supervised clustering or extraction. On two datasets, an aspect-oriented summarization dataset of hotel reviews (SPACE) and a generic summarization dataset of Amazon and Yelp reviews (FewSum), we show that GPT-3.5 models achieve very strong performance in human evaluation. We argue that standard evaluation metrics do not reflect this, and introduce three new metrics targeting faithfulness, factuality, and genericity to contrast these different methods.

Submitted 23 May, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

Comments: Accepted to ACL (Findings) 2023

arXiv:2211.06196

Improving Factual Consistency in Summarization with Compression-Based Post-Editing

Authors: Alexander R. Fabbri, Prafulla Kumar Choubey, Jesse Vig, Chien-Sheng Wu, Caiming Xiong

Abstract: State-of-the-art summarization models still struggle to be factually consistent with the input text. A model-agnostic way to address this problem is post-editing the generated summaries. However, existing approaches typically fail to remove entity errors if a suitable input entity replacement is not available or may insert erroneous content. In our work, we focus on removing extrinsic entity errors, or entities not in the source, to improve consistency while retaining the summary's essential information and form. We propose to use sentence-compression data to train the post-editing model to take a summary with extrinsic entity errors marked with special tokens and output a compressed, well-formed summary with those errors removed. We show that this model improves factual consistency while maintaining ROUGE, improving entity precision by up to 30% on XSum, and that this model can be applied on top of another post-editor, improving entity precision by up to a total of 38%. We perform an extensive comparison of post-editing approaches that demonstrates trade-offs between factual consistency, informativeness, and grammaticality, and we analyze settings where post-editors show the largest improvements.

Submitted 11 November, 2022; originally announced November 2022.

Comments: EMNLP 2022

arXiv:2211.05886

CREATIVESUMM: Shared Task on Automatic Summarization for Creative Writing

Authors: Divyansh Agarwal, Alexander R. Fabbri, Simeng Han, Wojciech Kryściński, Faisal Ladhak, Bryan Li, Kathleen McKeown, Dragomir Radev, Tianyi Zhang, Sam Wiseman

Abstract: This paper introduces the shared task of summarizing documents in several creative domains, namely literary texts, movie scripts, and television scripts. Summarizing these creative documents requires making complex literary interpretations, as well as understanding non-trivial temporal dependencies in texts containing varied styles of plot development and narrative structure. This poses unique challenges and remains underexplored for text summarization systems. In this shared task, we introduce four sub-tasks and their corresponding datasets, focusing on summarizing books, movie scripts, primetime television scripts, and daytime soap opera scripts. We detail the process of curating these datasets for the task, as well as the metrics used for the evaluation of the submissions. As part of the CREATIVESUMM workshop at COLING 2022, the shared task attracted 18 submissions in total. We discuss the submissions and the baselines for each sub-task in this paper, along with directions for facilitating future work in the field.

Submitted 6 December, 2022; v1 submitted 10 November, 2022; originally announced November 2022.

Comments: 4 pages + 3 for references and appendix

arXiv:2209.00840

FOLIO: Natural Language Reasoning with First-Order Logic

Authors: Simeng Han, Hailey Schoelkopf, Yilun Zhao, Zhenting Qi, Martin Riddell, Wenfei Zhou, James Coady, David Peng, Yujie Qiao, Luke Benson, Lucy Sun, Alex Wardle-Solano, Hannah Szabo, Ekaterina Zubova, Matthew Burtell, Jonathan Fan, Yixin Liu, Brian Wong, Malcolm Sailor, Ansong Ni, Linyong Nan, Jungo Kasai, Tao Yu, Rui Zhang, Alexander R. Fabbri, et al. (10 additional authors not shown)

Abstract: Large language models (LLMs) have achieved remarkable performance on a variety of natural language understanding tasks. However, existing benchmarks are inadequate in measuring the complex logical reasoning capabilities of a model. We present FOLIO, a human-annotated, logically complex and diverse dataset for reasoning in natural language (NL), equipped with first-order logic (FOL) annotations. FOLIO consists of 1,430 examples (unique conclusions), each paired with one of 487 sets of premises used to deductively reason for the validity of each conclusion. The logical correctness of the premises and conclusions is ensured by their FOL annotations, which are automatically verified by an FOL inference engine. In addition to the main NL reasoning task, NL-FOL pairs in FOLIO constitute a new NL-FOL translation dataset. Our experiments on FOLIO systematically evaluate the FOL reasoning ability of supervised fine-tuning on medium-sized language models. For both NL reasoning and NL-FOL translation, we benchmark multiple state-of-the-art language models. Our results show that a subset of FOLIO presents a challenge for GPT-4, one of the most capable publicly available LLMs.

Submitted 17 May, 2024; v1 submitted 2 September, 2022; originally announced September 2022.

arXiv:2205.12854

Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors

Authors: Liyan Tang, Tanya Goyal, Alexander R. Fabbri, Philippe Laban, Jiacheng Xu, Semih Yavuz, Wojciech Kryściński, Justin F. Rousseau, Greg Durrett

Abstract: The propensity of abstractive summarization models to make factual errors has been studied extensively, including design of metrics to detect factual errors and annotation of errors in current systems' outputs. However, the ever-evolving nature of summarization systems, metrics, and annotated benchmarks makes factuality evaluation a moving target, and drawing clear comparisons among metrics has become increasingly difficult. In this work, we aggregate factuality error annotations from nine existing datasets and stratify them according to the underlying summarization model. We compare performance of state-of-the-art factuality metrics, including recent ChatGPT-based metrics, on this stratified benchmark and show that their performance varies significantly across different types of summarization models. Critically, our analysis shows that much of the recent improvement in the factuality detection space has been on summaries from older (pre-Transformer) models instead of more relevant recent summarization models. We further perform a finer-grained analysis per error-type and find similar performance variance across error types for different factuality metrics. Our results show that no one metric is superior in all settings or for all error types, and we provide recommendations for best practices given these insights.

Submitted 25 May, 2023; v1 submitted 25 May, 2022; originally announced May 2022.

Comments: Accepted to ACL 2023
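As a rough illustration of the stratified comparison described in this abstract, the sketch below groups binary factuality judgments by the summarization model that produced each summary and scores a metric separately within each group. The choice of balanced accuracy, the stratum labels, and the function names are assumptions made for illustration, not the paper's released code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def balanced_accuracy(labels: Sequence[bool], preds: Sequence[bool]) -> float:
    """Mean of recall on consistent (True) and inconsistent (False) summaries."""
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("each stratum needs both consistent and inconsistent examples")
    true_pos = sum(1 for y, p in zip(labels, preds) if y and p)
    true_neg = sum(1 for y, p in zip(labels, preds) if not y and not p)
    return 0.5 * (true_pos / positives + true_neg / negatives)


def stratified_balanced_accuracy(
    examples: Iterable[Tuple[str, bool, bool]],
) -> Dict[str, float]:
    """Score a metric separately for each summarization-model stratum.

    Each example is (stratum, gold_label, metric_prediction), where True means
    "factually consistent". Stratum names are hypothetical placeholders.
    """
    groups: Dict[str, Tuple[List[bool], List[bool]]] = defaultdict(lambda: ([], []))
    for stratum, gold, predicted in examples:
        groups[stratum][0].append(gold)
        groups[stratum][1].append(predicted)
    return {name: balanced_accuracy(gold, pred) for name, (gold, pred) in groups.items()}
```

Reporting one score per stratum, rather than a single pooled number, is what allows a gap between older and more recent summarizers to show up at all.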

arXiv:2112.08542

QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization

Authors: Alexander R. Fabbri, Chien-Sheng Wu, Wenhao Liu, Caiming Xiong

Abstract: Factual consistency is an essential quality of text summarization models in practical settings. Existing work in evaluating this dimension can be broadly categorized into two lines of research, entailment-based and question answering (QA)-based metrics, and different experimental setups often lead to contrasting conclusions as to which paradigm performs the best. In this work, we conduct an extensive comparison of entailment and QA-based metrics, demonstrating that carefully choosing the components of a QA-based metric, especially question generation and answerability classification, is critical to performance. Building on those insights, we propose an optimized metric, which we call QAFactEval, that leads to a 14% average improvement over previous QA-based metrics on the SummaC factual consistency benchmark, and also outperforms the best-performing entailment-based metric. Moreover, we find that QA-based and entailment-based metrics can offer complementary signals and be combined into a single metric for a further performance boost.

Submitted 29 April, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

Comments: NAACL 2022

arXiv:2112.07637

Exploring Neural Models for Query-Focused Summarization

Authors: Jesse Vig, Alexander R. Fabbri, Wojciech Kryściński, Chien-Sheng Wu, Wenhao Liu

Abstract: Query-focused summarization (QFS) aims to produce summaries that answer particular questions of interest, enabling greater user control and personalization. While recently released datasets, such as QMSum or AQuaMuSe, facilitate research efforts in QFS, the field lacks a comprehensive study of the broad space of applicable modeling methods. In this paper we conduct a systematic exploration of neural approaches to QFS, considering two general classes of methods: two-stage extractive-abstractive solutions and end-to-end models. Within those categories, we investigate existing models and explore strategies for transfer learning. We also present two modeling extensions that achieve state-of-the-art performance on the QMSum dataset, up to a margin of 3.38 ROUGE-1, 3.72 ROUGE-2, and 3.28 ROUGE-L when combined with transfer learning strategies. Results from human evaluation suggest that the best models produce more comprehensive and factually consistent summaries compared to a baseline model. Code and checkpoints are made publicly available: https://github.com/salesforce/query-focused-sum.

Submitted 26 April, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

Comments: Findings of NAACL 2022

arXiv:2112.04139

Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand

Authors: Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Lavinia Dunagan, Jacob Morrison, Alexander R. Fabbri, Yejin Choi, Noah A. Smith

Abstract: Natural language processing researchers have identified limitations of evaluation methodology for generation tasks, with new questions raised about the validity of automatic metrics and of crowdworker judgments. Meanwhile, efforts to improve generation models tend to depend on simple n-gram overlap metrics (e.g., BLEU, ROUGE). We argue that new advances on models and metrics should each more directly benefit and inform the other. We therefore propose a generalization of leaderboards, bidimensional leaderboards (Billboards), that simultaneously tracks progress in language generation models and metrics for their evaluation. Unlike conventional unidimensional leaderboards that sort submitted systems by predetermined metrics, a Billboard accepts both generators and evaluation metrics as competing entries. A Billboard automatically creates an ensemble metric that selects and linearly combines a few metrics based on a global analysis across generators. Further, metrics are ranked based on their correlation with human judgments. We release four Billboards for machine translation, summarization, and image captioning. We demonstrate that a linear ensemble of a few diverse metrics sometimes substantially outperforms existing metrics in isolation. Our mixed-effects model analysis shows that most automatic metrics, especially the reference-based ones, overrate machine over human generation, demonstrating the importance of updating metrics as generation models become stronger (and perhaps more similar to humans) in the future.
Our project website is available at https://nlp.cs.washington.edu/billboard/.

Submitted 18 May, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

Comments: Proc. of NAACL 2022
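The abstract states that Billboard metrics are ranked by their correlation with human judgments. A minimal sketch of that ranking step is shown below, assuming Pearson correlation over per-output scores; the actual Billboard analysis may use a different correlation measure or aggregation, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (std_x * std_y)


def rank_metrics(
    metric_scores: Dict[str, Sequence[float]], human_scores: Sequence[float]
) -> List[Tuple[str, float]]:
    """Rank metrics from most to least correlated with human judgments.

    metric_scores maps a (hypothetical) metric name to its scores on the same
    generated outputs that humans rated, in the same order as human_scores.
    """
    ranked = [(name, pearson(scores, human_scores)) for name, scores in metric_scores.items()]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```

Because ranking only needs one number per metric, a Billboard can accept a newly submitted metric and place it without re-scoring the existing ones, provided the new metric is run on the same human-rated outputs.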

arXiv:2111.06474

AnswerSumm: A Manually-Curated Dataset and Pipeline for Answer Summarization

Authors: Alexander R. Fabbri, Xiaojian Wu, Srini Iyer, Haoran Li, Mona Diab

Abstract: Community Question Answering (CQA) fora such as Stack Overflow and Yahoo! Answers contain a rich resource of answers to a wide range of community-based questions. Each question thread can receive a large number of answers with different perspectives. One goal of answer summarization is to produce a summary that reflects the range of answer perspectives. A major obstacle for this task is the absence of a dataset to provide supervision for producing such summaries. Recent works propose heuristics to create such data, but these are often noisy and do not cover all answer perspectives present. This work introduces a novel dataset of 4,631 CQA threads for answer summarization curated by professional linguists. Our pipeline gathers annotations for all subtasks of answer summarization, including relevant answer sentence selection, grouping these sentences based on perspectives, summarizing each perspective, and producing an overall summary. We analyze and benchmark state-of-the-art models on these subtasks and introduce a novel unsupervised approach for multi-perspective data augmentation that boosts summarization performance according to automatic evaluation. Finally, we propose reinforcement learning rewards to improve factual consistency and answer coverage and analyze areas for improvement.

Submitted 29 April, 2022; v1 submitted 11 November, 2021; originally announced November 2021.

Comments: NAACL 2022; arXiv admin note: substantial text overlap with arXiv:2104.08536

arXiv:2110.07166

CaPE: Contrastive Parameter Ensembling for Reducing Hallucination in Abstractive Summarization

Authors: Prafulla Kumar Choubey, Alexander R. Fabbri, Jesse Vig, Chien-Sheng Wu, Wenhao Liu, Nazneen Fatema Rajani

Abstract: Hallucination is a known issue for neural abstractive summarization models. Recent work suggests that the degree of hallucination may depend on errors in the training data. In this work, we propose a new method called Contrastive Parameter Ensembling (CaPE) to use training data more effectively, utilizing variations in noise in training samples to reduce hallucination. We first select clean and noisy subsets from the training data using different automatic factual metrics. Then, we fine-tune a base summarization model, which is trained on all training samples, on the clean (noisy) subset to obtain an expert (anti-expert) model. Finally, we adjust the parameters of the base model by the difference between the parameters of the expert and anti-expert models, steering the base model towards the expert model and away from the anti-expert model. Experimental results show that CaPE improves performance across different automatic factual metrics and human evaluation, with maximum improvements of 16.69% and 15.78% in summary-level dependency-arc entailment accuracy on the XSUM and CNN/DM datasets, respectively. The improvement in factual performance does not degrade performance on other metrics of informativeness such as ROUGE.

Submitted 20 May, 2022; v1 submitted 14 October, 2021; originally announced October 2021.
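The final parameter-adjustment step of CaPE can be pictured as moving every base-model weight along the expert-minus-anti-expert direction. This minimal sketch assumes the update takes the form base + alpha * (expert - anti_expert), with a hypothetical mixing coefficient alpha and parameters stored as plain Python lists keyed by name; it illustrates the idea and is not the authors' implementation.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def contrastive_ensemble(
    base: Params, expert: Params, anti_expert: Params, alpha: float = 1.0
) -> Params:
    """Return base + alpha * (expert - anti_expert), parameter by parameter.

    alpha is an assumed scaling factor controlling how far the base model is
    pushed toward the expert and away from the anti-expert.
    """
    if not (base.keys() == expert.keys() == anti_expert.keys()):
        raise ValueError("all three models must share the same parameter names")
    combined: Params = {}
    for name, base_values in base.items():
        expert_values, anti_values = expert[name], anti_expert[name]
        if not (len(base_values) == len(expert_values) == len(anti_values)):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        combined[name] = [
            b + alpha * (e - a) for b, e, a in zip(base_values, expert_values, anti_values)
        ]
    return combined
```

Since all three models are fine-tuned from the same base, their parameters align name by name, which is what makes this element-wise arithmetic meaningful.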

arXiv:2106.00829

ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining

Authors: Alexander R. Fabbri, Faiaz Rahman, Imad Rizvi, Borui Wang, Haoran Li, Yashar Mehdad, Dragomir Radev

Abstract: While online conversations can cover a vast amount of information in many different formats, abstractive text summarization has primarily focused on modeling solely news articles. This research gap is due, in part, to the lack of standardized datasets for summarizing online discussions. To address this gap, we design annotation protocols motivated by an issues-viewpoints-assertions framework to crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads. We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data. To create a comprehensive benchmark, we also evaluate these models on widely-used conversation summarization datasets to establish strong baselines in this domain. Furthermore, we incorporate argument mining through graph construction to directly model the issues, viewpoints, and assertions present in a conversation and filter noisy input, showing comparable or improved results according to automatic and human evaluations.

Submitted 1 June, 2021; originally announced June 2021.

Comments: ACL 2021

arXiv:2104.08536

Multi-Perspective Abstractive Answer Summarization

Authors: Alexander R. Fabbri, Xiaojian Wu, Srini Iyer, Mona Diab

Abstract: Community Question Answering (CQA) forums such as Stack Overflow and Yahoo! Answers contain a rich resource of answers to a wide range of questions. Each question thread can receive a large number of answers with different perspectives. The goal of multi-perspective answer summarization is to produce a summary that includes all perspectives of the answer. A major obstacle for multi-perspective, abstractive answer summarization is the absence of a dataset to provide supervision for producing such summaries. This work introduces a novel dataset creation method to automatically create multi-perspective, bullet-point abstractive summaries from an existing CQA forum. Supervision provided by this dataset trains models to inherently produce multi-perspective summaries. Additionally, to train models to output more diverse, faithful answer summaries while retaining multiple perspectives, we propose a multi-reward optimization technique coupled with a sentence-relevance prediction multi-task loss. Our methods demonstrate improved coverage of perspectives and faithfulness as measured by automatic and human evaluations compared to a strong baseline.

Submitted 17 April, 2021; originally announced April 2021.

arXiv:2010.12836

Improving Zero and Few-Shot Abstractive Summarization with Intermediate Fine-tuning and Data Augmentation

Authors: Alexander R. Fabbri, Simeng Han, Haoyuan Li, Haoran Li, Marjan Ghazvininejad, Shafiq Joty, Dragomir Radev, Yashar Mehdad

Abstract: Models pretrained with self-supervised objectives on large text corpora achieve state-of-the-art performance on English text summarization tasks. However, these models are typically fine-tuned on hundreds of thousands of data points, an infeasible requirement when applying summarization to new, niche domains. In this work, we introduce a novel and generalizable method, called WikiTransfer, for fine-tuning pretrained models for summarization in an unsupervised, dataset-specific manner. WikiTransfer fine-tunes pretrained models on pseudo-summaries, produced from generic Wikipedia data, which contain characteristics of the target dataset, such as the length and level of abstraction of the desired summaries. WikiTransfer models achieve state-of-the-art, zero-shot abstractive summarization performance on the CNN-DailyMail dataset and demonstrate the effectiveness of our approach on three additional diverse datasets. These models are more robust to noisy data and also achieve better or comparable few-shot performance using 10 and 100 training examples when compared to few-shot transfer from other summarization datasets. To further boost performance, we employ data augmentation via round-trip translation as well as introduce a regularization term for improved few-shot transfer. To understand the role of dataset aspects in transfer performance and the quality of the resulting output summaries, we further study the effect of the components of our unsupervised fine-tuning data and analyze few-shot performance using both automatic and human evaluation.

Submitted 11 April, 2021; v1 submitted 24 October, 2020; originally announced October 2020.

Comments: NAACL 2021

arXiv:2007.12626

SummEval: Re-evaluating Summarization Evaluation

Authors: Alexander R. Fabbri, Wojciech Kryściński, Bryan McCann, Caiming Xiong, Richard Socher, Dragomir Radev

Abstract: The scarcity of comprehensive up-to-date studies on evaluation metrics for text summarization and the lack of consensus regarding evaluation protocols continue to inhibit progress. We address the existing shortcomings of summarization evaluation methods along five dimensions: 1) we re-evaluate 14 automatic evaluation metrics in a comprehensive and consistent fashion using neural summarization model outputs along with expert and crowd-sourced human annotations, 2) we consistently benchmark 23 recent summarization models using the aforementioned automatic evaluation metrics, 3) we assemble the largest collection of summaries generated by models trained on the CNN/DailyMail news dataset and share it in a unified format, 4) we implement and share a toolkit that provides an extensible and unified API for evaluating summarization models across a broad range of automatic metrics, 5) we assemble and share the largest and most diverse, in terms of model types, collection of human judgments of model-generated summaries on the CNN/Daily Mail dataset annotated by both expert judges and crowd-source workers. We hope that this work will help promote a more complete evaluation protocol for text summarization as well as advance research in developing evaluation metrics that better correlate with human judgments.

Submitted 1 February, 2021; v1 submitted 24 July, 2020; originally announced July 2020.

Comments: 11 pages, 4 tables, 2 figures; pre-MIT Press publication version

arXiv:2004.11892

Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering

Authors: Alexander R. Fabbri, Patrick Ng, Zhiguo Wang, Ramesh Nallapati, Bing Xiang

Abstract: Question Answering (QA) is in increasing demand as the amount of information available online and the desire for quick access to this content grows. A common approach to QA has been to fine-tune a pretrained language model on a task-specific labeled dataset. This paradigm, however, relies on scarce, and costly to obtain, large-scale human-labeled data. We propose an unsupervised approach to training QA models with generated pseudo-training data. We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance by allowing the model to learn more complex context-question relationships. Training a QA model on this data gives a relative improvement over a previous unsupervised model in F1 score on the SQuAD dataset by about 14%, and 20% when the answer is a named entity, achieving state-of-the-art performance on SQuAD for unsupervised QA.

Submitted 24 April, 2020; originally announced April 2020.

Comments: ACL 2020

arXiv:1909.01716

ScisummNet: A Large Annotated Corpus and Content-Impact Models for Scientific Paper Summarization with Citation Networks

Authors: Michihiro Yasunaga, Jungo Kasai, Rui Zhang, Alexander R. Fabbri, Irene Li, Dan Friedman, Dragomir R. Radev

Abstract: Scientific article summarization is challenging: large, annotated corpora are not available, and the summary should ideally include the article's impacts on research community. This paper provides novel solutions to these two challenges. We 1) develop and release the first large-scale manually-annotated corpus for scientific papers (on computational linguistics) by enabling faster annotation, and 2) propose summarization methods that integrate the authors' original highlights (abstract) and the article's actual impacts on the community (citations), to create comprehensive, hybrid summaries. We conduct experiments to demonstrate the efficacy of our corpus in training data-driven models for scientific paper summarization and the advantage of our hybrid summaries over abstracts and traditional citation-based summaries. Our large annotated corpus and hybrid methods provide a new framework for scientific paper summarization research.

Submitted 15 September, 2019; v1 submitted 4 September, 2019; originally announced September 2019.

Comments: AAAI 2019

arXiv:1906.10910

Creating A Neural Pedagogical Agent by Jointly Learning to Review and Assess

Authors: Youngnam Lee, Youngduck Choi, Junghyun Cho, Alexander R. Fabbri, Hyunbin Loh, Chanyou Hwang, Yongku Lee, Sang-Wook Kim, Dragomir Radev

Abstract: Machine learning plays an increasing role in intelligent tutoring systems as both the amount of data available and specialization among students grow. Nowadays, these systems are frequently deployed on mobile applications. Users on such mobile education platforms are dynamic, frequently being added, accessing the application with varying levels of focus, and changing while using the service. The education material itself, on the other hand, is often static and is an exhaustible resource whose use in tasks such as problem recommendation must be optimized. The ability to update user models with respect to educational material in real-time is thus essential; however, existing approaches require time-consuming re-training of user features whenever new data is added. In this paper, we introduce a neural pedagogical agent for real-time user modeling in the task of predicting user response correctness, a central task for mobile education applications. Our model, inspired by work in natural language processing on sequence modeling and machine translation, updates user features in real-time via bidirectional recurrent neural networks with an attention mechanism over embedded question-response pairs. We experiment on the mobile education application SantaTOEIC, which has 559k users and 66M response data points, as well as a set of 10k study problems each expert-annotated with topic tags and gathered since 2016.
Our model outperforms existing approaches over several metrics in predicting user response correctness, notably outperforming other methods on new users without large question-response histories. Additionally, our attention mechanism and annotated tag set allow us to create an interpretable education platform, with a smart review system that addresses the aforementioned issue of varied user attention and problem exhaustion.

Submitted 1 July, 2019; v1 submitted 26 June, 2019; originally announced June 2019.

Comments: 9 pages, 9 figures, 7 tables

arXiv:1906.01749

Multi-News: a Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model

Authors: Alexander R. Fabbri, Irene Li, Tianwei She, Suyi Li, Dragomir R. Radev

Abstract: Automatic generation of summaries from multiple news articles is a valuable tool as the number of online publications grows rapidly. Single document summarization (SDS) systems have benefited from advances in neural encoder-decoder model thanks to the availability of large datasets. However, multi-document summarization (MDS) of news articles has been limited to datasets of a couple of hundred examples. In this paper, we introduce Multi-News, the first large-scale MDS news dataset. Additionally, we propose an end-to-end model which incorporates a traditional extractive summarization model with a standard SDS model and achieves competitive results on MDS datasets. We benchmark several methods on Multi-News and release our data and code in hope that this work will promote advances in summarization in the multi-document setting.

Submitted 19 June, 2019; v1 submitted 4 June, 2019; originally announced June 2019.

Comments: ACL 2019, 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 2019

arXiv:1903.09755

Trifocal Relative Pose from Lines at Points and its Efficient Solution

Authors: Ricardo Fabbri, Timothy Duff, Hongyi Fan, Margaret Regan, David da Costa de Pinho, Elias Tsigaridas, Charles Wampler, Jonathan Hauenstein, Benjamin Kimia, Anton Leykin, Tomas Pajdla

Abstract: We present a method for solving two minimal problems for relative camera pose estimation from three views, which are based on three-view correspondences of i) three points and one line and the novel case of ii) three points and two lines through two of the points. These problems are too difficult to be efficiently solved by the state-of-the-art Groebner basis methods. Our method is based on a new efficient homotopy continuation (HC) solver framework MINUS, which dramatically speeds up previous HC solving by specializing HC methods to generic cases of our problems. We characterize their number of solutions and show with simulated experiments that our solvers are numerically robust and stable under image noise, a key contribution given the borderline intractable degree of nonlinearity of trinocular constraints. We show in real experiments that i) SIFT feature location and orientation provide good enough point-and-line correspondences for three-view reconstruction and ii) that we can solve difficult cases with too few or too noisy tentative matches, where the state-of-the-art structure-from-motion initialization fails.

Submitted 29 November, 2022; v1 submitted 23 March, 2019; originally announced March 2019.

Comments: First appeared at CVPR - Computer Vision and Pattern Recognition Conference 2020. This material is based upon work supported by the National Science Foundation under Grant No. DMS-1439786 while most authors were in residence at Brown University's Institute for Computational and Experimental Research in Mathematics -- ICERM, in Providence, RI

MSC Class: 14Qxx; 12Yxx; 51N15; 14N05; 53A20; 17B81; 22E70; 53A04; 53A55; 53Bxx; 53B5; 57R25; 58C25; 68T40; 68U05; 70B1; 70G55; 70G65; 90C30 ACM Class: I.4.5; I.4.8; I.2.9; I.2.10; I.1.2; G.1.3; G.1.5

Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, preprint available December 2022

arXiv:1811.12181

What Should I Learn First: Introducing LectureBank for NLP Education and Prerequisite Chain Learning

Authors: Irene Li, Alexander R. Fabbri, Robert R. Tung, Dragomir R. Radev

Abstract: Recent years have witnessed the rising popularity of Natural Language Processing (NLP) and related fields such as Artificial Intelligence (AI) and Machine Learning (ML). Many online courses and resources are available even for those without a strong background in the field. Often the student is curious about a specific topic but does not quite know where to begin studying. To answer the question of "what should one learn first," we apply an embedding-based method to learn prerequisite relations for course concepts in the domain of NLP. We introduce LectureBank, a dataset containing 1,352 English lecture files collected from university courses which are each classified according to an existing taxonomy as well as 208 manually-labeled prerequisite relation topics, which is publicly available. The dataset will be useful for educational purposes such as lecture preparation and organization as well as applications such as reading list generation. Additionally, we experiment with neural graph-based networks and non-neural classifiers to learn these prerequisite relations from our dataset.

Submitted 26 November, 2018; originally announced November 2018.

arXiv:1808.07531

Sarcasm Analysis using Conversation Context

Authors: Debanjan Ghosh, Alexander R. Fabbri, Smaranda Muresan

Abstract: Computational models for sarcasm detection have often relied on the content of utterances in isolation. However, the speaker's sarcastic intent is not always apparent without additional context. Focusing on social media discussions, we investigate three issues: (1) does modeling conversation context help in sarcasm detection; (2) can we identify what part of conversation context triggered the sarcastic reply; and (3) given a sarcastic post that contains multiple sentences, can we identify the specific sentence that is sarcastic. To address the first issue, we investigate several types of Long Short-Term Memory (LSTM) networks that can model both the conversation context and the current turn. We show that LSTM networks with sentence-level attention on context and current turn, as well as the conditional LSTM network (Rocktäschel et al. 2016), outperform the LSTM model that reads only the current turn. As conversation context, we consider the prior turn, the succeeding turn or both. Our computational models are tested on two types of social media platforms: Twitter and discussion forums. We discuss several differences between these datasets ranging from their size to the nature of the gold-label annotations. To address the last two issues, we present a qualitative analysis of attention weights produced by the LSTM models (with attention) and discuss the results compared with human performance on the two tasks.

Submitted 28 August, 2018; v1 submitted 22 August, 2018; originally announced August 2018.

Comments: Computational Linguistics (journal)
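The sentence-level attention described in the abstract above can be illustrated with a minimal sketch. It assumes each context sentence has already been encoded as a vector (for example, by an LSTM) and scores sentences with a dot product against a hypothetical query vector. The models in the paper learn their attention parameters, so this is a simplification of the general technique, not the authors' implementation.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sentence_attention(
    sentence_vecs: List[List[float]], query: List[float]
) -> Tuple[List[float], List[float]]:
    """Return (attention weights, attention-weighted summary vector).

    Hypothetical dot-product scoring; a trained model would instead use
    learned parameters to score each sentence.
    """
    scores = [sum(s * q for s, q in zip(vec, query)) for vec in sentence_vecs]
    weights = softmax(scores)
    dim = len(query)
    # Weighted sum of sentence vectors, one output component at a time.
    summary = [
        sum(w * vec[d] for w, vec in zip(weights, sentence_vecs)) for d in range(dim)
    ]
    return weights, summary
```

Inspecting `weights` for a sarcastic reply is analogous to the qualitative analysis in the abstract: the context sentence with the largest weight is the one the model attends to most.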

arXiv:1805.04617 [pdf, other]

TutorialBank: A Manually-Collected Corpus for Prerequisite Chains, Survey Extraction and Resource Recommendation

Authors: Alexander R. Fabbri, Irene Li, Prawat Trairatvorakul, Yijiao He, Wei Tai Ting, Robert Tung, Caitlin Westerfield, Dragomir R. Radev

Abstract: The field of Natural Language Processing (NLP) is growing rapidly, with new research published daily along with an abundance of tutorials, codebases and other online resources. In order to learn this dynamic field or stay up-to-date on the latest research, students as well as educators and researchers must constantly sift through multiple sources to find valuable, relevant information. To address this situation, we introduce TutorialBank, a new, publicly available dataset which aims to facilitate NLP education and research. We have manually collected and categorized over 6,300 resources on NLP as well as the related fields of Artificial Intelligence (AI), Machine Learning (ML) and Information Retrieval (IR). Our dataset is notably the largest manually-picked corpus of resources intended for NLP education that is not limited to academic papers. Additionally, we have created both a search engine and a command-line tool for the resources and have annotated the corpus to include lists of research topics, relevant resources for each topic, prerequisite relations among topics, relevant sub-parts of individual resources, among other annotations. We release the dataset and present several avenues for further research.

Submitted 11 May, 2018; originally announced May 2018.

Comments: ACL 2018, 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, 2018

arXiv:1712.09359 [pdf, other]

Basic concepts and tools for the Toki Pona minimal and constructed language: description of the language and main issues; analysis of the vocabulary; text synthesis and syntax highlighting; Wordnet synsets

Authors: Renato Fabbri

Abstract: A minimal constructed language (conlang) is useful for experiments and convenient for building tools. The Toki Pona (TP) conlang is minimal both in its vocabulary (only 14 letters and 124 lemmas) and in its (about) 10 syntax rules. The language is useful because it is an actively used and somewhat established minimal conlang with at least hundreds of fluent speakers. This article presents current concepts and resources for TP, and makes available Python (and Vim) scripted routines for the analysis of the language, synthesis of texts, syntax highlighting schemes, and the construction of a preliminary TP Wordnet. Focus is on the analysis of the basic vocabulary, as corpus analyses were found. The synthesis is based on sentence templates, relates to context by keeping track of used words, and renders larger texts by using a fixed number of phonemes (e.g. for poems) and number of sentences, words and letters (e.g. for paragraphs). Syntax highlighting reflects morphosyntactic classes given in the official dictionary, and different solutions are described and implemented in the well-established Vim text editor. The tentative TP Wordnet is made available in three patterns of relations between synsets and word lemmas. In summary, this text offers potentially novel conceptualizations of, and tools and results for, analyzing, synthesizing and syntax highlighting the TP language.

Submitted 3 July, 2018; v1 submitted 26 December, 2017; originally announced December 2017.

Comments: Python and Vim scripts in this repository: https://github.com/ttm/prv/
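The template-based synthesis described above, which keeps track of used words, can be sketched as follows. This is a minimal illustration under stated assumptions, not the scripts in the linked repository: the mini-lexicon, the two sentence templates, and the reset-when-exhausted rule are all hypothetical choices made for the example.

```python
import random
from typing import Dict, List, Set

# Hypothetical mini-lexicon grouped by word class (a real tool would use
# the full TP vocabulary and its morphosyntactic classes).
LEXICON: Dict[str, List[str]] = {
    "noun": ["jan", "soweli", "kili", "telo", "waso"],
    "verb": ["moku", "lukin", "jo", "olin"],
}

# Hypothetical templates: slots naming a LEXICON class are filled,
# anything else (the particles "li" and "e") is copied literally.
TEMPLATES: List[List[str]] = [
    ["noun", "li", "verb", "e", "noun"],
    ["noun", "li", "verb"],
]


class Synthesizer:
    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)  # seeded for reproducible output
        self.used: Set[str] = set()     # words already emitted (the "context")

    def pick(self, cls: str) -> str:
        # Prefer words of this class that have not been used yet.
        candidates = [w for w in LEXICON[cls] if w not in self.used]
        if not candidates:
            # Every word of the class has been used: forget them and start over.
            self.used -= set(LEXICON[cls])
            candidates = list(LEXICON[cls])
        word = self.rng.choice(candidates)
        self.used.add(word)
        return word

    def sentence(self) -> str:
        template = self.rng.choice(TEMPLATES)
        tokens = [self.pick(slot) if slot in LEXICON else slot for slot in template]
        return " ".join(tokens) + "."

    def paragraph(self, n_sentences: int) -> str:
        # Larger texts are rendered from a fixed number of sentences.
        return " ".join(self.sentence() for _ in range(n_sentences))
```

For example, `Synthesizer(seed=1).paragraph(3)` yields three short sentences of the form "jan li moku e kili.", avoiding repeated content words until a word class is exhausted.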

arXiv:1712.06933 [pdf, ps, other]

An anthropological account of the Vim text editor: features and tweaks after 10 years of usage

Authors: Renato Fabbri

Abstract: The Vim text editor is very rich in capabilities and thus complex. This article is a description of Vim and a set of considerations about its usage and design. It results from more than ten years of experience in using Vim for writing and editing various types of documents, e.g. Python, C++, JavaScript, ChucK programs; LaTeX, Markdown, HTML, RDF, Make and other markup files. It is commonplace, in the Vim user and developer communities, to say that it takes about ten years to master (or start mastering) this text editor, and I find that other experienced users have a different view of Vim and use a different set of features. Therefore, this document presents my understanding in order to compare my usage with that of other Vim users. Another goal is to make available a reference document from which new users can gain a sound overview, both by reading it and through the discussions it might generate. It should also be useful for users of any degree of experience, including me, as a compendium of commands, namespaces and tweaks. Upon feedback, and as my Vim usage matures, this document might be enhanced and expanded.

Submitted 18 December, 2017; originally announced December 2017.

Comments: Scripts and other files are in this repository: https://github.com/ttm/vim

arXiv:1711.04612 [pdf, other]

The Algorithmic-Autoregulation (AA) Methodology and Software: a collective focus on self-transparency

Authors: Renato Fabbri

Abstract: There are numerous efforts to achieve a lightweight and systematic account of what is done by a group and its individuals. The Algorithmic-Autoregulation (AA) is a special case, in which a technical community embraced the challenge of registering their own dedication for sharing processes, self-transparency, and documenting the efforts. AA has been used since June 2011 by dozens of researchers and software developers, with the support of different software gadgets and for distinct tasks. This article describes these implementations and statistics of their usage, including expected natural properties and ontological formalisms which ease comparative analysis and further integration.

Submitted 26 October, 2017; originally announced November 2017.

Comments: Scripts and data in https://github.com/ttm/ensaaio

Report number: ISSN 2527-2357, ISBN 978-85-5676-019-7

Journal ref: Anais do XX ENMC - Encontro Nacional de Modelagem Computacional e VIII ECTM - Encontro de Ciências e Tecnologia de Materiais, Nova Friburgo, RJ - 16 a 19 Outubro 2017

arXiv:1711.04609 [pdf, other]

Text Mining Descriptions Of Dreams: aesthetic and clinical efforts

Authors: Renato Fabbri, Fabiane M. Borges

Abstract: Dreams are highly valued in both Freudian psychoanalysis and less conservative clinical traditions. Text mining enables the extraction of meaning from writings in powerful and unexpected ways. In this work, we report methods, uses and results obtained by mining descriptions of dreams. The texts were collected from dozens of participants as part of a course in Schizoanalysis (Clinical Psychology). They were subsequently mined using various techniques to produce poems and summaries, which were then used in clinical sessions by means of music and declamation. The results were found aesthetically appealing and effective in engaging the audience. The expansion of the corpus, mining methods and strategies for using the derivatives for art and therapy are considered for future work.

Submitted 26 October, 2017; originally announced November 2017.

Comments: Scripts and corpus in https://github.com/ttm/sonhos, Anais do XX ENMC - Encontro Nacional de Modelagem Computacional e VIII ECTM - Encontro de Ciências e Tecnologia de Materiais, Nova Friburgo, RJ - 16 a 19 Outubro 2017

Report number: ISSN 2527-2357, ISBN 978-85-5676-019-7

arXiv:1710.09954 [pdf, other]

Audiovisual Analytics Vocabulary and Ontology (AAVO): initial core and example expansion

Authors: Renato Fabbri, Maria Cristina Ferreira de Oliveira

Abstract: Visual Analytics might be defined as data mining assisted by interactive visual interfaces. The field has been receiving prominent consideration from researchers, developers and industry. The literature, however, is complex because it involves multiple fields of knowledge and is relatively recent. In this article we describe an initial tentative organization of the knowledge in the field as an OWL ontology and a SKOS vocabulary. This effort might be useful in many ways that include conceptual considerations and software implementations. Within the results and discussions, we expose a core and an example expansion of the conceptualization, and incorporate design issues that enhance the expressive power of the abstraction.

Submitted 26 October, 2017; originally announced October 2017.

Comments: Scripts in https://github.com/ttm/aavo/

Report number: ISSN 2527-2357, ISBN 978-85-5676-019-7

Journal ref: Anais do XX ENMC - Encontro Nacional de Modelagem Computacional e VIII ECTM - Encontro de Ciências e Tecnologia de Materiais, Nova Friburgo, RJ - 16 a 19 Outubro 2017

arXiv:1710.09952 [pdf, other]

Enhancements of linked data expressiveness for ontologies

Authors: Renato Fabbri

Abstract: The semantic web has received many contributions from researchers in the form of ontologies which, in this context (i.e. within RDF linked data), are formalized conceptualizations that might use different protocols, such as RDFS, OWL DL and OWL FULL. In this article, we describe new expressive techniques which were found necessary after elaborating dozens of OWL ontologies for the scientific academy, the State and civil society. They consist of: 1) stating possible uses a property might have without incurring axioms or restrictions; 2) assigning a level of priority to an element (class, property, triple); 3) correct depiction in diagrams of relations between classes, between individuals which are imperative, and between individuals which are optional; 4) a convenient association between OWL classes and SKOS concepts. We propose specific rules to accomplish these enhancements and exemplify both their use and the difficulties that arise because these techniques are currently not established as standards for the ontology designer.

Submitted 26 October, 2017; originally announced October 2017.

Report number: ISSN 2527-2357, ISBN 978-85-5676-019-7

Journal ref: Anais do XX ENMC - Encontro Nacional de Modelagem Computacional e VIII ECTM - Encontro de Ciências e Tecnologia de Materiais, Nova Friburgo, RJ - 16 a 19 Outubro 2017

arXiv:1710.09233 [pdf, other]

A Simple Text Analytics Model To Assist Literary Criticism: comparative approach and example on James Joyce against Shakespeare and the Bible

Authors: Renato Fabbri, Luis Henrique Garcia

Abstract: Literary analysis, criticism or studies is a highly valued field with dedicated journals and researchers, which remains mostly within the scope of the humanities. Text analytics is the computer-aided process of deriving information from texts. In this article we describe a simple and generic model for performing literary analysis using text analytics. The method relies on statistical measures of: 1) token and sentence sizes and 2) Wordnet synset features. These measures are then used in Principal Component Analysis, where the texts to be analyzed are observed against Shakespeare and the Bible, regarded as reference literature. The model is validated by analyzing selected works from James Joyce (1882-1941), one of the most important writers of the 20th century. We discuss the consistency of this approach, the reasons why we did not use other techniques (e.g. part-of-speech tagging) and the ways in which the analysis model might be adapted and enhanced.

Submitted 24 October, 2017; originally announced October 2017.

Comments: Scripts and corpus in https://github.com/ttm/joyce

Report number: ISSN 2527-2357, ISBN 978-85-5676-019-7

Journal ref: Anais do XX ENMC - Encontro Nacional de Modelagem Computacional e VIII ECTM - Encontro de Ciências e Tecnologia de Materiais, Nova Friburgo, RJ - 16 a 19 Outubro 2017
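The first family of measures in the abstract above (token and sentence sizes) can be sketched in a few lines. This is a minimal illustration assuming a naive regex tokenizer and sentence splitter; the paper's exact tokenization, the Wordnet synset features, and the subsequent PCA step are not reproduced here.

```python
import re
import statistics
from typing import Dict

# Assumed token definition: runs of letters, digits and apostrophes.
TOKEN_RE = re.compile(r"[A-Za-z0-9']+")


def size_measures(text: str) -> Dict[str, float]:
    """Mean and population standard deviation of token and sentence sizes.

    Token size is measured in characters; sentence size in tokens.
    Assumes a non-empty text with at least one token.
    """
    # Naive split after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    token_lens = [len(t) for t in TOKEN_RE.findall(text)]
    sent_lens = [len(TOKEN_RE.findall(s)) for s in sentences]
    return {
        "mean_token_len": statistics.mean(token_lens),
        "std_token_len": statistics.pstdev(token_lens),
        "mean_sent_len": statistics.mean(sent_lens),
        "std_sent_len": statistics.pstdev(sent_lens),
    }
```

Computing one such feature vector per text (Joyce, Shakespeare, the Bible) gives the kind of table on which a Principal Component Analysis could then be run.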

arXiv:1707.06226 [pdf, other]

The Role of Conversation Context for Sarcasm Detection in Online Interactions

Authors: Debanjan Ghosh, Alexander Richard Fabbri, Smaranda Muresan

Abstract: Computational models for sarcasm detection have often relied on the content of utterances in isolation. However, the speaker's sarcastic intent is not always obvious without additional context. Focusing on social media discussions, we investigate two issues: (1) does modeling of conversation context help in sarcasm detection and (2) can we understand what part of conversation context triggered the sarcastic reply. To address the first issue, we investigate several types of Long Short-Term Memory (LSTM) networks that can model both the conversation context and the sarcastic response. We show that the conditional LSTM network (Rocktaschel et al., 2015) and LSTM networks with sentence-level attention on context and response outperform the LSTM model that reads only the response. To address the second issue, we present a qualitative analysis of attention weights produced by the LSTM models with attention and discuss the results compared with human performance on the task.

Submitted 18 July, 2017; originally announced July 2017.

Comments: SIGDial 2017

arXiv:1707.03946 [pdf, other]

The Surfacing of Multiview 3D Drawings via Lofting and Occlusion Reasoning

Authors: Anil Usumezbas, Ricardo Fabbri, Benjamin Kimia

Abstract: The three-dimensional reconstruction of scenes from multiple views has made impressive strides in recent years, chiefly by methods correlating isolated feature points, intensities, or curvilinear structure. In the general setting, i.e., without requiring controlled acquisition, limited number of objects, abundant patterns on objects, or object curves to follow particular models, the majority of these methods produce unorganized point clouds, meshes, or voxel representations of the reconstructed scene, with some exceptions producing 3D drawings as networks of curves. Many applications, e.g., robotics, urban planning, industrial design, and hard surface modeling, however, require structured representations which make explicit 3D curves, surfaces, and their spatial relationships. Reconstructing surface representations can now be constrained by the 3D drawing acting like a scaffold to hang on the computed representations, leading to increased robustness and quality of reconstruction. This paper presents one way of completing such 3D drawings with surface reconstructions, by exploring occlusion reasoning through lofting algorithms.

Submitted 12 July, 2017; originally announced July 2017.

Comments: CVPR 2017 expanded version with improvements over camera ready, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition CVPR, 2017

arXiv:1609.05561 [pdf, other]

doi 10.1007/978-3-319-46493-0_5

From Multiview Image Curves to 3D Drawings

Authors: Anil Usumezbas, Ricardo Fabbri, Benjamin B. Kimia

Abstract: Reconstructing 3D scenes from multiple views has made impressive strides in recent years, chiefly by correlating isolated feature points, intensity patterns, or curvilinear structures. In the general setting - without controlled acquisition, abundant texture, curves and surfaces following specific models or limiting scene complexity - most methods produce unorganized point clouds, meshes, or voxel representations, with some exceptions producing unorganized clouds of 3D curve fragments. Ideally, many applications require structured representations of curves, surfaces and their spatial relationships. This paper presents a step in this direction by formulating an approach that combines 2D image curves into a collection of 3D curves, with topological connectivity between them represented as a 3D graph. This results in a 3D drawing, which is complementary to surface representations in the same sense as a 3D scaffold complements a tent taut over it. We evaluate our results against ground truth on synthetic and real datasets.

Submitted 18 September, 2016; originally announced September 2016.

Comments: Expanded ECCV 2016 version with tweaked figures and including an overview of the supplementary material available at multiview-3d-drawing.sourceforge.net

MSC Class: 65D17; 68U05; 68U10; 53A20 ACM Class: I.4.8; I.4.10; I.4.6; I.3.5; J.6

Journal ref: Lecture Notes in Computer Science, 9908, pp 70-87, September 2016

arXiv:1604.08256 [pdf, other]

doi 10.1007/s11263-016-0912-7

Multiview Differential Geometry of Curves

Authors: Ricardo Fabbri, Benjamin Kimia

Abstract: The field of multiple view geometry has seen tremendous progress in reconstruction and calibration due to methods for extracting reliable point features and key developments in projective geometry. Point features, however, are not available in certain applications and result in unstructured point cloud reconstructions. General image curves provide a complementary feature when keypoints are scarce, and result in 3D curve geometry, but face challenges not addressed by the usual projective geometry of points and algebraic curves. We address these challenges by laying the theoretical foundations of a framework based on the differential geometry of general curves, including stationary curves, occluding contours, and non-rigid curves, aiming at stereo correspondence, camera estimation (including calibration, pose, and multiview epipolar geometry), and 3D reconstruction given measured image curves. By gathering previous results into a cohesive theory, novel results were made possible, yielding three contributions. First we derive the differential geometry of an image curve (tangent, curvature, curvature derivative) from that of the underlying space curve (tangent, curvature, curvature derivative, torsion). Second, we derive the differential geometry of a space curve from that of two corresponding image curves. Third, the differential motion of an image curve is derived from camera motion and the differential geometry and motion of the space curve.
The availability of such a theory enables novel curve-based multiview reconstruction and camera estimation systems to augment existing point-based approaches. This theory has been used to reconstruct a "3D curve sketch", to determine camera pose from local curve geometry, and tracking; other developments are underway.

Submitted 27 April, 2016; originally announced April 2016.

Comments: International Journal of Computer Vision Final Accepted version. International Journal of Computer Vision, 2016. The final publication is available at Springer via http://dx.doi.org/10.1007/s11263-016-0912-7

MSC Class: 53A04; 53A17; 53A20 ACM Class: I.4.8; I.3.5
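The abstract above relates the tangent and curvature of an image curve to those of the underlying space curve. As a reminder of what these local quantities are for a planar (image) curve, the sketch below estimates the unit tangent and signed curvature of a parametric curve with central finite differences, using the classical formula kappa = (x'y'' - y'x'') / (x'^2 + y'^2)^(3/2). It does not implement the paper's projection formulas, and the step size `h` is an arbitrary assumption.

```python
import math
from typing import Callable, Tuple

# A planar parametric curve t -> (x(t), y(t)).
Curve = Callable[[float], Tuple[float, float]]


def tangent_and_curvature(
    c: Curve, t: float, h: float = 1e-4
) -> Tuple[Tuple[float, float], float]:
    """Unit tangent and signed curvature of c at parameter t."""
    (x0, y0), (x1, y1), (x2, y2) = c(t - h), c(t), c(t + h)
    # Central differences for the first derivatives...
    dx, dy = (x2 - x0) / (2 * h), (y2 - y0) / (2 * h)
    # ...and for the second derivatives.
    ddx, ddy = (x2 - 2 * x1 + x0) / h**2, (y2 - 2 * y1 + y0) / h**2
    speed = math.hypot(dx, dy)
    tangent = (dx / speed, dy / speed)
    # Positive for counterclockwise turning, negative for clockwise.
    kappa = (dx * ddy - dy * ddx) / speed**3
    return tangent, kappa
```

For a circle of radius 2 traversed counterclockwise, the estimated curvature is close to 1/2 at every point, and the tangent at t = 0 is (0, 1).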

arXiv:1604.08255 [pdf]

doi 10.5329/RESI.2014.130200

The Algorithmic Autoregulation Software Development Methodology

Authors: Renato Fabbri, Ricardo Fabbri, Vilson Vieira, Daniel Penalva, Danilo Shiga, Marcos Mendonca, Alexandre Negrao, Lucas Zambianchi, Gabriela Thume

Abstract: We present a new self-regulating methodology for coordinating distributed team work called Algorithmic Autoregulation (AA), based on recent social networking concepts and individual merit. Team members take on an egalitarian role, and stay voluntarily logged into so-called AA sessions for part of their time (e.g. 2 hours per day), during which they create periodical logs - short text sentences - they wish to share about their activity with the team. These logs are publicly aggregated in a website and are peer-validated after the end of a session, as in code review. A short screencast is ideally recorded at the end of each session to make AA logs more understandable. This methodology has been shown to be well-suited for increasing the efficiency of distributed teams working on Global Software Development (GSD), as observed in our reported experience in actual real-world situations. This efficiency boost is mainly achieved through 1) built-in asynchronous on-demand communication in conjunction with documentation of work, products, and processes, and 2) reduced need for central management, meetings or time-consuming reports. Hence, the AA methodology legitimizes and facilitates the activities of a distributed software team. It thus gives other entities a solid means to fund these activities, allowing new and concrete business models to emerge for highly distributed software development. AA has been proposed, at its core, as a way of sustaining self-replicating hacker initiatives.
These claims are discussed in a real case study of running a distributed free software hacker team called Lab Macambira.

Submitted 27 April, 2016; originally announced April 2016.

ACM Class: D.2.9

Journal ref: RESI, v. 13, n. 2, 2014

arXiv:1505.06640 [pdf, ps, other]

Continuous voting by approval and participation

Authors: Renato Fabbri, Ricardo Poppi

Abstract: In seeking an adequate way to prioritize proposals, the Brazilian participation community agreed on measuring two indexes, one of approval and one of participation. Both practice and literature are constantly handled by the experts involved, and the formalization of such a model and metrics seems novel. Also, the relevance of this report is strengthened by the upcoming use of these indexes by the Brazilian General Secretariat of the Republic to raise and prioritize proposals about public health care in open processes.

Submitted 24 April, 2015; originally announced May 2015.

arXiv:1502.01312 [pdf, other]

Vivace: a collaborative live coding language and platform

Authors: Vilson Vieira, Guilherme Lunhani, Geraldo Magela de Castro Rocha Junior, Caleb Mascarenhas Luporini, Daniel Penalva, Ricardo Fabbri, Renato Fabbri

Abstract: Live coding is a performance and creative technique based on improvised and interactive coding. Many recent endeavors have focused on live coding, both for aesthetic reasons and as a way to alleviate performance drawbacks when the musical instrument is a computer. This paper describes the principles and the design of Vivace, a live coding language and environment built with Web technologies to be executed on web browsers. The approach is compelling because it 1) allows many performers to code simultaneously, 2) synthesizes audio and video, 3) has a very simple syntax, and 4) is multiplatform software. We also strive to contextualize Vivace by means of historical and usage summaries, including a live coding sub-genre.

Submitted 30 October, 2017; v1 submitted 13 January, 2015; originally announced February 2015.

Report number: ISSN 2175-6759

Journal ref: Proceedings of the 16th Brazilian Symposium on Computer Music, SBCM 2017

arXiv:1501.02662 [pdf, other]

Social Participation Ontology: community documentation, enhancements and use examples

Authors: Renato Fabbri, Henrique Parra Parra Filho, Rodrigo Bandeira de Luna, Ricardo Augusto Poppi Martins, Flor Karina Mamani Amanqui, Dilvan de Abreu Moreira, Osvaldo Novais de Oliveira Junior

Abstract: Participatory democracy is advancing in virtually all governments, especially in South America, which exhibits a mixed culture and social predisposition. This article presents the "Social Participation Ontology" (OPS, from the Brazilian name Ontologia de Participação Social), implemented in compliance with the Web Ontology Language standard (OWL) for fostering social participation, especially on virtual platforms. The entities and links of OPS were defined based on an extensive collaboration of specialists. It is shown that OPS is instrumental for information retrieval from the contents of the portal, both in terms of the actors (at various levels) and of mechanisms and activities. Significantly, OPS is linked to other OWL ontologies as an upper ontology and via FOAF and BFO as higher upper ontologies, which yields sound organization of and access to knowledge and data. In order to illustrate the usefulness of OPS, we present results on ontological expansion and integration with other ontologies and data. Ongoing work involves further adoption of OPS by the official Brazilian federal portal for social participation and by NGOs, and further linkage to other ontologies for social participation.

Submitted 30 October, 2017; v1 submitted 12 January, 2015; originally announced January 2015.

Comments: See ancillary for table of terms, OPS code and figures. Further information is at https://github.com/ttm/ops

arXiv:1412.7311 [pdf, other]

Versinus: a visualization method for graphs in evolution

Authors: Renato Fabbri

Abstract: This article presents a novel visualization approach for dynamic graphs, the versinus method, especially useful for real-world networks exhibiting scale-free properties. With a simple and fixed layout, and a small set of visual markups, the method has been useful for understanding network dynamics. The local community has often suggested that it be reported, which motivated this article. Online resources deliver videos and computer scripts for rendering new animations. This article gives a concise description of the method.

Submitted 23 December, 2014; originally announced December 2014.

Comments: article written at the request of research colleagues who appreciated these visualizations. arXiv admin note: text overlap with arXiv:1310.7769

arXiv:1412.7309 [pdf, other]

A connective differentiation of textual production in interaction networks

Authors: Renato Fabbri

Abstract: This paper explores textual production in interaction networks, with special emphasis on its relation to topological measures. Four email lists were selected, and measures were taken from the texts their participants wrote. Peripheral, intermediary and hub sectors of these networks were observed to have discrepant linguistic elaborations. For completeness of exposition, correlations between textual and topological measures were observed for the entire network and for each connective sector. Principal component analysis is used to gain further insight into how the measures are related.

Submitted 23 December, 2014; originally announced December 2014.
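To make the notion of peripheral, intermediary and hub sectors in the last abstract concrete, the sketch below labels the nodes of an interaction network by degree rank. It is a simplified illustration, not the sectioning criterion used in the paper: the edge list (one edge per reply between two participants) and the cut-off fractions `hub_frac` and `periphery_frac` are hypothetical assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def degree_sectors(
    edges: Iterable[Tuple[str, str]],
    hub_frac: float = 0.1,        # hypothetical: top 10% by degree are hubs
    periphery_frac: float = 0.6,  # hypothetical: bottom 60% by degree are peripheral
) -> Dict[str, str]:
    """Label each node 'hub', 'intermediary' or 'periphery' by degree rank."""
    degree: Counter = Counter()
    for a, b in edges:  # undirected: each reply counts for both participants
        degree[a] += 1
        degree[b] += 1
    # Highest degree first; ties broken by name so the result is deterministic.
    ranked = sorted(degree, key=lambda node: (-degree[node], node))
    n = len(ranked)
    n_hubs = max(1, round(hub_frac * n))
    n_periph = round(periphery_frac * n)
    sectors: Dict[str, str] = {}
    for i, node in enumerate(ranked):
        if i < n_hubs:
            sectors[node] = "hub"
        elif i >= n - n_periph:
            sectors[node] = "periphery"
        else:
            sectors[node] = "intermediary"
    return sectors
```

Textual measures (such as the token and sentence sizes of each participant's messages) could then be aggregated per sector and compared; fixed fractions are used here only because they keep the sketch short.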
