Search | arXiv e-print repository

arXiv:2404.17730 [pdf, other]

Bridging the Social & Technical Divide in Augmentative and Alternative Communication (AAC) Applications for Autistic Adults

Authors: Lara J. Martin, Malathy Nagalakshmi

Abstract: Natural Language Processing (NLP) techniques are being used more frequently to improve high-tech Augmentative and Alternative Communication (AAC), but many of these techniques are integrated without the inclusion of the users' perspectives. As many of these tools are created with children in mind, autistic adults are often neglected in the design of AAC tools to begin with. We conducted in-depth i… ▽ More Natural Language Processing (NLP) techniques are being used more frequently to improve high-tech Augmentative and Alternative Communication (AAC), but many of these techniques are integrated without the inclusion of the users' perspectives. As many of these tools are created with children in mind, autistic adults are often neglected in the design of AAC tools to begin with. We conducted in-depth interviews with 12 autistic adults to find the pain points of current AAC and determine what general technological advances they would find helpful. We found that in addition to technological issues, there are many societal issues as well. We found 9 different categories of themes from our interviews: input options, output options, selecting or adapting AAC for a good fit, when to start or swap AAC, benefits (of use), access (to AAC), stumbling blocks for continued use, social concerns, and lack of control. In this paper, we go through these nine categories in depth and then suggest possible guidelines for the NLP community, AAC application makers, and policy makers to improve AAC use for autistic adults. △ Less

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.01295 [pdf, other]

Towards Safety and Helpfulness Balanced Responses via Controllable Large Language Models

Authors: Yi-Lin Tuan, Xilun Chen, Eric Michael Smith, Louis Martin, Soumya Batra, Asli Celikyilmaz, William Yang Wang, Daniel M. Bikel

Abstract: As large language models (LLMs) become easily accessible nowadays, the trade-off between safety and helpfulness can significantly impact user experience. A model that prioritizes safety will cause users to feel less engaged and assisted while prioritizing helpfulness will potentially cause harm. Possible harms include teaching people how to build a bomb, exposing youth to inappropriate content, an… ▽ More As large language models (LLMs) become easily accessible nowadays, the trade-off between safety and helpfulness can significantly impact user experience. A model that prioritizes safety will cause users to feel less engaged and assisted while prioritizing helpfulness will potentially cause harm. Possible harms include teaching people how to build a bomb, exposing youth to inappropriate content, and hurting users' mental health. In this work, we propose to balance safety and helpfulness in diverse use cases by controlling both attributes in LLM. We explore training-free and fine-tuning methods that do not require extra human annotations and analyze the challenges of controlling safety and helpfulness in LLMs. Our experiments demonstrate that our method can rewind a learned model and unlock its controllability. △ Less

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2403.03661 [pdf]

doi 10.1016/j.iot.2023.100779

Development and evaluation of Artificial Intelligence techniques for IoT data quality assessment and curation

Authors: Laura Martín, Luis Sánchez, Jorge Lanza, Pablo Sotres

Abstract: Nowadays, data is becoming the new fuel for economic wealth and creation of novel and profitable business models. Multitude of technologies are contributing to an abundance of information sources which are already the baseline for multi-millionaire services and applications. Internet of Things (IoT), is probably the most representative one. However, for an economy of data to actually flourish ther… ▽ More Nowadays, data is becoming the new fuel for economic wealth and creation of novel and profitable business models. Multitude of technologies are contributing to an abundance of information sources which are already the baseline for multi-millionaire services and applications. Internet of Things (IoT), is probably the most representative one. However, for an economy of data to actually flourish there are still several critical challenges that have to be overcome. Among them, data quality can become an issue when data come from heterogeneous sources or have different formats, standards and scale. Improving data quality is of utmost importance for any domain since data are the basis for any decision-making system and decisions will not be accurate if they are based on inadequate low-quality data. In this paper we are presenting a solution for assessing several quality dimensions of IoT data streams as they are generated. Additionally, the solution described in the paper actually improves the quality of data streams by curating them through the application of Artificial Intelligence techniques. The approach followed in our work has been to append data quality information as metadata linked to each individual piece of curated data. We have leveraged linked-data principles and integrated the developed AI-based IoT data curation mechanisms within a Data Enrichment Toolchain (DET) that employs the NGSI-LD standard to harmonize and enrich heterogeneous data sources. Furthermore, we have evaluated our design under experimental research conditions, achieving a robust compromise between functionality and overhead. Besides, it demonstrates a stable and scalable performance. △ Less

Submitted 6 March, 2024; originally announced March 2024.

Comments: This work is published in Elsevier Internet of Things. This work was supported by the European Commission CEF Programme by means of the project SALTED under the Action Number 2020-EU-IA-0274 and by the Spanish State Research Agency (AEI) by means of the project SITED under Grant Agreement No. PID2021-125725OB-I00

Journal ref: Internet of Things, Volume 22, July 2023, 100779

arXiv:2403.03648 [pdf, other]

doi 10.3390/s24051695

A Connector for Integrating NGSI-LD Data into Open Data Portals

Authors: Laura Martín, Jorge Lanza, Víctor González, Juan Ramón Santana, Pablo Sotres, Luis Sánchez

Abstract: Nowadays, there are plenty of data sources generating massive amounts of information that, combined with novel data analytics frameworks, are meant to support optimisation in many application domains. Nonetheless, there are still shortcomings in terms of data discoverability, accessibility and interoperability. Open Data portals have emerged as a shift towards openness and discoverability. However… ▽ More Nowadays, there are plenty of data sources generating massive amounts of information that, combined with novel data analytics frameworks, are meant to support optimisation in many application domains. Nonetheless, there are still shortcomings in terms of data discoverability, accessibility and interoperability. Open Data portals have emerged as a shift towards openness and discoverability. However, they do not impose any condition to the data itself, just stipulate how datasets have to be described. Alternatively, the NGSI-LD standard pursues harmonisation in terms of data modelling and accessibility. This paper presents a solution that bridges these two domains (i.e., Open Data portals and NGSI-LD-based data) in order to keep benefiting from the structured description of datasets offered by Open Data portals, while ensuring the interoperability provided by the NGSI-LD standard. Our solution aggregates the data into coherent datasets and generate high-quality descriptions, ensuring comprehensiveness, interoperability and accessibility. The proposed solution has been validated through a real-world implementation that exposes IoT data in NGSI-LD format through the European Data Portal (EDP). Moreover, the results from the Metadata Quality Assessment that the EDP implements, show that the datasets' descriptions generated achieve excellent ranking in terms of the Findability, Accessibility, Interoperability and Reusability (FAIR) data principles. △ Less

Submitted 6 March, 2024; originally announced March 2024.

Comments: This work belongs to the Special Issue Data Engineering in the Internet of Things of MDPI Sensors. This work has been partially supported by the project SALTED from the European Union's Connecting Europe Facility program under Action Number 2020-EU-IA-0274, and by the project SITED under Grant Agreement No. PID2021-125725OB-I00 funded by MCIN/AEI/10.13039/501100011033 and the European Union FEDER

Journal ref: Sensors 2024, 24, 1695

arXiv:2402.07462 [pdf]

A Hormetic Approach to the Value-Loading Problem: Preventing the Paperclip Apocalypse?

Authors: Nathan I. N. Henry, Mangor Pedersen, Matt Williams, Jamin L. B. Martin, Liesje Donkin

Abstract: The value-loading problem is a significant challenge for researchers aiming to create artificial intelligence (AI) systems that align with human values and preferences. This problem requires a method to define and regulate safe and optimal limits of AI behaviors. In this work, we propose HALO (Hormetic ALignment via Opponent processes), a regulatory paradigm that uses hormetic analysis to regulate… ▽ More The value-loading problem is a significant challenge for researchers aiming to create artificial intelligence (AI) systems that align with human values and preferences. This problem requires a method to define and regulate safe and optimal limits of AI behaviors. In this work, we propose HALO (Hormetic ALignment via Opponent processes), a regulatory paradigm that uses hormetic analysis to regulate the behavioral patterns of AI. Behavioral hormesis is a phenomenon where low frequencies of a behavior have beneficial effects, while high frequencies are harmful. By modeling behaviors as allostatic opponent processes, we can use either Behavioral Frequency Response Analysis (BFRA) or Behavioral Count Response Analysis (BCRA) to quantify the hormetic limits of repeatable behaviors. We demonstrate how HALO can solve the 'paperclip maximizer' scenario, a thought experiment where an unregulated AI tasked with making paperclips could end up converting all matter in the universe into paperclips. Our approach may be used to help create an evolving database of 'values' based on the hedonic calculus of repeatable behaviors with decreasing marginal utility. This positions HALO as a promising solution for the value-loading problem, which involves embedding human-aligned values into an AI system, and the weak-to-strong generalization problem, which explores whether weak models can supervise stronger models as they become more intelligent. Hence, HALO opens several research avenues that may lead to the development of a computational value system that allows an AI algorithm to learn whether the decisions it makes are right or wrong. △ Less

Submitted 13 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

Comments: 24 pages, 7 figures

MSC Class: 68T01; 68T37; 68T42 ACM Class: I.2.0; I.2.8; I.2.11

arXiv:2401.16212 [pdf, other]

Better Call GPT, Comparing Large Language Models Against Lawyers

Authors: Lauren Martin, Nick Whitehouse, Stephanie Yiu, Lizzie Catterson, Rivindu Perera

Abstract: This paper presents a groundbreaking comparison between Large Language Models and traditional legal contract reviewers, Junior Lawyers and Legal Process Outsourcers. We dissect whether LLMs can outperform humans in accuracy, speed, and cost efficiency during contract review. Our empirical analysis benchmarks LLMs against a ground truth set by Senior Lawyers, uncovering that advanced models match o… ▽ More This paper presents a groundbreaking comparison between Large Language Models and traditional legal contract reviewers, Junior Lawyers and Legal Process Outsourcers. We dissect whether LLMs can outperform humans in accuracy, speed, and cost efficiency during contract review. Our empirical analysis benchmarks LLMs against a ground truth set by Senior Lawyers, uncovering that advanced models match or exceed human accuracy in determining legal issues. In speed, LLMs complete reviews in mere seconds, eclipsing the hours required by their human counterparts. Cost wise, LLMs operate at a fraction of the price, offering a staggering 99.97 percent reduction in cost over traditional methods. These results are not just statistics, they signal a seismic shift in legal practice. LLMs stand poised to disrupt the legal industry, enhancing accessibility and efficiency of legal services. Our research asserts that the era of LLM dominance in legal contract review is upon us, challenging the status quo and calling for a reimagined future of legal workflows. △ Less

Submitted 23 January, 2024; originally announced January 2024.

Comments: 16 pages

arXiv:2312.09727 [pdf, other]

LiteVSR: Efficient Visual Speech Recognition by Learning from Speech Representations of Unlabeled Data

Authors: Hendrik Laux, Emil Mededovic, Ahmed Hallawa, Lukas Martin, Arne Peine, Anke Schmeink

Abstract: This paper proposes a novel, resource-efficient approach to Visual Speech Recognition (VSR) leveraging speech representations produced by any trained Automatic Speech Recognition (ASR) model. Moving away from the resource-intensive trends prevalent in recent literature, our method distills knowledge from a trained Conformer-based ASR model, achieving competitive performance on standard VSR benchma… ▽ More This paper proposes a novel, resource-efficient approach to Visual Speech Recognition (VSR) leveraging speech representations produced by any trained Automatic Speech Recognition (ASR) model. Moving away from the resource-intensive trends prevalent in recent literature, our method distills knowledge from a trained Conformer-based ASR model, achieving competitive performance on standard VSR benchmarks with significantly less resource utilization. Using unlabeled audio-visual data only, our baseline model achieves a word error rate (WER) of 47.4% and 54.7% on the LRS2 and LRS3 test benchmarks, respectively. After fine-tuning the model with limited labeled data, the word error rate reduces to 35% (LRS2) and 45.7% (LRS3). Our model can be trained on a single consumer-grade GPU within a few days and is capable of performing real-time end-to-end VSR on dated hardware, suggesting a path towards more accessible and resource-efficient VSR methodologies. △ Less

Submitted 15 December, 2023; originally announced December 2023.

Comments: Accepted for publication at ICASSP 2024

arXiv:2311.08834 [pdf, ps, other]

A* search algorithm for an optimal investment problem in vehicle-sharing systems

Authors: Ba Luat Le, Layla Martin, Emrah Demir, Duc Minh Vu

Abstract: We study an optimal investment problem that arises in the context of the vehicle-sharing system. Given a set of locations to build stations, we need to determine i) the sequence of stations to be built and the number of vehicles to acquire in order to obtain the target state where all stations are built, and ii) the number of vehicles to acquire and their allocation in order to maximize the total… ▽ More We study an optimal investment problem that arises in the context of the vehicle-sharing system. Given a set of locations to build stations, we need to determine i) the sequence of stations to be built and the number of vehicles to acquire in order to obtain the target state where all stations are built, and ii) the number of vehicles to acquire and their allocation in order to maximize the total profit returned by operating the system when some or all stations are open. The profitability associated with operating open stations, measured over a specific time period, is represented as a linear optimization problem applied to a collection of open stations. With operating capital, the owner of the system can open new stations. This property introduces a set-dependent aspect to the duration required for opening a new station, and the optimal investment problem can be viewed as a variant of the Traveling Salesman Problem (TSP) with set-dependent cost. We propose an A* search algorithm to address this particular variant of the TSP. Computational experiments highlight the benefits of the proposed algorithm in comparison to the widely recognized Dijkstra algorithm and propose future research to explore new possibilities and applications for both exact and approximate A* algorithms. △ Less

Submitted 15 November, 2023; originally announced November 2023.

Comments: Full version of the conference paper which is accepted to be appear in the proceeding of the The 12th International Conference on Computational Data and Social Networks - SCONET2023

arXiv:2309.16039 [pdf, other]

Effective Long-Context Scaling of Foundation Models

Authors: Wenhan Xiong, **gyu Liu, Igor Molybog, Hejia Zhang, Prajjwal Bhargava, Rui Hou, Louis Martin, Rashi Rungta, Karthik Abinav Sankararaman, Barlas Oguz, Madian Khabsa, Han Fang, Yashar Mehdad, Sharan Narang, Kshitiz Malik, Angela Fan, Shruti Bhosale, Sergey Edunov, Mike Lewis, Sinong Wang, Hao Ma

Abstract: We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens. Our model series are built through continual pretraining from Llama 2 with longer training sequences and on a dataset where long texts are upsampled. We perform extensive evaluation on language modeling, synthetic context probing tasks, and a wide range of research benchmarks. On research benchm… ▽ More We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens. Our model series are built through continual pretraining from Llama 2 with longer training sequences and on a dataset where long texts are upsampled. We perform extensive evaluation on language modeling, synthetic context probing tasks, and a wide range of research benchmarks. On research benchmarks, our models achieve consistent improvements on most regular tasks and significant improvements on long-context tasks over Llama 2. Notably, with a cost-effective instruction tuning procedure that does not require human-annotated long instruction data, the 70B variant can already surpass gpt-3.5-turbo-16k's overall performance on a suite of long-context tasks. Alongside these results, we provide an in-depth analysis on the individual components of our method. We delve into Llama's position encodings and discuss its limitation in modeling long dependencies. We also examine the impact of various design choices in the pretraining process, including the data mix and the training curriculum of sequence lengths -- our ablation experiments suggest that having abundant long texts in the pretrain dataset is not the key to achieving strong performance, and we empirically verify that long context continual pretraining is more efficient and similarly effective compared to pretraining from scratch with long sequences. △ Less

Submitted 13 November, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

arXiv:2308.12950 [pdf, other]

Code Llama: Open Foundation Models for Code

Authors: Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, **gyu Liu, Romain Sauvestre, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom , et al. (1 additional authors not shown)

Abstract: We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks. We provide multiple flavors to cover a wide range of applications: foundation models (Code Llama), Python specializations (Code Llama… ▽ More We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks. We provide multiple flavors to cover a wide range of applications: foundation models (Code Llama), Python specializations (Code Llama - Python), and instruction-following models (Code Llama - Instruct) with 7B, 13B, 34B and 70B parameters each. All models are trained on sequences of 16k tokens and show improvements on inputs with up to 100k tokens. 7B, 13B and 70B Code Llama and Code Llama - Instruct variants support infilling based on surrounding content. Code Llama reaches state-of-the-art performance among open models on several code benchmarks, with scores of up to 67% and 65% on HumanEval and MBPP, respectively. Notably, Code Llama - Python 7B outperforms Llama 2 70B on HumanEval and MBPP, and all our models outperform every other publicly available model on MultiPL-E. We release Code Llama under a permissive license that allows for both research and commercial use. △ Less

Submitted 31 January, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

arXiv:2308.07540 [pdf, other]

doi 10.1609/aiide.v19i1.27534

CALYPSO: LLMs as Dungeon Masters' Assistants

Authors: Andrew Zhu, Lara J. Martin, Andrew Head, Chris Callison-Burch

Abstract: The role of a Dungeon Master, or DM, in the game Dungeons & Dragons is to perform multiple tasks simultaneously. The DM must digest information about the game setting and monsters, synthesize scenes to present to other players, and respond to the players' interactions with the scene. Doing all of these tasks while maintaining consistency within the narrative and story world is no small feat of hum… ▽ More The role of a Dungeon Master, or DM, in the game Dungeons & Dragons is to perform multiple tasks simultaneously. The DM must digest information about the game setting and monsters, synthesize scenes to present to other players, and respond to the players' interactions with the scene. Doing all of these tasks while maintaining consistency within the narrative and story world is no small feat of human cognition, making the task tiring and unapproachable to new players. Large language models (LLMs) like GPT-3 and ChatGPT have shown remarkable abilities to generate coherent natural language text. In this paper, we conduct a formative evaluation with DMs to establish the use cases of LLMs in D&D and tabletop gaming generally. We introduce CALYPSO, a system of LLM-powered interfaces that support DMs with information and inspiration specific to their own scenario. CALYPSO distills game context into bite-sized prose and helps brainstorm ideas without distracting the DM from the game. When given access to CALYPSO, DMs reported that it generated high-fidelity text suitable for direct presentation to players, and low-fidelity ideas that the DM could develop further while maintaining their creative agency. We see CALYPSO as exemplifying a paradigm of AI-augmented tools that provide synchronous creative assistance within established game worlds, and tabletop gaming more broadly. △ Less

Submitted 14 August, 2023; originally announced August 2023.

Comments: 11 pages, 4 figures. AIIDE 2023

Journal ref: AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE) 2023

arXiv:2307.09288 [pdf, other]

Llama 2: Open Foundation and Fine-Tuned Chat Models

Authors: Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini , et al. (43 additional authors not shown)

Abstract: In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be… ▽ More In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs. △ Less

Submitted 19 July, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

arXiv:2306.08896 [pdf, other]

Multilingual End to End Entity Linking

Authors: Mikhail Plekhanov, Nora Kassner, Kashyap Popat, Louis Martin, Simone Merello, Borislav Kozlovskii, Frédéric A. Dreyer, Nicola Cancedda

Abstract: Entity Linking is one of the most common Natural Language Processing tasks in practical applications, but so far efficient end-to-end solutions with multilingual coverage have been lacking, leading to complex model stacks. To fill this gap, we release and open source BELA, the first fully end-to-end multilingual entity linking model that efficiently detects and links entities in texts in any of 97… ▽ More Entity Linking is one of the most common Natural Language Processing tasks in practical applications, but so far efficient end-to-end solutions with multilingual coverage have been lacking, leading to complex model stacks. To fill this gap, we release and open source BELA, the first fully end-to-end multilingual entity linking model that efficiently detects and links entities in texts in any of 97 languages. We provide here a detailed description of the model and report BELA's performance on four entity linking datasets covering high- and low-resource languages. △ Less

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2305.12027 [pdf, other]

Polar Ducks and Where to Find Them: Enhancing Entity Linking with Duck Ty** and Polar Box Embeddings

Authors: Mattia Atzeni, Mikhail Plekhanov, Frédéric A. Dreyer, Nora Kassner, Simone Merello, Louis Martin, Nicola Cancedda

Abstract: Entity linking methods based on dense retrieval are an efficient and widely used solution in large-scale applications, but they fall short of the performance of generative models, as they are sensitive to the structure of the embedding space. In order to address this issue, this paper introduces DUCK, an approach to infusing structural information in the space of entity representations, using prio… ▽ More Entity linking methods based on dense retrieval are an efficient and widely used solution in large-scale applications, but they fall short of the performance of generative models, as they are sensitive to the structure of the embedding space. In order to address this issue, this paper introduces DUCK, an approach to infusing structural information in the space of entity representations, using prior knowledge of entity types. Inspired by duck ty** in programming languages, we propose to define the type of an entity based on the relations that it has with other entities in a knowledge graph. Then, porting the concept of box embeddings to spherical polar coordinates, we propose to represent relations as boxes on the hypersphere. We optimize the model to cluster entities of similar type by placing them inside the boxes corresponding to their relations. Our experiments show that our method sets new state-of-the-art results on standard entity-disambiguation benchmarks, it improves the performance of the model by up to 7.9 F1 points, outperforms other type-aware approaches, and matches the results of generative models with 18 times more parameters. △ Less

Submitted 20 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

Comments: Accepted at EMNLP 2023

arXiv:2305.01528 [pdf, other]

doi 10.18653/v1/2023.acl-long.229

FIREBALL: A Dataset of Dungeons and Dragons Actual-Play with Structured Game State Information

Authors: Andrew Zhu, Karmanya Aggarwal, Alexander Feng, Lara J. Martin, Chris Callison-Burch

Abstract: Dungeons & Dragons (D&D) is a tabletop roleplaying game with complex natural language interactions between players and hidden state information. Recent work has shown that large language models (LLMs) that have access to state information can generate higher quality game turns than LLMs that use dialog history alone. However, previous work used game state information that was heuristically created… ▽ More Dungeons & Dragons (D&D) is a tabletop roleplaying game with complex natural language interactions between players and hidden state information. Recent work has shown that large language models (LLMs) that have access to state information can generate higher quality game turns than LLMs that use dialog history alone. However, previous work used game state information that was heuristically created and was not a true gold standard game state. We present FIREBALL, a large dataset containing nearly 25,000 unique sessions from real D&D gameplay on Discord with true game state info. We recorded game play sessions of players who used the Avrae bot, which was developed to aid people in playing D&D online, capturing language, game commands and underlying game state information. We demonstrate that FIREBALL can improve natural language generation (NLG) by using Avrae state information, improving both automated metrics and human judgments of quality. Additionally, we show that LLMs can generate executable Avrae commands, particularly after finetuning. △ Less

Submitted 25 May, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

Comments: 21 pages, 2 figures. Accepted at ACL 2023

Journal ref: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023, pp. 4171-4193

arXiv:2303.14589 [pdf, other]

SASS: Data and Methods for Subject Aware Sentence Simplification

Authors: Brad Windsor, Luke Martin, Anand Tyagi

Abstract: Sentence simplification tends to focus on the generic simplification of sentences by making them more readable and easier to understand. This paper provides a dataset aimed at training models that perform subject aware sentence simplifications rather than simplifying sentences as a whole. We also test models on that dataset which are inspired by model architecture used in abstractive summarization… ▽ More Sentence simplification tends to focus on the generic simplification of sentences by making them more readable and easier to understand. This paper provides a dataset aimed at training models that perform subject aware sentence simplifications rather than simplifying sentences as a whole. We also test models on that dataset which are inspired by model architecture used in abstractive summarization. We hand generated portions of the data and augment the dataset by further manipulating those hand written simplifications. Our results show that data-augmentation, data-masking, and model architecture choices used in summarization provide a solid baseline for comparison on subject aware simplification. △ Less

Submitted 25 March, 2023; originally announced March 2023.

arXiv:2303.00767 [pdf, other]

A Feasible Hybrid Quantum-Assisted Digital Signature for Arbitrary Message Length

Authors: Marta Irene García Cid, Laura Ortiz Martín, David Domingo Martín, Rodrigo Martín Sánchez-Ledesma, Juan Pedro Brito Méndez, Vicente Martín Ayuso

Abstract: Currently used digital signatures based on asymmetric cryptography will be vulnerable to quantum computers running Shor's algorithm. In this work, we propose a new quantum-assisted digital signature protocol based on symmetric keys generated by QKD, that allows signing and verifying messages in a simple way implementing an integration of currently available classical and quantum technologies. The… ▽ More Currently used digital signatures based on asymmetric cryptography will be vulnerable to quantum computers running Shor's algorithm. In this work, we propose a new quantum-assisted digital signature protocol based on symmetric keys generated by QKD, that allows signing and verifying messages in a simple way implementing an integration of currently available classical and quantum technologies. The protocol is described for a three-user scenario composed of one sender and two receivers. In contrast to previous schemes, it is independent of the message length. The security of the protocol has been analyzed, as well as its integrity, authenticity and non-repudiation properties. △ Less

Submitted 1 March, 2023; originally announced March 2023.

arXiv:2302.13048 [pdf, other]

doi 10.18653/v1/2023.acl-demo.1

Human-in-the-Loop Schema Induction

Authors: Tianyi Zhang, Isaac Tham, Zhaoyi Hou, Jiaxuan Ren, Liyang Zhou, Hainiu Xu, Li Zhang, Lara J. Martin, Rotem Dror, Sha Li, Heng Ji, Martha Palmer, Susan Brown, Reece Suchocki, Chris Callison-Burch

Abstract: Schema induction builds a graph representation explaining how events unfold in a scenario. Existing approaches have been based on information retrieval (IR) and information extraction(IE), often with limited human curation. We demonstrate a human-in-the-loop schema induction system powered by GPT-3. We first describe the different modules of our system, including prompting to generate schematic el… ▽ More Schema induction builds a graph representation explaining how events unfold in a scenario. Existing approaches have been based on information retrieval (IR) and information extraction(IE), often with limited human curation. We demonstrate a human-in-the-loop schema induction system powered by GPT-3. We first describe the different modules of our system, including prompting to generate schematic elements, manual edit of those elements, and conversion of those into a schema graph. By qualitatively comparing our system to previous ones, we show that our system not only transfers to new domains more easily than previous approaches, but also reduces efforts of human curation thanks to our interactive interface. △ Less

Submitted 25 February, 2023; originally announced February 2023.

Comments: 10 pages, ACL2023 demo track

arXiv:2301.08104 [pdf, other]

doi 10.1609/icwsm.v17i1.22141

Author as Character and Narrator: Deconstructing Personal Narratives from the r/AmITheAsshole Reddit Community

Authors: Salvatore Giorgi, Ke Zhao, Alexander H. Feng, Lara J. Martin

Abstract: In the r/AmITheAsshole subreddit, people anonymously share first person narratives that contain some moral dilemma or conflict and ask the community to judge who is at fault (i.e., who is "the asshole"). In general, first person narratives are a unique storytelling domain where the author is the narrator (the person telling the story) but can also be a character (the person living the story) and,… ▽ More In the r/AmITheAsshole subreddit, people anonymously share first person narratives that contain some moral dilemma or conflict and ask the community to judge who is at fault (i.e., who is "the asshole"). In general, first person narratives are a unique storytelling domain where the author is the narrator (the person telling the story) but can also be a character (the person living the story) and, thus, the author has two distinct voices presented in the story. In this study, we identify linguistic and narrative features associated with the author as the character or as a narrator. We use these features to answer the following questions: (1) what makes an asshole character and (2) what makes an asshole narrator? We extract both Author-as-Character features (e.g., demographics, narrative event chain, and emotional arc) and Author-as-Narrator features (i.e., the style and emotion of the story as a whole) in order to identify which aspects of the narrative are correlated with the final moral judgment. Our work shows that "assholes" as Characters frame themselves as lacking agency with a more positive personal arc, while "assholes" as Narrators will tell emotional and opinionated stories. △ Less

Submitted 15 March, 2023; v1 submitted 19 January, 2023; originally announced January 2023.

Comments: Accepted to the 17th International AAAI Conference on Web and Social Media (ICWSM), 2023

Journal ref: Proceedings of the International AAAI Conference on Web and Social Media (ICWSM) 2023, 17(1), 233-244

arXiv:2212.10754 [pdf, other]

doi 10.18653/v1/2023.findings-acl.832

CoRRPUS: Code-based Structured Prompting for Neurosymbolic Story Understanding

Authors: Yijiang River Dong, Lara J. Martin, Chris Callison-Burch

Abstract: Story generation and understanding -- as with all NLG/NLU tasks -- has seen a surge in neurosymbolic work. Researchers have recognized that, while large language models (LLMs) have tremendous utility, they can be augmented with symbolic means to be even better and to make up for any flaws that the neural networks might have. However, symbolic methods are extremely costly in terms of the amount of… ▽ More Story generation and understanding -- as with all NLG/NLU tasks -- has seen a surge in neurosymbolic work. Researchers have recognized that, while large language models (LLMs) have tremendous utility, they can be augmented with symbolic means to be even better and to make up for any flaws that the neural networks might have. However, symbolic methods are extremely costly in terms of the amount of time and expertise needed to create them. In this work, we capitalize on state-of-the-art Code-LLMs, such as Codex, to bootstrap the use of symbolic methods for tracking the state of stories and aiding in story understanding. We show that our CoRRPUS system and abstracted prompting procedures can beat current state-of-the-art structured LLM techniques on pre-existing story understanding tasks (bAbI Task 2 and Re^3) with minimal hand engineering. We hope that this work can help highlight the importance of symbolic representations and specialized prompting for LLMs as these models require some guidance for performing reasoning tasks properly. △ Less

Submitted 8 June, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: Accepted to Findings of ACL 2023

Journal ref: Findings of ACL 2023, pp. 13152-13168

arXiv:2210.07109 [pdf, other]

doi 10.18653/v1/2022.emnlp-main.637

Dungeons and Dragons as a Dialog Challenge for Artificial Intelligence

Authors: Chris Callison-Burch, Gaurav Singh Tomar, Lara J. Martin, Daphne Ippolito, Suma Bailis, David Reitter

Abstract: AI researchers have posited Dungeons and Dragons (D&D) as a challenge problem to test systems on various language-related capabilities. In this paper, we frame D&D specifically as a dialogue system challenge, where the tasks are to both generate the next conversational turn in the game and predict the state of the game given the dialogue history. We create a gameplay dataset consisting of nearly 9… ▽ More AI researchers have posited Dungeons and Dragons (D&D) as a challenge problem to test systems on various language-related capabilities. In this paper, we frame D&D specifically as a dialogue system challenge, where the tasks are to both generate the next conversational turn in the game and predict the state of the game given the dialogue history. We create a gameplay dataset consisting of nearly 900 games, with a total of 7,000 players, 800,000 dialogue turns, 500,000 dice rolls, and 58 million words. We automatically annotate the data with partial state information about the game play. We train a large language model (LM) to generate the next game turn, conditioning it on different information. The LM can respond as a particular character or as the player who runs the game--i.e., the Dungeon Master (DM). It is trained to produce dialogue that is either in-character (roleplaying in the fictional world) or out-of-character (discussing rules or strategy). We perform a human evaluation to determine what factors make the generated output plausible and interesting. We further perform an automatic evaluation to determine how well the model can predict the game state given the history and examine how well tracking the game state improves its ability to produce plausible conversational output. △ Less

Submitted 13 October, 2022; originally announced October 2022.

Comments: Accepted at EMNLP 2022

Journal ref: Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 9379-9393, Dec. 2022

arXiv:2209.10897 [pdf, other]

Process Modeling and Conformance Checking in Healthcare: A COVID-19 Case Study

Authors: Elisabetta Benevento, Marco Pegoraro, Mattia Antoniazzi, Harry H. Beyel, Viki Peeva, Paul Balfanz, Wil M. P. van der Aalst, Lukas Martin, Gernot Marx

Abstract: The discipline of process mining has a solid track record of successful applications to the healthcare domain. Within such research space, we conducted a case study related to the Intensive Care Unit (ICU) ward of the Uniklinik Aachen hospital in Germany. The aim of this work is twofold: develo** a normative model representing the clinical guidelines for the treatment of COVID-19 patients, and a… ▽ More The discipline of process mining has a solid track record of successful applications to the healthcare domain. Within such research space, we conducted a case study related to the Intensive Care Unit (ICU) ward of the Uniklinik Aachen hospital in Germany. The aim of this work is twofold: develo** a normative model representing the clinical guidelines for the treatment of COVID-19 patients, and analyzing the adherence of the observed behavior (recorded in the information system of the hospital) to such guidelines. We show that, through conformance checking techniques, it is possible to analyze the care process for COVID-19 patients, highlighting the main deviations from the clinical guidelines. The results provide physicians with useful indications for improving the process and ensuring service quality and patient satisfaction. We share the resulting model as an open-source BPMN file. △ Less

Submitted 23 November, 2022; v1 submitted 22 September, 2022; originally announced September 2022.

Comments: 12 pages, 2 figures, 3 tables, 15 references

arXiv:2209.06148 [pdf, other]

Entity Tagging: Extracting Entities in Text Without Mention Supervision

Authors: Christina Du, Kashyap Popat, Louis Martin, Fabio Petroni

Abstract: Detection and disambiguation of all entities in text is a crucial task for a wide range of applications. The typical formulation of the problem involves two stages: detect mention boundaries and link all mentions to a knowledge base. For a long time, mention detection has been considered as a necessary step for extracting all entities in a piece of text, even if the information about mention spans… ▽ More Detection and disambiguation of all entities in text is a crucial task for a wide range of applications. The typical formulation of the problem involves two stages: detect mention boundaries and link all mentions to a knowledge base. For a long time, mention detection has been considered as a necessary step for extracting all entities in a piece of text, even if the information about mention spans is ignored by some downstream applications that merely focus on the set of extracted entities. In this paper we show that, in such cases, detection of mention boundaries does not bring any considerable performance gain in extracting entities, and therefore can be skipped. To conduct our analysis, we propose an "Entity Tagging" formulation of the problem, where models are evaluated purely on the set of extracted entities without considering mentions. We compare a state-of-the-art mention-aware entity linking solution against GET, a mention-agnostic sequence-to-sequence model that simply outputs a list of disambiguated entities given an input context. We find that these models achieve comparable performance when trained both on a fully and partially annotated dataset across multiple benchmarks, demonstrating that GET can extract disambiguated entities with strong performance without explicit mention boundaries supervision. △ Less

Submitted 13 September, 2022; originally announced September 2022.

arXiv:2209.02057 [pdf, other]

Applying Machine Learning to Life Insurance: some knowledge sharing to master it

Authors: Antoine Chancel, Laura Bradier, Antoine Ly, Razvan Ionescu, Laurene Martin, Marguerite Sauce

Abstract: Machine Learning permeates many industries, which brings new source of benefits for companies. However within the life insurance industry, Machine Learning is not widely used in practice as over the past years statistical models have shown their efficiency for risk assessment. Thus insurers may face difficulties to assess the value of the artificial intelligence. Focusing on the modification of th… ▽ More Machine Learning permeates many industries, which brings new source of benefits for companies. However within the life insurance industry, Machine Learning is not widely used in practice as over the past years statistical models have shown their efficiency for risk assessment. Thus insurers may face difficulties to assess the value of the artificial intelligence. Focusing on the modification of the life insurance industry over time highlights the stake of using Machine Learning for insurers and benefits that it can bring by unleashing data value. This paper reviews traditional actuarial methodologies for survival modeling and extends them with Machine Learning techniques. It points out differences with regular machine learning models and emphasizes importance of specific implementations to face censored data with machine learning models family. In complement to this article, a Python library has been developed. Different open-source Machine Learning algorithms have been adjusted to adapt the specificities of life insurance data, namely censoring and truncation. Such models can be easily applied from this SCOR library to accurately model life insurance risks. △ Less

Submitted 27 September, 2022; v1 submitted 5 September, 2022; originally announced September 2022.

arXiv:2207.10710 [pdf, other]

doi 10.1103/PhysRevC.107.014321

Interpretable Boosted Decision Tree Analysis for the Majorana Demonstrator

Authors: I. J. Arnquist, F. T. Avignone III, A. S. Barabash, C. J. Barton, K. H. Bhimani, E. Blalock, B. Bos, M. Busch, M. Buuck, T. S. Caldwell, Y -D. Chan, C. D. Christofferson, P. -H. Chu, M. L. Clark, C. Cuesta, J. A. Detwiler, Yu. Efremenko, S. R. Elliott, G. K. Giovanetti, M. P. Green, J. Gruszko, I. S. Guinn, V. E. Guiseppe, C. R. Haufe, R. Henning , et al. (30 additional authors not shown)

Abstract: The Majorana Demonstrator is a leading experiment searching for neutrinoless double-beta decay with high purity germanium detectors (HPGe). Machine learning provides a new way to maximize the amount of information provided by these detectors, but the data-driven nature makes it less interpretable compared to traditional analysis. An interpretability study reveals the machine's decision-making logi… ▽ More The Majorana Demonstrator is a leading experiment searching for neutrinoless double-beta decay with high purity germanium detectors (HPGe). Machine learning provides a new way to maximize the amount of information provided by these detectors, but the data-driven nature makes it less interpretable compared to traditional analysis. An interpretability study reveals the machine's decision-making logic, allowing us to learn from the machine to feedback to the traditional analysis. In this work, we have presented the first machine learning analysis of the data from the Majorana Demonstrator; this is also the first interpretable machine learning analysis of any germanium detector experiment. Two gradient boosted decision tree models are trained to learn from the data, and a game-theory-based model interpretability study is conducted to understand the origin of the classification power. By learning from data, this analysis recognizes the correlations among reconstruction parameters to further enhance the background rejection performance. By learning from the machine, this analysis reveals the importance of new background categories to reciprocally benefit the standard Majorana analysis. This model is highly compatible with next-generation germanium detector experiments like LEGEND since it can be simultaneously trained on a large number of detectors. △ Less

Submitted 15 February, 2023; v1 submitted 21 July, 2022; originally announced July 2022.

Comments: 13 pages, 9 figures

arXiv:2203.01893 [pdf, other]

A Transdisciplinary Approach for Generating Synthetic but Realistic Domestic Sex Trafficking Networks

Authors: Daniel Kosmas, Christina Melander, Emily Singerhouse, Thomas C. Sharkey, Kayse Lee Maass, Kelle Barrick, Lauren Martin

Abstract: One of the major challenges associated with applying operations research (OR) models to disrupting human trafficking networks is the limited amount of reliable data sources readily available for public use, since operations are intentionally hidden to prevent detection, and data from known operations are often incomplete. To help address this data gap, we propose a network generator for domestic s… ▽ More One of the major challenges associated with applying operations research (OR) models to disrupting human trafficking networks is the limited amount of reliable data sources readily available for public use, since operations are intentionally hidden to prevent detection, and data from known operations are often incomplete. To help address this data gap, we propose a network generator for domestic sex trafficking networks by integrating OR concepts and qualitative research. Multiple sources regarding sex trafficking in the upper Midwest of the United States have been triangulated to ensure that networks produced by the generator are realistic, including law enforcement case file analysis, interviews with domain experts, and a survivor-centered advisory group with first-hand knowledge of sex trafficking. The output models the relationships between traffickers, so-called "bottoms", and victims. This generator allows operations researchers to access realistic sex trafficking network structures in a responsible manner that does not disclose identifiable details of the people involved. We demonstrate the use of output networks in exploring policy recommendations from max flow network interdiction with restructuring. To do so, we propose a novel conceptualization of flow as the ability of a trafficker to control their victims. Our results show the importance of understanding how sex traffickers react to disruptions, especially in terms of recruiting new victims. △ Less

Submitted 12 January, 2023; v1 submitted 3 March, 2022; originally announced March 2022.

arXiv:2202.07880 [pdf, other]

$\rm{C {\small IS}}^2$: A Simplified Commonsense Inference Evaluation for Story Prose

Authors: Bryan Li, Lara J. Martin, Chris Callison-Burch

Abstract: Transformers have been showing near-human performance on a variety of tasks, but they are not without their limitations. We discuss the issue of conflating results of transformers that are instructed to do multiple tasks simultaneously. In particular, we focus on the domain of commonsense reasoning within story prose, which we call contextual commonsense inference (CCI). We look at the GLUCOSE (Mo… ▽ More Transformers have been showing near-human performance on a variety of tasks, but they are not without their limitations. We discuss the issue of conflating results of transformers that are instructed to do multiple tasks simultaneously. In particular, we focus on the domain of commonsense reasoning within story prose, which we call contextual commonsense inference (CCI). We look at the GLUCOSE (Mostafazadeh et al. 2020) dataset and task for predicting implicit commonsense inferences between story sentences. Since the GLUCOSE task simultaneously generates sentences and predicts the CCI relation, there is a conflation in the results. Is the model really measuring CCI or is its ability to generate grammatical text carrying the results? In this paper, we introduce the task contextual commonsense inference in sentence selection ($\rm{C {\small IS}}^2$), a simplified task that avoids conflation by eliminating language generation altogether. Our findings emphasize the necessity of future work to disentangle language generation from the desired NLP tasks at hand. △ Less

Submitted 19 October, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

Comments: Published at the Workshop on Commonsense Representation and Reasoning (CSRR) @ ACL 2022

arXiv:2202.04625 [pdf, other]

Analyzing Medical Data with Process Mining: a COVID-19 Case Study

Authors: Marco Pegoraro, Madhavi Bangalore Shankara Narayana, Elisabetta Benevento, Wil M. P. van der Aalst, Lukas Martin, Gernot Marx

Abstract: The recent increase in the availability of medical data, possible through automation and digitization of medical equipment, has enabled more accurate and complete analysis on patients' medical data through many branches of data science. In particular, medical records that include timestamps showing the history of a patient have enabled the representation of medical information as sequences of even… ▽ More The recent increase in the availability of medical data, possible through automation and digitization of medical equipment, has enabled more accurate and complete analysis on patients' medical data through many branches of data science. In particular, medical records that include timestamps showing the history of a patient have enabled the representation of medical information as sequences of events, effectively allowing to perform process mining analyses. In this paper, we will present some preliminary findings obtained with established process mining techniques in regard of the medical data of patients of the Uniklinik Aachen hospital affected by the recent epidemic of COVID-19. We show that process mining techniques are able to reconstruct a model of the ICU treatments for COVID patients. △ Less

Submitted 25 March, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

Comments: 9 pages, 5 figures, 11 references

arXiv:2112.10684 [pdf, other]

Efficient Large Scale Language Modeling with Mixtures of Experts

Authors: Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, Sam Shleifer, Xi Victoria Lin, **gfei Du, Srinivasan Iyer, Ramakanth Pasunuru, Giri Anantharaman, Xian Li, Shuohui Chen, Halil Akin, Mandeep Baines, Louis Martin, Xing Zhou, Punit Singh Koura, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Mona Diab, Zornitsa Kozareva, Ves Stoyanov

Abstract: Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional computation. This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full-shot fine-tuning. With the exception of fine-tuning, we… ▽ More Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional computation. This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full-shot fine-tuning. With the exception of fine-tuning, we find MoEs to be substantially more compute efficient. At more modest training budgets, MoEs can match the performance of dense models using $\sim$4 times less compute. This gap narrows at scale, but our largest MoE model (1.1T parameters) consistently outperforms a compute-equivalent dense model (6.7B parameters). Overall, this performance gap varies greatly across tasks and domains, suggesting that MoE and dense models generalize differently in ways that are worthy of future study. We make our code and models publicly available for research use. △ Less

Submitted 26 October, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

Comments: EMNLP 2022

arXiv:2112.08593 [pdf, other]

Goal-Directed Story Generation: Augmenting Generative Language Models with Reinforcement Learning

Authors: Amal Alabdulkarim, Winston Li, Lara J. Martin, Mark O. Riedl

Abstract: The advent of large pre-trained generative language models has provided a common framework for AI story generation via sampling the model to create sequences that continue the story. However, sampling alone is insufficient for story generation. In particular, it is hard to direct a language model to create stories to reach a specific goal event. We present two automated techniques grounded in deep… ▽ More The advent of large pre-trained generative language models has provided a common framework for AI story generation via sampling the model to create sequences that continue the story. However, sampling alone is insufficient for story generation. In particular, it is hard to direct a language model to create stories to reach a specific goal event. We present two automated techniques grounded in deep reinforcement learning and reward sha** to control the plot of computer-generated stories. The first utilizes proximal policy optimization to fine-tune an existing transformer-based language model to generate text continuations but also be goal-seeking. The second extracts a knowledge graph from the unfolding story, which is used by a policy network with graph attention to select a candidate continuation generated by a language model. We report on automated metrics pertaining to how often stories achieve a given goal event as well as human participant rankings of coherence and overall story quality compared to baselines and ablations. △ Less

Submitted 15 December, 2021; originally announced December 2021.

Comments: preprint

arXiv:2107.04126 [pdf, other]

Many Objective Bayesian Optimization

Authors: Lucia Asencio Martín, Eduardo C. Garrido-Merchán

Abstract: Some real problems require the evaluation of expensive and noisy objective functions. Moreover, the analytical expression of these objective functions may be unknown. These functions are known as black-boxes, for example, estimating the generalization error of a machine learning algorithm and computing its prediction time in terms of its hyper-parameters. Multi-objective Bayesian optimization (MOB… ▽ More Some real problems require the evaluation of expensive and noisy objective functions. Moreover, the analytical expression of these objective functions may be unknown. These functions are known as black-boxes, for example, estimating the generalization error of a machine learning algorithm and computing its prediction time in terms of its hyper-parameters. Multi-objective Bayesian optimization (MOBO) is a set of methods that has been successfully applied for the simultaneous optimization of black-boxes. Concretely, BO methods rely on a probabilistic model of the objective functions, typically a Gaussian process. This model generates a predictive distribution of the objectives. However, MOBO methods have problems when the number of objectives in a multi-objective optimization problem are 3 or more, which is the many objective setting. In particular, the BO process is more costly as more objectives are considered, computing the quality of the solution via the hyper-volume is also more costly and, most importantly, we have to evaluate every objective function, wasting expensive computational, economic or other resources. However, as more objectives are involved in the optimization problem, it is highly probable that some of them are redundant and not add information about the problem solution. A measure that represents how similar are GP predictive distributions is proposed. We also propose a many objective Bayesian optimization algorithm that uses this metric to determine whether two objectives are redundant. The algorithm stops evaluating one of them if the similarity is found, saving resources and not hurting the performance of the multi-objective BO algorithm. We show empirical evidence in a set of toy, synthetic, benchmark and real experiments that GPs predictive distributions of the effectiveness of the metric and the algorithm. △ Less

Submitted 8 July, 2021; originally announced July 2021.

Comments: arXiv admin note: text overlap with arXiv:2101.08061

arXiv:2104.09006 [pdf, other]

Sentiment Classification in Swahili Language Using Multilingual BERT

Authors: Gati L. Martin, Medard E. Mswahili, Young-Seob Jeong

Abstract: The evolution of the Internet has increased the amount of information that is expressed by people on different platforms. This information can be product reviews, discussions on forums, or social media platforms. Accessibility of these opinions and peoples feelings open the door to opinion mining and sentiment analysis. As language and speech technologies become more advanced, many languages have… ▽ More The evolution of the Internet has increased the amount of information that is expressed by people on different platforms. This information can be product reviews, discussions on forums, or social media platforms. Accessibility of these opinions and peoples feelings open the door to opinion mining and sentiment analysis. As language and speech technologies become more advanced, many languages have been used and the best models have been obtained. However, due to linguistic diversity and lack of datasets, African languages have been left behind. In this study, by using the current state-of-the-art model, multilingual BERT, we perform sentiment classification on Swahili datasets. The data was created by extracting and annotating 8.2k reviews and comments on different social media platforms and the ISEAR emotion dataset. The data were classified as either positive or negative. The model was fine-tuned and achieve the best accuracy of 87.59%. △ Less

Submitted 18 April, 2021; originally announced April 2021.

Comments: Accepted to African NLP Workshop, EACL 2021 (non-archival)

arXiv:2104.07560 [pdf, other]

Rethinking Automatic Evaluation in Sentence Simplification

Authors: Thomas Scialom, Louis Martin, Jacopo Staiano, Éric Villemonte de la Clergerie, Benoît Sagot

Abstract: Automatic evaluation remains an open research question in Natural Language Generation. In the context of Sentence Simplification, this is particularly challenging: the task requires by nature to replace complex words with simpler ones that shares the same meaning. This limits the effectiveness of n-gram based metrics like BLEU. Going hand in hand with the recent advances in NLG, new metrics have b… ▽ More Automatic evaluation remains an open research question in Natural Language Generation. In the context of Sentence Simplification, this is particularly challenging: the task requires by nature to replace complex words with simpler ones that shares the same meaning. This limits the effectiveness of n-gram based metrics like BLEU. Going hand in hand with the recent advances in NLG, new metrics have been proposed, such as BERTScore for Machine Translation. In summarization, the QuestEval metric proposes to automatically compare two texts by questioning them. In this paper, we first propose a simple modification of QuestEval allowing it to tackle Sentence Simplification. We then extensively evaluate the correlations w.r.t. human judgement for several metrics including the recent BERTScore and QuestEval, and show that the latter obtain state-of-the-art correlations, outperforming standard metrics like BLEU and SARI. More importantly, we also show that a large part of the correlations are actually spurious for all the metrics. To investigate this phenomenon further, we release a new corpus of evaluated simplifications, this time not generated by systems but instead, written by humans. This allows us to remove the spurious correlations and draw very different conclusions from the original ones, resulting in a better understanding of these metrics. In particular, we raise concerns about very low correlations for most of traditional metrics. Our results show that the only significant measure of the Meaning Preservation is our adaptation of QuestEval. △ Less

Submitted 16 April, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

Comments: updated affiliation and link to data

arXiv:2102.05961 [pdf]

Empirical Analysis on Productivity Prediction and Locality for Use Case Points Method

Authors: Mohammad Azzeh, Ali Bou Nassif, Cuauhtemoc Lopez Martin

Abstract: Use Case Points (UCP) method has been around for over two decades. Although, there was a substantial criticism concerning the algebraic construction and factors assessment of UCP, it remains an efficient early size estimation method. Predicting software effort from UCP is still an ever-present challenge. The earlier version of UCP method suggested using productivity as a cost driver, where fixed o… ▽ More Use Case Points (UCP) method has been around for over two decades. Although, there was a substantial criticism concerning the algebraic construction and factors assessment of UCP, it remains an efficient early size estimation method. Predicting software effort from UCP is still an ever-present challenge. The earlier version of UCP method suggested using productivity as a cost driver, where fixed or a few pre-defined productivity ratios have been widely agreed. While this approach was successful when no enough historical data is available, it is no longer acceptable because software projects are different in terms of development aspects. Therefore, it is better to understand the relationship between productivity and other UCP variables. This paper examines the impact of data locality approaches on productivity and effort prediction from multiple UCP variables. The environmental factors are used as partitioning factors to produce local homogeneous data either based on their influential levels or using clustering algorithms. Different machine learning methods, including solo and ensemble methods, are used to construct productivity and effort prediction models based on the local data. The results demonstrate that the prediction models that are created based on local data surpass models that use entire data. Also, the results show that conforming the hypothetical assumption between productivity and environmental factors is not necessarily a requirement for success of locality. △ Less

Submitted 11 February, 2021; originally announced February 2021.

Comments: Paper accepted in Software Quality Journal, Springer

arXiv:2012.02938 [pdf, other]

Cirrus: A Long-range Bi-pattern LiDAR Dataset

Authors: Ze Wang, Sihao Ding, Ying Li, Jonas Fenn, Sohini Roychowdhury, Andreas Wallin, Lane Martin, Scott Ryvola, Guillermo Sapiro, Qiang Qiu

Abstract: In this paper, we introduce Cirrus, a new long-range bi-pattern LiDAR public dataset for autonomous driving tasks such as 3D object detection, critical to highway driving and timely decision making. Our platform is equipped with a high-resolution video camera and a pair of LiDAR sensors with a 250-meter effective range, which is significantly longer than existing public datasets. We record paired… ▽ More In this paper, we introduce Cirrus, a new long-range bi-pattern LiDAR public dataset for autonomous driving tasks such as 3D object detection, critical to highway driving and timely decision making. Our platform is equipped with a high-resolution video camera and a pair of LiDAR sensors with a 250-meter effective range, which is significantly longer than existing public datasets. We record paired point clouds simultaneously using both Gaussian and uniform scanning patterns. Point density varies significantly across such a long range, and different scanning patterns further diversify object representation in LiDAR. In Cirrus, eight categories of objects are exhaustively annotated in the LiDAR point clouds for the entire effective range. To illustrate the kind of studies supported by this new dataset, we introduce LiDAR model adaptation across different ranges, scanning patterns, and sensor devices. Promising results show the great potential of this new dataset to the robotics and computer vision communities. △ Less

Submitted 4 December, 2020; originally announced December 2020.

arXiv:2011.02068 [pdf, other]

Exhaustive Entity Recognition for Coptic: Challenges and Solutions

Authors: Amir Zeldes, Lance Martin, Sichang Tu

Abstract: Entity recognition provides semantic access to ancient materials in the Digital Humanities: itexposes people and places of interest in texts that cannot be read exhaustively, facilitates linkingresources and can provide a window into text contents, even for texts with no translations. Inthis paper we present entity recognition for Coptic, the language of Hellenistic era Egypt. Weevaluate NLP appro… ▽ More Entity recognition provides semantic access to ancient materials in the Digital Humanities: itexposes people and places of interest in texts that cannot be read exhaustively, facilitates linkingresources and can provide a window into text contents, even for texts with no translations. Inthis paper we present entity recognition for Coptic, the language of Hellenistic era Egypt. Weevaluate NLP approaches to the task and lay out difficulties in applying them to a low-resource,morphologically complex language. We present solutions for named and non-named nested en-tity recognition and semi-automatic entity linking to Wikipedia, relying on robust dependencyparsing, feature-based CRF models, and hand-crafted knowledge base resources, enabling highaccuracy NER with orders of magnitude less data than those used for high resource languages.The results suggest avenues for research on other languages in similar settings. △ Less

Submitted 3 November, 2020; originally announced November 2020.

Comments: 9 pages, 2 figures, 5 tables. Accepted by The 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

MSC Class: 68-06; 68-04

arXiv:2008.03900 [pdf, ps, other]

Wikidata Constraints on MARS (Extended Technical Report)

Authors: David L. Martin, Peter F. Patel-Schneider

Abstract: Wikidata constraints, albeit useful, are represented and processed in an incomplete, ad hoc fashion. Constraint declarations do not fully express their meaning, and thus do not provide a precise, unambiguous basis for constraint specification, or a logical foundation for constraint-checking implementations. In prior work we have proposed a logical framework for Wikidata as a whole, based on multi-… ▽ More Wikidata constraints, albeit useful, are represented and processed in an incomplete, ad hoc fashion. Constraint declarations do not fully express their meaning, and thus do not provide a precise, unambiguous basis for constraint specification, or a logical foundation for constraint-checking implementations. In prior work we have proposed a logical framework for Wikidata as a whole, based on multi-attributed relational structures (MARS) and related logical languages. In this paper we explain how constraints are handled in the proposed framework, and show that nearly all of Wikidata's existing property constraints can be completely characterized in it, in a natural and economical fashion. We also give characterizations for several proposed property constraints, and show that a variety of non-property constraints can be handled in the same framework. △ Less

Submitted 16 August, 2020; v1 submitted 10 August, 2020; originally announced August 2020.

Comments: 22 pages, no figures. V2 includes a title change, revision of the abstract, and a handful of minor changes in the body of the paper and the appendix

arXiv:2007.04725 [pdf, other]

EVO-RL: Evolutionary-Driven Reinforcement Learning

Authors: Ahmed Hallawa, Thorsten Born, Anke Schmeink, Guido Dartmann, Arne Peine, Lukas Martin, Giovanni Iacca, A. E. Eiben, Gerd Ascheid

Abstract: In this work, we propose a novel approach for reinforcement learning driven by evolutionary computation. Our algorithm, dubbed as Evolutionary-Driven Reinforcement Learning (evo-RL), embeds the reinforcement learning algorithm in an evolutionary cycle, where we distinctly differentiate between purely evolvable (instinctive) behaviour versus purely learnable behaviour. Furthermore, we propose that… ▽ More In this work, we propose a novel approach for reinforcement learning driven by evolutionary computation. Our algorithm, dubbed as Evolutionary-Driven Reinforcement Learning (evo-RL), embeds the reinforcement learning algorithm in an evolutionary cycle, where we distinctly differentiate between purely evolvable (instinctive) behaviour versus purely learnable behaviour. Furthermore, we propose that this distinction is decided by the evolutionary process, thus allowing evo-RL to be adaptive to different environments. In addition, evo-RL facilitates learning on environments with rewardless states, which makes it more suited for real-world problems with incomplete information. To show that evo-RL leads to state-of-the-art performance, we present the performance of different state-of-the-art reinforcement learning algorithms when operating within evo-RL and compare it with the case when these same algorithms are executed independently. Results show that reinforcement learning algorithms embedded within our evo-RL approach significantly outperform the stand-alone versions of the same RL algorithms on OpenAI Gym control problems with rewardless states constrained by the same computational budget. △ Less

Submitted 10 July, 2020; v1 submitted 9 July, 2020; originally announced July 2020.

Comments: 9 pages, 7 figures

arXiv:2006.02000 [pdf, other]

MultiXNet: Multiclass Multistage Multimodal Motion Prediction

Authors: Nemanja Djuric, Henggang Cui, Zhaoen Su, Shangxuan Wu, Huahua Wang, Fang-Chieh Chou, Luisa San Martin, Song Feng, Rui Hu, Yang Xu, Alyssa Dayan, Sidney Zhang, Brian C. Becker, Gregory P. Meyer, Carlos Vallespi-Gonzalez, Carl K. Wellington

Abstract: One of the critical pieces of the self-driving puzzle is understanding the surroundings of a self-driving vehicle (SDV) and predicting how these surroundings will change in the near future. To address this task we propose MultiXNet, an end-to-end approach for detection and motion prediction based directly on lidar sensor data. This approach builds on prior work by handling multiple classes of traf… ▽ More One of the critical pieces of the self-driving puzzle is understanding the surroundings of a self-driving vehicle (SDV) and predicting how these surroundings will change in the near future. To address this task we propose MultiXNet, an end-to-end approach for detection and motion prediction based directly on lidar sensor data. This approach builds on prior work by handling multiple classes of traffic actors, adding a jointly trained second-stage trajectory refinement step, and producing a multimodal probability distribution over future actor motion that includes both multiple discrete traffic behaviors and calibrated continuous position uncertainties. The method was evaluated on large-scale, real-world data collected by a fleet of SDVs in several cities, with the results indicating that it outperforms existing state-of-the-art approaches. △ Less

Submitted 24 May, 2021; v1 submitted 2 June, 2020; originally announced June 2020.

Comments: Accepted for publication at IEEE Intelligent Vehicles Symposium (IV) 2021

arXiv:2005.00481 [pdf, other]

ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations

Authors: Fernando Alva-Manchego, Louis Martin, Antoine Bordes, Carolina Scarton, Benoît Sagot, Lucia Specia

Abstract: In order to simplify a sentence, human editors perform multiple rewriting transformations: they split it into several shorter sentences, paraphrase words (i.e. replacing complex words or phrases by simpler synonyms), reorder components, and/or delete information deemed unnecessary. Despite these varied range of possible text alterations, current models for automatic sentence simplification are eva… ▽ More In order to simplify a sentence, human editors perform multiple rewriting transformations: they split it into several shorter sentences, paraphrase words (i.e. replacing complex words or phrases by simpler synonyms), reorder components, and/or delete information deemed unnecessary. Despite these varied range of possible text alterations, current models for automatic sentence simplification are evaluated using datasets that are focused on a single transformation, such as lexical paraphrasing or splitting. This makes it impossible to understand the ability of simplification models in more realistic settings. To alleviate this limitation, this paper introduces ASSET, a new dataset for assessing sentence simplification in English. ASSET is a crowdsourced multi-reference corpus where each simplification was produced by executing several rewriting transformations. Through quantitative and qualitative experiments, we show that simplifications in ASSET are better at capturing characteristics of simplicity when compared to other standard evaluation datasets for the task. Furthermore, we motivate the need for develo** better methods for automatic evaluation using ASSET, since we show that current popular metrics may not be suitable when multiple simplification transformations are performed. △ Less

Submitted 1 May, 2020; originally announced May 2020.

Comments: Accepted to ACL 2020 (camera-ready version)

arXiv:2005.00352 [pdf, other]

MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases

Authors: Louis Martin, Angela Fan, Éric de la Clergerie, Antoine Bordes, Benoît Sagot

Abstract: Progress in sentence simplification has been hindered by a lack of labeled parallel simplification data, particularly in languages other than English. We introduce MUSS, a Multilingual Unsupervised Sentence Simplification system that does not require labeled simplification data. MUSS uses a novel approach to sentence simplification that trains strong models using sentence-level paraphrase data ins… ▽ More Progress in sentence simplification has been hindered by a lack of labeled parallel simplification data, particularly in languages other than English. We introduce MUSS, a Multilingual Unsupervised Sentence Simplification system that does not require labeled simplification data. MUSS uses a novel approach to sentence simplification that trains strong models using sentence-level paraphrase data instead of proper simplification data. These models leverage unsupervised pretraining and controllable generation mechanisms to flexibly adjust attributes such as length and lexical complexity at inference time. We further present a method to mine such paraphrase data in any language from Common Crawl using semantic sentence embeddings, thus removing the need for labeled data. We evaluate our approach on English, French, and Spanish simplification benchmarks and closely match or outperform the previous best supervised results, despite not using any labeled simplification data. We push the state of the art further by incorporating labeled simplification data. △ Less

Submitted 16 April, 2021; v1 submitted 1 May, 2020; originally announced May 2020.

arXiv:1911.03894 [pdf, other]

doi 10.18653/v1/2020.acl-main.645

CamemBERT: a Tasty French Language Model

Authors: Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah, Benoît Sagot

Abstract: Pretrained language models are now ubiquitous in Natural Language Processing. Despite their success, most available models have either been trained on English data or on the concatenation of data in multiple languages. This makes practical use of such models --in all languages except English-- very limited. In this paper, we investigate the feasibility of training monolingual Transformer-based lan… ▽ More Pretrained language models are now ubiquitous in Natural Language Processing. Despite their success, most available models have either been trained on English data or on the concatenation of data in multiple languages. This makes practical use of such models --in all languages except English-- very limited. In this paper, we investigate the feasibility of training monolingual Transformer-based language models for other languages, taking French as an example and evaluating our language models on part-of-speech tagging, dependency parsing, named entity recognition and natural language inference tasks. We show that the use of web crawled data is preferable to the use of Wikipedia data. More surprisingly, we show that a relatively small web crawled dataset (4GB) leads to results that are as good as those obtained using larger datasets (130+GB). Our best performing model CamemBERT reaches or improves the state of the art in all four downstream tasks. △ Less

Submitted 21 May, 2020; v1 submitted 10 November, 2019; originally announced November 2019.

Comments: ACL 2020 long paper. Web site: https://camembert-model.fr

Journal ref: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, July 2020, Online

arXiv:1910.12589 [pdf, other]

Forecasting the Success of Television Series using Machine Learning

Authors: Ramya Akula, Zachary Wieselthier, Laura Martin, Ivan Garibay

Abstract: Television is an ever-evolving multi billion dollar industry. The success of a television show in an increasingly technological society is a vast multi-variable formula. The art of success is not just something that happens, but is studied, replicated, and applied. Hollywood can be unpredictable regarding success, as many movies and sitcoms that are hyped up and promise to be a hit end up being bo… ▽ More Television is an ever-evolving multi billion dollar industry. The success of a television show in an increasingly technological society is a vast multi-variable formula. The art of success is not just something that happens, but is studied, replicated, and applied. Hollywood can be unpredictable regarding success, as many movies and sitcoms that are hyped up and promise to be a hit end up being box office failures and complete disappointments. In current studies, linguistic exploration is being performed on the relationship between Television series and target community of viewers. Having a decision support system that can display sound and predictable results would be needed to build confidence in the investment of a new TV series. The models presented in this study use data to study and determine what makes a sitcom successful. In this paper, we use descriptive and predictive modeling techniques to assess the continuing success of television comedies: The Office, Big Bang Theory, Arrested Development, Scrubs, and South Park. The factors that are tested for statistical significance on episode ratings are character presence, director, and writer. These statistics show that while characters are indeed crucial to the shows themselves, the creation and direction of the shows pose implication upon the ratings and therefore the success of the shows. We use machine learning based forecasting models to accurately predict the success of shows. The models represent a baseline to understanding the success of a television show and how producers can increase the success of current television shows or utilize this data in the creation of future shows. Due to the many factors that go into a series, the empirical analysis in this work shows that there is no one-fits-all model to forecast the rating or success of a television show. △ Less

Submitted 17 October, 2019; originally announced October 2019.

Comments: 9 Pages, 10 Figures and 2 Tables

arXiv:1910.02677 [pdf, other]

Controllable Sentence Simplification

Authors: Louis Martin, Benoît Sagot, Éric de la Clergerie, Antoine Bordes

Abstract: Text simplification aims at making a text easier to read and understand by simplifying grammar and structure while kee** the underlying information identical. It is often considered an all-purpose generic task where the same simplification is suitable for all; however multiple audiences can benefit from simplified text in different ways. We adapt a discrete parametrization mechanism that provide… ▽ More Text simplification aims at making a text easier to read and understand by simplifying grammar and structure while kee** the underlying information identical. It is often considered an all-purpose generic task where the same simplification is suitable for all; however multiple audiences can benefit from simplified text in different ways. We adapt a discrete parametrization mechanism that provides explicit control on simplification systems based on Sequence-to-Sequence models. As a result, users can condition the simplifications returned by a model on attributes such as length, amount of paraphrasing, lexical complexity and syntactic complexity. We also show that carefully chosen values of these attributes allow out-of-the-box Sequence-to-Sequence models to outperform their standard counterparts on simplification benchmarks. Our model, which we call ACCESS (as shorthand for AudienCe-CEntric Sentence Simplification), establishes the state of the art at 41.87 SARI on the WikiLarge test set, a +1.42 improvement over the best previously reported score. △ Less

Submitted 20 April, 2020; v1 submitted 7 October, 2019; originally announced October 2019.

Comments: Code and models: https://github.com/facebookresearch/access

arXiv:1909.12537 [pdf, other]

Fast shared response model for fMRI data

Authors: Hugo Richard, Lucas Martin, Ana Luısa Pinho, Jonathan Pillow, Bertrand Thirion

Abstract: The shared response model provides a simple but effective framework to analyse fMRI data of subjects exposed to naturalistic stimuli. However when the number of subjects or runs is large, fitting the model requires a large amount of memory and computational power, which limits its use in practice. In this work, we introduce the FastSRM algorithm that relies on an intermediate atlas-based represent… ▽ More The shared response model provides a simple but effective framework to analyse fMRI data of subjects exposed to naturalistic stimuli. However when the number of subjects or runs is large, fitting the model requires a large amount of memory and computational power, which limits its use in practice. In this work, we introduce the FastSRM algorithm that relies on an intermediate atlas-based representation. It provides considerable speed-up in time and memory usage, hence it allows easy and fast large-scale analysis of naturalistic-stimulus fMRI data. Using four different datasets, we show that our method matches the performance of the original SRM algorithm while being about 5x faster and 20x to 40x more memory efficient. Based on this contribution, we use FastSRM to predict age from movie watching data on the CamCAN sample. Besides delivering accurate predictions (mean absolute error of 7.5 years), FastSRM extracts topographic patterns that are predictive of age, demonstrating that brain activity during free perception reflects age. △ Less

Submitted 3 December, 2019; v1 submitted 27 September, 2019; originally announced September 2019.

arXiv:1909.12469 [pdf]

Telescope: an interactive tool for managing large scale analysis from mobile devices

Authors: Jaqueline J. Brito, Thiago Mosqueiro, Jeremy Rotman, Victor Xue, Douglas J. Chapski, Juan De la Hoz, Paulo Matias, Lana Martin, Alex Zelikovsky, Matteo Pellegrinni, Serghei Mangul

Abstract: In today's world of big data, computational analysis has become a key driver of biomedical research. Recent exponential growth in the volume of available omics data has reshaped the landscape of contemporary biology, creating demand for a continuous feedback loop that seamlessly integrates experimental biology techniques and bioinformatics tools. High-performance computational facilities are capab… ▽ More In today's world of big data, computational analysis has become a key driver of biomedical research. Recent exponential growth in the volume of available omics data has reshaped the landscape of contemporary biology, creating demand for a continuous feedback loop that seamlessly integrates experimental biology techniques and bioinformatics tools. High-performance computational facilities are capable of processing considerable volumes of data, yet often lack an easy-to-use interface to guide the user in supervising and adjusting bioinformatics analysis in real-time. Here we report the development of Telescope, a novel interactive tool that interfaces with high-performance computational clusters to deliver an intuitive user interface for controlling and monitoring bioinformatics analyses in real-time. Telescope was designed to natively operate with a simple and straightforward interface using Web 2.0 technology compatible with most modern devices (e.g., tablets and personal smartphones). Telescope provides a modern and elegant solution to integrate computational analyses into the experimental environment of biomedical research. Additionally, it allows biomedical researchers to leverage the power of large computational facilities in a user-friendly manner. Telescope is freely available at https://github.com/Mangul-Lab-USC/telescope. △ Less

Submitted 5 December, 2019; v1 submitted 26 September, 2019; originally announced September 2019.

arXiv:1909.03480 [pdf, other]

doi 10.1609/aaai.v34i05.6232

Story Realization: Expanding Plot Events into Sentences

Authors: Prithviraj Ammanabrolu, Ethan Tien, Wesley Cheung, Zhaochen Luo, William Ma, Lara J. Martin, Mark O. Riedl

Abstract: Neural network based approaches to automated story plot generation attempt to learn how to generate novel plots from a corpus of natural language plot summaries. Prior work has shown that a semantic abstraction of sentences called events improves neural plot generation and and allows one to decompose the problem into: (1) the generation of a sequence of events (event-to-event) and (2) the transfor… ▽ More Neural network based approaches to automated story plot generation attempt to learn how to generate novel plots from a corpus of natural language plot summaries. Prior work has shown that a semantic abstraction of sentences called events improves neural plot generation and and allows one to decompose the problem into: (1) the generation of a sequence of events (event-to-event) and (2) the transformation of these events into natural language sentences (event-to-sentence). However, typical neural language generation approaches to event-to-sentence can ignore the event details and produce grammatically-correct but semantically-unrelated sentences. We present an ensemble-based model that generates natural language guided by events.We provide results---including a human subjects study---for a full end-to-end automated story generation system showing that our method generates more coherent and plausible stories than baseline approaches. △ Less

Submitted 21 November, 2019; v1 submitted 8 September, 2019; originally announced September 2019.

Comments: In proceedings of AAAI 2020

Journal ref: AAAI Conference on Artificial Intelligence (AAAI), vol. 34, no. 5, pp. 7375-7382, Apr. 2020

arXiv:1908.04567 [pdf, other]

EASSE: Easier Automatic Sentence Simplification Evaluation

Authors: Fernando Alva-Manchego, Louis Martin, Carolina Scarton, Lucia Specia

Abstract: We introduce EASSE, a Python package aiming to facilitate and standardise automatic evaluation and comparison of Sentence Simplification (SS) systems. EASSE provides a single access point to a broad range of evaluation resources: standard automatic metrics for assessing SS outputs (e.g. SARI), word-level accuracy scores for certain simplification transformations, reference-independent quality esti… ▽ More We introduce EASSE, a Python package aiming to facilitate and standardise automatic evaluation and comparison of Sentence Simplification (SS) systems. EASSE provides a single access point to a broad range of evaluation resources: standard automatic metrics for assessing SS outputs (e.g. SARI), word-level accuracy scores for certain simplification transformations, reference-independent quality estimation features (e.g. compression ratio), and standard test data for SS evaluation (e.g. TurkCorpus). Finally, EASSE generates easy-to-visualise reports on the various metrics and features above and on how a particular SS output fares against reference simplifications. Through experiments, we show that these functionalities allow for better comparison and understanding of the performance of SS systems. △ Less

Submitted 13 September, 2019; v1 submitted 13 August, 2019; originally announced August 2019.

Comments: EMNLP-IJCNLP 2019 Demo (Camera-ready Version)

arXiv:1901.10746 [pdf, other]

Reference-less Quality Estimation of Text Simplification Systems

Authors: Louis Martin, Samuel Humeau, Pierre-Emmanuel Mazaré, Antoine Bordes, Éric Villemonte de La Clergerie, Benoît Sagot

Abstract: The evaluation of text simplification (TS) systems remains an open challenge. As the task has common points with machine translation (MT), TS is often evaluated using MT metrics such as BLEU. However, such metrics require high quality reference data, which is rarely available for TS. TS has the advantage over MT of being a monolingual task, which allows for direct comparisons to be made between th… ▽ More The evaluation of text simplification (TS) systems remains an open challenge. As the task has common points with machine translation (MT), TS is often evaluated using MT metrics such as BLEU. However, such metrics require high quality reference data, which is rarely available for TS. TS has the advantage over MT of being a monolingual task, which allows for direct comparisons to be made between the simplified text and its original version. In this paper, we compare multiple approaches to reference-less quality estimation of sentence-level text simplification systems, based on the dataset used for the QATS 2016 shared task. We distinguish three different dimensions: gram-maticality, meaning preservation and simplicity. We show that n-gram-based MT metrics such as BLEU and METEOR correlate the most with human judgment of grammaticality and meaning preservation, whereas simplicity is best evaluated by basic length-based metrics. △ Less

Submitted 30 January, 2019; originally announced January 2019.

Journal ref: 1st Workshop on Automatic Text Adaptation (ATA), Nov 2018, Tilburg, Netherlands. https://www.ida.liu.se/~evere22/ATA-18/

arXiv:1809.10736 [pdf, other]

doi 10.24963/ijcai.2019/829

Controllable Neural Story Plot Generation via Reward Sha**

Authors: Pradyumna Tambwekar, Murtaza Dhuliawala, Lara J. Martin, Animesh Mehta, Brent Harrison, Mark O. Riedl

Abstract: Language-modeling--based approaches to story plot generation attempt to construct a plot by sampling from a language model (LM) to predict the next character, word, or sentence to add to the story. LM techniques lack the ability to receive guidance from the user to achieve a specific goal, resulting in stories that don't have a clear sense of progression and lack coherence. We present a reward-sha… ▽ More Language-modeling--based approaches to story plot generation attempt to construct a plot by sampling from a language model (LM) to predict the next character, word, or sentence to add to the story. LM techniques lack the ability to receive guidance from the user to achieve a specific goal, resulting in stories that don't have a clear sense of progression and lack coherence. We present a reward-sha** technique that analyzes a story corpus and produces intermediate rewards that are backpropagated into a pre-trained LM in order to guide the model towards a given goal. Automated evaluations show our technique can create a model that generates story plots which consistently achieve a specified goal. Human-subject studies show that the generated stories have more plausible event ordering than baseline plot generation techniques. △ Less

Submitted 18 January, 2023; v1 submitted 27 September, 2018; originally announced September 2018.

Comments: Pradyumna Tambwekar & Murtaza Dhuliawala contributed equally

Journal ref: In International Joint Conference on Artificial Intelligence (IJCAI), Macau, China, Jul. 2019, pp. 5982-5988

Showing 1–50 of 58 results for author: Martin, L