-
Scalable and Domain-General Abstractive Proposition Segmentation
Authors:
Mohammad Javad Hosseini,
Yang Gao,
Tim Baumgärtner,
Alex Fabrikant,
Reinald Kim Amplayo
Abstract:
Segmenting text into fine-grained units of meaning is important to a wide range of NLP applications. The default approach of segmenting text into sentences is often insufficient, especially since sentences are usually complex enough to include multiple units of meaning that merit separate treatment in the downstream task. We focus on the task of abstractive proposition segmentation: transforming text into simple, self-contained, well-formed sentences. Several recent works have demonstrated the utility of proposition segmentation with few-shot prompted LLMs for downstream tasks such as retrieval-augmented grounding and fact verification. However, this approach does not scale to large amounts of text and may not always extract all the facts from the input text. In this paper, we first introduce evaluation metrics for the task to measure several dimensions of quality. We then propose a scalable, yet accurate, proposition segmentation model. We model proposition segmentation as a supervised task by training LLMs on existing annotated datasets and show that training yields significantly improved results. We further show that by using the fine-tuned LLMs as teachers for annotating large amounts of multi-domain synthetic distillation data, we can train smaller student models with results similar to the teacher LLMs. We then demonstrate that our technique leads to effective domain generalization, by annotating data in two domains outside the original training data and evaluating on them. Finally, as a key contribution of the paper, we share an easy-to-use API for NLP practitioners.
Submitted 28 June, 2024;
originally announced June 2024.
-
Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data
Authors:
Tim Baumgärtner,
Yang Gao,
Dana Alon,
Donald Metzler
Abstract:
Reinforcement Learning from Human Feedback (RLHF) is a popular method for aligning Language Models (LMs) with human values and preferences. RLHF requires a large number of preference pairs as training data, which are often used in both Supervised Fine-Tuning and Reward Model training; therefore, publicly available datasets are commonly used. In this work, we study to what extent a malicious actor can manipulate an LM's generations by poisoning the preferences, i.e., injecting poisonous preference pairs into these datasets and the RLHF training process. We propose strategies to build poisonous preference pairs and test their performance by poisoning two widely used preference datasets. Our results show that preference poisoning is highly effective: by injecting a small amount of poisonous data (1-5% of the original dataset), we can effectively manipulate the LM to generate a target entity in a target sentiment (positive or negative). The findings from our experiments also shed light on strategies to defend against the preference poisoning attack.
Submitted 8 April, 2024;
originally announced April 2024.
-
Towards clinical translation of deep-learning based classification of DSA image sequences for stroke treatment
Authors:
Timo Baumgärtner,
Benjamin J. Mittmann,
Till Malzacher,
Johannes Roßkopf,
Michael Braun,
Bernd Schmitz,
Alfred M. Franz
Abstract:
In the event of a stroke, a catheter-guided procedure (thrombectomy) is used to remove blood clots. The feasibility of machine-learning-based automatic classification for thrombus detection on digital subtraction angiography (DSA) sequences has been demonstrated. However, it has not yet been used live in the clinic. We present an open-source tool for automatic thrombus classification and test it on three selected clinical cases regarding functionality and classification runtime. With our trained model, all large vessel occlusions in the M1 segment were correctly classified. One small remaining M3 thrombus was not detected. Runtime ranged from 1 to 10 seconds, depending on the hardware used. We conclude that our open-source software tool enables clinical staff to classify DSA sequences in (close to) real time and can be used for further studies in clinics.
Submitted 23 May, 2023;
originally announced June 2023.
-
Monocular 3D Human Pose Estimation for Sports Broadcasts using Partial Sports Field Registration
Authors:
Tobias Baumgartner,
Stefanie Klatt
Abstract:
The filming of sporting events projects and flattens the movement of athletes in the world onto a 2D broadcast image. The pixel locations of joints in these images can be detected with high validity. Recovering the actual 3D movement of the athletes' limbs (kinematics) requires lifting these 2D pixel locations back into a third dimension, which implies a certain scene geometry. The well-known line markings of sports fields allow for calibrating the camera and determining the actual geometry of the scene. However, extracting detailed kinematics requires close-up shots of athletes, which in turn obscure the field markers needed for camera calibration. We suggest partial sports field registration, which determines a set of scene-consistent camera calibrations up to a single degree of freedom. Through joint optimization of 3D pose estimation and camera calibration, we demonstrate the successful extraction of 3D running kinematics on a 400m track. In this work, we combine advances in 2D human pose estimation and camera calibration via partial sports field registration to demonstrate an avenue for collecting valid large-scale kinematic datasets. We generate a synthetic dataset of more than 10k images in Unreal Engine 5 with different viewpoints, running styles, and body types to show the limitations of existing monocular 3D HPE methods. Synthetic data and code are available at https://github.com/tobibaum/PartialSportsFieldReg_3DHPE.
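To make the role of field markings concrete, here is a minimal sketch, not the authors' code. It assumes a calibration has already been expressed as a 3x3 homography from the (planar) field to the image, and it maps a ground-contact pixel, such as a foot, back to field coordinates. The names `inverse3`, `apply_h`, and `pixel_to_field` are hypothetical. The sketch covers only points on the ground plane: lifting joints above the ground needs the full camera model, and the single remaining degree of freedom left by partial registration is not modelled here.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def inverse3(m: Matrix) -> Matrix:
    """Inverse of a 3x3 matrix via the adjugate (cofactor) formula."""
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("homography is singular")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[v / det for v in row] for row in adj]


def apply_h(hmat: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Apply a homography to a 2D point using homogeneous coordinates."""
    u = hmat[0][0] * x + hmat[0][1] * y + hmat[0][2]
    v = hmat[1][0] * x + hmat[1][1] * y + hmat[1][2]
    w = hmat[2][0] * x + hmat[2][1] * y + hmat[2][2]
    return u / w, v / w


def pixel_to_field(h_field_to_image: Matrix, px: float, py: float) -> Tuple[float, float]:
    """Map an image pixel lying on the ground plane back to field coordinates."""
    return apply_h(inverse3(h_field_to_image), px, py)
```

For example, with a pure scale-and-shift homography `[[2, 0, 10], [0, 2, 20], [0, 0, 1]]`, the pixel (16, 28) maps back to the field point (3, 4).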
Submitted 10 April, 2023;
originally announced April 2023.
-
UKP-SQuARE v3: A Platform for Multi-Agent QA Research
Authors:
Haritz Puerto,
Tim Baumgärtner,
Rachneet Sachdeva,
Haishuo Fang,
Hao Zhang,
Sewin Tariverdian,
Kexin Wang,
Iryna Gurevych
Abstract:
The continuous development of Question Answering (QA) datasets has drawn the research community's attention toward multi-domain models. A popular approach is to use multi-dataset models, which are models trained on multiple datasets to learn their regularities and prevent overfitting to a single dataset. However, with the proliferation of QA models in online repositories such as GitHub or Hugging Face, an alternative is becoming viable. Recent works have demonstrated that combining expert agents can yield large performance gains over multi-dataset models. To ease research in multi-agent models, we extend UKP-SQuARE, an online platform for QA research, to support three families of multi-agent systems: i) agent selection, ii) early-fusion of agents, and iii) late-fusion of agents. We conduct experiments to evaluate their inference speed and discuss the performance vs. speed trade-off compared to multi-dataset models. UKP-SQuARE is open-source and publicly available at http://square.ukp-lab.de.
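As a rough illustration of the third family, late fusion of agents, the sketch below assumes that each agent (Skill) returns a list of (answer, score) pairs and combines them by summing scores per normalized answer string. The function name and the summing rule are illustrative assumptions, not UKP-SQuARE's API or the fusion method evaluated in the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def late_fusion(predictions: List[List[Tuple[str, float]]]) -> str:
    """Combine per-agent (answer, score) lists by summing scores per normalized answer.

    Raises ValueError if no agent returned any answer.
    """
    totals: Dict[str, float] = defaultdict(float)
    for agent_preds in predictions:
        for answer, score in agent_preds:
            # Normalize surface form so "Paris" and "paris " count as one answer.
            totals[answer.strip().lower()] += score
    return max(totals, key=totals.get)
```

Summing is only one choice; max-pooling or weighting agents by validation accuracy are equally simple variants, and each run of all agents is what makes late fusion slower than a single multi-dataset model.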
Submitted 17 May, 2023; v1 submitted 31 March, 2023;
originally announced March 2023.
-
Incorporating Relevance Feedback for Information-Seeking Retrieval using Few-Shot Document Re-Ranking
Authors:
Tim Baumgärtner,
Leonardo F. R. Ribeiro,
Nils Reimers,
Iryna Gurevych
Abstract:
Pairing a lexical retriever with a neural re-ranking model has set state-of-the-art performance on large-scale information retrieval datasets. This pipeline covers scenarios such as question answering or navigational queries; however, in information-seeking scenarios, users often indicate whether a document is relevant to their query in the form of clicks or explicit feedback. Therefore, in this work, we explore how relevance feedback can be directly integrated into neural re-ranking models by adopting few-shot and parameter-efficient learning techniques. Specifically, we introduce a kNN approach that re-ranks documents based on their similarity with the query and the documents the user considers relevant. Further, we explore Cross-Encoder models that we pre-train using meta-learning and subsequently fine-tune for each query, training only on the feedback documents. To evaluate our different integration strategies, we transform four existing information retrieval datasets into the relevance feedback scenario. Extensive experiments demonstrate that integrating relevance feedback directly in neural re-ranking models improves their performance, and fusing lexical ranking with our best-performing neural re-ranker outperforms all other methods by 5.2 nDCG@20.
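The kNN idea can be sketched as follows, assuming the query and documents are already embedded as dense vectors: each candidate is scored by interpolating its cosine similarity to the query with its mean cosine similarity to the feedback documents. The interpolation weight `alpha`, the use of the mean, and the function names are assumptions for illustration; the paper's exact scoring may differ.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_rerank(
    query: Sequence[float],
    feedback: List[Sequence[float]],
    candidates: Dict[str, Sequence[float]],
    alpha: float = 0.5,  # hypothetical weight between query and feedback similarity
) -> List[str]:
    """Rank candidate ids by query similarity mixed with similarity to relevant documents."""

    def score(doc: Sequence[float]) -> float:
        q_sim = cosine(query, doc)
        fb_sim = sum(cosine(f, doc) for f in feedback) / len(feedback) if feedback else 0.0
        return alpha * q_sim + (1 - alpha) * fb_sim

    # sorted() is stable, so ties keep the candidates' original order.
    return sorted(candidates, key=lambda cid: score(candidates[cid]), reverse=True)
```

With `alpha=1.0` the ranking ignores feedback entirely; lowering `alpha` pulls documents resembling the user's relevant examples upward.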
Submitted 19 October, 2022;
originally announced October 2022.
-
UKP-SQuARE v2: Explainability and Adversarial Attacks for Trustworthy QA
Authors:
Rachneet Sachdeva,
Haritz Puerto,
Tim Baumgärtner,
Sewin Tariverdian,
Hao Zhang,
Kexin Wang,
Hossain Shaikh Saadi,
Leonardo F. R. Ribeiro,
Iryna Gurevych
Abstract:
Question Answering (QA) systems are increasingly deployed in applications where they support real-world decisions. However, state-of-the-art models rely on deep neural networks, which are difficult for humans to interpret. Inherently interpretable models or post hoc explainability methods can help users understand how a model arrives at its prediction and, if successful, increase their trust in the system. Furthermore, researchers can leverage these insights to develop new methods that are more accurate and less biased. In this paper, we introduce SQuARE v2, the new version of SQuARE, to provide an explainability infrastructure for comparing models based on methods such as saliency maps and graph-based explanations. While saliency maps are useful for inspecting the importance of each input token for the model's prediction, graph-based explanations from external Knowledge Graphs enable users to verify the reasoning behind the model prediction. In addition, we provide multiple adversarial attacks to compare the robustness of QA models. With these explainability methods and adversarial attacks, we aim to ease the research on trustworthy QA models. SQuARE is available at https://square.ukp-lab.de.
Submitted 20 October, 2022; v1 submitted 19 August, 2022;
originally announced August 2022.
-
UKP-SQUARE: An Online Platform for Question Answering Research
Authors:
Tim Baumgärtner,
Kexin Wang,
Rachneet Sachdeva,
Max Eichler,
Gregor Geigle,
Clifton Poth,
Hannah Sterz,
Haritz Puerto,
Leonardo F. R. Ribeiro,
Jonas Pfeiffer,
Nils Reimers,
Gözde Gül Şahin,
Iryna Gurevych
Abstract:
Recent advances in NLP and information retrieval have given rise to a diverse set of question answering tasks that are of different formats (e.g., extractive, abstractive), require different model architectures (e.g., generative, discriminative), and use different setups (e.g., with or without retrieval). Despite the large number of powerful, specialized QA pipelines (which we refer to as Skills) that consider a single domain, model, or setup, there exists no framework where users can easily explore and compare such pipelines and extend them according to their needs. To address this issue, we present UKP-SQUARE, an extensible online QA platform for researchers which allows users to query and analyze a large collection of modern Skills via a user-friendly web interface and integrated behavioural tests. In addition, QA researchers can develop, manage, and share their custom Skills using our microservices that support a wide range of models (Transformers, Adapters, ONNX), datastores, and retrieval techniques (e.g., sparse and dense). UKP-SQUARE is available at https://square.ukp-lab.de.
Submitted 28 March, 2022; v1 submitted 25 March, 2022;
originally announced March 2022.
-
On the Realization of Compositionality in Neural Networks
Authors:
Joris Baan,
Jana Leible,
Mitja Nikolaus,
David Rau,
Dennis Ulmer,
Tim Baumgärtner,
Dieuwke Hupkes,
Elia Bruni
Abstract:
We present a detailed comparison of two types of sequence-to-sequence models trained to conduct a compositional task. The models are architecturally identical at inference time, but differ in the way that they are trained: our baseline model is trained with a task-success signal only, while the other model receives additional supervision on its attention mechanism (Attentive Guidance), which has been shown to be an effective method for encouraging more compositional solutions (Hupkes et al., 2019). We first confirm that the models with attentive guidance indeed infer more compositional solutions than the baseline, by training them on the lookup table task presented by Liška et al. (2019). We then do an in-depth analysis of the structural differences between the two model types, focusing in particular on the organisation of the parameter space and the hidden layer activations, and find noticeable differences in both these aspects. Guided networks focus more on the components of the input rather than the sequence as a whole and develop small functional groups of neurons with specific purposes that use their gates more selectively. Results from parameter heat maps, component swapping, and graph analysis also indicate that guided networks exhibit a more modular structure with a small number of specialized, strongly connected neurons.
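A minimal sketch of what additional supervision on the attention mechanism can look like: the guided model adds to its task loss the cross-entropy between each decoder step's attention distribution and a gold input position. This follows the general idea of Attentive Guidance; the function names, the one-hot target alignment, and the `weight` parameter are assumptions for illustration, and setting `weight=0` recovers the baseline trained with the task-success signal only.

```python
import math
from typing import List, Sequence


def attention_guidance_loss(
    attn: List[Sequence[float]],  # attn[t][i]: attention weight on input i at decoder step t
    target_pos: Sequence[int],    # hypothetical gold alignment: one input index per step
) -> float:
    """Mean cross-entropy between each step's attention and a one-hot target position."""
    eps = 1e-12  # guards against log(0) when the gold position gets zero attention
    losses = [-math.log(attn[t][target_pos[t]] + eps) for t in range(len(target_pos))]
    return sum(losses) / len(losses)


def total_loss(
    task_loss: float,
    attn: List[Sequence[float]],
    target_pos: Sequence[int],
    weight: float = 1.0,  # hypothetical strength of the guidance term
) -> float:
    """Task-success loss plus a weighted attention-guidance term."""
    return task_loss + weight * attention_guidance_loss(attn, target_pos)
```

Because the guidance term only shapes training, both models remain architecturally identical at inference time, as the abstract notes.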
Submitted 6 June, 2019; v1 submitted 4 June, 2019;
originally announced June 2019.
-
The PhotoBook Dataset: Building Common Ground through Visually-Grounded Dialogue
Authors:
Janosch Haber,
Tim Baumgärtner,
Ece Takmaz,
Lieke Gelderloos,
Elia Bruni,
Raquel Fernández
Abstract:
This paper introduces the PhotoBook dataset, a large-scale collection of visually-grounded, task-oriented dialogues in English designed to investigate shared dialogue history accumulating during conversation. Taking inspiration from seminal work on dialogue analysis, we propose a data-collection task formulated as a collaborative game prompting two online participants to refer to images utilising both their visual context as well as previously established referring expressions. We provide a detailed description of the task setup and a thorough analysis of the 2,500 dialogues collected. To further illustrate the novel features of the dataset, we propose a baseline model for reference resolution which uses a simple method to take into account shared information accumulated in a reference chain. Our results show that this information is particularly important to resolve later descriptions and underline the need to develop more sophisticated models of common ground in dialogue interaction.
Submitted 26 June, 2019; v1 submitted 4 June, 2019;
originally announced June 2019.
-
Beyond task success: A closer look at jointly learning to see, ask, and GuessWhat
Authors:
Ravi Shekhar,
Aashish Venkatesh,
Tim Baumgärtner,
Elia Bruni,
Barbara Plank,
Raffaella Bernardi,
Raquel Fernández
Abstract:
We propose a grounded dialogue state encoder which addresses a foundational issue of how to integrate visual grounding with dialogue system components. As a test-bed, we focus on the GuessWhat?! game, a two-player game where the goal is to identify an object in a complex visual scene by asking a sequence of yes/no questions. Our visually-grounded encoder leverages synergies between guessing and asking questions, as it is trained jointly using multi-task learning. We further enrich our model via a cooperative learning regime. We show that the introduction of both the joint architecture and cooperative learning lead to accuracy improvements over the baseline system. We compare our approach to an alternative system which extends the baseline with reinforcement learning. Our in-depth analysis shows that the linguistic skills of the two models differ dramatically, despite approaching comparable performance levels. This points to the importance of analyzing the linguistic output of competing systems beyond numeric comparisons based solely on task success.
Submitted 15 March, 2019; v1 submitted 10 September, 2018;
originally announced September 2018.
-
Ask No More: Deciding when to guess in referential visual dialogue
Authors:
Ravi Shekhar,
Tim Baumgartner,
Aashish Venkatesh,
Elia Bruni,
Raffaella Bernardi,
Raquel Fernandez
Abstract:
Our goal is to explore how the abilities brought in by a dialogue manager can be included in end-to-end visually grounded conversational agents. We make initial steps towards this general goal by augmenting a task-oriented visual dialogue model with a decision-making component that decides whether to ask a follow-up question to identify a target referent in an image, or to stop the conversation to make a guess. Our analyses show that adding a decision-making component produces dialogues that are less repetitive and include fewer unnecessary questions, thus potentially leading to more efficient and less unnatural interactions.
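The ask-or-guess decision can be pictured with a simple rule, shown here only as a hedged stand-in for the paper's decision-making component: stop and guess once the guesser's confidence in its top candidate object reaches a threshold, or when a question budget runs out. The threshold, the budget, and the function name are hypothetical; the component described in the paper need not be a fixed threshold.

```python
from typing import Sequence


def should_guess(
    candidate_probs: Sequence[float],  # guesser's probability for each candidate object
    turn: int,                         # number of questions asked so far
    threshold: float = 0.8,            # hypothetical confidence cutoff
    max_turns: int = 8,                # hypothetical question budget
) -> bool:
    """Return True to stop the dialogue and guess, False to ask another question."""
    return max(candidate_probs) >= threshold or turn >= max_turns
```

Stopping early when the guesser is already confident is what removes the repetitive, unnecessary follow-up questions a fixed-length dialogue would otherwise produce.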
Submitted 12 June, 2018; v1 submitted 17 May, 2018;
originally announced May 2018.
-
Deep Architectures for Face Attributes
Authors:
Tobi Baumgartner,
Jack Culpepper
Abstract:
We train a deep convolutional neural network to perform identity classification using a new dataset of public figures annotated with age, gender, ethnicity, and emotion labels, and then fine-tune it for attribute classification. An optimal sharing pattern of computational resources within this network is determined by experiment, requiring only 1 GFLOP to produce all predictions. Rather than fine-tune by relearning weights in one additional layer after the penultimate layer of the identity network, we try several different depths for each attribute. We find that prediction of age and emotion is improved by fine-tuning from earlier layers onward, presumably because deeper layers are progressively more invariant to non-identity-related changes in the input.
Submitted 28 September, 2016;
originally announced September 2016.
-
Wiselib: A Generic Algorithm Library for Heterogeneous Sensor Networks
Authors:
Tobias Baumgartner,
Ioannis Chatzigiannakis,
Sandor P. Fekete,
Christos Koninis,
Alexander Kroeller,
Apostolos Pyrgelis
Abstract:
One unfortunate consequence of the success story of wireless sensor networks (WSNs) in separate research communities is an ever-growing gap between theory and practice. Even though there is an increasing number of algorithmic methods for WSNs, the vast majority have never been tried in practice; conversely, many practical challenges are still awaiting efficient algorithmic solutions. The main cause of this discrepancy is the fact that programming sensor nodes still happens at a very technical level. We remedy the situation by introducing Wiselib, our algorithm library that allows for simple implementation of algorithms on a large variety of hardware and software. This is achieved by employing advanced C++ techniques such as templates and inline functions, which allow one to write generic code that is resolved and bound at compile time, resulting in virtually no memory or computation overhead at run time.
The Wiselib runs on different host operating systems, such as Contiki, iSense OS, and ScatterWeb. Furthermore, it runs on virtual nodes simulated by Shawn. For any algorithm, the Wiselib provides data structures that suit the specific properties of the target platform. Algorithm code does not contain any platform-specific specializations, allowing a single implementation to run natively on heterogeneous networks.
In this paper, we describe the building blocks of the Wiselib, and analyze the overhead. We demonstrate the effectiveness of our approach by showing how routing algorithms can be implemented. We also report on results from experiments with real sensor-node hardware.
Submitted 16 January, 2011;
originally announced January 2011.
-
A Protocol for Self-Synchronized Duty-Cycling in Sensor Networks: Generic Implementation in Wiselib
Authors:
Hugo Hernández,
Tobias Baumgartner,
Maria J. Blesa,
Christian Blum,
Alexander Kröller,
Sandor P. Fekete
Abstract:
In this work, we present a protocol for self-synchronized duty-cycling in wireless sensor networks with energy harvesting capabilities. The protocol is implemented in Wiselib, a library of generic algorithms for sensor networks. Simulations are conducted with the sensor network simulator Shawn. They are based on the specifications of real hardware known as iSense sensor nodes. The experimental results show that the proposed mechanism is able to adapt to changing energy availability. Moreover, the system is shown to be very robust against packet loss.
Submitted 21 October, 2010;
originally announced October 2010.
-
Simultaneous Event Execution in Heterogeneous Wireless Sensor Networks
Authors:
Tobias Baumgartner,
Sandor P. Fekete,
Winfried Hellmann,
Alexander Kroeller
Abstract:
We present a synchronization algorithm that lets nodes in a sensor network simultaneously execute a task at a given point in time. In contrast to other time synchronization algorithms, we do not provide a global time basis that is shared by all nodes. Instead, any node in the network can spontaneously initiate a process that allows the simultaneous execution of arbitrary tasks. We show that our approach is beneficial in scenarios where a global time is not needed, as it requires little communication compared with other time synchronization algorithms. We also show that our algorithm works in heterogeneous systems where the hardware provides highly varying clock accuracy. Moreover, heterogeneity affects not only the hardware but also the communication channels. We deal with different connection types, ranging from highly unreliable and fluctuating wireless channels to reliable and fast wired connections.
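Executing a task simultaneously without a shared clock can be illustrated with a relative countdown: the initiator broadcasts "fire in D time units", and every forwarder subtracts the time the message spends with it and on the next link before relaying. This is a minimal sketch of that general scheme, not necessarily the paper's algorithm. It assumes ideal local clocks (no drift) and models link-delay estimates explicitly via the hypothetical `Hop` fields, so the output shows how estimation errors accumulate along a multi-hop path.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hop:
    residence: float        # time spent at the forwarding node before sending (measured locally)
    true_link_delay: float  # actual transmission delay of the outgoing link (unknown to nodes)
    est_link_delay: float   # the forwarding node's estimate of that delay


def fire_times(initial_delay: float, path: List[Hop]) -> List[float]:
    """Global fire time, relative to initiation, of the initiator and each node on a path.

    No node uses a shared clock: each one schedules the task at
    local_arrival + remaining and forwards remaining minus the delay it accounts for.
    """
    arrival = 0.0                  # true time at which the current node holds the message
    remaining = initial_delay      # countdown value carried inside the message
    times = [arrival + remaining]  # the initiator fires after initial_delay
    for hop in path:
        arrival += hop.residence + hop.true_link_delay
        remaining -= hop.residence + hop.est_link_delay
        times.append(arrival + remaining)
    return times
```

With exact delay estimates every node fires at the same instant; an underestimate of 0.5 per hop makes the node two hops away fire 1.0 time units late. On real hardware, differing clock accuracy would add a further drift term proportional to the countdown length.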
Submitted 29 September, 2010;
originally announced September 2010.
-
Hallway Monitoring: Distributed Data Processing with Wireless Sensor Networks
Authors:
Tobias Baumgartner,
Sandor P. Fekete,
Tom Kamphans,
Alexander Kroeller,
Max Pagel
Abstract:
We present a sensor network testbed that monitors a hallway. It consists of 120 load sensors and 29 passive infrared sensors (PIRs), connected to 30 wireless sensor nodes. There are also 29 LEDs and speakers installed, which operate as actuators and enable direct interaction between the testbed and passers-by. Beyond that, the network is heterogeneous, consisting of three different circuit boards, each with its own specific responsibility. The load sensors are extremely low-cost compared to industrial solutions, and their design is easily transferred to other settings. The network is used for in-network data processing algorithms, offering possibilities to develop, for instance, distributed target-tracking algorithms. Special features of our installation are highly correlated sensor data and the availability of miscellaneous sensor types.
Submitted 24 September, 2010;
originally announced September 2010.