Search | arXiv e-print repository

Multi-Head RAG: Solving Multi-Aspect Problems with LLMs

Authors: Maciej Besta, Ales Kubicek, Roman Niggli, Robert Gerstenberger, Lucas Weitzendorf, Mingyuan Chi, Patrick Iff, Joanna Gajda, Piotr Nyczyk, Jürgen Müller, Hubert Niewiadomski, Marcin Chrapek, Michał Podstawski, Torsten Hoefler

Abstract: Retrieval Augmented Generation (RAG) enhances the abilities of Large Language Models (LLMs) by enabling the retrieval of documents into the LLM context to provide more accurate and relevant responses. Existing RAG solutions do not focus on queries that may require fetching multiple documents with substantially different contents. Such queries occur frequently, but are challenging because the embed… ▽ More Retrieval Augmented Generation (RAG) enhances the abilities of Large Language Models (LLMs) by enabling the retrieval of documents into the LLM context to provide more accurate and relevant responses. Existing RAG solutions do not focus on queries that may require fetching multiple documents with substantially different contents. Such queries occur frequently, but are challenging because the embeddings of these documents may be distant in the embedding space, making it hard to retrieve them all. This paper introduces Multi-Head RAG (MRAG), a novel scheme designed to address this gap with a simple yet powerful idea: leveraging activations of Transformer's multi-head attention layer, instead of the decoder layer, as keys for fetching multi-aspect documents. The driving motivation is that different attention heads can learn to capture different data aspects. Harnessing the corresponding activations results in embeddings that represent various facets of data items and queries, improving the retrieval accuracy for complex queries. We provide an evaluation methodology and metrics, synthetic datasets, and real-world use cases to demonstrate MRAG's effectiveness, showing improvements of up to 20% in relevance over standard RAG baselines. MRAG can be seamlessly integrated with existing RAG frameworks and benchmarking tools like RAGAS as well as different classes of data stores. △ Less

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.02524 [pdf, other]

CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks

Authors: Maciej Besta, Lorenzo Paleari, Ales Kubicek, Piotr Nyczyk, Robert Gerstenberger, Patrick Iff, Tomasz Lehmann, Hubert Niewiadomski, Torsten Hoefler

Abstract: Large Language Models (LLMs) are revolutionizing various domains, yet verifying their answers remains a significant challenge, especially for intricate open-ended tasks such as consolidation, summarization, and extraction of knowledge. In this work, we propose CheckEmbed: an accurate, scalable, and simple LLM verification approach. CheckEmbed is driven by a straightforward yet powerful idea: in or… ▽ More Large Language Models (LLMs) are revolutionizing various domains, yet verifying their answers remains a significant challenge, especially for intricate open-ended tasks such as consolidation, summarization, and extraction of knowledge. In this work, we propose CheckEmbed: an accurate, scalable, and simple LLM verification approach. CheckEmbed is driven by a straightforward yet powerful idea: in order to compare LLM solutions to one another or to the ground-truth, compare their corresponding answer-level embeddings obtained with a model such as GPT Text Embedding Large. This reduces a complex textual answer to a single embedding, facilitating straightforward, fast, and meaningful verification. We develop a comprehensive verification pipeline implementing the CheckEmbed methodology. The CheckEmbed pipeline also comes with metrics for assessing the truthfulness of the LLM answers, such as embedding heatmaps and their summaries. We show how to use these metrics for deploying practical engines that decide whether an LLM answer is satisfactory or not. We apply the pipeline to real-world document analysis tasks, including term extraction and document summarization, showcasing significant improvements in accuracy, cost-effectiveness, and runtime performance compared to existing token-, sentence-, and fact-level schemes such as BERTScore or SelfCheckGPT. △ Less

Submitted 7 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

arXiv:2403.12231 [pdf, other]

Edge-Disjoint Spanning Trees on Star-Product Networks

Authors: Aleyah Dawkins, Kelly Isham, Ales Kubicek, Kartik Lakhotia, Laura Monroe

Abstract: Star-product graphs are a natural extension of the Cartesian product, but have not been well-studied. We show that many important established and emerging network topologies, including HyperX, SlimFly, BundleFly, PolarStar, mesh, and torus, are in fact star-product graphs. While this connection was known for BundleFly and PolarStar, it was not for the others listed. We extend a method of constru… ▽ More Star-product graphs are a natural extension of the Cartesian product, but have not been well-studied. We show that many important established and emerging network topologies, including HyperX, SlimFly, BundleFly, PolarStar, mesh, and torus, are in fact star-product graphs. While this connection was known for BundleFly and PolarStar, it was not for the others listed. We extend a method of constructing maximal and near-maximal sets of edge-disjoint spanning trees on Cartesian products to the star product, thus obtain maximal or near-maximal sets of edge-disjoint spanning trees on new networks of importance, where such sets can improve bandwidth of collective operations and therefore accelerate many important workloads in high-performance computing. △ Less

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2401.14295 [pdf, other]

Demystifying Chains, Trees, and Graphs of Thoughts

Authors: Maciej Besta, Florim Memedi, Zhenyu Zhang, Robert Gerstenberger, Guangyuan Piao, Nils Blach, Piotr Nyczyk, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Lukas Gianinazzi, Ales Kubicek, Hubert Niewiadomski, Aidan O'Mahony, Onur Mutlu, Torsten Hoefler

Abstract: The field of natural language processing (NLP) has witnessed significant progress in recent years, with a notable focus on improving large language models' (LLM) performance through innovative prompting techniques. Among these, prompt engineering coupled with structures has emerged as a promising paradigm, with designs such as Chain-of-Thought, Tree of Thoughts, or Graph of Thoughts, in which the… ▽ More The field of natural language processing (NLP) has witnessed significant progress in recent years, with a notable focus on improving large language models' (LLM) performance through innovative prompting techniques. Among these, prompt engineering coupled with structures has emerged as a promising paradigm, with designs such as Chain-of-Thought, Tree of Thoughts, or Graph of Thoughts, in which the overall LLM reasoning is guided by a structure such as a graph. As illustrated with numerous examples, this paradigm significantly enhances the LLM's capability to solve numerous tasks, ranging from logical or mathematical reasoning to planning or creative writing. To facilitate the understanding of this growing field and pave the way for future developments, we devise a general blueprint for effective and efficient LLM reasoning schemes. For this, we conduct an in-depth analysis of the prompt execution pipeline, clarifying and clearly defining different concepts. We then build the first taxonomy of structure-enhanced LLM reasoning schemes. We focus on identifying fundamental classes of harnessed structures, and we analyze the representations of these structures, algorithms executed with these structures, and many others. We refer to these structures as reasoning topologies, because their representation becomes to a degree spatial, as they are contained within the LLM context. Our study compares existing prompting schemes using the proposed taxonomy, discussing how certain design choices lead to different patterns in performance and cost. We also outline theoretical underpinnings, relationships between prompting and other parts of the LLM ecosystem such as knowledge bases, and the associated research challenges. Our work will help to advance future prompt engineering techniques. △ Less

Submitted 5 April, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

arXiv:2310.03742 [pdf, other]

A High-Performance Design, Implementation, Deployment, and Evaluation of The Slim Fly Network

Authors: Nils Blach, Maciej Besta, Daniele De Sensi, Jens Domke, Hussein Harake, Shigang Li, Patrick Iff, Marek Konieczny, Kartik Lakhotia, Ales Kubicek, Marcel Ferrari, Fabrizio Petrini, Torsten Hoefler

Abstract: Novel low-diameter network topologies such as Slim Fly (SF) offer significant cost and power advantages over the established Fat Tree, Clos, or Dragonfly. To spearhead the adoption of low-diameter networks, we design, implement, deploy, and evaluate the first real-world SF installation. We focus on deployment, management, and operational aspects of our test cluster with 200 servers and carefully a… ▽ More Novel low-diameter network topologies such as Slim Fly (SF) offer significant cost and power advantages over the established Fat Tree, Clos, or Dragonfly. To spearhead the adoption of low-diameter networks, we design, implement, deploy, and evaluate the first real-world SF installation. We focus on deployment, management, and operational aspects of our test cluster with 200 servers and carefully analyze performance. We demonstrate techniques for simple cabling and cabling validation as well as a novel high-performance routing architecture for InfiniBand-based low-diameter topologies. Our real-world benchmarks show SF's strong performance for many modern workloads such as deep neural network training, graph analytics, or linear algebra kernels. SF outperforms non-blocking Fat Trees in scalability while offering comparable or better performance and lower cost for large network sizes. Our work can facilitate deploying SF while the associated (open-source) routing architecture is fully portable and applicable to accelerate any low-diameter interconnect. △ Less

Submitted 21 April, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

Journal ref: Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI '24) Santa Clara, CA, USA April 16-18, 2024

arXiv:2308.09687 [pdf, other]

doi 10.1609/aaai.v38i16.29720

Graph of Thoughts: Solving Elaborate Problems with Large Language Models

Authors: Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michal Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, Torsten Hoefler

Abstract: We introduce Graph of Thoughts (GoT): a framework that advances prompting capabilities in large language models (LLMs) beyond those offered by paradigms such as Chain-of-Thought or Tree of Thoughts (ToT). The key idea and primary advantage of GoT is the ability to model the information generated by an LLM as an arbitrary graph, where units of information ("LLM thoughts") are vertices, and edges co… ▽ More We introduce Graph of Thoughts (GoT): a framework that advances prompting capabilities in large language models (LLMs) beyond those offered by paradigms such as Chain-of-Thought or Tree of Thoughts (ToT). The key idea and primary advantage of GoT is the ability to model the information generated by an LLM as an arbitrary graph, where units of information ("LLM thoughts") are vertices, and edges correspond to dependencies between these vertices. This approach enables combining arbitrary LLM thoughts into synergistic outcomes, distilling the essence of whole networks of thoughts, or enhancing thoughts using feedback loops. We illustrate that GoT offers advantages over state of the art on different tasks, for example increasing the quality of sorting by 62% over ToT, while simultaneously reducing costs by >31%. We ensure that GoT is extensible with new thought transformations and thus can be used to spearhead new prompting schemes. This work brings the LLM reasoning closer to human thinking or brain mechanisms such as recurrence, both of which form complex networks. △ Less

Submitted 6 February, 2024; v1 submitted 18 August, 2023; originally announced August 2023.

Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence 2024 (AAAI'24)

arXiv:2305.14398 [pdf, other]

TornadoQSim: An Open-source High-Performance and Modular Quantum Circuit Simulation Framework

Authors: Ales Kubicek, Athanasios Stratikopoulos, Juan Fumero, Nikos Foutris, Christos Kotselidis

Abstract: In this article, we present TornadoQSim, an open-source quantum circuit simulation framework implemented in Java. The proposed framework has been designed to be modular and easily expandable for accommodating different user-defined simulation backends, such as the unitary matrix simulation technique. Furthermore, TornadoQSim features the ability to interchange simulation backends that can simulate… ▽ More In this article, we present TornadoQSim, an open-source quantum circuit simulation framework implemented in Java. The proposed framework has been designed to be modular and easily expandable for accommodating different user-defined simulation backends, such as the unitary matrix simulation technique. Furthermore, TornadoQSim features the ability to interchange simulation backends that can simulate arbitrary quantum circuits. Another novel aspect of TornadoQSim over other quantum simulators is the transparent hardware acceleration of the simulation backends on heterogeneous devices. TornadoQSim employs TornadoVM to automatically compile parts of the simulation backends onto heterogeneous hardware, thereby addressing the fragmentation in development due to the low-level heterogeneous programming models. The evaluation of TornadoQSim has shown that the transparent utilization of GPU hardware can result in up to 506.5$x$ performance speedup when compared to the vanilla Java code for a fully entangled quantum circuit of 11 qubits. Other evaluated quantum algorithms have been the Deutsch-Jozsa algorithm (493.10$x$ speedup for a 11-qubit circuit) and the quantum Fourier transform algorithm (518.12$x$ speedup for a 11-qubit circuit). Finally, the best TornadoQSim implementation of unitary matrix has been evaluated against a semantically equivalent simulation via Qiskit. The comparative evaluation has shown that the simulation with TornadoQSim is faster for small circuits, while for large circuits Qiskit outperforms TornadoQSim by an order of magnitude. △ Less

Submitted 23 May, 2023; originally announced May 2023.

Comments: 29 pages

Showing 1–7 of 7 results for author: Kubicek, A