Search | arXiv e-print repository

Optimizing Communication for Latency Sensitive HPC Applications on up to 48 FPGAs Using ACCL

Authors: Marius Meyer, Tobias Kenter, Lucian Petrica, Kenneth O'Brien, Michaela Blott, Christian Plessl

Abstract: Most FPGA boards in the HPC domain are well-suited for parallel scaling because of the direct integration of versatile and high-throughput network ports. However, the utilization of their network capabilities is often challenging and error-prone because the whole network stack and communication patterns have to be implemented and managed on the FPGAs. Also, this approach conceptually involves a trade-off between the performance potential of improved communication and the impact of resource consumption for communication infrastructure, since the utilized resources on the FPGAs could otherwise be used for computations. In this work, we investigate this trade-off, firstly, by using synthetic benchmarks to evaluate the different configuration options of the communication framework ACCL and their impact on communication latency and throughput. Finally, we use our findings to implement a shallow water simulation whose scalability heavily depends on low-latency communication. With a suitable configuration of ACCL, good scaling behavior can be shown to all 48 FPGAs installed in the system. Overall, the results show that the availability of inter-FPGA communication frameworks as well as the configurability of framework and network stack are crucial to achieve the best application performance with low-latency communication.

Submitted 7 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

arXiv:2304.03039 [pdf, other]

A computation of D(9) using FPGA Supercomputing

Authors: Lennart Van Hirtum, Patrick De Causmaecker, Jens Goemaere, Tobias Kenter, Heinrich Riebler, Michael Lass, Christian Plessl

Abstract: This preprint makes the claim of having computed the $9^{th}$ Dedekind Number. This was done by building an efficient FPGA Accelerator for the core operation of the process, and parallelizing it on the Noctua 2 Supercluster at Paderborn University. The resulting value is 286386577668298411128469151667598498812366. This value can be verified in two steps. We have made the data file containing the 490M results available, each of which can be verified separately on CPU, and the whole file sums to our proposed value.

Submitted 18 April, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

arXiv:2303.13632 [pdf, other]

Computing and Compressing Electron Repulsion Integrals on FPGAs

Authors: Xin Wu, Tobias Kenter, Robert Schade, Thomas D. Kühne, Christian Plessl

Abstract: The computation of electron repulsion integrals (ERIs) over Gaussian-type orbitals (GTOs) is a challenging problem in quantum-mechanics-based atomistic simulations. In practical simulations, several trillions of ERIs may have to be computed for every time step. In this work, we investigate FPGAs as accelerators for the ERI computation. We use template parameters, here within the Intel oneAPI tool flow, to create customized designs for 256 different ERI quartet classes, based on their orbitals. To maximize data reuse, all intermediates are buffered in FPGA on-chip memory with customized layout. The pre-calculation of intermediates also helps to overcome data dependencies caused by multi-dimensional recurrence relations. The involved loop structures are partially or even fully unrolled for high throughput of FPGA kernels. Furthermore, a lossy compression algorithm utilizing arbitrary bitwidth integers is integrated in the FPGA kernels. To the best of our knowledge, this is the first work on ERI computation on FPGAs that supports more than just the single most basic quartet class. Also, the integration of ERI computation and compression is a novelty that is not even covered by CPU or GPU libraries so far. Our evaluation shows that using 16-bit integers for the ERI compression, the fastest FPGA kernels exceed the performance of 10 GERIS ($10 \times 10^9$ ERIs per second) on one Intel Stratix 10 GX 2800 FPGA, with maximum absolute errors around $10^{-7}$ - $10^{-5}$ Hartree. The measured throughput can be accurately explained by a performance model.
The FPGA kernels deployed on 2 FPGAs outperform similar computations using the widely used libint reference on a two-socket server with 40 Xeon Gold 6148 CPU cores of the same process technology by factors up to 6.0x and on a new two-socket server with 128 EPYC 7713 CPU cores by up to 1.9x.

Submitted 23 March, 2023; originally announced March 2023.

arXiv:2208.13183 [pdf, other]

Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks

Authors: Lev Finkelstein, Heiga Zen, Norman Casagrande, Chun-an Chan, Ye Jia, Tom Kenter, Alexey Petelin, Jonathan Shen, Vincent Wan, Yu Zhang, Yonghui Wu, Rob Clark

Abstract: Transfer tasks in text-to-speech (TTS) synthesis - where one or more aspects of the speech of one set of speakers is transferred to another set of speakers that do not feature these aspects originally - remain a challenge. One of the challenges is that models that have high-quality transfer capabilities can have issues in stability, making them impractical for user-facing critical tasks. This paper demonstrates that transfer can be obtained by training a robust TTS system on data generated by a less robust TTS system designed for a high-quality transfer task; in particular, a CHiVE-BERT monolingual TTS system is trained on the output of a Tacotron model designed for accent transfer. While some quality loss is inevitable with this approach, experimental results show that the models trained on synthetic data this way can produce high quality audio displaying accent transfer, while preserving speaker characteristics such as speaking style.

Submitted 28 August, 2022; originally announced August 2022.

Comments: To be published in Interspeech 2022

arXiv:2205.12182 [pdf, other]

Breaking the Exascale Barrier for the Electronic Structure Problem in Ab-Initio Molecular Dynamics

Authors: Robert Schade, Tobias Kenter, Hossam Elgabarty, Michael Lass, Thomas D. Kühne, Christian Plessl

Abstract: The non-orthogonal local submatrix method applied to electronic-structure based molecular dynamics simulations is shown to exceed 1.1 EFLOP/s in FP16/FP32 mixed floating-point arithmetic when using 4,400 NVIDIA A100 GPUs of the Perlmutter system. This is enabled by a modification of the original method that pushes the sustained fraction of the peak performance to about 80%. Example calculations are performed for SARS-CoV-2 spike proteins with up to 83 million atoms.

Submitted 7 June, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

Comments: 6 pages, 6 figures, 2 tables

arXiv:2202.13995 [pdf, other]

Multi-FPGA Designs and Scaling of HPC Challenge Benchmarks via MPI and Circuit-Switched Inter-FPGA Networks

Authors: Marius Meyer, Tobias Kenter, Christian Plessl

Abstract: While FPGA accelerator boards and their respective high-level design tools are maturing, there is still a lack of multi-FPGA applications, libraries, and not least, benchmarks and reference implementations towards sustained HPC usage of these devices. As in the early days of GPUs in HPC, for workloads that can reasonably be decoupled into loosely coupled working sets, multi-accelerator support can be achieved by using standard communication interfaces like MPI on the host side. However, for performance and productivity, some applications can profit from a tighter coupling of the accelerators. FPGAs offer unique opportunities here when extending the dataflow characteristics to their communication interfaces. In this work, we extend the HPCC FPGA benchmark suite by multi-FPGA support and three missing benchmarks that particularly characterize or stress inter-device communication: b_eff, PTRANS, and LINPACK. With all benchmarks implemented for current boards with Intel and Xilinx FPGAs, we established a baseline for multi-FPGA performance. Additionally, for the communication-centric benchmarks, we explored the potential of direct FPGA-to-FPGA communication with a circuit-switched inter-FPGA network that is currently only available for one of the boards. The evaluation with parallel execution on up to 26 FPGA boards makes use of one of the largest academic FPGA installations.

Submitted 28 February, 2022; originally announced February 2022.

arXiv:2108.12188 [pdf, ps, other]

doi 10.1145/3492805.3492808

A High-Fidelity Flow Solver for Unstructured Meshes on Field-Programmable Gate Arrays

Authors: Martin Karp, Artur Podobas, Tobias Kenter, Niclas Jansson, Christian Plessl, Philipp Schlatter, Stefano Markidis

Abstract: The impending termination of Moore's law motivates the search for new forms of computing to continue the performance scaling we have grown accustomed to. Among the many emerging Post-Moore computing candidates, perhaps none is as salient as the Field-Programmable Gate Array (FPGA), which offers the means of specializing and customizing the hardware to the computation at hand. In this work, we design a custom FPGA-based accelerator for a computational fluid dynamics (CFD) code. Unlike prior work, which often focuses on accelerating small kernels, we target the entire Poisson solver on unstructured meshes based on the high-fidelity spectral element method (SEM) used in modern state-of-the-art CFD systems. We model our accelerator using an analytical performance model based on the I/O cost of the algorithm. We empirically evaluate our accelerator on a state-of-the-art Intel Stratix 10 FPGA in terms of performance and power consumption and contrast it against existing solutions on general-purpose processors (CPUs). Finally, we propose a data movement-reducing technique where we compute geometric factors on the fly, which yields significant (700+ Gflop/s) single-precision performance and upwards of a 2x reduction in runtime for the local evaluation of the Laplace operator. We end the paper by discussing the challenges and opportunities of using reconfigurable architecture in the future, particularly in the light of emerging (not yet available) technologies.

Submitted 2 November, 2021; v1 submitted 27 August, 2021; originally announced August 2021.

Comments: 12 pages, 3 figures, 3 tables, Accepted to HPC Asia 2022

ACM Class: G.4; J.2; C.1

arXiv:2104.08245 [pdf, other]

doi 10.1016/j.parco.2022.102920

Towards Electronic Structure-Based Ab-Initio Molecular Dynamics Simulations with Hundreds of Millions of Atoms

Authors: Robert Schade, Tobias Kenter, Hossam Elgabarty, Michael Lass, Ole Schütt, Alfio Lazzaro, Hans Pabst, Stephan Mohr, Jürg Hutter, Thomas D. Kühne, Christian Plessl

Abstract: We push the boundaries of electronic structure-based ab-initio molecular dynamics (AIMD) beyond 100 million atoms. This scale is otherwise barely reachable with classical force-field methods or novel neural network and machine learning potentials. We achieve this breakthrough by combining innovations in linear-scaling AIMD, efficient and approximate sparse linear algebra, low and mixed-precision floating-point computation on GPUs, and a compensation scheme for the errors introduced by numerical approximations. The core of our work is the non-orthogonalized local submatrix method (NOLSM), which scales very favorably to massively parallel computing systems and translates large sparse matrix operations into highly parallel, dense matrix operations that are ideally suited to hardware accelerators. We demonstrate that the NOLSM method, which is at the center point of each AIMD step, is able to achieve a sustained performance of 324 PFLOP/s in mixed FP16/FP32 precision corresponding to an efficiency of 67.7% when running on 1536 NVIDIA A100 GPUs.

Submitted 31 January, 2022; v1 submitted 16 April, 2021; originally announced April 2021.

Comments: 12 pages, 11 figures

arXiv:2010.13463 [pdf]

High-Performance Spectral Element Methods on Field-Programmable Gate Arrays

Authors: Martin Karp, Artur Podobas, Niclas Jansson, Tobias Kenter, Christian Plessl, Philipp Schlatter, Stefano Markidis

Abstract: Improvements in computer systems have historically relied on two well-known observations: Moore's law and Dennard's scaling. Today, both these observations are ending, forcing computer users, researchers, and practitioners to abandon the general-purpose architectures' comforts in favor of emerging post-Moore systems. Among the most salient of these post-Moore systems is the Field-Programmable Gate Array (FPGA), which strikes a convenient balance between complexity and performance. In this paper, we study modern FPGAs' applicability in accelerating the Spectral Element Method (SEM) core to many computational fluid dynamics (CFD) applications. We design a custom SEM hardware accelerator operating in double-precision that we empirically evaluate on the latest Stratix 10 GX-series FPGAs and position its performance (and power-efficiency) against state-of-the-art systems such as ARM ThunderX2, NVIDIA Pascal/Volta/Ampere Tesla-series cards, and general-purpose manycore CPUs. Finally, we develop a performance model for our SEM-accelerator, which we use to project future FPGAs' performance and role to accelerate CFD applications, ultimately answering the question: what characteristics would a perfect FPGA for CFD applications have?

Submitted 4 May, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

Comments: 10 pages, IEEE International Parallel and Distributed Processing Symposium 2021 (IPDPS'21)

ACM Class: G.4; J.2; C.1

arXiv:2006.08435 [pdf, other]

Efficient Ab-Initio Molecular Dynamic Simulations by Offloading Fast Fourier Transformations to FPGAs

Authors: Arjun Ramaswami, Tobias Kenter, Thomas D. Kühne, Christian Plessl

Abstract: A large share of today's HPC workloads is used for Ab-Initio Molecular Dynamics (AIMD) simulations, where the interatomic forces are computed on-the-fly by means of accurate electronic structure calculations. They are computationally intensive and thus constitute an interesting application class for energy-efficient hardware accelerators such as FPGAs. In this paper, we investigate the potential of offloading 3D Fast Fourier Transformations (FFTs) as a critical routine of plane-wave-based electronic structure calculations to FPGA and in conjunction demonstrate the tolerance of these simulations to lower precision computations.

Submitted 15 June, 2020; originally announced June 2020.

Comments: 2 pages, 3 figures, to be published in FPL 2020

arXiv:2004.11059 [pdf, other]

Evaluating FPGA Accelerator Performance with a Parameterized OpenCL Adaptation of the HPCChallenge Benchmark Suite

Authors: Marius Meyer, Tobias Kenter, Christian Plessl

Abstract: FPGAs have found increasing adoption in data center applications since a new generation of high-level tools has become available which noticeably reduces development time for FPGA accelerators and still provides high quality of results. There is, however, no high-level benchmark suite available which specifically enables a comparison of FPGA architectures, programming tools and libraries for HPC applications. To fill this gap, we have developed an OpenCL-based open source implementation of the HPCC benchmark suite for Xilinx and Intel FPGAs. This benchmark can serve to analyze the current capabilities of FPGA devices, cards and development tool flows, track progress over time and point out specific difficulties for FPGA acceleration in the HPC domain. Additionally, the benchmark documents proven performance optimization patterns. We will continue optimizing and porting the benchmark for new generations of FPGAs and design tools and encourage active participation to create a valuable tool for the community.

Submitted 12 June, 2020; v1 submitted 23 April, 2020; originally announced April 2020.

arXiv:1909.03965 [pdf, ps, other]

Evaluating Long-form Text-to-Speech: Comparing the Ratings of Sentences and Paragraphs

Authors: Rob Clark, Hanna Silen, Tom Kenter, Ralph Leith

Abstract: Text-to-speech systems are typically evaluated on single sentences. When long-form content, such as data consisting of full paragraphs or dialogues, is considered, evaluating sentences in isolation is not always appropriate as the context in which the sentences are synthesized is missing. In this paper, we investigate three different ways of evaluating the naturalness of long-form text-to-speech synthesis. We compare the results obtained from evaluating sentences in isolation, evaluating whole paragraphs of speech, and presenting a selection of speech or text as context and evaluating the subsequent speech. We find that, even though these three evaluations are based upon the same material, the outcomes differ per setting, and moreover that these outcomes do not necessarily correlate with each other. We show that our findings are consistent between a single speaker setting of read paragraphs and a two-speaker dialogue scenario. We conclude that to evaluate the quality of long-form speech, the traditional way of evaluating sentences in isolation does not suffice, and that multiple evaluations are required.

Submitted 9 September, 2019; originally announced September 2019.

Comments: Accepted for The 10th ISCA Speech Synthesis Workshop (SSW10), 6 pages

arXiv:1905.07195 [pdf, other]

CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network

Authors: Vincent Wan, Chun-an Chan, Tom Kenter, Jakub Vit, Rob Clark

Abstract: The prosodic aspects of speech signals produced by current text-to-speech systems are typically averaged over training material, and as such lack the variety and liveliness found in natural speech. To avoid monotony and averaged prosody contours, it is desirable to have a way of modeling the variation in the prosodic aspects of speech, so audio signals can be synthesized in multiple ways for a given text. We present a new, hierarchically structured conditional variational autoencoder to generate prosodic features (fundamental frequency, energy and duration) suitable for use with a vocoder or a generative model like WaveNet. At inference time, an embedding representing the prosody of a sentence may be sampled from the variational layer to allow for prosodic variation. To efficiently capture the hierarchical nature of the linguistic input (words, syllables and phones), both the encoder and decoder parts of the auto-encoder are hierarchical, in line with the linguistic structure, with layers being clocked dynamically at the respective rates. We show in our experiments that our dynamic hierarchical network outperforms a non-hierarchical state-of-the-art baseline, and, additionally, that prosody transfer across sentences is possible by employing the prosody embedding of one sentence to generate the speech signal of another.

Submitted 4 June, 2019; v1 submitted 17 May, 2019; originally announced May 2019.

arXiv:1810.05436 [pdf, other]

HiTR: Hierarchical Topic Model Re-estimation for Measuring Topical Diversity of Documents

Authors: Hosein Azarbonyad, Mostafa Dehghani, Tom Kenter, Maarten Marx, Jaap Kamps, Maarten de Rijke

Abstract: A high degree of topical diversity is often considered to be an important characteristic of interesting text documents. A recent proposal for measuring topical diversity identifies three distributions for assessing the diversity of documents: distributions of words within documents, words within topics, and topics within documents. Topic models play a central role in this approach and, hence, their quality is crucial to the efficacy of measuring topical diversity. The quality of topic models is affected by two causes: generality and impurity of topics. General topics only include common information of a background corpus and are assigned to most of the documents. Impure topics contain words that are not related to the topic. Impurity lowers the interpretability of topic models. Impure topics are likely to get assigned to documents erroneously. We propose a hierarchical re-estimation process aimed at removing generality and impurity. Our approach has three re-estimation components: (1) document re-estimation, which removes general words from the documents; (2) topic re-estimation, which re-estimates the distribution over words of each topic; and (3) topic assignment re-estimation, which re-estimates for each document its distributions over topics. For measuring topical diversity of text documents, our HiTR approach improves over the state-of-the-art measured on the PubMed dataset.

Submitted 12 October, 2018; originally announced October 2018.

Comments: IEEE Transactions on Knowledge and Data Engineering

arXiv:1801.02178 [pdf, other]

Neural Networks for Information Retrieval

Authors: Tom Kenter, Alexey Borisov, Christophe Van Gysel, Mostafa Dehghani, Maarten de Rijke, Bhaskar Mitra

Abstract: Machine learning plays a role in many aspects of modern IR systems, and deep learning is applied in all of them. The fast pace of modern-day research has given rise to many approaches to many IR problems. The amount of information available can be overwhelming both for junior students and for experienced researchers looking for new research topics and directions. The aim of this full-day tutorial is to give a clear overview of current tried-and-trusted neural methods in IR and how they benefit IR.

Submitted 7 January, 2018; originally announced January 2018.

Comments: Overview of full-day tutorial at WSDM 2018

arXiv:1712.07229 [pdf, other]

Attentive Memory Networks: Efficient Machine Reading for Conversational Search

Authors: Tom Kenter, Maarten de Rijke

Abstract: Recent advances in conversational systems have changed the search paradigm. Traditionally, a user poses a query to a search engine that returns an answer based on its index, possibly leveraging external knowledge bases and conditioning the response on earlier interactions in the search session. In a natural conversation, there is an additional source of information to take into account: utterances produced earlier in a conversation can also be referred to and a conversational IR system has to keep track of information conveyed by the user during the conversation, even if it is implicit. We argue that the process of building a representation of the conversation can be framed as a machine reading task, where an automated system is presented with a number of statements about which it should answer questions. The questions should be answered solely by referring to the statements provided, without consulting external knowledge. The time is right for the information retrieval community to embrace this task, both as a stand-alone task and integrated in a broader conversational search setting. In this paper, we focus on machine reading as a stand-alone task and present the Attentive Memory Network (AMN), an end-to-end trainable machine reading algorithm. Its key contribution is in efficiency, achieved by having a hierarchical input encoder, iterating over the input only once. Speed is an important requirement in the setting of conversational search, as gaps between conversational turns have a detrimental effect on naturalness.
On 20 datasets commonly used for evaluating machine reading algorithms we show that the AMN achieves performance comparable to the state-of-the-art models, while using considerably fewer computations.

Submitted 19 December, 2017; originally announced December 2017.

Journal ref: Proceedings of 1st International Workshop on Conversational Approaches to Information Retrieval, Tokyo, Japan, August 11, 2017 (CAIR'17)

arXiv:1707.04242 [pdf, other]

Neural Networks for Information Retrieval

Authors: Tom Kenter, Alexey Borisov, Christophe Van Gysel, Mostafa Dehghani, Maarten de Rijke, Bhaskar Mitra

Abstract: Machine learning plays a role in many aspects of modern IR systems, and deep learning is applied in all of them. The fast pace of modern-day research has given rise to many different approaches for many different IR problems. The amount of information available can be overwhelming both for junior students and for experienced researchers looking for new research topics and directions. Additionally, it is interesting to see what key insights into IR problems the new technologies are able to give us. The aim of this full-day tutorial is to give a clear overview of current tried-and-trusted neural methods in IR and how they benefit IR research. It covers key architectures, as well as the most promising future directions.

Submitted 13 July, 2017; originally announced July 2017.

Comments: Overview of full-day tutorial at SIGIR 2017

arXiv:1701.04273 [pdf, other]

Hierarchical Re-estimation of Topic Models for Measuring Topical Diversity

Authors: Hosein Azarbonyad, Mostafa Dehghani, Tom Kenter, Maarten Marx, Jaap Kamps, Maarten de Rijke

Abstract: A high degree of topical diversity is often considered to be an important characteristic of interesting text documents. A recent proposal for measuring topical diversity identifies three elements for assessing diversity: words, topics, and documents as collections of words. Topic models play a central role in this approach. Using standard topic models for measuring diversity of documents is suboptimal due to generality and impurity. General topics only include common information from a background corpus and are assigned to most of the documents in the collection. Impure topics contain words that are not related to the topic; impurity lowers the interpretability of topic models and impure topics are likely to get assigned to documents erroneously. We propose a hierarchical re-estimation approach for topic models to combat generality and impurity; the proposed approach operates at three levels: words, topics, and documents. Our re-estimation approach for measuring documents' topical diversity outperforms the state of the art on PubMed dataset which is commonly used for diversity experiments.

Submitted 16 January, 2017; originally announced January 2017.

Comments: Proceedings of the 39th European Conference on Information Retrieval (ECIR2017)

arXiv:1606.04640 [pdf, other]

Siamese CBOW: Optimizing Word Embeddings for Sentence Representations

Authors: Tom Kenter, Alexey Borisov, Maarten de Rijke

Abstract: We present the Siamese Continuous Bag of Words (Siamese CBOW) model, a neural network for efficient estimation of high-quality sentence embeddings. Averaging the embeddings of words in a sentence has proven to be a surprisingly successful and efficient way of obtaining sentence embeddings. However, word embeddings trained with the methods currently available are not optimized for the task of sentence representation, and, thus, likely to be suboptimal. Siamese CBOW handles this problem by training word embeddings directly for the purpose of being averaged. The underlying neural network learns word embeddings by predicting, from a sentence representation, its surrounding sentences. We show the robustness of the Siamese CBOW model by evaluating it on 20 datasets stemming from a wide variety of sources.

Submitted 15 June, 2016; originally announced June 2016.

Comments: Accepted as full paper at ACL 2016, Berlin. 11 pages

arXiv:0903.2219 [pdf, ps, other]

doi 10.1088/0004-637X/696/2/2206

The Star Formation and Nuclear Accretion Histories of Normal Galaxies in the AGES Survey

Authors: Casey R. Watson, Christopher S. Kochanek, William R. Forman, Ryan C. Hickox, Christine J. Jones, Michael J. I. Brown, Kate Brand, Arjun Dey, Buell T. Jannuzi, Almus T. Kenter, Steve S. Murray, Alexey Vikhlinin, Daniel J. Eisenstein, Giovani G. Fazio, Paul J. Green, Brian R. McNamara, Marcia Rieke, Joseph C. Shields

Abstract: We combine IR, optical and X-ray data from the overlapping, 9.3 square degree NOAO Deep Wide-Field Survey (NDWFS), AGN and Galaxy Evolution Survey (AGES), and XBootes Survey to measure the X-ray evolution of 6146 normal galaxies as a function of absolute optical luminosity, redshift, and spectral type over the largely unexplored redshift range 0.1 < z < 0.5. Because only the closest or brightest of the galaxies are individually detected in X-rays, we use a stacking analysis to determine the mean properties of the sample. Our results suggest that X-ray emission from spectroscopically late-type galaxies is dominated by star formation, while that from early-type galaxies is dominated by a combination of hot gas and AGN emission. We find that the mean star formation and supermassive black hole accretion rate densities evolve like (1+z)^3, in agreement with the trends found for samples of bright, individually detectable starburst galaxies and AGN. Our work also corroborates the results of many previous stacking analyses of faint source populations, with improved statistics.

Submitted 12 March, 2009; originally announced March 2009.

Comments: 19 pages, 15 figures, 3 tables, accepted for publication in ApJ

Journal ref: Astrophys.J.696:2206-2219,2009

arXiv:0803.0357 [pdf, ps, other]

doi 10.1086/587431

The Mid-Infrared Properties of X-ray Sources

Authors: V. Gorjian, M. Brodwin, C. S. Kochanek, S. Murray, D. Stern, K. Brand, P. R. Eisenhardt, M. L. N. Ashby, P. Barmby, M. J. I. Brown, A. Dey, W. Forman, B. T. Jannuzi, C. Jones, A. T. Kenter, M. A. Pahre, J. C. Shields, M. W. Werner, S. P. Willner

Abstract: We combine the results of the Spitzer IRAC Shallow Survey and the Chandra XBootes Survey of the 8.5 square degrees Bootes field of the NOAO Deep Wide-Field Survey to produce the largest comparison of mid-IR and X-ray sources to date. The comparison is limited to sources with X-ray fluxes >8x10^-15 erg cm^-2 s^-1 in the 0.5-7.0 keV range and mid-IR sources with 3.6 um fluxes brighter than 18.4 mag (12.3 uJy). In this most sensitive IRAC band, 85% of the 3086 X-ray sources have mid-IR counterparts at an 80% confidence level based on a Bayesian matching technique. Only 2.5% of the sample have no IRAC counterpart at all based on visual inspection. Even for a smaller but a significantly deeper Chandra survey in the same field, the IRAC Shallow Survey recovers most of the X-ray sources. A majority (65%) of the Chandra sources detected in all four IRAC bands occupy a well-defined region of IRAC [3.6] - [4.5] vs [5.8] - [8.0] color-color space. These X-ray sources are likely infrared luminous, unobscured type I AGN with little mid-infrared flux contributed by the AGN host galaxy. Of the remaining Chandra sources, most are lower luminosity type I and type II AGN whose mid-IR emission is dominated by the host galaxy, while approximately 5% are either Galactic stars or very local galaxies.

Submitted 3 March, 2008; originally announced March 2008.

Comments: Accepted for publication in ApJ

arXiv:astro-ph/0512343 [pdf, ps, other]

doi 10.1086/500312

The Chandra XBootes Survey - III: Optical and Near-IR Counterparts

Authors: Kate Brand, Michael J. I. Brown, Arjun Dey, Buell T. Jannuzi, Christopher S. Kochanek, Almus T. Kenter, Daniel Fabricant, Giovanni G. Fazio, William R. Forman, Paul J. Green, Christine J. Jones, Brian R. McNamara, Stephen S. Murray, Joan R. Najita, Marcia Rieke, Joseph C. Shields, Alexey Vikhlinin

Abstract: The XBootes Survey is a 5-ks Chandra survey of the Bootes Field of the NOAO Deep Wide-Field Survey (NDWFS). This survey is unique in that it is the largest (9.3 deg^2), contiguous region imaged in X-ray with complementary deep optical and near-IR observations. We present a catalog of the optical counterparts to the 3,213 X-ray point sources detected in the XBootes survey. Using a Bayesian identification scheme, we successfully identified optical counterparts for 98% of the X-ray point sources. The optical colors suggest that the optically detected galaxies are a combination of z<1 massive early-type galaxies and bluer star-forming galaxies whose optical AGN emission is faint or obscured, whereas the majority of the optically detected point sources are likely quasars over a large redshift range. Our large area, X-ray bright, optically deep survey enables us to select a large sub-sample of sources (773) with high X-ray to optical flux ratios (f_x/f_o>10). These objects are likely high redshift and/or dust obscured AGN. These sources have generally harder X-ray spectra than sources with 0.1<f_x/f_o<10. Of the 73 X-ray sources with no optical counterpart in the NDWFS catalog, 47 are truly optically blank down to R~25.5 (the average 50% completeness limit of the NDWFS R-band catalogs). These sources are also likely to be high redshift and/or dust obscured AGN.

Submitted 13 December, 2005; originally announced December 2005.

Comments: 19 pages, 13 figures, ApJ accepted. Catalog can be found at: http://www.noao.edu/noao/noaodeep or ftp://archive.noao.edu/pub/catalogs/xbootes/

Journal ref: Astrophys.J.641:140-157,2006

arXiv:astro-ph/0503156 [pdf, ps, other]

doi 10.1086/430124

Tracing the Nuclear Accretion History of the Red Galaxy Population

Authors: Kate Brand, Arjun Dey, Michael J. I. Brown, Casey R. Watson, Buell T. Jannuzi, Joan R. Najita, Christopher S. Kochanek, Joseph C. Shields, Giovani G. Fazio, William R. Forman, Paul J. Green, Christine J. Jones, Almus T. Kenter, Brian R. McNamara, Steve S. Murray, Marcia Rieke, Alexey Vikhlinin

Abstract: We investigate the evolution of the hard X-ray luminosity of the red galaxy population using a large sample of 3316 red galaxies selected over a wide range in redshift (0.3<z<0.9) from a 1.4 deg^2 region in the Bootes field of the NOAO Deep Wide-Field Survey (NDWFS). The red galaxies are early-type, bulge-dominated galaxies and are selected to have the same evolution corrected, absolute R-band magnitude distribution as a function of redshift to ensure we are tracing the evolution in the X-ray properties of a comparable optical population. Using a stacking analysis of 5-ks Chandra/ACIS observations within this field to study the X-ray emission from these red galaxies in three redshift bins, we find that the mean X-ray luminosity increases as a function of redshift. The large mean X-ray luminosity and the hardness of the mean X-ray spectrum suggests that the X-ray emission is largely dominated by AGN rather than stellar sources. The hardness ratio can be reproduced by either an absorbed (N_H ~2 x 10^22 cm^-2) Gamma=1.7 power-law source, consistent with that of a population of moderately obscured Seyfert-like AGN, or an unabsorbed Gamma=0.7 source suggesting a radiatively inefficient accretion flow (e.g., an advection-dominated accretion flow). We also find that the emission from this sample of red galaxies constitutes at least 5% of the hard X-ray background. These results suggest a global decline in the mean AGN activity of normal early-type galaxies from z~1 to the present, which indicates that we are witnessing the tailing off of the accretion activity onto SMBHs in early-type galaxies since the quasar epoch.

Submitted 7 March, 2005; originally announced March 2005.

Comments: 23 pages, accepted for publication in ApJ

Journal ref: Astrophys.J. 626 (2005) 723-732

Showing 1–23 of 23 results for author: Kenter, T