Search | arXiv e-print repository

doi 10.1007/978-3-031-56060-6_17

HyperPIE: Hyperparameter Information Extraction from Scientific Publications

Authors: Tarek Saier, Mayumi Ohta, Takuto Asakura, Michael Färber

Abstract: Automatic extraction of information from publications is key to making scientific knowledge machine readable at a large scale. The extracted information can, for example, facilitate academic search, decision making, and knowledge graph construction. An important type of information not covered by existing approaches is hyperparameters. In this paper, we formalize and tackle hyperparameter information extraction (HyperPIE) as an entity recognition and relation extraction task. We create a labeled data set covering publications from a variety of computer science disciplines. Using this data set, we train and evaluate BERT-based fine-tuned models as well as five large language models: GPT-3.5, GALACTICA, Falcon, Vicuna, and WizardLM. For fine-tuned models, we develop a relation extraction approach that achieves an improvement of 29% F1 over a state-of-the-art baseline. For large language models, we develop an approach leveraging YAML output for structured data extraction, which achieves an average improvement of 5.5% F1 in entity recognition over using JSON. With our best performing model we extract hyperparameter information from a large number of unannotated papers, and analyze patterns across disciplines. All our data and source code is publicly available at https://github.com/IllDepence/hyperpie

Submitted 10 January, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

Comments: accepted at ECIR2024

arXiv:2311.10063

FENDL: A library for fusion research and applications

Authors: G. Schnabel, D. L. Aldama, T. Bohm, U. Fischer, S. Kunieda, A. Trkov, C. Konno, R. Capote, A. J. Koning, S. Breidokaite, T. Eade, M. Fabbri, D. Flammini, L. Isolan, I. Kodeli, M. Košťál, S. Kwon, D. Laghi, D. Leichtle, S. Nakayama, M. Ohta, L. W. Packer, Y. Qiu, S. Sato, M. Sawan, et al. (6 additional authors not shown)

Abstract: The Fusion Evaluated Nuclear Data Library (FENDL) is a comprehensive and validated collection of nuclear cross section data coordinated by the International Atomic Energy Agency (IAEA) Nuclear Data Section (NDS). FENDL assembles the best nuclear data for fusion applications selected from available nuclear data libraries and has been under development for decades. FENDL contains sub-libraries for incident neutron, proton, and deuteron cross sections including general purpose and activation files used for particle transport and nuclide inventory calculations. We describe the history, selection of evaluations for the various sub-libraries (neutron, proton, deuteron) with the focus on transport and reactor dosimetry applications, the processing of the nuclear data for application codes, and the development of the TENDL-2017 library which is the currently recommended activation library for FENDL. We briefly describe the IAEA IRDFF library as the recommended library for dosimetry fusion applications. We also present work on validation of the neutron sub-library using a variety of fusion relevant computational and experimental benchmarks. A variety of cross section libraries are used for the validation work including FENDL-2.1, FENDL-3.1d, FENDL-3.2, ENDF/B-VIII.0, and JEFF-3.2 with the emphasis on the FENDL libraries. The results of the experimental validation showed that the performance of FENDL-3.2b is at least as good and in most cases better than FENDL-2.1. Future work will consider improved evaluations developed by the International Nuclear Data Evaluation Network (INDEN). Additional work will be needed to investigate differences in gas production in structural materials. Covariance matrices need to be updated to support the development of fusion technology. Additional validation work for high-energy neutrons, protons and deuterons, and the activation library will be needed.

Submitted 17 November, 2023; v1 submitted 16 November, 2023; originally announced November 2023.

Comments: 81 pages, 114 figures

arXiv:2210.02545

JoeyS2T: Minimalistic Speech-to-Text Modeling with JoeyNMT

Authors: Mayumi Ohta, Julia Kreutzer, Stefan Riezler

Abstract: JoeyS2T is a JoeyNMT extension for speech-to-text tasks such as automatic speech recognition and end-to-end speech translation. It inherits the core philosophy of JoeyNMT, a minimalist NMT toolkit built on PyTorch, seeking simplicity and accessibility. JoeyS2T's workflow is self-contained, starting from data pre-processing, over model training and prediction to evaluation, and is seamlessly integrated into JoeyNMT's compact and simple code base. On top of JoeyNMT's state-of-the-art Transformer-based encoder-decoder architecture, JoeyS2T provides speech-oriented components such as convolutional layers, SpecAugment, CTC-loss, and WER evaluation. Despite its simplicity compared to prior implementations, JoeyS2T performs competitively on English speech recognition and English-to-German speech translation benchmarks. The implementation is accompanied by a walk-through tutorial and available on https://github.com/may-/joeys2t.

Submitted 5 October, 2022; originally announced October 2022.

Comments: EMNLP 2022 demo track

arXiv:2201.08279

Modeling and hexahedral meshing of cerebral arterial networks from centerlines

Authors: Méghane Decroocq, Carole Frindel, Pierre Rougé, Makoto Ohta, Guillaume Lavoué

Abstract: Computational fluid dynamics (CFD) simulation provides valuable information on blood flow from the vascular geometry. However, it requires extracting precise models of arteries from low-resolution medical images, which remains challenging. Centerline-based representation is widely used to model large vascular networks with small vessels, as it encodes both the geometric and topological information and facilitates manual editing. In this work, we propose an automatic method to generate a structured hexahedral mesh suitable for CFD directly from centerlines. We addressed both the modeling and meshing tasks. We proposed a vessel model based on penalized splines to overcome the limitations inherent to the centerline representation, such as noise and sparsity. The bifurcations are reconstructed using a parametric model based on the anatomy that we extended to planar n-furcations. Finally, we developed a method to produce a volume mesh with structured, hexahedral, and flow-oriented cells from the proposed vascular network model. The proposed method offers better robustness to the common defects of centerlines and increases the mesh quality compared to state-of-the-art methods. As it relies on centerlines alone, it can be applied to edit the vascular model effortlessly to study the impact of vascular geometry and topology on hemodynamics. We demonstrate the efficiency of our method by entirely meshing a dataset of 60 cerebral vascular networks. 92% of the vessels and 83% of the bifurcations were meshed without defects needing manual intervention, despite the challenging aspect of the input data. The source code is released publicly.

Submitted 13 June, 2023; v1 submitted 20 January, 2022; originally announced January 2022.

arXiv:2104.01393

doi 10.21437/Interspeech.2021-1679

On-the-Fly Aligned Data Augmentation for Sequence-to-Sequence ASR

Authors: Tsz Kin Lam, Mayumi Ohta, Shigehiko Schamoni, Stefan Riezler

Abstract: We propose an on-the-fly data augmentation method for automatic speech recognition (ASR) that uses alignment information to generate effective training samples. Our method, called Aligned Data Augmentation (ADA) for ASR, replaces transcribed tokens and the speech representations in an aligned manner to generate previously unseen training pairs. The speech representations are sampled from an audio dictionary that has been extracted from the training corpus and inject speaker variations into the training examples. The transcribed tokens are either predicted by a language model such that the augmented data pairs are semantically close to the original data, or randomly sampled. Both strategies result in training pairs that improve robustness in ASR training. Our experiments on a Seq-to-Seq architecture show that ADA can be applied on top of SpecAugment, and achieves about 9-23% and 4-15% relative improvements in WER over SpecAugment alone on LibriSpeech 100h and LibriSpeech 960h test datasets, respectively.

Submitted 9 June, 2021; v1 submitted 3 April, 2021; originally announced April 2021.

Comments: Accepted at INTERSPEECH 2021

arXiv:2006.01759

Sparse Perturbations for Improved Convergence in Stochastic Zeroth-Order Optimization

Authors: Mayumi Ohta, Nathaniel Berger, Artem Sokolov, Stefan Riezler

Abstract: Interest in stochastic zeroth-order (SZO) methods has recently been revived in black-box optimization scenarios such as adversarial black-box attacks to deep neural networks. SZO methods only require the ability to evaluate the objective function at random input points, however, their weakness is the dependency of their convergence speed on the dimensionality of the function to be evaluated. We present a sparse SZO optimization method that reduces this factor to the expected dimensionality of the random perturbation during learning. We give a proof that justifies this reduction for sparse SZO optimization for non-convex functions without making any assumptions on sparsity of objective function or gradient. Furthermore, we present experimental results for neural networks on MNIST and CIFAR that show faster convergence in training loss and test accuracy, and a smaller distance of the gradient approximation to the true gradient in sparse SZO compared to dense SZO.

Submitted 29 June, 2020; v1 submitted 2 June, 2020; originally announced June 2020.

Comments: International Conference on Machine Learning, Optimization, and Data Science (LOD), Siena, Italy

Journal ref: LOD 2020

arXiv:1806.04458

Sparse Stochastic Zeroth-Order Optimization with an Application to Bandit Structured Prediction

Authors: Artem Sokolov, Julian Hitschler, Mayumi Ohta, Stefan Riezler

Abstract: Stochastic zeroth-order (SZO), or gradient-free, optimization allows to optimize arbitrary functions by relying only on function evaluations under parameter perturbations, however, the iteration complexity of SZO methods suffers a factor proportional to the dimensionality of the perturbed function. We show that in scenarios with natural sparsity patterns as in structured prediction applications, this factor can be reduced to the expected number of active features over input-output pairs. We give a general proof that applies sparse SZO optimization to Lipschitz-continuous, nonconvex, stochastic objectives, and present an experimental evaluation on linear bandit structured prediction tasks with sparse word-based feature representations that confirm our theoretical results.

Submitted 10 November, 2020; v1 submitted 12 June, 2018; originally announced June 2018.

Showing 1–7 of 7 results for author: Ohta, M