Showing 1–16 of 16 results for author: Polák, P

Searching in archive cs.
  1. arXiv:2406.03881  [pdf, other]

    cs.CL

    Evaluating the IWSLT2023 Speech Translation Tasks: Human Annotations, Automatic Metrics, and Segmentation

    Authors: Matthias Sperber, Ondřej Bojar, Barry Haddow, Dávid Javorský, Xutai Ma, Matteo Negri, Jan Niehues, Peter Polák, Elizabeth Salesky, Katsuhito Sudoh, Marco Turchi

    Abstract: Human evaluation is a critical component in machine translation system development and has received much attention in text translation research. However, little prior work exists on the topic of human evaluation for speech translation, which adds additional challenges such as noisy data and segmentation mismatches. We take first steps to fill this gap by conducting a comprehensive human evaluation…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: LREC-COLING2024 publication (with corrections for Table 3)

    Journal ref: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

  2. arXiv:2401.05625  [pdf, other]

    cs.CV

    Face-GPS: A Comprehensive Technique for Quantifying Facial Muscle Dynamics in Videos

    Authors: Juni Kim, Zhikang Dong, Pawel Polak

    Abstract: We introduce a novel method that combines differential geometry, kernels smoothing, and spectral analysis to quantify facial muscle activity from widely accessible video recordings, such as those captured on personal smartphones. Our approach emphasizes practicality and accessibility. It has significant potential for applications in national security and plastic surgery. Additionally, it offers re…

    Submitted 10 January, 2024; originally announced January 2024.

  3. arXiv:2310.11141  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Long-form Simultaneous Speech Translation: Thesis Proposal

    Authors: Peter Polák

    Abstract: Simultaneous speech translation (SST) aims to provide real-time translation of spoken language, even before the speaker finishes their sentence. Traditionally, SST has been addressed primarily by cascaded systems that decompose the task into subtasks, including speech recognition, segmentation, and machine translation. However, the advent of deep learning has sparked significant interest in end-to…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: IJCNLP-AACL SRW 2023 - camera-ready version

  4. arXiv:2310.06282  [pdf, other]

    cs.LG cs.CV cs.IR

    MuseChat: A Conversational Music Recommendation System for Videos

    Authors: Zhikang Dong, Bin Chen, Xiulong Liu, Pawel Polak, Peng Zhang

    Abstract: Music recommendation for videos attracts growing interest in multi-modal research. However, existing systems focus primarily on content compatibility, often ignoring the users' preferences. Their inability to interact with users for further refinements or to provide explanations leads to a less satisfying experience. We address these issues with MuseChat, a first-of-its-kind dialogue-based recomme…

    Submitted 9 March, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

  5. arXiv:2309.11384  [pdf, ps, other]

    cs.CL cs.AI cs.SD eess.AS

    Long-Form End-to-End Speech Translation via Latent Alignment Segmentation

    Authors: Peter Polák, Ondřej Bojar

    Abstract: Current simultaneous speech translation models can process audio only up to a few seconds long. Contemporary datasets provide an oracle segmentation into sentences based on human-annotated transcripts and translations. However, the segmentation into sentences is not available in the real world. Current speech segmentation approaches either offer poor segmentation quality or have to trade latency f…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  6. arXiv:2309.11379  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff

    Authors: Peter Polák, Brian Yan, Shinji Watanabe, Alex Waibel, Ondřej Bojar

    Abstract: Blockwise self-attentional encoder models have recently emerged as one promising end-to-end approach to simultaneous speech translation. These models employ a blockwise beam search with hypothesis reliability scoring to determine when to wait for more input speech before translating further. However, this method maintains multiple hypotheses until the entire speech input is consumed -- this scheme…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: Accepted at INTERSPEECH 2023

    Journal ref: Polák, P., Yan, B., Watanabe, S., Waibel, A., Bojar, O. (2023) Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff. Proc. INTERSPEECH 2023, 3979-3983

  7. arXiv:2305.16894  [pdf, other]

    cs.CL

    Robustness of Multi-Source MT to Transcription Errors

    Authors: Dominik Macháček, Peter Polák, Ondřej Bojar, Raj Dabre

    Abstract: Automatic speech translation is sensitive to speech recognition errors, but in a multilingual scenario, the same content may be available in various languages via simultaneous interpreting, dubbing or subtitling. In this paper, we hypothesize that leveraging multiple sources will improve translation quality if the sources complement one another in terms of correct information they contain. To this…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: ACL 2023 Findings

  8. arXiv:2304.09947  [pdf, other]

    q-fin.ST cs.LG stat.ML

    Online Ensemble of Models for Optimal Predictive Performance with Applications to Sector Rotation Strategy

    Authors: Jiaju Miao, Pawel Polak

    Abstract: Asset-specific factors are commonly used to forecast financial returns and quantify asset-specific risk premia. Using various machine learning models, we demonstrate that the information contained in these factors leads to even larger economic gains in terms of forecasts of sector returns and the measurement of sector-specific risk premia. To capitalize on the strong predictive results of individu…

    Submitted 29 March, 2023; originally announced April 2023.

  9. arXiv:2304.04596  [pdf, other]

    cs.SD cs.CL eess.AS

    ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit

    Authors: Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polák, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, Shinji Watanabe

    Abstract: ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit necessitated by the broadening interests of the spoken language translation community. ESPnet-ST-v2 supports 1) offline speech-to-text translation (ST), 2) simultaneous speech-to-text translation (SST), and 3) offline speech-to-speech translation (S2ST) -- each task is supported with a wide variety of approaches, differentiating ESPnet-…

    Submitted 6 July, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

    Comments: ACL 2023; System Demonstration

  10. arXiv:2303.05352  [pdf, other]

    cs.CL cond-mat.mtrl-sci

    Extracting Accurate Materials Data from Research Papers with Conversational Language Models and Prompt Engineering

    Authors: Maciej P. Polak, Dane Morgan

    Abstract: There has been a growing effort to replace manual extraction of data from research papers with automated data extraction based on natural language processing, language models, and recently, large language models (LLMs). Although these methods enable efficient extraction of data from large sets of research papers, they require a significant amount of up-front effort, expertise, and coding. In this…

    Submitted 21 February, 2024; v1 submitted 7 March, 2023; originally announced March 2023.

    Comments: 7 pages, 3 figures, 1 table

    Journal ref: Nature Communications (2024) 15:1569

  11. arXiv:2302.04914  [pdf, other]

    cond-mat.mtrl-sci cs.AI cs.CL

    Flexible, Model-Agnostic Method for Materials Data Extraction from Text Using General Purpose Language Models

    Authors: Maciej P. Polak, Shrey Modi, Anna Latosinska, **ming Zhang, Ching-Wen Wang, Shaonan Wang, Ayan Deep Hazra, Dane Morgan

    Abstract: Accurate and comprehensive material databases extracted from research papers are crucial for materials science and engineering, but their development requires significant human effort. With large language models (LLMs) transforming the way humans interact with text, LLMs provide an opportunity to revolutionize data extraction. In this study, we demonstrate a simple and efficient method for extract…

    Submitted 12 June, 2024; v1 submitted 9 February, 2023; originally announced February 2023.

    Comments: 13 pages, 4 figures

    Journal ref: Digital Discovery, 2024, 3, 1221-1235

  12. arXiv:2211.00233  [pdf, other]

    cs.CV cs.AI cs.LG

    Detection of (Hidden) Emotions from Videos using Muscles Movements and Face Manifold Embedding

    Authors: Juni Kim, Zhikang Dong, Eric Guan, Judah Rosenthal, Shi Fu, Miriam Rafailovich, Pawel Polak

    Abstract: We provide a new non-invasive, easy-to-scale for large amounts of subjects and a remotely accessible method for (hidden) emotion detection from videos of human faces. Our approach combines face manifold detection for accurate location of the face in the video with local face manifold embedding to create a common domain for the measurements of muscle micro-movements that is invariant to the movemen…

    Submitted 31 October, 2022; originally announced November 2022.

    ACM Class: I.4.3; I.4.9

  13. arXiv:2208.08626  [pdf, other]

    stat.ML cs.AI cs.LG math.DS math.NA

    CP-PINNs: Data-Driven Changepoints Detection in PDEs Using Online Optimized Physics-Informed Neural Networks

    Authors: Zhikang Dong, Pawel Polak

    Abstract: We investigate the inverse problem for Partial Differential Equations (PDEs) in scenarios where the parameters of the given PDE dynamics may exhibit changepoints at random time. We employ Physics-Informed Neural Networks (PINNs) - universal approximators capable of estimating the solution of any physical law described by a system of PDEs, which serves as a regularization during neural network trai…

    Submitted 1 April, 2024; v1 submitted 18 August, 2022; originally announced August 2022.

  14. arXiv:2205.05433  [pdf, other]

    cs.CL

    ALIGNMEET: A Comprehensive Tool for Meeting Annotation, Alignment, and Evaluation

    Authors: Peter Polák, Muskaan Singh, Anna Nedoluzhko, Ondřej Bojar

    Abstract: Summarization is a challenging problem, and even more challenging is to manually create, correct, and evaluate the summaries. The severity of the problem grows when the inputs are multi-party dialogues in a meeting setup. To facilitate the research in this area, we present ALIGNMEET, a comprehensive tool for meeting annotation, alignment, and evaluation. The tool aims to provide an efficient and c…

    Submitted 11 May, 2022; originally announced May 2022.

    Comments: Accepted to LREC22

  15. arXiv:2204.06028  [pdf, other]

    cs.CL

    CUNI-KIT System for Simultaneous Speech Translation Task at IWSLT 2022

    Authors: Peter Polák, Ngoc-Quan Ngoc, Tuan-Nam Nguyen, Danni Liu, Carlos Mullov, Jan Niehues, Ondřej Bojar, Alexander Waibel

    Abstract: In this paper, we describe our submission to the Simultaneous Speech Translation at IWSLT 2022. We explore strategies to utilize an offline model in a simultaneous setting without the need to modify the original model. In our experiments, we show that our onlinization algorithm is almost on par with the offline setting while being $3\times$ faster than offline in terms of latency on the test set.…

    Submitted 11 May, 2022; v1 submitted 12 April, 2022; originally announced April 2022.

    Comments: Accepted to IWSLT22

  16. arXiv:2109.00916  [pdf, other]

    cs.CL

    Coarse-To-Fine And Cross-Lingual ASR Transfer

    Authors: Peter Polák, Ondřej Bojar

    Abstract: End-to-end neural automatic speech recognition systems achieved recently state-of-the-art results, but they require large datasets and extensive computing resources. Transfer learning has been proposed to overcome these difficulties even across languages, e.g., German ASR trained from an English model. We experiment with much less related languages, reusing an English model for Czech ASR. To simpl…

    Submitted 2 September, 2021; originally announced September 2021.

    Comments: Accepted to ITAT WAFNL