Showing 1–9 of 9 results for author: Ozaki, H

  1. arXiv:2305.10992  [pdf, other]

    cs.CL cs.AI

    How does the task complexity of masked pretraining objectives affect downstream performance?

    Authors: Atsuki Yamaguchi, Hiroaki Ozaki, Terufumi Morishita, Gaku Morio, Yasuhiro Sogawa

    Abstract: Masked language modeling (MLM) is a widely used self-supervised pretraining objective, where a model needs to predict an original token that is replaced with a mask given contexts. Although simpler and computationally efficient pretraining objectives, e.g., predicting the first character of a masked token, have recently shown comparable results to MLM, no objectives with a masking scheme actually…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted at ACL 2023 Findings

  2. arXiv:2304.09516  [pdf, other]

    cs.CL

    Controlling keywords and their positions in text generation

    Authors: Yuichi Sasazawa, Terufumi Morishita, Hiroaki Ozaki, Osamu Imaichi, Yasuhiro Sogawa

    Abstract: One of the challenges in text generation is to control text generation as intended by the user. Previous studies proposed specifying the keywords that should be included in the generated text. However, this approach is insufficient to generate text that reflects the user's intent. For example, placing an important keyword at the beginning of the text would help attract the reader's attention; howev…

    Submitted 31 October, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

    Journal ref: Proceedings of the 16th International Natural Language Generation Conference, 2023, pages 407 to 413

  3. arXiv:2303.01794  [pdf, other]

    cs.CL cs.AI

    Hitachi at SemEval-2023 Task 3: Exploring Cross-lingual Multi-task Strategies for Genre and Framing Detection in Online News

    Authors: Yuta Koreeda, Ken-ichi Yokote, Hiroaki Ozaki, Atsuki Yamaguchi, Masaya Tsunokake, Yasuhiro Sogawa

    Abstract: This paper explains the participation of team Hitachi in SemEval-2023 Task 3 "Detecting the genre, the framing, and the persuasion techniques in online news in a multi-lingual setup." Based on the multilingual, multi-task nature of the task and the low-resource setting, we investigated different cross-lingual and multi-task strategies for training the pretrained language models. Through extensive…

    Submitted 25 April, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

    Comments: Accepted at SemEval-2023 Task 3

  4. arXiv:2205.12683  [pdf, other]

    cs.LG cs.AI stat.ML

    Rethinking Fano's Inequality in Ensemble Learning

    Authors: Terufumi Morishita, Gaku Morio, Shota Horiguchi, Hiroaki Ozaki, Nobuo Nukaga

    Abstract: We propose a fundamental theory on ensemble learning that answers the central question: what factors make an ensemble system good or bad? Previous studies used a variant of Fano's inequality of information theory and derived a lower bound of the classification error rate on the basis of the $\textit{accuracy}$ and $\textit{diversity}$ of models. We revisit the original Fano's inequality and argue…

    Submitted 16 November, 2023; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: ICML2022

  5. arXiv:2203.01870  [pdf, other]

    physics.ins-det cs.LG

    KamNet: An Integrated Spatiotemporal Deep Neural Network for Rare Event Search in KamLAND-Zen

    Authors: A. Li, Z. Fu, L. Winslow, C. Grant, H. Song, H. Ozaki, I. Shimizu, A. Takeuchi

    Abstract: Rare event searches allow us to search for new physics at energy scales inaccessible with other means by leveraging specialized large-mass detectors. Machine learning provides a new tool to maximize the information provided by these detectors. The information is sparse, which forces these algorithms to start from the lowest level data and exploit all symmetries in the detector to produce results.…

    Submitted 26 July, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: 12 pages, dual submission with upcoming KamLAND-Zen 800 main result

  6. arXiv:2112.02741  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Team Hitachi @ AutoMin 2021: Reference-free Automatic Minuting Pipeline with Argument Structure Construction over Topic-based Summarization

    Authors: Atsuki Yamaguchi, Gaku Morio, Hiroaki Ozaki, Ken-ichi Yokote, Kenji Nagamatsu

    Abstract: This paper introduces the proposed automatic minuting system of the Hitachi team for the First Shared Task on Automatic Minuting (AutoMin-2021). We utilize a reference-free approach (i.e., without using training minutes) for automatic minuting (Task A), which first splits a transcript into blocks on the basis of topics and subsequently summarizes those blocks with a pre-trained BART model fine-tun…

    Submitted 5 December, 2021; originally announced December 2021.

    Comments: 8 pages, 4 figures

  7. arXiv:2005.00295  [pdf]

    cs.CL cs.LG

    Hitachi at SemEval-2020 Task 12: Offensive Language Identification with Noisy Labels using Statistical Sampling and Post-Processing

    Authors: Manikandan Ravikiran, Amin Ekant Muljibhai, Toshinori Miyoshi, Hiroaki Ozaki, Yuta Koreeda, Sakata Masayuki

    Abstract: In this paper, we present our participation in SemEval-2020 Task-12 Subtask-A (English Language) which focuses on offensive language identification from noisy labels. To this end, we developed a hybrid system with the BERT classifier trained with tweets selected using Statistical Sampling Algorithm (SA) and Post-Processed (PP) using an offensive wordlist. Our developed system achieved 34th positi…

    Submitted 1 May, 2020; originally announced May 2020.

    Comments: preprint v1, Under submission for SemEval 2020 Workshop

  8. Unsupervised Temporal Feature Aggregation for Event Detection in Unstructured Sports Videos

    Authors: Subhajit Chaudhury, Daiki Kimura, Phongtharin Vinayavekhin, Asim Munawar, Ryuki Tachibana, Koji Ito, Yuki Inaba, Minoru Matsumoto, Shuji Kidokoro, Hiroki Ozaki

    Abstract: Image-based sports analytics enable automatic retrieval of key events in a game to speed up the analytics process for human experts. However, most existing methods focus on structured television broadcast video datasets with a straight and fixed camera having minimum variability in the capturing pose. In this paper, we study the case of event detection in sports videos for unstructured environment…

    Submitted 19 February, 2020; originally announced February 2020.

    Comments: Accepted to IEEE International Symposium on Multimedia, 2019

  9. Hitachi at MRP 2019: Unified Encoder-to-Biaffine Network for Cross-Framework Meaning Representation Parsing

    Authors: Yuta Koreeda, Gaku Morio, Terufumi Morishita, Hiroaki Ozaki, Kohsuke Yanai

    Abstract: This paper describes the proposed system of the Hitachi team for the Cross-Framework Meaning Representation Parsing (MRP 2019) shared task. In this shared task, the participating systems were asked to predict nodes, edges and their attributes for five frameworks, each with different order of "abstraction" from input tokens. We proposed a unified encoder-to-biaffine network for all five frameworks,…

    Submitted 20 November, 2019; v1 submitted 3 October, 2019; originally announced October 2019.

    Comments: 13 pages, 3 figures

    Journal ref: in Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Learning