Search | arXiv e-print repository

Advancing Cross-domain Discriminability in Continual Learning of Vision-Language Models

Authors: Yicheng Xu, Yuxin Chen, Jiahao Nie, Yusong Wang, Huiping Zhuang, Manabu Okumura

Abstract: Continual learning (CL) with Vision-Language Models (VLMs) has overcome the constraints of traditional CL, which only focuses on previously encountered classes. During the CL of VLMs, we need not only to prevent the catastrophic forgetting on incrementally learned knowledge but also to preserve the zero-shot ability of VLMs. However, existing methods require additional reference datasets to maintain such zero-shot ability and rely on domain-identity hints to classify images across different domains. In this study, we propose Regression-based Analytic Incremental Learning (RAIL), which utilizes a recursive ridge regression-based adapter to learn from a sequence of domains in a non-forgetting manner and decouple the cross-domain correlations by projecting features to a higher-dimensional space. Cooperating with a training-free fusion module, RAIL absolutely preserves the VLM's zero-shot ability on unseen domains without any reference data. Additionally, we introduce Cross-domain Task-Agnostic Incremental Learning (X-TAIL) setting. In this setting, a CL learner is required to incrementally learn from multiple domains and classify test images from both seen and unseen domains without any domain-identity hint. We theoretically prove RAIL's absolute memorization on incrementally learned domains. Experiment results affirm RAIL's state-of-the-art performance in both X-TAIL and existing Multi-domain Task-Incremental Learning settings. The code will be released upon acceptance.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.11632 [pdf, other]

Unveiling the Power of Source: Source-based Minimum Bayes Risk Decoding for Neural Machine Translation

Authors: Boxuan Lyu, Hidetaka Kamigaito, Kotaro Funakoshi, Manabu Okumura

Abstract: Maximum a posteriori decoding, a commonly used method for neural machine translation (NMT), aims to maximize the estimated posterior probability. However, high estimated probability does not always lead to high translation quality. Minimum Bayes Risk (MBR) decoding offers an alternative by seeking hypotheses with the highest expected utility. In this work, we show that Quality Estimation (QE) reranking, which uses a QE model as a reranker, can be viewed as a variant of MBR. Inspired by this, we propose source-based MBR (sMBR) decoding, a novel approach that utilizes synthetic sources generated by backward translation as "support hypotheses" and a reference-free quality estimation metric as the utility function, marking the first work to solely use sources in MBR decoding. Experiments show that sMBR significantly outperforms QE reranking and is competitive with standard MBR decoding. Furthermore, sMBR calls the utility function fewer times compared to MBR. Our findings suggest that sMBR is a promising approach for high-quality NMT decoding.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11097 [pdf, other]

InstructCMP: Length Control in Sentence Compression through Instruction-based Large Language Models

Authors: Juseon-Do, Jingun Kwon, Hidetaka Kamigaito, Manabu Okumura

Abstract: Extractive summarization can produce faithful summaries but often requires additional constraints such as a desired summary length. Traditional sentence compression models do not typically consider the constraints because of their restricted model abilities, which require model modifications for coping with them. To bridge this gap, we propose Instruction-based Compression (InstructCMP), an approach to the sentence compression task that can consider the length constraint through instructions by leveraging the zero-shot task-solving abilities of Large Language Models (LLMs). For this purpose, we created new evaluation datasets by transforming traditional sentence compression datasets into an instruction format. By using the datasets, we first reveal that the current LLMs still face challenges in accurately controlling the length for a compressed text. To address this issue, we propose an approach named "length priming," that incorporates additional length information into the instructions without external resources. While the length priming effectively works in a zero-shot setting, a training dataset with the instructions would further improve the ability of length control. Thus, we additionally created a training dataset in an instruction format to fine-tune the model on it. Experimental results and analysis show that applying the length priming significantly improves performances of InstructCMP in both zero-shot and fine-tuning settings without the need of any model modifications.

Submitted 18 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

Comments: 8 pages, 3 figures, accepted to ACL 2024 Findings (Long Paper)

ACM Class: I.2.7

arXiv:2405.01350 [pdf, other]

Community-Invariant Graph Contrastive Learning

Authors: Shiyin Tan, Dongyuan Li, Renhe Jiang, Ying Zhang, Manabu Okumura

Abstract: Graph augmentation has received great attention in recent years for graph contrastive learning (GCL) to learn well-generalized node/graph representations. However, mainstream GCL methods often favor randomly disrupting graphs for augmentation, which shows limited generalization and inevitably leads to the corruption of high-level graph information, i.e., the graph community. Moreover, current knowledge-based graph augmentation methods can only focus on either topology or node features, causing the model to lack robustness against various types of noise. To address these limitations, this research investigated the role of the graph community in graph augmentation and figured out its crucial advantage for learnable graph augmentation. Based on our observations, we propose a community-invariant GCL framework to maintain graph community structure during learnable graph augmentation. By maximizing the spectral changes, this framework unifies the constraints of both topology and feature augmentation, enhancing the model's robustness. Empirical evidence on 21 benchmark datasets demonstrates the exclusive merits of our framework. Code is released on Github (https://github.com/ShiyinTan/CI-GCL.git).

Submitted 2 May, 2024; originally announced May 2024.

Comments: This paper is accepted by ICML-2024

arXiv:2405.00334 [pdf, other]

A Survey on Deep Active Learning: Recent Advances and New Frontiers

Authors: Dongyuan Li, Zhen Wang, Yankai Chen, Renhe Jiang, Weiping Ding, Manabu Okumura

Abstract: Active learning seeks to achieve strong performance with fewer training samples. It does this by iteratively asking an oracle to label new selected samples in a human-in-the-loop manner. This technique has gained increasing popularity due to its broad applicability, yet its survey papers, especially for deep learning-based active learning (DAL), remain scarce. Therefore, we conduct an advanced and comprehensive survey on DAL. We first introduce reviewed paper collection and filtering. Second, we formally define the DAL task and summarize the most influential baselines and widely used datasets. Third, we systematically provide a taxonomy of DAL methods from five perspectives, including annotation types, query strategies, deep model architectures, learning paradigms, and training processes, and objectively analyze their strengths and weaknesses. Then, we comprehensively summarize main applications of DAL in Natural Language Processing (NLP), Computer Vision (CV), and Data Mining (DM), etc. Finally, we discuss challenges and perspectives after a detailed analysis of current studies. This work aims to serve as a useful and quick guide for researchers in overcoming difficulties in DAL. We hope that this survey will spur further progress in this burgeoning field.

Submitted 1 May, 2024; originally announced May 2024.

Comments: This paper is accepted by IEEE Transactions on Neural Networks and Learning Systems

arXiv:2405.00307 [pdf, other]

Active Learning with Task Adaptation Pre-training for Speech Emotion Recognition

Authors: Dongyuan Li, Ying Zhang, Yusong Wang, Funakoshi Kataro, Manabu Okumura

Abstract: Speech emotion recognition (SER) has garnered increasing attention due to its wide range of applications in various fields, including human-machine interaction, virtual assistants, and mental health assistance. However, existing SER methods often overlook the information gap between the pre-training speech recognition task and the downstream SER task, resulting in sub-optimal performance. Moreover, current methods require much time for fine-tuning on each specific speech dataset, such as IEMOCAP, which limits their effectiveness in real-world scenarios with large-scale noisy data. To address these issues, we propose an active learning (AL)-based fine-tuning framework for SER, called After, that leverages task adaptation pre-training (TAPT) and AL methods to enhance performance and efficiency. Specifically, we first use TAPT to minimize the information gap between the pre-training speech recognition task and the downstream speech emotion recognition task. Then, AL methods are employed to iteratively select a subset of the most informative and diverse samples for fine-tuning, thereby reducing time consumption. Experiments demonstrate that our proposed method After, using only 20% of samples, improves accuracy by 8.45% and reduces time consumption by 79%. The additional extension of After and ablation studies further confirm its effectiveness and applicability to various real-world scenarios. Our source code is available on Github for reproducibility (https://github.com/Clearloveyuan/AFTER).

Submitted 1 May, 2024; originally announced May 2024.

Comments: Accepted by Journal of Natural Language Processing. arXiv admin note: text overlap with arXiv:2310.00283

arXiv:2404.00264 [pdf, other]

DiLM: Distilling Dataset into Language Model for Text-level Dataset Distillation

Authors: Aru Maekawa, Satoshi Kosugi, Kotaro Funakoshi, Manabu Okumura

Abstract: Dataset distillation aims to compress a training dataset by creating a small number of informative synthetic samples such that neural networks trained on them perform as well as those trained on the original training dataset. Current text dataset distillation methods create each synthetic sample as a sequence of word embeddings instead of a text to apply gradient-based optimization; however, such embedding-level distilled datasets cannot be used for training other models whose word embedding weights are different from the model used for distillation. To address this issue, we propose a novel text dataset distillation approach, called Distilling dataset into Language Model (DiLM), which trains a language model to generate informative synthetic training samples as text data, instead of directly optimizing synthetic samples. We evaluated DiLM on various text classification datasets and showed that distilled synthetic datasets from DiLM outperform those from current coreset selection methods. DiLM achieved remarkable generalization performance in training different types of models and in-context learning of large language models. Our code will be available at https://github.com/arumaekawa/DiLM.

Submitted 30 March, 2024; originally announced April 2024.

Comments: Accepted by Findings of NAACL 2024

arXiv:2403.05065 [pdf, other]

Can we obtain significant success in RST discourse parsing by using Large Language Models?

Authors: Aru Maekawa, Tsutomu Hirao, Hidetaka Kamigaito, Manabu Okumura

Abstract: Recently, decoder-only pre-trained large language models (LLMs), with several tens of billion parameters, have significantly impacted a wide range of natural language processing (NLP) tasks. While encoder-only or encoder-decoder pre-trained language models have already proved to be effective in discourse parsing, the extent to which LLMs can perform this task remains an open research question. Therefore, this paper explores how beneficial such LLMs are for Rhetorical Structure Theory (RST) discourse parsing. Here, the parsing process for both fundamental top-down and bottom-up strategies is converted into prompts, which LLMs can work with. We employ Llama 2 and fine-tune it with QLoRA, which has fewer parameters that can be tuned. Experimental results on three benchmark datasets, RST-DT, Instr-DT, and the GUM corpus, demonstrate that Llama 2 with 70 billion parameters in the bottom-up strategy obtained state-of-the-art (SOTA) results with significant differences. Furthermore, our parsers demonstrated generalizability when evaluated on RST-DT, showing that, in spite of being trained with the GUM corpus, it obtained similar performances to those of existing parsers trained with RST-DT.

Submitted 8 March, 2024; originally announced March 2024.

Comments: Accepted in the main conference of EACL 2024

arXiv:2402.11924 [pdf, other]

Evaluating LLMs' Inherent Multi-hop Reasoning Ability

Authors: Jian Wu, Linyi Yang, Zhen Wang, Manabu Okumura, Yue Zhang

Abstract: While Large Language Models (LLMs) excel in question-answering (QA) tasks, their multi-step reasoning abilities on multiple evidence integration on Multi-hop QA tasks remain underexplored. LLMs sometimes generate answers that rely on internal memory rather than reasoning given context, which brings concerns about the evaluation quality of real reasoning abilities. The counterfactual QA task can separate internal memory from reasoning abilities, but focusing solely on final-QA performance without evaluating the multi-step reasoning process is insufficient for reporting LLMs' real reasoning abilities. Current Multi-hop QA (MHQA) benchmarks are factual and annotated on open-source corpora such as Wikipedia, although useful for multi-step reasoning evaluation, showing limitations due to potential data contamination in LLMs pre-training stage. To address this issue, we introduce the Inherent Reasoning Evaluation (IRE) method, a novel evaluation way that jointly evaluates the LLMs' chain-of-reasoning performance based on the first knowledge-edited counterfactual multi-hop QA data which involves editing the original Wikipedia passages, reducing data contamination risks. The IRE comprehensively assesses reasoning chains through sub-QA and final-QA evaluations. Our comparisons reveal significant performance gaps for several LLMs between Wikipedia-based benchmarks and IRE, deeming data contamination issues in existing benchmarks. We believe that the IRE benchmark will enhance and facilitate trustworthy LLM evaluations.

Submitted 3 July, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.11166 [pdf, other]

GenDec: A robust generative Question-decomposition method for Multi-hop reasoning

Authors: Jian Wu, Linyi Yang, Yuliang Ji, Wenhao Huang, Börje F. Karlsson, Manabu Okumura

Abstract: Multi-hop QA (MHQA) involves step-by-step reasoning to answer complex questions and find multiple relevant supporting facts. However, existing large language models' (LLMs) reasoning ability in multi-hop question answering remains underexplored and is inadequate for answering multi-hop questions. Moreover, it is unclear whether LLMs follow a desired reasoning chain to reach the right final answer. In this paper, we propose a generative question decomposition method (GenDec) from the perspective of explainable QA by generating independent and complete sub-questions based on incorporating additional extracted evidence for enhancing LLMs' reasoning ability in RAG. To demonstrate the impact, generalization, and robustness of GenDec, we conduct two experiments. The first combines GenDec with small QA systems on paragraph retrieval and QA tasks. The second examines the reasoning capabilities of various state-of-the-art LLMs, including GPT-4 and GPT-3.5, combined with GenDec. We experiment on the HotpotQA, 2WikihopMultiHopQA, MuSiQue, and PokeMQA datasets.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2311.11009 [pdf, other]

Joyful: Joint Modality Fusion and Graph Contrastive Learning for Multimodal Emotion Recognition

Authors: Dongyuan Li, Yusong Wang, Kotaro Funakoshi, Manabu Okumura

Abstract: Multimodal emotion recognition aims to recognize emotions for each utterance of multiple modalities, which has received increasing attention for its application in human-machine interaction. Current graph-based methods fail to simultaneously depict global contextual features and local diverse uni-modal features in a dialogue. Furthermore, with the number of graph layers increasing, they easily fall into over-smoothing. In this paper, we propose a method for joint modality fusion and graph contrastive learning for multimodal emotion recognition (Joyful), where multimodality fusion, contrastive learning, and emotion recognition are jointly optimized. Specifically, we first design a new multimodal fusion mechanism that can provide deep interaction and fusion between the global contextual and uni-modal specific features. Then, we introduce a graph contrastive learning framework with inter-view and intra-view contrastive losses to learn more distinguishable representations for samples with different sentiments. Extensive experiments on three benchmark datasets indicate that Joyful achieved state-of-the-art (SOTA) performance compared to all baselines.

Submitted 18 November, 2023; originally announced November 2023.

arXiv:2311.08189 [pdf, other]

All Data on the Table: Novel Dataset and Benchmark for Cross-Modality Scientific Information Extraction

Authors: Yuhan Li, Jian Wu, Zhiwei Yu, Börje F. Karlsson, Wei Shen, Manabu Okumura, Chin-Yew Lin

Abstract: Extracting key information from scientific papers has the potential to help researchers work more efficiently and accelerate the pace of scientific progress. Over the last few years, research on Scientific Information Extraction (SciIE) witnessed the release of several new systems and benchmarks. However, existing paper-focused datasets mostly focus only on specific parts of a manuscript (e.g., abstracts) and are single-modality (i.e., text- or table-only), due to complex processing and expensive annotations. Moreover, core information can be present in either text or tables or across both. To close this gap in data availability and enable cross-modality IE, while alleviating labeling costs, we propose a semi-supervised pipeline for annotating entities in text, as well as entities and relations in tables, in an iterative procedure. Based on this pipeline, we release novel resources for the scientific community, including a high-quality benchmark, a large-scale corpus, and a semi-supervised annotation pipeline. We further report the performance of state-of-the-art IE models on the proposed benchmark dataset, as a baseline. Lastly, we explore the potential capability of large language models such as ChatGPT for the current task. Our new dataset, results, and analysis validate the effectiveness and efficiency of our semi-supervised pipeline, and we discuss its remaining limitations.

Submitted 17 December, 2023; v1 submitted 14 November, 2023; originally announced November 2023.

Comments: Work in progress; 17 pages, 6 figures, 11 tables

arXiv:2310.00283 [pdf, other]

Active Learning Based Fine-Tuning Framework for Speech Emotion Recognition

Authors: Dongyuan Li, Yusong Wang, Kotaro Funakoshi, Manabu Okumura

Abstract: Speech emotion recognition (SER) has drawn increasing attention for its applications in human-machine interaction. However, existing SER methods ignore the information gap between the pre-training speech recognition task and the downstream SER task, leading to sub-optimal performance. Moreover, they require much time to fine-tune on each specific speech dataset, restricting their effectiveness in real-world scenes with large-scale noisy data. To address these issues, we propose an active learning (AL) based Fine-Tuning framework for SER that leverages task adaptation pre-training (TAPT) and AL methods to enhance performance and efficiency. Specifically, we first use TAPT to minimize the information gap between the pre-training and the downstream task. Then, AL methods are used to iteratively select a subset of the most informative and diverse samples for fine-tuning, reducing time consumption. Experiments demonstrate that using only 20%pt. samples improves 8.45%pt. accuracy and reduces 79%pt. time consumption.

Submitted 30 September, 2023; originally announced October 2023.

arXiv:2309.12546 [pdf, ps, other]

Automatic Answerability Evaluation for Question Generation

Authors: Zifan Wang, Kotaro Funakoshi, Manabu Okumura

Abstract: Conventional automatic evaluation metrics, such as BLEU and ROUGE, developed for natural language generation (NLG) tasks, are based on measuring the n-gram overlap between the generated and reference text. These simple metrics may be insufficient for more complex tasks, such as question generation (QG), which requires generating questions that are answerable by the reference answers. Developing a more sophisticated automatic evaluation metric, thus, remains an urgent problem in QG research. This work proposes PMAN (Prompting-based Metric on ANswerability), a novel automatic evaluation metric to assess whether the generated questions are answerable by the reference answers for the QG tasks. Extensive experiments demonstrate that its evaluation results are reliable and align with human evaluations. We further apply our metric to evaluate the performance of QG models, which shows that our metric complements conventional metrics. Our implementation of a GPT-based QG model achieves state-of-the-art performance in generating answerable questions.

Submitted 25 February, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

arXiv:2306.00369 [pdf, other]

Focused Prefix Tuning for Controllable Text Generation

Authors: Congda Ma, Tianyu Zhao, Makoto Shing, Kei Sawada, Manabu Okumura

Abstract: In a controllable text generation dataset, there exist unannotated attributes that could provide irrelevant learning signals to models that use it for training and thus degrade their performance. We propose focused prefix tuning (FPT) to mitigate the problem and to enable the control to focus on the desired attribute. Experimental results show that FPT can achieve better control accuracy and text fluency than baseline models in single-attribute control tasks. In multi-attribute control tasks, FPT achieves comparable control accuracy with the state-of-the-art approach while keeping the flexibility to control new attributes without retraining existing models.

Submitted 10 June, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: Accepted to the ACL 2023

arXiv:2305.14682 [pdf, other]

TACR: A Table-alignment-based Cell-selection and Reasoning Model for Hybrid Question-Answering

Authors: Jian Wu, Yicheng Xu, Yan Gao, Jian-Guang Lou, Börje F. Karlsson, Manabu Okumura

Abstract: Hybrid Question-Answering (HQA), which targets reasoning over tables and passages linked from table cells, has witnessed significant research in recent years. A common challenge in HQA and other passage-table QA datasets is that it is generally unrealistic to iterate over all table rows, columns, and linked passages to retrieve evidence. Such a challenge made it difficult for previous studies to show their reasoning ability in retrieving answers. To bridge this gap, we propose a novel Table-alignment-based Cell-selection and Reasoning model (TACR) for hybrid text and table QA, evaluated on the HybridQA and WikiTableQuestions datasets. In evidence retrieval, we design a table-question-alignment enhanced cell-selection method to retrieve fine-grained evidence. In answer reasoning, we incorporate a QA module that treats the row containing selected cells as context. Experimental results over the HybridQA and WikiTableQuestions (WTQ) datasets show that TACR achieves state-of-the-art results on cell selection and outperforms fine-grained evidence retrieval baselines on HybridQA, while achieving competitive performance on WTQ. We also conducted a detailed analysis to demonstrate that being able to align questions to tables in the cell-selection stage can result in important gains from experiments of over 90% table row and column selection accuracy, meanwhile also improving output explainability.

Submitted 23 May, 2023; originally announced May 2023.

Comments: Accepted at Findings of ACL 2023

arXiv:2305.13000 [pdf, other]

Bidirectional Transformer Reranker for Grammatical Error Correction

Authors: Ying Zhang, Hidetaka Kamigaito, Manabu Okumura

Abstract: Pre-trained seq2seq models have achieved state-of-the-art results in the grammatical error correction task. However, these models still suffer from a prediction bias due to their unidirectional decoding. Thus, we propose a bidirectional Transformer reranker (BTR), that re-estimates the probability of each candidate sentence generated by the pre-trained seq2seq model. The BTR preserves the seq2seq-style Transformer architecture but utilizes a BERT-style self-attention mechanism in the decoder to compute the probability of each target token by using masked language modeling to capture bidirectional representations from the target context. For guiding the reranking, the BTR adopts negative sampling in the objective function to minimize the unlikelihood. During inference, the BTR gives final results after comparing the reranked top-1 results with the original ones by an acceptance threshold. Experimental results show that, in reranking candidates from a pre-trained seq2seq model, T5-base, the BTR on top of T5-base could yield 65.47 and 71.27 F0.5 scores on the CoNLL-14 and BEA test sets, respectively, and yield 59.52 GLEU score on the JFLEG corpus, with improvements of 0.36, 0.76 and 0.48 points compared with the original T5-base. Furthermore, when reranking candidates from T5-large, the BTR on top of T5-base improved the original T5-large by 0.26 points on the BEA test set.

Submitted 22 May, 2023; originally announced May 2023.

Comments: Accepted to the Findings of ACL 2023

arXiv:2303.05924 [pdf, ps, other]

Variational formulations of ODE-Net as a mean-field optimal control problem and existence results

Authors: Noboru Isobe, Mizuho Okumura

Abstract: This paper presents a mathematical analysis of ODE-Net, a continuum model of deep neural networks (DNNs). In recent years, Machine Learning researchers have introduced ideas of replacing the deep structure of DNNs with ODEs as a continuum limit. These studies regard the "learning" of ODE-Net as the minimization of a "loss" constrained by a parametric ODE. Although the existence of a minimizer for this minimization problem needs to be assumed, only a few studies have investigated its existence analytically in detail. In the present paper, the existence of a minimizer is discussed based on a formulation of ODE-Net as a measure-theoretic mean-field optimal control problem. The existence result is proved when a neural network, which describes a vector field of ODE-Net, is linear with respect to learnable parameters. The proof employs the measure-theoretic formulation combined with the direct method of Calculus of Variations. Secondly, an idealized minimization problem is proposed to remove the above linearity assumption. Such a problem is inspired by a kinetic regularization associated with the Benamou--Brenier formula and universal approximation theorems for neural networks. The proofs of these existence results use variational methods, differential equations, and mean-field optimal control theory. They will stand for a new analytic way to investigate the learning process of deep neural networks.

Submitted 6 June, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

Comments: 33 pages

MSC Class: 49J20 (Primary) 49Q22; 68T07; 35A35 (Secondary)

arXiv:2210.08355 [pdf, other]

A Simple and Strong Baseline for End-to-End Neural RST-style Discourse Parsing

Authors: Naoki Kobayashi, Tsutomu Hirao, Hidetaka Kamigaito, Manabu Okumura, Masaaki Nagata

Abstract: To promote and further develop RST-style discourse parsing models, we need a strong baseline that can be regarded as a reference for reporting reliable experimental results. This paper explores a strong baseline by integrating existing simple parsing strategies, top-down and bottom-up, with various transformer-based pre-trained language models. The experimental results obtained from two benchmark datasets demonstrate that the parsing performance strongly relies on the pretrained language models rather than the parsing strategies. In particular, the bottom-up parser achieves large performance gains compared to the current best parser when employing DeBERTa. We further reveal that language models with a span-masking scheme especially boost the parsing performance through our analysis within intra- and multi-sentential parsing, and nuclearity prediction.

Submitted 1 November, 2022; v1 submitted 15 October, 2022; originally announced October 2022.

Comments: Accepted in Findings of EMNLP 2022

arXiv:2208.08280 [pdf, other]

Exploiting Unlabeled Data for Target-Oriented Opinion Words Extraction

Authors: Yidong Wang, Hao Wu, Ao Liu, Wenxin Hou, Zhen Wu, Jindong Wang, Takahiro Shinozaki, Manabu Okumura, Yue Zhang

Abstract: Target-oriented Opinion Words Extraction (TOWE) is a fine-grained sentiment analysis task that aims to extract the corresponding opinion words of a given opinion target from the sentence. Recently, deep learning approaches have made remarkable progress on this task. Nevertheless, the TOWE task still suffers from the scarcity of training data due to the expensive data annotation process. Limited labeled data increase the risk of distribution shift between test data and training data. In this paper, we propose exploiting massive unlabeled data to reduce the risk by increasing the exposure of the model to varying distribution shifts. Specifically, we propose a novel Multi-Grained Consistency Regularization (MGCR) method to make use of unlabeled data and design two filters specifically for TOWE to filter noisy data at different granularity. Extensive experimental results on four TOWE benchmark datasets indicate the superiority of MGCR compared with current state-of-the-art methods. The in-depth analysis also demonstrates the effectiveness of the different-granularity filters. Our codes are available at https://github.com/TOWESSL/TOWESSL.

Submitted 17 August, 2022; originally announced August 2022.

Comments: Accepted by COLING 2022

arXiv:2207.00929 [pdf, other]

Generating Repetitions with Appropriate Repeated Words

Authors: Toshiki Kawamoto, Hidetaka Kamigaito, Kotaro Funakoshi, Manabu Okumura

Abstract: A repetition is a response that repeats words in the previous speaker's utterance in a dialogue. Repetitions are essential in communication to build trust with others, as investigated in linguistic studies. In this work, we focus on repetition generation. To the best of our knowledge, this is the first neural approach to address repetition generation. We propose Weighted Label Smoothing, a smoothing method for explicitly learning which words to repeat during fine-tuning, and a repetition scoring method that can output more appropriate repetitions during decoding. We conducted automatic and human evaluations involving applying these methods to the pre-trained language model T5 for generating repetitions. The experimental results indicate that our methods outperformed baselines in both evaluations.

Submitted 2 July, 2022; originally announced July 2022.

arXiv:2204.11445 [pdf, other]

Aspect-based Analysis of Advertising Appeals for Search Engine Advertising

Authors: Soichiro Murakami, Peinan Zhang, Sho Hoshino, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura

Abstract: Writing an ad text that attracts people and persuades them to click or act is essential for the success of search engine advertising. Therefore, ad creators must consider various aspects of advertising appeals (A$^3$) such as the price, product features, and quality. However, products and services exhibit unique effective A$^3$ for different industries. In this work, we focus on exploring the effective A$^3$ for different industries with the aim of assisting the ad creation process. To this end, we created a dataset of advertising appeals and used an existing model that detects various aspects for ad texts. Our experiments demonstrated that different industries have their own effective A$^3$ and that the identification of the A$^3$ contributes to the estimation of advertising performance.

Submitted 25 April, 2022; originally announced April 2022.

Comments: Accepted by NAACL-HLT2022 Industry track

arXiv:2112.03977 [pdf, other]

doi 10.1021/acsphotonics.1c01849

Cavity-Enhanced Vernier Spectroscopy with a Chip-Scale Mid-Infrared Frequency Comb

Authors: Lukasz A. Sterczewski, Tzu-Ling Chen, Douglas C. Ober, Charles R. Markus, Chadwick L. Canedy, Igor Vurgaftman, Clifford Frez, Jerry R. Meyer, Mitchio Okumura, Mahmood Bagheri

Abstract: Chip-scale optical frequency combs can provide broadband spectroscopy for diagnosing complex organic molecules. They are also promising as miniaturized laser spectrometers in applications ranging from atmospheric chemistry to geological science and the search for extraterrestrial life. While optical cavities are commonly used to boost sensitivity, it is challenging to realize a compact cavity-enhanced comb-based spectrometer. Here, we apply the Vernier technique to free-running operation of an interband cascade laser frequency comb in a simple linear geometry that performs cavity-enhanced chemical sensing. A centimeter-scale high-finesse cavity simultaneously provides selective mode filtering and enhancement of the path length to 30 meters. As a proof-of-concept, we sense transient open-path releases of ppm-level difluoroethane with 2 ms temporal resolution over a 1 THz optical bandwidth centered at 3.64 $μ$m.

Submitted 7 December, 2021; originally announced December 2021.

Comments: 10 pages, 5 figures

Journal ref: ACS Photonics 9, 994-1001 (2022)

arXiv:2110.08263 [pdf, other]

FlexMatch: Boosting Semi-Supervised Learning with Curriculum Pseudo Labeling

Authors: Bowen Zhang, Yidong Wang, Wenxin Hou, Hao Wu, Jindong Wang, Manabu Okumura, Takahiro Shinozaki

Abstract: The recently proposed FixMatch achieved state-of-the-art results on most semi-supervised learning (SSL) benchmarks. However, like other modern SSL algorithms, FixMatch uses a pre-defined constant threshold for all classes to select unlabeled data that contribute to the training, thus failing to consider different learning status and learning difficulties of different classes. To address this issue, we propose Curriculum Pseudo Labeling (CPL), a curriculum learning approach to leverage unlabeled data according to the model's learning status. The core of CPL is to flexibly adjust thresholds for different classes at each time step to let pass informative unlabeled data and their pseudo labels. CPL does not introduce additional parameters or computations (forward or backward propagation). We apply CPL to FixMatch and call our improved algorithm FlexMatch. FlexMatch achieves state-of-the-art performance on a variety of SSL benchmarks, with especially strong performances when the labeled data are extremely limited or when the task is challenging. For example, FlexMatch achieves 13.96% and 18.96% error rate reduction over FixMatch on CIFAR-100 and STL-10 datasets respectively, when there are only 4 labels per class. CPL also significantly boosts the convergence speed, e.g., FlexMatch can use only 1/5 training time of FixMatch to achieve even better performance. Furthermore, we show that CPL can be easily adapted to other SSL algorithms and remarkably improve their performances. We open-source our code at https://github.com/TorchSSL/TorchSSL.

Submitted 28 January, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

Comments: NeurIPS 2021; camera-ready version; 16 pages with appendix; code: https://github.com/TorchSSL/TorchSSL

arXiv:2109.08177 [pdf, ps, other]

Profile decomposition in Sobolev spaces and decomposition of integral functionals II: homogeneous case

Authors: Mizuho Okumura

Abstract: The present paper is devoted to a theory of profile decomposition for bounded sequences in \emph{homogeneous} Sobolev spaces, and it enables us to analyze the lack of compactness of bounded sequences. For every bounded sequence in homogeneous Sobolev spaces, the sequence is asymptotically decomposed into the sum of profiles with dilations and translations and a double suffixed residual term. One gets an energy decomposition in the homogeneous Sobolev norm. The residual term becomes arbitrarily small in the critical Lebesgue or Sobolev spaces of lower order, and then, the results of decomposition of integral functionals are obtained, which are important strict decompositions in the critical Lebesgue or Sobolev spaces where the residual term is vanishing.

Submitted 13 February, 2022; v1 submitted 16 September, 2021; originally announced September 2021.

Comments: version 2, 35 pages. arXiv admin note: text overlap with arXiv:2109.08176

MSC Class: 46B50; 46E35

arXiv:2109.08176 [pdf, ps, other]

Profile decomposition in Sobolev spaces and decomposition of integral functionals I: inhomogeneous case

Authors: Mizuho Okumura

Abstract: The present paper is devoted to analysis of the lack of compactness of bounded sequences in \emph{inhomogeneous} Sobolev spaces, where bounded sequences might fail to be compact due to an isometric group action, that is, \emph{translation}. It will be proved that every bounded sequence $(u_n)$ has (possibly infinitely many) \emph{profiles}, and then the sequence is asymptotically decomposed into a sum of translated profiles and a double-suffixed residual term, where the residual term becomes arbitrarily small in appropriate Lebesgue or Sobolev spaces of lower order. To this end, functional analytic frameworks are established in an abstract way by making use of a group action $G$, in order to characterize profiles by $(u_n)$ and $G$. One also finds that a decomposition of the Sobolev norm into profiles is bounded by the supremum of the norm of $u_n$. Moreover, the profile decomposition leads to results of decomposition of integral functionals of subcritical order. It is noteworthy that the space where the decomposition of integral functionals holds is the same as that where the residual term is vanishing.

Submitted 15 February, 2022; v1 submitted 16 September, 2021; originally announced September 2021.

Comments: version 3, 42 pages

MSC Class: 46B50; 46E35

arXiv:2103.03549 [pdf, ps, other]

Generalization of the Ehrling inequality and universal characterization of completely continuous operators

Authors: Mizuho Okumura

Abstract: The present work is devoted to an extension of the well-known Ehrling inequalities, which quantitatively characterize compact embeddings of function spaces, to more general operators. Firstly, a modified notion of continuity for linear operators, named \emph{Ehrling continuity} and inspired by the classical Ehrling inequality, is introduced, and then, a necessary and sufficient condition for Ehrling continuity is provided via arguments based on general topology. Secondly, general completely continuous operators between normed spaces are characterized in terms of (generalized) Ehrling type inequalities. To this end, the well-known local metrization of the weak topology (so to speak, a \emph{very weak norm}) plays a crucial role. Thanks to these results, a universal relation is observed among complete continuity, the very weak norm and generalized Ehrling type inequality.

Submitted 5 March, 2021; originally announced March 2021.

MSC Class: 46B50; 47A63; 47B01; 47B07

arXiv:2102.00819 [pdf, other]

Metric-Type Identification for Multi-Level Header Numerical Tables in Scientific Papers

Authors: Lya Hulliyyatus Suadaa, Hidetaka Kamigaito, Manabu Okumura, Hiroya Takamura

Abstract: Numerical tables are widely used to present experimental results in scientific papers. For table understanding, a metric-type is essential to discriminate numbers in the tables. We introduce a new information extraction task, metric-type identification from multi-level header numerical tables, and provide a dataset extracted from scientific papers consisting of header tables, captions, and metric-types. We then propose two joint-learning neural classification and generation schemes featuring pointer-generator-based and BERT-based models. Our results show that the joint models can handle both in-header and out-of-header metric-type identification problems.

Submitted 1 February, 2021; originally announced February 2021.

Comments: To appear at EACL 2021

arXiv:2012.00374 [pdf, other]

doi 10.1063/5.0029991

A new instrument for kinetics and branching ratio studies of gas phase collisional processes at very low temperatures

Authors: Olivier Durif, Michael Capron, Joey P. Messinger, Abdessamad Benidar, Ludovic Biennier, Jérémy Bourgalais, André Canosa, Jonathan Courbe, Gustavo A. Garcia, Jean-François Gil, Laurent Nahon, Mitchio Okumura, Lucile Rutkowski, Ian R. Sims, Jonathan Thiévin, Sébastien D. Le Picard

Abstract: A new instrument dedicated to the kinetic study of low-temperature gas phase neutral-neutral reactions, including clustering processes, is presented. It combines a supersonic flow reactor with Vacuum Ultra-Violet (VUV) synchrotron photoionization time of flight mass spectrometry. A photoion-photoelectron coincidence detection scheme has been adopted to optimize the particle counting efficiency. The characteristics of the instrument are detailed along with its capabilities illustrated through a few results obtained at low temperatures (< 100 K) including a photoionization spectrum of n-butane, the detection of formic acid dimer formation as well as the observation of diacetylene molecules formed by the reaction between the C$_2$H radical and C$_2$H$_2$.

Submitted 1 December, 2020; originally announced December 2020.

arXiv:2011.09140 [pdf, other]

Diverse and Non-redundant Answer Set Extraction on Community QA based on DPPs

Authors: Shogo Fujita, Tomohide Shibata, Manabu Okumura

Abstract: In community-based question answering (CQA) platforms, it takes time for a user to get useful information from among many answers. Although one solution is an answer ranking method, the user still needs to read through the top-ranked answers carefully. This paper proposes a new task of selecting a diverse and non-redundant answer set rather than ranking the answers. Our method is based on determinantal point processes (DPPs), and it calculates the answer importance and similarity between answers by using BERT. We built a dataset focusing on a Japanese CQA site, and the experiments on this dataset demonstrated that the proposed method outperformed several baseline methods.

Submitted 18 November, 2020; originally announced November 2020.

Comments: COLING2020, 12 pages

arXiv:2011.04241 [pdf, other]

Pointing to Subwords for Generating Function Names in Source Code

Authors: Shogo Fujita, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura

Abstract: We tackle the task of automatically generating a function name from source code. Existing generators face difficulties in generating low-frequency or out-of-vocabulary subwords. In this paper, we propose two strategies for copying low-frequency or out-of-vocabulary subwords in inputs. Our best performing model showed an improvement over the conventional method in terms of our modified F1 and accuracy on the Java-small and Java-large datasets.

Submitted 9 November, 2020; originally announced November 2020.

Comments: 12 pages, accepted to COLING2020

arXiv:2011.02173 [pdf, other]

Neural text normalization leveraging similarities of strings and sounds

Authors: Riku Kawamura, Tatsuya Aoki, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura

Abstract: We propose neural models that can normalize text by considering the similarities of word strings and sounds. We experimentally compared a model that considers the similarities of both word strings and sounds, a model that considers only the similarity of word strings or of sounds, and a model without the similarities as a baseline. Results showed that leveraging the word string similarity succeeded in dealing with misspellings and abbreviations, and taking into account the sound similarity succeeded in dealing with phonetic substitutions and emphasized characters. So that the proposed models achieved higher F$_1$ scores than the baseline.

Submitted 4 November, 2020; originally announced November 2020.

Comments: 6 pages, accepted to COLING2020

arXiv:2007.08355 [pdf, other]

A second-order accurate structure-preserving scheme for the Cahn-Hilliard equation with a dynamic boundary condition

Authors: Makoto Okumura, Takeshi Fukao, Daisuke Furihata, Shuji Yoshikawa

Abstract: We propose a structure-preserving finite difference scheme for the Cahn-Hilliard equation with a dynamic boundary condition using the discrete variational derivative method (DVDM). In this approach, it is important and essential how to discretize the energy which characterizes the equation. By modifying the conventional manner and using an appropriate summation-by-parts formula, we can use a standard central difference operator as an approximation of an outward normal derivative on the discrete boundary condition of the scheme. We show that our proposed scheme is second-order accurate in space, although the previous structure-preserving scheme by Fukao-Yoshikawa-Wada (Commun. Pure Appl. Anal. 16 (2017), 1915-1938) is first-order accurate in space. Also, we show the stability, the existence, and the uniqueness of the solution for the proposed scheme. Computation examples demonstrate the effectiveness of the proposed scheme. Especially through computation examples, we confirm that numerical solutions can be stably obtained by our proposed scheme.

Submitted 16 July, 2020; originally announced July 2020.

Comments: 32 pages, 18 figures

MSC Class: 65M06; 65M12

arXiv:2002.01145 [pdf, other]

Syntactically Look-Ahead Attention Network for Sentence Compression

Authors: Hidetaka Kamigaito, Manabu Okumura

Abstract: Sentence compression is the task of compressing a long sentence into a short one by deleting redundant words. In sequence-to-sequence (Seq2Seq) based models, the decoder unidirectionally decides to retain or delete words. Thus, it cannot usually explicitly capture the relationships between decoded words and unseen words that will be decoded in the future time steps. Therefore, to avoid generating ungrammatical sentences, the decoder sometimes drops important words in compressing sentences. To solve this problem, we propose a novel Seq2Seq model, syntactically look-ahead attention network (SLAHAN), that can generate informative summaries by explicitly tracking both dependency parent and child words during decoding and capturing important words that will be decoded in the future. The results of the automatic evaluation on the Google sentence compression dataset showed that SLAHAN achieved the best kept-token-based-F1, ROUGE-1, ROUGE-2 and ROUGE-L scores of 85.5, 79.3, 71.3 and 79.1, respectively. SLAHAN also improved the summarization performance on longer sentences. Furthermore, in the human evaluation, SLAHAN improved informativeness without losing readability.

Submitted 17 May, 2020; v1 submitted 4 February, 2020; originally announced February 2020.

Comments: AAAI 2020

arXiv:1909.02255 [pdf, ps, other]

doi 10.1103/PhysRevB.102.041124

Self-learning Hybrid Monte Carlo: A First-principles Approach

Authors: Yuki Nagai, Masahiro Okumura, Keita Kobayashi, Motoyuki Shiga

Abstract: We propose a novel approach called Self-Learning Hybrid Monte Carlo (SLHMC) which is a general method to make use of machine learning potentials to accelerate the statistical sampling of first-principles density-functional-theory (DFT) simulations. The trajectories are generated on an approximate machine learning (ML) potential energy surface. The trajectories are then accepted or rejected by the Metropolis algorithm based on DFT energies. In this way the statistical ensemble is sampled exactly at the DFT level for a given thermodynamic condition. Meanwhile the ML potential is improved on the fly by training to enhance the sampling, whereby the training data set, which is sampled from the exact ensemble, is created automatically. Using the examples of $α$-quartz crystal SiO$_2$ and phonon-mediated unconventional superconductor YNi$_2$B$_2$C systems, we show that SLHMC with artificial neural networks (ANN) is capable of very efficient sampling, while at the same time enabling the optimization of the ANN potential to within meV/atom accuracy. The ANN potential thus obtained is transferable to ANN molecular dynamics simulations to explore dynamics as well as thermodynamics. This makes the SLHMC approach widely applicable for studies on materials in physics and chemistry.

Submitted 5 September, 2019; originally announced September 2019.

Comments: 6 pages, 5 figures

Journal ref: Phys. Rev. B 102, 041124 (2020)

arXiv:1903.11771 [pdf, other]

A Large-Scale Multi-Length Headline Corpus for Analyzing Length-Constrained Headline Generation Model Evaluation

Authors: Yuta Hitomi, Yuya Taguchi, Hideaki Tamori, Ko Kikuta, Jiro Nishitoba, Naoaki Okazaki, Kentaro Inui, Manabu Okumura

Abstract: Browsing news articles on multiple devices is now possible. The lengths of news article headlines have precise upper bounds, dictated by the size of the display of the relevant device or interface. Therefore, controlling the length of headlines is essential when applying the task of headline generation to news production. However, because there is no corpus of headlines of multiple lengths for a given article, previous research on controlling output length in headline generation has not discussed whether the system outputs could be adequately evaluated without multiple references of different lengths. In this paper, we introduce two corpora, which are Japanese News Corpus (JNC) and JApanese MUlti-Length Headline Corpus (JAMUL), to confirm the validity of previous evaluation settings. The JNC provides common supervision data for headline generation. The JAMUL is a large-scale evaluation dataset for headlines of three different lengths composed by professional editors. We report new findings on these corpora; for example, although the longest length reference summary can appropriately evaluate the existing methods controlling output length, this evaluation setting has several problems.

Submitted 26 September, 2019; v1 submitted 27 March, 2019; originally announced March 2019.

Comments: Accepted by INLG 2019

arXiv:1807.04955 [pdf, other]

doi 10.1103/PhysRevB.101.115111

Self-learning Monte Carlo method with Behler-Parrinello neural networks

Authors: Yuki Nagai, Masahiko Okumura, Akinori Tanaka

Abstract: We propose a general way to construct an effective Hamiltonian in the Self-learning Monte Carlo method (SLMC), which speeds up Monte Carlo simulations by training an effective model to propose uncorrelated configurations in the Markov chain. Its applications are, however, limited. This is because it is not obvious to find the explicit form of the effective Hamiltonians. Particularly, it is difficult to make trainable effective Hamiltonians including many-body interactions. In order to overcome this critical difficulty, we introduce the Behler-Parrinello neural networks (BPNNs) as "effective Hamiltonian" without any prior knowledge, which is used to construct the potential-energy surfaces in interacting many particle systems for molecular dynamics. We combine SLMC with BPNN by focusing on a divisibility of Hamiltonian and propose how to construct the element-wise configurations. We apply it to quantum impurity models. We observed significant improvement of the acceptance ratio from 0.01 (the effective Hamiltonian with the explicit form) to 0.76 (BPNN). This drastic improvement implies that the BPNN effective Hamiltonian includes many body interaction, which is omitted in the effective Hamiltonian with the explicit forms. The BPNNs make SLMC more promising.

Submitted 9 March, 2020; v1 submitted 13 July, 2018; originally announced July 2018.

Comments: 14 pages, 5 figures

Report number: RIKEN-iTHEMS-Report-18

Journal ref: Phys. Rev. B 101, 115111 (2020)

arXiv:1706.04689 [pdf]

Direct measurements of DOCO isomers in the kinetics of OD+CO

Authors: Thinh Q. Bui, Bryce J. Bjork, P. Bryan Changala, Thanh L. Nguyen, John F. Stanton, Mitchio Okumura, Jun Ye

Abstract: Quantitative and mechanistically-detailed kinetics of the reaction of hydroxyl radical (OH) with carbon monoxide (CO) have been a longstanding goal of contemporary chemical kinetics. This fundamental prototype reaction plays an important role in atmospheric and combustion chemistry, motivating studies for accurate determination of the reaction rate coefficient and its pressure and temperature dependence at thermal reaction conditions. This intricate dependence can be traced directly to details of the underlying dynamics (formation, isomerization, and dissociation) involving the reactive intermediates cis- and trans-HOCO, which can only be observed transiently. Using time-resolved frequency comb spectroscopy, comprehensive mechanistic elucidation of the kinetics of the isotopic analogue deuteroxyl radical (OD) with CO has been realized. By monitoring the concentrations of reactants, intermediates, and products in real-time, the branching and isomerization kinetics and absolute yields of all species in the OD+CO reaction are quantified as a function of pressure and collision partner.

Submitted 6 October, 2017; v1 submitted 14 June, 2017; originally announced June 2017.

Comments: 19 pages, 4 figures

arXiv:1611.06422 [pdf, ps, other]

Emergence of $η$-pairing ground-state in population-imbalanced attractive Fermi-gases filling $p$ orbitals on 1-D optical lattice

Authors: Keita Kobayashi, Yukihiro Ota, Masahiko Okumura, Susumu Yamada, Masahiko Machida

Abstract: We explore the ground states in population-imbalanced attractive 1-D fermionic optical lattice filling $p$ orbitals over the lowest $s$ one by using the density-matrix-renormalization-group (DMRG) method. The DMRG calculations find the occurrence of spatially non-uniform off-diagonal long-range order. In contrast to the Fulde-Ferrell-Larkin-Ovchinnikov pairing observed in the single-band Hubbard model, the spatial oscillation period of the pair correlation function is widely fixed to be $π$ irrespective of the mismatch between spin-split Fermi surfaces. The ground-state $π$ order corresponds to the $η$-pair condensate predicted by Yang [Phys. Rev. Lett. 63, 2144 (1989)]. Taking account of the effects of harmonic traps, we confirm that the $η$-pair state distinctly emerges at the center of the trap potential surrounded by perfectly-polarized states even in the trapped cases.

Submitted 22 November, 2016; v1 submitted 19 November, 2016; originally announced November 2016.

Comments: 6 pages, 4 figures

arXiv:1609.09552 [pdf, other]

Controlling Output Length in Neural Encoder-Decoders

Authors: Yuta Kikuchi, Graham Neubig, Ryohei Sasano, Hiroya Takamura, Manabu Okumura

Abstract: Neural encoder-decoder models have shown great success in many sequence generation tasks. However, previous work has not investigated situations in which we would like to control the length of encoder-decoder outputs. This capability is crucial for applications such as text summarization, in which we have to generate concise summaries with a desired length. In this paper, we propose methods for controlling the output sequence length for neural encoder-decoder models: two decoding-based methods and two learning-based methods. Results show that our learning-based methods have the capability to control length without degrading summary quality in a summarization task.

Submitted 29 September, 2016; originally announced September 2016.

Comments: 11 pages. To appear in EMNLP 2016

arXiv:1608.07321 [pdf]

doi 10.1126/science.aag1862

Direct Frequency Comb Measurement of OD + CO -> DOCO Kinetics

Authors: Bryce J. Bjork, Thinh Q. Bui, Oliver H. Heckl, P. Bryan Changala, Ben Spaun, Paula Heu, David Follman, Christoph Deutsch, Garrett D. Cole, Markus Aspelmeyer, Mitchio Okumura, Jun Ye

Abstract: The kinetics of the OH + CO reaction, fundamental to both atmospheric and combustion chemistry, are complex due to the formation of the HOCO intermediate. Despite extensive studies on this reaction, HOCO has not been observed at thermal reaction conditions. Exploiting the sensitive, broadband, and high-resolution capabilities of time-resolved cavity-enhanced direct frequency comb spectroscopy, we observe OD + CO reaction kinetics with the detection of stabilized trans-DOCO, the deuterated analogue of trans-HOCO, and its yield. By simultaneously measuring the time-dependent concentrations of both trans-DOCO and OD species, we observe unambiguous low-pressure termolecular dependence on the reaction rate coefficients for both N2 and CO bath gases. These results confirm the HOCO formation mechanism and quantify its yield.

Submitted 25 August, 2016; originally announced August 2016.

Comments: 39 pages, 14 figures

Journal ref: Science. 354 (2016) 444-448

arXiv:1608.00125 [pdf, ps, other]

doi 10.1103/PhysRevB.94.214501

Superconductivity in repulsively interacting fermions on a diamond chain: flat-band induced pairing

Authors: Keita Kobayashi, Masahiko Okumura, Susumu Yamada, Masahiko Machida, Hideo Aoki

Abstract: To explore whether a flat-band system can accommodate superconductivity, we consider repulsively interacting fermions on the diamond chain, a simplest quasi-one-dimensional system that contains a flat band. Exact diagonalization and the density-matrix renormalization group (DMRG) are used to show that we have a significant binding energy of a Cooper pair with a long-tailed pair-pair correlation in real space when the total band filling is slightly below $1/3$, where the dispersive band interacts with the flat band that is empty but close to $E_F$. Pairs selectively formed across the outer sites of the diamond chain are responsible for the pairing correlation. At exactly $1/3$-filling an insulating phase emerges, where the entanglement spectrum indicates the particles on the outer sites are highly entangled and topological. These come from a peculiarity of the flat band in which "Wannier orbits" are not orthogonalizable.

Submitted 28 October, 2016; v1 submitted 30 July, 2016; originally announced August 2016.

Comments: Phys. Rev. B, to be published

Journal ref: Physical Review B, 94, 214501 (2016)

arXiv:1509.09125 [pdf, other]

Fields of View for Environmental Radioactivity

Authors: Alex Malins, Masahiko Okumura, Masahiko Machida, Hiroshi Takemiya, Kimiaki Saito

Abstract: The gamma component of air radiation dose rates is a function of the amount and spread of radioactive nuclides in the environment. These radionuclides can be natural or anthropogenic in origin. The field of view describes the area of radionuclides on, or below, the ground that is responsible for determining the air dose rate, and hence correspondingly the external radiation exposure. This work describes Monte Carlo radiation transport calculations for the field of view under a variety of situations. Presented first are results for natural 40K and thorium and uranium series radionuclides distributed homogeneously within the ground. Results are then described for atmospheric radioactive caesium fallout, such as from the Fukushima Daiichi Nuclear Power Plant accident. Various stages of fallout evolution are considered through the depth distribution of 134Cs and 137Cs in soil. The fields of view for the natural radionuclides and radiocaesium are different. This can affect the responses of radiation monitors to these nuclides if the detector is partially shielded from the ground within its field of view. The field of view also sets the maximum reduction in air dose rates that can be achieved through local decontamination or remediation measures. This maximum efficiency can be determined quickly from the data presented here for the air dose rate versus the spatial extent of radioactive source on the ground.

Submitted 2 November, 2015; v1 submitted 30 September, 2015; originally announced September 2015.

Comments: 6 pages, 6 figures, Author Accepted Manuscript for Proceedings of the 2015 International Symposium on Radiological Issues for Fukushima's Revitalized Future

arXiv:1509.04005 [pdf, other]

doi 10.1016/j.jenvrad.2015.09.014

Evaluation of ambient dose equivalent rates influenced by vertical and horizontal distribution of radioactive cesium in soil in Fukushima Prefecture

Authors: Alex Malins, Hiroshi Kurikami, Shigeo Nakama, Tatsuo Saito, Masahiko Okumura, Masahiko Machida, Akihiro Kitamura

Abstract: The air dose rate in an environment contaminated with 134Cs and 137Cs depends on the amount, depth profile and horizontal distribution of these contaminants within the ground. This paper introduces and verifies a tool that models these variables and calculates ambient dose equivalent rates at 1 m above the ground. Good correlation is found between predicted dose rates and dose rates measured with survey meters in Fukushima Prefecture in areas contaminated with radiocesium from the Fukushima Dai-ichi Nuclear Power Plant accident. This finding is insensitive to the choice for modelling the activity depth distribution in the ground using activity measurements of collected soil layers, or by using exponential and hyperbolic secant fits to the measurement data. Better predictions are obtained by modelling the horizontal distribution of radioactive cesium across an area if multiple soil samples are available, as opposed to assuming a spatially homogeneous contamination distribution. Reductions seen in air dose rates above flat, undisturbed fields in Fukushima Prefecture are consistent with decrement by radioactive decay and downward migration of cesium into soil. Analysis of remediation strategies for farmland soils confirmed that topsoil removal and interchanging a topsoil layer with a subsoil layer result in similar reductions in the air dose rate. These two strategies are more effective than reverse tillage to invert and mix the topsoil.

Submitted 14 September, 2015; v1 submitted 14 September, 2015; originally announced September 2015.

Comments: 15 pages, 12 figures, 5 tables, Author Accepted Manuscript (14th Sep 2015), Journal of Environmental Radioactivity

Journal ref: Journal of Environmental Radioactivity 151, 38-49 (2016)

arXiv:1502.03892 [pdf, other]

Topographic Effects on Ambient Dose Equivalent Rates from Radiocesium Fallout

Authors: Alex Malins, Masahiko Okumura, Masahiko Machida, Kimiaki Saito

Abstract: Land topography can affect air radiation dose rates by locating radiation sources closer to, or further from, detector locations when compared to perfectly flat terrain. Hills and slopes can also shield against the propagation of gamma rays. To understand the possible magnitude of topographic effects on air dose rates, this study presents calculations for ambient dose equivalent rates at a range of heights above the ground for varying land topographies. The geometries considered were angled ground at the intersection of two planar surfaces, which is a model for slopes neighboring flat land, and a simple conical geometry, representing settings from hilltops to valley bottoms. In each case the radiation source was radioactive cesium fallout, and the slope angle was varied systematically to determine the effect of topography on the air dose rate. Under the assumption of homogeneous fallout across the land surface, and for these geometries and detector locations, the dose rates at high altitudes are more strongly affected by the underlying land topography than those close to ground level. At a height of 300m, uneven topographies can lead to a 50% change in air dose rates compared to if the ground were uniformly flat. However, in practice the effect will more often than not be smaller than this, and heterogeneity in the source distribution is likely to be a more significant factor in determining local air dose rates.

Submitted 15 April, 2015; v1 submitted 13 February, 2015; originally announced February 2015.

Comments: 7 pages, 8 figures, corrected problem with column formatting + latex tweaks, for presentation at the Joint International Conference on Mathematics and Computation, Supercomputing in Nuclear Applications and the Monte Carlo Method (M&C + SNA + MC 2015), Nashville, USA

arXiv:1401.0241 [pdf, ps, other]

doi 10.1103/PhysRevA.89.023625

Quantum phases in $p$-orbital degenerated attractive 1D fermionic optical lattices

Authors: Keita Kobayashi, Yukihiro Ota, Masahiko Okumura, Susumu Yamada, Masahiko Machida

Abstract: We examine quantum phases emerged by double degeneracy of $p$-orbital bands in attractive atomic Fermi gases loaded on a 1D optical lattice. Our numerical simulations by the density-matrix renormalization group predict the emergence of a state with a charge excitation gap, the Haldane insulator phase. A mapping onto an effective spin-$1$ model reveals its physical origin. Moreover, we show that population imbalance leads to richer diversity of the quantum phases, including a phase-separated polarized state. Finally, we study the effects of harmonic trap potential in this 1D chain.

Submitted 6 February, 2014; v1 submitted 31 December, 2013; originally announced January 2014.

Comments: 7 pages, 5 figures

Journal ref: Physical Review A 89, 023625 (2014)

arXiv:1305.6439 [pdf, ps, other]

Vertex Algebras and the Equivariant Lie Algebroid Cohomology

Authors: Masanari Okumura

Abstract: A vertex-algebraic analogue of the Lie algebroid complex is constructed, which generalizes the "small" chiral de Rham complex on smooth manifolds. The notion of VSA-inductive sheaves is also introduced. This notion generalizes that of sheaves of vertex superalgebras. The complex mentioned above is constructed as a VSA-inductive sheaf. With this complex, the equivariant Lie algebroid cohomology is generalized to a vertex-algebraic analogue, which we call the chiral equivariant Lie algebroid cohomology. In fact, the notion of the equivariant Lie algebroid cohomology contains that of the equivariant Poisson cohomology. Thus the chiral equivariant Lie algebroid cohomology is also a vertex-algebraic generalization of the equivariant Poisson cohomology. A special kind of complex is introduced and its properties are studied in detail. With these properties, some isomorphisms of cohomologies are developed, which enables us to compute the chiral equivariant Lie algebroid cohomology in some cases. Poisson-Lie groups are considered as such a special case.

Submitted 28 May, 2013; originally announced May 2013.

arXiv:1211.2509 [pdf, ps, other]

doi 10.1103/PhysRevLett.109.235302

Nontrivial Haldane phase of an atomic two-component Fermi gas trapped in a 1d optical lattice

Authors: Keita Kobayashi, Masahiko Okumura, Yukihiro Ota, Susumu Yamada, Masahiko Machida

Abstract: We propose how to create a non-trivial Haldane phase in atomic two-component Fermi-gas loaded on one-dimensional (1-D) optical lattice with trap potential. The Haldane phase is naturally formed on $p$-band Mott core in a wide range of the strong on-site repulsive interaction. The present proposal is composed of two steps, one of which is theoretical derivation of an effective 1-D S=1 interacting-chain model from the original tight-binding Hamiltonian handling the two $p$-orbitals, and the other of which is numerical demonstration employing the density-matrix renormalization-group for the formation of the Haldane phase on $p$-band Mott core and its associated features in the original tight-binding model with the harmonic trap potential.

Submitted 11 November, 2012; originally announced November 2012.

Comments: 5 pages, 4 figures

Journal ref: Phys. Rev. Lett. 109, 235302 (2012)

arXiv:1207.0396 [pdf, other]

Applying Deep Belief Networks to Word Sense Disambiguation

Authors: Peratham Wiriyathammabhum, Boonserm Kijsirikul, Hiroya Takamura, Manabu Okumura

Abstract: In this paper, we applied a novel learning algorithm, namely, Deep Belief Networks (DBN) to word sense disambiguation (WSD). DBN is a probabilistic generative model composed of multiple layers of hidden units. DBN uses Restricted Boltzmann Machine (RBM) to greedily train layer by layer as a pretraining. Then, a separate fine tuning step is employed to improve the discriminative power. We compared DBN with various state-of-the-art supervised learning algorithms in WSD such as Support Vector Machine (SVM), Maximum Entropy model (MaxEnt), Naive Bayes classifier (NB) and Kernel Principal Component Analysis (KPCA). We used all words in the given paragraph, surrounding context words and part-of-speech of surrounding words as our knowledge sources. We conducted our experiment on the SENSEVAL-2 data set. We observed that DBN outperformed all other learning algorithms.

Submitted 2 July, 2012; originally announced July 2012.

arXiv:1203.5907 [pdf, ps, other]

doi 10.1143/JPSJ.81.104003

Interference Pattern Formation between Bounded-Solitons and Radiation in Momentum Space: Possible Detection of Radiation from Bounded-Solitons with Bose-Einstein Condensate of Neutral Atoms

Authors: Hironobu Fujishima, Masahiko Okumura, Makoto Mine, Tetsu Yajima

Abstract: We propose an indirect method to observe radiation from an incomplete soliton with sufficiently large amplitude. We show that the radiation causes a notched structure on the envelope of the wave packet in the momentum space. The origin of this structure is a result of interference between the main body of oscillating solitons and the small radiation in the momentum space. We numerically integrate the nonlinear Schrödinger equation and perform Fourier transformation to confirm that the predicted structure really appears. We also show the simple model which reproduces the qualitative result. Experimental detection of the notched structure with Bose-Einstein condensation of neutral atoms is discussed and suitable parameters for this detection experiment are shown.

Submitted 27 September, 2012; v1 submitted 27 March, 2012; originally announced March 2012.

Comments: 10 figures

Journal ref: Journal of the Physical Society of Japan, 81 (2012) 104003

Showing 1–50 of 70 results for author: Okumura, M