Search | arXiv e-print repository

Learning Decision Trees and Forests with Algorithmic Recourse

Authors: Kentaro Kanamori, Takuya Takagi, Ken Kobayashi, Yuichi Ike

Abstract: This paper proposes a new algorithm for learning accurate tree-based models while ensuring the existence of recourse actions. Algorithmic Recourse (AR) aims to provide a recourse action for altering the undesired prediction result given by a model. Typical AR methods provide a reasonable action by solving an optimization task of minimizing the required effort among executable actions. In practice, however, such actions do not always exist for models optimized only for predictive performance. To alleviate this issue, we formulate the task of learning an accurate classification tree under the constraint of ensuring the existence of reasonable actions for as many instances as possible. Then, we propose an efficient top-down greedy algorithm by leveraging the adversarial training techniques. We also show that our proposed algorithm can be applied to the random forest, which is known as a popular framework for learning tree ensembles. Experimental results demonstrated that our method successfully provided reasonable actions to more instances than the baselines without significantly degrading accuracy and computational efficiency.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 27 pages, 10 figures, to appear in the 41st International Conference on Machine Learning (ICML 2024)

arXiv:2405.17492 [pdf, other]

StatWhy: Formal Verification Tool for Statistical Hypothesis Testing Programs

Authors: Yusuke Kawamoto, Kentaro Kobayashi, Kohei Suenaga

Abstract: Statistical methods have been widely misused and misinterpreted in various scientific fields, raising significant concerns about the integrity of scientific research. To develop techniques to mitigate this problem, we propose a new method for formally specifying and automatically verifying the correctness of statistical programs. In this method, programmers are reminded to check the requirements for statistical methods by annotating their source code. Then, a software tool called StatWhy automatically checks whether the programmers have properly specified the requirements for the statistical methods. This tool is implemented using the Why3 platform to verify the correctness of OCaml programs for statistical hypothesis testing. We demonstrate how StatWhy can be used to avoid common errors in a variety of popular hypothesis testing programs.

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2403.09920 [pdf]

Predicting Generalization of AI Colonoscopy Models to Unseen Data

Authors: Joel Shor, Carson McNeil, Yotam Intrator, Joseph R Ledsam, Hiro-o Yamano, Daisuke Tsurumaru, Hiroki Kayama, Atsushi Hamabe, Koji Ando, Mitsuhiko Ota, Haruei Ogino, Hiroshi Nakase, Kaho Kobayashi, Masaaki Miyo, Eiji Oki, Ichiro Takemasa, Ehud Rivlin, Roman Goldenberg

Abstract: $\textbf{Background}$: Generalizability of AI colonoscopy algorithms is important for wider adoption in clinical practice. However, current techniques for evaluating performance on unseen data require expensive and time-intensive labels. $\textbf{Methods}$: We use a "Masked Siamese Network" (MSN) to identify novel phenomena in unseen data and predict polyp detector performance. MSN is trained to predict masked out regions of polyp images, without any labels. We test MSN's ability to be trained on data only from Israel and detect unseen techniques, narrow-band imaging (NBI) and chromoendoscopy (CE), on colonoscopes from Japan (354 videos, 128 hours). We also test MSN's ability to predict performance of Computer Aided Detection (CADe) of polyps on colonoscopies from both countries, even though MSN is not trained on data from Japan. $\textbf{Results}$: MSN correctly identifies NBI and CE as less similar to Israel whitelight than Japan whitelight (bootstrapped z-test, |z| > 496, p < 10^-8 for both) using the label-free Frechet distance. MSN detects NBI with 99% accuracy, predicts CE better than our heuristic (90% vs 79% accuracy) despite being trained only on whitelight, and is the only method that is robust to noisy labels. MSN predicts CADe polyp detector performance on in-domain Israel and out-of-domain Japan colonoscopies (r=0.79, 0.37 respectively). With few examples of Japan detector performance to train on, MSN prediction of Japan performance improves (r=0.56). $\textbf{Conclusion}$: Our technique can identify distribution shifts in clinical data and can predict CADe detector performance on unseen data, without labels. Our self-supervised approach can aid in detecting when data in practice is different from training, such as between hospitals or when data has meaningfully shifted from training. MSN has potential for application to medical image domains beyond colonoscopy.

Submitted 22 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

arXiv:2312.09529 [pdf, other]

Can Physician Judgment Enhance Model Trustworthiness? A Case Study on Predicting Pathological Lymph Nodes in Rectal Cancer

Authors: Kazuma Kobayashi, Yasuyuki Takamizawa, Mototaka Miyake, Sono Ito, Lin Gu, Tatsuya Nakatsuka, Yu Akagi, Tatsuya Harada, Yukihide Kanemitsu, Ryuji Hamamoto

Abstract: Explainability is key to enhancing artificial intelligence's trustworthiness in medicine. However, several issues remain concerning the actual benefit of explainable models for clinical decision-making. Firstly, there is a lack of consensus on an evaluation framework for quantitatively assessing the practical benefits that effective explainability should provide to practitioners. Secondly, physician-centered evaluations of explainability are limited. Thirdly, the utility of built-in attention mechanisms in transformer-based models as an explainability technique is unclear. We hypothesize that superior attention maps should align with the information that physicians focus on, potentially reducing prediction uncertainty and increasing model reliability. We employed a multimodal transformer to predict lymph node metastasis in rectal cancer using clinical data and magnetic resonance imaging, exploring how well attention maps, visualized through a state-of-the-art technique, can achieve agreement with physician understanding. We estimated the model's uncertainty using meta-level information like prediction probability variance and quantified agreement. Our assessment of whether this agreement reduces uncertainty found no significant effect. In conclusion, this case study did not confirm the anticipated benefit of attention maps in enhancing model reliability. Superficial explanations could do more harm than good by misleading physicians into relying on uncertain predictions, suggesting that the current state of attention mechanisms in explainability should not be overestimated. Identifying explainability mechanisms truly beneficial for clinical decision-making remains essential.

Submitted 14 December, 2023; originally announced December 2023.

arXiv:2312.06833 [pdf]

The unreasonable effectiveness of AI CADe polyp detectors to generalize to new countries

Authors: Joel Shor, Hiro-o Yamano, Daisuke Tsurumaru, Yotami Intrator, Hiroki Kayama, Joe Ledsam, Atsushi Hamabe, Koji Ando, Mitsuhiko Ota, Haruei Ogino, Hiroshi Nakase, Kaho Kobayashi, Eiji Oki, Roman Goldenberg, Ehud Rivlin, Ichiro Takemasa

Abstract: $\textbf{Background and aims}$: Artificial Intelligence (AI) Computer-Aided Detection (CADe) is commonly used for polyp detection, but data seen in clinical settings can differ from model training. Few studies evaluate how well CADe detectors perform on colonoscopies from countries not seen during training, and none are able to evaluate performance without collecting expensive and time-intensive labels. $\textbf{Methods}$: We trained a CADe polyp detector on Israeli colonoscopy videos (5004 videos, 1106 hours) and evaluated on Japanese videos (354 videos, 128 hours) by measuring the True Positive Rate (TPR) versus false alarms per minute (FAPM). We introduce a colonoscopy dissimilarity measure called "MAsked mediCal Embedding Distance" (MACE) to quantify differences between colonoscopies, without labels. We evaluated CADe on all Japan videos and on those with the highest MACE. $\textbf{Results}$: MACE correctly quantifies that narrow-band imaging (NBI) and chromoendoscopy (CE) frames are less similar to Israel data than Japan whitelight (bootstrapped z-test, |z| > 690, p < $10^{-8}$ for both). Despite differences in the data, CADe performance on Japan colonoscopies was non-inferior to Israel ones without additional training (TPR at 0.5 FAPM: 0.957 and 0.972 for Israel and Japan; TPR at 1.0 FAPM: 0.972 and 0.989 for Israel and Japan; superiority test t > 45.2, p < $10^{-8}$). Despite not being trained on NBI or CE, TPR on those subsets were non-inferior to Japan overall (non-inferiority test t > 47.3, p < $10^{-8}$, $δ$ = 1.5% for both). $\textbf{Conclusion}$: Differences that prevent CADe detectors from performing well in non-medical settings do not degrade the performance of our AI CADe polyp detector when applied to data from a new country. MACE can help medical AI models internationalize by identifying the most "dissimilar" data on which to evaluate models.

Submitted 17 December, 2023; v1 submitted 11 December, 2023; originally announced December 2023.

arXiv:2311.18207 [pdf, other]

Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation

Authors: Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito

Abstract: Off-Policy Evaluation (OPE) aims to assess the effectiveness of counterfactual policies using only offline logged data and is often used to identify the top-k promising policies for deployment in online A/B tests. Existing evaluation metrics for OPE estimators primarily focus on the "accuracy" of OPE or that of downstream policy selection, neglecting risk-return tradeoff in the subsequent online policy deployment. To address this issue, we draw inspiration from portfolio evaluation in finance and develop a new metric, called SharpeRatio@k, which measures the risk-return tradeoff of policy portfolios formed by an OPE estimator under varying online evaluation budgets (k). We validate our metric in two example scenarios, demonstrating its ability to effectively distinguish between low-risk and high-risk estimators and to accurately identify the most efficient one. Efficiency of an estimator is characterized by its capability to form the most advantageous policy portfolios, maximizing returns while minimizing risks during online deployment, a nuance that existing metrics typically overlook. To facilitate a quick, accurate, and consistent evaluation of OPE via SharpeRatio@k, we have also integrated this metric into an open-source software, SCOPE-RL (https://github.com/hakuhodo-technologies/scope-rl). Employing SharpeRatio@k and SCOPE-RL, we conduct comprehensive benchmarking experiments on various estimators and RL tasks, focusing on their risk-return tradeoff. These experiments offer several interesting directions and suggestions for future OPE research.

Submitted 10 March, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

Comments: ICLR2024

arXiv:2311.18206 [pdf, other]

SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation

Authors: Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito

Abstract: This paper introduces SCOPE-RL, a comprehensive open-source Python software designed for offline reinforcement learning (offline RL), off-policy evaluation (OPE), and selection (OPS). Unlike most existing libraries that focus solely on either policy learning or evaluation, SCOPE-RL seamlessly integrates these two key aspects, facilitating flexible and complete implementations of both offline RL and OPE processes. SCOPE-RL puts particular emphasis on its OPE modules, offering a range of OPE estimators and robust evaluation-of-OPE protocols. This approach enables more in-depth and reliable OPE compared to other packages. For instance, SCOPE-RL enhances OPE by estimating the entire reward distribution under a policy rather than its mere point-wise expected value. Additionally, SCOPE-RL provides a more thorough evaluation-of-OPE by presenting the risk-return tradeoff in OPE results, extending beyond mere accuracy evaluations in existing OPE literature. SCOPE-RL is designed with user accessibility in mind. Its user-friendly APIs, comprehensive documentation, and a variety of easy-to-follow examples assist researchers and practitioners in efficiently implementing and experimenting with various offline RL methods and OPE estimators, tailored to their specific problem contexts. The documentation of SCOPE-RL is available at https://scope-rl.readthedocs.io/en/latest/.

Submitted 10 March, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

Comments: preprint, open-source software: https://github.com/hakuhodo-technologies/scope-rl

arXiv:2309.09627 [pdf, other]

Electrolaryngeal Speech Intelligibility Enhancement Through Robust Linguistic Encoders

Authors: Lester Phillip Violeta, Wen-Chin Huang, Ding Ma, Ryuichi Yamamoto, Kazuhiro Kobayashi, Tomoki Toda

Abstract: We propose a novel framework for electrolaryngeal speech intelligibility enhancement through the use of robust linguistic encoders. Pretraining and fine-tuning approaches have proven to work well in this task, but in most cases, various mismatches, such as the speech type mismatch (electrolaryngeal vs. typical) or a speaker mismatch between the datasets used in each stage, can deteriorate the conversion performance of this framework. To resolve this issue, we propose a linguistic encoder robust enough to project both EL and typical speech in the same latent space, while still being able to extract accurate linguistic information, creating a unified representation to reduce the speech type mismatch. Furthermore, we introduce HuBERT output features to the proposed framework for reducing the speaker mismatch, making it possible to effectively use a large-scale parallel dataset during pretraining. We show that compared to the conventional framework using mel-spectrogram input and output features, using the proposed framework enables the model to synthesize more intelligible and naturally sounding speech, as shown by a significant 16% improvement in character error rate and 0.83 improvement in naturalness score.

Submitted 20 January, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: Accepted to ICASSP 2024. Demo page: lesterphillip.github.io/icassp2024_el_sie

arXiv:2309.07598 [pdf, other]

AAS-VC: On the Generalization Ability of Automatic Alignment Search based Non-autoregressive Sequence-to-sequence Voice Conversion

Authors: Wen-Chin Huang, Kazuhiro Kobayashi, Tomoki Toda

Abstract: Non-autoregressive (non-AR) sequence-to-sequence (seq2seq) models for voice conversion (VC) are attractive in their ability to effectively model the temporal structure while enjoying boosted intelligibility and fast inference thanks to non-AR modeling. However, the dependency of current non-AR seq2seq VC models on ground truth durations extracted from an external AR model greatly limits their generalization ability to smaller training datasets. In this paper, we first demonstrate the above-mentioned problem by varying the training data size. Then, we present AAS-VC, a non-AR seq2seq VC model based on automatic alignment search (AAS), which removes the dependency on external durations and serves as a proper inductive bias to provide the required generalization ability for small datasets. Experimental results show that AAS-VC can generalize better to a training dataset of only 5 minutes. We also conducted ablation studies to justify several model design choices. The audio samples and implementation are available online.

Submitted 15 September, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

Comments: Submitted to ICASSP 2024. Demo: https://unilight.github.io/Publication-Demos/publications/aas-vc/index.html. Code: https://github.com/unilight/seq2seq-vc

arXiv:2309.06006 [pdf, ps, other]

SoccerNet 2023 Challenges Results

Authors: Anthony Cioppa, Silvio Giancola, Vladimir Somers, Floriane Magera, Xin Zhou, Hassan Mkhallati, Adrien Deliège, Jan Held, Carlos Hinojosa, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdullah Kamal, Adrien Maglo, Albert Clapés, Amr Abdelaziz, Artur Xarles, Astrid Orcesi, Atom Scott, Bin Liu, Byoungkwon Lim, et al. (77 additional authors not shown)

Abstract: The SoccerNet 2023 challenges were the third annual video understanding challenges organized by the SoccerNet team. For this third edition, the challenges were composed of seven vision-based tasks split into three main themes. The first theme, broadcast video understanding, is composed of three high-level tasks related to describing events occurring in the video broadcasts: (1) action spotting, focusing on retrieving all timestamps related to global actions in soccer, (2) ball action spotting, focusing on retrieving all timestamps related to the soccer ball change of state, and (3) dense video captioning, focusing on describing the broadcast with natural language and anchored timestamps. The second theme, field understanding, relates to the single task of (4) camera calibration, focusing on retrieving the intrinsic and extrinsic camera parameters from images. The third and last theme, player understanding, is composed of three low-level tasks related to extracting information about the players: (5) re-identification, focusing on retrieving the same players across multiple views, (6) multiple object tracking, focusing on tracking players and the ball through unedited video streams, and (7) jersey number recognition, focusing on recognizing the jersey number of players from tracklets. Compared to the previous editions of the SoccerNet challenges, tasks (2-3-7) are novel, including new annotations and data, task (4) was enhanced with more data and annotations, and task (6) now focuses on end-to-end approaches. More information on the tasks, challenges, and leaderboards is available at https://www.soccer-net.org. Baselines and development kits can be found at https://github.com/SoccerNet.

Submitted 12 September, 2023; originally announced September 2023.

arXiv:2309.03331 [pdf, other]

Expert Uncertainty and Severity Aware Chest X-Ray Classification by Multi-Relationship Graph Learning

Authors: Mengliang Zhang, Xinyue Hu, Lin Gu, Liangchen Liu, Kazuma Kobayashi, Tatsuya Harada, Ronald M. Summers, Yingying Zhu

Abstract: Patients undergoing chest X-rays (CXR) often endure multiple lung diseases. When evaluating a patient's condition, due to the complex pathologies, subtle texture changes of different lung lesions in images, and patient condition differences, radiologists may make uncertain judgments even when they have experienced long-term clinical training and professional guidance, which introduces considerable noise when extracting disease labels from CXR reports. In this paper, we re-extract disease labels from CXR reports to make them more realistic by considering disease severity and uncertainty in classification. Our contributions are as follows: 1. We re-extracted the disease labels with severity and uncertainty by a rule-based approach with keywords discussed with clinical experts. 2. To further improve the explainability of chest X-ray diagnosis, we designed a multi-relationship graph learning method with an expert uncertainty-aware loss function. 3. Our multi-relationship graph learning method can also interpret the disease classification results. Our experimental results show that models considering disease severity and uncertainty outperform previous state-of-the-art methods.

Submitted 6 September, 2023; originally announced September 2023.

arXiv:2308.07523 [pdf, other]

doi 10.1038/s41598-024-51984-x

Deep Neural Operator Driven Real Time Inference for Nuclear Systems to Enable Digital Twin Solutions

Authors: Kazuma Kobayashi, Syed Bahauddin Alam

Abstract: This paper focuses on the feasibility of Deep Neural Operator (DeepONet) as a robust surrogate modeling method within the context of digital twin (DT) for nuclear energy systems. Through benchmarking and evaluation, this study showcases the generalizability and computational efficiency of DeepONet in solving a challenging particle transport problem. DeepONet also exhibits remarkable prediction accuracy and speed, outperforming traditional ML methods, making it a suitable algorithm for real-time DT inference. However, the application of DeepONet also reveals challenges related to optimal sensor placement and model evaluation, critical aspects of real-world implementation. Addressing these challenges will further enhance the method's practicality and reliability. Overall, DeepONet presents a promising and transformative nuclear engineering research and applications tool. Its accurate prediction and computational efficiency capabilities can revolutionize DT systems, advancing nuclear engineering research. This study marks an important step towards harnessing the power of surrogate modeling techniques in critical engineering domains.

Submitted 28 April, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

Journal ref: Sci Rep 14, 2101 (2024)

arXiv:2307.11986 [pdf, other]

doi 10.1145/3580305.3599819

Expert Knowledge-Aware Image Difference Graph Representation Learning for Difference-Aware Medical Visual Question Answering

Authors: Xinyue Hu, Lin Gu, Qiyuan An, Mengliang Zhang, Liangchen Liu, Kazuma Kobayashi, Tatsuya Harada, Ronald M. Summers, Yingying Zhu

Abstract: To contribute to automating the medical vision-language model, we propose a novel Chest-Xray Difference Visual Question Answering (VQA) task. Given a pair of main and reference images, this task attempts to answer several questions on both diseases and, more importantly, the differences between them. This is consistent with the radiologist's diagnosis practice that compares the current image with the reference before concluding the report. We collect a new dataset, namely MIMIC-Diff-VQA, including 700,703 QA pairs from 164,324 pairs of main and reference images. Compared to existing medical VQA datasets, our questions are tailored to the Assessment-Diagnosis-Intervention-Evaluation treatment procedure used by clinical professionals. Meanwhile, we also propose a novel expert knowledge-aware graph representation learning model to address this task. The proposed baseline model leverages expert knowledge such as anatomical structure prior, semantic, and spatial knowledge to construct a multi-relationship graph, representing the image differences between two images for the image difference VQA task. The dataset and code can be found at https://github.com/Holipori/MIMIC-Diff-VQA. We believe this work would further push forward the medical vision language model.

Submitted 22 July, 2023; originally announced July 2023.

arXiv:2307.10204 [pdf, ps, other]

An IPW-based Unbiased Ranking Metric in Two-sided Markets

Authors: Keisho Oh, Naoki Nishimura, Minje Sung, Ken Kobayashi, Kazuhide Nakata

Abstract: In modern recommendation systems, unbiased learning-to-rank (LTR) is crucial for prioritizing items from biased implicit user feedback, such as click data. Several techniques, such as Inverse Propensity Weighting (IPW), have been proposed for single-sided markets. However, less attention has been paid to two-sided markets, such as job platforms or dating services, where successful conversions require matching preferences from both users. This paper addresses the complex interaction of biases between users in two-sided markets and proposes a tailored LTR approach. We first present a formulation of feedback mechanisms in two-sided matching platforms and point out that their implicit feedback may include position bias from both user groups. On the basis of this observation, we extend the IPW estimator and propose a new estimator, named two-sided IPW, to address the position biases in two-sided markets. We prove that the proposed estimator satisfies the unbiasedness for the ground-truth ranking metric. We conducted numerical experiments on real-world two-sided platforms and demonstrated the effectiveness of our proposed method in terms of both precision and robustness. Our experiments showed that our method outperformed baselines especially when handling rare items, which are less frequently observed in the training data.

Submitted 13 July, 2023; originally announced July 2023.

arXiv:2307.08966 [pdf, ps, other]

Multi-Robot Patrol Algorithm with Distributed Coordination and Consciousness of the Base Station's Situation Awareness

Authors: Kazuho Kobayashi, Seiya Ueno, Takehiro Higuchi

Abstract: Multi-robot patrolling is a potential application for robotic systems to survey wide areas efficiently without human burdens and mistakes. However, such systems have few examples of real-world applications due to their lack of human predictability. This paper proposes an algorithm, Local Reactive (LR), for multi-robot patrolling to satisfy both needs: (i) patrol efficiently and (ii) provide humans with better situation awareness to enhance system predictability. Each robot operating according to the proposed algorithm selects its patrol target from the local areas around the robot's current location by two requirements: (i) patrol location with greater need, (ii) report its achievements to the base station. The algorithm is distributed and coordinates the robots without centralized control by sharing their patrol achievements and degree of need to report to the base station. The proposed algorithm performed better than existing algorithms in both patrolling and the base station's situation awareness.

Submitted 10 October, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2304.14606 [pdf, other]

Algorithmic Recourse with Missing Values

Authors: Kentaro Kanamori, Takuya Takagi, Ken Kobayashi, Yuichi Ike

Abstract: This paper proposes a new framework of algorithmic recourse (AR) that works even in the presence of missing values. AR aims to provide a recourse action for altering the undesired prediction result given by a classifier. Existing AR methods assume that we can access complete information on the features of an input instance. However, we often encounter missing values in a given instance (e.g., due to privacy concerns), and previous studies have not discussed such a practical situation. In this paper, we first empirically and theoretically show the risk that a naive approach with a single imputation technique fails to obtain good actions regarding their validity, cost, and features to be changed. To alleviate this risk, we formulate the task of obtaining a valid and low-cost action for a given incomplete instance by incorporating the idea of multiple imputation. Then, we provide some theoretical analyses of our task and propose a practical solution based on mixed-integer linear optimization. Experimental results demonstrated the efficacy of our method in the presence of missing values compared to the baselines.

Submitted 22 May, 2024; v1 submitted 27 April, 2023; originally announced April 2023.

Comments: 30 pages, 15 figures

arXiv:2304.05949 [pdf, other]

doi 10.1038/s41467-024-46645-6

CMOS + stochastic nanomagnets: heterogeneous computers for probabilistic inference and learning

Authors: Nihal Sanjay Singh, Keito Kobayashi, Qixuan Cao, Kemal Selcuk, Tianrui Hu, Shaila Niazi, Navid Anjum Aadit, Shun Kanai, Hideo Ohno, Shunsuke Fukami, Kerem Y. Camsari

Abstract: Extending Moore's law by augmenting complementary-metal-oxide semiconductor (CMOS) transistors with emerging nanotechnologies (X) has become increasingly important. One important class of problems involves sampling-based Monte Carlo algorithms used in probabilistic machine learning, optimization, and quantum simulation. Here, we combine stochastic magnetic tunnel junction (sMTJ)-based probabilistic bits (p-bits) with Field Programmable Gate Arrays (FPGA) to create an energy-efficient CMOS + X (X = sMTJ) prototype. This setup shows how asynchronously driven CMOS circuits controlled by sMTJs can perform probabilistic inference and learning by leveraging the algorithmic update-order-invariance of Gibbs sampling. We show how the stochasticity of sMTJs can augment low-quality random number generators (RNG). Detailed transistor-level comparisons reveal that sMTJ-based p-bits can replace up to 10,000 CMOS transistors while dissipating two orders of magnitude less energy. Integrated versions of our approach can advance probabilistic computing involving deep Boltzmann machines and other energy-based learning algorithms with extremely high throughput and energy efficiency.

Submitted 23 February, 2024; v1 submitted 12 April, 2023; originally announced April 2023.

Journal ref: Nature Communications volume 15, Article number: 2685 (2024)

arXiv:2303.11734 [pdf, other]

Unlocking Layer-wise Relevance Propagation for Autoencoders

Authors: Kenyu Kobayashi, Renata Khasanova, Arno Schneuwly, Felix Schmidt, Matteo Casserini

Abstract: Autoencoders are a powerful and versatile tool often used for various problems such as anomaly detection, image processing and machine translation. However, their reconstructions are not always trivial to explain. Therefore, we propose a fast explainability solution by extending the Layer-wise Relevance Propagation method with the help of the Deep Taylor Decomposition framework. Furthermore, we introduce a novel validation technique for comparing our explainability approach with baseline methods in the case of missing ground-truth data. Our results highlight computational as well as qualitative advantages of the proposed explainability solution with respect to existing methods.

Submitted 21 March, 2023; originally announced March 2023.

arXiv:2303.03633 [pdf, other]

Sketch-based Medical Image Retrieval

Authors: Kazuma Kobayashi, Lin Gu, Ryuichiro Hataya, Takaaki Mizuno, Mototaka Miyake, Hirokazu Watanabe, Masamichi Takahashi, Yasuyuki Takamizawa, Yukihiro Yoshida, Satoshi Nakamura, Nobuji Kouno, Amina Bolatkan, Yusuke Kurose, Tatsuya Harada, Ryuji Hamamoto

Abstract: The amount of medical images stored in hospitals is increasing faster than ever; however, utilizing the accumulated medical images has been limited. This is because existing content-based medical image retrieval (CBMIR) systems usually require example images to construct query vectors; nevertheless, example images cannot always be prepared. Besides, there can be images with rare characteristics that make it difficult to find similar example images, which we call isolated samples. Here, we introduce a novel sketch-based medical image retrieval (SBMIR) system that enables users to find images of interest without example images. The key idea lies in feature decomposition of medical images, whereby the entire feature of a medical image can be decomposed into and reconstructed from normal and abnormal features. By extending this idea, our SBMIR system provides an easy-to-use two-step graphical user interface: users first select a template image to specify a normal feature and then draw a semantic sketch of the disease on the template image to represent an abnormal feature. Subsequently, it integrates the two kinds of input to construct a query vector and retrieves reference images with the closest reference vectors. Using two datasets, ten healthcare professionals with various clinical backgrounds participated in the user test for evaluation. As a result, our SBMIR system enabled users to overcome previous challenges, including image retrieval based on fine-grained image characteristics, image retrieval without example images, and image retrieval for isolated samples. Our SBMIR system achieves flexible medical image retrieval on demand, thereby expanding the utility of medical image databases.

Submitted 6 March, 2023; originally announced March 2023.

arXiv:2302.09636 [pdf, other]

Interpretable Medical Image Visual Question Answering via Multi-Modal Relationship Graph Learning

Authors: Xinyue Hu, Lin Gu, Kazuma Kobayashi, Qiyuan An, Qingyu Chen, Zhiyong Lu, Chang Su, Tatsuya Harada, Yingying Zhu

Abstract: Medical visual question answering (VQA) aims to answer clinically relevant questions regarding input medical images. This technique has the potential to improve the efficiency of medical professionals while relieving the burden on the public health system, particularly in resource-poor countries. Existing medical VQA methods tend to encode medical images and learn the correspondence between visual features and questions without exploiting the spatial, semantic, or medical knowledge behind them. This is partially because of the small size of the current medical VQA dataset, which often includes simple questions. Therefore, we first collected a comprehensive and large-scale medical VQA dataset, focusing on chest X-ray images. The questions involved detailed relationships, such as disease names, locations, levels, and types in our dataset. Based on this dataset, we also propose a novel baseline method by constructing three different relationship graphs: spatial relationship, semantic relationship, and implicit relationship graphs on the image regions, questions, and semantic labels. The answer and graph reasoning paths are learned for different questions.

Submitted 19 February, 2023; originally announced February 2023.

arXiv:2301.06701 [pdf, other]

doi 10.1016/j.engappai.2024.107844

Improved generalization with deep neural operators for engineering systems: Path towards digital twin

Authors: Kazuma Kobayashi, James Daniell, Syed Bahauddin Alam

Abstract: Neural Operator Networks (ONets) represent a novel advancement in machine learning algorithms, offering a robust and generalizable alternative for approximating solutions of partial differential equations (PDEs). Unlike traditional Neural Networks (NN), which directly approximate functions, ONets specialize in approximating mathematical operators, enhancing their efficacy in addressing complex PDEs. In this work, we evaluate the capabilities of Deep Operator Networks (DeepONets), an ONets implementation using a branch/trunk architecture. Three test cases are studied: a system of ODEs, a general diffusion system, and the convection/diffusion Burgers equation. It is demonstrated that DeepONets can accurately learn the solution operators, achieving prediction accuracy scores above 0.96 for the ODE and diffusion problems over the observed domain while achieving zero shot (without retraining) capability. More importantly, when evaluated on unseen scenarios (zero shot feature), the trained models exhibit excellent generalization ability. This underscores the vital niche of ONets for surrogate modeling and digital twin development across physical systems. While convection-diffusion poses a greater challenge, the results confirm the promise of ONets and motivate further enhancements to the DeepONet algorithm. This work represents an important step towards unlocking the potential of digital twins through robust and generalizable surrogates.

Submitted 28 April, 2024; v1 submitted 16 January, 2023; originally announced January 2023.

Journal ref: Engineering Applications of Artificial Intelligence 131 (2024): 107844

arXiv:2301.06676 [pdf, other]

doi 10.1016/j.engappai.2023.107620

Explainable, Interpretable & Trustworthy AI for Intelligent Digital Twin: Case Study on Remaining Useful Life

Authors: Kazuma Kobayashi, Syed Bahauddin Alam

Abstract: Artificial intelligence (AI) and Machine learning (ML) are increasingly used in energy and engineering systems, but these models must be fair, unbiased, and explainable. It is critical to have confidence in AI's trustworthiness. ML techniques have been useful in predicting important parameters and in improving model performance. However, for these AI techniques to be useful for making decisions, they need to be audited, accounted for, and easy to understand. Therefore, the use of explainable AI (XAI) and interpretable machine learning (IML) is crucial for the accurate prediction of prognostics, such as remaining useful life (RUL), in a digital twin system, to make it intelligent while ensuring that the AI model is transparent in its decision-making processes and that the predictions it generates can be understood and trusted by users. By using AI that is explainable, interpretable, and trustworthy, intelligent digital twin systems can make more accurate predictions of RUL, leading to better maintenance and repair planning, and ultimately, improved system performance. The objective of this paper is to explain the ideas of XAI and IML and to justify the important role of AI/ML in the digital twin framework and components, which requires XAI to understand the prediction better. This paper explains the importance of XAI and IML in both local and global aspects to ensure the use of trustworthy AI/ML applications for RUL prediction. We used the RUL prediction for the XAI and IML studies and leveraged the integrated Python toolbox for interpretable machine learning (PiML).

Submitted 28 April, 2024; v1 submitted 16 January, 2023; originally announced January 2023.

Journal ref: Engineering Applications of Artificial Intelligence 129 (2024): 107620

arXiv:2211.13157 [pdf, other]

Physics-Informed Multi-Stage Deep Learning Framework Development for Digital Twin-Centred State-Based Reactor Power Prediction

Authors: James Daniell, Kazuma Kobayashi, Susmita Naskar, Dinesh Kumar, Souvik Chakraborty, Ayodeji Alajo, Ethan Taber, Joseph Graham, Syed Alam

Abstract: Computationally efficient and trustworthy machine learning algorithms are necessary for Digital Twin (DT) framework development. Generally speaking, DT-enabling technologies consist of five major components: (i) Machine learning (ML)-driven prediction algorithm, (ii) Temporal synchronization between physics and digital assets utilizing advanced sensors/instrumentation, (iii) uncertainty propagation, and (iv) DT operational framework. Unfortunately, there is still a significant gap in developing those components for nuclear plant operation. In order to address this gap, this study specifically focuses on the "ML-driven prediction algorithms" as a viable component for the nuclear reactor operation while assessing the reliability and efficacy of the proposed model. Therefore, as a DT prediction component, this study develops a multi-stage predictive model consisting of two feedforward Deep Learning using Neural Networks (DNNs) to determine the final steady-state power of a reactor transient for a nuclear reactor/plant. The goal of the multi-stage model architecture is to convert probabilistic classification to continuous output variables to improve reliability and ease of analysis. Four regression models are developed and tested with input from the first stage model to predict a single value representing the reactor power output. The combined model yields 96% classification accuracy for the first stage and 92% absolute prediction accuracy for the second stage. The development procedure is discussed so that the method can be applied generally to similar systems. An analysis of the role similar models would fill in DTs is performed.

Submitted 24 November, 2022; v1 submitted 23 November, 2022; originally announced November 2022.

arXiv:2210.10314 [pdf, other]

Two-stage training method for Japanese electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion

Authors: Ding Ma, Lester Phillip Violeta, Kazuhiro Kobayashi, Tomoki Toda

Abstract: Sequence-to-sequence (seq2seq) voice conversion (VC) models have greater potential in converting electrolaryngeal (EL) speech to normal speech (EL2SP) compared to conventional VC models. However, EL2SP based on seq2seq VC requires a sufficiently large amount of parallel data for the model training and it suffers from significant performance degradation when the amount of training data is insufficient. To address this issue, we suggest a novel, two-stage strategy to optimize the performance on EL2SP based on seq2seq VC when a small amount of the parallel dataset is available. In contrast to utilizing high-quality data augmentations in previous studies, we first combine a large amount of imperfect synthetic parallel data of EL and normal speech, with the original dataset into VC training. Then, a second stage training is conducted with the original parallel dataset only. The results show that the proposed method progressively improves the performance of EL2SP based on seq2seq VC.

Submitted 19 October, 2022; originally announced October 2022.

Comments: Accepted to SLT 2022

arXiv:2210.09055 [pdf, other]

Data-driven multi-scale modeling and robust optimization of composite structure with uncertainty quantification

Authors: Kazuma Kobayashi, Shoaib Usman, Carlos Castano, Dinesh Kumar, Syed Alam

Abstract: Accurately modeling materials' properties at lower length scales (micro-level) while translating the effects to the component and/or system level (macro-level) can significantly reduce the amount of experimentation required to develop new technologies. Robustness analysis of fuel and structural performance for harsh environments (such as power uprated reactor systems or aerospace applications) using machine learning-based multi-scale modeling and robust optimization under uncertainties is required. The fiber and matrix material characteristics are potential sources of uncertainty at the microscale. The stacking sequence (angles of stacking and thickness of layers) of composite layers causes meso-scale uncertainties. It is also possible for macro-scale uncertainties to arise from system properties, like the load or the initial conditions. This chapter demonstrates advanced data-driven methods and outlines the specific capability that must be developed/added for the multi-scale modeling of advanced composite materials. This chapter proposes a multi-scale modeling method for composite structures based on a finite element method (FEM) simulation driven by surrogate models/emulators based on microstructurally informed meso-scale materials models to study the impact of operational parameters/uncertainties using machine learning approaches. To ensure optimal composite materials, composite properties are optimized with respect to initial materials volume fraction using data-driven numerical algorithms.

Submitted 4 November, 2022; v1 submitted 13 October, 2022; originally announced October 2022.

Journal ref: Handbook of Smart Energy Systems, 2022

arXiv:2210.00074 [pdf]

Leveraging Industry 4.0 -- Deep Learning, Surrogate Model and Transfer Learning with Uncertainty Quantification Incorporated into Digital Twin for Nuclear System

Authors: M. Rahman, Abid Khan, Sayeed Anowar, Md Al-Imran, Richa Verma, Dinesh Kumar, Kazuma Kobayashi, Syed Alam

Abstract: Industry 4.0 targets the conversion of the traditional industries into intelligent ones through technological revolution. This revolution is only possible through innovation, optimization, interconnection, and rapid decision-making capability. Numerical models are believed to be the key components of Industry 4.0, facilitating quick decision-making through simulations instead of costly experiments. However, numerical investigation of precise, high-fidelity models for optimization or decision-making is usually time-consuming and computationally expensive. In such instances, data-driven surrogate models are excellent substitutes for fast computational analysis and the probabilistic prediction of the output parameter for new input parameters. The emergence of Internet of Things (IoT) and Machine Learning (ML) has made the concept of surrogate modeling even more viable. However, these surrogate models contain intrinsic uncertainties, originate from modeling defects, or both. These uncertainties, if not quantified and minimized, can produce a skewed result. Therefore, proper implementation of uncertainty quantification techniques is crucial during optimization, cost reduction, or safety enhancement processes analysis. This chapter begins with a brief overview of the concept of surrogate modeling, transfer learning, IoT and digital twins. After that, a detailed overview of uncertainties, uncertainty quantification frameworks, and specifics of uncertainty quantification methodologies for a surrogate model linked to a digital twin is presented. Finally, the use of uncertainty quantification approaches in the nuclear industry has been addressed.

Submitted 30 September, 2022; originally announced October 2022.

arXiv:2209.12146 [pdf]

doi 10.1007/978-3-030-72322-4_149-1

Machine Learning and Artificial Intelligence-Driven Multi-Scale Modeling for High Burnup Accident-Tolerant Fuels for Light Water-Based SMR Applications

Authors: Md. Shamim Hassan, Abid Hossain Khan, Richa Verma, Dinesh Kumar, Kazuma Kobayashi, Shoaib Usman, Syed Alam

Abstract: The concept of small modular reactor has changed the outlook for tackling future energy crises. This new reactor technology is very promising considering its lower investment requirements, modularity, design simplicity, and enhanced safety features. The application of artificial intelligence-driven multi-scale modeling (neutronics, thermal hydraulics, fuel performance, etc.) incorporating Digital Twin and associated uncertainties in the research of small modular reactors is a recent concept. In this work, a comprehensive study is conducted on the multiscale modeling of accident-tolerant fuels. The application of these fuels in the light water-based small modular reactors is explored. This chapter also focuses on the application of machine learning and artificial intelligence in the design optimization, control, and monitoring of small modular reactors. Finally, a brief assessment of the research gap on the application of artificial intelligence to the development of high burnup composite accident-tolerant fuels is provided. Necessary actions to fulfill these gaps are also discussed.

Submitted 25 September, 2022; originally announced September 2022.

Journal ref: Handbook of Smart Energy Systems, 2022

arXiv:2205.11099 [pdf, other]

Bézier Flow: a Surface-wise Gradient Descent Method for Multi-objective Optimization

Authors: Akiyoshi Sannai, Yasunari Hikima, Ken Kobayashi, Akinori Tanaka, Naoki Hamada

Abstract: In this paper, we propose a strategy to construct a multi-objective optimization algorithm from a single-objective optimization algorithm by using the Bézier simplex model. Also, we extend the stability of optimization algorithms in the sense of Probably Approximately Correct (PAC) learning and define the PAC stability. We prove that it leads to an upper bound on the generalization with high probability. Furthermore, we show that multi-objective optimization algorithms derived from a gradient descent-based single-objective optimization algorithm are PAC stable. We conducted numerical experiments and demonstrated that our method achieved lower generalization errors than the existing multi-objective optimization algorithm.

Submitted 23 May, 2022; originally announced May 2022.

arXiv:2203.15292 [pdf, other]

doi 10.1145/3512290.3528778

A Two-phase Framework with a Bézier Simplex-based Interpolation Method for Computationally Expensive Multi-objective Optimization

Authors: Ryoji Tanabe, Youhei Akimoto, Ken Kobayashi, Hiroshi Umeki, Shinichi Shirakawa, Naoki Hamada

Abstract: This paper proposes a two-phase framework with a Bézier simplex-based interpolation method (TPB) for computationally expensive multi-objective optimization. The first phase in TPB aims to approximate a few Pareto optimal solutions by optimizing a sequence of single-objective scalar problems. The first phase in TPB can fully exploit a state-of-the-art single-objective derivative-free optimizer. The second phase in TPB utilizes a Bézier simplex model to interpolate the solutions obtained in the first phase. The second phase in TPB fully exploits the fact that a Bézier simplex model can approximate the Pareto optimal solution set by exploiting its simplex structure when a given problem is simplicial. We investigate the performance of TPB on the 55 bi-objective BBOB problems. The results show that TPB performs significantly better than HMO-CMA-ES and some state-of-the-art meta-model-based optimizers.

Submitted 29 March, 2022; originally announced March 2022.

Comments: This is an accepted version of a paper published in the proceedings of GECCO 2022

arXiv:2202.00380 [pdf, ps, other]

doi 10.1038/s41598-022-18115-w

Machine-learning-enhanced quantum sensors for accurate magnetic field imaging

Authors: Moeta Tsukamoto, Shuji Ito, Kensuke Ogawa, Yuto Ashida, Kento Sasaki, Kensuke Kobayashi

Abstract: Local detection of magnetic fields is crucial for characterizing nano- and micro-materials and has been implemented using various scanning techniques or even diamond quantum sensors. Diamond nanoparticles (nanodiamonds) offer an attractive opportunity to achieve high spatial resolution because they can easily be close to the target within a few tens of nm simply by attaching them to its surface. A physical model for such a randomly oriented nanodiamond ensemble (NDE) is available, but the complexity of actual experimental conditions still limits the accuracy of deducing magnetic fields. Here, we demonstrate magnetic field imaging with high accuracy of 1.8 μT combining NDE and machine learning without any physical models. We also discover the field direction dependence of the NDE signal, suggesting the potential application for vector magnetometry and improvement of the existing model. Our method further enriches the performance of NDE to achieve the accuracy to visualize mesoscopic current and magnetism in atomic-layer materials and to expand the applicability in arbitrarily shaped materials, including living organisms. This achievement will bridge machine learning and quantum sensing for accurate measurements.

Submitted 1 February, 2022; originally announced February 2022.

Comments: 29 pages, 10 figures

arXiv:2112.13208 [pdf, other]

Neural Network Module Decomposition and Recomposition

Authors: Hiroaki Kingetsu, Kenichi Kobayashi, Taiji Suzuki

Abstract: We propose a modularization method that decomposes a deep neural network (DNN) into small modules from a functionality perspective and recomposes them into a new model for some other task. Decomposed modules are expected to have the advantages of interpretability and verifiability due to their small size. In contrast to existing studies based on reusing models that involve retraining, such as a transfer learning model, the proposed method does not require retraining and has wide applicability as it can be easily combined with existing functional modules. The proposed method extracts modules using weight masks and can be applied to arbitrary DNNs. Unlike existing studies, it requires no assumption about the network architecture. To extract modules, we designed a learning method and a loss function to maximize shared weights among modules. As a result, the extracted modules can be recomposed without a large increase in the size. We demonstrate that the proposed method can decompose and recompose DNNs with high compression ratio and high accuracy and is superior to the existing method through sharing weights between modules.

Submitted 25 December, 2021; originally announced December 2021.

arXiv:2112.12454 [pdf, other]

Cardinality-constrained Distributionally Robust Portfolio Optimization

Authors: Ken Kobayashi, Yuichi Takano, Kazuhide Nakata

Abstract: This paper studies a distributionally robust portfolio optimization model with a cardinality constraint for limiting the number of invested assets. We formulate this model as a mixed-integer semidefinite optimization (MISDO) problem by means of the moment-based ambiguity set of probability distributions of asset returns. To exactly solve large-scale problems, we propose a specialized cutting-plane algorithm that is based on bilevel optimization reformulation. We prove the finite convergence of the algorithm. We also apply a matrix completion technique to lower-level SDO problems to make their problem sizes much smaller. Numerical experiments demonstrate that our cutting-plane algorithm is significantly faster than the state-of-the-art MISDO solver SCIP-SDP. We also show that our portfolio optimization model can achieve good investment performance compared with the conventional robust optimization model based on the ellipsoidal uncertainty set.

Submitted 21 December, 2022; v1 submitted 23 December, 2021; originally announced December 2021.

arXiv:2109.14108 [pdf, ps, other]

Connected domination in grid graphs

Authors: Masahisa Goto, Koji M. Kobayashi

Abstract: Given an undirected simple graph, a subset of the vertices of the graph is a {\em dominating set} if every vertex not in the subset is adjacent to at least one vertex in the subset. A subset of the vertices of the graph is a {\em connected dominating set} if the subset is a dominating set and the subgraph induced by the subset is connected. In this paper, we determine the minimum cardinality of a connected dominating set, called the {\em connected domination number}, of an $m \times n$ grid graph for any $m$ and $n$.

Submitted 28 September, 2021; originally announced September 2021.

Comments: 44 pages

arXiv:2109.03016 [pdf, other]

doi 10.1007/978-3-030-77599-5_5

SpatialViewer: A Remote Work Sharing Tool that Considers Intimacy Among Workers

Authors: Sicheng Li, Yudai Makioka, Kyousuke Kobayashi, Haoran Xie, Kentaro Takashima

Abstract: Due to the influence of the new coronavirus disease (COVID-19), teleworking has been expanding rapidly. Although existing interactive remote working systems are convenient, they do not allow users to adjust their spatial distance to team members at will, and they ignore the discomfort caused by different levels of intimacy. To solve this issue, we propose a telework support system using spatial augmented reality technology. This system calibrates the space in which videos are projected with real space and adjusts the spatial distance between users by changing the position of projections. Users can switch the projection position of the video using hand-wave gestures. We also synchronize audio according to distance to further emphasize the sense of space within the remote interaction: the distance between projection position and user is inversely proportional to the audio volume. We conducted a telework experiment and a questionnaire survey to evaluate our system. The results show that the system enables users to adjust distance according to intimacy and thus improves the users' comfort.

Submitted 7 September, 2021; originally announced September 2021.

Comments: Proceedings of HCII 2021. 12 pages, 6 figures

arXiv:2106.01415 [pdf, other]

A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion

Authors: Wen-Chin Huang, Kazuhiro Kobayashi, Yu-Huai Peng, Ching-Feng Liu, Yu Tsao, Hsin-Min Wang, Tomoki Toda

Abstract: We propose a new paradigm for maintaining speaker identity in dysarthric voice conversion (DVC). The poor quality of dysarthric speech can be greatly improved by statistical VC, but as the normal speech utterances of a dysarthria patient are nearly impossible to collect, previous work failed to recover the individuality of the patient. In light of this, we suggest a novel, two-stage approach for DVC, which is highly flexible in that no normal speech of the patient is required. First, a powerful parallel sequence-to-sequence model converts the input dysarthric speech into a normal speech of a reference speaker as an intermediate product, and a nonparallel, frame-wise VC model realized with a variational autoencoder then converts the speaker identity of the reference speech back to that of the patient while assumed to be capable of preserving the enhanced quality. We investigate several design options. Experimental evaluation results demonstrate the potential of our approach to improving the quality of the dysarthric speech while maintaining the speaker identity.

Submitted 2 June, 2021; originally announced June 2021.

Comments: Accepted to Interspeech 2021. 5 pages, 3 figures, 1 table

arXiv:2104.06793 [pdf, other]

Non-autoregressive sequence-to-sequence voice conversion

Authors: Tomoki Hayashi, Wen-Chin Huang, Kazuhiro Kobayashi, Tomoki Toda

Abstract: This paper proposes a novel voice conversion (VC) method based on non-autoregressive sequence-to-sequence (NAR-S2S) models. Inspired by the great success of NAR-S2S models such as FastSpeech in text-to-speech (TTS), we extend the FastSpeech2 model for the VC problem. We introduce the convolution-augmented Transformer (Conformer) instead of the Transformer, making it possible to capture both local and global context information from the input sequence. Furthermore, we extend variance predictors to variance converters to explicitly convert the source speaker's prosody components such as pitch and energy into the target speaker. The experimental evaluation with the Japanese speaker dataset, which consists of male and female speakers of 1,000 utterances, demonstrates that the proposed model enables us to perform more stable, faster, and better conversion than autoregressive S2S (AR-S2S) models such as Tacotron2 and Transformer.

Submitted 14 April, 2021; originally announced April 2021.

Comments: Accepted to ICASSP2021. Demo HP: https://kan-bayashi.github.io/NonARSeq2SeqVC/

arXiv:2104.04679 [pdf, other]

Approximate Bayesian Computation of Bézier Simplices

Authors: Akinori Tanaka, Akiyoshi Sannai, Ken Kobayashi, Naoki Hamada

Abstract: Bézier simplex fitting algorithms have been recently proposed to approximate the Pareto set/front of multi-objective continuous optimization problems. These new methods have been shown to be successful at approximating various shapes of Pareto sets/fronts when sample points exactly lie on the Pareto set/front. However, if the sample points scatter away from the Pareto set/front, those methods are likely to suffer from over-fitting. To overcome this issue, in this paper, we extend the Bézier simplex model to a probabilistic one and propose a new learning algorithm for it, which falls into the framework of approximate Bayesian computation (ABC) based on the Wasserstein distance. We also study the convergence property of the Wasserstein ABC algorithm. An extensive experimental evaluation on publicly available problem instances shows that the new algorithm converges on a finite sample. Moreover, it outperforms the deterministic fitting methods on noisy instances.

Submitted 12 April, 2021; v1 submitted 10 April, 2021; originally announced April 2021.

Report number: RIKEN-iTHEMS-Report-21

arXiv:2103.12328 [pdf, other]

Decomposing Normal and Abnormal Features of Medical Images into Discrete Latent Codes for Content-Based Image Retrieval

Authors: Kazuma Kobayashi, Ryuichiro Hataya, Yusuke Kurose, Mototaka Miyake, Masamichi Takahashi, Akiko Nakagawa, Tatsuya Harada, Ryuji Hamamoto

Abstract: In medical imaging, the characteristics purely derived from a disease should reflect the extent to which abnormal findings deviate from the normal features. Indeed, physicians often need corresponding images without abnormal findings of interest or, conversely, images that contain similar abnormal findings regardless of normal anatomical context. This is called comparative diagnostic reading of medical images, which is essential for a correct diagnosis. To support comparative diagnostic reading, content-based image retrieval (CBIR), which can selectively utilize normal and abnormal features in medical images as two separable semantic components, will be useful. Therefore, we propose a neural network architecture to decompose the semantic components of medical images into two latent codes: normal anatomy code and abnormal anatomy code. The normal anatomy code represents normal anatomies that should have existed if the sample is healthy, whereas the abnormal anatomy code attributes to abnormal changes that reflect deviation from the normal baseline. These latent codes are discretized through vector quantization to enable binary hashing, which can reduce the computational burden at the time of similarity search. By calculating the similarity based on either normal or abnormal anatomy codes or the combination of the two codes, our algorithm can retrieve images according to the selected semantic component from a dataset consisting of brain magnetic resonance images of gliomas. Our CBIR system qualitatively and quantitatively achieves remarkable results.

Submitted 23 March, 2021; originally announced March 2021.

arXiv:2103.02858 [pdf, ps, other]

crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder

Authors: Kazuhiro Kobayashi, Wen-Chin Huang, Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Tomoki Toda

Abstract: In this paper, we present an open-source software for developing a nonparallel voice conversion (VC) system named crank. Although we have released an open-source VC software based on the Gaussian mixture model named sprocket in the last VC Challenge, it is not straightforward to apply any speech corpus because it is necessary to prepare parallel utterances of source and target speakers to model a statistical conversion function. To address this issue, in this study, we developed a new open-source VC software that enables users to model the conversion function by using only a nonparallel speech corpus. For implementing the VC software, we used a vector-quantized variational autoencoder (VQVAE). To rapidly examine the effectiveness of recent technologies developed in this research field, crank also supports several representative works for autoencoder-based VC methods such as the use of hierarchical architectures, cyclic architectures, generative adversarial networks, speaker adversarial training, and neural vocoders. Moreover, it is possible to automatically estimate objective measures such as mel-cepstrum distortion and pseudo mean opinion score based on MOSNet. In this paper, we describe representative functions developed in crank and make brief comparisons by objective evaluations.

Submitted 4 March, 2021; originally announced March 2021.

Comments: Accepted to ICASSP 2021

arXiv:2102.08014 [pdf, other]

Representing Hierarchical Structure by Using Cone Embedding

Authors: Daisuke Takehara, Kei Kobayashi

Abstract: Graph embedding is becoming an important method with applications in various areas, including social networks and knowledge graph completion. In particular, Poincaré embedding has been proposed to capture the hierarchical structure of graphs, and its effectiveness has been reported. However, most of the existing methods have isometric mappings in the embedding space, and the choice of the origin point can be arbitrary. This fact is not desirable when the distance from the origin is used as an indicator of hierarchy, as in the case of Poincaré embedding. In this paper, we propose cone embedding, an embedding method in a metric cone, which solves these problems, and we gain further benefits: 1) we provide an indicator of hierarchical information that is both geometrically and intuitively natural to interpret, and 2) we can extract the hierarchical structure from a graph embedding output of other methods by learning additional one-dimensional parameters.

Submitted 10 May, 2022; v1 submitted 16 February, 2021; originally announced February 2021.

arXiv:2012.11782 [pdf, other]

doi 10.1527/tjsai.36-6_C-L44

Ordered Counterfactual Explanation by Mixed-Integer Linear Optimization

Authors: Kentaro Kanamori, Takuya Takagi, Ken Kobayashi, Yuichi Ike, Kento Uemura, Hiroki Arimura

Abstract: Post-hoc explanation methods for machine learning models have been widely used to support decision-making. One of the popular methods is Counterfactual Explanation (CE), also known as Actionable Recourse, which provides a user with a perturbation vector of features that alters the prediction result. Given a perturbation vector, a user can interpret it as an "action" for obtaining one's desired decision result. In practice, however, showing only a perturbation vector is often insufficient for users to execute the action. The reason is that if there is an asymmetric interaction among features, such as causality, the total cost of the action is expected to depend on the order of changing features. Therefore, practical CE methods are required to provide an appropriate order of changing features in addition to a perturbation vector. For this purpose, we propose a new framework called Ordered Counterfactual Explanation (OrdCE). We introduce a new objective function that evaluates a pair of an action and an order based on feature interaction. To extract an optimal pair, we propose a mixed-integer linear optimization approach with our objective function. Numerical experiments on real datasets demonstrated the effectiveness of our OrdCE in comparison with unordered CE methods.

Submitted 14 March, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

Comments: 20 pages, 5 figures, to appear in the 35th AAAI Conference on Artificial Intelligence (AAAI 2021)

arXiv:2011.06224 [pdf, other]

Decomposing Normal and Abnormal Features of Medical Images for Content-based Image Retrieval

Authors: Kazuma Kobayashi, Ryuichiro Hataya, Yusuke Kurose, Tatsuya Harada, Ryuji Hamamoto

Abstract: Medical images can be decomposed into normal and abnormal features, which is considered as the compositionality. Based on this idea, we propose an encoder-decoder network to decompose a medical image into two discrete latent codes: a normal anatomy code and an abnormal anatomy code. Using these latent codes, we demonstrate a similarity retrieval by focusing on either normal or abnormal features of medical images.

Submitted 12 November, 2020; originally announced November 2020.

Comments: Machine Learning for Health (ML4H) at NeurIPS 2020 - Extended Abstract

arXiv:2010.13494 [pdf, other]

One-vs.-One Mitigation of Intersectional Bias: A General Method to Extend Fairness-Aware Binary Classification

Authors: Kenji Kobayashi, Yuri Nakao

Abstract: With the widespread adoption of machine learning in the real world, the impact of the discriminatory bias has attracted attention. In recent years, various methods to mitigate the bias have been proposed. However, most of them have not considered intersectional bias, which brings unfair situations where people belonging to specific subgroups of a protected group are treated worse when multiple sensitive attributes are taken into consideration. To mitigate this bias, in this paper, we propose a method called One-vs.-One Mitigation by applying a process of comparison between each pair of subgroups related to sensitive attributes to the fairness-aware machine learning for binary classification. We compare our method and the conventional fairness-aware binary classification methods in comprehensive settings using three approaches (pre-processing, in-processing, and post-processing), six metrics (the ratio and difference of demographic parity, equalized odds, and equal opportunity), and two real-world datasets (Adult and COMPAS). As a result, our method mitigates the intersectional bias much better than conventional methods in all the settings. With the result, we open up the potential of fairness-aware binary classification for solving more realistic problems occurring when there are multiple sensitive attributes.

Submitted 26 October, 2020; originally announced October 2020.

ACM Class: I.6.5; I.2.6

arXiv:2010.04446 [pdf, other]

The NU Voice Conversion System for the Voice Conversion Challenge 2020: On the Effectiveness of Sequence-to-sequence Models and Autoregressive Neural Vocoders

Authors: Wen-Chin Huang, Patrick Lumban Tobing, Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Toda

Abstract: In this paper, we present the voice conversion (VC) systems developed at Nagoya University (NU) for the Voice Conversion Challenge 2020 (VCC2020). We aim to determine the effectiveness of two recent significant technologies in VC: sequence-to-sequence (seq2seq) models and autoregressive (AR) neural vocoders. Two respective systems were developed for the two tasks in the challenge: for task 1, we adopted the Voice Transformer Network, a Transformer-based seq2seq VC model, and extended it with synthetic parallel data to tackle nonparallel data; for task 2, we used the frame-based cyclic variational autoencoder (CycleVAE) to model the spectral features of a speech waveform and the AR WaveNet vocoder with additional fine-tuning. By comparing with the baseline systems, we confirmed that the seq2seq modeling can improve the conversion similarity and that the use of AR vocoders can improve the naturalness of the converted speech.

Submitted 9 October, 2020; originally announced October 2020.

Comments: Accepted to the ISCA Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020

arXiv:2007.15159 [pdf, other]

doi 10.1371/journal.pone.0242099

Prediction of hierarchical time series using structured regularization and its application to artificial neural networks

Authors: Tomokaze Shiratori, Ken Kobayashi, Yuichi Takano

Abstract: This paper discusses the prediction of hierarchical time series, where each upper-level time series is calculated by summing appropriate lower-level time series. Forecasts for such hierarchical time series should be coherent, meaning that the forecast for an upper-level time series equals the sum of forecasts for corresponding lower-level time series. Previous methods for making coherent forecasts consist of two phases: first computing base (incoherent) forecasts and then reconciling those forecasts based on their inherent hierarchical structure. With the aim of improving time series predictions, we propose a structured regularization method for completing both phases simultaneously. The proposed method is based on a prediction model for bottom-level time series and uses a structured regularization term to incorporate upper-level forecasts into the prediction model. We also develop a backpropagation algorithm specialized for application of our method to artificial neural networks for time series prediction. Experimental results using synthetic and real-world datasets demonstrate the superiority of our method in terms of prediction accuracy and computational efficiency.

Submitted 29 July, 2020; originally announced July 2020.

arXiv:2007.05663 [pdf, other]

doi 10.1109/TASLP.2021.3061245

Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model with Pitch-dependent Dilated Convolution Neural Network

Authors: Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda

Abstract: In this paper, a pitch-adaptive waveform generative model named Quasi-Periodic WaveNet (QPNet) is proposed to improve the limited pitch controllability of vanilla WaveNet (WN) using pitch-dependent dilated convolution neural networks (PDCNNs). Specifically, as a probabilistic autoregressive generation model with stacked dilated convolution layers, WN achieves high-fidelity audio waveform generation. However, the pure-data-driven nature and the lack of prior knowledge of audio signals degrade the pitch controllability of WN. For instance, it is difficult for WN to precisely generate the periodic components of audio signals when the given auxiliary fundamental frequency ($F_{0}$) features are outside the $F_{0}$ range observed in the training data. To address this problem, QPNet with two novel designs is proposed. First, the PDCNN component is applied to dynamically change the network architecture of WN according to the given auxiliary $F_{0}$ features. Second, a cascaded network structure is utilized to simultaneously model the long- and short-term dependencies of quasi-periodic signals such as speech. The performances of single-tone sinusoid and speech generations are evaluated. The experimental results show the effectiveness of the PDCNNs for unseen auxiliary $F_{0}$ features and the effectiveness of the cascaded structure for speech generation.

Submitted 27 March, 2021; v1 submitted 10 July, 2020; originally announced July 2020.

Comments: 15 pages, 12 figures, 11 tables

Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 1134-1148, 2021

arXiv:2005.12573 [pdf, other]

Learning Global and Local Features of Normal Brain Anatomy for Unsupervised Abnormality Detection

Authors: Kazuma Kobayashi, Ryuichiro Hataya, Yusuke Kurose, Amina Bolatkan, Mototaka Miyake, Hirokazu Watanabe, Masamichi Takahashi, Jun Itami, Tatsuya Harada, Ryuji Hamamoto

Abstract: In real-world clinical practice, overlooking unanticipated findings can result in serious consequences. However, supervised learning, which is the foundation for the current success of deep learning, only encourages models to identify abnormalities that are defined in datasets in advance. Therefore, abnormality detection must be implemented in medical images that are not limited to a specific disease category. In this study, we demonstrate an unsupervised learning framework for pixel-wise abnormality detection in brain magnetic resonance imaging captured from a patient population with metastatic brain tumor. Our concept is as follows: If an image reconstruction network can faithfully reproduce the global features of normal anatomy, then the abnormal lesions in unseen images can be identified based on the local difference from those reconstructed as normal by a discriminative network. Both networks are trained on a dataset comprising only normal images without labels. In addition, we devise a metric to evaluate the anatomical fidelity of the reconstructed images and confirm that the overall detection performance is improved when the image reconstruction network achieves a higher score. For evaluation, clinically significant abnormalities are comprehensively segmented. The results show that the area under the receiver operating characteristics curve values for metastatic brain tumors, extracranial metastatic tumors, postoperative cavities, and structural changes are 0.78, 0.61, 0.91, and 0.60, respectively.

Submitted 8 May, 2021; v1 submitted 26 May, 2020; originally announced May 2020.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2003.11750 [pdf]

doi 10.1109/ACCESS.2020.2984007

Non-parallel Voice Conversion System with WaveNet Vocoder and Collapsed Speech Suppression

Authors: Yi-Chiao Wu, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Hayashi, Tomoki Toda

Abstract: In this paper, we integrate a simple non-parallel voice conversion (VC) system with a WaveNet (WN) vocoder and a proposed collapsed speech suppression technique. The effectiveness of WN as a vocoder for generating high-fidelity speech waveforms on the basis of acoustic features has been confirmed in recent works. However, when combining the WN vocoder with a VC system, the distorted acoustic features, acoustic and temporal mismatches, and exposure bias usually lead to significant speech quality degradation, making WN generate some very noisy speech segments called collapsed speech. To tackle the problem, we take conventional-vocoder-generated speech as the reference speech to derive a linear predictive coding distribution constraint (LPCDC) to avoid the collapsed speech problem. Furthermore, to mitigate the negative effects introduced by the LPCDC, we propose a collapsed speech segment detector (CSSD) to ensure that the LPCDC is only applied to the problematic segments to limit the loss of quality to short periods. Objective and subjective evaluations are conducted, and the experimental results confirm the effectiveness of the proposed method, which further improves the speech quality of our previous non-parallel VC system submitted to Voice Conversion Challenge 2018.

Submitted 6 April, 2020; v1 submitted 26 March, 2020; originally announced March 2020.

Comments: 13 pages, 13 figures, 1 table, accepted to publish in IEEE Access

arXiv:2003.00402 [pdf, other]

Why is the Mahalanobis Distance Effective for Anomaly Detection?

Authors: Ryo Kamoi, Kei Kobayashi

Abstract: The Mahalanobis distance-based confidence score, a recently proposed anomaly detection method for pre-trained neural classifiers, achieves state-of-the-art performance on both out-of-distribution (OoD) and adversarial examples detection. This work analyzes why this method exhibits such strong performance in practical settings while imposing an implausible assumption; namely, that class conditional distributions of pre-trained features have tied covariance. Although the Mahalanobis distance-based method is claimed to be motivated by classification prediction confidence, we find that its superior performance stems from information not useful for classification. This suggests that the reason the Mahalanobis confidence score works so well is mistaken, and makes use of different information from ODIN, another popular OoD detection method based on prediction confidence. This perspective motivates us to combine these two methods, and the combined detector exhibits improved performance and robustness. These findings provide insight into the behavior of neural classifiers in response to anomalous inputs.

Submitted 30 April, 2020; v1 submitted 29 February, 2020; originally announced March 2020.

arXiv:1911.06515 [pdf, other]

Likelihood Assignment for Out-of-Distribution Inputs in Deep Generative Models is Sensitive to Prior Distribution Choice

Authors: Ryo Kamoi, Kei Kobayashi

Abstract: Recent work has shown that deep generative models assign higher likelihood to out-of-distribution inputs than to training data. We show that a factor underlying this phenomenon is a mismatch between the nature of the prior distribution and that of the data distribution, a problem found in widely used deep generative models such as VAEs and Glow. While a typical choice for a prior distribution is a standard Gaussian distribution, properties of distributions of real data sets may not be consistent with a unimodal prior distribution. This paper focuses on the relationship between the choice of a prior distribution and the likelihoods assigned to out-of-distribution inputs. We propose the use of a mixture distribution as a prior to make likelihoods assigned by deep generative models sensitive to out-of-distribution inputs. Furthermore, we explain the theoretical advantages of adopting a mixture distribution as the prior, and we present experimental results to support our claims. Finally, we demonstrate that a mixture prior lowers the out-of-distribution likelihood with respect to two pairs of real image data sets: Fashion-MNIST vs. MNIST and CIFAR10 vs. SVHN.

Submitted 15 November, 2019; originally announced November 2019.

Showing 1–50 of 84 results for author: Kobayashi, K