-
LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs
Authors:
LLM-jp,
Akiko Aizawa,
Eiji Aramaki,
Bowen Chen,
Fei Cheng,
Hiroyuki Deguchi,
Rintaro Enomoto,
Kazuki Fujii,
Kensuke Fukumoto,
Takuya Fukushima,
Namgi Han,
Yuto Harada,
Chikara Hashimoto,
Tatsuya Hiraoka,
Shohei Hisada,
Sosuke Hosokawa,
Lu Jie,
Keisuke Kamata,
Teruhito Kanazawa,
Hiroki Kanezashi,
Hiroshi Kataoka,
Satoru Katsumata,
Daisuke Kawahara,
Seiya Kawano
, et al. (57 additional authors not shown)
Abstract:
This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop open-source and strong Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together for this purpose. This paper presents the background of the establishment of LLM-jp, summaries of its activities, and technical reports on the LLMs developed by LLM-jp. For the latest activities, visit https://llm-jp.nii.ac.jp/en/.
Submitted 4 July, 2024;
originally announced July 2024.
-
J-CRe3: A Japanese Conversation Dataset for Real-world Reference Resolution
Authors:
Nobuhiro Ueda,
Hideko Habe,
Yoko Matsui,
Akishige Yuguchi,
Seiya Kawano,
Yasutomo Kawanishi,
Sadao Kurohashi,
Koichiro Yoshino
Abstract:
Understanding expressions that refer to the physical world is crucial for such human-assisting systems in the real world, as robots that must perform actions that are expected by users. In real-world reference resolution, a system must ground the verbal information that appears in user interactions to the visual information observed in egocentric views. To this end, we propose a multimodal reference resolution task and construct a Japanese Conversation dataset for Real-world Reference Resolution (J-CRe3). Our dataset contains egocentric video and dialogue audio of real-world conversations between two people acting as a master and an assistant robot at home. The dataset is annotated with crossmodal tags between phrases in the utterances and the object bounding boxes in the video frames. These tags include indirect reference relations, such as predicate-argument structures and bridging references as well as direct reference relations. We also constructed an experimental model and clarified the challenges in multimodal reference resolution tasks.
Submitted 28 March, 2024;
originally announced March 2024.
-
Temporal Logic Formalisation of ISO 34502 Critical Scenarios: Modular Construction with the RSS Safety Distance
Authors:
Jesse Reimann,
Nico Mansion,
James Haydon,
Benjamin Bray,
Agnishom Chattopadhyay,
Sota Sato,
Masaki Waga,
Étienne André,
Ichiro Hasuo,
Naoki Ueda,
Yosuke Yokoyama
Abstract:
As the development of autonomous vehicles progresses, efficient safety assurance methods become increasingly necessary. Safety assurance methods such as monitoring and scenario-based testing call for formalisation of driving scenarios. In this paper, we develop a temporal-logic formalisation of an important class of critical scenarios in the ISO standard 34502. We use signal temporal logic (STL) as a logical formalism. Our formalisation has two main features: 1) modular composition of logical formulas for systematic and comprehensive formalisation (following the compositional methodology of ISO 34502); 2) use of the RSS distance for defining danger. We find our formalisation comes with few parameters to tune thanks to the RSS distance. We experimentally evaluated our formalisation; using its results, we discuss the validity of our formalisation and its stability with respect to the choice of some parameter values.
Submitted 27 March, 2024;
originally announced March 2024.
-
Rapidly Developing High-quality Instruction Data and Evaluation Benchmark for Large Language Models with Minimal Human Effort: A Case Study on Japanese
Authors:
Yikun Sun,
Zhen Wan,
Nobuhiro Ueda,
Sakiko Yahata,
Fei Cheng,
Chenhui Chu,
Sadao Kurohashi
Abstract:
The creation of instruction data and evaluation benchmarks for serving Large language models often involves enormous human annotation. This issue becomes particularly pronounced when rapidly developing such resources for a non-English language like Japanese. Instead of following the popular practice of directly translating existing English resources into Japanese (e.g., Japanese-Alpaca), we propose an efficient self-instruct method based on GPT-4. We first translate a small amount of English instructions into Japanese and post-edit them to obtain native-level quality. GPT-4 then utilizes them as demonstrations to automatically generate Japanese instruction data. We also construct an evaluation benchmark containing 80 questions across 8 categories, using GPT-4 to automatically assess the response quality of LLMs without human references. The empirical results suggest that the models fine-tuned on our GPT-4 self-instruct data significantly outperformed the Japanese-Alpaca across all three base pre-trained models. Our GPT-4 self-instruct data allowed the LLaMA 13B model to defeat GPT-3.5 (Davinci-003) with a 54.37% win-rate. The human evaluation exhibits the consistency between GPT-4's assessments and human preference. Our high-quality instruction data and evaluation benchmark have been released here.
Submitted 6 March, 2024;
originally announced March 2024.
-
Neural Operators Meet Energy-based Theory: Operator Learning for Hamiltonian and Dissipative PDEs
Authors:
Yusuke Tanaka,
Takaharu Yaguchi,
Tomoharu Iwata,
Naonori Ueda
Abstract:
The operator learning has received significant attention in recent years, with the aim of learning a mapping between function spaces. Prior works have proposed deep neural networks (DNNs) for learning such a mapping, enabling the learning of solution operators of partial differential equations (PDEs). However, these works still struggle to learn dynamics that obeys the laws of physics. This paper proposes Energy-consistent Neural Operators (ENOs), a general framework for learning solution operators of PDEs that follows the energy conservation or dissipation law from observed solution trajectories. We introduce a novel penalty function inspired by the energy-based theory of physics for training, in which the energy functional is modeled by another DNN, allowing one to bias the outputs of the DNN-based solution operators to ensure energetic consistency without explicit PDEs. Experiments on multiple physical systems show that ENO outperforms existing DNN models in predicting solutions from data, especially in super-resolution settings.
Submitted 14 February, 2024;
originally announced February 2024.
-
Meta-learning of Physics-informed Neural Networks for Efficiently Solving Newly Given PDEs
Authors:
Tomoharu Iwata,
Yusuke Tanaka,
Naonori Ueda
Abstract:
We propose a neural network-based meta-learning method to efficiently solve partial differential equation (PDE) problems. The proposed method is designed to meta-learn how to solve a wide variety of PDE problems, and uses the knowledge for solving newly given PDE problems. We encode a PDE problem into a problem representation using neural networks, where governing equations are represented by coefficients of a polynomial function of partial derivatives, and boundary conditions are represented by a set of point-condition pairs. We use the problem representation as an input of a neural network for predicting solutions, which enables us to efficiently predict problem-specific solutions by the forwarding process of the neural network without updating model parameters. To train our model, we minimize the expected error when adapted to a PDE problem based on the physics-informed neural network framework, by which we can evaluate the error even when solutions are unknown. We demonstrate that our proposed method outperforms existing methods in predicting solutions of PDE problems.
Submitted 20 October, 2023;
originally announced October 2023.
-
Deep-learning Real/Bogus classification for the Tomo-e Gozen transient survey
Authors:
Ichiro Takahashi,
Ryo Hamasaki,
Naonori Ueda,
Masaomi Tanaka,
Nozomu Tominaga,
Shigeyuki Sako,
Ryou Ohsawa,
Naoki Yoshida
Abstract:
We present a deep neural network Real/Bogus classifier that improves classification performance in the Tomo-e Gozen transient survey by handling label errors in the training data. In the wide-field, high-frequency transient survey with Tomo-e Gozen, the performance of conventional convolutional neural network classifier is not sufficient as about $10^6$ bogus detections appear every night. In need of a better classifier, we have developed a new two-stage training method. In this training method, label errors in the training data are first detected by normal supervised learning classification, and then they are unlabeled and used for training of semi-supervised learning. For actual observed data, the classifier with this method achieves an area under the curve (AUC) of 0.9998 and a false positive rate (FPR) of 0.0002 at true positive rate (TPR) of 0.9. This training method saves relabeling effort by humans and works better on training data with a high fraction of label errors. By implementing the developed classifier in the Tomo-e Gozen pipeline, the number of transient candidates was reduced to $\sim$40 objects per night, which is $\sim$1/130 of the previous version, while maintaining the recovery rate of real transients. This enables more efficient selection of targets for follow-up observations.
Submitted 24 June, 2022;
originally announced June 2022.
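The AUC and "FPR at a fixed TPR" figures quoted in the abstract above can be computed from classifier scores as in the following minimal sketch. The function names, the label convention (1 = real, 0 = bogus), and the toy data are assumptions made for illustration; they are not taken from the Tomo-e Gozen pipeline.

```python
import math


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one real and one bogus example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def fpr_at_tpr(scores, labels, target_tpr=0.9):
    """False positive rate at the strictest threshold whose TPR >= target_tpr."""
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one real and one bogus example")
    # Number of real detections that must be accepted to reach the target TPR.
    k = max(1, math.ceil(target_tpr * len(pos)))
    threshold = pos[k - 1]  # score of the k-th highest-scoring real detection
    false_pos = sum(1 for s in neg if s >= threshold)
    return false_pos / len(neg)


# Hypothetical toy scores: four real detections followed by four bogus ones.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.75, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
```

The pairwise AUC is quadratic in the number of examples, which is fine for a sketch but would likely be replaced by a sort-based computation at survey scale.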
-
Excess risk analysis for epistemic uncertainty with application to variational inference
Authors:
Futoshi Futami,
Tomoharu Iwata,
Naonori Ueda,
Issei Sato,
Masashi Sugiyama
Abstract:
Bayesian deep learning plays an important role especially for its ability evaluating epistemic uncertainty (EU). Due to computational complexity issues, approximation methods such as variational inference (VI) have been used in practice to obtain posterior distributions and their generalization abilities have been analyzed extensively, for example, by PAC-Bayesian theory; however, little analysis exists on EU, although many numerical experiments have been conducted on it. In this study, we analyze the EU of supervised learning in approximate Bayesian inference by focusing on its excess risk. First, we theoretically show the novel relations between generalization error and the widely used EU measurements, such as the variance and mutual information of predictive distribution, and derive their convergence behaviors. Next, we clarify how the objective function of VI regularizes the EU. With this analysis, we propose a new objective function for VI that directly controls the prediction performance and the EU based on the PAC-Bayesian theory. Numerical experiments show that our algorithm significantly improves the EU evaluation over the existing VI methods.
Submitted 11 October, 2022; v1 submitted 2 June, 2022;
originally announced June 2022.
-
Loss function based second-order Jensen inequality and its application to particle variational inference
Authors:
Futoshi Futami,
Tomoharu Iwata,
Naonori Ueda,
Issei Sato,
Masashi Sugiyama
Abstract:
Bayesian model averaging, obtained as the expectation of a likelihood function by a posterior distribution, has been widely used for prediction, evaluation of uncertainty, and model selection. Various approaches have been developed to efficiently capture the information in the posterior distribution; one such approach is the optimization of a set of models simultaneously with interaction to ensure the diversity of the individual models in the same way as ensemble learning. A representative approach is particle variational inference (PVI), which uses an ensemble of models as an empirical approximation for the posterior distribution. PVI iteratively updates each model with a repulsion force to ensure the diversity of the optimized models. However, despite its promising performance, a theoretical understanding of this repulsion and its association with the generalization ability remains unclear. In this paper, we tackle this problem in light of PAC-Bayesian analysis. First, we provide a new second-order Jensen inequality, which has the repulsion term based on the loss function. Thanks to the repulsion term, it is tighter than the standard Jensen inequality. Then, we derive a novel generalization error bound and show that it can be reduced by enhancing the diversity of models. Finally, we derive a new PVI that optimizes the generalization error bound directly. Numerical experiments demonstrate that the performance of the proposed PVI compares favorably with existing methods in the experiment.
Submitted 9 June, 2021; v1 submitted 9 June, 2021;
originally announced June 2021.
-
Photometric classification of HSC transients using machine learning
Authors:
Ichiro Takahashi,
Nao Suzuki,
Naoki Yasuda,
Akisato Kimura,
Naonori Ueda,
Masaomi Tanaka,
Nozomu Tominaga,
Naoki Yoshida
Abstract:
The advancement of technology has resulted in a rapid increase in supernova (SN) discoveries. The Subaru/Hyper Suprime-Cam (HSC) transient survey, conducted from fall 2016 through spring 2017, yielded 1824 SN candidates. This gave rise to the need for fast type classification for spectroscopic follow-up and prompted us to develop a machine learning algorithm using a deep neural network (DNN) with highway layers. This machine is trained by actual observed cadence and filter combinations such that we can directly input the observed data array into the machine without any interpretation. We tested our model with a dataset from the LSST classification challenge (Deep Drilling Field). Our classifier scores an area under the curve (AUC) of 0.996 for binary classification (SN Ia or non-SN Ia) and 95.3% accuracy for three-class classification (SN Ia, SN Ibc, or SN II). Application of our binary classification to HSC transient data yields an AUC score of 0.925. With two weeks of HSC data since the first detection, this classifier achieves 78.1% accuracy for binary classification, and the accuracy increases to 84.2% with the full dataset. This paper discusses the potential use of machine learning for SN type classification purposes.
Submitted 15 August, 2020;
originally announced August 2020.
-
A System for Worldwide COVID-19 Information Aggregation
Authors:
Akiko Aizawa,
Frederic Bergeron,
Junjie Chen,
Fei Cheng,
Katsuhiko Hayashi,
Kentaro Inui,
Hiroyoshi Ito,
Daisuke Kawahara,
Masaru Kitsuregawa,
Hirokazu Kiyomaru,
Masaki Kobayashi,
Takashi Kodama,
Sadao Kurohashi,
Qianying Liu,
Masaki Matsubara,
Yusuke Miyao,
Atsuyuki Morishima,
Yugo Murawaki,
Kazumasa Omura,
Haiyue Song,
Eiichiro Sumita,
Shinji Suzuki,
Ribeka Tanaka,
Yu Tanaka,
Masashi Toyoda
, et al. (4 additional authors not shown)
Abstract:
The global pandemic of COVID-19 has made the public pay close attention to related news, covering various domains, such as sanitation, treatment, and effects on education. Meanwhile, the COVID-19 condition is very different among the countries (e.g., policies and development of the epidemic), and thus citizens would be interested in news in foreign countries. We build a system for worldwide COVID-19 information aggregation containing reliable articles from 10 regions in 7 languages sorted by topics. Our reliable COVID-19 related website dataset collected through crowdsourcing ensures the quality of the articles. A neural machine translation module translates articles in other languages into Japanese and English. A BERT-based topic-classifier trained on our article-topic pair dataset helps users find their interested information efficiently by putting articles into different categories.
Submitted 11 October, 2020; v1 submitted 27 July, 2020;
originally announced August 2020.
-
Translation Between Waves, wave2wave
Authors:
Tsuyoshi Okita,
Hirotaka Hachiya,
Sozo Inoue,
Naonori Ueda
Abstract:
The understanding of sensor data has been greatly improved by advanced deep learning methods with big data. However, available sensor data in the real world are still limited, which is called the opportunistic sensor problem. This paper proposes a new variant of neural machine translation seq2seq to deal with continuous signal waves by introducing the window-based (inverse-) representation to adaptively represent partial shapes of waves and the iterative back-translation model for high-dimensional data. Experimental results are shown for two real-life data: earthquake and activity translation. The performance improvements of one-dimensional data was about 46% in test loss and that of high-dimensional data was about 1625% in perplexity with regard to the original seq2seq.
Submitted 20 July, 2020;
originally announced July 2020.
-
Three-dimensional Generative Adversarial Nets for Unsupervised Metal Artifact Reduction
Authors:
Megumi Nakao,
Keiho Imanishi,
Nobuhiro Ueda,
Yuichiro Imai,
Tadaaki Kirita,
Tetsuya Matsuda
Abstract:
The reduction of metal artifacts in computed tomography (CT) images, specifically for strong artifacts generated from multiple metal objects, is a challenging issue in medical imaging research. Although there have been some studies on supervised metal artifact reduction through the learning of synthesized artifacts, it is difficult for simulated artifacts to cover the complexity of the real physical phenomena that may be observed in X-ray propagation. In this paper, we introduce metal artifact reduction methods based on an unsupervised volume-to-volume translation learned from clinical CT images. We construct three-dimensional adversarial nets with a regularized loss function designed for metal artifacts from multiple dental fillings. The results of experiments using 915 CT volumes from real patients demonstrate that the proposed framework has an outstanding capacity to reduce strong artifacts and to recover underlying missing voxels, while preserving the anatomical features of soft tissues and tooth structures from the original images.
Submitted 21 August, 2020; v1 submitted 19 November, 2019;
originally announced November 2019.
-
Anomaly Detection with Inexact Labels
Authors:
Tomoharu Iwata,
Machiko Toyoda,
Shotaro Tora,
Naonori Ueda
Abstract:
We propose a supervised anomaly detection method for data with inexact anomaly labels, where each label, which is assigned to a set of instances, indicates that at least one instance in the set is anomalous. Although many anomaly detection methods have been proposed, they cannot handle inexact anomaly labels. To measure the performance with inexact anomaly labels, we define the inexact AUC, which is our extension of the area under the ROC curve (AUC) for inexact labels. The proposed method trains an anomaly score function so that the smooth approximation of the inexact AUC increases while anomaly scores for non-anomalous instances become low. We model the anomaly score function by a neural network-based unsupervised anomaly detection method, e.g., autoencoders. The proposed method performs well even when only a small number of inexact labels are available by incorporating an unsupervised anomaly detection mechanism with inexact AUC maximization. Using various datasets, we experimentally demonstrate that our proposed method improves the anomaly detection performance with inexact anomaly labels, and outperforms existing unsupervised and supervised anomaly detection and multiple instance learning methods.
Submitted 10 September, 2019;
originally announced September 2019.
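One plausible reading of the inexact AUC described in the abstract above is sketched below: a set labeled "contains at least one anomaly" is ranked correctly against a non-anomalous instance when the highest-scoring member of the set outscores that instance. The exact definition in the paper may differ; the function names, the sigmoid smoothing, and the toy data are assumptions for illustration only.

```python
import math


def inexact_auc(score, anomalous_sets, normal_instances):
    """Fraction of (set, normal instance) pairs where the set's best score wins."""
    total = 0
    for bag in anomalous_sets:
        best = max(score(x) for x in bag)  # at least one member should be anomalous
        total += sum(1 for x in normal_instances if best > score(x))
    return total / (len(anomalous_sets) * len(normal_instances))


def smooth_inexact_auc(score, anomalous_sets, normal_instances):
    """Differentiable surrogate: the step function is replaced by a sigmoid (assumed)."""
    total = 0.0
    for bag in anomalous_sets:
        best = max(score(x) for x in bag)
        for x in normal_instances:
            total += 1.0 / (1.0 + math.exp(-(best - score(x))))
    return total / (len(anomalous_sets) * len(normal_instances))


def identity_score(x):
    """Hypothetical anomaly score: the instance value itself."""
    return x


# Hypothetical toy data: two inexactly labeled sets and two normal instances.
toy_sets = [[0.1, 0.9], [0.2, 0.4]]
toy_normals = [0.3, 0.5]
```

With these toy values the first set outranks both normal instances and the second outranks only one, so three of the four pairs are ordered correctly.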
-
Deep Mixture Point Processes: Spatio-temporal Event Prediction with Rich Contextual Information
Authors:
Maya Okawa,
Tomoharu Iwata,
Takeshi Kurashima,
Yusuke Tanaka,
Hiroyuki Toda,
Naonori Ueda
Abstract:
Predicting when and where events will occur in cities, like taxi pick-ups, crimes, and vehicle collisions, is a challenging and important problem with many applications in fields such as urban planning, transportation optimization and location-based marketing. Though many point processes have been proposed to model events in a continuous spatio-temporal space, none of them allow for the consideration of the rich contextual factors that affect event occurrence, such as weather, social activities, geographical characteristics, and traffic. In this paper, we propose DMPP (Deep Mixture Point Processes), a point process model for predicting spatio-temporal events with the use of rich contextual information; a key advance is its incorporation of the heterogeneous and high-dimensional context available in image and text data. Specifically, we design the intensity of our point process model as a mixture of kernels, where the mixture weights are modeled by a deep neural network. This formulation allows us to automatically learn the complex nonlinear effects of the contextual factors on event occurrence. At the same time, this formulation makes analytical integration over the intensity, which is required for point process estimation, tractable. We use real-world data sets from different domains to demonstrate that DMPP has better predictive performance than existing methods.
Submitted 21 June, 2019;
originally announced June 2019.
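The tractability claim in the abstract above (an intensity built as a mixture of kernels has a closed-form integral) can be illustrated with a one-dimensional, time-only sketch. This is a simplification under stated assumptions: the paper's model is spatio-temporal and uses a deep network for the mixture weights, whereas here a single softplus unit (`mixture_weight`, hypothetical) stands in for that network and the kernels are Gaussian bumps at assumed centers.

```python
import math


def softplus(x):
    return math.log1p(math.exp(x))


def mixture_weight(context, a=1.0, b=0.0):
    """Stand-in for the deep network: maps a scalar context to a positive weight."""
    return softplus(a * context + b)


def intensity(t, centers, contexts, bandwidth):
    """lambda(t) = sum_j w(z_j) * exp(-(t - c_j)^2 / (2 h^2))."""
    return sum(
        mixture_weight(z) * math.exp(-((t - c) ** 2) / (2 * bandwidth ** 2))
        for c, z in zip(centers, contexts)
    )


def integrated_intensity(horizon, centers, contexts, bandwidth):
    """Closed-form integral of lambda over [0, horizon] using the error function."""
    scale = bandwidth * math.sqrt(math.pi / 2)
    denom = bandwidth * math.sqrt(2)
    return sum(
        mixture_weight(z)
        * scale
        * (math.erf((horizon - c) / denom) - math.erf((0.0 - c) / denom))
        for c, z in zip(centers, contexts)
    )


def log_likelihood(events, horizon, centers, contexts, bandwidth):
    """Point process log-likelihood: sum of log-intensities minus the integral."""
    log_terms = sum(math.log(intensity(t, centers, contexts, bandwidth)) for t in events)
    return log_terms - integrated_intensity(horizon, centers, contexts, bandwidth)


# Hypothetical kernel centers, context features, and observed event times.
toy_centers = [2.0, 5.0, 8.0]
toy_contexts = [-1.0, 0.5, 2.0]
toy_events = [1.5, 4.8, 7.9, 8.3]
```

Because the weights do not depend on the integration variable, the integral reduces to a weighted sum of Gaussian integrals, so no numerical quadrature is needed when evaluating the likelihood.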
-
Fully Neural Network based Model for General Temporal Point Processes
Authors:
Takahiro Omi,
Naonori Ueda,
Kazuyuki Aihara
Abstract:
A temporal point process is a mathematical model for a time series of discrete events, which covers various applications. Recently, recurrent neural network (RNN) based models have been developed for point processes and have been found effective. RNN based models usually assume a specific functional form for the time course of the intensity function of a point process (e.g., exponentially decreasing or increasing with the time since the most recent event). However, such an assumption can restrict the expressive power of the model. We herein propose a novel RNN based model in which the time course of the intensity function is represented in a general manner. In our approach, we first model the integral of the intensity function using a feedforward neural network and then obtain the intensity function as its derivative. This approach enables us to both obtain a flexible model of the intensity function and exactly evaluate the log-likelihood function, which contains the integral of the intensity function, without any numerical approximations. Our model achieves competitive or superior performances compared to the previous state-of-the-art methods for both synthetic and real datasets.
Submitted 10 January, 2020; v1 submitted 23 May, 2019;
originally announced May 2019.
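The central idea in the abstract above, modeling the integral of the intensity with a monotone network and recovering the intensity as its derivative, can be sketched as follows. This is a minimal illustration under assumptions: it omits the RNN history encoding, uses a tiny one-hidden-layer network with weights made positive through softplus, and differentiates analytically rather than by automatic differentiation. The class name and parameter values are hypothetical.

```python
import math


def softplus(x):
    return math.log1p(math.exp(x))


class MonotoneCumulativeHazard:
    """Phi(tau) = c*tau + sum_j v_j * (tanh(w_j*tau + b_j) - tanh(b_j)), with c, v_j, w_j > 0.

    Positive weights make Phi increasing in tau, and Phi(0) = 0 by construction.
    """

    def __init__(self, raw_w, raw_v, biases, raw_c):
        self.w = [softplus(x) for x in raw_w]  # positive input weights
        self.v = [softplus(x) for x in raw_v]  # positive output weights
        self.b = list(biases)
        self.c = softplus(raw_c)               # positive linear term keeps Phi unbounded

    def cumulative(self, tau):
        """Integral of the intensity from the last event up to elapsed time tau."""
        return self.c * tau + sum(
            v * (math.tanh(w * tau + b) - math.tanh(b))
            for w, v, b in zip(self.w, self.v, self.b)
        )

    def intensity(self, tau):
        """Derivative of `cumulative` with respect to tau (always positive)."""
        return self.c + sum(
            v * w * (1.0 - math.tanh(w * tau + b) ** 2)
            for w, v, b in zip(self.w, self.v, self.b)
        )


def log_likelihood(model, intervals):
    """Exact log-likelihood of inter-event intervals: sum of log(lambda) - Phi."""
    return sum(math.log(model.intensity(t)) - model.cumulative(t) for t in intervals)


# Hypothetical parameters and inter-event intervals for illustration.
toy_model = MonotoneCumulativeHazard(
    raw_w=[0.3, -0.5], raw_v=[0.1, 0.7], biases=[-1.0, 0.5], raw_c=-2.0
)
toy_intervals = [0.4, 1.2, 0.7]
```

Since the integral is the modeled quantity, the log-likelihood needs no numerical integration; only a derivative is required, which is what makes the evaluation exact.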
-
The Hyper Suprime-Cam SSP Transient Survey in COSMOS: Overview
Authors:
Naoki Yasuda,
Masaomi Tanaka,
Nozomu Tominaga,
Ji-an Jiang,
Takashi J. Moriya,
Tomoki Morokuma,
Nao Suzuki,
Ichiro Takahashi,
Masaki S. Yamaguchi,
Keiichi Maeda,
Masao Sako,
Shiro Ikeda,
Akisato Kimura,
Mikio Morii,
Naonori Ueda,
Naoki Yoshida,
Chien-Hsiu Lee,
Sherry H. Suyu,
Yutaka Komiyama,
Nicolas Regnault,
David Rubin
Abstract:
We present an overview of a deep transient survey of the COSMOS field with the Subaru Hyper Suprime-Cam (HSC). The survey was performed for the 1.77 deg$^2$ ultra-deep layer and 5.78 deg$^2$ deep layer in the Subaru Strategic Program over 6- and 4-month periods from 2016 to 2017, respectively. The ultra-deep layer shows a median depth per epoch of 26.4, 26.3, 26.0, 25.6, and 24.6 mag in $g$, $r$, $i$, $z$, and $y$ bands, respectively; the deep layer is $\sim0.6$ mag shallower. In total, 1,824 supernova candidates were identified. Based on light curve fitting and derived light curve shape parameter, we classified 433 objects as Type Ia supernovae (SNe); among these candidates, 129 objects have spectroscopic or COSMOS2015 photometric redshifts and 58 objects are located at $z > 1$. Our unique dataset doubles the number of Type Ia SNe at $z > 1$ and enables various time-domain analyses of Type II SNe, high redshift superluminous SNe, variable stars, and active galactic nuclei.
Submitted 21 April, 2019;
originally announced April 2019.
-
Finding Appropriate Traffic Regulations via Graph Convolutional Networks
Authors:
Tomoharu Iwata,
Takuma Otsuka,
Hitoshi Shimizu,
Hiroshi Sawada,
Futoshi Naya,
Naonori Ueda
Abstract:
Appropriate traffic regulations, e.g. planned road closure, are important in congested events. Crowd simulators have been used to find appropriate regulations by simulating multiple scenarios with different regulations. However, this approach requires multiple simulation runs, which are time-consuming. In this paper, we propose a method to learn a function that outputs regulation effects given the current traffic situation as inputs. If the function is learned using the training data of many simulation runs in advance, we can obtain an appropriate regulation efficiently by bypassing simulations for the current situation. We use the graph convolutional networks for modeling the function, which enable us to find regulations even for unseen areas. With the proposed method, we construct a graph for each area, where a node represents a road, and an edge represents the road connection. By running crowd simulations with various regulations on various areas, we generate traffic situations and regulation effects. The graph convolutional networks are trained to output the regulation effects given the graph with the traffic situation information as inputs. With experiments using real-world road networks and a crowd simulator, we demonstrate that the proposed method can find a road to close that reduces the average time needed to reach the destination.
Submitted 23 October, 2018;
originally announced October 2018.
-
Unsupervised Object Matching for Relational Data
Authors:
Tomoharu Iwata,
Naonori Ueda
Abstract:
We propose an unsupervised object matching method for relational data, which finds matchings between objects in different relational datasets without correspondence information. For example, the proposed method matches documents in different languages in multi-lingual document-word networks without dictionaries or alignment information. The proposed method assumes that each object has latent vectors, and the probability of neighbor objects is modeled by the inner-product of the latent vectors, where the neighbors are generated by short random walks over the relations. The latent vectors are estimated by maximizing the likelihood of the neighbors for each dataset. The estimated latent vectors contain hidden structural information of each object in the given relational dataset. Then, the proposed method linearly projects the latent vectors for all the datasets onto a common latent space shared across all datasets by matching the distributions while preserving the structural information. The projection matrix is estimated by minimizing the distance between the latent vector distributions with an orthogonality regularizer. To represent the distributions effectively, we use the kernel embedding of distributions, which holds high-order moment information about a distribution as an element in a reproducing kernel Hilbert space and enables us to calculate the distance between the distributions without density estimation. The structural information encoded in the latent vectors is preserved by using the orthogonality regularizer. We demonstrate the effectiveness of the proposed method with experiments using real-world multi-lingual document-word relational datasets and multiple user-item relational datasets.
Submitted 26 December, 2018; v1 submitted 8 October, 2018;
originally announced October 2018.
-
Partial AUC Maximization via Nonlinear Scoring Functions
Authors:
Naonori Ueda,
Akinori Fujino
Abstract:
We propose a method for maximizing a partial area under a receiver operating characteristic (ROC) curve (pAUC) for binary classification tasks. In binary classification tasks, accuracy is the most commonly used measure of classifier performance. In some applications such as anomaly detection and diagnostic testing, accuracy is not an appropriate measure since prior probabilities are often greatly biased. Although in such cases the pAUC has been utilized as a performance measure, few methods have been proposed for directly maximizing the pAUC. This optimization is achieved by using a scoring function. The conventional approach utilizes a linear function as the scoring function. In contrast, we introduce nonlinear scoring functions for this purpose. Specifically, we present two types of nonlinear scoring functions based on generative models and deep neural networks. We show experimentally that nonlinear scoring functions improve on the conventional methods through an application to the binary classification of real and bogus objects obtained with the Hyper Suprime-Cam on the Subaru telescope.
Submitted 12 June, 2018;
originally announced June 2018.
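As a rough illustration of the performance measure named in the abstract above, the following sketch computes a partial AUC from classifier scores by integrating the empirical ROC curve over false positive rates in [0, max_fpr]. It only evaluates the measure; it is not the paper's maximization method. The function name `partial_auc`, the `max_fpr` parameter, and the assumption that all scores are distinct are choices made for this example.

```python
def partial_auc(scores, labels, max_fpr):
    """Area under the empirical ROC curve for FPR in [0, max_fpr].

    labels are 1 (positive) or 0 (negative). Scores are assumed distinct,
    so the ROC curve is a pure step function (an assumption of this sketch).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")

    # Sweep the decision threshold from the highest score downwards.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    area = 0.0
    tp = 0
    fp = 0
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            # Each negative widens FPR by 1/n_neg; clip the step at max_fpr.
            left = fp / n_neg
            right = min((fp + 1) / n_neg, max_fpr)
            if right > left:
                area += (right - left) * (tp / n_pos)
            fp += 1
    return area


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 0, 1, 1, 0, 0]
full_auc = partial_auc(scores, labels, max_fpr=1.0)   # ordinary AUC
low_fpr_auc = partial_auc(scores, labels, max_fpr=0.5)
```

With `max_fpr=1.0` the function reduces to the ordinary AUC, which makes it easy to sanity-check against a pairwise count of correctly ordered positive/negative pairs.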
-
Few-shot learning of neural networks from scratch by pseudo example optimization
Authors:
Akisato Kimura,
Zoubin Ghahramani,
Koh Takeuchi,
Tomoharu Iwata,
Naonori Ueda
Abstract:
In this paper, we propose a simple but effective method for training neural networks with a limited amount of training data. Our approach inherits the idea of knowledge distillation that transfers knowledge from a deep or wide reference model to a shallow or narrow target model. The proposed method employs this idea to mimic predictions of reference estimators that are more robust against overfitting than the network we want to train. Different from almost all the previous work for knowledge distillation that requires a large amount of labeled training data, the proposed method requires only a small amount of training data. Instead, we introduce pseudo training examples that are optimized as a part of model parameters. Experimental results for several benchmark datasets demonstrate that the proposed method outperformed all the other baselines, such as naive training of the target model and standard knowledge distillation.
Submitted 5 July, 2018; v1 submitted 8 February, 2018;
originally announced February 2018.
-
Single-epoch supernova classification with deep convolutional neural networks
Authors:
Akisato Kimura,
Ichiro Takahashi,
Masaomi Tanaka,
Naoki Yasuda,
Naonori Ueda,
Naoki Yoshida
Abstract:
Supernovae Type-Ia (SNeIa) play a significant role in exploring the history of the expansion of the Universe, since they are the best-known standard candles with which we can accurately measure the distance to the objects. Finding large samples of SNeIa and investigating their detailed characteristics have become an important issue in cosmology and astronomy. Existing methods relied on a photometric approach that first measures the luminance of supernova candidates precisely and then fits the results to a parametric function of temporal changes in luminance. However, it inevitably requires multi-epoch observations and complex luminance measurements. In this work, we present a novel method for classifying SNeIa simply from single-epoch observation images without any complex measurements, by effectively integrating the state-of-the-art computer vision methodology into the standard photometric approach. Our method first builds a convolutional neural network for estimating the luminance of supernovae from telescope images, and then constructs another neural network for the classification, where the estimated luminance and observation dates are used as features for classification. Both of the neural networks are integrated into a single deep neural network to classify SNeIa directly from observation images. Experimental results show the effectiveness of the proposed method and reveal classification performance comparable to existing photometric methods with multi-epoch observations.
Submitted 30 November, 2017;
originally announced November 2017.
-
Multi-output Polynomial Networks and Factorization Machines
Authors:
Mathieu Blondel,
Vlad Niculae,
Takuma Otsuka,
Naonori Ueda
Abstract:
Factorization machines and polynomial networks are supervised polynomial models based on an efficient low-rank decomposition. We extend these models to the multi-output setting, i.e., for learning vector-valued functions, with application to multi-class or multi-task problems. We cast this as the problem of learning a 3-way tensor whose slices share a common basis and propose a convex formulation of that problem. We then develop an efficient conditional gradient algorithm and prove its global convergence, despite the fact that it involves a non-convex basis selection step. On classification tasks, we show that our algorithm achieves excellent accuracy with much sparser models than existing methods. On recommendation system tasks, we show how to combine our algorithm with a reduction from ordinal regression to multi-output classification and show that the resulting algorithm outperforms simple baselines in terms of ranking accuracy.
Submitted 4 November, 2017; v1 submitted 22 May, 2017;
originally announced May 2017.
-
Machine-learning Selection of Optical Transients in Subaru/Hyper Suprime-Cam Survey
Authors:
Mikio Morii,
Shiro Ikeda,
Nozomu Tominaga,
Masaomi Tanaka,
Tomoki Morokuma,
Katsuhiko Ishiguro,
Junji Yamato,
Naonori Ueda,
Naotaka Suzuki,
Naoki Yasuda,
Naoki Yoshida
Abstract:
We present an application of machine-learning (ML) techniques to source selection in the optical transient survey data with Hyper Suprime-Cam (HSC) on the Subaru telescope. Our goal is to select real transient events accurately and in a timely manner out of a large number of false candidates, obtained with the standard difference-imaging method. We have developed the transient selector which is based on majority voting of three ML machines of AUC Boosting, Random Forest, and Deep Neural Network. We applied it to our observing runs of Subaru-HSC in 2015 May and August, and proved it to be efficient in selecting optical transients. The false positive rate was 1.0% at the true positive rate of 90% in the magnitude range of 22.0--25.0 mag for the former data. For the latter run, we successfully detected and reported ten candidates of supernovae within the same day as the observation. From these runs, we learned the following lessons: (1) the training using artificial objects is effective in filtering out false candidates, especially for faint objects, and (2) combination of ML by majority voting is advantageous.
Submitted 11 September, 2016;
originally announced September 2016.
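The majority-voting combination mentioned in the abstract above can be sketched in a few lines. The three lambda classifiers below are hypothetical stand-ins for the three trained models (the abstract names AUC Boosting, Random Forest, and a Deep Neural Network); the feature names, thresholds, and candidate records are invented for illustration only.

```python
def majority_vote(decisions):
    """Return True when more than half of the boolean decisions are True."""
    return 2 * sum(decisions) > len(decisions)


def select_transients(candidates, classifiers):
    """Keep the candidates that a majority of the classifiers accept."""
    return [
        c for c in candidates
        if majority_vote([clf(c) for clf in classifiers])
    ]


# Hypothetical classifiers: each thresholds one made-up score field.
classifiers = [
    lambda c: c["score_a"] > 0.5,
    lambda c: c["score_b"] > 0.5,
    lambda c: c["score_c"] > 0.5,
]

# Made-up candidate detections, not data from the survey.
candidates = [
    {"id": 1, "score_a": 0.9, "score_b": 0.8, "score_c": 0.1},
    {"id": 2, "score_a": 0.2, "score_b": 0.7, "score_c": 0.3},
    {"id": 3, "score_a": 0.6, "score_b": 0.6, "score_c": 0.6},
]

selected = select_transients(candidates, classifiers)
selected_ids = [c["id"] for c in selected]
```

Using an odd number of voters, as here, avoids ties; with an even number the strict inequality in `majority_vote` would reject tied candidates.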
-
Polynomial Networks and Factorization Machines: New Insights and Efficient Training Algorithms
Authors:
Mathieu Blondel,
Masakazu Ishihata,
Akinori Fujino,
Naonori Ueda
Abstract:
Polynomial networks and factorization machines are two recently-proposed models that can efficiently use feature interactions in classification and regression tasks. In this paper, we revisit both models from a unified perspective. Based on this new view, we study the properties of both models and propose new efficient training algorithms. Key to our approach is to cast parameter learning as a low-rank symmetric tensor estimation problem, which we solve by multi-convex optimization. We demonstrate our approach on regression and recommender system tasks.
Submitted 29 July, 2016;
originally announced July 2016.
-
Higher-Order Factorization Machines
Authors:
Mathieu Blondel,
Akinori Fujino,
Naonori Ueda,
Masakazu Ishihata
Abstract:
Factorization machines (FMs) are a supervised learning approach that can use second-order feature combinations even when the data is very high-dimensional. Unfortunately, despite increasing interest in FMs, there exists to date no efficient training algorithm for higher-order FMs (HOFMs). In this paper, we present the first generic yet efficient algorithms for training arbitrary-order HOFMs. We also present new variants of HOFMs with shared parameters, which greatly reduce model size and prediction times while maintaining similar accuracy. We demonstrate the proposed approaches on four different link prediction tasks.
Submitted 14 October, 2016; v1 submitted 25 July, 2016;
originally announced July 2016.
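To make the "second-order feature combinations" in the abstract above concrete, here is a minimal sketch of prediction with a standard second-order factorization machine. It covers only the baseline second-order model, not the higher-order training algorithms the paper proposes. The function names and the toy parameters `w0`, `w`, and `V` are assumptions for this example. The fast version uses the well-known identity that the sum over pairs of inner products equals half the difference between a squared sum and a sum of squares, computed per latent factor.

```python
def fm_predict_naive(x, w0, w, V):
    """Second-order FM prediction with an explicit loop over feature pairs."""
    d = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(d))
    for i in range(d):
        for j in range(i + 1, d):
            # Interaction weight is the inner product of the latent vectors.
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            y += dot * x[i] * x[j]
    return y


def fm_predict_fast(x, w0, w, V):
    """Same prediction in time linear in d * k (k = latent dimension)."""
    d = len(x)
    k = len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(d))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(d))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(d))
        y += 0.5 * (s * s - s_sq)
    return y


# Toy parameters chosen by hand for illustration.
x = [1.0, 2.0, 0.0, 3.0]
w0 = 0.5
w = [0.1, -0.2, 0.3, 0.4]
V = [[1.0, 2.0], [0.0, 1.0], [3.0, 1.0], [2.0, -1.0]]

y_naive = fm_predict_naive(x, w0, w, V)
y_fast = fm_predict_fast(x, w0, w, V)
```

Both functions return the same value; the second avoids the quadratic loop over feature pairs, which is what makes the model practical for very high-dimensional inputs.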
-
Collapsed Variational Bayes Inference of Infinite Relational Model
Authors:
Katsuhiko Ishiguro,
Issei Sato,
Naonori Ueda
Abstract:
The Infinite Relational Model (IRM) is a probabilistic model for relational data clustering that partitions objects into clusters based on observed relationships. This paper presents Averaged CVB (ACVB) solutions for IRM, convergence-guaranteed and practically useful fast Collapsed Variational Bayes (CVB) inferences. We first derive ordinary CVB and CVB0 for IRM based on the lower bound maximization. CVB solutions yield deterministic iterative procedures for inferring IRM given the truncated number of clusters. Our proposal includes CVB0 updates of hyperparameters including the concentration parameter of the Dirichlet Process, which has not been studied in the literature. To make the CVB more practically useful, we further study the CVB inference in two aspects. First, we study the convergence issues and develop a convergence-guaranteed algorithm for any CVB-based inferences called ACVB, which enables automatic convergence detection and frees non-expert practitioners from difficult and costly manual monitoring of inference processes. Second, we present a few techniques for speeding up IRM inferences. In particular, we describe the linear time inference of CVB0, allowing the IRM for larger relational data uses. The ACVB solutions of IRM showed comparable or better performance compared to existing inference methods in experiments, and provide deterministic, faster, and easier convergence detection.
Submitted 16 September, 2014;
originally announced September 2014.
-
Flexible construction of hierarchical scale-free networks with general exponent
Authors:
J. C. Nacher,
N. Ueda,
M. Kanehisa,
T. Akutsu
Abstract:
Extensive studies have been done to understand the principles behind architectures of real networks. Recently, evidence for hierarchical organization in many real networks has also been reported. Here, we present a new hierarchical model which reproduces the main experimental properties observed in real networks: a scale-free degree distribution $P(k)$ (the frequency of the nodes that are connected to $k$ other nodes decays as a power-law $P(k)\sim k^{-γ}$) and power-law scaling of the clustering coefficient $C(k)\sim k^{-1}$. The major novelties of our model can be summarized as follows: {\it (a)} The model generates networks with scale-free distribution for the degree of nodes with general exponent $γ> 2$, and arbitrarily close to any specified value, being able to reproduce most of the observed hierarchical scale-free topologies. In contrast, previous models cannot obtain values of $γ> 2.58$. {\it (b)} Our model has structural flexibility because {\it (i)} it can incorporate various types of basic building blocks (e.g., triangles, tetrahedrons and, in general, fully connected clusters of $n$ nodes) and {\it (ii)} it allows a large variety of configurations (i.e., the model can use more than $n-1$ copies of basic blocks of $n$ nodes). The structural features of our proposed model might lead to a better understanding of architectures of biological and non-biological networks.
Submitted 6 September, 2004;
originally announced September 2004.
-
Clustering under the line graph transformation: Application to reaction network
Authors:
J. C. Nacher,
N. Ueda,
T. Yamada,
M. Kanehisa,
T. Akutsu
Abstract:
Many real networks can be understood as two complementary networks with two kinds of nodes. This is the case for metabolic networks, where the first network has chemical compounds as nodes and the second one has reactions as nodes. The second network can be related to the first one by a technique called line graph transformation (i.e., edges in an initial network are transformed into nodes). Recently, the main topological properties of the metabolic networks have been properly described by means of a hierarchical model. In our work, we apply the line graph transformation to a hierarchical network and the clustering coefficient $C(k)$ is calculated for the transformed network, where $k$ is the node degree. While $C(k)$ follows the scaling law $C(k)\sim k^{-1.1}$ for the initial hierarchical network, $C(k)$ scales weakly as $k^{0.08}$ for the transformed network. These results indicate that the reaction network can be identified as a degree-independent clustering network.
Submitted 18 August, 2004; v1 submitted 31 March, 2004;
originally announced March 2004.
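The line graph transformation described in the abstract above (edges of the initial network become nodes) can be sketched as follows, together with the local clustering coefficient of a node. This is a minimal illustration on toy graphs, assuming an undirected simple graph with no duplicate edges; the function names are chosen for this example and it does not reproduce the paper's hierarchical network or its scaling analysis.

```python
from itertools import combinations


def line_graph(edges):
    """Build the line graph as an adjacency dict.

    Each original edge becomes a node; two such nodes are linked when the
    original edges share an endpoint.
    """
    nodes = [tuple(sorted(e)) for e in edges]
    adj = {e: set() for e in nodes}
    for a, b in combinations(nodes, 2):
        if set(a) & set(b):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


# A star with three leaves: its line graph is a triangle.
star = line_graph([(0, 1), (0, 2), (0, 3)])
c_star = clustering_coefficient(star, (0, 1))

# A path 1-2-3-4: its line graph is a path of three nodes.
path = line_graph([(1, 2), (2, 3), (3, 4)])
c_path = clustering_coefficient(path, (2, 3))
```

The star example shows why the transformation tends to raise clustering: all edges meeting at one hub become mutually adjacent nodes, so a hub of degree $k$ turns into a fully connected cluster of $k$ nodes.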