-
Unusual photoinduced crystal structure dynamics in TaTe$_2$ with double zigzag chain superstructure
Authors:
J. Koga,
Y. Chiashi,
A. Nakamura,
T. Akiba,
H. Takahashi,
T. Shimojima,
S. Ishiwata,
K. Ishizaka
Abstract:
Transition metal dichalcogenides with superperiodic lattice distortions have been widely investigated as the platform of ultrafast structural phase manipulations. Here we performed ultrafast electron diffraction on room-temperature TaTe$_2$, which exhibits peculiar double zigzag chain pattern of Ta atoms. From the time-dependent electron diffraction pattern, we revealed a photoinduced change in the crystal structure occurring within <0.5 ps, though there is no corresponding high-temperature equilibrium phase. We further clarified the slower response (~1.5 ps) reflecting the lattice thermalization. Our result suggests the unusual ultrafast crystal structure dynamics specific to the non-equilibrium transient process in TaTe$_2$.
Submitted 12 April, 2024;
originally announced April 2024.
-
Tidal Disruption of Planetesimals from an Eccentric Debris Disk Following a White Dwarf Natal Kick
Authors:
Tatsuya Akiba,
Selah McIntyre,
Ann-Marie Madigan
Abstract:
The surfaces of many white dwarfs are polluted by metals, implying a recent accretion event. The tidal disruption of planetesimals is a viable source of white dwarf pollution and offers a unique window into the composition of exoplanet systems. The question of how planetary material enters the tidal disruption radius of the white dwarf is currently unresolved. Using a series of $N$-body simulations, we explore the response of the surrounding planetesimal debris disk as the white dwarf receives a natal kick caused by anisotropic mass loss on the asymptotic giant branch. We find that the kick can form an apse-aligned, eccentric debris disk in the range 30 to 240 AU, which corresponds to the orbits of Neptune, the Kuiper Belt, and the scattered disk in our solar system. In addition, many planetesimals beyond 240 AU flip to counter-rotating orbits. Assuming an isotropic distribution of kicks, we predict that approximately 80% of white dwarf debris disks should exhibit significant apsidal alignment and a fraction of counter-rotating orbits. The eccentric disk is able to efficiently and continuously torque planetesimals onto radial, star-grazing orbits. We show that the kick causes both an initial burst in tidal disruption events and an extended period of 100 Myr where tidal disruption rates are consistent with observed mass accretion rates on polluted white dwarfs.
Submitted 4 April, 2024;
originally announced April 2024.
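The kick response described in the abstract above can be illustrated with a two-body toy calculation. This is only a sketch under stated assumptions: the kick is treated as instantaneous, the disk is coplanar with the kick, units are dimensionless with GM = 1, and the function name `kicked_orbit` is hypothetical. The abstract's results come from $N$-body simulations, which this does not reproduce.

```python
import math

def kicked_orbit(r, phase, kick, mu=1.0):
    """Eccentricity and specific angular momentum of a planetesimal after the
    central star receives an instantaneous kick of size `kick` along +x.

    The planetesimal starts on a circular, counter-clockwise orbit of radius r
    at orbital angle `phase`. Impulsive kick and GM = mu are assumptions.
    """
    x, y = r * math.cos(phase), r * math.sin(phase)
    vc = math.sqrt(mu / r)                        # circular speed
    vx, vy = -vc * math.sin(phase), vc * math.cos(phase)
    vx -= kick                                    # velocity relative to the kicked star
    h = x * vy - y * vx                           # z-component of angular momentum
    v2 = vx * vx + vy * vy
    rv = x * vx + y * vy
    # Eccentricity vector: e = ((v^2 - mu/r) r_vec - (r.v) v_vec) / mu
    ex = ((v2 - mu / r) * x - rv * vx) / mu
    ey = ((v2 - mu / r) * y - rv * vy) / mu
    return math.hypot(ex, ey), h

# Same kick, two radii: the inner orbit is mildly perturbed, while the outer
# orbit (where the circular speed is below the kick speed) turns retrograde.
for radius in (1.0, 100.0):
    e, h = kicked_orbit(radius, -math.pi / 2, 0.15)
    print(f"r={radius:6.1f}  e={e:.4f}  h={h:+.3f}")
```

In this toy picture the sign of `h` flips only where the kick exceeds the local circular speed, which is qualitatively in line with the counter-rotating orbits at large radii mentioned in the abstract.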
-
Evolutionary Optimization of Model Merging Recipes
Authors:
Takuya Akiba,
Makoto Shing,
Yujin Tang,
Qi Sun,
David Ha
Abstract:
We present a novel application of evolutionary algorithms to automate the creation of powerful foundation models. While model merging has emerged as a promising approach for LLM development due to its cost-effectiveness, it currently relies on human intuition and domain knowledge, limiting its potential. Here, we propose an evolutionary approach that overcomes this limitation by automatically discovering effective combinations of diverse open-source models, harnessing their collective intelligence without requiring extensive additional training data or compute. Our approach operates in both parameter space and data flow space, allowing for optimization beyond just the weights of the individual models. This approach even facilitates cross-domain merging, generating models like a Japanese LLM with Math reasoning capabilities. Surprisingly, our Japanese Math LLM achieved state-of-the-art performance on a variety of established Japanese LLM benchmarks, even surpassing models with significantly more parameters, despite not being explicitly trained for such tasks. Furthermore, a culturally-aware Japanese VLM generated through our approach demonstrates its effectiveness in describing Japanese culture-specific content, outperforming previous Japanese VLMs. This work not only contributes new state-of-the-art models back to the open-source community, but also introduces a new paradigm for automated model composition, paving the way for exploring alternative, efficient approaches to foundation model development.
Submitted 19 March, 2024;
originally announced March 2024.
-
Unveiling the orbital-selective electronic band reconstruction through the structural phase transition in TaTe$_2$
Authors:
Natsuki Mitsuishi,
Yusuke Sugita,
Tomoki Akiba,
Yuki Takahashi,
Masato Sakano,
Koji Horiba,
Hiroshi Kumigashira,
Hidefumi Takahashi,
Shintaro Ishiwata,
Yukitoshi Motome,
Kyoko Ishizaka
Abstract:
Tantalum ditelluride TaTe$_2$ belongs to the family of layered transition metal dichalcogenides but exhibits a unique structural phase transition at around 170 K that accompanies the rearrangement of the Ta atomic network from a "ribbon chain" to a "butterfly-like" pattern. While multiple mechanisms including Fermi surface nesting and chemical bonding instabilities have been intensively discussed, the origin of this transition remains elusive. Here we investigate the electronic structure of single-crystalline TaTe$_2$ with a particular focus on its modifications through the phase transition, by employing core-level and angle-resolved photoemission spectroscopy combined with first-principles calculations. Temperature-dependent core-level spectroscopy demonstrates a splitting of the Ta $4f$ core-level spectra through the phase transition indicative of the Ta-dominated electronic state reconstruction. Low-energy electronic state measurements further reveal an unusual kink-like band reconstruction occurring at the Brillouin zone boundary, which cannot be explained by Fermi surface nesting or band folding effects. On the basis of the orbital-projected band calculations, this band reconstruction is mainly attributed to the modifications of specific Ta $5d$ states, namely the $d_{XY}$ orbitals (the ones elongating along the ribbon chains) at the center Ta sites of the ribbon chains. The present results highlight the strong orbital-dependent electronic state reconstruction through the phase transition in this system and provide fundamental insights towards understanding complex electron-lattice-bond coupled phenomena.
Submitted 10 February, 2024; v1 submitted 27 June, 2023;
originally announced June 2023.
-
Reprocessing Models for the Optical Light Curves of Hypervariable Quasars from the Sloan Digital Sky Survey Reverberation Mapping Project
Authors:
Tatsuya Akiba,
Jason Dexter,
William Brandt,
Luis C. Ho,
Yasaman Homayouni,
Donald P. Schneider,
Yue Shen,
Jonathan R. Trump
Abstract:
We explore reprocessing models for a sample of 17 hypervariable quasars, taken from the Sloan Digital Sky Survey Reverberation Mapping (SDSS-RM) project, which all show coordinated optical luminosity hypervariability with amplitudes of factors $\gtrsim 2$ between 2014 and 2020. We develop and apply reprocessing models for quasar light curves in simple geometries that are likely to be representative of quasar inner environments. In addition to the commonly investigated thin-disk model, we include the thick-disk and hemisphere geometries. The thick-disk geometry could, for instance, represent a magnetically-elevated disk, whereas the hemisphere model can be interpreted as a first-order approximation for any optically-thick out-of-plane material caused by outflows/winds, warped/tilted disks, etc. Of the 17 quasars in our sample, eleven are best-fit by a hemisphere geometry, five are classified as thick disks, and both models fail for just one object. We highlight the successes and shortcomings of our thermal reprocessing models in case studies of four quasars that are representative of the sample. While reprocessing is unlikely to explain all of the variability we observe in quasars, we present our classification scheme as a starting point for revealing the likely geometries of reprocessing for quasars in our sample and hypervariable quasars in general.
Submitted 22 June, 2023;
originally announced June 2023.
-
Anisotropic Star Clusters around Recoiling Supermassive Black Holes
Authors:
Tatsuya Akiba,
Ann-Marie Madigan
Abstract:
Gravitational wave recoil kicks from merging supermassive black hole binaries can have a profound effect on the surrounding stellar population. In this work, we study the dynamic and kinematic properties of nuclear star clusters following a recoil kick. We show that these post-kick structures present unique signatures that can provide key insight to observational searches for recoiling supermassive black holes. In Akiba & Madigan (2021), we showed that an in-plane recoil kick turns a circular disk into a lopsided, eccentric disk such as the one we observe in the Andromeda nucleus. Building on this work, here we explore many recoil kick angles as well as initial stellar configurations. For a circular disk of stars, an in-plane kick causes strong apsidal alignment with a significant fraction of the disk becoming retrograde at large radii. If initial orbits are highly eccentric, an in-plane kick forms a bar-like structure made up of two anti-aligned lopsided disks. An out-of-plane kick causes clustering in the argument of periapsis, $ω$, regardless of the initial eccentricity distribution. Initially isotropic configurations form anisotropies in the form of a torus of eccentric orbits oriented perpendicular to the recoil kick. Post-kick surface density and velocity maps are presented in each case to highlight the distinct, observable structures of these systems.
Submitted 3 May, 2023;
originally announced May 2023.
-
Carleman linearization approach for chemical kinetics integration toward quantum computation
Authors:
Takaki Akiba,
Youhi Morii,
Kaoru Maruta
Abstract:
The Harrow, Hassidim, Lloyd (HHL) algorithm is a quantum algorithm expected to accelerate solving large-scale linear ordinary differential equations (ODEs). To apply the HHL to non-linear problems such as chemical reactions, the system must be linearized. In this study, Carleman linearization was utilized to transform nonlinear first-order ODEs of chemical reactions into linear ODEs. Although this linearization theoretically requires the generation of an infinite matrix, the original nonlinear equations can be reconstructed. For practical use, the linearized system should be truncated to a finite size, and the analysis precision is determined by the extent of the truncation. The matrix should be sufficiently large to satisfy the required precision, since quantum computers can treat large matrices. Our method was applied to a one-variable nonlinear system, $dy/dt = -y^2$, to investigate the effect of truncation orders in Carleman linearization and time step size on the absolute error. Subsequently, two zero-dimensional homogeneous ignition problems for H2/air and CH4/air gas mixtures were solved. The results revealed that the proposed method could accurately reproduce reference data. Furthermore, an increase in the truncation order in Carleman linearization improved accuracy even with a large time-step size. Thus, our approach can provide accurate numerical simulations rapidly for complex combustion systems.
Submitted 5 July, 2022;
originally announced July 2022.
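The one-variable test problem in the abstract above, $dy/dt = -y^2$, is small enough to sketch directly. Introducing $z_k = y^k$ gives $dz_k/dt = -k\,z_{k+1}$, so truncating at order $N$ leaves a linear system $dz/dt = Az$ with a strictly upper-triangular matrix. The code below is a minimal classical sketch under assumptions: the function names are hypothetical, and the truncated linear system is evaluated through its finite matrix-exponential series rather than through the paper's time-stepping scheme or the HHL algorithm.

```python
def carleman_matrix(order):
    """Truncated Carleman matrix for dy/dt = -y**2 with z_k = y**k, k = 1..order."""
    A = [[0.0] * order for _ in range(order)]
    for k in range(1, order):        # row k-1 holds the equation for z_k
        A[k - 1][k] = -float(k)      # dz_k/dt = -k * z_{k+1}
    return A                         # last row is zero: z_{order+1} is dropped

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def carleman_solve(y0, t, order):
    """Return the first component of exp(A t) z0.

    A is nilpotent (A**order == 0), so the exponential series stops after
    `order` terms and no numerical time stepping is needed in this sketch.
    """
    A = carleman_matrix(order)
    z0 = [y0 ** k for k in range(1, order + 1)]
    term, total = z0[:], z0[:]
    for j in range(1, order):
        term = [t / j * x for x in matvec(A, term)]   # (A t)^j / j! applied to z0
        total = [s + x for s, x in zip(total, term)]
    return total[0]

def exact(y0, t):
    """Closed-form solution of dy/dt = -y**2 with y(0) = y0."""
    return y0 / (1.0 + y0 * t)

for order in (2, 4, 8):
    err = abs(carleman_solve(1.0, 0.5, order) - exact(1.0, 0.5))
    print(f"truncation order {order}: absolute error {err:.2e}")
```

For this problem the truncated solution equals the Taylor polynomial of $y_0/(1 + y_0 t)$ in $t$, so the error shrinks with truncation order only while $|y_0 t| < 1$. That is one way to see why truncation order and time-step size interact.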
-
Spin-orbit-derived giant magnetoresistance in a layered magnetic semiconductor AgCrSe2
Authors:
Hidefumi Takahashi,
Tomoki Akiba,
Alex Hiro Mayo,
Kazuto Akiba,
Atsushi Miyake,
Masashi Tokunaga,
Hitoshi Mori,
Ryotaro Arita,
Shintaro Ishiwata
Abstract:
Two-dimensional magnetic materials have recently attracted great interest due to their unique functions, such as the electric field control of a magnetic phase and the anomalous spin Hall effect. For such remarkable functions, spin-orbit coupling (SOC) serves as an essential ingredient. Here we report a giant positive magnetoresistance in a layered magnetic semiconductor AgCrSe2, which is a manifestation of the subtle combination of the SOC and Zeeman-type spin splitting. When the carrier concentration approaches the critical value of $2.5\times10^{18}$ cm$^{-3}$, a sizable positive magnetoresistance of ~400 % emerges upon the application of magnetic fields normal to the conducting layers. Based on the magneto-Seebeck effect and the first-principles calculations, the unconventional magnetoresistance is ascribable to the enhancement of effective carrier mass in the SOC induced J = 3/2 state, which is tuned to the Fermi level through the Zeeman splitting enhanced by the p-d coupling. This study demonstrates a new aspect of the SOC-derived magnetotransport in two-dimensional magnetic semiconductors, paving the way to novel spintronic functions.
Submitted 29 May, 2022;
originally announced May 2022.
-
On the Formation of an Eccentric Nuclear Disk following the Gravitational Recoil Kick of a Supermassive Black Hole
Authors:
Tatsuya Akiba,
Ann-Marie Madigan
Abstract:
The anisotropic emission of gravitational waves during the merger of two supermassive black holes can result in a recoil kick of the merged remnant. We show here that eccentric nuclear disks - stellar disks of eccentric, apse-aligned orbits - can directly form as a result. An initially circular disk of stars will align orthogonal to the black hole kick direction with a distinctive 'tick-mark' eccentricity distribution and a spiral pattern in mean anomaly.
Submitted 19 October, 2021;
originally announced October 2021.
-
Giant enhancement of cryogenic thermopower by polar structural instability in the pressurized semimetal MoTe2
Authors:
Hidefumi Takahashi,
Kento Hasegawa,
Tomoki Akiba,
Hideaki Sakai,
Mohammad Saeed Bahramy,
Shintaro Ishiwata
Abstract:
We found that a high mobility semimetal 1T'-MoTe2 shows a significant pressure-dependent change in the cryogenic thermopower in the vicinity of the critical pressure, where the polar structural transition disappears. With the application of a high pressure of 0.75 GPa, while the resistivity becomes as low as 10 μΩ cm, thermopower reached the maximum value of 60 μV K$^{-1}$ at 25 K, leading to a giant thermoelectric power factor of 300 μW K$^{-2}$ cm$^{-1}$. Based on semiquantitative analyses, the origin of this behavior is discussed in terms of inelastic electron-phonon scattering enhanced by the softening of zone center phonon modes associated with the polar structural instability.
Submitted 12 November, 2019;
originally announced November 2019.
-
Team PFDet's Methods for Open Images Challenge 2019
Authors:
Yusuke Niitani,
Toru Ogawa,
Shuji Suzuki,
Takuya Akiba,
Tommi Kerola,
Kohei Ozaki,
Shotaro Sano
Abstract:
We present the instance segmentation and the object detection method used by team PFDet for Open Images Challenge 2019. We tackle a massive dataset size, huge class imbalance and federated annotations. Using this method, the team PFDet achieved 3rd and 4th place in the instance segmentation and the object detection track, respectively.
Submitted 25 October, 2019;
originally announced October 2019.
-
Chainer: A Deep Learning Framework for Accelerating the Research Cycle
Authors:
Seiya Tokui,
Ryosuke Okuta,
Takuya Akiba,
Yusuke Niitani,
Toru Ogawa,
Shunta Saito,
Shuji Suzuki,
Kota Uenishi,
Brian Vogel,
Hiroyuki Yamazaki Vincent
Abstract:
Software frameworks for neural networks play a key role in the development and application of deep learning methods. In this paper, we introduce the Chainer framework, which intends to provide a flexible, intuitive, and high performance means of implementing the full range of deep learning models needed by researchers and practitioners. Chainer provides acceleration using Graphics Processing Units with a familiar NumPy-like API through CuPy, supports general and dynamic models in Python through Define-by-Run, and also provides add-on packages for state-of-the-art computer vision models as well as distributed training.
Submitted 1 August, 2019;
originally announced August 2019.
-
Optuna: A Next-generation Hyperparameter Optimization Framework
Authors:
Takuya Akiba,
Shotaro Sano,
Toshihiko Yanase,
Takeru Ohta,
Masanori Koyama
Abstract:
The purpose of this study is to introduce new design-criteria for next-generation hyperparameter optimization software. The criteria we propose include (1) define-by-run API that allows users to construct the parameter search space dynamically, (2) efficient implementation of both searching and pruning strategies, and (3) easy-to-setup, versatile architecture that can be deployed for various purposes, ranging from scalable distributed computing to light-weight experiment conducted via interactive interface. In order to prove our point, we will introduce Optuna, an optimization software which is a culmination of our effort in the development of a next generation optimization software. As an optimization software designed with define-by-run principle, Optuna is particularly the first of its kind. We will present the design-techniques that became necessary in the development of the software that meets the above criteria, and demonstrate the power of our new design through experimental results and real world applications. Our software is available under the MIT license (https://github.com/pfnet/optuna/).
Submitted 25 July, 2019;
originally announced July 2019.
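The "define-by-run" idea in the abstract above, where the search space is constructed while the objective function executes, can be shown with a toy that uses only the standard library. This is a sketch of the concept and not Optuna's implementation: the `Trial` class, `random_search`, and the scoring formula are hypothetical stand-ins, and plain random sampling replaces Optuna's searching and pruning strategies.

```python
import random

class Trial:
    """Toy trial object that records parameters as the objective asks for them."""

    def __init__(self, rng):
        self.rng = rng
        self.params = {}

    def suggest_int(self, name, low, high):
        self.params[name] = self.rng.randint(low, high)
        return self.params[name]

    def suggest_float(self, name, low, high):
        self.params[name] = self.rng.uniform(low, high)
        return self.params[name]

def objective(trial):
    # The search space is not declared up front: how many width parameters
    # exist depends on the value sampled for n_layers in this very trial.
    n_layers = trial.suggest_int("n_layers", 1, 3)
    widths = [trial.suggest_int(f"width_{i}", 4, 64) for i in range(n_layers)]
    lr = trial.suggest_float("lr", 1e-4, 1e-1)
    # Hypothetical score standing in for a validation loss.
    return abs(sum(widths) - 64) + abs(lr - 0.01)

def random_search(objective, n_trials, seed=0):
    """Minimise the objective by independent random trials."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        trial = Trial(rng)
        value = objective(trial)
        if best is None or value < best[0]:
            best = (value, dict(trial.params))
    return best

best_value, best_params = random_search(objective, n_trials=50)
print(best_value, best_params)
```

The point of the design is visible in `objective`: ordinary Python control flow (here a list comprehension over `n_layers`) decides which parameters exist, so conditional and variable-length search spaces need no separate declaration language.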
-
A Graph Theoretic Framework of Recomputation Algorithms for Memory-Efficient Backpropagation
Authors:
Mitsuru Kusumoto,
Takuya Inoue,
Gentaro Watanabe,
Takuya Akiba,
Masanori Koyama
Abstract:
Recomputation algorithms collectively refer to a family of methods that aims to reduce the memory consumption of the backpropagation by selectively discarding the intermediate results of the forward propagation and recomputing the discarded results as needed. In this paper, we will propose a novel and efficient recomputation method that can be applied to a wider range of neural nets than previous methods. We use the language of graph theory to formalize the general recomputation problem of minimizing the computational overhead under a fixed memory budget constraint, and provide a dynamic programming solution to the problem. Our method can reduce the peak memory consumption on various benchmark networks by 36%~81%, which outperforms the reduction achieved by other methods.
Submitted 28 May, 2019;
originally announced May 2019.
-
Sampling Techniques for Large-Scale Object Detection from Sparsely Annotated Objects
Authors:
Yusuke Niitani,
Takuya Akiba,
Tommi Kerola,
Toru Ogawa,
Shotaro Sano,
Shuji Suzuki
Abstract:
Efficient and reliable methods for training of object detectors are in higher demand than ever, and more and more data relevant to the field is becoming available. However, large datasets like Open Images Dataset v4 (OID) are sparsely annotated, and some measure must be taken in order to ensure the training of a reliable detector. In order to take the incompleteness of these datasets into account, one possibility is to use pretrained models to detect the presence of the unverified objects. However, the performance of such a strategy depends largely on the power of the pretrained model. In this study, we propose part-aware sampling, a method that uses human intuition for the hierarchical relation between objects. In terse terms, our method works by making assumptions like "a bounding box for a car should contain a bounding box for a tire". We demonstrate the power of our method on OID and compare the performance against a method based on a pretrained model. Our method also won the first and second place on the public and private test sets of the Google AI Open Images Competition 2018.
Submitted 21 April, 2019; v1 submitted 27 November, 2018;
originally announced November 2018.
-
PFDet: 2nd Place Solution to Open Images Challenge 2018 Object Detection Track
Authors:
Takuya Akiba,
Tommi Kerola,
Yusuke Niitani,
Toru Ogawa,
Shotaro Sano,
Shuji Suzuki
Abstract:
We present a large-scale object detection system by team PFDet. Our system enables training with huge datasets using 512 GPUs, handles sparsely verified classes, and massive class imbalance. Using our method, we achieved 2nd place in the Google AI Open Images Object Detection Track 2018 on Kaggle.
Submitted 3 September, 2018;
originally announced September 2018.
-
Adversarial Attacks and Defences Competition
Authors:
Alexey Kurakin,
Ian Goodfellow,
Samy Bengio,
Yinpeng Dong,
Fangzhou Liao,
Ming Liang,
Tianyu Pang,
Jun Zhu,
Xiaolin Hu,
Cihang Xie,
Jianyu Wang,
Zhishuai Zhang,
Zhou Ren,
Alan Yuille,
Sangxia Huang,
Yao Zhao,
Yuzhe Zhao,
Zhonglin Han,
Junjiajia Long,
Yerkebulan Berdibekov,
Takuya Akiba,
Seiya Tokui,
Motoki Abe
Abstract:
To accelerate research on adversarial examples and robustness of machine learning classifiers, Google Brain organized a NIPS 2017 competition that encouraged researchers to develop new methods to generate adversarial examples as well as to develop new ways to defend against them. In this chapter, we describe the structure and organization of the competition and the solutions developed by several of the top-placing teams.
Submitted 30 March, 2018;
originally announced April 2018.
-
Variance-based Gradient Compression for Efficient Distributed Deep Learning
Authors:
Yusuke Tsuzuku,
Hiroto Imachi,
Takuya Akiba
Abstract:
Due to the substantial computational cost, training state-of-the-art deep neural networks for large-scale datasets often requires distributed training using multiple computation workers. However, by nature, workers need to frequently communicate gradients, causing severe bottlenecks, especially on lower bandwidth connections. A few methods have been proposed to compress gradient for efficient communication, but they either suffer a low compression ratio or significantly harm the resulting model accuracy, particularly when applied to convolutional neural networks. To address these issues, we propose a method to reduce the communication overhead of distributed deep learning. Our key observation is that gradient updates can be delayed until an unambiguous (high amplitude, low variance) gradient has been calculated. We also present an efficient algorithm to compute the variance with negligible additional cost. We experimentally show that our method can achieve very high compression ratio while maintaining the result model accuracy. We also analyze the efficiency using computation and communication cost models and provide the evidence that this method enables distributed deep learning for many scenarios with commodity environments.
Submitted 19 February, 2018; v1 submitted 16 February, 2018;
originally announced February 2018.
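The key observation in the abstract above, that a gradient update can be delayed until the accumulated gradient is unambiguous (high amplitude, low variance), can be sketched for a single parameter. The specific rule used here, sending once the squared accumulated sum exceeds `alpha` times the accumulated sum of squares, is an assumption chosen for illustration, as are the class name and the `alpha` parameter. The abstract does not state the paper's exact criterion.

```python
class DelayedGradientSender:
    """Accumulate a scalar gradient and release it only when it is unambiguous.

    Assumed rule: send when (sum g)^2 > alpha * (sum g^2). A consistently
    signed gradient passes quickly, while a noisy one keeps accumulating
    locally and costs no communication.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.sum_g = 0.0    # accumulated gradient (amplitude)
        self.sum_g2 = 0.0   # accumulated squared gradient (variance proxy)

    def push(self, g):
        """Add one local gradient; return the value to communicate, or None."""
        self.sum_g += g
        self.sum_g2 += g * g
        if self.sum_g ** 2 > self.alpha * self.sum_g2:
            out = self.sum_g
            self.sum_g = 0.0
            self.sum_g2 = 0.0
            return out
        return None

steady = DelayedGradientSender()
print([steady.push(g) for g in (0.5, 0.5)])           # second step is sent

noisy = DelayedGradientSender()
print([noisy.push(g) for g in (1.0, -1.0, 1.0, -1.0)])  # nothing is sent
```

Both running sums are needed for the update anyway, which hints at why a variance estimate can be obtained at little extra cost.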
-
ShakeDrop Regularization for Deep Residual Learning
Authors:
Yoshihiro Yamada,
Masakazu Iwamura,
Takuya Akiba,
Koichi Kise
Abstract:
Overfitting is a crucial problem in deep neural networks, even in the latest network architectures. In this paper, to relieve the overfitting effect of ResNet and its improvements (i.e., Wide ResNet, PyramidNet, and ResNeXt), we propose a new regularization method called ShakeDrop regularization. ShakeDrop is inspired by Shake-Shake, which is an effective regularization method, but can be applied to ResNeXt only. ShakeDrop is more effective than Shake-Shake and can be applied not only to ResNeXt but also ResNet, Wide ResNet, and PyramidNet. An important key is to achieve stability of training. Because effective regularization often causes unstable training, we introduce a training stabilizer, which is an unusual use of an existing regularizer. Through experiments under various conditions, we demonstrate the conditions under which ShakeDrop works well.
Submitted 6 January, 2020; v1 submitted 7 February, 2018;
originally announced February 2018.
-
Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes
Authors:
Takuya Akiba,
Shuji Suzuki,
Keisuke Fukuda
Abstract:
We demonstrate that training ResNet-50 on ImageNet for 90 epochs can be achieved in 15 minutes with 1024 Tesla P100 GPUs. This was made possible by using a large minibatch size of 32k. To maintain accuracy with this large minibatch size, we employed several techniques such as RMSprop warm-up, batch normalization without moving averages, and a slow-start learning rate schedule. This paper also describes the details of the hardware and software of the system used to achieve the above performance.
Submitted 12 November, 2017;
originally announced November 2017.
-
ChainerMN: Scalable Distributed Deep Learning Framework
Authors:
Takuya Akiba,
Keisuke Fukuda,
Shuji Suzuki
Abstract:
One key to the breakthroughs deep learning has made in various fields is the use of high computing power centered on GPUs. Harnessing further computing capacity through distributed processing is essential not only to make deep learning bigger and faster but also to tackle unsolved challenges. We present the design, implementation, and evaluation of ChainerMN, the distributed deep learning framework we have developed. We demonstrate that ChainerMN can scale the training of the ResNet-50 model on the ImageNet dataset up to 128 GPUs with a parallel efficiency of 90%.
Submitted 31 October, 2017;
originally announced October 2017.
-
Anticorrelation between polar lattice instability and superconductivity in the Weyl semimetal candidate MoTe2
Authors:
H. Takahashi,
T. Akiba,
K. Imura,
T. Shiino,
K. Deguchi,
N. K. Sato,
H. Sakai,
M. S. Bahramy,
S. Ishiwata
Abstract:
The relation between the polar structural instability and superconductivity in a Weyl semimetal candidate MoTe2 has been clarified by finely controlled physical and chemical pressure. The physical pressure as well as the chemical pressure, i.e., the Se substitution for Te, enhances the superconducting transition temperature Tc at around the critical pressure where the polar structure transition disappears. From the heat capacity and thermopower measurements, we ascribe the significant enhancement of Tc at the critical pressure to a subtle modification of the phonon dispersion or the semimetallic band structure upon the polar-to-nonpolar transition. On the other hand, the physical pressure, which strongly reduces the interlayer distance, is more effective on the suppression of the polar structural transition and the enhancement of Tc as compared with the chemical pressure, which emphasizes the importance of the interlayer coupling on the structural and superconducting instability in MoTe2.
Submitted 7 March, 2017;
originally announced March 2017.
-
Cut Tree Construction from Massive Graphs
Authors:
Takuya Akiba,
Yoichi Iwata,
Yosuke Sameshima,
Naoto Mizuno,
Yosuke Yano
Abstract:
The construction of cut trees (also known as Gomory-Hu trees) for a given graph enables the minimum-cut size of the original graph to be obtained for any pair of vertices. Cut trees are a powerful back-end for graph management and mining, as they support various procedures related to the minimum cut, maximum flow, and connectivity. However, the crucial drawback of cut trees is the computational cost of their construction. In theory, a cut tree is built by applying a maximum flow algorithm $n$ times, where $n$ is the number of vertices. Therefore, naive implementations of this approach result in cubic time complexity, which is too slow for today's large-scale graphs. To address this issue, in the present study, we propose a new cut-tree construction algorithm tailored to real-world networks. Using a series of experiments, we demonstrate that the proposed algorithm is several orders of magnitude faster than previous algorithms and that it can construct cut trees for billion-scale graphs.
Submitted 27 September, 2016;
originally announced September 2016.
-
Fractality of Massive Graphs: Scalable Analysis with Sketch-Based Box-Covering Algorithm
Authors:
Takuya Akiba,
Kenko Nakamura,
Taro Takaguchi
Abstract:
Analysis and modeling of networked objects are fundamental pieces of modern data mining. Most real-world networks, from biological to social ones, are known to have common structural properties. These properties allow us to model the growth processes of networks and to develop useful algorithms. One remarkable example is the fractality of networks, which suggests the self-similar organization of global network structure. To determine the fractality of a network, we need to solve the so-called box-covering problem, where preceding algorithms are not feasible for large-scale networks. The lack of an efficient algorithm prevents us from investigating the fractal nature of large-scale networks. To overcome this issue, we propose a new box-covering algorithm based on recently emerging sketching techniques. We theoretically show that it works in near-linear time with a guarantee of solution accuracy. In experiments, we have confirmed that the algorithm enables us to study the fractality of million-scale networks for the first time. We have observed that its outputs are sufficiently accurate and that its time and space requirements are orders of magnitude smaller than those of previous algorithms.
Submitted 26 September, 2016;
originally announced September 2016.
-
Branch-and-Reduce Exponential/FPT Algorithms in Practice: A Case Study of Vertex Cover
Authors:
Takuya Akiba,
Yoichi Iwata
Abstract:
We investigate the gap between theory and practice for exact branching algorithms. In theory, branch-and-reduce algorithms currently have the best time complexity for numerous important problems. On the other hand, in practice, state-of-the-art methods are based on different approaches, and the empirical efficiency of such theoretical algorithms has seldom been investigated, probably because they seem inefficient owing to the plethora of complex reduction rules. In this paper, we design a branch-and-reduce algorithm for the vertex cover problem using the techniques developed for theoretical algorithms and compare its practical performance with other state-of-the-art empirical methods. The results indicate that branch-and-reduce algorithms are actually quite practical and competitive with other state-of-the-art approaches for several kinds of instances, thus showing the practical impact of theoretical research on branching algorithms.
Submitted 10 November, 2014;
originally announced November 2014.
-
Fast Exact Shortest-Path Distance Queries on Large Networks by Pruned Landmark Labeling
Authors:
Takuya Akiba,
Yoichi Iwata,
Yuichi Yoshida
Abstract:
We propose a new exact method for shortest-path distance queries on large-scale networks. Our method precomputes distance labels for vertices by performing a breadth-first search from every vertex. Although this approach seems too obvious and too inefficient at first glance, the key ingredient introduced here is pruning during the breadth-first searches. While we can still answer the correct distance for any pair of vertices from the labels, pruning surprisingly reduces the search space and the sizes of the labels. Moreover, we show that we can perform 32 or 64 breadth-first searches simultaneously by exploiting bitwise operations. We experimentally demonstrate that the combination of these two techniques is efficient and robust on various kinds of large-scale real-world networks. In particular, our method can handle social networks and web graphs with hundreds of millions of edges, which are two orders of magnitude larger than the limits of previous exact methods, with query times comparable to those of previous methods.
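The pruned labeling idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an unweighted, undirected graph, it orders vertices by decreasing degree (an assumption, since the abstract does not state an ordering), and it omits the bit-parallel breadth-first searches. The names `build_labels` and `query` are hypothetical.

```python
from collections import deque
from math import inf


def query(labels, s, t):
    """Distance between s and t via the best hub shared by both labels."""
    a, b = labels[s], labels[t]
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller label
    best = inf
    for hub, d in a.items():
        if hub in b:
            best = min(best, d + b[hub])
    return best


def build_labels(n, edges):
    """Build distance labels with one pruned BFS per vertex.

    labels[v] maps hub -> dist(hub, v). Assumes vertices 0..n-1 and an
    unweighted, undirected edge list.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Assumed heuristic: process high-degree vertices first.
    order = sorted(range(n), key=lambda v: -len(adj[v]))
    labels = [dict() for _ in range(n)]

    for root in order:
        dist = {root: 0}
        queue = deque([root])
        while queue:
            u = queue.popleft()
            d = dist[u]
            # Prune: earlier hubs already certify a distance <= d,
            # so u needs no entry for root and is not expanded.
            if query(labels, root, u) <= d:
                continue
            labels[u][root] = d
            for w in adj[u]:
                if w not in dist:
                    dist[w] = d + 1
                    queue.append(w)
    return labels
```

After `labels = build_labels(n, edges)`, a call such as `query(labels, s, t)` returns the shortest-path distance, or `inf` when `s` and `t` are disconnected.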
Submitted 16 April, 2013;
originally announced April 2013.
-
Effects of Language Modeling on Speech-driven Question Answering
Authors:
Tomoyosi Akiba,
Atsushi Fujii,
Katunobu Itou
Abstract:
We integrate automatic speech recognition (ASR) and question answering (QA) to realize a speech-driven QA system, and evaluate its performance. We adapt an N-gram language model to natural language questions, so that the input of our system can be recognized with high accuracy. We target WH-questions, which consist of a topic part and a fixed phrase used to ask about something. We first produce a general N-gram model intended to recognize the topic and emphasize the counts of the N-grams that correspond to the fixed phrases. Given a transcription by the ASR engine, the QA engine extracts the answer candidates from target documents. We propose a passage retrieval method robust against recognition errors in the transcription. We use the QA test collection produced in NTCIR, which is a TREC-style evaluation workshop, and show the effectiveness of our method by means of experiments.
Submitted 10 July, 2004;
originally announced July 2004.
-
Unsupervised Topic Adaptation for Lecture Speech Retrieval
Authors:
Atsushi Fujii,
Katunobu Itou,
Tomoyosi Akiba,
Tetsuya Ishikawa
Abstract:
We are developing a cross-media information retrieval system, in which users can view specific segments of lecture videos by submitting text queries. To produce a text index, the audio track is extracted from a lecture video and a transcription is generated by automatic speech recognition. In this paper, to improve the quality of our retrieval system, we extensively investigate the effects of adapting acoustic and language models on speech recognition. We use an MLLR-based method to adapt an acoustic model. To obtain a corpus for language model adaptation, we use the textbook for a target lecture to search a Web collection for the pages associated with the lecture topic. We show the effectiveness of our method by means of experiments.
Submitted 10 July, 2004;
originally announced July 2004.
-
A Cross-media Retrieval System for Lecture Videos
Authors:
Atsushi Fujii,
Katunobu Itou,
Tomoyosi Akiba,
Tetsuya Ishikawa
Abstract:
We propose a cross-media lecture-on-demand system, in which users can selectively view specific segments of lecture videos by submitting text queries. Users can easily formulate queries by using the textbook associated with a target lecture, even if they cannot come up with effective keywords. Our system extracts the audio track from a target lecture video, generates a transcription by large vocabulary continuous speech recognition, and produces a text index. Experimental results showed that by adapting speech recognition to the topic of the lecture, the recognition accuracy increased and the retrieval accuracy was comparable with that obtained by human transcription.
Submitted 13 September, 2003;
originally announced September 2003.