-
Spatio-temporal reconstruction of substance dynamics using compressed sensing in multi-spectral magnetic resonance spectroscopic imaging
Authors:
Utako Yamamoto,
Hirohiko Imai,
Kei Sano,
Masayuki Ohzeki,
Tetsuya Matsuda,
Toshiyuki Tanaka
Abstract:
The objective of our study is to observe the dynamics of multiple substances in vivo with high temporal resolution from multi-spectral magnetic resonance spectroscopic imaging (MRSI) data. Multi-spectral MRSI can effectively separate the spectral peaks of multiple substances and is useful for measuring the spatial distributions of substances. However, it is difficult to measure time-varying substance distributions directly by ordinary full sampling because the measurement requires a significantly long time. In this study, we propose a novel method to reconstruct the spatio-temporal distributions of substances from randomly undersampled multi-spectral MRSI data on the basis of compressed sensing (CS) and the partially separable function model with base spectra of substances. In our method, we employ spatio-temporal sparsity and temporal smoothness of the substance distributions as prior knowledge to perform CS. The effectiveness of our method has been evaluated using phantom data sets of glass tubes filled with glucose or lactate solution in increasing amounts over time, and animal data sets of a tumor-bearing mouse to observe the metabolic dynamics involved in the Warburg effect in vivo. The reconstructed results are consistent with the expected behaviors, showing that our method can reconstruct the spatio-temporal distribution of substances with a temporal resolution of four seconds, which is an extremely short time scale compared with that of full sampling. Since this method utilizes only prior knowledge naturally assumed for the spatio-temporal distributions of substances and is independent of the number of spectral and spatial dimensions and of the MRSI acquisition sequence, it is expected to contribute to revealing the underlying substance dynamics in MRSI data already acquired or to be acquired in the future.
Submitted 1 March, 2024;
originally announced March 2024.
-
Compiling ONNX Neural Network Models Using MLIR
Authors:
Tian Jin,
Gheorghe-Teodor Bercea,
Tung D. Le,
Tong Chen,
Gong Su,
Haruki Imai,
Yasushi Negishi,
Anh Leu,
Kevin O'Brien,
Kiyokuni Kawachiya,
Alexandre E. Eichenberger
Abstract:
Deep neural network models are becoming increasingly popular and have been used in various tasks such as computer vision, speech recognition, and natural language processing. Machine learning models are commonly trained in a resource-rich environment and then deployed in a distinct environment such as high availability machines or edge devices. To assist the portability of models, the open-source community has proposed the Open Neural Network Exchange (ONNX) standard. In this paper, we present a high-level, preliminary report on our onnx-mlir compiler, which generates code for the inference of deep neural network models described in the ONNX format. Onnx-mlir is an open-source compiler implemented using the Multi-Level Intermediate Representation (MLIR) infrastructure recently integrated in the LLVM project. Onnx-mlir relies on the MLIR concept of dialects to implement its functionality. We propose here two new dialects: (1) an ONNX specific dialect that encodes the ONNX standard semantics, and (2) a loop-based dialect to provide for a common lowering point for all ONNX dialect operations. Each intermediate representation facilitates its own characteristic set of graph-level and loop-based optimizations respectively. We illustrate our approach by following several models through the proposed representations and we include some early optimization work and performance results.
Submitted 30 September, 2020; v1 submitted 19 August, 2020;
originally announced August 2020.
-
Profiling based Out-of-core Hybrid Method for Large Neural Networks
Authors:
Yuki Ito,
Haruki Imai,
Tung Le Duc,
Yasushi Negishi,
Kiyokuni Kawachiya,
Ryo Matsumiya,
Toshio Endo
Abstract:
GPUs are widely used to accelerate deep learning with neural networks (NNs). On the other hand, since GPU memory capacity is limited, it is difficult to implement efficient programs that compute large NNs on a GPU. To compute NNs exceeding GPU memory capacity, the data-swapping method and the recomputing method have been proposed in existing work. However, in these methods, performance overhead occurs due to data movement or increased computation. In order to reduce the overhead, it is important to consider the characteristics of each layer, such as its size and recomputation cost. Based on this direction, we proposed the Profiling based out-of-core Hybrid method (PoocH). PoocH determines the target layers for swapping or recomputing based on runtime profiling. We implemented PoocH by extending a deep learning framework, Chainer, and we evaluated its performance. With PoocH, we successfully computed an NN requiring 50 GB of memory on a single GPU with 16 GB of memory. Compared with in-core cases, performance degradation was 38% on an x86 machine and 28% on a POWER9 machine.
Submitted 11 July, 2019;
originally announced July 2019.
-
Fast and Accurate 3D Medical Image Segmentation with Data-swapping Method
Authors:
Haruki Imai,
Samuel Matzek,
Tung D. Le,
Yasushi Negishi,
Kiyokuni Kawachiya
Abstract:
Deep neural network models used for medical image segmentation are large because they are trained with high-resolution three-dimensional (3D) images. Graphics processing units (GPUs) are widely used to accelerate the training. However, the memory on a GPU is not large enough to train the models. A popular approach to tackling this problem is the patch-based method, which divides a large image into small patches and trains the models with these small patches. However, this method would degrade the segmentation quality if a target object spans multiple patches. In this paper, we propose a novel approach for 3D medical image segmentation that utilizes data-swapping, which swaps out intermediate data from GPU memory to CPU memory to enlarge the effective GPU memory size, for training on high-resolution 3D medical images without patching. We carefully tuned parameters in the data-swapping method to obtain the best training performance for 3D U-Net, a widely used deep neural network model for medical image segmentation. We applied our tuning to train 3D U-Net with full-size images of 192 x 192 x 192 voxels in a brain tumor dataset. As a result, communication overhead, which is the most important issue, was reduced by 17.1%. Compared with the patch-based method for patches of 128 x 128 x 128 voxels, our training for full-size images improved the mean Dice score by 4.48% and 5.32% for detecting the whole tumor sub-region and the tumor core sub-region, respectively. The total training time was reduced from 164 hours to 47 hours, resulting in 3.53 times acceleration.
Submitted 19 December, 2018;
originally announced December 2018.
-
TFLMS: Large Model Support in TensorFlow by Graph Rewriting
Authors:
Tung D. Le,
Haruki Imai,
Yasushi Negishi,
Kiyokuni Kawachiya
Abstract:
While accelerators such as GPUs have limited memory, deep neural networks are becoming larger and will not fit within the memory limits of accelerators for training. We propose an approach to tackle this problem by rewriting the computational graph of a neural network, in which swap-out and swap-in operations are inserted to temporarily store intermediate results in CPU memory. In particular, we first revise the concept of a computational graph by defining a concrete semantics for variables in a graph. We then formally show how to derive swap-out and swap-in operations from an existing graph and present rules to optimize the graph. To realize our approach, we developed a module in TensorFlow, named TFLMS. TFLMS is published as a pull request in the TensorFlow repository for contributing to the TensorFlow community. With TFLMS, we were able to train ResNet-50 and 3DUnet with 4.7x and 2x larger batch sizes, respectively. In particular, we were able to train 3DUnet using images of size $192^3$ for image segmentation, which, without TFLMS, had been done only by dividing the images into smaller images, which affects the accuracy.
Submitted 2 October, 2019; v1 submitted 5 July, 2018;
originally announced July 2018.
-
Profile-guided memory optimization for deep neural networks
Authors:
Taro Sekiyama,
Takashi Imamichi,
Haruki Imai,
Rudy Raymond
Abstract:
Recent years have seen deep neural networks (DNNs) becoming wider and deeper to achieve better performance in many applications of AI. Such DNNs, however, require huge amounts of memory to store weights and intermediate results (e.g., activations, feature maps, etc.) in propagation. This requirement makes it difficult to run the DNNs on devices with limited, hard-to-extend memory, degrades the running time performance, and restricts the design of network models. We address this challenge by developing a novel profile-guided memory optimization to efficiently and quickly allocate memory blocks during the propagation in DNNs. The optimization utilizes a simple and fast heuristic algorithm based on the two-dimensional rectangle packing problem. Experimenting with well-known neural network models, we confirm that our method not only reduces the memory consumption by up to $49.5\%$ but also accelerates training and inference by up to a factor of four thanks to the rapidity of the memory allocation and the ability to use larger mini-batch sizes.
Submitted 26 April, 2018;
originally announced April 2018.
-
Spatially Coupled Quasi-Cyclic Quantum LDPC Codes
Authors:
Manabu Hagiwara,
Kenta Kasai,
Hideki Imai,
Kohichi Sakaniwa
Abstract:
We face the following dilemma in designing low-density parity-check (LDPC) codes for quantum error correction. 1) The row weights of parity-check matrices should be large: The minimum distances are bounded above by the minimum row weights of parity-check matrices of constituent classical codes. Small minimum distance tends to result in poor decoding performance in the error-floor region. 2) The row weights of parity-check matrices should not be large: The sum-product decoding performance in the water-fall region is degraded as the row weight increases. Recently, Kudekar et al. showed that spatially-coupled (SC) LDPC codes exhibit capacity-achieving performance for classical channels. SC LDPC codes have both large row weight and capacity-achieving error-floor and water-fall performance. In this paper, we design SC LDPC-CSS (Calderbank, Shor and Steane) codes for quantum error correction over the depolarizing channels.
Submitted 15 February, 2011;
originally announced February 2011.
-
Homophonic Coding Design for Communication Systems Employing the Encoding-Encryption Paradigm
Authors:
Miodrag J. Mihaljevic,
Frederique Oggier,
Hideki Imai
Abstract:
This paper addresses the design of a dedicated homophonic coding for a class of communication systems which, in order to provide both reliability and security, first encode the data before encrypting it, which is referred to as the encoding-encryption paradigm. The considered systems employ error-correction coding for reliability, a stream cipher for encryption, and homophonic coding to enhance the protection of the key used in the stream cipher, on which the security of all the system transmissions relies. This paper presents a security evaluation of such systems from a computational complexity point of view, which serves as a source for establishing dedicated homophonic code design criteria. The security evaluation shows that the computational complexity of recovering the secret key, given all the information an attacker could gather during the passive attacks he can mount, is lower bounded by the complexity of the related LPN (Learning Parity in Noise) problem in both the average and worst case. This gives guidelines to construct a dedicated homophonic encoder which maximizes the complexity of the underlying LPN problem for a given encoding overhead. Finally, this paper proposes a generic homophonic coding strategy that fulfills the proposed design criteria and thus enhances security while minimizing the induced overhead.
Submitted 29 December, 2010;
originally announced December 2010.
-
Quantum Error Correction beyond the Bounded Distance Decoding Limit
Authors:
Kenta Kasai,
Manabu Hagiwara,
Hideki Imai,
Kohichi Sakaniwa
Abstract:
In this paper, we consider quantum error correction over depolarizing channels with non-binary low-density parity-check codes defined over a Galois field of size $2^p$. The proposed quantum error correcting codes are based on the binary quasi-cyclic CSS (Calderbank, Shor and Steane) codes. The resulting quantum codes outperform the best known quantum codes and surpass the performance limit of the bounded distance decoder. By increasing the size of the underlying Galois field, i.e., $2^p$, the error floors are considerably improved.
Submitted 13 July, 2011; v1 submitted 11 July, 2010;
originally announced July 2010.
-
Theoretical framework for constructing matching algorithms in biometric authentication systems
Authors:
Manabu Inuma,
Akira Otsuka,
Hideki Imai
Abstract:
In this paper, we propose a theoretical framework to construct matching algorithms for any biometric authentication system. Conventional matching algorithms are not necessarily secure against strong intentional impersonation attacks such as wolf attacks. The wolf attack is an attempt to impersonate a genuine user by presenting a "wolf" to a biometric authentication system without the knowledge of a genuine user's biometric sample. A wolf is a sample which can be accepted as a match with multiple templates. The wolf attack probability (WAP) is the maximum success probability of the wolf attack, which was proposed by Une, Otsuka, and Imai as a measure for evaluating the security of biometric authentication systems. We present a principle for the construction of matching algorithms that are secure against the wolf attack for any biometric authentication system. The ideal matching algorithm determines a threshold for each input value depending on the entropy of the probability distribution of the (Hamming) distances. We then show that if the information about the probability distribution for each input value is perfectly given, our matching algorithm is secure against the wolf attack. Our generalized matching algorithm gives a theoretical framework to construct secure matching algorithms. How low a WAP is achievable depends on how accurately the entropy is estimated. There is thus a trade-off between the efficiency and the achievable WAP. Almost every conventional matching algorithm employs a fixed threshold and hence can be regarded as an efficient but insecure instance of our theoretical framework. Daugman's IrisCode recognition algorithm can also be regarded as a non-optimal instance of our framework.
Submitted 8 April, 2009;
originally announced April 2009.
-
Quantum Quasi-Cyclic LDPC Codes
Authors:
Manabu Hagiwara,
Hideki Imai
Abstract:
In this paper, a construction of a pair of "regular" quasi-cyclic LDPC codes as ingredient codes for a quantum error-correcting code is proposed. That is, we find quantum regular LDPC codes with various weight distributions. Furthermore, our proposed codes have many variations in length and code rate. These codes are obtained by a discrete mathematical characterization for model matrices of quasi-cyclic LDPC codes.
Our proposed codes achieve a bounded distance decoding (BDD) bound, also known as the VG bound, and achieve a lower bound of the code length.
Submitted 28 August, 2010; v1 submitted 5 January, 2007;
originally announced January 2007.
-
Optimization of Memory Usage in Tardos's Fingerprinting Codes
Authors:
Koji Nuida,
Manabu Hagiwara,
Hajime Watanabe,
Hideki Imai
Abstract:
It is known that Tardos's collusion-secure probabilistic fingerprinting code (Tardos code; STOC'03) has length of theoretically minimal order with respect to the number of colluding users. However, the Tardos code uses a certain continuous probability distribution in codeword generation, which creates some problems for practical use; in particular, it requires large extra memory. A solution proposed so far is to use some finite probability distributions instead. In this paper, we determine the optimal finite distribution in order to decrease the amount of extra memory. By our result, the extra memory is reduced to 1/32 of the original, or even becomes needless, in some practical settings. Moreover, the code length is also reduced, e.g. to about 20.6% of the Tardos code asymptotically. Finally, we address some other practical issues such as approximation errors, which are inevitable in any real implementation.
Submitted 15 January, 2008; v1 submitted 6 October, 2006;
originally announced October 2006.
-
A Secure Traitor Tracing Scheme against Key Exposure
Authors:
Kazuto Ogawa,
Goichiro Hanaoka,
Hideki Imai
Abstract:
Copyright protection is a major issue in distributing digital content. On the other hand, improvements to usability are sought by content users. In this paper, we propose a secure traitor tracing scheme against key exposure (TTaKE) which contains the properties of both a traitor tracing scheme and a forward secure public key cryptosystem. Its structure fits current digital broadcasting systems and it may be useful in preventing traitors from making illegal decoders and in minimizing the damage from accidental key exposure. It can improve usability through these properties.
Submitted 2 August, 2005;
originally announced August 2005.
-
Commitment Capacity of Discrete Memoryless Channels
Authors:
Andreas Winter,
Anderson C. A. Nascimento,
Hideki Imai
Abstract:
In extension of the bit commitment task and following work initiated by Crepeau and Kilian, we introduce and solve the problem of characterising the optimal rate at which a discrete memoryless channel can be used for bit commitment. It turns out that the answer is very intuitive: it is the maximum equivocation of the channel (after removing trivial redundancy), even when unlimited noiseless bidirectional side communication is allowed.
By a well-known reduction, this result provides a lower bound on the channel's capacity for implementing coin tossing, which we conjecture to be an equality.
The method of proving this relates the problem to Wyner's wire-tap channel in an amusing way. We also discuss extensions to quantum channels.
Submitted 10 April, 2003;
originally announced April 2003.
-
Pretty-Simple Password-Authenticated Key-Exchange Protocol
Authors:
Kazukuni Kobara,
Hideki Imai
Abstract:
We propose a pretty simple password-authenticated key-exchange protocol which is based on the difficulty of solving the DDH problem. It has the following advantages: (1) Both $y_1$ and $y_2$ in our protocol are independent and thus they can be pre-computed and can be sent independently. This speeds up the protocol. (2) Clients and servers can use almost the same algorithm. This reduces the implementation costs without accepting replay attacks and abuse of entities as oracles.
Submitted 10 October, 2001;
originally announced October 2001.
-
Two-way Quantum One-counter Automata
Authors:
Tomohiro Yamasaki,
Hirotada Kobayashi,
Hiroshi Imai
Abstract:
After the first treatments of quantum finite state automata by Moore and Crutchfield and by Kondacs and Watrous, a number of papers study the power of quantum finite state automata and their variants. This paper introduces a model of two-way quantum one-counter automata (2Q1CAs), combining the model of two-way quantum finite state automata (2QFAs) by Kondacs and Watrous and the model of one-way quantum one-counter automata (1Q1CAs) by Kravtsev. We give the definition of 2Q1CAs with well-formedness conditions. It is proved that 2Q1CAs are at least as powerful as classical two-way deterministic one-counter automata (2D1CAs), that is, every language L recognizable by 2D1CAs is recognized by 2Q1CAs with no error. It is also shown that several non-context-free languages including {a^n b^{n^2}} and {a^n b^{2^n}} are recognizable by 2Q1CAs with bounded error.
Submitted 2 October, 2001;
originally announced October 2001.
-
More Robust Multiparty Protocols with Oblivious Transfer
Authors:
J. Mueller-Quade,
H. Imai
Abstract:
With oblivious transfer, multiparty protocols become possible even in the presence of a faulty majority. But all known protocols can be aborted by just one disruptor.
This paper presents more robust solutions for multiparty protocols with oblivious transfer. This additional robustness against disruptors weakens the security of the protocol and the guarantee that the result is correct. We can observe a trade-off between robustness against disruption on the one hand and security and correctness on the other.
We give an application to quantum multiparty protocols. These allow the implementation of oblivious transfer and the protocols of this paper relative to temporary assumptions, i.e., the security increases after the termination of the protocol.
Submitted 22 June, 2001; v1 submitted 22 January, 2001;
originally announced January 2001.
-
Anonymous Oblivious Transfer
Authors:
J. Mueller-Quade,
H. Imai
Abstract:
In this short note we introduce anonymous oblivious transfer, a new cryptographic primitive which can be proven to be strictly more powerful than oblivious transfer. We show that all functions can be robustly realized by multiparty protocols with anonymous oblivious transfer. No assumptions about possible collusions of cheaters or disruptors have to be made. Furthermore, we briefly discuss how to realize anonymous oblivious transfer with oblivious broadcast or by quantum cryptography. The protocol of anonymous oblivious transfer was inspired by a quantum protocol: the anonymous quantum channel.
Submitted 3 December, 2000; v1 submitted 4 November, 2000;
originally announced November 2000.