doi 10.23919/IPEC-Himeji2022-ECCE53331.2022.9806976

Pixel Relationships-based Regularizer for Retinal Vessel Image Segmentation

Abstract: The task of image segmentation is to classify each pixel in the image based on the appropriate label. Various deep learning approaches have been proposed for image segmentation that offer high accuracy and deep architecture. However, the deep learning technique uses a pixel-wise loss function for the training process. Using pixel-wise loss neglects the pixel neighbor relationships in the network learning process. The neighboring relationship of the pixels is essential information in the image. Utilizing neighboring pixel information provides an advantage over using only pixel-to-pixel information. This study presents regularizers to give the pixel neighbor relationship information to the learning process. The regularizers are constructed by the graph theory approach and topology approach: By graph theory approach, graph Laplacian is used to utilize the smoothness of segmented images based on output images and ground-truth images. By topology approach, Euler characteristic is used to identify and minimize the number of isolated objects on segmented images. Experiments show that our scheme successfully captures pixel neighbor relations and improves the performance of the convolutional neural network better than the baseline without a regularization term.

Submitted 28 December, 2022; originally announced December 2022.

arXiv:2212.13730 [pdf, other]

doi 10.1007/978-3-030-92307-5_61

Single-Image Super-Resolution Reconstruction based on the Differences of Neighboring Pixels

Authors: Huipeng Zheng, Lukman Hakim, Takio Kurita, Junichi Miyao

Abstract: The deep learning technique was used to increase the performance of single image super-resolution (SISR). However, most existing CNN-based SISR approaches primarily focus on establishing deeper or larger networks to extract more significant high-level features. Usually, the pixel-level loss between the target high-resolution image and the estimated image is used, but the neighbor relations between pixels in the image are seldom used. On the other hand, according to observations, a pixel's neighbor relationship contains rich information about the spatial structure, local context, and structural knowledge. Based on this fact, in this paper, we utilize pixel's neighbor relationships in a different perspective, and we propose the differences of neighboring pixels to regularize the CNN by constructing a graph from the estimated image and the ground-truth image. The proposed method outperforms the state-of-the-art methods in terms of quantitative and qualitative evaluation of the benchmark datasets. Keywords: Super-resolution, Convolutional Neural Networks, Deep Learning

Submitted 28 December, 2022; originally announced December 2022.

arXiv:2209.13106 [pdf, other]

Simultaneous Acquisition of High Quality RGB Image and Polarization Information using a Sparse Polarization Sensor

Authors: Teppei Kurita, Yuhi Kondo, Legong Sun, Yusuke Moriuchi

Abstract: This paper proposes a novel polarization sensor structure and network architecture to obtain a high-quality RGB image and polarization information. Conventional polarization sensors can simultaneously acquire RGB images and polarization information, but the polarizers on the sensor degrade the quality of the RGB images. There is a trade-off between the quality of the RGB image and polarization information as fewer polarization pixels reduce the degradation of the RGB image but decrease the resolution of polarization information. Therefore, we propose an approach that resolves the trade-off by sparsely arranging polarization pixels on the sensor and compensating for low-resolution polarization information with higher resolution using the RGB image as a guide. Our proposed network architecture consists of an RGB image refinement network and a polarization information compensation network. We confirmed the superiority of our proposed network in compensating the differential component of polarization intensity by comparing its performance with state-of-the-art methods for similar tasks: depth completion. Furthermore, we confirmed that our approach could simultaneously acquire higher quality RGB images and polarization information than conventional polarization sensors, resolving the trade-off between the quality of RGB images and polarization information. The baseline code and newly generated real and synthetic large-scale polarization image datasets are available for further research and development.

Submitted 26 September, 2022; originally announced September 2022.

Comments: Accepted to IEEE Winter Conference on Applications of Computer Vision (WACV) 2023

arXiv:2203.04606 [pdf, other]

Attention-effective multiple instance learning on weakly stem cell colony segmentation

Authors: Novanto Yudistira, Muthu Subash Kavitha, Jeny Rajan, Takio Kurita

Abstract: The detection of induced pluripotent stem cell (iPSC) colonies often needs the precise extraction of the colony features. However, existing computerized systems that relied on segmentation of contours by preprocessing for classifying the colony conditions were task-extensive. To maximize the efficiency in categorizing colony conditions, we propose a multiple instance learning (MIL) in weakly supervised settings. It is designed in a single model to produce weak segmentation and classification of colonies without using finely labeled samples. As a single model, we employ a U-net-like convolution neural network (CNN) to train on binary image-level labels for MIL colonies classification. Furthermore, to specify the object of interest we used a simple post-processing method. The proposed approach is compared over conventional methods using five-fold cross-validation and receiver operating characteristic (ROC) curve. The maximum accuracy of the MIL-net is 95%, which is 15% higher than the conventional methods. Furthermore, the ability to interpret the location of the iPSC colonies based on the image level label without using a pixel-wise ground truth image is more appealing and cost-effective in colony condition recognition.

Submitted 9 March, 2022; originally announced March 2022.

arXiv:2012.09542 [pdf, other]

doi 10.1007/s11263-022-01649-x

Weakly-Supervised Action Localization and Action Recognition using Global-Local Attention of 3D CNN

Authors: Novanto Yudistira, Muthu Subash Kavitha, Takio Kurita

Abstract: 3D Convolutional Neural Network (3D CNN) captures spatial and temporal information on 3D data such as video sequences. However, due to the convolution and pooling mechanism, the information loss seems unavoidable. To improve the visual explanations and classification in 3D CNN, we propose two approaches; i) aggregate layer-wise global to local (global-local) discrete gradients using trained 3DResNext network, and ii) implement attention gating network to improve the accuracy of the action recognition. The proposed approach intends to show the usefulness of every layer termed as global-local attention in 3D CNN via visual attribution, weakly-supervised action localization, and action recognition. Firstly, the 3DResNext is trained and applied for action classification using backpropagation concerning the maximum predicted class. The gradients and activations of every layer are then up-sampled. Later, aggregation is used to produce more nuanced attention, which points out the most critical part of the predicted class's input videos. We use contour thresholding of final attention for final localization. We evaluate spatial and temporal action localization in trimmed videos using fine-grained visual explanation via 3DCam. Experimental results show that the proposed approach produces informative visual explanations and discriminative attention. Furthermore, the action recognition via attention gating on each layer produces better classification results than the baseline model.

Submitted 16 August, 2022; v1 submitted 17 December, 2020; originally announced December 2020.

Journal ref: International Journal of Computer Vision, 2022

arXiv:2012.00999 [pdf, other]

q-SNE: Visualizing Data using q-Gaussian Distributed Stochastic Neighbor Embedding

Authors: Motoshi Abe, Junichi Miyao, Takio Kurita

Abstract: The dimensionality reduction has been widely introduced to use the high-dimensional data for regression, classification, feature analysis, and visualization. As the one technique of dimensionality reduction, a stochastic neighbor embedding (SNE) was introduced. The SNE leads powerful results to visualize high-dimensional data by considering the similarity between the local Gaussian distributions of high and low-dimensional space. To improve the SNE, a t-distributed stochastic neighbor embedding (t-SNE) was also introduced. To visualize high-dimensional data, the t-SNE leads to more powerful and flexible visualization on 2 or 3-dimensional mapping than the SNE by using a t-distribution as the distribution of low-dimensional data. Recently, Uniform manifold approximation and projection (UMAP) is proposed as a dimensionality reduction technique. We present a novel technique called a q-Gaussian distributed stochastic neighbor embedding (q-SNE). The q-SNE leads to more powerful and flexible visualization on 2 or 3-dimensional mapping than the t-SNE and the SNE by using a q-Gaussian distribution as the distribution of low-dimensional data. The q-Gaussian distribution includes the Gaussian distribution and the t-distribution as the special cases with q=1.0 and q=2.0. Therefore, the q-SNE can also express the t-SNE and the SNE by changing the parameter q, and this makes it possible to find the best visualization by choosing the parameter q.
We show the performance of q-SNE as visualization on 2-dimensional mapping and classification by k-Nearest Neighbors (k-NN) classifier in embedded space compared with SNE, t-SNE, and UMAP by using the datasets MNIST, COIL-20, OlivettiFaces, FashionMNIST, and Glove.

Submitted 2 December, 2020; originally announced December 2020.

Comments: Accepted to ICPR 2020. Python code is available at https://github.com/i13abe/q-SNE
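
The special cases named in the q-SNE abstract above can be checked with a minimal Python sketch. It assumes the standard unnormalized q-exponential kernel; the function name `q_gaussian_kernel` is hypothetical, and the paper's exact parameterization and normalization may differ.

```python
import math


def q_gaussian_kernel(sq_dist: float, q: float) -> float:
    """Unnormalized q-Gaussian similarity for a squared distance.

    Assumed form (not taken from the paper): [1 + (q - 1) * d^2] ** (-1 / (q - 1)).
    """
    if abs(q - 1.0) < 1e-12:
        # Limit q -> 1: the ordinary Gaussian kernel used by SNE.
        return math.exp(-sq_dist)
    base = 1.0 + (q - 1.0) * sq_dist
    if base <= 0.0:
        # For q < 1 the kernel has compact support and is zero outside it.
        return 0.0
    return base ** (-1.0 / (q - 1.0))


# q = 2.0 gives 1 / (1 + d^2), the Student-t kernel used by t-SNE.
t_sne_like = q_gaussian_kernel(1.0, 2.0)
# q = 1.0 gives exp(-d^2), the Gaussian kernel used by SNE.
sne_like = q_gaussian_kernel(1.0, 1.0)
```

Under this assumed form, values of q between 1.0 and 2.0 interpolate between the Gaussian and the heavier-tailed Student-t kernel.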

arXiv:2011.02390 [pdf, other]

Channel Planting for Deep Neural Networks using Knowledge Distillation

Authors: Kakeru Mitsuno, Yuichiro Nomura, Takio Kurita

Abstract: In recent years, deeper and wider neural networks have shown excellent performance in computer vision tasks, while their enormous amount of parameters results in increased computational cost and overfitting. Several methods have been proposed to compress the size of the networks without reducing network performance. Network pruning can reduce redundant and unnecessary parameters from a network. Knowledge distillation can transfer the knowledge of deeper and wider networks to smaller networks. The performance of the smaller network obtained by these methods is bounded by the predefined network. Neural architecture search has been proposed, which can search automatically the architecture of the networks to break the structure limitation. Also, there is a dynamic configuration method to train networks incrementally as sub-networks. In this paper, we present a novel incremental training algorithm for deep neural networks called planting. Our planting can search the optimal network architecture with smaller number of parameters for improving the network performance by augmenting channels incrementally to layers of the initial networks while keeping the earlier trained parameters fixed. Also, we propose using the knowledge distillation method for training the channels planted. By transferring the knowledge of deeper and wider networks, we can grow the networks effectively and efficiently. We evaluate the effectiveness of the proposed method on different datasets such as CIFAR-10/100 and STL-10.
For the STL-10 dataset, we show that we are able to achieve comparable performance with only 7% parameters compared to the larger network and reduce the overfitting caused by a small amount of the data.

Submitted 4 November, 2020; originally announced November 2020.

Comments: Accepted to ICPR 2020

arXiv:2011.02389 [pdf, other]

Filter Pruning using Hierarchical Group Sparse Regularization for Deep Convolutional Neural Networks

Authors: Kakeru Mitsuno, Takio Kurita

Abstract: Since the convolutional neural networks are often trained with redundant parameters, it is possible to reduce redundant kernels or filters to obtain a compact network without dropping the classification accuracy. In this paper, we propose a filter pruning method using the hierarchical group sparse regularization. It is shown in our previous work that the hierarchical group sparse regularization is effective in obtaining sparse networks in which filters connected to unnecessary channels are automatically close to zero. After training the convolutional neural network with the hierarchical group sparse regularization, the unnecessary filters are selected based on the increase of the classification loss of the randomly selected training samples to obtain a compact network. It is shown that the proposed method can reduce more than 50% parameters of ResNet for CIFAR-10 with only 0.3% decrease in the accuracy of test samples. Also, 34% parameters of ResNet are reduced for TinyImageNet-200 with higher accuracy than the baseline network.

Submitted 4 November, 2020; originally announced November 2020.

Comments: Accepted to ICPR 2020

arXiv:2009.11587 [pdf, other]

Transfer Learning by Cascaded Network to identify and classify lung nodules for cancer detection

Authors: Shah B. Shrey, Lukman Hakim, Muthusubash Kavitha, Hae Won Kim, Takio Kurita

Abstract: Lung cancer is one of the most deadly diseases in the world. Detecting such tumors at an early stage can be a tedious task. Existing deep learning architecture for lung nodule identification used complex architecture with large number of parameters. This study developed a cascaded architecture which can accurately segment and classify the benign or malignant lung nodules on computed tomography (CT) images. The main contribution of this study is to introduce a segmentation network where the first stage trained on a public data set can help to recognize the images which included a nodule from any data set by means of transfer learning. And the segmentation of a nodule improves the second stage to classify the nodules into benign and malignant. The proposed architecture outperformed the conventional methods with an area under curve value of 95.67%. The experimental results showed that the classification accuracy of 97.96% of our proposed architecture outperformed other simple and complex architectures in classifying lung nodules for lung cancer detection.

Submitted 24 September, 2020; originally announced September 2020.

arXiv:2009.07567 [pdf, other]

U-Net with Graph Based Smoothing Regularizer for Small Vessel Segmentation on Fundus Image

Authors: Lukman Hakim, Novanto Yudistira, Muthusubash Kavitha, Takio Kurita

Abstract: The detection of retinal blood vessels, especially the changes of small vessel condition is the most important indicator to identify the vascular network of the human body. Existing techniques focused mainly on shape of the large vessels, which is not appropriate for the disconnected small and isolated vessels. Paying attention to the low contrast small blood vessel in fundus region, for the first time we proposed to combine graph based smoothing regularizer with the loss function in the U-net framework. The proposed regularizer treated the image as two graphs by calculating the graph Laplacians on vessel regions and the background regions on the image. The potential of the proposed graph based smoothing regularizer in reconstructing small vessel is compared over the classical U-net with or without regularizer. Numerical and visual results show that our developed regularizer proved its effectiveness in segmenting the small vessels and reconnecting the fragmented retinal blood vessels.

Submitted 16 September, 2020; originally announced September 2020.

Journal ref: ICONIP2019
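
The two-graph smoothing idea in the abstract above can be illustrated with a minimal Python sketch. It assumes unit edge weights on a 4-neighbor pixel grid and uses the identity that the Laplacian quadratic form equals the sum of squared differences over graph edges; the function name `laplacian_smoothness` and the choice of edges are assumptions for illustration, not the authors' implementation.

```python
def laplacian_smoothness(pred, mask, label):
    """Sum of (p_i - p_j)^2 over 4-neighbor pixel pairs whose
    ground-truth labels both equal `label`.

    With unit edge weights this equals x^T L x for the grid graph
    restricted to that region (an assumed construction).
    """
    height, width = len(pred), len(pred[0])
    total = 0.0
    for r in range(height):
        for c in range(width):
            # Visit only the right and down neighbors so each edge counts once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                inside = rr < height and cc < width
                if inside and mask[r][c] == label and mask[rr][cc] == label:
                    total += (pred[r][c] - pred[rr][cc]) ** 2
    return total


# Toy 2x2 example: the top row is vessel (1), the bottom row is background (0).
pred = [[1.0, 0.5], [0.0, 0.0]]
mask = [[1, 1], [0, 0]]
vessel_term = laplacian_smoothness(pred, mask, 1)
background_term = laplacian_smoothness(pred, mask, 0)
```

In training, terms like these would be added to the pixel-wise loss with a weighting coefficient, which is left out of this sketch.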

arXiv:2004.08116 [pdf, other]

Triplet Loss for Knowledge Distillation

Authors: Hideki Oki, Motoshi Abe, Junichi Miyao, Takio Kurita

Abstract: In recent years, deep learning has spread rapidly, and deeper, larger models have been proposed. However, the calculation cost becomes enormous as the size of the models becomes larger. Various techniques for compressing the size of the models have been proposed to improve performance while reducing computational costs. One of the methods to compress the size of the models is knowledge distillation (KD). Knowledge distillation is a technique for transferring knowledge of deep or ensemble models with many parameters (teacher model) to smaller shallow models (student model). Since the purpose of knowledge distillation is to increase the similarity between the teacher model and the student model, we propose to introduce the concept of metric learning into knowledge distillation to make the student model closer to the teacher model using pairs or triplets of the training samples. In metric learning, the researchers are developing the methods to build a model that can increase the similarity of outputs for similar samples. Metric learning aims at reducing the distance between similar and increasing the distance between dissimilar. The functionality of the metric learning to reduce the differences between similar outputs can be used for the knowledge distillation to reduce the differences between the outputs of the teacher model and the student model. Since the outputs of the teacher model for different objects are usually different, the student model needs to distinguish them.
We think that metric learning can clarify the difference between the different outputs, and the performance of the student model could be improved. We have performed experiments to compare the proposed method with state-of-the-art knowledge distillation methods.

Submitted 17 April, 2020; originally announced April 2020.

Comments: Accepted to IJCNN 2020, Source code is at https://github.com/i13abe/Triplet-Loss-for-Knowledge-Distillation
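
The metric-learning idea in the abstract above can be sketched in Python as a generic triplet loss. The mapping used here (student output as anchor, teacher output for the same sample as positive, an output for a different sample as negative), the squared Euclidean distance, and the names `squared_distance` and `triplet_loss` are all assumptions for illustration; the linked repository holds the authors' actual formulation.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: pull the anchor toward the positive and
    push it at least `margin` farther from the negative."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


# Hypothetical outputs: the student is already closer to the matching
# teacher output than to the output for a different sample.
student_out = [0.0, 0.0]
teacher_same = [3.0, 4.0]    # squared distance 25 from the student output
teacher_other = [6.0, 8.0]   # squared distance 100 from the student output
satisfied = triplet_loss(student_out, teacher_same, teacher_other)
violated = triplet_loss(student_out, teacher_other, teacher_same)
```

The loss is zero once the margin is satisfied, so only triplets that still violate it contribute to the student's update.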

arXiv:2004.08074 [pdf, other]

Adaptive Neuron-wise Discriminant Criterion and Adaptive Center Loss at Hidden Layer for Deep Convolutional Neural Network

Authors: Motoshi Abe, Junichi Miyao, Takio Kurita

Abstract: A deep convolutional neural network (CNN) has been widely used in image classification and gives better classification accuracy than the other techniques. The softmax cross-entropy loss function is often used for classification tasks. There are some works to introduce the additional terms in the objective function for training to make the features of the output layer more discriminative. The neuron-wise discriminant criterion makes the input feature of each neuron in the output layer discriminative by introducing the discriminant criterion to each of the features. Similarly, the center loss was introduced to the features before the softmax activation function for face recognition to make the deep features discriminative. The ReLU function is often used for the network as an activation function in the hidden layers of the CNN. However, it is observed that the deep features trained by using the ReLU function are not discriminative enough and show elongated shapes. In this paper, we propose to use the neuron-wise discriminant criterion at the output layer and the center-loss at the hidden layer. Also, we introduce the online computation of the means of each class with the exponential forgetting. We named them adaptive neuron-wise discriminant criterion and adaptive center loss, respectively. The effectiveness of the integration of the adaptive neuron-wise discriminant criterion and the adaptive center loss is shown by the experiments with MNIST, FashionMNIST, CIFAR10, CIFAR100, and STL10.
Source code is at https://github.com/i13abe/Adaptive-discriminant-and-center

Submitted 17 April, 2020; originally announced April 2020.

Comments: Accepted to IJCNN 2020

arXiv:2004.04394 [pdf, other]

Hierarchical Group Sparse Regularization for Deep Convolutional Neural Networks

Authors: Kakeru Mitsuno, Junichi Miyao, Takio Kurita

Abstract: In a deep neural network (DNN), the number of the parameters is usually huge to get high learning performances. For that reason, it costs a lot of memory and substantial computational resources, and also causes overfitting. It is known that some parameters are redundant and can be removed from the network without decreasing performance. Many sparse regularization criteria have been proposed to solve this problem. In a convolutional neural network (CNN), group sparse regularizations are often used to remove unnecessary subsets of the weights, such as filters or channels. When we apply a group sparse regularization for the weights connected to a neuron as a group, each convolution filter is not treated as a target group in the regularization. In this paper, we introduce the concept of hierarchical grouping to solve this problem, and we propose several hierarchical group sparse regularization criteria for CNNs. Our proposed hierarchical group sparse regularization can treat the weight for the input-neuron or the output-neuron as a group and convolutional filter as a group in the same group to prune the unnecessary subsets of weights. As a result, we can prune the weights more adequately depending on the structure of the network and the number of channels keeping high performance. In the experiment, we investigate the effectiveness of the proposed sparse regularizations through intensive comparison experiments on public datasets with several network architectures.
Code is available on GitHub: "https://github.com/K-Mitsuno/hierarchical-group-sparse-regularization"

Submitted 9 April, 2020; originally announced April 2020.

Comments: Accepted to IJCNN 2020
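
For orientation, the plain (non-hierarchical) group sparse penalty that the abstract above builds on can be sketched in Python as a sum of per-group L2 norms. This is the standard group lasso only; the hierarchical criteria proposed in the paper are not reproduced here, and the function name `group_lasso_penalty` is hypothetical.

```python
import math


def group_lasso_penalty(groups):
    """Sum of the L2 norms of each weight group (standard group lasso).

    Each group is a flat list of weights, e.g. one convolution filter.
    Because the norm is not squared, whole groups are driven to zero.
    """
    return sum(math.sqrt(sum(w * w for w in group)) for group in groups)


# Two toy filters: one active, one already pruned to zero.
filters = [[3.0, 4.0], [0.0, 0.0]]
penalty = group_lasso_penalty(filters)
```

A zeroed group contributes nothing to the penalty, which is why filters or channels that reach zero can be removed from the network afterwards.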

arXiv:2002.08005 [pdf, other]

On-line non-overlapping camera calibration net

Authors: Zhao Fangda, Toru Tamaki, Takio Kurita, Bisser Raytchev, Kazufumi Kaneda

Abstract: We propose an easy-to-use non-overlapping camera calibration method. First, successive images are fed to a PoseNet-based network to obtain ego-motion of cameras between frames. Next, the pose between cameras is estimated. Instead of using a batch method, we propose an on-line method of the inter-camera pose estimation. Furthermore, we implement the entire procedure on a computation graph. Experiments with simulations and the KITTI dataset show the proposed method to be effective in simulation.

Submitted 18 February, 2020; originally announced February 2020.

Comments: 7 pages

Journal ref: in Proc. of MIRU2018

arXiv:1906.09739 [pdf, other]

Mixup of Feature Maps in a Hidden Layer for Training of Convolutional Neural Network

Authors: Hideki Oki, Takio Kurita

Abstract: The deep Convolutional Neural Network (CNN) became very popular as a fundamental technique for image classification and objects recognition. To improve the recognition accuracy for the more complex tasks, deeper networks have been introduced. However, the recognition accuracy of the trained deep CNN drastically decreases for the samples which are obtained from the outside regions of the training samples. To improve the generalization ability for such samples, Krizhevsky et al. proposed to generate additional samples through transformations from the existing samples and to make the training samples richer. This method is known as data augmentation. Hongyi Zhang et al. introduced data augmentation method called mixup which achieves state-of-the-art performance in various datasets. Mixup generates new samples by mixing two different training samples. Mixing of the two images is implemented with simple image morphing. In this paper, we propose to apply mixup to the feature maps in a hidden layer. To implement the mixup in the hidden layer we use the Siamese network or the triplet network architecture to mix feature maps. From the experimental comparison, it is observed that the mixup of the feature maps obtained from the first convolution layer is more effective than the original image mixup.

Submitted 24 June, 2019; originally announced June 2019.

Comments: 11 pages, 5 figures

Journal ref: Neural Information Processing 25th International Conference (ICONIP2018) Proceedings Part II
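
The mixing step described in the abstract above can be sketched in Python as a convex combination of two feature vectors and their labels. Feature maps are flattened to plain lists here, and the Beta(alpha, alpha) sampling of the mixing coefficient follows the original mixup formulation; whether the paper uses the same sampling for hidden-layer mixing is an assumption, and the names `mix` and `mixup_features` are hypothetical.

```python
import random


def mix(a, b, lam):
    """Convex combination lam * a + (1 - lam) * b of two equal-length vectors."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def mixup_features(feat_a, feat_b, label_a, label_b, alpha=1.0, rng=random):
    """Mix two (flattened) hidden-layer feature maps and their one-hot
    labels with a single coefficient drawn from Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    return mix(feat_a, feat_b, lam), mix(label_a, label_b, lam), lam


# Deterministic check of the mixing rule with a fixed coefficient.
mixed = mix([1.0, 0.0], [0.0, 1.0], 0.25)

# Randomized use: features and labels share the same coefficient.
features, labels, lam = mixup_features([2.0, 4.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0])
```

Applying the same coefficient to features and labels keeps the mixed target consistent with the mixed representation.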

arXiv:1807.08291 [pdf, other]

doi 10.1016/j.image.2019.115731

Correlation Net: Spatiotemporal multimodal deep learning for action recognition

Authors: Novanto Yudistira, Takio Kurita

Abstract: This paper describes a network that captures multimodal correlations over arbitrary timestamps. The proposed scheme operates as a complementary, extended network over a multimodal convolutional neural network (CNN). Spatial and temporal streams are required for action recognition by a deep CNN, but overfitting reduction and fusing these two streams remain open problems. The existing fusion approach averages the two streams. Here we propose a correlation network with a Shannon fusion for learning a pre-trained CNN. A long-range video may consist of spatiotemporal correlations over arbitrary times, which can be captured by forming the correlation network from simple fully connected layers. This approach was found to complement the existing network fusion methods. The importance of multimodal correlation is validated in comparison experiments on the UCF-101 and HMDB-51 datasets. The multimodal correlation enhanced the accuracy of the video recognition results.

Submitted 16 December, 2019; v1 submitted 22 July, 2018; originally announced July 2018.

Journal ref: Signal Processing: Image Communication, Volume 82, March 2020, 115731

arXiv:1707.05425 [pdf]

Fast and Accurate Image Super Resolution by Deep CNN with Skip Connection and Network in Network

Authors: ** Yamanaka, Shigesumi Kuwashima, Takio Kurita

Abstract: We propose a highly efficient and faster Single Image Super-Resolution (SISR) model with deep convolutional neural networks (deep CNNs). Deep CNNs have recently shown significant reconstruction performance on single-image super-resolution. The current trend is to use deeper CNN layers to improve performance. However, deep models demand larger computation resources and are not suitable for network edge devices such as mobile, tablet, and IoT devices. Our model achieves state-of-the-art reconstruction performance with at least 10 times lower computation cost using a Deep CNN with Residual Net, Skip Connection and Network in Network (DCSCN). A combination of deep CNNs and skip connection layers is used as a feature extractor for image features in both local and global areas. Parallelized 1x1 CNNs, like the one called Network in Network, are also used for image reconstruction. That structure reduces the dimensions of the previous layer's output for faster computation with less information loss, and makes it possible to process original images directly. We also optimize the number of layers and filters of each CNN to significantly reduce the computation cost. Thus, the proposed algorithm not only achieves state-of-the-art performance but also achieves faster and more efficient computation. Code is available at https://github.com/**y2001/dcscn-super-resolution

Submitted 8 September, 2020; v1 submitted 17 July, 2017; originally announced July 2017.

Comments: 9 pages, 4 figures. This paper was accepted at the 24th International Conference on Neural Information Processing (ICONIP 2017)

Journal ref: 24th International Conference of Neural Information Processing, ICONIP 2017, Proceedings, Part II (pp.217-225)

arXiv:1703.09393 [pdf, ps, other]

Mixture of Counting CNNs: Adaptive Integration of CNNs Specialized to Specific Appearance for Crowd Counting

Authors: Shohei Kumagai, Kazuhiro Hotta, Takio Kurita

Abstract: This paper proposes a crowd counting method. Crowd counting is difficult because of large changes in the appearance of a target, which are caused by changes in density and scale. Conventional crowd counting methods generally use one predictor (e.g., regression or a multi-class classifier). However, a single predictor cannot count targets with large appearance changes well. In this paper, we propose to predict the number of targets using multiple CNNs, each specialized to a specific appearance, and those CNNs are adaptively selected according to the appearance of a test image. By integrating the selected CNNs, the proposed method is robust to large appearance changes. In experiments, we confirm that the proposed method can count crowds with lower counting error than a single CNN and than an integration of CNNs with fixed weights. Moreover, we confirm that each predictor is automatically specialized to a specific appearance.

Submitted 27 March, 2017; originally announced March 2017.

Comments: 8 pages, 8 figures

arXiv:1611.02443 [pdf, other]

Domain Adaptation with L2 constraints for classifying images from different endoscope systems

Authors: Toru Tamaki, Shoji Sonoyama, Takio Kurita, Tsubasa Hirakawa, Bisser Raytchev, Kazufumi Kaneda, Tetsushi Koide, Shigeto Yoshida, Hiroshi Mieno, Shinji Tanaka, Kazuaki Chayama

Abstract: This paper proposes a method for domain adaptation that extends the maximum margin domain transfer (MMDT) proposed by Hoffman et al. by introducing L2 distance constraints between samples of different domains; thus, our method is denoted as MMDTL2. Motivated by the differences between the images taken by narrow band imaging (NBI) endoscopic devices, we treat different NBI devices as different domains and estimate the transformations between samples of different domains, i.e., image samples taken by different NBI endoscope systems. We first formulate the problem in the primal form, and then derive the dual form, which has a much lower computational cost than the naive approach. From our experimental results using NBI image datasets from two different NBI endoscopic devices, we find that MMDTL2 outperforms both MMDT and support vector machines without adaptation, especially when NBI image features are high-dimensional and there are more than 20 training samples per class.

Submitted 2 February, 2018; v1 submitted 8 November, 2016; originally announced November 2016.

Comments: 15 pages

Showing 1–19 of 19 results for author: Kurita, T