Search | arXiv e-print repository

doi 10.1145/3626641.3626930

Hybrid of DiffStride and Spectral Pooling in Convolutional Neural Networks

Authors: Sulthan Rafif, Mochamad Arfan Ravy Wahyu Pratama, Mohammad Faris Azhar, Ahmad Mustafidul Ibad, Lailil Muflikhah, Novanto Yudistira

Abstract: Stride determines the distance between adjacent filter positions as the filter moves across the input. A fixed stride causes important information contained in the image can not be captured, so that important information is not classified. Therefore, in previous research, the DiffStride Method was applied, namely the Strided Convolution Method with which it can learn its own stride value. Severe Q… ▽ More Stride determines the distance between adjacent filter positions as the filter moves across the input. A fixed stride causes important information contained in the image can not be captured, so that important information is not classified. Therefore, in previous research, the DiffStride Method was applied, namely the Strided Convolution Method with which it can learn its own stride value. Severe Quantization and a constraining lower bound on preserved information are arises with Max Pooling Downsampling Method. Spectral Pooling reduce the constraint lower bound on preserved information by cutting off the representation in the frequency domain. In this research a CNN Model is proposed with the Downsampling Learnable Stride Technique performed by Backpropagation combined with the Spectral Pooling Technique. Diffstride and Spectral Pooling techniques are expected to maintain most of the information contained in the image. In this study, we compare the Hybrid Method, which is a combined implementation of Spectral Pooling and DiffStride against the Baseline Method, which is the DiffStride implementation on ResNet 18. The accuracy result of the DiffStride combination with Spectral Pooling improves over DiffStride which is baseline method by 0.0094. This shows that the Hybrid Method can maintain most of the information by cutting of the representation in the frequency domain and determine the stride of the learning result through Backpropagation. △ Less

Submitted 17 January, 2024; originally announced January 2024.

Journal ref: CSIAM Transactions on Applied Mathematics; R. Riad et al, "Learning strides in convolutional neural networks," pp. 1-16, 2022. [Online];

arXiv:2401.03198 [pdf, other]

doi 10.1145/3626641.3627239

Learning-Augmented K-Means Clustering Using Dimensional Reduction

Authors: Issam K. O Jabari, Shofiyah, Pradiptya Kahvi S, Novi Nur Putriwijaya, Novanto Yudistira

Abstract: Learning augmented is a machine learning concept built to improve the performance of a method or model, such as enhancing its ability to predict and generalize data or features, or testing the reliability of the method by introducing noise and other factors. On the other hand, clustering is a fundamental aspect of data analysis and has long been used to understand the structure of large datasets.… ▽ More Learning augmented is a machine learning concept built to improve the performance of a method or model, such as enhancing its ability to predict and generalize data or features, or testing the reliability of the method by introducing noise and other factors. On the other hand, clustering is a fundamental aspect of data analysis and has long been used to understand the structure of large datasets. Despite its long history, the k-means algorithm still faces challenges. One approach, as suggested by Ergun et al,is to use a predictor to minimize the sum of squared distances between each data point and a specified centroid. However, it is known that the computational cost of this algorithm increases with the value of k, and it often gets stuck in local minima. In response to these challenges, we propose a solution to reduce the dimensionality of the dataset using Principal Component Analysis (PCA). It is worth noting that when using k values of 10 and 25, the proposed algorithm yields lower cost results compared to running it without PCA. "Principal component analysis (PCA) is the problem of fitting a low-dimensional affine subspace to a set of data points in a high-dimensional space. PCA is well-established in the literature and has become one of the most useful tools for data modeling, compression, and visualization." △ Less

Submitted 6 January, 2024; originally announced January 2024.

Comments: acmart-LaTeX2e v1.84 17 pages with 12 figures

Journal ref: International Conference on Sustainable Information Engineering and Technology (SIET 2023), October 24--25, 2023, Badung, Bali, Indonesia

arXiv:2401.02744 [pdf]

doi 10.1145/3626641.3626931

MAMI: Multi-Attentional Mutual-Information for Long Sequence Neuron Captioning

Authors: Alfirsa Damasyifa Fauzulhaq, Wahyu Parwitayasa, Joseph Ananda Sugihdharma, M. Fadli Ridhani, Novanto Yudistira

Abstract: Neuron labeling is an approach to visualize the behaviour and respond of a certain neuron to a certain pattern that activates the neuron. Neuron labeling extract information about the features captured by certain neurons in a deep neural network, one of which uses the encoder-decoder image captioning approach. The encoder used can be a pretrained CNN-based model and the decoder is an RNN-based mod… ▽ More Neuron labeling is an approach to visualize the behaviour and respond of a certain neuron to a certain pattern that activates the neuron. Neuron labeling extract information about the features captured by certain neurons in a deep neural network, one of which uses the encoder-decoder image captioning approach. The encoder used can be a pretrained CNN-based model and the decoder is an RNN-based model for text generation. Previous work, namely MILAN (Mutual Information-guided Linguistic Annotation of Neuron), has tried to visualize the neuron behaviour using modified Show, Attend, and Tell (SAT) model in the encoder, and LSTM added with Bahdanau attention in the decoder. MILAN can show great result on short sequence neuron captioning, but it does not show great result on long sequence neuron captioning, so in this work, we would like to improve the performance of MILAN even more by utilizing different kind of attention mechanism and additionally adding several attention result into one, in order to combine all the advantages from several attention mechanism. Using our compound dataset, we obtained higher BLEU and F1-Score on our proposed model, achieving 17.742 and 0.4811 respectively. At some point where the model converges at the peak, our model obtained BLEU of 21.2262 and BERTScore F1-Score of 0.4870. △ Less

Submitted 5 January, 2024; originally announced January 2024.

arXiv:2309.05224 [pdf, other]

doi 10.1016/j.neucom.2024.127433

SparseSwin: Swin Transformer with Sparse Transformer Block

Authors: Krisna Pinasthika, Blessius Sheldo Putra Laksono, Riyandi Banovbi Putera Irsal, Syifa Hukma Shabiyya, Novanto Yudistira

Abstract: Advancements in computer vision research have put transformer architecture as the state of the art in computer vision tasks. One of the known drawbacks of the transformer architecture is the high number of parameters, this can lead to a more complex and inefficient algorithm. This paper aims to reduce the number of parameters and in turn, made the transformer more efficient. We present Sparse Tran… ▽ More Advancements in computer vision research have put transformer architecture as the state of the art in computer vision tasks. One of the known drawbacks of the transformer architecture is the high number of parameters, this can lead to a more complex and inefficient algorithm. This paper aims to reduce the number of parameters and in turn, made the transformer more efficient. We present Sparse Transformer (SparTa) Block, a modified transformer block with an addition of a sparse token converter that reduces the number of tokens used. We use the SparTa Block inside the Swin T architecture (SparseSwin) to leverage Swin capability to downsample its input and reduce the number of initial tokens to be calculated. The proposed SparseSwin model outperforms other state of the art models in image classification with an accuracy of 86.96%, 97.43%, and 85.35% on the ImageNet100, CIFAR10, and CIFAR100 datasets respectively. Despite its fewer parameters, the result highlights the potential of a transformer architecture using a sparse token converter with a limited number of tokens to optimize the use of the transformer and improve its performance. △ Less

Submitted 11 September, 2023; originally announced September 2023.

Journal ref: Neurocomputing, Volume 580, 2024, 127433

arXiv:2308.01604 [pdf, other]

IndoHerb: Indonesia Medicinal Plants Recognition using Transfer Learning and Deep Learning

Authors: Muhammad Salman Ikrar Musyaffa, Novanto Yudistira, Muhammad Arif Rahman, Jati Batoro

Abstract: The rich diversity of herbal plants in Indonesia holds immense potential as alternative resources for traditional healing and ethnobotanical practices. However, the dwindling recognition of herbal plants due to modernization poses a significant challenge in preserving this valuable heritage. The accurate identification of these plants is crucial for the continuity of traditional practices and the… ▽ More The rich diversity of herbal plants in Indonesia holds immense potential as alternative resources for traditional healing and ethnobotanical practices. However, the dwindling recognition of herbal plants due to modernization poses a significant challenge in preserving this valuable heritage. The accurate identification of these plants is crucial for the continuity of traditional practices and the utilization of their nutritional benefits. Nevertheless, the manual identification of herbal plants remains a time-consuming task, demanding expert knowledge and meticulous examination of plant characteristics. In response, the application of computer vision emerges as a promising solution to facilitate the efficient identification of herbal plants. This research addresses the task of classifying Indonesian herbal plants through the implementation of transfer learning of Convolutional Neural Networks (CNN). To support our study, we curated an extensive dataset of herbal plant images from Indonesia with careful manual selection. Subsequently, we conducted rigorous data preprocessing, and classification utilizing transfer learning methodologies with five distinct models: ResNet, DenseNet, VGG, ConvNeXt, and Swin Transformer. Our comprehensive analysis revealed that ConvNeXt achieved the highest accuracy, standing at an impressive 92.5%. Additionally, we conducted testing using a scratch model, resulting in an accuracy of 53.9%. The experimental setup featured essential hyperparameters, including the ExponentialLR scheduler with a gamma value of 0.9, a learning rate of 0.001, the Cross-Entropy Loss function, the Adam optimizer, and a training epoch count of 50. This study's outcomes offer valuable insights and practical implications for the automated identification of Indonesian medicinal plants. △ Less

Submitted 9 June, 2024; v1 submitted 3 August, 2023; originally announced August 2023.

Comments: 34 pages, 27 figures

arXiv:2307.14897 [pdf, other]

Mixture of Self-Supervised Learning

Authors: Aristo Renaldo Ruslim, Novanto Yudistira, Budi Darma Setiawan

Abstract: Self-supervised learning is popular method because of its ability to learn features in images without using its labels and is able to overcome limited labeled datasets used in supervised learning. Self-supervised learning works by using a pretext task which will be trained on the model before being applied to a specific task. There are some examples of pretext tasks used in self-supervised learnin… ▽ More Self-supervised learning is popular method because of its ability to learn features in images without using its labels and is able to overcome limited labeled datasets used in supervised learning. Self-supervised learning works by using a pretext task which will be trained on the model before being applied to a specific task. There are some examples of pretext tasks used in self-supervised learning in the field of image recognition, namely rotation prediction, solving jigsaw puzzles, and predicting relative positions on image. Previous studies have only used one type of transformation as a pretext task. This raises the question of how it affects if more than one pretext task is used and to use a gating network to combine all pretext tasks. Therefore, we propose the Gated Self-Supervised Learning method to improve image classification which use more than one transformation as pretext task and uses the Mixture of Expert architecture as a gating network in combining each pretext task so that the model automatically can study and focus more on the most useful augmentations for classification. We test performance of the proposed method in several scenarios, namely CIFAR imbalance dataset classification, adversarial perturbations, Tiny-Imagenet dataset classification, and semi-supervised learning. Moreover, there are Grad-CAM and T-SNE analysis that are used to see the proposed method for identifying important features that influence image classification and representing data for each class and separating different classes properly. Our code is in https://github.com/aristorenaldo/G-SSL △ Less

Submitted 27 July, 2023; originally announced July 2023.

arXiv:2307.12122 [pdf, other]

Synthesis of Batik Motifs using a Diffusion -- Generative Adversarial Network

Authors: One Octadion, Novanto Yudistira, Diva Kurnianingtyas

Abstract: Batik, a unique blend of art and craftsmanship, is a distinct artistic and technological creation for Indonesian society. Research on batik motifs is primarily focused on classification. However, further studies may extend to the synthesis of batik patterns. Generative Adversarial Networks (GANs) have been an important deep learning model for generating synthetic data, but often face challenges in… ▽ More Batik, a unique blend of art and craftsmanship, is a distinct artistic and technological creation for Indonesian society. Research on batik motifs is primarily focused on classification. However, further studies may extend to the synthesis of batik patterns. Generative Adversarial Networks (GANs) have been an important deep learning model for generating synthetic data, but often face challenges in the stability and consistency of results. This research focuses on the use of StyleGAN2-Ada and Diffusion techniques to produce realistic and high-quality synthetic batik patterns. StyleGAN2-Ada is a variation of the GAN model that separates the style and content aspects in an image, whereas diffusion techniques introduce random noise into the data. In the context of batik, StyleGAN2-Ada and Diffusion are used to produce realistic synthetic batik patterns. This study also made adjustments to the model architecture and used a well-curated batik dataset. The main goal is to assist batik designers or craftsmen in producing unique and quality batik motifs with efficient production time and costs. Based on qualitative and quantitative evaluations, the results show that the model tested is capable of producing authentic and quality batik patterns, with finer details and rich artistic variations. The dataset and code can be accessed here:https://github.com/octadion/diffusion-stylegan2-ada-pytorch △ Less

Submitted 22 July, 2023; originally announced July 2023.

arXiv:2307.11988 [pdf, other]

Sparse then Prune: Toward Efficient Vision Transformers

Authors: Yogi Prasetyo, Novanto Yudistira, Agus Wahyu Widodo

Abstract: The Vision Transformer architecture is a deep learning model inspired by the success of the Transformer model in Natural Language Processing. However, the self-attention mechanism, large number of parameters, and the requirement for a substantial amount of training data still make Vision Transformers computationally burdensome. In this research, we investigate the possibility of applying Sparse Re… ▽ More The Vision Transformer architecture is a deep learning model inspired by the success of the Transformer model in Natural Language Processing. However, the self-attention mechanism, large number of parameters, and the requirement for a substantial amount of training data still make Vision Transformers computationally burdensome. In this research, we investigate the possibility of applying Sparse Regularization to Vision Transformers and the impact of Pruning, either after Sparse Regularization or without it, on the trade-off between performance and efficiency. To accomplish this, we apply Sparse Regularization and Pruning methods to the Vision Transformer architecture for image classification tasks on the CIFAR-10, CIFAR-100, and ImageNet-100 datasets. The training process for the Vision Transformer model consists of two parts: pre-training and fine-tuning. Pre-training utilizes ImageNet21K data, followed by fine-tuning for 20 epochs. The results show that when testing with CIFAR-100 and ImageNet-100 data, models with Sparse Regularization can increase accuracy by 0.12%. Furthermore, applying pruning to models with Sparse Regularization yields even better results. Specifically, it increases the average accuracy by 0.568% on CIFAR-10 data, 1.764% on CIFAR-100, and 0.256% on ImageNet-100 data compared to pruning models without Sparse Regularization. Code can be accesed here: https://github.com/yogiprsty/Sparse-ViT △ Less

Submitted 22 July, 2023; originally announced July 2023.

arXiv:2302.05105 [pdf]

Text recognition on images using pre-trained CNN

Authors: Afgani Fajar Rizky, Novanto Yudistira, Edy Santoso

Abstract: A text on an image often stores important information and directly carries high level semantics, makes it as important source of information and become a very active research topic. Many studies have shown that the use of CNN-based neural networks is quite effective and accurate for image classification which is the basis of text recognition. It can also be more enhanced by using transfer learning… ▽ More A text on an image often stores important information and directly carries high level semantics, makes it as important source of information and become a very active research topic. Many studies have shown that the use of CNN-based neural networks is quite effective and accurate for image classification which is the basis of text recognition. It can also be more enhanced by using transfer learning from pre-trained model trained on ImageNet dataset as an initial weight. In this research, the recognition is trained by using Chars74K dataset and the best model results then tested on some samples of IIIT-5K-Dataset. The research results showed that the best accuracy is the model that trained using VGG-16 architecture applied with image transformation of rotation 15°, image scale of 0.9, and the application of gaussian blur effect. The research model has an accuracy of 97.94% for validation data, 98.16% for test data, and 95.62% for the test data from IIIT-5K-Dataset. Based on these results, it can be concluded that pre-trained CNN can produce good accuracy for text recognition, and the model architecture that used in this study can be used as reference material in the development of text detection systems in the future △ Less

Submitted 10 February, 2023; originally announced February 2023.

arXiv:2301.10616 [pdf, other]

Prediction of COVID-19 by Its Variants using Multivariate Data-driven Deep Learning Models

Authors: Akhmad Dimitri Baihaqi, Novanto Yudistira, Edy Santoso

Abstract: The Coronavirus Disease 2019 or the COVID-19 pandemic has swept almost all parts of the world since the first case was found in Wuhan, China, in December 2019. With the increasing number of COVID-19 cases in the world, SARS-CoV-2 has mutated into various variants. Given the increasingly dangerous conditions of the pandemic, it is crucial to know when the pandemic will stop by predicting confirmed… ▽ More The Coronavirus Disease 2019 or the COVID-19 pandemic has swept almost all parts of the world since the first case was found in Wuhan, China, in December 2019. With the increasing number of COVID-19 cases in the world, SARS-CoV-2 has mutated into various variants. Given the increasingly dangerous conditions of the pandemic, it is crucial to know when the pandemic will stop by predicting confirmed cases of COVID-19. Therefore, many studies have raised COVID-19 as a case study to overcome the ongoing pandemic using the Deep Learning method, namely LSTM, with reasonably accurate results and small error values. LSTM training is used to predict confirmed cases of COVID-19 based on variants that have been identified using ECDC's COVID-19 dataset containing confirmed cases of COVID-19 that have been identified from 30 countries in Europe. Tests were conducted using the LSTM and BiLSTM models with the addition of RNN as comparisons on hidden size and layer size. The obtained result showed that in testing hidden sizes 25, 50, 75 to 100, the RNN model provided better results, with the minimum MSE value of 0.01 and the RMSE value of 0.012 for B.1.427/B.1.429 variant with hidden size 100. In further testing of layer sizes 2, 3, 4, and 5, the result shows that the BiLSTM model provided better results, with minimum MSE value of 0.01 and the RMSE of 0.01 for the B.1.427/B.1.429 variant with hidden size 100 and layer size 2. △ Less

Submitted 27 January, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

arXiv:2301.05865 [pdf, other]

Gated Self-supervised Learning For Improving Supervised Learning

Authors: Erland Hilman Fuadi, Aristo Renaldo Ruslim, Putu Wahyu Kusuma Wardhana, Novanto Yudistira

Abstract: In past research on self-supervised learning for image classification, the use of rotation as an augmentation has been common. However, relying solely on rotation as a self-supervised transformation can limit the ability of the model to learn rich features from the data. In this paper, we propose a novel approach to self-supervised learning for image classification using several localizable augmen… ▽ More In past research on self-supervised learning for image classification, the use of rotation as an augmentation has been common. However, relying solely on rotation as a self-supervised transformation can limit the ability of the model to learn rich features from the data. In this paper, we propose a novel approach to self-supervised learning for image classification using several localizable augmentations with the combination of the gating method. Our approach uses flip and shuffle channel augmentations in addition to the rotation, allowing the model to learn rich features from the data. Furthermore, the gated mixture network is used to weigh the effects of each self-supervised learning on the loss function, allowing the model to focus on the most relevant transformations for classification. △ Less

Submitted 14 January, 2023; originally announced January 2023.

arXiv:2207.06356 [pdf, other]

Deep Transformer Model with Pre-Layer Normalization for COVID-19 Growth Prediction

Authors: Rizki Ramadhan Fitra, Novanto Yudistira, Wayan Firdaus Mahmudy

Abstract: Coronavirus disease or COVID-19 is an infectious disease caused by the SARS-CoV-2 virus. The first confirmed case caused by this virus was found at the end of December 2019 in Wuhan City, China. This case then spread throughout the world, including Indonesia. Therefore, the COVID-19 case was designated as a global pandemic by WHO. The growth of COVID-19 cases, especially in Indonesia, can be predi… ▽ More Coronavirus disease or COVID-19 is an infectious disease caused by the SARS-CoV-2 virus. The first confirmed case caused by this virus was found at the end of December 2019 in Wuhan City, China. This case then spread throughout the world, including Indonesia. Therefore, the COVID-19 case was designated as a global pandemic by WHO. The growth of COVID-19 cases, especially in Indonesia, can be predicted using several approaches, such as the Deep Neural Network (DNN). One of the DNN models that can be used is Deep Transformer which can predict time series. The model is trained with several test scenarios to get the best model. The evaluation is finding the best hyperparameters. Then, further evaluation was carried out using the best hyperparameters setting of the number of prediction days, the optimizer, the number of features, and comparison with the former models of the Long Short-Term Memory (LSTM) and Recurrent Neural Network (RNN). All evaluations used metric of the Mean Absolute Percentage Error (MAPE). Based on the results of the evaluations, Deep Transformer produces the best results when using the Pre-Layer Normalization and predicting one day ahead with a MAPE value of 18.83. Furthermore, the model trained with the Adamax optimizer obtains the best performance among other tested optimizers. The performance of the Deep Transformer also exceeds other test models, which are LSTM and RNN. △ Less

Submitted 9 July, 2022; originally announced July 2022.

arXiv:2207.01584 [pdf, other]

Classification of Alzheimer's Disease Using the Convolutional Neural Network (CNN) with Transfer Learning and Weighted Loss

Authors: Muhammad Wildan Oktavian, Novanto Yudistira, Achmad Ridok

Abstract: Alzheimer's disease is a progressive neurodegenerative disorder that gradually deprives the patient of cognitive function and can end in death. With the advancement of technology today, it is possible to detect Alzheimer's disease through Magnetic Resonance Imaging (MRI) scans. So that MRI is the technique most often used for the diagnosis and analysis of the progress of Alzheimer's disease. With… ▽ More Alzheimer's disease is a progressive neurodegenerative disorder that gradually deprives the patient of cognitive function and can end in death. With the advancement of technology today, it is possible to detect Alzheimer's disease through Magnetic Resonance Imaging (MRI) scans. So that MRI is the technique most often used for the diagnosis and analysis of the progress of Alzheimer's disease. With this technology, image recognition in the early diagnosis of Alzheimer's disease can be achieved automatically using machine learning. Although machine learning has many advantages, currently the use of deep learning is more widely applied because it has stronger learning capabilities and is more suitable for solving image recognition problems. However, there are still several challenges that must be faced to implement deep learning, such as the need for large datasets, requiring large computing resources, and requiring careful parameter setting to prevent overfitting or underfitting. In responding to the challenge of classifying Alzheimer's disease using deep learning, this study propose the Convolutional Neural Network (CNN) method with the Residual Network 18 Layer (ResNet-18) architecture. To overcome the need for a large and balanced dataset, transfer learning from ImageNet is used and weighting the loss function values so that each class has the same weight. And also in this study conducted an experiment by changing the network activation function to a mish activation function to increase accuracy. From the results of the tests that have been carried out, the accuracy of the model is 88.3 % using transfer learning, weighted loss and the mish activation function. This accuracy value increases from the baseline model which only gets an accuracy of 69.1 %. △ Less

Submitted 4 July, 2022; originally announced July 2022.

arXiv:2205.12867 [pdf, other]

Image Colorization using U-Net with Skip Connections and Fusion Layer on Landscape Images

Authors: Muhammad Hisyam Zayd, Novanto Yudistira, Randy Cahya Wihandika

Abstract: We present a novel technique to automatically colorize grayscale images that combine the U-Net model and Fusion Layer features. This approach allows the model to learn the colorization of images from pre-trained U-Net. Moreover, the Fusion layer is applied to merge local information results dependent on small image patches with global priors of an entire image on each class, forming visually more… ▽ More We present a novel technique to automatically colorize grayscale images that combine the U-Net model and Fusion Layer features. This approach allows the model to learn the colorization of images from pre-trained U-Net. Moreover, the Fusion layer is applied to merge local information results dependent on small image patches with global priors of an entire image on each class, forming visually more compelling colorization results. Finally, we validate our approach with a user study evaluation and compare it against state-of-the-art, resulting in improvements. △ Less

Submitted 25 May, 2022; originally announced May 2022.

arXiv:2203.11542 [pdf, other]

Mask Usage Recognition using Vision Transformer with Transfer Learning and Data Augmentation

Authors: Hensel Donato Jahja, Novanto Yudistira, Sutrisno

Abstract: The COVID-19 pandemic has disrupted various levels of society. The use of masks is essential in preventing the spread of COVID-19 by identifying an image of a person using a mask. Although only 23.1% of people use masks correctly, Artificial Neural Networks (ANN) can help classify the use of good masks to help slow the spread of the Covid-19 virus. However, it requires a large dataset to train an… ▽ More The COVID-19 pandemic has disrupted various levels of society. The use of masks is essential in preventing the spread of COVID-19 by identifying an image of a person using a mask. Although only 23.1% of people use masks correctly, Artificial Neural Networks (ANN) can help classify the use of good masks to help slow the spread of the Covid-19 virus. However, it requires a large dataset to train an ANN that can classify the use of masks correctly. MaskedFace-Net is a suitable dataset consisting of 137016 digital images with 4 class labels, namely Mask, Mask Chin, Mask Mouth Chin, and Mask Nose Mouth. Mask classification training utilizes Vision Transformers (ViT) architecture with transfer learning method using pre-trained weights on ImageNet-21k, with random augmentation. In addition, the hyper-parameters of training of 20 epochs, an Stochastic Gradient Descent (SGD) optimizer with a learning rate of 0.03, a batch size of 64, a Gaussian Cumulative Distribution (GeLU) activation function, and a Cross-Entropy loss function are used to be applied on the training of three architectures of ViT, namely Base-16, Large-16, and Huge-14. Furthermore, comparisons of with and without augmentation and transfer learning are conducted. This study found that the best classification is transfer learning and augmentation using ViT Huge-14. Using this method on MaskedFace-Net dataset, the research reaches an accuracy of 0.9601 on training data, 0.9412 on validation data, and 0.9534 on test data. This research shows that training the ViT model with data augmentation and transfer learning improves classification of the mask usage, even better than convolutional-based Residual Network (ResNet). △ Less

Submitted 22 March, 2022; originally announced March 2022.

arXiv:2203.04606 [pdf, other]

Attention-effective multiple instance learning on weakly stem cell colony segmentation

Authors: Novanto Yudistira, Muthu Subash Kavitha, Jeny Rajan, Takio Kurita

Abstract: The detection of induced pluripotent stem cell (iPSC) colonies often needs the precise extraction of the colony features. However, existing computerized systems relied on segmentation of contours by preprocessing for classifying the colony conditions were task-extensive. To maximize the efficiency in categorizing colony conditions, we propose a multiple instance learning (MIL) in weakly supervised… ▽ More The detection of induced pluripotent stem cell (iPSC) colonies often needs the precise extraction of the colony features. However, existing computerized systems relied on segmentation of contours by preprocessing for classifying the colony conditions were task-extensive. To maximize the efficiency in categorizing colony conditions, we propose a multiple instance learning (MIL) in weakly supervised settings. It is designed in a single model to produce weak segmentation and classification of colonies without using finely labeled samples. As a single model, we employ a U-net-like convolution neural network (CNN) to train on binary image-level labels for MIL colonies classification. Furthermore, to specify the object of interest we used a simple post-processing method. The proposed approach is compared over conventional methods using five-fold cross-validation and receiver operating characteristic (ROC) curve. The maximum accuracy of the MIL-net is 95%, which is 15 % higher than the conventional methods. Furthermore, the ability to interpret the location of the iPSC colonies based on the image level label without using a pixel-wise ground truth image is more appealing and cost-effective in colony condition recognition. △ Less

Submitted 9 March, 2022; originally announced March 2022.

arXiv:2107.06802 [pdf]

doi 10.1145/3479645.3479679

BERT Fine-Tuning for Sentiment Analysis on Indonesian Mobile Apps Reviews

Authors: Kuncahyo Setyo Nugroho, Anantha Yullian Sukmadewa, Haftittah Wuswilahaken DW, Fitra Abdurrachman Bachtiar, Novanto Yudistira

Abstract: User reviews have an essential role in the success of the developed mobile apps. User reviews in the textual form are unstructured data, creating a very high complexity when processed for sentiment analysis. Previous approaches that have been used often ignore the context of reviews. In addition, the relatively small data makes the model overfitting. A new approach, BERT, has been introduced as a… ▽ More User reviews have an essential role in the success of the developed mobile apps. User reviews in the textual form are unstructured data, creating a very high complexity when processed for sentiment analysis. Previous approaches that have been used often ignore the context of reviews. In addition, the relatively small data makes the model overfitting. A new approach, BERT, has been introduced as a transfer learning model with a pre-trained model that has previously been trained to have a better context representation. This study examines the effectiveness of fine-tuning BERT for sentiment analysis using two different pre-trained models. Besides the multilingual pre-trained model, we use the pre-trained model that only has been trained in Indonesian. The dataset used is Indonesian user reviews of the ten best apps in 2020 in Google Play sites. We also perform hyper-parameter tuning to find the optimum trained model. Two training data labeling approaches were also tested to determine the effectiveness of the model, which is score-based and lexicon-based. The experimental results show that pre-trained models trained in Indonesian have better average accuracy on lexicon-based data. The pre-trained Indonesian model highest accuracy is 84%, with 25 epochs and a training time of 24 minutes. These results are better than all of the machine learning and multilingual pre-trained models. △ Less

Submitted 14 July, 2021; originally announced July 2021.

Journal ref: SIET '21: 6th International Conference on Sustainable Information Engineering and Technology 2021

arXiv:2107.06796 [pdf]

doi 10.1145/3479645.3479666

Indonesia's Fake News Detection using Transformer Network

Authors: Aisyah Awalina, Jibran Fawaid, Rifky Yunus Krisnabayu, Novanto Yudistira

Abstract: Fake news is a problem faced by society in this era. It is not rare for fake news to cause provocation and problem for the people. Indonesia, as a country with the 4th largest population, has a problem in dealing with fake news. More than 30% of rural and urban population are deceived by this fake news problem. As we have been studying, there is only few literatures on preventing the spread of fak… ▽ More Fake news is a problem faced by society in this era. It is not rare for fake news to cause provocation and problem for the people. Indonesia, as a country with the 4th largest population, has a problem in dealing with fake news. More than 30% of rural and urban population are deceived by this fake news problem. As we have been studying, there is only few literatures on preventing the spread of fake news in Bahasa Indonesia. So, this research is conducted to prevent these problems. The dataset used in this research was obtained from a news portal that identifies fake news, turnbackhoax.id. Using Web Scrap** on this page, we got 1116 data consisting of valid news and fake news. The dataset can be accessed at https://github.com/JibranFawaid/turnbackhoax-dataset. This dataset will be combined with other available datasets. The methods used are CNN, BiLSTM, Hybrid CNN-BiLSTM, and BERT with Transformer Network. This research shows that the BERT method with Transformer Network has the best results with an accuracy of up to 90%. △ Less

Submitted 14 July, 2021; originally announced July 2021.

Journal ref: SIET '21: 6th International Conference on Sustainable Information Engineering and Technology 2021

arXiv:2107.06785 [pdf]

doi 10.1145/3479645.3479658

Large-Scale News Classification using BERT Language Model: Spark NLP Approach

Authors: Kuncahyo Setyo Nugroho, Anantha Yullian Sukmadewa, Novanto Yudistira

Abstract: The rise of big data analytics on top of NLP increases the computational burden for text processing at scale. The problems faced in NLP are very high dimensional text, so it takes a high computation resource. The MapReduce allows parallelization of large computations and can improve the efficiency of text processing. This research aims to study the effect of big data processing on NLP tasks based… ▽ More The rise of big data analytics on top of NLP increases the computational burden for text processing at scale. The problems faced in NLP are very high dimensional text, so it takes a high computation resource. The MapReduce allows parallelization of large computations and can improve the efficiency of text processing. This research aims to study the effect of big data processing on NLP tasks based on a deep learning approach. We classify a big text of news topics with fine-tuning BERT used pre-trained models. Five pre-trained models with a different number of parameters were used in this study. To measure the efficiency of this method, we compared the performance of the BERT with the pipelines from Spark NLP. The result shows that BERT without Spark NLP gives higher accuracy compared to BERT with Spark NLP. The accuracy average and training time of all models using BERT is 0.9187 and 35 minutes while using BERT with Spark NLP pipeline is 0.8444 and 9 minutes. The bigger model will take more computation resources and need a longer time to complete the tasks. However, the accuracy of BERT with Spark NLP only decreased by an average of 5.7%, while the training time was reduced significantly by 62.9% compared to BERT without Spark NLP. △ Less

Submitted 14 July, 2021; v1 submitted 14 July, 2021; originally announced July 2021.

Journal ref: SIET '21: 6th International Conference on Sustainable Information Engineering and Technology 2021

arXiv:2012.09542 [pdf, other]

doi 10.1007/s11263-022-01649-x

Weakly-Supervised Action Localization and Action Recognition using Global-Local Attention of 3D CNN

Authors: Novanto Yudistira, Muthu Subash Kavitha, Takio Kurita

Abstract: 3D Convolutional Neural Network (3D CNN) captures spatial and temporal information on 3D data such as video sequences. However, due to the convolution and pooling mechanism, the information loss seems unavoidable. To improve the visual explanations and classification in 3D CNN, we propose two approaches; i) aggregate layer-wise global to local (global-local) discrete gradients using trained 3DResN… ▽ More 3D Convolutional Neural Network (3D CNN) captures spatial and temporal information on 3D data such as video sequences. However, due to the convolution and pooling mechanism, the information loss seems unavoidable. To improve the visual explanations and classification in 3D CNN, we propose two approaches; i) aggregate layer-wise global to local (global-local) discrete gradients using trained 3DResNext network, and ii) implement attention gating network to improve the accuracy of the action recognition. The proposed approach intends to show the usefulness of every layer termed as global-local attention in 3D CNN via visual attribution, weakly-supervised action localization, and action recognition. Firstly, the 3DResNext is trained and applied for action classification using backpropagation concerning the maximum predicted class. The gradients and activations of every layer are then up-sampled. Later, aggregation is used to produce more nuanced attention, which points out the most critical part of the predicted class's input videos. We use contour thresholding of final attention for final localization. We evaluate spatial and temporal action localization in trimmed videos using fine-grained visual explanation via 3DCam. Experimental results show that the proposed approach produces informative visual explanations and discriminative attention. Furthermore, the action recognition via attention gating on each layer produces better classification results than the baseline model. △ Less

Submitted 16 August, 2022; v1 submitted 17 December, 2020; originally announced December 2020.

Journal ref: International Journal of Computer Vision, 2022

arXiv:2009.07567 [pdf, other]

U-Net with Graph Based Smoothing Regularizer for Small Vessel Segmentation on Fundus Image

Authors: Lukman Hakim, Novanto Yudistira, Muthusubash Kavitha, Takio Kurita

Abstract: The detection of retinal blood vessels, especially the changes of small vessel condition is the most important indicator to identify the vascular network of the human body. Existing techniques focused mainly on shape of the large vessels, which is not appropriate for the disconnected small and isolated vessels. Paying attention to the low contrast small blood vessel in fundus region, first time we… ▽ More The detection of retinal blood vessels, especially the changes of small vessel condition is the most important indicator to identify the vascular network of the human body. Existing techniques focused mainly on shape of the large vessels, which is not appropriate for the disconnected small and isolated vessels. Paying attention to the low contrast small blood vessel in fundus region, first time we proposed to combine graph based smoothing regularizer with the loss function in the U-net framework. The proposed regularizer treated the image as two graphs by calculating the graph laplacians on vessel regions and the background regions on the image. The potential of the proposed graph based smoothing regularizer in reconstructing small vessel is compared over the classical U-net with or without regularizer. Numerical and visual results shows that our developed regularizer proved its effectiveness in segmenting the small vessels and reconnecting the fragmented retinal blood vessels. △ Less

Submitted 16 September, 2020; originally announced September 2020.

Journal ref: ICONIP2019

arXiv:2005.04809 [pdf, other]

COVID-19 growth prediction using multivariate long short term memory

Authors: Novanto Yudistira

Abstract: Coronavirus disease (COVID-19) spread forecasting is an important task to track the growth of the pandemic. Existing predictions are merely based on qualitative analyses and mathematical modeling. The use of available big data with machine learning is still limited in COVID-19 growth prediction even though the availability of data is abundance. To make use of big data in the prediction using deep… ▽ More Coronavirus disease (COVID-19) spread forecasting is an important task to track the growth of the pandemic. Existing predictions are merely based on qualitative analyses and mathematical modeling. The use of available big data with machine learning is still limited in COVID-19 growth prediction even though the availability of data is abundance. To make use of big data in the prediction using deep learning, we use long short-term memory (LSTM) method to learn the correlation of COVID-19 growth over time. The structure of an LSTM layer is searched heuristically until the best validation score is achieved. First, we trained training data containing confirmed cases from around the globe. We achieved favorable performance compared with that of the recurrent neural network (RNN) method with a comparable low validation error. The evaluation is conducted based on graph visualization and root mean squared error (RMSE). We found that it is not easy to achieve the same quantity of confirmed cases over time. However, LSTM provide a similar pattern between the actual cases and prediction. In the future, our proposed prediction can be used for anticipating forthcoming pandemics. The code is provided here: https://github.com/cbasemaster/lstmcorona △ Less

Submitted 27 May, 2020; v1 submitted 10 May, 2020; originally announced May 2020.

Journal ref: IAENG International Journal of Computer Science, vol. 47, no. 4, pp829-837, 2020

arXiv:1807.08291 [pdf, other]

doi 10.1016/j.image.2019.115731

Correlation Net: Spatiotemporal multimodal deep learning for action recognition

Authors: Novanto Yudistira, Takio Kurita

Abstract: This paper describes a network that captures multimodal correlations over arbitrary timestamps. The proposed scheme operates as a complementary, extended network over a multimodal convolutional neural network (CNN). Spatial and temporal streams are required for action recognition by a deep CNN, but overfitting reduction and fusing these two streams remain open problems. The existing fusion approac… ▽ More This paper describes a network that captures multimodal correlations over arbitrary timestamps. The proposed scheme operates as a complementary, extended network over a multimodal convolutional neural network (CNN). Spatial and temporal streams are required for action recognition by a deep CNN, but overfitting reduction and fusing these two streams remain open problems. The existing fusion approach averages the two streams. Here we propose a correlation network with a Shannon fusion for learning a pre-trained CNN. A Long-range video may consist of spatiotemporal correlations over arbitrary times, which can be captured by forming the correlation network from simple fully connected layers. This approach was found to complement the existing network fusion methods. The importance of multimodal correlation is validated in comparison experiments on the UCF-101 and HMDB-51 datasets. The multimodal correlation enhanced the accuracy of the video recognition results. △ Less

Submitted 16 December, 2019; v1 submitted 22 July, 2018; originally announced July 2018.

Journal ref: Signal Processing: Image Communication, Volume 82, March 2020, 115731

arXiv:1804.10855 [pdf]

Evaluation of Feature Detector-Descriptor for Real Object Matching under Various Conditions of Ilumination and Affine Transformation

Authors: Novanto Yudistira, Achmad Ridok, Ali Fauzi

Abstract: This study attempts to provide explanations, descriptions and evaluations of some most popular and current combinations of description and descriptor frameworks, namely SIFT, SURF, MSER, and BRISK for keypoint extractors and SIFT, SURF, BRISK, and FREAK for descriptors. Evaluations are made based on the number of matches of keypoints and repeatability in various image variations. It is used as the… ▽ More This study attempts to provide explanations, descriptions and evaluations of some most popular and current combinations of description and descriptor frameworks, namely SIFT, SURF, MSER, and BRISK for keypoint extractors and SIFT, SURF, BRISK, and FREAK for descriptors. Evaluations are made based on the number of matches of keypoints and repeatability in various image variations. It is used as the main parameter to assess how well combinations of algorithms are in matching objects with different variations. There are many papers that describe the comparison of detection and description features to detect objects in images under various conditions, but the combination of algorithms attached to them has not been much discussed. The problem domain is limited to different illumination levels and affine transformations from different perspectives. To evaluate the robustness of all combinations of algorithms, we use a stereo image matching case. △ Less

Submitted 1 May, 2018; v1 submitted 28 April, 2018; originally announced April 2018.

arXiv:1802.03155 [pdf]

Web-Based Implementation of Travelling Salesperson Problem Using Genetic Algorithm

Authors: Aryo Pinandito, Novanto Yudistira, Fajar Pradana

Abstract: The world is connected through the Internet. As the abundance of Internet users connected into the Web and the popularity of cloud computing research, the need of Artificial Intelligence (AI) is demanding. In this research, Genetic Algorithm (GA) as AI optimization method through natural selection and genetic evolution is utilized. There are many applications of GA such as web mining, load balanci… ▽ More The world is connected through the Internet. As the abundance of Internet users connected into the Web and the popularity of cloud computing research, the need of Artificial Intelligence (AI) is demanding. In this research, Genetic Algorithm (GA) as AI optimization method through natural selection and genetic evolution is utilized. There are many applications of GA such as web mining, load balancing, routing, and scheduling or web service selection. Hence, it is a challenging task to discover whether the code mainly server side and web based language technology affects the performance of GA. Travelling Salesperson Problem (TSP) as Non Polynomial-hard (NP-hard) problem is provided to be a problem domain to be solved by GA. While many scientists prefer Python in GA implementation, another popular high-level interpreter programming language such as PHP (PHP Hypertext Preprocessor) and Ruby were benchmarked. Line of codes, file sizes, and performances based on GA implementation and runtime were found varies among these programming languages. Based on the result, the use of Ruby in GA implementation is recommended. △ Less

Submitted 9 February, 2018; originally announced February 2018.

Showing 1–25 of 25 results for author: Yudistira, N