Showing 1–34 of 34 results for author: Sakti, S

  1. arXiv:2407.00826  [pdf, other]

    cs.CL cs.SD eess.AS

    NAIST Simultaneous Speech Translation System for IWSLT 2024

    Authors: Yuka Ko, Ryo Fukuda, Yuta Nishikawa, Yasumasa Kano, Tomoya Yanagita, Kosuke Doi, Mana Makinae, Haotian Tan, Makoto Sakai, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: This paper describes NAIST's submission to the simultaneous track of the IWSLT 2024 Evaluation Campaign: English-to-{German, Japanese, Chinese} speech-to-text translation and English-to-Japanese speech-to-speech translation. We develop a multilingual end-to-end speech-to-text translation model combining two pre-trained language models, HuBERT and mBART. We trained this model with two decoding poli…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: IWSLT 2024 system paper

  2. arXiv:2301.02966  [pdf, other]

    cs.CL cs.LG eess.AS

    SpeeChain: A Speech Toolkit for Large-Scale Machine Speech Chain

    Authors: Heli Qi, Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

    Abstract: This paper introduces SpeeChain, an open-source PyTorch-based toolkit designed to develop the machine speech chain for large-scale use. This first release focuses on the TTS-to-ASR chain, a core component of the machine speech chain, that refers to the TTS data augmentation by unspoken text for ASR. To build an efficient pipeline for the large-scale TTS-to-ASR chain, we implement easy-to-use multi…

    Submitted 7 January, 2023; originally announced January 2023.

    Comments: Submitted to ICASSP 2023

    MSC Class: 68T10 ACM Class: I.2.7

  3. arXiv:2212.09648  [pdf, other]

    cs.CL cs.AI

    NusaCrowd: Open Source Initiative for Indonesian NLP Resources

    Authors: Samuel Cahyawijaya, Holy Lovenia, Alham Fikri Aji, Genta Indra Winata, Bryan Wilie, Rahmad Mahendra, Christian Wibisono, Ade Romadhony, Karissa Vincentio, Fajri Koto, Jennifer Santoso, David Moeljadi, Cahya Wirawan, Frederikus Hudi, Ivan Halim Parmonangan, Ika Alfina, Muhammad Satrio Wicaksono, Ilham Firdausi Putra, Samsul Rahmadani, Yulianti Oenang, Ali Akbar Septiandri, James Jaya, Kaustubh D. Dhole, Arie Ardiyanti Suryani, Rifki Afina Putri, et al. (22 additional authors not shown)

    Abstract: We present NusaCrowd, a collaborative initiative to collect and unify existing resources for Indonesian languages, including opening access to previously non-public resources. Through this initiative, we have brought together 137 datasets and 118 standardized data loaders. The quality of the datasets has been assessed manually and automatically, and their value is demonstrated through multiple exp…

    Submitted 21 July, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

  4. arXiv:2211.14515  [pdf, other]

    cs.CV cs.LG

    Instance-level Heterogeneous Domain Adaptation for Limited-labeled Sketch-to-Photo Retrieval

    Authors: Fan Yang, Yang Wu, Zheng Wang, Xiang Li, Sakriani Sakti, Satoshi Nakamura

    Abstract: Although sketch-to-photo retrieval has a wide range of applications, it is costly to obtain paired and rich-labeled ground truth. Differently, photo retrieval data is easier to acquire. Therefore, previous works pre-train their models on rich-labeled photo retrieval data (i.e., source domain) and then fine-tune them on the limited-labeled sketch-to-photo retrieval data (i.e., target domain). Howev…

    Submitted 6 December, 2022; v1 submitted 26 November, 2022; originally announced November 2022.

  5. arXiv:2208.12940  [pdf, other]

    cs.CV

    Actor-identified Spatiotemporal Action Detection -- Detecting Who Is Doing What in Videos

    Authors: Fan Yang, Norimichi Ukita, Sakriani Sakti, Satoshi Nakamura

    Abstract: The success of deep learning on video Action Recognition (AR) has motivated researchers to progressively promote related tasks from the coarse level to the fine-grained level. Compared with conventional AR which only predicts an action label for the entire video, Temporal Action Detection (TAD) has been investigated for estimating the start and end time for each action in videos. Taking TAD a step…

    Submitted 7 September, 2022; v1 submitted 27 August, 2022; originally announced August 2022.

  6. arXiv:2206.00635  [pdf, other]

    cs.SD cs.AI eess.AS

    Speech Artifact Removal from EEG Recordings of Spoken Word Production with Tensor Decomposition

    Authors: Holy Lovenia, Hiroki Tanaka, Sakriani Sakti, Ayu Purwarianti, Satoshi Nakamura

    Abstract: Research about brain activities involving spoken word production is considerably underdeveloped because of the undiscovered characteristics of speech artifacts, which contaminate electroencephalogram (EEG) signals and prevent the inspection of the underlying cognitive processes. To fuel further EEG research with speech production, a method using three-mode tensor decomposition (time x space x freq…

    Submitted 1 June, 2022; originally announced June 2022.

    Journal ref: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  7. arXiv:2205.06963  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Improved Consistency Training for Semi-Supervised Sequence-to-Sequence ASR via Speech Chain Reconstruction and Self-Transcribing

    Authors: Heli Qi, Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura

    Abstract: Consistency regularization has recently been applied to semi-supervised sequence-to-sequence (S2S) automatic speech recognition (ASR). This principle encourages an ASR model to output similar predictions for the same input speech with different perturbations. The existing paradigm of semi-supervised S2S ASR utilizes SpecAugment as data augmentation and requires a static teacher model to produce ps…

    Submitted 14 May, 2022; originally announced May 2022.

    Comments: Submitted to INTERSPEECH 2022

    MSC Class: 68T10 ACM Class: I.2.7

  8. arXiv:2203.15643  [pdf, other]

    cs.SD cs.CL cs.LG cs.NE eess.AS

    Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation

    Authors: Rendi Chevi, Radityo Eko Prasojo, Alham Fikri Aji, Andros Tjandra, Sakriani Sakti

    Abstract: Several solutions for lightweight TTS have shown promising results. Still, they either rely on a hand-crafted design that reaches non-optimum size or use a neural architecture search but often suffer training costs. We present Nix-TTS, a lightweight TTS achieved via knowledge distillation to a high-quality yet large-sized, non-autoregressive, and end-to-end (vocoder-free) TTS teacher model. Specif…

    Submitted 5 November, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: Accepted at SLT 2022 (https://slt2022.org/). Associated materials can be seen in https://github.com/rendchevi/nix-tts

    MSC Class: 68T50 (Primary) 68T07; 68T10; 68T99 (Secondary) ACM Class: I.2.7; I.2.6; H.5.5

  9. arXiv:2011.04845  [pdf, other]

    cs.CL

    Simultaneous Speech-to-Speech Translation System with Neural Incremental ASR, MT, and TTS

    Authors: Katsuhito Sudoh, Takatomo Kano, Sashi Novitasari, Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura

    Abstract: This paper presents a newly developed, simultaneous neural speech-to-speech translation system and its evaluation. The system consists of three fully-incremental neural processing modules for automatic speech recognition (ASR), machine translation (MT), and text-to-speech synthesis (TTS). We investigated its overall latency in the system's Ear-Voice Span and speaking latency along with module-leve…

    Submitted 11 November, 2020; v1 submitted 9 November, 2020; originally announced November 2020.

    Comments: 6 pages

  10. arXiv:2011.02128  [pdf, other]

    cs.CL cs.SD eess.AS

    Cross-Lingual Machine Speech Chain for Javanese, Sundanese, Balinese, and Bataks Speech Recognition and Synthesis

    Authors: Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

    Abstract: Even though over seven hundred ethnic languages are spoken in Indonesia, the available technology remains limited that could support communication within indigenous communities as well as with people outside the villages. As a result, indigenous communities still face isolation due to cultural barriers; languages continue to disappear. To accelerate communication, speech-to-speech translation (S2S…

    Submitted 4 November, 2020; originally announced November 2020.

    Comments: Accepted in SLTU-CCURL 2020

  11. arXiv:2011.02127  [pdf, other]

    cs.CL cs.SD eess.AS

    Sequence-to-Sequence Learning via Attention Transfer for Incremental Speech Recognition

    Authors: Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

    Abstract: Attention-based sequence-to-sequence automatic speech recognition (ASR) requires a significant delay to recognize long utterances because the output is generated after receiving entire input sequences. Although several studies recently proposed sequence mechanisms for incremental speech recognition (ISR), using different frameworks and learning algorithms is more complicated than the standard ASR…

    Submitted 4 November, 2020; originally announced November 2020.

    Comments: Accepted in INTERSPEECH 2019

  12. arXiv:2011.02126  [pdf, other]

    cs.CL cs.SD eess.AS

    Incremental Machine Speech Chain Towards Enabling Listening while Speaking in Real-time

    Authors: Sashi Novitasari, Andros Tjandra, Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura

    Abstract: Inspired by a human speech chain mechanism, a machine speech chain framework based on deep learning was recently proposed for the semi-supervised development of automatic speech recognition (ASR) and text-to-speech synthesis (TTS) systems. However, the mechanism to listen while speaking can be done only after receiving entire input sequences. Thus, there is a significant delay when encountering lon…

    Submitted 3 November, 2020; originally announced November 2020.

    Comments: Accepted in INTERSPEECH 2020

  13. arXiv:2011.02099  [pdf, other]

    cs.CL cs.SD eess.AS

    Augmenting Images for ASR and TTS through Single-loop and Dual-loop Multimodal Chain Framework

    Authors: Johanes Effendi, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

    Abstract: Previous research has proposed a machine speech chain to enable automatic speech recognition (ASR) and text-to-speech synthesis (TTS) to assist each other in semi-supervised learning and to avoid the need for a large amount of paired speech and text data. However, that framework still requires a large amount of unpaired (speech or text) data. A prototype multimodal machine chain was then explored…

    Submitted 3 November, 2020; originally announced November 2020.

    Comments: Accepted at INTERSPEECH 2020

  14. arXiv:2010.05967  [pdf, other]

    cs.CL cs.AI

    The Zero Resource Speech Challenge 2020: Discovering discrete subword and word units

    Authors: Ewan Dunbar, Julien Karadayi, Mathieu Bernard, Xuan-Nga Cao, Robin Algayres, Lucas Ondel, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux

    Abstract: We present the Zero Resource Speech Challenge 2020, which aims at learning speech representations from raw audio signals without any labels. It combines the data sets and metrics from two previous benchmarks (2017 and 2019) and features two tasks which tap into two levels of speech representation. The first task is to discover low bit-rate subword representations that optimize the quality of speec…

    Submitted 12 October, 2020; originally announced October 2020.

    Journal ref: Proceedings of Interspeech 2020

  15. arXiv:2007.03200  [pdf, other]

    cs.CV

    ReMOTS: Self-Supervised Refining Multi-Object Tracking and Segmentation

    Authors: Fan Yang, Xin Chang, Chenyu Dang, Ziqiang Zheng, Sakriani Sakti, Satoshi Nakamura, Yang Wu

    Abstract: We aim to improve the performance of Multiple Object Tracking and Segmentation (MOTS) by refinement. However, it remains challenging for refining MOTS results, which could be attributed to that appearance features are not adapted to target videos and it is also difficult to find proper thresholds to discriminate them. To tackle this issue, we propose a self-supervised refining MOTS (i.e., ReMOTS)…

    Submitted 13 January, 2021; v1 submitted 7 July, 2020; originally announced July 2020.

    Comments: 4 pages

  16. arXiv:2005.11676  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Transformer VQ-VAE for Unsupervised Unit Discovery and Speech Synthesis: ZeroSpeech 2020 Challenge

    Authors: Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

    Abstract: In this paper, we report our submitted system for the ZeroSpeech 2020 challenge on Track 2019. The main theme in this challenge is to build a speech synthesizer without any textual information or phonetic labels. In order to tackle those challenges, we build a system that must address two major components such as 1) given speech audio, extract subword units in an unsupervised way and 2) re-synthes…

    Submitted 24 May, 2020; originally announced May 2020.

    Comments: Submitted to INTERSPEECH 2020

  17. arXiv:1911.10535  [pdf, other]

    cs.CV

    Using Panoramic Videos for Multi-person Localization and Tracking in a 3D Panoramic Coordinate

    Authors: Fan Yang, Feiran Li, Yang Wu, Sakriani Sakti, Satoshi Nakamura

    Abstract: 3D panoramic multi-person localization and tracking are prominent in many applications, however, conventional methods using LiDAR equipment could be economically expensive and also computationally inefficient due to the processing of point cloud data. In this work, we propose an effective and efficient approach at a low cost. First, we obtain panoramic videos with four normal cameras. Then, we tra…

    Submitted 7 March, 2020; v1 submitted 24 November, 2019; originally announced November 2019.

    Comments: 5 pages

  18. arXiv:1910.00795  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Speech-to-speech Translation between Untranscribed Unknown Languages

    Authors: Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

    Abstract: In this paper, we explore a method for training speech-to-speech translation tasks without any transcription or linguistic supervision. Our proposed method consists of two steps: First, we train and generate discrete representation with unsupervised term discovery with a discrete quantized autoencoder. Second, we train a sequence-to-sequence model that directly maps the source language speech to t…

    Submitted 5 October, 2019; v1 submitted 2 October, 2019; originally announced October 2019.

    Comments: Accepted in IEEE ASRU 2019. Web-page for more samples & details: https://sp2code-translation-v1.netlify.com/

  19. arXiv:1907.09658  [pdf, other]

    cs.CV

    Make Skeleton-based Action Recognition Model Smaller, Faster and Better

    Authors: Fan Yang, Sakriani Sakti, Yang Wu, Satoshi Nakamura

    Abstract: Although skeleton-based action recognition has achieved great success in recent years, most of the existing methods may suffer from a large model size and slow execution speed. To alleviate this issue, we analyze skeleton sequence properties to propose a Double-feature Double-motion Network (DD-Net) for skeleton-based action recognition. By using a lightweight network structure (i.e., 0.15 million…

    Submitted 18 March, 2020; v1 submitted 22 July, 2019; originally announced July 2019.

    Comments: 6 pages, 5 figures

  20. arXiv:1906.00579  [pdf, other]

    cs.CL cs.SD eess.AS

    Listening while Speaking and Visualizing: Improving ASR through Multimodal Chain

    Authors: Johanes Effendi, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

    Abstract: Previously, a machine speech chain, which is based on sequence-to-sequence deep learning, was proposed to mimic speech perception and production behavior. Such chains separately processed listening and speaking by automatic speech recognition (ASR) and text-to-speech synthesis (TTS) and simultaneously enabled them to teach each other in semi-supervised learning when they received unpaired data. Un…

    Submitted 14 November, 2019; v1 submitted 3 June, 2019; originally announced June 2019.

    Comments: Accepted in IEEE ASRU 2019

  21. arXiv:1905.11449  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    VQVAE Unsupervised Unit Discovery and Multi-scale Code2Spec Inverter for Zerospeech Challenge 2019

    Authors: Andros Tjandra, Berrak Sisman, Mingyang Zhang, Sakriani Sakti, Haizhou Li, Satoshi Nakamura

    Abstract: We describe our submitted system for the ZeroSpeech Challenge 2019. The current challenge theme addresses the difficulty of constructing a speech synthesizer without any text or phonetic labels and requires a system that can (1) discover subword units in an unsupervised way, and (2) synthesize the speech with a target speaker's voice. Moreover, the system should also balance the discrimination sco…

    Submitted 29 May, 2019; v1 submitted 27 May, 2019; originally announced May 2019.

    Comments: Submitted to Interspeech 2019

  22. arXiv:1904.11469  [pdf, other]

    cs.CL cs.SD eess.AS

    The Zero Resource Speech Challenge 2019: TTS without T

    Authors: Ewan Dunbar, Robin Algayres, Julien Karadayi, Mathieu Bernard, Juan Benjumea, Xuan-Nga Cao, Lucie Miskic, Charlotte Dugrain, Lucas Ondel, Alan W. Black, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux

    Abstract: We present the Zero Resource Speech Challenge 2019, which proposes to build a speech synthesizer without any text or phonetic labels: hence, TTS without T (text-to-speech without text). We provide raw audio for a target voice in an unknown language (the Voice dataset), but no alignment, text or labels. Participants must discover subword units in an unsupervised way (using the Unit Discovery datase…

    Submitted 7 July, 2019; v1 submitted 25 April, 2019; originally announced April 2019.

    Comments: Interspeech 2019

  23. arXiv:1810.13107  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    End-to-End Feedback Loss in Speech Chain Framework via Straight-Through Estimator

    Authors: Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

    Abstract: The speech chain mechanism integrates automatic speech recognition (ASR) and text-to-speech synthesis (TTS) modules into a single cycle during training. In our previous work, we applied a speech chain mechanism as a semi-supervised learning. It provides the ability for ASR and TTS to assist each other when they receive unpaired data and let them infer the missing pair and optimize the model with r…

    Submitted 31 October, 2018; originally announced October 2018.

  24. arXiv:1807.08280  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Multi-scale Alignment and Contextual History for Attention Mechanism in Sequence-to-sequence Model

    Authors: Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

    Abstract: A sequence-to-sequence model is a neural network module for mapping two sequences of different lengths. The sequence-to-sequence model has three core modules: encoder, decoder, and attention. Attention is the bridge that connects the encoder and decoder modules and improves model performance in many tasks. In this paper, we propose two ideas to improve sequence-to-sequence model performance by enh…

    Submitted 22 July, 2018; originally announced July 2018.

  25. arXiv:1803.10525  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Machine Speech Chain with One-shot Speaker Adaptation

    Authors: Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

    Abstract: In previous work, we developed a closed-loop speech chain model based on deep learning, in which the architecture enabled the automatic speech recognition (ASR) and text-to-speech synthesis (TTS) components to mutually improve their performance. This was accomplished by the two parts teaching each other using both labeled and unlabeled data. This approach could significantly improve model performa…

    Submitted 28 March, 2018; originally announced March 2018.

  26. arXiv:1802.10410  [pdf, other]

    cs.LG

    Tensor Decomposition for Compressing Recurrent Neural Network

    Authors: Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

    Abstract: In the machine learning fields, Recurrent Neural Network (RNN) has become a popular architecture for sequential data modeling. However, behind the impressive performance, RNNs require a large number of parameters for both training and inference. In this paper, we are trying to reduce the number of parameters and maintain the expressive power from RNN simultaneously. We utilize several tensor decom…

    Submitted 8 May, 2018; v1 submitted 28 February, 2018; originally announced February 2018.

    Comments: Accepted at IJCNN 2018. Source code URL: https://github.com/androstj/tensor_rnn

  27. arXiv:1802.08645  [pdf, other]

    cs.CV

    Interactive Image Manipulation with Natural Language Instruction Commands

    Authors: Seitaro Shinagawa, Koichiro Yoshino, Sakriani Sakti, Yu Suzuki, Satoshi Nakamura

    Abstract: We propose an interactive image-manipulation system with natural language instruction, which can generate a target image from a source image and an instruction that describes the difference between the source and the target image. The system makes it possible to modify a generated image interactively and make natural language conditioned image generation more controllable. We construct a neural ne…

    Submitted 23 February, 2018; originally announced February 2018.

    Comments: accepted at NIPS 2017 ViGIL workshop (https://nips2017vigil.github.io/)

  28. arXiv:1802.06003  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Structured-based Curriculum Learning for End-to-end English-Japanese Speech Translation

    Authors: Takatomo Kano, Sakriani Sakti, Satoshi Nakamura

    Abstract: Sequence-to-sequence attentional-based neural network architectures have been shown to provide a powerful model for machine translation and speech recognition. Recently, several works have attempted to extend the models for end-to-end speech translation task. However, the usefulness of these models was only investigated on language pairs with similar syntax and word order (e.g., English-French or…

    Submitted 13 February, 2018; originally announced February 2018.

  29. arXiv:1710.10774  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Sequence-to-Sequence ASR Optimization via Reinforcement Learning

    Authors: Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

    Abstract: Despite the success of sequence-to-sequence approaches in automatic speech recognition (ASR) systems, the models still suffer from several problems, mainly due to the mismatch between the training and inference conditions. In the sequence-to-sequence architecture, the model is trained to predict the grapheme of the current time-step given the input of speech signal and the ground-truth grapheme hi…

    Submitted 28 February, 2018; v1 submitted 30 October, 2017; originally announced October 2017.

    Comments: Accepted at ICASSP 2018

  30. arXiv:1709.07814  [pdf, other]

    cs.CL cs.LG cs.SD

    Attention-based Wav2Text with Feature Transfer Learning

    Authors: Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

    Abstract: Conventional automatic speech recognition (ASR) typically performs multi-level pattern recognition tasks that map the acoustic speech waveform into a hierarchy of speech units. But, it is widely known that information loss in the earlier stage can propagate through the later stages. After the resurgence of deep learning, interest has emerged in the possibility of developing a purely end-to-end ASR…

    Submitted 22 September, 2017; originally announced September 2017.

    Comments: Accepted at ASRU 2017

  31. arXiv:1707.04879  [pdf, other]

    cs.CL cs.LG cs.SD

    Listening while Speaking: Speech Chain by Deep Learning

    Authors: Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

    Abstract: Despite the close relationship between speech perception and production, research in automatic speech recognition (ASR) and text-to-speech synthesis (TTS) has progressed more or less independently without exerting much mutual influence on each other. In human communication, on the other hand, a closed-loop speech chain mechanism with auditory feedback from the speaker's mouth to her ear is crucial…

    Submitted 16 July, 2017; originally announced July 2017.

  32. arXiv:1706.02222  [pdf, other]

    cs.LG cs.CL stat.ML

    Gated Recurrent Neural Tensor Network

    Authors: Andros Tjandra, Sakriani Sakti, Ruli Manurung, Mirna Adriani, Satoshi Nakamura

    Abstract: Recurrent Neural Networks (RNNs), which are a powerful scheme for modeling temporal and sequential data, need to capture long-term dependencies on datasets and represent them in hidden layers with a powerful model to capture more information from inputs. For modeling long-term dependencies in a dataset, the gating mechanism concept can help RNNs remember and forget previous information. Representin…

    Submitted 7 June, 2017; originally announced June 2017.

    Comments: Accepted at IJCNN 2016. URL: http://ieeexplore.ieee.org/document/7727233/

  33. arXiv:1705.08091  [pdf, other]

    cs.CL

    Local Monotonic Attention Mechanism for End-to-End Speech and Language Processing

    Authors: Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

    Abstract: Recently, encoder-decoder neural networks have shown impressive performance on many sequence-related tasks. The architecture commonly uses an attentional mechanism which allows the model to learn alignments between the source and the target sequence. Most attentional mechanisms used today are based on a global attention property which requires a computation of a weighted summarization of the whole…

    Submitted 3 November, 2017; v1 submitted 23 May, 2017; originally announced May 2017.

    Comments: Accepted at IJCNLP 2017 --- (V2: added more experiments on G2P & MT)

  34. Compressing Recurrent Neural Network with Tensor Train

    Authors: Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

    Abstract: Recurrent Neural Networks (RNNs) are a popular choice for modeling temporal and sequential tasks and achieve many state-of-the-art performance on various complex problems. However, most of the state-of-the-art RNNs have millions of parameters and require many computational resources for training and predicting new data. This paper proposes an alternative RNN model to reduce the number of parameters…

    Submitted 22 May, 2017; originally announced May 2017.

    Comments: Accepted at IJCNN 2017