Showing 1–34 of 34 results for author: Hautamaki, V

  1. arXiv:2406.09999

    eess.AS

    ROAR: Reinforcing Original to Augmented Data Ratio Dynamics for Wav2Vec2.0 Based ASR

    Authors: Vishwanath Pratap Singh, Federico Malato, Ville Hautamaki, Md. Sahidullah, Tomi Kinnunen

    Abstract: While automatic speech recognition (ASR) greatly benefits from data augmentation, the augmentation recipes themselves tend to be heuristic. In this paper, we address one of the heuristic approach associated with balancing the right amount of augmented data in ASR training by introducing a reinforcement learning (RL) based dynamic adjustment of original-to-augmented data ratio (OAR). Unlike the fix…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted: Interspeech 2024

    Journal ref: Interspeech 2024

  2. arXiv:2406.04913

    cs.AI cs.LG

    Online Adaptation for Enhancing Imitation Learning Policies

    Authors: Federico Malato, Ville Hautamaki

    Abstract: Imitation learning enables autonomous agents to learn from human examples, without the need for a reward signal. Still, if the provided dataset does not encapsulate the task correctly, or when the task is too complex to be modeled, such agents fail to reproduce the expert policy. We propose to recover from these failures through online adaptation. Our approach combines the action proposal coming f…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted at IEEE Conference on Games 2024, Milan, Italy

  3. arXiv:2403.13801

    cs.RO cs.AI cs.CL

    Natural Language as Policies: Reasoning for Coordinate-Level Embodied Control with LLMs

    Authors: Yusuke Mikami, Andrew Melnik, Jun Miura, Ville Hautamäki

    Abstract: We demonstrate experimental results with LLMs that address robotics task planning problems. Recently, LLMs have been applied in robotics task planning, particularly using a code generation approach that converts complex high-level instructions into mid-level policy codes. In contrast, our approach acquires text descriptions of the task and scene objects, then formulates task planning through natur…

    Submitted 6 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: 8 pages, 2 figures

    ACM Class: I.2.9; I.2.7

  4. Zero-shot Imitation Policy via Search in Demonstration Dataset

    Authors: Federico Malato, Florian Leopold, Andrew Melnik, Ville Hautamaki

    Abstract: Behavioral cloning uses a dataset of demonstrations to learn a policy. To overcome computationally expensive training procedures and address the policy adaptation problem, we propose to use latent spaces of pre-trained foundation models to index a demonstration dataset, instantly access similar relevant experiences, and copy behavior from these situations. Actions from a selected similar situation…

    Submitted 29 January, 2024; originally announced January 2024.

  5. arXiv:2401.02626

    cs.SD eess.AS

    Gradient weighting for speaker verification in extremely low Signal-to-Noise Ratio

    Authors: Yi Ma, Kong Aik Lee, Ville Hautamäki, Meng Ge, Haizhou Li

    Abstract: Speaker verification is hampered by background noise, particularly at extremely low Signal-to-Noise Ratio (SNR) under 0 dB. It is difficult to suppress noise without introducing unwanted artifacts, which adversely affects speaker verification. We proposed the mechanism called Gradient Weighting (Grad-W), which dynamically identifies and reduces artifact noise during prediction. The mechanism is ba…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  6. arXiv:2306.09082

    cs.AI

    Behavioral Cloning via Search in Embedded Demonstration Dataset

    Authors: Federico Malato, Florian Leopold, Ville Hautamaki, Andrew Melnik

    Abstract: Behavioural cloning uses a dataset of demonstrations to learn a behavioural policy. To overcome various learning and policy adaptation problems, we propose to use latent space to index a demonstration dataset, instantly access similar relevant experiences, and copy behavior from these situations. Actions from a selected similar situation can be performed by the agent until representations of the a…

    Submitted 15 June, 2023; originally announced June 2023.

  7. arXiv:2303.13512

    cs.AI

    Towards Solving Fuzzy Tasks with Human Feedback: A Retrospective of the MineRL BASALT 2022 Competition

    Authors: Stephanie Milani, Anssi Kanervisto, Karolis Ramanauskas, Sander Schulhoff, Brandon Houghton, Sharada Mohanty, Byron Galbraith, Ke Chen, Yan Song, Tianze Zhou, Bingquan Yu, He Liu, Kai Guan, Yujing Hu, Tangjie Lv, Federico Malato, Florian Leopold, Amogh Raut, Ville Hautamäki, Andrew Melnik, Shu Ishida, João F. Henriques, Robert Klassert, Walter Laurito, Ellen Novoseller, et al. (5 additional authors not shown)

    Abstract: To facilitate research in the direction of fine-tuning foundation models from human feedback, we held the MineRL BASALT Competition on Fine-Tuning from Human Feedback at NeurIPS 2022. The BASALT challenge asks teams to compete to develop algorithms to solve tasks with hard-to-specify reward functions in Minecraft. Through this competition, we aimed to promote the development of algorithms that use…

    Submitted 23 March, 2023; originally announced March 2023.

  8. arXiv:2212.13326

    cs.LG cs.AI cs.CV

    Behavioral Cloning via Search in Video PreTraining Latent Space

    Authors: Federico Malato, Florian Leopold, Amogh Raut, Ville Hautamäki, Andrew Melnik

    Abstract: Our aim is to build autonomous agents that can solve tasks in environments like Minecraft. To do so, we used an imitation learning-based approach. We formulate our control problem as a search problem over a dataset of experts' demonstrations, where the agent copies actions from a similar demonstration trajectory of image-action pairs. We perform a proximity search over the BASALT MineRL-dataset in…

    Submitted 17 April, 2023; v1 submitted 26 December, 2022; originally announced December 2022.

  9. arXiv:2210.15385

    eess.AS cs.SD eess.SP

    Self-Supervised Training of Speaker Encoder with Multi-Modal Diverse Positive Pairs

    Authors: Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li

    Abstract: We study a novel neural architecture and its training strategies of speaker encoder for speaker recognition without using any identity labels. The speaker encoder is trained to extract a fixed-size speaker embedding from a spoken utterance of various length. Contrastive learning is a typical self-supervised learning technique. However, the quality of the speaker encoder depends very much on the sa…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: 13 pages

  10. arXiv:2205.07060

    cs.AI cs.CR cs.LG

    GAN-Aimbots: Using Machine Learning for Cheating in First Person Shooters

    Authors: Anssi Kanervisto, Tomi Kinnunen, Ville Hautamäki

    Abstract: Playing games with cheaters is not fun, and in a multi-billion-dollar video game industry with hundreds of millions of players, game developers aim to improve the security and, consequently, the user experience of their games by preventing cheating. Both traditional software-based methods and statistical systems have been successful in protecting against cheating, but recent advances in the automa…

    Submitted 14 May, 2022; originally announced May 2022.

    Comments: Accepted to IEEE Transactions on Games. Source code available at https://github.com/miffyli/gan-aimbots

  11. arXiv:2203.05074

    cs.LG cs.AI cs.CV

    The Transitive Information Theory and its Application to Deep Generative Models

    Authors: Trung Ngo, Najwa Laabid, Ville Hautamäki, Merja Heinäniemi

    Abstract: Paradoxically, a Variational Autoencoder (VAE) could be pushed in two opposite directions, utilizing powerful decoder model for generating realistic images but collapsing the learned representation, or increasing regularization coefficient for disentangling representation but ultimately generating blurry examples. Existing methods narrow the issues to the rate-distortion trade-off between compress…

    Submitted 28 March, 2022; v1 submitted 9 March, 2022; originally announced March 2022.

  12. arXiv:2201.09709

    cs.SD cs.CR cs.LG eess.AS

    Optimizing Tandem Speaker Verification and Anti-Spoofing Systems

    Authors: Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen, Junichi Yamagishi

    Abstract: As automatic speaker verification (ASV) systems are vulnerable to spoofing attacks, they are typically used in conjunction with spoofing countermeasure (CM) systems to improve security. For example, the CM can first determine whether the input is human speech, then the ASV can determine whether this speech matches the speaker's identity. The performance of such a tandem system can be measured with…

    Submitted 24 January, 2022; originally announced January 2022.

    Comments: Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing. Published version available at: https://ieeexplore.ieee.org/document/9664367

    Journal ref: in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 477-488, 2022

  13. arXiv:2201.07719

    cs.AI

    Improving Behavioural Cloning with Human-Driven Dynamic Dataset Augmentation

    Authors: Federico Malato, Joona Jehkonen, Ville Hautamäki

    Abstract: Behavioural cloning has been extensively used to train agents and is recognized as a fast and solid approach to teach general behaviours based on expert trajectories. Such method follows the supervised learning paradigm and it strongly depends on the distribution of the data. In our paper, we show how combining behavioural cloning with human-in-the-loop training solves some of its flaws and provid…

    Submitted 19 January, 2022; originally announced January 2022.

    Comments: 6 pages, 5 figures, 2 code snippets, accepted at the AAAI-22 Workshop on Interactive Machine Learning

  14. arXiv:2110.03869

    eess.AS eess.SP

    Self-supervised Speaker Recognition with Loss-gated Learning

    Authors: Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li

    Abstract: In self-supervised learning for speaker recognition, pseudo labels are useful as the supervision signals. It is a known fact that a speaker recognition model doesn't always benefit from pseudo labels due to their unreliability. In this work, we observe that a speaker recognition network tends to model the data with reliable labels faster than those with unreliable labels. This motivates us to stud…

    Submitted 14 July, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: 5 pages, 3 figures

  15. arXiv:2110.00940

    cs.SD cs.AI eess.AS

    PL-EESR: Perceptual Loss Based END-TO-END Robust Speaker Representation Extraction

    Authors: Yi Ma, Kong Aik Lee, Ville Hautamaki, Haizhou Li

    Abstract: Speech enhancement aims to improve the perceptual quality of the speech signal by suppression of the background noise. However, excessive suppression may lead to speech distortion and speaker information loss, which degrades the performance of speaker embedding extraction. To alleviate this problem, we propose an end-to-end deep learning framework, dubbed PL-EESR, for robust speaker representation…

    Submitted 3 October, 2021; originally announced October 2021.

  16. arXiv:2109.13510

    cs.LG cs.CL cs.SD eess.AS

    VoxCeleb Enrichment for Age and Gender Recognition

    Authors: Khaled Hechmi, Trung Ngo Trong, Ville Hautamaki, Tomi Kinnunen

    Abstract: VoxCeleb datasets are widely used in speaker recognition studies. Our work serves two purposes. First, we provide speaker age labels and (an alternative) annotation of speaker gender. Second, we demonstrate the use of this metadata by constructing age and gender recognition models with different features and classifiers. We query different celebrity databases and apply consensus rules to derive ag…

    Submitted 20 December, 2021; v1 submitted 28 September, 2021; originally announced September 2021.

    Comments: Accepted for presentation at ASRU 2021; repository: https://github.com/hechmik/voxceleb_enrichment_age_gender

  17. arXiv:2107.00703

    cs.LG cs.AI

    Distilling Reinforcement Learning Tricks for Video Games

    Authors: Anssi Kanervisto, Christian Scheller, Yanick Schraner, Ville Hautamäki

    Abstract: Reinforcement learning (RL) research focuses on general solutions that can be applied across different domains. This results in methods that RL practitioners can use in almost any domain. However, recent studies often lack the engineering steps ("tricks") which may be needed to effectively use RL, such as reward shaping, curriculum learning, and splitting a large task into smaller chunks. Such tri…

    Submitted 1 July, 2021; originally announced July 2021.

    Comments: To appear in IEEE Conference on Games 2021. Experiment code is available at https://github.com/Miffyli/rl-human-prior-tricks

  18. arXiv:2104.10753

    cs.RO cs.AI cs.CV

    Multi-task Learning with Attention for End-to-end Autonomous Driving

    Authors: Keishi Ishihara, Anssi Kanervisto, Jun Miura, Ville Hautamäki

    Abstract: Autonomous driving systems need to handle complex scenarios such as lane following, avoiding collisions, taking turns, and responding to traffic signals. In recent years, approaches based on end-to-end behavioral cloning have demonstrated remarkable performance in point-to-point navigational scenarios, using a realistic simulator and standard benchmarks. Offline imitation learning is readily avail…

    Submitted 21 April, 2021; originally announced April 2021.

    Comments: Accepted to CVPR 2021 Workshop on Autonomous Driving

  19. arXiv:2012.04199

    cs.CV cs.LG

    Cost Sensitive Optimization of Deepfake Detector

    Authors: Ivan Kukanov, Janne Karttunen, Hannu Sillanpää, Ville Hautamäki

    Abstract: Since the invention of cinema, the manipulated videos have existed. But generating manipulated videos that can fool the viewer has been a time-consuming endeavor. With the dramatic improvements in the deep generative modeling, generating believable looking fake videos has become a reality. In the present work, we concentrate on the so-called deepfake videos, where the source face is swapped with t…

    Submitted 7 December, 2020; originally announced December 2020.

  20. arXiv:2012.01244

    cs.AI cs.NE

    General Characterization of Agents by States they Visit

    Authors: Anssi Kanervisto, Tomi Kinnunen, Ville Hautamäki

    Abstract: Behavioural characterizations (BCs) of decision-making agents, or their policies, are used to study outcomes of training algorithms and as part of the algorithms themselves to encourage unique policies, match expert policy or restrict changes to policy per update. However, previously presented solutions are not applicable in general, either due to lack of expressive power, computational constraint…

    Submitted 28 October, 2021; v1 submitted 2 December, 2020; originally announced December 2020.

    Comments: Deep Reinforcement Learning Workshop, NeurIPS 2021

  21. arXiv:2006.07698

    cs.CL cs.LG

    Transferring Monolingual Model to Low-Resource Language: The Case of Tigrinya

    Authors: Abrhalei Tela, Abraham Woubie, Ville Hautamaki

    Abstract: In recent years, transformer models have achieved great success in natural language processing (NLP) tasks. Most of the current state-of-the-art NLP results are achieved by using monolingual transformer models, where the model is pre-trained using a single language unlabelled text corpus. Then, the model is fine-tuned to the specific downstream task. However, the cost of pre-training a new transfo…

    Submitted 19 June, 2020; v1 submitted 13 June, 2020; originally announced June 2020.

  22. arXiv:2005.03374

    cs.AI cs.LG

    Playing Minecraft with Behavioural Cloning

    Authors: Anssi Kanervisto, Janne Karttunen, Ville Hautamäki

    Abstract: MineRL 2019 competition challenged participants to train sample-efficient agents to play Minecraft, by using a dataset of human gameplay and a limit number of steps the environment. We approached this task with behavioural cloning by predicting what actions human players would take, and reached fifth place in the final ranking. Despite being a simple algorithm, we observed the performance of such…

    Submitted 7 May, 2020; originally announced May 2020.

    Comments: To appear in Post Proceedings of the Competitions & Demonstrations Track @ NeurIPS2019. Source code available at https://github.com/Miffyli/minecraft-bc

  23. arXiv:2004.00981

    cs.AI

    Benchmarking End-to-End Behavioural Cloning on Video Games

    Authors: Anssi Kanervisto, Joonas Pussinen, Ville Hautamäki

    Abstract: Behavioural cloning, where a computer is taught to perform a task based on demonstrations, has been successfully applied to various video games and robotics tasks, with and without reinforcement learning. This also includes end-to-end approaches, where a computer plays a video game like humans do: by looking at the image displayed on the screen, and sending keystrokes to the game. As a general app…

    Submitted 18 May, 2020; v1 submitted 2 April, 2020; originally announced April 2020.

    Comments: To appear in IEEE Conference on Games 2020. Experiment code available at https://github.com/joonaspu/video-game-behavioural-cloning and https://github.com/joonaspu/ViControl

  24. arXiv:2004.00980

    cs.AI

    Action Space Shaping in Deep Reinforcement Learning

    Authors: Anssi Kanervisto, Christian Scheller, Ville Hautamäki

    Abstract: Reinforcement learning (RL) has been successful in training agents in various learning environments, including video-games. However, such work modifies and shrinks the action space from the game's original. This is to avoid trying "pointless" actions and to ease the implementation. Currently, this is mostly done based on intuition, with little systematic research supporting the design decisions. I…

    Submitted 26 May, 2020; v1 submitted 2 April, 2020; originally announced April 2020.

    Comments: To appear in IEEE Conference on Games 2020. Experiment code is available at https://github.com/Miffyli/rl-action-space-shaping

  25. arXiv:2002.03801

    eess.AS cs.LG cs.SD stat.ML

    An initial investigation on optimizing tandem speaker verification and countermeasure systems using reinforcement learning

    Authors: Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen, Junichi Yamagishi

    Abstract: The spoofing countermeasure (CM) systems in automatic speaker verification (ASV) are not typically used in isolation of each other. These systems can be combined, for example, into a cascaded system where CM produces first a decision whether the input is synthetic or bona fide speech. In case the CM decides it is a bona fide sample, then the ASV system will consider it for speaker verification. En…

    Submitted 8 April, 2020; v1 submitted 6 February, 2020; originally announced February 2020.

    Comments: Odyssey 2020 The Speaker and Language Recognition Workshop. Code available at https://github.com/Miffyli/asv-cm-reinforce

  26. arXiv:1907.03164

    cs.LG eess.AS stat.ML

    Towards Debugging Deep Neural Networks by Generating Speech Utterances

    Authors: Bilal Soomro, Anssi Kanervisto, Trung Ngo Trong, Ville Hautamäki

    Abstract: Deep neural networks (DNN) are able to successfully process and classify speech utterances. However, understanding the reason behind a classification by DNN is difficult. One such debugging method used with image classification DNNs is activation maximization, which generates example-images that are classified as one of the classes. In this work, we evaluate applicability of this method to speech…

    Submitted 6 July, 2019; originally announced July 2019.

    Comments: Accepted to Interspeech 2019

  27. arXiv:1905.04192

    cs.LG cs.AI cs.SD stat.ML

    Do Autonomous Agents Benefit from Hearing?

    Authors: Abraham Woubie, Anssi Kanervisto, Janne Karttunen, Ville Hautamaki

    Abstract: Mapping states to actions in deep reinforcement learning is mainly based on visual information. The commonly used approach for dealing with visual information is to extract pixels from images and use them as state representation for reinforcement learning agent. But, any vision only agent is handicapped by not being able to sense audible cues. Using hearing, animals are able to sense targets that…

    Submitted 10 May, 2019; originally announced May 2019.

  28. arXiv:1905.00741

    cs.LG cs.AI cs.RO

    From Video Game to Real Robot: The Transfer between Action Spaces

    Authors: Janne Karttunen, Anssi Kanervisto, Ville Kyrki, Ville Hautamäki

    Abstract: Deep reinforcement learning has proven to be successful for learning tasks in simulated environments, but applying same techniques for robots in real-world domain is more challenging, as they require hours of training. To address this, transfer learning can be used to train the policy first in a simulated environment and then transfer it to physical agent. As the simulation never matches reality p…

    Submitted 23 March, 2020; v1 submitted 2 May, 2019; originally announced May 2019.

    Comments: Two first authors contributed equally. Accepted by ICASSP 2020

  29. arXiv:1904.07386

    eess.AS cs.CL cs.SD

    I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences

    Authors: Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, et al. (21 additional authors not shown)

    Abstract: The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE). The latest edition of such joint submission was in SRE 2018, in which the I4U submission was among the best-performing systems. SRE'18 also marks the 10-year anniversary of I4U consortium into NIST SRE series of evaluation. The primary objective of the current paper is to summarize the res…

    Submitted 15 April, 2019; originally announced April 2019.

    Comments: 5 pages

  30. arXiv:1811.03293

    eess.AS cs.SD

    Who Do I Sound Like? Showcasing Speaker Recognition Technology by YouTube Voice Search

    Authors: Ville Vestman, Bilal Soomro, Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen

    Abstract: The popularization of science can often be disregarded by scientists as it may be challenging to put highly sophisticated research into words that general public can understand. This work aims to help presenting speaker recognition research to public by proposing a publicly appealing concept for showcasing recognition systems. We leverage data from YouTube and use it in a large-scale voice search…

    Submitted 10 February, 2019; v1 submitted 8 November, 2018; originally announced November 2018.

    Comments: Accepted for presentation in ICASSP 2019

  31. arXiv:1807.10110

    cs.AI cs.LG

    ToriLLE: Learning Environment for Hand-to-Hand Combat

    Authors: Anssi Kanervisto, Ville Hautamäki

    Abstract: We present Toribash Learning Environment (ToriLLE), a learning environment for machine learning agents based on the video game Toribash. Toribash is a MuJoCo-like environment of two humanoid character fighting each other hand-to-hand, controlled by changing actuation modes of the joints. Competitive nature of Toribash as well its focused domain provide a platform for evaluating self-play methods,…

    Submitted 4 June, 2019; v1 submitted 26 July, 2018; originally announced July 2018.

    Comments: https://github.com/Miffyli/ToriLLE . Accepted to IEEE Conference on Games 2019

  32. arXiv:1804.11067

    cs.AI cs.CL cs.LG stat.ML

    Staircase Network: structural language identification via hierarchical attentive units

    Authors: Trung Ngo Trong, Ville Hautamäki, Kristiina Jokinen

    Abstract: Language recognition system is typically trained directly to optimize classification error on the target language labels, without using the external, or meta-information in the estimation of the model parameters. However labels are not independent of each other, there is a dependency enforced by, for example, the language family, which affects negatively on classification. The other external infor…

    Submitted 30 April, 2018; originally announced April 2018.

  33. arXiv:1804.08910

    cs.SD cs.CY eess.AS

    Perceptual Evaluation of the Effectiveness of Voice Disguise by Age Modification

    Authors: Rosa González Hautamäki, Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen

    Abstract: Voice disguise, purposeful modification of one's speaker identity with the aim of avoiding being identified as oneself, is a low-effort way to fool speaker recognition, whether performed by a human or an automatic speaker verification (ASV) system. We present an evaluation of the effectiveness of age stereotypes as a voice disguise strategy, as a follow up to our recent work where 60 native Finnis…

    Submitted 28 May, 2018; v1 submitted 24 April, 2018; originally announced April 2018.

    Comments: Accepted to Speaker Odyssey 2018: The Speaker and Language Recognition Workshop

  34. arXiv:1602.01929

    cs.CL

    Fantastic 4 system for NIST 2015 Language Recognition Evaluation

    Authors: Kong Aik Lee, Ville Hautamäki, Anthony Larcher, Wei Rao, Hanwu Sun, Trung Hieu Nguyen, Guangsen Wang, Aleksandr Sizov, Ivan Kukanov, Amir Poorjam, Trung Ngo Trong, Xiong Xiao, Cheng-Lin Xu, Hai-Hua Xu, Bin Ma, Haizhou Li, Sylvain Meignier

    Abstract: This article describes the systems jointly submitted by Institute for Infocomm (I$^2$R), the Laboratoire d'Informatique de l'Université du Maine (LIUM), Nanyang Technology University (NTU) and the University of Eastern Finland (UEF) for 2015 NIST Language Recognition Evaluation (LRE). The submitted system is a fusion of nine sub-systems based on i-vectors extracted from different types of features…

    Submitted 5 February, 2016; originally announced February 2016.

    Comments: Technical report for NIST LRE 2015 Workshop