Showing 1–28 of 28 results for author: Tan, H

  1. arXiv:2407.00826  [pdf, other]

    cs.CL cs.SD eess.AS

    NAIST Simultaneous Speech Translation System for IWSLT 2024

    Authors: Yuka Ko, Ryo Fukuda, Yuta Nishikawa, Yasumasa Kano, Tomoya Yanagita, Kosuke Doi, Mana Makinae, Haotian Tan, Makoto Sakai, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: This paper describes NAIST's submission to the simultaneous track of the IWSLT 2024 Evaluation Campaign: English-to-{German, Japanese, Chinese} speech-to-text translation and English-to-Japanese speech-to-speech translation. We develop a multilingual end-to-end speech-to-text translation model combining two pre-trained language models, HuBERT and mBART. We trained this model with two decoding poli…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: IWSLT 2024 system paper

  2. arXiv:2406.12164  [pdf, other]

    cs.SD cs.AI eess.AS

    A Mel Spectrogram Enhancement Paradigm Based on CWT in Speech Synthesis

    Authors: Guoqiang Hu, Huaning Tan, Ruilai Li

    Abstract: Acoustic features play an important role in improving the quality of the synthesised speech. Currently, the Mel spectrogram is a widely employed acoustic feature in most acoustic models. However, due to the fine-grained loss caused by its Fourier transform process, the clarity of speech synthesised by Mel spectrogram is compromised in mutant signals. In order to obtain a more detailed Mel spectrog…

    Submitted 17 June, 2024; originally announced June 2024.

  3. arXiv:2404.17357  [pdf, other]

    eess.IV cs.CV

    Simultaneous Tri-Modal Medical Image Fusion and Super-Resolution using Conditional Diffusion Model

    Authors: Yushen Xu, Xiaosong Li, Yuchan Jie, Haishu Tan

    Abstract: In clinical practice, tri-modal medical image fusion, compared to the existing dual-modal technique, can provide a more comprehensive view of the lesions, aiding physicians in evaluating the disease's shape, location, and biological activity. However, due to the limitations of imaging equipment and considerations for patient safety, the quality of medical images is usually limited, leading to sub-…

    Submitted 13 May, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

  4. arXiv:2404.17126  [pdf, other]

    cs.LG cs.AI eess.IV physics.med-ph

    Deep Evidential Learning for Dose Prediction

    Authors: Hai Siong Tan, Kuancheng Wang, Rafe Mcbeth

    Abstract: In this work, we present a novel application of an uncertainty-quantification framework called Deep Evidential Learning in the domain of radiotherapy dose prediction. Using medical images of the Open Knowledge-Based Planning Challenge dataset, we found that this model can be effectively harnessed to yield uncertainty estimates that inherited correlations with prediction errors upon completion of n…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 24 pages, 8 figures

  5. arXiv:2403.17770  [pdf, other]

    eess.IV cs.CV

    CT Synthesis with Conditional Diffusion Models for Abdominal Lymph Node Segmentation

    Authors: Yongrui Yu, Hanyu Chen, Zitian Zhang, Qiong Xiao, Wenhui Lei, Linrui Dai, Yu Fu, Hui Tan, Guan Wang, Peng Gao, Xiaofan Zhang

    Abstract: Despite the significant success achieved by deep learning methods in medical image segmentation, researchers still struggle in the computer-aided diagnosis of abdominal lymph nodes due to the complex abdominal environment, small and indistinguishable lesions, and limited annotated data. To address these problems, we present a pipeline that integrates the conditional diffusion model for lymph node…

    Submitted 26 March, 2024; originally announced March 2024.

  6. arXiv:2403.10024  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    MR-MT3: Memory Retaining Multi-Track Music Transcription to Mitigate Instrument Leakage

    Authors: Hao Hao Tan, Kin Wai Cheuk, Taemin Cho, Wei-Hsiang Liao, Yuki Mitsufuji

    Abstract: This paper presents enhancements to the MT3 model, a state-of-the-art (SOTA) token-based multi-instrument automatic music transcription (AMT) model. Despite SOTA performance, MT3 has the issue of instrument leakage, where transcriptions are fragmented across different instruments. To mitigate this, we propose MR-MT3, with enhancements including a memory retention mechanism, prior token sampling, a…

    Submitted 15 March, 2024; originally announced March 2024.

  7. arXiv:2401.03173  [pdf, other]

    eess.IV cs.CV cs.LG

    UGGNet: Bridging U-Net and VGG for Advanced Breast Cancer Diagnosis

    Authors: Tran Cao Minh, Nguyen Kim Quoc, Phan Cong Vinh, Dang Nhu Phu, Vuong Xuan Chi, Ha Minh Tan

    Abstract: In the field of medical imaging, breast ultrasound has emerged as a crucial diagnostic tool for early detection of breast cancer. However, the accuracy of diagnosing the location of the affected area and the extent of the disease depends on the experience of the physician. In this paper, we propose a novel model called UGGNet, combining the power of the U-Net and VGG architectures to enhance the p…

    Submitted 6 January, 2024; originally announced January 2024.

    Comments: Submitted to the journal "EAI Endorsed Transactions on Context-aware Systems and Applications", 2 images, 5 data tables

    Journal ref: EAI Endorsed Transactions on Context-aware Systems and Applications, 10(1), 2024

  8. arXiv:2311.06572  [pdf, other]

    eess.IV cs.CV

    Swin UNETR++: Advancing Transformer-Based Dense Dose Prediction Towards Fully Automated Radiation Oncology Treatments

    Authors: Kuancheng Wang, Hai Siong Tan, Rafe Mcbeth

    Abstract: The field of Radiation Oncology is uniquely positioned to benefit from the use of artificial intelligence to fully automate the creation of radiation treatment plans for cancer therapy. This time-consuming and specialized task combines patient imaging with organ and tumor segmentation to generate a 3D radiation dose distribution to meet clinical treatment goals, similar to voxel-level dense predic…

    Submitted 17 March, 2024; v1 submitted 11 November, 2023; originally announced November 2023.

    Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2023, December 10th, 2023, New Orleans, United States, 16 pages

  9. arXiv:2311.00940  [pdf, other]

    eess.SY

    Dynamic Uploading Scheduling in mmWave-Based Sensor Networks via Mobile Blocker Detection

    Authors: Yifei Sun, Bojie Lv, Rui Wang, Haisheng Tan, Francis C. M. Lau

    Abstract: The freshness of information, measured as Age of Information (AoI), is critical for many applications in next-generation wireless sensor networks (WSNs). Due to its high bandwidth, millimeter wave (mmWave) communication is frequently exploited in WSNs to facilitate the deployment of bandwidth-demanding applications. However, the vulnerability of mmWave to user mobility typically results…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: 10 pages, 5 figures, accepted for publication at ICPADS23

  10. arXiv:2308.11162  [pdf, other]

    eess.IV cs.CV cs.LG q-bio.QM

    A Preliminary Investigation into Search and Matching for Tumour Discrimination in WHO Breast Taxonomy Using Deep Networks

    Authors: Abubakr Shafique, Ricardo Gonzalez, Liron Pantanowitz, Puay Hoon Tan, Alberto Machado, Ian A Cree, Hamid R. Tizhoosh

    Abstract: Breast cancer is one of the most common cancers affecting women worldwide. It includes a group of malignant neoplasms with a variety of biological, clinical, and histopathological characteristics. There are more than 35 different histological forms of breast lesions that can be classified and diagnosed histologically according to cell morphology, growth, and architecture patterns. Recently, deep…

    Submitted 21 August, 2023; originally announced August 2023.

  11. arXiv:2303.16734  [pdf, other]

    eess.SY

    Predictive Resource Allocation in mmWave Systems with Rotation Detection

    Authors: Yifei Sun, Bojie Lv, Rui Wang, Haisheng Tan, Francis C. M. Lau

    Abstract: Millimeter wave (mmWave) has been regarded as a promising technology to support high-capacity communications in the 5G era. However, its long-term high-layer performance, such as latency and packet drop rate, depends heavily on resource allocation, because the mmWave channel suffers significant fluctuation with rotating users due to the sparse mmWave channel property and limited field-of-view (FoV) of ante…

    Submitted 29 March, 2023; originally announced March 2023.

    Comments: 7 pages, 5 figures. Paper accepted for publication in IEEE International Conference on Communications, 2023

  12. arXiv:2212.09988  [pdf, other]

    cs.CV eess.IV

    Multi-Reference Image Super-Resolution: A Posterior Fusion Approach

    Authors: Ke Zhao, Haining Tan, Tsz Fung Yau

    Abstract: Reference-based Super-resolution (RefSR) approaches have recently been proposed to overcome the ill-posed problem of image super-resolution by providing additional information from a high-resolution image. Multi-reference super-resolution extends this approach by allowing more information to be incorporated. This paper proposes a 2-step-weighting posterior fusion approach to combine the outputs of…

    Submitted 19 December, 2022; originally announced December 2022.

  13. arXiv:2208.02250  [pdf]

    cs.SD cs.AI cs.CL cs.CR eess.AS

    Adversarial Attacks on ASR Systems: An Overview

    Authors: Xiao Zhang, Hao Tan, Xuan Huang, Denghui Zhang, Keke Tang, Zhaoquan Gu

    Abstract: With the development of hardware and algorithms, ASR (Automatic Speech Recognition) systems have evolved considerably. As the models get simpler and development and deployment become easier, ASR systems are getting closer to our lives. On the one hand, we often use apps or APIs of ASR to generate subtitles and record meetings. On the other hand, smart speakers and self-driving cars rely on ASR syste…

    Submitted 3 August, 2022; originally announced August 2022.

  14. An Indoor Environment Sensing and Localization System via mmWave Phased Array

    Authors: Yifei Sun, Jie Li, Tong Zhang, Rui Wang, Xiaohui Peng, Tony Xiao Han, Haisheng Tan

    Abstract: An indoor layout sensing and localization system in 60GHz millimeter wave (mmWave) band, named mmReality, is elaborated in this paper. The mmReality system consists of one transmitter and one mobile receiver, each with a phased array and a single radio frequency (RF) chain. To reconstruct the room layout, the pilot signal is delivered from the transmitter to the receiver via different pairs of tra…

    Submitted 9 January, 2023; v1 submitted 7 June, 2022; originally announced June 2022.

    Comments: Paper accepted for publication in Journal of Communications and Information Networks, 2022

  15. arXiv:2201.01669  [pdf, other]

    eess.AS cs.LG cs.SD

    Using Deep Learning with Large Aggregated Datasets for COVID-19 Classification from Cough

    Authors: Esin Darici Haritaoglu, Nicholas Rasmussen, Daniel C. H. Tan, Jennifer Ranjani J., Jaclyn Xiao, Gunvant Chaudhari, Akanksha Rajput, Praveen Govindan, Christian Canham, Wei Chen, Minami Yamaura, Laura Gomezjurado, Aaron Broukhim, Amil Khanzada, Mert Pilanci

    Abstract: The COVID-19 pandemic has been one of the most devastating events in recent history, claiming the lives of more than 5 million people worldwide. Even with the worldwide distribution of vaccines, there is an apparent need for affordable, reliable, and accessible screening techniques to serve parts of the world that do not have access to Western medicine. Artificial Intelligence can provide a soluti…

    Submitted 29 March, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

  16. arXiv:2112.14574  [pdf]

    eess.SY cs.CY cs.HC cs.RO

    Industry 4.0: Challenges and success factors for adopting digital technologies in airports

    Authors: Jia Hao Tan, Tariq Masood

    Abstract: With the advent of Industry 4.0 technologies in the last decade, airports have undergone digitalisation to capitalise on the purported benefits of these technologies such as improved operational efficiency and passenger experience. The ongoing COVID-19 pandemic, with the emergence of its variants (e.g. Delta, Omicron), has exacerbated the need for airports to adopt new technologies such as contactless a…

    Submitted 29 December, 2021; originally announced December 2021.

    Comments: 25 pages, 4 figures, 9 tables

  17. arXiv:2112.14333  [pdf]

    eess.SY cs.HC cs.RO

    Adoption of Industry 4.0 technologies in airports -- A systematic literature review

    Authors: Jia Hao Tan, Tariq Masood

    Abstract: Airports have been constantly evolving and adopting digital technologies to improve operational efficiency, enhance passenger experience, generate ancillary revenues and boost capacity from existing infrastructure. The COVID-19 pandemic has also challenged airports and aviation stakeholders alike to adapt and manage new operational challenges such as facilitating a contactless travel experience an…

    Submitted 28 December, 2021; originally announced December 2021.

    Comments: 25 pages, 2 figures, 2 tables, 106 references

  18. arXiv:2112.00702  [pdf, other]

    cs.SD cs.LG eess.AS

    Semi-supervised music emotion recognition using noisy student training and harmonic pitch class profiles

    Authors: Hao Hao Tan

    Abstract: We present Mirable's submission to the 2021 Emotions and Themes in Music challenge. In this work, we intend to address the question: can we leverage semi-supervised learning techniques on music emotion recognition? With that, we experiment with noisy student training, which has improved model performance in the image classification domain. As the noisy student method requires a strong teacher mode…

    Submitted 9 December, 2021; v1 submitted 1 December, 2021; originally announced December 2021.

    Comments: MediaEval 2021 submission for Emotion and Themes in Music

  19. arXiv:2108.07007  [pdf, other]

    cs.CV cs.HC cs.RO eess.IV

    Flying Guide Dog: Walkable Path Discovery for the Visually Impaired Utilizing Drones and Transformer-based Semantic Segmentation

    Authors: Haobin Tan, Chang Chen, Xinyu Luo, Jiaming Zhang, Constantin Seibold, Kailun Yang, Rainer Stiefelhagen

    Abstract: Lacking the ability to sense ambient environments effectively, blind and visually impaired people (BVIP) face difficulty in walking outdoors, especially in urban areas. Therefore, tools for assisting BVIP are of great importance. In this paper, we propose a novel "flying guide dog" prototype for BVIP assistance using a drone and street-view semantic segmentation. Based on the walkable areas extracte…

    Submitted 16 August, 2021; originally announced August 2021.

    Comments: Code, dataset, and video demo will be made publicly available at https://github.com/EckoTan0804/flying-guide-dog

  20. arXiv:2102.08015  [pdf]

    cs.SD cs.CL eess.AS

    Improving speech recognition models with small samples for air traffic control systems

    Authors: Yi Lin, Qin Li, Bo Yang, Zhen Yan, Huachun Tan, Zhengmao Chen

    Abstract: In the domain of air traffic control (ATC) systems, efforts to train a practical automatic speech recognition (ASR) model always face the problem of small training samples, since the collection and annotation of speech samples is an expert- and domain-dependent task. In this work, a novel training approach based on pretraining and transfer learning is proposed to address this issue, and an improved…

    Submitted 16 February, 2021; originally announced February 2021.

    Comments: This work has been accepted by Neurocomputing for publication

  21. arXiv:2012.04885  [pdf]

    eess.IV cs.CV cs.LG

    Annotation-efficient deep learning for automatic medical image segmentation

    Authors: Shanshan Wang, Cheng Li, Rongpin Wang, Zaiyi Liu, Meiyun Wang, Hongna Tan, Yaping Wu, Xinfeng Liu, Hui Sun, Rui Yang, Xin Liu, Jie Chen, Huihui Zhou, Ismail Ben Ayed, Hairong Zheng

    Abstract: Automatic medical image segmentation plays a critical role in scientific research and medical care. Existing high-performance deep learning methods typically rely on large training datasets with high-quality manual annotations, which are difficult to obtain in many clinical applications. Here, we introduce Annotation-effIcient Deep lEarning (AIDE), an open-source framework to handle imperfect trai…

    Submitted 23 September, 2021; v1 submitted 9 December, 2020; originally announced December 2020.

  22. arXiv:2007.15474  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Music FaderNets: Controllable Music Generation Based On High-Level Features via Low-Level Feature Modelling

    Authors: Hao Hao Tan, Dorien Herremans

    Abstract: High-level musical qualities (such as emotion) are often abstract, subjective, and hard to quantify. Given these difficulties, it is not easy to learn good feature representations with supervised learning techniques, either because of the insufficiency of labels, or the subjectiveness (and hence large variance) in human-annotated labels. In this paper, we present a framework that can learn high-le…

    Submitted 29 July, 2020; originally announced July 2020.

    Journal ref: Proc. of 21st International Society of Music Information Retrieval Conference, ISMIR 2020

  23. arXiv:2006.09833  [pdf, other]

    eess.AS cs.LG cs.MM cs.SD

    Generative Modelling for Controllable Audio Synthesis of Expressive Piano Performance

    Authors: Hao Hao Tan, Yin-Jyun Luo, Dorien Herremans

    Abstract: We present a controllable neural audio synthesizer based on Gaussian Mixture Variational Autoencoders (GM-VAE), which can generate realistic piano performances in the audio domain that closely follow temporal conditions of two essential style features for piano performances: articulation and dynamics. We demonstrate how the model is able to apply fine-grained style morphing over the course of syn…

    Submitted 12 July, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

    Journal ref: Published at the ICML Workshop on Machine Learning for Media Discovery (ML4MD) 2020

  24. arXiv:2004.00879  [pdf, other]

    eess.SP cs.LG

    Enhance the performance of navigation: A two-stage machine learning approach

    Authors: Yimin Fan, Zhiyuan Wang, Yuanpeng Lin, Haisheng Tan

    Abstract: Real-time traffic navigation is an important capability in smart transportation technologies, which has been extensively studied in recent years. Due to the vast development of edge devices, collecting real-time traffic data is no longer a problem. However, real traffic navigation is still considered to be a particularly challenging problem because of the time-varying patterns of the traffic flow and…

    Submitted 2 April, 2020; originally announced April 2020.

    Comments: 8 pages, under review

  25. arXiv:2002.12588  [pdf, other]

    eess.IV cs.CV cs.LG

    Regional Registration of Whole Slide Image Stacks Containing Highly Deformed Artefacts

    Authors: Mahsa Paknezhad, Sheng Yang Michael Loh, Yukti Choudhury, Valerie Koh Cui Koh, Timothy Tay Kwang Yong, Hui Shan Tan, Ravindran Kanesvaran, Puay Hoon Tan, John Yuen Shyi Peng, Weimiao Yu, Yongcheng Benjamin Tan, Yong Zhen Loy, Min-Han Tan, Hwee Kuan Lee

    Abstract: Motivation: High resolution 2D whole slide imaging provides rich information about the tissue structure. This information can be a lot richer if these 2D images can be stacked into a 3D tissue volume. A 3D analysis, however, requires accurate reconstruction of the tissue volume from the 2D image stack. This task is not trivial due to the distortions that each individual tissue slice experiences wh…

    Submitted 28 February, 2020; originally announced February 2020.

  26. arXiv:1911.12796  [pdf, other]

    cs.CV cs.LG eess.IV

    Light-weight Calibrator: a Separable Component for Unsupervised Domain Adaptation

    Authors: Shaokai Ye, Kailu Wu, Mu Zhou, Yunfei Yang, Sia huat Tan, Kaidi Xu, Jiebo Song, Chenglong Bao, Kaisheng Ma

    Abstract: Existing domain adaptation methods aim at learning features that can be generalized among domains. These methods commonly require updating the source classifier to adapt to the target domain and do not properly handle the trade-off between the source domain and the target domain. In this work, instead of training a classifier to adapt to the target domain, we use a separable component called data cal…

    Submitted 28 February, 2020; v1 submitted 28 November, 2019; originally announced November 2019.

    Comments: Accepted by CVPR2020

  27. arXiv:1911.00364  [pdf, other]

    eess.IV cs.CV

    Validation of a deep learning mammography model in a population with low screening rates

    Authors: Kevin Wu, Eric Wu, Yaping Wu, Hongna Tan, Greg Sorensen, Meiyun Wang, Bill Lotter

    Abstract: A key promise of AI applications in healthcare is in increasing access to quality medical care in under-served populations and emerging markets. However, deep learning models are often only trained on data from advantaged populations that have the infrastructure and resources required for large-scale data collection. In this paper, we aim to empirically investigate the potential impact of such bia…

    Submitted 1 November, 2019; originally announced November 2019.

    Journal ref: NeurIPS 2019. Fair ML for Health Workshop

  28. arXiv:1810.12093  [pdf]

    eess.SP

    80-Channel WDM-MDM Transmission over 50-km Ring-Core Fiber Using a Compact OAM DEMUX and Modular 4x4 MIMO Equalization

    Authors: Junwei Zhang, Yuanhui Wen, Heyun Tan, Jie Liu, Lei Shen, Maochun Wang, Jiangbo Zhu, Changjian Guo, Yujie Chen, Zhaohui Li, Siyuan Yu

    Abstract: 8-OAM modes each carrying 10 wavelengths with 2.56-Tbit/s aggregated capacity and 10.24-bit/s/Hz spectral efficiency have been transmitted over 50-km specially designed ring-core fiber, using a compact OAM mode sorter and only modular 4x4 MIMO equalization.

    Submitted 22 October, 2018; originally announced October 2018.

    Comments: 3 pages, 2 figures, conference