Showing 1–29 of 29 results for author: Hsieh, T

Searching in archive cs.
  1. arXiv:2401.07882  [pdf, other]

    cs.SD eess.AS

    On the Importance of Neural Wiener Filter for Resource Efficient Multichannel Speech Enhancement

    Authors: Tsun-An Hsieh, Jacob Donley, Daniel Wong, Buye Xu, Ashutosh Pandey

    Abstract: We introduce a time-domain framework for efficient multichannel speech enhancement, emphasizing low latency and computational efficiency. This framework incorporates two compact deep neural networks (DNNs) surrounding a multichannel neural Wiener filter (NWF). The first DNN enhances the speech signal to estimate NWF coefficients, while the second DNN refines the output from the NWF. The NWF, while…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: Accepted for publication at ICASSP

  2. arXiv:2312.15320  [pdf]

    q-bio.QM cs.CV cs.LG cs.MM q-bio.GN

    GestaltMML: Enhancing Rare Genetic Disease Diagnosis through Multimodal Machine Learning Combining Facial Images and Clinical Texts

    Authors: Da Wu, Jingye Yang, Cong Liu, Tzung-Chien Hsieh, Elaine Marchi, Justin Blair, Peter Krawitz, Chunhua Weng, Wendy Chung, Gholson J. Lyon, Ian D. Krantz, Jennifer M. Kalish, Kai Wang

    Abstract: Individuals with suspected rare genetic disorders often undergo multiple clinical evaluations, imaging studies, laboratory tests and genetic tests, to find a possible answer over a prolonged period of time. Addressing this "diagnostic odyssey" thus has substantial clinical, psychosocial, and economic benefits. Many rare genetic diseases have distinctive facial features, which can be used by artifi…

    Submitted 21 April, 2024; v1 submitted 23 December, 2023; originally announced December 2023.

    Comments: Significant revisions

  3. arXiv:2312.01644  [pdf]

    eess.IV cs.CV

    TMSR: Tiny Multi-path CNNs for Super Resolution

    Authors: Chia-Hung Liu, Tzu-Hsin Hsieh, Kuan-Yu Huang, Pei-Yin Chen

    Abstract: In this paper, we propose a tiny multi-path CNN-based Super-Resolution (SR) method, called TMSR. We mainly refer to some tiny CNN-based SR methods, under 5k parameters. The main contribution of the proposed method is the improved multi-path learning and self-defined activation function. The experimental results show that TMSR obtains competitive image quality (i.e. PSNR and SSIM) compared to the r…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: 5 pages, 7 figures, published in the IEEE Eurasia Conference on IoT, Communication and Engineering proceedings 2023

  4. arXiv:2311.11783  [pdf]

    cs.HC cs.MM

    CityScope: Enhanced Localization and Synchronizing AR for Dynamic Urban Weather Visualization

    Authors: Tzu Hsin Hsieh

    Abstract: CityScope uses augmented reality (AR) to change our interaction with weather data. The main goal is to develop real-time 3D weather visualizations, with Taiwan as the model. It displays live weather data from the Central Weather Bureau (CWB), projected onto a physical representation of Taiwan's landscape. A pivotal advancement in our project is the integration of AprilTag with plane detection tech…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: 9 pages, 15 figures

  5. arXiv:2307.10490  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    Abusing Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs

    Authors: Eugene Bagdasaryan, Tsung-Yin Hsieh, Ben Nassi, Vitaly Shmatikov

    Abstract: We demonstrate how images and sounds can be used for indirect prompt and instruction injection in multi-modal LLMs. An attacker generates an adversarial perturbation corresponding to the prompt and blends it into an image or audio recording. When the user asks the (unmodified, benign) model about the perturbed image or audio, the perturbation steers the model to output the attacker-chosen text and…

    Submitted 3 October, 2023; v1 submitted 19 July, 2023; originally announced July 2023.

  6. arXiv:2305.02143  [pdf, other]

    cs.CV cs.MM

    GANonymization: A GAN-based Face Anonymization Framework for Preserving Emotional Expressions

    Authors: Fabio Hellmann, Silvan Mertes, Mohamed Benouis, Alexander Hustinx, Tzung-Chien Hsieh, Cristina Conati, Peter Krawitz, Elisabeth André

    Abstract: In recent years, the increasing availability of personal data has raised concerns regarding privacy and security. One of the critical processes to address these concerns is data anonymization, which aims to protect individual privacy and prevent the release of sensitive information. This research focuses on the importance of face anonymization. Therefore, we introduce GANonymization, a novel face…

    Submitted 14 November, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: 26 pages, 11 figures, 6 tables, ACM Transactions on Multimedia Computing, Communications, and Applications

  7. arXiv:2211.06764  [pdf, other]

    cs.CV cs.AI q-bio.GN

    Improving Deep Facial Phenotyping for Ultra-rare Disorder Verification Using Model Ensembles

    Authors: Alexander Hustinx, Fabio Hellmann, Ömer Sümer, Behnam Javanmardi, Elisabeth André, Peter Krawitz, Tzung-Chien Hsieh

    Abstract: Rare genetic disorders affect more than 6% of the global population. Reaching a diagnosis is challenging because rare disorders are very diverse. Many disorders have recognizable facial features that are hints for clinicians to diagnose patients. Previous work, such as GestaltMatcher, utilized representation vectors produced by a DCNN similar to AlexNet to match patients in high-dimensional featur…

    Submitted 12 November, 2022; originally announced November 2022.

    Journal ref: 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

  8. arXiv:2211.01189  [pdf, other]

    eess.AS cs.AI cs.LG cs.NE cs.SD

    Inference and Denoise: Causal Inference-based Neural Speech Enhancement

    Authors: Tsun-An Hsieh, Chao-Han Huck Yang, Pin-Yu Chen, Sabato Marco Siniscalchi, Yu Tsao

    Abstract: This study addresses the speech enhancement (SE) task within the causal inference paradigm by modeling the noise presence as an intervention. Based on the potential outcome framework, the proposed causal inference-based speech enhancement (CISE) separates clean and noisy frames in an intervened noisy speech using a noise detector and assigns both sets of frames to two mask-based enhancement module…

    Submitted 2 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  9. Few-Shot Meta Learning for Recognizing Facial Phenotypes of Genetic Disorders

    Authors: Ömer Sümer, Fabio Hellmann, Alexander Hustinx, Tzung-Chien Hsieh, Elisabeth André, Peter Krawitz

    Abstract: Computer vision-based methods have valuable use cases in precision medicine, and recognizing facial phenotypes of genetic disorders is one of them. Many genetic disorders are known to affect faces' visual appearance and geometry. Automated classification and similarity retrieval aid physicians in decision-making to diagnose possible genetic conditions as early as possible. Previous work has addres…

    Submitted 24 May, 2023; v1 submitted 23 October, 2022; originally announced October 2022.

    Comments: This paper is accepted for publication at MIE 2023 Conference

  10. arXiv:2202.09907  [pdf, other]

    cs.SD eess.AS

    Towards Automatic Transcription of Polyphonic Electric Guitar Music: A New Dataset and a Multi-Loss Transformer Model

    Authors: Yu-Hua Chen, Wen-Yi Hsiao, Tsu-Kuang Hsieh, Jyh-Shing Roger Jang, Yi-Hsuan Yang

    Abstract: In this paper, we propose a new dataset named EGDB, that contains transcriptions of the electric guitar performance of 240 tablatures rendered with different tones. Moreover, we benchmark the performance of two well-known transcription models proposed originally for the piano on this dataset, along with a multi-loss Transformer model that we newly propose. Our evaluation on this dataset and a se…

    Submitted 20 February, 2022; originally announced February 2022.

    Comments: to be published at ICASSP 2022

  11. arXiv:2201.09208  [pdf]

    cs.CV eess.SP

    Design of Sensor Fusion Driver Assistance System for Active Pedestrian Safety

    Authors: I-Hsi Kao, Ya-Zhu Yian, Jian-An Su, Yi-Horng Lai, Jau-Woei Perng, Tung-Li Hsieh, Yi-Shueh Tsai, Min-Shiu Hsieh

    Abstract: In this paper, we present a parallel architecture for a sensor fusion detection system that combines a camera and 1D light detection and ranging (lidar) sensor for object detection. The system contains two object detection methods, one based on optical flow, and the other using lidar. The two sensors effectively compensate for each other's weaknesses. The accurate longitudinal accuracy of the…

    Submitted 23 January, 2022; originally announced January 2022.

    Comments: The 14th International Conference on Automation Technology (Automation 2017), December 8-10, 2017, Kaohsiung, Taiwan

  12. arXiv:2111.05703  [pdf, other]

    eess.AS cs.SD

    OSSEM: one-shot speaker adaptive speech enhancement using meta learning

    Authors: Cheng Yu, Szu-Wei Fu, Tsun-An Hsieh, Yu Tsao, Mirco Ravanelli

    Abstract: Although deep learning (DL) has achieved notable progress in speech enhancement (SE), further research is still required for a DL-based SE system to adapt effectively and efficiently to particular speakers. In this study, we propose a novel meta-learning-based speaker-adaptive SE approach (called OSSEM) that aims to achieve SE model adaptation in a one-shot manner. OSSEM consists of a modified tra…

    Submitted 10 November, 2021; originally announced November 2021.

  13. arXiv:2106.05229  [pdf, other]

    cs.SD cs.LG eess.AS

    Speech Recovery for Real-World Self-powered Intermittent Devices

    Authors: Yu-Chen Lin, Tsun-An Hsieh, Kuo-Hsuan Hung, Cheng Yu, Harinath Garudadri, Yu Tsao, Tei-Wei Kuo

    Abstract: The incompleteness of speech inputs severely degrades the performance of all the related speech signal processing applications. Although many studies have been proposed to address this issue, they controlled the data missing conditions by simulation with self-defined masking lengths or sizes. Besides, the masking definitions differ among these experimental settings. This paper presen…

    Submitted 24 January, 2022; v1 submitted 9 June, 2021; originally announced June 2021.

  14. arXiv:2104.06402  [pdf, other]

    cs.CV

    DropLoss for Long-Tail Instance Segmentation

    Authors: Ting-I Hsieh, Esther Robb, Hwann-Tzong Chen, Jia-Bin Huang

    Abstract: Long-tailed class distributions are prevalent among the practical applications of object detection and instance segmentation. Prior work in long-tail instance segmentation addresses the imbalance of losses between rare and frequent categories by reducing the penalty for a model incorrectly predicting a rare class label. We demonstrate that the rare categories are heavily suppressed by correct back…

    Submitted 17 April, 2021; v1 submitted 13 April, 2021; originally announced April 2021.

    Comments: Code at https://github.com/timy90022/DropLoss

    Journal ref: AAAI 2021

  15. arXiv:2104.03538  [pdf]

    cs.SD cs.AI eess.AS

    MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement

    Authors: Szu-Wei Fu, Cheng Yu, Tsun-An Hsieh, Peter Plantinga, Mirco Ravanelli, Xugang Lu, Yu Tsao

    Abstract: The discrepancy between the cost function used for training a speech enhancement model and human auditory perception usually makes the quality of enhanced speech unsatisfactory. Objective evaluation metrics which consider human perception can hence serve as a bridge to reduce the gap. Our previously proposed MetricGAN was designed to optimize objective metrics by connecting the metric with a discr…

    Submitted 4 June, 2021; v1 submitted 8 April, 2021; originally announced April 2021.

    Comments: Accepted by Interspeech 2021

  16. arXiv:2011.11631  [pdf, ps, other]

    cs.LG

    Explainable Multivariate Time Series Classification: A Deep Neural Network Which Learns To Attend To Important Variables As Well As Informative Time Intervals

    Authors: Tsung-Yu Hsieh, Suhang Wang, Yiwei Sun, Vasant Honavar

    Abstract: Time series data is prevalent in a wide variety of real-world applications and it calls for trustworthy and explainable models for people to understand and fully trust decisions made by AI solutions. We consider the problem of building explainable classifiers from multi-variate time series data. A key criterion to understand such predictive models involves elucidating and quantifying the contribut…

    Submitted 23 November, 2020; originally announced November 2020.

  17. arXiv:2010.15174  [pdf, other]

    cs.SD cs.LG eess.AS

    Improving Perceptual Quality by Phone-Fortified Perceptual Loss using Wasserstein Distance for Speech Enhancement

    Authors: Tsun-An Hsieh, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao

    Abstract: Speech enhancement (SE) aims to improve speech quality and intelligibility, which are both related to a smooth transition in speech segments that may carry linguistic information, e.g. phones and syllables. In this study, we propose a novel phone-fortified perceptual loss (PFPL) that takes phonetic information into account for training SE models. To effectively incorporate the phonetic information…

    Submitted 27 April, 2021; v1 submitted 28 October, 2020; originally announced October 2020.

  18. arXiv:2006.10296  [pdf]

    eess.AS cs.LG cs.SD

    Boosting Objective Scores of a Speech Enhancement Model by MetricGAN Post-processing

    Authors: Szu-Wei Fu, Chien-Feng Liao, Tsun-An Hsieh, Kuo-Hsuan Hung, Syu-Siang Wang, Cheng Yu, Heng-Cheng Kuo, Ryandhimas E. Zezario, You-Jin Li, Shang-Yi Chuang, Yen-Ju Lu, Yu Tsao

    Abstract: The Transformer architecture has demonstrated a superior ability compared to recurrent neural networks in many different natural language processing applications. Therefore, our study applies a modified Transformer in a speech enhancement task. Specifically, positional encoding in the Transformer may not be necessary for speech enhancement, and hence, it is replaced by convolutional layers. To fur…

    Submitted 3 March, 2021; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: Accepted by APSIPA 2020

  19. arXiv:2004.04098  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    WaveCRN: An Efficient Convolutional Recurrent Neural Network for End-to-end Speech Enhancement

    Authors: Tsun-An Hsieh, Hsin-Min Wang, Xugang Lu, Yu Tsao

    Abstract: Due to the simple design pipeline, end-to-end (E2E) neural models for speech enhancement (SE) have attracted great interest. In order to improve the performance of the E2E model, the locality and temporal sequential properties of speech should be efficiently taken into account when modelling. However, in most current E2E models for SE, these properties are either not fully considered or are too co…

    Submitted 26 November, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

  20. arXiv:2002.06817  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    Addressing the confounds of accompaniments in singer identification

    Authors: Tsung-Han Hsieh, Kai-Hsiang Cheng, Zhe-Cheng Fan, Yu-Ching Yang, Yi-Hsuan Yang

    Abstract: Identifying singers is an important task with many applications. However, the task remains challenging due to many issues. One major issue is related to the confounding factors from the background instrumental music that is mixed with the vocals in music production. A singer identification model may learn to extract non-vocal related features from the instrumental part of the songs, if a singer on…

    Submitted 17 February, 2020; originally announced February 2020.

  21. arXiv:1911.12529  [pdf, other]

    cs.CV cs.LG eess.IV

    One-Shot Object Detection with Co-Attention and Co-Excitation

    Authors: Ting-I Hsieh, Yi-Chen Lo, Hwann-Tzong Chen, Tyng-Luh Liu

    Abstract: This paper aims to tackle the challenging problem of one-shot object detection. Given a query image patch whose class label is not included in the training data, the goal of the task is to detect all instances of the same class in a target image. To this end, we develop a novel co-attention and co-excitation (CoAE) framework that makes contributions in three key technical aspects. First, we…

    Submitted 28 November, 2019; originally announced November 2019.

    Comments: NeurIPS 2019

  22. arXiv:1909.06543  [pdf, other]

    cs.LG cs.CR cs.SI stat.ML

    Node Injection Attacks on Graphs via Reinforcement Learning

    Authors: Yiwei Sun, Suhang Wang, Xianfeng Tang, Tsung-Yu Hsieh, Vasant Honavar

    Abstract: Real-world graph applications, such as advertisements and product recommendations, make profits based on accurately classifying the labels of the nodes. However, in such scenarios, there are high incentives for adversaries to attack such graphs to reduce the node classification performance. Previous work on graph adversarial attacks focuses on modifying existing graph structures, which is infeasible i…

    Submitted 14 September, 2019; originally announced September 2019.

    Comments: Preprint, under review

  23. arXiv:1909.01084  [pdf, other]

    cs.SI cs.LG

    MEGAN: A Generative Adversarial Network for Multi-View Network Embedding

    Authors: Yiwei Sun, Suhang Wang, Tsung-Yu Hsieh, Xianfeng Tang, Vasant Honavar

    Abstract: Data from many real-world applications can be naturally represented by multi-view networks where the different views encode different types of relationships (e.g., friendship, shared interests in music, etc.) between real-world individuals or entities. There is an urgent need for methods to obtain low-dimensional, information preserving and typically nonlinear embeddings of such multi-view network…

    Submitted 19 August, 2019; originally announced September 2019.

    Comments: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19

  24. arXiv:1906.02772  [pdf]

    eess.SP cs.LG

    Adaptive Subspace Sampling for Class Imbalance Processing-Some clarifications, algorithm, and further investigation including applications to Brain Computer Interface

    Authors: Chin-Teng Lin, Kuan-Chih Huang, Yu-Ting Liu, Yang-Yin Lin, Tsung-Yu Hsieh, Nikhil R. Pal, Shang-Lin Wu, Chieh-Ning Fang, Zehong Cao

    Abstract: Kohonen's Adaptive Subspace Self-Organizing Map (ASSOM) learns several subspaces of the data where each subspace represents some invariant characteristics of the data. To deal with the imbalance classification problem, earlier we have proposed a method for oversampling the minority class using Kohonen's ASSOM. This investigation extends that study, clarifies some issues related to our earlier work…

    Submitted 7 October, 2020; v1 submitted 26 May, 2019; originally announced June 2019.

    Comments: The current version is accepted by iFuzzy 2020

  25. arXiv:1811.02616  [pdf, ps, other]

    cs.LG cs.SI

    Multi-View Network Embedding Via Graph Factorization Clustering and Co-Regularized Multi-View Agreement

    Authors: Yiwei Sun, Ngot Bui, Tsung-Yu Hsieh, Vasant Honavar

    Abstract: Real-world social networks and digital platforms are comprised of individuals (nodes) that are linked to other individuals or entities through multiple types of relationships (links). Sub-networks of such a network based on each type of link correspond to distinct views of the underlying network. In real-world applications, each node is typically linked to only a small subset of other nodes. Hence…

    Submitted 18 February, 2019; v1 submitted 6 November, 2018; originally announced November 2018.

    Comments: ICDMW2018 -- IEEE International Conference on Data Mining workshop on Graph Analytics

  26. arXiv:1810.12947  [pdf, other]

    eess.AS cs.SD

    A Streamlined Encoder/Decoder Architecture for Melody Extraction

    Authors: Tsung-Han Hsieh, Li Su, Yi-Hsuan Yang

    Abstract: Melody extraction in polyphonic musical audio is important for music signal processing. In this paper, we propose a novel streamlined encoder/decoder network that is designed for the task. We make two technical contributions. First, drawing inspiration from a state-of-the-art model for semantic pixel-wise segmentation, we pass through the pooling indices between pooling and un-pooling layers to lo…

    Submitted 18 February, 2019; v1 submitted 30 October, 2018; originally announced October 2018.

    Comments: This is a pre-print version of an ICASSP 2019 paper

  27. arXiv:1809.01225  [pdf, ps, other]

    cs.LG cs.CC stat.ML

    Compositional Stochastic Average Gradient for Machine Learning and Related Applications

    Authors: Tsung-Yu Hsieh, Yasser EL-Manzalawy, Yiwei Sun, Vasant Honavar

    Abstract: Many machine learning, statistical inference, and portfolio optimization problems require minimization of a composition of expected value functions (CEVF). Of particular interest are the finite-sum versions of such compositional optimization problems (FS-CEVF). Compositional stochastic variance reduced gradient (C-SVRG) methods that combine stochastic compositional gradient descent (SCGD) and stoch…

    Submitted 7 September, 2018; v1 submitted 4 September, 2018; originally announced September 2018.

  28. arXiv:1710.06842  [pdf]

    cs.CY

    Measuring the unmeasurable - a project of domestic violence risk prediction and management

    Authors: Ya-Yun Chen, Chia-Kai Liu, Yu-Hsiu Wang, Sue-Chuan Chen, Yi-Shan Hsieh, Jing-Tai Ke, T. C. Hsieh

    Abstract: The prevention of domestic violence (DV) has aroused serious concerns in Taiwan because of the disparity between the increasing number of reported DV cases, which doubled over the past decade, and the scarcity of social workers. Additionally, a large amount of data was collected when social workers used the predominant case management approach to document case report information. However, these data…

    Submitted 18 October, 2017; originally announced October 2017.

    Comments: Presented at the Data For Good Exchange 2017

  29. arXiv:0804.4749  [pdf]

    cs.RO

    Study of improving nano-contouring performance by employing cross-coupling controller

    Authors: Wen Yuh Jywe, Shih Shin Chen, Hung-Shu Wang, Chien Hung Liu, Hsin Hung Jwo, Yun Feng Teng, Tung Hsien Hsieh

    Abstract: For the tracking stage path planning, we design a two-axis cross-coupling control system which uses a PI controller to compensate the contour error between axes. In this paper, the stage adopted was designed by our laboratory (Precision Machine Center of National Formosa University). The cross-coupling controller calculates the actuating signal of each axis by combining multi-axes position err…

    Submitted 30 April, 2008; originally announced April 2008.

    Comments: Uploaded by ICIUS2007 Conference Organizer on behalf of the author(s). 6 pages, 8 figures

    ACM Class: B.1.2

    Journal ref: Proceedings of the International Conference on Intelligent Unmanned System (ICIUS 2007), Bali, Indonesia, October 24-25, 2007, Paper No. ICIUS2007-C002