Showing 1–29 of 29 results for author: Hung, K

Searching in archive cs.
  1. arXiv:2406.12204  [pdf, other]

    stat.ML cs.LG math.OC

    An Optimal Transport Approach for Network Regression

    Authors: Alex G. Zalles, Kai M. Hung, Ann E. Finneran, Lydia Beaudrot, César A. Uribe

    Abstract: We study the problem of network regression, where one is interested in how the topology of a network changes as a function of Euclidean covariates. We build upon recent developments in generalized regression models on metric spaces based on Fréchet means and propose a network regression method using the Wasserstein metric. We show that when representing graphs as multivariate Gaussian distribution…

    Submitted 17 June, 2024; originally announced June 2024.

  2. arXiv:2405.18945  [pdf]

    cs.CV cs.LG

    WTTFNet: A Weather-Time-Trajectory Fusion Network for Pedestrian Trajectory Prediction in Urban Complex

    Authors: Ho Chun Wu, Esther Hoi Shan Lau, Paul Yuen, Kevin Hung, John Kwok Tai Chui, Andrew Kwok Fai Lui

    Abstract: Pedestrian trajectory modelling in an urban complex is challenging because pedestrians can have many possible destinations, such as shops, escalators, and attractions. Moreover, weather and time-of-day may affect pedestrian behavior. In this paper, a new weather-time-trajectory fusion network (WTTFNet) is proposed to improve the performance of baseline deep neural network architecture. By incorpor…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 12 pages, 7 figures

  3. arXiv:2405.16545  [pdf, other]

    cs.RO

    VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation

    Authors: Kuo-Han Hung, Pang-Chi Lo, Jia-Fong Yeh, Han-Yuan Hsu, Yi-Ting Chen, Winston H. Hsu

    Abstract: We study reward models for long-horizon manipulation tasks by learning from action-free videos and language instructions, which we term the visual-instruction correlation (VIC) problem. Recent advancements in cross-modality modeling have highlighted the potential of reward modeling through visual and language correlations. However, existing VIC methods face challenges in learning rewards for long-…

    Submitted 26 May, 2024; originally announced May 2024.

  4. arXiv:2404.02388  [pdf, other]

    cs.CV

    CAPE: CAM as a Probabilistic Ensemble for Enhanced DNN Interpretation

    Authors: Townim Faisal Chowdhury, Kewen Liao, Vu Minh Hieu Phan, Minh-Son To, Yutong Xie, Kevin Hung, David Ross, Anton van den Hengel, Johan W. Verjans, Zhibin Liao

    Abstract: Deep Neural Networks (DNNs) are widely used for visual classification tasks, but their complex computation process and black-box nature hinder decision transparency and interpretability. Class activation maps (CAMs) and recent variants provide ways to visually explain the DNN decision-making process by displaying 'attention' heatmaps of the DNNs. Nevertheless, the CAM explanation only offers relat…

    Submitted 4 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  5. arXiv:2404.01065  [pdf, other]

    cs.CV

    T-Mamba: Frequency-Enhanced Gated Long-Range Dependency for Tooth 3D CBCT Segmentation

    Authors: Jing Hao, Lei He, Kuo Feng Hung

    Abstract: Efficient tooth segmentation in three-dimensional (3D) imaging, critical for orthodontic diagnosis, remains challenging due to noise, low contrast, and artifacts in CBCT images. Both convolutional Neural Networks (CNNs) and transformers have emerged as popular architectures for image segmentation. However, their efficacy in handling long-range dependencies is limited due to inherent locality or co…

    Submitted 1 April, 2024; originally announced April 2024.

  6. arXiv:2402.16321  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech

    Authors: Szu-Wei Fu, Kuo-Hsuan Hung, Yu Tsao, Yu-Chiang Frank Wang

    Abstract: Speech quality estimation has recently undergone a paradigm shift from human-hearing expert designs to machine-learning models. However, current models rely mainly on supervised learning, which is time-consuming and expensive for label collection. To solve this problem, we propose VQScore, a self-supervised metric for evaluating speech based on the quantization error of a vector-quantized-variatio…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: Published as a conference paper at ICLR 2024

  7. arXiv:2402.03860  [pdf, other]

    cs.RO

    AED: Adaptable Error Detection for Few-shot Imitation Policy

    Authors: Jia-Fong Yeh, Kuo-Han Hung, Pang-Chi Lo, Chi-Ming Chung, Tsung-Han Wu, Hung-Ting Su, Yi-Ting Chen, Winston H. Hsu

    Abstract: We introduce a new task called Adaptable Error Detection (AED), which aims to identify behavior errors in few-shot imitation (FSI) policies based on visual observations in novel environments. The potential to cause serious damage to surrounding areas limits the application of FSI policies in real-world scenarios. Thus, a robust system is necessary to notify operators when FSI policies are inconsis…

    Submitted 25 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  8. arXiv:2401.15282  [pdf, other]

    cs.CV

    GEM: Boost Simple Network for Glass Surface Segmentation via Segment Anything Model and Data Synthesis

    Authors: Jing Hao, Moyun Liu, Kuo Feng Hung

    Abstract: Detecting glass regions is a challenging task due to the ambiguity of their transparency and reflection properties. These transparent glasses share the visual appearance of both transmitted arbitrary background scenes and reflected objects, thus having no fixed patterns. Recent visual foundation models, which are trained on vast amounts of data, have manifested stunning performance in terms of imag…

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: 14 pages, 9 figures, 7 tables. arXiv admin note: substantial text overlap with arXiv:2307.12018

  9. arXiv:2310.19211  [pdf, other]

    cs.SI cs.LG

    Investigative Pattern Detection Framework for Counterterrorism

    Authors: Shashika R. Muramudalige, Benjamin W. K. Hung, Rosanne Libretti, Jytte Klausen, Anura P. Jayasumana

    Abstract: Law-enforcement investigations aimed at preventing attacks by violent extremists have become increasingly important for public safety. The problem is exacerbated by the massive data volumes that need to be scanned to identify complex behaviors of extremists and groups. Automated tools are required to extract information to respond queries from analysts, continually scan new information, integrate…

    Submitted 29 October, 2023; originally announced October 2023.

    Comments: 9 pages, 4 figures

  10. arXiv:2307.12018  [pdf, other]

    cs.CV

    GEM: Boost Simple Network for Glass Surface Segmentation via Vision Foundation Models

    Authors: Jing Hao, Moyun Liu, Jinrong Yang, Kuo Feng Hung

    Abstract: Detecting glass regions is a challenging task due to the inherent ambiguity in their transparency and reflective characteristics. Current solutions in this field remain rooted in conventional deep learning paradigms, requiring the construction of annotated datasets and the design of network architectures. However, the evident drawback with these mainstream solutions lies in the time-consuming and…

    Submitted 21 May, 2024; v1 submitted 22 July, 2023; originally announced July 2023.

    Comments: 12 pages, 8 figures, 8 tables

  11. arXiv:2303.17388  [pdf, other]

    cs.SE

    BPCE: A Prototype for Co-Evolution between Business Process Variants through Configurable Process Model

    Authors: Linyue Liu, Xi Guo, Chun Ouyang, Patrick C. K. Hung, Hong-Yu Zhang, Keqing He, Chen Mo, Zaiwen Feng

    Abstract: With the continuous development of business process management technology, the increasing business process models are usually owned by large enterprises. In large enterprises, different stakeholders may modify the same business process model. In order to better manage the changeability of processes, they adopt configurable business process models to manage process variants. However, the process va…

    Submitted 30 March, 2023; originally announced March 2023.

    Comments: 18 pages, 11 figures

    MSC Class: 68N99; ACM Class: D.2.2

  12. arXiv:2303.09085  [pdf, other]

    cs.LG

    Preoperative Prognosis Assessment of Lumbar Spinal Surgery for Low Back Pain and Sciatica Patients based on Multimodalities and Multimodal Learning

    Authors: Li-Chin Chen, Jung-Nien Lai, Hung-En Lin, Hsien-Te Chen, Kuo-Hsuan Hung, Yu Tsao

    Abstract: Low back pain (LBP) and sciatica may require surgical therapy when they are symptomatic of severe pain. However, there is no effective measures to evaluate the surgical outcomes in advance. This work combined elements of Eastern medicine and machine learning, and developed a preoperative assessment tool to predict the prognosis of lumbar spinal surgery in LBP and sciatica patients. Standard operat…

    Submitted 16 March, 2023; originally announced March 2023.

  13. Self-supervised learning-based general laboratory progress pretrained model for cardiovascular event detection

    Authors: Li-Chin Chen, Kuo-Hsuan Hung, Yi-Ju Tseng, Hsin-Yao Wang, Tse-Min Lu, Wei-Chieh Huang, Yu Tsao

    Abstract: The inherent nature of patient data poses several challenges. Prevalent cases amass substantial longitudinal data owing to their patient volume and consistent follow-ups, however, longitudinal laboratory data are renowned for their irregularity, temporality, absenteeism, and sparsity; In contrast, recruitment for rare or specific cases is often constrained due to their limited patient size and epi…

    Submitted 7 September, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

    Comments: published in IEEE Journal of Translational Engineering in Health & Medicine

    Journal ref: IEEE Journal of Translational Engineering in Health and Medicine, vol. 12, pp. 43-56, 2023

  14. arXiv:2302.01798  [pdf, other]

    cs.LG

    Interpretations of Domain Adaptations via Layer Variational Analysis

    Authors: Huan-Hsin Tseng, Hsin-Yi Lin, Kuo-Hsuan Hung, Yu Tsao

    Abstract: Transfer learning is known to perform efficiently in many applications empirically, yet limited literature reports the mechanism behind the scene. This study establishes both formal derivations and heuristic analysis to formulate the theory of transfer learning in deep learning. Our framework utilizing layer variational analysis proves that the success of transfer learning can be guaranteed with c…

    Submitted 9 May, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

    Comments: Published at ICLR 2023

  15. arXiv:2211.09401  [pdf, other]

    cs.CL

    Open-Domain Conversational Question Answering with Historical Answers

    Authors: Hung-Chieh Fang, Kuo-Han Hung, Chao-Wei Huang, Yun-Nung Chen

    Abstract: Open-domain conversational question answering can be viewed as two tasks: passage retrieval and conversational question answering, where the former relies on selecting candidate passages from a large corpus and the latter requires better understanding of a question with contexts to predict the answers. This paper proposes ConvADR-QA that leverages historical answers to boost retrieval performance…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: AACL-IJCNLP 2022

  16. arXiv:2210.17456  [pdf, other]

    eess.AS cs.SD

    Audio-Visual Speech Enhancement and Separation by Utilizing Multi-Modal Self-Supervised Embeddings

    Authors: I-Chun Chern, Kuo-Hsuan Hung, Yi-Ting Chen, Tassadaq Hussain, Mandar Gogate, Amir Hussain, Yu Tsao, Jen-Cheng Hou

    Abstract: AV-HuBERT, a multi-modal self-supervised learning model, has been shown to be effective for categorical problems such as automatic speech recognition and lip-reading. This suggests that useful audio-visual speech representations can be obtained via utilizing multi-modal self-supervised embeddings. Nevertheless, it is unclear if such representations can be generalized to solve real-world multi-moda…

    Submitted 31 May, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

    Comments: ICASSP AMHAT 2023

  17. arXiv:2206.12592  [pdf, other]

    cs.CV

    Asymmetric Transfer Hashing with Adaptive Bipartite Graph Learning

    Authors: Jianglin Lu, Jie Zhou, Yudong Chen, Witold Pedrycz, Kwok-Wai Hung

    Abstract: Thanks to the efficient retrieval speed and low storage consumption, learning to hash has been widely used in visual retrieval tasks. However, existing hashing methods assume that the query and retrieval samples lie in homogeneous feature space within the same domain. As a result, they cannot be directly applied to heterogeneous cross-domain retrieval. In this paper, we propose a Generalized Image…

    Submitted 27 December, 2022; v1 submitted 25 June, 2022; originally announced June 2022.

  18. arXiv:2202.06684  [pdf, other]

    eess.AS cs.LG cs.SD

    Partially Fake Audio Detection by Self-attention-based Fake Span Discovery

    Authors: Haibin Wu, Heng-Cheng Kuo, Naijun Zheng, Kuo-Hsuan Hung, Hung-Yi Lee, Yu Tsao, Hsin-Min Wang, Helen Meng

    Abstract: The past few years have witnessed the significant advances of speech synthesis and voice conversion technologies. However, such technologies can undermine the robustness of broadly implemented biometric identification models and can be harnessed by in-the-wild attackers for illegal uses. The ASVspoof challenge mainly focuses on synthesized audios by advanced speech synthesis and voice conversion m…

    Submitted 15 February, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

    Comments: Submitted to ICASSP 2022

  19. arXiv:2110.05866  [pdf]

    cs.SD cs.CL eess.AS

    MetricGAN-U: Unsupervised speech enhancement/ dereverberation based only on noisy/ reverberated speech

    Authors: Szu-Wei Fu, Cheng Yu, Kuo-Hsuan Hung, Mirco Ravanelli, Yu Tsao

    Abstract: Most of the deep learning-based speech enhancement models are learned in a supervised manner, which implies that pairs of noisy and clean speech are required during training. Consequently, several noisy speeches recorded in daily life cannot be used to train the model. Although certain unsupervised learning frameworks have also been proposed to solve the pair constraint, they still require clean s…

    Submitted 12 October, 2021; originally announced October 2021.

  20. arXiv:2106.05229  [pdf, other]

    cs.SD cs.LG eess.AS

    Speech Recovery for Real-World Self-powered Intermittent Devices

    Authors: Yu-Chen Lin, Tsun-An Hsieh, Kuo-Hsuan Hung, Cheng Yu, Harinath Garudadri, Yu Tsao, Tei-Wei Kuo

    Abstract: The incompleteness of speech inputs severely degrades the performance of all the related speech signal processing applications. Although many researches have been proposed to address this issue, they controlled the data missing conditions by simulation with self-defined masking lengths or sizes. Besides, the masking definitions are different among all these experimental settings. This paper presen…

    Submitted 24 January, 2022; v1 submitted 9 June, 2021; originally announced June 2021.

  21. arXiv:2104.06274  [pdf, other]

    cs.DC

    Optimal Data Placement for Data-Sharing Scientific Workflows in Heterogeneous Edge-Cloud Computing Environments

    Authors: Xin Du, Songtao Tang, Zhihui Lu, Keke Gai, Jie Wu, Patrick C. K. Hung

    Abstract: The heterogeneous edge-cloud computing paradigm can provide a more optimal direction to deploy scientific workflows than traditional distributed computing or cloud computing environments. Due to the different sizes of scientific datasets and some of these datasets must keep private, it is still a difficult problem to finding an data placement strategy that can minimize data transmission as well as…

    Submitted 13 April, 2021; originally announced April 2021.

  22. arXiv:2102.03786  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    EMA2S: An End-to-End Multimodal Articulatory-to-Speech System

    Authors: Yu-Wen Chen, Kuo-Hsuan Hung, Shang-Yi Chuang, Jonathan Sherman, Wen-Chin Huang, Xugang Lu, Yu Tsao

    Abstract: Synthesized speech from articulatory movements can have real-world use for patients with vocal cord disorders, situations requiring silent speech, or in high-noise environments. In this work, we present EMA2S, an end-to-end multimodal articulatory-to-speech system that directly converts articulatory movements to speech signals. We use a neural-network-based vocoder combined with multimodal joint-t…

    Submitted 9 June, 2021; v1 submitted 7 February, 2021; originally announced February 2021.

  23. arXiv:2012.03426  [pdf]

    eess.SP cs.LG

    Deep Learning Based Signal Enhancement of Low-Resolution Accelerometer for Fall Detection Systems

    Authors: Kai-Chun Liu, Kuo-Hsuan Hung, Chia-Yeh Hsieh, Hsiang-Yun Huang, Chia-Tai Chan, Yu Tsao

    Abstract: In the last two decades, fall detection (FD) systems have been developed as a popular assistive technology. Such systems automatically detect critical fall events and immediately alert medical professionals or caregivers. To support long-term FD services, various power-saving strategies have been implemented. Among them, a reduced sampling rate is a common approach for an energy-efficient system i…

    Submitted 27 September, 2021; v1 submitted 6 December, 2020; originally announced December 2020.

    Comments: Accepted by IEEE Transactions on Cognitive and Developmental Systems, 12 pages, 7 figures, 8 tables

  24. arXiv:2008.09264  [pdf, other]

    eess.AS cs.LG cs.SD

    CITISEN: A Deep Learning-Based Speech Signal-Processing Mobile Application

    Authors: Yu-Wen Chen, Kuo-Hsuan Hung, You-Jin Li, Alexander Chao-Fu Kang, Ya-Hsin Lai, Kai-Chun Liu, Szu-Wei Fu, Syu-Siang Wang, Yu Tsao

    Abstract: This study presents a deep learning-based speech signal-processing mobile application known as CITISEN. The CITISEN provides three functions: speech enhancement (SE), model adaptation (MA), and background noise conversion (BNC), allowing CITISEN to be used as a platform for utilizing and evaluating SE models and flexibly extend the models to address various noise environments and users. For SE, a…

    Submitted 25 April, 2022; v1 submitted 20 August, 2020; originally announced August 2020.

  25. arXiv:2006.10296  [pdf]

    eess.AS cs.LG cs.SD

    Boosting Objective Scores of a Speech Enhancement Model by MetricGAN Post-processing

    Authors: Szu-Wei Fu, Chien-Feng Liao, Tsun-An Hsieh, Kuo-Hsuan Hung, Syu-Siang Wang, Cheng Yu, Heng-Cheng Kuo, Ryandhimas E. Zezario, You-Jin Li, Shang-Yi Chuang, Yen-Ju Lu, Yu Tsao

    Abstract: The Transformer architecture has demonstrated a superior ability compared to recurrent neural networks in many different natural language processing applications. Therefore, our study applies a modified Transformer in a speech enhancement task. Specifically, positional encoding in the Transformer may not be necessary for speech enhancement, and hence, it is replaced by convolutional layers. To fur…

    Submitted 3 March, 2021; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: Accepted by APSIPA 2020

  26. arXiv:1911.09847  [pdf, ps, other]

    eess.AS cs.SD eess.SP

    Time-Domain Multi-modal Bone/air Conducted Speech Enhancement

    Authors: Cheng Yu, Kuo-Hsuan Hung, Syu-Siang Wang, Szu-Wei Fu, Yu Tsao, Jeih-weih Hung

    Abstract: Previous studies have proven that integrating video signals, as a complementary modality, can facilitate improved performance for speech enhancement (SE). However, video clips usually contain large amounts of data and pose a high cost in terms of computational resources and thus may complicate the SE system. As an alternative source, a bone-conducted speech signal has a moderate data size while ma…

    Submitted 17 June, 2020; v1 submitted 21 November, 2019; originally announced November 2019.

    Comments: multi-modal, bone/air-conducted signals, speech enhancement, fully convolutional network

    Journal ref: IEEE Signal Processing Letters, vol. 27, pp. 1035-1039, 2020

  27. arXiv:1903.07821  [pdf]

    cs.LG stat.ML

    POP-CNN: Predicting Odor's Pleasantness with Convolutional Neural Network

    Authors: Danli Wu, Yu Cheng, Dehan Luo, Kin-Yeung Wong, Kevin Hung, Zhijing Yang

    Abstract: Predicting odor's pleasantness simplifies the evaluation of odors and has the potential to be applied in perfumes and environmental monitoring industry. Classical algorithms for predicting odor's pleasantness generally use a manual feature extractor and an independent classifier. Manual designing a good feature extractor depend on expert knowledge and experience is the key to the accuracy of the a…

    Submitted 19 March, 2019; originally announced March 2019.

  28. Deep Nearest Class Mean Model for Incremental Odor Classification

    Authors: Yu Cheng, Angus Wong, Kevin Hung, Zhizhong Li, Weitong Li, Jun Zhang

    Abstract: In recent years, more machine learning algorithms have been applied to odor classification. These odor classification algorithms usually assume that the training datasets are static. However, for some odor recognition tasks, new odor classes continually emerge. That is, the odor datasets are dynamically growing while both training samples and number of classes are increasing over time. Motivated b…

    Submitted 27 April, 2019; v1 submitted 8 January, 2018; originally announced January 2018.

    Comments: 17 pages, 6 figures

    Journal ref: IEEE Transactions on Instrumentation and Measurement, vol. 68, no. 4, pp. 952-962, April 2019

  29. arXiv:1608.01760  [pdf, other]

    cs.SI physics.soc-ph

    Investigative Simulation: Towards Utilizing Graph Pattern Matching for Investigative Search

    Authors: Benjamin W. K. Hung, Anura P. Jayasumana

    Abstract: This paper proposes the use of graph pattern matching for investigative graph search, which is the process of searching for and prioritizing persons of interest who may exhibit part or all of a pattern of suspicious behaviors or connections. While there are a variety of applications, our principal motivation is to aid law enforcement in the detection of homegrown violent extremists. We introduce i…

    Submitted 5 August, 2016; originally announced August 2016.

    Comments: 8 pages, 6 figures. Paper to appear in the Fosint-SI 2016 conference proceedings in conjunction with the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining ASONAM 2016