Showing 1–49 of 49 results for author: Han, M

Searching in archive eess.
  1. arXiv:2405.12478  [pdf, other]

    eess.SY

    Efficient Economic Model Predictive Control of Water Treatment Process with Learning-based Koopman Operator

    Authors: Minghao Han, Jingshi Yao, Adrian Wing-Keung Law, Xunyuan Yin

    Abstract: Used water treatment plays a pivotal role in advancing environmental sustainability. Economic model predictive control holds the promise of enhancing the overall operational performance of the water treatment facilities. In this study, we propose a data-driven economic predictive control approach within the Koopman modeling framework. First, we propose a deep learning-enabled input-output Koopman…

    Submitted 20 May, 2024; originally announced May 2024.

  2. arXiv:2405.04752  [pdf, other]

    eess.AS cs.SD

    HILCodec: High Fidelity and Lightweight Neural Audio Codec

    Authors: Sunghwan Ahn, Beom Jun Woo, Min Hyun Han, Chanyeong Moon, Nam Soo Kim

    Abstract: The recent advancement of end-to-end neural audio codecs enables compressing audio at very low bitrates while reconstructing the output audio with high fidelity. Nonetheless, such improvements often come at the cost of increased model complexity. In this paper, we identify and address the problems of existing neural audio codecs. We show that the performance of Wave-U-Net does not increase consist…

    Submitted 7 May, 2024; originally announced May 2024.

  3. Reduced-order Koopman modeling and predictive control of nonlinear processes

    Authors: Xuewen Zhang, Minghao Han, Xunyuan Yin

    Abstract: In this paper, we propose an efficient data-driven predictive control approach for general nonlinear processes based on a reduced-order Koopman operator. A Kalman-based sparse identification of nonlinear dynamics method is employed to select lifting functions for Koopman identification. The selected lifting functions are used to project the original nonlinear state-space into a higher-dimensional…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 29 pages, 8 figures

    Journal ref: Computers & Chemical Engineering, 2023, 179, p.108440

  4. arXiv:2403.17801  [pdf, other]

    cs.CV eess.IV

    Towards 3D Vision with Low-Cost Single-Photon Cameras

    Authors: Fangzhou Mu, Carter Sifferman, Sacha Jungerman, Yiquan Li, Mark Han, Michael Gleicher, Mohit Gupta, Yin Li

    Abstract: We present a method for reconstructing 3D shape of arbitrary Lambertian objects based on measurements by miniature, energy-efficient, low-cost single-photon cameras. These cameras, operating as time resolved image sensors, illuminate the scene with a very fast pulse of diffuse light and record the shape of that pulse as it returns back from the scene at a high temporal resolution. We propose to mo…

    Submitted 29 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  5. arXiv:2402.04356  [pdf, other]

    cs.SD cs.CV eess.AS

    Bidirectional Autoregressive Diffusion Model for Dance Generation

    Authors: Canyu Zhang, Youbao Tang, Ning Zhang, Ruei-Sung Lin, Mei Han, Jing Xiao, Song Wang

    Abstract: Dance serves as a powerful medium for expressing human emotions, but the lifelike generation of dance is still a considerable challenge. Recently, diffusion models have showcased remarkable generative abilities across various domains. They hold promise for human motion generation due to their adaptable many-to-many nature. Nonetheless, current diffusion-based motion generation models often create…

    Submitted 22 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  6. arXiv:2312.06065  [pdf, other]

    eess.AS cs.SD

    EEND-DEMUX: End-to-End Neural Speaker Diarization via Demultiplexed Speaker Embeddings

    Authors: Sung Hwan Mun, Min Hyun Han, Canyeong Moon, Nam Soo Kim

    Abstract: In recent years, there have been studies to further improve the end-to-end neural speaker diarization (EEND) systems. This letter proposes the EEND-DEMUX model, a novel framework utilizing demultiplexed speaker embeddings. In this work, we focus on disentangling speaker-relevant information in the latent space and then transform each separated latent variable into its corresponding speech activity…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: Submitted to IEEE Signal Processing Letters

  7. Deep Reinforcement Learning-driven Cross-Community Energy Interaction Optimal Scheduling

    Authors: Yang Li, Wenjie Ma, Fan** Bu, Zhen Yang, Bin Wang, Meng Han

    Abstract: In order to coordinate energy interactions among various communities and energy conversions among multi-energy subsystems within the multi-community integrated energy system under uncertain conditions, and achieve overall optimization and scheduling of the comprehensive energy system, this paper proposes a comprehensive scheduling model that utilizes a multi-agent deep reinforcement learning algor…

    Submitted 2 September, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

    Comments: in Chinese language, Accepted by Electric Power Construction

    Journal ref: Electric Power Construction 45 (2024) 59-70

  8. arXiv:2307.13343  [pdf, other]

    eess.AS cs.CR cs.SD

    On-Device Speaker Anonymization of Acoustic Embeddings for ASR based on Flexible Location Gradient Reversal Layer

    Authors: Md Asif Jalal, Pablo Peso Parada, Jisi Zhang, Karthikeyan Saravanan, Mete Ozay, Myoungji Han, Jung In Lee, Seokyeong Jung

    Abstract: Smart devices serviced by large-scale AI models necessitates user data transfer to the cloud for inference. For speech applications, this means transferring private user information, e.g., speaker identity. Our paper proposes a privacy-enhancing framework that targets speaker identity anonymization while preserving speech recognition accuracy for our downstream task - Automatic Speech Recognition…

    Submitted 25 July, 2023; originally announced July 2023.

    Comments: Proceedings of INTERSPEECH 2023

  9. arXiv:2305.19972  [pdf, other]

    eess.AS cs.AI cs.CL

    VILAS: Exploring the Effects of Vision and Language Context in Automatic Speech Recognition

    Authors: Ziyi Ni, Minglun Han, Feilong Chen, Linghui Meng, Jing Shi, Pin Lv, Bo Xu

    Abstract: Enhancing automatic speech recognition (ASR) performance by leveraging additional multimodal information has shown promising results in previous studies. However, most of these works have primarily focused on utilizing visual cues derived from human lip motions. In fact, context-dependent visual and linguistic cues can also benefit in many scenarios. In this paper, we first propose ViLaS (Vision a…

    Submitted 18 December, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: Accepted to ICASSP 2024

  10. arXiv:2305.19051  [pdf, other]

    eess.AS cs.AI cs.SD

    Towards single integrated spoofing-aware speaker verification embeddings

    Authors: Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang, Xuechen Liu, Md Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung

    Abstract: This study aims to develop a single integrated spoofing-aware speaker verification (SASV) embeddings that satisfy two aspects. First, rejecting non-target speakers' input as well as target speakers' spoofed inputs should be addressed. Second, competitive performance should be demonstrated compared to the fusion of automatic speaker verification (ASV) and countermeasure (CM) embeddings, which outpe…

    Submitted 1 June, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted by INTERSPEECH 2023. Code and models are available in https://github.com/sasv-challenge/ASVSpoof5-SASVBaseline

  11. arXiv:2305.14022  [pdf, other]

    cs.CV eess.IV

    Realistic Noise Synthesis with Diffusion Models

    Authors: Qi Wu, Mingyan Han, Ting Jiang, Haoqiang Fan, Bing Zeng, Shuaicheng Liu

    Abstract: Deep image denoising models often rely on large amount of training data for the high quality performance. However, it is challenging to obtain sufficient amount of data under real-world scenarios for the supervised training. As such, synthesizing realistic noise becomes an important solution. However, existing techniques have limitations in modeling complex noise distributions, resulting in residu…

    Submitted 3 November, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

  12. arXiv:2305.07945  [pdf, other]

    cs.IT eess.SP

    Deep Learning-based Data-aided Activity Detection with Extraction Network in Grant-free Sparse Code Multiple Access Systems

    Authors: Minsig Han, Ameha T. Abebe, Chung G. Kang

    Abstract: This letter proposes a deep learning-based data-aided active user detection network (D-AUDN) for grant-free sparse code multiple access (SCMA) systems that leverages both SCMA codebook and Zadoff-Chu preamble for activity detection. Due to disparate data and preamble distribution as well as codebook collision, existing D-AUDNs experience performance degradation when multiple preambles are associat…

    Submitted 19 May, 2023; v1 submitted 13 May, 2023; originally announced May 2023.

  13. arXiv:2305.04160  [pdf, other]

    cs.CL cs.AI cs.CV eess.AS

    X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages

    Authors: Feilong Chen, Minglun Han, Haozhi Zhao, Qingyang Zhang, Jing Shi, Shuang Xu, Bo Xu

    Abstract: Large language models (LLMs) have demonstrated remarkable language abilities. GPT-4, based on advanced LLMs, exhibits extraordinary multimodal capabilities beyond previous visual language models. We attribute this to the use of more advanced LLMs compared with previous multimodal models. Unfortunately, the model architecture and training strategies of GPT-4 are unknown. To endow LLMs with multimod…

    Submitted 21 May, 2023; v1 submitted 6 May, 2023; originally announced May 2023.

  14. arXiv:2305.03189  [pdf]

    eess.SP

    Experimental Validation of Coherent Joint Transmission in a Distributed-MIMO System with Analog Fronthaul for 6G

    Authors: Rafael Puerta, Mahdieh Joharifar, Mengyao Han, Anders Djupsjöbacka, Vjaceslavs Bobrovs, Sergei Popov, Oskars Ozolins, Xiaodan Pang

    Abstract: The sixth-generation (6G) mobile networks must increase coverage and improve spectral efficiency, especially for cell-edge users. Distributed multiple-input multiple-output (D-MIMO) networks can fulfill these requirements provided that transmission/reception points (TRxPs) of the network can be synchronized with sub nanosecond precision, however, synchronization with current backhaul and fronthaul…

    Submitted 4 May, 2023; originally announced May 2023.

    Comments: Accepted in EuCNC 2023 Conference, 6 pages, 8 figures

  15. arXiv:2302.04948  [pdf]

    eess.SP

    NR Conformance Testing of Analog Radio-over-LWIR FSO Fronthaul link for 6G Distributed MIMO Networks

    Authors: Rafael Puerta, Mengyao Han, Mahdieh Joharifar, Richard Schatz, Yan-Ting Sun, Yuchuan Fan, Anders Djupsjöbacka, Grégory Maisons, Johan Abautret, Roland Teissier, Lu Zhang, Sandis Spolitis, Muguang Wang, Vjaceslavs Bobrovs, Sebastian Lourdudoss, Xianbin Yu, Sergei Popov, Oskars Ozolins, Xiaodan Pang

    Abstract: We experimentally test the compliance with 5G/NR 3GPP technical specifications of an analog radio-over-FSO link at 9 μm. The ACLR and EVM transmitter requirements are fulfilled validating the suitability of LWIR FSO for 6G fronthaul.

    Submitted 9 February, 2023; originally announced February 2023.

    Comments: Accepted in Optical Fiber Communication Conference (OFC) 2023, 3 pages, 2 figures

  16. arXiv:2301.13003  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation

    Authors: Minglun Han, Feilong Chen, Jing Shi, Shuang Xu, Bo Xu

    Abstract: Large-scale pre-trained language models (PLMs) have shown great potential in natural language processing tasks. Leveraging the capabilities of PLMs to enhance automatic speech recognition (ASR) systems has also emerged as a promising research direction. However, previous works may be limited by the inflexible structures of PLMs and the insufficient utilization of PLMs. To alleviate these problems,…

    Submitted 28 May, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

    Comments: Accepted by INTERSPEECH 2023

  17. Data-Driven Distributionally Robust Scheduling of Community Integrated Energy Systems with Uncertain Renewable Generations Considering Integrated Demand Response

    Authors: Yang Li, Meng Han, Mohammad Shahidehpour, Jiazheng Li, Chao Long

    Abstract: A community integrated energy system (CIES) is an important carrier of the energy internet and smart city in geographical and functional terms. Its emergence provides a new solution to the problems of energy utilization and environmental pollution. To coordinate the integrated demand response and uncertainty of renewable energy generation (RGs), a data-driven two-stage distributionally robust opti…

    Submitted 27 January, 2023; v1 submitted 20 January, 2023; originally announced January 2023.

    Comments: Accepted by Applied Energy

    Journal ref: Applied Energy 335 (2023) 120749

  18. arXiv:2210.11388  [pdf]

    eess.IV cs.CV

    Physics-informed Deep Diffusion MRI Reconstruction with Synthetic Data: Break Training Data Bottleneck in Artificial Intelligence

    Authors: Chen Qian, Yuncheng Gao, Mingyang Han, Zi Wang, Dan Ruan, Yu Shen, Ya** Wu, Yirong Zhou, Chengyan Wang, Boyu Jiang, Ran Tao, Zhigang Wu, Jiazheng Wang, Liuhong Zhu, Yi Guo, Taishan Kang, Jianzhong Lin, Tao Gong, Chen Yang, Guoqiang Fei, Mei** Lin, Di Guo, Jianjun Zhou, Meiyun Wang, Xiaobo Qu

    Abstract: Diffusion magnetic resonance imaging (MRI) is the only imaging modality for non-invasive movement detection of in vivo water molecules, with significant clinical and research applications. Diffusion MRI (DWI) acquired by multi-shot techniques can achieve higher resolution, better signal-to-noise ratio, and lower geometric distortion than single-shot, but suffers from inter-shot motion-induced arti…

    Submitted 5 February, 2024; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: 23 pages, 16 figures

  19. arXiv:2210.02732  [pdf, other]

    eess.AS

    Fully Unsupervised Training of Few-shot Keyword Spotting

    Authors: Dongjune Lee, Minchan Kim, Sung Hwan Mun, Min Hyun Han, Nam Soo Kim

    Abstract: For training a few-shot keyword spotting (FS-KWS) model, a large labeled dataset containing massive target keywords has known to be essential to generalize to arbitrary target keywords with only a few enrollment samples. To alleviate the expensive data collection with labeling, in this paper, we propose a novel FS-KWS system trained only on synthetic data. The proposed system is based on metric le…

    Submitted 6 October, 2022; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: Accepted by IEEE SLT 2022

  20. arXiv:2208.13113  [pdf, other]

    eess.IV cs.CV

    Accurate and Robust Lesion RECIST Diameter Prediction and Segmentation with Transformers

    Authors: Youbao Tang, Ning Zhang, Yirui Wang, Shenghua He, Mei Han, Jing Xiao, Ruei-Sung Lin

    Abstract: Automatically measuring lesion/tumor size with RECIST (Response Evaluation Criteria In Solid Tumors) diameters and segmentation is important for computer-aided diagnosis. Although it has been studied in recent years, there is still space to improve its accuracy and robustness, such as (1) enhancing features by incorporating rich contextual information while keeping a high spatial resolution and (2…

    Submitted 27 August, 2022; originally announced August 2022.

    Comments: One of a series of works about lesion RECIST diameter prediction and weakly-supervised lesion segmentation (MICCAI 2022)

  21. arXiv:2208.08128  [pdf, other]

    cs.IT eess.SY

    On the Performance of Deep Learning-based Data-aided Active User Detection for GF-SCMA System

    Authors: Minsig Han, Ameha Tsegaye Abebe, Chung G. Kang

    Abstract: The recent works on a deep learning (DL)-based joint design of preamble set for the transmitters and data-aided active user detection (AUD) in the receiver has demonstrated a significant performance improvement for grant-free sparse code multiple access (GF-SCMA) system. The autoencoder for the joint design can be trained only in a given environment, but in an actual situation where the operating…

    Submitted 5 September, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

  22. arXiv:2208.08012  [pdf, other]

    eess.AS cs.SD

    Disentangled Speaker Representation Learning via Mutual Information Minimization

    Authors: Sung Hwan Mun, Min Hyun Han, Minchan Kim, Dongjune Lee, Nam Soo Kim

    Abstract: Domain mismatch problem caused by speaker-unrelated feature has been a major topic in speaker recognition. In this paper, we propose an explicit disentanglement framework to unravel speaker-relevant features from speaker-unrelated features via mutual information (MI) minimization. To achieve our goal of minimizing MI between speaker-related and speaker-unrelated features, we adopt a contrastive lo…

    Submitted 12 October, 2022; v1 submitted 16 August, 2022; originally announced August 2022.

    Comments: Accepted by APSIPA ASC 2022. Camera-ready. 8 pages, 4 figures, and 1 table

  23. Digitally-assisted photonic analog domain self-interference cancellation for in-band full-duplex MIMO systems via LS algorithm with adaptive order

    Authors: Moxuan Han, Yang Chen

    Abstract: A digitally-assisted photonic analog domain self-interference cancellation (SIC) and frequency downconversion method is proposed for in-band full-duplex multiple-input multiple-output (MIMO) systems using the least square (LS) algorithm with adaptive order. The SIC and frequency downconversion are achieved in the optical domain via a dual-parallel Mach-Zehnder modulator (DP-MZM), while the downcon…

    Submitted 3 August, 2022; originally announced August 2022.

    Comments: 9 pages, 4 figures

  24. arXiv:2205.12633  [pdf, other]

    cs.CV eess.IV

    NTIRE 2022 Challenge on High Dynamic Range Imaging: Methods and Results

    Authors: Eduardo Pérez-Pellitero, Sibi Catley-Chandar, Richard Shaw, Aleš Leonardis, Radu Timofte, Zexin Zhang, Cen Liu, Yunbo Peng, Yue Lin, Gaocheng Yu, ** Zhang, Zhe Ma, Hongbin Wang, Xiangyu Chen, Xintao Wang, Haiwei Wu, Lin Liu, Chao Dong, Jiantao Zhou, Qingsen Yan, Song Zhang, Weiye Chen, Yuhang Liu, Zhen Zhang, Yanning Zhang , et al. (68 additional authors not shown)

    Abstract: This paper reviews the challenge on constrained high dynamic range (HDR) imaging that was part of the New Trends in Image Restoration and Enhancement (NTIRE) workshop, held in conjunction with CVPR 2022. This manuscript focuses on the competition set-up, datasets, the proposed methods and their results. The challenge aims at estimating an HDR image from multiple respective low dynamic range (LDR)…

    Submitted 25 May, 2022; originally announced May 2022.

    Comments: CVPR Workshops 2022. 15 pages, 21 figures, 2 tables

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2022

  25. arXiv:2205.10780  [pdf, other]

    eess.SY cs.IT cs.LG

    Data-aided Active User Detection with a User Activity Extraction Network for Grant-free SCMA Systems

    Authors: Minsig Han, Ameha T. Abebe, Chung G. Kang

    Abstract: In grant-free sparse code multiple access (GF-SCMA) system, active user detection (AUD) is a major performance bottleneck as it involves complex combinatorial problem, which makes joint design of contention resources for users and AUD at the receiver a crucial but a challenging problem. To this end, we propose autoencoder (AE)-based joint optimization of both preamble generation networks (PGNs) in…

    Submitted 8 August, 2022; v1 submitted 22 May, 2022; originally announced May 2022.

  26. arXiv:2204.01005  [pdf, other]

    eess.AS cs.AI

    Frequency and Multi-Scale Selective Kernel Attention for Speaker Verification

    Authors: Sung Hwan Mun, Jee-weon Jung, Min Hyun Han, Nam Soo Kim

    Abstract: The majority of recent state-of-the-art speaker verification architectures adopt multi-scale processing and frequency-channel attention mechanisms. Convolutional layers of these models typically have a fixed kernel size, e.g., 3 or 5. In this study, we further contribute to this line of research utilising a selective kernel attention (SKA) mechanism. The SKA mechanism allows each convolutional lay…

    Submitted 12 October, 2022; v1 submitted 3 April, 2022; originally announced April 2022.

    Comments: Accepted by IEEE SLT 2022. 7 pages, 4 figures, 1 table. Code is available at https://github.com/msh9184/ska-tdnn.git

  27. arXiv:2201.12806  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Improving End-to-End Contextual Speech Recognition with Fine-Grained Contextual Knowledge Selection

    Authors: Minglun Han, Linhao Dong, Zhenlin Liang, Meng Cai, Shiyu Zhou, Zejun Ma, Bo Xu

    Abstract: Nowadays, most methods in end-to-end contextual speech recognition bias the recognition process towards contextual knowledge. Since all-neural contextual biasing methods rely on phrase-level contextual modeling and attention-based relevance modeling, they may encounter confusion between similar context-specific phrases, which hurts predictions at the token level. In this work, we focus on mitigati…

    Submitted 2 March, 2022; v1 submitted 30 January, 2022; originally announced January 2022.

    Comments: Accepted by ICASSP 2022

  28. arXiv:2112.08929  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    Bootstrap Equilibrium and Probabilistic Speaker Representation Learning for Self-supervised Speaker Verification

    Authors: Sung Hwan Mun, Min Hyun Han, Dongjune Lee, Jihwan Kim, Nam Soo Kim

    Abstract: In this paper, we propose self-supervised speaker representation learning strategies, which comprise of a bootstrap equilibrium speaker representation learning in the front-end and an uncertainty-aware probabilistic speaker embedding training in the back-end. In the front-end stage, we learn the speaker representations via the bootstrap training scheme with the uniformity regularization term. In t…

    Submitted 24 December, 2021; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: Accepted by IEEE Access

  29. Photonics-based de-chirping and leakage cancellation for frequency-modulated continuous-wave radar system

    Authors: Taixia Shi, Dingding Liang, Moxuan Han, Yang Chen

    Abstract: A photonics-based leakage cancellation and echo signal de-chirping approach for frequency-modulated continuous-wave radar systems is proposed based on a dual-drive Mach-Zehnder modulator (DD-MZM), with its performance evaluated by the radar measurement and imaging. The de-chirp reference signal and the leakage cancellation reference signal are combined and applied to the upper arm of the DD-MZM, w…

    Submitted 12 November, 2021; originally announced November 2021.

    Comments: 20 pages, 9 figures

  30. Digital-assisted photonic analog wideband multipath self-interference cancellation

    Authors: Moxuan Han, Taixia Shi, Yang Chen

    Abstract: A digital-assisted photonic analog wideband radio-frequency multipath self-interference cancellation (SIC) and frequency downconversion method based on a dual-drive Mach-Zehnder modulator and the recursive least square (RLS) algorithm is proposed and demonstrated for in-band full-duplex systems. Besides the reference for the direct-path self-interference (SI) signal, the RLS algorithm is used to c…

    Submitted 12 November, 2021; originally announced November 2021.

    Comments: 9 pages, 5 figures

  31. arXiv:2111.03111  [pdf, other]

    cs.RO eess.SY

    Modeling and Control of an Omnidirectional Micro Aerial Vehicle Equipped with a Soft Robotic Arm

    Authors: Róbert Szász, Mike Allenspach, Minghao Han, Marco Tognon, Robert. K. Katzschmann

    Abstract: Flying manipulators are aerial drones with attached rigid-bodied robotic arms and belong to the latest and most actively developed research areas in robotics. The rigid nature of these arms often lack compliance, flexibility, and smoothness in movement. This work proposes to use a soft-bodied robotic arm attached to an omnidirectional micro aerial vehicle (OMAV) to leverage the compliant and flexi…

    Submitted 4 November, 2021; originally announced November 2021.

  32. Photonics-assisted wideband RF self-interference cancellation with digital domain amplitude and delay pre-matching

    Authors: Taixia Shi, Moxuan Han, Yang Chen

    Abstract: A photonics-based digital and analog self-interference cancellation approach for in-band full-duplex communication systems and frequency-modulated continuous-wave radar systems is reported. One dual-drive Mach-Zehnder modulator is used to implement the analog self-interference cancellation by pre-adjusting the delay and amplitude of the reference signal applied to the dual-drive Mach-Zehnder modul…

    Submitted 7 September, 2021; originally announced September 2021.

    Comments: 10 pages, 6 figures

  33. Coordinating Flexible Demand Response and Renewable Uncertainties for Scheduling of Community Integrated Energy Systems with an Electric Vehicle Charging Station: A Bi-level Approach

    Authors: Yang Li, Meng Han, Zhen Yang, Guoqing Li

    Abstract: A community integrated energy system (CIES) with an electric vehicle charging station (EVCS) provides a new way for tackling growing concerns of energy efficiency and environmental pollution, it is a critical task to coordinate flexible demand response and multiple renewable uncertainties. To this end, a novel bi-level optimal dispatching model for the CIES with an EVCS in multi-stakeholder scenar…

    Submitted 16 July, 2021; originally announced July 2021.

    Comments: Accepted by IEEE Transactions on Sustainable Energy

    Journal ref: IEEE Transactions on Sustainable Energy 12 (2021) 2321-2331

  34. arXiv:2104.03893  [pdf, other]

    cs.RO cs.AI cs.CV cs.HC eess.SP

    Multimodal Fusion of EMG and Vision for Human Grasp Intent Inference in Prosthetic Hand Control

    Authors: Mehrshad Zandigohar, Mo Han, Mohammadreza Sharif, Sezen Yagmur Gunay, Mariusz P. Furmanek, Mathew Yarossi, Paolo Bonato, Cagdas Onal, Taskin Padir, Deniz Erdogmus, Gunar Schirner

    Abstract: Objective: For transradial amputees, robotic prosthetic hands promise to regain the capability to perform daily living activities. Current control methods based on physiological signals such as electromyography (EMG) are prone to yielding poor inference outcomes due to motion artifacts, muscle fatigue, and many more. Vision sensors are a major source of information about the environment state and…

    Submitted 27 February, 2024; v1 submitted 8 April, 2021; originally announced April 2021.

    ACM Class: I.5.4; I.2.9

    Journal ref: Front. Robot. AI 11 (2024) Sec. Biomedical Robotics

  35. arXiv:2104.00818  [pdf, other]

    cs.IT eess.SY

    Deep Learning-based Codebook Design for Code-domain Non-Orthogonal Multiple Access Approaching Single-User Bit Error Rate Performance

    Authors: Minsig Han, Hanchang Seo, Ameha Tsegaye Abebe, Chung G. Kang

    Abstract: A general form of codebook design for code-domain non-orthogonal multiple access (CD-NOMA) can be considered equivalent to an autoencoder (AE)-based constellation design for multi-user multidimensional modulation (MU-MDM). Due to a constrained design space for optimal constellation, e.g., fixed resource mapping and equal power allocation to all codebooks, however, existing AE architectures produce…

    Submitted 10 October, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

  36. arXiv:2012.09466  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    CIF-based Collaborative Decoding for End-to-end Contextual Speech Recognition

    Authors: Minglun Han, Linhao Dong, Shiyu Zhou, Bo Xu

    Abstract: End-to-end (E2E) models have achieved promising results on multiple speech recognition benchmarks, and shown the potential to become the mainstream. However, the unified structure and the E2E training hamper injecting contextual information into them for contextual biasing. Though contextual LAS (CLAS) gives an excellent all-neural solution, the degree of biasing to given context information is no…

    Submitted 18 February, 2021; v1 submitted 17 December, 2020; originally announced December 2020.

    Comments: Accepted by ICASSP 2021

  37. arXiv:2011.06882  [pdf, other]

    eess.SY cs.LG cs.RO

    Reinforcement Learning Control of Constrained Dynamic Systems with Uniformly Ultimate Boundedness Stability Guarantee

    Authors: Minghao Han, Yuan Tian, Lixian Zhang, Jun Wang, Wei Pan

    Abstract: Reinforcement learning (RL) is promising for complicated stochastic nonlinear control problems. Without using a mathematical model, an optimal controller can be learned from data evaluated by certain performance criteria through trial-and-error. However, the data-based learning approach is notorious for not guaranteeing stability, which is the most fundamental property for any control system. In t…

    Submitted 13 November, 2020; originally announced November 2020.

  38. arXiv:2010.11433  [pdf, other]

    eess.AS cs.SD

    Unsupervised Representation Learning for Speaker Recognition via Contrastive Equilibrium Learning

    Authors: Sung Hwan Mun, Woo Hyun Kang, Min Hyun Han, Nam Soo Kim

    Abstract: In this paper, we propose a simple but powerful unsupervised learning method for speaker recognition, namely Contrastive Equilibrium Learning (CEL), which increases the uncertainty on nuisance factors latent in the embeddings by employing the uniformity loss. Also, to preserve speaker discriminability, a contrastive similarity loss function is used together. Experimental results showed that the pr…

    Submitted 22 October, 2020; originally announced October 2020.

    Comments: 5 pages, 1 figure, 4 tables

  39. arXiv:2010.11408  [pdf, ps, other]

    eess.AS cs.SD

    Robust Text-Dependent Speaker Verification via Character-Level Information Preservation for the SdSV Challenge 2020

    Authors: Sung Hwan Mun, Woo Hyun Kang, Min Hyun Han, Nam Soo Kim

    Abstract: This paper describes our submission to Task 1 of the Short-duration Speaker Verification (SdSV) challenge 2020. Task 1 is a text-dependent speaker verification task, where both the speaker and phrase are required to be verified. The submitted systems were composed of TDNN-based and ResNet-based front-end architectures, in which the frame-level features were aggregated with various pooling methods…

    Submitted 21 October, 2020; originally announced October 2020.

    Comments: Accepted in INTERSPEECH 2020

  40. arXiv:2009.13453  [pdf, other]

    eess.SP cs.LG

    Universal Physiological Representation Learning with Soft-Disentangled Rateless Autoencoders

    Authors: Mo Han, Ozan Ozdenizci, Toshiaki Koike-Akino, Ye Wang, Deniz Erdogmus

    Abstract: Human computer interaction (HCI) involves a multidisciplinary fusion of technologies, through which the control of external devices could be achieved by monitoring physiological status of users. However, physiological biosignals often vary across users and recording sessions due to unstable physical/mental conditions and task-irrelevant activities. To deal with this challenge, we propose a method…

    Submitted 28 September, 2020; originally announced September 2020.

    Comments: 8 pages

  41. arXiv:2008.11426  [pdf, other]

    cs.LG eess.SP stat.ML

    Disentangled Adversarial Autoencoder for Subject-Invariant Physiological Feature Extraction

    Authors: Mo Han, Ozan Ozdenizci, Ye Wang, Toshiaki Koike-Akino, Deniz Erdogmus

    Abstract: Recent developments in biosignal processing have enabled users to exploit their physiological status for manipulating devices in a reliable and safe manner. One major challenge of physiological sensing lies in the variability of biosignals across different users and tasks. To address this issue, we propose an adversarial feature extractor for transfer learning to exploit disentangled universal rep…

    Submitted 26 August, 2020; originally announced August 2020.

    Comments: Accepted for publication by IEEE Signal Processing Letters

    Journal ref: IEEE Signal Processing Letters, 2020

  42. Disentangled speaker and nuisance attribute embedding for robust speaker verification

    Authors: Woo Hyun Kang, Sung Hwan Mun, Min Hyun Han, Nam Soo Kim

    Abstract: Over the recent years, various deep learning-based embedding methods have been proposed and have shown impressive performance in speaker verification. However, as in most of the classical embedding techniques, the deep learning-based methods are known to suffer from severe performance degradation when dealing with speech samples with different conditions (e.g., recording devices, emotional states)…

    Submitted 7 August, 2020; originally announced August 2020.

    Comments: Accepted in IEEE Access

  43. arXiv:2005.04043  [pdf, ps, other]

    eess.IV cs.CV cs.LG

    Hypergraph Learning for Identification of COVID-19 with CT Imaging

    Authors: Donglin Di, Feng Shi, Fuhua Yan, Liming Xia, Zhanhao Mo, Zhongxiang Ding, Fei Shan, Shengrui Li, Ying Wei, Ying Shao, Miaofei Han, Yaozong Gao, He Sui, Yue Gao, Dinggang Shen

    Abstract: The coronavirus disease, named COVID-19, has become the largest global public health crisis since it started in early 2020. CT imaging has been used as a complementary tool to assist early screening, especially for the rapid identification of COVID-19 cases from community acquired pneumonia (CAP) cases. The main challenge in early screening is how to model the confusing cases in the COVID-19 and C…

    Submitted 7 May, 2020; originally announced May 2020.

  44. arXiv:2004.14288  [pdf, other]

    cs.RO cs.LG eess.SY

    Actor-Critic Reinforcement Learning for Control with Stability Guarantee

    Authors: Minghao Han, Lixian Zhang, Jun Wang, Wei Pan

    Abstract: Reinforcement Learning (RL) and its integration with deep learning have achieved impressive performance in various robotic control tasks, ranging from motion planning and navigation to end-to-end visual manipulation. However, stability is not guaranteed in model-free RL by solely using data. From a control-theoretic perspective, stability is the most important property for any control system, sinc…

    Submitted 15 July, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

    Comments: IEEE RA-L + IROS 2020

  45. arXiv:2004.08289  [pdf, other]

    eess.SP cs.HC cs.LG stat.ML

    Disentangled Adversarial Transfer Learning for Physiological Biosignals

    Authors: Mo Han, Ozan Ozdenizci, Ye Wang, Toshiaki Koike-Akino, Deniz Erdogmus

    Abstract: Recent developments in wearable sensors demonstrate promising results for monitoring physiological status in effective and comfortable ways. One major challenge of physiological status assessment is the problem of transfer learning caused by the domain inconsistency of biosignals across users or different recording sessions from the same user. We propose an adversarial inference approach for trans…

    Submitted 14 April, 2020; originally announced April 2020.

    Comments: 42nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2020)

  46. arXiv:2003.04655  [pdf]

    cs.CV eess.IV q-bio.QM

    Lung Infection Quantification of COVID-19 in CT Images with Deep Learning

    Authors: Fei Shan, Yaozong Gao, Jun Wang, Weiya Shi, Nannan Shi, Miaofei Han, Zhong Xue, Dinggang Shen, Yuxin Shi

    Abstract: CT imaging is crucial for diagnosis, assessment and staging COVID-19 infection. Follow-up scans every 3-5 days are often recommended for disease progression. It has been reported that bilateral and peripheral ground glass opacification (GGO) with or without consolidation are predominant CT findings in COVID-19 patients. However, due to lack of computerized quantification tools, only qualitative im…

    Submitted 30 March, 2020; v1 submitted 10 March, 2020; originally announced March 2020.

    Comments: 23 pages, 6 figures

  47. arXiv:2001.00577  [pdf, other]

    eess.AS cs.LG cs.SD

    Attention based on-device streaming speech recognition with large speech corpus

    Authors: Kwangyoun Kim, Kyungmin Lee, Dhananjaya Gowda, Junmo Park, Sungsoo Kim, Sichen **, Young-Yoon Lee, **su Yeo, Daehyun Kim, Seokyeong Jung, Jungin Lee, Myoungji Han, Chanwoo Kim

    Abstract: In this paper, we present a new on-device automatic speech recognition (ASR) system based on monotonic chunk-wise attention (MoChA) models trained with large (> 10K hours) corpus. We attained around 90% of a word recognition rate for general domain mainly by using joint training of connectionist temporal classifier (CTC) and cross entropy (CE) losses, minimum word error rate (MWER) training, layer…

    Submitted 1 January, 2020; originally announced January 2020.

    Comments: Accepted and presented at the ASRU 2019 conference

  48. arXiv:1912.01054  [pdf, other]

    eess.IV cs.CV cs.LG

    The state of the art in kidney and kidney tumor segmentation in contrast-enhanced CT imaging: Results of the KiTS19 Challenge

    Authors: Nicholas Heller, Fabian Isensee, Klaus H. Maier-Hein, Xiaoshuai Hou, Chunmei Xie, Fengyi Li, Yang Nan, Guangrui Mu, Zhiyong Lin, Miofei Han, Guang Yao, Yaozong Gao, Yao Zhang, Yixin Wang, Feng Hou, Jiawei Yang, Guangwei Xiong, Jiang Tian, Cheng Zhong, Jun Ma, Jack Rickman, Joshua Dean, Bethany Stai, Resha Tejpaul, Makinna Oestreich, et al. (16 additional authors not shown)

    Abstract: There is a large body of literature linking anatomic and geometric characteristics of kidney tumors to perioperative and oncologic outcomes. Semantic segmentation of these tumors and their host kidneys is a promising tool for quantitatively characterizing these lesions, but its adoption is limited due to the manual effort required to produce high-quality 3D segmentations of these structures. Recen…

    Submitted 7 August, 2020; v1 submitted 2 December, 2019; originally announced December 2019.

    Comments: 24 pages, 11 figures

  49. arXiv:1911.02875  [pdf, other]

    cs.LG cs.RO eess.SY

    $H_\infty$ Model-free Reinforcement Learning with Robust Stability Guarantee

    Authors: Minghao Han, Yuan Tian, Lixian Zhang, Jun Wang, Wei Pan

    Abstract: Reinforcement learning is showing great potentials in robotics applications, including autonomous driving, robot manipulation and locomotion. However, with complex uncertainties in the real-world environment, it is difficult to guarantee the successful generalization and sim-to-real transfer of learned policies theoretically. In this paper, we introduce and extend the idea of robust stability and…

    Submitted 25 July, 2020; v1 submitted 7 November, 2019; originally announced November 2019.

    Comments: NeurIPS 2019 Workshop on Robot Learning: Control and Interaction in the Real World, Vancouver, Canada