Showing 1–29 of 29 results for author: Kang, Y

  1. arXiv:2405.05252  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV eess.SP

    Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models

    Authors: Hongjie Wang, Difan Liu, Yan Kang, Yijun Li, Zhe Lin, Niraj K. Jha, Yuchen Liu

    Abstract: Diffusion Models (DMs) have exhibited superior performance in generating high-quality and diverse images. However, this exceptional performance comes at the cost of expensive architectural design, particularly due to the attention module heavily used in leading models. Existing works mainly adopt a retraining process to enhance DM efficiency. This is computationally expensive and not very scalable…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

  2. arXiv:2403.19105  [pdf, ps, other]

    cs.IT eess.SP

    Pilot Signal and Channel Estimator Co-Design for Hybrid-Field XL-MIMO

    Authors: Yoonseong Kang, Hyowoon Seo, Wan Choi

    Abstract: This paper addresses the intricate task of hybrid-field channel estimation in extremely large-scale MIMO (XL-MIMO) systems, critical for the progression of 6G communications. Within these systems, comprising a line-of-sight (LoS) channel component alongside far-field and near-field scattering channel components, our objective is to tackle the channel estimation challenge. We encounter two central…

    Submitted 27 March, 2024; originally announced March 2024.

  3. arXiv:2403.07355  [pdf, ps, other]

    eess.SP cs.AI cs.CV

    Vector Quantization for Deep-Learning-Based CSI Feedback in Massive MIMO Systems

    Authors: Junyong Shin, Yujin Kang, Yo-Seb Jeon

    Abstract: This paper presents a finite-rate deep-learning (DL)-based channel state information (CSI) feedback method for massive multiple-input multiple-output (MIMO) systems. The presented method provides a finite-bit representation of the latent vector based on a vector-quantized variational autoencoder (VQ-VAE) framework while reducing its computational complexity based on shape-gain vector quantization.…

    Submitted 12 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  4. arXiv:2402.11492  [pdf, other]

    eess.SY

    Exponential Cluster Synchronization in Fast Switching Network Topologies: A Pinning Control Approach with Necessary and Sufficient Conditions

    Authors: Ku Du, Yu Kang

    Abstract: This research investigates the intricate domain of synchronization problem among multiple agents operating within a dynamic fast switching network topology. We concentrate on cluster synchronization within coupled linear system under pinning control, providing both necessary and sufficient conditions. As a pivotal aspect, this paper aims to present the weakest possible conditions to make the coup…

    Submitted 18 February, 2024; originally announced February 2024.

  5. arXiv:2312.05279  [pdf]

    eess.IV cs.CV

    Quantitative perfusion maps using a novelty spatiotemporal convolutional neural network

    Authors: Anbo Cao, Pin-Yu Le, Zhonghui Qie, Haseeb Hassan, Yingwei Guo, Asim Zaman, Jiaxi Lu, Xueqiang Zeng, Huihui Yang, Xiaoqiang Miao, Taiyu Han, Guangtao Huang, Yan Kang, Yu Luo, Jia Guo

    Abstract: Dynamic susceptibility contrast magnetic resonance imaging (DSC-MRI) is widely used to evaluate acute ischemic stroke to distinguish salvageable tissue and infarct core. For this purpose, traditional methods employ deconvolution techniques, like singular value decomposition, which are known to be vulnerable to noise, potentially distorting the derived perfusion parameters. However, deep learning t…

    Submitted 8 December, 2023; originally announced December 2023.

  6. arXiv:2310.10992  [pdf, other]

    cs.SD eess.AS

    A High Fidelity and Low Complexity Neural Audio Coding

    Authors: Wenzhe Liu, Wei Xiao, Meng Wang, Shan Yang, Yupeng Shi, Yuyong Kang, Dan Su, Shidong Shang, Dong Yu

    Abstract: Audio coding is an essential module in the real-time communication system. Neural audio codecs can compress audio samples with a low bitrate due to the strong modeling and generative capabilities of deep neural networks. To address the poor high-frequency expression and high computational cost and storage consumption, we proposed an integrated framework that utilizes a neural network to model wide…

    Submitted 17 October, 2023; originally announced October 2023.

  7. arXiv:2308.11639  [pdf, other]

    eess.SP cs.AI cs.LG

    An Empirical Study on Fault Detection and Root Cause Analysis of Indium Tin Oxide Electrodes by Processing S-parameter Patterns

    Authors: Tae Yeob Kang, Haebom Lee, Sungho Suh

    Abstract: In the field of optoelectronics, indium tin oxide (ITO) electrodes play a crucial role in various applications, such as displays, sensors, and solar cells. Effective fault diagnosis and root cause analysis of the ITO electrodes are essential to ensure the performance and reliability of the devices. However, traditional visual inspection is challenging with transparent ITO electrodes, and existing…

    Submitted 10 June, 2024; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: Accepted in IEEE Transactions on Device and Materials Reliability

  8. arXiv:2304.14225  [pdf, ps, other]

    eess.SP

    Once and for All: Scheduling Multiple Users Using Statistical CSI under Fixed Wireless Access

    Authors: Xin Guan, Zhixing Chen, Yibin Kang, Qingjiang Shi

    Abstract: Conventional multi-user scheduling schemes are designed based on instantaneous channel state information (CSI), indicating that decisions must be made every transmission time interval (TTI) which lasts at most several milliseconds. Only quite simple approaches can be exploited under this stringent time constraint, resulting in less than satisfactory scheduling performance. In this paper, we invest…

    Submitted 27 April, 2023; originally announced April 2023.

    Comments: 12 pages, 6 figures

  9. arXiv:2303.09278  [pdf, other]

    eess.AS cs.SD

    DistillW2V2: A Small and Streaming Wav2vec 2.0 Based ASR Model

    Authors: Yanzhe Fu, Yueteng Kang, Songjun Cao, Long Ma

    Abstract: Wav2vec 2.0 (W2V2) has shown impressive performance in automatic speech recognition (ASR). However, the large model size and the non-streaming architecture make it hard to be used under low-resource or streaming scenarios. In this work, we propose a two-stage knowledge distillation method to solve these two problems: the first step is to make the big and non-streaming teacher model smaller, and th…

    Submitted 16 March, 2023; originally announced March 2023.

  10. arXiv:2212.05397  [pdf, other]

    eess.SY

    Task modules Partitioning, Scheduling and Floorplanning for Partially Dynamically Reconfigurable Systems Based on Modern Heterogeneous FPGAs

    Authors: Bo Ding, Jinglei Huang, Junpeng Wang, Qi Xu, Song Chen, Yi Kang

    Abstract: Modern field programmable gate array (FPGA) can be partially dynamically reconfigurable with heterogeneous resources distributed on the chip. And FPGA-based partially dynamically reconfigurable system (FPGA-PDRS) can be used to accelerate computing and improve computing flexibility. However, the traditional design of FPGA-PDRS is based on manual design. Implementing the automation of FPGA-PDRS n…

    Submitted 10 December, 2022; originally announced December 2022.

  11. arXiv:2207.08998  [pdf]

    eess.IV cs.CV cs.LG q-bio.QM

    Discovering novel systemic biomarkers in photos of the external eye

    Authors: Boris Babenko, Ilana Traynis, Christina Chen, Preeti Singh, Akib Uddin, Jorge Cuadros, Lauren P. Daskivich, April Y. Maa, Ramasamy Kim, Eugene Yu-Chuan Kang, Yossi Matias, Greg S. Corrado, Lily Peng, Dale R. Webster, Christopher Semturs, Jonathan Krause, Avinash V. Varadarajan, Naama Hammel, Yun Liu

    Abstract: External eye photos were recently shown to reveal signs of diabetic retinal disease and elevated HbA1c. In this paper, we evaluate if external eye photos contain information about additional systemic medical conditions. We developed a deep learning system (DLS) that takes external eye photos as input and predicts multiple systemic parameters, such as those related to the liver (albumin, AST); kidn…

    Submitted 18 July, 2022; originally announced July 2022.

  12. arXiv:2206.13042  [pdf, other]

    cs.CV eess.IV

    A Strategy Optimized Pix2pix Approach for SAR-to-Optical Image Translation Task

    Authors: Fujian Cheng, Yashu Kang, Chunlei Chen, Kezhao Jiang

    Abstract: This technical report summarizes the analysis and approach on the image-to-image translation task in the Multimodal Learning for Earth and Environment Challenge (MultiEarth 2022). In terms of strategy optimization, cloud classification is utilized to filter optical images with dense cloud coverage to aid the supervised learning alike approach. The commonly used pix2pix framework with a few optimiz…

    Submitted 4 July, 2022; v1 submitted 27 June, 2022; originally announced June 2022.

  13. arXiv:2206.09756  [pdf, other]

    cs.CV cs.LG eess.IV

    Time Gated Convolutional Neural Networks for Crop Classification

    Authors: Longlong Weng, Yashu Kang, Kezhao Jiang, Chunlei Chen

    Abstract: This paper presented a state-of-the-art framework, Time Gated Convolutional Neural Network (TGCNN) that takes advantage of temporal information and gating mechanisms for the crop classification problem. Besides, several vegetation indices were constructed to expand dimensions of input data to take advantage of spectral information. Both spatial (channel-wise) and temporal (step-wise) correlation a…

    Submitted 20 June, 2022; originally announced June 2022.

  14. arXiv:2205.02845  [pdf, other]

    eess.IV cs.CV

    Invariant Content Synergistic Learning for Domain Generalization of Medical Image Segmentation

    Authors: Yuxin Kang, Hansheng Li, Xuan Zhao, Dongqing Hu, Feihong Liu, Lei Cui, Jun Feng, Lin Yang

    Abstract: While achieving remarkable success for medical image segmentation, deep convolution neural networks (DCNNs) often fail to maintain their robustness when confronting test data with the novel distribution. To address such a drawback, the inductive bias of DCNNs is recently well-recognized. Specifically, DCNNs exhibit an inductive bias towards image style (e.g., superficial texture) rather than invar…

    Submitted 5 May, 2022; originally announced May 2022.

    Comments: 10 pages, 5 figures

  15. arXiv:2204.04645  [pdf, other]

    cs.SD cs.LG eess.AS

    Self-Supervised Audio-and-Text Pre-training with Extremely Low-Resource Parallel Data

    Authors: Yu Kang, Tianqiao Liu, Hang Li, Yang Hao, Wenbiao Ding

    Abstract: Multimodal pre-training for audio-and-text has recently been proved to be effective and has significantly improved the performance of many downstream speech understanding tasks. However, these state-of-the-art pre-training audio-text models work well only when provided with large amount of parallel audio-and-text data, which brings challenges on many languages that are rich in unimodal corpora but…

    Submitted 10 April, 2022; originally announced April 2022.

    Comments: AAAI 2022

  16. arXiv:2203.14476  [pdf]

    cs.CV eess.IV

    A Novel Remote Sensing Approach to Recognize and Monitor Red Palm Weevil in Date Palm Trees

    Authors: Yashu Kang, Chunlei Chen, Fujian Cheng, Jianyong Zhang

    Abstract: The spread of the Red Palm Weevil (RPW) has become an existential threat for palm trees around the world. In the Middle East, RPW is causing wide-spread damage to date palm Phoenix dactylifera L., having both agricultural impacts on the palm production and environmental impacts. Early detection of RPW is very challenging, especially at large scale. This research proposes a novel remote sensing appr…

    Submitted 27 March, 2022; originally announced March 2022.

  17. arXiv:2109.07327  [pdf, ps, other]

    eess.AS cs.SD

    Improving Streaming Transformer Based ASR Under a Framework of Self-supervised Learning

    Authors: Songjun Cao, Yueteng Kang, Yanzhe Fu, Xiaoshuo Xu, Sining Sun, Yike Zhang, Long Ma

    Abstract: Recently self-supervised learning has emerged as an effective approach to improve the performance of automatic speech recognition (ASR). Under such a framework, the neural network is usually pre-trained with massive unlabeled data and then fine-tuned with limited labeled data. However, the non-streaming architecture like bidirectional transformer is usually adopted by the neural network to achieve…

    Submitted 15 September, 2021; originally announced September 2021.

    Comments: INTERSPEECH2021

  18. arXiv:2107.07956  [pdf, other]

    cs.SD cs.CL eess.AS

    A Multimodal Machine Learning Framework for Teacher Vocal Delivery Evaluation

    Authors: Hang Li, Yu Kang, Yang Hao, Wenbiao Ding, Zhongqin Wu, Zitao Liu

    Abstract: The quality of vocal delivery is one of the key indicators for evaluating teacher enthusiasm, which has been widely accepted to be connected to the overall course qualities. However, existing evaluation for vocal delivery is mainly conducted with manual ratings, which faces two core challenges: subjectivity and time-consuming. In this paper, we present a novel machine learning approach that utiliz…

    Submitted 15 July, 2021; originally announced July 2021.

    Comments: AIED'21: The 22nd International Conference on Artificial Intelligence in Education, 2021

  19. arXiv:2104.01471  [pdf, other]

    eess.AS

    Adversarial Joint Training with Self-Attention Mechanism for Robust End-to-End Speech Recognition

    Authors: Lujun Li, Yikai Kang, Yuchen Shi, Ludwig Kürzinger, Tobias Watzel, Gerhard Rigoll

    Abstract: Lately, the self-attention mechanism has marked a new milestone in the field of automatic speech recognition (ASR). Nevertheless, its performance is susceptible to environmental intrusions as the system predicts the next output symbol depending on the full input sequence and the previous predictions. Inspired by the extensive applications of the generative adversarial networks (GANs) in speech enh…

    Submitted 3 April, 2021; originally announced April 2021.

  20. arXiv:2103.09407  [pdf, ps, other]

    eess.SY

    Model-Free Design of Stochastic LQR Controller from Reinforcement Learning and Primal-Dual Optimization Perspective

    Authors: Man Li, Jiahu Qin, Wei Xing Zheng, Yaonan Wang, Yu Kang

    Abstract: To further understand the underlying mechanism of various reinforcement learning (RL) algorithms and also to better use the optimization theory to make further progress in RL, many researchers begin to revisit the linear-quadratic regulator (LQR) problem, whose setting is simple and yet captures the characteristics of RL. Inspired by this, this work is concerned with the model-free design of stoch…

    Submitted 16 March, 2021; originally announced March 2021.

  21. arXiv:1910.13799  [pdf, other]

    eess.AS cs.LG cs.SD

    Multimodal Learning For Classroom Activity Detection

    Authors: Hang Li, Yu Kang, Wenbiao Ding, Song Yang, Songfan Yang, Gale Yan Huang, Zitao Liu

    Abstract: Classroom activity detection (CAD) focuses on accurately classifying whether the teacher or student is speaking and recording both the length of individual utterances during a class. A CAD solution helps teachers get instant feedback on their pedagogical instructions. This greatly improves educators' teaching skills and hence leads to students' achievement. However, CAD is very challenging because…

    Submitted 10 February, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

    Comments: The 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)

  22. Transferring Multiscale Map Styles Using Generative Adversarial Networks

    Authors: Yuhao Kang, Song Gao, Robert E. Roth

    Abstract: The advancement of the Artificial Intelligence (AI) technologies makes it possible to learn stylistic design criteria from existing maps or other visual art and transfer these styles to make new digital maps. In this paper, we propose a novel framework using AI for map style transfer applicable across multiple map scales. Specifically, we identify and transfer the stylistic elements from a target…

    Submitted 18 May, 2019; v1 submitted 6 May, 2019; originally announced May 2019.

    Comments: 12 pages, 17 figures

    ACM Class: I.2.1; I.4.9

    Journal ref: International Journal of Cartography, 2019

  23. Design of generalized fractional order gradient descent method

    Authors: Yiheng Wei, Yu Kang, Weidi Yin, Yong Wang

    Abstract: This paper focuses on the convergence problem of the emerging fractional order gradient descent method, and proposes three solutions to overcome the problem. In fact, the general fractional gradient method cannot converge to the real extreme point of the target function, which critically hampers the application of this method. Because of the long memory characteristics of fractional derivative, fi…

    Submitted 16 February, 2020; v1 submitted 24 December, 2018; originally announced January 2019.

    Comments: 8 pages, 16 figures

    MSC Class: 26A33; 90C25 ACM Class: F.2.2

  24. arXiv:1810.02558  [pdf, other]

    eess.SY

    Optimal Denial-of-Service Attack Energy Management over an SINR-Based Network

    Authors: Jiahu Qin, Menglin Li, Ling Shi, Yu Kang

    Abstract: We consider a scenario in which a DoS attacker with the limited power resource jams a wireless network through which the packet from a sensor is sent to a remote estimator to estimate the system state. To degrade the estimation quality with power constraint, the attacker aims to solve how much power to obstruct the channel each time, which is the recently proposed optimal attack energy management…

    Submitted 5 October, 2018; originally announced October 2018.

  25. arXiv:1810.02531  [pdf, other]

    eess.SY

    Randomized Consensus based Distributed Kalman Filtering over Wireless Sensor Networks

    Authors: Jiahu Qin, Jie Wang, Ling Shi, Yu Kang

    Abstract: This paper is concerned with developing a novel distributed Kalman filtering algorithm over wireless sensor networks based on randomized consensus strategy. Compared with the centralized algorithm, distributed filtering techniques require less computation per sensor and lead to more robust estimation since they simply use the information from the neighboring nodes in the network. However, poor loc…

    Submitted 5 October, 2018; originally announced October 2018.

  26. arXiv:1807.05855  [pdf]

    cs.CL cs.SD eess.AS

    A Fast-Converged Acoustic Modeling for Korean Speech Recognition: A Preliminary Study on Time Delay Neural Network

    Authors: Hosung Park, Donghyun Lee, Minkyu Lim, Yoseb Kang, Juneseok Oh, Ji-Hwan Kim

    Abstract: In this paper, a time delay neural network (TDNN) based acoustic model is proposed to implement a fast-converged acoustic modeling for Korean speech recognition. The TDNN has an advantage in fast-convergence where the amount of training data is limited, due to subsampling which excludes duplicated weights. The TDNN showed an absolute improvement of 2.12% in terms of character error rate compared t…

    Submitted 11 July, 2018; originally announced July 2018.

    Comments: 6 pages, 2 figures

  27. arXiv:1806.09276  [pdf]

    eess.AS cs.SD

    EMPHASIS: An Emotional Phoneme-based Acoustic Model for Speech Synthesis System

    Authors: Hao Li, Yongguo Kang, Zhenyu Wang

    Abstract: We present EMPHASIS, an emotional phoneme-based acoustic model for speech synthesis system. EMPHASIS includes a phoneme duration prediction model and an acoustic parameter prediction model. It uses a CBHG-based regression network to model the dependencies between linguistic features and acoustic features. We modify the input and output layer structures of the network to improve the performance. Fo…

    Submitted 25 June, 2018; v1 submitted 24 June, 2018; originally announced June 2018.

    Comments: Accepted by Interspeech 2018

  28. arXiv:1806.08619  [pdf, other]

    eess.AS cs.SD eess.SP

    Multi-task WaveNet: A Multi-task Generative Model for Statistical Parametric Speech Synthesis without Fundamental Frequency Conditions

    Authors: Yu Gu, Yongguo Kang

    Abstract: This paper introduces an improved generative model for statistical parametric speech synthesis (SPSS) based on WaveNet under a multi-task learning framework. Different from the original WaveNet model, the proposed Multi-task WaveNet employs the frame-level acoustic feature prediction as the secondary task and the external fundamental frequency prediction model for the original WaveNet can be remov…

    Submitted 22 June, 2018; originally announced June 2018.

    Comments: Accepted by Interspeech 2018

  29. arXiv:1711.03536  [pdf, other]

    eess.IV cs.AI cs.CV

    Picasso, Matisse, or a Fake? Automated Analysis of Drawings at the Stroke Level for Attribution and Authentication

    Authors: Ahmed Elgammal, Yan Kang, Milko Den Leeuw

    Abstract: This paper proposes a computational approach for analysis of strokes in line drawings by artists. We aim at developing an AI methodology that facilitates attribution of drawings of unknown authors in a way that is not easy to be deceived by forged art. The methodology used is based on quantifying the characteristics of individual strokes in drawings. We propose a novel algorithm for segmenting ind…

    Submitted 8 November, 2017; originally announced November 2017.