Showing 1–30 of 30 results for author: Zhao, K

Searching in archive eess.
  1. arXiv:2407.01146  [pdf, other]

    eess.IV cs.CV

    Cross-Slice Attention and Evidential Critical Loss for Uncertainty-Aware Prostate Cancer Detection

    Authors: Alex Ling Yu Hung, Haoxin Zheng, Kai Zhao, Kaifeng Pang, Demetri Terzopoulos, Kyunghyun Sung

    Abstract: Current deep learning-based models typically analyze medical images in either 2D or 3D albeit disregarding volumetric information or suffering sub-optimal performance due to the anisotropic resolution of MR data. Furthermore, providing an accurate uncertainty estimation is beneficial to clinicians, as it indicates how confident a model is about its prediction. We propose a novel 2.5D cross-slice a…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2405.07478  [pdf, other]

    eess.SY

    Coded Event-triggered Control for Nonlinear Systems

    Authors: Ruihang Ji, Shuzhi Sam Ge, Kai Zhao

    Abstract: This paper studies a Coded Event-triggered Control (CEC) for a class of nonlinear systems under any initial condition. To reduce communication burden, the CEC is designed from the encoding-decoding viewpoint by which only $m$-length string is transmitted for each communication between CEC and actuator. If a more general Entry Capture Problem is encountered, such control design will be rather compl…

    Submitted 13 May, 2024; originally announced May 2024.

  3. arXiv:2403.09651  [pdf, other]

    cs.CV eess.IV

    Precision Agriculture: Crop Mapping using Machine Learning and Sentinel-2 Satellite Imagery

    Authors: Kui Zhao, Siyang Wu, Chang Liu, Yue Wu, Natalia Efremova

    Abstract: Food security has grown in significance due to the changing climate and its warming effects. To support the rising demand for agricultural products and to minimize the negative impact of climate change and mass cultivation, precision agriculture has become increasingly important for crop cultivation. This study employs deep learning and pixel-based machine learning methods to accurately segment la…

    Submitted 25 November, 2023; originally announced March 2024.

  4. arXiv:2403.03629  [pdf, other]

    eess.SP

    Spatially Selective Reconfigurable Intelligent Surfaces Through Element Permutation

    Authors: Fredrik Rusek, Jose Flordelis, Kun Zhao, Erik Bengtsson, Olof Zander

    Abstract: A standard reconfigurable intelligent surface (RIS) can be configured to reflect signals from an arbitrary impinging direction to an arbitrary outgoing direction. However, if a signal impinges from any other direction, said signal is reflected, with full beamforming gain, to a specific direction, which is easily determined. The goal of this paper is to propose a RIS which \emph{only} reflects sign…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: ICC 2024, 6 pages, 4 figures

  5. arXiv:2402.14349  [pdf, other]

    eess.IV cs.CV cs.LG

    Uncertainty-driven and Adversarial Calibration Learning for Epicardial Adipose Tissue Segmentation

    Authors: Kai Zhao, Zhiming Liu, Jiaqi Liu, Jingbiao Zhou, Bihong Liao, Huifang Tang, Qiuyu Wang, Chunquan Li

    Abstract: Epicardial adipose tissue (EAT) is a type of visceral fat that can secrete large amounts of adipokines to affect the myocardium and coronary arteries. EAT volume and density can be used as independent risk markers, and measurement of volume by noninvasive magnetic resonance images is the best method of assessing EAT. However, segmenting EAT is challenging due to the low contrast between EAT and pericar…

    Submitted 23 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: 13 pages, 7 figures

  6. arXiv:2312.10921  [pdf, other]

    cs.CV cs.SD eess.AS

    AE-NeRF: Audio Enhanced Neural Radiance Field for Few Shot Talking Head Synthesis

    Authors: Dongze Li, Kang Zhao, Wei Wang, Bo Peng, Yingya Zhang, Jing Dong, Tieniu Tan

    Abstract: Audio-driven talking head synthesis is a promising topic with wide applications in digital human, film making and virtual reality. Recent NeRF-based approaches have shown superiority in quality and fidelity compared to previous studies. However, when it comes to few-shot talking head generation, a practical scenario where only few seconds of talking video is available for one identity, two limitat…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  7. arXiv:2311.04942  [pdf, other]

    eess.IV cs.CV

    CSAM: A 2.5D Cross-Slice Attention Module for Anisotropic Volumetric Medical Image Segmentation

    Authors: Alex Ling Yu Hung, Haoxin Zheng, Kai Zhao, Xiaoxi Du, Kaifeng Pang, Qi Miao, Steven S. Raman, Demetri Terzopoulos, Kyunghyun Sung

    Abstract: A large portion of volumetric medical data, especially magnetic resonance imaging (MRI) data, is anisotropic, as the through-plane resolution is typically much lower than the in-plane resolution. Both 3D and purely 2D deep learning-based segmentation methods are deficient in dealing with such volumetric data since the performance of 3D methods suffers when confronting anisotropic data, and 2D meth…

    Submitted 26 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

  8. arXiv:2310.19022  [pdf, other]

    math.OC cs.LG eess.SY

    Optimization Landscape of Policy Gradient Methods for Discrete-time Static Output Feedback

    Authors: Jingliang Duan, Jie Li, Xuyang Chen, Kai Zhao, Shengbo Eben Li, Lin Zhao

    Abstract: In recent times, significant advancements have been made in delving into the optimization landscape of policy gradient methods for achieving optimal control in linear time-invariant (LTI) systems. Compared with state-feedback control, output-feedback control is more prevalent since the underlying state of the system may not be fully observed in many practical settings. This paper analyzes the opti…

    Submitted 29 October, 2023; originally announced October 2023.

    Journal ref: IEEE Transactions on Cybernetics, 2023

  9. arXiv:2307.11926  [pdf, other]

    eess.IV cs.CV

    PartDiff: Image Super-resolution with Partial Diffusion Models

    Authors: Kai Zhao, Alex Ling Yu Hung, Kaifeng Pang, Haoxin Zheng, Kyunghyun Sung

    Abstract: Denoising diffusion probabilistic models (DDPMs) have achieved impressive performance on various image generation tasks, including image super-resolution. By learning to reverse the process of gradually diffusing the data distribution into Gaussian noise, DDPMs generate new data by iteratively denoising from random noise. Despite their impressive performance, diffusion-based generative models suff…

    Submitted 21 July, 2023; originally announced July 2023.

  10. arXiv:2307.09729  [pdf, other]

    cs.CV cs.MM eess.IV

    NTIRE 2023 Quality Assessment of Video Enhancement Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Wei Sun, Yulun Zhang, Kai Zhang, Radu Timofte, Guangtao Zhai, Yixuan Gao, Yuqin Cao, Tengchuan Kou, Yunlong Dong, Ziheng Jia, Yilin Li, Wei Wu, Shuming Hu, Sibin Deng, Pengxiang Xiao, Ying Chen, Kai Li, Kai Zhao, Kun Yuan, Ming Sun, Heng Cong, Hao Wang, Lingzhi Fu, et al. (47 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2023 Quality Assessment of Video Enhancement Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2023. This challenge is to address a major challenge in the field of video processing, namely, video quality assessment (VQA) for enhanced videos. The challenge uses the VQA Dataset for Perceptual…

    Submitted 18 July, 2023; originally announced July 2023.

  11. arXiv:2306.02011  [pdf]

    physics.med-ph eess.IV

    The contribution of T2 relaxation time to diffusion MRI quantification and its clinical implications: a hypothesis

    Authors: Yi Xiang J Wang, Kai-Xuan Zhao, Fu-Zhao Ma, Ben-Heng Xiao

    Abstract: Considering liver as the reference, that both fast diffusion (PF) and slow diffusion (Dslow) of the spleen are much underestimated is likely due to the MRI properties of the spleen such as the much longer T2 relaxation time. It is possible that longer T2 relaxation time partially mitigates the signal decay effect of various gradients on diffusion weighted image. This phenomenon will not be limited…

    Submitted 3 June, 2023; originally announced June 2023.

  12. arXiv:2305.00139  [pdf, other]

    cs.LG eess.SP

    Leveraging Label Non-Uniformity for Node Classification in Graph Neural Networks

    Authors: Feng Ji, See Hian Lee, Hanyang Meng, Kai Zhao, Jielong Yang, Wee Peng Tay

    Abstract: In node classification using graph neural networks (GNNs), a typical model generates logits for different class labels at each node. A softmax layer often outputs a label prediction based on the largest logit. We demonstrate that it is possible to infer hidden graph structural information from the dataset using these logits. We introduce the key notion of label non-uniformity, which is derived fro…

    Submitted 28 April, 2023; originally announced May 2023.

  13. arXiv:2304.04366  [pdf, other]

    cs.RO cs.LG eess.SY

    Learning Residual Model of Model Predictive Control via Random Forests for Autonomous Driving

    Authors: Kang Zhao, Jianru Xue, Xiangning Meng, Gengxin Li, Mengsen Wu

    Abstract: One major issue in learning-based model predictive control (MPC) for autonomous driving is the contradiction between the system model's prediction accuracy and computation efficiency. The more situations a system model covers, the more complex it is, along with highly nonlinear and nonconvex properties. These issues make the optimization too complicated to solve and render real-time control imprac…

    Submitted 9 April, 2023; originally announced April 2023.

    Comments: 8 pages, 8 figures

  14. arXiv:2304.03507  [pdf, other]

    eess.SP cs.LG

    Distributional Signals for Node Classification in Graph Neural Networks

    Authors: Feng Ji, See Hian Lee, Kai Zhao, Wee Peng Tay, Jielong Yang

    Abstract: In graph neural networks (GNNs), both node features and labels are examples of graph signals, a key notion in graph signal processing (GSP). While it is common in GSP to impose signal smoothness constraints in learning and estimation tasks, it is unclear how this can be done for discrete node labels. We bridge this gap by introducing the concept of distributional graph signals. In our framework, w…

    Submitted 7 April, 2023; originally announced April 2023.

  15. arXiv:2301.05648  [pdf, ps, other]

    eess.SP

    Reconfigurable Intelligent Surface Empowered Rate-Splitting Multiple Access for Simultaneous Wireless Information and Power Transfer

    Authors: Chengzhong Tian, Yijie Mao, Kangchun Zhao, Yuanming Shi, Bruno Clerckx

    Abstract: Rate-splitting multiple access (RSMA) and reconfigurable intelligent surface (RIS) have been both recognized as promising techniques for 6G. The benefits of combining the two techniques to enhance the spectral and energy efficiency have been recently exploited in communication-only networks. Inspired by the recent advances on RIS empowered RSMA, in this work we investigate the use of RIS empowered…

    Submitted 13 January, 2023; originally announced January 2023.

  16. arXiv:2212.09988  [pdf, other]

    cs.CV eess.IV

    Multi-Reference Image Super-Resolution: A Posterior Fusion Approach

    Authors: Ke Zhao, Haining Tan, Tsz Fung Yau

    Abstract: Reference-based Super-resolution (RefSR) approaches have recently been proposed to overcome the ill-posed problem of image super-resolution by providing additional information from a high-resolution image. Multi-reference super-resolution extends this approach by allowing more information to be incorporated. This paper proposes a 2-step-weighting posterior fusion approach to combine the outputs of…

    Submitted 19 December, 2022; originally announced December 2022.

  17. arXiv:2210.12715  [pdf, ps, other]

    eess.SY

    Adaptive Control with Global Exponential Stability for Parameter-Varying Nonlinear Systems under Unknown Control Gains

    Authors: Hefu Ye, Haijia Wu, Kai Zhao, Yongduan Song

    Abstract: It is nontrivial to achieve exponential stability even for time-invariant nonlinear systems with matched uncertainties and persistent excitation (PE) condition. In this paper, without the need for PE condition, we address the problem of global exponential stabilization of strict-feedback systems with mismatched uncertainties and unknown yet time-varying control gains. The resultant control, embedd…

    Submitted 23 October, 2022; originally announced October 2022.

  18. arXiv:2204.03238  [pdf, other]

    eess.AS cs.MM

    Unsupervised Quantized Prosody Representation for Controllable Speech Synthesis

    Authors: Yutian Wang, Yuankun Xie, Kun Zhao, Hui Wang, Qin Zhang

    Abstract: In this paper, we propose a novel prosody disentangle method for prosodic Text-to-Speech (TTS) model, which introduces the vector quantization (VQ) method to the auxiliary prosody encoder to obtain the decomposed prosody representations in an unsupervised manner. Relying on its advantages, the speaking styles, such as pitch, speaking velocity, local pitch variance, etc., are decomposed automatically…

    Submitted 7 April, 2022; originally announced April 2022.

    Comments: accepted by IEEE International Conference on Multimedia and Expo 2022 (ICME2022)

  19. Video Polyp Segmentation: A Deep Learning Perspective

    Authors: Ge-Peng Ji, Guobao Xiao, Yu-Cheng Chou, Deng-Ping Fan, Kai Zhao, Geng Chen, Luc Van Gool

    Abstract: We present the first comprehensive video polyp segmentation (VPS) study in the deep learning era. Over the years, developments in VPS are not moving forward with ease due to the lack of large-scale fine-grained segmentation annotations. To address this issue, we first introduce a high-quality frame-by-frame annotated VPS dataset, named SUN-SEG, which contains 158,690 colonoscopy frames from the we…

    Submitted 31 August, 2022; v1 submitted 27 March, 2022; originally announced March 2022.

    Comments: Accepted by Machine Intelligence Research 2022 (Project Page: https://github.com/GewelsJI/VPS)

    Journal ref: Machine Intelligence Research, vol. 19, no. 6, pp.531-549, 2022

  20. arXiv:2112.09891  [pdf, other]

    cs.LG eess.IV

    Equilibrated Zeroth-Order Unrolled Deep Networks for Accelerated MRI

    Authors: Zhuo-Xu Cui, Jing Cheng, Qingyong Zhu, Yuanyuan Liu, Sen Jia, Kankan Zhao, Ziwen Ke, Wenqi Huang, Haifeng Wang, Yanjie Zhu, Dong Liang

    Abstract: Recently, model-driven deep learning unrolls a certain iterative algorithm of a regularization model into a cascade network by replacing the first-order information (i.e., (sub)gradient or proximal operator) of the regularizer with a network module, which appears more explainable and predictable compared to common data-driven networks. Conversely, in theory, there is not necessarily such a functio…

    Submitted 22 December, 2021; v1 submitted 18 December, 2021; originally announced December 2021.

    Comments: 11 figures

  21. arXiv:2103.13197  [pdf, other]

    eess.SY

    Topology Design for GNSSs Considering Both Inter-satellite Links and Ground-satellite Links

    Authors: Z. Yan, K. Zhao, W. Li, C. Kang, J. Zheng, H. Yang, S. Du

    Abstract: Inter-satellite links (ISLs) are adopted in global navigation satellite systems (GNSSs) for high-precision orbit determination and space-based end-to-end telemetry telecommand control and communications. Due to limited onboard ISL terminals, the polling time division duplex (PTDD) mechanism is usually proposed for space link layer networking. By extending the polling mechanism to ground-satellite…

    Submitted 24 March, 2021; originally announced March 2021.

  22. arXiv:2103.09300  [pdf]

    cs.CV eess.IV

    The impact of data volume on performance of deep learning based building rooftop extraction using very high spatial resolution aerial images

    Authors: Hongjie He, Ke Yang, Yuwei Cai, Zijian Jiang, Qiutong Yu, Kun Zhao, Junbo Wang, Sarah Narges Fatholahi, Yan Liu, Hasti Andon Petrosians, Bingxu Hu, Liyuan Qing, Zhehan Zhang, Hongzhang Xu, Siyu Li, Kyle Gao, Linlin Xu, Jonathan Li

    Abstract: Building rooftop data are of importance in several urban applications and in natural disaster management. In contrast to traditional surveying and mapping, by using high spatial resolution aerial images, deep learning-based building rooftops extraction methods are efficient and accurate. Although more training data is preferred in deep learning-based tasks, the effect of data volume on building ex…

    Submitted 4 October, 2021; v1 submitted 16 March, 2021; originally announced March 2021.

  23. arXiv:2009.09761  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD stat.ML

    DiffWave: A Versatile Diffusion Model for Audio Synthesis

    Authors: Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, Bryan Catanzaro

    Abstract: In this work, we propose DiffWave, a versatile diffusion probabilistic model for conditional and unconditional waveform generation. The model is non-autoregressive, and converts the white noise signal into structured waveform through a Markov chain with a constant number of steps at synthesis. It is efficiently trained by optimizing a variant of variational bound on the data likelihood. DiffWave p…

    Submitted 30 March, 2021; v1 submitted 21 September, 2020; originally announced September 2020.

    Comments: ICLR 2021 (oral)

  24. Learning to Estimate Driver Drowsiness from Car Acceleration Sensors using Weakly Labeled Data

    Authors: Takayuki Katsuki, Kun Zhao, Takayuki Yoshizumi

    Abstract: This paper addresses the learning task of estimating driver drowsiness from the signals of car acceleration sensors. Since even drivers themselves cannot perceive their own drowsiness in a timely manner unless they use burdensome invasive sensors, obtaining labeled training data for each timestamp is not a realistic goal. To deal with this difficulty, we formulate the task as a weakly supervised l…

    Submitted 12 May, 2020; originally announced May 2020.

    Comments: Accepted by ICASSP2020

  25. arXiv:1912.01219  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    WaveFlow: A Compact Flow-based Model for Raw Audio

    Authors: Wei Ping, Kainan Peng, Kexin Zhao, Zhao Song

    Abstract: In this work, we propose WaveFlow, a small-footprint generative flow for raw audio, which is directly trained with maximum likelihood. It handles the long-range structure of 1-D waveform with a dilated 2-D convolutional architecture, while modeling the local variations using expressive autoregressive functions. WaveFlow provides a unified view of likelihood-based models for 1-D data, including Wav…

    Submitted 24 June, 2020; v1 submitted 3 December, 2019; originally announced December 2019.

    Comments: Published at ICML 2020. Code and pre-trained models: https://github.com/PaddlePaddle/Parakeet

  26. arXiv:1907.06844  [pdf, other]

    cs.CV eess.IV

    Deep inspection: an electrical distribution pole parts study via deep neural networks

    Authors: Liangchen Liu, Teng Zhang, Kun Zhao, Arnold Wiliem, Kieren Astin-Walmsley, Brian Lovell

    Abstract: Electrical distribution poles are important assets in electricity supply. These poles need to be maintained in good condition to ensure they protect community safety, maintain reliability of supply, and meet legislative obligations. However, maintaining such a large volume of assets is an expensive and challenging task. To address this, recent approaches utilise imagery data captured from helicop…

    Submitted 16 July, 2019; originally announced July 2019.

    Comments: electrical distribution pole inspection, integrated inspection system, object detection, imbalanced data classification, To appear in Proceeding of ICIP 2019

  27. arXiv:1907.04462  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Multi-Speaker End-to-End Speech Synthesis

    Authors: Jihyun Park, Kexin Zhao, Kainan Peng, Wei Ping

    Abstract: In this work, we extend ClariNet (Ping et al., 2019), a fully end-to-end speech synthesis model (i.e., text-to-wave), to generate high-fidelity speech from multiple speakers. To model the unique characteristic of different voices, low dimensional trainable speaker embeddings are shared across each component of ClariNet and trained together with the rest of the model. We demonstrate that the multi-…

    Submitted 9 July, 2019; originally announced July 2019.

  28. arXiv:1905.08459  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Non-Autoregressive Neural Text-to-Speech

    Authors: Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao

    Abstract: In this work, we propose ParaNet, a non-autoregressive seq2seq model that converts text to spectrogram. It is fully convolutional and brings 46.7 times speed-up over the lightweight Deep Voice 3 at synthesis, while obtaining reasonably good speech quality. ParaNet also produces stable alignment between text and speech on the challenging test sentences by iteratively improving the attention in a la…

    Submitted 29 June, 2020; v1 submitted 21 May, 2019; originally announced May 2019.

    Comments: Published at ICML 2020. (v3 changed paper title)

  29. arXiv:1806.09068  [pdf]

    physics.ins-det eess.SP

    Prototype of Front-end Electronics for PandaX-4ton Experiment

    Authors: Shuwen Wang, Zhongtao Shen, Keqing Zhao, Changqing Feng, Shubin Liu

    Abstract: At the China Jinping Underground Laboratory, the Particle AND Astrophysical Xenon phase IV (PandaX-4ton) in planning is a dark matter direct detection experiment with dual-phase xenon detector as an upgrade of the second phase of the experiment, PandaX-II. In this paper, the prototype of the front-end electronics of PandaX-4ton is presented. The front-end electronics consist of the high-gain pream…

    Submitted 23 June, 2018; originally announced June 2018.

  30. arXiv:1710.09026  [pdf, other]

    cs.LG cs.CL eess.AS stat.ML

    Trace norm regularization and faster inference for embedded speech recognition RNNs

    Authors: Markus Kliegl, Siddharth Goyal, Kexin Zhao, Kavya Srinet, Mohammad Shoeybi

    Abstract: We propose and evaluate new techniques for compressing and speeding up dense matrix multiplications as found in the fully connected and recurrent layers of neural networks for embedded large vocabulary continuous speech recognition (LVCSR). For compression, we introduce and study a trace norm regularization technique for training low rank factored versions of matrix multiplications. Compared to st…

    Submitted 6 February, 2018; v1 submitted 24 October, 2017; originally announced October 2017.

    Comments: Our optimized inference kernels are available at: https://github.com/PaddlePaddle/farm (Note: This paper was submitted to, but rejected from, ICLR 2018. We believe it may still be of value to others. Please see the discussion here: https://openreview.net/forum?id=B1tC-LT6W)