Showing 1–50 of 56 results for author: Han, B

  1. arXiv:2406.17196  [pdf, other]

    cs.IT eess.SY

    Coded Kalman Filtering over MIMO Gaussian Channels with Feedback

    Authors: Barron Han, Oron Sabag, Victoria Kostina, Babak Hassibi

    Abstract: We consider the problem of remotely stabilizing a linear dynamical system. In this setting, a sensor co-located with the system communicates the system's state to a controller over a noisy communication channel with feedback. The objective of the controller (decoder) is to use the channel outputs to estimate the vector state with finite zero-delay mean squared error (MSE) at the infinite horizon.…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted for presentation at the 2024 IEEE International Symposium on Information Theory

  2. arXiv:2406.11364  [pdf, other]

    cs.SD eess.AS

    AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection

    Authors: Anbai Jiang, Bing Han, Zhiqiang Lv, Yufeng Deng, Wei-Qiang Zhang, Xie Chen, Yanmin Qian, Jia Liu, Pingyi Fan

    Abstract: Large pre-trained models have demonstrated dominant performances in multiple areas, where the consistency between pre-training and fine-tuning is the key to success. However, few works reported satisfactory results of pre-trained models for the machine anomalous sound detection (ASD) task. This may be caused by the inconsistency of the pre-trained model and the inductive bias of machine audio, res…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  3. arXiv:2406.07855  [pdf, other]

    cs.CL cs.SD eess.AS

    VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment

    Authors: Bing Han, Long Zhou, Shujie Liu, Sanyuan Chen, Lingwei Meng, Yanming Qian, Yanqing Liu, Sheng Zhao, Jinyu Li, Furu Wei

    Abstract: With the help of discrete neural audio codecs, large language models (LLM) have increasingly been recognized as a promising methodology for zero-shot Text-to-Speech (TTS) synthesis. However, sampling based decoding strategies bring astonishing diversity to generation, but also pose robustness issues such as typos, omissions and repetition. In addition, the high sampling rate of audio also brings h…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 15 pages, 5 figures

  4. arXiv:2405.09245  [pdf, other]

    eess.SP

    A Robust UAV-Based Approach for Power-Modulated Jammer Localization Using DoA

    Authors: Zexin Fang, Bin Han, Hans D. Schotten

    Abstract: Unmanned aerial vehicles (UAVs) are well-suited to localize jammers, particularly when jammers are at non-terrestrial locations, where conventional detection methods face challenges. In this work we propose a novel localization method, sample pruning gradient descent (SPGD), which offers robust performance against multiple power-modulated jammers with low computational complexity.

    Submitted 21 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: Submitted to the 2024 IEEE 100th Vehicular Technology Conference (VTC2024-Fall)

  5. arXiv:2405.04867  [pdf, other]

    eess.IV cs.CV

    MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

    Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Haijin Zeng, Kai Feng, et al. (24 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

  6. arXiv:2404.16484  [pdf, other]

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  7. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  8. arXiv:2404.03088  [pdf, other]

    cs.LG cs.AI cs.NI eess.SP

    Robust Federated Learning for Wireless Networks: A Demonstration with Channel Estimation

    Authors: Zexin Fang, Bin Han, Hans D. Schotten

    Abstract: Federated learning (FL) offers a privacy-preserving collaborative approach for training models in wireless networks, with channel estimation emerging as a promising application. Despite extensive studies on FL-empowered channel estimation, the security concerns associated with FL require meticulous attention. In a scenario where small base stations (SBSs) serve as local models trained on cached da…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Submitted to IEEE GLOBECOM 2024

  9. arXiv:2404.02159  [pdf, other]

    cs.IT eess.SP

    Fairness-aware Age-of-Information Minimization in WPT-Assisted Short-Packet THz Communications for mURLLC

    Authors: Yao Zhu, Xiaopeng Yuan, Yulin Hu, Bo Ai, Ruikang Wang, Bin Han, Anke Schmeink

    Abstract: The technological landscape is swiftly advancing towards large-scale systems, creating significant opportunities, particularly in the domain of Terahertz (THz) communications. Networks designed for massive connectivity, comprising numerous Internet of Things (IoT) devices, are at the forefront of this advancement. In this paper, we consider Wireless Power Transfer (WPT)-enabled networks that suppo…

    Submitted 15 February, 2024; originally announced April 2024.

  10. arXiv:2402.09810  [pdf, other]

    eess.SP

    3D Cooperative Localization in UAV Systems: CRLB Analysis and Security Solutions

    Authors: Zexin Fang, Bin Han, Hans D. Schotten

    Abstract: This paper presents a robust and secure framework for achieving accurate and reliable cooperative localization in multiple unmanned aerial vehicle (UAV) systems. The Cramér-Rao lower bound (CRLB) for the three-dimensional (3D) cooperative localization network is derived, with particular attention given to the non-uniform spatial distribution of anchor nodes. Challenges of mobility and security threa…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: Submitted to IEEE Transactions on Wireless Communications

  11. arXiv:2401.11902  [pdf, other]

    eess.IV cs.CV

    A Training-Free Defense Framework for Robust Learned Image Compression

    Authors: Myungseo Song, Jinyoung Choi, Bohyung Han

    Abstract: We study the robustness of learned image compression models against adversarial attacks and present a training-free defense technique based on simple image transform functions. Recent learned image compression models are vulnerable to adversarial attacks that result in poor compression rate, low reconstruction quality, or weird artifacts. To address the limitations, we propose a simple but effecti…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: 10 pages and 14 figures

  12. arXiv:2312.15946  [pdf, other]

    cs.SD cs.GR eess.AS

    EnchantDance: Unveiling the Potential of Music-Driven Dance Movement

    Authors: Bo Han, Yi Ren, Hao Peng, Teng Zhang, Zeyu Ling, Xiang Yin, Feilin Han

    Abstract: The task of music-driven dance generation involves creating coherent dance movements that correspond to the given music. While existing methods can produce physically plausible dances, they often struggle to generalize to out-of-set data. The challenge arises from three aspects: 1) the high diversity of dance movements and significant differences in the distribution of music modalities, which make…

    Submitted 26 December, 2023; originally announced December 2023.

  13. arXiv:2312.09576  [pdf, other]

    eess.IV cs.CV

    SegRap2023: A Benchmark of Organs-at-Risk and Gross Tumor Volume Segmentation for Radiotherapy Planning of Nasopharyngeal Carcinoma

    Authors: Xiangde Luo, Jia Fu, Yunxin Zhong, Shuolin Liu, Bing Han, Mehdi Astaraki, Simone Bendazzoli, Iuliana Toma-Dasu, Yiwen Ye, Ziyang Chen, Yong Xia, Yanzhou Su, Jin Ye, Junjun He, Zhaohu Xing, Hongqiu Wang, Lei Zhu, Kaixiang Yang, Xin Fang, Zhiwei Wang, Chan Woong Lee, Sang Joon Park, Jaehee Chun, Constantin Ulrich, Klaus H. Maier-Hein, et al. (17 additional authors not shown)

    Abstract: Radiation therapy is a primary and effective NasoPharyngeal Carcinoma (NPC) treatment strategy. The precise delineation of Gross Tumor Volumes (GTVs) and Organs-At-Risk (OARs) is crucial in radiation treatment, directly impacting patient prognosis. Previously, the delineation of GTVs and OARs was performed by experienced radiation oncologists. Recently, deep learning has achieved promising results…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: A challenge report of SegRap2023 (organized in conjunction with MICCAI2023)

  14. arXiv:2310.11747  [pdf, other]

    cs.IT eess.SY math.OC

    Coded Kalman Filtering Over Gaussian Channels with Feedback

    Authors: Barron Han, Oron Sabag, Victoria Kostina, Babak Hassibi

    Abstract: This paper investigates the problem of zero-delay joint source-channel coding of a vector Gauss-Markov source over a multiple-input multiple-output (MIMO) additive white Gaussian noise (AWGN) channel with feedback. In contrast to the classical problem of causal estimation using noisy observations, we examine a system where the source can be encoded before transmission. An encoder, equipped with fe…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: Presented at 59th Allerton Conference on Communication, Control, and Computing

  15. arXiv:2309.11730  [pdf, other]

    eess.AS cs.SD

    Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition

    Authors: Shuai Wang, Qibing Bai, Qi Liu, Jianwei Yu, Zhengyang Chen, Bing Han, Yanmin Qian, Haizhou Li

    Abstract: Current speaker recognition systems primarily rely on supervised approaches, constrained by the scale of labeled datasets. To boost the system performance, researchers leverage large pretrained models such as WavLM to transfer learned high-level features to the downstream speaker recognition task. However, this approach introduces extra parameters as the pretrained model remains in the inference s…

    Submitted 26 September, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: submitted to ICASSP 2024

  16. arXiv:2309.06672  [pdf, other]

    cs.SD eess.AS

    Attention-based Encoder-Decoder End-to-End Neural Diarization with Embedding Enhancer

    Authors: Zhengyang Chen, Bing Han, Shuai Wang, Yanmin Qian

    Abstract: Deep neural network-based systems have significantly improved the performance of speaker diarization tasks. However, end-to-end neural diarization (EEND) systems often struggle to generalize to scenarios with an unseen number of speakers, while target speaker voice activity detection (TS-VAD) systems tend to be overly complex. In this paper, we propose a simple attention-based encoder-decoder netw…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: IEEE/ACM Transactions on Audio Speech and Language Processing Under Review

  17. arXiv:2309.04270  [pdf, other]

    eess.SP cs.MA

    A Reliable and Resilient Framework for Multi-UAV Mutual Localization

    Authors: Zexin Fang, Bin Han, Hans D. Schotten

    Abstract: This paper presents a robust and secure framework for achieving accurate and reliable mutual localization in multiple unmanned aerial vehicle (UAV) systems. Challenges of accurate localization and security threats are addressed, and corresponding solutions are brought forth and assessed in our paper with numerical simulations. The proposed solution incorporates two key components: the Mobility Adap…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: Accepted by the 2023 IEEE 98th Vehicular Technology Conference (VTC2023-Fall), Hong Kong, 10-13 October 2023

  18. arXiv:2308.14360  [pdf, other]

    cs.SD cs.AI eess.AS

    InstructME: An Instruction Guided Music Edit And Remix Framework with Latent Diffusion Models

    Authors: Bing Han, Junyu Dai, Weituo Hao, Xinyan He, Dong Guo, Jitong Chen, Yuxuan Wang, Yanmin Qian, Xuchen Song

    Abstract: Music editing primarily entails the modification of instrument tracks or remixing in the whole, which offers a novel reinterpretation of the original piece through a series of operations. These music processing methods hold immense potential across various applications but demand substantial expertise. Prior methodologies, although effective for image and audio modifications, falter when directly…

    Submitted 12 December, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: Demo samples are available at https://musicedit.github.io/

  19. arXiv:2307.10321  [pdf, other]

    eess.SP

    Terahertz Communications and Sensing for 6G and Beyond: A Comprehensive Review

    Authors: Wei Jiang, Qiuheng Zhou, Jiguang He, Mohammad Asif Habibi, Sergiy Melnyk, Mohammed El Absi, Bin Han, Marco Di Renzo, Hans Dieter Schotten, Fa-Long Luo, Tarek S. El-Bawab, Markku Juntti, Merouane Debbah, Victor C. M. Leung

    Abstract: Next-generation cellular technologies, commonly referred to as the 6G, are envisioned to support a higher system capacity, better performance, and network sensing capabilities. The THz band is one potential enabler to this end due to the large unused frequency bands and the high spatial resolution enabled by the short signal wavelength and large bandwidth. Different from earlier surveys, this pape…

    Submitted 6 May, 2024; v1 submitted 19 July, 2023; originally announced July 2023.

    Comments: 56 pages, 9 figures, 11 tables, IEEE Communications Surveys & Tutorials

  20. arXiv:2307.08205  [pdf, ps, other]

    eess.AS cs.SD

    Exploring Binary Classification Loss For Speaker Verification

    Authors: Bing Han, Zhengyang Chen, Yanmin Qian

    Abstract: The mismatch between close-set training and open-set testing usually leads to significant performance degradation for speaker verification task. For existing loss functions, metric learning-based objectives depend strongly on searching effective pairs which might hinder further improvements. And popular multi-classification methods are usually observed with degradation when evaluated on unseen spe…

    Submitted 16 July, 2023; originally announced July 2023.

    Comments: Accepted by ICASSP 2023

  21. arXiv:2306.15161  [pdf, other]

    eess.AS cs.SD

    Wespeaker baselines for VoxSRC2023

    Authors: Shuai Wang, Chengdong Liang, Xu Xiang, Bing Han, Zhengyang Chen, Hongji Wang, Wen Ding

    Abstract: This report showcases the results achieved using the wespeaker toolkit for the VoxSRC2023 Challenge. Our aim is to provide participants, especially those with limited experience, with clear and straightforward guidelines to develop their initial systems. Via well-structured recipes and strong results, we hope to offer an accessible and good enough start point for all interested individuals. In thi…

    Submitted 28 June, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

  22. arXiv:2305.12021  [pdf, other]

    eess.SP cs.MA

    A Secure and Robust Approach for Distance-Based Mutual Positioning of Unmanned Aerial Vehicles

    Authors: Bin Han, Hans D. Schotten

    Abstract: Unmanned aerial vehicles (UAVs) are becoming increasingly important in modern civilian and military applications. However, their novel use cases are bottlenecked by conventional satellite and terrestrial localization technologies, calling for complementary solutions. Multi-UAV mutual positioning can be a potential answer, but its accuracy and security are challenged by inaccurate and/or malicious me…

    Submitted 9 January, 2024; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: Accepted for presentation at the IEEE WCNC 2024

  23. arXiv:2305.10704  [pdf, other]

    cs.SD eess.AS

    Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor

    Authors: Zhengyang Chen, Bing Han, Shuai Wang, Yanmin Qian

    Abstract: This paper proposes a novel Attention-based Encoder-Decoder network for End-to-End Neural speaker Diarization (AED-EEND). In AED-EEND system, we incorporate the target speaker enrollment information used in target speaker voice activity detection (TS-VAD) to calculate the attractor, which can mitigate the speaker permutation problem and facilitate easier model convergence. In the training process,…

    Submitted 15 August, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted by InterSpeech 2023

  24. arXiv:2305.08029  [pdf, other]

    cs.SD cs.AI eess.AS

    REMAST: Real-time Emotion-based Music Arrangement with Soft Transition

    Authors: Zihao Wang, Le Ma, Chen Zhang, Bo Han, Yunfei Xu, Yikai Wang, Xinyi Chen, HaoRong Hong, Wenbo Liu, Xinda Wu, Kejun Zhang

    Abstract: Music as an emotional intervention medium has important applications in scenarios such as music therapy, games, and movies. However, music needs real-time arrangement according to changing emotions, bringing challenges to balance emotion real-time fit and soft emotion transition due to the fine-grained and mutable nature of the target emotion. Existing studies mainly focus on achieving emotion rea…

    Submitted 5 February, 2024; v1 submitted 13 May, 2023; originally announced May 2023.

    ACM Class: H.5.5; F.2.2

  25. arXiv:2304.05754  [pdf, other]

    cs.SD eess.AS

    Self-Supervised Learning with Cluster-Aware-DINO for High-Performance Robust Speaker Verification

    Authors: Bing Han, Zhengyang Chen, Yanmin Qian

    Abstract: Automatic speaker verification task has made great achievements using deep learning approaches with the large-scale manually annotated dataset. However, it's very difficult and expensive to collect a large amount of well-labeled data for system building. In this paper, we propose a novel and advanced self-supervised learning framework which can construct a high performance speaker verification sys…

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: Submitted to TASLP on July 19, 2022

  26. arXiv:2301.09080  [pdf, other]

    cs.MM cs.SD eess.AS

    Dance2MIDI: Dance-driven multi-instruments music generation

    Authors: Bo Han, Yuheng Li, Yixuan Shen, Yi Ren, Feilin Han

    Abstract: Dance-driven music generation aims to generate musical pieces conditioned on dance videos. Previous works focus on monophonic or raw audio generation, while the multi-instruments scenario is under-explored. The challenges associated with the dance-driven multi-instrument music (MIDI) generation are twofold: 1) no publicly available multi-instruments MIDI and video paired dataset and 2) the weak co…

    Submitted 27 February, 2024; v1 submitted 22 January, 2023; originally announced January 2023.

    Comments: has been accepted by Computational Visual Media Journal

  27. arXiv:2211.00815  [pdf, other]

    cs.SD eess.AS

    Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022

    Authors: Zhengyang Chen, Bing Han, Xu Xiang, Houjun Huang, Bei Liu, Yanmin Qian

    Abstract: Many speaker recognition challenges have been held to assess the speaker verification system in the wild and probe the performance limit. Voxceleb Speaker Recognition Challenge (VoxSRC), based on the voxceleb, is the most popular. Besides, another challenge called CN-Celeb Speaker Recognition Challenge (CNSRC) is also held this year, which is based on the Chinese celebrity multi-genre dataset CN-C…

    Submitted 1 June, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

    Comments: Accepted by InterSpeech 2023

  28. arXiv:2210.15936  [pdf, other]

    cs.SD eess.AS

    A comprehensive study on self-supervised distillation for speaker representation learning

    Authors: Zhengyang Chen, Yao Qian, Bing Han, Yanmin Qian, Michael Zeng

    Abstract: In real application scenarios, it is often challenging to obtain a large amount of labeled data for speaker representation learning due to speaker privacy concerns. Self-supervised learning with no labels has become a more and more promising way to solve it. Compared with contrastive learning, self-distilled approaches use only positive samples in the loss function and thus are more attractive. In…

    Submitted 25 November, 2022; v1 submitted 28 October, 2022; originally announced October 2022.

    Comments: Accepted by SLT2022

  29. arXiv:2210.14321  [pdf, other]

    eess.AS cs.AI cs.MM cs.SD eess.SP

    Artificial ASMR: A Cyber-Psychological Approach

    Authors: Zexin Fang, Bin Han, C. Clark Cao, Hans D. Schotten

    Abstract: The popularity of Autonomous Sensory Meridian Response (ASMR) has skyrocketed over the past decade, but scientific studies on what exactly triggers the ASMR effect remain few and immature. One commonly acknowledged trigger is that ASMR clips typically provide rich semantic information. With our attention caught by the common acoustic patterns in ASMR audios, we investigate the correlation between…

    Submitted 5 July, 2023; v1 submitted 25 October, 2022; originally announced October 2022.

    Comments: Accepted by IEEE MLSP 2023

  30. arXiv:2210.12361  [pdf]

    eess.IV cs.CV

    MS-DCANet: A Novel Segmentation Network For Multi-Modality COVID-19 Medical Images

    Authors: Xiaoyu Pan, Huazheng Zhu, Jinglong Du, Guangtao Hu, Baoru Han, Yuanyuan Jia

    Abstract: The Coronavirus Disease 2019 (COVID-19) pandemic has increased the public health burden and brought profound disaster to humans. For the particularity of the COVID-19 medical images with blurred boundaries, low contrast and different infection sites, some researchers have improved the accuracy by adding more complexity. Also, they overlook the complexity of lesions, which hinder their ability to c…

    Submitted 19 July, 2023; v1 submitted 22 October, 2022; originally announced October 2022.

    Comments: 21 pages, 13 figures, 9 tables

    Journal ref: J Multidiscip Healthc. 2023;16:2023-2043

  31. arXiv:2209.09076  [pdf, other]

    cs.SD eess.AS

    SJTU-AISPEECH System for VoxCeleb Speaker Recognition Challenge 2022

    Authors: Zhengyang Chen, Bing Han, Xu Xiang, Houjun Huang, Bei Liu, Yanmin Qian

    Abstract: This report describes the SJTU-AISPEECH system for the Voxceleb Speaker Recognition Challenge 2022. For track1, we implemented two kinds of systems, the online system and the offline system. Different ResNet-based backbones and loss functions are explored. Our final fusion system achieved 3rd place in track1. For track3, we implemented statistic adaptation and jointly training based domain adaptat…

    Submitted 20 September, 2022; v1 submitted 19 September, 2022; originally announced September 2022.

    Comments: System description of VoxSRC 2022

  32. arXiv:2208.01933  [pdf, other]

    cs.SD eess.AS

    The SJTU System for Short-duration Speaker Verification Challenge 2021

    Authors: Bing Han, Zhengyang Chen, Zhikai Zhou, Yanmin Qian

    Abstract: This paper presents the SJTU system for both text-dependent and text-independent tasks in short-duration speaker verification (SdSV) challenge 2021. In this challenge, we explored different strong embedding extractors to extract robust speaker embedding. For text-independent task, language-dependent adaptive snorm is explored to improve the system performance under the cross-lingual verification c…

    Submitted 3 August, 2022; originally announced August 2022.

    Comments: Published by Interspeech 2021

  33. arXiv:2208.01928  [pdf, other]

    cs.SD eess.AS

    Self-Supervised Speaker Verification Using Dynamic Loss-Gate and Label Correction

    Authors: Bing Han, Zhengyang Chen, Yanmin Qian

    Abstract: For self-supervised speaker verification, the quality of pseudo labels decides the upper bound of the system due to the massive unreliable labels. In this work, we propose dynamic loss-gate and label correction (DLG-LC) to alleviate the performance degradation caused by unreliable estimated labels. In DLG, we adopt Gaussian Mixture Model (GMM) to dynamically model the loss distribution and use the…

    Submitted 3 August, 2022; originally announced August 2022.

    Comments: Accepted by Interspeech 2022

  34. arXiv:2206.11699  [pdf, ps, other]

    cs.SD eess.AS

    The SJTU X-LANCE Lab System for CNSRC 2022

    Authors: Zhengyang Chen, Bei Liu, Bing Han, Leying Zhang, Yanmin Qian

    Abstract: This technical report describes the SJTU X-LANCE Lab system for the three tracks in CNSRC 2022. In this challenge, we explored the speaker embedding modeling ability of deep ResNet (Deeper r-vector). All the systems are only trained on the Cnceleb training set and we use the same systems for the three tracks in CNSRC 2022. In this challenge, our system ranks the first place in the fixed track of s…

    Submitted 14 May, 2023; v1 submitted 23 June, 2022; originally announced June 2022.

  35. Modeling the System-Level Reliability towards a Convergence of Communication, Computing and Control

    Authors: Bin Han, Hans D. Schotten

    Abstract: Enabled and driven by modern advances in wireless telecommunication and artificial intelligence, the convergence of communication, computing, and control is becoming inevitable in future industrial applications. Analytical and optimizing frameworks, however, are not yet readily developed for this new technical trend. In this work we discuss the necessity and typical scenarios of this convergence,…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: Accepted to appear in the 8th International Symposium on Reliability and Risk Management (ISRERM 2022)

    Journal ref: in Proceedings of the 8th International Symposium on Reliability Engineering and Risk Management, 2022

  36. arXiv:2204.10197  [pdf, other]

    cs.NI eess.SY

    Flexible and dependable manufacturing beyond xURLLC: A novel framework for communication-control co-design

    Authors: Bin Han, Mu-Xia Sun, Lai-Kan Muk, Yan-Fu Li, Hans D. Schotten

    Abstract: Future Industry 4.0 applications in the 6G era are calling for high dependability that goes far beyond the current ultra-reliable low latency communication (URLLC), and thereby pose critical challenges to the communication technology. Instead of struggling against the physical and technical limits towards an extreme URLLC (xURLLC), communication-control co-design (CoCoCo) appears a more pro…

    Submitted 5 December, 2022; v1 submitted 18 March, 2022; originally announced April 2022.

    Comments: To appear in the 22nd IEEE International Conference on Software Quality, Reliability, and Security (QRS 2022) Workshops

  37. arXiv:2203.04398  [pdf]

    eess.SP physics.optics

    Window Filtering Algorithm for Pulsed Light Coherent Combining of Low Repetition Frequency

    Authors: Jiali Zhang, Jie Cao, Qun Hao, Yang Cheng, Liquan Dong, Bin Han, Xuesheng Liu

    Abstract: The multi-dithering method has been well verified in phase locking of polarization coherent combination experiment. However, it is hard to apply to low repetition frequency pulsed lasers, since there exists an overlap frequency domain between pulse laser and the amplitude phase noise and traditional filters cannot effectively separate phase noise. Aiming to solve the problem in this paper, we prop…

    Submitted 3 March, 2022; originally announced March 2022.

  38. arXiv:2201.02876  [pdf, other]

    eess.IV cs.CV

    Defocus Deblur Microscopy via Head-to-Tail Cross-scale Fusion

    Authors: Jiahe Wang, Boran Han

    Abstract: Microscopy imaging is vital in biology research and diagnosis. When imaging at the scale of cell or molecule level, mechanical drift on the axial axis can be difficult to correct. Although multi-scale networks have been developed for deblurring, those cascade residual learning approaches fail to accurately capture the end-to-end non-linearity of deconvolution, a relation between in-focus images an…

    Submitted 30 May, 2023; v1 submitted 8 January, 2022; originally announced January 2022.

    Comments: Published at ICIP 2022

  39. arXiv:2112.15399  [pdf, other]

    cs.CV cs.GR eess.IV

    InfoNeRF: Ray Entropy Minimization for Few-Shot Neural Volume Rendering

    Authors: Mijeong Kim, Seonguk Seo, Bohyung Han

    Abstract: We present an information-theoretic regularization technique for few-shot novel view synthesis based on neural implicit representation. The proposed approach minimizes potential reconstruction inconsistency that happens due to insufficient viewpoints by imposing the entropy constraint of the density in each ray. In addition, to alleviate the potential degenerate issue when all training images are…

    Submitted 10 April, 2022; v1 submitted 31 December, 2021; originally announced December 2021.

    Comments: CVPR 2022, Website: http://cv.snu.ac.kr/research/InfoNeRF

  40. Time-Energy-Constrained Closed-Loop FBL Communication for Dependable MEC

    Authors: Bin Han, Yao Zhu, Anke Schmeink, Hans D. Schotten

    Abstract: The deployment of multi-access edge computing (MEC) is paving the way towards pervasive intelligence in future 6G networks. This new paradigm also imposes emerging requirements of dependable communications, which go beyond the ultra-reliable low latency communication (URLLC), focusing on the performance of a closed loop instead of that of a unidirectional link. This work studies the simple but…

    Submitted 10 December, 2021; v1 submitted 24 November, 2021; originally announced November 2021.

    Comments: Accepted for publication at CSCN 2021. V1: accepted version; V2: minor correction in the modulation order; V3: corrections to resolve chaos caused by different normalizations of the FBL PER equation, model figure file updated in HQ

    Journal ref: in 2021 IEEE Conference on Standards for Communications and Networking (CSCN), 2021, pp. 180-185

  41. arXiv:2108.09551  [pdf, other]

    eess.IV cs.CV cs.LG

    Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform

    Authors: Myungseo Song, Jinyoung Choi, Bohyung Han

    Abstract: We propose a versatile deep image compression network based on Spatial Feature Transform (SFT arXiv:1804.02815), which takes a source image and a corresponding quality map as inputs and produces a compressed image with variable rates. Our model covers a wide range of compression rates using a single model, which is controlled by arbitrary pixel-wise quality maps. In addition, the proposed framework…

    Submitted 21 August, 2021; originally announced August 2021.

    Comments: ICCV 2021

  42. arXiv:2106.08754  [pdf]

    cond-mat.mtrl-sci eess.IV

    Conformal Three-Dimensional Interphase of Li Metal Anode Revealed by Low Dose Cryo-Electron Microscopy

    Authors: Bing Han, Xiangyan Li, Shuang Bai, Yucheng Zou, Bingyu Lu, Minghao Zhang, Xiaomin Ma, Zhi Chang, Ying Shirley Meng, Meng Gu

    Abstract: Using cryogenic transmission electron microscopy, we revealed three dimensional (3D) structural details of the electrochemically plated lithium (Li) flakes and their solid electrolyte interphase (SEI), including the composite SEI skin-layer and SEI fossil pieces buried inside the Li matrix. As the SEI skin-layer is largely comprised of nanocrystalline LiF and Li2O in amorphous polymeric matrix, wh…

    Submitted 10 June, 2021; originally announced June 2021.

  43. Online Estimation of Resource Overload Risk in 5G Multi-Tenancy Network

    Authors: Yasameen Shihab Hamad, Bin Han, Osman Nuri Ucan

    Abstract: The technology of network slicing, as the most characteristic feature of the fifth generation (5G) wireless networks, manages the resources and network functions in heterogeneous and logically isolated slices on the top of a shared physical infrastructure, where every slice can be independently customized to fulfill the specific requirements of its devoted service type. It enables a new paradigm o…

    Submitted 15 June, 2021; v1 submitted 14 June, 2021; originally announced June 2021.

    Comments: To appear at ESREL 2021

    Journal ref: Proceedings of the 31st European Safety and Reliability Conference, 2021

  44. arXiv:2106.06237  [pdf, other]

    eess.IV cs.CV cs.LG

    KRADA: Known-region-aware Domain Alignment for Open-set Domain Adaptation in Semantic Segmentation

    Authors: Chenhong Zhou, Feng Liu, Chen Gong, Rongfei Zeng, Tongliang Liu, William K. Cheung, Bo Han

    Abstract: In semantic segmentation, we aim to train a pixel-level classifier to assign category labels to all pixels in an image, where labeled training images and unlabeled test images are from the same distribution and share the same label set. However, in an open world, the unlabeled test images probably contain unknown categories and have different distributions from the labeled images. Hence, in this p…

    Submitted 19 February, 2023; v1 submitted 11 June, 2021; originally announced June 2021.

    Comments: 18 pages

    Journal ref: Transactions on Machine Learning Research, 2023

  45. arXiv:2104.12362  [pdf, other]

    eess.SP

    Underwater Target Recognition based on Multi-Decision LOFAR Spectrum Enhancement: A Deep Learning Approach

    Authors: Jie Chen, Jie Liu, Chang Liu, Jian Zhang, Bing Han

    Abstract: The low frequency analysis and recording (LOFAR) spectrum is one of the key features of the underwater target, which can be used for underwater target recognition. However, the underwater environment noise is complicated and the signal-to-noise ratio of the underwater target is rather low, which introduces breakpoints to the LOFAR spectrum and thus hinders the underwater target recognition. T…

    Submitted 26 April, 2021; originally announced April 2021.

  46. The Road Towards 6G: A Comprehensive Survey

    Authors: Wei Jiang, Bin Han, Mohammad Asif Habibi, Hans Dieter Schotten

    Abstract: As of today, the fifth generation (5G) mobile communication system has been rolled out in many countries and the number of 5G subscribers already reaches a very large scale. It is time for academia and industry to shift their attention towards the next generation. At this crossroad, an overview of the current state of the art and a vision of future communications are definitely of interest. This a…

    Submitted 2 February, 2021; originally announced February 2021.

    Comments: 30 Pages, 5 figures, IEEE open Journal

    Journal ref: IEEE Open Journal of the Communications Society (OJCOMS), Vol. 2, 2021, pp. 334 - 366

  47. Partial Discharge Direction of Arrival Estimation in Air-insulated Substation by UHF Wireless Array and RSSI Maximum Likelihood Estimator

    Authors: Bei Han, Lingen Luo, Gehao Sheng, Xiuchen Jiang

    Abstract: The quick detection and localization of partial discharge (PD) in an air-insulated substation (AIS) based on ultrahigh-frequency (UHF) sensor arrays are efficient for power equipment monitoring. The adopted UHF PD time difference of arrival (TDOA) methods mainly use the time difference of electromagnetic wave signals. Thus, the system requires both a high sampling rate and time synchronization acc…

    Submitted 16 October, 2020; originally announced October 2020.

    Comments: 8 pages, 14 figures

  48. arXiv:2009.00197  [pdf, other]

    eess.IV q-bio.QM

    Deep unsupervised learning for Microscopy-Based Malaria detection

    Authors: Alexander Tao, Boran Han

    Abstract: Malaria, a mosquito-borne disease caused by a parasite, kills over 1 million people globally each year. People, if left untreated, may develop severe complications, leading to death. Effective and accurate diagnosis is important for the management and control of malaria. Our research focuses on utilizing machine learning to improve the efficiency in Malaria diagnosis. We utilize a modified U-net a…

    Submitted 31 August, 2020; originally announced September 2020.

  49. arXiv:2004.11536  [pdf, other]

    physics.soc-ph eess.SY

    Leveraging inter-firm influence in the diffusion of energy efficiency technologies: An agent-based model

    Authors: Yingying Shi, Yongchao Zeng, Jean Engo, Botang Han, Yang Li, Ralph T Muehleisen

    Abstract: Energy efficiency technologies (EETs) are crucial for saving energy and reducing carbon dioxide emissions. However, the diffusion of EETs in small and medium-sized enterprises is rather slow. Literature shows the interactions between innovation adopters and potential adopters have significant impacts on innovation diffusion. Enterprises lack the motivation to share information, and EETs usually la…

    Submitted 24 April, 2020; originally announced April 2020.

    Journal ref: Applied Energy 263 (2020) 114641

  50. Robustness Analysis of Networked Control Systems with Aging Status

    Authors: Bin Han, Siyu Yuan, Zhiyuan Jiang, Yao Zhu, Hans D. Schotten

    Abstract: As an emerging metric of communication systems, Age of Information (AoI) has been derived to have a critical impact in networked control systems with unreliable information links. This work sets up a novel model of outage probability in a loosely constrained control system as a function of the feedback AoI, and conducts numerical simulations to validate the model.

    Submitted 10 February, 2020; originally announced February 2020.

    Comments: Submitted to IEEE INFOCOM 2020 poster session

    Journal ref: IEEE INFOCOM 2020 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS)