Showing 1–17 of 17 results for author: Dai, D

  1. arXiv:2403.05010  [pdf, other]

    cs.SD cs.AI eess.AS

    RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction

    Authors: Peng Liu, Dongyang Dai, Zhiyong Wu

    Abstract: Recent advancements in generative modeling have significantly enhanced the reconstruction of audio waveforms from various representations. While diffusion models are adept at this task, they are hindered by latency issues due to their operation at the individual sample point level and the need for numerous sampling steps. In this study, we introduce RFWave, a cutting-edge multi-band Rectified Flow…

    Submitted 2 June, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

  2. arXiv:2403.02894  [pdf]

    eess.SP

    DIFNet: SAR RFI suppression based on domain invariant features

    Authors: Fu** Fang, Wenhao Lv, Dahai Dai

    Abstract: Synthetic aperture radar is a high-resolution two-dimensional imaging radar, however, during the imaging process, SAR is susceptible to intentional and unintentional interference, with radio frequency interference (RFI) being the most common type, leading to a severe degradation in image quality. Although inpainting networks have achieved excellent results, their generalization is unclear, and whe…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: five pages

  3. arXiv:2301.06622  [pdf, other]

    cs.DC eess.SY

    IOPathTune: Adaptive Online Parameter Tuning for Parallel File System I/O Path

    Authors: Md. Hasanur Rashid, Youbiao He, Forrest Sheng Bao, Dong Dai

    Abstract: Parallel file systems contain complicated I/O paths from clients to storage servers. An efficient I/O path requires proper settings of multiple parameters, as the default settings often fail to deliver optimal performance, especially for diverse workloads in the HPC environment. Existing tuning strategies have shortcomings in being adaptive, timely, and flexible. We propose IOPathTune, which adapt…

    Submitted 16 January, 2023; originally announced January 2023.

  4. arXiv:2212.08558  [pdf, other]

    cs.RO cs.CV eess.SP

    Simulating Road Spray Effects in Automotive Lidar Sensor Models

    Authors: Clemens Linnhoff, Dominik Scheuble, Mario Bijelic, Lukas Elster, Philipp Rosenberger, Werner Ritter, Dengxin Dai, Hermann Winner

    Abstract: Modeling perception sensors is key for simulation based testing of automated driving functions. Beyond weather conditions themselves, sensors are also subjected to object dependent environmental influences like tire spray caused by vehicles moving on wet pavement. In this work, a novel modeling approach for spray in lidar data is introduced. The model conforms to the Open Simulation Interface (OSI…

    Submitted 16 December, 2022; originally announced December 2022.

    Comments: Submitted to IEEE Sensors Journal

  5. arXiv:2110.03347  [pdf, ps, other]

    eess.AS cs.HC cs.SD

    Cloning one's voice using very limited data in the wild

    Authors: Dongyang Dai, Yuanzhe Chen, Li Chen, Ming Tu, Lu Liu, Rui Xia, Qiao Tian, Yu** Wang, Yuxuan Wang

    Abstract: With the increasing popularity of speech synthesis products, the industry has put forward more requirements for personalized speech synthesis: (1) How to use low-resource, easily accessible data to clone a person's voice. (2) How to clone a person's voice while controlling the style and prosody. To solve the above two problems, we proposed the Hieratron model framework in which the prosody and tim…

    Submitted 8 October, 2021; v1 submitted 7 October, 2021; originally announced October 2021.

  6. arXiv:2109.02763  [pdf, other]

    cs.SD cs.CV eess.AS

    Binaural SoundNet: Predicting Semantics, Depth and Motion with Binaural Sounds

    Authors: Dengxin Dai, Arun Balajee Vasudevan, Jiri Matas, Luc Van Gool

    Abstract: Humans can robustly recognize and localize objects by using visual and/or auditory cues. While machines are able to do the same with visual data already, less work has been done with sounds. This work develops an approach for scene understanding purely based on binaural sounds. The considered tasks include predicting the semantic masks of sound-making objects, the motion of sound-making objects, a…

    Submitted 27 February, 2022; v1 submitted 6 September, 2021; originally announced September 2021.

    Comments: Accepted by TPAMI. arXiv admin note: substantial text overlap with arXiv:2003.04210

  7. arXiv:2012.11174  [pdf, other]

    eess.AS cs.AI

    Unsupervised Cross-Lingual Speech Emotion Recognition Using Domain Adversarial Neural Network

    Authors: Xiong Cai, Zhiyong Wu, Kuo Zhong, Bin Su, Dongyang Dai, Helen Meng

    Abstract: By using deep learning approaches, Speech Emotion Recognition (SER) on a single domain has achieved many excellent results. However, cross-domain SER is still a challenging task due to the distribution shift between source and target domains. In this work, we propose a Domain Adversarial Neural Network (DANN) based approach to mitigate this distribution shift problem for cross-lingual SER. Specifica…

    Submitted 21 December, 2020; originally announced December 2020.

    Comments: This paper has been accepted by ISCSLP2021

    ACM Class: I.2

  8. arXiv:2010.13350  [pdf, other]

    eess.AS cs.SD

    Emotion controllable speech synthesis using emotion-unlabeled dataset with the assistance of cross-domain speech emotion recognition

    Authors: Xiong Cai, Dongyang Dai, Zhiyong Wu, Xiang Li, **gbei Li, Helen Meng

    Abstract: Neural text-to-speech (TTS) approaches generally require a huge number of high quality speech data, which makes it difficult to obtain such a dataset with extra emotion labels. In this paper, we propose a novel approach for emotional TTS synthesis on a TTS dataset without emotion labels. Specifically, our proposed method consists of a cross-domain speech emotion recognition (SER) model and an emot…

    Submitted 17 January, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

    Comments: ICASSP 2021 final version

    MSC Class: I.2

  9. arXiv:2006.11610  [pdf, other]

    eess.AS cs.LG cs.MM cs.SD

    Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams

    Authors: Huirong Huang, Zhiyong Wu, Shiyin Kang, Dongyang Dai, Jia Jia, Tianxiao Fu, Deyi Tuo, Guangzhi Lei, Peng Liu, Dan Su, Dong Yu, Helen Meng

    Abstract: Generating 3D speech-driven talking head has received more and more attention in recent years. Recent approaches mainly have following limitations: 1) most speaker-independent methods need handcrafted features that are time-consuming to design or unreliable; 2) there is no convincing method to support multilingual or mixlingual speech as input. In this work, we propose a novel approach using phone…

    Submitted 20 June, 2020; originally announced June 2020.

    Comments: 5 pages, 5 figures

  10. arXiv:2006.04648  [pdf, other]

    cs.CV cs.LG eess.IV

    Graph-based Visual-Semantic Entanglement Network for Zero-shot Image Recognition

    Authors: Yang Hu, Guihua Wen, Adriane Chapman, Pei Yang, Mingnan Luo, Yingxue Xu, Dan Dai, Wendy Hall

    Abstract: Zero-shot learning uses semantic attributes to connect the search space of unseen objects. In recent years, although the deep convolutional network brings powerful visual modeling capabilities to the ZSL task, its visual features have severe pattern inertia and lack of representation of semantic relationships, which leads to severe bias and ambiguity. In response to this, we propose the Graph-base…

    Submitted 11 June, 2021; v1 submitted 8 June, 2020; originally announced June 2020.

    Comments: 15 pages, 11 figures, on IEEE Transactions on Multimedia

    Journal ref: IEEE Transactions on Multimedia, 2021

  11. arXiv:2005.12531  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Noise Robust TTS for Low Resource Speakers using Pre-trained Model and Speech Enhancement

    Authors: Dongyang Dai, Li Chen, Yu** Wang, Mu Wang, Rui Xia, Xuchen Song, Zhiyong Wu, Yuxuan Wang

    Abstract: With the popularity of deep neural network, speech synthesis task has achieved significant improvements based on the end-to-end encoder-decoder framework in the recent days. More and more applications relying on speech synthesis technology have been widely used in our daily life. Robust speech synthesis model depends on high quality and customized data which needs lots of collecting efforts. It is…

    Submitted 22 October, 2020; v1 submitted 26 May, 2020; originally announced May 2020.

  12. arXiv:2004.01643  [pdf, other]

    cs.CV cs.LG eess.IV

    Quantifying Data Augmentation for LiDAR based 3D Object Detection

    Authors: Martin Hahner, Dengxin Dai, Alexander Liniger, Luc Van Gool

    Abstract: In this work, we shed light on different data augmentation techniques commonly used in Light Detection and Ranging (LiDAR) based 3D Object Detection. For the bulk of our experiments, we utilize the well known PointPillars pipeline and the well established KITTI dataset. We investigate a variety of global and local augmentation techniques, where global augmentation techniques are applied to the ent…

    Submitted 29 July, 2022; v1 submitted 3 April, 2020; originally announced April 2020.

    Comments: 2022 Update

  13. arXiv:2003.04210  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds

    Authors: Arun Balajee Vasudevan, Dengxin Dai, Luc Van Gool

    Abstract: Humans can robustly recognize and localize objects by integrating visual and auditory cues. While machines are able to do the same now with images, less work has been done with sounds. This work develops an approach for dense semantic labelling of sound-making objects, purely based on binaural sounds. We propose a novel sensor setup and record a new audio-visual dataset of street scenes with eight…

    Submitted 9 March, 2020; originally announced March 2020.

    Comments: Project page: https://www.trace.ethz.ch/publications/2020/sound_perception/index.html

  14. arXiv:2003.00636  [pdf, other]

    cs.CV cs.LG eess.IV

    Matching Neuromorphic Events and Color Images via Adversarial Learning

    Authors: Fang Xu, Shijie Lin, Wen Yang, Lei Yu, Dengxin Dai, Gui-song Xia

    Abstract: The event camera has appealing properties: high dynamic range, low latency, low power consumption and low memory usage, and thus provides complementariness to conventional frame-based cameras. It only captures the dynamics of a scene and is able to capture almost "continuous" motion. However, different from frame-based camera that reflects the whole appearance as scenes are, the event camera casts…

    Submitted 1 March, 2020; originally announced March 2020.

  15. arXiv:2001.02613  [pdf, other]

    cs.CV cs.LG cs.RO eess.IV

    Don't Forget The Past: Recurrent Depth Estimation from Monocular Video

    Authors: Vaishakh Patil, Wouter Van Gansbeke, Dengxin Dai, Luc Van Gool

    Abstract: Autonomous cars need continuously updated depth information. Thus far, depth is mostly estimated independently for a single frame at a time, even if the method starts from video input. Our method produces a time series of depth maps, which makes it an ideal candidate for online learning approaches. In particular, we put three different types of depth estimation (supervised depth prediction, self-s…

    Submitted 28 July, 2020; v1 submitted 8 January, 2020; originally announced January 2020.

    Comments: Please refer to our webpage for details https://www.trace.ethz.ch/publications/2020/rec_depth_estimation/

  16. arXiv:1907.05738  [pdf, other]

    cs.CV cs.RO eess.SY

    Learning a Curve Guardian for Motorcycles

    Authors: Simon Hecker, Alexander Liniger, Henrik Maurenbrecher, Dengxin Dai, Luc Van Gool

    Abstract: Up to 17% of all motorcycle accidents occur when the rider is maneuvering through a curve and the main cause of curve accidents can be attributed to inappropriate speed and wrong intra-lane position of the motorcycle. Existing curve warning systems lack crucial state estimation components and do not scale well. We propose a new type of road curvature warning system for motorcycles, combining the l…

    Submitted 12 July, 2019; originally announced July 2019.

    Comments: 8 pages, to be presented at IEEE-ITSC 2019

  17. arXiv:1807.08312  [pdf, ps, other]

    eess.AS cs.AI cs.LG cs.SD

    Unified Hypersphere Embedding for Speaker Recognition

    Authors: Mahdi Hajibabaei, Dengxin Dai

    Abstract: Incremental improvements in accuracy of Convolutional Neural Networks are usually achieved through use of deeper and more complex models trained on larger datasets. However, enlarging dataset and models increases the computation and storage costs and cannot be done indefinitely. In this work, we seek to improve the identification and verification accuracy of a text-independent speaker recognition…

    Submitted 22 July, 2018; originally announced July 2018.