Showing 1–40 of 40 results for author: Huang, D

Searching in archive eess.
  1. arXiv:2406.19749  [pdf, other]

    eess.IV cs.CV

    SPIRONet: Spatial-Frequency Learning and Topological Channel Interaction Network for Vessel Segmentation

    Authors: De-Xing Huang, Xiao-Hu Zhou, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Zhen-Qiu Feng, Mei-Jiang Gui, Hao Li, Tian-Yu Xiang, Bo-Xian Yao, Zeng-Guang Hou

    Abstract: Automatic vessel segmentation is paramount for developing next-generation interventional navigation systems. However, current approaches suffer from suboptimal segmentation performances due to significant challenges in intraoperative images (i.e., low signal-to-noise ratio, small or slender vessels, and strong interference). In this paper, a novel spatial-frequency learning and topological channel…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2406.06375  [pdf, other]

    cs.SD cs.AI eess.AS

    MOSA: Music Motion with Semantic Annotation Dataset for Cross-Modal Music Processing

    Authors: Yu-Fen Huang, Nikki Moran, Simon Coleman, Jon Kelly, Shun-Hwa Wei, Po-Yin Chen, Yun-Hsin Huang, Tsung-** Chen, Yu-Chia Kuo, Yu-Chi Wei, Chih-Hsuan Li, Da-Yu Huang, Hsuan-Kai Kao, Ting-Wei Lin, Li Su

    Abstract: In cross-modal music processing, translation between visual, auditory, and semantic content opens up new possibilities as well as challenges. The construction of such a transformative scheme depends upon a benchmark corpus with a comprehensive data infrastructure. In particular, the assembly of a large-scale cross-modal dataset presents major challenges. In this paper, we present the MOSA (Music m…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024. 14 pages, 7 figures. Dataset is available on: https://github.com/yufenhuang/MOSA-Music-mOtion-and-Semantic-Annotation-dataset/tree/main and https://zenodo.org/records/11393449

  3. arXiv:2406.05170  [pdf]

    q-bio.OT cs.CV eess.IV

    Research on Tumors Segmentation based on Image Enhancement Method

    Authors: Danyi Huang, Ziang Liu, Yizhou Li

    Abstract: One of the most effective ways to treat liver cancer is to perform precise liver resection surgery, the key step of which includes precise digital image segmentation of the liver and its tumor. However, traditional liver parenchymal segmentation techniques often face several challenges in performing liver segmentation: lack of precision, slow processing speed, and computational burden. These short…

    Submitted 7 June, 2024; originally announced June 2024.

  4. arXiv:2406.03663  [pdf]

    eess.IV cs.LG q-bio.QM

    A Hybrid Deep Learning Classification of Perimetric Glaucoma Using Peripapillary Nerve Fiber Layer Reflectance and Other OCT Parameters from Three Anatomy Regions

    Authors: Ou Tan, David S. Greenfield, Brian A. Francis, Rohit Varma, Joel S. Schuman, David Huang, Dongseok Choi

    Abstract: Precis: A hybrid deep-learning model combines NFL reflectance and other OCT parameters to improve glaucoma diagnosis. Objective: To investigate if a deep learning model could be used to combine nerve fiber layer (NFL) reflectance and other OCT parameters for glaucoma diagnosis. Patients and Methods: This is a prospective observational study where of 106 normal subjects and 164 perimetric glaucoma…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 12 pages

  5. arXiv:2403.14523  [pdf, other]

    eess.IV cs.CV

    Invisible Needle Detection in Ultrasound: Leveraging Mechanism-Induced Vibration

    Authors: Chenyang Li, Dianye Huang, Angelos Karlas, Nassir Navab, Zhongliang Jiang

    Abstract: In clinical applications that involve ultrasound-guided intervention, the visibility of the needle can be severely impeded due to steep insertion and strong distractors such as speckle noise and anatomical occlusion. To address this challenge, we propose VibNet, a learning-based framework tailored to enhance the robustness and accuracy of needle detection in ultrasound images, even when the target…

    Submitted 21 March, 2024; originally announced March 2024.

  6. arXiv:2401.11856  [pdf, other]

    eess.IV cs.CV

    MOSformer: Momentum encoder-based inter-slice fusion transformer for medical image segmentation

    Authors: De-Xing Huang, Xiao-Hu Zhou, Xiao-Liang Xie, Shi-Qi Liu, Zhen-Qiu Feng, Mei-Jiang Gui, Hao Li, Tian-Yu Xiang, Xiu-Ling Liu, Zeng-Guang Hou

    Abstract: Medical image segmentation takes an important position in various clinical applications. Deep learning has emerged as the predominant solution for automated segmentation of volumetric medical images. 2.5D-based segmentation models bridge computational efficiency of 2D-based models and spatial perception capabilities of 3D-based models. However, prevailing 2.5D-based models often treat each slice e…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: Under Review

  7. arXiv:2312.07290  [pdf, other]

    cs.RO eess.SY

    Underwater motions analysis and control of a coupling-tiltable unmanned aerial-aquatic quadrotor

    Authors: Dongyue Huang, Chenggang Wang, Minghao Dou, Xuchen Liu, Zixuan Liu, Biao Wang, Ben M. Chen

    Abstract: This paper proposes a method for analyzing a series of potential motions in a coupling-tiltable aerial-aquatic quadrotor based on its nonlinear dynamics. Some characteristics and constraints derived by this method are specified as Singular Thrust Tilt Angles (STTAs), utilizing to generate motions including planar motions. A switch-based control scheme addresses issues of control direction uncertai…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: Unmanned Aerial-Aquatic Vehicle

  8. arXiv:2312.03231  [pdf, other]

    cs.LG cs.AI cs.CV cs.HC eess.AS

    Deep Multimodal Fusion for Surgical Feedback Classification

    Authors: Rafal Kocielnik, Elyssa Y. Wong, Timothy N. Chu, Lydia Lin, De-An Huang, Jiayun Wang, Anima Anandkumar, Andrew J. Hung

    Abstract: Quantification of real-time informal feedback delivered by an experienced surgeon to a trainee during surgery is important for skill improvements in surgical training. Such feedback in the live operating room is inherently multimodal, consisting of verbal conversations (e.g., questions and answers) as well as non-verbal elements (e.g., through visual cues like pointing to anatomic elements). In th…

    Submitted 5 December, 2023; originally announced December 2023.

    Journal ref: Published in Proceedings of Machine Learning for Health 2024

  9. arXiv:2311.05929  [pdf, other]

    cs.CV eess.IV

    Efficient Segmentation with Texture in Ore Images Based on Box-supervised Approach

    Authors: Guodong Sun, Delong Huang, Yuting Peng, Le Cheng, Bo Wu, Yang Zhang

    Abstract: Image segmentation methods have been utilized to determine the particle size distribution of crushed ores. Due to the complex working environment, high-powered computing equipment is difficult to deploy. At the same time, the ore distribution is stacked, and it is difficult to identify the complete features. To address this issue, an effective box-supervised technique with texture features is prov…

    Submitted 10 November, 2023; originally announced November 2023.

    Comments: 14 pages, 8 figures

  10. arXiv:2308.02782  [pdf]

    eess.IV physics.optics

    Non-line-of-sight reconstruction via structure sparsity regularization

    Authors: Duolan Huang, Quan Chen, Zhun Wei, Rui Chen

    Abstract: Non-line-of-sight (NLOS) imaging allows for the imaging of objects around a corner, which enables potential applications in various fields such as autonomous driving, robotic vision, medical imaging, security monitoring, etc. However, the quality of reconstruction is challenged by low signal-noise-ratio (SNR) measurements. In this study, we present a regularization method, referred to as structure…

    Submitted 4 August, 2023; originally announced August 2023.

    Comments: 8 pages, 5 figures

  11. arXiv:2307.05884  [pdf, other]

    eess.SY cs.RO

    Learning Koopman Operators with Control Using Bi-level Optimization

    Authors: Daning Huang, Muhammad Bayu Prasetyo, Yin Yu, Junyi Geng

    Abstract: The accurate modeling and control of nonlinear dynamical effects are crucial for numerous robotic systems. The Koopman formalism emerges as a valuable tool for linear control design in nonlinear systems within unknown environments. However, it still remains a challenging task to learn the Koopman operator with control from data, and in particular, the simultaneous identification of the Koopman lin…

    Submitted 5 November, 2023; v1 submitted 11 July, 2023; originally announced July 2023.

    Comments: Accepted by 2023 IEEE 62nd Conference on Decision and Control (CDC)

  12. arXiv:2307.04101  [pdf, other]

    cs.CV eess.IV

    Enhancing Building Semantic Segmentation Accuracy with Super Resolution and Deep Learning: Investigating the Impact of Spatial Resolution on Various Datasets

    Authors: Zhiling Guo, Xiaodan Shi, Haoran Zhang, Dou Huang, Xiaoya Song, **yue Yan, Ryosuke Shibasaki

    Abstract: The development of remote sensing and deep learning techniques has enabled building semantic segmentation with high accuracy and efficiency. Despite their success in different tasks, the discussions on the impact of spatial resolution on deep learning based building semantic segmentation are quite inadequate, which makes choosing a higher cost-effective data source a big challenge. To address the…

    Submitted 9 July, 2023; originally announced July 2023.

  13. arXiv:2307.03698  [pdf, other]

    eess.IV cs.CV cs.RO

    Motion Magnification in Robotic Sonography: Enabling Pulsation-Aware Artery Segmentation

    Authors: Dianye Huang, Yuan Bi, Nassir Navab, Zhongliang Jiang

    Abstract: Ultrasound (US) imaging is widely used for diagnosing and monitoring arterial diseases, mainly due to the advantages of being non-invasive, radiation-free, and real-time. In order to provide additional information to assist clinicians in diagnosis, the tubular structures are often segmented from US images. To improve the artery segmentation accuracy and stability during scans, this work presents a…

    Submitted 7 July, 2023; originally announced July 2023.

    Comments: Accepted Paper IROS 2023

  14. Deep Learning Methods for Device Identification Using Symbols Trace Plot

    Authors: Da Huang, Akram Al-Hourani, Kandeepan Sithamparanathan, Wayne S. T. Rowe

    Abstract: Devices authentication is one crucial aspect of any communication system. Recently, the physical layer approach radio frequency (RF) fingerprinting has gained increased interest as it provides an extra layer of security without requiring additional components. In this work, we propose an RF fingerprinting based transmitter authentication approach density trace plot (DTP) to exploit device-identifi…

    Submitted 11 February, 2024; v1 submitted 28 June, 2023; originally announced June 2023.

  15. arXiv:2305.18234  [pdf, other]

    eess.SP cs.AI cs.LG

    Temporal Aware Mixed Attention-based Convolution and Transformer Network (MACTN) for EEG Emotion Recognition

    Authors: Xiaopeng Si, Dong Huang, Yulin Sun, Dong Ming

    Abstract: Emotion recognition plays a crucial role in human-computer interaction, and electroencephalography (EEG) is advantageous for reflecting human emotional states. In this study, we propose MACTN, a hierarchical hybrid model for jointly modeling local and global temporal information. The model is inspired by neuroscience research on the temporal dynamics of emotions. MACTN extracts local emotional fea…

    Submitted 18 May, 2023; originally announced May 2023.

  16. arXiv:2305.14781  [pdf, other]

    math.OC eess.SY

    Accelerated Nonconvex ADMM with Self-Adaptive Penalty for Rank-Constrained Model Identification

    Authors: Qingyuan Liu, Zhengchao Huang, Hao Ye, Dexian Huang, Chao Shang

    Abstract: The alternating direction method of multipliers (ADMM) has been widely adopted in low-rank approximation and low-order model identification tasks; however, the performance of nonconvex ADMM is highly reliant on the choice of penalty parameter. To accelerate ADMM for solving rank-constrained identification problems, this paper proposes a new self-adaptive strategy for automatic penalty update. Guid…

    Submitted 8 September, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: 7 pages, 5 figures. Accepted by 62nd IEEE Conference on Decision and Control (CDC 2023)

  17. arXiv:2305.08408  [pdf, other]

    cs.CV eess.IV

    SB-VQA: A Stack-Based Video Quality Assessment Framework for Video Enhancement

    Authors: Ding-Jiun Huang, Yu-Ting Kao, Tieh-Hung Chuang, Ya-Chun Tsai, **g-Kai Lou, Shuen-Huei Guan

    Abstract: In recent years, several video quality assessment (VQA) methods have been developed, achieving high performance. However, these methods were not specifically trained for enhanced videos, which limits their ability to predict video quality accurately based on human subjective perception. To address this issue, we propose a stack-based framework for VQA that outperforms existing state-of-the-art met…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: CVPR NTIRE 2023

  18. arXiv:2301.12344  [pdf, other]

    cs.RO eess.SY

    TJ-FlyingFish: Design and Implementation of an Aerial-Aquatic Quadrotor with Tiltable Propulsion Units

    Authors: Xuchen Liu, Minghao Dou, Dongyue Huang, Biao Wang, **qiang Cui, Qinyuan Ren, Lihua Dou, Zhi Gao, Jie Chen, Ben M. Chen

    Abstract: Aerial-aquatic vehicles are capable to move in the two most dominant fluids, making them more promising for a wide range of applications. We propose a prototype with special designs for propulsion and thruster configuration to cope with the vast differences in the fluid properties of water and air. For propulsion, the operating range is switched for the different mediums by the dual-speed propulsi…

    Submitted 6 February, 2023; v1 submitted 28 January, 2023; originally announced January 2023.

    Comments: 6 pages, 9 figures, accepted to 2023 IEEE International Conference on Robotics and Automation (ICRA)

  19. arXiv:2212.06299  [pdf]

    eess.IV cs.CV cs.LG

    Interpretable Diabetic Retinopathy Diagnosis based on Biomarker Activation Map

    Authors: Pengxiao Zang, Tristan T. Hormel, Jie Wang, Yukun Guo, Steven T. Bailey, Christina J. Flaxel, David Huang, Thomas S. Hwang, Yali Jia

    Abstract: Deep learning classifiers provide the most accurate means of automatically diagnosing diabetic retinopathy (DR) based on optical coherence tomography (OCT) and its angiography (OCTA). The power of these models is attributable in part to the inclusion of hidden layers that provide the complexity required to achieve a desired task. However, hidden layers also render algorithm outputs difficult to in…

    Submitted 26 June, 2023; v1 submitted 12 December, 2022; originally announced December 2022.

    Comments: This paper has been accepted by IEEE TBME

    ACM Class: I.2.0; I.4.0; J.3

  20. arXiv:2211.12421  [pdf, other]

    q-bio.NC cs.LG eess.IV

    Data-Driven Network Neuroscience: On Data Collection and Benchmark

    Authors: Jiaxing Xu, Yunhan Yang, David Tse Jung Huang, Sophi Shilpa Gururajapathy, Yi** Ke, Miao Qiao, Alan Wang, Haribalan Kumar, Josh McGeown, Eryn Kwon

    Abstract: This paper presents a comprehensive and quality collection of functional human brain network data for potential research in the intersection of neuroscience, machine learning, and graph analytics. Anatomical and functional MRI images have been used to understand the functional connectivity of the human brain and are particularly important in identifying underlying neurodegenerative conditions such…

    Submitted 29 October, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

    Journal ref: Advances in Neural Information Processing Systems, 2023

  21. arXiv:2210.04990  [pdf, ps, other]

    eess.SP

    Racial Disparities in Pulse Oximetry Cannot Be Fixed With Race-Based Correction

    Authors: Neal Patwari, Di Huang, Kiki Bonetta-Misteli

    Abstract: Studies have shown pulse oximeter measurements of blood oxygenation have statistical bias that is a function of race, which results in higher rates of occult hypoxemia, i.e., missed detection of dangerously low oxygenation, in patients of color. This paper further characterizes the statistical distribution of pulse ox measurements, showing they also have a higher variance for patients racialized a…

    Submitted 10 October, 2022; originally announced October 2022.

    Comments: 6 pages, originally submitted to IEEE SPMB 2022 on 1 July 2022

  22. arXiv:2207.06424  [pdf, other]

    cs.CE eess.SY

    Optimal control of dielectric elastomer actuated multibody dynamical systems

    Authors: Dengpeng Huang, Sigrid Leyendecker

    Abstract: In this work, a simulation model for the optimal control of dielectric elastomer actuated flexible multibody dynamics systems is presented. The Dielectric Elastomer Actuator (DEA) behaves like a flexible artificial muscle in soft robotics. It is modeled as an electromechanically coupled geometrically exact beam, where the electric charges serve as control variables. The DEA-beam is integrated as…

    Submitted 13 July, 2022; originally announced July 2022.

    Comments: 22 pages, 11 figures

  23. arXiv:2206.02596  [pdf, other]

    eess.SP

    A Robust Deep Learning Enabled Semantic Communication System for Text

    Authors: Xiang Peng, Zhi** Qin, Danlan Huang, Xiaoming Tao, Jianhua Lu, Guangyi Liu, Chengkang Pan

    Abstract: With the advent of the 6G era, the concept of semantic communication has attracted increasing attention. Compared with conventional communication systems, semantic communication systems are not only affected by physical noise existing in the wireless communication environment, e.g., additional white Gaussian noise, but also by semantic noise due to the source and the nature of deep learning-based…

    Submitted 6 June, 2022; originally announced June 2022.

    Comments: 6 pages

  24. arXiv:2205.03214  [pdf, other]

    eess.SY

    Modularized Bilinear Koopman Operator for Modeling and Predicting Transients of Microgrids

    Authors: Xinyuan Jiang, Yan Li, Daning Huang

    Abstract: Modularized Koopman Bilinear Form (M-KBF) is presented to model and predict the transient dynamics of microgrids in the presence of disturbances. As a scalable data-driven approach, M-KBF divides the identification and prediction of the high-dimensional nonlinear system into the individual study of subsystems; and thus, alleviating the difficulty of intensively handling high volume data and overco…

    Submitted 17 May, 2022; v1 submitted 6 May, 2022; originally announced May 2022.

  25. arXiv:2204.08557  [pdf, other]

    eess.SY

    PIDGeuN: Graph Neural Network-Enabled Transient Dynamics Prediction of Networked Microgrids Through Full-Field Measurement

    Authors: Yin Yu, Xinyuan Jiang, Daning Huang, Yan Li

    Abstract: A Physics-Informed Dynamic Graph Neural Network (PIDGeuN) is presented to accurately, efficiently and robustly predict the nonlinear transient dynamics of microgrids in the presence of disturbances. The graph-based architecture of PIDGeuN provides a natural representation of the microgrid topology. Using only the state information that is practically measurable, PIDGeuN employs a time delay embedd…

    Submitted 18 April, 2022; originally announced April 2022.

    Comments: This paper is currently under review for a journal

  26. arXiv:2112.06657  [pdf, other]

    eess.SP cs.LG

    You Can Wash Better: Daily Handwashing Assessment with Smartwatches

    Authors: Fei Wang, Xilei Wu, Xin Wang, Jianlei Chi, **gang Shi, Dong Huang

    Abstract: We propose UWash, an intelligent solution upon smartwatches, to assess handwashing for the purpose of raising users' awareness and cultivating habits in high-quality handwashing. UWash can identify the onset/offset of handwashing, measure the duration of each gesture, and score each gesture as well as the entire procedure in accordance with the WHO guidelines. Technically, we address the task of h…

    Submitted 9 December, 2021; originally announced December 2021.

    Comments: 7 pages, 9 figures, 5 tables

  27. arXiv:2110.06465  [pdf, other]

    eess.IV cs.CV

    Breaking the Dilemma of Medical Image-to-image Translation

    Authors: Lingke Kong, Chenyu Lian, Detian Huang, Zhenjiang Li, Yanle Hu, Qichao Zhou

    Abstract: Supervised Pix2Pix and unsupervised Cycle-consistency are two modes that dominate the field of medical image-to-image translation. However, neither mode is ideal. The Pix2Pix mode has excellent performance, but it requires paired and well pixel-wise aligned images, which may not always be achievable due to respiratory motion or anatomy change between times that paired images are acquired. The Cy…

    Submitted 10 November, 2021; v1 submitted 12 October, 2021; originally announced October 2021.

  28. arXiv:2110.04444  [pdf]

    eess.SP

    Sensoring and Application of Multimodal Data for the Detection of Freezing of Gait in Parkinson's Disease

    Authors: Wei Zhang, Debin Huang, Hantao Li, Lipeng Wang, Yanzhao Wei, Kang Pan, Lin Ma, Huanhuan Feng, **g Pan, Yuzhu Guo

    Abstract: The accurate and reliable detection or prediction of freezing of gaits (FOG) is important for fall prevention in Parkinson's Disease (PD) and studying the physiological transitions during the occurrence of FOG. Integrating both commercial and self-designed sensors, a protocol has been designed to acquire multimodal physical and physiological information during FOG, including gait acceleration (ACC…

    Submitted 8 October, 2021; originally announced October 2021.

    Comments: This paper has 13 pages and 8 figures. The data was published on Mendeley Data, where raw data is available at https://data.mendeley.com/datasets/t8j8v4hnm4/1 and filtered data is available at https://data.mendeley.com/datasets/r8gmbtv7w2/3

  29. arXiv:2109.08007  [pdf, other]

    cs.MM cs.SD eess.AS

    Graph Fourier Transform based Audio Zero-watermarking

    Authors: Longting Xu, Daiyu Huang, Syed Faham Ali Zaidi, Abdul Rauf, Rohan Kumar Das

    Abstract: The frequent exchange of multimedia information in the present era projects an increasing demand for copyright protection. In this work, we propose a novel audio zero-watermarking technology based on graph Fourier transform for enhancing the robustness with respect to copyright protection. In this approach, the combined shift operator is used to construct the graph signal, upon which the graph Fou…

    Submitted 16 September, 2021; originally announced September 2021.

  30. arXiv:2104.14975  [pdf]

    eess.SY

    Intelligent Decision Method for Main Control Parameters of Tunnel Boring Machine based on Multi-Objective Optimization of Excavation Efficiency and Cost

    Authors: Bin Liu, Yaxu Wang, Guangzu Zhao, Bin Yang, Ruirui Wang, Dexiang Huang, Bin Xiang

    Abstract: Timely and reasonable matching of the control parameters and geological conditions of the rock mass in tunnel excavation is crucial for hard rock tunnel boring machines (TBMs). Therefore, this paper proposes an intelligent decision method for the main control parameters of the TBM based on the multi-objective optimization of excavation efficiency and cost. The main objectives of this method are to…

    Submitted 28 April, 2021; originally announced April 2021.

    Comments: 29 pages, 10 figures

  31. DC-Assisted Stabilization of Internal Oscillations for Improved Symbol Transitions in a Direct Antenna Modulation Transmitter

    Authors: Danyang Huang, Kurt Schab, Joseph Dusenbury, Brandon Sluss, Jacob Adams

    Abstract: Internal oscillations in switched antenna transmitters cause undesirable fluctuations of the stored energy in the system, reducing the effectiveness of time-varying broadbanding methods, such as energy-synchronous direct antenna modulation. To mitigate these parasitic oscillations, a modified direct antenna modulation system with an auxiliary DC source is introduced to stabilize energy storage on…

    Submitted 20 August, 2021; v1 submitted 26 February, 2021; originally announced March 2021.

  32. arXiv:2011.02198  [pdf, other]

    cs.SD eess.AS

    IEEE SLT 2021 Alpha-mini Speech Challenge: Open Datasets, Tracks, Rules and Baselines

    Authors: Yihui Fu, Zhuoyuan Yao, Weipeng He, Jian Wu, Xiong Wang, Zhanheng Yang, Shimin Zhang, Lei Xie, Dongyan Huang, Hui Bu, Petr Motlicek, Jean-Marc Odobez

    Abstract: The IEEE Spoken Language Technology Workshop (SLT) 2021 Alpha-mini Speech Challenge (ASC) is intended to improve research on keyword spotting (KWS) and sound source location (SSL) on humanoid robots. Many publications report significant improvements in deep learning based KWS and SSL on open source datasets in recent years. For deep learning model training, it is necessary to expand the data cover…

    Submitted 14 November, 2020; v1 submitted 4 November, 2020; originally announced November 2020.

    Comments: Accepted at IEEE SLT 2021

  33. arXiv:2008.00820  [pdf, other]

    cs.CV cs.SD eess.AS

    Generating Visually Aligned Sound from Videos

    Authors: Peihao Chen, Yang Zhang, Mingkui Tan, Hongdong Xiao, Deng Huang, Chuang Gan

    Abstract: We focus on the task of generating sound from natural videos, and the sound should be both temporally and content-wise aligned with visual signals. This task is extremely challenging because some sounds generated outside a camera cannot be inferred from video content. The model may be forced to learn an incorrect mapping between visual content and these irrelevant sounds. To address this c…

    Submitted 14 July, 2020; originally announced August 2020.

    Comments: Published in IEEE Transactions on Image Processing, 2020. Code, pre-trained models and demo video: https://github.com/PeihaoChen/regnet

  34. arXiv:2007.10984  [pdf, other]

    cs.CV cs.LG cs.SD eess.AS

    Foley Music: Learning to Generate Music from Videos

    Authors: Chuang Gan, Deng Huang, Peihao Chen, Joshua B. Tenenbaum, Antonio Torralba

    Abstract: In this paper, we introduce Foley Music, a system that can synthesize plausible music for a silent video clip about people playing musical instruments. We first identify two key intermediate representations for a successful video to music generator: body keypoints from videos and MIDI events from audio recordings. We then formulate music generation from videos as a motion-to-MIDI translation probl…

    Submitted 21 July, 2020; originally announced July 2020.

    Comments: ECCV 2020. Project page: http://foley-music.csail.mit.edu

  35. arXiv:2006.13522  [pdf]

    eess.IV q-bio.QM

    Focal Loss Analysis of Nerve Fiber Layer Reflectance for Glaucoma Diagnosis

    Authors: Ou Tan, Liang Liu, Qisheng You, Jie Wang, Aiyin Chen, Eliesa Ing, John C. Morrison, Yali Jia, David Huang

    Abstract: Purpose: To evaluate nerve fiber layer (NFL) reflectance for glaucoma diagnosis. Methods: Participants were imaged with 4.5X4.5-mm volumetric disc scans using spectral-domain optical coherence tomography (OCT). The normalized NFL reflectance map was processed by an azimuthal filter to reduce directional reflectance bias due to variation of beam incidence angle. The peripapillary area of the map wa…

    Submitted 24 June, 2020; originally announced June 2020.

    Comments: pages: 31; Tables: 6; Figures: 9

  36. arXiv:2004.09476  [pdf, other]

    cs.CV cs.LG cs.MM cs.SD eess.AS

    Music Gesture for Visual Sound Separation

    Authors: Chuang Gan, Deng Huang, Hang Zhao, Joshua B. Tenenbaum, Antonio Torralba

    Abstract: Recent deep learning approaches have achieved impressive performance on visual sound separation tasks. However, these approaches are mostly built on appearance and optical flow like motion feature representations, which exhibit limited abilities to find the correlations between audio signals and visual points, especially when separating multiple instruments of the same types, such as multiple viol…

    Submitted 20 April, 2020; originally announced April 2020.

    Comments: CVPR 2020. Project page: http://music-gesture.csail.mit.edu

  37. arXiv:2002.00552  [pdf, other]

    cs.LG cs.CV eess.IV

    DWM: A Decomposable Winograd Method for Convolution Acceleration

    Authors: Di Huang, Xishan Zhang, Rui Zhang, Tian Zhi, Deyuan He, Jiaming Guo, Chang Liu, Qi Guo, Zidong Du, Shaoli Liu, Tianshi Chen, Yunji Chen

    Abstract: Winograd's minimal filtering algorithm has been widely used in Convolutional Neural Networks (CNNs) to reduce the number of multiplications for faster processing. However, it is only effective on convolutions with kernel size as 3x3 and stride as 1, because it suffers from significantly increased FLOPs and numerical accuracy problem for kernel size larger than 3x3 and fails on convolution with str…

    Submitted 2 February, 2020; originally announced February 2020.

    Comments: Accepted by AAAI 2020

  38. arXiv:1912.01167  [pdf, other]

    eess.AS cs.SD

    High-quality Speech Synthesis Using Super-resolution Mel-Spectrogram

    Authors: Leyuan Sheng, Dong-Yan Huang, Evgeniy N. Pavlovskiy

    Abstract: In speech synthesis and speech enhancement systems, mel-spectrograms need to be precise in acoustic representations. However, the generated spectrograms are over-smooth and cannot produce high-quality synthesized speech. Inspired by image-to-image translation, we address this problem by using a learning-based post filter combining Pix2PixHD and ResUnet to reconstruct the mel-spectrograms toget…

    Submitted 2 December, 2019; originally announced December 2019.

  39. arXiv:1906.08673  [pdf]

    eess.IV cs.MM

    Enhancement of Underwater Images with Statistical Model of Background Light and Optimization of Transmission Map

    Authors: Wei Song, Yan Wang, Dongmei Huang, Antonio Liotta, Cristian Perra

    Abstract: Underwater images often have severe quality degradation and distortion due to light absorption and scattering in the water medium. A hazed image formation model is widely used to restore the image quality. It depends on two optical parameters: the background light and the transmission map. Underwater images can also be enhanced by color and contrast correction from the perspective of image process…

    Submitted 19 June, 2019; originally announced June 2019.

    Comments: 17 pages

  40. arXiv:1904.11953  [pdf, other]

    eess.SP cs.CV cs.HC

    Temporal Unet: Sample Level Human Action Recognition using WiFi

    Authors: Fei Wang, Yunpeng Song, Jimuyang Zhang, **song Han, Dong Huang

    Abstract: Human actions cause WiFi distortion, which is widely explored for action recognition, such as elderly fall detection, hand sign language recognition, and keystroke estimation. To the best of our knowledge, past work recognizes human action by categorizing one complete distortion series into one action, which we term series-level action recognition. In this paper, we introduce a much…

    Submitted 19 April, 2019; originally announced April 2019.

    Comments: 14 pages, 14 figures, 1 table