Showing 1–21 of 21 results for author: Deng, S

Searching in archive eess.
  1. arXiv:2406.02554  [pdf, other]

    eess.AS cs.AI cs.CL cs.CV cs.LG cs.MM

    Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition

    Authors: Shijian Deng, Erin E. Kosloski, Siddhi Patel, Zeke A. Barnett, Yiyang Nan, Alexander Kaplan, Sisira Aarukapalli, William T. Doan, Matthew Wang, Harsh Singh, Pamela R. Rollins, Yapeng Tian

    Abstract: In this article, we introduce a novel problem of audio-visual autism behavior recognition, which includes social behavior recognition, an essential aspect previously omitted in AI-assisted autism screening research. We define the task at hand as one that is audio-visual autism behavior recognition, which uses audio and visual cues, including any speech present in the audio, to recognize autism-rel…

    Submitted 22 March, 2024; originally announced June 2024.

  2. arXiv:2310.11713  [pdf, other]

    cs.CV cs.SD eess.AS

    Separating Invisible Sounds Toward Universal Audiovisual Scene-Aware Sound Separation

    Authors: Yiyang Su, Ali Vosoughi, Shijian Deng, Yapeng Tian, Chenliang Xu

    Abstract: The audio-visual sound separation field assumes visible sources in videos, but this excludes invisible sounds beyond the camera's view. Current methods struggle with such sounds lacking visible cues. This paper introduces a novel "Audio-Visual Scene-Aware Separation" (AVSA-Sep) framework. It includes a semantic parser for visible and invisible sounds and a separator for scene-informed separation.…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: Accepted at ICCV 2023 - AV4D, 4 figures, 3 tables

  3. arXiv:2310.02141  [pdf, other]

    cs.RO eess.SY

    Adaptive Gait Modeling and Optimization for Principally Kinematic Systems

    Authors: Siming Deng, Noah J. Cowan, Brian A. Bittner

    Abstract: Robotic adaptation to unanticipated operating conditions is crucial to achieving persistence and robustness in complex real world settings. For a wide range of cutting-edge robotic systems, such as micro- and nano-scale robots, soft robots, medical robots, and bio-hybrid robots, it is infeasible to anticipate the operating environment a priori due to complexities that arise from numerous factors i…

    Submitted 18 April, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: 7 pages, 4 figures

  4. arXiv:2307.09729  [pdf, other]

    cs.CV cs.MM eess.IV

    NTIRE 2023 Quality Assessment of Video Enhancement Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Wei Sun, Yulun Zhang, Kai Zhang, Radu Timofte, Guangtao Zhai, Yixuan Gao, Yuqin Cao, Tengchuan Kou, Yunlong Dong, Ziheng Jia, Yilin Li, Wei Wu, Shuming Hu, Sibin Deng, Pengxiang Xiao, Ying Chen, Kai Li, Kai Zhao, Kun Yuan, Ming Sun, Heng Cong, Hao Wang, Lingzhi Fu, et al. (47 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2023 Quality Assessment of Video Enhancement Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2023. This challenge is to address a major challenge in the field of video processing, namely, video quality assessment (VQA) for enhanced videos. The challenge uses the VQA Dataset for Perceptual…

    Submitted 18 July, 2023; originally announced July 2023.

  5. arXiv:2307.02836  [pdf, other]

    cs.CV eess.IV

    Noise-to-Norm Reconstruction for Industrial Anomaly Detection and Localization

    Authors: Shiqi Deng, Zhiyu Sun, Ruiyan Zhuang, Jun Gong

    Abstract: Anomaly detection has a wide range of applications and is especially important in industrial quality inspection. Currently, many top-performing anomaly-detection models rely on feature-embedding methods. However, these methods do not perform well on datasets with large variations in object locations. Reconstruction-based methods use reconstruction errors to detect anomalies without considering pos…

    Submitted 6 July, 2023; originally announced July 2023.

  6. arXiv:2307.01062  [pdf, other]

    cs.RO eess.SY

    A Data-Driven Approach to Geometric Modeling of Systems with Low-Bandwidth Actuator Dynamics

    Authors: Siming Deng, Junning Liu, Bibekananda Datta, Aishwarya Pantula, David H. Gracias, Thao D. Nguyen, Brian A. Bittner, Noah J. Cowan

    Abstract: It is challenging to perform system identification on soft robots due to their underactuated, high-dimensional dynamics. In this work, we present a data-driven modeling framework, based on geometric mechanics (also known as gauge theory) that can be applied to systems with low-bandwidth control of the system's internal configuration. This method constructs a series of connected models comprising a…

    Submitted 3 October, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

    Comments: 9 pages, 6 figures

  7. arXiv:2212.13913  [pdf]

    eess.SP

    Highly-Accurate Electricity Load Estimation via Knowledge Aggregation

    Authors: Yuting Ding, Di Wu, Yi He, Xin Luo, Song Deng

    Abstract: Mid-term and long-term electric energy demand prediction is essential for the planning and operations of the smart grid system. Mainly in countries where the power system operates in a deregulated environment. Traditional forecasting models fail to incorporate external knowledge while modern data-driven ignore the interpretation of the model, and the load series can be influenced by many complex f…

    Submitted 6 December, 2022; originally announced December 2022.

  8. arXiv:2211.16806  [pdf, other]

    eess.IV cs.CV cs.LG

    Toward Robust Diagnosis: A Contour Attention Preserving Adversarial Defense for COVID-19 Detection

    Authors: Kun Xiang, Xing Zhang, Jinwen She, Jinpeng Liu, Haohan Wang, Shiqi Deng, Shancheng Jiang

    Abstract: As the COVID-19 pandemic puts pressure on healthcare systems worldwide, the computed tomography image based AI diagnostic system has become a sustainable solution for early diagnosis. However, the model-wise vulnerability under adversarial perturbation hinders its deployment in practical situation. The existing adversarial training strategies are difficult to generalized into medical imaging field…

    Submitted 30 November, 2022; originally announced November 2022.

    Comments: Accepted to AAAI 2023

  9. arXiv:2210.06091  [pdf]

    cs.CL cs.SD eess.AS

    Summary on the ISCSLP 2022 Chinese-English Code-Switching ASR Challenge

    Authors: Shuhao Deng, Chengfei Li, Jinfeng Bai, Qingqing Zhang, Wei-Qiang Zhang, Runyan Yang, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan

    Abstract: Code-switching automatic speech recognition becomes one of the most challenging and the most valuable scenarios of automatic speech recognition, due to the code-switching phenomenon between multilingual language and the frequent occurrence of code-switching phenomenon in daily life. The ISCSLP 2022 Chinese-English Code-Switching Automatic Speech Recognition (CSASR) Challenge aims to promote the de…

    Submitted 13 October, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: accepted by ISCSLP 2022

  10. arXiv:2209.11971  [pdf, other]

    cs.ET eess.SP

    A Homogeneous Processing Fabric for Matrix-Vector Multiplication and Associative Search Using Ferroelectric Time-Domain Compute-in-Memory

    Authors: Xunzhao Yin, Qingrong Huang, Franz Müller, Shan Deng, Alptekin Vardar, Sourav De, Zhouhang Jiang, Mohsen Imani, Cheng Zhuo, Thomas Kämpfe, Kai Ni

    Abstract: In this work, we propose a ferroelectric FET (FeFET) time-domain compute-in-memory (TD-CiM) array as a homogeneous processing fabric for binary multiplication-accumulation (MAC) and content addressable memory (CAM). We demonstrate that: i) the XOR(XNOR)/AND logic function can be realized using a single cell composed of 2FeFETs connected in series; ii) a two-phase computation in an inverter chain wi…

    Submitted 24 September, 2022; originally announced September 2022.

    Comments: 8 pages, 8 figures

  11. arXiv:2206.14746  [pdf, other]

    eess.IV cs.CV

    Placenta Segmentation in Ultrasound Imaging: Addressing Sources of Uncertainty and Limited Field-of-View

    Authors: Veronika A. Zimmer, Alberto Gomez, Emily Skelton, Robert Wright, Gavin Wheeler, Shujie Deng, Nooshin Ghavami, Karen Lloyd, Jacqueline Matthew, Bernhard Kainz, Daniel Rueckert, Joseph V. Hajnal, Julia A. Schnabel

    Abstract: Automatic segmentation of the placenta in fetal ultrasound (US) is challenging due to the (i) high diversity of placenta appearance, (ii) the restricted quality in US resulting in highly variable reference annotations, and (iii) the limited field-of-view of US prohibiting whole placenta assessment at late gestation. In this work, we address these three challenges with a multi-task learning approac…

    Submitted 29 June, 2022; originally announced June 2022.

    Comments: 21 pages (18 + appendix), 13 figures (9 + appendix)

  12. arXiv:2206.13135  [pdf]

    cs.CL cs.SD eess.AS

    TALCS: An Open-Source Mandarin-English Code-Switching Corpus and a Speech Recognition Baseline

    Authors: Chengfei Li, Shuhao Deng, Yaoping Wang, Guangjing Wang, Yaguang Gong, Changbin Chen, Jinfeng Bai

    Abstract: This paper introduces a new corpus of Mandarin-English code-switching speech recognition--TALCS corpus, suitable for training and evaluating code-switching speech recognition systems. TALCS corpus is derived from real online one-to-one English teaching scenes in TAL education group, which contains roughly 587 hours of speech sampled at 16 kHz. To our best knowledge, TALCS corpus is the largest wel…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: accepted by INTERSPEECH 2022

  13. arXiv:2204.08013  [pdf, other]

    physics.med-ph eess.IV

    One-step Method for Material Quantitation using In-line Tomography with Single Scanning

    Authors: Suyu Liao, Shiwo Deng, Yining Zhu, Huitao Zhang, Peiping Zhu, Kai Zhang, Xing Zhao

    Abstract: Objective: Quantitative technique based on In-line phase-contrast computed tomography with single scanning attracts more attention in application due to the flexibility of the implementation. However, the quantitative results usually suffer from artifacts and noise, since the phase retrieval and reconstruction are independent ("two-steps") without feedback from the original data. Our goal is to de…

    Submitted 17 April, 2022; originally announced April 2022.

    Journal ref: IEEE Transactions on Biomedical Engineering, 2022

  14. arXiv:2203.07948  [pdf, other]

    cs.ET eess.SP

    An Ultra-Compact Single FeFET Binary and Multi-Bit Associative Search Engine

    Authors: Xunzhao Yin, Franz Müller, Qingrong Huang, Chao Li, Mohsen Imani, Zeyu Yang, Jiahao Cai, Maximilian Lederer, Ricardo Olivo, Nellie Laleni, Shan Deng, Zijian Zhao, Cheng Zhuo, Thomas Kämpfe, Kai Ni

    Abstract: Content addressable memory (CAM) is widely used in associative search tasks for its highly parallel pattern matching capability. To accommodate the increasingly complex and data-intensive pattern matching tasks, it is critical to keep improving the CAM density to enhance the performance and area efficiency. In this work, we demonstrate: i) a novel ultra-compact 1FeFET CAM design that enables paral…

    Submitted 15 March, 2022; originally announced March 2022.

    Comments: 20 pages, 14 figures

  15. arXiv:2110.02495  [pdf, other]

    cs.ET eess.SP

    Deep Random Forest with Ferroelectric Analog Content Addressable Memory

    Authors: Xunzhao Yin, Franz Müller, Ann Franchesca Laguna, Chao Li, Wenwen Ye, Qingrong Huang, Qinming Zhang, Zhiguo Shi, Maximilian Lederer, Nellie Laleni, Shan Deng, Zijian Zhao, Michael Niemier, Xiaobo Sharon Hu, Cheng Zhuo, Thomas Kämpfe, Kai Ni

    Abstract: Deep random forest (DRF), which incorporates the core features of deep learning and random forest (RF), exhibits comparable classification accuracy, interpretability, and low memory and computational overhead when compared with deep neural networks (DNNs) in various information processing tasks for edge intelligence. However, the development of efficient hardware to accelerate DRF is lagging behin…

    Submitted 6 October, 2021; originally announced October 2021.

    Comments: 44 pages, 16 figures

  16. arXiv:2107.06172  [pdf, other]

    physics.chem-ph eess.SP

    Arrhenius.jl: A Differentiable Combustion Simulation Package

    Authors: Weiqi Ji, Xingyu Su, Bin Pang, Sean Joseph Cassady, Alison M. Ferris, Yujuan Li, Zhuyin Ren, Ronald Hanson, Sili Deng

    Abstract: Combustion kinetic modeling is an integral part of combustion simulation, and extensive studies have been devoted to developing both high fidelity and computationally affordable models. Despite these efforts, modeling combustion kinetics is still challenging due to the demand for expert knowledge and optimization against experiments, as well as the lack of understanding of the associated uncertain…

    Submitted 19 June, 2021; originally announced July 2021.

  17. arXiv:2010.12715  [pdf, other]

    eess.AS

    Improving Noise Robustness of an End-to-End Neural Model for Automatic Speech Recognition

    Authors: Jagadeesh Balam, Jocelyn Huang, Vitaly Lavrukhin, Slyne Deng, Somshubra Majumdar, Boris Ginsburg

    Abstract: We present our experiments in training robust to noise an end-to-end automatic speech recognition (ASR) model using intensive data augmentation. We explore the efficacy of fine-tuning a pre-trained model to improve noise robustness, and we find it to be a very efficient way to train for various noisy conditions, especially when the conditions in which the model will be used, are unknown. Starting…

    Submitted 23 October, 2020; originally announced October 2020.

  18. arXiv:2008.06091  [pdf, other]

    eess.IV

    A Technical Overview of AV1

    Authors: Jingning Han, Bohan Li, Debargha Mukherjee, Ching-Han Chiang, Adrian Grange, Cheng Chen, Hui Su, Sarah Parker, Sai Deng, Urvang Joshi, Yue Chen, Yunqing Wang, Paul Wilkins, Yaowu Xu, James Bankoski

    Abstract: The AV1 video compression format is developed by the Alliance for Open Media consortium. It achieves more than 30% reduction in bit-rate compared to its predecessor VP9 for the same decoded video quality. This paper provides a technical overview of the AV1 codec design that enables the compression performance gains with considerations for hardware feasibility.

    Submitted 8 February, 2021; v1 submitted 13 August, 2020; originally announced August 2020.

  19. arXiv:1908.10267  [pdf, other]

    eess.IV cs.CV

    DRD-Net: Detail-recovery Image Deraining via Context Aggregation Networks

    Authors: Sen Deng, Mingqiang Wei, Jun Wang, Luming Liang, Haoran Xie, Meng Wang

    Abstract: Image deraining is a fundamental, yet not well-solved problem in computer vision and graphics. The traditional image deraining approaches commonly behave ineffectively in medium and heavy rain removal, while the learning-based ones lead to image degradations such as the loss of image details, halo artifacts and/or color distortion. Unlike existing image deraining approaches that lack the detail-re…

    Submitted 28 August, 2019; v1 submitted 27 August, 2019; originally announced August 2019.

  20. arXiv:1904.05243  [pdf]

    cs.SD eess.AS eess.SP

    A Compact and Discriminative Feature Based on Auditory Summary Statistics for Acoustic Scene Classification

    Authors: Hongwei Song, Jiqing Han, Shiwen Deng

    Abstract: One of the biggest challenges of acoustic scene classification (ASC) is to find proper features to better represent and characterize environmental sounds. Environmental sounds generally involve more sound sources while exhibiting less structure in temporal spectral representations. However, the background of an acoustic scene exhibits temporal homogeneity in acoustic properties, suggesting it coul…

    Submitted 10 April, 2019; originally announced April 2019.

    Comments: Accepted as a conference paper of Interspeech 2018

    Journal ref: in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 2018-September, 2018, pp. 3294-3298

  21. Acoustic Scene Classification by Implicitly Identifying Distinct Sound Events

    Authors: Hongwei Song, Jiqing Han, Shiwen Deng, Zhihao Du

    Abstract: In this paper, we propose a new strategy for acoustic scene classification (ASC), namely recognizing acoustic scenes through identifying distinct sound events. This differs from existing strategies, which focus on characterizing global acoustical distributions of audio or the temporal evolution of short-term audio features, without analysis down to the level of sound events. To identify distinct…

    Submitted 26 April, 2019; v1 submitted 10 April, 2019; originally announced April 2019.

    Comments: code URL typo, code is available at https://github.com/hackerekcah/distinct-events-asc.git

    Journal ref: Proc. Interspeech 2019, 3860-3864