
Showing 1–19 of 19 results for author: Mo, S

  1. arXiv:2406.04930  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers

    Authors: Tanvir Mahmud, Shentong Mo, Yapeng Tian, Diana Marculescu

    Abstract: Recent advances in pre-trained vision transformers have shown promise in parameter-efficient audio-visual learning without audio pre-training. However, few studies have investigated effective methods for aligning multimodal features in parameter-efficient audio-visual transformers. In this paper, we propose MA-AVT, a new parameter-efficient audio-visual transformer employing deep modality alignmen…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted in Efficient Deep Learning for Computer Vision CVPR Workshop 2024

  2. arXiv:2405.17995  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    DMT-JEPA: Discriminative Masked Targets for Joint-Embedding Predictive Architecture

    Authors: Shentong Mo, Sukmin Yun

    Abstract: The joint-embedding predictive architecture (JEPA) recently has shown impressive results in extracting visual representations from unlabeled imagery under a masking strategy. However, we reveal its disadvantages, notably its insufficient understanding of local semantics. This deficiency originates from masked modeling in the embedding space, resulting in a reduction of discriminative power and can…

    Submitted 28 May, 2024; originally announced May 2024.

  3. arXiv:2405.07202  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM cs.SD eess.AS

    Unified Video-Language Pre-training with Synchronized Audio

    Authors: Shentong Mo, Haofan Wang, Huaxia Li, Xu Tang

    Abstract: Video-language pre-training is a typical and challenging problem that aims at learning visual and textual representations from large-scale data in a self-supervised way. Existing pre-training approaches either captured the correspondence of image-text pairs or utilized temporal ordering of frames. However, they do not explicitly explore the natural synchronization between audio and the other two m…

    Submitted 12 May, 2024; originally announced May 2024.

  4. arXiv:2403.07938  [pdf, other]

    cs.SD cs.AI cs.CV cs.LG cs.MM eess.AS

    Text-to-Audio Generation Synchronized with Videos

    Authors: Shentong Mo, Jing Shi, Yapeng Tian

    Abstract: In recent times, the focus on text-to-audio (TTA) generation has intensified, as researchers strive to synthesize audio from textual descriptions. However, most existing methods, though leveraging latent diffusion models to learn the correlation between audio and text embeddings, fall short when it comes to maintaining a seamless synchronization between the produced audio and its video. This often…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: text overlap with arXiv:2305.12903

  5. arXiv:2311.15080  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM cs.SD eess.AS

    Weakly-Supervised Audio-Visual Segmentation

    Authors: Shentong Mo, Bhiksha Raj

    Abstract: Audio-visual segmentation is a challenging task that aims to predict pixel-level masks for sound sources in a video. Previous work applied a comprehensive manually designed architecture with countless pixel-wise accurate masks as supervision. However, these pixel-level masks are expensive and not available in all cases. In this work, we aim to simplify the supervision as the instance-level annotat…

    Submitted 25 November, 2023; originally announced November 2023.

  6. arXiv:2305.19458  [pdf, other]

    cs.SD cs.CV cs.LG cs.MM eess.AS

    A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition

    Authors: Shentong Mo, Pedro Morgado

    Abstract: The ability to accurately recognize, localize and separate sound sources is fundamental to any audio-visual perception task. Historically, these abilities were tackled separately, with several methods developed independently for each task. However, given the interconnected nature of source localization, separation, and recognition, independent models are likely to yield suboptimal performance as t…

    Submitted 30 May, 2023; originally announced May 2023.

  7. arXiv:2305.01836  [pdf, other]

    cs.CV cs.LG cs.MM cs.SD eess.AS

    AV-SAM: Segment Anything Model Meets Audio-Visual Localization and Segmentation

    Authors: Shentong Mo, Yapeng Tian

    Abstract: Segment Anything Model (SAM) has recently shown its powerful effectiveness in visual segmentation tasks. However, there is less exploration concerning how SAM works on audio-visual tasks, such as visual sound localization and segmentation. In this work, we propose a simple yet effective audio-visual localization and segmentation framework based on the Segment Anything Model, namely AV-SAM, that ca…

    Submitted 2 May, 2023; originally announced May 2023.

  8. arXiv:2205.01679  [pdf, other]

    eess.IV cs.CV

    Physics to the Rescue: Deep Non-line-of-sight Reconstruction for High-speed Imaging

    Authors: Fangzhou Mu, Sicheng Mo, Jiayong Peng, Xiaochun Liu, Ji Hyun Nam, Siddeshwar Raghavan, Andreas Velten, Yin Li

    Abstract: Computational approach to imaging around the corner, or non-line-of-sight (NLOS) imaging, is becoming a reality thanks to major advances in imaging hardware and reconstruction algorithms. A recent development towards practical NLOS imaging, Nam et al. demonstrated a high-speed non-confocal imaging system that operates at 5Hz, 100x faster than the prior art. This enormous gain in acquisition rate,…

    Submitted 5 August, 2022; v1 submitted 2 May, 2022; originally announced May 2022.

    Comments: ICCP 2022 (TPAMI Special Issue on Computational Photography). Project page: https://pages.cs.wisc.edu/~fmu/nlos3d/

  9. arXiv:2203.09324  [pdf, other]

    cs.CV cs.LG cs.MM cs.SD eess.AS

    Localizing Visual Sounds the Easy Way

    Authors: Shentong Mo, Pedro Morgado

    Abstract: Unsupervised audio-visual source localization aims at localizing visible sound sources in a video without relying on ground-truth localization for training. Previous works often seek high audio-visual similarities for likely positive (sounding) regions and low similarities for likely negative regions. However, accurately distinguishing between sounding and non-sounding regions is challenging witho…

    Submitted 29 March, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

  10. arXiv:2111.04146  [pdf, other]

    eess.SY cs.LG cs.RO

    Optimization of the Model Predictive Control Meta-Parameters Through Reinforcement Learning

    Authors: Eivind Bøhn, Sebastien Gros, Signe Moe, Tor Arne Johansen

    Abstract: Model predictive control (MPC) is increasingly being considered for control of fast systems and embedded applications. However, the MPC has some significant challenges for such systems. Its high computational complexity results in high power consumption from the control algorithm, which could account for a significant share of the energy resources in battery-powered embedded systems. The MPC param…

    Submitted 7 November, 2021; originally announced November 2021.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  11. Delay-Compensated Distributed PDE Control of Traffic with Connected/Automated Vehicles

    Authors: Jie Qi, Shurong Mo, Miroslav Krstic

    Abstract: We develop an input delay-compensating design for stabilization of an Aw-Rascle-Zhang (ARZ) traffic model in congested regime which is governed by a $2\times 2$ first-order hyperbolic nonlinear PDE. The traffic flow consists of both adaptive cruise control-equipped (ACC-equipped) and manually-driven vehicles. The control input is the time gap of ACC-equipped and connected vehicles, which is subjec…

    Submitted 2 September, 2022; v1 submitted 19 July, 2021; originally announced July 2021.

  12. arXiv:2104.07667  [pdf]

    eess.IV cs.CV

    Shoulder Implant X-Ray Manufacturer Classification: Exploring with Vision Transformer

    Authors: Meng Zhou, Shanglin Mo

    Abstract: Shoulder replacement surgery, also called total shoulder replacement, is a common and complex surgery in Orthopedics discipline. It involves replacing a dead shoulder joint with an artificial implant. In the market, there are many artificial implant manufacturers and each of them may produce different implants with different structures compares to other providers. The problem arises in the followi…

    Submitted 21 April, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

    Comments: 11 pages, 12 figures

  13. arXiv:2102.11122  [pdf, other]

    eess.SY cs.LG

    Reinforcement Learning of the Prediction Horizon in Model Predictive Control

    Authors: Eivind Bøhn, Sebastien Gros, Signe Moe, Tor Arne Johansen

    Abstract: Model predictive control (MPC) is a powerful trajectory optimization control technique capable of controlling complex nonlinear systems while respecting system constraints and ensuring safe operation. The MPC's capabilities come at the cost of a high online computational complexity, the requirement of an accurate model of the system dynamics, and the necessity of tuning its parameters to the speci…

    Submitted 22 February, 2021; originally announced February 2021.

    Comments: This work has been submitted to IFAC NMPC 2021 for possible publication

  14. arXiv:2012.08095  [pdf, other]

    cs.LG eess.AS

    Automatic Speech Verification Spoofing Detection

    Authors: Shentong Mo, Haofan Wang, Pinxu Ren, Ta-Chung Chi

    Abstract: Automatic speech verification (ASV) is the technology to determine the identity of a person based on their voice. While being convenient for identity verification, we should aim for the highest system security standard given that it is the safeguard of valuable digital assets. Bearing this in mind, we follow the setup in ASVSpoof 2019 competition to develop potential countermeasures that are robus…

    Submitted 15 December, 2020; originally announced December 2020.

  15. arXiv:2011.13365  [pdf, other]

    eess.SY cs.LG

    Optimization of the Model Predictive Control Update Interval Using Reinforcement Learning

    Authors: Eivind Bøhn, Sebastien Gros, Signe Moe, Tor Arne Johansen

    Abstract: In control applications there is often a compromise that needs to be made with regards to the complexity and performance of the controller and the computational resources that are available. For instance, the typical hardware platform in embedded control applications is a microcontroller with limited memory and processing power, and for battery powered applications the control system can account f…

    Submitted 26 November, 2020; originally announced November 2020.

    Comments: Submitted to 3rd Annual Learning for Dynamics and Control Conference (L4DC 2021)

  16. Bus Frequency Optimization: When Waiting Time Matters in User Satisfaction

    Authors: Songsong Mo, Zhifeng Bao, Baihua Zheng, Zhiyong Peng

    Abstract: Reorganizing bus frequency to cater for the actual travel demand can save the cost of the public transport system significantly. Many, if not all, existing studies formulate this as a bus frequency optimization problem which tries to minimize passengers' average waiting time. However, many investigations have confirmed that the user satisfaction drops faster as the waiting time increases. Conseque…

    Submitted 23 March, 2020; originally announced April 2020.

    Journal ref: International Conference on Database Systems for Advanced Applications 2020

  17. arXiv:2003.04949  [pdf, other]

    eess.IV cs.CV

    LC-GAN: Image-to-image Translation Based on Generative Adversarial Network for Endoscopic Images

    Authors: Shan Lin, Fangbo Qin, Yangming Li, Randall A. Bly, Kris S. Moe, Blake Hannaford

    Abstract: Intelligent vision is appealing in computer-assisted and robotic surgeries. Vision-based analysis with deep learning usually requires large labeled datasets, but manual data labeling is expensive and time-consuming in medical problems. We investigate a novel cross-domain strategy to reduce the need for manual data labeling by proposing an image-to-image translation model live-cadaver GAN (LC-GAN)…

    Submitted 13 August, 2020; v1 submitted 10 March, 2020; originally announced March 2020.

    Comments: Accepted by 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

  18. Deep Reinforcement Learning Attitude Control of Fixed-Wing UAVs Using Proximal Policy Optimization

    Authors: Eivind Bøhn, Erlend M. Coates, Signe Moe, Tor Arne Johansen

    Abstract: Contemporary autopilot systems for unmanned aerial vehicles (UAVs) are far more limited in their flight envelope as compared to experienced human pilots, thereby restricting the conditions UAVs can operate in and the types of missions they can accomplish autonomously. This paper proposes a deep reinforcement learning (DRL) controller to handle the nonlinear attitude control problem, enabling exten…

    Submitted 13 November, 2019; originally announced November 2019.

    Comments: 11 pages, 3 figures, 2019 International Conference on Unmanned Aircraft Systems (ICUAS)

    Journal ref: In 2019 International Conference on Unmanned Aircraft Systems (ICUAS) (pp. 523-533). IEEE

  19. arXiv:1612.09105  [pdf, other]

    eess.SY

    Set-based Control for Autonomous Spray Painting

    Authors: Signe Moe, Jan Tommy Gravdahl, Kristin Y. Pettersen

    Abstract: In this paper, a method is presented for lowering the energy consumption and/or increasing the speed of a standard manipulator spray painting a surface. The approach is based on the observation that a small angle between the spray direction and the surface normal does not affect the quality of the paint job. Recent results in set-based kinematic control are utilized to develop a switched control s…

    Submitted 29 December, 2016; originally announced December 2016.