Showing 1–50 of 137 results for author: Yeung, S

  1. arXiv:2404.13953  [pdf, other]

    cs.CV

    360VOTS: Visual Object Tracking and Segmentation in Omnidirectional Videos

    Authors: Yinzhe Xu, Huajian Huang, Yingshu Chen, Sai-Kit Yeung

    Abstract: Visual object tracking and segmentation in omnidirectional videos are challenging due to the wide field-of-view and large spherical distortion brought by 360° images. To alleviate these problems, we introduce a novel representation, extended bounding field-of-view (eBFoV), for target localization and use it as the foundation of a general 360 tracking framework which is applicable for both omnidire…

    Submitted 22 April, 2024; originally announced April 2024.

  2. arXiv:2404.10681  [pdf, other]

    cs.CV

    StyleCity: Large-Scale 3D Urban Scenes Stylization with Vision-and-Text Reference via Progressive Optimization

    Authors: Yingshu Chen, Huajian Huang, Tuan-Anh Vu, Ka Chun Shum, Sai-Kit Yeung

    Abstract: Creating large-scale virtual urban scenes with variant styles is inherently challenging. To facilitate prototypes of virtual production and bypass the need for complex materials and lighting setups, we introduce the first vision-and-text-driven texture stylization system for large-scale urban scenes, StyleCity. Taking an image and text as references, StyleCity stylizes a 3D textured mesh of a larg…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: project page: https://chenyingshu.github.io/stylecity3d/

  3. arXiv:2404.08590  [pdf, other]

    cs.CV cs.AI

    Improving Referring Image Segmentation using Vision-Aware Text Features

    Authors: Hai Nguyen-Truong, E-Ro Nguyen, Tuan-Anh Vu, Minh-Triet Tran, Binh-Son Hua, Sai-Kit Yeung

    Abstract: Referring image segmentation is a challenging task that involves generating pixel-wise segmentation masks based on natural language descriptions. Existing methods have relied mostly on visual features to generate the segmentation masks while treating text features as supporting components. This over-reliance on visual features can lead to suboptimal results, especially in complex scenarios where t…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 30 pages including supplementary

  4. arXiv:2404.03202  [pdf, other]

    cs.CV

    OmniGS: Omnidirectional Gaussian Splatting for Fast Radiance Field Reconstruction using Omnidirectional Images

    Authors: Longwei Li, Huajian Huang, Sai-Kit Yeung, Hui Cheng

    Abstract: Photorealistic reconstruction relying on 3D Gaussian Splatting has shown promising potential in robotics. However, the current 3D Gaussian Splatting system only supports radiance field reconstruction using undistorted perspective images. In this paper, we present OmniGS, a novel omnidirectional Gaussian splatting system, to take advantage of omnidirectional images for fast radiance field reconstru…

    Submitted 7 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: 7 pages, 4 figures

  5. arXiv:2403.11499  [pdf, other]

    hep-ph astro-ph.CO

    Resolving the $H_0$ and $S_8$ tensions with neutrino mass and chemical potential

    Authors: Shek Yeung, Wangzheng Zhang, Ming-chung Chu

    Abstract: A simple and natural extension of the standard $Λ$CDM model is to allow relic neutrinos to have non-zero degeneracy. We confront this $Λ$CDM$ξ$ model, $Λ$CDM with neutrino mass $M_ν$ and degeneracy $ξ_3$ as additional parameters, with the Planck TT, lowT, plik–lensing, BAO, and DES datasets, and we observe a strong preference (Bayes factor $\log_{10}B=1.9$) for it over the standard $Λ$CD…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 11 pages, 4 main + 2 appendix figures

  6. arXiv:2403.03004  [pdf, other]

    astro-ph.CO gr-qc hep-ph

    Ultralight vector dark matter search using data from the KAGRA O3GK run

    Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, H. Abe, I. Abouelfettouh, F. Acernese, K. Ackley, C. Adamcewicz, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, et al. (1778 additional authors not shown)

    Abstract: Among the various candidates for dark matter (DM), ultralight vector DM can be probed by laser interferometric gravitational wave detectors through the measurement of oscillating length changes in the arm cavities. In this context, KAGRA has a unique feature due to differing compositions of its mirrors, enhancing the signal of vector DM in the length change in the auxiliary channels. Here we prese…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: 20 pages, 5 figures

    Report number: LIGO-P2300250

  7. arXiv:2401.13937  [pdf, other]

    cs.CV

    Self-supervised Video Object Segmentation with Distillation Learning of Deformable Attention

    Authors: Quang-Trung Truong, Duc Thanh Nguyen, Binh-Son Hua, Sai-Kit Yeung

    Abstract: Video object segmentation is a fundamental research problem in computer vision. Recent techniques have often applied attention mechanism to object representation learning from video sequences. However, due to temporal changes in the video data, attention maps may not well align with the objects of interest across video frames, causing accumulated errors in long-term video processing. In addition,…

    Submitted 18 March, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: under review

  8. arXiv:2401.12421  [pdf, other]

    cs.CV cs.AI

    AdaEmbed: Semi-supervised Domain Adaptation in the Embedding Space

    Authors: Ali Mottaghi, Mohammad Abdullah Jamal, Serena Yeung, Omid Mohareri

    Abstract: Semi-supervised domain adaptation (SSDA) presents a critical hurdle in computer vision, especially given the frequent scarcity of labeled data in real-world settings. This scarcity often causes foundation models, trained on extensive datasets, to underperform when applied to new domains. AdaEmbed, our newly proposed methodology for SSDA, offers a promising solution to these challenges. Leveraging…

    Submitted 22 January, 2024; originally announced January 2024.

  9. arXiv:2401.02147  [pdf, other]

    cs.CL cs.CV

    Exploring Boundary of GPT-4V on Marine Analysis: A Preliminary Case Study

    Authors: Ziqiang Zheng, Yiwei Chen, Jipeng Zhang, Tuan-Anh Vu, Huimin Zeng, Yue Him Wong Tim, Sai-Kit Yeung

    Abstract: Large language models (LLMs) have demonstrated a powerful ability to answer various queries as a general-purpose assistant. The continuous multi-modal large language models (MLLM) empower LLMs with the ability to perceive visual signals. The launch of GPT-4 (Generative Pre-trained Transformers) has generated significant interest in the research communities. GPT-4V(ision) has demonstrated significan…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: 51 pages, 36 figures, Repository: https://github.com/hkust-vgd/Marine_GPT-4V_Eval

  10. arXiv:2312.17505  [pdf, other]

    cs.CV cs.AI cs.CL

    Leveraging Open-Vocabulary Diffusion to Camouflaged Instance Segmentation

    Authors: Tuan-Anh Vu, Duc Thanh Nguyen, Qing Guo, Binh-Son Hua, Nhat Minh Chung, Ivor W. Tsang, Sai-Kit Yeung

    Abstract: Text-to-image diffusion techniques have shown exceptional capability of producing high-quality images from text descriptions. This indicates that there exists a strong correlation between the visual and textual domains. In addition, text-image discriminative models such as CLIP excel in image labelling from text prompts, thanks to the rich and diverse information available from open concepts. In t…

    Submitted 29 December, 2023; originally announced December 2023.

    Comments: This work is under review

  11. arXiv:2312.05745  [pdf, other]

    cs.CV cs.AI

    Open World Object Detection in the Era of Foundation Models

    Authors: Orr Zohar, Alejandro Lozano, Shelly Goel, Serena Yeung, Kuan-Chieh Wang

    Abstract: Object detection is integral to a bevy of real-world applications, from robotics to medical image analysis. To be used reliably in such applications, models must be capable of handling unexpected - or novel - objects. The open world object detection (OWD) paradigm addresses this challenge by enabling models to detect unknown objects and learn discovered ones incrementally. However, OWD method deve…

    Submitted 9 December, 2023; originally announced December 2023.

  12. Measuring neutrino mass and asymmetry with matter pairwise velocities

    Authors: Wangzheng Zhang, Ming-chung Chu, Rui Hu, Shihong Liao, Shek Yeung

    Abstract: Neutrinos are believed to be the most abundant fermions in the Universe, but their masses are unknown, except for being non-zero but much smaller than other fermions. Cosmological relic neutrinos could also have non-zero chemical potentials (or asymmetries). Using neutrino-involved N-body simulations, we investigate the neutrino effects on the matter pairwise velocity, which itself is an interesti…

    Submitted 15 February, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: 15 pages, 4 main + 3 appendix figures, accepted for publication in MNRAS

    Journal ref: MNRAS 529, 360 (2024)

  13. arXiv:2311.18328  [pdf, other]

    cs.CV cs.AI cs.GR

    Advances in 3D Neural Stylization: A Survey

    Authors: Yingshu Chen, Guocheng Shao, Ka Chun Shum, Binh-Son Hua, Sai-Kit Yeung

    Abstract: Modern artificial intelligence offers a novel and transformative approach to creating digital art across diverse styles and modalities like images, videos and 3D data, unleashing the power of creativity and revolutionizing the way that we perceive and interact with visual content. This paper reports on recent advances in stylized 3D asset creation and manipulation with the expressive power of neur…

    Submitted 18 June, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

  14. arXiv:2311.17389  [pdf, other]

    cs.CV

    360Loc: A Dataset and Benchmark for Omnidirectional Visual Localization with Cross-device Queries

    Authors: Huajian Huang, Changkun Liu, Yipeng Zhu, Hui Cheng, Tristan Braud, Sai-Kit Yeung

    Abstract: Portable 360$^\circ$ cameras are becoming a cheap and efficient tool to establish large visual databases. By capturing omnidirectional views of a scene, these cameras could expedite building environment models that are essential for visual localization. However, such an advantage is often overlooked due to the lack of valuable datasets. This paper introduces a new benchmark dataset, 360Loc, compos…

    Submitted 31 May, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: CVPR 2024. Project Page: https://huajianup.github.io/research/360Loc/

  15. arXiv:2311.16728  [pdf, other]

    cs.CV

    Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular, Stereo, and RGB-D Cameras

    Authors: Huajian Huang, Longwei Li, Hui Cheng, Sai-Kit Yeung

    Abstract: The integration of neural rendering and the SLAM system recently showed promising results in joint localization and photorealistic view reconstruction. However, existing methods, fully relying on implicit representations, are so resource-hungry that they cannot run on portable devices, which deviates from the original intention of SLAM. In this paper, we present Photo-SLAM, a novel SLAM framework…

    Submitted 8 April, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: CVPR 2024. Code: https://github.com/HuajianUP/Photo-SLAM - Project Page: https://huajianup.github.io/research/Photo-SLAM/

  16. arXiv:2311.14762  [pdf, other]

    cs.CV cs.AI

    The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024

    Authors: Benjamin Kiefer, Lojze Žust, Matej Kristan, Janez Perš, Matija Teršek, Arnold Wiliem, Martin Messmer, Cheng-Yen Yang, Hsiang-Wei Huang, Zhongyu Jiang, Heng-Cheng Kuo, Jie Mei, Jenq-Neng Hwang, Daniel Stadler, Lars Sommer, Kaer Huang, Aiguo Zheng, Weitu Chong, Kanokphan Lertniphonphan, Jun Xie, Feng Chen, Jian Li, Zhepeng Wang, Luca Zedda, Andrea Loddo, et al. (24 additional authors not shown)

    Abstract: The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 addresses maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV). Three challenge categories are considered: (i) UAV-based Maritime Object Tracking with Re-identification, (ii) USV-based Maritime Obstacle Segmentation and Detection, (iii) USV-based Maritime Boat Tracking. The USV-based Maritime Obst…

    Submitted 23 November, 2023; originally announced November 2023.

    Comments: Part of 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 IEEE Xplore submission as part of WACV 2024

  17. arXiv:2311.13152  [pdf, other]

    cs.CV

    Test-Time Augmentation for 3D Point Cloud Classification and Segmentation

    Authors: Tuan-Anh Vu, Srinjay Sarkar, Zhiyuan Zhang, Binh-Son Hua, Sai-Kit Yeung

    Abstract: Data augmentation is a powerful technique to enhance the performance of a deep learning task but has received less attention in 3D deep learning. It is well known that when 3D shapes are sparsely represented with low point density, the performance of the downstream tasks drops significantly. This work explores test-time augmentation (TTA) for 3D point clouds. We are inspired by the recent revoluti…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: This paper is accepted in 3DV 2024

  18. arXiv:2311.10798  [pdf, other]

    cs.LG cs.AI cs.CV eess.IV

    INSPECT: A Multimodal Dataset for Pulmonary Embolism Diagnosis and Prognosis

    Authors: Shih-Cheng Huang, Zepeng Huo, Ethan Steinberg, Chia-Chun Chiang, Matthew P. Lungren, Curtis P. Langlotz, Serena Yeung, Nigam H. Shah, Jason A. Fries

    Abstract: Synthesizing information from multiple data sources plays a crucial role in the practice of modern medicine. Current applications of artificial intelligence in medicine often focus on single-modality data due to a lack of publicly available, multimodal medical datasets. To address this limitation, we introduce INSPECT, which contains de-identified longitudinal records from a large cohort of patien…

    Submitted 17 November, 2023; originally announced November 2023.

  19. arXiv:2310.13596  [pdf, other]

    cs.CL cs.AI

    MarineGPT: Unlocking Secrets of Ocean to the Public

    Authors: Ziqiang Zheng, Jipeng Zhang, Tuan-Anh Vu, Shizhe Diao, Yue Him Wong Tim, Sai-Kit Yeung

    Abstract: Large language models (LLMs), such as ChatGPT/GPT-4, have proven to be powerful tools in promoting the user experience as an AI assistant. The continuous works are proposing multi-modal large language models (MLLM), empowering LLMs with the ability to sense multiple modality inputs through constructing a joint semantic space (e.g. visual-text space). Though significant success was achieved in LLMs…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: work in progress. Code and data will be available at https://github.com/hkust-vgd/MarineGPT

  20. arXiv:2310.01946  [pdf, other]

    cs.CV

    CoralVOS: Dataset and Benchmark for Coral Video Segmentation

    Authors: Zheng Ziqiang, Xie Yaofeng, Liang Haixin, Yu Zhibin, Sai-Kit Yeung

    Abstract: Coral reefs formulate the most valuable and productive marine ecosystems, providing habitat for many marine species. Coral reef surveying and analysis are currently confined to coral experts who invest substantial effort in generating comprehensive and dependable reports (e.g., coral coverage, population, spatial distribution, etc.), from the collected survey data. However, performi…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: 8 pages, 9 figures, dense coral video segmentation dataset and benchmark

  21. arXiv:2310.01931  [pdf, other]

    cs.CV

    MarineDet: Towards Open-Marine Object Detection

    Authors: Liang Haixin, Zheng Ziqiang, Ma Zeyu, Sai-Kit Yeung

    Abstract: Marine object detection has gained prominence in marine research, driven by the pressing need to unravel oceanic mysteries and enhance our understanding of invaluable marine ecosystems. There is a profound requirement to efficiently and accurately identify and localize diverse and unseen marine entities within underwater imagery. The open-marine object detection (OMOD for short) is required to det…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: 8 pages, 5 figures

  22. arXiv:2309.12668  [pdf, other]

    cs.RO

    UWA360CAM: A 360$^{\circ}$ 24/7 Real-Time Streaming Camera System for Underwater Applications

    Authors: Quan-Dung Pham, Yipeng Zhu, Tan-Sang Ha, K. H. Long Nguyen, Binh-Son Hua, Sai-Kit Yeung

    Abstract: Omnidirectional camera is a cost-effective and information-rich sensor highly suitable for many marine applications and the ocean scientific community, encompassing several domains such as augmented reality, mapping, motion estimation, visual surveillance, and simultaneous localization and mapping. However, designing and constructing such a high-quality 360$^{\circ}$ real-time streaming camera sys…

    Submitted 30 September, 2023; v1 submitted 22 September, 2023; originally announced September 2023.

  23. arXiv:2309.11281  [pdf, other]

    cs.CV

    Language-driven Object Fusion into Neural Radiance Fields with Pose-Conditioned Dataset Updates

    Authors: Ka Chun Shum, Jaeyeon Kim, Binh-Son Hua, Duc Thanh Nguyen, Sai-Kit Yeung

    Abstract: Neural radiance field is an emerging rendering method that generates high-quality multi-view consistent images from a neural scene representation and volume rendering. Although neural radiance field-based techniques are robust for scene reconstruction, their ability to add or remove objects remains limited. This paper proposes a new language-driven approach for object manipulation with neural radi…

    Submitted 31 March, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: CVPR 2024

  24. arXiv:2309.10684  [pdf, other]

    cs.CV cs.GR

    Locally Stylized Neural Radiance Fields

    Authors: Hong-Wing Pang, Binh-Son Hua, Sai-Kit Yeung

    Abstract: In recent years, there has been increasing interest in applying stylization on 3D scenes from a reference style image, in particular onto neural radiance fields (NeRF). While performing stylization directly on NeRF guarantees appearance consistency over arbitrary novel views, it is a challenging problem to guide the transfer of patterns from the style image onto different parts of the NeRF scene.…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: ICCV 2023

  25. arXiv:2309.07986  [pdf, other]

    cs.CV cs.AI cs.LG

    Viewpoint Textual Inversion: Unleashing Novel View Synthesis with Pretrained 2D Diffusion Models

    Authors: James Burgess, Kuan-Chieh Wang, Serena Yeung

    Abstract: Text-to-image diffusion models understand spatial relationship between objects, but do they represent the true 3D structure of the world from only 2D supervision? We demonstrate that yes, 3D knowledge is encoded in 2D image diffusion models like Stable Diffusion, and we show that this structure can be exploited for 3D vision tasks. Our method, Viewpoint Neural Textual Inversion (ViewNeTI), control…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: Project page: https://jmhb0.github.io/viewneti/

  26. arXiv:2309.06660  [pdf, other]

    cs.LG cs.CV

    Generalizable Neural Fields as Partially Observed Neural Processes

    Authors: Jeffrey Gu, Kuan-Chieh Wang, Serena Yeung

    Abstract: Neural fields, which represent signals as a function parameterized by a neural network, are a promising alternative to traditional discrete vector or grid-based representations. Compared to discrete representations, neural representations both scale well with increasing resolution, are continuous, and can be many-times differentiable. However, given a dataset of signals that we would like to repre…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: To appear ICCV 2023

  27. arXiv:2309.03097  [pdf, other]

    stat.AP

    An Algorithm for Modelling Escalator Fixed Loss Energy for PHM and sustainable energy usage

    Authors: Xuwen Hu, Jiaqi Qiu, Yu Lin, Inez Maria Zwetsloot, William Ka Fai Lee, Edmond Yin San Yeung, Colman Yiu Wah Yeung, Chris Chun Long Wong

    Abstract: Prognostic Health Management (PHM) is designed to assess and monitor the health status of systems, anticipate the onset of potential failure, and prevent unplanned downtime. In recent decades, collecting massive amounts of real-time sensor data enabled condition monitoring (CM) and consequently, detection of abnormalities to support maintenance decision-making. Additionally, the utilization of PHM…

    Submitted 6 September, 2023; originally announced September 2023.

  28. arXiv:2308.03822  [pdf, other]

    astro-ph.HE

    Search for Eccentric Black Hole Coalescences during the Third Observing Run of LIGO and Virgo

    Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, H. Abe, F. Acernese, K. Ackley, C. Adamcewicz, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, et al. (1750 additional authors not shown)

    Abstract: Despite the growing number of confident binary black hole coalescences observed through gravitational waves so far, the astrophysical origin of these binaries remains uncertain. Orbital eccentricity is one of the clearest tracers of binary formation channels. Identifying binary eccentricity, however, remains challenging due to the limited availability of gravitational waveforms that include effect…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: 24 pages, 5 figures

    Report number: LIGO-P2300080

  29. arXiv:2307.14630  [pdf, other]

    cs.CV

    360VOT: A New Benchmark Dataset for Omnidirectional Visual Object Tracking

    Authors: Huajian Huang, Yinzhe Xu, Yingshu Chen, Sai-Kit Yeung

    Abstract: 360° images can provide an omnidirectional field of view which is important for stable and long-term scene perception. In this paper, we explore 360° images for visual object tracking and perceive new challenges caused by large distortion, stitching artifacts, and other unique attributes of 360° images. To alleviate these problems, we take advantage of novel representations of target localization,…

    Submitted 27 July, 2023; originally announced July 2023.

    Comments: ICCV 2023. Homepage: https://360vot.hkustvgd.com The toolkit of the benchmark is available at: https://github.com/HuajianUP/360VOT

  30. arXiv:2307.09621  [pdf, other]

    cs.CV

    Conditional 360-degree Image Synthesis for Immersive Indoor Scene Decoration

    Authors: Ka Chun Shum, Hong-Wing Pang, Binh-Son Hua, Duc Thanh Nguyen, Sai-Kit Yeung

    Abstract: In this paper, we address the problem of conditional scene decoration for 360-degree images. Our method takes a 360-degree background photograph of an indoor scene and generates decorated images of the same scene in the panorama view. To do this, we develop a 360-aware object layout generator that learns latent object vectors in the 360-degree view to enable a variety of furniture arrangements for…

    Submitted 18 July, 2023; originally announced July 2023.

    Comments: ICCV 2023

  31. arXiv:2306.09605  [pdf, ps, other]

    math.AG math.NT

    Arithmetic fake compact Hermitian symmetric spaces of Type $A_3$

    Authors: Gopal Prasad, Sai-Kee Yeung

    Abstract: We reduced the classification of arithmetic fake compact Hermitian symmetric spaces of type $A_3$ to a few cases.

    Submitted 15 June, 2023; originally announced June 2023.

  32. arXiv:2306.08893  [pdf, other]

    cs.CV cs.AI cs.LG

    LOVM: Language-Only Vision Model Selection

    Authors: Orr Zohar, Shih-Cheng Huang, Kuan-Chieh Wang, Serena Yeung

    Abstract: Pre-trained multi-modal vision-language models (VLMs) are becoming increasingly popular due to their exceptional performance on downstream vision applications, particularly in the few- and zero-shot settings. However, selecting the best-performing VLM for some downstream applications is non-trivial, as it is dataset and task-dependent. Meanwhile, the exhaustive evaluation of all available VLMs on…

    Submitted 15 June, 2023; originally announced June 2023.

  33. arXiv:2306.05436  [pdf, other]

    stat.AP cs.CY

    Remaining Useful Life Modelling with an Escalator Health Condition Analytic System

    Authors: Inez M. Zwetsloot, Yu Lin, Jiaqi Qiu, Lishuai Li, William Ka Fai Lee, Edmond Yin San Yeung, Colman Yiu Wah Yeung, Chris Chun Long Wong

    Abstract: The refurbishment of an escalator is usually linked with its design life as recommended by the manufacturer. However, the actual useful life of an escalator should be determined by its operating condition which is affected by the runtime, workload, maintenance quality, vibration, etc., rather than age only. The objective of this project is to develop a comprehensive health condition analytic syste…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: 14 pages, 12 figures, 7 tables

  34. arXiv:2306.04593  [pdf, other]

    cs.CV cs.IR

    MarineVRS: Marine Video Retrieval System with Explainability via Semantic Understanding

    Authors: Tan-Sang Ha, Hai Nguyen-Truong, Tuan-Anh Vu, Sai-Kit Yeung

    Abstract: Building a video retrieval system that is robust and reliable, especially for the marine environment, is a challenging task due to several factors such as dealing with massive amounts of dense and repetitive data, occlusion, blurriness, low lighting conditions, and abstract queries. To address these challenges, we present MarineVRS, a novel and flexible video retrieval system designed explicitly f…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: Accepted to OCEANS 2023 Limerick. Website: https://marinevrs.hkustvgd.com/

  35. arXiv:2305.17311  [pdf, other]

    cs.CL cs.AI cs.LG

    Beyond Positive Scaling: How Negation Impacts Scaling Trends of Language Models

    Authors: Yuhui Zhang, Michihiro Yasunaga, Zhengping Zhou, Jeff Z. HaoChen, James Zou, Percy Liang, Serena Yeung

    Abstract: Language models have been shown to exhibit positive scaling, where performance improves as models are scaled up in terms of size, compute, or data. In this work, we introduce NeQA, a dataset consisting of questions with negation in which language models do not exhibit straightforward positive scaling. We show that this task can exhibit inverse scaling, U-shaped scaling, or positive scaling, and th…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: Published at ACL 2023 Findings

  36. arXiv:2305.16411  [pdf, other]

    cs.CV

    ZeroAvatar: Zero-shot 3D Avatar Generation from a Single Image

    Authors: Zhenzhen Weng, Zeyu Wang, Serena Yeung

    Abstract: Recent advancements in text-to-image generation have enabled significant progress in zero-shot 3D shape generation. This is achieved by score distillation, a methodology that uses pre-trained text-to-image diffusion models to optimize the parameters of a 3D neural presentation, e.g. Neural Radiance Field (NeRF). While showing promising results, existing methods are often not able to preserve the g…

    Submitted 25 May, 2023; originally announced May 2023.

  37. arXiv:2305.06611  [pdf, other]

    cs.CV

    Hyperbolic Deep Learning in Computer Vision: A Survey

    Authors: Pascal Mettes, Mina Ghadimi Atigh, Martin Keller-Ressel, Jeffrey Gu, Serena Yeung

    Abstract: Deep representation learning is a ubiquitous part of modern computer vision. While Euclidean space has been the de facto standard manifold for learning visual representations, hyperbolic space has recently gained rapid traction for learning in computer vision. Specifically, hyperbolic learning has shown a strong potential to embed hierarchical structures, learn from limited samples, quantify uncer…

    Submitted 11 May, 2023; originally announced May 2023.

  38. arXiv:2304.00546  [pdf, other]

    eess.IV cs.CV cs.LG

    Video Pretraining Advances 3D Deep Learning on Chest CT Tasks

    Authors: Alexander Ke, Shih-Cheng Huang, Chloe P O'Connell, Michal Klimont, Serena Yeung, Pranav Rajpurkar

    Abstract: Pretraining on large natural image classification datasets such as ImageNet has aided model development on data-scarce 2D medical tasks. 3D medical tasks often have much less data than 2D medical tasks, prompting practitioners to rely on pretrained 2D models to featurize slices. However, these 2D models have been surpassed by 3D models on 3D computer vision benchmarks since they do not natively le…

    Submitted 2 April, 2023; originally announced April 2023.

    Comments: Accepted at MIDL 2023

  39. arXiv:2302.04303  [pdf, other]

    cs.CV

    Adapting Pre-trained Vision Transformers from 2D to 3D through Weight Inflation Improves Medical Image Segmentation

    Authors: Yuhui Zhang, Shih-Cheng Huang, Zhengping Zhou, Matthew P. Lungren, Serena Yeung

    Abstract: Given the prevalence of 3D medical imaging technologies such as MRI and CT that are widely used in diagnosing and treating diverse diseases, 3D segmentation is one of the fundamental tasks of medical image analysis. Recently, Transformer-based models have started to achieve state-of-the-art performances across many vision tasks, through pre-training on large-scale natural image benchmark datasets.…

    Submitted 8 February, 2023; originally announced February 2023.

    Comments: Published at ML4H 2022

  40. arXiv:2302.04269  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Diagnosing and Rectifying Vision Models using Language

    Authors: Yuhui Zhang, Jeff Z. HaoChen, Shih-Cheng Huang, Kuan-Chieh Wang, James Zou, Serena Yeung

    Abstract: Recent multi-modal contrastive learning models have demonstrated the ability to learn an embedding space suitable for building strong vision classifiers, by leveraging the rich information in large-scale image-caption datasets. Our work highlights a distinct advantage of this multi-modal embedding space: the ability to diagnose vision classifiers through natural language. The traditional process o…

    Submitted 8 February, 2023; originally announced February 2023.

    Comments: Published at ICLR 2023

  41. arXiv:2212.13660  [pdf, other]

    cs.CV

    NeMo: 3D Neural Motion Fields from Multiple Video Instances of the Same Action

    Authors: Kuan-Chieh Wang, Zhenzhen Weng, Maria Xenochristou, Joao Pedro Araujo, Jeffrey Gu, C. Karen Liu, Serena Yeung

    Abstract: The task of reconstructing 3D human motion has wide-ranging applications. The gold standard Motion capture (MoCap) systems are accurate but inaccessible to the general public due to their cost, hardware and space constraints. In contrast, monocular human mesh recovery (HMR) methods are much more accessible than MoCap as they take single-view videos as inputs. Replacing the multi-view MoCap system…

    Submitted 27 December, 2022; originally announced December 2022.

  42. arXiv:2212.01424  [pdf, other]

    cs.CV cs.AI cs.LG

    PROB: Probabilistic Objectness for Open World Object Detection

    Authors: Orr Zohar, Kuan-Chieh Wang, Serena Yeung

    Abstract: Open World Object Detection (OWOD) is a new and challenging computer vision task that bridges the gap between classic object detection (OD) benchmarks and object detection in the real world. In addition to detecting and classifying seen/labeled objects, OWOD algorithms are expected to detect novel/unknown objects - which can be classified and incrementally learned. In standard OD, object proposals…

    Submitted 2 December, 2022; originally announced December 2022.

  43. arXiv:2211.13508  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    1st Workshop on Maritime Computer Vision (MaCVi) 2023: Challenge Results

    Authors: Benjamin Kiefer, Matej Kristan, Janez Perš, Lojze Žust, Fabio Poiesi, Fabio Augusto de Alcantara Andrade, Alexandre Bernardino, Matthew Dawkins, Jenni Raitoharju, Yitong Quan, Adem Atmaca, Timon Höfer, Qiming Zhang, Yufei Xu, Jing Zhang, Dacheng Tao, Lars Sommer, Raphael Spraul, Hangyue Zhao, Hongpu Zhang, Yanyun Zhao, Jan Lukas Augustin, Eui-ik Jeon, Impyeong Lee, Luca Zedda, et al. (48 additional authors not shown)

    Abstract: The 1$^{\text{st}}$ Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicle (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detec…

    Submitted 28 November, 2022; v1 submitted 24 November, 2022; originally announced November 2022.

    Comments: MaCVi 2023 was part of WACV 2023. This report (38 pages) discusses the competition as part of MaCVi

  44. arXiv:2211.08702  [pdf, other]

    cs.CV cs.AI cs.GR

    PointInverter: Point Cloud Reconstruction and Editing via a Generative Model with Shape Priors

    Authors: Jaeyeon Kim, Binh-Son Hua, Duc Thanh Nguyen, Sai-Kit Yeung

    Abstract: In this paper, we propose a new method for mapping a 3D point cloud to the latent space of a 3D generative adversarial network. Our generative model for 3D point clouds is based on SP-GAN, a state-of-the-art sphere-guided 3D point cloud generator. We derive an efficient way to encode an input 3D point cloud to the latent space of the SP-GAN. Our point cloud encoder can resolve the point ordering i…

    Submitted 16 November, 2022; originally announced November 2022.

    Comments: WACV 2023 paper. 8 pages of main content, 2 pages of references, 7 pages of supplementary material

  45. arXiv:2209.11518  [pdf, other]

    cs.CV cs.IR cs.MM

    Marine Video Kit: A New Marine Video Dataset for Content-based Analysis and Retrieval

    Authors: Quang-Trung Truong, Tuan-Anh Vu, Tan-Sang Ha, Lokoc Jakub, Yue Him Wong Tim, Ajay Joneja, Sai-Kit Yeung

    Abstract: Effective analysis of unusual domain specific video collections represents an important practical problem, where state-of-the-art general purpose models still face limitations. Hence, it is desirable to design benchmark datasets that challenge novel powerful models for specific domains with additional constraints. It is important to remember that domain specific data may be noisier (e.g., endoscop…

    Submitted 6 December, 2022; v1 submitted 23 September, 2022; originally announced September 2022.

    Comments: Camera Ready for MMM 2023, Bergen, Norway

  46. arXiv:2209.05800  [pdf, other]

    cs.CV cs.GR cs.MM

    Time-of-Day Neural Style Transfer for Architectural Photographs

    Authors: Yingshu Chen, Tuan-Anh Vu, Ka-Chun Shum, Binh-Son Hua, Sai-Kit Yeung

    Abstract: Architectural photography is a genre of photography that focuses on capturing a building or structure in the foreground with dramatic lighting in the background. Inspired by recent successes in image-to-image translation methods, we aim to perform style transfer for architectural photographs. However, the special composition in architectural photography poses great challenges for style transfer in…

    Submitted 27 October, 2022; v1 submitted 13 September, 2022; originally announced September 2022.

    Comments: Updated version with corrected equations. Paper published at the International Conference on Computational Photography (ICCP) 2022. 12 pages of content with 6 pages of supplementary materials

  47. arXiv:2208.02705  [pdf, other]

    cs.CV

    360Roam: Real-Time Indoor Roaming Using Geometry-Aware 360$^\circ$ Radiance Fields

    Authors: Huajian Huang, Yingshu Chen, Tianjia Zhang, Sai-Kit Yeung

    Abstract: Virtual tours among sparse 360$^\circ$ images are widely used, yet they hinder smooth and immersive roaming experiences. The emergence of Neural Radiance Field (NeRF) has showcased significant progress in synthesizing novel views, unlocking the potential for immersive scene exploration. Nevertheless, previous NeRF works primarily focused on object-centric scenarios, resulting in noticeable performanc…

    Submitted 28 November, 2023; v1 submitted 4 August, 2022; originally announced August 2022.

  48. arXiv:2207.10062  [pdf, other]

    cs.LG

    DataPerf: Benchmarks for Data-Centric AI Development

    Authors: Mark Mazumder, Colby Banbury, Xiaozhe Yao, Bojan Karlaš, William Gaviria Rojas, Sudnya Diamos, Greg Diamos, Lynn He, Alicia Parrish, Hannah Rose Kirk, Jessica Quaye, Charvi Rastogi, Douwe Kiela, David Jurado, David Kanter, Rafael Mosquera, Juan Ciro, Lora Aroyo, Bilge Acun, Lingjiao Chen, Mehul Smriti Raje, Max Bartolo, Sabri Eyuboglu, Amirata Ghorbani, Emmett Goodman, et al. (20 additional authors not shown)

    Abstract: Machine learning research has long focused on models rather than datasets, and prominent datasets are used for common ML tasks without regard to the breadth, difficulty, and faithfulness of the underlying problems. Neglecting the fundamental importance of data has given rise to inaccuracy, bias, and fragility in real-world applications, and research is hindered by saturation across existing datase…

    Submitted 13 October, 2023; v1 submitted 20 July, 2022; originally announced July 2022.

    Comments: NeurIPS 2023 Datasets and Benchmarks Track

  49. arXiv:2207.05765  [pdf, other]

    astro-ph.CO gr-qc hep-ph hep-th physics.hist-ph

    Is the Observable Universe Consistent with the Cosmological Principle?

    Authors: Pavan Kumar Aluri, Paolo Cea, Pravabati Chingangbam, Ming-Chung Chu, Roger G. Clowes, Damien Hutsemékers, Joby P. Kochappan, Alexia M. Lopez, Lang Liu, Niels C. M. Martens, C. J. A. P. Martins, Konstantinos Migkas, Eoin Ó Colgáin, Pratyush Pranav, Lior Shamir, Ashok K. Singal, M. M. Sheikh-Jabbari, Jenny Wagner, Shao-Jiang Wang, David L. Wiltshire, Shek Yeung, Lu Yin, Wen Zhao

    Abstract: The Cosmological Principle (CP) -- the notion that the Universe is spatially isotropic and homogeneous on large scales -- underlies a century of progress in cosmology. It is conventionally formulated through the Friedmann-Lemaître-Robertson-Walker (FLRW) cosmologies as the spacetime metric, and culminates in the successful and highly predictive $Λ$-Cold-Dark-Matter ($Λ$CDM) model. Yet, tensions ha…

    Submitted 27 February, 2023; v1 submitted 12 July, 2022; originally announced July 2022.

    Comments: extended contents and references, 73 pages (excluding references), 30 figures, version accepted for publication in Class. Quant. Grav. "Focus issue on the Hubble constant tension"

    Journal ref: Classical and Quantum Gravity, Vol. 40, Issue No. 9, Page No. 094001 (2023)

  50. arXiv:2207.03083  [pdf, other]

    cs.CV

    Adaptation of Surgical Activity Recognition Models Across Operating Rooms

    Authors: Ali Mottaghi, Aidean Sharghi, Serena Yeung, Omid Mohareri

    Abstract: Automatic surgical activity recognition enables more intelligent surgical devices and a more efficient workflow. Integration of such technology in new operating rooms has the potential to improve care delivery to patients and decrease costs. Recent works have achieved a promising performance on surgical activity recognition; however, the lack of generalizability of these models is one of the criti…

    Submitted 7 July, 2022; originally announced July 2022.

    Comments: MICCAI 2022