
Showing 1–50 of 66 results for author: Lau, R

Searching in archive cs.
  1. arXiv:2406.14473  [pdf, other

    cs.LG cs.CL

    Data-Centric AI in the Age of Large Language Models

    Authors: Xinyi Xu, Zhaoxuan Wu, Rui Qiao, Arun Verma, Yao Shu, Jingtan Wang, Xinyuan Niu, Zhenfeng He, Jiangwei Chen, Zijian Zhou, Gregory Kang Ruey Lau, Hieu Dao, Lucas Agussurja, Rachael Hwee Ling Sim, Xiaoqiang Lin, Wenyang Hu, Zhongxiang Dai, Pang Wei Koh, Bryan Kian Hsiang Low

    Abstract: This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs). We start by making the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs, and yet it receives disproportionally low attention from the research community. We identify four specific… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Preprint

  2. arXiv:2406.10652  [pdf, other

    cs.CV

    MDeRainNet: An Efficient Neural Network for Rain Streak Removal from Macro-pixel Images

    Authors: Tao Yan, Weijiang He, Chenglong Wang, Xiangjie Zhu, Yinghui Wang, Rynson W. H. Lau

    Abstract: Since rainy weather always degrades image quality and poses significant challenges to most computer vision-based intelligent systems, image de-raining has been a hot research topic. Fortunately, in a rainy light field (LF) image, background obscured by rain streaks in one sub-view may be visible in the other sub-views, and implicit depth information and recorded 4D structural information may benef… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 13 pages, 13 figures, 4 tables

  3. arXiv:2406.01476  [pdf, other

    cs.CV

    DreamPhysics: Learning Physical Properties of Dynamic 3D Gaussians with Video Diffusion Priors

    Authors: Tianyu Huang, Yihan Zeng, Hui Li, Wangmeng Zuo, Rynson W. H. Lau

    Abstract: Dynamic 3D interaction has witnessed great interest in recent works, while creating such 4D content remains challenging. One solution is to animate 3D scenes with physics-based simulation, and the other is to learn the deformation of static 3D objects with the distillation of video generative models. The former one requires assigning precise physical properties to the target object, otherwise the… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Technical report. Codes are released at: https://github.com/tyhuang0428/DreamPhysics

  4. arXiv:2405.17725  [pdf, other

    cs.CV

    Color Shift Estimation-and-Correction for Image Enhancement

    Authors: Yiyu Li, Ke Xu, Gerhard Petrus Hancke, Rynson W. H. Lau

    Abstract: Images captured under sub-optimal illumination conditions may contain both over- and under-exposures. Current approaches mainly focus on adjusting image brightness, which may exacerbate the color tone distortion in under-exposed areas and fail to restore accurate colors in over-exposed regions. We observe that over- and under-exposed regions display opposite color tone distribution shifts with res… ▽ More

    Submitted 29 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: CVPR2024 accepted paper

  5. arXiv:2404.07662  [pdf, other

    cs.LG cs.AI physics.comp-ph physics.data-an stat.ML

    PINNACLE: PINN Adaptive ColLocation and Experimental points selection

    Authors: Gregory Kang Ruey Lau, Apivich Hemachandra, See-Kiong Ng, Bryan Kian Hsiang Low

    Abstract: Physics-Informed Neural Networks (PINNs), which incorporate PDEs as soft constraints, train with a composite loss function that contains multiple training point types: different types of collocation points chosen during training to enforce each PDE and initial/boundary conditions, and experimental points which are usually costly to obtain via experiments or simulations. Training PINNs using this l… ▽ More

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: Accepted to 12th International Conference on Learning Representations (ICLR 2024), 36 pages

  6. arXiv:2403.17013  [pdf, other

    cs.CV cs.LG

    Temporal-Spatial Processing of Event Camera Data via Delay-Loop Reservoir Neural Network

    Authors: Richard Lau, Anthony Tylan-Tyler, Lihan Yao, Rey de Castro Roberto, Robert Taylor, Isaiah Jones

    Abstract: This paper describes a temporal-spatial model for video processing with special applications to processing event camera videos. We propose to study a conjecture motivated by our previous study of video processing with delay loop reservoir (DLR) neural network, which we call Temporal-Spatial Conjecture (TSC). The TSC postulates that there is significant information content carried in the temporal r… ▽ More

    Submitted 12 February, 2024; originally announced March 2024.

    Comments: 10 pages, 12 figures, DARPA Distribution Statement A. Approved for public release. Distribution Unlimited

  7. arXiv:2403.16224  [pdf, other

    cs.CV

    Inverse Rendering of Glossy Objects via the Neural Plenoptic Function and Radiance Fields

    Authors: Haoyuan Wang, Wenbo Hu, Lei Zhu, Rynson W. H. Lau

    Abstract: Inverse rendering aims at recovering both geometry and materials of objects. It provides a more compatible reconstruction for conventional rendering engines, compared with the neural radiance fields (NeRFs). On the other hand, existing NeRF-based inverse rendering methods cannot handle glossy objects with local light interactions well, as they typically oversimplify the illumination as a 2D enviro… ▽ More

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: CVPR 2024 paper. Project webpage https://whyy.site/paper/nep

  8. arXiv:2403.15383  [pdf, other

    cs.CV

    ThemeStation: Generating Theme-Aware 3D Assets from Few Exemplars

    Authors: Zhenwei Wang, Tengfei Wang, Gerhard Hancke, Ziwei Liu, Rynson W. H. Lau

    Abstract: Real-world applications often require a large gallery of 3D assets that share a consistent theme. While remarkable advances have been made in general 3D content creation from text or image, synthesizing customized 3D assets following the shared theme of input 3D exemplars remains an open and challenging problem. In this work, we present ThemeStation, a novel approach for theme-aware 3D-to-3D gener… ▽ More

    Submitted 15 May, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

    Comments: Accepted to SIGGRAPH 2024. Project page: https://3dthemestation.github.io/

  9. arXiv:2403.00644  [pdf, other

    cs.CV

    Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks

    Authors: Yuhao Liu, Zhanghan Ke, Fang Liu, Nanxuan Zhao, Rynson W. H. Lau

    Abstract: Diffusion models trained on large-scale datasets have achieved remarkable progress in image synthesis. However, due to the randomness in the diffusion process, they often struggle with handling diverse low-level tasks that require details preservation. To overcome this limitation, we present a new Diff-Plugin framework to enable a single pre-trained diffusion model to generate high-fidelity result… ▽ More

    Submitted 28 May, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR2024. Replaced some celebrity images to avoid copyright disputes

  10. arXiv:2402.14808  [pdf, other

    cs.CL

    RelayAttention for Efficient Large Language Model Serving with Long System Prompts

    Authors: Lei Zhu, Xinjiang Wang, Wayne Zhang, Rynson W. H. Lau

    Abstract: A practical large language model (LLM) service may involve a long system prompt, which specifies the instructions, examples, and knowledge documents of the task and is reused across requests. However, the long system prompt causes throughput/latency bottlenecks as the cost of generating the next token grows w.r.t. the sequence length. This paper aims to improve the efficiency of LLM services that… ▽ More

    Submitted 30 May, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: accepted by the ACL 2024 main conference

  11. arXiv:2402.13631  [pdf, other

    cs.CV

    Delving into Dark Regions for Robust Shadow Detection

    Authors: Huankang Guan, Ke Xu, Rynson W. H. Lau

    Abstract: Shadow detection is a challenging task as it requires a comprehensive understanding of shadow characteristics and global/local illumination conditions. We observe from our experiment that state-of-the-art deep methods tend to have higher error rates in differentiating shadow pixels from non-shadow pixels in dark regions (i.e., regions with low-intensity values). Our key insight to this problem is th… ▽ More

    Submitted 21 February, 2024; originally announced February 2024.

  12. arXiv:2402.00341  [pdf, other

    cs.CV

    Recasting Regional Lighting for Shadow Removal

    Authors: Yuhao Liu, Zhanghan Ke, Ke Xu, Fang Liu, Zhenwei Wang, Rynson W. H. Lau

    Abstract: Removing shadows requires an understanding of both lighting conditions and object textures in a scene. Existing methods typically learn pixel-level color mappings between shadow and non-shadow images, in which the joint modeling of lighting and object textures is implicit and inadequate. We observe that in a shadow region, the degradation degree of object textures depends on the local illumination… ▽ More

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: AAAI 2024 (Oral)

  13. arXiv:2312.06439  [pdf, other

    cs.CV

    DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior

    Authors: Tianyu Huang, Yihan Zeng, Zhilu Zhang, Wan Xu, Hang Xu, Songcen Xu, Rynson W. H. Lau, Wangmeng Zuo

    Abstract: 3D generation has raised great attention in recent years. With the success of text-to-image diffusion models, the 2D-lifting technique becomes a promising route to controllable 3D generation. However, these methods tend to present inconsistent geometry, which is also known as the Janus problem. We observe that the problem is caused mainly by two aspects, i.e., viewpoint bias in 2D diffusion models… ▽ More

    Submitted 12 March, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: Accepted by CVPR 2024

  14. arXiv:2310.05373  [pdf, other

    cs.LG cs.AI

    Quantum Bayesian Optimization

    Authors: Zhongxiang Dai, Gregory Kang Ruey Lau, Arun Verma, Yao Shu, Bryan Kian Hsiang Low, Patrick Jaillet

    Abstract: Kernelized bandits, also known as Bayesian optimization (BO), has been a prevalent method for optimizing complicated black-box reward functions. Various BO algorithms have been theoretically shown to enjoy upper bounds on their cumulative regret which are sub-linear in the number T of iterations, and a regret lower bound of Omega(sqrt(T)) has been derived which represents the unavoidable regrets f… ▽ More

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: Accepted to NeurIPS 2023

  15. arXiv:2309.17175  [pdf, other

    cs.CV

    TextField3D: Towards Enhancing Open-Vocabulary 3D Generation with Noisy Text Fields

    Authors: Tianyu Huang, Yihan Zeng, Bowen Dong, Hang Xu, Songcen Xu, Rynson W. H. Lau, Wangmeng Zuo

    Abstract: Recent works learn 3D representation explicitly under text-3D guidance. However, limited text-3D data restricts the vocabulary scale and text control of generations. Generators may easily fall into a stereotype concept for certain text prompts, thus losing open-vocabulary generation ability. To tackle this issue, we introduce a conditional 3D generative model, namely TextField3D. Specifically, rat… ▽ More

    Submitted 14 March, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: Accepted by ICLR 2024

  16. arXiv:2309.09774  [pdf, other

    cs.LG cs.CV

    Towards Self-Adaptive Pseudo-Label Filtering for Semi-Supervised Learning

    Authors: Lei Zhu, Zhanghan Ke, Rynson Lau

    Abstract: Recent semi-supervised learning (SSL) methods typically include a filtering strategy to improve the quality of pseudo labels. However, these filtering strategies are usually hand-crafted and do not change as the model is updated, resulting in a lot of correct pseudo labels being discarded and incorrect pseudo labels being selected during the training process. In this work, we observe that the dist… ▽ More

    Submitted 18 September, 2023; originally announced September 2023.

    Comments: This paper was first submitted to NeurIPS 2021

  17. arXiv:2308.14575  [pdf, other

    cs.CV

    Referring Image Segmentation Using Text Supervision

    Authors: Fang Liu, Yuhao Liu, Yuqiu Kong, Ke Xu, Lihe Zhang, Baocai Yin, Gerhard Hancke, Rynson Lau

    Abstract: Existing Referring Image Segmentation (RIS) methods typically require expensive pixel-level or box-level annotations for supervision. In this paper, we observe that the referring texts used in RIS already provide sufficient information to localize the target object. Hence, we propose a novel weakly-supervised RIS framework to formulate the target localization problem as a classification process to… ▽ More

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  18. arXiv:2308.03059  [pdf, other

    cs.CV cs.AI cs.GR

    Language-based Photo Color Adjustment for Graphic Designs

    Authors: Zhenwei Wang, Nanxuan Zhao, Gerhard Hancke, Rynson W. H. Lau

    Abstract: Adjusting the photo color to associate with some design elements is an essential way for a graphic design to effectively deliver its message and make it aesthetically pleasing. However, existing tools and previous works face a dilemma between the ease of use and level of expressiveness. To this end, we introduce an interactive language-based approach for photo recoloring, which provides an intuiti… ▽ More

    Submitted 6 August, 2023; originally announced August 2023.

    Comments: 15 pages, 19 figures. Accepted by SIGGRAPH 2023. Project page: https://zhenwwang.github.io/langrecol

  19. arXiv:2307.10664  [pdf, other

    cs.CV cs.GR

    Lighting up NeRF via Unsupervised Decomposition and Enhancement

    Authors: Haoyuan Wang, Xiaogang Xu, Ke Xu, Rynson WH. Lau

    Abstract: Neural Radiance Field (NeRF) is a promising approach for synthesizing novel views, given a set of images and the corresponding camera poses of a scene. However, images photographed from a low-light scene can hardly be used to train a NeRF model to produce high-quality results, due to their low pixel intensities, heavy noise, and color distortion. Combining existing low-light image enhancement meth… ▽ More

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: ICCV 2023. Project website: https://whyy.site/paper/llnerf

  20. arXiv:2303.13511  [pdf, other

    cs.CV cs.AI cs.LG

    Neural Preset for Color Style Transfer

    Authors: Zhanghan Ke, Yuhao Liu, Lei Zhu, Nanxuan Zhao, Rynson W. H. Lau

    Abstract: In this paper, we present a Neural Preset technique to address the limitations of existing color style transfer methods, including visual artifacts, vast memory requirement, and slow style switching speed. Our method is based on two core designs. First, we propose Deterministic Neural Color Mapping (DNCM) to consistently operate on each pixel via an image-adaptive color mapping matrix, avoiding ar… ▽ More

    Submitted 24 March, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

    Comments: Project page with demos: https://zhkkke.github.io/NeuralPreset . Artifact-free real-time 4K color style transfer via AI-generated presets. CVPR 2023

  21. arXiv:2303.08810  [pdf, other

    cs.CV

    BiFormer: Vision Transformer with Bi-Level Routing Attention

    Authors: Lei Zhu, Xinjiang Wang, Zhanghan Ke, Wayne Zhang, Rynson Lau

    Abstract: As the core building block of vision transformers, attention is a powerful tool to capture long-range dependency. However, such power comes at a cost: it incurs a huge computation burden and heavy memory footprint as pairwise token interaction across all spatial locations is computed. A series of works attempt to alleviate this problem by introducing handcrafted and content-agnostic sparsity into… ▽ More

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: CVPR 2023 camera-ready

  22. arXiv:2301.03182  [pdf, other

    cs.CV

    Structure-Informed Shadow Removal Networks

    Authors: Yuhao Liu, Qing Guo, Lan Fu, Zhanghan Ke, Ke Xu, Wei Feng, Ivor W. Tsang, Rynson W. H. Lau

    Abstract: Existing deep learning-based shadow removal methods still produce images with shadow remnants. These shadow remnants typically exist in homogeneous regions with low-intensity values, making them untraceable in the existing image-to-image mapping paradigm. We observe that shadows mainly degrade images at the image-structure level (in which humans perceive object shapes and continuous colors). Hence… ▽ More

    Submitted 1 February, 2024; v1 submitted 9 January, 2023; originally announced January 2023.

    Comments: IEEE TIP

  23. arXiv:2211.15644  [pdf, other

    cs.CV

    Efficient Mirror Detection via Multi-level Heterogeneous Learning

    Authors: Ruozhen He, Jiaying Lin, Rynson W. H. Lau

    Abstract: We present HetNet (Multi-level Heterogeneous Network), a highly efficient mirror detection network. Current mirror detection methods focus more on performance than efficiency, limiting the real-time applications (such as drones). Their lack of efficiency is caused by the common design of adopting homogeneous modules at different levels, which ignores the difference between diffe… ▽ More

    Submitted 28 November, 2022; originally announced November 2022.

    Comments: Accepted to AAAI 2023. The code is available at https://github.com/Catherine-R-He/HetNet

  24. arXiv:2210.01055  [pdf, other

    cs.CV

    CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-training

    Authors: Tianyu Huang, Bowen Dong, Yunhan Yang, Xiaoshui Huang, Rynson W. H. Lau, Wanli Ouyang, Wangmeng Zuo

    Abstract: Pre-training across 3D vision and language remains under development because of limited training data. Recent works attempt to transfer vision-language pre-training models to 3D vision. PointCLIP converts point cloud data to multi-view depth maps, adopting CLIP for shape classification. However, its performance is restricted by the domain gap between rendered depth maps and images, as well as the… ▽ More

    Submitted 22 August, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: Accepted by ICCV2023

  25. Large-Field Contextual Feature Learning for Glass Detection

    Authors: Haiyang Mei, Xin Yang, Letian Yu, Qiang Zhang, Xiaopeng Wei, Rynson W. H. Lau

    Abstract: Glass is very common in our daily life. Existing computer vision systems neglect it, which may have severe consequences, e.g., a robot may crash into a glass wall. However, sensing the presence of glass is not straightforward. The key challenge is that arbitrary objects/scenes can appear behind the glass. In this paper, we propose an important problem of detecting glass surfaces from a single RG… ▽ More

    Submitted 10 September, 2022; originally announced September 2022.

  26. Rain Removal from Light Field Images with 4D Convolution and Multi-scale Gaussian Process

    Authors: Tao Yan, Mingyue Li, Bin Li, Yang Yang, Rynson W. H. Lau

    Abstract: Existing deraining methods focus mainly on a single input image. However, with just a single input image, it is extremely difficult to accurately detect and remove rain streaks, in order to restore a rain-free image. In contrast, a light field image (LFI) embeds abundant 3D structure and texture information of the target scene by recording the direction and position of each incident ray via a plen… ▽ More

    Submitted 27 January, 2023; v1 submitted 16 August, 2022; originally announced August 2022.

    Comments: This paper has been published in IEEE Transactions on Image Processing

    Journal ref: IEEE Transactions on Image Processing (2023), v32, pages 921-936

  27. arXiv:2207.14083  [pdf, other

    cs.CV

    Weakly-Supervised Camouflaged Object Detection with Scribble Annotations

    Authors: Ruozhen He, Qihua Dong, Jiaying Lin, Rynson W. H. Lau

    Abstract: Existing camouflaged object detection (COD) methods rely heavily on large-scale datasets with pixel-wise annotations. However, due to the ambiguous boundary, annotating camouflaged objects pixel-wise is very time-consuming and labor-intensive, taking ~60 minutes to label one image. In this paper, we propose the first weakly-supervised COD method, using scribble annotations as supervision. To achieve… ▽ More

    Submitted 28 November, 2022; v1 submitted 28 July, 2022; originally announced July 2022.

    Comments: Accepted to AAAI 2023. The code and dataset are available at https://github.com/dddraxxx/Weakly-Supervised-Camouflaged-Object-Detection-with-Scribble-Annotations

  28. arXiv:2207.06332  [pdf, other

    cs.CV

    Symmetry-Aware Transformer-based Mirror Detection

    Authors: Tianyu Huang, Bowen Dong, Jiaying Lin, Xiaohui Liu, Rynson W. H. Lau, Wangmeng Zuo

    Abstract: Mirror detection aims to identify the mirror regions in the given input image. Existing works mainly focus on integrating the semantic features and structural features to mine specific relations between mirror and non-mirror regions, or introducing mirror properties like depth or chirality to help analyze the existence of mirrors. In this work, we observe that a real object typically forms a loose… ▽ More

    Submitted 4 September, 2022; v1 submitted 13 July, 2022; originally announced July 2022.

  29. arXiv:2207.01322  [pdf, other

    cs.CV

    Harmonizer: Learning to Perform White-Box Image and Video Harmonization

    Authors: Zhanghan Ke, Chunyi Sun, Lei Zhu, Ke Xu, Rynson W. H. Lau

    Abstract: Recent works on image harmonization solve the problem as a pixel-wise image translation task via large autoencoders. They have unsatisfactory performances and slow inference speeds when dealing with high-resolution images. In this work, we observe that adjusting the input arguments of basic image filters, e.g., brightness and contrast, is sufficient for humans to produce realistic images from the… ▽ More

    Submitted 20 July, 2022; v1 submitted 4 July, 2022; originally announced July 2022.

  30. arXiv:2206.11250  [pdf, other

    cs.CV

    Depth-aware Glass Surface Detection with Cross-modal Context Mining

    Authors: Jiaying Lin, Yuen Hei Yeung, Rynson W. H. Lau

    Abstract: Glass surfaces are becoming increasingly ubiquitous as modern buildings tend to use a lot of glass panels. This however poses substantial challenges on the operations of autonomous systems such as robots, self-driving cars and drones, as the glass panels can become transparent obstacles to the navigation. Existing works attempt to exploit various cues, including glass boundary context or reflection… ▽ More

    Submitted 22 June, 2022; originally announced June 2022.

  31. arXiv:2203.17257  [pdf, other

    cs.CV

    Rethinking Video Salient Object Ranking

    Authors: Jiaying Lin, Huankang Guan, Rynson W. H. Lau

    Abstract: Salient Object Ranking (SOR) involves ranking the degree of saliency of multiple salient objects in an input image. Most recently, a method is proposed for ranking salient objects in an input video based on a predicted fixation map. It relies solely on the density of the fixations within the salient objects to infer their saliency ranks, which is incompatible with human perception of saliency rank… ▽ More

    Submitted 31 March, 2022; originally announced March 2022.

  32. arXiv:2203.09416  [pdf, other

    cs.CV

    Bi-directional Object-context Prioritization Learning for Saliency Ranking

    Authors: Xin Tian, Ke Xu, Xin Yang, Lin Du, Baocai Yin, Rynson W. H. Lau

    Abstract: The saliency ranking task is recently proposed to study the visual behavior that humans would typically shift their attention over different objects of a scene based on their degrees of saliency. Existing approaches focus on learning either object-object or object-scene relations. Such a strategy follows the idea of object-based attention in Psychology, but it tends to favor those objects with str… ▽ More

    Submitted 22 March, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR 2022

  33. arXiv:2112.02082  [pdf, other

    cs.CV

    Geometry-aware Two-scale PIFu Representation for Human Reconstruction

    Authors: Zheng Dong, Ke Xu, Ziheng Duan, Hujun Bao, Weiwei Xu, Rynson W. H. Lau

    Abstract: Although PIFu-based 3D human reconstruction methods are popular, the quality of recovered details is still unsatisfactory. In a sparse (e.g., 3 RGBD sensors) capture setting, the depth noise is typically amplified in the PIFu representation, resulting in flat facial surfaces and geometry-fallible bodies. In this paper, we propose a novel geometry-aware two-scale PIFu for 3D human reconstruction fr… ▽ More

    Submitted 27 September, 2022; v1 submitted 3 December, 2021; originally announced December 2021.

    Comments: Accepted by NeurIPS 2022. 20 pages, 20 figures

  34. arXiv:2111.10137  [pdf, other

    cs.CV

    Learning to Detect Instance-level Salient Objects Using Complementary Image Labels

    Authors: Xin Tian, Ke Xu, Xin Yang, Baocai Yin, Rynson W. H. Lau

    Abstract: Existing salient instance detection (SID) methods typically learn from pixel-level annotated datasets. In this paper, we present the first weakly-supervised approach to the SID problem. Although weak supervision has been considered in general saliency detection, it is mainly based on using class labels for object localization. However, it is non-trivial to use only class labels to learn instance-a… ▽ More

    Submitted 19 November, 2021; originally announced November 2021.

    Comments: To appear in IJCV. arXiv admin note: text overlap with arXiv:2009.13898

  35. arXiv:2109.11818  [pdf, other

    cs.CV

    MODNet-V: Improving Portrait Video Matting via Background Restoration

    Authors: Jiayu Sun, Zhanghan Ke, Lihe Zhang, Huchuan Lu, Rynson W. H. Lau

    Abstract: To address the challenging portrait video matting problem more precisely, existing works typically apply some matting priors that require additional user efforts to obtain, such as annotated trimaps or background images. In this work, we observe that instead of asking the user to explicitly provide a background image, we may recover it from the input video itself. To this end, we first propose a n… ▽ More

    Submitted 24 September, 2021; originally announced September 2021.

  36. arXiv:2107.04688  [pdf, other

    cs.CV

    Scaled-Time-Attention Robust Edge Network

    Authors: Richard Lau, Lihan Yao, Todd Huster, William Johnson, Stephen Arleth, Justin Wong, Devin Ridge, Michael Fletcher, William C. Headley

    Abstract: This paper describes a systematic approach towards building a new family of neural networks based on a delay-loop version of a reservoir neural network. The resulting architecture, called Scaled-Time-Attention Robust Edge (STARE) network, exploits hyper dimensional space and non-multiply-and-add computation to achieve a simpler architecture, which has shallow layers, is simple to train, and is bet… ▽ More

    Submitted 9 July, 2021; originally announced July 2021.

    Comments: 20 pages, 22 figures, 9 tables, DARPA Distribution Statement A. Approved for public release. Distribution Unlimited

    MSC Class: 68T05

  37. Smart Scribbles for Image Matting

    Authors: Xin Yang, Yu Qiao, Shaozhe Chen, Shengfeng He, Baocai Yin, Qiang Zhang, Xiaopeng Wei, Rynson W. H. Lau

    Abstract: Image matting is an ill-posed problem that usually requires additional user input, such as trimaps or scribbles. Drawing a fine trimap requires a large amount of user effort, while using scribbles can hardly obtain satisfactory alpha mattes for non-professional users. Some recent deep learning-based matting networks rely on large-scale composite datasets for training to improve performance, resulti… ▽ More

    Submitted 31 March, 2021; originally announced March 2021.

    Comments: ACM Trans. Multimedia Comput. Commun. Appl

  38. arXiv:2101.11111  [pdf, other

    cs.CV

    Automatic Comic Generation with Stylistic Multi-page Layouts and Emotion-driven Text Balloon Generation

    Authors: Xin Yang, Zongliang Ma, Letian Yu, Ying Cao, Baocai Yin, Xiaopeng Wei, Qiang Zhang, Rynson W. H. Lau

    Abstract: In this paper, we propose a fully automatic system for generating comic books from videos without any human intervention. Given an input video along with its subtitles, our approach first extracts informative keyframes by analyzing the subtitles, and stylizes keyframes into comic-style images. Then, we propose a novel automatic multi-page layout framework, which can allocate the images across mult… ▽ More

    Submitted 26 January, 2021; originally announced January 2021.

  39. arXiv:2101.00932  [pdf, other

    cs.CV

    Weakly-Supervised Saliency Detection via Salient Object Subitizing

    Authors: Xiaoyang Zheng, Xin Tan, Jie Zhou, Lizhuang Ma, Rynson W. H. Lau

    Abstract: Salient object detection aims at detecting the most visually distinct objects and producing the corresponding masks. As the cost of pixel-level annotations is high, image tags are usually used as weak supervisions. However, an image tag can only be used to annotate one class of objects. In this paper, we introduce saliency subitizing as the weak supervision since it is class-agnostic. This allows… ▽ More

    Submitted 4 January, 2021; originally announced January 2021.

    Comments: This paper is accepted to IEEE Trans. on Circuits and Systems for Video Technology (TCSVT)

  40. arXiv:2012.07131  [pdf, other

    cs.CV

    Location-aware Single Image Reflection Removal

    Authors: Zheng Dong, Ke Xu, Yin Yang, Hujun Bao, Weiwei Xu, Rynson W. H. Lau

    Abstract: This paper proposes a novel location-aware deep-learning-based single image reflection removal method. Our network has a reflection detection module to regress a probabilistic reflection confidence map, taking multi-scale Laplacian features as inputs. This probabilistic map tells if a region is reflection-dominated or transmission-dominated, and it is used as a cue for the network to control the f… ▽ More

    Submitted 19 August, 2021; v1 submitted 13 December, 2020; originally announced December 2020.

    Comments: 10 pages, 10 figures, 3 tables

  41. arXiv:2011.11961  [pdf, other

    cs.CV

    MODNet: Real-Time Trimap-Free Portrait Matting via Objective Decomposition

    Authors: Zhanghan Ke, Jiayu Sun, Kaican Li, Qiong Yan, Rynson W. H. Lau

    Abstract: Existing portrait matting methods either require auxiliary inputs that are costly to obtain or involve multiple stages that are computationally expensive, making them less suitable for real-time applications. In this work, we present a light-weight matting objective decomposition network (MODNet) for portrait matting in real-time with a single input image. The key idea behind our efficient design… ▽ More

    Submitted 18 March, 2022; v1 submitted 24 November, 2020; originally announced November 2020.

  42. arXiv:2009.13898  [pdf, other

    cs.CV

    Weakly-supervised Salient Instance Detection

    Authors: Xin Tian, Ke Xu, Xin Yang, Baocai Yin, Rynson W. H. Lau

    Abstract: Existing salient instance detection (SID) methods typically learn from pixel-level annotated datasets. In this paper, we present the first weakly-supervised approach to the SID problem. Although weak supervision has been considered in general saliency detection, it is mainly based on using class labels for object localization. However, it is non-trivial to use only class labels to learn instance-a… ▽ More

    Submitted 29 September, 2020; originally announced September 2020.

    Comments: BMVC 2020, best student paper runner-up

  43. arXiv:2008.05258  [pdf, other]

    cs.CV cs.LG eess.IV

    Guided Collaborative Training for Pixel-wise Semi-Supervised Learning

    Authors: Zhanghan Ke, Di Qiu, Kaican Li, Qiong Yan, Rynson W. H. Lau

    Abstract: We investigate the generalization of semi-supervised learning (SSL) to diverse pixel-wise tasks. Although SSL methods have achieved impressive results in image classification, the performances of applying them to pixel-wise tasks are unsatisfactory due to their need for dense outputs. In addition, existing pixel-wise SSL approaches are only suitable for certain tasks as they usually require to use…

    Submitted 12 August, 2020; originally announced August 2020.

    Comments: 16th European Conference on Computer Vision (ECCV 2020)

  44. HDR-GAN: HDR Image Reconstruction from Multi-Exposed LDR Images with Large Motions

    Authors: Yuzhen Niu, Jianbin Wu, Wenxi Liu, Wenzhong Guo, Rynson W. H. Lau

    Abstract: Synthesizing high dynamic range (HDR) images from multiple low-dynamic range (LDR) exposures in dynamic scenes is challenging. There are two major problems caused by the large motions of foreground objects. One is the severe misalignment among the LDR images. The other is the missing content due to the over-/under-saturated regions caused by the moving objects, which may not be easily compensated…

    Submitted 3 July, 2020; originally announced July 2020.

  45. arXiv:2006.06606  [pdf, other]

    cs.CV

    What makes instance discrimination good for transfer learning?

    Authors: Nanxuan Zhao, Zhirong Wu, Rynson W. H. Lau, Stephen Lin

    Abstract: Contrastive visual pretraining based on the instance discrimination pretext task has made significant progress. Notably, recent work on unsupervised pretraining has shown to surpass the supervised counterpart for finetuning downstream applications such as object detection and segmentation. It comes as a surprise that image annotations would be better left unused for transfer learning. In this work…

    Submitted 19 January, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: Accepted by ICLR 2021

  46. arXiv:2004.06638  [pdf, other]

    cs.CV

    Distilling Localization for Self-Supervised Representation Learning

    Authors: Nanxuan Zhao, Zhirong Wu, Rynson W. H. Lau, Stephen Lin

    Abstract: Recent progress in contrastive learning has revolutionized unsupervised representation learning. Concretely, multiple views (augmentations) from the same image are encouraged to map to the similar embeddings, while views from different images are pulled apart. In this paper, through visualizing and diagnosing classification errors, we observe that current contrastive models are ineffective at loca…

    Submitted 19 January, 2021; v1 submitted 14 April, 2020; originally announced April 2020.

    Comments: Accepted by AAAI 2021

  47. arXiv:2003.13623  [pdf, other]

    cs.CV cs.LG

    Laplacian Denoising Autoencoder

    Authors: Jianbo Jiao, Linchao Bao, Yunchao Wei, Shengfeng He, Honghui Shi, Rynson Lau, Thomas S. Huang

    Abstract: While deep neural networks have been shown to perform remarkably well in many machine learning tasks, labeling a large amount of ground truth data for supervised training is usually very costly to scale. Therefore, learning robust representations with unlabeled data is critical in relieving human effort and vital for many downstream tasks. Recent advances in unsupervised and self-supervised learni…

    Submitted 30 March, 2020; originally announced March 2020.

  48. Night-time Scene Parsing with a Large Real Dataset

    Authors: Xin Tan, Ke Xu, Ying Cao, Yiheng Zhang, Lizhuang Ma, Rynson W. H. Lau

    Abstract: Although huge progress has been made on scene analysis in recent years, most existing works assume the input images to be in day-time with good lighting conditions. In this work, we aim to address the night-time scene parsing (NTSP) problem, which has two main challenges: 1) labeled night-time data are scarce, and 2) over- and under-exposures may co-occur in the input night-time images and are not…

    Submitted 1 April, 2022; v1 submitted 15 March, 2020; originally announced March 2020.

    Comments: 13 pages, 11 figures. This paper is accepted by IEEE Transactions on Image Processing. The dataset can be accessed via https://dmcv.sjtu.edu.cn/people/phd/tanxin/NightCity/index.html

  49. arXiv:1909.01804  [pdf, ps, other]

    cs.LG cs.CV stat.ML

    Dual Student: Breaking the Limits of the Teacher in Semi-supervised Learning

    Authors: Zhanghan Ke, Daoye Wang, Qiong Yan, Jimmy Ren, Rynson W. H. Lau

    Abstract: Recently, consistency-based methods have achieved state-of-the-art results in semi-supervised learning (SSL). These methods always involve two roles, an explicit or implicit teacher model and a student model, and penalize predictions under different perturbations by a consistency constraint. However, the weights of these two roles are tightly coupled since the teacher is essentially an exponential…

    Submitted 3 September, 2019; originally announced September 2019.

    Comments: International Conference on Computer Vision 2019 (ICCV 2019)

  50. arXiv:1908.09101  [pdf, other]

    cs.CV

    Where Is My Mirror?

    Authors: Xin Yang, Haiyang Mei, Ke Xu, Xiaopeng Wei, Baocai Yin, Rynson W. H. Lau

    Abstract: Mirrors are everywhere in our daily lives. Existing computer vision systems do not consider mirrors, and hence may get confused by the reflected content inside a mirror, resulting in a severe performance degradation. However, separating the real content outside a mirror from the reflected content inside it is non-trivial. The key challenge is that mirrors typically reflect contents similar to thei…

    Submitted 3 October, 2019; v1 submitted 24 August, 2019; originally announced August 2019.

    Comments: Accepted by ICCV 2019. Project homepage: https://mhaiyang.github.io/ICCV2019_MirrorNet/index.html