Showing 1–50 of 60 results for author: Wong, K K

  1. arXiv:2406.04253  [pdf, other]

    cs.CV

    A Survey on 3D Human Avatar Modeling -- From Reconstruction to Generation

    Authors: Ruihe Wang, Yukang Cao, Kai Han, Kwan-Yee K. Wong

    Abstract: 3D modeling has long been an important area in computer vision and computer graphics. Recently, thanks to the breakthroughs in neural representations and generative models, we witnessed a rapid development of 3D modeling. 3D human modeling, lying at the core of many real-world applications, such as gaming and animation, has attracted significant attention. Over the past few years, a large body of…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 30 pages, 21 figures

  2. arXiv:2403.07860  [pdf, other]

    cs.CV

    Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation

    Authors: Shihao Zhao, Shaozhe Hao, Bojia Zi, Huaizhe Xu, Kwan-Yee K. Wong

    Abstract: Text-to-image generation has made significant advancements with the introduction of text-to-image diffusion models. These models typically consist of a language model that interprets user prompts and a vision model that generates corresponding images. As language and vision models continue to progress in their respective domains, there is a great potential in exploring the replacement of component…

    Submitted 12 March, 2024; originally announced March 2024.

  3. arXiv:2403.01852  [pdf, other]

    cs.CV

    PLACE: Adaptive Layout-Semantic Fusion for Semantic Image Synthesis

    Authors: Zhengyao Lv, Yuxiang Wei, Wangmeng Zuo, Kwan-Yee K. Wong

    Abstract: Recent advancements in large-scale pre-trained text-to-image models have led to remarkable progress in semantic image synthesis. Nevertheless, synthesizing high-quality images with consistent semantics and layout remains a challenge. In this paper, we propose the adaPtive LAyout-semantiC fusion modulE (PLACE) that harnesses pre-trained models to alleviate the aforementioned issues. Specifically, w…

    Submitted 4 March, 2024; originally announced March 2024.

  4. arXiv:2402.17502  [pdf, other]

    cs.CV eess.IV

    FedLPPA: Learning Personalized Prompt and Aggregation for Federated Weakly-supervised Medical Image Segmentation

    Authors: Li Lin, Yixiang Liu, Jiewei Wu, Pujin Cheng, Zhiyuan Cai, Kenneth K. Y. Wong, Xiaoying Tang

    Abstract: Federated learning (FL) effectively mitigates the data silo challenge brought about by policies and privacy concerns, implicitly harnessing more data for deep model training. However, traditional centralized FL models grapple with diverse multi-center data, especially in the face of significant data heterogeneity, notably in medical contexts. In the realm of medical image segmentation, the growing…

    Submitted 31 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: 12 pages, 10 figures

  5. arXiv:2401.14074  [pdf, other]

    cs.CV cs.LG

    ProCNS: Progressive Prototype Calibration and Noise Suppression for Weakly-Supervised Medical Image Segmentation

    Authors: Y. Liu, L. Lin, K. K. Y. Wong, X. Tang

    Abstract: Weakly-supervised segmentation (WSS) has emerged as a solution to mitigate the conflict between annotation cost and model performance by adopting sparse annotation formats (e.g., point, scribble, block, etc.). Typical approaches attempt to exploit anatomy and topology priors to directly expand sparse annotations into pseudo-labels. However, due to a lack of attention to the ambiguous edges in medi…

    Submitted 5 March, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

  6. arXiv:2401.07314  [pdf, other]

    cs.AI cs.CV cs.RO

    MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation

    Authors: Jiaqi Chen, Bingqian Lin, Ran Xu, Zhenhua Chai, Xiaodan Liang, Kwan-Yee K. Wong

    Abstract: Embodied agents equipped with GPT as their brains have exhibited extraordinary decision-making and generalization abilities across various tasks. However, existing zero-shot agents for vision-and-language navigation (VLN) only prompt GPT-4 to select potential locations within localized environments, without constructing an effective "global-view" for the agent to understand the overall environment…

    Submitted 20 June, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

    Comments: LLM/VLM-based VLN Agents. Accepted to ACL 2024. Project: https://chen-judge.github.io/MapGPT/

  7. arXiv:2311.13535  [pdf, other]

    cs.CV

    DiffusionMat: Alpha Matting as Sequential Refinement Learning

    Authors: Yangyang Xu, Shengfeng He, Wenqi Shao, Kwan-Yee K. Wong, Yu Qiao, Ping Luo

    Abstract: In this paper, we introduce DiffusionMat, a novel image matting framework that employs a diffusion model for the transition from coarse to refined alpha mattes. Diverging from conventional methods that utilize trimaps merely as loose guidance for alpha matte prediction, our approach treats image matting as a sequential refinement learning process. This process begins with the addition of noise to…

    Submitted 22 November, 2023; originally announced November 2023.

  8. arXiv:2310.01412  [pdf, other]

    cs.CV cs.RO

    DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model

    Authors: Zhenhua Xu, Yujia Zhang, Enze Xie, Zhen Zhao, Yong Guo, Kwan-Yee. K. Wong, Zhenguo Li, Hengshuang Zhao

    Abstract: Multimodal large language models (MLLMs) have emerged as a prominent area of interest within the research community, given their proficiency in handling and reasoning with non-textual data, including images and videos. This study seeks to extend the application of MLLMs to the realm of autonomous driving by introducing DriveGPT4, a novel interpretable end-to-end autonomous driving system based on…

    Submitted 14 March, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: The project page is available at https://tonyxuqaq.github.io/projects/DriveGPT4/

  9. arXiv:2308.09705  [pdf, other]

    cs.CV

    Guide3D: Create 3D Avatars from Text and Image Guidance

    Authors: Yukang Cao, Yan-Pei Cao, Kai Han, Ying Shan, Kwan-Yee K. Wong

    Abstract: Recently, text-to-image generation has exhibited remarkable advancements, with the ability to produce visually impressive results. In contrast, text-to-3D generation has not yet reached a comparable level of quality. Existing methods primarily rely on text-guided score distillation sampling (SDS), and they encounter difficulties in transferring 2D attributes of the generated images to 3D content.…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: 25 pages, 22 figures

  10. arXiv:2308.08543  [pdf, other]

    cs.CV cs.RO

    InsMapper: Exploring Inner-instance Information for Vectorized HD Mapping

    Authors: Zhenhua Xu, Kwan-Yee. K. Wong, Hengshuang Zhao

    Abstract: Vectorized high-definition (HD) maps contain detailed information about surrounding road elements, which are crucial for various downstream tasks in modern autonomous vehicles, such as motion planning and vehicle control. Recent works attempt to directly detect the vectorized HD map as a point set prediction task, achieving notable detection performance improvements. However, these methods usually…

    Submitted 8 March, 2024; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: Code and demo will be available at https://tonyxuqaq.github.io/InsMapper/

  11. arXiv:2308.06097  [pdf, other]

    cs.CV

    RIGID: Recurrent GAN Inversion and Editing of Real Face Videos

    Authors: Yangyang Xu, Shengfeng He, Kwan-Yee K. Wong, Ping Luo

    Abstract: GAN inversion is indispensable for applying the powerful editability of GAN to real images. However, existing methods invert video frames individually often leading to undesired inconsistent results over time. In this paper, we propose a unified recurrent framework, named Recurrent vIdeo GAN Inversion and eDiting (RIGID), to explicitly and simultaneousl…

    Submitted 15 August, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

    Comments: ICCV2023

  12. VideoPro: A Visual Analytics Approach for Interactive Video Programming

    Authors: Jianben He, Xingbo Wang, Kam Kwai Wong, Xijie Huang, Changjian Chen, Zixin Chen, Fengjie Wang, Min Zhu, Huamin Qu

    Abstract: Constructing supervised machine learning models for real-world video analysis requires substantial labeled data, which is costly to acquire due to scarce domain expertise and laborious manual inspection. While data programming shows promise in generating labeled data at scale with user-defined labeling functions, the high dimensional and complex temporal information in videos poses additional chall…

    Submitted 1 August, 2023; originally announced August 2023.

    Comments: 11 pages, 7 figures

  13. arXiv:2307.14227  [pdf, other]

    cs.CV

    Computational Approaches for Traditional Chinese Painting: From the "Six Principles of Painting" Perspective

    Authors: Wei Zhang, Jian-Wei Zhang, Kam Kwai Wong, Yifang Wang, Yingchaojie Feng, Luwei Wang, Wei Chen

    Abstract: Traditional Chinese Painting (TCP) is an invaluable cultural heritage resource and a unique visual art style. In recent years, increasing interest has been placed on digitalizing TCPs to preserve and revive the culture. The resulting digital copies have enabled the advancement of computational methods for structured and systematic understanding of TCPs. To explore this topic, we conducted an in-de…

    Submitted 26 July, 2023; originally announced July 2023.

  14. arXiv:2307.10281  [pdf, other]

    cs.CV

    Semi-supervised Cycle-GAN for face photo-sketch translation in the wild

    Authors: Chaofeng Chen, Wei Liu, Xiao Tan, Kwan-Yee K. Wong

    Abstract: The performance of face photo-sketch translation has improved a lot thanks to deep neural networks. GAN based methods trained on paired images can produce high-quality results under laboratory settings. Such paired datasets are, however, often very small and lack diversity. Meanwhile, Cycle-GANs trained with unpaired photo-sketch datasets suffer from the steganography phenomenon, which make…

    Submitted 18 July, 2023; originally announced July 2023.

    Comments: 11 pages, 11 figures, 5 tables (+ 7 page appendix)

  15. PromptMagician: Interactive Prompt Engineering for Text-to-Image Creation

    Authors: Yingchaojie Feng, Xingbo Wang, Kam Kwai Wong, Sijia Wang, Yuhong Lu, Minfeng Zhu, Baicheng Wang, Wei Chen

    Abstract: Generative text-to-image models have gained great popularity among the public for their powerful capability to generate high-quality images based on natural language prompts. However, developing effective prompts for desired images can be challenging due to the complexity and ambiguity of natural language. This research proposes PromptMagician, a visual analysis system that helps users explore the…

    Submitted 15 August, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

    Comments: Accepted full paper for IEEE VIS 2023

  16. arXiv:2306.03038  [pdf, other]

    cs.CV

    HeadSculpt: Crafting 3D Head Avatars with Text

    Authors: Xiao Han, Yukang Cao, Kai Han, Xiatian Zhu, Jiankang Deng, Yi-Zhe Song, Tao Xiang, Kwan-Yee K. Wong

    Abstract: Recently, text-guided 3D generative methods have made remarkable advancements in producing high-quality textures and geometry, capitalizing on the proliferation of large vision-language and image diffusion models. However, existing methods still struggle to create high-fidelity 3D head avatars in two aspects: (1) They rely mostly on a pre-trained text-to-image diffusion model whilst missing the ne…

    Submitted 29 August, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: Webpage: https://brandonhan.uk/HeadSculpt/

  17. arXiv:2306.00971  [pdf, other]

    cs.CV cs.AI

    ViCo: Plug-and-play Visual Condition for Personalized Text-to-image Generation

    Authors: Shaozhe Hao, Kai Han, Shihao Zhao, Kwan-Yee K. Wong

    Abstract: Personalized text-to-image generation using diffusion models has recently emerged and garnered significant interest. This task learns a novel concept (e.g., a unique toy), illustrated in a handful of images, into a generative model that captures fine visual details and generates photorealistic images based on textual embeddings. In this paper, we present ViCo, a novel lightweight plug-and-play met…

    Submitted 7 December, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Under review

  18. arXiv:2305.16322  [pdf, other]

    cs.CV cs.GR

    Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models

    Authors: Shihao Zhao, Dongdong Chen, Yen-Chun Chen, Jianmin Bao, Shaozhe Hao, Lu Yuan, Kwan-Yee K. Wong

    Abstract: Text-to-Image diffusion models have made tremendous progress over the past two years, enabling the generation of highly realistic images based on open-domain text descriptions. However, despite their success, text descriptions often struggle to adequately convey detailed controls, even when composed of long and complex texts. Moreover, recent studies have also shown that these models face challeng…

    Submitted 29 October, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Camera Ready, Code is available at https://github.com/ShihaoZhaoZSH/Uni-ControlNet

  19. arXiv:2304.06928  [pdf, other]

    cs.CV cs.AI

    CiPR: An Efficient Framework with Cross-instance Positive Relations for Generalized Category Discovery

    Authors: Shaozhe Hao, Kai Han, Kwan-Yee K. Wong

    Abstract: We tackle the issue of generalized category discovery (GCD). GCD considers the open-world problem of automatically clustering a partially labelled dataset, in which the unlabelled data may contain instances from both novel categories and labelled classes. In this paper, we address the GCD problem with an unknown category number for the unlabelled data. We propose a framework, named CiPR, to bootst…

    Submitted 24 March, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

    Comments: Accepted to TMLR. Code: https://github.com/haoosz/CiPR

  20. arXiv:2304.05635  [pdf, other]

    eess.IV cs.CV

    Unifying and Personalizing Weakly-supervised Federated Medical Image Segmentation via Adaptive Representation and Aggregation

    Authors: Li Lin, Jiewei Wu, Yixiang Liu, Kenneth K. Y. Wong, Xiaoying Tang

    Abstract: Federated learning (FL) enables multiple sites to collaboratively train powerful deep models without compromising data privacy and security. The statistical heterogeneity (e.g., non-IID data and domain shifts) is a primary obstacle in FL, impairing the generalization performance of the global model. Weakly supervised segmentation, which uses sparsely-grained (i.e., point-, bounding box-, scribble-…

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: 13 pages, 7 figures

  21. arXiv:2304.05011  [pdf, other]

    cs.HC cs.CL

    Towards an Understanding and Explanation for Mixed-Initiative Artificial Scientific Text Detection

    Authors: Luoxuan Weng, Minfeng Zhu, Kam Kwai Wong, Shi Liu, Jiashun Sun, Hang Zhu, Dongming Han, Wei Chen

    Abstract: Large language models (LLMs) have gained popularity in various fields for their exceptional capability of generating human-like text. Their potential misuse has raised social concerns about plagiarism in academic contexts. However, effective artificial scientific text detection is a non-trivial task due to several challenges, including 1) the lack of a clear understanding of the differences betwee…

    Submitted 11 April, 2023; originally announced April 2023.

  22. arXiv:2304.00916  [pdf, other]

    cs.CV

    DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models

    Authors: Yukang Cao, Yan-Pei Cao, Kai Han, Ying Shan, Kwan-Yee K. Wong

    Abstract: We present DreamAvatar, a text-and-shape guided framework for generating high-quality 3D human avatars with controllable poses. While encouraging results have been reported by recent methods on text-guided 3D common object generation, generating high-quality human avatars remains an open challenge due to the complexity of the human body's shape, pose, and appearance. We propose DreamAvatar to tack…

    Submitted 30 November, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

    Comments: Project page: https://yukangcao.github.io/DreamAvatar/

  23. arXiv:2304.00359  [pdf, other]

    cs.CV

    SeSDF: Self-evolved Signed Distance Field for Implicit 3D Clothed Human Reconstruction

    Authors: Yukang Cao, Kai Han, Kwan-Yee K. Wong

    Abstract: We address the problem of clothed human reconstruction from a single image or uncalibrated multi-view images. Existing methods struggle with reconstructing detailed geometry of a clothed human and often require a calibrated setting for multi-view reconstruction. We propose a flexible framework which, by leveraging the parametric SMPL-X model, can take an arbitrary number of input images to reconst…

    Submitted 1 April, 2023; originally announced April 2023.

    Comments: 25 pages, 21 figures

  24. arXiv:2303.15111  [pdf, other]

    cs.CV cs.AI

    Learning Attention as Disentangler for Compositional Zero-shot Learning

    Authors: Shaozhe Hao, Kai Han, Kwan-Yee K. Wong

    Abstract: Compositional zero-shot learning (CZSL) aims at learning visual concepts (i.e., attributes and objects) from seen compositions and combining concept knowledge into unseen compositions. The key to CZSL is learning the disentanglement of the attribute-object composition. To this end, we propose to exploit cross-attentions as compositional disentanglers to learn disentangled concept embeddings. For e…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: CVPR 2023, available at https://haoosz.github.io/ade-czsl/

  25. arXiv:2302.09884  [pdf, other]

    cs.CV

    GlocalFuse-Depth: Fusing Transformers and CNNs for All-day Self-supervised Monocular Depth Estimation

    Authors: Zezheng Zhang, Ryan K. Y. Chan, Kenneth K. Y. Wong

    Abstract: In recent years, self-supervised monocular depth estimation has drawn much attention since it is free of depth annotations and has achieved remarkable results on standard benchmarks. However, most existing methods only focus on either daytime or nighttime images, thus their performance degrades on the other domain because of the large domain shift between daytime and nighttime images. To address this…

    Submitted 20 February, 2023; originally announced February 2023.

  26. Anchorage: Visual Analysis of Satisfaction in Customer Service Videos via Anchor Events

    Authors: Kam Kwai Wong, Xingbo Wang, Yong Wang, Jianben He, Rong Zhang, Huamin Qu

    Abstract: Delivering customer services through video communications has brought new opportunities to analyze customer satisfaction for quality management. However, due to the lack of reliable self-reported responses, service providers are troubled by the inadequate estimation of customer services and the tedious investigation into multimodal video recordings. We introduce Anchorage, a visual analytics syste…

    Submitted 13 February, 2023; originally announced February 2023.

    Comments: 13 pages. A preprint version of a publication at IEEE Transactions on Visualization and Computer Graphics (TVCG), 2023

  27. arXiv:2302.01966  [pdf, other]

    cs.HC

    Towards an Understanding of Distributed Asymmetric Collaborative Visualization on Problem-solving

    Authors: Wai Tong, Meng Xia, Kam Kwai Wong, Doug A. Bowman, Ting-Chuen Pong, Huamin Qu, Yalong Yang

    Abstract: This paper provided empirical knowledge of the user experience for using collaborative visualization in a distributed asymmetrical setting through controlled user studies. With the ability to access various computing devices, such as Virtual Reality (VR) head-mounted displays, scenarios emerge when collaborators have to or prefer to use different computing environments in different places. However…

    Submitted 3 February, 2023; originally announced February 2023.

    Comments: 11 pages, 12 figures, accepted at IEEE VR 2023

  28. XNLI: Explaining and Diagnosing NLI-based Visual Data Analysis

    Authors: Yingchaojie Feng, Xingbo Wang, Bo Pan, Kam Kwai Wong, Yi Ren, Shi Liu, Zihan Yan, Yuxin Ma, Huamin Qu, Wei Chen

    Abstract: Natural language interfaces (NLIs) enable users to flexibly specify analytical intentions in data visualization. However, diagnosing the visualization results without understanding the underlying generation process is challenging. Our research explores how to provide explanations for NLIs to help users locate the problems and further revise the queries. We present XNLI, an explainable NLI system f…

    Submitted 24 January, 2023; originally announced January 2023.

    Comments: 14 pages, 7 figures. A preprint version of a publication at IEEE Transactions on Visualization and Computer Graphics (TVCG), 2023

  29. arXiv:2212.05566  [pdf, other]

    cs.CV eess.IV

    YoloCurvSeg: You Only Label One Noisy Skeleton for Vessel-style Curvilinear Structure Segmentation

    Authors: Li Lin, Linkai Peng, Huaqing He, Pujin Cheng, Jiewei Wu, Kenneth K. Y. Wong, Xiaoying Tang

    Abstract: Weakly-supervised learning (WSL) has been proposed to alleviate the conflict between data annotation cost and model performance through employing sparsely-grained (i.e., point-, box-, scribble-wise) supervision and has shown promising performance, particularly in the image segmentation field. However, it is still a very challenging task due to the limited supervision, especially when only a small…

    Submitted 18 August, 2023; v1 submitted 11 December, 2022; originally announced December 2022.

    Comments: 20 pages, 15 figures, MEDIA accepted

  30. arXiv:2210.08936  [pdf, other]

    cs.CV cs.AI

    S$^3$-NeRF: Neural Reflectance Field from Shading and Shadow under a Single Viewpoint

    Authors: Wenqi Yang, Guanying Chen, Chaofeng Chen, Zhenfang Chen, Kwan-Yee K. Wong

    Abstract: In this paper, we address the "dual problem" of multi-view scene reconstruction in which we utilize single-view images captured under different point lights to learn a neural scene representation. Different from existing single-view methods which can only recover a 2.5D scene representation (i.e., a normal / depth map for the visible surface), our method learns a neural reflectance field to repres…

    Submitted 17 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022, Project page: https://ywq.github.io/s3nerf

  31. arXiv:2207.11406  [pdf, other]

    cs.CV cs.AI

    PS-NeRF: Neural Inverse Rendering for Multi-view Photometric Stereo

    Authors: Wenqi Yang, Guanying Chen, Chaofeng Chen, Zhenfang Chen, Kwan-Yee K. Wong

    Abstract: Traditional multi-view photometric stereo (MVPS) methods are often composed of multiple disjoint stages, resulting in noticeable accumulated errors. In this paper, we present a neural inverse rendering method for MVPS based on implicit representation. Given multi-view images of a non-Lambertian object illuminated by multiple unknown directional lights, our method jointly estimates the geometry, ma…

    Submitted 22 December, 2022; v1 submitted 22 July, 2022; originally announced July 2022.

    Comments: ECCV 2022, Project page: https://ywq.github.io/psnerf

  32. arXiv:2207.09353  [pdf, other]

    cs.IT cs.AI cs.LG

    Beyond Transmitting Bits: Context, Semantics, and Task-Oriented Communications

    Authors: Deniz Gunduz, Zhijin Qin, Inaki Estella Aguerri, Harpreet S. Dhillon, Zhaohui Yang, Aylin Yener, Kai Kit Wong, Chan-Byoung Chae

    Abstract: Communication systems to date primarily aim at reliably communicating bit sequences. Such an approach provides efficient engineering designs that are agnostic to the meanings of the messages or to the goal that the message exchange aims to achieve. Next generation systems, however, can be potentially enriched by folding message semantics and goals of communication into their design. Further, these…

    Submitted 3 October, 2022; v1 submitted 19 July, 2022; originally announced July 2022.

    Comments: 32 pages, 14 figures

  33. arXiv:2204.10549  [pdf, other]

    cs.CV

    JIFF: Jointly-aligned Implicit Face Function for High Quality Single View Clothed Human Reconstruction

    Authors: Yukang Cao, Guanying Chen, Kai Han, Wenqi Yang, Kwan-Yee K. Wong

    Abstract: This paper addresses the problem of single view 3D human reconstruction. Recent implicit function based methods have shown impressive results, but they fail to recover fine face details in their reconstructions. This largely degrades user experience in applications like 3D telepresence. In this paper, we focus on improving the quality of face in the reconstruction and propose a novel Jointly-align…

    Submitted 22 April, 2022; originally announced April 2022.

    Comments: Camera-ready for CVPR 2022. Project page: https://yukangcao.github.io/JIFF

  34. arXiv:2203.13570  [pdf, other]

    cs.LG

    Improving Question Answering over Knowledge Graphs Using Graph Summarization

    Authors: Sirui Li, Kok Kai Wong, Dengya Zhu, Chun Che Fung

    Abstract: Question Answering (QA) systems over Knowledge Graphs (KGs) (KGQA) automatically answer natural language questions using triples contained in a KG. The key idea is to represent questions and entities of a KG as low-dimensional embeddings. Previous KGQAs have attempted to represent entities using Knowledge Graph Embedding (KGE) and Deep Learning (DL) methods. However, KGEs are too shallow to captur…

    Submitted 25 March, 2022; originally announced March 2022.

    Comments: The paper is accepted by ICONIP 2021

  35. arXiv:2202.07358  [pdf, other]

    cs.CV cs.LG eess.IV

    A Unified Framework for Masked and Mask-Free Face Recognition via Feature Rectification

    Authors: Shaozhe Hao, Chaofeng Chen, Zhenfang Chen, Kwan-Yee K. Wong

    Abstract: Face recognition under ideal conditions is now considered a well-solved problem with advances in deep learning. Recognizing faces under occlusion, however, still remains a challenge. Existing techniques often fail to recognize faces with both the mouth and nose covered by a mask, which is now very common under the COVID-19 pandemic. Common approaches to tackle this problem include 1) discarding in…

    Submitted 15 February, 2022; originally announced February 2022.

    Comments: 5 pages, 4 figures, conference

  36. Deep Face Video Inpainting via UV Mapping

    Authors: Wenqi Yang, Zhenfang Chen, Chaofeng Chen, Guanying Chen, Kwan-Yee K. Wong

    Abstract: This paper addresses the problem of face video inpainting. Existing video inpainting methods target primarily at natural scenes with repetitive patterns. They do not make use of any prior knowledge of the face to help retrieve correspondences for the corrupted face. They therefore only achieve sub-optimal results, particularly for faces under large pose and expression variations where face compone…

    Submitted 13 February, 2023; v1 submitted 1 September, 2021; originally announced September 2021.

    Comments: TIP 2023

  37. arXiv:2107.00986  [pdf, other]

    cs.CV

    Blind Image Super-resolution with Elaborate Degradation Modeling on Noise and Kernel

    Authors: Zongsheng Yue, Qian Zhao, Jianwen Xie, Lei Zhang, Deyu Meng, Kwan-Yee K. Wong

    Abstract: While research on model-based blind single image super-resolution (SISR) has achieved tremendous success recently, most methods do not consider the image degradation sufficiently. Firstly, they always assume image noise obeys an independent and identically distributed (i.i.d.) Gaussian or Laplacian distribution, which largely underestimates the complexity of real noise. Secondly, previous com…

    Submitted 16 March, 2022; v1 submitted 2 July, 2021; originally announced July 2021.

    Comments: Accepted by CVPR 2022

    ACM Class: I.4.4

  38. Dense Reconstruction of Transparent Objects by Altering Incident Light Paths Through Refraction

    Authors: Kai Han, Kwan-Yee K. Wong, Miaomiao Liu

    Abstract: This paper addresses the problem of reconstructing the surface shape of transparent objects. The difficulty of this problem originates from the viewpoint dependent appearance of a transparent object, which quickly makes reconstruction methods tailored for diffuse surfaces fail disgracefully. In this paper, we introduce a fixed viewpoint approach to dense surface reconstruction of transparent objec…

    Submitted 20 May, 2021; originally announced May 2021.

    Comments: International Journal of Computer Vision (IJCV)

  39. arXiv:2103.16564  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.SC

    Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning

    Authors: Zhenfang Chen, Jiayuan Mao, Jiajun Wu, Kwan-Yee Kenneth Wong, Joshua B. Tenenbaum, Chuang Gan

    Abstract: We study the problem of dynamic visual reasoning on raw videos. This is a challenging problem; currently, state-of-the-art models often require dense supervision on physical object properties and events from simulation, which are impractical to obtain in real life. In this paper, we present the Dynamic Concept Learner (DCL), a unified framework that grounds physical objects and events from video a…

    Submitted 30 March, 2021; originally announced March 2021.

    Comments: ICLR 2021. Project page: http://dcl.csail.mit.edu/

  40. arXiv:2103.14943  [pdf, other]

    cs.CV

    HDR Video Reconstruction: A Coarse-to-fine Network and A Real-world Benchmark Dataset

    Authors: Guanying Chen, Chaofeng Chen, Shi Guo, Zhetong Liang, Kwan-Yee K. Wong, Lei Zhang

    Abstract: High dynamic range (HDR) video reconstruction from sequences captured with alternating exposures is a very challenging problem. Existing methods often align low dynamic range (LDR) input sequence in the image space using optical flow, and then merge the aligned images to produce HDR output. However, accurate alignment and fusion in the image space are difficult due to the missing details in the ov…

    Submitted 21 August, 2021; v1 submitted 27 March, 2021; originally announced March 2021.

    Comments: ICCV 2021: http://guanyingc.github.io/DeepHDRVideo/

  41. Fixed Viewpoint Mirror Surface Reconstruction under an Uncalibrated Camera

    Authors: Kai Han, Miaomiao Liu, Dirk Schnieders, Kwan-Yee K. Wong

    Abstract: This paper addresses the problem of mirror surface reconstruction, and proposes a solution based on observing the reflections of a moving reference plane on the mirror surface. Unlike previous approaches which require tedious calibration, our method can recover the camera intrinsics, the poses of the reference plane, as well as the mirror surface from the observed reflections of the reference plan…

    Submitted 22 January, 2021; originally announced January 2021.

    Comments: IEEE Transactions on Image Processing (TIP). Code available at https://github.com/k-han/mirror

  42. Learning Spatial Attention for Face Super-Resolution

    Authors: Chaofeng Chen, Dihong Gong, Hao Wang, Zhifeng Li, Kwan-Yee K. Wong

    Abstract: General image super-resolution techniques have difficulties in recovering detailed face structures when applied to low resolution face images. Recent deep learning based methods tailored for face images have achieved improved performance by being jointly trained with additional tasks such as face parsing and landmark prediction. However, multi-task learning requires extra manually labeled data. Besides,…

    Submitted 4 December, 2020; v1 submitted 2 December, 2020; originally announced December 2020.

    Comments: TIP 2020. Codes are available at https://github.com/chaofengc/Face-SPARNet

  43. arXiv:2009.08709  [pdf, other]

    cs.CV

    Progressive Semantic-Aware Style Transformation for Blind Face Restoration

    Authors: Chaofeng Chen, Xiaoming Li, Lingbo Yang, Xianhui Lin, Lei Zhang, Kwan-Yee K. Wong

    Abstract: Face restoration is important in face image processing, and has been widely studied in recent years. However, previous works often fail to generate plausible high quality (HQ) results for real-world low quality (LQ) face images. In this paper, we propose a new progressive semantic-aware style transformation framework, named PSFR-GAN, for face restoration. Specifically, instead of using an encoder-…

    Submitted 21 March, 2021; v1 submitted 18 September, 2020; originally announced September 2020.

    Comments: Accepted to CVPR2021, https://github.com/chaofengc/PSFRGAN

  44. arXiv:2009.08679  [pdf, other]

    cs.CV

    Face Sketch Synthesis with Style Transfer using Pyramid Column Feature

    Authors: Chaofeng Chen, Xiao Tan, Kwan-Yee K. Wong

    Abstract: In this paper, we propose a novel framework based on deep neural networks for face sketch synthesis from a photo. Imitating the process of how artists draw sketches, our framework synthesizes face sketches in a cascaded manner. A content image is first generated that outlines the shape of the face and the key facial features. Textures and shadings are then added to enrich the details of the sketch…

    Submitted 18 September, 2020; originally announced September 2020.

    Comments: WACV2018

  45. arXiv:2008.10796  [pdf, other]

    eess.IV cs.CV

    Deep Variational Network Toward Blind Image Restoration

    Authors: Zongsheng Yue, Hongwei Yong, Qian Zhao, Lei Zhang, Deyu Meng, Kwan-Yee K. Wong

    Abstract: Blind image restoration (IR) is a common yet challenging problem in computer vision. Classical model-based methods and recent deep learning (DL)-based methods represent two different methodologies for this problem, each with their own merits and drawbacks. In this paper, we propose a novel blind image restoration method, aiming to integrate both the advantages of them. Specifically, we construct a…

    Submitted 26 April, 2024; v1 submitted 24 August, 2020; originally announced August 2020.

    Comments: Accepted by TPAMI@2024. Code: https://github.com/zsyOAOA/VIRNet

    ACM Class: I.4.4

  46. arXiv:2007.13145  [pdf, other]

    cs.CV

    Deep Photometric Stereo for Non-Lambertian Surfaces

    Authors: Guanying Chen, Kai Han, Boxin Shi, Yasuyuki Matsushita, Kwan-Yee K. Wong

    Abstract: This paper addresses the problem of photometric stereo, in both calibrated and uncalibrated scenarios, for non-Lambertian surfaces based on deep learning. We first introduce a fully convolutional deep network for calibrated photometric stereo, which we call PS-FCN. Unlike traditional approaches that adopt simplified reflectance models to make the problem tractable, our method directly learns the m…

    Submitted 26 July, 2020; originally announced July 2020.

  47. arXiv:2003.00403  [pdf, other]

    cs.CV

    Cops-Ref: A new Dataset and Task on Compositional Referring Expression Comprehension

    Authors: Zhenfang Chen, Peng Wang, Lin Ma, Kwan-Yee K. Wong, Qi Wu

    Abstract: Referring expression comprehension (REF) aims at identifying a particular object in a scene by a natural language expression. It requires joint reasoning over the textual and visual domains to solve the problem. Some popular referring expression datasets, however, fail to provide an ideal test bed for evaluating the reasoning ability of the models, mainly because 1) their expressions typically des…

    Submitted 29 February, 2020; originally announced March 2020.

    Comments: To appear in CVPR2020

  48. arXiv:2001.09308  [pdf, other]

    cs.CV

    Look Closer to Ground Better: Weakly-Supervised Temporal Grounding of Sentence in Video

    Authors: Zhenfang Chen, Lin Ma, Wenhan Luo, Peng Tang, Kwan-Yee K. Wong

    Abstract: In this paper, we study the problem of weakly-supervised temporal grounding of sentence in video. Specifically, given an untrimmed video and a query sentence, our goal is to localize a temporal segment in the video that semantically corresponds to the query sentence, with no reliance on any temporal annotation during training. We propose a two-stage model to tackle this problem in a coarse-to-fine…

    Submitted 25 January, 2020; originally announced January 2020.

  49. arXiv:1910.02222  [pdf, other]

    cs.CV

    Colored Transparent Object Matting from a Single Image Using Deep Learning

    Authors: Jamal Ahmed Rahim, Kwan-Yee Kenneth Wong

    Abstract: This paper proposes a deep learning based method for colored transparent object matting from a single image. Existing approaches for transparent object matting often require multiple images and long processing times, which greatly hinder their applications on real-world transparent objects. The recently proposed TOM-Net can produce a matte for a colorless transparent object from a single image in…

    Submitted 5 October, 2019; originally announced October 2019.

  50. arXiv:1907.11544  [pdf, other]

    cs.CV eess.IV

    Learning Transparent Object Matting

    Authors: Guanying Chen, Kai Han, Kwan-Yee K. Wong

    Abstract: This paper addresses the problem of image matting for transparent objects. Existing approaches often require tedious capturing procedures and long processing time, which limit their practical use. In this paper, we formulate transparent object matting as a refractive flow estimation problem, and propose a deep learning framework, called TOM-Net, for learning the refractive flow. Our framework comp…

    Submitted 25 July, 2019; originally announced July 2019.

    Comments: To appear in International Journal of Computer Vision, Project Page: https://guanyingc.github.io/TOM-Net. arXiv admin note: substantial text overlap with arXiv:1803.04636