
Showing 1–50 of 60 results for author: Wang, Y F

Searching in archive cs.
  1. arXiv:2406.20066

    cs.CV

    ASSR-NeRF: Arbitrary-Scale Super-Resolution on Voxel Grid for High-Quality Radiance Fields Reconstruction

    Authors: Ding-Jiun Huang, Zi-Ting Chou, Yu-Chiang Frank Wang, Cheng Sun

    Abstract: NeRF-based methods reconstruct 3D scenes by building a radiance field with implicit or explicit representations. While NeRF-based methods can perform novel view synthesis (NVS) at arbitrary scale, the performance in high-resolution novel view synthesis (HRNVS) with low-resolution (LR) optimization often results in oversmoothing. On the other hand, single-image super-resolution (SR) aims to enhance…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2406.19392

    cs.CV

    ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos

    Authors: Jr-Jen Chen, Yu-Chien Liao, Hsi-Che Lin, Yu-Chu Yu, Yen-Chun Chen, Yu-Chiang Frank Wang

    Abstract: We introduce ReXTime, a benchmark designed to rigorously test AI models' ability to perform temporal reasoning within video events. Specifically, ReXTime focuses on reasoning across time, i.e. human-like understanding when the question and its corresponding answer occur in different video segments. This form of reasoning, requiring advanced understanding of cause-and-effect relationships across vi…

    Submitted 27 June, 2024; originally announced June 2024.

  3. arXiv:2406.18871

    eess.AS cs.CL

    DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment

    Authors: Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, He Huang, Boris Ginsburg, Yu-Chiang Frank Wang, Hung-yi Lee

    Abstract: Recent speech language models (SLMs) typically incorporate pre-trained speech models to extend the capabilities from large language models (LLMs). In this paper, we propose a Descriptive Speech-Text Alignment approach that leverages speech captioning to bridge the gap between speech and text modalities, enabling SLMs to interpret and generate comprehensive natural language descriptions, thereby fa…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  4. arXiv:2406.12834

    cs.CV

    GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation

    Authors: Ci-Siang Lin, I-Jieh Liu, Min-Hung Chen, Chien-Yi Wang, Sifei Liu, Yu-Chiang Frank Wang

    Abstract: Referring Video Object Segmentation (RVOS) aims to segment the object referred to by the query sentence throughout the entire video. Most existing methods require end-to-end training with dense mask annotations, which could be computation-consuming and less scalable. In this work, we aim to efficiently adapt foundation segmentation models for addressing RVOS from weak supervision with the proposed…

    Submitted 23 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: CVPR Workshop (CVinW) 2024. Project page: https://jack24658735.github.io/groprompt/

  5. arXiv:2405.16194

    cs.LG cs.AI cs.RO

    Diffusion-Reward Adversarial Imitation Learning

    Authors: Chun-Mao Lai, Hsiang-Chun Wang, Ping-Chun Hsieh, Yu-Chiang Frank Wang, Min-Hung Chen, Shao-Hua Sun

    Abstract: Imitation learning aims to learn a policy from observing expert demonstrations without access to reward signals from environments. Generative adversarial imitation learning (GAIL) formulates imitation learning as adversarial learning, employing a generator policy learning to imitate expert behaviors and discriminator learning to distinguish the expert demonstrations from agent trajectories. Despit…

    Submitted 25 May, 2024; originally announced May 2024.

  6. arXiv:2403.16539

    cs.CV

    Data-Efficient 3D Visual Grounding via Order-Aware Referring

    Authors: Tung-Yu Wu, Sheng-Yu Huang, Yu-Chiang Frank Wang

    Abstract: 3D visual grounding aims to identify the target object within a 3D point cloud scene referred to by a natural language description. Previous works usually require significant data relating to point color and their descriptions to exploit the corresponding complicated verbo-visual relations. In our work, we introduce Vigor, a novel Data-Efficient 3D Visual Grounding framework via Order-aware Referr…

    Submitted 31 May, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

  7. arXiv:2403.09296

    cs.CV

    Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models

    Authors: Yu-Chu Yu, Chi-Pin Huang, Jr-Jen Chen, Kai-Po Chang, Yung-Hsuan Lai, Fu-En Yang, Yu-Chiang Frank Wang

    Abstract: Large-scale vision-language models (VLMs) have shown a strong zero-shot generalization capability on unseen-domain data. However, when adapting pre-trained VLMs to a sequence of downstream tasks, they are prone to forgetting previously learned knowledge and degrading their zero-shot classification capability. To tackle this problem, we propose a unique Selective Dual-Teacher Knowledge Transfer frame…

    Submitted 14 March, 2024; originally announced March 2024.

  8. arXiv:2403.03608

    cs.CV cs.AI cs.LG

    GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding

    Authors: Zi-Ting Chou, Sheng-Yu Huang, I-Jieh Liu, Yu-Chiang Frank Wang

    Abstract: Utilizing multi-view inputs to synthesize novel-view images, Neural Radiance Fields (NeRF) have emerged as a popular research topic in 3D vision. In this work, we introduce a Generalizable Semantic Neural Radiance Field (GSNeRF), which uniquely takes image semantics into the synthesis process so that both novel view images and the associated semantic maps can be produced for unseen scenes. Our GSN…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024

  9. arXiv:2402.16321

    cs.SD cs.AI cs.LG eess.AS

    Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech

    Authors: Szu-Wei Fu, Kuo-Hsuan Hung, Yu Tsao, Yu-Chiang Frank Wang

    Abstract: Speech quality estimation has recently undergone a paradigm shift from human-hearing expert designs to machine-learning models. However, current models rely mainly on supervised learning, which is time-consuming and expensive for label collection. To solve this problem, we propose VQScore, a self-supervised metric for evaluating speech based on the quantization error of a vector-quantized-variatio…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: Published as a conference paper at ICLR 2024

  10. arXiv:2402.09353

    cs.CL cs.CV

    DoRA: Weight-Decomposed Low-Rank Adaptation

    Authors: Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Min-Hung Chen

    Abstract: Among the widely used parameter-efficient fine-tuning (PEFT) methods, LoRA and its variants have gained considerable popularity because they avoid additional inference costs. However, there still often exists an accuracy gap between these methods and full fine-tuning (FT). In this work, we first introduce a novel weight decomposition analysis to investigate the inherent differences between FT and…

    Submitted 3 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: Code available at https://github.com/NVlabs/DoRA

  11. arXiv:2401.11791

    cs.CV cs.CL cs.LG

    SemPLeS: Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation

    Authors: Ci-Siang Lin, Chien-Yi Wang, Yu-Chiang Frank Wang, Min-Hung Chen

    Abstract: Weakly-Supervised Semantic Segmentation (WSSS) aims to train segmentation models using image data with only image-level supervision. Since precise pixel-level annotations are not accessible, existing methods typically focus on producing pseudo masks for training segmentation models by refining CAM-like heatmaps. However, the produced heatmaps may capture only the discriminative image regions of ob…

    Submitted 11 March, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

  12. arXiv:2312.07165

    cs.CV cs.AI cs.LG

    Language-Guided Transformer for Federated Multi-Label Classification

    Authors: I-Jieh Liu, Ci-Siang Lin, Fu-En Yang, Yu-Chiang Frank Wang

    Abstract: Federated Learning (FL) is an emerging paradigm that enables multiple users to collaboratively train a robust model in a privacy-preserving manner without sharing their private data. Most existing approaches of FL only consider traditional single-label image classification, ignoring the impact when transferring the task to multi-label image classification. Nevertheless, it is still challenging for…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  13. arXiv:2312.02647

    cs.CV

    TPA3D: Triplane Attention for Fast Text-to-3D Generation

    Authors: Hong-En Chen, Bin-Shih Wu, Sheng-Yu Huang, Yu-Chiang Frank Wang

    Abstract: Due to the lack of large-scale text-3D correspondence data, recent text-to-3D generation works mainly rely on utilizing 2D diffusion models for synthesizing 3D data. Since diffusion-based methods typically require significant optimization time for both training and inference, the use of GAN-based models would still be desirable for fast 3D generation. In this work, we propose Triplane Attention fo…

    Submitted 5 December, 2023; originally announced December 2023.

  14. arXiv:2311.18695

    cs.CV cs.LG

    Seg2Reg: Differentiable 2D Segmentation to 1D Regression Rendering for 360 Room Layout Reconstruction

    Authors: Cheng Sun, Wei-En Tai, Yu-Lin Shih, Kuan-Wei Chen, Yong-**g Syu, Kent Selwyn The, Yu-Chiang Frank Wang, Hwann-Tzong Chen

    Abstract: State-of-the-art single-view 360-degree room layout reconstruction methods formulate the problem as a high-level 1D (per-column) regression task. On the other hand, traditional low-level 2D layout segmentation is simpler to learn and can represent occluded regions, but it requires complex post-processing for the target layout polygon and sacrifices accuracy. We present Seg2Reg to render 1D layo…

    Submitted 30 November, 2023; originally announced November 2023.

  15. arXiv:2311.17717

    cs.CV cs.LG

    Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers

    Authors: Chi-Pin Huang, Kai-Po Chang, Chung-Ting Tsai, Yung-Hsuan Lai, Fu-En Yang, Yu-Chiang Frank Wang

    Abstract: Concept erasure in text-to-image diffusion models aims to disable pre-trained diffusion models from generating images related to a target concept. To perform reliable concept erasure, the properties of robustness and locality are desirable. The former prevents the model from producing images associated with the target concept for any paraphrased or learned prompts, while the latter preserves its a…

    Submitted 14 March, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

  16. arXiv:2310.12344

    cs.CL cs.CV

    LACMA: Language-Aligning Contrastive Learning with Meta-Actions for Embodied Instruction Following

    Authors: Cheng-Fu Yang, Yen-Chun Chen, Jianwei Yang, Xiyang Dai, Lu Yuan, Yu-Chiang Frank Wang, Kai-Wei Chang

    Abstract: End-to-end Transformers have demonstrated an impressive success rate for Embodied Instruction Following when the environment has been seen in training. However, they tend to struggle when deployed in an unseen environment. This lack of generalizability is due to the agent's insensitivity to subtle changes in natural language instructions. To mitigate this issue, we propose explicitly aligning the…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023

  17. arXiv:2309.04723

    cs.CV

    Frequency-Aware Self-Supervised Long-Tailed Learning

    Authors: Ci-Siang Lin, Min-Hung Chen, Yu-Chiang Frank Wang

    Abstract: Data collected from the real world typically exhibit long-tailed distributions, where frequent classes contain abundant data while rare ones have only a limited number of samples. While existing supervised learning approaches have been proposed to tackle such data imbalance, the requirement of label supervision would limit their applicability to real-world scenarios in which label annotation might…

    Submitted 15 September, 2023; v1 submitted 9 September, 2023; originally announced September 2023.

    Comments: ICCV Workshop 2023 (Oral)

  18. arXiv:2308.15367

    cs.CV cs.AI cs.LG

    Efficient Model Personalization in Federated Learning via Client-Specific Prompt Generation

    Authors: Fu-En Yang, Chien-Yi Wang, Yu-Chiang Frank Wang

    Abstract: Federated learning (FL) emerges as a decentralized learning framework which trains models from multiple distributed clients without sharing their data to preserve privacy. Recently, large-scale pre-trained models (e.g., Vision Transformer) have shown a strong capability of deriving robust representations. However, the data heterogeneity among clients, the limited computation resources, and the com…

    Submitted 29 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV 2023

  19. arXiv:2307.10317

    cs.LG cs.AI

    FedBug: A Bottom-Up Gradual Unfreezing Framework for Federated Learning

    Authors: Chia-Hsiang Kao, Yu-Chiang Frank Wang

    Abstract: Federated Learning (FL) offers a collaborative training framework, allowing multiple clients to contribute to a shared model without compromising data privacy. Due to the heterogeneous nature of local datasets, updated client models may overfit and diverge from one another, commonly known as the problem of client drift. In this paper, we propose FedBug (Federated Learning with Bottom-Up Gradual Un…

    Submitted 13 November, 2023; v1 submitted 19 July, 2023; originally announced July 2023.

    Comments: 20 pages, 5 figures

  20. arXiv:2306.17404

    cs.CV

    QuAVF: Quality-aware Audio-Visual Fusion for Ego4D Talking to Me Challenge

    Authors: Hsi-Che Lin, Chien-Yi Wang, Min-Hung Chen, Szu-Wei Fu, Yu-Chiang Frank Wang

    Abstract: This technical report describes our QuAVF@NTU-NVIDIA submission to the Ego4D Talking to Me (TTM) Challenge 2023. Based on the observation from the TTM task and the provided dataset, we propose to use two separate models to process the input videos and audio. By doing so, we can utilize all the labeled training data, including those without bounding box labels. Furthermore, we leverage the face qua…

    Submitted 30 June, 2023; originally announced June 2023.

    Comments: 1st place at Ego4D Talking to Me (TTM) Challenge 2023

  21. arXiv:2305.17343

    cs.CV cs.SD eess.AS

    Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser

    Authors: Yung-Hsuan Lai, Yen-Chun Chen, Yu-Chiang Frank Wang

    Abstract: Audio-visual learning has been a major pillar of multi-modal machine learning, where the community mostly focused on its modality-aligned setting, i.e., the audio and visual modalities are both assumed to signal the prediction target. With the Look, Listen, and Parse dataset (LLP), we investigate the under-explored unaligned setting, where the goal is to recognize audio and visual events in a video…

    Submitted 2 October, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  22. arXiv:2302.09561

    cs.CV cs.AI cs.LG

    TAX: Tendency-and-Assignment Explainer for Semantic Segmentation with Multi-Annotators

    Authors: Yuan-Chia Cheng, Zu-Yun Shiau, Fu-En Yang, Yu-Chiang Frank Wang

    Abstract: To understand how deep neural networks perform classification predictions, recent research attention has been focusing on developing techniques to offer desirable explanations. However, most existing methods cannot be easily applied for semantic segmentation; moreover, they are not designed to offer interpretability under the multi-annotator setting. Instead of viewing ground-truth pixel-level lab…

    Submitted 19 February, 2023; originally announced February 2023.

    Comments: 10 pages, 5 figures

  23. arXiv:2211.14544

    cs.CV cs.AI cs.LG

    Target-Free Text-guided Image Manipulation

    Authors: Wan-Cyuan Fan, Cheng-Fu Yang, Chiao-An Yang, Yu-Chiang Frank Wang

    Abstract: We tackle the problem of target-free text-guided image manipulation, which requires one to modify the input reference image based on the given text instruction, while no ground truth target image is observed during training. To address this challenging task, we propose a Cyclic-Manipulation GAN (cManiGAN) in this paper, which is able to realize where and how to edit the image regions of interest.…

    Submitted 1 December, 2022; v1 submitted 26 November, 2022; originally announced November 2022.

    Comments: AAAI 2023

  24. arXiv:2209.12343

    cs.CV cs.LG

    Paraphrasing Is All You Need for Novel Object Captioning

    Authors: Cheng-Fu Yang, Yao-Hung Hubert Tsai, Wan-Cyuan Fan, Ruslan Salakhutdinov, Louis-Philippe Morency, Yu-Chiang Frank Wang

    Abstract: Novel object captioning (NOC) aims to describe images containing objects without observing their ground truth captions during training. Due to the absence of caption annotation, captioning models cannot be directly optimized via sequence-to-sequence training or CIDEr optimization. As a result, we present Paraphrasing-to-Captioning (P2C), a two-stage learning framework for NOC, which would heuristi…

    Submitted 25 September, 2022; originally announced September 2022.

    Comments: Accepted at NeurIPS 2022

  25. arXiv:2208.14439

    cs.CV cs.LG

    Self-Supervised Pyramid Representation Learning for Multi-Label Visual Analysis and Beyond

    Authors: Cheng-Yen Hsieh, Chih-Jung Chang, Fu-En Yang, Yu-Chiang Frank Wang

    Abstract: While self-supervised learning has been shown to benefit a number of vision tasks, existing techniques mainly focus on image-level manipulation, which may not generalize well to downstream tasks at patch or pixel levels. Moreover, existing SSL methods might not sufficiently describe and associate the above representations within and across image scales. In this paper, we propose a Self-Supervised…

    Submitted 30 August, 2022; originally announced August 2022.

    Comments: IEEE WACV 2023, Github: https://github.com/WesleyHsieh0806/SS-PRL

  26. arXiv:2208.13753

    cs.CV cs.AI cs.LG

    Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis

    Authors: Wan-Cyuan Fan, Yen-Chun Chen, Dongdong Chen, Yu Cheng, Lu Yuan, Yu-Chiang Frank Wang

    Abstract: Diffusion models (DMs) have shown great potential for high-quality image synthesis. However, when it comes to producing images with complex scenes, how to properly describe both image global structures and object details remains a challenging task. In this paper, we present Frido, a Feature Pyramid Diffusion model performing a multi-scale coarse-to-fine denoising process for image synthesis. Our m…

    Submitted 1 December, 2022; v1 submitted 29 August, 2022; originally announced August 2022.

    Comments: AAAI 2023

  27. arXiv:2208.07828

    cs.CV cs.LG

    Learning Facial Liveness Representation for Domain Generalized Face Anti-spoofing

    Authors: Zih-Ching Chen, Lin-Hsi Tsao, Chin-Lun Fu, Shang-Fu Chen, Yu-Chiang Frank Wang

    Abstract: Face anti-spoofing (FAS) aims at distinguishing face spoof attacks from the authentic ones, which is typically approached by learning proper models for performing the associated classification task. In practice, one would expect such models to be generalized to FAS in different image domains. Moreover, it is not practical to assume that the type of spoof attacks would be known in advance. In this…

    Submitted 16 August, 2022; originally announced August 2022.

    Comments: Accepted to ICME 2022

  28. arXiv:2205.02958

    cs.CV

    Scene Graph Expansion for Semantics-Guided Image Outpainting

    Authors: Chiao-An Yang, Cheng-Yo Tan, Wan-Cyuan Fan, Cheng-Fu Yang, Meng-Lin Wu, Yu-Chiang Frank Wang

    Abstract: In this paper, we address the task of semantics-guided image outpainting, which is to complete an image by generating semantically practical content. Different from most existing image outpainting works, we approach the above task by understanding and completing image semantics at the scene graph level. In particular, we propose a novel network of Scene Graph Transformer (SGT), which is designed t…

    Submitted 5 May, 2022; originally announced May 2022.

    Comments: CVPR 2022

  29. arXiv:2204.13696

    cs.CV

    NeurMiPs: Neural Mixture of Planar Experts for View Synthesis

    Authors: Zhi-Hao Lin, Wei-Chiu Ma, Hao-Yu Hsu, Yu-Chiang Frank Wang, Shenlong Wang

    Abstract: We present Neural Mixtures of Planar Experts (NeurMiPs), a novel planar-based scene representation for modeling geometry and appearance. NeurMiPs leverages a collection of local planar experts in 3D space as the scene representation. Each planar expert consists of the parameters of the local rectangular shape representing geometry and a neural radiance field modeling the color and opacity. We rend…

    Submitted 28 April, 2022; originally announced April 2022.

    Comments: CVPR 2022. Project page: https://zhihao-lin.github.io/neurmips/

  30. arXiv:2203.12304

    cs.CV

    Domain-Generalized Textured Surface Anomaly Detection

    Authors: Shang-Fu Chen, Yu-Min Liu, Chia-Ching Lin, Trista Pei-Chun Chen, Yu-Chiang Frank Wang

    Abstract: Anomaly detection aims to identify abnormal data that deviates from the normal ones, while typically requiring a sufficient amount of normal data to train the model for performing this task. Despite the success of recent anomaly detection methods, performing anomaly detection in an unseen domain remains a challenging task. In this paper, we address the task of domain-generalized textured surface an…

    Submitted 23 March, 2022; originally announced March 2022.

    Comments: Accepted by IEEE International Conference on Multimedia and Expo (ICME) 2022

  31. arXiv:2112.13539

    cs.CV

    Few-Shot Classification in Unseen Domains by Episodic Meta-Learning Across Visual Domains

    Authors: Yuan-Chia Cheng, Ci-Siang Lin, Fu-En Yang, Yu-Chiang Frank Wang

    Abstract: Few-shot classification aims to carry out classification given only a few labeled examples for the categories of interest. Though several approaches have been proposed, most existing few-shot learning (FSL) models assume that base and novel classes are drawn from the same data domain. When it comes to recognizing novel-class data in an unseen domain, this becomes an even more challenging task of dom…

    Submitted 27 December, 2021; originally announced December 2021.

    Comments: Accepted by ICIP 2021

  32. arXiv:2112.13538

    cs.CV

    Meta-Learned Feature Critics for Domain Generalized Semantic Segmentation

    Authors: Zu-Yun Shiau, Wei-Wei Lin, Ci-Siang Lin, Yu-Chiang Frank Wang

    Abstract: How to handle domain shifts when recognizing or segmenting visual data across domains has been studied by the learning and vision communities. In this paper, we address domain generalized semantic segmentation, in which the segmentation model is trained on multiple source domains and is expected to generalize to unseen data domains. We propose a novel meta-learning scheme with feature disentanglement…

    Submitted 27 December, 2021; originally announced December 2021.

    Comments: Accepted by ICIP 2021

  33. arXiv:2111.01418

    cs.CV

    A Pixel-Level Meta-Learner for Weakly Supervised Few-Shot Semantic Segmentation

    Authors: Yuan-Hao Lee, Fu-En Yang, Yu-Chiang Frank Wang

    Abstract: Few-shot semantic segmentation addresses the learning task in which only a few images with ground truth pixel-level labels are available for the novel classes of interest. One is typically required to collect a large amount of data (i.e., base classes) with such ground truth information, followed by meta-learning strategies to address the above learning task. When only image-level semantic labels can…

    Submitted 2 November, 2021; originally announced November 2021.

    Comments: Accepted to WACV 2022

  34. arXiv:2105.00708

    cs.SD cs.CV cs.MM eess.AS

    Exploiting Audio-Visual Consistency with Partial Supervision for Spatial Audio Generation

    Authors: Yan-Bo Lin, Yu-Chiang Frank Wang

    Abstract: Humans perceive a rich auditory experience, with distinct sounds heard by each ear. Videos recorded with binaural audio in particular simulate how humans receive ambient sound. However, a large number of videos have monaural audio only, which would degrade the user experience due to the lack of ambient information. To address this issue, we propose an audio spatialization framework to convert a monaural…

    Submitted 3 May, 2021; originally announced May 2021.

    Comments: AAAI'21

  35. arXiv:2102.13329

    cs.CV

    Dual-MTGAN: Stochastic and Deterministic Motion Transfer for Image-to-Video Synthesis

    Authors: Fu-En Yang, **g-Cheng Chang, Yuan-Hao Lee, Yu-Chiang Frank Wang

    Abstract: Generating videos with content and motion variations is a challenging task in computer vision. While the recent development of GAN allows video generation from latent representations, it is not easy to produce videos with particular content of motion patterns of interest. In this paper, we propose Dual Motion Transfer GAN (Dual-MTGAN), which takes image and video data as inputs while learning dise…

    Submitted 26 February, 2021; originally announced February 2021.

    Comments: Accepted to ICPR 2020

  36. arXiv:2011.00788

    cs.CV

    Representation Decomposition for Image Manipulation and Beyond

    Authors: Shang-Fu Chen, Jia-Wei Yan, Ya-Fan Su, Yu-Chiang Frank Wang

    Abstract: Representation disentanglement aims at learning interpretable features, so that the output can be recovered or manipulated accordingly. While works such as infoGAN and AC-GAN exist, they choose to derive disjoint attribute code for feature disentanglement, which is not applicable to existing/trained generative models. In this paper, we propose a decomposition-GAN (dec-GAN), which is able to…

    Submitted 23 March, 2022; v1 submitted 2 November, 2020; originally announced November 2020.

    Comments: Published at IEEE International Conference on Image Processing (ICIP) 2021

  37. arXiv:2010.10772

    cs.CV

    Semantics-Guided Representation Learning with Applications to Visual Synthesis

    Authors: Jia-Wei Yan, Ci-Siang Lin, Fu-En Yang, Yu-Jhe Li, Yu-Chiang Frank Wang

    Abstract: Learning interpretable and interpolatable latent representations has been an emerging research direction, allowing researchers to understand and utilize the derived latent space for further applications such as visual synthesis or recognition. While most existing approaches derive an interpolatable latent space and induce smooth transition in image appearance, it is still not clear how to observe…

    Submitted 21 October, 2020; originally announced October 2020.

    Comments: ICPR 2020

  38. arXiv:2010.09561

    cs.CV

    Domain Generalized Person Re-Identification via Cross-Domain Episodic Learning

    Authors: Ci-Siang Lin, Yuan-Chia Cheng, Yu-Chiang Frank Wang

    Abstract: Aiming at recognizing images of the same person across distinct camera views, person re-identification (re-ID) has been among the active research topics in computer vision. Most existing re-ID works require the collection of a large amount of labeled image data from the scenes of interest. When the data to be recognized are different from the source-domain training ones, a number of domain adaptation appr…

    Submitted 19 October, 2020; originally announced October 2020.

    Comments: Accepted to ICPR 2020

  39. arXiv:2010.01148

    cs.CV

    Semantics-Guided Clustering with Deep Progressive Learning for Semi-Supervised Person Re-identification

    Authors: Chih-Ting Liu, Yu-Jhe Li, Shao-Yi Chien, Yu-Chiang Frank Wang

    Abstract: Person re-identification (re-ID) requires one to match images of the same person across camera views. As a more challenging task, semi-supervised re-ID tackles the problem that only a number of identities in training data are fully labeled, while the remaining are unlabeled. Assuming that such labeled and unlabeled training data share disjoint identity labels, we propose a novel framework of Seman…

    Submitted 2 October, 2020; originally announced October 2020.

  40. arXiv:2008.11203

    cs.CV cs.LG

    Learning to Learn in a Semi-Supervised Fashion

    Authors: Yun-Chun Chen, Chao-Te Chou, Yu-Chiang Frank Wang

    Abstract: To address semi-supervised learning from both labeled and unlabeled data, we present a novel meta-learning scheme. We particularly consider that labeled and unlabeled data share disjoint ground truth label sets, as seen in tasks like person re-identification or image retrieval. Our learning scheme exploits the idea of leveraging information from labeled to unlabeled data. Instead of fitt…

    Submitted 25 August, 2020; originally announced August 2020.

    Comments: ECCV 2020

  41. arXiv:2007.09163

    cs.CV cs.LG cs.NE eess.IV

    Wavelet Channel Attention Module with a Fusion Network for Single Image Deraining

    Authors: Hao-Hsiang Yang, Chao-Han Huck Yang, Yu-Chiang Frank Wang

    Abstract: Single image deraining is a crucial problem because rain severely degrades the visibility of images and affects the performance of computer vision tasks like outdoor surveillance systems and intelligent vehicles. In this paper, we propose a new convolutional neural network (CNN) called the wavelet channel attention module with a fusion network. Wavelet transform and the inverse wavelet transf…

    Submitted 17 July, 2020; originally announced July 2020.

    Comments: Accepted to IEEE ICIP 2020

    Journal ref: 2020 IEEE International Conference on Image Processing (ICIP)

  42. arXiv:2006.01410

    cs.CV

    Transforming Multi-Concept Attention into Video Summarization

    Authors: Yen-Ting Liu, Yu-Jhe Li, Yu-Chiang Frank Wang

    Abstract: Video summarization is among the challenging tasks in computer vision, which aims at identifying highlight frames or shots over a lengthy video input. In this paper, we propose a novel attention-based framework for video summarization with complex video data. Unlike previous works which only apply attention mechanism on the correspondence between frames, our multi-concept video self-attention (MC-VSA…

    Submitted 2 June, 2020; v1 submitted 2 June, 2020; originally announced June 2020.

  43. arXiv:2002.09274

    cs.CV

    Cross-Resolution Adversarial Dual Network for Person Re-Identification and Beyond

    Authors: Yu-Jhe Li, Yun-Chun Chen, Yen-Yu Lin, Yu-Chiang Frank Wang

    Abstract: Person re-identification (re-ID) aims at matching images of the same person across camera views. Due to varying distances between cameras and persons of interest, resolution mismatch can be expected, which would degrade re-ID performance in real-world scenarios. To overcome this problem, we propose a novel generative adversarial network to address cross-resolution person re-ID, allowing query imag…

    Submitted 22 October, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

    Comments: Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). 17 pages. arXiv admin note: substantial text overlap with arXiv:1908.06052

  44. arXiv:1909.09675  [pdf, other]

    cs.CV

    Cross-Dataset Person Re-Identification via Unsupervised Pose Disentanglement and Adaptation

    Authors: Yu-Jhe Li, Ci-Siang Lin, Yan-Bo Lin, Yu-Chiang Frank Wang

    Abstract: Person re-identification (re-ID) aims at recognizing the same person from images taken across different cameras. To address this challenging task, existing re-ID models typically rely on a large amount of labeled training data, which is not practical for real-world applications. To alleviate this limitation, researchers now target cross-dataset re-ID, which focuses on generalizing the discrimin…

    Submitted 20 September, 2019; originally announced September 2019.

    Comments: Accepted to ICCV 2019

  45. arXiv:1908.06052  [pdf, other]

    cs.CV cs.LG

    Recover and Identify: A Generative Dual Model for Cross-Resolution Person Re-Identification

    Authors: Yu-Jhe Li, Yun-Chun Chen, Yen-Yu Lin, Xiaofei Du, Yu-Chiang Frank Wang

    Abstract: Person re-identification (re-ID) aims at matching images of the same identity across camera views. Due to varying distances between cameras and persons of interest, resolution mismatch can be expected, which would degrade person re-ID performance in real-world scenarios. To overcome this problem, we propose a novel generative adversarial network to address cross-resolution person re-ID, allowing q…

    Submitted 16 August, 2019; originally announced August 2019.

    Comments: Accepted to ICCV 2019

  46. arXiv:1908.01683  [pdf, other]

    cs.CV

    Spatially and Temporally Efficient Non-local Attention Network for Video-based Person Re-Identification

    Authors: Chih-Ting Liu, Chih-Wei Wu, Yu-Chiang Frank Wang, Shao-Yi Chien

    Abstract: Video-based person re-identification (Re-ID) aims at matching video sequences of pedestrians across non-overlapping cameras. It is a practical yet challenging task of how to embed spatial and temporal information of a video into its feature representation. While most existing methods learn the video characteristics by aggregating image-wise features and designing attention mechanisms in Neural Net…

    Submitted 5 August, 2019; originally announced August 2019.

    Comments: This paper was accepted by 2019 British Machine Vision Conference (BMVC)

    Journal ref: BMVC2019

  47. arXiv:1907.10843  [pdf, other]

    cs.CV cs.LG

    Learning Resolution-Invariant Deep Representations for Person Re-Identification

    Authors: Yun-Chun Chen, Yu-Jhe Li, Xiaofei Du, Yu-Chiang Frank Wang

    Abstract: Person re-identification (re-ID) solves the task of matching images across cameras and is among the research topics in vision community. Since query images in real-world scenarios might suffer from resolution loss, how to solve the resolution mismatch problem during person re-ID becomes a practical problem. Instead of applying separate image super-resolution models, we propose a novel network arch…

    Submitted 25 July, 2019; originally announced July 2019.

    Comments: Accepted to AAAI 2019 (Oral)

  48. arXiv:1904.04232  [pdf, other]

    cs.CV

    A Closer Look at Few-shot Classification

    Authors: Wei-Yu Chen, Yen-Cheng Liu, Zsolt Kira, Yu-Chiang Frank Wang, Jia-Bin Huang

    Abstract: Few-shot classification aims to learn a classifier to recognize unseen classes during training with limited labeled examples. While significant progress has been made, the growing complexity of network designs, meta-learning algorithms, and differences in implementation details make a fair comparison difficult. In this paper, we present 1) a consistent comparative analysis of several representativ…

    Submitted 12 January, 2020; v1 submitted 8 April, 2019; originally announced April 2019.

    Comments: ICLR 2019. Code: https://github.com/wyharveychen/CloserLookFewShot . Project: https://sites.google.com/view/a-closer-look-at-few-shot/

  49. arXiv:1902.07473  [pdf, other]

    cs.CV

    Dual-modality seq2seq network for audio-visual event localization

    Authors: Yan-Bo Lin, Yu-Jhe Li, Yu-Chiang Frank Wang

    Abstract: Audio-visual event localization requires one to identify the event which is both visible and audible in a video (either at a frame or video level). To address this task, we propose a deep neural network named Audio-Visual sequence-to-sequence dual network (AVSDN). By jointly taking both audio and visual features at each time segment as inputs, our proposed model learns global and local event informat…

    Submitted 6 August, 2020; v1 submitted 20 February, 2019; originally announced February 2019.

    Comments: Accepted in ICASSP 2019

  50. arXiv:1811.12016  [pdf, other]

    cs.CV

    3D Shape Reconstruction from a Single 2D Image via 2D-3D Self-Consistency

    Authors: Yi-Lun Liao, Yao-Cheng Yang, Yu-Chiang Frank Wang

    Abstract: Aiming at inferring 3D shapes from 2D images, 3D shape reconstruction has drawn huge attention from researchers in computer vision and deep learning communities. However, it is not practical to assume that 2D input images and their associated ground truth 3D shapes are always available during training. In this paper, we propose a framework for semi-supervised 3D reconstruction. This is realized by…

    Submitted 29 November, 2018; originally announced November 2018.