Showing 1–16 of 16 results for author: Tai, Y

Searching in archive eess.
  1. arXiv:2405.16136  [pdf, other]

    cs.AI cs.CL cs.LG cs.SD eess.AS

    C3LLM: Conditional Multimodal Content Generation Using Large Language Models

    Authors: Zixuan Wang, Qinkai Duan, Yu-Wing Tai, Chi-Keung Tang

    Abstract: We introduce C3LLM (Conditioned-on-Three-Modalities Large Language Models), a novel framework combining three tasks of video-to-audio, audio-to-text, and text-to-audio together. C3LLM adapts the Large Language Model (LLM) structure as a bridge for aligning different modalities, synthesizing the given conditional information, and making multimodal generation in a discrete manner. Our contributions…

    Submitted 25 May, 2024; originally announced May 2024.

  2. arXiv:2404.01717  [pdf, other]

    cs.CV eess.IV

    AddSR: Accelerating Diffusion-based Blind Super-Resolution with Adversarial Diffusion Distillation

    Authors: Rui Xie, Ying Tai, Chen Zhao, Kai Zhang, Zhenyu Zhang, Jun Zhou, Xiaoqian Ye, Qian Wang, Jian Yang

    Abstract: Blind super-resolution methods based on stable diffusion showcase formidable generative capabilities in reconstructing clear high-resolution images with intricate details from low-resolution inputs. However, their practical applicability is often hampered by poor efficiency, stemming from the requirement of thousands or hundreds of sampling steps. Inspired by the efficient adversarial diffusion di…

    Submitted 23 May, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  3. arXiv:2305.12901  [pdf]

    eess.IV

    TSPTQ-ViT: Two-scaled post-training quantization for vision transformer

    Authors: Yu-Shan Tai, Ming-Guang Lin, An-Yeu Wu

    Abstract: Vision transformers (ViTs) have achieved remarkable performance in various computer vision tasks. However, intensive memory and computation requirements impede ViTs from running on resource-constrained edge devices. Due to the non-normally distributed values after Softmax and GeLU, post-training quantization on ViTs results in severe accuracy degradation. Moreover, conventional methods fail to add…

    Submitted 22 May, 2023; originally announced May 2023.

  4. arXiv:2211.02297  [pdf, other]

    eess.IV

    SDRTV-to-HDRTV Conversion via Spatial-Temporal Feature Fusion

    Authors: Kepeng Xu, Li Xu, Gang He, Chang Wu, Zijia Ma, Ming Sun, Yu-Wing Tai

    Abstract: HDR (High Dynamic Range) video can reproduce realistic scenes more realistically, with a wider gamut and broader brightness range. HDR video resources are still scarce, and most videos are still stored in SDR (Standard Dynamic Range) format. Therefore, SDRTV-to-HDRTV Conversion (SDR video to HDR video) can significantly enhance the user's video viewing experience. Since the correlation between adja…

    Submitted 4 November, 2022; originally announced November 2022.

    Comments: 8 pages

  5. arXiv:2207.07931  [pdf]

    eess.IV cs.LG

    Learnable Mixed-precision and Dimension Reduction Co-design for Low-storage Activation

    Authors: Yu-Shan Tai, Cheng-Yang Chang, Chieh-Fang Teng, An-Yeu Wu

    Abstract: Recently, deep convolutional neural networks (CNNs) have achieved many eye-catching results. However, deploying CNNs on resource-constrained edge devices is constrained by limited memory bandwidth for transmitting large intermediated data during inference, i.e., activation. Existing research utilizes mixed-precision and dimension reduction to reduce computational complexity but pays less attention…

    Submitted 18 July, 2022; v1 submitted 16 July, 2022; originally announced July 2022.

  6. arXiv:2205.05675  [pdf, other]

    cs.CV eess.IV

    NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

    Authors: Yawei Li, Kai Zhang, Radu Timofte, Luc Van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang, et al. (86 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2022 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The task of the challenge was to super-resolve an input image with a magnification factor of $\times$4 based on pairs of low and corresponding high resolution images. The aim was to design a network for single image super-resolution that achieved improvement of e…

    Submitted 11 May, 2022; originally announced May 2022.

    Comments: Validation code of the baseline model is available at https://github.com/ofsoundof/IMDN. Validation of all submitted models is available at https://github.com/ofsoundof/NTIRE2022_ESR

  7. arXiv:2112.10184  [pdf]

    eess.IV cs.CV

    A Deep Learning Based Workflow for Detection of Lung Nodules With Chest Radiograph

    Authors: Yang Tai, Yu-Wen Fang, Fang-Yi Su, Jung-Hsien Chiang

    Abstract: PURPOSE: This study aimed to develop a deep learning-based tool to detect and localize lung nodules with chest radiographs (CXRs). We expected it to enhance the efficiency of interpreting CXRs and reduce the possibilities of delayed diagnosis of lung cancer. MATERIALS AND METHODS: We collected CXRs from NCKUH database and VBD, an open-source medical image dataset, as our training and validation d…

    Submitted 11 March, 2022; v1 submitted 19 December, 2021; originally announced December 2021.

  8. arXiv:2112.07948  [pdf, other]

    cs.CV eess.IV

    Transcoded Video Restoration by Temporal Spatial Auxiliary Network

    Authors: Li Xu, Gang He, Jinjia Zhou, Jie Lei, Weiying Xie, Yunsong Li, Yu-Wing Tai

    Abstract: In most video platforms, such as YouTube and TikTok, the played videos usually have undergone multiple video encodings such as hardware encoding by recording devices, software encoding by video editing apps, and single/multiple video transcoding by video application servers. Previous works in compressed video restoration typically assume the compression artifacts are caused by one-time encoding.…

    Submitted 15 December, 2021; originally announced December 2021.

    Comments: Accepted by AAAI2022

  9. arXiv:2110.08828  [pdf]

    cs.CV cs.LG eess.SP

    Compression-aware Projection with Greedy Dimension Reduction for Convolutional Neural Network Activations

    Authors: Yu-Shan Tai, Chieh-Fang Teng, Cheng-Yang Chang, An-Yeu Wu

    Abstract: Convolutional neural networks (CNNs) achieve remarkable performance in a wide range of fields. However, intensive memory access of activations introduces considerable energy consumption, impeding deployment of CNNs on resource-constrained edge devices. Existing works in activation compression propose to transform feature maps for higher compressibility, thus enabling dimension reduction. Neverthele…

    Submitted 17 October, 2021; originally announced October 2021.

    Comments: 5 pages, 5 figures, submitted to 2022 ICASSP

  10. arXiv:2108.02656  [pdf]

    eess.IV cs.CV

    A Computer-Aided Diagnosis System for Breast Pathology: A Deep Learning Approach with Model Interpretability from Pathological Perspective

    Authors: Wei-Wen Hsu, Yongfang Wu, Chang Hao, Yu-Ling Hou, Xiang Gao, Yun Shao, Xueli Zhang, Tao He, Yanhong Tai

    Abstract: Objective: We develop a computer-aided diagnosis (CAD) system using deep learning approaches for lesion detection and classification on whole-slide images (WSIs) with breast cancer. The deep features being distinguishing in classification from the convolutional neural networks (CNN) are demonstrated in this study to provide comprehensive interpretability for the proposed CAD system using pathologi…

    Submitted 5 August, 2021; originally announced August 2021.

  11. arXiv:2009.05224  [pdf, other]

    cs.CV cs.LG eess.IV

    HAA500: Human-Centric Atomic Action Dataset with Curated Videos

    Authors: Jihoon Chung, Cheng-hsin Wuu, Hsuan-ru Yang, Yu-Wing Tai, Chi-Keung Tang

    Abstract: We contribute HAA500, a manually annotated human-centric atomic action dataset for action recognition on 500 classes with over 591K labeled frames. To minimize ambiguities in action classification, HAA500 consists of highly diversified classes of fine-grained atomic actions, where only consistent actions fall under the same label, e.g., "Baseball Pitching" vs "Free Throw in Basketball". Thus HAA50…

    Submitted 16 August, 2021; v1 submitted 11 September, 2020; originally announced September 2020.

  12. Cascaded deep monocular 3D human pose estimation with evolutionary training data

    Authors: Shichao Li, Lei Ke, Kevin Pratama, Yu-Wing Tai, Chi-Keung Tang, Kwang-Ting Cheng

    Abstract: End-to-end deep representation learning has achieved remarkable accuracy for monocular 3D human pose estimation, yet these models may fail for unseen poses with limited and fixed training data. This paper proposes a novel data augmentation method that: (1) is scalable for synthesizing massive amount of training data (over 8 million valid 3D human poses with corresponding 2D projections) for traini…

    Submitted 8 April, 2021; v1 submitted 13 June, 2020; originally announced June 2020.

    Comments: Accepted to CVPR 2020 as Oral Presentation

  13. arXiv:2005.03819  [pdf, other]

    cs.CV eess.IV

    One-Shot Object Detection without Fine-Tuning

    Authors: Xiang Li, Lin Zhang, Yau Pun Chen, Yu-Wing Tai, Chi-Keung Tang

    Abstract: Deep learning has revolutionized object detection thanks to large-scale datasets, but their object categories are still arguably very limited. In this paper, we attempt to enrich such categories by addressing the one-shot object detection problem, where the number of annotated training examples for learning an unseen class is limited to one. We introduce a two-stage model consisting of a first sta…

    Submitted 7 May, 2020; originally announced May 2020.

  14. arXiv:2005.01996  [pdf, other]

    eess.IV cs.CV

    NTIRE 2020 Challenge on Real-World Image Super-Resolution: Methods and Results

    Authors: Andreas Lugmayr, Martin Danelljan, Radu Timofte, Namhyuk Ahn, Dongwoon Bai, Jie Cai, Yun Cao, Junyang Chen, Kaihua Cheng, SeYoung Chun, Wei Deng, Mostafa El-Khamy, Chiu Man Ho, Xiaozhong Ji, Amin Kheradmand, Gwantae Kim, Hanseok Ko, Kanghyu Lee, Jungwon Lee, Hao Li, Ziluan Liu, Zhi-Song Liu, Shuai Liu, Yunhua Lu, Zibo Meng, et al. (21 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2020 challenge on real world super-resolution. It focuses on the participating methods and final results. The challenge addresses the real world setting, where paired true high and low-resolution images are unavailable. For training, only one set of source input images is therefore provided along with a set of unpaired high-quality target images. In Track 1: Image Proc…

    Submitted 5 May, 2020; originally announced May 2020.

  15. arXiv:1911.08681  [pdf, other]

    eess.IV

    Color-wise Attention Network for Low-light Image Enhancement

    Authors: Yousef Atoum, Mao Ye, Liu Ren, Ying Tai, Xiaoming Liu

    Abstract: Absence of nearby light sources while capturing an image will degrade the visibility and quality of the captured image, making computer vision tasks difficult. In this paper, a color-wise attention network (CWAN) is proposed for low-light image enhancement based on convolutional neural networks. Motivated by the human visual system when looking at dark images, CWAN learns an end-to-end mapping bet…

    Submitted 16 May, 2020; v1 submitted 19 November, 2019; originally announced November 2019.

    Comments: 8 pages, 9 figures

  16. arXiv:1907.10283  [pdf, other]

    cs.CV eess.IV

    StableNet: Semi-Online, Multi-Scale Deep Video Stabilization

    Authors: Chia-Hung Huang, Hang Yin, Yu-Wing Tai, Chi-Keung Tang

    Abstract: Video stabilization algorithms are of greater importance nowadays with the prevalence of hand-held devices which unavoidably produce videos with undesirable shaky motions. In this paper we propose a data-driven online video stabilization method along with a paired dataset for deep learning. The network processes each unsteady frame progressively in a multi-scale manner, from low resolution to high…

    Submitted 24 July, 2019; originally announced July 2019.

    Comments: Chia-Hung and Hang have equal contribution