Search | arXiv e-print repository

Revisiting Vision-Language Features Adaptation and Inconsistency for Social Media Popularity Prediction

Authors: Chih-Chung Hsu, Chia-Ming Lee, Yu-Fan Lin, Yi-Shiuan Chou, Chih-Yu Jian, Chi-Han Tsai

Abstract: Social media popularity (SMP) prediction is a complex task involving multi-modal data integration. While pre-trained vision-language models (VLMs) like CLIP have been widely adopted for this task, their effectiveness in capturing the unique characteristics of social media content remains unexplored. This paper critically examines the applicability of CLIP-based features in SMP prediction, focusing… ▽ More Social media popularity (SMP) prediction is a complex task involving multi-modal data integration. While pre-trained vision-language models (VLMs) like CLIP have been widely adopted for this task, their effectiveness in capturing the unique characteristics of social media content remains unexplored. This paper critically examines the applicability of CLIP-based features in SMP prediction, focusing on the overlooked phenomenon of semantic inconsistency between images and text in social media posts. Through extensive analysis, we demonstrate that this inconsistency increases with post popularity, challenging the conventional use of VLM features. We provide a comprehensive investigation of semantic inconsistency across different popularity intervals and analyze the impact of VLM feature adaptation on SMP tasks. Our experiments reveal that incorporating inconsistency measures and adapted text features significantly improves model performance, achieving an SRC of 0.729 and an MAE of 1.227. These findings not only enhance SMP prediction accuracy but also provide crucial insights for develo** more targeted approaches in social media analysis. △ Less

Submitted 29 June, 2024; originally announced July 2024.

Comments: Submission of the 7th Social Media Prediction Challenge

arXiv:2406.19941 [pdf, other]

GRACE: Graph-Regularized Attentive Convolutional Entanglement with Laplacian Smoothing for Robust DeepFake Video Detection

Authors: Chih-Chung Hsu, Shao-Ning Chen, Mei-Hsuan Wu, Yi-Fang Wang, Chia-Ming Lee, Yi-Shiuan Chou

Abstract: As DeepFake video manipulation techniques escalate, posing profound threats, the urgent need to develop efficient detection strategies is underscored. However, one particular issue lies with facial images being mis-detected, often originating from degraded videos or adversarial attacks, leading to unexpected temporal artifacts that can undermine the efficacy of DeepFake video detection techniques.… ▽ More As DeepFake video manipulation techniques escalate, posing profound threats, the urgent need to develop efficient detection strategies is underscored. However, one particular issue lies with facial images being mis-detected, often originating from degraded videos or adversarial attacks, leading to unexpected temporal artifacts that can undermine the efficacy of DeepFake video detection techniques. This paper introduces a novel method for robust DeepFake video detection, harnessing the power of the proposed Graph-Regularized Attentive Convolutional Entanglement (GRACE) based on the graph convolutional network with graph Laplacian to address the aforementioned challenges. First, conventional Convolution Neural Networks are deployed to perform spatiotemporal features for the entire video. Then, the spatial and temporal features are mutually entangled by constructing a graph with sparse constraint, enforcing essential features of valid face images in the noisy face sequences remaining, thus augmenting stability and performance for DeepFake video detection. Furthermore, the Graph Laplacian prior is proposed in the graph convolutional network to remove the noise pattern in the feature space to further improve the performance. Comprehensive experiments are conducted to illustrate that our proposed method delivers state-of-the-art performance in DeepFake video detection under noisy face sequences. The source code is available at https://github.com/ming053l/GRACE. △ Less

Submitted 30 June, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

Comments: Submitted to TPAMI 2024

arXiv:2406.01356 [pdf, other]

MP-PolarMask: A Faster and Finer Instance Segmentation for Concave Images

Authors: Ke-Lei Wang, Pin-Hsuan Chou, Young-Ching Chou, Chia-Jen Liu, Cheng-Kuan Lin, Yu-Chee Tseng

Abstract: While there are a lot of models for instance segmentation, PolarMask stands out as a unique one that represents an object by a Polar coordinate system. With an anchor-box-free design and a single-stage framework that conducts detection and segmentation at one time, PolarMask is proved to be able to balance efficiency and accuracy. Hence, it can be easily connected with other downstream real-time a… ▽ More While there are a lot of models for instance segmentation, PolarMask stands out as a unique one that represents an object by a Polar coordinate system. With an anchor-box-free design and a single-stage framework that conducts detection and segmentation at one time, PolarMask is proved to be able to balance efficiency and accuracy. Hence, it can be easily connected with other downstream real-time applications. In this work, we observe that there are two deficiencies associated with PolarMask: (i) inability of representing concave objects and (ii) inefficiency in using ray regression. We propose MP-PolarMask (Multi-Point PolarMask) by taking advantage of multiple Polar systems. The main idea is to extend from one main Polar system to four auxiliary Polar systems, thus capable of representing more complicated convex-and-concave-mixed shapes. We validate MP-PolarMask on both general objects and food objects of the COCO dataset, and the results demonstrate significant improvement of 13.69% in AP_L and 7.23% in AP over PolarMask with 36 rays. △ Less

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.16466 [pdf, other]

High-Performance Temporal Reversible Spiking Neural Networks with $O(L)$ Training Memory and $O(1)$ Inference Cost

Authors: JiaKui Hu, Man Yao, Xuerui Qiu, Yuhong Chou, Yuxuan Cai, Ning Qiao, Yonghong Tian, Bo XU, Guoqi Li

Abstract: Multi-timestep simulation of brain-inspired Spiking Neural Networks (SNNs) boost memory requirements during training and increase inference energy cost. Current training methods cannot simultaneously solve both training and inference dilemmas. This work proposes a novel Temporal Reversible architecture for SNNs (T-RevSNN) to jointly address the training and inference challenges by altering the for… ▽ More Multi-timestep simulation of brain-inspired Spiking Neural Networks (SNNs) boost memory requirements during training and increase inference energy cost. Current training methods cannot simultaneously solve both training and inference dilemmas. This work proposes a novel Temporal Reversible architecture for SNNs (T-RevSNN) to jointly address the training and inference challenges by altering the forward propagation of SNNs. We turn off the temporal dynamics of most spiking neurons and design multi-level temporal reversible interactions at temporal turn-on spiking neurons, resulting in a $O(L)$ training memory. Combined with the temporal reversible nature, we redesign the input encoding and network organization of SNNs to achieve $O(1)$ inference energy cost. Then, we finely adjust the internal units and residual connections of the basic SNN block to ensure the effectiveness of sparse temporal information interaction. T-RevSNN achieves excellent accuracy on ImageNet, while the memory efficiency, training time acceleration, and inference energy efficiency can be significantly improved by $8.6 \times$, $2.0 \times$, and $1.6 \times$, respectively. This work is expected to break the technical bottleneck of significantly increasing memory cost and training time for large-scale SNNs while maintaining high performance and low inference energy cost. Source code and models are available at: https://github.com/BICLab/T-RevSNN. △ Less

Submitted 26 May, 2024; originally announced May 2024.

Comments: Accepted by ICML2024

arXiv:2405.06801 [pdf, other]

doi 10.1109/MNET.2024.3391271

LEO Satellite Network Access in the Wild: Potentials, Experiences, and Challenges

Authors: Sami Ma, Yi Ching Chou, Miao Zhang, Hao Fang, Haoyuan Zhao, Jiangchuan Liu, William I. Atlas

Abstract: In the past three years, working with the Pacific Salmon Foundation and various First Nations groups, we have established Starlink-empowered wild salmon monitoring sites in remote Northern British Columbia, Canada. We report our experiences with the network services in these challenging environments, including deep woods and deep valleys, that lack infrastructural support with some close to Starli… ▽ More In the past three years, working with the Pacific Salmon Foundation and various First Nations groups, we have established Starlink-empowered wild salmon monitoring sites in remote Northern British Columbia, Canada. We report our experiences with the network services in these challenging environments, including deep woods and deep valleys, that lack infrastructural support with some close to Starlink's service boundary at the far north. We assess the portability and mobility of the satellite dishes and the quality of existing network access in underdeveloped countries that Starlink expects to cover. Our experiences suggest that network access based on LEO satellite constellations holds promise but faces hurdles such as energy supply constraints and environmental factors like temperature, precipitation, and solar storms. The presence of wildlife and respecting local residents' culture and heritage pose further complications. We envision several technical solutions addressing the challenges and believe that further regulations will be necessary. △ Less

Submitted 10 May, 2024; originally announced May 2024.

Comments: 8 pages, 6 figures

ACM Class: C.2.1

arXiv:2404.10343 [pdf, other]

The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang , et al. (109 additional authors not shown)

Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such… ▽ More This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks including the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (ie runtime, FLOPs, and parameter count) were considered. The ranking of the main track is calculated based on a weighted sum-up of the scores of all other sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered. The score calculated based on the corresponding FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered. The score calculated based on the corresponding parameters was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions. They gauge the state-of-the-art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained model of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/. △ Less

Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

arXiv:2404.09790 [pdf, other]

NTIRE 2024 Challenge on Image Super-Resolution ($\times$4): Methods and Results

Authors: Zheng Chen, Zongwei Wu, Eduard Zamfir, Kai Zhang, Yulun Zhang, Radu Timofte, Xiaokang Yang, Hongyuan Yu, Cheng Wan, Yuxin Hong, Zhijuan Huang, Yajun Zou, Yuan Huang, Jiamin Lin, Bingnan Han, Xianyu Guan, Yongsheng Yu, Daoan Zhang, Xuanwu Yin, Kunlong Zuo, **hua Hao, Kai Zhao, Kun Yuan, Ming Sun, Chao Zhou , et al. (63 additional authors not shown)

Abstract: This paper reviews the NTIRE 2024 challenge on image super-resolution ($\times$4), highlighting the solutions proposed and the outcomes obtained. The challenge involves generating corresponding high-resolution (HR) images, magnified by a factor of four, from low-resolution (LR) inputs using prior information. The LR images originate from bicubic downsampling degradation. The aim of the challenge i… ▽ More This paper reviews the NTIRE 2024 challenge on image super-resolution ($\times$4), highlighting the solutions proposed and the outcomes obtained. The challenge involves generating corresponding high-resolution (HR) images, magnified by a factor of four, from low-resolution (LR) inputs using prior information. The LR images originate from bicubic downsampling degradation. The aim of the challenge is to obtain designs/solutions with the most advanced SR performance, with no constraints on computational resources (e.g., model size and FLOPs) or training data. The track of this challenge assesses performance with the PSNR metric on the DIV2K testing dataset. The competition attracted 199 registrants, with 20 teams submitting valid entries. This collective endeavour not only pushes the boundaries of performance in single-image SR but also offers a comprehensive overview of current trends in this field. △ Less

Submitted 15 April, 2024; originally announced April 2024.

Comments: NTIRE 2024 webpage: https://cvlai.net/ntire/2024. Code: https://github.com/zhengchen1999/NTIRE2024_ImageSR_x4

arXiv:2404.01643 [pdf, other]

A Closer Look at Spatial-Slice Features Learning for COVID-19 Detection

Authors: Chih-Chung Hsu, Chia-Ming Lee, Yang Fan Chiang, Yi-Shiuan Chou, Chih-Yu Jiang, Shen-Chieh Tai, Chi-Han Tsai

Abstract: Conventional Computed Tomography (CT) imaging recognition faces two significant challenges: (1) There is often considerable variability in the resolution and size of each CT scan, necessitating strict requirements for the input size and adaptability of models. (2) CT-scan contains large number of out-of-distribution (OOD) slices. The crucial features may only be present in specific spatial regions… ▽ More Conventional Computed Tomography (CT) imaging recognition faces two significant challenges: (1) There is often considerable variability in the resolution and size of each CT scan, necessitating strict requirements for the input size and adaptability of models. (2) CT-scan contains large number of out-of-distribution (OOD) slices. The crucial features may only be present in specific spatial regions and slices of the entire CT scan. How can we effectively figure out where these are located? To deal with this, we introduce an enhanced Spatial-Slice Feature Learning (SSFL++) framework specifically designed for CT scan. It aim to filter out a OOD data within whole CT scan, enabling our to select crucial spatial-slice for analysis by reducing 70% redundancy totally. Meanwhile, we proposed Kernel-Density-based slice Sampling (KDS) method to improve the stability when training and inference stage, therefore speeding up the rate of convergence and boosting performance. As a result, the experiments demonstrate the promising performance of our model using a simple EfficientNet-2D (E2D) model, even with only 1% of the training data. The efficacy of our approach has been validated on the COVID-19-CT-DB datasets provided by the DEF-AI-MIA workshop, in conjunction with CVPR 2024. Our source code is available at https://github.com/ming053l/E2D △ Less

Submitted 20 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: Camera-ready version, accepted by DEF-AI-MIA workshop, in conjunted with CVPR2024

arXiv:2404.00722 [pdf, other]

DRCT: Saving Image Super-resolution away from Information Bottleneck

Authors: Chih-Chung Hsu, Chia-Ming Lee, Yi-Shiuan Chou

Abstract: In recent years, Vision Transformer-based approaches for low-level vision tasks have achieved widespread success. Unlike CNN-based models, Transformers are more adept at capturing long-range dependencies, enabling the reconstruction of images utilizing non-local information. In the domain of super-resolution, Swin-transformer-based models have become mainstream due to their capability of global sp… ▽ More In recent years, Vision Transformer-based approaches for low-level vision tasks have achieved widespread success. Unlike CNN-based models, Transformers are more adept at capturing long-range dependencies, enabling the reconstruction of images utilizing non-local information. In the domain of super-resolution, Swin-transformer-based models have become mainstream due to their capability of global spatial information modeling and their shifting-window attention mechanism that facilitates the interchange of information between different windows. Many researchers have enhanced model performance by expanding the receptive fields or designing meticulous networks, yielding commendable results. However, we observed that it is a general phenomenon for the feature map intensity to be abruptly suppressed to small values towards the network's end. This implies an information bottleneck and a diminishment of spatial information, implicitly limiting the model's potential. To address this, we propose the Dense-residual-connected Transformer (DRCT), aimed at mitigating the loss of spatial information and stabilizing the information flow through dense-residual connections between layers, thereby unleashing the model's potential and saving the model away from information bottleneck. Experiment results indicate that our approach surpasses state-of-the-art methods on benchmark datasets and performs commendably at the NTIRE-2024 Image Super-Resolution (x4) Challenge. Our source code is available at https://github.com/ming053l/DRCT △ Less

Submitted 15 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

Comments: Camera-ready version, NTIRE 2024 Image Super-resolution (x4)

arXiv:2403.11230 [pdf, other]

Simple 2D Convolutional Neural Network-based Approach for COVID-19 Detection

Authors: Chih-Chung Hsu, Chia-Ming Lee, Yang Fan Chiang, Yi-Shiuan Chou, Chih-Yu Jiang, Shen-Chieh Tai, Chi-Han Tsai

Abstract: This study explores the use of deep learning techniques for analyzing lung Computed Tomography (CT) images. Classic deep learning approaches face challenges with varying slice counts and resolutions in CT images, a diversity arising from the utilization of assorted scanning equipment. Typically, predictions are made on single slices which are then combined for a comprehensive outcome. Yet, this me… ▽ More This study explores the use of deep learning techniques for analyzing lung Computed Tomography (CT) images. Classic deep learning approaches face challenges with varying slice counts and resolutions in CT images, a diversity arising from the utilization of assorted scanning equipment. Typically, predictions are made on single slices which are then combined for a comprehensive outcome. Yet, this method does not incorporate learning features specific to each slice, leading to a compromise in effectiveness. To address these challenges, we propose an advanced Spatial-Slice Feature Learning (SSFL++) framework specifically tailored for CT scans. It aims to filter out out-of-distribution (OOD) data within the entire CT scan, allowing us to select essential spatial-slice features for analysis by reducing data redundancy by 70\%. Additionally, we introduce a Kernel-Density-based slice Sampling (KDS) method to enhance stability during training and inference phases, thereby accelerating convergence and enhancing overall performance. Remarkably, our experiments reveal that our model achieves promising results with a simple EfficientNet-2D (E2D) model. The effectiveness of our approach is confirmed on the COVID-19-CT-DB datasets provided by the DEF-AI-MIA workshop. △ Less

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2312.06668 [pdf]

Evaluating Self-supervised Speech Models on a Taiwanese Hokkien Corpus

Authors: Yi-Hui Chou, Kalvin Chang, Meng-Ju Wu, Winston Ou, Alice Wen-Hsin Bi, Carol Yang, Bryan Y. Chen, Rong-Wei Pai, Po-Yen Yeh, Jo-Peng Chiang, Iu-Tshian Phoann, Winnie Chang, Chenxuan Cui, Noel Chen, Jiatong Shi

Abstract: Taiwanese Hokkien is declining in use and status due to a language shift towards Mandarin in Taiwan. This is partly why it is a low resource language in NLP and speech research today. To ensure that the state of the art in speech processing does not leave Taiwanese Hokkien behind, we contribute a 1.5-hour dataset of Taiwanese Hokkien to ML-SUPERB's hidden set. Evaluating ML-SUPERB's suite of self-… ▽ More Taiwanese Hokkien is declining in use and status due to a language shift towards Mandarin in Taiwan. This is partly why it is a low resource language in NLP and speech research today. To ensure that the state of the art in speech processing does not leave Taiwanese Hokkien behind, we contribute a 1.5-hour dataset of Taiwanese Hokkien to ML-SUPERB's hidden set. Evaluating ML-SUPERB's suite of self-supervised learning (SSL) speech representations on our dataset, we find that model size does not consistently determine performance. In fact, certain smaller models outperform larger ones. Furthermore, linguistic alignment between pretraining data and the target language plays a crucial role. △ Less

Submitted 5 December, 2023; originally announced December 2023.

Comments: Accepted to ASRU 2023

arXiv:2310.15098 [pdf, other]

doi 10.1007/s11633-023-1380-5

Acquiring Weak Annotations for Tumor Localization in Temporal and Volumetric Data

Authors: Yu-Cheng Chou, Bowen Li, Deng-** Fan, Alan Yuille, Zongwei Zhou

Abstract: Creating large-scale and well-annotated datasets to train AI algorithms is crucial for automated tumor detection and localization. However, with limited resources, it is challenging to determine the best type of annotations when annotating massive amounts of unlabeled data. To address this issue, we focus on polyps in colonoscopy videos and pancreatic tumors in abdominal CT scans; both application… ▽ More Creating large-scale and well-annotated datasets to train AI algorithms is crucial for automated tumor detection and localization. However, with limited resources, it is challenging to determine the best type of annotations when annotating massive amounts of unlabeled data. To address this issue, we focus on polyps in colonoscopy videos and pancreatic tumors in abdominal CT scans; both applications require significant effort and time for pixel-wise annotation due to the high dimensional nature of the data, involving either temporary or spatial dimensions. In this paper, we develop a new annotation strategy, termed Drag&Drop, which simplifies the annotation process to drag and drop. This annotation strategy is more efficient, particularly for temporal and volumetric imaging, than other types of weak annotations, such as per-pixel, bounding boxes, scribbles, ellipses, and points. Furthermore, to exploit our Drag&Drop annotations, we develop a novel weakly supervised learning method based on the watershed algorithm. Experimental results show that our method achieves better detection and localization performance than alternative weak annotations and, more importantly, achieves similar performance to that trained on detailed per-pixel annotations. Interestingly, we find that, with limited resources, allocating weak annotations from a diverse patient population can foster models more robust to unseen images than allocating per-pixel annotations for a small set of images. In summary, this research proposes an efficient annotation strategy for tumor detection and localization that is less accurate than per-pixel annotations but useful for creating large-scale datasets for screening tumors in various medical modalities. △ Less

Submitted 20 February, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

Comments: Published in Machine Intelligence Research

Journal ref: Mach. Intell. Res. (2024)

arXiv:2308.06582 [pdf, other]

Gated Attention Coding for Training High-performance and Efficient Spiking Neural Networks

Authors: Xuerui Qiu, Rui-Jie Zhu, Yuhong Chou, Zhaorui Wang, Liang-jian Deng, Guoqi Li

Abstract: Spiking neural networks (SNNs) are emerging as an energy-efficient alternative to traditional artificial neural networks (ANNs) due to their unique spike-based event-driven nature. Coding is crucial in SNNs as it converts external input stimuli into spatio-temporal feature sequences. However, most existing deep SNNs rely on direct coding that generates powerless spike representation and lacks the… ▽ More Spiking neural networks (SNNs) are emerging as an energy-efficient alternative to traditional artificial neural networks (ANNs) due to their unique spike-based event-driven nature. Coding is crucial in SNNs as it converts external input stimuli into spatio-temporal feature sequences. However, most existing deep SNNs rely on direct coding that generates powerless spike representation and lacks the temporal dynamics inherent in human vision. Hence, we introduce Gated Attention Coding (GAC), a plug-and-play module that leverages the multi-dimensional gated attention unit to efficiently encode inputs into powerful representations before feeding them into the SNN architecture. GAC functions as a preprocessing layer that does not disrupt the spike-driven nature of the SNN, making it amenable to efficient neuromorphic hardware implementation with minimal modifications. Through an observer model theoretical analysis, we demonstrate GAC's attention mechanism improves temporal dynamics and coding efficiency. Experiments on CIFAR10/100 and ImageNet datasets demonstrate that GAC achieves state-of-the-art accuracy with remarkable efficiency. Notably, we improve top-1 accuracy by 3.10\% on CIFAR100 with only 6-time steps and 1.07\% on ImageNet while reducing energy usage to 66.9\% of the previous works. To our best knowledge, it is the first time to explore the attention-based dynamic coding scheme in deep SNNs, with exceptional effectiveness and efficiency on large-scale datasets.The Code is available at https://github.com/bollossom/GAC. △ Less

Submitted 4 June, 2024; v1 submitted 12 August, 2023; originally announced August 2023.

Comments: Accepted by Proceedings of the AAAI Conference on Artificial Intelligence 38 (AAAI 24)

arXiv:2308.04872 [pdf, other]

Tracking Players in a Badminton Court by Two Cameras

Authors: Young-Ching Chou, Shen-Ru Zhang, Bo-Wei Chen, Hong-Qi Chen, Cheng-Kuan Lin, Yu-Chee Tseng

Abstract: This study proposes a simple method for multi-object tracking (MOT) of players in a badminton court. We leverage two off-the-shelf cameras, one on the top of the court and the other on the side of the court. The one on the top is to track players' trajectories, while the one on the side is to analyze the pixel features of players. By computing the correlations between adjacent frames and engaging… ▽ More This study proposes a simple method for multi-object tracking (MOT) of players in a badminton court. We leverage two off-the-shelf cameras, one on the top of the court and the other on the side of the court. The one on the top is to track players' trajectories, while the one on the side is to analyze the pixel features of players. By computing the correlations between adjacent frames and engaging the information of the two cameras, MOT of badminton players is obtained. This two-camera approach addresses the challenge of player occlusion and overlap** in a badminton court, providing player trajectory tracking and multi-angle analysis. The presented system offers insights into the positions and movements of badminton players, thus serving as a coaching or self-training tool for badminton players to improve their gaming strategies. △ Less

Submitted 9 August, 2023; originally announced August 2023.

arXiv:2308.03008 [pdf, other]

Early Detection and Localization of Pancreatic Cancer by Label-Free Tumor Synthesis

Authors: Bowen Li, Yu-Cheng Chou, Shuwen Sun, Hualin Qiao, Alan Yuille, Zongwei Zhou

Abstract: Early detection and localization of pancreatic cancer can increase the 5-year survival rate for patients from 8.5% to 20%. Artificial intelligence (AI) can potentially assist radiologists in detecting pancreatic tumors at an early stage. Training AI models require a vast number of annotated examples, but the availability of CT scans obtaining early-stage tumors is constrained. This is because earl… ▽ More Early detection and localization of pancreatic cancer can increase the 5-year survival rate for patients from 8.5% to 20%. Artificial intelligence (AI) can potentially assist radiologists in detecting pancreatic tumors at an early stage. Training AI models require a vast number of annotated examples, but the availability of CT scans obtaining early-stage tumors is constrained. This is because early-stage tumors may not cause any symptoms, which can delay detection, and the tumors are relatively small and may be almost invisible to human eyes on CT scans. To address this issue, we develop a tumor synthesis method that can synthesize enormous examples of small pancreatic tumors in the healthy pancreas without the need for manual annotation. Our experiments demonstrate that the overall detection rate of pancreatic tumors, measured by Sensitivity and Specificity, achieved by AI trained on synthetic tumors is comparable to that of real tumors. More importantly, our method shows a much higher detection rate for small tumors. We further investigate the per-voxel segmentation performance of pancreatic tumors if AI is trained on a combination of CT scans with synthetic tumors and CT scans with annotated large tumors at an advanced stage. Finally, we show that synthetic tumors improve AI generalizability in tumor detection and localization when processing CT scans from different hospitals. Overall, our proposed tumor synthesis method has immense potential to improve the early detection of pancreatic cancer, leading to better patient outcomes. △ Less

Submitted 5 August, 2023; originally announced August 2023.

Comments: Big Task Small Data, 1001-AI, MICCAI Workshop, 2023

arXiv:2307.11411 [pdf, other]

Deep Directly-Trained Spiking Neural Networks for Object Detection

Authors: Qiaoyi Su, Yuhong Chou, Yifan Hu, Jianing Li, Shijie Mei, Ziyang Zhang, Guoqi Li

Abstract: Spiking neural networks (SNNs) are brain-inspired energy-efficient models that encode information in spatiotemporal dynamics. Recently, deep SNNs trained directly have shown great success in achieving high performance on classification tasks with very few time steps. However, how to design a directly-trained SNN for the regression task of object detection still remains a challenging problem. To ad… ▽ More Spiking neural networks (SNNs) are brain-inspired energy-efficient models that encode information in spatiotemporal dynamics. Recently, deep SNNs trained directly have shown great success in achieving high performance on classification tasks with very few time steps. However, how to design a directly-trained SNN for the regression task of object detection still remains a challenging problem. To address this problem, we propose EMS-YOLO, a novel directly-trained SNN framework for object detection, which is the first trial to train a deep SNN with surrogate gradients for object detection rather than ANN-SNN conversion strategies. Specifically, we design a full-spike residual block, EMS-ResNet, which can effectively extend the depth of the directly-trained SNN with low power consumption. Furthermore, we theoretically analyze and prove the EMS-ResNet could avoid gradient vanishing or exploding. The results demonstrate that our approach outperforms the state-of-the-art ANN-SNN conversion methods (at least 500 time steps) in extremely fewer time steps (only 4 time steps). It is shown that our model could achieve comparable performance to the ANN with the same architecture while consuming 5.83 times less energy on the frame-based COCO Dataset and the event-based Gen1 Dataset. △ Less

Submitted 26 July, 2023; v1 submitted 21 July, 2023; originally announced July 2023.

Comments: Accepted by ICCV2023

arXiv:2306.09607 [pdf, other]

Listener Model for the PhotoBook Referential Game with CLIPScores as Implicit Reference Chain

Authors: Shih-Lun Wu, Yi-Hui Chou, Liangze Li

Abstract: PhotoBook is a collaborative dialogue game where two players receive private, partially-overlap** sets of images and resolve which images they have in common. It presents machines with a great challenge to learn how people build common ground around multimodal context to communicate effectively. Methods developed in the literature, however, cannot be deployed to real gameplay since they only tac… ▽ More PhotoBook is a collaborative dialogue game where two players receive private, partially-overlap** sets of images and resolve which images they have in common. It presents machines with a great challenge to learn how people build common ground around multimodal context to communicate effectively. Methods developed in the literature, however, cannot be deployed to real gameplay since they only tackle some subtasks of the game, and they require additional reference chains inputs, whose extraction process is imperfect. Therefore, we propose a reference chain-free listener model that directly addresses the game's predictive task, i.e., deciding whether an image is shared with partner. Our DeBERTa-based listener model reads the full dialogue, and utilizes CLIPScore features to assess utterance-image relevance. We achieve >77% accuracy on unseen sets of images/game themes, outperforming baseline by >17 points. △ Less

Submitted 15 June, 2023; originally announced June 2023.

Comments: Accepted to ACL 2023 main conference (short paper)

arXiv:2305.12148 [pdf, other]

Probabilistic Modeling: Proving the Lottery Ticket Hypothesis in Spiking Neural Network

Authors: Man Yao, Yuhong Chou, Guangshe Zhao, Xiawu Zheng, Yonghong Tian, Bo Xu, Guoqi Li

Abstract: The Lottery Ticket Hypothesis (LTH) states that a randomly-initialized large neural network contains a small sub-network (i.e., winning tickets) which, when trained in isolation, can achieve comparable performance to the large network. LTH opens up a new path for network pruning. Existing proofs of LTH in Artificial Neural Networks (ANNs) are based on continuous activation functions, such as ReLU,… ▽ More The Lottery Ticket Hypothesis (LTH) states that a randomly-initialized large neural network contains a small sub-network (i.e., winning tickets) which, when trained in isolation, can achieve comparable performance to the large network. LTH opens up a new path for network pruning. Existing proofs of LTH in Artificial Neural Networks (ANNs) are based on continuous activation functions, such as ReLU, which satisfying the Lipschitz condition. However, these theoretical methods are not applicable in Spiking Neural Networks (SNNs) due to the discontinuous of spiking function. We argue that it is possible to extend the scope of LTH by eliminating Lipschitz condition. Specifically, we propose a novel probabilistic modeling approach for spiking neurons with complicated spatio-temporal dynamics. Then we theoretically and experimentally prove that LTH holds in SNNs. According to our theorem, we conclude that pruning directly in accordance with the weight size in existing SNNs is clearly not optimal. We further design a new criterion for pruning based on our theory, which achieves better pruning results than baseline. △ Less

Submitted 20 May, 2023; originally announced May 2023.

Comments: 22pages, 5 figures

arXiv:2303.14334 [pdf, other]

The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces

Authors: Kyle Lo, Joseph Chee Chang, Andrew Head, Jonathan Bragg, Amy X. Zhang, Cassidy Trier, Chloe Anastasiades, Tal August, Russell Authur, Danielle Bragg, Erin Bransom, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Yen-Sung Chen, Evie Yu-Yen Cheng, Yvonne Chou, Doug Downey, Rob Evans, Raymond Fok, Fangzhou Hu, Regan Huff, Dongyeop Kang, Tae Soo Kim, Rodney Kinney , et al. (30 additional authors not shown)

Abstract: Scholarly publications are key to the transfer of knowledge from scholars to others. However, research papers are information-dense, and as the volume of the scientific literature grows, the need for new technology to support the reading process grows. In contrast to the process of finding papers, which has been transformed by Internet technology, the experience of reading research papers has chan… ▽ More Scholarly publications are key to the transfer of knowledge from scholars to others. However, research papers are information-dense, and as the volume of the scientific literature grows, the need for new technology to support the reading process grows. In contrast to the process of finding papers, which has been transformed by Internet technology, the experience of reading research papers has changed little in decades. The PDF format for sharing research papers is widely used due to its portability, but it has significant downsides including: static content, poor accessibility for low-vision readers, and difficulty reading on mobile devices. This paper explores the question "Can recent advances in AI and HCI power intelligent, interactive, and accessible reading interfaces -- even for legacy PDFs?" We describe the Semantic Reader Project, a collaborative effort across multiple institutions to explore automatic creation of dynamic reading interfaces for research papers. Through this project, we've developed ten research prototype interfaces and conducted usability studies with more than 300 participants and real-world users showing improved reading experiences for scholars. We've also released a production reading interface for research papers that will incorporate the best features as they mature. We structure this paper around challenges scholars and the public face when reading research papers -- Discovery, Efficiency, Comprehension, Synthesis, and Accessibility -- and present an overview of our progress and remaining open challenges. △ Less

Submitted 23 April, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

arXiv:2303.13631 [pdf, other]

In-depth analysis of music structure as a text network

Authors: **-Rui Tsai, Yen-Ting Chou, Nathan-Christopher Wang, Hui-Ling Chen, Hong-Yue Huang, Zih-Jia Luo, Tzay-Ming Hong

Abstract: Music, enchanting and poetic, permeates every corner of human civilization. Although music is not unfamiliar to people, our understanding of its essence remains limited, and there is still no universally accepted scientific description. This is primarily due to music being regarded as a product of both reason and emotion, making it difficult to define. In this article, we focus on the fundamental… ▽ More Music, enchanting and poetic, permeates every corner of human civilization. Although music is not unfamiliar to people, our understanding of its essence remains limited, and there is still no universally accepted scientific description. This is primarily due to music being regarded as a product of both reason and emotion, making it difficult to define. In this article, we focus on the fundamental elements of music and construct an evolutionary network from the perspective of music as a natural language, aligning with the statistical characteristics of texts. Through this approach, we aim to comprehend the structural differences in music across different periods, enabling a more scientific exploration of music. Relying on the advantages of structuralism, we can concentrate on the relationships and order between the physical elements of music, rather than getting entangled in the blurred boundaries of science and philosophy. The scientific framework we present not only conforms to past conclusions in music, but also serves as a bridge that connects music to natural language processing and knowledge graphs. △ Less

Submitted 2 January, 2024; v1 submitted 21 March, 2023; originally announced March 2023.

Comments: 7 pages, 8 figures

arXiv:2212.13697 [pdf, other]

Network Characteristics of LEO Satellite Constellations: A Starlink-Based Measurement from End Users

Authors: Sami Ma, Yi Ching Chou, Haoyuan Zhao, Long Chen, Xiaoqiang Ma, Jiangchuan Liu

Abstract: Low Earth orbit Satellite Networks (LSNs) have been advocated as a key infrastructure for truly global coverage in the forthcoming 6G. This paper presents our initial measurement results and observations on the end-to-end network characteristics of Starlink, arguably the largest LSN constellation to date. Our findings confirm that LSNs are a promising solution towards ubiquitous Internet coverage… ▽ More Low Earth orbit Satellite Networks (LSNs) have been advocated as a key infrastructure for truly global coverage in the forthcoming 6G. This paper presents our initial measurement results and observations on the end-to-end network characteristics of Starlink, arguably the largest LSN constellation to date. Our findings confirm that LSNs are a promising solution towards ubiquitous Internet coverage over the Earth; yet, we also find that the users of Starlink experience much more dynamics in throughput and latency than terrestrial network users, and even frequent outages. Its user experiences are heavily affected by environmental factors such as terrain, solar storms, rain, clouds, and temperature, so is the power consumption. We further analyze Starlink's current bent-pipe relay strategy and its limits, particularly for cross-ocean routes. We have also explored its mobility and portability potentials, and extended our experiments from urban cities to wild remote areas that are facing distinct practical and cultural challenges. △ Less

Submitted 27 December, 2022; originally announced December 2022.

Comments: 12 pages, 20 figures, to be published in IEEE INFOCOM 2023

arXiv:2212.09638 [pdf, other]

doi 10.1109/MS.2022.3229894

Co-designing for a Hybrid Workplace Experience in Software Development

Authors: Zhendong Wang, Yi-Hung Chou, Kayla Fathi, Tobias Schimmer, Peter Colligan, David Redmiles, Rafael Prikladnicki

Abstract: With increasing demands for flexible work models, many IT organizations have adapted to hybrid work that promises enhanced team productivity as well as work satisfaction. To achieve productive engineering practice, collaborative product innovation, and effective mentorship in the ensuing hybrid work, we introduce a workshop approach on co-designing for a hybrid workplace experience and provide imp… ▽ More With increasing demands for flexible work models, many IT organizations have adapted to hybrid work that promises enhanced team productivity as well as work satisfaction. To achieve productive engineering practice, collaborative product innovation, and effective mentorship in the ensuing hybrid work, we introduce a workshop approach on co-designing for a hybrid workplace experience and provide implications for continuously improving collaborative software development at scale. △ Less

Submitted 19 December, 2022; originally announced December 2022.

Comments: Accepted at IEEE Software

arXiv:2205.12853 [pdf, other]

doi 10.1007/s11633-022-1365-9

Deep Gradient Learning for Efficient Camouflaged Object Detection

Authors: Ge-Peng Ji, Deng-** Fan, Yu-Cheng Chou, Dengxin Dai, Alexander Liniger, Luc Van Gool

Abstract: This paper introduces DGNet, a novel deep framework that exploits object gradient supervision for camouflaged object detection (COD). It decouples the task into two connected branches, i.e., a context and a texture encoder. The essential connection is the gradient-induced transition, representing a soft grou** between context and texture features. Benefiting from the simple but efficient framewo… ▽ More This paper introduces DGNet, a novel deep framework that exploits object gradient supervision for camouflaged object detection (COD). It decouples the task into two connected branches, i.e., a context and a texture encoder. The essential connection is the gradient-induced transition, representing a soft grou** between context and texture features. Benefiting from the simple but efficient framework, DGNet outperforms existing state-of-the-art COD models by a large margin. Notably, our efficient version, DGNet-S, runs in real-time (80 fps) and achieves comparable results to the cutting-edge model JCSOD-CVPR$_{21}$ with only 6.82% parameters. Application results also show that the proposed DGNet performs well in polyp segmentation, defect detection, and transparent object segmentation tasks. Codes will be made available at https://github.com/GewelsJI/DGNet. △ Less

Submitted 8 August, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

Comments: Accepted by Machine Intelligence Research

Journal ref: Machine Intelligence Research. 20, 92-108 (2023)

arXiv:2204.04090 [pdf, other]

Single-level Adversarial Data Synthesis based on Neural Tangent Kernels

Authors: Yu-Rong Zhang, Ruei-Yang Su, Sheng Yen Chou, Shan-Hung Wu

Abstract: Abstract Generative adversarial networks (GANs) have achieved impressive performance in data synthesis and have driven the development of many applications. However, GANs are known to be hard to train due to their bilevel objective, which leads to the problems of convergence, mode collapse, and gradient vanishing. In this paper, we propose a new generative model called the generative adversarial N… ▽ More Abstract Generative adversarial networks (GANs) have achieved impressive performance in data synthesis and have driven the development of many applications. However, GANs are known to be hard to train due to their bilevel objective, which leads to the problems of convergence, mode collapse, and gradient vanishing. In this paper, we propose a new generative model called the generative adversarial NTK (GA-NTK) that has a single-level objective. The GA-NTK keeps the spirit of adversarial learning (which helps generate plausible data) while avoiding the training difficulties of GANs. This is done by modeling the discriminator as a Gaussian process with a neural tangent kernel (NTK-GP) whose training dynamics can be completely described by a closed-form formula. We analyze the convergence behavior of GA-NTK trained by gradient descent and give some sufficient conditions for convergence. We also conduct extensive experiments to study the advantages and limitations of GA-NTK and propose some techniques that make GA-NTK more practical. △ Less

Submitted 20 November, 2022; v1 submitted 8 April, 2022; originally announced April 2022.

arXiv:2203.14291 [pdf, other]

doi 10.1007/s11633-022-1371-y

Video Polyp Segmentation: A Deep Learning Perspective

Authors: Ge-Peng Ji, Guobao Xiao, Yu-Cheng Chou, Deng-** Fan, Kai Zhao, Geng Chen, Luc Van Gool

Abstract: We present the first comprehensive video polyp segmentation (VPS) study in the deep learning era. Over the years, developments in VPS are not moving forward with ease due to the lack of large-scale fine-grained segmentation annotations. To address this issue, we first introduce a high-quality frame-by-frame annotated VPS dataset, named SUN-SEG, which contains 158,690 colonoscopy frames from the we… ▽ More We present the first comprehensive video polyp segmentation (VPS) study in the deep learning era. Over the years, developments in VPS are not moving forward with ease due to the lack of large-scale fine-grained segmentation annotations. To address this issue, we first introduce a high-quality frame-by-frame annotated VPS dataset, named SUN-SEG, which contains 158,690 colonoscopy frames from the well-known SUN-database. We provide additional annotations with diverse types, i.e., attribute, object mask, boundary, scribble, and polygon. Second, we design a simple but efficient baseline, dubbed PNS+, consisting of a global encoder, a local encoder, and normalized self-attention (NS) blocks. The global and local encoders receive an anchor frame and multiple successive frames to extract long-term and short-term spatial-temporal representations, which are then progressively updated by two NS blocks. Extensive experiments show that PNS+ achieves the best performance and real-time inference speed (170fps), making it a promising solution for the VPS task. Third, we extensively evaluate 13 representative polyp/object segmentation models on our SUN-SEG dataset and provide attribute-based comparisons. Finally, we discuss several open issues and suggest possible research directions for the VPS community. △ Less

Submitted 31 August, 2022; v1 submitted 27 March, 2022; originally announced March 2022.

Comments: Accepted by Machine Intelligence Research 2022 (Project Page: https://github.com/GewelsJI/VPS)

Journal ref: Machine Intelligence Research, vol. 19, no. 6, pp.531-549, 2022

arXiv:2203.02399 [pdf, other]

Benchmarking Instance-Centric Counterfactual Algorithms for XAI: From White Box to Black Box

Authors: Catarina Moreira, Yu-Liang Chou, Chihcheng Hsieh, Chun Ouyang, Joaquim Jorge, João Madeiras Pereira

Abstract: This study investigates the impact of machine learning models on the generation of counterfactual explanations by conducting a benchmark evaluation over three different types of models: a decision tree (fully transparent, interpretable, white-box model), a random forest (semi-interpretable, grey-box model), and a neural network (fully opaque, black-box model). We tested the counterfactual generati… ▽ More This study investigates the impact of machine learning models on the generation of counterfactual explanations by conducting a benchmark evaluation over three different types of models: a decision tree (fully transparent, interpretable, white-box model), a random forest (semi-interpretable, grey-box model), and a neural network (fully opaque, black-box model). We tested the counterfactual generation process using four algorithms (DiCE, WatcherCF, prototype, and GrowingSpheresCF) in the literature in 25 different datasets. Our findings indicate that: (1) Different machine learning models have little impact on the generation of counterfactual explanations; (2) Counterfactual algorithms based uniquely on proximity loss functions are not actionable and will not provide meaningful explanations; (3) One cannot have meaningful evaluation results without guaranteeing plausibility in the counterfactual generation. Algorithms that do not consider plausibility in their internal mechanisms will lead to biased and unreliable conclusions if evaluated with the current state-of-the-art metrics; (4) A counterfactual inspection analysis is strongly recommended to ensure a robust examination of counterfactual explanations and the potential identification of biases. △ Less

Submitted 11 June, 2024; v1 submitted 4 March, 2022; originally announced March 2022.

arXiv:2110.07957 [pdf, other]

Don't speak too fast: The impact of data bias on self-supervised speech models

Authors: Yen Meng, Yi-Hui Chou, Andy T. Liu, Hung-yi Lee

Abstract: Self-supervised Speech Models (S3Ms) have been proven successful in many speech downstream tasks, like ASR. However, how pre-training data affects S3Ms' downstream behavior remains an unexplored issue. In this paper, we study how pre-training data affects S3Ms by pre-training models on biased datasets targeting different factors of speech, including gender, content, and prosody, and evaluate these… ▽ More Self-supervised Speech Models (S3Ms) have been proven successful in many speech downstream tasks, like ASR. However, how pre-training data affects S3Ms' downstream behavior remains an unexplored issue. In this paper, we study how pre-training data affects S3Ms by pre-training models on biased datasets targeting different factors of speech, including gender, content, and prosody, and evaluate these pre-trained S3Ms on selected downstream tasks in SUPERB Benchmark. Our experiments show that S3Ms have tolerance toward gender bias. Moreover, we find that the content of speech has little impact on the performance of S3Ms across downstream tasks, but S3Ms do show a preference toward a slower speech rate. △ Less

Submitted 26 April, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

Comments: Accepted by ICASSP 2022

arXiv:2108.13858 [pdf, other]

GRP-FED: Addressing Client Imbalance in Federated Learning via Global-Regularized Personalization

Authors: Yen-Hsiu Chou, Shenda Hong, Chenxi Sun, Derun Cai, Moxian Song, Hongyan Li

Abstract: Since data is presented long-tailed in reality, it is challenging for Federated Learning (FL) to train across decentralized clients as practical applications. We present Global-Regularized Personalization (GRP-FED) to tackle the data imbalanced issue by considering a single global model and multiple local models for each client. With adaptive aggregation, the global model treats multiple clients f… ▽ More Since data is presented long-tailed in reality, it is challenging for Federated Learning (FL) to train across decentralized clients as practical applications. We present Global-Regularized Personalization (GRP-FED) to tackle the data imbalanced issue by considering a single global model and multiple local models for each client. With adaptive aggregation, the global model treats multiple clients fairly and mitigates the global long-tailed issue. Each local model is learned from the local data and aligns with its distribution for customization. To prevent the local model from just overfitting, GRP-FED applies an adversarial discriminator to regularize between the learned global-local features. Extensive results show that our GRP-FED improves under both global and local scenarios on real-world MIT-BIH and synthesis CIFAR-10 datasets, achieving comparable performance and addressing client imbalance. △ Less

Submitted 31 August, 2021; originally announced August 2021.

Comments: (FL-ICML'21) International Workshop on Federated Learning for User Privacy and Data Confidentiality in Conjunction with ICML 2021

arXiv:2107.05223 [pdf, other]

BERT-like Pre-training for Symbolic Piano Music Classification Tasks

Authors: Yi-Hui Chou, I-Chun Chen, Chin-Jui Chang, Joann Ching, Yi-Hsuan Yang

Abstract: This article presents a benchmark study of symbolic piano music classification using the masked language modelling approach of the Bidirectional Encoder Representations from Transformers (BERT). Specifically, we consider two types of MIDI data: MIDI scores, which are musical scores rendered directly into MIDI with no dynamics and precisely aligned with the metrical grid notated by its composer and… ▽ More This article presents a benchmark study of symbolic piano music classification using the masked language modelling approach of the Bidirectional Encoder Representations from Transformers (BERT). Specifically, we consider two types of MIDI data: MIDI scores, which are musical scores rendered directly into MIDI with no dynamics and precisely aligned with the metrical grid notated by its composer and MIDI performances, which are MIDI encodings of human performances of musical scoresheets. With five public-domain datasets of single-track piano MIDI files, we pre-train two 12-layer Transformer models using the BERT approach, one for MIDI scores and the other for MIDI performances, and fine-tune them for four downstream classification tasks. These include two note-level classification tasks (melody extraction and velocity prediction) and two sequence-level classification tasks (style classification and emotion classification). Our evaluation shows that the BERT approach leads to higher classification accuracy than recurrent neural network (RNN)-based baselines. △ Less

Submitted 13 April, 2024; v1 submitted 12 July, 2021; originally announced July 2021.

Comments: Accepted to Journal of Creative Music Systems

arXiv:2105.10110 [pdf, other]

Guidance and Teaching Network for Video Salient Object Detection

Authors: Yingxia Jiao, Xiao Wang, Yu-Cheng Chou, Shouyuan Yang, Ge-Peng Ji, Rong Zhu, Ge Gao

Abstract: Owing to the difficulties of mining spatial-temporal cues, the existing approaches for video salient object detection (VSOD) are limited in understanding complex and noisy scenarios, and often fail in inferring prominent objects. To alleviate such shortcomings, we propose a simple yet efficient architecture, termed Guidance and Teaching Network (GTNet), to independently distil effective spatial an… ▽ More Owing to the difficulties of mining spatial-temporal cues, the existing approaches for video salient object detection (VSOD) are limited in understanding complex and noisy scenarios, and often fail in inferring prominent objects. To alleviate such shortcomings, we propose a simple yet efficient architecture, termed Guidance and Teaching Network (GTNet), to independently distil effective spatial and temporal cues with implicit guidance and explicit teaching at feature- and decision-level, respectively. To be specific, we (a) introduce a temporal modulator to implicitly bridge features from motion into the appearance branch, which is capable of fusing cross-modal features collaboratively, and (b) utilise motion-guided mask to propagate the explicit cues during the feature aggregation. This novel learning strategy achieves satisfactory results via decoupling the complex spatial-temporal cues and map** informative cues across different modalities. Extensive experiments on three challenging benchmarks show that the proposed method can run at ~28 fps on a single TITAN Xp GPU and perform competitively against 14 cutting-edge baselines. △ Less

Submitted 6 June, 2021; v1 submitted 20 May, 2021; originally announced May 2021.

Comments: Accepted at IEEE ICIP 2021

arXiv:2105.08468 [pdf, other]

doi 10.1007/978-3-030-87193-2_14

Progressively Normalized Self-Attention Network for Video Polyp Segmentation

Authors: Ge-Peng Ji, Yu-Cheng Chou, Deng-** Fan, Geng Chen, Huazhu Fu, Debesh Jha, Ling Shao

Abstract: Existing video polyp segmentation (VPS) models typically employ convolutional neural networks (CNNs) to extract features. However, due to their limited receptive fields, CNNs can not fully exploit the global temporal and spatial information in successive video frames, resulting in false-positive segmentation results. In this paper, we propose the novel PNS-Net (Progressively Normalized Self-attent… ▽ More Existing video polyp segmentation (VPS) models typically employ convolutional neural networks (CNNs) to extract features. However, due to their limited receptive fields, CNNs can not fully exploit the global temporal and spatial information in successive video frames, resulting in false-positive segmentation results. In this paper, we propose the novel PNS-Net (Progressively Normalized Self-attention Network), which can efficiently learn representations from polyp videos with real-time speed (~140fps) on a single RTX 2080 GPU and no post-processing. Our PNS-Net is based solely on a basic normalized self-attention block, equip** with recurrence and CNNs entirely. Experiments on challenging VPS datasets demonstrate that the proposed PNS-Net achieves state-of-the-art performance. We also conduct extensive experiments to study the effectiveness of the channel split, soft-attention, and progressive learning strategy. We find that our PNS-Net works well under different settings, making it a promising solution to the VPS task. △ Less

Submitted 24 May, 2021; v1 submitted 18 May, 2021; originally announced May 2021.

Comments: MICCAI 2021 (Provisional accept); Code: https://github.com/GewelsJI/PNS-Net

arXiv:2104.04492 [pdf, other]

doi 10.1109/ACCESS.2023.3243068

Joint QoS-Aware Scheduling and Precoding for Massive MIMO Systems via Deep Reinforcement Learning

Authors: Chih-Wei Huang, Yen-Cheng Chou, Hong-Yunn Chen, Cheng-Fu Chou

Abstract: The rapid development of mobile networks proliferates the demands of high data rate, low latency, and high-reliability applications for the fifth-generation (5G) and beyond (B5G) mobile networks. Concurrently, the massive multiple-input-multiple-output (MIMO) technology is essential to realize the vision and requires coordination with resource management functions for high user experiences. Though… ▽ More The rapid development of mobile networks proliferates the demands of high data rate, low latency, and high-reliability applications for the fifth-generation (5G) and beyond (B5G) mobile networks. Concurrently, the massive multiple-input-multiple-output (MIMO) technology is essential to realize the vision and requires coordination with resource management functions for high user experiences. Though conventional cross-layer adaptation algorithms have been developed to schedule and allocate network resources, the complexity of resulting rules is high with diverse quality of service (QoS) requirements and B5G features. In this work, we consider a joint user scheduling, antenna allocation, and precoding problem in a massive MIMO system. Instead of directly assigning resources, such as the number of antennas, the allocation process is transformed into a deep reinforcement learning (DRL) based dynamic algorithm selection problem for efficient Markov decision process (MDP) modeling and policy training. Specifically, the proposed utility function integrates QoS requirements and constraints toward a long-term system-wide objective that matches the MDP return. The componentized action structure with action embedding further incorporates the resource management process into the model. Simulations show 7.2% and 12.5% more satisfied users against static algorithm selection and related works under demanding scenarios. △ Less

Submitted 9 April, 2021; originally announced April 2021.

Journal ref: IEEE Access, vol. 11, pp. 13243-13256, 2023

arXiv:2103.04244 [pdf, other]

Counterfactuals and Causability in Explainable Artificial Intelligence: Theory, Algorithms, and Applications

Authors: Yu-Liang Chou, Catarina Moreira, Peter Bruza, Chun Ouyang, Joaquim Jorge

Abstract: There has been a growing interest in model-agnostic methods that can make deep learning models more transparent and explainable to a user. Some researchers recently argued that for a machine to achieve a certain degree of human-level explainability, this machine needs to provide human causally understandable explanations, also known as causability. A specific class of algorithms that have the pote… ▽ More There has been a growing interest in model-agnostic methods that can make deep learning models more transparent and explainable to a user. Some researchers recently argued that for a machine to achieve a certain degree of human-level explainability, this machine needs to provide human causally understandable explanations, also known as causability. A specific class of algorithms that have the potential to provide causability are counterfactuals. This paper presents an in-depth systematic review of the diverse existing body of literature on counterfactuals and causability for explainable artificial intelligence. We performed an LDA topic modelling analysis under a PRISMA framework to find the most relevant literature articles. This analysis resulted in a novel taxonomy that considers the grounding theories of the surveyed algorithms, together with their underlying properties and applications in real-world data. This research suggests that current model-agnostic counterfactual algorithms for explainable AI are not grounded on a causal theoretical formalism and, consequently, cannot promote causability to a human decision-maker. Our findings suggest that the explanations derived from major algorithms in the literature provide spurious correlations rather than cause/effects relationships, leading to sub-optimal, erroneous or even biased explanations. This paper also advances the literature with new directions and challenges on promoting causability in model-agnostic approaches for explainable artificial intelligence. △ Less

Submitted 8 June, 2021; v1 submitted 6 March, 2021; originally announced March 2021.

arXiv:2103.02767 [pdf, other]

doi 10.1007/978-3-030-59520-3_1

Contrast Adaptive Tissue Classification by Alternating Segmentation and Synthesis

Authors: Dzung L. Pham, Yi-Yu Chou, Blake E. Dewey, Daniel S. Reich, John A. Butman, Snehashis Roy

Abstract: Deep learning approaches to the segmentation of magnetic resonance images have shown significant promise in automating the quantitative analysis of brain images. However, a continuing challenge has been its sensitivity to the variability of acquisition protocols. Attempting to segment images that have different contrast properties from those within the training data generally leads to significantl… ▽ More Deep learning approaches to the segmentation of magnetic resonance images have shown significant promise in automating the quantitative analysis of brain images. However, a continuing challenge has been its sensitivity to the variability of acquisition protocols. Attempting to segment images that have different contrast properties from those within the training data generally leads to significantly reduced performance. Furthermore, heterogeneous data sets cannot be easily evaluated because the quantitative variation due to acquisition differences often dwarfs the variation due to the biological differences that one seeks to measure. In this work, we describe an approach using alternating segmentation and synthesis steps that adapts the contrast properties of the training data to the input image. This allows input images that do not resemble the training data to be more consistently segmented. A notable advantage of this approach is that only a single example of the acquisition protocol is required to adapt to its contrast properties. We demonstrate the efficacy of our approaching using brain images from a set of human subjects scanned with two different T1-weighted volumetric protocols. △ Less

Submitted 3 March, 2021; originally announced March 2021.

Comments: 10 pages. MICCAI SASHIMI Workshop 2021

arXiv:2007.10668 [pdf, other]

An Interpretable Probabilistic Approach for Demystifying Black-box Predictive Models

Authors: Catarina Moreira, Yu-Liang Chou, Mythreyi Velmurugan, Chun Ouyang, Renuka Sindhgatta, Peter Bruza

Abstract: The use of sophisticated machine learning models for critical decision making is faced with a challenge that these models are often applied as a "black-box". This has led to an increased interest in interpretable machine learning, where post hoc interpretation presents a useful mechanism for generating interpretations of complex learning models. In this paper, we propose a novel approach underpinn… ▽ More The use of sophisticated machine learning models for critical decision making is faced with a challenge that these models are often applied as a "black-box". This has led to an increased interest in interpretable machine learning, where post hoc interpretation presents a useful mechanism for generating interpretations of complex learning models. In this paper, we propose a novel approach underpinned by an extended framework of Bayesian networks for generating post hoc interpretations of a black-box predictive model. The framework supports extracting a Bayesian network as an approximation of the black-box model for a specific prediction. Compared to the existing post hoc interpretation methods, the contribution of our approach is three-fold. Firstly, the extracted Bayesian network, as a probabilistic graphical model, can provide interpretations about not only what input features but also why these features contributed to a prediction. Secondly, for complex decision problems with many features, a Markov blanket can be generated from the extracted Bayesian network to provide interpretations with a focused view on those input features that directly contributed to a prediction. Thirdly, the extracted Bayesian network enables the identification of four different rules which can inform the decision-maker about the confidence level in a prediction, thus hel** the decision-maker assess the reliability of predictions learned by a black-box model. We implemented the proposed approach, applied it in the context of two well-known public datasets and analysed the results, which are made available in an open-source repository. △ Less

Submitted 21 July, 2020; originally announced July 2020.

arXiv:2007.02235 [pdf, other]

Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels

Authors: Yu-Ting Chou, Gang Niu, Hsuan-Tien Lin, Masashi Sugiyama

Abstract: In weakly supervised learning, unbiased risk estimator(URE) is a powerful tool for training classifiers when training and test data are drawn from different distributions. Nevertheless, UREs lead to overfitting in many problem settings when the models are complex like deep networks. In this paper, we investigate reasons for such overfitting by studying a weakly supervised problem called learning w… ▽ More In weakly supervised learning, unbiased risk estimator(URE) is a powerful tool for training classifiers when training and test data are drawn from different distributions. Nevertheless, UREs lead to overfitting in many problem settings when the models are complex like deep networks. In this paper, we investigate reasons for such overfitting by studying a weakly supervised problem called learning with complementary labels. We argue the quality of gradient estimation matters more in risk minimization. Theoretically, we show that a URE gives an unbiased gradient estimator(UGE). Practically, however, UGEs may suffer from huge variance, which causes empirical gradients to be usually far away from true gradients during minimization. To this end, we propose a novel surrogate complementary loss(SCL) framework that trades zero bias with reduced variance and makes empirical gradients more aligned with true gradients in the direction. Thanks to this characteristic, SCL successfully mitigates the overfitting issue and improves URE-based methods. △ Less

Submitted 21 August, 2020; v1 submitted 5 July, 2020; originally announced July 2020.

Comments: Accepted at ICML 2020

arXiv:1805.04980 [pdf, other]

Unifying and Merging Well-trained Deep Neural Networks for Inference Stage

Authors: Yi-Min Chou, Yi-Ming Chan, Jia-Hong Lee, Chih-Yi Chiu, Chu-Song Chen

Abstract: We propose a novel method to merge convolutional neural-nets for the inference stage. Given two well-trained networks that may have different architectures that handle different tasks, our method aligns the layers of the original networks and merges them into a unified model by sharing the representative codes of weights. The shared weights are further re-trained to fine-tune the performance of th… ▽ More We propose a novel method to merge convolutional neural-nets for the inference stage. Given two well-trained networks that may have different architectures that handle different tasks, our method aligns the layers of the original networks and merges them into a unified model by sharing the representative codes of weights. The shared weights are further re-trained to fine-tune the performance of the merged model. The proposed method effectively produces a compact model that may run original tasks simultaneously on resource-limited devices. As it preserves the general architectures and leverages the co-used weights of well-trained networks, a substantial training overhead can be reduced to shorten the system development time. Experimental results demonstrate a satisfactory performance and validate the effectiveness of the method. △ Less

Submitted 13 May, 2018; originally announced May 2018.

Comments: To appear in the 27th International Joint Conference on Artificial Intelligence and the 23rd European Conference on Artificial Intelligence, 2018. (IJCAI-ECAI 2018)

arXiv:1805.04262 [pdf, other]

Stingray Detection of Aerial Images Using Augmented Training Images Generated by A Conditional Generative Model

Authors: Yi-Min Chou, Chien-Hung Chen, Keng-Hao Liu, Chu-Song Chen

Abstract: In this paper, we present an object detection method that tackles the stingray detection problem based on aerial images. In this problem, the images are aerially captured on a sea-surface area by using an Unmanned Aerial Vehicle (UAV), and the stingrays swimming under (but close to) the sea surface are the target we want to detect and locate. To this end, we use a deep object detection method, fas… ▽ More In this paper, we present an object detection method that tackles the stingray detection problem based on aerial images. In this problem, the images are aerially captured on a sea-surface area by using an Unmanned Aerial Vehicle (UAV), and the stingrays swimming under (but close to) the sea surface are the target we want to detect and locate. To this end, we use a deep object detection method, faster RCNN, to train a stingray detector based on a limited training set of images. To boost the performance, we develop a new generative approach, conditional GLO, to increase the training samples of stingray, which is an extension of the Generative Latent Optimization (GLO) approach. Unlike traditional data augmentation methods that generate new data only for image classification, our proposed method that mixes foreground and background together can generate new data for an object detection task, and thus improve the training efficacy of a CNN detector. Experimental results show that satisfiable performance can be obtained by using our approach on stingray detection in aerial images. △ Less

Submitted 25 June, 2018; v1 submitted 11 May, 2018; originally announced May 2018.

Comments: to appear in CVPR 2018 Workshop (CVPR 2018 Workshop and Challenge: Automated Analysis of Marine Video for Environmental Monitoring)

arXiv:1601.07021 [pdf]

Polyhedron Volume-Ratio-based Classification for Image Recognition

Authors: Qingxiang Feng, Jeng-Shyang Pan, Jar-Ferr Yang, Yang-Ting Chou

Abstract: In this paper, a novel method, called polyhedron volume ratio classification (PVRC) is proposed for image recognition In this paper, a novel method, called polyhedron volume ratio classification (PVRC) is proposed for image recognition △ Less

Submitted 26 January, 2016; originally announced January 2016.

arXiv:1506.06366 [pdf]

A Novel Method for Stock Forecasting based on Fuzzy Time Series Combined with the Longest Common/Repeated Sub-sequence

Authors: He-Wen Chen, Zih-Ci Wang, Shu-Yu Kuo, Yao-Hsin Chou

Abstract: Stock price forecasting is an important issue for investors since extreme accuracy in forecasting can bring about high profits. Fuzzy Time Series (FTS) and Longest Common/Repeated Sub-sequence (LCS/LRS) are two important issues for forecasting prices. However, to the best of our knowledge, there are no significant studies using LCS/LRS to predict stock prices. It is impossible that prices stay exa… ▽ More Stock price forecasting is an important issue for investors since extreme accuracy in forecasting can bring about high profits. Fuzzy Time Series (FTS) and Longest Common/Repeated Sub-sequence (LCS/LRS) are two important issues for forecasting prices. However, to the best of our knowledge, there are no significant studies using LCS/LRS to predict stock prices. It is impossible that prices stay exactly the same as historic prices. Therefore, this paper proposes a state-of-the-art method which combines FTS and LCS/LRS to predict stock prices. This method is based on the principle that history will repeat itself. It uses different interval lengths in FTS to fuzzify the prices, and LCS/LRS to look for the same pattern in the historical prices to predict future stock prices. In the experiment, we examine various intervals of fuzzy time sets in order to achieve high prediction accuracy. The proposed method outperforms traditional methods in terms of prediction accuracy and, furthermore, it is easy to implement. △ Less

Submitted 21 June, 2015; originally announced June 2015.

Showing 1–40 of 40 results for author: Chou, Y