-
The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report
Authors:
Bin Ren,
Yawei Li,
Nancy Mehta,
Radu Timofte,
Hongyuan Yu,
Cheng Wan,
Yuxin Hong,
Bingnan Han,
Zhuoyuan Wu,
Yajun Zou,
Yuqing Liu,
Jizhe Li,
Keji He,
Chao Fan,
Heng Zhang,
Xiaolin Zhang,
Xuanwu Yin,
Kunlong Zuo,
Bohao Liao,
Peizhe Xia,
Long Peng,
Zhibo Du,
Xin Di,
Wangkai Li,
Yang Wang
, et al. (109 additional authors not shown)
Abstract:
This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of ×4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks including the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (i.e., runtime, FLOPs, and parameter count) were considered. The ranking of the main track is calculated based on a weighted sum of the scores of all the sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered. The score calculated based on the corresponding FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered. The score calculated based on the corresponding parameters was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions. They gauge the state-of-the-art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained model of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/.
Submitted 25 June, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
-
A Closer Look at Spatial-Slice Features Learning for COVID-19 Detection
Authors:
Chih-Chung Hsu,
Chia-Ming Lee,
Yang Fan Chiang,
Yi-Shiuan Chou,
Chih-Yu Jiang,
Shen-Chieh Tai,
Chi-Han Tsai
Abstract:
Conventional Computed Tomography (CT) imaging recognition faces two significant challenges: (1) There is often considerable variability in the resolution and size of each CT scan, necessitating strict requirements for the input size and adaptability of models. (2) A CT scan contains a large number of out-of-distribution (OOD) slices. The crucial features may only be present in specific spatial regions and slices of the entire CT scan. How can we effectively figure out where these are located? To deal with this, we introduce an enhanced Spatial-Slice Feature Learning (SSFL++) framework specifically designed for CT scans. It aims to filter out OOD data within the whole CT scan, enabling us to select crucial spatial-slice features for analysis by reducing redundancy by 70% in total. Meanwhile, we propose a Kernel-Density-based slice Sampling (KDS) method to improve stability during the training and inference stages, thereby speeding up convergence and boosting performance. As a result, the experiments demonstrate the promising performance of our model using a simple EfficientNet-2D (E2D) model, even with only 1% of the training data. The efficacy of our approach has been validated on the COVID-19-CT-DB datasets provided by the DEF-AI-MIA workshop, in conjunction with CVPR 2024. Our source code is available at https://github.com/ming053l/E2D
Submitted 20 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Simple 2D Convolutional Neural Network-based Approach for COVID-19 Detection
Authors:
Chih-Chung Hsu,
Chia-Ming Lee,
Yang Fan Chiang,
Yi-Shiuan Chou,
Chih-Yu Jiang,
Shen-Chieh Tai,
Chi-Han Tsai
Abstract:
This study explores the use of deep learning techniques for analyzing lung Computed Tomography (CT) images. Classic deep learning approaches face challenges with varying slice counts and resolutions in CT images, a diversity arising from the utilization of assorted scanning equipment. Typically, predictions are made on single slices which are then combined for a comprehensive outcome. Yet, this method does not incorporate learning features specific to each slice, leading to a compromise in effectiveness. To address these challenges, we propose an advanced Spatial-Slice Feature Learning (SSFL++) framework specifically tailored for CT scans. It aims to filter out out-of-distribution (OOD) data within the entire CT scan, allowing us to select essential spatial-slice features for analysis by reducing data redundancy by 70%. Additionally, we introduce a Kernel-Density-based slice Sampling (KDS) method to enhance stability during training and inference phases, thereby accelerating convergence and enhancing overall performance. Remarkably, our experiments reveal that our model achieves promising results with a simple EfficientNet-2D (E2D) model. The effectiveness of our approach is confirmed on the COVID-19-CT-DB datasets provided by the DEF-AI-MIA workshop.
Submitted 17 March, 2024;
originally announced March 2024.
-
Evaluating Self-supervised Speech Models on a Taiwanese Hokkien Corpus
Authors:
Yi-Hui Chou,
Kalvin Chang,
Meng-Ju Wu,
Winston Ou,
Alice Wen-Hsin Bi,
Carol Yang,
Bryan Y. Chen,
Rong-Wei Pai,
Po-Yen Yeh,
Jo-Peng Chiang,
Iu-Tshian Phoann,
Winnie Chang,
Chenxuan Cui,
Noel Chen,
Jiatong Shi
Abstract:
Taiwanese Hokkien is declining in use and status due to a language shift towards Mandarin in Taiwan. This is partly why it is a low resource language in NLP and speech research today. To ensure that the state of the art in speech processing does not leave Taiwanese Hokkien behind, we contribute a 1.5-hour dataset of Taiwanese Hokkien to ML-SUPERB's hidden set. Evaluating ML-SUPERB's suite of self-supervised learning (SSL) speech representations on our dataset, we find that model size does not consistently determine performance. In fact, certain smaller models outperform larger ones. Furthermore, linguistic alignment between pretraining data and the target language plays a crucial role.
Submitted 5 December, 2023;
originally announced December 2023.
-
Early Detection and Localization of Pancreatic Cancer by Label-Free Tumor Synthesis
Authors:
Bowen Li,
Yu-Cheng Chou,
Shuwen Sun,
Hualin Qiao,
Alan Yuille,
Zongwei Zhou
Abstract:
Early detection and localization of pancreatic cancer can increase the 5-year survival rate for patients from 8.5% to 20%. Artificial intelligence (AI) can potentially assist radiologists in detecting pancreatic tumors at an early stage. Training AI models requires a vast number of annotated examples, but the availability of CT scans containing early-stage tumors is constrained. This is because early-stage tumors may not cause any symptoms, which can delay detection, and the tumors are relatively small and may be almost invisible to human eyes on CT scans. To address this issue, we develop a tumor synthesis method that can synthesize a vast number of examples of small pancreatic tumors in the healthy pancreas without the need for manual annotation. Our experiments demonstrate that the overall detection rate of pancreatic tumors, measured by Sensitivity and Specificity, achieved by AI trained on synthetic tumors is comparable to that of AI trained on real tumors. More importantly, our method shows a much higher detection rate for small tumors. We further investigate the per-voxel segmentation performance of pancreatic tumors if AI is trained on a combination of CT scans with synthetic tumors and CT scans with annotated large tumors at an advanced stage. Finally, we show that synthetic tumors improve AI generalizability in tumor detection and localization when processing CT scans from different hospitals. Overall, our proposed tumor synthesis method has immense potential to improve the early detection of pancreatic cancer, leading to better patient outcomes.
Submitted 5 August, 2023;
originally announced August 2023.
-
In-depth analysis of music structure as a text network
Authors:
Ping-Rui Tsai,
Yen-Ting Chou,
Nathan-Christopher Wang,
Hui-Ling Chen,
Hong-Yue Huang,
Zih-Jia Luo,
Tzay-Ming Hong
Abstract:
Music, enchanting and poetic, permeates every corner of human civilization. Although music is not unfamiliar to people, our understanding of its essence remains limited, and there is still no universally accepted scientific description. This is primarily due to music being regarded as a product of both reason and emotion, making it difficult to define. In this article, we focus on the fundamental elements of music and construct an evolutionary network from the perspective of music as a natural language, aligning with the statistical characteristics of texts. Through this approach, we aim to comprehend the structural differences in music across different periods, enabling a more scientific exploration of music. Relying on the advantages of structuralism, we can concentrate on the relationships and order between the physical elements of music, rather than getting entangled in the blurred boundaries of science and philosophy. The scientific framework we present not only conforms to past conclusions in music, but also serves as a bridge that connects music to natural language processing and knowledge graphs.
Submitted 2 January, 2024; v1 submitted 21 March, 2023;
originally announced March 2023.
-
Enhancing Light Extraction of Organic Light Emitting Diodes by Deep-Groove High-index Dielectric Nanomesh Using Large-area Nanoimprint
Authors:
Ji Qi,
Wei Ding,
Qi Zhang,
Yuxuan Wang,
Hao Chen,
Stephen Y. Chou
Abstract:
To solve the conventional conflict between maintaining good charge transport property and achieving high light extraction efficiency when using micro/nanostructure patterned substrates to extract light from organic light emitting diodes (OLEDs), we developed a novel OLED structure, termed High-index Deep-Groove Dielectric Nanomesh OLED (HDNM-OLED), fabricated by large-area nanoimprint lithography (NIL). The key component is a nanostructure-patterned substrate embedded with a high-index deep-groove nanomesh and capped with a low-index planarization layer. The high-index and deep-groove nanomesh efficiently releases the trapped photons to achieve significantly enhanced light extraction. The planarization layer helps to maintain the good charge transport property of the organic active layers deposited on top of it. For a green phosphorescent OLED in our demonstration, with the HDNM-OLED structure, compared to a conventional planar ITO-OLED structure, the external quantum efficiency (EQE) was enhanced by 1.85-fold from 26% to 48% and power efficiency was enhanced by 1.86-fold from 42 lm/W to 79 lm/W. Besides green OLEDs, the HDNM-OLED structure was also shown to work for red- and blue-emitting OLEDs and achieved enhanced light extraction efficiency (1.58-fold for red light, 1.86-fold for blue light) without further structure modification, which demonstrates that the light extraction enhancement by the HDNM-OLED is broadband.
Submitted 31 January, 2023;
originally announced February 2023.
-
Video Polyp Segmentation: A Deep Learning Perspective
Authors:
Ge-Peng Ji,
Guobao Xiao,
Yu-Cheng Chou,
Deng-Ping Fan,
Kai Zhao,
Geng Chen,
Luc Van Gool
Abstract:
We present the first comprehensive video polyp segmentation (VPS) study in the deep learning era. Over the years, developments in VPS are not moving forward with ease due to the lack of large-scale fine-grained segmentation annotations. To address this issue, we first introduce a high-quality frame-by-frame annotated VPS dataset, named SUN-SEG, which contains 158,690 colonoscopy frames from the well-known SUN-database. We provide additional annotations with diverse types, i.e., attribute, object mask, boundary, scribble, and polygon. Second, we design a simple but efficient baseline, dubbed PNS+, consisting of a global encoder, a local encoder, and normalized self-attention (NS) blocks. The global and local encoders receive an anchor frame and multiple successive frames to extract long-term and short-term spatial-temporal representations, which are then progressively updated by two NS blocks. Extensive experiments show that PNS+ achieves the best performance and real-time inference speed (170fps), making it a promising solution for the VPS task. Third, we extensively evaluate 13 representative polyp/object segmentation models on our SUN-SEG dataset and provide attribute-based comparisons. Finally, we discuss several open issues and suggest possible research directions for the VPS community.
Submitted 31 August, 2022; v1 submitted 27 March, 2022;
originally announced March 2022.
-
Don't speak too fast: The impact of data bias on self-supervised speech models
Authors:
Yen Meng,
Yi-Hui Chou,
Andy T. Liu,
Hung-yi Lee
Abstract:
Self-supervised Speech Models (S3Ms) have been proven successful in many speech downstream tasks, like ASR. However, how pre-training data affects S3Ms' downstream behavior remains an unexplored issue. In this paper, we study how pre-training data affects S3Ms by pre-training models on biased datasets targeting different factors of speech, including gender, content, and prosody, and evaluate these pre-trained S3Ms on selected downstream tasks in SUPERB Benchmark. Our experiments show that S3Ms have tolerance toward gender bias. Moreover, we find that the content of speech has little impact on the performance of S3Ms across downstream tasks, but S3Ms do show a preference toward a slower speech rate.
Submitted 26 April, 2022; v1 submitted 15 October, 2021;
originally announced October 2021.
-
BERT-like Pre-training for Symbolic Piano Music Classification Tasks
Authors:
Yi-Hui Chou,
I-Chun Chen,
Chin-Jui Chang,
Joann Ching,
Yi-Hsuan Yang
Abstract:
This article presents a benchmark study of symbolic piano music classification using the masked language modelling approach of the Bidirectional Encoder Representations from Transformers (BERT). Specifically, we consider two types of MIDI data: MIDI scores, which are musical scores rendered directly into MIDI with no dynamics and precisely aligned with the metrical grid notated by the composer, and MIDI performances, which are MIDI encodings of human performances of musical scoresheets. With five public-domain datasets of single-track piano MIDI files, we pre-train two 12-layer Transformer models using the BERT approach, one for MIDI scores and the other for MIDI performances, and fine-tune them for four downstream classification tasks. These include two note-level classification tasks (melody extraction and velocity prediction) and two sequence-level classification tasks (style classification and emotion classification). Our evaluation shows that the BERT approach leads to higher classification accuracy than recurrent neural network (RNN)-based baselines.
Submitted 13 April, 2024; v1 submitted 12 July, 2021;
originally announced July 2021.
-
Contrast Adaptive Tissue Classification by Alternating Segmentation and Synthesis
Authors:
Dzung L. Pham,
Yi-Yu Chou,
Blake E. Dewey,
Daniel S. Reich,
John A. Butman,
Snehashis Roy
Abstract:
Deep learning approaches to the segmentation of magnetic resonance images have shown significant promise in automating the quantitative analysis of brain images. However, a continuing challenge has been their sensitivity to the variability of acquisition protocols. Attempting to segment images that have different contrast properties from those within the training data generally leads to significantly reduced performance. Furthermore, heterogeneous data sets cannot be easily evaluated because the quantitative variation due to acquisition differences often dwarfs the variation due to the biological differences that one seeks to measure. In this work, we describe an approach using alternating segmentation and synthesis steps that adapts the contrast properties of the training data to the input image. This allows input images that do not resemble the training data to be more consistently segmented. A notable advantage of this approach is that only a single example of the acquisition protocol is required to adapt to its contrast properties. We demonstrate the efficacy of our approach using brain images from a set of human subjects scanned with two different T1-weighted volumetric protocols.
Submitted 3 March, 2021;
originally announced March 2021.