Search | arXiv e-print repository

Generalized Jersey Number Recognition Using Multi-task Learning With Orientation-guided Weight Refinement

Authors: Yung-Hui Lin, Yu-Wen Chang, Huang-Chia Shih, Takahiro Ogawa

Abstract: Jersey number recognition (JNR) has always been an important task in sports analytics. Improving recognition accuracy remains an ongoing challenge because images are subject to blurring, occlusion, deformity, and low resolution. Recent research has addressed these problems using number localization and optical character recognition. Some approaches apply player identification schemes to image sequ… ▽ More Jersey number recognition (JNR) has always been an important task in sports analytics. Improving recognition accuracy remains an ongoing challenge because images are subject to blurring, occlusion, deformity, and low resolution. Recent research has addressed these problems using number localization and optical character recognition. Some approaches apply player identification schemes to image sequences, ignoring the impact of human body rotation angles on jersey digit identification. Accurately predicting the number of jersey digits by using a multi-task scheme to recognize each individual digit enables more robust results. Based on the above considerations, this paper proposes a multi-task learning method called the angle-digit refine scheme (ADRS), which combines human body orientation angles and digit number clues to recognize athletic jersey numbers. Based on our experimental results, our approach increases inference information, significantly improving prediction accuracy. Compared to state-of-the-art methods, which can only handle a single type of sport, the proposed method produces a more diverse and practical JNR application. The incorporation of diverse types of team sports such as soccer, football, basketball, volleyball, and baseball into our dataset contributes greatly to generalized JNR in sports analytics. Our accuracy achieves 64.07% on Top-1 and 89.97% on Top-2, with corresponding F1 scores of 67.46% and 90.64%, respectively. △ Less

Submitted 3 June, 2024; originally announced June 2024.

Comments: 10 pages, 6 figures, 5 tables

arXiv:2402.06982 [pdf, other]

Treatment-wise Glioblastoma Survival Inference with Multi-parametric Preoperative MRI

Authors: Xiaofeng Liu, Nadya Shusharina, Helen A Shih, C. -C. Jay Kuo, Georges El Fakhri, Jonghye Woo

Abstract: In this work, we aim to predict the survival time (ST) of glioblastoma (GBM) patients undergoing different treatments based on preoperative magnetic resonance (MR) scans. The personalized and precise treatment planning can be achieved by comparing the ST of different treatments. It is well established that both the current status of the patient (as represented by the MR scans) and the choice of tr… ▽ More In this work, we aim to predict the survival time (ST) of glioblastoma (GBM) patients undergoing different treatments based on preoperative magnetic resonance (MR) scans. The personalized and precise treatment planning can be achieved by comparing the ST of different treatments. It is well established that both the current status of the patient (as represented by the MR scans) and the choice of treatment are the cause of ST. While previous related MR-based glioblastoma ST studies have focused only on the direct map** of MR scans to ST, they have not included the underlying causal relationship between treatments and ST. To address this limitation, we propose a treatment-conditioned regression model for glioblastoma ST that incorporates treatment information in addition to MR scans. Our approach allows us to effectively utilize the data from all of the treatments in a unified manner, rather than having to train separate models for each of the treatments. Furthermore, treatment can be effectively injected into each convolutional layer through the adaptive instance normalization we employ. We evaluate our framework on the BraTS20 ST prediction task. Three treatment options are considered: Gross Total Resection (GTR), Subtotal Resection (STR), and no resection. The evaluation results demonstrate the effectiveness of injecting the treatment for estimating GBM survival. △ Less

Submitted 10 February, 2024; originally announced February 2024.

Comments: SPIE Medical Imaging 2024: Computer-Aided Diagnosis

arXiv:2305.19404 [pdf, other]

Incremental Learning for Heterogeneous Structure Segmentation in Brain Tumor MRI

Authors: Xiaofeng Liu, Helen A. Shih, Fangxu Xing, Emiliano Santarnecchi, Georges El Fakhri, Jonghye Woo

Abstract: Deep learning (DL) models for segmenting various anatomical structures have achieved great success via a static DL model that is trained in a single source domain. Yet, the static DL model is likely to perform poorly in a continually evolving environment, requiring appropriate model updates. In an incremental learning setting, we would expect that well-trained static models are updated, following… ▽ More Deep learning (DL) models for segmenting various anatomical structures have achieved great success via a static DL model that is trained in a single source domain. Yet, the static DL model is likely to perform poorly in a continually evolving environment, requiring appropriate model updates. In an incremental learning setting, we would expect that well-trained static models are updated, following continually evolving target domain data -- e.g., additional lesions or structures of interest -- collected from different sites, without catastrophic forgetting. This, however, poses challenges, due to distribution shifts, additional structures not seen during the initial model training, and the absence of training data in a source domain. To address these challenges, in this work, we seek to progressively evolve an ``off-the-shelf" trained segmentation model to diverse datasets with additional anatomical categories in a unified manner. Specifically, we first propose a divergence-aware dual-flow module with balanced rigidity and plasticity branches to decouple old and new tasks, which is guided by continuous batch renormalization. Then, a complementary pseudo-label training scheme with self-entropy regularized momentum MixUp decay is developed for adaptive network optimization. We evaluated our framework on a brain tumor segmentation task with continually changing target domains -- i.e., new MRI scanners/modalities with incremental structures. Our framework was able to well retain the discriminability of previously learned structures, hence enabling the realistic life-long segmentation model extension along with the widespread accumulation of big medical data. △ Less

Submitted 30 May, 2023; originally announced May 2023.

Comments: Early Accept to MICCAI 2023

arXiv:2205.01571 [pdf, other]

doi 10.1109/TVLSI.2022.3149768

A Real Time 1280x720 Object Detection Chip With 585MB/s Memory Traffic

Authors: Kuo-Wei Chang, Hsu-Tung Shih, Tian-Sheuan Chang, Shang-Hong Tsai, Chih-Chyau Yang, Chien-Ming Wu, Chun-Ming Huang

Abstract: Memory bandwidth has become the real-time bottleneck of current deep learning accelerators (DLA), particularly for high definition (HD) object detection. Under resource constraints, this paper proposes a low memory traffic DLA chip with joint hardware and software optimization. To maximize hardware utilization under memory bandwidth, we morph and fuse the object detection model into a group fusion… ▽ More Memory bandwidth has become the real-time bottleneck of current deep learning accelerators (DLA), particularly for high definition (HD) object detection. Under resource constraints, this paper proposes a low memory traffic DLA chip with joint hardware and software optimization. To maximize hardware utilization under memory bandwidth, we morph and fuse the object detection model into a group fusion-ready model to reduce intermediate data access. This reduces the YOLOv2's feature memory traffic from 2.9 GB/s to 0.15 GB/s. To support group fusion, our previous DLA based hardware employes a unified buffer with write-masking for simple layer-by-layer processing in a fusion group. When compared to our previous DLA with the same PE numbers, the chip implemented in a TSMC 40nm process supports 1280x720@30FPS object detection and consumes 7.9X less external DRAM access energy, from 2607 mJ to 327.6 mJ. △ Less

Submitted 2 May, 2022; originally announced May 2022.

Comments: 11 pages, 14 figures, to be published IEEE Transactions on Very Large Scale Integration (VLSI) Systems

arXiv:2205.00779 [pdf, other]

Zebra: Memory Bandwidth Reduction for CNN Accelerators With Zero Block Regularization of Activation Maps

Authors: Hsu-Tung Shih, Tian-Sheuan Chang

Abstract: The large amount of memory bandwidth between local buffer and external DRAM has become the speedup bottleneck of CNN hardware accelerators, especially for activation maps. To reduce memory bandwidth, we propose to learn pruning unimportant blocks dynamically with zero block regularization of activation maps (Zebra). This strategy has low computational overhead and could easily integrate with other… ▽ More The large amount of memory bandwidth between local buffer and external DRAM has become the speedup bottleneck of CNN hardware accelerators, especially for activation maps. To reduce memory bandwidth, we propose to learn pruning unimportant blocks dynamically with zero block regularization of activation maps (Zebra). This strategy has low computational overhead and could easily integrate with other pruning methods for better performance. The experimental results show that the proposed method can reduce 70\% of memory bandwidth for Resnet-18 on Tiny-Imagenet within 1\% accuracy drops and 2\% accuracy gain with the combination of Network Slimming. △ Less

Submitted 2 May, 2022; originally announced May 2022.

Comments: 5 pages, 5 figures, published in IEEE ISCAS 2021

arXiv:2202.10277 [pdf, other]

End-to-End High Accuracy License Plate Recognition Based on Depthwise Separable Convolution Networks

Authors: Song-Ren Wang, Hong-Yang Shih, Zheng-Yi Shen, Wen-Kai Tai

Abstract: Automatic license plate recognition plays a crucial role in modern transportation systems such as for traffic monitoring and vehicle violation detection. In real-world scenarios, license plate recognition still faces many challenges and is impaired by unpredictable interference such as weather or lighting conditions. Many machine learning based ALPR solutions have been proposed to solve such chall… ▽ More Automatic license plate recognition plays a crucial role in modern transportation systems such as for traffic monitoring and vehicle violation detection. In real-world scenarios, license plate recognition still faces many challenges and is impaired by unpredictable interference such as weather or lighting conditions. Many machine learning based ALPR solutions have been proposed to solve such challenges in recent years. However, most are not convincing, either because their results are evaluated on small or simple datasets that lack diverse surroundings, or because they require powerful hardware to achieve a reasonable frames-per-second in real-world applications. In this paper, we propose a novel segmentation-free framework for license plate recognition and introduce NP-ALPR, a diverse and challenging dataset which resembles real-world scenarios. The proposed network model consists of the latest deep learning methods and state-of-the-art ideas, and benefits from a novel network architecture. It achieves higher accuracy with lower computational requirements than previous works. We evaluate the effectiveness of the proposed method on three different datasets and show a recognition accuracy of over 99% and over 70 fps, demonstrating that our method is not only robust but also computationally efficient. △ Less

Submitted 21 February, 2022; originally announced February 2022.

arXiv:1706.03038 [pdf, other]

Okutama-Action: An Aerial View Video Dataset for Concurrent Human Action Detection

Authors: Mohammadamin Barekatain, Miquel Martí, Hsueh-Fu Shih, Samuel Murray, Kotaro Nakayama, Yutaka Matsuo, Helmut Prendinger

Abstract: Despite significant progress in the development of human action detection datasets and algorithms, no current dataset is representative of real-world aerial view scenarios. We present Okutama-Action, a new video dataset for aerial view concurrent human action detection. It consists of 43 minute-long fully-annotated sequences with 12 action classes. Okutama-Action features many challenges missing i… ▽ More Despite significant progress in the development of human action detection datasets and algorithms, no current dataset is representative of real-world aerial view scenarios. We present Okutama-Action, a new video dataset for aerial view concurrent human action detection. It consists of 43 minute-long fully-annotated sequences with 12 action classes. Okutama-Action features many challenges missing in current datasets, including dynamic transition of actions, significant changes in scale and aspect ratio, abrupt camera movement, as well as multi-labeled actors. As a result, our dataset is more challenging than existing ones, and will help push the field forward to enable real-world applications. △ Less

Submitted 15 June, 2017; v1 submitted 9 June, 2017; originally announced June 2017.

Comments: Computer Vision and Pattern Recognition Workshops (CVPRW), Hawaii, USA, 2017

arXiv:1703.01170 [pdf]

doi 10.1109/TCSVT.2017.2655624

A Survey on Content-Aware Video Analysis for Sports

Authors: Huang-Chia Shih

Abstract: Sports data analysis is becoming increasingly large-scale, diversified, and shared, but difficulty persists in rapidly accessing the most crucial information. Previous surveys have focused on the methodologies of sports video analysis from the spatiotemporal viewpoint instead of a content-based viewpoint, and few of these studies have considered semantics. This study develops a deeper interpretati… ▽ More Sports data analysis is becoming increasingly large-scale, diversified, and shared, but difficulty persists in rapidly accessing the most crucial information. Previous surveys have focused on the methodologies of sports video analysis from the spatiotemporal viewpoint instead of a content-based viewpoint, and few of these studies have considered semantics. This study develops a deeper interpretation of content-aware sports video analysis by examining the insight offered by research into the structure of content under different scenarios. On the basis of this insight, we provide an overview of the themes particularly relevant to the research on content-aware systems for broadcast sports. Specifically, we focus on the video content analysis techniques applied in sportscasts over the past decade from the perspectives of fundamentals and general review, a content hierarchical model, and trends and challenges. Content-aware analysis methods are discussed with respect to object-, event-, and context-oriented groups. In each group, the gap between sensation and content excitement must be bridged using proper strategies. In this regard, a content-aware approach is required to determine user demands. Finally, the paper summarizes the future trends and challenges for sports video analysis. We believe that our findings can advance the field of research on content-aware video analysis for broadcast sports. △ Less

Submitted 3 March, 2017; originally announced March 2017.

Comments: Accepted for publication in IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)

Showing 1–8 of 8 results for author: Shih, H