Search | arXiv e-print repository

Roadside Units Assisted Localized Automated Vehicle Maneuvering: An Offline Reinforcement Learning Approach

Authors: Kui Wang, Changyang She, Zongdian Li, Tao Yu, Yonghui Li, Kei Sakaguchi

Abstract: Traffic intersections present significant challenges for the safe and efficient maneuvering of connected and automated vehicles (CAVs). This research proposes an innovative roadside unit (RSU)-assisted cooperative maneuvering system aimed at enhancing road safety and traveling efficiency at intersections for CAVs. We utilize RSUs for real-time traffic data acquisition and train an offline reinforc… ▽ More Traffic intersections present significant challenges for the safe and efficient maneuvering of connected and automated vehicles (CAVs). This research proposes an innovative roadside unit (RSU)-assisted cooperative maneuvering system aimed at enhancing road safety and traveling efficiency at intersections for CAVs. We utilize RSUs for real-time traffic data acquisition and train an offline reinforcement learning (RL) algorithm based on human driving data. Evaluation results obtained from hardware-in-loop autonomous driving simulations show that our approach employing the twin delayed deep deterministic policy gradient and behavior cloning (TD3+BC), achieves performance comparable to state-of-the-art autonomous driving systems in terms of safety measures while significantly enhancing travel efficiency by up to 17.38% in intersection areas. This paper makes a pivotal contribution to the field of intelligent transportation systems, presenting a breakthrough solution for improving urban traffic flow and safety at intersections. △ Less

Submitted 6 May, 2024; originally announced May 2024.

Comments: 6 pages, 6 figures

arXiv:2403.18992 [pdf]

Tractography with T1-weighted MRI and associated anatomical constraints on clinical quality diffusion MRI

Authors: Tian Yu, Yunhe Li, Michael E. Kim, Chenyu Gao, Qi Yang, Leon Y. Cai, Susane M. Resnick, Lori L. Beason-Held, Daniel C. Moyer, Kurt G. Schilling, Bennett A. Landman

Abstract: Diffusion MRI (dMRI) streamline tractography, the gold standard for in vivo estimation of brain white matter (WM) pathways, has long been considered indicative of macroscopic relationships with WM microstructure. However, recent advances in tractography demonstrated that convolutional recurrent neural networks (CoRNN) trained with a teacher-student framework have the ability to learn and propagate… ▽ More Diffusion MRI (dMRI) streamline tractography, the gold standard for in vivo estimation of brain white matter (WM) pathways, has long been considered indicative of macroscopic relationships with WM microstructure. However, recent advances in tractography demonstrated that convolutional recurrent neural networks (CoRNN) trained with a teacher-student framework have the ability to learn and propagate streamlines directly from T1 and anatomical contexts. Training for this network has previously relied on high-resolution dMRI. In this paper, we generalize the training mechanism to traditional clinical resolution data, which allows generalizability across sensitive and susceptible study populations. We train CoRNN on a small subset of the Baltimore Longitudinal Study of Aging (BLSA), which better resembles clinical protocols. Then, we define a metric, termed the epsilon ball seeding method, to compare T1 tractography and traditional diffusion tractography at the streamline level. Under this metric, T1 tractography generated by CoRNN reproduces diffusion tractography with approximately two millimeters of error. △ Less

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2402.12682 [pdf, other]

doi 10.1109/TIV.2024.3368109

Smart Mobility Digital Twin Based Automated Vehicle Navigation System: A Proof of Concept

Authors: Kui Wang, Zongdian Li, Kazuma Nonomura, Tao Yu, Kei Sakaguchi, Omar Hashash, Walid Saad

Abstract: Digital twins (DTs) have driven major advancements across various industrial domains over the past two decades. With the rapid advancements in autonomous driving and vehicle-to-everything (V2X) technologies, integrating DTs into vehicular platforms is anticipated to further revolutionize smart mobility systems. In this paper, a new smart mobility DT (SMDT) platform is proposed for the control of c… ▽ More Digital twins (DTs) have driven major advancements across various industrial domains over the past two decades. With the rapid advancements in autonomous driving and vehicle-to-everything (V2X) technologies, integrating DTs into vehicular platforms is anticipated to further revolutionize smart mobility systems. In this paper, a new smart mobility DT (SMDT) platform is proposed for the control of connected and automated vehicles (CAVs) over next-generation wireless networks. In particular, the proposed platform enables cloud services to leverage the abilities of DTs to promote the autonomous driving experience. To enhance traffic efficiency and road safety measures, a novel navigation system that exploits available DT information is designed. The SMDT platform and navigation system are implemented with state-of-the-art products, e.g., CAVs and roadside units (RSUs), and emerging technologies, e.g., cloud and cellular V2X (C-V2X). In addition, proof-of-concept (PoC) experiments are conducted to validate system performance. The performance of SMDT is evaluated from two standpoints: (i) the rewards of the proposed navigation system on traffic efficiency and safety and, (ii) the latency and reliability of the SMDT platform. Our experimental results using SUMO-based large-scale traffic simulations show that the proposed SMDT can reduce the average travel time and the blocking probability due to unexpected traffic incidents. Furthermore, the results record a peak overall latency for DT modeling and route planning services to be 155.15 ms and 810.59 ms, respectively, which validates that our proposed design aligns with the 3GPP requirements for emerging V2X use cases and fulfills the targets of the proposed design. Our demonstration video can be found at https://youtu.be/3waQwlaHQkk. △ Less

Submitted 19 February, 2024; originally announced February 2024.

Comments: 15 pages, 10 figures

arXiv:2402.00565 [pdf, other]

A Review of Carsickness Mitigation: Navigating Challenges and Exploiting Opportunities in the Era of Intelligent Vehicles

Authors: Daofei Li, Tingzhe Yu, Binbin Tang

Abstract: Motion sickness (MS) has long been a common complaint in road transportation. However, in the era of driving automation, MS has become an increasingly significant issue. The future intelligent vehicle is envisioned as a mobile space for work or entertainment, but unfortunately passengers' engagement in non-driving tasks may exacerbate MS. Finding effective MS countermeasures is crucial to ensure a… ▽ More Motion sickness (MS) has long been a common complaint in road transportation. However, in the era of driving automation, MS has become an increasingly significant issue. The future intelligent vehicle is envisioned as a mobile space for work or entertainment, but unfortunately passengers' engagement in non-driving tasks may exacerbate MS. Finding effective MS countermeasures is crucial to ensure a pleasant passenger experience. Nevertheless, due to the complex mechanism of MS, there are numerous challenges in mitigating it, hindering the development of practical countermeasures. To address this, we first review two prevalent theories explaining the mechanism of MS. Subsequently, this paper provides a summary of current subjective and objective approaches for quantifying motion sickness levels. Then, it surveys existing methods for alleviating MS, including passenger adjustment, intelligent vehicle solutions, and motion cues of various modalities. Furthermore, we outline the limitations and remaining challenges of current research and highlight novel opportunities in the context of intelligent vehicles. Finally, we propose an integrated framework for alleviating MS. The findings of this review will enhance our understanding of carsickness and offer valuable insights for future research and practice in MS mitigation within modern vehicles. △ Less

Submitted 1 February, 2024; originally announced February 2024.

Comments: 19 pages, 5 figures, 5 tables

arXiv:2401.12235 [pdf]

Stochastic Dynamic Power Dispatch with High Generalization and Few-Shot Adaption via Contextual Meta Graph Reinforcement Learning

Authors: Bairong Deng, Tao Yu, Zhenning Pan, Xuehan Zhang, Yufeng Wu, Qiaoyi Ding

Abstract: Reinforcement learning is an emerging approaches to facilitate multi-stage sequential decision-making problems. This paper studies a real-time multi-stage stochastic power dispatch considering multivariate uncertainties. Current researches suffer from low generalization and practicality, that is, the learned dispatch policy can only handle a specific dispatch scenario, its performance degrades sig… ▽ More Reinforcement learning is an emerging approaches to facilitate multi-stage sequential decision-making problems. This paper studies a real-time multi-stage stochastic power dispatch considering multivariate uncertainties. Current researches suffer from low generalization and practicality, that is, the learned dispatch policy can only handle a specific dispatch scenario, its performance degrades significantly if actual samples and training samples are inconsistent. To fill these gaps, a novel contextual meta graph reinforcement learning (Meta-GRL) for a highly generalized multi-stage optimal dispatch policy is proposed. Specifically, a more general contextual Markov decision process (MDP) and scalable graph representation are introduced to achieve a more generalized multi-stage stochastic power dispatch modeling. An upper meta-learner is proposed to encode context for different dispatch scenarios and learn how to achieve dispatch task identification while the lower policy learner learns context-specified dispatch policy. After sufficient offline learning, this approach can rapidly adapt to unseen and undefined scenarios with only a few updations of the hypothesis judgments generated by the meta-learner. Numerical comparisons with state-of-the-art policies and traditional reinforcement learning verify the optimality, efficiency, adaptability, and scalability of the proposed Meta-GRL. △ Less

Submitted 19 January, 2024; originally announced January 2024.

arXiv:2401.08654 [pdf, other]

Smart Mobility Digital Twin for Automated Driving: Design and Proof-of-Concept

Authors: Kui Wang, Zongdian Li, Tao Yu, Kei Sakaguchi

Abstract: During the past decade, smart mobility and intelligent vehicles have attracted increasing attention, because they promise to create a highly efficient and safe transportation system in the future. Meanwhile, digital twin, as an emerging technology, will play an important role in automated driving and intelligent transportation systems. This technology is applied in this paper to design a platform… ▽ More During the past decade, smart mobility and intelligent vehicles have attracted increasing attention, because they promise to create a highly efficient and safe transportation system in the future. Meanwhile, digital twin, as an emerging technology, will play an important role in automated driving and intelligent transportation systems. This technology is applied in this paper to design a platform for smart mobility, providing large-scale route planning services. Utilizing sensing technologies and cloud/edge computing, we build a digital twin system model that reflects the static and dynamic objects from the real world in real time. With the smart mobility platform, we realize traffic monitoring and route planning through cooperative environment perception to help automated vehicles circumvent jams. A proof-of-concept test with a real vehicle in real traffic is conducted to validate the functions and the delay performance of the proposed platform. △ Less

Submitted 24 December, 2023; originally announced January 2024.

arXiv:2401.02099 [pdf]

Oceanship: A Large-Scale Dataset for Underwater Audio Target Recognition

Authors: Zeyu Li, Suncheng Xiang, Tong Yu, **gsheng Gao, Jiacheng Ruan, Yan** Hu, Ting Liu, Yuzhuo Fu

Abstract: The recognition of underwater audio plays a significant role in identifying a vessel while it is in motion. Underwater target recognition tasks have a wide range of applications in areas such as marine environmental protection, detection of ship radiated noise, underwater noise control, and coastal vessel dispatch. The traditional UATR task involves training a network to extract features from audi… ▽ More The recognition of underwater audio plays a significant role in identifying a vessel while it is in motion. Underwater target recognition tasks have a wide range of applications in areas such as marine environmental protection, detection of ship radiated noise, underwater noise control, and coastal vessel dispatch. The traditional UATR task involves training a network to extract features from audio data and predict the vessel type. The current UATR dataset exhibits shortcomings in both duration and sample quantity. In this paper, we propose Oceanship, a large-scale and diverse underwater audio dataset. This dataset comprises 15 categories, spans a total duration of 121 hours, and includes comprehensive annotation information such as coordinates, velocity, vessel types, and timestamps. We compiled the dataset by crawling and organizing original communication data from the Ocean Communication Network (ONC) database between 2021 and 2022. While audio retrieval tasks are well-established in general audio classification, they have not been explored in the context of underwater audio recognition. Leveraging the Oceanship dataset, we introduce a baseline model named Oceannet for underwater audio retrieval. This model achieves a recall at 1 (R@1) accuracy of 67.11% and a recall at 5 (R@5) accuracy of 99.13% on the Deepship dataset. △ Less

Submitted 10 June, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

Comments: Accepted by ICIC 2024

arXiv:2306.13101 [pdf, other]

BrainNet: Epileptic Wave Detection from SEEG with Hierarchical Graph Diffusion Learning

Authors: Junru Chen, Yang Yang, Tao Yu, Yingying Fan, Xiaolong Mo, Carl Yang

Abstract: Epilepsy is one of the most serious neurological diseases, affecting 1-2% of the world's population. The diagnosis of epilepsy depends heavily on the recognition of epileptic waves, i.e., disordered electrical brainwave activity in the patient's brain. Existing works have begun to employ machine learning models to detect epileptic waves via cortical electroencephalogram (EEG). However, the recentl… ▽ More Epilepsy is one of the most serious neurological diseases, affecting 1-2% of the world's population. The diagnosis of epilepsy depends heavily on the recognition of epileptic waves, i.e., disordered electrical brainwave activity in the patient's brain. Existing works have begun to employ machine learning models to detect epileptic waves via cortical electroencephalogram (EEG). However, the recently developed stereoelectrocorticography (SEEG) method provides information in stereo that is more precise than conventional EEG, and has been broadly applied in clinical practice. Therefore, we propose the first data-driven study to detect epileptic waves in a real-world SEEG dataset. While offering new opportunities, SEEG also poses several challenges. In clinical practice, epileptic wave activities are considered to propagate between different regions in the brain. These propagation paths, also known as the epileptogenic network, are deemed to be a key factor in the context of epilepsy surgery. However, the question of how to extract an exact epileptogenic network for each patient remains an open problem in the field of neuroscience. To address these challenges, we propose a novel model (BrainNet) that jointly learns the dynamic diffusion graphs and models the brain wave diffusion patterns. In addition, our model effectively aids in resisting label imbalance and severe noise by employing several self-supervised learning tasks and a hierarchical framework. By experimenting with the extensive real SEEG dataset obtained from multiple patients, we find that BrainNet outperforms several latest state-of-the-art baselines derived from time-series analysis. △ Less

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2305.18771 [pdf, other]

SFCNeXt: a simple fully convolutional network for effective brain age estimation with small sample size

Authors: Yu Fu, Yanyan Huang, Shunjie Dong, Yalin Wang, Tianbai Yu, Meng Niu, Cheng Zhuo

Abstract: Deep neural networks (DNN) have been designed to predict the chronological age of a healthy brain from T1-weighted magnetic resonance images (T1 MRIs), and the predicted brain age could serve as a valuable biomarker for the early detection of development-related or aging-related disorders. Recent DNN models for brain age estimations usually rely too much on large sample sizes and complex network s… ▽ More Deep neural networks (DNN) have been designed to predict the chronological age of a healthy brain from T1-weighted magnetic resonance images (T1 MRIs), and the predicted brain age could serve as a valuable biomarker for the early detection of development-related or aging-related disorders. Recent DNN models for brain age estimations usually rely too much on large sample sizes and complex network structures for multi-stage feature refinement. However, in clinical application scenarios, researchers usually cannot obtain thousands or tens of thousands of MRIs in each data center for thorough training of these complex models. This paper proposes a simple fully convolutional network (SFCNeXt) for brain age estimation in small-sized cohorts with biased age distributions. The SFCNeXt consists of Single Pathway Encoded ConvNeXt (SPEC) and Hybrid Ranking Loss (HRL), aiming to estimate brain ages in a lightweight way with a sufficient exploration of MRI, age, and ranking features of each batch of subjects. Experimental results demonstrate the superiority and efficiency of our approach. △ Less

Submitted 30 May, 2023; originally announced May 2023.

Comments: This paper has been accepted by IEEE ISBI 2023

arXiv:2305.15526 [pdf, other]

Radiomap Inpainting for Restricted Areas based on Propagation Priority and Depth Map

Authors: Songyang Zhang, Tianhang Yu, Brian Choi, Feng Ouyang, Zhi Ding

Abstract: Providing rich and useful information regarding spectrum activities and propagation channels, radiomaps characterize the detailed distribution of power spectral density (PSD) and are important tools for network planning in modern wireless systems. Generally, radiomaps are constructed from radio strength measurements by deployed sensors and user devices. However, not all areas are accessible for ra… ▽ More Providing rich and useful information regarding spectrum activities and propagation channels, radiomaps characterize the detailed distribution of power spectral density (PSD) and are important tools for network planning in modern wireless systems. Generally, radiomaps are constructed from radio strength measurements by deployed sensors and user devices. However, not all areas are accessible for radio measurements due to physical constraints and security consideration, leading to non-uniformly spaced measurements and blanks on a radiomap. In this work, we explore distribution of radio spectrum strengths in view of surrounding environments, and propose two radiomap inpainting approaches for the reconstruction of radiomaps that cover missing areas. Specifically, we first define a propagation-based priority and integrate exemplar-based inpainting with radio propagation model for fine-resolution small-size missing area reconstruction on a radiomap. Then, we introduce a novel radio depth map and propose a two-step template-perturbation approach for large-size restricted region inpainting. Our experimental results demonstrate the power of the proposed propagation priority and radio depth map in capturing the PSD distribution, as well as the efficacy of the proposed methods for radiomap reconstruction. △ Less

Submitted 24 May, 2023; originally announced May 2023.

Comments: submitted to IEEE journal for possible publication

arXiv:2305.09011 [pdf, other]

The Brain Tumor Segmentation (BraTS) Challenge 2023: Brain MR Image Synthesis for Tumor Segmentation (BraSyn)

Authors: Hongwei Bran Li, Gian Marco Conte, Syed Muhammad Anwar, Florian Kofler, Ivan Ezhov, Koen van Leemput, Marie Piraud, Maria Diaz, Byrone Cole, Evan Calabrese, Jeff Rudie, Felix Meissen, Maruf Adewole, Anastasia Janas, Anahita Fathi Kazerooni, Dominic LaBella, Ahmed W. Moawad, Keyvan Farahani, James Eddy, Timothy Bergquist, Verena Chung, Russell Takeshi Shinohara, Farouk Dako, Walter Wiggins, Zachary Reitman , et al. (43 additional authors not shown)

Abstract: Automated brain tumor segmentation methods have become well-established and reached performance levels offering clear clinical utility. These methods typically rely on four input magnetic resonance imaging (MRI) modalities: T1-weighted images with and without contrast enhancement, T2-weighted images, and FLAIR images. However, some sequences are often missing in clinical practice due to time const… ▽ More Automated brain tumor segmentation methods have become well-established and reached performance levels offering clear clinical utility. These methods typically rely on four input magnetic resonance imaging (MRI) modalities: T1-weighted images with and without contrast enhancement, T2-weighted images, and FLAIR images. However, some sequences are often missing in clinical practice due to time constraints or image artifacts, such as patient motion. Consequently, the ability to substitute missing modalities and gain segmentation performance is highly desirable and necessary for the broader adoption of these algorithms in the clinical routine. In this work, we present the establishment of the Brain MR Image Synthesis Benchmark (BraSyn) in conjunction with the Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2023. The primary objective of this challenge is to evaluate image synthesis methods that can realistically generate missing MRI modalities when multiple available images are provided. The ultimate aim is to facilitate automated brain tumor segmentation pipelines. The image dataset used in the benchmark is diverse and multi-modal, created through collaboration with various hospitals and research institutions. △ Less

Submitted 28 June, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

Comments: Technical report of BraSyn

arXiv:2305.05644 [pdf, other]

Towards Building the Federated GPT: Federated Instruction Tuning

Authors: Jianyi Zhang, Saeed Vahidian, Martin Kuo, Chunyuan Li, Ruiyi Zhang, Tong Yu, Yufan Zhou, Guoyin Wang, Yiran Chen

Abstract: While "instruction-tuned" generative large language models (LLMs) have demonstrated an impressive ability to generalize to new tasks, the training phases heavily rely on large amounts of diverse and high-quality instruction data (such as ChatGPT and GPT-4). Unfortunately, acquiring high-quality data, especially when it comes to human-written data, can pose significant challenges both in terms of c… ▽ More While "instruction-tuned" generative large language models (LLMs) have demonstrated an impressive ability to generalize to new tasks, the training phases heavily rely on large amounts of diverse and high-quality instruction data (such as ChatGPT and GPT-4). Unfortunately, acquiring high-quality data, especially when it comes to human-written data, can pose significant challenges both in terms of cost and accessibility. Moreover, concerns related to privacy can further limit access to such data, making the process of obtaining it a complex and nuanced undertaking. Consequently, this hinders the generality of the tuned models and may restrict their effectiveness in certain contexts. To tackle this issue, our study introduces a new approach called Federated Instruction Tuning (FedIT), which leverages federated learning (FL) as the learning framework for the instruction tuning of LLMs. This marks the first exploration of FL-based instruction tuning for LLMs. This is especially important since text data is predominantly generated by end users. Therefore, it is imperative to design and adapt FL approaches to effectively leverage these users' diverse instructions stored on local devices, while preserving privacy and ensuring data security. In the current paper, by conducting widely used GPT-4 auto-evaluation, we demonstrate that by exploiting the heterogeneous and diverse sets of instructions on the client's end with the proposed framework FedIT, we improved the performance of LLMs compared to centralized training with only limited local instructions. Further, in this paper, we developed a Github repository named Shepherd. This repository offers a foundational framework for exploring federated fine-tuning of LLMs using heterogeneous instructions across diverse categories. △ Less

Submitted 29 January, 2024; v1 submitted 9 May, 2023; originally announced May 2023.

Comments: Project page: https://github.com/JayZhang42/FederatedGPT-Shepherd

arXiv:2305.02583 [pdf, other]

Hybrid AHS: A Hybrid of Kalman Filter and Deep Learning for Acoustic Howling Suppression

Authors: Hao Zhang, Meng Yu, Yuzhong Wu, Tao Yu, Dong Yu

Abstract: Deep learning has been recently introduced for efficient acoustic howling suppression (AHS). However, the recurrent nature of howling creates a mismatch between offline training and streaming inference, limiting the quality of enhanced speech. To address this limitation, we propose a hybrid method that combines a Kalman filter with a self-attentive recurrent neural network (SARNN) to leverage thei… ▽ More Deep learning has been recently introduced for efficient acoustic howling suppression (AHS). However, the recurrent nature of howling creates a mismatch between offline training and streaming inference, limiting the quality of enhanced speech. To address this limitation, we propose a hybrid method that combines a Kalman filter with a self-attentive recurrent neural network (SARNN) to leverage their respective advantages for robust AHS. During offline training, a pre-processed signal obtained from the Kalman filter and an ideal microphone signal generated via teacher-forced training strategy are used to train the deep neural network (DNN). During streaming inference, the DNN's parameters are fixed while its output serves as a reference signal for updating the Kalman filter. Evaluation in both offline and streaming inference scenarios using simulated and real-recorded data shows that the proposed method efficiently suppresses howling and consistently outperforms baselines. △ Less

Submitted 4 May, 2023; originally announced May 2023.

Comments: submitted to INTERSPEECH 2023. arXiv admin note: text overlap with arXiv:2302.09252

arXiv:2303.07704 [pdf, other]

TEA-PSE 3.0: Tencent-Ethereal-Audio-Lab Personalized Speech Enhancement System For ICASSP 2023 DNS Challenge

Authors: Yukai Ju, Jun Chen, Shimin Zhang, Shulin He, Wei Rao, Weixin Zhu, Yannan Wang, Tao Yu, Shidong Shang

Abstract: This paper introduces the Unbeatable Team's submission to the ICASSP 2023 Deep Noise Suppression (DNS) Challenge. We expand our previous work, TEA-PSE, to its upgraded version -- TEA-PSE 3.0. Specifically, TEA-PSE 3.0 incorporates a residual LSTM after squeezed temporal convolution network (S-TCN) to enhance sequence modeling capabilities. Additionally, the local-global representation (LGR) struct… ▽ More This paper introduces the Unbeatable Team's submission to the ICASSP 2023 Deep Noise Suppression (DNS) Challenge. We expand our previous work, TEA-PSE, to its upgraded version -- TEA-PSE 3.0. Specifically, TEA-PSE 3.0 incorporates a residual LSTM after squeezed temporal convolution network (S-TCN) to enhance sequence modeling capabilities. Additionally, the local-global representation (LGR) structure is introduced to boost speaker information extraction, and multi-STFT resolution loss is used to effectively capture the time-frequency characteristics of the speech signals. Moreover, retraining methods are employed based on the freeze training strategy to fine-tune the system. According to the official results, TEA-PSE 3.0 ranks 1st in both ICASSP 2023 DNS-Challenge track 1 and track 2. △ Less

Submitted 14 March, 2023; originally announced March 2023.

Comments: Accepted by ICASSP 2023

arXiv:2302.06294 [pdf, other]

doi 10.1016/j.media.2023.102888

CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection

Authors: Chinedu Innocent Nwoye, Tong Yu, Saurav Sharma, Aditya Murali, Deepak Alapatt, Armine Vardazaryan, Kun Yuan, Jonas Hajek, Wolfgang Reiter, Amine Yamlahi, Finn-Henri Smidt, Xiaoyang Zou, Guoyan Zheng, Bruno Oliveira, Helena R. Torres, Satoshi Kondo, Satoshi Kasai, Felix Holm, Ege Özsoy, Shuangchun Gui, Han Li, Sista Raviteja, Rachana Sathish, Pranav Poudel, Binod Bhattarai , et al. (24 additional authors not shown)

Abstract: Formalizing surgical activities as triplets of the used instruments, actions performed, and target anatomies is becoming a gold standard approach for surgical activity modeling. The benefit is that this formalization helps to obtain a more detailed understanding of tool-tissue interaction which can be used to develop better Artificial Intelligence assistance for image-guided surgery. Earlier effor… ▽ More Formalizing surgical activities as triplets of the used instruments, actions performed, and target anatomies is becoming a gold standard approach for surgical activity modeling. The benefit is that this formalization helps to obtain a more detailed understanding of tool-tissue interaction which can be used to develop better Artificial Intelligence assistance for image-guided surgery. Earlier efforts and the CholecTriplet challenge introduced in 2021 have put together techniques aimed at recognizing these triplets from surgical footage. Estimating also the spatial locations of the triplets would offer a more precise intraoperative context-aware decision support for computer-assisted intervention. This paper presents the CholecTriplet2022 challenge, which extends surgical action triplet modeling from recognition to detection. It includes weakly-supervised bounding box localization of every visible surgical instrument (or tool), as the key actors, and the modeling of each tool-activity in the form of <instrument, verb, target> triplet. The paper describes a baseline method and 10 new deep learning algorithms presented at the challenge to solve the task. It also provides thorough methodological comparisons of the methods, an in-depth analysis of the obtained results across multiple metrics, visual and procedural challenges; their significance, and useful insights for future research directions and applications in surgery. △ Less

Submitted 14 July, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: MICCAI EndoVis CholecTriplet2022 challenge report. Published at Elsevier journal of Medical Image Analysis. 25 pages, 15 figures, 8 tables

Journal ref: Medical Image Analysis, Volume 89, 2023, 102888, ISSN 1361-8415

arXiv:2301.00504 [pdf]

Spectral Bandwidth Recovery of Optical Coherence Tomography Images using Deep Learning

Authors: Timothy T. Yu, Da Ma, Jayden Cole, Myeong ** Ju, Mirza F. Beg, Marinko V. Sarunic

Abstract: Optical coherence tomography (OCT) captures cross-sectional data and is used for the screening, monitoring, and treatment planning of retinal diseases. Technological developments to increase the speed of acquisition often results in systems with a narrower spectral bandwidth, and hence a lower axial resolution. Traditionally, image-processing-based techniques have been utilized to reconstruct subs… ▽ More Optical coherence tomography (OCT) captures cross-sectional data and is used for the screening, monitoring, and treatment planning of retinal diseases. Technological developments to increase the speed of acquisition often results in systems with a narrower spectral bandwidth, and hence a lower axial resolution. Traditionally, image-processing-based techniques have been utilized to reconstruct subsampled OCT data and more recently, deep-learning-based methods have been explored. In this study, we simulate reduced axial scan (A-scan) resolution by Gaussian windowing in the spectral domain and investigate the use of a learning-based approach for image feature reconstruction. In anticipation of the reduced resolution that accompanies wide-field OCT systems, we build upon super-resolution techniques to explore methods to better aid clinicians in their decision-making to improve patient outcomes, by reconstructing lost features using a pixel-to-pixel approach with an altered super-resolution generative adversarial network (SRGAN) architecture. △ Less

Submitted 1 January, 2023; originally announced January 2023.

arXiv:2212.11233 [pdf]

Realization Scheme for Visual Cryptography with Computer-generated Holograms

Authors: Tao Yu, **ge Ma, Guilin Li, Dongyu Yang, Rui Ma, Yishi Shi

Abstract: We propose to realize visual cryptography in an indirect way with the help of computer-generated hologram. At present, the recovery method of visual cryptography is mainly superimposed on transparent film or superimposed by computer equipment, which greatly limits the application range of visual cryptography. In this paper, the shares of the visual cryptography were encoded with computer-generated… ▽ More We propose to realize visual cryptography in an indirect way with the help of computer-generated hologram. At present, the recovery method of visual cryptography is mainly superimposed on transparent film or superimposed by computer equipment, which greatly limits the application range of visual cryptography. In this paper, the shares of the visual cryptography were encoded with computer-generated hologram, and the shares is reproduced by optical means, and then superimposed and decrypted. This method can expand the application range of visual cryptography and further increase the security of visual cryptography. △ Less

Submitted 9 December, 2022; originally announced December 2022.

Comments: International Workshop on Holography and related technologies (IWH) 2018

arXiv:2212.00729 [pdf, other]

Edge Deep Learning Enabled Freezing of Gait Detection in Parkinson's Patients

Authors: Ourong Lin, Tian Yu, Yuhan Hou, Yi Zhu, Xilin Liu

Abstract: This paper presents the design of a wireless sensor network for detecting and alerting the freezing of gait (FoG) symptoms in patients with Parkinson's disease. Three sensor nodes, each integrating a 3-axis accelerometer, can be placed on a patient at ankle, thigh, and truck. Each sensor node can independently detect FoG using an on-device deep learning (DL) model, featuring a squeeze and excitati… ▽ More This paper presents the design of a wireless sensor network for detecting and alerting the freezing of gait (FoG) symptoms in patients with Parkinson's disease. Three sensor nodes, each integrating a 3-axis accelerometer, can be placed on a patient at ankle, thigh, and truck. Each sensor node can independently detect FoG using an on-device deep learning (DL) model, featuring a squeeze and excitation convolutional neural network (CNN). In a validation using a public dataset, the prototype developed achieved a FoG detection sensitivity of 88.8% and an F1 score of 85.34%, using less than 20 k trainable parameters per sensor node. Once FoG is detected, an auditory signal will be generated to alert users, and the alarm signal will also be sent to mobile phones for further actions if needed. The sensor node can be easily recharged wirelessly by inductive coupling. The system is self-contained and processes all user data locally without streaming data to external devices or the cloud, thus eliminating the cybersecurity risks and power penalty associated with wireless data transmission. The developed methodology can be used in a wide range of applications. △ Less

Submitted 27 November, 2022; originally announced December 2022.

arXiv:2211.05432 [pdf, other]

Speech Enhancement with Fullband-Subband Cross-Attention Network

Authors: Jun Chen, Wei Rao, Zilin Wang, Zhiyong Wu, Yannan Wang, Tao Yu, Shidong Shang, Helen Meng

Abstract: FullSubNet has shown its promising performance on speech enhancement by utilizing both fullband and subband information. However, the relationship between fullband and subband in FullSubNet is achieved by simply concatenating the output of fullband model and subband units. It only supplements the subband units with a small quantity of global information and has not considered the interaction betwe… ▽ More FullSubNet has shown its promising performance on speech enhancement by utilizing both fullband and subband information. However, the relationship between fullband and subband in FullSubNet is achieved by simply concatenating the output of fullband model and subband units. It only supplements the subband units with a small quantity of global information and has not considered the interaction between fullband and subband. This paper proposes a fullband-subband cross-attention (FSCA) module to interactively fuse the global and local information and applies it to FullSubNet. This new framework is called as FS-CANet. Moreover, different from FullSubNet, the proposed FS-CANet optimize the fullband extractor by temporal convolutional network (TCN) blocks to further reduce the model size. Experimental results on DNS Challenge - Interspeech 2021 dataset show that the proposed FS-CANet outperforms other state-of-the-art speech enhancement approaches, and demonstrate the effectiveness of fullband-subband cross-attention. △ Less

Submitted 10 November, 2022; originally announced November 2022.

Comments: Accepted by InterSpeech 2022. arXiv admin note: text overlap with arXiv:2203.12188

arXiv:2210.02883 [pdf, ps, other]

A Novel Energy Efficiency Metric for Next Generation Wireless Communication Networks

Authors: Tao Yu, Shunqing Zhang, Xiao**g Chen, Xin Wang

Abstract: As a core performance metric for green communications, the conventional energy efficiency definition has successfully resolved many issues in the energy efficient wireless network design. In the past several generations of wireless communication networks, the traditional energy efficiency measure plays an important role to guide many energy saving techniques for slow varying traffic profiles. Howe… ▽ More As a core performance metric for green communications, the conventional energy efficiency definition has successfully resolved many issues in the energy efficient wireless network design. In the past several generations of wireless communication networks, the traditional energy efficiency measure plays an important role to guide many energy saving techniques for slow varying traffic profiles. However, for the next generation wireless networks, the traditional energy efficiency fails to capture the traffic and capacity variations of wireless networks in temporal or spatial domains, which is shown to be quite popular, especially with ultra-scale multiple antennas and space-air-ground integrated network. In this paper, we present a novel energy efficiency metric named integrated relative energy efficiency (IREE), which is able to jointly measure the traffic profiles and the network capacities from the energy efficiency perspective. On top of that, the IREE based green trade-offs have been investigated and compared with the conventional energy efficient design. Moreover, we apply the IREE based green trade-offs to evaluate several candidate technologies for 6G networks, including reconfigurable intelligent surfaces and space-air-ground integrated network. Through some analytical and numerical results, we show that the proposed IREE metric is able to capture the wireless traffic and capacity mismatch property, which is significantly different from the conventional energy efficiency metric. Since the IREE oriented design or deployment strategy is able to consider the network capacity improvement and the wireless traffic matching simultaneously, it can be regarded as a useful guidance for future energy efficient network design. △ Less

Submitted 9 September, 2022; originally announced October 2022.

arXiv:2210.00778 [pdf, other]

On Lattice-Code based Multiple Access: Uplink Architecture and Algorithms

Authors: Tao Yang. Fangtao Yu, Qiuzhuo Chen, Rongke Liu

Abstract: This paper studies a lattice-code based multiple-access (LCMA) framework, and develops a package of processing techniques that are essential to its practical implementation. In the uplink, $K$ users encode their messages with the same ring coded modulation of $2^{m}$-PAM signaling. With it, the integer sum of multiple codewords belongs to the $n$-dimension lattice of the base code. Such property e… ▽ More This paper studies a lattice-code based multiple-access (LCMA) framework, and develops a package of processing techniques that are essential to its practical implementation. In the uplink, $K$ users encode their messages with the same ring coded modulation of $2^{m}$-PAM signaling. With it, the integer sum of multiple codewords belongs to the $n$-dimension lattice of the base code. Such property enables efficient \textit{algebraic binning} for computing linear combinations of $K$ users' messages. For the receiver, we devise two new algorithms, based on linear physical-layer network coding and linear filtering, to calculate the symbol-wise a posteriori probabilities (APPs) w.r.t. the $K$ streams of linear codeword combinations. The resultant APP streams are forwarded to the $q$-ary belief-propagation decoders, which parallelly compute $K$ streams of linear message combinations. Finally, by multiplying the inverse of the coefficient matrix, all users' messages are recovered. Even with single-stage parallel processing, LCMA is shown to support a remarkably larger number of users and exhibits improved frame error rate (FER) relative to existing NOMA systems such as IDMA and SCMA. Further, we propose a new multi-stage LCMA receiver relying on \emph{generalized matrix inversion}. With it, a near-capacity performance is demonstrated for a wide range of system loads. Numerical results demonstrate that the number of users that LCMA can support is no less than 350\% of the length of the spreading sequence or number of receive antennas. Since LCMA relaxes receiver iteration, off-the-shelf channel codes in standards can be directly utilized, avoiding the compatibility and convergence issue of channel code and detector in IDMA and SCMA. △ Less

Submitted 21 June, 2024; v1 submitted 3 October, 2022; originally announced October 2022.

Comments: 30 Pages, 11 figures, submitted to IEEE Trans. Wireless Comm

arXiv:2209.10843 [pdf, ps, other]

A Unified Joint Optimization of Training Sequences and Transceivers Based on Matrix-Monotonic Optimization

Authors: Chengwen Xing, Tao Yu, **peng Song, Zhong Zheng, Lian Zhao, Lajos Hanzo

Abstract: Channel estimation and data transmission constitute the most fundamental functional modules of multiple-input multiple-output (MIMO) communication systems. The underlying key tasks corresponding to these modules are training sequence optimization and transceiver optimization. Hence, we jointly optimize the linear transmit precoder and the training sequence of MIMO systems using the metrics of thei… ▽ More Channel estimation and data transmission constitute the most fundamental functional modules of multiple-input multiple-output (MIMO) communication systems. The underlying key tasks corresponding to these modules are training sequence optimization and transceiver optimization. Hence, we jointly optimize the linear transmit precoder and the training sequence of MIMO systems using the metrics of their effective mutual information (MI), effective mean squared error (MSE), effective weighted MI, effective weighted MSE, as well as their effective generic Schur-convex and Schur-concave functions. Both statistical channel state information (CSI) and estimated CSI are considered at the transmitter in the joint optimization. A unified framework termed as joint matrix-monotonic optimization is proposed. Based on this, the optimal precoder matrix and training matrix structures can be derived for both CSI scenarios. Then, based on the optimal matrix structures, our linear transceivers and their training sequences can be jointly optimized. Compared to state-of-the-art benchmark algorithms, the proposed algorithms visualize the bold explicit relationships between the attainable system performance of our linear transceivers conceived and their training sequences, leading to implementation ready recipes. Finally, several numerical results are provided, which corroborate our theoretical results and demonstrate the compelling benefits of our proposed pilot-aided MIMO solutions. △ Less

Submitted 17 July, 2023; v1 submitted 22 September, 2022; originally announced September 2022.

Comments: 39 pages, 9 figures, 1 table, manuscript accepted in IEEE TVT

arXiv:2209.04566 [pdf, other]

Exemplar-Based Radio Map Reconstruction of Missing Areas Using Propagation Priority

Authors: Songyang Zhang, Tianhang Yu, Jonathan Tivald, Brian Choi, Feng Ouyang, Zhi Ding

Abstract: Radio map describes network coverage and is a practically important tool for network planning in modern wireless systems. Generally, radio strength measurements are collected to construct fine-resolution radio maps for analysis. However, certain protected areas are not accessible for measurement due to physical constraints and security considerations, leading to blanked spaces on a radio map. Non-… ▽ More Radio map describes network coverage and is a practically important tool for network planning in modern wireless systems. Generally, radio strength measurements are collected to construct fine-resolution radio maps for analysis. However, certain protected areas are not accessible for measurement due to physical constraints and security considerations, leading to blanked spaces on a radio map. Non-uniformly spaced measurement and uneven observation resolution make it more difficult for radio map estimation and spectrum planning in protected areas. This work explores the distribution of radio spectrum strengths and proposes an exemplar-based approach to reconstruct missing areas on a radio map. Instead of taking generic image processing approaches, we leverage radio propagation models to determine directions of region filling and develop two different schemes to estimate the missing radio signal power. Our test results based on high-fidelity simulation demonstrate efficacy of the proposed methods for radio map reconstruction. △ Less

Submitted 9 September, 2022; originally announced September 2022.

Comments: To appear in 2022 IEEE Global Communications Conference (Globecom)

arXiv:2208.14635 [pdf, other]

Segmentation-guided Domain Adaptation and Data Harmonization of Multi-device Retinal Optical Coherence Tomography using Cycle-Consistent Generative Adversarial Networks

Authors: Shuo Chen, Da Ma, Sieun Lee, Timothy T. L. Yu, Gavin Xu, Donghuan Lu, Karteek Popuri, Myeong ** Ju, Marinko V. Sarunic, Mirza Faisal Beg

Abstract: Optical Coherence Tomography(OCT) is a non-invasive technique capturing cross-sectional area of the retina in micro-meter resolutions. It has been widely used as a auxiliary imaging reference to detect eye-related pathology and predict longitudinal progression of the disease characteristics. Retina layer segmentation is one of the crucial feature extraction techniques, where the variations of reti… ▽ More Optical Coherence Tomography(OCT) is a non-invasive technique capturing cross-sectional area of the retina in micro-meter resolutions. It has been widely used as a auxiliary imaging reference to detect eye-related pathology and predict longitudinal progression of the disease characteristics. Retina layer segmentation is one of the crucial feature extraction techniques, where the variations of retinal layer thicknesses and the retinal layer deformation due to the presence of the fluid are highly correlated with multiple epidemic eye diseases like Diabetic Retinopathy(DR) and Age-related Macular Degeneration (AMD). However, these images are acquired from different devices, which have different intensity distribution, or in other words, belong to different imaging domains. This paper proposes a segmentation-guided domain-adaptation method to adapt images from multiple devices into single image domain, where the state-of-art pre-trained segmentation model is available. It avoids the time consumption of manual labelling for the upcoming new dataset and the re-training of the existing network. The semantic consistency and global feature consistency of the network will minimize the hallucination effect that many researchers reported regarding Cycle-Consistent Generative Adversarial Networks(CycleGAN) architecture. △ Less

Submitted 31 August, 2022; originally announced August 2022.

Comments: 16 pages, 10 figures

arXiv:2208.13328 [pdf, other]

Slice estimation in diffusion MRI of neonatal and fetal brains in image and spherical harmonics domains using autoencoders

Authors: Hamza Kebiri, Gabriel Girard, Yasser Aleman-Gomez, Thomas Yu, Andras Jakab, Erick Jorge Canales-Rodriguez, Meritxell Bach Cuadra

Abstract: Diffusion MRI (dMRI) of the develo** brain can provide valuable insights into the white matter development. However, slice thickness in fetal dMRI is typically high (i.e., 3-5 mm) to freeze the in-plane motion, which reduces the sensitivity of the dMRI signal to the underlying anatomy. In this study, we aim at overcoming this problem by using autoencoders to learn unsupervised efficient represen… ▽ More Diffusion MRI (dMRI) of the develo** brain can provide valuable insights into the white matter development. However, slice thickness in fetal dMRI is typically high (i.e., 3-5 mm) to freeze the in-plane motion, which reduces the sensitivity of the dMRI signal to the underlying anatomy. In this study, we aim at overcoming this problem by using autoencoders to learn unsupervised efficient representations of brain slices in a latent space, using raw dMRI signals and their spherical harmonics (SH) representation. We first learn and quantitatively validate the autoencoders on the develo** Human Connectome Project pre-term newborn data, and further test the method on fetal data. Our results show that the autoencoder in the signal domain better synthesized the raw signal. Interestingly, the fractional anisotropy and, to a lesser extent, the mean diffusivity, are best recovered in missing slices by using the autoencoder trained with SH coefficients. A comparison was performed with the same maps reconstructed using an autoencoder trained with raw signals, as well as conventional interpolation methods of raw signals and SH coefficients. From these results, we conclude that the recovery of missing/corrupted slices should be performed in the signal domain if the raw signal is aimed to be recovered, and in the SH domain if diffusion tensor properties (i.e., fractional anisotropy) are targeted. Notably, the trained autoencoders were able to generalize to fetal dMRI data acquired using a much smaller number of diffusion gradients and a lower b-value, where we qualitatively show the consistency of the estimated diffusion tensor maps. △ Less

Submitted 28 August, 2022; originally announced August 2022.

arXiv:2207.02663 [pdf, other]

Kaggle Competition: Cantonese Audio-Visual Speech Recognition for In-car Commands

Authors: Wenliang Dai, Samuel Cahyawijaya, Tiezheng Yu, Elham J Barezi, Pascale Fung

Abstract: With the rise of deep learning and intelligent vehicles, the smart assistant has become an essential in-car component to facilitate driving and provide extra functionalities. In-car smart assistants should be able to process general as well as car-related commands and perform corresponding actions, which eases driving and improves safety. However, in this research field, most datasets are in major… ▽ More With the rise of deep learning and intelligent vehicles, the smart assistant has become an essential in-car component to facilitate driving and provide extra functionalities. In-car smart assistants should be able to process general as well as car-related commands and perform corresponding actions, which eases driving and improves safety. However, in this research field, most datasets are in major languages, such as English and Chinese. There is a huge data scarcity issue for low-resource languages, hindering the development of research and applications for broader communities. Therefore, it is crucial to have more benchmarks to raise awareness and motivate the research in low-resource languages. To mitigate this problem, we collect a new dataset, namely Cantonese In-car Audio-Visual Speech Recognition (CI-AVSR), for in-car speech recognition in the Cantonese language with video and audio data. Together with it, we propose Cantonese Audio-Visual Speech Recognition for In-car Commands as a new challenge for the community to tackle low-resource speech recognition under in-car scenarios. △ Less

Submitted 6 July, 2022; originally announced July 2022.

arXiv:2207.00492 [pdf]

Reinforcement Learning Based User-Guided Motion Planning for Human-Robot Collaboration

Authors: Tian Yu, Qing Chang

Abstract: Robots are good at performing repetitive tasks in modern manufacturing industries. However, robot motions are mostly planned and preprogrammed with a notable lack of adaptivity to task changes. Even for slightly changed tasks, the whole system must be reprogrammed by robotics experts. Therefore, it is highly desirable to have a flexible motion planning method, with which robots can adapt to specif… ▽ More Robots are good at performing repetitive tasks in modern manufacturing industries. However, robot motions are mostly planned and preprogrammed with a notable lack of adaptivity to task changes. Even for slightly changed tasks, the whole system must be reprogrammed by robotics experts. Therefore, it is highly desirable to have a flexible motion planning method, with which robots can adapt to specific task changes in unstructured environments, such as production systems or warehouses, with little or no intervention from non-expert personnel. In this paper, we propose a user-guided motion planning algorithm in combination with the reinforcement learning (RL) method to enable robots automatically generate their motion plans for new tasks by learning from a few kinesthetic human demonstrations. To achieve adaptive motion plans for a specific application environment, e.g., desk assembly or warehouse loading/unloading, a library is built by abstracting features of common human demonstrated tasks. The definition of semantical similarity between features in the library and features of a new task is proposed and further used to construct the reward function in RL. The RL policy can automatically generate motion plans for a new task if it determines that new task constraints can be satisfied with the current library and request additional human demonstrations. Multiple experiments conducted on common tasks and scenarios demonstrate that the proposed user-guided RL-assisted motion planning method is effective. △ Less

Submitted 1 July, 2022; originally announced July 2022.

arXiv:2205.04264 [pdf, other]

SwinIQA: Learned Swin Distance for Compressed Image Quality Assessment

Authors: Jianzhao Liu, Xin Li, Yanding Peng, Tao Yu, Zhibo Chen

Abstract: Image compression has raised widespread interest recently due to its significant importance for multimedia storage and transmission. Meanwhile, a reliable image quality assessment (IQA) for compressed images can not only help to verify the performance of various compression algorithms but also help to guide the compression optimization in turn. In this paper, we design a full-reference image quali… ▽ More Image compression has raised widespread interest recently due to its significant importance for multimedia storage and transmission. Meanwhile, a reliable image quality assessment (IQA) for compressed images can not only help to verify the performance of various compression algorithms but also help to guide the compression optimization in turn. In this paper, we design a full-reference image quality assessment metric SwinIQA to measure the perceptual quality of compressed images in a learned Swin distance space. It is known that the compression artifacts are usually non-uniformly distributed with diverse distortion types and degrees. To warp the compressed images into the shared representation space while maintaining the complex distortion information, we extract the hierarchical feature representations from each stage of the Swin Transformer. Besides, we utilize cross attention operation to map the extracted feature representations into a learned Swin distance space. Experimental results show that the proposed metric achieves higher consistency with human's perceptual judgment compared with both traditional methods and learning-based methods on CLIC datasets. △ Less

Submitted 9 May, 2022; originally announced May 2022.

Comments: CVPR2022 Workshop (CLIC) accepted

arXiv:2203.04301 [pdf, other]

doi 10.1016/j.media.2023.102866

Live Laparoscopic Video Retrieval with Compressed Uncertainty

Authors: Tong Yu, Pietro Mascagni, Juan Verde, Jacques Marescaux, Didier Mutter, Nicolas Padoy

Abstract: Searching through large volumes of medical data to retrieve relevant information is a challenging yet crucial task for clinical care. However the primitive and most common approach to retrieval, involving text in the form of keywords, is severely limited when dealing with complex media formats. Content-based retrieval offers a way to overcome this limitation, by using rich media as the query itsel… ▽ More Searching through large volumes of medical data to retrieve relevant information is a challenging yet crucial task for clinical care. However the primitive and most common approach to retrieval, involving text in the form of keywords, is severely limited when dealing with complex media formats. Content-based retrieval offers a way to overcome this limitation, by using rich media as the query itself. Surgical video-to-video retrieval in particular is a new and largely unexplored research problem with high clinical value, especially in the real-time case: using real-time video hashing, search can be achieved directly inside of the operating room. Indeed, the process of hashing converts large data entries into compact binary arrays or hashes, enabling large-scale search operations at a very fast rate. However, due to fluctuations over the course of a video, not all bits in a given hash are equally reliable. In this work, we propose a method capable of mitigating this uncertainty while maintaining a light computational footprint. We present superior retrieval results (3-4 % top 10 mean average precision) on a multi-task evaluation protocol for surgery, using cholecystectomy phases, bypass phases, and coming from an entirely new dataset introduced here, critical events across six different surgery types. Success on this multi-task benchmark shows the generalizability of our approach for surgical video retrieval. △ Less

Submitted 12 June, 2023; v1 submitted 8 March, 2022; originally announced March 2022.

Comments: 16 pages, 13 figures

Journal ref: Medical Image Analysis 88 (2023) 102866

arXiv:2202.06722 [pdf]

doi 10.1049/rpg2.12432

Active and Passive Hybrid Detection Method for Power CPS False Data Injection Attacks with Improved AKF and GRU-CNN

Authors: Zhaoyang Qu, Xiaoyong Bo, Tong Yu, Yaowei Liu, Yunchang Dong, Zhongfeng Kan, Lei Wang, Yang Li

Abstract: Influenced by deep penetration of the new generation of information technology, power systems have gradually evolved into highly coupled cyber-physical systems (CPS). Among many possible power CPS network attacks, a false data injection attacks (FDIAs) is the most serious. Taking account of the fact that the existing knowledge-driven detection process for FDIAs has been in a passive detection stat… ▽ More Influenced by deep penetration of the new generation of information technology, power systems have gradually evolved into highly coupled cyber-physical systems (CPS). Among many possible power CPS network attacks, a false data injection attacks (FDIAs) is the most serious. Taking account of the fact that the existing knowledge-driven detection process for FDIAs has been in a passive detection state for a long time and ignores the advantages of data-driven active capture of features, an active and passive hybrid detection method for power CPS FDIAs with improved adaptive Kalman filter (AKF) and convolutional neural networks (CNN) is proposed in this paper. First, we analyze the shortcomings of the traditional AKF algorithm in terms of filtering divergence and calculation speed. The state estimation algorithm based on non-negative positive-definite adaptive Kalman filter (NDAKF) is improved, and a passive detection method of FDIAs is constructed, with similarity Euclidean distance detection and residual detection at its core. Then, combined with the advantages of gate recurrent unit (GRU) and CNN in terms of temporal memory and feature-expression ability, an active detection method of FDIAs based on a GRU-CNN hybrid neural network is proposed. Finally, the results of joint knowledge-driven and data-driven parallel detection are used to define a mixed fixed-calculation formula, and an active and passive hybrid detection method of FDIAs is established, considering the characteristic constraints of the parallel mode. A simulation system example of power CPS FDIAs verifies the effectiveness and accuracy of the method proposed in this paper. △ Less

Submitted 14 February, 2022; originally announced February 2022.

Comments: Accepted by IET Renewable Power Generation

Journal ref: IET Renewable Power Generation 16 (2022) 1490-1508

arXiv:2202.06548 [pdf, other]

A resource-efficient deep learning framework for low-dose brain PET image reconstruction and analysis

Authors: Yu Fu, Shunjie Dong, Yi Liao, Le Xue, Yuanfan Xu, Feng Li, Qianqian Yang, Tianbai Yu, Mei Tian, Cheng Zhuo

Abstract: 18F-fluorodeoxyglucose (18F-FDG) Positron Emission Tomography (PET) imaging usually needs a full-dose radioactive tracer to obtain satisfactory diagnostic results, which raises concerns about the potential health risks of radiation exposure, especially for pediatric patients. Reconstructing the low-dose PET (L-PET) images to the high-quality full-dose PET (F-PET) ones is an effective way that both… ▽ More 18F-fluorodeoxyglucose (18F-FDG) Positron Emission Tomography (PET) imaging usually needs a full-dose radioactive tracer to obtain satisfactory diagnostic results, which raises concerns about the potential health risks of radiation exposure, especially for pediatric patients. Reconstructing the low-dose PET (L-PET) images to the high-quality full-dose PET (F-PET) ones is an effective way that both reduces the radiation exposure and remains diagnostic accuracy. In this paper, we propose a resource-efficient deep learning framework for L-PET reconstruction and analysis, referred to as transGAN-SDAM, to generate F-PET from corresponding L-PET, and quantify the standard uptake value ratios (SUVRs) of these generated F-PET at whole brain. The transGAN-SDAM consists of two modules: a transformer-encoded Generative Adversarial Network (transGAN) and a Spatial Deformable Aggregation Module (SDAM). The transGAN generates higher quality F-PET images, and then the SDAM integrates the spatial information of a sequence of generated F-PET slices to synthesize whole-brain F-PET images. Experimental results demonstrate the superiority and rationality of our approach. △ Less

Submitted 14 February, 2022; originally announced February 2022.

arXiv:2201.12535 [pdf, ps, other]

doi 10.59275/j.melba.2022-6g33

Validation and Generalizability of Self-Supervised Image Reconstruction Methods for Undersampled MRI

Authors: Thomas Yu, Tom Hilbert, Gian Franco Piredda, Arun Joseph, Gabriele Bonanno, Salim Zenkhri, Patrick Omoumi, Meritxell Bach Cuadra, Erick Jorge Canales-Rodríguez, Tobias Kober, Jean-Philippe Thiran

Abstract: Deep learning methods have become the state of the art for undersampled MR reconstruction. Particularly for cases where it is infeasible or impossible for ground truth, fully sampled data to be acquired, self-supervised machine learning methods for reconstruction are becoming increasingly used. However potential issues in the validation of such methods, as well as their generalizability, remain un… ▽ More Deep learning methods have become the state of the art for undersampled MR reconstruction. Particularly for cases where it is infeasible or impossible for ground truth, fully sampled data to be acquired, self-supervised machine learning methods for reconstruction are becoming increasingly used. However potential issues in the validation of such methods, as well as their generalizability, remain underexplored. In this paper, we investigate important aspects of the validation of self-supervised algorithms for reconstruction of undersampled MR images: quantitative evaluation of prospective reconstructions, potential differences between prospective and retrospective reconstructions, suitability of commonly used quantitative metrics, and generalizability. Two self-supervised algorithms based on self-supervised denoising and the deep image prior were investigated. These methods are compared to a least squares fitting and a compressed sensing reconstruction using in-vivo and phantom data. Their generalizability was tested with prospectively under-sampled data from experimental conditions different to the training. We show that prospective reconstructions can exhibit significant distortion relative to retrospective reconstructions/ground truth. Furthermore, pixel-wise quantitative metrics may not capture differences in perceptual quality accurately, in contrast to a perceptual metric. In addition, all methods showed potential for generalization; however, generalizability is more affected by changes in anatomy/contrast than other changes. We further showed that no-reference image metrics correspond well with human rating of image quality for studying generalizability. Finally, we showed that a well-tuned compressed sensing reconstruction and learned denoising perform similarly on all data. △ Less

Submitted 12 September, 2022; v1 submitted 29 January, 2022; originally announced January 2022.

Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://www.melba-journal.org/papers/2022:022.html

Journal ref: Machine.Learning.for.Biomedical.Imaging. 1 (2022)

arXiv:2201.05501 [pdf, ps, other]

Study of Frequency domain exponential functional link network filters

Authors: T. Yu, S. Tana, R. C. de Lamareb, Y. Yu

Abstract: The exponential functional link network (EFLN) filter has attracted tremendous interest due to its enhanced nonlinear modeling capability. However, the computational complexity will dramatically increase with the dimension growth of the EFLN-based filter. To improve the computational efficiency, we propose a novel frequency domain exponential functional link network (FDEFLN) filter in this paper.… ▽ More The exponential functional link network (EFLN) filter has attracted tremendous interest due to its enhanced nonlinear modeling capability. However, the computational complexity will dramatically increase with the dimension growth of the EFLN-based filter. To improve the computational efficiency, we propose a novel frequency domain exponential functional link network (FDEFLN) filter in this paper. The idea is to organize the samples in blocks of expanded input data, transform them from time domain to frequency domain, and thus execute the filtering and adaptation procedures in frequency domain with the overlap-save method. A FDEFLN-based nonlinear active noise control (NANC) system has also been developed to form the frequency domain exponential filtered-s least mean-square (FDEFsLMS) algorithm. Moreover, the stability, steady-state performance and computational complexity of algorithms are analyzed. Finally, several numerical experiments corroborate the proposed FDEFLN-based algorithms in nonlinear system identification, acoustic echo cancellation and NANC implementations, which demonstrate much better computational efficiency. △ Less

Submitted 11 January, 2022; originally announced January 2022.

Comments: 32 pages, 17 figures

arXiv:2201.02419 [pdf, other]

Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset

Authors: Tiezheng Yu, Rita Frieske, Peng Xu, Samuel Cahyawijaya, Cheuk Tung Shadow Yiu, Holy Lovenia, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung

Abstract: Automatic speech recognition (ASR) on low resource languages improves the access of linguistic minorities to technological advantages provided by artificial intelligence (AI). In this paper, we address the problem of data scarcity for the Hong Kong Cantonese language by creating a new Cantonese dataset. Our dataset, Multi-Domain Cantonese Corpus (MDCC), consists of 73.6 hours of clean read speech… ▽ More Automatic speech recognition (ASR) on low resource languages improves the access of linguistic minorities to technological advantages provided by artificial intelligence (AI). In this paper, we address the problem of data scarcity for the Hong Kong Cantonese language by creating a new Cantonese dataset. Our dataset, Multi-Domain Cantonese Corpus (MDCC), consists of 73.6 hours of clean read speech paired with transcripts, collected from Cantonese audiobooks from Hong Kong. It comprises philosophy, politics, education, culture, lifestyle and family domains, covering a wide range of topics. We also review all existing Cantonese datasets and analyze them according to their speech type, data source, total size and availability. We further conduct experiments with Fairseq S2T Transformer, a state-of-the-art ASR model, on the biggest existing dataset, Common Voice zh-HK, and our proposed MDCC, and the results show the effectiveness of our dataset. In addition, we create a powerful and robust Cantonese ASR model by applying multi-dataset learning on MDCC and Common Voice zh-HK. △ Less

Submitted 17 January, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

arXiv:2112.06382 [pdf]

doi 10.1049/rpg2.12428

Dynamic Exploitation Gaussian Bare-Bones Bat Algorithm for Optimal Reactive Power Dispatch to Improve the Safety and Stability of Power System

Authors: Zhaoyang Qu, Yunchang Dong, Sylvère Mugemanyi, Tong Yu, Xiaoyong Bo, Huashun Li, Yang Li, François Xavier Rugema, Christophe Bananeza

Abstract: In this paper, a novel Gaussian bare-bones bat algorithm (GBBBA) and its modified version named as dynamic exploitation Gaussian bare-bones bat algorithm (DeGBBBA) are proposed for solving optimal reactive power dispatch (ORPD) problem. The optimal reactive power dispatch (ORPD) plays a fundamental role in ensuring stable, secure, reliable as well as economical operation of the power system. The O… ▽ More In this paper, a novel Gaussian bare-bones bat algorithm (GBBBA) and its modified version named as dynamic exploitation Gaussian bare-bones bat algorithm (DeGBBBA) are proposed for solving optimal reactive power dispatch (ORPD) problem. The optimal reactive power dispatch (ORPD) plays a fundamental role in ensuring stable, secure, reliable as well as economical operation of the power system. The ORPD problem is formulated as a complex and nonlinear optimization problem of mixed integers including both discrete and continuous control variables. Bat algorithm (BA) is one of the most popular metaheuristic algorithms which mimics the echolocation of the microbats and which has also outperformed some other metaheuristic algorithms in solving various optimization problems. Nevertheless, the standard BA may fail to balance exploration and exploitation for some optimization problems and hence it may often fall into local optima. The proposed GBBBA employs the Gaussian distribution in updating the bat positions in an effort to mitigate the premature convergence problem associated with the standard BA. The GBBBA takes advantages of Gaussian sampling which begins from exploration and continues to exploitation. DeGBBBA is an advanced variant of GBBBA in which a modified Gaussian distribution is introduced so as to allow the dynamic adaptation of exploitation and exploitation in the proposed algorithm. Both GBBBA and DeGBBBA are used to determine the optimal settings of generator bus voltages, tap setting transformers and shunt reactive sources in order to minimize the active power loss, total voltage deviations and voltage stability index. Simulation results show that GBBBA and DeGBBBA are robust and effective in solving the ORPD problem. △ Less

Submitted 12 December, 2021; originally announced December 2021.

Comments: Accepted by IET Renewable Power Generation

Journal ref: IET Renewable Power Generation 16 (2022) 1401-1424

arXiv:2111.08387 [pdf, other]

S-DCCRN: Super Wide Band DCCRN with learnable complex feature for speech enhancement

Authors: Shubo Lv, Yihui Fu, Mengtao Xing, Jiayao Sun, Lei Xie, Jun Huang, Yannan Wang, Tao Yu

Abstract: In speech enhancement, complex neural network has shown promising performance due to their effectiveness in processing complex-valued spectrum. Most of the recent speech enhancement approaches mainly focus on wide-band signal with a sampling rate of 16K Hz. However, research on super wide band (e.g., 32K Hz) or even full-band (48K) denoising is still lacked due to the difficulty of modeling more f… ▽ More In speech enhancement, complex neural network has shown promising performance due to their effectiveness in processing complex-valued spectrum. Most of the recent speech enhancement approaches mainly focus on wide-band signal with a sampling rate of 16K Hz. However, research on super wide band (e.g., 32K Hz) or even full-band (48K) denoising is still lacked due to the difficulty of modeling more frequency bands and particularly high frequency components. In this paper, we extend our previous deep complex convolution recurrent neural network (DCCRN) substantially to a super wide band version -- S-DCCRN, to perform speech denoising on speech of 32K Hz sampling rate. We first employ a cascaded sub-band and full-band processing module, which consists of two small-footprint DCCRNs -- one operates on sub-band signal and one operates on full-band signal, aiming at benefiting from both local and global frequency information. Moreover, instead of simply adopting the STFT feature as input, we use a complex feature encoder trained in an end-to-end manner to refine the information of different frequency bands. We also use a complex feature decoder to revert the feature to time-frequency domain. Finally, a learnable spectrum compression method is adopted to adjust the energy of different frequency bands, which is beneficial for neural network learning. The proposed model, S-DCCRN, has surpassed PercepNet as well as several competitive models and achieves state-of-the-art performance in terms of speech quality and intelligibility. Ablation studies also demonstrate the effectiveness of different contributions. △ Less

Submitted 16 November, 2021; originally announced November 2021.

Comments: Submitted to ICASSP 2022

arXiv:2109.14956 [pdf]

Comparative Validation of Machine Learning Algorithms for Surgical Workflow and Skill Analysis with the HeiChole Benchmark

Authors: Martin Wagner, Beat-Peter Müller-Stich, Anna Kisilenko, Duc Tran, Patrick Heger, Lars Mündermann, David M Lubotsky, Benjamin Müller, Tornike Davitashvili, Manuela Capek, Annika Reinke, Tong Yu, Armine Vardazaryan, Chinedu Innocent Nwoye, Nicolas Padoy, Xinyang Liu, Eung-Joo Lee, Constantin Disch, Hans Meine, Tong Xia, Fucang Jia, Satoshi Kondo, Wolfgang Reiter, Yueming **, Yonghao Long , et al. (16 additional authors not shown)

Abstract: PURPOSE: Surgical workflow and skill analysis are key technologies for the next generation of cognitive surgical assistance systems. These systems could increase the safety of the operation through context-sensitive warnings and semi-autonomous robotic assistance or improve training of surgeons via data-driven feedback. In surgical workflow analysis up to 91% average precision has been reported fo… ▽ More PURPOSE: Surgical workflow and skill analysis are key technologies for the next generation of cognitive surgical assistance systems. These systems could increase the safety of the operation through context-sensitive warnings and semi-autonomous robotic assistance or improve training of surgeons via data-driven feedback. In surgical workflow analysis up to 91% average precision has been reported for phase recognition on an open data single-center dataset. In this work we investigated the generalizability of phase recognition algorithms in a multi-center setting including more difficult recognition tasks such as surgical action and surgical skill. METHODS: To achieve this goal, a dataset with 33 laparoscopic cholecystectomy videos from three surgical centers with a total operation time of 22 hours was created. Labels included annotation of seven surgical phases with 250 phase transitions, 5514 occurences of four surgical actions, 6980 occurences of 21 surgical instruments from seven instrument categories and 495 skill classifications in five skill dimensions. The dataset was used in the 2019 Endoscopic Vision challenge, sub-challenge for surgical workflow and skill analysis. Here, 12 teams submitted their machine learning algorithms for recognition of phase, action, instrument and/or skill assessment. RESULTS: F1-scores were achieved for phase recognition between 23.9% and 67.7% (n=9 teams), for instrument presence detection between 38.5% and 63.8% (n=8 teams), but for action recognition only between 21.8% and 23.3% (n=5 teams). The average absolute error for skill assessment was 0.78 (n=1 team). CONCLUSION: Surgical workflow and skill analysis are promising technologies to support the surgical team, but are not solved yet, as shown by our comparison of algorithms. This novel benchmark can be used for comparable evaluation and validation of future work. △ Less

Submitted 30 September, 2021; originally announced September 2021.

arXiv:2109.03624 [pdf, other]

FaBiAN: A Fetal Brain magnetic resonance Acquisition Numerical phantom

Authors: Hélène Lajous, Christopher W. Roy, Tom Hilbert, Priscille de Dumast, Sébastien Tourbier, Yasser Alemán-Gómez, Jérôme Yerly, Thomas Yu, Hamza Kebiri, Kelly Payette, Jean-Baptiste Ledoux, Reto Meuli, Patric Hagmann, Andras Jakab, Vincent Dunet, Mériam Koob, Tobias Kober, Matthias Stuber, Meritxell Bach Cuadra

Abstract: Accurate characterization of in utero human brain maturation is critical as it involves complex and interconnected structural and functional processes that may influence health later in life. Magnetic resonance imaging is a powerful tool to investigate equivocal neurological patterns during fetal development. However, the number of acquisitions of satisfactory quality available in this cohort of s… ▽ More Accurate characterization of in utero human brain maturation is critical as it involves complex and interconnected structural and functional processes that may influence health later in life. Magnetic resonance imaging is a powerful tool to investigate equivocal neurological patterns during fetal development. However, the number of acquisitions of satisfactory quality available in this cohort of sensitive subjects remains scarce, thus hindering the validation of advanced image processing techniques. Numerical phantoms can mitigate these limitations by providing a controlled environment with a known ground truth. In this work, we present FaBiAN, an open-source Fetal Brain magnetic resonance Acquisition Numerical phantom that simulates clinical T2-weighted fast spin echo sequences of the fetal brain. This unique tool is based on a general, flexible and realistic setup that includes stochastic fetal movements, thus providing images of the fetal brain throughout maturation comparable to clinical acquisitions. We demonstrate its value to evaluate the robustness and optimize the accuracy of an algorithm for super-resolution fetal brain magnetic resonance imaging from simulated motion-corrupted 2D low-resolution series as compared to a synthetic high-resolution reference volume. We also show that the images generated can complement clinical datasets to support data-intensive deep learning methods for fetal brain tissue segmentation. △ Less

Submitted 6 September, 2021; originally announced September 2021.

Comments: 23 pages, 9 figures (including Supplementary Material), 4 tables, 1 supplement. Submitted to Scientific Reports (2021)

arXiv:2107.02345 [pdf, other]

Domain Adaptation via CycleGAN for Retina Segmentation in Optical Coherence Tomography

Authors: Ricky Chen, Timothy T. Yu, Gavin Xu, Da Ma, Marinko V. Sarunic, Mirza Faisal Beg

Abstract: With the FDA approval of Artificial Intelligence (AI) for point-of-care clinical diagnoses, model generalizability is of the utmost importance as clinical decision-making must be domain-agnostic. A method of tackling the problem is to increase the dataset to include images from a multitude of domains; while this technique is ideal, the security requirements of medical data is a major limitation. A… ▽ More With the FDA approval of Artificial Intelligence (AI) for point-of-care clinical diagnoses, model generalizability is of the utmost importance as clinical decision-making must be domain-agnostic. A method of tackling the problem is to increase the dataset to include images from a multitude of domains; while this technique is ideal, the security requirements of medical data is a major limitation. Additionally, researchers with developed tools benefit from the addition of open-sourced data, but are limited by the difference in domains. Herewith, we investigated the implementation of a Cycle-Consistent Generative Adversarial Networks (CycleGAN) for the domain adaptation of Optical Coherence Tomography (OCT) volumes. This study was done in collaboration with the Biomedical Optics Research Group and Functional & Anatomical Imaging & Shape Analysis Lab at Simon Fraser University. In this study, we investigated a learning-based approach of adapting the domain of a publicly available dataset, UK Biobank dataset (UKB). To evaluate the performance of domain adaptation, we utilized pre-existing retinal layer segmentation tools developed on a different set of RETOUCH OCT data. This study provides insight on state-of-the-art tools for domain adaptation compared to traditional processing techniques as well as a pipeline for adapting publicly available retinal data to the domains previously used by our collaborators. △ Less

Submitted 5 July, 2021; originally announced July 2021.

Comments: 10 pages, 6 figures, 1 table

ACM Class: I.4.0

arXiv:2104.00960 [pdf, other]

INTERSPEECH 2021 ConferencingSpeech Challenge: Towards Far-field Multi-Channel Speech Enhancement for Video Conferencing

Authors: Wei Rao, Yihui Fu, Yanxin Hu, Xin Xu, Yvkai Jv, Jiangyu Han, Zhongjie Jiang, Lei Xie, Yannan Wang, Shinji Watanabe, Zheng-Hua Tan, Hui Bu, Tao Yu, Shidong Shang

Abstract: The ConferencingSpeech 2021 challenge is proposed to stimulate research on far-field multi-channel speech enhancement for video conferencing. The challenge consists of two separate tasks: 1) Task 1 is multi-channel speech enhancement with single microphone array and focusing on practical application with real-time requirement and 2) Task 2 is multi-channel speech enhancement with multiple distribu… ▽ More The ConferencingSpeech 2021 challenge is proposed to stimulate research on far-field multi-channel speech enhancement for video conferencing. The challenge consists of two separate tasks: 1) Task 1 is multi-channel speech enhancement with single microphone array and focusing on practical application with real-time requirement and 2) Task 2 is multi-channel speech enhancement with multiple distributed microphone arrays, which is a non-real-time track and does not have any constraints so that participants could explore any algorithms to obtain high speech quality. Targeting the real video conferencing room application, the challenge database was recorded from real speakers and all recording facilities were located by following the real setup of conferencing room. In this challenge, we open-sourced the list of open source clean speech and noise datasets, simulation scripts, and a baseline system for participants to develop their own system. The final ranking of the challenge will be decided by the subjective evaluation which is performed using Absolute Category Ratings (ACR) to estimate Mean Opinion Score (MOS), speech MOS (S-MOS), and noise MOS (N-MOS). This paper describes the challenge, tasks, datasets, and subjective evaluation. The baseline system which is a complex ratio mask based neural network and its experimental results are also presented. △ Less

Submitted 2 April, 2021; originally announced April 2021.

Comments: 5 pages, submitted to INTERSPEECH 2021

arXiv:2012.06131 [pdf, other]

Learning Omni-frequency Region-adaptive Representations for Real Image Super-Resolution

Authors: Xin Li, Xin **, Tao Yu, Yingxue Pang, Simeng Sun, Zhizheng Zhang, Zhibo Chen

Abstract: Traditional single image super-resolution (SISR) methods that focus on solving single and uniform degradation (i.e., bicubic down-sampling), typically suffer from poor performance when applied into real-world low-resolution (LR) images due to the complicated realistic degradations. The key to solving this more challenging real image super-resolution (RealSR) problem lies in learning feature repres… ▽ More Traditional single image super-resolution (SISR) methods that focus on solving single and uniform degradation (i.e., bicubic down-sampling), typically suffer from poor performance when applied into real-world low-resolution (LR) images due to the complicated realistic degradations. The key to solving this more challenging real image super-resolution (RealSR) problem lies in learning feature representations that are both informative and content-aware. In this paper, we propose an Omni-frequency Region-adaptive Network (ORNet) to address both challenges, here we call features of all low, middle and high frequencies omni-frequency features. Specifically, we start from the frequency perspective and design a Frequency Decomposition (FD) module to separate different frequency components to comprehensively compensate the information lost for real LR image. Then, considering the different regions of real LR image have different frequency information lost, we further design a Region-adaptive Frequency Aggregation (RFA) module by leveraging dynamic convolution and spatial attention to adaptively restore frequency components for different regions. The extensive experiments endorse the effective, and scenario-agnostic nature of our OR-Net for RealSR. △ Less

Submitted 10 January, 2021; v1 submitted 11 December, 2020; originally announced December 2020.

Comments: Accepted by AAAI2021

arXiv:2011.09590 [pdf]

Towards mmWave V2X in 5G and Beyond to Support Automated Driving

Authors: Kei Sakaguchi, Ryuichi Fukatsu, Tao Yu, Eisuke Fukuda, Kim Mahler, Robert Heath, Takeo Fujii, Kazuaki Takahashi, Alexey Khoryaev, Satoshi Nagata, Takayuki Shimizu

Abstract: Millimeter wave provides high data rates for Vehicle-to-Everything (V2X) communications. This paper motivates millimeter wave to support automated driving and begins by explaining V2X use cases that support automated driving with references to several standardi-zation bodies. The paper gives a classification of existing V2X stand-ards: IEEE802.11p and LTE V2X, along with the status of their com-me… ▽ More Millimeter wave provides high data rates for Vehicle-to-Everything (V2X) communications. This paper motivates millimeter wave to support automated driving and begins by explaining V2X use cases that support automated driving with references to several standardi-zation bodies. The paper gives a classification of existing V2X stand-ards: IEEE802.11p and LTE V2X, along with the status of their com-mercial deployment. Then, the paper provides a detailed assessment on how millimeter wave V2X enables the use case of cooperative perception. The explanations provide detailed rate calculations for this use case and show that millimeter wave is the only technology able to achieve the requirements. Furthermore, specific challenges related to millimeter wave for V2X are described, including cover-age enhancement and beam alignment. The paper concludes with some results from three studies, i.e. IEEE802.11ad (WiGig) based V2X, extension of 5G NR (New Radio) toward mmWave V2X, and prototypes of intelligent street with mmWave V2X. △ Less

Submitted 18 November, 2020; originally announced November 2020.

Journal ref: IEICE Trans. Commun., vol.E104-B, no.6, June 2021

arXiv:2010.14841 [pdf, other]

INT8 Winograd Acceleration for Conv1D Equipped ASR Models Deployed on Mobile Devices

Authors: Yiwu Yao, Yuchao Li, Chengyu Wang, Tianhang Yu, Houjiang Chen, Xiaotang Jiang, Jun Yang, Jun Huang, Wei Lin, Hui Shu, Chengfei Lv

Abstract: The intensive computation of Automatic Speech Recognition (ASR) models obstructs them from being deployed on mobile devices. In this paper, we present a novel quantized Winograd optimization pipeline, which combines the quantization and fast convolution to achieve efficient inference acceleration on mobile devices for ASR models. To avoid the information loss due to the combination of quantization… ▽ More The intensive computation of Automatic Speech Recognition (ASR) models obstructs them from being deployed on mobile devices. In this paper, we present a novel quantized Winograd optimization pipeline, which combines the quantization and fast convolution to achieve efficient inference acceleration on mobile devices for ASR models. To avoid the information loss due to the combination of quantization and Winograd convolution, a Range-Scaled Quantization (RSQ) training method is proposed to expand the quantized numerical range and to distill knowledge from high-precision values. Moreover, an improved Conv1D equipped DFSMN (ConvDFSMN) model is designed for mobile deployment. We conduct extensive experiments on both ConvDFSMN and Wav2letter models. Results demonstrate the models can be effectively optimized with the proposed pipeline. Especially, Wav2letter achieves 1.48* speedup with an approximate 0.07% WER decrease on ARMv7-based mobile devices. △ Less

Submitted 28 October, 2020; originally announced October 2020.

arXiv:2007.12199 [pdf, other]

doi 10.1007/978-3-030-59713-9_12

T2 Map** from Super-Resolution-Reconstructed Clinical Fast Spin Echo Magnetic Resonance Acquisitions

Authors: Hélène Lajous, Tom Hilbert, Christopher W. Roy, Sébastien Tourbier, Priscille de Dumast, Thomas Yu, Jean-Philippe Thiran, Jean-Baptiste Ledoux, Davide Piccini, Patric Hagmann, Reto Meuli, Tobias Kober, Matthias Stuber, Ruud B. van Heeswijk, Meritxell Bach Cuadra

Abstract: Relaxometry studies in preterm and at-term newborns have provided insight into brain microstructure, thus opening new avenues for studying normal brain development and supporting diagnosis in equivocal neurological situations. However, such quantitative techniques require long acquisition times and therefore cannot be straightforwardly translated to in utero brain developmental studies. In clinica… ▽ More Relaxometry studies in preterm and at-term newborns have provided insight into brain microstructure, thus opening new avenues for studying normal brain development and supporting diagnosis in equivocal neurological situations. However, such quantitative techniques require long acquisition times and therefore cannot be straightforwardly translated to in utero brain developmental studies. In clinical fetal brain magnetic resonance imaging routine, 2D low-resolution T2-weighted fast spin echo sequences are used to minimize the effects of unpredictable fetal motion during acquisition. As super-resolution techniques make it possible to reconstruct a 3D high-resolution volume of the fetal brain from clinical low-resolution images, their combination with quantitative acquisition schemes could provide fast and accurate T2 measurements. In this context, the present work demonstrates the feasibility of using super-resolution reconstruction from conventional T2-weighted fast spin echo sequences for 3D isotropic T2 map**. A quantitative magnetic resonance phantom was imaged using a clinical T2-weighted fast spin echo sequence at variable echo time to allow for super-resolution reconstruction at every echo time and subsequent T2 map** of samples whose relaxometric properties are close to those of fetal brain tissue. We demonstrate that this approach is highly repeatable, accurate and robust when using six echo times (total acquisition time under 9 minutes) as compared to gold-standard single-echo spin echo sequences (several hours for one single 2D slice). △ Less

Submitted 23 July, 2020; originally announced July 2020.

Comments: 11 pages, 4 figures, 1 table, 1 supplement, to appear in Proceedings in Medical Image Computing and Computer Assisted Intervention (MICCAI), Peru, October 2020

arXiv:2007.11430 [pdf, other]

Learning Disentangled Feature Representation for Hybrid-distorted Image Restoration

Authors: Xin Li, Xin **, Jianxin Lin, Tao Yu, Sen Liu, Yaojun Wu, Wei Zhou, Zhibo Chen

Abstract: Hybrid-distorted image restoration (HD-IR) is dedicated to restore real distorted image that is degraded by multiple distortions. Existing HD-IR approaches usually ignore the inherent interference among hybrid distortions which compromises the restoration performance. To decompose such interference, we introduce the concept of Disentangled Feature Learning to achieve the feature-level divide-and-c… ▽ More Hybrid-distorted image restoration (HD-IR) is dedicated to restore real distorted image that is degraded by multiple distortions. Existing HD-IR approaches usually ignore the inherent interference among hybrid distortions which compromises the restoration performance. To decompose such interference, we introduce the concept of Disentangled Feature Learning to achieve the feature-level divide-and-conquer of hybrid distortions. Specifically, we propose the feature disentanglement module (FDM) to distribute feature representations of different distortions into different channels by revising gain-control-based normalization. We also propose a feature aggregation module (FAM) with channel-wise attention to adaptively filter out the distortion representations and aggregate useful content information from different channels for the construction of raw image. The effectiveness of the proposed scheme is verified by visualizing the correlation matrix of features and channel responses of different distortions. Extensive experimental results also prove superior performance of our approach compared with the latest HD-IR schemes. △ Less

Submitted 22 July, 2020; originally announced July 2020.

Comments: Accepted by ECCV2020

arXiv:2007.10225 [pdf, other]

Model-Informed Machine Learning for Multi-component T2 Relaxometry

Authors: Thomas Yu, Erick Jorge Canales Rodriguez, Marco Pizzolato, Gian Franco Piredda, Tom Hilbert, Elda Fischi-Gomez, Matthias Weigel, Muhamed Barakovic, Meritxell Bach-Cuadra, Cristina Granziera, Tobias Kober, Jean-Philippe Thiran

Abstract: Recovering the T2 distribution from multi-echo T2 magnetic resonance (MR) signals is challenging but has high potential as it provides biomarkers characterizing the tissue micro-structure, such as the myelin water fraction (MWF). In this work, we propose to combine machine learning and aspects of parametric (fitting from the MRI signal using biophysical models) and non-parametric (model-free fitti… ▽ More Recovering the T2 distribution from multi-echo T2 magnetic resonance (MR) signals is challenging but has high potential as it provides biomarkers characterizing the tissue micro-structure, such as the myelin water fraction (MWF). In this work, we propose to combine machine learning and aspects of parametric (fitting from the MRI signal using biophysical models) and non-parametric (model-free fitting of the T2 distribution from the signal) approaches to T2 relaxometry in brain tissue by using a multi-layer perceptron (MLP) for the distribution reconstruction. For training our network, we construct an extensive synthetic dataset derived from biophysical models in order to constrain the outputs with \textit{a priori} knowledge of \textit{in vivo} distributions. The proposed approach, called Model-Informed Machine Learning (MIML), takes as input the MR signal and directly outputs the associated T2 distribution. We evaluate MIML in comparison to non-parametric and parametric approaches on synthetic data, an ex vivo scan, and high-resolution scans of healthy subjects and a subject with Multiple Sclerosis. In synthetic data, MIML provides more accurate and noise-robust distributions. In real data, MWF maps derived from MIML exhibit the greatest conformity to anatomical scans, have the highest correlation to a histological map of myelin volume, and the best unambiguous lesion visualization and localization, with superior contrast between lesions and normal appearing tissue. In whole-brain analysis, MIML is 22 to 4980 times faster than non-parametric and parametric methods, respectively. △ Less

Submitted 20 July, 2020; originally announced July 2020.

Comments: Preprint submitted to Medical Image Analysis (July 14, 2020)

arXiv:2007.05405 [pdf, other]

doi 10.1007/978-3-030-59716-0_35

Recognition of Instrument-Tissue Interactions in Endoscopic Videos via Action Triplets

Authors: Chinedu Innocent Nwoye, Cristians Gonzalez, Tong Yu, Pietro Mascagni, Didier Mutter, Jacques Marescaux, Nicolas Padoy

Abstract: Recognition of surgical activity is an essential component to develop context-aware decision support for the operating room. In this work, we tackle the recognition of fine-grained activities, modeled as action triplets <instrument, verb, target> representing the tool activity. To this end, we introduce a new laparoscopic dataset, CholecT40, consisting of 40 videos from the public dataset Cholec80… ▽ More Recognition of surgical activity is an essential component to develop context-aware decision support for the operating room. In this work, we tackle the recognition of fine-grained activities, modeled as action triplets <instrument, verb, target> representing the tool activity. To this end, we introduce a new laparoscopic dataset, CholecT40, consisting of 40 videos from the public dataset Cholec80 in which all frames have been annotated using 128 triplet classes. Furthermore, we present an approach to recognize these triplets directly from the video data. It relies on a module called Class Activation Guide (CAG), which uses the instrument activation maps to guide the verb and target recognition. To model the recognition of multiple triplets in the same frame, we also propose a trainable 3D Interaction Space, which captures the associations between the triplet components. Finally, we demonstrate the significance of these contributions via several ablation studies and comparisons to baselines on CholecT40. △ Less

Submitted 10 July, 2020; originally announced July 2020.

Comments: 13 pages, 4 figures, 6 tables. Accepted and to be published in MICCAI 2020

Journal ref: Medical Image Computing and Computer Assisted Intervention MICCAI 12263 (2020) 364-374

arXiv:2007.04140 [pdf, other]

Mastering the working sequence in human-robot collaborative assembly based on reinforcement learning

Authors: Tian Yu, **g Huang, Qing Chang

Abstract: A long-standing goal of the Human-Robot Collaboration (HRC) in manufacturing systems is to increase the collaborative working efficiency. In line with the trend of Industry 4.0 to build up the smart manufacturing system, the Co-robot in the HRC system deserves better designing to be more self-organized and to find the superhuman proficiency by self-learning. Inspired by the impressive machine lear… ▽ More A long-standing goal of the Human-Robot Collaboration (HRC) in manufacturing systems is to increase the collaborative working efficiency. In line with the trend of Industry 4.0 to build up the smart manufacturing system, the Co-robot in the HRC system deserves better designing to be more self-organized and to find the superhuman proficiency by self-learning. Inspired by the impressive machine learning algorithms developed by Google Deep Mind like Alphago Zero, in this paper, the human-robot collaborative assembly working process is formatted into a chessboard and the selection of moves in the chessboard is used to analogize the decision making by both human and robot in the HRC assembly working process. To obtain the optimal policy of working sequence to maximize the working efficiency, the robot is trained with a self-play algorithm based on reinforcement learning, without guidance or domain knowledge beyond game rules. A neural network is also trained to predict the distribution of the priority of move selections and whether a working sequence is the one resulting in the maximum of the HRC efficiency. An adjustable desk assembly is used to demonstrate the proposed HRC assembly algorithm and its efficiency. △ Less

Submitted 8 July, 2020; originally announced July 2020.

Comments: 11 pages, 6 figures

arXiv:2007.03053 [pdf, other]

Benefiting from Bicubically Down-Sampled Images for Learning Real-World Image Super-Resolution

Authors: Mohammad Saeed Rad, Thomas Yu, Claudiu Musat, Hazim Kemal Ekenel, Behzad Bozorgtabar, Jean-Philippe Thiran

Abstract: Super-resolution (SR) has traditionally been based on pairs of high-resolution images (HR) and their low-resolution (LR) counterparts obtained artificially with bicubic downsampling. However, in real-world SR, there is a large variety of realistic image degradations and analytically modeling these realistic degradations can prove quite difficult. In this work, we propose to handle real-world SR by… ▽ More Super-resolution (SR) has traditionally been based on pairs of high-resolution images (HR) and their low-resolution (LR) counterparts obtained artificially with bicubic downsampling. However, in real-world SR, there is a large variety of realistic image degradations and analytically modeling these realistic degradations can prove quite difficult. In this work, we propose to handle real-world SR by splitting this ill-posed problem into two comparatively more well-posed steps. First, we train a network to transform real LR images to the space of bicubically downsampled images in a supervised manner, by using both real LR/HR pairs and synthetic pairs. Second, we take a generic SR network trained on bicubically downsampled images to super-resolve the transformed LR image. The first step of the pipeline addresses the problem by registering the large variety of degraded images to a common, well understood space of images. The second step then leverages the already impressive performance of SR on bicubically downsampled images, sidestep** the issues of end-to-end training on datasets with many different image degradations. We demonstrate the effectiveness of our proposed method by comparing it to recent methods in real-world SR and show that our proposed approach outperforms the state-of-the-art works in terms of both qualitative and quantitative results, as well as results of an extensive user study conducted on several real image datasets. △ Less

Submitted 5 November, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

Comments: WACV 2021

arXiv:2006.09201 [pdf, other]

A Hybrid Deep Learning Model for Predictive Flood Warning and Situation Awareness using Channel Network Sensors Data

Authors: Shangjia Dong, Tianbo Yu, Hamed Farahmand, Ali Mostafavi

Abstract: The objective of this study is to create and test a hybrid deep learning model, FastGRNN-FCN (Fast, Accurate, Stable and Tiny Gated Recurrent Neural Network-Fully Convolutional Network), for urban flood prediction and situation awareness using channel network sensors data. The study used Harris County, Texas as the testbed, and obtained channel sensor data from three historical flood events (e.g.,… ▽ More The objective of this study is to create and test a hybrid deep learning model, FastGRNN-FCN (Fast, Accurate, Stable and Tiny Gated Recurrent Neural Network-Fully Convolutional Network), for urban flood prediction and situation awareness using channel network sensors data. The study used Harris County, Texas as the testbed, and obtained channel sensor data from three historical flood events (e.g., 2016 Tax Day Flood, 2016 Memorial Day flood, and 2017 Hurricane Harvey Flood) for training and validating the hybrid deep learning model. The flood data are divided into a multivariate time series and used as the model input. Each input comprises nine variables, including information of the studied channel sensor and its predecessor and successor sensors in the channel network. Precision-recall curve and F-measure are used to identify the optimal set of model parameters. The optimal model with a weight of 1 and a critical threshold of 0.59 are obtained through one hundred iterations based on examining different weights and thresholds. The test accuracy and F-measure eventually reach 97.8% and 0.792, respectively. The model is then tested in predicting the 2019 Imelda flood in Houston and the results show an excellent match with the empirical flood. The results show that the model enables accurate prediction of the spatial-temporal flood propagation and recession and provides emergency response officials with a predictive flood warning tool for prioritizing the flood response and resource allocation strategies. △ Less

Submitted 8 September, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

Showing 1–50 of 59 results for author: Yu, T