-
Multimodal Cross-Task Interaction for Survival Analysis in Whole Slide Pathological Images
Authors:
Songhan Jiang,
Zhengyu Gan,
Linghan Cai,
Yifeng Wang,
Yongbing Zhang
Abstract:
Survival prediction, utilizing pathological images and genomic profiles, is increasingly important in cancer analysis and prognosis. Despite significant progress, precise survival analysis still faces two main challenges: (1) the massive number of pixels in whole slide images (WSIs) complicates the processing of pathological images, making it difficult to generate an effective representation of the tumor microenvironment (TME); (2) existing multimodal methods often rely on alignment strategies to integrate complementary information, which may lead to information loss due to the inherent heterogeneity between pathology and genes. In this paper, we propose a Multimodal Cross-Task Interaction (MCTI) framework to explore the intrinsic correlations between the subtype classification and survival analysis tasks. Specifically, to capture TME-related features in WSIs, we leverage the subtype classification task to mine tumor regions. Simultaneously, multi-head attention mechanisms are applied in genomic feature extraction, adaptively performing gene grouping to obtain task-related genomic embeddings. With the joint representation of pathological images and genomic data, we further introduce a Transport-Guided Attention (TGA) module that uses optimal transport theory to model the correlation between the subtype classification and survival analysis tasks, effectively transferring potential information. Extensive experiments demonstrate the superiority of our approach, with MCTI outperforming state-of-the-art frameworks on three public benchmarks. Code: https://github.com/jsh0792/MCTI
Submitted 24 June, 2024;
originally announced June 2024.
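The abstract does not say how the TGA module computes its transport plan. As a hedged illustration only, the sketch below uses entropy-regularized optimal transport solved with Sinkhorn iterations (a common choice, not confirmed as MCTI's) and then reuses the plan as attention weights. The names `sinkhorn` and `transport_guided_mix`, the toy cost matrix, and the regularization `eps` are hypothetical.

```python
import math

def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Entropy-regularized OT plan between histograms a (rows) and b (columns)."""
    n, m = len(a), len(b)
    # Gibbs kernel K[i][j] = exp(-cost[i][j] / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Rescale rows to match marginal a, then columns to match marginal b
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def transport_guided_mix(plan, values):
    """Hypothetical attention step: each row token averages `values`, weighted by its plan row."""
    dim = len(values[0])
    mixed = []
    for row in plan:
        mass = sum(row)
        mixed.append([sum(w * val[k] for w, val in zip(row, values)) / mass for k in range(dim)])
    return mixed

# Toy example: two survival-task tokens (rows) and two subtype-task tokens (columns)
cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn(cost, [0.5, 0.5], [0.5, 0.5])
print(transport_guided_mix(plan, [[1.0, 0.0], [0.0, 1.0]]))
```

Because the transport plan is constrained to match both marginals, each token's attention mass is balanced across the two tasks, which is one plausible motivation for using optimal transport instead of a plain softmax.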
-
Leveraging Symmetries in Gaits for Reinforcement Learning: A Case Study on Quadrupedal Gaits
Authors:
Jiayu Ding,
Xulin Chen,
Garret E. Katz,
Zhenyu Gan
Abstract:
In this research, we address the complex task of developing versatile and agile quadrupedal gaits for robotic platforms, a domain predominantly governed by model-based trajectory optimization methods. We propose an innovative, reference-free reinforcement learning framework that exploits the intrinsic symmetries of dynamic systems to synthesize a broad array of naturalistic quadrupedal locomotion patterns. By capitalizing on distinct symmetry characteristics, namely temporal, morphological, and time-reversal symmetry, our approach efficiently facilitates the generation of, and transitions among, diverse gaits such as pronking, bounding, half-bounding, and galloping across a spectrum of velocities, circumventing the need for expert-generated trajectories or complex reward structures. Implemented on the Petoi Bittle robotic model, our methodology demonstrates robust and adaptable gait generation, significantly broadening the scope for robotic mobility and speed adaptability. This contribution not only advances our understanding of quadrupedal locomotion mechanisms but also underscores the pivotal role of symmetry in developing scalable and effective robotic gait strategies. Our findings hold substantial implications for robotic design and control, potentially enhancing operational versatility and efficiency across a variety of deployment environments.
Submitted 14 June, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
-
A Dataset for Deep Learning-based Bone Structure Analyses in Total Hip Arthroplasty
Authors:
Kaidong Zhang,
Ziyang Gan,
Dong Liu,
Xifu Shang
Abstract:
Total hip arthroplasty (THA) is a widely used surgical procedure in orthopedics. For THA, it is clinically significant to analyze the bone structure from CT images before the surgical procedure, especially to observe the structure of the acetabulum and femoral head. Deep learning technologies are promising for such bone structure analyses but require high-quality labeled data for training, and data labeling is costly. We address this issue and propose an efficient data annotation pipeline for producing a deep learning-oriented dataset. Our pipeline consists of two non-learning-based stages, bone extraction (BE) and acetabulum and femoral head segmentation (AFS), followed by active-learning-based annotation refinement (AAR). For BE, we use the classic graph-cut algorithm. For AFS, we propose an improved algorithm that includes femoral head boundary localization using first-order and second-order gradient regularization, line-based non-maximum suppression, and anatomy prior-based femoral head extraction. For AAR, we refine the algorithm-produced pseudo labels with the help of trained deep models: we measure uncertainty as the disagreement between the original pseudo labels and the deep model predictions, and then select the samples with the largest uncertainty for manual labeling. Using the proposed pipeline, we construct a large-scale bone structure analysis dataset from more than 300 diverse clinical CT scans. We carefully label the test set of our data by hand. We then benchmark multiple state-of-the-art deep learning-based medical image segmentation methods using the training and test sets of our data. Extensive experimental results validate the efficacy of the proposed data annotation pipeline. The dataset, related code, and models will be publicly available at https://github.com/hitachinsk/THA.
Submitted 7 June, 2023;
originally announced June 2023.
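A minimal sketch of the AAR selection step, assuming binary masks flattened into 0/1 lists and assuming disagreement is measured as 1 minus the Dice coefficient. The abstract does not name the disagreement metric, and `dice`, `select_for_labeling`, and the scan ids are hypothetical.

```python
def dice(mask_a, mask_b):
    """Dice coefficient between two binary masks given as flat 0/1 lists."""
    inter = sum(1 for x, y in zip(mask_a, mask_b) if x and y)
    total = sum(mask_a) + sum(mask_b)
    return 1.0 if total == 0 else 2.0 * inter / total

def select_for_labeling(pseudo_labels, predictions, k):
    """Return the ids of the k samples whose pseudo label and model prediction disagree most."""
    # Uncertainty as disagreement: 1 - Dice(pseudo label, model prediction)
    uncertainty = {sid: 1.0 - dice(pseudo_labels[sid], predictions[sid]) for sid in pseudo_labels}
    return sorted(uncertainty, key=uncertainty.get, reverse=True)[:k]

pseudo = {"scan_a": [1, 1, 0, 0], "scan_b": [1, 1, 1, 0], "scan_c": [0, 1, 1, 0]}
pred = {"scan_a": [1, 1, 0, 0], "scan_b": [0, 0, 1, 1], "scan_c": [0, 1, 0, 0]}
print(select_for_labeling(pseudo, pred, k=2))  # ['scan_b', 'scan_c']
```

Here `scan_a` is skipped because the model fully agrees with its pseudo label, so the manual-labeling budget goes to the scans where the algorithmic labels are most likely wrong.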
-
TransAVS: End-to-End Audio-Visual Segmentation with Transformer
Authors:
Yuhang Ling,
Yuxi Li,
Zhenye Gan,
Jiangning Zhang,
Mingmin Chi,
Yabiao Wang
Abstract:
Audio-Visual Segmentation (AVS) is a challenging task that aims to segment sounding objects in video frames by leveraging audio signals. AVS generally faces two key challenges: (1) audio signals inherently have high information density, as sounds produced by multiple objects are entangled within the same audio stream; (2) objects of the same category tend to produce similar audio signals, making them difficult to distinguish and thus leading to unclear segmentation results. To this end, we propose TransAVS, the first Transformer-based end-to-end framework for the AVS task. Specifically, TransAVS disentangles the audio stream into audio queries, which interact with images and are decoded into segmentation masks using full transformer architectures. This scheme not only promotes comprehensive audio-image communication but also explicitly excavates instance cues encapsulated in the scene. Meanwhile, to encourage these audio queries to capture distinctive sounding objects instead of degrading into homogeneous ones, we devise two self-supervised loss functions at the query and mask levels, allowing the model to capture distinctive features within similar audio data and achieve more precise segmentation. Our experiments demonstrate that TransAVS achieves state-of-the-art results on the AVSBench dataset, highlighting its effectiveness in bridging the gap between audio and visual modalities.
Submitted 26 December, 2023; v1 submitted 11 May, 2023;
originally announced May 2023.
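The abstract does not give the form of the query-level loss. One generic way to discourage homogeneous queries, shown here as an assumption rather than the TransAVS loss, is to penalize the mean absolute pairwise cosine similarity between query vectors. `query_diversity_loss` is a hypothetical name, and it assumes at least two non-zero queries.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def query_diversity_loss(queries):
    """Mean |cosine| over all query pairs: 0 for orthogonal queries, 1 for collinear ones."""
    n = len(queries)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(abs(cosine(queries[i], queries[j])) for i, j in pairs) / len(pairs)

print(query_diversity_loss([[1.0, 0.0], [0.0, 1.0]]))  # 0.0
print(query_diversity_loss([[1.0, 2.0], [2.0, 4.0]]))  # about 1.0
```

Minimizing such a term pushes queries toward different directions in embedding space, so two queries are less likely to collapse onto the same sounding object.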
-
Breaking Symmetries Leads to Diverse Quadrupedal Gaits
Authors:
Jiayu Ding,
Zhenyu Gan
Abstract:
Symmetry manifests itself in legged locomotion in a variety of ways. No matter where a legged system begins to move periodically, the torso and limbs coordinate their movements in a similar manner. Also, in many gaits observed in nature, the legs on both sides of the torso move in exactly the same way; sometimes they are simply half a period out of phase. Furthermore, when some animals move forward and backward, their movements are strikingly similar, as if time had been reversed. This work aims to generalize these phenomena and propose formal definitions of symmetries in legged locomotion using group-theoretic terminology. Symmetries in some common quadrupedal gaits, such as pronking, bounding, half-bounding, and galloping, are discussed. Moreover, a spring-mass model is used to demonstrate how breaking symmetries can alter gaits in a legged system. Studying these symmetries may provide insight into which gaits suit a particular robotic design, or may enable roboticists to design more agile and efficient robot controllers by using certain gaits.
Submitted 8 April, 2024; v1 submitted 8 March, 2023;
originally announced March 2023.
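The left/right relations described above can be made concrete by assigning each leg a touchdown phase within one gait cycle and checking whether a leg pair is in phase, half a period apart, or neither. This is a minimal sketch: the phase values in `GAITS` are illustrative assumptions, not taken from the paper, and `pair_relation` is a hypothetical helper rather than the paper's group-theoretic formalism.

```python
# Touchdown phase of each leg as a fraction of one gait cycle.
# Illustrative values only; they are not taken from the paper.
GAITS = {
    "pronking": {"LF": 0.0, "RF": 0.0, "LH": 0.0, "RH": 0.0},
    "bounding": {"LF": 0.0, "RF": 0.0, "LH": 0.5, "RH": 0.5},
    "half-bounding": {"LF": 0.0, "RF": 0.2, "LH": 0.5, "RH": 0.5},
    "galloping": {"LF": 0.0, "RF": 0.1, "LH": 0.5, "RH": 0.6},
}

def pair_relation(phases, leg_a, leg_b, tol=1e-9):
    """Classify the timing relation between two legs over one cycle."""
    delta = (phases[leg_b] - phases[leg_a]) % 1.0
    if min(delta, 1.0 - delta) < tol:
        return "in phase"      # the two legs move in exactly the same way
    if abs(delta - 0.5) < tol:
        return "half-period"   # same motion, shifted by half a cycle
    return "broken"            # this pairwise symmetry is broken

for name, phases in GAITS.items():
    print(name, pair_relation(phases, "LF", "RF"), pair_relation(phases, "LH", "RH"))
```

With these values, pronking and bounding keep both left/right pairs in phase, half-bounding breaks the symmetry of the front pair only, and galloping breaks it in both pairs; in bounding, the front and hind legs on one side (`LF`, `LH`) are half a period apart.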
-
Improving Zero-shot Voice Style Transfer via Disentangled Representation Learning
Authors:
Siyang Yuan,
Pengyu Cheng,
Ruiyi Zhang,
Weituo Hao,
Zhe Gan,
Lawrence Carin
Abstract:
Voice style transfer, also called voice conversion, seeks to modify one speaker's voice to generate speech as if it came from another (target) speaker. Previous works have made progress on voice conversion with parallel training data and pre-known speakers. However, zero-shot voice style transfer, which learns from non-parallel data and generates voices for previously unseen speakers, remains a challenging problem. We propose a novel zero-shot voice transfer method via disentangled representation learning. The proposed method first encodes the speaker-related style and the voice content of each input voice into separate low-dimensional embedding spaces, and then transfers to a new voice by combining the source content embedding and the target style embedding through a decoder. With information-theoretic guidance, the style and content embedding spaces are representative and (ideally) independent of each other. On the real-world VCTK dataset, our method outperforms other baselines and obtains state-of-the-art results in terms of transfer accuracy and voice naturalness for voice style transfer experiments under both many-to-many and zero-shot setups.
Submitted 16 March, 2021;
originally announced March 2021.
-
Multi-modal Feature Fusion with Feature Attention for VATEX Captioning Challenge 2020
Authors:
Ke Lin,
Zhuoxin Gan,
Liwei Wang
Abstract:
This report describes our model for the VATEX Captioning Challenge 2020. First, to gather information from multiple domains, we extract motion, appearance, semantic, and audio features. Then we design a feature attention module to attend to different features during decoding. We apply two types of decoders, top-down and X-LAN, and ensemble these models to obtain the final result. The proposed method outperforms the official baseline by a significant margin. We achieve 76.0 CIDEr and 50.0 CIDEr on the English and Chinese private test sets, respectively, and rank 2nd on both the English and Chinese private test leaderboards.
Submitted 5 June, 2020;
originally announced June 2020.
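A minimal sketch of attending over the four feature types at one decoding step, assuming each modality has already been projected to the decoder's hidden size and assuming dot-product scoring with a softmax. The report does not specify its scoring function, and `feature_attention` and the toy vectors are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def feature_attention(hidden, features):
    """Weight per-modality features by dot-product relevance to the decoder state.

    Assumes every modality has already been projected to len(hidden) dimensions.
    """
    names = list(features)
    scores = [sum(h * f for h, f in zip(hidden, features[n])) for n in names]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the modality features
    context = [sum(w * features[n][k] for w, n in zip(weights, names)) for k in range(len(hidden))]
    return dict(zip(names, weights)), context

feats = {"motion": [2.0, 0.0], "appearance": [0.0, 2.0], "semantic": [0.0, 0.0], "audio": [0.0, 0.0]}
weights, context = feature_attention([1.0, 0.0], feats)
print(max(weights, key=weights.get))  # motion
```

Recomputing the weights at every step lets the decoder lean on, for example, motion features when generating a verb and appearance features when generating a noun.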
-
Synthesis of model predictive control based on data-driven learning
Authors:
Yuanqiang Zhou,
Dewei Li,
Yugeng Xi,
Zhongxue Gan
Abstract:
For the application of model predictive control (MPC) design to online regulation or tracking control problems, several studies have attempted to develop an accurate model and an adequate uncertainty description of the linear or nonlinear plants of the processes. In this study, we employ a data-driven learning technique to iteratively approximate the dynamical parameters without requiring a priori knowledge of the system matrices. The proposed MPC approach can predict and optimize future behaviors using multi-order derivatives of the control input as decision variables. Because the proposed algorithm obtains a linear system model at each sampling instant, it can adapt to the actual dynamics of time-varying or nonlinear plants. This methodology can serve as a data-driven identification tool for studying adaptive optimal control problems for unknown complex systems.
Submitted 29 March, 2019;
originally announced April 2019.
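The abstract does not name its learning technique, so the sketch below substitutes standard recursive least squares (RLS) as a stand-in to show what "iteratively approximating the dynamical parameters" at each sample can look like. It assumes a scalar plant x[k+1] = a*x[k] + b*u[k] with unknown a and b; `rls_update`, `identify`, the test input, and the true parameter values are hypothetical.

```python
def rls_update(theta, P, phi, y, lam=1.0):
    """One RLS step for the 2-parameter model y = phi[0]*theta[0] + phi[1]*theta[1]."""
    Pphi = [P[0][0] * phi[0] + P[0][1] * phi[1], P[1][0] * phi[0] + P[1][1] * phi[1]]
    denom = lam + phi[0] * Pphi[0] + phi[1] * Pphi[1]
    gain = [Pphi[0] / denom, Pphi[1] / denom]
    err = y - (phi[0] * theta[0] + phi[1] * theta[1])   # one-step prediction error
    theta = [theta[0] + gain[0] * err, theta[1] + gain[1] * err]
    # Covariance update P <- (P - gain * phi^T P) / lam, using the symmetry of P
    P = [[(P[i][j] - gain[i] * Pphi[j]) / lam for j in range(2)] for i in range(2)]
    return theta, P

def identify(a_true=0.8, b_true=0.5, steps=100, lam=1.0):
    """Estimate (a, b) in x[k+1] = a*x[k] + b*u[k] from simulated state/input samples."""
    theta, P = [0.0, 0.0], [[1e6, 0.0], [0.0, 1e6]]   # uninformative initial guess
    x = 1.0
    for k in range(steps):
        u = 1.0 if k % 3 == 0 else -0.5       # persistently exciting test input
        x_next = a_true * x + b_true * u      # plant response, unknown to the estimator
        theta, P = rls_update(theta, P, [x, u], x_next, lam)
        x = x_next
    return theta

print(identify())  # close to [0.8, 0.5]
```

The updated model is available after every sample, which is what an MPC layer would need to re-plan online. For time-varying plants, a forgetting factor `lam` slightly below 1 discounts old samples so the estimate can track drifting parameters.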