Location embedding based pairwise distance learning for fine-grained diagnosis of urinary stones

Authors: Qiangguo Jin, Jiapeng Huang, Changming Sun, Hui Cui, Ping Xuan, Ran Su, Leyi Wei, Yu-Jie Wu, Chia-An Wu, Henry B. L. Duh, Yueh-Hsun Lu

Abstract: The precise diagnosis of urinary stones is crucial for devising effective treatment strategies. The diagnostic process, however, is often complicated by the low contrast between stones and surrounding tissues, as well as the variability in stone locations across different patients. To address this issue, we propose a novel location embedding based pairwise distance learning network (LEPD-Net) that leverages low-dose abdominal X-ray imaging combined with location information for the fine-grained diagnosis of urinary stones. LEPD-Net enhances the representation of stone-related features through context-aware region enhancement, incorporates critical location knowledge via stone location embedding, and achieves recognition of fine-grained objects with our innovative fine-grained pairwise distance learning. Additionally, we have established an in-house dataset on urinary tract stones to demonstrate the effectiveness of our proposed approach. Comprehensive experiments conducted on this dataset reveal that our framework significantly surpasses existing state-of-the-art methods.

Submitted 29 June, 2024; originally announced July 2024.

Journal ref: MICCAI 2024

arXiv:2406.14964 [pdf, other]

VividDreamer: Towards High-Fidelity and Efficient Text-to-3D Generation

Authors: Zixuan Chen, Ruijie Su, Jiahao Zhu, Lingxiao Yang, Jian-Huang Lai, Xiaohua Xie

Abstract: Text-to-3D generation aims to create 3D assets from text-to-image diffusion models. However, existing methods face an inherent bottleneck in generation quality because the widely-used objectives such as Score Distillation Sampling (SDS) inappropriately omit U-Net jacobians for swift generation, leading to significant bias compared to the "true" gradient obtained by full denoising sampling. This bias brings inconsistent updating direction, resulting in implausible 3D generation (e.g., color deviation, Janus problem, and semantically inconsistent details). In this work, we propose Pose-dependent Consistency Distillation Sampling (PCDS), a novel yet efficient objective for diffusion-based 3D generation tasks. Specifically, PCDS builds the pose-dependent consistency function within diffusion trajectories, allowing true gradients to be approximated through minimal sampling steps (1-3). Compared to SDS, PCDS can acquire a more accurate updating direction with the same sampling time (1 sampling step), while enabling few-step (2-3) sampling to trade compute for higher generation quality. For efficient generation, we propose a coarse-to-fine optimization strategy, which first utilizes 1-step PCDS to create the basic structure of 3D objects, and then gradually increases PCDS steps to generate fine-grained details. Extensive experiments demonstrate that our approach outperforms the state-of-the-art in generation quality and training efficiency, conspicuously alleviating the implausible 3D generation issues caused by the deviated updating direction. Moreover, it can be simply applied to many 3D generative applications to yield impressive 3D assets; please see our project page: https://narcissusex.github.io/VividDreamer.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.00341 [pdf, other]

DSCA: A Digital Subtraction Angiography Sequence Dataset and Spatio-Temporal Model for Cerebral Artery Segmentation

Authors: Qihang Xie, Mengguo Guo, Lei Mou, Dan Zhang, Da Chen, Caifeng Shan, Yitian Zhao, Ruisheng Su, Jiong Zhang

Abstract: Cerebrovascular diseases (CVDs) remain a leading cause of global disability and mortality. Digital Subtraction Angiography (DSA) sequences, recognized as the gold standard for diagnosing CVDs, can clearly visualize the dynamic flow and reveal pathological conditions within the cerebrovasculature. Therefore, precise segmentation of cerebral arteries (CAs) and classification between their main trunks and branches are crucial for physicians to accurately quantify diseases. However, achieving accurate CA segmentation in DSA sequences remains a challenging task due to small vessels with low contrast, and ambiguity between vessels and residual skull structures. Moreover, the lack of publicly available datasets limits exploration in the field. In this paper, we introduce a DSA Sequence-based Cerebral Artery segmentation dataset (DSCA), the first publicly accessible dataset designed specifically for pixel-level semantic segmentation of CAs. Additionally, we propose DSANet, a spatio-temporal network for CA segmentation in DSA sequences. Unlike existing DSA segmentation methods that focus only on a single frame, the proposed DSANet introduces a separate temporal encoding branch to capture dynamic vessel details across multiple frames. To enhance small vessel segmentation and improve vessel connectivity, we design a novel TemporalFormer module to capture global context and correlations among sequential frames. Furthermore, we develop a Spatio-Temporal Fusion (STF) module to effectively integrate spatial and temporal features from the encoder. Extensive experiments demonstrate that DSANet outperforms other state-of-the-art methods in CA segmentation, achieving a Dice of 0.9033.

Submitted 1 June, 2024; originally announced June 2024.
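
The DSCA entry above reports segmentation quality as a Dice score. As a reminder of what that number measures, here is a minimal sketch of the standard Dice coefficient for two binary masks. The function name, the flat-list mask representation, and the toy masks are assumptions made for illustration; this is not the authors' evaluation code.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |P and T| / (|P| + |T|) for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 1-D masks, purely illustrative (not data from the paper).
pred = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
print(dice_coefficient(pred, target))
```

A score of 1.0 means the predicted and reference masks coincide exactly; 0.0 means they share no foreground pixels.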

arXiv:2405.09744 [pdf, other]

Many Hands Make Light Work: Task-Oriented Dialogue System with Module-Based Mixture-of-Experts

Authors: Ruolin Su, Biing-Hwang Juang

Abstract: Task-oriented dialogue systems are broadly used in virtual assistants and other automated services, providing interfaces between users and machines to facilitate specific tasks. Nowadays, task-oriented dialogue systems have greatly benefited from pre-trained language models (PLMs). However, their task-solving performance is constrained by the inherent capacities of PLMs, and scaling these models is expensive and complex as the model size becomes larger. To address these challenges, we propose Soft Mixture-of-Expert Task-Oriented Dialogue system (SMETOD) which leverages an ensemble of Mixture-of-Experts (MoEs) to excel at subproblems and generate specialized outputs for task-oriented dialogues. SMETOD also scales up a task-oriented dialogue system with simplicity and flexibility while maintaining inference efficiency. We extensively evaluate our model on three benchmark functionalities: intent prediction, dialogue state tracking, and dialogue response generation. Experimental results demonstrate that SMETOD achieves state-of-the-art performance on most evaluated metrics. Moreover, comparisons against existing strong baselines show that SMETOD has a great advantage in the cost of inference and correctness in problem-solving.

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.08935 [pdf, other]

Function based sim-to-real learning for shape control of deformable free-form surfaces

Authors: Yingjun Tian, Guoxin Fang, Renbo Su, Weiming Wang, Simeon Gill, Andrew Weightman, Charlie C. L. Wang

Abstract: For the shape control of deformable free-form surfaces, simulation plays a crucial role in establishing the mapping between the actuation parameters and the deformed shapes. The differentiation of this forward kinematic mapping is usually employed to solve the inverse kinematic problem for determining the actuation parameters that can realize a target shape. However, the free-form surfaces obtained from simulators are always different from the physically deformed shapes due to the errors introduced by hardware and the simplification adopted in physical simulation. To fill the gap, we propose a novel deformation function based sim-to-real learning method that can map the geometric shape of a simulated model into its corresponding shape of the physical model. Unlike the existing sim-to-real learning methods that rely on completely acquired dense markers, our method accommodates sparsely distributed markers and can resiliently use all captured frames -- even for those in the presence of missing markers. To demonstrate its effectiveness, our sim-to-real method has been integrated into a neural network-based computational pipeline designed to tackle the inverse kinematic problem on a pneumatically actuated deformable mannequin.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.05017 [pdf, other]

6G Software Engineering: A Systematic Mapping Study

Authors: Ruoyu Su, Xiaozhou Li, Davide Taibi

Abstract: 6G will revolutionize the software world allowing faster cellular communications and a massive number of connected devices. 6G will enable a shift towards a continuous edge-to-cloud architecture. Current cloud solutions, where all the data is transferred and computed in the cloud, are not sustainable in such a large network of devices. Current technologies, including development methods, software architectures, and orchestration and offloading systems, still need to be prepared to cope with such requirements. In this paper, we conduct a Systematic Mapping Study to investigate the current research status of 6G Software Engineering. Results show that 18 research papers have been proposed in software process, software architecture, orchestration and offloading methods. Of these, software architecture and software-defined networks are respectively the area and the topic that have received the most attention in 6G Software Engineering. In addition, the main types of results of these papers are methods, architectures, platforms, frameworks and algorithms. The five tools/frameworks proposed are new and not currently studied by other researchers. The authors of these findings are mainly from China, India and Saudi Arabia. The results will enable researchers and practitioners to further research and extend for 6G Software Engineering.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2404.11119 [pdf, other]

DRepMRec: A Dual Representation Learning Framework for Multimodal Recommendation

Authors: Kangning Zhang, Yingjie Qin, Ruilong Su, Yifan Liu, Jiarui Jin, Weinan Zhang, Yong Yu

Abstract: Multimodal Recommendation focuses mainly on how to effectively integrate behavior and multimodal information in the recommendation task. Previous works suffer from two major issues. Firstly, the training process tightly couples the behavior module and multimodal module by jointly optimizing them using shared model parameters, which leads to suboptimal performance since behavior signals and modality signals often provide opposite guidance for the parameter updates. Secondly, previous approaches fail to take into account the significant distribution differences between behavior and modality when they attempt to fuse behavior and modality information. This results in a misalignment between the representations of behavior and modality. To address these challenges, in this paper, we propose a novel Dual Representation learning framework for Multimodal Recommendation called DRepMRec, which introduces separate dual lines for the coupling problem and Behavior-Modal Alignment (BMA) for the misalignment problem. Specifically, DRepMRec leverages two independent lines of representation learning to calculate behavior and modal representations. After obtaining separate behavior and modal representations, we design a Behavior-Modal Alignment Module (BMA) to align and fuse the dual representations to solve the misalignment problem. Furthermore, we integrate the BMA into other recommendation models, resulting in consistent performance improvements. To ensure dual representations maintain their semantic independence during alignment, we introduce Similarity-Supervised Signal (SSS) for representation learning. We conduct extensive experiments on three public datasets and our method achieves state-of-the-art (SOTA) results. The source code will be available upon acceptance.

Submitted 17 April, 2024; originally announced April 2024.

Comments: 8 pages, 9 figures

arXiv:2403.12767 [pdf, other]

doi 10.1016/j.eswa.2023.122093

Inter- and intra-uncertainty based feature aggregation model for semi-supervised histopathology image segmentation

Authors: Qiangguo Jin, Hui Cui, Changming Sun, Yang Song, Jiangbin Zheng, Leilei Cao, Leyi Wei, Ran Su

Abstract: Acquiring pixel-level annotations is often limited in applications such as histology studies that require domain expertise. Various semi-supervised learning approaches have been developed to work with limited ground truth annotations, such as the popular teacher-student models. However, hierarchical prediction uncertainty within the student model (intra-uncertainty) and image prediction uncertainty (inter-uncertainty) have not been fully utilized by existing methods. To address these issues, we first propose a novel inter- and intra-uncertainty regularization method to measure and constrain both inter- and intra-inconsistencies in the teacher-student architecture. We also propose a new two-stage network with pseudo-mask guided feature aggregation (PG-FANet) as the segmentation model. The two-stage structure complements the uncertainty regularization strategy to avoid introducing extra modules in solving uncertainties, and the aggregation mechanisms enable multi-scale and multi-stage feature integration. Comprehensive experimental results over the MoNuSeg and CRAG datasets show that our PG-FANet outperforms other state-of-the-art methods and our semi-supervised learning framework yields competitive performance with a limited amount of labeled data.

Submitted 19 March, 2024; originally announced March 2024.

Journal ref: Expert Systems with Applications, 2024, 238: 122093

arXiv:2403.12384 [pdf, other]

An Aligning and Training Framework for Multimodal Recommendations

Authors: Yifan Liu, Kangning Zhang, Xiangyuan Ren, Yanhua Huang, Jiarui Jin, Yingjie Qin, Ruilong Su, Ruiwen Xu, Weinan Zhang

Abstract: With the development of multimedia applications, multimodal recommendations play an essential role, as they can leverage rich contexts beyond user and item interactions. Existing methods mainly use them to help learn ID features; however, there exist semantic gaps among multimodal content features and ID features. Directly using multimodal information as an auxiliary would lead to misalignment in items' and users' representations. In this paper, we first systematically investigate the misalignment issue in multimodal recommendations, and propose a solution named AlignRec. In AlignRec, the recommendation objective is decomposed into three alignments, namely alignment within contents, alignment between content and categorical ID, and alignment between users and items. Each alignment is characterized by a distinct objective function. To effectively train AlignRec, we propose starting from pre-training the first alignment to obtain unified multimodal features and subsequently training the following two alignments together. As it is essential to analyze whether each multimodal feature helps in training, we design three new classes of metrics to evaluate intermediate performance. Our extensive experiments on three real-world datasets consistently verify the superiority of AlignRec compared to nine baselines. We also find that the multimodal features generated by our framework are better than currently used ones, which are to be open-sourced.

Submitted 21 May, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: 11 pages, revise some typos, correct some explanations

arXiv:2403.05820 [pdf, other]

An Audio-textual Diffusion Model For Converting Speech Signals Into Ultrasound Tongue Imaging Data

Authors: Yudong Yang, Rongfeng Su, Xiaokang Liu, Nan Yan, Lan Wang

Abstract: Acoustic-to-articulatory inversion (AAI) aims to convert audio into articulator movements, such as ultrasound tongue imaging (UTI) data. A limitation of existing AAI methods is that they use only personalized acoustic information to derive the general patterns of tongue motions, and thus the quality of generated UTI data is limited. To address this issue, this paper proposes an audio-textual diffusion model for the UTI data generation task. In this model, the inherent acoustic characteristics of individuals related to the tongue motion details are encoded by using wav2vec 2.0, while the ASR transcriptions related to the universality of tongue motions are encoded by using BERT. UTI data are then generated by using a diffusion module. Experimental results showed that the proposed diffusion model could generate high-quality UTI data with clear tongue contours, which are crucial for linguistic analysis and clinical assessment. The project can be found at https://yangyudong2020.github.io/wav2uti/.

Submitted 12 March, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

Comments: ICASSP2024 Accept

arXiv:2403.05753 [pdf, other]

UDCR: Unsupervised Aortic DSA/CTA Rigid Registration Using Deep Reinforcement Learning and Overlap Degree Calculation

Authors: Wentao Liu, Bowen Liang, Weijin Xu, Tong Tian, Qingsheng Lu, Xipeng Pan, Haoyuan Li, Siyu Tian, Huihua Yang, Ruisheng Su

Abstract: The rigid registration of aortic Digital Subtraction Angiography (DSA) and Computed Tomography Angiography (CTA) can provide 3D anatomical details of the vasculature for the interventional surgical treatment of conditions such as aortic dissection and aortic aneurysms, holding significant value for clinical research. However, the current methods for 2D/3D image registration are dependent on manual annotations or synthetic data, as well as the extraction of landmarks, which is not suitable for cross-modal registration of aortic DSA/CTA. In this paper, we propose an unsupervised method, UDCR, for aortic DSA/CTA rigid registration based on deep reinforcement learning. Leveraging the imaging principles and characteristics of DSA and CTA, we have constructed a cross-dimensional registration environment based on spatial transformations. Specifically, we propose an overlap degree calculation reward function that measures the intensity difference between the foreground and background, aimed at assessing the accuracy of registration between segmentation maps and DSA images. This method is highly flexible, allowing for the loading of pre-trained models to perform registration directly or to seek the optimal spatial transformation parameters through online learning. We manually annotated 61 pairs of aortic DSA/CTA for algorithm evaluation. The results indicate that the proposed UDCR achieved a Mean Absolute Error (MAE) of 2.85 mm in translation and 4.35° in rotation, showing significant potential for clinical applications.

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.05748 [pdf, other]

Image-Guided Autonomous Guidewire Navigation in Robot-Assisted Endovascular Interventions using Reinforcement Learning

Authors: Wentao Liu, Tong Tian, Weijin Xu, Bowen Liang, Qingsheng Lu, Xipeng Pan, Wenyi Zhao, Huihua Yang, Ruisheng Su

Abstract: Autonomous robots in endovascular interventions possess the potential to navigate guidewires with safety and reliability, while reducing human error and shortening surgical time. However, current methods of guidewire navigation based on Reinforcement Learning (RL) depend on manual demonstration data or magnetic guidance. In this work, we propose an Image-guided Autonomous Guidewire Navigation (IAGN) method. Specifically, we introduce BDA-star, a path planning algorithm with boundary distance constraints, for the trajectory planning of guidewire navigation. We established an IAGN-RL environment where the observations are real-time guidewire feeding images highlighting the position of the guidewire tip and the planned path. We proposed a reward function based on the distances from both the guidewire tip to the planned path and the target to evaluate the agent's actions. Furthermore, in the policy network, we employ a pre-trained convolutional neural network to extract features, mitigating stability issues and slow convergence rates associated with direct learning from raw pixels. Experiments conducted on the aortic simulation IAGN platform demonstrated that the proposed method, targeting the left subclavian artery and the brachiocephalic artery, achieved a 100% guidewire navigation success rate, along with reduced movement and retraction distances and trajectories that tend toward the center of the vessels.

Submitted 8 March, 2024; originally announced March 2024.
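
The guidewire-navigation entry above describes a reward built from two distances: guidewire tip to planned path, and guidewire tip to target. The abstract does not give the formula, so the following is only a hypothetical sketch of a reward of that general shape. The weights `w_path` and `w_target`, the `reach_radius`, the terminal `bonus`, and the waypoint-based path representation are all assumptions, not values or design choices from the paper.

```python
import math


def distance_to_path(point, path):
    """Shortest Euclidean distance from a point to any waypoint of the planned path."""
    return min(math.dist(point, waypoint) for waypoint in path)


def reward(tip, path, target, w_path=1.0, w_target=1.0, reach_radius=2.0, bonus=10.0):
    """Hypothetical distance-based reward: penalize straying from the path and
    remaining far from the target; grant a fixed bonus once the target is reached."""
    d_target = math.dist(tip, target)
    if d_target <= reach_radius:
        return bonus
    return -(w_path * distance_to_path(tip, path) + w_target * d_target)


# Toy 2-D example in pixel coordinates (illustrative only).
path = [(0, 0), (0, 5), (0, 10)]
target = (0, 10)
print(reward((3, 5), path, target))  # off the path, far from the target
print(reward((1, 5), path, target))  # closer to the path, so a higher reward
print(reward((0, 9), path, target))  # within reach_radius of the target
```

With this shape, an agent is pushed both to stay near the planned trajectory and to make progress toward the target; the relative weighting of the two terms is a tuning choice.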

arXiv:2403.03448 [pdf, ps, other]

doi 10.1016/j.patcog.2024.110307

Kernel Correlation-Dissimilarity for Multiple Kernel k-Means Clustering

Authors: Rina Su, Yu Guo, Caiying Wu, Qiyu Jin, Tieyong Zeng

Abstract: The main objective of the Multiple Kernel k-Means (MKKM) algorithm is to extract non-linear information and achieve optimal clustering by optimizing base kernel matrices. Current methods enhance information diversity and reduce redundancy by exploiting interdependencies among multiple kernels based on correlations or dissimilarities. Nevertheless, relying solely on a single metric, such as correlation or dissimilarity, to define kernel relationships introduces bias and incomplete characterization. Consequently, this limitation hinders efficient information extraction, ultimately compromising clustering performance. To tackle this challenge, we introduce a novel method that systematically integrates both kernel correlation and dissimilarity. Our approach comprehensively captures kernel relationships, facilitating more efficient classification information extraction and improving clustering performance. By emphasizing the coherence between kernel correlation and dissimilarity, our method offers a more objective and transparent strategy for extracting non-linear information and significantly improving clustering precision, supported by theoretical rationale. We assess the performance of our algorithm on 13 challenging benchmark datasets, demonstrating its superiority over contemporary state-of-the-art MKKM techniques.

Submitted 5 March, 2024; originally announced March 2024.

Comments: 36 pages. This paper was accepted by Pattern Recognition on January 31, 2024

Journal ref: Pattern Recognition, 2024, 150:110307

arXiv:2401.11867 [pdf, other]

Modular Monolith: Is This the Trend in Software Architecture?

Authors: Ruoyu Su, Xiaozhou Li

Abstract: Recently, modular monolith architecture has attracted the attention of practitioners, as Google proposed the "Service Weaver" framework to enable developers to write applications as modular monoliths and deploy them as a set of microservices. Google considered it a framework that has the best of both worlds, and it seems to be a trend in software architecture. This paper aims to understand the definition of the modular monolith in industry and investigate frameworks and cases building modular monolith architecture. We conducted a systematic grey literature review, and the results show that the modular monolith combines the advantages of monoliths with those of microservices. We found three frameworks and four cases of building modular monolith architecture. In general, the modular monolith is an alternative to microservices, and it could also be an intermediate step before systems migrate to microservices.

Submitted 22 January, 2024; originally announced January 2024.

arXiv:2401.07041 [pdf, other]

An automated framework for brain vessel centerline extraction from CTA images

Authors: Sijie Liu, Ruisheng Su, Jianghang Su, Jingmin Xin, Jiayi Wu, Wim van Zwam, Pieter Jan van Doormaal, Aad van der Lugt, Wiro J. Niessen, Nanning Zheng, Theo van Walsum

Abstract: Accurate automated extraction of brain vessel centerlines from CTA images plays an important role in diagnosis and therapy of cerebrovascular diseases, such as stroke. However, this task remains challenging due to the complex cerebrovascular structure, the varying imaging quality, and vessel pathology effects. In this paper, we consider automatic lumen segmentation generation without additional annotation effort by physicians and more effective use of the generated lumen segmentation for improved centerline extraction performance. We propose an automated framework for brain vessel centerline extraction from CTA images. The framework consists of four major components: (1) pre-processing approaches that register CTA images with a CT atlas and divide these images into input patches, (2) lumen segmentation generation from annotated vessel centerlines using graph cuts and robust kernel regression, (3) a dual-branch topology-aware UNet (DTUNet) that can effectively utilize the annotated vessel centerlines and the generated lumen segmentation through a topology-aware loss (TAL) and its dual-branch design, and (4) post-processing approaches that skeletonize the predicted lumen segmentation. Extensive experiments on a multi-center dataset demonstrate that the proposed framework outperforms state-of-the-art methods in terms of average symmetric centerline distance (ASCD) and overlap (OV). Subgroup analyses further suggest that the proposed framework holds promise in clinical applications for stroke treatment. Code is publicly available at https://github.com/Liusj-gh/DTUNet.

Submitted 13 January, 2024; originally announced January 2024.

arXiv:2401.04570 [pdf, other]

An Automatic Cascaded Model for Hemorrhagic Stroke Segmentation and Hemorrhagic Volume Estimation

Authors: Weijin Xu, Zhuang Sha, Huihua Yang, Rongcai Jiang, Zhanying Li, Wentao Liu, Ruisheng Su

Abstract: Hemorrhagic Stroke (HS) has a rapid onset and is a serious condition that poses a great health threat. Promptly and accurately delineating the bleeding region and estimating the volume of bleeding in Computed Tomography (CT) images can assist clinicians in treatment planning, leading to improved treatment outcomes for patients. In this paper, a cascaded 3D model is constructed based on UNet to perform a two-stage segmentation of the hemorrhage area in CT images from rough to fine, and the hemorrhage volume is automatically calculated from the segmented area. On a dataset with 341 cases of hemorrhagic stroke CT scans, the proposed model provides high-quality segmentation outcomes with higher accuracy (DSC 85.66%) and better computational efficiency (6.2 seconds per sample) compared to the traditional Tada formula with respect to hemorrhage volume estimation.

Submitted 9 January, 2024; originally announced January 2024.

Comments: Accepted by SWITCH2023: Stroke Workshop on Imaging and Treatment CHallenges, a workshop at MICCAI 2023
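
Note: the two kinds of volume estimate contrasted in the abstract above can be sketched as follows. This is a minimal illustration and not the authors' code. The function names, the millimetre units, and the use of ABC/2 as the form of the Tada formula are assumptions made here for the example; the Dice score is the standard overlap measure usually abbreviated DSC.

```python
import numpy as np


def mask_volume_ml(mask, spacing_mm):
    """Volume of a binary segmentation mask in millilitres.

    Counts foreground voxels and multiplies by the volume of one voxel.
    spacing_mm is the voxel size along each axis in millimetres (assumed input).
    """
    voxel_mm3 = float(np.prod(spacing_mm))
    return float(np.count_nonzero(mask)) * voxel_mm3 / 1000.0  # 1 mL = 1000 mm^3


def abc_over_2_volume_ml(a_mm, b_mm, c_mm):
    """Ellipsoid-style estimate A*B*C/2 from three manual measurements.

    A and B are perpendicular diameters on the largest slice and C is the
    vertical extent, all in millimetres (assumed convention).
    """
    return a_mm * b_mm * c_mm / 2.0 / 1000.0


def dice_score(pred, truth):
    """Dice similarity coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * float(np.logical_and(pred, truth).sum()) / float(denom)


if __name__ == "__main__":
    # Synthetic 10 x 10 x 10 voxel cube inside a 20^3 volume, 1 mm isotropic voxels.
    mask = np.zeros((20, 20, 20), dtype=bool)
    mask[5:15, 5:15, 5:15] = True
    print(mask_volume_ml(mask, (1.0, 1.0, 1.0)))
    print(abc_over_2_volume_ml(10.0, 10.0, 10.0))
    print(dice_score(mask, mask))
```

The voxel-count estimate follows the segmented shape exactly, whereas ABC/2 assumes a roughly ellipsoidal bleed, which is one reason the two can disagree on irregular hemorrhages.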

arXiv:2311.06345 [pdf, other]

Schema Graph-Guided Prompt for Multi-Domain Dialogue State Tracking

Authors: Ruolin Su, Ting-Wei Wu, Biing-Hwang Juang

Abstract: Tracking dialogue states is an essential topic in task-oriented dialogue systems, which involve filling in the necessary information in pre-defined slots corresponding to a schema. While general pre-trained language models have been shown to be effective in slot-filling, their performance is limited when applied to specific domains. We propose a graph-based framework that learns domain-specific prompts by incorporating the dialogue schema. Specifically, we embed domain-specific schema encoded by a graph neural network into the pre-trained language model, which allows for relations in the schema to guide the model for better adaptation to the specific domain. Our experiments demonstrate that the proposed graph-based method outperforms other multi-domain DST approaches while using similar or fewer trainable parameters. We also conduct a comprehensive study of schema graph architectures, parameter usage, and module ablation that demonstrates the effectiveness of our model on multi-domain dialogue state tracking.

Submitted 10 November, 2023; originally announced November 2023.

arXiv:2310.05445 [pdf, other]

doi 10.1007/978-3-031-43990-2_72

AngioMoCo: Learning-based Motion Correction in Cerebral Digital Subtraction Angiography

Authors: Ruisheng Su, Matthijs van der Sluijs, Sandra Cornelissen, Wim van Zwam, Aad van der Lugt, Wiro Niessen, Danny Ruijters, Theo van Walsum, Adrian Dalca

Abstract: Cerebral X-ray digital subtraction angiography (DSA) is the standard imaging technique for visualizing blood flow and guiding endovascular treatments. The quality of DSA is often negatively impacted by body motion during acquisition, leading to decreased diagnostic value. Time-consuming iterative methods address motion correction based on non-rigid registration, and employ sparse key points and non-rigidity penalties to limit vessel distortion. Recent methods alleviate subtraction artifacts by predicting the subtracted frame from the corresponding unsubtracted frame, but do not explicitly compensate for motion-induced misalignment between frames. This hinders the serial evaluation of blood flow, and often causes undesired vasculature and contrast flow alterations, leading to impeded usability in clinical practice. To address these limitations, we present AngioMoCo, a learning-based framework that generates motion-compensated DSA sequences from X-ray angiography. AngioMoCo integrates contrast extraction and motion correction, enabling differentiation between patient motion and intensity changes caused by contrast flow. This strategy improves registration quality while being substantially faster than iterative elastix-based methods. We demonstrate AngioMoCo on a large national multi-center dataset (MR CLEAN Registry) of clinically acquired angiographic images through comprehensive qualitative and quantitative analyses. AngioMoCo produces high-quality motion-compensated DSA, removing motion artifacts while preserving contrast flow. Code is publicly available at https://github.com/RuishengSu/AngioMoCo.

Submitted 9 October, 2023; originally announced October 2023.

arXiv:2308.15281 [pdf, ps, other]

Back to the Future: From Microservice to Monolith

Authors: Ruoyu Su, Xiaozhou Li, Davide Taibi

Abstract: Recently the trend of companies switching from microservice back to monolith has increased, leading to intense debate in the industry. We conduct a multivocal literature review to investigate the reasons for this phenomenon and the key aspects to pay attention to during the switch back, and to analyze the opinions of other practitioners. The results pave the way for further research and provide guidance for industrial companies switching from microservice back to monolith.

Submitted 29 August, 2023; originally announced August 2023.

arXiv:2307.12519 [pdf, other]

DEPHN: Different Expression Parallel Heterogeneous Network using virtual gradient optimization for Multi-task Learning

Authors: Menglin Kong, Ri Su, Shaojie Zhao, Muzhou Hou

Abstract: Recommendation system algorithms based on multi-task learning (MTL) are the major method for Internet operators to understand users and predict their behaviors in the multi-behavior scenario of a platform. Task correlation is an important consideration of MTL goals; traditional models use shared-bottom models and gating experts to realize shared representation learning and information differentiation. However, the relationships between real-world tasks are often more complex, and existing methods do not properly handle information sharing. In this paper, we propose a Different Expression Parallel Heterogeneous Network (DEPHN) to model multiple tasks simultaneously. DEPHN constructs the experts at the bottom of the model by using different feature interaction methods to improve the generalization ability of the shared information flow. In view of the model's differentiating ability for different task information flows, DEPHN uses feature explicit mapping and virtual gradient coefficient for expert gating during the training process, and adaptively adjusts the learning intensity of the gated unit by considering the difference of gating values and task correlation. Extensive experiments on artificial and real-world datasets demonstrate that our proposed method can capture task correlation in complex situations and achieve better performance than baseline models. (Accepted at IJCNN 2023.)

Submitted 24 July, 2023; originally announced July 2023.

arXiv:2307.12518 [pdf, other]

FaFCNN: A General Disease Classification Framework Based on Feature Fusion Neural Networks

Authors: Menglin Kong, Shaojie Zhao, Juan Cheng, Xingquan Li, Ri Su, Muzhou Hou, Cong Cao

Abstract: There are two fundamental problems in applying deep learning/machine learning methods to disease classification tasks: one is the insufficient number and poor quality of training samples; the other is how to effectively fuse multiple source features and thus train robust classification models. To address these problems, inspired by the process of human learning knowledge, we propose the Feature-aware Fusion Correlation Neural Network (FaFCNN), which introduces a feature-aware interaction module and a feature alignment module based on domain adversarial learning. This is a general framework for disease classification, and FaFCNN improves the way existing methods obtain sample correlation features. The experimental results show that training using augmented features obtained by pre-training gradient boosting decision tree yields more performance gains than random-forest based methods. On the low-quality dataset with a large amount of missing data in our setup, FaFCNN obtains a consistently optimal performance compared to competitive baselines. In addition, extensive experiments demonstrate the robustness of the proposed method and the effectiveness of each component of the model. (Accepted at IEEE SMC 2023.)

Submitted 24 July, 2023; originally announced July 2023.

arXiv:2307.02935 [pdf, other]

DisAsymNet: Disentanglement of Asymmetrical Abnormality on Bilateral Mammograms using Self-adversarial Learning

Authors: Xin Wang, Tao Tan, Yuan Gao, Luyi Han, Tianyu Zhang, Chunyao Lu, Regina Beets-Tan, Ruisheng Su, Ritse Mann

Abstract: Asymmetry is a crucial characteristic of bilateral mammograms (Bi-MG) when abnormalities are developing. It is widely utilized by radiologists for diagnosis. The question of 'what the symmetrical Bi-MG would look like when the asymmetrical abnormalities have been removed?' has not yet received strong attention in the development of algorithms on mammograms. Addressing this question could provide valuable insights into mammographic anatomy and aid in diagnostic interpretation. Hence, we propose a novel framework, DisAsymNet, which utilizes asymmetrical abnormality transformer guided self-adversarial learning for disentangling abnormalities and symmetric Bi-MG. At the same time, our proposed method is partially guided by randomly synthesized abnormalities. We conduct experiments on three public and one in-house dataset, and demonstrate that our method outperforms existing methods in abnormality classification, segmentation, and localization tasks. Additionally, reconstructed normal mammograms can provide insights toward better interpretable visual cues for clinical diagnosis. The code will be accessible to the public.

Submitted 6 July, 2023; originally announced July 2023.

arXiv:2306.12153 [pdf, other]

DIAS: A Dataset and Benchmark for Intracranial Artery Segmentation in DSA sequences

Authors: Wentao Liu, Tong Tian, Lemeng Wang, Wei** Xu, Lei Li, Haoyuan Li, Wenyi Zhao, Siyu Tian, Xipeng Pan, Huihua Yang, Feng Gao, Yiming Deng, Xin Yang, Ruisheng Su

Abstract: The automated segmentation of Intracranial Arteries (IA) in Digital Subtraction Angiography (DSA) plays a crucial role in the quantification of vascular morphology, significantly contributing to computer-assisted stroke research and clinical practice. Current research primarily focuses on the segmentation of single-frame DSA using proprietary datasets. However, these methods face challenges due to the inherent limitation of single-frame DSA, which only partially displays vascular contrast, thereby hindering accurate vascular structure representation. In this work, we introduce DIAS, a dataset specifically developed for IA segmentation in DSA sequences. We establish a comprehensive benchmark for evaluating DIAS, covering full, weak, and semi-supervised segmentation methods. Specifically, we propose the vessel sequence segmentation network, in which the sequence feature extraction module effectively captures spatiotemporal representations of intravascular contrast, achieving intracranial artery segmentation in 2D+Time DSA sequences. For weakly-supervised IA segmentation, we propose a novel scribble learning-based image segmentation framework, which, under the guidance of scribble labels, employs cross pseudo-supervision and consistency regularization to improve the performance of the segmentation network. Furthermore, we introduce the random patch-based self-training framework, aimed at alleviating the performance constraints encountered in IA segmentation due to the limited availability of annotated DSA data. Our extensive experiments on the DIAS dataset demonstrate the effectiveness of these methods as potential baselines for future research and clinical applications. The dataset and code are publicly available at https://doi.org/10.5281/zenodo.11396520 and https://github.com/lseventeen/DIAS.

Submitted 13 June, 2024; v1 submitted 21 June, 2023; originally announced June 2023.

arXiv:2305.12058 [pdf, other]

DADIN: Domain Adversarial Deep Interest Network for Cross Domain Recommender Systems

Authors: Menglin Kong, Muzhou Hou, Shaojie Zhao, Feng Liu, Ri Su, Yinghao Chen

Abstract: Click-Through Rate (CTR) prediction is one of the main tasks of the recommendation system, which is conducted by a user for different items to give the recommendation results. Cross-domain CTR prediction models have been proposed to overcome problems of data sparsity, long tail distribution of user-item interactions, and cold start of items or users. In order to make knowledge transfer from the source domain to the target domain smoother, an innovative deep learning cross-domain CTR prediction model, Domain Adversarial Deep Interest Network (DADIN), is proposed to convert the cross-domain recommendation task into a domain adaptation problem. The joint distribution alignment of the two domains is realized by introducing domain agnostic layers and a specially designed loss, which is optimized together with the CTR prediction loss through adversarial training. It is found that the Area Under Curve (AUC) of DADIN is 0.08% higher than the most competitive baseline on the Huawei dataset and 0.71% higher than its competitors on the Amazon dataset, achieving state-of-the-art results on two real datasets. The ablation study shows that introducing the adversarial method leads to AUC improvements of 2.34% on the Huawei dataset and 16.67% on the Amazon dataset.

Submitted 19 May, 2023; originally announced May 2023.

arXiv:2304.02948 [pdf, other]

FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead

Authors: Kang Chen, Tao Han, Junchao Gong, Lei Bai, Fenghua Ling, Jing-Jia Luo, Xi Chen, Leiming Ma, Tianning Zhang, Rui Su, Yuanzheng Ci, Bin Li, Xiaokang Yang, Wanli Ouyang

Abstract: We present FengWu, an advanced data-driven global medium-range weather forecast system based on Artificial Intelligence (AI). Different from existing data-driven weather forecast methods, FengWu solves the medium-range forecast problem from a multi-modal and multi-task perspective. Specifically, a deep learning architecture equipped with model-specific encoder-decoders and cross-modal fusion Transformer is elaborately designed, which is learned under the supervision of an uncertainty loss to balance the optimization of different predictors in a region-adaptive manner. Besides this, a replay buffer mechanism is introduced to improve medium-range forecast performance. With 39-year data training based on the ERA5 reanalysis, FengWu is able to accurately reproduce the atmospheric dynamics and predict the future land and atmosphere states at 37 vertical levels on a 0.25° latitude-longitude resolution. Hindcasts of 6-hourly weather in 2018 based on ERA5 demonstrate that FengWu performs better than GraphCast in predicting 80% of the 880 reported predictands, e.g., reducing the root mean square error (RMSE) of 10-day lead global z500 prediction from 733 to 651 $m^2/s^2$. In addition, the inference cost of each iteration is merely 600ms on NVIDIA Tesla A100 hardware. The results suggest that FengWu can significantly improve the forecast skill and extend the skillful global medium-range weather forecast out to 10.75 days lead (with ACC of z500 > 0.6) for the first time.

Submitted 6 April, 2023; originally announced April 2023.

Comments: 12 pages

arXiv:2303.01091 [pdf, other]

OPE-SR: Orthogonal Position Encoding for Designing a Parameter-free Upsampling Module in Arbitrary-scale Image Super-Resolution

Authors: Gaochao Song, Luo Zhang, Ran Su, Jianfeng Shi, Ying He, Qian Sun

Abstract: Implicit neural representation (INR) is a popular approach for arbitrary-scale image super-resolution (SR). As a key component of INR, position encoding improves its representation ability. Motivated by position encoding, we propose orthogonal position encoding (OPE) - an extension of position encoding - and an OPE-Upscale module to replace the INR-based upsampling module for arbitrary-scale image super-resolution. Like INR, our OPE-Upscale Module takes 2D coordinates and latent code as inputs; however, it does not require training parameters. This parameter-free feature allows the OPE-Upscale Module to directly perform linear combination operations to reconstruct an image in a continuous manner, achieving an arbitrary-scale image reconstruction. As a concise SR framework, our method has high computing efficiency and consumes less memory compared to the state-of-the-art (SOTA), which has been confirmed by extensive experiments and evaluations. In addition, our method has comparable results with SOTA in arbitrary-scale image super-resolution. Last but not least, we show that OPE corresponds to a set of orthogonal bases, justifying our design principle.

Submitted 2 March, 2023; originally announced March 2023.

Comments: Accepted by CVPR 2023. 11 pages

arXiv:2302.13201 [pdf, other]

CLICKER: Attention-Based Cross-Lingual Commonsense Knowledge Transfer

Authors: Ruolin Su, Zhongkai Sun, Sixing Lu, Chengyuan Ma, Chenlei Guo

Abstract: Recent advances in cross-lingual commonsense reasoning (CSR) are facilitated by the development of multilingual pre-trained models (mPTMs). While mPTMs show the potential to encode commonsense knowledge for different languages, transferring commonsense knowledge learned in large-scale English corpus to other languages is challenging. To address this problem, we propose the attention-based Cross-LIngual Commonsense Knowledge transfER (CLICKER) framework, which minimizes the performance gaps between English and non-English languages in commonsense question-answering tasks. CLICKER effectively improves commonsense reasoning for non-English languages by differentiating non-commonsense knowledge from commonsense knowledge. Experimental results on public benchmarks demonstrate that CLICKER achieves remarkable improvements in the cross-lingual CSR task for languages other than English.

Submitted 25 February, 2023; originally announced February 2023.

Comments: Accepted by ICASSP 2023

arXiv:2302.13013 [pdf, other]

Choice Fusion as Knowledge for Zero-Shot Dialogue State Tracking

Authors: Ruolin Su, Jingfeng Yang, Ting-Wei Wu, Biing-Hwang Juang

Abstract: With the pressing need to deploy dialogue systems in new domains at lower cost, zero-shot dialogue state tracking (DST), which tracks users' requirements in task-oriented dialogues without training on desired domains, is drawing increasing attention. Although prior works have leveraged question-answering (QA) data to reduce the need for in-domain training in DST, they fail to explicitly model knowledge transfer and fusion for tracking dialogue states. To address this issue, we propose CoFunDST, which is trained on domain-agnostic QA datasets and directly uses candidate choices of slot-values as knowledge for zero-shot dialogue-state generation, based on a T5 pre-trained language model. Specifically, CoFunDST selects choices that are highly relevant to the reference context and fuses them to initialize the decoder to constrain the model outputs. Our experimental results show that our proposed model achieves higher joint goal accuracy than existing zero-shot DST approaches in most domains on MultiWOZ 2.1. Extensive analyses demonstrate the effectiveness of our proposed approach for improving zero-shot DST learning from QA.

Submitted 25 February, 2023; originally announced February 2023.

Comments: Accepted by ICASSP 2023

arXiv:2302.04576 [pdf]

doi 10.5121/ijwest.2023.14101

Research on data integration of overseas discrete archives from the perspective of digital humanities

Authors: Rina Su, Yumeng Li, Xin Yang, Xin Yin, Tao Chen

Abstract: The digitization of displaced archives is of great historical and cultural significance. Through the construction of digital humanistic platforms represented by the MISS platform, and the comprehensive application of IIIF technology, knowledge graph technology, ontology technology, and other popular information technologies, we find that the digital framework of displaced archives built through the MISS platform can promote the establishment of a standardized cooperation and dialogue mechanism between the archives authorities and other government departments. At the same time, it can embed the work of archives into the construction of digital government and the economy, promote the exploration of the integration of archives management, data management, and information resource management, and ultimately promote the construction of a digital society. By fostering a new partnership between archives departments and enterprises, think tanks, research institutes, and industry associations, the role of multiple social subjects in the modernization process of the archives governance system and governance capacity will be brought into play. The National Archives Administration has launched a special operation to recover scattered archives overseas, drawing up a list and a recovery action plan for archives lost to overseas institutions and individuals due to war and other reasons. Through the National Archives Administration, the State Administration of Cultural Heritage, the Ministry of Foreign Affairs, the Supreme People's Court, the Supreme People's Procuratorate, and the Ministry of Justice, specific recovery work is carried out by studying and working on international laws.

Submitted 9 February, 2023; originally announced February 2023.

Journal ref: International Journal of Web&Semantic Technology,2023,Vol14,Num1

arXiv:2212.01575 [pdf]

Multi-view deep learning based molecule design and structural optimization accelerates the SARS-CoV-2 inhibitor discovery

Authors: Chao Pang, Yu Wang, Yi Jiang, Ruheng Wang, Ran Su, Leyi Wei

Abstract: In this work, we propose MEDICO, a Multi-viEw Deep generative model for molecule generation, structural optimization, and the SARS-CoV-2 Inhibitor disCOvery. To the best of our knowledge, MEDICO is the first-of-its-kind graph generative model that can generate molecular graphs similar to the structure of targeted molecules, with a multi-view representation learning framework to sufficiently and adaptively learn comprehensive structural semantics from targeted molecular topology and geometry. We show that our MEDICO significantly outperforms the state-of-the-art methods in generating valid, unique, and novel molecules under benchmarking comparisons. In particular, we showcase that the multi-view deep learning model enables us to generate not only molecules structurally similar to the targeted molecules but also molecules with desired chemical properties, demonstrating the strong capability of our model in exploring the chemical space deeply. Moreover, case study results on targeted molecule generation for the SARS-CoV-2 main protease (Mpro) show that by integrating molecule docking into our model as a chemical prior, we successfully generate new small molecules with desired drug-like properties for the Mpro, potentially accelerating the de novo design of Covid-19 drugs. Further, we apply MEDICO to the structural optimization of three well-known Mpro inhibitors (N3, 11a, and GC376) and achieve ~88% improvement in their binding affinity to Mpro, demonstrating the application value of our model for the development of therapeutics for SARS-CoV-2 infection.

Submitted 3 December, 2022; originally announced December 2022.

arXiv:2211.11324 [pdf, other]

doi 10.1109/TCSVT.2022.3201540

Slow Motion Matters: A Slow Motion Enhanced Network for Weakly Supervised Temporal Action Localization

Authors: Weiqi Sun, Rui Su, Qian Yu, Dong Xu

Abstract: Weakly supervised temporal action localization (WTAL) aims to localize actions in untrimmed videos with only weak supervision information (e.g. video-level labels). Most existing models handle all input videos with a fixed temporal scale. However, such models are not sensitive to actions whose pace of movement differs from the "normal" speed, especially slow-motion action instances, which complete the movements at a much slower speed than their counterparts at normal speed. Here arises the slow-motion blurred issue: it is hard to explore salient slow-motion information from videos at "normal" speed. In this paper, we propose a novel framework termed Slow Motion Enhanced Network (SMEN) to improve the ability of a WTAL network by compensating for its lack of sensitivity to slow-motion action segments. The proposed SMEN comprises a Mining module and a Localization module. The mining module generates a mask to mine slow-motion-related features by utilizing the relationships between normal motion and slow motion, while the localization module leverages the mined slow-motion features as complementary information to improve the temporal action localization results. Our proposed framework can be easily adopted by existing WTAL networks and enables them to be more sensitive to slow-motion actions. Extensive experiments on three benchmarks are conducted, which demonstrate the high performance of our proposed framework.

Submitted 21 November, 2022; originally announced November 2022.

Journal ref: IEEE Transactions on Circuits and Systems for Video Technology, 2022

arXiv:2211.09375 [pdf, other]

3D-QueryIS: A Query-based Framework for 3D Instance Segmentation

Authors: Jiaheng Liu, Tong He, Honghui Yang, Rui Su, Jiayi Tian, Junran Wu, Hongcheng Guo, Ke Xu, Wanli Ouyang

Abstract: Previous top-performing methods for 3D instance segmentation often maintain inter-task dependencies and tend to lack robustness. Besides, inevitable variations across different datasets make these methods particularly sensitive to hyper-parameter values and lead to poor generalization capability. In this paper, we address the aforementioned challenges by proposing a novel query-based method, termed 3D-QueryIS, which is detector-free, semantic segmentation-free, and cluster-free. Specifically, we propose to generate representative points in an implicit manner, and use them together with the initial queries to generate the informative instance queries. Then, the class and binary instance mask predictions can be produced by simply applying MLP layers on top of the instance queries and the extracted point cloud embeddings. Thus, our 3D-QueryIS is free from the accumulated errors caused by the inter-task dependencies. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness and efficiency of our proposed 3D-QueryIS method.

Submitted 17 November, 2022; originally announced November 2022.

arXiv:2211.05100 [pdf, other]

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Authors: BigScience Workshop, :, Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major , et al. (369 additional authors not shown)

Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.

Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

arXiv:2210.12385 [pdf, other]

Deep Learning in Single-Cell Analysis

Authors: Dylan Molho, Jiayuan Ding, Zhaoheng Li, Hongzhi Wen, Wenzhuo Tang, Yixin Wang, Julian Venegas, Wei **, Renming Liu, Runze Su, Patrick Danaher, Robert Yang, Yu Leo Lei, Yuying Xie, Jiliang Tang

Abstract: Single-cell technologies are revolutionizing the entire field of biology. The large volumes of data generated by single-cell technologies are high-dimensional, sparse, heterogeneous, and have complicated dependency structures, making analyses using conventional machine learning approaches challenging and impractical. In tackling these challenges, deep learning often demonstrates superior performance compared to traditional machine learning methods. In this work, we give a comprehensive survey on deep learning in single-cell analysis. We first introduce background on single-cell technologies and their development, as well as fundamental concepts of deep learning including the most popular deep architectures. We present an overview of the single-cell analytic pipeline pursued in research applications while noting divergences due to data sources or specific applications. We then review seven popular tasks spanning through different stages of the single-cell analysis pipeline, including multimodal integration, imputation, clustering, spatial domain identification, cell-type deconvolution, cell segmentation, and cell-type annotation. Under each task, we describe the most recent developments in classical and deep learning methods and discuss their advantages and disadvantages. Deep learning tools and benchmark datasets are also summarized for each task. Finally, we discuss the future directions and the most recent challenges. This survey will serve as a reference for biologists and computer scientists, encouraging collaborations.

Submitted 5 November, 2022; v1 submitted 22 October, 2022; originally announced October 2022.

Comments: 77 pages, 11 figures, 15 tables, deep learning, single-cell analysis

arXiv:2210.05258 [pdf, other]

doi 10.1016/j.eswa.2022.117643

EOCSA: Predicting Prognosis of Epithelial Ovarian Cancer with Whole Slide Histopathological Images

Authors: Tianling Liu, Ran Su, Changming Sun, Xiuting Li, Leyi Wei

Abstract: Ovarian cancer is one of the most serious cancers that threaten women around the world. Epithelial ovarian cancer (EOC), as the most commonly seen subtype of ovarian cancer, has a rather high mortality rate and poor prognosis among various gynecological cancers. Survival analysis outcomes are able to provide treatment advice to doctors. In recent years, with the development of medical imaging technology, survival prediction approaches based on pathological images have been proposed. In this study, we designed a deep framework named EOCSA which analyzes the prognosis of EOC patients based on pathological whole slide images (WSIs). Specifically, we first randomly extracted patches from WSIs and grouped them into multiple clusters. Next, we developed a survival prediction model, named DeepConvAttentionSurv (DCAS), which was able to extract patch-level features, remove less discriminative clusters, and predict the EOC survival precisely. Particularly, channel attention, spatial attention, and neuron attention mechanisms were used to improve the performance of feature extraction. Then patient-level features were generated from our weight calculation method and the survival time was finally estimated using a LASSO-Cox model. The proposed EOCSA is efficient and effective in predicting prognosis of EOC and the DCAS ensures more informative and discriminative features can be extracted. As far as we know, our work is the first to analyze the survival of EOC based on WSIs and deep neural network technologies. The experimental results demonstrate that our proposed framework has achieved state-of-the-art performance of 0.980 C-index. The implementation of the approach can be found at https://github.com/RanSuLab/EOCprognosis.

Submitted 11 October, 2022; originally announced October 2022.

Comments: Published in Expert Systems with Applications 2022

arXiv:2210.01799 [pdf, other]

STGIN: A Spatial Temporal Graph-Informer Network for Long Sequence Traffic Speed Forecasting

Authors: Ruikang Luo, Yaofeng Song, Li** Huang, Yicheng Zhang, Rong Su

Abstract: Accurate long series forecasting of traffic information is critical for the development of intelligent traffic systems. We may benefit from the rapid growth of neural network analysis technology to better understand the underlying functioning patterns of traffic networks as a result of this progress. Due to the fact that traffic data and facility utilization circumstances are sequentially dependent on past and present situations, several related neural network techniques based on temporal dependency extraction models have been developed to solve the problem. The complicated topological road structure, on the other hand, amplifies the effect of spatial interdependence, which cannot be captured by pure temporal extraction approaches. Additionally, the typical Deep Recurrent Neural Network (RNN) topology has a constraint on global information extraction, which is required for comprehensive long-term prediction. This study proposes a new spatial-temporal neural network architecture, called Spatial-Temporal Graph-Informer (STGIN), to handle the long-term traffic parameters forecasting issue by merging the Informer and Graph Attention Network (GAT) layers for spatial and temporal relationships extraction. The attention mechanism potentially guarantees long-term prediction performance without significant information loss from distant inputs. On two real-world traffic datasets with varying horizons, experimental findings validate the long sequence prediction abilities, and further interpretation is provided.

Submitted 1 October, 2022; originally announced October 2022.

Comments: 12 pages, 18 figures and 2 tables

arXiv:2210.00674 [pdf]

doi 10.3389/fendo.2023.1261088

Multi-view information fusion using multi-view variational autoencoders to predict proximal femoral strength

Authors: Chen Zhao, Joyce H Keyak, Xuewei Cao, Qiuying Sha, Li Wu, Zhe Luo, Lanjuan Zhao, Qing Tian, Chuan Qiu, Ray Su, Hui Shen, Hong-Wen Deng, Weihua Zhou

Abstract: The aim of this paper is to design a deep learning-based model to predict proximal femoral strength using multi-view information fusion. Method: We developed new models using multi-view variational autoencoder (MVAE) for feature representation learning and a product of expert (PoE) model for multi-view information fusion. We applied the proposed models to an in-house Louisiana Osteoporosis Study (LOS) cohort with 931 male subjects, including 345 African Americans and 586 Caucasians. With an analytical solution of the product of Gaussian distribution, we adopted variational inference to train the designed MVAE-PoE model to perform common latent feature extraction. We performed genome-wide association studies (GWAS) to select 256 genetic variants with the lowest p-values for each proximal femoral strength and integrated whole genome sequence (WGS) features and DXA-derived imaging features to predict proximal femoral strength. Results: The best prediction model for fall fracture load was acquired by integrating WGS features and DXA-derived imaging features. The designed models achieved the mean absolute percentage error of 18.04%, 6.84% and 7.95% for predicting proximal femoral fracture loads using linear models of fall loading, nonlinear models of fall loading, and nonlinear models of stance loading, respectively. Compared to existing multi-view information fusion methods, the proposed MVAE-PoE achieved the best performance.
Conclusion: The proposed models are capable of predicting proximal femoral strength using WGS features and DXA-derived imaging features. Though this tool is not a substitute for FEA using QCT images, it would make improved assessment of hip fracture risk more widely available while avoiding the increased radiation dosage and clinical costs from QCT.

Submitted 27 March, 2023; v1 submitted 2 October, 2022; originally announced October 2022.

Comments: 16 pages, 3 figures

arXiv:2209.13500 [pdf, other]

Dense-TNT: Efficient Vehicle Type Classification Neural Network Using Satellite Imagery

Authors: Ruikang Luo, Yaofeng Song, Han Zhao, Yicheng Zhang, Yi Zhang, Nanbin Zhao, Li** Huang, Rong Su

Abstract: Accurate vehicle type classification serves a significant role in the intelligent transportation system. It is critical for ruler to understand the road conditions and usually helps the traffic light control system to respond accordingly to alleviate traffic congestion. New technologies and comprehensive data sources, such as aerial photos and remote sensing data, provide richer and high-dimensional information. Also, due to the rapid development of deep neural network technology, image-based vehicle classification methods can better extract underlying objective features when processing data. Recently, several deep learning models have been proposed to solve the problem. However, traditional purely convolutional approaches have constraints on global information extraction, and complex environments, such as bad weather, seriously limit the recognition capability. To improve the vehicle type classification capability under complex environments, this study proposes a novel Densely Connected Convolutional Transformer in Transformer Neural Network (Dense-TNT) framework for vehicle type classification by stacking Densely Connected Convolutional Network (DenseNet) and Transformer in Transformer (TNT) layers. Three-region vehicle data and four different weather conditions are deployed for recognition capability evaluation. Experimental findings validate the recognition ability of our proposed vehicle classification model with little decay, even under the heavy foggy weather condition.

Submitted 27 September, 2022; originally announced September 2022.

Comments: 10 pages, 8 figures, 5 tables

arXiv:2209.11318 [pdf, other]

OpenPneu: Compact platform for pneumatic actuation with multi-channels

Authors: Yingjun Tian, Renbo Su, Xilong Wang, Nur Banu Altin, Guoxin Fang, Charlie C. L. Wang

Abstract: This paper presents a compact system, OpenPneu, to support the pneumatic actuation for multi-chambers on soft robots. Micro-pumps are employed in the system to generate airflow and therefore no extra input as compressed air is needed. Our system conducts modular design to provide good scalability, which has been demonstrated on a prototype with ten air channels. Each air channel of OpenPneu is equipped with both the inflation and the deflation functions to provide a full range pressure supply from positive to negative with a maximal flow rate at 1.7 L/min. High precision closed-loop control of pressures has been built into our system to achieve stable and efficient dynamic performance in actuation. An open-source control interface and API in Python are provided. We also demonstrate the functionality of OpenPneu on three soft robotic systems with up to 10 chambers.

Submitted 22 September, 2022; originally announced September 2022.

arXiv:2209.03356 [pdf, other]

AST-GIN: Attribute-Augmented Spatial-Temporal Graph Informer Network for Electric Vehicle Charging Station Availability Forecasting

Authors: Ruikang Luo, Yaofeng Song, Li** Huang, Yicheng Zhang, Rong Su

Abstract: Electric Vehicle (EV) charging demand and charging station availability forecasting is one of the challenges in the intelligent transportation system. With accurate EV station situation prediction, suitable charging behaviors could be scheduled in advance to relieve range anxiety. Many existing deep learning methods have been proposed to address this issue. However, due to the complex road network structure and comprehensive external factors, such as points of interest (POIs) and weather effects, many commonly used algorithms could just extract the historical usage information without considering the comprehensive influence of external factors. To enhance the prediction accuracy and interpretability, the Attribute-Augmented Spatial-Temporal Graph Informer (AST-GIN) structure is proposed in this study by combining the Graph Convolutional Network (GCN) layer and the Informer layer to extract both external and internal spatial-temporal dependence of relevant transportation data. The external factors are modeled as dynamic attributes by the attribute-augmented encoder for training. The AST-GIN model is tested on the data collected in Dundee City, and experimental results show the effectiveness of our model considering the influence of external factors over various horizon settings compared with other baselines.

Submitted 7 September, 2022; originally announced September 2022.

Comments: 10 pages; 17 figures; Under review for IEEE Transactions on Vehicular Technology

arXiv:2208.07167 [pdf, other]

Where is VALDO? VAscular Lesions Detection and segmentatiOn challenge at MICCAI 2021

Authors: Carole H. Sudre, Kimberlin Van Wijnen, Florian Dubost, Hieab Adams, David Atkinson, Frederik Barkhof, Mahlet A. Birhanu, Esther E. Bron, Robin Camarasa, Nish Chaturvedi, Yuan Chen, Zihao Chen, Shuai Chen, Qi Dou, Tavia Evans, Ivan Ezhov, Haojun Gao, Marta Girones Sanguesa, Juan Domingo Gispert, Beatriz Gomez Anson, Alun D. Hughes, M. Arfan Ikram, Silvia Ingala, H. Rolf Jaeger, Florian Kofler , et al. (24 additional authors not shown)

Abstract: Imaging markers of cerebral small vessel disease provide valuable information on brain health, but their manual assessment is time-consuming and hampered by substantial intra- and interrater variability. Automated rating may benefit biomedical research, as well as clinical assessment, but diagnostic reliability of existing algorithms is unknown. Here, we present the results of the VAscular Lesions DetectiOn and Segmentation (Where is VALDO?) challenge that was run as a satellite event at the international conference on Medical Image Computing and Computer Aided Intervention (MICCAI) 2021. This challenge aimed to promote the development of methods for automated detection and segmentation of small and sparse imaging markers of cerebral small vessel disease, namely enlarged perivascular spaces (EPVS) (Task 1), cerebral microbleeds (Task 2) and lacunes of presumed vascular origin (Task 3) while leveraging weak and noisy labels. Overall, 12 teams participated in the challenge proposing solutions for one or more tasks (4 for Task 1 - EPVS, 9 for Task 2 - Microbleeds and 6 for Task 3 - Lacunes). Multi-cohort data was used in both training and evaluation. Results showed a large variability in performance both across teams and across tasks, with promising results notably for Task 1 - EPVS and Task 2 - Microbleeds and not practically useful results yet for Task 3 - Lacunes. It also highlighted the performance inconsistency across cases that may deter use at an individual level, while still proving useful at a population level.

Submitted 15 August, 2022; originally announced August 2022.

arXiv:2208.02462 [pdf, other]

doi 10.21437/Interspeech.2021-138

Act-Aware Slot-Value Predicting in Multi-Domain Dialogue State Tracking

Authors: Ruolin Su, Ting-Wei Wu, Biing-Hwang Juang

Abstract: As an essential component in task-oriented dialogue systems, dialogue state tracking (DST) aims to track human-machine interactions and generate state representations for managing the dialogue. Representations of dialogue states are dependent on the domain ontology and the user's goals. In several task-oriented dialogues with a limited scope of objectives, dialogue states can be represented as a set of slot-value pairs. As the capabilities of dialogue systems expand to support increasing naturalness in communication, incorporating dialogue act processing into dialogue model design becomes essential. The lack of such consideration limits the scalability of dialogue state tracking models for dialogues having specific objectives and ontology. To address this issue, we formulate and incorporate dialogue acts, and leverage recent advances in machine reading comprehension to predict both categorical and non-categorical types of slots for multi-domain dialogue state tracking. Experimental results show that our models can improve the overall accuracy of dialogue state tracking on the MultiWOZ 2.1 dataset, and demonstrate that incorporating dialogue acts can guide dialogue state design for future task-oriented dialogue systems.

Submitted 4 August, 2022; originally announced August 2022.

Comments: Published in Spoken Dialogue Systems I, Interspeech 2021. Code is now publicly available on Github: https://github.com/youlandasu/ACT-AWARE-DST

Journal ref: Proc. Interspeech 2021, 236-240 (2021)

arXiv:2207.10388 [pdf, other]

NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition

Authors: Boyang Xia, Wenhao Wu, Haoran Wang, Rui Su, Dongliang He, Haosen Yang, Xiaoran Fan, Wanli Ouyang

Abstract: It is challenging for artificial intelligence systems to achieve accurate video recognition under the scenario of low computation costs. Adaptive inference based efficient video recognition methods typically preview videos and focus on salient parts to reduce computation costs. Most existing works focus on complex networks learning with video classification based objectives. Taking all frames as positive samples, few of them pay attention to the discrimination between positive samples (salient frames) and negative samples (non-salient frames) in supervisions. To fill this gap, in this paper, we propose a novel Non-saliency Suppression Network (NSNet), which effectively suppresses the responses of non-salient frames. Specifically, on the frame level, effective pseudo labels that can distinguish between salient and non-salient frames are generated to guide the frame saliency learning. On the video level, a temporal attention module is learned under dual video-level supervisions on both the salient and the non-salient representations. Saliency measurements from both levels are combined for exploitation of multi-granularity complementary information. Extensive experiments conducted on four well-known benchmarks verify that our NSNet not only achieves the state-of-the-art accuracy-efficiency trade-off but also presents a significantly faster (2.4~4.3x) practical inference speed than state-of-the-art methods. Our project page is at https://lawrencexia2008.github.io/projects/nsnet .

Submitted 21 July, 2022; originally announced July 2022.

Comments: Accepted by ECCV 2022

arXiv:2207.08215 [pdf, other]

doi 10.1109/LRA.2022.3193487

Optimizing out-of-plane stiffness for soft grippers

Authors: Renbo Su, Yingjun Tian, Mingwei Du, Charlie C. L. Wang

Abstract: In this paper, we presented a data-driven framework to optimize the out-of-plane stiffness for soft grippers to achieve mechanical properties as hard-to-twist and easy-to-bend. The effectiveness of this method is demonstrated in the design of a soft pneumatic bending actuator (SPBA). First, a new objective function is defined to quantitatively evaluate the out-of-plane stiffness as well as the bending performance. Then, sensitivity analysis is conducted on the parametric model of an SPBA design to determine the optimized design parameters with the help of Finite Element Analysis (FEA). To enable the computation of numerical optimization, a data-driven approach is employed to learn a cost function that directly represents the out-of-plane stiffness as a differentiable function of the design variables. A gradient-based method is used to maximize the out-of-plane stiffness of the SPBA while ensuring specific bending performance. The effectiveness of our method has been demonstrated in physical experiments taken on 3D-printed grippers.

Submitted 29 July, 2022; v1 submitted 17 July, 2022; originally announced July 2022.

arXiv:2206.15076 [pdf, other]

BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing

Authors: Jason Alan Fries, Leon Weber, Natasha Seelam, Gabriel Altay, Debajyoti Datta, Samuele Garda, Myungsun Kang, Ruisi Su, Wojciech Kusa, Samuel Cahyawijaya, Fabio Barth, Simon Ott, Matthias Samwald, Stephen Bach, Stella Biderman, Mario Sänger, Bo Wang, Alison Callahan, Daniel León Periñán, Théo Gigant, Patrick Haller, Jenny Chim, Jose David Posada, John Michael Giorgi, Karthik Rangasai Sivaraman , et al. (18 additional authors not shown)

Abstract: Training and evaluating language models increasingly requires the construction of meta-datasets -- diverse collections of curated data with clear provenance. Natural language prompting has recently led to improved zero-shot generalization by transforming existing, supervised datasets into a diversity of novel pretraining tasks, highlighting the benefits of meta-dataset curation. While successful in general-domain text, translating these data-centric approaches to biomedical language modeling remains challenging, as labeled biomedical datasets are significantly underrepresented in popular data hubs. To address this challenge, we introduce BigBIO, a community library of 126+ biomedical NLP datasets, currently covering 12 task categories and 10+ languages. BigBIO facilitates reproducible meta-dataset curation via programmatic access to datasets and their metadata, and is compatible with current platforms for prompt engineering and end-to-end few/zero shot language model evaluation. We discuss our process for task schema harmonization, data auditing, contribution guidelines, and outline two illustrative use cases: zero-shot evaluation of biomedical prompts and large-scale, multi-task learning. BigBIO is an ongoing community effort and is available at https://github.com/bigscience-workshop/biomedical

Submitted 30 June, 2022; originally announced June 2022.

Comments: Submitted to NeurIPS 2022 Datasets and Benchmarks Track

arXiv:2206.14541 [pdf, other]

Why patient data cannot be easily forgotten?

Authors: Ruolin Su, Xiao Liu, Sotirios A. Tsaftaris

Abstract: Rights provisioned within data protection regulations permit patients to request that knowledge about their information be eliminated by data holders. With the advent of AI learned on data, one can imagine that such rights can extend to requests for forgetting knowledge of patients' data within AI models. However, forgetting patients' imaging data from AI models is still an under-explored problem. In this paper, we study the influence of patient data on model performance and formulate two hypotheses for a patient's data: either they are common and similar to other patients or form edge cases, i.e. unique and rare cases. We show that it is not possible to easily forget patient data. We propose a targeted forgetting approach to perform patient-wise forgetting. Extensive experiments on the benchmark Automated Cardiac Diagnosis Challenge dataset showcase the improved performance of the proposed targeted forgetting approach as opposed to a state-of-the-art method.

Submitted 29 June, 2022; originally announced June 2022.

Comments: Ruolin Su and Xiao Liu contributed equally. Accepted by MICCAI 2022

arXiv:2206.09184 [pdf, other]

PHN: Parallel heterogeneous network with soft gating for CTR prediction

Authors: Ri Su, Alphonse Houssou Hounye, Cong Cao, Muzhou Hou

Abstract: The Click-through Rate (CTR) prediction task is a basic task in recommendation systems. Most previous CTR models were built on the Wide & Deep structure and gradually evolved into parallel structures with different modules. However, the simple accumulation of parallel structures can lead to higher structural complexity and longer training time. With a Sigmoid activation function at the output layer, the linear addition of the activation values of the parallel structures during training can easily push samples into the weak gradient interval, resulting in the weak gradient phenomenon and reducing the effectiveness of training. To this end, this paper proposes a Parallel Heterogeneous Network (PHN) model, which constructs a network with parallel structure through three different interaction analysis methods, and uses Soft Selection Gating (SSG) to feature heterogeneous data with different structure. Finally, residual links with trainable parameters are used in the network to mitigate the influence of the weak gradient phenomenon. Furthermore, we demonstrate the effectiveness of PHN in a large number of comparative experiments, and visualize the performance of the model in training process and structure.

Submitted 18 June, 2022; originally announced June 2022.

arXiv:2204.08185 [pdf, ps, other]

Completion Delay of Random Linear Network Coding in Full-Duplex Relay Networks

Authors: Rina Su, Qifu Tyler Sun, Zhongshan Zhang, Zongpeng Li

Abstract: As the next-generation wireless networks thrive, full-duplex and relay techniques are combined to improve the network performance. Random linear network coding (RLNC) is another popular technique to enhance the efficiency and reliability of wireless communications. In this paper, in order to explore the potential of RLNC in full-duplex relay networks, we investigate two fundamental perfect RLNC schemes and theoretically analyze their completion delay performance. The first scheme is a straightforward application of conventional perfect RLNC studied in wireless broadcast, so it involves no additional process at the relay. Its performance serves as an upper bound for all perfect RLNC schemes. The other scheme allows a sufficiently large buffer and unconstrained linear coding at the relay. It attains the optimal performance and serves as a lower bound for all RLNC schemes. For both schemes, closed-form formulae to characterize the expected completion delay at a single receiver as well as for the whole system are derived. Numerical results are also presented to validate the theoretical characterizations, and to compare the two fundamental schemes with the existing one.

Submitted 3 November, 2022; v1 submitted 18 April, 2022; originally announced April 2022.

arXiv:2204.07579 [pdf, other]

Interpretable Fault Diagnosis of Rolling Element Bearings with Temporal Logic Neural Network

Authors: Gang Chen, Yu Lu, Rong Su, Zhaodan Kong

Abstract: Machine learning-based methods have achieved successful applications in machinery fault diagnosis. However, the main limitation of these methods is that they operate as a black box and are generally not interpretable. This paper proposes a novel neural network structure, called temporal logic neural network (TLNN), in which the neurons of the network are logic propositions. More importantly, the network can be described and interpreted as a weighted signal temporal logic. TLNN not only keeps the nice properties of traditional neural networks but also provides a formal interpretation of itself with formal language. Experiments with real datasets show the proposed neural network can obtain highly accurate fault diagnosis results with good computation efficiency. Additionally, the embedded formal language of the neural network can provide explanations about the decision process, thus achieving interpretable fault diagnosis.

Submitted 19 April, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

arXiv:2204.04090 [pdf, other]

Single-level Adversarial Data Synthesis based on Neural Tangent Kernels

Authors: Yu-Rong Zhang, Ruei-Yang Su, Sheng Yen Chou, Shan-Hung Wu

Abstract: Generative adversarial networks (GANs) have achieved impressive performance in data synthesis and have driven the development of many applications. However, GANs are known to be hard to train due to their bilevel objective, which leads to the problems of convergence, mode collapse, and gradient vanishing. In this paper, we propose a new generative model called the generative adversarial NTK (GA-NTK) that has a single-level objective. The GA-NTK keeps the spirit of adversarial learning (which helps generate plausible data) while avoiding the training difficulties of GANs. This is done by modeling the discriminator as a Gaussian process with a neural tangent kernel (NTK-GP) whose training dynamics can be completely described by a closed-form formula. We analyze the convergence behavior of GA-NTK trained by gradient descent and give some sufficient conditions for convergence. We also conduct extensive experiments to study the advantages and limitations of GA-NTK and propose some techniques that make GA-NTK more practical.

Submitted 20 November, 2022; v1 submitted 8 April, 2022; originally announced April 2022.

Showing 1–50 of 84 results for author: Su, R