Search | arXiv e-print repository

OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer

Authors: Lu Zhang, Tiancheng Zhao, Heting Ying, Yibo Ma, Kyusong Lee

Abstract: Recent advancements in Large Language Models (LLMs) have expanded their capabilities to multimodal contexts, including comprehensive video understanding. However, processing extensive videos such as 24-hour CCTV footage or full-length films presents significant challenges due to the vast data and processing demands. Traditional methods, like extracting key frames or converting frames to text, often result in substantial information loss. To address these shortcomings, we develop OmAgent, which efficiently stores and retrieves relevant video frames for specific queries, preserving the detailed content of videos. Additionally, it features a Divide-and-Conquer Loop capable of autonomous reasoning, dynamically invoking APIs and tools to enhance query processing and accuracy. This approach ensures robust video understanding, significantly reducing information loss. Experimental results affirm OmAgent's efficacy in handling various types of videos and complex tasks. Moreover, we have endowed it with greater autonomy and a robust tool-calling system, enabling it to accomplish even more intricate tasks.

Submitted 24 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.03847 [pdf, other]

Lean Workbook: A large-scale Lean problem set formalized from natural language math problems

Authors: Huaiyuan Ying, Zijian Wu, Yihan Geng, Jiayu Wang, Dahua Lin, Kai Chen

Abstract: Large language models have demonstrated impressive capabilities across various natural language processing tasks, especially in solving mathematical problems. However, large language models are not good at math theorem proving using formal languages like Lean. A significant challenge in this area is the scarcity of training data available in these formal languages. To address this issue, we propose a novel pipeline that iteratively generates and filters synthetic data to translate natural language mathematical problems into Lean 4 statements, and vice versa. Our results indicate that the synthetic data pipeline can provide useful training data and improve the performance of LLMs in translating and understanding complex mathematical problems and proofs. Our final dataset contains about 57K formal-informal question pairs along with searched proofs from the math contest forum and 21 new IMO questions. We open-source our code at https://github.com/InternLM/InternLM-Math and our data at https://huggingface.co/datasets/InternLM/Lean-Workbook.

Submitted 7 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.
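To make the phrase "formal-informal question pair" in the entry above concrete, here is a small hypothetical illustration in Lean 4. It is not taken from the Lean Workbook dataset; the theorem name, the choice of problem, and the use of Mathlib's `nlinarith` tactic are all assumptions made for this sketch. The docstring plays the role of the informal statement and the theorem signature plays the role of its formalization.

```lean
import Mathlib

/-- Informal statement (hypothetical example, not from the dataset):
for all real numbers a and b, a² + b² ≥ 2ab. -/
theorem sq_add_sq_ge_two_mul (a b : ℝ) : a ^ 2 + b ^ 2 ≥ 2 * a * b := by
  -- The inequality rearranges to (a - b)² ≥ 0, supplied as a hint.
  nlinarith [sq_nonneg (a - b)]
```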

arXiv:2405.18800 [pdf]

Face processing emerges from object-trained convolutional neural networks

Authors: Zhenhua Zhao, Ji Chen, Zhicheng Lin, Haojiang Ying

Abstract: Whether face processing depends on unique, domain-specific neurocognitive mechanisms or domain-general object recognition mechanisms has long been debated. Directly testing these competing hypotheses in humans has proven challenging due to extensive exposure to both faces and objects. Here, we systematically test these hypotheses by capitalizing on recent progress in convolutional neural networks (CNNs) that can be trained without face exposure (i.e., pre-trained weights). Domain-general mechanism accounts posit that face processing can emerge from a neural network without specialized pre-training on faces. Consequently, we trained CNNs solely on objects and tested their ability to recognize and represent faces as well as objects that look like faces (face pareidolia stimuli). Due to character limits, see the attached PDF for more details.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 31 pages, 5 Figures

arXiv:2405.14900 [pdf, other]

doi 10.1016/j.media.2024.103206.

Fair Evaluation of Federated Learning Algorithms for Automated Breast Density Classification: The Results of the 2022 ACR-NCI-NVIDIA Federated Learning Challenge

Authors: Kendall Schmidt, Benjamin Bearce, Ken Chang, Laura Coombs, Keyvan Farahani, Marawan Elbatele, Kaouther Mouhebe, Robert Marti, Ruipeng Zhang, Yao Zhang, Yanfeng Wang, Yaojun Hu, Haochao Ying, Yuyang Xu, Conrad Testagrose, Mutlu Demirer, Vikash Gupta, Ünal Akünal, Markus Bujotzek, Klaus H. Maier-Hein, Yi Qin, Xiaomeng Li, Jayashree Kalpathy-Cramer, Holger R. Roth

Abstract: The correct interpretation of breast density is important in the assessment of breast cancer risk. AI has been shown capable of accurately predicting breast density; however, due to the differences in imaging characteristics across mammography systems, models built using data from one system do not generalize well to other systems. Though federated learning (FL) has emerged as a way to improve the generalizability of AI without the need to share data, the best way to preserve features from all training data during FL is an active area of research. To explore FL methodology, the breast density classification FL challenge was hosted in partnership with the American College of Radiology, Harvard Medical School's Mass General Brigham, University of Colorado, NVIDIA, and the National Institutes of Health National Cancer Institute. Challenge participants were able to submit Docker containers capable of implementing FL on three simulated medical facilities, each containing a unique large mammography dataset. The breast density FL challenge ran from June 15 to September 5, 2022, attracting seven finalists from around the world. The winning FL submission reached a linear kappa score of 0.653 on the challenge test data and 0.413 on an external testing dataset, scoring comparably to a model trained on the same data in a central location.

Submitted 22 May, 2024; originally announced May 2024.

Comments: 16 pages, 9 figures

Journal ref: Medical Image Analysis Volume 95, July 2024, 103206
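The entry above reports results as a "linear kappa score". Assuming this refers to linearly weighted Cohen's kappa, the metric can be sketched as follows. The function name, the integer label encoding, and the four-class example are assumptions made for illustration; the challenge's actual evaluation code is not shown in the abstract.

```python
from typing import Sequence


def linear_weighted_kappa(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> float:
    """Linearly weighted Cohen's kappa for labels encoded as 0..n_classes-1."""
    if n_classes < 2 or len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need >= 2 classes and equal-length, non-empty label lists")
    n = len(y_true)

    # Observed confusion matrix: rows are true labels, columns are predictions.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Linear disagreement weight: 0 on the diagonal, 1 for the farthest classes.
            w = abs(i - j) / (n_classes - 1)
            weighted_observed += w * observed[i][j]
            # Expected count under independence of the two label marginals.
            weighted_expected += w * row_totals[i] * col_totals[j] / n
    if weighted_expected == 0.0:
        raise ValueError("kappa is undefined when all labels fall in a single class")
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    # Hypothetical labels for a four-category task.
    truth = [0, 1, 2, 3, 1, 2]
    preds = [0, 1, 2, 2, 1, 3]
    print(linear_weighted_kappa(truth, preds, n_classes=4))
```

Because the weights grow with the distance between categories, confusing adjacent categories is penalized less than confusing distant ones, which suits ordinal labels.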

arXiv:2404.19246 [pdf]

Logistic Map Pseudo Random Number Generator in FPGA

Authors: Mateo Jalen Andrew Calderon, Lee Jun Lei Lucas, Syarifuddin Azhar Bin Rosli, Stephanie See Hui Ying, Jarell Lim En Yu, Maoyang Xiang, T. Hui Teo

Abstract: This project develops a pseudo-random number generator (PRNG) using the logistic map, implemented in Verilog HDL on an FPGA, and processes its output through a Central Limit Theorem (CLT) function to achieve a Gaussian distribution. The system integrates additional FPGA modules for real-time interaction and visualisation, including a clock generator, UART interface, XADC, and a 7-segment display driver. These components facilitate the direct display of PRNG values on the FPGA and the transmission of data to a laptop for histogram analysis, verifying the Gaussian nature of the output. This approach demonstrates the practical application of chaotic systems for generating Gaussian-distributed pseudo-random numbers in digital hardware, highlighting the logistic map's potential in PRNG design.

Submitted 30 April, 2024; originally announced April 2024.

Comments: 10 pages, 6 figures
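The idea in the entry above, iterating the logistic map and summing several outputs so that the Central Limit Theorem pushes the result toward a Gaussian, can be sketched in software. This is a floating-point Python illustration, not the authors' Verilog design; the parameter r = 4, the seed, the block size k = 12, and the function names are all assumptions of this sketch.

```python
import math
import statistics
from typing import Iterator, List


def logistic_stream(x0: float, r: float = 4.0) -> Iterator[float]:
    """Yield logistic-map iterates x_{n+1} = r * x_n * (1 - x_n), seeded in (0, 1)."""
    x = x0
    while True:
        x = r * x * (1.0 - x)
        # Floating-point guard (an assumption of this sketch): if rounding ever
        # lands the orbit on 0 or 1 it would stay at 0 forever, so restart from the seed.
        if x <= 0.0 or x >= 1.0:
            x = x0
        yield x


def gaussian_samples(n: int, x0: float = 0.123456789, k: int = 12) -> List[float]:
    """Return n approximately standard-normal values, each built from k iterates."""
    stream = logistic_stream(x0)
    # For r = 4 the iterates follow the arcsine law on (0, 1): mean 1/2,
    # variance 1/8, and zero linear correlation between successive iterates.
    mean = k / 2.0
    std = math.sqrt(k / 8.0)
    return [(sum(next(stream) for _ in range(k)) - mean) / std for _ in range(n)]


if __name__ == "__main__":
    samples = gaussian_samples(2000)
    print(f"mean={statistics.mean(samples):.3f} stdev={statistics.pstdev(samples):.3f}")
```

A histogram of `gaussian_samples(...)` should look roughly bell-shaped, which mirrors the laptop-side histogram check the abstract describes. A hardware version would typically use fixed-point arithmetic instead of floats.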

arXiv:2404.11171 [pdf, other]

Personalized Heart Disease Detection via ECG Digital Twin Generation

Authors: Yaojun Hu, Jintai Chen, Lianting Hu, Dantong Li, Jiahuan Yan, Haochao Ying, Huiying Liang, Jian Wu

Abstract: Heart diseases rank among the leading causes of global mortality, demonstrating a crucial need for early diagnosis and intervention. Most traditional electrocardiogram (ECG) based automated diagnosis methods are trained at population level, neglecting the customization of personalized ECGs to enhance individual healthcare management. A potential solution to address this limitation is to employ digital twins to simulate symptoms of diseases in real patients. In this paper, we present an innovative prospective learning approach for personalized heart disease detection, which generates digital twins of healthy individuals' anomalous ECGs and enhances the model sensitivity to the personalized symptoms. In our approach, a vector quantized feature separator is proposed to locate and isolate the disease symptom and normal segments in ECG signals with ECG report guidance. Thus, the ECG digital twins can simulate specific heart diseases used to train a personalized heart disease detection model. Experiments demonstrate that our approach not only excels in generating high-fidelity ECG signals but also improves personalized heart disease detection. Moreover, our approach ensures robust privacy protection, safeguarding patient data in model development.

Submitted 11 May, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

arXiv:2403.19124 [pdf, other]

PoCo: A Self-Supervised Approach via Polar Transformation Based Progressive Contrastive Learning for Ophthalmic Disease Diagnosis

Authors: Jinhong Wang, Tingting Chen, Jintai Chen, Yixuan Wu, Yuyang Xu, Danny Chen, Haochao Ying, Jian Wu

Abstract: Automatic ophthalmic disease diagnosis on fundus images is important in clinical practice. However, due to complex fundus textures and limited annotated data, developing an effective automatic method for this problem is still challenging. In this paper, we present a self-supervised method via polar transformation based progressive contrastive learning, called PoCo, for ophthalmic disease diagnosis. Specifically, we novelly inject the polar transformation into contrastive learning to 1) promote contrastive learning pre-training to be faster and more stable and 2) naturally capture task-free and rotation-related textures, which provides insights into disease recognition on fundus images. Beneficially, simple normal translation-invariant convolution on transformed images can equivalently replace the complex rotation-invariant and sector convolution on raw images. After that, we develop a progressive contrastive learning method to efficiently utilize large unannotated images and a novel progressive hard negative sampling scheme to gradually reduce the negative sample number for efficient training and performance enhancement. Extensive experiments on three public ophthalmic disease datasets show that our PoCo achieves state-of-the-art performance with good generalization ability, validating that our method can reduce annotation efforts and provide reliable diagnosis. Codes are available at https://github.com/wjh892521292/PoCo.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.17297 [pdf, other]

InternLM2 Technical Report

Authors: Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang , et al. (75 additional authors not shown)

Abstract: The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context modeling, and open-ended subjective evaluations through innovative pre-training and optimization techniques. The pre-training process of InternLM2 is meticulously detailed, highlighting the preparation of diverse data types including text, code, and long-context data. InternLM2 efficiently captures long-term dependencies, initially trained on 4k tokens before advancing to 32k tokens in pre-training and fine-tuning stages, exhibiting remarkable performance on the 200k "Needle-in-a-Haystack" test. InternLM2 is further aligned using Supervised Fine-Tuning (SFT) and a novel Conditional Online Reinforcement Learning from Human Feedback (COOL RLHF) strategy that addresses conflicting human preferences and reward hacking. By releasing InternLM2 models in different training stages and model sizes, we provide the community with insights into the model's evolution.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.15876 [pdf, other]

Cognitive resilience: Unraveling the proficiency of image-captioning models to interpret masked visual content

Authors: Zhicheng Du, Zhaotian Xie, Huazhang Ying, Likun Zhang, Peiwu Qin

Abstract: This study explores the ability of Image Captioning (IC) models to decode masked visual content sourced from diverse datasets. Our findings reveal the IC model's capability to generate captions from masked images, closely resembling the original content. Notably, even in the presence of masks, the model adeptly crafts descriptive textual information that goes beyond what is observable in the original image-generated captions. While the decoding performance of the IC model experiences a decline with an increase in the masked region's area, the model still performs well when important regions of the image are not masked at high coverage.

Submitted 23 March, 2024; originally announced March 2024.

Comments: Accepted as tiny paper in ICLR 2024

arXiv:2402.17246 [pdf, other]

SDR-Former: A Siamese Dual-Resolution Transformer for Liver Lesion Classification Using 3D Multi-Phase Imaging

Authors: Meng Lou, Hanning Ying, Xiaoqing Liu, Hong-Yu Zhou, Yuqing Zhang, Yizhou Yu

Abstract: Automated classification of liver lesions in multi-phase CT and MR scans is of clinical significance but challenging. This study proposes a novel Siamese Dual-Resolution Transformer (SDR-Former) framework, specifically designed for liver lesion classification in 3D multi-phase CT and MR imaging with varying phase counts. The proposed SDR-Former utilizes a streamlined Siamese Neural Network (SNN) to process multi-phase imaging inputs, possessing robust feature representations while maintaining computational efficiency. The weight-sharing feature of the SNN is further enriched by a hybrid Dual-Resolution Transformer (DR-Former), comprising a 3D Convolutional Neural Network (CNN) and a tailored 3D Transformer for processing high- and low-resolution images, respectively. This hybrid sub-architecture excels in capturing detailed local features and understanding global contextual information, thereby boosting the SNN's feature extraction capabilities. Additionally, a novel Adaptive Phase Selection Module (APSM) is introduced, promoting phase-specific intercommunication and dynamically adjusting each phase's influence on the diagnostic outcome. The proposed SDR-Former framework has been validated through comprehensive experiments on two clinical datasets: a three-phase CT dataset and an eight-phase MR dataset. The experimental results affirm the efficacy of the proposed framework. To support the scientific community, we are releasing our extensive multi-phase MR dataset for liver lesion analysis to the public. This pioneering dataset, being the first publicly available multi-phase MR dataset in this field, also underpins the MICCAI LLD-MMRI Challenge. The dataset is accessible at: https://bit.ly/3IyYlgN.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 13 pages, 7 figures

arXiv:2402.11177 [pdf, other]

A Question Answering Based Pipeline for Comprehensive Chinese EHR Information Extraction

Authors: Huaiyuan Ying, Sheng Yu

Abstract: Electronic health records (EHRs) hold significant value for research and applications. As a new way of information extraction, question answering (QA) can extract more flexible information than conventional methods and is more accessible to clinical researchers, but its progress is impeded by the scarcity of annotated data. In this paper, we propose a novel approach that automatically generates training data for transfer learning of QA models. Our pipeline incorporates a preprocessing module to handle challenges posed by extraction types that are not readily compatible with extractive QA frameworks, including cases with discontinuous answers and many-to-one relationships. The obtained QA model exhibits excellent performance on subtasks of information extraction in EHRs, and it can effectively handle few-shot or zero-shot settings involving yes-no questions. Case studies and ablation studies demonstrate the necessity of each component in our design, and the resulting model is deemed suitable for practical use.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.06332 [pdf, other]

InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning

Authors: Huaiyuan Ying, Shuo Zhang, Linyang Li, Zhejian Zhou, Yunfan Shao, Zhaoye Fei, Yichuan Ma, Jiawei Hong, Kuikun Liu, Ziyi Wang, Yudong Wang, Zijian Wu, Shuaibin Li, Fengzhe Zhou, Hongwei Liu, Songyang Zhang, Wenwei Zhang, Hang Yan, Xipeng Qiu, Jiayu Wang, Kai Chen, Dahua Lin

Abstract: The math abilities of large language models can represent their abstract reasoning ability. In this paper, we introduce and open-source our math reasoning LLMs, InternLM-Math, which are obtained by continued pre-training from InternLM2. We unify chain-of-thought reasoning, reward modeling, formal reasoning, data augmentation, and code interpreter in a unified seq2seq format and supervise our model to be a versatile math reasoner, verifier, prover, and augmenter. These abilities can be used to develop the next math LLMs or self-iteration. InternLM-Math obtains open-sourced state-of-the-art performance under the setting of in-context learning, supervised fine-tuning, and code-assisted reasoning in various informal and formal benchmarks including GSM8K, MATH, Hungary math exam, MathBench-ZH, and MiniF2F. Our pre-trained model achieves 30.3 on the MiniF2F test set without fine-tuning. We further explore how to use LEAN to solve math problems and study its performance under the setting of multi-task learning, which shows the possibility of using LEAN as a unified platform for solving and proving in math. Our models, codes, and data are released at https://github.com/InternLM/InternLM-Math.

Submitted 24 May, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

arXiv:2402.02334 [pdf, other]

Arithmetic Feature Interaction Is Necessary for Deep Tabular Learning

Authors: Yi Cheng, Renjun Hu, Haochao Ying, Xing Shi, Jian Wu, Wei Lin

Abstract: Until recently, the question of the effective inductive bias of deep models on tabular data has remained unanswered. This paper investigates the hypothesis that arithmetic feature interaction is necessary for deep tabular learning. To test this point, we create a synthetic tabular dataset with a mild feature interaction assumption and examine a modified transformer architecture enabling arithmetical feature interactions, referred to as AMFormer. Results show that AMFormer outperforms strong counterparts in fine-grained tabular data modeling, data efficiency in training, and generalization. This is attributed to its parallel additive and multiplicative attention operators and prompt-based optimization, which facilitate the separation of tabular samples in an extended space with arithmetically-engineered features. Our extensive experiments on real-world data also validate the consistent effectiveness, efficiency, and rationale of AMFormer, suggesting it has established a strong inductive bias for deep learning on tabular data. Code is available at https://github.com/aigc-apps/AMFormer.

Submitted 19 March, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

Comments: 11 pages, 8 figures, to be published to AAAI2024

ACM Class: I.2.4

arXiv:2312.08036 [pdf]

CoRTEx: Contrastive Learning for Representing Terms via Explanations with Applications on Constructing Biomedical Knowledge Graphs

Authors: Huaiyuan Ying, Zhengyun Zhao, Yang Zhao, Sihang Zeng, Sheng Yu

Abstract: Objective: Biomedical Knowledge Graphs play a pivotal role in various biomedical research domains. Concurrently, term clustering emerges as a crucial step in constructing these knowledge graphs, aiming to identify synonymous terms. Due to a lack of knowledge, previous contrastive learning models trained with Unified Medical Language System (UMLS) synonyms struggle at clustering difficult terms and do not generalize well beyond UMLS terms. In this work, we leverage the world knowledge from Large Language Models (LLMs) and propose Contrastive Learning for Representing Terms via Explanations (CoRTEx) to enhance term representation and significantly improve term clustering. Materials and Methods: The model training involves generating explanations for a cleaned subset of UMLS terms using ChatGPT. We employ contrastive learning, considering term and explanation embeddings simultaneously, and progressively introduce hard negative samples. Additionally, a ChatGPT-assisted BIRCH algorithm is designed for efficient clustering of a new ontology. Results: We established a clustering test set and a hard negative test set, where our model consistently achieves the highest F1 score. With CoRTEx embeddings and the modified BIRCH algorithm, we grouped 35,580,932 terms from the Biomedical Informatics Ontology System (BIOS) into 22,104,559 clusters with O(N) queries to ChatGPT. Case studies highlight the model's efficacy in handling challenging samples, aided by information from explanations.
Conclusion: By aligning terms to their explanations, CoRTEx demonstrates superior accuracy over benchmark models and robustness beyond its training set, and it is suitable for clustering terms for large-scale biomedical ontologies.

Submitted 13 December, 2023; originally announced December 2023.

arXiv:2312.06171 [pdf, other]

Jointly Explicit and Implicit Cross-Modal Interaction Network for Anterior Chamber Inflammation Diagnosis

Authors: Qian Shao, Ye Dai, Haochao Ying, Kan Xu, Jinhong Wang, Wei Chi, Jian Wu

Abstract: Uveitis demands the precise diagnosis of anterior chamber inflammation (ACI) for optimal treatment. However, current diagnostic methods only rely on a limited single-modal disease perspective, which leads to poor performance. In this paper, we investigate a promising yet challenging way to fuse multimodal data for ACI diagnosis. Notably, existing fusion paradigms focus on empowering implicit modality interactions (i.e., self-attention and its variants), but neglect to inject explicit modality interactions, especially from clinical knowledge and imaging property. To this end, we propose a jointly Explicit and implicit Cross-Modal Interaction Network (EiCI-Net) for Anterior Chamber Inflammation Diagnosis that uses anterior segment optical coherence tomography (AS-OCT) images, slit-lamp images, and clinical data jointly. Specifically, we first develop CNN-Based Encoders and a Tabular Processing Module (TPM) to extract efficient feature representations in different modalities. Then, we devise an Explicit Cross-Modal Interaction Module (ECIM) to generate attention maps as a kind of explicit clinical knowledge based on the tabular feature maps, then integrate them into the slit-lamp feature maps, allowing the CNN-Based Encoder to focus on the more informative content of the slit-lamp images. After that, the Implicit Cross-Modal Interaction Module (ICIM), a transformer-based network, further implicitly enhances modality interactions. Finally, we construct a considerable real-world dataset from our collaborative hospital and conduct sufficient experiments to demonstrate the superior performance of our proposed EiCI-Net compared with the state-of-the-art classification methods in various metrics.

Submitted 19 December, 2023; v1 submitted 11 December, 2023; originally announced December 2023.

arXiv:2312.03023 [pdf, ps, other]

A study of topological quantities of lattice QCD by a modified DCGAN frame

Authors: Lin Gao, Heping Ying, Jianbo Zhang

Abstract: A modified deep convolutional generative adversarial network (M-DCGAN) frame is proposed to study the N-dimensional (ND) topological quantities in lattice QCD based on the Monte Carlo (MC) simulations. We construct a new scaling structure including fully connected layers to support the generation of high-quality high-dimensional images for the M-DCGAN. Our results show that the M-DCGAN machine learning scheme should help calculate the 1D distribution of topological charge and the 4D topological charge density more efficiently than MC simulation alone.

Submitted 17 February, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

arXiv:2311.13234 [pdf, other]

TSegFormer: 3D Tooth Segmentation in Intraoral Scans with Geometry Guided Transformer

Authors: Huimin Xiong, Kunle Li, Kaiyuan Tan, Yang Feng, Joey Tianyi Zhou, Jin Hao, Haochao Ying, Jian Wu, Zuozhu Liu

Abstract: Optical Intraoral Scanners (IOS) are widely used in digital dentistry to provide detailed 3D information of dental crowns and the gingiva. Accurate 3D tooth segmentation in IOSs is critical for various dental applications, while previous methods are error-prone at complicated boundaries and exhibit unsatisfactory results across patients. In this paper, we propose TSegFormer which captures both local and global dependencies among different teeth and the gingiva in the IOS point clouds with a multi-task 3D transformer architecture. Moreover, we design a geometry-guided loss based on a novel point curvature to refine boundaries in an end-to-end manner, avoiding time-consuming post-processing to reach clinically applicable segmentation. In addition, we create a dataset with 16,000 IOSs, the largest ever IOS dataset to the best of our knowledge. The experimental results demonstrate that our TSegFormer consistently surpasses existing state-of-the-art baselines. The superiority of TSegFormer is corroborated by extensive analysis, visualizations and real-world clinical applicability tests. Our code is available at https://github.com/huiminxiong/TSegFormer.

Submitted 22 November, 2023; originally announced November 2023.

Comments: MICCAI 2023, STAR(Student Travel) award. 11 pages, 3 figures, 5 tables. arXiv admin note: text overlap with arXiv:2210.16627

arXiv:2311.11666 [pdf, other]

OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning

Authors: Haiyang Ying, Yixuan Yin, Jinzhi Zhang, Fan Wang, Tao Yu, Ruqi Huang, Lu Fang

Abstract: Towards holistic understanding of 3D scenes, a general 3D segmentation method is needed that can segment diverse objects without restrictions on object quantity or categories, while also reflecting the inherent hierarchical structure. To achieve this, we propose OmniSeg3D, an omniversal segmentation method that aims to segment anything in 3D all at once. The key insight is to lift multi-view inconsistent 2D segmentations into a consistent 3D feature field through a hierarchical contrastive learning framework, which is accomplished in two steps. Firstly, we design a novel hierarchical representation based on category-agnostic 2D segmentations to model the multi-level relationship among pixels. Secondly, image features rendered from the 3D feature field are clustered at different levels, which can be further drawn closer or pushed apart according to the hierarchical relationship between different levels. In tackling the challenges posed by inconsistent 2D segmentations, this framework yields a globally consistent 3D feature field, which further enables hierarchical segmentation, multi-object selection, and global discretization. Extensive experiments demonstrate the effectiveness of our method on high-quality 3D segmentation and accurate hierarchical structure understanding. A graphical user interface further facilitates flexible interaction for omniversal 3D segmentation.

Submitted 20 November, 2023; originally announced November 2023.

arXiv:2311.10108 [pdf, ps, other]

doi 10.1103/PhysRevD.109.074509

Study of topological quantities of lattice QCD with a modified Wasserstein generative adversarial network

Authors: Lin Gao, Heping Ying, Jianbo Zhang

Abstract: We propose a modified Wasserstein generative adversarial network (M-WGAN) to study the distribution of the topological charge in lattice QCD based on Monte Carlo (MC) simulations. We construct a new generator and discriminator in M-WGAN to support the generation of high-quality distributions. Our results show that the M-WGAN machine learning scheme should help us calculate the 1D distribution of topological charge more efficiently than MC simulation alone.

Submitted 10 June, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

arXiv:2311.09757 [pdf, other]

UFPS: A unified framework for partially-annotated federated segmentation in heterogeneous data distribution

Authors: Le Jiang, Li Yan Ma, Tie Yong Zeng, Shi Hui Ying

Abstract: Partially supervised segmentation is a label-saving method based on datasets with fractional classes labeled and intersectant. However, it is still far from landing on real-world medical applications due to privacy concerns and data heterogeneity. As a remedy without privacy leakage, federated partially supervised segmentation (FPSS) is formulated in this work. The main challenges for FPSS are class heterogeneity and client drift. We propose a Unified Federated Partially-labeled Segmentation (UFPS) framework to segment pixels within all classes for partially-annotated datasets by training a totipotential global model without class collision. Our framework includes Unified Label Learning and sparsed Unified Sharpness Aware Minimization for unification of class and feature space, respectively. We find that vanilla combinations for traditional methods in partially supervised segmentation and federated learning are mainly hampered by class collision through empirical study. Our comprehensive experiments on real medical datasets demonstrate better deconflicting and generalization ability of UFPS compared with modified methods.

Submitted 16 November, 2023; originally announced November 2023.

arXiv:2310.13674 [pdf, other]

Using Human-like Mechanism to Weaken Effect of Pre-training Weight Bias in Face-Recognition Convolutional Neural Network

Authors: Haojiang Ying, Yi-Fan Li, Yiyang Chen

Abstract: The convolutional neural network (CNN), as an important model in artificial intelligence, has been widely used and studied in different disciplines. The computational mechanisms of CNNs are still not fully revealed due to their complex nature. In this study, we focused on 4 extensively studied CNNs (AlexNet, VGG11, VGG13, and VGG16) which have been analyzed as human-like models by neuroscientists with ample evidence. We trained these CNNs on an emotion valence classification task by transfer learning. Comparing their performance with human data, we found that these CNNs partly perform as humans do. We then updated the object-based AlexNet using a self-attention mechanism based on neuroscience and behavioral data. The updated FE-AlexNet outperformed all the other tested CNNs and closely resembles human perception. The results further unveil the computational mechanisms of these CNNs. Moreover, this study offers a new paradigm to better understand and improve CNN performance via human data.

Submitted 20 October, 2023; originally announced October 2023.

Comments: 24 pages, 6 figures

arXiv:2310.10958 [pdf]

Enhancing Deep Neural Network Training Efficiency and Performance through Linear Prediction

Authors: Hejie Ying, Mengmeng Song, Yaohong Tang, Shungen Xiao, Zimin Xiao

Abstract: Deep neural networks (DNN) have achieved remarkable success in various fields, including computer vision and natural language processing. However, training an effective DNN model still poses challenges. This paper proposes a method to optimize the training effectiveness of DNNs, with the goal of improving model performance. Firstly, based on the observation that DNN parameters change according to certain laws during the training process, the potential of parameter prediction for improving model training efficiency and performance is discovered. Secondly, considering the magnitude of DNN model parameters, hardware limitations and the noise tolerance of Stochastic Gradient Descent (SGD), a Parameter Linear Prediction (PLP) method is exploited to perform DNN parameter prediction. Finally, validations are carried out on some representative backbones. Experimental results show that, compared to normal training under the same training conditions and epochs, the optimal model trained with the proposed PLP method obtains on average about a 1% accuracy improvement and a 0.01 top-1/top-5 error reduction for Vgg16, Resnet18 and GoogLeNet on the CIFAR-100 dataset, which shows the effectiveness of the proposed method on different DNN structures and validates its capacity to enhance DNN training efficiency and performance.

Submitted 2 July, 2024; v1 submitted 16 October, 2023; originally announced October 2023.
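
The parameter-prediction idea summarized in the abstract above can be illustrated with a minimal sketch. It assumes the simplest possible linear predictor, a two-point extrapolation of each parameter from its two most recent values. The abstract does not give the actual PLP formulation, so the function name `linear_predict` and the `step` argument are hypothetical and are not taken from the paper.

```python
def linear_predict(prev_params, curr_params, step=1.0):
    """Extrapolate each parameter along the straight line through its
    previous and current values (hypothetical stand-in for PLP)."""
    return [c + step * (c - p) for p, c in zip(prev_params, curr_params)]


# Toy parameter vectors from two consecutive training checkpoints.
prev = [0.10, -0.40, 1.00]
curr = [0.15, -0.35, 0.90]

# Predicted parameters one step ahead, assuming the linear trend continues.
pred = linear_predict(prev, curr)
```

In a real training loop the predicted values would presumably replace the current weights before SGD resumes; how often to predict and from how many checkpoints are design choices this sketch leaves open.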

arXiv:2309.17190 [pdf, other]

PARF: Primitive-Aware Radiance Fusion for Indoor Scene Novel View Synthesis

Authors: Haiyang Ying, Baowei Jiang, Jinzhi Zhang, Di Xu, Tao Yu, Qionghai Dai, Lu Fang

Abstract: This paper proposes a method for fast scene radiance field reconstruction with strong novel view synthesis performance and convenient scene editing functionality. The key idea is to fully utilize semantic parsing and primitive extraction for constraining and accelerating the radiance field reconstruction process. To fulfill this goal, a primitive-aware hybrid rendering strategy is proposed to enjoy the best of both volumetric and primitive rendering. We further contribute a reconstruction pipeline that conducts primitive parsing and radiance field learning iteratively for each input frame, which successfully fuses semantic, primitive, and radiance information into a single framework. Extensive evaluations demonstrate the fast reconstruction ability, high rendering quality, and convenient editing functionality of our method.

Submitted 29 September, 2023; originally announced September 2023.

Comments: Accepted to ICCV 2023; Project page: https://oceanying.github.io/PARF/

arXiv:2309.13235 [pdf, other]

M$^3$CS: Multi-Target Masked Point Modeling with Learnable Codebook and Siamese Decoders

Authors: Qibo Qiu, Honghui Yang, Wenxiao Wang, Shun Zhang, Haiming Gao, Haochao Ying, Wei Hua, Xiaofei He

Abstract: Masked point modeling has become a promising scheme of self-supervised pre-training for point clouds. Existing methods reconstruct either the original points or related features as the objective of pre-training. However, considering the diversity of downstream tasks, it is necessary for the model to have both low- and high-level representation modeling capabilities to capture geometric details and semantic contexts during pre-training. To this end, M$^3$CS is proposed to equip the model with the above abilities. Specifically, with a masked point cloud as input, M$^3$CS introduces two decoders to predict masked representations and the original points simultaneously. Since an extra decoder doubles the parameters of the decoding process and may lead to overfitting, we propose siamese decoders to keep the number of learnable parameters unchanged. Further, we propose an online codebook projecting continuous tokens into discrete ones before reconstructing masked points. In this way, we can force the decoder to take effect through the combinations of tokens rather than remembering each token. Comprehensive experiments show that M$^3$CS achieves superior performance on both classification and segmentation tasks, outperforming existing methods.

Submitted 22 September, 2023; originally announced September 2023.

arXiv:2307.08348 [pdf, other]

Adaptive Local Basis Functions for Shape Completion

Authors: Hui Ying, Tianjia Shao, He Wang, Yin Yang, Kun Zhou

Abstract: In this paper, we focus on the task of 3D shape completion from partial point clouds using deep implicit functions. Existing methods seek to use voxelized basis functions or the ones from a certain family of functions (e.g., Gaussians), which leads to high computational costs or limited shape expressivity. On the contrary, our method employs adaptive local basis functions, which are learned end-to-end and not restricted in certain forms. Based on those basis functions, a local-to-local shape completion framework is presented. Our algorithm learns sparse parameterization with a small number of basis functions while preserving local geometric details during completion. Quantitative and qualitative experiments demonstrate that our method outperforms the state-of-the-art methods in shape completion, detail preservation, generalization to unseen geometries, and computational cost. Code and data are at https://github.com/yinghdb/Adaptive-Local-Basis-Functions.

Submitted 17 July, 2023; originally announced July 2023.

Comments: In SIGGRAPH 2023

arXiv:2305.04213 [pdf, other]

Robust Image Ordinal Regression with Controllable Image Generation

Authors: Yi Cheng, Haochao Ying, Renjun Hu, Jinhong Wang, Wenhao Zheng, Xiao Zhang, Danny Chen, Jian Wu

Abstract: Image ordinal regression has been mainly studied along the line of exploiting the order of categories. However, the issues of class imbalance and category overlap that are very common in ordinal regression were largely overlooked. As a result, the performance on minority categories is often unsatisfactory. In this paper, we propose a novel framework called CIG based on controllable image generation to directly tackle these two issues. Our main idea is to generate extra training samples with specific labels near category boundaries, and the sample generation is biased toward the less-represented categories. To achieve controllable image generation, we seek to separate structural and categorical information of images based on structural similarity, categorical similarity, and reconstruction constraints. We evaluate the effectiveness of our new CIG approach in three different image ordinal regression scenarios. The results demonstrate that CIG can be flexibly integrated with off-the-shelf image encoders or ordinal regression models to achieve improvement, and further, the improvement is more significant for minority categories.

Submitted 21 May, 2023; v1 submitted 7 May, 2023; originally announced May 2023.

Comments: 8 pages, 12 figures, to be published in IJCAI2023

arXiv:2305.00274 [pdf]

Evolution of medium-range order and its correlation with magnetic nanodomains in Fe-Dy-B-Nb bulk metallic glasses

Authors: Jiacheng Ge, Yao Gu, Zhongzhen Yao, Sinan Liu, Huiqiang Ying, Chenyu Lu, Zhenduo Wu, Yang Ren, Jun-ichi Suzuki, Zhenhua Xie, Yubin Ke, He Zhu, Song Tang, Xun-Li Wang, Si Lan

Abstract: Fe-based metallic glasses are promising functional materials for advanced magnetism and sensor fields. Tailoring magnetic performance in amorphous materials requires a thorough knowledge of the correlation between structural disorder and magnetic order, which remains ambiguous. Two practical difficulties remain: the first is directly observing subtle magnetic structural changes on multiple scales, and the second is precisely regulating the various amorphous states. Here we propose a novel approach to tailor the amorphous structure through the liquid-liquid phase transition. In-situ synchrotron diffraction has unraveled a medium-range ordering process dominated by edge-sharing cluster connectivity during the liquid-liquid phase transition. Moreover, nanodomains with topological order have been found to exist in composition with liquid-liquid phase transition, manifesting as hexagonal patterns in small-angle neutron scattering profiles. The liquid-liquid phase transition can induce the nanodomains to be more locally ordered, generating stronger exchange interactions due to the reduced Fe-Fe bond and the enhanced structural order, leading to an increase in saturation magnetization. Furthermore, the increased local heterogeneity on the medium-range scale enhances the magnetic anisotropy, promoting the permeability response under applied stress and leading to a better stress-impedance effect. These experimental results pave the way to tailor the magnetic structure and performance through the liquid-liquid phase transition.

Submitted 29 April, 2023; originally announced May 2023.

Comments: 31 pages, 14 figures, including the Supplementary Material

arXiv:2304.11672 [pdf]

CBIM: A Graph-based Approach to Enhance Interoperability Using Semantic Enrichment

Authors: Zijian Wang, Huaquan Ying, Rafael Sacks, André Borrmann

Abstract: Interoperability remains a challenge in the construction industry. In this study, we propose a semantic enrichment approach to construct BIM knowledge graphs from pure building object geometries and demonstrate its potential to support BIM interoperability. Our approach involves machine learning and rule-based methods for object classification, relationship determination (e.g., hosting and adjacent) and attribute computation. The enriched results are compiled into a BIM graph. A case study was conducted to illustrate the approach for facilitating interoperability between different versions of the BIM authoring software Autodesk Revit. First, pure object geometries of an architectural apartment model were exported from Revit 2023 and fed into the developed tools in sequence to generate a BIM graph. Then, essential information was extracted from the graph and used to reconstruct an architectural model in the version 2022 of Revit. Upon examination, the reconstructed model was consistent with the original one. The success of this experiment demonstrates the feasibility of generating a BIM graph from object geometries and utilizing it to support interoperability.

Submitted 23 April, 2023; originally announced April 2023.

arXiv:2303.15116 [pdf]

An ontology-aided, natural language-based approach for multi-constraint BIM model querying

Authors: Mengtian Yin, Llewellyn Tang, Chris Webster, Shen Xu, Xiongyi Li, Huaquan Ying

Abstract: Being able to efficiently retrieve the required building information is critical for construction project stakeholders to carry out their engineering and management activities. Natural language interface (NLI) systems are emerging as a time and cost-effective way to query Building Information Models (BIMs). However, the existing methods cannot logically combine different constraints to perform fine-grained queries, dampening the usability of natural language (NL)-based BIM queries. This paper presents a novel ontology-aided semantic parser to automatically map natural language queries (NLQs) that contain different attribute and relational constraints into computer-readable codes for querying complex BIM models. First, a modular ontology was developed to represent NL expressions of Industry Foundation Classes (IFC) concepts and relationships, and was then populated with entities from target BIM models to assimilate project-specific information. Hereafter, the ontology-aided semantic parser progressively extracts concepts, relationships, and value restrictions from NLQs to fully identify constraint conditions, resulting in standard SPARQL queries with reasoning rules to successfully retrieve IFC-based BIM models. The approach was evaluated based on 225 NLQs collected from BIM users, with a 91% accuracy rate. Finally, a case study about the design-checking of a real-world residential building demonstrates the practical value of the proposed approach in the construction industry.

Submitted 27 March, 2023; originally announced March 2023.

arXiv:2303.13491 [pdf, other]

doi 10.1109/TVCG.2023.3261910

FraudAuditor: A Visual Analytics Approach for Collusive Fraud in Health Insurance

Authors: Jiehui Zhou, Xumeng Wang, Jie Wang, Hui Ye, Huanliang Wang, Zihan Zhou, Dongming Han, Haochao Ying, Jian Wu, Wei Chen

Abstract: Collusive fraud, in which multiple fraudsters collude to defraud health insurance funds, threatens the operation of the healthcare system. However, existing statistical and machine learning-based methods have limited ability to detect fraud in the scenario of health insurance due to the high similarity of fraudulent behaviors to normal medical visits and the lack of labeled data. To ensure the accuracy of the detection results, expert knowledge needs to be integrated with the fraud detection process. By working closely with health insurance audit experts, we propose FraudAuditor, a three-stage visual analytics approach to collusive fraud detection in health insurance. Specifically, we first allow users to interactively construct a co-visit network to holistically model the visit relationships of different patients. Second, an improved community detection algorithm that considers the strength of fraud likelihood is designed to detect suspicious fraudulent groups. Finally, through our visual interface, users can compare, investigate, and verify suspicious patient behavior with tailored visualizations that support different time scales. We conducted case studies in a real-world healthcare scenario, i.e., to help locate the actual fraud group and exclude the false positive group. The results and expert feedback proved the effectiveness and usability of the approach.

Submitted 23 March, 2023; originally announced March 2023.

Comments: 12 pages, 7 figures

Journal ref: IEEE Transactions on Visualization and Computer Graphics, 2023

arXiv:2212.14254 [pdf]

The Markovian and Memoryless Properties of Visual System: Evidence from Serial Face Processing

Authors: Jun-Ming Yu, Haojiang Ying

Abstract: The visual system can be viewed and studied as an information processing system. If so, then the visual system should follow specific fundamental properties: either a memory or a memoryless system. Previous studies in serial dependence in vision found that the perception of the current stimulus is positively determined by the previous one. However, we are not entirely sure whether this phenomenon is a Markov process. In this study, participants were asked to rate the social characteristics (attractiveness, trustworthiness, and dominance) of a face, either followed by the same characteristic (the one-trait condition) or another one (the two-trait condition) in randomized orders. By doing so, we can directly test the contribution of the previous input and output to the current output and thus study the properties of the system. Using Derivative of Gaussian, Markov Chain and Linear Mixed effect modeling, convergent results suggested that the serial dependence was absent and the memoryless and Markovian properties were violated in the two-trait condition when testing both attractiveness and dominance, but not in the other conditions. Thus, different facets of (presumably) the same computational task may follow asymmetrical system properties. The study also develops serial dependence as an effective technique to reveal the relationships between different computation tasks.

Submitted 29 December, 2022; originally announced December 2022.

arXiv:2212.05794 [pdf, other]

CTT-Net: A Multi-view Cross-token Transformer for Cataract Postoperative Visual Acuity Prediction

Authors: Jinhong Wang, Jingwen Wang, Tingting Chen, Wenhao Zheng, Zhe Xu, Xingdi Wu, Wen Xu, Haochao Ying, Danny Chen, Jian Wu

Abstract: Surgery is the only viable treatment for cataract patients with visual acuity (VA) impairment. Clinically, to assess the necessity of cataract surgery, accurately predicting postoperative VA before surgery by analyzing multi-view optical coherence tomography (OCT) images is crucially needed. Unfortunately, due to complicated fundus conditions, determining postoperative VA remains difficult for medical experts. Deep learning methods for this problem were developed in recent years. Although effective, these methods still face several issues, such as not efficiently exploring potential relations between multi-view OCT images, neglecting the key role of clinical prior knowledge (e.g., preoperative VA value), and using only regression-based metrics which are lacking reference. In this paper, we propose a novel Cross-token Transformer Network (CTT-Net) for postoperative VA prediction by analyzing both the multi-view OCT images and preoperative VA. To effectively fuse multi-view features of OCT images, we develop cross-token attention that could restrict redundant/unnecessary attention flow. Further, we utilize the preoperative VA value to provide more information for postoperative VA prediction and facilitate fusion between views. Moreover, we design an auxiliary classification loss to improve model performance and assess VA recovery more sufficiently, avoiding the limitation by only using the regression metrics. To evaluate CTT-Net, we build a multi-view OCT image dataset collected from our collaborative hospital. A set of extensive experiments validate the effectiveness of our model compared to existing methods in various metrics. Code is available at: https://github.com/wjh892521292/Cataract OCT.

Submitted 12 December, 2022; originally announced December 2022.

Comments: 5 pages, 3 figures, accepted for publication in BIBM

arXiv:2211.06614 [pdf, other]

Robust Training of Graph Neural Networks via Noise Governance

Authors: Siyi Qian, Haochao Ying, Renjun Hu, Jingbo Zhou, Jintai Chen, Danny Z. Chen, Jian Wu

Abstract: Graph Neural Networks (GNNs) have become widely-used models for semi-supervised learning. However, the robustness of GNNs in the presence of label noise remains a largely under-explored problem. In this paper, we consider an important yet challenging scenario where labels on nodes of graphs are not only noisy but also scarce. In this scenario, the performance of GNNs is prone to degrade due to label noise propagation and insufficient learning. To address these issues, we propose a novel RTGNN (Robust Training of Graph Neural Networks via Noise Governance) framework that achieves better robustness by learning to explicitly govern label noise. More specifically, we introduce self-reinforcement and consistency regularization as supplemental supervision. The self-reinforcement supervision is inspired by the memorization effects of deep neural networks and aims to correct noisy labels. Further, the consistency regularization prevents GNNs from overfitting to noisy labels via mimicry loss in both the inter-view and intra-view perspectives. To leverage such supervisions, we divide labels into clean and noisy types, rectify inaccurate labels, and further generate pseudo-labels on unlabeled nodes. Supervision for nodes with different types of labels is then chosen adaptively. This enables sufficient learning from clean labels while limiting the impact of noisy ones. We conduct extensive experiments to evaluate the effectiveness of our RTGNN framework, and the results validate its consistent superior performance over state-of-the-art methods with two types of label noises and various noise rates.

Submitted 25 February, 2023; v1 submitted 12 November, 2022; originally announced November 2022.

Comments: 9 pages, accepted to WSDM 2023 Research Track

arXiv:2208.13418 [pdf, other]

doi 10.1109/TVCG.2022.3209391

DPVisCreator: Incorporating Pattern Constraints to Privacy-preserving Visualizations via Differential Privacy

Authors: Jiehui Zhou, Xumeng Wang, Jason K. Wong, Huanliang Wang, Zhongwei Wang, Xiaoyu Yang, Xiaoran Yan, Haozhe Feng, Huamin Qu, Haochao Ying, Wei Chen

Abstract: Data privacy is an essential issue in publishing data visualizations. However, it is challenging to represent multiple data patterns in privacy-preserving visualizations. The prior approaches target specific chart types or perform an anonymization model uniformly without considering the importance of data patterns in visualizations. In this paper, we propose a visual analytics approach that facilitates data custodians to generate multiple private charts while maintaining user-preferred patterns. To this end, we introduce pattern constraints to model users' preferences over data patterns in the dataset and incorporate them into the proposed Bayesian network-based Differential Privacy (DP) model PriVis. A prototype system, DPVisCreator, is developed to assist data custodians in implementing our approach. The effectiveness of our approach is demonstrated with quantitative evaluation of pattern utility under the different levels of privacy protection, case studies, and semi-structured expert interviews.

Submitted 29 August, 2022; originally announced August 2022.

Comments: 9 pages, 5 figures

Journal ref: IEEE Transactions on Visualization and Computer Graphics, vol. 29, no. 1, pp. 809-819, Jan. 2023

arXiv:2207.10670 [pdf, other]

ME-GAN: Learning Panoptic Electrocardio Representations for Multi-view ECG Synthesis Conditioned on Heart Diseases

Authors: Jintai Chen, Kuanlun Liao, Kun Wei, Haochao Ying, Danny Z. Chen, Jian Wu

Abstract: Electrocardiogram (ECG) is a widely used non-invasive diagnostic tool for heart diseases. Many studies have devised ECG analysis models (e.g., classifiers) to assist diagnosis. As an upstream task, researchers have built generative models to synthesize ECG data, which are beneficial to providing training samples, privacy protection, and annotation reduction. However, previous generative methods for ECG often neither synthesized multi-view data, nor dealt with heart disease conditions. In this paper, we propose a novel disease-aware generative adversarial network for multi-view ECG synthesis called ME-GAN, which attains panoptic electrocardio representations conditioned on heart diseases and projects the representations onto multiple standard views to yield ECG signals. Since ECG manifestations of heart diseases are often localized in specific waveforms, we propose a new "mixup normalization" to inject disease information precisely into suitable locations. In addition, we propose a view discriminator to revert disordered ECG views into a pre-determined order, supervising the generator to obtain ECG representing correct view characteristics. Besides, a new metric, rFID, is presented to assess the quality of the synthesized ECG signals. Comprehensive experiments verify that our ME-GAN performs well on multi-view ECG signal synthesis with trusty morbid manifestations.

Submitted 29 May, 2023; v1 submitted 21 July, 2022; originally announced July 2022.

Journal ref: In International Conference on Machine Learning, 3360--3370, (2022), PMLR

arXiv:2207.08962 [pdf, ps, other]

$p$-numerical semigroups with $p$-symmetric properties

Authors: Takao Komatsu, Haotian Ying

Abstract: The so-called Frobenius number in the famous linear Diophantine problem of Frobenius is the largest integer such that the linear equation $a_1 x_1+\cdots+a_k x_k=n$ ($a_1,\dots,a_k$ are given positive integers with $\gcd(a_1,\dots,a_k)=1$) does not have a non-negative integer solution $(x_1,\dots,x_k)$. The generalized Frobenius number (called the $p$-Frobenius number) is the largest integer such that this linear equation has at most $p$ solutions. That is, when $p=0$, the $0$-Frobenius number is the original Frobenius number. In this paper, we introduce and discuss $p$-numerical semigroups by developing a generalization of the theory of numerical semigroups based on this flow of the number of representations. That is, for a certain non-negative integer $p$, $p$-gaps, $p$-symmetric semigroups, $p$-pseudo-symmetric semigroups, and the like are defined, and their properties are obtained. When $p=0$, they correspond to the original gaps, symmetric semigroups, and pseudo-symmetric semigroups, respectively.

Submitted 18 June, 2023; v1 submitted 18 July, 2022; originally announced July 2022.

Comments: Journal of Algebra and its Applications (2024)

MSC Class: 20M14; 11D07; 20M05; 05A15; 11B25

arXiv:2206.13052 [pdf, ps, other]

The $p$-numerical semigroup of the triple of arithmetic progressions

Authors: Takao Komatsu, Haotian Ying

Abstract: For given positive integers $a_1,a_2,\dots,a_k$ with $\gcd(a_1,a_2,\dots,a_k)=1$, the denumerant $d(n)=d(n;a_1,a_2,\dots,a_k)$ is the number of nonnegative solutions $(x_1,x_2,\dots,x_k)$ of the linear equation $a_1 x_1+a_2 x_2+\dots+a_k x_k=n$ for a positive integer $n$. For a given nonnegative integer $p$, let $S_p=S_p(a_1,a_2,\dots,a_k)$ be the set of all nonnegative integers $n$ such that $d(n)>p$. In this paper, we are interested in the $p$-Frobenius number, which is the maximum of the set of gaps $\mathbb N_0\backslash S_p$. Here $\mathbb N_0$ denotes the set of nonnegative integers. When $p=0$, $S=S_0$ is the original numerical semigroup, and the $0$-Frobenius number is the original Frobenius number. The explicit formula for two variables is known not only for $p=0$ but also for $p>0$, but when there are three or more variables, it is difficult even in the special case of $p=0$. For $p>0$, it is not only more difficult, but no explicit formula had been found. In this paper, explicit formulas of the $p$-Frobenius number and related values are given for the triple of arithmetic progressions. The main tool is to determine the elements of the $p$-Apéry set.

Submitted 27 June, 2023; v1 submitted 27 June, 2022; originally announced June 2022.

Comments: Symmetry Vol.15 (2023)
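
Note on the definitions in the abstract above: the denumerant $d(n)$ and the $p$-Frobenius number can be checked numerically for small inputs. The following is a minimal brute-force sketch, not the method of the paper (which derives explicit formulas via the $p$-Apéry set). The function names `denumerants` and `p_frobenius` and the doubling search window are assumptions made for illustration. The stopping rule relies on the fact that $d(n+a) \ge d(n)$ for any generator $a$, so once $d(n)>p$ holds on a run of $\min(a_i)$ consecutive integers it holds for every larger integer.

```python
from functools import reduce
from math import gcd


def denumerants(gens, limit):
    """Return a list d where d[n] is the number of nonnegative integer
    solutions of a_1*x_1 + ... + a_k*x_k = n, for 0 <= n <= limit."""
    d = [0] * (limit + 1)
    d[0] = 1  # only the all-zero solution represents 0
    for a in gens:
        # Standard counting recurrence: allow one more generator at a time.
        for n in range(a, limit + 1):
            d[n] += d[n - a]
    return d


def p_frobenius(gens, p):
    """Largest integer n with d(n) <= p (hypothetical helper, brute force)."""
    if reduce(gcd, gens) != 1:
        raise ValueError("generators must have gcd 1")
    a_min = min(gens)
    limit = 4 * max(gens)  # assumed starting window; doubled until sufficient
    while True:
        d = denumerants(gens, limit)
        run = 0  # length of the current run of consecutive n with d(n) > p
        for n in range(limit + 1):
            run = run + 1 if d[n] > p else 0
            if run == a_min:
                # Every integer from n - a_min + 1 onward has d > p,
                # so the last gap of S_p sits just before the run.
                return n - a_min
        limit *= 2


if __name__ == "__main__":
    print(p_frobenius((3, 5), 0))
    print(p_frobenius((3, 5), 1))
    print(p_frobenius((6, 9, 20), 0))
```

For the pair (3, 5) with $p=0$ this returns 7, the classical Frobenius number $3\cdot 5-3-5$. The search is exponential in nothing but can be slow for large generators, since it tabulates $d(n)$ for every $n$ up to the window size.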

arXiv:2206.08281 [pdf, ps, other]

doi 10.1364/OL.466325

Strain modulation of photocurrent in Weyl semimetal TaIrTe4

Authors: Ying Ding, XinRu Wang, LieHong Liao, XinYu Chen, JiaYan Zhang, YueYue Wang, Hao Ying, Yuan Li

Abstract: We study the effect of the strain on the energy bands of TaIrTe4 sheet and the photocurrent in the Cu-TaIrTe4-Cu heterojunction by using the quantum transport simulations. It is found that the Weyl points can be completely broken with increasing strain along the z direction. One can obtain a large photocurrent in the Cu-TaIrTe4-Cu heterojunction in the absence of the strain. While the photocurrent can be sharply enhanced by the strain and reach a large value. Accordingly, the maximum values of the photocurrent can be explained in terms of the transitions between peaks of density of states and band structures. The strain-induced energy bands and photocurrent exhibit anisotropic behaviors. Our results provide a novel route to effectively modulate the energy bands and the photocurrent by utilizing mechanical methods for TaIrTe4-based devices.

Submitted 16 June, 2022; originally announced June 2022.

Comments: 5 pages, 7 figures

arXiv:2206.05660 [pdf, ps, other]

The $p$-Frobenius and $p$-Sylvester numbers for Fibonacci and Lucas triplets

Authors: Takao Komatsu, Haotian Ying

Abstract: In this paper we study a certain kind of generalized linear Diophantine problem of Frobenius. Let $a_1,a_2,\dots,a_l$ be positive integers such that their greatest common divisor is one. For a nonnegative integer $p$, denote the $p$-Frobenius number by $g_p(a_1,a_2,\dots,a_l)$, which is the largest integer that can be represented in at most $p$ ways by a linear combination with nonnegative integer coefficients of $a_1,a_2,\dots,a_l$. When $p=0$, the $0$-Frobenius number is the classical Frobenius number. When $l=2$, the $p$-Frobenius number is explicitly given. However, when $l=3$ and even larger, even in special cases, it is not easy to give the Frobenius number explicitly, and it is even more difficult when $p>0$, and no specific example has been known. However, very recently, we have succeeded in giving explicit formulas for the case where the sequence is of triangular numbers or of repunits for the case where $l=3$. In this paper, we show the explicit formula for the Fibonacci triple when $p>0$. In addition, we give an explicit formula for the $p$-Sylvester number, that is, the total number of nonnegative integers that can be represented in at most $p$ ways. Furthermore, explicit formulas are shown concerning the Lucas triple.

Submitted 3 December, 2022; v1 submitted 12 June, 2022; originally announced June 2022.

Comments: Mathematical Biosciences and Engineering

MSC Class: 11D07; 05A15; 05A17; 05A19; 11B68; 11D04; 11P81; 20M14

arXiv:2204.00194 [pdf]

Direct synthesis of single-crystal bilayer graphene on various dielectric substrates

Authors: ** Chen, Xianqin Xing, Wenyu Liu, Zhanjie Lu, Hao Ying, Le Huang, Zhiyong Zhang, Shunqing Wu, Zhihai Cheng, Shanshan Chen

Abstract: In this work, a novel method to grow high-quality and large bilayer graphene (BLG) directly on various dielectric substrates was demonstrated. Large area single-crystal monolayer graphene was applied as a seeding layer to facilitate the homo-epitaxial synthesis of single crystal BLG directly on insulating substrates. The Cu nano-powders (Cu NP) with nanostructure and high surface-area were used as the remote catalysis to provide long-lasting catalytic activity during the graphene growth. The TEM results confirmed the single-crystalline nature of the BLG domains, which validates the superiority of the homo-epitaxial growth technique. The as-grown BLG show comparable quality with the CVD-grown BLG on metal surface. Field-effect transistors directly fabricated on the as-grown BLG/SiO$_2$/Si showed a room temperature carrier mobility as high as 2297 cm$^2$ V$^{-1}$ s$^{-1}$.

Submitted 1 April, 2022; originally announced April 2022.

arXiv:2203.09975 [pdf]

BIOS: An Algorithmically Generated Biomedical Knowledge Graph

Authors: Sheng Yu, Zheng Yuan, Jun Xia, Shengxuan Luo, Huaiyuan Ying, Sihang Zeng, Jingyi Ren, Hongyi Yuan, Zhengyun Zhao, Yucong Lin, Keming Lu, Jing Wang, Yutao Xie, Heung-Yeung Shum

Abstract: Biomedical knowledge graphs (BioMedKGs) are essential infrastructures for biomedical and healthcare big data and artificial intelligence (AI), facilitating natural language processing, model development, and data exchange. For decades, these knowledge graphs have been developed via expert curation; however, this method can no longer keep up with today's AI development, and a transition to algorithmically generated BioMedKGs is necessary. In this work, we introduce the Biomedical Informatics Ontology System (BIOS), the first large-scale publicly available BioMedKG generated completely by machine learning algorithms. BIOS currently contains 4.1 million concepts, 7.4 million terms in two languages, and 7.3 million relation triplets. We present the methodology for developing BIOS, including the curation of raw biomedical terms, computational identification of synonymous terms and aggregation of these terms to create concept nodes, semantic type classification of the concepts, relation identification, and biomedical machine translation. We provide statistics on the current BIOS content and perform preliminary assessments of term quality, synonym grouping, and relation extraction. The results suggest that machine learning-based BioMedKG development is a viable alternative to traditional expert curation.

Submitted 24 April, 2022; v1 submitted 18 March, 2022; originally announced March 2022.

arXiv:2108.07975 [pdf, other]

Unsupervised Image Generation with Infinite Generative Adversarial Networks

Authors: Hui Ying, He Wang, Tianjia Shao, Yin Yang, Kun Zhou

Abstract: Image generation has been heavily investigated in computer vision, where one core research challenge is to generate images from arbitrarily complex distributions with little supervision. Generative Adversarial Networks (GANs) as an implicit approach have achieved great successes in this direction and therefore been employed widely. However, GANs are known to suffer from issues such as mode collapse, non-structured latent space, being unable to compute likelihoods, etc. In this paper, we propose a new unsupervised non-parametric method named mixture of infinite conditional GANs or MIC-GANs, to tackle several GAN issues together, aiming for image generation with parsimonious prior knowledge. Through comprehensive evaluations across different datasets, we show that MIC-GANs are effective in structuring the latent space and avoiding mode collapse, and outperform state-of-the-art methods. MICGANs are adaptive, versatile, and robust. They offer a promising solution to several well-known GAN issues. Code available: github.com/yinghdb/MICGANs.

Submitted 18 August, 2021; originally announced August 2021.

Comments: 18 pages, 11 figures

arXiv:2104.08588 [pdf]

Sentence Alignment with Parallel Documents Facilitates Biomedical Machine Translation

Authors: Shengxuan Luo, Huaiyuan Ying, Jiao Li, Sheng Yu

Abstract: Objective: Today's neural machine translation (NMT) can achieve near human-level translation quality and greatly facilitates international communications, but the lack of parallel corpora poses a key problem to the development of translation systems for highly specialized domains, such as biomedicine. This work presents an unsupervised algorithm for deriving parallel corpora from document-level translations by using sentence alignment and explores how training materials affect the performance of biomedical NMT systems. Materials and Methods: Document-level translations are mixed to train bilingual word embeddings (BWEs) for the evaluation of cross-lingual word similarity, and sentence distance is defined by combining semantic and positional similarities of the sentences. The alignment of sentences is formulated as an extended earth mover's distance problem. A Chinese-English biomedical parallel corpus is derived with the proposed algorithm using bilingual articles from UpToDate and translations of PubMed abstracts, which is then used for the training and evaluation of NMT. Results: On two manually aligned translation datasets, the proposed algorithm achieved accurate sentence alignment in the 1-to-1 cases and outperformed competing algorithms in the many-to-many cases. The NMT model fine-tuned on biomedical data significantly improved the in-domain translation quality (zh-en: +17.72 BLEU; en-zh: +17.02 BLEU). Both the size of the training data and the combination of different corpora can significantly affect the model's performance. Conclusion: The proposed algorithm relaxes the assumption for sentence alignment and effectively generates accurate translation pairs that facilitate training high quality biomedical NMT models.

Submitted 7 February, 2022; v1 submitted 17 April, 2021; originally announced April 2021.

Comments: 16 pages, 5 figures

arXiv:2103.11083 [pdf, other]

doi 10.1109/JIOT.2021.3063497

Compacting Deep Neural Networks for Internet of Things: Methods and Applications

Authors: Ke Zhang, Hanbo Ying, Hong-Ning Dai, Lin Li, Yuangyuang Peng, Keyi Guo, Hongfang Yu

Abstract: Deep Neural Networks (DNNs) have shown great success in completing complex tasks. However, DNNs inevitably bring high computational cost and storage consumption due to the complexity of hierarchical structures, thereby hindering their wide deployment in Internet-of-Things (IoT) devices, which have limited computational capability and storage capacity. Therefore, it is a necessity to investigate the technologies to compact DNNs. Despite tremendous advances in compacting DNNs, few surveys summarize compacting-DNNs technologies, especially for IoT applications. Hence, this paper presents a comprehensive study on compacting-DNNs technologies. We categorize compacting-DNNs technologies into three major types: 1) network model compression, 2) Knowledge Distillation (KD), 3) modification of network structures. We also elaborate on the diversity of these approaches and make side-by-side comparisons. Moreover, we discuss the applications of compacted DNNs in various IoT applications and outline future directions.

Submitted 19 March, 2021; originally announced March 2021.

Comments: 25 pages, 11 figures

MSC Class: 68T07 ACM Class: I.2.6; C.2

Journal ref: IEEE Internet of Things Journal, 2021

arXiv:2102.05281 [pdf, other]

doi 10.1145/3490238

Biomedical Question Answering: A Survey of Approaches and Challenges

Authors: Qiao Jin, Zheng Yuan, Guangzhi Xiong, Qianlan Yu, Huaiyuan Ying, Chuanqi Tan, Mosha Chen, Songfang Huang, Xiaozhong Liu, Sheng Yu

Abstract: Automatic Question Answering (QA) has been successfully applied in various domains such as search engines and chatbots. Biomedical QA (BQA), as an emerging QA task, enables innovative applications to effectively perceive, access and understand complex biomedical knowledge. There have been tremendous developments of BQA in the past two decades, which we classify into 5 distinctive approaches: classic, information retrieval, machine reading comprehension, knowledge base and question entailment approaches. In this survey, we introduce available datasets and representative methods of each BQA approach in detail. Despite the developments, BQA systems are still immature and rarely used in real-life settings. We identify and characterize several key challenges in BQA that might lead to this issue, and discuss some potential future directions to explore.

Submitted 8 September, 2021; v1 submitted 10 February, 2021; originally announced February 2021.

Comments: In submission to ACM Computing Surveys

arXiv:2101.02969 [pdf, other]

doi 10.1145/3397271.3401090

Spatial Object Recommendation with Hints: When Spatial Granularity Matters

Authors: Hui Luo, Jingbo Zhou, Zhifeng Bao, Shuangli Li, J. Shane Culpepper, Haochao Ying, Hao Liu, Hui Xiong

Abstract: Existing spatial object recommendation algorithms generally treat objects identically when ranking them. However, spatial objects often cover different levels of spatial granularity and thereby are heterogeneous. For example, one user may prefer to be recommended a region (say Manhattan), while another user might prefer a venue (say a restaurant). Even for the same user, preferences can change at different stages of data exploration. In this paper, we study how to support top-k spatial object recommendations at varying levels of spatial granularity, enabling spatial objects at varying granularity, such as a city, suburb, or building, as a Point of Interest (POI). To solve this problem, we propose the use of a POI tree, which captures spatial containment relationships between POIs. We design a novel multi-task learning model called MPR (short for Multi-level POI Recommendation), where each task aims to return the top-k POIs at a certain spatial granularity level. Each task consists of two subtasks: (i) attribute-based representation learning; (ii) interaction-based representation learning. The first subtask learns the feature representations for both users and POIs, capturing attributes directly from their profiles. The second subtask incorporates user-POI interactions into the model. Additionally, MPR can provide insights into why certain recommendations are being made to a user based on three types of hints: user-aspect, POI-aspect, and interaction-aspect. We empirically validate our approach using two real-life datasets, and show promising performance improvements over several state-of-the-art methods.

Submitted 8 January, 2021; originally announced January 2021.

Journal ref: SIGIR Conference (2020) 781-790

arXiv:2003.04873 [pdf, other]

Moving Target Monte Carlo

Authors: Haoyun Ying, Keheng Mao, Klaus Mosegaard

Abstract: The Markov Chain Monte Carlo (MCMC) methods are popular when considering sampling from a high-dimensional random variable $\mathbf{x}$ with possibly unnormalised probability density $p$ and observed data $\mathbf{d}$. However, MCMC requires evaluating the posterior distribution $p(\mathbf{x}|\mathbf{d})$ of the proposed candidate $\mathbf{x}$ at each iteration when constructing the acceptance rate. This is costly when such evaluations are intractable. In this paper, we introduce a new non-Markovian sampling algorithm called Moving Target Monte Carlo (MTMC). The acceptance rate at $n$-th iteration is constructed using an iteratively updated approximation of the posterior distribution $a_n(\mathbf{x})$ instead of $p(\mathbf{x}|\mathbf{d})$. The true value of the posterior $p(\mathbf{x}|\mathbf{d})$ is only calculated if the candidate $\mathbf{x}$ is accepted. The approximation $a_n$ utilises these evaluations and converges to $p$ as $n \rightarrow \infty$. A proof of convergence and estimation of convergence rate in different situations are given.

Submitted 10 March, 2020; originally announced March 2020.

arXiv:1912.01954 [pdf, other]

EmbedMask: Embedding Coupling for One-stage Instance Segmentation

Authors: Hui Ying, Zhaojin Huang, Shu Liu, Tianjia Shao, Kun Zhou

Abstract: Current instance segmentation methods can be categorized into segmentation-based methods that segment first then do clustering, and proposal-based methods that detect first then predict masks for each instance proposal using repooling. In this work, we propose a one-stage method, named EmbedMask, that unifies both methods by taking advantages of them. Like proposal-based methods, EmbedMask builds on top of detection models making it strong in detection capability. Meanwhile, EmbedMask applies extra embedding modules to generate embeddings for pixels and proposals, where pixel embeddings are guided by proposal embeddings if they belong to the same instance. Through this embedding coupling process, pixels are assigned to the mask of the proposal if their embeddings are similar. The pixel-level clustering enables EmbedMask to generate high-resolution masks without missing details from repooling, and the existence of proposal embedding simplifies and strengthens the clustering procedure to achieve high speed with higher performance than segmentation-based methods. Without any bells and whistles, EmbedMask achieves comparable performance as Mask R-CNN, which is the representative two-stage method, and can produce more detailed masks at a higher speed. Code is available at github.com/yinghdb/EmbedMask.

Submitted 5 December, 2019; v1 submitted 4 December, 2019; originally announced December 2019.

Comments: Code is available at github.com/yinghdb/EmbedMask

arXiv:1811.07234 [pdf, other]

Improving Automatic Source Code Summarization via Deep Reinforcement Learning

Authors: Yao Wan, Zhou Zhao, Min Yang, Guandong Xu, Haochao Ying, Jian Wu, Philip S. Yu

Abstract: Code summarization provides a high level natural language description of the function performed by code, as it can benefit the software maintenance, code categorization and retrieval. To the best of our knowledge, most state-of-the-art approaches follow an encoder-decoder framework which encodes the code into a hidden space and then decode it into natural language space, suffering from two major drawbacks: a) Their encoders only consider the sequential content of code, ignoring the tree structure which is also critical for the task of code summarization, b) Their decoders are typically trained to predict the next word by maximizing the likelihood of next ground-truth word with previous ground-truth word given. However, it is expected to generate the entire sequence from scratch at test time. This discrepancy can cause an \textit{exposure bias} issue, making the learnt decoder suboptimal. In this paper, we incorporate an abstract syntax tree structure as well as sequential content of code snippets into a deep reinforcement learning framework (i.e., actor-critic network). The actor network provides the confidence of predicting the next word according to current state. On the other hand, the critic network evaluates the reward value of all possible extensions of the current state and can provide global guidance for explorations. We employ an advantage reward composed of BLEU metric to train both networks. Comprehensive experiments on a real-world dataset show the effectiveness of our proposed model when compared with some state-of-the-art methods.

Submitted 17 November, 2018; originally announced November 2018.

arXiv:1809.07672 [pdf, other]

Probing the Magnetodynamics of Magnetic Tunnel Junctions with the Aid of SiGe HBTs

Authors: Jason Dark, Hanbin Ying, Grant Nunn, John D. Cressler, Dragomir Davidovic

Abstract: High impedance (about 1 Megaohm) magnetic tunnel junctions (MTJs) are used to observe and record the magnetodynamics of the nanomagnets that form the junctions themselves. To counteract the bandwidth limitations caused by the high impedance of the junction and the parasitic capacitance intrinsic to any cryogenic system, silicon-germanium heterojunction bipolar transistors (SiGe HBTs) are used as cryogenic preamplifiers for the MTJs. The resulting measurement improvements include an increase in bandwidth by a factor of 3.89, an increase in signal-to-noise ratio by a factor of 6.62, and a gain of 7.75 of the TMR signal produced by the MTJ. The limitation to the measurement system was found to be from the external, room temperature electronics. Despite this limitation, these improvements allow for better time-resolved magnetodynamics measurements of the MTJs. These experiments pave the way for future cryogenic, magnetodynamics measurement improvements, and could even be useful in cryogenic memory applications.

Submitted 20 September, 2018; originally announced September 2018.

Comments: 9 pages, 8 figures

Showing 1–50 of 75 results for author: Ying, H