Search | arXiv e-print repository

doi 10.1145/3637528.3671787

Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning

Authors: Zhijie Nie, Richong Zhang, Zhangchi Feng, Hailang Huang, Xudong Liu

Abstract: Cross-lingual Cross-modal Retrieval (CCR) is an essential task in web search, which aims to break the barriers between modality and language simultaneously and achieve image-text retrieval in the multi-lingual scenario with a single model. In recent years, excellent progress has been made based on cross-lingual cross-modal pre-training; particularly, the methods based on contrastive learning on large-scale data have significantly improved retrieval tasks. However, these methods directly follow the existing pre-training methods in the cross-lingual or cross-modal domain, leading to two problems of inconsistency in CCR: The methods with cross-lingual style suffer from the intra-modal error propagation, resulting in inconsistent recall performance across languages in the whole dataset. The methods with cross-modal style suffer from the inter-modal optimization direction bias, resulting in inconsistent rank across languages within each instance, which cannot be reflected by Recall@K. To solve these problems, we propose a simple but effective 1-to-K contrastive learning method, which treats each language equally and eliminates error propagation and optimization bias. In addition, we propose a new evaluation metric, Mean Rank Variance (MRV), to reflect the rank inconsistency across languages within each instance. Extensive experiments on four CCR datasets show that our method improves both recall rates and MRV with smaller-scale pre-trained data, achieving the new state-of-the-art.
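The abstract names Mean Rank Variance (MRV) but does not give its formula. As a rough illustration only, the sketch below assumes MRV is the population variance of an instance's ranks across languages, averaged over all instances. The function name `mean_rank_variance` and the sample ranks are hypothetical and are not taken from the paper.

```python
from statistics import pvariance


def mean_rank_variance(ranks_per_instance):
    """Average, over instances, of the variance of ranks across languages.

    ranks_per_instance: list of lists. Each inner list holds the rank of the
    correct item when the same instance is queried in each language.
    This definition is an assumption, not the paper's stated formula.
    """
    if not ranks_per_instance:
        raise ValueError("need at least one instance")
    variances = [pvariance(ranks) for ranks in ranks_per_instance]
    return sum(variances) / len(variances)


# Hypothetical ranks for two instances queried in three languages.
sample_ranks = [
    [1, 1, 1],  # the same rank in every language: variance 0
    [1, 4, 7],  # ranks differ across languages: variance 6
]
print(mean_rank_variance(sample_ranks))  # 3.0
```

Under this assumed definition, a model can score well on Recall@K with a high cutoff while still showing a large MRV, which is the kind of per-instance inconsistency the abstract says Recall@K cannot reflect.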

Submitted 26 June, 2024; originally announced June 2024.

Comments: Accepted by KDD 2024 Research Track

arXiv:2406.17378 [pdf, other]

A Text is Worth Several Tokens: Text Embedding from LLMs Secretly Aligns Well with The Key Tokens

Authors: Zhijie Nie, Richong Zhang, Zhanyu Wu

Abstract: Text embeddings from large language models (LLMs) have achieved excellent results in tasks such as information retrieval, semantic textual similarity, etc. In this work, we show an interesting finding: when feeding a text into the embedding LLMs, the obtained text embedding will be able to be aligned with the key tokens in the input text. We first fully analyze this phenomenon on eight embedding LLMs and show that this phenomenon is universal and is not affected by model architecture, training strategy, or embedding method. With a deeper analysis, we then find that the main change in embedding space between the embedding LLMs and their original generative LLMs is in the first principal component. By adjusting the first principal component, we can align text embedding with the key tokens. Finally, we give several examples to demonstrate the vast application potential of this finding: (1) we propose a simple and practical sparse retrieval method based on the aligned tokens, which can achieve 80% of the dense retrieval effect of the same model while reducing the computation significantly; (2) we show that our findings provide a fresh perspective to help understand fuzzy concepts (e.g., semantic relatedness vs. semantic similarity) and emerging technologies (e.g., instruction-following embedding) in this field.

Submitted 25 June, 2024; originally announced June 2024.

Comments: Work in Progress

arXiv:2406.09841 [pdf, other]

Learning Multi-view Molecular Representations with Structured and Unstructured Knowledge

Authors: Yizhen Luo, Kai Yang, Massimo Hong, Xing Yi Liu, Zikun Nie, Hao Zhou, Zaiqing Nie

Abstract: Capturing molecular knowledge with representation learning approaches holds significant potential in vast scientific fields such as chemistry and life science. An effective and generalizable molecular representation is expected to capture the consensus and complementary molecular expertise from diverse views and perspectives. However, existing works fall short in learning multi-view molecular representations, due to challenges in explicitly incorporating view information and handling molecular knowledge from heterogeneous sources. To address these issues, we present MV-Mol, a molecular representation learning model that harvests multi-view molecular expertise from chemical structures, unstructured knowledge from biomedical texts, and structured knowledge from knowledge graphs. We utilize text prompts to model view information and design a fusion architecture to extract view-based molecular representations. We develop a two-stage pre-training procedure, exploiting heterogeneous data of varying quality and quantity. Through extensive experiments, we show that MV-Mol provides improved representations that substantially benefit molecular property prediction. Additionally, MV-Mol exhibits state-of-the-art performance in multi-modal comprehension of molecular structures and texts. Code and data are available at https://github.com/PharMolix/OpenBioMed.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 12 pages, 4 figures

arXiv:2405.15158 [pdf, other]

ProtFAD: Introducing function-aware domains as implicit modality towards protein function perception

Authors: Mingqing Wang, Zhiwei Nie, Yonghong He, Zhixiang Ren

Abstract: Protein function prediction is currently achieved by encoding its sequence or structure, where the sequence-to-function transcendence and high-quality structural data scarcity lead to obvious performance bottlenecks. Protein domains are "building blocks" of proteins that are functionally independent, and their combinations determine the diverse biological functions. However, most existing studies have yet to thoroughly explore the intricate functional information contained in the protein domains. To fill this gap, we propose a synergistic integration approach for a function-aware domain representation, and a domain-joint contrastive learning strategy to distinguish different protein functions while aligning the modalities. Specifically, we associate domains with the GO terms as function priors to pre-train domain embeddings. Furthermore, we partition proteins into multiple sub-views based on continuous joint domains for contrastive training under the supervision of a novel triplet InfoNCE loss. Our approach significantly and comprehensively outperforms the state-of-the-art methods on various benchmarks, and clearly differentiates proteins carrying distinct functions compared to the competitor.

Submitted 23 May, 2024; originally announced May 2024.

Comments: 16 pages, 6 figures, 5 tables

arXiv:2405.06708 [pdf, other]

LangCell: Language-Cell Pre-training for Cell Identity Understanding

Authors: Suyuan Zhao, Jiahuan Zhang, Yushuai Wu, Yizhen Luo, Zaiqing Nie

Abstract: Cell identity encompasses various semantic aspects of a cell, including cell type, pathway information, disease information, and more, which are essential for biologists to gain insights into its biological characteristics. Understanding cell identity from the transcriptomic data, such as annotating cell types, has become an important task in bioinformatics. As these semantic aspects are determined by human experts, it is impossible for AI models to effectively carry out cell identity understanding tasks without the supervision signals provided by single-cell and label pairs. The single-cell pre-trained language models (PLMs) currently used for this task are trained only on a single modality, transcriptomics data, and lack an understanding of cell identity knowledge. As a result, they have to be fine-tuned for downstream tasks and struggle when lacking labeled data with the desired semantic labels. To address this issue, we propose an innovative solution by constructing a unified representation of single-cell data and natural language during the pre-training phase, allowing the model to directly incorporate insights related to cell identity. More specifically, we introduce LangCell, the first Language-Cell pre-training framework. LangCell utilizes texts enriched with cell identity information to gain a profound comprehension of cross-modal knowledge. Results from experiments conducted on different benchmarks show that LangCell is the only single-cell PLM that can work effectively in zero-shot cell identity understanding scenarios, and also significantly outperforms existing models in few-shot and fine-tuning cell identity understanding scenarios.

Submitted 11 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: Accepted by ICML 2024, code released

arXiv:2404.11317 [pdf, other]

Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives

Authors: Zhangchi Feng, Richong Zhang, Zhijie Nie

Abstract: The Composed Image Retrieval (CIR) task aims to retrieve target images using a composed query consisting of a reference image and a modified text. Advanced methods often utilize contrastive learning as the optimization objective, which benefits from adequate positive and negative examples. However, the triplet for CIR incurs high manual annotation costs, resulting in limited positive examples. Furthermore, existing methods commonly use in-batch negative sampling, which reduces the negative number available for the model. To address the problem of lack of positives, we propose a data generation method by leveraging a multi-modal large language model to construct triplets for CIR. To introduce more negatives during fine-tuning, we design a two-stage fine-tuning framework for CIR, whose second stage introduces plenty of static representations of negatives to optimize the representation space rapidly. The above two improvements can be effectively stacked and designed to be plug-and-play, easily applied to existing CIR models without changing their original architectures. Extensive experiments and ablation analysis demonstrate that our method effectively scales positives and negatives and achieves state-of-the-art results on both FashionIQ and CIRR datasets. In addition, our methods also perform well in zero-shot composed image retrieval, providing a new CIR solution for the low-resources scenario.

Submitted 17 April, 2024; originally announced April 2024.

Comments: 12 pages, 11 figures

arXiv:2404.09729 [pdf]

Amplitude-Phase Fusion for Enhanced Electrocardiogram Morphological Analysis

Authors: Shuaicong Hu, Yanan Wang, Jian Liu, **gyu Lin, Shengmei Qin, Zhenning Nie, Zhifeng Yao, Wenjie Cai, Cuiwei Yang

Abstract: Considering the variability of amplitude and phase patterns in electrocardiogram (ECG) signals due to cardiac activity and individual differences, existing entropy-based studies have not fully utilized these two patterns and lack integration. To address this gap, this paper proposes a novel fusion entropy metric, morphological ECG entropy (MEE) for the first time, specifically designed for ECG morphology, to comprehensively describe the fusion of amplitude and phase patterns. MEE is computed based on beat-level samples, enabling detailed analysis of each cardiac cycle. Experimental results demonstrate that MEE achieves rapid, accurate, and label-free localization of abnormal ECG arrhythmia regions. Furthermore, MEE provides a method for assessing sample diversity, facilitating compression of imbalanced training sets (via representative sample selection), and outperforms random pruning. Additionally, MEE exhibits the ability to describe areas of poor quality. Further discussion shows that the MEE value calculation is robust to noise interference and has low computational complexity. Finally, we integrate this method into a clinical interactive interface to provide a more convenient and intuitive user experience. These findings indicate that MEE serves as a valuable clinical descriptor for ECG characterization. The implementation code can be referenced at the following link: https://github.com/fdu-harry/ECG-MEE-metric.

Submitted 15 April, 2024; originally announced April 2024.

Comments: 16 pages, 12 figures

ACM Class: I.5.2

arXiv:2404.04395 [pdf, ps, other]

A Critique of Du's "A Polynomial-Time Algorithm for 3-SAT"

Authors: Yumeng He, Matan Kotler-Berkowitz, Harry Liuson, Zeyu Nie

Abstract: In this paper, we examine the claims made by the paper "A polynomial-time algorithm for 3-SAT" by Lizhi Du. The paper claims to provide a polynomial-time algorithm for solving the NP-complete problem 3-SAT. In examining the paper's argument, we find a flaw in one of the main sections of its algorithm. We argue that this flaw causes the paper's algorithm to incorrectly decide that an infinite family of satisfiable 3-CNF boolean formulas are not satisfiable. Therefore, the paper does not establish that P = NP.

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.00717 [pdf, other]

End-to-End Autonomous Driving through V2X Cooperation

Authors: Haibao Yu, Wenxian Yang, Jiaru Zhong, Zhenwei Yang, Siqi Fan, Ping Luo, Zaiqing Nie

Abstract: Cooperatively utilizing both ego-vehicle and infrastructure sensor data via V2X communication has emerged as a promising approach for advanced autonomous driving. However, current research mainly focuses on improving individual modules, rather than taking end-to-end learning to optimize final planning performance, resulting in underutilized data potential. In this paper, we introduce UniV2X, a pioneering cooperative autonomous driving framework that seamlessly integrates all key driving modules across diverse views into a unified network. We propose a sparse-dense hybrid data transmission and fusion mechanism for effective vehicle-infrastructure cooperation, offering three advantages: 1) Effective for simultaneously enhancing agent perception, online mapping, and occupancy prediction, ultimately improving planning performance. 2) Transmission-friendly for practical and limited communication conditions. 3) Reliable data fusion with interpretability of this hybrid data. We implement UniV2X, as well as reproducing several benchmark methods, on the challenging DAIR-V2X, the real-world cooperative driving dataset. Experimental results demonstrate the effectiveness of UniV2X in significantly enhancing planning performance, as well as all intermediate output performance. Code is at https://github.com/AIR-THU/UniV2X.

Submitted 19 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

arXiv:2403.12995 [pdf, other]

ESM All-Atom: Multi-scale Protein Language Model for Unified Molecular Modeling

Authors: Kangjie Zheng, Siyu Long, Tianyu Lu, Junwei Yang, Xinyu Dai, Ming Zhang, Zaiqing Nie, Wei-Ying Ma, Hao Zhou

Abstract: Protein language models have demonstrated significant potential in the field of protein engineering. However, current protein language models primarily operate at the residue scale, which limits their ability to provide information at the atom level. This limitation prevents us from fully exploiting the capabilities of protein language models for applications involving both proteins and small molecules. In this paper, we propose ESM-AA (ESM All-Atom), a novel approach that enables atom-scale and residue-scale unified molecular modeling. ESM-AA achieves this by pre-training on multi-scale code-switch protein sequences and utilizing a multi-scale position encoding to capture relationships among residues and atoms. Experimental results indicate that ESM-AA surpasses previous methods in protein-molecule tasks, demonstrating the full utilization of protein language models. Further investigations reveal that through unified molecular modeling, ESM-AA not only gains molecular knowledge but also retains its understanding of proteins. The source codes of ESM-AA are publicly released at https://github.com/zhengkangjie/ESM-AA.

Submitted 12 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

Comments: ICML2024 camera-ready, update some experimental results, add github url, fix some typos

arXiv:2403.10145 [pdf, other]

RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception

Authors: Ruiyang Hao, Siqi Fan, Yingru Dai, Zhenlin Zhang, Chenxi Li, Yuntian Wang, Haibao Yu, Wenxian Yang, Jirui Yuan, Zaiqing Nie

Abstract: The value of roadside perception, which could extend the boundaries of autonomous driving and traffic management, has gradually become more prominent and acknowledged in recent years. However, existing roadside perception approaches only focus on the single-infrastructure sensor system, which cannot realize a comprehensive understanding of a traffic area because of the limited sensing range and blind spots. Orienting high-quality roadside perception, we need Roadside Cooperative Perception (RCooper) to achieve practical area-coverage roadside perception for restricted traffic areas. RCooper has its own domain-specific challenges, but further exploration is hindered by the lack of datasets. We hence release the first real-world, large-scale RCooper dataset to advance research on practical roadside cooperative perception, including detection and tracking. The manually annotated dataset comprises 50k images and 30k point clouds, including two representative traffic scenes (i.e., intersection and corridor). The constructed benchmarks prove the effectiveness of roadside cooperation perception and demonstrate the direction of further research. Codes and dataset can be accessed at: https://github.com/AIR-THU/DAIR-RCooper.

Submitted 31 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR2024. 10 pages with 6 figures

ACM Class: I.4.8; I.5.4

arXiv:2403.05261 [pdf, other]

doi 10.1609/aaai.v38i16.29789

Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval

Authors: Hailang Huang, Zhijie Nie, Ziqiao Wang, Ziyu Shang

Abstract: Current image-text retrieval methods have demonstrated impressive performance in recent years. However, they still face two problems: the inter-modal matching missing problem and the intra-modal semantic loss problem. These problems can significantly affect the accuracy of image-text retrieval. To address these challenges, we propose a novel method called Cross-modal and Uni-modal Soft-label Alignment (CUSA). Our method leverages the power of uni-modal pre-trained models to provide soft-label supervision signals for the image-text retrieval model. Additionally, we introduce two alignment techniques, Cross-modal Soft-label Alignment (CSA) and Uni-modal Soft-label Alignment (USA), to overcome false negatives and enhance similarity recognition between uni-modal samples. Our method is designed to be plug-and-play, meaning it can be easily applied to existing image-text retrieval models without changing their original architectures. Through extensive experiments on various image-text retrieval models and datasets, we demonstrate that our method can consistently improve the performance of image-text retrieval and achieve new state-of-the-art results. Furthermore, our method can also boost the uni-modal retrieval performance of image-text retrieval models, enabling it to achieve universal retrieval. The code and supplementary files can be found at https://github.com/lerogo/aaai24_itr_cusa.

Submitted 8 March, 2024; originally announced March 2024.

Comments: 9 pages, Accepted by AAAI2024

arXiv:2403.03768 [pdf, other]

DeepCRE: Transforming Drug R&D via AI-Driven Cross-drug Response Evaluation

Authors: Yushuai Wu, Ting Zhang, Hao Zhou, Hainan Wu, Hanwen Sunchu, Lei Hu, Xiaofang Chen, Suyuan Zhao, Gaochao Liu, Chao Sun, Jiahuan Zhang, Yizhen Luo, Peng Liu, Zaiqing Nie, Yushuai Wu

Abstract: The fields of therapeutic application and drug research and development (R&D) both face substantial challenges, i.e., the therapeutic domain calls for more treatment alternatives, while numerous promising pre-clinical drugs have failed in clinical trials. One of the reasons is the inadequacy of Cross-drug Response Evaluation (CRE) during the late stages of drug R&D. Although in-silico CRE models bring a promising solution, existing methodologies are restricted to early stages of drug R&D, such as target and cell-line levels, offering limited improvement to clinical success rates. Herein, we introduce DeepCRE, a pioneering AI model designed to predict CRE effectively in the late stages of drug R&D. DeepCRE outperforms the existing best models by achieving an average performance improvement of 17.7% in patient-level CRE, and a 5-fold increase in indication-level CRE, facilitating more accurate personalized treatment predictions and better pharmaceutical value assessment for indications, respectively. Furthermore, DeepCRE has identified a set of six drug candidates that show significantly greater effectiveness than a comparator set of two approved drugs in 5/8 colorectal cancer organoids. This demonstrates the capability of DeepCRE to systematically uncover a spectrum of drug candidates with enhanced therapeutic effects, highlighting its potential to transform drug R&D.

Submitted 18 March, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

arXiv:2402.18281 [pdf, other]

Towards Better Understanding of Contrastive Sentence Representation Learning: A Unified Paradigm for Gradient

Authors: Mingxin Li, Richong Zhang, Zhijie Nie

Abstract: Sentence Representation Learning (SRL) is a crucial task in Natural Language Processing (NLP), where contrastive Self-Supervised Learning (SSL) is currently a mainstream approach. However, the reasons behind its remarkable effectiveness remain unclear. Specifically, many studies have investigated the similarities between contrastive and non-contrastive SSL from a theoretical perspective. Such similarities can be verified in classification tasks, where the two approaches achieve comparable performance. But in ranking tasks (i.e., Semantic Textual Similarity (STS) in SRL), contrastive SSL significantly outperforms non-contrastive SSL. Therefore, two questions arise: First, *what commonalities enable various contrastive losses to achieve superior performance in STS?* Second, *how can we make non-contrastive SSL also effective in STS?* To address these questions, we start from the perspective of gradients and discover that four effective contrastive losses can be integrated into a unified paradigm, which depends on three components: the **Gradient Dissipation**, the **Weight**, and the **Ratio**. Then, we conduct an in-depth analysis of the roles these components play in optimization and experimentally demonstrate their significance for model performance. Finally, by adjusting these components, we enable non-contrastive SSL to achieve outstanding performance in STS.

Submitted 5 June, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: Accepted at ACL 2024 Main Conference

arXiv:2312.06706 [pdf, other]

UNeR3D: Versatile and Scalable 3D RGB Point Cloud Generation from 2D Images in Unsupervised Reconstruction

Authors: Hongbin Lin, Juangui Xu, Qingfeng Xu, Zhengyu Hu, Handing Xu, Yunzhi Chen, Yongjun Hu, Zhenguo Nie

Abstract: In the realm of 3D reconstruction from 2D images, a persisting challenge is to achieve high-precision reconstructions devoid of 3D Ground Truth data reliance. We present UNeR3D, a pioneering unsupervised methodology that sets a new standard for generating detailed 3D reconstructions solely from 2D views. Our model significantly cuts down the training costs tied to supervised approaches and introduces RGB coloration to 3D point clouds, enriching the visual experience. Employing an inverse distance weighting technique for color rendering, UNeR3D ensures seamless color transitions, enhancing visual fidelity. Our model's flexible architecture supports training with any number of views, and uniquely, it is not constrained by the number of views used during training when performing reconstructions. It can infer with an arbitrary count of views during inference, offering unparalleled versatility. Additionally, the model's continuous spatial input domain allows the generation of point clouds at any desired resolution, empowering the creation of high-resolution 3D RGB point clouds. We solidify the reconstruction process with a novel multi-view geometric loss and color loss, demonstrating that our model excels with single-view inputs and beyond, thus reshaping the paradigm of unsupervised learning in 3D vision. Our contributions signal a substantial leap forward in 3D vision, offering new horizons for content creation across diverse applications. Code is available at https://github.com/HongbinLin3589/UNeR3D.
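The abstract mentions inverse distance weighting (IDW) for color rendering without describing how it is applied. The sketch below shows generic IDW interpolation of RGB values at a 3D query point, where each known point contributes with weight 1 / distance^power. It is not taken from the UNeR3D code: the function name `idw_color`, the `power` and `eps` parameters, and the sample points are all hypothetical.

```python
import math


def idw_color(query, points, colors, power=2.0, eps=1e-8):
    """Interpolate an RGB color at `query` from colored 3D points.

    Generic inverse distance weighting; an illustration, not the paper's
    rendering pipeline. `points` and `colors` are parallel sequences of
    (x, y, z) and (r, g, b) tuples.
    """
    if not points or len(points) != len(colors):
        raise ValueError("points and colors must be non-empty and equal length")
    weights = []
    for index, point in enumerate(points):
        distance = math.dist(query, point)
        if distance < eps:
            # The query coincides with a known point: use its color directly.
            return tuple(colors[index])
        weights.append(1.0 / distance ** power)
    total = sum(weights)
    return tuple(
        sum(w * c[channel] for w, c in zip(weights, colors)) / total
        for channel in range(3)
    )


# Hypothetical example: a red point and a blue point, queried at the midpoint.
pts = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
cols = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
print(idw_color((1.0, 0.0, 0.0), pts, cols))  # (0.5, 0.0, 0.5)
```

Because the weights vary continuously with distance, the interpolated color changes smoothly as the query point moves between samples, which is consistent with the seamless color transitions the abstract describes.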

Submitted 10 December, 2023; originally announced December 2023.

Comments: 17 pages

arXiv:2312.02071 [pdf, ps, other]

Evaluating the Claims of "SAT Requires Exhaustive Search"

Authors: Michael C. Chavrimootoo, Yumeng He, Matan Kotler-Berkowitz, Harry Liuson, Zeyu Nie

Abstract: In this paper, we take a closer look at the claims made by Xu and Zhou in their paper "SAT Requires Exhaustive Search" [XZ23], which claims to provide a lower bound on the complexity of the so-called Model RB. Xu and Zhou conclude that their result implies a separation between P and NP, since the lower bound purportedly proves that the Strong Exponential Time Hypothesis (SETH) is true. In examining Xu and Zhou's arguments, we find a flaw in their main theorems. The authors assume that an algorithm for Model RB must have a certain structure that can leverage downward self-reducibility, and argue that such an algorithm cannot run in polynomial time. We argue that this structure is not guaranteed to exist and thus their paper neither proves SETH to be true nor proves P $\neq$ NP.

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2311.01682 [pdf, other]

Flow-Based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection

Authors: Haibao Yu, Yingjuan Tang, Enze Xie, Jilei Mao, Ping Luo, Zaiqing Nie

Abstract: Cooperatively utilizing both ego-vehicle and infrastructure sensor data can significantly enhance autonomous driving perception abilities. However, the uncertain temporal asynchrony and limited communication conditions can lead to fusion misalignment and constrain the exploitation of infrastructure data. To address these issues in vehicle-infrastructure cooperative 3D (VIC3D) object detection, we propose the Feature Flow Net (FFNet), a novel cooperative detection framework. FFNet is a flow-based feature fusion framework that uses a feature flow prediction module to predict future features and compensate for asynchrony. Instead of transmitting feature maps extracted from still-images, FFNet transmits feature flow, leveraging the temporal coherence of sequential infrastructure frames. Furthermore, we introduce a self-supervised training approach that enables FFNet to generate feature flow with feature prediction ability from raw infrastructure sequences. Experimental results demonstrate that our proposed method outperforms existing cooperative detection methods while only requiring about 1/100 of the transmission cost of raw data and covers all latency in one model on the DAIR-V2X dataset. The code is available at https://github.com/haibao-yu/FFNet-VIC3D.

Submitted 2 November, 2023; originally announced November 2023.

Comments: Accepted by NeurIPS 2023. arXiv admin note: text overlap with arXiv:2303.10552

arXiv:2311.00371 [pdf, other]

Learning Cooperative Trajectory Representations for Motion Forecasting

Authors: Hongzhi Ruan, Haibao Yu, Wenxian Yang, Siqi Fan, Yingjuan Tang, Zaiqing Nie

Abstract: Motion forecasting is an essential task for autonomous driving, and the effective information utilization from infrastructure and other vehicles can enhance motion forecasting capabilities. Existing research has primarily focused on leveraging single-frame cooperative information to enhance the limited perception capability of the ego vehicle, while underutilizing the motion and interaction information of traffic participants observed from cooperative devices. In this paper, we first propose the cooperative trajectory representations learning paradigm. Specifically, we present V2X-Graph, the first interpretable and end-to-end learning framework for cooperative motion forecasting. V2X-Graph employs an interpretable graph to fully leverage the cooperative motion and interaction contexts. Experimental results on the vehicle-to-infrastructure (V2I) motion forecasting dataset, V2X-Seq, demonstrate the effectiveness of V2X-Graph. To further evaluate on V2X scenario, we construct the first real-world vehicle-to-everything (V2X) motion forecasting dataset V2X-Traj, and the performance shows the advantage of our method. We hope both V2X-Graph and V2X-Traj can facilitate the further development of cooperative motion forecasting. Find project at https://github.com/AIR-THU/V2X-Graph, find data at https://github.com/AIR-THU/DAIR-V2X-Seq.

Submitted 1 November, 2023; originally announced November 2023.

arXiv:2309.06453 [pdf, other]

Narrowing the Gap between Supervised and Unsupervised Sentence Representation Learning with Large Language Model

Authors: Mingxin Li, Richong Zhang, Zhijie Nie, Yongyi Mao

Abstract: Sentence Representation Learning (SRL) is a fundamental task in Natural Language Processing (NLP), with the Contrastive Learning of Sentence Embeddings (CSE) being the mainstream technique due to its superior performance. An intriguing phenomenon in CSE is the significant performance gap between supervised and unsupervised methods, with their only difference lying in the training data. Previous works attribute this performance gap to differences in two representation properties (alignment and uniformity). However, since alignment and uniformity only measure the results, they fail to answer "What aspects of the training data contribute to the performance gap?" and "How can the performance gap be narrowed?" In this paper, we conduct empirical experiments to answer these "What" and "How" questions. We first answer the "What" question by thoroughly comparing the behavior of supervised and unsupervised CSE during their respective training processes. From the comparison, we identify the similarity pattern as a key factor to the performance gap, and introduce a metric, called Relative Fitting Difficulty (RFD), to measure the complexity of the similarity pattern. Then, based on the insights gained from the "What" question, we tackle the "How" question by increasing the pattern complexity of the training data. We achieve this by leveraging the In-Context Learning (ICL) capability of the Large Language Model (LLM) to generate data that simulates complex patterns. By utilizing the hierarchical patterns in the LLM-generated data, we effectively narrow the gap between supervised and unsupervised CSE. We release our codes and appendix at https://github.com/BDBC-KG-NLP/NGCSE.

Submitted 19 December, 2023; v1 submitted 12 September, 2023; originally announced September 2023.

Comments: Accepted at AAAI24

arXiv:2309.04695 [pdf, other]

Code-Style In-Context Learning for Knowledge-Based Question Answering

Authors: Zhijie Nie, Richong Zhang, Zhongyuan Wang, Xudong Liu

Abstract: Current methods for Knowledge-Based Question Answering (KBQA) usually rely on complex training techniques and model frameworks, leading to many limitations in practical applications. Recently, the emergence of In-Context Learning (ICL) capabilities in Large Language Models (LLMs) provides a simple and training-free semantic parsing paradigm for KBQA: Given a small number of questions and their labeled logical forms as demo examples, LLMs can understand the task intent and generate the logic form for a new question. However, current powerful LLMs have little exposure to logic forms during pre-training, resulting in a high format error rate. To solve this problem, we propose a code-style in-context learning method for KBQA, which converts the generation process of unfamiliar logical form into the more familiar code generation process for LLMs. Experimental results on three mainstream datasets show that our method dramatically mitigated the formatting error problem in generating logic forms while realizing a new SOTA on WebQSP, GrailQA, and GraphQ under the few-shot setting. The code and supplementary files are released at https://github.com/Arthurizijar/KB-Coder.

Submitted 5 January, 2024; v1 submitted 9 September, 2023; originally announced September 2023.

Comments: AAAI2024 Camera Ready

arXiv:2308.09442 [pdf, other]

BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine

Authors: Yizhen Luo, Jiahuan Zhang, Siqi Fan, Kai Yang, Yushuai Wu, Mu Qiao, Zaiqing Nie

Abstract: Foundation models (FMs) have exhibited remarkable performance across a wide range of downstream tasks in many domains. Nevertheless, general-purpose FMs often face challenges when confronted with domain-specific problems, due to their limited access to the proprietary training data in a particular domain. In biomedicine, there are various biological modalities, such as molecules, proteins, and cells, which are encoded by the language of life and exhibit significant modality gaps with human natural language. In this paper, we introduce BioMedGPT, an open multimodal generative pre-trained transformer (GPT) for biomedicine, to bridge the gap between the language of life and human natural language. BioMedGPT allows users to easily "communicate" with diverse biological modalities through free text, which is the first of its kind. BioMedGPT aligns different biological modalities with natural language via a large generative language model, namely, BioMedGPT-LM. We publish BioMedGPT-10B, which unifies the feature spaces of molecules, proteins, and natural language via encoding and alignment. Through fine-tuning, BioMedGPT-10B outperforms or is on par with human and significantly larger general-purpose foundation models on the biomedical QA task. It also demonstrates promising performance in the molecule QA and protein QA tasks, which could greatly accelerate the discovery of new drugs and therapeutic targets. In addition, BioMedGPT-LM-7B is the first large generative language model based on Llama2 in the biomedical domain, therefore is commercial friendly. Both BioMedGPT-10B and BioMedGPT-LM-7B are open-sourced to the research community. In addition, we publish the datasets that are meticulously curated for the alignment of multi-modalities, i.e., PubChemQA and UniProtQA. All the models, codes, and datasets are available at https://github.com/PharMolix/OpenBioMed.

Submitted 21 August, 2023; v1 submitted 18 August, 2023; originally announced August 2023.

Comments: 12 pages, 4 figures

arXiv:2308.09021 [pdf, ps, other]

Simpler Analyses of Union-Find

Authors: Zhiyi Huang, Chris Lambert, Zipei Nie, Richard Peng

Abstract: We analyze union-find using potential functions motivated by continuous algorithms, and give alternate proofs of the $O(\log\log{n})$, $O(\log^{*}n)$, $O(\log^{**}n)$, and $O(α(n))$ amortized cost upper bounds. The proof of the $O(\log\log{n})$ amortized bound goes as follows. Let each node's potential be the square root of its size, i.e., the size of the subtree rooted from it. The overall potential increase is $O(n)$ because the node sizes increase geometrically along any tree path. When compressing a path, each node on the path satisfies that either its potential decreases by $Ω(1)$, or its child's size along the path is less than the square root of its size: this can happen at most $O(\log\log{n})$ times along any tree path.

Submitted 17 August, 2023; originally announced August 2023.

Comments: 13 pages, 1 figure
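
For readers unfamiliar with the data structure this abstract analyzes, the following is a minimal Python sketch of union-find with path compression. It is illustrative only and is not taken from the paper: the class name `UnionFind`, the choice of linking by size, and the `size` array (which is kept accurate only at roots) are all assumptions made here.

```python
class UnionFind:
    """Disjoint-set forest with path compression (illustrative sketch)."""

    def __init__(self, n):
        # Every element starts as the root of its own one-node tree.
        self.parent = list(range(n))
        # size[r] is the number of nodes in the tree rooted at r.
        # Assumption: it is only maintained for current roots.
        self.size = [1] * n

    def find(self, x):
        # First pass: walk up to the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: path compression, re-pointing each node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        # Link the smaller tree under the larger one (assumed linking rule).
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True
```

The second loop in `find` is the path compression step whose amortized cost the abstract bounds; the potential-function argument itself is a proof technique and is not reproduced in the code.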

arXiv:2308.01804 [pdf, other]

QUEST: Query Stream for Practical Cooperative Perception

Authors: Siqi Fan, Haibao Yu, Wenxian Yang, Jirui Yuan, Zaiqing Nie

Abstract: Cooperative perception can effectively enhance individual perception performance by providing additional viewpoint and expanding the sensing field. Existing cooperation paradigms are either interpretable (result cooperation) or flexible (feature cooperation). In this paper, we propose the concept of query cooperation to enable interpretable instance-level flexible feature interaction. To specifically explain the concept, we propose a cooperative perception framework, termed QUEST, which lets the query stream flow among agents. The cross-agent queries are interacted via fusion for co-aware instances and complementation for individual unaware instances. Taking camera-based vehicle-infrastructure perception as a typical practical application scene, the experimental results on the real-world dataset, DAIR-V2X-Seq, demonstrate the effectiveness of QUEST and further reveal the advantage of the query cooperation paradigm on transmission flexibility and robustness to packet dropout. We hope our work can further facilitate the cross-agent representation interaction for better cooperative perception in practice.

Submitted 22 May, 2024; v1 submitted 3 August, 2023; originally announced August 2023.

Comments: ICRA 2024

arXiv:2307.12213 [pdf, other]

LiveRetro: Visual Analytics for Strategic Retrospect in Livestream E-Commerce

Authors: Yuchen Wu, Yuansong Xu, Shenghan Gao, Xingbo Wang, Wenkai Song, Zhiheng Nie, Xiaomeng Fan, Quan Li

Abstract: Livestream e-commerce integrates live streaming and online shopping, allowing viewers to make purchases while watching. However, effective marketing strategies remain a challenge due to limited empirical research and subjective biases from the absence of quantitative data. Current tools fail to capture the interdependence between live performances and feedback. This study identified computational features, formulated design requirements, and developed LiveRetro, an interactive visual analytics system. It enables comprehensive retrospective analysis of livestream e-commerce for streamers, viewers, and merchandise. LiveRetro employs enhanced visualization and time-series forecasting models to align performance features and feedback, identifying influences at channel, merchandise, feature, and segment levels. Through case studies and expert interviews, the system provides deep insights into the relationship between live performance and streaming statistics, enabling efficient strategic analysis from multiple perspectives.

Submitted 2 August, 2023; v1 submitted 22 July, 2023; originally announced July 2023.

Comments: Accepted by IEEE VIS 2023

arXiv:2307.09484 [pdf, other]

MolFM: A Multimodal Molecular Foundation Model

Authors: Yizhen Luo, Kai Yang, Massimo Hong, Xing Yi Liu, Zaiqing Nie

Abstract: Molecular knowledge resides within three different modalities of information sources: molecular structures, biomedical documents, and knowledge bases. Effective incorporation of molecular knowledge from these modalities holds paramount significance in facilitating biomedical research. However, existing multimodal molecular foundation models exhibit limitations in capturing intricate connections between molecular structures and texts, and more importantly, none of them attempt to leverage a wealth of molecular expertise derived from knowledge graphs. In this study, we introduce MolFM, a multimodal molecular foundation model designed to facilitate joint representation learning from molecular structures, biomedical texts, and knowledge graphs. We propose cross-modal attention between atoms of molecular structures, neighbors of molecule entities and semantically related texts to facilitate cross-modal comprehension. We provide theoretical analysis that our cross-modal pre-training captures local and global molecular knowledge by minimizing the distance in the feature space between different modalities of the same molecule, as well as molecules sharing similar structures or functions. MolFM achieves state-of-the-art performance on various downstream tasks. On cross-modal retrieval, MolFM outperforms existing models with 12.13% and 5.04% absolute gains under the zero-shot and fine-tuning settings, respectively. Furthermore, qualitative analysis showcases MolFM's implicit ability to provide grounding from molecular substructures and knowledge graphs. Code and models are available on https://github.com/BioFM/OpenBioMed.

Submitted 21 July, 2023; v1 submitted 6 June, 2023; originally announced July 2023.

Comments: 31 pages, 15 figures, and 15 tables

arXiv:2306.04371 [pdf, other]

Large-Scale Cell Representation Learning via Divide-and-Conquer Contrastive Learning

Authors: Suyuan Zhao, Jiahuan Zhang, Zaiqing Nie

Abstract: Single-cell RNA sequencing (scRNA-seq) data is a potent tool for comprehending the "language of life" and can provide insights into various downstream biomedical tasks. Large-scale language models (LLMs) are starting to be used for cell representation learning. However, current LLM-based cell representation learning methods depend solely on the BERT architecture, causing an anisotropic embedding space that leads to inefficient semantic representation. Contrastive learning alleviates this problem by distributing the embeddings uniformly. As a larger batch size in contrastive learning results in better representation, the practical application of contrastive learning in cell representation learning is hampered by the high dimensionality of scRNA-seq data and the large parameter volume of LLMs. To address the batch size limitation, we propose a novel divide-and-conquer contrastive learning approach to decouple the batch size from the GPU memory size for cell representation learning. Based on our divide-and-conquer contrastive learning approach, we introduce Single-Cell Language Model CellLM, a large-scale cell representation learning model to handle high-dimensional scRNA-seq data with tens of thousands of genes. CellLM has over 50 million parameters trained with 2 million scRNA-seq data and makes the first attempt to learn cell language models from both normal cells and cancer cells. CellLM achieves new state-of-the-art (SOTA) results in all evaluated downstream tasks: including a 71.8 F_1-score for cell type annotation (a 3.0% absolute improvement over scBERT), an average F_1-score of 88.9 for single-cell drug sensitivity prediction in a few-shot scenario (an 8.3% absolute improvement), and a 93.4 Pearson's correlation for single-omics cell line drug sensitivity prediction (a 6.2% absolute improvement).

Submitted 7 June, 2023; originally announced June 2023.

arXiv:2305.05938 [pdf, other]

V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting

Authors: Haibao Yu, Wenxian Yang, Hongzhi Ruan, Zhenwei Yang, Yingjuan Tang, Xu Gao, Xin Hao, Yifeng Shi, Yifeng Pan, Ning Sun, Juan Song, Jirui Yuan, Ping Luo, Zaiqing Nie

Abstract: Utilizing infrastructure and vehicle-side information to track and forecast the behaviors of surrounding traffic participants can significantly improve decision-making and safety in autonomous driving. However, the lack of real-world sequential datasets limits research in this area. To address this issue, we introduce V2X-Seq, the first large-scale sequential V2X dataset, which includes data frames, trajectories, vector maps, and traffic lights captured from natural scenery. V2X-Seq comprises two parts: the sequential perception dataset, which includes more than 15,000 frames captured from 95 scenarios, and the trajectory forecasting dataset, which contains about 80,000 infrastructure-view scenarios, 80,000 vehicle-view scenarios, and 50,000 cooperative-view scenarios captured from 28 intersections' areas, covering 672 hours of data. Based on V2X-Seq, we introduce three new tasks for vehicle-infrastructure cooperative (VIC) autonomous driving: VIC3D Tracking, Online-VIC Forecasting, and Offline-VIC Forecasting. We also provide benchmarks for the introduced tasks. Find data, code, and more up-to-date information at https://github.com/AIR-THU/DAIR-V2X-Seq.

Submitted 10 May, 2023; originally announced May 2023.

Comments: CVPR2023

arXiv:2305.01523 [pdf, other]

Towards Unified AI Drug Discovery with Multiple Knowledge Modalities

Authors: Yizhen Luo, Xing Yi Liu, Kai Yang, Kui Huang, Massimo Hong, Jiahuan Zhang, Yushuai Wu, Zaiqing Nie

Abstract: In recent years, AI models that mine intrinsic patterns from molecular structures and protein sequences have shown promise in accelerating drug discovery. However, these methods partly lag behind real-world pharmaceutical approaches of human experts that additionally grasp structured knowledge from knowledge bases and unstructured knowledge from biomedical literature. To bridge this gap, we propose KEDD, a unified, end-to-end, and multimodal deep learning framework that optimally incorporates both structured and unstructured knowledge for vast AI drug discovery tasks. The framework first extracts underlying characteristics from heterogeneous inputs, and then applies multimodal fusion for accurate prediction. To mitigate the problem of missing modalities, we leverage multi-head sparse attention and a modality masking mechanism to extract relevant information robustly. Benefiting from integrated knowledge, our framework achieves a deeper understanding of molecule entities, brings significant improvements over state-of-the-art methods on a wide range of tasks and benchmarks, and reveals its promising potential in assisting real-world drug discovery.

Submitted 14 October, 2023; v1 submitted 17 April, 2023; originally announced May 2023.

Comments: 10 pages, 6 figures

arXiv:2304.11281 [pdf, other]

Euclidean Capacitated Vehicle Routing in Random Setting: A $1.55$-Approximation Algorithm

Authors: Zipei Nie, Hang Zhou

Abstract: We study the unit-demand capacitated vehicle routing problem in the random setting of the Euclidean plane. The objective is to visit $n$ random terminals in a square using a set of tours of minimum total length, such that each tour visits the depot and at most $k$ terminals. We design an elegant algorithm combining the classical sweep heuristic and Arora's framework for the Euclidean traveling salesman problem [Journal of the ACM 1998]. We show that our algorithm is a polynomial-time approximation of ratio at most $1.55$ asymptotically almost surely. This improves on previous approximation ratios of $1.995$ due to Bompadre, Dror, and Orlin [Journal of Applied Probability 2007] and $1.915$ due to Mathieu and Zhou [Random Structures and Algorithms 2022]. In addition, we conjecture that, for any $\varepsilon>0$, our algorithm is a $(1+\varepsilon)$-approximation asymptotically almost surely.

Submitted 21 April, 2023; originally announced April 2023.

Comments: 21 pages, 0 figures

arXiv:2304.08502 [pdf, other]

CyFormer: Accurate State-of-Health Prediction of Lithium-Ion Batteries via Cyclic Attention

Authors: Zhiqiang Nie, Jiankun Zhao, Qicheng Li, Yong Qin

Abstract: Predicting the State-of-Health (SoH) of lithium-ion batteries is a fundamental task of battery management systems on electric vehicles. It aims at estimating future SoH based on historical aging data. Most existing deep learning methods rely on filter-based feature extractors (e.g., CNN or Kalman filters) and recurrent time sequence models. Though efficient, they generally ignore cyclic features and the domain gap between training and testing batteries. To address this problem, we present CyFormer, a transformer-based cyclic time sequence model for SoH prediction. Instead of the conventional CNN-RNN structure, we adopt an encoder-decoder architecture. In the encoder, row-wise and column-wise attention blocks effectively capture intra-cycle and inter-cycle connections and extract cyclic features. In the decoder, the SoH queries cross-attend to these features to form the final predictions. We further utilize a transfer learning strategy to narrow the domain gap between the training and testing set. To be specific, we use fine-tuning to shift the model to a target working condition. Finally, we made our model more efficient by pruning. The experiment shows that our method attains an MAE of 0.75% with only 10% data for fine-tuning on a testing battery, surpassing prior methods by a large margin. Effective and robust, our method provides a potential solution for all cyclic time sequence prediction tasks.

Submitted 16 April, 2023; originally announced April 2023.

arXiv:2303.10552 [pdf, other]

Vehicle-Infrastructure Cooperative 3D Object Detection via Feature Flow Prediction

Authors: Haibao Yu, Yingjuan Tang, Enze Xie, Jilei Mao, Jirui Yuan, Ping Luo, Zaiqing Nie

Abstract: Cooperatively utilizing both ego-vehicle and infrastructure sensor data can significantly enhance autonomous driving perception abilities. However, temporal asynchrony and limited wireless communication in traffic environments can lead to fusion misalignment and impact detection performance. This paper proposes Feature Flow Net (FFNet), a novel cooperative detection framework that uses a feature flow prediction module to address these issues in vehicle-infrastructure cooperative 3D object detection. Rather than transmitting feature maps extracted from still-images, FFNet transmits feature flow, which leverages the temporal coherence of sequential infrastructure frames to predict future features and compensate for asynchrony. Additionally, we introduce a self-supervised approach to enable FFNet to generate feature flow with feature prediction ability. Experimental results demonstrate that our proposed method outperforms existing cooperative detection methods while requiring no more than 1/10 transmission cost of raw data on the DAIR-V2X dataset when temporal asynchrony exceeds 200 ms. The code is available at https://github.com/haibao-yu/FFNet-VIC3D.

Submitted 18 March, 2023; originally announced March 2023.

Comments: Under Review

arXiv:2303.05730 [pdf, other]

IC classifier: a classifier for 3D industrial components based on geometric prior using GNN

Authors: Zipeng Lin, Zhenguo Nie

Abstract: In this paper, we propose an approach to address the problem of classifying 3D industrial components by introducing a novel framework named IC-classifier (Industrial Component classifier). Our framework is designed to focus on the object's local and global structures, emphasizing the former by incorporating specific local features for embedding the model. By utilizing graphical neural networks and embedding derived from geometric properties, IC-classifier facilitates the exploration of the local structures of the object while using geometric attention for the analysis of global structures. Furthermore, the framework uses point clouds to circumvent the heavy computation workload. The proposed framework's performance is benchmarked against state-of-the-art models, demonstrating its potential to compete in the field.

Submitted 10 March, 2023; originally announced March 2023.

Comments: 15 pages including citations, 3 pages of figures

arXiv:2303.02967 [pdf, other]

Automated Peripancreatic Vessel Segmentation and Labeling Based on Iterative Trunk Growth and Weakly Supervised Mechanism

Authors: Liwen Zou, Zhenghua Cai, Liang Mao, Ziwei Nie, Yudong Qiu, ** Yang

Abstract: Peripancreatic vessel segmentation and anatomical labeling play extremely important roles to assist the early diagnosis, surgery planning and prognosis for patients with pancreatic tumors. However, most current techniques cannot achieve satisfactory segmentation performance for peripancreatic veins and usually make predictions with poor integrity and connectivity. Besides, unsupervised labeling algorithms cannot deal with complex anatomical variation while fully supervised methods require a large number of voxel-wise annotations for training, which is very labor-intensive and time-consuming. To address these problems, we propose our Automated Peripancreatic vEssel Segmentation and lAbeling (APESA) framework, to not only highly improve the segmentation performance for peripancreatic veins, but also efficiently identify the peripancreatic artery branches. There are two core modules in our proposed APESA framework: iterative trunk growth module (ITGM) for vein segmentation and weakly supervised labeling mechanism (WSLM) for artery branch identification. Our proposed ITGM is composed of a series of trunk growth modules, each of which chooses the most reliable trunk of a basic vessel prediction by the largest connected constraint, and seeks for the possible growth branches by branch proposal network. Our designed iterative process guides the raw trunk to be more complete and fully connected. Our proposed WSLM consists of an unsupervised rule-based preprocessing for generating pseudo branch annotations, and an anatomical labeling network to learn the branch distribution voxel by voxel. We achieve Dice of 94.01% for vein segmentation on our collected dataset, which boosts the accuracy by nearly 10% compared with the state-of-the-art methods. Additionally, we also achieve Dice of 97.01% on segmentation and competitive performance on anatomical labeling for peripancreatic arteries.

Submitted 6 March, 2023; originally announced March 2023.
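
The Dice scores quoted in the abstract above measure overlap between a predicted mask and a reference mask, using the standard definition 2|A ∩ B| / (|A| + |B|). The sketch below computes it on flattened binary masks. The function name, the list-based mask format, and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not the authors' evaluation code.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice overlap of two flattened binary masks (nonzero = foreground)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Both masks are empty; treating this as perfect agreement is a
        # convention chosen for this sketch.
        return 1.0
    return 2.0 * intersection / total


# Hypothetical usage: one of two predicted foreground voxels is correct.
score = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
```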

arXiv:2301.05704 [pdf, ps, other]

On a conjecture of Knuth about forward and back arcs

Authors: Zipei Nie

Abstract: Following Janson's method, we prove a conjecture of Knuth: the numbers of forward and back arcs for the depth-first search (DFS) in a digraph with a geometric outdegree distribution have the same distribution.

Submitted 13 January, 2023; originally announced January 2023.

Comments: 6 pages, 0 figures

arXiv:2207.06345 [pdf, other]

You Only Align Once: Bidirectional Interaction for Spatial-Temporal Video Super-Resolution

Authors: Mengshun Hu, Kui Jiang, Zhixiang Nie, Zheng Wang

Abstract: Spatial-Temporal Video Super-Resolution (ST-VSR) technology generates high-quality videos with higher resolution and higher frame rates. Existing advanced methods accomplish ST-VSR tasks through the association of Spatial and Temporal video super-resolution (S-VSR and T-VSR). These methods require two alignments and fusions in S-VSR and T-VSR, which is obviously redundant and fails to sufficiently explore the information flow of consecutive spatial LR frames. Although bidirectional learning (future-to-past and past-to-future) was introduced to cover all input frames, the direct fusion of final predictions fails to sufficiently exploit intrinsic correlations of bidirectional motion learning and spatial information from all frames. We propose an effective yet efficient recurrent network with bidirectional interaction for ST-VSR, where only one alignment and fusion is needed. Specifically, it first performs backward inference from future to past, and then follows forward inference to super-resolve intermediate frames. The backward and forward inferences are assigned to learn structures and details to simplify the learning task with joint optimizations. Furthermore, a Hybrid Fusion Module (HFM) is designed to aggregate and distill information to refine spatial information and reconstruct high-quality video frames. Extensive experiments on two public datasets demonstrate that our method outperforms state-of-the-art methods in efficiency, and reduces calculation cost by about 22%.

Submitted 13 July, 2022; originally announced July 2022.

Comments: ACMMM 2022

arXiv:2204.05575 [pdf, other]

DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection

Authors: Haibao Yu, Yizhen Luo, Mao Shu, Yiyi Huo, Zebang Yang, Yifeng Shi, Zhenglong Guo, Hanyu Li, Xing Hu, Jirui Yuan, Zaiqing Nie

Abstract: Autonomous driving faces great safety challenges for a lack of global perspective and the limitation of long-range perception capabilities. It has been widely agreed that vehicle-infrastructure cooperation is required to achieve Level 5 autonomy. However, there is still NO dataset from real scenarios available for computer vision researchers to work on vehicle-infrastructure cooperation-related problems. To accelerate computer vision research and innovation for Vehicle-Infrastructure Cooperative Autonomous Driving (VICAD), we release DAIR-V2X Dataset, which is the first large-scale, multi-modality, multi-view dataset from real scenarios for VICAD. DAIR-V2X comprises 71254 LiDAR frames and 71254 Camera frames, and all frames are captured from real scenes with 3D annotations. The Vehicle-Infrastructure Cooperative 3D Object Detection problem (VIC3D) is introduced, formulating the problem of collaboratively locating and identifying 3D objects using sensory inputs from both vehicle and infrastructure. In addition to solving traditional 3D object detection problems, the solution of VIC3D needs to consider the temporal asynchrony problem between vehicle and infrastructure sensors and the data transmission cost between them. Furthermore, we propose Time Compensation Late Fusion (TCLF), a late fusion framework for the VIC3D task as a benchmark based on DAIR-V2X. Find data, code, and more up-to-date information at https://thudair.baai.ac.cn/index and https://github.com/AIR-THU/DAIR-V2X.

Submitted 12 April, 2022; originally announced April 2022.

Comments: CVPR2022

arXiv:2111.05553 [pdf, other]

Matrix anti-concentration inequalities with applications

Authors: Zipei Nie

Abstract: We provide a polynomial lower bound on the minimum singular value of an $m\times m$ random matrix $M$ with jointly Gaussian entries, under a polynomial bound on the matrix norm and a global small-ball probability bound $$\inf_{x,y\in S^{m-1}}\mathbb{P}\left(\left|x^* M y\right|>m^{-O(1)}\right)\ge \frac{1}{2}.$$ With the additional assumption that $M$ is self-adjoint, the global small-ball probability bound can be replaced by a weaker version. We establish two matrix anti-concentration inequalities, which lower bound the minimum singular values of the sum of independent positive semidefinite self-adjoint matrices and the linear combination of independent random matrices with independent Gaussian coefficients. Both are under a global small-ball probability assumption. As a major application, we prove a better singular value bound for the Krylov space matrix, which leads to a faster and simpler algorithm for solving sparse linear systems. Our algorithm runs in $\tilde{O}\left(n^{\frac{3\omega-4}{\omega-1}}\right)=O(n^{2.2716})$ time where $\omega<2.37286$ is the matrix multiplication exponent, improving on the previous fastest one in $\tilde{O}\left(n^{\frac{5\omega-4}{\omega+1}}\right)=O(n^{2.33165})$ time by Peng and Vempala.

Submitted 2 December, 2021; v1 submitted 10 November, 2021; originally announced November 2021.

Comments: 42 pages, 1 figure, more references for better introduction, pseudocode for simplified block Krylov space algorithm added
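
The two running-time exponents quoted in the abstract above follow from plugging the stated bound on the matrix multiplication exponent into each formula. The short check below does that arithmetic. The function and constant names are hypothetical, and the only inputs are the formulas and the bound 2.37286 given in the abstract.

```python
OMEGA_BOUND = 2.37286  # upper bound on the matrix multiplication exponent


def new_exponent(omega: float) -> float:
    """Exponent (3*omega - 4) / (omega - 1) of the algorithm in the abstract."""
    return (3.0 * omega - 4.0) / (omega - 1.0)


def previous_exponent(omega: float) -> float:
    """Exponent (5*omega - 4) / (omega + 1) attributed to Peng and Vempala."""
    return (5.0 * omega - 4.0) / (omega + 1.0)


new_value = new_exponent(OMEGA_BOUND)
previous_value = previous_exponent(OMEGA_BOUND)
improvement = previous_value - new_value
```

At the bound, the new exponent rounds to 2.2716 and the earlier one evaluates to just under 2.33165, matching the abstract. Both formulas equal exactly 2 when the exponent is 2.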

arXiv:2108.13239 [pdf, ps, other]

Adaptive perturbation adversarial training: based on reinforcement learning

Authors: Zhishen Nie, Ying Lin, Sp Ren, Lan Zhang

Abstract: Adversarial training has become the primary method to defend against adversarial samples. However, it is hard to practically apply due to many shortcomings. One of the shortcomings of adversarial training is that it will reduce the recognition accuracy of normal samples. Adaptive perturbation adversarial training is proposed to alleviate this problem. It uses marginal adversarial samples that are close to the decision boundary but does not cross the decision boundary for adversarial training, which improves the accuracy of model recognition while maintaining the robustness of the model. However, searching for marginal adversarial samples brings additional computational costs. This paper proposes a method for finding marginal adversarial samples based on reinforcement learning, and combines it with the latest fast adversarial training technology, which effectively speeds up training process and reduces training costs.

Submitted 30 August, 2021; originally announced August 2021.

arXiv:2106.04224 [pdf, ps, other]

Improved Online Correlated Selection

Authors: Ruiquan Gao, Zhongtian He, Zhiyi Huang, Zipei Nie, Bijun Yuan, Yan Zhong

Abstract: This paper studies the online correlated selection (OCS) problem. It was introduced by Fahrbach, Huang, Tao, and Zadimoghaddam (2020) to obtain the first edge-weighted online bipartite matching algorithm that breaks the $0.5$ barrier. Suppose that we receive a pair of elements in each round and immediately select one of them. Can we select with negative correlation to be more effective than independent random selections? Our contributions are threefold. For semi-OCS, which considers the probability that an element remains unselected after appearing in $k$ rounds, we give an optimal algorithm that minimizes this probability for all $k$. It leads to $0.536$-competitive unweighted and vertex-weighted online bipartite matching algorithms that randomize over only two options in each round, improving the $0.508$-competitive ratio by Fahrbach et al. (2020). Further, we develop the first multi-way semi-OCS that allows an arbitrary number of elements with arbitrary masses in each round. As an application, it rounds the Balance algorithm in unweighted and vertex-weighted online bipartite matching and is $0.593$-competitive. Finally, we study OCS, which further considers the probability that an element is unselected in an arbitrary subset of rounds. We prove that the optimal "level of negative correlation" is between $0.167$ and $0.25$, improving the previous bounds of $0.109$ and $1$ by Fahrbach et al. (2020). Our OCS gives a $0.519$-competitive edge-weighted online bipartite matching algorithm, improving the previous $0.508$-competitive ratio by Fahrbach et al. (2020).

Submitted 15 December, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

Comments: Compared to the first version, this version adds a discussion on two concurrent works on the same topic, gives a more accurate description of previous results, and improves the presentation based on the feedbacks by anonymous reviewers. The conference version appears in FOCS 2021
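
The abstract above asks whether negatively correlated selection can beat independent random selection. The sketch below computes only the independent baseline, not the paper's algorithm: if each round picks one of its two elements uniformly and independently, a fixed element that appears in k rounds stays unselected with probability 2^(-k). The function name and the brute-force enumeration are illustrative assumptions.

```python
from itertools import product


def unselected_prob_independent(k: int) -> float:
    """Exact probability that a fixed element is never picked in k rounds
    when every round picks one of its two elements uniformly at random,
    independently of all other rounds."""
    if k < 0:
        raise ValueError("k must be non-negative")
    # Each outcome is a tuple of k bits; 1 means our element was picked.
    outcomes = list(product((0, 1), repeat=k))
    never_picked = sum(1 for outcome in outcomes if not any(outcome))
    return never_picked / len(outcomes)


# Baseline values for k = 1..4 rounds.
baseline = [unselected_prob_independent(k) for k in range(1, 5)]
```

A semi-OCS in the sense of the abstract aims to push this probability below the independent baseline for every k.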

arXiv:2105.09451 [pdf, other]

doi 10.1016/j.cviu.2019.04.006

Anabranch Network for Camouflaged Object Segmentation

Authors: Trung-Nghia Le, Tam V. Nguyen, Zhongliang Nie, Minh-Triet Tran, Akihiro Sugimoto

Abstract: Camouflaged objects attempt to conceal their texture into the background and discriminating them from the background is hard even for human beings. The main objective of this paper is to explore the camouflaged object segmentation problem, namely, segmenting the camouflaged object(s) for a given image. This problem has not been well studied in spite of a wide range of potential applications including the preservation of wild animals and the discovery of new species, surveillance systems, search-and-rescue missions in the event of natural disasters such as earthquakes, floods or hurricanes. This paper addresses a new challenging problem of camouflaged object segmentation. To address this problem, we provide a new image dataset of camouflaged objects for benchmarking purposes. In addition, we propose a general end-to-end network, called the Anabranch Network, that leverages both classification and segmentation tasks. Different from existing networks for segmentation, our proposed network possesses the second branch for classification to predict the probability of containing camouflaged object(s) in an image, which is then fused into the main branch for segmentation to boost up the segmentation accuracy. Extensive experiments conducted on the newly built dataset demonstrate the effectiveness of our network using various fully convolutional networks. https://sites.google.com/view/ltnghia/research/camo

Submitted 19 May, 2021; originally announced May 2021.

Comments: Published in CVIU 2019. Project page: https://sites.google.com/view/ltnghia/research/camo

Journal ref: Computer Vision and Image Understanding 184 (2019) 45-56

arXiv:2104.09276 [pdf, other]

SuperMeshing: A New Deep Learning Architecture for Increasing the Mesh Density of Metal Forming Stress Field with Attention Mechanism and Perceptual Features

Authors: Qingfeng Xu, Zhenguo Nie, Handing Xu, Haosu Zhou, Xinjun Liu

Abstract: In stress field analysis, the finite element analysis is a crucial approach, in which the mesh-density has a significant impact on the results. High mesh density usually contributes authentic to simulation results but costs more computing resources, leading to curtailing efficiency during the design process. To eliminate this drawback, we propose a new data-driven mesh-density boost model named SuperMeshingNet that strengthens the advantages of finite element analysis (FEA) with low mesh-density as inputs to the deep learning model, which consisting of Res-UNet architecture, to acquire high-density stress field instantaneously, shortening computing time and cost automatically. Moreover, the attention mechanism and the perceptual features are utilized, enhancing the performance of SuperMeshingNet. Compared to the baseline that applied the linear interpolation method, SuperMeshingNet achieves a prominent reduction in the mean squared error (MSE) and mean absolute error (MAE) on test data, which contains prior unseen cases. Based on the data set of metal forming, the comparable experiments are proceeded to demonstrate the high quality and superior precision of the reconstructed results generated by our model. The well-trained model can successfully show more excellent performance than the baseline and other methods on the multiple scaled mesh-density, including $2\times$, $4\times$, and $8\times$. With the refined result owning broaden scaling of mesh density and high precision, the FEA process can be accelerated with seldom cost on computation resources. We publicly share our work with full detail of implementation at https://github.com/zhenguonie/2021_SuperMeshing_2D_Metal_Forming

Submitted 12 March, 2021; originally announced April 2021.

Comments: 15 pages, 12 figures

MSC Class: 14J60 (Primary) 14F05; 14J26 (Secondary) ACM Class: F.2.2; I.2.7

arXiv:2102.10284 [pdf, other]

Artificial Intelligence Enhanced Rapid and Efficient Diagnosis of Mycoplasma Pneumoniae Pneumonia in Children Patients

Authors: Chenglin Pan, Kuan Yan, Xiao Liu, Yanjie Chen, Yanyan Luo, Xiaoming Li, Zhenguo Nie, Xinjun Liu

Abstract: Artificial intelligence methods have been increasingly turning into a potentially powerful tool in the diagnosis and management of diseases. In this study, we utilized logistic regression (LR), decision tree (DT), gradient boosted decision tree (GBDT), support vector machine (SVM), and multilayer perceptron (MLP) as machine learning models to rapidly diagnose the mycoplasma pneumoniae pneumonia (MPP) in children patients. The classification task was carried out after applying the preprocessing procedure to the MPP dataset. The most efficient results are obtained by GBDT. It provides the best performance with an accuracy of 93.7%. In contrast to standard raw feature weighting, the feature importance takes the underlying correlation structure of the features into account. The most crucial feature of GBDT is the "pulmonary infiltrates range" with a score of 0.5925, followed by "cough" (0.0953) and "pleural effusion" (0.0492). We publicly share our full implementation with the dataset and trained models at https://github.com/zhenguonie/2021_AI4MPP.

Submitted 20 February, 2021; originally announced February 2021.

Comments: 23 pages

MSC Class: 14J60 (Primary) 14F05; 14J26 (Secondary) ACM Class: F.2.2; I.2.7

arXiv:2012.15136 [pdf, other]

Exploring Large Context for Cerebral Aneurysm Segmentation

Authors: Jun Ma, Ziwei Nie

Abstract: Automated segmentation of aneurysms from 3D CT is important for the diagnosis, monitoring, and treatment planning of the cerebral aneurysm disease. This short paper briefly presents the main technique details of the aneurysm segmentation method in the MICCAI 2020 CADA challenge. The main contribution is that we configure the 3D U-Net with a large patch size, which can obtain the large context. Our method ranked second on the MICCAI 2020 CADA testing dataset with an average Jaccard of 0.7593. Our code and trained models are publicly available at https://github.com/JunMa11/CADA2020.

Submitted 30 December, 2020; originally announced December 2020.

Comments: 2nd place in MICCAI 2020 CADA challenge
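
The Jaccard score quoted in the abstract above is the intersection-over-union of a predicted and a reference mask. The sketch below computes it on flattened binary masks and also converts a single Jaccard value J to the equivalent Dice value 2J / (1 + J). The function names, the list-based mask format, and the empty-mask convention are assumptions for illustration, not the challenge's evaluation code. The conversion holds per case only and does not map an average Jaccard to an average Dice.

```python
from typing import Sequence


def jaccard_index(pred: Sequence[int], target: Sequence[int]) -> float:
    """Intersection over union of two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    if union == 0:
        # Both masks empty; counting this as perfect agreement is a
        # convention chosen for this sketch.
        return 1.0
    return intersection / union


def jaccard_to_dice(jaccard: float) -> float:
    """Per-case identity D = 2J / (1 + J) between Jaccard and Dice."""
    return 2.0 * jaccard / (1.0 + jaccard)


# Hypothetical usage: one of two predicted foreground voxels is correct.
iou = jaccard_index([1, 1, 0, 0], [1, 0, 0, 0])
```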

arXiv:2012.09610 [pdf]

Validate and Enable Machine Learning in Industrial AI

Authors: Hongbo Zou, Guang**g Chen, Pengtao Xie, Sean Chen, Yongtian He, Hochih Huang, Zheng Nie, Hongbao Zhang, Tristan Bala, Kazi Tulip, Yuqi Wang, Shenlin Qin, Eric P. Xing

Abstract: Industrial Artificial Intelligence (Industrial AI) is an emerging concept which refers to the application of artificial intelligence to industry. Industrial AI promises more efficient future industrial control systems. However, manufacturers and solution partners need to understand how to implement and integrate an AI model into the existing industrial control system. A well-trained machine learning (ML) model provides many benefits and opportunities for industrial control optimization; however, an inferior Industrial AI design and integration limits the capability of ML models. To better understand how to develop and integrate trained ML models into the traditional industrial control system, test the deployed AI control system, and ultimately outperform traditional systems, manufacturers and their AI solution partners need to address a number of challenges. Six top challenges, which were real problems we ran into when deploying Industrial AI, are explored in the paper. The Petuum Optimum system is used as an example to showcase the challenges in making and testing AI models, and more importantly, how to address such challenges in an Industrial AI system.

Submitted 30 October, 2020; originally announced December 2020.

Comments: 9 pages, 8 figures

arXiv:2012.01675 [pdf, other]

Federated Learning for Personalized Humor Recognition

Authors: Xu Guo, Han Yu, Boyang Li, Hao Wang, Pengwei Xing, Siwei Feng, Zaiqing Nie, Chunyan Miao

Abstract: Computational understanding of humor is an important topic under creative language understanding and modeling. It can play a key role in complex human-AI interactions. The challenge here is that human perception of humorous content is highly subjective. The same joke may receive different funniness ratings from different readers. This makes it highly challenging for humor recognition models to achieve personalization in practical scenarios. Existing approaches are generally designed based on the assumption that users have a consensus on whether a given text is humorous or not. Thus, they cannot handle diverse humor preferences well. In this paper, we propose the FedHumor approach for the recognition of humorous content in a personalized manner through Federated Learning (FL). Extending a pre-trained language model, FedHumor guides the fine-tuning process by considering diverse distributions of humor preferences from individuals. It incorporates a diversity adaptation strategy into the FL paradigm to train a personalized humor recognition model. To the best of our knowledge, FedHumor is the first text-based personalized humor recognition model through federated learning. Extensive experiments demonstrate the advantage of FedHumor in recognizing humorous texts compared to nine state-of-the-art humor recognition approaches with superior capability for handling the diversity in humor labels produced by users with diverse preferences.

Submitted 6 April, 2022; v1 submitted 2 December, 2020; originally announced December 2020.

Comments: 18 pages

arXiv:2010.06694 [pdf, other]

Easy, Reproducible and Quality-Controlled Data Collection with Crowdaq

Authors: Qiang Ning, Hao Wu, Pradeep Dasigi, Dheeru Dua, Matt Gardner, Robert L. Logan IV, Ana Marasovic, Zhen Nie

Abstract: High-quality and large-scale data are key to success for AI systems. However, large-scale data annotation efforts are often confronted with a set of common challenges: (1) designing a user-friendly annotation interface; (2) training enough annotators efficiently; and (3) reproducibility. To address these problems, we introduce Crowdaq, an open-source platform that standardizes the data collection pipeline with customizable user-interface components, automated annotator qualification, and saved pipelines in a re-usable format. We show that Crowdaq simplifies data annotation significantly on a diverse set of data collection use cases and we hope it will be a convenient tool for the community.

Submitted 5 October, 2020; originally announced October 2020.

Comments: Accepted to the demo track of EMNLP 2020

arXiv:2010.05522 [pdf, other]

Pre-trained Language Model Based Active Learning for Sentence Matching

Authors: Guirong Bai, Shizhu He, Kang Liu, Jun Zhao, Zaiqing Nie

Abstract: Active learning is able to significantly reduce the annotation cost for data-driven techniques. However, previous active learning approaches for natural language processing mainly depend on the entropy-based uncertainty criterion, and ignore the characteristics of natural language. In this paper, we propose a pre-trained language model based active learning approach for sentence matching. Differing from previous active learning, it can provide linguistic criteria to measure instances and help select more efficient instances for annotation. Experiments demonstrate our approach can achieve greater accuracy with fewer labeled training instances.

Submitted 12 October, 2020; originally announced October 2020.

Comments: Accepted by the conference of coling 2020

arXiv:2006.11376 [pdf, other]

doi 10.1115/1.4049805

StressGAN: A Generative Deep Learning Model for 2D Stress Distribution Prediction

Authors: Haoliang Jiang, Zhenguo Nie, Roselyn Yeo, Amir Barati Farimani, Levent Burak Kara

Abstract: Using deep learning to analyze mechanical stress distributions has been gaining interest with the demand for fast stress analysis methods. Deep learning approaches have achieved excellent outcomes when utilized to speed up stress computation and learn the physics without prior knowledge of underlying equations. However, most studies restrict the variation of geometry or boundary conditions, making these methods difficult to be generalized to unseen configurations. We propose a conditional generative adversarial network (cGAN) model for predicting 2D von Mises stress distributions in solid structures. The cGAN learns to generate stress distributions conditioned by geometries, load, and boundary conditions through a two-player minimax game between two neural networks with no prior knowledge. By evaluating the generative network on two stress distribution datasets under multiple metrics, we demonstrate that our model can predict more accurate high-resolution stress distributions than a baseline convolutional neural network model, given various and complex cases of geometry, load and boundary conditions.

Submitted 29 May, 2020; originally announced June 2020.

arXiv:2004.12537 [pdf, other]

doi 10.1002/mp.14676

Towards Data-Efficient Learning: A Benchmark for COVID-19 CT Lung and Infection Segmentation

Authors: Jun Ma, Yixin Wang, Xingle An, Cheng Ge, Ziqi Yu, Jianan Chen, Qiongjie Zhu, Guoqiang Dong, Jian He, Zhiqiang He, Yuntao Zhu, Ziwei Nie, ** Yang

Abstract: Purpose: Accurate segmentation of lung and infection in COVID-19 CT scans plays an important role in the quantitative management of patients. Most of the existing studies are based on large and private annotated datasets that are impractical to obtain from a single institution, especially when radiologists are busy fighting the coronavirus disease. Furthermore, it is hard to compare current COVID-19 CT segmentation methods as they are developed on different datasets, trained in different settings, and evaluated with different metrics. Methods: To promote the development of data-efficient deep learning methods, in this paper, we built three benchmarks for lung and infection segmentation based on 70 annotated COVID-19 cases, which contain current active research areas, e.g., few-shot learning, domain generalization, and knowledge transfer. For a fair comparison among different segmentation methods, we also provide standard training, validation and testing splits, evaluation metrics and, the corresponding code. Results: Based on the state-of-the-art network, we provide more than 40 pre-trained baseline models, which not only serve as out-of-the-box segmentation tools but also save computational time for researchers who are interested in COVID-19 lung and infection segmentation. We achieve average Dice Similarity Coefficient (DSC) scores of 97.3%, 97.7%, and 67.3% and average Normalized Surface Dice (NSD) scores of 90.6%, 91.4%, and 70.0% for left lung, right lung, and infection, respectively. Conclusions: To the best of our knowledge, this work presents the first data-efficient learning benchmark for medical image segmentation and the largest number of pre-trained models up to now. All these resources are publicly available, and our work lays the foundation for promoting the development of deep learning methods for efficient COVID-19 CT segmentation with limited data.

Submitted 3 December, 2020; v1 submitted 26 April, 2020; originally announced April 2020.

Comments: accepted for publication in Medical Physics

arXiv:2004.11588

Learning Hierarchical Review Graph Representations for Recommendation

Authors: Yong Liu, Susen Yang, Yinan Zhang, Chunyan Miao, Zaiqing Nie, Juyong Zhang

Abstract: The user review data have been demonstrated to be effective in solving different recommendation problems. Previous review-based recommendation methods usually employ sophisticated compositional models, such as Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN), to learn semantic representations from the review data for recommendation. However, these methods mainly capture the local dependency between neighbouring words in a word window, and they treat each review equally. Therefore, they may not be effective in capturing the global dependency between words, and tend to be easily biased by noise review information. In this paper, we propose a novel review-based recommendation model, named Review Graph Neural Network (RGNN). Specifically, RGNN builds a specific review graph for each individual user/item, which provides a global view about the user/item properties to help weaken the biases caused by noise review information. A type-aware graph attention mechanism is developed to learn semantic embeddings of words. Moreover, a personalized graph pooling operator is proposed to learn hierarchical representations of the review graph to form the semantic representation for each user/item. We compared RGNN with state-of-the-art review-based recommendation approaches on two real-world datasets. The experimental results indicate that RGNN consistently outperforms baseline methods, in terms of Mean Square Error (MSE).

Submitted 24 January, 2021; v1 submitted 24 April, 2020; originally announced April 2020.

Showing 1–50 of 57 results for author: Nie, Z