Search | arXiv e-print repository

Neural Residual Diffusion Models for Deep Scalable Vision Generation

Authors: Zhiyuan Ma, Liangliang Zhao, Biqing Qi, Bowen Zhou

Abstract: The most advanced diffusion models have recently adopted increasingly deep stacked networks (e.g., U-Net or Transformer) to promote the generative emergence capabilities of vision generation models similar to large language models (LLMs). However, progressively deeper stacked networks will intuitively cause numerical propagation errors and reduce noisy prediction capabilities on generative data, w… ▽ More The most advanced diffusion models have recently adopted increasingly deep stacked networks (e.g., U-Net or Transformer) to promote the generative emergence capabilities of vision generation models similar to large language models (LLMs). However, progressively deeper stacked networks will intuitively cause numerical propagation errors and reduce noisy prediction capabilities on generative data, which hinders massively deep scalable training of vision generation models. In this paper, we first uncover the nature that neural networks being able to effectively perform generative denoising lies in the fact that the intrinsic residual unit has consistent dynamic property with the input signal's reverse diffusion process, thus supporting excellent generative abilities. Afterwards, we stand on the shoulders of two common types of deep stacked networks to propose a unified and massively scalable Neural Residual Diffusion Models framework (Neural-RDM for short), which is a simple yet meaningful change to the common architecture of deep generative networks by introducing a series of learnable gated residual parameters that conform to the generative dynamics. Experimental results on various generative tasks show that the proposed neural residual models obtain state-of-the-art scores on image's and video's generative benchmarks. Rigorous theoretical proofs and extensive experiments also demonstrate the advantages of this simple gated residual mechanism consistent with dynamic modeling in improving the fidelity and consistency of generated content and supporting large-scale scalable training. Code is available at https://github.com/Anonymous/Neural-RDM. △ Less

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.12295 [pdf, other]

Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding

Authors: Kaiyan Zhang, Jianyu Wang, Ning Ding, Biqing Qi, Ermo Hua, Xingtai Lv, Bowen Zhou

Abstract: Large Language Models (LLMs) demonstrate impressive performance in diverse applications, yet they face significant drawbacks, including high inference latency, expensive training cost, and generation of hallucination. Collaborative decoding between large and small language models (SLMs) offers a novel approach to address these challenges. Inspired by dual-process cognitive theory, we integrate the… ▽ More Large Language Models (LLMs) demonstrate impressive performance in diverse applications, yet they face significant drawbacks, including high inference latency, expensive training cost, and generation of hallucination. Collaborative decoding between large and small language models (SLMs) offers a novel approach to address these challenges. Inspired by dual-process cognitive theory, we integrate these methods into a unified framework termed Fast and Slow Generating (FS-GEN). This paper explores several techniques within the FS-GEN framework, including speculative decoding, contrastive decoding, and emulator or proxy fine-tuning. We provide a comprehensive analysis of these methodologies, offering insights into their similarities and differences under this framework. Our study delves into the differential knowledge capabilities of LLMs versus SLMs through the FS-GEN lens, revealing that fewer than 20% of collaborative interactions are required across various methods. These interactions adhere to a scaling law relative to the parameter ratios, thereby facilitating predictable collaboration. Furthermore, we investigate the specific positions where collaboration is most effective from an uncertainty perspective, yielding novel insights that could refine FS-GEN methods. Our findings reveal that the essential difference between models of different sizes lies in the uncertainty of the next token prediction, where interventions by larger models are most needed to assist the smaller ones. Code for Reproduction: https://github.com/TsinghuaC3I/FS-GEN △ Less

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.05666 [pdf, other]

General Distribution Learning: A theoretical framework for Deep Learning

Authors: Binchuan Qi, Li Li, Wei Gong

Abstract: There remain numerous unanswered research questions on deep learning (DL) within the classical learning theory framework. These include the remarkable generalization capabilities of overparametrized neural networks (NNs), the efficient optimization performance despite non-convexity of objectives, the mechanism of flat minima for generalization, and the exceptional performance of deep architectures… ▽ More There remain numerous unanswered research questions on deep learning (DL) within the classical learning theory framework. These include the remarkable generalization capabilities of overparametrized neural networks (NNs), the efficient optimization performance despite non-convexity of objectives, the mechanism of flat minima for generalization, and the exceptional performance of deep architectures in solving physical problems. This paper introduces General Distribution Learning (GD Learning), a novel theoretical learning framework designed to address a comprehensive range of machine learning and statistical tasks, including classification, regression and parameter estimation. Departing from traditional statistical machine learning, GD Learning focuses on the true underlying distribution. In GD Learning, learning error, corresponding to the expected error in classical statistical learning framework, is divided into fitting errors due to models and algorithms, as well as sampling errors introduced by limited sampling data. The framework significantly incorporates prior knowledge, especially in scenarios characterized by data scarcity, thereby enhancing performance. Within the GD Learning framework, we demonstrate that the global optimal solutions in non-convex optimization can be approached by minimizing the gradient norm and the non-uniformity of the eigenvalues of the model's Jacobian matrix. This insight leads to the development of the gradient structure control algorithm. GD Learning also offers fresh insights into the questions on deep learning, including overparameterization and non-convex optimization, bias-variance trade-off, and the mechanism of flat minima. △ Less

Submitted 26 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

Comments: arXiv admin note: text overlap with arXiv:2105.04026 by other authors. arXiv admin note: text overlap with arXiv:2105.04026 by other authors

arXiv:2406.05535 [pdf, other]

Perturbation Towards Easy Samples Improves Targeted Adversarial Transferability

Authors: Junqi Gao, Biqing Qi, Yao Li, Zhichang Guo, Dong Li, Yuming Xing, Dazhi Zhang

Abstract: The transferability of adversarial perturbations provides an effective shortcut for black-box attacks. Targeted perturbations have greater practicality but are more difficult to transfer between models. In this paper, we experimentally and theoretically demonstrated that neural networks trained on the same dataset have more consistent performance in High-Sample-Density-Regions (HSDR) of each class… ▽ More The transferability of adversarial perturbations provides an effective shortcut for black-box attacks. Targeted perturbations have greater practicality but are more difficult to transfer between models. In this paper, we experimentally and theoretically demonstrated that neural networks trained on the same dataset have more consistent performance in High-Sample-Density-Regions (HSDR) of each class instead of low sample density regions. Therefore, in the target setting, adding perturbations towards HSDR of the target class is more effective in improving transferability. However, density estimation is challenging in high-dimensional scenarios. Further theoretical and experimental verification demonstrates that easy samples with low loss are more likely to be located in HSDR. Perturbations towards such easy samples in the target class can avoid density estimation for HSDR location. Based on the above facts, we verified that adding perturbations to easy samples in the target class improves targeted adversarial transferability of existing attack methods. A generative targeted attack strategy named Easy Sample Matching Attack (ESMA) is proposed, which has a higher success rate for targeted attacks and outperforms the SOTA generative method. Moreover, ESMA requires only 5% of the storage space and much less computation time comparing to the current SOTA, as ESMA attacks all classes with only one model instead of seperate models for each class. Our code is available at https://github.com/gjq100/ESMA. △ Less

Submitted 8 June, 2024; originally announced June 2024.

Journal ref: Advances in Neural Information Processing Systems 36, 2023

arXiv:2406.05534 [pdf, other]

Online DPO: Online Direct Preference Optimization with Fast-Slow Chasing

Authors: Biqing Qi, Pengfei Li, Fangyuan Li, Junqi Gao, Kaiyan Zhang, Bowen Zhou

Abstract: Direct Preference Optimization (DPO) improves the alignment of large language models (LLMs) with human values by training directly on human preference datasets, eliminating the need for reward models. However, due to the presence of cross-domain human preferences, direct continual training can lead to catastrophic forgetting, limiting DPO's performance and efficiency. Inspired by intraspecific com… ▽ More Direct Preference Optimization (DPO) improves the alignment of large language models (LLMs) with human values by training directly on human preference datasets, eliminating the need for reward models. However, due to the presence of cross-domain human preferences, direct continual training can lead to catastrophic forgetting, limiting DPO's performance and efficiency. Inspired by intraspecific competition driving species evolution, we propose a Online Fast-Slow chasing DPO (OFS-DPO) for preference alignment, simulating competition through fast and slow chasing among models to facilitate rapid adaptation. Specifically, we first derive the regret upper bound for online learning, validating our motivation with a min-max optimization pattern. Based on this, we introduce two identical modules using Low-rank Adaptive (LoRA) with different optimization speeds to simulate intraspecific competition, and propose a new regularization term to guide their learning. To further mitigate catastrophic forgetting in cross-domain scenarios, we extend the OFS-DPO with LoRA modules combination strategy, resulting in the Cross domain Online Fast-Slow chasing DPO (COFS-DPO). This method leverages linear combinations of fast modules parameters from different task domains, fully utilizing historical information to achive continual value alignment. Experimental results show that OFS-DPO outperforms DPO in in-domain alignment, while COFS-DPO excels in cross-domain continual learning scenarios. △ Less

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.05532 [pdf, other]

Exploring Adversarial Robustness of Deep State Space Models

Authors: Biqing Qi, Yang Luo, Junqi Gao, Pengfei Li, Kai Tian, Zhiyuan Ma, Bowen Zhou

Abstract: Deep State Space Models (SSMs) have proven effective in numerous task scenarios but face significant security challenges due to Adversarial Perturbations (APs) in real-world deployments. Adversarial Training (AT) is a mainstream approach to enhancing Adversarial Robustness (AR) and has been validated on various traditional DNN architectures. However, its effectiveness in improving the AR of SSMs r… ▽ More Deep State Space Models (SSMs) have proven effective in numerous task scenarios but face significant security challenges due to Adversarial Perturbations (APs) in real-world deployments. Adversarial Training (AT) is a mainstream approach to enhancing Adversarial Robustness (AR) and has been validated on various traditional DNN architectures. However, its effectiveness in improving the AR of SSMs remains unclear. While many enhancements in SSM components, such as integrating Attention mechanisms and expanding to data-dependent SSM parameterizations, have brought significant gains in Standard Training (ST) settings, their potential benefits in AT remain unexplored. To investigate this, we evaluate existing structural variants of SSMs with AT to assess their AR performance. We observe that pure SSM structures struggle to benefit from AT, whereas incorporating Attention yields a markedly better trade-off between robustness and generalization for SSMs in AT compared to other components. Nonetheless, the integration of Attention also leads to Robust Overfitting (RO) issues. To understand these phenomena, we empirically and theoretically analyze the output error of SSMs under AP. We find that fixed-parameterized SSMs have output error bounds strictly related to their parameters, limiting their AT benefits, while input-dependent SSMs may face the problem of error explosion. Furthermore, we show that the Attention component effectively scales the output error of SSMs during training, enabling them to benefit more from AT, but at the cost of introducing RO due to its high model complexity. Inspired by this, we propose a simple and effective Adaptive Scaling (AdS) mechanism that brings AT performance close to Attention-integrated SSMs without introducing the issue of RO. △ Less

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.05531 [pdf, other]

Enhancing Adversarial Transferability via Information Bottleneck Constraints

Authors: Biqing Qi, Junqi Gao, Jianxing Liu, Ligang Wu, Bowen Zhou

Abstract: From the perspective of information bottleneck (IB) theory, we propose a novel framework for performing black-box transferable adversarial attacks named IBTA, which leverages advancements in invariant features. Intuitively, diminishing the reliance of adversarial perturbations on the original data, under equivalent attack performance constraints, encourages a greater reliance on invariant features… ▽ More From the perspective of information bottleneck (IB) theory, we propose a novel framework for performing black-box transferable adversarial attacks named IBTA, which leverages advancements in invariant features. Intuitively, diminishing the reliance of adversarial perturbations on the original data, under equivalent attack performance constraints, encourages a greater reliance on invariant features that contributes most to classification, thereby enhancing the transferability of adversarial attacks. Building on this motivation, we redefine the optimization of transferable attacks using a novel theoretical framework that centers around IB. Specifically, to overcome the challenge of unoptimizable mutual information, we propose a simple and efficient mutual information lower bound (MILB) for approximating computation. Moreover, to quantitatively evaluate mutual information, we utilize the Mutual Information Neural Estimator (MINE) to perform a thorough analysis. Our experiments on the ImageNet dataset well demonstrate the efficiency and scalability of IBTA and derived MILB. Our code is available at https://github.com/Biqing-Qi/Enhancing-Adversarial-Transferability-via-Information-Bottleneck-Constraints. △ Less

Submitted 8 June, 2024; originally announced June 2024.

Journal ref: IEEE Signal Processing Letters, 2024

arXiv:2406.04567 [pdf, other]

Error Bounds of Supervised Classification from Information-Theoretic Perspective

Authors: Binchuan Qi, Wei Gong, Li Li

Abstract: There remains a list of unanswered research questions on deep learning (DL), including the remarkable generalization power of overparametrized neural networks, the efficient optimization performance despite the non-convexity, and the mechanisms behind flat minima in generalization. In this paper, we adopt an information-theoretic perspective to explore the theoretical foundations of supervised cla… ▽ More There remains a list of unanswered research questions on deep learning (DL), including the remarkable generalization power of overparametrized neural networks, the efficient optimization performance despite the non-convexity, and the mechanisms behind flat minima in generalization. In this paper, we adopt an information-theoretic perspective to explore the theoretical foundations of supervised classification using deep neural networks (DNNs). Our analysis introduces the concepts of fitting error and model risk, which, together with generalization error, constitute an upper bound on the expected risk. We demonstrate that the generalization errors are bounded by the complexity, influenced by both the smoothness of distribution and the sample size. Consequently, task complexity serves as a reliable indicator of the dataset's quality, guiding the setting of regularization hyperparameters. Furthermore, the derived upper bound fitting error links the back-propagated gradient, Neural Tangent Kernel (NTK), and the model's parameter count with the fitting error. Utilizing the triangle inequality, we establish an upper bound on the expected risk. This bound offers valuable insights into the effects of overparameterization, non-convex optimization, and the flat minima in DNNs.Finally, empirical verification confirms a significant positive correlation between the derived theoretical bounds and the practical expected risk, confirming the practical relevance of the theoretical findings. △ Less

Submitted 27 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.03949 [pdf, other]

UltraMedical: Building Specialized Generalists in Biomedicine

Authors: Kaiyan Zhang, Sihang Zeng, Ermo Hua, Ning Ding, Zhang-Ren Chen, Zhiyuan Ma, Haoxin Li, Ganqu Cui, Biqing Qi, Xuekai Zhu, Xingtai Lv, Hu **fang, Zhiyuan Liu, Bowen Zhou

Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains and are moving towards more specialized areas. Recent advanced proprietary models such as GPT-4 and Gemini have achieved significant advancements in biomedicine, which have also raised privacy and security challenges. The construction of specialized generalists hinges largely on high-quality datasets, enh… ▽ More Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains and are moving towards more specialized areas. Recent advanced proprietary models such as GPT-4 and Gemini have achieved significant advancements in biomedicine, which have also raised privacy and security challenges. The construction of specialized generalists hinges largely on high-quality datasets, enhanced by techniques like supervised fine-tuning and reinforcement learning from human or AI feedback, and direct preference optimization. However, these leading technologies (e.g., preference learning) are still significantly limited in the open source community due to the scarcity of specialized data. In this paper, we present the UltraMedical collections, which consist of high-quality manual and synthetic datasets in the biomedicine domain, featuring preference annotations across multiple advanced LLMs. By utilizing these datasets, we fine-tune a suite of specialized medical models based on Llama-3 series, demonstrating breathtaking capabilities across various medical benchmarks. Moreover, we develop powerful reward models skilled in biomedical and general reward benchmark, enhancing further online preference learning within the biomedical LLM community. △ Less

Submitted 6 June, 2024; originally announced June 2024.

Comments: Datasets and models are available at https://github.com/TsinghuaC3I/UltraMedical

arXiv:2405.17534 [pdf, other]

SMR: State Memory Replay for Long Sequence Modeling

Authors: Biqing Qi, Junqi Gao, Kaiyan Zhang, Dong Li, Jianxing Liu, Ligang Wu, Bowen Zhou

Abstract: Despite the promising performance of state space models (SSMs) in long sequence modeling, limitations still exist. Advanced SSMs like S5 and S6 (Mamba) in addressing non-uniform sampling, their recursive structures impede efficient SSM computation via convolution. To overcome compatibility limitations in parallel convolutional computation, this paper proposes a novel non-recursive non-uniform samp… ▽ More Despite the promising performance of state space models (SSMs) in long sequence modeling, limitations still exist. Advanced SSMs like S5 and S6 (Mamba) in addressing non-uniform sampling, their recursive structures impede efficient SSM computation via convolution. To overcome compatibility limitations in parallel convolutional computation, this paper proposes a novel non-recursive non-uniform sample processing strategy. Theoretical analysis of SSMs through the lens of Event-Triggered Control (ETC) theory reveals the Non-Stable State (NSS) problem, where deviations from sampling point requirements lead to error transmission and accumulation, causing the divergence of the SSM's hidden state. Our analysis further reveals that adjustments of input sequences with early memories can mitigate the NSS problem, achieving Sampling Step Adaptation (SSA). Building on this insight, we introduce a simple yet effective plug-and-play mechanism, State Memory Replay (SMR), which utilizes learnable memories to adjust the current state with multi-step information for generalization at sampling points different from those in the training data. This enables SSMs to stably model varying sampling points. Experiments on long-range modeling tasks in autoregressive language modeling and Long Range Arena demonstrate the general effectiveness of the SMR mechanism for a series of SSM models. △ Less

Submitted 8 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

Journal ref: Findings of the Association for Computational Linguistics, 2024

arXiv:2405.11870 [pdf, other]

Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process

Authors: Ermo Hua, Biqing Qi, Kaiyan Zhang, Yue Yu, Ning Ding, Xingtai Lv, Kai Tian, Bowen Zhou

Abstract: Supervised Fine-Tuning (SFT) and Preference Optimization (PO) are two fundamental processes for enhancing the capabilities of Language Models (LMs) post pre-training, aligning them better with human preferences. Although SFT advances in training efficiency, PO delivers better alignment, thus they are often combined. However, common practices simply apply them sequentially without integrating their… ▽ More Supervised Fine-Tuning (SFT) and Preference Optimization (PO) are two fundamental processes for enhancing the capabilities of Language Models (LMs) post pre-training, aligning them better with human preferences. Although SFT advances in training efficiency, PO delivers better alignment, thus they are often combined. However, common practices simply apply them sequentially without integrating their optimization objectives, ignoring the opportunities to bridge their paradigm gap and take the strengths from both. To obtain a unified understanding, we interpret SFT and PO with two sub-processes -- Preference Estimation and Transition Optimization -- defined at token level within the Markov Decision Process (MDP) framework. This modeling shows that SFT is only a specialized case of PO with inferior estimation and optimization. PO evaluates the quality of model's entire generated answer, whereas SFT only scores predicted tokens based on preceding tokens from target answers. Therefore, SFT overestimates the ability of model, leading to inferior optimization. Building on this view, we introduce Intuitive Fine-Tuning (IFT) to integrate SFT and Preference Optimization into a single process. IFT captures LMs' intuitive sense of the entire answers through a temporal residual connection, but it solely relies on a single policy and the same volume of non-preference-labeled data as SFT. Our experiments show that IFT performs comparably or even superiorly to sequential recipes of SFT and some typical Preference Optimization methods across several tasks, particularly those requires generation, reasoning, and fact-following abilities. An explainable Frozen Lake game further validates the effectiveness of IFT for getting competitive policy. △ Less

Submitted 28 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

arXiv:2403.20009 [pdf, other]

On Large Language Models' Hallucination with Regard to Known Facts

Authors: Che Jiang, Biqing Qi, Xiangyu Hong, Dayuan Fu, Yang Cheng, Fandong Meng, Mo Yu, Bowen Zhou, Jie Zhou

Abstract: Large language models are successful in answering factoid questions but are also prone to hallucination.We investigate the phenomenon of LLMs possessing correct answer knowledge yet still hallucinating from the perspective of inference dynamics, an area not previously covered in studies on hallucinations.We are able to conduct this analysis via two key ideas.First, we identify the factual question… ▽ More Large language models are successful in answering factoid questions but are also prone to hallucination.We investigate the phenomenon of LLMs possessing correct answer knowledge yet still hallucinating from the perspective of inference dynamics, an area not previously covered in studies on hallucinations.We are able to conduct this analysis via two key ideas.First, we identify the factual questions that query the same triplet knowledge but result in different answers. The difference between the model behaviors on the correct and incorrect outputs hence suggests the patterns when hallucinations happen. Second, to measure the pattern, we utilize map**s from the residual streams to vocabulary space. We reveal the different dynamics of the output token probabilities along the depths of layers between the correct and hallucinated cases. In hallucinated cases, the output token's information rarely demonstrates abrupt increases and consistent superiority in the later stages of the model. Leveraging the dynamic curve as a feature, we build a classifier capable of accurately detecting hallucinatory predictions with an 88\% success rate. Our study shed light on understanding the reasons for LLMs' hallucinations on their known facts, and more importantly, on accurately predicting when they are hallucinating. △ Less

Submitted 29 March, 2024; originally announced March 2024.

Comments: Accepted by NAACL 2024 MainConference

arXiv:2403.04140 [pdf, other]

Contrastive Augmented Graph2Graph Memory Interaction for Few Shot Continual Learning

Authors: Biqing Qi, Junqi Gao, Xingquan Chen, Dong Li, Jianxing Liu, Ligang Wu, Bowen Zhou

Abstract: Few-Shot Class-Incremental Learning (FSCIL) has gained considerable attention in recent years for its pivotal role in addressing continuously arriving classes. However, it encounters additional challenges. The scarcity of samples in new sessions intensifies overfitting, causing incompatibility between the output features of new and old classes, thereby escalating catastrophic forgetting. A prevale… ▽ More Few-Shot Class-Incremental Learning (FSCIL) has gained considerable attention in recent years for its pivotal role in addressing continuously arriving classes. However, it encounters additional challenges. The scarcity of samples in new sessions intensifies overfitting, causing incompatibility between the output features of new and old classes, thereby escalating catastrophic forgetting. A prevalent strategy involves mitigating catastrophic forgetting through the Explicit Memory (EM), which comprise of class prototypes. However, current EM-based methods retrieves memory globally by performing Vector-to-Vector (V2V) interaction between features corresponding to the input and prototypes stored in EM, neglecting the geometric structure of local features. This hinders the accurate modeling of their positional relationships. To incorporate information of local geometric structure, we extend the V2V interaction to Graph-to-Graph (G2G) interaction. For enhancing local structures for better G2G alignment and the prevention of local feature collapse, we propose the Local Graph Preservation (LGP) mechanism. Additionally, to address sample scarcity in classes from new sessions, the Contrast-Augmented G2G (CAG2G) is introduced to promote the aggregation of same class features thus helps few-shot learning. Extensive comparisons on CIFAR100, CUB200, and the challenging ImageNet-R dataset demonstrate the superiority of our method over existing methods. △ Less

Submitted 6 March, 2024; originally announced March 2024.

Comments: 12 Pages, 5 figures

arXiv:2403.03129 [pdf, other]

CoGenesis: A Framework Collaborating Large and Small Language Models for Secure Context-Aware Instruction Following

Authors: Kaiyan Zhang, Jianyu Wang, Ermo Hua, Biqing Qi, Ning Ding, Bowen Zhou

Abstract: With the advancement of language models (LMs), their exposure to private data is increasingly inevitable, and their deployment (especially for smaller ones) on personal devices, such as PCs and smartphones, has become a prevailing trend. In contexts laden with user information, enabling models to both safeguard user privacy and execute commands efficiently emerges as an essential research imperati… ▽ More With the advancement of language models (LMs), their exposure to private data is increasingly inevitable, and their deployment (especially for smaller ones) on personal devices, such as PCs and smartphones, has become a prevailing trend. In contexts laden with user information, enabling models to both safeguard user privacy and execute commands efficiently emerges as an essential research imperative. In this paper, we propose CoGenesis, a collaborative generation framework integrating large (hosted on cloud infrastructure) and small models (deployed on local devices) to address privacy concerns logically. Initially, we design a pipeline to create personalized writing instruction datasets enriched with extensive context details as the testbed of this research issue. Subsequently, we introduce two variants of CoGenesis based on sketch and logits respectively. Our experimental findings, based on our synthesized dataset and two additional open-source datasets, indicate that: 1) Large-scale models perform well when provided with user context but struggle in the absence of such context. 2) While specialized smaller models fine-tuned on the synthetic dataset show promise, they still lag behind their larger counterparts. 3) Our CoGenesis framework, utilizing mixed-scale models, showcases competitive performance, providing a feasible solution to privacy issues. △ Less

Submitted 6 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

Comments: Accepted to ACL 2024 (Main Conference)

arXiv:2403.02628 [pdf, other]

Interactive Continual Learning: Fast and Slow Thinking

Authors: Biqing Qi, Xingquan Chen, Junqi Gao, Dong Li, Jianxing Liu, Ligang Wu, Bowen Zhou

Abstract: Advanced life forms, sustained by the synergistic interaction of neural cognitive mechanisms, continually acquire and transfer knowledge throughout their lifespan. In contrast, contemporary machine learning paradigms exhibit limitations in emulating the facets of continual learning (CL). Nonetheless, the emergence of large language models (LLMs) presents promising avenues for realizing CL via inte… ▽ More Advanced life forms, sustained by the synergistic interaction of neural cognitive mechanisms, continually acquire and transfer knowledge throughout their lifespan. In contrast, contemporary machine learning paradigms exhibit limitations in emulating the facets of continual learning (CL). Nonetheless, the emergence of large language models (LLMs) presents promising avenues for realizing CL via interactions with these models. Drawing on Complementary Learning System theory, this paper presents a novel Interactive Continual Learning (ICL) framework, enabled by collaborative interactions among models of various sizes. Specifically, we assign the ViT model as System1 and multimodal LLM as System2. To enable the memory module to deduce tasks from class information and enhance Set2Set retrieval, we propose the Class-Knowledge-Task Multi-Head Attention (CKT-MHA). Additionally, to improve memory retrieval in System1 through enhanced geometric representation, we introduce the CL-vMF mechanism, based on the von Mises-Fisher (vMF) distribution. Meanwhile, we introduce the von Mises-Fisher Outlier Detection and Interaction (vMF-ODI) strategy to identify hard examples, thus enhancing collaboration between System1 and System2 for complex reasoning realization. Comprehensive evaluation of our proposed ICL demonstrates significant resistance to forgetting and superior performance relative to existing methods. Code is available at github.com/ICL. △ Less

Submitted 18 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024

arXiv:2402.16397 [pdf, other]

Investigating Deep Watermark Security: An Adversarial Transferability Perspective

Authors: Biqing Qi, Junqi Gao, Yiang Luo, Jianxing Liu, Ligang Wu, Bowen Zhou

Abstract: The rise of generative neural networks has triggered an increased demand for intellectual property (IP) protection in generated content. Deep watermarking techniques, recognized for their flexibility in IP protection, have garnered significant attention. However, the surge in adversarial transferable attacks poses unprecedented challenges to the security of deep watermarking techniques-an area cur… ▽ More The rise of generative neural networks has triggered an increased demand for intellectual property (IP) protection in generated content. Deep watermarking techniques, recognized for their flexibility in IP protection, have garnered significant attention. However, the surge in adversarial transferable attacks poses unprecedented challenges to the security of deep watermarking techniques-an area currently lacking systematic investigation. This study fills this gap by introducing two effective transferable attackers to assess the vulnerability of deep watermarks against erasure and tampering risks. Specifically, we initially define the concept of local sample density, utilizing it to deduce theorems on the consistency of model outputs. Upon discovering that perturbing samples towards high sample density regions (HSDR) of the target class enhances targeted adversarial transferability, we propose the Easy Sample Selection (ESS) mechanism and the Easy Sample Matching Attack (ESMA) method. Additionally, we propose the Bottleneck Enhanced Mixup (BEM) that integrates information bottleneck theory to reduce the generator's dependence on irrelevant noise. Experiments show a significant enhancement in the success rate of targeted transfer attacks for both ESMA and BEM-ESMA methods. We further conduct a comprehensive evaluation using ESMA and BEM-ESMA as measurements, considering model architecture and watermark encoding length, and achieve some impressive findings. △ Less

Submitted 26 February, 2024; originally announced February 2024.

Comments: 18 pages, 8 figures

arXiv:2311.05965 [pdf, other]

Large Language Models are Zero Shot Hypothesis Proposers

Authors: Biqing Qi, Kaiyan Zhang, Haoxiang Li, Kai Tian, Sihang Zeng, Zhang-Ren Chen, Bowen Zhou

Abstract: Significant scientific discoveries have driven the progress of human civilisation. The explosion of scientific literature and data has created information barriers across disciplines that have slowed the pace of scientific discovery. Large Language Models (LLMs) hold a wealth of global and interdisciplinary knowledge that promises to break down these information barriers and foster a new wave of s… ▽ More Significant scientific discoveries have driven the progress of human civilisation. The explosion of scientific literature and data has created information barriers across disciplines that have slowed the pace of scientific discovery. Large Language Models (LLMs) hold a wealth of global and interdisciplinary knowledge that promises to break down these information barriers and foster a new wave of scientific discovery. However, the potential of LLMs for scientific discovery has not been formally explored. In this paper, we start from investigating whether LLMs can propose scientific hypotheses. To this end, we construct a dataset consist of background knowledge and hypothesis pairs from biomedical literature. The dataset is divided into training, seen, and unseen test sets based on the publication date to control visibility. We subsequently evaluate the hypothesis generation capabilities of various top-tier instructed models in zero-shot, few-shot, and fine-tuning settings, including both closed and open-source LLMs. Additionally, we introduce an LLM-based multi-agent cooperative framework with different role designs and external tools to enhance the capabilities related to generating hypotheses. We also design four metrics through a comprehensive review to evaluate the generated hypotheses for both ChatGPT-based and human evaluations. Through experiments and analyses, we arrive at the following findings: 1) LLMs surprisingly generate untrained yet validated hypotheses from testing literature. 2) Increasing uncertainty facilitates candidate generation, potentially enhancing zero-shot hypothesis generation capabilities. These findings strongly support the potential of LLMs as catalysts for new scientific discoveries and guide further exploration. △ Less

Submitted 10 November, 2023; originally announced November 2023.

Comments: Instruction Workshop @ NeurIPS 2023

arXiv:2311.04438 [pdf, other]

Reusing Convolutional Neural Network Models through Modularization and Composition

Authors: Binhang Qi, Hailong Sun, Hongyu Zhang, Xiang Gao

Abstract: With the widespread success of deep learning technologies, many trained deep neural network (DNN) models are now publicly available. However, directly reusing the public DNN models for new tasks often fails due to mismatching functionality or performance. Inspired by the notion of modularization and composition in software reuse, we investigate the possibility of improving the reusability of DNN m… ▽ More With the widespread success of deep learning technologies, many trained deep neural network (DNN) models are now publicly available. However, directly reusing the public DNN models for new tasks often fails due to mismatching functionality or performance. Inspired by the notion of modularization and composition in software reuse, we investigate the possibility of improving the reusability of DNN models in a more fine-grained manner. Specifically, we propose two modularization approaches named CNNSplitter and GradSplitter, which can decompose a trained convolutional neural network (CNN) model for $N$-class classification into $N$ small reusable modules. Each module recognizes one of the $N$ classes and contains a part of the convolution kernels of the trained CNN model. Then, the resulting modules can be reused to patch existing CNN models or build new CNN models through composition. The main difference between CNNSplitter and GradSplitter lies in their search methods: the former relies on a genetic algorithm to explore search space, while the latter utilizes a gradient-based search method. Our experiments with three representative CNNs on three widely-used public datasets demonstrate the effectiveness of the proposed approaches. Compared with CNNSplitter, GradSplitter incurs less accuracy loss, produces much smaller modules (19.88% fewer kernels), and achieves better results on patching weak models. In particular, experiments on GradSplitter show that (1) by patching weak models, the average improvement in terms of precision, recall, and F1-score is 17.13%, 4.95%, and 11.47%, respectively, and (2) for a new task, compared with the models trained from scratch, reusing modules achieves similar accuracy (the average loss of accuracy is only 2.46%) without a costly training process. Our approaches provide a viable solution to the rapid development and improvement of CNN models. △ Less

Submitted 7 November, 2023; originally announced November 2023.

Comments: Accepted by ACM Transactions on Software Engineering and Methodology (TOSEM). arXiv admin note: substantial text overlap with arXiv:2209.06116

arXiv:2310.15477 [pdf, other]

CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model

Authors: Kaiyan Zhang, Ning Ding, Biqing Qi, Xuekai Zhu, Xinwei Long, Bowen Zhou

Abstract: Instruction tuning has recently been recognized as an effective way of aligning Large Language Models (LLMs) to enhance their generalization ability across various tasks. However, when tuning publicly accessible, centralized LLMs with private instruction data, privacy concerns are inevitable. While direct transfer of parameterized modules between models is a plausible approach to address this, its… ▽ More Instruction tuning has recently been recognized as an effective way of aligning Large Language Models (LLMs) to enhance their generalization ability across various tasks. However, when tuning publicly accessible, centralized LLMs with private instruction data, privacy concerns are inevitable. While direct transfer of parameterized modules between models is a plausible approach to address this, its implications and effectiveness need further exploration. This paper focuses on Offsite-Tuning (OFT), a representative technique that transfers transformer blocks between centralized LLMs and downstream emulators. Given the limited understanding of the underlying mechanism of OFT, we perform an empirical analysis on LLMs from the perspectives of representation and functional similarity. Interestingly, our findings reveal a unique modular structure within the layers of LLMs that appears to emerge as the model size expands. Simultaneously, we note subtle but potentially significant changes in representation and intermediate predictions across the layers. Inspired by these observations, we propose CRaSh, involving Clustering, Removing, and Sharing, a training-free strategy to derive improved emulators from LLMs. CRaSh significantly boosts performance of OFT with billions of parameters. Furthermore, we investigate the optimal solutions yielded by fine-tuning with and without full model through the lens of loss landscape. Our findings demonstrate a linear connectivity among these optima falling over the same basin, thereby highlighting the effectiveness of CRaSh and OFT. The source code is publicly available at https://github.com/TsinghuaC3I/CRaSh. △ Less

Submitted 23 October, 2023; originally announced October 2023.

Comments: Accepted to EMNLP 2023 (Main Conference)

arXiv:2306.09376 [pdf, other]

Modularizing while Training: A New Paradigm for Modularizing DNN Models

Authors: Binhang Qi, Hailong Sun, Hongyu Zhang, Ruobing Zhao, Xiang Gao

Abstract: Deep neural network (DNN) models have become increasingly crucial components in intelligent software systems. However, training a DNN model is typically expensive in terms of both time and money. To address this issue, researchers have recently focused on reusing existing DNN models - borrowing the idea of code reuse in software engineering. However, reusing an entire model could cause extra overh… ▽ More Deep neural network (DNN) models have become increasingly crucial components in intelligent software systems. However, training a DNN model is typically expensive in terms of both time and money. To address this issue, researchers have recently focused on reusing existing DNN models - borrowing the idea of code reuse in software engineering. However, reusing an entire model could cause extra overhead or inherits the weakness from the undesired functionalities. Hence, existing work proposes to decompose an already trained model into modules, i.e., modularizing-after-training, and enable module reuse. Since trained models are not built for modularization, modularizing-after-training incurs huge overhead and model accuracy loss. In this paper, we propose a novel approach that incorporates modularization into the model training process, i.e., modularizing-while-training (MwT). We train a model to be structurally modular through two loss functions that optimize intra-module cohesion and inter-module coupling. We have implemented the proposed approach for modularizing Convolutional Neural Network (CNN) models in this work. The evaluation results on representative models demonstrate that MwT outperforms the state-of-the-art approach. Specifically, the accuracy loss caused by MwT is only 1.13 percentage points, which is 1.76 percentage points less than that of the baseline. The kernel retention rate of the modules generated by MwT is only 14.58%, with a reduction of 74.31% over the state-of-the-art approach. Furthermore, the total time cost required for training and modularizing is only 108 minutes, half of the baseline. △ Less

Submitted 5 October, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: Accepted at ICSE'24

arXiv:2305.13888 [pdf, other]

PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning

Authors: Xuekai Zhu, Biqing Qi, Kaiyan Zhang, Xinwei Long, Zhouhan Lin, Bowen Zhou

Abstract: While large language models (LLMs) excel in various natural language processing tasks, their huge size and the inaccessibility of parameters present challenges for practical deployment. Previous studies try to distill task-specific ability from LLMs to smaller models, using data synthesis and chain-of-thought (CoT) fine-tuning. However, synthetic CoT data often contains faulty reasoning, which det… ▽ More While large language models (LLMs) excel in various natural language processing tasks, their huge size and the inaccessibility of parameters present challenges for practical deployment. Previous studies try to distill task-specific ability from LLMs to smaller models, using data synthesis and chain-of-thought (CoT) fine-tuning. However, synthetic CoT data often contains faulty reasoning, which deteriorates the quality of distillation, especially in reasoning capabilities. In this work, we propose Program-aided Distillation (PaD), which introduces reasoning programs to suppress the errors in distilled data, and thus achieves better distillation quality for reasoning tasks. In PaD, we utilize the reasoning program to substitute the CoT, allowing automated error checking of synthetic data. Further, through error injecting and further training, the small distilling model could iteratively self-refine the reasoning. Moreover, we conduct a step-wise beam search by step-by-step verifying to acquire more exact reasoning chains. We evaluate PaD on arithmetic reasoning, symbolic reasoning, and general ability. Experimental results demonstrate that smaller models using PaD can not only outperform certain LLMs~(e.g., LLaMA-1 13B) but also achieve strong improvement over baselines with a significantly smaller scale of parameters and data. The source code is publicly available at https://github.com/Xuekai-Zhu/pad. △ Less

Submitted 20 March, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: NAACL 2024 Long Paper; Code and data are available at https://github.com/Xuekai-Zhu/pad

arXiv:2304.00245 [pdf, other]

Reusing Deep Neural Network Models through Model Re-engineering

Authors: Binhang Qi, Hailong Sun, Xiang Gao, Hongyu Zhang, Zhaotian Li, Xudong Liu

Abstract: Training deep neural network (DNN) models, which has become an important task in today's software development, is often costly in terms of computational resources and time. With the inspiration of software reuse, building DNN models through reusing existing ones has gained increasing attention recently. Prior approaches to DNN model reuse have two main limitations: 1) reusing the entire model, whi… ▽ More Training deep neural network (DNN) models, which has become an important task in today's software development, is often costly in terms of computational resources and time. With the inspiration of software reuse, building DNN models through reusing existing ones has gained increasing attention recently. Prior approaches to DNN model reuse have two main limitations: 1) reusing the entire model, while only a small part of the model's functionalities (labels) are required, would cause much overhead (e.g., computational and time costs for inference), and 2) model reuse would inherit the defects and weaknesses of the reused model, and hence put the new system under threats of security attack. To solve the above problem, we propose SeaM, a tool that re-engineers a trained DNN model to improve its reusability. Specifically, given a target problem and a trained model, SeaM utilizes a gradient-based search method to search for the model's weights that are relevant to the target problem. The re-engineered model that only retains the relevant weights is then reused to solve the target problem. Evaluation results on widely-used models show that the re-engineered models produced by SeaM only contain 10.11% weights of the original models, resulting 42.41% reduction in terms of inference time. For the target problem, the re-engineered models even outperform the original models in classification accuracy by 5.85%. Moreover, reusing the re-engineered models inherits an average of 57% fewer defects than reusing the entire model. We believe our approach to reducing reuse overhead and defect inheritance is one important step forward for practical model reuse. △ Less

Submitted 29 July, 2023; v1 submitted 1 April, 2023; originally announced April 2023.

Comments: Accepted by ICSE'23

arXiv:2303.06759 [pdf, other]

New Approximation Algorithms for Touring Regions

Authors: Benjamin Qi, Richard Qi, Xinyang Chen

Abstract: We analyze the touring regions problem: find a ($1+ε$)-approximate Euclidean shortest path in $d$-dimensional space that starts at a given starting point, ends at a given ending point, and visits given regions $R_1, R_2, R_3, \dots, R_n$ in that order. Our main result is an $\mathcal O \left(\frac{n}{\sqrtε}\log{\frac{1}ε} + \frac{1}ε \right)$-time algorithm for touring disjoint disks. We also g… ▽ More We analyze the touring regions problem: find a ($1+ε$)-approximate Euclidean shortest path in $d$-dimensional space that starts at a given starting point, ends at a given ending point, and visits given regions $R_1, R_2, R_3, \dots, R_n$ in that order. Our main result is an $\mathcal O \left(\frac{n}{\sqrtε}\log{\frac{1}ε} + \frac{1}ε \right)$-time algorithm for touring disjoint disks. We also give an $\mathcal O\left (\min\left(\frac{n}ε, \frac{n^2}{\sqrt ε}\right) \right)$-time algorithm for touring disjoint two-dimensional convex fat bodies. Both of these results naturally generalize to larger dimensions; we obtain $\mathcal O\left(\frac{n}{ε^{d-1}}\log^2\frac{1}ε+\frac{1}{ε^{2d-2}}\right)$ and $\mathcal O\left(\frac{n}{ε^{2d-2}}\right)$-time algorithms for touring disjoint $d$-dimensional balls and convex fat bodies, respectively. △ Less

Submitted 13 March, 2023; v1 submitted 12 March, 2023; originally announced March 2023.

Comments: to appear in SOCG 2023. V2 - fixed figures

arXiv:2302.11838 [pdf, other]

Minimum-Entropy Coupling Approximation Guarantees Beyond the Majorization Barrier

Authors: Spencer Compton, Dmitriy Katz, Benjamin Qi, Kristjan Greenewald, Murat Kocaoglu

Abstract: Given a set of discrete probability distributions, the minimum entropy coupling is the minimum entropy joint distribution that has the input distributions as its marginals. This has immediate relevance to tasks such as entropic causal inference for causal graph discovery and bounding mutual information between variables that we observe separately. Since finding the minimum entropy coupling is NP-H… ▽ More Given a set of discrete probability distributions, the minimum entropy coupling is the minimum entropy joint distribution that has the input distributions as its marginals. This has immediate relevance to tasks such as entropic causal inference for causal graph discovery and bounding mutual information between variables that we observe separately. Since finding the minimum entropy coupling is NP-Hard, various works have studied approximation algorithms. The work of [Compton, ISIT 2022] shows that the greedy coupling algorithm of [Kocaoglu et al., AAAI 2017] is always within $log_2(e) \approx 1.44$ bits of the optimal coupling. Moreover, they show that it is impossible to obtain a better approximation guarantee using the majorization lower-bound that all prior works have used: thus establishing a majorization barrier. In this work, we break the majorization barrier by designing a stronger lower-bound that we call the profile method. Using this profile method, we are able to show that the greedy algorithm is always within $log_2(e)/e \approx 0.53$ bits of optimal for coupling two distributions (previous best-known bound is within 1 bit), and within $(1 + log_2(e))/2 \approx 1.22$ bits for coupling any number of distributions (previous best-known bound is within 1.44 bits). We also examine a generalization of the minimum entropy coupling problem: Concave Minimum-Cost Couplings. We are able to obtain similar guarantees for this generalization in terms of the concave cost function. Additionally, we make progress on the open problem of [Kovačević et al., Inf. Comput. 2015] regarding NP membership of the minimum entropy coupling problem by showing that any hardness of minimum entropy coupling beyond NP comes from the difficulty of computing arithmetic in the complexity class NP. Finally, we present exponential-time algorithms for computing the exactly optimal solution. △ Less

Submitted 23 February, 2023; originally announced February 2023.

Comments: AISTATS 2023

arXiv:2211.14163 [pdf]

doi 10.1109/TOH.2022.3151673

Contactless Haptic Display Through Magnetic Field Control

Authors: Xiong Lu, Yuxing Yan, Beibei Qi, Huang Qian, Junbin Sun, Aaron Quigley

Abstract: Haptic rendering enables people to touch, perceive, and manipulate virtual objects in a virtual environment. Using six cascaded identical hollow disk electromagnets and a small permanent magnet attached to an operator's finger, this paper proposes and develops an untethered haptic interface through magnetic field control. The concentric hole inside the six cascaded electromagnets provides the work… ▽ More Haptic rendering enables people to touch, perceive, and manipulate virtual objects in a virtual environment. Using six cascaded identical hollow disk electromagnets and a small permanent magnet attached to an operator's finger, this paper proposes and develops an untethered haptic interface through magnetic field control. The concentric hole inside the six cascaded electromagnets provides the workspace, where the 3D position of the permanent magnet is tracked with a Microsoft Kinect sensor. The driving currents of six cascaded electromagnets are calculated in real-time for generating the desired magnetic force. Offline data from an FEA (finite element analysis) based simulation, determines the relationship between the magnetic force, the driving currents, and the position of the permanent magnet. A set of experiments including the virtual object recognition experiment, the virtual surface identification experiment, and the user perception evaluation experiment were conducted to demonstrate the proposed system, where Microsoft HoloLens holographic glasses are used for visual rendering. The proposed magnetic haptic display leads to an untethered and non-contact interface for natural haptic rendering applications, which overcomes the constraints of mechanical linkages in tool-based traditional haptic devices. △ Less

Submitted 25 November, 2022; originally announced November 2022.

Journal ref: in IEEE Transactions on Haptics, vol. 15, no. 2, pp. 328-338, 1 April-June 2022

arXiv:2209.06116 [pdf, other]

doi 10.1145/3551349.3561153

Patching Weak Convolutional Neural Network Models through Modularization and Composition

Authors: Binhang Qi, Hailong Sun, Xiang Gao, Hongyu Zhang

Abstract: Despite great success in many applications, deep neural networks are not always robust in practice. For instance, a convolutional neuron network (CNN) model for classification tasks often performs unsatisfactorily in classifying some particular classes of objects. In this work, we are concerned with patching the weak part of a CNN model instead of improving it through the costly retraining of the… ▽ More Despite great success in many applications, deep neural networks are not always robust in practice. For instance, a convolutional neuron network (CNN) model for classification tasks often performs unsatisfactorily in classifying some particular classes of objects. In this work, we are concerned with patching the weak part of a CNN model instead of improving it through the costly retraining of the entire model. Inspired by the fundamental concepts of modularization and composition in software engineering, we propose a compressed modularization approach, CNNSplitter, which decomposes a strong CNN model for $N$-class classification into $N$ smaller CNN modules. Each module is a sub-model containing a part of the convolution kernels of the strong model. To patch a weak CNN model that performs unsatisfactorily on a target class (TC), we compose the weak CNN model with the corresponding module obtained from a strong CNN model. The ability of the weak CNN model to recognize the TC can thus be improved through patching. Moreover, the ability to recognize non-TCs is also improved, as the samples misclassified as TC could be classified as non-TCs correctly. Experimental results with two representative CNNs on three widely-used datasets show that the averaged improvement on the TC in terms of precision and recall are 12.54% and 2.14%, respectively. Moreover, patching improves the accuracy of non-TCs by 1.18%. The results demonstrate that CNNSplitter can patch a weak CNN model through modularization and composition, thus providing a new solution for develo** robust CNN models. △ Less

Submitted 29 July, 2023; v1 submitted 11 September, 2022; originally announced September 2022.

Comments: Accepted at ASE'22

arXiv:2208.10694 [pdf, other]

Spiral Contrastive Learning: An Efficient 3D Representation Learning Method for Unannotated CT Lesions

Authors: Penghua Zhai, Enwei Zhu, Baolian Qi, Xin Wei, **peng Li

Abstract: Computed tomography (CT) samples with pathological annotations are difficult to obtain. As a result, the computer-aided diagnosis (CAD) algorithms are trained on small datasets (e.g., LIDC-IDRI with 1,018 samples), limiting their accuracies and reliability. In the past five years, several works have tailored for unsupervised representations of CT lesions via two-dimensional (2D) and three-dimensio… ▽ More Computed tomography (CT) samples with pathological annotations are difficult to obtain. As a result, the computer-aided diagnosis (CAD) algorithms are trained on small datasets (e.g., LIDC-IDRI with 1,018 samples), limiting their accuracies and reliability. In the past five years, several works have tailored for unsupervised representations of CT lesions via two-dimensional (2D) and three-dimensional (3D) self-supervised learning (SSL) algorithms. The 2D algorithms have difficulty capturing 3D information, and existing 3D algorithms are computationally heavy. Light-weight 3D SSL remains the boundary to explore. In this paper, we propose the spiral contrastive learning (SCL), which yields 3D representations in a computationally efficient manner. SCL first transforms 3D lesions to the 2D plane using an information-preserving spiral transformation, and then learn transformation-invariant features using 2D contrastive learning. For the augmentation, we consider natural image augmentations and medical image augmentations. We evaluate SCL by training a classification head upon the embedding layer. Experimental results show that SCL achieves state-of-the-art accuracy on LIDC-IDRI (89.72%), LNDb (82.09%) and TianChi (90.16%) for unsupervised representation learning. With 10% annotated data for fine-tune, the performance of SCL is comparable to that of supervised learning algorithms (85.75% vs. 85.03% on LIDC-IDRI, 78.20% vs. 73.44% on LNDb and 87.85% vs. 83.34% on TianChi, respectively). Meanwhile, SCL reduces the computational effort by 66.98% compared to other 3D SSL algorithms, demonstrating the efficiency of the proposed method in unsupervised pre-training. △ Less

Submitted 22 August, 2022; originally announced August 2022.

arXiv:2207.00251 [pdf, other]

Computer-aided Tuberculosis Diagnosis with Attribute Reasoning Assistance

Authors: Chengwei Pan, Gangming Zhao, Junjie Fang, Baolian Qi, Jiaheng Liu, Chaowei Fang, Dingwen Zhang, **peng Li, Yizhou Yu

Abstract: Although deep learning algorithms have been intensively developed for computer-aided tuberculosis diagnosis (CTD), they mainly depend on carefully annotated datasets, leading to much time and resource consumption. Weakly supervised learning (WSL), which leverages coarse-grained labels to accomplish fine-grained tasks, has the potential to solve this problem. In this paper, we first propose a new l… ▽ More Although deep learning algorithms have been intensively developed for computer-aided tuberculosis diagnosis (CTD), they mainly depend on carefully annotated datasets, leading to much time and resource consumption. Weakly supervised learning (WSL), which leverages coarse-grained labels to accomplish fine-grained tasks, has the potential to solve this problem. In this paper, we first propose a new large-scale tuberculosis (TB) chest X-ray dataset, namely the tuberculosis chest X-ray attribute dataset (TBX-Att), and then establish an attribute-assisted weakly-supervised framework to classify and localize TB by leveraging the attribute information to overcome the insufficiency of supervision in WSL scenarios. Specifically, first, the TBX-Att dataset contains 2000 X-ray images with seven kinds of attributes for TB relational reasoning, which are annotated by experienced radiologists. It also includes the public TBX11K dataset with 11200 X-ray images to facilitate weakly supervised detection. Second, we exploit a multi-scale feature interaction model for TB area classification and detection with attribute relational reasoning. The proposed model is evaluated on the TBX-Att dataset and will serve as a solid baseline for future research. The code and data will be available at https://github.com/GangmingZhao/tb-attribute-weak-localization. △ Less

Submitted 1 July, 2022; originally announced July 2022.

Comments: Provisionally Accepted for Medical Image Computing and Computer Assisted Interventions 2022 (MICCAI 2022). arXiv admin note: text overlap with arXiv:2010.04483

arXiv:2205.15874 [pdf, ps, other]

On Maximizing Sums of Non-monotone Submodular and Linear Functions

Authors: Benjamin Qi

Abstract: We study the problem of Regularized Unconstrained Submodular Maximization (RegularizedUSM) as defined by Bodek and Feldman [BF22]. In this problem, you are given a non-monotone non-negative submodular function $f:2^{\mathcal N}\to \mathbb R_{\ge 0}$ and a linear function $\ell:2^{\mathcal N}\to \mathbb R$ over the same ground set $\mathcal N$, and the objective is to output a set… ▽ More We study the problem of Regularized Unconstrained Submodular Maximization (RegularizedUSM) as defined by Bodek and Feldman [BF22]. In this problem, you are given a non-monotone non-negative submodular function $f:2^{\mathcal N}\to \mathbb R_{\ge 0}$ and a linear function $\ell:2^{\mathcal N}\to \mathbb R$ over the same ground set $\mathcal N$, and the objective is to output a set $T\subseteq \mathcal N$ approximately maximizing the sum $f(T)+\ell(T)$. Specifically, an algorithm is said to provide an $(α,β)$-approximation for RegularizedUSM if it outputs a set $T$ such that $\mathbb E[f(T)+\ell(T)]\ge \max_{S\subseteq \mathcal N}[α\cdot f(S)+β\cdot \ell(S)]$. We also study the setting where $S$ and $T$ are subject to a matroid constraint, which we refer to as Regularized Constrained Submodular Maximization (RegularizedCSM). For both RegularizedUSM and RegularizedCSM, we provide improved $(α,β)$-approximation algorithms for the cases of non-positive $\ell$, non-negative $\ell$, and unconstrained $\ell$. In particular, for the case of unconstrained $\ell$, we are the first to provide nontrivial $(α,β)$-approximations for RegularizedCSM, and the $α$ we obtain for RegularizedUSM is superior to that of [BF22] for all $β\in (0,1)$. In addition to approximation algorithms, we provide improved inapproximability results for all of the aforementioned cases. In particular, we show that the $α$ our algorithm obtains for RegularizedCSM with unconstrained $\ell$ is tight for $β\ge \frac{e}{e+1}$. We also show 0.478-inapproximability for maximizing a submodular function where $S$ and $T$ are subject to a cardinality constraint, improving the long-standing 0.491-inapproximability result due to Gharan and Vondrak [GV10]. △ Less

Submitted 31 May, 2022; originally announced May 2022.

Comments: 38 pages, 5 figures

ACM Class: F.2.2; G.2.1

arXiv:2109.12506 [pdf, other]

A Simple Self-calibration Method for The Internal Time Synchronization of MEMS LiDAR

Authors: Yu Zhang, Xiaoguang Di, Shiyu Yan, Bin Zhang, Baoling Qi, Chunhui Wang

Abstract: This paper proposes a simple self-calibration method for the internal time synchronization of MEMS(Micro-electromechanical systems) LiDAR during research and development. Firstly, we introduced the problem of internal time misalignment in MEMS lidar. Then, a robust Minimum Vertical Gradient(MVG) prior is proposed to calibrate the time difference between the laser and MEMS mirror, which can be calc… ▽ More This paper proposes a simple self-calibration method for the internal time synchronization of MEMS(Micro-electromechanical systems) LiDAR during research and development. Firstly, we introduced the problem of internal time misalignment in MEMS lidar. Then, a robust Minimum Vertical Gradient(MVG) prior is proposed to calibrate the time difference between the laser and MEMS mirror, which can be calculated automatically without any artificial participation or specially designed cooperation target. Finally, actual experiments on MEMS LiDARs are implemented to demonstrate the effectiveness of the proposed method. It should be noted that the calibration can be implemented in a simple laboratory environment without any ranging equipment and artificial participation, which greatly accelerate the progress of research and development in practical applications. △ Less

Submitted 26 September, 2021; originally announced September 2021.

Comments: 9 pages, 8 figures,

ACM Class: I.4.5; J.2

arXiv:2107.06442 [pdf, other]

GREN: Graph-Regularized Embedding Network for Weakly-Supervised Disease Localization in X-ray Images

Authors: Baolian Qi, Gangming Zhao, Xin Wei, Changde Du, Chengwei Pan, Yizhou Yu, **peng Li

Abstract: Locating diseases in chest X-ray images with few careful annotations saves large human effort. Recent works approached this task with innovative weakly-supervised algorithms such as multi-instance learning (MIL) and class activation maps (CAM), however, these methods often yield inaccurate or incomplete regions. One of the reasons is the neglection of the pathological implications hidden in the re… ▽ More Locating diseases in chest X-ray images with few careful annotations saves large human effort. Recent works approached this task with innovative weakly-supervised algorithms such as multi-instance learning (MIL) and class activation maps (CAM), however, these methods often yield inaccurate or incomplete regions. One of the reasons is the neglection of the pathological implications hidden in the relationship across anatomical regions within each image and the relationship across images. In this paper, we argue that the cross-region and cross-image relationship, as contextual and compensating information, is vital to obtain more consistent and integral regions. To model the relationship, we propose the Graph Regularized Embedding Network (GREN), which leverages the intra-image and inter-image information to locate diseases on chest X-ray images. GREN uses a pre-trained U-Net to segment the lung lobes, and then models the intra-image relationship between the lung lobes using an intra-image graph to compare different regions. Meanwhile, the relationship between in-batch images is modeled by an inter-image graph to compare multiple images. This process mimics the training and decision-making process of a radiologist: comparing multiple regions and images for diagnosis. In order for the deep embedding layers of the neural network to retain structural information (important in the localization task), we use the Hash coding and Hamming distance to compute the graphs, which are used as regularizers to facilitate training. By means of this, our approach achieves the state-of-the-art result on NIH chest X-ray dataset for weakly-supervised disease localization. Our codes are accessible online (https://github.com/qibaolian/GREN). △ Less

Submitted 4 August, 2022; v1 submitted 13 July, 2021; originally announced July 2021.

Comments: Accepted in IEEE Journal of Biomedical and Health Informatics (JBHI)

arXiv:2105.00625 [pdf, ps, other]

Three-Party Integer Comparison and Applications

Authors: Jie Ma, Bin Qi, Kewei Lv

Abstract: Secure integer comparison has been a popular research topic in cryptography, both for its simplicity to describe and for its applications. The aim is to enable two parties to compare their inputs without revealing the exact value of those inputs. In this paper, we highlight three-party integer comparison (TPIC), where a \emph{judge}, with no private input, wants to know the comparison result, wh… ▽ More Secure integer comparison has been a popular research topic in cryptography, both for its simplicity to describe and for its applications. The aim is to enable two parties to compare their inputs without revealing the exact value of those inputs. In this paper, we highlight three-party integer comparison (TPIC), where a \emph{judge}, with no private input, wants to know the comparison result, while two \emph{competitors} hold secret integers to do privacy-preserving comparison. The judge actively obtains the result rather than passively waiting for it sent by a competitor. We give two TPIC constructions considering \emph{Mixed adversaries}, who have with different capabilities. One is secure against a semi-honest adversary with low computation and communication cost, while the other is secure against a malicious adversary. Basing on TPIC, we present multi-party comparisons through concrete applications, including a joint bidding scheme and a practical auction. Brief security proofs and analysis for the applications are presented. In comparison, our auction scheme is more efficient with lower cost, making it feasible in practice rather than a theoretical design. All the comparisons and application schemes run on top of blockchain requiring a constant number of rounds. △ Less

Submitted 3 May, 2021; originally announced May 2021.

arXiv:2101.08992 [pdf, other]

Cross Chest Graph for Disease Diagnosis with Structural Relational Reasoning

Authors: Gangming Zhao, Baolian Qi, **peng Li

Abstract: Locating lesions is important in the computer-aided diagnosis of X-ray images. However, box-level annotation is time-consuming and laborious. How to locate lesions accurately with few, or even without careful annotations is an urgent problem. Although several works have approached this problem with weakly-supervised methods, the performance needs to be improved. One obstacle is that general weakly… ▽ More Locating lesions is important in the computer-aided diagnosis of X-ray images. However, box-level annotation is time-consuming and laborious. How to locate lesions accurately with few, or even without careful annotations is an urgent problem. Although several works have approached this problem with weakly-supervised methods, the performance needs to be improved. One obstacle is that general weakly-supervised methods have failed to consider the characteristics of X-ray images, such as the highly-structural attribute. We therefore propose the Cross-chest Graph (CCG), which improves the performance of automatic lesion detection by imitating doctor's training and decision-making process. CCG models the intra-image relationship between different anatomical areas by leveraging the structural information to simulate the doctor's habit of observing different areas. Meanwhile, the relationship between any pair of images is modeled by a knowledge-reasoning module to simulate the doctor's habit of comparing multiple images. We integrate intra-image and inter-image information into a unified end-to-end framework. Experimental results on the NIH Chest-14 database (112,120 frontal-view X-ray images with 14 diseases) demonstrate that the proposed method achieves state-of-the-art performance in weakly-supervised localization of lesions by absorbing professional knowledge in the medical field. △ Less

Submitted 1 February, 2021; v1 submitted 22 January, 2021; originally announced January 2021.

arXiv:2011.14206 [pdf, other]

AdaGrasp: Learning an Adaptive Gripper-Aware Gras** Policy

Authors: Zhenjia Xu, Beichun Qi, Shubham Agrawal, Shuran Song

Abstract: This paper aims to improve robots' versatility and adaptability by allowing them to use a large variety of end-effector tools and quickly adapt to new tools. We propose AdaGrasp, a method to learn a single gras** policy that generalizes to novel grippers. By training on a large collection of grippers, our algorithm is able to acquire generalizable knowledge of how different grippers should be us… ▽ More This paper aims to improve robots' versatility and adaptability by allowing them to use a large variety of end-effector tools and quickly adapt to new tools. We propose AdaGrasp, a method to learn a single gras** policy that generalizes to novel grippers. By training on a large collection of grippers, our algorithm is able to acquire generalizable knowledge of how different grippers should be used in various tasks. Given a visual observation of the scene and the gripper, AdaGrasp infers the possible grasp poses and their grasp scores by computing the cross convolution between the shape encodings of the gripper and scene. Intuitively, this cross convolution operation can be considered as an efficient way of exhaustively matching the scene geometry with gripper geometry under different grasp poses (i.e., translations and orientations), where a good "match" of 3D geometry will lead to a successful grasp. We validate our methods in both simulation and real-world environments. Our experiment shows that AdaGrasp significantly outperforms the existing multi-gripper gras** policy method, especially when handling cluttered environments and partial observations. Video is available at https://youtu.be/kknTYTbORfs △ Less

Submitted 13 March, 2021; v1 submitted 28 November, 2020; originally announced November 2020.

Comments: ICRA 2021. Project page: https://adagrasp.cs.columbia.edu

arXiv:1801.07397 [pdf, other]

doi 10.1109/JIOT.2018.2866164

HCIC: Hardware-assisted Control-flow Integrity Checking

Authors: Jiliang Zhang, Binhang Qi, Gang Qu

Abstract: Recently, code reuse attacks (CRAs), such as return-oriented programming (ROP) and jump-oriented programming (JOP), have emerged as a new class of ingenious security threatens. Attackers can utilize CRAs to hijack the control flow of programs to perform malicious actions without injecting any codes. Many defenses, classed into software-based and hardware-based, have been proposed. However, softwar… ▽ More Recently, code reuse attacks (CRAs), such as return-oriented programming (ROP) and jump-oriented programming (JOP), have emerged as a new class of ingenious security threatens. Attackers can utilize CRAs to hijack the control flow of programs to perform malicious actions without injecting any codes. Many defenses, classed into software-based and hardware-based, have been proposed. However, software-based methods are difficult to be deployed in practical systems due to high performance overhead. Hardware-based methods can reduce performance overhead but may require extending instruction set architectures (ISAs) and modifying compiler or suffer the vulnerability of key leakage. To tackle these issues, this paper proposes a new hardware-based control flow checking method to resist CRAs with negligible performance overhead without extending ISAs, modifying compiler and leaking the encryption/decryption key. The key technique involves two control flow checking mechanisms. The first one is the encrypted Hamming distances (EHDs) matching between the physical unclonable function (PUF) response and the return addresses, which prevents attackers from returning between gadgets so long as the PUF response is secret, thus resisting ROP attacks. The second one is the liner encryption/decryption operation (XOR) between PUF response and the instructions at target addresses of call and jmp instructions to defeat JOP attacks. Advanced return-based full-function reuse attacks will be prevented with the dynamic key-updating method. Experimental evaluations on benchmarks demonstrate that the proposed method introduces negligible 0.95% run-time overhead and 0.78% binary size overhead on average. △ Less

Submitted 19 September, 2018; v1 submitted 23 January, 2018; originally announced January 2018.

Comments: 14 pages

Journal ref: IEEE Internet of Things Journal.(2018) 1-14

arXiv:1607.08193 [pdf, other]

doi 10.1103/PhysRevA.94.032315

Loss-tolerant quantum secure positioning with weak laser sources

Authors: Charles Ci Wen Lim, Feihu Xu, George Siopsis, Eric Chitambar, Philip G. Evans, Bing Qi

Abstract: Quantum position verification (QPV) is the art of verifying the geographical location of an untrusted party. Recently, it has been shown that the widely studied Bennett & Brassard 1984 (BB84) QPV protocol is insecure after the 3 dB loss point assuming local operations and classical communication (LOCC) adversaries. Here, we propose a time-reversed entanglement swap** QPV protocol (based on measu… ▽ More Quantum position verification (QPV) is the art of verifying the geographical location of an untrusted party. Recently, it has been shown that the widely studied Bennett & Brassard 1984 (BB84) QPV protocol is insecure after the 3 dB loss point assuming local operations and classical communication (LOCC) adversaries. Here, we propose a time-reversed entanglement swap** QPV protocol (based on measurement-device-independent quantum cryptography) that is highly robust against quantum channel loss. First, assuming ideal qubit sources, we show that the protocol is secure against LOCC adversaries for any quantum channel loss, thereby overcoming the 3 dB loss limit. Then, we analyze the security of the protocol in a more practical setting involving weak laser sources and linear optics. In this setting, we find that the security only degrades by an additive constant and the protocol is able to verify positions up to 47 dB channel loss. △ Less

Submitted 27 July, 2016; originally announced July 2016.

Comments: 11 pages, 3 figures. Partially based on an earlier work in arXiv:1510.04891

Journal ref: Phys. Rev. A 94, 032315 (2016)

arXiv:1303.6849 [pdf]

doi 10.1088/1674-1137/37/10/106102

A Fast Improved Fat Tree Encoder for Wave Union TDC in an FPGA

Authors: Qi Shen, Lei Zhao, Shubin Liu, Shengkai Liao, Binxiang Qi, Xueye Hu, Chengzhi Peng, Qi An

Abstract: Up to the present, the wave union method can achieve the best timing performance in FPGA based TDC designs. However, it should be guaranteed in such a structure that the non-thermometer code to binary code (NTH2B) encoding process should be finished within just one system clock cycle. So the implementation of the NTH2B encoder is quite challenging considering the high speed requirement. Besides, t… ▽ More Up to the present, the wave union method can achieve the best timing performance in FPGA based TDC designs. However, it should be guaranteed in such a structure that the non-thermometer code to binary code (NTH2B) encoding process should be finished within just one system clock cycle. So the implementation of the NTH2B encoder is quite challenging considering the high speed requirement. Besides, the high resolution wave union TDC also demands the encoder to convert an ultra-wide input code to a binary code. We present a fast improved fat tree encoder (IFTE) to fulfill such requirements, in which bubble error suppression is also integrated. With this encoder scheme, a wave union TDC with 7.7 ps RMS and 3.8 ps effective bin size was implemented in an FPGA from Xilinx Virtex 5 family. An encoding time of 8.33 ns was achieved for a 276-bit non-thermometer code to a 9-bit binary code conversion. We conducted a series of tests on the oscillating period of the wave union launcher, as well as the overall performance of the TDC; test results indicate that the IFTE works well. In fact, in the implementation of this encoder, no manual routing or special constrains were required; therefore, this IFTE structure could also be further applied in other delay chain based FPGA TDCs. △ Less

Submitted 27 March, 2013; originally announced March 2013.

Comments: Submitted to "Chinese Physics C"

arXiv:1207.1473 [pdf, ps, other]

doi 10.1103/PhysRevA.87.062327

Postprocessing for quantum random number generators: entropy evaluation and randomness extraction

Authors: Xiongfeng Ma, Feihu Xu, He Xu, Xiaoqing Tan, Bing Qi, Hoi-Kwong Lo

Abstract: Quantum random-number generators (QRNGs) can offer a means to generate information-theoretically provable random numbers, in principle. In practice, unfortunately, the quantum randomness is inevitably mixed with classical randomness due to classical noises. To distill this quantum randomness, one needs to quantify the randomness of the source and apply a randomness extractor. Here, we propose a ge… ▽ More Quantum random-number generators (QRNGs) can offer a means to generate information-theoretically provable random numbers, in principle. In practice, unfortunately, the quantum randomness is inevitably mixed with classical randomness due to classical noises. To distill this quantum randomness, one needs to quantify the randomness of the source and apply a randomness extractor. Here, we propose a generic framework for evaluating quantum randomness of real-life QRNGs by min-entropy, and apply it to two different existing quantum random-number systems in the literature. Moreover, we provide a guideline of QRNG data postprocessing for which we implement two information-theoretically provable randomness extractors: Toeplitz-hashing extractor and Trevisan's extractor. △ Less

Submitted 21 June, 2013; v1 submitted 5 July, 2012; originally announced July 2012.

Comments: 13 pages, 2 figures

Journal ref: Phys. Rev. A 87, 062327 (2013)

arXiv:quant-ph/0601115 [pdf, ps, other]

doi 10.1103/PhysRevA.75.032314

Phase-Remap** Attack in Practical Quantum Key Distribution Systems

Authors: Chi-Hang Fred Fung, Bing Qi, Kiyoshi Tamaki, Hoi-Kwong Lo

Abstract: Quantum key distribution (QKD) can be used to generate secret keys between two distant parties. Even though QKD has been proven unconditionally secure against eavesdroppers with unlimited computation power, practical implementations of QKD may contain loopholes that may lead to the generated secret keys being compromised. In this paper, we propose a phase-remap** attack targeting two practical… ▽ More Quantum key distribution (QKD) can be used to generate secret keys between two distant parties. Even though QKD has been proven unconditionally secure against eavesdroppers with unlimited computation power, practical implementations of QKD may contain loopholes that may lead to the generated secret keys being compromised. In this paper, we propose a phase-remap** attack targeting two practical bidirectional QKD systems (the "plug & play" system and the Sagnac system). We showed that if the users of the systems are unaware of our attack, the final key shared between them can be compromised in some situations. Specifically, we showed that, in the case of the Bennett-Brassard 1984 (BB84) protocol with ideal single-photon sources, when the quantum bit error rate (QBER) is between 14.6% and 20%, our attack renders the final key insecure, whereas the same range of QBER values has been proved secure if the two users are unaware of our attack; also, we demonstrated three situations with realistic devices where positive key rates are obtained without the consideration of Trojan horse attacks but in fact no key can be distilled. We remark that our attack is feasible with only current technology. Therefore, it is very important to be aware of our attack in order to ensure absolute security. In finding our attack, we minimize the QBER over individual measurements described by a general POVM, which has some similarity with the standard quantum state discrimination problem. △ Less

Submitted 5 March, 2007; v1 submitted 17 January, 2006; originally announced January 2006.

Comments: 13 pages, 8 figures

Journal ref: Phys. Rev. A 75, 032314 (2007)

Showing 1–39 of 39 results for author: Qi, B