Benchmarking PathCLIP for Pathology Image Analysis

Authors: Sunyi Zheng, Xiaonan Cui, Yuxuan Sun, **gxiong Li, Honglin Li, Yunlong Zhang, **yi Chen, Xue** **g, Zhaoxiang Ye, Lin Yang

Abstract: Accurate image classification and retrieval are important for clinical diagnosis and treatment decision-making. The recent contrastive language-image pretraining (CLIP) model has shown remarkable proficiency in understanding natural images. Drawing inspiration from CLIP, PathCLIP is specifically designed for pathology image analysis, utilizing over 200,000 image and text pairs in training. While the performance of PathCLIP is impressive, its robustness under a wide range of image corruptions remains unknown. Therefore, we conduct an extensive evaluation to analyze the performance of PathCLIP on various corrupted images from the Osteosarcoma and WSSS4LUAD datasets. In our experiments, we introduce seven corruption types, including brightness, contrast, Gaussian blur, resolution, saturation, hue, and markup, at four severity levels. Through experiments, we find that PathCLIP is relatively robust to image corruptions and surpasses OpenAI-CLIP and PLIP in zero-shot classification. Among the seven corruptions, blur and resolution can cause severe performance degradation of PathCLIP. This indicates that ensuring the quality of images is crucial before conducting a clinical test. Additionally, we assess the robustness of PathCLIP in the task of image-image retrieval, revealing that PathCLIP performs less effectively than PLIP on Osteosarcoma but performs better on WSSS4LUAD under diverse corruptions.
Overall, PathCLIP presents impressive zero-shot classification and retrieval performance for pathology images, but appropriate care needs to be taken when using it. We hope this study provides a qualitative impression of PathCLIP and helps understand its differences from other CLIP models.

Submitted 12 June, 2024; v1 submitted 5 January, 2024; originally announced January 2024.
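
The PathCLIP abstract above describes corrupting images at four severity levels. A minimal sketch of that kind of protocol, covering only brightness and contrast on a grayscale image stored as nested lists, might look as follows. The function names and the `SEVERITY_FACTORS` table are hypothetical and chosen for illustration; they are not the settings used in the paper.

```python
# Illustrative sketch only: names and severity factors are assumptions,
# not the corruption parameters used in the PathCLIP benchmark.

# Assumed mapping from severity level (1 = mild, 4 = strong) to a scale factor.
SEVERITY_FACTORS = {1: 0.875, 2: 0.75, 3: 0.5, 4: 0.25}


def clamp(value, low=0.0, high=255.0):
    """Clip a pixel intensity to the valid 8-bit range."""
    return max(low, min(high, value))


def adjust_brightness(image, factor):
    """Scale every pixel intensity by `factor` (factor < 1 darkens)."""
    return [[clamp(p * factor) for p in row] for row in image]


def adjust_contrast(image, factor):
    """Scale each pixel's distance from the image mean by `factor`."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return [[clamp(mean + (p - mean) * factor) for p in row] for row in image]


def corrupt(image, kind, severity):
    """Apply one corruption type at one of the four severity levels."""
    factor = SEVERITY_FACTORS[severity]
    if kind == "brightness":
        return adjust_brightness(image, factor)
    if kind == "contrast":
        return adjust_contrast(image, factor)
    raise ValueError(f"unsupported corruption type: {kind}")


if __name__ == "__main__":
    toy_image = [[0.0, 100.0], [200.0, 100.0]]
    for level in sorted(SEVERITY_FACTORS):
        print(level, corrupt(toy_image, "contrast", level))
```

A robustness study in this style would evaluate the same model on each (corruption type, severity) pair and compare the scores against the clean-image baseline.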

arXiv:2401.01553 [pdf, other]

Multi-modal Learning with Missing Modality in Predicting Axillary Lymph Node Metastasis

Authors: Shichuan Zhang, Sunyi Zheng, Zhongyi Shui, Honglin Li, Lin Yang

Abstract: Multi-modal learning has attracted widespread attention in medical image analysis. Using multi-modal data, namely whole slide images (WSIs) and clinical information, can improve the performance of deep learning models in the diagnosis of axillary lymph node metastasis. However, clinical information is not easy to collect in clinical practice due to privacy concerns, limited resources, lack of interoperability, etc. Although patient selection can ensure that the training set has multi-modal data for model development, missing modality of clinical information can appear during testing. This normally leads to performance degradation, which limits the use of multi-modal models in the clinic. To alleviate this problem, we propose a bidirectional distillation framework consisting of a multi-modal branch and a single-modal branch. The single-modal branch acquires the complete multi-modal knowledge from the multi-modal branch, while the multi-modal branch learns the robust features of WSI from the single-modal branch. We conduct experiments on a public dataset of Lymph Node Metastasis in Early Breast Cancer to validate the method. Our approach not only achieves state-of-the-art performance with an AUC of 0.861 on the test set without missing data, but also yields an AUC of 0.842 when the rate of missing modality is 80%. This shows the effectiveness of the approach in dealing with multi-modal data and missing modality. Such a model has the potential to improve treatment decision-making for early breast cancer patients who have axillary lymph node metastatic status.

Submitted 3 January, 2024; originally announced January 2024.

arXiv:2401.00644 [pdf, other]

DEWP: Deep Expansion Learning for Wind Power Forecasting

Authors: Wei Fan, Yanjie Fu, Shun Zheng, Jiang Bian, Yuanchun Zhou, Hui Xiong

Abstract: Wind is a highly efficient, environmentally friendly, and cost-effective energy source. Wind power, as one of the largest renewable energy sources in the world, has been playing an increasingly important role in supplying electricity. Though growing dramatically in recent years, the amount of generated wind power can be directly or latently affected by multiple uncertain factors, such as wind speed, wind direction, temperatures, etc. More importantly, there exist very complicated dependencies of the generated power on the latent composition of these multiple time-evolving variables, which are always ignored by existing works and thus largely hinder the prediction performance. To this end, we propose DEWP, a novel Deep Expansion learning for Wind Power forecasting framework to carefully model the complicated dependencies with adequate expressiveness. DEWP starts with a stack-by-stack architecture, where each stack is composed of (i) a variable expansion block that makes use of convolutional layers to capture dependencies among multiple variables; (ii) a time expansion block that applies Fourier series and a backcast/forecast mechanism to learn temporal dependencies in sequential patterns. These two tailored blocks expand raw inputs into different latent feature spaces which can model different levels of dependencies of time-evolving sequential data. Moreover, we propose an inference block for each stack, which applies multi-head self-attention to acquire attentive features and maps expanded latent representations into generated wind power.
In addition, to make DEWP more expressive in handling deep neural architectures, we adapt doubly residue learning to process stack-by-stack outputs. Finally, we present extensive experiments in the real-world wind power forecasting application on two datasets from two different turbines to demonstrate the effectiveness of our approach.

Submitted 31 December, 2023; originally announced January 2024.

Comments: Accepted by TKDD

arXiv:2312.17515 [pdf, other]

Cooperation on the Fly: Exploring Language Agents for Ad Hoc Teamwork in the Avalon Game

Authors: Zi**g Shi, Meng Fang, Shunfeng Zheng, Shilong Deng, Ling Chen, Yali Du

Abstract: Multi-agent collaboration with Large Language Models (LLMs) demonstrates proficiency in basic tasks, yet its efficiency in more complex scenarios remains unexplored. In gaming environments, these agents often face situations without established coordination protocols, requiring them to make intelligent inferences about teammates from limited data. This problem motivates the area of ad hoc teamwork, in which an agent may potentially cooperate with a variety of teammates to achieve a shared goal. Our study focuses on the ad hoc teamwork problem where the agent operates in an environment driven by natural language. Our findings reveal the potential of LLM agents in team collaboration, highlighting issues related to hallucinations in communication. To address this issue, we develop CodeAct, a general agent that equips LLMs with enhanced memory and code-driven reasoning, enabling the repurposing of partial information for rapid adaptation to new teammates.

Submitted 29 December, 2023; originally announced December 2023.

Comments: Code will be released soon

arXiv:2312.16867 [pdf, other]

DualFluidNet: an Attention-based Dual-pipeline Network for Fluid Simulation

Authors: Yu Chen, Shuai Zheng, Menglong **, Yan Chang, Nianyi Wang

Abstract: Fluid motion can be considered as a point cloud transformation when using the SPH method. Compared to traditional numerical analysis methods, using machine learning techniques to learn physics simulations can achieve near-accurate results, while significantly increasing efficiency. In this paper, we propose an innovative approach for 3D fluid simulations utilizing an Attention-based Dual-pipeline Network, which employs a dual-pipeline architecture, seamlessly integrated with an Attention-based Feature Fusion Module. Unlike previous methods, which often make difficult trade-offs between global fluid control and physical law constraints, we find a way to achieve a better balance between these two crucial aspects with a well-designed dual-pipeline approach. Additionally, we design a Type-aware Input Module to adaptively recognize particles of different types and perform feature fusion afterward, such that fluid-solid coupling issues can be better dealt with. Furthermore, we propose a new dataset, Tank3D, to further explore the network's ability to handle more complicated scenes. The experiments demonstrate that our approach not only attains a quantitative enhancement in various metrics, surpassing state-of-the-art methods, but also signifies a qualitative leap in neural network-based simulation by faithfully adhering to the physical laws. Code and video demonstrations are available at https://github.com/chenyu-xjtu/DualFluidNet.

Submitted 18 April, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

Comments: 14 pages

arXiv:2312.15993 [pdf]

Adaptive Kalman-based hybrid car following strategy using TD3 and CACC

Authors: Yuqi Zheng, Ruidong Yan, Bin Jia, Rui Jiang, Adriana TAPUS, Xiao**g Chen, Shiteng Zheng, Ying Shang

Abstract: In autonomous driving, the hybrid strategy of deep reinforcement learning and cooperative adaptive cruise control (CACC) can fully utilize the advantages of the two algorithms and significantly improve the performance of car following. However, it is challenging for the traditional hybrid strategy based on fixed coefficients to adapt to mixed traffic flow scenarios, which may decrease the performance and even lead to accidents. To address the above problems, a hybrid car following strategy based on an adaptive Kalman filter is proposed that combines the CACC and Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithms. Different from the traditional hybrid strategy based on fixed coefficients, the Kalman gain H, used as an adaptive coefficient, is derived from multi-timestep predictions and Monte Carlo Tree Search. At the end of the study, simulation results with 4,157,745 timesteps indicate that, compared with the TD3 and HCFS algorithms, the proposed algorithm can substantially enhance the safety of car following in mixed traffic flow without compromising comfort and efficiency.

Submitted 26 December, 2023; originally announced December 2023.

Comments: 32 pages, 13 figures

arXiv:2312.14414 [pdf, other]

Critical quantum geometric tensors of parametrically-driven nonlinear resonators

Authors: Hao-Long Zhang, Jia-Hao Lv, Ken Chen, Xue-Jia Yu, Fan Wu, Zhen-Biao Yang, Shi-Biao Zheng

Abstract: Parametrically driven nonlinear resonators represent a building block for realizing fault-tolerant quantum computation and are useful for critical quantum sensing. From a fundamental viewpoint, the most intriguing feature of such a system is perhaps the critical phenomena, which can occur without interaction with any other quantum system. The non-analytic behaviors of its eigenspectrum have been substantially investigated, but those associated with the ground state wavefunction have largely remained unexplored. Using the quantum ground state geometric tensor as an indicator, we comprehensively establish a phase diagram involving the driving parameter $\varepsilon$ and phase $φ$. The results reveal that with the increase in $\varepsilon$, the system undergoes a quantum phase transition from the normal to the superradiant phase, with the critical point unaffected by $φ$. Furthermore, the critical exponent and scaling dimension are obtained by an exact numerical method and are consistent with previous works. Our numerical results show that the phase transition falls within the universality class of the quantum Rabi model. This work reveals that the quantum metric and Berry curvature display diverging behaviors across the quantum phase transition.

Submitted 21 December, 2023; originally announced December 2023.

Comments: Any comments or suggestions are welcome!

arXiv:2312.11820 [pdf, other]

SoC-Tuner: An Importance-guided Exploration Framework for DNN-targeting SoC Design

Authors: Shixin Chen, Su Zheng, Chen Bai, Wenqian Zhao, Shuo Yin, Yang Bai, Bei Yu

Abstract: Designing a system-on-chip (SoC) for deep neural network (DNN) acceleration requires balancing multiple metrics such as latency, power, and area. However, most existing methods ignore the interactions among different SoC components and rely on inaccurate and error-prone evaluation tools, leading to inferior SoC design. In this paper, we present SoC-Tuner, a DNN-targeting exploration framework to find the Pareto optimal set of SoC configurations efficiently. Our framework constructs a thorough SoC design space of all components and divides the exploration into three phases. We propose an importance-based analysis to prune the design space, a sampling algorithm to select the most representative initialization points, and an information-guided multi-objective optimization method to balance multiple design metrics of SoC design. We validate our framework with the actual very-large-scale-integration (VLSI) flow on various DNN benchmarks and show that it outperforms previous methods. To the best of our knowledge, this is the first work to construct an exploration framework of SoCs for DNN acceleration.

Submitted 18 December, 2023; originally announced December 2023.

Comments: ASP-DAC 2024

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.11539 [pdf, other]

KGLens: A Parameterized Knowledge Graph Solution to Assess What an LLM Does and Doesn't Know

Authors: Shangshang Zheng, He Bai, Yizhe Zhang, Yi Su, Xiaochuan Niu, Navdeep Jaitly

Abstract: Measuring the alignment between a Knowledge Graph (KG) and Large Language Models (LLMs) is an effective method to assess the factualness and identify the knowledge blind spots of LLMs. However, this approach encounters two primary challenges: the translation of KGs into natural language and the efficient evaluation of these extensive and complex structures. In this paper, we present KGLens--a novel framework aimed at measuring the alignment between KGs and LLMs, and pinpointing the LLMs' knowledge deficiencies relative to KGs. KGLens features a graph-guided question generator for converting KGs into natural language, along with a carefully designed sampling strategy based on parameterized KG structure to expedite KG traversal. We conducted experiments using three domain-specific KGs from Wikidata, which comprise over 19,000 edges, 700 relations, and 21,000 entities. Our analysis across eight LLMs reveals that KGLens not only evaluates the factual accuracy of LLMs more rapidly but also delivers in-depth analyses on topics, temporal dynamics, and relationships. Furthermore, human evaluation results indicate that KGLens can assess LLMs with a level of accuracy nearly equivalent to that of human annotators, achieving 95.7% of the accuracy rate.

Submitted 16 February, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

arXiv:2312.09579 [pdf, other]

MobileSAMv2: Faster Segment Anything to Everything

Authors: Chaoning Zhang, Dongshen Han, Sheng Zheng, **woo Choi, Tae-Ho Kim, Choong Seon Hong

Abstract: Segment anything model (SAM) addresses two practical yet challenging segmentation tasks: segment anything (SegAny), which utilizes a certain point to predict the mask for a single object of interest, and segment everything (SegEvery), which predicts the masks for all objects on the image. What makes SegAny slow for SAM is its heavyweight image encoder, which has been addressed by MobileSAM via decoupled knowledge distillation. The efficiency bottleneck of SegEvery with SAM, however, lies in its mask decoder because it needs to first generate numerous masks with redundant grid-search prompts and then perform filtering to obtain the final valid masks. We propose to improve its efficiency by directly generating the final masks with only valid prompts, which can be obtained through object discovery. Our proposed approach not only helps reduce the total time on the mask decoder by at least 16 times but also achieves superior performance. Specifically, our approach yields an average performance boost of 3.6% (42.5% vs. 38.9%) for zero-shot object proposal on the LVIS dataset with the mask AR@$K$ metric. Qualitative results show that our approach generates fine-grained masks while avoiding over-segmenting things. This project targeting faster SegEvery than the original SAM is termed MobileSAMv2 to differentiate from MobileSAM which targets faster SegAny.
Moreover, we demonstrate that our new prompt sampling is also compatible with the distilled image encoders in MobileSAM, contributing to a unified framework for efficient SegAny and SegEvery. The code is available at the same link as MobileSAM Project https://github.com/ChaoningZhang/MobileSAM.

Submitted 15 December, 2023; originally announced December 2023.

Comments: MobileSAM achieves faster segment anything, while MobileSAMv2 achieves faster segment everything

arXiv:2312.08648 [pdf, other]

CLIP-guided Federated Learning on Heterogeneous and Long-Tailed Data

Authors: Jiangming Shi, Shanshan Zheng, Xiangbo Yin, Yang Lu, Yuan Xie, Yanyun Qu

Abstract: Federated learning (FL) provides a decentralized machine learning paradigm where a server collaborates with a group of clients to learn a global model without accessing the clients' data. User heterogeneity is a significant challenge for FL, which together with the class-distribution imbalance further increases the difficulty of FL. Great progress has been made in large vision-language models, such as Contrastive Language-Image Pre-training (CLIP), which paves a new way for image classification and object recognition. Inspired by the success of CLIP on few-shot and zero-shot learning, we use CLIP to optimize the federated learning between server and client models under its vision-language supervision. It is promising to mitigate the user heterogeneity and class-distribution imbalance due to the powerful cross-modality representation and rich open-vocabulary prior knowledge. In this paper, we propose the CLIP-guided FL (CLIP2FL) method on heterogeneous and long-tailed data. In CLIP2FL, the knowledge of the off-the-shelf CLIP model is transferred to the client-server models, and a bridge is built between the client and server. Specifically, for client-side learning, knowledge distillation is conducted between client models and CLIP to improve the ability of client-side feature representation. For server-side learning, in order to mitigate the heterogeneity and class-distribution imbalance, we generate federated features to retrain the server model.
A prototype contrastive learning with the supervision of the text encoder of CLIP is introduced to generate federated features depending on the client-side gradients, and they are used to retrain a balanced server classifier.

Submitted 13 December, 2023; originally announced December 2023.

Comments: This paper has been accepted by AAAI24

arXiv:2312.08538 [pdf, other]

Contractive error feedback for gradient compression

Authors: Bingcong Li, Shuai Zheng, Parameswaran Raman, Anshumali Shrivastava, Georgios B. Giannakis

Abstract: On-device memory concerns in distributed deep learning have become severe due to (i) the growth of model size in multi-GPU training, and (ii) the wide adoption of deep neural networks for federated learning on IoT devices which have limited storage. In such settings, communication-efficient optimization methods are attractive alternatives; however, they still struggle with memory issues. To tackle these challenges, we propose a communication-efficient method called contractive error feedback (ConEF). As opposed to SGD with error-feedback (EFSGD) that inefficiently manages memory, ConEF obtains the sweet spot of convergence and memory usage, and achieves communication efficiency by leveraging biased and all-reducible gradient compression. We empirically validate ConEF on various learning tasks that include image classification, language modeling, and machine translation and observe that ConEF saves 80% - 90% of the extra memory in EFSGD with almost no loss on test performance, while also achieving 1.3x - 5x speedup of SGD. Through our work, we also demonstrate the feasibility and convergence of ConEF to clear up the theoretical barrier of integrating ConEF to popular memory efficient frameworks such as ZeRO-3.

Submitted 13 December, 2023; originally announced December 2023.
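
For readers unfamiliar with the EFSGD baseline named in the abstract above, the following is a minimal sketch of standard error feedback with a top-k compressor, written in plain Python. It illustrates the baseline only, not ConEF itself; the function names `top_k` and `ef_sgd_step` are hypothetical, and top-k is an assumed choice of biased compressor.

```python
# Illustrative sketch of standard error-feedback SGD (the EFSGD baseline),
# not the ConEF method. Names and the top-k compressor are assumptions.


def top_k(vector, k):
    """Keep the k largest-magnitude entries and zero the rest (biased compressor)."""
    order = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vector)]


def ef_sgd_step(params, grad, error, lr, k):
    """One error-feedback step.

    The scaled gradient is corrected with the residual left over from earlier
    steps, compressed, and applied; whatever the compressor dropped is stored
    in the error buffer for the next step.
    """
    corrected = [lr * g + e for g, e in zip(grad, error)]
    update = top_k(corrected, k)  # this is what would be communicated
    new_error = [c - u for c, u in zip(corrected, update)]
    new_params = [p - u for p, u in zip(params, update)]
    return new_params, new_error


if __name__ == "__main__":
    params = [1.0, 1.0, 1.0]
    error = [0.0, 0.0, 0.0]
    for grad in ([4.0, 1.0, -2.0], [0.0, 0.0, 0.0]):
        params, error = ef_sgd_step(params, grad, error, lr=0.5, k=1)
        print(params, error)
```

Note that the error buffer has the same length as the parameter vector, so it is a model-sized piece of extra state; this appears to be the extra memory that the abstract says ConEF reduces.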

arXiv:2312.08164 [pdf, other]

doi 10.1364/OE.499778

Quantum metric and metrology with parametrically-driven Tavis-Cummings models

Authors: Jia-Hao Lü, Pei-Rong Han, Wen Ning, Xin Zhu, Fan Wu, Li-Tuo Shen, Zhen-Biao Yang, Shi-Biao Zheng

Abstract: We study the quantum metric in a driven Tavis-Cummings model, comprised of multiple qubits interacting with a quantized photonic field. The parametric driving of the photonic field breaks the system's U(1) symmetry down to a ${\rm Z}_2$ symmetry, whose spontaneous breaking initiates a superradiant phase transition. We analytically solved the eigenenergies and eigenstates, and numerically simulated the system behaviors near the critical point. The critical behaviors near the superradiant phase transition are characterized by the quantum metric, defined in terms of the response of the quantum state to variation of the control parameter. In addition, a quantum metrological protocol based on the critical behaviors of the quantum metric near the superradiant phase transition is proposed, which greatly enhances the achievable measurement precision.

Submitted 13 December, 2023; originally announced December 2023.

Comments: 13 pages, 5 figures

Journal ref: Opt. Express 31, 41669-41683 (2023)

arXiv:2312.06632 [pdf, other]

Control Risk for Potential Misuse of Artificial Intelligence in Science

Authors: Jiyan He, Weitao Feng, Yaosen Min, **gwei Yi, Kunsheng Tang, Shuai Li, Jie Zhang, Kejiang Chen, Wenbo Zhou, Xing Xie, Weiming Zhang, Nenghai Yu, Shuxin Zheng

Abstract: The expanding application of Artificial Intelligence (AI) in scientific fields presents unprecedented opportunities for discovery and innovation. However, this growth is not without risks. AI models in science, if misused, can amplify risks like creation of harmful substances, or circumvention of established regulations. In this study, we aim to raise awareness of the dangers of AI misuse in science, and call for responsible AI development and use in this domain. We first itemize the risks posed by AI in scientific contexts, then demonstrate the risks by highlighting real-world examples of misuse in chemical science. These instances underscore the need for effective risk management strategies. In response, we propose a system called SciGuard to control misuse risks for AI models in science. We also propose a red-teaming benchmark SciMT-Safety to assess the safety of different systems. Our proposed SciGuard shows the least harmful impact in the assessment without compromising performance in benign tests. Finally, we highlight the need for a multidisciplinary and collaborative effort to ensure the safe and ethical use of AI models in science. We hope that our study can spark productive discussions on using AI ethically in science among researchers, practitioners, policymakers, and the public, to maximize benefits and minimize the risks of misuse.

Submitted 11 December, 2023; originally announced December 2023.

arXiv:2312.05486 [pdf, other]

FreeFlow: A Comprehensive Understanding on Diffusion Probabilistic Models via Optimal Transport

Authors: Bowen Sun, Shibao Zheng

Abstract: The blooming diffusion probabilistic models (DPMs) have garnered significant interest due to their impressive performance and the elegant inspiration they draw from physics. While earlier DPMs relied upon the Markovian assumption, recent methods based on differential equations have been rapidly applied to enhance the efficiency and capabilities of these models. However, a theoretical interpretation encapsulating these diverse algorithms is insufficient yet pressingly required to guide further development of DPMs. In response to this need, we present FreeFlow, a framework that provides a thorough explanation of the diffusion formula as time-dependent optimal transport, where the evolutionary pattern of probability density is given by the gradient flows of a functional defined in Wasserstein space. Crucially, our framework necessitates a unified description that not only clarifies the subtle mechanism of DPMs but also indicates the roots of some defects through creative involvement of Lagrangian and Eulerian views to understand the evolution of probability flow. We particularly demonstrate that the core equation of FreeFlow condenses all stochastic and deterministic DPMs into a single case, showcasing the expansibility of our method. Furthermore, the Riemannian geometry employed in our work has the potential to bridge broader subjects in mathematics, which enables the involvement of more profound tools for the establishment of more outstanding and generalized models in the future.

Submitted 9 December, 2023; originally announced December 2023.

arXiv:2312.04973 [pdf, other]

Ex-post Individually Rational Bayesian Persuasion

Authors: Jiahao Zhang, Shuran Zheng, Renato Paes Leme, Zhiwei Steven Wu

Abstract: The success of Bayesian persuasion relies on the key assumption that the sender will commit to a predetermined information disclosure policy (signaling scheme). However, in practice, it is usually difficult for the receiver to monitor whether the sender sticks to the disclosure policy, which makes the credibility of the sender's disclosure policy questionable. The sender's credibility is particularly tenuous when there are obvious deviations that benefit the sender. In this work, we identify such a deviation: the sender may be unwilling to send a signal that will lead to a less desirable outcome compared to no information disclosure. We thus propose the notion of ex-post individually rational (ex-post IR) Bayesian persuasion: after observing the state, the sender is never required to send a signal that will make the outcome worse off (compared to no information disclosure). An ex-post IR Bayesian persuasion policy is more likely to be truthfully followed by the sender, and thus more credible for the receiver. Our contribution is threefold. First, we demonstrate that the optimal ex-post IR Bayesian persuasion policy can be efficiently computed through a linear program, while also offering geometric characterizations of this optimal policy. Second, we show that, surprisingly, for non-trivial classes of games, the imposition of ex-post IR constraints does not affect the sender's expected utility. Finally, we compare ex-post IR Bayesian persuasion to other information disclosure models that ensure different notions of credibility.

Submitted 8 December, 2023; originally announced December 2023.

Comments: 21 pages

arXiv:2312.04020 [pdf, ps, other]

Note on gradient estimate of heat kernel for Schrödinger operators

Authors: Shijun Zheng

Abstract: Let $H=-\Delta+V$ be a Schrödinger operator on $\mathbb{R}^n$. We show that gradient estimates for the heat kernel of $H$ with upper Gaussian bounds imply polynomial decay for the kernels of certain smooth dyadic spectral operators. The latter decay property has been known to play an important role in the Littlewood-Paley theory for $L^p$ and Sobolev spaces. We are able to establish the result by modifying Hebisch and the author's recent proofs. We give a counterexample in one dimension to show that there exists $V$ in the Schwartz class such that the long time gradient heat kernel estimate fails.

Submitted 6 December, 2023; originally announced December 2023.

MSC Class: 35J10; 42B37

Journal ref: Applied Mathematics, Volume 1, No.5, November 2010, pp. 425-430

arXiv:2312.03690 [pdf, other]

Inverse Design of Vitrimeric Polymers by Molecular Dynamics and Generative Modeling

Authors: Yiwen Zheng, Prakash Thakolkaran, Jake A. Smith, Ziheng Lu, Shuxin Zheng, Bichlien H. Nguyen, Siddhant Kumar, Aniruddh Vashisth

Abstract: Vitrimers are a new class of sustainable polymers with the ability to self-heal through rearrangement of dynamic covalent adaptive networks. However, a limited choice of constituent molecules restricts their property space, prohibiting full realization of their potential applications. Through a combination of molecular dynamics (MD) simulations and machine learning (ML), particularly a novel graph variational autoencoder (VAE) model, we establish a method for generating novel vitrimers and guiding their inverse design based on desired glass transition temperature (Tg). We build the first vitrimer dataset of one million and calculate Tg on 8,424 of them by high-throughput MD simulations calibrated by a Gaussian process model. The proposed VAE employs dual graph encoders and a latent dimension overlapping scheme which allows for individual representation of multi-component vitrimers. By constructing a continuous latent space containing necessary information of vitrimers, we demonstrate high accuracy and efficiency of our framework in discovering novel vitrimers with desirable Tg beyond the training regime. The proposed vitrimers with reasonable synthesizability cover a wide range of Tg and broaden the potential widespread usage of vitrimeric materials.

Submitted 13 March, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

arXiv:2312.02546 [pdf, other]

Machine Vision Therapy: Multimodal Large Language Models Can Enhance Visual Robustness via Denoising In-Context Learning

Authors: Zhuo Huang, Chang Liu, Yinpeng Dong, Hang Su, Shibao Zheng, Tongliang Liu

Abstract: Although vision models such as Contrastive Language-Image Pre-Training (CLIP) show impressive generalization performance, their zero-shot robustness is still limited under Out-of-Distribution (OOD) scenarios without fine-tuning. Instead of undesirably providing human supervision as commonly done, it is possible to take advantage of Multi-modal Large Language Models (MLLMs) that hold powerful visual understanding abilities. However, MLLMs are shown to struggle with vision problems due to the incompatibility of tasks, thus hindering their utilization. In this paper, we propose to effectively leverage MLLMs to conduct Machine Vision Therapy which aims to rectify the noisy predictions from vision models. By fine-tuning with the denoised labels, the learning model performance can be boosted in an unsupervised manner. To solve the incompatibility issue, we propose a novel Denoising In-Context Learning (DICL) strategy to align vision tasks with MLLMs. Concretely, by estimating a transition matrix that captures the probability of one class being confused with another, an instruction containing a correct exemplar and an erroneous one from the most probable noisy class can be constructed. Such an instruction can help any MLLM with ICL ability to detect and rectify incorrect predictions of vision models. Through extensive experiments on ImageNet, WILDS, DomainBed, and other OOD datasets, we carefully validate the quantitative and qualitative effectiveness of our method. Our code is available at https://github.com/tmllab/Machine_Vision_Therapy.

Submitted 29 May, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

Comments: ICML 2024
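
The transition-matrix step described in the abstract above can be illustrated with a minimal sketch. It assumes a small labelled support set is available and that "most probable noisy class" means the true class most often mistaken for the predicted one; both are assumptions for illustration, and the function names are hypothetical rather than taken from the authors' repository.

```python
import numpy as np

def estimate_transition_matrix(true_labels, predicted_labels, num_classes):
    """Row-normalised confusion counts: T[i, j] ~ P(predicted = j | true = i)."""
    counts = np.zeros((num_classes, num_classes), dtype=float)
    for t, p in zip(true_labels, predicted_labels):
        counts[t, p] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no samples stay all-zero instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

def most_probable_noisy_class(T, predicted_class):
    """True class most often confused with `predicted_class` (assumed reading), excluding itself."""
    column = T[:, predicted_class].copy()
    column[predicted_class] = -1.0  # never return the predicted class itself
    return int(np.argmax(column))

# Hypothetical toy data: three classes, six labelled support samples.
T = estimate_transition_matrix([0, 0, 0, 1, 1, 2], [0, 1, 1, 1, 1, 1], num_classes=3)
noisy = most_probable_noisy_class(T, predicted_class=1)
```

An instruction for the MLLM would then pair an exemplar from the predicted class with one from the class returned by `most_probable_noisy_class`.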

arXiv:2312.02155 [pdf, other]

GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis

Authors: Shunyuan Zheng, Boyao Zhou, Ruizhi Shao, Boning Liu, Shengping Zhang, Liqiang Nie, Yebin Liu

Abstract: We present a new approach, termed GPS-Gaussian, for synthesizing novel views of a character in a real-time manner. The proposed method enables 2K-resolution rendering under a sparse-view camera setting. Unlike the original Gaussian Splatting or neural implicit rendering methods that necessitate per-subject optimizations, we introduce Gaussian parameter maps defined on the source views and directly regress Gaussian Splatting properties for instant novel view synthesis without any fine-tuning or optimization. To this end, we train our Gaussian parameter regression module on a large amount of human scan data, jointly with a depth estimation module to lift 2D parameter maps to 3D space. The proposed framework is fully differentiable and experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving an exceeding rendering speed.

Submitted 16 April, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

Comments: Accepted by CVPR 2024 (Highlight). Project page: https://shunyuanzheng.github.io/GPS-Gaussian

arXiv:2312.01367 [pdf, other]

DiFace: Cross-Modal Face Recognition through Controlled Diffusion

Authors: Bowen Sun, Shibao Zheng

Abstract: Diffusion probabilistic models (DPMs) have exhibited exceptional proficiency in generating visual media of outstanding quality and realism. Nonetheless, their potential in non-generative domains, such as face recognition, has yet to be thoroughly investigated. Meanwhile, despite the extensive development of multi-modal face recognition methods, their emphasis has predominantly centered on visual modalities. In this context, face recognition through textual description presents a unique and promising solution that not only transcends the limitations of application scenarios but also expands the potential for research in the field of cross-modal face recognition. It is regrettable that this avenue remains unexplored and underutilized, a consequence of the challenges mainly associated with three aspects: 1) the intrinsic imprecision of verbal descriptions; 2) the significant gaps between texts and images; and 3) the immense hurdle posed by insufficient databases. To tackle this problem, we present DiFace, a solution that effectively achieves face recognition via text through a controllable diffusion process, by establishing its theoretical connection with probability transport. Our approach not only unleashes the potential of DPMs across a broader spectrum of tasks but also achieves, to the best of our knowledge, a significant accuracy in text-to-image face recognition for the first time, as demonstrated by our experiments on verification and identification.

Submitted 3 December, 2023; originally announced December 2023.

arXiv:2312.01292 [pdf, ps, other]

Joint Beam Scheduling and Power Optimization for Beam Hopping LEO Satellite Systems

Authors: Shuang Zheng, Xing Zhang, Peng Wang, Wenbo Wang

Abstract: Low earth orbit (LEO) satellite communications can provide ubiquitous and reliable services, making it an essential part of the Internet of Everything network. Beam hopping (BH) is an emerging technology for effectively addressing the issue of low resource utilization caused by the non-uniform spatio-temporal distribution of traffic demands. However, how to allocate multi-dimensional resources in a timely and efficient way for the highly dynamic LEO satellite systems remains a challenge. This paper proposes a joint beam scheduling and power optimization beam hopping (JBSPO-BH) algorithm considering the differences in the geographic distribution of sink nodes. The JBSPO-BH algorithm decouples the original problem into two sub-problems. The beam scheduling problem is modelled as a potential game, and the Nash equilibrium (NE) point is obtained as the beam scheduling strategy. Moreover, the penalty function interior point method is applied to optimize the power allocation. Simulation results show that the JBSPO-BH algorithm has low time complexity and fast convergence and achieves better performance in both throughput and fairness. Compared with greedy-based BH, greedy-based BH with power optimization, round-robin BH, Max-SINR BH and the satellite resource allocation algorithm, the throughput of the proposed algorithm is improved by 44.99%, 20.79%, 156.06%, 15.39% and 8.17%, respectively.

Submitted 3 December, 2023; originally announced December 2023.

arXiv:2312.00186 [pdf, other]

Planning Reliability Assurance Tests for Autonomous Vehicles

Authors: Simin Zheng, Lu Lu, Yili Hong, Jian Liu

Abstract: Artificial intelligence (AI) technology has become increasingly prevalent and is transforming our everyday life. One important application of AI technology is the development of autonomous vehicles (AV). However, the reliability of an AV needs to be carefully demonstrated via an assurance test so that the product can be used with confidence in the field. To plan for an assurance test, one needs to determine how many AVs need to be tested for how many miles and the standard for passing the test. Existing research has made great efforts in developing reliability demonstration tests in other fields of application for product development and assessment. However, statistical methods have not been utilized in AV test planning. This paper aims to fill this gap by developing statistical methods for planning AV reliability assurance tests based on recurrent events data. We explore the relationship between multiple criteria of interest in the context of planning AV reliability assurance tests. Specifically, we develop two test planning strategies based on homogeneous and non-homogeneous Poisson processes while balancing multiple objectives with the Pareto front approach. We also offer recommendations for practical use. The disengagement events data from the California Department of Motor Vehicles AV testing program is used to illustrate the proposed assurance test planning methods.

Submitted 30 November, 2023; originally announced December 2023.

Comments: 29 pages, 5 figures
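
As a rough illustration of test planning under a homogeneous Poisson process, the sketch below uses a classical demonstration-test rule, not the paper's Pareto-front method. It assumes events such as disengagements arrive at a constant rate per mile and that the test passes when at most `max_events` events occur over the total test mileage; all names and numbers are hypothetical.

```python
import math

def pass_probability(rate_per_mile, total_miles, max_events):
    """P(N <= max_events) for N ~ Poisson(rate_per_mile * total_miles)."""
    mean = rate_per_mile * total_miles
    return sum(math.exp(-mean) * mean**k / math.factorial(k)
               for k in range(max_events + 1))

def min_miles_for_consumer_risk(bad_rate, max_events, consumer_risk, step=1.0):
    """Smallest mileage (in multiples of `step`) at which a fleet with the
    unacceptable rate `bad_rate` passes with probability <= consumer_risk."""
    miles = 0.0
    while pass_probability(bad_rate, miles, max_events) > consumer_risk:
        miles += step
    return miles

# Hypothetical plan: zero events allowed, unacceptable rate of 1 event per 1000 miles,
# and at most a 10% chance that such a fleet passes the test.
required_miles = min_miles_for_consumer_risk(bad_rate=0.001, max_events=0, consumer_risk=0.1)
```

Allowing more events (`max_events > 0`) raises the required mileage but lowers the chance that a good fleet fails, which is the kind of trade-off a multi-objective plan has to balance.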

arXiv:2311.18220 [pdf, ps, other]

Lifting query complexity to time-space complexity for two-way finite automata

Authors: Shenggen Zheng, Yaqiao Li, Minghua Pan, Jozef Gruska, Lvzhou Li

Abstract: Time-space tradeoff has been studied in a variety of models, such as Turing machines, branching programs, and finite automata. While communication complexity as a technique has been applied to study finite automata, it seems it has not been used to study time-space tradeoffs of finite automata. We design a new technique showing that separations of query complexity can be lifted, via communication complexity, to separations of time-space complexity of two-way finite automata. As an application, one of our main results exhibits the first example of a language $L$ such that the time-space complexity of two-way probabilistic finite automata with a bounded error (2PFA) is $\widetilde{\Omega}(n^2)$, while that of exact two-way quantum finite automata with classical states (2QCFA) is $\widetilde{O}(n^{5/3})$; that is, we demonstrate for the first time that exact quantum computing has an advantage in time-space complexity compared to classical computing.

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.17267 [pdf, other]

E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer

Authors: Jacob Zhiyuan Fang, Skyler Zheng, Vasu Sharma, Robinson Piramuthu

Abstract: To build scalable models for challenging real-world tasks, it is important to learn from diverse, multi-modal data in various forms (e.g., videos, text, and images). Among the existing works, a plethora of them have focused on leveraging large but cumbersome cross-modal architectures. Regardless of their effectiveness, larger architectures unavoidably prevent the models from being extended to real-world applications, so building a lightweight VL architecture and an efficient learning schema is of great practical value. In this paper, we propose an Efficient Video-Language Model (dubbed E-ViLM) and a masked video modeling (MVM) schema, assisted by a semantic vector-quantized tokenizer. In particular, our E-ViLM learns to reconstruct the semantic labels of masked video regions, produced by the pre-trained vector-quantized tokenizer, which discretizes the continuous visual signals into labels. We show that with our simple MVM task and regular VL pre-training modelings, our E-ViLM, despite its compactness, is able to learn expressive representations from Video-Language corpus and generalize well to extensive Video-Language tasks including video question answering, text-to-video retrieval, etc. In particular, our E-ViLM obtains obvious efficiency improvements by reaching competitive performance with faster inference speed, i.e., our model reaches $39.3\%$ Top-$1$ accuracy on the MSRVTT benchmark, retaining $91.4\%$ of the accuracy of the state-of-the-art larger VL architecture with only $15\%$ of the parameters and $94.8\%$ fewer GFLOPs. We also provide extensive ablative studies that validate the effectiveness of our proposed learning schema for E-ViLM.

Submitted 28 November, 2023; originally announced November 2023.
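
The idea of a vector-quantized tokenizer that "discretizes the continuous visual signals into labels" can be sketched as a nearest-codebook lookup. This is a generic illustration under the assumption of Euclidean nearest-neighbour assignment; the codebook, feature shapes, and function name are hypothetical and not taken from E-ViLM.

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    # Squared Euclidean distances, shape (num_features, num_codes).
    diffs = features[:, None, :] - codebook[None, :, :]
    distances = np.sum(diffs ** 2, axis=-1)
    return np.argmin(distances, axis=1)

# Hypothetical 2-D codebook with three entries and three patch features.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
features = np.array([[0.1, -0.1], [4.0, 6.0], [0.9, 1.2]])
labels = quantize(features, codebook)
```

In a masked video modeling setup, the resulting integer labels for masked regions would serve as classification targets for the model.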

arXiv:2311.16480 [pdf, other]

WsiCaption: Multiple Instance Generation of Pathology Reports for Gigapixel Whole-Slide Images

Authors: Pingyi Chen, Honglin Li, Chenglu Zhu, Sunyi Zheng, Zhongyi Shui, Lin Yang

Abstract: Whole slide images are the foundation of digital pathology for the diagnosis and treatment of carcinomas. Writing pathology reports is laborious and error-prone for inexperienced pathologists. To reduce the workload and improve clinical automation, we investigate how to generate pathology reports given whole slide images. On the data end, we curated the largest WSI-text dataset (PathText). Specifically, we collected nearly 10000 high-quality WSI-text pairs for visual-language models by recognizing and cleaning pathology reports which narrate diagnostic slides in TCGA. On the model end, we propose the multiple instance generative model (MI-Gen) which can produce pathology reports for gigapixel WSIs. We benchmark our model on the largest subset of TCGA-PathoText. Experimental results show our model can generate pathology reports which contain multiple clinical clues and achieve competitive performance on certain slide-level tasks. We observe that simple semantic extraction from the pathology reports can achieve the best performance (F1 score of 0.838) on BRCA subtyping, surpassing previous state-of-the-art approaches. Our collected dataset and related code are available.

Submitted 27 June, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

arXiv:2311.16155 [pdf, other]

Deep Learning-Based Frequency Offset Estimation

Authors: Tao Chen, Shilian Zheng, Jiawei Zhu, Qi Xuan, Xiaoniu Yang

Abstract: In wireless communication systems, the asynchronization of the oscillators in the transmitter and the receiver along with the Doppler shift due to relative movement may lead to the presence of carrier frequency offset (CFO) in the received signals. Estimation of CFO is crucial for subsequent processing such as coherent demodulation. In this brief, we demonstrate the utilization of deep learning for CFO estimation by employing a residual network (ResNet) to learn and extract signal features from the raw in-phase (I) and quadrature (Q) components of the signals. We use multiple modulation schemes in the training set to make the trained model adaptable to multiple modulations or even new signals. In comparison to the commonly used traditional CFO estimation methods, our proposed IQ-ResNet method exhibits superior performance across various scenarios including different oversampling ratios, various signal lengths, and different channels.

Submitted 8 November, 2023; originally announced November 2023.

arXiv:2311.15939 [pdf, other]

Unleashing the Power of Prompt-driven Nucleus Instance Segmentation

Authors: Zhongyi Shui, Yunlong Zhang, Kai Yao, Chenglu Zhu, Sunyi Zheng, Jingxiong Li, Honglin Li, Yuxuan Sun, Ruizhe Guo, Lin Yang

Abstract: Nucleus instance segmentation in histology images is crucial for a broad spectrum of clinical applications. Current dominant algorithms rely on regression of nuclear proxy maps. Distinguishing nucleus instances from the estimated maps requires carefully curated post-processing, which is error-prone and parameter-sensitive. Recently, the Segment Anything Model (SAM) has earned huge attention in medical image segmentation, owing to its impressive generalization ability and promptable property. Nevertheless, its potential on nucleus instance segmentation remains largely underexplored. In this paper, we present a novel prompt-driven framework that consists of a nucleus prompter and SAM for automatic nucleus instance segmentation. Specifically, the prompter learns to generate a unique point prompt for each nucleus while the SAM is fine-tuned to output the corresponding mask for the prompted nucleus. Furthermore, we propose the inclusion of adjacent nuclei as negative prompts to enhance the model's capability to identify overlapping nuclei. Without complicated post-processing, our proposed method sets a new state-of-the-art performance on three challenging benchmarks. Code is available at github.com/windygoo/PromptNucSeg.

Submitted 24 January, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

Comments: under review

arXiv:2311.15058 [pdf]

Controlled generation of Poincaré sphere beams with inverse-designed multimode meta-waveguides

Authors: Jing Luan, Shuang Zheng, Zhenyu Wan, Tiange Wu, Weijie Chang, Deming Liu, Minming Zhang

Abstract: The angular momentum of light can be described by positions on various Poincaré spheres, where different structured light beams have proven useful for numerous optical applications. However, the dynamic generation and control of arbitrary structured light on different Poincaré spheres is still handled via bulky optics in free space. Here we propose and demonstrate multimode silicon photonic integrated meta-waveguides to generate arbitrary structured light beams on polarization/orbit/higher-order/hybrid Poincaré spheres. The multimode meta-waveguides are inversely designed to map polarization states/higher-order spatial modes to orbit angular momentum, generating polarization-/charge-diverse orbit angular momentum modes. Based on the fundamental orbit angular momentum mode basis enabled by the meta-waveguides, different structured-light fields on polarization/orbit/higher-order/hybrid Poincaré spheres could be flexibly generated by controlling the relative amplitude and phase profiles of on-chip guided modes. The demonstrated photonic integrated devices hold great potential for the flexible manipulation of structured light beams in many applications.

Submitted 25 November, 2023; originally announced November 2023.

arXiv:2311.14273 [pdf, other]

doi 10.1007/JHEP02(2024)061

Gravitational Dark Matter from Minimal Preheating

Authors: Ruopeng Zhang, Sibo Zheng

Abstract: Following our previous work, we continue to explore gravitational dark matter production during the minimal preheating caused by inflaton self-resonance. In this situation there is only one dimensionless index parameter $n$ characterizing the inflation potential after the end of inflation, which leads to a robust prediction on the gravitational dark matter relic abundance. Using the lattice method to handle the non-perturbative evolutions of relevant quantities during the inflaton self-resonance, we derive the gravitational dark matter relic abundance arising from both the inflaton condensate and fluctuation annihilation. While being absent for $n=2$, the former can instead dominate over the latter for $n=4,6$. Our results show that a gravitational dark matter mass of $1.04~(2.66)\times 10^{14}$ GeV accommodates the observed value of dark matter relic abundance for $n=4$ (6).

Submitted 23 November, 2023; originally announced November 2023.

Comments: 13 pages, 5 figures

arXiv:2311.12885 [pdf, other]

Long-MIL: Scaling Long Contextual Multiple Instance Learning for Histopathology Whole Slide Image Analysis

Authors: Honglin Li, Yunlong Zhang, Chenglu Zhu, Jiatong Cai, Sunyi Zheng, Lin Yang

Abstract: Histopathology image analysis is the gold standard of clinical diagnosis for cancers. In doctors' daily routine and computer-aided diagnosis, the Whole Slide Image (WSI) of histopathology tissue is used for analysis. Because of the extremely large scale of resolution, previous methods generally divide the WSI into a large number of patches, then aggregate all patches within a WSI by Multi-Instance Learning (MIL) to make the slide-level prediction when developing computer-aided diagnosis tools. However, most previous WSI-MIL models, using global attention without pairwise interaction or any positional information, or self-attention with absolute position embedding, cannot handle shape-varying large WSIs well; e.g., testing WSIs after model deployment may be larger than training WSIs, since the model development set is always limited due to the difficulty of collecting histopathology WSIs. To deal with the problem, in this paper, we propose to amend position embedding for shape-varying long-contextual WSIs by introducing Linear Bias into Attention, and adapt it from 1-d long sequences to 2-d long-contextual WSIs, which helps the model extrapolate position embedding to unseen or under-fitted positions. We further utilize the Flash-Attention module to tackle the computational complexity of the Transformer, which also keeps full self-attention performance compared to previous attention approximation work. Our method, Long-contextual MIL (Long-MIL), is evaluated in extensive experiments on 4 datasets, covering WSI classification and survival prediction tasks, to validate its superiority on shape-varying WSIs. The source code will be open-accessed soon.

Submitted 20 November, 2023; originally announced November 2023.
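
A linear attention bias extended from 1-d sequence positions to 2-d patch coordinates can be sketched as follows. This is a minimal single-head illustration, assuming the bias is the negative Euclidean distance between patch grid coordinates scaled by a fixed slope; the distance metric, the slope value, and the function names are assumptions, not the Long-MIL implementation.

```python
import numpy as np

def linear_bias_2d(coords, slope):
    """Bias matrix B[i, j] = -slope * Euclidean distance between patches i and j."""
    coords = np.asarray(coords, dtype=float)
    diffs = coords[:, None, :] - coords[None, :, :]
    return -slope * np.sqrt(np.sum(diffs ** 2, axis=-1))

def attention_with_bias(q, k, v, bias):
    """Single-head scaled dot-product attention with an additive bias on the logits."""
    scores = q @ k.T / np.sqrt(q.shape[-1]) + bias
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Hypothetical example: three patches at (row, col) grid positions, 4-d embeddings.
coords = [(0, 0), (0, 1), (3, 4)]
bias = linear_bias_2d(coords, slope=0.5)
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = attention_with_bias(x, x, x, bias)
```

Because the bias depends only on relative distance, it is defined for any slide size, including coordinates never seen during training, which is the extrapolation property the abstract refers to.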

arXiv:2311.12413 [pdf, ps, other]

On the calculation of upper variance under multiple probabilities

Authors: Xinpeng Li, Miao Yu, Shiyi Zheng

Abstract: The notion of upper variance under multiple probabilities is defined by a corresponding minimax optimization problem. This paper proposes a simple algorithm to solve the related minimax optimization problem exactly. As an application, we provide the probabilistic representation for a class of quadratic programming problems.

Submitted 21 November, 2023; originally announced November 2023.

Comments: 8 pages
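
To make the minimax problem in the upper-variance abstract above concrete, here is a minimal sketch. It assumes the commonly used definition, the minimum over mu of the maximum over finitely many probabilities of E_i[(X - mu)^2]; this definition is not spelled out in the abstract. The function `upper_variance` is a brute-force enumeration written for illustration, not the algorithm proposed in the paper. Each probability contributes a parabola var_i + (mean_i - mu)^2, and the minimum of their upper envelope must lie at a parabola vertex or at a crossing point of two parabolas.

```python
from itertools import combinations

def upper_variance(models):
    """Exact min over mu of max_i E_i[(X - mu)^2] for (mean, variance) pairs."""
    def worst_case(mu):
        # Largest expected squared deviation from mu across all probabilities.
        return max(var + (mean - mu) ** 2 for mean, var in models)

    candidates = [mean for mean, _ in models]          # vertex of each parabola
    for (m1, v1), (m2, v2) in combinations(models, 2):
        if m1 != m2:                                   # two parabolas cross once
            s1, s2 = v1 + m1 ** 2, v2 + m2 ** 2        # second moments
            candidates.append((s1 - s2) / (2 * (m1 - m2)))
    return min(worst_case(mu) for mu in candidates)
```

For two probabilities with means 0 and 2 and unit variances, the candidates are mu = 0, 2, and the crossing point 1; the worst case is smallest at mu = 1, giving 1 + 1 = 2.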

arXiv:2311.12358 [pdf, other]

Federated Learning via Consensus Mechanism on Heterogeneous Data: A New Perspective on Convergence

Authors: Shu Zheng, Tiandi Ye, Xiang Li, Ming Gao

Abstract: Federated learning (FL) on heterogeneous data (non-IID data) has recently received great attention. Most existing methods focus on studying the convergence guarantees for the global objective. While these methods can guarantee the decrease of the global objective in each communication round, they fail to ensure risk decrease for each client. In this paper, to address the problem, we propose FedCOME, which introduces a consensus mechanism to enforce decreased risk for each client after each training round. In particular, we allow a slight adjustment to a client's gradient on the server side, which generates an acute angle between the corrected gradient and the original ones of other clients. We theoretically show that the consensus mechanism can guarantee the convergence of the global objective. To generalize the consensus mechanism to the partial participation FL scenario, we devise a novel client sampling strategy to select the most representative clients for the global data distribution. Training on these selected clients with the consensus mechanism could empirically lead to risk decrease for clients that are not selected. Finally, we conduct extensive experiments on four benchmark datasets to show the superiority of FedCOME against other state-of-the-art methods in terms of effectiveness, efficiency and fairness. For reproducibility, we make our source code publicly available at https://github.com/fedcome/fedcome.

Submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.11534 [pdf, other]

doi 10.1093/mnras/stad3480

Spatial distribution of NH2D in massive star-forming regions

Authors: Yuqiang Li, Junzhi Wang, Juan Li, Shu Liu, Kai Yang, Siqi Zheng, Zhe Lu

Abstract: To understand the relation between NH$_2$D and its physical environment, we mapped ortho-NH$_2$D $1_{11}^s-1_{01}^a$ at 85.9 GHz toward 24 Galactic late-stage massive star-forming regions with the Institut de Radioastronomie Millimétrique (IRAM) 30-m telescope. Ortho-NH$_2$D $1_{11}^s-1_{01}^a$ was detected in 18 of 24 sources. Compared with the distribution of H$^{13}$CN 1-0 as a dense gas tracer and the radio recombination line H42$α$, ortho-NH$_2$D $1_{11}^s-1_{01}^a$ presents complex and diverse spatial distributions in these targets. Eleven of the 18 targets present a different distribution between ortho-NH$_2$D $1_{11}^s-1_{01}^a$ and H$^{13}$CN 1-0, while no significant difference between these two lines can be found in the other 7 sources, mainly due to limited spatial resolution and sensitivity. Moreover, with H42$α$ tracing massive young stellar objects, ortho-NH$_2$D $1_{11}^s-1_{01}^a$ seems to show relatively weak emission near the massive young stellar objects.

Submitted 19 November, 2023; originally announced November 2023.

Comments: 30 pages, 20 figures, 4 tables. Accepted to MNRAS

arXiv:2311.11465 [pdf, other]

Understanding Segment Anything Model: SAM is Biased Towards Texture Rather than Shape

Authors: Chaoning Zhang, Yu Qiao, Shehbaz Tariq, Sheng Zheng, Chenshuang Zhang, Chenghao Li, Hyundong Shin, Choong Seon Hong

Abstract: In contrast to human vision, which mainly depends on shape for recognizing objects, deep image recognition models are widely known to be biased toward texture. Recently, the Meta research team released the first foundation model for image segmentation, termed segment anything model (SAM), which has attracted significant attention. In this work, we examine SAM from the perspective of texture vs. shape. Unlike label-oriented recognition tasks, SAM is trained to predict a mask covering the object shape based on a prompt. Given this, it seems self-evident that SAM is biased towards shape. In this work, however, we reveal an interesting finding: SAM is strongly biased towards texture-like dense features rather than shape. This intriguing finding is supported by a novel setup where we disentangle texture and shape cues and design a texture-shape cue conflict for mask prediction.

Submitted 3 June, 2023; originally announced November 2023.

arXiv:2311.10463 [pdf, other]

Correlation-Distance Graph Learning for Treatment Response Prediction from rs-fMRI

Authors: Xiatian Zhang, Sisi Zheng, Hubert P. H. Shum, Haozheng Zhang, Nan Song, Mingkang Song, Hongxiao Jia

Abstract: Resting-state fMRI (rs-fMRI) functional connectivity (FC) analysis provides valuable insights into the relationships between different brain regions and their potential implications for neurological or psychiatric disorders. However, specific design efforts to predict treatment response from rs-fMRI remain limited due to difficulties in understanding the current brain state and the underlying mechanisms driving the observed patterns, which has limited the clinical application of rs-fMRI. To overcome this, we propose a graph learning framework that captures comprehensive features by integrating both correlation and distance-based similarity measures under a contrastive loss. This approach results in a more expressive framework that captures brain dynamic features at different scales and enables more accurate prediction of treatment response. Our experiments on the chronic pain and depersonalization disorder datasets demonstrate that our proposed method outperforms current methods in different scenarios. To the best of our knowledge, we are the first to explore the integration of distance-based and correlation-based neural similarity into graph learning for treatment response prediction.

Submitted 17 November, 2023; originally announced November 2023.

Comments: Proceedings of the 2023 International Conference on Neural Information Processing (ICONIP)

arXiv:2311.10418 [pdf, other]

doi 10.1145/3627703.3629585

DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines

Authors: Chenyu Jiang, Zhen Jia, Shuai Zheng, Yida Wang, Chuan Wu

Abstract: Multi-task model training has been adopted to enable a single deep neural network model (often a large language model) to handle multiple tasks (e.g., question answering and text summarization). Multi-task training commonly receives input sequences of highly different lengths due to the diverse contexts of different tasks. Padding (to the same sequence length) or packing (short examples into long sequences of the same length) is usually adopted to prepare input samples for model training, which is nonetheless not space or computation efficient. This paper proposes a dynamic micro-batching approach to tackle sequence length variation and enable efficient multi-task model training. We advocate pipeline-parallel training of the large model with variable-length micro-batches, each of which potentially comprises a different number of samples. We optimize micro-batch construction using a dynamic programming-based approach, and handle micro-batch execution time variation through dynamic pipeline and communication scheduling, enabling highly efficient pipeline training. Extensive evaluation on the FLANv2 dataset demonstrates up to 4.39x higher training throughput when training T5, and 3.25x when training GPT, as compared with packing-based baselines. DynaPipe's source code is publicly available at https://github.com/awslabs/optimizing-multitask-training-through-dynamic-pipelines.

Submitted 17 November, 2023; originally announced November 2023.

Comments: 18 pages, 18 figures

arXiv:2311.10360 [pdf, other]

Self-interacting dark matter to freeze-in via vector portal

Authors: Xinyue Yin, Shuai Xu, Sibo Zheng

Abstract: It is challenging to resolve the small-scale problem for dark matter being a weakly-interacting massive particle. We attempt to address this issue by proposing a self-interacting freeze-in dark matter via dark photon. In this model, the dark matter obtains the observed relic abundance via Standard Model $γ$ and $Z$ boson induced freeze-in processes, whereas the dark matter force mediator has a negligible relic abundance and a lifetime larger than the age of the Universe. We place constraints in the classical and resonant regimes resolving the small-scale problem from CMB, $X/γ$-ray, Supernova 1987A and the out-of-equilibrium condition. It turns out that the CMB constraint on dark matter annihilations is satisfied despite a large Sommerfeld effect taking place, while the other constraints are trivially accommodated due to various millicharge induced suppressions. Finally, we briefly discuss future cosmological tests on such a freeze-in dark matter model.

Submitted 8 May, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

Comments: 18 pages, 6 figures. A refined version to eliminate errors in the previous version, with new model realization and numerical analysis

arXiv:2311.07972 [pdf, other]

Residual Importance Weighted Transfer Learning For High-dimensional Linear Regression

Authors: Junlong Zhao, Shengbin Zheng, Chenlei Leng

Abstract: Transfer learning is an emerging paradigm for leveraging multiple sources to improve the statistical inference on a single target. In this paper, we propose a novel approach named residual importance weighted transfer learning (RIW-TL) for high-dimensional linear models built on penalized likelihood. Compared to existing methods such as Trans-Lasso, which select sources in an all-in-all-out manner, RIW-TL includes samples via importance weighting and thus may permit more effective sample use. To determine the weights, remarkably, RIW-TL only requires the knowledge of one-dimensional densities dependent on residuals, thus overcoming the curse of dimensionality of having to estimate high-dimensional densities in naive importance weighting. We show that the oracle RIW-TL provides a faster rate than its competitors and develop a cross-fitting procedure to estimate this oracle. We discuss variants of RIW-TL by adopting different choices for residual weighting. The theoretical properties of RIW-TL and its variants are established and compared with those of LASSO and Trans-Lasso. Extensive simulation and a real data analysis confirm its advantages.

Submitted 3 January, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

arXiv:2311.07877 [pdf, other]

Test-Time Training for Semantic Segmentation with Output Contrastive Loss

Authors: Yunlong Zhang, Yuxuan Sun, Sunyi Zheng, Zhongyi Shui, Chenglu Zhu, Lin Yang

Abstract: Although deep learning-based segmentation models have achieved impressive performance on public benchmarks, generalizing well to unseen environments remains a major challenge. To improve the model's generalization ability to the new domain during evaluation, test-time training (TTT) is a challenging paradigm that adapts the source-pretrained model in an online fashion. Early efforts on TTT mainly focus on the image classification task. Directly extending these methods to semantic segmentation easily leads to unstable adaptation due to segmentation's inherent characteristics, such as extreme class imbalance and complex decision spaces. To stabilize the adaptation process, we introduce contrastive loss (CL), known for its capability to learn robust and generalized representations. Nevertheless, the traditional CL operates in the representation space and cannot directly enhance predictions. In this paper, we resolve this limitation by adapting the CL to the output space, employing a high temperature, and simplifying the formulation, resulting in a straightforward yet effective loss function called Output Contrastive Loss (OCL). Our comprehensive experiments validate the efficacy of our approach across diverse evaluation scenarios. Notably, our method excels even when applied to models initially pre-trained using domain adaptation methods on test domain data, showcasing its resilience and adaptability. Code and more information can be found at https://github.com/dazhangyu123/OCL.

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2311.07125 [pdf, other]

Attention-Challenging Multiple Instance Learning for Whole Slide Image Classification

Authors: Yunlong Zhang, Honglin Li, Yuxuan Sun, Sunyi Zheng, Chenglu Zhu, Lin Yang

Abstract: In the application of Multiple Instance Learning (MIL) methods for Whole Slide Image (WSI) classification, attention mechanisms often focus on a subset of discriminative instances, which are closely linked to overfitting. To mitigate overfitting, we present Attention-Challenging MIL (ACMIL). ACMIL combines two techniques based on separate analyses for attention value concentration. Firstly, UMAP of instance features reveals various patterns among discriminative instances, with existing attention mechanisms capturing only some of them. To remedy this, we introduce Multiple Branch Attention (MBA) to capture more discriminative instances using multiple attention branches. Secondly, the examination of the cumulative value of Top-K attention scores indicates that a tiny number of instances dominate the majority of attention. In response, we present Stochastic Top-K Instance Masking (STKIM), which masks out a portion of instances with Top-K attention values and allocates their attention values to the remaining instances. The extensive experimental results on three WSI datasets with two pre-trained backbones reveal that our ACMIL outperforms state-of-the-art methods. Additionally, through heatmap visualization and UMAP visualization, this paper extensively illustrates ACMIL's effectiveness in suppressing attention value concentration and overcoming the overfitting challenge. The source code is available at https://github.com/dazhangyu123/ACMIL.

Submitted 28 April, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

Comments: Under review
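
The Stochastic Top-K Instance Masking step described in the ACMIL abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the function name `stochastic_topk_mask`, the per-instance masking probability `p`, and the use of renormalization to "allocate" the masked attention to the remaining instances are all hypothetical choices, not details confirmed by the abstract.

```python
import random

def stochastic_topk_mask(attn, k, p, rng=random):
    """Mask each of the k largest attention values with probability p, then renormalize.

    attn is a list of non-negative attention weights for the instances of one bag.
    """
    # Indices of the k instances that currently receive the most attention.
    topk = sorted(range(len(attn)), key=lambda i: attn[i], reverse=True)[:k]
    masked = list(attn)
    for i in topk:
        if rng.random() < p:
            masked[i] = 0.0
    total = sum(masked)
    if total == 0.0:          # everything was masked; fall back to the input
        return list(attn)
    # Renormalizing spreads the removed attention mass over surviving instances.
    return [a / total for a in masked]
```

With `p = 1.0` the top-k instances are always removed, and with `p = 0.0` the weights are only renormalized, which makes the two extremes easy to check by hand.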

arXiv:2311.06330 [pdf, other]

Smart Agent-Based Modeling: On the Use of Large Language Models in Computer Simulations

Authors: Zengqing Wu, Run Peng, Xu Han, Shuyuan Zheng, Yixin Zhang, Chuan Xiao

Abstract: Computer simulations offer a robust toolset for exploring complex systems across various disciplines. A particularly impactful approach within this realm is Agent-Based Modeling (ABM), which harnesses the interactions of individual agents to emulate intricate system dynamics. ABM's strength lies in its bottom-up methodology, illuminating emergent phenomena by modeling the behaviors of individual components of a system. Yet, ABM has its own set of challenges, notably its struggle with modeling natural language instructions and common sense in mathematical equations or rules. This paper seeks to transcend these boundaries by integrating Large Language Models (LLMs) like GPT into ABM. This amalgamation gives birth to a novel framework, Smart Agent-Based Modeling (SABM). Building upon the concept of smart agents -- entities characterized by their intelligence, adaptability, and computation ability -- we explore in the direction of utilizing LLM-powered agents to simulate real-world scenarios with increased nuance and realism. In this comprehensive exploration, we elucidate the state of the art of ABM, introduce SABM's potential and methodology, and present three case studies (source codes available at https://github.com/Roihn/SABM), demonstrating the SABM methodology and validating its effectiveness in modeling real-world systems. Furthermore, we cast a vision towards several aspects of the future of SABM, anticipating a broader horizon for its applications. Through this endeavor, we aspire to redefine the boundaries of computer simulations, enabling a more profound understanding of complex systems.

Submitted 14 December, 2023; v1 submitted 10 November, 2023; originally announced November 2023.

Comments: Source codes are available at https://github.com/Roihn/SABM

arXiv:2311.04534 [pdf, other]

Loss Masking Is Not Needed in Decoder-only Transformer for Discrete-token-based ASR

Authors: Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Yukun Ma, Hai Yu, Jiaqing Liu, Chong Zhang

Abstract: Recently, unified speech-text models, such as SpeechGPT, VioLA, and AudioPaLM, have achieved remarkable performance on various speech tasks. These models discretize speech signals into tokens (speech discretization) and use a shared vocabulary for both text and speech tokens. Then they train a single decoder-only Transformer on a mixture of speech tasks. However, these models rely on the Loss Masking strategy for the ASR task, which ignores the dependency among speech tokens. In this paper, we propose to model speech tokens in an autoregressive way, similar to text. We find that applying the conventional cross-entropy loss on input speech tokens does not consistently improve the ASR performance over the Loss Masking approach. To address this issue, we propose a novel approach denoted Smoothed Label Distillation (SLD), which applies a KL divergence loss with smoothed labels on speech tokens. Our experiments show that SLD effectively models speech tokens and outperforms Loss Masking for decoder-only Transformers in ASR tasks with different speech discretization methods. The source code can be found here: https://github.com/alibaba-damo-academy/SpokenNLP/tree/main/sld

Submitted 4 February, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

Comments: 5 pages, accepted by ICASSP 2024
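
The phrase "KL divergence loss with smoothed labels" in the SLD abstract above can be illustrated with a small sketch. It assumes standard label smoothing, with mass `1 - eps` on the true token and `eps` spread evenly over the other tokens; the abstract does not say how the labels are smoothed, and the names `smoothed_target` and `kl_divergence` are hypothetical.

```python
import math

def smoothed_target(true_id, vocab_size, eps=0.1):
    """Label-smoothed target: 1 - eps on the true token, eps spread over the rest."""
    off = eps / (vocab_size - 1)
    return [1.0 - eps if i == true_id else off for i in range(vocab_size)]

def kl_divergence(target, pred):
    """KL(target || pred) for two discrete distributions over the same vocabulary.

    pred must be strictly positive wherever target is positive.
    """
    return sum(t * math.log(t / q) for t, q in zip(target, pred) if t > 0.0)
```

The loss is zero exactly when the predicted distribution matches the smoothed target, so a model is not pushed toward fully one-hot predictions on speech tokens.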

arXiv:2311.03761 [pdf, other]

Augmenting Radio Signals with Wavelet Transform for Deep Learning-Based Modulation Recognition

Authors: Tao Chen, Shilian Zheng, Kunfeng Qiu, Luxin Zhang, Qi Xuan, Xiaoniu Yang

Abstract: The use of deep learning for radio modulation recognition has become prevalent in recent years. This approach automatically extracts high-dimensional features from large datasets, facilitating the accurate classification of modulation schemes. However, in real-world scenarios, it may not be feasible to gather sufficient training data in advance. Data augmentation is a method used to increase the diversity and quantity of the training dataset and to reduce data sparsity and imbalance. In this paper, we propose data augmentation methods that replace the detail coefficients obtained by discrete wavelet transform decomposition and then reconstruct the signal, generating new samples that expand the training set. Different generation methods are used to generate replacement sequences. Simulation results indicate that our proposed methods significantly outperform the other augmentation methods.

Submitted 7 November, 2023; originally announced November 2023.

arXiv:2311.03004 [pdf, other]

doi 10.1109/TVT.2024.3372704

Breaking the Degrees-of-Freedom Limit of Holographic MIMO Communications: A 3-D Antenna Array Topology

Authors: Shuai S. A. Yuan, Jie Wu, Hongjing Xu, Tengjiao Wang, Da Li, Xiaoming Chen, Chongwen Huang, Sheng Sun, Shilie Zheng, Xianmin Zhang, Er-Ping Li, Wei E. I. Sha

Abstract: The performance of holographic multiple-input multiple-output (MIMO) communications, employing two-dimensional (2-D) planar antenna arrays, is typically compromised by finite degrees-of-freedom (DOF) stemming from limited array size. The DOF constraint becomes significant when the element spacing approaches approximately half a wavelength, thereby restricting the overall performance of MIMO systems. To break this inherent limitation, we propose a novel three-dimensional (3-D) antenna array that strategically explores the untapped vertical dimension. We investigate the performance of MIMO systems utilizing 3-D arrays across different multi-path scenarios, encompassing Rayleigh channels with varying angular spreads and the 3rd generation partnership project (3GPP) channels. We subsequently showcase the advantages of these 3-D arrays over their 2-D counterparts with the same aperture sizes. As a proof of concept, a practical dipole-based 3-D array, facilitated by an electromagnetic band-gap (EBG) reflecting surface, is conceived, constructed, and evaluated. The experimental results align closely with full-wave simulations, and channel simulations substantiate that the DOF and capacity constraints of traditional holographic MIMO systems can be surpassed by adopting such a 3-D array configuration.

Submitted 27 February, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

Journal ref: IEEE Transactions on Vehicular Technology, Volume 73 , Issue 8, 2024

arXiv:2311.01721 [pdf, ps, other]

Tentative detection of cyanoformamide NCCONH2 in space

Authors: Juan Li, Donghui Quan, Junzhi Wang, Xia Zhang, Xing Lu, Qian Gou, Feng Gao, Yajun Wu, Edwin Bergin, Shanghuo Li, Zhiqiang Shen, Fujun Du, Meng Li, Siqi Zheng, Xingwu Zheng

Abstract: The peptide-like molecule cyanoformamide (NCCONH2) is the cyano (CN) derivative of formamide (NH2CHO). It is known to play a role in the synthesis of nucleic acid precursors under prebiotic conditions. In this paper, we present a tentative detection of NCCONH2 in the interstellar medium (ISM) with Atacama Large Millimeter/submillimeter Array (ALMA) archive data. Ten unblended lines of NCCONH2 were seen around the 3$\sigma$ noise level toward Sagittarius B2(N1E), a position that is slightly offset from the continuum peak. The column density of NCCONH2 was estimated to be $2.4\times10^{15}$ cm$^{-2}$, and the fractional abundance of NCCONH2 toward Sgr B2(N1E) was $6.9\times10^{-10}$. The abundance ratio between NCCONH2 and NH2CHO is estimated to be ~0.01. We also searched for other peptide-like molecules toward Sgr B2(N1E). The abundances of NH2CHO, CH3NCO and CH3NHCHO toward Sgr B2(N1E) were about one tenth of those toward Sgr B2(N1S), while the abundance of CH3CONH2 was only one twentieth of that toward Sgr B2(N1S).

Submitted 15 November, 2023; v1 submitted 3 November, 2023; originally announced November 2023.

Comments: 20 pages, 6 figures, 2 tables, accepted by PASJ
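
As a quick consistency check on the two numbers quoted in the cyanoformamide abstract above, and assuming the fractional abundance is defined relative to the H2 column density (the abstract does not state the reference species), the quoted column density and abundance together imply a hydrogen column density of roughly:

```latex
N(\mathrm{H_2}) \approx \frac{N(\mathrm{NCCONH_2})}{X(\mathrm{NCCONH_2})}
= \frac{2.4\times10^{15}\ \mathrm{cm^{-2}}}{6.9\times10^{-10}}
\approx 3.5\times10^{24}\ \mathrm{cm^{-2}}
```

This value is an inference from the abstract's figures under the stated assumption, not a number reported in the listing.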

arXiv:2311.00660 [pdf, other]

TPSeNCE: Towards Artifact-Free Realistic Rain Generation for Deraining and Object Detection in Rain

Authors: Shen Zheng, Changjie Lu, Srinivasa G. Narasimhan

Abstract: Rain generation algorithms have the potential to improve the generalization of deraining methods and scene understanding in rainy conditions. However, in practice, they produce artifacts and distortions and struggle to control the amount of rain generated due to a lack of proper constraints. In this paper, we propose an unpaired image-to-image translation framework for generating realistic rainy images. We first introduce a Triangular Probability Similarity (TPS) constraint to guide the generated images toward clear and rainy images in the discriminator manifold, thereby minimizing artifacts and distortions during rain generation. Unlike conventional contrastive learning approaches, which indiscriminately push negative samples away from the anchors, we propose a Semantic Noise Contrastive Estimation (SeNCE) strategy and reassess the pushing force of negative samples based on the semantic similarity between the clear and the rainy images and the feature similarity between the anchor and the negative samples. Experiments demonstrate realistic rain generation with minimal artifacts and distortions, which benefits image deraining and object detection in rain. Furthermore, the method can be used to generate realistic snowy and night images, underscoring its potential for broader applicability. Code is available at https://github.com/ShenZheng2000/TPSeNCE.

Submitted 7 November, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

Comments: WACV 2024

arXiv:2310.20161 [pdf, ps, other]

Sulphur isotopes toward Sagittarius B2 extended envelope in the Galactic Center

Authors: Qingxu Li, Juan Li, Siqi Zheng, Junzhi Wang, Feng Gao, Yajun Wu

Abstract: Isotopic ratios are good tools for probing stellar nucleosynthesis and chemical evolution. We performed high-sensitivity mapping observations of the J=7-6 rotational transitions of OCS, OC34S, O13CS, and OC33S toward the Galactic Center giant molecular cloud Sagittarius B2 (Sgr B2) with the IRAM 30m telescope. Positions with optically thin and uncontaminated lines are chosen to determine the sulfur isotope ratios. A 32S/34S ratio of $17.1\pm0.9$ was derived from the OCS and OC34S lines, while a 34S/33S ratio of $6.8\pm1.9$ was derived directly from the integrated intensity ratio of OC34S and OC33S. With independent and accurate measurements of the 32S/34S ratio, our results confirm the termination of the decreasing trend of 32S/34S ratios toward the Galactic Center, suggesting a drop in the production of massive stars at the Galactic Center.

Submitted 31 October, 2023; originally announced October 2023.

Comments: 20 pages, 7 figures, accepted by PASJ

arXiv:2310.19102 [pdf, other]

Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Authors: Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci

Abstract: The growing demand for Large Language Models (LLMs) in applications such as content generation, intelligent chatbots, and sentiment analysis poses considerable challenges for LLM service providers. To efficiently use GPU resources and boost throughput, batching multiple requests has emerged as a popular paradigm; to further speed up batching, LLM quantization techniques reduce memory consumption and increase computing capacity. However, prevalent quantization schemes (e.g., 8-bit weight-activation quantization) cannot fully leverage the capabilities of modern GPUs, such as 4-bit integer operators, resulting in sub-optimal performance. To maximize LLMs' serving throughput, we introduce Atom, a low-bit quantization method that achieves high throughput improvements with negligible accuracy loss. Atom significantly boosts serving throughput by using low-bit operators and considerably reduces memory consumption via low-bit quantization. It attains high accuracy by applying a novel mixed-precision and fine-grained quantization process. We evaluate Atom on 4-bit weight-activation quantization in the serving context. Atom improves end-to-end throughput (token/s) by up to $7.7\times$ compared to FP16 and by $2.5\times$ compared to INT8 quantization, while maintaining the same latency target.

Submitted 16 April, 2024; v1 submitted 29 October, 2023; originally announced October 2023.

Showing 101–150 of 995 results for author: Zheng, S