Search | arXiv e-print repository

Gate Tunable Asymmetric Ozone Adsorption on Graphene

Authors: Zhen Qi, Wanlei Li, Jun Cheng, Zhongxin Guo, Chenglong Li, Shang Wang, Zuoquan Tan, Zhiting Gao, Yongchao Wang, Zichen Lian, Shanshan Chen, Yonglin He, Zhiyong Wang, Yapei Wang, Jinsong Zhang, Yayu Wang, Peng Cai

Abstract: Molecular adsorption is pivotal in device fabrication and material synthesis for quantum technology. However, elucidating the behavior of physisorption poses technical challenges. Here graphene with ultrahigh sensitivity was utilized to detect ozone adsorption at cryogenic temperatures. Significant hole doping observed in graphene indicates a strong interaction between ozone and graphene. Interestingly, the adsorption exhibits asymmetry with positive and negative gate voltages. The strong affinity of ozone provides a tool to modulate materials and devices, while the gate tunability of adsorption offers new insights into construction and manipulation of oxide quantum materials.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.06007 [pdf]

Anomalous properties of spark plasma sintered boron nitride solids

Authors: Abhijit Biswas, Peter Serles, Gustavo A. Alvarez, Jesse Schimpf, Michel Hache, Jonathan Kong, Pedro Guerra Demingos, Bo Yuan, Tymofii S. Pieshkov, Chenxi Li, Anand B. Puthirath, Bin Gao, Tia Gray, Xiang Zhang, Jishnu Murukeshan, Robert Vajtai, Pengcheng Dai, Chandra Veer Singh, Jane Howe, Yu Zou, Lane W. Martin, James Patrick Clancy, Zhiting Tian, Tobin Filleter, Pulickel M. Ajayan

Abstract: Hexagonal boron nitride (h-BN) is brittle; however, its atomic-scale structural engineering can lead to unprecedented physical properties. Here we report the bulk synthesis of high-density crystalline h-BN solids by using high-temperature spark plasma sintering (SPS) of micron-size h-BN powders. In addition to the high mechanical strength and ductile response of such materials, we have obtained anomalous values of dielectric constant beyond theoretical limits, high thermal conductivity, and exceptional neutron radiation shielding capability. Through exhaustive characterizations we reveal that SPS induces non-basal plane crystallinity, twisting of layers, and facilitates inter-grain fusion with a high degree of in-plane alignment across macroscale dimensions, resulting in near-theoretical density and anomalous properties. Our findings highlight the importance of material design, via new approaches such as twisting and interconnections between atomically thin layers, to create novel ceramics with properties that could go beyond their intrinsic theoretical predictions.

Submitted 9 May, 2024; originally announced May 2024.

Comments: Authors version, comments are welcome

arXiv:2405.05817 [pdf, other]

Semi-Autonomous Laparoscopic Robot Docking with Learned Hand-Eye Information Fusion

Authors: Huanyu Tian, Martin Huber, Christopher E. Mower, Zhe Han, Changsheng Li, Xingguang Duan, Christos Bergeles

Abstract: In this study, we introduce a novel shared-control system for key-hole docking operations, combining a commercial camera with occlusion-robust pose estimation and a hand-eye information fusion technique. This system is used to enhance docking precision and force-compliance safety. To train a hand-eye information fusion network model, we generated a self-supervised dataset using this docking system. After training, our pose estimation method showed improved accuracy compared to traditional methods, including observation-only approaches, hand-eye calibration, and conventional state estimation filters. In real-world phantom experiments, our approach demonstrated its effectiveness with reduced position dispersion (1.23 ± 0.81 mm vs. 2.47 ± 1.22 mm) and force dispersion (0.78 ± 0.57 N vs. 1.15 ± 0.97 N) compared to the control group. These advancements in semi-autonomous co-manipulation scenarios enhance interaction and stability. The study presents an anti-interference, steady, and precise solution with potential applications extending beyond laparoscopic surgery to other minimally invasive procedures.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.04867 [pdf, other]

MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Haijin Zeng, Kai Feng, et al. (24 additional authors not shown)

Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2024/.

Submitted 8 May, 2024; originally announced May 2024.

Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

arXiv:2405.04566 [pdf, other]

Fast Decentralized Gradient Tracking for Federated Minimax Optimization with Local Updates

Authors: Chris Junchi Li

Abstract: Federated learning (FL) for minimax optimization has emerged as a powerful paradigm for training models across distributed nodes/clients while preserving data privacy and model robustness on data heterogeneity. In this work, we delve into the decentralized implementation of federated minimax optimization by proposing \texttt{K-GT-Minimax}, a novel decentralized minimax optimization algorithm that combines local updates and gradient tracking techniques. Our analysis showcases the algorithm's communication efficiency and convergence rate for nonconvex-strongly-concave (NC-SC) minimax optimization, demonstrating a superior convergence rate compared to existing methods. \texttt{K-GT-Minimax}'s ability to handle data heterogeneity and ensure robustness underscores its significance in advancing federated learning research and applications.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.04479 [pdf]

doi 10.1038/s41586-024-07584-w

Tunable superconductivity in electron- and hole-doped Bernal bilayer graphene

Authors: Chushan Li, Fan Xu, Bohao Li, Jiayi Li, Guoan Li, Kenji Watanabe, Takashi Taniguchi, Bingbing Tong, Jie Shen, Li Lu, Jinfeng Jia, Fengcheng Wu, Xiaoxue Liu, Tingxin Li

Abstract: Graphene-based, high quality two-dimensional electronic systems have emerged as a highly tunable platform for studying superconductivity. Specifically, superconductivity has been observed in both electron-doped and hole-doped twisted graphene moire systems, whereas in crystalline graphene systems, superconductivity has so far only been observed in hole-doped rhombohedral trilayer and hole-doped Bernal bilayer graphene (BBG). Recently, enhanced superconductivity has been demonstrated in BBG due to the proximity with a monolayer WSe2. Here, we report the observation of superconductivity and a series of flavor-symmetry-breaking phases in both electron- and hole-doped BBG/WSe2 device by electrostatic doping. The strength of the observed superconductivity is tunable by applied vertical electric fields. The maximum Berezinskii-Kosterlitz-Thouless (BKT) transition temperature for the electron- and hole-doped superconductivity is about 210 mK and 400 mK, respectively. Superconductivities emerge only when applied electric fields drive BBG electron or hole wavefunctions toward the WSe2 layer, underscoring the importance of the WSe2 layer in the observed superconductivity. We find the hole-doped superconductivity violates the Pauli paramagnetic limit, consistent with an Ising-like superconductor. In contrast, the electron-doped superconductivity obeys the Pauli limit, even though the proximity induced Ising spin-orbit coupling is also notable in the conduction band. Our findings highlight the rich physics associated with the conduction band in BBG, paving the way for further studies into the superconducting mechanisms of crystalline graphene and the development of novel superconductor devices based on BBG.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.04295 [pdf, other]

Semi-Supervised Disease Classification based on Limited Medical Image Data

Authors: Yan Zhang, Chun Li, Zhaoxia Liu, Ming Li

Abstract: In recent years, significant progress has been made in the field of learning from positive and unlabeled examples (PU learning), particularly in the context of advancing image and text classification tasks. However, applying PU learning to semi-supervised disease classification remains a formidable challenge, primarily due to the limited availability of labeled medical images. In the realm of medical image-aided diagnosis algorithms, numerous theoretical and practical obstacles persist. The research on PU learning for medical image-assisted diagnosis holds substantial importance, as it aims to reduce the time spent by professional experts in classifying images. Unlike natural images, medical images are typically accompanied by a scarcity of annotated data, while an abundance of unlabeled cases exists. Addressing these challenges, this paper introduces a novel generative model inspired by Hölder divergence, specifically designed for semi-supervised disease classification using positive and unlabeled medical image data. In this paper, we present a comprehensive formulation of the problem and establish its theoretical feasibility through rigorous mathematical analysis. To evaluate the effectiveness of our proposed approach, we conduct extensive experiments on five benchmark datasets commonly used in PU medical learning: BreastMNIST, PneumoniaMNIST, BloodMNIST, OCTMNIST, and AMD. The experimental results clearly demonstrate the superiority of our method over existing approaches based on KL divergence. Notably, our approach achieves state-of-the-art performance on all five disease classification benchmarks. By addressing the limitations imposed by limited labeled data and harnessing the untapped potential of unlabeled medical images, our novel generative model presents a promising direction for enhancing semi-supervised disease classification in the field of medical image analysis.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.04061 [pdf, other]

Generalized Cauchy-Schwarz Divergence and Its Deep Learning Applications

Authors: Mingfei Lu, Chenxu Li, Shujian Yu, Robert Jenssen, Badong Chen

Abstract: Divergence measures play a central role and become increasingly essential in deep learning, yet efficient measures for multiple (more than two) distributions are rarely explored. This becomes particularly crucial in areas where the simultaneous management of multiple distributions is both inevitable and essential. Examples include clustering, multi-source domain adaptation or generalization, and multi-view learning, among others. While computing the mean of pairwise distances between any two distributions is a prevalent method to quantify the total divergence among multiple distributions, it is imperative to acknowledge that this approach is not straightforward and necessitates significant computational resources. In this study, we introduce a new divergence measure tailored for multiple distributions named the generalized Cauchy-Schwarz divergence (GCSD). Additionally, we furnish a kernel-based closed-form sample estimator, making it convenient and straightforward to use in various machine-learning applications. Finally, we explore its profound implications in the realm of deep learning by applying it to tackle two thoughtfully chosen machine-learning tasks: deep clustering and multi-source domain adaptation. Our extensive experimental investigations confirm the robustness and effectiveness of GCSD in both scenarios. The findings also underscore the innovative potential of GCSD and its capability to significantly propel machine learning methodologies that necessitate the quantification of multiple distributions.

Submitted 5 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.04024 [pdf, other]

Stellar Population near NGC 2021: Procession of Star Formation in the South Rim of Supergiant Shell LMC 4

Authors: Po-Sheng Ou, Rui-Ching Chao, You-Hua Chu, Chin-Yi Hsu, Chuan-Jui Li

Abstract: Supergiant shells (SGSs) are the largest interstellar structures where heated and enriched gas flows into the host galaxy's halo. The SGSs in the Large Magellanic Cloud (LMC) are so close that their stars can be resolved with ground-based telescopes to allow studies of star formation history. Aiming to study the star formation history and energy budget of LMC 4, we have conducted a pilot study of the cluster NGC 2021 and the OB associations in its vicinity near the south rim of LMC 4. We use the Magellanic Cloud Photometric Survey data of the LMC to establish a methodology to examine the stellar population and assess the massive star formation history. We find a radial procession of massive star formation from the northwest part of the OB association LH79 through NGC 2021 to the OB association LH78 in the south. Using the stellar content of NGC 2021 and the assumption of Salpeter's initial mass function, we estimate that $\sim$4 supernovae have occurred in NGC 2021, injecting at least $4\times10^{51}$ ergs of kinetic energy into the interior of LMC 4.

Submitted 7 May, 2024; originally announced May 2024.

Comments: 9 pages, 9 figures, accepted by the Astronomical Journal

arXiv:2405.04007 [pdf, other]

SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing

Authors: Yuying Ge, Sijie Zhao, Chen Li, Yixiao Ge, Ying Shan

Abstract: In this technical report, we introduce SEED-Data-Edit: a unique hybrid dataset for instruction-guided image editing, which aims to facilitate image manipulation using open-form language. SEED-Data-Edit is composed of three distinct types of data: (1) High-quality editing data produced by an automated pipeline, ensuring a substantial volume of diverse image editing pairs. (2) Real-world scenario data collected from the internet, which captures the intricacies of user intentions for promoting the practical application of image editing in the real world. (3) High-precision multi-turn editing data annotated by humans, which involves multiple rounds of edits for simulating iterative editing processes. The combination of these diverse data sources makes SEED-Data-Edit a comprehensive and versatile dataset for training language-guided image editing models. We fine-tune a pretrained Multimodal Large Language Model (MLLM) that unifies comprehension and generation with SEED-Data-Edit. The instruction-tuned model demonstrates promising results, indicating the potential and effectiveness of SEED-Data-Edit in advancing the field of instructional image editing. The datasets are released at https://huggingface.co/datasets/AILab-CVC/SEED-Data-Edit.

Submitted 7 May, 2024; originally announced May 2024.

Comments: Technical Report; Dataset released in https://huggingface.co/datasets/AILab-CVC/SEED-Data-Edit

arXiv:2405.03553 [pdf, other]

AlphaMath Almost Zero: Process Supervision without Process

Authors: Guoxin Chen, Minpeng Liao, Chengxi Li, Kai Fan

Abstract: Recent advancements in large language models (LLMs) have substantially enhanced their mathematical reasoning abilities. However, these models still struggle with complex problems that require multiple reasoning steps, frequently leading to logical or numerical errors. While numerical mistakes can be largely addressed by integrating a code interpreter, identifying logical errors within intermediate steps is more challenging. Moreover, manually annotating these steps for training is not only expensive but also labor-intensive, requiring the expertise of professional annotators. In our study, we introduce an innovative approach that bypasses the need for process annotations (from human or GPTs) by utilizing the Monte Carlo Tree Search (MCTS) framework. This technique automatically generates both the process supervision and the step-level evaluation signals. Our method iteratively trains the policy and value models, leveraging the capabilities of a well-pretrained LLM to progressively enhance its mathematical reasoning skills. Furthermore, we propose an efficient inference strategy, step-level beam search, where the value model is crafted to assist the policy model (i.e., LLM) in navigating more effective reasoning paths, rather than solely relying on prior probabilities. The experimental results on both in-domain and out-of-domain datasets demonstrate that even without GPT-4 or human-annotated process supervision, our AlphaMath framework achieves comparable or superior results to previous state-of-the-art methods.

Submitted 23 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

Comments: update to the latest results

arXiv:2405.03501 [pdf, other]

Boosting Single Positive Multi-label Classification with Generalized Robust Loss

Authors: Yanxi Chen, Chunxiao Li, Xinyang Dai, Jinhuan Li, Weiyu Sun, Yiming Wang, Renyuan Zhang, Tinghe Zhang, Bo Wang

Abstract: Multi-label learning (MLL) requires comprehensive multi-semantic annotations that are hard to fully obtain, thus often resulting in missing-label scenarios. In this paper, we investigate Single Positive Multi-label Learning (SPML), where each image is associated with merely one positive label. Existing SPML methods only focus on designing losses using mechanisms such as hard pseudo-labeling and robust losses, mostly leading to unacceptable false negatives. To address this issue, we first propose a generalized loss framework based on expected risk minimization to provide soft pseudo labels, and point out that the former losses can be seamlessly converted into our framework. In particular, we design a novel robust loss based on our framework, which enjoys flexible coordination between false positives and false negatives, and can additionally deal with the imbalance between positive and negative samples. Extensive experiments show that our approach can significantly improve SPML performance and outperform the vast majority of state-of-the-art methods on all four benchmarks.

Submitted 6 May, 2024; originally announced May 2024.

Comments: 14 pages, 5 figures, 6 tables

arXiv:2405.03333 [pdf, other]

Light-VQA+: A Video Quality Assessment Model for Exposure Correction with Vision-Language Guidance

Authors: Xunchu Zhou, Xiaohong Liu, Yunlong Dong, Tengchuan Kou, Yixuan Gao, Zicheng Zhang, Chunyi Li, Haoning Wu, Guangtao Zhai

Abstract: Recently, User-Generated Content (UGC) videos have gained popularity in our daily lives. However, UGC videos often suffer from poor exposure due to the limitations of photographic equipment and techniques. Therefore, Video Exposure Correction (VEC) algorithms have been proposed, including Low-Light Video Enhancement (LLVE) and Over-Exposed Video Recovery (OEVR). Equally important to the VEC is the Video Quality Assessment (VQA). Unfortunately, almost all existing VQA models are built generally, measuring the quality of a video from a comprehensive perspective. As a result, Light-VQA, trained on LLVE-QA, is proposed for assessing LLVE. We extend the work of Light-VQA by expanding the LLVE-QA dataset into the Video Exposure Correction Quality Assessment (VEC-QA) dataset with over-exposed videos and their corresponding corrected versions. In addition, we propose Light-VQA+, a VQA model specialized in assessing VEC. Light-VQA+ differs from Light-VQA mainly in its use of the CLIP model and vision-language guidance during feature extraction, followed by a new module referring to the Human Visual System (HVS) for more accurate assessment. Extensive experimental results show that our model achieves the best performance against the current State-Of-The-Art (SOTA) VQA models on the VEC-QA dataset and other public datasets.

Submitted 13 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

arXiv:2405.03251 [pdf, ps, other]

Exploring the Frontiers of Softmax: Provable Optimization, Applications in Diffusion Model, and Beyond

Authors: Jiuxiang Gu, Chenyang Li, Yingyu Liang, Zhenmei Shi, Zhao Song

Abstract: The softmax activation function plays a crucial role in the success of large language models (LLMs), particularly in the self-attention mechanism of the widely adopted Transformer architecture. However, the underlying learning dynamics that contribute to the effectiveness of softmax remain largely unexplored. As a step towards better understanding, this paper provides a theoretical study of the optimization and generalization properties of two-layer softmax neural networks, providing theoretical insights into their superior performance compared with other activation functions, such as ReLU and exponential. Leveraging the Neural Tangent Kernel (NTK) framework, our analysis reveals that the normalization effect of the softmax function leads to a good perturbation property of the induced NTK matrix, resulting in a good convex region of the loss landscape. Consequently, softmax neural networks can learn the target function in the over-parametrization regime. To demonstrate the broad applicability of our theoretical findings, we apply them to the task of learning score estimation functions in diffusion models, a promising approach for generative modeling. Our analysis shows that gradient-based algorithms can learn the score function with a provable accuracy. Our work provides a deeper understanding of the effectiveness of softmax neural networks and their potential in various domains, paving the way for further advancements in natural language processing and beyond.

Submitted 6 May, 2024; originally announced May 2024.

Comments: 53 pages

arXiv:2405.03159 [pdf, other]

DeepMpMRI: Tensor-decomposition Regularized Learning for Fast and High-Fidelity Multi-Parametric Microstructural MR Imaging

Authors: Wenxin Fan, Jian Cheng, Cheng Li, Xinrui Ma, Jing Yang, Juan Zou, Ruoyou Wu, Zan Chen, Yuanjing Feng, Hairong Zheng, Shanshan Wang

Abstract: Deep learning has emerged as a promising approach for learning the nonlinear mapping between diffusion-weighted MR images and tissue parameters, which enables automatic and deep understanding of the brain microstructures. However, the efficiency and accuracy in the multi-parametric estimations are still limited since previous studies tend to estimate multi-parametric maps with dense sampling and isolated signal modeling. This paper proposes DeepMpMRI, a unified framework for fast and high-fidelity multi-parametric estimation from various diffusion models using sparsely sampled q-space data. DeepMpMRI is equipped with a newly designed tensor-decomposition-based regularizer to effectively capture fine details by exploiting the correlation across parameters. In addition, we introduce a Nesterov-based adaptive learning algorithm that optimizes the regularization parameter dynamically to enhance the performance. DeepMpMRI is an extendable framework capable of incorporating flexible network architecture. Experimental results demonstrate the superiority of our approach over 5 state-of-the-art methods in simultaneously estimating multi-parametric maps for various diffusion models with fine-grained details both quantitatively and qualitatively, achieving 4.5 - 22.5$\times$ acceleration compared to the dense sampling of a total of 270 diffusion gradients.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2405.02814 [pdf, other]

NegativePrompt: Leveraging Psychology for Large Language Models Enhancement via Negative Emotional Stimuli

Authors: Xu Wang, Cheng Li, Yi Chang, Jindong Wang, Yuan Wu

Abstract: Large Language Models (LLMs) have become integral to a wide spectrum of applications, ranging from traditional computing tasks to advanced artificial intelligence (AI) applications. This widespread adoption has spurred extensive research into LLMs across various disciplines, including the social sciences. Notably, studies have revealed that LLMs possess emotional intelligence, which can be further developed through positive emotional stimuli. This discovery raises an intriguing question: can negative emotions similarly influence LLMs, potentially enhancing their performance? In response to this question, we introduce NegativePrompt, a novel approach underpinned by psychological principles, involving ten specifically designed negative emotional stimuli. We embark on rigorous experimental evaluations of five LLMs including Flan-T5-Large, Vicuna, Llama 2, ChatGPT, and GPT-4, across a set of 45 tasks. The results are revealing: NegativePrompt markedly enhances the performance of LLMs, evidenced by relative improvements of 12.89% in Instruction Induction tasks and 46.25% in BIG-Bench tasks. Moreover, we conduct attention visualization experiments to decipher the underlying mechanisms of NegativePrompt's influence. Our research contributes significantly to the understanding of LLMs and emotion interaction, demonstrating the practical efficacy of NegativePrompt as an emotion-driven method and offering novel insights for the enhancement of LLMs in real-world applications. The code is available at https://github.com/wangxu0820/NegativePrompt.

Submitted 12 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

Comments: This paper has been accepted by IJCAI 2024

arXiv:2405.02717 [pdf, other]

AFter: Attention-based Fusion Router for RGBT Tracking

Authors: Andong Lu, Wanyu Wang, Chenglong Li, Jin Tang, Bin Luo

Abstract: Multi-modal feature fusion, a core component of RGBT tracking, has been the subject of numerous studies in recent years. However, existing RGBT tracking methods widely adopt fixed fusion structures to integrate multi-modal features, which struggle to handle the various challenges of dynamic scenarios. To address this problem, this work presents a novel Attention-based Fusion router called AFter, which optimizes the fusion structure to adapt to dynamic, challenging scenarios for robust RGBT tracking. In particular, we design a fusion structure space based on the hierarchical attention network, each attention-based fusion unit corresponding to a fusion operation and a combination of these attention units corresponding to a fusion structure. Through optimizing the combination of attention-based fusion units, we can dynamically select the fusion structure to adapt to various challenging scenarios. Unlike the complex search over different structures in neural architecture search algorithms, we develop a dynamic routing algorithm, which equips each attention-based fusion unit with a router, to predict the combination weights for efficient optimization of the fusion structure. Extensive experiments on five mainstream RGBT tracking datasets demonstrate the superior performance of the proposed AFter against state-of-the-art RGBT trackers. We release the code at https://github.com/Alexadlu/AFter.

Submitted 4 May, 2024; originally announced May 2024.

Comments: Peer review

arXiv:2405.02700 [pdf, other]

Identification of Novel Modes in Generative Models via Fourier-based Differential Clustering

Authors: Jingwei Zhang, Mohammad Jalali, Cheuk Ting Li, Farzan Farnia

Abstract: An interpretable comparison of generative models requires the identification of sample types produced more frequently by each of the involved models. While several quantitative scores have been proposed in the literature to rank different generative models, such score-based evaluations do not reveal the nuanced differences between the generative models in capturing various sample types. In this work, we attempt to solve a differential clustering problem to detect sample types expressed differently by two generative models. To solve the differential clustering problem, we propose a method called Fourier-based Identification of Novel Clusters (FINC) to identify modes produced by a generative model with a higher frequency in comparison to a reference distribution. FINC provides a scalable stochastic algorithm based on random Fourier features to estimate the eigenspace of kernel covariance matrices of two generative models and utilize the principal eigendirections to detect the sample types present more dominantly in each model. We demonstrate the application of the FINC method to large-scale computer vision datasets and generative model frameworks. Our numerical results suggest the scalability of the developed Fourier-based method in highlighting the sample types produced with different frequencies by widely-used generative models. Code is available at https://github.com/buyeah1109/FINC

Submitted 4 July, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

arXiv:2405.02485 [pdf, other]

A Survey of Few-Shot Learning for Biomedical Time Series

Authors: Chenqi Li, Timothy Denison, Tingting Zhu

Abstract: Advancements in wearable sensor technologies and the digitization of medical records have contributed to the unprecedented ubiquity of biomedical time series data. Data-driven models have tremendous potential to assist clinical diagnosis and improve patient care by improving long-term monitoring capabilities, facilitating early disease detection and intervention, as well as promoting personalized healthcare delivery. However, accessing extensively labeled datasets to train data-hungry deep learning models encounters many barriers, such as long-tail distribution of rare diseases, cost of annotation, privacy and security concerns, data-sharing regulations, and ethical considerations. An emerging approach to overcome the scarcity of labeled data is to augment AI methods with human-like capabilities to leverage past experiences to learn new tasks with limited examples, called few-shot learning. This survey provides a comprehensive review and comparison of few-shot learning methods for biomedical time series applications. The clinical benefits and limitations of such methods are discussed in relation to traditional data-driven approaches. This paper aims to provide insights into the current landscape of few-shot learning for biomedical time series and its implications for future research and applications.

Submitted 3 May, 2024; originally announced May 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2405.02234 [pdf]

Influence of Polymer on Shock-Induced Pore Collapse: Hotspot Criticality through Reactive Molecular Dynamics

Authors: Jalen Macatangay, Chunyu Li, Alejandro Strachan

Abstract: The shock initiation of energetic materials is mediated by the localization of mechanical energy into hotspots. These originate through the interaction of the shock and material microstructure; the most potent hotspots are formed by the collapse of porosity. Recent work using molecular dynamics (MD) has shed light on the molecular mechanisms responsible for the shock-to-deflagration transition following pore collapse in pure energetic materials. However, explosive formulations are composites of energetic crystals and a polymer binder, which differs from the prior focus on pure materials. The role of polymer phases on hotspot formation and its criticality is not well-understood. We use reactive MD simulations to investigate the role of polystyrene and polyvinyl nitrate films around pores in the shock-induced pore collapse of RDX. The polymer affects the hotspots' temperature and their criticality. While the presence of inert polymer often delays or hinders chemical reactions of the energetic material, certain geometries accelerate chemistry. The simulations provide a mechanistic understanding of these phenomena.

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2405.02196 [pdf, other]

GTA: a new General Tensor Accelerator with Better Area Efficiency and Data Reuse

Authors: Chenyang Ai, Lechuan Zhao, Zhijie Huang, Cangyuan Li, Xinan Wang, Ying Wang

Abstract: Recently, tensor algebra has witnessed significant applications across various domains. Each operator in tensor algebra features a different computational workload and precision. However, current general accelerators, such as VPU, GPGPU, and CGRA, support tensor operators with low energy and area efficiency. This paper conducts an in-depth exploration of general accelerators for tensor processing. First, we find the similarity between matrix multiplication and precision multiplication, and create a method for classifying tensor operators. Then, we implement the two discoveries and introduce the systolic architecture into a general-purpose accelerator. We thus propose a new General Tensor Accelerator (GTA), which has better area efficiency and data reuse. Furthermore, we create a large hardware scheduling space consisting of dataflow, precision, and array resize. Our evaluation results demonstrate that GTA achieves 7.76X, 5.35X, 8.76X memory efficiency and 6.45X, 3.39X, 25.83X speedup over VPU, GPGPU, and CGRA, respectively.

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2405.01868 [pdf, other]

Incorporating External Knowledge and Goal Guidance for LLM-based Conversational Recommender Systems

Authors: Chuang Li, Yang Deng, Hengchang Hu, Min-Yen Kan, Haizhou Li

Abstract: This paper aims to efficiently enable large language models (LLMs) to use external knowledge and goal guidance in conversational recommender system (CRS) tasks. Advanced LLMs (e.g., ChatGPT) are limited in domain-specific CRS tasks for 1) generating grounded responses with recommendation-oriented knowledge, or 2) proactively leading the conversations through different dialogue goals. In this work, we first analyze those limitations through a comprehensive evaluation, showing the necessity of external knowledge and goal guidance which contribute significantly to the recommendation accuracy and language quality. In light of this finding, we propose a novel ChatCRS framework to decompose the complex CRS task into several sub-tasks through the implementation of 1) a knowledge retrieval agent using a tool-augmented approach to reason over external Knowledge Bases and 2) a goal-planning agent for dialogue goal prediction. Experimental results on two multi-goal CRS datasets reveal that ChatCRS sets new state-of-the-art benchmarks, improving language quality of informativeness by 17% and proactivity by 27%, and achieving a tenfold enhancement in recommendation accuracy.

Submitted 3 May, 2024; originally announced May 2024.

Comments: Main paper 8 pages; References and Appendix 9 pages; 7 figures and 14 tables

arXiv:2405.01548 [pdf]

doi 10.1109/JLT.2023.3304659

Foundry's perspective on laser and SOA module integration with silicon photonics

Authors: James Y. S. Tan, Shawn Xie Wu, Salih Yanikgonul, Chao Li, Patrick Guo-Qiang Lo

Abstract: Silicon photonic integrated circuit (PIC) builds on the demand for a low cost approach from established silicon-based manufacturing infrastructure traditionally built for electronics. Besides its natural abundance, silicon has desirable properties such as optically low loss (at certain critical wavelengths), and small form factor to enable high density scaled-up optical on-chip circuitry. However, given its indirect bandgap, the platform is typically integrated with other direct bandgap (e.g., III-V semiconductor) platforms for on-chip light source. An effective solution to integrating light source onto silicon photonics platform is integral to a practical scaled-up and full-fledged integrated photonics implementation. Here, we discuss the integration solutions, and present our foundry's perspective toward realizing it.

Submitted 20 February, 2024; originally announced May 2024.

Comments: 14 pages

Journal ref: IEEE J Lightwave Technol. vol. 42, no. 3, pp. 1062-1074, 2024

arXiv:2405.01394 [pdf, other]

Analysis of a Modular Autonomous Driving Architecture: The Top Submission to CARLA Leaderboard 2.0 Challenge

Authors: Weize Zhang, Mohammed Elmahgiubi, Kasra Rezaee, Behzad Khamidehi, Hamidreza Mirkhani, Fazel Arasteh, Chunlin Li, Muhammad Ahsan Kaleem, Eduardo R. Corral-Soto, Dhruv Sharma, Tongtong Cao

Abstract: In this paper we present the architecture of the Kyber-E2E submission to the map track of CARLA Leaderboard 2.0 Autonomous Driving (AD) challenge 2023, which achieved first place. Our solution employs a modular architecture consisting of five main components: sensing, localization, perception, tracking/prediction, and planning/control. Our solution leverages state-of-the-art language-assisted perception models to help our planner perform more reliably in highly challenging traffic scenarios. We use open-source driving datasets in conjunction with Inverse Reinforcement Learning (IRL) to enhance the performance of our motion planner. We provide insight into our design choices and the trade-offs made to achieve this solution. We also explore the impact of each component on the overall performance of our solution, with the intent of providing a guideline for where allocation of resources can have the greatest impact.

Submitted 21 March, 2024; originally announced May 2024.

arXiv:2405.01275 [pdf, other]

Variable Selection in Ultra-high Dimensional Feature Space for the Cox Model with Interval-Censored Data

Authors: Daewoo Pak, Jianrui Zhang, Di Wu, Haolei Weng, Chenxi Li

Abstract: We develop a set of variable selection methods for the Cox model under interval censoring, in the ultra-high dimensional setting where the dimensionality can grow exponentially with the sample size. The methods select covariates via a penalized nonparametric maximum likelihood estimation with some popular penalty functions, including lasso, adaptive lasso, SCAD, and MCP. We prove that our penalized variable selection methods with folded concave penalties or adaptive lasso penalty enjoy the oracle property. Extensive numerical experiments show that the proposed methods have satisfactory empirical performance under various scenarios. The utility of the methods is illustrated through an application to a genome-wide association study of age to early childhood caries.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2405.00914 [pdf, other]

Accelerated Fully First-Order Methods for Bilevel and Minimax Optimization

Authors: Chris Junchi Li

Abstract: This paper presents a new algorithm member for accelerating first-order methods for bilevel optimization, namely the (Perturbed) Restarted Accelerated Fully First-order methods for Bilevel Approximation, abbreviated as (P)RAF$^2$BA. The algorithm leverages fully first-order oracles and seeks approximate stationary points in nonconvex-strongly-convex bilevel optimization, enhancing oracle complexity for efficient optimization. Theoretical guarantees for finding approximate first-order stationary points and second-order stationary points at the state-of-the-art query complexities are established, showcasing their effectiveness in solving complex optimization tasks. Empirical studies for real-world problems are provided to further validate the outperformance of our proposed algorithms. The significance of (P)RAF$^2$BA in optimizing nonconvex-strongly-convex bilevel optimization problems is underscored by its state-of-the-art convergence rates and computational efficiency.

Submitted 3 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

Comments: Minor typographical updates. arXiv admin note: text overlap with arXiv:2307.00126

arXiv:2405.00742 [pdf, other]

Federated Graph Learning for EV Charging Demand Forecasting with Personalization Against Cyberattacks

Authors: Yi Li, Renyou Xie, Chaojie Li, Yi Wang, Zhaoyang Dong

Abstract: Mitigating cybersecurity risk in electric vehicle (EV) charging demand forecasting plays a crucial role in the safe operation of collective EV charging, the stability of the power grid, and cost-effective infrastructure expansion. However, existing methods either suffer from data privacy issues and susceptibility to cyberattacks or fail to consider the spatial correlation among different stations. To address these challenges, a federated graph learning approach involving multiple charging stations is proposed to collaboratively train a more generalized deep learning model for demand forecasting while capturing spatial correlations among various stations and enhancing robustness against potential attacks. Firstly, for better model performance, a Graph Neural Network (GNN) model is leveraged to characterize the geographic correlation among different charging stations in a federated manner. Secondly, to ensure robustness and deal with the data heterogeneity in a federated setting, a message passing scheme that utilizes a global attention mechanism to aggregate personalized models for each client is proposed. Thirdly, to address cyberattacks, a special credit-based function is designed to mitigate potential threats from malicious clients or unwanted attacks. Extensive experiments on a public EV charging dataset are conducted using various deep learning techniques and federated learning methods to demonstrate the prediction accuracy and robustness of the proposed approach.

Submitted 30 April, 2024; originally announced May 2024.

Comments: 11 pages,4 figures

arXiv:2405.00506 [pdf]

Significant tuning of internal mode coupling in doubly clamped MEMS beam resonators by thermal effect

Authors: Chao Li, Qian Liu, Kohei Uchida, Hua Li, Kazuhiko Hirakawa, Ya Zhang

Abstract: Intermodal coupling has been demonstrated to be a promising mechanism for the development of advanced micro/nanoelectromechanical devices. However, strong mode coupling remains a key challenge limiting the practical application of intermodal coupling. Furthermore, the insight into physical mechanisms underlying mode coupling and the capability to quantitatively tune the mode coupling is also limited. Here, we experimentally and theoretically demonstrate the significant tunability of mode coupling by using the thermal tuning effect, yet in an asymmetric doubly-clamped MEMS beam resonator, enabling various coupling strength to be implemented for practical applications. In this system, two out-of-plane vibrational modes are mechanically coupled through displacement-induced tension, and their mode coupling strength arises from both hardening and softening nonlinearities of the two modes, thus allowing for the tuning of mode coupling strength by thermally enhancing the softening nonlinearity of the MEMS beam. Our results demonstrate a feasible approach to tune the mode coupling and offer insights into fundamental mechanism of mode coupling in MEMS beam resonators, paving the way for the development of MEMS resonators with enhanced performance and application-specific tunability.

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2405.00236 [pdf, other]

STT: Stateful Tracking with Transformers for Autonomous Driving

Authors: Longlong Jing, Ruichi Yu, Xu Chen, Zhengli Zhao, Shiwei Sheng, Colin Graber, Qi Chen, Qinru Li, Shangxuan Wu, Han Deng, Sangjin Lee, Chris Sweeney, Qiurui He, Wei-Chih Hung, Tong He, Xingyi Zhou, Farshid Moussavi, Zijian Guo, Yin Zhou, Mingxing Tan, Weilong Yang, Congcong Li

Abstract: Tracking objects in three-dimensional space is critical for autonomous driving. To ensure safety while driving, the tracker must be able to reliably track objects across frames and accurately estimate their states such as velocity and acceleration in the present. Existing works frequently focus on the association task while either neglecting the model performance on state estimation or deploying complex heuristics to predict the states. In this paper, we propose STT, a Stateful Tracking model built with Transformers, that can consistently track objects in the scenes while also predicting their states accurately. STT consumes rich appearance, geometry, and motion signals through long term history of detections and is jointly optimized for both data association and state estimation tasks. Since the standard tracking metrics like MOTA and MOTP do not capture the combined performance of the two tasks in the wider spectrum of object states, we extend them with new metrics called S-MOTA and MOTPS that address this limitation. STT achieves competitive real-time performance on the Waymo Open Dataset.

Submitted 30 April, 2024; originally announced May 2024.

Comments: ICRA 2024

arXiv:2404.19534 [pdf, other]

MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

Authors: Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu, Chen Change Loy, Lize Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Shuan Chen, Guangqi Shao, Xiaotao Wang, Lei Lei, Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huanjing Yue, Jingyu Yang, et al. (38 additional authors not shown)

Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2024/.

Submitted 27 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

arXiv:2404.19525 [pdf, other]

MicroDreamer: Zero-shot 3D Generation in $\sim$20 Seconds by Score-based Iterative Reconstruction

Authors: Luxi Chen, Zhengyi Wang, Zihan Zhou, Tingting Gao, Hang Su, Jun Zhu, Chongxuan Li

Abstract: Optimization-based approaches, such as score distillation sampling (SDS), show promise in zero-shot 3D generation but suffer from low efficiency, primarily due to the high number of function evaluations (NFEs) required for each sample. In this paper, we introduce score-based iterative reconstruction (SIR), an efficient and general algorithm mimicking a differentiable 3D reconstruction process to reduce the NFEs. Given a single set of images sampled from a multi-view score-based diffusion model, SIR repeatedly optimizes 3D parameters, unlike the single-step optimization in SDS. With other improvements in training, we present an efficient approach called MicroDreamer that generally applies to various 3D representations and 3D generation tasks. In particular, retaining a comparable performance, MicroDreamer is 5-20 times faster than SDS in generating neural radiance field and takes about 20 seconds to generate meshes from 3D Gaussian splatting on a single A100 GPU, halving the time of the fastest zero-shot baseline, DreamGaussian. Our code is available at https://github.com/ML-GSAI/MicroDreamer.

Submitted 27 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.19335 [pdf, other]

StablePT: Towards Stable Prompting for Few-shot Learning via Input Separation

Authors: Xiaoming Liu, Chen Liu, Zhaohan Zhang, Chengzhengxu Li, Longtian Wang, Yu Lan, Chao Shen

Abstract: Large language models have shown their ability to become effective few-shot learners with prompting, revolutionizing the paradigm of learning with data scarcity. However, this approach largely depends on the quality of prompt initialization, and always exhibits large variability among different runs. Such property makes prompt tuning highly unreliable and vulnerable to poorly constructed prompts, which limits its extension to more real-world applications. To tackle this issue, we propose to treat the hard prompt and soft prompt as separate inputs to mitigate noise brought by the prompt initialization. Furthermore, we optimize soft prompts with contrastive learning for utilizing class-aware information in the training process to maintain model performance. Experimental results demonstrate that StablePT outperforms state-of-the-art methods by 7.20% in accuracy and reduces the standard deviation by 2.02 on average. Furthermore, extensive experiments underscore its robustness and stability across 7 datasets covering various tasks.

Submitted 30 April, 2024; originally announced April 2024.

Comments: Submitted to ACL 2024

arXiv:2404.18757 [pdf, ps, other]

Flow by Gauss curvature to the Minkowski problem of p-harmonic measure

Authors: Chao Li, Xia Zhao

Abstract: The Minkowski problem of harmonic measures was first studied by Jerison [19]. Recently, Akman and Mukherjee [1] studied the Minkowski problem corresponding to $p$-harmonic measures on convex domains and generalized Jerison's results. In this paper, we prove the existence of the smooth solution to the Minkowski problem for the $p$-harmonic measure by method of the Gauss curvature flow.

Submitted 5 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

Comments: 20 pages

MSC Class: 35K96; 52A20; 53C21; 31B05; 31A15

arXiv:2404.18753 [pdf, ps, other]

Fixers and derangements of finite permutation groups

Authors: Hong Yi Huang, Cai Heng Li, Yi Lin Xie

Abstract: Let $G\leqslant\mathrm{Sym}(Ω)$ be a finite transitive permutation group with point stabiliser $H$. We say that a subgroup $K$ of $G$ is a fixer if every element of $K$ has fixed points, and we say that $K$ is large if $|K| \geqslant |H|$. There is a special interest in studying large fixers due to connections with Erdős-Ko-Rado type problems. In this paper, we classify up to conjugacy the large fixers of the almost simple primitive groups with socle $\mathrm{PSL}_2(q)$, and we use this result to verify a special case of a conjecture of Spiga on permutation characters. We also present some results on large fixers of almost simple primitive groups with socle an alternating or sporadic group.

Submitted 29 April, 2024; originally announced April 2024.

Comments: 40 pages

arXiv:2404.18343 [pdf, other]

G-Refine: A General Quality Refiner for Text-to-Image Generation

Authors: Chunyi Li, Haoning Wu, Hongkun Hao, Zicheng Zhang, Tengchaun Kou, Chaofeng Chen, Lei Bai, Xiaohong Liu, Weisi Lin, Guangtao Zhai

Abstract: With the evolution of Text-to-Image (T2I) models, the quality defects of AI-Generated Images (AIGIs) pose a significant barrier to their widespread adoption. In terms of both perception and alignment, existing models cannot always guarantee high-quality results. To mitigate this limitation, we introduce G-Refine, a general image quality refiner designed to enhance low-quality images without compromising the integrity of high-quality ones. The model is composed of three interconnected modules: a perception quality indicator, an alignment quality indicator, and a general quality enhancement module. Based on the mechanisms of the Human Visual System (HVS) and syntax trees, the first two indicators can respectively identify the perception and alignment deficiencies, and the last module can apply targeted quality enhancement accordingly. Extensive experimentation reveals that when compared to alternative optimization methods, AIGIs after G-Refine outperform in 10+ quality metrics across 4 databases. This improvement significantly contributes to the practical application of contemporary T2I models, paving the way for their broader adoption. The code will be released on https://github.com/Q-Future/Q-Refine.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.18319 [pdf, other]

User Welfare Optimization in Recommender Systems with Competing Content Creators

Authors: Fan Yao, Yiming Liao, Mingzhe Wu, Chuanhao Li, Yan Zhu, James Yang, Qifan Wang, Haifeng Xu, Hongning Wang

Abstract: Driven by the new economic opportunities created by the creator economy, an increasing number of content creators rely on and compete for revenue generated from online content recommendation platforms. This burgeoning competition reshapes the dynamics of content distribution and profoundly impacts long-term user welfare on the platform. However, the absence of a comprehensive picture of global user preference distribution often traps the competition, especially the creators, in states that yield sub-optimal user welfare. To encourage creators to best serve a broad user population with relevant content, it becomes the platform's responsibility to leverage its information advantage regarding user preference distribution to accurately signal creators. In this study, we perform system-side user welfare optimization under a competitive game setting among content creators. We propose an algorithmic solution for the platform, which dynamically computes a sequence of weights for each user based on their satisfaction of the recommended content. These weights are then utilized to design mechanisms that adjust the recommendation policy or the post-recommendation rewards, thereby influencing creators' content production strategies. To validate the effectiveness of our proposed method, we report our findings from a series of experiments, including: 1. a proof-of-concept negative example illustrating how creators' strategies converge towards sub-optimal states without platform intervention; 2. offline experiments employing our proposed intervention mechanisms on diverse datasets; and 3. results from a three-week online experiment conducted on a leading short-video recommendation platform.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.18203 [pdf, other]

LMM-PCQA: Assisting Point Cloud Quality Assessment with LMM

Authors: Zicheng Zhang, Haoning Wu, Yingjie Zhou, Chunyi Li, Wei Sun, Chaofeng Chen, Xiongkuo Min, Xiaohong Liu, Weisi Lin, Guangtao Zhai

Abstract: Although large multi-modality models (LMMs) have seen extensive exploration and application in various quality assessment studies, their integration into Point Cloud Quality Assessment (PCQA) remains unexplored. Given LMMs' exceptional performance and robustness in low-level vision and quality assessment tasks, this study aims to investigate the feasibility of imparting PCQA knowledge to LMMs through text supervision. To achieve this, we transform quality labels into textual descriptions during the fine-tuning phase, enabling LMMs to derive quality rating logits from 2D projections of point clouds. To compensate for the loss of perception in the 3D domain, structural features are extracted as well. These quality logits and structural features are then combined and regressed into quality scores. Our experimental results affirm the effectiveness of our approach, showcasing a novel integration of LMMs into PCQA that enhances model understanding and assessment accuracy. We hope our contributions can inspire subsequent investigations into the fusion of LMMs with PCQA, fostering advancements in 3D visual quality analysis and beyond.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.18047 [pdf, other]

LIKO: LiDAR, Inertial, and Kinematic Odometry for Bipedal Robots

Authors: Qingrui Zhao, Mingyuan Li, Yongliang Shi, Xuechao Chen, Zhangguo Yu, Lianqiang Han, Zhenyuan Fu, **tao Zhang, Chao Li, Yuanxi Zhang, Qiang Huang

Abstract: High-frequency and accurate state estimation is crucial for biped robots. This paper presents a tightly-coupled LiDAR-Inertial-Kinematic Odometry (LIKO) for biped robot state estimation based on an iterated extended Kalman filter. Beyond state estimation, the foot contact position is also modeled and estimated. This allows for both position and velocity updates from kinematic measurement. Additionally, the use of kinematic measurement results in an increased output state frequency of about 1kHz. This ensures temporal continuity of the estimated state and makes it practical for control purposes of biped robots. We also announce a biped robot dataset consisting of LiDAR, inertial measurement unit (IMU), joint encoders, force/torque (F/T) sensors, and motion capture ground truth to evaluate the proposed method. The dataset is collected during robot locomotion, and our approach reached the best quantitative result among other LIO-based methods and biped robot state estimation algorithms. The dataset and source code will be available at https://github.com/Mr-Zqr/LIKO.

Submitted 27 April, 2024; originally announced April 2024.

arXiv:2404.17926 [pdf, other]

Pre-training on High Definition X-ray Images: An Experimental Study

Authors: Xiao Wang, Yuehang Li, Wentao Wu, Jiandong **, Yao Rong, Bo Jiang, Chuanfu Li, ** Tang

Abstract: Existing X-ray based pre-trained vision models are usually conducted on a relatively small-scale dataset (less than 500k samples) with limited resolution (e.g., 224 $\times$ 224). However, the key to the success of self-supervised pre-training large models lies in massive training data, and maintaining high resolution in the field of X-ray images is the guarantee of effective solutions to difficult miscellaneous diseases. In this paper, we address these issues by proposing the first high-definition (1280 $\times$ 1280) X-ray based pre-trained foundation vision model on our newly collected large-scale dataset which contains more than 1 million X-ray images. Our model follows the masked auto-encoder framework which takes the tokens after mask processing (with a high rate) is used as input, and the masked image patches are reconstructed by the Transformer encoder-decoder network. More importantly, we introduce a novel context-aware masking strategy that utilizes the chest contour as a boundary for adaptive masking operations. We validate the effectiveness of our model on two downstream tasks, including X-ray report generation and disease recognition. Extensive experiments demonstrate that our pre-trained medical foundation vision model achieves comparable or even new state-of-the-art performance on downstream benchmark datasets. The source code and pre-trained models of this paper will be released on https://github.com/Event-AHU/Medical_Image_Analysis.

Submitted 27 April, 2024; originally announced April 2024.

Comments: Technology Report

arXiv:2404.16687 [pdf, other]

NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng , et al. (89 additional authors not shown)

Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Content (AIGC). The challenge is divided into the image track and the video track. The image track uses the AIGIQA-20K, which contains 20,000 AI-Generated Images (AIGIs) generated by 15 popular generative models. The image track has a total of 318 registered participants. A total of 1,646 submissions are received in the development phase, and 221 submissions are received in the test phase. Finally, 16 participating teams submitted their models and fact sheets. The video track uses the T2VQA-DB, which contains 10,000 AI-Generated Videos (AIGVs) generated by 9 popular Text-to-Video (T2V) models. A total of 196 participants have registered in the video track. A total of 991 submissions are received in the development phase, and 185 submissions are received in the test phase. Finally, 12 participating teams submitted their models and fact sheets. Some methods have achieved better results than baseline methods, and the winning methods in both tracks have demonstrated superior prediction performance on AIGC.

Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.16425 [pdf, other]

Soft X-ray prompt emission from a high-redshift gamma-ray burst EP240315a

Authors: Y. Liu, H. Sun, D. Xu, D. S. Svinkin, J. Delaunay, N. R. Tanvir, H. Gao, C. Zhang, Y. Chen, X. -F. Wu, B. Zhang, W. Yuan, J. An, G. Bruni, D. D. Frederiks, G. Ghirlanda, J. -W. Hu, A. Li, C. -K. Li, J. -D. Li, D. B. Malesani, L. Piro, G. Raman, R. Ricci, E. Troja , et al. (170 additional authors not shown)

Abstract: Long gamma-ray bursts (GRBs) are believed to originate from core collapse of massive stars. High-redshift GRBs can probe the star formation and reionization history of the early universe, but their detection remains rare. Here we report the detection of a GRB triggered in the 0.5--4 keV band by the Wide-field X-ray Telescope (WXT) on board the Einstein Probe (EP) mission, designated as EP240315a, whose bright peak was also detected by the Swift Burst Alert Telescope and Konus-Wind through off-line analyses. At a redshift of $z=4.859$, EP240315a showed a much longer and more complicated light curve in the soft X-ray band than in gamma-rays. Benefiting from a large field-of-view ($\sim$3600 deg$^2$) and a high sensitivity, EP-WXT captured the earlier engine activation and extended late engine activity through a continuous detection. With a peak X-ray flux at the faint end of previously known high-$z$ GRBs, the detection of EP240315a demonstrates the great potential for EP to study the early universe via GRBs.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 41 pages, 8 figures, 7 tables

arXiv:2404.16313 [pdf, ps, other]

Further Investigations on Nonlinear Complexity of Periodic Binary Sequences

Authors: Qin Yuan, Chunlei Li, Xiangyong Zeng, Tor Helleseth, Debiao He

Abstract: Nonlinear complexity is an important measure for assessing the randomness of sequences. In this paper we investigate how circular shifts affect the nonlinear complexities of finite-length binary sequences and then reveal a more explicit relation between nonlinear complexities of finite-length binary sequences and their corresponding periodic sequences. Based on the relation, we propose two algorithms that can generate all periodic binary sequences with any prescribed nonlinear complexity.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.16205 [pdf, other]

AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results

Authors: Marcos V. Conde, Saman Zadtootaghaj, Nabajeet Barman, Radu Timofte, Chenlong He, Qi Zheng, Ruoxi Zhu, Zhengzhong Tu, Haiqiang Wang, Xiangguang Chen, Wenhui Meng, Xiang Pan, Huiying Shi, Han Zhu, Xiaozhong Xu, Lei Sun, Zhenzhong Chen, Shan Liu, Zicheng Zhang, Haoning Wu, Yingjie Zhou, Chunyi Li, Xiaohong Liu, Weisi Lin, Guangtao Zhai , et al. (11 additional authors not shown)

Abstract: This paper reviews the AIS 2024 Video Quality Assessment (VQA) Challenge, focused on User-Generated Content (UGC). The aim of this challenge is to gather deep learning-based methods capable of estimating the perceptual quality of UGC videos. The user-generated videos from the YouTube UGC Dataset include diverse content (sports, games, lyrics, anime, etc.), quality and resolutions. The proposed methods must process 30 FHD frames under 1 second. In the challenge, a total of 102 participants registered, and 15 submitted code and models. The performance of the top-5 submissions is reviewed and provided here as a survey of diverse deep models for efficient video quality assessment of user-generated content.

Submitted 24 April, 2024; originally announced April 2024.

Comments: CVPR 2024 Workshop -- AI for Streaming (AIS) Video Quality Assessment Challenge

arXiv:2404.15922 [pdf, other]

doi 10.1103/PhysRevLett.132.213602

Single-Atom Verification of the Optimal Trade-Off between Speed and Cost in Shortcuts to Adiabaticity

Authors: J. -W. Zhang, J. -T. Bu, J. C. Li, Weiquan Meng, W. -Q. Ding, B. Wang, W. -F. Yuan, H. -J. Du, G. -Y. Ding, W. -J. Chen, L. Chen, F. Zhou, Zhenyu Xu, M. Feng

Abstract: The approach of shortcuts to adiabaticity enables the effective execution of adiabatic dynamics in quantum information processing with enhanced speed. Owing to the inherent trade-off between dynamical speed and the cost associated with the transitionless driving field, executing arbitrarily fast operations becomes impractical. To understand the accurate interplay between speed and energetic cost in this process, we propose theoretically and verify experimentally a new trade-off, which is characterized by a tightly optimized bound within $s$-parameterized phase spaces. Our experiment is carried out in a single ultracold $^{40}$Ca$^{+}$ ion trapped in a harmonic potential. By exactly operating the quantum states of the ion, we execute the Landau-Zener model as an example, where the quantum speed limit as well as the cost are governed by the spectral gap. We witness that our proposed trade-off is indeed tight in scenarios involving both initially eigenstates and initially thermal equilibrium states. Our work helps understanding the fundamental constraints in shortcuts to adiabaticity and illuminates the potential of under-utilized phase spaces that have been traditionally overlooked.

Submitted 6 June, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

Comments: 6+5 pages, 3+3 figures

Journal ref: Phys. Rev. Lett. 132, 213602 (2024)

arXiv:2404.15766 [pdf, other]

Unifying Bayesian Flow Networks and Diffusion Models through Stochastic Differential Equations

Authors: Kaiwen Xue, Yuhao Zhou, Shen Nie, Xu Min, Xiaolu Zhang, Jun Zhou, Chongxuan Li

Abstract: Bayesian flow networks (BFNs) iteratively refine the parameters, instead of the samples in diffusion models (DMs), of distributions at various noise levels through Bayesian inference. Owing to its differentiable nature, BFNs are promising in modeling both continuous and discrete data, while simultaneously maintaining fast sampling capabilities. This paper aims to understand and enhance BFNs by connecting them with DMs through stochastic differential equations (SDEs). We identify the linear SDEs corresponding to the noise-addition processes in BFNs, demonstrate that BFN's regression losses are aligned with denoise score matching, and validate the sampler in BFN as a first-order solver for the respective reverse-time SDE. Based on these findings and existing recipes of fast sampling in DMs, we propose specialized solvers for BFNs that markedly surpass the original BFN sampler in terms of sample quality with a limited number of function evaluations (e.g., 10) on both image and text datasets. Notably, our best sampler achieves an increase in speed of 5~20 times for free. Our code is available at https://github.com/ML-GSAI/BFN-Solver.

Submitted 2 June, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

Comments: Published as a conference paper at ICML 2024

arXiv:2404.15702 [pdf, other]

Nyonic Technical Report

Authors: Junfeng Tian, Rui Wang, Cong Li, Yudong Zhou, Jun Liu, Jun Wang

Abstract: This report details the development and key achievements of our latest language model designed for custom large language models. The advancements introduced include a novel Online Data Scheduler that supports flexible training data adjustments and curriculum learning. The model's architecture is fortified with state-of-the-art techniques such as Rotary Positional Embeddings, QK-LayerNorm, and a specially crafted multilingual tokenizer to enhance stability and performance. Moreover, our robust training framework incorporates advanced monitoring and rapid recovery features to ensure optimal efficiency. Our Wonton 7B model has demonstrated competitive performance on a range of multilingual and English benchmarks. Future developments will prioritize narrowing the performance gap with more extensively trained models, thereby enhancing the model's real-world efficacy and adaptability. GitHub: https://github.com/nyonicai/nyonic-public

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.15688 [pdf, ps, other]

Observer-Based Realization of Control Systems

Authors: Daizhan Cheng, Changxi Li, Xiao Zhang, Zheng** Ji

Abstract: Lebesgue-type of dynamic control systems and dimension-keeping semi-tensor product (DK-STP) of matrices are introduced. Using bridge matrices, the DK-STP is used to construct approximated observer-based realization (OR) of linear control systems, as Lebesgue-type control systems, are proposed. A necessary and sufficient condition for the OR-system to have exactly same observer dynamics is obtained. When the exact OR-system does not exist, the extended OR-system, which contains observers of the original system as part of its state variables, is presented. Moreover, the (minimum) feedback (extended) OR-system is also constructed, and its relationship with Kalman's minimum realization is revealed. Finally, the technique developed for linear control systems has been extended to affine nonlinear control systems. The purpose of OR-system is to provide a new technique to deal with large scale complex systems.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.15411 [pdf, other]

Stellar atmospheric parameters of $\sim$ 11,000 RR Lyrae stars from LAMOST Spectra

Authors: Jiangtao Wang, Jianrong Shi, Jianning Fu, Weikai Zong, Chunqian Li

Abstract: Accurate determination of the stellar atmospheric parameters of RR Lyrae stars (RRLs) requires short individual exposures of the spectra to mitigate pulsation effects. We present improved template matching methods to determine the stellar atmospheric parameters of RRLs from single-epoch spectra of LAMOST (Large Sky Area Multi-Object Fiber Spectroscopic Telescope, also known as the Guoshoujing telescope). We determine the radial velocities and stellar atmospheric parameters (effective temperature: $T_\mathrm{eff}$, surface gravity: $\log{g}$, and metallicity: [M/H]) of 10,486 and 1,027 RRLs from 42,729 low-resolution spectra (LRS) and 7,064 medium-resolution spectra (MRS) of LAMOST, respectively. Our results are in good agreement with the parameters of other databases, where the external uncertainties of $T_\mathrm{eff}$, $\log{g}$, and [M/H] for LRS/MRS are estimated to be 314/274 K, 0.42/0.29 dex, and 0.39/0.31 dex, respectively. We conclude with the variation characteristics of the radial velocities ($RV$) and stellar atmospheric parameters for RRLs during the pulsation phase. There is a significant difference of $28\pm21$ km/s between the peak-to-peak amplitude ($A_\mathrm{ptp}$) of $RV$ from H$α$ line ($RV_\mathrm{Hα}$) and from metal lines ($RV_\mathrm{metal}$) for RRab, whereas it is only $4\pm17$ km/s for RRc. The $A_\mathrm{ptp}$ of $T_\mathrm{eff}$ is $930\pm456$ and $409\pm375$ K for RRab and RRc, respectively. The $\log{g}$ of RRab show mild variation of approximately $0.23\pm0.42$ dex near the phase of $\varphi = 0.9$, while that of RRc almost remains constant. The [M/H] of RRab and RRc show a minor variation of about $0.25\pm0.50$ and $0.28\pm0.55$ dex, respectively, near the phase of $\varphi = 0.9$.

Submitted 23 April, 2024; originally announced April 2024.

Comments: 17 pages, 10 figures, accepted for publication in ApJs

arXiv:2404.14969 [pdf, ps, other]

The symmetric (2+1)-dimensional Lotka-Volterra equation with self-consistent sources

Authors: Mengyuan Cui, Chunxia Li, Yuqin Yao

Abstract: The symmetric (2+1)-dimensional Lotka-Volterra equation with self-consistent sources is constructed and solved by employing the source generation procedure, whose solutions are expressed in terms of pfaffians. As special cases of the pfaffian solutions, different types of explicit solutions are obtained, including dromions, soliton solutions and breather solutions.

Submitted 23 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.14955 [pdf, other]

A Comprehensive Survey for Hyperspectral Image Classification: The Evolution from Conventional to Transformers

Authors: Muhammad Ahmad, Salvatore Distifano, Adil Mehmood Khan, Manuel Mazzara, Chenyu Li, **g Yao, Hao Li, Jagannath Aryal, Gemine Vivone, Danfeng Hong

Abstract: Hyperspectral Image Classification (HSC) is a challenging task due to the high dimensionality and complex nature of Hyperspectral (HS) data. Traditional Machine Learning approaches while effective, face challenges in real-world data due to varying optimal feature sets, subjectivity in human-driven design, biases, and limitations. Traditional approaches encounter the curse of dimensionality, struggle with feature selection and extraction, lack spatial information consideration, exhibit limited robustness to noise, face scalability issues, and may not adapt well to complex data distributions. In recent years, DL techniques have emerged as powerful tools for addressing these challenges. This survey provides a comprehensive overview of the current trends and future prospects in HSC, focusing on the advancements from DL models to the emerging use of Transformers. We review the key concepts, methodologies, and state-of-the-art approaches in DL for HSC. We explore the potential of Transformer-based models in HSC, outlining their benefits and challenges. We also delve into emerging trends in HSC, as well as thorough discussions on Explainable AI and Interoperability concepts along with Diffusion Models (image denoising, feature extraction, and image fusion). Additionally, we address several open challenges and research questions pertinent to HSC. Comprehensive experimental results have been undertaken using three HS datasets to verify the efficacy of various conventional DL models and Transformers. Finally, we outline future research directions and potential applications that can further enhance the accuracy and efficiency of HSC. The Source code is available at https://github.com/mahmad00/Conventional-to-Transformer-for-Hyperspectral-Image-Classification-Survey-2024.

Submitted 12 June, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

Showing 251–300 of 7,300 results for author: Li, C