
doi 10.1098/rsta.2023.0094

Large-scale Array for Radio Astronomy on the Farside

Authors: Xuelei Chen, Feng Gao, Fengquan Wu, Yechi Zhang, Tong Wang, Weilin Liu, Dali Zou, Furen Deng, Yang Gong, Kai He, Jixia Li, Shijie Sun, Nanben Suo, Yougang Wang, Pengju Wu, Jiaqin Xu, Yidong Xu, Bin Yue, Cong Zhang, Jia Zhou, Minquan Zhou, Chenguang Zhu, Jiacong Zhu

Abstract: At the Royal Society meeting in 2023, we mainly presented our lunar orbit array concept, DSL, and also briefly introduced a concept for a lunar surface array, LARAF. As the DSL concept has been presented before, in this article we introduce LARAF. We propose to build an array on the far side of the Moon, with a master station that handles data collection and processing, and 20 stations with a maximum baseline of 10 km. Each station consists of 12 membrane antenna units, and the stations are connected to the master station by power lines and optical fiber. The array will make interferometric observations in the 0.1-50 MHz band during the lunar night, powered by regenerative fuel cells (RFCs). The whole array can be carried to the lunar surface by a heavy-lift rocket mission and deployed by a rover in 8 months. Such an array would be an important step in the long-term development of lunar-based ultralong-wavelength radio astronomy. Its sensitivity is high enough to observe many radio sources in the sky, though still short of what is needed to detect the dark-age fluctuations. We discuss possible options for power supply, data communication, deployment, etc.

Submitted 24 March, 2024; originally announced March 2024.

Comments: final submission version, 30 pages, 16 figures

Journal ref: Phil. Trans. R. Soc. A 382, 20230094 (2024)
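For a rough sense of what the quoted 10 km maximum baseline and 0.1-50 MHz band imply, the diffraction limit of an interferometer is approximately θ ≈ λ/B. The sketch below evaluates that estimate; it is a back-of-the-envelope calculation under that assumption only, ignoring propagation effects such as scattering, and at 0.1 MHz λ/B is already about 0.3 rad, where the small-angle approximation is crude.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0
MAX_BASELINE_M = 10_000.0  # 10 km maximum baseline quoted in the LARAF abstract


def wavelength_m(freq_hz: float) -> float:
    """Free-space wavelength in metres."""
    return SPEED_OF_LIGHT_M_S / freq_hz


def resolution_arcmin(freq_hz: float, baseline_m: float = MAX_BASELINE_M) -> float:
    """Approximate diffraction-limited resolution theta ~ lambda / B, in arcminutes."""
    theta_rad = wavelength_m(freq_hz) / baseline_m
    return math.degrees(theta_rad) * 60.0


if __name__ == "__main__":
    for f_mhz in (0.1, 1.0, 10.0, 50.0):
        f_hz = f_mhz * 1e6
        print(f"{f_mhz:5.1f} MHz: lambda = {wavelength_m(f_hz):8.1f} m, "
              f"theta ~ {resolution_arcmin(f_hz):8.2f} arcmin")
```

With these assumptions the array reaches roughly 2 arcmin at 50 MHz, but only about 17 degrees at 0.1 MHz, which is why the lowest part of the band is intrinsically coarse even with a 10 km baseline.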

arXiv:2403.15698 [pdf, other]

SceneX: Procedural Controllable Large-scale Scene Generation via Large-language Models

Authors: Mengqi Zhou, Jun Hou, Chuanchen Luo, Yuxi Wang, Zhaoxiang Zhang, Junran Peng

Abstract: Due to its great application potential, large-scale scene generation has drawn extensive attention in academia and industry. Recent research employs powerful generative models to create desired scenes and achieves promising results. However, most of these methods represent the scene using 3D primitives (e.g., point clouds or radiance fields) that are incompatible with industrial pipelines, which leads to a substantial gap between academic research and industrial deployment. Procedural Controllable Generation (PCG) is an efficient technique for creating scalable, high-quality assets, but it is unfriendly to ordinary users because it demands deep domain expertise. To address these issues, we use a large language model (LLM) to drive procedural modeling. In this paper, we introduce a large-scale scene generation framework, SceneX, which automatically produces high-quality procedural models from designers' textual descriptions. Specifically, the proposed method comprises two components, PCGBench and PCGPlanner. The former encompasses an extensive collection of accessible procedural assets and thousands of hand-crafted API documents. The latter generates executable actions for Blender to produce controllable and precise 3D assets guided by the user's instructions. SceneX can generate a city spanning 2.5 km × 2.5 km with a delicate layout and geometric structures, reducing the time cost from several weeks for professional PCG engineers to just a few hours for an ordinary user.
Extensive experiments demonstrate the capability of our method in controllable large-scale scene generation and editing, including asset placement and season translation.

Submitted 22 March, 2024; originally announced March 2024.

arXiv:2403.15483 [pdf]

Rolling bearing fault diagnosis method based on generative adversarial enhanced multi-scale convolutional neural network model

Authors: Maoxuan Zhou, Wei Kang, Kun He

Abstract: Current convolutional neural networks cannot effectively capture the correlation features between the time-domain signals of rolling bearings, and model accuracy is limited by the number and quality of samples. To address these problems, a rolling bearing fault diagnosis method based on a generative adversarial enhanced multi-scale convolutional neural network model is proposed. First, the Gramian angular field encoding technique is used to encode the time-domain signal of the rolling bearing and generate feature maps that retain the complete information of the vibration signal. The resulting data are then divided into a training set, a validation set, and a test set. The training set is fed into a Wasserstein generative adversarial network with gradient penalty (WGAN-GP) for training, which produces new samples with features similar to the training samples; these are used to expand the original training set. Next, multi-scale convolution is used to extract fault features from the expanded training set, and the feature maps are normalized per instance to overcome the influence of differences in feature distribution. Finally, an attention mechanism adaptively weights the normalized features and extracts deep features, and fault diagnosis is completed by a softmax classifier. Experimental results show that, compared with the ResNet method, the proposed method has better generalization and noise robustness.

Submitted 21 March, 2024; originally announced March 2024.
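The bearing-fault method above encodes each vibration segment as a Gramian angular field (GAF) image before the CNN. The abstract does not say which GAF variant is used, so the sketch below assumes the common summation form (GASF): rescale the signal to [-1, 1], map each sample to an angle φ = arccos(x), and build the matrix cos(φ_i + φ_j). The sine wave is a synthetic stand-in for real bearing data.

```python
import numpy as np


def gramian_angular_summation_field(signal: np.ndarray) -> np.ndarray:
    """Encode a 1-D signal as a Gramian angular summation field (GASF) image."""
    x = np.asarray(signal, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        raise ValueError("a constant signal cannot be rescaled to [-1, 1]")
    # Min-max rescale to [-1, 1] so that arccos is defined for every sample.
    x_scaled = np.clip((2.0 * x - hi - lo) / (hi - lo), -1.0, 1.0)
    # Polar encoding: each sample becomes an angle phi in [0, pi].
    phi = np.arccos(x_scaled)
    # GASF[i, j] = cos(phi_i + phi_j), a symmetric matrix.
    return np.cos(phi[:, None] + phi[None, :])


if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 64)
    vibration = np.sin(2 * np.pi * 5 * t)  # toy stand-in for a bearing vibration segment
    image = gramian_angular_summation_field(vibration)
    print(image.shape)  # (64, 64)
```

A segment of N samples becomes an N × N image, so segment length directly sets the CNN input size; this is one reason GAF pipelines usually work on short windows.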

arXiv:2403.13583 [pdf, other]

CoCoST: Automatic Complex Code Generation with Online Searching and Correctness Testing

Authors: Xinyi He, Jiaru Zou, Yun Lin, Mengyu Zhou, Shi Han, Zejian Yuan, Dongmei Zhang

Abstract: Large language models have revolutionized code generation by converting natural language descriptions into executable code. However, generating complex code in real-world scenarios remains challenging due to intricate structures, subtle bugs, the need to understand advanced data types, and a lack of supplementary content. To address these challenges, we introduce the CoCoST framework, which enhances complex code generation through online searching for more information with planned queries and through correctness testing for code refinement. Moreover, CoCoST serializes complex inputs and outputs to improve comprehension and generates test cases to ensure adaptability to real-world applications. CoCoST is validated through rigorous experiments on the DS-1000 and ClassEval datasets. Experimental results show that CoCoST substantially improves the quality of complex code generation, highlighting its potential to enhance the practicality of LLMs in generating complex code.

Submitted 1 July, 2024; v1 submitted 20 March, 2024; originally announced March 2024.
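The "correctness testing for code refinement" step in CoCoST can be pictured as a generate-and-test loop. The sketch below is not CoCoST's implementation: a fixed list of candidate sources stands in for successive LLM revisions, and the helper names `passes_tests` and `first_correct_candidate` are hypothetical. It simply returns the first revision whose entry point passes every test case.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

# A test case is (positional arguments, expected return value).
TestCase = Tuple[tuple, Any]


def passes_tests(func: Callable[..., Any], tests: Sequence[TestCase]) -> bool:
    """Return True only if func returns the expected value for every test case."""
    for args, expected in tests:
        try:
            if func(*args) != expected:
                return False
        except Exception:
            return False  # a crash counts as a failed test
    return True


def first_correct_candidate(
    candidates: Sequence[str], entry_point: str, tests: Sequence[TestCase]
) -> Optional[str]:
    """Return the first candidate source whose entry point passes all tests.

    WARNING: exec() runs arbitrary code; only use it on trusted or sandboxed input.
    """
    for source in candidates:
        namespace: dict = {}
        try:
            exec(source, namespace)
        except Exception:
            continue  # syntax or runtime error at definition time: try the next revision
        func = namespace.get(entry_point)
        if callable(func) and passes_tests(func, tests):
            return source
    return None


if __name__ == "__main__":
    revisions = [
        "def mean(xs):\n    return sum(xs) / len(xs) + 1\n",  # buggy first draft
        "def mean(xs):\n    return sum(xs) / len(xs)\n",      # refined draft
    ]
    tests = [(([1, 2, 3],), 2.0), (([4],), 4.0)]
    print(first_correct_candidate(revisions, "mean", tests) == revisions[1])  # True
```

In a real pipeline the failing test output would be fed back to the model to produce the next revision, rather than iterating over a precomputed list.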

arXiv:2403.12027 [pdf, other]

From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models

Authors: Kung-Hsiang Huang, Hou Pong Chan, Yi R. Fung, Haoyi Qiu, Mingyang Zhou, Shafiq Joty, Shih-Fu Chang, Heng Ji

Abstract: Data visualization in the form of charts plays a pivotal role in data analysis, offering critical insights and aiding informed decision-making. Automatic chart understanding has advanced significantly with the rise of large foundation models in recent years. Foundation models, such as large language models, have revolutionized various natural language processing tasks and are increasingly being applied to chart understanding tasks. This survey provides a comprehensive overview of recent developments, challenges, and future directions in chart understanding within the context of these foundation models. We review the fundamental building blocks crucial for studying chart understanding tasks. Additionally, we explore various tasks, their evaluation metrics, and the sources of both charts and textual inputs. Various modeling strategies are then examined, encompassing both classification-based and generation-based approaches, along with tool augmentation techniques that enhance chart understanding performance. Furthermore, we discuss the state-of-the-art performance on each task and how it can be improved. Challenges and future directions are addressed, highlighting the importance of several topics, such as domain-specific charts, the lack of efforts in developing evaluation metrics, and agent-oriented settings.
This survey serves as a comprehensive resource for researchers and practitioners in natural language processing, computer vision, and data analysis, providing valuable insights and directions for future research in chart understanding leveraging large foundation models. The studies mentioned in this paper, along with emerging new research, will be continually updated at: https://github.com/khuangaf/Awesome-Chart-Understanding.

Submitted 25 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.10010 [pdf, other]

doi 10.1103/PhysRevLett.132.131002

Measurements of All-Particle Energy Spectrum and Mean Logarithmic Mass of Cosmic Rays from 0.3 to 30 PeV with LHAASO-KM2A

Authors: The LHAASO Collaboration, Zhen Cao, F. Aharonian, Q. An, A. Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, et al. (256 additional authors not shown)

Abstract: We present measurements of the all-particle energy spectrum and mean logarithmic mass of cosmic rays in the energy range 0.3-30 PeV using data collected by LHAASO-KM2A between September 2021 and December 2022, based on a nearly composition-independent energy reconstruction method that achieves unprecedented accuracy. Our analysis locates the knee at $3.67 \pm 0.05 \pm 0.15$ PeV. Below the knee, the spectral index is $-2.7413 \pm 0.0004 \pm 0.0050$, while above the knee it is $-3.128 \pm 0.005 \pm 0.027$, with the sharpness of the transition measured with a statistical error of 2%. The mean logarithmic mass of cosmic rays is close to or heavier than that of helium throughout the measured energy range. It decreases from 1.7 at 0.3 PeV to 1.3 at 3 PeV, a 24% decline following a power law with an index of $-0.1200 \pm 0.0003 \pm 0.0341$. This is equivalent to an increase in the abundance of light components. Above the knee, the mean logarithmic mass follows a power-law trend towards heavier components, the reverse of the behavior observed in the all-particle energy spectrum. Additionally, the knee position and the change in power-law index are approximately the same in both quantities. These findings suggest that the knee observed in the all-particle spectrum corresponds to the knee of the light component, rather than that of the medium-heavy components.

Submitted 26 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: 8 pages, 3 figures

Journal ref: Physical Review Letters 132, 131002 (2024)
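The headline numbers in the LHAASO-KM2A abstract can be cross-checked with a few lines of arithmetic. Assuming $\langle \ln A \rangle \propto E^{k}$ between the two quoted points, $k = \ln(1.3/1.7) / \ln(3/0.3)$; the script below also computes the fractional decline, the steepening of the spectral index across the knee, and $\ln 4$ for helium as a reference. It uses only the rounded values printed in the abstract.

```python
import math

# Values quoted in the LHAASO-KM2A abstract (rounded as printed there).
LNA_AT_0P3_PEV = 1.7
LNA_AT_3_PEV = 1.3
GAMMA_BELOW_KNEE = -2.7413
GAMMA_ABOVE_KNEE = -3.128
LNA_HELIUM = math.log(4)  # ln A for helium (A = 4), about 1.386

# If <ln A> is proportional to E**k, then k = ln(lnA_2 / lnA_1) / ln(E_2 / E_1).
k = math.log(LNA_AT_3_PEV / LNA_AT_0P3_PEV) / math.log(3.0 / 0.3)

# Fractional decline of <ln A> between 0.3 PeV and 3 PeV.
decline = 1.0 - LNA_AT_3_PEV / LNA_AT_0P3_PEV

# Steepening of the all-particle spectrum across the knee.
delta_gamma = GAMMA_ABOVE_KNEE - GAMMA_BELOW_KNEE

print(f"implied power-law index k = {k:.3f}")
print(f"fractional decline = {decline:.1%}")
print(f"change in spectral index = {delta_gamma:.4f}")
print(f"ln A for helium = {LNA_HELIUM:.3f}")
```

With the rounded endpoints this gives k ≈ -0.117 and a decline of about 23.5%, consistent with the quoted index of $-0.1200$ (well within its systematic uncertainty) and the 24% figure. The spectrum steepens by about 0.39 in index across the knee, and the minimum value of 1.3 sits just below helium's ln 4 ≈ 1.39.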

arXiv:2403.09341 [pdf, other]

doi 10.1051/0004-6361/202348512

Likely detection of magnetic field related LFQPO in the soft X-ray re-brightening of GRS 1915+105

Authors: Ling-Da Kong, Long Ji, Andrea Santangelo, Meng-Lei Zhou, Qing-Cang Shui, Shu Zhang

Abstract: Utilizing NICER observations, we present an analysis of the soft X-ray re-brightening event of GRS 1915+105 observed in 2021. During this event, we observed the emergence of a stable, long-lasting low-frequency quasi-periodic oscillation (LFQPO) with frequencies ranging from 0.17 to 0.21 Hz. Through a careful spectral analysis, we demonstrate that a low-temperature Compton-thick gas model well characterizes the emitted radiation. By examining the spectrum and identifying numerous absorption lines, we discerned a transition in the wind properties. This transition was marked by a shift from a state characterized by low speed, high column density, and high ionization degree to one featuring still low speed but low column density and ionization degree. Intriguingly, the presence or absence of the QPO signal is perfectly correlated with these distinct wind characteristics. The low-speed wind observed could be indicative of a 'failed wind', while the observed shift implies a transition from a magnetically to a thermally driven wind. Notably, this QPO signal exclusively manifested itself during the magnetically driven phase, suggesting the possibility of a novel perturbation associated with magnetic effects.

Submitted 14 March, 2024; originally announced March 2024.

Journal ref: A&A 686, A211 (2024)
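A QPO such as the 0.17-0.21 Hz signal reported for GRS 1915+105 shows up as a peak in the power spectrum of the light curve. The toy sketch below assumes 1 s bins, a 1000 s light curve, a hypothetical 0.19 Hz sinusoidal modulation, and Poisson counting noise, then locates the periodogram peak in a 0.1-0.3 Hz search band. Real timing analyses typically average many segments, normalize the power, and fit Lorentzian profiles; none of that is shown here.

```python
import numpy as np


def peak_frequency(counts: np.ndarray, dt: float, fmin: float, fmax: float) -> float:
    """Frequency (Hz) of the highest periodogram peak between fmin and fmax."""
    x = np.asarray(counts, dtype=float)
    x = x - x.mean()  # remove the zero-frequency (mean count rate) term
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=dt)
    band = (freqs >= fmin) & (freqs <= fmax)
    return float(freqs[band][np.argmax(power[band])])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dt = 1.0                      # 1 s time bins (hypothetical)
    t = np.arange(0.0, 1000.0, dt)
    f_qpo = 0.19                  # hypothetical QPO inside the 0.17-0.21 Hz range
    rate = 100.0 + 5.0 * np.sin(2 * np.pi * f_qpo * t)
    counts = rng.poisson(rate * dt)  # Poisson counting noise
    print(f"peak at {peak_frequency(counts, dt, 0.1, 0.3):.3f} Hz")
```

The frequency resolution is 1/(N·dt), here 0.001 Hz, so longer light curves are needed to separate closely spaced features such as a drifting QPO.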

arXiv:2403.05567 [pdf, other]

A Unified Framework for Underwater Metaverse with Optical Perception

Authors: **gyang Cao, Mu Zhou, Jiacheng Wang, Guangyuan Liu, Dusit Niyato, Shiwen Mao, Zhu Han, Jiawen Kang

Abstract: With the advancement of AI technology and increasing attention to deep-sea exploration, the underwater Metaverse is gradually emerging. This paper explores the concept of the underwater Metaverse: emerging virtual reality systems and services aimed at simulating and enhancing virtual experiences of marine environments. First, we discuss potential applications of the underwater Metaverse in underwater scientific research and marine conservation. Next, we present the architecture and supporting technologies of the underwater Metaverse, including high-resolution underwater imaging technologies and image processing technologies for rendering a realistic virtual world. Based on this, we present a use case for building a realistic underwater virtual world using underwater quantum imaging-generated artificial intelligence (QI-GAI) technology. The results demonstrate the effectiveness of the underwater Metaverse framework in simulating complex underwater environments, validating its potential for providing high-quality, interactive underwater virtual experiences. Finally, the paper examines future development directions of the underwater Metaverse and provides new perspectives for marine science and conservation.

Submitted 20 February, 2024; originally announced March 2024.

arXiv:2403.05063 [pdf, other]

Aligning Large Language Models for Controllable Recommendations

Authors: Wensheng Lu, Jianxun Lian, Wei Zhang, Guanghua Li, Mingyang Zhou, Hao Liao, Xing Xie

Abstract: Inspired by the exceptional general intelligence of Large Language Models (LLMs), researchers have begun to explore their application in pioneering the next generation of recommender systems - systems that are conversational, explainable, and controllable. However, existing literature primarily concentrates on integrating domain-specific knowledge into LLMs to enhance accuracy, often neglecting the ability to follow instructions. To address this gap, we first introduce a collection of supervised learning tasks, augmented with labels derived from a conventional recommender model, aimed at explicitly improving LLMs' proficiency in adhering to recommendation-specific instructions. Subsequently, we develop a reinforcement learning-based alignment procedure to further strengthen LLMs' ability to respond to users' intentions and to mitigate formatting errors. Through extensive experiments on two real-world datasets, our method markedly improves the ability of LLMs to comply with instructions within recommender systems, while maintaining a high level of accuracy.

Submitted 8 March, 2024; originally announced March 2024.

Comments: 13 pages

MSC Class: 68T50

arXiv:2403.04918 [pdf, other]

Secure Information Embedding and Extraction in Forensic 3D Fingerprinting

Authors: Canran Wang, **wen Wang, Mi Zhou, Vinh Pham, Senyue Hao, Chao Zhou, Ning Zhang, Netanel Raviv

Abstract: The prevalence of 3D printing poses a significant risk to public safety, as any individual with internet access and a commodity printer is able to produce untraceable firearms, keys, counterfeit products, etc. To aid government authorities in combating these new security threats, several approaches have been taken to tag 3D prints with identifying information. Known as fingerprints, this information is written into the object using various bit embedding techniques; examples include varying the height of the molten thermoplastic layers and depositing metallic powder with different magnetic properties. Yet the practicality of these techniques in real-world forensic settings is hindered by the adversarial nature of the problem. That is, the 3D-printing process is beyond the reach of law enforcement agencies; it is the adversary who controls all aspects of printing and possesses the printed object. To combat these threats, law enforcement agencies can regulate the manufacturing of 3D printers, on which they may enforce a fingerprinting scheme, and collect adversarially tampered remains (e.g., fragments of a broken 3D-printed firearm) during forensic investigation. It is therefore important to devise fingerprinting techniques such that the fingerprint can be extracted even if printing is carried out by the adversary.
To this end, we present SIDE (Secure Information Embedding and Extraction), a fingerprinting framework that tackles the adversarial nature of forensic fingerprinting in 3D prints by offering both secure information embedding and secure information extraction.

Submitted 12 June, 2024; v1 submitted 7 March, 2024; originally announced March 2024.
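One of the bit-embedding techniques mentioned in the fingerprinting abstract is varying the height of the printed layers. The sketch below shows the naive, non-adversarial version of that idea, not SIDE itself: each bit selects one of two layer heights, and extraction thresholds the measured heights. The nominal height, offset, and noise bound are hypothetical values chosen for illustration.

```python
import numpy as np

NOMINAL_LAYER_MM = 0.20   # hypothetical nominal layer height
MODULATION_MM = 0.02      # hypothetical height offset that encodes a 1 bit


def embed_bits(bits):
    """Map each bit to a layer height: 0 -> nominal, 1 -> nominal + offset."""
    return np.array([NOMINAL_LAYER_MM + MODULATION_MM * b for b in bits])


def extract_bits(measured_heights):
    """Recover bits by thresholding halfway between the two layer heights."""
    threshold = NOMINAL_LAYER_MM + MODULATION_MM / 2
    return [int(h > threshold) for h in measured_heights]


if __name__ == "__main__":
    fingerprint = [1, 0, 1, 1, 0, 0, 1, 0]
    rng = np.random.default_rng(1)
    # Bounded measurement error, smaller than half the modulation depth.
    measured = embed_bits(fingerprint) + rng.uniform(-0.004, 0.004, len(fingerprint))
    print(extract_bits(measured) == fingerprint)  # True
```

Nothing in this naive scheme resists an adversary who controls the printer or breaks the object into fragments, which is precisely the gap the abstract says SIDE is designed to address.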

arXiv:2403.02726 [pdf]

Bias in Generative AI

Authors: Mi Zhou, Vibhanshu Abhishek, Timothy Derdenger, Jaymo Kim, Kannan Srinivasan

Abstract: This study analyzed images representing various occupations generated by three popular generative artificial intelligence (AI) tools - Midjourney, Stable Diffusion, and DALL-E 2 - to investigate potential bias in AI generators. Our analysis revealed two overarching areas of concern in these AI generators: (1) systematic gender and racial biases, and (2) subtle biases in facial expressions and appearances. First, we found that all three AI generators exhibited bias against women and African Americans. Moreover, the gender and racial biases uncovered in our analysis were even more pronounced than the status quo when compared to labor force statistics or Google images, intensifying the harmful biases we are actively striving to rectify in our society. Second, our study uncovered more nuanced prejudices in the portrayal of emotions and appearances. For example, women were depicted as younger, with more smiles and happiness, while men were depicted as older, with more neutral expressions and anger, posing a risk that generative AI models may unintentionally depict women as more submissive and less competent than men. Such nuanced biases, by their less overt nature, might be more problematic, as they can permeate perceptions unconsciously and may be more difficult to rectify. Although the extent of bias varied depending on the model, the direction of bias remained consistent in both commercial and open-source AI generators.
As these tools become commonplace, our study highlights the urgency of identifying and mitigating various biases in generative AI, reinforcing the commitment to ensuring that AI technologies benefit all of humanity for a more inclusive future.

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2403.00987 [pdf, other]

Composite Distributed Learning and Synchronization of Nonlinear Multi-Agent Systems with Complete Uncertain Dynamics

Authors: Emadodin Jandaghi, Dalton L. Stein, Adam Hoburg, Paolo Stegagno, Mingxi Zhou, Chengzhi Yuan

Abstract: This paper addresses the problem of composite synchronization and learning control in a network of multi-agent robotic manipulator systems with heterogeneous nonlinear uncertainties under a leader-follower framework. A novel two-layer distributed adaptive learning control strategy is introduced, comprising a first-layer distributed cooperative estimator and a second-layer decentralized deterministic learning controller. The first layer enables each robotic agent to estimate the leader's information. The second layer is responsible both for controlling individual robot agents to track desired reference trajectories and for accurately identifying/learning their nonlinear uncertain dynamics. The proposed distributed learning control scheme advances the existing literature through its ability to manage robotic agents with completely uncertain dynamics, including uncertain mass matrices. This makes the robotic control environment-independent, so it can be used in various settings, from underwater to space, where identifying system dynamics parameters is challenging. The stability and parameter convergence of the closed-loop system are rigorously analyzed using the Lyapunov method. Numerical simulations validate the effectiveness of the proposed scheme.

Submitted 9 May, 2024; v1 submitted 1 March, 2024; originally announced March 2024.
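The first-layer "distributed cooperative estimator" in the multi-agent abstract lets every follower estimate the leader's information even if only some followers can observe the leader directly. The sketch below is a generic first-order pinning-consensus estimator under simplifying assumptions (constant leader state, undirected follower graph, forward-Euler integration), not the paper's estimator: each estimate moves towards its neighbours' estimates, and pinned followers are also pulled towards the leader.

```python
import numpy as np


def estimate_leader_state(adjacency: np.ndarray, pinned: np.ndarray,
                          leader_state: np.ndarray, steps: int = 10_000,
                          dt: float = 0.01) -> np.ndarray:
    """Euler-integrate a pinning-consensus estimator; row i is follower i's estimate.

    d(est_i)/dt = sum_j a_ij (est_j - est_i) + b_i (leader - est_i)
    """
    n = adjacency.shape[0]
    est = np.zeros((n, leader_state.size))
    degree = adjacency.sum(axis=1, keepdims=True)
    for _ in range(steps):
        consensus = adjacency @ est - degree * est        # disagreement with neighbours
        pinning = pinned[:, None] * (leader_state - est)  # only pinned agents see the leader
        est = est + dt * (consensus + pinning)
    return est


if __name__ == "__main__":
    # Four followers in a line; only follower 0 observes the leader directly.
    A = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    b = np.array([1.0, 0.0, 0.0, 0.0])
    leader = np.array([1.0, -2.0])
    print(np.allclose(estimate_leader_state(A, b, leader), leader, atol=1e-3))  # True
```

Convergence requires the follower graph to be connected with at least one pinned agent, and the step size must be small enough for the Euler update to remain stable; information reaches the far end of the chain only through neighbours, which is what makes the estimator distributed.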

arXiv:2402.17208 [pdf, other]

Solving Time-Continuous Stochastic Optimal Control Problems: Algorithm Design and Convergence Analysis of Actor-Critic Flow

Authors: Mo Zhou, Jianfeng Lu

Abstract: We propose an actor-critic framework to solve the time-continuous stochastic optimal control problem. A least-squares temporal difference (LSTD) method is applied to compute the value function for the critic, and the policy gradient method is implemented as policy improvement for the actor. Our key contribution lies in establishing the global convergence of the proposed actor-critic flow, with a linear rate of convergence. The theoretical findings are further validated through numerical examples, showing the efficacy of the approach in practical applications.

Submitted 26 February, 2024; originally announced February 2024.

Comments: arXiv admin note: text overlap with arXiv:2302.05816

MSC Class: 93E20 (Primary); 49L12, 49M25 (Secondary)

ACM Class: G.1.6; G.1.8
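The critic in the actor-critic abstract uses a least-squares temporal difference (LSTD) method. For intuition, the sketch below shows the textbook discrete-time, discounted LSTD(0) with linear features, which differs from the paper's time-continuous setting: it solves $A w = b$ with $A = \sum \phi (\phi - \gamma \phi')^\top$ and $b = \sum \phi\, r$. With one-hot features and one sampled transition per state on a deterministic toy chain, the result must match the exact Bellman solution.

```python
import numpy as np


def lstd_weights(phi: np.ndarray, rewards: np.ndarray,
                 phi_next: np.ndarray, gamma: float) -> np.ndarray:
    """LSTD(0): solve A w = b with A = sum phi (phi - gamma phi')^T and b = sum phi r."""
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    return np.linalg.solve(A, b)


if __name__ == "__main__":
    # Toy 3-state cycle 0 -> 1 -> 2 -> 0 with one-hot (tabular) features.
    P = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)
    r = np.array([1.0, 0.0, 2.0])
    gamma = 0.9
    phi = np.eye(3)          # one sampled transition from each state
    phi_next = phi @ P       # features of the successor states
    w = lstd_weights(phi, r, phi_next, gamma)
    v_exact = np.linalg.solve(np.eye(3) - gamma * P, r)  # Bellman equation
    print(np.allclose(w, v_exact))  # True
```

With fewer features than states, the same formula returns the TD fixed point within the feature span rather than the exact value function.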

arXiv:2402.17207 [pdf, other]

Deployment Prior Injection for Run-time Calibratable Object Detection

Authors: Mo Zhou, Yiding Yang, Haoxiang Li, Vishal M. Patel, Gang Hua

Abstract: When the training and test distributions are strongly aligned, object relations serve as a context prior that facilitates object detection. Yet this prior turns into a harmful but inevitable training-set bias when test distributions shift differently across space and time. Existing detectors cannot incorporate a deployment context prior during the test phase without parameter updates. Such a capability requires the model to explicitly learn representations disentangled with respect to the context prior. To achieve this, we introduce an additional graph input to the detector, where the graph represents the deployment context prior and its edge values represent object relations. The detector's behavior is then trained to be bound to the graph with a modified training objective. As a result, during the test phase, any suitable deployment context prior can be injected into the detector via graph edits, thereby calibrating, or "re-biasing", the detector towards the given prior at run-time without parameter updates. Even if the deployment prior is unknown, the detector can self-calibrate using a deployment prior approximated from its own predictions. Comprehensive experimental results on the COCO dataset, as well as cross-dataset testing on the Objects365 dataset, demonstrate the effectiveness of the run-time calibratable detector.

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.14436 [pdf]

Structural and resistivity properties of Fe$_{1-x}$Co$_x$Se single crystals grown by the molten salt method

Authors: Qiaoyu Wang, Mingwei Ma, Binbin Ruan, Menghu Zhou, Yadong Gu, Qingsong Yang, Lewei Chen, Yunqing Shi, Junkun Yi, Genfu Chen, Zhian Ren

Abstract: A series of tetragonal Fe$_{1-x}$Co$_x$Se single crystals covering the complete Co doping range (0$\leq$x$\leq$0.52), up to its solid solubility limit in FeSe, have been grown by a eutectic AlCl$_3$/KCl molten salt method. The typical lateral size of the as-grown Fe$_{1-x}$Co$_x$Se single crystals is 1$-$5 mm. The chemical composition and homogeneity of the crystals were examined by both inductively coupled plasma atomic emission spectroscopy and energy-dispersive spectroscopy. X-ray diffraction analysis demonstrates that the lattice parameters $a$ and $c$ both decrease linearly with increasing Co doping level x. Resistivity measurements show metallic behaviour for all samples across the whole doping range, in contrast to the metal-insulator transition of Cu-doped FeSe.

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.14270 [pdf, other]

Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization

Authors: Xuxi Chen, Zhendong Wang, Daouda Sow, Junjie Yang, Tianlong Chen, Yingbin Liang, Mingyuan Zhou, Zhangyang Wang

Abstract: In the rapidly advancing field of large language models (LLMs), a key challenge is to enhance their capabilities amid a looming shortage of high-quality training data. Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training datasets, with a specific focus on selectively retaining samples that incur moderately high losses. These samples are deemed informative and beneficial for model refinement, in contrast to the highest-loss samples, which are discarded due to their correlation with data noise and complexity. We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization (IR-DRO). IR-DRO is designed to dynamically prioritize informative samples during training through an instance reweighting mechanism, streamlined by a closed-form solution for straightforward integration into established training protocols. Through rigorous experimentation with various models and datasets, our findings indicate that our sample-targeted methods significantly improve LLM performance across multiple benchmarks, in both continual pre-training and instruction tuning scenarios. Our code is available at https://github.com/VITA-Group/HardFocusTraining.

Submitted 1 March, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: Preprint; updated reference and related works
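A minimal sketch of the reweighting idea, assuming a KL-regularized DRO objective (the paper's exact formulation may differ): maximizing sum(w_i * loss_i) - tau * KL(w || uniform) over the probability simplex has the closed-form solution w_i proportional to exp(loss_i / tau). The sketch also zeroes out the highest-loss fraction of samples, mirroring the heuristic in the abstract. The names `dro_weights`, `tau` and `drop_frac` are hypothetical.

```python
import numpy as np

def dro_weights(losses, tau=1.0, drop_frac=0.1):
    """Hypothetical KL-regularized DRO instance weights.

    The top `drop_frac` fraction of samples by loss get weight 0 (treated as
    noise); the rest get softmax(loss / tau), the closed-form maximizer of
    sum(w * loss) - tau * KL(w || uniform) on the simplex.
    """
    losses = np.asarray(losses, dtype=float)
    n_drop = int(np.floor(drop_frac * losses.size))
    keep = np.ones(losses.size, dtype=bool)
    if n_drop > 0:
        keep[np.argsort(losses)[-n_drop:]] = False  # highest-loss indices
    z = np.where(keep, losses / tau, -np.inf)
    z = z - z[keep].max()  # shift for numerical stability
    w = np.exp(z)          # exp(-inf) == 0 for dropped samples
    return w / w.sum()

def reweighted_loss(losses, **kwargs):
    """Weighted batch loss; during training the weights act as constants."""
    w = dro_weights(losses, **kwargs)
    return float(np.dot(w, losses))
```

A larger `tau` flattens the weights toward uniform over the kept samples, while a smaller `tau` concentrates training on the hardest retained samples.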

arXiv:2402.12192

Pan-Mamba: Effective pan-sharpening with State Space Model

Authors: Xuanhua He, Ke Cao, Keyu Yan, Rui Li, Chengjun Xie, Jie Zhang, Man Zhou

Abstract: Pan-sharpening involves integrating information from low-resolution multi-spectral and high-resolution panchromatic images to generate high-resolution multi-spectral counterparts. While recent advancements in the state space model, particularly the efficient long-range dependency modeling achieved by Mamba, have revolutionized the computer vision community, its untapped potential in pan-sharpening motivates our exploration. Our contribution, Pan-Mamba, represents a novel pan-sharpening network that leverages the efficiency of the Mamba model in global information modeling. In Pan-Mamba, we customize two core components: channel swapping Mamba and cross-modal Mamba, strategically designed for efficient cross-modal information exchange and fusion. The former initiates a lightweight cross-modal interaction through the exchange of partial panchromatic and multi-spectral channels, while the latter facilitates the information representation capability by exploiting inherent cross-modal relationships. Through extensive experiments across diverse datasets, our proposed approach surpasses state-of-the-art methods, showcasing superior fusion results in pan-sharpening. To the best of our knowledge, this work is the first attempt at exploring the potential of the Mamba model and establishes a new frontier in pan-sharpening techniques. The source code is available at https://github.com/alexhe101/Pan-Mamba.

Submitted 8 March, 2024; v1 submitted 19 February, 2024; originally announced February 2024.
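The "exchange of partial panchromatic and multi-spectral channels" can be pictured with the hypothetical sketch below, which swaps a fixed fraction of channels between two (C, H, W) feature maps. The function name `channel_swap` and the `ratio` parameter are assumptions for illustration; the Mamba blocks that consume the swapped features are not shown.

```python
import numpy as np

def channel_swap(feat_a, feat_b, ratio=0.5):
    """Hypothetical channel exchange between two (C, H, W) feature maps.

    The first `ratio` fraction of channels of each map is replaced by the
    corresponding channels of the other map; inputs are left unmodified.
    """
    if feat_a.shape != feat_b.shape:
        raise ValueError("feature maps must have the same shape")
    k = int(feat_a.shape[0] * ratio)
    out_a, out_b = feat_a.copy(), feat_b.copy()
    # the right-hand side reads from the untouched inputs, so order is safe
    out_a[:k], out_b[:k] = feat_b[:k], feat_a[:k]
    return out_a, out_b
```

Because the swap involves no learned parameters, it is a cheap way to let each branch see part of the other modality before heavier fusion.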

arXiv:2402.10958

Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts

Authors: Yueqin Yin, Zhendong Wang, Yi Gu, Hai Huang, Weizhu Chen, Mingyuan Zhou

Abstract: In the field of large language models (LLMs), aligning models with the diverse preferences of users is a critical challenge. Direct Preference Optimization (DPO) has played a key role in this area. It works by using pairs of preferences derived from the same prompts, and it functions without needing an additional reward model. However, DPO does not fully reflect the complex nature of human learning, which often involves understanding contrasting responses to not only identical but also similar questions. To overcome this shortfall, we propose Relative Preference Optimization (RPO). RPO is designed to discern between more and less preferred responses derived from both identical and related prompts. It introduces a contrastive weighting mechanism, enabling the tuning of LLMs using a broader range of preference data, including both paired and unpaired sets. This approach expands the learning capabilities of the model, allowing it to leverage insights from a more varied set of prompts. Through empirical tests, including dialogue and summarization tasks, and evaluations using the AlpacaEval2.0 leaderboard, RPO has demonstrated a superior ability to align LLMs with user preferences and to improve their adaptability during the training process. Our code can be viewed at https://github.com/yinyueqin/relative-preference-optimization

Submitted 27 May, 2024; v1 submitted 12 February, 2024; originally announced February 2024.
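For reference, the standard DPO loss on a same-prompt pair is -log sigmoid(beta * [(log pi(y_w|x) - log ref(y_w|x)) - (log pi(y_l|x) - log ref(y_l|x))]). The second function is a hypothetical cross-prompt extension, assuming each (chosen, rejected) pair is weighted by a softmax over prompt-embedding similarity; RPO's actual contrastive weighting may differ. The names `temp` and `weighted_cross_prompt_loss` are illustrative.

```python
import numpy as np

def log_sigmoid(x):
    # numerically stable log(sigmoid(x))
    return -np.logaddexp(0.0, -x)

def dpo_loss(pi_w, ref_w, pi_l, ref_l, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair from the same prompt.

    Arguments are sequence log-probabilities under the policy (pi_*) and the
    frozen reference model (ref_*).
    """
    margin = beta * ((pi_w - ref_w) - (pi_l - ref_l))
    return float(-log_sigmoid(margin))

def weighted_cross_prompt_loss(r_chosen, r_rejected, emb_chosen, emb_rejected,
                               beta=0.1, temp=0.5):
    """Hypothetical contrast of every chosen with every rejected response.

    r_*: implicit rewards log pi - log ref, shapes (n,) and (m,).
    emb_*: unit-normalized prompt embeddings, shapes (n, d) and (m, d).
    """
    sim = emb_chosen @ emb_rejected.T / temp           # (n, m) similarities
    w = np.exp(sim - sim.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)               # each row sums to 1
    margins = beta * (r_chosen[:, None] - r_rejected[None, :])
    return float(np.sum(w * -log_sigmoid(margins)) / len(r_chosen))
```

With a single pair sharing one prompt, the weighted loss reduces to plain DPO, which is a useful sanity check for any such extension.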

arXiv:2402.10315

A variable ionized disk wind in MAXI J1803-298 revealed by NICER

Authors: Zuobin Zhang, Cosimo Bambi, Honghui Liu, Jiachen Jiang, Fangzheng Shi, Yuexin Zhang, Andrew J. Young, John A. Tomsick, Benjamin M. Coughenour, Menglei Zhou

Abstract: We present results from NICER observations of MAXI J1803-298 across its entire 2021 outburst. In the intermediate and soft states, we detect significant absorption lines at $\sim 7.0$ keV and $\sim 6.7$ keV, arising from an X-ray disk wind outflowing with a velocity of hundreds of km per second along our line of sight. Fitting results from a photoionization model suggest that the wind is driven by thermal pressure and that the mass-loss rate is low. We find a clear transition for iron from predominantly H-like to predominantly He-like during the intermediate-to-soft state transition. Our results indicate that this transition for iron is caused by the evolution of the illuminating spectrum together with the slow change of the geometric properties of the disk wind. The coexistence of the disk wind and QPO features in the intermediate state is also reported. Our study makes MAXI J1803-298 the first source in which a transition from optical wind to X-ray wind is detected, offering new insights into the evolution of disk winds across an entire outburst and the long-term coupling of accretion disks and mass outflows around accreting black holes.

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.08265

A Dense Reward View on Aligning Text-to-Image Diffusion with Preference

Authors: Shentao Yang, Tianqi Chen, Mingyuan Zhou

Abstract: Aligning text-to-image diffusion models (T2I) with preference has been gaining increasing research attention. While prior works exist on directly optimizing T2I with preference data, these methods are developed under the bandit assumption of a latent reward on the entire diffusion reverse chain, while ignoring the sequential nature of the generation process. This may harm the efficacy and efficiency of preference alignment. In this paper, we take on a finer dense reward perspective and derive a tractable alignment objective that emphasizes the initial steps of the T2I reverse chain. In particular, we introduce temporal discounting into DPO-style explicit-reward-free objectives, to break the temporal symmetry therein and suit the T2I generation hierarchy. In experiments on single and multiple prompt generation, our method is competitive with strong relevant baselines, both quantitatively and qualitatively. Further investigations are conducted to illustrate the insight of our approach.

Submitted 12 May, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

Comments: 41st International Conference on Machine Learning (ICML 2024)
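One way to read "temporal discounting in a DPO-style objective" is sketched below, assuming per-step policy/reference log-ratios along the reverse chain are available and that step t (t = 0 being the noisiest step) is weighted by gamma**t. This is an illustrative construction, not the paper's exact objective; `step_logratio_w`, `step_logratio_l` and `gamma` are hypothetical names.

```python
import numpy as np

def discounted_dpo_loss(step_logratio_w, step_logratio_l, beta=1.0, gamma=0.9):
    """Hypothetical temporally discounted DPO-style loss for diffusion chains.

    step_logratio_*[t] = log p_theta(x_{t+1} | x_t) - log p_ref(x_{t+1} | x_t)
    for reverse step t of the preferred (w) / dispreferred (l) chain, with
    t = 0 the first denoising step. gamma < 1 emphasizes early steps;
    gamma = 1 recovers an undiscounted sum over the chain.
    """
    lw = np.asarray(step_logratio_w, dtype=float)
    ll = np.asarray(step_logratio_l, dtype=float)
    disc = gamma ** np.arange(lw.size)
    margin = beta * np.dot(disc, lw - ll)
    return float(np.logaddexp(0.0, -margin))  # equals -log(sigmoid(margin))
```

The discount breaks the symmetry between steps: the same log-ratio improvement lowers the loss more when it occurs early in the chain.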

arXiv:2402.06859

LiRank: Industrial Large Scale Ranking Models at LinkedIn

Authors: Fedor Borisyuk, Mingzhou Zhou, Qingquan Song, Siyu Zhu, Birjodh Tiwana, Ganesh Parameswaran, Siddharth Dangi, Lars Hertel, Qiang Xiao, Xiaochen Hou, Yunbo Ouyang, Aman Gupta, Sheallika Singh, Dan Liu, Hailing Cheng, Lei Le, Jonathan Hung, Sathiya Keerthi, Ruoyan Wang, Fengyu Zhang, Mohit Kothari, Chen Zhu, Daqi Sun, Yun Dai, Xun Luan, et al. (9 additional authors not shown)

Abstract: We present LiRank, a large-scale ranking framework at LinkedIn that brings to production state-of-the-art modeling architectures and optimization methods. We unveil several modeling improvements, including Residual DCN, which adds attention and residual connections to the famous DCNv2 architecture. We share insights into combining and tuning SOTA architectures to create a unified model, including Dense Gating, Transformers and Residual DCN. We also propose novel techniques for calibration and describe how we productionalized deep learning based explore/exploit methods. To enable effective, production-grade serving of large ranking models, we detail how to train and compress models using quantization and vocabulary compression. We provide details about the deployment setup for large-scale use cases of Feed ranking, Jobs Recommendations, and Ads click-through rate (CTR) prediction. We summarize our learnings from various A/B tests by elucidating the most effective technical approaches. These ideas have contributed to relative metrics improvements across the board at LinkedIn: +0.5% member sessions in the Feed, +1.76% qualified job applications for Jobs search and recommendations, and +4.3% for Ads CTR. We hope this work can provide practical insights and solutions for practitioners interested in leveraging large-scale deep ranking systems.

Submitted 9 February, 2024; originally announced February 2024.

ACM Class: H.3.3
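Residual DCN builds on the DCNv2 cross layer, x_{l+1} = x_0 * (W_l x_l + b_l) + x_l, where * is the element-wise product. The sketch below shows only that standard cross layer on a single feature vector; the attention and extra residual connections that LiRank adds are not reproduced, and the function names are illustrative.

```python
import numpy as np

def cross_layer(x0, xl, W, b):
    """One DCNv2 cross layer: x_{l+1} = x0 * (W @ xl + b) + xl."""
    return x0 * (W @ xl + b) + xl

def cross_network(x0, weights, biases):
    """Stack of cross layers; every layer re-uses the original input x0."""
    x = x0
    for W, b in zip(weights, biases):
        x = cross_layer(x0, x, W, b)
    return x
```

The trailing `+ xl` is already a residual path, so a layer with zero weights is the identity; each layer raises the polynomial degree of explicit feature crosses by one.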

arXiv:2402.06190

Masked LoGoNet: Fast and Accurate 3D Image Analysis for Medical Domain

Authors: Amin Karimi Monsefi, Payam Karisani, Mengxi Zhou, Stacey Choi, Nathan Doble, Heng Ji, Srinivasan Parthasarathy, Rajiv Ramnath

Abstract: Standard modern machine-learning-based imaging methods have faced challenges in medical applications due to the high cost of dataset construction and, thereby, the limited labeled training data available. Additionally, upon deployment, these methods are usually used to process a large volume of data on a daily basis, imposing a high maintenance cost on medical facilities. In this paper, we introduce a new neural network architecture, termed LoGoNet, with a tailored self-supervised learning (SSL) method to mitigate such challenges. LoGoNet integrates a novel feature extractor within a U-shaped architecture, leveraging Large Kernel Attention (LKA) and a dual encoding strategy to capture both long-range and short-range feature dependencies adeptly. This is in contrast to existing methods that rely on increasing network capacity to enhance feature extraction. This combination of novel techniques in our model is especially beneficial in medical image segmentation, given the difficulty of learning intricate and often irregular body organ shapes, such as the spleen. Complementarily, we propose a novel SSL method tailored for 3D images to compensate for the lack of large labeled datasets. The method combines masking and contrastive learning techniques within a multi-task learning framework and is compatible with both Vision Transformer (ViT) and CNN-based models. We demonstrate the efficacy of our methods in numerous tasks across two standard datasets (i.e., BTCV and MSD). Benchmark comparisons with eight state-of-the-art models highlight LoGoNet's superior performance in both inference time and accuracy.

Submitted 9 February, 2024; originally announced February 2024.

arXiv:2402.05493

Investigating White-Box Attacks for On-Device Models

Authors: Mingyi Zhou, Xiang Gao, **g Wu, Kui Liu, Hailong Sun, Li Li

Abstract: Numerous mobile apps have leveraged deep learning capabilities. However, on-device models are vulnerable to attacks as they can be easily extracted from their corresponding mobile apps. Existing on-device attacking approaches only generate black-box attacks, which are far less effective and efficient than white-box strategies. This is because mobile deep learning frameworks like TFLite do not support gradient computing, which is necessary for white-box attacking algorithms. Thus, we argue that existing findings may underestimate the harmfulness of on-device attacks. To this end, we conduct a study to answer this research question: Can on-device models be directly attacked via white-box strategies? We first systematically analyze the difficulties of transforming the on-device model to its debuggable version, and propose a Reverse Engineering framework for On-device Models (REOM), which automatically reverses the compiled on-device TFLite model to the debuggable model. Specifically, REOM first transforms compiled on-device models into the Open Neural Network Exchange (ONNX) format, then removes the non-debuggable parts, and converts them into a debuggable DL model format that attackers can exploit in a white-box setting. Our experimental results show that our approach is effective in achieving automated transformation among 244 TFLite models. Compared with previous attacks using surrogate models, REOM enables attackers to achieve higher attack success rates with attack perturbations a hundred times smaller. In addition, because the ONNX platform has plenty of tools for model format exchange, the proposed method based on the ONNX platform can be adapted to other model formats. Our findings emphasize the need for developers to carefully consider their model deployment strategies, and to use white-box methods to evaluate the vulnerability of on-device models.

Submitted 1 March, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

Comments: Published in The International Conference on Software Engineering 2024 (ICSE'24)
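To illustrate why gradient access matters, the sketch below applies the fast gradient sign method (FGSM), a standard white-box attack chosen here for illustration (the paper's attack algorithms are not specified in the abstract), to a toy logistic-regression model whose input gradient has a closed form. The model weights and `eps` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_logistic(x, y, w, b, eps):
    """FGSM against p(y=1 | x) = sigmoid(w . x + b).

    The gradient of the binary cross-entropy loss with respect to the input
    is (sigmoid(w . x + b) - y) * w; a white-box attacker can compute it
    directly once the model is differentiable.
    """
    grad_x = (sigmoid(np.dot(w, x) + b) - y) * w
    return x + eps * np.sign(grad_x)

# Toy example with hypothetical weights: the attack lowers the logit of the true class 1.
w = np.array([1.0, -2.0])
x = np.array([0.5, 0.1])
x_adv = fgsm_logistic(x, y=1.0, w=w, b=0.0, eps=0.1)
```

A black-box attacker would have to estimate `grad_x` through queries or a surrogate model, which is the gap that converting a compiled model back to a debuggable one closes.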

arXiv:2402.02263

MixedNUTS: Training-Free Accuracy-Robustness Balance via Nonlinearly Mixed Classifiers

Authors: Yatong Bai, Mo Zhou, Vishal M. Patel, Somayeh Sojoudi

Abstract: Adversarial robustness often comes at the cost of degraded accuracy, impeding the real-life application of robust classification models. Training-based solutions for better trade-offs are limited by incompatibilities with already-trained high-performance large models, necessitating the exploration of training-free ensemble approaches. Observing that robust models are more confident in correct predictions than in incorrect ones on clean and adversarial data alike, we speculate that amplifying this "benign confidence property" can reconcile accuracy and robustness in an ensemble setting. To achieve this, we propose "MixedNUTS", a training-free method where the output logits of a robust classifier and a standard non-robust classifier are processed by nonlinear transformations with only three parameters, which are optimized through an efficient algorithm. MixedNUTS then converts the transformed logits into probabilities and mixes them as the overall output. On CIFAR-10, CIFAR-100, and ImageNet datasets, experimental results with custom strong adaptive attacks demonstrate MixedNUTS's vastly improved accuracy and near-SOTA robustness; it boosts CIFAR-100 clean accuracy by 7.86 points, sacrificing merely 0.87 points in robust accuracy.

Submitted 12 April, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

MSC Class: 68T07
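The transform-then-mix pipeline can be sketched as follows, assuming a hypothetical three-parameter monotone transform (shift `c`, clip at zero, power `p`, scale `s`) applied only to the robust classifier's logits, plus a separate mixing weight `alpha`. The real MixedNUTS transformation and the algorithm that optimizes its parameters are not reproduced here.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def transform_robust_logits(z, s=1.0, p=2.0, c=0.0):
    """Hypothetical 3-parameter transform: shift by c, clip at 0, raise to p, scale by s.

    With p > 1 the gap between the top logit and the rest grows, amplifying
    the robust model's confidence on inputs where it is already confident.
    """
    return s * np.maximum(z + c, 0.0) ** p

def mixed_probs(z_std, z_rob, s=1.0, p=2.0, c=0.0, alpha=0.5):
    """Convex mix of standard-model and transformed robust-model probabilities."""
    p_std = softmax(z_std)
    p_rob = softmax(transform_robust_logits(z_rob, s, p, c))
    return (1.0 - alpha) * p_std + alpha * p_rob
```

Since both terms are probability vectors and the weights sum to one, the mixture is itself a valid distribution, and no model is retrained.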

arXiv:2401.13942

StyleInject: Parameter Efficient Tuning of Text-to-Image Diffusion Models

Authors: Mohan Zhou, Yalong Bai, Qing Yang, Tiejun Zhao

Abstract: The ability to fine-tune generative models for text-to-image generation tasks is crucial, particularly given the complexity involved in accurately interpreting and visualizing textual inputs. While LoRA is efficient for language model adaptation, it often falls short in text-to-image tasks due to the intricate demands of image generation, such as accommodating a broad spectrum of styles and nuances. To bridge this gap, we introduce StyleInject, a specialized fine-tuning approach tailored for text-to-image models. StyleInject comprises multiple parallel low-rank parameter matrices, maintaining the diversity of visual features. It dynamically adapts to varying styles by adjusting the variance of visual features based on the characteristics of the input signal. This approach significantly minimizes the impact on the original model's text-image alignment capabilities while adeptly adapting to various styles in transfer learning. StyleInject proves particularly effective in learning from and enhancing a range of advanced, community-fine-tuned generative models. Our comprehensive experiments, including both small-sample and large-scale data fine-tuning as well as base model distillation, show that StyleInject surpasses traditional LoRA in both text-image semantic consistency and human preference evaluation, all while ensuring greater parameter efficiency.

Submitted 10 May, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

Comments: 11 pages, 11 figures

arXiv:2401.13940

doi 10.1145/3597503.3639197

How Are Paid and Volunteer Open Source Developers Different? A Study of the Rust Project

Authors: Yuxia Zhang, Mian Qin, Klaas-Jan Stol, Minghui Zhou, Hui Liu

Abstract: It is now commonplace for organizations to pay developers to work on specific open source software (OSS) projects to pursue their business goals. Such paid developers work alongside voluntary contributors, but given the different motivations of these two groups of developers, conflict may arise, which may pose a threat to a project's sustainability. This paper presents an empirical study of paid developers and volunteers in Rust, a popular open source programming language project. Rust is a particularly interesting case given considerable concerns about corporate participation. We compare volunteers and paid developers through contribution characteristics and long-term participation, and solicit volunteers' perceptions of paid developers. We find that core paid developers tend to contribute more frequently; commits contributed by one-time paid developers have bigger sizes; peripheral paid developers implement more features; and being paid plays a positive role in becoming a long-term contributor. We also find that volunteers do have some prejudices against paid developers. This study suggests that the dichotomous view of paid vs. volunteer developers is too simplistic and that further subgroups can be identified. Companies should become more sensitive to how they engage with OSS communities, in certain ways as suggested by this study.

Submitted 24 January, 2024; originally announced January 2024.

arXiv:2401.11078

UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures

Authors: Mingyuan Zhou, Rakib Hyder, Ziwei Xuan, Guojun Qi

Abstract: Recent advances in 3D avatar generation have gained significant attention. These breakthroughs aim to produce more realistic animatable avatars, narrowing the gap between virtual and real-world experiences. Most existing works employ Score Distillation Sampling (SDS) loss, combined with a differentiable renderer and text condition, to guide a diffusion model in generating 3D avatars. However, SDS often generates oversmoothed results with few facial details, thereby lacking diversity compared with ancestral sampling. On the other hand, other works generate 3D avatars from a single image, where the challenges of unwanted lighting effects, perspective views, and inferior image quality make it difficult to reliably reconstruct 3D face meshes with aligned, complete textures. In this paper, we propose a novel 3D avatar generation approach termed UltrAvatar with enhanced fidelity of geometry and superior quality of physically based rendering (PBR) textures without unwanted lighting. To this end, the proposed approach presents a diffuse color extraction model and an authenticity guided texture diffusion model. The former removes the unwanted lighting effects to reveal true diffuse colors so that the generated avatars can be rendered under various lighting conditions. The latter follows two gradient-based guidances for generating PBR textures to render diverse face-identity features and details better aligned with the 3D mesh geometry. We demonstrate the effectiveness and robustness of the proposed method, outperforming the state-of-the-art methods by a large margin in the experiments.

Submitted 19 January, 2024; originally announced January 2024.

Comments: The project page is at http://usrc-sea.github.io/UltrAvatar/

arXiv:2401.09547

A deep learning algorithm for computing mean field control problems via forward-backward score dynamics

Authors: Mo Zhou, Stanley Osher, Wuchen Li

Abstract: We propose a deep learning approach to compute mean field control problems with individual noises. The problem consists of the Fokker-Planck (FP) equation and the Hamilton-Jacobi-Bellman (HJB) equation. Using the differential of the entropy, namely the score function, we first formulate the deterministic forward-backward characteristics for the mean field control system, which is different from the classical forward-backward stochastic differential equations (FBSDEs). We further apply the neural network approximation to fit the proposed deterministic characteristic lines. Numerical examples, including the control problem with entropy potential energy, the linear quadratic regulator, and the systemic risks, demonstrate the effectiveness of the proposed method.

Submitted 17 January, 2024; originally announced January 2024.

MSC Class: 49N80 (Primary) 35Q89 (Secondary) ACM Class: G.1.6; G.1.8

arXiv:2401.07039

Quantum Generative Diffusion Model: A Fully Quantum-Mechanical Model for Generating Quantum State Ensemble

Authors: Chuangtao Chen, Qinglin Zhao, MengChu Zhou, Zhimin He, Zhili Sun, Haozhen Situ

Abstract: Classical diffusion models have shown superior generative results and have been applied to many problems. Exploring these models in the quantum domain can advance the field of quantum generative learning. In this paper, we introduce the Quantum Generative Diffusion Model (QGDM), a simple and elegant quantum counterpart of classical diffusion models. The core idea of QGDM is that any target quantum state can be transformed into a completely mixed state, which has the highest entropy and maximum uncertainty about the system, through a non-unitary forward process. Subsequently, a trainable backward process can be used to recover the target state from the completely mixed state. The design requirements for QGDM's backward process include ensuring non-unitarity while maintaining a low number of parameters. To achieve this, we introduce partial trace operations in the backward process to enforce non-unitarity. Additionally, we control the number of trainable parameters by using a parameter-sharing strategy and incorporating temporal information as an input in the backward process. Furthermore, we introduce a resource-efficient version of QGDM, which reduces the number of auxiliary qubits while preserving impressive generative capabilities. Our proposed models exhibit better convergence performance than Quantum Generative Adversarial Networks (QGANs) because our models optimize a convex distance function using gradient descent. Comparative results with QGANs demonstrate the effectiveness of our models in generating both pure and mixed quantum states. Notably, our models achieve 53.03% higher fidelity in mixed-state generation tasks compared to QGANs. These results highlight the potential of the proposed models to tackle challenging quantum generation tasks.

Submitted 3 June, 2024; v1 submitted 13 January, 2024; originally announced January 2024.

Comments: Comments are welcome
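The two non-unitary ingredients named in the abstract can be illustrated with small density matrices. The sketch assumes a depolarizing forward step, rho -> (1 - p) rho + p I/d, as one possible schedule toward the completely mixed state (the abstract does not specify QGDM's forward channel), and shows the partial trace that turns a unitary on system plus ancilla into a non-unitary map on the system. The trainable circuits themselves are not shown.

```python
import numpy as np

def depolarize(rho, p):
    """Hypothetical forward step: mix rho with I/d; p = 1 gives I/d exactly."""
    d = rho.shape[0]
    return (1.0 - p) * rho + p * np.eye(d) / d

def partial_trace_last(rho, d_keep, d_trace):
    """Trace out the last subsystem of a (d_keep * d_trace)-dimensional density matrix."""
    r = rho.reshape(d_keep, d_trace, d_keep, d_trace)
    return np.einsum("ajbj->ab", r)  # sum over the traced index j

def von_neumann_entropy(rho):
    """S(rho) = -sum(lambda * ln lambda) over the nonzero eigenvalues."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log(evals)))

# Tracing one qubit out of the Bell state (|00> + |11>)/sqrt(2) leaves I/2,
# the completely mixed single-qubit state with maximal entropy ln 2.
psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)
reduced = partial_trace_last(np.outer(psi, psi.conj()), 2, 2)
```

Entropy increases monotonically along the depolarizing path, which matches the abstract's picture of a forward process ending at maximum uncertainty.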

arXiv:2401.05212

Outflow-related radio emission in radio-quiet quasars

Authors: Mai Liao, Junxian Wang, Wenke Ren, Minhua Zhou

Abstract: In this work, we revisit the relationship between the [O III] line width $w_{\rm 90}$ (as the indicator of AGN outflow velocity) and the radio emission in radio-quiet quasars (RQQs) by employing a large sample of Type I quasars ($\sim 37,000$) selected from the Sloan Digital Sky Survey (SDSS) Data Release Sixteen. By median stacking the radio images of the Karl G. Jansky Very Large Array (VLA) Sky Survey (VLASS) (to include the dominant fraction of individually radio non-detected RQQs) for subsamples of RQQs with different $w_{\rm 90}$, our study demonstrates that the correlation between $w_{\rm 90}$ and radio emission in our SDSS RQQs is significant, and remains solid after controlling for the effects of black hole mass, quasar luminosity, Eddington ratio and redshift. This intrinsic link supports the view that the [O III] outflows in quasars, most likely resulting from wide-angled sub-relativistic quasar winds launched from the accretion disc, could make a dominant contribution to radio emission in the general RQQ population. Alternatively, the correlation may be attributed to low-power jets in RQQs if they are ubiquitous and could efficiently enhance the [O III] width through interacting with the ISM. Meanwhile, the star-formation rates traced by the flux ratio of [Ne V]/[O II] emission lines display no dependence on $w_{\rm 90}$ after controlling for the effects of black hole mass, quasar luminosity, Eddington ratio and redshift. This suggests that the stronger radio emission in RQQs with larger $w_{\rm 90}$ cannot be attributed to outflow-enhanced (positive feedback) star formation in the hosts. However, this also indicates that the outflows, though exhibiting a robust correlation with radio power, produce neither positive nor negative feedback to the star formation in their hosts.

Submitted 10 January, 2024; originally announced January 2024.

Comments: 9 pages, 4 figures, accepted by MNRAS
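Median stacking can be sketched in a few lines, assuming co-registered, equal-sized cutouts centred on each quasar's optical position (cutout extraction, noise weighting and flux calibration are not shown, and the data below are synthetic). The pixel-wise median is robust to the occasional bright source that would dominate a mean stack.

```python
import numpy as np

def median_stack(cutouts):
    """Pixel-wise median of co-registered cutouts with shape (n_sources, ny, nx)."""
    stack = np.asarray(cutouts, dtype=float)
    if stack.ndim != 3:
        raise ValueError("expected an array of shape (n_sources, ny, nx)")
    return np.median(stack, axis=0)

def central_value(image):
    """Value at the central pixel (assumes odd-sized cutouts centred on the source)."""
    ny, nx = image.shape
    return float(image[ny // 2, nx // 2])

# Synthetic subsample: five 5x5 noise cutouts, one with a bright outlier at the centre.
rng = np.random.default_rng(0)
cutouts = rng.normal(0.0, 1.0, size=(5, 5, 5))
cutouts[:, 2, 2] = [1.0, 2.0, 3.0, 4.0, 100.0]
print(central_value(median_stack(cutouts)))  # 3.0
```

The mean of the same central pixels would be 22.0, showing how a single bright object can swamp a mean stack of mostly undetected sources.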

arXiv:2401.03788

Low-light Image Enhancement via CLIP-Fourier Guided Wavelet Diffusion

Authors: Minglong Xue, **hong He, Wenhai Wang, Mingliang Zhou

Abstract: Low-light image enhancement techniques have significantly progressed, but unstable image quality recovery and unsatisfactory visual perception are still significant challenges. To solve these problems, we propose a novel and robust low-light image enhancement method via CLIP-Fourier Guided Wavelet Diffusion, abbreviated as CFWD. Specifically, CFWD leverages multimodal visual-language information in the frequency domain space created by multiple wavelet transforms to guide the enhancement process. Multi-scale supervision across different modalities facilitates the alignment of image features with semantic features during the wavelet diffusion process, effectively bridging the gap between degraded and normal domains. Moreover, to further promote the effective recovery of the image details, we combine the Fourier transform based on the wavelet transform and construct a Hybrid High Frequency Perception Module (HFPM) with a significant perception of the detailed features. This module avoids the diversity confusion of the wavelet diffusion process by guiding the fine-grained structure recovery of the enhancement results to achieve favourable metric and perceptually oriented enhancement. Extensive quantitative and qualitative experiments on publicly available real-world benchmarks show that our approach outperforms existing state-of-the-art methods, achieving significant progress in image quality and noise suppression. The project code is available at https://github.com/hejh8/CFWD.

Submitted 17 April, 2024; v1 submitted 8 January, 2024; originally announced January 2024.
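
CFWD works in the frequency-domain space produced by wavelet transforms. As a rough illustration of what one level of such a transform yields, the following plain-Python sketch implements a single-level orthonormal 2D Haar decomposition, splitting an image into one low-frequency subband and three high-frequency detail subbands, plus its exact inverse. The Haar wavelet, the function names `haar_dwt2`/`haar_idwt2`, and the list-of-rows image layout are assumptions for illustration; this is not the CFWD code from the linked repository.

```python
def haar_dwt2(img):
    """Single-level orthonormal 2D Haar transform (hypothetical helper).

    img: list of rows (lists of floats) with even height and width.
    Returns (low, detail_h, detail_v, detail_d), each half the size of img.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    low, det_h, det_v, det_d = [], [], [], []
    for i in range(0, h, 2):
        rows = ([], [], [], [])
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]          # top-left, top-right
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom-left, bottom-right
            rows[0].append((a + b + c + d) / 2)  # low-pass: scaled local average
            rows[1].append((a + b - c - d) / 2)  # top-vs-bottom difference (horizontal edges)
            rows[2].append((a - b + c - d) / 2)  # left-vs-right difference (vertical edges)
            rows[3].append((a - b - c + d) / 2)  # diagonal difference
        for band, row in zip((low, det_h, det_v, det_d), rows):
            band.append(row)
    return low, det_h, det_v, det_d


def haar_idwt2(low, det_h, det_v, det_d):
    """Exact inverse of haar_dwt2."""
    img = []
    for l_row, h_row, v_row, d_row in zip(low, det_h, det_v, det_d):
        top, bottom = [], []
        for L, H, V, D in zip(l_row, h_row, v_row, d_row):
            top += [(L + H + V + D) / 2, (L + H - V - D) / 2]
            bottom += [(L - H + V - D) / 2, (L - H - V + D) / 2]
        img += [top, bottom]
    return img
```

A multi-level decomposition would apply `haar_dwt2` again to the `low` subband. Because the transform is orthonormal, total energy (sum of squared values) is preserved across the four subbands, and a flat image produces all-zero detail subbands.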

arXiv:2401.03261

The X-ray high-energy cutoff in Compact Symmetric Object Mrk 348

Authors: Mai Liao, Junxian Wang, Jialai Kang, Xiaofeng Li, Minhua Zhou

Abstract: Compact radio AGN are thought to be young radio active galactic nuclei (AGN) at an early stage of AGN evolution, and are thus ideal laboratories for studying the high-energy emission throughout the evolution of radio AGN. In this work, we report for the first time the detection of the high-energy cutoff ($E_{\rm cut}$), a direct indicator of thermal coronal radiation, in the X-ray emission of Mrk 348 ($z$ = 0.015), a young radio galaxy classified as a compact symmetric object. With a 100 ks NuSTAR exposure, we find that the high-energy cutoff ($E_{\rm cut}$) is firmly detected ($218^{+124}_{-62}$ keV). Fitting with various Comptonization models indicates the presence of a hot corona with temperature $kT_{\rm e}$ = 35 -- 40 keV. These results strongly support a coronal origin for its hard X-ray emission. A comparison of Mrk 348 with normal large-scale radio galaxies (mostly FR II) in the $E_{\rm cut}$ -- spectral index $\Gamma$ plane reveals no difference between them. This suggests that the corona properties of radio sources may not evolve over time (i.e., from the infant stage to the mature stage), which remains to be confirmed with future sample studies of young radio AGN.

Submitted 10 January, 2024; v1 submitted 6 January, 2024; originally announced January 2024.

Comments: 7 pages, 5 figures, 2 tables, accepted by MNRAS
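
For orientation, $E_{\rm cut}$ is commonly defined through a phenomenological cutoff power-law photon spectrum (for example, the `cutoffpl` model in XSPEC), where $\Gamma$ is the photon index and $K$ a normalization. This is a generic definition, not necessarily the exact continuum the authors fitted; the abstract does not list the full model components.

$$
N(E) = K\,E^{-\Gamma}\exp\!\left(-\frac{E}{E_{\rm cut}}\right)
$$

At $E = E_{\rm cut}$ the photon flux has fallen by a factor of $e$ relative to the pure power law $K\,E^{-\Gamma}$.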

arXiv:2401.02539

Robot-Assisted Deep Venous Thrombosis Ultrasound Examination using Virtual Fixture

Authors: Dianye Huang, Chenguang Yang, Mingchuan Zhou, Angelos Karlas, Nassir Navab, Zhongliang Jiang

Abstract: Deep Venous Thrombosis (DVT) is a common vascular disease with blood clots inside deep veins, which may block blood flow or even cause a life-threatening pulmonary embolism. A typical ultrasound (US) exam for DVT is performed by pressing the target vein until its lumen is fully compressed. However, the compression exam is highly operator-dependent. To alleviate intra- and inter-operator variations, we present a robotic US system with a novel hybrid force motion control scheme ensuring position and force tracking accuracy, and soft landing of the probe onto the target surface. In addition, a path-based virtual fixture is proposed to realize easy human-robot interaction for repeat compression operation at the lesion location. To ensure the biometric measurements obtained in different examinations are comparable, the 6D scanning path is determined in a coarse-to-fine manner using both an external RGBD camera and US images. The RGBD camera is first used to extract a rough scanning path on the object. Then, the vascular lumen segmented from US images is used to optimize the scanning path to ensure the visibility of the target object. To generate a continuous scan path for developing virtual fixtures, an arc-length based path fitting model considering both position and orientation is proposed. Finally, the whole system is evaluated on a human-like arm phantom with an uneven surface.

Submitted 4 January, 2024; originally announced January 2024.

Comments: Accepted Paper IEEE T-ASE
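
The abstract mentions an arc-length based path fitting model that turns the scanning path into a continuous curve for the virtual fixture. As a minimal sketch of the underlying idea only, the code below parameterizes a polyline of 3D waypoints by cumulative arc length and linearly interpolates the position at any arc length. Orientation, which the paper also models, and any smoothing or curve fitting are omitted; the function names and the piecewise-linear model are assumptions, not the authors' method.

```python
import math
from bisect import bisect_right


def cumulative_arc_length(points):
    """Cumulative distance along a polyline; entry i is the arc length at points[i]."""
    s = [0.0]
    for p, q in zip(points, points[1:]):
        s.append(s[-1] + math.dist(p, q))
    return s


def point_at(points, s_query):
    """Position at arc length s_query, clamped to [0, total length] (hypothetical helper)."""
    s = cumulative_arc_length(points)
    s_query = min(max(s_query, 0.0), s[-1])
    i = bisect_right(s, s_query) - 1  # index of the segment start
    if i >= len(points) - 1:
        return tuple(points[-1])
    seg = s[i + 1] - s[i]
    t = 0.0 if seg == 0 else (s_query - s[i]) / seg  # fraction along the segment
    return tuple(a + t * (b - a) for a, b in zip(points[i], points[i + 1]))
```

Parameterizing by arc length rather than by waypoint index gives a uniform "distance travelled" coordinate, so a fixture can constrain motion along the path regardless of how unevenly the waypoints were sampled.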

arXiv:2401.02458

Data-Centric Foundation Models in Computational Healthcare: A Survey

Authors: Yunkun Zhang, ** Gao, Zheling Tan, Lingfeng Zhou, Kexin Ding, Mu Zhou, Shaoting Zhang, Dequan Wang

Abstract: The advent of foundation models (FMs) as an emerging suite of AI techniques has brought a wave of opportunities in computational healthcare. The interactive nature of these models, guided by pre-training data and human instructions, has ignited a data-centric AI paradigm that emphasizes better data characterization, quality, and scale. In healthcare AI, obtaining and processing high-quality clinical data records has been a longstanding challenge, with issues ranging from data quantity and annotation to patient privacy and ethics. In this survey, we investigate a wide range of data-centric approaches in the FM era (from model pre-training to inference) towards improving the healthcare workflow. We discuss key perspectives in AI security, assessment, and alignment with human values. Finally, we offer a promising outlook of FM-based analytics to improve patient outcomes and clinical workflows in the evolving landscape of healthcare and medicine. We provide an up-to-date list of healthcare-related foundation models and datasets at https://github.com/Yunkun-Zhang/Data-Centric-FM-Healthcare.

Submitted 4 January, 2024; originally announced January 2024.

arXiv:2401.02309

TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection

Authors: Hao Sun, Mingyao Zhou, Wen**g Chen, Wei Xie

Abstract: Video moment retrieval (MR) and highlight detection (HD) based on natural language queries are two highly related tasks, which aim to obtain relevant moments within videos and highlight scores of each video clip. Recently, several methods have been devoted to building DETR-based networks to solve both MR and HD jointly. These methods simply add two separate task heads after multi-modal feature extraction and feature interaction, achieving good performance. Nevertheless, these approaches underutilize the reciprocal relationship between the two tasks. In this paper, we propose a task-reciprocal transformer based on DETR (TR-DETR) that focuses on exploring the inherent reciprocity between MR and HD. Specifically, a local-global multi-modal alignment module is first built to align features from diverse modalities into a shared latent space. Subsequently, a visual feature refinement module is designed to eliminate query-irrelevant information from visual features for modal interaction. Finally, a task cooperation module is constructed to refine the retrieval pipeline and the highlight score prediction process by utilizing the reciprocity between MR and HD. Comprehensive experiments on QVHighlights, Charades-STA and TVSum datasets demonstrate that TR-DETR outperforms existing state-of-the-art methods. Code is available at https://github.com/mingyao1120/TR-DETR.

Submitted 4 January, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

Comments: Accepted by AAAI-24

arXiv:2401.02161

Enhancing RAW-to-sRGB with Decoupled Style Structure in Fourier Domain

Authors: Xuanhua He, Tao Hu, Guoli Wang, Ze** Wang, Run Wang, Qian Zhang, Keyu Yan, Ziyi Chen, Rui Li, Chenjun Xie, Jie Zhang, Man Zhou

Abstract: RAW to sRGB mapping, which aims to convert RAW images from smartphones into RGB form equivalent to that of Digital Single-Lens Reflex (DSLR) cameras, has become an important area of research. However, current methods often ignore the difference between cell phone RAW images and DSLR camera RGB images, a difference that goes beyond the color matrix and extends to spatial structure due to resolution variations. Recent methods directly rebuild color mapping and spatial structure via a shared deep representation, which limits performance. Inspired by the Image Signal Processing (ISP) pipeline, which distinguishes image restoration and enhancement, we present a novel Neural ISP framework, named FourierISP. This approach breaks the image down into style and structure within the frequency domain, allowing for independent optimization. FourierISP comprises three subnetworks: Phase Enhance Subnet for structural refinement, Amplitude Refine Subnet for color learning, and Color Adaptation Subnet for blending them in a smooth manner. This approach sharpens both color and structure, and extensive evaluations across varied datasets confirm that our approach achieves state-of-the-art results. Code will be available at https://github.com/alexhe101/FourierISP.

Submitted 4 January, 2024; originally announced January 2024.
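
FourierISP separates style (handled by an amplitude subnet) from structure (handled by a phase subnet) in the frequency domain. The plain-Python sketch below shows that split on a 1D signal for brevity (images would use a 2D transform): a naive discrete Fourier transform, separation into amplitude and phase, and recombination. The function names are hypothetical and the code is not taken from the FourierISP repository.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(), including the 1/n normalization."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def split_amplitude_phase(x):
    """Return (amplitude, phase) lists of the DFT of x."""
    spectrum = dft(x)
    return [abs(c) for c in spectrum], [cmath.phase(c) for c in spectrum]


def combine_amplitude_phase(amplitude, phase):
    """Rebuild a real signal from an amplitude spectrum and a phase spectrum."""
    spectrum = [cmath.rect(a, p) for a, p in zip(amplitude, phase)]
    return [v.real for v in idft(spectrum)]
```

Circularly shifting the input leaves the amplitude list unchanged and alters only the phase, which is one way to see that spatial arrangement lives mainly in the phase, while the amplitude summarizes global intensity content.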

arXiv:2401.02151

Frequency-Adaptive Pan-Sharpening with Mixture of Experts

Authors: Xuanhua He, Keyu Yan, Rui Li, Chengjun Xie, Jie Zhang, Man Zhou

Abstract: Pan-sharpening involves reconstructing missing high-frequency information in multi-spectral images with low spatial resolution, using a higher-resolution panchromatic image as guidance. Despite this inherent connection with the frequency domain, existing pan-sharpening research has barely investigated frequency-domain solutions. To this end, we propose a novel Frequency Adaptive Mixture of Experts (FAME) learning framework for pan-sharpening, which consists of three key components: the Adaptive Frequency Separation Prediction Module, the Sub-Frequency Learning Expert Module, and the Expert Mixture Module. In detail, the first leverages the discrete cosine transform to perform frequency separation by predicting the frequency mask. Based on the generated mask, the second uses a low-frequency MOE and a high-frequency MOE to enable effective reconstruction of low-frequency and high-frequency information. Finally, the fusion module dynamically weights the high-frequency and low-frequency MOE knowledge to adapt to remote sensing images with significant content variations. Quantitative and qualitative experiments over multiple datasets demonstrate that our method outperforms other state-of-the-art methods and exhibits strong generalization ability for real-world scenes. Code will be made publicly available at https://github.com/alexhe101/FAME-Net.

Submitted 4 January, 2024; originally announced January 2024.
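
FAME's first module uses the discrete cosine transform (DCT) to separate frequencies via a predicted mask. The sketch below shows only a fixed, non-learned version of that operation on a 1D signal: an orthonormal DCT-II, a binary mask over coefficients, and inverse transforms that yield a low-frequency and a high-frequency component summing exactly to the input. In FAME the mask is predicted by a network and applied to image features; the hard mask and the function names here are assumptions for illustration.

```python
import math


def _scale(k, n):
    # Orthonormal DCT-II scaling factors.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_ii(x):
    """Orthonormal DCT-II of a 1D sequence."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
            for k in range(n)]


def idct_ii(coeffs):
    """Inverse of dct_ii (an orthonormal DCT-III)."""
    n = len(coeffs)
    return [sum(_scale(k, n) * coeffs[k] * math.cos(math.pi / n * (i + 0.5) * k) for k in range(n))
            for i in range(n)]


def split_frequencies(x, mask):
    """Split x into (low, high); mask[k] is True for coefficients assigned to 'low'."""
    coeffs = dct_ii(x)
    low = idct_ii([c if keep else 0.0 for c, keep in zip(coeffs, mask)])
    high = idct_ii([0.0 if keep else c for c, keep in zip(coeffs, mask)])
    return low, high
```

For example, `split_frequencies(signal, [k < 3 for k in range(len(signal))])` keeps the three lowest DCT coefficients in the low branch. Because the transform is linear, any mask gives `low + high == signal` up to floating-point error, so the two expert branches together see all of the input.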

arXiv:2401.00160

Acceleration Estimation of Signal Propagation Path Length Changes for Wireless Sensing

Authors: Jiacheng Wang, Hongyang Du, Dusit Niyato, Mu Zhou, Jiawen Kang, H. Vincent Poor

Abstract: As indoor applications grow in diversity, wireless sensing, vital in areas like localization and activity recognition, is attracting renewed interest. Indoor wireless sensing relies on signal processing, particularly channel state information (CSI) based signal parameter estimation. Nonetheless, regarding reflected signals induced by dynamic human targets, no satisfactory algorithm yet exists for estimating the acceleration of dynamic path length change (DPLC), which is crucial for various sensing tasks in this context. Hence, this paper proposes DP-AcE, a CSI-based DPLC acceleration estimation algorithm. We first model the relationship between the phase difference of adjacent CSI measurements and the DPLC's acceleration. Unlike existing works assuming constant velocity, DP-AcE considers both velocity and acceleration, yielding a more accurate and objective representation. Using this relationship, an algorithm combining scaling with Fourier transform is proposed to realize acceleration estimation. We evaluate DP-AcE via the acceleration estimation and acceleration-based fall detection with the collected CSI. Experimental results reveal that, using distance as the metric, DP-AcE achieves a median acceleration estimation percentage error of 4.38%. Furthermore, in multi-target scenarios, the fall detection achieves an average true positive rate of 89.56% and a false positive rate of 11.78%, demonstrating its importance in enhancing indoor wireless sensing capabilities.

Submitted 30 December, 2023; originally announced January 2024.
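
The abstract relates the phase difference of adjacent CSI measurements to the DPLC's acceleration. Under a simple assumed model, which is not necessarily the paper's, let the path length be $d(t) = d_0 + vt + \tfrac{1}{2}at^2$ and the CSI phase be $\phi(t) = -2\pi d(t)/\lambda$ for wavelength $\lambda$. The phase difference between samples spaced $\Delta t$ apart is then $\Delta\phi_n = -\tfrac{2\pi}{\lambda}\Delta t\,\bigl(v + a(t_n + \tfrac{\Delta t}{2})\bigr)$, which is linear in $t_n$ with slope $-2\pi a\,\Delta t/\lambda$. The sketch below recovers $a$ from that slope by ordinary least squares. It is a simplified stand-in, not DP-AcE itself, which the authors describe as combining scaling with a Fourier transform; the function names are hypothetical.

```python
import math


def wrap(angle):
    """Wrap an angle to [-pi, pi]."""
    return math.remainder(angle, 2 * math.pi)


def estimate_acceleration(phases, dt, wavelength):
    """Estimate a constant path-length acceleration from sampled CSI phases.

    Assumes phi(t) = -2*pi*d(t)/wavelength with d(t) quadratic in t, and that
    the true phase change between adjacent samples stays below pi (no aliasing).
    """
    if len(phases) < 3:
        raise ValueError("need at least three phase samples")
    diffs = [wrap(b - a) for a, b in zip(phases, phases[1:])]
    times = [n * dt for n in range(len(diffs))]
    t_mean = sum(times) / len(times)
    d_mean = sum(diffs) / len(diffs)
    # Ordinary least-squares slope of phase difference versus time.
    slope = (sum((t - t_mean) * (d - d_mean) for t, d in zip(times, diffs))
             / sum((t - t_mean) ** 2 for t in times))
    return -slope * wavelength / (2 * math.pi * dt)
```

On noise-free synthetic phases (for instance $\lambda \approx 5.77$ cm, $\Delta t = 1$ ms, $v = 0.5$ m/s, $a = 2$ m/s²) the estimate matches $a$ to floating-point precision; real CSI would additionally need phase sanitization and noise handling, which this sketch ignores.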

arXiv:2401.00006

Building Open-Ended Embodied Agent via Language-Policy Bidirectional Adaptation

Authors: Shaopeng Zhai, Jie Wang, Tianyi Zhang, Fuxian Huang, Qi Zhang, Ming Zhou, **g Hou, Yu Qiao, Yu Liu

Abstract: Building embodied agents by integrating Large Language Models (LLMs) and Reinforcement Learning (RL) has revolutionized human-AI interaction: researchers can now leverage language instructions to plan decision-making for open-ended tasks. However, existing research faces challenges in meeting the requirement of open-endedness. Existing methods typically train LLM/RL models to adapt to a fixed counterpart, limiting exploration of novel skills and hindering the efficacy of human-AI interaction. To this end, we present OpenPAL, a co-training framework comprising two stages: (1) fine-tuning a pre-trained LLM to translate human instructions into goals for planning, and goal-conditioned training of a policy for decision-making; (2) co-training to align the LLM and policy, achieving instruction open-endedness. We conducted experiments using Contra, an open-ended FPS game, demonstrating that an agent trained with OpenPAL not only comprehends arbitrary instructions but also exhibits efficient execution. These results suggest that OpenPAL holds the potential to construct open-ended embodied agents in practical scenarios.

Submitted 6 February, 2024; v1 submitted 12 December, 2023; originally announced January 2024.

arXiv:2312.14013

Two-Stage Pseudo Maximum Likelihood Estimation of Semiparametric Copula-based Regression Models for Semi-Competing Risks Data

Authors: Sakie J. Arachchige, Xinyuan Chen, Qian M. Zhou

Abstract: We propose a two-stage estimation procedure for a copula-based model with semi-competing risks data, where the non-terminal event is subject to dependent censoring by the terminal event, and both events are subject to independent censoring. Under a copula-based model, the marginal survival functions of individual event times are specified by semiparametric transformation models, and the dependence between the bivariate event times is specified by a parametric copula function. For the estimation procedure, in the first stage, the parameters associated with the marginal of the terminal event are estimated using only the corresponding observed outcomes, and in the second stage, the marginal parameters for the non-terminal event time and the copula parameter are estimated by maximizing a pseudo-likelihood function based on the joint distribution of the bivariate event times. We derived the asymptotic properties of the proposed estimator and provided an analytic variance estimator for inference. Through simulation studies, we showed that our approach leads to consistent estimates with less computational cost and more robustness compared to the one-stage procedure developed in Chen (2012), where all parameters were estimated simultaneously. In addition, our approach demonstrates more desirable finite-sample performance than another existing two-stage estimation method proposed in Zhu et al. (2021).

Submitted 21 December, 2023; originally announced December 2023.

Comments: 24 pages, 1 figure
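
The abstract specifies the dependence between the two event times through a parametric copula without naming the family. As one concrete, commonly used example (an assumption, not necessarily the authors' choice), the Clayton copula links the marginal survival functions $S_1(t_1)$ and $S_2(t_2)$ into a joint survival function $S(t_1, t_2) = (S_1^{-\theta} + S_2^{-\theta} - 1)^{-1/\theta}$ for $\theta > 0$, with Kendall's $\tau = \theta/(\theta + 2)$. A minimal sketch with hypothetical function names:

```python
def clayton_joint_survival(s1, s2, theta):
    """Joint survival S(t1, t2) = C_theta(S1(t1), S2(t2)) under a Clayton copula.

    s1, s2: marginal survival probabilities in (0, 1]; theta > 0 sets dependence strength.
    """
    if theta <= 0:
        raise ValueError("Clayton copula requires theta > 0")
    if not (0 < s1 <= 1 and 0 < s2 <= 1):
        raise ValueError("survival probabilities must lie in (0, 1]")
    return (s1 ** -theta + s2 ** -theta - 1.0) ** (-1.0 / theta)


def clayton_kendall_tau(theta):
    """Kendall's tau implied by a Clayton copula parameter."""
    return theta / (theta + 2.0)
```

Setting `s2 = 1` returns `s1`, as a joint survival function must, and positive `theta` yields joint survival above the independence value: `clayton_joint_survival(0.5, 0.5, 2.0)` equals $7^{-1/2} \approx 0.378$, compared with $0.25$ under independence. In a two-stage scheme like the one described, the marginals would be fitted first and the copula parameter estimated afterwards with those marginals plugged in.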

arXiv:2312.13671

Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries

Authors: Xinyi He, Mengyu Zhou, Xinrun Xu, Xiaojun Ma, Rui Ding, Lun Du, Yan Gao, Ran Jia, Xu Chen, Shi Han, Zejian Yuan, Dongmei Zhang

Abstract: Tabular data analysis is crucial in various fields, and large language models show promise in this area. However, current research mostly focuses on rudimentary tasks like Text2SQL and TableQA, neglecting advanced analysis like forecasting and chart generation. To address this gap, we developed the Text2Analysis benchmark, incorporating advanced analysis tasks that go beyond the SQL-compatible operations and require more in-depth analysis. We also develop five innovative and effective annotation methods, harnessing the capabilities of large language models to enhance data quality and quantity. Additionally, we include unclear queries that resemble real-world user questions to test how well models can understand and tackle such challenges. Finally, we collect 2249 query-result pairs with 347 tables. We evaluate five state-of-the-art models using three different metrics, and the results show that our benchmark introduces considerable challenges in the field of tabular data analysis, paving the way for more advanced research opportunities.

Submitted 21 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI'2024

arXiv:2312.13351

A large jet narrow-line Seyfert 1 galaxy: observations from pc to 100 kpc scales

Authors: Sina Chen, Preeti Kharb, Silpa Sasikumar, Sumana Nandi, Marco Berton, Emilia Jarvela, Ari Laor, Ehud Behar, Luigi Foschini, Amelia Vietri, Minfeng Gu, Giovanni La Mura, Luca Crepaldi, Minhua Zhou

Abstract: We present new 1.5-8.5 GHz Very Long Baseline Array (VLBA) observations and 0.32-1.26 GHz Giant Metrewave Radio Telescope (GMRT) observations of J0354-1340, which is the only known radio-quiet (RQ) or radio-intermediate (RI) narrow-line Seyfert 1 galaxy with a 100-kpc two-sided radio jet. A pc-scale one-sided jet in the southeast direction from the core emission is found in the VLBA observations, while the kpc-scale jet observed with the Karl G. Jansky Very Large Array (VLA) and GMRT is in the south-north direction. The core spectra on pc and kpc scales are presented in combination with the archival VLASS observations at 3.0 GHz and the VLA C configuration observations at 5.5 GHz. The pc-scale emission dominates the kpc-scale emission above ~5 GHz, and the spectrum is inverted due to synchrotron self-absorption. This indicates a compact synchrotron source with a size of ~0.04 pc, which is associated with either the jet base or the corona. A sub-kpc scale jet, which is unresolved on scales of ~3 arcsec, probably dominates the emission below ~5 GHz. Future radio observations can explore the jet structure between the pc and 100 kpc scales, the origin of their direction mismatch, and the pc-scale jet proper motion. It remains to be explored how common such large-scale jets are in RQ or RI AGN.

Submitted 20 December, 2023; originally announced December 2023.

Comments: Accepted for publication in ApJ
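
The abstract describes the pc-scale spectrum as inverted due to synchrotron self-absorption. With the convention $S_\nu \propto \nu^{\alpha}$ (some authors use $\nu^{-\alpha}$ instead), a two-point spectral index is $\alpha = \ln(S_2/S_1)/\ln(\nu_2/\nu_1)$, and an inverted spectrum has $\alpha > 0$; for a homogeneous, optically thick synchrotron source the limiting value is $\alpha = 5/2$. The helper below is a generic sketch with hypothetical names, and the flux densities in the example are made-up numbers, not measurements of J0354-1340.

```python
import math


def spectral_index(s1, nu1, s2, nu2):
    """Two-point spectral index alpha, with S_nu proportional to nu**alpha.

    s1, s2: flux densities in the same units (e.g. mJy);
    nu1, nu2: frequencies in the same units (e.g. GHz).
    """
    if min(s1, s2, nu1, nu2) <= 0 or nu1 == nu2:
        raise ValueError("need positive fluxes and two distinct positive frequencies")
    return math.log(s2 / s1) / math.log(nu2 / nu1)


def is_inverted(alpha):
    """True when flux density rises with frequency (alpha > 0 in this convention)."""
    return alpha > 0


# Hypothetical values: 1.0 mJy at 1.5 GHz rising to 4.0 mJy at 8.5 GHz.
alpha = spectral_index(1.0, 1.5, 4.0, 8.5)
print(round(alpha, 2), is_inverted(alpha))
```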

arXiv:2312.10160

Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning

Authors: Kung-Hsiang Huang, Mingyang Zhou, Hou Pong Chan, Yi R. Fung, Zhenhailong Wang, Lingyu Zhang, Shih-Fu Chang, Heng Ji

Abstract: Recent advancements in large vision-language models (LVLMs) have led to significant progress in generating natural language descriptions for visual content and thus enhancing various applications. One issue with these powerful models is that they sometimes produce texts that are factually inconsistent with the visual input. While there has been some effort to mitigate such inconsistencies in natural image captioning, the factuality of generated captions for structured document images, such as charts, has not received as much scrutiny, posing a potential threat to information reliability in critical applications. This work delves into the factuality aspect by introducing a comprehensive typology of factual errors in generated chart captions. A large-scale human annotation effort provides insight into the error patterns and frequencies in captions crafted by various chart captioning models, ultimately forming the foundation of a novel dataset, CHOCOLATE. Our analysis reveals that even state-of-the-art models, including GPT-4V, frequently produce captions laced with factual inaccuracies. In response to this challenge, we establish the new task of Chart Caption Factual Error Correction and introduce CHARTVE, a model for visual entailment that outperforms proprietary and open-source LVLMs in evaluating factual consistency. Furthermore, we propose C2TFEC, an interpretable two-stage framework that excels at correcting factual errors.
This work inaugurates a new domain in factual error correction for chart captions, presenting a novel evaluation mechanism, and demonstrating an effective approach to ensuring the factuality of generated chart captions. The code and data as well as the continuously updated benchmark can be found at: https://khuangaf.github.io/CHOCOLATE/.

Submitted 30 May, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: ACL 2024 Findings

arXiv:2312.09576

SegRap2023: A Benchmark of Organs-at-Risk and Gross Tumor Volume Segmentation for Radiotherapy Planning of Nasopharyngeal Carcinoma

Authors: Xiangde Luo, Jia Fu, Yunxin Zhong, Shuolin Liu, Bing Han, Mehdi Astaraki, Simone Bendazzoli, Iuliana Toma-Dasu, Yiwen Ye, Ziyang Chen, Yong Xia, Yanzhou Su, ** Ye, Junjun He, Zhaohu Xing, Hongqiu Wang, Lei Zhu, Kaixiang Yang, Xin Fang, Zhiwei Wang, Chan Woong Lee, Sang Joon Park, Jaehee Chun, Constantin Ulrich, Klaus H. Maier-Hein, et al. (17 additional authors not shown)

Abstract: Radiation therapy is a primary and effective NasoPharyngeal Carcinoma (NPC) treatment strategy. The precise delineation of Gross Tumor Volumes (GTVs) and Organs-At-Risk (OARs) is crucial in radiation treatment, directly impacting patient prognosis. Previously, the delineation of GTVs and OARs was performed by experienced radiation oncologists. Recently, deep learning has achieved promising results in many medical image segmentation tasks. However, for NPC OARs and GTVs segmentation, few public datasets are available for model development and evaluation. To alleviate this problem, the SegRap2023 challenge was organized in conjunction with MICCAI2023 and presented a large-scale benchmark for OAR and GTV segmentation with 400 Computed Tomography (CT) scans from 200 NPC patients, each with a pair of pre-aligned non-contrast and contrast-enhanced CT scans. The challenge's goal was to segment 45 OARs and 2 GTVs from the paired CT scans. In this paper, we detail the challenge and analyze the solutions of all participants. The average Dice similarity coefficient scores for all submissions ranged from 76.68% to 86.70%, and 70.42% to 73.44% for OARs and GTVs, respectively. We conclude that the segmentation of large-size OARs is well-addressed, and more efforts are needed for GTVs and small-size or thin-structure OARs. The benchmark will remain publicly available here: https://segrap2023.grand-challenge.org

Submitted 15 December, 2023; originally announced December 2023.

Comments: A challenge report of SegRap2023 (organized in conjunction with MICCAI2023)
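
Submissions are compared by the Dice similarity coefficient (DSC), $\mathrm{DSC}(A, B) = 2|A \cap B| / (|A| + |B|)$ for a predicted mask $A$ and a reference mask $B$, reported as a percentage. A minimal sketch on flattened binary masks follows; the function name is hypothetical, and returning 1.0 when both masks are empty is an assumed convention that the challenge's own evaluation code may handle differently.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two flattened binary masks (sequences of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total
```

For example, `100 * dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])` gives 50.0, since one of the two foreground voxels in each mask overlaps. Per-structure scores like this would then be averaged over structures and cases to produce summary figures such as those quoted above.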

arXiv:2312.09050

A Sparse Cross Attention-based Graph Convolution Network with Auxiliary Information Awareness for Traffic Flow Prediction

Authors: Lingqiang Chen, Qinglin Zhao, Guanghui Li, Mengchu Zhou, Chenglong Dai, Yiming Feng

Abstract: Deep graph convolution networks (GCNs) have recently shown excellent performance in traffic prediction tasks. However, they face some challenges. First, few existing models consider the influence of auxiliary information, i.e., weather and holidays, which may result in a poor grasp of the spatial-temporal dynamics of traffic data. Second, both the construction of a dynamic adjacency matrix and regular graph convolution operations have quadratic computation complexity, which restricts the scalability of GCN-based models. To address such challenges, this work proposes a deep encoder-decoder model entitled AIMSAN. It contains an auxiliary information-aware module (AIM) and a sparse cross attention-based graph convolution network (SAN). The former learns multi-attribute auxiliary information and obtains its embedded representation for different time-window sizes. The latter uses a cross-attention mechanism to construct dynamic adjacency matrices by fusing traffic data and embedded auxiliary data. Then, SAN applies diffusion GCN on traffic data to mine rich spatial-temporal dynamics. Furthermore, AIMSAN considers and uses the spatial sparseness of traffic nodes to reduce the quadratic computation complexity. Experimental results on three public traffic datasets demonstrate that the proposed method outperforms other counterparts in terms of various performance indices. Specifically, the proposed method has competitive performance with the state-of-the-art algorithms but saves 35.74% of GPU memory usage, 42.25% of training time, and 45.51% of validation time on average.

Submitted 14 December, 2023; originally announced December 2023.

arXiv:2312.09039

TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning

Authors: Yuan Sui, Jiaru Zou, Mengyu Zhou, Xinyi He, Lun Du, Shi Han, Dongmei Zhang

Abstract: Table-based reasoning has shown remarkable progress in combining deep models with discrete reasoning, which requires reasoning over both free-form natural language (NL) questions and semi-structured tabular data. However, previous table reasoning solutions only consider small-sized tables and exhibit limitations in handling larger tables. In addition, most existing methods struggle to reason over complex questions because essential information is missing or scattered across different places. To alleviate these challenges, we propose TAP4LLM as a versatile pre-processing toolbox to generate table prompts through (1) table sampling, (2) table augmentation, and (3) table packing while balancing the token allocation trade-off. In each module, we collect and design several common methods for usage in various scenarios (e.g., speed over accuracy). We also provide a comprehensive evaluation of the performance of each component inside TAP4LLM and show that our method improves LLMs' reasoning capabilities in various tabular tasks and enhances the interaction between LLMs and tabular data by employing effective pre-processing.

Submitted 17 February, 2024; v1 submitted 14 December, 2023; originally announced December 2023.
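
The third step, table packing, fits the sampled and augmented table into a prompt while balancing token allocation. A minimal sketch of the budget logic follows, under stated assumptions: rows arrive already ranked by a sampling step, whitespace-separated words stand in for a real tokenizer, newlines are not counted, and rows are added greedily until the budget would be exceeded. The function name, the pipe-delimited serialization, and the greedy policy are hypothetical and do not reflect TAP4LLM's actual API.

```python
def pack_table(header, rows, max_tokens, count_tokens=lambda text: len(text.split())):
    """Serialize the header plus as many leading rows as fit within max_tokens.

    count_tokens is a crude whitespace stand-in; swap in the target model's tokenizer.
    Returns the packed text and the number of rows included.
    """
    lines = [" | ".join(str(cell) for cell in header)]
    used = count_tokens(lines[0])
    if used > max_tokens:
        raise ValueError("header alone exceeds the token budget")
    for row in rows:
        line = " | ".join(str(cell) for cell in row)
        cost = count_tokens(line)
        if used + cost > max_tokens:
            break  # stop at the first row that does not fit, preserving row order
        lines.append(line)
        used += cost
    return "\n".join(lines), len(lines) - 1
```

Stopping at the first row that overflows (rather than skipping it and trying smaller rows) keeps the ranking from the sampling step intact; a packing policy that reserves part of the budget for augmentation text would be a natural variation.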

arXiv:2312.08716

Induced magneto-conductivity in a two-node Weyl semimetal under Gaussian random disorder

Authors: Chuan-Xiong Xu, Hao-** Yu, Mei Zhou, Xuanting Ji

Abstract: Measuring the magnetoconductivity induced by impurities may help determine the impurity distribution and reveal the structure of a Weyl semimetal sample. To verify this, we utilized Gaussian random disorder to simulate charged impurities in a two-node Weyl semimetal model and investigated the impact of charged impurities on magnetoconductivity in Weyl semimetals. We first compute the longitudinal magnetic conductivity and find that it is positive and increases proportionally with the parameter governing the Gaussian distribution of charged impurities, suggesting the presence of negative longitudinal magnetoresistivity (NLMR). Then we consider both the intra-valley and inter-valley scattering processes to calculate the induced transverse magnetoconductivity in the model. Our findings indicate that both inter-valley and intra-valley scattering processes play important roles in calculating the transverse magnetoconductivity. The locations of Weyl nodes can also be determined by magnetoconductivity measurements, provided that the magnetic field strength and the density of charged impurities are known. Alternatively, the measurement of magnetic conductivity may reveal the distribution of charged impurities in a given sample once the locations of the Weyl nodes have been determined. These findings can aid in detecting the structure of a Weyl semimetal sample, enhancing comprehension of magnetotransport in Weyl semimetals, and promoting the development of valley electronics.

Submitted 14 December, 2023; originally announced December 2023.

Comments: 16 pages, 7 figures
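The Weyl semimetal abstract above models charged impurities as Gaussian random disorder but does not give the exact form. As a rough, hypothetical illustration only, the sketch below builds a 1D disorder potential from randomly placed impurities, each with a Gaussian profile of width `d` and a random sign so the potential averages to zero over realizations. The function name, the 1D setting, and all parameters are assumptions for illustration; this is not the paper's model or its magnetoconductivity calculation.

```python
import math
import random
from typing import List, Sequence


def gaussian_disorder_1d(
    xs: Sequence[float],
    n_imp: int,
    u0: float,
    d: float,
    length: float,
    rng: random.Random,
) -> List[float]:
    """Hypothetical 1D sketch: potential from n_imp randomly placed impurities.

    Each impurity contributes s * u0 * exp(-(x - c)^2 / (2 d^2)), with a
    uniformly random center c in [0, length) and a random sign s = +/-1, so
    the disorder averages to zero over many realizations. The width d is the
    parameter controlling the Gaussian profile.
    """
    centers = [rng.uniform(0.0, length) for _ in range(n_imp)]
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n_imp)]
    return [
        sum(
            s * u0 * math.exp(-((x - c) ** 2) / (2.0 * d * d))
            for s, c in zip(signs, centers)
        )
        for x in xs
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    grid = [0.1 * i for i in range(101)]  # x in [0, 10]
    v = gaussian_disorder_1d(grid, n_imp=20, u0=1.0, d=0.3, length=10.0, rng=rng)
    print(f"max |V| = {max(abs(val) for val in v):.3f}")
```

Averaging observables over many seeds approximates the disorder average; a larger `d` yields smoother, longer-range fluctuations.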

arXiv:2312.05006

Decoupling Degradation and Content Processing for Adverse Weather Image Restoration

Authors: Xi Wang, Xueyang Fu, Peng-Tao Jiang, Jie Huang, Mi Zhou, Bo Li, Zheng-Jun Zha

Abstract: Adverse weather image restoration strives to recover clear images from those affected by various weather types, such as rain, haze, and snow. Each weather type calls for a tailored degradation removal approach due to its unique impact on images. Conversely, content reconstruction can employ a uniform approach, as the underlying image content remains consistent. Although previous techniques can handle multiple weather types within a single network, they neglect the crucial distinction between these two processes, limiting the quality of restored images. This work introduces a novel adverse weather image restoration method, DDCNet, which decouples the degradation removal and content reconstruction processes at the feature level based on their channel statistics. Specifically, we exploit two advantages of the Fourier transform in these processes: (1) degradation information is mainly located in the amplitude component of the Fourier domain, and (2) the Fourier domain contains global information. The former facilitates a channel-dependent degradation removal operation, allowing the network to tailor its response to various adverse weather types; the latter, by integrating Fourier's global properties into channel-independent content features, enhances the network's capacity for consistent global content reconstruction. We further augment the degradation removal process with a degradation mapping loss function. Extensive experiments demonstrate that our method achieves state-of-the-art performance on multiple adverse weather removal benchmarks.

Submitted 8 December, 2023; originally announced December 2023.
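The DDCNet abstract relies on splitting an image's Fourier transform into amplitude and phase. A minimal pure-Python sketch of that split, and of recombining amplitude from one image with phase from another, might look like the following. It uses a naive O(M²N²) DFT suitable only for tiny arrays, and the function names and example patches are hypothetical; it illustrates the standard amplitude/phase decomposition, not DDCNet's network or loss.

```python
import cmath
import math
from typing import List, Tuple

Grid = List[List[complex]]


def dft2(x: List[List[float]]) -> Grid:
    """Naive 2D discrete Fourier transform (O(M^2 N^2); fine for tiny arrays)."""
    m_len, n_len = len(x), len(x[0])
    return [
        [
            sum(
                x[m][n] * cmath.exp(-2j * math.pi * (u * m / m_len + v * n / n_len))
                for m in range(m_len)
                for n in range(n_len)
            )
            for v in range(n_len)
        ]
        for u in range(m_len)
    ]


def idft2(spec: Grid) -> List[List[float]]:
    """Inverse of dft2; returns the real part of the reconstruction."""
    m_len, n_len = len(spec), len(spec[0])
    return [
        [
            (
                sum(
                    spec[u][v] * cmath.exp(2j * math.pi * (u * m / m_len + v * n / n_len))
                    for u in range(m_len)
                    for v in range(n_len)
                )
                / (m_len * n_len)
            ).real
            for n in range(n_len)
        ]
        for m in range(m_len)
    ]


def split_amplitude_phase(img: List[List[float]]) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (amplitude, phase) of the 2D DFT of img."""
    spec = dft2(img)
    amp = [[abs(c) for c in row] for row in spec]
    pha = [[cmath.phase(c) for c in row] for row in spec]
    return amp, pha


def combine(amp: List[List[float]], pha: List[List[float]]) -> List[List[float]]:
    """Rebuild a spatial image from amplitude and phase spectra."""
    spec = [[cmath.rect(a, p) for a, p in zip(ar, pr)] for ar, pr in zip(amp, pha)]
    return idft2(spec)


if __name__ == "__main__":
    clean = [[0.2, 0.4], [0.6, 0.8]]
    degraded = [[0.7, 0.9], [0.8, 1.0]]  # hypothetical brighter, "hazier" patch
    amp_c, _ = split_amplitude_phase(clean)
    _, pha_d = split_amplitude_phase(degraded)
    # Clean amplitude + degraded phase: the kind of swap used to probe
    # where degradation information lives in the spectrum.
    print(combine(amp_c, pha_d))
```

Real implementations would use an FFT library on feature maps; the naive DFT here just keeps the example dependency-free.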

arXiv:2312.04767 [pdf, other]

Finite Horizon Reinforcement Learning in Solving Optimal Control of State-Dependent Switched Systems

Authors: Mi Zhou

Abstract: In this article, the deep deterministic policy gradient (DDPG) method is used to learn an optimal control policy for a multi-region state-dependent switched system. We observe good performance from this model-free method and explain it in rigorous mathematical terms. The performance of the learning-based methods is compared with the optimal solution given by vanilla differential dynamic programming (DDP) in three customized environments.

Submitted 14 December, 2023; v1 submitted 7 December, 2023; originally announced December 2023.
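To make the problem setting in the DDPG abstract concrete, here is a minimal sketch of a finite-horizon, state-dependent switched system: the active dynamics mode is chosen by which region the state is in, not by the controller. The two regions, their coefficients, and the quadratic cost weights are all hypothetical; the sketch only evaluates a given policy's cost, which is the quantity a method such as DDPG or DDP would try to minimize. It is not the article's environments or training code.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical scalar system with two regions; the active mode (a, b)
# is selected by the sign of the current state, not by the controller.
MODES: Dict[str, Tuple[float, float]] = {
    "left": (0.9, 1.0),   # used when x < 0
    "right": (1.2, 0.5),  # used when x >= 0
}


def mode_of(x: float) -> str:
    """Region (mode) selected by the current state."""
    return "left" if x < 0 else "right"


def step(x: float, u: float) -> float:
    """One step of x_{k+1} = a_i x_k + b_i u_k for the active mode i."""
    a, b = MODES[mode_of(x)]
    return a * x + b * u


def rollout(
    policy: Callable[[int, float], float],
    x0: float,
    horizon: int,
    q: float = 1.0,
    r: float = 0.1,
) -> Tuple[List[float], float]:
    """Simulate `horizon` steps; return the trajectory and total quadratic cost."""
    xs, cost, x = [x0], 0.0, x0
    for k in range(horizon):
        u = policy(k, x)
        cost += q * x * x + r * u * u  # stage cost
        x = step(x, u)
        xs.append(x)
    cost += q * x * x  # terminal cost
    return xs, cost


if __name__ == "__main__":
    _, c_zero = rollout(lambda k, x: 0.0, x0=1.0, horizon=2)
    # Region-dependent feedback that drives either mode to zero in one step.
    _, c_fb = rollout(lambda k, x: -2.4 * x if x >= 0 else -0.9 * x, x0=1.0, horizon=2)
    print(c_zero, c_fb)
```

Because the mode depends on the state, the cost is piecewise in the policy's effect, which is one reason model-free learners and trajectory optimizers can behave differently on such systems.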

arXiv:2312.04257

Proxima: Near-storage Acceleration for Graph-based Approximate Nearest Neighbor Search in 3D NAND

Authors: Weihong Xu, Junwei Chen, Po-Kai Hsu, Jaeyoung Kang, Minxuan Zhou, Sumukh **e, Shimeng Yu, Tajana Rosing

Abstract: Approximate nearest neighbor search (ANNS) plays an indispensable role in a wide variety of applications, including recommendation systems, information retrieval, and semantic search. Among cutting-edge ANNS algorithms, graph-based approaches provide superior accuracy and scalability on massive datasets. However, the best-performing graph-based ANN search solutions incur tens of hundreds of memory footprints as well as costly distance computation, thus hindering their efficient deployment at scale. 3D NAND flash is emerging as a promising device for data-intensive applications due to its high density and nonvolatility. In this work, we present Proxima, a near-storage processing (NSP)-based ANNS solution that accelerates graph-based ANNS through algorithm-hardware co-design in 3D NAND flash. Proxima significantly reduces the complexity of graph search by leveraging distance approximation and early termination. On top of these algorithmic enhancements, we implement the Proxima search algorithm in 3D NAND flash using heterogeneous integration. To maximize 3D NAND's bandwidth utilization, we present a customized dataflow and an optimized data allocation scheme. Our evaluation shows that, compared to graph ANNS on CPU and GPU, Proxima achieves an order-of-magnitude improvement in throughput or energy efficiency, and it yields a 7x to 13x speedup over existing ASIC designs. Furthermore, Proxima achieves a good balance among accuracy, efficiency, and storage density compared to previous NSP-based accelerators.

Submitted 7 December, 2023; originally announced December 2023.
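Graph-based ANNS, which Proxima accelerates, typically runs a greedy best-first (beam) search over a proximity graph. The sketch below shows that generic search with the usual stop rule, plus a hypothetical `max_hops` cap as a crude stand-in for early termination. It uses exact Euclidean distances and an in-memory adjacency dict; Proxima's distance approximation, its specific early-termination criterion, and its 3D NAND mapping are not modeled.

```python
import heapq
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Sequence[float]


def beam_search(
    graph: Dict[int, List[int]],
    points: Sequence[Point],
    query: Point,
    entry: int,
    k: int,
    beam_width: int,
    max_hops: Optional[int] = None,
) -> List[int]:
    """Greedy best-first search on a proximity graph; returns up to k node ids."""
    d0 = math.dist(points[entry], query)
    visited = {entry}
    frontier: List[Tuple[float, int]] = [(d0, entry)]  # min-heap by distance
    best: List[Tuple[float, int]] = [(-d0, entry)]     # max-heap (negated), size <= beam_width
    hops = 0
    while frontier:
        d, node = heapq.heappop(frontier)
        # Standard stop rule: the nearest unexpanded node is worse than the
        # worst result kept in a full beam.
        if len(best) >= beam_width and d > -best[0][0]:
            break
        # Hypothetical hard cap on expansions (a crude early-termination knob).
        if max_hops is not None and hops >= max_hops:
            break
        hops += 1
        for nb in graph.get(node, []):
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(points[nb], query)
            if len(best) < beam_width or dn < -best[0][0]:
                heapq.heappush(frontier, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > beam_width:
                    heapq.heappop(best)  # drop the current worst result
    return [n for _, n in sorted((-nd, n) for nd, n in best)[:k]]


if __name__ == "__main__":
    pts = [(float(i),) for i in range(10)]
    chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
    print(beam_search(chain, pts, (7.2,), entry=0, k=2, beam_width=3))
```

A larger `beam_width` generally raises recall at the cost of more distance computations, which is exactly the work that approximation and early termination aim to cut.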

Showing 51–100 of 1,131 results for author: Zhou, M