Search | arXiv e-print repository

Driving Referring Video Object Segmentation with Vision-Language Pre-trained Models

Authors: Zikun Zhou, Wentao Xiong, Li Zhou, Xin Li, Zhenyu He, Yaowei Wang

Abstract: The crux of Referring Video Object Segmentation (RVOS) lies in modeling dense text-video relations to associate abstract linguistic concepts with dynamic visual contents at pixel-level. Current RVOS methods typically use vision and language models pre-trained independently as backbones. As images and texts are mapped to uncoupled feature spaces, they face the arduous task of learning Vision-Language (VL) relation modeling from scratch. Witnessing the success of Vision-Language Pre-trained (VLP) models, we propose to learn relation modeling for RVOS based on their aligned VL feature space. Nevertheless, transferring VLP models to RVOS is a deceptively challenging task due to the substantial gap between the pre-training task (image/region-level prediction) and the RVOS task (pixel-level prediction in videos). In this work, we introduce a framework named VLP-RVOS to address this transfer challenge. We first propose a temporal-aware prompt-tuning method, which not only adapts pre-trained representations for pixel-level prediction but also empowers the vision encoder to model temporal clues. We further propose to perform multi-stage VL relation modeling while and after feature extraction for comprehensive VL understanding. Besides, we customize a cube-frame attention mechanism for spatial-temporal reasoning. Extensive experiments demonstrate that our method outperforms state-of-the-art algorithms and exhibits strong generalization abilities.

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.10469 [pdf, other]

Simulation-Based Benchmarking of Reinforcement Learning Agents for Personalized Retail Promotions

Authors: Yu Xia, Sriram Narayanamoorthy, Zhengyuan Zhou, Joshua Mabry

Abstract: The development of open benchmarking platforms could greatly accelerate the adoption of AI agents in retail. This paper presents comprehensive simulations of customer shopping behaviors for the purpose of benchmarking reinforcement learning (RL) agents that optimize coupon targeting. The difficulty of this learning problem is largely driven by the sparsity of customer purchase events. We trained agents using offline batch data comprising summarized customer purchase histories to help mitigate this effect. Our experiments revealed that contextual bandit and deep RL methods that are less prone to over-fitting the sparse reward distributions significantly outperform static policies. This study offers a practical framework for simulating AI agents that optimize the entire retail customer journey. It aims to inspire the further development of simulation tools for retail AI systems.

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.10226 [pdf, other]

Geometric phase amplification in a clock interferometer for enhanced metrology

Authors: Zhifan Zhou, Sebastian C. Carrasco, Christian Sanner, Vladimir S. Malinovsky, Ron Folman

Abstract: High-precision measurements are crucial for testing the fundamental laws of nature and for advancing the technological frontier. Clock interferometry, where particles with an internal clock are coherently split and recombined along two spatial paths, has sparked significant interest due to its fundamental implications, especially at the intersection of quantum mechanics and general relativity. Here, we demonstrate that a clock interferometer provides metrological improvement with respect to its technical-noise-limited counterpart employing a single internal quantum state. This enhancement around a critical working point can be interpreted as a geometric-phase-induced signal-to-noise ratio gain. In our experimental setup, we infer a precision enhancement of 8.8 decibels when measuring a small difference between external fields. We estimate that tens of decibels of precision enhancement could be attained for measurements with a higher atom flux. This opens the door to the development of a superior probe for fundamental physics as well as a high-performance sensor for various technological applications.

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.10037 [pdf, other]

Bilateral Event Mining and Complementary for Event Stream Super-Resolution

Authors: Zhilin Huang, Quanmin Liang, Yijie Yu, Chujun Qin, Xiawu Zheng, Kai Huang, Zikun Zhou, Wenming Yang

Abstract: Event Stream Super-Resolution (ESR) aims to address the challenge of insufficient spatial resolution in event streams, which holds great significance for the application of event cameras in complex scenarios. Previous works for ESR often process positive and negative events in a mixed paradigm. This paradigm limits their ability to effectively model the unique characteristics of each event and mutually refine each other by considering their correlations. In this paper, we propose a bilateral event mining and complementary network (BMCNet) to fully leverage the potential of each event and capture the shared information to complement each other simultaneously. Specifically, we resort to a two-stream network to accomplish comprehensive mining of each type of events individually. To facilitate the exchange of information between two streams, we propose a bilateral information exchange (BIE) module. This module is layer-wisely embedded between two streams, enabling the effective propagation of hierarchical global information while alleviating the impact of invalid information brought by inherent characteristics of events. The experimental results demonstrate that our approach outperforms the previous state-of-the-art methods in ESR, achieving performance improvements of over 11% on both real and synthetic datasets. Moreover, our method significantly enhances the performance of event-based downstream tasks such as object recognition and video reconstruction. Our code is available at https://github.com/Lqm26/BMCNet-ESR.

Submitted 16 May, 2024; originally announced May 2024.

Comments: Accepted to CVPR2024

arXiv:2405.09776 [pdf]

doi 10.1103/PhysRevB.109.184106

Magnetic structure and magnetoelectric coupling in antiferromagnet Co5(TeO3)4Cl2

Authors: B. Yu, L. Huang, J. S. Li, L. Lin, V. Ovidiu Garlea, Q. Zhang, T. Zou, J. C. Zhang, J. Peng, Y. S. Tang, G. Z. Zhou, J. H. Zhang, S. H. Zheng, M. F. Liu, Z. B. Yan, X. H. Zhou, S. Dong, J. G. Wan, J. -M. Liu

Abstract: The van der Waals (vdW) layered multiferroics, which host simultaneous ferroelectric and magnetic orders, have attracted attention not only for their potentials to be utilized in nanoelectric devices and spintronics, but also offer alternative opportunities for emergent physical phenomena. To date, the vdW layered multiferroic materials are still very rare. In this work, we have investigated the magnetic structure and magnetoelectric effects in Co5(TeO3)4Cl2, a promising new multiferroic compound with antiferromagnetic (AFM) Neel point TN = 18 K. The neutron powder diffraction reveals the non-coplanar AFM state with preferred Neel vector along the c-axis, while a spin re-orientation occurring between 8 K and 15 K is identified, which results from the distinct temperature dependence of the non-equivalent Co sites moment in Co5(TeO3)4Cl2. What is more, it is found that Co5(TeO3)4Cl2 is one of the best vdW multiferroics studied so far in terms of the multiferroic performance. The measured linear ME coefficient exhibits the emergent oscillation dependence of the angle between magnetic field and electric field, and the maximal value is as big as 45 ps/m. It is suggested that Co5(TeO3)4Cl2 is an appreciated platform for exploring the emergent multiferroicity in vdW layered compounds.

Submitted 15 May, 2024; originally announced May 2024.

Comments: 31 pages, 9 figures

Journal ref: Phys. Rev. B 109, 184106(2024)

arXiv:2405.09066 [pdf, other]

Search for the leptonic decays $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, M. Albrecht, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, R. Baldini Ferroli, I. Balossino, Y. Ban, V. Batozskaya, D. Becker, K. Begzsuren, N. Berger, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, J. Bloms, A. Bortone, I. Boyko , et al. (559 additional authors not shown)

Abstract: We present the first search for the leptonic decays $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$ by analyzing a data sample of electron-positron collisions recorded with the BESIII detector at center-of-mass energies between 4.178 and 4.226 GeV, corresponding to an integrated luminosity of 6.32 fb$^{-1}$. No significant signal is observed. The upper limits on the branching fractions for $D^{*+}\to e^+ν_e$ and $D^{*+}\to μ^+ν_μ$ are set to be $1.1 \times 10^{-5}$ and $4.3 \times 10^{-6}$ at 90% confidence level, respectively.

Submitted 14 May, 2024; originally announced May 2024.

Comments: 14 pages, 7 figures

arXiv:2405.08573 [pdf, other]

ViSTooth: A Visualization Framework for Tooth Segmentation on Panoramic Radiograph

Authors: Shenji Zhu, Miaoxin Hu, Tianya Pan, Yue Hong, Bin Li, Zhiguang Zhou, Ting Xu

Abstract: Tooth segmentation is a key step for computer aided diagnosis of dental diseases. Numerous machine learning models have been employed for tooth segmentation on dental panoramic radiograph. However, it is a difficult task to achieve accurate tooth segmentation due to complex tooth shapes, diverse tooth categories and incomplete sample set for machine learning. In this paper, we propose ViSTooth, a visualization framework for tooth segmentation on dental panoramic radiograph. First, we employ Mask R-CNN to conduct preliminary tooth segmentation, and a set of domain metrics are proposed to estimate the accuracy of the segmented teeth, including tooth shape, tooth position and tooth angle. Then, we represent the teeth with high-dimensional vectors and visualize their distribution in a low-dimensional space, in which experts can easily observe those teeth with specific metrics. Further, we expand the sample set with the expert-specified teeth and train the tooth segmentation model iteratively. Finally, we conduct case study and expert study to demonstrate the effectiveness and usability of our ViSTooth, in aiding experts to implement accurate tooth segmentation guided by expert knowledge.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.07977 [pdf, other]

A Demographic-Conditioned Variational Autoencoder for fMRI Distribution Sampling and Removal of Confounds

Authors: Anton Orlichenko, Gang Qu, Ziyu Zhou, Anqi Liu, Hong-Wen Deng, Zhengming Ding, Julia M. Stephen, Tony W. Wilson, Vince D. Calhoun, Yu-Ping Wang

Abstract: Objective: fMRI and derived measures such as functional connectivity (FC) have been used to predict brain age, general fluid intelligence, psychiatric disease status, and preclinical neurodegenerative disease. However, it is not always clear that all demographic confounds, such as age, sex, and race, have been removed from fMRI data. Additionally, many fMRI datasets are restricted to authorized researchers, making dissemination of these valuable data sources challenging. Methods: We create a variational autoencoder (VAE)-based model, DemoVAE, to decorrelate fMRI features from demographics and generate high-quality synthetic fMRI data based on user-supplied demographics. We train and validate our model using two large, widely used datasets, the Philadelphia Neurodevelopmental Cohort (PNC) and Bipolar and Schizophrenia Network for Intermediate Phenotypes (BSNIP). Results: We find that DemoVAE recapitulates group differences in fMRI data while capturing the full breadth of individual variations. Significantly, we also find that most clinical and computerized battery fields that are correlated with fMRI data are not correlated with DemoVAE latents. An exception are several fields related to schizophrenia medication and symptom severity. Conclusion: Our model generates fMRI data that captures the full distribution of FC better than traditional VAE or GAN models. We also find that most prediction using fMRI data is dependent on correlation with, and prediction of, demographics.
Significance: Our DemoVAE model allows for generation of high quality synthetic data conditioned on subject demographics as well as the removal of the confounding effects of demographics. We identify that FC-based prediction tasks are highly influenced by demographic confounds.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 12 pages

arXiv:2405.07741 [pdf, other]

Search for the radiative transition $χ_{c1}(3872)\toγψ_2(3823)$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko , et al. (635 additional authors not shown)

Abstract: Using 9.0 $\rm fb^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies from 4.178 to 4.278 GeV with the BESIII detector at the BEPCII collider, we perform the first search for the radiative transition $χ_{c1}(3872)\toγψ_2(3823)$. No $χ_{c1}(3872)\toγψ_2(3823)$ signal is observed. The upper limit on the ratio of branching fractions $\mathcal{B}(χ_{c1}(3872)\toγψ_2(3823), ψ_2(3823)\toγχ_{c1})/\mathcal{B}(χ_{c1}(3872)\toπ^+π^- J/ψ)$ is set as 0.075 at the 90% confidence level. Our result contradicts theoretical predictions under the assumption that the $χ_{c1}(3872)$ is the pure charmonium state $χ_{c1}(2P)$.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 8 pages, 2 figures

arXiv:2405.07303 [pdf, other]

Search for solar axions by Primakoff effect with the full dataset of the CDEX-1B Experiment

Authors: L. T. Yang, S. K. Liu, Q. Yue, K. J. Kang, Y. J. Li, H. P. An, Greeshma C., J. P. Chang, Y. H. Chen, J. P. Cheng, W. H. Dai, Z. Deng, C. H. Fang, X. P. Geng, H. Gong, Q. J. Guo, T. Guo, X. Y. Guo, L. He, J. R. He, J. W. Hu, H. X. Huang, T. C. Huang, L. Jiang, S. Karmakar , et al. (61 additional authors not shown)

Abstract: We present the first limit on $g_{Aγ}$ coupling constant using the Bragg-Primakoff conversion based on an exposure of 1107.5 kg days of data from the CDEX-1B experiment at the China Jinping Underground Laboratory. The data are consistent with the null signal hypothesis, and no excess signals are observed. Limits of the coupling $g_{Aγ}<2.08\times10^{-9}$ GeV$^{-1}$ (95% C.L.) are derived for axions with mass up to 100 eV/$c^2$. Within the hadronic model of KSVZ, our results exclude axion mass $>5.3~\rm{eV}/c^2$ at 95% C.L.

Submitted 12 May, 2024; originally announced May 2024.

Comments: 7 pages, 5 figures

arXiv:2405.07152 [pdf, other]

On the energy budget of starquake-induced repeating fast radio bursts

Authors: Wei-Yang Wang, Chen Zhang, Enping Zhou, Xiaohui Liu, Jiarui Niu, Zixuan Zhou, He Gao, Jifeng Liu, Renxin Xu, Bing Zhang

Abstract: With a growing sample of fast radio bursts (FRBs), we investigate the energy budget of different power sources within the framework of magnetar starquake triggering mechanism. During a starquake, the energy can be released in any form through magnetic, strain, rotational, and gravitational energies. Following findings are revealed: 1. The crust can store a free magnetic energy of the amount of at least $6.3\times10^{46}$ erg via toroidal fields, with frequent starquakes happening due to the instability of the crust. 2. The strain energy develops as a rigid object spins down, which can be released during a global starquake accompanied by a glitch. However, it takes a long time to accumulate enough strain energy via spin-down. 3. The rotational energy of a magnetar with $P\lesssim0.1\rm\,s$ can match the energy and luminosity budget of FRBs. 4. The budget of the total gravitational energy is high, but the mechanism and efficiency of converting this energy to radiation deserve further exploration.

Submitted 11 May, 2024; originally announced May 2024.

Comments: 10 pages, 2 figures. Submitted. Some intriguing FAST's results are expected!

arXiv:2405.07027 [pdf, other]

TD-NeRF: Novel Truncated Depth Prior for Joint Camera Pose and Neural Radiance Field Optimization

Authors: Zhen Tan, Zongtan Zhou, Yangbing Ge, Zi Wang, Xieyuanli Chen, Dewen Hu

Abstract: The reliance on accurate camera poses is a significant barrier to the widespread deployment of Neural Radiance Fields (NeRF) models for 3D reconstruction and SLAM tasks. The existing method introduces monocular depth priors to jointly optimize the camera poses and NeRF, which fails to fully exploit the depth priors and neglects the impact of their inherent noise. In this paper, we propose Truncated Depth NeRF (TD-NeRF), a novel approach that enables training NeRF from unknown camera poses - by jointly optimizing learnable parameters of the radiance field and camera poses. Our approach explicitly utilizes monocular depth priors through three key advancements: 1) we propose a novel depth-based ray sampling strategy based on the truncated normal distribution, which improves the convergence speed and accuracy of pose estimation; 2) to circumvent local minima and refine depth geometry, we introduce a coarse-to-fine training strategy that progressively improves the depth precision; 3) we propose a more robust inter-frame point constraint that enhances robustness against depth noise during training. The experimental results on three datasets demonstrate that TD-NeRF achieves superior performance in the joint optimization of camera pose and NeRF, surpassing prior works, and generates more accurate depth geometry. The implementation of our method has been released at https://github.com/nubot-nudt/TD-NeRF.

Submitted 11 May, 2024; originally announced May 2024.
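
Reader's note on the TD-NeRF entry above: the abstract mentions depth-based ray sampling from a truncated normal distribution. The sketch below illustrates only the general idea of drawing sample depths along one ray from a normal distribution centred on a depth prior and truncated to the ray's near/far bounds. It is not the authors' implementation; the function name, the parameters (`depth_prior`, `sigma`, `near`, `far`) and the inverse-CDF sampling scheme are all assumptions made for illustration.

```python
import random
from statistics import NormalDist


def sample_depths_truncated_normal(depth_prior, sigma, near, far, n_samples, seed=0):
    """Draw sorted sample depths in [near, far] from N(depth_prior, sigma^2).

    Hypothetical helper: uses inverse-CDF sampling restricted to the
    probability mass that lies between the near and far bounds.
    """
    rng = random.Random(seed)
    dist = NormalDist(mu=depth_prior, sigma=sigma)
    cdf_near, cdf_far = dist.cdf(near), dist.cdf(far)
    eps = 1e-12  # keeps the probability strictly inside (0, 1) for inv_cdf
    depths = []
    for _ in range(n_samples):
        u = rng.uniform(cdf_near, cdf_far)
        u = min(max(u, eps), 1.0 - eps)
        d = dist.inv_cdf(u)
        depths.append(min(max(d, near), far))  # guard against rounding drift
    return sorted(depths)


# Example: 64 samples concentrated around a prior depth of 2.0 scene units.
depths = sample_depths_truncated_normal(2.0, 0.2, near=0.5, far=6.0, n_samples=64)
```

Compared with uniform sampling between `near` and `far`, this concentrates samples near the prior depth while never leaving the valid range; how the width `sigma` would be chosen or scheduled is left open here.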

arXiv:2405.06871 [pdf, other]

Statistical Error of Numerical Integrators for Underdamped Langevin Dynamics with Deterministic And Stochastic Gradients

Authors: Xuda Ye, Zhennan Zhou

Abstract: We propose a novel discrete Poisson equation approach to estimate the statistical error of a broad class of numerical integrators for the underdamped Langevin dynamics. The statistical error refers to the mean square error of the estimator to the exact ensemble average with a finite number of iterations. With the proposed error analysis framework, we show that when the potential function $U(x)$ is strongly convex in $\mathbb R^d$ and the numerical integrator has strong order $p$, the statistical error is $O(h^{2p}+\frac1{Nh})$, where $h$ is the time step and $N$ is the number of iterations. Besides, this approach can be adopted to analyze integrators with stochastic gradients, and quantitative estimates can be derived as well. Our approach only requires the geometric ergodicity of the continuous-time underdamped Langevin dynamics, and relaxes the constraint on the time step.

Submitted 10 May, 2024; originally announced May 2024.

MSC Class: 60H35; 37M05
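
Reader's note on the error bound quoted in the entry above: as an illustration only (not a result stated in the abstract), suppose the hidden constants in $O(h^{2p}+\frac1{Nh})$ are both taken to be 1. Under that assumption the two terms can be balanced by choosing the time step $h$ as a function of the iteration count $N$:

$$
f(h) = h^{2p} + \frac{1}{Nh}, \qquad
f'(h) = 2p\,h^{2p-1} - \frac{1}{Nh^{2}} = 0
\;\Longrightarrow\;
h^{*} = (2pN)^{-\frac{1}{2p+1}}, \qquad
f(h^{*}) = O\!\left(N^{-\frac{2p}{2p+1}}\right).
$$

For example, $p=1$ gives $h^{*}\sim N^{-1/3}$ and an error of order $N^{-2/3}$, while $p=2$ gives $h^{*}\sim N^{-1/5}$ and an error of order $N^{-4/5}$.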

arXiv:2405.06393 [pdf, other]

Measurement of the ${e}^{+}{e}^{-}\to p \bar{p}π^{0}$ cross section at $\sqrt{s}=2.1000-3.0800$ GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (639 additional authors not shown)

Abstract: The process $e^{+}e^{-}\to p\bar{p}π^{0}$ is studied at 20 center-of-mass energies ranging from 2.1000 to 3.0800 GeV using 636.8 pb$^{-1}$ of data collected with the BESIII detector operating at the BEPCII collider. The Born cross sections for $e^{+}e^{-}\to p\bar{p}π^{0}$ are measured with high precision. Since the lowest center-of-mass energy, 2.1000 GeV, is less than 90 MeV above the $p\bar{p}π^0$ energy threshold, we can probe the threshold behavior for this reaction. However, no anomalous threshold enhancement is found in the cross sections for $e^{+}e^{-}\to p\bar{p}π^{0}$.

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2405.05633 [pdf, other]

HarmonyBatch: Batching multi-SLO DNN Inference with Heterogeneous Serverless Functions

Authors: Jiabin Chen, Fei Xu, Yikun Gu, Li Chen, Fangming Liu, Zhi Zhou

Abstract: Deep Neural Network (DNN) inference on serverless functions is gaining prominence due to its potential for substantial budget savings. Existing works on serverless DNN inference solely optimize batching requests from one application with a single Service Level Objective (SLO) on CPU functions. However, production serverless DNN inference traces indicate that the request arrival rate of applications is surprisingly low, which inevitably causes a long batching time and SLO violations. Hence, there is an urgent need for batching multiple DNN inference requests with diverse SLOs (i.e., multi-SLO DNN inference) in serverless platforms. Moreover, the potential performance and cost benefits of deploying heterogeneous (i.e., CPU and GPU) functions for DNN inference have received scant attention. In this paper, we present HarmonyBatch, a cost-efficient resource provisioning framework designed to achieve predictable performance for multi-SLO DNN inference with heterogeneous serverless functions. Specifically, we construct an analytical performance and cost model of DNN inference on both CPU and GPU functions, by explicitly considering the GPU time-slicing scheduling mechanism and request arrival rate distribution. Based on such a model, we devise a two-stage merging strategy in HarmonyBatch to judiciously batch the multi-SLO DNN inference requests into application groups. It aims to minimize the budget of function provisioning for each application group while guaranteeing diverse performance SLOs of inference applications.
We have implemented a prototype of HarmonyBatch on Alibaba Cloud Function Compute. Extensive prototype experiments with representative DNN inference workloads demonstrate that HarmonyBatch can provide predictable performance to serverless DNN inference workloads while reducing the monetary cost by up to 82.9% compared to the state-of-the-art methods.

Submitted 9 May, 2024; originally announced May 2024.

Comments: 10 pages, 14 figures, accepted by IWQOS24

arXiv:2405.04289 [pdf, ps, other]

Direct Training High-Performance Deep Spiking Neural Networks: A Review of Theories and Methods

Authors: Chenlin Zhou, Han Zhang, Liutao Yu, Yumin Ye, Zhaokun Zhou, Liwei Huang, Zhengyu Ma, Xiaopeng Fan, Huihui Zhou, Yonghong Tian

Abstract: Spiking neural networks (SNNs) offer a promising energy-efficient alternative to artificial neural networks (ANNs), in virtue of their high biological plausibility, rich spatial-temporal dynamics, and event-driven computation. The direct training algorithms based on the surrogate gradient method provide sufficient flexibility to design novel SNN architectures and explore the spatial-temporal dynamics of SNNs. According to previous studies, the performance of models is highly dependent on their sizes. Recently, direct training deep SNNs have achieved great progress on both neuromorphic datasets and large-scale static datasets. Notably, transformer-based SNNs show comparable performance with their ANN counterparts. In this paper, we provide a new perspective to summarize the theories and methods for training deep SNNs with high performance in a systematic and comprehensive way, including theory fundamentals, spiking neuron models, advanced SNN models and residual architectures, software frameworks and neuromorphic hardware, applications, and future trends. The reviewed papers are collected at https://github.com/zhouchenlin2096/Awesome-Spiking-Neural-Networks

Submitted 6 May, 2024; originally announced May 2024.

Comments: 29 pages

arXiv:2405.04135 [pdf, other]

In-context Learning for Automated Driving Scenarios

Authors: Ziqi Zhou, Jingyue Zhang, Jingyuan Zhang, Boyue Wang, Tianyu Shi, Alaa Khamis

Abstract: One of the key challenges in current Reinforcement Learning (RL)-based Automated Driving (AD) agents is achieving flexible, precise, and human-like behavior cost-effectively. This paper introduces an innovative approach utilizing Large Language Models (LLMs) to intuitively and effectively optimize RL reward functions in a human-centric way. We developed a framework where instructions and dynamic environment descriptions are input into the LLM. The LLM then utilizes this information to assist in generating rewards, thereby steering the behavior of RL agents towards patterns that more closely resemble human driving. The experimental results demonstrate that this approach not only makes RL agents more anthropomorphic but also reaches better performance. Additionally, various strategies for reward-proxy and reward-shaping are investigated, revealing the significant impact of prompt design on shaping an AD vehicle's behavior. These findings offer a promising direction for the development of more advanced and human-like automated driving systems. Our experimental data and source code can be found here.

Submitted 7 May, 2024; originally announced May 2024.

Comments: 7 pages, 6 figures, 35 references

arXiv:2405.03857 [pdf, other]

The MOST Hosts Survey: spectroscopic observation of the host galaxies of ~40,000 transients using DESI

Authors: Maayane T. Soumagnac, Peter Nugent, Robert A. Knop, Anna Y. Q. Ho, William Hohensee, Autumn Awbrey, Alexis Andersen, Greg Aldering, Matan Ventura, Jessica N. Aguilar, Steven Ahlen, Segev Y. Benzvi, David Brooks, Dillon Brout, Todd Claybaugh, Tamara M. Davis, Kyle Dawson, Axel de la Macorra, Arjun Dey, Biprateep Dey, Peter Doel, Kelly A. Douglass, Jaime E. Forero-Romero, Enrique Gaztanaga, Satya Gontcho A Gontcho, et al. (32 additional authors not shown)

Abstract: We present the MOST Hosts survey (Multi-Object Spectroscopy of Transient Hosts). The survey is planned to run throughout the five years of operation of the Dark Energy Spectroscopic Instrument (DESI) and will generate a spectroscopic catalog of the hosts of most transients observed to date, in particular all the supernovae observed by most public, untargeted, wide-field, optical surveys (PTF/iPTF, SDSS II, ZTF, DECAT, DESIRT). Scientific questions for which the MOST Hosts survey will be useful include Type Ia supernova cosmology, fundamental plane and peculiar velocity measurements, and the understanding of the correlations between transients and their host galaxy properties. Here, we present the first release of the MOST Hosts survey: 21,931 hosts of 20,235 transients. These numbers represent 36% of the final MOST Hosts sample, consisting of 60,212 potential host galaxies of 38,603 transients (a transient can be assigned multiple potential hosts). Of these galaxies, 40% do not appear in the DESI primary target list and therefore require a specific program like MOST Hosts. Of all the transients in the MOST Hosts list, only 26.7% have existing classifications, and so the survey will provide redshifts (and luminosities) for nearly 30,000 transients. A preliminary Hubble diagram and a transient luminosity-duration diagram are shown as examples of future potential uses of the MOST Hosts survey.
The survey will also provide a training sample of spectroscopically observed transients for photometry-only classifiers, as we enter an era when most newly observed transients will lack spectroscopic classification. The MOST Hosts DESI survey data will be released through the Wiserep platform on a rolling cadence and updated to match the DESI releases. Dates of future releases and updates are available through the https://mosthosts.desi.lbl.gov website.

Submitted 6 May, 2024; originally announced May 2024.

Comments: Submitted to ApJS
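
The percentages quoted in the MOST Hosts abstract above can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated in the abstract; the variable names are illustrative, and reading "36%" as the released hosts over the final host sample is an assumption.

```python
# Figures quoted in the MOST Hosts abstract.
released_hosts = 21_931       # hosts in the first release
final_hosts = 60_212          # potential hosts in the final sample
final_transients = 38_603     # transients in the final sample
classified_fraction = 0.267   # transients with an existing classification

# Assumption: the quoted 36% is the released share of the final host sample.
host_share = released_hosts / final_hosts

# Transients that still lack a classification.
unclassified = final_transients * (1 - classified_fraction)

print(f"released share of final host sample: {host_share:.1%}")
print(f"transients without a classification: {unclassified:,.0f}")
```

Under that reading, the released share comes out close to the stated 36%, and the number of unclassified transients is a little over 28,000, consistent with the abstract's "nearly 30,000".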

arXiv:2405.03488 [pdf, other]

Accurate and Fast Approximate Graph Pattern Mining at Scale

Authors: Anna Arpaci-Dusseau, Zixiang Zhou, Xuhao Chen

Abstract: Approximate graph pattern mining (A-GPM) is an important data analysis tool for many graph-based applications. There exist sampling-based A-GPM systems to provide automation and generalization over a wide variety of use cases. However, there are two major obstacles that prevent existing A-GPM systems being adopted in practice. First, the termination mechanism that decides when to end sampling lacks theoretical backup on confidence, and is unstable and slow in practice. Second, they suffer poor performance when dealing with the "needle-in-the-hay" cases, because a huge number of samples are required to converge, given the extremely low hit rate of their fixed sampling schemes. We build ScaleGPM, an accurate and fast A-GPM system that removes the two obstacles. First, we propose a novel on-the-fly convergence detection mechanism to achieve stable termination and provide theoretical guarantee on the confidence, with negligible overhead. Second, we propose two techniques to deal with the "needle-in-the-hay" problem, eager-verify and hybrid sampling. Our eager-verify method improves sampling hit rate by pruning unpromising candidates as early as possible. Hybrid sampling improves performance by automatically choosing the better scheme between fine-grained and coarse-grained sampling schemes. Experiments show that our online convergence detection mechanism can detect convergence and results in stable and rapid termination with theoretically guaranteed confidence.
We show the effectiveness of eager-verify in improving the hit rate, and the scheme-selection mechanism in correctly choosing the better scheme for various cases. Overall, ScaleGPM achieves a geomean average of 565x (up to 610169x) speedup over the state-of-the-art A-GPM system, Arya. In particular, ScaleGPM handles billion-scale graphs in seconds, where existing systems either run out of memory or fail to complete in hours.

Submitted 6 May, 2024; originally announced May 2024.

Comments: 15 pages, 12 figures
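
The ScaleGPM abstract above reports its headline speedup as a geometric mean, which is the usual choice when per-workload ratios span several orders of magnitude. The sketch below illustrates why; the speedup values are hypothetical and are not taken from the paper.

```python
from statistics import geometric_mean

# Hypothetical per-workload speedups (NOT results from the paper),
# chosen so that one outlier is far larger than the rest.
speedups = [2.0, 20.0, 200.0, 200_000.0]

# The arithmetic mean is dominated by the single largest ratio.
arith = sum(speedups) / len(speedups)

# The geometric mean averages the ratios on a log scale.
geo = geometric_mean(speedups)

print(f"arithmetic mean: {arith:.1f}x")
print(f"geometric mean:  {geo:.1f}x")
```

With these made-up inputs the arithmetic mean is about 50,000x while the geometric mean is 200x, which is why a geomean figure and an "up to" figure can differ as widely as they do in the abstract.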

arXiv:2405.03217 [pdf, other]

PCG: Mitigating Conflict-based Cache Side-channel Attacks with Prefetching

Authors: Fang Jiang, Fei Tong, Hongyu Wang, Xiaoyu Cheng, Zhe Zhou, Ming Ling, Yuxing Mao

Abstract: To defend against conflict-based cache side-channel attacks, cache partitioning or remapping techniques were proposed to prevent set conflicts between different security domains or obfuscate the locations of such conflicts. But such techniques complicate cache design and may result in significant performance penalties. Therefore, there have been lightweight prefetching-based schemes proposed to introduce noise to confuse attackers' observation. However, we have validated experimentally that relying on prefetching to only introduce noise is insufficient, as attackers can still reliably distinguish the victim's cache accesses. This paper proposes a novel prefetching-based scheme, called PCG. It combines adding victim-irrelevant cache occupancy changes and reducing victim-relevant cache occupancy changes to disrupt attackers by generating noisy and indistinguishable cache access patterns. Additionally, PCG can either work independently or seamlessly be integrated with most of the commonly used prefetchers. We have implemented and evaluated PCG in both gem5 and the open-source RISC-V core BOOMv3. The evaluation results show the PCG's robust security superior to the existing solutions, while without resulting in significant performance degradation. According to the evaluation based on the SPEC CPU 2017 benchmark suite, PCG even shows an average performance improvement of about 1.64%. Moreover, it incurs only 1.26% overhead on hardware resource consumption.

Submitted 6 May, 2024; originally announced May 2024.

Comments: 12 pages, 9 figures, submitting to a journal

arXiv:2405.03117 [pdf, other]

Galaxies with Biconical Ionized Structure in MaNGA - I. Sample Selection and Driven Mechanisms

Authors: Zhi-Jie Zhou, Yan-Mei Chen, Run-Quan Guan, Yong Shi, Qiu-Sheng Gu, Dmitry Bizyaev

Abstract: Based on the integral field unit (IFU) data from Mapping Nearby Galaxies at Apache Point Observatory (MaNGA) survey, we develop a new method to select galaxies with biconical ionized structures, building a sample of 142 edge-on biconical ionized galaxies. We classify these 142 galaxies into 81 star-forming galaxies, 31 composite galaxies, and 30 AGNs (consisting of 23 Seyferts and 7 LI(N)ERs) according to the [N II]-BPT diagram. The star-forming bicones have bar-like structures while AGN bicones display hourglass structures, and composite bicones exhibit transitional morphologies between them due to both black hole and star-formation activities. Star-forming bicones have intense star-formation activities in their central regions, and the primary driver of biconical structures is the central star formation rate surface density. The lack of difference in the strength of central black hole activities (traced by dust attenuation corrected [O III] $λ$5007 luminosity and Eddington ratio) between Seyfert bicones and their control samples can be naturally explained as that the accretion disk and the galactic disk are not necessarily coplanar. Additionally, the biconical galaxies with central LI(N)ER-like line ratios are edge-on disk galaxies that show strong central dust attenuation. The radial gradients of H$α$ surface brightness follow the $r^{-2.35}$ relation, roughly consistent with $r^{-2}$ profile, which is expected in the case of photoionization by a central point-like source.
These observations indicate obscured AGNs or AGN echoes as the primary drivers of biconical structures in LI(N)ERs.

Submitted 5 May, 2024; originally announced May 2024.

Comments: 12 pages, 9 figures, 1 table, Accepted for publication in MNRAS

arXiv:2405.01927 [pdf, other]

SlotGAT: Slot-based Message Passing for Heterogeneous Graph Neural Network

Authors: Ziang Zhou, Jieming Shi, Renchi Yang, Yuanhang Zou, Qing Li

Abstract: Heterogeneous graphs are ubiquitous to model complex data. There are urgent needs on powerful heterogeneous graph neural networks to effectively support important applications. We identify a potential semantic mixing issue in existing message passing processes, where the representations of the neighbors of a node $v$ are forced to be transformed to the feature space of $v$ for aggregation, though the neighbors are in different types. That is, the semantics in different node types are entangled together into node $v$'s representation. To address the issue, we propose SlotGAT with separate message passing processes in slots, one for each node type, to maintain the representations in their own node-type feature spaces. Moreover, in a slot-based message passing layer, we design an attention mechanism for effective slot-wise message aggregation. Further, we develop a slot attention technique after the last layer of SlotGAT, to learn the importance of different slots in downstream tasks. Our analysis indicates that the slots in SlotGAT can preserve different semantics in various feature spaces. The superiority of SlotGAT is evaluated against 13 baselines on 6 datasets for node classification and link prediction. Our code is at https://github.com/scottjiao/SlotGAT_ICML23/.

Submitted 3 May, 2024; originally announced May 2024.

Comments: Published as a conference paper at ICML 2023

arXiv:2405.01851 [pdf, other]

Deep Learning Inference on Heterogeneous Mobile Processors: Potentials and Pitfalls

Authors: Sicong Liu, Wentao Zhou, Zimu Zhou, Bin Guo, Minfan Wang, Cheng Fang, Zheng Lin, Zhiwen Yu

Abstract: There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications. Equipped with a variety of processing units such as CPUs, GPUs, and NPUs, the mobile devices hold potential to accelerate DL inference via parallel execution across heterogeneous processors. Various efficient parallel methods have been explored to optimize computation distribution, achieve load balance, and minimize communication cost across processors. Yet their practical effectiveness in the dynamic and diverse real-world mobile environment is less explored. This paper presents a holistic empirical study to assess the capabilities and challenges associated with parallel DL inference on heterogeneous mobile processors. Through carefully designed experiments covering various DL models, mobile software/hardware environments, workload patterns, and resource availability, we identify limitations of existing techniques and highlight opportunities for cross-level optimization.

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2405.00718 [pdf, other]

Can't say cant? Measuring and Reasoning of Dark Jargons in Large Language Models

Authors: Xu Ji, Jianyi Zhang, Ziyin Zhou, Zhangchi Zhao, Qianqian Qiao, Kaiying Han, Md Imran Hossen, Xiali Hei

Abstract: Ensuring the resilience of Large Language Models (LLMs) against malicious exploitation is paramount, with recent focus on mitigating offensive responses. Yet, the understanding of cant or dark jargon remains unexplored. This paper introduces a domain-specific Cant dataset and CantCounter evaluation framework, employing Fine-Tuning, Co-Tuning, Data-Diffusion, and Data-Analysis stages. Experiments reveal LLMs, including ChatGPT, are susceptible to cant bypassing filters, with varying recognition accuracy influenced by question types, setups, and prompt clues. Updated models exhibit higher acceptance rates for cant queries. Moreover, LLM reactions differ across domains, e.g., reluctance to engage in racism versus LGBT topics. These findings underscore LLMs' understanding of cant and reflect training data characteristics and vendor approaches to sensitive topics. Additionally, we assess LLMs' ability to demonstrate reasoning capabilities. Access to our datasets and code is available at https://github.com/cistineup/CantCounter.

Submitted 25 April, 2024; originally announced May 2024.

arXiv:2405.00244 [pdf, other]

Towards Real-World HDR Video Reconstruction: A Large-Scale Benchmark Dataset and A Two-Stage Alignment Network

Authors: Yong Shu, Liquan Shen, Xiangyu Hu, Mengyao Li, Zihao Zhou

Abstract: As an important and practical way to obtain high dynamic range (HDR) video, HDR video reconstruction from sequences with alternating exposures is still less explored, mainly due to the lack of large-scale real-world datasets. Existing methods are mostly trained on synthetic datasets, which perform poorly in real scenes. In this work, to facilitate the development of real-world HDR video reconstruction, we present Real-HDRV, a large-scale real-world benchmark dataset for HDR video reconstruction, featuring various scenes, diverse motion patterns, and high-quality labels. Specifically, our dataset contains 500 LDRs-HDRs video pairs, comprising about 28,000 LDR frames and 4,000 HDR labels, covering daytime, nighttime, indoor, and outdoor scenes. To our best knowledge, our dataset is the largest real-world HDR video reconstruction dataset. Correspondingly, we propose an end-to-end network for HDR video reconstruction, where a novel two-stage strategy is designed to perform alignment sequentially. Specifically, the first stage performs global alignment with the adaptively estimated global offsets, reducing the difficulty of subsequent alignment. The second stage implicitly performs local alignment in a coarse-to-fine manner at the feature level using the adaptive separable convolution. Extensive experiments demonstrate that: (1) models trained on our dataset can achieve better performance on real scenes than those trained on synthetic datasets; (2) our method outperforms previous state-of-the-art methods.
Our dataset is available at https://github.com/yungsyu99/Real-HDRV.

Submitted 30 April, 2024; originally announced May 2024.

Comments: This paper has been accepted by CVPR 2024

arXiv:2404.19525 [pdf, other]

MicroDreamer: Zero-shot 3D Generation in $\sim$20 Seconds by Score-based Iterative Reconstruction

Authors: Luxi Chen, Zhengyi Wang, Zihan Zhou, Tingting Gao, Hang Su, Jun Zhu, Chongxuan Li

Abstract: Optimization-based approaches, such as score distillation sampling (SDS), show promise in zero-shot 3D generation but suffer from low efficiency, primarily due to the high number of function evaluations (NFEs) required for each sample. In this paper, we introduce score-based iterative reconstruction (SIR), an efficient and general algorithm mimicking a differentiable 3D reconstruction process to reduce the NFEs. Given a single set of images sampled from a multi-view score-based diffusion model, SIR repeatedly optimizes 3D parameters, unlike the single-step optimization in SDS. With other improvements in training, we present an efficient approach called MicroDreamer that generally applies to various 3D representations and 3D generation tasks. In particular, retaining a comparable performance, MicroDreamer is 5-20 times faster than SDS in generating neural radiance field and takes about 20 seconds to generate meshes from 3D Gaussian splatting on a single A100 GPU, halving the time of the fastest zero-shot baseline, DreamGaussian. Our code is available at https://github.com/ML-GSAI/MicroDreamer.

Submitted 27 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.19188 [pdf, other]

Maximum bound principle and original energy dissipation of arbitrarily high-order rescaled exponential time differencing Runge-Kutta schemes for Allen--Cahn equations

Authors: Chaoyu Quan, Xiaoming Wang, Pinzhong Zheng, Zhi Zhou

Abstract: The energy dissipation law and the maximum bound principle are two critical physical properties of the Allen--Cahn equations. While many existing time-stepping methods are known to preserve the energy dissipation law, most apply to a modified form of energy. In this work, we demonstrate that, when the nonlinear term of the Allen--Cahn equation is Lipschitz continuous, a class of arbitrarily high-order exponential time differencing Runge--Kutta (ETDRK) schemes preserve the original energy dissipation property, under a mild step-size constraint. Additionally, we guarantee the Lipschitz condition on the nonlinear term by applying a rescaling post-processing technique, which ensures that the numerical solution unconditionally satisfies the maximum bound principle. Consequently, our proposed schemes maintain both the original energy dissipation law and the maximum bound principle and can achieve arbitrarily high-order accuracy. We also establish an optimal error estimate for the proposed schemes. Some numerical experiments are carried out to verify our theoretical results.

Submitted 29 April, 2024; originally announced April 2024.
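
For readers unfamiliar with the two properties named in the Allen--Cahn abstract above, they can be stated as follows in a standard textbook form. The notation ($ε$, $F$, $f$, $β$, $Ω$) is assumed here for illustration and may differ from the paper's; periodic or homogeneous Neumann boundary conditions are also assumed.

```latex
% Allen--Cahn equation with interface parameter \varepsilon and f = -F'
\partial_t u = \varepsilon^2 \Delta u + f(u), \qquad f(u) = -F'(u).

% Original (unmodified) energy and its dissipation law
E(u) = \int_\Omega \Big( \tfrac{\varepsilon^2}{2} |\nabla u|^2 + F(u) \Big)\, dx,
\qquad
\frac{d}{dt} E(u(t)) = -\int_\Omega |\partial_t u|^2 \, dx \le 0.

% Maximum bound principle: a bound on the initial data is preserved
\|u_0\|_{L^\infty} \le \beta \;\Longrightarrow\; \|u(t)\|_{L^\infty} \le \beta
\quad \text{for all } t > 0.
```

For the common double-well potential $F(u) = \tfrac14 (u^2 - 1)^2$, the bound is $β = 1$. A scheme that dissipates only a modified energy satisfies the inequality for a perturbed functional rather than for $E$ itself, which is the distinction the abstract draws.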

arXiv:2404.18463 [pdf, other]

Efficient bound preserving and asymptotic preserving semi-implicit schemes for the fast reaction-diffusion system

Authors: Yu Zhao, Zhennan Zhou

Abstract: We consider a special type of fast reaction-diffusion systems in which the coefficients of the reaction terms of the two substances are much larger than those of the diffusion terms while the diffusive motion to the substrate is negligible. Specifically speaking, the rate constants of the reaction terms are $O(1/ε)$ while the diffusion coefficients are $O(1)$ where the parameter $ε$ is small. When the rate constants of the reaction terms become highly large, i.e. $ε$ tends to 0, the singular limit behavior of such a fast reaction-diffusion system is inscribed by the Stefan problem with latent heat, which brings great challenges in numerical simulations. In this paper, we adopt a semi-implicit scheme, which is first-order accurate in time and can accurately approximate the interface propagation even when the reaction becomes extremely fast, that is to say, the parameter $ε$ is sufficiently small. The scheme satisfies the positivity, bound preserving properties and has $L^2$ stability and the linearized stability results of the system. For better performance on numerical simulations, we then construct a semi-implicit Runge-Kutta scheme which is second-order accurate in time. Numerous numerical tests are carried out to demonstrate the properties, such as the order of accuracy, positivity and bound preserving, the capturing of the sharp interface with various $ε$ and to simulate the dynamics of the substances and the substrate, and to explore the heat transfer process, such as solid melting or liquid solidification in two dimensions.

Submitted 29 April, 2024; originally announced April 2024.

MSC Class: 35Q92; 65M06; 65M12; 92E20

arXiv:2404.18033 [pdf, other]

Exposing Text-Image Inconsistency Using Diffusion Models

Authors: Mingzhen Huang, Shan Jia, Zhou Zhou, Yan Ju, Jialing Cai, Siwei Lyu

Abstract: In the battle against widespread online misinformation, a growing problem is text-image inconsistency, where images are misleadingly paired with texts with different intent or meaning. Existing classification-based methods for text-image inconsistency can identify contextual inconsistencies but fail to provide explainable justifications for their decisions that humans can understand. Although more nuanced, human evaluation is impractical at scale and susceptible to errors. To address these limitations, this study introduces D-TIIL (Diffusion-based Text-Image Inconsistency Localization), which employs text-to-image diffusion models to localize semantic inconsistencies in text and image pairs. These models, trained on large-scale datasets, act as "omniscient" agents that filter out irrelevant information and incorporate background knowledge to identify inconsistencies. In addition, D-TIIL uses text embeddings and modified image regions to visualize these inconsistencies. To evaluate D-TIIL's efficacy, we introduce a new TIIL dataset containing 14K consistent and inconsistent text-image pairs. Unlike existing datasets, TIIL enables assessment at the level of individual words and image regions and is carefully designed to represent various inconsistencies. D-TIIL offers a scalable and evidence-based approach to identifying and localizing text-image inconsistency, providing a robust framework for future research combating misinformation.

Submitted 27 April, 2024; originally announced April 2024.

arXiv:2404.17133 [pdf, other]

Converging TDDFT calculations in 5 iterations with minimal auxiliary preconditioning

Authors: Zehao Zhou, Shane M. Parker

Abstract: Eigenvalue problems and linear systems of equations involving large symmetric matrices are commonly solved in quantum chemistry using Krylov space methods, such as the Davidson algorithm. The preconditioner is a key component of Krylov space methods that accelerates convergence by improving the quality of new guesses at each iteration. We systematically design a new preconditioner for time-dependent density functional theory (TDDFT) calculations based on the recently introduced TDDFT-ris semiempirical model by re-tuning the empirical scaling factor and the angular momenta of a minimal auxiliary basis. The final preconditioner produced includes up to $d$-functions in the auxiliary basis and is named "rid". The rid preconditioner converges excitation energies and polarizabilities in 5-6 iterations on average, a factor of 2-3 faster than the conventional diagonal preconditioner, without changing the converged results. Thus, the rid preconditioner is a broadly applicable and efficient preconditioner for TDDFT calculations.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 13 pages, 6 figures

arXiv:2404.16947 [pdf, other]

Fuzzing MLIR by Synthesizing Custom Mutations

Authors: Ben Limpanukorn, Jiyuan Wang, Hong Jin Kang, Eric Zitong Zhou, Miryung Kim

Abstract: Multi-Level Intermediate Representation (MLIR) is an effort to enable faster compiler development by providing an extensible framework for downstream developers to define custom IRs with MLIR dialects. MLIR dialects define new IRs that are tailored for specific domains. The diversity and rapid evolution of these IRs make it impractical to pre-define custom generator logic for every available dialect. We design a new approach called SynthFuzz that automatically infers and applies custom mutations from existing tests. Inferred custom mutations are parameterized and context-dependent such that they can be concretized depending on the target context. By doing this, we obviate the need to manually write custom mutations for newly introduced MLIR dialects. Further, SynthFuzz increases the chance of finding effective edit locations and reduces the chance of inserting invalid edit content by performing k-ancestor-prefix and l-sibling-postfix matching. We compare SynthFuzz to three baselines: Grammarinator -- a grammar-based fuzzer without custom mutators, MLIRSmith -- a custom test generator for MLIR, and NeuRI -- a custom test generator with support for parameterized generation. We conduct this comparison on 4 different MLIR projects where each project defines a new set of MLIR dialects that would take months of effort to manually write custom input generation and mutation logic. We show that SynthFuzz on average improves input diversity by 1.51$\times$, which increases branch coverage by 1.16$\times$.
Further, we show that our context dependent custom mutation increases the proportion of valid tests by up to 1.11$\times$, indicating that SynthFuzz correctly concretizes its parameterized mutations with respect to the target context. Mutation parameterization reduces the fraction of tests violating general MLIR constraints by 0.57$\times$, increasing the time spent fuzzing dialect-specific code.

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.16687 [pdf, other]

NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, et al. (89 additional authors not shown)

Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Content (AIGC). The challenge is divided into the image track and the video track. The image track uses the AIGIQA-20K, which contains 20,000 AI-Generated Images (AIGIs) generated by 15 popular generative models. The image track has a total of 318 registered participants. A total of 1,646 submissions are received in the development phase, and 221 submissions are received in the test phase. Finally, 16 participating teams submitted their models and fact sheets. The video track uses the T2VQA-DB, which contains 10,000 AI-Generated Videos (AIGVs) generated by 9 popular Text-to-Video (T2V) models. A total of 196 participants have registered in the video track. A total of 991 submissions are received in the development phase, and 185 submissions are received in the test phase. Finally, 12 participating teams submitted their models and fact sheets. Some methods have achieved better results than baseline methods, and the winning methods in both tracks have demonstrated superior prediction performance on AIGC.

Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.16458 [pdf, ps, other]

On an infinite commuting ODE system associated to a simple Lie algebra

Authors: Di Yang, Cheng Zhang, Zejun Zhou

Abstract: Inspired by a recent work of Dubrovin [7], for each simple Lie algebra $\mathfrak{g}$, we introduce an infinite family of pairwise commuting ODEs and define their $τ$-functions. We show that these $τ$-functions can be identified with the $τ$-functions for the Drinfeld--Sokolov hierarchy of $\mathfrak{g}$-type. Explicit examples for $\mathfrak{g}=A_1$ and $A_2$ are provided, which are connected to the KdV hierarchy and the Boussinesq hierarchy respectively.

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.16353 [pdf, other]

Rigorous derivation of a Hele-Shaw type model and its non-symmetric traveling wave solution

Authors: Yu Feng, Qingyou He, Jian-Guo Liu, Zhennan Zhou

Abstract: In this paper, we consider a Hele-Shaw model that describes tumor growth subject to nutrient supply. This model was recently studied in \cite{feng2022tumor} via asymptotic analysis. Our contributions are twofold: Firstly, we provide a rigorous derivation of this Hele-Shaw model by taking the incompressible limit of the porous medium reaction-diffusion equation, which solidifies the mathematical foundations of the model. Secondly, from a bifurcation theory perspective, we prove the existence of non-symmetric traveling wave solutions to the model, which reflect the intrinsic boundary instability in tumor growth dynamics.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 23 pages, 2 figures

MSC Class: 35R35; 76D27; 92C10; 70K50

arXiv:2404.16311 [pdf, other]

Bourgeois' contact manifolds are tight

Authors: Russell Avdek, Zhengyi Zhou

Abstract: We prove that Bourgeois' contact structures on $M \times \mathbb{T}^{2}$ determined by the supporting open books of a contact manifold $(M, ξ)$ are always tight. The proof is based on a contact homology computation leveraging holomorphic foliations and Kuranishi structures.

Submitted 24 April, 2024; originally announced April 2024.

Comments: 33 pages. Comments welcome

arXiv:2404.14941 [pdf, other]

Delayed Bottlenecking: Alleviating Forgetting in Pre-trained Graph Neural Networks

Authors: Zhe Zhao, Pengkun Wang, Xu Wang, Haibin Wen, Xiaolong Xie, Zhengyang Zhou, Qingfu Zhang, Yang Wang

Abstract: Pre-training GNNs to extract transferable knowledge and apply it to downstream tasks has become the de facto standard of graph representation learning. Recent works focused on designing self-supervised pre-training tasks to extract useful and universal transferable knowledge from large-scale unlabeled data. However, they have to face an inevitable question: traditional pre-training strategies that aim at extracting useful information about pre-training tasks, may not extract all useful information about the downstream task. In this paper, we reexamine the pre-training process within traditional pre-training and fine-tuning frameworks from the perspective of Information Bottleneck (IB) and confirm that the forgetting phenomenon in pre-training phase may cause detrimental effects on downstream tasks. Therefore, we propose a novel Delayed Bottlenecking Pre-training (DBP) framework which maintains as much as possible mutual information between latent representations and training data during pre-training phase by suppressing the compression operation and delays the compression operation to fine-tuning phase to make sure the compression can be guided with labeled fine-tuning data and downstream tasks. To achieve this, we design two information control objectives that can be directly optimized and further integrate them into the actual model design. Extensive experiments on both chemistry and biology domains demonstrate the effectiveness of DBP.

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.14918 [pdf, ps, other]

Existence of weak solutions for a class of non-divergent parabolic equations with variable exponent

Authors: Jingfeng Shao, Zhichang Guo, Zhongxiang Zhou

Abstract: A doubly degenerate parabolic equation in non-divergent form with variable growth is investigated in this paper. In suitable spaces, we prove the existence of weak solutions of the equation for cases $1\leq m < 2$ and $m\geq 2$ in different ways. And we establish the non-expansion of support of the solution for the problem.

Submitted 23 April, 2024; originally announced April 2024.

MSC Class: 35D30; 35K59

arXiv:2404.14850 [pdf, other]

Simple, Efficient and Scalable Structure-aware Adapter Boosts Protein Language Models

Authors: Yang Tan, Mingchen Li, Bingxin Zhou, Bozitao Zhong, Lirong Zheng, Pan Tan, Ziyi Zhou, Huiqun Yu, Guisheng Fan, Liang Hong

Abstract: Fine-tuning Pre-trained protein language models (PLMs) has emerged as a prominent strategy for enhancing downstream prediction tasks, often outperforming traditional supervised learning approaches. As a widely applied powerful technique in natural language processing, employing Parameter-Efficient Fine-Tuning techniques could potentially enhance the performance of PLMs. However, the direct transfer to life science tasks is non-trivial due to the different training strategies and data forms. To address this gap, we introduce SES-Adapter, a simple, efficient, and scalable adapter method for enhancing the representation learning of PLMs. SES-Adapter incorporates PLM embeddings with structural sequence embeddings to create structure-aware representations. We show that the proposed method is compatible with different PLM architectures and across diverse tasks. Extensive evaluations are conducted on 2 types of folding structures with notable quality differences, 9 state-of-the-art baselines, and 9 benchmark datasets across distinct downstream tasks. Results show that compared to vanilla PLMs, SES-Adapter improves downstream task performance by a maximum of 11% and an average of 3%, with significantly accelerated training speed by a maximum of 1034% and an average of 362%, the convergence rate is also improved by approximately 2 times. Moreover, positive optimization is observed even with low-quality predicted structures. The source code for SES-Adapter is available at https://github.com/tyang816/SES-Adapter.

Submitted 23 April, 2024; originally announced April 2024.

Comments: 30 pages, 4 figures, 8 tables

arXiv:2404.14641 [pdf, other]

Bring the Heat: Tidal Heating Constraints for Black Holes and Exotic Compact Objects from the LIGO-Virgo-KAGRA Data

Authors: Horng Sheng Chia, Zihan Zhou, Mikhail M. Ivanov

Abstract: We present the first constraints on tidal heating for the binary systems detected in the LIGO-Virgo-KAGRA (LVK) gravitational wave data. Tidal heating, also known as tidal dissipation, characterizes the viscous nature of an astrophysical body and provides a channel for exchanging energy and angular momentum with the tidal environment. Using the worldline effective field theory formalism, we introduce a physically motivated and easily interpretable parametrization of tidal heating valid for an arbitrary compact astrophysical object. We then derive the imprints of the spin-independent and linear-in-spin tidal heating effects of generic binary components on the waveform phases and amplitudes of quasi-circular orbits. Notably, the mass-weighted spin-independent tidal heating coefficient derived in this work, $\mathcal{H}_0$, is the dissipative analog of the tidal Love number. We constrain the tidal heating coefficients using the public LVK O1-O3 data. Our parameter estimation study includes two separate analyses: the first treats the catalog of binary events as binary black holes (BBH), while the second makes no assumption about the nature of the binary constituents and can therefore be interpreted as constraints for exotic compact objects. In the former case, we combine the posterior distributions of the individual BBH events and obtain a joint constraint of $-13 < \mathcal{H}_0 < 20$ at the $90\%$ credible interval for the BBH population. This translates into a bound on the fraction of the emitted gravitational wave energy lost due to tidal heating (or gained due to radiation enhancement effects) at $|ΔE_H/ΔE_{\infty}|\lesssim 3\cdot 10^{-3}$. Our work provides the first robust framework for deriving and measuring tidal heating effects in merging binary systems, demonstrating its potential as a powerful probe of the nature of binary constituents and tests of new physics.

Submitted 22 April, 2024; originally announced April 2024.

Comments: 30+18 pages, 7 figures

Report number: MIT-CTP/5710

arXiv:2404.14294 [pdf, other]

A Survey on Efficient Inference for Large Language Models

Authors: Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang

Abstract: Large Language Models (LLMs) have attracted extensive attention due to their remarkable performance across various tasks. However, the substantial computational and memory requirements of LLM inference pose challenges for deployment in resource-constrained scenarios. Efforts within the field have been directed towards developing techniques aimed at enhancing the efficiency of LLM inference. This paper presents a comprehensive survey of the existing literature on efficient LLM inference. We start by analyzing the primary causes of the inefficient LLM inference, i.e., the large model size, the quadratic-complexity attention operation, and the auto-regressive decoding approach. Then, we introduce a comprehensive taxonomy that organizes the current literature into data-level, model-level, and system-level optimization. Moreover, the paper includes comparative experiments on representative methods within critical sub-fields to provide quantitative insights. Last but not least, we provide some knowledge summary and discuss future research directions.

Submitted 8 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.14073 [pdf, other]

Towards Robust Trajectory Representations: Isolating Environmental Confounders with Causal Learning

Authors: Kang Luo, Yuanshao Zhu, Wei Chen, Kun Wang, Zhengyang Zhou, Sijie Ruan, Yuxuan Liang

Abstract: Trajectory modeling refers to characterizing human movement behavior, serving as a pivotal step in understanding mobility patterns. Nevertheless, existing studies typically ignore the confounding effects of geospatial context, leading to the acquisition of spurious correlations and limited generalization capabilities. To bridge this gap, we initially formulate a Structural Causal Model (SCM) to decipher the trajectory representation learning process from a causal perspective. Building upon the SCM, we further present a Trajectory modeling framework (TrajCL) based on Causal Learning, which leverages the backdoor adjustment theory as an intervention tool to eliminate the spurious correlations between geospatial context and trajectories. Extensive experiments on two real-world datasets verify that TrajCL markedly enhances performance in trajectory classification tasks while showcasing superior generalization and interpretability.

Submitted 22 April, 2024; originally announced April 2024.

Comments: The paper has been accepted by IJCAI 2024

arXiv:2404.13840 [pdf, other]

Study of $e^+e^-\toωX(3872)$ and $γX(3872)$ from 4.66 to 4.95 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (634 additional authors not shown)

Abstract: Using data samples with an integrated luminosity of $4.5~\text{fb}^{-1}$ collected by the BESIII detector at center-of-mass energies ranging from 4.66 to 4.95 GeV, we study the processes of $e^+e^-\toωX(3872)$ and $e^+e^-\toγX(3872)$. With the $e^+e^-\toωX(3872)$ process, the branching fraction ratio $R\equiv\frac{\mathcal{B}(X(3872)\toγJ/ψ)}{\mathcal{B}(X(3872)\toπ^+π^- J/ψ)}$ is measured to be $0.38\pm0.20_\text{stat.}\pm0.01_\text{syst.}$ ($R< 0.83$ at 90\% confidence level). In addition, we measure the ratio of the average cross section of $e^+e^-\toωX(3872)$ to $e^+e^-\toωχ_{c1}(ωχ_{c2})$ to be $σ_{ωX(3872)}/σ_{ωχ_{c1}}~(σ_{ωX(3872)}/σ_{ωχ_{c2}})=5.2\pm1.0_\text{stat.}\pm1.9_\text{syst.}~ (5.5\pm1.1_\text{stat.}\pm2.4_\text{syst.})$. Finally, we search for the process of $e^+e^-\toγX(3872)$, and no obvious signal is observed. The upper limit on the ratio of the average cross section of $e^+e^-\toγX(3872)$ to $e^+e^-\toωX(3872)$ is set as $σ_{γX(3872)}/σ_{ωX(3872)}<0.23$ at 90\% confidence level.

Submitted 21 April, 2024; originally announced April 2024.

Comments: 19 pages, 10 figures

arXiv:2404.13733 [pdf, other]

Elucidating the Design Space of Dataset Condensation

Authors: Shitong Shao, Zikai Zhou, Huanran Chen, Zhiqiang Shen

Abstract: Dataset condensation, a concept within data-centric learning, efficiently transfers critical attributes from an original dataset to a synthetic version, maintaining both diversity and realism. This approach significantly improves model training efficiency and is adaptable across multiple application areas. Previous methods in dataset condensation have faced challenges: some incur high computational costs which limit scalability to larger datasets (e.g., MTT, DREAM, and TESLA), while others are restricted to less optimal design spaces, which could hinder potential improvements, especially in smaller datasets (e.g., SRe2L, G-VBSM, and RDED). To address these limitations, we propose a comprehensive design framework that includes specific, effective strategies like implementing soft category-aware matching and adjusting the learning rate schedule. These strategies are grounded in empirical evidence and theoretical backing. Our resulting approach, Elucidate Dataset Condensation (EDC), establishes a benchmark for both small and large-scale dataset condensation. In our testing, EDC achieves state-of-the-art accuracy, reaching 48.6% on ImageNet-1k with a ResNet-18 model at an IPC of 10, which corresponds to a compression ratio of 0.78%. This performance exceeds those of SRe2L, G-VBSM, and RDED by margins of 27.3%, 17.2%, and 6.6%, respectively.

Submitted 6 May, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.13703 [pdf, ps, other]

Classical solutions of a mean field system for pulse-coupled oscillators: long time asymptotics versus blowup

Authors: José Antonio Carrillo, Xu'an Dou, Pierre Roux, Zhennan Zhou

Abstract: We introduce a novel reformulation of the mean-field system for pulse-coupled oscillators. It is based on writing a closed equation for the inverse distribution function associated to the probability density of oscillators with a given phase in a suitable time scale. This new framework allows us to show a hidden contraction/expansion of certain distances leading to a full clarification of the long-time behavior, existence of steady states, rates of convergence, and finite time blow-up of classical solutions for a large class of monotone phase response functions. In the process, we get insights about the origin of obstructions to global-in-time existence and uniform in time estimates on the firing rate of the oscillators.

Submitted 21 April, 2024; originally announced April 2024.

MSC Class: 35Q92; 35B40; 35B44; 34C15

arXiv:2404.13534 [pdf, other]

Motion-aware Latent Diffusion Models for Video Frame Interpolation

Authors: Zhilin Huang, Yijie Yu, Ling Yang, Chujun Qin, Bing Zheng, Xiawu Zheng, Zikun Zhou, Yaowei Wang, Wenming Yang

Abstract: With the advancement of AIGC, video frame interpolation (VFI) has become a crucial component in existing video generation frameworks, attracting widespread research interest. For the VFI task, the motion estimation between neighboring frames plays a crucial role in avoiding motion ambiguity. However, existing VFI methods always struggle to accurately predict the motion information between consecutive frames, and this imprecise estimation leads to blurred and visually incoherent interpolated frames. In this paper, we propose a novel diffusion framework, motion-aware latent diffusion models (MADiff), which is specifically designed for the VFI task. By incorporating motion priors between the conditional neighboring frames with the target interpolated frame predicted throughout the diffusion sampling procedure, MADiff progressively refines the intermediate outcomes, culminating in generating both visually smooth and realistic results. Extensive experiments conducted on benchmark datasets demonstrate that our method achieves state-of-the-art performance significantly outperforming existing approaches, especially under challenging scenarios involving dynamic textures with complex motion.

Submitted 4 June, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

Comments: 17 pages, 4 figures. arXiv admin note: substantial text overlap with arXiv:2303.09508 by other authors

arXiv:2404.12605 [pdf, other]

GluMarker: A Novel Predictive Modeling of Glycemic Control Through Digital Biomarkers

Authors: Ziyi Zhou, Ming Cheng, Xingjian Diao, Yanjun Cui, Xiangling Li

Abstract: The escalating prevalence of diabetes globally underscores the need for diabetes management. Recent research highlights the growing focus on digital biomarkers in diabetes management, with innovations in computational frameworks and noninvasive monitoring techniques using personalized glucose metrics. However, they predominantly focus on insulin dosing and specific glucose values, with limited attention given to overall glycemic control. This leaves a gap in expanding the scope of digital biomarkers for overall glycemic control in diabetes management. To address such a research gap, we propose GluMarker -- an end-to-end framework for modeling digital biomarkers using broader factor sources to predict glycemic control. Through the assessment and refinement of various machine learning baselines, GluMarker achieves state-of-the-art performance on Anderson's dataset in predicting next-day glycemic control. Moreover, our research identifies key digital biomarkers for the next day's glycemic control prediction. These identified biomarkers are instrumental in illuminating the daily factors that influence glycemic management, offering vital insights for diabetes care.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.12400 [pdf, other]

Efflex: Efficient and Flexible Pipeline for Spatio-Temporal Trajectory Graph Modeling and Representation Learning

Authors: Ming Cheng, Ziyi Zhou, Bowen Zhang, Ziyu Wang, Jiaqi Gan, Ziang Ren, Weiqi Feng, Yi Lyu, Hefan Zhang, Xingjian Diao

Abstract: In the landscape of spatio-temporal data analytics, effective trajectory representation learning is paramount. To bridge the gap of learning accurate representations with efficient and flexible mechanisms, we introduce Efflex, a comprehensive pipeline for transformative graph modeling and representation learning of the large-volume spatio-temporal trajectories. Efflex pioneers the incorporation of a multi-scale k-nearest neighbors (KNN) algorithm with feature fusion for graph construction, marking a leap in dimensionality reduction techniques by preserving essential data features. Moreover, the groundbreaking graph construction mechanism and the high-performance lightweight GCN increase embedding extraction speed by up to 36 times. We further offer Efflex in two versions, Efflex-L for scenarios demanding high accuracy, and Efflex-B for environments requiring swift data processing. Comprehensive experimentation with the Porto and Geolife datasets validates our approach, positioning Efflex as the state-of-the-art in the domain. Such enhancements in speed and accuracy highlight the versatility of Efflex, underscoring its wide-ranging potential for deployment in time-sensitive and computationally constrained applications.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.11924 [pdf, other]

Toward Short-Term Glucose Prediction Solely Based on CGM Time Series

Authors: Ming Cheng, Xingjian Diao, Ziyi Zhou, Yanjun Cui, Wenjun Liu, Shitong Cheng

Abstract: The global diabetes epidemic highlights the importance of maintaining good glycemic control. Glucose prediction is a fundamental aspect of diabetes management, facilitating real-time decision-making. Recent research has introduced models focusing on long-term glucose trend prediction, which are unsuitable for real-time decision-making and result in delayed responses. Conversely, models designed to respond to immediate glucose level changes cannot analyze glucose variability comprehensively. Moreover, contemporary research generally integrates various physiological parameters (e.g. insulin doses, food intake, etc.), which inevitably raises data privacy concerns. To bridge such a research gap, we propose TimeGlu -- an end-to-end pipeline for short-term glucose prediction solely based on CGM time series data. We implement four baseline methods to conduct a comprehensive comparative analysis of the model's performance. Through extensive experiments on two contrasting datasets (CGM Glucose and Colas dataset), TimeGlu achieves state-of-the-art performance without the need for additional personal data from patients, providing effective guidance for real-world diabetic glucose management.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.11357 [pdf, other]

Detector Collapse: Backdooring Object Detection to Catastrophic Overload or Blindness

Authors: Hangtao Zhang, Shengshan Hu, Yichen Wang, Leo Yu Zhang, Ziqi Zhou, Xianlong Wang, Yanjun Zhang, Chao Chen

Abstract: Object detection tasks, crucial in safety-critical systems like autonomous driving, focus on pinpointing object locations. These detectors are known to be susceptible to backdoor attacks. However, existing backdoor techniques have primarily been adapted from classification tasks, overlooking deeper vulnerabilities specific to object detection. This paper is dedicated to bridging this gap by introducing Detector Collapse (DC), a brand-new backdoor attack paradigm tailored for object detection. DC is designed to instantly incapacitate detectors (i.e., severely impairing the detector's performance and culminating in a denial-of-service). To this end, we develop two innovative attack schemes: Sponge for triggering widespread misidentifications and Blinding for rendering objects invisible. Remarkably, we introduce a novel poisoning strategy exploiting natural objects, enabling DC to act as a practical backdoor in real-world environments. Our experiments on different detectors across several benchmarks show a significant improvement ($\sim$10\%-60\% absolute and $\sim$2-7$\times$ relative) in attack efficacy over state-of-the-art attacks.

Submitted 17 April, 2024; originally announced April 2024.

Comments: Accepted by IJCAI-24

arXiv:2404.11355 [pdf, other]

Consisaug: A Consistency-based Augmentation for Polyp Detection in Endoscopy Image Analysis

Authors: Ziyu Zhou, Wenyuan Shen, Chang Liu

Abstract: Colorectal cancer (CRC), which frequently originates from initially benign polyps, remains a significant contributor to global cancer-related mortality. Early and accurate detection of these polyps via colonoscopy is crucial for CRC prevention. However, traditional colonoscopy methods depend heavily on the operator's experience, leading to suboptimal polyp detection rates. Besides, the public databases are limited in polyp size and shape diversity. To enhance the available data for polyp detection, we introduce Consisaug, an innovative and effective methodology to augment data that leverages deep learning. We utilize the constraint that when the image is flipped the class label should be equal and the bounding boxes should be consistent. We implement our Consisaug on five public polyp datasets and with three backbones, and the results show the effectiveness of our method.

Submitted 17 April, 2024; originally announced April 2024.

Comments: MLMI 2023

Showing 101–150 of 3,040 results for author: Zhou, Z