Search | arXiv e-print repository

Frequency-selective terahertz wave amplification by a time-boundary-engineered Huygens metasurface

Authors: Fu Deng, Fengjie Zhu, Xiaoyue Zhou, Yi Chan, **gbo Wu, Caihong Zhang, Biaobing **, Jensen Li, Kebin Fan, **gdi Zhang

Abstract: Ultrafast manipulation of optical resonance can establish the time-boundary effect in time-variant media, leading to a new degree of freedom for coherent control of electromagnetic waves. Here, we demonstrate that a free-standing all-dielectric Huygens metasurface of degenerate electric and magnetic resonances can support broadband near-unity transmission in its static state, whereas it enables wave amplification in the presence of a time boundary. The time boundary is realized by femtosecond laser excitations that transiently inject free carriers into the constituent meta-atoms for dynamic removal of a pre-established two-fold degeneracy. We observe that the transmittance in the photo-excited Huygens metasurface can exceed unity, i.e., THz wave amplification, by over 20% in intensity at frequencies tunable by varying the arrival of the time boundary with respect to that of the seed terahertz pulse. By numerical simulations and analysis with time-dependent coupled mode theory, we show that the wave amplification results from ultrafast Q-switching and a shift in resonant frequencies. This work demonstrates a new approach to achieve tunable amplification in an optical microcavity by exploiting the concept of time-variant media and the unique electromagnetic properties of Huygens metasurfaces.
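The static half of this claim can be illustrated with standard two-port temporal coupled-mode theory. The claim is that degenerate electric and magnetic resonances give near-unity transmission. The sketch below is not the authors' model. It assumes the following:

- a lossless, mirror-symmetric, free-standing slab with unit background transmission;
- one even mode standing in for the electric resonance and one odd mode for the magnetic one;
- hypothetical parameters `w_e`, `g_e`, `w_o`, `g_o` (resonant frequencies and radiative decay rates, arbitrary units).

It covers only the static degeneracy, not the time boundary.

```python
import numpy as np


def huygens_scattering(omega, w_e, g_e, w_o, g_o):
    """Reflection r and transmission t of a lossless, mirror-symmetric slab
    supporting one even and one odd mode (two-port temporal coupled-mode theory,
    unit background transmission, time convention exp(-i*omega*t)).

    Illustrative assumption: the even mode stands in for the electric
    resonance and the odd mode for the magnetic one.
    """
    omega = np.asarray(omega, dtype=float)
    x_e = g_e / (g_e + 1j * (w_e - omega))  # even-mode Lorentzian response
    x_o = g_o / (g_o + 1j * (w_o - omega))  # odd-mode Lorentzian response
    r = x_o - x_e          # opposite signs in reflection: cancel when degenerate
    t = 1.0 - x_e - x_o    # same sign in transmission
    return r, t


omega = np.linspace(0.5, 1.5, 201)

# Degenerate modes (Huygens condition): zero reflection, |t| = 1 at every frequency.
r_deg, t_deg = huygens_scattering(omega, w_e=1.0, g_e=0.1, w_o=1.0, g_o=0.1)

# Lifting the degeneracy (here by detuning the odd mode) reintroduces reflection.
r_split, t_split = huygens_scattering(omega, w_e=1.0, g_e=0.1, w_o=1.1, g_o=0.1)
```

Because this toy model is lossless, |r|² + |t|² = 1 for any parameters. With degenerate modes, t = -1 exactly at resonance, which gives the full transmission-phase swing associated with Huygens metasurfaces.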

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2406.18962 [pdf, other]

Multi-modal Food Recommendation using Clustering and Self-supervised Learning

Authors: Yixin Zhang, Xin Zhou, Qianwen Meng, Fanglin Zhu, Yonghui Xu, Zhiqi Shen, Lizhen Cui

Abstract: Food recommendation systems serve as pivotal components in the realm of digital lifestyle services, designed to assist users in discovering recipes and food items that resonate with their unique dietary predilections. Typically, multi-modal descriptions offer an exhaustive profile for each recipe, thereby ensuring recommendations that are both personalized and accurate. Our preliminary investigation of two datasets indicates that pre-trained multi-modal dense representations might precipitate a deterioration in performance compared to ID features when encapsulating interactive relationships. This observation implies that ID features possess a relative superiority in modeling interactive collaborative signals. Consequently, contemporary cutting-edge methodologies augment ID features with multi-modal information as supplementary features, overlooking the latent semantic relations between recipes. To rectify this, we present CLUSSL, a novel food recommendation framework that employs clustering and self-supervised learning. Specifically, CLUSSL formulates a modality-specific graph tailored to each modality with discrete/continuous features, thereby transforming semantic features into structural representation. Furthermore, CLUSSL procures recipe representations pertinent to different modalities via graph convolutional operations. A self-supervised learning objective is proposed to foster independence between recipe representations derived from different unimodal graphs.
Comprehensive experiments on real-world datasets substantiate that CLUSSL consistently surpasses state-of-the-art recommendation benchmarks in performance.
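One common way to turn continuous modality features into a graph and then propagate over it is sketched below: a k-nearest-neighbour graph built from cosine similarity, followed by a symmetrically normalised graph-convolution step. This is an assumption for illustration only. The abstract says CLUSSL builds its modality-specific graphs via clustering, which may differ from this kNN construction. The function names and the random "image features" are hypothetical.

```python
import numpy as np


def knn_graph(features, k):
    """Symmetric k-nearest-neighbour adjacency built from cosine similarity.

    features: (n_items, dim) continuous modality features with no all-zero rows;
    requires n_items > k.
    """
    x = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = x @ x.T
    np.fill_diagonal(sim, -np.inf)              # never pick an item as its own neighbour
    n = x.shape[0]
    nbrs = np.argsort(-sim, axis=1)[:, :k]      # indices of the k most similar items per row
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(adj, adj.T)               # make the graph undirected


def gcn_propagate(adj, h):
    """One parameter-free graph-convolution step: D^-1/2 (A + I) D^-1/2 h."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return (d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]) @ h


rng = np.random.default_rng(0)
image_features = rng.normal(size=(6, 4))        # hypothetical visual features for 6 recipes
adj = knn_graph(image_features, k=2)
recipe_repr = gcn_propagate(adj, image_features)
```

Running one such graph per modality yields one recipe representation per modality. An objective that encourages independence between those representations is what the abstract describes; it is not shown here.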

Submitted 27 June, 2024; originally announced June 2024.

Comments: Working paper

arXiv:2406.16633 [pdf, other]

MLAAN: Scaling Supervised Local Learning with Multilaminar Leap Augmented Auxiliary Network

Authors: Yuming Zhang, Shouxin Zhang, Peizhe Wang, Feiyu Zhu, Dongzhi Guan, Jiabin Liu, Changpeng Cai

Abstract: End-to-end (E2E) training approaches are commonly plagued by high memory consumption, reduced training efficiency, challenges in model parallelization, and suboptimal biocompatibility. Local learning is considered a novel interactive training method that holds promise as an alternative to E2E. Nonetheless, conventional local learning methods fall short in achieving high model accuracy due to inadequate local inter-module interactions. In this paper, we introduce a new model known as the Scaling Supervised Local Learning with Multilaminar Leap Augmented Auxiliary Network (MLAAN). MLAAN features an innovative supervised local learning approach coupled with a robust reinforcement module. This dual-component design enables MLAAN to integrate smoothly with established local learning techniques, thereby enhancing the efficacy of the foundational methods. On the one hand, the method captures both the local and global features of the model by constructing an independent auxiliary network and a cascade auxiliary network; on the other hand, it incorporates a leap augmented module, which serves to counteract the reduced learning capacity often associated with weaker supervision. This architecture not only augments the exchange of information amongst the local modules but also effectively mitigates the model's tendency toward myopia.
The experimental evaluations conducted on four benchmark datasets, CIFAR-10, STL-10, SVHN, and ImageNet, demonstrate that the integration of MLAAN with existing supervised local learning methods significantly enhances the original methodologies. Of particular note, MLAAN enables local learning methods to comprehensively outperform end-to-end training approaches in terms of optimal performance while saving GPU memory.
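The basic idea of supervised local learning is that each block is trained only through its own auxiliary head, with no gradient flowing back from later blocks. The minimal NumPy sketch below shows that idea. It is a generic baseline under stated assumptions:

- a toy regression target;
- linear auxiliary heads;
- a local MSE loss.

It is not MLAAN: the independent/cascade auxiliary networks and the leap augmented module are not modelled, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


class LocalBlock:
    """A ReLU layer trained only through its own linear auxiliary head."""

    def __init__(self, d_in, d_hidden, d_target, lr=0.05):
        self.w = rng.normal(scale=0.3, size=(d_in, d_hidden))
        self.head = rng.normal(scale=0.3, size=(d_hidden, d_target))
        self.lr = lr

    def local_step(self, x, y):
        """One gradient step on this block's local MSE loss.

        Returns the loss and the block output; the next block receives the
        output as a plain array, so no gradient flows back into this block.
        """
        z = x @ self.w
        h = np.maximum(z, 0.0)
        err = h @ self.head - y
        loss = np.mean(err ** 2)
        g_out = 2.0 * err / err.size                 # dLoss/dPrediction
        g_head = h.T @ g_out
        g_w = x.T @ ((g_out @ self.head.T) * (z > 0))
        self.head -= self.lr * g_head
        self.w -= self.lr * g_w
        return loss, h


x = rng.normal(size=(64, 5))
y = np.sin(x[:, :1])                                 # toy regression target
blocks = [LocalBlock(5, 16, 1), LocalBlock(16, 16, 1)]

history = []                                         # per-epoch local loss of each block
for epoch in range(500):
    h, losses = x, []
    for block in blocks:
        loss, h = block.local_step(h, y)
        losses.append(loss)
    history.append(losses)
```

Because each block only needs its own activations and head to update, the blocks could in principle be trained on different devices. That is the memory and parallelism argument the abstract makes for local learning.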

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.14540 [pdf, other]

IRASim: Learning Interactive Real-Robot Action Simulators

Authors: Fangqi Zhu, Hongtao Wu, Song Guo, Yuxiao Liu, Chilam Cheang, Tao Kong

Abstract: Scalable robot learning in the real world is limited by the cost and safety issues of real robots. In addition, rolling out robot trajectories in the real world can be time-consuming and labor-intensive. In this paper, we propose to learn an interactive real-robot action simulator as an alternative. We introduce a novel method, IRASim, which leverages the power of generative models to generate extremely realistic videos of a robot arm that executes a given action trajectory, starting from an initial given frame. To validate the effectiveness of our method, we create a new benchmark, IRASim Benchmark, based on three real-robot datasets and perform extensive experiments on the benchmark. Results show that IRASim outperforms all the baseline methods and is preferred in human evaluations. We hope that IRASim can serve as an effective and scalable approach to enhance robot learning in the real world. To promote research on generative real-robot action simulators, we open-source code, benchmark, and checkpoints at https://gen-irasim.github.io.

Submitted 20 June, 2024; originally announced June 2024.

Comments: Opensource, project website: https://gen-irasim.github.io

arXiv:2406.13483 [pdf, other]

doi 10.1016/j.ijsolstr.2024.112671

Voltage-controlled non-axisymmetric vibrations of soft electro-active tubes with strain-stiffening effect

Authors: F. Zhu, B. Wu, M. Destrade, H. Wang, R. Bao, W. Chen

Abstract: Material properties of soft electro-active (SEA) structures are significantly sensitive to external electro-mechanical biasing fields (such as pre-stretch and electric stimuli), which generate remarkable knock-on effects on their dynamic characteristics. In this work, we analyze the electrostatically tunable non-axisymmetric vibrations of an incompressible SEA cylindrical tube under the combination of a radially applied electric voltage and an axial pre-stretch. Following the theory of nonlinear electro-elasticity and the associated linearized theory for superimposed perturbations, we derive the nonlinear static response of the SEA tube to the inhomogeneous biasing fields for the Gent ideal dielectric model. Using the State Space Method, we efficiently obtain the frequency equations for voltage-controlled small-amplitude three-dimensional non-axisymmetric vibrations, covering a wide range of behaviors, from the purely radial breathing mode to torsional modes, axisymmetric longitudinal modes, and prismatic diffuse modes. We also perform an exhaustive numerical analysis to validate the proposed approach against the conventional displacement method, as well as to elucidate the influences of the applied voltage, axial pre-stretch, and strain-stiffening effect on the nonlinear static response and vibration behaviors of the SEA tube. The present study clearly indicates that manipulating electro-mechanical biasing fields is a feasible way to tune the small-amplitude vibration characteristics of an SEA tube.
The results should benefit experimental work on, and design of, voltage-controlled resonant devices made of SEA tubes.
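The strain-stiffening effect of the Gent model can be seen from its purely mechanical part, W = -(μJ_m/2) ln(1 - (I₁ - 3)/J_m). The sketch below compares it with the neo-Hookean limit under incompressible uniaxial stretch. This is a simplified illustration: the electrical contribution of the ideal-dielectric model and the tube geometry are deliberately left out, and the parameter values (μ = 1, J_m = 10) are hypothetical.

```python
import numpy as np


def gent_energy(i1, mu, jm):
    """Gent strain-energy density W = -(mu*Jm/2) * ln(1 - (I1 - 3)/Jm)."""
    arg = 1.0 - (np.asarray(i1, dtype=float) - 3.0) / jm
    if np.any(arg <= 0.0):
        raise ValueError("I1 - 3 must stay below Jm (limiting chain extensibility)")
    return -0.5 * mu * jm * np.log(arg)


def uniaxial_nominal_stress(lam, mu, jm=None):
    """dW/d(lambda) for incompressible uniaxial stretch, where I1 = lam^2 + 2/lam.

    jm=None returns the neo-Hookean limit mu*(lam - 1/lam^2).
    """
    lam = np.asarray(lam, dtype=float)
    neo_hookean = mu * (lam - 1.0 / lam ** 2)
    if jm is None:
        return neo_hookean
    arg = 1.0 - (lam ** 2 + 2.0 / lam - 3.0) / jm
    if np.any(arg <= 0.0):
        raise ValueError("stretch exceeds the Gent locking limit")
    return neo_hookean / arg                         # stiffening factor 1/arg >= 1


lam = np.linspace(1.0, 3.0, 21)
neo = uniaxial_nominal_stress(lam, mu=1.0)
gent = uniaxial_nominal_stress(lam, mu=1.0, jm=10.0)
```

At λ = 3 with J_m = 10, the Gent stress is three times the neo-Hookean value. As J_m grows, the two models coincide, which is why J_m is the usual knob for the strain-stiffening effect.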

Submitted 19 June, 2024; originally announced June 2024.

Journal ref: International Journal of Solids and Structures 290 (2024) 112671

arXiv:2406.13390 [pdf, other]

Stabilizing the Kerr arbitrary cat states and holonomic universal control

Authors: Ke-hui Yu, Fan Zhu, Jiao-jiao Xue, Hong-rong Li

Abstract: The interference-free double potential wells realized by the two-photon-driven Kerr nonlinear resonator (KNR) can stabilize cat states and protect them from decoherence through a large energy gap. In this work, we use a parametrically driven KNR to propose a novel engineered Hamiltonian that can stabilize arbitrary cat states and independently manipulate the superposed coherent states to move arbitrarily in phase space. This greater degree of control allows us to make the two potential wells collide and merge, generating a collision state with many novel properties. Furthermore, potential wells carrying quantum states that move adiabatically in phase space produce quantum holonomy. We explore the quantum holonomy of collision states for the first time and propose a holonomy-free preparation method for arbitrary cat states. Additionally, we develop a universal holonomic quantum computing protocol utilizing the quantum holonomy of coherent and collision states, including single-qubit rotation gates and multi-qubit control gates. Finally, we propose an experimentally feasible physical realization in superconducting circuits to achieve the Hamiltonian described above. Our proposal provides a platform with more control degrees of freedom, enabling more operations on bosonic modes and the exploration of intriguing physics.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 17 pages, 12 figures

arXiv:2406.13294 [pdf, other]

Enhancing Cross-Prompt Transferability in Vision-Language Models through Contextual Injection of Target Tokens

Authors: Xikang Yang, Xuehai Tang, Fuqing Zhu, Jizhong Han, Songlin Hu

Abstract: Vision-language models (VLMs) seamlessly integrate visual and textual data to perform tasks such as image classification, caption generation, and visual question answering. However, adversarial images often struggle to deceive all prompts effectively in the context of cross-prompt migration attacks, as the probability distribution of the tokens in these images tends to favor the semantics of the original image rather than the target tokens. To address this challenge, we propose a Contextual-Injection Attack (CIA) that employs gradient-based perturbation to inject target tokens into both visual and textual contexts, thereby improving the probability distribution of the target tokens. By shifting the contextual semantics towards the target tokens instead of the original image semantics, CIA enhances the cross-prompt transferability of adversarial images. Extensive experiments on the BLIP2, InstructBLIP, and LLaVA models show that CIA outperforms existing methods in cross-prompt transferability, demonstrating its potential for more effective adversarial strategies in VLMs.
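The generic building block behind "gradient-based perturbation" toward a target is projected sign-gradient ascent (PGD) on the target's log-probability within an L∞ budget. The sketch below shows it on a toy linear surrogate, p(target | x) = sigmoid(w·x). This is a stand-in assumption, not CIA: the real attack differentiates through a VLM and injects target tokens into both the visual and the textual context. The budget `eps`, the step size, the surrogate weights and the random image are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sign_gradient_attack(x0, w, eps=0.1, step=0.02, n_steps=20):
    """Projected sign-gradient ascent on log p(target), with the toy surrogate
    p(target | x) = sigmoid(w . x). The perturbation stays in an L-inf ball of
    radius eps around x0, and pixel values stay in [0, 1]."""
    x = x0.copy()
    for _ in range(n_steps):
        grad = (1.0 - sigmoid(w @ x)) * w           # d/dx log sigmoid(w . x)
        x = x + step * np.sign(grad)                # move toward the target
        x = np.clip(x, x0 - eps, x0 + eps)          # project onto the eps-ball
        x = np.clip(x, 0.0, 1.0)                    # keep a valid pixel range
    return x


rng = np.random.default_rng(0)
image = rng.uniform(0.2, 0.8, size=32)   # hypothetical flattened image
w = rng.normal(size=32)                  # hypothetical linear surrogate weights
adv = sign_gradient_attack(image, w)
```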

Submitted 19 June, 2024; originally announced June 2024.

Comments: 13 pages

arXiv:2406.11497 [pdf, other]

CrAM: Credibility-Aware Attention Modification in LLMs for Combating Misinformation in RAG

Authors: Boyi Deng, Wenjie Wang, Fengbin Zhu, Qifan Wang, Fuli Feng

Abstract: Retrieval-Augmented Generation (RAG) can alleviate hallucinations of Large Language Models (LLMs) by referencing external documents. However, misinformation in external documents may mislead LLMs' generation. To address this issue, we explore the task of "credibility-aware RAG", in which LLMs automatically adjust the influence of retrieved documents based on their credibility scores to counteract misinformation. To this end, we introduce a plug-and-play method named Credibility-aware Attention Modification (CrAM). CrAM identifies influential attention heads in LLMs and adjusts their attention weights based on the credibility of the documents, thereby reducing the impact of low-credibility documents. Experiments on Natural Questions and TriviaQA using Llama2-13B, Llama3-8B, and Qwen-7B show that CrAM improves the RAG performance of LLMs against misinformation pollution by over 20%, even surpassing supervised fine-tuning methods.
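As a rough illustration of "adjusting attention weights based on document credibility", the sketch below works in two steps for a chosen subset of heads:

1. It scales each key token's post-softmax attention by the credibility of the document that token belongs to.
2. It renormalises each row.

This is a hypothetical simplification, not CrAM's exact rule. How heads are selected and where the scaling is applied are assumptions here, and all names are illustrative.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def credibility_reweight(logits, doc_ids, credibility, heads):
    """Down-weight attention to low-credibility documents in selected heads.

    logits:      (n_heads, n_queries, n_keys) raw attention scores
    doc_ids:     (n_keys,) int array, index of the retrieved document owning each
                 key token, or -1 for tokens outside any document (e.g. the question)
    credibility: (n_docs,) scores in (0, 1]
    heads:       indices of the heads to modify; all other heads are unchanged
    """
    attn = softmax(logits)
    scale = np.ones(doc_ids.shape[0])
    in_doc = doc_ids >= 0
    scale[in_doc] = credibility[doc_ids[in_doc]]
    for h in heads:
        weighted = attn[h] * scale[None, :]
        attn[h] = weighted / weighted.sum(axis=-1, keepdims=True)   # renormalise rows
    return attn


logits = np.zeros((2, 1, 4))                    # 2 heads, 1 query, 4 key tokens
doc_ids = np.array([-1, 0, 0, 1])               # token 0: question; tokens 1-2: doc 0; token 3: doc 1
credibility = np.array([1.0, 0.1])              # doc 1 is judged low-credibility
attn = credibility_reweight(logits, doc_ids, credibility, heads=[0])
```

Modifying only a few heads, rather than every head, keeps the intervention plug-and-play: the model's weights are untouched and only selected attention maps change at inference time.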

Submitted 27 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: Under review

arXiv:2406.10361 [pdf, other]

On Efficient Neural Network Architectures for Image Compression

Authors: Yichi Zhang, Zhihao Duan, Fengqing Zhu

Abstract: Recent advances in learning-based image compression typically come at the cost of high complexity. Designing computationally efficient architectures remains an open challenge. In this paper, we empirically investigate the impact of different network designs in terms of rate-distortion performance and computational complexity. Our experiments involve testing various transforms, including convolutional neural networks and transformers, as well as various context models, including hierarchical, channel-wise, and space-channel context models. Based on the results, we present a series of efficient models, the final model of which has comparable performance to recent best-performing methods but with significantly lower complexity. Extensive experiments provide insights into the design of architectures for learned image compression and potential directions for future research. The code is available at https://gitlab.com/viper-purdue/efficient-compression.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 2024 IEEE International Conference on Image Processing (ICIP2024)

arXiv:2406.10246 [pdf, other]

Semantic-Enhanced Relational Metric Learning for Recommender Systems

Authors: Mingming Li, Fuqing Zhu, Feng Yuan, Songlin Hu

Abstract: Recently, relational metric learning methods have received great attention in the recommendation community, inspired by the translation mechanism in knowledge graphs. Unlike knowledge graphs, where entity-to-entity relations are given in advance, historical interactions in recommender systems lack explicit relations between users and items. Many researchers have succeeded in constructing implicit relations to remedy this issue. However, in previous work, the learning process of the induction function depends only on a single source of data (i.e., user-item interaction) in a supervised manner, resulting in co-occurrence relations that carry no semantic information. In this paper, to tackle this problem in recommender systems, we propose a joint Semantic-Enhanced Relational Metric Learning (SERML) framework that incorporates semantic information. Specifically, the semantic signal is first extracted from the target reviews, which contain abundant item features and personalized user preferences. A novel regression model is then designed that leverages the extracted semantic signal to improve the discriminative ability of the original relation-based training process. On four widely used public datasets, experimental results demonstrate that SERML achieves competitive performance compared with several state-of-the-art methods in recommender systems.
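The translation mechanism that relational metric learning borrows from knowledge-graph embedding scores a user-item pair by how close u + r lands to v. The sketch below shows that score together with a pairwise hinge loss. It is a generic sketch, not SERML. The relation vector `r` is simply passed in here, whereas in such methods it comes from a learned induction function, and SERML's review-based semantic regression is not modelled. The two-dimensional embeddings are illustrative.

```python
import numpy as np


def translation_distance(u, r, v):
    """Squared distance ||u + r - v||^2: small when the user embedding, translated
    by the relation vector, lands near the item embedding."""
    return np.sum((u + r - v) ** 2, axis=-1)


def margin_loss(u, r, v_pos, v_neg, margin=1.0):
    """Pairwise hinge loss: the observed item should be closer than a sampled
    negative item by at least `margin`."""
    return np.maximum(0.0, margin + translation_distance(u, r, v_pos)
                      - translation_distance(u, r, v_neg))


u = np.array([0.0, 0.0])
r = np.array([1.0, 0.0])        # relation vector, e.g. produced by an induction function
good, bad = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
```

Here `margin_loss(u, r, good, bad)` is zero because the observed item already sits exactly at u + r. Swapping the roles of the two items gives 1 + 4 - 0 = 5.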

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.08698 [pdf, other]

Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

Abstract: In this work we search for signals generated by ultra-heavy dark matter in Large High Altitude Air Shower Observatory (LHAASO) data. We look for possible gamma rays from dark matter annihilation or decay in 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter, as they have low fluxes of astrophysical $γ$-ray background but large amounts of dark matter. By analyzing more than 700 days of observational data from LHAASO, we detect no significant dark matter signal from 1 TeV to 1 EeV. Accordingly, we derive the most stringent constraints on the ultra-heavy dark matter annihilation cross-section up to EeV. Constraints on the lifetime of dark matter in the decay mode are also derived.
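For reference, the expected gamma-ray flux from a dark-matter-dominated target is conventionally written as below. This is the standard textbook form, not necessarily the paper's exact conventions. The symbols are:

- $m_\chi$: dark matter mass;
- $\langle\sigma v\rangle$: velocity-averaged annihilation cross-section;
- $\tau$: decay lifetime;
- $dN_\gamma/dE$: photon spectrum per annihilation or decay;
- $J$ and $D$: astrophysical factors, integrated over the solid angle $\Delta\Omega$ and the line of sight.

The factor $8\pi$ assumes self-conjugate dark matter. These expressions omit the absorption of very-high-energy photons in transit, which an analysis at PeV to EeV energies would need to account for.

```latex
\begin{aligned}
\frac{d\Phi_{\mathrm{ann}}}{dE} &= \frac{\langle\sigma v\rangle}{8\pi m_\chi^{2}}\,\frac{dN_\gamma}{dE}\,J,
  & J &= \int_{\Delta\Omega}\!\int_{\mathrm{l.o.s.}} \rho_\chi^{2}(l,\Omega)\,dl\,d\Omega,\\
\frac{d\Phi_{\mathrm{dec}}}{dE} &= \frac{1}{4\pi m_\chi \tau}\,\frac{dN_\gamma}{dE}\,D,
  & D &= \int_{\Delta\Omega}\!\int_{\mathrm{l.o.s.}} \rho_\chi(l,\Omega)\,dl\,d\Omega.
\end{aligned}
```

With no significant signal, an upper limit on the flux combined with $J$ gives an upper limit on $\langle\sigma v\rangle$. The same flux limit combined with $D$ gives a lower limit on $\tau$.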

Submitted 12 June, 2024; originally announced June 2024.

Comments: 17 pages, 12 figures, accepted by PRL

arXiv:2406.03838 [pdf, other]

On universal splittings of tree-level particle and string scattering amplitudes

Authors: Qu Cao, ** Dong, Song He, Canxin Shi, Fanky Zhu

Abstract: In this paper, we study the newly discovered universal splitting behavior for tree-level scattering amplitudes of particles and strings [Cao:2024gln]: when a set of Mandelstam variables (and Lorentz products involving polarizations for gluons/gravitons) vanish, the $n$-point amplitude factorizes as the product of two lower-point currents with $n+3$ external legs in total. We refer to any such subspace of the kinematic space of $n$ massless momenta as "2-split kinematics", where the scattering potential for string amplitudes and the corresponding scattering equations for particle amplitudes nicely split into two parts. Based on these, we provide a systematic and detailed study of the splitting behavior for essentially all ingredients which appear as integrands for open- and closed-string amplitudes as well as Cachazo-He-Yuan (CHY) formulas, including Parke-Taylor factors, correlators in superstring and bosonic string theories, and CHY integrands for a variety of amplitudes of scalars, gluons and gravitons. These results then immediately lead to the splitting behavior of string and particle amplitudes in a wide range of theories, including bi-adjoint $φ^3$ (with string extension known as $Z$ and $J$ integrals), non-linear sigma model, Dirac-Born-Infeld, the special Galileon, etc., as well as Yang-Mills and Einstein gravity (with bosonic and superstring extensions).
Our results imply and extend some other factorization behavior of tree amplitudes considered recently, including smooth splittings [Cachazo:2021wsz] and factorizations near zeros [Arkani-Hamed:2023swr], to all these theories. A special case of splitting also yields soft theorems for gluons/gravitons as well as analogous soft behavior for Goldstone particles near their Adler zeros.
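2-split kinematics is defined by setting particular sets of Mandelstam variables s_ij = (p_i + p_j)² to zero. The paper specifies which sets; the sketch below does not reproduce those conditions. It only shows how the s_ij are computed from massless, momentum-conserving momenta, using a simple 2 → 2 centre-of-mass configuration chosen for illustration. It also checks the identity Σ_{j≠i} s_ij = 0, which follows from masslessness and momentum conservation and is the constraint any such vanishing set must respect.

```python
import numpy as np

METRIC = np.diag([1.0, -1.0, -1.0, -1.0])   # mostly-minus Minkowski metric


def minkowski(p, q):
    return p @ METRIC @ q


def mandelstam_matrix(momenta):
    """s[i, j] = (p_i + p_j)^2, which equals 2 p_i.p_j when all momenta are massless."""
    n = len(momenta)
    s = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                pij = momenta[i] + momenta[j]
                s[i, j] = minkowski(pij, pij)
    return s


def two_to_two(energy, angle):
    """Massless 2 -> 2 kinematics in the centre-of-mass frame, all momenta outgoing."""
    e, c, sn = energy, np.cos(angle), np.sin(angle)
    p1 = -np.array([e, 0.0, 0.0, e])            # incoming momenta enter with a minus sign
    p2 = -np.array([e, 0.0, 0.0, -e])
    p3 = np.array([e, e * sn, 0.0, e * c])
    p4 = np.array([e, -e * sn, 0.0, -e * c])
    return [p1, p2, p3, p4]


momenta = two_to_two(energy=1.0, angle=0.7)
s = mandelstam_matrix(momenta)
```

For four particles, the row-sum identity reduces to the familiar s + t + u = 0.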

Submitted 6 June, 2024; originally announced June 2024.

Comments: 37 pages, 3 figures

arXiv:2406.03736 [pdf, other]

Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data

Authors: **gyang Ou, Shen Nie, Kaiwen Xue, Fengqi Zhu, Jiacheng Sun, Zhenguo Li, Chongxuan Li

Abstract: Discrete diffusion models with absorbing processes have shown promise in language modeling. The key quantities to be estimated are the ratios between the marginal probabilities of two transitive states at all timesteps, called the concrete score. In this paper, we reveal that the concrete score in absorbing diffusion can be expressed as conditional probabilities of clean data, multiplied by a time-dependent scalar in an analytic form. Motivated by this finding, we propose reparameterized absorbing discrete diffusion (RADD), a dedicated diffusion model that characterizes the time-independent conditional probabilities. Besides its simplicity, RADD can reduce the number of function evaluations (NFEs) by caching the output of the time-independent network when the noisy sample remains unchanged in a sampling interval. Empirically, RADD is up to 3.5 times faster while consistently achieving better performance than the strongest baseline. Built upon the new factorization of the concrete score, we further prove a surprising result that the exact likelihood of absorbing diffusion can be rewritten in a simple form (named denoising cross-entropy) and then estimated efficiently by the Monte Carlo method. The resulting approach also applies to the original parameterization of the concrete score. It significantly advances the state of the art in discrete diffusion on 5 zero-shot language modeling benchmarks (measured by perplexity) at the GPT-2 scale.
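The NFE saving from a time-independent network can be seen in a toy reverse sampler. When a sampling step reveals no token, the input is unchanged, so the network output would be identical and the call can be skipped. The sketch below is a minimal illustration under these assumptions:

- a linear schedule, in which a token is masked at time t with probability t;
- a hypothetical `MASK` id;
- a hypothetical stand-in `model` returning per-position token probabilities.

It is not the paper's sampler and uses no trained network.

```python
import numpy as np

MASK = -1  # hypothetical id for the absorbing [MASK] state


def sample_absorbing(model, length, n_steps, rng):
    """Reverse sampler for absorbing diffusion with a linear schedule
    (t = 1 means fully masked, t = 0 means clean).

    `model(x)` is a hypothetical time-independent network returning an array of
    shape (length, vocab) of clean-token probabilities. Because it ignores t,
    steps that reveal no token would reproduce the previous output, so they
    skip the call entirely. Returns the sample and the number of calls (NFE).
    """
    x = np.full(length, MASK)
    nfe = 0
    for step in range(n_steps, 0, -1):
        t, s = step / n_steps, (step - 1) / n_steps
        masked = np.flatnonzero(x == MASK)
        # A token still masked at time t stays masked at s < t with probability s/t.
        reveal = masked[rng.random(masked.size) >= s / t]
        if reveal.size == 0:
            continue                                # x unchanged: reuse, no evaluation
        probs = model(x)
        nfe += 1
        for i in reveal:
            x[i] = rng.choice(probs.shape[1], p=probs[i])
    return x, nfe


def uniform_model(x):
    """Stand-in network: uniform over a 5-token vocabulary at every position."""
    return np.full((len(x), 5), 0.2)


rng = np.random.default_rng(0)
sample, nfe = sample_absorbing(uniform_model, length=8, n_steps=100, rng=rng)
```

Every network call reveals at least one token, so NFE can never exceed the sequence length, however many sampling steps are used. With 8 tokens and 100 steps, most steps need no evaluation.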

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.03026 [pdf, other]

Dynamical topology of chiral and nonreciprocal state transfers in a non-Hermitian quantum system

Authors: Pengfei Lu, Yang Liu, Qifeng Lao, Teng Liu, Xinxin Rao, Ji Bian, Hao Wu, Feng Zhu, Le Luo

Abstract: The conventional understanding of topological phenomena rests on the geometric phase associated with eigenstates. In contrast to this prevailing notion, theoretical studies on time-varying Hamiltonians allow for a new type of topological phenomenon, known as topological dynamics, where the evolution process allows a hidden topological invariant associated with continuous flows. To validate this conjecture, we study topological chiral and nonreciprocal dynamics by encircling the exceptional points (EPs) of non-Hermitian Hamiltonians in a trapped ion system. These dynamics are topologically robust against external perturbations even in the presence of dissipation-induced nonadiabatic processes. Our findings indicate that they are protected by dynamical vorticity, an emerging topological invariant associated with the energy dispersion of non-Hermitian band structures in a parallel transported eigenbasis. The symmetry breaking and other key features of topological dynamics are directly observed through quantum state tomography. Our results mark a significant step towards exploring topological properties of open quantum systems.
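The topology behind EP encircling already shows up in the eigenvalues. If a parameter loop encloses an exceptional point and each eigenvalue is followed continuously, the two eigenvalues exchange. If the loop does not enclose the EP, each eigenvalue returns to itself. The sketch below demonstrates this quasi-static eigenvalue swap for a toy two-level non-Hermitian Hamiltonian chosen for illustration, H = [[δ + iγ, κ], [κ, -(δ + iγ)]], whose EPs sit at δ = 0, γ = ±κ. It does not model the trapped-ion system or the dynamical, nonadiabatic evolution that produces chirality.

```python
import numpy as np


def eigenvalues(delta, gamma, kappa=1.0):
    """Eigenvalues +/- sqrt((delta + i*gamma)^2 + kappa^2) of the toy dimer
    H = [[delta + i*gamma, kappa], [kappa, -(delta + i*gamma)]]."""
    z = delta + 1j * gamma
    return np.linalg.eigvals(np.array([[z, kappa], [kappa, -z]]))


def follow_loop(center, radius, kappa=1.0, n_points=2000):
    """Track one eigenvalue continuously around a circular loop in the
    (delta, gamma) plane; returns its start and end values."""
    thetas = np.linspace(0.0, 2.0 * np.pi, n_points)
    deltas = center[0] + radius * np.cos(thetas)
    gammas = center[1] + radius * np.sin(thetas)
    lam = start = eigenvalues(deltas[0], gammas[0], kappa)[0]
    for d, g in zip(deltas[1:], gammas[1:]):
        ev = eigenvalues(d, g, kappa)
        lam = ev[np.argmin(np.abs(ev - lam))]      # branch nearest the previous value
    return start, lam


# Loop enclosing the EP at (0, kappa): the two eigenvalues exchange.
start_in, end_in = follow_loop(center=(0.0, 1.0), radius=0.5)
# Loop that does not enclose an EP: each eigenvalue returns to itself.
start_out, end_out = follow_loop(center=(0.0, 3.0), radius=0.5)
```

The swap happens because (δ + iγ)² + κ² winds once around zero on the enclosing loop, so its square root changes sign.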

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2406.00659 [pdf, other]

High Performance Operation of a Direct-Current and Superconducting Radio-Frequency Combined Photocathode Gun

Authors: H. Jia, T. Li, T. Wang, Y. Zhao, X. Zhang, H. Xu, Z. Liu, J. Liu, L. Lin, H. Xie, L. Feng, F. Wang, F. Zhu, J. Hao, S. Quan, K. Liu, S. Huang

Abstract: Superconducting radio-frequency (SRF) guns are promising candidates to deliver high-brightness continuous-wave (CW) electron beams for new generations of coherent linac light sources, ultrafast electron diffraction, MeV pulsed beam applications, etc. To solve the compatibility problem of semiconductor photocathodes, a hybrid gun combining a direct-current gap and an SRF cavity has been developed. The gun, employing K2CsSb photocathodes driven by a green laser, has been brought into stable CW operation with a dark current below 100 pA, delivering electron beams at an energy gain of 2.4 MeV, an electron bunch charge of 100 pC, and a repetition rate of 1 MHz. A normalized beam emittance of 0.54 mm-mrad has been achieved at the bunch charge of 100 pC and peak current of about 6 A. CW operation at 81.25 MHz repetition rate has also been tested, with the maximum average beam current reaching 3 mA.
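The quoted beam figures are linked by a simple relation: average current equals bunch charge times repetition rate. For example, 100 pC at 1 MHz corresponds to 100 µA of average current. Conversely, 3 mA at 81.25 MHz implies roughly 37 pC per bunch. The abstract does not state the bunch charge for the 81.25 MHz run, so that value is only the implied average. The helper names below are hypothetical.

```python
def average_current(bunch_charge_c, rep_rate_hz):
    """Average beam current (A) = bunch charge (C) x repetition rate (Hz)."""
    return bunch_charge_c * rep_rate_hz


def implied_bunch_charge(avg_current_a, rep_rate_hz):
    """Bunch charge (C) implied by an average current at a given repetition rate."""
    return avg_current_a / rep_rate_hz


i_1mhz = average_current(100e-12, 1e6)              # 100 pC at 1 MHz
q_81mhz = implied_bunch_charge(3e-3, 81.25e6)       # 3 mA at 81.25 MHz
print(f"{i_1mhz * 1e6:.1f} uA, {q_81mhz * 1e12:.1f} pC")
```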

Submitted 2 June, 2024; originally announced June 2024.

Comments: 6 pages, 5 figures

arXiv:2406.00446 [pdf, other]

GLCAN: Global-Local Collaborative Auxiliary Network for Local Learning

Authors: Feiyu Zhu, Yuming Zhang, Changpeng Cai, Guinan Guo, Jiao Li, Xiuyuan Guo, Quanwei Zhang, Peizhe Wang, Chenghao He, Junhao Su

Abstract: Traditional deep neural networks typically use end-to-end backpropagation, which often places a heavy burden on GPU memory. Another promising training method is local learning, which involves splitting the network into blocks and training them in parallel with the help of an auxiliary network. Local learning has been widely studied and applied to image classification tasks, and its performance is comparable to that of end-to-end methods. However, different image tasks often rely on different feature representations, which typical auxiliary networks find difficult to adapt to. To solve this problem, we propose a construction method for the Global-Local Collaborative Auxiliary Network (GLCAN), which provides a macroscopic design approach for auxiliary networks. This is the first demonstration that local learning methods can be successfully applied to other tasks such as object detection and super-resolution. GLCAN not only saves substantial GPU memory but also performs comparably to an end-to-end approach on datasets for multiple different tasks.

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2405.20565

doi 10.1145/3589335.3648296

Knowledge Enhanced Multi-intent Transformer Network for Recommendation

Authors: Ding Zou, Wei Wei, Feida Zhu, Chuanyu Xu, Tao Zhang, Chengfu Huo

Abstract: Incorporating Knowledge Graphs (KGs) into recommendation has attracted growing attention in industry, due to the great potential of KGs to provide abundant supplementary information and interpretability for the underlying models. However, simply integrating a KG into recommendation often has a negative effect in industry, because the following two factors are ignored: i) users' multiple intents, which involve diverse nodes in the KG. For example, in e-commerce scenarios, users may exhibit preferences for specific styles, brands, or colors. ii) knowledge noise, which is a prevalent issue in Knowledge Enhanced Recommendation (KGR) and even more severe in industry scenarios. The irrelevant knowledge properties of items may result in inferior model performance compared to approaches that do not incorporate knowledge. To tackle these challenges, we propose a novel approach named Knowledge Enhanced Multi-intent Transformer Network for Recommendation (KGTN), comprising two primary modules: Global Intents Modeling with Graph Transformer, and Knowledge Contrastive Denoising under Intents. Specifically, Global Intents Modeling with Graph Transformer focuses on capturing learnable user intents by incorporating global signals from user-item-relation-entity interactions with a graph transformer, while learning intent-aware user/item representations. Knowledge Contrastive Denoising under Intents is dedicated to learning precise and robust representations.
It leverages intent-aware representations to sample relevant knowledge, and proposes a local-global contrastive mechanism to enhance noise-irrelevant representation learning. Extensive experiments conducted on benchmark datasets show the superior performance of our proposed method over state-of-the-art methods. Online A/B testing results on the Alibaba large-scale industrial recommendation platform also indicate the real-scenario effectiveness of KGTN.

Submitted 30 May, 2024; originally announced May 2024.

Comments: Accepted by The Web Conf 2024 (WWW 2024) Industry Track. arXiv admin note: text overlap with arXiv:2204.08807
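
The abstract does not give the form of KGTN's local-global contrastive loss. For orientation only, the sketch below shows a generic InfoNCE objective, the standard building block of contrastive representation learning, in plain Python. It is not KGTN's actual loss; the function names, the toy vectors, and the temperature value are assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def info_nce(anchors: List[List[float]], positives: List[List[float]],
             tau: float = 0.2) -> float:
    """Mean InfoNCE loss: anchors[i] should score highest against positives[i];
    every other positive in the batch serves as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / tau for p in positives]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / len(anchors)


views = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce(views, views))        # matched pairs: small loss
print(info_nce(views, views[::-1]))  # mismatched pairs: large loss
```

Minimizing this loss pulls each representation toward its matching view and pushes it away from the other samples in the batch.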

arXiv:2405.18240

MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution

Authors: Wenzhuo Liu, Fei Zhu, Shijie Ma, Cheng-Lin Liu

Abstract: Although Vision Transformers (ViTs) have recently advanced computer vision tasks significantly, an important real-world problem has been overlooked: adapting to variable input resolutions. Typically, images are resized to a fixed resolution, such as 224x224, for efficiency during training and inference. However, a uniform input size conflicts with real-world scenarios where images naturally vary in resolution. Modifying the preset resolution of a model may severely degrade its performance. In this work, we propose to enhance the model's adaptability to resolution variation by optimizing the patch embedding. The proposed method, called Multi-Scale Patch Embedding (MSPE), substitutes the standard patch embedding with multiple variable-sized patch kernels and selects the best parameters for different resolutions, eliminating the need to resize the original image. Our method does not require high-cost training or modifications to other parts, making it easy to apply to most ViT models. Experiments in image classification, segmentation, and detection tasks demonstrate the effectiveness of MSPE, yielding superior performance on low-resolution inputs and performing comparably to existing methods on high-resolution inputs.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.17790

Instruct-ReID++: Towards Universal Purpose Instruction-Guided Person Re-identification

Authors: Weizhen He, Yiheng Deng, Yunfeng Yan, Feng Zhu, Yizhou Wang, Lei Bai, Qingsong Xie, Donglian Qi, Wanli Ouyang, Shixiang Tang

Abstract: Human intelligence can retrieve any person according to both visual and language descriptions. However, the current computer vision community studies specific person re-identification (ReID) tasks in different scenarios separately, which limits real-world applications. This paper strives to resolve this problem by proposing a novel instruct-ReID task that requires the model to retrieve images according to the given image or language instructions. Instruct-ReID is the first exploration of a general ReID setting, where six existing ReID tasks can be viewed as special cases by assigning different instructions. To facilitate research in this new instruct-ReID task, we propose a large-scale OmniReID++ benchmark equipped with diverse data and comprehensive evaluation methods, e.g., task-specific and task-free evaluation settings. In the task-specific evaluation setting, gallery sets are categorized according to specific ReID tasks. We propose a novel baseline model, IRM, with an adaptive triplet loss to handle various retrieval tasks within a unified framework. For the task-free evaluation setting, where target person images are retrieved from task-agnostic gallery sets, we further propose a new method called IRM++ with novel memory bank-assisted learning. Extensive evaluations of IRM and IRM++ on the OmniReID++ benchmark demonstrate the superiority of our proposed methods, achieving state-of-the-art performance on 10 test sets. The datasets, the model, and the code will be available at https://github.com/hwz-zju/Instruct-ReID

Submitted 27 May, 2024; originally announced May 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2306.07520

arXiv:2405.16060

Delay-Effective Task Offloading Technology in Internet of Vehicles: From the Perspective of the Vehicle Platooning

Authors: Kan Yu, Fuze Zhu, Xiaowu Liu, Zhiyong Feng, Dong Li

Abstract: Task offloading technology plays a crucial role in the Internet of Vehicles (IoV), where processing delay must be minimized, by jointly optimizing the heterogeneous computing resources supported by vehicles, roadside units (RSUs), and macro base stations (MBSs). Previous works, on the one hand, ignored the wireless interference arising from the exchange and sharing of task data. On the other hand, they did not address the resources available from vehicles with similar driving behaviors, which can form a vehicle platoon (VEH-PLA) and effectively integrate the resources of individual vehicles. In addition, as a novel resource management paradigm, VEH-PLA should consider task categorization, since vehicles in a VEH-PLA may have the same task offloading requests, which also has not attracted enough attention. In this paper, considering wireless interference, mobility, VEH-PLA, and task categorization, we propose four kinds of task offloading models for the purpose of minimizing processing delay. Furthermore, by utilizing centralized training and decentralized execution (CTDE) based on multi-agent deep reinforcement learning (MADRL), we present a task offloading decision-making method to find the globally optimal offloading decision, resulting in a significant improvement in resource load balancing and processing delay.
Finally, simulations demonstrate that the proposed method significantly outperforms traditional task offloading methods in minimizing processing delay while keeping the resource load balanced.

Submitted 25 May, 2024; originally announced May 2024.
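
The abstract does not state its four delay models. As an assumption-laden illustration of why wireless interference matters for offloading decisions, the sketch below uses a common textbook model: the uplink rate follows the Shannon formula with interference treated as noise, and computation delay is task size times CPU cycles per bit divided by CPU frequency. All function names and parameter values are hypothetical.

```python
import math


def uplink_rate(bandwidth_hz: float, tx_power_w: float, channel_gain: float,
                noise_w: float, interference_w: float = 0.0) -> float:
    """Shannon rate in bit/s, treating co-channel interference as extra noise."""
    sinr = tx_power_w * channel_gain / (noise_w + interference_w)
    return bandwidth_hz * math.log2(1.0 + sinr)


def local_delay(task_bits: float, cycles_per_bit: float, cpu_hz: float) -> float:
    """Seconds to execute the task on the vehicle's own CPU."""
    return task_bits * cycles_per_bit / cpu_hz


def offload_delay(task_bits: float, cycles_per_bit: float,
                  server_cpu_hz: float, rate_bps: float) -> float:
    """Upload time plus remote execution time (result download ignored)."""
    return task_bits / rate_bps + task_bits * cycles_per_bit / server_cpu_hz


# Hypothetical numbers: 1 Mbit task, 500 cycles/bit, 1 GHz vehicle CPU, 10 GHz RSU server.
TASK_BITS, CYCLES_PER_BIT = 1e6, 500.0
t_local = local_delay(TASK_BITS, CYCLES_PER_BIT, 1e9)
rate_clean = uplink_rate(10e6, 0.2, 1e-6, 1e-13)
rate_interf = uplink_rate(10e6, 0.2, 1e-6, 1e-13, interference_w=1e-9)
t_offload_clean = offload_delay(TASK_BITS, CYCLES_PER_BIT, 10e9, rate_clean)
t_offload_interf = offload_delay(TASK_BITS, CYCLES_PER_BIT, 10e9, rate_interf)
print(f"local {t_local:.3f} s, offload {t_offload_clean:.4f} s, "
      f"offload with interference {t_offload_interf:.4f} s")
```

With these made-up numbers, offloading beats local execution, but interference lowers the uplink rate and raises the offloading delay, so a decision rule that ignores interference would underestimate it.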

arXiv:2405.13868

Automatically Identifying Local and Global Circuits with Linear Computation Graphs

Authors: Xuyang Ge, Fukang Zhu, Wentao Shu, Junxuan Wang, Zhengfu He, Xipeng Qiu

Abstract: Circuit analysis of a given model behavior is a central task in mechanistic interpretability. We introduce our circuit discovery pipeline with sparse autoencoders (SAEs) and a variant called skip SAEs. With these two modules inserted into the model, the model's computation graph with respect to OV and MLP circuits becomes strictly linear. Our methods do not require linear approximation to compute the causal effect of each node. This fine-grained graph enables identifying both end-to-end and local circuits accounting for either logits or intermediate features. We can apply this pipeline scalably with a technique called Hierarchical Attribution. We analyze three kinds of circuits in GPT2-Small, namely bracket, induction and Indirect Object Identification circuits. Our results reveal new findings underlying existing discoveries.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.11893

Tunable moiré bandgap in hBN-aligned bilayer graphene device with in-situ electrostatic gating

Authors: Hanbo Xiao, Han Gao, Min Li, Fanqiang Chen, Qiao Li, Yiwei Li, Meixiao Wang, Fangyuan Zhu, Lexian Yang, Feng Miao, Yulin Chen, Cheng Chen, Bin Cheng, Jianpeng Liu, Zhongkai Liu

Abstract: Over the years, great efforts have been devoted to introducing a sizable and tunable band gap in graphene for its potential application in next-generation electronic devices. The primary challenge in modulating this gap has been the absence of a direct method for observing changes in the band gap in momentum space. In this study, we employ advanced spatial- and angle-resolved photoemission spectroscopy to directly visualize the gap formation in bilayer graphene, modulated by both displacement fields and moiré potentials. The application of a displacement field via in-situ electrostatic gating introduces a sizable and tunable electronic bandgap, proportional to the field strength and reaching up to 100 meV. Meanwhile, the moiré potential, induced by aligning the underlying hexagonal boron nitride substrate, extends the bandgap by ~20 meV. Theoretical calculations effectively capture the experimental observations. Our investigation provides a quantitative understanding of how these two mechanisms collaboratively modulate the band gap in bilayer graphene, offering valuable guidance for the design of graphene-based electronic devices.

Submitted 24 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: 16 pages, 4 figures

arXiv:2405.11826

Data quality control system and long-term performance monitor of the LHAASO-KM2A

Authors: Zhen Cao, F. Aharonian, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, W. Bian, A. V. Bukevich, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, H. X. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. Chen, et al. (263 additional authors not shown)

Abstract: The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To ensure the reliability of the LHAASO-KM2A data, a three-level quality control system has been established. It is used to monitor the status of detector units, the stability of reconstructed parameters, and the performance of the array based on observations of the Crab Nebula and the Moon shadow. This paper introduces the control system and its application to the LHAASO-KM2A data collected from August 2021 to July 2023. During this period, the pointing and angular resolution of the array were stable. The results obtained from observations of the Moon shadow and of the Crab Nebula are consistent with each other. According to the observation of the Crab Nebula at energies from 25 TeV to 100 TeV, the time-averaged pointing errors are estimated to be $-0.003^{\circ} \pm 0.005^{\circ}$ and $0.001^{\circ} \pm 0.006^{\circ}$ in the R.A. and Dec directions, respectively.

Submitted 13 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: 15 pages, 9 figures

arXiv:2405.09796

Prototype Design of a Digital Low-level RF System for S-band Deflectors

Authors: J. F. Zhu, H. L. Ding, H. K. Li, Y. Li, X. W. Dai, J. W. Han, W. Q. Zhang

Abstract: S-band deflectors are generally operated in pulsed mode for beam diagnosis. We plan to deploy 5 S-band (2997 MHz) deflectors to accurately measure the longitudinal time distribution of ultra-short electron beam pulses in the Shenzhen Superconducting Soft X-ray Free Electron Laser (S3FEL). The microwave system of one deflector consists of a low-level RF system (LLRF), a solid-state amplifier, waveguide couplers, and a klystron, operated in pulsed mode with a maximum repetition frequency of 50 Hz. Its microwave amplitude and phase stability must be better than 0.06% and 0.08° (RMS), respectively. This article introduces the prototype design of the hardware, firmware, and software of the digital LLRF system. In the hardware design, we use homemade Local Oscillators (LOs) and commercial cards based on the MicroTCA standard. The firmware design will use non-IQ demodulation and a pulse feedforward algorithm to suppress noise from the klystron high voltage. The software design is based on the EPICS control system architecture, achieving slow control and interface display functions. This report also shows some preliminary test results.

Submitted 15 May, 2024; originally announced May 2024.

Comments: 3 pages, 5 figures, IPAC'24 - 15th International Particle Accelerator Conference
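
The abstract names non-IQ demodulation without giving sampling parameters. In non-IQ sampling, the intermediate frequency (IF) and the sampling clock are chosen so that f_IF/f_s = M/N, and amplitude and phase are recovered from N consecutive samples spanning M IF periods. The Python sketch below implements that estimator; the function name and the ratio M/N = 3/14 in the example are assumptions, not the S3FEL values.

```python
import math
from typing import Sequence, Tuple


def non_iq_demod(samples: Sequence[float], m: int, n: int) -> Tuple[float, float]:
    """Amplitude and phase (rad) of an IF tone sampled with f_IF / f_s = m / n,
    estimated from the first n samples (which span m IF periods)."""
    if n <= 2 or (2 * m) % n == 0:
        raise ValueError("need n > 2 and n not dividing 2*m")
    if len(samples) < n:
        raise ValueError("need at least n samples")
    i_sum = q_sum = 0.0
    for k in range(n):
        theta = 2.0 * math.pi * m * k / n  # IF phase advance at sample k
        i_sum += samples[k] * math.cos(theta)
        q_sum -= samples[k] * math.sin(theta)
    i_val, q_val = 2.0 * i_sum / n, 2.0 * q_sum / n
    return math.hypot(i_val, q_val), math.atan2(q_val, i_val)


# Hypothetical example: ratio 3/14, amplitude 0.8, phase 0.5 rad.
M, N = 3, 14
samples = [0.8 * math.cos(2.0 * math.pi * M * k / N + 0.5) for k in range(N)]
amp, phase = non_iq_demod(samples, M, N)
print(f"amplitude {amp:.3f}, phase {phase:.3f} rad")  # amplitude 0.800, phase 0.500 rad
```

The condition that N must not divide 2M makes the double-frequency terms sum to zero over the N samples, which is what makes the estimate exact for a clean tone.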

arXiv:2405.09459

Fourier Boundary Features Network with Wider Catchers for Glass Segmentation

Authors: Xiaolin Qin, Jiacen Liu, Qianlei Wang, Shaolin Zhang, Fei Zhu, Zhang Yi

Abstract: Glass largely blurs the boundary between the real world and the reflection. Its special transmittance and reflectance properties confuse semantic tasks related to machine vision. Therefore, clearing the boundary built by glass, and avoiding over-capturing features as false positive information in deep structures, matters for constraining the segmentation of reflection surfaces and penetrating glass. We propose the Fourier Boundary Features Network with Wider Catchers (FBWC), which might be the first attempt to use sufficiently wide horizontal shallow branches without vertical deepening to guide the fine-grained segmentation boundary through primary glass semantic information. Specifically, we designed the Wider Coarse-Catchers (WCC) for anchoring large-area segmentation and reducing excessive extraction from a structural perspective. We embed fine-grained features by Cross Transpose Attention (CTA), which is introduced to avoid incomplete areas within the boundary caused by reflection noise. For excavating glass features and balancing high- and low-layer context, a learnable Fourier Convolution Controller (FCC) is proposed to regulate information integration robustly. The proposed method has been validated on three different public glass segmentation datasets. Experimental results reveal that the proposed method yields better segmentation performance than state-of-the-art (SOTA) methods in glass image segmentation.

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.07827

Automatic Recognition of Food Ingestion Environment from the AIM-2 Wearable Sensor

Authors: Yuning Huang, Mohamed Abul Hassan, Jiangpeng He, Janine Higgins, Megan McCrory, Heather Eicher-Miller, Graham Thomas, Edward O Sazonov, Fengqing Maggie Zhu

Abstract: Detecting an ingestion environment is an important aspect of monitoring dietary intake. It provides insightful information for dietary assessment. However, it is a challenging problem: human-based review can be tedious, and algorithm-based review suffers from data imbalance and perceptual aliasing problems. To address these issues, we propose a neural network-based method with a two-stage training framework that tactfully combines fine-tuning and transfer learning techniques. Our method is evaluated on a newly collected dataset called "UA Free Living Study", which uses an egocentric wearable camera, the AIM-2 sensor, to simulate food consumption in free-living conditions. The proposed training framework is applied to common neural network backbones, combined with approaches from the general imbalanced classification field. Experimental results on the collected dataset show that our proposed method for automatic ingestion environment recognition successfully addresses the challenging data imbalance problem in the dataset and achieves a promising overall classification accuracy of 96.63%.

Submitted 13 May, 2024; originally announced May 2024.

Comments: Accepted at CVPRw 2024

arXiv:2405.07691

Discovery of Very-high-energy Gamma-ray Emissions from the Low Luminosity AGN NGC 4278 by LHAASO

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

Abstract: The first source catalog of the Large High Altitude Air Shower Observatory reported the detection of a very-high-energy gamma-ray source, 1LHAASO J1219+2915. In this paper, a more detailed study of the spectral and temporal behavior of this point-like source has been carried out. The best-fit position of the TeV source ($\rm{RA}=185.05^{\circ}\pm0.04^{\circ}$, $\rm{Dec}=29.25^{\circ}\pm0.03^{\circ}$) is compatible with NGC 4278 within $\sim0.03$ degree. Variability analysis shows an indication of variability on a timescale of a few months in the TeV band, which is consistent with low-frequency observations. Based on these observations, we report the detection of TeV $\gamma$-ray emission from the low-luminosity AGN NGC 4278. The observation by LHAASO-WCDA during the active period has a significance level of $8.8\,\sigma$ with best-fit photon spectral index $\varGamma=2.56\pm0.14$ and a flux $f_{1-10\,\rm{TeV}}=(7.0\pm1.1_{\rm{sta}}\pm0.35_{\rm{syst}})\times10^{-13}\,\rm{photons\,cm^{-2}\,s^{-1}}$, or approximately $5\%$ of the Crab Nebula. The discovery of VHE emission from NGC 4278 indicates that the compact, weak radio jet can efficiently accelerate particles and emit TeV photons.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 11 pages, 5 figures

arXiv:2405.07291

Robust Beamforming with Gradient-based Liquid Neural Network

Authors: Xinquan Wang, Fenghao Zhu, Chongwen Huang, Ahmed Alhammadi, Faouzi Bader, Zhaoyang Zhang, Chau Yuen, Merouane Debbah

Abstract: Millimeter-wave (mmWave) multiple-input multiple-output (MIMO) communication with advanced beamforming technologies is a key enabler to meet the growing demands of future mobile communication. However, the dynamic nature of cellular channels in large-scale urban mmWave MIMO communication scenarios brings substantial challenges, particularly in terms of complexity and robustness. To address these issues, we propose a robust gradient-based liquid neural network (GLNN) framework that utilizes ordinary differential equation-based liquid neurons to solve the beamforming problem. Specifically, our proposed GLNN framework takes gradients of the optimization objective function as inputs to extract high-order channel feature information, and then introduces a residual connection to mitigate the training burden. Furthermore, we use the manifold learning technique to compress the search space of the beamforming problem. These designs enable the GLNN to effectively maintain low complexity while ensuring strong robustness to noisy and highly dynamic channels. Extensive simulation results demonstrate that the GLNN can achieve 4.15% higher spectral efficiency than typical iterative algorithms, and reduce the time consumption to only 1.61% of that of conventional methods.

Submitted 17 May, 2024; v1 submitted 12 May, 2024; originally announced May 2024.
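
The abstract describes GLNN's neurons only as ODE-based liquid neurons. As a hedged illustration, the sketch below integrates a single liquid time-constant (LTC) neuron, dx/dt = -(1/τ + f)·x + f·A, with the fused semi-implicit Euler step used in the LTC literature. For simplicity the gate f here depends only on the input u (in the general formulation it also depends on the state), and all names and parameter values are assumptions; GLNN's gradient inputs, residual connection, and manifold learning are not modelled.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def ltc_step(x: float, u: float, dt: float, tau: float, a: float,
             w: float, b: float) -> float:
    """One fused semi-implicit Euler step of a single LTC neuron,
    dx/dt = -(1/tau + f) * x + f * a, with a simplified gate f = sigmoid(w*u + b)."""
    f = sigmoid(w * u + b)
    return (x + dt * f * a) / (1.0 + dt * (1.0 / tau + f))


# Hypothetical parameters; drive the neuron with a constant input u = 1.
TAU, A, W, B, DT = 1.0, 2.0, 1.0, 0.0, 0.01
x = 0.0
for _ in range(2000):
    x = ltc_step(x, 1.0, DT, TAU, A, W, B)

f_const = sigmoid(W * 1.0 + B)
x_star = f_const * A / (1.0 / TAU + f_const)  # analytic fixed point for constant input
print(f"state {x:.4f}, fixed point {x_star:.4f}")
```

Because the effective decay rate 1/τ + f changes with the input, the neuron's time constant varies with the signal, which is the "liquid" property; with a constant input the state settles at f·A/(1/τ + f).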

arXiv:2405.07257

Listen, Disentangle, and Control: Controllable Speech-Driven Talking Head Generation

Authors: Changpeng Cai, Guinan Guo, Jiao Li, Junhao Su, Chenghao He, **g Xiao, Yuanxu Chen, Lei Dai, Feiyu Zhu

Abstract: Most earlier investigations on talking face generation have focused on the synchronization of lip motion and speech content. However, human head pose and facial emotions are equally important characteristics of natural human faces. While audio-driven talking face generation has seen notable advancements, existing methods either overlook facial emotions or are limited to specific individuals and cannot be applied to arbitrary subjects. In this paper, we propose a one-shot Talking Head Generation framework (SPEAK) that distinguishes itself from general Talking Face Generation by enabling emotional and postural control. Specifically, we introduce the Inter-Reconstructed Feature Disentanglement (IRFD) method to decouple human facial features into three latent spaces. We then design a face editing module that modifies speech content and facial latent codes into a single latent space. Subsequently, we present a novel generator that employs modified latent codes derived from the editing module to regulate emotional expression, head poses, and speech content in synthesizing facial animations. Extensive trials demonstrate that our method can generate realistic talking heads with coordinated lip motions, authentic facial emotions, and smooth head movements. The demo video is available at the anonymous link: https://anonymous.4open.science/r/SPEAK-F56E

Submitted 12 May, 2024; originally announced May 2024.

ACM Class: I.4.5; I.4.9

arXiv:2405.00391

Beamforming Inferring by Conditional WGAN-GP for Holographic Antenna Arrays

Authors: Fenghao Zhu, Xinquan Wang, Chongwen Huang, Ahmed Alhammadi, Hui Chen, Zhaoyang Zhang, Chau Yuen, Mérouane Debbah

Abstract: Beamforming with large holographic antenna arrays is one of the key enablers for the next generation of wireless systems, as it can significantly improve spectral efficiency. However, the deployment of large antenna arrays implies high algorithm complexity and resource overhead at both receiver and transmitter ends. To address this issue, advanced technologies such as artificial intelligence have been developed to reduce beamforming overhead. Intuitively, if near-optimal beamforming can be implemented using only a tiny subset of the full channel information, the overhead for channel estimation and beamforming would be reduced significantly compared with traditional beamforming methods, which usually need full channel information and the inversion of a large-dimensional matrix. In light of this idea, we propose a novel scheme that utilizes a Wasserstein generative adversarial network with gradient penalty to infer the full beamforming matrices from very little channel information. Simulation results confirm that it can achieve performance comparable to the weighted minimum mean-square error algorithm, while reducing the overhead by over 50%.

Submitted 15 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.
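
For reference, the standard WGAN-GP objective introduced in "Improved Training of Wasserstein GANs", written here with a conditioning input c, is shown below. Mapping c to the partial channel information and x to the full beamforming matrix is an assumption about how the scheme is set up; the abstract does not give the loss.

$$
\mathcal{L}_D = \mathbb{E}_{\tilde{x}\sim \mathbb{P}_g}\left[D(\tilde{x}\mid c)\right] - \mathbb{E}_{x\sim \mathbb{P}_r}\left[D(x\mid c)\right] + \lambda\, \mathbb{E}_{\hat{x}}\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}\mid c) \rVert_2 - 1\right)^2\right]
$$

$$
\hat{x} = \epsilon x + (1-\epsilon)\tilde{x},\quad \epsilon \sim \mathcal{U}[0,1],\qquad \mathcal{L}_G = -\,\mathbb{E}_{\tilde{x}\sim \mathbb{P}_g}\left[D(\tilde{x}\mid c)\right]
$$

Here $\tilde{x} = G(z \mid c)$ is a generated sample, $D$ is the critic, and $\lambda$ weights the gradient penalty, which softly enforces the 1-Lipschitz constraint on points interpolated between real and generated samples; $\lambda = 10$ is the value used in the original WGAN-GP paper.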

arXiv:2405.00365

Robust Continuous-Time Beam Tracking with Liquid Neural Network

Authors: Fenghao Zhu, Xinquan Wang, Chongwen Huang, Richeng **, Qianqian Yang, Ahmed Alhammadi, Zhaoyang Zhang, Chau Yuen, Mérouane Debbah

Abstract: Millimeter-wave (mmWave) technology is increasingly recognized as a pivotal technology of sixth-generation communication networks due to the large amounts of available spectrum at high frequencies. However, the huge overhead associated with beam training imposes a significant challenge in mmWave communications, particularly in urban environments with high background noise. To reduce this high overhead, we propose a novel solution for robust continuous-time beam tracking with a liquid neural network, which dynamically adjusts the narrow mmWave beams to ensure real-time beam alignment with mobile users. Through extensive simulations, we validate the effectiveness of our proposed method and demonstrate its superiority over existing state-of-the-art deep-learning-based approaches. Specifically, our scheme achieves up to 46.9% higher normalized spectral efficiency than the baselines when the user is moving at 5 m/s, demonstrating the potential of liquid neural networks to enhance mmWave mobile communication performance.

Submitted 1 May, 2024; originally announced May 2024.
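
To illustrate why narrow mmWave beams need continuous tracking, the Python sketch below computes the normalized beamforming gain of a uniform linear array with half-wavelength spacing when the beam points slightly away from the user. The 64-element array, the 2° error, and the function names are assumptions for illustration; the abstract does not specify the array, and this is not the paper's liquid-neural-network tracker.

```python
import cmath
import math
from typing import List


def ula_steering(n_antennas: int, angle_rad: float) -> List[complex]:
    """Unit-norm steering vector of a uniform linear array with half-wavelength spacing."""
    scale = 1.0 / math.sqrt(n_antennas)
    return [scale * cmath.exp(1j * math.pi * k * math.sin(angle_rad))
            for k in range(n_antennas)]


def beam_gain(n_antennas: int, user_angle_rad: float, beam_angle_rad: float) -> float:
    """Normalized gain |a(user)^H w(beam)|^2; equals 1 for perfect alignment."""
    a = ula_steering(n_antennas, user_angle_rad)
    w = ula_steering(n_antennas, beam_angle_rad)
    inner = sum(ak.conjugate() * wk for ak, wk in zip(a, w))
    return abs(inner) ** 2


# Hypothetical 64-element array, user at broadside.
aligned = beam_gain(64, 0.0, 0.0)
off_by_2deg = beam_gain(64, 0.0, math.radians(2.0))
print(f"aligned {aligned:.3f}, 2 deg error {off_by_2deg:.4f}")
```

For 64 elements the first null of the beam lies at sin θ = 2/64 (about 1.8°), so a 2° pointing error already drops the gain to roughly 1% of the aligned value.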

arXiv:2404.18095

Partial confinement in a quantum-link simulator

Authors: Zheng Tang, Fei Zhu, Yi-Fan Luo, Wei Zheng, Li Chen

Abstract: Confinement and deconfinement, captivating attributes of high-energy elementary particles, have recently garnered wide attention in quantum simulations based on cold atoms. Yet partial confinement, an intermediate state between confinement and deconfinement, remains underexplored. Partial confinement refers to the phenomenon that the confining behavior of charged particles is contingent upon their relative positions. In this paper, we demonstrate that the spin-1 quantum link model provides an excellent platform for exploring partial confinement. We conduct a comprehensive investigation of the physics emerging from partial confinement in both equilibrium and non-equilibrium dynamics. Potential experimental setups using cold atoms are also discussed. Our work offers a simple and feasible route for the study of confinement-related physics in state-of-the-art artificial quantum systems subject to gauge symmetries.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.15575 [pdf, other]

Jitter Characterization of the HyTI Satellite

Authors: Chase Urasaki, Frances Zhu, Michael Bottom, Miguel Nunes, Aidan Walk

Abstract: The Hyperspectral Thermal Imager (HyTI) is a technology demonstration mission that will obtain high spatial, spectral, and temporal resolution long-wave infrared images of Earth's surface from a 6U CubeSat. HyTI science requires that microvibration effects (known as jitter) not move the optical axis by more than 2.89 arcsec over the 0.5 ms integration time. Two sources of vibration are a cryocooler, added to maintain the detector at 68 K, and three orthogonally placed reaction wheels that are part of the attitude control system. Both components introduce vibrations that propagate through the satellite structure during imaging. Typical methods of characterizing and measuring jitter involve complex finite element methods and specialized equipment and setups. In this paper, we describe a novel, low-cost method of characterizing jitter for small satellite systems that minimally modifies the subject's mass distribution. The metrology instrument comprises a laser source, a small mirror mounted via a 3D-printed clamp to a jig, and a lateral-effect position-sensing detector. The position-sensing detector samples at 1000 Hz and can measure displacements as small as 0.15 arcsec at a distance of one meter. This paper provides an experimental procedure that incrementally analyzes vibratory sources to establish causal relationships between sources and the vibratory modes they create.
We demonstrate the capabilities of this metrology system and testing procedure on HyTI in the Hawaii Space Flight Lab's clean room. Results include power spectral density plots that show fundamental and higher-order vibratory modal frequencies. Metrology results show that jitter from the reaction wheels meets HyTI system requirements within 3$σ$.

Submitted 23 April, 2024; originally announced April 2024.

Comments: Accepted for the 2024 IEEE Aerospace Conference Proceedings
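
The abstract reports power spectral density (PSD) plots computed from position-sensing-detector samples taken at 1000 Hz. The sketch below shows one common way to produce such a plot, a Hann-windowed periodogram via NumPy's FFT, and reads off the dominant modal frequency. The 120 Hz tone and its 0.5 arcsec amplitude are hypothetical stand-ins for a reaction-wheel mode, not HyTI data, and the authors' actual processing pipeline is not described in the abstract.

```python
import numpy as np

FS_HZ = 1000.0  # detector sampling rate stated in the abstract


def periodogram_psd(x, fs):
    """One-sided PSD estimate (units^2/Hz) from a Hann-windowed periodogram."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # drop the static (DC) pointing offset
    w = np.hanning(len(x))
    spectrum = np.fft.rfft(x * w)
    psd = np.abs(spectrum) ** 2 / (fs * np.sum(w ** 2))
    # Fold negative frequencies: double every bin except DC (and Nyquist,
    # which exists only for even-length records).
    if len(x) % 2 == 0:
        psd[1:-1] *= 2.0
    else:
        psd[1:] *= 2.0
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs, psd


# Hypothetical 2 s record: a 120 Hz jitter mode (0.5 arcsec) plus sensor noise.
t = np.arange(2000) / FS_HZ
rng = np.random.default_rng(1)
jitter_arcsec = 0.5 * np.sin(2 * np.pi * 120.0 * t) + 0.05 * rng.normal(size=t.size)

freqs, psd = periodogram_psd(jitter_arcsec, FS_HZ)
peak_hz = freqs[np.argmax(psd)]
print(f"dominant mode: {peak_hz:.1f} Hz")
```

At 1000 Hz sampling the spectrum spans 0 to 500 Hz with 0.5 Hz resolution for a 2 s record; a structural mode above 500 Hz would alias into this band, so the sampling rate bounds which modes can be identified directly.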

arXiv:2404.12638 [pdf, other]

Learning to Cut via Hierarchical Sequence/Set Model for Efficient Mixed-Integer Programming

Authors: Jie Wang, Zhihai Wang, Xijun Li, Yufei Kuang, Zhihao Shi, Fangzhou Zhu, Mingxuan Yuan, Jia Zeng, Yongdong Zhang, Feng Wu

Abstract: Cutting planes (cuts) play an important role in solving mixed-integer linear programs (MILPs), which formulate many important real-world applications. Cut selection heavily depends on (P1) which cuts to prefer and (P2) how many cuts to select. Although modern MILP solvers tackle (P1)-(P2) with human-designed heuristics, machine learning has the potential to learn more effective heuristics. However, many existing learning-based methods learn which cuts to prefer while neglecting the importance of learning how many cuts to select. Moreover, we observe that (P3) which order of the selected cuts to prefer significantly impacts the efficiency of MILP solvers as well. To address these challenges, we propose a novel hierarchical sequence/set model (HEM) to learn cut selection policies. Specifically, HEM is a bi-level model: (1) a higher-level module that learns how many cuts to select, and (2) a lower-level module, which formulates cut selection as a sequence/set-to-sequence learning problem, that learns policies selecting an ordered subset with the cardinality determined by the higher-level module. To the best of our knowledge, HEM is the first data-driven methodology that tackles (P1)-(P3) simultaneously. Experiments demonstrate that HEM significantly improves the efficiency of solving MILPs on eleven challenging MILP benchmarks, including two real-world problems from Huawei.

Submitted 19 April, 2024; originally announced April 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2302.00244
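
To make the division of labor between the two levels concrete, here is a minimal sketch of the interface: a ratio from a "higher level" fixes how many cuts to keep (P2), and a "lower level" returns an ordered subset (P1, P3). The scores and `keep_ratio` are hypothetical inputs; in HEM both come from learned policies, and the lower level is a learned sequence model rather than the greedy sort used here as a stand-in.

```python
import math
from typing import List, Sequence


def select_cuts(scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Hypothetical two-level cut selection (greedy stand-in, not HEM itself).

    Higher level (P2): keep_ratio in [0, 1] fixes how many cuts to add.
    Lower level (P1, P3): candidates are ranked by score, and the returned
    indices are ordered, so the order in which cuts are added is explicit.
    """
    if not 0.0 <= keep_ratio <= 1.0:
        raise ValueError("keep_ratio must lie in [0, 1]")
    k = math.ceil(keep_ratio * len(scores))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Four candidate cuts with hypothetical scores; keep half of them.
print(select_cuts([0.2, 0.9, 0.5, 0.7], keep_ratio=0.5))  # [1, 3]
```

Separating "how many" from "which, in what order" is what lets the cardinality be learned independently of the ranking.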

arXiv:2404.12257 [pdf, other]

Food Portion Estimation via 3D Object Scaling

Authors: Gautham Vinod, Jiangpeng He, Zeman Shao, Fengqing Zhu

Abstract: Image-based methods for analyzing food images have alleviated the user burden and biases associated with traditional methods. However, accurate portion estimation remains a major challenge due to the loss of 3D information in the 2D representation of foods captured by smartphone cameras or wearable devices. In this paper, we propose a new framework to estimate both food volume and energy from 2D images by leveraging 3D food models and a physical reference in the eating scene. Our method estimates the poses of the camera and the food object in the input image and recreates the eating occasion by rendering an image of a 3D model of the food with the estimated poses. We also introduce a new dataset, SimpleFood45, which contains 2D images of 45 food items and associated annotations including food volume, weight, and energy. Our method achieves an average error of 31.10 kcal (17.67%) on this dataset, outperforming existing portion estimation methods.

Submitted 18 April, 2024; originally announced April 2024.
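
The title refers to 3D object scaling. The geometric fact behind it is that a uniform linear scale factor s multiplies volume by s³, and energy then follows from volume through density and energy density. The sketch below shows only that arithmetic; the scale factor, density, and kcal/g values are hypothetical, and the abstract does not describe how the authors convert volume to energy.

```python
def scale_volume(model_volume_ml: float, scale: float) -> float:
    """Volume of a 3D model after uniform scaling by a linear factor `scale`.

    Lengths scale by s, areas by s**2, and volumes by s**3.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return model_volume_ml * scale ** 3


def energy_kcal(volume_ml: float, density_g_per_ml: float, kcal_per_g: float) -> float:
    """Energy = volume x density x energy density (all inputs hypothetical here)."""
    return volume_ml * density_g_per_ml * kcal_per_g


# Hypothetical: the physical reference implies the real food is 1.2x the model size.
volume = scale_volume(50.0, 1.2)  # about 86.4 mL
energy = energy_kcal(volume, density_g_per_ml=1.05, kcal_per_g=1.3)  # about 117.9 kcal
```

The cubic dependence is why the scale estimate matters so much: a 10% error in linear scale becomes roughly a 33% error in volume (1.1³ ≈ 1.33).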

arXiv:2404.11648 [pdf, other]

Pions from higher-dimensional gluons: general realizations and stringy models

Authors: ** Dong, Xiang Li, Fan Zhu

Abstract: In this paper we revisit, in a more general context, the phenomenon that scattering amplitudes of pions can be obtained from "dimensional reduction" of gluons in higher dimensions. We show that such "dimensional reduction" operations universally turn gluons into pions regardless of the details of interactions: under such operations, any amplitude that is gauge invariant and contains only local simple poles becomes one that satisfies the Adler zero in the soft limit. As two such examples, we show that starting from gluon amplitudes in both superstring and bosonic string theories, the operations produce "stringy" completions of pion scattering amplitudes to all orders in $α'$, with the leading order given by non-linear sigma model amplitudes. Via Kawai-Lewellen-Tye relations, they give closed-stringy completions for Born-Infeld theory and the special Galileon theory, which are directly related to gravity amplitudes in closed-string theories. We also discuss how they naturally produce stringy models for mixed amplitudes of pions and colored scalars.

Submitted 17 April, 2024; originally announced April 2024.

Comments: 26 pages

arXiv:2404.11180 [pdf, other]

Causal Deconfounding via Confounder Disentanglement for Dual-Target Cross-Domain Recommendation

Authors: Jiajie Zhu, Yan Wang, Feng Zhu, Zhu Sun

Abstract: In recent years, dual-target Cross-Domain Recommendation (CDR) has been proposed to capture comprehensive user preferences in order to enhance recommendation accuracy in both the data-richer and data-sparser domains simultaneously. However, in addition to users' true preferences, user-item interactions might also be affected by confounders (e.g., free shipping, sales promotions). As a result, dual-target CDR faces two challenges: (1) how to effectively decouple observed confounders, including single-domain and cross-domain confounders, and (2) how to preserve the positive effects of observed confounders on predicted interactions while eliminating their negative effects on capturing comprehensive user preferences. To address these two challenges, we propose a Causal Deconfounding framework via Confounder Disentanglement for dual-target Cross-Domain Recommendation, called CD2CDR. In CD2CDR, we first propose a confounder disentanglement module to effectively decouple observed single-domain and cross-domain confounders. We then propose a causal deconfounding module that preserves the positive effects of these observed confounders and eliminates their negative effects via backdoor adjustment, thereby enhancing recommendation accuracy in each domain. Extensive experiments on five real-world datasets demonstrate that CD2CDR significantly outperforms state-of-the-art methods.

Submitted 17 April, 2024; originally announced April 2024.
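
The backdoor adjustment named in the abstract estimates an interventional probability by averaging over confounder strata: P(y | do(x)) = Σ_z P(y | x, z) P(z). The count-based sketch below illustrates only that formula on hypothetical binary data in which a promotion (z) raises both exposure (x) and interaction (y); CD2CDR applies the adjustment to learned, disentangled confounder representations, which this sketch does not model.

```python
from collections import Counter
from typing import Dict, List, Tuple

Record = Tuple[int, int, int]  # (x: exposure, z: confounder, y: interaction)


def expand(counts: Dict[Record, int]) -> List[Record]:
    """Turn {(x, z, y): count} into a flat list of records."""
    return [key for key, n in counts.items() for _ in range(n)]


def naive_p_y(records: List[Record], x: int) -> float:
    """P(y=1 | x), confounded by z."""
    ys = [y for xi, _, y in records if xi == x]
    return sum(ys) / len(ys)


def backdoor_p_y(records: List[Record], x: int) -> float:
    """P(y=1 | do(x)) = sum_z P(y=1 | x, z) P(z), the backdoor adjustment formula."""
    n = len(records)
    total = 0.0
    for z, n_z in Counter(z for _, z, _ in records).items():
        ys = [y for xi, zi, y in records if xi == x and zi == z]
        if not ys:
            raise ValueError(f"no data for x={x}, z={z}; P(y | x, z) is undefined")
        total += (sum(ys) / len(ys)) * (n_z / n)
    return total


# Hypothetical counts: the promotion (z=1) is associated with more exposure and more interaction.
records = expand({
    (1, 1, 1): 8, (1, 1, 0): 2, (0, 1, 1): 1, (0, 1, 0): 1,
    (1, 0, 1): 1, (1, 0, 0): 1, (0, 0, 1): 2, (0, 0, 0): 6,
})
naive = naive_p_y(records, 1)        # 0.75
adjusted = backdoor_p_y(records, 1)  # 73/110, about 0.664
```

The naive estimate overstates the effect of exposure because exposed records are concentrated in the promotion stratum, where interactions are more likely anyway; reweighting each stratum by P(z) removes that imbalance.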

arXiv:2404.11068 [pdf, other]

ScaleFold: Reducing AlphaFold Initial Training Time to 10 Hours

Authors: Feiwen Zhu, Arkadiusz Nowaczynski, Rundong Li, Jie Xin, Yifei Song, Michal Marcinkiewicz, Sukru Burc Eryilmaz, Jun Yang, Michael Andersch

Abstract: AlphaFold2 has been hailed as a breakthrough in protein folding. It can rapidly predict protein structures with lab-grade accuracy. However, its implementation does not include the necessary training code. OpenFold is the first trainable public reimplementation of AlphaFold. The AlphaFold training procedure is prohibitively time-consuming and yields diminishing returns when scaled to more compute resources. In this work, we conducted a comprehensive analysis of the AlphaFold training procedure based on OpenFold and identified inefficient communications and overhead-dominated computations as the key factors that prevented AlphaFold training from scaling effectively. We introduced ScaleFold, a systematic training method that incorporates optimizations specifically targeting these factors. ScaleFold successfully scaled AlphaFold training to 2080 NVIDIA H100 GPUs with high resource utilization. In the MLPerf HPC v3.0 benchmark, ScaleFold finished the OpenFold benchmark in 7.51 minutes, a speedup of over $6\times$ relative to the baseline. For training the AlphaFold model from scratch, ScaleFold completed the pretraining in 10 hours, a significant improvement over the seven days required by the original AlphaFold pretraining baseline.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.09718 [pdf, ps, other]

Counting, mixing and equidistribution for GPS systems with applications to relatively Anosov groups

Authors: Pierre-Louis Blayac, Richard Canary, Feng Zhu, Andrew Zimmer

Abstract: We establish counting, mixing and equidistribution results for finite BMS measures on flow spaces associated to geometrically finite convergence group actions. We show that, in particular, these results apply to flow spaces associated to relatively Anosov groups.

Submitted 15 April, 2024; originally announced April 2024.

Comments: 55 pages. Comments welcome!

arXiv:2404.09713 [pdf, ps, other]

Patterson-Sullivan theory for coarse cocycles

Authors: Pierre-Louis Blayac, Richard Canary, Feng Zhu, Andrew Zimmer

Abstract: In this paper we develop a theory of Patterson-Sullivan measures associated to coarse cocycles of convergence groups. This framework includes Patterson-Sullivan measures associated to the Busemann cocycle on the geodesic boundary of a Gromov hyperbolic metric space and Patterson-Sullivan measures on flag manifolds associated to Anosov (or, more generally, transverse) subgroups of semisimple Lie groups, as well as other examples. Under some natural geometric assumptions on the coarse cocycle, we prove existence, uniqueness, and ergodicity results.

Submitted 15 April, 2024; originally announced April 2024.

Comments: 66 pages. Comments welcome!

arXiv:2404.07507 [pdf, other]

Learning to Classify New Foods Incrementally Via Compressed Exemplars

Authors: Justin Yang, Zhihao Duan, Jiangpeng He, Fengqing Zhu

Abstract: Food image classification systems play a crucial role in health monitoring and diet tracking through image-based dietary assessment techniques. However, existing food recognition systems rely on static datasets with a predefined, fixed number of food classes. This contrasts drastically with the reality of food consumption, which features constantly changing data. Food image classification systems should therefore adapt to and manage continuously evolving data, which is where continual learning plays an important role. A challenge in continual learning is catastrophic forgetting, where ML models tend to discard old knowledge upon learning new information. While memory-replay algorithms have shown promise in mitigating this problem by storing old data as exemplars, they are hampered by the limited capacity of memory buffers, leading to an imbalance between new and previously learned data. To address this, our work explores the use of neural image compression to extend buffer size and enhance data diversity. We introduce the concept of continuously learning a neural compression model to adaptively improve the quality of compressed data and optimize the bits per pixel (bpp) to store more exemplars. Our extensive experiments, including evaluations on the food-specific datasets Food-101 and VFN-74 as well as the general dataset ImageNet-100, demonstrate improvements in classification accuracy. This progress is pivotal in advancing more realistic food recognition systems that are capable of adapting to continually evolving data.
Moreover, the principles and methodologies we have developed hold promise for broader applications, extending their benefits to other domains of continual machine learning.

Submitted 11 April, 2024; originally announced April 2024.
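
The reason a lower bits-per-pixel rate lets a fixed replay buffer hold more exemplars is simple arithmetic: each image costs height × width × bpp bits. The sketch below uses a hypothetical budget sized for exactly 20 raw 224×224 RGB images (24 bpp) and compares it with a hypothetical 0.5 bpp compressed rate. It ignores container overhead and the memory needed to store the compression model itself.

```python
def exemplars_in_buffer(buffer_bytes: int, height: int, width: int, bpp: float) -> int:
    """How many height x width images fit in a buffer at a given bits per pixel."""
    if bpp <= 0:
        raise ValueError("bpp must be positive")
    bits_per_image = height * width * bpp
    return int((buffer_bytes * 8) // bits_per_image)


# Hypothetical budget: room for exactly 20 raw 224x224 RGB images (8 bits x 3 channels = 24 bpp).
budget = 20 * 224 * 224 * 3
raw = exemplars_in_buffer(budget, 224, 224, bpp=24)          # 20
compressed = exemplars_in_buffer(budget, 224, 224, bpp=0.5)  # 960
```

The gain is the ratio of the two rates (24 / 0.5 = 48×), which is why optimizing bpp directly trades image quality for exemplar count and diversity.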

arXiv:2404.06270 [pdf, other]

3D Geometry-aware Deformable Gaussian Splatting for Dynamic View Synthesis

Authors: Zhicheng Lu, Xiang Guo, Le Hui, Tianrui Chen, Min Yang, Xiao Tang, Feng Zhu, Yuchao Dai

Abstract: In this paper, we propose a 3D geometry-aware deformable Gaussian Splatting method for dynamic view synthesis. Existing neural radiance field (NeRF) based solutions learn the deformation implicitly and cannot incorporate 3D scene geometry. Therefore, the learned deformation is not necessarily geometrically coherent, which results in unsatisfactory dynamic view synthesis and 3D dynamic reconstruction. Recently, 3D Gaussian Splatting has provided a new representation of 3D scenes, on top of which 3D geometry can be exploited in learning complex 3D deformations. Specifically, a scene is represented as a collection of 3D Gaussians, each optimized to move and rotate over time to model the deformation. To enforce the 3D scene geometry constraint during deformation, we explicitly extract 3D geometry features and integrate them into learning the 3D deformation. In this way, our solution achieves 3D geometry-aware deformation modeling, which enables improved dynamic view synthesis and 3D dynamic reconstruction. Extensive experimental results on both synthetic and real datasets demonstrate the superiority of our solution, which achieves new state-of-the-art performance. The project is available at https://npucvr.github.io/GaGS/

Submitted 14 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR 2024. Project page: https://npucvr.github.io/GaGS/
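
The abstract describes each 3D Gaussian as moving and rotating over time. In 3D Gaussian Splatting the covariance is commonly parameterized as Σ = R S Sᵀ Rᵀ (rotation R, diagonal scale S), so a per-time translation Δμ and rotation ΔR update the Gaussian as μ' = μ + Δμ and Σ' = ΔR Σ ΔRᵀ. The sketch shows that update only; the offsets are fixed hypothetical values, whereas in the paper they are predicted by a deformation network that uses extracted geometry features.

```python
import numpy as np


def rotation_z(theta: float) -> np.ndarray:
    """Rotation matrix about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def gaussian_covariance(R: np.ndarray, scales) -> np.ndarray:
    """3DGS-style covariance Sigma = R S S^T R^T with S = diag(scales)."""
    S = np.diag(scales)
    return R @ S @ S.T @ R.T


def deform(mean, cov, delta_mean, delta_R):
    """Translate a Gaussian by delta_mean and rotate it by delta_R."""
    return mean + delta_mean, delta_R @ cov @ delta_R.T


# Hypothetical canonical Gaussian elongated along x, deformed at some time t.
mean0 = np.zeros(3)
cov0 = gaussian_covariance(np.eye(3), [2.0, 1.0, 1.0])  # diag(4, 1, 1)
mean_t, cov_t = deform(mean0, cov0, np.array([0.1, 0.0, 0.0]), rotation_z(np.pi / 2))
```

A 90° rotation about z turns the x-elongated covariance diag(4, 1, 1) into the y-elongated diag(1, 4, 1), and the rotated covariance stays symmetric positive definite, so it remains a valid Gaussian for rasterization.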

arXiv:2404.04801 [pdf, ps, other]

doi 10.1007/s41605-024-00467-8

LHAASO-KM2A detector simulation using Geant4

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen , et al. (254 additional authors not shown)

Abstract: KM2A is one of the main sub-arrays of LHAASO, working on gamma-ray astronomy and cosmic-ray physics at energies above 10 TeV. Detector simulation is an important foundation for estimating detector performance and for data analysis. Simulating the KM2A detector in the Geant4 framework is a major challenge because of the need to track numerous photons from a large number of detector units (>6000) with a large altitude difference (30 m) and huge coverage (1.3 km^2). In this paper, the design of the KM2A simulation code G4KM2A, based on Geant4, is introduced. G4KM2A is optimized mainly in memory consumption to avoid memory overflow. Some simplifications are used to significantly speed up the execution of G4KM2A. The running time is reduced by at least 30 times compared to full detector simulation. Comparisons of particle distributions and core/angle resolution between simulation and experimental data for the full KM2A array are also presented and show good agreement.

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.04476 [pdf, other]

DELTA: Decoupling Long-Tailed Online Continual Learning

Authors: Siddeshwar Raghavan, Jiangpeng He, Fengqing Zhu

Abstract: A significant challenge in achieving ubiquitous Artificial Intelligence is the limited ability of models to rapidly learn new information in real-world scenarios where data follows long-tailed distributions, while avoiding forgetting previously acquired knowledge. In this work, we study the under-explored problem of Long-Tailed Online Continual Learning (LTOCL), which aims to learn new tasks from sequentially arriving class-imbalanced data streams. Each sample is observed only once for training, without knowledge of the task data distribution. We present DELTA, a decoupled learning approach designed to enhance representation learning and address the substantial imbalance in LTOCL. We enhance the learning process by adapting supervised contrastive learning to attract similar samples and repel dissimilar (out-of-class) samples. Further, by balancing gradients during training with an equalization loss, DELTA significantly enhances learning outcomes and mitigates catastrophic forgetting. Through extensive evaluation, we demonstrate that DELTA improves the capacity for incremental learning, surpassing existing OCL methods. Our results suggest considerable promise for applying OCL in real-world applications.

Submitted 5 April, 2024; originally announced April 2024.

Comments: CVPR Workshop acceptance archival track
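
DELTA adapts supervised contrastive learning, which pulls same-class samples together and pushes out-of-class samples apart. As a reference point, the sketch below computes the standard supervised contrastive (SupCon) loss of Khosla et al. in NumPy (forward pass only); DELTA's adaptation and its equalization loss are not reproduced, and the features, labels, and temperature are hypothetical.

```python
import numpy as np


def supcon_loss(features: np.ndarray, labels, temperature: float = 0.1) -> float:
    """Supervised contrastive loss (forward pass only), averaged over anchors.

    For anchor i, positives are the other samples with the same label; the
    softmax denominator runs over every sample except i itself.
    """
    z = features / np.linalg.norm(features, axis=1, keepdims=True)  # unit-normalize
    labels = np.asarray(labels)
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -np.inf, z @ z.T / temperature)
    # Numerically stable log-softmax per row (the anchor's own entry is -inf).
    row_max = logits.max(axis=1, keepdims=True)
    log_prob = logits - row_max - np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))
    positives = (labels[:, None] == labels[None, :]) & ~self_mask
    n_pos = positives.sum(axis=1)
    has_pos = n_pos > 0
    if not has_pos.any():
        raise ValueError("no anchor has a positive pair in this batch")
    mean_log_prob_pos = np.where(positives, log_prob, 0.0).sum(axis=1)[has_pos] / n_pos[has_pos]
    return float(-mean_log_prob_pos.mean())


feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
aligned = supcon_loss(feats, [0, 0, 1, 1])   # labels match the two feature clusters
shuffled = supcon_loss(feats, [0, 1, 0, 1])  # labels cut across the clusters
```

The loss is small when same-label samples already sit close together (`aligned`) and large when they do not (`shuffled`), which is the signal that shapes the representation. Under heavy class imbalance, head classes supply most anchors and positives, one motivation for pairing this loss with a gradient-balancing term.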

arXiv:2404.00681 [pdf, other]

CoUDA: Coherence Evaluation via Unified Data Augmentation

Authors: Dawei Zhu, Wenhao Wu, Yifan Song, Fangwei Zhu, Ziqiang Cao, Sujian Li

Abstract: Coherence evaluation aims to assess the organization and structure of a discourse, which remains challenging even in the era of large language models. Due to the scarcity of annotated data, data augmentation is commonly used to train coherence evaluation models. However, previous augmentations for this task primarily rely on heuristic rules and lack design criteria to guide them. In this paper, we take inspiration from the linguistic theory of discourse structure and propose a data augmentation framework named CoUDA. CoUDA breaks discourse coherence down into global and local aspects and designs augmentation strategies for each aspect. Especially for local coherence, we propose a novel generative strategy for constructing augmentation samples, which involves post-pretraining a generative model and applying two controlling mechanisms to control the difficulty of the generated samples. During inference, CoUDA also jointly evaluates both global and local aspects to comprehensively assess the overall coherence of a discourse. Extensive experiments in coherence evaluation show that, with only 233M parameters, CoUDA achieves state-of-the-art performance in both pointwise scoring and pairwise ranking tasks, even surpassing recent GPT-3.5 and GPT-4 based metrics.

Submitted 31 March, 2024; originally announced April 2024.

Comments: NAACL 2024

arXiv:2404.00432 [pdf, other]

doi 10.1109/ICMEW59549.2023.00038

Flexible Variable-Rate Image Feature Compression for Edge-Cloud Systems

Authors: Md Adnan Faisal Hossain, Zhihao Duan, Yuning Huang, Fengqing Zhu

Abstract: Feature compression is a promising direction for coding for machines. Existing methods have made substantial progress, but they require designing and training separate neural network models to meet different specifications of compression rate, performance accuracy and computational complexity. In this paper, we present a flexible variable-rate feature compression method that can operate at a range of rates by taking a rate control parameter as an input to the neural network model. By compressing different intermediate features of a pre-trained vision task model, the proposed method can scale the encoding complexity without changing the overall size of the model. The proposed method is more flexible than existing baselines while outperforming them in terms of the three-way trade-off between feature compression rate, vision task accuracy, and encoding complexity. We have made the source code available at https://github.com/adnan-hossain/var_feat_comp.git.

Submitted 30 March, 2024; originally announced April 2024.

Comments: 6 pages, 7 figures, 1 table, International Conference on Multimedia and Expo Workshops 2023

arXiv:2403.18535 [pdf, other]

Theoretical Bound-Guided Hierarchical VAE for Neural Image Codecs

Authors: Yichi Zhang, Zhihao Duan, Yuning Huang, Fengqing Zhu

Abstract: Recent studies reveal a significant theoretical link between variational autoencoders (VAEs) and rate-distortion theory, notably in utilizing VAEs to estimate the theoretical upper bound of the information rate-distortion function of images. Such estimated theoretical bounds substantially exceed the performance of existing neural image codecs (NICs). To narrow this gap, we propose a theoretical bound-guided hierarchical VAE (BG-VAE) for NIC. The proposed BG-VAE leverages the theoretical bound to guide the NIC model towards enhanced performance. We implement the BG-VAE using Hierarchical VAEs and demonstrate its effectiveness through extensive experiments. Along with advanced neural network blocks, we provide a versatile, variable-rate NIC that outperforms existing methods when considering both rate-distortion performance and computational complexity. The code is available at BG-VAE.

Submitted 27 March, 2024; originally announced March 2024.

Comments: 2024 IEEE International Conference on Multimedia and Expo (ICME2024)
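
The VAE and rate-distortion link the abstract relies on can be written down for a hierarchical VAE with latent groups $z_1, \dots, z_L$ ordered top-down ($z_1$ at the top, $z_{<1}$ empty). Under the standard reading, which is an assumption here since the abstract does not give BG-VAE's exact objective, the per-level KL terms form the rate $R$, the reconstruction term forms the distortion $D$, and a codec trades them off with a multiplier $\lambda$:

```latex
\mathcal{L}_\lambda(x) =
\underbrace{\sum_{l=1}^{L} \mathbb{E}_{q(z_{<l}\mid x)}\Big[
  D_{\mathrm{KL}}\big(q(z_l \mid z_{<l}, x)\,\big\|\,p(z_l \mid z_{<l})\big)\Big]}_{R\ \text{(rate)}}
\;+\; \lambda\,
\underbrace{\mathbb{E}_{q(z_{1:L}\mid x)}\big[d\big(x, \hat{x}(z_{1:L})\big)\big]}_{D\ \text{(distortion)}}
```

With $\lambda = 1$ and $d(x, \hat{x}) = -\log p(x \mid z_{1:L})$, this is exactly the negative ELBO; sweeping $\lambda$ traces an operational rate-distortion curve, and a variable-rate codec conditions one model on $\lambda$ instead of training one model per point.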

arXiv:2403.18294 [pdf, other]

Multi-scale Unified Network for Image Classification

Authors: Wenzhuo Liu, Fei Zhu, Cheng-Lin Liu

Abstract: Convolutional Neural Networks (CNNs) have advanced significantly in visual representation learning and recognition. However, they face notable challenges in performance and computational efficiency when dealing with real-world, multi-scale image inputs. Conventional methods rescale all input images to a fixed size, where a larger fixed size favors performance, but upscaling small images to a larger size introduces digitization noise and increases computational cost. In this work, we carry out a comprehensive, layer-wise investigation of how CNN models respond to scale variation, based on Centered Kernel Alignment (CKA) analysis. The observations reveal that lower layers are more sensitive to input image scale variations than high-level layers. Inspired by this insight, we propose the Multi-scale Unified Network (MSUN), consisting of multi-scale subnets, a unified network, and a scale-invariant constraint. Our method divides the shallow layers into multi-scale subnets to enable feature extraction from multi-scale inputs, and the low-level features are unified in deep layers to extract high-level semantic features. A scale-invariant constraint is imposed to maintain feature consistency across different scales. Extensive experiments on ImageNet and other scale-diverse datasets demonstrate that MSUN achieves significant improvements in both model performance and computational efficiency. In particular, MSUN yields an accuracy increase of up to 44.53% and reduces FLOPs by 7.01-16.13% in multi-scale scenarios.

Submitted 27 March, 2024; originally announced March 2024.
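
The layer-wise analysis above is based on Centered Kernel Alignment. The abstract does not say which CKA variant was used; the sketch assumes linear CKA (Kornblith et al., 2019), which compares two activation matrices for the same n examples, for instance one layer's responses to the same images at two input scales. A value near 1 means the layer's representation is stable across the change. The activations here are random, hypothetical data.

```python
import numpy as np


def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two representations of the same n examples.

    X: (n, p) activations from one layer/scale; Y: (n, q) from another.
    CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F), after centering columns.
    """
    if X.shape[0] != Y.shape[0]:
        raise ValueError("X and Y must describe the same examples (same row count)")
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    return float(cross / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")))


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))                  # hypothetical layer activations
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # random orthogonal matrix
same = linear_cka(X, 3.0 * X @ Q)               # rotated and rescaled copy of X
unrelated = linear_cka(X, rng.normal(size=(100, 16)))
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of either representation, so `same` is 1 up to rounding, while independent activations give a much lower score. That invariance is what makes it usable for comparing layers whose individual channels do not correspond.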

arXiv:2403.18291 [pdf, other]

Towards Non-Exemplar Semi-Supervised Class-Incremental Learning

Authors: Wenzhuo Liu, Fei Zhu, Cheng-Lin Liu

Abstract: Deep neural networks perform remarkably well in closed-world scenarios. However, novel classes emerge continually in real applications, making it necessary to learn incrementally. Class-incremental learning (CIL) aims to gradually recognize new classes while maintaining the discriminability of old ones. Existing CIL methods have two limitations: a heavy reliance on preserving old data to mitigate forgetting and the need for vast labeled data for knowledge adaptation. To overcome these issues, we propose a non-exemplar semi-supervised CIL framework with contrastive learning and a semi-supervised incremental prototype classifier (Semi-IPC). On the one hand, contrastive learning helps the model learn rich representations, easing the trade-off between learning representations of new classes and forgetting those of old classes. On the other hand, Semi-IPC learns a prototype for each class with unsupervised regularization, enabling the model to incrementally learn from partially labeled new data while maintaining the knowledge of old classes. Experiments on benchmark datasets demonstrate the strong performance of our method: without storing any old samples and using less than 1% of labels, Semi-IPC outperforms advanced exemplar-based methods. We hope our work offers new insights for future CIL research. The code will be made publicly available.

Submitted 27 March, 2024; originally announced March 2024.
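The "non-exemplar" idea in this abstract is that old classes are represented by prototypes rather than stored samples. The sketch below illustrates only that aspect with a plain nearest-class-mean classifier; it is a hypothetical illustration, not Semi-IPC, which learns its prototypes jointly with the network under unsupervised regularization. The class `PrototypeClassifier` and its methods are assumed names.

```python
from typing import Dict, List, Sequence

Vector = List[float]


class PrototypeClassifier:
    """Hypothetical nearest-class-mean classifier keeping one prototype per class.

    Only the prototypes are retained, never the raw feature vectors, so new
    classes can be added later without revisiting old data.
    """

    def __init__(self) -> None:
        self.prototypes: Dict[str, Vector] = {}

    def add_class(self, label: str, features: Sequence[Vector]) -> None:
        """Register a class as the mean of its labeled feature vectors."""
        dim = len(features[0])
        self.prototypes[label] = [
            sum(f[d] for f in features) / len(features) for d in range(dim)
        ]

    def predict(self, x: Vector) -> str:
        """Return the label whose prototype is nearest in squared Euclidean distance."""

        def sq_dist(p: Vector) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, p))

        return min(self.prototypes, key=lambda lbl: sq_dist(self.prototypes[lbl]))


clf = PrototypeClassifier()
clf.add_class("cat", [[0.0, 0.0], [0.0, 1.0]])
clf.add_class("dog", [[5.0, 5.0], [6.0, 5.0]])
clf.add_class("bird", [[0.0, 10.0]])  # a later incremental step; no old samples needed
```

After the third call, `clf.predict([0.0, 9.0])` returns `"bird"` while `clf.predict([0.2, 0.4])` still returns `"cat"`, because the earlier prototypes are untouched. In the actual method, the feature extractor itself also changes across tasks, which is where the contrastive learning and regularization described above come in.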

arXiv:2403.18266 [pdf, other]

Branch-Tuning: Balancing Stability and Plasticity for Continual Self-Supervised Learning

Authors: Wenzhuo Liu, Fei Zhu, Cheng-Lin Liu

Abstract: Self-supervised learning (SSL) has emerged as an effective paradigm for deriving general representations from vast amounts of unlabeled data. However, as real-world applications continually integrate new content, the high computational and resource demands of SSL necessitate continual learning rather than complete retraining. This poses the challenge of striking a balance between stability and plasticity when adapting to new information. In this paper, we employ Centered Kernel Alignment to quantitatively analyze model stability and plasticity, revealing the critical roles of batch normalization layers for stability and convolutional layers for plasticity. Motivated by this, we propose Branch-tuning, an efficient and straightforward method that balances stability and plasticity in continual SSL. Branch-tuning consists of branch expansion and compression, and can be easily applied to various SSL methods without modifying the original methods or retaining old data or models. We validate our method through incremental experiments on various benchmark datasets, demonstrating its effectiveness and practical value in real-world scenarios. We hope our work offers new insights for future continual self-supervised learning research. The code will be made publicly available.

Submitted 27 March, 2024; originally announced March 2024.
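The abstract names "branch expansion and compression" without details. One way such a pair can work, assuming the added branch is a parallel convolution with the same kernel size whose output is summed with the main convolution and no nonlinearity sits between them, is structural re-parameterization: since convolution is linear in its kernel, the two kernels can be folded into one after training. The 1-D sketch below demonstrates only that linearity argument; `conv1d`, `expanded_forward` and `compress` are hypothetical names, and batch normalization is ignored.

```python
from typing import List

Vector = List[float]


def conv1d(x: Vector, k: Vector) -> Vector:
    """'Valid' 1-D cross-correlation, the operation deep-learning libraries call convolution."""
    n = len(x) - len(k) + 1
    return [sum(x[i + j] * k[j] for j in range(len(k))) for i in range(n)]


def expanded_forward(x: Vector, k_main: Vector, k_branch: Vector) -> Vector:
    """Expansion: a frozen main kernel plus a trainable parallel branch, outputs summed."""
    return [a + b for a, b in zip(conv1d(x, k_main), conv1d(x, k_branch))]


def compress(k_main: Vector, k_branch: Vector) -> Vector:
    """Compression: two same-size parallel kernels fold into one by element-wise addition."""
    return [a + b for a, b in zip(k_main, k_branch)]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
k_main = [1.0, 0.0, -1.0]     # stands in for the kernel learned on earlier data
k_branch = [0.5, 0.5, 0.5]    # stands in for the branch trained on new data
merged = compress(k_main, k_branch)
```

Here `expanded_forward(x, k_main, k_branch)` and `conv1d(x, merged)` both give `[1.0, 2.5, 4.0]`, so the model returns to its original size after the branch is absorbed. This is a design sketch of the general re-parameterization trick, not a description of the paper's exact procedure.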

Showing 1–50 of 644 results for author: Zhu, F