GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting

Authors: Haodong Chen, Yongle Huang, Haojian Huang, Xiangsheng Ge, Dian Shao

Abstract: The increasing prominence of e-commerce has underscored the importance of Virtual Try-On (VTON). However, previous studies predominantly focus on the 2D realm and rely heavily on extensive data for training. Research on 3D VTON primarily centers on garment-body shape compatibility, a topic extensively covered in 2D VTON. Thanks to advances in 3D scene editing, a 2D diffusion model has now been adapted for 3D editing via multi-viewpoint editing. In this work, we propose GaussianVTON, an innovative 3D VTON pipeline integrating Gaussian Splatting (GS) editing with 2D VTON. To facilitate a seamless transition from 2D to 3D VTON, we propose, for the first time, the use of only images as editing prompts for 3D editing. To further address issues, e.g., face blurring, garment inaccuracy, and degraded viewpoint quality during editing, we devise a three-stage refinement strategy to gradually mitigate potential issues. Furthermore, we introduce a new editing strategy termed Edit Recall Reconstruction (ERR) to tackle the limitations of previous editing strategies in leading to complex geometric changes. Our comprehensive experiments demonstrate the superiority of GaussianVTON, offering a novel perspective on 3D VTON while also establishing a novel starting point for image-prompting 3D scene editing.

Submitted 23 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

Comments: On-going work

arXiv:2405.07303 [pdf, other]

Search for solar axions by Primakoff effect with the full dataset of the CDEX-1B Experiment

Authors: L. T. Yang, S. K. Liu, Q. Yue, K. J. Kang, Y. J. Li, H. P. An, Greeshma C., J. P. Chang, Y. H. Chen, J. P. Cheng, W. H. Dai, Z. Deng, C. H. Fang, X. P. Geng, H. Gong, Q. J. Guo, T. Guo, X. Y. Guo, L. He, J. R. He, J. W. Hu, H. X. Huang, T. C. Huang, L. Jiang, S. Karmakar , et al. (61 additional authors not shown)

Abstract: We present the first limit on $g_{Aγ}$ coupling constant using the Bragg-Primakoff conversion based on an exposure of 1107.5 kg days of data from the CDEX-1B experiment at the China Jinping Underground Laboratory. The data are consistent with the null signal hypothesis, and no excess signals are observed. Limits of the coupling $g_{Aγ}<2.08\times10^{-9}$ GeV$^{-1}$ (95\% C.L.) are derived for axions with mass up to 100 eV/$c^2$. Within the hadronic model of KSVZ, our results exclude axion mass $>5.3~\rm{eV}/c^2$ at 95\% C.L.

Submitted 12 May, 2024; originally announced May 2024.

Comments: 7 pages, 5 figures

arXiv:2405.07276 [pdf]

doi 10.1016/j.nimb.2024.165407

Bremsstrahlung of 5-25 keV electrons incident on MoSi$_2$, TiB$_2$ and ZrB$_2$ thick solid conductive compounds

Authors: Heng Zhang, Zhu An, Jingjun Zhu, Hong Huang

Abstract: Absolute measurements were conducted to study the bremsstrahlung emission from ~5-25 keV electrons incident on three thick solid conductive compounds of MoSi$_2$, TiB$_2$ and ZrB$_2$. The additivity approximation was applied in the Monte Carlo PENELOPE simulations for compounds and mixtures. The results showed that in general the experimental bremsstrahlung spectra were in good agreement with the Monte Carlo simulation results, suggesting the feasibility of the additivity approximation in Monte Carlo simulations for the studied cases even in the absolute measurements and that the significant differences between experiments and Monte Carlo simulations near the Duane-Hunt limit for insulating targets in previous studies do not appear in the present studies.

Submitted 12 May, 2024; originally announced May 2024.

arXiv:2405.04883 [pdf, other]

FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion

Authors: Zehan Wang, Ziang Zhang, Xize Cheng, Rongjie Huang, Luping Liu, Zhenhui Ye, Haifeng Huang, Yang Zhao, Tao Jin, Peng Gao, Zhou Zhao

Abstract: Unified multi-model representation spaces are the foundation of multimodal understanding and generation. However, the billions of model parameters and catastrophic forgetting problems make it challenging to further enhance pre-trained unified spaces. In this work, we propose FreeBind, an idea that treats multimodal representation spaces as basic units, and freely augments pre-trained unified space by integrating knowledge from extra expert spaces via "space bonds". Specifically, we introduce two kinds of basic space bonds: 1) Space Displacement Bond and 2) Space Combination Bond. Based on these basic bonds, we design Complex Sequential & Parallel Bonds to effectively integrate multiple spaces simultaneously. Benefiting from the modularization concept, we further propose a coarse-to-fine customized inference strategy to flexibly adjust the enhanced unified space for different purposes. Experimentally, we bind ImageBind with extra image-text and audio-text expert spaces, resulting in three main variants: ImageBind++, InternVL_IB, and InternVL_IB++. These resulting spaces outperform ImageBind on 5 audio-image-text downstream tasks across 9 datasets. Moreover, via customized inference, it even surpasses the advanced audio-text and image-text expert spaces.

Submitted 10 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

Comments: Accepted by ICML 2024. The code and checkpoints will be released at https://github.com/zehanwang01/FreeBind

arXiv:2405.04686 [pdf]

Ultrafast dynamics of wavelength-sensitive magnons in unconventional compensated semiconducting antiferromagnet

Authors: Hanshen Huang, Tao Qu, Yang Cheng, Lixuan Tai, Christopher Eckberg, Quanjun Pan, Abdullah Alrasheed, Su Kong Chong, Bingqian Dai, Yaochen Li, Qingyuan Shu, Chao-Yao Yang, Jie-Xiang Yu, Gen Yin, Kang L. Wang

Abstract: Antiferromagnet is a promising candidate for the next generation spintronic devices, benefiting from its ultrafast dynamics and spontaneous zero stray field. However, the understanding of their ultrafast spin behaviors is lacking due to the challenges of controlling/detecting the quenched net magnetization. Unconventional compensated semiconducting antiferromagnets present strong time-reversal symmetry breaking, spin splitting in the momentum space, and suitable bandgap for optical control/detection. Thus, it is a powerful platform to uncover the ultrafast dynamics of antiferromagnets. Here, we show an exotic wavelength-dependent spin dynamic in the unconventional compensated semiconducting antiferromagnet α-MnTe via time-resolved quadratic magneto-optical Kerr effect measurement, where the probing photon energy of the laser matches its bandgap. This direct excitation and detection of distinct magnon modes reveal varying spin behaviors and time characteristics in a broad temperature range. It originates from the spins triggered at different bands of electronic structures and is depicted in an energy transfer model among electrons, phonons, and magnons. Our study of exotic optical properties in this unconventional semiconducting antiferromagnet fulfills the missing information of spin evolution in the time domain and paves the way for its utilization in ultrafast spintronic devices.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.04093 [pdf, other]

DCNN: Dual Cross-current Neural Networks Realized Using An Interactive Deep Learning Discriminator for Fine-grained Objects

Authors: Da Fu, Mingfei Rong, Eun-Hu Kim, Hao Huang, Witold Pedrycz

Abstract: Accurate classification of fine-grained images remains a challenge in backbones based on convolutional operations or self-attention mechanisms. This study proposes novel dual-current neural networks (DCNN), which combine the advantages of convolutional operations and self-attention mechanisms to improve the accuracy of fine-grained image classification. The main novel design features for constructing a weakly supervised learning backbone model DCNN include (a) extracting heterogeneous data, (b) keeping the feature map resolution unchanged, (c) expanding the receptive field, and (d) fusing global representations and local features. Experimental results demonstrated that using DCNN as the backbone network for classifying certain fine-grained benchmark datasets achieved performance advantage improvements of 13.5--19.5% and 2.2--12.9%, respectively, compared to other advanced convolution or attention-based fine-grained backbones.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.04065 [pdf, other]

FlashBack:Efficient Retrieval-Augmented Language Modeling for Long Context Inference

Authors: Runheng Liu, Xingchen Xiao, Heyan Huang, Zewen Chi, Zhijing Wu

Abstract: Retrieval-Augmented Language Modeling (RALM) by integrating large language models (LLM) with relevant documents from an external corpus is a proven method for enabling the LLM to generate information beyond the scope of its pre-training corpus. Previous work utilizing retrieved content by simply prepending it to the input poses a high runtime issue, which degrades the inference efficiency of the LLMs because they fail to use the Key-Value (KV) cache efficiently. In this paper, we propose FlashBack, a modular RALM designed to improve the inference efficiency of RALM with appending context pattern while maintaining decent performance after fine-tuning by Low-Rank Adaption. FlashBack appends retrieved documents at the end of the context for efficiently utilizing the KV cache instead of prepending them. And we introduce Marking Token as two special prompt tokens for marking the boundary of the appending context during fine-tuning. Our experiments on testing generation quality show that FlashBack can remain decent generation quality in perplexity. And the inference speed of FlashBack is up to $4\times$ faster than the prepending counterpart on a 7B LLM (Llama 2) in the runtime test. Via bypassing unnecessary re-computation, it demonstrates an advancement by achieving significantly faster inference speed, and this heightened efficiency will substantially reduce inferential cost.

Submitted 16 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

Comments: 14 pages
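
Illustrative note on the FlashBack entry above: the abstract attributes the speed-up to better KV-cache reuse when retrieved documents are appended rather than prepended. The toy sketch below only counts tokens and is not the authors' implementation. It assumes a cache that can be reused for the longest common prefix of the old and new input, and the names `reusable_prefix_len` and `tokens_to_recompute` are hypothetical helpers introduced here. The Marking Tokens and LoRA fine-tuning mentioned in the abstract are left out.

```python
def reusable_prefix_len(cached_tokens, new_tokens):
    """Length of the longest common prefix (assumed reusable from a KV cache)."""
    n = 0
    for old, new in zip(cached_tokens, new_tokens):
        if old != new:
            break
        n += 1
    return n


def tokens_to_recompute(cached_tokens, new_tokens):
    """Tokens whose keys/values must be recomputed for the new input."""
    return len(new_tokens) - reusable_prefix_len(cached_tokens, new_tokens)


# Hypothetical token sequences: a running context and two retrieved documents.
context = ["c1", "c2", "c3", "c4"]
doc_a = ["a1", "a2"]
doc_b = ["b1", "b2"]

# Prepending: swapping the retrieved document changes the very first token,
# so nothing in the cache is reusable.
prepend_cost = tokens_to_recompute(doc_a + context, doc_b + context)

# Appending: the context prefix is unchanged, so only the new document
# has to be processed.
append_cost = tokens_to_recompute(context + doc_a, context + doc_b)

print(prepend_cost, append_cost)
```

Under these assumptions the prepending layout recomputes every token whenever the retrieved document changes, while the appending layout recomputes only the document tokens.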

arXiv:2405.03969 [pdf, other]

Speak the Same Language: Global LiDAR Registration on BIM Using Pose Hough Transform

Authors: Zhijian Qiao, Haoming Huang, Chuhao Liu, Shaojie Shen, Fumin Zhang, Huan Yin

Abstract: The construction and robotic sensing data originate from disparate sources and are associated with distinct frames of reference. The primary objective of this study is to align LiDAR point clouds with building information modeling (BIM) using a global point cloud registration approach, aimed at establishing a shared understanding between the two modalities, i.e., ``speak the same language''. To achieve this, we design a cross-modality registration method, spanning from the front end to the back end. At the front end, we extract descriptors by identifying walls and capturing the intersected corners. Subsequently, for the back-end pose estimation, we employ the Hough transform for pose estimation and estimate multiple pose candidates. The final pose is verified by wall-pixel correlation. To evaluate the effectiveness of our method, we conducted real-world multi-session experiments in a large-scale university building, involving two different types of LiDAR sensors. We also report our findings and plan to make our collected dataset open-sourced.

Submitted 6 May, 2024; originally announced May 2024.

Comments: 12 pages, 10 figures

arXiv:2405.03485 [pdf, other]

doi 10.1145/3641519.3657422

LGTM: Local-to-Global Text-Driven Human Motion Diffusion Model

Authors: Haowen Sun, Ruikun Zheng, Haibin Huang, Chongyang Ma, Hui Huang, Ruizhen Hu

Abstract: In this paper, we introduce LGTM, a novel Local-to-Global pipeline for Text-to-Motion generation. LGTM utilizes a diffusion-based architecture and aims to address the challenge of accurately translating textual descriptions into semantically coherent human motion in computer animation. Specifically, traditional methods often struggle with semantic discrepancies, particularly in aligning specific motions to the correct body parts. To address this issue, we propose a two-stage pipeline to overcome this challenge: it first employs large language models (LLMs) to decompose global motion descriptions into part-specific narratives, which are then processed by independent body-part motion encoders to ensure precise local semantic alignment. Finally, an attention-based full-body optimizer refines the motion generation results and guarantees the overall coherence. Our experiments demonstrate that LGTM gains significant improvements in generating locally accurate, semantically-aligned human motion, marking a notable advancement in text-to-motion applications. Code and data for this paper are available at https://github.com/L-Sun/LGTM

Submitted 6 May, 2024; originally announced May 2024.

Comments: 9 pages, 7 figures, SIGGRAPH 2024

arXiv:2405.03221 [pdf, other]

Spatial and Surface Correspondence Field for Interaction Transfer

Authors: Zeyu Huang, Honghao Xu, Haibin Huang, Chongyang Ma, Hui Huang, Ruizhen Hu

Abstract: In this paper, we introduce a new method for the task of interaction transfer. Given an example interaction between a source object and an agent, our method can automatically infer both surface and spatial relationships for the agent and target objects within the same category, yielding more accurate and valid transfers. Specifically, our method characterizes the example interaction using a combined spatial and surface representation. We correspond the agent points and object points related to the representation to the target object space using a learned spatial and surface correspondence field, which represents objects as deformed and rotated signed distance fields. With the corresponded points, an optimization is performed under the constraints of our spatial and surface interaction representation and additional regularization. Experiments conducted on human-chair and hand-mug interaction transfer tasks show that our approach can handle larger geometry and topology variations between source and target shapes, significantly outperforming state-of-the-art methods.

Submitted 6 May, 2024; originally announced May 2024.

Comments: Accepted to SIGGRAPH 2024, project page at https://vcc.tech/research/2024/InterTransfer

arXiv:2405.02982 [pdf, other]

Paintings and Drawings Aesthetics Assessment with Rich Attributes for Various Artistic Categories

Authors: Xin Jin, Qianqian Qiao, Yi Lu, Shan Gao, Heng Huang, Guangdong Li

Abstract: Image aesthetic evaluation is a highly prominent research domain in the field of computer vision. In recent years, there has been a proliferation of datasets and corresponding evaluation methodologies for assessing the aesthetic quality of photographic works, leading to the establishment of a relatively mature research environment. However, in contrast to the extensive research in photographic aesthetics, the field of aesthetic evaluation for paintings and drawings has seen limited attention until the introduction of the BAID dataset in March 2023. This dataset solely comprises overall scores for high-quality artistic images. Our research marks the pioneering introduction of a multi-attribute, multi-category dataset specifically tailored to the field of painting: Aesthetics of Paintings and Drawings Dataset (APDD). The construction of APDD received active participation from 28 professional artists worldwide, along with dozens of students specializing in the field of art. This dataset encompasses 24 distinct artistic categories and 10 different aesthetic attributes. Each image in APDD has been evaluated by six professionally trained experts in the field of art, including assessments for both total aesthetic scores and aesthetic attribute scores. The final APDD dataset comprises a total of 4985 images, with an annotation count exceeding 31100 entries. Concurrently, we propose an innovative approach: Art Assessment Network for Specific Painting Styles (AANSPS), designed for the assessment of aesthetic attributes in mixed-attribute art datasets. Through this research, our goal is to catalyze advancements in the field of aesthetic evaluation for paintings and drawings, while enriching the available resources and methodologies for its further development and application.

Submitted 5 May, 2024; originally announced May 2024.

arXiv:2405.01102 [pdf, other]

Less is More: on the Over-Globalizing Problem in Graph Transformers

Authors: Yujie Xing, Xiao Wang, Yibo Li, Hai Huang, Chuan Shi

Abstract: Graph Transformer, due to its global attention mechanism, has emerged as a new tool in dealing with graph-structured data. It is well recognized that the global attention mechanism considers a wider receptive field in a fully connected graph, leading many to believe that useful information can be extracted from all the nodes. In this paper, we challenge this belief: does the globalizing property always benefit Graph Transformers? We reveal the over-globalizing problem in Graph Transformer by presenting both empirical evidence and theoretical analysis, i.e., the current attention mechanism overly focuses on those distant nodes, while the near nodes, which actually contain most of the useful information, are relatively weakened. Then we propose a novel Bi-Level Global Graph Transformer with Collaborative Training (CoBFormer), including the inter-cluster and intra-cluster Transformers, to prevent the over-globalizing problem while keeping the ability to extract valuable information from distant nodes. Moreover, the collaborative training is proposed to improve the model's generalization ability with a theoretical guarantee. Extensive experiments on various graphs well validate the effectiveness of our proposed CoBFormer.

Submitted 24 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

Comments: Accepted by ICML 2024 (Camera-Ready)

arXiv:2405.00749 [pdf, other]

More is Better: Deep Domain Adaptation with Multiple Sources

Authors: Sicheng Zhao, Hui Chen, Hu Huang, Pengfei Xu, Guiguang Ding

Abstract: In many practical applications, it is often difficult and expensive to obtain large-scale labeled data to train state-of-the-art deep neural networks. Therefore, transferring the learned knowledge from a separate, labeled source domain to an unlabeled or sparsely labeled target domain becomes an appealing alternative. However, direct transfer often results in significant performance decay due to domain shift. Domain adaptation (DA) aims to address this problem by aligning the distributions between the source and target domains. Multi-source domain adaptation (MDA) is a powerful and practical extension in which the labeled data may be collected from multiple sources with different distributions. In this survey, we first define various MDA strategies. Then we systematically summarize and compare modern MDA methods in the deep learning era from different perspectives, followed by commonly used datasets and a brief benchmark. Finally, we discuss future research directions for MDA that are worth investigating.

Submitted 30 April, 2024; originally announced May 2024.

Comments: Accepted by IJCAI 2024. arXiv admin note: text overlap with arXiv:2002.12169

arXiv:2405.00393 [pdf, other]

Inferring State Machine from the Protocol Implementation via Large Language Model

Authors: Haiyang Wei, Zhengjie Du, Haohui Huang, Yue Liu, Guang Cheng, Linzhang Wang, Bing Mao

Abstract: State machines play a pivotal role in augmenting the efficacy of protocol analyzing to unveil more vulnerabilities. However, the task of inferring state machines from network protocol implementations presents significant challenges. Traditional methods based on dynamic analysis often overlook crucial state transitions due to limited coverage, while static analysis faces difficulties with complex code structures and behaviors. To address these limitations, we propose an innovative state machine inference approach powered by Large Language Models (LLMs). Utilizing text-embedding technology, this method allows LLMs to dissect and analyze the intricacies of protocol implementation code. Through targeted prompt engineering, we systematically identify and infer the underlying state machines. Our evaluation across six protocol implementations demonstrates the method's high efficacy, achieving an accuracy rate exceeding 90% and successfully delineating differences on state machines among various implementations of the same protocol. Importantly, integrating this approach with protocol fuzzing has notably enhanced AFLNet's code coverage by 10% over RFCNLP, showcasing the considerable potential of LLMs in advancing network protocol security analysis. Our proposed method not only marks a significant step forward in accurate state machine inference but also opens new avenues for improving the security and reliability of protocol implementations.

Submitted 14 June, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

arXiv:2405.00125 [pdf, other]

Neural network based emulation of galaxy power spectrum covariances -- A reanalysis of BOSS DR12 data

Authors: Joseph Adamo, Hung-Jin Huang, Tim Eifler

Abstract: We train neural networks to quickly generate redshift-space galaxy power spectrum covariances from a given parameter set (cosmology and galaxy bias). This covariance emulator utilizes a combination of traditional fully-connected network layers and transformer architecture to accurately predict covariance matrices for the high redshift, north galactic cap sample of the BOSS DR12 galaxy catalog. We run simulated likelihood analyses with emulated and brute-force computed covariances, and we quantify the network's performance via two different metrics: 1) difference in $χ^2$ and 2) likelihood contours for simulated BOSS DR 12 analyses. We find that the emulator returns excellent results over a large parameter range. We then use our emulator to perform a re-analysis of the BOSS HighZ NGC galaxy power spectrum, and find that varying covariance with cosmology along with the model vector produces $Ω_m = 0.276^{+0.013}_{-0.015}$, $H_0 = 70.2\pm 1.9$ km/s/Mpc, and $σ_8 = 0.674^{+0.058}_{-0.077}$. These constraints represent an average $0.46σ$ shift in best-fit values and a $5\%$ increase in constraining power compared to fixing the covariance matrix ($Ω_m = 0.293\pm 0.017$, $H_0 = 70.3\pm 2.0$ km/s/Mpc, $σ_8 = 0.702^{+0.063}_{-0.075}$). This work demonstrates that emulators for more complex cosmological quantities than second-order statistics can be trained over a wide parameter range at sufficiently high accuracy to be implemented in realistic likelihood analyses.

Submitted 30 April, 2024; originally announced May 2024.

Comments: 11 pages, 8 figures, to be submitted to Physical Review D

arXiv:2404.19368 [pdf, other]

Exploring Multi-Lingual Bias of Large Code Models in Code Generation

Authors: Chaozheng Wang, Zongjie Li, Cuiyun Gao, Wenxuan Wang, Ting Peng, Hailiang Huang, Yuetang Deng, Shuai Wang, Michael R. Lyu

Abstract: Code generation aims to synthesize code and fulfill functional requirements based on natural language (NL) specifications, which can greatly improve development efficiency. In the era of large language models (LLMs), large code models (LCMs) have been recently proposed to generate source code. LCMs can generate highly feasible solutions for programming problems described in natural language. Despite the effectiveness, we observe a noticeable multilingual bias in the generation performance of LCMs. Specifically, LCMs demonstrate proficiency in generating solutions when provided with instructions in English, yet may falter when faced with semantically equivalent instructions in other NLs such as Chinese. Moreover, the ability of LCMs to generate code exhibits variety across different programming languages (PLs), such as Python and C++. The observed phenomenon indicates the presence of multi-lingual bias within the generative capabilities of LCMs, which has remained unexplored. In this paper, we aim to investigate the multi-lingual bias that exists in current LCMs. First, we initiate our investigation by constructing the first multi-lingual evaluation benchmark X-HumanEval-X, enabling us to systematically evaluate the extent of multi-lingual bias that exists in current LCMs. In our large-scale experiments on nine popular LCMs, we observe a pronounced multi-lingual bias of LCMs in code generation, including multi-NL and multi-PL bias. Specifically, when using Chinese instructions, the code generation capabilities of LCMs decrease by at least 13% in terms of the Pass@1 metric. Furthermore, LCMs perform variously across different programming languages, e.g., the performance gap between Python and C++ reaches as high as 20.9%. ...

Submitted 30 April, 2024; originally announced April 2024.

Comments: 12 pages

arXiv:2404.18753 [pdf, ps, other]

Fixers and derangements of finite permutation groups

Authors: Hong Yi Huang, Cai Heng Li, Yi Lin Xie

Abstract: Let $G\leqslant\mathrm{Sym}(Ω)$ be a finite transitive permutation group with point stabiliser $H$. We say that a subgroup $K$ of $G$ is a fixer if every element of $K$ has fixed points, and we say that $K$ is large if $|K| \geqslant |H|$. There is a special interest in studying large fixers due to connections with Erdős-Ko-Rado type problems. In this paper, we classify up to conjugacy the large fixers of the almost simple primitive groups with socle $\mathrm{PSL}_2(q)$, and we use this result to verify a special case of a conjecture of Spiga on permutation characters. We also present some results on large fixers of almost simple primitive groups with socle an alternating or sporadic group.

Submitted 29 April, 2024; originally announced April 2024.

Comments: 40 pages

arXiv:2404.18644 [pdf, other]

Low-Overhead Defect-Adaptive Surface Code with Bandage-Like Super-Stabilizers

Authors: Zuolin Wei, Tan He, Yangsen Ye, Dachao Wu, Yiming Zhang, Youwei Zhao, Wei** Lin, He-Liang Huang, Xiaobo Zhu, Jian-Wei Pan

Abstract: To make practical quantum algorithms work, large-scale quantum processors protected by error-correcting codes are required to resist noise and ensure reliable computational outcomes. However, a major challenge arises from defects in processor fabrication, as well as occasional losses or cosmic rays during the computing process, all of which can lead to qubit malfunctions and disrupt error-correcting codes' normal operations. In this context, we introduce an automatic adapter to implement the surface code on defective lattices. Unlike previous approaches, this adapter leverages newly proposed bandage-like super-stabilizers to save more qubits when defects are clustered, thus enhancing the code distance and reducing super-stabilizer weight. For instance, in comparison with earlier methods, with a code size of 27 and a random defect rate of 2\%, the disabled qubits decrease by $1/3$, and the average preserved code distance increases by 63\%. This demonstrates a significant reduction in overhead when handling defects using our approach, and this advantage amplifies with increasing processor size and defect rates. Our work presents a low-overhead, automated solution to the challenge of adapting the surface code to defects, an essential step towards scaling up the construction of large-scale quantum computers for practical applications.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.18202 [pdf, other]

WorldGPT: Empowering LLM as Multimodal World Model

Authors: Zhiqi Ge, Hongzhe Huang, Mingze Zhou, Juncheng Li, Guoming Wang, Siliang Tang, Yueting Zhuang

Abstract: World models are progressively being employed across diverse fields, extending from basic environment simulation to complex scenario construction. However, existing models are mainly trained on domain-specific states and actions, and confined to single-modality state representations. In this paper, we introduce WorldGPT, a generalist world model built upon Multimodal Large Language Model (MLLM). WorldGPT acquires an understanding of world dynamics through analyzing millions of videos across various domains. To further enhance WorldGPT's capability in specialized scenarios and long-term tasks, we have integrated it with a novel cognitive architecture that combines memory offloading, knowledge retrieval, and context reflection. As for evaluation, we build WorldNet, a multimodal state transition prediction benchmark encompassing varied real-life scenarios. Conducting evaluations on WorldNet directly demonstrates WorldGPT's capability to accurately model state transition patterns, affirming its effectiveness in understanding and predicting the dynamics of complex scenarios. We further explore WorldGPT's emerging potential in serving as a world simulator, helping multimodal agents generalize to unfamiliar domains through efficiently synthesising multimodal instruction instances which are proved to be as reliable as authentic data for fine-tuning purposes. The project is available on \url{https://github.com/DCDmllm/WorldGPT}.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.17238 [pdf, other]

TruthSR: Trustworthy Sequential Recommender Systems via User-generated Multimodal Content

Authors: Meng Yan, Haibin Huang, Ying Liu, Juan Zhao, Xiyue Gao, Cai Xu, Ziyu Guan, Wei Zhao

Abstract: Sequential recommender systems explore users' preferences and behavioral patterns from their historically generated data. Recently, researchers aim to improve sequential recommendation by utilizing massive user-generated multi-modal content, such as reviews, images, etc. This content often contains inevitable noise. Some studies attempt to reduce noise interference by suppressing cross-modal inconsistent information. However, they could potentially constrain the capturing of personalized user preferences. In addition, it is almost impossible to entirely eliminate noise in diverse user-generated multi-modal content. To solve these problems, we propose a trustworthy sequential recommendation method via noisy user-generated multi-modal content. Specifically, we explicitly capture the consistency and complementarity of user-generated multi-modal content to mitigate noise interference. We also achieve the modeling of the user's multi-modal sequential preferences. In addition, we design a trustworthy decision mechanism that integrates subjective user perspective and objective item perspective to dynamically evaluate the uncertainty of prediction results. Experimental evaluation on four widely-used datasets demonstrates the superior performance of our model compared to state-of-the-art methods. The code is released at https://github.com/FairyMeng/TrustSR.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.17169 [pdf, other]

FairGT: A Fairness-aware Graph Transformer

Authors: Renqiang Luo, Huafei Huang, Shuo Yu, Xiuzhen Zhang, Feng Xia

Abstract: The design of Graph Transformers (GTs) generally neglects considerations for fairness, resulting in biased outcomes against certain sensitive subgroups. Since GTs encode graph information without relying on message-passing mechanisms, conventional fairness-aware graph learning methods cannot be directly applicable to address these issues. To tackle this challenge, we propose FairGT, a Fairness-aware Graph Transformer explicitly crafted to mitigate fairness concerns inherent in GTs. FairGT incorporates a meticulous structural feature selection strategy and a multi-hop node feature integration method, ensuring independence of sensitive features and bolstering fairness considerations. These fairness-aware graph information encodings seamlessly integrate into the Transformer framework for downstream tasks. We also prove that the proposed fair structural topology encoding with adjacency matrix eigenvector selection and multi-hop integration are theoretically effective. Empirical evaluations conducted across five real-world datasets demonstrate FairGT's superiority in fairness metrics over existing graph transformers, graph neural networks, and state-of-the-art fairness-aware graph learning approaches.

Submitted 26 April, 2024; originally announced April 2024.

Journal ref: IJCAI2024

arXiv:2404.16027 [pdf, other]

ORBIT-Surgical: An Open-Simulation Framework for Learning Surgical Augmented Dexterity

Authors: Qinxi Yu, Masoud Moghani, Karthik Dharmarajan, Vincent Schorp, William Chung-Ho Panitch, **gzhou Liu, Kush Hari, Huang Huang, Mayank Mittal, Ken Goldberg, Animesh Garg

Abstract: Physics-based simulations have accelerated progress in robot learning for driving, manipulation, and locomotion. Yet, a fast, accurate, and robust surgical simulation environment remains a challenge. In this paper, we present ORBIT-Surgical, a physics-based surgical robot simulation framework with photorealistic rendering in NVIDIA Omniverse. We provide 14 benchmark surgical tasks for the da Vinci Research Kit (dVRK) and Smart Tissue Autonomous Robot (STAR) which represent common subtasks in surgical training. ORBIT-Surgical leverages GPU parallelization to train reinforcement learning and imitation learning algorithms to facilitate study of robot learning to augment human surgical skills. ORBIT-Surgical also facilitates realistic synthetic data generation for active perception tasks. We demonstrate ORBIT-Surgical sim-to-real transfer of learned policies onto a physical dVRK robot. Project website: orbit-surgical.github.io

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.15789 [pdf, other]

MotionMaster: Training-free Camera Motion Transfer For Video Generation

Authors: Teng Hu, Jiangning Zhang, Ran Yi, Yating Wang, Hongrui Huang, Jieyu Weng, Yabiao Wang, Lizhuang Ma

Abstract: The emergence of diffusion models has greatly propelled the progress in image and video generation. Recently, some efforts have been made in controllable video generation, including text-to-video generation and video motion control, among which camera motion control is an important topic. However, existing camera motion control methods rely on training a temporal camera module, and necessitate substantial computation resources due to the large amount of parameters in video generation models. Moreover, existing methods pre-define camera motion types during training, which limits their flexibility in camera control. Therefore, to reduce training costs and achieve flexible camera control, we propose COMD, a novel training-free video motion transfer model, which disentangles camera motions and object motions in source videos and transfers the extracted camera motions to new videos. We first propose a one-shot camera motion disentanglement method to extract camera motion from a single source video, which separates the moving objects from the background and estimates the camera motion in the moving objects region based on the motion in the background by solving a Poisson equation. Furthermore, we propose a few-shot camera motion disentanglement method to extract the common camera motion from multiple videos with similar camera motions, which employs a window-based clustering technique to extract the common features in temporal attention maps of multiple videos. Finally, we propose a motion combination method to combine different types of camera motions together, enabling our model a more controllable and flexible camera control. Extensive experiments demonstrate that our training-free approach can effectively decouple camera-object motion and apply the decoupled camera motion to a wide range of controllable video generation tasks, achieving flexible and diverse camera motion control.

Submitted 30 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.14771 [pdf, other]

Music Style Transfer With Diffusion Model

Authors: Hong Huang, Yuyi Wang, Luyao Li, Jun Lin

Abstract: Previous studies on music style transfer have mainly focused on one-to-one style conversion, which is relatively limited. When considering the conversion between multiple styles, previous methods required designing multiple modes to disentangle the complex style of the music, resulting in large computational costs and slow audio generation. The existing music style transfer methods generate spectrograms with artifacts, leading to significant noise in the generated audio. To address these issues, this study proposes a music style transfer framework based on diffusion models (DM) and uses spectrogram-based methods to achieve multi-to-multi music style transfer. The GuideDiff method is used to restore spectrograms to high-fidelity audio, accelerating audio generation speed and reducing noise in the generated audio. Experimental results show that our model has good performance in multi-mode music style transfer compared to the baseline and can generate high-quality audio in real-time on consumer-grade GPUs.

Submitted 23 April, 2024; originally announced April 2024.

Comments: 8 pages, 6 figures, ICMC 2023

Journal ref: International Computer Music Conference (ICMC 2023) pp. 40-47, October 2023

arXiv:2404.14753 [pdf, ps, other]

Investigating $Ξ$ resonances from pentaquark perspective

Authors: Ye Yan, Qi Huang, Xinmei Zhu, Hongxia Huang, Jialun **

Abstract: We have investigated the $qss\bar{q}q$ ($q = u$ or $d$) system to find possible pentaquark explanations for the $Ξ$ resonances. The bound state calculation is carried out within the framework of the quark delocalization color screening model. The scattering processes are also studied to examine the possible resonance states. The current results indicate that the $Ξ(1950)$ can be interpreted as $Λ\bar{K}^*$ state with $J^P = 1/2^-$. Three states are identified that match the $Ξ(2250)$, which are $Σ^* \bar{K}^*$ state with $J^P = 3/2^-$, $Σ^* \bar{K}^*$ state with $J^P =5/2^-$, and $Ξ^* ρ$ state with $J^P =5/2^-$. This may explain the conflicting experimental values for the width of the $Ξ(2250)$. A new $Ξ$ resonance is predicted, whose mass and width are 2066--2079 MeV and 186--189 MeV, respectively. These results contribute to understanding the nature of the $Ξ$ resonances and to the future search for new $Ξ$ resonances. Moreover, it is meaningful to further investigate the $Ξ$ resonances from an unquenched picture on the basis of pentaquark investigation.

Submitted 23 April, 2024; originally announced April 2024.

Comments: 15 pages, 7 figures. arXiv admin note: text overlap with arXiv:2312.04977, arXiv:2309.15380

arXiv:2404.14569 [pdf, other]

LIGO operates with quantum noise below the Standard Quantum Limit

Authors: Wenxuan Jia, Victoria Xu, Kevin Kuns, Masayuki Nakano, Lisa Barsotti, Matthew Evans, Nergis Mavalvala, Rich Abbott, Ibrahim Abouelfettouh, Rana Adhikari, Alena Ananyeva, Stephen Appert, Koji Arai, Naoki Aritomi, Stuart Aston, Matthew Ball, Stefan Ballmer, David Barker, Beverly Berger, Joseph Betzwieser, Dripta Bhattacharjee, Garilynn Billingsley, Nina Bode, Edgard Bonilla, Vladimir Bossilkov, et al. (146 additional authors not shown)

Abstract: Precision measurements of space and time, like those made by the detectors of the Laser Interferometer Gravitational-wave Observatory (LIGO), are often confronted with fundamental limitations imposed by quantum mechanics. The Heisenberg uncertainty principle dictates that the position and momentum of an object cannot both be precisely measured, giving rise to an apparent limitation called the Standard Quantum Limit (SQL). Reducing quantum noise below the SQL in gravitational-wave detectors, where photons are used to continuously measure the positions of freely falling mirrors, has been an active area of research for decades. Here we show how the LIGO A+ upgrade reduced the detectors' quantum noise below the SQL by up to 3 dB while achieving a broadband sensitivity improvement, more than two decades after this possibility was first presented.

Submitted 22 April, 2024; originally announced April 2024.

Report number: LIGO-P2400059

arXiv:2404.14566 [pdf, other]

Superconducting Diode Effect in Two-dimensional Topological Insulator Edges and Josephson Junctions

Authors: Haixuan Huang, Tatiana de Picoli, Jukka I. Väyrynen

Abstract: The superconducting diode effect -- the dependence of critical current on its direction -- can arise from the simultaneous breaking of inversion and time-reversal symmetry in a superconductor and has gained interest for its potential applications in superconducting electronics. In this letter, we study the effect in a two-dimensional topological insulator (2D TI) in both a uniform geometry as well as in a long Josephson junction. We show that in the presence of Zeeman fields, a circulating edge current enables a large non-reciprocity of the critical current. We find a maximum diode efficiency 1 for the uniform 2D TI and $(\sqrt{2} - 1)^2 \approx 0.17$ for the long Josephson junction.

Submitted 22 April, 2024; originally announced April 2024.

Comments: Submitted to Applied Physics Letters on April 8th, 2024

arXiv:2404.13953 [pdf, other]

360VOTS: Visual Object Tracking and Segmentation in Omnidirectional Videos

Authors: Yinzhe Xu, Huajian Huang, Yingshu Chen, Sai-Kit Yeung

Abstract: Visual object tracking and segmentation in omnidirectional videos are challenging due to the wide field-of-view and large spherical distortion brought by 360° images. To alleviate these problems, we introduce a novel representation, extended bounding field-of-view (eBFoV), for target localization and use it as the foundation of a general 360 tracking framework which is applicable for both omnidirectional visual object tracking and segmentation tasks. Building upon our previous work on omnidirectional visual object tracking (360VOT), we propose a comprehensive dataset and benchmark that incorporates a new component called omnidirectional video object segmentation (360VOS). The 360VOS dataset includes 290 sequences accompanied by dense pixel-wise masks and covers a broader range of target categories. To support both the development and evaluation of algorithms in this domain, we divide the dataset into a training subset with 170 sequences and a testing subset with 120 sequences. Furthermore, we tailor evaluation metrics for both omnidirectional tracking and segmentation to ensure rigorous assessment. Through extensive experiments, we benchmark state-of-the-art approaches and demonstrate the effectiveness of our proposed 360 tracking framework and training dataset. Homepage: https://360vots.hkustvgd.com/

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.13707 [pdf, other]

Robust inference for the unification of confidence intervals in meta-analysis

Authors: Wei Liang, Haicheng Huang, Hongsheng Dai, Yinghui Wei

Abstract: Traditional meta-analysis assumes that the effect sizes estimated in individual studies follow a Gaussian distribution. However, this distributional assumption is not always satisfied in practice, leading to potentially biased results. In the situation when the number of studies, denoted as K, is large, the cumulative Gaussian approximation errors from each study could make the final estimation unreliable. In the situation when K is small, it is not realistic to assume the random-effect follows Gaussian distribution. In this paper, we present a novel empirical likelihood method for combining confidence intervals under the meta-analysis framework. This method is free of the Gaussian assumption in effect size estimates from individual studies and from the random-effects. We establish the large-sample properties of the non-parametric estimator, and introduce a criterion governing the relationship between the number of studies, K, and the sample size of each study, n_i. Our methodology supersedes conventional meta-analysis techniques in both theoretical robustness and computational efficiency. We assess the performance of our proposed methods using simulation studies, and apply our proposed methods to two examples.

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.13644 [pdf, ps, other]

Error Estimation in the Mean-Field Limit of Kinetic Flocking Models with Local Alignments

Authors: **huan Wang, Keyu Li, Hui Huang

Abstract: In this paper, we present an innovative particle system characterized by moderate interactions, designed to accurately approximate kinetic flocking models that incorporate singular interaction forces and local alignment mechanisms. We establish the existence of weak solutions to the corresponding flocking equations and provide an error estimate for the mean-field limit. This is achieved through the regularization of singular forces and a nonlocal approximation strategy for local alignments. We show that, by selecting the regularization and localization parameters logarithmically with respect to the number of particles, the particle system effectively approximates the mean-field equation.

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.13631 [pdf, other]

Fermi-Bose Machine

Authors: Mingshan Xie, Yuchen Wang, Hai** Huang

Abstract: Distinct from human cognitive processing, deep neural networks trained by backpropagation can be easily fooled by adversarial examples. To design a semantically meaningful representation learning, we discard backpropagation, and instead, propose a local contrastive learning, where the representations for the inputs bearing the same label shrink (akin to boson) in hidden layers, while those of different labels repel (akin to fermion). This layer-wise learning is local in nature, being biologically plausible. A statistical mechanics analysis shows that the target fermion-pair-distance is a key parameter. Moreover, the application of this local contrastive learning to MNIST benchmark dataset demonstrates that the adversarial vulnerability of standard perceptron can be greatly mitigated by tuning the target distance, i.e., controlling the geometric separation of prototype manifolds.

Submitted 21 April, 2024; originally announced April 2024.

Comments: 17 pages, 6 figures, a physics inspired machine without backpropagation and enhanced adversarial robustness

arXiv:2404.13033 [pdf, other]

Sample Design Engineering: An Empirical Study of What Makes Good Downstream Fine-Tuning Samples for LLMs

Authors: Biyang Guo, He Wang, Wenyilin Xiao, Hong Chen, Zhuxin Lee, Songqiao Han, Hailiang Huang

Abstract: In the burgeoning field of Large Language Models (LLMs) like ChatGPT and LLaMA, Prompt Engineering (PE) is renowned for boosting zero-shot or in-context learning (ICL) through prompt modifications. Yet, the realm of the sample design for downstream fine-tuning, crucial for task-specific LLM adaptation, is largely unexplored. This paper introduces Sample Design Engineering (SDE), a methodical approach to enhancing LLMs' post-tuning performance by refining input, output, and reasoning designs. We conduct a series of in-domain (ID) and out-of-domain (OOD) experiments to assess the impact of various design options on LLMs' downstream performance, revealing several intriguing patterns that hold consistently across different LLMs. Based on these insights, we propose an integrated SDE strategy, combining the most effective options, and validate its consistent superiority over heuristic sample designs in complex downstream tasks like multi-aspect sentiment analysis, event extraction, and nested entity recognition. Additionally, analyses of LLMs' inherent prompt/output perplexity, zero-shot, and ICL abilities illustrate that good PE strategies may not always translate to good SDE strategies. Code available at https://github.com/beyondguo/LLM-Tuning.

Submitted 19 April, 2024; originally announced April 2024.

Comments: 23 pages, 12 figures, 14 tables

arXiv:2404.12768 [pdf, other]

MixLight: Borrowing the Best of both Spherical Harmonics and Gaussian Models

Authors: Xinlong Ji, Fangneng Zhan, Shijian Lu, Shi-Sheng Huang, Hua Huang

Abstract: Accurately estimating scene lighting is critical for applications such as mixed reality. Existing works estimate illumination by generating illumination maps or regressing illumination parameters. However, the method of generating illumination maps has poor generalization performance and parametric models such as Spherical Harmonic (SH) and Spherical Gaussian (SG) fall short in capturing high-frequency or low-frequency components. This paper presents MixLight, a joint model that utilizes the complementary characteristics of SH and SG to achieve a more complete illumination representation, which uses SH and SG to capture low-frequency ambient and high-frequency light sources respectively. In addition, a special spherical light source sparsemax (SLSparsemax) module that refers to the position and brightness relationship between spherical light sources is designed to improve their sparsity, which is significant but omitted by prior works. Extensive experiments demonstrate that MixLight surpasses state-of-the-art (SOTA) methods on multiple metrics. In addition, experiments on Web Dataset also show that MixLight as a parametric method has better generalization performance than non-parametric methods.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.12242 [pdf, other]

CMNEE: A Large-Scale Document-Level Event Extraction Dataset based on Open-Source Chinese Military News

Authors: Mengna Zhu, Zijie Xu, Kaisheng Zeng, Kaiming Xiao, Mao Wang, Wenjun Ke, Hongbin Huang

Abstract: Extracting structured event knowledge, including event triggers and corresponding arguments, from military texts is fundamental to many applications, such as intelligence analysis and decision assistance. However, event extraction in the military field faces the data scarcity problem, which impedes the research of event extraction models in this domain. To alleviate this problem, we propose CMNEE, a large-scale, document-level open-source Chinese Military News Event Extraction dataset. It contains 17,000 documents and 29,223 events, which are all manually annotated based on a pre-defined schema for the military domain including 8 event types and 11 argument role types. We designed a two-stage, multi-turns annotation strategy to ensure the quality of CMNEE and reproduced several state-of-the-art event extraction models with a systematic evaluation. The experimental results on CMNEE fall shorter than those on other domain datasets obviously, which demonstrates that event extraction for military domain poses unique challenges and requires further research efforts. Our code and data can be obtained from https://github.com/Mzzzhu/CMNEE.

Submitted 18 April, 2024; originally announced April 2024.

Comments: 13 pages, 7 figures, accepted to LREC-COLING 2024

arXiv:2404.11422 [pdf]

Short-term wind speed forecasting model based on an attention-gated recurrent neural network and error correction strategy

Authors: Haojian Huang

Abstract: The accurate wind speed series forecast is very pivotal to security of grid dispatching and the application of wind power. Nevertheless, on account of their nonlinear and non-stationary nature, their short-term forecast is extremely challenging. Therefore, this dissertation raises one short-term wind speed forecast pattern on the foundation of attention with an improved gated recurrent neural network (AtGRU) and a tactic of error correction. That model uses the AtGRU model as the preliminary predictor and the GRU model as the error corrector. At the beginning, SSA (singular spectrum analysis) is employed in previous wind speed series for lessening the noise. Subsequently, historical wind speed series is going to be used for the predictor training. During this process, the prediction can have certain errors. The sequence of these errors processed by variational modal decomposition (VMD) is used to train the corrector of error. The eventual forecast consequence is just the sum of predictor forecast and error corrector. The proposed SSA-AtGRU-VMD-GRU model outperforms the compared models in three case studies on Woodburn, St. Thomas, and Santa Cruz. It is indicated that the model evidently enhances the correction of the wind speed forecast.

Submitted 22 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

Comments: 23 pages, 11 figures, 6 tables, Technical Report

arXiv:2404.11199 [pdf, other]

RiboDiffusion: Tertiary Structure-based RNA Inverse Folding with Generative Diffusion Models

Authors: Han Huang, Ziqian Lin, Dongchen He, Liang Hong, Yu Li

Abstract: RNA design shows growing applications in synthetic biology and therapeutics, driven by the crucial role of RNA in various biological processes. A fundamental challenge is to find functional RNA sequences that satisfy given structural constraints, known as the inverse folding problem. Computational approaches have emerged to address this problem based on secondary structures. However, designing RNA sequences directly from 3D structures is still challenging, due to the scarcity of data, the non-unique structure-sequence mapping, and the flexibility of RNA conformation. In this study, we propose RiboDiffusion, a generative diffusion model for RNA inverse folding that can learn the conditional distribution of RNA sequences given 3D backbone structures. Our model consists of a graph neural network-based structure module and a Transformer-based sequence module, which iteratively transforms random sequences into desired sequences. By tuning the sampling weight, our model allows for a trade-off between sequence recovery and diversity to explore more candidates. We split test sets based on RNA clustering with different cut-offs for sequence or structure similarity. Our model outperforms baselines in sequence recovery, with an average relative improvement of $11\%$ for sequence similarity splits and $16\%$ for structure similarity splits. Moreover, RiboDiffusion performs consistently well across various RNA length categories and RNA types. We also apply in-silico folding to validate whether the generated sequences can fold into the given 3D RNA backbones. Our method could be a powerful tool for RNA design that explores the vast sequence space and finds novel solutions to 3D structural constraints.

Submitted 17 April, 2024; originally announced April 2024.

Comments: 15 pages

arXiv:2404.10681 [pdf, other]

StyleCity: Large-Scale 3D Urban Scenes Stylization with Vision-and-Text Reference via Progressive Optimization

Authors: Yingshu Chen, Huajian Huang, Tuan-Anh Vu, Ka Chun Shum, Sai-Kit Yeung

Abstract: Creating large-scale virtual urban scenes with variant styles is inherently challenging. To facilitate prototypes of virtual production and bypass the need for complex materials and lighting setups, we introduce the first vision-and-text-driven texture stylization system for large-scale urban scenes, StyleCity. Taking an image and text as references, StyleCity stylizes a 3D textured mesh of a large-scale urban scene in a semantics-aware fashion and generates a harmonic omnidirectional sky background. To achieve that, we propose to stylize a neural texture field by transferring 2D vision-and-text priors to 3D globally and locally. During 3D stylization, we progressively scale the planned training views of the input 3D scene at different levels in order to preserve high-quality scene content. We then optimize the scene style globally by adapting the scale of the style image with the scale of the training views. Moreover, we enhance local semantics consistency by the semantics-aware style loss which is crucial for photo-realistic stylization. Besides texture stylization, we further adopt a generative diffusion model to synthesize a style-consistent omnidirectional sky image, which offers a more immersive atmosphere and assists the semantic stylization process. The stylized neural texture field can be baked into an arbitrary-resolution texture, enabling seamless integration into conventional rendering pipelines and significantly easing the virtual production prototyping process. Extensive experiments demonstrate our stylized scenes' superiority in qualitative and quantitative performance and user preferences.

Submitted 16 April, 2024; originally announced April 2024.

Comments: project page: https://chenyingshu.github.io/stylecity3d/

arXiv:2404.10400 [pdf, other]

Phase space analysis of the evolution of the early universe in Einstein-Cartan theory

Authors: Qihong Huang, He Huang, Bing Xu, Kaituo Zhang, Hao Chen

Abstract: In this paper, we perform the phase space analysis to investigate the evolution of the early universe in Einstein-Cartan theory. By studying the stability of critical points in dynamical system, it is found that there exist two stable critical points which represent an expanding solution and an Einstein static solution respectively. After analyzing the phase diagram of the dynamical system, we find that there may exist a bouncing universe, an oscillating universe or an Einstein static universe in the early time of universe. In addition, by assuming that the early universe filled by the radiation with $ω = 1/3$, the initial states of the early universe are Einstein static universe or oscillating universe. When the equation of state $ω$ decreases with time, the universe can exit from the initial state and evolve into an expanding phase.

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.10253 [pdf, other]

Kilometer-Level Coupled Modeling Using 40 Million Cores: An Eight-Year Journey of Model Development

Authors: Xiaohui Duan, Yuxuan Li, Zhao Liu, Bin Yang, Juepeng Zheng, Haohuan Fu, Shaoqing Zhang, Shiming Xu, Yang Gao, Wei Xue, Di Wei, Xiaojing Lv, Lifeng Yan, Haopeng Huang, Haitian Lu, Lingfeng Wan, Haoran Lin, Qixin Chang, Chenlin Li, Quanjie He, Zeyu Song, Xuantong Wang, Yangyang Yu, Xilong Fan, Zhaopeng Qu , et al. (16 additional authors not shown)

Abstract: With current and future leading systems adopting heterogeneous architectures, adapting existing models for heterogeneous supercomputers is of urgent need for improving model resolution and reducing modeling uncertainty. This paper presents our three-week effort on porting a complex earth system model, CESM 2.2, to a 40-million-core Sunway supercomputer. Taking a non-intrusive approach that tries to minimize manual code modifications, our project tries to achieve both improvement of performance and consistency of the model code. By using a hierarchical grid system and an OpenMP-based offloading toolkit, our porting and parallelization effort covers over 80% of the code, and achieves a simulation speed of 340 SDPD (simulated days per day) for 5-km atmosphere, 265 SDPD for 3-km ocean, and 222 SDPD for a coupled model, thus making multi-year or even multi-decadal experiments at such high resolution possible.

Submitted 15 April, 2024; originally announced April 2024.

Comments: 18 pages, 13 figures

arXiv:2404.09793 [pdf, other]

First Search for Light Fermionic Dark Matter Absorption on Electrons Using Germanium Detector in CDEX-10 Experiment

Authors: J. X. Liu, L. T. Yang, Q. Yue, K. J. Kang, Y. J. Li, H. P. An, Greeshma C., J. P. Chang, Y. H. Chen, J. P. Cheng, W. H. Dai, Z. Deng, C. H. Fang, X. P. Geng, H. Gong, Q. J. Guo, T. Guo, X. Y. Guo, L. He, J. R. He, J. W. Hu, H. X. Huang, T. C. Huang, L. Jiang, S. Karmakar , et al. (61 additional authors not shown)

Abstract: We present the first results of the search for sub-MeV fermionic dark matter absorbed by electron targets of Germanium using the 205.4~kg$\cdot$day data collected by the CDEX-10 experiment, with the analysis threshold of 160~eVee. No significant dark matter (DM) signals over the background are observed. Results are presented as limits on the cross section of DM--electron interaction. We present new constraints of cross section in the DM range of 0.1--10 keV/$c^2$ for vector and axial-vector interaction. The upper limit on the cross section is set to be $\rm 5.5\times10^{-46}~cm^2$ for vector interaction, and $\rm 1.8\times10^{-46}~cm^2$ for axial-vector interaction at DM mass of 5 keV/$c^2$.

Submitted 15 April, 2024; originally announced April 2024.

Comments: 6 pages, 4 figures

arXiv:2404.09640 [pdf, other]

CREST: Cross-modal Resonance through Evidential Deep Learning for Enhanced Zero-Shot Learning

Authors: Haojian Huang, Xiaozhen Qiao, Zhuo Chen, Haodong Chen, Bingyu Li, Zhe Sun, Mulin Chen, Xuelong Li

Abstract: Zero-shot learning (ZSL) enables the recognition of novel classes by leveraging semantic knowledge transfer from known to unknown categories. This knowledge, typically encapsulated in attribute descriptions, aids in identifying class-specific visual features, thus facilitating visual-semantic alignment and improving ZSL performance. However, real-world challenges such as distribution imbalances and attribute co-occurrence among instances often hinder the discernment of local variances in images, a problem exacerbated by the scarcity of fine-grained, region-specific attribute annotations. Moreover, the variability in visual presentation within categories can also skew attribute-category associations. In response, we propose a bidirectional cross-modal ZSL approach CREST. It begins by extracting representations for attribute and visual localization and employs Evidential Deep Learning (EDL) to measure underlying epistemic uncertainty, thereby enhancing the model's resilience against hard negatives. CREST incorporates dual learning pathways, focusing on both visual-category and attribute-category alignments, to ensure robust correlation between latent and observable spaces. Moreover, we introduce an uncertainty-informed cross-modal fusion technique to refine visual-attribute inference. Extensive experiments demonstrate our model's effectiveness and unique explainability across multiple datasets. Our code and data are available at: https://github.com/JethroJames/CREST

Submitted 20 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

Comments: Ongoing work; 10 pages, 2 Tables, 9 Figures; Repo is available at: https://github.com/JethroJames/CREST

arXiv:2404.09540 [pdf, other]

Text-Driven Diverse Facial Texture Generation via Progressive Latent-Space Refinement

Authors: Chi Wang, Junming Huang, Rong Zhang, Qi Wang, Haotian Yang, Haibin Huang, Chongyang Ma, Weiwei Xu

Abstract: Automatic 3D facial texture generation has gained significant interest recently. Existing approaches may not support the traditional physically based rendering pipeline or rely on 3D data captured by Light Stage. Our key contribution is a progressive latent space refinement approach that can bootstrap from 3D Morphable Models (3DMMs)-based texture maps generated from facial images to generate high-quality and diverse PBR textures, including albedo, normal, and roughness. It starts with enhancing Generative Adversarial Networks (GANs) for text-guided and diverse texture generation. To this end, we design a self-supervised paradigm to overcome the reliance on ground truth 3D textures and train the generative model with only entangled texture maps. Besides, we foster mutual enhancement between GANs and Score Distillation Sampling (SDS). SDS boosts GANs with more generative modes, while GANs promote more efficient optimization of SDS. Furthermore, we introduce an edge-aware SDS for multi-view consistent facial structure. Experiments demonstrate that our method outperforms existing 3D texture generation methods regarding photo-realistic quality, diversity, and efficiency.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.09192 [pdf, other]

Prior-agnostic Multi-scale Contrastive Text-Audio Pre-training for Parallelized TTS Frontend Modeling

Authors: Quanxiu Wang, Hui Huang, Mingjie Wang, Yong Dai, Jinzuomu Zhong, Benlai Tang

Abstract: Over the past decade, a series of unflagging efforts have been dedicated to developing highly expressive and controllable text-to-speech (TTS) systems. In general, the holistic TTS comprises two interconnected components: the frontend module and the backend module. The frontend excels in capturing linguistic representations from the raw text input, while the backend module converts linguistic cues to speech. The research community has shown growing interest in the study of the frontend component, recognizing its pivotal role in text-to-speech systems, including Text Normalization (TN), Prosody Boundary Prediction (PBP), and Polyphone Disambiguation (PD). Nonetheless, the limitations posed by insufficient annotated textual data and the reliance on homogeneous text signals significantly undermine the effectiveness of its supervised learning. To evade this obstacle, a novel two-stage TTS frontend prediction pipeline, named TAP-FM, is proposed in this paper. Specifically, during the first learning phase, we present a Multi-scale Contrastive Text-audio Pre-training protocol (MC-TAP), which hammers at acquiring richer insights via multi-granularity contrastive pre-training in an unsupervised manner. Instead of mining homogeneous features in prior pre-training approaches, our framework demonstrates the ability to delve deep into both global and local text-audio semantic and acoustic representations. Furthermore, a parallelized TTS frontend model is delicately devised to execute TN, PD, and PBP prediction tasks, respectively in the second stage. Finally, extensive experiments illustrate the superiority of our proposed method, achieving state-of-the-art performance.

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2404.09115 [pdf, other]

GCC: Generative Calibration Clustering

Authors: Haifeng Xia, Hai Huang, Zhengming Ding

Abstract: Deep clustering as an important branch of unsupervised representation learning focuses on embedding semantically similar samples into the identical feature space. This core demand inspires the exploration of contrastive learning and subspace clustering. However, these solutions always rely on the basic assumption that there are sufficient and category-balanced samples for generating valid high-level representation. This hypothesis actually is too strict to be satisfied for real-world applications. To overcome such a challenge, the natural strategy is utilizing generative models to augment considerable instances. How to use these novel samples to effectively fulfill clustering performance improvement is still difficult and under-explored. In this paper, we propose a novel Generative Calibration Clustering (GCC) method to delicately incorporate feature learning and augmentation into clustering procedure. First, we develop a discriminative feature alignment mechanism to discover intrinsic relationship across real and generated samples. Second, we design a self-supervised metric learning to generate more reliable cluster assignment to boost the conditional diffusion generation. Extensive experimental results on three benchmarks validate the effectiveness and advantage of our proposed method over the state-of-the-art methods.

Submitted 13 April, 2024; originally announced April 2024.

arXiv:2404.08145 [pdf]

Polar vortex hidden in twisted bilayers of paraelectric SrTiO3

Authors: Haozhi Sha, Yixuan Zhang, Yunpeng Ma, Wei Li, Wenfeng Yang, Jizhe Cui, Qian Li, Houbing Huang, Rong Yu

Abstract: Polar topologies, such as vortex and skyrmion, have attracted significant interest due to their unique physical properties and promising applications in high-density memory devices. Currently, most polar vortices are observed in heterostructures containing ferroelectric materials and constrained by substrates. In this study, we unravel arrays of polar vortices formed in twisted freestanding bilayers composed of SrTiO3, a quantum-paraelectric material. Depth-resolved structures of the bilayers are measured with deep-sub-angstrom resolution and one picometer accuracy using multislice ptychography, enabling identification of the three-dimensional variations of polarization topology. Our findings reveal the evolution of the polar vortices in the twisted overlapping layers, demonstrating the reverse of rotation manner in the depth direction. Twisted freestanding bilayers provide a unique platform for exploration and modulation of novel polar topologies.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.07477 [pdf, ps, other]

Integrated Sensing and Communication Under DISCO Physical-Layer Jamming Attacks

Authors: Huan Huang, Hongliang Zhang, Weidong Mei, Jun Li, Yi Cai, A. Lee Swindlehurst, Zhu Han

Abstract: Integrated sensing and communication (ISAC) systems traditionally presuppose that sensing and communication (S&C) channels remain approximately constant during their coherence time. However, a "DISCO" reconfigurable intelligent surface (DRIS), i.e., an illegitimate RIS with random, time-varying reflection properties that acts like a "disco ball," introduces a paradigm shift that enables active channel aging more rapidly during the channel coherence time. In this letter, we investigate the impact of DISCO jamming attacks launched by a DRIS-based fully-passive jammer (FPJ) on an ISAC system. Specifically, an ISAC problem formulation and a corresponding waveform optimization are presented in which the ISAC waveform design considers the trade-off between the S&C performance and is formulated as a Pareto optimization problem. Moreover, a theoretical analysis is conducted to quantify the impact of DISCO jamming attacks. Numerical results are presented to evaluate the S&C performance under DISCO jamming attacks and to validate the derived theoretical analysis.

Submitted 11 April, 2024; originally announced April 2024.

Comments: This paper has been submitted for possible publication. The code of the DISCO RIS is available on GitHub (https://github.com/huanhuan1799/Disco-Intelligent-Reflecting-Surfaces-Active-Channel-Aging-for-Fully-Passive-Jamming-Attacks)

arXiv:2404.07281 [pdf, other]

Certifying almost all quantum states with few single-qubit measurements

Authors: Hsin-Yuan Huang, John Preskill, Mehdi Soleimanifar

Abstract: Certifying that an n-qubit state synthesized in the lab is close to the target state is a fundamental task in quantum information science. However, existing rigorous protocols either require deep quantum circuits or exponentially many single-qubit measurements. In this work, we prove that almost all n-qubit target states, including those with exponential circuit complexity, can be certified from only O(n^2) single-qubit measurements. This result is established by a new technique that relates certification to the mixing time of a random walk. Our protocol has applications for benchmarking quantum systems, for optimizing quantum circuits to generate a desired target state, and for learning and verifying neural networks, tensor networks, and various other representations of quantum states using only single-qubit measurements. We show that such verified representations can be used to efficiently predict highly non-local properties that would otherwise require an exponential number of measurements. We demonstrate these applications in numerical experiments with up to 120 qubits, and observe advantage over existing methods such as cross-entropy benchmarking (XEB).

Submitted 10 April, 2024; originally announced April 2024.

Comments: 63 pages, 5 figures

arXiv:2404.07092 [pdf, other]

Net 835-Gb/s/λ Carrier- and LO-Free 100-km Transmission Using Channel-Aware Phase Retrieval Reception

Authors: Hanzi Huang, Haoshuo Chen, Qian Hu, Di Che, Yetian Huang, Brian Stern, Nicolas K. Fontaine, Mikael Mazur, Lauren Dallachiesa, Roland Ryf, Zhengxuan Li, Yingxiong Song

Abstract: We experimentally demonstrate the first carrier- and LO-free 800G/λ receiver enabling direct compatibility with standard coherent transmitters via phase retrieval, achieving net 835-Gb/s transmission over 100-km SMF and record 8.27-b/s/Hz net optical spectral efficiency.

Submitted 10 April, 2024; originally announced April 2024.

Comments: 3 pages, 3 figures

arXiv:2404.06681 [pdf, other]

Causal Unit Selection using Tractable Arithmetic Circuits

Authors: Haiying Huang, Adnan Darwiche

Abstract: The unit selection problem aims to find objects, called units, that optimize a causal objective function which describes the objects' behavior in a causal context (e.g., selecting customers who are about to churn but would most likely change their mind if encouraged). While early studies focused mainly on bounding a specific class of counterfactual objective functions using data, more recent work allows one to find optimal units exactly by reducing the causal objective to a classical objective on a meta-model, and then applying a variant of the classical Variable Elimination (VE) algorithm to the meta-model -- assuming a fully specified causal model is available. In practice, however, finding optimal units using this approach can be very expensive because the used VE algorithm must be exponential in the constrained treewidth of the meta-model, which is larger and denser than the original model. We address this computational challenge by introducing a new approach for unit selection that is not necessarily limited by the constrained treewidth. This is done through compiling the meta-model into a special class of tractable arithmetic circuits that allows the computation of optimal units in time linear in the circuit size. We finally present empirical results on random causal models that show order-of-magnitude speedups based on the proposed method for solving unit selection.

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.06098 [pdf, other]

Weak lensing combined with the kinetic Sunyaev Zel'dovich effect: A study of baryonic feedback

Authors: L. Bigwood, A. Amon, A. Schneider, J. Salcido, I. G. McCarthy, C. Preston, D. Sanchez, D. Sijacki, E. Schaan, S. Ferraro, N. Battaglia, A. Chen, S. Dodelson, A. Roodman, A. Pieres, A. Ferte, A. Alarcon, A. Drlica-Wagner, A. Choi, A. Navarro-Alsina, A. Campos, A. J. Ross, A. Carnero Rosell, B. Yin, B. Yanny , et al. (100 additional authors not shown)

Abstract: Extracting precise cosmology from weak lensing surveys requires modelling the non-linear matter power spectrum, which is suppressed at small scales due to baryonic feedback processes. However, hydrodynamical galaxy formation simulations make widely varying predictions for the amplitude and extent of this effect. We use measurements of Dark Energy Survey Year 3 weak lensing (WL) and Atacama Cosmology Telescope DR5 kinematic Sunyaev-Zel'dovich (kSZ) to jointly constrain cosmological and astrophysical baryonic feedback parameters using a flexible analytical model, `baryonification'. First, using WL only, we compare the $S_8$ constraints using baryonification to a simulation-calibrated halo model, a simulation-based emulator model and the approach of discarding WL measurements on small angular scales. We find that model flexibility can shift the value of $S_8$ and degrade the uncertainty. The kSZ provides additional constraints on the astrophysical parameters and shifts $S_8$ to $S_8=0.823^{+0.019}_{-0.020}$, a higher value than attained using the WL-only analysis. We measure the suppression of the non-linear matter power spectrum using WL + kSZ and constrain a mean feedback scenario that is more extreme than the predictions from most hydrodynamical simulations. We constrain the baryon fractions and the gas mass fractions and find them to be generally lower than inferred from X-ray observations and simulation predictions. We conclude that the WL + kSZ measurements provide a new and complementary benchmark for building a coherent picture of the impact of gas around galaxies across observations.

Submitted 9 April, 2024; originally announced April 2024.

Showing 101–150 of 2,982 results for author: Huang, H