Showing 51–100 of 986 results for author: Cai, Z

  1. arXiv:2404.16704  [pdf, other]

    cond-mat.dis-nn

    Fidelity and criticality in the nonreciprocal Aubry-André-Harper model

    Authors: Chen-Chang Zeng, Zhen Cai, Guang-Heng Wang, Gaoyong Sun

    Abstract: We study the critical behaviors of the ground and first excited states in the one-dimensional nonreciprocal Aubry-André-Harper model using both the self-normal and biorthogonal fidelity susceptibilities. We demonstrate that fidelity susceptibilities serve as a probe for the phase transition in the nonreciprocal AAH model. For ground states, characterized by real eigenenergies across the entire r…

    Submitted 30 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: 7 pages, 4 figures

  2. arXiv:2404.16425  [pdf, other]

    astro-ph.HE

    Soft X-ray prompt emission from a high-redshift gamma-ray burst EP240315a

    Authors: Y. Liu, H. Sun, D. Xu, D. S. Svinkin, J. Delaunay, N. R. Tanvir, H. Gao, C. Zhang, Y. Chen, X. -F. Wu, B. Zhang, W. Yuan, J. An, G. Bruni, D. D. Frederiks, G. Ghirlanda, J. -W. Hu, A. Li, C. -K. Li, J. -D. Li, D. B. Malesani, L. Piro, G. Raman, R. Ricci, E. Troja, et al. (170 additional authors not shown)

    Abstract: Long gamma-ray bursts (GRBs) are believed to originate from core collapse of massive stars. High-redshift GRBs can probe the star formation and reionization history of the early universe, but their detection remains rare. Here we report the detection of a GRB triggered in the 0.5--4 keV band by the Wide-field X-ray Telescope (WXT) on board the Einstein Probe (EP) mission, designated as EP240315a,…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 41 pages, 8 figures, 7 tables

  3. arXiv:2404.15963  [pdf, other]

    astro-ph.GA astro-ph.CO

    Cosmic Himalayas: The Highest Quasar Density Peak Identified in a 10,000 deg$^2$ Sky with Spatial Discrepancies between Galaxies, Quasars, and IGM HI

    Authors: Yongming Liang, Masami Ouchi, Dongsheng Sun, Nobunari Kashikawa, Zheng Cai, Sebastiano Cantalupo, Kentaro Nagamine, Hidenobu Yajima, Takanobu Kirihara, Haibin Zhang, Mingyu Li, Rhythm Shimakawa, Xiaohui Fan, Kei Ito, Masayuki Tanaka, Yuichi Harikane, J. Xavier Prochaska, Andrea Travascio, Weichen Wang, Martin Elvis, Giuseppina Fabbiano, Junya Arita, Masafusa Onoue, John D. Silverman, Dongdong Shi, et al. (5 additional authors not shown)

    Abstract: We report the identification of a quasar overdensity in the BOSSJ0210 field, dubbed Cosmic Himalayas, consisting of 11 quasars at $z=2.16-2.20$, the densest overdensity of quasars ($17σ$) in the $\sim$10,000 deg$^2$ of the Sloan Digital Sky Survey. We present the spatial distributions of galaxies and quasars and an HI absorption map of the intergalactic medium (IGM). On the map of 465 galaxies sel…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 19 pages, 11 figures, submitted to ApJ, comments are welcome

  4. arXiv:2404.15506  [pdf, other]

    cs.CV

    Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation

    Authors: Mu Hu, Wei Yin, Chi Zhang, Zhipeng Cai, Xiaoxiao Long, Hao Chen, Kaixuan Wang, Gang Yu, Chunhua Shen, Shaojie Shen

    Abstract: We introduce Metric3D v2, a geometric foundation model for zero-shot metric depth and surface normal estimation from a single image, which is crucial for metric 3D recovery. While depth and normal are geometrically related and highly complementary, they present distinct challenges. SoTA monocular depth methods achieve zero-shot generalization by learning affine-invariant depths, which cannot recov…

    Submitted 21 March, 2024; originally announced April 2024.

    Comments: Our project page is at https://JUGGHM.github.io/Metric3Dv2. arXiv admin note: substantial text overlap with arXiv:2307.10984

  5. arXiv:2404.15127  [pdf, other]

    cs.CV cs.CL

    MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning

    Authors: Sunan He, Yuxiang Nie, Zhixuan Chen, Zhiyuan Cai, Hongmei Wang, Shu Yang, Hao Chen

    Abstract: The rapid advancement of large-scale vision-language models has showcased remarkable capabilities across various tasks. However, the lack of extensive and high-quality image-text data in medicine has greatly hindered the development of large-scale medical vision-language models. In this work, we present a diagnosis-guided bootstrapping strategy that exploits both image and label information to con…

    Submitted 23 April, 2024; originally announced April 2024.

  6. Euclid view of dusty star forming galaxies at z>~1.5 detected in wide area submillimetre surveys

    Authors: Dipanjan Mitra, Mattia Negrello, Gianfranco De Zotti, Zhen-Yi Cai

    Abstract: We investigate the constraints provided by the Euclid space observatory on the physical properties of dusty star forming galaxies (DSFGs) at z>~1.5 detected in wide area submillimetre surveys with Herschel. We adopt a physical model for the high-z progenitors of spheroidal galaxies, which form the bulk of the DSFGs at z>~1.5. We improve the model by combining the output of the equations of the mo…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 24 pages, 24 figures

  7. arXiv:2404.10573  [pdf, other]

    cs.AI cs.CE q-bio.BM

    AAVDiff: Experimental Validation of Enhanced Viability and Diversity in Recombinant Adeno-Associated Virus (AAV) Capsids through Diffusion Generation

    Authors: Lijun Liu, Jiali Yang, Jianfei Song, Xinglin Yang, Lele Niu, Zeqi Cai, Hui Shi, Tingjun Hou, Chang-yu Hsieh, Weiran Shen, Yafeng Deng

    Abstract: Recombinant adeno-associated virus (rAAV) vectors have revolutionized gene therapy, but their broad tropism and suboptimal transduction efficiency limit their clinical applications. To overcome these limitations, researchers have focused on designing and screening capsid libraries to identify improved vectors. However, the large sequence space and limited resources present challenges in identifyin…

    Submitted 17 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  8. arXiv:2404.08491  [pdf, other]

    cs.CL cs.AI

    Mitigating Language-Level Performance Disparity in mPLMs via Teacher Language Selection and Cross-lingual Self-Distillation

    Authors: Haozhe Zhao, Zefan Cai, Shuzheng Si, Liang Chen, Yufeng He, Kaikai An, Baobao Chang

    Abstract: Large-scale multilingual Pretrained Language Models (mPLMs) yield impressive performance on cross-language tasks, yet significant performance disparities exist across different languages within the same mPLM. Previous studies endeavored to narrow these disparities by supervised fine-tuning of the mPLMs with multilingual data. However, obtaining labeled multilingual data is time-consuming, and fine-tun…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: NAACL 2024

  9. arXiv:2404.08180  [pdf, other]

    astro-ph.HE

    Sizes of active galactic nuclei inhomogeneous disks -- large in microlensing, small in reverberation mapping

    Authors: Guowei Ren, Mouyuan Sun, Jun-Xian Wang, Zhen-Yi Cai

    Abstract: Magnetohydrodynamics (MHD) turbulence can drive significant temperature fluctuations in the accretion disk of an active galactic nucleus (AGN). As a result, the disk can be highly inhomogeneous and has a half-light radius larger than the static Shakura & Sunyaev Disk (SSD), in agreement with quasar microlensing observations. Meanwhile, the accretion-disk sizes can also be determined using continu…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 14 pages, 7 figures, Accepted to ApJ

  10. arXiv:2404.08045  [pdf, other]

    astro-ph.GA astro-ph.CO

    JWST Discovery of $40+$ Microlensed Stars in a Magnified Galaxy, the "Dragon" behind Abell 370

    Authors: Yoshinobu Fudamoto, Fengwu Sun, Jose M. Diego, Liang Dai, Masamune Oguri, Adi Zitrin, Erik Zackrisson, Mathilde Jauzac, David J. Lagattuta, Eiichi Egami, Edoardo Iani, Rogier A. Windhorst, Katsuya T. Abe, Franz Erik Bauer, Fuyan Bian, Rachana Bhatawdekar, Thomas J. Broadhurst, Zheng Cai, Chian-Chou Chen, Wenlei Chen, Seth H. Cohen, Christopher J. Conselice, Daniel Espada, Nicholas Foo, Brenda L. Frye, et al. (21 additional authors not shown)

    Abstract: Strong gravitational magnification by massive galaxy clusters enables us to detect faint background sources, resolve their detailed internal structures, and in the most extreme cases identify and study individual stars in distant galaxies. Highly magnified individual stars allow for a wide range of applications, including studies of stellar populations in distant galaxies and constraining small-sca…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 15 pages, 4 figures, 1 table, submitted to Nature Astronomy

  11. arXiv:2404.06995  [pdf, other]

    stat.ME

    Model-free Change-point Detection Using Modern Classifiers

    Authors: Rohit Kanrar, Feiyu Jiang, Zhanrui Cai

    Abstract: In contemporary data analysis, it is increasingly common to work with non-stationary complex datasets. These datasets typically extend beyond the classical low-dimensional Euclidean space, making it challenging to detect shifts in their distribution without relying on strong structural assumptions. This paper introduces a novel offline change-point detection method that leverages modern classifier…

    Submitted 10 April, 2024; originally announced April 2024.

  12. arXiv:2404.05445  [pdf, other]

    stat.ME cs.LG stat.CO

    Unsupervised Training of Convex Regularizers using Maximum Likelihood Estimation

    Authors: Hong Ye Tan, Ziruo Cai, Marcelo Pereyra, Subhadip Mukherjee, Junqi Tang, Carola-Bibiane Schönlieb

    Abstract: Unsupervised learning is a training approach in the situation where ground truth data is unavailable, such as inverse imaging problems. We present an unsupervised Bayesian training approach to learning convex neural network regularizers using a fixed noisy dataset, based on a dual Markov chain estimation method. Compared to classical supervised adversarial regularization methods, where there is ac…

    Submitted 8 April, 2024; originally announced April 2024.

    MSC Class: 62C12; 62F15; 65C40; 65J22

  13. arXiv:2404.05064  [pdf, other]

    cs.LG math.NA

    A Structure-Guided Gauss-Newton Method for Shallow ReLU Neural Network

    Authors: Zhiqiang Cai, Tong Ding, Min Liu, Xinyu Liu, Jianlin Xia

    Abstract: In this paper, we propose a structure-guided Gauss-Newton (SgGN) method for solving least squares problems using a shallow ReLU neural network. The method effectively takes advantage of both the least squares structure and the neural network structure of the objective function. By categorizing the weights and biases of the hidden and output layers of the network as nonlinear and linear parameters,…

    Submitted 7 April, 2024; originally announced April 2024.

    MSC Class: 65D15; 65K10

  14. arXiv:2404.04469  [pdf, other]

    cs.CV

    Mixed-Query Transformer: A Unified Image Segmentation Architecture

    Authors: Pei Wang, Zhaowei Cai, Hao Yang, Ashwin Swaminathan, R. Manmatha, Stefano Soatto

    Abstract: Existing unified image segmentation models either employ a unified architecture across multiple tasks but use separate weights tailored to each dataset, or apply a single set of weights to multiple datasets but are limited to a single task. In this paper, we introduce the Mixed-Query Transformer (MQ-Former), a unified architecture for multi-task and multi-dataset image segmentation using a single…

    Submitted 5 April, 2024; originally announced April 2024.

  15. arXiv:2404.04458  [pdf, other]

    cs.CV

    JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups

    Authors: Simindokht Jahangard, Zhixi Cai, Shiki Wen, Hamid Rezatofighi

    Abstract: Understanding human social behaviour is crucial in computer vision and robotics. Micro-level observations like individual actions fall short, necessitating a comprehensive approach that considers individual behaviour, intra-group dynamics, and social group levels for a thorough understanding. To address dataset limitations, this paper introduces JRDB-Social, an extension of JRDB. Designed to fill…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024. Project page: https://jrdb.erc.monash.edu/dataset/social

  16. arXiv:2404.01372  [pdf]

    cond-mat.mes-hall cond-mat.str-el

    Strong interactions and isospin symmetry breaking in a supermoiré lattice

    Authors: Yonglong Xie, Andrew T. Pierce, Jeong Min Park, Daniel E. Parker, Jie Wang, Patrick Ledwith, Zhuozhen Cai, Kenji Watanabe, Takashi Taniguchi, Eslam Khalaf, Ashvin Vishwanath, Pablo Jarillo-Herrero, Amir Yacoby

    Abstract: In multilayer moiré heterostructures, the interference of multiple twist angles ubiquitously leads to tunable ultra-long-wavelength patterns known as supermoiré lattices. However, their impact on the system's many-body electronic phase diagram remains largely unexplored. We present local compressibility measurements revealing numerous incompressible states resulting from supermoiré-lattice-scale i…

    Submitted 1 April, 2024; originally announced April 2024.

  17. arXiv:2404.01284  [pdf, other]

    cs.CV

    Large Motion Model for Unified Multi-Modal Motion Generation

    Authors: Mingyuan Zhang, Daisheng Jin, Chenyang Gu, Fangzhou Hong, Zhongang Cai, Jingfang Huang, Chongzhi Zhang, Xinying Guo, Lei Yang, Ying He, Ziwei Liu

    Abstract: Human motion generation, a cornerstone technique in animation and video production, has widespread applications in various tasks like text-to-motion and music-to-dance. Previous works focus on developing specialist models tailored for each task without scalability. In this work, we present Large Motion Model (LMM), a motion-centric, multi-modal framework that unifies mainstream motion generation t…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Homepage: https://mingyuan-zhang.github.io/projects/LMM.html

  18. arXiv:2404.00139  [pdf]

    cs.CR cs.AI

    Security Risks Concerns of Generative AI in the IoT

    Authors: Honghui Xu, Yingshu Li, Olusesi Balogun, Shaoen Wu, Yue Wang, Zhipeng Cai

    Abstract: In an era where the Internet of Things (IoT) intersects increasingly with generative Artificial Intelligence (AI), this article scrutinizes the emergent security risks inherent in this integration. We explore how generative AI drives innovation in IoT and we analyze the potential for data breaches when using generative AI and the misuse of generative AI technologies in IoT ecosystems. These risks…

    Submitted 29 March, 2024; originally announced April 2024.

    Comments: 6 pages, 2 figures

  19. arXiv:2403.20188  [pdf, other]

    cs.NI cs.AI cs.LG

    Distributed Swarm Learning for Edge Internet of Things

    Authors: Yue Wang, Zhi Tian, FXin Fan, Zhipeng Cai, Cameron Nowzari, Kai Zeng

    Abstract: The rapid growth of Internet of Things (IoT) has led to the widespread deployment of smart IoT devices at wireless edge for collaborative machine learning tasks, ushering in a new era of edge learning. With a huge number of hardware-constrained IoT devices operating in resource-limited wireless networks, edge learning encounters substantial challenges, including communication and computation bottl…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2210.16705

  20. arXiv:2403.17934  [pdf, other]

    cs.CV

    AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation

    Authors: Qingping Sun, Yanjun Wang, Ailing Zeng, Wanqi Yin, Chen Wei, Wenjia Wang, Haiyi Mei, Chi Sing Leung, Ziwei Liu, Lei Yang, Zhongang Cai

    Abstract: Expressive human pose and shape estimation (a.k.a. 3D whole-body mesh recovery) involves the human body, hand, and expression estimation. Most existing methods have tackled this task in a two-stage manner, first detecting the human body part with an off-the-shelf detection model and inferring the different human body parts individually. Despite the impressive results achieved, these methods suffer…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Homepage: https://ttxskk.github.io/AiOS/

  21. arXiv:2403.17297  [pdf, other]

    cs.CL cs.AI

    InternLM2 Technical Report

    Authors: Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang, et al. (75 additional authors not shown)

    Abstract: The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context m…

    Submitted 25 March, 2024; originally announced March 2024.

  22. arXiv:2403.15694  [pdf, other]

    cs.LG cs.MM

    Group Benefits Instances Selection for Data Purification

    Authors: Zhenhuang Cai, Chuanyi Zhang, Dan Huang, Yuanbo Chen, Xiuyun Guan, Yazhou Yao

    Abstract: Manually annotating datasets for training deep models is very labor-intensive and time-consuming. To overcome such inferiority, directly leveraging web images to conduct training data becomes a natural choice. Nevertheless, the presence of label noise in web data usually degrades the model performance. Existing methods for combating label noise are typically designed and tested on synthetic noisy…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: accepted by IEEE Intelligent Systems

  23. arXiv:2403.15407  [pdf, other]

    cs.CL cs.AI

    X-AMR Annotation Tool

    Authors: Shafiuddin Rehan Ahmed, Jon Z. Cai, Martha Palmer, James H. Martin

    Abstract: This paper presents a novel Cross-document Abstract Meaning Representation (X-AMR) annotation tool designed for annotating key corpus-level event semantics. Leveraging machine assistance through the Prodigy Annotation Tool, we enhance the user experience, ensuring ease and efficiency in the annotation process. Through empirical analyses, we demonstrate the effectiveness of our tool in augmenting a…

    Submitted 29 February, 2024; originally announced March 2024.

    Comments: EACL 2024 System Demonstration

  24. arXiv:2403.14979  [pdf, other]

    physics.flu-dyn

    Efficient aerodynamic coefficients prediction with a long sequence neural network

    Authors: Zemin Cai, Zhengyuan Fan, Tianshu Liu

    Abstract: Traditionally, deriving aerodynamic parameters for an airfoil via Computational Fluid Dynamics requires significant time and effort. Although recent approaches employ neural networks to replace this process, they still grapple with challenges like lack of end-to-end training and interpretability. A novel and more efficient neural network is proposed in this paper, called AirfoilNet. AirfoilNet seam…

    Submitted 22 March, 2024; originally announced March 2024.

  25. arXiv:2403.13678  [pdf, other]

    cs.CV

    AUD-TGN: Advancing Action Unit Detection with Temporal Convolution and GPT-2 in Wild Audiovisual Contexts

    Authors: Jun Yu, Zerui Zhang, Zhihong Wei, Gongpeng Zhao, Zhongpeng Cai, Yongqi Wang, Guochen Xie, Jichao Zhu, Wangyuan Zhu

    Abstract: Leveraging the synergy of both audio data and visual data is essential for understanding human emotions and behaviors, especially in in-the-wild settings. Traditional methods for integrating such multimodal information often stumble, leading to less-than-ideal outcomes in the task of facial action unit detection. To overcome these shortcomings, we propose a novel approach utilizing audio-visual mul…

    Submitted 20 March, 2024; originally announced March 2024.

  26. arXiv:2403.13027  [pdf, other]

    cs.LG cs.CR cs.IT stat.ML

    Towards Better Statistical Understanding of Watermarking LLMs

    Authors: Zhongze Cai, Shang Liu, Hanzhao Wang, Huaiyang Zhong, Xiaocheng Li

    Abstract: In this paper, we study the problem of watermarking large language models (LLMs). We consider the trade-off between model distortion and detection ability and formulate it as a constrained optimization problem based on the green-red algorithm of Kirchenbauer et al. (2023a). We show that the optimal solution to the optimization problem enjoys a nice analytical property which provides a better under…

    Submitted 18 March, 2024; originally announced March 2024.

  27. arXiv:2403.12959  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG cs.RO

    WHAC: World-grounded Humans and Cameras

    Authors: Wanqi Yin, Zhongang Cai, Ruisi Wang, Fanzhou Wang, Chen Wei, Haiyi Mei, Weiye Xiao, Zhitao Yang, Qingping Sun, Atsushi Yamashita, Ziwei Liu, Lei Yang

    Abstract: Estimating human and camera trajectories with accurate scale in the world coordinate system from a monocular video is a highly desirable yet challenging and ill-posed problem. In this study, we aim to recover expressive parametric human models (i.e., SMPL-X) and corresponding camera poses jointly, by leveraging the synergy between three critical players: the world, the human, and the camera. Our a…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Homepage: https://wqyin.github.io/projects/WHAC/

  28. arXiv:2403.12884  [pdf, other]

    cs.CV

    HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning

    Authors: Fucai Ke, Zhixi Cai, Simindokht Jahangard, Weiqing Wang, Pari Delir Haghighi, Hamid Rezatofighi

    Abstract: Recent advances in visual reasoning (VR), particularly with the aid of Large Vision-Language Models (VLMs), show promise but require access to large-scale datasets and face challenges such as high computational costs and limited generalization capabilities. Compositional visual reasoning approaches have emerged as effective strategies; however, they heavily rely on the commonsense knowledge encode…

    Submitted 19 March, 2024; originally announced March 2024.

  29. arXiv:2403.12425  [pdf, other]

    cs.CV cs.SD eess.AS

    Multimodal Fusion Method with Spatiotemporal Sequences and Relationship Learning for Valence-Arousal Estimation

    Authors: Jun Yu, Gongpeng Zhao, Yongqi Wang, Zhihong Wei, Yang Zheng, Zerui Zhang, Zhongpeng Cai, Guochen Xie, Jichao Zhu, Wangyuan Zhu

    Abstract: This paper presents our approach for the VA (Valence-Arousal) estimation task in the ABAW6 competition. We devised a comprehensive model by preprocessing video frames and audio segments to extract visual and audio features. Through the utilization of Temporal Convolutional Network (TCN) modules, we effectively captured the temporal and spatial correlations between these features. Subsequently, we…

    Submitted 20 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: 8 pages, 3 figures

  30. arXiv:2403.11942  [pdf, other]

    cs.CV cs.AI

    Exploring Facial Expression Recognition through Semi-Supervised Pretraining and Temporal Modeling

    Authors: Jun Yu, Zhihong Wei, Zhongpeng Cai, Gongpeng Zhao, Zerui Zhang, Yongqi Wang, Guochen Xie, Jichao Zhu, Wangyuan Zhu

    Abstract: Facial Expression Recognition (FER) plays a crucial role in computer vision and finds extensive applications across various fields. This paper aims to present our approach for the upcoming 6th Affective Behavior Analysis in-the-Wild (ABAW) competition, scheduled to be held at CVPR2024. In the facial expression recognition task, the limited size of the FER dataset poses a challenge to the expressio…

    Submitted 19 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  31. arXiv:2403.11082  [pdf, other]

    cs.CL cs.AI cs.LG

    RobustSentEmbed: Robust Sentence Embeddings Using Adversarial Self-Supervised Contrastive Learning

    Authors: Javad Rafiei Asl, Prajwal Panzade, Eduardo Blanco, Daniel Takabi, Zhipeng Cai

    Abstract: Pre-trained language models (PLMs) have consistently demonstrated outstanding performance across a diverse spectrum of natural language processing tasks. Nevertheless, despite their success with unseen data, current PLM-based representations often exhibit poor robustness in adversarial settings. In this paper, we introduce RobustSentEmbed, a self-supervised sentence embedding framework designed to…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: Accepted at the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL Findings) 2024. [https://openreview.net/forum?id=9dEAg4lJEA]

  32. arXiv:2403.10719  [pdf]

    cond-mat.mtrl-sci

    X-ray Nano-imaging of a Heterogeneous Structural Phase Transition in V2O3

    Authors: Ziming Shao, Aileen Luo, Eti Barazani, Tao Zhou, Zhonghou Cai, Martin V. Holt, Yoav Kalcheim, Andrej Singer

    Abstract: Controlling the Mott transition through strain engineering is crucial for advancing the development and application of memristive and neuromorphic computing devices. Yet, Mott insulators are heterogeneous due to intrinsic phase boundaries and extrinsic defects, posing significant challenges to fully understanding the impact of local microscopic distortions on the local Mott transition. Addressing…

    Submitted 30 June, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  33. arXiv:2403.09326  [pdf, other]

    cs.GR cs.AI

    HeadEvolver: Text to Head Avatars via Expressive and Attribute-Preserving Mesh Deformation

    Authors: Duotun Wang, Hengyu Meng, Zeyu Cai, Zhijing Shao, Qianxi Liu, Lin Wang, Mingming Fan, Xiaohang Zhan, Zeyu Wang

    Abstract: We present HeadEvolver, a novel framework to generate stylized head avatars from text guidance. HeadEvolver uses locally learnable mesh deformation from a template head mesh, producing high-quality digital assets for detail-preserving editing and animation. To tackle the challenges of lacking fine-grained and semantic-aware local shape control in global deformation through Jacobians, we introduce…

    Submitted 10 June, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: 12 pages, 17 figures

    ACM Class: I.2.6; I.3.8

  34. arXiv:2403.05989  [pdf, other]

    cs.SD eess.AS

    HAM-TTS: Hierarchical Acoustic Modeling for Token-Based Zero-Shot Text-to-Speech with Model and Data Scaling

    Authors: Chunhui Wang, Chang Zeng, Bowen Zhang, Ziyang Ma, Yefan Zhu, Zifeng Cai, Jian Zhao, Zhonglin Jiang, Yong Chen

    Abstract: Token-based text-to-speech (TTS) models have emerged as a promising avenue for generating natural and realistic speech, yet they grapple with low pronunciation accuracy, speaking style and timbre inconsistency, and a substantial need for diverse training data. In response, we introduce a novel hierarchical acoustic modeling approach complemented by a tailored data augmentation strategy and train i…

    Submitted 9 March, 2024; originally announced March 2024.

  35. arXiv:2403.05425  [pdf, ps, other]

    stat.ML stat.ME

    An Adaptive Dimension Reduction Estimation Method for High-dimensional Bayesian Optimization

    Authors: Shouri Hu, Jiawei Li, Zhibo Cai

    Abstract: Bayesian optimization (BO) has shown impressive results in a variety of applications within low-to-moderate dimensional Euclidean spaces. However, extending BO to high-dimensional settings remains a significant challenge. We address this challenge by proposing a two-step optimization framework. Initially, we identify the effective dimension reduction (EDR) subspace for the objective function using…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: First draft

  36. arXiv:2403.05265  [pdf, other]

    cs.AI

    MMoE: Robust Spoiler Detection with Multi-modal Information and Domain-aware Mixture-of-Experts

    Authors: Zinan Zeng, Sen Ye, Zijian Cai, Heng Wang, Yuhan Liu, Haokai Zhang, Minnan Luo

    Abstract: Online movie review websites are valuable for information and discussion about movies. However, the massive spoiler reviews detract from the movie-watching experience, making spoiler detection an important task. Previous methods simply focus on reviews' text content, ignoring the heterogeneity of information in the platform. For instance, the metadata and the corresponding user's information of a…

    Submitted 13 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  37. arXiv:2403.04268  [pdf]

    quant-ph cs.LG

    Qubit-Wise Architecture Search Method for Variational Quantum Circuits

    Authors: Jialin Chen, Zhiqiang Cai, Ke Xu, Di Wu, Wei Cao

    Abstract: Considering the noise level limit, one crucial aspect for quantum machine learning is to design a high-performing variational quantum circuit architecture with a small number of quantum gates. Like classical neural architecture search (NAS), quantum architecture search methods (QAS) employ methods like reinforcement learning, evolutionary algorithms and supernet optimization to improve the search…

    Submitted 7 March, 2024; originally announced March 2024.

  38. arXiv:2403.02620  [pdf]

    physics.optics physics.app-ph

    Polarization-Encoded Lenticular Nano-Printing with Single-Layer Metasurfaces

    Authors: Lin Deng, Ziqiang Cai, Yongmin Liu

    Abstract: Metasurface-based nano-printing has enabled ultrahigh-resolution grayscale or color image display. However, the maximum number of independent nano-printing images allowed by one single-layer metasurface is still limited despite many multiplexing methods that have been proposed to increase the design degree of freedom. In this work, we substantially push the multiplexing limit of nano-printing by t…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 12 pages, 5 figures

  39. arXiv:2403.02586  [pdf, other]

    cs.CL

    Improving Event Definition Following For Zero-Shot Event Detection

    Authors: Zefan Cai, Po-Nien Kung, Ashima Suvarna, Mingyu Derek Ma, Hritik Bansal, Baobao Chang, P. Jeffrey Brantingham, Wei Wang, Nanyun Peng

    Abstract: Existing approaches on zero-shot event detection usually train models on datasets annotated with known event types, and prompt them with unseen event definitions. These approaches yield sporadic successes, yet generally fall short of expectations. In this work, we aim to improve zero-shot event detection by training models to better follow event definitions. We hypothesize that a diverse set of ev…

    Submitted 4 March, 2024; originally announced March 2024.

  40. arXiv:2403.02399  [pdf, other]

    astro-ph.GA

    The true number density of massive galaxies in the early Universe revealed by JWST/MIRI

    Authors: Tao Wang, Hanwen Sun, Luwenjia Zhou, Ke Xu, Cheng Cheng, Zhaozhou Li, Yangyao Chen, H. J. Mo, Avishai Dekel, Xianzhong Zheng, Zheng Cai, Tiacheng Yang, Y. -S. Dai, David Elbaz, J. -S. Huang

    Abstract: One of the main challenges in galaxy formation that has emerged recently is the early assembly of massive galaxies. The observed number density and the maximum stellar mass ($M_{\star}$) of massive galaxies in the early Universe appear to be higher than model predictions, which may pose a serious problem to the LCDM cosmology. A major limitation in many previous studies is the large uncertainty in…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 23 pages, 10 figures, submitted

  41. arXiv:2403.01691  [pdf, other

    astro-ph.HE

    How long will the quasar UV/optical flickering be damped?

    Authors: Shuying Zhou, Mouyuan Sun, Zhen-Yi Cai, Guowei Ren, Jun-Xian Wang, Yongquan Xue

    Abstract: The UV/optical light curves of Active Galactic Nuclei (AGNs) are commonly described by the Damped Random Walk (DRW) model. However, the physical interpretation of the damping timescale, a key parameter in the DRW model, remains unclear. Particularly, recent observations indicate a weak dependence of the damping timescale upon both wavelength and accretion rate, clearly being inconsistent with the… ▽ More

    Submitted 3 March, 2024; originally announced March 2024.

    Comments: 19 pages, 16 figures, accepted to ApJ

  42. arXiv:2403.01676  [pdf, ps, other

    hep-ph

    Production of $X_b$ via radiative transition of $Υ(10753)$

    Authors: Shi-Dong Liu, Hao-Dong Cai, Zu-Xin Cai, Hong-Shuo Gao, Gang Li, Fan Wang, Ju-Jun Xie

    Abstract: We studied the radiative transitions between the $Υ(10753)$, the $S$-$D$ mixed state of the $Υ(4S)$ and $Υ_1(3\,{}^3D_1)$, and the $X_b$, the heavy quark flavor symmetry counterpart of the $X(3872)$ in the bottomonium sector. The radiative transition was assumed to occur through the intermediate bottom mesons, including $P$-wave $B_1^{(\prime)}$ mesons as well as the $S$-wave $B^{(*)}$ ones. The c… ▽ More

    Submitted 9 May, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

    Comments: 7 pages, 4 figures, accepted by PRD(20240510)

  43. arXiv:2403.00873  [pdf, ps, other

    cs.CR cs.LG

    Blockchain-empowered Federated Learning: Benefits, Challenges, and Solutions

    Authors: Zeju Cai, Jianguo Chen, Yuting Fan, Zibin Zheng, Keqin Li

    Abstract: Federated learning (FL) is a distributed machine learning approach that protects user data privacy by training models locally on clients and aggregating them on a parameter server. While effective at preserving privacy, FL systems face limitations such as single points of failure, lack of incentives, and inadequate security. To address these challenges, blockchain technology is integrated into FL… ▽ More

    Submitted 5 July, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  44. arXiv:2402.17502  [pdf, other

    cs.CV eess.IV

    FedLPPA: Learning Personalized Prompt and Aggregation for Federated Weakly-supervised Medical Image Segmentation

    Authors: Li Lin, Yixiang Liu, Jiewei Wu, Pujin Cheng, Zhiyuan Cai, Kenneth K. Y. Wong, Xiaoying Tang

    Abstract: Federated learning (FL) effectively mitigates the data silo challenge brought about by policies and privacy concerns, implicitly harnessing more data for deep model training. However, traditional centralized FL models grapple with diverse multi-center data, especially in the face of significant data heterogeneity, notably in medical contexts. In the realm of medical image segmentation, the growing… ▽ More

    Submitted 31 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: 12 pages, 10 figures

  45. arXiv:2402.15527  [pdf, other

    cs.CL cs.AI cs.CV

    PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain

    Authors: Liang Chen, Yichi Zhang, Shuhuai Ren, Haozhe Zhao, Zefan Cai, Yuchi Wang, Peiyi Wang, Xiangdi Meng, Tianyu Liu, Baobao Chang

    Abstract: We present PCA-Bench, a multimodal decision-making benchmark for evaluating the integrated capabilities of Multimodal Large Language Models (MLLMs). Departing from previous benchmarks focusing on simplistic tasks and individual model capability, PCA-Bench introduces three complex scenarios: autonomous driving, domestic robotics, and open-world games. Given task instructions and diverse contexts, t… ▽ More

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: Code and Data released at https://github.com/pkunlp-icler/PCA-EVAL. Leaderboard at: https://docs.qq.com/sheet/DVUd4WUpGRHRqUnNV. This article supersedes its workshop version arxiv: 2310.02071. arXiv admin note: text overlap with arXiv:2310.02071

  46. arXiv:2402.11095  [pdf, other

    cs.CV

    GIM: Learning Generalizable Image Matcher From Internet Videos

    Authors: Xuelun Shen, Zhipeng Cai, Wei Yin, Matthias Müller, Zijun Li, Kaixuan Wang, Xiaozhi Chen, Cheng Wang

    Abstract: Image matching is a fundamental computer vision problem. While learning-based methods achieve state-of-the-art performance on existing benchmarks, they generalize poorly to in-the-wild images. Such methods typically need to train separate models for different scene types and are impractical when the scene type is unknown in advance. One of the underlying problems is the limited scalability of exis… ▽ More

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: Accepted to ICLR 2024 for spotlight presentation

  47. arXiv:2402.11000  [pdf, other

    cs.CL cs.AI

    ASGEA: Exploiting Logic Rules from Align-Subgraphs for Entity Alignment

    Authors: Yangyifei Luo, Zhuo Chen, Lingbing Guo, Qian Li, Wenxuan Zeng, Zhixin Cai, Jianxin Li

    Abstract: Entity alignment (EA) aims to identify entities across different knowledge graphs that represent the same real-world objects. Recent embedding-based EA methods have achieved state-of-the-art performance in EA yet faced interpretability challenges as they purely rely on the embedding distance and neglect the logic rules behind a pair of aligned entities. In this paper, we propose the Align-Subgraph… ▽ More

    Submitted 5 March, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: Ongoing work; 16 pages, 9 Tables, 8 Figures; Code: https://github.com/lyyf2002/ASGEA

  48. arXiv:2402.09511  [pdf, other

    quant-ph

    Biased Estimator Channels for Classical Shadows

    Authors: Zhenyu Cai, Adrian Chapman, Hamza Jnane, Bálint Koczor

    Abstract: Extracting classical information from quantum systems is of fundamental importance, and classical shadows allow us to extract a large amount of information using relatively few measurements. Conventional shadow estimators are unbiased and thus approach the true mean in the infinite-sample limit. In this work, we consider a biased scheme, intentionally introducing a bias by rescaling the convention… ▽ More

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: 13 pages, 5 figures

  49. arXiv:2402.09059  [pdf, other

    cs.LG cs.AI cs.CR

    I can't see it but I can Fine-tune it: On Encrypted Fine-tuning of Transformers using Fully Homomorphic Encryption

    Authors: Prajwal Panzade, Daniel Takabi, Zhipeng Cai

    Abstract: In today's machine learning landscape, fine-tuning pretrained transformer models has emerged as an essential technique, particularly in scenarios where access to task-aligned training data is limited. However, challenges surface when data sharing encounters obstacles due to stringent privacy regulations or user apprehension regarding personal information disclosure. Earlier works based on secure m… ▽ More

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: Accepted for the presentation at PPAI @The 38th Annual AAAI Conference on Artificial Intelligence 2024

  50. arXiv:2402.07866  [pdf, other

    quant-ph

    Virtual Channel Purification

    Authors: Zhenhuan Liu, Xingjian Zhang, Yue-Yang Fei, Zhenyu Cai

    Abstract: Quantum error mitigation is a key approach for extracting target state properties on state-of-the-art noisy machines and early fault-tolerant devices. Using the ideas from flag fault tolerance and virtual state purification, we develop the virtual channel purification (VCP) protocol, which consumes similar qubit and gate resources as virtual state purification but offers up to exponentially strong… ▽ More

    Submitted 12 February, 2024; originally announced February 2024.