Skip to main content

Showing 1–50 of 15,179 results for author: Yao

.
  1. arXiv:2407.09395  [pdf, other

    cs.IR cs.AI cs.CL

    Deep Bag-of-Words Model: An Efficient and Interpretable Relevance Architecture for Chinese E-Commerce

    Authors: Zhe Lin, Jiwei Tan, Dan Ou, Xi Chen, Shaowei Yao, Bo Zheng

    Abstract: Text relevance or text matching of query and product is an essential technique for the e-commerce search system to ensure that the displayed products can match the intent of the query. Many studies focus on improving the performance of the relevance model in search system. Recently, pre-trained language models like BERT have achieved promising performance on the text relevance task. While these mo… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: KDD'24 accepted paper

  2. arXiv:2407.09164  [pdf, other

    cs.CR cs.AI

    TAPI: Towards Target-Specific and Adversarial Prompt Injection against Code LLMs

    Authors: Yuchen Yang, Hongwei Yao, Bingrun Yang, Yiling He, Yiming Li, Tianwei Zhang, Zhan Qin, Kui Ren

    Abstract: Recently, code-oriented large language models (Code LLMs) have been widely and successfully used to simplify and facilitate code programming. With these tools, developers can easily generate desired complete functional codes based on incomplete code and natural language prompts. However, a few pioneering works revealed that these Code LLMs are also vulnerable, e.g., against backdoor and adversaria… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  3. arXiv:2407.09073  [pdf, other

    cs.CV

    Open Vocabulary Multi-Label Video Classification

    Authors: Rohit Gupta, Mamshad Nayeem Rizve, Jayakrishnan Unnikrishnan, Ashish Tawari, Son Tran, Mubarak Shah, Benjamin Yao, Trishul Chilimbi

    Abstract: Pre-trained vision-language models (VLMs) have enabled significant progress in open vocabulary computer vision tasks such as image classification, object detection and image segmentation. Some recent works have focused on extending VLMs to open vocabulary single label action classification in videos. However, previous methods fall short in holistic video understanding which requires the ability to… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted at ECCV 2024

  4. arXiv:2407.09026  [pdf, other

    cs.CV cs.LG cs.MM eess.IV

    HPC: Hierarchical Progressive Coding Framework for Volumetric Video

    Authors: Zihan Zheng, Houqiang Zhong, Qiang Hu, Xiaoyun Zhang, Li Song, Ya Zhang, Yanfeng Wang

    Abstract: Volumetric video based on Neural Radiance Field (NeRF) holds vast potential for various 3D applications, but its substantial data volume poses significant challenges for compression and transmission. Current NeRF compression lacks the flexibility to adjust video quality and bitrate within a single model for various network and device capacities. To address these issues, we propose HPC, a novel hie… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 11 pages, 7 figures

  5. arXiv:2407.08745  [pdf, other

    cs.NE cs.AI

    Evolutionary Computation for the Design and Enrichment of General-Purpose Artificial Intelligence Systems: Survey and Prospects

    Authors: Javier Poyatos, Javier Del Ser, Salvador Garcia, Hisao Ishibuchi, Daniel Molina, Isaac Triguero, Bing Xue, Xin Yao, Francisco Herrera

    Abstract: In Artificial Intelligence, there is an increasing demand for adaptive models capable of dealing with a diverse spectrum of learning tasks, surpassing the limitations of systems devised to cope with a single task. The recent emergence of General-Purpose Artificial Intelligence Systems (GPAIS) poses model configuration and adaptability challenges at far greater complexity scales than the optimal de… ▽ More

    Submitted 3 June, 2024; originally announced July 2024.

  6. arXiv:2407.08596  [pdf, other

    astro-ph.HE astro-ph.GA

    Modeling X-Ray Multi-Reflection in Super-Eddington Winds

    Authors: Zijian Zhang, Lars Lund Thomsen, Lixin Dai, Christopher S. Reynolds, Javier A. García, Erin Kara, Riley Connors, Megan Masterson, Yuhan Yao, Thomas Dauser

    Abstract: It has been recently discovered that a few super-Eddington sources undergoing black hole super-Eddington accretion exhibit X-ray reflection signatures. In such new systems, one expects that the coronal X-ray emissions are mainly reflected by optically thick super-Eddington winds instead of thin disks. In this paper, we conduct a series of general relativistic ray-tracing and Monte Carlo radiative… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 23 pages, 21 figures, 2 tables. Comments are welcome

  7. arXiv:2407.08583  [pdf, other

    cs.AI cs.CV cs.LG

    The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective

    Authors: Zhen Qin, Daoyuan Chen, Wenhao Zhang, Liuyi Yao, Yilun Huang, Bolin Ding, Yaliang Li, Shuiguang Deng

    Abstract: The rapid development of large language models (LLMs) has been witnessed in recent years. Based on the powerful LLMs, multi-modal LLMs (MLLMs) extend the modality from text to a broader spectrum of domains, attracting widespread attention due to the broader range of application scenarios. As LLMs and MLLMs rely on vast amounts of model parameters and data to achieve emergent capabilities, the impo… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Ongoing work. 31 pages. Related materials are continually maintained and available at https://github.com/modelscope/data-juicer/blob/main/docs/awesome_llm_data.md

  8. arXiv:2407.08512  [pdf, other

    math.SG

    Anchored symplectic embeddings

    Authors: Michael Hutchings, Agniva Roy, Morgan Weiler, Yuan Yao

    Abstract: Given two four-dimensional symplectic manifolds, together with knots in their boundaries, we define an ``anchored symplectic embedding'' to be a symplectic embedding, together with a two-dimensional symplectic cobordism between the knots (in the four-dimensional cobordism determined by the embedding). We use techniques from embedded contact homology to determine quantitative critera for when ancho… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 30 pages, 3 figures

    MSC Class: 57K43

  9. arXiv:2407.08443  [pdf, other

    cs.CV

    Infinite Motion: Extended Motion Generation via Long Text Instructions

    Authors: Mengtian Li, Chengshuo Zhai, Shengxiang Yao, Zhifeng Xie, Keyu Chen, Yu-Gang Jiang

    Abstract: In the realm of motion generation, the creation of long-duration, high-quality motion sequences remains a significant challenge. This paper presents our groundbreaking work on "Infinite Motion", a novel approach that leverages long text to extended motion generation, effectively bridging the gap between short and long-duration motion synthesis. Our core insight is the strategic extension and reass… ▽ More

    Submitted 12 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: 12 pages,13 figures

  10. arXiv:2407.08255  [pdf, other

    cs.CV cs.LG

    GraphMamba: An Efficient Graph Structure Learning Vision Mamba for Hyperspectral Image Classification

    Authors: Aitao Yang, Min Li, Yao Ding, Leyuan Fang, Yaoming Cai, Yujie He

    Abstract: Efficient extraction of spectral sequences and geospatial information has always been a hot topic in hyperspectral image classification. In terms of spectral sequence feature capture, RNN and Transformer have become mainstream classification frameworks due to their long-range feature capture capabilities. In terms of spatial information aggregation, CNN enhances the receptive field to retain integ… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 13 pages, 10 figures

  11. arXiv:2407.08150  [pdf, other

    cs.CV

    Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding

    Authors: Minghui Wu, Chenxu Zhao, Anyang Su, Donglin Di, Tianyu Fu, Da An, Min He, Ya Gao, Meng Ma, Kun Yan, ** Wang

    Abstract: Understanding of video creativity and content often varies among individuals, with differences in focal points and cognitive levels across different ages, experiences, and genders. There is currently a lack of research in this area, and most existing benchmarks suffer from several drawbacks: 1) a limited number of modalities and answers with restrictive length; 2) the content and scenarios within… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  12. arXiv:2407.08141  [pdf, ps, other

    eess.SP

    A Framework of FAS-RIS Systems: Performance Analysis and Throughput Optimization

    Authors: Junteng Yao, Xiazhi Lai, Kangda Zhi, Tuo Wu, Ming **, Cunhua Pan, Maged Elkashlan, Chau Yuen, Kai-Kit Wong

    Abstract: In this paper, we investigate reconfigurable intelligent surface (RIS)-assisted communication systems which involve a fixed-antenna base station (BS) and a mobile user (MU) that is equipped with fluid antenna system (FAS). Specifically, the RIS is utilized to enable communication for the user whose direct link from the base station is blocked by obstacles. We propose a comprehensive framework that… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: submitted to IEEE journal for possible publication

  13. arXiv:2407.08120  [pdf, other

    astro-ph.GA

    Spectroastrometry and Reverberation Map** (SARM) of Active Galactic Nuclei. I. The H$β$ Broad-line Region Structure and Black Hole Mass of Five Quasars

    Authors: Yan-Rong Li, Chen Hu, Zhu-Heng Yao, Yong-Jie Chen, Hua-Rui Bai, Sen Yang, Pu Du, Feng-Na Fang, Yi-Xin Fu, Jun-Rong Liu, Yue-Chang Peng, Yu-Yang Songsheng, Yi-Lin Wang, Ming Xiao, Shuo Zhai, Hartmut Winkler, **-Ming Bai, Luis C. Ho, Romain G. Petrov, Jesus Aceituno, Jian-Min Wang

    Abstract: We conduct a reverberation map** (RM) campaign to spectroscopically monitor a sample of selected bright active galactic nuclei with large anticipated broad-line region (BLR) sizes adequate for spectroastrometric observations by the GRAVITY instrument on the Very Large Telescope Interferometer. We report the first results for five objects, IC 4329A, Mrk 335, Mrk 509, Mrk 1239, and PDS 456, among… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 32 pages, 6 tables, 20 figures. To appear in ApJ

  14. arXiv:2407.08020  [pdf, other

    cs.CV

    Interactive Segmentation Model for Placenta Segmentation from 3D Ultrasound images

    Authors: Hao Li, Baris Oguz, Gabriel Arenas, Xing Yao, Jiacheng Wang, Alison Pouch, Brett Byram, Nadav Schwartz, Ipek Oguz

    Abstract: Placenta volume measurement from 3D ultrasound images is critical for predicting pregnancy outcomes, and manual annotation is the gold standard. However, such manual annotation is expensive and time-consuming. Automated segmentation algorithms can often successfully segment the placenta, but these methods may not consistently produce robust segmentations suitable for practical use. Recently, inspi… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  15. arXiv:2407.08010  [pdf

    cs.LG cs.NE

    A New Self-organizing Interval Type-2 Fuzzy Neural Network for Multi-Step Time Series Prediction

    Authors: Fulong Yao, Wanqing Zhao, Matthew Forshaw, Yang Song

    Abstract: This paper proposes a new self-organizing interval type-2 fuzzy neural network with multiple outputs (SOIT2FNN-MO) for multi-step time series prediction. Differing from the traditional six-layer IT2FNN, a nine-layer network is developed to improve prediction accuracy, uncertainty handling and model interpretability. First, a new co-antecedent layer and a modified consequent layer are devised to im… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  16. arXiv:2407.07933  [pdf, other

    stat.ME cs.LG stat.ML

    Identification and Estimation of the Bi-Directional MR with Some Invalid Instruments

    Authors: Feng Xie, Zhen Yao, Lin Xie, Yan Zeng, Zhi Geng

    Abstract: We consider the challenging problem of estimating causal effects from purely observational data in the bi-directional Mendelian randomization (MR), where some invalid instruments, as well as unmeasured confounding, usually exist. To address this problem, most existing methods attempt to find proper valid instrumental variables (IVs) for the target causal effect by expert knowledge or by assuming t… ▽ More

    Submitted 12 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: 27 pages, 6 tables, 7 figures

  17. arXiv:2407.07651  [pdf, other

    hep-ex physics.data-an

    Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$

    Authors: M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (645 additional authors not shown)

    Abstract: The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  18. arXiv:2407.07395  [pdf, other

    cs.CV cs.MM eess.IV

    Standard compliant video coding using low complexity, switchable neural wrappers

    Authors: Yueyu Hu, Chenhao Zhang, Onur G. Guleryuz, Debargha Mukherjee, Yao Wang

    Abstract: The proliferation of high resolution videos posts great storage and bandwidth pressure on cloud video services, driving the development of next-generation video codecs. Despite great progress made in neural video coding, existing approaches are still far from economical deployment considering the complexity and rate-distortion performance tradeoff. To clear the roadblocks for neural video coding,… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by IEEE ICIP 2024

  19. arXiv:2407.07346  [pdf, other

    cs.LG cs.CE

    INSIGHT: Universal Neural Simulator for Analog Circuits Harnessing Autoregressive Transformers

    Authors: Souradip Poddar, Youngmin Oh, Yao Lai, Hanqing Zhu, Bosun Hwang, David Z. Pan

    Abstract: Analog front-end design heavily relies on specialized human expertise and costly trial-and-error simulations, which motivated many prior works on analog design automation. However, efficient and effective exploration of the vast and complex design space remains constrained by the time-consuming nature of CPU-based SPICE simulations, making effective design automation a challenging endeavor. In thi… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  20. arXiv:2407.07302  [pdf, other

    eess.IV cs.CV

    Pairwise Distance Distillation for Unsupervised Real-World Image Super-Resolution

    Authors: Yuehan Zhang, Seungjun Lee, Angela Yao

    Abstract: Standard single-image super-resolution creates paired training data from high-resolution images through fixed downsampling kernels. However, real-world super-resolution (RWSR) faces unknown degradations in the low-resolution inputs, all the while lacking paired training data. Existing methods approach this problem by learning blind general models through complex synthetic augmentations on training… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  21. arXiv:2407.06715  [pdf, other

    quant-ph cond-mat.stat-mech

    On the relation between momentum uncertainty and thermal wavelength

    Authors: Zi-Fan Zhu, Yao Wang

    Abstract: For quantum particles in a Boltzmann state, we derive an inequality between momentum uncertainty $Δp$ and thermal de Broglie wavelength $λ_{\rm th}$, expressed as $Δp \geq \sqrt{2π}\hbar/λ_{\rm th}$, as a corollary of the Boltzmann lower bound for the Heisenberg uncertainty product proposed in the previous work [EPL, 143, 20001 (2023)]

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 4 pages, 1 figure

  22. arXiv:2407.06714  [pdf, other

    cs.CV

    Improving the Transferability of Adversarial Examples by Feature Augmentation

    Authors: Donghua Wang, Wen Yao, Tingsong Jiang, Xiaohu Zheng, Junqi Wu, Xiaoqian Chen

    Abstract: Despite the success of input transformation-based attacks on boosting adversarial transferability, the performance is unsatisfying due to the ignorance of the discrepancy across models. In this paper, we propose a simple but effective feature augmentation attack (FAUG) method, which improves adversarial transferability without introducing extra computation costs. Specifically, we inject the random… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 19 pages, 4 figures, 4 tables

  23. arXiv:2407.06688  [pdf, other

    cs.CV

    Universal Multi-view Black-box Attack against Object Detectors via Layout Optimization

    Authors: Donghua Wang, Wen Yao, Tingsong Jiang, Chao Li, Xiaoqian Chen

    Abstract: Object detectors have demonstrated vulnerability to adversarial examples crafted by small perturbations that can deceive the object detector. Existing adversarial attacks mainly focus on white-box attacks and are merely valid at a specific viewpoint, while the universal multi-view black-box attack is less explored, limiting their generalization in practice. In this paper, we propose a novel univer… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 12 pages, 13 figures, 5 tables

  24. arXiv:2407.06634  [pdf, other

    cond-mat.quant-gas cond-mat.str-el quant-ph

    Finite size analysis for interacting bosons at the 2D-1D Dimensional Crossover

    Authors: Lorenzo Pizzino, Hepeng Yao, Thierry Giamarchi

    Abstract: In this work, we extend the analysis of interacting bosons at 2D-1D dimensional crossover for finite size and temperature by using field-theory approach (bosonization) and quantum Monte Carlo simulations. Stemming from the fact that finite size low-dimensional systems are allowed only to have quasi-ordered phase, we consider the self-consistent harmonic approximation to compute the fraction of qua… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  25. arXiv:2407.06567  [pdf, other

    cs.CL

    FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making

    Authors: Yangyang Yu, Zhiyuan Yao, Haohang Li, Zhiyang Deng, Yupeng Cao, Zhi Chen, Jordan W. Suchow, Rong Liu, Zhenyu Cui, Denghui Zhang, Koduvayur Subbalakshmi, Guojun Xiong, Yueru He, Jimin Huang, Dong Li, Qianqian Xie

    Abstract: Large language models (LLMs) have demonstrated notable potential in conducting complex tasks and are increasingly utilized in various financial applications. However, high-quality sequential financial investment decision-making remains challenging. These tasks require multiple interactions with a volatile environment for every decision, demanding sufficient intelligence to maximize returns and man… ▽ More

    Submitted 10 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: LLM Applications, LLM Agents, Financial Technology, Quantitative Finance, Algorithmic Trading, Cognitive Science

  26. arXiv:2407.06504  [pdf, other

    cs.CV

    Reprogramming Distillation for Medical Foundation Models

    Authors: Yuhang Zhou, Siyuan Du, Haolin Li, Jiangchao Yao, Ya Zhang, Yanfeng Wang

    Abstract: Medical foundation models pre-trained on large-scale datasets have demonstrated powerful versatile capabilities for various tasks. However, due to the gap between pre-training tasks (or modalities) and downstream tasks (or modalities), the real-world computation and speed constraints, it might not be straightforward to apply medical foundation models in the downstream scenarios. Previous methods,… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: MICCAI 2024

  27. arXiv:2407.06498  [pdf, other

    cs.HC

    Enhancing spatial auditory attention decoding with neuroscience-inspired prototype training

    Authors: Zelin Qiu, Jianjun Gu, Dingding Yao, Junfeng Li

    Abstract: The spatial auditory attention decoding (Sp-AAD) technology aims to determine the direction of auditory attention in multi-talker scenarios via neural recordings. Despite the success of recent Sp-AAD algorithms, their performance is hindered by trial-specific features in EEG data. This study aims to improve decoding performance against these features. Studies in neuroscience indicate that spatial… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  28. arXiv:2407.06176  [pdf, other

    cs.CV eess.IV

    Contour-weighted loss for class-imbalanced image segmentation

    Authors: Zhhengyong Huang, Yao Sui

    Abstract: Image segmentation is critically important in almost all medical image analysis for automatic interpretations and processing. However, it is often challenging to perform image segmentation due to data imbalance between intra- and inter-class, resulting in over- or under-segmentation. Consequently, we proposed a new methodology to address the above issue, with a compact yet effective contour-weight… ▽ More

    Submitted 7 June, 2024; originally announced July 2024.

    Comments: ICIP 2024

  29. arXiv:2407.06043  [pdf, other

    cs.CV

    Test-time adaptation for geospatial point cloud semantic segmentation with distinct domain shifts

    Authors: Puzuo Wang, Wei Yao, Jie Shao, Zhiyi He

    Abstract: Domain adaptation (DA) techniques help deep learning models generalize across data shifts for point cloud semantic segmentation (PCSS). Test-time adaptation (TTA) allows direct adaptation of a pre-trained model to unlabeled data during inference stage without access to source data or additional training, avoiding privacy issues and large computational resources. We address TTA for geospatial PCSS… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  30. arXiv:2407.05954  [pdf, other

    cs.LG stat.ME

    Causality-driven Sequence Segmentation for Enhancing Multiphase Industrial Process Data Analysis and Soft Sensing

    Authors: Yimeng He, Le Yao, Xinmin Zhang, Xiangyin Kong, Zhihuan Song

    Abstract: The dynamic characteristics of multiphase industrial processes present significant challenges in the field of industrial big data modeling. Traditional soft sensing models frequently neglect the process dynamics and have difficulty in capturing transient phenomena like phase transitions. To address this issue, this article introduces a causality-driven sequence segmentation (CDSS) model. This mode… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  31. arXiv:2407.05869  [pdf, other

    cs.AI

    PORCA: Root Cause Analysis with Partially Observed Data

    Authors: Chang Gong, Di Yao, ** Wang, Wenbin Li, Lanting Fang, Yongtao Xie, Kaiyu Feng, Peng Han, **g** Bi

    Abstract: Root Cause Analysis (RCA) aims at identifying the underlying causes of system faults by uncovering and analyzing the causal structure from complex systems. It has been widely used in many application domains. Reliable diagnostic conclusions are of great importance in mitigating system failures and financial losses. However, previous studies implicitly assume a full observation of the system, which… ▽ More

    Submitted 11 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  32. arXiv:2407.05846  [pdf, ps, other

    quant-ph

    Implementation of Composite Photon Blockade Based on Four-wave Mixing System

    Authors: Hongyu Lina, Zhi-Hai Yao, Xiao-Qian Wang, Feng Gao

    Abstract: A high-quality single-photon blockade system can effectively enhance the quality of single-photon sources. Conventional photon blockade(CPB) suffers from low single-photon purity and high requirements for system nonlinearity, while unconventional photon blockade(UPB) has the disadvantage of low brightness. Recent research by [Laser Photon.Rev 14,1900279,2020] demonstrates that UPB can be used to e… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  33. arXiv:2407.05765  [pdf, other

    cs.CV

    Enlarging Feature Support Overlap for Domain Generalization

    Authors: Yaoyao Zhu, Xiuding Cai, Dong Miao, Yu Yao, Zhongliang Fu

    Abstract: Deep models often struggle with out-of-distribution (OOD) generalization, limiting their real-world applicability beyond controlled laboratory settings. Invariant risk minimization (IRM) addresses this issue by learning invariant features and minimizing the risk across different domains. Thus, it avoids the pitfalls of pseudo-invariant features and spurious causality associated with empirical risk… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  34. arXiv:2407.05670  [pdf, other

    cond-mat.supr-con cond-mat.mes-hall

    Influence of capacitance and thermal fluctuations on the Josephson diode effect in asymmetric higher-harmonic SQUIDs

    Authors: G. S. Seleznev, Ya. V. Fominov

    Abstract: Asymmetric two-junction SQUIDs with different current-phase relations in the two Josephson junctions, involving higher Josephson harmonics, demonstrate a flux-tunable Josephson diode effect (asymmetry between currents flowing in the opposite directions, which can be tuned by the magnetic flux through the interferometer loop). We theoretically investigate influence of junction capacitance and therm… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 20 pages, 10 figures

  35. arXiv:2407.05655  [pdf

    cs.HC

    Constrained Online Recursive Source Separation Framework for Real-time Electrophysiological Signal Processing

    Authors: Li Yao, Zhao Haowen, Liu Yunfei, Zhang Xu

    Abstract: Electrophysiological signal processing often requires blind source separation (BSS) techniques due to the nature of mixing source signals. However, its complex computational demands make real-time applicability challenging. In this study, we propose a Constrained Online Recursive Source Separation (CORSS) framework for real time electrophysiological signals processing. With a stepwise recursive un… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  36. arXiv:2407.05577  [pdf, other

    cs.CV

    Audio-driven High-resolution Seamless Talking Head Video Editing via StyleGAN

    Authors: Jiacheng Su, Kunhong Liu, Liyan Chen, Junfeng Yao, Qingsong Liu, Dongdong Lv

    Abstract: The existing methods for audio-driven talking head video editing have the limitations of poor visual effects. This paper tries to tackle this problem through editing talking face images seamless with different emotions based on two modules: (1) an audio-to-landmark module, consisting of the CrossReconstructed Emotion Disentanglement and an alignment network module. It bridges the gap between speec… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

  37. arXiv:2407.05131  [pdf, other

    cs.LG cs.AI cs.CL cs.CV cs.CY

    RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models

    Authors: Peng Xia, Kangyu Zhu, Haoran Li, Hongtu Zhu, Yun Li, Gang Li, Linjun Zhang, Huaxiu Yao

    Abstract: The recent emergence of Medical Large Vision Language Models (Med-LVLMs) has enhanced medical diagnosis. However, current Med-LVLMs frequently encounter factual issues, often generating responses that do not align with established medical facts. Retrieval-Augmented Generation (RAG), which utilizes external knowledge, can improve the factual accuracy of these models but introduces two major challen… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

  38. arXiv:2407.05128  [pdf, other

    cs.CV

    SCSA: Exploring the Synergistic Effects Between Spatial and Channel Attention

    Authors: Yunzhong Si, Huiying Xu, Xinzhong Zhu, Wenhao Zhang, Yao Dong, Yuxing Chen, Hongbo Li

    Abstract: Channel and spatial attentions have respectively brought significant improvements in extracting feature dependencies and spatial structure relations for various downstream vision tasks. While their combination is more beneficial for leveraging their individual strengths, the synergy between channel and spatial attentions has not been fully explored, lacking in fully harness the synergistic potenti… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

  39. arXiv:2407.05023  [pdf, other

    cs.CV

    SurgicalGaussian: Deformable 3D Gaussians for High-Fidelity Surgical Scene Reconstruction

    Authors: Weixing Xie, Junfeng Yao, Xianpeng Cao, Qiqin Lin, Zerui Tang, Xiao Dong, Xiaohu Guo

    Abstract: Dynamic reconstruction of deformable tissues in endoscopic video is a key technology for robot-assisted surgery. Recent reconstruction methods based on neural radiance fields (NeRFs) have achieved remarkable results in the reconstruction of surgical scenes. However, based on implicit representation, NeRFs struggle to capture the intricate details of objects in the scene and cannot achieve real-tim… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

  40. arXiv:2407.04842  [pdf, other

    cs.CV cs.CL cs.LG

    MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

    Authors: Zhaorun Chen, Yichao Du, Zichen Wen, Yiyang Zhou, Chenhang Cui, Zhenzhen Weng, Haoqin Tu, Chaoqi Wang, Zhengwei Tong, Qinglan Huang, Canyu Chen, Qinghao Ye, Zhihong Zhu, Yuqing Zhang, Jiawei Zhou, Zhuokai Zhao, Rafael Rafailov, Chelsea Finn, Huaxiu Yao

    Abstract: While text-to-image models like DALLE-3 and Stable Diffusion are rapidly proliferating, they often encounter challenges such as hallucination, bias, and the production of unsafe, low-quality output. To effectively address these issues, it is crucial to align these models with desired behaviors based on feedback from a multimodal judge. Despite their significance, current multimodal judges frequent… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 42 pages, 13 figures, 33 tables

  41. arXiv:2407.04557  [pdf, other

    cond-mat.mtrl-sci cs.LG

    Structural Constraint Integration in Generative Model for Discovery of Quantum Material Candidates

    Authors: Ryotaro Okabe, Mouyang Cheng, Abhijatmedhi Chotrattanapituk, Nguyen Tuan Hung, Xiang Fu, Bowen Han, Yao Wang, Weiwei Xie, Robert J. Cava, Tommi S. Jaakkola, Yongqiang Cheng, Mingda Li

    Abstract: Billions of organic molecules are known, but only a tiny fraction of the functional inorganic materials have been discovered, a particularly relevant problem to the community searching for new quantum materials. Recent advancements in machine-learning-based generative models, particularly diffusion models, show great promise for generating new, stable materials. However, integrating geometric patt… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 512 pages total, 4 main figures + 218 supplementary figures

  42. arXiv:2407.04509  [pdf, other

    math.AP

    Analysis of SIR Reaction diffusion system with constant birth and death rate

    Authors: Yiting Yao

    Abstract: This is a truncation of the second year group project at Imperial college london. In this paper, we consider a semilinear reaction diffusion system of SIR model which involves the birth rate and the death rate. We first prove the non-negativity and global existence theorem to ensure that the model makes sense. We prove the uniform convergence of the infection-free solution and study an example tha… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

  43. PROUD: PaRetO-gUided Diffusion Model for Multi-objective Generation

    Authors: Yinghua Yao, Yuangang Pan, **g Li, Ivor Tsang, Xin Yao

    Abstract: Recent advancements in the realm of deep generative models focus on generating samples that satisfy multiple desired properties. However, prevalent approaches optimize these property functions independently, thus omitting the trade-offs among them. In addition, the property optimization is often improperly integrated into the generative models, resulting in an unnecessary compromise on generation… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Journal ref: Machine Learning 2024

  44. Exploration of Class Center for Fine-Grained Visual Classification

    Authors: Hang Yao, Qiguang Miao, Peipei Zhao, Chaoneng Li, Xin Li, Guanwen Feng, Ruyi Liu

    Abstract: Different from large-scale classification tasks, fine-grained visual classification is a challenging task due to two critical problems: 1) evident intra-class variances and subtle inter-class differences, and 2) overfitting owing to fewer training samples in datasets. Most existing methods extract key features to reduce intra-class variances, but pay no attention to subtle inter-class differences… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accpeted by TCSVT. Code and trained models are here:https://github.com/hyao1/ECC

  45. arXiv:2407.04179  [pdf, other

    cs.CL

    Defense Against Syntactic Textual Backdoor Attacks with Token Substitution

    Authors: Xinglin Li, Xianwen He, Yao Li, Minhao Cheng

    Abstract: Textual backdoor attacks present a substantial security risk to Large Language Models (LLM). It embeds carefully chosen triggers into a victim model at the training stage, and makes the model erroneously predict inputs containing the same triggers as a certain class. Prior backdoor defense methods primarily target special token-based triggers, leaving syntax-based triggers insufficiently addressed… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  46. arXiv:2407.04125  [pdf, other

    cs.CL cs.AI cs.LG

    Query-Guided Self-Supervised Summarization of Nursing Notes

    Authors: Ya Gao, Hans Moen, Saila Koivusalo, Miika Koskinen, Pekka Marttinen

    Abstract: Nursing notes, an important component of Electronic Health Records (EHRs), keep track of the progression of a patient's health status during a care episode. Distilling the key information in nursing notes through text summarization techniques can improve clinicians' efficiency in understanding patients' conditions when reviewing nursing notes. However, existing abstractive summarization methods in… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  47. arXiv:2407.04067  [pdf, other

    cs.CL

    Semantic Graphs for Syntactic Simplification: A Revisit from the Age of LLM

    Authors: Peiran Yao, Kostyantyn Guzhva, Denilson Barbosa

    Abstract: Symbolic sentence meaning representations, such as AMR (Abstract Meaning Representation) provide expressive and structured semantic graphs that act as intermediates that simplify downstream NLP tasks. However, the instruction-following capability of large language models (LLMs) offers a shortcut to effectively solve NLP tasks, questioning the utility of semantic graphs. Meanwhile, recent work has… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted at TextGraphs-17 @ ACL 2024

  48. arXiv:2407.04020  [pdf, other

    cs.CL

    LLMAEL: Large Language Models are Good Context Augmenters for Entity Linking

    Authors: Amy Xin, Yunjia Qi, Zijun Yao, Fangwei Zhu, Kaisheng Zeng, Xu Bin, Lei Hou, Juanzi Li

    Abstract: Entity Linking (EL) models are well-trained at map** mentions to their corresponding entities according to a given context. However, EL models struggle to disambiguate long-tail entities due to their limited training data. Meanwhile, large language models (LLMs) are more robust at interpreting uncommon mentions. Yet, due to a lack of specialized training, LLMs suffer at generating correct entity… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  49. arXiv:2407.03917  [pdf, other

    cs.CV

    Timestep-Aware Correction for Quantized Diffusion Models

    Authors: Yuzhe Yao, Feng Tian, Jun Chen, Haonan Lin, Guang Dai, Yong Liu, **gdong Wang

    Abstract: Diffusion models have marked a significant breakthrough in the synthesis of semantically coherent images. However, their extensive noise estimation networks and the iterative generation process limit their wider application, particularly on resource-constrained platforms like mobile devices. Existing post-training quantization (PTQ) methods have managed to compress diffusion models to low precisio… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  50. arXiv:2407.03772  [pdf, other

    eess.IV cs.CV q-bio.QM

    CS3: Cascade SAM for Sperm Segmentation

    Authors: Yi Shi, Xu-Peng Tian, Yun-Kai Wang, Tie-Yi Zhang, Bin Yao, Hui Wang, Yong Shao, Cen-Cen Wang, Rong Zeng, De-Chuan Zhan

    Abstract: Automated sperm morphology analysis plays a crucial role in the assessment of male fertility, yet its efficacy is often compromised by the challenges in accurately segmenting sperm images. Existing segmentation techniques, including the Segment Anything Model(SAM), are notably inadequate in addressing the complex issue of sperm overlap-a frequent occurrence in clinical samples. Our exploratory stu… ▽ More

    Submitted 9 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Early accepted by MICCAI2024