Showing 101–150 of 18,850 results for author: Zhang, Y

  1. arXiv:2406.16476  [pdf, other]

    cs.CV

    ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance

    Authors: Shuwei Shi, Wenbo Li, Yuechen Zhang, Jingwen He, Biao Gong, Yinqiang Zheng

    Abstract: Diffusion models excel at producing high-quality images; however, scaling to higher resolutions, such as 4K, often results in over-smoothed content, structural distortions, and repetitive patterns. To this end, we introduce ResMaster, a novel, training-free method that empowers resolution-limited diffusion models to generate high-quality images beyond resolution restrictions. Specifically, ResMast… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  2. arXiv:2406.16457  [pdf, other]

    cond-mat.mtrl-sci

    A hybrid FEM-NN optimization method to learn the physics-constrained constitutive relations from full-field data

    Authors: Xinxin Wu, Kaiqiang Sun, Shaohua Yang, Huan Wang, Ye Xu, Yin Zhang, Sheng Mao

    Abstract: Neural networks (NNs) have demonstrated strong capabilities of representing high-dimensional, complex functional relations, and hence have been widely used to characterize complex constitutive relations for various types of materials, such as polycrystals, polymers, etc. However, to construct a reliable NN-based constitutive model, a considerable amount of data, i.e. stress-strain states along dif… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 14 pages, 7 figures

  3. arXiv:2406.16431  [pdf, other]

    physics.ins-det

    A ROOT based detector geometry and event visualization system for JUNO-TAO

    Authors: Minghua Liao, Kaixuan Huang, Yumei Zhang, Jiayang Xu, Guofu Cao, Zhengyun You

    Abstract: The Taishan Antineutrino Observatory (TAO or JUNO-TAO) is a satellite experiment of Jiangmen Underground Neutrino Observatory (JUNO) and located near the Taishan nuclear power plant (NPP). TAO will measure the energy spectrum of reactor antineutrinos with unprecedented precision, which will benefit both reactor neutrino physics and the nuclear database. A detector geometry and event visualization… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  4. arXiv:2406.16427  [pdf, other]

    cs.CV cs.AI

    Dynamic Pseudo Label Optimization in Point-Supervised Nuclei Segmentation

    Authors: Ziyue Wang, Ye Zhang, Yifeng Wang, Linghan Cai, Yongbing Zhang

    Abstract: Deep learning has achieved impressive results in nuclei segmentation, but the massive requirement for pixel-wise labels remains a significant challenge. To alleviate the annotation burden, existing methods generate pseudo masks for model training using point labels. However, the generated masks are inevitably different from the ground truth, and these dissimilarities are not handled reasonably dur… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: early accepted by MICCAI2024

  5. arXiv:2406.16148  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking

    Authors: Yuwei Zhang, Tong Xia, Jing Han, Yu Wu, Georgios Rizos, Yang Liu, Mohammed Mosuily, Jagmohan Chauhan, Cecilia Mascolo

    Abstract: Respiratory audio, such as coughing and breathing sounds, has predictive power for a wide range of healthcare applications, yet is currently under-explored. The main problem for those applications arises from the difficulty in collecting large labeled task-specific data for model development. Generalizable respiratory acoustic foundation models pretrained with unlabeled data would offer appealing… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  6. arXiv:2406.16129  [pdf]

    cs.CV

    UDHF2-Net: An Uncertainty-diffusion-model-based High-Frequency TransFormer Network for High-accuracy Interpretation of Remotely Sensed Imagery

    Authors: Pengfei Zhang, Chang Li, Yongjun Zhang, Rongjun Qin

    Abstract: Remotely sensed image high-accuracy interpretation (RSIHI), including tasks such as semantic segmentation and change detection, faces the three major problems: (1) complementarity problem of spatially stationary-and-non-stationary frequency; (2) edge uncertainty problem caused by down-sampling in the encoder step and intrinsic edge noises; and (3) false detection problem caused by imagery registra… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  7. arXiv:2406.16011  [pdf, ps, other]

    math.RT math.RA

    The derived dimensions and representation distances of artin algebras

    Authors: Junling Zheng, Yingying Zhang, Jinbi Zhang

    Abstract: There is a well-known class of algebras called Igusa-Todorov algebras which were introduced in relation to finitistic dimension conjecture. As a generalization of Igusa-Todorov algebras, the new notion of $(m,n)$-Igusa-Todorov algebras provides a wider framework for studying derived dimensions. In this paper, we give a method for constructing $(m,n)$-Igusa-Todorov algebras. As an application, we p… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: accepted for publication in Archiv der Mathematik

    MSC Class: 18G20; 16E10; 18E10

  8. arXiv:2406.15992  [pdf, other]

    cs.CL

    Can LLM Graph Reasoning Generalize beyond Pattern Memorization?

    Authors: Yizhuo Zhang, Heng Wang, Shangbin Feng, Zhaoxuan Tan, Xiaochuang Han, Tianxing He, Yulia Tsvetkov

    Abstract: Large language models (LLMs) demonstrate great potential for problems with implicit graphical structures, while recent works seek to enhance the graph reasoning capabilities of LLMs through specialized instruction tuning. The resulting 'graph LLMs' are evaluated with in-distribution settings only, thus it remains underexplored whether LLMs are learning generalizable graph reasoning skills or merel… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 16 pages, 6 figures, Code and data will be publicly available at https://github.com/MatthewYZhang/NLGift

    ACM Class: I.2.7

  9. arXiv:2406.15956  [pdf]

    cond-mat.mtrl-sci

    Decoupling Many-Body Interactions in CeO2 (111) Oxygen Vacancy Structure: Insights from Machine-Learning and Cluster Expansion

    Authors: Yujing Zhang, Zhong-Kang Han, Beien Zhu, Xiaojuan Hu, Maria Troppenz, Santiago Rigamonti, Hui Li, Claudia Draxl, M. Verónica Ganduglia-Pirovano, Yi Gao

    Abstract: Oxygen vacancies (VO's) are of paramount importance in influencing the properties and applications of ceria (CeO2). Yet, comprehending the distribution and nature of the VO's poses a significant challenge due to the vast number of electronic configurations and intricate many-body interactions among VO's and polarons (Ce3+'s). In this study, we employed a combination of LASSO regression in machine… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 22 pages, 1 scheme, 5 figures

  10. arXiv:2406.15945  [pdf, other]

    eess.SP cs.IT

    Full-Space Wireless Sensing Enabled by Multi-Sector Intelligent Surfaces

    Authors: Yumeng Zhang, Xiaodan Shao, Hongyu Li, Bruno Clerckx, Rui Zhang

    Abstract: The multi-sector intelligent surface (IS), benefiting from a smarter wave manipulation capability, has been shown to enhance channel gain and offer full-space coverage in communications. However, the benefits of multi-sector IS in wireless sensing remain unexplored. This paper introduces the application of multi-sector IS for wireless sensing/localization. Specifically, we propose a new self-sensi… ▽ More

    Submitted 25 June, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

    Comments: 13 pages, 9 figures

  11. arXiv:2406.15836  [pdf, other]

    cs.LG cs.AI cs.MA

    Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models

    Authors: Yang Zhang, Chenjia Bai, Bin Zhao, Junchi Yan, Xiu Li, Xuelong Li

    Abstract: Learning a world model for model-free Reinforcement Learning (RL) agents can significantly improve the sample efficiency by learning policies in imagination. However, building a world model for Multi-Agent RL (MARL) can be particularly challenging due to the scalability issue in a centralized architecture arising from a large number of agents, and also the non-stationarity issue in a decentralized… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

  12. arXiv:2406.15829  [pdf, other]

    cs.CV

    MVOC: a training-free multiple video object composition method with diffusion models

    Authors: Wei Wang, Yaosen Chen, Yuegen Liu, Qi Yuan, Shubin Yang, Yanru Zhang

    Abstract: Video composition is the core task of video editing. Although image composition based on diffusion models has been highly successful, it is not straightforward to extend the achievement to video object composition tasks, which not only exhibit corresponding interaction effects but also ensure that the objects in the composited video maintain motion and identity consistency, which is necessary to c… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

  13. arXiv:2406.15768  [pdf, other]

    cs.CV

    MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception

    Authors: Guanqun Wang, Xinyu Wei, Jiaming Liu, Ray Zhang, Yichi Zhang, Kevin Zhang, Maurice Chong, Shanghang Zhang

    Abstract: In recent years, multimodal large language models (MLLMs) have shown remarkable capabilities in tasks like visual question answering and common sense reasoning, while visual perception models have made significant strides in perception tasks, such as detection and segmentation. However, MLLMs mainly focus on high-level image-text interpretations and struggle with fine-grained visual understanding,… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 14 pages, 8 figures

  14. arXiv:2406.15741  [pdf, other]

    cs.CL cs.AI cs.LG

    Ladder: A Model-Agnostic Framework Boosting LLM-based Machine Translation to the Next Level

    Authors: Zhaopeng Feng, Ruizhe Chen, Yan Zhang, Zijie Meng, Zuozhu Liu

    Abstract: General-purpose Large Language Models (LLMs) like GPT-4 have achieved remarkable advancements in machine translation (MT) by leveraging extensive web content. On the other hand, translation-specific LLMs are built by pre-training on domain-specific monolingual corpora and fine-tuning with human-annotated translation data. Despite the superior performance, these methods either demand an unprecedent… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Our code is available at https://github.com/fzp0424/Ladder

  15. arXiv:2406.15696  [pdf]

    physics.med-ph eess.SP

    Functional photoacoustic noninvasive Doppler angiography in humans

    Authors: Yang Zhang, Joshua Olick-Gibson, Karteekeya Sastry, Lihong V. Wang

    Abstract: Optical imaging of blood flow yields critical functional insights into the circulatory system, but its clinical implementation has typically been limited to shallow depths (~1 millimeter) due to light scattering in biological tissue. Here, we present photoacoustic noninvasive Doppler angiography (PANDA) for deep blood flow imaging. PANDA synergizes the photoacoustic and Doppler effects to generate… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 38 pages, 7 main figures, 10 supplementary figures

  16. arXiv:2406.15557  [pdf, other]

    quant-ph cond-mat.str-el

    Observation of a non-Hermitian supersonic mode

    Authors: Yuxuan Zhang, Juan Carrasquilla, Yong Baek Kim

    Abstract: Quantum computers have long been anticipated to excel in simulating quantum many-body physics. While most previous work has focused on Hermitian physics, we demonstrate the power of variational quantum circuits for resource-efficient simulations of dynamical and equilibrium physics in non-Hermitian systems, revealing new phenomena beyond standard Hermitian quantum machines. Using a variational qua… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  17. arXiv:2406.15501  [pdf]

    cs.CR

    Secure Combination of Untrusted Time information Based on Optimized Dempster-Shafer Theory

    Authors: Yang Li, Yujie Luo, Yichen Zhang, Ao Sun, Wei Huang, Shuai Zhang, Tao Zhang, Chuang Zhou, Li Ma, Jie Yang, Mei Wu, Heng Wang, Yan Pan, Yun Shao, Xing Chen, Ziyang Chen, Song Yu, Hong Guo, Bingjie Xu

    Abstract: Secure precision time synchronization is important for applications of Cyber-Physical Systems. However, several attacks, especially the Time Delay Attack (TDA), deteriorates the performance of time synchronization system seriously. Multiple paths scheme is thought as an effective security countermeasure to decrease the influence of TDA. However, the effective secure combination algorithm is still… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  18. arXiv:2406.15474  [pdf, other]

    cs.AI cs.CL cs.HC

    WundtGPT: Shaping Large Language Models To Be An Empathetic, Proactive Psychologist

    Authors: Chenyu Ren, Yazhou Zhang, Daihai He, Jing Qin

    Abstract: Large language models (LLMs) are raging over the medical domain, and their momentum has carried over into the mental health domain, leading to the emergence of few mental health LLMs. Although such mental health LLMs could provide reasonable suggestions for psychological counseling, how to develop an authentic and effective doctor-patient relationship (DPR) through LLMs is still an important probl… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  19. arXiv:2406.15420  [pdf]

    physics.ins-det physics.med-ph physics.optics

    A comprehensive overview of diffuse correlation spectroscopy: theoretical framework, recent advances in hardware, analysis, and applications

    Authors: Quan Wang, Mingliang Pan, Lucas Kreiss, Saeed Samaei, Stefan A. Carp, Johannes D. Johansson, Yuanzhe Zhang, Melissa Wu, Roarke Horstmeyer, Mamadou Diop, David Day-Uei Li

    Abstract: Diffuse correlation spectroscopy (DCS) is a powerful tool for assessing microvascular hemodynamic in deep tissues. Recent advances in sensors, lasers, and deep learning have further boosted the development of new DCS methods. However, newcomers might feel overwhelmed, not only by the already complex DCS theoretical framework but also by the broad range of component options and system architectures… ▽ More

    Submitted 18 May, 2024; originally announced June 2024.

  20. arXiv:2406.15303  [pdf, other]

    cs.CV

    ADR: Attention Diversification Regularization for Mitigating Overfitting in Multiple Instance Learning based Whole Slide Image Classification

    Authors: Yunlong Zhang, Zhongyi Shui, Yunxuan Sun, Honglin Li, Jingxiong Li, Chenglu Zhu, Sunyi Zheng, Lin Yang

    Abstract: Multiple Instance Learning (MIL) has demonstrated effectiveness in analyzing whole slide images (WSIs), yet it often encounters overfitting challenges in real-world applications. This paper reveals the correlation between MIL's performance and the entropy of attention values. Based on this observation, we propose Attention Diversity Regularization (ADR), a simple but effective technique aimed at p… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  21. arXiv:2406.15283  [pdf, other]

    cs.LG

    FT-AED: Benchmark Dataset for Early Freeway Traffic Anomalous Event Detection

    Authors: Austin Coursey, Junyi Ji, Marcos Quinones-Grueiro, William Barbour, Yuhang Zhang, Tyler Derr, Gautam Biswas, Daniel B. Work

    Abstract: Early and accurate detection of anomalous events on the freeway, such as accidents, can improve emergency response and clearance. However, existing delays and errors in event identification and reporting make it a difficult problem to solve. Current large-scale freeway traffic datasets are not designed for anomaly detection and ignore these challenges. In this paper, we introduce the first large-s… ▽ More

    Submitted 24 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

  22. arXiv:2406.15093  [pdf, other]

    cs.CR cs.CV eess.IV

    ECLIPSE: Expunging Clean-label Indiscriminate Poisons via Sparse Diffusion Purification

    Authors: Xianlong Wang, Shengshan Hu, Yechao Zhang, Ziqi Zhou, Leo Yu Zhang, Peng Xu, Wei Wan, Hai Jin

    Abstract: Clean-label indiscriminate poisoning attacks add invisible perturbations to correctly labeled training images, thus dramatically reducing the generalization capability of the victim models. Recently, some defense mechanisms have been proposed such as adversarial training, image transformation techniques, and image purification. However, these schemes are either susceptible to adaptive attacks, bui… ▽ More

    Submitted 24 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted by ESORICS 2024

  23. arXiv:2406.15068  [pdf, other]

    cs.AR

    Occamy: A 432-Core 28.1 DP-GFLOP/s/W 83% FPU Utilization Dual-Chiplet, Dual-HBM2E RISC-V-based Accelerator for Stencil and Sparse Linear Algebra Computations with 8-to-64-bit Floating-Point Support in 12nm FinFET

    Authors: Gianna Paulin, Paul Scheffler, Thomas Benz, Matheus Cavalcante, Tim Fischer, Manuel Eggimann, Yichao Zhang, Nils Wistoff, Luca Bertaccini, Luca Colagrande, Gianmarco Ottavi, Frank K. Gürkaynak, Davide Rossi, Luca Benini

    Abstract: We present Occamy, a 432-core RISC-V dual-chiplet 2.5D system for efficient sparse linear algebra and stencil computations on FP64 and narrow (32-, 16-, 8-bit) SIMD FP data. Occamy features 48 clusters of RISC-V cores with custom extensions, two 64-bit host cores, and a latency-tolerant multi-chiplet interconnect and memory system with 32 GiB of HBM2E. It achieves leading-edge utilization on stenc… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 2 pages, 7 figures. Accepted at the 2024 IEEE Symposium on VLSI Technology & Circuits

  24. arXiv:2406.15030  [pdf, ps, other]

    hep-ex

    Search for the $e^+e^- \to φχ_{c1}(3872)$ process at BESIII

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (639 additional authors not shown)

    Abstract: Based on 368.5 pb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies 4.914 and 4.946 GeV by the BESIII detector, the $e^+e^- \to φχ_{c1}(3872)$ process is searched for the first time. No significant signal is observed and the upper limits at the 90\% confidence level on the product of the Born cross section $σ(e^+e^- \to φχ_{c1}(3872))$ and the branching fraction… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 11 pages, 3 figures

  25. arXiv:2406.15028  [pdf, other]

    astro-ph.IM astro-ph.EP

    The high-contrast performance of the Keck Planet Imager and Characterizer

    Authors: Jason J. Wang, Dimitri Mawet, Jerry W. Xuan, Chih-Chun Hsu, Jean-Baptiste Ruffio, Katelyn Horstman, Yinzi Xin, Jacques-Robert Delorme, Nemanja Jovanovic, Yapeng Zhang, Luke Finnerty, Ashley Baker, Randall Bartos, Geoffrey A. Blake, Benjamin Calvin, Sylvain Cetre, Gregory W. Doppmann, Daniel Echeverri, Michael P. Fitzgerald, Joshua Liberman, Ronald Lopez, Evan Morris, Jacklyn Pezzato-Rovner, Ben Sappey, Tobias Schofield , et al. (3 additional authors not shown)

    Abstract: The Keck Planet Imager and Characterizer (KPIC), a series of upgrades to the Keck II Adaptive Optics System and Instrument Suite, aims to demonstrate high-resolution spectroscopy of faint exoplanets that are spatially resolved from their host stars. In this paper, we measure KPIC's sensitivity to companions as a function of separation (i.e., the contrast curve) using on-sky data collected over fou… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 16 pages, 6 figures, submitted to the proceedings of SPIE Astronomical Telescopes + Instrumentation 2024, 13096-69

  26. arXiv:2406.14977  [pdf, other]

    cs.AI eess.IV

    Trustworthy Enhanced Multi-view Multi-modal Alzheimer's Disease Prediction with Brain-wide Imaging Transcriptomics Data

    Authors: Shan Cong, Zhoujie Fan, Hongwei Liu, Yinghan Zhang, Xin Wang, Haoran Luo, Xiaohui Yao

    Abstract: Brain transcriptomics provides insights into the molecular mechanisms by which the brain coordinates its functions and processes. However, existing multimodal methods for predicting Alzheimer's disease (AD) primarily rely on imaging and sometimes genetic data, often neglecting the transcriptomic basis of brain. Furthermore, while striving to integrate complementary information between modalities,… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  27. arXiv:2406.14966  [pdf, other]

    cs.CY cs.CR

    AIGC-Chain: A Blockchain-Enabled Full Lifecycle Recording System for AIGC Product Copyright Management

    Authors: Jiajia Jiang, Moting Su, Xiangli Xiao, Yushu Zhang, Yuming Fang

    Abstract: As artificial intelligence technology becomes increasingly prevalent, Artificial Intelligence Generated Content (AIGC) is being adopted across various sectors. Although AIGC is playing an increasingly significant role in business and culture, questions surrounding its copyright have sparked widespread debate. The current legal framework for copyright and intellectual property is grounded in the co… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  28. arXiv:2406.14900  [pdf, other]

    cs.IR

    Decoding Matters: Addressing Amplification Bias and Homogeneity Issue for LLM-based Recommendation

    Authors: Keqin Bao, Jizhi Zhang, Yang Zhang, Xinyue Huo, Chong Chen, Fuli Feng

    Abstract: Adapting Large Language Models (LLMs) for recommendation requires careful consideration of the decoding process, given the inherent differences between generating items and natural language. Existing approaches often directly apply LLMs' original decoding methods. However, we find these methods encounter significant challenges: 1) amplification bias -- where standard length normalization inflates… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  29. arXiv:2406.14863  [pdf, other]

    cs.CR cs.AR

    Older and Wiser: The Marriage of Device Aging and Intellectual Property Protection of Deep Neural Networks

    Authors: Ning Lin, Shaocong Wang, Yue Zhang, Yangu He, Kwunhang Wong, Arindam Basu, Dashan Shang, Xiaoming Chen, Zhongrui Wang

    Abstract: Deep neural networks (DNNs), such as the widely-used GPT-3 with billions of parameters, are often kept secret due to high training costs and privacy concerns surrounding the data used to train them. Previous approaches to securing DNNs typically require expensive circuit redesign, resulting in additional overheads such as increased area, energy consumption, and latency. To address these issues, we… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Design Automation Conference 2024

  30. arXiv:2406.14604  [pdf, other]

    hep-th hep-ph

    Two-Loop Spacelike Splitting Amplitude for N=4 Super-Yang-Mills Theory

    Authors: Johannes Henn, Rourou Ma, Yongqun Xu, Kai Yan, Yang Zhang, Hua Xing Zhu

    Abstract: The study of collinear behavior for gauge theories in the spacelike region is of great phenomenological and theoretical importance. We analytically calculate the two-loop spacelike splitting amplitude for the full color N=4 Super-Yang-Mills theory. The result is derived by two complementary methods starting from the known amplitude: one is based on a discontinuity analysis, while the other one is… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 6 packages, 3 figures

    Report number: USTC-ICTS/PCFT-24-18

  31. arXiv:2406.14377  [pdf, other]

    cs.LG cs.AI

    Computation-Efficient Semi-Supervised Learning for ECG-based Cardiovascular Diseases Detection

    Authors: Rushuang Zhou, Zijun Liu, Lei Clifton, David A. Clifton, Kannie W. Y. Chan, Yuan-Ting Zhang, Yining Dong

    Abstract: Label scarcity problem is the main challenge that hinders the wide application of deep learning systems in automatic cardiovascular diseases (CVDs) detection using electrocardiography (ECG). Tuning pre-trained models alleviates this problem by transferring knowledge learned from large datasets to downstream small datasets. However, bottlenecks in computational efficiency and CVDs detection perform… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  32. arXiv:2406.14264  [pdf, other]

    eess.IV cs.CV

    Zero-Shot Image Denoising for High-Resolution Electron Microscopy

    Authors: Xuanyu Tian, Zhuoya Dong, Xiyue Lin, Yue Gao, Hongjiang Wei, Yanhang Ma, Jingyi Yu, Yuyao Zhang

    Abstract: High-resolution electron microscopy (HREM) imaging technique is a powerful tool for directly visualizing a broad range of materials in real-space. However, it faces challenges in denoising due to ultra-low signal-to-noise ratio (SNR) and scarce data availability. In this work, we propose Noise2SR, a zero-shot self-supervised learning (ZS-SSL) denoising framework for HREM. Within our framework, we… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 12 pages, 12 figures

  33. arXiv:2406.14176  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    A Multi-Stream Fusion Approach with One-Class Learning for Audio-Visual Deepfake Detection

    Authors: Kyungbok Lee, You Zhang, Zhiyao Duan

    Abstract: This paper addresses the challenge of developing a robust audio-visual deepfake detection model. In practical use cases, new generation algorithms are continually emerging, and these algorithms are not encountered during the development of detection methods. This calls for the generalization ability of the method. Additionally, to ensure the credibility of detection methods, it is beneficial for t… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  34. arXiv:2406.14096  [pdf, other]

    cs.AI cs.LG

    Graph Neural Networks for Job Shop Scheduling Problems: A Survey

    Authors: Igor G. Smit, Jianan Zhou, Robbert Reijnen, Yaoxin Wu, Jian Chen, Cong Zhang, Zaharah Bukhsh, Wim Nuijten, Yingqian Zhang

    Abstract: Job shop scheduling problems (JSSPs) represent a critical and challenging class of combinatorial optimization problems. Recent years have witnessed a rapid increase in the application of graph neural networks (GNNs) to solve JSSPs, albeit lacking a systematic survey of the relevant literature. This paper aims to thoroughly review prevailing GNN methods for different types of JSSPs and the closely… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  35. arXiv:2406.14095  [pdf, other]

    cs.LG cs.AI

    Memory-Efficient Gradient Unrolling for Large-Scale Bi-level Optimization

    Authors: Qianli Shen, Yezhen Wang, Zhouhao Yang, Xiang Li, Haonan Wang, Yang Zhang, Jonathan Scarlett, Zhanxing Zhu, Kenji Kawaguchi

    Abstract: Bi-level optimization (BO) has become a fundamental mathematical framework for addressing hierarchical machine learning problems. As deep learning models continue to grow in size, the demand for scalable bi-level optimization solutions has become increasingly critical. Traditional gradient-based bi-level optimization algorithms, due to their inherent characteristics, are ill-suited to meet the dem… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  36. arXiv:2406.14054  [pdf, other]

    cs.LG

    Urban-Focused Multi-Task Offline Reinforcement Learning with Contrastive Data Sharing

    Authors: Xinbo Zhao, Yingxue Zhang, Xin Zhang, Yu Yang, Yiqun Xie, Yanhua Li, Jun Luo

    Abstract: Enhancing diverse human decision-making processes in an urban environment is a critical issue across various applications, including ride-sharing vehicle dispatching, public transportation management, and autonomous driving. Offline reinforcement learning (RL) is a promising approach to learn and optimize human urban strategies (or policies) from pre-collected human-generated spatial-temporal urba… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: KDD 2024

  37. arXiv:2406.14039  [pdf]

    cs.AI cs.CE cs.CL cs.NE

    CryptoGPT: a 7B model rivaling GPT-4 in the task of analyzing and classifying real-time financial news

    Authors: Ying Zhang, Matthieu Petit Guillaume, Aurélien Krauth, Manel Labidi

    Abstract: CryptoGPT: a 7B model competing with GPT-4 in a specific task -- The Impact of Automatic Annotation and Strategic Fine-Tuning via QLoRA. In this article, we present a method aimed at refining a dedicated LLM of reasonable quality with limited resources in an industrial setting via CryptoGPT. It is an LLM designed for financial news analysis for the cryptocurrency market in real-time. This project wa… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Journ{é}e Nationale sur la Fouille de Textes, Pascal CUXAC; Adrien GUILLE; C{é}dric LOPEZ, Jun 2024, Lyon (Universit{é} Lumi{è}re Lyon 2), France

  38. arXiv:2406.13979  [pdf, other]

    eess.IV cs.CV cs.LG

    Knowledge-driven Subspace Fusion and Gradient Coordination for Multi-modal Learning

    Authors: Yupei Zhang, Xiaofei Wang, Fangliangzi Meng, Jin Tang, Chao Li

    Abstract: Multi-modal learning plays a crucial role in cancer diagnosis and prognosis. Current deep learning based multi-modal approaches are often limited by their abilities to model the complex correlations between genomics and histology data, addressing the intrinsic complexity of tumour ecosystem where both tumour and microenvironment contribute to malignancy. We propose a biologically interpretative an… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  39. arXiv:2406.13974  [pdf, other]

    math.CO

    A Combinatorial Decomposition of Knapsack Cones

    Authors: Guoce Xin, Yingrui Zhang, Zihao Zhang

    Abstract: In this paper, we focus on knapsack cones, a specific type of simplicial cones that arise naturally in the context of the knapsack problem $x_1 a_1 + \cdots + x_n a_n = a_0$. We present a novel combinatorial decomposition for these cones, named \texttt{DecDenu}, which aligns with Barvinok's unimodular cone decomposition within the broader framework of Algebraic Combinatorics. Computer experiments… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 22 pages

    MSC Class: Primary 52C07; Secondary 05--04; 05--08

  40. arXiv:2406.13940  [pdf, other]

    cs.CL

    AutoCAP: Towards Automatic Cross-lingual Alignment Planning for Zero-shot Chain-of-Thought

    Authors: Yongheng Zhang, Qiguang Chen, Min Li, Wanxiang Che, Libo Qin

    Abstract: Cross-lingual chain-of-thought can effectively complete reasoning tasks across languages, which gains increasing attention. Recently, dominant approaches in the literature improve cross-lingual alignment capabilities by integrating reasoning knowledge from different languages. Despite achieving excellent performance, current methods still have two main challenges: (1) Manual language specification… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL2024 Findings

  41. arXiv:2406.13939  [pdf, other]

    cs.CV

    2nd Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation

    Authors: Bin Cao, Yisi Zhang, Xuanxu Lin, Xingjian He, Bo Zhao, Jing Liu

    Abstract: Motion Expression guided Video Segmentation is a challenging task that aims at segmenting objects in the video based on natural language expressions with motion descriptions. Unlike the previous referring video object segmentation (RVOS), this task focuses more on the motion in video content for language-guided video object segmentation, requiring an enhanced ability to model longer temporal, moti… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  42. arXiv:2406.13923  [pdf, other]

    cs.AI cs.CL cs.CV cs.MM

    PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents

    Authors: Junjie Wang, Yin Zhang, Yatai Ji, Yuxiang Zhang, Chunyang Jiang, Yubo Wang, Kang Zhu, Zekun Wang, Tiezhen Wang, Wenhao Huang, Jie Fu, Bei Chen, Qunshu Lin, Minghao Liu, Ge Zhang, Wenhu Chen

    Abstract: Recent advancements in Large Multimodal Models (LMMs) have leveraged extensive multimodal datasets to enhance capabilities in complex knowledge-driven tasks. However, persistent challenges in perceptual and reasoning errors limit their efficacy, particularly in interpreting intricate visual data and deducing multimodal relationships. Addressing these issues, we introduce a novel dataset format, PI…

    Submitted 19 June, 2024; originally announced June 2024.

  43. arXiv:2406.13890  [pdf, other]

    cs.CL cs.AI

    ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World

    Authors: Weixiang Yan, Haitian Liu, Tengxiao Wu, Qian Chen, Wen Wang, Haoyuan Chai, Jiayi Wang, Weishan Zhao, Yixin Zhang, Renjun Zhang, Li Zhu

    Abstract: LLMs have achieved significant performance progress in various NLP applications. However, LLMs still struggle to meet the strict requirements for accuracy and reliability in the medical field and face many challenges in clinical applications. Existing clinical diagnostic evaluation benchmarks for evaluating medical agents powered by LLMs have severe limitations. Firstly, most existing medical eval…

    Submitted 19 June, 2024; originally announced June 2024.

  44. FoRAG: Factuality-optimized Retrieval Augmented Generation for Web-enhanced Long-form Question Answering

    Authors: Tianchi Cai, Zhiwen Tan, Xierui Song, Tao Sun, Jiyan Jiang, Yunqi Xu, Yinger Zhang, Jinjie Gu

    Abstract: Retrieval Augmented Generation (RAG) has become prevalent in question-answering (QA) tasks due to its ability of utilizing search engine to enhance the quality of long-form question-answering (LFQA). Despite the emergence of various open source methods and web-enhanced commercial systems such as Bing Chat, two critical problems remain unsolved, i.e., the lack of factuality and clear logic in the g…

    Submitted 19 June, 2024; originally announced June 2024.

    Report number: 30th

    Journal ref: KDD 2024

  45. arXiv:2406.13611  [pdf, other]

    quant-ph

    Solving k-SAT problems with generalized quantum measurement

    Authors: Yipei Zhang, Philippe Lewalle, K. Birgitta Whaley

    Abstract: We generalize the projection-based quantum measurement-driven $k$-SAT algorithm of Benjamin, Zhao, and Fitzsimons (BZF, arxiv:1711.02687) to arbitrary strength quantum measurements, including the limit of continuous monitoring. In doing so, we clarify that this algorithm is a particular case of the measurement-driven quantum control strategy elsewhere referred to as "Zeno dragging". We argue that…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 23 + 8 pages, 15 figures

  46. arXiv:2406.13538  [pdf, other]

    physics.optics physics.ins-det

    Farey tree locking of terahertz semiconductor laser frequency combs

    Authors: Guibin Liu, Xuhong Ma, Kang Zhou, Binbin Liu, Lulu Zheng, Xianglong Bi, Shumin Wu, Yanming Lu, Ziping Li, Wenjian Wan, Zhenzhen Zhang, Junsong Peng, Ya Zhang, Heping Zeng, Hua Li

    Abstract: Frequency combs show various applications in molecular fingerprinting, imaging, communications, and so on. In the terahertz frequency range, semiconductor-based quantum cascade lasers (QCLs) are ideal platforms for realizing the frequency comb operation. Although self-started frequency comb operation can be obtained in free-running terahertz QCLs due to the four-wave mixing locking effects, resona…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 22 pages, 7 figures

  47. arXiv:2406.13478  [pdf, other]

    stat.ME

    Semiparametric Localized Principal Stratification Analysis with Continuous Strata

    Authors: Yichi Zhang, Shu Yang

    Abstract: Principal stratification is essential for revealing causal mechanisms involving post-treatment intermediate variables. Principal stratification analysis with continuous intermediate variables is increasingly common but challenging due to the infinite principal strata and the nonidentifiability and nonregularity of principal causal effects. Inspired by recent research, we resolve these challenges b…

    Submitted 19 June, 2024; originally announced June 2024.

  48. arXiv:2406.13457  [pdf, other]

    cs.CV cs.AI

    EvTexture: Event-driven Texture Enhancement for Video Super-Resolution

    Authors: Dachun Kai, Jiayao Lu, Yueyi Zhang, Xiaoyan Sun

    Abstract: Event-based vision has drawn increasing attention due to its unique characteristics, such as high temporal resolution and high dynamic range. It has been used in video super-resolution (VSR) recently to enhance the flow estimation and temporal alignment. Rather than for motion learning, we propose in this paper the first VSR method that utilizes event signals for texture enhancement. Our method, c…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: ICML 2024. Project page: https://dachunkai.github.io/evtexture.github.io/

  49. arXiv:2406.13413  [pdf, other]

    eess.IV cs.CV

    Recurrent Inference Machine for Medical Image Registration

    Authors: Yi Zhang, Yidong Zhao, Hui Xue, Peter Kellman, Stefan Klein, Qian Tao

    Abstract: Image registration is essential for medical image applications where alignment of voxels across multiple images is needed for qualitative or quantitative analysis. With recent advancements in deep neural networks and parallel computing, deep learning-based medical image registration methods become competitive with their flexible modelling and fast inference capabilities. However, compared to tradi…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Preprint

  50. PetalView: Fine-grained Location and Orientation Extraction of Street-view Images via Cross-view Local Search with Supplementary Materials

    Authors: Wenmiao Hu, Yichen Zhang, Yuxuan Liang, Xianjing Han, Yifang Yin, Hannes Kruppa, See-Kiong Ng, Roger Zimmermann

    Abstract: Satellite-based street-view information extraction by cross-view matching refers to a task that extracts the location and orientation information of a given street-view image query by using one or multiple geo-referenced satellite images. Recent work has initiated a new research direction to find accurate information within a local area covered by one satellite image centered at a location prior (…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted by ACM Multimedia 2023. This version contains additional supplementary materials

    Journal ref: Proceedings of the 31st ACM International Conference on Multimedia (2023) 56-66