
Showing 201–250 of 14,365 results for author: Wang, X

  1. arXiv:2406.07580  [pdf, other]

    cs.CR cs.LG

    DMS: Addressing Information Loss with More Steps for Pragmatic Adversarial Attacks

    Authors: Zhiyu Zhu, Jiayu Zhang, Xinyi Wang, Zhibo **, Huaming Chen

    Abstract: Despite the exceptional performance of deep neural networks (DNNs) across different domains, they are vulnerable to adversarial samples, in particular for tasks related to computer vision. Such vulnerability is further influenced by the digital container formats used in computers, where the discrete numerical values are commonly used for storing the pixel values. This paper examines how informatio…

    Submitted 9 June, 2024; originally announced June 2024.

  2. arXiv:2406.07571  [pdf, other]

    cs.CY

    Supporting Self-Reflection at Scale with Large Language Models: Insights from Randomized Field Experiments in Classrooms

    Authors: Harsh Kumar, Ruiwei Xiao, Benjamin Lawson, Ilya Musabirov, Jiakai Shi, Xinyuan Wang, Huayin Luo, Joseph Jay Williams, Anna Rafferty, John Stamper, Michael Liut

    Abstract: Self-reflection on learning experiences constitutes a fundamental cognitive process, essential for the consolidation of knowledge and the enhancement of learning efficacy. However, traditional methods to facilitate reflection often face challenges in personalization, immediacy of feedback, engagement, and scalability. Integration of Large Language Models (LLMs) into the reflection process could mi…

    Submitted 31 May, 2024; originally announced June 2024.

    Comments: Accepted at L@S'24

  3. arXiv:2406.07480  [pdf, other]

    cs.CV

    Image Neural Field Diffusion Models

    Authors: Yinbo Chen, Oliver Wang, Richard Zhang, Eli Shechtman, Xiaolong Wang, Michael Gharbi

    Abstract: Diffusion models have shown an impressive ability to model complex data distributions, with several key advantages over GANs, such as stable training, better coverage of the training distribution's modes, and the ability to solve inverse problems without extra training. However, most diffusion models learn the distribution of fixed-resolution images. We propose to learn the distribution of continu…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Project page: https://yinboc.github.io/infd/

  4. arXiv:2406.07411  [pdf, other]

    cs.SE cs.CL

    VersiCode: Towards Version-controllable Code Generation

    Authors: Tongtong Wu, Weigang Wu, Xingyu Wang, Kang Xu, Suyu Ma, Bo Jiang, ** Yang, Zhenchang Xing, Yuan-Fang Li, Gholamreza Haffari

    Abstract: Significant research has focused on improving the performance of large language models on code-related tasks due to their practical importance. Although performance is typically evaluated using public benchmark datasets, the existing datasets do not account for the concept of version, which is crucial in professional software development. In this paper, we introduce VersiCode, the first comp…

    Submitted 11 June, 2024; originally announced June 2024.

  5. arXiv:2406.07381  [pdf, other]

    cs.AI cs.LG

    World Models with Hints of Large Language Models for Goal Achieving

    Authors: Zeyuan Liu, Ziyu Huan, Xiyao Wang, Jiafei Lyu, Jian Tao, Xiu Li, Furong Huang, Huazhe Xu

    Abstract: Reinforcement learning struggles in the face of long-horizon tasks and sparse goals due to the difficulty in manual reward specification. While existing methods address this by adding intrinsic rewards, they may fail to provide meaningful guidance in long-horizon decision-making tasks with large state and action spaces, lacking purposeful exploration. Inspired by human cognition, we propose a new…

    Submitted 11 June, 2024; originally announced June 2024.

  6. arXiv:2406.07209  [pdf, other]

    cs.CV

    MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance

    Authors: X. Wang, Siming Fu, Qihan Huang, Wanggui He, Hao Jiang

    Abstract: Recent advancements in text-to-image generation models have dramatically enhanced the generation of photorealistic images from textual prompts, leading to an increased interest in personalized text-to-image applications, particularly in multi-subject scenarios. However, these advances are hindered by two main challenges: firstly, the need to accurately maintain the details of each referenced subje…

    Submitted 11 June, 2024; originally announced June 2024.

  7. arXiv:2406.07078  [pdf, other]

    cs.CV cs.AI

    Unified Modeling Enhanced Multimodal Learning for Precision Neuro-Oncology

    Authors: Huahui Yi, Xiaofei Wang, Kang Li, Chao Li

    Abstract: Multimodal learning, integrating histology images and genomics, promises to enhance precision oncology with comprehensive views at microscopic and molecular levels. However, existing methods may not sufficiently model the shared or complementary information for more effective integration. In this study, we introduce a Unified Modeling Enhanced Multimodal Learning (UMEML) framework that employs a h…

    Submitted 11 June, 2024; originally announced June 2024.

  8. arXiv:2406.06951  [pdf, other]

    astro-ph.SR astro-ph.GA

    Determination method of binary fractions by the integrated spectrum

    Authors: F. Zhang, L. Li, Z. Han, X. Wang

    Abstract: We need to resolve the individual stars for binary fraction determinations of stellar systems. Therefore, it is not possible to obtain the binary fractions for dense or distant stellar systems. We proposed a method to determine the binary fraction of a dense or distant stellar system. The method is to first determine the binary fraction variation for any two adjacent regions and then add up thos…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 12 pages, 6 figures, accepted by MNRAS

  9. arXiv:2406.06911  [pdf, other]

    cs.CV cs.AI

    AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising

    Authors: Zigeng Chen, Xinyin Ma, Gongfan Fang, Zhenxiong Tan, Xinchao Wang

    Abstract: Diffusion models have garnered significant interest from the community for their great generative ability across various applications. However, their typical multi-step sequential-denoising nature gives rise to high cumulative latency, thereby precluding the possibilities of parallel computation. To address this, we introduce AsyncDiff, a universal and plug-and-play acceleration scheme that enable…

    Submitted 27 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Work in progress. Project Page: https://czg1225.github.io/asyncdiff_page/

  10. arXiv:2406.06872  [pdf, other]

    eess.SP

    Revolutionizing Wireless Networks with Self-Supervised Learning: A Pathway to Intelligent Communications

    Authors: Zhixiang Yang, Hongyang Du, Dusit Niyato, Xudong Wang, Yu Zhou, Lei Feng, Fanqin Zhou, Wen**g Li, Xuesong Qiu

    Abstract: With the rapid proliferation of mobile devices and data, next-generation wireless communication systems face stringent requirements for ultra-low latency, ultra-high reliability, and massive connectivity. Traditional AI-driven wireless network designs, while promising, often suffer from limitations such as dependency on labeled data and poor generalization. To address these challenges, we present…

    Submitted 10 June, 2024; originally announced June 2024.

  11. arXiv:2406.06864  [pdf, other]

    cs.SE cs.AI

    Validating LLM-Generated Programs with Metamorphic Prompt Testing

    Authors: Xiaoyin Wang, Dakai Zhu

    Abstract: The latest paradigm shift in software development brings in the innovation and automation afforded by Large Language Models (LLMs), showcased by Generative Pre-trained Transformer (GPT), which has shown remarkable capacity to generate code autonomously, significantly reducing the manual effort required for various programming tasks. Although the potential benefits of LLM-generated code are vast,…

    Submitted 10 June, 2024; originally announced June 2024.

  12. arXiv:2406.06606  [pdf, other]

    cs.CL cs.AI

    Prototypical Reward Network for Data-Efficient RLHF

    Authors: **ghan Zhang, Xiting Wang, Yiqiao **, Changyu Chen, Xinhao Zhang, Kunpeng Liu

    Abstract: The reward model for Reinforcement Learning from Human Feedback (RLHF) has proven effective in fine-tuning Large Language Models (LLMs). Notably, collecting human feedback for RLHF can be resource-intensive and lead to scalability issues for LLMs and complex tasks. Our proposed framework Proto-RM leverages prototypical networks to enhance reward models under limited human feedback. By enabling sta…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL 2024

  13. arXiv:2406.06577  [pdf, other]

    cs.CL cs.AI

    RAG-based Crowdsourcing Task Decomposition via Masked Contrastive Learning with Prompts

    Authors: **g Yang, Xiao Wang, Yu Zhao, Yuhang Liu, Fei-Yue Wang

    Abstract: Crowdsourcing is a critical technology in social manufacturing, which leverages an extensive and boundless reservoir of human resources to handle a wide array of complex tasks. The successful execution of these complex tasks relies on task decomposition (TD) and allocation, with the former being a prerequisite for the latter. Recently, pre-trained language models (PLMs)-based methods have garnered…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 13 pages, 9 figures

  14. arXiv:2406.06563  [pdf, other]

    cs.CL cs.AI

    Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models

    Authors: Tianwen Wei, Bo Zhu, Liang Zhao, Cheng Cheng, Biye Li, Weiwei Lü, Peng Cheng, Jianhao Zhang, Xiaoyu Zhang, Liang Zeng, Xiaokun Wang, Yutuan Ma, Rui Hu, Shuicheng Yan, Han Fang, Yahui Zhou

    Abstract: In this technical report, we introduce the training methodologies implemented in the development of Skywork-MoE, a high-performance mixture-of-experts (MoE) large language model (LLM) with 146 billion parameters and 16 experts. It is initialized from the pre-existing dense checkpoints of our Skywork-13B model. We explore the comparative effectiveness of upcycling versus training from scratch initi…

    Submitted 2 June, 2024; originally announced June 2024.

  15. arXiv:2406.06426  [pdf, other]

    stat.ME

    Biomarker-Guided Adaptive Enrichment Design with Threshold Detection for Clinical Trials with Time-to-Event Outcome

    Authors: Kaiyuan Hua, Hwanhee Hong, Xiaofei Wang

    Abstract: Biomarker-guided designs are increasingly used to evaluate personalized treatments based on patients' biomarker status in Phase II and III clinical trials. With adaptive enrichment, these designs can improve the efficiency of evaluating the treatment effect in biomarker-positive patients by increasing their proportion in the randomized trial. While time-to-event outcomes are often used as the prim…

    Submitted 10 June, 2024; originally announced June 2024.

  16. arXiv:2406.06367  [pdf, other]

    cs.CV

    MVGamba: Unify 3D Content Generation as State Space Sequence Modeling

    Authors: Xuanyu Yi, Zike Wu, Qiuhong Shen, Qingshan Xu, Pan Zhou, Joo-Hwee Lim, Shuicheng Yan, Xinchao Wang, Hanwang Zhang

    Abstract: Recent 3D large reconstruction models (LRMs) can generate high-quality 3D content in sub-seconds by integrating multi-view diffusion models with scalable multi-view reconstructors. Current works further leverage 3D Gaussian Splatting as 3D representation for improved visual quality and rendering efficiency. However, we observe that existing Gaussian reconstruction models often suffer from multi-vi…

    Submitted 20 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  17. arXiv:2406.06202  [pdf, other]

    cs.LG

    Federated learning in food research

    Authors: Zuzanna Fendor, Bas H. M. van der Velden, Xinxin Wang, Andrea Jr. Carnoli, Osman Mutlu, Ali Hürriyetoğlu

    Abstract: Research in the food domain is at times limited due to data sharing obstacles, such as data ownership, privacy requirements, and regulations. While important, these obstacles can restrict data-driven methods such as machine learning. Federated learning, the approach of training models on locally kept data and only sharing the learned parameters, is a potential technique to alleviate data sharing o…

    Submitted 10 June, 2024; originally announced June 2024.

  18. arXiv:2406.06119  [pdf, other]

    cs.LG

    A Survey on Incomplete Multi-label Learning: Recent Advances and Future Trends

    Authors: Xiang Li, Jiexi Liu, Xinrui Wang, Songcan Chen

    Abstract: In reality, data often exhibit associations with multiple labels, making multi-label learning (MLL) a prominent research topic. The last two decades have witnessed the success of MLL, which is inseparable from complete and accurate supervised information. However, obtaining such information in practice is always laborious and sometimes even impossible. To circumvent this dilemma, incomple…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 10 pages, 3 figures

  19. arXiv:2406.06118  [pdf, other]

    hep-ex

    Strong and weak $CP$ tests in sequential decays of polarized $Σ^0$ hyperons

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (644 additional authors not shown)

    Abstract: The $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ processes and subsequent decays are studied using the world's largest $J/ψ$ and $ψ(3686)$ data samples collected with the BESIII detector. The strong-$CP$ symmetry is tested in the decays of the $Σ^0$ hyperons for the first time by measuring the decay parameters, $α_{Σ^0} = -0.0017 \pm 0.0021 \pm 0.0018$ and $\barα_{Σ^0} = 0.0021 \pm 0.0020 \pm 0.0022$. The wea…

    Submitted 10 June, 2024; originally announced June 2024.

  20. arXiv:2406.06110  [pdf, other]

    cs.CL cs.AI

    Recurrent Context Compression: Efficiently Expanding the Context Window of LLM

    Authors: Chensen Huang, Guibo Zhu, Xuepeng Wang, Yifei Luo, Guo**g Ge, Haoran Chen, Dong Yi, **qiao Wang

    Abstract: To extend the context length of Transformer-based large language models (LLMs) and improve comprehension capabilities, we often face limitations due to computational resources and bounded memory storage capacity. This work introduces a method called Recurrent Context Compression (RCC), designed to efficiently expand the context window length of LLMs within constrained storage space. We also invest…

    Submitted 10 June, 2024; originally announced June 2024.

  21. arXiv:2406.06008  [pdf, ps, other]

    math.NA

    Efficient algorithm for the oscillatory matrix functions

    Authors: Dong** Li, Xue Wang, Xiuying Zhang

    Abstract: This paper introduces an efficient algorithm for computing the general oscillatory matrix functions. These computations are crucial for solving second-order semi-linear initial value problems. The method is exploited using the scaling and restoring technique based on a quadruple angle formula in conjunction with a truncated Taylor series. The choice of the scaling parameter and the degree of the T…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 12 pages

    MSC Class: 65F30; 65F60 ACM Class: G.1.3

  22. arXiv:2406.06007  [pdf, other]

    cs.LG cs.CL cs.CV cs.CY

    CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models

    Authors: Peng Xia, Ze Chen, Juanxi Tian, Yangrui Gong, Ruibo Hou, Yue Xu, Zhenbang Wu, Zhiyuan Fan, Yiyang Zhou, Kangyu Zhu, Wenhao Zheng, Zhaoyang Wang, Xiao Wang, Xuchao Zhang, Chetan Bansal, Marc Niethammer, Junzhou Huang, Hongtu Zhu, Yun Li, Jimeng Sun, Zongyuan Ge, Gang Li, James Zou, Huaxiu Yao

    Abstract: Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare. However, the trustworthiness of Med-LVLMs remains unverified, posing significant risks for future model deployment. In this paper, we introduce CARES and aim to comprehen…

    Submitted 10 June, 2024; originally announced June 2024.

  23. arXiv:2406.05974  [pdf, other]

    eess.IV cs.CV

    Inter-slice Super-resolution of Magnetic Resonance Images by Pre-training and Self-supervised Fine-tuning

    Authors: Xin Wang, Zhiyun Song, Yitao Zhu, Sheng Wang, Lichi Zhang, Dinggang Shen, Qian Wang

    Abstract: In clinical practice, 2D magnetic resonance (MR) sequences are widely adopted. While individual 2D slices can be stacked to form a 3D volume, the relatively large slice spacing can pose challenges for both image visualization and subsequent analysis tasks, which often require isotropic voxel spacing. To reduce slice spacing, deep-learning-based super-resolution techniques are widely investigated.…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: ISBI 2024

  24. arXiv:2406.05938  [pdf, other]

    cs.LG math.OC

    Expressive Power of Graph Neural Networks for (Mixed-Integer) Quadratic Programs

    Authors: Ziang Chen, Xiaohan Chen, Jialin Liu, Xinshang Wang, Wotao Yin

    Abstract: Quadratic programming (QP) is the most widely applied category of problems in nonlinear programming. Many applications require real-time/fast solutions, though not necessarily with high precision. Existing methods either involve matrix decomposition or use the preconditioned conjugate gradient method. For relatively large instances, these methods cannot achieve the real-time requirement unless the…

    Submitted 9 June, 2024; originally announced June 2024.

  25. arXiv:2406.05925  [pdf, other]

    cs.CL cs.AI

    Hello Again! LLM-powered Personalized Agent for Long-term Dialogue

    Authors: Hao Li, Chenghao Yang, An Zhang, Yang Deng, Xiang Wang, Tat-Seng Chua

    Abstract: Open-domain dialogue systems have seen remarkable advancements with the development of large language models (LLMs). Nonetheless, most existing dialogue systems predominantly focus on brief single-session interactions, neglecting the real-world demands for long-term companionship and personalized interactions with chatbots. Crucial to addressing this real-world need are event summary and persona m…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: 17 pages, 4 figures

  26. arXiv:2406.05906  [pdf, other]

    cs.CL cs.AI

    TTM-RE: Memory-Augmented Document-Level Relation Extraction

    Authors: Chufan Gao, Xuan Wang, Jimeng Sun

    Abstract: Document-level relation extraction aims to categorize the association between any two entities within a document. We find that previous methods for document-level relation extraction are ineffective in exploiting the full potential of large amounts of training data with varied noise levels. For example, in the ReDocRED benchmark dataset, state-of-the-art methods trained on the large-scale, lower-q…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted in ACL 2024 Main

  27. arXiv:2406.05898  [pdf, other]

    cs.IR cs.AI cs.LG

    Async Learned User Embeddings for Ads Delivery Optimization

    Authors: Mingwei Tang, Meng Liu, Hong Li, Junjie Yang, Chenglin Wei, Boyang Li, Dai Li, Rengan Xu, Yifan Xu, Zehua Zhang, Xiangyu Wang, Linfeng Liu, Yuelei Xie, Chengye Liu, Labib Fawaz, Li Li, Hongnan Wang, Bill Zhu, Sri Reddy

    Abstract: In recommendation systems, high-quality user embeddings can capture subtle preferences, enable precise similarity calculations, and adapt to changing preferences over time to maintain relevance. The effectiveness of recommendation systems depends on the quality of user embedding. We propose to asynchronously learn high fidelity user embeddings for billions of users each day from sequence based mul…

    Submitted 23 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted by workshop on Multimodal Representation and Retrieval at SIGIR 2024, Washington DC

  28. arXiv:2406.05875  [pdf, other]

    physics.optics cond-mat.mes-hall cond-mat.mtrl-sci physics.app-ph

    Hybrid terahertz emitter for pulse shaping and chirality control

    Authors: Weipeng Wu, Wilder Acuna, Zhixiang Huang, Xi Wang, Lars Gundlach, Matthew F. Doty, Joshua M. O. Zide, M. Benjamin Jungfleisch

    Abstract: Terahertz (THz) radiation, spanning from 0.3 to 3x10^12 Hz, fills the crucial gap between the microwave and infrared spectral range. THz technology has found applications in various fields, from imaging and sensing to telecommunication and biosensing. However, the full potential of these applications is often hindered by the need for precise control and manipulation of the frequency and polarizati…

    Submitted 9 June, 2024; originally announced June 2024.

  29. arXiv:2406.05827  [pdf, ps, other]

    hep-ex

    Measurement of the integrated luminosity of the data collected at 3.773 GeV by BESIII from 2021 to 2024

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (634 additional authors not shown)

    Abstract: We present a measurement of the integrated luminosity of $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at a center-of-mass energy of $E_{\rm cm} = 3.773$~GeV. The integrated luminosities of the data sets taken from December 2021 to June 2022, from November 2022 to June 2023, and from October 2023 to February 2024 are determined to be $4.995 \pm 0.019$~fb$^{-1}$,…

    Submitted 9 June, 2024; originally announced June 2024.

  30. arXiv:2406.05699  [pdf, ps, other]

    eess.AS cs.AI eess.SP

    An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS

    Authors: Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Yufei Xia, **zhu Li, Sheng Zhao, **yu Li, Naoyuki Kanda

    Abstract: Recently, zero-shot text-to-speech (TTS) systems, capable of synthesizing any speaker's voice from a short audio prompt, have made rapid advancements. However, the quality of the generated speech significantly deteriorates when the audio prompt contains noise, and limited research has been conducted to address this issue. In this paper, we explored various strategies to enhance the quality of audi…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH2024

  31. arXiv:2406.05657  [pdf, other]

    physics.ins-det

    Single channel PICOSEC Micromegas detector with improved time resolution

    Authors: A. Utrobicic, R. Aleksan, Y. Angelis, J. Bortfeldt, F. Brunbauer, M. Brunoldi, E. Chatzianagnostou, J. Datta, K. Dehmelt, G. Fanourakis, D. Fiorina, K. J. Floethner, M. Gallinaro, F. Garcia, I. Giomataris, K. Gnanvo, F. J. Iguaz, D. Janssens, A. Kallitsopoulou, M. Kovacic, B. Kross, P. Legou, M. Lisowska, J. Liu, M. Lupberger, et al. (25 additional authors not shown)

    Abstract: This paper presents design guidelines and experimental verification of a single-channel PICOSEC Micromegas (MM) detector with an improved time resolution. The design encompasses the detector board, vessel, auxiliary mechanical parts, and electrical connectivity for high voltage (HV) and signals, focusing on improving stability, reducing noise, and ensuring signal integrity to optimize timing perfo…

    Submitted 9 June, 2024; originally announced June 2024.

  32. arXiv:2406.05619  [pdf, other]

    quant-ph

    Variational Quantum Circuit Decoupling

    Authors: Ximing Wang, Chengran Yang, Mile Gu

    Abstract: Decoupling systems into independently evolving components has a long history of simplifying seemingly complex systems. They enable a better understanding of the underlying dynamics and causal structures while providing more efficient means to simulate such processes on a computer. Here we outline a variational decoupling algorithm for decoupling unitary quantum dynamics -- allowing us to decompose…

    Submitted 8 June, 2024; originally announced June 2024.

  33. arXiv:2406.05498  [pdf, other]

    cs.CR cs.AI

    SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner

    Authors: Xunguang Wang, Daoyuan Wu, Zhenlan Ji, Zongjie Li, **chuan Ma, Shuai Wang, Yingjiu Li, Yang Liu, Ning Liu, Juergen Rahmel

    Abstract: Jailbreaking is an emerging adversarial attack that bypasses the safety alignment deployed in off-the-shelf large language models (LLMs) and has evolved into four major categories: optimization-based attacks such as Greedy Coordinate Gradient (GCG), jailbreak template-based attacks such as "Do-Anything-Now", advanced indirect attacks like DrAttack, and multilingual jailbreaks. However, delivering…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: This paper completes its earlier vision paper, available at arXiv:2402.15727

  34. arXiv:2406.05339  [pdf, other]

    eess.AS cs.AI

    To what extent can ASV systems naturally defend against spoofing attacks?

    Authors: Jee-weon Jung, Xin Wang, Nicholas Evans, Shinji Watanabe, Hye-** Shim, Hemlata Tak, Sidhhant Arora, Junichi Yamagishi, Joon Son Chung

    Abstract: The current automatic speaker verification (ASV) task involves making binary decisions on two types of trials: target and non-target. However, emerging advancements in speech generation technology pose significant threats to the reliability of ASV systems. This study investigates whether ASV effortlessly acquires robustness against spoofing attacks (i.e., zero-shot capability) by systematically ex…

    Submitted 14 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: 5 pages, 3 figures, 3 tables, Interspeech 2024

  35. arXiv:2406.05271  [pdf, other]

    cs.CV

    USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation

    Authors: Xiaoqi Wang, Wenbin He, Xiwei Xuan, Clint Sebastian, Jorge Piazentin Ono, Xin Li, Sima Behpour, Thang Doan, Liang Gou, Han Wei Shen, Liu Ren

    Abstract: The open-vocabulary image segmentation task involves partitioning images into semantically meaningful segments and classifying them with flexible text-defined categories. The recent vision-based foundation models such as the Segment Anything Model (SAM) have shown superior performance in generating class-agnostic image segments. The main challenge in open-vocabulary image segmentation now lies in…

    Submitted 7 June, 2024; originally announced June 2024.

  36. arXiv:2406.05206  [pdf, ps, other]

    math.AP math-ph math.SP

    Spectral properties of the Kramers-Fokker-Planck operator with a long-range potential

    Authors: Xue ** Wang

    Abstract: We study real resonances and embedded eigenvalues of the Kramers--Fokker--Planck operator with a long-range potential. We prove that thresholds are only possible accumulation points of eigenvalues and that the limiting absorption principle holds true for energies outside an exceptional set. We also prove that the eigenfunctions associated with discrete eigenvalues decay exponentially and those ass…

    Submitted 7 June, 2024; originally announced June 2024.

    MSC Class: 35J10; 35P15; 47A55

  37. arXiv:2406.05133  [pdf, other]

    cond-mat.mtrl-sci physics.data-an

    Hierarchical Bayesian approach for adaptive integration of Bragg peaks in time-of-flight neutron scattering data

    Authors: Viktor Reshniak, ** Wang, Guannan Zhang, Siyan Liu, Junqi Yin

    Abstract: The Spallation Neutron Source (SNS) at Oak Ridge National Laboratory (ORNL) operates in the event mode. Time-of-flight (TOF) information about each detected neutron is collected separately and saved as a descriptive entry in a database enabling unprecedented accuracy of the collected experimental data. Nevertheless, the common data processing pipeline still involves the binning of data to perform…

    Submitted 1 April, 2024; originally announced June 2024.

  38. arXiv:2406.05082  [pdf, other]

    cs.CV

    CoNo: Consistency Noise Injection for Tuning-free Long Video Diffusion

    Authors: Xingrui Wang, Xin Li, Zhibo Chen

    Abstract: Tuning-free long video diffusion has been proposed to generate extended-duration videos with enriched content by reusing the knowledge from pre-trained short video diffusion model without retraining. However, most works overlook the fine-grained long-term video consistency modeling, resulting in limited scene consistency (i.e., unreasonable object or background transitions), especially with multip…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 21 pages

  39. arXiv:2406.04942  [pdf, other]

    cs.CV

    Joint Spatial-Temporal Modeling and Contrastive Learning for Self-supervised Heart Rate Measurement

    Authors: Wei Qian, Qi Li, Kun Li, Xinke Wang, Xiao Sun, Meng Wang, Dan Guo

    Abstract: This paper briefly introduces the solutions developed by our team, HFUT-VUT, for Track 1 of self-supervised heart rate measurement in the 3rd Vision-based Remote Physiological Signal Sensing (RePSS) Challenge hosted at IJCAI 2024. The goal is to develop a self-supervised learning algorithm for heart rate (HR) estimation using unlabeled facial videos. To tackle this task, we present two self-superv…

    Submitted 7 June, 2024; originally announced June 2024.

  40. arXiv:2406.04822  [pdf, other]

    cs.LG

    M2NO: Multiresolution Operator Learning with Multiwavelet-based Algebraic Multigrid Method

    Authors: Zhihao Li, Zhilu Lai, Xiaobo Wang, Wei Wang

    Abstract: Solving partial differential equations (PDEs) effectively necessitates a multi-scale approach, particularly critical in high-dimensional scenarios characterized by increasing grid points or resolution. Traditional methods often fail to capture the detailed features necessary for accurate modeling, presenting a significant challenge in scientific computing. In response, we introduce the Multiwavele…

    Submitted 7 June, 2024; originally announced June 2024.

  41. arXiv:2406.04745  [pdf, other]

    cs.LG cs.CV

    Confidence-aware Contrastive Learning for Selective Classification

    Authors: Yu-Chang Wu, Shen-Huan Lyu, Haopu Shang, Xiangyu Wang, Chao Qian

    Abstract: Selective classification enables models to make predictions only when they are sufficiently confident, aiming to enhance safety and reliability, which is important in high-stakes scenarios. Previous methods mainly use deep neural networks and focus on modifying the architecture of classification layers to enable the model to estimate the confidence of its prediction. This work provides a generaliz…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024

  42. arXiv:2406.04683  [pdf, other]

    cs.SD eess.AS

    PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation

    Authors: Shuchen Shi, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Tao Wang, Chunyu Qiang, Yi Lu, Xin Qi, Xuefei Liu, Yukun Liu, Yongwei Li, Zhiyong Wang, Xiaopeng Wang

    Abstract: Text-to-Audio (TTA) aims to generate audio that corresponds to the given text description, playing a crucial role in media production. The text descriptions in TTA datasets lack rich variations and diversity, resulting in a drop in TTA model performance when faced with complex text. To address this issue, we propose a method called Portable Plug-in Prompt Refiner, which utilizes rich knowledge abo…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: accepted by INTERSPEECH2024

  43. arXiv:2406.04595  [pdf, other]

    cs.SD cs.CL eess.AS

    Pitch-Aware RNN-T for Mandarin Chinese Mispronunciation Detection and Diagnosis

    Authors: Xintong Wang, Mingqian Shi, Ye Wang

    Abstract: Mispronunciation Detection and Diagnosis (MDD) systems, leveraging Automatic Speech Recognition (ASR), face two main challenges in Mandarin Chinese: 1) The two-stage models create an information gap between the phoneme or tone classification stage and the MDD stage. 2) The scarcity of Mandarin MDD datasets limits model training. In this paper, we introduce a stateless RNN-T model for Mandarin MDD,…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  44. arXiv:2406.04371  [pdf, other]

    cs.CL cs.AI

    Phased Instruction Fine-Tuning for Large Language Models

    Authors: Wei Pang, Chuan Zhou, Xiao-Hua Zhou, Xiaojie Wang

    Abstract: Instruction Fine-Tuning enhances pre-trained language models from basic next-word prediction to complex instruction-following. However, existing One-off Instruction Fine-Tuning (One-off IFT) method, applied on a diverse instruction, may not effectively boost models' adherence to instructions due to the simultaneous handling of varying instruction complexities. To improve this, Phased Instruction F…

    Submitted 16 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

    Comments: The final version, to appear at ACL 2024 Findings

  45. arXiv:2406.04281  [pdf, other]

    eess.AS

    Total-Duration-Aware Duration Modeling for Text-to-Speech Systems

    Authors: Sefik Emre Eskimez, Xiaofei Wang, Manthan Thakker, Chung-Hsien Tsai, Canrun Li, Zhen Xiao, Hemin Yang, Zirun Zhu, Min Tang, Jinyu Li, Sheng Zhao, Naoyuki Kanda

    Abstract: Accurate control of the total duration of generated speech by adjusting the speech rate is crucial for various text-to-speech (TTS) applications. However, the impact of adjusting the speech rate on speech quality, such as intelligibility and speaker characteristics, has been underexplored. In this work, we propose a novel total-duration-aware (TDA) duration model for TTS, where phoneme durations a…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  46. arXiv:2406.04277  [pdf, other]

    cs.CV

    VideoTetris: Towards Compositional Text-to-Video Generation

    Authors: Ye Tian, Ling Yang, Haotian Yang, Yuan Gao, Yufan Deng, Jingmin Chen, Xintao Wang, Zhaochen Yu, Xin Tao, Pengfei Wan, Di Zhang, Bin Cui

    Abstract: Diffusion models have demonstrated great success in text-to-video (T2V) generation. However, existing methods may face challenges when handling complex (long) video generation scenarios that involve multiple objects or dynamic changes in object numbers. To address these limitations, we propose VideoTetris, a novel framework that enables compositional T2V generation. Specifically, we propose spatio…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Code: https://github.com/YangLing0818/VideoTetris

  47. arXiv:2406.04025  [pdf]

    cs.CL

    The syntax-semantics interface in a child's path: A study of 3- to 11-year-olds' elicited production of Mandarin recursive relative clauses

    Authors: Caimei Yang, Qihang Yang, Xingzhi Su, Chenxi Fu, Xiaoyi Wang, Ying Yan, Zaijiang Man

    Abstract: There have been apparently conflicting claims over the syntax-semantics relationship in child acquisition. However, few of them have assessed the child's path toward the acquisition of recursive relative clauses (RRCs). The authors of the current paper did experiments to investigate 3- to 11-year-olds' most-structured elicited production of eight Mandarin RRCs in a 4 (syntactic types)*2 (semantic…

    Submitted 6 June, 2024; originally announced June 2024.

  48. arXiv:2406.04006  [pdf]

    physics.chem-ph

    Fe-MoS$_2$ nanoenzyme with photothermal enhanced enzyme activity for glucose colorimetric detection

    Authors: Xiaolu Wang, Guiye Shan

    Abstract: With the development of nanotechnology, it has been discovered that some nanomaterials have the activity of mimicking enzymes. This type of inorganic nanomaterial with characteristics similar to natural enzymes is called nanoenzyme. Compared with natural enzymes, nanoenzymes have advantages such as low deactivation, good stability, low production and storage costs, surface modification, and large-…

    Submitted 6 June, 2024; originally announced June 2024.

  49. arXiv:2406.04005  [pdf, other]

    cs.SI

    The Failed Migration of Academic Twitter

    Authors: Xinyu Wang, Sai Koneru, Sarah Rajtmajer

    Abstract: Following change in Twitter's ownership and subsequent changes to content moderation policies, many in academia looked to move their discourse elsewhere and migration to Mastodon was pursued by some. Our study looks at the dynamics of this migration. Utilizing publicly available user account data, we track the posting activity of academics on Mastodon over a one year period. Our analyses reveal si…

    Submitted 6 June, 2024; originally announced June 2024.

  50. arXiv:2406.03967  [pdf, ps, other]

    math.OC

    Model order reduction for discrete time-delay systems with inhomogeneous initial conditions

    Authors: Xiaolong Wang, Kejia Xu

    Abstract: We propose two kinds of model order reduction methods for discrete time-delay systems with inhomogeneous initial conditions. The peculiar properties of discrete Walsh functions are directly utilized to compute the Walsh coefficients of the systems, and the projection matrix is defined properly to generate reduced models by taking into account the non-zero initial conditions. It is shown that reduc…

    Submitted 6 June, 2024; originally announced June 2024.