Showing 1–50 of 18,850 results for author: Zhang, Y

  1. arXiv:2407.01531  [pdf, other]

    cs.RO cs.LG

    Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning

    Authors: Yixiao Wang, Yifei Zhang, Mingxiao Huo, Ran Tian, Xiang Zhang, Yichen Xie, Chenfeng Xu, Pengliang Ji, Wei Zhan, Mingyu Ding, Masayoshi Tomizuka

    Abstract: The increasing complexity of tasks in robotics demands efficient strategies for multitask and continual learning. Traditional models typically rely on a universal policy for all tasks, facing challenges such as high computational costs and catastrophic forgetting when learning new tasks. To address these issues, we introduce a sparse, reusable, and flexible policy, Sparse Diffusion Policy (SDP). B…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2407.01494  [pdf, other]

    cs.CV cs.SD eess.AS

    FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds

    Authors: Yiming Zhang, Yicheng Gu, Yanhong Zeng, Zhening Xing, Yuancheng Wang, Zhizheng Wu, Kai Chen

    Abstract: We study Neural Foley, the automatic generation of high-quality sound effects synchronizing with videos, enabling an immersive audio-visual experience. Despite its wide range of applications, existing approaches encounter limitations when it comes to simultaneously synthesizing high-quality and video-aligned (i.e., semantically relevant and temporally synchronized) sounds. To overcome these limitations…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Project page: https://foleycrafter.github.io/

  3. arXiv:2407.01468  [pdf, other]

    cs.RO

    Active Shadowing (ASD): Manipulating Visual Perception of Robotics Behaviors via Implicit Communication

    Authors: Andrew Boateng, Prakhar Bhartiya, Yu Zhang

    Abstract: Explicit communication is often valued for its directness during interaction. Implicit communication, on the other hand, is indirect in that its communicative content must be inferred. Implicit communication is considered more desirable in teaming situations that require reduced interruptions for improved fluency. In this paper, we investigate another unique advantage of implicit communication: i…

    Submitted 1 July, 2024; originally announced July 2024.

  4. arXiv:2407.01299  [pdf, other]

    cs.CV

    Preserving Full Degradation Details for Blind Image Super-Resolution

    Authors: Hongda Liu, Longguang Wang, Ye Zhang, Kaiwen Xue, Shunbo Zhou, Yulan Guo

    Abstract: The performance of image super-resolution relies heavily on the accuracy of degradation information, especially under blind settings. Due to the absence of true degradation models in real-world scenarios, previous methods learn distinct representations by distinguishing different degradations in a batch. However, the most significant degradation differences may provide shortcuts for the learning of re…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 18 pages, 11 figures, 4 tables

  5. arXiv:2407.01284  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG cs.SC

    We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

    Authors: Runqi Qiao, Qiuna Tan, Guanting Dong, Minhui Wu, Chong Sun, Xiaoshuai Song, Zhuoma GongQue, Shanglin Lei, Zhe Wei, Miaoxuan Zhang, Runfeng Qiao, Yifan Zhang, Xiao Zong, Yida Xu, Muxi Diao, Zhimin Bao, Chen Li, Honggang Zhang

    Abstract: Visual mathematical reasoning, as a fundamental visual reasoning ability, has received widespread attention from the Large Multimodal Models (LMMs) community. Existing benchmarks, such as MathVista and MathVerse, focus more on the result-oriented performance but neglect the underlying principles in knowledge acquisition and generalization. Inspired by human-like mathematical reasoning, we introduc…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Work in progress

  6. arXiv:2407.01209  [pdf, other]

    cs.RO

    6-DoF Grasp Detection in Clutter with Enhanced Receptive Field and Graspable Balance Sampling

    Authors: Hanwen Wang, Ying Zhang, Yunlong Wang, Jian Li

    Abstract: 6-DoF grasp detection of small-scale grasps is crucial for robots to perform specific tasks. This paper focuses on enhancing the recognition capability of small-scale grasping, aiming to improve the overall accuracy of grasping prediction results and the generalization ability of the network. We propose an enhanced receptive field method that includes a multi-radii cylinder grouping module and a p…

    Submitted 1 July, 2024; originally announced July 2024.

  7. arXiv:2407.01145  [pdf]

    physics.app-ph cond-mat.mtrl-sci

    Machine Learning-Assisted 3D Printing of Thermoelectric Materials of Ultrahigh Performances at Room Temperature

    Authors: Kaidong Song, Guoyue Xu, A. N. M. Tanvir, Ke Wang, Md Omarsany Bappy, Haijian Yang, Wenjie Shang, Le Zhou, Alexander Dowling, Tengfei Luo, Yanliang Zhang

    Abstract: Thermoelectric energy conversion is an attractive technology for generating electricity from waste heat and using electricity for solid-state cooling. However, conventional manufacturing processes for thermoelectric devices are costly and limited to simple device geometries. This work reports an extrusion printing method to fabricate high-performance thermoelectric materials with complex 3D archit…

    Submitted 1 July, 2024; originally announced July 2024.

  8. arXiv:2407.01009  [pdf, other]

    cs.CL

    DynaThink: Fast or Slow? A Dynamic Decision-Making Framework for Large Language Models

    Authors: Jiabao Pan, Yan Zhang, Chen Zhang, Zuozhu Liu, Hongwei Wang, Haizhou Li

    Abstract: Large language models (LLMs) have demonstrated emergent capabilities across diverse reasoning tasks via popular Chains-of-Thought (COT) prompting. However, such a simple and fast COT approach often encounters limitations in dealing with complicated problems, while a thorough method, which considers multiple reasoning pathways and verifies each step carefully, results in slower inference. This pape…

    Submitted 1 July, 2024; originally announced July 2024.

  9. arXiv:2407.00992  [pdf, other]

    physics.flu-dyn

    Turbulence modulation in liquid-liquid two-phase Taylor-Couette turbulence

    Authors: Jinghong Su, Cheng Wang, Yi-bao Zhang, Fan Xu, Junwu Wang, Chao Sun

    Abstract: We investigate the coupling effects of the two-phase interface, viscosity ratio, and density ratio of the dispersed phase to the continuous phase on the flow statistics in two-phase Taylor-Couette turbulence at a system Reynolds number of 6000 and a system Weber number of 10 using interface-resolved three-dimensional direct numerical simulations with the volume-of-fluid method. Our study focuses o…

    Submitted 1 July, 2024; originally announced July 2024.

  10. arXiv:2407.00981  [pdf, other]

    cs.HC cs.CL

    VisEval: A Benchmark for Data Visualization in the Era of Large Language Models

    Authors: Nan Chen, Yuge Zhang, Jiahang Xu, Kan Ren, Yuqing Yang

    Abstract: Translating natural language to visualization (NL2VIS) has shown great promise for visual data analysis, but it remains a challenging task that requires multiple low-level implementations, such as natural language processing and visualization design. Recent advancements in pre-trained large language models (LLMs) are opening new avenues for generating visualizations from natural language. However,…

    Submitted 1 July, 2024; originally announced July 2024.

  11. arXiv:2407.00952  [pdf, other]

    cs.LG cs.CL cs.DC

    SplitLoRA: A Split Parameter-Efficient Fine-Tuning Framework for Large Language Models

    Authors: Zheng Lin, Xuanjie Hu, Yuxin Zhang, Zhe Chen, Zihan Fang, Xianhao Chen, Ang Li, Praneeth Vepakomma, Yue Gao

    Abstract: The scalability of large language models (LLMs) in handling high-complexity models and large-scale datasets has led to tremendous successes in pivotal domains. While there is an urgent need to acquire more training data for LLMs, a concerning reality is the depletion of high-quality public datasets within a few years. In view of this, the federated learning (FL) LLM fine-tuning paradigm recently h…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 9 pages, 3 figures

  12. arXiv:2407.00949  [pdf, ps, other]

    cs.CV eess.IV

    SpectralKAN: Kolmogorov-Arnold Network for Hyperspectral Images Change Detection

    Authors: Yanheng Wang, Xiaohan Yu, Yongsheng Gao, Jianjun Sha, Jian Wang, Lianru Gao, Yonggang Zhang, Xianhui Rong

    Abstract: It has been verified that deep learning methods, including convolutional neural networks (CNNs), graph neural networks (GNNs), and transformers, can accurately extract features from hyperspectral images (HSIs). These algorithms perform exceptionally well on HSIs change detection (HSIs-CD). However, the downside of these impressive results is the enormous number of parameters, FLOPs, GPU memory, tr…

    Submitted 1 July, 2024; originally announced July 2024.

  13. arXiv:2407.00933  [pdf, other]

    cs.DC eess.SP

    Reconfigurable Intelligent Computational Surfaces for MEC-Assisted Autonomous Driving Networks: Design Optimization and Analysis

    Authors: Xueyao Zhang, Bo Yang, Zhiwen Yu, Xuelin Cao, George C. Alexandropoulos, Yan Zhang, Merouane Debbah, Chau Yuen

    Abstract: This paper investigates autonomous driving safety improvement via task offloading from cellular vehicles (CVs) to a multi-access edge computing (MEC) server using vehicle-to-infrastructure (V2I) links. Considering that the latter links can be reused by vehicle-to-vehicle (V2V) communications to improve spectrum utilization, the receiver of the V2I link may suffer from severe interference that can…

    Submitted 30 June, 2024; originally announced July 2024.

  14. Unified Dual-Intent Translation for Joint Modeling of Search and Recommendation

    Authors: Yuting Zhang, Yiqing Wu, Ruidong Han, Ying Sun, Yongchun Zhu, Xiang Li, Wei Lin, Fuzhen Zhuang, Zhulin An, Yongjun Xu

    Abstract: Recommendation systems, which assist users in discovering their preferred items among numerous options, have served billions of users across various online platforms. Intuitively, users' interactions with items are highly driven by their unchanging inherent intents (e.g., always preferring high-quality items) and changing demand intents (e.g., wanting a T-shirt in summer but a down jacket in winte…

    Submitted 30 June, 2024; originally announced July 2024.

  15. arXiv:2407.00906  [pdf, other]

    cs.CV cs.LG

    GSO-YOLO: Global Stability Optimization YOLO for Construction Site Detection

    Authors: Yuming Zhang, Dongzhi Guan, Shouxin Zhang, Junhao Su, Yunzhi Han, Jiabin Liu

    Abstract: Safety issues at construction sites have long plagued the industry, posing risks to worker safety and causing economic damage due to potential hazards. With the advancement of artificial intelligence, particularly in the field of computer vision, the automation of safety monitoring on construction sites has emerged as a solution to this longstanding issue. Despite achieving impressive performance,…

    Submitted 30 June, 2024; originally announced July 2024.

  16. arXiv:2407.00874  [pdf, other]

    hep-ph

    A plan for a super $η$ factory at Huizhou accelerator complex

    Authors: Xu-Rong Chen, Xiong-Hong He, Qiang Hu, De-Xu Lin, Yang Liu, Hao Qiu, Xu Sun, Ye Tian, Rong Wang, Hong-Lin Zhang, Ya-Peng Zhang, Cheng-Xin Zhao

    Abstract: As a Goldstone boson with zero quantum number and zero SM charge, the decays of long-lived $η$ ($η^{\prime}$) meson provide a unique window to search for new physics beyond the standard model and new sources of CP violation, to test the low-energy QCD theory, and to measure the fundamental parameters of light quarks. For such goals in the physics frontiers we discuss a plan of building a super $η$ fac…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 19 pages, 9 figures

  17. arXiv:2407.00869  [pdf, other]

    cs.CL cs.AI

    Large Language Models Are Involuntary Truth-Tellers: Exploiting Fallacy Failure for Jailbreak Attacks

    Authors: Yue Zhou, Henry Peng Zou, Barbara Di Eugenio, Yang Zhang

    Abstract: We find that language models have difficulties generating fallacious and deceptive reasoning. When asked to generate deceptive outputs, language models tend to leak honest counterparts but believe them to be false. Exploiting this deficiency, we propose a jailbreak attack method that elicits an aligned language model for malicious output. Specifically, we query the model to generate a fallacious y…

    Submitted 30 June, 2024; originally announced July 2024.

  18. arXiv:2407.00810  [pdf, other]

    q-bio.NC math.NA

    Neurodevelopmental disorders modeling using isogeometric analysis, dynamic domain expansion and local refinement

    Authors: Kuanren Qian, Genesis Omana Suarez, Toshihiko Nambara, Takahisa Kanekiyo, Ashlee S. Liao, Victoria A. Webster-Wood, Yongjie Jessica Zhang

    Abstract: Neurodevelopmental disorders (NDDs) have arisen as one of the most prevailing chronic diseases within the US. Often associated with severe adverse impacts on the formation of vital central and peripheral nervous systems during the neurodevelopmental process, NDDs are comprised of a broad spectrum of disorders, such as autism spectrum disorder, attention deficit hyperactivity disorder, and epilepsy…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 23 pages, 10 figures, 1 table

  19. arXiv:2407.00753  [pdf, other]

    eess.AS cs.SD

    FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis

    Authors: Yinlin Guo, Yening Lv, Jinqiao Dou, Yan Zhang, Yuehai Wang

    Abstract: While recent advances in Text-To-Speech synthesis have yielded remarkable improvements in generating high-quality speech, research on lightweight and fast models is limited. This paper introduces FLY-TTS, a new fast, lightweight and high-quality speech synthesis system based on VITS. Specifically, 1) We replace the decoder with ConvNeXt blocks that generate Fourier spectral coefficients followed b…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Accepted to Interspeech 2024. 5 pages, 1 figure

  20. arXiv:2407.00683  [pdf, other]

    quant-ph

    Quantum State Transfer via a Multimode Resonator

    Authors: Yang He, Yu-Xiang Zhang

    Abstract: Large-scale fault-tolerant superconducting quantum computation needs rapid quantum communication to network qubits fabricated on different chips and long-range couplers to implement efficient quantum error-correction codes. Quantum channels used for these purposes are best modeled by multimode resonators, which lie between single-mode cavities and waveguides with a continuum of modes. In this Lett…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 6 pages, 3 figures, and 2 pages of supplemental material

  21. UWBAD: Towards Effective and Imperceptible Jamming Attacks Against UWB Ranging Systems with COTS Chips

    Authors: Yuqiao Yang, Zhongjie Wu, Yongzhao Zhang, Ting Chen, Jun Li, Jie Yang, Wenhao Liu, Xiaosong Zhang, Ruicong Shi, Jingwei Li, Yu Jiang, Zhuo Su

    Abstract: UWB ranging systems have been adopted in many critical and security-sensitive applications due to their precise positioning and secure ranging capabilities. We present a practical jamming attack, namely UWBAD, against commercial UWB ranging systems, which exploits the vulnerability of the adoption of the normalized cross-correlation process in UWB ranging and can selectively and quickly block rangin…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security

  22. arXiv:2407.00653  [pdf, other]

    cs.CL cs.AI

    Chain-of-Knowledge: Integrating Knowledge Reasoning into Large Language Models by Learning from Knowledge Graphs

    Authors: Yifei Zhang, Xintao Wang, Jiaqing Liang, Sirui Xia, Lida Chen, Yanghua Xiao

    Abstract: Large Language Models (LLMs) have exhibited impressive proficiency in various natural language processing (NLP) tasks, which involve increasingly complex reasoning. Knowledge reasoning, a primary type of reasoning, aims at deriving new knowledge from existing one. While it has been widely studied in the context of knowledge graphs (KGs), knowledge reasoning in LLMs remains underexplored. In this pa…

    Submitted 30 June, 2024; originally announced July 2024.

  23. arXiv:2407.00634  [pdf, other]

    cs.CV cs.LG

    Tarsier: Recipes for Training and Evaluating Large Video Description Models

    Authors: Jiawei Wang, Liping Yuan, Yuchen Zhang

    Abstract: Generating fine-grained video descriptions is a fundamental challenge in video understanding. In this work, we introduce Tarsier, a family of large-scale video-language models designed to generate high-quality video descriptions. Tarsier employs CLIP-ViT to encode frames separately and then uses an LLM to model temporal relationships. Despite its simple architecture, we demonstrate that with a met…

    Submitted 30 June, 2024; originally announced July 2024.

  24. arXiv:2407.00617  [pdf, other]

    cs.LG cs.AI cs.CL cs.GT

    Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning

    Authors: Yuheng Zhang, Dian Yu, Baolin Peng, Linfeng Song, Ye Tian, Mingyue Huo, Nan Jiang, Haitao Mi, Dong Yu

    Abstract: Reinforcement Learning with Human Feedback (RLHF) has achieved great success in aligning large language models (LLMs) with human preferences. Prevalent RLHF approaches are reward-based, following the Bradley-Terry (BT) model assumption, which may not fully capture the complexity of human preferences. In this paper, we explore RLHF under a general preference framework and approach it from a game-th…

    Submitted 30 June, 2024; originally announced July 2024.

  25. arXiv:2407.00580  [pdf, ps, other]

    math.CV

    On the modulus of meromorphic solutions of a first order differential equation

    Authors: Yueyang Zhang

    Abstract: Let $P(z)$ be a polynomial of degree $n\geq 1$ and $S(z)$ be a nonzero rational function. It is shown that if $f(z)$ is a meromorphic solution of the first order differential equation $f'(z)=S(z)e^{P(z)}f(z)+1$, then there is a curve $Γ: x\to x+iy(x)$, where $x_0\leq x<\infty$ and $π<nx^{n-1}y<3π/2$ such that for all $z\in Γ$, \begin{equation}\tag† |f(z)|> \exp\left(\frac{1}{8}e^{\frac{1}{8}x^{n}}…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 14 pages

    MSC Class: Primary 30D35; Secondary 34M10

  26. arXiv:2407.00578  [pdf, other]

    cs.RO

    UniQuad: A Unified and Versatile Quadrotor Platform Series for UAV Research and Application

    Authors: Yichen Zhang, Xinyi Chen, Peize Liu, Junzhe Wang, Hetai Zou, Shaojie Shen

    Abstract: As quadrotors take on an increasingly diverse range of roles, researchers often need to develop new hardware platforms tailored for specific tasks, introducing significant engineering overhead. In this article, we introduce the UniQuad series, a unified and versatile quadrotor platform series that offers high flexibility to adapt to a wide range of common tasks, excellent customizability for advan…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Submitted to 40th Anniversary of the IEEE Conference on Robotics and Automation (ICRA-X40)

  27. arXiv:2407.00577  [pdf, other]

    cs.RO

    FALCON: Fast Autonomous Aerial Exploration using Coverage Path Guidance

    Authors: Yichen Zhang, Xinyi Chen, Chen Feng, Boyu Zhou, Shaojie Shen

    Abstract: This paper introduces FALCON, a novel Fast Autonomous expLoration framework using COverage path guidaNce, which aims at setting a new performance benchmark in the field of autonomous aerial exploration. Despite recent advancements in the domain, existing exploration planners often suffer from inefficiencies such as frequent revisitations of previously explored regions. FALCON effectively harnesses…

    Submitted 29 June, 2024; originally announced July 2024.

  28. arXiv:2407.00563  [pdf, other]

    cs.RO

    An abstract theory of sensor eventification

    Authors: Yulin Zhang, Dylan A. Shell

    Abstract: Unlike traditional cameras, event cameras measure changes in light intensity and report differences. This paper examines the conditions necessary for other traditional sensors to admit eventified versions that provide adequate information despite outputting only changes. The requirements depend upon the regularity of the signal space, which we show may depend on several factors including structure…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 21 pages, 14 figures

    Journal ref: Robotics: Science and Systems 2024

  29. arXiv:2407.00499  [pdf, other]

    cs.CL cs.AI cs.LG

    ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees

    Authors: Zhiyuan Wang, Jinhao Duan, Lu Cheng, Yue Zhang, Qingni Wang, Hengtao Shen, Xiaofeng Zhu, Xiaoshuang Shi, Kaidi Xu

    Abstract: Uncertainty quantification (UQ) in natural language generation (NLG) tasks remains an open challenge, exacerbated by the intricate nature of the recent large language models (LLMs). This study investigates adapting conformal prediction (CP), which can convert any heuristic measure of uncertainty into rigorous theoretical guarantees by constructing prediction sets, for black-box LLMs in open-ended…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 13 pages, 9 figures, 6 tables

  30. arXiv:2407.00478  [pdf, other]

    cs.LG cs.AI

    Knowledge-Aware Parsimony Learning: A Perspective from Relational Graphs

    Authors: Quanming Yao, Yongqi Zhang, Yaqing Wang, Nan Yin, James Kwok, Qiang Yang

    Abstract: The scaling law, a strategy that involves the brute-force scaling of the training dataset and learnable parameters, has become a prevalent approach for developing stronger learning models. In this paper, we examine its rationale in terms of learning from relational graphs. We demonstrate that directly adhering to such a scaling law does not necessarily yield stronger models due to architectural in…

    Submitted 29 June, 2024; originally announced July 2024.

  31. arXiv:2407.00468  [pdf, other]

    cs.CV cs.AI cs.CL

    MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation

    Authors: Jinsheng Huang, Liang Chen, Taian Guo, Fu Zeng, Yusheng Zhao, Bohan Wu, Ye Yuan, Haozhe Zhao, Zhihui Guo, Yichi Zhang, Jingyang Yuan, Wei Ju, Luchen Liu, Tianyu Liu, Baobao Chang, Ming Zhang

    Abstract: Large Multimodal Models (LMMs) exhibit impressive cross-modal understanding and reasoning abilities, often assessed through multiple-choice questions (MCQs) that include an image, a question, and several options. However, many benchmarks used for such evaluations suffer from systematic biases. Remarkably, Large Language Models (LLMs) without any visual perception capabilities achieve non-trivial p…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 21 pages, code released at https://github.com/chenllliang/MMEvalPro, Homepage at https://mmevalpro.github.io/

  32. arXiv:2407.00427  [pdf, ps, other]

    math.CO

    On the boundedness of degenerate hypergraphs

    Authors: Jianfeng Hou, Caiyun Hu, Heng Li, Xizhi Liu, Caihong Yang, Yixiao Zhang

    Abstract: We investigate the impact of a high-degree vertex in Turán problems for degenerate hypergraphs (including graphs). We say an $r$-graph $F$ is bounded if there exist constants $α, β>0$ such that for large $n$, every $n$-vertex $F$-free $r$-graph with a vertex of degree at least $α\binom{n-1}{r-1}$ has fewer than $(1-β) \cdot \mathrm{ex}(n,F)$ edges. The boundedness property is crucial for recent wo…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: comments are welcome

  33. arXiv:2407.00386  [pdf, other]

    cs.NE cs.AI

    Multi-task multi-constraint differential evolution with elite-guided knowledge transfer for coal mine integrated energy system dispatching

    Authors: Canyun Dai, Xiaoyan Sun, Hejuan Hu, Wei Song, Yong Zhang, Dunwei Gong

    Abstract: The dispatch optimization of coal mine integrated energy system is challenging due to high dimensionality, strong coupling constraints, and multiobjective. Existing constrained multiobjective evolutionary algorithms struggle with locating multiple small and irregular feasible regions, making them inapplicable to this problem. To address this issue, we here develop a multitask evolutionary algorithm…

    Submitted 29 June, 2024; originally announced July 2024.

  34. arXiv:2407.00372  [pdf, other]

    hep-ph

    Study of semileptonic $B\to DP\ell^+ν_\ell$ decays based on the SU(3) flavor symmetry

    Authors: Ru-Min Wang, Yi-Jie Zhang, Meng-Yuan Wan, Xiao-Dong Cheng, Yuan-Guo Xu

    Abstract: Decays $B\to DP\ell^+ν_\ell~(\ell=e,μ,τ)$ with the non-resonance, the charmed vector resonances, the charmed scalar resonances and the charmed tensor resonances are calculated by using the SU(3) flavor symmetry. Firstly, the decay amplitudes of different modes are related by the SU(3) flavor symmetry. Then, relevant experimental data are used to constrain nonperturbative coefficients in the non-re…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 16 pages

  35. arXiv:2407.00367  [pdf, other]

    cs.CV

    SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix

    Authors: Peng Dai, Feitong Tan, Qiangeng Xu, David Futschik, Ruofei Du, Sean Fanello, Xiaojuan Qi, Yinda Zhang

    Abstract: Video generation models have demonstrated great capabilities of producing impressive monocular videos; however, the generation of 3D stereoscopic video remains under-explored. We propose a pose-free and training-free approach for generating 3D stereoscopic videos using an off-the-shelf monocular video generation model. Our method warps a generated monocular video into camera views on stereoscopic…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 3D stereoscopic video generation, video diffusion, inpainting

  36. arXiv:2407.00283  [pdf, other]

    gr-qc

    Gravitational waveforms from periodic orbits around a quantum-corrected black hole

    Authors: Sen Yang, Yu-Peng Zhang, Tao Zhu, Li Zhao, Yu-Xiao Liu

    Abstract: Extreme mass-ratio inspirals are crucial sources for future space-based gravitational wave detections. Gravitational waveforms emitted by extreme mass-ratio inspirals are closely related to the orbital dynamics of small celestial objects, which vary with the underlying spacetime geometry. Despite the tremendous success of general relativity, there are unsolved issues such as singularities in both…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: 16 pages, 12 figures, and 2 tables

  37. arXiv:2407.00262  [pdf, other]

    astro-ph.HE hep-ph

    Prospects for the detection of very-high-energy pulsars with LHAASO and SWGO

    Authors: Quan Hu, Yi Zhang, Kaikai Duan, Houdun Zeng

    Abstract: Pulsations from the Crab pulsar have been detected by the MAGIC telescopes at energies up to 1.5 TeV, and the pulsed emission from the Vela pulsar was detected by H.E.S.S., reaching tens of TeV. These discoveries, along with the proposed additional emission due to inverse Compton scattering at TeV energies, lead us to consider suitable candidates for detection with current and future extensive air…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: 6 pages, 3 figures, 2 Tables and accepted for publication in MNRAS

  38. arXiv:2407.00203  [pdf, other]

    cs.CV

    PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration

    Authors: Yuxuan Sun, Yunlong Zhang, Yixuan Si, Chenglu Zhu, Zhongyi Shui, Kai Zhang, Jingxiong Li, Xingheng Lyu, Tao Lin, Lin Yang

    Abstract: Vision Language Models (VLMs) like CLIP have attracted substantial attention in pathology, serving as backbones for applications such as zero-shot image classification and Whole Slide Image (WSI) analysis. Additionally, they can function as vision encoders when combined with large language models (LLMs) to support broader capabilities. Current efforts to train pathology VLMs rely on pathology imag…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: 13 pages, 3 figures

  39. arXiv:2407.00136  [pdf, other]

    hep-ex

    Observation of the Electromagnetic Dalitz Transition $h_c \rightarrow e^+e^-η_c$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, S. Ahmed, M. Albrecht, R. Aliberti, A. Amoroso, M. R. An, Q. An, X. H. Bai, Y. Bai, O. Bakina, R. Baldini Ferroli, I. Balossino, Y. Ban, K. Begzsuren, N. Berger, M. Bertani, D. Bettoni, F. Bianchi, J. Bloms, A. Bortone, I. Boyko, R. A. Briere , et al. (495 additional authors not shown)

    Abstract: Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions…

    Submitted 28 June, 2024; originally announced July 2024.

  40. arXiv:2407.00088  [pdf, other]

    cs.DC cs.AI

    T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge

    Authors: Jianyu Wei, Shijie Cao, Ting Cao, Lingxiao Ma, Lei Wang, Yanyong Zhang, Mao Yang

    Abstract: The deployment of Large Language Models (LLMs) on edge devices is increasingly important to enhance on-device intelligence. Weight quantization is crucial for reducing the memory footprint of LLMs on devices. However, low-bit LLMs necessitate mixed precision matrix multiplication (mpGEMM) of low precision weights and high precision activations during inference. Existing systems, lacking native sup…

    Submitted 25 June, 2024; originally announced July 2024.

  41. arXiv:2407.00023  [pdf, other]

    cs.DC cs.LG

    Preble: Efficient Distributed Prompt Scheduling for LLM Serving

    Authors: Vikranth Srivatsa, Zijian He, Reyna Abhyankar, Dongming Li, Yiying Zhang

    Abstract: Prompts to large language models (LLMs) have evolved beyond simple user questions. For LLMs to solve complex problems, today's practices include domain-specific instructions, illustration of tool usages, and long context, such as textbook chapters in prompts. As such, many parts of prompts are repetitive across requests, and their attention computation results can be reused. However, today's LLM s…

    Submitted 8 May, 2024; originally announced July 2024.

  42. arXiv:2406.20087  [pdf, other]

    cs.LG cs.AI cs.CL cs.CY cs.HC

    ProgressGym: Alignment with a Millennium of Moral Progress

    Authors: Tianyi Qiu, Yang Zhang, Xuchuan Huang, Jasmine Xinze Li, Jiaming Ji, Yaodong Yang

    Abstract: Frontier AI systems, including large language models (LLMs), hold increasing influence over the epistemology of human users. Such influence can reinforce prevailing societal values, potentially contributing to the lock-in of misguided moral beliefs and, consequently, the perpetuation of problematic moral practices on a broad scale. We introduce progress alignment as a technical solution to mitigat…

    Submitted 28 June, 2024; originally announced June 2024.

  43. arXiv:2406.20076  [pdf, other]

    cs.CV

    EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model

    Authors: Yuxuan Zhang, Tianheng Cheng, Rui Hu, Lei Liu, Heng Liu, Longjin Ran, Xiaoxin Chen, Wenyu Liu, Xinggang Wang

    Abstract: Segment Anything Model (SAM) has attracted widespread attention for its superior interactive segmentation capabilities with visual prompts while lacking further exploration of text prompts. In this paper, we empirically investigate what text prompt encoders (e.g., CLIP or LLM) are good for adapting SAM for referring expression segmentation and introduce the Early Vision-language Fusion-based SAM (…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Preprint

  44. arXiv:2406.20064  [pdf, ps, other]

    astro-ph.EP

    BYORP and Dissipation in Binary Asteroids: Lessons from DART

    Authors: Matija Ćuk, Harrison Agrusa, Rachel H. Cueva, Fabio Ferrari, Masatoshi Hirabayashi, Seth A. Jacobson, Jay McMahon, Patrick Michel, Paul Sánchez, Daniel J. Scheeres, Stephen Schwartz, Kevin J. Walsh, Yun Zhang

    Abstract: The Near-Earth binary asteroid Didymos was the target of a planetary defense demonstration mission DART in September 2022. The smaller binary component, Dimorphos, was impacted by the spacecraft in order to measure momentum transfer in kinetic impacts into rubble piles. DART and associated Earth-based observation campaigns have provided a wealth of scientific data on the Didymos-Dimorphos binary.…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Accepted for PSJ

  45. arXiv:2406.20016  [pdf, ps, other]

    hep-ph

    A new method for finding more symmetry relations of Feynman integrals

    Authors: Zihao Wu, Yang Zhang

    Abstract: We introduce a new method for deriving Feynman integral symmetry relations. By solving the ansatz of momentum transformation in the field of rational functions rather than constants, the method can sometimes find more symmetry relations, compared with some state-of-the-art software. The new method may help to further decrease the number of master integrals in an integral family. Well-chosen gauge cond…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 10 pages, 7 figures

    Report number: USTC-ICTS/PCFT-24-20

  46. arXiv:2406.20015  [pdf, other]

    cs.CL cs.AI

    ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models

    Authors: Yuxiang Zhang, Jing Chen, Junjie Wang, Yaxin Liu, Cheng Yang, Chufan Shi, Xinyu Zhu, Zihao Lin, Hanwen Wan, Yujiu Yang, Tetsuya Sakai, Tian Feng, Hayato Yamana

    Abstract: Tool-augmented large language models (LLMs) are rapidly being integrated into real-world applications. Due to the lack of benchmarks, the community still needs to fully understand the hallucination issues within these models. To address this challenge, we introduce a comprehensive diagnostic benchmark, ToolBH. Specifically, we assess the LLM's hallucinations through two perspectives: depth and bre…

    Submitted 28 June, 2024; originally announced June 2024.

  47. arXiv:2406.19972  [pdf, other]

    cs.RO

    HumanVLA: Towards Vision-Language Directed Object Rearrangement by Physical Humanoid

    Authors: Xinyu Xu, Yizheng Zhang, Yong-Lu Li, Lei Han, Cewu Lu

    Abstract: Physical Human-Scene Interaction (HSI) plays a crucial role in numerous applications. However, existing HSI techniques are limited to specific object dynamics and privileged information, which prevents the development of more comprehensive applications. To address this limitation, we introduce HumanVLA for general object rearrangement directed by practical vision and language. A teacher-stud…

    Submitted 28 June, 2024; originally announced June 2024.

  48. arXiv:2406.19923  [pdf, other]

    astro-ph.EP

    Water Evolution & Inventories of Super-Earths Orbiting Late M-Dwarfs

    Authors: Keavin Moore, Benjamin David, Albert Yian Zhang, Nicolas B. Cowan

    Abstract: Super-Earths orbiting M-dwarf stars may be the most common habitable planets in the Universe. However, their habitability is threatened by intense irradiation from their host stars, which drives the escape of water to space and can lead to surface desiccation. We present simulation results of a box model of water cycling between interior and atmosphere and loss to space, for terrestrial planets of…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 12 pages, 5 figures; revisions submitted to The Astrophysical Journal

  49. arXiv:2406.19833  [pdf, other]

    cs.CV

    LightStereo: Channel Boost Is All Your Need for Efficient 2D Cost Aggregation

    Authors: Xianda Guo, Chenming Zhang, Dujun Nie, Wenzhao Zheng, Youmin Zhang, Long Chen

    Abstract: We present LightStereo, a cutting-edge stereo-matching network crafted to accelerate the matching process. Departing from conventional methodologies that rely on aggregating computationally intensive 4D costs, LightStereo adopts the 3D cost volume as a lightweight alternative. While similar approaches have been explored previously, our breakthrough lies in enhancing performance through a dedicated…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Code will be available at https://github.com/XiandaGuo/OpenStereo

  50. arXiv:2406.19781  [pdf, other]

    cs.RO

    LCSim: A Large-Scale Controllable Traffic Simulator

    Authors: Yuheng Zhang, Tianjian Ouyang, Fudan Yu, Cong Ma, Lei Qiao, Wei Wu, Jian Yuan, Yong Li

    Abstract: With the rapid development of urban transportation and the continuous advancement in autonomous vehicles, the demand for safely and efficiently testing autonomous driving and traffic optimization algorithms arises, which needs accurate modeling of large-scale urban traffic scenarios. Existing traffic simulation systems encounter two significant limitations. Firstly, they often rely on open-source…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Submitted to the 38th Conference on Neural Information Processing Systems (NeurIPS 2024) Track on Datasets and Benchmarks